Chip ___ DAA - Intel-like instruction question?
Sapieha
Posts: 2,964
Hi Chip.
Is it possible to add this instruction?
Decimal Adjust Accumulator instruction
(DAA) to correct the result of a packed BCD
Description:
This instruction conditionally adjusts the Accumulator for BCD addition and
subtraction operations. For addition (ADD, ADC, INC) or subtraction (SUB,
SBC, DEC, NEG), the following table indicates the operation performed:
Comments
Please to God no DAA.
As a builder of 8080 and Z80 emulators perhaps I should welcome such an idea.
But what I found was that the 8080 and Z80 DAA instructions do not do exactly the same thing. The descriptions of DAA in the data sheets do not correctly describe what the chips actually do. Every emulator ever written has had bugs in its implementation of DAA. It took pullmoll and me quite a while to perfect DAA emulation on the Propeller.
On top of that as far as I can tell almost no 8080/Z80 code, at least in the CP/M world, uses it. Compilers will never use it.
Seems to me getting Chip to try and add a DAA instruction to the Prop would crash the entire project.
DAA is the most un-RISC and un-prop like instruction I can imagine.
That's all I have to say about DAA.
To be precise: it simply converts a hex byte value to its two least significant BCD digits, and the C flag shows whether there was an overflow.
In the Propeller world it could take a 24-bit value and convert it to BCD.
Many sensors output BCD, so this would help with manipulating their data: addition, subtraction, and more!
You want to convert 24 bits to BCD in one opcode? (Why not go for 26.57 bits, to fully pack 8-digit BCD?)
Yes, that would be cool, but I suspect not small in silicon!
I've checked a 32 bit DAA library I have here, and it needs ~1688 cycles
With a one-clock DIV, it is only one clock per BCD digit.
I am too sick, so I have not looked at VHDL or Verilog code for that.
But my concern is more about the logic needed for it.
With correct coding, 24 bits can be done in 1-4 clocks.
Yes very simple conversion. Here is how to do it in C, from https://github.com/redcode/Z80: Anyone know if that is correct?
... a lot of silicon maybe to get 1-4 clocks ?
Take this example, if you have a 32/32 DIV that gives both DIV and REM, my calc gives this
98765432/1e6 = 98.765432 (DAA 98 )
98765432%1e6 = 765432
98765432%1e6/1e4 = 76.5432 (DAA 76 )
98765432%1e6%1e4 = 5432
98765432%1e6%1e4/1e2 = 54.32 (DAA 54 )
98765432%1e6%1e4%1e2 = 32 (DAA 32 )
so that needs 3 Div.Rem operations, and 4 8 bit DAA's plus some placement shuffling.
Of course, if you have 32/32, you can just use more Div.Rem, and skip DAA entirely.
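The "more Div.Rem, skip DAA entirely" route can be sketched in C as below. This is only an illustration of the arithmetic, not P2 code: the function name `bin_to_bcd8` is made up for this example, and it assumes the compiler's `/` and `%` stand in for a hardware divide that returns quotient and remainder together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a binary value in the range 0..99999999
 * to 8 packed BCD digits, one divide-with-remainder per digit. */
uint32_t bin_to_bcd8(uint32_t value)
{
    assert(value <= 99999999u);         /* only 8 digits fit in 32 bits */

    uint32_t bcd = 0;
    for (int shift = 0; shift < 32; shift += 4) {
        bcd |= (value % 10u) << shift;  /* remainder is the next digit */
        value /= 10u;                   /* quotient feeds the next pass */
    }
    return bcd;
}
```

With the value from the example above, `bin_to_bcd8(98765432)` yields `0x98765432`. Dividing by 100 per pass instead would halve the number of divides, at the cost of a second step to split each pair of digits.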
A later derivative for the Nintendo Entertainment System dropped this mode of operation. And that led to Eric's partial 6502 core being reasonable enough to form the basis for the "impossible" NES emulation on P1.
@Heater: Ouch! That's a painful instruction.
Seems to me, optimizing a conversion function resolves this nicely. Once that conversion is done, we've got great math on the P2, and none of it is in decimal mode. Convert, deal with the data, done. Worst case, convert, deal with it, convert back, done.
Edit: As noted, the silicon cost is high!
This instruction is used a lot in real control systems.
- Nintendo and the like do NOT need it.
Of course the DAA instruction was an ugly kludge to supposedly simplify and speed up BCD addition and subtraction. It was to be used in conjunction with ADD and SUB to get the BCD operation you want.
The Motorola 68000 tried to do things the right way by providing instructions for BCD arithmetic, ABCD, SBCD. I bet they were never used either:)
I would imagine somewhere in your system you will want to use multiply or divide on your BCD numbers, in which case it's time to convert them to normal integers anyway. Or you will be using those values in such a way that normal integers are what you want. All in all, DAA is a waste of silicon.
If you do have sensors producing BCD perhaps an instruction to quickly convert them to normal integers is in order. And perhaps back again. But do such things really need the speed?
Is there even one use case in the Propeller world?
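For reference, the BCD-to-integer conversion suggested above is a short loop in software. The sketch below is an assumption-laden illustration in C (the name `bcd8_to_bin` is invented here, and a sensor value is assumed to arrive as 8 packed BCD digits in one 32-bit word); it says nothing about how fast a PASM version would be.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert 8 packed BCD digits to a plain binary
 * integer. Every nibble must hold 0..9, otherwise the input is not BCD. */
uint32_t bcd8_to_bin(uint32_t bcd)
{
    uint32_t value = 0;
    for (int shift = 28; shift >= 0; shift -= 4) {
        uint32_t digit = (bcd >> shift) & 0xFu;
        assert(digit <= 9u);
        value = value * 10u + digit;    /* most significant digit first */
    }
    return value;
}
```

For example, `bcd8_to_bin(0x98765432)` returns `98765432`, after which ordinary binary add, multiply and divide apply.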
Not entirely : In the 8 bit space, with limited or no MUL/DIV, then using a DAA based library can give a compact and fast way (one lib call) to prepare up to a 32 bit binary value for easy passing to fast display or string export.
Probably more commonly seen with assembler, as the HLLs do not really know what to do with a 5 Byte BCD array.
You can use base-100, then you only have to divide by 10 and 100 (you could use a table...).
The BCD instructions in the Z80 were extensively used by TI in its calculators (TI-85 and descendants). The TI-92, I'd assume, uses base-100 or base-10000 like the utility bc.
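One way to read the base-100 suggestion is sketched below in C: keep two decimal digits per byte as a plain binary value 0..99, add with an ordinary binary add plus a compare, and only split into digits for display. The function names and the little-endian byte order are assumptions made for this example, not anything taken from the TI code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add two little-endian base-100 numbers of n bytes
 * each (every byte 0..99). Returns the carry out of the top byte. */
unsigned base100_add(const uint8_t *a, const uint8_t *b, uint8_t *sum, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        assert(a[i] <= 99 && b[i] <= 99);
        unsigned s = a[i] + b[i] + carry;   /* at most 99 + 99 + 1 = 199 */
        carry = (s >= 100);
        sum[i] = (uint8_t)(carry ? s - 100 : s);
    }
    return carry;
}

/* Hypothetical helper: split one base-100 byte into two packed BCD
 * digits for display. A 100-entry lookup table could replace the divide. */
uint8_t base100_to_bcd(uint8_t pair)
{
    assert(pair <= 99);
    return (uint8_t)(((pair / 10) << 4) | (pair % 10));
}
```

No decimal-adjust step is needed here, because each byte is an ordinary binary number; the only decimal-specific work is the compare against 100 and the final split.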
This is a schematic of the decimal adjust circuitry in the Intel 8085 microprocessor.
Did they open-source the TI calculator code? How do we know about their use of DAA?
Sapieha,
That ugliness (the schematic) should not be allowed in the Prop:)
Anyone writing a Prop II simulator in future will curse anyone who puts a DAA in there:)
In case anyone is wondering, DAA is not even a binary to BCD conversion, it's a hack in the processor that you have to use after you have used ADD or SUB on BCD encoded numbers to get a correct BCD encoded result. "Decimal Adjust for Addition". This must get even uglier for operating on 32 bit results to get 8 BCD digits.
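To make the "adjust after ADD" idea concrete, here is a minimal C sketch of a packed-BCD byte addition followed by the textbook decimal adjustment. It is not a model of any particular chip: the function name `bcd_add8` is invented, only addition is covered, both inputs are assumed to be valid BCD, and the flag quirks on which the real 8080 and Z80 differ are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: add two packed BCD bytes (two digits each) plus a
 * carry-in, then apply the textbook decimal adjust for addition.
 * Returns the BCD result and writes the decimal carry to *carry_out. */
uint8_t bcd_add8(uint8_t a, uint8_t b, unsigned carry_in, unsigned *carry_out)
{
    assert(carry_in <= 1);

    unsigned sum   = a + b + carry_in;                        /* plain binary add */
    unsigned half  = ((a & 0x0F) + (b & 0x0F) + carry_in) > 0x0F; /* nibble carry */
    unsigned carry = sum > 0xFF;
    uint8_t  r     = (uint8_t)sum;
    uint8_t  fix   = 0;

    if ((r & 0x0F) > 9 || half)
        fix |= 0x06;                   /* low digit went past 9 */
    if (r > 0x99 || carry) {
        fix |= 0x60;                   /* high digit went past 9 */
        carry = 1;
    }

    *carry_out = carry;
    return (uint8_t)(r + fix);
}
```

For example, `0x58 + 0x46` gives the binary sum `0x9E`; the adjust adds `0x66`, leaving `0x04` with the carry set, which reads as decimal 104. An 8-digit addition would chain four such byte steps through the carry, which is where the 32-bit case gets uglier.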
That schematic comes from:
http://www.righto.com/2013/08/reverse-engineering-8085s-decimal.html
First of all, like Ale, I love BCD maths.
I believe I have understood almost all of it, but one intriguing question has had me scratching my head since last dawn, when I first read your suggestion that Chip create a DAA instruction.
Regarding your main concern: are you asking for a simple adjust scheme, applied after a binary ADD or SUB between two operands that are ALREADY 8-digit decimal?
If so, which matters more to you: code density (the fewer instructions needed, the better) or cycle count (the faster it runs, the better)?
Yanomani
Someone with a FPGA P2 could code a Div.Rem version of 32bin -> 8.BCD, and check the speed, and then see if there is any DAA variant that might improve on that in size and speed.
Size probably matters more than speed, and adding a NextDiv = NextDiv/1e2 in a loop should shrink code into a loop, at a small speed cost.
It's about 22 cycles, but 17 of them can be used to execute other instructions for optimized code.
Andy
Thanks. In the most compact form I make it two Div.rem operations per moved digit, and the tail code of shift/merge/loop could easily fit within 17, so an estimate is 7*(22+22) to convert 98765432, and ~ 9*(22+22) to convert 4231043210. (less for smaller numbers)
- under 5us, even at 80MHz (and 17x faster than the looping 8 bit DAA code)
I'm just wondering if it could be done at the Hub-to-Cog circuitry level.
Perhaps with some exchange-like behavior, the same instruction could leave one value, or a pair of them, to be adjusted/converted/operated on, while it reads back the result of a previously requested operation.
Maybe as an alternate QUADS behavior, reusing its logic, with alternate sources/destinations at Hub level.
The operation could then be done as the Hub progresses through its cycles, and the result returned eight cycles later, counted from the point where the request was first served.
This is meant to avoid bloating the Cog circuitry, at the expense of a small time penalty.
Yanomani
Surely DAA, or any instruction almost, is there to speed things up. Making it dependent on HUB timing slows it down a lot. Thus cancelling out any benefit. There is no point in implementing it then.
Anyone,
Can you point us to a Z80 BCD arithmetic library? I mean one that does maths on 16 or 32 bit numbers in BCD, or calculator-style floats using BCD. I would have guessed that for any significant calculation it would be smaller and quicker to convert the BCD input to binary, do the calculation, and then convert the result back to BCD.
Google finds this :
SDCC: DA A is used in printf_fast and printf_tiny. But it is never generated by SDCC.
XCHD is never used nor generated.
Heater
This is exactly what I was thinking about when I wrote my last post.
In single-task execution, it's possible to gather some data, operate on it, and store the results in just a few clock cycles.
But in a multitasking environment, the total number of cycles between gathering the operand, having it adjusted, and storing the result increases considerably, in proportion to the number of tasks running simultaneously in a given Cog.
Another point that matters here: for DAA to operate on up to eight digits simultaneously, it will certainly consume a lot of logic. If the final goal is to have BCD variants of multiplication and division, even more silicon real estate will be used.
If it can be a multicycle instruction (or instructions, in the case of BCD maths), then code density will be prioritized at the expense of time.
Supposing only DAA exists, adjusting the binary result of an addition or subtraction on up to eight BCD digits, then multiplication and division will certainly demand conversions on both sides: two beforehand to convert the operands, and one afterwards to convert the result.
EDIT: Or it will rely on looping BCD additions and DAA adjustments to do multiplication, for example, as we used to do on the Z80.
Either way, time will pass whichever solution is chosen, unless Chip decides to craft a full BCD version of the current math operations.
Yanomani
P.S. Compucorp 400 series microprocessors had a single flag bit to control whether the full set of maths operated BCD-wise or binary-wise. It had 52 mantissa digits, two exponent digits and a nibble to hold the signs of the mantissa and exponent parts. Two bits of the sign nibble were unused. The main accumulator also had two guard digits, upper and lower, to allow for overflow and underflow control. I'm counting my age in eons. :nerd:
Edit: The above underlined text, about the number of mantissa digits, was a mistake that I made, when I initially posted it.
The exact expression should read:
13 mantissa digits (52 bits)
Sorry for this mistake, I'll try to be extra careful in the future.
Yanomani
Sorry I have not answered all the posts of interest. But some days I have a BIG war with my sickness.
I don't even know if I will get to run a REAL P2. Life will show. So I can only dream of the possibility of having P3 Verilog code.
I have read both good and bad arguments in your answers.
Thanks for the input.
For me, even DAA on only one BYTE would still be perfect! It would simplify simulating 8-bit micros.
And don't tell me it is not needed. Those micros have it, and they need it to function correctly.
Heater:
I know it can be done in code. But how much valuable COG space will it use compared with a single instruction?
Sapieha
After suffering from pneumonia twice in the past ten months, with both lungs compromised both times, I understand your thoughts.
But if life is short, sure we can make it happy, if we only smile twice a day.
Yanomani
P.S. Smoke is for reverse-polarized tantalum capacitors, not lungs!
A simple question of mine.
Doesn't a table-driven conversion, like the one Ale suggested in post #12, solve your two-BCD-digit problem?
Sure, it will cost four instructions, if I counted right, and some Hub memory (and time). But it does not involve a lot of decisions and other operations. It only needs to distinguish between addition and subtraction to fetch the right result.
Yanomani
I do hope we all live long enough to see the P2, and even the P3 !
Both Zicog and PullMoll's Z80 emulators implement the DAA instruction logic in LMM code. So no valuable COG space is taken.
That is very slow of course but it does not matter as DAA is never used except in the instruction set testing programs
You can also look it up in your own calc. I do not think that they posted any code themselves, though. Some years ago it was all the rage to get the ROM entry points... The ROMs are available through the emu sites. I have done extensive analysis and named/found many routines.
Here are some extracts from my own code analyzer:
enjoy