Chip ___ DAA - Intel like instruction question?

Sapieha · 2013-10-09 21:19

Hi Chip.

Is it possible to add this instruction?

Decimal Adjust Accumulator instruction (DAA), which corrects the result of a packed BCD addition or subtraction.

Description:
This instruction conditionally adjusts the Accumulator for BCD addition and
subtraction operations. For addition (ADD, ADC, INC) or subtraction (SUB,
SBC, DEC, NEG), the following table indicates the operation performed:

Heater. · 2013-10-09 21:30

Sapieha,

Please to God no DAA.

As a builder of 8080 and Z80 emulators perhaps I should welcome such an idea.

But what I found was that the 8080 and Z80 DAA instructions do not do exactly the same thing. The descriptions of DAA in the data sheets do not correctly describe what the chips actually do. Every emulator ever written has had bugs in its implementation of DAA. It took pullmoll and me quite a while to perfect DAA emulation on the Propeller.

On top of that, as far as I can tell, almost no 8080/Z80 code, at least in the CP/M world, uses it. Compilers will never use it.

Seems to me getting Chip to try and add a DAA instruction to the Prop would crash the entire project.

DAA is the most un-RISC and un-prop like instruction I can imagine.

That's all I have to say about DAA.

Sapieha · 2013-10-09 21:39

Hi Heater.

Correct. It simply converts a byte hex value into its two least significant BCD digits, and the C flag shows whether there was an overflow.

In the Propeller world it could take a 24-bit value and convert it to BCD.
Many sensors output BCD, so this would help with manipulating their data: addition, subtraction and more!

jmg · 2013-10-09 22:00

Sapieha wrote: »

Correct. It simply converts a byte hex value into its two least significant BCD digits, and the C flag shows whether there was an overflow.
In the Propeller world it could take a 24-bit value and convert it to BCD.

You want to convert 24 bits to BCD in one opcode? (Why not go for 26.57 bits, to fully pack 8 BCD digits?)

Yes, that would be cool, but I suspect not small in silicon!

I've checked a 32-bit DAA library I have here, and it needs ~1688 cycles.

Sapieha · 2013-10-09 22:11

Hi jmg.

With a one-clock DIV, it is only one clock per BCD digit.
I am too sick, so I have not looked at VHDL or Verilog code for that.
But my concern is more about the logic needed for that.
With the right coding, 24 bits can be done in 1-4 clocks.

Heater. · 2013-10-09 22:23

Sapieha,

Yes very simple conversion. Here is how to do it in C, from https://github.com/redcode/Z80:

INSTRUCTION(daa)

	{
	quint8 t = A; PC++;
	if (F & NF)
		{
		if (F_H || (A & 0xF ) > 9) t -= 6;
		if (F_C ||  A > 0x99)	   t -= 0x60;
		}
	else	{
		if (F_H || (A & 0xF ) > 9) t += 6;
		if (F_C ||  A > 0x99)	   t += 0x60;
		}
	F =	(F & NF)		/* NF unchanged			*/
		| (t & SYXF)		/* SF = A.7; YF = A.5; XF = A.3	*/
		| ZF_ZERO(t)		/* ZF = !A			*/
		| ((A & HF) ^ (t & HF))	/* HF = pA.4 xor A.4		*/
		| (F_C | (A > 0x99));	/* CF = CF | (pA > 0x99)	*/
	A = t;
	CYCLES(4);
	}

Anyone know if that is correct?

jmg · 2013-10-09 22:41

Sapieha wrote: »

Hi jmg.

With a one-clock DIV, it is only one clock per BCD digit.
I am too sick, so I have not looked at VHDL or Verilog code for that.
But my concern is more about the logic needed for that.
With the right coding, 24 bits can be done in 1-4 clocks.

... a lot of silicon, maybe, to get 1-4 clocks?

Take this example: if you have a 32/32 DIV that gives both DIV and REM, my calc gives this

98765432/1e6 = 98.765432 (DAA 98 )
98765432%1e6 = 765432
98765432%1e6/1e4 = 76.5432 (DAA 76 )
98765432%1e6%1e4 = 5432
98765432%1e6%1e4/1e2 = 54.32 (DAA 54 )
98765432%1e6%1e4%1e2 = 32 (DAA 32 )

So that needs 3 Div.Rem operations and four 8-bit DAAs, plus some placement shuffling.
Of course, if you have 32/32, you can just use more Div.Rem, and skip DAA entirely.
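
As a rough illustration of the scheme above, here is a minimal C sketch, assuming an ordinary 32-bit divide stands in for the hardware Div.Rem. The function names are made up for this example, and the per-pair step uses /10 and %10 instead of a real DAA, so it shows the idea only and is not a model of any P2 instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: binary 0..99 to one packed BCD byte
   (the "DAA 98" step in the table, done here without DAA). */
uint8_t bin_to_bcd_byte(uint32_t v)
{
    assert(v <= 99u);
    return (uint8_t)(((v / 10u) << 4) | (v % 10u));
}

/* Hypothetical helper: binary 0..99999999 to 8 packed BCD digits,
   using three div/rem steps by 1e6, 1e4 and 1e2. */
uint32_t bin_to_bcd8(uint32_t value)
{
    assert(value <= 99999999u);

    uint32_t pair3 = value / 1000000u;  /* top two digits    */
    uint32_t rest  = value % 1000000u;
    uint32_t pair2 = rest / 10000u;     /* next two digits   */
    rest %= 10000u;
    uint32_t pair1 = rest / 100u;       /* next two digits   */
    uint32_t pair0 = rest % 100u;       /* bottom two digits */

    /* The "placement shuffling": one BCD byte per digit pair. */
    return ((uint32_t)bin_to_bcd_byte(pair3) << 24)
         | ((uint32_t)bin_to_bcd_byte(pair2) << 16)
         | ((uint32_t)bin_to_bcd_byte(pair1) << 8)
         |  (uint32_t)bin_to_bcd_byte(pair0);
}
```

With this sketch, bin_to_bcd8(98765432) gives 0x98765432, matching the digit pairs in the table.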

potatohead · 2013-10-09 22:51

The 6502 had a BCD mode. It went largely unused, as you describe for the 8080 / Z80, Heater. Some assembly language programmers used it once in a while, but it was not targeted by compilers.

A later derivative for the Nintendo Entertainment System dropped this mode of operation. And that led to Eric's partial 6502 core being reasonable enough to form the basis for the "impossible" NES emulation on P1.

@Heater: Ouch! That's a painful instruction.

Seems to me, optimizing a conversion function resolves this nicely. Once that conversion is done, we've got great math on the P2, and none of it is in decimal mode. Convert, deal with the data, done. Worst case, convert, deal with it, convert back, done.

Edit: As noted, the silicon cost is high!

Sapieha · 2013-10-09 22:57

Hi potatohead

This instruction is used a lot in real control systems.
Nintendo and the like do not need it.

Heater. · 2013-10-09 23:23

Never seen DAA used in any control system I have worked on. Ah well.

Of course the DAA instruction was an ugly kludge to supposedly simplify and speed up BCD addition and subtraction. It was to be used in conjunction with ADD and SUB to get the BCD operation you want.

The Motorola 68000 tried to do things the right way by providing instructions for BCD arithmetic, ABCD, SBCD. I bet they were never used either:)

I would imagine somewhere in your system you will want to use multiply or divide on your BCD numbers, in which case it's time to convert them to normal integers anyway. Or you will be using those values in such a way that normal integers are what you want. All in all, DAA is a waste of silicon.

If you do have sensors producing BCD perhaps an instruction to quickly convert them to normal integers is in order. And perhaps back again. But do such things really need the speed?

Is there even one use case in the Propeller world?
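
For the BCD-to-integer direction mentioned above, a software conversion is short. The following is a minimal sketch in C, assuming the sensor value arrives as 8 packed BCD digits in a 32-bit word; the function name and the validity check are choices made for this example, not anything from the Propeller tools.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: decode 8 packed BCD digits into a binary integer.
   Returns false (and leaves *out alone) if any nibble is above 9. */
bool bcd8_to_bin(uint32_t bcd, uint32_t *out)
{
    assert(out != 0);

    uint32_t result = 0;
    /* Walk the nibbles from most to least significant. */
    for (int shift = 28; shift >= 0; shift -= 4) {
        uint32_t digit = (bcd >> shift) & 0xFu;
        if (digit > 9u)
            return false;            /* not a decimal digit */
        result = result * 10u + digit;
    }
    *out = result;
    return true;
}
```

Eight multiply-and-add steps per value is cheap next to most sensor read times, which is in line with the question of whether this needs dedicated hardware at all.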

jmg · 2013-10-09 23:52

Heater. wrote: »

I would imagine somewhere in your system you will want to use multiply or divide on your BCD numbers, in which case it's time to convert them to normal integers anyway. Or you will be using those values in such a way that normal integers are what you want. All in all, DAA is a waste of silicon.

Not entirely: in the 8-bit space, with limited or no MUL/DIV, a DAA-based library can give a compact and fast way (one lib call) to prepare up to a 32-bit binary value for easy passing to a fast display or string export.

Probably more commonly seen with assembler, as the HLLs do not really know what to do with a 5-byte BCD array.

Ale · 2013-10-10 00:16

As someone who loves BCD, I'd say... there are other ways to do it. (And I wrote some BCD-aware code in the MATH page below... not really fast, but it works.)
You can use base-100, then you only have to divide by 10 and 100 (you could use a table...).
The BCD instructions in the Z80 were extensively used by TI in its calculators (TI-85 and descendants). The TI-92, I'd assume, uses base-100 or base-10000 like the utility bc.
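
The table idea can be sketched as follows. This is a minimal C example, assuming numbers are kept as base-100 digits (one byte per digit, values 0..99) and only turned into packed BCD for output; the table and function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* bcd_table[n] holds the packed BCD byte for n, for n = 0..99. */
static uint8_t bcd_table[100];

/* Fill the table once at start-up. */
void init_bcd_table(void)
{
    for (unsigned n = 0; n < 100u; n++)
        bcd_table[n] = (uint8_t)(((n / 10u) << 4) | (n % 10u));
}

/* Convert `count` base-100 digits to packed BCD bytes:
   one table lookup per digit, no divide at run time. */
void base100_to_bcd(const uint8_t *digits, uint8_t *bcd, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        assert(digits[i] < 100u);
        bcd[i] = bcd_table[digits[i]];
    }
}
```

The arithmetic itself then stays in ordinary binary on each base-100 digit, and the 100-byte table is the only cost of getting BCD out.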

Sapieha · 2013-10-10 02:17

Hi.

This is a schematic of the decimal adjust circuitry in the Intel 8085 microprocessor.

Heater. · 2013-10-10 03:00

Ale,

Did they open-source the TI calculator code? How do we know about their use of DAA?

Sapieha,

That ugliness (the schematic) should not be allowed in the Prop:)

Anyone writing a Prop II simulator in future will curse anyone who puts a DAA in there:)

In case anyone is wondering, DAA is not even a binary to BCD conversion, it's a hack in the processor that you have to use after you have used ADD or SUB on BCD encoded numbers to get a correct BCD encoded result. "Decimal Adjust for Addition". This must get even uglier for operating on 32 bit results to get 8 BCD digits.
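
To make the "adjust after ADD" point concrete, here is a minimal C sketch of a one-byte packed BCD addition followed by the decimal adjust. It assumes both inputs are valid BCD, handles addition only, and does not try to reproduce the exact flag behaviour of any particular 8080 or Z80 part; the function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: a + b + carry_in on packed BCD bytes.
   Returns the BCD result and stores the decimal carry in *carry_out. */
uint8_t bcd_add_byte(uint8_t a, uint8_t b, unsigned carry_in,
                     unsigned *carry_out)
{
    assert((a & 0x0Fu) <= 9u && (a >> 4) <= 9u);
    assert((b & 0x0Fu) <= 9u && (b >> 4) <= 9u);
    carry_in &= 1u;

    /* Step 1: the plain binary ADD/ADC, kept in 9 bits. */
    unsigned sum  = (unsigned)a + (unsigned)b + carry_in;
    /* Half carry: carry out of the low nibble (the H flag). */
    unsigned half = ((a & 0x0Fu) + (b & 0x0Fu) + carry_in) > 0x0Fu;

    /* Step 2: the decimal adjust. */
    unsigned adjusted = sum;
    if (half || (sum & 0x0Fu) > 9u)
        adjusted += 0x06u;           /* repair the low digit  */
    if (sum > 0x99u)                 /* carry set, or A > 0x99 */
        adjusted += 0x60u;           /* repair the high digit */

    *carry_out = sum > 0x99u;
    return (uint8_t)adjusted;
}
```

For example, 0x19 + 0x28 gives the binary sum 0x41; the half carry triggers the +6 correction and the result is 0x47, which is 19 + 28 = 47 in decimal.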

Sapieha · 2013-10-10 03:19

Hi Heater.

That schematic comes from:

http://www.righto.com/2013/08/reverse-engineering-8085s-decimal.html

Yanomani · 2013-10-10 07:34

Sapieha

First of all, like Ale, I love BCD maths.

I believe I have understood almost all of it, but one intriguing question has had me scratching my head since dawn, when I first read your suggestion that Chip create a DAA instruction.

As for your main concern: are you asking for a simple adjust scheme, applied after a binary ADD or SUB between two operands that are already 8-digit decimal values?

If so, which matters more to you: code density (the fewer instructions needed to reach the solution, the better), or the number of cycles (the faster it can be done, the better)?

Yanomani

jmg · 2013-10-10 12:16

I cannot see a cycle count mentioned for the P2 64/32 divider?

Someone with an FPGA P2 could code a Div.Rem version of 32bin -> 8.BCD, check the speed, and then see if there is any DAA variant that might improve on that in size and speed.

Size probably matters more than speed, and adding a NextDiv = NextDiv/1e2 step should shrink the code into a loop, at a small speed cost.
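
The looped form could look roughly like this in C. It is a sketch only: it assumes a value of at most 8 decimal digits, uses ordinary division where the P2 would use its divider, and the names next_div and bin_to_bcd8_loop are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: binary 0..99999999 to 8 packed BCD digits,
   two digits per pass, with the divisor shrinking by 1e2 each time. */
uint32_t bin_to_bcd8_loop(uint32_t value)
{
    assert(value <= 99999999u);

    uint32_t bcd = 0;
    /* next_div runs 1e6, 1e4, 1e2, 1 and then becomes 0, ending the loop. */
    for (uint32_t next_div = 1000000u; next_div != 0u; next_div /= 100u) {
        uint32_t pair = value / next_div;   /* current digit pair, 0..99 */
        value %= next_div;                  /* keep the remainder        */
        /* Shift earlier pairs up and merge the new pair as a BCD byte. */
        bcd = (bcd << 8) | ((pair / 10u) << 4) | (pair % 10u);
    }
    return bcd;
}
```

The loop body is the same for every pair, which is where the code size saving comes from; the price is the extra divide that updates next_div on each pass.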

Ariba · 2013-10-10 12:37

jmg wrote: »

I cannot see a cycle count mentioned for the P2 64/32 divider?
..

It's about 22 cycles, but 17 of them can be used to execute other instructions for optimized code.

Andy

jmg · 2013-10-10 13:21

Ariba wrote: »

It's about 22 cycles, but 17 of them can be used to execute other instructions for optimized code.

Thanks. In the most compact form I make it two Div.rem operations per moved digit, and the tail code of shift/merge/loop could easily fit within the 17 spare cycles, so an estimate is 7*(22+22) = 308 cycles to convert 98765432, and ~9*(22+22) = 396 cycles to convert 4231043210 (less for smaller numbers).
That is under 5 µs, even at 80 MHz (and 17x faster than the looping 8-bit DAA code).

Yanomani · 2013-10-10 13:35

Hi Sapieha,

I'm just wondering whether it could be done at the Hub-to-Cog circuitry level.

Perhaps with some exchange-like behavior: the same instruction could leave one value, or a pair of them, to be adjusted/converted/operated on, while it reads back the result of a previously requested operation.

Maybe like an alternate QUADS behavior, reusing its logic, with alternate sources/destinations at Hub level.

The operation could then be done while the Hub progresses through its cycles, with the result returned eight cycles after the point where the request was first served.
The aim is to avoid bloating the Cog circuitry, at the expense of a small time penalty.

Yanomani

Heater. · 2013-10-10 22:05

Yanomani,

Surely DAA, or any instruction almost, is there to speed things up. Making it dependent on HUB timing slows it down a lot. Thus cancelling out any benefit. There is no point in implementing it then.

Anyone,

Can you point us to a Z80 BCD arithmetic library? I mean one that does maths on 16 or 32 bit numbers in BCD, or calculator-style floats using BCD. I would have guessed that for any significant calculation it would have been smaller and quicker to convert the BCD input to binary, do the calculation, and then convert the result back to BCD.

jmg · 2013-10-10 22:38

Not Z80, but 8051, which has a DA A and is probably more widely used now than Z80,

Google finds this :

SDCC: DA A is used in printf_fast and printf_tiny. But it is never generated by SDCC.
XCHD is never used nor generated.

Yanomani · 2013-10-10 23:12

Heater. wrote: »

Yanomani,

Surely DAA, or any instruction almost, is there to speed things up. Making it dependent on HUB timing slows it down a lot. Thus cancelling out any benefit. There is no point in implementing it then.

Anyone,

Can you point us to a Z80 BCD arithmetic library? I mean one that does maths on 16 or 32 bit numbers in BCD, or calculator-style floats using BCD. I would have guessed that for any significant calculation it would have been smaller and quicker to convert the BCD input to binary, do the calculation, and then convert the result back to BCD.

Heater

This is exactly what I was thinking about when I wrote my last post.
In single-task execution, it's possible to gather some data, operate on it, and store the results in just a few clock cycles.
But in a multitasking environment, the total number of cycles between gathering the operand, having it adjusted, and storing the result increases considerably with the number of tasks running simultaneously in a given Cog.
Another point that matters here: for DAA to operate on up to eight digits simultaneously, it will certainly consume a lot of logic. If the final goal is to have BCD variants of multiplication and division, even more silicon real estate will be used.
If it can be a multicycle instruction (or instructions, in the case of BCD maths), then code density is prioritized at the expense of time.
Supposing that only DAA exists, allowing adjustment of the result of a binary addition or subtraction on up to eight BCD digits, then multiplication and division will certainly demand conversions on both sides: two before, to convert the operands, and one after, to convert the result.

EDIT: Or it will rely on looping BCD additions and DAA adjustments to do multiplication, for example, as we used to do on the Z80.

After all, it will take time whichever solution is chosen, unless Chip decides to craft a full BCD version of the current math operations.

Yanomani

P.S. Compucorp 400 series microprocessors had a single flag bit to control whether the full set of maths operated BCD-wise or binary-wise. It had 52 mantissa digits, two exponent digits and a nibble to hold the signs of the mantissa and the exponent. Two bits of the sign nibble were left unused. The main accumulator also had two guard digits, upper and lower, to allow for overflow and underflow control. I'm counting my age in eons.:nerd:

Edit: The text above about the number of mantissa digits was a mistake that I made when I initially posted it.
The exact expression should read:

13 mantissa digits (52 bits)

Sorry for this mistake, I'll try to be extra careful in the future.

Yanomani

Sapieha · 2013-10-10 23:45

Hi All.

Sorry I have not answered all the posts of interest. For some days now I have been in a big war with my sickness.
I don't even know if I will get to run a real P2; life will show. So I can only dream of the possibility of having P3 Verilog code.

I have read both good and bad arguments in your answers.

Thanks for the input.

For me, even a DAA on just one byte would still be perfect! It would simplify simulating 8-bit micros.
And don't tell me it is not needed: those micros have it, and need it to function correctly.

Heater:
I know it can be done in code, but how much valuable COG space will that use compared with a simple instruction?

Yanomani · 2013-10-10 23:51

Sapieha wrote: »

Hi All.

Sorry I have not answered all the posts of interest. For some days now I have been in a big war with my sickness.
I don't even know if I will get to run a real P2; life will show. So I can only dream of the possibility of having P3 Verilog code.

Sapieha

After suffering from pneumonia twice in the past ten months, with both lungs compromised both times, I understand your thoughts.

But if life is short, surely we can make it a happy one, if only we smile twice a day.

Yanomani

P.S. Smoke is for reverse-polarized tantalum capacitors, not lungs!

Yanomani · 2013-10-11 00:09

Sapieha

A simple question from me.

Wouldn't a table-driven conversion, like the one suggested by Ale in post #12, solve your two-digit BCD problem?
Sure, it will cost four instructions, if I counted right, and some Hub memory (and time). But it does not involve a lot of decisions and other operations. It only needs to distinguish between addition and subtraction to fetch the right result.

Yanomani

Heater. · 2013-10-11 00:17

Sapieha,

I do hope we all live long enough to see the P2, and even the P3 !

Both Zicog and PullMoll's Z80 emulators implement the DAA instruction logic in LMM code. So no valuable COG space is taken.

That is very slow, of course, but it does not matter, as DAA is never used except in the instruction set testing programs.

Ale · 2013-10-15 08:06

Heater,

You can also look it up in your own calc. I do not think that they posted any code themselves, though. Some years ago it was all the rage to get the ROM entry points... The ROMs are available through the emu sites. I have done extensive analysis and named/found many routines.

Here are some extracts from my own code analyzer:

                         ;****************************************************
                         ;*                  
                         ;*                  Sub_00038
                         ;*                  
                         ;****************************************************
                         Sub_00038:

1b5c  af                           xor  a
                         Loc_00129:
1b5d  ed 67                        rrd
1b5f  23                           inc  hl
1b60  ed 67                        rrd
1b62  23                           inc  hl
1b63  ed 67                        rrd
1b65  23                           inc  hl
1b66  ed 67                        rrd
1b68  23                           inc  hl
1b69  ed 67                        rrd
1b6b  23                           inc  hl
1b6c  ed 67                        rrd
1b6e  23                           inc  hl
1b6f  ed 67                        rrd
1b71  23                           inc  hl
1b72  ed 67                        rrd
1b74  23                           inc  hl
1b75  c9                           ret

                         Loc_00131:
1c47  1a                           ld   a,(de)
1c48  86                           add  a,(hl)
1c49  27                           daa
1c4a  12                           ld   (de),a
1c4b  1b                           dec  de
1c4c  2b                           dec  hl
1c4d  1a                           ld   a,(de)
1c4e  8e                           adc  a,(hl)
1c4f  27                           daa
1c50  12                           ld   (de),a
1c51  1b                           dec  de
1c52  2b                           dec  hl
1c53  1a                           ld   a,(de)
1c54  8e                           adc  a,(hl)
1c55  27                           daa
1c56  12                           ld   (de),a
1c57  1b                           dec  de
1c58  2b                           dec  hl
1c59  1a                           ld   a,(de)
1c5a  8e                           adc  a,(hl)
1c5b  27                           daa
1c5c  12                           ld   (de),a
1c5d  1b                           dec  de
1c5e  2b                           dec  hl
1c5f  1a                           ld   a,(de)
1c60  8e                           adc  a,(hl)
1c61  27                           daa
1c62  12                           ld   (de),a
1c63  1b                           dec  de
1c64  2b                           dec  hl
1c65  1a                           ld   a,(de)
1c66  8e                           adc  a,(hl)
1c67  27                           daa
1c68  12                           ld   (de),a
1c69  1b                           dec  de
1c6a  2b                           dec  hl
1c6b  1a                           ld   a,(de)
1c6c  8e                           adc  a,(hl)
1c6d  27                           daa
1c6e  12                           ld   (de),a
1c6f  1b                           dec  de
1c70  2b                           dec  hl
1c71  1a                           ld   a,(de)
1c72  8e                           adc  a,(hl)
1c73  27                           daa
1c74  12                           ld   (de),a
1c75  1b                           dec  de
1c76  2b                           dec  hl
1c77  c9                           ret

enjoy
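
For readers who do not speak Z80, the second routine (Loc_00131) adds two 8-byte packed BCD numbers in place: ADD/ADC on each byte pair, DAA to fix the result, then both pointers step down to the next more significant byte. A rough C equivalent is sketched below, assuming valid BCD input and with the most significant byte at the lowest address, as the decrementing DE/HL imply. It works digit by digit instead of emulating DAA, and the function name is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst += src for packed BCD numbers of `len` bytes,
   most significant byte first. Returns the final decimal carry. */
unsigned bcd_add_bytes(uint8_t *dst, const uint8_t *src, size_t len)
{
    assert(dst != NULL && src != NULL);

    unsigned carry = 0;
    /* Start at the last (least significant) byte and walk backwards. */
    for (size_t i = len; i-- > 0; ) {
        unsigned lo = (dst[i] & 0x0Fu) + (src[i] & 0x0Fu) + carry;
        unsigned hi = (unsigned)(dst[i] >> 4) + (unsigned)(src[i] >> 4);

        if (lo > 9u) {          /* low digit overflowed past 9 */
            lo -= 10u;
            hi += 1u;
        }
        carry = hi > 9u;        /* decimal carry into the next byte */
        if (carry)
            hi -= 10u;

        dst[i] = (uint8_t)((hi << 4) | lo);
    }
    return carry;
}
```

Calling it with len = 8 covers the same 16 digits as the unrolled listing; the Z80 version unrolls the loop, presumably for speed.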
