BCD = (DAA) Instruction -- Question to Chip
Sapieha
Posts: 2,964
Hi Chip.
In another thread I asked about the DAA instruction.
And now I ask a second time --
And post a program example from ozpropdev ----
SETRACE Report Generator
That conversion is needed in many programs, and instead of the big program as in the example it could be made into a simple instruction.
'#######################################################################
'Convert 32 bit value to 10 digit bcd value
'Uses Shift and add3 algorithm
'shift value into result, if a nibble => 5 add 3

bin2bcd         mov     num_bits,#32            'do 32 bit conversion
                mov     rs,#0
                mov     rs2,#0
                mov     s1,#0                   'init result
:again          shl     value,#1        wc
                rcl     rs,#1           wc
                rcl     rs2,#1                  'shift out value into result
                djz     num_bits,#bin2bcd_exit
:loop           getnib  nibble,rs,#0-0          'check nibbles 7 to 0
                cmp     nibble,#5       wz,wc
        if_ae   add     nibble,#3
:n2     if_ae   setnib  rs,nibble,#0-0          'adjust nibble
                incmod  s1,#7           wz
                setnib  :loop,s1,#6
                setnib  :n2,s1,#6
                nop
        if_nz   jmp     #:loop
                getnib  nibble,rs2,#0           'check nibble 8 (upper 2 digits of result)
                cmp     nibble,#5       wz,wc
        if_ae   add     nibble,#3
        if_ae   setnib  rs2,nibble,#0
                jmp     #:again
bin2bcd_exit    mov     value,rs                'return result in value
bin2bcd_ret     ret
'#######################################################################
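For readers following the algorithm, here is a minimal C model of the same shift-and-add-3 ("double dabble") idea; the function name bin2bcd32 and the packed-BCD return value are illustrative assumptions, not part of the PASM above:

#include <stdint.h>
#include <stdio.h>

/* Shift-and-add-3 ("double dabble"): before each shift, any BCD nibble
   holding 5..9 gets +3 so the doubling carries correctly into the next
   decimal digit.  Returns the 10-digit result as packed BCD in 64 bits. */
static uint64_t bin2bcd32(uint32_t value)
{
    uint64_t bcd = 0;
    for (int bit = 31; bit >= 0; bit--) {
        for (int nib = 0; nib < 10; nib++) {          /* adjust each BCD digit */
            if (((bcd >> (4 * nib)) & 0xF) >= 5)
                bcd += (uint64_t)3 << (4 * nib);
        }
        bcd = (bcd << 1) | ((value >> bit) & 1);      /* shift in next binary bit */
    }
    return bcd;
}

int main(void)
{
    /* packed BCD read as hex digits: prints 4294967295 */
    printf("%010llx\n", (unsigned long long)bin2bcd32(4294967295u));
    return 0;
}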
Comments
Nooo, the dreaded DAA is back, that zombie instruction from the 8080 that hangs on in the latest x86 and is never used anywhere.
More seriously, how are we supposed to do all that work in a single instruction, in the same time as all the other simple instructions, without messing up the pipeline or threading and without eating a pile of silicon? There is a lot of looping and many temporary variables in there.
On the 8080 DAA only worked on a single byte and probably only took a few gates to implement.
Do you have any Verilog or VHDL to demonstrate this ?
That is all that is needed in Verilog.
I have just realized something. What you are presenting is not DAA (Decimal Adjust for Addition).
DAA assumes you start with BCD digits, do addition or subtraction on those bits with normal ADD/SUB and then use DAA to adjust the result back to BCD. In this way you can do arithmetic whilst all the time keeping the numbers in BCD format.
What you have there is regular binary encoding to BCD conversion.
Or have I missed a point somewhere?
P.S. That looks like VHDL to me.
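As a reminder of what that adjust-after-add behaviour looks like, here is a rough C model; the function name bcd_add and the flag handling are simplifications for illustration, not an exact model of the 8080's DAA (which works on the accumulator and its carry/aux-carry flags):

#include <stdint.h>
#include <stdio.h>

/* Add two packed-BCD bytes with an ordinary binary ADD, then "decimal
   adjust" the result back to packed BCD. */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry)
{
    unsigned sum  = a + b;                          /* plain binary addition   */
    int      half = ((a & 0x0F) + (b & 0x0F)) > 9;  /* low digit overflowed 9  */
    if (half || (sum & 0x0F) > 9)
        sum += 0x06;                                /* fix low digit           */
    if (sum > 0x99) {
        sum += 0x60;                                /* fix high digit          */
        *carry = 1;
    } else {
        *carry = 0;
    }
    return (uint8_t)sum;
}

int main(void)
{
    int c;
    printf("%02X carry=%d\n", bcd_add(0x38, 0x47, &c), c);  /* 38 + 47 -> 85, carry=0 */
    printf("%02X carry=%d\n", bcd_add(0x19, 0x28, &c), c);  /* 19 + 28 -> 47, carry=0 */
    return 0;
}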
Are you wanting an instruction that converts a long into an 8-digit BCD value?
Correct, it's a 32 bit binary to 10 digit BCD conversion.
That would be nice Chip.
The reference to DAA was probably a bit confusing.
I'm sure this is what Sapieha meant.
I've thought about this function before, but it requires a cascade of compares/subtracts for each digit. This is something that might as well be computed in software. The time requirement for computing this is much like a divide operation. There are no easy shortcuts, as series of tests must be made to get a result.
A repeat loop with div/rem must get very close; then you just need to collect the REMs.
How many lines of code is that ? A handful ?
In the first place, at least 16-bit HEX to the corresponding packed BCD (2 BCD digits in one byte), to simplify and speed up displaying that HEX value, for use with frequency counters and similar needs.
In the second place, full DAA - it can work on only the LSB byte of a long - which would simplify writing emulators for old types of 8-bit CPUs.
Edit: What if the following were applied to each nibble?
For example, an instruction called BCDADJ.
To convert an N-bit value into a BCD value, a loop could shift in each bit through C, N times.
This would simplify the instruction implementation and reduce conversion code size and time.
7 bits = (6 x (shift in +BCDADJ )) + shift in
16 bits = (15 x (shift in +BCDADJ)) + shift in
32 bits = (31 x (shift in +BCDADJ )) + shift in
Nice and easy.
Using the WC effect on BCDADJ to shift out MSB extends its range also. (64 bits or more)
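If it helps to see the recipe concretely, here is a small C sketch of it; bcdadj() below is only a software stand-in for the proposed instruction (which does not exist), and the 16-bit wrapper follows the (15 x (shift in + BCDADJ)) + shift in pattern above:

#include <stdint.h>
#include <stdio.h>

/* Software stand-in for the proposed BCDADJ: add 3 to every 4-bit field that
   currently holds 5..9, so the following left shift doubles the packed-BCD
   value correctly. */
static uint32_t bcdadj(uint32_t x)
{
    for (int nib = 0; nib < 8; nib++)
        if (((x >> (4 * nib)) & 0xF) >= 5)
            x += 3u << (4 * nib);
    return x;
}

/* 16-bit conversion per the recipe: 15 x (shift in + BCDADJ), then one last shift in. */
static uint32_t bin16_to_bcd(uint16_t value)
{
    uint32_t bcd = 0;
    for (int bit = 15; bit >= 1; bit--) {
        bcd = (bcd << 1) | ((value >> bit) & 1);   /* shift in next bit (MSB first) */
        bcd = bcdadj(bcd);                         /* adjust before the next shift  */
    }
    return (bcd << 1) | (value & 1);               /* final shift in, no adjust     */
}

int main(void)
{
    printf("%05X\n", (unsigned)bin16_to_bcd(65535));  /* prints 65535 */
    return 0;
}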
Ozpropdev
BCD conversion is just a series of Div/rem with some shifts.
The Nibbles are then peeled off and added to '0' on display.
The code I posted does it in one clock cycle.
Are you sure ?
Why do you need 1 clk - how many clocks, and how much code, do the existing opcodes take ?
This still has to propagate to a display, so 5ns vs 250ns is pretty moot ?
CLK is only for latching some registers.
IMHO it only needs a single routine to do it in SW, and it's not time-critical to display to the user, so I think there are way more valuable instructions with higher priority.
You mean Convert 32 bit value to 10 digit bcd value, in one clock cycle ?
Did you actually test it ?
On a 6502, a "cycle" was one clock, and the CPU did something on each cycle. Internally, that cycle was spread out across the logic inside, and not much could happen. So instructions took from 2 cycles to something like 8, depending. NOP is two cycles. LDA ($50, Y) might be 8 cycles. Cycles were read, process instruction, add, write, etc... For the NOP, it was read, process the no operation, read next instruction.
On an Intel chip, a cycle was also a clock, but it took several clocks to get anything done! A 4Mhz 8080 wasn't much faster than a 1-2Mhz 6502. The CPU didn't always do something every cycle. Intel was more like, get ready to read, read it, get ready to process it, process it, add, then add some more if it's a 16 bit operation, then add on some more, if it's indexed, then get ready to write, write it, etc....
Clock cycles don't mean anything out of context, IMHO.
When I see "one clock cycle" what I really see is "one instruction" in this context, and that means there needs to be enough time internally to do all that work. And I also see "can happen in the length of time one instruction would take", and I wonder about that given the iterations needed to resolve this.
BTW: 6502 had decimal mode. Instead of having the complicated conversion instructions, that chip would just operate in decimal for the ADD and SUB operations. (which is all that chip did) If you've got decimal math, turn the mode on, do it, turn the mode off, return to binary.
Both approaches have their merits. The ALU got bigger with decimal mode, and interrupts had to be managed, because that mode could hose things up if not preserved. Having the adjust instruction made for a CPU speed bottleneck, or a very long instruction time, etc...
In most cases, a routine got used anyway. Decimal mode was not used too often on 6502, probably due to the interrupt issue, or code porting where other devices didn't have decimal mode at all. Ended up being simpler to add this to the display library anyway.
I also see, "I can write one instruction" instead of "a subroutine", but I agree with those saying it's just one routine, and rendering to a display rarely needs one instruction speed. What display is even capable of this? Refresh rates limit one to a frame. I suppose LED displays can rip through the digits more quickly than that, but then one needs a brain upgrade to make any sense of them, right?
Seems to me that the original Intel microprocessor designs were envisaged for use in calculators. So wanting to handle BCD was a natural.
As it is, microprocessors turned out to have other uses and that DAA remained largely unused. Just wasting a few gates on every chip ever since.
Even for a micro used exclusively in small cash registers, having an instruction that converts a 32/64 bit integer to a BCD number is not worthwhile. The time saved by having a single instruction vs a subroutine works out to a small fraction of a percent when compared to the total execution time of the program.
Much better to convert the input bcd/ascii to a 32/64 bit integer in software, perform all the calculations, and convert the result back to bcd/ascii for display. I am sure of this because I wrote all the conversion routines to do this on a 16 bit mini many years ago. When the total software package execution time was profiled the time spent in all the math routines was less than one percent.
MUL and DIV are the problems with BCD. The others can be handled sort of easily if you have a ROL/ROR kind of instruction that can shift 4 bits at once between longs (look at the code: plenty of shift and masking... and self-modifying code).
I posted some z80 code in the other BCD thread... I haven't seen any other BCD use, except in calculators. The evolution seems to be base-10000 (or 100 for byte storage), where you have to divide by 10, 100 and so on. Just use tables.
If you want to add/sub/mul/div what the user entered with a keyboard... and assuming a fast keyboard input of 100 ms per key... you have lots of time to do it in software... Even with carry-look-ahead you have to make two or three passes to get the results right for multi-digit calculations, so you don't get the speed... but I'd really like to talk about implementations with you if you want, as I want BCD for my prototype processors...
Edit: corrected typo
6502 has no multiply. Add, shift, rotate only. That page has a BCD routine that works off a subtract-by-2^A * 10^B routine plus a table.
He wants a BCD2DEC and DEC2BCD conversion instruction. In the industrial field there are many "intelligent" sensors and displays that want the data BCD coded.
For this reason almost every PLC has these instructions built-in.
But ... these devices are usually serially connected, and with the execution speed we are going to have from the new P2, I also think that these conversions can be done in software.
I know nothing about PLCs. What processors are inside PLC's that have BCD2DEC and DEC2BCD conversion instructions?
Surely this is just a software function implemented in whatever language PLC's use.
The P2 has Divide opcodes, and BCD is just a repeated remainder operation for nibble-count loops.
eg
Bv=2^16;
Repeating this
RemD=Bv%10;Bv=floor(Bv/10);RemD
Gives this
ans = 6
ans = 3
ans = 5
ans = 5
ans = 6
ans = 0
In P2 I think one opcode is needed for Remainder and Divide, so BCD library can be small and fast
Should easily complete in under 1us, which seems plenty quick enough for my eyes ..
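For comparison, a minimal C sketch of this divide/remainder approach; on a real P2 the divide and remainder would come from the hardware divide opcodes, and the function name here is just for illustration:

#include <stdint.h>
#include <stdio.h>

/* Repeatedly divide by 10 and pack each remainder into the next BCD nibble. */
static uint64_t bin2bcd_div(uint32_t value)
{
    uint64_t bcd = 0;
    for (int nib = 0; nib < 10; nib++) {
        bcd |= (uint64_t)(value % 10) << (4 * nib);  /* next decimal digit */
        value /= 10;
    }
    return bcd;
}

int main(void)
{
    /* packed BCD read as hex digits: prints 0000065536 */
    printf("%010llx\n", (unsigned long long)bin2bcd_div(65536));
    return 0;
}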
This is all you need.
Ozpropdev
Edit: The code sample in post #1 was also a test of the nibble manipulation instructions.
It doesn't matter since you are never programming the CPU directly.
Every PLC has a small OS inside that takes a snapshot of all of your inputs into internal memory, then executes your program, and after that copies the output snapshot from memory to the real outputs. This guarantees that, within the same program scan, all references to the same input from anywhere in the program read the same state. In the same way, if during one scan you set an output, then reset it and set it again, the output does not actually pulse; only the last "memory" state is copied to the real world. This allows several outputs to be driven in sync even in complex and long programs where, between one output control instruction and the next, there can be many hundreds of other instructions.
With the Prop you can do the same if you change an internal register and then copy it to the outputs; otherwise, between the output1 drive at, say, line 1 and the output2 drive at line 50 (assuming a linear program) you'll have the outputs 50 clocks out of sync. The PLC OS takes care of this for you. It enforces program execution time (you must take care how you write loops, so as not to exceed the program watchdog). And many other things ...
While, for example, Schneider Electric (ex Telemecanique) is programmed more or less in BASIC (if/then/else, for/next, do/loop/until/while ...), the Siemens one is very close to assembler (they call it Step7; the older one was Step5): you have single-operand instructions, accumulators ... But every PLC handles BCD2DEC and DEC2BCD in one instruction. I also presume that for most of them it's done in software, because every one of them, in the end, seems to run a bytecode interpreter (more or less like our Spin does).
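A hypothetical sketch of the scan cycle described above; all names here (input_image, plc_scan, etc.) are invented for illustration, not taken from any PLC runtime:

#include <stdint.h>

static uint32_t input_image, output_image;   /* process-image registers */

static uint32_t read_physical_inputs(void)        { return 0; /* stand-in for real I/O */ }
static void     write_physical_outputs(uint32_t v) { (void)v;  /* stand-in for real I/O */ }

static void user_program(void)
{
    /* Example rung: output bit 0 follows input bit 3.  Reads and writes go
       through the images, never the real pins. */
    if (input_image & (1u << 3))
        output_image |= 1u;
    else
        output_image &= ~1u;
}

void plc_scan(void)
{
    input_image = read_physical_inputs();     /* 1: photograph all inputs  */
    user_program();                           /* 2: run the whole program  */
    write_physical_outputs(output_image);     /* 3: publish outputs at once */
}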
At first I agreed. Then I saw Ozpropdev's example. After that I searched for instructions in Cluso99's thread (sorry, I haven't studied the new P2 set) and I saw that he is already using the remainder instruction (getdivr), so it seems it already exists. I got a bit lost with getdivr/getdivq, not knowing exactly how these work.
Perhaps a new div32uwr could, with one instruction, place the result and the remainder in 2 consecutive longs, saving some gets .... but Ozpropdev's code doesn't seem so terribly hard in terms of program length or speed as to justify the waste of silicon for an instruction that is usually only needed in low-baudrate serial communications to specialised devices.
Neat! I used the ROLNIB (rotate nibble left) to save one instruction.
Thanks. That's what I thought: the conversions are ultimately done in software functions, not real hardware machine instructions. I had to ask because I have never seen a CPU with such conversion instructions.
ROLNIB, I forgot about that one! Nice!