BCD = (DAA) Instruction -- Question to Chip
Sapieha
Posts: 2,964
Hi Chip.
In another thread I asked about the DAA instruction.
And now I ask a second time --
And post a program example from ozpropdev ----
SETRACE Report Generator
That conversion is needed in many programs, and instead of the big program as in the example it could be made into a simple instruction.
'#######################################################################
'Convert 32 bit value to 10 digit bcd value
'Uses Shift and add3 algorithm
'shift value into result, if a nibble => 5 add 3

bin2bcd         mov     num_bits,#32            'do 32 bit conversion
                mov     rs,#0
                mov     rs2,#0
                mov     s1,#0                   'init result
:again          shl     value,#1        wc
                rcl     rs,#1           wc
                rcl     rs2,#1                  'shift out value into result
                djz     num_bits,#bin2bcd_exit
:loop           getnib  nibble,rs,#0-0          'check nibbles 7 to 0
                cmp     nibble,#5       wz,wc
        if_ae   add     nibble,#3
:n2     if_ae   setnib  rs,nibble,#0-0          'adjust nibble
                incmod  s1,#7           wz
                setnib  :loop,s1,#6
                setnib  :n2,s1,#6
                nop
        if_nz   jmp     #:loop
                getnib  nibble,rs2,#0           'check nibble 8 (upper 2 digits of result)
                cmp     nibble,#5       wz,wc
        if_ae   add     nibble,#3
        if_ae   setnib  rs2,nibble,#0
                jmp     #:again
bin2bcd_exit    mov     value,rs                'return result in value
bin2bcd_ret     ret
'#######################################################################
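For readers following the algorithm, here is a minimal C model of the same shift-and-add-3 ("double dabble") idea; the function name bin2bcd32 and the packed-BCD return value are illustrative assumptions, not part of the PASM above:

#include <stdint.h>
#include <stdio.h>

/* Shift-and-add-3 ("double dabble"): before each shift, any BCD nibble
   holding 5..9 gets +3 so the doubling carries correctly into the next
   decimal digit.  Returns the 10-digit result as packed BCD in 64 bits. */
static uint64_t bin2bcd32(uint32_t value)
{
    uint64_t bcd = 0;
    for (int bit = 31; bit >= 0; bit--) {
        for (int nib = 0; nib < 10; nib++) {          /* adjust each BCD digit */
            if (((bcd >> (4 * nib)) & 0xF) >= 5)
                bcd += (uint64_t)3 << (4 * nib);
        }
        bcd = (bcd << 1) | ((value >> bit) & 1);      /* shift in next binary bit */
    }
    return bcd;
}

int main(void)
{
    /* packed BCD read as hex digits: prints 4294967295 */
    printf("%010llx\n", (unsigned long long)bin2bcd32(4294967295u));
    return 0;
}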
Comments
Nooo, the dreaded DAA is back, that zombie instruction from the 8080 that hangs on in the latest x86 and is never used anywhere.
More seriously, how are we supposed to do all that work in a single instruction, in the same time as all the other simple instructions, without messing up the pipeline or threading and without eating a pile of silicon? There is a lot of looping and many temporary variables in there.
On the 8080 DAA only worked on a single byte and probably only took a few gates to implement.
Do you have any Verilog or VHDL to demonstrate this ?
That is all that is needed in Verilog.
I have just realized something. What you are presenting is not DAA (Decimal Adjust for Addition).
DAA assumes you start with BCD digits, do addition or subtraction on those bits with normal ADD/SUB and then use DAA to adjust the result back to BCD. In this way you can do arithmetic whilst all the time keeping the numbers in BCD format.
What you have there is regular binary encoding to BCD conversion.
Or have I missed a point somewhere?
P.S. That looks like VHDL to me.
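As a reminder of what that adjust-after-add behaviour looks like, here is a rough C model; the function name bcd_add and the flag handling are simplifications for illustration, not an exact model of the 8080's DAA (which works on the accumulator and its carry/aux-carry flags):

#include <stdint.h>
#include <stdio.h>

/* Add two packed-BCD bytes with an ordinary binary ADD, then "decimal
   adjust" the result back to packed BCD. */
static uint8_t bcd_add(uint8_t a, uint8_t b, int *carry)
{
    unsigned sum  = a + b;                          /* plain binary addition   */
    int      half = ((a & 0x0F) + (b & 0x0F)) > 9;  /* low digit overflowed 9  */
    if (half || (sum & 0x0F) > 9)
        sum += 0x06;                                /* fix low digit           */
    if (sum > 0x99) {
        sum += 0x60;                                /* fix high digit          */
        *carry = 1;
    } else {
        *carry = 0;
    }
    return (uint8_t)sum;
}

int main(void)
{
    int c;
    printf("%02X carry=%d\n", bcd_add(0x38, 0x47, &c), c);  /* 38 + 47 -> 85, carry=0 */
    printf("%02X carry=%d\n", bcd_add(0x19, 0x28, &c), c);  /* 19 + 28 -> 47, carry=0 */
    return 0;
}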
Are you wanting an instruction that converts a long into an 8-digit BCD value?
Correct, it's a 32 bit binary to 10 digit BCD conversion.
That would be nice Chip.
The reference to DAA was probably a bit confusing.
I'm sure this is what Sapieha meant.
I've thought about this function before, but it requires a cascade of compares/subtracts for each digit. This is something that might as well be computed in software. The time requirement for computing this is much like a divide operation. There are no easy shortcuts, as series of tests must be made to get a result.
A repeat loop with div/rem must get very close; then you just need to collect the REMs.
How many lines of code is that ? A handful ?
In the first place, at least 16-bit HEX to the corresponding packed BCD (2 BCD digits in one byte), to simplify and speed up displaying that HEX value, for use with frequency counters and similar needs.
In the second place, full DAA - it can work on only the LSB byte of a long - which would simplify writing emulators for old types of 8-bit CPUs.
Edit: What if the following were applied to each nibble?
For example, an instruction called BCDADJ.
To convert an N-bit value into a BCD value, a loop could shift in each bit through C, N times.
This would simplify the instruction implementation and reduce conversion code size and time.
7 bits = (6 x (shift in +BCDADJ )) + shift in
16 bits = (15 x (shift in +BCDADJ)) + shift in
32 bits = (31 x (shift in +BCDADJ )) + shift in
Nice and easy.
Using the WC effect on BCDADJ to shift out MSB extends its range also. (64 bits or more)
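If it helps to see the recipe concretely, here is a small C sketch of it; bcdadj() below is only a software stand-in for the proposed instruction (which does not exist), and the 16-bit wrapper follows the (15 x (shift in + BCDADJ)) + shift in pattern above:

#include <stdint.h>
#include <stdio.h>

/* Software stand-in for the proposed BCDADJ: add 3 to every 4-bit field that
   currently holds 5..9, so the following left shift doubles the packed-BCD
   value correctly. */
static uint32_t bcdadj(uint32_t x)
{
    for (int nib = 0; nib < 8; nib++)
        if (((x >> (4 * nib)) & 0xF) >= 5)
            x += 3u << (4 * nib);
    return x;
}

/* 16-bit conversion per the recipe: 15 x (shift in + BCDADJ), then one last shift in. */
static uint32_t bin16_to_bcd(uint16_t value)
{
    uint32_t bcd = 0;
    for (int bit = 15; bit >= 1; bit--) {
        bcd = (bcd << 1) | ((value >> bit) & 1);   /* shift in next bit (MSB first) */
        bcd = bcdadj(bcd);                         /* adjust before the next shift  */
    }
    return (bcd << 1) | (value & 1);               /* final shift in, no adjust     */
}

int main(void)
{
    printf("%05X\n", (unsigned)bin16_to_bcd(65535));  /* prints 65535 */
    return 0;
}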
Ozpropdev
BCD conversion is just a series of Div/rem with some shifts.
The Nibbles are then peeled off and added to '0' on display.
The code I posted does it in one clock cycle.
Are you sure ?
Why do you need 1 clk - how many clocks, and how much code, do the existing opcodes take ?
This still has to propagate to a display, so 5ns vs 250ns is pretty moot ?
CLK is only for latching some registers.
IMHO it only needs a single routine to do it in SW, and it's not time-critical to display to the user, so I think there are way more valuable instructions with higher priority.
You mean Convert 32 bit value to 10 digit bcd value, in one clock cycle ?
Did you actually test it ?
On a 6502, a "cycle" was one clock, and the CPU did something on each cycle. Internally, that cycle was spread out across the logic inside, and not much could happen. So instructions took from 2 cycles to something like 8, depending. NOP is two cycles. LDA ($50, Y) might be 8 cycles. Cycles were read, process instruction, add, write, etc... For the NOP, it was read, process the no operation, read next instruction.
On an Intel chip, a cycle was also a clock, but it took several clocks to get anything done! A 4Mhz 8080 wasn't much faster than a 1-2Mhz 6502. The CPU didn't always do something every cycle. Intel was more like, get ready to read, read it, get ready to process it, process it, add, then add some more if it's a 16 bit operation, then add on some more, if it's indexed, then get ready to write, write it, etc....
Clock cycles don't mean anything out of context, IMHO.
When I see "one clock cycle" what I really see is "one instruction" in this context, and that means there needs to be enough time internally to do all that work. And I also see "can happen in the length of time one instruction would take", and I wonder about that given the iterations needed to resolve this.
BTW: 6502 had decimal mode. Instead of having the complicated conversion instructions, that chip would just operate in decimal for the ADD and SUB operations. (which is all that chip did) If you've got decimal math, turn the mode on, do it, turn the mode off, return to binary.
Both approaches have their merits. The ALU got bigger with decimal mode, and interrupts had to be managed, because that mode could hose things up if not preserved. Having the adjust instruction made for a CPU speed bottleneck, or a very long instruction time, etc...
In most cases, a routine got used anyway. Decimal mode was not used too often on 6502, probably due to the interrupt issue, or code porting where other devices didn't have decimal mode at all. Ended up being simpler to add this to the display library anyway.
I also see, "I can write one instruction" instead of "a subroutine", but I agree with those saying it's just one routine, and rendering to a display rarely needs one instruction speed. What display is even capable of this? Refresh rates limit one to a frame. I suppose LED displays can rip through the digits more quickly than that, but then one needs a brain upgrade to make any sense of them, right?
Seems to me that the original Intel microprocessor designs were envisaged for use in calculators. So wanting to handle BCD was a natural.
As it is, microprocessors turned out to have other uses and that DAA remained largely unused. Just wasting a few gates on every chip ever since.
Even for a micro used exclusively in small cash registers, having an instruction that converts a 32/64 bit integer to a BCD number is not worthwhile. The time saved by having a single instruction vs a subroutine works out to a small fraction of a percent when compared to the total execution time of the program.
Much better to convert the input bcd/ascii to a 32/64 bit integer in software, perform all the calculations, and convert the result back to bcd/ascii for display. I am sure of this because I wrote all the conversion routines to do this on a 16 bit mini many years ago. When the total software package execution time was profiled the time spent in all the math routines was less than one percent.
MUL and DIV are the problems with BCD. The others can be handled sort of easily if you have a ROL/ROR kind of instruction that can shift 4 bits at once between longs (look at the code: plenty of shift and masking... and self-modifying code).
I posted some z80 code in the other BCD thread... I haven't seen any other BCD use, except in calculators. The evolution seems to be base-10000 (or 100 for byte storage), where you have to divide by 10, 100 and so on. Just use tables.
If you want to add/sub/mul/div what the user entered with a keyboard... and assuming a fast keyboard input of 100 ms per key... you have lots of time to do it in software... Even with carry-look-ahead you have to make two or three passes to get the results right for multi-digit calculations, so you don't get the speed... but I'd really like to talk about implementations with you if you want, as I want BCD for my prototype processors...
Edit: corrected typo
6502 has no multiply. Add, shift, rotate only. That page has a BCD routine that works off a subtract-by-2^A * 10^B routine plus a table.
He wants a BCD2DEC and DEC2BCD conversion instruction. In the industrial field there are many "intelligent" sensors and displays that want the data BCD coded.
For this reason almost every PLC has these instructions built-in.
But ... these devices are usually serially connected, and with the execution speed we are going to have from the new P2, I also think that these conversions can be done in software.
I know nothing about PLCs. What processors are inside PLC's that have BCD2DEC and DEC2BCD conversion instructions?
Surely this is just a software function implemented in whatever language PLC's use.
The P2 has Divide opcodes, and BCD is just a repeated remainder operation for nibble-count loops.
eg
Bv=2^16;
Repeating this
RemD=Bv%10;Bv=floor(Bv/10);RemD
Gives this
ans = 6
ans = 3
ans = 5
ans = 5
ans = 6
ans = 0
In P2 I think one opcode is needed for Remainder and Divide, so BCD library can be small and fast
Should easily complete in under 1us, which seems plenty quick enough for my eyes ..
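For comparison, a minimal C sketch of this divide/remainder approach; on a real P2 the divide and remainder would come from the hardware divide opcodes, and the function name here is just for illustration:

#include <stdint.h>
#include <stdio.h>

/* Repeatedly divide by 10 and pack each remainder into the next BCD nibble. */
static uint64_t bin2bcd_div(uint32_t value)
{
    uint64_t bcd = 0;
    for (int nib = 0; nib < 10; nib++) {
        bcd |= (uint64_t)(value % 10) << (4 * nib);  /* next decimal digit */
        value /= 10;
    }
    return bcd;
}

int main(void)
{
    /* packed BCD read as hex digits: prints 0000065536 */
    printf("%010llx\n", (unsigned long long)bin2bcd_div(65536));
    return 0;
}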
This is all you need.
Ozpropdev
Edit: The code sample in post #1 was also a test of the nibble manipulation instructions.
It doesn't matter since you are never programming the CPU directly.
Every PLC has a small OS inside that takes a snapshot of all of your inputs into internal memory, then executes your program, and after that copies the output snapshot from memory to the real outputs. This guarantees that, within the same program scan, all references to the same input from anywhere in the program read the same state. In the same way, if during one scan you set an output, then reset it and set it again, the output does not actually pulse; only the last "memory" state is copied to the real world. This allows several outputs to be driven in sync even in complex and long programs where, between one output control instruction and the next, there can be many hundreds of other instructions.
With the Prop you can do the same if you change an internal register and then copy it to the outputs; otherwise, between the output1 drive at, say, line 1 and the output2 drive at line 50 (assuming a linear program) you'll have the outputs 50 clocks out of sync. The PLC OS takes care of this for you. It enforces program execution time (you must take care how you write loops, so as not to exceed the program watchdog). And many other things ...
While, for example, Schneider Electric (ex Telemecanique) is programmed more or less in BASIC (if/then/else, for/next, do/loop/until/while ...), the Siemens one is very close to assembler (they call it Step7; the older one was Step5): you have single-operand instructions, accumulators ... But every PLC handles BCD2DEC and DEC2BCD in one instruction. I also presume that for most of them it's done in software, because every one of them, in the end, seems to run a bytecode interpreter (more or less like our Spin does).
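A hypothetical sketch of the scan cycle described above; all names here (input_image, plc_scan, etc.) are invented for illustration, not taken from any PLC runtime:

#include <stdint.h>

static uint32_t input_image, output_image;   /* process-image registers */

static uint32_t read_physical_inputs(void)        { return 0; /* stand-in for real I/O */ }
static void     write_physical_outputs(uint32_t v) { (void)v;  /* stand-in for real I/O */ }

static void user_program(void)
{
    /* Example rung: output bit 0 follows input bit 3.  Reads and writes go
       through the images, never the real pins. */
    if (input_image & (1u << 3))
        output_image |= 1u;
    else
        output_image &= ~1u;
}

void plc_scan(void)
{
    input_image = read_physical_inputs();     /* 1: photograph all inputs  */
    user_program();                           /* 2: run the whole program  */
    write_physical_outputs(output_image);     /* 3: publish outputs at once */
}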
At first I agreed. Then I saw Ozpropdev's example. After that I searched for instructions in Cluso99's thread (sorry, I haven't studied the new P2 set) and I saw that he is already using the remainder instruction (getdivr), so it seems it already exists. I got a bit lost with getdivr/getdivq, not knowing exactly how these work.
Perhaps a new div32uwr could, with one instruction, place the result and the remainder in 2 consecutive longs, saving some gets .... but Ozpropdev's code doesn't seem so terribly hard in terms of program length or speed as to justify the waste of silicon for an instruction that is usually only needed in low-baudrate serial communications to specialised devices.
Neat! I used the ROLNIB (rotate nibble left) to save one instruction.
Thanks. That's what I thought: the conversions are ultimately done in software functions, not real hardware machine instructions. I had to ask because I have never seen a CPU with such conversion instructions.
ROLNIB, I forgot about that one! Nice!