BCD = (DAA) Instruction -- Question to Chip - Page 3 — Parallax Forums


Comments

  • cgracey Posts: 14,151
    edited 2013-12-10 02:45
    ozpropdev wrote: »
    That's correct, no carry from nibble to nibble.
    8 digits would be great in 1 instruction.
    I'm not aware of a reverse trick. :)

    Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.
  • ozpropdev Posts: 2,792
    edited 2013-12-10 02:50
    cgracey wrote: »
    Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.

    That is the best solution I think. Flexibility is the key. :)
  • cgracey Posts: 14,151
    edited 2013-12-10 02:56
    ozpropdev wrote: »
    That is the best solution I think. Flexibility is the key. :)

    Can't it be reversed to go the other direction, or does something get lost?

    If you subtracted 3 from any nibble that was >= 8, then right-shifted, wouldn't that do the trick? I'm amazed at how people come up with these algorithms.
  • ozpropdev Posts: 2,792
    edited 2013-12-10 03:12
    I'm scribbling on the whiteboard now....
  • Ale Posts: 2,363
    edited 2013-12-10 03:18
    I for one would add add-and-saturate/subtract-and-saturate, word- and byte-wise (SIMD instructions), in case they are not there, before I even considered BCD. Division is already there, which reduces the work needed to convert binary to BCD, so unless the conversion is a single instruction I can hardly see its real usefulness. The problem with BCD is multiplying and dividing in BCD, not converting to and from binary... a 2-digit BCD times BCD multiply seems more useful... just my opinion.
  • cgracey Posts: 14,151
    edited 2013-12-10 03:28
    Ale wrote: »
    I for one would add add-and-saturate/subtract-and-saturate, word- and byte-wise (SIMD instructions), in case they are not there, before I even considered BCD. Division is already there, which reduces the work needed to convert binary to BCD, so unless the conversion is a single instruction I can hardly see its real usefulness. The problem with BCD is multiplying and dividing in BCD, not converting to and from binary... a 2-digit BCD times BCD multiply seems more useful... just my opinion.

    The pixel mixer does four-byte add and saturate.

    Yes, the BCDADJ instruction is not necessary, but it's interesting to know about.
  • ozpropdev Posts: 2,792
    edited 2013-12-10 05:01
    cgracey wrote: »
    Can't it be reversed to go the other direction, or does something get lost?

    If you subtracted 3 from any nibble that was >= 8, then right-shifted, wouldn't that do the trick? I'm amazed at how people come up with these algorithms.

    FYI Chip,
    To reverse, shift right then any nibble >4 sub 3.
    Seems to work Ok. :)
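
    For anyone who wants to check the reverse trick off-chip, here is a minimal Python sketch (not P2 code; the function name and the `threshold` parameter are made up for illustration). It shifts the packed BCD value right one bit at a time and then subtracts 3 from any nibble at or above the threshold. After the right shift a valid nibble can only be 0..4 or 8..12, which is why the ">4" test and the ">= 8" test should agree.

```python
def bcd_to_bin(bcd, digits=8, threshold=8):
    """Reverse shift-and-add-3: packed BCD -> binary integer.

    threshold=8 models the ">= 8" rule, threshold=5 models the "> 4" rule.
    Assumes every nibble of `bcd` is a valid decimal digit (0..9).
    """
    result = 0
    for bit in range(4 * digits):
        # The bit that falls off the bottom is the next binary bit, LSB first.
        result |= (bcd & 1) << bit
        bcd >>= 1
        # A nibble that received a 1 from the digit above is now too big by 3.
        for n in range(digits):
            if ((bcd >> (4 * n)) & 0xF) >= threshold:
                bcd -= 3 << (4 * n)
    return result


if __name__ == "__main__":
    print(bcd_to_bin(0x12345678))               # ">= 8" rule
    print(bcd_to_bin(0x12345678, threshold=5))  # "> 4" rule
```

    Both calls should print the same decimal value as the digits packed into the input.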
  • jmg Posts: 15,173
    edited 2013-12-10 09:57
    ozpropdev wrote: »
    The 10 digit version would only be 9 longs and 159 cycles.

    Can you post a 10-digit example, optimized for size? (staying with the opcodes we have :) )
  • jmg Posts: 15,173
    edited 2013-12-10 10:06
    cgracey wrote: »
    Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.

    An easy way to manage 64 (or more!) bits would be great, and 10 digits is important to fit all of 32 bits.

    Since this seems to be constrained within nibbles, except for the 1-bit carry, that should not be too hard.
  • Mike Green Posts: 23,101
    edited 2013-12-10 11:33
    A small cash register does not need the speed of a one clock cycle DAA instruction. All of the calculations can be done internally using scaled integer arithmetic with the conversion to BCD done in software for display.
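
    A minimal Python sketch of that approach, assuming prices are held as integer cents (the names `to_packed_bcd`, `PRICE_CENTS` and `QUANTITY` are hypothetical). All arithmetic stays in binary; the value is packed into BCD by repeated division by 10 only when it has to be shown.

```python
def to_packed_bcd(value):
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bcd = 0
    shift = 0
    while True:
        value, digit = divmod(value, 10)  # peel off the lowest decimal digit
        bcd |= digit << shift
        shift += 4
        if value == 0:
            return bcd


# Hypothetical till example: amounts are integer cents (scaled by 100).
PRICE_CENTS = 1999
QUANTITY = 3
total_cents = PRICE_CENTS * QUANTITY

total_bcd = to_packed_bcd(total_cents)
# Printing packed BCD with an ordinary hex formatter yields the decimal digits.
display = format(total_bcd, "x")

if __name__ == "__main__":
    print(display)
```

    The last step is the same idea that comes up later in the thread: once a value is in packed BCD, an existing hex print routine can display it as decimal.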
  • ozpropdev Posts: 2,792
    edited 2013-12-10 14:59
    jmg wrote: »
    Can you post a 10-digit example, optimized for size? (staying with the opcodes we have :) )

    With what we've got in opcodes your V3 seems to be the smallest SW solution. :)
  • Circuitsoft Posts: 1,166
    edited 2013-12-10 15:33
    Mike Green wrote: »
    A small cash register does not need the speed of a one clock cycle DAA instruction. All of the calculations can be done internally using scaled integer arithmetic with the conversion to BCD done in software for display.
    While that was agreed upon, it was brought up earlier that many industrial sensors output data in either ASCII decimal or BCD, and so this conversion would be helpful in that case.
    dMajo wrote: »
    I can understand Sapieha.

    He wants a BCD2DEC and DEC2BCD conversion instruction. In the industrial field there are many "intelligent" sensors and displays that want the data BCD coded.
    For this reason almost every PLC has these instructions built in.

    But ... these devices are usually serially connected, and with the execution speed we are going to have from the new P2 I also think that these conversions can be done in software.
  • ozpropdev Posts: 2,792
    edited 2013-12-10 16:56
    IIRC some NC machines use BCD output shaft encoders.

    As mentioned earlier the "DAA" reference was probably misleading.
    I don't believe anyone was wanting to do BCD math, it was more about conversion
    of values from hex to decimal for display purposes in the most efficient/compact way.
    Many SW solutions evolved from this discussion.

    In P1 Spin the DEC function in some of the serial comms objects is a typical example.
    In some debugging code where there is already code to display hex values a conversion
    to BCD can use the same HEX function to display the result.

    We now have a 64 bit system counter and multiply hardware with 64 bit results.
    Converting these values introduces more challenges again.

    COG RAM is limited, so it is worth looking at different methodologies to make the most of it.
    If we get HUBEXEC then sure it becomes irrelevant.

    Anyhow I know I've learnt a few new tricks from this discussion so that's a bonus too. :)
  • jmg Posts: 15,173
    edited 2013-12-10 17:46
    ozpropdev wrote: »
    IIRC some NC machines use BCD output shaft encoders.

    More common are Gray code absolute encoders, which are used because only one bit changes per location change.

    There is a similar problem to BCD, for Gray to Binary conversion

    A series of XORs can convert, like this from google.

    Converting the Gray code to binary is:

    Retain the most significant bit as it is, and for the rest of the bits keep XORing each Gray bit with the binary bit just computed above it.
    i.e. Gn Gn-1 Gn-2 ........ G1 is the Gray code and Bn Bn-1 ....... B1 is the binary code.
    Bn = Gn, and for all the other bits Bi = Gi XOR Bi+1 (so Bn-1 = Gn-1 XOR Bn).
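
    The rule above can be sketched in Python as follows (an off-chip illustration, not P2 code; the function names are made up). XORing the Gray value with every right-shifted copy of itself gives each binary bit as the XOR of all Gray bits at or above it, which is the same thing as chaining Bi = Gi XOR Bi+1 down from the top bit.

```python
def gray_to_binary(gray):
    """Convert a Gray-coded non-negative integer to plain binary."""
    binary = gray
    shift = 1
    while gray >> shift:
        # Fold in the Gray bits that sit `shift` places above each position.
        binary ^= gray >> shift
        shift += 1
    return binary


def binary_to_gray(value):
    """Forward direction, used here only to build test inputs."""
    return value ^ (value >> 1)


if __name__ == "__main__":
    for n in range(8):
        g = binary_to_gray(n)
        print(n, format(g, "03b"), gray_to_binary(g))
```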


    Smallest code to do that on P2 (16..32 bits) ?

    There is also an interesting variant, called Single Track Gray Code
    http://en.wikipedia.org/wiki/Gray_code
    Those do not quite cover all codes, but can allow some better mechanical solutions.
  • ozpropdev Posts: 2,792
    edited 2013-12-10 18:04
    P2 already has BINGRY and GRYBIN instructions to do gray code conversion. :)
  • ozpropdev Posts: 2,792
    edited 2013-12-10 18:18
    jmg wrote: »
    There is also an interesting variant, called Single Track Gray Code
    http://en.wikipedia.org/wiki/Gray_code

    STGC makes the encoder wheel fairly simple to construct but as more resolution is required
    the sensor/detector arrangement becomes quite complex.
    An interesting shift register arrangement is also needed to calculate position.

    I looked at this a few years ago, I still find it interesting :)
  • ozpropdev Posts: 2,792
    edited 2013-12-10 22:43
    FYI
    Here's a 64 bit binary to 20 digit BCD optimized for size.
    'Convert 64 bit value to 20 digit bcd value
    
    'Uses Shift and add3 algorithm
    'shift value into result, if a nibble > 4 add 3
    
    ' 14635 cycles
    
    bin2bcd20		setquaz	#answer0		'zero all answer regs in 1 inst.
    			mov	num_bits,#63		'64 bits - 1
    			mov	s2,#0
    
    next_bit		call	#shift			'shift value into answer
    			reps	#24,#9			'check 24 nibbles in answer
    			fixinda	#answer0,#answer2
    :loop			getnib	nibble,inda,#0-0
    			cmp	nibble,#4 wz,wc
    		if_a	add	nibble,#3		'adjust nibble
    :n2			setnib	inda++,nibble,#0-0
    			incmod	s2,#2 wz		'adjust nibble prescaler
    		if_z	incmod	s1,#7			'adjust nibble index
    			setnib	:loop,s1,#6
    			setnib	:n2,s1,#6
    			nop
    
    			djnz	num_bits,#next_bit
    			call	#shift
    
    bin2bcd20_ret		ret
    
    shift			shl	value0,#1 wc		'shift value into answer
    			rcl	value1,#1 wc
    shift_ret		retd
    			rcl	answer0,#1 wc
    			rcl	answer1,#1 wc
    			rcl	answer2,#1
    
    answer0			long	0
    answer1			long	0
    answer2			long	0
    s1			long	0
    s2			long	0
    num_bits			long	0		'remaining bit count
    nibble			long	0		'scratch nibble being adjusted
    
    value0			long	-1
    value1			long	-1
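
    For checking results off-chip, the same shift and add-3 idea can be sketched in a few lines of Python (a reference model only, not a translation of the PASM above; the function name and defaults are assumptions). Before each shift, any nibble greater than 4 gets 3 added, then the next binary bit is shifted in, MSB first.

```python
def bin_to_bcd(value, bits=64, digits=20):
    """Shift and add-3: binary integer -> packed BCD integer."""
    bcd = 0
    for i in range(bits - 1, -1, -1):
        # Adjust: a nibble of 5..9 becomes 8..12, so doubling it carries correctly.
        for n in range(digits):
            if ((bcd >> (4 * n)) & 0xF) > 4:
                bcd += 3 << (4 * n)
        # Shift the whole BCD value left and bring in the next binary bit.
        bcd = (bcd << 1) | ((value >> i) & 1)
    return bcd


if __name__ == "__main__":
    # Same test value as the listing above: value0 = value1 = -1.
    print(format(bin_to_bcd(0xFFFFFFFFFFFFFFFF), "x"))
```

    Printing the result in hex shows the decimal digits directly, so it can be compared against what the cog code leaves in answer0..answer2.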
    
    
    
    
  • cgracey Posts: 14,151
    edited 2013-12-10 23:06
    ozpropdev wrote: »
    FYI
    Here's a 64 bit binary to 20 digit BCD optimized for size.

    Neat use of SETQUAZ. Those special instructions sure would make that shorter.
  • Cluso99 Posts: 18,069
    edited 2013-12-10 23:58
    I am sorry, but I just don't see the requirement for this to be fast, so it may as well be done in sw and keep any spare opcodes for something that really counts (for now or later).
  • cgracey Posts: 14,151
    edited 2013-12-11 00:14
    Cluso99 wrote: »
    I am sorry, but I just don't see the requirement for this to be fast, so it may as well be done in sw and keep any spare opcodes for something that really counts (for now or later).

    It's not speed, so much as it is denser code and ease of use.

    I'm rearranging the opcodes again to accommodate the hub execution branches. I've expanded and simplified things, somewhat, so there are lots of opcode slots available for all kinds of things. This sort of function hardly takes any gates. It's mainly the muxing into the result path, which is far from critical-path for something like BCDADJ. It really wouldn't be a big deal.
  • cgracey Posts: 14,151
    edited 2013-12-11 15:32
    I added two more single-clock unary instructions to perform the binary<->BCD shift function we've been talking about. This amounts to a small number of gates and the output feeds into the 'result' bus, which includes a bit for carry.

    Here's the binary to BCD shift instruction:
    // binbcd
    
    wire [3:0] binbcdi [7:0];
    wire [3:0] binbcdo [7:0];
    
    assign binbcdi[7]	= d[31:28];
    assign binbcdi[6]	= d[27:24];
    assign binbcdi[5]	= d[23:20];
    assign binbcdi[4]	= d[19:16];
    assign binbcdi[3]	= d[15:12];
    assign binbcdi[2]	= d[11:8];
    assign binbcdi[1]	= d[7:4];
    assign binbcdi[0]	= d[3:0];
    
    assign binbcdo[7]	= binbcdi[7] >= 4'h5 ? binbcdi[7] + 4'h3 : binbcdi[7];
    assign binbcdo[6]	= binbcdi[6] >= 4'h5 ? binbcdi[6] + 4'h3 : binbcdi[6];
    assign binbcdo[5]	= binbcdi[5] >= 4'h5 ? binbcdi[5] + 4'h3 : binbcdi[5];
    assign binbcdo[4]	= binbcdi[4] >= 4'h5 ? binbcdi[4] + 4'h3 : binbcdi[4];
    assign binbcdo[3]	= binbcdi[3] >= 4'h5 ? binbcdi[3] + 4'h3 : binbcdi[3];
    assign binbcdo[2]	= binbcdi[2] >= 4'h5 ? binbcdi[2] + 4'h3 : binbcdi[2];
    assign binbcdo[1]	= binbcdi[1] >= 4'h5 ? binbcdi[1] + 4'h3 : binbcdi[1];
    assign binbcdo[0]	= binbcdi[0] >= 4'h5 ? binbcdi[0] + 4'h3 : binbcdi[0];
    
    wire [31:0] binbcdx	= {	binbcdo[7],
    				binbcdo[6],
    				binbcdo[5],
    				binbcdo[4],
    				binbcdo[3],
    				binbcdo[2],
    				binbcdo[1],
    				binbcdo[0] };
    
    wire binbcd_c		= binbcdx[31];
    
    wire [31:0] binbcd_q	= {binbcdx[30:0], c};
    


    Here's the BCD to binary shift instruction:
    // bcdbin
    
    wire [3:0] bcdbini [7:0];
    wire [3:0] bcdbino [7:0];
    
    assign bcdbini[7]	= {c, d[31:29]};
    assign bcdbini[6]	= d[28:25];
    assign bcdbini[5]	= d[24:21];
    assign bcdbini[4]	= d[20:17];
    assign bcdbini[3]	= d[16:13];
    assign bcdbini[2]	= d[12:9];
    assign bcdbini[1]	= d[8:5];
    assign bcdbini[0]	= d[4:1];
    
    assign bcdbino[7]	= bcdbini[7] >= 4'h8 ? bcdbini[7] - 4'h3 : bcdbini[7];
    assign bcdbino[6]	= bcdbini[6] >= 4'h8 ? bcdbini[6] - 4'h3 : bcdbini[6];
    assign bcdbino[5]	= bcdbini[5] >= 4'h8 ? bcdbini[5] - 4'h3 : bcdbini[5];
    assign bcdbino[4]	= bcdbini[4] >= 4'h8 ? bcdbini[4] - 4'h3 : bcdbini[4];
    assign bcdbino[3]	= bcdbini[3] >= 4'h8 ? bcdbini[3] - 4'h3 : bcdbini[3];
    assign bcdbino[2]	= bcdbini[2] >= 4'h8 ? bcdbini[2] - 4'h3 : bcdbini[2];
    assign bcdbino[1]	= bcdbini[1] >= 4'h8 ? bcdbini[1] - 4'h3 : bcdbini[1];
    assign bcdbino[0]	= bcdbini[0] >= 4'h8 ? bcdbini[0] - 4'h3 : bcdbini[0];
    
    wire bcdbin_c		= d[0];
    
    wire [31:0] bcdbin_q	= {	bcdbino[7],
    				bcdbino[6],
    				bcdbino[5],
    				bcdbino[4],
    				bcdbino[3],
    				bcdbino[2],
    				bcdbino[1],
    				bcdbino[0] };
    


    Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.
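
    To see how the two steps chain through the carry, here is a minimal Python model of the Verilog above (a sketch only; the function names are made up, and the chaining order is an assumption about how the instructions would be used, not something confirmed in this thread). Each step takes a 32-bit D and a carry in, and returns the new D and the carry out.

```python
MASK32 = 0xFFFFFFFF


def binbcd_step(d, c):
    """Add 3 to every nibble >= 5, then shift left one bit through carry."""
    x = 0
    for n in range(8):
        nib = (d >> (4 * n)) & 0xF
        if nib >= 5:
            nib = (nib + 3) & 0xF  # 4-bit add, as in the Verilog
        x |= nib << (4 * n)
    return ((x << 1) & MASK32) | (c & 1), (x >> 31) & 1


def bcdbin_step(d, c):
    """Shift right one bit through carry, then subtract 3 from nibbles >= 8."""
    shifted = ((c & 1) << 31) | (d >> 1)
    q = 0
    for n in range(8):
        nib = (shifted >> (4 * n)) & 0xF
        if nib >= 8:
            nib -= 3
        q |= nib << (4 * n)
    return q, d & 1


def bin32_to_bcd10(value):
    """Chain two registers: 32 binary bits -> 10 BCD digits."""
    lo = hi = 0
    for i in range(31, -1, -1):
        lo, c = binbcd_step(lo, (value >> i) & 1)  # carry in = next binary bit
        hi, c = binbcd_step(hi, c)                 # carry ripples into upper digits
    return (hi << 32) | lo


def bcd10_to_bin32(bcd):
    """Reverse chain, for BCD values that fit in 32 binary bits."""
    lo, hi, result = bcd & MASK32, bcd >> 32, 0
    for i in range(32):
        hi, c = bcdbin_step(hi, 0)
        lo, c = bcdbin_step(lo, c)
        result |= c << i  # bits come out LSB first
    return result


if __name__ == "__main__":
    print(format(bin32_to_bcd10(0xFFFFFFFF), "x"))
    print(hex(bcd10_to_bin32(0x4294967295)))
```

    Going from binary to BCD the carry runs from the low register up to the high one, and going back it runs from the high register down, so more registers can be added to either chain for longer values.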
  • ozpropdev Posts: 2,792
    edited 2013-12-11 16:34
    Neat! :)

    Thanks for posting the Verilog code too.
    Nice to know how it all works.
  • jmg Posts: 15,173
    edited 2013-12-11 17:01
    Cool. That will be useful for compact ASCII string to number, and number to ASCII string, conversion.

    I think this works with the Shift&Test algorithm, so there is no real limit on how many digits are supported?
  • cgracey Posts: 14,151
    edited 2013-12-11 17:14
    jmg wrote: »
    Cool. That will be useful for compact ASCII string to number, and number to ASCII string, conversion.

    I think this works with the Shift&Test algorithm, so there is no real limit on how many digits are supported?

    You can do more digits than any human can make sense of, or than can represent any practical measurement.
  • David Betz Posts: 14,516
    edited 2013-12-11 17:16
    cgracey wrote: »
    I added two more single-clock unary instructions to perform the binary<->BCD shift function we've been talking about. This amounts to a small number of gates and the output feeds into the 'result' bus, which includes a bit for carry.

    Here's the binary to BCD shift instruction:
    // binbcd
    
    wire [3:0] binbcdi [7:0];
    wire [3:0] binbcdo [7:0];
    
    assign binbcdi[7]	= d[30:27];
    assign binbcdi[6]	= d[26:23];
    assign binbcdi[5]	= d[22:19];
    assign binbcdi[4]	= d[18:15];
    assign binbcdi[3]	= d[14:11];
    assign binbcdi[2]	= d[10:7];
    assign binbcdi[1]	= d[6:3];
    assign binbcdi[0]	= {d[2:0], c};
    
    assign binbcdo[7]	= binbcdi[7] >= 4'h5 ? binbcdi[7] + 4'h3 : binbcdi[7];
    assign binbcdo[6]	= binbcdi[6] >= 4'h5 ? binbcdi[6] + 4'h3 : binbcdi[6];
    assign binbcdo[5]	= binbcdi[5] >= 4'h5 ? binbcdi[5] + 4'h3 : binbcdi[5];
    assign binbcdo[4]	= binbcdi[4] >= 4'h5 ? binbcdi[4] + 4'h3 : binbcdi[4];
    assign binbcdo[3]	= binbcdi[3] >= 4'h5 ? binbcdi[3] + 4'h3 : binbcdi[3];
    assign binbcdo[2]	= binbcdi[2] >= 4'h5 ? binbcdi[2] + 4'h3 : binbcdi[2];
    assign binbcdo[1]	= binbcdi[1] >= 4'h5 ? binbcdi[1] + 4'h3 : binbcdi[1];
    assign binbcdo[0]	= binbcdi[0] >= 4'h5 ? binbcdi[0] + 4'h3 : binbcdi[0];
    
    wire binbcd_c		= d[31];
    
    wire [31:0] binbcd_q	= {	binbcdo[7],
    				binbcdo[6],
    				binbcdo[5],
    				binbcdo[4],
    				binbcdo[3],
    				binbcdo[2],
    				binbcdo[1],
    				binbcdo[0] };
    


    Here's the BCD to binary shift instruction:
    // bcdbin
    
    wire [3:0] bcdbini [7:0];
    wire [3:0] bcdbino [7:0];
    
    assign bcdbini[7]	= d[31:28];
    assign bcdbini[6]	= d[27:24];
    assign bcdbini[5]	= d[23:20];
    assign bcdbini[4]	= d[19:16];
    assign bcdbini[3]	= d[15:12];
    assign bcdbini[2]	= d[11:8];
    assign bcdbini[1]	= d[7:4];
    assign bcdbini[0]	= d[3:0];
    
    assign bcdbino[7]	= bcdbini[7] >= 4'h8 ? bcdbini[7] - 4'h3 : bcdbini[7];
    assign bcdbino[6]	= bcdbini[6] >= 4'h8 ? bcdbini[6] - 4'h3 : bcdbini[6];
    assign bcdbino[5]	= bcdbini[5] >= 4'h8 ? bcdbini[5] - 4'h3 : bcdbini[5];
    assign bcdbino[4]	= bcdbini[4] >= 4'h8 ? bcdbini[4] - 4'h3 : bcdbini[4];
    assign bcdbino[3]	= bcdbini[3] >= 4'h8 ? bcdbini[3] - 4'h3 : bcdbini[3];
    assign bcdbino[2]	= bcdbini[2] >= 4'h8 ? bcdbini[2] - 4'h3 : bcdbini[2];
    assign bcdbino[1]	= bcdbini[1] >= 4'h8 ? bcdbini[1] - 4'h3 : bcdbini[1];
    assign bcdbino[0]	= bcdbini[0] >= 4'h8 ? bcdbini[0] - 4'h3 : bcdbini[0];
    
    wire [31:0] bcdbinx	= {	bcdbino[7],
    				bcdbino[6],
    				bcdbino[5],
    				bcdbino[4],
    				bcdbino[3],
    				bcdbino[2],
    				bcdbino[1],
    				bcdbino[0] };
    
    wire bcdbin_c		= bcdbinx[0];
    
    wire [31:0] bcdbin_q	= {c, bcdbinx[31:1]};
    


    Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.
    Thanks for posting the Verilog code! I can't wait to see the Verilog for the whole COG if you end up opening that up for P3.
  • jmg Posts: 15,173
    edited 2013-12-11 17:19
    David Betz wrote: »
    Thanks for posting the Verilog code! I can't wait to see the Verilog for the whole COG if you end up opening that up for P3.

    Or even just the code for the Counters ... :)
  • Cluso99 Posts: 18,069
    edited 2013-12-11 18:53
    While I don't think the above was necessary, and neither is the following, might it be a worthwhile consideration?

    Conversion from an ASCII byte to/from hex 0..9,A..F is often done. It is a simple conversion, now aided by cmpr.

    There are two useful conversions...
    ASCII $30..$39,$41..$46,$61..$66 (0..9,A..F,a..f) -> $00..$0F
    $00..$0F -> ASCII $30..$39,$41..$46 (0..9,A..F)
    It could even be done as a hex byte to 2 ASCII bytes (word).
    The ASCII to hex could optionally set C if it was not within conversion ranges.
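
    As a behavioural sketch of the two conversions (Python, off-chip; the function names are hypothetical and the returned flag only stands in for the C flag suggested above):

```python
def ascii_to_hex_nibble(ch):
    """Map an ASCII code to 0..15. Returns (value, carry).

    carry is True when ch is outside '0'..'9', 'A'..'F', 'a'..'f',
    mirroring the optional "set C if not within range" idea.
    """
    if 0x30 <= ch <= 0x39:
        return ch - 0x30, False
    if 0x41 <= ch <= 0x46:
        return ch - 0x41 + 10, False
    if 0x61 <= ch <= 0x66:
        return ch - 0x61 + 10, False
    return 0, True


def hex_nibble_to_ascii(value):
    """Map 0..15 to the ASCII code for '0'..'9', 'A'..'F'."""
    value &= 0xF
    return value + 0x30 if value < 10 else value - 10 + 0x41


if __name__ == "__main__":
    for text in ("7", "c", "F", "g"):
        print(text, ascii_to_hex_nibble(ord(text)))
    print(chr(hex_nibble_to_ascii(0xB)))
```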
  • Sapieha Posts: 2,964
    edited 2013-12-11 19:01
    Hi Cluso.

    In my opinion, anything that saves COG space is worth thinking about if Chip has spare instruction space!

    Cluso99 wrote: »
    While I don't think the above was necessary, and neither is the following, might it be a worthwhile consideration?

    Conversion from an ASCII byte to/from hex 0..9,A..F is often done. It is a simple conversion, now aided by cmpr.

    There are two useful conversions...
    ASCII $30..$39,$41..$46,$61..$66 (0..9,A..F,a..f) -> $00..$0F
    $00..$0F -> ASCII $30..$39,$41..$46 (0..9,A..F)
    It could even be done as a hex byte to 2 ASCII bytes (word).
    The ASCII to hex could optionally set C if it was not within conversion ranges.