BCD = (DAA) Instruction -- Question to Chip

cgracey · 2013-12-10 02:45

ozpropdev wrote: »

That's correct, no carry from nibble to nibble.
8 digits would be great in 1 instruction.
I'm not aware of a reverse trick.

Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.

ozpropdev · 2013-12-10 02:50

cgracey wrote: »

Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.

That is the best solution I think. Flexibility is the key.

cgracey · 2013-12-10 02:56

ozpropdev wrote: »

That is the best solution I think. Flexibility is the key.

Can't it be reversed to go the other direction, or does something get lost?

If you subtracted 3 from any nibble that was >= 8 then right-shifted, wouldn't that do the trick? I'm amazed at how people come up with these algorithms.

ozpropdev · 2013-12-10 03:12

I'm scribbling on the whiteboard now....

Ale · 2013-12-10 03:18

I for one would add add-and-saturate/subtract-and-saturate word- and byte-wise (SIMD instructions) (in case they are not there) before I even consider BCD. Division is already there, reducing the amount of work to transform binary to BCD, unless it is in one instruction... I can hardly see its real usefulness. The problem with BCD is multiplying and dividing BCD, not converting to and from binary... a 2-digit BCD times a BCD seems more useful... just my opinion.

cgracey · 2013-12-10 03:28

Ale wrote: »

I for one would add add-and-saturate/subtract-and-saturate word- and byte-wise (SIMD instructions) (in case they are not there) before I even consider BCD. Division is already there, reducing the amount of work to transform binary to BCD, unless it is in one instruction... I can hardly see its real usefulness. The problem with BCD is multiplying and dividing BCD, not converting to and from binary... a 2-digit BCD times a BCD seems more useful... just my opinion.

The pixel mixer does four-byte add and saturate.

Yes, the BCDADJ instruction is not necessary, but it's interesting to know about.

ozpropdev · 2013-12-10 05:01

cgracey wrote: »

Can't it be reversed to go the other direction, or does something get lost?

If you subtracted 3 from any nibble that was >= 8 then right-shifted, wouldn't that do the trick? I'm amazed at how people come up with these algorithms.

FYI Chip,
To reverse, shift right then any nibble >4 sub 3.
Seems to work Ok.
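
The reverse step described here (shift right, then subtract 3 from any adjusted nibble) can be modelled in a few lines of Python. This is only a sketch: the function name `bcd_to_bin` and the 8-digit default are assumptions for illustration, not anything from the P2. For valid BCD input a shifted nibble is either 0..4 (no carry in) or 8..12 (carry in), so the tests "> 4" and ">= 8" pick out the same nibbles.

```python
def bcd_to_bin(bcd, digits=8):
    """Reverse shift-and-add-3: packed BCD in, binary out (software model only)."""
    result = 0
    for bit in range(4 * digits):
        # The bit that falls off the bottom is the next binary bit, LSB first.
        result |= (bcd & 1) << bit
        bcd >>= 1
        # Any nibble that received a 1 from the digit above now reads 8..12;
        # subtracting 3 turns that into "5 + half of the old digit".
        for n in range(digits):
            if (bcd >> (4 * n)) & 0xF >= 8:
                bcd -= 3 << (4 * n)
    return result


print(bcd_to_bin(0x25, digits=2))  # prints 25
```

Each pass halves the decimal number held in the BCD register, which is why the bits come out least significant first.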

jmg · 2013-12-10 09:57

ozpropdev wrote: »

The 10 digit version would only be 9 longs and 159 cycles.

Can you post a 10-digit example, optimized for size (staying with the opcodes we have)?

jmg · 2013-12-10 10:06

cgracey wrote: »

Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.

An easy way to manage 64 (or more!) bits would be great, and 10 digits is important to fit all of 32 bits.

Since this seems to be constrained within nibbles, except for the 1-bit carry, that should not be too hard.

Mike Green · 2013-12-10 11:33

A small cash register does not need the speed of a one clock cycle DAA instruction. All of the calculations can be done internally using scaled integer arithmetic with the conversion to BCD done in software for display.
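
As a rough illustration of the scaled-integer approach, here is a minimal Python sketch that keeps all amounts as whole cents and only converts to decimal digits for display. The names `format_cents` and `prices_cents` are made up for the example, and it assumes non-negative amounts.

```python
def format_cents(total_cents):
    """Format a non-negative amount held as integer cents, e.g. 2254 -> '22.54'."""
    dollars, cents = divmod(total_cents, 100)
    return f"{dollars}.{cents:02d}"


# All arithmetic stays in plain binary integers (cents), so no BCD math is needed.
prices_cents = [1999, 250, 5]
total = sum(prices_cents)
print(format_cents(total))  # prints 22.54
```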

ozpropdev · 2013-12-10 14:59

jmg wrote: »

Can you post a 10-digit example, optimized for size (staying with the opcodes we have)?

With what we've got in opcodes your V3 seems to be the smallest SW solution.

Circuitsoft · 2013-12-10 15:33

Mike Green wrote: »

A small cash register does not need the speed of a one clock cycle DAA instruction. All of the calculations can be done internally using scaled integer arithmetic with the conversion to BCD done in software for display.

While that was agreed upon, it was brought up earlier that many industrial sensors output data in either ASCII decimal or BCD, and so this conversion would be helpful in that case.

dMajo wrote: »

I can understand Sapieha.

He wants a BCD2DEC and DEC2BCD conversion instruction. In the industrial field there are many "intelligent" sensors and displays that want the data BCD coded.
For this reason almost every PLC has these instructions built-in.

But ... these devices are usually serially connected and with the execution speed we are going to have from the new P2 I also think that these conversions can be done in software.

ozpropdev · 2013-12-10 16:56

IIRC some NC machines use BCD output shaft encoders.

As mentioned earlier the "DAA" reference was probably misleading.
I don't believe anyone was wanting to do BCD math, it was more about conversion
of values from hex to decimal for display purposes in the most efficient/compact way.
Many SW solutions evolved from this discussion.

In P1 Spin the DEC function in some of the serial comms objects is a typical example.
In some debugging code where there is already code to display hex values a conversion
to BCD can use the same HEX function to display the result.
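
A small Python sketch of that idea: once a value is packed as BCD, an ordinary hex print routine shows the decimal digits. The helper `to_packed_bcd` is hypothetical and uses repeated division by 10 purely to keep the example short; it is not the shift-and-add-3 method discussed in this thread.

```python
def to_packed_bcd(value):
    """Pack a non-negative integer into BCD, one decimal digit per nibble."""
    bcd = 0
    shift = 0
    while True:
        value, digit = divmod(value, 10)
        bcd |= digit << shift
        shift += 4
        if value == 0:
            return bcd


# Reusing the "hex" formatter on the BCD value yields the decimal text.
print(format(to_packed_bcd(1234), "X"))  # prints 1234
```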

We now have a 64 bit system counter and multiply hardware with 64 bit results.
Converting these values introduces more challenges again.

COG RAM is limited, so it is worth looking at different methodologies to make the most of it.
If we get HUBEXEC then sure it becomes irrelevant.

Anyhow I know I've learnt a few new tricks from this discussion so that's a bonus too.

jmg · 2013-12-10 17:46

ozpropdev wrote: »

IIRC some NC machines use BCD output shaft encoders.

More common are Gray code absolute encoders, which are used because only one bit changes per position change.

There is a similar problem to BCD for Gray-to-binary conversion.

A series of XORs can convert, like this from Google.

Converting the Gray code to binary is:

Retain the most significant bit as it is, and for the rest of the bits keep XORing the successive bits.
i.e. Gn Gn-1 Gn-2 ........ G1 is the Gray code and Bn Bn-1 .......B1 is the binary code.
Bn = Gn, and for each lower bit Bi = Gi XOR Bi+1 (so Bn-1 = Gn-1 XOR Gn, Bn-2 = Gn-2 XOR Bn-1, and so on).
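
The XOR rule above can be sketched in Python as follows. The function names are chosen for the example only; this is a software model of the conversion, not P2 code.

```python
def binary_to_gray(value):
    """Standard reflected Gray code: each bit XORed with the bit above it."""
    return value ^ (value >> 1)


def gray_to_binary(gray):
    """Undo the encoding: binary bit i is the XOR of Gray bits i and above."""
    binary = 0
    while gray:
        binary ^= gray
        gray >>= 1
    return binary


print(gray_to_binary(0b110))  # prints 4
```

XORing in every right-shifted copy of the Gray value gives each binary bit the running XOR of all Gray bits at or above it, which is the same as applying Bi = Gi XOR Bi+1 from the top down.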

Smallest code to do that on P2 (16..32 bits) ?

There is also an interesting variant, called Single Track Gray Code
http://en.wikipedia.org/wiki/Gray_code
Those do not quite cover all codes, but can allow some better mechanical solutions.

ozpropdev · 2013-12-10 18:04

P2 already has BINGRY and GRYBIN instructions to do gray code conversion.

ozpropdev · 2013-12-10 18:18

jmg wrote: »

There is also an interesting variant, called Single Track Gray Code
http://en.wikipedia.org/wiki/Gray_code

STGC makes the encoder wheel fairly simple to construct but as more resolution is required
the sensor/detector arrangement becomes quite complex.
An interesting shift register arrangement is also needed to calculate position.

I looked at this a few years ago, and I still find it interesting.

ozpropdev · 2013-12-10 22:43

FYI
Here's a 64 bit binary to 20 digit BCD optimized for size.

'Convert 64 bit value to 20 digit bcd value

'Uses Shift and add3 algorithm
'shift value into result, if a nibble > 4 add 3

' 14635 cycles

bin2bcd20		setquaz	#answer0		'zero all answer regs in 1 inst.
			mov	num_bits,#63		'64 bits - 1
			mov	s2,#0

next_bit		call	#shift			'shift value into answer
			reps	#24,#9			'check 24 nibbles in answer
			fixinda	#answer0,#answer2
:loop			getnib	nibble,inda,#0-0
			cmp	nibble,#4 wz,wc
		if_a	add	nibble,#3		'adjust nibble
:n2			setnib	inda++,nibble,#0-0
			incmod	s2,#2 wz		'adjust nibble prescaler
		if_z	incmod	s1,#7			'adjust nibble index
			setnib	:loop,s1,#6
			setnib	:n2,s1,#6
			nop

			djnz	num_bits,#next_bit
			call	#shift

bin2bcd20_ret		ret

shift			shl	value0,#1 wc		'shift value into answer
			rcl	value1,#1 wc
shift_ret		retd
			rcl	answer0,#1 wc
			rcl	answer1,#1 wc
			rcl	answer2,#1

answer0			long	0
answer1			long	0
answer2			long	0
s1			long	0
s2			long	0

value0			long	-1
value1			long	-1
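
For readers who do not have the P2 instruction set to hand, the flow of the routine above can be modelled in Python. This is a sketch under assumptions: `bin_to_bcd20` is a made-up name, the three answer longs are modelled as one Python integer, and cycle counts are not modelled at all. It follows the same order as the listing: 63 passes of shift-then-adjust, then one final shift with no adjustment.

```python
def bin_to_bcd20(value):
    """Shift-and-add-3 model: 64-bit binary in, packed BCD (up to 20 digits) out."""
    answer = 0
    for i in range(63, 0, -1):
        # Shift the next binary bit (MSB first) into the bottom of the answer.
        answer = (answer << 1) | ((value >> i) & 1)
        # Scan 24 nibbles (three longs), adding 3 to any nibble greater than 4.
        for n in range(24):
            if (answer >> (4 * n)) & 0xF > 4:
                answer += 3 << (4 * n)
    # Final shift brings in bit 0; no adjustment follows it.
    return (answer << 1) | (value & 1)


print(format(bin_to_bcd20(2**64 - 1), "X"))  # prints 18446744073709551615
```

Adding 3 to a nibble holding 5..9 gives 8..12, so the following shift pushes a 1 into the next digit and leaves the right remainder behind; that is the decimal carry.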

cgracey · 2013-12-10 23:06

ozpropdev wrote: »

FYI
Here's a 64 bit binary to 20 digit BCD optimized for size.

'Convert 64 bit value to 20 digit bcd value

'Uses Shift and add3 algorithm
'shift value into result, if a nibble > 4 add 3

' 14635 cycles

bin2bcd20		setquaz	#answer0		'zero all answer regs in 1 inst.
			mov	num_bits,#63		'64 bits - 1
			mov	s2,#0

next_bit		call	#shift			'shift value into answer
			reps	#24,#9			'check 24 nibbles in answer
			fixinda	#answer0,#answer2
:loop			getnib	nibble,inda,#0-0
			cmp	nibble,#4 wz,wc
		if_a	add	nibble,#3		'adjust nibble
:n2			setnib	inda++,nibble,#0-0
			incmod	s2,#2 wz		'adjust nibble prescaler
		if_z	incmod	s1,#7			'adjust nibble index
			setnib	:loop,s1,#6
			setnib	:n2,s1,#6
			nop

			djnz	num_bits,#next_bit
			call	#shift

bin2bcd20_ret		ret

shift			shl	value0,#1 wc		'shift value into answer
			rcl	value1,#1 wc
shift_ret		retd
			rcl	answer0,#1 wc
			rcl	answer1,#1 wc
			rcl	answer2,#1

answer0			long	0
answer1			long	0
answer2			long	0
s1			long	0
s2			long	0

value0			long	-1
value1			long	-1

Neat use of SETQUAZ. Those special instructions sure would make that shorter.

Cluso99 · 2013-12-10 23:58

I am sorry, but I just don't see the requirement for this to be fast, so it may as well be done in sw and keep any spare opcodes for something that really counts (for now or later).

cgracey · 2013-12-11 00:14

Cluso99 wrote: »

I am sorry, but I just don't see the requirement for this to be fast, so it may as well be done in sw and keep any spare opcodes for something that really counts (for now or later).

It's not speed, so much as it is denser code and ease of use.

I'm rearranging the opcodes again to accommodate the hub execution branches. I've expanded and simplified things, somewhat, so there are lots of opcode slots available for all kinds of things. This sort of function hardly takes any gates. It's mainly the muxing into the result path, which is far from critical-path for something like BCDADJ. It really wouldn't be a big deal.

cgracey · 2013-12-11 15:32

I added two more single-clock unary instructions to perform the binary<->BCD shift function we've been talking about. This amounts to a small number of gates and the output feeds into the 'result' bus, which includes a bit for carry.

Here's the binary to BCD shift instruction:

// binbcd

wire [3:0] binbcdi [7:0];
wire [3:0] binbcdo [7:0];

assign binbcdi[7]	= d[31:28];
assign binbcdi[6]	= d[27:24];
assign binbcdi[5]	= d[23:20];
assign binbcdi[4]	= d[19:16];
assign binbcdi[3]	= d[15:12];
assign binbcdi[2]	= d[11:8];
assign binbcdi[1]	= d[7:4];
assign binbcdi[0]	= d[3:0];

assign binbcdo[7]	= binbcdi[7] >= 4'h5 ? binbcdi[7] + 4'h3 : binbcdi[7];
assign binbcdo[6]	= binbcdi[6] >= 4'h5 ? binbcdi[6] + 4'h3 : binbcdi[6];
assign binbcdo[5]	= binbcdi[5] >= 4'h5 ? binbcdi[5] + 4'h3 : binbcdi[5];
assign binbcdo[4]	= binbcdi[4] >= 4'h5 ? binbcdi[4] + 4'h3 : binbcdi[4];
assign binbcdo[3]	= binbcdi[3] >= 4'h5 ? binbcdi[3] + 4'h3 : binbcdi[3];
assign binbcdo[2]	= binbcdi[2] >= 4'h5 ? binbcdi[2] + 4'h3 : binbcdi[2];
assign binbcdo[1]	= binbcdi[1] >= 4'h5 ? binbcdi[1] + 4'h3 : binbcdi[1];
assign binbcdo[0]	= binbcdi[0] >= 4'h5 ? binbcdi[0] + 4'h3 : binbcdi[0];

wire [31:0] binbcdx	= {	binbcdo[7],
				binbcdo[6],
				binbcdo[5],
				binbcdo[4],
				binbcdo[3],
				binbcdo[2],
				binbcdo[1],
				binbcdo[0] };

wire binbcd_c		= binbcdx[31];

wire [31:0] binbcd_q	= {binbcdx[30:0], c};

Here's the BCD to binary shift instruction:

// bcdbin

wire [3:0] bcdbini [7:0];
wire [3:0] bcdbino [7:0];

assign bcdbini[7]	= {c, d[31:29]};
assign bcdbini[6]	= d[28:25];
assign bcdbini[5]	= d[24:21];
assign bcdbini[4]	= d[20:17];
assign bcdbini[3]	= d[16:13];
assign bcdbini[2]	= d[12:9];
assign bcdbini[1]	= d[8:5];
assign bcdbini[0]	= d[4:1];

assign bcdbino[7]	= bcdbini[7] >= 4'h8 ? bcdbini[7] - 4'h3 : bcdbini[7];
assign bcdbino[6]	= bcdbini[6] >= 4'h8 ? bcdbini[6] - 4'h3 : bcdbini[6];
assign bcdbino[5]	= bcdbini[5] >= 4'h8 ? bcdbini[5] - 4'h3 : bcdbini[5];
assign bcdbino[4]	= bcdbini[4] >= 4'h8 ? bcdbini[4] - 4'h3 : bcdbini[4];
assign bcdbino[3]	= bcdbini[3] >= 4'h8 ? bcdbini[3] - 4'h3 : bcdbini[3];
assign bcdbino[2]	= bcdbini[2] >= 4'h8 ? bcdbini[2] - 4'h3 : bcdbini[2];
assign bcdbino[1]	= bcdbini[1] >= 4'h8 ? bcdbini[1] - 4'h3 : bcdbini[1];
assign bcdbino[0]	= bcdbini[0] >= 4'h8 ? bcdbini[0] - 4'h3 : bcdbini[0];

wire bcdbin_c		= d[0];

wire [31:0] bcdbin_q	= {	bcdbino[7],
				bcdbino[6],
				bcdbino[5],
				bcdbino[4],
				bcdbino[3],
				bcdbino[2],
				bcdbino[1],
				bcdbino[0] };

Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.
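
To see how the two snippets chain through the carry bit, here is a Python model of the Verilog as posted above. It is a sketch only: the functions `binbcd` and `bcdbin` mirror the signal names, while `to_bcd10` and `from_bcd10` are hypothetical drivers that chain two longs to cover all ten digits of a 32-bit value. How the real instructions are written in assembly is not shown in this post, so nothing here should be read as P2 syntax.

```python
MASK = 0xFFFFFFFF


def binbcd(d, c):
    """Adjust each nibble (>= 5 adds 3), then shift left; returns (q, carry_out)."""
    x = 0
    for n in range(8):
        nib = (d >> (4 * n)) & 0xF
        if nib >= 5:
            nib = (nib + 3) & 0xF  # 4-bit result, as in the Verilog
        x |= nib << (4 * n)
    return ((x << 1) | c) & MASK, (x >> 31) & 1


def bcdbin(d, c):
    """Shift right with carry in, then adjust (>= 8 subtracts 3); returns (q, carry_out)."""
    shifted = (c << 31) | (d >> 1)
    q = 0
    for n in range(8):
        nib = (shifted >> (4 * n)) & 0xF
        if nib >= 8:
            nib -= 3
        q |= nib << (4 * n)
    return q, d & 1


def to_bcd10(value):
    """Hypothetical driver: 32-bit binary to 10 BCD digits held in two longs."""
    lo = hi = 0
    for i in range(31, -1, -1):
        c = (value >> i) & 1      # next binary bit, MSB first
        lo, c = binbcd(lo, c)     # carry out of the low long...
        hi, c = binbcd(hi, c)     # ...feeds the high long
    return (hi << 32) | lo


def from_bcd10(bcd):
    """Hypothetical driver: reverse direction, high long first."""
    lo, hi = bcd & MASK, bcd >> 32
    value = 0
    for i in range(32):
        hi, c = bcdbin(hi, 0)
        lo, c = bcdbin(lo, c)
        value |= c << i           # bits come out LSB first
    return value


print(format(to_bcd10(4294967295), "X"))  # prints 4294967295
```

Because each step only passes a single carry bit between longs, the same pattern extends to as many longs as the number of digits requires.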

ozpropdev · 2013-12-11 16:34

Neat!

Thanks for posting the Verilog code too.
Nice to know how it all works.

jmg · 2013-12-11 17:01

Cool. That will be useful for compact ASCII String to Number, and ASCII Number to String conversion.

I think this works with the Shift&Test algorithm, so there is no real limit on how many digits are supported?

cgracey · 2013-12-11 17:14

jmg wrote: »

Cool. That will be useful for compact ASCII String to Number, and ASCII Number to String conversion.

I think this works with the Shift&Test algorithm, so there is no real limit on how many digits are supported?

You can do more digits than any human can make sense of, or than can represent any practical measurement.

David Betz · 2013-12-11 17:16

cgracey wrote: »

I added two more single-clock unary instructions to perform the binary<->BCD shift function we've been talking about. This amounts to a small number of gates and the output feeds into the 'result' bus, which includes a bit for carry.

Here's the binary to BCD shift instruction:

// binbcd

wire [3:0] binbcdi [7:0];
wire [3:0] binbcdo [7:0];

assign binbcdi[7]	= d[30:27];
assign binbcdi[6]	= d[26:23];
assign binbcdi[5]	= d[22:19];
assign binbcdi[4]	= d[18:15];
assign binbcdi[3]	= d[14:11];
assign binbcdi[2]	= d[10:7];
assign binbcdi[1]	= d[6:3];
assign binbcdi[0]	= {d[2:0], c};

assign binbcdo[7]	= binbcdi[7] >= 4'h5 ? binbcdi[7] + 4'h3 : binbcdi[7];
assign binbcdo[6]	= binbcdi[6] >= 4'h5 ? binbcdi[6] + 4'h3 : binbcdi[6];
assign binbcdo[5]	= binbcdi[5] >= 4'h5 ? binbcdi[5] + 4'h3 : binbcdi[5];
assign binbcdo[4]	= binbcdi[4] >= 4'h5 ? binbcdi[4] + 4'h3 : binbcdi[4];
assign binbcdo[3]	= binbcdi[3] >= 4'h5 ? binbcdi[3] + 4'h3 : binbcdi[3];
assign binbcdo[2]	= binbcdi[2] >= 4'h5 ? binbcdi[2] + 4'h3 : binbcdi[2];
assign binbcdo[1]	= binbcdi[1] >= 4'h5 ? binbcdi[1] + 4'h3 : binbcdi[1];
assign binbcdo[0]	= binbcdi[0] >= 4'h5 ? binbcdi[0] + 4'h3 : binbcdi[0];

wire binbcd_c		= d[31];

wire [31:0] binbcd_q	= {	binbcdo[7],
				binbcdo[6],
				binbcdo[5],
				binbcdo[4],
				binbcdo[3],
				binbcdo[2],
				binbcdo[1],
				binbcdo[0] };

Here's the BCD to binary shift instruction:

// bcdbin

wire [3:0] bcdbini [7:0];
wire [3:0] bcdbino [7:0];

assign bcdbini[7]	= d[31:28];
assign bcdbini[6]	= d[27:24];
assign bcdbini[5]	= d[23:20];
assign bcdbini[4]	= d[19:16];
assign bcdbini[3]	= d[15:12];
assign bcdbini[2]	= d[11:8];
assign bcdbini[1]	= d[7:4];
assign bcdbini[0]	= d[3:0];

assign bcdbino[7]	= bcdbini[7] >= 4'h8 ? bcdbini[7] - 4'h3 : bcdbini[7];
assign bcdbino[6]	= bcdbini[6] >= 4'h8 ? bcdbini[6] - 4'h3 : bcdbini[6];
assign bcdbino[5]	= bcdbini[5] >= 4'h8 ? bcdbini[5] - 4'h3 : bcdbini[5];
assign bcdbino[4]	= bcdbini[4] >= 4'h8 ? bcdbini[4] - 4'h3 : bcdbini[4];
assign bcdbino[3]	= bcdbini[3] >= 4'h8 ? bcdbini[3] - 4'h3 : bcdbini[3];
assign bcdbino[2]	= bcdbini[2] >= 4'h8 ? bcdbini[2] - 4'h3 : bcdbini[2];
assign bcdbino[1]	= bcdbini[1] >= 4'h8 ? bcdbini[1] - 4'h3 : bcdbini[1];
assign bcdbino[0]	= bcdbini[0] >= 4'h8 ? bcdbini[0] - 4'h3 : bcdbini[0];

wire [31:0] bcdbinx	= {	bcdbino[7],
				bcdbino[6],
				bcdbino[5],
				bcdbino[4],
				bcdbino[3],
				bcdbino[2],
				bcdbino[1],
				bcdbino[0] };

wire bcdbin_c		= bcdbinx[0];

wire [31:0] bcdbin_q	= {c, bcdbinx[31:1]};

Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.

Thanks for posting the Verilog code! I can't wait to see the Verilog for the whole COG if you end up opening that up for P3.

jmg · 2013-12-11 17:19

David Betz wrote: »

Thanks for posting the Verilog code! I can't wait to see the Verilog for the whole COG if you end up opening that up for P3.

Or even just the code for the Counters ...

Cluso99 · 2013-12-11 18:53

While I don't think the above was necessary, and neither is the following, might it be a worthwhile consideration?

Conversion from an ASCII byte to/from hex 0..9,A..F is often done. It is a simple conversion, now aided by cmpr.

There are two useful conversions...
ASCII $30..$39,$41..$46,$61..$66 (0..9,A..F,a..f) -> $00..$0F
$00..$0F -> ASCII $30..$39,$41..$46 (0..9,A..F)
It could even be done as a hex byte to 2 ASCII bytes (word).
The ASCII to hex could optionally set C if it was not within conversion ranges.
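
As a software sketch of the two conversions suggested here (Python, with invented function names), the mapping might look like this. The flag returned by `ascii_to_hex` stands in for the optional C output, and the byte order used in `byte_to_ascii_word` (high nibble's character in the upper byte) is an assumption.

```python
def ascii_to_hex(ch):
    """Map an ASCII code to (value 0..15, c); c is True if ch is not a hex digit."""
    if 0x30 <= ch <= 0x39:          # '0'..'9'
        return ch - 0x30, False
    if 0x41 <= ch <= 0x46:          # 'A'..'F'
        return ch - 0x41 + 10, False
    if 0x61 <= ch <= 0x66:          # 'a'..'f'
        return ch - 0x61 + 10, False
    return 0, True


def hex_to_ascii(nibble):
    """Map $0..$F to the ASCII code for '0'..'9' or 'A'..'F'."""
    nibble &= 0xF
    return nibble + 0x30 if nibble < 10 else nibble - 10 + 0x41


def byte_to_ascii_word(value):
    """Convert one byte to two ASCII characters packed into a 16-bit word."""
    return (hex_to_ascii(value >> 4) << 8) | hex_to_ascii(value)


print(hex(byte_to_ascii_word(0x3C)))  # prints 0x3343, i.e. '3' then 'C'
```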

Sapieha · 2013-12-11 19:01

Hi Cluso.

In my opinion, anything that saves COG space is worth thinking about if Chip has spare instruction space!

Cluso99 wrote: »

While I don't think the above was necessary, and neither is the following, might it be a worthwhile consideration?

Conversion from an ASCII byte to/from hex 0..9,A..F is often done. It is a simple conversion, now aided by cmpr.

There are two useful conversions...
ASCII $30..$39,$41..$46,$61..$66 (0..9,A..F,a..f) -> $00..$0F
$00..$0F -> ASCII $30..$39,$41..$46 (0..9,A..F)
It could even be done as a hex byte to 2 ASCII bytes (word).
The ASCII to hex could optionally set C if it was not within conversion ranges.
