That's correct, no carry from nibble to nibble.
8 digits would be great in 1 instruction.
I'm not aware of a reverse trick.
Maybe we should just do a single-clock instruction that works on D, inputting and outputting a carry. That way, they could be chained together to get those ten digits, or even twenty if you started with a 64-bit value. That would be simpler to implement and more flexible.
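For anyone following along, here is a rough software model of the chained idea, assuming the step works as the shift-and-add-3 method discussed above: add 3 to any nibble that is 5 or more (no carry between nibbles), then shift left one bit through carry. The names `bcd_shift_step` and `bin_to_bcd` are made up for this sketch and are not the actual instruction.

```python
MASK32 = 0xFFFFFFFF

def bcd_shift_step(d, carry_in):
    """Model of one hypothetical step on a 32-bit long.

    Each nibble >= 5 gets 3 added (kept within the nibble), then the
    whole long shifts left by one, taking carry_in into bit 0.
    Returns (new_d, carry_out).
    """
    adjusted = 0
    for n in range(8):
        nib = (d >> (4 * n)) & 0xF
        if nib >= 5:
            nib = (nib + 3) & 0xF  # no carry from nibble to nibble
        adjusted |= nib << (4 * n)
    shifted = (adjusted << 1) | (carry_in & 1)
    return shifted & MASK32, (shifted >> 32) & 1

def bin_to_bcd(value, bits=32, longs=2):
    """Chain the step across several longs, one pass per input bit."""
    bcd = [0] * longs  # bcd[0] holds the lowest 8 digits
    for i in range(bits - 1, -1, -1):
        carry = (value >> i) & 1  # next binary bit, MSB first
        for k in range(longs):
            bcd[k], carry = bcd_shift_step(bcd[k], carry)
    result = 0
    for k, word in enumerate(bcd):
        result |= word << (32 * k)
    return result

print(format(bin_to_bcd(0xFFFFFFFF), "X"))
print(format(bin_to_bcd(2**64 - 1, bits=64, longs=3), "X"))
```

Two longs cover the ten digits of a 32-bit value; twenty digits for a 64-bit value need three longs in this model, since each long holds eight digits.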
That is the best solution I think. Flexibility is the key.
Can't it be reversed to go the other direction, or does something get lost?
If you subtracted 3 from any nibble that was >= 8 then right-shifted, wouldn't that do the trick? I'm amazed at how people come up with these algorithms.
I for one would add add-and-saturate/subtract-and-saturate, word- and byte-wise (SIMD instructions), in case they are not there, before I even consider BCD. Division is already there, reducing the amount of work to transform binary to BCD, so unless it is done in one instruction I can hardly see its real usefulness. The problem with BCD is multiplying and dividing BCD, not converting to and from binary. A 2-digit BCD times a BCD seems more useful... just my opinion.
The pixel mixer does four-byte add and saturate.
Yes, the BCDADJ instruction is not necessary, but it's interesting to know about.
FYI Chip,
To reverse, shift right, then subtract 3 from any nibble > 4.
Seems to work Ok.
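A quick way to check that reverse rule is to model it in software. This is only a sketch: `bcd_to_bin_step` and `bcd_to_bin` are hypothetical names, and the step is assumed to shift right through carry and then subtract 3 from any nibble greater than 4. For valid BCD a shifted nibble can only be 0..4 or 8..12, so ">4" and ">=8" select the same nibbles.

```python
MASK32 = 0xFFFFFFFF

def bcd_to_bin_step(d, carry_in):
    """Model of one hypothetical reverse step on a 32-bit long.

    Shift right by one (carry_in enters bit 31, bit 0 leaves as
    carry_out), then subtract 3 from any nibble > 4.
    """
    carry_out = d & 1
    shifted = (d >> 1) | ((carry_in & 1) << 31)
    result = 0
    for n in range(8):
        nib = (shifted >> (4 * n)) & 0xF
        if nib > 4:
            nib -= 3
        result |= nib << (4 * n)
    return result & MASK32, carry_out

def bcd_to_bin(bcd, bits=32, longs=2):
    """Chain the step from the top long down, one pass per output bit."""
    words = [(bcd >> (32 * k)) & MASK32 for k in range(longs)]
    binary = 0
    for _ in range(bits):
        carry = 0
        for k in range(longs - 1, -1, -1):
            words[k], carry = bcd_to_bin_step(words[k], carry)
        # The bit falling out of the lowest long is the next binary bit,
        # least significant first.
        binary = (binary >> 1) | (carry << (bits - 1))
    return binary

print(bcd_to_bin(0x4294967295))
```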
An easy way to manage 64 (or more!) bits would be great, and 10 digits is important to fit all of 32 bits.
Since this seems to be constrained within nibbles, except for the 1-bit carry, that should not be too hard.
A small cash register does not need the speed of a one clock cycle DAA instruction. All of the calculations can be done internally using scaled integer arithmetic with the conversion to BCD done in software for display.
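As a minimal sketch of that approach, assuming amounts are kept as integer cents and only turned into decimal digits when displayed (the prices, the 8% rate, and the helper names `to_digits` and `format_cents` are invented for illustration):

```python
def to_digits(value):
    """Split a non-negative integer into decimal digits, most significant first."""
    digits = []
    while True:
        value, digit = divmod(value, 10)
        digits.append(digit)
        if value == 0:
            break
    return digits[::-1]

def format_cents(cents):
    """Render an amount held in integer cents as a string for display."""
    digits = to_digits(cents)
    while len(digits) < 3:  # always show at least "0.00"
        digits.insert(0, 0)
    text = "".join(chr(0x30 + d) for d in digits)
    return text[:-2] + "." + text[-2:]

# All arithmetic stays in scaled integers (cents); no BCD math is needed.
subtotal = 3 * 1999 + 250          # three items at 19.99 plus one at 2.50
tax = (subtotal * 8 + 50) // 100   # 8% tax, rounded half up
total = subtotal + tax
print(format_cents(total))
```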
While that was agreed upon, it was brought up earlier that many industrial sensors output data in either ASCII decimal or BCD, and so this conversion would be helpful in that case.
He wants a BCD2DEC and DEC2BCD conversion instruction. In the industrial field there are many "intelligent" sensors and displays that want the data BCD coded.
For this reason almost every PLC has these instructions built in.
But... these devices are usually serially connected, and with the execution speed we are going to have from the new P2, I also think that these conversions can be done in software.
IIRC some NC machines use BCD output shaft encoders.
As mentioned earlier the "DAA" reference was probably misleading.
I don't believe anyone was wanting to do BCD math, it was more about conversion
of values from hex to decimal for display purposes in the most efficient/compact way.
Many SW solutions evolved from this discussion.
In P1 Spin the DEC function in some of the serial comms objects is a typical example.
In some debugging code where there is already code to display hex values a conversion
to BCD can use the same HEX function to display the result.
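To illustrate the point about reusing a hex display routine, here is a small sketch. The packing is done by plain division here, not by any of the methods from this thread, and `to_packed_bcd` and `hex_string` are names made up for the example.

```python
def to_packed_bcd(value):
    """Pack the decimal digits of value into nibbles, lowest digit in bits 3..0."""
    bcd = 0
    shift = 0
    while True:
        value, digit = divmod(value, 10)
        bcd |= digit << shift
        shift += 4
        if value == 0:
            break
    return bcd

def hex_string(value, digits=8):
    """Stand-in for an existing HEX display routine."""
    return format(value, "0{}X".format(digits))

# Showing the packed BCD value in hex yields the decimal digits.
print(hex_string(to_packed_bcd(1234567)))
```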
We now have a 64 bit system counter and multiply hardware with 64 bit results.
Converting these values introduces more challenges again.
COG RAM is limited, so it is worth looking at different methodologies to make the most of it.
If we get HUBEXEC then sure it becomes irrelevant.
Anyhow I know I've learnt a few new tricks from this discussion so that's a bonus too.
More common are Gray code absolute encoders, which are used because only one bit changes per position change.
There is a similar problem to BCD for Gray to binary conversion.
A series of XORs can convert, like this from Google:
Converting the Gray code to binary is:
Retain the most significant bit as it is, and for each remaining bit XOR the Gray bit with the binary bit just above it.
i.e. Gn Gn-1 Gn-2 ... G1 is the Gray code and Bn Bn-1 ... B1 is the binary code.
Bn = Gn, and for every other bit Bi = Gi XOR Bi+1 (so Bn-1 = Gn-1 XOR Gn).
Smallest code to do that on P2 (16..32 bits) ?
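Not P2 code, but as a sketch of how short the series of XORs can get: the bit-by-bit rule collapses into five shift-and-XOR pairs for 32 bits, because XORing in a copy shifted by 1, 2, 4, 8 and 16 accumulates every higher Gray bit into each position. The function names are my own.

```python
def gray_to_binary(g):
    """Convert a 32-bit Gray code to binary with five shift/XOR pairs."""
    b = g & 0xFFFFFFFF
    b ^= b >> 1
    b ^= b >> 2
    b ^= b >> 4
    b ^= b >> 8
    b ^= b >> 16
    return b

def binary_to_gray(b):
    """The forward direction is a single shift and XOR."""
    return b ^ (b >> 1)

print(bin(gray_to_binary(0b1100)))
```

For 16-bit codes the last pair (shift by 16) has no effect and could be dropped.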
There is also an interesting variant, called Single Track Gray Code http://en.wikipedia.org/wiki/Gray_code
Those do not quite cover all codes, but can allow some better mechanical solutions.
STGC makes the encoder wheel fairly simple to construct but as more resolution is required
the sensor/detector arrangement becomes quite complex.
An interesting shift register arrangement is also needed to calculate position.
I looked at this a few years ago, I still find it interesting
I am sorry, but I just don't see the requirement for this to be fast, so it may as well be done in sw and keep any spare opcodes for something that really counts (for now or later).
It's not speed, so much as it is denser code and ease of use.
I'm rearranging the opcodes again to accommodate the hub execution branches. I've expanded and simplified things, somewhat, so there are lots of opcode slots available for all kinds of things. This sort of function hardly takes any gates. It's mainly the muxing into the result path, which is far from critical-path for something like BCDADJ. It really wouldn't be a big deal.
I added two more single-clock unary instructions to perform the binary<->BCD shift function we've been talking about. This amounts to a small number of gates and the output feeds into the 'result' bus, which includes a bit for carry.
Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.
Thanks for posting the Verilog code! I can't wait to see the Verilog for the whole COG if you end up opening that up for P3.
While I don't think the above was necessary, and neither is the following, might it be a worthwhile consideration?
Conversion from an ASCII byte to/from hex 0..9,A..F is often done. It is a simple conversion, now aided by cmpr.
There are two useful conversions...
ASCII $30..$39,$41..$46,$61..$66 (0..9,A..F,a..f) -> $00..$0F
$00..$0F -> ASCII $30..$39,$41..$46 (0..9,A..F)
It could even be done as a hex byte to 2 ASCII bytes (word).
The ASCII to hex could optionally set C if it was not within conversion ranges.
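For reference, the two conversions might behave like this in software. This is only a sketch: the function names are invented, the returned flag stands in for the optional C result, returning 0 on a bad character is my choice, and the byte order of the two-character word is an assumption.

```python
def ascii_to_hex(ch):
    """Convert one ASCII code to $00..$0F.

    Returns (value, c), where c = 1 flags a character outside
    0..9, A..F, a..f (value is then 0, an arbitrary choice).
    """
    if 0x30 <= ch <= 0x39:
        return ch - 0x30, 0
    if 0x41 <= ch <= 0x46:
        return ch - 0x41 + 10, 0
    if 0x61 <= ch <= 0x66:
        return ch - 0x61 + 10, 0
    return 0, 1

def hex_to_ascii(nibble):
    """Convert $00..$0F to ASCII $30..$39 or $41..$46."""
    nibble &= 0xF
    return nibble + 0x30 if nibble < 10 else nibble - 10 + 0x41

def byte_to_ascii_word(byte):
    """Convert a byte to two ASCII characters packed into a word.

    Assumption: the high nibble's character goes in the upper byte.
    """
    byte &= 0xFF
    return (hex_to_ascii(byte >> 4) << 8) | hex_to_ascii(byte)

print(ascii_to_hex(ord("f")), hex(byte_to_ascii_word(0x3F)))
```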
Comments
Can you post a 10-digit example, optimized for size (staying with the opcodes we have)?
With what we've got in opcodes your V3 seems to be the smallest SW solution.
Here's a 64 bit binary to 20 digit BCD optimized for size.
Neat use of SETQUAZ. Those special instructions sure would make that shorter.
Here's the binary to BCD shift instruction:
Here's the BCD to binary shift instruction:
Note that only the middle sections of each Verilog snippet actually create any logic. The pre sections get the data jigged up into 4-bit fields that can be easily worked and the post sections put it back into a long.
Thanks for posting the Verilog code too.
Nice to know how it all works.
I think this works with the Shift&Test algorithm, so there is no real limit on how many digits are supported?
You can do more digits than any human can make sense of, or than can represent any practical measurement.
Or even just the code for the Counters ...
In my opinion, anything that saves COG space is worth thinking about if Chip has spare instruction space!