yes - relative to small.
The best and worst are the variances, but the average "should" be more relevant. However, during some timing of the various digits, I noticed the values are not truly random, because the same code changes in some of the msd digits produce opposite results for the lsd digits !!
I thought about running for every combination of 0-$FFFF but that would take many days from my rough calcs.
It depends - usually in embedded design, deterministic limits are quite important, especially in time, and that makes MAX more important, but average is going to be useful for power-usage calculations, provided it is an actual usage average.
Worst-case / best-case test patterns vary with code design, but the simplest/smallest loop will have the longest flight times for the most adds, which means the most '9' digits in the BCD output.
Here, any average is highly value-sensitive, and yet a final use will not expect to see random values - a user will be working on a measurement to some number of digits, and they may even be sitting on a 10,000,000 // 9,999,999 boundary and never actually 'see' an average outcome.
If the MIN is to be valid, then a test value of 0000000 needs to be in the test pattern.
As a reality check, even 347 instructions is fast: at 4 clocks per instruction and 80 MHz that is 347 * 4 / 80 MHz = 17.35us per result, or a conversion bit rate of 2.4MBaud based on BCD data, or 4.8MBd based on ASCII data.
One possible use for fast/small Decimal streaming, is to write a CSV logging file directly to serial memory.
I see Spansion mentioned 1.5MB/s write speeds (call that the cutting edge), so feeding that limit indicates around 107 instructions/conversion.
Right in the ballpark of these results. (#31 meets that, if at a largish code size, and less LZS-friendly.)
Note also that a Prop II will easily outpace present Spansion best write speeds, even in the smallest-size variant.
To me, the worst and/or the size is the most important parameter.
The average would be quite a bit less important.
If I had to score each method, I'd give them something like worst speed=40%, size=40%, average speed=20%.
Here is my new submission.
Here is what is reported
Code size 19 longs
Best=133
Worst=313
Avg=181
Things to note:
Unique use of djnz in the digit loop
Moved "res" variables to init code
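Before the listing, a quick worked illustration of that djnz counting trick, using a hypothetical digit value of 7 (an aside only, not part of the entry):
' digit starts at -"0", i.e. -48
' each successful cmpsub runs the if_c djnz digit,#... instruction, which
' decrements digit (digit stays negative, so the branch is always taken)
' after 7 successful subtracts: digit = -48 - 7 = -55 = -("0" + 7)
' neg digit,digit then yields 55 = "7", ready for wrbyte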
mov ndigits,#10 'Do ten digits.
mov digit_addr,dec_addr 'Initialize digits hub pointer.
mov digit,neg_ascii_zero 'Start with -"0"
:first_digit
cmpsub x,first_comp wc, wr ' Do first digit as special case
if_c djnz digit,#:first_digit ' As each subtract succeeds, digit will advance (more negative)
:next_digit
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
add digit_addr,#1 'Increment digit pointer.
mov digit,neg_ascii_zero 'Start with -"0"
:next_digit_loop
cmpsub x,second_comp wc, wr 'Main digit loop
if_c djnz digit,#:next_digit_loop 'As each subtract succeeds, digit will advance (more negative)
mov comp,x 'x = x * 10
shl x,#2 'x = 4x
add x,comp 'x = 5x
shl x,#1 'x = 10x
djnz ndigits,#:next_digit 'Go back for next digit.
Thanks Phil. And good point about not needing the constant. That brings the size down to only 18 longs.
Size=18 longs
Best=133
Worst=313
Avg=181
mov ndigits,#10 'Do ten digits.
mov digit_addr,dec_addr 'Initialize digits hub pointer.
neg digit,#"0" 'Start with -"0"
:first_digit
cmpsub x,first_comp wc, wr ' Do first digit as special case
if_c djnz digit,#:first_digit ' As each subtract succeeds, digit will advance (more negative)
:next_digit
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
add digit_addr,#1 'Increment digit pointer.
neg digit,#"0" 'Start with -"0"
:next_digit_loop
cmpsub x,second_comp wc, wr 'Main digit loop
if_c djnz digit,#:next_digit_loop 'As each subtract succeeds, digit will advance (more negative)
mov comp,x 'x = x * 10
shl x,#2
add x,comp
shl x,#1
djnz ndigits,#:next_digit 'Go back for next digit.
'-------[End Your Golf Code]---------------------------------------------------
end_code add timer,cnt 'DO NOT MODIFY.
wrlong timer,time_addr 'DO NOT MODIFY.
mov x,#0 'DO NOT MODIFY.
wrlong x,par 'DO NOT MODIFY.
jmp #main_lp 'DO NOT MODIFY.
begin_var 'DO NOT MODIFY.
'-------[Begin Your Golf Variables]--------------------------------------------
first_comp long 1_000_000_000 'Initial digit compare values.
second_comp long 0_100_000_000
Re-investing the saved long can get you down to 127/311/175; a further 3 longs (22 total) gets you 127/307/175. Minor restructuring (20 longs) will get us down to 125/305/173.
20 longs: 125/305/173
mov ndigits,#9{+1} 'Do ten digits.
mov digit_addr,dec_addr 'Initialize digits hub pointer.
neg digit,#"0" 'Start with -"0"
:first_digit
cmpsub x,first_comp wc ' Do first digit as special case
if_c djnz digit,#:first_digit ' As each subtract succeeds, digit will advance (more negative)
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
:again add digit_addr,#1 'Increment digit pointer.
neg digit,#"0" 'Start with -"0"
:next_digit_loop
cmpsub x,second_comp wc 'Main digit loop
if_c djnz digit,#:next_digit_loop 'As each subtract succeeds, digit will advance (more negative)
mov comp,x 'x = x * 10
shl x,#2
add x,comp
shl x,#1
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
djnz ndigits,#:again 'Go back for next digit.
21 longs: 121/301/169
mov ndigits,#9{+1} 'Do ten digits.
mov digit_addr,dec_addr 'Initialize digits hub pointer.
neg digit,#"0" 'Start with -"0"
:first_digit
cmpsub x,first_comp wc ' Do first digit as special case
if_c djnz digit,#:first_digit ' As each subtract succeeds, digit will advance (more negative)
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
jmp #:skip
:again mov comp,x 'x = x * 10
shl x,#2
add x,comp
shl x,#1
:skip add digit_addr,#1 'Increment digit pointer.
neg digit,#"0" 'Start with -"0"
:next_digit_loop
cmpsub x,second_comp wc 'Main digit loop
if_c djnz digit,#:next_digit_loop 'As each subtract succeeds, digit will advance (more negative)
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
djnz ndigits,#:again 'Go back for next digit.
Adding Bean's trick of -"0" brings my smallest code down to
21 longs: 123/275/164
movs loop,#comp1 ' setup msd comparison
mov ndigits,#10 'Do ten digits.
mov digit_addr,dec_addr 'Initialize digits hub pointer.
next_digit neg digit,#"0" 'Start with -"0"
loop cmpsub x,0-0 wc,wz ' at least 1?
if_c djnz digit,#loop ' y: -(digit+1)
neg digit,digit 'Negate digit value to get ascii
wrbyte digit,digit_addr 'Write the digit to the string.
add digit_addr,#1 'Increment digit pointer.
add loop,#1 ' setup for next msd comparison
djnz ndigits,#next_digit 'Go back for next digit.
'-------[Begin Your Golf Variables]--------------------------------------------
comp1 long 1_000_000_000
long 0_100_000_000
long 0_010_000_000
long 0_001_000_000
long 0_000_100_000
long 0_000_010_000
long 0_000_001_000
long 0_000_000_100
long 0_000_000_010
long 0_000_000_001
Great going, guys! I hadn't dreamed that things could be taken this far. There is one more trick that could be spun on Cluso's code to reduce its hub footprint. Although my test template does not accommodate one-time initialization, I believe Cluso's table of ten constants can be pre-computed using fewer than ten instructions, thus moving it to the RES area. Saving hub space like this does have a trade-off in requiring more cog space, though.
Based on bin2dec_golf_v3_cluso[smaller2].spin: 21/122/270/162. Size could be one less if you are prepared to relocate the table to a suitable location where test loop, #%1111 wz gives you a usable end condition.
movs loop,#comp1 ' setup msd comparison
mov digit_addr,dec_addr ' Initialize digits hub pointer.
next_digit neg digit,#"0" ' Start with -"0"
loop cmpsub x,0-0 wc ' at least 1?
if_c djnz digit,#loop ' y: -(digit+1)
neg digit,digit ' Negate digit value to get ascii
wrbyte digit,digit_addr ' Write the digit to the string.
add digit_addr,#1 ' Increment digit pointer.
cmp loop,base wz ' 10 digits done?
if_nz djnz loop,#next_digit ' Go back for next digit.
'-----------------------------------------------
base cmpsub x,comp0 wc
comp0 long 0_000_000_001
long 0_000_000_010
long 0_000_000_100
long 0_000_001_000
long 0_000_010_000
long 0_000_100_000
long 0_001_000_000
long 0_010_000_000
long 0_100_000_000
comp1 long 1_000_000_000
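For anyone wondering what that test loop,#%1111 wz end condition would look like, here is my reading of the hint (a sketch only, not posted code). The low 9 bits of the self-modified loop instruction are its source field, so if comp0 is placed at a cog address that is a multiple of 16 (the nine entries above it then are not), Z comes up set only after the last digit has been handled, and the base long can be dropped:
test loop,#%1111 wz ' Z set only when the source field has reached comp0
if_nz djnz loop,#next_digit ' not done: step the source field down the table
The 16-long alignment of comp0 is the assumption here; everything else stays as posted.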
Phil: I did in fact start on this method to create the values in code. I am also going to create the comparison code too; each entry is just the previous one *10, so the logic is simple. But I ran out of time - it's Mother's Day here, so we have been out since Friday evening with the kids, and today my brother is bringing my mum over. I have been amazed at the little tricks that each of us has contributed. We have versions throughout the scale from fast to small. What a group of people can do is really inspiring.
In trying to work out the possibility of the code generating the decimal number table, I noticed an interesting pattern between the binary digits and the decimal value.
Not sure that I can do much with it, but an interesting pattern is emerging nonetheless... especially the LSB of the binary compared to the decimal.
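One concrete pattern worth noting (my observation; possibly not the one meant above): 10^n = 2^n * 5^n, and 5^n is odd, so each comparison constant ends in exactly n zero bits:
' 1             = %1
' 10            = %1010
' 100           = %110_0100
' 1_000         = %11_1110_1000
' 10_000        = %10_0111_0001_0000
' 1_000_000_000 = %11_1011_1001_1010_1100_1010_0000_0000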
Here's Kuroneko's code with my brain-dead initialization of the table, saving one hub long:
'-------[Begin Your Golf Initialization]---------------------------------------
'Insert any initialization code here.
mov comp,#1 'Seed the table with 10^0 = 1.
store_comp mov comp0,comp 'Store the current power of ten (dest field patched below).
add store_comp,_512 'Advance the dest field (bit 9) of the mov above to the next slot.
mov digit,comp 'Multiply comp by ten:
shl digit,#2 ' digit = 4 * comp
add comp,digit wc ' comp = 5 * comp; carry flags overflow once 1_000_000_000 is stored
shl comp,#1 ' comp = 10 * comp
if_nc jmp #store_comp 'No overflow yet, so store the next power of ten.
'-------[End Your Golf Initialization]-----------------------------------------
end_init 'DO NOT MODIFY.
main_lp rdlong x,par wz 'DO NOT MODIFY.
if_z jmp #main_lp 'DO NOT MODIFY.
neg timer,cnt 'DO NOT MODIFY.
begin_code 'DO NOT MODIFY.
'-------[Begin Your Golf Code]-------------------------------------------------
'
' Upon entry, x contains a 32-bit binary value, and
' dec_addr contains the hub address of the first byte of the result string.
' Your golf code must write ten ASCII digits to the result string.
movs loop,#comp1 ' setup msd comparison
mov digit_addr,dec_addr ' Initialize digits hub pointer.
next_digit neg digit,#"0" ' Start with -"0"
loop cmpsub x,0-0 wc ' at least 1?
if_c djnz digit,#loop ' y: -(digit+1)
neg digit,digit ' Negate digit value to get ascii
wrbyte digit,digit_addr ' Write the digit to the string.
add digit_addr,#1 ' Increment digit pointer.
cmp loop,base wz ' 10 digits done?
if_nz djnz loop,#next_digit ' Go back for next digit.
'-------[End Your Golf Code]---------------------------------------------------
end_code add timer,cnt 'DO NOT MODIFY.
wrlong timer,time_addr 'DO NOT MODIFY.
mov x,#0 'DO NOT MODIFY.
wrlong x,par 'DO NOT MODIFY.
jmp #main_lp 'DO NOT MODIFY.
begin_var 'DO NOT MODIFY.
'-------[Begin Your Golf Variables]--------------------------------------------
base cmpsub x,comp0 wc
_512 long 512
comp0 res 9
comp1 res 1
comp res 1
digit res 1
digit_addr res 1
'-------[End Your Golf Variables]----------------------------------------------
And the results:
bin2dec code size: 20 longs
Success after 7916 tests.
Number of instructions needed per conversion:
Best: 122
Worst: 270
Avg: 162
Attached is a new template that allows for initialization code.
This has been great to follow along with, and I have learned a lot about cog register reuse. But I have a question: in Bean's code he reassigns cog registers. Specifically, the second register of cog memory,
comp add time_addr, #4
gets reassigned as a temporary register for computing x = x * 10.
Doesn't this violate the Rules of Golf?
I am also wondering and attempting to figure out if the log tables can be used to play a round with the equivalent of only a four iron in my bag.
While it's Phil's challenge, my interpretation is there are two things normally in play in the propeller...
1. Hub space
2. Speed
With both these in mind, there are likely 3 solutions...
smallest and fastest are the obvious ones
and a mid solution(s) that are a result of trading hub and speed.
While the original challenge was to keep the code identical, if there is a solution by re-arranging Phil's code, I definitely think it would be worthwhile posting and letting Phil decide its merits.
It is really great to see all the ideas being fleshed out.
And, yes, it is a great tool for others to follow along and see different ideas in the ways the prop can be exploited.
Now, back to trying to attain almost the fastest (~68) with a minimum of code
I always try to minimize COG space. This is why I used the init code for my variables. (I know I broke the rules, but hey...I'm a rule breaker)
Cog space is much more limited, and the hub space can be reclaimed after the cog is loaded.
I think there should be three categories:
Minimum Space (however you want to define it)
Maximum Speed (however you want to define it)
Overall-Small and Fast (however you want to define it)
I'd agree, and this sort of code block/library is likely to be included into a COG that is already doing many other things, so COG size of such LIBs is very important.
The size that is counted also needs to be the total memory footprint, not just the inner loop.
Another candidate for a shrink effort would be a Scaled Ratio Mathlib that does result = Scale.32 * Num.32 / Denom.32
- i.e. it takes three 32-bit values a, b, c, and does a32*b32 -> 64, then 64/c32 -> r32.
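As a rough sketch of what such a lib boils down to (my sketch only, not an existing object; the register names scale, num, denom, hi, lo and count are made up for the sketch, it is unsigned only, and it assumes the quotient fits in 32 bits - i.e. the upper product long is less than denom - with denom < $8000_0000 so the compares stay within 32 bits):
' result = scale * num / denom via a 64-bit intermediate (hi:lo)
mul_div mov hi,#0 'clear the upper product long
mov lo,num 'multiplier sits in lo; product ends up in hi:lo
mov count,#32
shr lo,#1 wc 'prime C with multiplier bit 0
:mul if_c add hi,scale wc 'conditionally add multiplicand into the upper half
rcr hi,#1 wc 'shift the 64-bit product right ...
rcr lo,#1 wc '... and pick up the next multiplier bit in C
djnz count,#:mul
mov count,#32 'now a 64/32 restoring divide
:div shl lo,#1 wc 'next dividend bit out of lo ...
rcl hi,#1 '... into hi
cmpsub hi,denom wc 'subtract denom if it fits
if_c or lo,#1 'record the quotient bit
djnz count,#:div
mul_div_ret ret 'quotient (result) in lo, remainder in hi
Signed inputs and the denom >= $8000_0000 case would need extra handling on top of this.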
There is one line that makes me a little nervous....
if_nc jmpret par,#store_comp wc,nr
... although it never writes to the par register, the function of the "golf challenge" still works if you change that line to read...
if_nc jmp #store_comp
... also you can reduce the res count down by two if you alias 'digit' and 'digit_addr'.
Yes, now that you mention it, the jmpret is from an earlier attempt to set the carry flag which suffered from a missing carry-set condition for the first loop cycle (would have cost an extra instruction). Now the carry is set by the second neg so all is fine. That said, there is nothing to get nervous about. Perfectly normal way to set the carry flag
OK this may seem tangential. I thought that use of the log lookup tables might be a way to do the divide-by-powers-of-10 associated with bin to dec conversion. Here is my program... pretty much right from the manual. I never got it into the Rules of Golf template because the program does a really lousy job. It could be condensed I think, but it still lacks interpolation and so is lousy.
Is this worth pursuing, or should it be relegated to the 'I really do not know what to do with these tables' category? (So much for playing golf with only a four iron.)
Postedit: Note I reorganised Phil's code to gain some additional speed
Total cog (hub) space = 69 longs (Phil's code uses 12 longs, so for previous comparisons this is 57 longs)
times 83/83/83
My code generates the code and tables into res longs in the cog memory. Total cog usage is now 240 longs. I am sure there are a few tricks left to shrink the generation code to lower the cog/hub usage by a few longs here and there. Better code generation could yield times of 35/103/67, but I haven't bothered with this for now.
I could not think of a way to report the "res" usage. Any ideas???
bin2dec_golf_v4_cluso[2].spin
Comments
First is small size
Then is fast
And last is faster
Of course, I have taken some of your ideas presented here
bin2dec_golf_v3_cluso[faster].spin, bin2dec_golf_v3_cluso[small].spin, bin2dec_golf_v3_cluso[fast].spin
Faster?? Only if you ignore the worst column which, to most users, is the most important column.
Saves 4 longs, but jumped from 275 to 347 cycles.
Or did you mean relative to #31, not your SMALL: ?
There may be users who consider the gain of 4 longs, more precious than the trade off in MAX times.
The appeal of a single loop is that adding Leading Zero Blanking has the smallest added-code cost.
I think here it adds 3 lines of PASM?
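For reference, here is one way the blanking could bolt onto the single-loop versions (a sketch only, not posted code; it comes out at four lines rather than three, and 'blank' is a made-up flag register preset to a non-zero value before the digit loop). It slots in between the neg that produces the ASCII digit and the wrbyte:
cmp digit,#"0" wz 'is the digit about to be written a zero?
if_nz mov blank,#0 'any non-zero digit ends blanking ('blank' is a new flag, preset non-zero)
tjz blank,#:emit 'past the leading zeros: write the digit as-is
mov digit,#" " 'still in leading zeros: write a space instead
:emit wrbyte digit,digit_addr 'Write the digit to the string.
An all-zero input would also need the last digit exempted so that 0 still prints as "0".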
It's great to see the little tricks everyone has come up with.
Cool. Only 11 instructions.
In a large program some (maybe most) of the constants could be reused since they are common values.
Bean