I've done a little code tidy up and it now sorts the score table by combination order. Here's the s16 scores again, I've also run the "Byte1" scoring now too:
Ramon,
Check out me "extract_score" Bash function. I'm a little proud of that one. Lots of googling involved to get it right.
I still don't really know any of the meanings of the various bracketing mechanisms [], [[]], (), (()), {} ... about the only one I'm confident in using is "" and '', and even then I'm not sure if there is any distinction between single and double quotes.
I just noticed something. The higher up the adder chain, the better the random quality gets. Bytes, even a bit, extracted from the top is better quality than data extracted from lower bits.
Yeah, that was the reason I was historically sometimes trimming the search range to improve completion times but I've decided that's too early to be blindly eliminating candidates.
Here's the current sources again but with all scoring variants enabled now:
I had to search what the ^^ was doing there. Also the same for shopt.
I prefer not to employ commands that depend on some version of bash. That uppercase (^^) needs at least bash version 4.0. They can also be done with 'tr' or 'awk'. It will be much slower but maybe is more portable.
Also the readline can be reordered to the first command and grep can be put inside a '<( )'to avoid the pipe (and shopt).
$ read length value suffix line < <(grep 'length=' test-s9/PRscore-xs-s9bit-a1b2c6.out | tail -1)
Or much better ... you can directly modify PractRand source code to output your desired printf format. (but maybe it's not as funny as shell scripting) :
Thanks for the update. The scripts worked great (for bit test) (I have just seen that you updated the scripts to enable all test). Previous version had some issue detecting the CPU load with my computer. Before I needed to comment out the section that detected CPU load, but this time they worked perfect the first time.
Yeah, there was a bug in the CPU load test early on. It would just keep launching a new task irrespective. I think that was another example of me not understanding the bracketing I was using.
Chip,
I've just finished s19 and s20 bit and byte1 scoring runs and I'm thinking it's showing weakness in the lower bits in general, not just bit 0. It's more obvious in the bigger word sizes.
Chip,
I've just finished s19 and s20 bit and byte1 scoring runs and I'm thinking it's showing weakness in the lower bits in general, not just bit 0. It's more obvious in the bigger word sizes.
I'll do some more testing ...
No, there's no real consistency. Other than "byte1" variant never showing any peaks the way the others do, any other signs aren't enough to be concerned over given how sensitive Practrand seems to be.
Here's the s16 scores again with a couple of extra variants added from the above experimental look into bit1 quality. Our chosen 14 2 7 combination holds up well again with its consistency.
Variant Key:
- "Word1" is the full summed word size minus the lsb, for s16 (Xorshift32+/Xoroshiro32+) that is the top 15 bits [15:1].
- "Word2" is the full summed word size minus two lsb, for s16 that is the top 14 bits [15:2].
- "Byte08" is the most significant 8 bits of the summed word, for s16 that is bits [15:8]. Label changes per position.
- "Byte04" is half way down the summed word, for s16 that is bits [11:4]. Label changes per position.
- "Byte2" is always bits [9:2] for all word sizes.
- "Byte1" is always bits [8:1] for all word sizes.
- "Bit" is msb.
dat org
bmask dirb,#15 'drive LEDs
loop getword outb,state,#1 'output sum to LEDs (LSB is low-quality)
add outb,state
waitx ##80_000_000/10
xoro32 state 'iterate xoroshiro32+
jmp #loop
state long 1 'seed {s1,s0} with 1
The top 15 bits of the 16-bit sum 'are high-quality pseudo-random data' but bit 0 is poor quality. A 16-bit addition gives a 17-bit result and I was wondering whether bit 16 is more random than bit 0 and could replace it?
Nope, the carry bit is horribly poor quality. I had posed that very question myself back when I first got interested. In fact it was that question that got me doing my own testing.
Hehe, many many hours. So many mistakes with having to do all the testing over and over again. But I certainly learnt a few things along the way. Although the knowledge is fading already.
Dang, it just dawned on me, far too late, that DRAM prices have been going through the roof all this year! And even with those prices, supply appears to have gone to virtually nil.
One of my new 16GB DIMMs failed and there is no exact replacements in stock. I've got the full credit for them (Returned the pair) and can claim it as a refund, but the options for replacement are slim pickings.
I'm back using me old PC for the moment, and the remainder of the year looks bleak.
I've been thinking about XORO32 ...
The top 15 bits of the 16-bit sum 'are high-quality pseudo-random data' but bit 0 is poor quality. A 16-bit addition gives a 17-bit result and I was wondering whether bit 16 is more random than bit 0 and could replace it?
Nope, the carry bit is horribly poor quality. I had posed that very question myself back when I first got interested. In fact it was that question that got me doing my own testing.
Thanks Evan. Before you forget even more, could you please explain how the most random triplet combination you found of 14,2,7 translates into Chip's logic below?
Chip wants to keep the state storage to 32 bits so it fits cleanly in a single Cog register. So, its really as good as it can be given the requirements.
As for how the final triplet combination was chosen, here's where I came into sync with what Chip was after - http://forums.parallax.com/discussion/comment/1407995/#Comment_1407995 You can see there why Chip picked out 14,2,7 with its even quality in all three sampling variants. I'd sort of favoured 10,5,13 with its higher full word quality but wasn't pushing it due to its non-conforming triplet combination.
Prior to that, I'd kind of been off on my own mission not realising that starting with known full repeat lengths was pretty important. Heater beat me up about that.
You do have the option of retaining the bottom bit of the summing when using the XORO32 instruction. It's not that bad a quality. Someone suggested it lowered the results to equivalent of Xorshift, which is still considered a perfectly okay PRNG.
As for how the final triplet combination was chosen, here's where I came into sync with what Chip was after - http://forums.parallax.com/discussion/comment/1407995/#Comment_1407995 You can see there why Chip picked out 14,2,7 with its even quality in all three sampling variants. I'd sort of favoured 10,5,13 with its higher full word quality but wasn't pushing it due to its non-conforming triplet combination.
Prior to that, I'd kind of been off on my own mission not realising that starting with known full repeat lengths was pretty important. Heater beat me up about that.
Thanks again Evan. I've been reading all the earlier posts in this thread today once again, which I should have done before. Sorry to all for asking newbie-style questions.
Chip wants to keep the state storage to 32 bits so it fits cleanly in a single Cog register. So, it's really as good as it can be given the requirements.
Ah, but is it? Can we please put an end to these malicious rumours that the P2 is a 32-bit processor? It's 34-bit!
XORO32 (existing)
dat org
bmask dirb,#15 'drive LEDs
loop getword outb,state,#1 'output sum to LEDs
'(LSB is low-quality)
add outb,state
waitx ##80_000_000/10
xoro32 state 'iterate xoroshiro32+
jmp #loop
state long 1 'seed {s1,s0} with 1
XORO34 (possible future option?)
dat org
bmask dirb,#15
loop getword outb,state,#1
add outb,state
if_c_and_z add outb,#1 'extra instruction
'increment sum if LSBs both 1
'all 16 bits are high-quality
waitx ##80_000_000/10
xoro34 state wcz 'modified instruction, wcz added
'state includes c and z where
'c is LSB of high 17-bit word
'z if LSB of low 17-bit word
jmp #loop
state long 1 'or 0
There's a bit more to seeding than shown as c & z are involved, but not very much.
Presumably 14,2,7 for xoro32 translates to 15,3,8 for xoro34?
I wonder how that performs in randomness tests? Or is there a better set?
Presumably 14,2,7 for xoro32 translates to 15,3,8 for xoro34?
I wonder how that performs in randomness tests? Or is there a better set? (Probably, I imagine, assuming all of this new xoro34 stuff isn't just like a chimp bashing away on a typewriter.)
I'm at work so I'll get back to you in full later. Suffice to say I also learnt there is no patterns to the combinations. Even my "conforming" to the original ratios is probably worthless.
I did some full run tests on many word sizes. But only the s16 size was done with the latest scoring variations. You can find some s17 (Xoroshiro34) results here - http://forums.parallax.com/discussion/comment/1410683/#Comment_1410683 They aren't very pretty, the odd sizes were that way. I'll do some more testing with the extra variants to see if anything sticks out ...
Are any of these max length? Any other possible sets?
The score charts cover every full period combination for the specified word size.
Finding all the full period triplet candidates is where the long search times come from. The PractRand quality testing is short in comparison. Beyond s20 (Xoroshiro40) it became weeks per attempt on my 8-core CPU. I had thought about trying to use CUDA or something to help out, but never got around to it.
PS: My early testing was just doing quality testing of groups of arbitrary triplets. Some got very high scores, but they weren't full period triplets.
Clarification: I keep reverting back to the term "full length repeat", I note you've used the term "max length". The researchers I read seem to use the term "full period". I'll try to stick to "full period" naming.
Finding all the full period triplet candidates is where the long search times come from. The PractRand quality testing is short in comparison. Beyond s20 (Xoroshiro40) it became weeks per attempt on my 8-core CPU.
It's a basic brute force of cycle the PRNG state until the starting seed repeats. Full period is identified by a repeat at exactly the maximum possible. For s16, it's 2^32-1. For s17, it's 2^34-1. My code also checks to see if any exceed the presumed full period. So far, none have.
The more long period triplets there are in a particular word size the longer the whole search takes. This is amplified massively by the raised to power X nature of the word size.
The researchers that do the basic development also devise methods for jumping ahead in the PRNG state sequence. That way they can find (probably in some statistically fashion) full period candidates even when the period is enormous. Not something I've even come close to comprehending.
Yup, my version completed an s16 search in a few minutes. s17 was probably about an hour. s20 was a couple of days. s21 never completed when I terminated it after maybe a week.
Comments
Check out me "extract_score" Bash function. I'm a little proud of that one. Lots of googling involved to get it right.
I still don't really know any of the meanings of the various bracketing mechanisms [], [[]], (), (()), {} ... about the only one I'm confident in using is "" and '', and even then I'm not sure if there is any distinction between single and double quotes.
EDIT: Corrected spelling of Ramon's name.
Here's the current sources again but with all scoring variants enabled now:
I had to search what the ^^ was doing there. Also the same for shopt.
I prefer not to employ commands that depend on some version of bash. That uppercase (^^) needs at least bash version 4.0. They can also be done with 'tr' or 'awk'. It will be much slower but maybe is more portable.
Also the readline can be reordered to the first command and grep can be put inside a '<( )'to avoid the pipe (and shopt).
Or much better ... you can directly modify PractRand source code to output your desired printf format. (but maybe it's not as funny as shell scripting) :
Thanks for the update. The scripts worked great (for bit test) (I have just seen that you updated the scripts to enable all test). Previous version had some issue detecting the CPU load with my computer. Before I needed to comment out the section that detected CPU load, but this time they worked perfect the first time.
I'll do both those.
Yeah, there was a bug in the CPU load test early on. It would just keep launching a new task irrespective. I think that was another example of me not understanding the bracketing I was using.
Chip,
I've just finished s19 and s20 bit and byte1 scoring runs and I'm thinking it's showing weakness in the lower bits in general, not just bit 0. It's more obvious in the bigger word sizes.
I'll do some more testing ...
Variant Key:
- "Word1" is the full summed word size minus the lsb, for s16 (Xorshift32+/Xoroshiro32+) that is the top 15 bits [15:1].
- "Word2" is the full summed word size minus two lsb, for s16 that is the top 14 bits [15:2].
- "Byte08" is the most significant 8 bits of the summed word, for s16 that is bits [15:8]. Label changes per position.
- "Byte04" is half way down the summed word, for s16 that is bits [11:4]. Label changes per position.
- "Byte2" is always bits [9:2] for all word sizes.
- "Byte1" is always bits [8:1] for all word sizes.
- "Bit" is msb.
I think ours will probably be used for bits 15:01 and for bits 15:08. So, (14,2,7) looks the best for that, and that's what's in the Verilog.
The top 15 bits of the 16-bit sum 'are high-quality pseudo-random data' but bit 0 is poor quality. A 16-bit addition gives a 17-bit result and I was wondering whether bit 16 is more random than bit 0 and could replace it?
How many hours did we all spend testing this and that here?
One of my new 16GB DIMMs failed and there is no exact replacements in stock. I've got the full credit for them (Returned the pair) and can claim it as a refund, but the options for replacement are slim pickings.
I'm back using me old PC for the moment, and the remainder of the year looks bleak.
Thanks Evan. Before you forget even more, could you please explain how the most random triplet combination you found of 14,2,7 translates into Chip's logic below?
Does this algorithm work for arbitrary data widths? Specifically, could there be a xoro34?
Whole point of xoro34 is to produce 16 bits of high-quality pseudo-random data on a P2-like CPU some time in the future.
As for how the final triplet combination was chosen, here's where I came into sync with what Chip was after - http://forums.parallax.com/discussion/comment/1407995/#Comment_1407995 You can see there why Chip picked out 14,2,7 with its even quality in all three sampling variants. I'd sort of favoured 10,5,13 with its higher full word quality but wasn't pushing it due to its non-conforming triplet combination.
Prior to that, I'd kind of been off on my own mission not realising that starting with known full repeat lengths was pretty important. Heater beat me up about that.
Thanks again Evan. I've been reading all the earlier posts in this thread today once again, which I should have done before. Sorry to all for asking newbie-style questions.
Ah, but is it? Can we please put an end to these malicious rumours that the P2 is a 32-bit processor? It's 34-bit!
XORO32 (existing)
XORO34 (possible future option?)
There's a bit more to seeding than shown as c & z are involved, but not very much.
Presumably 14,2,7 for xoro32 translates to 15,3,8 for xoro34?
I wonder how that performs in randomness tests? Or is there a better set?
I did some full run tests on many word sizes. But only the s16 size was done with the latest scoring variations. You can find some s17 (Xoroshiro34) results here - http://forums.parallax.com/discussion/comment/1410683/#Comment_1410683 They aren't very pretty, the odd sizes were that way. I'll do some more testing with the extra variants to see if anything sticks out ...
Best Xoro34 sets appear to be:
Are any of these max length? Any other possible sets?
For the comparison with Xoro32 to be fair, Xoro34_sum[15:1] should be tested as well as Xoro34_sum[15:0].
Finding all the full period triplet candidates is where the long search times come from. The PractRand quality testing is short in comparison. Beyond s20 (Xoroshiro40) it became weeks per attempt on my 8-core CPU. I had thought about trying to use CUDA or something to help out, but never got around to it.
PS: My early testing was just doing quality testing of groups of arbitrary triplets. Some got very high scores, but they weren't full period triplets.
Clarification: I keep reverting back to the term "full length repeat", I note you've used the term "max length". The researchers I read seem to use the term "full period". I'll try to stick to "full period" naming.
If so, I'm really keen to know how Xoro34_sum[15:1] perform.
The more long period triplets there are in a particular word size the longer the whole search takes. This is amplified massively by the raised to power X nature of the word size.
The researchers that do the basic development also devise methods for jumping ahead in the PRNG state sequence. That way they can find (probably in some statistically fashion) full period candidates even when the period is enormous. Not something I've even come close to comprehending.
http://forums.parallax.com/discussion/comment/1407299/#Comment_1407299
Any triplets that failed were ruled out immediately.
Change to 2^34-1 and we're good to go?