I happened to do a quick run of that last night. Attached is the full set of tests. {8,2,3} is the only test that makes it to 64 MB.
There are only 16M iterations in 24 bits, so the random tester must have clued into the pattern after the 4th full cycle. What else could explain that?
Ha, I hadn't thought about it. Well, there are signs, three uncommonly suspicious cases, of an impending failure at 32 MB. It's common to see the DC6-9x1Bytes-1 case showing up early, but not those other three.
Now I really don't get it. With a Schnider 32-bit LFSR, or similar, you get a 2^32 - 1 sequence and surely the "quality" is plenty good enough.
I was going to say, "Actually, no. You get a 2**27 -1 sequence, since you have to iterate 32 times to get a random 32-bit value. Otherwise, patterns show up. As a consequence, given a starting seed, 31/32 of the potential values are never seen in the sequence."
But that's not true, due to the "-1" (zero not being in the sequence). So, yeah, the non-repeating sequence is 2**32 - 1 long.
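For reference, a "one new bit per clock" maximal-length LFSR of the sort being discussed could be sketched in C as below. The tap set {32,22,2,1} is just one known maximal-length choice and is an assumption here, not necessarily the particular LFSR Phil means.

#include <stdint.h>
#include <stdio.h>

/* 32-bit Galois LFSR, right-shifting, one pseudo-random bit per clock.
   Mask 0x80200003 corresponds to x^32 + x^22 + x^2 + x + 1, a known
   maximal-length polynomial (assumed here).  The sequence length is
   2^32 - 1; the all-zero state never occurs. */
static uint32_t lfsr = 1;   /* any non-zero seed */

static uint32_t lfsr_bit(void) {
    uint32_t out = lfsr & 1u;
    lfsr >>= 1;
    if (out)
        lfsr ^= 0x80200003u;
    return out;             /* one new bit per clock */
}

/* "Iterate 32 times to get a random 32-bit value": one bit per clock. */
static uint32_t lfsr_word(void) {
    uint32_t w = 0;
    for (int i = 0; i < 32; i++)
        w = (w << 1) | lfsr_bit();
    return w;
}

int main(void) {
    for (int i = 0; i < 4; i++)
        printf("%08lX\n", (unsigned long)lfsr_word());
    return 0;
}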
I would like to see this tested just using the MSB of the sum as input to the test suite. Evanh, would you mind doing that? Just testing the MSB of the sum? It would take 8 iterations for each byte of output. I wonder what the failure point would be, then. I suspect it might fully iterate through all of its 2^32-1 states before failing, in that case.
Oh, ah, the B in MSB is Byte, right?
So you want to truncate each 12-bit sum to the top 8 bits and spit those individually at PractRand rather than my existing concatenation, correct?
I was assuming you did that on the 64MB-pass case. Did you do something else? I assumed you did the xoroshiro24+ iteration and passed out the top 8 bits of the 12+12 bit sum.
In my question above, I was wondering how far the best xoroshiro32+ would pass by only outputting the top bit (not byte) of the sum. You'd have to do eight iterations to get each byte of output.
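To make that concrete, here's a minimal C sketch of the bit-truncated arrangement: a xoroshiro32+ iteration (16-bit state halves) with only the MSB of each sum kept, so eight iterations per output byte. The triple {14,2,7}, the seed, and the output plumbing are illustrative assumptions, not evanh's actual harness.

#include <stdint.h>
#include <stdio.h>

static uint16_t rotl16(uint16_t x, int k) {
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

/* State and shift triple {a,b,c}; {14,2,7} is one of the candidates
   discussed later in this thread, used here only as an example. */
static uint16_t s0 = 1, s1 = 0;
enum { A = 14, B = 2, C = 7 };

/* One xoroshiro32+ iteration: returns the 16-bit sum s0 + s1. */
static uint16_t xoro32p_next(void) {
    uint16_t sum = (uint16_t)(s0 + s1);
    s1 ^= s0;
    s0 = (uint16_t)(rotl16(s0, A) ^ s1 ^ (uint16_t)(s1 << B));
    s1 = rotl16(s1, C);
    return sum;
}

/* Bit-truncated output: keep only the MSB of each sum, so eight
   iterations are needed for every byte fed to the tester. */
static uint8_t next_msb_byte(void) {
    uint8_t out = 0;
    for (int i = 0; i < 8; i++)
        out = (uint8_t)((out << 1) | (xoro32p_next() >> 15));
    return out;
}

int main(void) {
    /* Stream raw bytes to stdout for piping into PractRand; the exact
       RNG_test invocation isn't shown here. */
    for (;;) {
        uint8_t b = next_msb_byte();
        if (fwrite(&b, 1, 1, stdout) != 1)
            return 0;
    }
}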
Here's the byte-truncated version - throwing away the bottom 4 bits of each generated number. You can see all test scores are smaller, with {8,2,3} now failing at 8 MB instead of 64 MB. That's smaller than I expected but it's still one of the best scores of that run. Oops, just a moment ... I was throwing out the top 3 bits and the bottom bit. Corrected now, and {11,2,4} stands out as better than {8,2,3}, with HQ up to 8 MB and failure at 16 MB.
I'm off to a meal so will be a bit longer with the bit truncated version ...
Evanh, thanks for running all those tests. It's interesting that you are not seeing failure up until 16MB. That kind of suggests that among known max-length shift settings, there may be one to make it all the way to 16MB. Extracting 1 byte per iteration means that its entire iteration range is almost perfect at 24 bits. This probably means that among those max-length xoroshiro32+ settings I found, one of them can go all the way to 4GB when extracting the top byte of the sum (as opposed to the top 15 bits), which is the iteration range for 32 bits. It would be ideal to get one of those xoroshiro32+ settings nailed down. I had 86 candidates there. Uh, what do you say?
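For reference, the arithmetic behind those figures, assuming one output byte per iteration and ignoring the missing all-zero state, is roughly: 2^24 iterations x 1 byte = 16 MB of output for the 24-bit generator, and 2^32 iterations x 1 byte = 4 GB for the 32-bit one.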
I will say I'm not seeing any fixed pattern to the scores of the truncations. In fact, sometimes the quality of the bit-sized truncation seems on par with byte-sized in terms of megabytes processed. As far as I can see, this implies that repeats are not being detected by the PractRand test suite.
I guess it's all statistical. It's not like the whole data stream is kept and compared over its entire length all the time.
With the LSB truncation shifted out of the generator function, I've decided to change my labelling of the filenames. Where I used to label them as size15 and size11, they're now s16 and s12 respectively.
I've rerun a set of each variant of each size and packaged them all together (I had to change to 7zip to fit the whole lot within the forum's 2 MB limit; 7zip halved the archive size!).
It would be ideal to get one of those xoroshiro32+ settings nailed down. I had 86 candidates there. Uh, what do you say?
Since there wasn't much of a match between quality and repeat length at the full length of generated numbers, it might be worth comparing your 86 candidates with the best of the bit truncated set. It might be possible to spot repeat length as the failure mode there. Admittedly I've not provided every possible combination though.
EDIT: Doh! I've found your posted list of 86. I hadn't realised what that was before ...
This is what I get after filtering for A>C>B (I'm assuming that's a minimum requirement):
6,2,3
6,2,5
8,1,7
9,1,4
10,2,7
11,1,2
11,2,6
11,3,10
11,7,10
12,8,11
12,10,11
13,5,8
13,5,10
13,10,12
14,1,11
14,2,7
14,2,9
14,3,13
14,8,9
14,11,13
14,12,13
15,1,2
15,3,6
15,4,12
15,7,8
I'm running the full s16 combination sweep of 3375 tests for each variant now. Might take a while ...
Of note, the bit-truncated variant is munching through the combinations at half the speed of the other two. It has a lot of large scores so far. I'm guessing the odd combinations of a-b-c don't upset the results for this variant the same way.
Byte-truncated seems to have the most small scores. It is currently about 20-25% faster than the full word 15-bit variant.
PS: Also started the same full combination sweep for the three s12 variants. This is 1331 combinations each.
PPS: With one still finishing up an old size25 run, there are now seven Bash scripts concurrently compiling and testing. I can feel the warm air from me Ryzen rig! I've never done anything like this before.
I've installed Mumble now. It seems easy enough, there's lots of open servers automatically registered.
Last time I did any VoIP stuff was with Teamspeak in 2006 when I was playing Eve.
I installed some mumble thing, but it doesn't seem to find any servers.
PS: Mumble version is 1.2.18
Hmm, the server I'm currently sitting on is called Mumble.co.nz (has low ping time for me, and user creatable channels). Port 64738. Maybe you can connect to it manually. I've created a Prop2 channel.
I'm running the full s16 combination sweep of 3375 tests for each variant now...
Of the 86, four had zeros, so really only 82.
The quality difference between the three variants isn't significant, imho.
The full-sweep bit-truncated variant produced an impressive 236 (of 3375) scores that failed at 1 GB. 63 of these 236 map onto the 82. Not perfect, but way closer than the other two variants.
The byte-truncated variant produced 2 scores that failed at 2 GB, 11 at 1 GB, and 47 at 512 MB. 11 of these 60 map onto the 82.
The full 15-bit variant produced 19 scores that failed at 2 GB and 65 at 1 GB. A mere 7 of these 84 map onto the 82.
I've edited {10,5,13} to show the 512 MB entry for the bit-truncated column. It doesn't conform to the original 128+ configuration but does look good on the scores. Another contender would be {14,2,7}; it conforms and is nicely balanced.
Evanh, thanks so much for doing all these tests! I've only got my phone today, so I can't get into any zip files, but your last big list is very helpful.
The reason those max-run-length settings are critical to stick with is that they WILL cycle through all non-0 states, never getting hung up in some short loop due to some unfortunate seed value. I did cut my test somewhat short, though, so it's possible that some of your higher-scoring settings are also max-run-length.
The way I discovered those max-run-length settings was to use a seed of $0000_0001, then iterate until $0000_0001 reappears, counting iterations as I go. If $0000_0001 first reappears after exactly $FFFF_FFFF iterations, you've got a max-run-length setting that cycles through every non-0 value before repeating. If you are not back to $0000_0001 after $FFFF_FFFF iterations, the PRNG got hung up in a loop and those shift settings are not ideal.
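A minimal C sketch of that check might look like the following; the 1..15 shift range and the plain-text output are assumptions about the harness, not Chip's exact code, and a single-process sweep like this is slow (the thread later splits the work across 15 processes).

#include <stdint.h>
#include <stdio.h>

static uint16_t rotl16(uint16_t x, int k) {
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

/* Count iterations until state {s0=1, s1=0} reappears.  A max-run-length
   triple returns exactly 0xFFFFFFFF (every non-0 state visited).  The
   count also stops at 0xFFFFFFFF as a safety cutoff; as noted further
   down the thread, that roll-over case never actually fires. */
static uint32_t period_of(int a, int b, int c) {
    uint16_t s0 = 1, s1 = 0;
    uint32_t n = 0;
    do {
        uint16_t t = (uint16_t)(s1 ^ s0);
        s0 = (uint16_t)(rotl16(s0, a) ^ t ^ (uint16_t)(t << b));
        s1 = rotl16(t, c);
        n++;
    } while (!(s0 == 1 && s1 == 0) && n != 0xFFFFFFFFu);
    return n;
}

int main(void) {
    /* Sweep the 15^3 candidate triples and list only the full-period ones. */
    for (int a = 1; a <= 15; a++)
        for (int b = 1; b <= 15; b++)
            for (int c = 1; c <= 15; c++)
                if (period_of(a, b, c) == 0xFFFFFFFFu)
                    printf("%d,%d,%d\n", a, b, c);
    return 0;
}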
It's interesting that regardless of how you extract the random values, you get nearly the same quality, even though the numbers of iterations are up to 15x more for single-bit vs. 15-bit.
It looks like 14,2,7 is the best, overall, doesn't it?
This is what I get after filtering for A>C>B (I'm assuming that's a minimum requirement): ...
I don't know that that's a requirement, but maybe it is.
Evanh, thanks so much for doing all these tests! I've only got my phone today, so I can't get into any zip files, but your last big list is very helpful.
Compliments do feel good.
As for those zip files, the dump of scores probably doesn't have much value. I was still meandering at that stage.
... I did cut my test somewhat short, though, so it's possible that some of your higher-scoring settings are also max-run-length.
I'll take a shot at that too ...
It looks like 14,2,7 is the best, overall, doesn't it?
Yep, I like that one.
Isn't it funny how all this thought and effort go into coming up with a sequence of three 4-bit numbers?
If anyone ever looks at xoroshiro128+ and wonders about a 32-bit version, they will search for xoroshiro32 and wind up reading this thread, where they will find an optimal implementation.
We are somewhat reliant on PractRand giving meaningful answers without understanding how it does so.
I do have the source code, but it's rather big and complicated and full of C++. I find C++ has this natural obfuscation; there is so much squishiness to the data types and their operators. The levels of indirection and hidden data copying just swamp me.
Not that I've ever really tried that hard. I'm comfortable with C.
EDIT: And I don't code for a job these days. PLCs and ladder logic are as far as I've ever needed to go there. That said, learning C was an on-the-job, self-taught crash course when the opportunity arose ... so it often seems to be necessity driven.
EDIT2: That opportunity actually sprang from prior success with PLCs. We wanted a big colour touchscreen in the 1990s, when they were all really expensive in the industrial market space. So, using a raw laptop LCD (quite hard to source in small numbers back then) plugged into a PC on an ISA card, we cobbled together something close to what is now called a Panel-PC. Everything was folded and painted sheet steel. That's when I really learned C properly.
... I did cut my test somewhat short, though, so it's possible that some of your higher-scoring settings are also max-run-length.
I'll take a shot at that too ...
Took me maybe three or four hours of tinkering and three or four minutes to run. I was running it as a single task but decided to make a script that launched 15 processes in rapid fire to make use of the hardware. Attached is a screenshot, taken just after it finished, showing the ramp-down of processor utilisation (bottom right of the display) due to differing run times.
As for the result, I get 84 hits without any zeros. That's two extra: {7,2,14} and {11,11,14}. All others are a perfect 1:1 match.
The iterator loop can terminate on two criteria: when (s0 == 1 AND s1 == 0), and also on the, supposedly impossible, roll-over at 2^32 iterations. I thought I'd see if there are any cases where the impossible does occur.
Modifying the print criteria, for what gets listed in the report, gives empty reports! And I double-checked by tacking on a single end-of-line upon program completion, which shows up as a file size of one. Yay, no impossibles occurred.
Range was 1-15, I didn't do the zeros. So 15^3 = 3375 sets total, but 15^2 = 225 per task.