As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.
Now is the time, if we want to add an adder stage to the XOROSHIRO128+, like we did for the XORO32++.
Is there much risk in using the guideline given about a prime rotation value near 32?
Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the files to On Semi.
I just sent On Semi what are, hopefully, the final Verilog files.
Is there much risk in using the guideline given about a prime rotation value near 32?
XORO32 experience suggest it's okay, assuming the original triplet is truly a good quality candidate. The [14 2 7] triplet produced a wide group of good scores.
The outcome is likely a big improvement no matter what, so I guess on that basis you are good to do it. But no guarantees.
Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the file to On Semi.
Actually, I'll run more selections of 32-bit widths first. That's what the Cogs use after all ...
It would be best to test the actual 32-bit subsets used by GETRND D or 8-bit (?) subsets used for dithering.
I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
length= 256 gigabytes (2^38 bytes), time= 4539 seconds
Test Name Raw Processed Evaluation
[Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
...and 1567 test result(s) without anomalies
Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.
PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.
I told her about the xoroshiro++ algorithm on February 22nd.
Great, Pseudo Random Number Generator wars!
As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.
No, just "not-invented-here" or more particularly "not-invented-by-me" syndrome, to a significant extent. xoroshiro+ is old hat now, anyway.
I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
length= 256 gigabytes (2^38 bytes), time= 4539 seconds
Test Name Raw Processed Evaluation
[Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
...and 1567 test result(s) without anomalies
Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.
PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.
Both have just breezed passed 16 TB ... another 3 days 'till 32 TB mark. Not sure how that conforms, I expected at least the [33:2] to fail at or before the earlier [57:2] run.
I had planned on reflashing the BIOS this weekend too. That sucks a little now cos it might be some weeks.
No I cancelled that so it wouldn't slow these two down. I guess now that they haven't failed where I expected them to they can be lowered in priority.
The short run test, [7:0] sampling is usually a weak point, that I did do of the new algorithm satisfied me it is good to go. I'll do a short run full width 64-bit run now to double check that ...
Seba has confirmed our test outputs for the new and much improved xoroshiro128. So new it hasn't been made public yet. All 64 bits of the output are very good and are used equally by the bit scramblers. Previously with xoroshiro128+ the top 62 bits were used unequally and the weak lowest two bits ignored. Everything is fine for xoroshiro128.
Not necessarily so for XORO32. A little while ago we switched algorithms to xoroshiro32++ [14,2,7,5] from [14,2,7,6]. The only difference is a left rotation of 5, not 6, in the PRN calculation. I asked Seba to check [14,2,7,5] with BigCrush, which is one of the battery tests in TestUI, an alternative random number tester to PractRand that Evan has been using. He says we should not rely on a single test program.
The main batteries in the TestUI suite are SmallCrush, Crush and BigCrush, which are progressively more stringent and time-consuming. BigCrush results for [14,2,7,5] were much better than [14,2,7,6]. However, they were also totally wrong as xoroshiro64++ [14,2,7,5] was tested by mistake.
Unfortunately I can't compare like-for-like. Seba told me that [14,2,7,6] passes all tests in Crush. I really need to see corresponding report files to be 100% sure but as things stand I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.
I'm rerunning the same candidates with and without extended testing right now ...
PS: Seba's earlier scores for [14 2 7 x] would have both been for Xoroshiro64++ not for 32++. My guess is we have no idea how good Xoroshiro32++ [14 2 7 6] scores are on the Crush tests.
Evanh, I hope our new XORO32 is optimal. I don't want to force a change with OnSemi over something like that.
TonyB_, any change of opinion?
Chip, Seba said he would run some more Crush and BigCrush tests and we'll have the output files to make a proper comparison between the two. Results not here yet and might not be now until tomorrow. I hope our XORO32 is optimal and we haven't gone backwards by switching.
Evan, can you post your extended results here, now that xoroshiro++ is not a secret any more? The problem I have is that the best two quadruples of [5,2,6,9] and [6,2,5,9] do badly on repeat frequency, which makes me doubt the worth of the extended tests. [14,2,7,5] and [14,2,7,6] do well on repeat frequency.
Chip,
This is certainly not something that requires intervention. The differences are nit picking.
Having said that, it's even murkier at this end now. It would seem I was only just starting my foray into the extended options of PractRand when I generated that score table. It didn't include the -tf 2 option.
Here's a repost of that truncated table, plus I've added the equivalent table for standard options and also for -te 1 -tf 2 extended options:
Chip,
This is certainly not something that requires intervention. The differences are nit picking.
It's more than nit-picking if [14,2,7,6] passes all the Crush tests with [14,2,7,5] failing 5. Or if [14,2,7,5] has fewer passes in BigCrush than [14,2,7,6]. The failures are due to the small state, if the generator is good, but that applies to PractRand as well.
P.S. xoroshiro[x]y++ is in the text in the output files.
Comments
As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.
Notice the length of Melissa's PractRand tests:
xoroshiro64+ [31:16]
Failed after 14 hours but kept running for 210 days!
xoroshiro128+ [63:32]
Failed after 101 days!
Evan's [55:0] & [57:2] tests failed after 3 days
I had to rewrite it, to reveal its regularity, in order to manipulate it confidently:
This looks right to me.
Is there much risk in using the guideline given about a prime rotation value near 32?
Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the files to On Semi.
I just sent On Semi what are, hopefully, the final Verilog files.
XORO32 experience suggest it's okay, assuming the original triplet is truly a good quality candidate. The [14 2 7] triplet produced a wide group of good scores.
The outcome is likely a big improvement no matter what, so I guess on that basis you are good to do it. But no guarantees.
Agreed.
length= 256 gigabytes (2^38 bytes), time= 4539 seconds
Test Name Raw Processed Evaluation
[Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
...and 1567 test result(s) without anomalies
Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.
PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.
More confirmation that new XORO32 working as intended.
The new Verilog looks much cleaner.
No, just "not-invented-here" or more particularly "not-invented-by-me" syndrome, to a significant extent. xoroshiro+ is old hat now, anyway.
PM just sent.
I found the bug. It had to do with SETQ2+WRLONG (writing LUT longs to hub). All fixed now. New files going off to OnSemi.
I had planned on reflashing the BIOS this weekend too. That sucks a little now cos it might be some weeks.
RAM usage is now over 15 GB between them.
Wait... You mean you are also running the NEW one now? That would be good.
The short run test, [7:0] sampling is usually a weak point, that I did do of the new algorithm satisfied me it is good to go. I'll do a short run full width 64-bit run now to double check that ...
Not necessarily so for XORO32. A little while ago we switched algorithms to xoroshiro32++ [14,2,7,5] from [14,2,7,6]. The only difference is a left rotation of 5, not 6, in the PRN calculation. I asked Seba to check [14,2,7,5] with BigCrush, which is one of the battery tests in TestUI, an alternative random number tester to PractRand that Evan has been using. He says we should not rely on a single test program.
The main batteries in the TestUI suite are SmallCrush, Crush and BigCrush, which are progressively more stringent and time-consuming. BigCrush results for [14,2,7,5] were much better than [14,2,7,6]. However, they were also totally wrong as xoroshiro64++ [14,2,7,5] was tested by mistake.
I have been sent the following report files:
SmallCrush [14,2,7,5] - 1 test failed (of 10)
Crush [14,2,7,5] - 5 tests failed (of 96)
BigCrush [14,2,7,6] - 11 tests failed (of 106)
Unfortunately I can't compare like-for-like. Seba told me that [14,2,7,6] passes all tests in Crush. I really need to see corresponding report files to be 100% sure but as things stand I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.
I'm not in favour of reverting.
I left the full width sampling test of the updated Xoroshiro128 running overnight. It went past 2 TB without issue.
EDIT: To put that in context, under same test configuration, the original Xoroshiro128+ falls over at a tiny 4 MB! See https://forums.parallax.com/discussion/comment/1434165/#Comment_1434165
TonyB_, any change of opinion?
PS: Seba's earlier scores for [14 2 7 x] would have both been for Xoroshiro64++ not for 32++. My guess is we have no idea how good Xoroshiro32++ [14 2 7 6] scores are on the Crush tests.
Chip, Seba said he would run some more Crush and BigCrush tests and we'll have the output files to make a proper comparison between the two. Results not here yet and might not be now until tomorrow. I hope our XORO32 is optimal and we haven't gone backwards by switching.
Evan, can you post your extended results here, now that xoroshiro++ is not a secret any more? The problem I have is that the best two quadruples of [5,2,6,9] and [6,2,5,9] do badly on repeat frequency, which makes me doubt the worth of the extended tests. [14,2,7,5] and [14,2,7,6] do well on repeat frequency.
This is certainly not something that requires intervention. The differences are nit picking.
Having said that, it's even murkier at this end now.
Here's a repost of that truncated table, plus I've added the equivalent table for standard options and also for -te 1 -tf 2 extended options:
Xoroshiro32(16)++ PractRand Score Table. PractRand v0.93 options: -multithreaded -te 1 -tlmin 1KB __________________________________________________________________________________________________________ Sampling apertures of the generator output, labelled as most to least bit-significance Candidate ---------------------------------------------------------------------------------------- [ A B C D] 15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03 9:02 8:01 7:00 ========================================================================================================== [ 5 2 6 9] 1G 4G 2G 2G 1G 2G 2G 1G 1G 1G 2G 1G 1G 1G [ 6 2 5 9] 1G 4G 2G 4G 1G 2G 1G 2G 1G 1G 1G 1G 1G 2G [14 2 7 5] 512M 4G 4G 2G 4G 1G 1G 1G 1G 1G 1G 1G 1G 1G [14 2 7 6] 512M 8G 2G 4G 1G 2G 1G 256M 256M 2G 512M 512M 512M 512M [15 4 12 9] 512M 16G 8G 8G 2G 1G 1G 512M 512M 512M 512M 1G 1G 1G Xoroshiro32(16)++ PractRand Score Table. PractRand v0.93 options: -multithreaded -tlmin 1KB __________________________________________________________________________________________________________ Sampling apertures of the generator output, labelled as most to least bit-significance Candidate ---------------------------------------------------------------------------------------- [ A B C D] 15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03 9:02 8:01 7:00 ========================================================================================================== [ 5 2 6 9] 1G 8G 16G 8G 4G 2G 2G 1G 2G 2G 2G 8G 2G 1G [ 6 2 5 9] 1G 8G 8G 8G 4G 2G 2G 2G 2G 2G 2G 2G 1G 2G [14 2 7 5] 512M 16G 16G 8G 16G 2G 2G 1G 1G 2G 2G 2G 2G 2G [14 2 7 6] 512M 16G 8G 16G 8G 4G 2G 1G 512M 2G 1G 1G 1G 2G [15 4 12 9] 512M 16G 16G 16G 8G 1G 1G 512M 512M 1G 1G 1G 1G 1G Xoroshiro32(16)++ PractRand Score Table. PractRand v0.93 options: -multithreaded -te 1 -tf 2 -tlmin 1KB __________________________________________________________________________________________________________ Sampling apertures of the generator output, labelled as most to least bit-significance Candidate ---------------------------------------------------------------------------------------- [ A B C D] 15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03 9:02 8:01 7:00 ========================================================================================================== [ 5 2 6 9] 1G 4G 2G 2G 1G 2G 2G 1G 1G 1G 2G 1G 1G 1G [ 6 2 5 9] 1G 4G 2G 4G 1G 2G 1G 256M 1G 1G 1G 1G 1G 2G [14 2 7 5] 256M 4G 4G 2G 4G 1G 1G 1G 1G 1G 1G 1G 1G 1G [14 2 7 6] 512M 8G 2G 4G 2G 2G 1G 256M 256M 2G 512M 512M 512M 512M [15 4 12 9] 256M 16G 8G 8G 2G 1G 1G 512M 512M 512M 1G 1G 1G 1GNotably, with full extended options, [14 2 7 5] degrades to 256 MB on the full width sampling.It's more than nit-picking if [14,2,7,6] passes all the Crush tests with [14,2,7,5] failing 5. Or if [14,2,7,5] has fewer passes in BigCrush than [14,2,7,6]. The failures are due to the small state, if the generator is good, but that applies to PractRand as well.
P.S. xoroshiro[x]y++ is in the text in the output files.
I'm confident we haven't seen any Crush scores for [14,2,7,6], and that it won't be significantly different to [14,2,7,5].