As an independent check, because I can't use PractRand for various reasons, I thought of a simple test: how often an output is repeated, i.e. two successive outputs the same. For xoroshiro32 with 32-bit state and 16-bit output from 0-65535, the probability of a repeat is 1/65536 and the expected total number of repeats is (2^32-1)/65536 = 65536.
It would be most unrandom if every output repeated exactly once, some will repeat twice or more and others never. If the output is truly random there should be a binomial distribution of repeats, with relative frequencies as follows:
As can be seen, [5,2,6,9] repeat frequency is bad and yet it was the best performer in extended PractRand tests.
[6,2,5,9] is similar to [5,2,6,9] but I didn't save it.
At first Seba suggested exactly half rotation for constant d, 8 in this case, but he changed his mind after running some tests. The repeat frequency confirms this, although the worst d might not be 8 if one of the other constants is 8. Worst d +/- 1 are usually not great.
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: xoroshiro[16]32++-14-2-7
Number of statistics: 144
Total CPU time: 00:29:09.44
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
1 SerialOver, t = 2 1 - 5.1e-5
2 SerialOver, t = 4 1.3e-8
42 MaxOft, t = 10 1 - 1.1e-6
43 MaxOft, t = 20 1 - 2.7e-5
44 MaxOft, t = 30 1 - 5.7e-9
----------------------------------------------
All other tests were passed
xoroshiro[16]32++ [14,2,7,6] Crush
========= Summary results of Crush =========
Version: TestU01 1.2.3
Generator: xoroshiro[16]32++-14-2-7
Number of statistics: 144
Total CPU time: 00:35:46.09
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
1 SerialOver, t = 2 1 - eps1
41 MaxOft, t = 5 0.9992
44 MaxOft, t = 30 1 - 9.6e-6
----------------------------------------------
All other tests were passed
xoroshiro[16]32++ [14,2,7,5] BigCrush
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: xoroshiro[16]32++-14-2-7
Number of statistics: 160
Total CPU time: 03:12:04.36
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
2 SerialOver, r = 22 1.1e-4
34 Gap, r = 0 eps
35 Gap, r = 25 eps
36 Gap, r = 0 eps
37 Gap, r = 20 eps
46 MaxOft, t = 8 1 - eps1
47 MaxOft, t = 16 1 - eps1
48 MaxOft, t = 24 1 - eps1
49 MaxOft, t = 32 1 - eps1
65 SumCollector 8.3e-4
----------------------------------------------
All other tests were passed
xoroshiro[16]32++ [14,2,7,6] BigCrush
========= Summary results of BigCrush =========
Version: TestU01 1.2.3
Generator: xoroshiro[16]32++-14-2-7
Number of statistics: 160
Total CPU time: 03:14:37.20
The following tests gave p-values outside [0.001, 0.9990]:
(eps means a value < 1.0e-300):
(eps1 means a value < 1.0e-15):
Test p-value
----------------------------------------------
1 SerialOver, r = 0 eps
2 SerialOver, r = 22 1 - 1.7e-5
34 Gap, r = 0 9.1e-11
35 Gap, r = 25 eps
36 Gap, r = 0 eps
37 Gap, r = 20 eps
46 MaxOft, t = 8 1 - 4.6e-15
47 MaxOft, t = 16 1 - eps1
48 MaxOft, t = 24 1 - eps1
49 MaxOft, t = 32 1 - eps1
65 SumCollector 7.4e-4
----------------------------------------------
All other tests were passed
I've told Seba the P2 design is finished and the first chip will have 1 * xoroshiro128 + 8 * xoroshiro32++ [14,2,7,5] so we can't change it now!
Seba has helped us a lot via email and I've thanked him and David for that and their xoroshiro algorithms, on behalf of all of us. [Sebastiano Vigna & David Blackman.]
* * * * * * * * * *
Chip,
Seba would like to mention in his forthcoming paper that the new xoroshiro128 algorithm is/will be in a processor. He asked me how to quote Parallax and the P2 correctly.
Notably, with full extended options, [14 2 7 5] degrades to 256 MB on the full width sampling.
The main reason for the switch to [14,2,7,5] is its consistency. All nine 8-bit samples above have the same score. Evan, is it worth trying the five 12-bit groups? Also, is your new xoroshiro128 test still running?
Parallax has had permission to use xoroshiro++ without any strings attached since last November. I've just asked Seba to confirm the same applies to the new xoroshiro128.
The only formal declaration I've found is in the older Xorshift source code:
/* Written in 2014-2016 by Sebastiano Vigna (vigna@acm.org)
To the extent possible under law, the author has dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.
Tony,
Does Seba have a handle on why we have stuck with Xoroshiro?
Reading Melisa's blog, it is very clear that multipliers, using a full-width constant, feature highly in the high quality PRNG algorithms. A full blown multiplier is way beyond what we can handle so making do with adders is doing well to get good quality.
Chris did indicate there is better using just adders and gave an example. But it needed a bigger adder circuit than what XORO32 was using. Chris may have been able to refine it more, dunno. In hindsight we didn't pursue that option as much as we probably should have.
The good news is we have gone most of the way by upgrading XORO32 to the dual 16-bit adders arrangement. This is still a decent logic saving over having a single 32-bit adder.
The free-running GETRND has also newly got a similar update to dual 64-bit adders. It now has a two stage internal circuit, ie: The random data results are always one clock lagging the state store register.
Another factor I guess is the pack-of-cards type randomness. I think most algorithms have many holes in state space utilisation (period is nerf'd and potentially poor distribution). Xorshift/Xoroshiro have state iteration separated from "hash"ing. My take of this separation is it allows for independent tuning of the randomness while still fully utilising, or not, the state space.
... Evan, is it worth trying the five 12-bit groups? Also, is your new xoroshiro128 test still running?
Currently three 128's running. Two are 32-bit sampling of original algorithm while the third is 8-bit, [7:0], sampling of the Prop2's algorithm.
They share the CPU cores a little. Max's out at about 5.3 hardware threads (virtual cores) used per running test case. Doesn't push much above 13 threads all combined though.
As for further XORO32 testing, recent piecemeal testing has been achieved with newly added scripts that can semi-automate the testing for smaller test sets. I used one yesterday to make the partial score tables with. These new scripts don't do the CPU load checking, so I have to be careful not to fire off too many test cases at once ...
Tony,
Does Seba have a handle on why we have stuck with Xoroshiro?
Reading Melisa's blog, it is very clear that multipliers, using a full-width constant, feature highly in the high quality PRNG algorithms. A full blown multiplier is way beyond what we can handle so making do with adders is doing well to get good quality.
Chris did indicate there is better using just adders and gave an example. But it needed a bigger adder circuit than what XORO32 was using. Chris may have been able to refine it more, dunno. In hindsight we didn't pursue that option as much as we probably should have.
The good news is we have gone most of the way by upgrading XORO32 to the dual 16-bit adders arrangement. This is still a decent logic saving over having a single 32-bit adder.
The free-running GETRND has also newly got a similar update to dual 64-bit adders. It now has a two stage internal circuit, ie: The random data results are always one clock lagging the state store register.
I think Seba is really pleased his and David's algorithms will be implemented in a microprocessor. Aren't you glad we've stuck with xoroshiro, after doing thousands of tests on it? It's far too late to change now, anyway, we're done.
The two different xoroshiro algorithms in the P2 are state-of-the-art and yet don't need a great deal of logic. I think we've said as much as we can or should about the new xoroshiro128 until it is made public. We are lucky to know about either algorithm, to be honest. After I put a message on his forum about our version of the parity trick to improve xoroshiro+, Seba mentioned xoroshiro++ at the end of an email as he was curious to know whether there would be enough instruction time for the P2 to do it.
There was more good fortune with the new xoroshiro128. We started using xoroshiro32++ in November and in the middle of February I enquired when the ++ algorithm would be announced. Seba replied that it never would be, as he had a better generator but 128-bit only. He didn't volunteer this earlier because he knew we were using a 32-bit xoroshiro++ and probably I hadn't told him the P2 also has a 128-bit generator. We might have found out about the new xoroshiro128 only after the P2 had been finished.
... Seba told me about xoroshiro++ after I put a message on his forum about our version of the parity trick to improve xoroshiro+, as he was curious to know whether there would enough time for the P2 to do it.
Was he curious about time to product launch or instruction execution speed?
What I'm trying to make clear here is it's the latter, instruction speed (and compactness), that has been a defining factor all along. For a hardware solution, it's hard to beat Xoroshiro on the speed, simplicity and quality all combined. I figure it is worth clarifying that this is why we started with Xoroshiro and are still using it.
Bare in mind that Melissa is being critical towards Xoroshiro128+'s fame. She has the luxury of a fast 64-bit multiplier instruction to play with, so naturally she can demonstrate higher quality solutions that run just as fast as a software Xoroshiro does. On a software solution basis she is right to point this out.
Obviously she has not been comparing the newer Xoroshiro versions though. So, yep, I am very grateful of Seba giving us our head start.
And then there's Ahle, who was the one that started this whole investigation.
Yes, I've certainly learnt much about PRNGs and their uses as a result of this topic. I was totally green before this.
I presume something else he was doing must have jogged him into thinking about progress with the Prop2 and what was then only the free running generator.
... Seba told me about xoroshiro++ after I put a message on his forum about our version of the parity trick to improve xoroshiro+, as he was curious to know whether there would be enough time for the P2 to do it.
Was he curious about time to product launch or instruction execution speed?
What I'm trying to make clear here is it's the latter, instruction speed (and compactness), that has been a defining factor all along. For a hardware solution, it's hard to beat Xoroshiro on the speed, simplicity and quality all combined. I figure it is worth clarifying that this is why we started with Xoroshiro and are still using it.
Bear in mind that Melissa is being critical towards Xoroshiro128+'s fame. She has the luxury of a fast 64-bit multiplier instruction to play with, so naturally she can demonstrate higher quality solutions that run just as fast as a software Xoroshiro does. On a software solution basis she is right to point this out.
Obviously she has not been comparing the newer Xoroshiro versions though. So, yep, I am very grateful of Seba giving us our head start.
Any further critiques of xoroshiro+ are pointless. It's very old hat.
That's not a fair judgement beyond the Prop2 at this stage. And original Xoroshiro is barely two years old!
Maybe you could say that after the new version is actually published, but even then it would still be worth comparing new and old.
Regarding the age of hats, perhaps it's the criticism that's the headwear? Seba finds it exasperating and Melissa knew there was a better xoroshiro before making her recent blog post.
Seba finds it exasperating and Melissa knew there was a better xoroshiro before making her recent blog post.
I'm sure he can handle it.
It'll be a good comparison that can be looked back on and everyone can say those niggles are solved ... until a tougher test suite is devised, that is .
/* Written in 2014-2016 by Sebastiano Vigna (vigna@acm.org)
To the extent possible under law, the author has dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.
I'm no copyright expert but as far as I know this is problematic. In most of the world, that has bowed down to the almighty Micky Mouse, it is impossible to put your work into the public domain. Unless you are a government or have been dead for many decades.
Copyright protection is granted automatically when one publishes a work. I never found any mechanism by which copyright law allows one to remove that protection and place the work into the public domain. Anyone ever hear of such a thing?
As such a statement like "To the extent possible under law... this software to the public domain..." is useless. There is no extent possible under the law.
This why everything open source is published with one of the billion open source licenses that have been dreamed up in recent years. You can't just say "public domain" you have to license it with terms that grant users the rights you want them to have.
Also, the creative commons link given does not work. CC does say "Currently, Creative Commons does not recommend the Public Domain Mark", for the same reasons I give above. https://creativecommons.org/choose/mark/
Comments
It would be most unrandom if every output repeated exactly once, some will repeat twice or more and others never. If the output is truly random there should be a binomial distribution of repeats, with relative frequencies as follows:
The sum of 1/0!+1/1!+1/2!+1/3!+1/4!... is the number e = 2.71828...
If the repeat total = 65536, the expected distribution with values to the nearest integer is:
The repeat frequency test was intended to help eliminate the worst, not select the best. Some test results to follow ...
Actual values
Repeat frequencies for xoroshiro[16]32++ [5,2,6,d]
Actual-Expected values
Repeat frequencies for xoroshiro[16]32++ [14,2,7,d]
Actual values
Repeat frequencies for xoroshiro[16]32++ [14,2,7,d]
Actual-Expected values
Repeat frequencies for xoroshiro[16]32++
Actual-Expected values
As can be seen, [5,2,6,9] repeat frequency is bad and yet it was the best performer in extended PractRand tests.
[6,2,5,9] is similar to [5,2,6,9] but I didn't save it.
0 * f0 + 1 * f1 + 2 * f2 + 3 * f3 + ... = repeats
f0 + f1 + f2 + f3 + ... = 65536
Pick one at, ahem...
xoroshiro[16]32++ [14,2,7,6] Crush
xoroshiro[16]32++ [14,2,7,5] BigCrush
xoroshiro[16]32++ [14,2,7,6] BigCrush
Tests failed:
Conclusion:
I think we can leave XORO32 as it is.
Seba has helped us a lot via email and I've thanked him and David for that and their xoroshiro algorithms, on behalf of all of us.
[Sebastiano Vigna & David Blackman.]
* * * * * * * * * *
Chip,
Seba would like to mention in his forthcoming paper that the new xoroshiro128 algorithm is/will be in a processor. He asked me how to quote Parallax and the P2 correctly.
The main reason for the switch to [14,2,7,5] is its consistency. All nine 8-bit samples above have the same score. Evan, is it worth trying the five 12-bit groups? Also, is your new xoroshiro128 test still running?
Chip, is the above good enough for you? Something very similar was for xoroshiro++.
Tony,
Does Seba have a handle on why we have stuck with Xoroshiro?
Reading Melisa's blog, it is very clear that multipliers, using a full-width constant, feature highly in the high quality PRNG algorithms. A full blown multiplier is way beyond what we can handle so making do with adders is doing well to get good quality.
Chris did indicate there is better using just adders and gave an example. But it needed a bigger adder circuit than what XORO32 was using. Chris may have been able to refine it more, dunno. In hindsight we didn't pursue that option as much as we probably should have.
The good news is we have gone most of the way by upgrading XORO32 to the dual 16-bit adders arrangement. This is still a decent logic saving over having a single 32-bit adder.
The free-running GETRND has also newly got a similar update to dual 64-bit adders. It now has a two stage internal circuit, ie: The random data results are always one clock lagging the state store register.
It also parallelises the algorithm!
They share the CPU cores a little. Max's out at about 5.3 hardware threads (virtual cores) used per running test case. Doesn't push much above 13 threads all combined though.
As for further XORO32 testing, recent piecemeal testing has been achieved with newly added scripts that can semi-automate the testing for smaller test sets. I used one yesterday to make the partial score tables with. These new scripts don't do the CPU load checking, so I have to be careful not to fire off too many test cases at once ...
I think Seba is really pleased his and David's algorithms will be implemented in a microprocessor. Aren't you glad we've stuck with xoroshiro, after doing thousands of tests on it? It's far too late to change now, anyway, we're done.
The two different xoroshiro algorithms in the P2 are state-of-the-art and yet don't need a great deal of logic. I think we've said as much as we can or should about the new xoroshiro128 until it is made public. We are lucky to know about either algorithm, to be honest. After I put a message on his forum about our version of the parity trick to improve xoroshiro+, Seba mentioned xoroshiro++ at the end of an email as he was curious to know whether there would be enough instruction time for the P2 to do it.
There was more good fortune with the new xoroshiro128. We started using xoroshiro32++ in November and in the middle of February I enquired when the ++ algorithm would be announced. Seba replied that it never would be, as he had a better generator but 128-bit only. He didn't volunteer this earlier because he knew we were using a 32-bit xoroshiro++ and probably I hadn't told him the P2 also has a 128-bit generator. We might have found out about the new xoroshiro128 only after the P2 had been finished.
What I'm trying to make clear here is it's the latter, instruction speed (and compactness), that has been a defining factor all along. For a hardware solution, it's hard to beat Xoroshiro on the speed, simplicity and quality all combined. I figure it is worth clarifying that this is why we started with Xoroshiro and are still using it.
Bare in mind that Melissa is being critical towards Xoroshiro128+'s fame. She has the luxury of a fast 64-bit multiplier instruction to play with, so naturally she can demonstrate higher quality solutions that run just as fast as a software Xoroshiro does. On a software solution basis she is right to point this out.
Obviously she has not been comparing the newer Xoroshiro versions though. So, yep, I am very grateful of Seba giving us our head start.
I presume something else he was doing must have jogged him into thinking about progress with the Prop2 and what was then only the free running generator.
Exactly!
I told Melissa all about xoroshiro++ in February. The tests mentioned in her recent post took so long that I think she started them last year:
http://www.pcg-random.org/posts/xoroshiro-fails-truncated.html
Any further critiques of xoroshiro+ are pointless. It's very old hat.
Maybe you could say that after the new version is actually published, but even then it would still be worth comparing new and old.
Regarding the age of hats, perhaps it's the criticism that's the headwear? Seba finds it exasperating and Melissa knew there was a better xoroshiro before making her recent blog post.
It'll be a good comparison that can be looked back on and everyone can say those niggles are solved ... until a tougher test suite is devised, that is .
I'm no copyright expert but as far as I know this is problematic. In most of the world, that has bowed down to the almighty Micky Mouse, it is impossible to put your work into the public domain. Unless you are a government or have been dead for many decades.
Copyright protection is granted automatically when one publishes a work. I never found any mechanism by which copyright law allows one to remove that protection and place the work into the public domain. Anyone ever hear of such a thing?
As such a statement like "To the extent possible under law... this software to the public domain..." is useless. There is no extent possible under the law.
This why everything open source is published with one of the billion open source licenses that have been dreamed up in recent years. You can't just say "public domain" you have to license it with terms that grant users the rights you want them to have.
Also, the creative commons link given does not work. CC does say "Currently, Creative Commons does not recommend the Public Domain Mark", for the same reasons I give above. https://creativecommons.org/choose/mark/