I have been so busy I have not had time to look over much of the data.
However, I did notice that Xoroshiro32(16)s1+s0- candidate [6 2 3 6] (that you mentioned) fails one of the new tests in PractRand 0.94, but only at the same point that it fails other tests (~512MB).
Still unusual, though, as others I looked at did not exhibit that behavior.
If you do decide to use 0.94, start testing at 128KB, as the occasional false-positive issue in 0.94 from 1KB-64KB should be fixed for the eventual 0.95 release, as I understand it.
I'm busy with meta-analysis of my xoroshiro128psp (which looks good on its own) in Big Crush as part of a TRNG where I sum its output with AES (counter mode) and Lehmer128. I've got 44 threads running 24/7 for the next few weeks.
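For context, Lehmer128 here means a multiplicative congruential generator with 128-bit state. A common formulation in C (my assumption as to the exact variant; the multiplier is the widely circulated 64-bit one):

#include <stdint.h>

/* 128-bit Lehmer MCG: state *= multiplier; output the high 64 bits.
   unsigned __int128 is a GCC/Clang extension. The seed must be odd. */
static unsigned __int128 lehmer_state = 0x853c49e6748fea9bULL;  /* arbitrary odd seed */

uint64_t lehmer128_next(void)
{
    lehmer_state *= 0xda942042e4dd58b5ULL;
    return (uint64_t)(lehmer_state >> 64);
}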
version 0.94
RNGs
xsm64
revised
old version was falling well short of intended quality on some seeds
xsm32
revised
efiix8x384 / efiix16x384 / efiix32x384 / efiix64x384
the large array sizes didn't really help anything, so I shrunk them down to reduce memory & cache footprints
where the old iteration table size used to be 128, it's now 32
and where the old indirection table size used to be 256, it's now 16
so the names are now efiix8x48 instead of efiix8x384, efiix16x48 instead of efiix16x384, etc
tools
RNG_test
adjusted formatting of raw outputs to better preserve columns during extreme failures
changed listed seed for "RNGs" that don't have meaningful seed
fixed off-by-1 bug causing evaluation thresholds to be slightly incorrect
added new evaluation threshold "normalish" between "normal" and "unusual"
SeedingTester and EntropyPoolingTester meta-rngs that were only supported by command line options -ttseed64 and -ttep can now be specified as part of the RNG name
RNG_output
increased buffer size (again)
fixed endianness issue (again?)
now supports SeedingTester and EntropyPoolingTester meta-rngs
adjusted how very large Raw values are printed, to preserve column alignment
Test Batteries:
core tests:
added a new test, mod3n
it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
and it doesn't take many resources (largely because it's set up to only test a small fraction of the data stream)
this one is inspired by the mod3 test in gjrand
added a new test, TMFn
it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
and it doesn't take many resources (largely because it's set up to only test a small fraction of the data stream)
this one primarily targets LCGs
FPF
removed "all2" subtest
expanded tests:
added a new test, mod3n
added a new test, TMFn
added a new test, BirthdaySystematic
various other changes?
tests
DistC6
fixed bug that affected the parameterizations used in the expanded test set
bug caused crashes on some platforms, had little to no effect on others
BRank
fixed semi-minor bug (again?)
transforms
FirstNofM
bugfixes; however, I don't think this was actually used anywhere in PractRand before, so probably nothing visible
mod3n
based upon the mod3 test in gjrand (by Geronimo Jones)
but it targets 8, 16, 32, 64, 128, etc bit blocks, all simultaneously, the same way BCFN does
it does not find bias in a very wide variety of PRNGs, but several of the ones it catches are otherwise very difficult to find bias in (flea32x1, ara32, ibaa32)
TMFn
intended to catch LCGs. I may replace it with a different test intended for the same purpose in a later version.
rng sets for testing
added to ease testing automation
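On the mod3 idea, for anyone curious: the gist of a mod-3 frequency test is to reduce fixed-size blocks of the stream modulo 3 and chi-square the residue counts against expectation. A minimal single-block-size sketch in C (illustrative only - not PractRand's actual mod3n, which runs the idea over many block sizes at once, BCFN-style):

#include <stdint.h>

/* Count 8-bit blocks by residue mod 3 and form a chi-squared statistic.
   A real test would convert this to a p-value via the framework. */
double mod3_chisq(const uint8_t *data, long n)
{
    long counts[3] = {0, 0, 0};
    for (long i = 0; i < n; i++)
        counts[data[i] % 3]++;
    /* of the 256 byte values, 86 are = 0 (mod 3) and 85 each are = 1, 2,
       so the expected frequencies are not quite uniform */
    double expect[3] = { n * (86 / 256.0), n * (85 / 256.0), n * (85 / 256.0) };
    double chi = 0.0;
    for (int r = 0; r < 3; r++) {
        double d = counts[r] - expect[r];
        chi += d * d / expect[r];
    }
    return chi;  /* ~ chi-squared, 2 degrees of freedom, if the data is random */
}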
Hmm, the one I found was coming from Gap16 tests with very specific data input - I posted a binary input file here for him to work with. It would just lock up in a busy loop due to a divide-by-zero.
I'm not seeing Gap16 listed in the fixes.
I'm not sure the false positives are the same ones I'm dealing with either. I found the BCFN tests, though they were most obvious in the smaller sizes, could prematurely exit at any point. I have a whole script that splits off each test case just for sequencing the retries of Practrand.
Time to try v0.94 out I guess ...
Here's the binary input data for locking up practrand - https://forums.parallax.com/discussion/comment/1432700/#Comment_1432700
And the piping command used:
Bother, 0.94 isn't compiling cleanly for me. A couple of case-sensitivity changes were needed to find the required files, which indicates he's not on Linux for his regular development. Now it's complaining about undefined references during linking.
Oh, it's more filename mismatches than I first thought. I'm going to have to edit the sources ...
Just trying out the -te 10 "Birthday" test. Contrary to what he says in the help text, I'm not seeing large RAM usage, but it's certainly way more processor-hungry. It's sitting just above 60% utilisation of 16 threads with only 4 cases running.
For some reason it's also creating excessive kernel-level burn - about 40% of a core between them.
EDIT: Oh, I just realised how much random data was flowing when it clocked over 1 TB for each case! So the kernel burn would have been the pipes operating.
Okay, I've stopped it now, and don't have a clue what good the reports are. Back to the regular tests I guess.
I posted about the lockup issue also, and it will be fixed for 0.95. There is no work-around for that one in 0.94.
However, it only occurs for me on small-state 2-dimensionally equidistributed PRNGs, where the obvious non-randomness creates a loop in one of the statistical tests due to a calculated NaN (not a number): link
Xoroshiro32++ (and variants we have been reviewing) are unaffected, but still need to start tlmin at 128KB (when using -te 1 -tf 2).
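For reference, a typical invocation with those options would look something like:
mygen | ./RNG_test stdin16 -tlmin 128KB -te 1 -tf 2
(where mygen is a stand-in for whatever pipes the generator's output to stdout).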
All my automation starts at -tlmin 1KB. I've only ever had the one issue, and that was with an s8 run which was only an experiment anyway. If I hadn't done that I would never have struck the bug.
Here's my verbatim PM and reply:
evanh 2018-03-11 - 23:01:45
Hi Chris,
Dunno if you'll get this but I'll give it a go.
I've bumped into a bug in PractRand testing code. Somehow one of my generators produced a random sequence that locks PractRand solid every time. It happens at the 32 KB mark when piping data into RNG_Test using "stdin". When using "stdin16" it occurs at the 64 KB mark. When using "stdin32" it occurs at the 128 KB mark.
I have a 32 KB binary file of the trigger random data here for testing with. I can post it in the forum if you respond.
I only know C but I've still managed to follow the cause of the lockup to a -NaN value being passed into sample_to_pvalue() function in tests.cpp. This -NaN is coming from a log() function in g_test() function in math.cpp. It in turn is being fed a zero value which comes from a Gap16 "const Uint64 *count_ = counts.get_array();" array back in test.cpp, which is entirely blank for some reason.
That's as far as I got.
scro 2018-03-12 - 11:43:20
Really! I've never seen one of those before. I'll take a look at fixing it.
Thanks for letting me know.
So he did acknowledge it back then. I'm guessing he got sidetracked and forgot.
Looking at the sources for 0.94, the offending zero value is coming from line 1677 of tests.cpp.
I didn't try to find how that array is supposed to get filled. It felt like a separate, earlier, stage of the testing. I wasn't that keen to understand all the workings.
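For what it's worth, the mechanism matches the classic zero-count-into-log() problem in a G-test. A minimal sketch of the statistic (illustrative - not PractRand's actual g_test() code) with the guard that would avoid the NaN:

#include <math.h>

/* G-test: G = 2 * sum over bins of O * ln(O / E).
   A naive version calls log() on an empty bin and produces -inf/NaN,
   which then poisons the downstream p-value code - the lockup
   mechanism described above. */
double g_statistic(int nbins, const double *observed, const double *expected)
{
    double g = 0.0;
    for (int i = 0; i < nbins; i++) {
        if (observed[i] > 0.0)  /* guard: the limit of O*ln(O/E) as O -> 0 is 0 */
            g += 2.0 * observed[i] * log(observed[i] / expected[i]);
    }
    return g;
}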
Ah, your handle is Cerian Knight and that link is to ChrisD's reply to you, isn't it? So v0.95 was already well underway and the bug already corrected before you even reported it.
That makes more sense now. 0.94 must have been too far along to be patched with what can be considered a low-value bug fix.
Yes to all of the above. I am happy to wait for 0.95 (and beyond) if it means:
1. More accurate and stable small state analysis.
2. More comprehensive large state analysis.
As far as #2 goes, the TestU01 Big Crush meta-analysis I am doing should be simple, except that some of the statistics (at least 20 of 254, some important) are skewed due to the use of approximations in the p-value calculation.
Excluding those or using corrections, I have refined the process to the point where I see evidence of previously undisclosed flaws in many common PRNGs I have tested... so if I am the only one seeing this, then obviously more testing and validation is required.
Alternatively, another already fairly prominent tool would find some of the same issues I see...
I was hoping PractRand would one day grow beyond TestU01, as the former is easy to use, whereas the latter requires much more than my basic knowledge of C programming to fully take advantage of its extended features (e.g. automatic repetitions, etc.).
Hehe, I'm no fan of C++ either but, like most others that have used PractRand, I didn't have to know a thing about it because of the "stdin" command line option. All I had to do was build the full program and then use that one feature.
TestU01 should have come with an equivalent example build. I couldn't make sense of compiling the sources without first reading Melissa's uses. It's almost like the API had changed and they'd never updated the examples.
... but still need to start tlmin at 128KB (when using -te 1 -tf 2).
Ah, this seems to be new to v0.94. Things haven't looked right with the gridding since upgrading PractRand from v0.93. I decided it was time to apply my new-found knowledge of sorted Bash arrays to the gridded score averages.
Below are two lists side by side: the left-hand list comes from scoring with PractRand v0.94 and the right-hand list from scoring with PractRand v0.93. As you can see, there are a large number of <17 minimums in the v0.94 run.
Looking at the PractRand reports themselves, these new anomalies are all coming from FPF tests, whereas the prior situation in v0.93 was all from BCFN tests.
The ExpMin < 17 on FPF 32/64 is a false positive, since when the 32/64 tests were added in 0.94 they were not calibrated below that level.
I believe even (the extremely rare) 2^17 FPF 32/64 failures should be retested with a different seed (or restarted with tlmin at 256KB, which cannot false positive, in my experience).
There is a reply to my post on that subject: link
Figures. I hadn't looked closely enough to notice they were the new tests. The BCFN situation seemed pretty similar in that the pValue was only just bumping over the fail threshold but was sparsely distributed instead.
And guess what? Looking through the 0.94 reports now I can't see any BCFN false positives. Presumably he retuned those tests in 0.94. I probably wasn't the first to notice it.
Amusingly, the script logic has noticed some scores are the result of BCFN tests failing but the subsequent automatic retest has confirmed they are all valid fails. I haven't eyeballed every one of these though, so I can't say with 100% certainty.
EDIT: Huh, the BCFN retests in v0.94 are all confined to just two of the 36 candidates: [4 1 9 6] and [4 1 9 8].
The detail that I relied on was that the false bump was always steep-sided. It didn't have a gradual build-up and fall-off on either side of the peak. At most only two sizes were involved, so retesting one size up was usually enough to skip past it.
Here are the ChiSquared-sorted distributions for the xom group of algorithm variants in CSV [tab] format. The cfg file contains the expanded naming.
Thanks for the results, Evan. A few comments:
1. You've used a weighting of two for prank and one for zrank & nzrank. An equal weighting of one, which I've used in the past, does not change the best result for any +/- combination and makes relatively little difference to each top 20.
2. I'm not convinced about your names for the different +/- options. For one thing, it is hard to identify the usually similar positive/negative pairs (e.g. +++/---).
3. The innovation we have been studying is to add some value, call it X for now, to s0 at the same time that s0 and s1 are added, then add the latter sum, rotated, to the former sum. Here "add" can mean add or subtract. Thus PRN = rotl(+/- s1 +/- s0, d) +/- (s0 + X).
X has been chosen deliberately to be an even multiple of s0 so that (s0 + X) is an odd multiple of s0. We could introduce a fifth constant, e, such that s0 * e = s0 + X (so e = 5 means X = 4 * s0). e = 5 produces better distributions than e = 1 (the original xoroshiro++), so does this mean e = 3 would be somewhere in between? Or could we increase e > 5 and get better results?
Instead of using non-obvious names, [a,b,c,d,e]+++ to [a,b,c,d,e]--- could identify the eight different +/- options, where
[a,b,c,d,1]+++ = [a,b,c,d]++ = original xoroshiro++ algorithm
4. The best result for the new distributions you posted is this one:
which could be known as xoroshiro[12,4,15,7,5]-+-
I presume it's important that X is a multiple of s0, i.e. not a constant.
As for the naming, that's all a bit hackish anyway. That particular edition was from before I started down the "try all combinations" path. I've just left it alone. The other seven editions, as per your example, use the more long-winded naming.
I've done a number of grid runs now, though not the Xoroshiro32(16)-s1+s0-x run yet. Have been repairing/changing some scripts in the process.
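To pin down the algorithm being gridded, here it is as a minimal C sketch - one sign option only (all plus), with placeholder constants rather than a vetted candidate; [a,b,c,d,1]+++ reduces to the original xoroshiro32++:

#include <stdint.h>

#define A 13  /* placeholder engine constants a, b, c             */
#define B 5
#define C 10
#define D 9   /* output rotation d                                */
#define E 5   /* odd output multiplier e; 1 gives the original ++ */

static uint16_t s0 = 1, s1 = 0;  /* any seed other than all zeros */

static uint16_t rotl16(uint16_t x, int k)  /* k in 1..15 */
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

uint16_t next16(void)
{
    /* PRN = rotl(s1 + s0, d) + s0 * e */
    uint16_t result = (uint16_t)(rotl16((uint16_t)(s1 + s0), D)
                                 + (uint16_t)(s0 * E));
    s1 ^= s0;  /* standard xoroshiro state engine */
    s0 = (uint16_t)(rotl16(s0, A) ^ s1 ^ (uint16_t)(s1 << B));
    s1 = rotl16(s1, C);
    return result;
}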
At some point, increasing e will decrease quality as fewer bits of s0 are used twice (in different positions) in the s0 * e addition. Is e = 5 best, or is it 3 or 7 or 9?
Chris recommended x5. I just followed instructions.
Here's what I mean. Best scores for xoroshiro[a,b,c,d,5]??? are:
Huh, that sign pairing really stands out. I've just completed the first grid run for xom10 (-++) and, comparing it to the earlier xom03 (+--), it comes out with a remarkably close sorting order. And not just the order either; matching candidate grid score averages are close too. So, not surprisingly, the whole gridding run averages are also close: 29.584 for xom03, 29.626 for xom10.
s0 * 5 needs an extra 14-bit adder for each iteration compared to just s0 (the low two bits of s0 * 4 are zero, so only the upper 14 bits take part in the add), therefore the double iteration in XORO32 would now use 92 add/subtract bits instead of 64. No extra routing resources should be needed for s0 + s0 * 4 as both operands are the same.
I notice: s0 + s0 * 4, from a hardware implementation standpoint, would likely be nearly the same as s0 + rotl(s0, 2).
The worth of those two (normally discarded) bits might be high, with the right constants, but equidistribution would have to be verified.
I have not tested this idea yet in my own code (which uses two * 5s), but from a software implementation standpoint, I suspect it would likely overburden an already saturated CPU pipeline.
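To make that near-equivalence concrete, a quick standalone check with an arbitrary test value:

#include <stdint.h>
#include <stdio.h>

static uint16_t rotl16(uint16_t x, int k)
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

int main(void)
{
    uint16_t s0  = 0xC123;                          /* top two bits set        */
    uint16_t mul = (uint16_t)(s0 * 5);              /* = s0 + (s0 << 2): the   */
                                                    /* top two bits fall off   */
    uint16_t rot = (uint16_t)(s0 + rotl16(s0, 2));  /* wraps those bits around */
    printf("s0*5          = 0x%04X\n", mul);
    printf("s0+rotl(s0,2) = 0x%04X\n", rot);        /* differs from mul only by
                                                       the two wrapped-in bits */
    return 0;
}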
In terms of the execution stages of the pipeline, even though there are two clock cycles allotted, it really all happens in the first of the two clocks. Chip has mentioned the second execution clock is result mux'ing only, which presumably means there is a wide mass of flops in the middle of execution. I think this was a design decision Chip made to eliminate a nightmare in the critical path analysis.