zfreq(x) only has a meaning when x > 0 and zfreq(0) tells us nothing new as the code makes zfreq(0) = 2^32 - pfreq(0). The average non-zero run length = e ~ 2.718 and the average zero run length = e / (e-1) ~ 1.582. The former is greater than the latter because there are more non-zero pair values than zero ones. The number of runs is the same for both by definition (to within one at worst). I think no more distribution tests are needed and I'll put a summary of the scores together soon.

Thanks a lot, Evan. zfreq(0) is correct now. Looking just at zfreq, [6,2,3,9] and [6,2,11,6] do very well. Not checked others.

[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks a lot, Evan. zfreq(0) is correct now. Looking just at zfreq, [6,2,3,9] and [6,2,11,6] do very well. Not checked others.

[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

The key question: based on PractRand tests which do you think is the best?

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

There was plenty more where those came from. The only other candidate I'd being working on lately was the poorly scoring one for comparison.

The key question: based on PractRand tests which do you think is the best?

I've been spending a lot more time coding than pondering the results.

"Are we alone in the universe?"
"Yes," said the Oracle.
"So there's no other life out there?"
"There is. They're alone too."

It would be useful to have an every-aperture table with all the corrections for borderline cases.

Ha! That won't be happening unless someone fines a way to tell Practrand to ease up on the sensitivity of the BCFN_FF tests.

You're welcome to hand edit the tables yourself.

It would be good if Chris (the PractRand author) could tell us how to modify the BCFN_FF tests. I could edit the tables, but the amended borderline scores are scattered about in different files and I think not all of them have been posted.

It's interesting that for eight of the nine [a,b,c,d] candidates b = 2, which is a left shift, whereas a, c & d are all left rotates. The exception is [13,5,10,10].

The nine were hand picked from the following table (which I don't seem to have posted before). These are the auto-culled most recent complete run of the full set of candidates:

EDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

"Are we alone in the universe?"
"Yes," said the Oracle.
"So there's no other life out there?"
"There is. They're alone too."

We need some simplified scoring method to compare the every-aperture results, maybe mean and standard deviation for each sample size. Another way could be visually, using 16 by 16 "randomgrams".

The nine were hand picked from the following table (which I don't seem to have posted before). <snip>
EDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

I wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344), then stop each test if a 16-bit sample score is less than 512M? That should weed out most of the duds. Ideally, the BCFN_FF test would be amended first.

I wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344) ...

2000 hours (84 days) without any culling.

Ideally, the BCFN_FF test would be amended first.

Yeah, I'll have a nosy at the Practrand sources. I'm not likely to solve it myself though.

This issue has probably already made that culled list above smaller than it should be. It's been present all along but I hadn't gone looking at the report files that closely until the oddball scores really stuck out.

"Are we alone in the universe?"
"Yes," said the Oracle.
"So there's no other life out there?"
"There is. They're alone too."

Notes
pfreq(x) = output pairs that occur x times
zfreq(x) = zero runs of length x

If N-1 = full period (2^32-1 for the above) then:

(a) Expected pfreq(x) = 1/x! * N/e
(b) Sum of pfreq(x) = N
(c) Sum of x * pfreq(x) = N-1
(d) Expected zero runs = (1-1/e) * N/e
(e) Expected mean zero run = 1/(1-1/e)
(f) Expected zfreq(x) = (1-1/e)^2 * N/e^x
(g) Sum of zfreq(x) = zero runs
(h) Sum of x * zfreq(x) = pfreq0
(i) Sum of x * zfreq(x) / Sum of zfreq(x) = mean run

x >= 0 for pfreq(x), x > 0 for zfreq(x)

Comments
These frequency distribution tests are intended as an independent "sanity check" but doing well does not imply the same will happen in other, more rigorous randomness tests. However, these frequency tests can detect certain bad candidates and also act as a "tie-breaker" for good candidates, as is the case here. For example, assuming all other scores are equal, [6,2,5,5] could be eliminated on pair frequency (which is more important in my opinion than zero run frequency).

I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append the same files.

There is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:

The results two posts up are remarkably good. |Actual-Expected|/Expected show this best. The smaller the better and some of the errors are tiny. Note the consistency across pfreq0-pfreq3. I deduced the zero run exponential distribution from the results, I can't prove it!

I'd like to see how bad candidates perform, e.g. xoroshiro32++ [14,2,7,8] or xoroshiro32+p [14,2,7] or xoroshiro32+ [14,2,7], getting progressively worse. Repeat frequency tests suggests xoroshiro32++ [14,2,7,7] and [14,2,7,9] will be adversely affected by their proximity to [14,2,7,8].

We could run the frequency tests and use pfreq0 to cull candidates before any PractRand culling. For example, |Actual-Expected|/Expected less than 0.001 is achievable, thus the |Actual-Expected| threshold could be set to Expected/1000 = 1580030. In other words, we look for better than 0.999 accuracy (> 99.9%). Harsh, but fair. Record the candidates that pass, then try these with 16-bit sample PractRand tests with minimum threshold 512M. This process should be a lot faster than using PractRand only.

1. A pfreq13+ value that is the sum of all output pair frequencies of 13 and higher.
2. A zfreq22+ value that is the sum of all zero run lengths of 22 and higher.
3. A mean run value for average zero run length, which is the ratio of two sums calculated using test data and therefore arguably the most meaningful zero run result. The relative rankings between different candidates for mean run are very similar to pfreq0-pfreq3, which suggests the latter could be used solely as "tie-breakers".

Both 1 and 2 are expected to be zero for a good candidate, but could be significantly more than zero for a bad candidate (conjecture in the absence of an actual test).

## Comments

7190Vote UpVote Down7190Vote UpVote DownEvan, this full table with 256 PractRand scores is excellent. Do you have similar tables for all the 11 candidates you've been looking at recently?

5,4310Vote UpVote Down[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

9,3700Vote UpVote DownIts very good work. Should be published and accessible.

Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One AnotherSKYPE = acuity_dougParallax colors simplified: http://forums.parallax.com/showthread.php?123709-Commented-Graphics_Demo.spin<br>

5,4310Vote UpVote Down"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

5,4310Vote UpVote Down"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

5,4310Vote UpVote DownI haven't posted the sources in that topic. The testing source code is still evolving in some ways. And Tony has some of his own.

There is a number of archives in various posts of this topic though.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

7190Vote UpVote DownMaybe one more, if possible. [14,2,7,8] should do poorly and would be a good comparison to the others.

7190Vote UpVote DownThanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

The key question: based on PractRand tests which do you think is the best?

7190Vote UpVote DownI think we should keep using this thread for test results and leave the other topic for general info about the algorithm plus recommended constants.

7190Vote UpVote DownIt would be useful to have an every-aperture table with all the corrections for borderline cases.

5,4310Vote UpVote DownHa! That won't be happening unless someone fines a way to tell Practrand to ease up on the sensitivity of the BCFN_FF tests.

You're welcome to hand edit the tables yourself.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

5,4310Vote UpVote DownI've been spending a lot more time coding than pondering the results.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

7190Vote UpVote DownIt would be good if Chris (the PractRand author) could tell us how to modify the BCFN_FF tests. I could edit the tables, but the amended borderline scores are scattered about in different files and I think not all of them have been posted.

It's interesting that for eight of the nine [a,b,c,d] candidates b = 2, which is a left shift, whereas a, c & d are all left rotates. The exception is [13,5,10,10].

5,4310Vote UpVote DownEDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

7190Vote UpVote Down7190Vote UpVote DownI wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344), then stop each test if a 16-bit sample score is less than 512M? That should weed out most of the duds. Ideally, the BCFN_FF test would be amended first.

7190Vote UpVote Down5,4310Vote UpVote DownYeah, I'll have a nosy at the Practrand sources. I'm not likely to solve it myself though.

This issue has probably already made that culled list above smaller than it should be. It's been present all along but I hadn't gone looking at the report files that closely until the oddball scores really stuck out.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

7190Vote UpVote DownHere are the processed results for the selected xoroshiro32++ [a,b,c,d] in the reports:

Pair frequencyActual and ExpectedPair frequency|Actual-Expected|/ExpectedZero run frequencyActual and ExpectedZero run frequency|Actual-Expected|/ExpectedNotespfreq(x) = output pairs that occur x times

zfreq(x) = zero runs of length x

If N-1 = full period (2^32-1 for the above) then:

(a) Expected pfreq(x) = 1/x! * N/e

(b) Sum of pfreq(x) = N

(c) Sum of x * pfreq(x) = N-1

(d) Expected zero runs = (1-1/e) * N/e

(e) Expected mean zero run = 1/(1-1/e)

(f) Expected zfreq(x) = (1-1/e)^2 * N/e^x

(g) Sum of zfreq(x) = zero runs

(h) Sum of x * zfreq(x) = pfreq0

(i) Sum of x * zfreq(x) / Sum of zfreq(x) = mean run

x >= 0 for pfreq(x), x > 0 for zfreq(x)

CommentsThese frequency distribution tests are intended as an independent "sanity check" but doing well does not imply the same will happen in other, more rigorous randomness tests. However, these frequency tests can detect certain bad candidates and also act as a "tie-breaker" for good candidates, as is the case here. For example, assuming all other scores are equal, [6,2,5,5] could be eliminated on pair frequency (which is more important in my opinion than zero run frequency).

7190Vote UpVote DownThere is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:

7190Vote UpVote DownI'd like to see how bad candidates perform, e.g. xoroshiro32++ [14,2,7,8] or xoroshiro32+p [14,2,7] or xoroshiro32+ [14,2,7], getting progressively worse. Repeat frequency tests suggests xoroshiro32++ [14,2,7,7] and [14,2,7,9] will be adversely affected by their proximity to [14,2,7,8].

7190Vote UpVote Downbeforeany PractRand culling. For example, |Actual-Expected|/Expected less than 0.001 is achievable, thus the |Actual-Expected| threshold could be set to Expected/1000 = 1580030. In other words, we look for better than 0.999 accuracy (> 99.9%).Harsh, but fair.Record the candidates that pass, then try these with 16-bit sample PractRand tests with minimum threshold 512M. This process should be a lot faster than using PractRand only.7190Vote UpVote Down7190Vote UpVote DownAnd a response:

http://www.pcg-random.org/posts/on-vignas-pcg-critique.html

7190Vote UpVote Down1. A

pfreq13+value that is the sum of all output pair frequencies of 13 and higher.2. A

zfreq22+value that is the sum of all zero run lengths of 22 and higher.3. A

mean runvalue for average zero run length, which is the ratio of two sums calculated using test data and therefore arguably the most meaningful zero run result. The relative rankings between different candidates for mean run are very similar to pfreq0-pfreq3, which suggests the latter could be used solely as "tie-breakers".Both 1 and 2 are expected to be zero for a good candidate, but could be significantly more than zero for a bad candidate (conjecture in the absence of an actual test).

7190Vote UpVote DownDo you have the full-period constants for any of xoroshiro36/38/40?

5,4310Vote UpVote Down"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."

7190Vote UpVote DownMany thanks, Evan. xoroshiro40 has relatively few full-period constants.

5,4310Vote UpVote DownHere's the same run for Xorshift "xs-s20" but including the log file. I don't seem to have kept very many logs.

EDIT: Looking at the s15 candidate list, it's only 73 lines, which is shorter than the s16 list. That makes the s16 list oddly long at 84 lines.

"Yes," said the Oracle.

"So there's no other life out there?"

"There is. They're alone too."