zfreq(x) only has a meaning when x > 0 and zfreq(0) tells us nothing new as the code makes zfreq(0) = 2^32 - pfreq(0). The average non-zero run length = e ~ 2.718 and the average zero run length = e / (e-1) ~ 1.582. The former is greater than the latter because there are more non-zero pair values than zero ones. The number of runs is the same for both by definition (to within one at worst). I think no more distribution tests are needed and I'll put a summary of the scores together soon.

Thanks a lot, Evan. zfreq(0) is correct now. Looking just at zfreq, [6,2,3,9] and [6,2,11,6] do very well. Not checked others.

[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks a lot, Evan. zfreq(0) is correct now. Looking just at zfreq, [6,2,3,9] and [6,2,11,6] do very well. Not checked others.

[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

The key question: based on PractRand tests which do you think is the best?

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

Thanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

There was plenty more where those came from. The only other candidate I'd being working on lately was the poorly scoring one for comparison.

The key question: based on PractRand tests which do you think is the best?

I've been spending a lot more time coding than pondering the results.

It would be useful to have an every-aperture table with all the corrections for borderline cases.

Ha! That won't be happening unless someone fines a way to tell Practrand to ease up on the sensitivity of the BCFN_FF tests.

You're welcome to hand edit the tables yourself.

It would be good if Chris (the PractRand author) could tell us how to modify the BCFN_FF tests. I could edit the tables, but the amended borderline scores are scattered about in different files and I think not all of them have been posted.

It's interesting that for eight of the nine [a,b,c,d] candidates b = 2, which is a left shift, whereas a, c & d are all left rotates. The exception is [13,5,10,10].

The nine were hand picked from the following table (which I don't seem to have posted before). These are the auto-culled most recent complete run of the full set of candidates:

EDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

We need some simplified scoring method to compare the every-aperture results, maybe mean and standard deviation for each sample size. Another way could be visually, using 16 by 16 "randomgrams".

The nine were hand picked from the following table (which I don't seem to have posted before). <snip>
EDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

I wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344), then stop each test if a 16-bit sample score is less than 512M? That should weed out most of the duds. Ideally, the BCFN_FF test would be amended first.

I wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344) ...

2000 hours (84 days) without any culling.

Ideally, the BCFN_FF test would be amended first.

Yeah, I'll have a nosy at the Practrand sources. I'm not likely to solve it myself though.

This issue has probably already made that culled list above smaller than it should be. It's been present all along but I hadn't gone looking at the report files that closely until the oddball scores really stuck out.

Notes
pfreq(x) = output pairs that occur x times
zfreq(x) = zero runs of length x

If N-1 = full period (2^32-1 for the above) then:

(a) Expected pfreq(x) = 1/x! * N/e
(b) Sum of pfreq(x) = N
(c) Sum of x * pfreq(x) = N-1
(d) Expected zero runs = (1-1/e) * N/e
(e) Expected mean run = 1/(1-1/e)
(f) Expected zfreq(x) = (1-1/e)^2 * N/e^x
(g) Sum of zfreq(x) = zero runs
(h) Sum of x * zfreq(x) = pfreq0
(i) Sum of x * zfreq(x) / Sum of zfreq(x) = mean run

Comments
These frequency distribution tests are intended as an independent "sanity check" but doing well does not imply the same will happen in other, more rigorous randomness tests. However, these frequency tests can detect certain bad candidates and also act as a "tie-breaker" for good candidates, as is the case here. For example, assuming all other scores are equal, [6,2,5,5] could be eliminated on pair frequency (which is more important in my opinion than zero run frequency).

I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append the same files.

There is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:

The results two posts up are remarkably good. |Actual-Expected|/Expected show this best. The smaller the better and some of the errors are tiny. Note the consistency across pfreq0-pfreq3. I deduced the zero run exponential distribution from the results, I can't prove it!

I'd like to see how bad candidates perform, e.g. xoroshiro32++ [14,2,7,8] or xoroshiro32+p [14,2,7] or xoroshiro32+ [14,2,7], getting progressively worse. Repeat frequency tests suggests xoroshiro32++ [14,2,7,7] and [14,2,7,9] will be adversely affected by their proximity to [14,2,7,8].

We could run the frequency tests and use pfreq0 to cull candidates before any PractRand culling. For example, |Actual-Expected|/Expected less than 0.001 is achievable, thus the |Actual-Expected| threshold could be set to Expected/1000 = 1580030. In other words, we look for better than 0.999 accuracy (> 99.9%). Harsh, but fair. Record the candidates that pass, then try these with 16-bit sample PractRand tests with minimum threshold 512M. This process should be a lot faster than using PractRand only.

## Comments

4560Vote UpVote Down4560Vote UpVote DownEvan, this full table with 256 PractRand scores is excellent. Do you have similar tables for all the 11 candidates you've been looking at recently?

4,8210Vote UpVote Down[6 2 11 6] comes up clean. There is three very poor scores but they all check out as borderline BCFN_FF cases. So, really only a single 256 MB and the rest are 512 MB or better.

Here's all nine of the every-aperture score tables, and also the extended PR reports for the three borderline cases of [6 2 11 6].

9,1450Vote UpVote DownIts very good work. Should be published and accessible.

Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One AnotherSKYPE = acuity_dougParallax colors simplified: http://forums.parallax.com/showthread.php?123709-Commented-Graphics_Demo.spin<br>

4,8210Vote UpVote Down4,8210Vote UpVote Down4,8210Vote UpVote DownI haven't posted the sources in that topic. The testing source code is still evolving in some ways. And Tony has some of his own.

There is a number of archives in various posts of this topic though.

4560Vote UpVote DownMaybe one more, if possible. [14,2,7,8] should do poorly and would be a good comparison to the others.

4560Vote UpVote DownThanks, Evan. Have you dropped a couple of candidates recently? Is it the case that the nine consist of the seven with the best scores, plus [14,2,7,5] and [14,2,7,6]?

The key question: based on PractRand tests which do you think is the best?

4560Vote UpVote DownI think we should keep using this thread for test results and leave the other topic for general info about the algorithm plus recommended constants.

4560Vote UpVote DownIt would be useful to have an every-aperture table with all the corrections for borderline cases.

4,8210Vote UpVote DownHa! That won't be happening unless someone fines a way to tell Practrand to ease up on the sensitivity of the BCFN_FF tests.

You're welcome to hand edit the tables yourself.

4,8210Vote UpVote DownI've been spending a lot more time coding than pondering the results.

4560Vote UpVote DownIt would be good if Chris (the PractRand author) could tell us how to modify the BCFN_FF tests. I could edit the tables, but the amended borderline scores are scattered about in different files and I think not all of them have been posted.

It's interesting that for eight of the nine [a,b,c,d] candidates b = 2, which is a left shift, whereas a, c & d are all left rotates. The exception is [13,5,10,10].

4,8210Vote UpVote DownEDIT: I note it took only 9 hours to run. That would suggest the culling threshold was set very high, ie: Any scores (except for direct 16+0 aperture) below 512 MB would have terminated the testing of that candidate.

4560Vote UpVote Down4560Vote UpVote DownI wonder how long it would take to run an every-aperture test for all possible [a,b,c,d] candidates (84 * 16 = 1344), then stop each test if a 16-bit sample score is less than 512M? That should weed out most of the duds. Ideally, the BCFN_FF test would be amended first.

4560Vote UpVote Down4,8210Vote UpVote DownYeah, I'll have a nosy at the Practrand sources. I'm not likely to solve it myself though.

This issue has probably already made that culled list above smaller than it should be. It's been present all along but I hadn't gone looking at the report files that closely until the oddball scores really stuck out.

4560Vote UpVote DownHere are the processed results for the selected xoroshiro32++ [a,b,c,d] in the reports:

Pair frequencyActual and ExpectedPair frequency|Actual-Expected|/ExpectedZero run frequencyActual and ExpectedZero run frequency|Actual-Expected|/ExpectedNotespfreq(x) = output pairs that occur x times

zfreq(x) = zero runs of length x

If N-1 = full period (2^32-1 for the above) then:

(a) Expected pfreq(x) = 1/x! * N/e

(b) Sum of pfreq(x) = N

(c) Sum of x * pfreq(x) = N-1

(d) Expected zero runs = (1-1/e) * N/e

(e) Expected mean run = 1/(1-1/e)

(f) Expected zfreq(x) = (1-1/e)^2 * N/e^x

(g) Sum of zfreq(x) = zero runs

(h) Sum of x * zfreq(x) = pfreq0

(i) Sum of x * zfreq(x) / Sum of zfreq(x) = mean run

CommentsThese frequency distribution tests are intended as an independent "sanity check" but doing well does not imply the same will happen in other, more rigorous randomness tests. However, these frequency tests can detect certain bad candidates and also act as a "tie-breaker" for good candidates, as is the case here. For example, assuming all other scores are equal, [6,2,5,5] could be eliminated on pair frequency (which is more important in my opinion than zero run frequency).

4560Vote UpVote DownThere is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:

4560Vote UpVote DownI'd like to see how bad candidates perform, e.g. xoroshiro32++ [14,2,7,8] or xoroshiro32+p [14,2,7] or xoroshiro32+ [14,2,7], getting progressively worse. Repeat frequency tests suggests xoroshiro32++ [14,2,7,7] and [14,2,7,9] will be adversely affected by their proximity to [14,2,7,8].

4560Vote UpVote Downbeforeany PractRand culling. For example, |Actual-Expected|/Expected less than 0.001 is achievable, thus the |Actual-Expected| threshold could be set to Expected/1000 = 1580030. In other words, we look for better than 0.999 accuracy (> 99.9%).Harsh, but fair.Record the candidates that pass, then try these with 16-bit sample PractRand tests with minimum threshold 512M. This process should be a lot faster than using PractRand only.4560Vote UpVote Down