I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append to the same files.
There is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would be better if zfreq[0] held the number of zero runs, i.e. incremented whenever zfreq[tally] is incremented. I think the amended code could be:
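(Sketching it from the description above, since the original code isn't quoted here; pair, zfreq and tally are the names already in use, everything else is illustrative.)

void tally_zero_runs(const uint8_t *pair, uint64_t n, uint64_t *zfreq)
{
    uint64_t tally = 0;
    for (uint64_t i = 0; i < n; i++) {
        if (pair[i] == 0) {
            tally++;                /* extend the current zero run */
        } else if (tally != 0) {
            zfreq[tally]++;         /* close the run */
            zfreq[0]++;             /* zfreq[0] = total number of zero runs */
            tally = 0;
        }
    }
    if (tally != 0) {               /* the fix: flush a run ending at the last byte */
        zfreq[tally]++;
        zfreq[0]++;
    }
}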
Evan, would it be possible for you to run one further test of the pairs/zero runs distribution, with code amended as above? I am interested in xoroshiro32+ [6,2,3] to compare with xoroshiro32++ [6,2,3,9] already done. If you fancy doing two more tests then xoroshiro32+ [14,2,7] could be the other.
In Melissa's latest Birthday test post, she says that xoroshiro+ has the expected number of repeated outputs for a small sample size. However, from my own tests, xoroshiro+ does not produce the expected binomial distribution of duplicated outputs (i.e. two successive outputs identical) over a complete iteration.
Hi Tony,
Hang fire just a bit. I'm not long into having another hack at this scoring business again. I slept on it for a few weeks before accepting that Chris isn't around any more and that I needed to find another way to make PractRand more dependable ... and I think I've got it sussed now.
I've added a check after each PractRand exit to extract the total number of FAILed tests and how many of those were BCFN tests. If all the fails are BCFN then that whole case is run over again, with PractRand's starting length set at double the prior fail length, aka the score. If the original score was correct then this merely doubles the score, but otherwise the case continues on to its true failure length. This seems to reliably push any glitched score up to its correct length.
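The check boils down to something like this (a minimal sketch rather than my actual script; report.txt stands in for the captured PractRand output):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("report.txt", "r");
    if (!f) return 1;
    char line[512];
    int fails = 0, bcfn = 0;
    while (fgets(line, sizeof line, f)) {
        if (strstr(line, "FAIL")) {         /* count failed tests ... */
            fails++;
            if (strstr(line, "BCFN"))       /* ... and the BCFN subset */
                bcfn++;
        }
    }
    fclose(f);
    if (fails > 0 && fails == bcfn) {
        /* all fails were BCFN: suspect a glitch, so rerun the case with
           PractRand's -tlmin set to double the failing length */
        puts("retest at doubled starting length");
    }
    return 0;
}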
I'm doing a full s16 run just now. And will be looking out for secondary glitches to see if maybe the checks need to go further.
Evan, would it be possible for you to run one further test of the pairs/zero runs distribution, with code amended as above? I am interested in xoroshiro32+ [6,2,3] to compare with xoroshiro32++ [6,2,3,9] already done. If you fancy doing two more tests then xoroshiro32+ [14,2,7] could be the other.
Right, I've modified the code to do original single summing as well. Here's those two candidates.
Thanks, Evan. I've had a quick look and the pair distribution for [6,2,3] looks quite good, but its zero-run distribution looks really bad, with far too many non-zero frequencies at 22 or more - strange.
Okay, the s16 full culling run is done. Attached is the complete score table, and I've put the log file in as well. The good news is the log shows all glitches have been accommodated.
The bad news is the strong culling has wiped out all the [14 2 7 x] candidates while leaving plenty of others to choose from. This isn't anything to do with the glitches though. It's due to reorienting the scoring solely around byte sampling apertures.
Here's the short list I'll be targeting from now on:
[14,2,7,7] qualifies but is missing from the above list. Maybe the culling is a bit too strong: you could continue after an 8-bit score of 256M, provided it is not for 7:0 or 15:8 (both of these, and 15:0, must be at least 512M).
An important point is that Parallax users will only see xoroshiro32++ as part of XORO32. If they want only the high or low 16 bits, then an extended double iteration test for 15:0 would be handy, i.e. skipping every other 16-bit output. If users want only the high or low 8 bits of the 32-bit XORO32 output, then separate extended double iteration tests for 15:8 and 7:0 would be useful.
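For concreteness, a sketch of such a double iteration stream (assuming the usual ++ scrambler, result = rotl(s0 + s1, d) + s0; the [14,2,7,5] constants and the seed are just placeholders):

#include <stdint.h>
#include <stdio.h>

#define A 14            /* candidate [a,b,c,d]; swap in any set */
#define B 2
#define C 7
#define D 5

static uint16_t s0 = 1, s1 = 0;     /* any nonzero seed */

static uint16_t rotl16(uint16_t x, int k)
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

static uint16_t next(void)
{
    uint16_t result = (uint16_t)(rotl16((uint16_t)(s0 + s1), D) + s0);
    s1 ^= s0;                                       /* xoroshiro engine */
    s0 = (uint16_t)(rotl16(s0, A) ^ s1 ^ (uint16_t)(s1 << B));
    s1 = rotl16(s1, C);
    return result;
}

int main(void)
{
    for (;;) {
        (void)next();                       /* skip every other output */
        uint16_t x = next();
        fwrite(&x, sizeof x, 1, stdout);    /* feed RNG_test stdin16 */
    }
}

Piped into RNG_test stdin16, that scores only the half of each XORO32 result a user would actually keep.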
For old 8-bit CPUs that can shift or rotate just one bit at a time, xoroshiro32++ [2,8,7,8] is the fastest algorithm with good PractRand scores, although less so for duplicated outputs. xoroshiro32++ [13,9,8,8] requires one more shift/rotate (four, instead of three) but has a slightly better 7:0 score of 1G.
I'm running the selected grid scoring now. Not surprisingly, [3 2 6 9] is still on top.
[anomaly comments deleted] Made a mistake on the retest.
EDIT: As more of the grids complete ... it looks like this approach is working well. There are notably poor scores for nearly all candidates. [3 2 6 9] is the only flawless exception so far.
Ah, I just realised it wasn't previously [3 2 6 9], but rather we had been historically working with [6 2 3 9]. Unfortunately, the grid for [6 2 3 9] has one dismal 16 MB score at aperture 2>>1. I've checked that score out and it is legit. See attached report.
For old 8-bit CPUs that can shift or rotate just one bit at a time, xoroshiro32++ [2,8,7,8] is the fastest algorithm with good PractRand scores, although less so for duplicated outputs. xoroshiro32++ [13,9,8,8] requires one more shift/rotate (four, instead of three) but has a slightly better 7:0 score of 1G.
[2,8,7,8] isn't garbage, but not a great choice either. The 12-bit sampling apertures stand out here. Even-sized apertures are always lower scoring, but that line is more distinct than usual, like the usual persistently low 16-bit aperture scores.
Anything with d = 8 won't be excellent, but both b = 8 and d = 8 make [2,8,7,8] the fastest option for 8-bit CPUs and the scores aren't terrible. Is [13,9,8,8] any better?
Also, regarding [14 2 7 5], the candidate already chosen for the Prop2's XORO32 instruction: it still looks comparatively high quality in these grids. Its poorest grid score is a single 128 MB, and the only two 256 MB scores just happened to sit on the auto-culling line. I guess this lends support to reviewing more of the 256 MB culls.
At this stage, based on these grid scores, I'd place [14 2 7 5] as number two ranking, right behind [3 2 6 9].
And here's the revised [14 2 7 6], the originally chosen candidate, as well. As seems to be the case over and over, its scoring is virtually identical to [14 2 7 5]. Slightly worse this time.
Oops, [3 2 6 5] is very strong too. It bottoms out at four 256 MB scores. That should place it in second spot, above [14 2 7 5], due to not having any 128 MB scores, although it is littered with about ten more 512 MB scores. Maybe second equal.
Pair frequency distributions for selected xoroshiro32+[a,b,c], xoroshiro32+p[a,b,c] and xoroshiro32++[a,b,c,d] candidates.
N.B. The pair distributions are identical to the XORO32 distributions, but the ordering of the data is different.
(The attached table gives, for each candidate, the pair frequencies both as actual and expected counts and as |Actual-Expected|/Expected.)
Notes
pfreq(x) = output pairs that occur x times
If N-1 = full period (2^32-1 for the above) then:
(a) Expected pfreq(x) = 1/x! * N/e
(b) Sum of pfreq(x) = N
(c) Sum of x * pfreq(x) = N-1
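For anyone wondering where (a) comes from: the N-1 pairs land among N possible pair values, so a given value's count is approximately Poisson with mean (N-1)/N ~ 1, i.e. Pr[count = x] = (1/x!) * (1/e), and multiplying by N gives (a). Likewise (b) says every pair value is counted once, and (c) that all N-1 pairs are accounted for.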
I think you cull anything < 1G for [7:0] except for [7:0] = 512M and [15:0] = 1G, but this omits several candidates for which [7:0] = [15:0] = 512M.
Yeah, I've got some of the broader unculled 512 MB candidates lined up in the schedule. When that's all through I'll do another culling run set to 256 MB instead of 512 MB. It'll be a long run!
Well, inevitably, even after a couple of iterative improvements, my hack was still bound to miss a corner case ... And it has.
After generating the latest grid for candidate [13 5 10 10], I quickly spotted a suspicious score of 32 KB at aperture 7>>8. And looking through the log I see these two lines:
And digging out the specific PractRand report file, it is as expected: it ends at 32 KB because the revert overwrites whatever report was made by the "trying larger" attempt.
So I ran up another script that allows specifying all the component parameters for a single candidate, including telling PractRand to continue past normal failures. See attached. The problem turns out to be that the 32 KB and 64 KB fails are basically identical.
[3,6,9,x] and [6,3,9,x] have weight or degree = 15, close to the theoretical best of 32/2 = 16.
I'm guessing you meant [3 2 6 x] and [6 2 3 x].
Oops, yes, very late night errors.
Sample size of 4 is worse than 3 or 5, 8 is worse than 7 or 9, and 16 is worse than 15, because the FPF test can detect equidistribution, and does so more easily for even sample widths. I think we should give these odd sizes considerable importance when choosing between candidates, especially 15.
Another thought. The initial seed should not make any difference, but as a check have you tried more than one seed when the score is unusually low? The sequences must not overlap, e.g. you could use seed 1 and the state reached 2^31 iterations later (the halfway point). The second seed would vary between candidates.
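At 16 bits of state, brute force is enough to find that second seed; a sketch, reusing next() and the (s0, s1) state from the stream sketch above (the function name is mine):

/* Advance 2^31 engine steps from seed (1, 0) to get a second seed half a
   period away; this only takes a few seconds at this state size. */
void halfway_seed(uint16_t *h0, uint16_t *h1)
{
    s0 = 1;                               /* first seed */
    s1 = 0;
    for (uint32_t i = 0; i < (1UL << 31); i++)
        (void)next();
    *h0 = s0;                             /* second, non-overlapping seed */
    *h1 = s1;
}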
Isn't detection of equidistribution a valid scoring criterion? I was more inclined to ignore the higher scoring from the odd-sized sampling apertures.
To get past the BCFN glitches? Or the oddball low scores (mid megabytes) that I've kept?
The higher scores for the odd sizes are due to the equidistribution being harder to detect, because the extended FPF tests seem to work on 4-, 8- or 16-bit samples. I think 15-bit scores are a good proxy for what 16-bit would be without FPF, especially now with sixteen different low bits, but disabling FPF leads to far too many high scores.
To get past the BCFN glitches? Or the oddball low scores (mid megabytes) that I've kept?
I was thinking mainly of the FPF failures for [6,2,3,9] at 16M.
The Z80 code for [14,8,9,9] is only 13% bigger and slower than for [2,8,7,8] and it would be interesting to see the former's grid scores, when they are ready.
I think 15-bit scores are a good proxy for what 16-bit would be without FPF, especially now with sixteen different low bits, but disabling FPF leads to far too many high scores.
Hmm, I don't think that matters for finding the best candidate. The scoring effects hit even and odd sized apertures equally. The fact that there is a step difference between evens and odds is an ignorable constant.
EDIT: Well, ignorable except that maybe a low score for an odd-sized aperture should be considered poorer than the same low score for an even-sized one.
We should look at the whole grid, even and odd. 15-bit scores might be good tie-breakers.
Anything with d = 8 won't be excellent, but both b = 8 and d = 8 make [2,8,7,8] the fastest option for 8-bit CPUs and the scores aren't terrible. Is [13,9,8,8] any better?
I'm rerunning a bunch of the earlier grids to clean out the occasional doubled score. I'll post [3 2 6 9]'s grid after that.
[13,9,8,8] was never run before I started over. It'll be a day or so before it comes up in the schedule.
[2,8,7,8] seems to be better than [13,9,8,8].
[3,2,6,x] and [6,2,3,x] have weight or degree = 15, close to the theoretical best of 32/2 = 16.
EDIT: Corrected mistakes in the last line.
http://forums.parallax.com/discussion/comment/1441593/#Comment_1441593