The Z80 code for [14,8,9,9] is only 13% bigger and slower than for [2,8,7,8] and it would be interesting to see the former's grid scores, when they are ready.

[2,8,7,7] or [2,8,7,9] or [7,8,2,7] or [7,8,2,9] are also +13% compared to [2,8,7,8].

Heh, my focus is entirely on the low scores spoiling the brew. I've pondered a little on how to calculate a single value that strongly represents the few poor scores while still giving some weight to the bulk of the grid ... but my maths knowledge is pretty limited really.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

A simple thing to try first is add all 256 grid scores in K, so that 16G becomes 16M and the result is guaranteed to fit into 32 bits. Do the best candidates have the highest total score? The minimum of the 256 scores could be recorded separately.
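A rough sketch of that totalling idea, in bash like the rest of the scripts. The function name `grid_total_kb` and the input convention (each score passed as its exponent N, where the score is 2^N bytes and N is at least 10) are assumptions made for the example, not part of the existing tooling.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total and minimum of grid scores, both in KB.
# Each argument is an exponent N (score = 2^N bytes, N >= 10 assumed).
grid_total_kb() {
    local total=0 min=-1 n kb
    for n in "$@"; do
        kb=$(( 1 << (n - 10) ))          # convert bytes to KB
        total=$(( total + kb ))
        if [ "$min" -lt 0 ] || [ "$kb" -lt "$min" ]; then
            min=$kb                      # track the worst score separately
        fi
    done
    printf '%d %d\n' "$total" "$min"
}

# Three made-up scores: 16G, 1G and 256M
grid_total_kb 34 30 28    # prints "18087936 262144"
```

One caveat on the arithmetic: 256 scores of 16M each total exactly 2^32, so the sum fits in 32 bits only if at least one score is below 16G. Bash arithmetic is 64-bit, so the sketch itself is unaffected.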

Because of the exponential scoring, that makes the low scores nearly irrelevant and gives the freak 16G's an overly strong weight. That's the inverse of what I think is needed. I'm guessing a log base 2 will be part of the answer.

Zero frequency provides nothing extra really. For good candidates the results are very similar to pair frequency and for poor candidates the results are all over the shop.

As the scores are all in the form of 2^N we could just use N:
34 for 16G, 33 for 8G, 32 for 4G, 31 for 2G, 30 for 1G, etc.
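The mapping above could be sketched in bash as follows. The helper names `score_to_n` and `grid_average`, and the assumption that scores arrive as strings such as `16G` or `512M` with a power-of-two number, are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert a score like "16G" into its exponent N.
score_to_n() {
    local s=$1 num unit base n=0
    num=${s%[KMGT]}                  # numeric part, e.g. "16"
    unit=${s#"$num"}                 # unit letter, e.g. "G"
    case $unit in
        K) base=10 ;;
        M) base=20 ;;
        G) base=30 ;;
        T) base=40 ;;
        *) return 1 ;;               # unknown or missing unit
    esac
    # num is assumed to be a power of two
    while [ $(( 1 << n )) -lt "$num" ]; do n=$(( n + 1 )); done
    echo $(( base + n ))
}

# Hypothetical helper: sum of N over all scores, plus the average to 3 decimals.
grid_average() {
    local sum=0 count=0 s n milli
    for s in "$@"; do
        n=$(score_to_n "$s") || return 1
        sum=$(( sum + n ))
        count=$(( count + 1 ))
    done
    [ "$count" -gt 0 ] || return 1
    milli=$(( (sum * 1000 + count / 2) / count ))
    printf '%d %d.%03d\n' "$sum" $(( milli / 1000 )) $(( milli % 1000 ))
}

grid_average 16G 1G 512M 256M    # prints "121 30.250"
```

Unlike the total in K, a single low score moves this average noticeably, and the sum for a full 256-cell grid stays small (at most 256 x 34 = 8704).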

Righty-ho, here's a graph of everything gridded so far. I've highlighted XORO32's [14 2 7 5] and the top scoring [3 2 6 9].

PS: Tony, the [6 2 3 9] average of 30.902 differs from your calculation because the grid you have for that candidate has a number of doubled scores in it.

Thanks, Evan! I was slightly out with my total for [6,2,3,9] by 3. I think there is space to show the odd values on the vertical axis of the graph. Is it worth displaying the highest scores as well?

Although xoshiro is not used in the P2, I'd like to know the ** scrambler for the new xoshiro32** (8-bit output) and also why xoshiro40** does not exist.

I was rereading Melissa's initial review of Xoshiro** to see if I could understand the problem with "Invertible Output Functions". What I realised is that Xoroshiro++ probably doesn't suffer from invertibility, or at least not from a simple constant multiplier. The only constant involved in the output scrambler, "D", gets applied to one of two dynamic components prior to summing. This can't be a simple invertible situation.

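To make the point about "D" concrete, here is a minimal sketch of the ++ output scrambler for 16-bit words. It assumes the usual published xoroshiro++ form, output = rotl(s0 + s1, D) + s0; the function names are made up for the example.

```bash
#!/usr/bin/env bash
# Rotate a 16-bit value left by $2 bits.
rotl16() { echo $(( (($1 << $2) | ($1 >> (16 - $2))) & 0xFFFF )); }

# Hypothetical sketch of the ++ scrambler: args are s0, s1 and the constant D.
# D only rotates the sum (s0 + s1); the second term, s0, is added afterwards.
xoro32pp_out() {
    local sum=$(( ($1 + $2) & 0xFFFF ))
    echo $(( ($(rotl16 "$sum" "$3") + $1) & 0xFFFF ))
}

xoro32pp_out 1 0 5    # prints "33": rotl(1, 5) = 32, plus s0 = 1
```

Both terms depend on the state, so there is no fixed multiplier to divide out as there is with the ** scrambler.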
For old 8-bit CPUs that shift/rotate only one bit at a time, four xoroshiro32++ candidates in order of increasing code size, execution time and quality could be:

[2,8,7,8] or [13,9,8,8]
[2,8,7,7] or [7,8,2,7]
[3,2,6,9]

EDIT:
[3,2,6,9] is same size and speed as [14,2,7,5] on the Z80.

Oh, wow! I just made a small change to the culling code so that it uses any pre-existing PractRand report files that a previous run had generated. Which also includes the PR reports from grid scoring. Boy, what a speed up!

Of course, it was a long time before I'd stopped meddling with the basics like what PractRand options to settle on, so maybe this has only recently been a useful feature anyway.

Tony,
Here's the matching complete set of grids and frequency distribution files for the above chart:

I'd be very keen to see what Melissa makes of what we have here. Firstly, putting the Xoroshiro++ algorithm through her expert hands. It seems to me it's a nice upgrade to the original algorithm.

Secondly, explaining the much less* severe, but persistent, problem of reduced scores on the 16-bit sampling aperture that PractRand seems to be highlighting. Sometimes the effect is notable on 4/8/12-bit apertures too, but the 16-bit problem just never goes away, no exceptions. And to repeat, this did not show up with Chris's PRNG algorithm, for example.

*Much less severe compared to the original Xoroshiro+ algorithm. As in 1000:1 better 16-bit scores.

EDIT: I suppose, to be fair, the effect is always a little bit visible in the 4/8/12-bit apertures. I should forget it and move on.

Yes, I do intend to get in touch with Melissa again. Could you do the pair frequency distribution for xoroshiro32+ [3,2,6] first?

There's no doubt that xoroshiro++ is much better than xoroshiro+ and I'd like to send her our distributions for [3,2,6,9] and [3,2,6]. She doesn't like xoshiro very much, nor the simple ** scrambler which she thinks is easy to unscramble. It would be good if she could look at the ++ scrambler.

Is it worthwhile doing the grid tests on xoroshiro32+p [3,2,6] or even xoroshiro32+ [3,2,6]?

... Could you do the pair frequency distribution for xoroshiro32+ [3,2,6] first?
...
Is it worthwhile doing the grid tests on xoroshiro32+p [3,2,6] or even xoroshiro32+ [3,2,6]?

You're messing with my perfectly tuned scripts again! ...

[3,2,6] was nowhere in the earlier xoroshiro+ testing. If the extended test scores are equally bad perhaps +p[14,2,7] or +[14,2,7] could also be done?

Found a bug, added in the last few days, that allowed a rare case of a quadrupled score. ... After scanning some 17k report files I can say that, luckily, the bug hasn't affected any of the completed grids. I picked it out on the latest culling run today.

Have to say it's been a tad hairy trying to suss the logic to manage PractRand's BCFN glitches. Here's the latest decision logic for it:

if [ ! -f "$PRreport" ]; then    # no cached report: run and score the candidate
    "$RNGbin" | stdbuf -o L $PRcommand >"$PRreport"
    if [ $? -ne 0 ]; then
        printf "Aborted ${RNGbin} - PractRand error\n"
        exit 2
    fi
fi
# Output the score to console/logfile.  This is not score extraction for score table.
extract_score
if [ $bcfncnt -lt $failcnt ]; then    # a valid score.  Only the BCFN tests trip up
    printf "[${a} ${b} ${c} ${d}] sampling aperture ${sampsize}>>${samppos} - PractRand score: ${scoresize}B\n"
else    # probably incorrect score due to too sensitive BCFN testing.  Rerun testing from +1 power
    if [ $bcfncnt -gt 2 ]; then
        printf "Passing because "
    else
        printf "BCFNs: $bcfncnt, [${a} ${b} ${c} ${d}] sampling aperture ${sampsize}>>${samppos}, PractRand score: ${scoresize}B - Trying larger ...\n"
        mv "$PRreport" "${PRreport}.tmp"    # keep the first report in case we revert
        scoresizedn=$scoresize
        sizeup=$(( $sizekb * 2 ))
        "$RNGbin" | stdbuf -o L $PRcommand -tlmin "${sizeup}KB" >"$PRreport"
        extract_score
        if [ $sizekb -eq $sizeup ]; then    # recurring fails
            if [ $bcfncnt -lt $failcnt ] || [ $bcfncnt -gt 2 ]; then    # initial score was valid, revert
                rm "$PRreport"
                mv "${PRreport}.tmp" "$PRreport"
                scoresize=$scoresizedn
                printf "BCFNs: $bcfncnt, Fails: $failcnt, Reverted - "
                bcfncnt=0
            else    # two BCFN glitches in a row!  raise another power, rerun again
                printf "BCFNs: $bcfncnt, Fails: $failcnt, ${RNGbin} - Larger again ...\n"
                sizeup=$(( $sizekb * 2 ))
                "$RNGbin" | stdbuf -o L $PRcommand -tlmin "${sizeup}KB" >"$PRreport"
                extract_score
                if [ $sizekb -eq $sizeup ]; then    # initial score was valid, revert
                    rm "$PRreport"
                    mv "${PRreport}.tmp" "$PRreport"
                    scoresize=$scoresizedn
                    printf "BCFNs: $bcfncnt, Fails: $failcnt, Reverted - "
                    bcfncnt=0
                else    # got through it.  Hopefully no other BCFN glitch!
                    rm "${PRreport}.tmp"
                fi
            fi
        else    # new report is normal, BCFN glitch cleared
            rm "${PRreport}.tmp"
        fi
    fi
    if [ $bcfncnt -eq $failcnt ]; then
        printf "BCFN & Fails: $failcnt, "
    fi
    printf "[${a} ${b} ${c} ${d}] sampling aperture ${sampsize}>>${samppos} - PractRand score: ${scoresize}B\n"
fi

Evan, many thanks for the data.

Some thoughts about pair distribution. The C code iterates xoroshiro32++, concatenates successive 16-bit outputs and records how often each possible 32-bit value occurs. If the first 16-bit output is 1, the second 2, etc., then the C code produces the following 32-bit data:

Although the order is different the data are the same, i.e. the pair distribution is the XORO32 output distribution and therefore a good test of the XORO32 randomness. The xoroshiro32++ 16-bit output is equidistributed so that every non-zero value occurs 2^16 times and zero 2^16-1 times, but the XORO32 32-bit output is not equidistributed because the ++ scrambler is only 1-dimensionally equidistributed.
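A small bash sketch of the two pairings, for the first few outputs only. It assumes the standard xoroshiro++ state update with the XORO32 constants [14,2,7,5] quoted earlier, an arbitrary seed of s0=1, s1=0, and that the first output of each pair goes in the low 16 bits; the actual word order in the C code and in XORO32 may differ.

```bash
#!/usr/bin/env bash
A=14 B=2 C=7 D=5       # [a,b,c,d] as quoted for XORO32 in this thread
s0=1 s1=0              # arbitrary non-zero seed (assumption)

# Rotate a 16-bit value left; result is left in $r to avoid a subshell.
rotl16() { r=$(( (($1 << $2) | ($1 >> (16 - $2))) & 0xFFFF )); }

# Advance the generator once; the 16-bit output is left in $out.
next16() {
    rotl16 $(( (s0 + s1) & 0xFFFF )) "$D"
    out=$(( (r + s0) & 0xFFFF ))     # ++ scrambler, computed from the old state
    s1=$(( s1 ^ s0 ))
    rotl16 "$s0" "$A"
    s0=$(( (r ^ s1 ^ (s1 << B)) & 0xFFFF ))
    rotl16 "$s1" "$C"
    s1=$r
}

outs=()
for i in 1 2 3 4 5; do next16; outs+=("$out"); done

# Pair-frequency style: overlapping pairs (o1,o2) (o2,o3) (o3,o4) ...
overlapping=()
for i in 0 1 2 3; do
    overlapping+=( $(( (outs[i+1] << 16) | outs[i] )) )
done

# Double-iteration style: non-overlapping pairs (o1,o2) (o3,o4) ...
double=()
for i in 0 2; do
    double+=( $(( (outs[i+1] << 16) | outs[i] )) )
done

printf 'overlapping: %s\n' "${overlapping[*]}"
printf 'double:      %s\n' "${double[*]}"
```

Because the period, 2^32-1, is odd, a second pass of the double-iteration pairing starts one output later and picks up the pairs it skipped the first time, which is why both approaches cover the same set of 32-bit values.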

So dimensionality, and therefore equidistribution, is just a perspective thing then.

The consecutive generator output data is identical between my code and XORO32. In fact Chip even made a late change back to iterating the engine, post scrambling. The same as I was always sequencing things. Did we double check the output data after that change? I don't remember.

Yes, we did check the data. We calculate the PRN using the state before it is changed, as that is the quickest way in hardware and in software if parallel processing is available.

It is faster for your C code to save the previous output and use it again rather than go through the whole period twice as in XORO32 but otherwise the results are the same.

Continuing from my last but one post, I think we need extra grid tests tailored to the 32-bit XORO32 output. In the table below the first figure is the sample size, followed by the range of numbers of the lsb's in the samples:

The total is 528, of which 527 are new, as [32,0] will be identical to the [16,0] already done. These new tests would use data either from successive outputs or every other output. For example, a 16-bit sample could be the high byte of one output and the low byte of the next, and the PractRand scores should be higher than for one complete output as the equidistribution will be disturbed/destroyed.

EDIT:
The results could be included in the documentation so that users know which is the best subset for each sample size. A shift or rotate could align the data left or right, but this step might not be needed if the data are sent directly to pins.
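As a sanity check on the 528 figure, here is a bash sketch that counts every [size, lsb] window in a 32-bit word and extracts one such sample. It assumes sample sizes of 1 to 32 bits with the lsb position running from 0 to 32 minus the size, which matches the stated total; `sample_bits` is a made-up helper name.

```bash
#!/usr/bin/env bash
# Count every [size, lsb] window that fits inside a 32-bit word.
count=0
for (( size = 1; size <= 32; size++ )); do
    for (( lsb = 0; lsb + size <= 32; lsb++ )); do
        count=$(( count + 1 ))
    done
done
echo "$count"    # prints 528

# Hypothetical helper: extract a sample. Args: 32-bit word, size, lsb position.
sample_bits() {
    local mask=$(( (1 << $2) - 1 ))
    echo $(( ($1 >> $3) & mask ))
}

# A 16-bit sample straddling the two 16-bit halves of 0xAABBCCDD:
# the high byte of the low half (0xCC) and the low byte of the high half (0xBB).
sample_bits $(( 0xAABBCCDD )) 16 8    # prints 48076 (0xBBCC)
```

The shift in `sample_bits` is the alignment step mentioned above; it could be skipped if the bits go straight to pins.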

## Comments


[6 2 5 5] is another second place contender.

http://forums.parallax.com/discussion/comment/1441593/#Comment_1441593

Zero frequency provides nothing extra really. For good candidates the results are very similar to pair frequency and for poor candidates the results are all over the shop.

As the scores are all in the form of 2^N we could just use N:

34 for 16G, 33 for 8G, 32 for 4G, 31 for 2G, 30 for 1G, etc.

Total of 256 scores fits easily in 16 bits.

Selected results based on the scheme above:

Calculated manually. I hope Evan will modify his program from now on!

PS: Tony, the [6 2 3 9] average of 30.902 differs from your calculation because the grid you have for that candidate has a number of doubled scores in it.

EDIT: Updated chart here - https://forums.parallax.com/discussion/download/123206/scores_xo++s16.png

http://www.pcg-random.org/posts/xoshiro-repeat-flaws.html

Although xoshiro is not used in the P2, I'd like to know the ** scrambler for the new xoshiro32** (8-bit output) and also why xoshiro40** does not exist.


[7,10,10,7]

[5,2,6,8]

[6,2,3,8]

Zero frequency can be dropped.


Evan, many thanks for the data.

Some thoughts about pair distribution. The C code iterates xoroshiro32++, concatenates successive 16-bit outputs and records how often each possible 32-bit value occurs. If the first 16-bit output is 1, the second 2, etc., then the C code produces the following 32-bit data:

XORO32 does a double iteration and outputs 32-bit data in the following order:

Although the order is different the data are the same, i.e. the pair distribution is the XORO32 output distribution and therefore a good test of the XORO32 randomness. The xoroshiro32++ 16-bit output is equidistributed so that every non-zero value occurs 2^16 times and zero 2^16-1 times, but the XORO32 32-bit output is not equidistributed because the ++ scrambler is only 1-dimensionally equidistributed.


seed=00000001

Output sequence is:
