I've wiped all the parity-sourced PractRand report files and started the culling and gridding runs afresh.

EDIT: Here are the culled tables, both with and without parity, for comparison. Notably, [11 10 12]'s parity scores mean it shouldn't be included now, but I've thrown it in to fill out the final score chart.

1. Best xoroshiro32+ triple is [14,2,7].
2. Parity trick makes a huge improvement to bit 0.
3. Bit 1 is much better than bit 0 without parity but weaker than higher bits.
4. Improved bit 0 with parity masks relatively weak bit 1 for [7:0].

I'm most interested currently in which 16-bit subsample of double-iterated xoroshiro32++ [14,2,7,5], i.e. the P2 v1 XORO32 instruction, has the highest PractRand score. Frequency test results below suggest it will be [25:10].
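To make the subsample notation concrete, here's a rough bash sketch of double-iterated xoroshiro32++ [14,2,7,5] and of pulling a 16-bit subsample such as [25:10] out of each 32-bit word. It's only an illustration: the ++ scrambler form, rotl(s0 + s1, d) + s0, and the order in which the two iterations are packed into the 32-bit word are assumptions, not checked against the P2 v1 hardware, and the helper names are made up.

```bash
#!/usr/bin/env bash
# Sketch: double-iterated xoroshiro32++ [a,b,c,d] = [14,2,7,5] in bash arithmetic.
# ASSUMPTIONS (not taken from the posts): the ++ output is rotl(s0 + s1, d) + s0,
# and the first 16-bit result of each pair lands in bits [15:0], the second in [31:16].
A=14 B=2 C=7 D=5   # candidate [14,2,7,5]
s0=1 s1=0          # any non-zero seed

rotl16() {  # rotl16 value shift: 16-bit rotate left
    local v=$(( $1 & 0xFFFF ))
    echo $(( ((v << $2) | (v >> (16 - $2))) & 0xFFFF ))
}

xoro_next() {  # one iteration; 16-bit output left in the global $result
    result=$(( ($(rotl16 $(( s0 + s1 )) "$D") + s0) & 0xFFFF ))
    s1=$(( s1 ^ s0 ))
    s0=$(( $(rotl16 "$s0" "$A") ^ s1 ^ ((s1 << B) & 0xFFFF) ))
    s1=$(rotl16 "$s1" "$C")
}

xoro32_double() {  # two iterations packed into one 32-bit word in $word
    local lo
    xoro_next; lo=$result
    xoro_next; word=$(( (result << 16) | lo ))
}

subsample() {  # subsample word msb lsb: bits [msb:lsb] of word
    echo $(( ($1 >> $3) & ((1 << ($2 - $3 + 1)) - 1) ))
}

xoro32_double
printf '[31:0]=%08X [25:10]=%04X\n' "$word" "$(subsample "$word" 25 10)"
# prints [31:0]=50AD0021 [25:10]=2B40 for the seed above
```

A 16-bit PractRand feed for [25:10] would just call `xoro32_double` and `subsample` in a loop and write the results out as binary.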

How much better will the top score be than [15:0] or [31:16], I wonder?

Thanks, Evan. Do you think you might have time this week to do 16-bit double iteration tests for [14,2,7,5]? The results would have practical use for the P2 v1 and FPGA implementations. The three scores needed are [15:0], [31:16] and whichever other one is the highest.

How much info do you want? Just a few PractRand scores? I guess I could hand-build and score a few individual cases one by one.

What are the chances of doing any more later? If this is likely to recur, I should probably work on getting the automation to do it. EDIT: We could grid them all too then.

[15:0], [31:16] and [25:10] if done individually, just to see the difference. An automated grid to test all 16-bit subsamples (and possibly wider ones) could be handy, though.

Damn it. There have still been more BCFN glitches getting through undetected. At least this time there's no need to start over.

Latest scoring logic:

```bash
# extract_score is defined elsewhere in the script; the checks below read
# sizekb, scoresize, bcfncnt and failcnt after each call.
# $PRcommand is left unquoted on purpose so its options word-split.
if [ ! -f "$PRreport" ]; then  # no pre-existing PractRand report, so run and score the case
    "$RNGbin" | stdbuf -o L $PRcommand >"$PRreport"
    if [ $? -ne 0 ]; then
        printf 'Aborted %s - PractRand error\n' "$RNGbin"
        exit 2
    fi
fi
# Output the score to console/logfile. This is not the score extraction for the score table.
extract_score
if [ "$bcfncnt" -eq "$failcnt" ]; then  # probably an incorrect score due to overly sensitive BCFN testing
    if [ "$bcfncnt" -gt 2 ]; then
        printf 'Passing because '
    else  # rerun testing from +1 power
        mv "$PRreport" "${PRreport}.tmp"
        printf 'BCFNs: %s, %s, PractRand score: %sB - Trying larger ...\n' "$bcfncnt" "$RNGbin" "$scoresize"
        scoresizedn=$scoresize
        sizeup=$(( sizekb * 2 ))
        "$RNGbin" | stdbuf -o L $PRcommand -tlmin "${sizeup}KB" >"$PRreport"
        extract_score
        if [ "$sizekb" -eq "$sizeup" ] && [ "$bcfncnt" -gt 0 ]; then  # recurring fails
            if [ "$bcfncnt" -lt "$failcnt" ] || [ "$bcfncnt" -gt 2 ]; then  # initial score was valid, revert
                rm "$PRreport"
                mv "${PRreport}.tmp" "$PRreport"
                printf 'BCFNs: %s, Fails: %s, Reverted - ' "$bcfncnt" "$failcnt"
                scoresize=$scoresizedn
                bcfncnt=0
            else  # two BCFN glitches in a row! Raise another power and rerun again
                printf 'BCFNs: %s, Fails: %s, %s - Larger again ...\n' "$bcfncnt" "$failcnt" "$RNGbin"
                sizeup=$(( sizekb * 2 ))
                "$RNGbin" | stdbuf -o L $PRcommand -tlmin "${sizeup}KB" >"$PRreport"
                extract_score
                if [ "$sizekb" -eq "$sizeup" ]; then  # plain giving up after three tries, revert
                    rm "$PRreport"
                    mv "${PRreport}.tmp" "$PRreport"
                    printf 'BCFNs: %s, Fails: %s, Reverted - ' "$bcfncnt" "$failcnt"
                    scoresize=$scoresizedn
                    bcfncnt=0
                else  # got through it; any more glitches for this case won't be detected on this run
                    rm "${PRreport}.tmp"
                fi
            fi
        else  # revised report is normal, BCFN glitch cleared
            rm "${PRreport}.tmp"
            if [ "$bcfncnt" -lt "$failcnt" ]; then  # report the correction
                printf 'BCFNs: %s, Fails: %s, Accepted - ' "$bcfncnt" "$failcnt"
            fi
        fi
    fi
    if [ "$bcfncnt" -eq "$failcnt" ]; then
        printf 'BCFN & Fails: %s, ' "$failcnt"
    fi
fi
printf '%s, PractRand score: %sB\n' "$RNGbin" "$scoresize"
rm "$RNGbin"
```
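`extract_score` itself isn't shown above. For anyone following the logic, here's a hypothetical stand-in that sets the same variables the script reads (`sizekb`, `scoresize`, `failcnt`, `bcfncnt`). It assumes the PractRand report contains lines like `length= 512 megabytes (2^29 bytes), ...` before each batch of results and flags failed tests with `FAIL`; the real helper may well work differently.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL stand-in for extract_score; the real helper isn't shown.
# Assumed report format: a "length= ... (2^NN bytes), ..." line before each
# batch of results, and failed tests flagged with "FAIL".
extract_score() {
    local exp
    # Exponent from the last "(2^NN bytes)" seen, i.e. the final test length.
    exp=$(sed -n 's/.*(2^\([0-9][0-9]*\) bytes).*/\1/p' "$PRreport" | tail -n 1)
    [ -n "$exp" ] || return 1              # no length line found
    sizekb=$(( 1 << (exp - 10) ))          # final length in KB
    if (( exp >= 30 )); then scoresize="$(( 1 << (exp - 30) ))G"
    elif (( exp >= 20 )); then scoresize="$(( 1 << (exp - 20) ))M"
    else scoresize="$(( 1 << (exp - 10) ))K"
    fi
    # Count FAILs after the last length line, and how many of them are BCFN.
    read -r failcnt bcfncnt < <(awk '
        /length=/ { fails = 0; bcfn = 0 }
        /FAIL/    { fails++; if (/BCFN/) bcfn++ }
        END       { print fails + 0, bcfn + 0 }' "$PRreport")
}
```

With a report ending at 2^29 bytes, this gives `sizekb=524288` and `scoresize=512M`, which the scoring logic prints as `512MB`.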

Bouncing around the walls again? Hehe. Yeah, I went all in on automated testing of the double-iterated generator. The C source was quick to change, although it did take concentration to keep the different word widths properly separated. I definitely needed to be fresh for that.

While working my way through the scripts, things started getting messy enough that I decided to do a clean-up round ... and that proved troublesome due to ripple effects right throughout.

Had a day off. Started a functioning culling run today. It's looking like twice as many passing candidates with the 512 MB threshold. Again, none of the [14 2 7 x] candidates made the grade.

I've only just finished the gridding clean-up right now. I'll have a grid done for [14 2 7 5] first ...

I've done the faster half (even-sized apertures) first:

Full 16x16 single iterated grid: Lowest Exponent = 27, Exponent Average = 30.957

Even 8x16 single iterated grid: Lowest Exponent = 27, Exponent Average = 30.242

Even 16x32 double iterated grid: Lowest Exponent = 26, Exponent Average = 31.126

Note the higher average even though the worst case is lower. And it'll be higher again once the odd-sized apertures are included.

At any rate, PractRand scoring has consistently matched the distribution scores all along. I've been a little surprised by how well your numbers predict it.

I guess you could say we've done a pretty good job of verifying distribution scoring as a rapid candidate selection method. My PractRand-based mass scoring approach has been a good test bed for the proving, but it isn't practical to keep using for engines with longer words. Shorter engines would be fine, though.

I'd like to continue down the current track: finish the double-iteration mass scoring, then also do some testing of XoroshiroXX** to try to get a gauge on the Prop2's free-running generator too.

Full 32x32 double iterated grid: Lowest Exponent = 26, Exponent Average = 31.839.

Compared to an average of 30.957 for the single-iterated grid, that's not far off the average score doubling in value, while the lowest score halves.

So, based on this one candidate, double iterating destabilises the quality a little ... and distribution likewise?
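The "doubled" and "half" remarks follow from the exponents being log2 of the score size: averaging exponents is a geometric mean of sizes, so an exponent gap of x corresponds to a size ratio of 2^x. A small sketch, with a made-up helper name, using awk for the floating-point part:

```bash
#!/usr/bin/env bash
# Exponents are log2 of score sizes, so an exponent gap of x is a size ratio of 2^x.
exp_ratio() {  # exp_ratio from_exponent to_exponent
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", 2 ^ (b - a) }'
}

echo "average ratio: $(exp_ratio 30.957 31.839)"   # full single vs full double grid
echo "lowest ratio:  $(exp_ratio 27 26)"
```

That prints 1.84 for the averages and 0.50 for the lowest exponents.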

Running at about 3 hours per grid (for the easy half). Currently at #4 of 53 candidates, so maybe another week to finish the gridding.
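The week estimate is simple arithmetic. This sketch treats #4 as already finished, which is an assumption:

```bash
#!/usr/bin/env bash
hours_per_grid=3          # about 3 hours per grid (easy half)
total=53 finished=4       # "#4 of 53"; treating #4 as finished is an assumption
remaining_hours=$(( (total - finished) * hours_per_grid ))
echo "about $remaining_hours hours, or $(( remaining_hours / 24 )) days"
```

That comes to 147 hours, a little over 6 days.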

Hmm, one thing that has been getting on my wick is the amount of unusable RAM on PCs these days. It crazily increases with the total RAM installed. This newish Ryzen setup I've got is losing about 600 MB for no apparent reason. Given the CPU has the whole memory controller built in, the losses aren't likely to change with motherboards. EDIT: I'd be okay with 1% of that, say 5 MB unusable.

```text
CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00800F11
AMD Ryzen 7 1700X Eight-Core Processor
Measured - Minimum 3992 MHz, Maximum 3992 MHz
get_nprocs() - CPUs 16, Configured CPUs 16
get_phys_pages() and size - RAM Size 31.41 GB, Page Size 4096 Bytes
...
```

And here it is with one DIMM removed. Now the unusable amount is down to about 340 MB. I totally don't get why it's dynamic at all.

```text
CPU AuthenticAMD, Features Code 178BFBFF, Model Code 00800F11
AMD Ryzen 7 1700X Eight-Core Processor
Measured - Minimum 3992 MHz, Maximum 3993 MHz
get_nprocs() - CPUs 16, Configured CPUs 16
get_phys_pages() and size - RAM Size 15.66 GB, Page Size 4096 Bytes
```
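The "get_phys_pages() and size" figure can also be reproduced from a shell for comparison. A minimal sketch, assuming Linux with glibc's `getconf` (whose `_PHYS_PAGES` goes through the same sysconf call as get_phys_pages()); `INSTALLED_GIB` is a made-up knob for the nominal DIMM total. The shortfall printed is the raw figure, kernel reservations included.

```bash
#!/usr/bin/env bash
# Shell counterpart of the "get_phys_pages() and size" line (Linux, glibc getconf).
# INSTALLED_GIB is an assumption you supply: the nominal DIMM total in GiB.
INSTALLED_GIB=${INSTALLED_GIB:-32}

pages=$(getconf _PHYS_PAGES)      # physical pages as reported by sysconf
page_size=$(getconf PAGE_SIZE)    # 4096 on the machines above
reported_kib=$(( pages * page_size / 1024 ))
installed_kib=$(( INSTALLED_GIB * 1024 * 1024 ))

printf 'reported %d KiB, installed %d KiB, shortfall %d KiB\n' \
    "$reported_kib" "$installed_kib" "$(( installed_kib - reported_kib ))"

# MemTotal in /proc/meminfo is expected to agree with the reported figure.
grep MemTotal /proc/meminfo
```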

Back to random number testing now. I note my CPU cooler needs a dust-out too; packed dust is forming in the top and bottom fins.

EDIT2: Big oops! Those physical RAM sizes weren't actually the amount of addressable DRAM. Asking Google the right question made a huge difference to my understanding. It turns out those numbers exclude the whole Linux kernel space! And that includes a reserved space that scales as a percentage of the physical RAM installed.

User available (16289976 KB) + Kernel reserved (420332 KB) = Kernel available (16710308 KB); Kernel available (16710308 KB) + Kernel code&data (23834 KB) = CPU addressable DRAM (16734142 KB).

User available (32805040 KB) + Kernel reserved (682484 KB) = Kernel available (33487524 KB); Kernel available (33487524 KB) + Kernel code&data (23834 KB) = CPU addressable DRAM (33511358 KB).

Yay! The unusable space is a much smaller, fixed amount of slightly over 42 MB. I can handle that.

EDIT3: Looks like I've assumed too much again. My older PC doesn't conform to the above allocations. It seems that all kernel code and data must reside within the "reserved" space. So the 23834 KB piece is gobbled up, and that leaves a bit over 65 MB unaccounted for. It's only 1.5 MB unaccounted for on the older PC!
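The KB accounting above can be rechecked with plain bash integer arithmetic. This sketch uses only the figures quoted in the thread, with installed DRAM taken as 16 GB = 16777216 KB and 32 GB = 33554432 KB; the helper name `account` is hypothetical.

```bash
#!/usr/bin/env bash
# Recheck the memory accounting with bash integer arithmetic (all figures in KB).
account() {  # account installed user_available kernel_reserved kernel_code_data
    local kernel_available=$(( $2 + $3 ))
    local addressable=$(( kernel_available + $4 ))
    echo "kernel available $kernel_available, addressable $addressable, unaccounted $(( $1 - addressable ))"
}

account 16777216 16289976 420332 23834   # one DIMM (16 GB)
account 33554432 32805040 682484 23834   # both DIMMs (32 GB)

# EDIT3 reading: code&data already inside "reserved", so it is not added again.
echo "unaccounted if code&data sits in reserved: $(( 16777216 - (16289976 + 420332) )) KB"
```

Both DIMM configurations leave 43074 KB (about 42 MB) unaccounted for, and the EDIT3 reading leaves 66908 KB (about 65 MB).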

Looking at the maximum scores, single-iterated has one 16G for 15-bit and several 8G for 7-bit, plus 3-, 2- and 1-bit. Double-iterated has four 32G for 29-bit and one for 23-bit, with more maximums for 15-bit and under than single-iterated, as would be expected.

Some double-iterated scores are lower, e.g. 8-bit [7:0]. Would it be worthwhile appending +/-/= to the double-iterated scores to show how they compare to single?
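One way the +/-/= tagging could be automated, as a hypothetical sketch: convert each score to its power-of-two exponent and compare. The helper names and the K/M/G parsing are assumptions based on how scores are written in this thread (256M, 2G, 32G).

```bash
#!/usr/bin/env bash
# Hypothetical helpers for tagging double-iterated scores with +, - or =.
# Scores are assumed to be written as in this thread: 256M, 2G, 32G, etc.
score_exp() {  # score_exp 512M -> 29, i.e. log2 of the byte count
    local n=${1%[KMG]} e
    case ${1: -1} in
        K) e=10 ;; M) e=20 ;; G) e=30 ;; *) return 1 ;;
    esac
    while (( n > 1 )); do
        n=$(( n >> 1 )); e=$(( e + 1 ))
    done
    echo "$e"
}

mark_score() {  # mark_score double_score single_score
    local d s
    d=$(score_exp "$1") && s=$(score_exp "$2") || return 1
    if (( d > s )); then echo "$1+"
    elif (( d < s )); then echo "$1-"
    else echo "$1="
    fi
}

mark_score 512M 1G   # prints 512M-
```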

As for the distribution, the only one we have is 32-bit [31:0]. As the period is 2^32-1, each non-zero output would occur exactly once, and zero never, if they were equidistributed. That is not the case; the distribution is binomial instead. I don't know what the expected distributions are for fewer than 32 bits. The average frequency for 16-bit is 2^16, but I know of no equation that can predict the distribution, and the results will vary for each lsb, as my tests have shown.
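For the 32-bit case, the binomial model can be tabulated directly. Taking n = 2^32-1 outputs and p = 2^-32 per value, the expected number of 32-bit values seen exactly k times is 2^32 × C(n,k) × p^k × (1-p)^(n-k), which is very close to a Poisson distribution with mean 1. A sketch (the function name is hypothetical) that builds each term from the previous one to avoid huge factorials:

```bash
#!/usr/bin/env bash
# Expected "frequency of frequencies" for 32-bit [31:0] under the binomial model:
# n = 2^32 - 1 outputs, each hitting a given 32-bit value with probability 2^-32.
# Prints k and the expected number of 32-bit values seen exactly k times.
binomial_expected() {  # binomial_expected kmax
    awk -v kmax="$1" 'BEGIN {
        bins = 2 ^ 32; n = bins - 1; p = 1 / bins
        pk = exp(n * log(1 - p))                    # P(k = 0) = (1 - p)^n
        for (k = 0; k <= kmax; k++) {
            printf "%d %.1f\n", k, bins * pk
            pk *= (n - k) / (k + 1) * p / (1 - p)   # P(k + 1) from P(k)
        }
    }'
}

binomial_expected 6
```

The k = 0 and k = 1 rows come out equal (about 1.58 billion values each, i.e. roughly 2^32/e), and each later row shrinks by about a factor of k + 1.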

Running full double-iterated grid tests for all non-culled candidates might not be the best use of the resources at the moment. I'd like to know:

1. Does the 32-bit distribution (pair/XORO32) vary with different lsbs for [14,2,7,5]? The ones to try are 5, 19 or 20, as these have a 2G score whereas 0 only has 256M. The C code would need a new constant to rotate the 32-bit output before incrementing the 4GB byte array.

2. How long does each 32-bit distribution test take? About a minute? I'd prefer to see all the [31:0] pair and zero distributions before any more PractRand grid tests. Which [a,b,c,d] is closest to the ideal? Currently it's [3,2,6,5] for pair frequency but lots of candidates have not been tested yet.

3. What is the distribution for scro's generator with 32-bit output and max score of 32G? Is it a binomial?
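For question 2, one possible "closest to the ideal" figure is the sum over k of |Actual - Expected| / Expected, the same per-row measure as the |Actual-Expected|/Expected frequency table. Using the plain sum as the ranking score is an assumption rather than an agreed method, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical closeness score: sum over k of |Actual - Expected| / Expected.
# Input lines: "k actual expected"; rows with expected = 0 are skipped.
distance() {
    awk '$3 > 0 { d += ($2 > $3 ? $2 - $3 : $3 - $2) / $3 }
         END    { printf "%.6f\n", d }'
}

# Toy input, not real results: two rows 10% off and one exact give 0.200000.
printf '0 110 100\n1 90 100\n2 50 50\n' | distance
```

Lower is closer to the binomial expectation, so candidates could be sorted on this one number.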

## Comments

Thanks, Evan. The [14,2,7]+p pair/XORO32 distribution is better than [14,2,7]+. Table updated at:

http://forums.parallax.com/discussion/comment/1441593/#Comment_1441593

[Table not reproduced: PractRand scores, single iteration]

[Table not reproduced: Pair/XORO32 frequency, double iteration, Actual and Expected]

[Table not reproduced: Pair/XORO32 frequency, double iteration, |Actual-Expected|/Expected]

Is it Thursday already? Time flies!

The subsamples with the greatest frequency variations have the lowest PractRand scores. This is the opposite of what I had predicted.

I gather you've got access to another computer to generate those numbers with.

Here are some corrected numbers:

User available (16289976 KB) + Kernel reserved (420332 KB) = Kernel available (16710308 KB); Kernel available (16710308 KB) + Kernel code&data (23834 KB) = CPU addressable DRAM (16734142 KB).

Installed DRAM (16 GB = 16777216 KB) - Addressable (16734142 KB) = 43074 KB unaccounted for.

User available (32805040 KB) + Kernel reserved (682484 KB) = Kernel available (33487524 KB); Kernel available (33487524 KB) + Kernel code&data (23834 KB) = CPU addressable DRAM (33511358 KB).

Installed DRAM (32 GB = 33554432 KB) - Addressable (33511358 KB) = 43074 KB unaccounted for.

Evan, I added your PractRand scores. I can test low and high frequencies up to 16 bits.

EDIT: Or was it based on period length? I never really studied the purpose.

EDIT2: The array index is formed from the pairing, so a 32-bit output word would require 64-bit indexes.
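A quick sketch of why the pairing blows up: two consecutive w-bit outputs form a 2w-bit index, so a byte-per-entry frequency array needs 2^(2w) entries. Putting the first output in the low half is an assumption, and the helper names are hypothetical.

```bash
#!/usr/bin/env bash
pair_index() {  # pair_index first second width: first output in the low half (assumed)
    echo $(( ($2 << $3) | $1 ))
}

array_entries() {  # array_entries width: one byte counter per possible pair
    local bits=$(( 2 * $1 ))
    if (( bits >= 63 )); then
        echo "2^$bits (too big for 64-bit shell arithmetic, let alone RAM)"
    else
        echo $(( 1 << bits ))
    fi
}

array_entries 16   # 4294967296, the 4GB byte array
array_entries 32   # 2^64: needs a 64-bit index
```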
