If users want, say, 16-bit PRNs with equidistribution, they should use both low and high words of the XORO32 output. If they want 16-bit PRNs without equidistribution, they should use the 16-bit subset of the 32-bit output with the best PractRand score, which is not known yet.
On that basis, we'd need to produce a distribution run for each and every score on the existing 16x16 grid before even starting on double spacing it.
The existing grid doesn't really test XORO32, which is why I suggested the new tests earlier today. We would need to compare the PractRand scores with the distributions and I might be able to do the latter. There are 17 possible 16-bit tests and maybe we should run those for [14,2,7,5] as that's what's in the P2?
For contiguous 16-bit subsets, the number of the lsb is 0-16 thus 17 tests.
We've used PractRand as a comparison tool to choose [a,b,c,d]. That's fixed now in the P2 to [14,2,7,5] so we could just do 17 distributions but I'd like to see how they compare to the PractRand scores and what the highest score is. Taking eight bits from one output and eight from another [lsb = 8] might be the least correlated and hence most random.
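For reference, a minimal C sketch (added here for illustration, not code from the thread) of pulling those subsets out of a 32-bit XORO32-style result, assuming the two successive 16-bit outputs are packed low word then high word into the 32-bit value:

#include <stdint.h>

/* xoro32_result = one 32-bit XORO32 output, assumed to pack two
   successive 16-bit xoroshiro32++ outputs as low word then high word. */

static uint16_t low_word( uint32_t xoro32_result )
{
    return (uint16_t)xoro32_result;            /* bits [15:0]  */
}

static uint16_t high_word( uint32_t xoro32_result )
{
    return (uint16_t)(xoro32_result >> 16);    /* bits [31:16] */
}

/* Contiguous 16-bit subset with its least significant bit at position
   lsb, lsb = 0..16, i.e. the 17 possible tests mentioned above.
   lsb = 8 takes eight bits from each of the two outputs. */
static uint16_t subset16( uint32_t xoro32_result, unsigned lsb )
{
    return (uint16_t)(xoro32_result >> lsb);   /* bits [lsb+15:lsb] */
}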
I've been working on your earlier requests. I gather you want me to change tack.
PS: [14 2 7 5] isn't necessarily fixed. I'm happy to keep the options open for a revision.
I'm just outputting ideas as they appear!
Most, if not all, of the development here recently has been aimed at finding the very best [a,b,c,d], and [3,2,6,9] appears to be a tiny bit better than [14,2,7,5]. The first version of the P2 will use the latter and there is testing we can do just on that, but I'm not saying we should ignore [3,2,6,9] or others.
I've resurrected the single-summing scrambler functionality. It had been disabled and unmaintained for many months.
Gotta say, auto-culling went through real quick! I don't remember it being that little effort historically. I guess it's a case of so many of the unknowns being sorted out and built into the automation now.
I've run tests to see the frequency distribution of some double-iterated xoroshiro32++ outputs, as in the XORO32 instruction, using sample sizes of 8/12/16 bits and various, but not all, of the possible lsbs. As the PractRand FPF test can detect equidistribution, it's likely that the best PractRand scores will occur when the equidistribution is disturbed the most. To detect this I looked for the lowest and highest frequencies, i.e. those furthest from the equidistributed values, and here are the results:
There is no effect on the equidistribution if the sample lies entirely within one xoroshiro32++ output; to disturb the equidistribution the sample must contain data from both outputs. For an equal mix of 8/12/16-bit samples the lsb is 12/10/8. However, the results above show that a half-and-half mix does not lead to the biggest variances, which are very small for 8 or 12 bits, so any PractRand tests should concentrate on 16 bits. To make things clear, the [3,2,6,9] sample above is for bits [24:9] of the double-iterated output.
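As a rough guide to what such a frequency check involves, here is a standalone C sketch (added for illustration, not the program that produced the results above). next_sample32() is only a stand-in for the real double-iterated xoroshiro32++ step; BITS and LSB pick the sample window:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BITS     16u                   /* sample size: 8, 12 or 16 bits   */
#define LSB      9u                    /* window position within [31:0]   */
#define SAMPLES  (1ull << 28)          /* a full period is 2^32-1 samples */

static uint32_t next_sample32(void)    /* stand-in source only; replace   */
{                                      /* with the real double-iterated   */
    static uint32_t x = 1;             /* xoroshiro32++ step              */
    return x = x * 1664525u + 1013904223u;
}

int main(void)
{
    uint64_t *count = calloc(1u << BITS, sizeof *count);
    uint64_t  lo = UINT64_MAX, hi = 0;

    if (!count) return 1;

    for (uint64_t i = 0; i < SAMPLES; i++)
        count[(next_sample32() >> LSB) & ((1u << BITS) - 1u)]++;

    for (uint32_t v = 0; v < (1u << BITS); v++) {
        if (count[v] < lo) lo = count[v];
        if (count[v] > hi) hi = count[v];
    }

    /* Perfect equidistribution would put SAMPLES / 2^BITS in every bin;
       lo and hi show how far the extremes stray from that value. */
    printf("expected %llu  lowest %llu  highest %llu\n",
           (unsigned long long)(SAMPLES >> BITS),
           (unsigned long long)lo, (unsigned long long)hi);

    free(count);
    return 0;
}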
Here are the 10 grids for Xoroshiro32+ and a collection of distribution reports for Xoroshiro32+ and 32+p that I generated a couple of days back. I still have some scripting fixes to make before I can do the grid run for 32+p without overwriting what I've got now.
A few comments:
1. Rotating the xoroshiro32+ 16-bit samples one bit to the right, to move bit 0 out of the lsb, produces a considerably improved score. Still not great, but I think this shows that PractRand gives extra importance to bit 0, as has probably been mentioned before. (A tiny sketch of the rotation follows this list.)
2. The high byte score of 512M for xoroshiro32+ [14,2,7] is not bad at all, and the 8-bit scores in general show how much better the higher bits are than the lower ones.
3. The pair distributions for xoroshiro32+ and xoroshiro32+p are identical for three sets of constants I've looked at. Is the parity code correct?
4. Pair distribution for xoroshiro32+ [3,2,6] is terrible, which highlights the huge quality gain possible with the extra rotation and addition in the ++ scrambler.
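Regarding point 1, this is the sort of one-bit rotation meant, sketched in C (illustrative only, not the actual test harness):

#include <stdint.h>

/* Rotate a 16-bit sample right by one bit, so that generator bit 0
   ends up in bit 15 of the value handed to PractRand. */
static inline uint16_t ror16_1( uint16_t x )
{
    return (uint16_t)((x >> 1) | (x << 15));
}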
I've had a look at the code itself and can't see what could be wrong. The report files are correctly showing the p and not-p cases, and that gets generated by the C code directly, so the logic seems to be working as intended.
Having said that, I've just gone back and compared generator output data to https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 and found that neither fits, with or without parity! Both have occasional single bit flips, at different places, which at least shows the parity control is functioning and the engine is fine. I guess my "+" single-summing scrambler must have a bug.
I've checked my XORO32 output data twice today with no issues showing for the ++ double summing scrambler.
The parity trick inverts bit 0 50% of the time and it's possible these inversions "cancel out" to give identical distributions.
I'll assume the discrepancy with the parity data is acceptable, given it has never had a thorough workout in the past. I'm gridding this at the moment.
I've had some new scripting bugs with the extra logic for the parity switches, so I've had to rerun things a bit. It turns out I hadn't previously been doing any string-based conditional logic and got it quite wrong on the first attempt. This meant all the newly added parity logic could never test true, so it defaulted to false at all times.
This didn't affect the frequency distribution code - it's separate and nearly all C.
Yep, I'm happy with my Xoroshiro generator code. Have no reason to doubt it at this stage. Here's the actual C source for the frequency distribution reports:
The few lines of code below the #ifdef PARITY_BIT are all that was added for parity handling, and they were copied verbatim from the old testing code from back when it was being used.
And here's my Xoroshiro32+p [14 2 7]. It definitely has bit zero discrepancies from the linked data.
3 2 1 0 7 6 5 4 b a 9 8 f e d c
===================================
40840001 2c7c5530 a249769f 202ddffc
5acaf2f0 d1d04dc9 0599adcc 80941b30
d6a4540d 2eac9c9d cf8635b4 be51545c
d2eb95d5 2f5433e6 36541dcd 66037010
ce5c08b0 d5213ac4 b4008e98 0521dced
aaa50f40 3d16f177 b5135ee2 f4cd5688
3952cd39 8b3b3b7c a270c928 483587d0
057fd4c1 86d4f024 8fc96f89 fe23172d
269ebb57 dc34446e b662b6ca c2e8e237
f8adc930 137e6131 32a84dbd 96f80fad
...
These are not correct; half the bit 0 values are wrong. I tested the pair frequency distributions of 00000000-000000FF for xoroshiro32+ and xoroshiro32+p and they are different.
I'm most interested currently in which 16-bit subsample of double-iterated xoroshiro32++ [14,2,7,5], i.e. the P2 v1 XORO32 instruction, has the highest PractRand score. The frequency test results above suggest it will be [25:10].
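One way to check each candidate window is to stream it straight into PractRand. A minimal illustrative sketch (not the actual tooling), where next_sample32() again stands in for the real XORO32 step:

#include <stdint.h>
#include <stdio.h>

extern uint32_t next_sample32(void);   /* the 32-bit double-iterated output */

int main(void)
{
    const unsigned lsb = 10;           /* e.g. candidate window [25:10]     */

    for (;;) {
        uint16_t sample = (uint16_t)(next_sample32() >> lsb);
        fwrite(&sample, sizeof sample, 1, stdout);   /* raw binary stream   */
    }
}

Piped along the lines of ./window | ./RNG_test stdin16, if that stdin option is available in your PractRand build.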
Ha, I understand the bug now, and even why it occurred during this resurrection. It's the "rngword_t" data type declaration I introduced recently: for a 16-bit shifter it is only 16 bits. This was changed from everything being 64-bit with masks, to improve execution speed a little.
The parity calculation relies on an extended carry existing, but the current 16-bit data type has no room for that carry. I think I'll switch back to the original 64-bit data type.
Didn't have to throw away any of the optimising in the end either. The solution was to split the carry extraction out from the result summing so it could be pushed down into the regular data type. Here's the critical C foo:
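(The snippet referred to above isn't reproduced in this extract. Purely as an illustration of the idea described, not the actual code: with a 16-bit rngword_t the sum wraps, so the carry out of bit 15 has to be recovered from the wrapped result rather than read from bit 16 of a wider sum, e.g.)

#include <stdint.h>

typedef uint16_t rngword_t;            /* the narrow data type in question  */

/* Hypothetical helper: sum two state words and recover the carry
   separately, so no 64-bit intermediate is needed.  The carry can then
   feed the parity handling mentioned above. */
static rngword_t sum_with_carry( rngword_t s0, rngword_t s1, rngword_t *carry )
{
    rngword_t result = (rngword_t)(s0 + s1);    /* wraps modulo 2^16         */

    *carry = (rngword_t)(result < s0);          /* 1 iff bit 16 would be set */
    return result;
}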
I've wiped all the parity-sourced PractRand report files and started the culling and gridding runs afresh.
EDIT: Here are the culled tables, both with and without parity, for comparison. Notably, [11 10 12]'s parity scores mean it shouldn't be included now, but I've thrown it in to fill the final score chart.
Table here:
http://forums.parallax.com/discussion/comment/1441863/#Comment_1441863
I'm running ten candidate grids right now:
EDIT: Flawed data removed
EDIT2: Corrected distribution data - https://forums.parallax.com/discussion/comment/1441973/#Comment_1441973
The data at https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 are correct for xoroshiro32+p [14,2,7] with PRN calculated after state is iterated.
Correct Xoroshiro32+ [14 2 7] output data is here - https://forums.parallax.com/discussion/comment/1420802/#Comment_1420802
Here's the matching 32-bit text formatting from my compile:
We can declare it all good again, I presume.
Refreshed single-summing grid scores coming ...
EDIT: Flawed data removed
EDIT2: Revised graph - https://forums.parallax.com/discussion/comment/1442007/#Comment_1442007
Worked out as:
+ 8 lines of 16 GB scores (aperture 9-16 bits)
+ 4 lines of 8 GB scores (aperture 5-8 bits)
+ 2 lines of 4 GB scores (aperture 3-4 bits)
+ 1 line of 2 GB scores (aperture 2 bits)
+ 1 line of 1 GB scores (aperture 1 bit)
How much better will the top-scoring subsample be than [15:0] or [31:16], I wonder?
Here are completely updated frequency distribution reports for the ten candidates:
Correct.