If users want, say, 16-bit PRNs with equidistribution, they should use both low and high words of the XORO32 output. If they want 16-bit PRNs without equidistribution, they should use the 16-bit subset of the 32-bit output with the best PractRand score, which is not known yet.
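For concreteness, here is a minimal C model of that usage. The ++ output form used below, rotl16(s0 + s1, 5) + s0, follows Vigna's xoroshiro++ construction, and packing the first iteration into the low word is also an assumption for illustration; only the [14,2,7,5] constants come from this thread, so don't treat the sketch as the exact silicon behaviour.

    #include <stdint.h>
    #include <stdio.h>

    static uint16_t s0 = 1, s1 = 0;     /* arbitrary non-zero seed, not the seed behind the dumps in this thread */

    static uint16_t rotl16(uint16_t x, int k)
    {
        return (uint16_t)((x << k) | (x >> (16 - k)));
    }

    /* One xoroshiro32++ [14,2,7,5] step.  A later post says the thread's convention
       computes the PRN after iterating the state; doing it before, as here, only
       offsets the stream by one step. */
    static uint16_t xoroshiro32pp(void)
    {
        uint16_t result = (uint16_t)(rotl16((uint16_t)(s0 + s1), 5) + s0);
        s1 ^= s0;
        s0 = (uint16_t)(rotl16(s0, 14) ^ s1 ^ (uint16_t)(s1 << 2));
        s1 = rotl16(s1, 7);
        return result;
    }

    /* XORO32-style result: two successive 16-bit outputs packed into 32 bits. */
    static uint32_t xoro32(void)
    {
        uint32_t first  = xoroshiro32pp();
        uint32_t second = xoroshiro32pp();
        return (second << 16) | first;   /* packing order assumed */
    }

    int main(void)
    {
        uint32_t x   = xoro32();
        uint16_t lo  = (uint16_t)x;          /* equidistributed 16-bit PRN */
        uint16_t hi  = (uint16_t)(x >> 16);  /* equidistributed 16-bit PRN */
        int      lsb = 8;                    /* 0..16 */
        uint16_t mix = (uint16_t)(x >> lsb); /* 16-bit subset straddling both words */
        printf("%08x -> lo %04x  hi %04x  bits [%d:%d] %04x\n",
               (unsigned)x, lo, hi, lsb + 15, lsb, mix);
        return 0;
    }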
On that basis, we'd need to produce a distribution run for each and every score on the existing 16x16 grid before even starting on double spacing it.
The existing grid doesn't really test XORO32, which is why I suggested the new tests earlier today. We would need to compare the PractRand scores with the distributions and I might be able to do the latter. There are 17 possible 16-bit tests and maybe we should run those for [14,2,7,5] as that's what's in the P2?
Table here:
http://forums.parallax.com/discussion/comment/1441863/#Comment_1441863
For contiguous 16-bit subsets, the number of the lsb is 0-16, thus 17 tests.
We've used PractRand as a comparison tool to choose [a,b,c,d]. That's fixed now in the P2 to [14,2,7,5] so we could just do 17 distributions but I'd like to see how they compare to the PractRand scores and what the highest score is. Taking eight bits from one output and eight from another [lsb = 8] might be the least correlated and hence most random.
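The PractRand runs in this thread feed the generator into stdin (see the "options: stdin" line in the score tables), so testing a particular 16-bit aperture only needs a small filter in the pipe. A rough sketch; the program names other than RNG_test are placeholders:

    /* aperture.c - pick out one contiguous 16-bit window of a 32-bit PRN stream.
       Usage sketch:
         ./xoro32_stream | ./aperture 8 | RNG_test stdin -multithreaded -te 1 -tf 2 -tlmin 1KB */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int lsb = (argc > 1) ? atoi(argv[1]) : 8;   /* 0..16 gives the 17 cases */
        if (lsb < 0 || lsb > 16)
            return 1;

        uint32_t x;
        while (fread(&x, sizeof x, 1, stdin) == 1) {
            uint16_t w = (uint16_t)(x >> lsb);      /* bits [lsb+15:lsb] */
            fwrite(&w, sizeof w, 1, stdout);
        }
        return 0;
    }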
I've been working on your earlier requests. I gather you want me to change tack.
PS: [14 2 7 5] isn't necessarily fixed. I'm happy to keep the options open for a revision.
I'm just outputting ideas as they appear!
Most if not all of the development here recently has been aimed at finding the very best [a,b,c,d], and [3,2,6,9] appears to be a tiny bit better than [14,2,7,5]. The first version of the P2 will use the latter and there is testing we can do just on that, but I'm not saying we should ignore [3,2,6,9] or others.
I've resurrected the single-summing scrambler functionality. It had been disabled and unmaintained for many months.
Gotta say, auto-culling went through real quick! I don't remember it being that little effort historically. I guess it's a case of so many of the unknowns now being sorted out and built into the automation.
Here's the 10 grids for Xoroshiro32+ and a collection of distribution reports for Xoroshiro32+ and 32+p that I generated a couple days back. I still have to do some scripting fixes to do the grid run for 32+p without overwriting what I've got now.
A few comments:
1. Rotating the xoroshiro32+ 16-bit samples one bit to the right, to move bit 0 out of the lsb, produces a considerably improved score. Still not great, but I think this shows that PractRand gives extra weight to bit 0, as has probably been mentioned before. (A one-line sketch of this rotation follows the list.)
2. The high byte score of 512M for xoroshiro32+ [14,2,7] is not bad at all and the 8-bit scores in general show how much better the higher bits are than the lower ones.
3. The pair distributions for xoroshiro32+ and xoroshiro32+p are identical for three sets of constants I've looked at. Is the parity code correct?
4. Pair distribution for xoroshiro32+ [3,2,6] is terrible, which highlights the huge quality gain possible with the extra rotation and addition in the ++ scrambler.
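For reference, the right-rotation meant in comment 1 is nothing more than the following; a sketch only, not the actual test rig:

    #include <stdint.h>

    /* Rotate a 16-bit sample right by one bit, so the weak bit 0 is moved out of
       the lsb position that PractRand appears to weight most heavily. */
    static uint16_t rotr16_1(uint16_t x)
    {
        return (uint16_t)((x >> 1) | (x << 15));
    }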
I've had a look at the code itself and can't see what could be wrong. The report files correctly show p and not-p, and those labels are generated directly by the C code, so the logic seems to be working as intended.
Having said that, I've just gone back and compared generator output data to https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 and found that neither, with or without parity, fits! Both have occasional single-bit flips, at different places, which at least shows the parity control is functioning and the engine is fine. I guess my "+" single-summing scrambler must have a bug.
I've checked my XORO32 output data twice today with no issues showing for the ++ double summing scrambler.
The parity trick inverts bit 0 50% of the time and it's possible these inversions "cancel out" to give identical distributions.
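In code, the "+p" idea is roughly the following. This is a sketch of the scrambler only, and it assumes, per the carry-extraction line further down the thread, that the "parity" bit is simply the carry out of the 16-bit sum:

    #include <stdint.h>

    /* "+" scrambler: low 16 bits of s0 + s1.
       "+p" sketch: XOR the carry out of that sum into bit 0, i.e. invert bit 0
       whenever s0 + s1 overflows 16 bits - which happens about half the time. */
    static uint16_t scramble_plus_p(uint16_t s0, uint16_t s1)
    {
        uint32_t sum   = (uint32_t)s0 + s1;     /* 17 significant bits */
        uint32_t carry = sum >> 16;             /* 0 or 1 */
        return (uint16_t)(sum ^ carry);
    }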
I'll assume the discrepancy with the parity data is acceptable given it's never been given a thorough workout in the past. I'm gridding this at the moment.
I've had some new scripting bugs with the extra logic for the parity switches, so I've had to rerun things a bit. It turns out I hadn't previously done any string-based conditional logic and got it quite wrong on the first attempt. This meant all the newly added parity logic could never test true, so it defaulted to false at all times.
This didn't affect the frequency distribution code - it's separate and nearly all C.
Yep, I'm happy with my Xoroshiro generator code. Have no reason to doubt it at this stage. Here's the actual C source for the frequency distribution reports:
The few lines of code below the #ifdef PARITY_BIT are all that was added for parity handling, and they were copied verbatim from the old testing code from back when it was being used.
And here's my Xoroshiro32+p [14 2 7]. It definitely has bit zero discrepancies from the linked data.
 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40840001 2c7c5530 a249769f 202ddffc
5acaf2f0 d1d04dc9 0599adcc 80941b30
d6a4540d 2eac9c9d cf8635b4 be51545c
d2eb95d5 2f5433e6 36541dcd 66037010
ce5c08b0 d5213ac4 b4008e98 0521dced
aaa50f40 3d16f177 b5135ee2 f4cd5688
3952cd39 8b3b3b7c a270c928 483587d0
057fd4c1 86d4f024 8fc96f89 fe23172d
269ebb57 dc34446e b662b6ca c2e8e237
f8adc930 137e6131 32a84dbd 96f80fad
...
These are not correct; half the bit 0 values are wrong. I tested the pair frequency distributions of 00000000-000000FF for xoroshiro32+ and xoroshiro32+p and they are different.
Ha, I understand the bug now, and even why it has occurred during this resurrection. It's the "rngword_t" data type declaration I introduced recently: for a 16-bit shifter it is only 16 bits wide. This was changed from everything being 64-bit with masks, to improve execution speed a small amount.
The parity calculation relies on an extended carry existing, but the current 16-bit data type has no room for that carry. I'll switch back to the original 64-bit data type, I think.
    0001 4084 5530 2c7c 769f a249 dffc 202c
    f2f0 5acb 4dc8 d1d0 adcd 0598 1b31 8094
    540d d6a5 9c9d 2ead 35b5 cf86 545d be50
    95d4 d2ea 33e7 2f55 1dcc 3655 7011 6602
    08b1 ce5c 3ac5 d521 8e98 b400 dcec 0520
    0f41 aaa5 f177 3d17 5ee3 b513 5688 f4cd
    cd39 3953 3b7d 8b3b c928 a270 87d0 4834
    d4c1 057e f024 86d5 6f89 8fc9 172c fe23
    bb56 269f 446f dc34 b6cb b662 e237 c2e8
    c930 f8ad 6131 137f 4dbc 32a8 0fac 96f9
    ...
Didn't have to throw away any of the optimising in the end either. The solution was to split out the carry extraction separately from the result summing, so it could be pushed down into the regular data type. Here's the critical C foo:
    rngword_t parity = ((unsigned __int128)(s0 + s1) >> ACCUM_SIZE) & 1;
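As a self-contained illustration of why the 16-bit rngword_t broke the parity (my reading of the description above, not the project's code): summing in the narrow type silently discards the carry, so any parity derived from that sum is stuck at zero, whereas widening just for the carry extraction recovers it while the result sum stays in the regular word type.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t rngword_t;                 /* the narrow type from the bug */

    int main(void)
    {
        rngword_t s0 = 0xC000, s1 = 0x9000;     /* s0 + s1 = 0x15000, carry set */

        /* Broken pattern: the sum is stored in the 16-bit type first, so the
           carry is gone before any parity extraction can see it. */
        rngword_t sum16 = (rngword_t)(s0 + s1);           /* 0x5000 */
        rngword_t lost  = (rngword_t)((sum16 >> 16) & 1); /* always 0 */

        /* Split pattern: widen only for the carry, keep the result narrow. */
        uint32_t  wide   = (uint32_t)s0 + s1;             /* 0x15000 */
        rngword_t carry  = (rngword_t)(wide >> 16);       /* 1 */
        rngword_t result = (rngword_t)wide;               /* 0x5000 */

        printf("sum16=%04X lost=%X carry=%X result=%04X\n",
               (unsigned)sum16, (unsigned)lost, (unsigned)carry, (unsigned)result);
        return 0;
    }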
I've wiped all the parity-sourced PractRand report files and started the culling and gridding runs afresh.
I'm running ten candidate grids right now:
__________________________________________________________________________________________________________
 Xoroshiro32(16)+ PractRand Score Table.  Build 2018-07-06 03:34:09 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-06 03:08:16 to 2018-07-06 03:33:18.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   256K  512M  512M  256M  256M  256M  128M  256M   16M   64K
 [13  5  8]   256K  128M   64M  128M  128M   64M  128M  256M   16M   64K
 [10  3 11]   256K  256M  256M  256M  256M  256M  256M  128M   16M   64K
 [15  3  6]   256K  256M  512M  256M  512M  256M  256M  256M   16M   64K
 [11 10 12]   256K   64M  128M  128M  256M  128M   64M  256M   16M   64K
 [ 3  2  6]     2M    1M   64K
 [ 6  2  3]   512K  512K   64K
 [10  7 11]   256K  256M  256M  256M   64M   64M   64M  256M   16M   64K
 [ 8  9 13]   256K  256M  256M  256M  128M  512M  256M  512M   16M   64K
 [13  9  8]   256K  256M  256M   64M  128M  256M  256M   64M   16M   64K
I've run tests to see the frequency distribution of some double-iterated xoroshiro32++ outputs, as in the XORO32 instruction, using sample sizes of 8/12/16 bits and various, but not all, of the possible lsbs. As the PractRand FPF test can detect equidistribution, it's likely that the best PractRand scores will occur when the equidistribution is disturbed the most. To detect this I looked for the lowest and highest frequencies, i.e. those furthest from the equidistributed values, and here are the results:
#  a,  b,  c,  d, size, lsb, low freq, high freq
  14,  2,  7,  5,    8,  15,  0FFFC12,   10003EE
  14,  2,  7,  5,   12,  14,   0FF50E,    100CE3
  14,  2,  7,  5,   16,  10,    0F64A,     10A52
   3,  2,  6,  9,   16,   9,    0F815,     1092D
There is no effect on the equidistribution if the sample is entirely within one xoroshiro32++ output; to disturb the equidistribution, the sample must contain data from both outputs. For an equal mix, the 8/12/16-bit samples have lsb 12/10/8 respectively. However, the results above show that a half-and-half mix does not lead to the biggest variances, which are very small for 8 or 12 bits, so any PractRand tests should concentrate on 16 bits. To make things clear, the [3,2,6,9] sample above is for bits [24:9] of the double-iterated output.
EDIT: Flawed data removed
EDIT2: Corrected distribution data - https://forums.parallax.com/discussion/comment/1441973/#Comment_1441973
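For reference, the counting behind those low/high frequency figures can be sketched as a small filter: read raw 32-bit values (from whatever models the double-iterated generator), bucket the 16-bit sample at a chosen lsb over one full period, and report the extreme bucket counts, which land near 0x10000 as in the table above. The 8 and 12-bit cases work the same way with smaller buckets; program names here are placeholders.

    /* freqdist.c - frequency distribution of one 16-bit window of a 32-bit
       PRN stream.  Reads raw 32-bit values on stdin until EOF, e.g.
         ./xoro32_stream_fullperiod | ./freqdist 10                      */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int lsb = (argc > 1) ? atoi(argv[1]) : 10;   /* 0..16 */
        if (lsb < 0 || lsb > 16)
            return 1;

        static uint64_t count[65536];                /* one bucket per 16-bit value */
        uint32_t x;
        while (fread(&x, sizeof x, 1, stdin) == 1)
            count[(uint16_t)(x >> lsb)]++;

        uint64_t lo = UINT64_MAX, hi = 0;
        for (int i = 0; i < 65536; i++) {
            if (count[i] < lo) lo = count[i];
            if (count[i] > hi) hi = count[i];
        }
        printf("lsb %d: low freq %llX, high freq %llX\n",
               lsb, (unsigned long long)lo, (unsigned long long)hi);
        return 0;
    }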
The data at https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 are correct for xoroshiro32+p [14,2,7] with PRN calculated after state is iterated.
Correct Xoroshiro32+ [14 2 7] output data is here - https://forums.parallax.com/discussion/comment/1420802/#Comment_1420802
Here's the matching 32-bit text formatting from my compile:
 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40850001 2c7c5530 a248769f 202ddffd
5acaf2f1 d1d14dc8 0598adcd 80941b30
d6a4540c 2eac9c9d cf8735b4 be51545d
d2ea95d5 2f5433e7 36551dcd 66027010
ce5d08b0 d5213ac5 b4008e99 0520dced
aaa40f41 3d16f177 b5125ee3 f4cc5688
3953cd39 8b3b3b7c a270c928 483487d1
057fd4c1 86d5f024 8fc96f89 fe22172c
269ebb57 dc34446f b662b6cb c2e9e237
f8acc930 137f6130 32a84dbc 96f90fad
...

 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40840001 2c7c5530 a249769f 202ddffc
5acaf2f0 d1d04dc9 0599adcc 80941b30
d6a4540d 2eac9c9d cf8635b4 be51545c
d2eb95d5 2f5433e6 36541dcd 66037010
ce5c08b0 d5213ac4 b4008e98 0521dced
aaa50f40 3d16f177 b5135ee2 f4cd5688
3952cd39 8b3b3b7c a270c928 483587d0
057fd4c1 86d4f024 8fc96f89 fe23172d
269ebb57 dc34446e b662b6ca c2e8e237
f8adc930 137e6131 32a84dbd 96f80fad
...

 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
50ad0021 a3d9b89a 9bd90c87 35023caa
3d248840 71f57287 a81890ac 2ce3b4c1
b31dcc58 b7a4df3d 3b2ea2e3 bfff2c20
5b9aafe9 35f43bb8 6a2d921c a217cd9a
739c207f 168ac6ac e78f2142 428f7b93
a241e7e6 2be7766d e99cd031 5b40e27d
06d42ff9 b4546543 85699088 881a5ad3
568ed3d8 773eb4c6 4e43fbfb 613629f7
a6d157d0 2af38417 1d1dcdf8 1c547b19
c847866b 1d3330c2 6e954560 9f64ba4e
...
We can declare all as good again I presume.
Refreshed single-summing grid scores coming ...
EDIT: Flawed data removed
EDIT2: Revised graph - https://forums.parallax.com/discussion/comment/1442007/#Comment_1442007
Worked out as:
+ 8 lines of 16 GB scores (aperture 9-16 bits)
+ 4 lines of 8 GB scores (aperture 5-8 bits)
+ 2 lines of 4 GB scores (aperture 3-4 bits)
+ 1 line of 2 GB scores (aperture 2 bit)
+ 1 line of 1 GB scores (aperture 1 bit)
I'm most interested currently in which 16-bit subsample of double-iterated xoroshiro32++ [14,2,7,5], i.e. the P2 v1 XORO32 instruction, has the highest PractRand score. Frequency test results below suggest it will be [25:10].
#  a,  b,  c,  d, size, lsb, low freq, high freq
  14,  2,  7,  5,   16,   0,    0FFFF,     10000
  ...
  14,  2,  7,  5,   16,   7,    0FEE8,     1011A
  14,  2,  7,  5,   16,   8,    0FACD,     10532
  14,  2,  7,  5,   16,   9,    0F815,     1092D
  14,  2,  7,  5,   16,  10,    0F64A,     10A52
  14,  2,  7,  5,   16,  11,    0F8AB,     107D6
  14,  2,  7,  5,   16,  12,    0FD88,     1025F
  ...
  14,  2,  7,  5,   16,  16,    0FFFF,     10000
How much better will the top score be than [15:0] or [31:16], I wonder?
EDIT: Here's the culled tables, both with and without parity, for comparison. Notably, [11 10 12]'s parity scores mean it shouldn't be included now, but I've thrown it in to fill out the final score chart.
__________________________________________________________________________________________________________
 Xoroshiro32(16)+ PractRand Score Table.  Build 2018-07-06 03:34:09 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-06 03:08:16 to 2018-07-06 03:33:18.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   256K  512M  512M  256M  256M  256M  128M  256M   16M   64K
 [13  5  8]   256K  128M   64M  128M  128M   64M  128M  256M   16M   64K
 [10  3 11]   256K  256M  256M  256M  256M  256M  256M  128M   16M   64K
 [15  3  6]   256K  256M  512M  256M  512M  256M  256M  256M   16M   64K
 [11 10 12]   256K   64M  128M  128M  256M  128M   64M  256M   16M   64K
 [ 3  2  6]     2M    1M   64K
 [ 6  2  3]   512K  512K   64K
 [10  7 11]   256K  256M  256M  256M   64M   64M   64M  256M   16M   64K
 [ 8  9 13]   256K  256M  256M  256M  128M  512M  256M  512M   16M   64K
 [13  9  8]   256K  256M  256M   64M  128M  256M  256M   64M   16M   64K

__________________________________________________________________________________________________________
 Xoroshiro32(16)+p PractRand Score Table.  Build 2018-07-08 18:42:39 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-08 18:20:49 to 2018-07-08 18:41:57.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   512M  512M  512M  256M  256M  256M  128M  256M   16M  512M
 [13  5  8]   128M  128M   64M  128M  128M   64M  128M  256M   16M  128M
 [10  3 11]   512M  256M  256M  256M  256M  256M  256M  128M   16M  256M
 [15  3  6]   512M  256M  512M  256M  512M  256M  256M  256M   16M  256M
 [11 10 12]    64M  128M  128M  256M  128M   64M  256M   16M
 [ 3  2  6]     2M
 [ 6  2  3]   512K
 [10  7 11]   256M  256M  256M  256M   64M   64M   64M  256M   16M  512M
 [ 8  9 13]   256M  256M  256M  256M  128M  512M  256M  512M   16M  512M
 [13  9  8]    64M  256M  256M   64M  128M  256M  256M   64M   16M  512M
Here's completely updated frequency distribution reports for the ten candidates:
Correct.