If users want, say, 16-bit PRNs with equidistribution, they should use both low and high words of the XORO32 output. If they want 16-bit PRNs without equidistribution, they should use the 16-bit subset of the 32-bit output with the best PractRand score, which is not known yet.
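For concreteness, here is a minimal C model of that usage. The ++ output form used below, rotl16(s0 + s1, 5) + s0, follows Vigna's xoroshiro++ construction, and packing the first iteration into the low word is also an assumption for illustration; only the [14,2,7,5] constants come from this thread, so don't treat the sketch as the exact silicon behaviour.

    #include <stdint.h>
    #include <stdio.h>

    static uint16_t s0 = 1, s1 = 0;     /* arbitrary non-zero seed, not the seed behind the dumps in this thread */

    static uint16_t rotl16(uint16_t x, int k)
    {
        return (uint16_t)((x << k) | (x >> (16 - k)));
    }

    /* One xoroshiro32++ [14,2,7,5] step.  A later post says the thread's convention
       computes the PRN after iterating the state; doing it before, as here, only
       offsets the stream by one step. */
    static uint16_t xoroshiro32pp(void)
    {
        uint16_t result = (uint16_t)(rotl16((uint16_t)(s0 + s1), 5) + s0);
        s1 ^= s0;
        s0 = (uint16_t)(rotl16(s0, 14) ^ s1 ^ (uint16_t)(s1 << 2));
        s1 = rotl16(s1, 7);
        return result;
    }

    /* XORO32-style result: two successive 16-bit outputs packed into 32 bits. */
    static uint32_t xoro32(void)
    {
        uint32_t first  = xoroshiro32pp();
        uint32_t second = xoroshiro32pp();
        return (second << 16) | first;   /* packing order assumed */
    }

    int main(void)
    {
        uint32_t x   = xoro32();
        uint16_t lo  = (uint16_t)x;          /* equidistributed 16-bit PRN */
        uint16_t hi  = (uint16_t)(x >> 16);  /* equidistributed 16-bit PRN */
        int      lsb = 8;                    /* 0..16 */
        uint16_t mix = (uint16_t)(x >> lsb); /* 16-bit subset straddling both words */
        printf("%08x -> lo %04x  hi %04x  bits [%d:%d] %04x\n",
               (unsigned)x, lo, hi, lsb + 15, lsb, mix);
        return 0;
    }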
On that basis, we'd need to produce a distribution run for each and every score on the existing 16x16 grid before even starting on double spacing it.
The existing grid doesn't really test XORO32, which is why I suggested the new tests earlier today. We would need to compare the PractRand scores with the distributions and I might be able to do the latter. There are 17 possible 16-bit tests and maybe we should run those for [14,2,7,5] as that's what's in the P2?
Table here:
http://forums.parallax.com/discussion/comment/1441863/#Comment_1441863
For contiguous 16-bit subsets, the number of the lsb is 0-16, thus 17 tests.
We've used PractRand as a comparison tool to choose [a,b,c,d]. That's fixed now in the P2 to [14,2,7,5] so we could just do 17 distributions but I'd like to see how they compare to the PractRand scores and what the highest score is. Taking eight bits from one output and eight from another [lsb = 8] might be the least correlated and hence most random.
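The PractRand runs in this thread feed the generator into stdin (see the "options: stdin" line in the score tables), so testing a particular 16-bit aperture only needs a small filter in the pipe. A rough sketch; the program names other than RNG_test are placeholders:

    /* aperture.c - pick out one contiguous 16-bit window of a 32-bit PRN stream.
       Usage sketch:
         ./xoro32_stream | ./aperture 8 | RNG_test stdin -multithreaded -te 1 -tf 2 -tlmin 1KB */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int lsb = (argc > 1) ? atoi(argv[1]) : 8;   /* 0..16 gives the 17 cases */
        if (lsb < 0 || lsb > 16)
            return 1;

        uint32_t x;
        while (fread(&x, sizeof x, 1, stdin) == 1) {
            uint16_t w = (uint16_t)(x >> lsb);      /* bits [lsb+15:lsb] */
            fwrite(&w, sizeof w, 1, stdout);
        }
        return 0;
    }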
I've been working on your earlier requests. I gather you want me to change tack.
PS: [14 2 7 5] isn't necessarily fixed. I'm happy to keep the options open for a revision.
I'm just outputting ideas as they appear!
Most if not all of the development here recently has been aimed at finding the very best [a,b,c,d], and [3,2,6,9] appears to be a tiny bit better than [14,2,7,5]. The first version of the P2 will use the latter and there is testing we can do just on that, but I'm not saying we should ignore [3,2,6,9] or others.
I've resurrected the single-summing scrambler functionality. It had been disabled and unmaintained for many months.
Gotta say, auto-culling went through real quick! I don't remember it being that little effort historically. I guess it's a case of so many of the unknowns now being sorted out and built into the automation.
Here's the 10 grids for Xoroshiro32+ and a collection of distribution reports for Xoroshiro32+ and 32+p that I generated a couple days back. I still have to do some scripting fixes to do the grid run for 32+p without overwriting what I've got now.
A few comments:
1. Rotating the xoroshiro32+ 16-bit samples one bit to the right, to move bit 0 out of the lsb, produces a considerably improved score. Still not great, but I think this shows that PractRand gives extra weight to bit 0, as has probably been mentioned before. (A one-line sketch of this rotation follows the list.)
2. The high byte score of 512M for xoroshiro32+ [14,2,7] is not bad at all and the 8-bit scores in general show how much better the higher bits are than the lower ones.
3. The pair distributions for xoroshiro32+ and xoroshiro32+p are identical for three sets of constants I've looked at. Is the parity code correct?
4. Pair distribution for xoroshiro32+ [3,2,6] is terrible, which highlights the huge quality gain possible with the extra rotation and addition in the ++ scrambler.
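For reference, the right-rotation meant in comment 1 is nothing more than the following; a sketch only, not the actual test rig:

    #include <stdint.h>

    /* Rotate a 16-bit sample right by one bit, so the weak bit 0 is moved out of
       the lsb position that PractRand appears to weight most heavily. */
    static uint16_t rotr16_1(uint16_t x)
    {
        return (uint16_t)((x >> 1) | (x << 15));
    }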
I've had a look at the code itself and can't see what could be wrong. The report files correctly show p and not-p, and those labels are generated directly by the C code, so the logic seems to be working as intended.
Having said that, I've just gone back and compared generator output data to https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 and found that neither, with or without parity, fits! Both have occasional single-bit flips, at different places, which at least shows the parity control is functioning and the engine is fine. I guess my "+" single-summing scrambler must have a bug.
I've checked my XORO32 output data twice today with no issues showing for the ++ double summing scrambler.
The parity trick inverts bit 0 50% of the time and it's possible these inversions "cancel out" to give identical distributions.
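In code, the "+p" idea is roughly the following. This is a sketch of the scrambler only, and it assumes, per the carry-extraction line further down the thread, that the "parity" bit is simply the carry out of the 16-bit sum:

    #include <stdint.h>

    /* "+" scrambler: low 16 bits of s0 + s1.
       "+p" sketch: XOR the carry out of that sum into bit 0, i.e. invert bit 0
       whenever s0 + s1 overflows 16 bits - which happens about half the time. */
    static uint16_t scramble_plus_p(uint16_t s0, uint16_t s1)
    {
        uint32_t sum   = (uint32_t)s0 + s1;     /* 17 significant bits */
        uint32_t carry = sum >> 16;             /* 0 or 1 */
        return (uint16_t)(sum ^ carry);
    }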
I'll assume the discrepancy with the parity data is acceptable given it's never been given a thorough workout in the past. I'm gridding this at the moment.
I've had some new scripting bugs with the extra logic for the parity switches, so I've had to rerun things a bit. It turns out I hadn't previously done any string-based conditional logic and got it quite wrong on the first attempt. This meant all the newly added parity logic could never test true, so it defaulted to false at all times.
This didn't affect the frequency distribution code - it's separate and nearly all C.
Yep, I'm happy with my Xoroshiro generator code. Have no reason to doubt it at this stage. Here's the actual C source for the frequency distribution reports:
The few lines of code below the #ifdef PARITY_BIT are all that was added for parity handling, and they were copied verbatim from the old testing code from back when it was being used.
And here's my Xoroshiro32+p [14 2 7]. It definitely has bit zero discrepancies from the linked data.
 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40840001 2c7c5530 a249769f 202ddffc
5acaf2f0 d1d04dc9 0599adcc 80941b30
d6a4540d 2eac9c9d cf8635b4 be51545c
d2eb95d5 2f5433e6 36541dcd 66037010
ce5c08b0 d5213ac4 b4008e98 0521dced
aaa50f40 3d16f177 b5135ee2 f4cd5688
3952cd39 8b3b3b7c a270c928 483587d0
057fd4c1 86d4f024 8fc96f89 fe23172d
269ebb57 dc34446e b662b6ca c2e8e237
f8adc930 137e6131 32a84dbd 96f80fad
...
These are not correct; half the bit 0 values are wrong. I tested the pair frequency distributions of 00000000-000000FF for xoroshiro32+ and xoroshiro32+p and they are different.
Ha, I understand the bug now, and even why it has occurred during this resurrection. It's the "rngword_t" data type declaration I introduced recently: for a 16-bit shifter it is only 16 bits wide. This was changed from everything being 64-bit with masks, to improve execution speed a small amount.
The parity calculation relies on an extended carry existing, but the current 16-bit data type has no room for that carry. I'll switch back to the original 64-bit data type, I think.
    0001 4084 5530 2c7c 769f a249 dffc 202c
    f2f0 5acb 4dc8 d1d0 adcd 0598 1b31 8094
    540d d6a5 9c9d 2ead 35b5 cf86 545d be50
    95d4 d2ea 33e7 2f55 1dcc 3655 7011 6602
    08b1 ce5c 3ac5 d521 8e98 b400 dcec 0520
    0f41 aaa5 f177 3d17 5ee3 b513 5688 f4cd
    cd39 3953 3b7d 8b3b c928 a270 87d0 4834
    d4c1 057e f024 86d5 6f89 8fc9 172c fe23
    bb56 269f 446f dc34 b6cb b662 e237 c2e8
    c930 f8ad 6131 137f 4dbc 32a8 0fac 96f9
    ...
Didn't have to throw away any of the optimising in the end either. The solution was to split out the carry extraction separately from the result summing, so it could be pushed down into the regular data type. Here's the critical C foo:
    rngword_t parity = ((unsigned __int128)(s0 + s1) >> ACCUM_SIZE) & 1;
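As a self-contained illustration of why the 16-bit rngword_t broke the parity (my reading of the description above, not the project's code): summing in the narrow type silently discards the carry, so any parity derived from that sum is stuck at zero, whereas widening just for the carry extraction recovers it while the result sum stays in the regular word type.

    #include <stdint.h>
    #include <stdio.h>

    typedef uint16_t rngword_t;                 /* the narrow type from the bug */

    int main(void)
    {
        rngword_t s0 = 0xC000, s1 = 0x9000;     /* s0 + s1 = 0x15000, carry set */

        /* Broken pattern: the sum is stored in the 16-bit type first, so the
           carry is gone before any parity extraction can see it. */
        rngword_t sum16 = (rngword_t)(s0 + s1);           /* 0x5000 */
        rngword_t lost  = (rngword_t)((sum16 >> 16) & 1); /* always 0 */

        /* Split pattern: widen only for the carry, keep the result narrow. */
        uint32_t  wide   = (uint32_t)s0 + s1;             /* 0x15000 */
        rngword_t carry  = (rngword_t)(wide >> 16);       /* 1 */
        rngword_t result = (rngword_t)wide;               /* 0x5000 */

        printf("sum16=%04X lost=%X carry=%X result=%04X\n",
               (unsigned)sum16, (unsigned)lost, (unsigned)carry, (unsigned)result);
        return 0;
    }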
I've wiped all the parity-sourced PractRand report files and started the culling and gridding runs afresh.
I'm running ten candidate grids right now:
__________________________________________________________________________________________________________
 Xoroshiro32(16)+ PractRand Score Table.  Build 2018-07-06 03:34:09 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-06 03:08:16 to 2018-07-06 03:33:18.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   256K  512M  512M  256M  256M  256M  128M  256M   16M   64K
 [13  5  8]   256K  128M   64M  128M  128M   64M  128M  256M   16M   64K
 [10  3 11]   256K  256M  256M  256M  256M  256M  256M  128M   16M   64K
 [15  3  6]   256K  256M  512M  256M  512M  256M  256M  256M   16M   64K
 [11 10 12]   256K   64M  128M  128M  256M  128M   64M  256M   16M   64K
 [ 3  2  6]     2M    1M   64K
 [ 6  2  3]   512K  512K   64K
 [10  7 11]   256K  256M  256M  256M   64M   64M   64M  256M   16M   64K
 [ 8  9 13]   256K  256M  256M  256M  128M  512M  256M  512M   16M   64K
 [13  9  8]   256K  256M  256M   64M  128M  256M  256M   64M   16M   64K
I've run tests to see the frequency distribution of some double-iterated xoroshiro32++ outputs, as in the XORO32 instruction, using sample sizes of 8/12/16 bits and various, but not all, of the possible lsbs. As the PractRand FPF test can detect equidistribution, it's likely that the best PractRand scores will occur when the equidistribution is disturbed the most. To detect this I looked for the lowest and highest frequencies, i.e. those furthest from the equidistributed values, and here are the results:
#  a,  b,  c,  d, size, lsb, low freq, high freq
  14,  2,  7,  5,    8,  15,  0FFFC12,   10003EE
  14,  2,  7,  5,   12,  14,   0FF50E,    100CE3
  14,  2,  7,  5,   16,  10,    0F64A,     10A52
   3,  2,  6,  9,   16,   9,    0F815,     1092D
There is no effect on the equidistribution if the sample is entirely within one xoroshiro32++ output; to disturb the equidistribution, the sample must contain data from both outputs. For an equal mix, the 8/12/16-bit samples have lsb 12/10/8 respectively. However, the results above show that a half-and-half mix does not lead to the biggest variances, which are very small for 8 or 12 bits, so any PractRand tests should concentrate on 16 bits. To make things clear, the [3,2,6,9] sample above is for bits [24:9] of the double-iterated output.
EDIT: Flawed data removed
EDIT2: Corrected distribution data - https://forums.parallax.com/discussion/comment/1441973/#Comment_1441973
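For reference, the counting behind those low/high frequency figures can be sketched as a small filter: read raw 32-bit values (from whatever models the double-iterated generator), bucket the 16-bit sample at a chosen lsb over one full period, and report the extreme bucket counts, which land near 0x10000 as in the table above. The 8 and 12-bit cases work the same way with smaller buckets; program names here are placeholders.

    /* freqdist.c - frequency distribution of one 16-bit window of a 32-bit
       PRN stream.  Reads raw 32-bit values on stdin until EOF, e.g.
         ./xoro32_stream_fullperiod | ./freqdist 10                      */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int lsb = (argc > 1) ? atoi(argv[1]) : 10;   /* 0..16 */
        if (lsb < 0 || lsb > 16)
            return 1;

        static uint64_t count[65536];                /* one bucket per 16-bit value */
        uint32_t x;
        while (fread(&x, sizeof x, 1, stdin) == 1)
            count[(uint16_t)(x >> lsb)]++;

        uint64_t lo = UINT64_MAX, hi = 0;
        for (int i = 0; i < 65536; i++) {
            if (count[i] < lo) lo = count[i];
            if (count[i] > hi) hi = count[i];
        }
        printf("lsb %d: low freq %llX, high freq %llX\n",
               lsb, (unsigned long long)lo, (unsigned long long)hi);
        return 0;
    }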
The data at https://forums.parallax.com/discussion/comment/1423960/#Comment_1423960 are correct for xoroshiro32+p [14,2,7] with PRN calculated after state is iterated.
Correct Xoroshiro32+ [14 2 7] output data is here - https://forums.parallax.com/discussion/comment/1420802/#Comment_1420802
Here's the matching 32-bit text formatting from my compile:
 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40850001 2c7c5530 a248769f 202ddffd
5acaf2f1 d1d14dc8 0598adcd 80941b30
d6a4540c 2eac9c9d cf8735b4 be51545d
d2ea95d5 2f5433e7 36551dcd 66027010
ce5d08b0 d5213ac5 b4008e99 0520dced
aaa40f41 3d16f177 b5125ee3 f4cc5688
3953cd39 8b3b3b7c a270c928 483487d1
057fd4c1 86d5f024 8fc96f89 fe22172c
269ebb57 dc34446f b662b6cb c2e9e237
f8acc930 137f6130 32a84dbc 96f90fad
...

 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
40840001 2c7c5530 a249769f 202ddffc
5acaf2f0 d1d04dc9 0599adcc 80941b30
d6a4540d 2eac9c9d cf8635b4 be51545c
d2eb95d5 2f5433e6 36541dcd 66037010
ce5c08b0 d5213ac4 b4008e98 0521dced
aaa50f40 3d16f177 b5135ee2 f4cd5688
3952cd39 8b3b3b7c a270c928 483587d0
057fd4c1 86d4f024 8fc96f89 fe23172d
269ebb57 dc34446e b662b6ca c2e8e237
f8adc930 137e6131 32a84dbd 96f80fad
...

 3 2 1 0  7 6 5 4  b a 9 8  f e d c
===================================
50ad0021 a3d9b89a 9bd90c87 35023caa
3d248840 71f57287 a81890ac 2ce3b4c1
b31dcc58 b7a4df3d 3b2ea2e3 bfff2c20
5b9aafe9 35f43bb8 6a2d921c a217cd9a
739c207f 168ac6ac e78f2142 428f7b93
a241e7e6 2be7766d e99cd031 5b40e27d
06d42ff9 b4546543 85699088 881a5ad3
568ed3d8 773eb4c6 4e43fbfb 613629f7
a6d157d0 2af38417 1d1dcdf8 1c547b19
c847866b 1d3330c2 6e954560 9f64ba4e
...
We can declare all as good again I presume.
Refreshed single-summing grid scores coming ...
EDIT: Flawed data removed
EDIT2: Revised graph - https://forums.parallax.com/discussion/comment/1442007/#Comment_1442007
Worked out as:
+ 8 lines of 16 GB scores (aperture 9-16 bits)
+ 4 lines of 8 GB scores (aperture 5-8 bits)
+ 2 lines of 4 GB scores (aperture 3-4 bits)
+ 1 line of 2 GB scores (aperture 2 bit)
+ 1 line of 1 GB scores (aperture 1 bit)
I'm most interested currently in which 16-bit subsample of double-iterated xoroshiro32++ [14,2,7,5], i.e. the P2 v1 XORO32 instruction, has the highest PractRand score. Frequency test results below suggest it will be [25:10].
#  a,  b,  c,  d, size, lsb, low freq, high freq
  14,  2,  7,  5,   16,   0,    0FFFF,     10000
  ...
  14,  2,  7,  5,   16,   7,    0FEE8,     1011A
  14,  2,  7,  5,   16,   8,    0FACD,     10532
  14,  2,  7,  5,   16,   9,    0F815,     1092D
  14,  2,  7,  5,   16,  10,    0F64A,     10A52
  14,  2,  7,  5,   16,  11,    0F8AB,     107D6
  14,  2,  7,  5,   16,  12,    0FD88,     1025F
  ...
  14,  2,  7,  5,   16,  16,    0FFFF,     10000
How much better will the top score be than [15:0] or [31:16], I wonder?
EDIT: Here's the culled tables, both with and without parity, for comparison. Notably, [11 10 12]'s parity scores mean it shouldn't be included now, but I've thrown it in to fill out the final score chart.
__________________________________________________________________________________________________________
 Xoroshiro32(16)+ PractRand Score Table.  Build 2018-07-06 03:34:09 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-06 03:08:16 to 2018-07-06 03:33:18.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   256K  512M  512M  256M  256M  256M  128M  256M   16M   64K
 [13  5  8]   256K  128M   64M  128M  128M   64M  128M  256M   16M   64K
 [10  3 11]   256K  256M  256M  256M  256M  256M  256M  128M   16M   64K
 [15  3  6]   256K  256M  512M  256M  512M  256M  256M  256M   16M   64K
 [11 10 12]   256K   64M  128M  128M  256M  128M   64M  256M   16M   64K
 [ 3  2  6]     2M    1M   64K
 [ 6  2  3]   512K  512K   64K
 [10  7 11]   256K  256M  256M  256M   64M   64M   64M  256M   16M   64K
 [ 8  9 13]   256K  256M  256M  256M  128M  512M  256M  512M   16M   64K
 [13  9  8]   256K  256M  256M   64M  128M  256M  256M   64M   16M   64K

__________________________________________________________________________________________________________
 Xoroshiro32(16)+p PractRand Score Table.  Build 2018-07-08 18:42:39 +1200
 PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
 Scoring ran from 2018-07-08 18:20:49 to 2018-07-08 18:41:57.  Byte Sampled Double Full Period = 8 GB
__________________________________________________________________________________________________________
             Sampling apertures of the generator output, labelled as most to least bit-significance
 Candidate   ----------------------------------------------------------------------------------------
 [ A  B  C]  15:00  6:15  5:14  4:13  3:12  2:11  1:10  0:09 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
 [14  2  7]   512M  512M  512M  256M  256M  256M  128M  256M   16M  512M
 [13  5  8]   128M  128M   64M  128M  128M   64M  128M  256M   16M  128M
 [10  3 11]   512M  256M  256M  256M  256M  256M  256M  128M   16M  256M
 [15  3  6]   512M  256M  512M  256M  512M  256M  256M  256M   16M  256M
 [11 10 12]    64M  128M  128M  256M  128M   64M  256M   16M
 [ 3  2  6]     2M
 [ 6  2  3]   512K
 [10  7 11]   256M  256M  256M  256M   64M   64M   64M  256M   16M  512M
 [ 8  9 13]   256M  256M  256M  256M  128M  512M  256M  512M   16M  512M
 [13  9  8]    64M  256M  256M   64M  128M  256M  256M   64M   16M  512M
Here's completely updated frequency distribution reports for the ten candidates:
Correct.