Random/LFSR on P2

TonyB_ · 2018-10-05 02:30

evanh wrote: »

So you want double iterated 32-bit output from the Xoroshiro generator. And remove the pairing from the distribution test code so that it still only has to deal with 32-bit indexing. Right?

Do you have the code handy, Evan? It's quite late here. In a nutshell, don't save the previous output and do two new iterations to get the next 32-bit pair. Everything else is the same. No point testing more than one candidate.

TonyB_ · 2018-10-05 02:37

Actually, you could do this for the six candidates you cut off last night!

evanh · 2018-10-05 03:41

Here's the double iterate line from the gridding/culling test code:

#ifdef dITER
		random64 = rotr( (drnword_t)nextxoro() | ((drnword_t)nextxoro() << ACCUM_SIZE), SAMPLE_POSITION );
#else

The functional detail is it concatenates two consecutive 16-bit outputs into one 32-bit output that when sampled as 32>>0 still provides all the random data. And the second 2 Gwords has the the two halves of the output exchanged because of the 2^32-1 state space.

This looks functionally very similar to the pairing function in the distribution test code:

		next_prn = rotr( nextxoro(), SAMPLE_POSITION ) & SAMPLE_MASK;
		pairs[ prev_prn | (next_prn << SAMPLE_SIZE) ]++;  // 8-bit limited tallies
		prev_prn = next_prn;

There is a difference if the sample size and accumulator size are different but for the distribution testing, so far, they are the same.

I think the pairing might be doing what you want already. Was that maybe the original intent of the pairing? I thought it was a feature of the distribution test though.

TonyB_ · 2018-10-05 12:06

evanh wrote: »
Here's the double iterate line from the gridding/culling test code:
#ifdef dITER
		random64 = rotr( (drnword_t)nextxoro() | ((drnword_t)nextxoro() << ACCUM_SIZE), SAMPLE_POSITION );
#else
The functional detail is it concatenates two consecutive 16-bit outputs into one 32-bit output that when sampled as 32>>0 still provides all the random data. And the second 2 Gwords has the the two halves of the output exchanged because of the 2^32-1 state space.

This looks functionally very similar to the pairing function in the distribution test code:
		next_prn = rotr( nextxoro(), SAMPLE_POSITION ) & SAMPLE_MASK;
		pairs[ prev_prn | (next_prn << SAMPLE_SIZE) ]++;  // 8-bit limited tallies
		prev_prn = next_prn;
There is a difference if the sample size and accumulator size are different but for the distribution testing, so far, they are the same.

I think the pairing might be doing what you want already. Was that maybe the original intent of the pairing? I thought it was a feature of the distribution test though.

I found my original post:

TonyB_ wrote: »
Some thoughts about pair distribution. The C code iterates xoroshiro32++, concatenates successive 16-bit outputs and records how often each possible 32-bit value occurs. If the first 16-bit output is 1, the second 2, etc., then the C code produces the following 32-bit data:
{2,1}
{3,2}
{4,3}
...
{2^32-2,2^32-3}
{2^32-1,2^32-2}
{1,2^32-1}
XORO32 does a double iteration and outputs 32-bit data in the following order:
{2,1}
{4,3}
...
{2^32-2,2^32-3}
{1,2^32-1}
{3,2}
...
{2^32-1,2^23-2}
Although the order is different the data are the same, i.e. the pair distribution is the XORO32 output distribution and therefore a good test of the XORO32 randomness. The xoroshiro32++ 16-bit output is equidistributed so that every non-zero values occurs 2^16 times and zero 2^16-1 times, but the XORO32 32-bit output is not equidistributed because the ++ scrambler is only 1-dimensionally equidistributed.

It would have been very easy to make the C code work exactly like XORO32 but it is quicker the way it is now. If one candidate has the same distributions for either method then they all will and it seems that is the case, so thanks for checking. Every 16-bit output occurs twice and the C code interleaves the two ~2GB blocks that are separate with XORO32. Although the byte value at any particular address is of no consequence as it's the overall frequency distribution we want, the byte addresses differ by a one bit rotation, more or less, in the two methods.

TonyB_ · 2018-10-05 12:27

First 64 XORO32 outputs for [13,5,10,9] when seed = 1.

62690201
12A2AE16
D7194AE8
984B0C52
743C1DF1
BCC6DBA0
746C34C9
07FF3643
C642BBC0
594EAA85
701AD05A
F3AEA328
695A67EE
93C6C140
4964F5E1
FC575E24
A638D7AD
181AE233
7BE766B5
5CBD8445
49A45F17
C687C751
175E7CA8
8BBA386F
46776DFD
62BFF63A
4F64FD4F
5E5A15D9
B4E1379D
2E1E01C0
6A806734
B2D8BDB8
C4645DBB
99D47BC4
650F3E11
C0E82B43
82C2B0A0
9D53DC65
E0C12D02
92249C67
84638C3F
2A36E209
640EF1B9
D236A707
184A9FE3
8A183209
947E9886
FC1232D4
F6A2D1CA
FE8A985B
B93481C2
22425E3A
8EE02987
E12D494E
E72C03B4
EE576CA0
5C0B072E
6FFB0598
4D699199
16A97FAE
DC090AAC
689A45D1
E280AF76
436BCA2D

evanh · 2018-10-05 12:46

What was the pairing for, originally?

evanh · 2018-10-05 13:15

Here's the alternate source added for filling out what was the pairs count array, it's verbatim from the culling/gridding source:

	#ifdef dITER
		next_prn = rotr( (drnword_t)nextxoro() | ((drnword_t)nextxoro() << ACCUM_SIZE), SAMPLE_POSITION ) & SAMPLE_MASK;
		pairs[next_prn]++;
	#else
		next_prn = rotr( nextxoro(), SAMPLE_POSITION ) & SAMPLE_MASK;
		pairs[ prev_prn | (next_prn << SAMPLE_SIZE) ]++;  // 8-bit limited tallies
		prev_prn = next_prn;
	#endif

TonyB_ · 2018-10-05 13:16

evanh wrote: »

What was the pairing for, originally?

Originally, I looked at pairs of outputs to see whether they were duplicates. I knew the distribution would be binomial from one of Melissa's posts. The max frequency is less than 2^16 so it was easy to do on my PC. I soon realised a generalised pair distribution would be much more accurate, as every output would count instead of only one in 2^16, but my PC had insufficient RAM. A little while later it dawned on me that this pair distribution is identical to the XORO32 output distribution.

The distributions test XORO32 as a 32-bit generator. PractRand and the Crush suite test xoroshiro32++ as a 16-bit generator. A good score for one size does not imply a good score in the other. We need the overall best score in both and I think [13,5,10,9] gives us that.

evanh · 2018-10-05 13:34

Looks like the new code is running about 5% slower. Hmm, it's still doing 4 G loops of the pairs array filling, which makes it 8 G calls to nextxoro(). I think that might be over doing it.

Wow, twice the number of random numbers generated but it's still the same number of writes to main memory! DRAM speed is hitting hard!

evanh · 2018-10-05 13:53

Actually, the DRAM performance limit is worse than a simple write because it has to be a read-modify-write to do each increment.

TonyB_ · 2018-10-05 13:59

evanh wrote: »

Actually, the DRAM performance limit is worse than a simple write because it has to be a read-modify-write to do each increment.

Yes and it's only changing a byte at a time, but I don't see any way to make it faster.

XORO32 [13,5,10,9] outputs agree in BASIC and x86. Should be verified independently, though. Next stop is the modified Verilog.

evanh · 2018-10-05 14:10

Tony,
You realise, because of the 8 G calls, it's doubling up on every occurrence. All the pair[] tallies will be even values as a result.

TonyB_ · 2018-10-05 14:18

evanh wrote: »

Tony,
You realise, because of the 8 G calls, it's doubling up on every occurrence. All the pair[] tallies will be even values as a result.

You're doing 8G iterations to get 4G pairs now. Before it was 4G iterations as previous one was saved. Nothing else has changed. Remember period is 2^32-1, therefore a particular output that provides low word of address first time around provides high word the second time around and vice-versa.

evanh · 2018-10-05 14:23

Oh, right. I figured the total engine iterations shouldn't change. Doh!

evanh · 2018-10-05 14:31

Here's a quick selection using the dITER compile switch:

TonyB_ · 2018-10-05 14:48

evanh wrote: »

Here's a quick selection using the dITER compile switch:

Thanks Evan, the two different ways to generate the distributions agree.

TonyB_ · 2018-10-05 19:06

I think this post shows Verilog for XORO32 instruction, repeated immediately below.

[14,2,7,5] current:

wire [15:0] xoro32z	= d[31:16] ^ d[15:0];			// first iteration
wire [31:0] xoro32y	= { xoro32z[8:0], xoro32z[15:9],
			    {d[1:0], d[15:2]} ^
			    {xoro32z[13:0], 2'b0} ^ xoro32z };

wire [15:0] xoro32x	= xoro32y[31:16] ^ xoro32y[15:0];	// second iteration
wire [31:0] xoro32	= { xoro32x[8:0], xoro32x[15:9],	// xoro32 = d result (state store)
			    {xoro32y[1:0], xoro32y[15:2]} ^
			    {xoro32x[13:0], 2'b0} ^ xoro32x };

wire [15:0] xoro32a	= d[31:16] + d[15:0];			// hashing sum of input state
wire [15:0] xoro32b	= xoro32y[31:16] + xoro32y[15:0];	// hashing sum of first iterator state output

assign xoro32r	= { {xoro32b[10:0], xoro32b[15:11]} + xoro32y[15:0],
		    {xoro32a[10:0], xoro32a[15:11]} + d[15:0] }; // xoro32r = prng result, next instruction's s value

[13,5,10,9] proposed:

wire [15:0] xoro32z	= d[31:16] ^ d[15:0];			// first iteration
wire [31:0] xoro32y	= { xoro32z[5:0], xoro32z[15:6],
			    {d[2:0], d[15:3]} ^
			    {xoro32z[10:0], 5'b0} ^ xoro32z };

wire [15:0] xoro32x	= xoro32y[31:16] ^ xoro32y[15:0];	// second iteration
wire [31:0] xoro32	= { xoro32x[5:0], xoro32x[15:6],	// xoro32 = d result (state store)
			    {xoro32y[2:0], xoro32y[15:3]} ^
			    {xoro32x[10:0], 5'b0} ^ xoro32x };

wire [15:0] xoro32a	= d[31:16] + d[15:0];			// hashing sum of input state
wire [15:0] xoro32b	= xoro32y[31:16] + xoro32y[15:0];	// hashing sum of first iterator state output

assign xoro32r	= { {xoro32b[6:0], xoro32b[15:7]} + xoro32y[15:0],
		    {xoro32a[6:0], xoro32a[15:7]} + d[15:0] };	// xoro32r = prng result, next instruction's s value

xoro32y, xoro32 and xoro32r to be changed.
If the above is correct, it should produce the output here.

P.S. Good to see the old code blocks back as the recent change was affecting productivity.

evanh · 2018-10-06 01:40

Isn't this the change now?

TonyB_ · 2018-10-06 02:15

Does the proposed [13,5,10,9] Verilog look right to you, Evan?

I'm part way through creating some tables that show how good it is. I'll finish later today.

evanh · 2018-10-06 05:22

Looks right to me. I've commented the constants for quicker reading:

//  Verilog HDL for Xoroshiro32++ candidate [13 5 10 9]
///////////////////////////////////////////////////////////

wire [15:0] xoro32z	= d[31:16] ^ d[15:0];			// first iteration
wire [31:0] xoro32y	= { xoro32z[5:0], xoro32z[15:6],        // rol #10
			    {d[2:0], d[15:3]} ^                 // rol #13
			    {xoro32z[10:0], 5'b0} ^ xoro32z };  // shl #5

wire [15:0] xoro32x	= xoro32y[31:16] ^ xoro32y[15:0];	// second iteration
wire [31:0] xoro32	= { xoro32x[5:0], xoro32x[15:6],	// rol #10, xoro32 = d result (state store)
			    {xoro32y[2:0], xoro32y[15:3]} ^     // rol #13
			    {xoro32x[10:0], 5'b0} ^ xoro32x };  // shl #5

wire [15:0] xoro32a	= d[31:16] + d[15:0];			// hashing sum of input state
wire [15:0] xoro32b	= xoro32y[31:16] + xoro32y[15:0];	// hashing sum of first iterator state output

assign xoro32r	= { {xoro32b[6:0], xoro32b[15:7]} + xoro32y[15:0],
		    {xoro32a[6:0], xoro32a[15:7]} + d[15:0] };	// rol #9, xoro32r = prng result, next instruction's s value

TonyB_ · 2018-10-06 11:36

Thanks, Evan. It's a good idea to add the comments as a reminder of how it works.

TonyB_ · 2018-10-06 12:04

deleted

TonyB_ · 2018-10-06 12:05

deleted

TonyB_ · 2018-10-06 12:06

deleted

TonyB_ · 2018-10-06 12:19

The following three posts show distribution results for selected candidates. |Observed-Expected| is the absolute error.

TonyB_ · 2018-10-06 12:25

Pair frequency |Observed-Expected|

# |Observed-Expected|
# a,  b,  c,  d,     pfreq0,     pfreq1,     pfreq2,     pfreq3,     pfreq4,     pfreq5,     pfreq6,     pfreq7,     pfreq8,     pfreq9,    pfreq10,    pfreq11,    pfreq12
# scro   , , , ,      12447,      14664,       6423,       9195,       2453,       1491,        434,        120,        173,         34,         13,          4,          0
 13,  5, 10,  9,      78899,      89687,      41504,      42541,       5417,       2691,       1688,         89,         26,         49,         28,          6,          1
 13,  5,  8, 10,     119611,     135729,      56511,      50169,      14837,       6873,       1113,        677,        287,         14,         37,          5,          1
 14,  2,  7,  6,     693240,     705595,     333559,     116940,     150145,      56973,      17844,       2842,       1044,        126,         11,          8,          2
  6,  2,  3,  9,     700291,     715125,     325707,     114279,     137714,      64907,      18365,       4083,       1164,         58,         16,         13,          2
  3,  2,  6,  9,    1852203,    1862261,     896846,     283035,     392180,     168742,      48919,      11809,       1791,        368,         53,          3,          3
 14,  2,  7,  5,    3369116,    3393152,    1658965,     561575,     713103,     302470,      83700,      18360,       3320,        409,         63,          1,          0

TonyB_ · 2018-10-06 12:25

Zero run frequency |Observed-Expected|

# |Observed-Expected|
# a,  b,  c,  d,     zfreq1,     zfreq2,     zfreq3,     zfreq4,     zfreq5,     zfreq6,     zfreq7,     zfreq8,     zfreq9,    zfreq10,    zfreq11,    zfreq12,    zfreq13,    zfreq14,    zfreq15,    zfreq16,    zfreq17,    zfreq18,    zfreq19,    zfreq20,    zfreq21
# scro   , , , ,      17271,      11651,       8070,       2037,       2957,        444,        775,        515,        427,        230,        275,        148,         72,         24,         15,         19,          8,          4,         11,          3,          0
 13,  5, 10,  9,     274316,      12986,      14125,      38593,      19943,      10341,       7191,       4027,        574,        852,        317,        255,         74,        107,          7,          9,          2,          0,          4,          0,          2
 13,  5,  8, 10,     399287,      37005,      16381,      29568,      11097,       8837,       2939,       1086,        453,        238,         53,        138,        106,         15,         19,         16,          9,          4,          1,          1,          0
 14,  2,  7,  6,      62101,      44563,      23048,      37978,      31150,      23066,      11996,       5767,       4211,       1980,        830,        406,        222,         30,          9,          9,          1,          3,          3,          1,          3
  6,  2,  3,  9,     262462,     352036,     114239,      57912,      10014,      11589,      11818,       5940,       3476,       1584,        765,        527,        137,         56,         32,         36,         10,          7,          3,          1,          1
  3,  2,  6,  9,    1118583,      79497,     214756,     182201,     137546,      77988,      39744,      19549,       9537,       4176,       1966,        772,        315,        189,         33,         33,         18,         10,          2,          2,          1
 14,  2,  7,  5,    1866306,     236796,     410816,     336427,     246302,     139743,      72213,      34671,      16523,       7142,       2820,       1207,        475,        232,        107,         51,         21,          9,          1,          1,          0

TonyB_ · 2018-10-06 12:26

Non-zero run frequency |Observed-Expected|

# |Observed-Expected|
# a,  b,  c,  d,    nzfreq1,    nzfreq2,    nzfreq3,    nzfreq4,    nzfreq5,    nzfreq6,    nzfreq7,    nzfreq8,    nzfreq9,   nzfreq10,   nzfreq11,   nzfreq12,   nzfreq13,   nzfreq14,   nzfreq15,   nzfreq16,   nzfreq17,   nzfreq18,   nzfreq19,   nzfreq20,   nzfreq21,   nzfreq22,   nzfreq23,   nzfreq24,   nzfreq25,   nzfreq26,   nzfreq27,   nzfreq28,   nzfreq29,   nzfreq30,   nzfreq31,   nzfreq32,   nzfreq33,   nzfreq34,   nzfreq35,   nzfreq36,   nzfreq37,   nzfreq38,   nzfreq39,   nzfreq40,   nzfreq41,   nzfreq42,   nzfreq43,   nzfreq44,   nzfreq45
# scro   , , , ,      17548,       8203,       8451,      18153,      10472,       8301,       7820,       5494,        259,       3692,         14,       1199,        188,        227,        320,        756,        744,        283,        272,        184,        180,         61,         78,         21,         61,         16,         17,         31,         26,          5,         38,         12,          1,          2,          2,          0,          6,          2,          3,          4,          2,          0,          2,          1,          1
 13,  5, 10,  9,     123916,      28906,      96761,      13003,      13403,      24422,         86,       9758,      20511,       9668,       5791,       4952,       1724,       3477,       1213,        975,        212,        691,        201,         20,        140,         20,        114,        188,          3,         30,         10,         94,         14,         55,          4,         20,          6,          1,          1,          6,          9,          1,          0,          2,          1,          2,          0,          0,          1
 13,  5,  8, 10,     301733,      12768,     111630,      26604,       7345,      34203,      18037,      16628,       9519,       9368,      13916,       4820,       4130,       2393,       1137,         61,        892,        513,        299,        107,        222,        151,         51,         98,        210,         37,         44,         10,         36,         18,         41,          5,         12,         16,         18,          5,         10,          2,          9,          5,          2,          1,          0,          2,          0
 14,  2,  7,  6,     461338,      61444,      84874,     129163,      87316,      32725,      40235,      27583,      13636,       3551,       3536,       8258,       8282,       5669,       4864,       3407,       3451,       2262,       2161,       1686,       1247,        752,        572,        396,        178,        189,        194,         36,        107,         47,         33,         29,         23,         15,          9,          6,          8,          7,          5,          0,          1,          2,          2,          0,          0
  6,  2,  3,  9,     164117,     201875,      89806,     204802,      64224,      64097,      22963,      24718,      40474,      35969,      32163,      21653,      17188,      14458,       9364,       8544,       5381,       3922,       2496,       1889,       1204,        724,        661,        269,        281,        170,        153,        109,         88,         15,         42,          5,          6,         18,         18,         10,          4,          1,          0,          1,          0,          2,          1,          0,          1
  3,  2,  6,  9,     356549,     212743,     188614,     299169,     147752,      84527,      32412,      21264,       3829,      13929,      20684,      18030,      16223,      11768,      10150,       7530,       5548,       4239,       2753,       1775,       1316,        898,        565,        427,        420,        169,        101,        110,         18,         59,         53,         24,         23,          2,         13,          1,          3,          7,          3,          0,          0,          1,          1,          1,          1
 14,  2,  7,  5,     796603,     449094,     371827,     495986,     298229,     174131,      28618,      22581,      19650,      23117,      35575,      29984,      22917,      20059,      17747,      11176,       8308,       5917,       4938,       3120,       2176,       1456,       1062,        829,        579,        265,        301,        104,         89,         59,         20,         20,         24,         20,          8,          4,          8,          0,          1,          1,          0,          3,          1,          2,          1

TonyB_ · 2018-10-06 12:29

The next three posts show distribution results using the Chi-Square goodness of fit test. These show most clearly the difference between the current [14,2,7,5] and proposed [13,5,10,9] constants in the XORO32 instruction when considering the output as a single 32-bit value.

Scroll to the extreme right to see the total for all frequencies.

TonyB_ · 2018-10-06 23:14

Pair frequency Chi-Square = (Observed-Expected)^2/Expected

# (Observed-Expected)^2/Expected
# a,  b,  c,  d,     pfreq0,     pfreq1,     pfreq2,     pfreq3,     pfreq4,     pfreq5,     pfreq6,     pfreq7,     pfreq8,     pfreq9,    pfreq10,    pfreq11,    pfreq12,      total
# scro   , , , ,          0,          0,          0,          0,          0,          0,          0,          0,          1,          0,          0,          0,          0,          3
 13,  5, 10,  9,          4,          5,          2,          7,          0,          1,          1,          0,          0,          1,          2,          1,          0,         24
 13,  5,  8, 10,          9,         12,          4,         10,          3,          4,          1,          1,          2,          0,          3,          1,          0,         50
 14,  2,  7,  6,        304,        315,        141,         52,        342,        247,        145,         26,         28,          4,          0,          2,          1,       1606
  6,  2,  3,  9,        310,        324,        134,         50,        288,        320,        154,         53,         35,          1,          1,          4,          1,       1674
  3,  2,  6,  9,       2171,       2195,       1018,        304,       2336,       2163,       1090,        445,         82,         31,          6,          0,          3,      11845
 14,  2,  7,  5,       7184,       7287,       3484,       1198,       7724,       6948,       3192,       1075,        281,         38,          9,          0,          0,      38421

Random/LFSR on P2

Comments