Random/LFSR on P2

TonyB_ · 2018-10-06 23:15

Zero run frequency Chi-Square = (Observed-Expected)^2/Expected

# (Observed-Expected)^2/Expected
# a,  b,  c,  d,     zfreq1,     zfreq2,     zfreq3,     zfreq4,     zfreq5,     zfreq6,     zfreq7,     zfreq8,     zfreq9,    zfreq10,    zfreq11,    zfreq12,    zfreq13,    zfreq14,    zfreq15,    zfreq16,    zfreq17,    zfreq18,    zfreq19,    zfreq20,    zfreq21,      total
# scro   , , , ,          0,          1,          1,          0,          1,          0,          0,          0,          1,          1,          3,          2,          1,          0,          0,          2,          1,          1,         12,          2,          0,         30
 13,  5, 10,  9,        119,          1,          2,         47,         34,         25,         33,         28,          2,          9,          4,          6,          1,          8,          0,          0,          0,          0,          2,          0,          4,        327
 13,  5,  8, 10,        253,          6,          3,         28,         11,         18,          6,          2,          1,          1,          0,          2,          3,          0,          1,          1,          1,          1,          0,          0,          0,        337
 14,  2,  7,  6,          6,          9,          6,         46,         84,        125,         92,         58,         84,         50,         24,         16,         13,          1,          0,          0,          0,          0,          1,          0,          9,        624
  6,  2,  3,  9,        109,        534,        153,        107,          9,         32,         89,         61,         57,         32,         20,         26,          5,          2,          2,          7,          1,          2,          1,          0,          1,       1250
  3,  2,  6,  9,       1982,         27,        540,       1056,       1636,       1430,       1009,        664,        429,        224,        135,         57,         26,         25,          2,          6,          5,          4,          0,          1,          1,       9258
 14,  2,  7,  5,       5517,        241,       1975,       3601,       5246,       4591,       3332,       2088,       1289,        655,        277,        138,         58,         38,         22,         13,          6,          3,          0,          0,          0,      29092

TonyB_ · 2018-10-06 23:15

Non-zero run frequency Chi-Square = (Observed-Expected)^2/Expected

# (Observed-Expected)^2/Expected
# a,  b,  c,  d,    nzfreq1,    nzfreq2,    nzfreq3,    nzfreq4,    nzfreq5,    nzfreq6,    nzfreq7,    nzfreq8,    nzfreq9,   nzfreq10,   nzfreq11,   nzfreq12,   nzfreq13,   nzfreq14,   nzfreq15,   nzfreq16,   nzfreq17,   nzfreq18,   nzfreq19,   nzfreq20,   nzfreq21,   nzfreq22,   nzfreq23,   nzfreq24,   nzfreq25,   nzfreq26,   nzfreq27,   nzfreq28,   nzfreq29,   nzfreq30,   nzfreq31,   nzfreq32,   nzfreq33,   nzfreq34,   nzfreq35,   nzfreq36,   nzfreq37,   nzfreq38,   nzfreq39,   nzfreq40,   nzfreq41,   nzfreq42,   nzfreq43,   nzfreq44,   nzfreq45,      total
# scro   , , , ,          1,          0,          0,          4,          2,          2,          3,          2,          0,          2,          0,          1,          0,          0,          0,          2,          2,          1,          1,          1,          1,          0,          0,          0,          1,          0,          0,          1,          1,          0,          4,          1,          0,          0,          0,          0,          1,          0,          1,          3,          1,          0,          2,          1,          1,         41
 13,  5, 10,  9,         42,          4,         64,          2,          3,         16,          0,          6,         45,         16,          9,         10,          2,         13,          2,          3,          0,          3,          0,          0,          1,          0,          1,          4,          0,          0,          0,          6,          0,          5,          0,          2,          0,          0,          0,          1,          3,          0,          0,          1,          0,          1,          0,          0,          1,        266
 13,  5,  8, 10,        248,          1,         85,          8,          1,         32,         14,         19,         10,         15,         52,         10,         11,          6,          2,          0,          3,          2,          1,          0,          1,          1,          0,          1,          7,          0,          1,          0,          1,          1,          4,          0,          1,          3,          5,          1,          4,          0,          8,          4,          1,          0,          0,          4,          0,        567
 14,  2,  7,  6,        579,         16,         49,        180,        130,         29,         69,         51,         20,          2,          3,         29,         46,         34,         40,         31,         50,         34,         49,         47,         41,         23,         21,         16,          5,          9,         15,          1,         12,          4,          3,          3,          3,          2,          1,          1,          3,          3,          2,          0,          0,          1,          2,          0,          0,       1662
  6,  2,  3,  9,         73,        175,         55,        452,         70,        111,         22,         41,        175,        219,        276,        198,        198,        221,        147,        193,        121,        102,         65,         59,         38,         22,         29,          8,         13,          8,         10,          8,          8,          0,          5,          0,          0,          3,          5,          3,          1,          0,          0,          0,          0,          1,          0,          0,          1,       3137
  3,  2,  6,  9,        346,        195,        242,        964,        372,        193,         45,         31,          2,         33,        114,        137,        176,        146,        172,        150,        129,        119,         79,         52,         45,         33,         21,         19,         29,          7,          4,          8,          0,          6,          7,          2,          3,          0,          3,          0,          0,          3,          1,          0,          0,          0,          0,          1,          1,       3895
 14,  2,  7,  5,       1727,        868,        942,       2651,       1516,        818,         35,         34,         41,         90,        338,        380,        351,        426,        527,        331,        289,        232,        256,        161,        124,         88,         74,         71,         55,         18,         37,          7,          8,          6,          1,          2,          4,          4,          1,          0,          3,          0,          0,          0,          0,          3,          0,          4,          1,      12526

TonyB_ · 2018-10-07 00:20

Evan, as the person who has done the vast majority of the testing, are you happy about changing the xoroshiro32++ constants?

From the distribution tests there is no doubt that two successive 16-bit outputs using [13,5,10,9] come the closest of all the candidates to a random 32-bit output. Now this would not be a good enough reason to switch if it performed poorly as a 16-bit generator but the PractRand and Crush tests seem to me to be very acceptable.

One question: is the d = 0 slot in the distribution binaries really for d = 0 or is it xoroshiro32+p?

evanh · 2018-10-07 00:53

Yep, no problem with the choice of [13 5 10 9].

d = 0 in those files represents a test of the original xoroshiro+ algorithm. If parity was enabled then parity would be active for the d != 0 tests as well.

TonyB_ · 2018-10-07 01:41

Thanks for all your testing, Evan.

PM sent to the "Parallax developers".

Cluso99 · 2018-10-07 03:02

How is the [x x x x] setting achieved in the verilog code?
ie could there be an option to change or is it hardcoded in the verilog?

evanh · 2018-10-07 03:22

Making the constants programable would make a huge difference to the logic size for the instruction. Like a 1000 fold increase, to pluck a number out of the air. EDIT: Thinking about it more, it's not that bad. There is already four 16-bit adders. It would need another eight 16-bit barrel shifters, and four 4-bit registers to hold the presets. I'm not sure how bulky a barrel shifter is compared to an adder.

Here's repost from previous page in this topic:

//  Verilog HDL for Xoroshiro32++ candidate [13 5 10 9]
///////////////////////////////////////////////////////////

wire [15:0] xoro32z	= d[31:16] ^ d[15:0];			// first iteration
wire [31:0] xoro32y	= { xoro32z[5:0], xoro32z[15:6],        // rol #10
			    {d[2:0], d[15:3]} ^                 // rol #13
			    {xoro32z[10:0], 5'b0} ^ xoro32z };  // shl #5

wire [15:0] xoro32x	= xoro32y[31:16] ^ xoro32y[15:0];	// second iteration
wire [31:0] xoro32	= { xoro32x[5:0], xoro32x[15:6],	// rol #10, xoro32 = d result (state store)
			    {xoro32y[2:0], xoro32y[15:3]} ^     // rol #13
			    {xoro32x[10:0], 5'b0} ^ xoro32x };  // shl #5

wire [15:0] xoro32a	= d[31:16] + d[15:0];			// hashing sum of input state
wire [15:0] xoro32b	= xoro32y[31:16] + xoro32y[15:0];	// hashing sum of first iterator state output

assign xoro32r	= { {xoro32b[6:0], xoro32b[15:7]} + xoro32y[15:0], // rol #9
		    {xoro32a[6:0], xoro32a[15:7]} + d[15:0] };	// rol #9, xoro32r = prng result, next instruction's s value

Cluso99 · 2018-10-07 07:34

OK, while I don't quite follow the correlation of the [13 5 10 9], the implementation is a fixed barrel shifter which is heaps less than a programmable barrel shifter. It could be possible to implement two variants but I would expect that might be it. A full barrel shifter would take a lot of logic unless the code is using an existing one in the ALU.

evanh · 2018-10-07 07:49

Call it a fixed shift. Can't call it a barrel when it's fixed mapping like this.

Also to get the speed up, it's not one reusable shifter, but eight parallel shifts. It's not one reused adder, but four parallel adders. All firing on the same clock cycle.

It's possible for different instructions to reuse the same operations (up to the optimiser) but RISC is based around rapid execution. And that means lots of parallel operations in hardware.

evanh · 2018-10-07 07:52

As I've said quite often in the past: Even the 80486, and everything x86 since, is a RISC architecture.

evanh · 2018-10-07 08:05

I've talked about ALU ports in the Prop2 in the past too when discussing timings and operations of specific instructions. Chip has never concurred with my statements in those ramblings. That's because the term ALU is long dead and I probably should have stopped using it.

There isn't a specific pair of databuses feeding a logic block of operations like there was of old. There is only the cogram ports, instruction register and pipeline registers. Everything else goes to the logic optimiser.

TonyB_ · 2018-10-07 12:57

[13,5,10,9] will be in the respin (thanks Chip!) and it will do full justice to Chip's double-iteration brainwave. A macro that takes [a,b,c,d] and spews out the right Verilog would have saved us some thinking. Anyway, it's all done now, I hope.

As XORO32 is basically a black box this change is risk-free and in fact the logic for [13,5,10,9] will be a tiny bit smaller as two 5-bit left shifts replace two 2-bit ones.

A possible way to tell future versions of the P2 apart is to rotate the state bits by varying amounts in the logic that calculates the PRN so that a specific state will generate different outputs but over the whole period the distributions will be the same.

TonyB_ · 2018-10-07 13:03

Evan, I plan to get in touch with Seba again to update him. For the purely 16-bit output xoroshiro32++, is there any candidate that stands out as better than the rest? In other words, could we now recommend something other than [14,2,7,5] that is mentioned in the paper?

evanh · 2018-10-07 14:41

Obviously, stating the new choice of [13 5 10 9] is in order. There's no other of particular note.

I would certainly like to know his thoughts on the effectiveness of your approach with distribution. On the one hand it seems too easy a solution so it can't be that simple. But, on the other hand, the shear smallness of Xoroshiro32's state space means we can run a test that doesn't get used for any "real" generators ...

I think, only him answering this question can we offer anything of note back to Seba.

TonyB_ · 2018-10-08 00:55

Evan, I will ask for Seba's thoughts. The distribution tests are simple but impossible to do with 64-bit or 128-bit states. One could argue that the latter are less "real" in the sense they are too big to test fully, whereas the XORO32 distribution data are certainly real.

As you tested xoroshiro++ with all 15 non-zero rotations and xoroshiro+, we can see from their Chi-Square results how the two scramblers compare. The improvement in quality that the extra rotation and addition gives is huge. The table below is a summary of results posted earlier with xoroshiro32+[13,5,10] added at the bottom:

# (Observed-Expected)^2/Expected totals
# a,  b,  c,  d,  pfreq0-12,  zfreq1-21, nzfreq1-45 
 13,  5, 10,  9,         24,        327,        266
 13,  5,  8, 10,         50,        337,        567
 14,  2,  7,  6,       1606,        624,       1662
  6,  2,  3,  9,       1674,       1250,       3137
  3,  2,  6,  9,      11845,       9258,       3895
 14,  2,  7,  5,      38421,      29092,      12526
 13,  5, 10,  -,    6053796,    4753215,    7520400

Although [13,5,10] appears to be terrible, it is actually the best xoroshiro+ with rankings of 3 for pfreq and 1 for both zfreq and nzfreq. This seems to be confirmation that the engine common to [13,5,10] and [13,5,10,9] is the best one as regards the distributions.

evanh · 2018-10-08 01:18

That does sound promising.

cgracey · 2018-10-09 06:01

TonyB_, I implemented your changes:

wire [31:0] xi0		=	d;					// first iteration [13,5,10,9]
wire [15:0] xx0		=	xi0[31:16]^ xi0[15:00];
wire [31:0] xy0		= {	xx0[05:00], xx0[15:06],			// rol 10
			       {xi0[02:00], xi0[15:03]} ^		// rol 13
			       {xx0[10:00], 5'b0} ^ xx0 };		// shl 5

wire [31:0] xi1		=	xy0;					// second iteration
wire [15:0] xx1		=	xi1[31:16]^ xi1[15:00];
wire [31:0] xy1		= {	xx1[05:00], xx1[15:06],			// rol 10
			       {xi1[02:00], xi1[15:03]} ^		// rol 13
			       {xx1[10:00], 5'b0} ^ xx1 };		// shl 5

wire [31:0] xoro32	=	xy1;					// d result

wire [16:0] xs0		=	xi0[31:16] + xi0[15:00];		// first sum
wire [16:0] xs1		=	xi1[31:16] + xi1[15:00];		// second sum

assign xoro32_r		= { {xs1[06:00], xs1[15:07]} + xi1[15:00],	// prng result is next instruction's s value
			    {xs0[06:00], xs0[15:07]} + xi0[15:00] };	// rol 9

Everything checks out on the result side, too.

Thanks a lot for improving this!

TonyB_ · 2018-10-10 00:31

Thank you for making the change, Chip. A lot of testing by Evan and some by myself over the past few months has not gone to waste.

Heater. · 2018-10-10 03:22

Am I to understand that the xo???, whatever, algorithm in the P2 is no longer conformant with any of the published algorithms? http://xoshiro.di.unimi.it/

evanh · 2018-10-10 09:47

For the XORO32 instruction, correct. That has been true for some time. Seba and Dave provided the algorithm for Xoroshiro++ to us before they published their recent work. It wasn't included in the publication.

The free running generator in the hub is using Xoroshiro128** that turned up along with the newly published Xoshiro engine.

Xoroshiro++ is still the same Xoroshiro engine, with the same three constants. Only the scrambler has been extended to have a second adder and a (fourth) constant.

evanh · 2018-10-10 10:54

And we're also double iterating to make 32-bits from two consecutive 16-bit random numbers.

TonyB_ · 2018-10-10 12:48

Heater. wrote: »

Am I to understand that the xo???, whatever, algorithm in the P2 is no longer conformant with any of the published algorithms? http://xoshiro.di.unimi.it/

evanh wrote: »

For the XORO32 instruction, correct. That has been true for some time. Seba and Dave provided the algorithm for Xoroshiro++ to us before they published their recent work. It wasn't included in the publication.

The free running generator in the hub is using Xoroshiro128** that turned up along with the newly published Xoshiro engine.

Xoroshiro++ is still the same Xoroshiro engine, with the same three constants. Only the scrambler has been extended to have a second adder and a (fourth) constant.

Before publication Seba told me xoroshiro++ would not be in the paper but it ended up being included (para 10.7, p.38). A link to the paper and mention of xoroshiro32++ is at http://xoshiro.di.unimi.it/. The paper does not make it clear in my view how much better the ++ scramber is than +, although it does call the former strong and the latter weak.

TonyB_ · 2018-10-10 12:53

Evan, do you think any xoroshiro16++ might pass the extended PractRand tests? About half the candidates had max scores in the tests you did in February, but I can't remember when you started using the extended test.

evanh · 2018-10-10 14:58

Looks like I've deleted whatever I had in the testing directories. The oldest s8 Practrand reports are from March and those are only single + scrambler.

EDIT: I've found a Feb dated s8 zip file labelled with "te1tf2", so that's the full extended PR tests. Attached:

EDIT2: Hmm, that looks a tad limited in the number of reports. The culling is too strong.

TonyB_ · 2018-10-10 15:08

Evan, on 25 February you emailed me file PRscores-xo-s8.zip, which might tell you which type of test you did.

evanh · 2018-10-10 15:16

Here's a fresh run of the distribution data:

evanh · 2018-10-10 15:36

Oh, that's interesting. Gridding everything I'm getting a bunch of cases that lock/crash Practrand at the 32KB mark. I've seen this once before but had forgotten all about it. It must have been the last time I ran the s8's.

EDIT: Odd, it seems to be fine as long as I exclude the d=0 cases. :shrug:

TonyB_ · 2018-10-10 16:53

evanh wrote: »

Looks like I've deleted whatever I had in the testing directories. The oldest s8 Practrand reports are from March and those are only single + scrambler.

EDIT: I've found a Feb dated s8 zip file labelled with "te1tf2", so that's the full extended PR tests. Attached:

EDIT2: Hmm, that looks a tad limited in the number of reports. The culling is too strong.

Thanks, Evan. I didn't have the full extended results.
Re culling, remember there aren't many [a,b,c] and 8 bits is the most you get with xoroshiro16 !

evanh · 2018-10-10 17:05

I've skipped the culling altogether. Just over halfway through now at 3000+ report files ...

evanh · 2018-10-10 18:09

Here's all the Practrand gridding data. I haven't made any spreadsheet.

Something suspect though: Full period repeat is 64 KB so Practrand should be maxing out at 128 KB, but there is plenty of 512 KB!

Random/LFSR on P2

Comments