Random/LFSR on P2

TonyB_ · 2017-10-27 22:25

deleted

TonyB_ · 2017-10-27 22:50

TonyB_ wrote: »
Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

Here's the pseudo-code:

;set constants

a = 253
;b = 8
;r = 8	

;initialise variables

c = 0
i = 0
x[0] = 82
x[1] = 97
x[2] = 120
x[3] = 111
x[4] = 102
x[5] = 116
x[6] = 20
x[7] = 12

;iterate CMWC

t := a * x[i] + c
c := t / 256			;high byte of t
x[i] := (t mod 256) xor 255	;1's complement of low byte of t
prn := x[i]
i := (i + 1) and 7

;a,c,i,x[i] are 8-bit
;t is 16-bit
;Period > 2^66
I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.

First 16 prn values:

F5, D1, 07, D6, C3, F6, C8, 0F, D3, 80, 45, 7A, 75, 20, 64, 66

cgracey · 2017-10-27 22:54

Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:

55304084
769F2C7C
DFFCA249
F2F0202C
4DC85ACB
ADCDD1D0
1B310598
540D8094
9C9DD6A5
35B52EAD
545DCF86
95D4BE50
33E7D2EA
1DCC2F55
70113655
08B16602

TonyB_ · 2017-10-27 23:09

cgracey wrote: »

Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:

55304084
769F2C7C
DFFCA249
F2F0202C
4DC85ACB
ADCDD1D0
1B310598
540D8094
9C9DD6A5
35B52EAD
545DCF86
95D4BE50
33E7D2EA
1DCC2F55
70113655
08B16602

First eight longs are correct, Chip.

TonyB_ · 2017-10-27 23:16

Other eight longs also correct.

TonyB_ · 2017-10-27 23:17

TonyB wrote: »

Renaming XORO32 to RANDOM would make it so much easier to remember!

cgracey wrote: »

The XORO32 instruction iterates the state, but to actually get the random number out, you need to add the low and high 16-bit fields and use bits 15..1 of the sum, ignoring bit 0. So, just calling it RANDOM would be too simplistic.

TonyB wrote: »

Well, XORO32 will mean nothing to newbies, like me, especially when they find out that XORO16+16 would be more accurate.

cgracey wrote: »

Good point. We just used the naming precedent from xoroshiro128+. At first, I named it XORO32P, but that seemed twice as complex as XORO32.

No excuse now!

cgracey · 2017-10-27 23:24

The word RANDOM, while descriptive, is also quite generic regarding implementation. I think our instruction name should reflect the uniqueness, to some degree. I kind of like XORO32, myself. It's got a mysterious aura.

TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

Here's the Prop2 code that generated those 16 numbers:

again	mov	x,#16
	mov	y,#1

.lp	xoro32	y
	mov	z,0
	call	#z_to_hex
	djnz	x,#.lp

I want to be sure this is what we all think it is.

TonyB_ · 2017-10-27 23:32

cgracey wrote: »
TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

Here's the Prop2 code that generated those 16 numbers:
again	mov	x,#16
	mov	y,#1

.lp	xoro32	y
	mov	z,0
	call	#z_to_hex
	djnz	x,#.lp
I want to be sure this is what we all think it is.

I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

I use Z80 assembler! :cool:

cgracey · 2017-10-27 23:37

TonyB_ wrote: »
cgracey wrote: »
TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

Here's the Prop2 code that generated those 16 numbers:
again	mov	x,#16
	mov	y,#1

.lp	xoro32	y
	mov	z,0
	call	#z_to_hex
	djnz	x,#.lp
I want to be sure this is what we all think it is.
I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

I use Z80 assembler! :cool:

Z-80!!! Anything is possible, I guess. I'm curious to know how old you are. From your exuberance, I'd say you're young, but if you know Z-80...

Anyway, I guess all I need to know is that this new XORO32 mechanism tests well in the randomness analyzers.

TonyB_ · 2017-10-28 00:06

I'm kind of like XORO32, myself. I've got a mysterious aura.

Here are the first 32 values, generated using QuickBASIC!

55304084
769F2C7C
DFFCA249
F2F0202C
4DC85ACB
ADCDD1D0
1B310598
540D8094
9C9DD6A5
35B52EAD
545DCF86
95D4BE50
33E7D2EA
1DCC2F55
70113655
08B16602
3AC5CE5C
8E98D521
DCECB400
0F410520
F177AAA5
5EE33D17
5688B513
CD39F4CD
3B7D3953
C9288B3B
87D0A270
D4C14834
F024057E
6F8986D5
172C8FC9
BB56FE23

cgracey · 2017-10-28 00:11

Thanks, TonyB_.

That all checks out on this end.

Now, do we know that this scores well?

cgracey · 2017-10-28 00:45

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

evanh · 2017-10-28 02:08

TonyB_ wrote: »
Thanks, Chip. Don't we want the first sum in the low word?
wire [31:0] xoro32r	= {xoro32b[15:1], ^xoro32b,		// xoro32r = prng result
			   xoro32a[15:1], ^xoro32a};

No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.

evanh · 2017-10-28 02:29

cgracey wrote: »
Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:
55304084  769F2C7C  DFFCA249  F2F0202C  4DC85ACB  ADCDD1D0  1B310598  540D8094
9C9DD6A5  35B52EAD  545DCF86  95D4BE50  33E7D2EA  1DCC2F55  70113655  08B16602

Affirmative, my first 40 16-bit sums:

0001 4084 5530 2c7c 769f a249 dffc 202c f2f0 5acb 4dc8 d1d0 adcd 0598 1b31 8094 540d d6a5 9c9d 2ead 35b5 cf86 545d be50 95d4 d2ea 33e7 2f55 1dcc 3655 7011 6602 08b1 ce5c 3ac5 d521 8e98 b400 dcec 0520

evanh · 2017-10-28 02:43

TonyB_ wrote: »

cgracey wrote: »

TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

I want to be sure this is what we all think it is.

I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

I use Z80 assembler! :cool:

Yep, C for the tight loops, plus PractRand. And Bash scripts for gluing it all together. The RNG and data formatting code is generalised so I can use any word size from 12 bits up to 64 bits. I've recently split the basic algorithm out of the data formatting code so that multiple versions of the data formatters can quickly be reassigned to any RNG.

EDIT: Renamed an incorrectly used term.

evanh · 2017-10-28 03:05

cgracey wrote: »

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

That's a big question right now. One explanation for the notable improvement of triplet [15 3 6], when swapping bit9 for bit1, is another point that Melissa O'Neill makes about all LCGs, like Xoroshift, is that they will, with enough state bits, pass every current statistical test.

So, a possible way to interpret the good result of moving bit1 up to bit9 position is this has moved the dodginess out to the 512M statistical mark. Does this actually strengthen a small weakness? Dunno.

evanh · 2017-10-28 07:53

To see if there anything to compare I've just built score tables for every byte sampling variant, for the s16 triplet set, on both our current algorithm and also with the bit9<->bit1 swap.

Nothing notable showed up. There looked to be better candidates until I then compared with the word variants and that always had something bad.

I guess that means I'm testing wrong or the quality of the algorithm still needs work. I should try the PCG types ...

evanh · 2017-10-28 08:02

cgracey wrote: »

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

Yes, accepting the 128M "Byte1" score, this is the best triplet for current algorithm. All other byte variants of this triplet came out at 512M or 1G.

TonyB_ · 2017-10-28 09:00

evanh wrote: »
TonyB_ wrote: »
Thanks, Chip. Don't we want the first sum in the low word?
wire [31:0] xoro32r	= {xoro32b[15:1], ^xoro32b,		// xoro32r = prng result
			   xoro32a[15:1], ^xoro32a};
No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.

Chip edited his original post. I think it is right now and wire[7:0] xoro32r = xoro32a[7:0] = the first byte.

On another point, in xoroshiro128plus.c at http://xoroshiro.di.unimi.it/ the sum and the state are out of sync by one iteration. The sum is calculated first so that the code could run faster in parallel but the summing should be at the end as in XORO32. This is mentioned in one of the c files.

TonyB_ · 2017-10-28 09:05

TonyB_ wrote: »
TonyB_ wrote: »
Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

Here's the pseudo-code:

;set constants

a = 253
;b = 8
;r = 8	

;initialise variables

c = 0
i = 0
x[0] = 82
x[1] = 97
x[2] = 120
x[3] = 111
x[4] = 102
x[5] = 116
x[6] = 20
x[7] = 12

;iterate CMWC

t := a * x[i] + c
c := t / 256			;high byte of t
x[i] := (t mod 256) xor 255	;1's complement of low byte of t
prn := x[i]
i := (i + 1) and 7

;a,c,i,x[i] are 8-bit
;t is 16-bit
;Period > 2^66
I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.
First 16 prn values:
F5, D1, 07, D6, C3, F6, C8, 0F, D3, 80, 45, 7A, 75, 20, 64, 66

Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.

cgracey · 2017-10-28 09:23

evanh wrote: »

cgracey wrote: »

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

Yes, accepting the 128M "Byte1" score, this is the best triplet for current algorithm. All other byte variants of this triplet came out at 512M or 1G.

Okay. Great!

So, we can consider the improved XORO32 instruction to be optimized and DONE.

I think it is quite complete now and it has maybe realized its full potential as a seedable/steppable 32-bit PRNG for software use. It couldn't be higher quality for 32 bits and it couldn't be any easier to use.

It's very neat how your guys' ongoing efforts to understand and improve on this PRNG topology all came together, at the last minute, to present a way for encapsulstion, where every aspect of it could be so tidily sewn up. Very nice!

There are so many neat things in the prop2 that are the result of so many people having so much inspiration. There are interesting stories behind every bit of it. I've got to pm Johannes, the original poster on this thread, so he can see how things have turned out.

I'm going to add some seeding code in the ROM booter for our central XOROSHIRO 128+ generator. I"ll use P0's ADC calibration mode to garner a bunch of random bits to stuff in it, for like 100us, continuously. If this works, on no power-up will any Prop2 ever seed its PRNG with the same value as it has before, or any other Prop2 ever has or will in the future. We'll have randomness out the wazoo!

I keep smiling about how TonyB_'s fitful concern over getting a solid 16, 32, or 64 bits of quality per iteration, regardless of how awkward it might have worked using C and Z flags as extra storage, all resulted in some magical confluence of realizations, to make something maybe perfect, in the end.

And Evanh has been developing all these tests to discover what works best. It's like a big hunt.

Please keep thinking about PRNG's. I just wanted to acknowledge the current milestone we've reached.

TonyB_ · 2017-10-28 09:29

Yes, something magical happened to XORO32 in the past day or so and it's N times better than before, where N is quite a big number. Many thanks to David Roberts for the parity idea, Evan for all his testing and Chip for the Verilog change that seemed to appear from nowhere!

Chip, is the endian-ness of xoro32r definitely correct now?

cgracey · 2017-10-28 09:31

Yes, I had modified my Verilog post after your endian comment. It is what you see.

evanh · 2017-10-28 09:43

Thanks Chip and Tony. Glad to help.

I wonder if Mr Robert's parity calculation includes the carry bit. That is what made the big improvement to the parity idea.

evanh · 2017-10-28 09:45

TonyB_ wrote: »

Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.

I've done the code entry for the algorithm:

static uint8_t  next(void)  {
//initialise variables
	static uint8_t  i = 0;  // table index, incremented per iteration
	static uint8_t  c = 0;  // high byte of t, held for next iteration
	static uint8_t  x[8] = {82, 97, 120, 111, 102, 116, 20, 12};  // main state table, one entry changed per iteration

	uint8_t  prn;
	uint16_t  t;


//iterate CMWC
	t = _A * x[i] + c;
	c = t / 256;    //high byte of t
	x[i] = (t % 256) ^ 255;    //1's complement of low byte of t
	prn = x[i];
	i = (i + 1) & 7;

	return prn;
}

TonyB_ · 2017-10-28 09:56

Excellent! Variable c is actually the carry in Complementary Multiply-With-Carry.

evanh · 2017-10-28 10:03

Oops, t should be 16 bits. I'll fix that ...

TonyB_ · 2017-10-28 10:09

Could you simply return x[ i ] (spaces to prevent italics) and get rid of prn? I only added prn to make things clear.

TonyB_ · 2017-10-28 10:16

Is _A declared as a constant = 253 somewhere else?

evanh · 2017-10-28 10:43

Clarity is nice. Optimiser takes care of variable elimination.

Yep, _A is what's called a #define. It's just a straight text replacement ahead of compiling. It never becomes a compile time symbol. Doing that that way was completely arbitrary. The labelling style of underscore-capital is sort of a convention in C for this type of code.

Random/LFSR on P2

Comments