Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

Here's the pseudo-code:

;set constants
a = 253
;b = 8
;r = 8
;initialise variables
c = 0
i = 0
x[0] = 82
x[1] = 97
x[2] = 120
x[3] = 111
x[4] = 102
x[5] = 116
x[6] = 20
x[7] = 12
;iterate CMWC
t := a * x[i] + c
c := t / 256 ;high byte of t
x[i] := (t mod 256) xor 255 ;1's complement of low byte of t
prn := x[i]
i := (i + 1) and 7
;a,c,i,x[i] are 8-bit
;t is 16-bit
;Period > 2^66

I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.

The XORO32 instruction iterates the state, but to actually get the random number out, you need to add the low and high 16-bit fields and use bits 15..1 of the sum, ignoring bit 0. So, just calling it RANDOM would be too simplistic.

The word RANDOM, while descriptive, is also quite generic regarding implementation. I think our instruction name should reflect the uniqueness, to some degree. I kind of like XORO32, myself. It's got a mysterious aura.

TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

Here's the Prop2 code that generated those 16 numbers:

again mov x,#16
mov y,#1
.lp xoro32 y
mov z,0
call #z_to_hex
djnz x,#.lp

I want to be sure this is what we all think it is.

No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

I want to be sure this is what we all think it is.

I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

I use Z80 assembler! :cool:

Yep, C for the tight loops, plus PractRand. And Bash scripts for gluing it all together. The RNG and data formatting code is generalised so I can use any word size from 12 bits up to 64 bits. I've recently split the basic algorithm out of the data formatting code so that multiple versions of the data formatters can quickly be reassigned to any RNG.

EDIT: Renamed an incorrectly used term.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

That's a big question right now. One explanation for the notable improvement of triplet [15 3 6], when swapping bit9 for bit1, is another point that Melissa O'Neill makes about all LCGs, like Xoroshift, is that they will, with enough state bits, pass every current statistical test.

So, a possible way to interpret the good result of moving bit1 up to bit9 position is this has moved the dodginess out to the 512M statistical mark. Does this actually strengthen a small weakness? Dunno.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

To see if there anything to compare I've just built score tables for every byte sampling variant, for the s16 triplet set, on both our current algorithm and also with the bit9<->bit1 swap.

Nothing notable showed up. There looked to be better candidates until I then compared with the word variants and that always had something bad.

I guess that means I'm testing wrong or the quality of the algorithm still needs work. I should try the PCG types ...

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.

Chip edited his original post. I think it is right now and wire[7:0] xoro32r = xoro32a[7:0] = the first byte.

On another point, in xoroshiro128plus.c at http://xoroshiro.di.unimi.it/ the sum and the state are out of sync by one iteration. The sum is calculated first so that the code could run faster in parallel but the summing should be at the end as in XORO32. This is mentioned in one of the c files.

Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

Here's the pseudo-code:

;set constants
a = 253
;b = 8
;r = 8
;initialise variables
c = 0
i = 0
x[0] = 82
x[1] = 97
x[2] = 120
x[3] = 111
x[4] = 102
x[5] = 116
x[6] = 20
x[7] = 12
;iterate CMWC
t := a * x[i] + c
c := t / 256 ;high byte of t
x[i] := (t mod 256) xor 255 ;1's complement of low byte of t
prn := x[i]
i := (i + 1) and 7
;a,c,i,x[i] are 8-bit
;t is 16-bit
;Period > 2^66

I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.

Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.

Ia 14,2,7 still the optimal setting for how XORO32 is now working?

Yes, accepting the 128M "Byte1" score, this is the best triplet for current algorithm. All other byte variants of this triplet came out at 512M or 1G.

Okay. Great!

So, we can consider the improved XORO32 instruction to be optimized and DONE.

I think it is quite complete now and it has maybe realized its full potential as a seedable/steppable 32-bit PRNG for software use. It couldn't be higher quality for 32 bits and it couldn't be any easier to use.

It's very neat how your guys' ongoing efforts to understand and improve on this PRNG topology all came together, at the last minute, to present a way for encapsulstion, where every aspect of it could be so tidily sewn up. Very nice!

There are so many neat things in the prop2 that are the result of so many people having so much inspiration. There are interesting stories behind every bit of it. I've got to pm Johannes, the original poster on this thread, so he can see how things have turned out.

I'm going to add some seeding code in the ROM booter for our central XOROSHIRO 128+ generator. I"ll use P0's ADC calibration mode to garner a bunch of random bits to stuff in it, for like 100us, continuously. If this works, on no power-up will any Prop2 ever seed its PRNG with the same value as it has before, or any other Prop2 ever has or will in the future. We'll have randomness out the wazoo!

I keep smiling about how TonyB_'s fitful concern over getting a solid 16, 32, or 64 bits of quality per iteration, regardless of how awkward it might have worked using C and Z flags as extra storage, all resulted in some magical confluence of realizations, to make something maybe perfect, in the end.

And Evanh has been developing all these tests to discover what works best. It's like a big hunt.

Please keep thinking about PRNG's. I just wanted to acknowledge the current milestone we've reached.

Yes, something magical happened to XORO32 in the past day or so and it's N times better than before, where N is quite a big number. Many thanks to David Roberts for the parity idea, Evan for all his testing and Chip for the Verilog change that seemed to appear from nowhere!

Chip, is the endian-ness of xoro32r definitely correct now?

Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.

I've done the code entry for the algorithm:

static uint8_t next(void) {
//initialise variables
static uint8_t i = 0; // table index, incremented per iteration
static uint8_t c = 0; // high byte of t, held for next iteration
static uint8_t x[8] = {82, 97, 120, 111, 102, 116, 20, 12}; // main state table, one entry changed per iteration
uint8_t prn;
uint16_t t;
//iterate CMWC
t = _A * x[i] + c;
c = t / 256; //high byte of t
x[i] = (t % 256) ^ 255; //1's complement of low byte of t
prn = x[i];
i = (i + 1) & 7;
return prn;
}

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

Clarity is nice. Optimiser takes care of variable elimination.

Yep, _A is what's called a #define. It's just a straight text replacement ahead of compiling. It never becomes a compile time symbol. Doing that that way was completely arbitrary. The labelling style of underscore-capital is sort of a convention in C for this type of code.

"There's no huge amount of massive material
hidden in the rings that we can't see,
the rings are almost pure ice."

## Comments

1,1241,124First 16 prn values:

10,92355304084

769F2C7C

DFFCA249

F2F0202C

4DC85ACB

ADCDD1D0

1B310598

540D8094

9C9DD6A5

35B52EAD

545DCF86

95D4BE50

33E7D2EA

1DCC2F55

70113655

08B16602

1,124First eight longs are correct, Chip.

1,1241,124No excuse now!

10,923TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

Here's the Prop2 code that generated those 16 numbers:

I want to be sure this is what we all think it is.

1,124I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

I use Z80 assembler! :cool:

10,923Z-80!!! Anything is possible, I guess. I'm curious to know how old you are. From your exuberance, I'd say you're young, but if you know Z-80...

Anyway, I guess all I need to know is that this new XORO32 mechanism tests well in the randomness analyzers.

1,124Here are the first 32 values, generated using QuickBASIC!

55304084

769F2C7C

DFFCA249

F2F0202C

4DC85ACB

ADCDD1D0

1B310598

540D8094

9C9DD6A5

35B52EAD

545DCF86

95D4BE50

33E7D2EA

1DCC2F55

70113655

08B16602

3AC5CE5C

8E98D521

DCECB400

0F410520

F177AAA5

5EE33D17

5688B513

CD39F4CD

3B7D3953

C9288B3B

87D0A270

D4C14834

F024057E

6F8986D5

172C8FC9

BB56FE23

10,923That all checks out on this end.

Now, do we know that this scores well?

10,9236,610hidden in the rings that we can't see,

the rings are almost pure ice."

6,610hidden in the rings that we can't see,

the rings are almost pure ice."

6,610Yep, C for the tight loops, plus PractRand. And Bash scripts for gluing it all together. The RNG and data formatting code is generalised so I can use any word size from 12 bits up to 64 bits. I've recently split the basic algorithm out of the data formatting code so that multiple versions of the data formatters can quickly be reassigned to any RNG.

EDIT: Renamed an incorrectly used term.

hidden in the rings that we can't see,

the rings are almost pure ice."

6,610That's a big question right now. One explanation for the notable improvement of triplet [15 3 6], when swapping bit9 for bit1, is another point that Melissa O'Neill makes about all LCGs, like Xoroshift, is that they will, with enough state bits, pass every current statistical test.

So, a possible way to interpret the good result of moving bit1 up to bit9 position is this has moved the dodginess out to the 512M statistical mark. Does this actually strengthen a small weakness? Dunno.

hidden in the rings that we can't see,

the rings are almost pure ice."

6,610Nothing notable showed up. There looked to be better candidates until I then compared with the word variants and that always had something bad.

I guess that means I'm testing wrong or the quality of the algorithm still needs work. I should try the PCG types ...

hidden in the rings that we can't see,

the rings are almost pure ice."

6,610hidden in the rings that we can't see,

the rings are almost pure ice."

1,124Chip edited his original post. I think it is right now and wire[7:0] xoro32r = xoro32a[7:0] = the first byte.

On another point, in xoroshiro128plus.c at http://xoroshiro.di.unimi.it/ the sum and the state are out of sync by one iteration. The sum is calculated first so that the code could run faster in parallel but the summing should be at the end as in XORO32. This is mentioned in one of the c files.

1,124Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.

10,923Okay. Great!

So, we can consider the improved XORO32 instruction to be optimized and DONE.

I think it is quite complete now and it has maybe realized its full potential as a seedable/steppable 32-bit PRNG for software use. It couldn't be higher quality for 32 bits and it couldn't be any easier to use.

It's very neat how your guys' ongoing efforts to understand and improve on this PRNG topology all came together, at the last minute, to present a way for encapsulstion, where every aspect of it could be so tidily sewn up. Very nice!

There are so many neat things in the prop2 that are the result of so many people having so much inspiration. There are interesting stories behind every bit of it. I've got to pm Johannes, the original poster on this thread, so he can see how things have turned out.

I'm going to add some seeding code in the ROM booter for our central XOROSHIRO 128+ generator. I"ll use P0's ADC calibration mode to garner a bunch of random bits to stuff in it, for like 100us, continuously. If this works, on no power-up will any Prop2 ever seed its PRNG with the same value as it has before, or any other Prop2 ever has or will in the future. We'll have randomness out the wazoo!

I keep smiling about how TonyB_'s fitful concern over getting a solid 16, 32, or 64 bits of quality per iteration, regardless of how awkward it might have worked using C and Z flags as extra storage, all resulted in some magical confluence of realizations, to make something maybe perfect, in the end.

And Evanh has been developing all these tests to discover what works best. It's like a big hunt.

Please keep thinking about PRNG's. I just wanted to acknowledge the current milestone we've reached.

1,124Chip, is the endian-ness of xoro32r definitely correct now?

10,9236,610I wonder if Mr Robert's parity calculation includes the carry bit. That is what made the big improvement to the parity idea.

hidden in the rings that we can't see,

the rings are almost pure ice."

6,610hidden in the rings that we can't see,

the rings are almost pure ice."

1,1246,610hidden in the rings that we can't see,

the rings are almost pure ice."

1,1241,1246,610Yep, _A is what's called a #define. It's just a straight text replacement ahead of compiling. It never becomes a compile time symbol. Doing that that way was completely arbitrary. The labelling style of underscore-capital is sort of a convention in C for this type of code.

hidden in the rings that we can't see,

the rings are almost pure ice."