I've arbitrarily tried a few of those ideas myself. Most end up with total fails on PractRand, some have given equivalent quality ... hmm, I've not tried to use all bits at once though, since that part I'd long moved outside the generator code. Good point, I'll have another fiddle ...
Someone is saying that Xoroshiro32+ cannot possibly be as good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."
Here's the pseudo-code:
;set constants
a = 253
;b = 8
;r = 8
;initialise variables
c = 0
i = 0
x[0] = 82
x[1] = 97
x[2] = 120
x[3] = 111
x[4] = 102
x[5] = 116
x[6] = 20
x[7] = 12
;iterate CMWC
t := a * x[i] + c
c := t / 256 ;high byte of t
x[i] := (t mod 256) xor 255 ;1's complement of low byte of t
prn := x[i]
i := (i + 1) and 7
;a,c,i,x[i] are 8-bit
;t is 16-bit
;Period > 2^66
I'd like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.
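For anyone wanting to feed this into a test tool, here is a minimal C sketch of the pseudo-code above. The names `cmwc8_t`, `cmwc8_init` and `cmwc8_next` are made up for illustration; the constants and seed values are the ones given in the post, and nothing here has been run through PractRand.

```c
#include <assert.h>
#include <stdint.h>

/* State for the 8-bit CMWC sketch: lag table x[], carry c, index i.
   (Hypothetical names, not from the original post.) */
typedef struct {
    uint8_t x[8];
    uint8_t c;
    uint8_t i;
} cmwc8_t;

/* Load the seed values used in the pseudo-code. */
static void cmwc8_init(cmwc8_t *g)
{
    static const uint8_t seed[8] = { 82, 97, 120, 111, 102, 116, 20, 12 };
    for (int k = 0; k < 8; k++)
        g->x[k] = seed[k];
    g->c = 0;
    g->i = 0;
}

/* One iteration: returns the next 8-bit PRN. */
static uint8_t cmwc8_next(cmwc8_t *g)
{
    uint16_t t = (uint16_t)(253u * g->x[g->i] + g->c);  /* t = a * x[i] + c, a = 253 */
    g->c = (uint8_t)(t >> 8);                            /* high byte of t */
    assert(g->c < 253);                                  /* carry always stays below a */
    g->x[g->i] = (uint8_t)(~t & 0xFFu);                  /* 1's complement of low byte of t */
    uint8_t prn = g->x[g->i];
    g->i = (uint8_t)((g->i + 1) & 7);
    return prn;
}
```

Worked by hand from the pseudo-code, the first four outputs with this seed should be 245, 209, 7, 214, which gives a quick check on any port.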
It's virtually identical scores to original Xoroshiro (including the lsbit) when the lsbit is replaced with parity, just a few more non-512K word0 scores. When the sum is shifted down one bit and the parity placed as msbit then it all falls apart.
Someone is saying that Xoroshiro32+ cannot possibly be as good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."
Probably quite right. The geometric tests that Melissa demo'd always came up nice and fuzzy looking with that type of algorithm. She strongly advocates for them because they fit modern CPUs really well.
However, the problem with that type is that it's costly to implement in hardware because of the multiply.
CMWC is yet another PRNG by the late Prof. Marsaglia.
The numbers in the pseudo-code were chosen to make it as easy as possible to implement on 8-bit CPUs but does that affect the quality? Multiply by 253 is a byte shift and three subtracts.
There are different versions, e.g. 32-bit with r=256 or r=4096, but if you're interested please test this 8-bit version first! I could generate the first 16 PRNs tonight as a check. Seeding the lag table correctly is an issue that Xoroshiro+ doesn't have.
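The "byte shift and three subtracts" remark follows from 253 = 256 - 3. A small sketch in C, with `mul253` and `mul253_selfcheck` as made-up names, just to show the arithmetic an 8-bit CPU would be doing:

```c
#include <assert.h>
#include <stdint.h>

/* 253 * x computed as 256*x - x - x - x: one byte shift and three subtracts. */
static uint16_t mul253(uint8_t x)
{
    uint16_t t = (uint16_t)(x << 8);  /* 256 * x: x moved up one byte */
    t -= x;
    t -= x;
    t -= x;
    return t;
}

/* Exhaustive check against an ordinary multiply. */
static void mul253_selfcheck(void)
{
    for (unsigned x = 0; x <= 255u; x++)
        assert(mul253((uint8_t)x) == 253u * x);
}
```

The result always fits in 16 bits, since 253 * 255 = 64515.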
Interesting, examining the results I now see that the best word0 score is 16M. Still doesn't make it to the nominal good scores, I'd want at least 256M, but of note the whole column is pretty much 32x better than what was achieved without the parity.
Because PRN[n] is a function of PRN[n-r] not PRN[n-1]. If r=4096, a new PRN won't be used to generate another PRN until 4096 (or 4095) more iterations have been done, hence the lag.
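Written out as a recurrence, using the symbols from the pseudo-code (a = 253, b = 256, r = 8 there) and reading the 1's complement as subtraction from b - 1, the lag looks like this:

```latex
\begin{aligned}
t_n &= a\,x_{n-r} + c_{n-1} \\
c_n &= \lfloor t_n / b \rfloor \\
x_n &= (b - 1) - (t_n \bmod b)
\end{aligned}
```

Only the carry links consecutive outputs; the multiplied term always comes from r iterations back.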
If XORO32 can be used to produce a high-quality 16-bit result per iteration, then I could enhance the instruction to do TWO iterations, get 16-bit sums from each, and substitute the high-quality 32-bit PRNG result into the next instruction's S value. Meanwhile, the double iteration would be written back to the original D in the XORO32 instruction.
Not sure if I've got me head fully around your intent there, but two consecutive Xoroshiro32's cannot make one Xoroshiro64. The period in particular is still that of the Xoroshiro32.
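A rough C sketch of the double-iteration idea, to make the point about state size concrete. It assumes the usual xoroshiro+ structure with the triple read as [a, b, c] = [14, 2, 7] and 16-bit halves; the names `xoro32_t`, `xoro32_next` and `xoro32_double` are invented here and say nothing about how the XORO32 instruction is actually wired.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint16_t s0, s1; } xoro32_t;   /* 32 bits of state in total */

/* Rotate a 16-bit value left by k (0..15). */
static uint16_t rotl16(uint16_t x, unsigned k)
{
    assert(k < 16);
    return (uint16_t)((x << k) | (x >> ((16 - k) & 15)));
}

/* One xoroshiro32+ step (assumed structure): returns the 16-bit sum. */
static uint16_t xoro32_next(xoro32_t *g)
{
    uint16_t s0 = g->s0, s1 = g->s1;
    uint16_t sum = (uint16_t)(s0 + s1);
    s1 ^= s0;
    g->s0 = (uint16_t)(rotl16(s0, 14) ^ s1 ^ (s1 << 2));
    g->s1 = rotl16(s1, 7);
    return sum;
}

/* Two iterations packed into one 32-bit value.
   Putting the first sum in the low half is an arbitrary choice. */
static uint32_t xoro32_double(xoro32_t *g)
{
    uint32_t lo = xoro32_next(g);
    uint32_t hi = xoro32_next(g);
    return (hi << 16) | lo;
}
```

However the two sums are packed, the generator still carries only 32 bits of state, so the stream of 32-bit results has to repeat within at most 2^32 double steps.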
That said, I'm all for squeezing 16 useful bits out of the XORO32 instruction.
Comments
Tony,
Ha! That was silly of me, not doing a non-truncated scoring run. Turns out I had succeeded there all by myself. Here's two score tables. The first table is of the regular Xoroshiro algorithm with simple summing of the two halves of the state.
The second table is one of my earlier spins, a couple of weeks back last weekend, from your prompting - The iterator hasn't changed but the summing gets dynamically modified. It sums the two halves still but a copy of the state is rotated based on high three bits of s1 before the summing.
PS: Word0 is the full summing, without any truncation, that produces complete 16-bit results for every iteration.
Xoroshiro32+ PractRand Score Table - Run 2017-10-27 23:25:54 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]     512K     32M     64M     16M     64M     64M     32M      1G
[ 2  1 11]     512K      8M      8M    128K    128K    128K    256K      1G
[ 2  1 15]      64K    128K    128K     32K     32K     64K     32K      1G
[ 2  2  7]     512K      8M      8M     32M     32M     32M      4M      1G
[ 2  6 15]     512K      2M      4M     16M    128K    128K    128K      1G
[ 2  8  7]     512K      8M     16M    128M     32M     32M      4M      1G
[ 2  9  9]     256K    256K    256K     16K    128K    512K     64M      1G
[ 2 11  3]     512K      2M      4M    512K    256K    256K    256K      1G
[ 3  2  6]     512K     64M     32M      2M      2M      2M      1M      1G
[ 3  3 10]     512K     16M     16M    256K    256K    512K     64M      1G
[ 3 11  2]     256K    256K    512K    128K     64K     64K     64K      1G
[ 3 11 14]     512K      4M      4M    256K    256K    512K    512K      1G
[ 4  1  9]     512K    512K      1M    512K      8M    256K    512K    512M
[ 4  7 15]     512K     64M     64M    128M    512K    512K    512K      1G
[ 4  8  5]     512K    512M    128M     32M      8M      4M      2M      1G
[ 5  2  6]     512K    128M    128M      8M      4M      4M      4M      1G
[ 5  8  4]     512K      4M     16M      4M      1M    256K    128K     64M
[ 5 14 12]     512K    128M     64M      2M      4M     16M    128M      1G
[ 6  2  3]     512K     32M     32M      1M      1M      1M      2M      2M
[ 6  2  5]     512K     16M     16M    256K    512K    512K    512K      4M
[ 6  2 11]     512K    512M    256M      8M      8M    256M    128M      1G
[ 6  3 15]     512K     32M     32M      1M      1M      2M      4M      1G
[ 6 14 11]     512K    128M    128M     32M     64M    128M     16M      1G
[ 7  1  8]     512K     64M    128M     16M    128M     16M     16M      1G
[ 7  2  2]     512K     16M     16M      2M      2M      4M      4M      4M
[ 7  2 10]     512K      1G    512M    128M    256M     64M     32M      1G
[ 7  2 14]     512K    512M    256M      4M      8M     32M     32M    256M
[ 7  8  2]     512K    512M    512M     64M      8M      4M      4M    512M
[ 7 10 10]     512K    128M    256M     32M     32M    256M    128M      1G
[ 7 15  8]     512K      2M      2M    256K      1M    512K    256K      1G
[ 8  1  7]     512K      1M      4M      1M      1M      1M      1M     64M
[ 8  2  1]     512K     32M     64M      8M      8M     16M     16M     16M
[ 8  5 13]     512K     32M     64M     16M      8M    128M    128M      1G
[ 8  7 15]     512K      2M      2M    512K      2M      2M      1M      1G
[ 8  9 13]     512K    512M    128M    512M      1G    512M    128M      1G
[ 8 15  7]     256K    512K      1M    256K    512K    512K    256K      1G
[ 9  1  4]     512K     32M     64M      8M     16M     16M     16M     64M
[ 9  2 14]     512K      2G      1G    128M    256M     64M    128M      1G
[ 9  8 14]     512K     16M     32M    256M     32M     32M     64M      1G
[ 9  9  2]     512K     32M     64M     16M     16M     16M     32M     64M
[10  2  7]     512K    256M    128M     16M     16M     64M     32M      1G
[10  3  3]     512K     64M     64M     32M     32M     32M     64M    256M
[10  3 11]     512K      2G      1G    256M    256M    512M    128M      1G
[10  5 13]     512K      2G    512M      1G    512M    256M    128M    512M
[10  7 11]     512K    512M    128M    512M    128M    512M    128M      1G
[10 10  7]     512K     16M     16M     16M     16M    512M    128M    512M
[11  1  2]     512K    256M    256M     32M     32M     32M     32M      1G
[11  1 14]     512K      1G      1G     32M     16M     16M     16M      1G
[11  2  6]     512K    512M    512M     32M     64M    512M    128M    512M
[11  3 10]     512K     32M     32M     64M      1M    128K     64K    128M
[11  7 10]     256K      4M      8M     32M    128K    128K     64K      1G
[11  8 12]     512K    256M    256M    512M    512M    512M    128M      1G
[11 10 12]     512K    128M    128M      1G    512M    512M    128M      1G
[11 11 14]     512K     64M    128M     16M     16M     16M      8M      1G
[11 14  6]     512K    256M    128M    256M     64M    512M    128M      1G
[12  4 15]     512K    512M      1G    128M     64M     16M     16M      1G
[12  8 11]     256K      4M      4M     64M    128K    128K     64K      1G
[12 10 11]     512K      2M      2M    128K    128K    128K     64K      1G
[12 10 13]     512K     32M     64M     16M     16M     32M      8M      1G
[12 14  5]     512K     32M     32M    128M    128M     64M     64M      1G
[13  3 14]     512K    512M    512M     64M     32M     16M      8M      1G
[13  5  8]     512K    256M    256M    512M    128M    256M    128M      1G
[13  5 10]     512K     64M     32M     64M     16M    256K    256K      1G
[13  9  8]     512K    128M     64M    512M    512M    512M    128M      1G
[13 10 12]     512K      2M      2M    128K    128K    128K     64K    512M
[13 11 14]     512K     16M     16M      8M      8M      8M      8M      1G
[13 12 14]     512K     16M     16M      8M      8M      4M      4M      1G
[13 13 14]     512K     16M     16M      8M      8M      8M      8M      1G
[14  1 11]     512K     64M     32M    256K    256K    256K    256K      1G
[14  2  7]     512K      1G    512M      1G    512M    512M    128M      1G
[14  2  9]     512K    512M    256M    128M    128M     64M    128M      1G
[14  3 13]     512K     16M     16M      1M    256K     64K     64K      1G
[14  8  9]     512K    128M     64M      1G     64M     64M     64M      1G
[14 11  3]     512K    512M    512M    256M    128M    128M     64M      1G
[14 11 11]     512K     16M      8M    256K    256K    128K    256K      1G
[14 11 13]     256K    512K    512K    128K     64K     64K     64K      1G
[14 12 13]     128K    512K    256K    128K     64K     64K     64K      1G
[14 13 13]     128K    128K    256K     64K     64K    128K     64K    256M
[15  1  2]     512K      2M      2M      1M    512K    512K    256K      1G
[15  3  6]     512K      1G    512M    512M    512M    256M    128M      1G
[15  4 12]     512K    128M     64M    128M      8M    256K    256K      1G
[15  6  2]     512K    128M    256M    128M    128M     32M     16M      1G
[15  7  4]     512K    512M    128M    512M    512M    256M    128M      1G
[15  7  8]     512K     64M     64M     32M     32M      4M      2M    512M
Xoroshiro32r+ PractRand Score Table - Run 2017-10-27 22:41:34 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]      16M     32M     16M     32M     16M      1M      1M     32M
[ 2  1 11]       8M     16M     16M    512K    512K    512K    512K    512M
[ 2  1 15]     128K    256K    256K    128K    128K    128K     64K      8M
[ 2  2  7]       4M      8M      8M     16M      2M      1M      1M      8M
[ 2  6 15]       1M      1M      1M      2M    128K    128K    256K      1G
[ 2  8  7]      32M     64M     32M     16M      4M      1M      1M      4M
[ 2  9  9]     512K      4M      4M    512K      2M      1M      1M    256M
[ 2 11  3]       4M      4M      4M    512K    512K    512K    512K      1G
[ 3  2  6]      16M     64M     32M      4M      4M      1M      2M    512M
[ 3  3 10]      16M     32M     32M      4M      2M      4M      4M      1G
[ 3 11  2]     256K    512K    256K    256K    128K    128K    128K     32M
[ 3 11 14]       8M      8M      4M      1M    512K    512K    512K     64M
[ 4  1  9]       1M      1M      2M      4M      1M      2M      2M     64M
[ 4  7 15]      16M     16M     16M      1M    256K    256K    512K      1G
[ 4  8  5]      64M    128M     64M     16M      2M      1M      2M      1G
[ 5  2  6]      64M     64M     64M      4M      2M      2M      4M      1G
[ 5  8  4]       2M      8M      8M      2M    512K    512K    256K      1G
[ 5 14 12]     256M    256M    128M      8M     16M     32M    128M      1G
[ 6  2  3]      32M     32M     16M      4M      1M      1M      1M    128M
[ 6  2  5]      16M      8M      8M      1M      2M      2M    512K      1G
[ 6  2 11]     256M    512M    128M      8M     16M     64M     64M      1G
[ 6  3 15]      64M     64M     64M      2M      1M      2M      2M     32M
[ 6 14 11]      64M    128M     64M      4M      8M     32M     64M      1G
[ 7  1  8]      64M    128M     64M      8M      2M      4M      8M    512M
[ 7  2  2]      16M     32M     16M      4M      1M      2M      2M      1G
[ 7  2 10]      32M      1G    256M      8M     64M     64M     16M      1G
[ 7  2 14]     512M      1G    256M      8M      8M     32M     64M    512M
[ 7  8  2]     128M    256M    128M     32M      2M      1M      2M      1G
[ 7 10 10]      64M    256M     64M      2M     32M     32M     16M      1G
[ 7 15  8]       2M      4M      4M      2M      1M      1M      2M     32M
[ 8  1  7]       2M      4M      8M      4M      2M      4M      4M    128M
[ 8  2  1]      32M    128M    128M      2M      1M      4M      4M    256M
[ 8  5 13]      32M    128M    128M     16M     64M    256M    128M    256M
[ 8  7 15]       2M      4M      4M      1M    512K      1M      1M      8M
[ 8  9 13]     512M      1G    512M     64M    128M    128M    128M      1G
[ 8 15  7]       1M      1M      1M      1M    512K    512K    512K     16M
[ 9  1  4]       8M     32M     32M     16M     32M     16M     16M      1G
[ 9  2 14]     512M      2G      1G     16M     32M    128M    256M      1G
[ 9  8 14]       8M     16M     16M      2M      1M      1M    512K      1G
[ 9  9  2]       8M    128M    128M      8M      8M     16M     32M      1G
[10  2  7]      16M    256M    128M     32M      8M     16M     32M      1G
[10  3  3]      32M    128M    128M      4M     64M    128M     64M      1G
[10  3 11]      16M      1G    512M     16M    256M    512M    512M      1G
[10  5 13]      32M      2G      1G     32M    128M    256M    512M      1G
[10  7 11]      16M    256M    256M     32M    128M    128M    128M      1G
[10 10  7]      16M     32M     32M     32M      4M      8M     16M    512M
[11  1  2]      32M    512M    128M      1M     16M     64M     32M      1G
[11  1 14]      16M      1G    512M      4M     64M     32M     32M      1G
[11  2  6]      32M    512M    512M     64M     32M     64M    256M      1G
[11  3 10]       8M     32M     16M      4M      2M      1M    512K      1G
[11  7 10]       2M      8M      4M    512K    512K    512K    512K      1G
[11  8 12]      16M    128M    128M      4M     16M     32M      8M      1G
[11 10 12]      16M    128M     64M      4M      2M      1M    256K      1G
[11 11 14]      16M     64M     64M      2M     16M     16M     16M      1G
[11 14  6]      32M    256M    256M     32M     16M     32M     64M      1G
[12  4 15]      16M    512M    512M     16M     64M    128M    128M      1G
[12  8 11]       2M      8M      4M    256K    512K    256K    256K      1G
[12 10 11]       2M      4M      4M    256K    512K    256K    256K      1G
[12 10 13]      16M    128M     64M      2M      2M      1M    256K     64M
[12 14  5]      32M    128M    128M     32M     64M     64M     32M      1G
[13  3 14]      32M    256M    128M      4M     32M     32M     16M      1G
[13  5  8]      32M    256M    128M    128M    128M     64M    128M      1G
[13  5 10]      32M     64M     32M      4M      2M      2M      2M      1G
[13  9  8]      32M    256M    128M     64M     64M     32M     16M      1G
[13 10 12]       1M      2M      2M    128K    256K    256K    128K    128M
[13 11 14]      16M     32M     16M      2M      1M    256K    256K    256M
[13 12 14]       8M     16M     16M      2M    512K    256K    128K      4M
[13 13 14]       8M     16M     32M      2M     16M    256K    128K    128M
[14  1 11]      32M     64M     32M      1M      1M    512K    512K      1G
[14  2  7]      32M    128M     64M     64M    128M     32M     16M      1G
[14  2  9]      32M    256M     64M     32M     16M      8M      8M      1G
[14  3 13]       4M      8M      4M    512K    256K    128K    128K      1G
[14  8  9]      32M    128M     32M     16M      8M      1M      1M      1G
[14 11  3]      32M    256M    128M     16M     16M      2M      2M      1G
[14 11 11]      16M     32M     16M      1M    512K    512K    256K      1G
[14 11 13]     512K      1M    512K    256K    256K    128K    128K     16M
[14 12 13]       1M      1M    512K    256K    256K    128K    128K     32M
[14 13 13]     512K      1M    512K    256K    256K    128K    128K      8M
[15  1  2]       2M      4M      4M      1M      1M    256K    256K      1M
[15  3  6]      64M     64M     16M     32M    128M      8M      2M    512M
[15  4 12]      32M     32M     16M      8M    512K      1M      1M      1G
[15  6  2]      16M     64M     32M     16M     32M     32M     32M    128M
[15  7  4]      32M    128M     64M     32M     64M     32M      8M      1G
[15  7  8]      16M     32M     16M      8M      4M    512K    512K    256M
The summing line changed from this
uint64_t result = (s0 + s1) & ACCUM_MASK;
to this
int shift = s1 >> (ACCUM_SIZE-3);
uint64_t result = (rotl( s0, shift ) + rotl( s1, shift )) & ACCUM_MASK;
It was just a quick hack to see what happened using the existing rotl() function as defined in my earlier source. I remember getting the idea from watching Melissa's lecture where she talks about using the high bits for dynamically rotating in her algorithms.
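For readers without the earlier source, a self-contained sketch of the two output functions follows. It assumes 16-bit state halves (so ACCUM_SIZE is 16 and the shift is the top three bits of s1), and `rotl16`, `sum_plain` and `sum_rotated` are stand-in names rather than the functions from the actual test program.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 16-bit value left by k (0..15). */
static uint16_t rotl16(uint16_t x, unsigned k)
{
    assert(k < 16);
    return (uint16_t)((x << k) | (x >> ((16 - k) & 15)));
}

/* Regular xoroshiro32+ output: simple sum of the two state halves. */
static uint16_t sum_plain(uint16_t s0, uint16_t s1)
{
    return (uint16_t)(s0 + s1);
}

/* Modified output: both halves rotated by the high three bits of s1, then summed. */
static uint16_t sum_rotated(uint16_t s0, uint16_t s1)
{
    unsigned shift = s1 >> 13;   /* 0..7 */
    return (uint16_t)(rotl16(s0, shift) + rotl16(s1, shift));
}
```

As a worked case, s0 = 0x0001 and s1 = 0xE000 give a shift of 7, so the rotated sum is 0x0080 + 0x0070 = 0x00F0, against 0xE001 for the plain sum. Whenever the top three bits of s1 are zero the two functions agree.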
Have you had a chance to try the parity idea, sum and state?
Here's the two modifications to algorithm:
// Variant 1: lsbit of the sum replaced by the state parity
int shift;
uint64_t result, parity = 0;
for( shift=1; shift < ACCUM_SIZE; shift++ ) {
    parity = parity ^ (s0 >> shift) ^ (s1 >> shift);
}
result = (((s0 + s1) & -2LL) | (parity & 1LL)) & ACCUM_MASK;

// Variant 2: sum shifted down one bit, state parity placed as the msbit
int shift;
uint64_t result, parity = 0;
for( shift=1; shift < ACCUM_SIZE; shift++ ) {
    parity = parity ^ (s0 >> shift) ^ (s1 >> shift);
}
result = (((s0 + s1) >> 1) | (parity << (ACCUM_SIZE-1))) & ACCUM_MASK;
EDIT: Oops, I see a bug in the second code snippet. Doesn't matter though, the first one proves that parity is no good.
EDIT2: Hmm, I've also left out the lsbit of s0 and s1 ... I'll do that all again ... but I doubt it'll make it better ...
Xoroshiro32p+ PractRand Score Table - Run 2017-10-28 01:45:41 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]     512K     32M     64M     16M     64M     64M     32M      1G
[ 2  1 11]     512K      8M      8M    128K    128K    128K    256K      1G
[ 2  1 15]      64K    128K    128K     32K     32K     64K     32K      1G
[ 2  2  7]     512K      8M      8M     32M     32M     32M      4M      1G
[ 2  6 15]     512K      2M      4M     16M    128K    128K    128K      1G
[ 2  8  7]     512K      8M     16M    128M     32M     32M      4M      1G
[ 2  9  9]     128K    256K    256K     16K    128K    512K     64M      1G
[ 2 11  3]     512K      2M      4M    512K    256K    256K    256K      1G
[ 3  2  6]     512K     64M     32M      2M      2M      2M      1M      1G
[ 3  3 10]     512K     16M     16M    256K    256K    512K     64M      1G
[ 3 11  2]     256K    256K    512K    128K     64K     64K     64K      1G
[ 3 11 14]     512K      4M      4M    256K    256K    512K    512K      1G
[ 4  1  9]     512K    512K      1M    512K      8M    256K    512K    512M
[ 4  7 15]     512K     64M     64M    128M    512K    512K    512K      1G
[ 4  8  5]     512K    512M    128M     32M      8M      4M      2M      1G
[ 5  2  6]     512K    128M    128M      8M      4M      4M      4M      1G
[ 5  8  4]     512K      4M     16M      4M      1M    256K    128K     64M
[ 5 14 12]     512K    128M     64M      2M      4M     16M    128M      1G
[ 6  2  3]     512K     32M     32M      1M      1M      1M      2M      2M
[ 6  2  5]     512K     16M     16M    256K    512K    512K    512K      4M
[ 6  2 11]     512K    512M    256M      8M      8M    256M    128M      1G
[ 6  3 15]     512K     32M     32M      1M      1M      2M      4M      1G
[ 6 14 11]     512K    128M    128M     32M     64M    128M     16M      1G
[ 7  1  8]     512K     64M    128M     16M    128M     16M     16M      1G
[ 7  2  2]     512K     16M     16M      2M      2M      4M      4M      4M
[ 7  2 10]     512K      1G    512M    128M    256M     64M     32M      1G
[ 7  2 14]     512K    512M    256M      4M      8M     32M     32M    256M
[ 7  8  2]     512K    512M    512M     64M      8M      4M      4M    512M
[ 7 10 10]     512K    128M    256M     32M     32M    256M    128M      1G
[ 7 15  8]     512K      2M      2M    256K      1M    512K    256K      1G
[ 8  1  7]     512K      1M      4M      1M      1M      1M      1M     64M
[ 8  2  1]     512K     32M     64M      8M      8M     16M     16M     16M
[ 8  5 13]     512K     32M     64M     16M      8M    128M    128M      1G
[ 8  7 15]     512K      2M      2M    512K      2M      2M      1M      1G
[ 8  9 13]     512K    512M    128M    512M      1G    512M    128M      1G
[ 8 15  7]     512K    512K      1M    256K    512K    512K    256K      1G
[ 9  1  4]     512K     32M     64M      8M     16M     16M     16M     64M
[ 9  2 14]     512K      2G      1G    128M    256M     64M    128M      1G
[ 9  8 14]     512K     16M     32M    256M     32M     32M     64M      1G
[ 9  9  2]     512K     32M     64M     16M     16M     16M     32M     64M
[10  2  7]     512K    256M    128M     16M     16M     64M     32M      1G
[10  3  3]     512K     64M     64M     32M     32M     32M     64M    256M
[10  3 11]     512K      2G      1G    256M    256M    512M    128M      1G
[10  5 13]     512K      2G    512M      1G    512M    256M    128M    512M
[10  7 11]     512K    512M    128M    512M    128M    512M    128M      1G
[10 10  7]     512K     16M     16M     16M     16M    512M    128M    512M
[11  1  2]     512K    256M    256M     32M     32M     32M     32M      1G
[11  1 14]     512K      1G      1G     32M     16M     16M     16M      1G
[11  2  6]     512K    512M    512M     32M     64M    512M    128M    512M
[11  3 10]     512K     32M     32M     64M      1M    128K     64K    128M
[11  7 10]     512K      4M      8M     32M    128K    128K     64K      1G
[11  8 12]     512K    256M    256M    512M    512M    512M    128M      1G
[11 10 12]     512K    128M    128M      1G    512M    512M    128M      1G
[11 11 14]     512K     64M    128M     16M     16M     16M      8M      1G
[11 14  6]     512K    256M    128M    256M     64M    512M    128M      1G
[12  4 15]     512K    512M      1G    128M     64M     16M     16M      1G
[12  8 11]     512K      4M      4M     64M    128K    128K     64K      1G
[12 10 11]     512K      2M      2M    128K    128K    128K     64K      1G
[12 10 13]     512K     32M     64M     16M     16M     32M      8M      1G
[12 14  5]     512K     32M     32M    128M    128M     64M     64M      1G
[13  3 14]     512K    512M    512M     64M     32M     16M      8M      1G
[13  5  8]     512K    256M    256M    512M    128M    256M    128M      1G
[13  5 10]     512K     64M     32M     64M     16M    256K    256K      1G
[13  9  8]     512K    128M     64M    512M    512M    512M    128M      1G
[13 10 12]     512K      2M      2M    128K    128K    128K     64K    512M
[13 11 14]     512K     16M     16M      8M      8M      8M      8M      1G
[13 12 14]     512K     16M     16M      8M      8M      4M      4M      1G
[13 13 14]     512K     16M     16M      8M      8M      8M      8M      1G
[14  1 11]     512K     64M     32M    256K    256K    256K    256K      1G
[14  2  7]     512K      1G    512M      1G    512M    512M    128M      1G
[14  2  9]     512K    512M    256M    128M    128M     64M    128M      1G
[14  3 13]     512K     16M     16M      1M    256K     64K     64K      1G
[14  8  9]     512K    128M     64M      1G     64M     64M     64M      1G
[14 11  3]     512K    512M    512M    256M    128M    128M     64M      1G
[14 11 11]     512K     16M      8M    256K    256K    128K    256K      1G
[14 11 13]     256K    512K    512K    128K     64K     64K     64K      1G
[14 12 13]     256K    512K    256K    128K     64K     64K     64K      1G
[14 13 13]     256K    128K    256K     64K     64K    128K     64K    256M
[15  1  2]     512K      2M      2M      1M    512K    512K    256K      1G
[15  3  6]     512K      1G    512M    512M    512M    256M    128M      1G
[15  4 12]     512K    128M     64M    128M      8M    256K    256K      1G
[15  6  2]     512K    128M    256M    128M    128M     32M     16M      1G
[15  7  4]     512K    512M    128M    512M    512M    256M    128M      1G
[15  7  8]     512K     64M     64M     32M     32M      4M      2M    512M
int shift;
uint64_t result, parity = s0 ^ s1;
for( shift=1; shift < ACCUM_SIZE; shift++ ) {
    parity = parity ^ (s0 >> shift) ^ (s1 >> shift);
}
result = (((s0 + s1) & -2LL) | (parity & 1LL)) & ACCUM_MASK;
And the score table still looks the same:
Xoroshiro32r+ PractRand Score Table - Run 2017-10-28 02:23:10 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]     512K     32M     64M     16M     64M     64M     32M      1G
[ 2  1 11]     512K      8M      8M    128K    128K    128K    256K      1G
[ 2  1 15]      64K    128K    128K     32K     32K     64K     32K      1G
[ 2  2  7]     512K      8M      8M     32M     32M     32M      4M      1G
[ 2  6 15]     512K      2M      4M     16M    128K    128K    128K      1G
[ 2  8  7]     512K      8M     16M    128M     32M     32M      4M      1G
[ 2  9  9]     256K    256K    256K     16K    128K    512K     64M      1G
[ 2 11  3]     512K      2M      4M    512K    256K    256K    256K      1G
[ 3  2  6]     512K     64M     32M      2M      2M      2M      1M      1G
[ 3  3 10]     512K     16M     16M    256K    256K    512K     64M      1G
[ 3 11  2]     512K    256K    512K    128K     64K     64K     64K      1G
[ 3 11 14]     512K      4M      4M    256K    256K    512K    512K      1G
[ 4  1  9]     512K    512K      1M    512K      8M    256K    512K    512M
[ 4  7 15]     512K     64M     64M    128M    512K    512K    512K      1G
[ 4  8  5]     512K    512M    128M     32M      8M      4M      2M      1G
[ 5  2  6]     512K    128M    128M      8M      4M      4M      4M      1G
[ 5  8  4]     512K      4M     16M      4M      1M    256K    128K     64M
[ 5 14 12]     512K    128M     64M      2M      4M     16M    128M      1G
[ 6  2  3]     512K     32M     32M      1M      1M      1M      2M      2M
[ 6  2  5]     512K     16M     16M    256K    512K    512K    512K      4M
[ 6  2 11]     512K    512M    256M      8M      8M    256M    128M      1G
[ 6  3 15]     512K     32M     32M      1M      1M      2M      4M      1G
[ 6 14 11]     512K    128M    128M     32M     64M    128M     16M      1G
[ 7  1  8]     512K     64M    128M     16M    128M     16M     16M      1G
[ 7  2  2]     512K     16M     16M      2M      2M      4M      4M      4M
[ 7  2 10]     512K      1G    512M    128M    256M     64M     32M      1G
[ 7  2 14]     512K    512M    256M      4M      8M     32M     32M    256M
[ 7  8  2]     512K    512M    512M     64M      8M      4M      4M    512M
[ 7 10 10]     512K    128M    256M     32M     32M    256M    128M      1G
[ 7 15  8]     512K      2M      2M    256K      1M    512K    256K      1G
[ 8  1  7]     512K      1M      4M      1M      1M      1M      1M     64M
[ 8  2  1]     512K     32M     64M      8M      8M     16M     16M     16M
[ 8  5 13]     512K     32M     64M     16M      8M    128M    128M      1G
[ 8  7 15]     512K      2M      2M    512K      2M      2M      1M      1G
[ 8  9 13]     512K    512M    128M    512M      1G    512M    128M      1G
[ 8 15  7]     512K    512K      1M    256K    512K    512K    256K      1G
[ 9  1  4]     512K     32M     64M      8M     16M     16M     16M     64M
[ 9  2 14]     512K      2G      1G    128M    256M     64M    128M      1G
[ 9  8 14]     512K     16M     32M    256M     32M     32M     64M      1G
[ 9  9  2]     512K     32M     64M     16M     16M     16M     32M     64M
[10  2  7]     512K    256M    128M     16M     16M     64M     32M      1G
[10  3  3]     512K     64M     64M     32M     32M     32M     64M    256M
[10  3 11]     512K      2G      1G    256M    256M    512M    128M      1G
[10  5 13]     512K      2G    512M      1G    512M    256M    128M    512M
[10  7 11]     512K    512M    128M    512M    128M    512M    128M      1G
[10 10  7]     512K     16M     16M     16M     16M    512M    128M    512M
[11  1  2]     512K    256M    256M     32M     32M     32M     32M      1G
[11  1 14]     512K      1G      1G     32M     16M     16M     16M      1G
[11  2  6]     512K    512M    512M     32M     64M    512M    128M    512M
[11  3 10]     512K     32M     32M     64M      1M    128K     64K    128M
[11  7 10]     512K      4M      8M     32M    128K    128K     64K      1G
[11  8 12]     512K    256M    256M    512M    512M    512M    128M      1G
[11 10 12]     512K    128M    128M      1G    512M    512M    128M      1G
[11 11 14]     512K     64M    128M     16M     16M     16M      8M      1G
[11 14  6]     512K    256M    128M    256M     64M    512M    128M      1G
[12  4 15]     512K    512M      1G    128M     64M     16M     16M      1G
[12  8 11]     512K      4M      4M     64M    128K    128K     64K      1G
[12 10 11]     512K      2M      2M    128K    128K    128K     64K      1G
[12 10 13]     512K     32M     64M     16M     16M     32M      8M      1G
[12 14  5]     512K     32M     32M    128M    128M     64M     64M      1G
[13  3 14]     512K    512M    512M     64M     32M     16M      8M      1G
[13  5  8]     512K    256M    256M    512M    128M    256M    128M      1G
[13  5 10]     512K     64M     32M     64M     16M    256K    256K      1G
[13  9  8]     512K    128M     64M    512M    512M    512M    128M      1G
[13 10 12]     512K      2M      2M    128K    128K    128K     64K    512M
[13 11 14]     512K     16M     16M      8M      8M      8M      8M      1G
[13 12 14]     512K     16M     16M      8M      8M      4M      4M      1G
[13 13 14]     512K     16M     16M      8M      8M      8M      8M      1G
[14  1 11]     512K     64M     32M    256K    256K    256K    256K      1G
[14  2  7]     512K      1G    512M      1G    512M    512M    128M      1G
[14  2  9]     512K    512M    256M    128M    128M     64M    128M      1G
[14  3 13]     512K     16M     16M      1M    256K     64K     64K      1G
[14  8  9]     512K    128M     64M      1G     64M     64M     64M      1G
[14 11  3]     512K    512M    512M    256M    128M    128M     64M      1G
[14 11 11]     512K     16M      8M    256K    256K    128K    256K      1G
[14 11 13]     512K    512K    512K    128K     64K     64K     64K      1G
[14 12 13]     256K    512K    256K    128K     64K     64K     64K      1G
[14 13 13]     256K    128K    256K     64K     64K    128K     64K    256M
[15  1  2]     512K      2M      2M      1M    512K    512K    256K      1G
[15  3  6]     512K      1G    512M    512M    512M    256M    128M      1G
[15  4 12]     512K    128M     64M    128M      8M    256K    256K      1G
[15  6  2]     512K    128M    256M    128M    128M     32M     16M      1G
[15  7  4]     512K    512M    128M    512M    512M    256M    128M      1G
[15  7  8]     512K     64M     64M     32M     32M      4M      2M    512M
Here's the scores from sum parity replacing the lsbit:
Xoroshiro32p+ PractRand Score Table - Run 2017-10-28 03:01:16 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]       8M     32M     64M     16M     64M     64M     32M      1G
[ 2  1 11]       8M      8M      8M    128K    128K    128K    256K      1G
[ 2  1 15]      64K    128K    128K     32K     32K     64K     32K      1G
[ 2  2  7]       4M      8M      8M     32M     32M     32M      4M      1G
[ 2  6 15]       1M      2M      4M     16M    128K    128K    128K      1G
[ 2  8  7]       4M      8M     16M    128M     32M     32M      4M      1G
[ 2  9  9]     256K    256K    256K     16K    128K    512K     64M      1G
[ 2 11  3]       2M      2M      4M    512K    256K    256K    256K      1G
[ 3  2  6]      16M     64M     32M      2M      2M      2M      1M      1G
[ 3  3 10]      16M     16M     16M    256K    256K    512K     64M      1G
[ 3 11  2]     256K    256K    512K    128K     64K     64K     64K      1G
[ 3 11 14]       8M      4M      4M    256K    256K    512K    512K      1G
[ 4  1  9]       1M    512K      1M    512K      8M    256K    512K    512M
[ 4  7 15]      16M     64M     64M    128M    512K    512K    512K      1G
[ 4  8  5]      16M    512M    128M     32M      8M      4M      2M      1G
[ 5  2  6]      16M    128M    128M      8M      4M      4M      4M      1G
[ 5  8  4]       2M      4M     16M      4M      1M    256K    128K     64M
[ 5 14 12]      16M    128M     64M      2M      4M     16M    128M      1G
[ 6  2  3]      16M     32M     32M      1M      1M      1M      2M      2M
[ 6  2  5]       8M     16M     16M    256K    512K    512K    512K      4M
[ 6  2 11]      16M    512M    256M      8M      8M    256M    128M      1G
[ 6  3 15]      16M     32M     32M      1M      1M      2M      4M      1G
[ 6 14 11]      16M    128M    128M     32M     64M    128M     16M      1G
[ 7  1  8]       1M     64M    128M     16M    128M     16M     16M      1G
[ 7  2  2]       8M     16M     16M      2M      2M      4M      4M      4M
[ 7  2 10]      16M      1G    512M    128M    256M     64M     32M      1G
[ 7  2 14]      16M    512M    256M      4M      8M     32M     32M    256M
[ 7  8  2]      16M    512M    512M     64M      8M      4M      4M    512M
[ 7 10 10]      16M    128M    256M     32M     32M    256M    128M      1G
[ 7 15  8]       1M      2M      2M    256K      1M    512K    256K      1G
[ 8  1  7]       1M      1M      4M      1M      1M      1M      1M     64M
[ 8  2  1]       4M     32M     64M      8M      8M     16M     16M     16M
[ 8  5 13]      16M     32M     64M     16M      8M    128M    128M      1G
[ 8  7 15]       1M      2M      2M    512K      2M      2M      1M      1G
[ 8  9 13]      16M    512M    128M    512M      1G    512M    128M      1G
[ 8 15  7]     512K    512K      1M    256K    512K    512K    256K      1G
[ 9  1  4]      16M     32M     64M      8M     16M     16M     16M     64M
[ 9  2 14]      16M      2G      1G    128M    256M     64M    128M      1G
[ 9  8 14]       8M     16M     32M    256M     32M     32M     64M      1G
[ 9  9  2]      16M     32M     64M     16M     16M     16M     32M     64M
[10  2  7]      16M    256M    128M     16M     16M     64M     32M      1G
[10  3  3]      16M     64M     64M     32M     32M     32M     64M    256M
[10  3 11]      16M      2G      1G    256M    256M    512M    128M      1G
[10  5 13]      16M      2G    512M      1G    512M    256M    128M    512M
[10  7 11]      16M    512M    128M    512M    128M    512M    128M      1G
[10 10  7]      16M     16M     16M     16M     16M    512M    128M    512M
[11  1  2]      16M    256M    256M     32M     32M     32M     32M      1G
[11  1 14]      16M      1G      1G     32M     16M     16M     16M      1G
[11  2  6]      16M    512M    512M     32M     64M    512M    128M    512M
[11  3 10]       8M     32M     32M     64M      1M    128K     64K    128M
[11  7 10]       4M      4M      8M     32M    128K    128K     64K      1G
[11  8 12]      16M    256M    256M    512M    512M    512M    128M      1G
[11 10 12]      16M    128M    128M      1G    512M    512M    128M      1G
[11 11 14]      16M     64M    128M     16M     16M     16M      8M      1G
[11 14  6]      16M    256M    128M    256M     64M    512M    128M      1G
[12  4 15]      16M    512M      1G    128M     64M     16M     16M      1G
[12  8 11]       2M      4M      4M     64M    128K    128K     64K      1G
[12 10 11]       2M      2M      2M    128K    128K    128K     64K      1G
[12 10 13]      16M     32M     64M     16M     16M     32M      8M      1G
[12 14  5]      16M     32M     32M    128M    128M     64M     64M      1G
[13  3 14]      16M    512M    512M     64M     32M     16M      8M      1G
[13  5  8]      16M    256M    256M    512M    128M    256M    128M      1G
[13  5 10]      16M     64M     32M     64M     16M    256K    256K      1G
[13  9  8]      16M    128M     64M    512M    512M    512M    128M      1G
[13 10 12]       1M      2M      2M    128K    128K    128K     64K    512M
[13 11 14]      16M     16M     16M      8M      8M      8M      8M      1G
[13 12 14]      16M     16M     16M      8M      8M      4M      4M      1G
[13 13 14]      16M     16M     16M      8M      8M      8M      8M      1G
[14  1 11]      16M     64M     32M    256K    256K    256K    256K      1G
[14  2  7]      16M      1G    512M      1G    512M    512M    128M      1G
[14  2  9]      16M    512M    256M    128M    128M     64M    128M      1G
[14  3 13]       8M     16M     16M      1M    256K     64K     64K      1G
[14  8  9]      16M    128M     64M      1G     64M     64M     64M      1G
[14 11  3]      16M    512M    512M    256M    128M    128M     64M      1G
[14 11 11]       8M     16M      8M    256K    256K    128K    256K      1G
[14 11 13]     512K    512K    512K    128K     64K     64K     64K      1G
[14 12 13]     256K    512K    256K    128K     64K     64K     64K      1G
[14 13 13]     256K    128K    256K     64K     64K    128K     64K    256M
[15  1  2]       2M      2M      2M      1M    512K    512K    256K      1G
[15  3  6]      16M      1G    512M    512M    512M    256M    128M      1G
[15  4 12]      16M    128M     64M    128M      8M    256K    256K      1G
[15  6  2]      16M    128M    256M    128M    128M     32M     16M      1G
[15  7  4]      16M    512M    128M    512M    512M    256M    128M      1G
[15  7  8]      16M     64M     64M     32M     32M      4M      2M    512M
And the relevant source snippet:
int shift;
uint64_t result = (s0 + s1) & ACCUM_MASK;
uint64_t parity = result;
for( shift=1; shift < ACCUM_SIZE; shift++ ) {
    parity = parity ^ (result >> shift);
}
result = ((result & -2LL) | (parity & 1LL)) & ACCUM_MASK;
This also proves that the state on its own without summing is hopeless.
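As an aside on the parity loop itself, the bit-by-bit XOR can be folded into four steps for a 16-bit word. The sketch below assumes 16-bit sums; `parity16_loop`, `parity16_fold` and `sum_parity_lsb` are illustrative names, and the self-check only confirms that the two parity routines agree, nothing about PRNG quality.

```c
#include <assert.h>
#include <stdint.h>

/* Parity as in the snippet above: XOR of every bit, one shift at a time. */
static unsigned parity16_loop(uint16_t v)
{
    unsigned p = v;
    for (int shift = 1; shift < 16; shift++)
        p ^= (unsigned)v >> shift;
    return p & 1u;
}

/* Same parity by folding the word in half four times. */
static unsigned parity16_fold(uint16_t v)
{
    unsigned p = v;
    p ^= p >> 8;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    return p & 1u;
}

/* Sum of the state halves with its lsbit replaced by the parity of the whole sum. */
static uint16_t sum_parity_lsb(uint16_t s0, uint16_t s1)
{
    uint16_t sum = (uint16_t)(s0 + s1);
    return (uint16_t)((sum & 0xFFFEu) | parity16_fold(sum));
}

/* Exhaustive check that both parity routines agree. */
static void parity16_selfcheck(void)
{
    for (unsigned v = 0; v <= 0xFFFFu; v++)
        assert(parity16_loop((uint16_t)v) == parity16_fold((uint16_t)v));
}
```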
EDIT: This would take too long to be worth doing on the P2, probably.
The outcomes are just boggling me now! For this tiny change, I think it's the worst lsbit result I've had yet!
So the only change is to start by setting parity to zero instead of whatever the sum was. This effectively skips the lsbit of the sum for the parity.
int shift;
uint64_t result = (s0 + s1) & ACCUM_MASK;
uint64_t parity = 0;
for( shift=1; shift < ACCUM_SIZE; shift++ ) {
    parity = parity ^ (result >> shift);
}
result = ((result & -2LL) | (parity & 1LL)) & ACCUM_MASK;
Xoroshiro32p+ PractRand Score Table - Run 2017-10-28 04:21:39 +1300

Combination   Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   msBit
===========================================================================
[ 1  2  8]       8K     32M     64M     16M     64M     64M     32M      1G
[ 2  1 11]      16K      8M      8M    128K    128K    128K    256K      1G
[ 2  1 15]       8K    128K    128K     32K     32K     64K     32K      1G
[ 2  2  7]       8K      8M      8M     32M     32M     32M      4M      1G
[ 2  6 15]      16K      2M      4M     16M    128K    128K    128K      1G
[ 2  8  7]       8K      8M     16M    128M     32M     32M      4M      1G
[ 2  9  9]      16K    256K    256K     16K    128K    512K     64M      1G
[ 2 11  3]      16K      2M      4M    512K    256K    256K    256K      1G
[ 3  2  6]      16K     64M     32M      2M      2M      2M      1M      1G
[ 3  3 10]      16K     16M     16M    256K    256K    512K     64M      1G
[ 3 11  2]       4K    256K    512K    128K     64K     64K     64K      1G
[ 3 11 14]       8K      4M      4M    256K    256K    512K    512K      1G
[ 4  1  9]       8K    512K      1M    512K      8M    256K    512K    512M
[ 4  7 15]      16K     64M     64M    128M    512K    512K    512K      1G
[ 4  8  5]      16K    512M    128M     32M      8M      4M      2M      1G
[ 5  2  6]       8K    128M    128M      8M      4M      4M      4M      1G
[ 5  8  4]       8K      4M     16M      4M      1M    256K    128K     64M
[ 5 14 12]      16K    128M     64M      2M      4M     16M    128M      1G
[ 6  2  3]       8K     32M     32M      1M      1M      1M      2M      2M
[ 6  2  5]      16K     16M     16M    256K    512K    512K    512K      4M
[ 6  2 11]      16K    512M    256M      8M      8M    256M    128M      1G
[ 6  3 15]       8K     32M     32M      1M      1M      2M      4M      1G
[ 6 14 11]       8K    128M    128M     32M     64M    128M     16M      1G
[ 7  1  8]       8K     64M    128M     16M    128M     16M     16M      1G
[ 7  2  2]      16K     16M     16M      2M      2M      4M      4M      4M
[ 7  2 10]       8K      1G    512M    128M    256M     64M     32M      1G
[ 7  2 14]       8K    512M    256M      4M      8M     32M     32M    256M
[ 7  8  2]       8K    512M    512M     64M      8M      4M      4M    512M
[ 7 10 10]       8K    128M    256M     32M     32M    256M    128M      1G
[ 7 15  8]      16K      2M      2M    256K      1M    512K    256K      1G
[ 8  1  7]       8K      1M      4M      1M      1M      1M      1M     64M
[ 8  2  1]      16K     32M     64M      8M      8M     16M     16M     16M
[ 8  5 13]       8K     32M     64M     16M      8M    128M    128M      1G
[ 8  7 15]      16K      2M      2M    512K      2M      2M      1M      1G
[ 8  9 13]      16K    512M    128M    512M      1G    512M    128M      1G
[ 8 15  7]      16K    512K      1M    256K    512K    512K    256K      1G
[ 9  1  4]      16K     32M     64M      8M     16M     16M     16M     64M
[ 9  2 14]       8K      2G      1G    128M    256M     64M    128M      1G
[ 9  8 14]      16K     16M     32M    256M     32M     32M     64M      1G
[ 9  9  2]       4K     32M     64M     16M     16M     16M     32M     64M
[10  2  7]      16K    256M    128M     16M     16M     64M     32M      1G
[10  3  3]      16K     64M     64M     32M     32M     32M     64M    256M
[10  3 11]       8K      2G      1G    256M    256M    512M    128M      1G
[10  5 13]      16K      2G    512M      1G    512M    256M    128M    512M
[10  7 11]       4K    512M    128M    512M    128M    512M    128M      1G
[10 10  7]       8K     16M     16M     16M     16M    512M    128M    512M
[11  1  2]      16K    256M    256M     32M     32M     32M     32M      1G
[11  1 14]       8K      1G      1G     32M     16M     16M     16M      1G
[11  2  6]       8K    512M    512M     32M     64M    512M    128M    512M
[11  3 10]      16K     32M     32M     64M      1M    128K     64K    128M
[11  7 10]       8K      4M      8M     32M    128K    128K     64K      1G
[11  8 12]      16K    256M    256M    512M    512M    512M    128M      1G
[11 10 12]      16K    128M    128M      1G    512M    512M    128M      1G
[11 11 14]      16K     64M    128M     16M     16M     16M      8M      1G
[11 14  6]       8K    256M    128M    256M     64M    512M    128M      1G
[12  4 15]      16K    512M      1G    128M     64M     16M     16M      1G
[12  8 11]       8K      4M      4M     64M    128K    128K     64K      1G
[12 10 11]      16K      2M      2M    128K    128K    128K     64K      1G
[12 10 13]       8K     32M     64M     16M     16M     32M      8M      1G
[12 14  5]      16K     32M     32M    128M    128M     64M     64M      1G
[13  3 14]       8K    512M    512M     64M     32M     16M      8M      1G
[13  5  8]      16K    256M    256M    512M    128M    256M    128M      1G
[13  5 10]      16K     64M     32M     64M     16M    256K    256K      1G
[13  9  8]      16K    128M     64M    512M    512M    512M    128M      1G
[13 10 12]      16K      2M      2M    128K    128K    128K     64K    512M
[13 11 14]      16K     16M     16M      8M      8M      8M      8M      1G
[13 12 14]       8K     16M     16M      8M      8M      4M      4M      1G
[13 13 14]       8K     16M     16M      8M      8M      8M      8M      1G
[14  1 11]       8K     64M     32M    256K    256K    256K    256K      1G
[14  2  7]      16K      1G    512M      1G    512M    512M    128M      1G
[14  2  9]      16K    512M    256M    128M    128M     64M    128M      1G
[14  3 13]      16K     16M     16M      1M    256K     64K     64K      1G
[14  8  9]      16K    128M     64M      1G     64M     64M     64M      1G
[14 11  3]       8K    512M    512M    256M    128M    128M     64M      1G
[14 11 11]      16K     16M      8M    256K    256K    128K    256K      1G
[14 11 13]       8K    512K    512K    128K     64K     64K     64K      1G
[14 12 13]       8K    512K    256K    128K     64K     64K     64K      1G
[14 13 13]       8K    128K    256K     64K     64K    128K     64K    256M
[15  1  2]       8K      2M      2M      1M    512K    512K    256K      1G
[15  3  6]       8K      1G    512M    512M    512M    256M    128M      1G
[15  4 12]      16K    128M     64M    128M      8M    256K    256K      1G
[15  6  2]       8K    128M    256M    128M    128M     32M     16M      1G
[15  7  4]      16K    512M    128M    512M    512M    256M    128M      1G
[15  7  8]      16K     64M     64M     32M     32M      4M      2M    512M
EDIT: I tried starting parity as 1 instead of 0 for the hell of it but no surprise word0 stays full of 8K and 16K scores.
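One possible explanation for the 8K and 16K scores, offered as a sketch rather than a confirmed diagnosis: when the parity skips the lsbit of the sum, the new lsbit becomes a pure function of the other 15 bits, so every 16-bit output word has the same overall parity. Starting the parity at 1 only flips which parity that is. The code below assumes 16-bit sums and uses made-up names (`parity16`, `sum_parity_skip_lsb`, `count_odd_outputs`) to check this over all 65536 possible sums.

```c
#include <assert.h>
#include <stdint.h>

/* XOR of all 16 bits of v. */
static unsigned parity16(uint16_t v)
{
    unsigned p = v;
    p ^= p >> 8;
    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    return p & 1u;
}

/* lsbit := init XOR (XOR of bits 1..15 of the sum); init is 0 or 1. */
static uint16_t sum_parity_skip_lsb(uint16_t sum, unsigned init)
{
    unsigned p = (init ^ parity16((uint16_t)(sum & 0xFFFEu))) & 1u;
    return (uint16_t)((sum & 0xFFFEu) | p);
}

/* Count how many of the 65536 possible sums give an output word with odd parity. */
static unsigned count_odd_outputs(unsigned init)
{
    unsigned odd = 0;
    for (unsigned sum = 0; sum <= 0xFFFFu; sum++)
        odd += parity16(sum_parity_skip_lsb((uint16_t)sum, init));
    return odd;
}

/* Every output has the same parity, set entirely by the starting value. */
static void redundancy_selfcheck(void)
{
    assert(count_odd_outputs(0) == 0);       /* parity starts at 0: always even */
    assert(count_odd_outputs(1) == 65536);   /* parity starts at 1: always odd */
}
```

In other words, only 32768 of the 65536 possible 16-bit words can ever appear in word0 with this variant, which is the kind of redundancy a statistical test would be expected to pick up quickly. The earlier variant that includes the lsbit in the parity does not have this property.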