It's just that when I played with the KISS32 PRNG, it took forever to start looking "random" when seeded with zero or a low-entropy seed. Much better to have a roughly equal number of zero and one bits.
So I had to try the 1, 0 seed with xoroshiro128+. It gets into something looking "random" much quicker. Like so:
I'm not getting the same output from verilog as the C version. I modified my C test harness to output the high 31 bits:
I'm running Dieharder as we speak. On my i7-6700K @ 4GHz it will run for 30 minutes or so, using the raw input mode through the pipe mechanism in Linux!
The data rate is 9.51e+06 random character / sec.
I assume the new FPGA images with this new random number generator will have the same instruction set and encoding as the current v16. Is that correct?
Yes. This is a very subtle change that almost nobody would recognize, in practice.
Comments
Yes!!! I'm able to output the 16 LSBs and they match your sequence. Here's what I saw:
0001
4001
0121
4122
C401
40E2
C544
5EE6
0CA2
C1B1
FE2C
ED27
656A
7508
D26F
44F2
80C3
4734
4DCE
1CB8
364B
3248
E5DB
A465
40D1
A65F
CE36
0C5E
2CBD
188D
66F4
8A00
2225
04EE
3834
9344
602A
7277
C0EA
CAD8
int main(int argc, char *argv[])
{
    PRNG prng;
    uint64_t result;
    for(uint32_t i = 0; i < 12345; ++i)
    {
        result = prng.getXoroshiro() >> 1;
    }
    std::cout << std::hex << result;
    return 0;
}
In the PRNG class constructor
s[0] = 1; s[1] = 0;
PRNG get method
uint64_t PRNG::getXoroshiro()
{
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const uint64_t result = s0 + s1;
    s1 ^= s0;
    s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
    s[1] = rotl(s1, 36); // c
    return result;
}
Casting to 16 bits and printing from the beginning gives the exact same sequence as you!
I lied; it does not! I got the same 32-bit result as Heater, though... Something funky is going on!
Your result was throwing away the LSB. After I shifted my result by 1 bit, I got your same values from the 12345th iteration, as well as the bigger one.
In the normal Verilog, I don't use the LSB, but for single-stepping, I made it output result[31:0].
Okay, we have it working then!
Right. There should be no surprises, though, if that xoroshiro128+ page was accurate in its claims.
It starts out well, and many values along the way can be correct. With the odd error thrown in:
C output:
0000000000000000
0040000800002000
000420100c000090
c0402b1808222091
04e812058c04e200
00ec10093ac4a071
34d82eb81903e2a2
efac90afec692f73
cd3293b975ac8651
25d3fd9d32afe0d8
d2232430c64dff16
ec48abda86ce7693
218df697abbbb2b5
1e994ae9f17d3a84
d34d862e033fe937
05f696ca131fa279
3d5db0e43348c061
263a9c552d81a39a
25dab82a39faa6e7
f55244eb04720e5c
Verilog output:
xxxxxxxxxxxxxxxx
0000000000000000
0040000800002000
000420100c000090
40402b1808222091 Err
04e812058c04e200
00ec10093ac4a071
34d82eb81903e2a2
6fac90afec692f73 Err
4d3293b975ac8651 Err
25d3fd9d32afe0d8
52232430c64dff16 Err
6c48abda86ce7693 Err
218df697abbbb2b5
1e994ae9f17d3a84
534d862e033fe937 Err
05f696ca131fa279
3d5db0e43348c061
263a9c552d81a39a
25dab82a39faa6e7
755244eb04720e5c Err
Something seems to be going wrong with the top bits.
Could be my test harness of course...
I made a verilog test harness:
module test;

  /* Make a reset that pulses low once. */
  reg reset = 1;
  initial begin
     # 1 reset = 0;
     # 1 reset = 1;
     # 200 $stop;
  end

  /* Make a regular pulsing clock. */
  reg clk = 0;
  always #5 clk = !clk;

  wire [62:0] value;
  rnd r1 (reset, clk, value);

  initial
     $monitor("%h", value);
endmodule // test
I use the plain verilog version as I don't have the regscan macro available.
I run it under Icarus Verilog:
$ iverilog -o xoroshiro128plus.vpp xoroshiro128plus_tb.v xoroshiro128plus.v
$ ./xoroshiro128plus.vpp
Chip, what have you done? I'm writing verilog!
On AIX (Power 7) it fails with gcc 4.8.3 as well, in 32-bit mode, although differently than with xlc (the IBM compiler).
All's well with xlc when compiled with -q64.
'//' comes from C++, so I call it C++ comments, even if it's been adopted by C99 or whatever. Never liked the look of those comments either. I liked the Ada comments though:
-- this is a comment
Yes, Heater needs to change this line:
wire [63:0] value;
And I'm reading C code.
The data rate is 9.51e+06 random character / sec.
/Johannes
That's the Chr(0..255) byte rate?
Super! I'm going to sleep now, but I'll be checking this thread for the results when I get up. If I could work all day and night, I would.
My wire definition was fine. Changing it as suggested causes Icarus Verilog to give a warning:
$ iverilog -o xoroshiro128plus.vpp xoroshiro128plus_tb.v xoroshiro128plus.v
xoroshiro128plus_tb.v:17: warning: Port 3 (out) of rnd expects 63 bits, got 64.
xoroshiro128plus_tb.v:17: : Padding 1 high bits of the expression.
My C code test harness went wrong because I had a cast to intmax_t rather than uintmax_t so there was a signed shift right going on!
C and Verilog results match up fine now.
The C test harness:
//
// Exercise the xoroshiro128+ PRNG
//
// For 64 bit output Compile with:
// $ gcc -Wall -std=c99 -o xoroshiro128plus-test xoroshiro128plus-test.c
//
// For 32 bit output Compile with:
// $ gcc -Wall -std=c99 -DOUTPUT_32 -o xoroshiro128plus-test xoroshiro128plus-test.c
//
// Also works on 32 bit machines (-m32)
//
// Note: 64 bit output will never contain zero. 32 bit output can be any 32 bit value.
//
#include <stdio.h>
#include "xoroshiro128plus.c"

// Seed MUST not be all zero.
#define SEED_0 ((uint64_t)0x1)
#define SEED_1 ((uint64_t)0x0)

#define SAMPLE_SIZE 1000000
#define HEAD 100
#define TAIL 8

int main(int argc, char* argv[])
{
    uint64_t random64;
    uint64_t i;
    char ellipsis = 1;

    // Seed the state array
    s[0] = SEED_0;
    s[1] = SEED_1;

    // Print some randomness
    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        random64 = next();
        if ((i < HEAD) || (i > (SAMPLE_SIZE - TAIL - 1)))
        {
            printf("%016jx\n", (uintmax_t)(random64 >> 1));
        }
        else
        {
            if (ellipsis)
            {
                printf("...\n");
                ellipsis--;
            }
        }
    }
    return 0;
}
Results:
C output:
0000000000000000
0040000800002000
000420100c000090
40402b1808222091
04e812058c04e200
00ec10093ac4a071
34d82eb81903e2a2
6fac90afec692f73
4d3293b975ac8651
25d3fd9d32afe0d8
52232430c64dff16
6c48abda86ce7693
218df697abbbb2b5
Verilog output:
0000000000000000
0040000800002000
000420100c000090
40402b1808222091
04e812058c04e200
00ec10093ac4a071
34d82eb81903e2a2
6fac90afec692f73
4d3293b975ac8651
25d3fd9d32afe0d8
52232430c64dff16
6c48abda86ce7693
218df697abbbb2b5
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
stdin_input_raw|  9.51e+06  |1595572412|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.83379396|  PASSED
      diehard_operm5|   0|   1000000|     100|0.25558220|  PASSED
  diehard_rank_32x32|   0|     40000|     100|0.52897077|  PASSED
    diehard_rank_6x8|   0|    100000|     100|0.38678624|  PASSED
   diehard_bitstream|   0|   2097152|     100|0.29019586|  PASSED
        diehard_opso|   0|   2097152|     100|0.69787334|  PASSED
        diehard_oqso|   0|   2097152|     100|0.84063566|  PASSED
         diehard_dna|   0|   2097152|     100|0.62916350|  PASSED
diehard_count_1s_str|   0|    256000|     100|0.92236112|  PASSED
diehard_count_1s_byt|   0|    256000|     100|0.59253912|  PASSED
 diehard_parking_lot|   0|     12000|     100|0.83699181|  PASSED
    diehard_2dsphere|   2|      8000|     100|0.72079614|  PASSED
    diehard_3dsphere|   3|      4000|     100|0.92494671|  PASSED
     diehard_squeeze|   0|    100000|     100|0.76276892|  PASSED
        diehard_sums|   0|       100|     100|0.35705999|  PASSED
        diehard_runs|   0|    100000|     100|0.95657635|  PASSED
        diehard_runs|   0|    100000|     100|0.65493134|  PASSED
       diehard_craps|   0|    200000|     100|0.12569810|  PASSED
       diehard_craps|   0|    200000|     100|0.33025898|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.41365816|  PASSED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.64628534|  PASSED
         sts_monobit|   1|    100000|     100|0.16772087|  PASSED
            sts_runs|   2|    100000|     100|0.91443072|  PASSED
          sts_serial|   1|    100000|     100|0.95551409|  PASSED
          sts_serial|   2|    100000|     100|0.11502741|  PASSED
          sts_serial|   3|    100000|     100|0.24329000|  PASSED
          sts_serial|   3|    100000|     100|0.95246396|  PASSED
          sts_serial|   4|    100000|     100|0.86703713|  PASSED
          sts_serial|   4|    100000|     100|0.94258614|  PASSED
          sts_serial|   5|    100000|     100|0.04708983|  PASSED
          sts_serial|   5|    100000|     100|0.15211892|  PASSED
          sts_serial|   6|    100000|     100|0.30394607|  PASSED
          sts_serial|   6|    100000|     100|0.16023796|  PASSED
          sts_serial|   7|    100000|     100|0.70988015|  PASSED
          sts_serial|   7|    100000|     100|0.75742550|  PASSED
          sts_serial|   8|    100000|     100|0.87835733|  PASSED
          sts_serial|   8|    100000|     100|0.74750865|  PASSED
          sts_serial|   9|    100000|     100|0.06143689|  PASSED
          sts_serial|   9|    100000|     100|0.11206147|  PASSED
          sts_serial|  10|    100000|     100|0.68131394|  PASSED
          sts_serial|  10|    100000|     100|0.16241324|  PASSED
          sts_serial|  11|    100000|     100|0.85577452|  PASSED
          sts_serial|  11|    100000|     100|0.84633148|  PASSED
          sts_serial|  12|    100000|     100|0.25783932|  PASSED
          sts_serial|  12|    100000|     100|0.96519459|  PASSED
          sts_serial|  13|    100000|     100|0.87453165|  PASSED
          sts_serial|  13|    100000|     100|0.80473399|  PASSED
          sts_serial|  14|    100000|     100|0.90976972|  PASSED
          sts_serial|  14|    100000|     100|0.46481592|  PASSED
          sts_serial|  15|    100000|     100|0.33824103|  PASSED
          sts_serial|  15|    100000|     100|0.35972982|  PASSED
          sts_serial|  16|    100000|     100|0.24205718|  PASSED
          sts_serial|  16|    100000|     100|0.58009166|  PASSED
         rgb_bitdist|   1|    100000|     100|0.72455794|  PASSED
         rgb_bitdist|   2|    100000|     100|0.08839988|  PASSED
         rgb_bitdist|   3|    100000|     100|0.72988534|  PASSED
         rgb_bitdist|   4|    100000|     100|0.02537498|  PASSED
         rgb_bitdist|   5|    100000|     100|0.99235347|  PASSED
         rgb_bitdist|   6|    100000|     100|0.92528294|  PASSED
         rgb_bitdist|   7|    100000|     100|0.66209713|  PASSED
         rgb_bitdist|   8|    100000|     100|0.39928028|  PASSED
         rgb_bitdist|   9|    100000|     100|0.09853240|  PASSED
         rgb_bitdist|  10|    100000|     100|0.33874593|  PASSED
         rgb_bitdist|  11|    100000|     100|0.19805425|  PASSED
         rgb_bitdist|  12|    100000|     100|0.89791848|  PASSED
rgb_minimum_distance|   2|     10000|    1000|0.11514702|  PASSED
rgb_minimum_distance|   3|     10000|    1000|0.09949151|  PASSED
rgb_minimum_distance|   4|     10000|    1000|0.71371410|  PASSED
rgb_minimum_distance|   5|     10000|    1000|0.42679082|  PASSED
    rgb_permutations|   2|    100000|     100|0.28438129|  PASSED
    rgb_permutations|   3|    100000|     100|0.13019670|  PASSED
    rgb_permutations|   4|    100000|     100|0.93910058|  PASSED
    rgb_permutations|   5|    100000|     100|0.34992170|  PASSED
      rgb_lagged_sum|   0|   1000000|     100|0.91389520|  PASSED
      rgb_lagged_sum|   1|   1000000|     100|0.45613194|  PASSED
      rgb_lagged_sum|   2|   1000000|     100|0.49300600|  PASSED
      rgb_lagged_sum|   3|   1000000|     100|0.71010206|  PASSED
      rgb_lagged_sum|   4|   1000000|     100|0.94808677|  PASSED
      rgb_lagged_sum|   5|   1000000|     100|0.98009360|  PASSED
      rgb_lagged_sum|   6|   1000000|     100|0.93957434|  PASSED
      rgb_lagged_sum|   7|   1000000|     100|0.12415484|  PASSED
      rgb_lagged_sum|   8|   1000000|     100|0.67596394|  PASSED
      rgb_lagged_sum|   9|   1000000|     100|0.00508887|  PASSED
      rgb_lagged_sum|  10|   1000000|     100|0.94249200|  PASSED
      rgb_lagged_sum|  11|   1000000|     100|0.60613939|  PASSED
      rgb_lagged_sum|  12|   1000000|     100|0.26155684|  PASSED
      rgb_lagged_sum|  13|   1000000|     100|0.08331932|  PASSED
      rgb_lagged_sum|  14|   1000000|     100|0.99955094|   WEAK
      rgb_lagged_sum|  15|   1000000|     100|0.85131082|  PASSED
      rgb_lagged_sum|  16|   1000000|     100|0.80457554|  PASSED
      rgb_lagged_sum|  17|   1000000|     100|0.36633132|  PASSED
      rgb_lagged_sum|  18|   1000000|     100|0.95989992|  PASSED
      rgb_lagged_sum|  19|   1000000|     100|0.09248094|  PASSED
      rgb_lagged_sum|  20|   1000000|     100|0.79549433|  PASSED
      rgb_lagged_sum|  21|   1000000|     100|0.52583117|  PASSED
      rgb_lagged_sum|  22|   1000000|     100|0.40921376|  PASSED
      rgb_lagged_sum|  23|   1000000|     100|0.58494999|  PASSED
      rgb_lagged_sum|  24|   1000000|     100|0.01392463|  PASSED
      rgb_lagged_sum|  25|   1000000|     100|0.85694357|  PASSED
      rgb_lagged_sum|  26|   1000000|     100|0.63171725|  PASSED
      rgb_lagged_sum|  27|   1000000|     100|0.42951286|  PASSED
      rgb_lagged_sum|  28|   1000000|     100|0.37432100|  PASSED
      rgb_lagged_sum|  29|   1000000|     100|0.97003672|  PASSED
      rgb_lagged_sum|  30|   1000000|     100|0.63896568|  PASSED
      rgb_lagged_sum|  31|   1000000|     100|0.30440336|  PASSED
      rgb_lagged_sum|  32|   1000000|     100|0.93002810|  PASSED
     rgb_kstest_test|   0|     10000|    1000|0.71159167|  PASSED
     dab_bytedistrib|   0|  51200000|       1|0.92653216|  PASSED
             dab_dct| 256|     50000|       1|0.56538469|  PASSED
Preparing to run test 207.  ntuple = 0
        dab_filltree|  32|  15000000|       1|0.72218965|  PASSED
        dab_filltree|  32|  15000000|       1|0.70765494|  PASSED
Preparing to run test 208.  ntuple = 0
       dab_filltree2|   0|   5000000|       1|0.25990946|  PASSED
       dab_filltree2|   1|   5000000|       1|0.51931629|  PASSED
Preparing to run test 209.  ntuple = 0
        dab_monobit2|  12|  65000000|       1|0.60109807|  PASSED
I think we have a winner.
It will take 5 hours to complete!
The first few tests look like this:
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
stdin_input_raw|  1.82e+06  |1385337568|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.00000000|  FAILED
      diehard_operm5|   0|   1000000|     100|0.00000000|  FAILED
  diehard_rank_32x32|   0|     40000|     100|0.00000000|  FAILED
    diehard_rank_6x8|   0|    100000|     100|0.00000000|  FAILED
   diehard_bitstream|   0|   2097152|     100|0.00000000|  FAILED
        diehard_opso|   0|   2097152|     100|0.00000000|  FAILED
        diehard_oqso|   0|   2097152|     100|0.00000000|  FAILED
         diehard_dna|   0|   2097152|     100|0.00000000|  FAILED
diehard_count_1s_str|   0|    256000|     100|0.00000000|  FAILED
diehard_count_1s_byt|   0|    256000|     100|0.00000000|  FAILED
 diehard_parking_lot|   0|     12000|     100|0.00000000|  FAILED
    diehard_2dsphere|   2|      8000|     100|0.00000000|  FAILED
    diehard_3dsphere|   3|      4000|     100|0.00000000|  FAILED
     diehard_squeeze|   0|    100000|     100|0.00000000|  FAILED
        diehard_sums|   0|       100|     100|0.31392291|  PASSED
        diehard_runs|   0|    100000|     100|0.01074318|  PASSED
        diehard_runs|   0|    100000|     100|0.00803469|  PASSED
       diehard_craps|   0|    200000|     100|0.00000000|  FAILED
       diehard_craps|   0|    200000|     100|0.00000000|  FAILED
Without bit reordering I guess every test would fail!
We've got a much better random number generator now.
I think it would be best to take the good 63 bits out of the xoroshiro128+ and come up with sixteen randomly-chosen static 32-bit patterns which each use 32 of those 63 bits. Most of those 63 bits should only be used 8 times across all 16 patterns, while a few will need to be used 9 times, since we only have 63 bits, not 64, from the xoroshiro128+. Each cog will get one of these 16 patterns for its own RND value.
Here is what the result needs to look like, except those "--" need to become "00".."62" values:
wire [62:0] x = xoroshiro128plus_output;

assign rnd = {
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h428A2F98,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h71374491,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'hB5C0FBCF,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'hE9B5DBA5,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h3956C25B,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h59F111F1,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h923F82A4,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'hAB1C5ED5,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'hD807AA98,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h12835B01,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h243185BE,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h550C7DC3,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h72BE5D74,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h80DEB1FE,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'h9BDC06A7,
  {x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--],x[--]} ^ 32'hC19BF174 };
Those 32'hxxxxxxxx values are the fractional parts of the cube roots of the first 16 prime numbers (the same values SHA-256 uses as its first 16 round constants). They are there to further distinguish patterns from each other. I know, they have no cryptographic benefit.
Since the bits are random, do you need random pickoff ?
eg you could allocate 4 groups of 32 to 4 COGS and get different results, then you could repeat that byte shifted 4 x, and everyone has a different snapshot of the 128 bits.
If you really do want random taps, run this RNG to generate some
Random pickoff matters, relatively, between sets of 32 bits.
It's kind of a two-dimensional problem. Some divide-and-conquer is needed.