I assume the new FPGA images with this new random number generator will have the same instruction set and encoding as the current v16. Is that correct?
Yes. This is a very subtle change that almost nobody would recognize, in practice.
Of the 63 PRNG bits, I figure all 63 will get used 8 times and 8 of them will get used one more time, since 16 cogs × 32 taps = 512 = 63 × 8 + 8. This will keep the fan-out as low as possible. How to parcel this out, though?
The XOR'ing idea is good, I think, but it would add delay, since there'd be some logic to get through before the interconnect amps and their delays. Coming straight out of a flop is a lot faster.
I made a program which randomly selects sixteen 32-bit tap sets from the 63-bit PRNG output. It uses RealRandom to shuffle the data and then it optimizes the tap usage to minimize fan-out. It generates this, for example:
I wrote the generator in Spin. I realize in Spin2, we'll need a SWAP(v1,v2) instruction. We'll also need to clean up the string syntax and make it terser.
Here's the Spin program that generates the random tap sets:
I tried to solve Chip's programming challenge using brute force: randomize everything and check whether each bit gets tapped 8 or 9 times. Of course I expected it to take a long time, but with a little math, and by measuring the factor between getting 3 and 4 bits right, I can say it's not feasible. It would take 1.1013671599500632751e50 seconds on my computer!
I made a program to do it in the post above. I just shuffle the deck, use the first 32, and track how many times each tap has been used. After shuffling the deck, though, I need to swap less-used taps into the lower 32 positions. It turns out that 55 bits get used 8 times and the other 8 bits get used 9 times, which is optimal.
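For anyone following along, here's a hedged C sketch of that shuffle-and-balance idea. It is not the actual Spin program; rand() stands in for RealRandom, and the names are made up:

#include <stdio.h>
#include <stdlib.h>

#define BITS 63     /* usable PRNG output bits */
#define COGS 16     /* tap sets to generate */
#define TAPS 32     /* taps per set */

int main(void) {
    int use[BITS] = {0};    /* how many times each bit has been tapped */
    int deck[BITS];

    for (int cog = 0; cog < COGS; cog++) {
        for (int i = 0; i < BITS; i++) deck[i] = i;

        /* Fisher-Yates shuffle of the 63 bit indices */
        for (int i = BITS - 1; i > 0; i--) {
            int j = rand() % (i + 1);
            int t = deck[i]; deck[i] = deck[j]; deck[j] = t;
        }

        /* Swap less-used taps into the lower 32 positions */
        for (int i = 0; i < TAPS; i++)
            for (int j = TAPS; j < BITS; j++)
                if (use[deck[j]] < use[deck[i]]) {
                    int t = deck[i]; deck[i] = deck[j]; deck[j] = t;
                }

        for (int i = 0; i < TAPS; i++) use[deck[i]]++;

        printf("cog %2d taps:", cog);
        for (int i = 0; i < TAPS; i++) printf(" %2d", deck[i]);
        printf("\n");
    }
    /* the tallies should end up as 55 bits x 8 uses and 8 bits x 9 uses */
    return 0;
}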
Thanks for working on this, Ahle2. I didn't know if anybody was going to work on it, so I spent some time figuring it out. It's been really quiet on the forum all day, after maybe 60 posts yesterday.
Chip,
If there was a desire to even that out to 8 uses for each bit then extending the ss result to 64 bits wouldn't take much. Just make the s0 + s1 adder have 65-bit result.
That would effectively be the carry output, right? I wonder if its behavior would pass all those tests that the other bits did.
What happens with your shuffle algo if the 64-bit random result turns out to have just one "1" bit in it? It could occur.
-Phil
I don't understand. That doesn't seem like a problem.
Someone said something that made me see things differently: if all those tests were passed in Dieharder, then every bit of output is like its own TRNG. Therefore, it doesn't matter if you switch their positions around or statically invert some of them.
I guess what I was thinking is that such an imbalance of ones vs. zeroes would increase the likelihood that two or more of the derivative 32-bit results would be identical -- in my hypothetical case, all zeroes. But maybe that's not a problem?
-Phil
Extreme imbalances are rare, but part of the idea, I guess.

This was pretty funny.
Perhaps I'm mistaken, but I believe what you want is for the 16 32-bit values derived from a single 64-bit value to be statistically independent from each other. Right? I guess what concerns me is that, as the imbalance of ones to zeroes in that 64-bit value increases, the likelihood of statistical independence of the 32-bit values decreases dramatically. Admittedly, this is just a gut feeling. Hopefully, the analysis of some Monte Carlo simulations will prove me wrong, but I would not put a wager on that being the case.
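For what it's worth, here is a hedged sketch of the kind of Monte Carlo check being described. The taps table and derive() are hypothetical stand-ins, filled with rand() rather than the real RealRandom-chosen tables, and inversions are left out to keep it short:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tap tables: 16 cogs x 32 source-bit indices (0..62). */
static uint8_t taps[16][32];

static uint32_t derive(uint64_t x, const uint8_t t[32]) {
    uint32_t r = 0;
    for (int k = 0; k < 32; k++)
        r |= (uint32_t)((x >> t[k]) & 1) << k;
    return r;
}

int main(void) {
    for (int i = 0; i < 16; i++)
        for (int k = 0; k < 32; k++)
            taps[i][k] = rand() % 63;

    uint64_t x = 1ULL << 17;     /* Phil's pathological case: one '1' bit */
    int collisions = 0;
    for (int a = 0; a < 16; a++)
        for (int b = a + 1; b < 16; b++)
            if (derive(x, taps[a]) == derive(x, taps[b]))
                collisions++;
    printf("colliding pairs among 16 derived values: %d\n", collisions);
    return 0;
}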
Yep, that would effectively include the carry. The sequence doesn't change, since the add sits outside the loop, but obviously bit 63 of the result pops into existence, so that would have to be factored into any comparison.
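In software terms the 65-bit result is easy to picture. Here's a hedged sketch (ss64() is just an illustrative name, not anything from the real Verilog):

#include <stdint.h>

/* Hedged sketch of the 65-bit-adder idea: the carry out of s0 + s1
   becomes the new top bit once the LFSR-ish LSB is dropped. */
uint64_t ss64(uint64_t s0, uint64_t s1) {
    uint64_t sum   = s0 + s1;
    uint64_t carry = (sum < s0);           /* bit 64 of the 65-bit sum */
    return (carry << 63) | (sum >> 1);     /* sum[64:1], now 64 bits */
}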
Yes, that is the problem: how to magically create 16 PRNGs running in parallel from a single PRNG in such a way that one cannot tell they are related. The guys doing lots of simulation work on supercomputers worry about this a lot.
I'm not sure if that is even possible.
The best thing is to have 16 independent PRNGs. That boils down to the problem of generating 16 seeds from a single seed in such a way that the PRNGs all operate in different parts of the state space and will not overlap during any feasible run time.
xoroshiro128+ includes a method to solve this problem. One can take a single seed and generate up to 2^64 other seeds from it, each of which can be used to drive a PRNG through its own 2^64-long stretch of the sequence that will not overlap with any other.
This is done with the JUMP function:
#include <stdint.h>

/* xoroshiro128+ state and step function (the original 55/14/36 rotation
   constants), added here so the snippet compiles on its own. */
static uint64_t s[2];    /* the 128-bit generator state; seed to nonzero */

static inline uint64_t rotl(const uint64_t x, int k) {
    return (x << k) | (x >> (64 - k));
}

uint64_t next(void) {
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const uint64_t result = s0 + s1;        /* the "+" output */
    s1 ^= s0;
    s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14);
    s[1] = rotl(s1, 36);
    return result;
}

/* This is the jump function for the generator. It is equivalent
   to 2^64 calls to next(); it can be used to generate 2^64
   non-overlapping subsequences for parallel computations. */
void jump(void) {
    static const uint64_t JUMP[] = { 0xbeac0467eba5facb, 0xd86b048b86aa9922 };
    uint64_t s0 = 0;
    uint64_t s1 = 0;
    for (int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
        for (int b = 0; b < 64; b++) {
            if (JUMP[i] & 1ULL << b) {
                s0 ^= s[0];
                s1 ^= s[1];
            }
            next();
        }
    s[0] = s0;
    s[1] = s1;
}
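In that scheme, carving out 16 non-overlapping streams (one per cog, say) would look something like this hedged sketch; seed_cogs() and cog_seed are illustrative names, not anything from real code:

/* Give each of 16 consumers its own 2^64-long subsequence: snapshot the
   state as that consumer's seed, then jump() past its whole range. */
uint64_t cog_seed[16][2];

void seed_cogs(uint64_t seed0, uint64_t seed1) {
    s[0] = seed0;
    s[1] = seed1;
    for (int i = 0; i < 16; i++) {
        cog_seed[i][0] = s[0];
        cog_seed[i][1] = s[1];
        jump();               /* advance 2^64 steps to the next stream */
    }
}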
My guess is this solution starts to get too big to put into hardware.
I'd be inclined to just arrange that no single PRNG value is ever used more than once. With or without any bit shuffling/inverting. That is to say be sure the PRNG advances by one step for every random number requested by anyone.
Not a good idea, imho. The simple way would sacrifice speed and a little determinism. The more complex way sacrifices determinism and a little speed. The expensive way, as you've already noted, takes too much silicon.
Cogs have no need to be secured from each other. The Propeller isn't targeting general computing.
I thought the idea was that the PRNG is continuously being clocked in the background, endlessly stepping through its sequence whether or not anybody reads a random number from it. That is to say, most of the random numbers get missed.

As such, I would suspect that the independence of the random sequence any cog sees from any other is good enough for all practical purposes, with or without bit shuffling.

Cogs will be waiting on I/O and timers, etc., so multiple cogs won't even be reading the PRNG in the same order all the time, which randomizes things even more.

The downside of this is perhaps power consumption: stepping the PRNG when its output is not used.
The words "secure" and "cryptographic" have popped up on this thread.
I'm pretty sure the xoroshiro128+ PRNG is not regarded as a cryptographically secure PRNG (CSPRNG), and security gurus would frown at using it for cryptographic purposes.

Depends how secure you want to be, I guess.
Yep, all of the above. I imagine its power cost will be about that of four 32-bit timers; adders are linear and not much more than a counter, I think. The 16-way shuffled ordering doesn't cost any extra silicon or power, which is why it's being included.
The state of the PRNG does advance on every clock, regardless of whether it's being used anywhere.
Each cog receives a unique subset of 32 bits from the PRNG's 63 output bits. Those bits were randomly chosen the other day by the Spin program I made to pick them out, and they are in random order. Some of the bits are always inverted.
Each smart pin gets a unique subset of 8 bits from the PRNG, with some bits always inverted.
All these bit groups and inversions were chosen using RealRandom as a TRNG source.
While all cogs and smart pins receive data from the same PRNG, they each get different versions, with different bits and different inversions. This is probably overkill, but it was done to make every point of usage unique.
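As a hedged illustration of what each cog's wiring amounts to in software terms (cog_random(), taps, and invert here are placeholders, not the real RealRandom-chosen tables):

#include <stdint.h>

/* taps[k] = which of the 63 PRNG output bits feeds cog result bit k;
   invert  = mask of result bits that are wired through an inverter.
   Both are hypothetical stand-ins for the actual tables. */
uint32_t cog_random(uint64_t x, const uint8_t taps[32], uint32_t invert) {
    uint32_t r = 0;
    for (int k = 0; k < 32; k++)
        r |= (uint32_t)((x >> taps[k]) & 1) << k;
    return r ^ invert;    /* static inversion is free in hardware */
}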
Here are the 16 unique longs that go to each cog, and each byte's worth goes to a different smart pin (64 total). The source of this data is the xoroshiro128+ PRNG, which outputs x[62:0]; those bits were randomly spread all over and randomly inverted to make these tap sets:
Why was the LSB of ss removed, giving a 63-bit result?
Because it's only an LFSR, according to the xoroshiro128+ documentation. It wouldn't pass the tests.
As you pointed out, though, the carry output of the final adder would seem to be of good quality. But it must not be, if they didn't mention it, because it would have been a great way to get a full 64 bits out of the PRNG.

Actually, I might have found the source of it: Heater dug up an example 32-bit conversion that used [32:01] - http://forums.parallax.com/discussion/comment/1402202/#Comment_1402202
Because there is a comment in the original source code of xoroshiro128+ which says:
"
Beside passing BigCrush, this generator passes the PractRand test suite
up to (and included) 16TB, with the exception of binary rank tests,
which fail due to the lowest bit being an LFSR; all other bits pass all
tests. We suggest to use a sign test to extract a random Boolean value.
"
So basically, the LSB taken alone is not as "random" as the other bits; it's a simple linear feedback shift register PRNG. It probably has a very short sequence compared to the 2^128 sequence length of the rest of the bits.
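To illustrate the sign test that comment suggests (a minimal hedged sketch, reusing the next() from Heater's snippet above; random_bool() is an illustrative name):

/* Sign test: extract a random Boolean from the top bit (bit 63),
   which passes the test suites, rather than the LFSR-ish bit 0. */
int random_bool(void) {
    return (int64_t)next() < 0;
}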