Random/LFSR on P2

Heater. · 2017-03-05 09:50

What summing carry bit? There is none in Chip's scheme.

evanh · 2017-03-05 09:51

cgracey wrote: »

As you pointed out, though, the carry output of the final adder would seem to be of good quality, but it must not be if they didn't mention it, because it would have been a great way to get a full 64 bits out of the PRNG.

The defence I have here is that the documentation never discusses this in a hardware context. In fact they only discuss high level languages without direct access to a carry result.

cgracey · 2017-03-05 09:52

evanh wrote: »

Hmm, I guess that means we have no clear answer for the summing carry bit either but I'd imagine it is a function of the lower bits.

Do you know who can resolve this question? Ahle2! He already ran the implementation we're using through the Dieharder test battery.

Ahle2, could you please run the Dieharder test on sum[64:01] of the 's0+s1' value. That would include the carry bit that is current being ignored. We need to keep ignoring sum[0]. Maybe we COULD get 64 good bits out of the PRNG.

evanh · 2017-03-05 09:52

Heater. wrote: »

What summing carry bit? There is none in Chip's scheme.

But it could be added with ease. The joys of designing the hardware the way you want it.

cgracey · 2017-03-05 09:53

Heater. wrote: »

What summing carry bit? There is none in Chip's scheme.

It's just the carry output of the 64-bit add of s0 and s1.

Ahle2 · 2017-03-05 10:51

cgracey wrote: »

evanh wrote: »

Hmm, I guess that means we have no clear answer for the summing carry bit either but I'd imagine it is a function of the lower bits.

Do you know who can resolve this question? Ahle2! He already ran the implementation we're using through the Dieharder test battery.

Ahle2, could you please run the Dieharder test on sum[64:01] of the 's0+s1' value. That would include the carry bit that is current being ignored. We need to keep ignoring sum[0]. Maybe we COULD get 64 good bits out of the PRNG.

Of course I can!

I have successfully compiled Practrand on my Linux machine now. It is supposed to be the ultimate PRNG test battery according to the "PRNG community". I will run every variant of PRNG's that I have previously run with Dieharder + the 'Xoroshiro128+ sum of s0+s1' using it! (I will run the sum with Dieharder as well)

Just to be clear, do you mean "s0[64:01] + s1[64:01]"?

evanh · 2017-03-05 10:59

Hehe, I've just solved a lib dependency problem where the package manager thought one of the packages was broken and absolutely insisted on removing it. After going though the rigmarole of reinstating it manually a couple of times I found instructions on how to remove dependency entries in either the package itself or the subsequent forced install. I chose the latter as it didn't need repackaged.

Now I can simple install dieharder from repository ... and do all the belated updates too ...

cgracey · 2017-03-05 11:00

Ahle2 wrote: »

cgracey wrote: »

evanh wrote: »

Hmm, I guess that means we have no clear answer for the summing carry bit either but I'd imagine it is a function of the lower bits.

Do you know who can resolve this question? Ahle2! He already ran the implementation we're using through the Dieharder test battery.

Ahle2, could you please run the Dieharder test on sum[64:01] of the 's0+s1' value. That would include the carry bit that is current being ignored. We need to keep ignoring sum[0]. Maybe we COULD get 64 good bits out of the PRNG.

Of course I can!

I have successfully compiled Practrand on my Linux machine now. It is supposed to be the ultimate PRNG test battery according to the "PRNG community". I will run every variant of PRNG's that I have previously run with Dieharder + the 'Xoroshiro128+ sum of s0+s1' using it! (I will run the sum with Dieharder as well)

Just to be clear, do you mean "s0[64:01] + s1[64:01]"?

No. I mean that when you compute s0+s1, you actually get a 65-bit result, with the bits ranging in position from 64 down to 0. We want to know what the integrity of bits 64 down to 1 are.

If you think in Verilog, it looks like this:

wire [63:0] s0, s1;
wire [64:0] ss = s0 + s1;
wire [63:0] rndbits = ss[64:1];

It's rndbits that we're after.

If you have a 65-bit adder, we are interested in: (s0 + s1) >> 1

evanh · 2017-03-05 11:03

Ahle2 wrote: »

Just to be clear, do you mean "s0[64:01] + s1[64:01]"?

No, it's: ss[64:00] = s0[63:00] + s1[63:00] with carry. Carry bit goes in ss[64]. It's addition's equivalent of the extended multiply, 64-bit = 32-bit x 32-bit.

Ahle2 · 2017-03-05 11:08

Okey...
GCC has built in support for 128 bits types (the C++ standard has not) so it should be easy!

Heater. · 2017-03-05 11:15

Ah, OK, that sum.

There is a bias in that carry bit.

Consider that if S0 and S1 were only 8 bits wide adding them would produce a carry only 49.8% of the time. If they were 16 bits it would only be 49.999237% of the time.

There must be a way to calculate the bias of the carry bit when adding random numbers of a given width. I have no idea what it is but a bit of brute force experimental mathematics produces the following:

Width: 1 Carries: 25% Bias: 25%
Width: 2 Carries: 37.5% Bias: 12.5%
Width: 3 Carries: 43.75% Bias: 6.25%
Width: 4 Carries: 46.875% Bias: 3.125%
Width: 5 Carries: 48.4375% Bias: 1.5625%
Width: 6 Carries: 49.21875% Bias: 0.78125%
Width: 7 Carries: 49.609375% Bias: 0.390625%
Width: 8 Carries: 49.8046875% Bias: 0.1953125%
Width: 9 Carries: 49.90234375% Bias: 0.09765625%
Width: 10 Carries: 49.951171875% Bias: 0.048828125%
Width: 11 Carries: 49.9755859375% Bias: 0.0244140625%
Width: 12 Carries: 49.98779296875% Bias: 0.01220703125%
Width: 13 Carries: 49.993896484375% Bias: 0.006103515625%
Width: 14 Carries: 49.9969482421875% Bias: 0.0030517578125%
Width: 15 Carries: 49.99847412109375% Bias: 0.00152587890625%
Width: 16 Carries: 49.999237060546875% Bias: 0.000762939453125%

We see the bias in the carry bit is halving with every extra bit of width. So we can speculate that the bias for a 64 bit carry is about 25 / Math.pow(2, 63), or 10E-18 percent.

Which is pretty huge considering we are working with a high quality PRNG with a period of 10E128 or whatever it is.

Would that bias ever show up in Dieharder? No idea.

Ahle2 · 2017-03-05 11:18

I tried clocking the original P2 LFSR 32 (as proposed by Phil) steps for each random number output to Dieharder. Amazingly most tests passes. Doing the same with Practrand fails everything while xoroshiro passes! It really seems like Practrand is superior for finding out which PRNG among good ones that stands out.

cgracey · 2017-03-05 11:21

Heater. wrote: »

Ah, OK, that sum.

There is a bias in that carry bit.

Consider that if S0 and S1 were only 8 bits wide adding them would produce a carry only 49.8% of the time. If they were 16 bits it would only be 49.999237% of the time.

There must be a way to calculate the bias of the carry bit when adding random numbers of a given width. I have no idea what it is but a bit of brute force experimental mathematics produces the following:

Width: 1 Carries: 25% Bias: 25%
Width: 2 Carries: 37.5% Bias: 12.5%
Width: 3 Carries: 43.75% Bias: 6.25%
Width: 4 Carries: 46.875% Bias: 3.125%
Width: 5 Carries: 48.4375% Bias: 1.5625%
Width: 6 Carries: 49.21875% Bias: 0.78125%
Width: 7 Carries: 49.609375% Bias: 0.390625%
Width: 8 Carries: 49.8046875% Bias: 0.1953125%
Width: 9 Carries: 49.90234375% Bias: 0.09765625%
Width: 10 Carries: 49.951171875% Bias: 0.048828125%
Width: 11 Carries: 49.9755859375% Bias: 0.0244140625%
Width: 12 Carries: 49.98779296875% Bias: 0.01220703125%
Width: 13 Carries: 49.993896484375% Bias: 0.006103515625%
Width: 14 Carries: 49.9969482421875% Bias: 0.0030517578125%
Width: 15 Carries: 49.99847412109375% Bias: 0.00152587890625%
Width: 16 Carries: 49.999237060546875% Bias: 0.000762939453125%

We see the bias in the carry bit is halving with every extra bit of width. So we can speculate that the bias for a 64 bit carry is about 25 / Math.pow(2, 63), or 1 in 10E-18.

Which is pretty huge considering we are working with a high quality PRNG with a period of 10E128 or whatever it is.

Would that bias ever show up in Dieharder? No idea.

Unless Dieharder could run the test for 10e29 years at 160MHz sample rate, the bias wouldn't show up.

cgracey · 2017-03-05 11:22

Ahle2 wrote: »

I tried clocking the original P2 LFSR 32 (as proposed by Phil) steps for each random number output to Dieharder. Amazingly most tests passes. Doing the same with Practrand fails everything while xoroshiro passes! It really seems like Practrand is superior for finding out which PRNG among good ones that stands out.

With 128-bit types, just do: (so + s1) >> 1

Heater. · 2017-03-05 11:23

Edit: Deleted brain fart.

Heater. · 2017-03-05 12:02

Yeah, OK, 10e29 years should be good enough for anybody

evanh · 2017-03-05 12:09

Heater. wrote: »

Width: 1 Carries: 25% Bias: 25%
Width: 2 Carries: 37.5% Bias: 12.5%
Width: 3 Carries: 43.75% Bias: 6.25%
Width: 4 Carries: 46.875% Bias: 3.125%
Width: 5 Carries: 48.4375% Bias: 1.5625%
Width: 6 Carries: 49.21875% Bias: 0.78125%
Width: 7 Carries: 49.609375% Bias: 0.390625%
Width: 8 Carries: 49.8046875% Bias: 0.1953125%
Width: 9 Carries: 49.90234375% Bias: 0.09765625%
...

Cool! I've wondered about that sort of thing before but didn't have a clue of significance. Mulling over a 1-bit addition in my head made that list very clear. Thanks Heater.

Heater. · 2017-03-05 13:51

The only remaining question then is: Is that carry bit, bit 64, any more or less "random" than the bit 0 ?

Heater. · 2017-03-06 04:34

Could not sleep so I thought I'd have a play with dieharder.

I changed my xoroshiro128+ to do 128 bit addition S0 + S1 and shift down by 1. Like so:

uint64_t next(void) {
	const uint64_t s0 = s[0];
	uint64_t s1 = s[1];
	const uint64_t result = ((unsigned __int128)s0 + (unsigned __int128)s1) >> 1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
	s[1] = rotl(s1, 36); // c

	return result;
}

The test harness just writes the binary results of next() to standard out. Like so:

#include "xoroshiro128plus.c"

// Seed MUST not be all zero.
#define SEED_0 ((uint64_t)0x1)
#define SEED_1 ((uint64_t)0x0) 

int main(int argc, char* argv[])
{
    uint64_t random64;
    int i;

    // Seed the state array
    s[0] = SEED_0;  
    s[1] = SEED_1;

    // Print some randomness 
    while(1)
    {
        random64 = next();
        // printf("%016jx\n", (uintmax_t)(random64 >> 1));
        for (i = 0; i < 8; i++) {
            putchar(random64 >> i * 8);
        }
    }
    return 0;
}

I pipe the output into dieharder like so:

./xoroshiro128plus-test | dieharder -g200 -a

Sadly it fails miserably:

$ ./xoroshiro128plus-test | dieharder -g200 -a
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
stdin_input_raw|  2.46e+07  |2926663896|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.90721666|  PASSED
      diehard_operm5|   0|   1000000|     100|0.00000000|  FAILED
  diehard_rank_32x32|   0|     40000|     100|0.81803211|  PASSED
    diehard_rank_6x8|   0|    100000|     100|0.67677153|  PASSED
   diehard_bitstream|   0|   2097152|     100|0.00000000|  FAILED
        diehard_opso|   0|   2097152|     100|0.91666900|  PASSED
        diehard_oqso|   0|   2097152|     100|0.21512226|  PASSED
         diehard_dna|   0|   2097152|     100|0.00000000|  FAILED
diehard_count_1s_str|   0|    256000|     100|0.00159028|   WEAK
diehard_count_1s_byt|   0|    256000|     100|0.82748065|  PASSED
 diehard_parking_lot|   0|     12000|     100|0.00000000|  FAILED
    diehard_2dsphere|   2|      8000|     100|0.08446717|  PASSED
    diehard_3dsphere|   3|      4000|     100|0.17783008|  PASSED
     diehard_squeeze|   0|    100000|     100|0.00000000|  FAILED
        diehard_sums|   0|       100|     100|0.02361403|  PASSED
        diehard_runs|   0|    100000|     100|0.00000000|  FAILED
        diehard_runs|   0|    100000|     100|0.00000000|  FAILED
       diehard_craps|   0|    200000|     100|0.00000000|  FAILED
       diehard_craps|   0|    200000|     100|0.00000000|  FAILED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.00000000|  FAILED
 marsaglia_tsang_gcd|   0|  10000000|     100|0.33508574|  PASSED
         sts_monobit|   1|    100000|     100|0.29833481|  PASSED
            sts_runs|   2|    100000|     100|0.00000000|  FAILED
          sts_serial|   1|    100000|     100|0.75364224|  PASSED
          sts_serial|   2|    100000|     100|0.00000000|  FAILED
          sts_serial|   3|    100000|     100|0.00000000|  FAILED
          sts_serial|   3|    100000|     100|0.00000000|  FAILED
          sts_serial|   4|    100000|     100|0.00000000|  FAILED
          sts_serial|   4|    100000|     100|0.00000000|  FAILED
          sts_serial|   5|    100000|     100|0.00000000|  FAILED
          sts_serial|   5|    100000|     100|0.00001544|   WEAK
          sts_serial|   6|    100000|     100|0.00000000|  FAILED
          sts_serial|   6|    100000|     100|0.08702530|  PASSED
          sts_serial|   7|    100000|     100|0.00000000|  FAILED
          sts_serial|   7|    100000|     100|0.76972737|  PASSED
          sts_serial|   8|    100000|     100|0.00000000|  FAILED
          sts_serial|   8|    100000|     100|0.90452649|  PASSED
          sts_serial|   9|    100000|     100|0.00000000|  FAILED
          sts_serial|   9|    100000|     100|0.29567132|  PASSED
          sts_serial|  10|    100000|     100|0.00000000|  FAILED

Heater. · 2017-03-06 04:44

If I remove that ">> 1", keeping the 128 bit addition and using bits [63:0] then everything in dieharder starts passing again. So I guess the problem is not elsewhere.

Basically that carry cannot be used.

Unless I have slipped up again...

evanh · 2017-03-06 04:48

Ouch! Do it again with post-summing bit 63 removal so as to prove it's not a flaw in the 128-bit conversions.

Heater. · 2017-03-06 04:53

evanh,

What do you mean by "post-summing bit 63 removal" ?

evanh · 2017-03-06 07:29

Sorry for the delay, I'd written that while at work. I mean just knock out the msb of the returned result, eg:

return result & 0x7fffffffffffffff;

Heater. · 2017-03-06 08:04

But, but, the point of the exercise is to get a 64 bit result. Using the carry rather than bit 0. Which is said to be not so statistically "random" as the other 63 bits.

There is no way dieharde will like an input of 64 bit integers each with it's top bit set zero.

As it stands it looks like we may as well use that bit zero as it performs very well.

evanh · 2017-03-06 08:26

That's what the existing solution is, only 63 bits. I'm just trying to sort out where the problem is with the this experimental 64-bit variant.

Heater. · 2017-03-06 08:40

Yes, the existing solution uses only the high 63 bits. Why? Because the bit zero is said to be not so statistically random looking as the rest.

In posts above Chip wondered if that carry bit could be used instead. Thus making a 64 bit result. And was asking for that solution to be tested with Dieharder and whatever statistical tests. I think his idea is that when distributing bits to COGs with his "rats nest" of connections it would be nice to have a multiple of 8 bits and have all the bits used the same number of times. Or something like that.

I pointed out that there is a tiny bias in the carry bit. But we reckoned on it being so small as not to show up in a test. That left the question mark over it's statistical "randomness".

Then, being unable to sleep, I could not resist trying carry bit experiment myself.

Unless I have made a mistake I think we may as well use the original 64 bit result.

Ahle2 · 2017-03-06 08:45

Heater, It passes with flying colors for me!!

Here's my PRNG code

uint64_t PRNG::getXoroshiro()
{
    const uint64_t s0 = s[0];
    uint64_t s1 = s[1];
    const unsigned __int128 result = static_cast<unsigned __int128>(s0) + static_cast<unsigned __int128>(s1);

    s1 ^= s0;
    s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14); // a, b
    s[1] = rotl(s1, 36); // c

    return static_cast<uint64_t>(result >> 1);
}

The main code

            while(1)
            {
                auto random = prng.getXoroshiro();
                std::cout << static_cast<char>(random >> 24);
                std::cout << static_cast<char>(random >> 16);
                std::cout << static_cast<char>(random >> 8);
                std::cout << static_cast<char>(random);
            }

Ahle2 · 2017-03-06 08:52

Okay... I made a mistake as seen in my code above. Only passing 32 bits of the 64 to stdout. When I ran it again with all bits, it began failing misserable!

Here's the output from Practrand

PE 1 | ./RNG_test stdin
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0xa4eaaac4
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0xa4eaaac4
length= 128 megabytes (2^27 bytes), time= 3.8 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-3,T)                  R= +74.9  p =  3.0e-35    FAIL !!!       
  BCFN(2+1,13-3,T)                  R= +18.9  p =  9.6e-9    VERY SUSPICIOUS 
  DC6-9x1Bytes-1                    R= +1413  p =  6.3e-868   FAIL !!!!!!!   
  Gap-16:A                          R=+149.1  p =  1.3e-126   FAIL !!!!!     
  Gap-16:B                          R=+367.1  p =  6.1e-294   FAIL !!!!!!    
  FPF-14+6/16:(0,14-0)              R= +5478  p =  8e-5069    FAIL !!!!!!!!  
  FPF-14+6/16:(1,14-0)              R= +2754  p =  5e-2548    FAIL !!!!!!!!  
  FPF-14+6/16:(2,14-0)              R= +1357  p =  2e-1255    FAIL !!!!!!!!  
  FPF-14+6/16:(3,14-1)              R=+967.8  p =  1.7e-857   FAIL !!!!!!!   
  FPF-14+6/16:(4,14-2)              R=+655.1  p =  1.0e-572   FAIL !!!!!!!   
  FPF-14+6/16:(5,14-2)              R=+254.4  p =  3.0e-222   FAIL !!!!!!    
  FPF-14+6/16:all                   R= +5306  p =  2e-4979    FAIL !!!!!!!!  
  FPF-14+6/16:all2                  R=+9738850 p = 0           FAIL !!!!!!!!  
  FPF-14+6/16:cross                 R= +5198  p =  2e-4373    FAIL !!!!!!!!  
  [Low1/8]Gap-16:B                  R=  +4.4  p =  1.0e-3   unusual          
  ...and 136 test result(s) without anomalies

Ahle2 · 2017-03-06 09:05

Heater,

const uint64_t result = ((unsigned __int128)s0 + (unsigned __int128)s1) >> 1;

I first thought that your code failed because of the implicit cast to the 64 bit result variable (according to the standard you can never know what the compiler decides to do under the hood), but obviously it fails in both Practrand and Dieharder for me as well using explicit casting in C++ 11;

Heater. · 2017-03-06 09:18

Yes, but you are not using all 64 bits of the output, only the low 32.

Here is my main code:

    while(1)
    {
        random64 = next();
        for (i = 0; i < 8; i++) {
            putchar(random64 >> (i * 8));
        }
    }

Random/LFSR on P2

Comments