Random/LFSR on P2

jmg · 2017-03-02 06:06

Ahle2 wrote: »

I will implement the old P2 LFSR, Jkiss , PCG (by Melissa O'Neill) and xoroshiro 128+. Then run the them through Testu01 and Dieharder suites. Stay tuned! (Melissa claims that her PRNG is the best available to this date. Her paper is under review as we speak)

.. and Chips P2ASM too ?
There is some mention above that was not quite xoroshiro 64b ?

cgracey wrote: »

If we can be sure this is correct, I'll update the PRNG in the hub to use this, instead.

Do you mean here update in verilog ?

Heater. · 2017-03-02 06:08

Ahle2,

What is the easiest way to run some decent tests on a generator?

Ages ago I used diehard occasionally, it always seemed a pain to work with. I just took a look at the testu01 and, well, the user manual is a monster.

All I want is to feed a giga byte of random in and get a "good, bad, ugly" result out !

I have a transistor avalanche random generator here that will need testing when I get it's output into a computer some how.

jmg · 2017-03-02 06:08

Ahle2 wrote: »

@Evanh
Have you tried your zener TRNG with some RNG test suites? It would be cool to see how well it performs. I guess It would pass with Flying colors!

A challenge here would be getting rid of fixed frequency ripple in the analog domain, like Mains Hum, SMPS switching frequencies etc. There will always be some spectral sneak thru...

Heater. · 2017-03-02 06:13

jmg,

I wondered about that.

One could run a zenner or transistor noise generator off a battery, have the whole thing in a metal box with an optical output.

All getting a bit over the top.

Not what you want for an on chip random generator !

Phil Pilgrim (PhiPi) · 2017-03-02 06:16

jmg wrote:

A challenge here would be getting rid of fixed frequency ripple in the analog domain, like Mains Hum, SMPS switching frequencies etc. There will always be some spectral sneak thru...

I think the Von Neumann algo that Chip uses in RealRandom should take care of any bias or spectral pass-thru.

-Phil

cgracey · 2017-03-02 07:43

Ahle2 wrote: »

@Chip
Do you think the smart pin LSB will act as a good TRNG? Maybe smart pins in combination with a zener diode and a...

I think the ADC feedback will have lots of thermal noise represented in it.

Ahle2 · 2017-03-02 09:51

@Heater
"What is the easiest way to run some decent tests on a generator?"
My way of doing it is with a tool that outputs random characters to stdout. Then you could pipe the output to a file or directly to Dieharder in raw input mode!

I'm currently tryging to figure out how to run TestU01 with my tool. All "experts" says that TestU01 is a superior test battery for deciding between a "great PRNG" and a "superior PRNG". My initial testings indicates that all mentioned PRNG in this threads passes the Dieharder tests. But I have not been able to test Chips LFSR yet. Btw, I'm talking about DiehardER not Diehard as that is considered "Smile" by todays standard.

@all
I have included the C++ source code for my testing tool. It would be nice if someone with knowledge in both C++ and Verilog could help me implement Chips LFSR, so we could use that as a reference for further testings.

/Johannes

Ahle2 · 2017-03-02 09:57

The command line I'm running is as follows:

./PP2PE 1 | dieharder -g200 -a

1 means Xoroshiro in this case.

/Johannes

Ahle2 · 2017-03-02 10:08

An output from Dieharder using PCG looks like this!

./PP2PE 2 | dieharder -g 200 -a
#=============================================================================#
#            dieharder version 3.31.1 Copyright 2003 Robert G. Brown          #
#=============================================================================#
   rng_name    |rands/second|   Seed   |
stdin_input_raw|  1.42e+07  | 885928609|
#=============================================================================#
        test_name   |ntup| tsamples |psamples|  p-value |Assessment
#=============================================================================#
   diehard_birthdays|   0|       100|     100|0.29960471|  PASSED  
      diehard_operm5|   0|   1000000|     100|0.26079820|  PASSED  
  diehard_rank_32x32|   0|     40000|     100|0.05263385|  PASSED  
    diehard_rank_6x8|   0|    100000|     100|0.48317317|  PASSED  
   diehard_bitstream|   0|   2097152|     100|0.57603361|  PASSED  
        diehard_opso|   0|   2097152|     100|0.71719125|  PASSED  
        diehard_oqso|   0|   2097152|     100|0.20879796|  PASSED  
         diehard_dna|   0|   2097152|     100|0.37774651|  PASSED  
diehard_count_1s_str|   0|    256000|     100|0.99932855|   WEAK   
diehard_count_1s_byt|   0|    256000|     100|0.85655375|  PASSED  
 diehard_parking_lot|   0|     12000|     100|0.99430846|  PASSED  
    diehard_2dsphere|   2|      8000|     100|0.46170135|  PASSED  
    diehard_3dsphere|   3|      4000|     100|0.88822056|  PASSED  
     diehard_squeeze|   0|    100000|     100|0.39573244|  PASSED  
        diehard_sums|   0|       100|     100|0.04406544|  PASSED  
        diehard_runs|   0|    100000|     100|0.84283422|  PASSED  
        diehard_runs|   0|    100000|     100|0.99944572|   WEAK   
       diehard_craps|   0|    200000|     100|0.50201529|  PASSED  
       diehard_craps|   0|    200000|     100|0.67240240|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.61052204|  PASSED  
 marsaglia_tsang_gcd|   0|  10000000|     100|0.66039542|  PASSED

As you can see a few weak test results comes up. if I haven't done anything wrong when implementing Mellissas algorithm, it does not seem to be the best PRNG available today as she claims.

/Johannes

Heater. · 2017-03-02 10:15

Ahle2,

DiehardER

Ah yes. Heard of that. I don't think it existed when I last played with diehard.

Funny aside:

In the days of CD's George Marsaglia, creator of diehard and many PRNGs, published a CD with diehard on it. The CD also contained 100's megabytes of random bits.

Problem was that when George created the CD content he was using MSDOS and copied the files without using the "binary" option to the copy command. (/b was it?). MSDOS copied the file as text and fixed up all that it thought was line ending in the binary data. Replacing 0x0d with 0x0d, 0x0a or whatever it does.

Result: The random files on George's CD did not do very well in the diehard test!

I was nerdy enough to copy the files off the CD and fix the damage. Then the files passed diehard very well

Slippery things, random numbers.

Heater. · 2017-03-02 10:20

Ahle2,

Thanks for the heads up on dieharder.

I don't recall diehard classifying test results, PASSED, WEAK, etc so nicely. One had to look at the numbers and make a decision.

Heater. · 2017-03-02 13:43

Chip,

I wrapped xoroshiro128+ PRNG into a simple test which generates a million numbers and prints the first and last 4. You could check if your PASM version does the same. Be sure to use the same seed of course.

First the results:

First 4 and last 4 of 1000000 xoroshiro128plus PRNG output (hex):
4aa4921626d12792
da319ec1da35800a
6a392789cef68c3c
eb50047a5d895c50
...
26fa38e131ff753b
f4cf7cfb892c43a3
56575790aa078000
438948e55200ef1a
From seed: s[0] = 730358127afd03f7, s[1] = d7a13a03abd4239b

And test code:
(The xoroshiro128 source file comes from here: http://xoroshiro.di.unimi.it)

//
// Exercise the xoroshiro128+ PRNG 
//
// Compile with:
//    gcc -Wall -std=c99 -o xoroshiro128plus-test xoroshiro128plus-test.c
//
#include <stdio.h>
#include "xoroshiro128plus.c"

// Seed MUST not be all zero.
#define SEED_0 0x730358127afd03f7
#define SEED_1 0xd7a13a03abd4239b 
#define SAMPLE_SIZE 1000000
#define HEAD 4
#define TAIL 4

int main(int argc, char* argv[])
{
    uint64_t random;
    uint64_t i;
    char ellipsis = 1;

    // Seed the state array
    s[0] = SEED_0;  
    s[1] = SEED_1;

    printf ("First %d and last %d of %d xoroshiro128plus PRNG output:\n", HEAD, TAIL, SAMPLE_SIZE);

    // Print some randomness 
    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        random = next();
        if ((i < HEAD) || (i > (SAMPLE_SIZE - TAIL - 1)))
        {
            printf("%016lx\n", random);
        }
        else
        {
            if (ellipsis)
            {
                printf("...\n");
                ellipsis--;
            }
        }    
    }

    printf("From seed: s[0] = %016lx, s[1] = %016lx\n", SEED_0, SEED_1); 
    return 0;
}

Ahle2 · 2017-03-02 14:09

@Chip
I don't want to add any more time to the P2 project, so don't wait for us to decide which one of the multitudes of PRNG's that is the best. Just go with Xoroshiro 128+ and it will endlessly much better than a simple LFSR!
The internal state is 128 bits, but all bits but the LSB is good as a tap for the final 32 bit value. That will fix the problem of the generator never returning a zero!

And to be honest everything points in favour of Xoroshiro 128+ according to all comparisons I have come across. Fastest (according to many implementations/tests, but that is only interesting from a software perspective though), least trouble with the more advanced test suites, just 128 bits of state memory and maybe most importantly it is completely free (in every sense of the word) to implement.

/Johannes

Ahle2 · 2017-03-02 14:32

BTW, if clocked at 160 Mhz, it will take 1.3984206859764595e+29 years before it wraps. I think it's good enough!

Heater. · 2017-03-02 14:53

Ahle2,

Is this what you had in mind? A thirty two bit output version of the xoroshiro128+ test.

This shifts the 64 bit result right by one and outputs the low 32 bits.

Sadly the authors of xoroshiro don't provide any test data. We have to trust our compiler is correct.

The results;

First 8 and last 8 of 1000000 xoroshiro128plus PRNG output (hex):
00000000
00002000
0c000090
08222091
8c04e200
3ac4a071
1903e2a2
ec692f73
...
88940a17
4c8836ca
808622f1
bb2ae479
c96b14cb
28208a41
aef9334d
d2ef9822
From seed: s[0] = 0x0000000000000001, s[1] = 0x0000000000000000

The code:

//
// Exercise the xoroshiro128+ PRNG 
//
// For 64 bit output Compile with:
//    $ gcc -Wall -std=c99 -o xoroshiro128plus-test xoroshiro128plus-test.c
//
// For 32 bit output Compile with:
//    $ gcc -Wall -std=c99 -DOUTPUT_32 -o xoroshiro128plus-test xoroshiro128plus-test.c
//
// Also works on 32 bit machines (-m32)
//
// Note: 64 bit output will never contain zero. 32 bit output can be any 32 bit value.
//
#include <stdio.h>
#include "xoroshiro128plus.c"

// Seed MUST not be all zero.
#define SEED_0 ((uint64_t)0x1)
#define SEED_1 ((uint64_t)0x0) 

#define SAMPLE_SIZE 1000000
#define HEAD 8
#define TAIL 8

int main(int argc, char* argv[])
{
    uint64_t random64;
    uint64_t i;
    char ellipsis = 1;

    // Seed the state array
    s[0] = SEED_0;  
    s[1] = SEED_1;

    printf ("First %d and last %d of %d xoroshiro128plus PRNG output (hex):\n", HEAD, TAIL, SAMPLE_SIZE);

    // Print some randomness 
    for (i = 0; i < SAMPLE_SIZE; i++)
    {
        random64 = next();
        if ((i < HEAD) || (i > (SAMPLE_SIZE - TAIL - 1)))
        {
#ifdef OUTPUT_32
            uint32_t random32 = random64 >> 1;
            printf("%08x\n", random32);
#else
             printf("%016jx\n", (intmax_t)random64);
#endif
        }
        else
        {
            if (ellipsis)
            {
                printf("...\n");
                ellipsis--;
            }
        }    
    }

    printf("From seed: s[0] = 0x%016jx, s[1] = 0x%016jx\n", (intmax_t)SEED_0, (intmax_t)SEED_1); 
    return 0;
}

Edit: Changed the seed to that used by Chip.

Ahle2 · 2017-03-02 14:58

Running on average 2^32 iterations to see if it ever returns zero should just take some minutes seconds. Hopefully the PRNG is better than this!

Heater. · 2017-03-02 15:53

EDIT: Ignore the following it's gibberish.

Not sure I follow. Surely to check by brute force that the output is never zero we have to run through all the outputs of the 64 b generator:

I read that does cycle through all possible 64 bit output values except zero.

That implies that whatever 32 bits one selects from the 64 bit result a 32 bit zero is very possible.

I think if one only wants a 32 bit output then one has to check those 32 bits for zero and discard them if zero and try again.

Love that old Dilbert.

Seairth · 2017-03-02 16:22

Heater. wrote: »

I think if one only wants a 32 bit output then one has to check those 32 bits for zero and discard them if zero and try again.

Is zero not an acceptable random value? Or do you mean "reject zero in your code when you don't want zero as a random value"?

Heater. · 2017-03-02 16:59

Oops...you are right. Sorry. All of a sudden my brain flipped into reverse for some odd reason.

Ahle2 · 2017-03-02 17:42

@Chip
I don't understand Verilog. Is there a spin version I could use as a base to convert to C?

Ariba · 2017-03-02 19:16

Ahle2

Here is a translation to Spin as I understand the Verilog. There is a lot of bit manipulation, so I use the trick with the unimplemented PortB registers to access single bits in a long:

VAR
  LONG  rndx,rnd

PUB Main
  rndx := -1

  repeat
    DIRB := rndx
    rndx := rndx<<1 + (DIRB[31] ^ DIRB[10] ^ DIRB[4] ^ DIRB[1])

    'Bit reorder
    DIRB := rndx
    repeat i from 31 to 0
      OUTB[i] := DIRB[lookupz(31-i : 23,7,25,8,4,9,11,3,15,31,10,6,5,2,26,18,20,28,16,17,0,27,22,13,14,30,1,29,12,19,21,24)]

    'cog pattern
    rnd := OUTB ^ $428A2F98   'for cog0

    'output rnd here

It's untested, I have not even tried to compile it!
Maybe the quality of the random generator is not affected by the bit reordering, and it's enough to test only the rndx calculation.

Ahle2 wrote: »

BTW, if clocked at 160 Mhz, it will take 1.3984206859764595e+29 years before it wraps. I think it's good enough!

I think we should verify this before the P2 goes into synthesis. Will give us enough time to discuss every aspect of SPIN2 ...

Andy

ersmith · 2017-03-02 19:32

Roy Eltham wrote: »

const mostly does nothing for modern C++ compilers. It's a hint at best.
It can be cast away anywhere in the codebase, so it's not a reliable thing for the optimizing compiler anyway.

I think you were thinking of "register" -- that's ignored by modern C++ compilers. "const" on the other hand certainly does have an effect; the compiler will give an error if you try to modify a const object, for example, and the optimizer is free to assume that consts do not change. It is true, as you pointed out, that const is easily cast away, but that's dangerous. If you tell the compiler something is const it's free to optimize based on that assumption (and GCC, at least, will do so).

Eric

Heater. · 2017-03-02 20:31

As I often say "There is no single person alive that understands C++"

Heater. · 2017-03-02 20:56

Practically, if Chip want's to use xoroshiro128+ in PASM then we need to know the PASM works correctly.

I have taken the original xoroshiro128+ C source and used it to generate test data that any PASM version can be checked against.

The only remaining doubt is that the authors of xoroshiro128+ do not provide any sample output. So we have to trust that GCC, say, does the right thing with the same source.

Perhaps someone else would like to reproduce what I have done on whatever machine/compiler they have?

So that we can cross-check.

Roy Eltham · 2017-03-02 22:05

ersmith wrote: »

Roy Eltham wrote: »

const mostly does nothing for modern C++ compilers. It's a hint at best.
It can be cast away anywhere in the codebase, so it's not a reliable thing for the optimizing compiler anyway.

I think you were thinking of "register" -- that's ignored by modern C++ compilers. "const" on the other hand certainly does have an effect; the compiler will give an error if you try to modify a const object, for example, and the optimizer is free to assume that consts do not change. It is true, as you pointed out, that const is easily cast away, but that's dangerous. If you tell the compiler something is const it's free to optimize based on that assumption (and GCC, at least, will do so).

Eric

No, I meant const, and I meant as it relates to the code generation and optimization. Yes it does introduce compile errors and protections at the source/compile time levels.

cgracey · 2017-03-03 06:50

I was implementing the xoroshiro and doing some tests running the PASM implementation when I realized there was an error in my code: In those REPeat blocks which make the 64-bit rotators, the first RCL instructions needed a WC. The carry was not propagating into the top long. I fixed the code in my post, so you should copy the new version if you want to run it.

cgracey · 2017-03-03 07:13

Just noticed another problem: I had implemented the shift-left-14 as a rotate-left-14. I updated the code in the post again to fix it.

cgracey · 2017-03-03 07:42

Here is xoroshiro128+ in Verilog:

// RND

module		rnd
(
input		resn,
input		clk,

output	[62:0]	out
);


// xoroshiro128+

reg  [63:0] s0, s1, ss;

wire [63:0] sx = s0 ^ s1;

always @(posedge clk or negedge resn)
if (!resn)
begin
  s0 <= 64'b1;    // seed s0
  s1 <= 64'b0;
  ss <= 64'b0;
end
else
begin
  s0 <= {s0[08:00], s0[63:09]} ^ sx ^ {sx[49:00], 14'b0};
  s1 <= {sx[27:00], sx[63:28]};
  ss <= s0 + s1;
end

assign out = ss[63:01];    // output high-quality bits (LSB=LFSR)

endmodule

With the scan-register macros, it's easier to look at:

// RND

module		rnd
(
input		resn,
input		clk,

output	[62:0]	out
);


// xoroshiro-128

reg  [63:0] s0, s1, ss;

wire [63:0] sx = s0 ^ s1;

`regscan (s0, 64'b1, 1'b1, {s0[08:00], s0[63:09]} ^ sx ^ {sx[49:00], 14'b0})
`regscan (s1, 64'b0, 1'b1, {sx[27:00], sx[63:28]})
`regscan (ss, 64'b0, 1'b1, s0 + s1)

assign out = ss[63:01];    // output high-quality bits (LSB=LFSR)

endmodule

cgracey · 2017-03-03 07:50

Ahle2 wrote: »

@Chip
I don't want to add any more time to the P2 project, so don't wait for us to decide which one of the multitudes of PRNG's that is the best. Just go with Xoroshiro 128+ and it will endlessly much better than a simple LFSR!
The internal state is 128 bits, but all bits but the LSB is good as a tap for the final 32 bit value. That will fix the problem of the generator never returning a zero!

And to be honest everything points in favour of Xoroshiro 128+ according to all comparisons I have come across. Fastest (according to many implementations/tests, but that is only interesting from a software perspective though), least trouble with the more advanced test suites, just 128 bits of state memory and maybe most importantly it is completely free (in every sense of the word) to implement.

/Johannes

The Verilog is done and now I'm plugging it in.

Ahle2, thanks a lot for this xoroshiro128+ pointer. This is a huge upgrade to what we had.

evanh · 2017-03-03 08:26

Ahle2 wrote: »

Evanh
Have you tried your zener TRNG with some RNG test suites? It would be cool to see how well it performs. I guess It would pass with Flying colors!

Nothing more than general knowledge on my behalf, sorry.

Random/LFSR on P2

Comments