Random/LFSR on P2

1356714

Comments

  • Thanks Chip and Johannes. Well worth the effort.
    As long as websites don't require javascript in the browser then I'm happy.
  • jmgjmg Posts: 9,261
    cgracey wrote: »
    The Verilog is done and now I'm plugging it in.

    What is the logic size of the two 'random' generators ?
    ie what does this code in die area ?

  • evanh wrote: »
    Thanks Chip and Johannes. Well worth the effort.

    Yes, this is a good development. Just in time, too.
  • TorTor Posts: 1,654
    Heater. wrote: »
    Perhaps someone else would like to reproduce what I have done on whatever machine/compiler they have?
    Verified on AIX 7.1 w/IBM compiler, and on Tru64. Had to massage the code a bit, getting rid of the C++ comments and some other modernities, to get it to compile. Didn't work on AIX in 32-bit mode (wrong result), even after adding 'ULL' to the long constants.
    Both of those machines are big endian btw, so big/little is ok.

  • jmg wrote: »
    cgracey wrote: »
    The Verilog is done and now I'm plugging it in.

    What is the logic size of the two 'random' generators ?
    ie what does this code in die area ?

    The old generator, of course, used 32 registers. I don't know how many ALMs it used.

    The new xoroshiro128+ generator uses 190 registers within 94.5 ALMs. To put that in perspective, it's about 1/4 the number of ALMs that a smart pin uses.
  • The new xoroshiro128+ PRNG is now implemented in the Prop2.

    I also changed the GETRND instruction to put unique RND bits into C and Z, instead of the same RND bit, as before. This will all be in the next release. Now, back to Spin2, for me.
  • Great stuff.

    As far as I can tell the Verilog seeds S0 and S1 with 1 and 0. Which makes the start up look not so random. I guess that is as random as anything else though.
  • Heater. wrote: »
    Great stuff.

    As far as I can tell the Verilog seeds S0 and S1 with 1 and 0. Which makes the start up look not so random. I guess that is as random as anything else though.

    That's right. It's all relative. The important thing is, from whatever state you start it in, it immediately goes haywire and does its thing.
  • Does this thing in HUB just free run generating numbers, and the cogs just sample it when they want a value?
    If so, then it should be sufficiently random, especially if something external (use input or reading an external device) causes an unknown delay at startup after it's going.
  • True enough.

    It's just that when I played with the KISS32 PRNG it would take forever to start looking "random" when seeded with zero or a low entropy seed. Much better to have at least an equal number of zero and one bits.

    So I had to try the 1, 0 seed with xoroshiro128+. It gets into something looking "random" much quicker. Like so:

    0000000000000001
    0080001000004001
    0008402018000121
    8080563010444122
    09d0240b1809c401
    01d82012758940e2
    69b05d703207c544
    df59215fd8d25ee6
    9a652772eb590ca2
    4ba7fb3a655fc1b1
    a44648618c9bfe2c
    d89157b50d9ced27
    431bed2f5777656a

    Does the verilog/PASM version generate the same sequence?
  • The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.
    As long as websites don't require javascript in the browser then I'm happy.
  • I'm assuming it's possible to encode that as Verilog of course.
    As long as websites don't require javascript in the browser then I'm happy.
  • Roy Eltham wrote: »
    Does this thing in HUB just free run generating numbers, and the cogs just sample it when they want a value?
    If so, then it should be sufficiently random, especially if something external (use input or reading an external device) causes an unknown delay at startup after it's going.

    Exactly. I don't know if anyone would have ever noticed a quality problem with the old LFSR, but this is really neat, anyway.
  • Would this work? (/me deletes the reset parts from Chips code)
    // RND
    
    module		rnd
    (
    input		clk,
    
    output	[62:0]	out
    );
    
    
    // xoroshiro128+
    
    reg  [63:0] s0, s1, ss;
    
    wire [63:0] sx = s0 ^ s1;
    
    always @(posedge clk)
    begin
      s0 <= {s0[08:00], s0[63:09]} ^ sx ^ {sx[49:00], 14'b0};
      s1 <= {sx[27:00], sx[63:28]};
      ss <= s0 + s1;
    end
    
    assign out = ss[63:01];    // output high-quality bits (LSB=LFSR)
    
    endmodule
    
    As long as websites don't require javascript in the browser then I'm happy.
  • cgraceycgracey Posts: 7,090
    edited March 3 Vote Up0Vote Down
    evanh wrote: »
    The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.

    Whoa!!! That's a great idea. It's a little late to change the schematic to make it do that, but there's another way we'll achieve the same effect:

    I just made it so that when you do a 'CLKSET D' with D[31] set, it doesn't set the clock mode, but writes D[31:0] into the upper 32 bits of S1 (one of two 64-bit registers in xoroshiro128+), leaving S0 and the lower 32 bits of S1 alone. With D[31] necessarily being set, there is no danger of the PRNG being zero'd. This portal into the PRNG allows 1/4 of its state bits to be altered. WHEN it gets altered, and WHAT it gets altered with, both have huge effects.

    So, here's how we'll have an effective 'true random number generator': The loader will configure a smart pin for ADC mode + GND calibration when it starts up. Every 1ms (20,000 clocks at 20MHz internal RC frequency), it will read the accumulated ADC sample, which will be a function of voltage, temperature, process, and thermal noise. It will then set the MSB of the sample data and do a 'SETCLK sample'. This will scramble 1/4 of the PRNG, and with each iteration, the randomizing affects compound. By the time the loader is ready to jump to the user's program, the PRNG is thoroughly seeded with true randomness. After that, it's deterministic, but with such rich state and such a good algorithm, it's going to have the quality of a TRNG. No need to make any other attempts at a TRNG. A 2-clock instruction (GETRND) can get you a random number any time.
  • cgracey wrote: »
    evanh wrote: »
    The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.

    Whoa!!! That's a great idea. It's a little late to change the schematic to make it do that, but there's another way we'll achieve the same effect:...
    This statement interests me. Evan provided some Verilog code that performs the random number function but you mentioned a schematic. Aren't you doing your coding in Verilog as well? Where does the schematic come in?

  • evanh wrote: »
    Would this work? (/me deletes the reset parts from Chips code)
    // RND
    
    module		rnd
    (
    input		clk,
    
    output	[62:0]	out
    );
    
    
    // xoroshiro128+
    
    reg  [63:0] s0, s1, ss;
    
    wire [63:0] sx = s0 ^ s1;
    
    always @(posedge clk)
    begin
      s0 <= {s0[08:00], s0[63:09]} ^ sx ^ {sx[49:00], 14'b0};
      s1 <= {sx[27:00], sx[63:28]};
      ss <= s0 + s1;
    end
    
    assign out = ss[63:01];    // output high-quality bits (LSB=LFSR)
    
    endmodule
    

    As long as it started up with at least a single '1' bit in either S0 or S1, it would work. Otherwise, if everything was 0's, it would never get out of that zero state. It would be very boring.
  • Ahle2Ahle2 Posts: 906
    edited March 3 Vote Up0Vote Down
    Awesome Chip, evanh, Tor, Roy Eltham, Heater and the rest!!!! :) :) :) I really like all things I have read so far on page 3 of this thread! :) :) :)
    I was about to ask about how we could seed the PRNG at startup, and now it seems like it's already taken care of!
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX, SIDcogMedlay.mp3
    AYcog - An emulation of the AY3-8910 / YM2149F PSG: Thread, OBEX
    SNEcog - An emulation of the SN76489 PSG(and variants): Thread, OBEX
    Propeller chiptune player: Thread
  • Ahle2 wrote: »
    Awesome Chip, evanh, Tor, Roy Eltham, Heater and the rest!!!! :) :) :) I really like all things I have read so far on page 3 of this thread! :) :) :)
    I was about to ask about how we could seed the PRNG at startup, and now it seems like it's already taken care of!

    I think this sufficiently solves the need for a TRNG:

    true random seed + excellent PRNG
  • Yep, I love it. That was just the motivation behind my old JKISS32 PRNG seeded from RealRandom. Only this is better :)

    Does that verilog produce the same results as the original C code? We would not want a silly mistake shortening the period.
  • Heater. wrote: »
    Yep, I love it. That was just the motivation behind my old JKISS32 PRNG seeded from RealRandom. Only this is better :)

    Does that verilog produce the same results as the original C code? We would not want a silly mistake shortening the period.

    I don't know, but the implementation is so simple, and it was made from the original C code.
  • Ahle2Ahle2 Posts: 906
    edited March 3 Vote Up0Vote Down
    How could I obtain a ~4 GB file worth of data produced by your P2 implementation to run through Dieharder, Practrand and TestU01?! If it successfully runs through these tests, I think your implemenation is good and if the seed is known I could verify that it produces the same sequence as the C version with my tool!

    /Johannes
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX, SIDcogMedlay.mp3
    AYcog - An emulation of the AY3-8910 / YM2149F PSG: Thread, OBEX
    SNEcog - An emulation of the SN76489 PSG(and variants): Thread, OBEX
    Propeller chiptune player: Thread
  • Ahle2 wrote: »
    How could I obtain a ~4 GB file worth of data produced by your P2 implementation to run through Dieharder, Practrand and TestU01?! If it successfully runs through these tests, I think your implemenation is good and if the seed is known I could verify if it produces the same sequence as the C version with my tool!

    /Johannes

    I would have to make it step once per SETCLK, or something. Kind of a pain. There are only a few lines of Verilog, though, and it's easy to see the equivalence to the C code.

    I think that to check equivalency, you'd only need the first 64 or 128 samples, maybe fewer. The reason is that all bits of S0 and S1 get used before many iterations and their pattern is deterministic. You could probably even verify it by looking at the lower 32 bits of output over a few dozen iterations. Any error in implementation would cause an early difference.
  • Ahle2, I'm doing a compile with single-step of the PRNG. I'm seeding with s0=1, s1=0. I'll be able to read out the lower 16 bits of s0. Maybe I can make it serially output to a file.
  • Tor,
    Didn't work on AIX in 32-bit mode (wrong result), even after adding 'ULL' to the long constants.
    What kind of compiler museum have you got there? :)

    "//" is not C++ style comments. They have been in the C standard for nearly 20 years.

    That 32 bit AIX compiler seems to be buggy. When I compile a 32 bit executable with GCC I get the correct results.

    Might be interesting to find out where it goes wrong.Tor,
    Didn't work on AIX in 32-bit mode (wrong result), even after adding 'ULL' to the long constants.
    What kind of compiler museum have you got there? :)

    "//" is not C++ style comments. They have been in the C standard for nearly 20 years.

    That 32 bit AIX compiler seems to be buggy. When I compile a 32 bit executable with GCC I get the correct results.

    Might be interesting to find out where it goes wrong.
  • Ahle2Ahle2 Posts: 906
    edited March 3 Vote Up0Vote Down
    Chip,

    The ~4 GB worth of data is to run through the test suites. Even a 200Mb file produced with a TRNG fails because the file repeats 100's of times in each test.
    To just verify the integrity it would suffice with the 64 bit output from the 12345 iteration using seed s0=1, s1=0! :)

    /Johannes
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX, SIDcogMedlay.mp3
    AYcog - An emulation of the AY3-8910 / YM2149F PSG: Thread, OBEX
    SNEcog - An emulation of the SN76489 PSG(and variants): Thread, OBEX
    Propeller chiptune player: Thread
  • evanhevanh Posts: 3,608
    edited March 3 Vote Up0Vote Down
    A difference from Johannes's original posting of xoroshiro is that result (ss) is updated from old data before recalculating. No idea if that should matter though. Maybe that just helps execution speed on superscalar CPUs.

    EDIT: Ah, I see, the parallel nature of Verilog is deceptive when written as text like that. ss is updated effectively before the recalculation of s0 and s1.

    Chip,
    Does ss need to be a register at all? It never accumulates anything. I suppose it's needed just as a registered output.
    As long as websites don't require javascript in the browser then I'm happy.
  • Ahle2Ahle2 Posts: 906
    edited March 3 Vote Up0Vote Down
    Chip,

    The 12345'th iteration yields: 166e226a59a395b5
    And the 182837465'th iterations yields: 3dfa177a94cfb71

    /Johannes
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX, SIDcogMedlay.mp3
    AYcog - An emulation of the AY3-8910 / YM2149F PSG: Thread, OBEX
    SNEcog - An emulation of the SN76489 PSG(and variants): Thread, OBEX
    Propeller chiptune player: Thread
  • evanh wrote: »
    A difference from Johannes's original posting of xoroshiro is that result (ss) is updated from old data before recalculating. No idea if that should matter though. Maybe that just helps execution speed on superscalar CPUs.

    EDIT: Ah, I see, the parallel nature of Verilog is deceptive when written as text like that. ss is updated effectively before the recalculation of s0 and s1.

    Chip,
    Does ss need to be a register at all? It never accumulates anything. I suppose it's needed just as a registered output.

    Right. It's just needed as a registered output, because there are lots of routing delays after.
  • Ahle2 wrote: »
    Chip,

    The 12345'th iteration yields: 166e226a59a395b5
    And the 182837465'th iterations yields: 3dfa177a94cfb71

    /Johannes

    I'm not getting those values. I did, though, notice a discrepancy between the PASM version and the Verilog version. There was an error in the PASM version and after I fixed it, it agreed with the Verilog.

    Could you show me the code you are running to get those results you showed?
Sign In or Register to comment.