Perhaps someone else would like to reproduce what I have done on whatever machine/compiler they have?
Verified on AIX 7.1 w/IBM compiler, and on Tru64. Had to massage the code a bit, getting rid of the C++ comments and some other modernities, to get it to compile. Didn't work on AIX in 32-bit mode (wrong result), even after adding 'ULL' to the long constants.
Both of those machines are big endian btw, so big/little is ok.
What is the logic size of the two 'random' generators ?
ie what does this code in die area ?
The old generator, of course, used 32 registers. I don't know how many ALMs it used.
The new xoroshiro128+ generator uses 190 registers within 94.5 ALMs. To put that in perspective, it's about 1/4 the number of ALMs that a smart pin uses.
The new xoroshiro128+ PRNG is now implemented in the Prop2.
I also changed the GETRND instruction to put unique RND bits into C and Z, instead of the same RND bit, as before. This will all be in the next release. Now, back to Spin2, for me.
As far as I can tell the Verilog seeds S0 and S1 with 1 and 0. Which makes the start up look not so random. I guess that is as random as anything else though.
As far as I can tell the Verilog seeds S0 and S1 with 1 and 0. Which makes the start up look not so random. I guess that is as random as anything else though.
That's right. It's all relative. The important thing is, from whatever state you start it in, it immediately goes haywire and does its thing.
Does this thing in HUB just free run generating numbers, and the cogs just sample it when they want a value?
If so, then it should be sufficiently random, especially if something external (use input or reading an external device) causes an unknown delay at startup after it's going.
It's just that when I played with the KISS32 PRNG it would take forever to start looking "random" when seeded with zero or a low entropy seed. Much better to have at least an equal number of zero and one bits.
So I had to try the 1, 0 seed with xoroshiro128+. It gets into something looking "random" much quicker. Like so:
The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.
Does this thing in HUB just free run generating numbers, and the cogs just sample it when they want a value?
If so, then it should be sufficiently random, especially if something external (use input or reading an external device) causes an unknown delay at startup after it's going.
Exactly. I don't know if anyone would have ever noticed a quality problem with the old LFSR, but this is really neat, anyway.
The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.
Whoa!!! That's a great idea. It's a little late to change the schematic to make it do that, but there's another way we'll achieve the same effect:
I just made it so that when you do a 'CLKSET D' with D[31] set, it doesn't set the clock mode, but writes D[31:0] into the upper 32 bits of S1 (one of two 64-bit registers in xoroshiro128+), leaving S0 and the lower 32 bits of S1 alone. With D[31] necessarily being set, there is no danger of the PRNG being zero'd. This portal into the PRNG allows 1/4 of its state bits to be altered. WHEN it gets altered, and WHAT it gets altered with, both have huge effects.
So, here's how we'll have an effective 'true random number generator': The loader will configure a smart pin for ADC mode + GND calibration when it starts up. Every 1ms (20,000 clocks at 20MHz internal RC frequency), it will read the accumulated ADC sample, which will be a function of voltage, temperature, process, and thermal noise. It will then set the MSB of the sample data and do a 'SETCLK sample'. This will scramble 1/4 of the PRNG, and with each iteration, the randomizing affects compound. By the time the loader is ready to jump to the user's program, the PRNG is thoroughly seeded with true randomness. After that, it's deterministic, but with such rich state and such a good algorithm, it's going to have the quality of a TRNG. No need to make any other attempts at a TRNG. A 2-clock instruction (GETRND) can get you a random number any time.
The generator could be allowed to run even when whole IC is in reset without any clear/reset logic in the registers. Ie: It just starts producing it's sequence as soon as the logic can function on rising supply voltage. Starting from a garbage state.
Whoa!!! That's a great idea. It's a little late to change the schematic to make it do that, but there's another way we'll achieve the same effect:...
This statement interests me. Evan provided some Verilog code that performs the random number function but you mentioned a schematic. Aren't you doing your coding in Verilog as well? Where does the schematic come in?
As long as it started up with at least a single '1' bit in either S0 or S1, it would work. Otherwise, if everything was 0's, it would never get out of that zero state. It would be very boring.
Awesome Chip, evanh, Tor, Roy Eltham, Heater and the rest!!!! I really like all things I have read so far on page 3 of this thread!
I was about to ask about how we could seed the PRNG at startup, and now it seems like it's already taken care of!
Awesome Chip, evanh, Tor, Roy Eltham, Heater and the rest!!!! I really like all things I have read so far on page 3 of this thread!
I was about to ask about how we could seed the PRNG at startup, and now it seems like it's already taken care of!
I think this sufficiently solves the need for a TRNG:
How could I obtain a ~4 GB file worth of data produced by your P2 implementation to run through Dieharder, Practrand and TestU01?! If it successfully runs through these tests, I think your implemenation is good and if the seed is known I could verify that it produces the same sequence as the C version with my tool!
How could I obtain a ~4 GB file worth of data produced by your P2 implementation to run through Dieharder, Practrand and TestU01?! If it successfully runs through these tests, I think your implemenation is good and if the seed is known I could verify if it produces the same sequence as the C version with my tool!
/Johannes
I would have to make it step once per SETCLK, or something. Kind of a pain. There are only a few lines of Verilog, though, and it's easy to see the equivalence to the C code.
I think that to check equivalency, you'd only need the first 64 or 128 samples, maybe fewer. The reason is that all bits of S0 and S1 get used before many iterations and their pattern is deterministic. You could probably even verify it by looking at the lower 32 bits of output over a few dozen iterations. Any error in implementation would cause an early difference.
Ahle2, I'm doing a compile with single-step of the PRNG. I'm seeding with s0=1, s1=0. I'll be able to read out the lower 16 bits of s0. Maybe I can make it serially output to a file.
The ~4 GB worth of data is to run through the test suites. Even a 200Mb file produced with a TRNG fails because the file repeats 100's of times in each test.
To just verify the integrity it would suffice with the 64 bit output from the 12345 iteration using seed s0=1, s1=0!
A difference from Johannes's original posting of xoroshiro is that result (ss) is updated from old data before recalculating. No idea if that should matter though. Maybe that just helps execution speed on superscalar CPUs.
EDIT: Ah, I see, the parallel nature of Verilog is deceptive when written as text like that. ss is updated effectively before the recalculation of s0 and s1.
Chip,
Does ss need to be a register at all? It never accumulates anything. I suppose it's needed just as a registered output.
A difference from Johannes's original posting of xoroshiro is that result (ss) is updated from old data before recalculating. No idea if that should matter though. Maybe that just helps execution speed on superscalar CPUs.
EDIT: Ah, I see, the parallel nature of Verilog is deceptive when written as text like that. ss is updated effectively before the recalculation of s0 and s1.
Chip,
Does ss need to be a register at all? It never accumulates anything. I suppose it's needed just as a registered output.
Right. It's just needed as a registered output, because there are lots of routing delays after.
The 12345'th iteration yields: 166e226a59a395b5
And the 182837465'th iterations yields: 3dfa177a94cfb71
/Johannes
I'm not getting those values. I did, though, notice a discrepancy between the PASM version and the Verilog version. There was an error in the PASM version and after I fixed it, it agreed with the Verilog.
Could you show me the code you are running to get those results you showed?
Comments
What is the logic size of the two 'random' generators ?
ie what does this code in die area ?
Yes, this is a good development. Just in time, too.
Both of those machines are big endian btw, so big/little is ok.
The old generator, of course, used 32 registers. I don't know how many ALMs it used.
The new xoroshiro128+ generator uses 190 registers within 94.5 ALMs. To put that in perspective, it's about 1/4 the number of ALMs that a smart pin uses.
I also changed the GETRND instruction to put unique RND bits into C and Z, instead of the same RND bit, as before. This will all be in the next release. Now, back to Spin2, for me.
As far as I can tell the Verilog seeds S0 and S1 with 1 and 0. Which makes the start up look not so random. I guess that is as random as anything else though.
That's right. It's all relative. The important thing is, from whatever state you start it in, it immediately goes haywire and does its thing.
If so, then it should be sufficiently random, especially if something external (use input or reading an external device) causes an unknown delay at startup after it's going.
It's just that when I played with the KISS32 PRNG it would take forever to start looking "random" when seeded with zero or a low entropy seed. Much better to have at least an equal number of zero and one bits.
So I had to try the 1, 0 seed with xoroshiro128+. It gets into something looking "random" much quicker. Like so:
0000000000000001
0080001000004001
0008402018000121
8080563010444122
09d0240b1809c401
01d82012758940e2
69b05d703207c544
df59215fd8d25ee6
9a652772eb590ca2
4ba7fb3a655fc1b1
a44648618c9bfe2c
d89157b50d9ced27
431bed2f5777656a
Does the verilog/PASM version generate the same sequence?
Exactly. I don't know if anyone would have ever noticed a quality problem with the old LFSR, but this is really neat, anyway.
Whoa!!! That's a great idea. It's a little late to change the schematic to make it do that, but there's another way we'll achieve the same effect:
I just made it so that when you do a 'CLKSET D' with D[31] set, it doesn't set the clock mode, but writes D[31:0] into the upper 32 bits of S1 (one of two 64-bit registers in xoroshiro128+), leaving S0 and the lower 32 bits of S1 alone. With D[31] necessarily being set, there is no danger of the PRNG being zero'd. This portal into the PRNG allows 1/4 of its state bits to be altered. WHEN it gets altered, and WHAT it gets altered with, both have huge effects.
So, here's how we'll have an effective 'true random number generator': The loader will configure a smart pin for ADC mode + GND calibration when it starts up. Every 1ms (20,000 clocks at 20MHz internal RC frequency), it will read the accumulated ADC sample, which will be a function of voltage, temperature, process, and thermal noise. It will then set the MSB of the sample data and do a 'SETCLK sample'. This will scramble 1/4 of the PRNG, and with each iteration, the randomizing affects compound. By the time the loader is ready to jump to the user's program, the PRNG is thoroughly seeded with true randomness. After that, it's deterministic, but with such rich state and such a good algorithm, it's going to have the quality of a TRNG. No need to make any other attempts at a TRNG. A 2-clock instruction (GETRND) can get you a random number any time.
As long as it started up with at least a single '1' bit in either S0 or S1, it would work. Otherwise, if everything was 0's, it would never get out of that zero state. It would be very boring.
I was about to ask about how we could seed the PRNG at startup, and now it seems like it's already taken care of!
I think this sufficiently solves the need for a TRNG:
true random seed + excellent PRNG
Does that verilog produce the same results as the original C code? We would not want a silly mistake shortening the period.
I don't know, but the implementation is so simple, and it was made from the original C code.
/Johannes
I would have to make it step once per SETCLK, or something. Kind of a pain. There are only a few lines of Verilog, though, and it's easy to see the equivalence to the C code.
I think that to check equivalency, you'd only need the first 64 or 128 samples, maybe fewer. The reason is that all bits of S0 and S1 get used before many iterations and their pattern is deterministic. You could probably even verify it by looking at the lower 32 bits of output over a few dozen iterations. Any error in implementation would cause an early difference.
"//" is not C++ style comments. They have been in the C standard for nearly 20 years.
That 32 bit AIX compiler seems to be buggy. When I compile a 32 bit executable with GCC I get the correct results.
Might be interesting to find out where it goes wrong.Tor, What kind of compiler museum have you got there?
"//" is not C++ style comments. They have been in the C standard for nearly 20 years.
That 32 bit AIX compiler seems to be buggy. When I compile a 32 bit executable with GCC I get the correct results.
Might be interesting to find out where it goes wrong.
The ~4 GB worth of data is to run through the test suites. Even a 200Mb file produced with a TRNG fails because the file repeats 100's of times in each test.
To just verify the integrity it would suffice with the 64 bit output from the 12345 iteration using seed s0=1, s1=0!
/Johannes
EDIT: Ah, I see, the parallel nature of Verilog is deceptive when written as text like that. ss is updated effectively before the recalculation of s0 and s1.
Chip,
Does ss need to be a register at all? It never accumulates anything. I suppose it's needed just as a registered output.
The 12345'th iteration yields: 166e226a59a395b5
And the 182837465'th iterations yields: 3dfa177a94cfb71
/Johannes
Right. It's just needed as a registered output, because there are lots of routing delays after.
I'm not getting those values. I did, though, notice a discrepancy between the PASM version and the Verilog version. There was an error in the PASM version and after I fixed it, it agreed with the Verilog.
Could you show me the code you are running to get those results you showed?