1. Assuming there is some sketchy hardware process, I'm not sure it needs Von Neumann "whitening". Any old random bits will do as as seed.
2. No. there is only one PRNG in the whole machine. Not one per COG. As far as I understand the plan is that it churns away stepping though it's sequence no matter if anyone reads it or not.
The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time. I'm not sure if it is necessary or even works!
Just now I cannot get Chip's verilog version of this PRNG to pass the PractRand test when run under the Icarus simulator. Quite likely my ignorance of verilog though.
Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?
I believe that's just a nicety to use different parts of the whole. The real benefit is the long repeat cycle. As Heater mentioned, it's really a 128-bit number.
We've been thrashing it already but, now you've mention like this, maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.
Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?
Maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.
But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself. If we made such an change we would have to prove it worthy and the P2 would get delayed for month!
But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself.
That raises a good point about those 63->32 taps that Chip allocated - have those taps been proven to be OK ?
It seems like code should be run with exactly those taps, and checked on all of the 16 32b results given.
- just to ensure there are no surprises.
The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time.
Ugh, that's somewhat more disturbing. I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way. But maybe the DieHard boffins will prove me wrong.
Anyway, what "randomness" is being tested? The bit-by-bit randomness, or the integer values they represent? Those are two very different things.
I added those taps to the PRNG Verilog that Chip posted. Compile it with a test bench using the Icarus Verilog compiler and run it with the Icarus simulator.
The results failed the PractRand test almost immediately.
However I know nothing of verilog so I have quite likely messed up my test harness. Then I may also have an error in the reformatting of data required to get it into PractRand.
As far as I understand, Dieharder and Practrand has numerous tests for determing the randomness on a bit level. Some of the tests sees the incoming random numbers as a sea of bits rather than 32 bits numbers or some such. Then they run suites that tests for different "bit spacing" (in lack of a better term) in this sea. Like for example every other bit or every 7th bit and so on.
And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?
Yeah, we've found good testing tools to verify this ourselves now. Not only that but remember the summing is outside of the randomiser sequencing loop. It's there as an extraction step to pull 128 bits down to 63 bits.
So, we could do our own 128 -> 32 summing without the basic sequencing changing. Then put the results through the testers we've now got and see how it fairs. If it holds up then that'll eliminate the arbitrary mapping of bits to Cogs.
I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way.
I'm sure you are right.
The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :
1) Create 16 instances of the PRNG hardware.
That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.
This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.
I posted the jump() function here earlier that does exactly that.
This solution is heavy on hardware resources. And that seeding business is better done in software.
2) Have one instance of the PRNG hardware.
Just give every COG the next number generated. If this were a HUB resource then it would be accessed in a round-robin fashion. Nobody would ever get the same number.
Is the PRNG a HUB resource?
Is distributing successive numbers to COGS sufficiently uncorrelated?
This is not a solution those guys running massively parallel simulations would use because it requires all the processes to communicate with the single PRNG to get a random number. That's too slow, too much communication overhead. But I don't see why it is not workable for the Propeller.
3) Do what Chip has done.
Pull out 32 of the 63 bits, and feed them to each COG. Now they could all read from the same generated number at the same time but they all get a different arrangement of bits.
I must admit this seems a bit "smelly".
But don't forget this PRNG is running continuously, stepping through it's sequence, even if nobody is reading from it.
And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?
-Phil
My gut feeling is no! But is it a problem? For some parallel simulations (parallel Monte Carlo simulations etc.) it might be, but if the cogs samples on different clocks of the PRNG it should be independent!
Chip,
I've got a working 130-bit solution that produces nice 64-bit results that PractRand likes. Basically just extended s0 and s1 by one bit each to push the carry out of reach when summing. It still relies on the same shift and rotate constants, of 55, 14 and 36, as the original xoroshiro algorithm.
There was a second code ordering warning that briefly came into existence when I was shuffling things around, but I think that never caused any errors due to it being declared "inlined".
EDIT: Ah, you were right, that second one was an error. Oops, It was only up for a moment.
$ ./evanh.exe | PractRand_0.93/RNG_test.exe stdin
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0xef0aa058
test set = normal, folding = standard(unknown format)
rng=RNG_stdin, seed=0xef0aa058
length= 32 megabytes (2^25 bytes), time= 3.2 seconds
Test Name Raw Processed Evaluation
[Low1/32]Gap-16:B R= -4.3 p =1-9.8e-4 unusual
...and 129 test result(s) without anomalies
...
rng=RNG_stdin, seed=0xef0aa058
length= 1 gigabyte (2^30 bytes), time= 107 seconds
Test Name Raw Processed Evaluation
DC6-9x1Bytes-1 R= +8.2 p = 2.0e-4 mildly suspicious
...and 182 test result(s) without anomalies
rng=RNG_stdin, seed=0xef0aa058
length= 2 gigabytes (2^31 bytes), time= 211 seconds
Test Name Raw Processed Evaluation
DC6-9x1Bytes-1 R= +13.6 p = 2.3e-7 very suspicious
...and 193 test result(s) without anomalies
rng=RNG_stdin, seed=0xef0aa058
length= 4 gigabytes (2^32 bytes), time= 417 seconds
Test Name Raw Processed Evaluation
BCFN(2+0,13-0,T) R= +10.6 p = 3.3e-5 mildly suspicious
DC6-9x1Bytes-1 R= +31.3 p = 3.8e-15 FAIL !
...and 201 test result(s) without anomalies
$
If I am developing an algorithm (monte carlo simulations) it is important to be able to SEED my PRNG so I can have different runs I can relate/compare. Storing huge numbers of random numbers on the memory limited P2 is no option.
As I could use any seed that is all the randomness I can think of.
I personally see no need for a REAL random number, whatever this will be ...
So now I am not sure if this 'deterministic PRNG',
which gives the same sequence of PRNs every time for a given seed, is still in there ... ??
The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :
1) Create 16 instances of the PRNG hardware.
That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.
This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.
I posted the jump() function here earlier that does exactly that.
This solution is heavy on hardware resources. And that seeding business is better done in software.
What about making it pipelined like the cordic unit?
I mean iterating the combinatorial part (the taps and gates) to operate on 16 separate bit vectors:
- load seed #n
- apply RNG step to seed #n
- save seed #n
Once initialized, every sliced generator would mantain it's own state.
I guess this only make sense if the vectors are placed in an addressable memory, otherwise more logic is wasted than saved, due to switching.
Comments
I'm not sure...
1. Assuming there is some sketchy hardware process, I'm not sure it needs Von Neumann "whitening". Any old random bits will do as as seed.
2. No. there is only one PRNG in the whole machine. Not one per COG. As far as I understand the plan is that it churns away stepping though it's sequence no matter if anyone reads it or not.
The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time. I'm not sure if it is necessary or even works!
Just now I cannot get Chip's verilog version of this PRNG to pass the PractRand test when run under the Icarus simulator. Quite likely my ignorance of verilog though.
The state of this PRNG is 128 bits.
So while one might be able to disturb the generator one cannot reproduce the original seed and get the same sequence back again.
-Phil
We've been thrashing it already but, now you've mention like this, maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.
That raises a good point about those 63->32 taps that Chip allocated - have those taps been proven to be OK ?
It seems like code should be run with exactly those taps, and checked on all of the 16 32b results given.
- just to ensure there are no surprises.
Anyway, what "randomness" is being tested? The bit-by-bit randomness, or the integer values they represent? Those are two very different things.
-Phil
I added those taps to the PRNG Verilog that Chip posted. Compile it with a test bench using the Icarus Verilog compiler and run it with the Icarus simulator.
The results failed the PractRand test almost immediately.
However I know nothing of verilog so I have quite likely messed up my test harness. Then I may also have an error in the reformatting of data required to get it into PractRand.
Will continue ....
And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
-Phil
So, we could do our own 128 -> 32 summing without the basic sequencing changing. Then put the results through the testers we've now got and see how it fairs. If it holds up then that'll eliminate the arbitrary mapping of bits to Cogs.
The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :
1) Create 16 instances of the PRNG hardware.
That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.
This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.
I posted the jump() function here earlier that does exactly that.
This solution is heavy on hardware resources. And that seeding business is better done in software.
2) Have one instance of the PRNG hardware.
Just give every COG the next number generated. If this were a HUB resource then it would be accessed in a round-robin fashion. Nobody would ever get the same number.
Is the PRNG a HUB resource?
Is distributing successive numbers to COGS sufficiently uncorrelated?
This is not a solution those guys running massively parallel simulations would use because it requires all the processes to communicate with the single PRNG to get a random number. That's too slow, too much communication overhead. But I don't see why it is not workable for the Propeller.
3) Do what Chip has done.
Pull out 32 of the 63 bits, and feed them to each COG. Now they could all read from the same generated number at the same time but they all get a different arrangement of bits.
I must admit this seems a bit "smelly".
But don't forget this PRNG is running continuously, stepping through it's sequence, even if nobody is reading from it.
I've got a working 130-bit solution that produces nice 64-bit results that PractRand likes. Basically just extended s0 and s1 by one bit each to push the carry out of reach when summing. It still relies on the same shift and rotate constants, of 55, 14 and 36, as the original xoroshiro algorithm.
- DELETED BUGGY CODE - See fixed version - http://forums.parallax.com/discussion/comment/1403058/#Comment_1403058
EDIT: Ah, you were right, that second one was an error. Oops, It was only up for a moment.
However this compiles clean:
How would we ever know what the period of the evanh generator is?
Excellent attempt mind you.
I'm going to bed. I'll have another shot tomorrow ...
If I am developing an algorithm (monte carlo simulations) it is important to be able to SEED my PRNG so I can have different runs I can relate/compare. Storing huge numbers of random numbers on the memory limited P2 is no option.
As I could use any seed that is all the randomness I can think of.
I personally see no need for a REAL random number, whatever this will be ...
So now I am not sure if this 'deterministic PRNG',
which gives the same sequence of PRNs every time for a given seed, is still in there ... ??
And I am used to determinism from P1 ;-)
The hardware one never was particularly repeatable for a given program due to it being a free-running generator.
What about making it pipelined like the cordic unit?
I mean iterating the combinatorial part (the taps and gates) to operate on 16 separate bit vectors:
- load seed #n
- apply RNG step to seed #n
- save seed #n
Once initialized, every sliced generator would mantain it's own state.
I guess this only make sense if the vectors are placed in an addressable memory, otherwise more logic is wasted than saved, due to switching.