Random/LFSR on P2

1679111285

Comments

  • Phil,

    I'm not sure...

    1. Assuming there is some sketchy hardware process, I'm not sure it needs Von Neumann "whitening". Any old random bits will do as as seed.

    2. No. there is only one PRNG in the whole machine. Not one per COG. As far as I understand the plan is that it churns away stepping though it's sequence no matter if anyone reads it or not.

    The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time. I'm not sure if it is necessary or even works!

    Just now I cannot get Chip's verilog version of this PRNG to pass the PractRand test when run under the Icarus simulator. Quite likely my ignorance of verilog though.
  • And any software can later reseed it I suspect. Although it's only a 32-bit seed. This was recently added to the multi-function SETCLK instruction - http://forums.parallax.com/discussion/comment/1402352/#Comment_1402352.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Err, 31-bit seed. Since bit 31 must be set.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Only 32 bit seed.

    The state of this PRNG is 128 bits.

    So while one might be able to disturb the generator one cannot reproduce the original seed and get the same sequence back again.
  • heater wrote:
    No. there is only one PRNG in the whole machine. Not one per COG.
    Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?

    -Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away. -Antoine de Saint-Exupery
  • Heater. wrote: »
    So while one might be able to disturb the generator one cannot reproduce the original seed and get the same sequence back again.
    There's no way to be sure of reproducibility in the first place anyway. It's not like any software is in programmatic control of the sequence.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?
    I believe that's just a nicety to use different parts of the whole. The real benefit is the long repeat cycle. As Heater mentioned, it's really a 128-bit number.

    We've been thrashing it already but, now you've mention like this, maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • evanh wrote: »
    Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?
    Maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.
    But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself. If we made such an change we would have to prove it worthy and the P2 would get delayed for month!
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX
  • jmgjmg Posts: 14,175
    Ahle2 wrote: »
    But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself.

    That raises a good point about those 63->32 taps that Chip allocated - have those taps been proven to be OK ?
    It seems like code should be run with exactly those taps, and checked on all of the 16 32b results given.
    - just to ensure there are no surprises.

  • heater wrote:
    The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time.
    Ugh, that's somewhat more disturbing. I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way. But maybe the DieHard boffins will prove me wrong.

    Anyway, what "randomness" is being tested? The bit-by-bit randomness, or the integer values they represent? Those are two very different things.

    -Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away. -Antoine de Saint-Exupery
  • Heater.Heater. Posts: 21,233
    edited 2017-03-08 - 09:05:24
    jmg,
    ...have those taps been proven to be OK ?
    That is just what I was trying to test yesterday.

    I added those taps to the PRNG Verilog that Chip posted. Compile it with a test bench using the Icarus Verilog compiler and run it with the Icarus simulator.

    The results failed the PractRand test almost immediately.

    However I know nothing of verilog so I have quite likely messed up my test harness. Then I may also have an error in the reformatting of data required to get it into PractRand.

    Will continue ....

  • Ahle2Ahle2 Posts: 975
    edited 2017-03-08 - 09:25:09
    As far as I understand, Dieharder and Practrand has numerous tests for determing the randomness on a bit level. Some of the tests sees the incoming random numbers as a sea of bits rather than 32 bits numbers or some such. Then they run suites that tests for different "bit spacing" (in lack of a better term) in this sea. Like for example every other bit or every 7th bit and so on.

    And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX
  • Ahle2 wrote:
    And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
    Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?

    -Phil
    “Perfection is achieved not when there is nothing more to add, but when there is nothing left to take away. -Antoine de Saint-Exupery
  • Yeah, we've found good testing tools to verify this ourselves now. Not only that but remember the summing is outside of the randomiser sequencing loop. It's there as an extraction step to pull 128 bits down to 63 bits.

    So, we could do our own 128 -> 32 summing without the basic sequencing changing. Then put the results through the testers we've now got and see how it fairs. If it holds up then that'll eliminate the arbitrary mapping of bits to Cogs.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Phil,
    I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way.
    I'm sure you are right.

    The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :

    1) Create 16 instances of the PRNG hardware.

    That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.

    This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.

    I posted the jump() function here earlier that does exactly that.

    This solution is heavy on hardware resources. And that seeding business is better done in software.

    2) Have one instance of the PRNG hardware.

    Just give every COG the next number generated. If this were a HUB resource then it would be accessed in a round-robin fashion. Nobody would ever get the same number.

    Is the PRNG a HUB resource?

    Is distributing successive numbers to COGS sufficiently uncorrelated?

    This is not a solution those guys running massively parallel simulations would use because it requires all the processes to communicate with the single PRNG to get a random number. That's too slow, too much communication overhead. But I don't see why it is not workable for the Propeller.


    3) Do what Chip has done.

    Pull out 32 of the 63 bits, and feed them to each COG. Now they could all read from the same generated number at the same time but they all get a different arrangement of bits.

    I must admit this seems a bit "smelly".

    But don't forget this PRNG is running continuously, stepping through it's sequence, even if nobody is reading from it.




  • Ahle2Ahle2 Posts: 975
    edited 2017-03-08 - 10:13:05
    Ahle2 wrote:
    And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.
    Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?

    -Phil
    My gut feeling is no! But is it a problem? For some parallel simulations (parallel Monte Carlo simulations etc.) it might be, but if the cogs samples on different clocks of the PRNG it should be independent!

    SIDcog - The sound of the Commodore 64 in a single cog: Thread, OBEX
  • evanhevanh Posts: 8,586
    edited 2017-03-08 - 16:29:18
    Chip,
    I've got a working 130-bit solution that produces nice 64-bit results that PractRand likes. Basically just extended s0 and s1 by one bit each to push the carry out of reach when summing. It still relies on the same shift and rotate constants, of 55, 14 and 36, as the original xoroshiro algorithm.

    - DELETED BUGGY CODE - See fixed version - http://forums.parallax.com/discussion/comment/1403058/#Comment_1403058

    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • You have never compiled that now have you. Because, well, it does not compile.
  • evanhevanh Posts: 8,586
    edited 2017-03-08 - 12:47:46
    I get a warning but it compiles runnable code. I've just now tweaked it for clarity.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • evanhevanh Posts: 8,586
    edited 2017-03-08 - 12:58:45
    There was a second code ordering warning that briefly came into existence when I was shuffling things around, but I think that never caused any errors due to it being declared "inlined".

    EDIT: Ah, you were right, that second one was an error. Oops, It was only up for a moment. :D
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • $ gcc evanh.c
    evanh.c: In function ‘rotl’:
    evanh.c:14:54: error: ‘mask65bits’ undeclared (first use in this function)
      return (value << shift) | (value >> (65 - shift)) & mask65bits;
    
  • Heater.Heater. Posts: 21,233
    edited 2017-03-08 - 13:12:10
    $ gcc -Wall -o evanh evanh.c
    evanh.c: In function ‘rotl’:
    evanh.c:16:52: warning: suggest parentheses around arithmetic in operand of ‘|’ [-Wparentheses]
      return (value << shift) | (value >> (65 - shift)) & MASK_65_BITS;
    

    However this compiles clean:
    #include <limits.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    
    typedef  unsigned __int128  uint128_t;
    #define MASK_65_BITS ((0x7fffffffffffffff << 2) | 0xff)
    
    // Seed the state array.  Seed MUST not be all zero.
    uint128_t  s[2] = {1, 0};
    
    static inline  uint128_t rotl(uint128_t value, int shift) {
    	return ((value << shift) | (value >> (65 - shift))) & MASK_65_BITS;
    }
    
    uint64_t  next(void) {
    	uint128_t  s0 = s[0];
    	uint128_t  s1 = s[1];
    	uint64_t  result = (s0 + s1) >> 1;
    
    	s1 ^= s0;
    	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14 & MASK_65_BITS); // a, b
    	s[1] = rotl(s1, 36); // c
    
    	return result;
    }
    
    int  main(int argc, char* argv[])
    {
    	uint64_t  random64[64];
    	int  i;
    
    	// Print some randomness
    	while(1)
    	{
    		random64[0] = next();
    
    		for (i = 7; i >= 0; i--) {
    			putchar(random64[0] >> (i * 8));
    		}
    	}
    	return 0;
    }
    
  • I had it all sorted as you made your first post.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Sadly the evanh PRNG scores 5 "weak" results in Dieharder. See attachment.

    How would we ever know what the period of the evanh generator is?
  • Heater.Heater. Posts: 21,233
    edited 2017-03-08 - 14:18:47
    Also sadly the evanh PRNG fails PractRand badly:
    $ ./evanh.exe | PractRand_0.93/RNG_test.exe stdin
    RNG_test using PractRand version 0.93
    RNG = RNG_stdin, seed = 0xef0aa058
    test set = normal, folding = standard(unknown format)
    
    rng=RNG_stdin, seed=0xef0aa058
    length= 32 megabytes (2^25 bytes), time= 3.2 seconds
      Test Name                         Raw       Processed     Evaluation
      [Low1/32]Gap-16:B                 R=  -4.3  p =1-9.8e-4   unusual
      ...and 129 test result(s) without anomalies
    
    ...
    
    rng=RNG_stdin, seed=0xef0aa058
    length= 1 gigabyte (2^30 bytes), time= 107 seconds
      Test Name                         Raw       Processed     Evaluation
      DC6-9x1Bytes-1                    R=  +8.2  p =  2.0e-4   mildly suspicious
      ...and 182 test result(s) without anomalies
    
    rng=RNG_stdin, seed=0xef0aa058
    length= 2 gigabytes (2^31 bytes), time= 211 seconds
      Test Name                         Raw       Processed     Evaluation
      DC6-9x1Bytes-1                    R= +13.6  p =  2.3e-7   very suspicious
      ...and 193 test result(s) without anomalies
    
    rng=RNG_stdin, seed=0xef0aa058
    length= 4 gigabytes (2^32 bytes), time= 417 seconds
      Test Name                         Raw       Processed     Evaluation
      BCFN(2+0,13-0,T)                  R= +10.6  p =  3.3e-5   mildly suspicious
      DC6-9x1Bytes-1                    R= +31.3  p =  3.8e-15    FAIL !
      ...and 201 test result(s) without anomalies
    $
    
    Excellent attempt mind you.
  • Bugger, the mask is failing! It's all ones - doing nothing. Didn't do a though job did I. :(

    I'm going to bed. I'll have another shot tomorrow ...
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Okay I didn't go to bed. Got it right this time ...
    #include <limits.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    
    typedef  unsigned __int128  uint128_t;
    
    
    static uint128_t  mask65bits = 0xffffffffffffffff;  // A workaround for lack of 128bit constant init in gcc.
    //#define MASK_65_BITS ((0x7fffffffffffffff << 2) | 0xff)
    
    // Seed the state array.  Seed MUST not be all zero.
    static uint128_t  s[2] = {1, 0};
    
    
    static inline  uint128_t rotl(uint128_t value, int shift) {
    //	return (value << shift) | (value >> (sizeof(value) * CHAR_BIT - shift));
    	return (value << shift) | (value >> (65 - shift)) & mask65bits;
    //	return (value << shift) | (value >> (65 - shift)) & MASK_65_BITS;
    }
    
    
    static uint64_t  next(void) {
    	uint128_t  s0 = s[0];
    	uint128_t  s1 = s[1];
    	uint64_t  result = (s0 + s1) >> 1;
    
    	s1 ^= s0;
    	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14) & mask65bits; // a, b
    //	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14) & MASK_65_BITS; // a, b
    	s[1] = rotl(s1, 36); // c
    
    	return result;
    }
    
    
    
    int  main(int argc, char* argv[])
    {
    	uint64_t  random64[64];
    	int  i, j;
    
    
    	mask65bits = (mask65bits << 1) + 1;
    /*	for (i = 15; i >= 0; i--) {
    		j = (uint8_t)(mask65bits >> (i * 8));
    //		j = (uint8_t)(MASK_65_BITS >> (i * 8));
    		printf( "%02x ", j );
    	}
    	printf("\n");*/
    	// Print some randomness
    	while(1)
    	{
    		random64[0] = next();
    
    		for (i = 7; i >= 0; i--) {
    			putchar(random64[0] >> (i * 8));
    		}
    	}
    	return 0;
    }
    
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • MJBMJB Posts: 1,139
    I am following this thread just for curiosity.

    If I am developing an algorithm (monte carlo simulations) it is important to be able to SEED my PRNG so I can have different runs I can relate/compare. Storing huge numbers of random numbers on the memory limited P2 is no option.
    As I could use any seed that is all the randomness I can think of.
    I personally see no need for a REAL random number, whatever this will be ...

    So now I am not sure if this 'deterministic PRNG',
    which gives the same sequence of PRNs every time for a given seed, is still in there ... ??

    And I am used to determinism from P1 ;-)



  • For total predictable control of the generated numbers you should be using your own software based PRNG.

    The hardware one never was particularly repeatable for a given program due to it being a free-running generator.
    "Those who think we are powerless to do anything about the greenhouse effect forget about the 'White House effect,'" - George H.W. Bush, 1988.
  • Heater. wrote: »

    The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :

    1) Create 16 instances of the PRNG hardware.

    That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.

    This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.

    I posted the jump() function here earlier that does exactly that.

    This solution is heavy on hardware resources. And that seeding business is better done in software.

    What about making it pipelined like the cordic unit?
    I mean iterating the combinatorial part (the taps and gates) to operate on 16 separate bit vectors:
    - load seed #n
    - apply RNG step to seed #n
    - save seed #n
    Once initialized, every sliced generator would mantain it's own state.

    I guess this only make sense if the vectors are placed in an addressable memory, otherwise more logic is wasted than saved, due to switching.
Sign In or Register to comment.