Random/LFSR on P2

Heater. · 2017-03-08 00:57

Phil,

I'm not sure...

1. Assuming there is some sketchy hardware process, I'm not sure it needs Von Neumann "whitening". Any old random bits will do as as seed.

2. No. there is only one PRNG in the whole machine. Not one per COG. As far as I understand the plan is that it churns away stepping though it's sequence no matter if anyone reads it or not.

The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time. I'm not sure if it is necessary or even works!

Just now I cannot get Chip's verilog version of this PRNG to pass the PractRand test when run under the Icarus simulator. Quite likely my ignorance of verilog though.

evanh · 2017-03-08 01:16

And any software can later reseed it I suspect. Although it's only a 32-bit seed. This was recently added to the multi-function SETCLK instruction - http://forums.parallax.com/discussion/comment/1402352/#Comment_1402352.

evanh · 2017-03-08 01:20

Err, 31-bit seed. Since bit 31 must be set.

Heater. · 2017-03-08 01:22

Only 32 bit seed.

The state of this PRNG is 128 bits.

So while one might be able to disturb the generator one cannot reproduce the original seed and get the same sequence back again.

Phil Pilgrim (PhiPi) · 2017-03-08 01:57

heater wrote:

No. there is only one PRNG in the whole machine. Not one per COG.

Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?

-Phil

evanh · 2017-03-08 04:20

Heater. wrote: »

So while one might be able to disturb the generator one cannot reproduce the original seed and get the same sequence back again.

There's no way to be sure of reproducibility in the first place anyway. It's not like any software is in programmatic control of the sequence.

evanh · 2017-03-08 04:30

Phil Pilgrim (PhiPi) wrote: »

Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?

I believe that's just a nicety to use different parts of the whole. The real benefit is the long repeat cycle. As Heater mentioned, it's really a 128-bit number.

We've been thrashing it already but, now you've mention like this, maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.

Ahle2 · 2017-03-08 08:19

evanh wrote: »

Phil Pilgrim (PhiPi) wrote: »

Okaaay ... what then is the point of deriving 16 32-bit numbers from the original 64?

Maybe the summing should be done differently to produce 32 bit results instead of 63 bit results.

But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself. If we made such an change we would have to prove it worthy and the P2 would get delayed for month!

jmg · 2017-03-08 08:39

Ahle2 wrote: »

But that would kind of produce a new PRNG that has not been tested, analyzed and proved "worthy". The designers behind Xoroshiro has been tweaking, analyzing, comparing it with numerous other PRNG's and running it through endless tests for years. I think our naive thinking that maybe using the reminder from the 64 bit addition as a substitute for the original LSB to get a 64 bit result proves my point. We can not allow any "randomness" (or arbitrarity of you want) in the algorithm itself.

That raises a good point about those 63->32 taps that Chip allocated - have those taps been proven to be OK ?
It seems like code should be run with exactly those taps, and checked on all of the 16 32b results given.
- just to ensure there are no surprises.

Phil Pilgrim (PhiPi) · 2017-03-08 09:01

heater wrote:

The weird fan out and invert thing is a way for each COG to select 32 bits from the 63 available from the PRNG at any time.

Ugh, that's somewhat more disturbing. I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way. But maybe the DieHard boffins will prove me wrong.

Anyway, what "randomness" is being tested? The bit-by-bit randomness, or the integer values they represent? Those are two very different things.

-Phil

Heater. · 2017-03-08 09:04

jmg,

...have those taps been proven to be OK ?

That is just what I was trying to test yesterday.

I added those taps to the PRNG Verilog that Chip posted. Compile it with a test bench using the Icarus Verilog compiler and run it with the Icarus simulator.

The results failed the PractRand test almost immediately.

However I know nothing of verilog so I have quite likely messed up my test harness. Then I may also have an error in the reformatting of data required to get it into PractRand.

Will continue ....

Ahle2 · 2017-03-08 09:22

As far as I understand, Dieharder and Practrand has numerous tests for determing the randomness on a bit level. Some of the tests sees the incoming random numbers as a sea of bits rather than 32 bits numbers or some such. Then they run suites that tests for different "bit spacing" (in lack of a better term) in this sea. Like for example every other bit or every 7th bit and so on.

And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.

Phil Pilgrim (PhiPi) · 2017-03-08 09:49

Ahle2 wrote:

And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.

Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?

-Phil

evanh · 2017-03-08 09:52

Yeah, we've found good testing tools to verify this ourselves now. Not only that but remember the summing is outside of the randomiser sequencing loop. It's there as an extraction step to pull 128 bits down to 63 bits.

So, we could do our own 128 -> 32 summing without the basic sequencing changing. Then put the results through the testers we've now got and see how it fairs. If it holds up then that'll eliminate the arbitrary mapping of bits to Cogs.

Heater. · 2017-03-08 10:04

Phil,

I can't imagine that there wouldn't be non-zero correlations among the 16 cogs doing it that way.

I'm sure you are right.

The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :

1) Create 16 instances of the PRNG hardware.

That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.

This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.

I posted the jump() function here earlier that does exactly that.

This solution is heavy on hardware resources. And that seeding business is better done in software.

2) Have one instance of the PRNG hardware.

Just give every COG the next number generated. If this were a HUB resource then it would be accessed in a round-robin fashion. Nobody would ever get the same number.

Is the PRNG a HUB resource?

Is distributing successive numbers to COGS sufficiently uncorrelated?

This is not a solution those guys running massively parallel simulations would use because it requires all the processes to communicate with the single PRNG to get a random number. That's too slow, too much communication overhead. But I don't see why it is not workable for the Propeller.

3) Do what Chip has done.

Pull out 32 of the 63 bits, and feed them to each COG. Now they could all read from the same generated number at the same time but they all get a different arrangement of bits.

I must admit this seems a bit "smelly".

But don't forget this PRNG is running continuously, stepping through it's sequence, even if nobody is reading from it.

Ahle2 · 2017-03-08 10:06

Phil Pilgrim (PhiPi) wrote: »

Ahle2 wrote:

And if any of the discussed PRNG's is anything near good, tapping arbitrary bits from the 63 available, should theoretically be equally good.

Yes, true enough, on an individual basis. But what about cross-correlations among cogs that sample different bits from the same 63-bit value? Can we still assure statistical independence?

-Phil

My gut feeling is no! But is it a problem? For some parallel simulations (parallel Monte Carlo simulations etc.) it might be, but if the cogs samples on different clocks of the PRNG it should be independent!

evanh · 2017-03-08 12:21

Chip,
I've got a working 130-bit solution that produces nice 64-bit results that PractRand likes. Basically just extended s0 and s1 by one bit each to push the carry out of reach when summing. It still relies on the same shift and rotate constants, of 55, 14 and 36, as the original xoroshiro algorithm.

- DELETED BUGGY CODE - See fixed version - http://forums.parallax.com/discussion/comment/1403058/#Comment_1403058

Heater. · 2017-03-08 12:44

You have never compiled that now have you. Because, well, it does not compile.

evanh · 2017-03-08 12:46

I get a warning but it compiles runnable code. I've just now tweaked it for clarity.

evanh · 2017-03-08 12:51

There was a second code ordering warning that briefly came into existence when I was shuffling things around, but I think that never caused any errors due to it being declared "inlined".

EDIT: Ah, you were right, that second one was an error. Oops, It was only up for a moment.

Heater. · 2017-03-08 13:02

$ gcc evanh.c
evanh.c: In function ‘rotl’:
evanh.c:14:54: error: ‘mask65bits’ undeclared (first use in this function)
  return (value << shift) | (value >> (65 - shift)) & mask65bits;

Heater. · 2017-03-08 13:08

$ gcc -Wall -o evanh evanh.c
evanh.c: In function ‘rotl’:
evanh.c:16:52: warning: suggest parentheses around arithmetic in operand of ‘|’ [-Wparentheses]
  return (value << shift) | (value >> (65 - shift)) & MASK_65_BITS;

However this compiles clean:

#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

typedef  unsigned __int128  uint128_t;
#define MASK_65_BITS ((0x7fffffffffffffff << 2) | 0xff)

// Seed the state array.  Seed MUST not be all zero.
uint128_t  s[2] = {1, 0};

static inline  uint128_t rotl(uint128_t value, int shift) {
	return ((value << shift) | (value >> (65 - shift))) & MASK_65_BITS;
}

uint64_t  next(void) {
	uint128_t  s0 = s[0];
	uint128_t  s1 = s[1];
	uint64_t  result = (s0 + s1) >> 1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14 & MASK_65_BITS); // a, b
	s[1] = rotl(s1, 36); // c

	return result;
}

int  main(int argc, char* argv[])
{
	uint64_t  random64[64];
	int  i;

	// Print some randomness
	while(1)
	{
		random64[0] = next();

		for (i = 7; i >= 0; i--) {
			putchar(random64[0] >> (i * 8));
		}
	}
	return 0;
}

evanh · 2017-03-08 13:33

I had it all sorted as you made your first post.

Heater. · 2017-03-08 13:54

Sadly the evanh PRNG scores 5 "weak" results in Dieharder. See attachment.

How would we ever know what the period of the evanh generator is?

Heater. · 2017-03-08 14:05

Also sadly the evanh PRNG fails PractRand badly:

$ ./evanh.exe | PractRand_0.93/RNG_test.exe stdin
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0xef0aa058
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0xef0aa058
length= 32 megabytes (2^25 bytes), time= 3.2 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/32]Gap-16:B                 R=  -4.3  p =1-9.8e-4   unusual
  ...and 129 test result(s) without anomalies

...

rng=RNG_stdin, seed=0xef0aa058
length= 1 gigabyte (2^30 bytes), time= 107 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R=  +8.2  p =  2.0e-4   mildly suspicious
  ...and 182 test result(s) without anomalies

rng=RNG_stdin, seed=0xef0aa058
length= 2 gigabytes (2^31 bytes), time= 211 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R= +13.6  p =  2.3e-7   very suspicious
  ...and 193 test result(s) without anomalies

rng=RNG_stdin, seed=0xef0aa058
length= 4 gigabytes (2^32 bytes), time= 417 seconds
  Test Name                         Raw       Processed     Evaluation
  BCFN(2+0,13-0,T)                  R= +10.6  p =  3.3e-5   mildly suspicious
  DC6-9x1Bytes-1                    R= +31.3  p =  3.8e-15    FAIL !
  ...and 201 test result(s) without anomalies
$

Excellent attempt mind you.

evanh · 2017-03-08 15:06

Bugger, the mask is failing! It's all ones - doing nothing. Didn't do a though job did I.

I'm going to bed. I'll have another shot tomorrow ...

evanh · 2017-03-08 15:26

Okay I didn't go to bed. Got it right this time ...

#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

typedef  unsigned __int128  uint128_t;


static uint128_t  mask65bits = 0xffffffffffffffff;  // A workaround for lack of 128bit constant init in gcc.
//#define MASK_65_BITS ((0x7fffffffffffffff << 2) | 0xff)

// Seed the state array.  Seed MUST not be all zero.
static uint128_t  s[2] = {1, 0};


static inline  uint128_t rotl(uint128_t value, int shift) {
//	return (value << shift) | (value >> (sizeof(value) * CHAR_BIT - shift));
	return (value << shift) | (value >> (65 - shift)) & mask65bits;
//	return (value << shift) | (value >> (65 - shift)) & MASK_65_BITS;
}


static uint64_t  next(void) {
	uint128_t  s0 = s[0];
	uint128_t  s1 = s[1];
	uint64_t  result = (s0 + s1) >> 1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14) & mask65bits; // a, b
//	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14) & MASK_65_BITS; // a, b
	s[1] = rotl(s1, 36); // c

	return result;
}



int  main(int argc, char* argv[])
{
	uint64_t  random64[64];
	int  i, j;


	mask65bits = (mask65bits << 1) + 1;
/*	for (i = 15; i >= 0; i--) {
		j = (uint8_t)(mask65bits >> (i * 8));
//		j = (uint8_t)(MASK_65_BITS >> (i * 8));
		printf( "%02x ", j );
	}
	printf("\n");*/
	// Print some randomness
	while(1)
	{
		random64[0] = next();

		for (i = 7; i >= 0; i--) {
			putchar(random64[0] >> (i * 8));
		}
	}
	return 0;
}

MJB · 2017-03-08 16:13

I am following this thread just for curiosity.

If I am developing an algorithm (monte carlo simulations) it is important to be able to SEED my PRNG so I can have different runs I can relate/compare. Storing huge numbers of random numbers on the memory limited P2 is no option.
As I could use any seed that is all the randomness I can think of.
I personally see no need for a REAL random number, whatever this will be ...

So now I am not sure if this 'deterministic PRNG',
which gives the same sequence of PRNs every time for a given seed, is still in there ... ??

And I am used to determinism from P1 ;-)

evanh · 2017-03-08 16:25

For total predictable control of the generated numbers you should be using your own software based PRNG.

The hardware one never was particularly repeatable for a given program due to it being a free-running generator.

AntoineDoinel · 2017-03-08 17:14

Heater. wrote: »

The problem is how to create 16 independent, uncorrelated PRNGs. What to do? :

1) Create 16 instances of the PRNG hardware.

That leaves the problem of seeding them all so that they all use fifferent parts of the PRNG sequence.

This PRNG has a sequence length of 2^128 so it is possible to create up to 2^64 seeds that can be used to get non-overlapping subsets of the PRNG sequence. Each subset being a sequence of length 2^64.

I posted the jump() function here earlier that does exactly that.

This solution is heavy on hardware resources. And that seeding business is better done in software.

What about making it pipelined like the cordic unit?
I mean iterating the combinatorial part (the taps and gates) to operate on 16 separate bit vectors:
- load seed #n
- apply RNG step to seed #n
- save seed #n
Once initialized, every sliced generator would mantain it's own state.

I guess this only make sense if the vectors are placed in an addressable memory, otherwise more logic is wasted than saved, due to switching.

Random/LFSR on P2

Comments