Random/LFSR on P2

evanh · 2017-04-13 07:09

Out of the 125 there was 5 or 6 that terminated at 2 GB of random data. And maybe 4 more in the wider 500 set.

cgracey · 2017-04-13 07:39

evanh wrote: »

I automated 125 test runs of the a,b,c combinations resulting in 125 Practrand result output files. The longer a particular test lasts the bigger the result file becomes. I examined only the biggest files and posted my favourite from them.

I've since widened the combinations to 500 or so but not got anything better.

But, how come you were testing some patterns that are not 2^32-1 in length? All those three-digit hex numbers I listed represent max-length patterns. Those would be the starting point for testing, since they are known to iterate properly.

evanh · 2017-04-13 07:52

I wasn't trying to find max repeat. I just set a range for each of the three constants. For 'a' the range was 11-15, for 'c' range was 6-10, and for 'b' range was 1-5.

evanh · 2017-04-13 07:54

And recompiled for each test run. I use #defines for the tuning inputs and pass them to GCC from bash script.

cgracey · 2017-04-13 08:02

I see. We really wouldn't want to use a non-max-length set, though, because it will exhibit bad behavior at some point.

Would you be interested enough to try some of those values I listed? I don't want to keep bugging you about this, but I don't know how to work the Google here. I need to learn, though. I'm wondering if there is a really good pattern among those my test revealed. If there is a really good one, we can augment FUNCS to iterate that thing in two clocks. The summing would have to be done separately, though. The iteration consists only of XORs at hardwired rotate and shift offsets.

evanh · 2017-04-13 08:08

cgracey wrote: »

Those would be the starting point for testing, since they are known to iterate properly.

Of that list I'd choose 11,2,6 since its constants conform to original better than the others. I'll dig up the PRscore for it ...

Heater. · 2017-04-13 08:14

I think I'm missing a point here. Why not just use a known, analysed, mathematically proved, maximal length LFSR like the one from Bruce Schnider I posted earlier?

cgracey · 2017-04-13 08:50

Heater. wrote: »

I think I'm missing a point here. Why not just use a known, analysed, mathematically proved, maximal length LFSR like the one from Bruce Schnider I posted earlier?

Those are not high-quality. What if we could realize a 32-bit xoroshiro-type, instead? At only 32 bits of state, we can fully test it, ourselves.

Roy Eltham · 2017-04-13 08:55

Chip,
I'm with Heater, I don't understand why you think "those are not high-quality" for the known, analysed, mathematically proved, maximal length LFSR. For the 32bit case, they are commonly used, and considered good. Am I missing something?

cgracey · 2017-04-13 09:11

Roy Eltham wrote: »

Chip,
I'm with Heater, I don't understand why you think "those are not high-quality" for the known, analysed, mathematically proved, maximal length LFSR. For the 32bit case, they are commonly used, and considered good. Am I missing something?

LFSRs fail those random-quality tests immediately. That's why we implemented xoroshiro128+, instead of the LFSR that I had in there, before.

I'm thinking it would be good to have a similarly high-quality PRNG available in software, using a single long, so that it can be seeded and stepped. All we need to know is the best set of three 4-bit terms from among those sets that we already know are maximal-length, and then we'll have it.

jmg · 2017-04-13 10:03

evanh wrote: »

Out of the 125 there was 5 or 6 that terminated at 2 GB of random data. And maybe 4 more in the wider 500 set.

Does that mean you need a 33bit shifter, to give Chip's desired 4GB pass ?

evanh · 2017-04-13 10:47

evanh wrote: »

cgracey wrote: »

Those would be the starting point for testing, since they are known to iterate properly.

Of that list I'd choose 11,2,6 since its constants conform to original better than the others. I'll dig up the PRscore for it ...

Here we go:

evanh · 2017-04-13 10:52

jmg wrote: »

evanh wrote: »

Out of the 125 there was 5 or 6 that terminated at 2 GB of random data. And maybe 4 more in the wider 500 set.

Does that mean you need a 33bit shifter, to give Chip's desired 4GB pass ?

There's two halves at 16 bits each. See http://forums.parallax.com/discussion/comment/1407413/#Comment_1407413

evanh · 2017-04-13 10:59

Here's the actual random generator source I'm currently work with:

//#define  RESULT_SIZE  15   // Passed in as -D gcc parameter. (Min=8, Max=63) Bit width of generated random numbers.
//#define  CONSTANT_A  11   // Passed in as -D gcc parameter.
//#define  CONSTANT_B  2   // Passed in as -D gcc parameter.
//#define  CONSTANT_C  6   // Passed in as -D gcc parameter.

#define  ACCUM_MASK  (((1UL<<RESULT_SIZE)-1)<<1|1)   // One more bit than results.

// Seed the state array.  Seed MUST not be all zero.
static uint64_t  s[2] = {1, 0};


static inline  uint64_t rotl(uint64_t value, int shift) {
	return (value << shift) | ((value & ACCUM_MASK) >> (RESULT_SIZE + 1 - shift)) & ACCUM_MASK;
}


static uint64_t  next(void) {
	uint64_t  s0 = s[0];
	uint64_t  s1 = s[1];
	uint64_t  result = ((s0 + s1) & ACCUM_MASK) >> 1;   // Remove lsb because not quite so random.

	s1 ^= s0;
	s[0] = rotl(s0, CONSTANT_A) ^ s1 ^ (s1 << CONSTANT_B) & ACCUM_MASK;
	s[1] = rotl(s1, CONSTANT_C);

	return result;
}

evanh · 2017-04-13 14:40

One more detail to the all even constants instant fail criteria, it also requires the bit size of the internal state registers to be even as well. If their size is odd then the tests are much more consistent throughout the whole range of combinations and the all-even-constants has no negative impact.

cgracey · 2017-04-13 17:55

evanh wrote: »

One more detail to the all even constants instant fail criteria, it also requires the bit size of the internal state registers to be even as well. If their size is odd then the tests are much more consistent throughout the whole range of combinations and the all-even-constants has no negative impact.

I was thinking, what if we had a 12+12-bit xoroshiro24+ in a long, where we could iterate the two bottom 12-bit fields and put the top 8 bits of the 12+12-bit sum into the top byte, with bits 23..0 being the seed-able state? 24 bits would give only 16M states, but most software applications need way less than that, even. Having it repeatable and high-quality is most important. If we could get a byte out each iteration, it would be worthwhile.

	REP	#2,#4
	XORO24	state
	ROLBYTE	rnd,state,#3

cgracey · 2017-04-13 18:01

Ideally, it would be nice if we had some instruction to just step forward or backward to the next or previous 32-bit, high-quality, pseudo-random value. How to do this, though, where the state IS the value AND it's high-quality? A giant lookup table would be obvious, but impractical.

Heater. · 2017-04-13 18:02

Now I really don't get it. With a Schnider 32 bit LFSR, or similar, you get a 2^32 -1 sequence and surely the "quality" plenty good enough.

Roy Eltham · 2017-04-13 18:07

Heater,
I guess it depends on what is considered "quality" for a RNG. Non-repeating sequence is just one factor it seems. I guess they check for things like patterns on certain bits or groups of bits, maybe? Perhaps also getting to many sequential hits or to many numbers "close" to each other in a short set? Something like that?

msrobots · 2017-04-13 18:23

I still do not see any need for a repeatable random number, because it defeats the purpose of being - hmm - random.

If I need a random number in code, I want something to be not repeatable, else I can just write my own sequence. The main purpose is that we do have not the same sequence over and over again.

And that part is done with the free running xoroshiro128.

Now one can argument that for testing purpose a repeatable sequence could be nice, but if you later switch to a non repeatable version, what could you gain from testing?

And if someone really wants or needs a repeatable sequence, Isn't it not just simpler to use a second xoroshiro128, this time not free running, but able to be seeded?

confused again,

Mike

Heater. · 2017-04-13 18:26

Clearly the length of the sequence is not enough by itself. After all counting up from zero and rolling over is a long 32 bit sequence. Obviously not looking very random!

There are all kinds of statistical tests for "randomness", most of which I do not understand. For example you can estimate PI from a bunch of random numbers easily enough, if you don't get a good estimate of PI from the given numbers then something is up with them.

One obvious test is that runs of two "1"s in a row should be twice as likely as three, which should be twice as likely as four, etc.

I can't help but think that a well known and good 32 bit LFSR is plenty good enough without trying to invent your own by trial and error.

cgracey · 2017-04-13 19:32

The original LFSR I was using in the hub, complete with bit rewiring and NOT'ing of particular bits, done to disguise that it was really just an LFSR, failed the random-quality tests immediately, right out of the gate. So, an LFSR does not a good PRNG make.

The xoroshiro128+ is maybe the best PRNG today. We can see that by running the tests. Xoroshiro is, at heart, a topology which should be scalable. For the three 6-bit shift values chosen, there were likely several other candidates, out of the 216 possibilities. The xoroshiro128+ topology should be both scalable (i.e. 32 or 24 bits, instead of 128) and its shift factors alterable, without losing its essence.

Remember that game "Simon" from the 1970's, where a series of lights and sounds would be played and the player had to recite them by pressing the same pattern in? And it kept adding one light/tone to the sequence with each play iteration? That is a perfect example of where you'd want a seed-able and repeatable PRNG. That such a PRNG could be made to pass stringent quality checks would be great. And with limited 32-bit or 24-bit state and good test batteries available, there's no need to invent anything. Just tweak the recipe. The tests will tell you when you've got something good. There's no magic involved. Nobody needs to be a cryptologist to arrive at an optimal solution.

Heater. · 2017-04-13 19:53

I understand the need for a repeatable PRNG. If only for software testing where you want the same test to produce the same result each time. Or perhaps in simulations where someone may want to reproduce your results perhaps with a simulation written in a different language.

xorshiro is great but it is far from the best PRNG. Cryptographers would not want to use it for encrypting messages or generating keys and certificates. Although the reasons for that are beyond my understanding.

I can't help but think that with only 32 bits of state a well known, analysed and mathematically provably long sequence LSFR is the best one can do. People have been studying these things for decades now.

Thus saving a lot of P2 development time

Grief, was "Simon" such a long time ago....?

cgracey · 2017-04-13 20:35

evanh wrote: »

evanh wrote: »

cgracey wrote: »

Those would be the starting point for testing, since they are known to iterate properly.

Of that list I'd choose 11,2,6 since its constants conform to original better than the others. I'll dig up the PRscore for it ...

Here we go:

Evanh, thanks a lot for checking these out. So, {11,2,6} looks pretty good. It goes to 256MB before showing any signs of failure at 512MB. I think maybe that would be like xoroshiro128+ going to 2^124 MB before failing. I wonder if any of those other full-length combos would do much better. Good to know that {11,2,6} is pretty solid.

cgracey · 2017-04-13 20:46

Heater. wrote: »

I understand the need for a repeatable PRNG. If only for software testing where you want the same test to produce the same result each time. Or perhaps in simulations where someone may want to reproduce your results perhaps with a simulation written in a different language.

xorshiro is great but it is far from the best PRNG. Cryptographers would not want to use it for encrypting messages or generating keys and certificates. Although the reasons for that are beyond my understanding.

I can't help but think that with only 32 bits of state a well known, analysed and mathematically provably long sequence LSFR is the best one can do. People have been studying these things for decades now.

Thus saving a lot of P2 development time

Grief, was "Simon" such a long time ago....?

We've absolutely determined that this algorithm is both maximal-length and good to 256MB:

uint16_t next(void) {
	const uint16_t s0 = s[0];
	const_uint16_t s1 = s[1];
	const uint16_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 11) ^ s1 ^ (s1 << 2);
	s[1] = rotl(s1, 6);

	return result;
}

Because its state is only 32 bits, we can know this through brute-force testing.

Heater. · 2017-04-13 21:02

Can't argue with that.

In my naivety I would expect a good PRNG with only 32 bits of state to have a cycle length of 2^32 - 1 or so and to be statistically pass as random out to 4 billion iterations. Give or take a few.

Is it so that what you have there does better than the Schnider LFSR ? Or others ?

If so, that is great.

cgracey · 2017-04-13 21:17

Heater. wrote: »

Can't argue with that.

In my naivety I would expect a good PRNG with only 32 bits of state to have a cycle length of 2^32 - 1 or so and to be statistically pass as random out to 4 billion iterations. Give or take a few.

Is it so that what you have there does better than the Schnider LFSR ? Or others ?

If so, that is great.

The 32 bits of state do repeat after 2^32-1 iterations, but the state is not the output. The output is the top 15 bits of the sum of the two halves (top and bottom words). This is the xoroshiro topology.

If it's starting to fail at 512MB, that may be because it's only good for 256M iterations, where each iteration yields 15 bits. To get to 512MB, there would have to be over 256M iterations, yielding 15 bits, each.

I would like to see this tested just using the MSB of the sum as input to the test suite. Evanh, would you mind doing that? Just testing the MSB of the sum? It would take 8 iterations for each byte of output. I wonder what the failure point would be, then. I suspect it might fully iterate through all of its 2^32-1 states before failing, in that case.

evanh · 2017-04-14 01:23

cgracey wrote: »
I was thinking, what if we had a 12+12-bit xoroshiro24+ in a long, where we could iterate the two bottom 12-bit fields and put the top 8 bits of the 12+12-bit sum into the top byte, with bits 23..0 being the seed-able state? 24 bits would give only 16M states, but most software applications need way less than that, even. Having it repeatable and high-quality is most important. If we could get a byte out each iteration, it would be worthwhile.
	REP	#2,#4
	XORO24	state
	ROLBYTE	rnd,state,#3

I happened to do a quick run of that last night. Attached is the full set of tests. {8,2,3} is the only test that makes it to 64 MB.

cgracey · 2017-04-14 01:33

evanh wrote: »
cgracey wrote: »
I was thinking, what if we had a 12+12-bit xoroshiro24+ in a long, where we could iterate the two bottom 12-bit fields and put the top 8 bits of the 12+12-bit sum into the top byte, with bits 23..0 being the seed-able state? 24 bits would give only 16M states, but most software applications need way less than that, even. Having it repeatable and high-quality is most important. If we could get a byte out each iteration, it would be worthwhile.
	REP	#2,#4
	XORO24	state
	ROLBYTE	rnd,state,#3
I happened to do a quick run of that last night. Attached is the full set of tests. {8,2,3} is the only test that makes it to 64 MB.

There's only 16M iterations in 24 bits, so the random tester must have clued into the pattern after the 4th full cycle. What else could explain that?

evanh · 2017-04-14 01:46

cgracey wrote: »

I would like to see this tested just using the MSB of the sum as input to the test suite. Evanh, would you mind doing that? Just testing the MSB of the sum? It would take 8 iterations for each byte of output. I wonder what the failure point would be, then. I suspect it might fully iterate through all of its 2^32-1 states before failing, in that case.

Oh, ah, The B in MSB is Byte, right?

So you want to truncate each 12-bit sum to the top 8 bits and spit those individually at PractRand rather than my existing concatenating, correct?

Random/LFSR on P2

Comments