Random/LFSR on P2

evanh · 2017-10-31 12:10

scro wrote: »

Um... what's this parity bit 0 thing exactly? I've seen it mentioned a number of times here, but I've never seen in actually defined, unless it was in Verilog code I didn't bother trying to decipher.

Tony was reading something about parity being used on the PowerBASIC forums and I tried a few variations, even getting an improvement on the default Xoroshiro algorithm, before Chip dropped in on us and suggested we include the final carry bit as well - http://forums.parallax.com/discussion/comment/1423893/#Comment_1423893

evanh · 2017-10-31 12:25

Here's an example of how it was with regular Xoroshiro algorithm - http://forums.parallax.com/discussion/comment/1423849/#Comment_1423849

"Word0" is the entire 16 bits of the summing.

Also, note the msBit column. It is lightly sprinkled with some poor scores ... Now check out the msBit column of a later modification when I had put the parity at the msBit position - http://forums.parallax.com/discussion/comment/1424022/#Comment_1424022

Only a single non-perfect score!

evanh · 2017-10-31 12:39

I just noticed Tony's summaries of my scores has more precise labelling. Looks like I should include those in output from test scripts.

TonyB_ · 2017-10-31 12:41

Here is my original post from last week that started the parity discussion:
http://forums.parallax.com/discussion/comment/1423789/#Comment_1423789

The idea by David Roberts of using parity is the important point, the PowerBASIC details less so. I joined that forum to thank him but I haven't been approved yet, so if David Roberts is reading this post - thank you! (tags Xorshift PRNG, xorshift128+, xoroshiro128+, xoroshiro).

I think David Roberts uses 64-bit parity not the full 65-bit parity possible. We use 17-bit parity in the XORO32 instruction (thanks to Chip). In the tests done by Evan, the improvement factor when the original bit 0 is replaced with 16-bit parity is the same as when 16-bit parity is replaced by 17-bit parity. That observation is based simply on the ratio of the full word scores in PractRand.

I have informed Sebastiano Vigna about this parity idea on the prng forum he started:
https://groups.google.com/forum/m/#!forum/prng

TonyB_ · 2017-10-31 12:49

evanh wrote: »

I just noticed Tony's summaries of my scores has more precise labelling. Looks like I should include those in output from test scripts.

Evan, if you could adopt precise bit labelling it would make your test results easier to comprehend.

cgracey · 2017-10-31 13:05

Scro,

How is it known that xoroshiro128+ indeed has 2^128-1 states? Is it possible to prove such a thing? If not, might we seed it into a corner, so to speak? It would be awful if we found out later that some time after seeding, its quality might plummet, due to it falling into a tight loop. I mean, this thing is going to run at 160 megahertz for possibly years on end, in some cases.

If such high state ranges cannot be proven, might we string together, say, four of our known XORO32 generators in some fashion, so that we'll know with certainty there will be 2^128-1 states?

evanh · 2017-10-31 13:07

TonyB_ wrote: »

I have informed Sebastiano Vigna about this parity idea on the prng forum he started:
https://groups.google.com/forum/m/#!forum/prng

Nice conversation. He warns of a few factors that I don't understand at all but the one thing that stands out at the end is that bit 1 isn't a whole lot better than bit 0. Which Scro has also pointed out. And we've even noted it in testing.

He also pointed out how slow software is at doing a wide parity calculation. There's no way he'll recommend using parity with software methods.

TonyB_ · 2017-10-31 13:12

cgracey wrote: »

Scro,

How is it known that xoroshiro128+ indeed has 2^128-1 states? Is it possible to prove such a thing? If not, might we seed it into a corner, so to speak? It would be awful if we found out later that some time after seeding, its quality might plummet, due to it falling into a tight loop. I mean, this thing is going to run at 160 megahertz for possibly years on end, in some cases.

If such high state ranges cannot be proven, might we string together, say, four of our known XORO32 generators in some fashion, so that we'll know with certainty there will be 2^128-1 states?

Mathematics proves it.

evanh · 2017-10-31 13:12

cgracey wrote: »

Scro,

How is it known that xoroshiro128+ indeed has 2^128-1 states?

Bold question. I'll be very surprised if there is an easy to digest useable answer.

evanh · 2017-10-31 13:13

TonyB_ wrote: »

Mathematics proves it.

Ya, and Chip wants to see it!

TonyB_ · 2017-10-31 13:23

evanh wrote: »

TonyB_ wrote: »

I have informed Sebastiano Vigna about this parity idea on the prng forum he started:
https://groups.google.com/forum/m/#!forum/prng

Nice conversation. He warns of a few factors that I don't understand at all but the one thing that stands out at the end is that bit 1 isn't a whole lot better than bit 0. Which Scro has also pointed out. And we've even noted it in testing.

He also pointed out how slow software is at doing a wide parity calculation. There's no way he'll recommend using parity with software methods.

He says bit 1 has LFSR artifacts, not that bits 1 and 0 are about the same. We know that bit 1 is better from the tests and replacing bit 0 with full parity gives very good results for the bottom byte.

Also he says using parity is "slower" which it is, even in assembler.

evanh · 2017-10-31 13:24

Well, here's the source as compiled ... PractRand pukes on it at 1kB.

#include <limits.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>


static uint32_t  state = 1;


static uint32_t  get_rng_output() {
	uint32_t  tmp = state;

	state ^= state << 13;
	state ^= state >> 17;
	state ^= state << 5;
	tmp += (tmp << 18) | (tmp >> 14);//barrel shift and add
	tmp += ((tmp << 7) | (tmp >> 25)) ^ ((tmp << 11) | (tmp >> 21));//two barrel shifts, an xor, and an add
	return tmp ^ state;
}



int  main(int argc, char* argv[])
{
	int  i;
	uint32_t  random;
	uint8_t  k;


	// Print some randomness
	while(1)
	{
//*
		random = get_rng_output();
		k = random;  putchar( k );
		k = random >> 8;  putchar( k );
		k = random >> 16;  putchar( k );
		k = random >> 24;  putchar( k );
//*/
/*
		for( i=0; i<16; i++ )  {
			printf( " %08x", get_rng_output() );
		}
		printf( "\n" );
*/
	}

	return 0;
}

cgracey · 2017-10-31 13:26

I think the way to fully doctor these xoroshiro algos is to XOR each bit of the sum with the parity of all bits above it, including carry.

evanh · 2017-10-31 13:28

Here's the first numbers as a text dump using the commented part of the source.

PractRand output:

evanh@control:~/hoard/rng_testing$ test | ./PractRand stdin -a -tlmin 1KB
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0x1cfbbb61
test set = normal, folding = standard(unknown format)

rng=RNG_stdin, seed=0x1cfbbb61
length= 1 kilobyte (2^10 bytes), time= 0.0 seconds
  Test Name                         Raw       Processed     Evaluation
  DC6-9x1Bytes-1                    R= +1266  p =  6.6e-350   FAIL !!!!!!!

TonyB_ · 2017-10-31 13:35

TonyB_ wrote: »

"Random Number Generators" by George Marsaglia, 2003
Download from http://digitalcommons.wayne.edu/jmasm/vol2/iss1/2/

See pages 4 and 5 for some of the theory about full period sequences in xorshift-type algorithms.

TonyB_ · 2017-10-31 13:38

cgracey wrote: »

I think the way to fully doctor these xoroshiro algos is to XOR each bit of the sum with the parity of all bits above it, including carry.

Have you changed the Verilog yet?

Doing this in C would takes ages.

evanh · 2017-10-31 13:39

TonyB_ wrote: »

Also he says using parity is "slower" which it is, even in assembler.

He was just being nice. C compilers don't know a parity from a potato.

Parity is a great tool for us because it's cheap in hardware.

EDIT: Actually assembly can be different. Both Prop1 and Prop2 generates parity into C flag with the binary logic instructions.

cgracey · 2017-10-31 13:43

evanh wrote: »

TonyB_ wrote: »

Also he says using parity is "slower" which it is, even in assembler.

He was just being nice. C compilers don't know a parity from a potato.

Parity is a great tool for us because it's cheap in hardware.

That's right. I can change the Verilog, no problem, and it wouldn't complicate the hardware very much, but is it worth doing? We need Evanh to run one of those tests on our XORO32 with this parity addition applied. I suspect it will work pretty well.

evanh · 2017-10-31 13:47

pitter patter ...

I think the way to fully doctor these xoroshiro algos is to XOR each bit of the sum with the parity of all bits above it, including carry.

Ahhhhhh ....

TonyB_ · 2017-10-31 13:49

I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.

cgracey · 2017-10-31 13:52

TonyB_ wrote: »

I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.

Yes, the parity computation would zipper down from the carry to bit 0.

TonyB_ · 2017-10-31 14:07

The result might be every bit equally good, maybe better than the best bit previously.

cgracey · 2017-10-31 14:08

TonyB_ wrote: »

The result might be every bit equally good, maybe better than the best bit previously.

It spreads the manure where it's most needed and thins it where it's not.

cgracey · 2017-10-31 14:09

Every bit gets equal richness.

evanh · 2017-10-31 14:13

Here's code I've used for post-summing carry/parity ripple:

	for( shift = ACCUM_SIZE; shift > 0; shift-- )  {
		result ^= (result & (1 << shift)) >> 1;
	}

The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.

evanh · 2017-10-31 14:26

TonyB_ wrote: »

I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.

It takes around 12 minutes to run all 84 candidates of the s16 set with SMT enabled. It's taking a decent amount longer now with SMT disabled. 17 minutes, too long I think ...

cgracey · 2017-10-31 14:28

evanh wrote: »
Here's code I've used for post-summing carry/parity ripple:
	for( shift = ACCUM_SIZE; shift > 0; shift-- )  {
		result ^= (result & (1 << shift)) >> 1;
	}
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.

Oh, sad.

TonyB_ · 2017-10-31 14:28

evanh wrote: »
Here's code I've used for post-summing carry/parity ripple:
	for( shift = ACCUM_SIZE; shift > 0; shift-- )  {
		result ^= (result & (1 << shift)) >> 1;
	}
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.

How about doing the parity thing for just bits 1 and 0?

cgracey · 2017-10-31 14:34

evanh wrote: »
Here's code I've used for post-summing carry/parity ripple:
	for( shift = ACCUM_SIZE; shift > 0; shift-- )  {
		result ^= (result & (1 << shift)) >> 1;
	}
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.

ACCUM_SIZE was 16, right? The code looks good to me.

I was thinking we'd hit the jackpot.

evanh · 2017-10-31 14:42

Sorry guys, it's 3:40AM.

Random/LFSR on P2

Comments