Um... what's this parity bit 0 thing exactly? I've seen it mentioned a number of times here, but I've never seen in actually defined, unless it was in Verilog code I didn't bother trying to decipher.
Tony was reading something about parity being used on the PowerBASIC forums and I tried a few variations, even getting an improvement on the default Xoroshiro algorithm, before Chip dropped in on us and suggested we include the final carry bit as well - http://forums.parallax.com/discussion/comment/1423893/#Comment_1423893
The idea by David Roberts of using parity is the important point, the PowerBASIC details less so. I joined that forum to thank him but I haven't been approved yet, so if David Roberts is reading this post - thank you! (tags Xorshift PRNG, xorshift128+, xoroshiro128+, xoroshiro).
I think David Roberts uses 64-bit parity not the full 65-bit parity possible. We use 17-bit parity in the XORO32 instruction (thanks to Chip). In the tests done by Evan, the improvement factor when the original bit 0 is replaced with 16-bit parity is the same as when 16-bit parity is replaced by 17-bit parity. That observation is based simply on the ratio of the full word scores in PractRand.
How is it known that xoroshiro128+ indeed has 2^128-1 states? Is it possible to prove such a thing? If not, might we seed it into a corner, so to speak? It would be awful if we found out later that some time after seeding, its quality might plummet, due to it falling into a tight loop. I mean, this thing is going to run at 160 megahertz for possibly years on end, in some cases.
If such high state ranges cannot be proven, might we string together, say, four of our known XORO32 generators in some fashion, so that we'll know with certainty there will be 2^128-1 states?
Nice conversation. He warns of a few factors that I don't understand at all but the one thing that stands out at the end is that bit 1 isn't a whole lot better than bit 0. Which Scro has also pointed out. And we've even noted it in testing.
He also pointed out how slow software is at doing a wide parity calculation. There's no way he'll recommend using parity with software methods.
How is it known that xoroshiro128+ indeed has 2^128-1 states? Is it possible to prove such a thing? If not, might we seed it into a corner, so to speak? It would be awful if we found out later that some time after seeding, its quality might plummet, due to it falling into a tight loop. I mean, this thing is going to run at 160 megahertz for possibly years on end, in some cases.
If such high state ranges cannot be proven, might we string together, say, four of our known XORO32 generators in some fashion, so that we'll know with certainty there will be 2^128-1 states?
Nice conversation. He warns of a few factors that I don't understand at all but the one thing that stands out at the end is that bit 1 isn't a whole lot better than bit 0. Which Scro has also pointed out. And we've even noted it in testing.
He also pointed out how slow software is at doing a wide parity calculation. There's no way he'll recommend using parity with software methods.
He says bit 1 has LFSR artifacts, not that bits 1 and 0 are about the same. We know that bit 1 is better from the tests and replacing bit 0 with full parity gives very good results for the bottom byte.
Also he says using parity is "slower" which it is, even in assembler.
Here's the first numbers as a text dump using the commented part of the source.
PractRand output:
evanh@control:~/hoard/rng_testing$ test | ./PractRand stdin -a -tlmin 1KB
RNG_test using PractRand version 0.93
RNG = RNG_stdin, seed = 0x1cfbbb61
test set = normal, folding = standard(unknown format)
rng=RNG_stdin, seed=0x1cfbbb61
length= 1 kilobyte (2^10 bytes), time= 0.0 seconds
Test Name Raw Processed Evaluation
DC6-9x1Bytes-1 R= +1266 p = 6.6e-350 FAIL !!!!!!!
Also he says using parity is "slower" which it is, even in assembler.
He was just being nice. C compilers don't know a parity from a potato.
Parity is a great tool for us because it's cheap in hardware.
That's right. I can change the Verilog, no problem, and it wouldn't complicate the hardware very much, but is it worth doing? We need Evanh to run one of those tests on our XORO32 with this parity addition applied. I suspect it will work pretty well.
I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.
I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.
Yes, the parity computation would zipper down from the carry to bit 0.
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.
I don't know how much data Evan creates for each test and how much longer that would take when every bit is XORed with higher parity. Only [14,2,7] and [15,3,6] really matter, though, and the rest can be dumped for this test.
It takes around 12 minutes to run all 84 candidates of the s16 set with SMT enabled. It's taking a decent amount longer now with SMT disabled. 17 minutes, too long I think ...
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.
How about doing the parity thing for just bits 1 and 0?
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.
ACCUM_SIZE was 16, right? The code looks good to me.
Comments
"Word0" is the entire 16 bits of the summing.
Also, note the msBit column. It is lightly sprinkled with some poor scores ... Now check out the msBit column of a later modification when I had put the parity at the msBit position - http://forums.parallax.com/discussion/comment/1424022/#Comment_1424022
Only a single non-perfect score!
http://forums.parallax.com/discussion/comment/1423789/#Comment_1423789
The idea by David Roberts of using parity is the important point, the PowerBASIC details less so. I joined that forum to thank him but I haven't been approved yet, so if David Roberts is reading this post - thank you! (tags Xorshift PRNG, xorshift128+, xoroshiro128+, xoroshiro).
I think David Roberts uses 64-bit parity not the full 65-bit parity possible. We use 17-bit parity in the XORO32 instruction (thanks to Chip). In the tests done by Evan, the improvement factor when the original bit 0 is replaced with 16-bit parity is the same as when 16-bit parity is replaced by 17-bit parity. That observation is based simply on the ratio of the full word scores in PractRand.
I have informed Sebastiano Vigna about this parity idea on the prng forum he started:
https://groups.google.com/forum/m/#!forum/prng
Evan, if you could adopt precise bit labelling it would make your test results easier to comprehend.
How is it known that xoroshiro128+ indeed has 2^128-1 states? Is it possible to prove such a thing? If not, might we seed it into a corner, so to speak? It would be awful if we found out later that some time after seeding, its quality might plummet, due to it falling into a tight loop. I mean, this thing is going to run at 160 megahertz for possibly years on end, in some cases.
If such high state ranges cannot be proven, might we string together, say, four of our known XORO32 generators in some fashion, so that we'll know with certainty there will be 2^128-1 states?
He also pointed out how slow software is at doing a wide parity calculation. There's no way he'll recommend using parity with software methods.
Mathematics proves it.
He says bit 1 has LFSR artifacts, not that bits 1 and 0 are about the same. We know that bit 1 is better from the tests and replacing bit 0 with full parity gives very good results for the bottom byte.
Also he says using parity is "slower" which it is, even in assembler.
PractRand output:
See pages 4 and 5 for some of the theory about full period sequences in xorshift-type algorithms.
Have you changed the Verilog yet?
Doing this in C would takes ages.
Parity is a great tool for us because it's cheap in hardware.
EDIT: Actually assembly can be different. Both Prop1 and Prop2 generates parity into C flag with the binary logic instructions.
That's right. I can change the Verilog, no problem, and it wouldn't complicate the hardware very much, but is it worth doing? We need Evanh to run one of those tests on our XORO32 with this parity addition applied. I suspect it will work pretty well.
Yes, the parity computation would zipper down from the carry to bit 0.
It spreads the manure where it's most needed and thins it where it's not.
The scoring results were rubbish. I stopped it early thinking something else was wrong. Couldn't find anything so changed the source back and all is good again.
It takes around 12 minutes to run all 84 candidates of the s16 set with SMT enabled. It's taking a decent amount longer now with SMT disabled. 17 minutes, too long I think ...
Oh, sad.
How about doing the parity thing for just bits 1 and 0?
ACCUM_SIZE was 16, right? The code looks good to me.
I was thinking we'd hit the jackpot.