I sleep for a few hours and find the gears have been turning.
The number of candidates mirrors xoroshiro, which makes sense.
Then the only hope is that the escape-from-zero benefit of the NOT operation would improve other statistics by-proxy.
I guess that is very possible with the right candidate... I'll give it a much better than 50% chance, but then could be higher with the NOT and either '<< B' or '>> B'.
Thanks for digging into this.
I think the best name is xoroshironot as that describes the algorithm: xor, rotate, shift, rotate, not. Thus xoroshironot32++ [5,1,14,9].
Technically, yes, but I was trying to steer clear of that exact name, as I already have published it for the (superior in many ways) three-state-variable version of my xoroshiro++ variant.
However, I already documented the alias for it as Xoroshiroppm, which would allow me to abandon XoroshiroNOT in favor of this new variant, if necessary.
Let us see how this pans out first.
I notice a snag, that although it might really be just an annoyance, might add excess burden to the already stressed output scrambler, if it were addressed...
Normally xoroshiro32 would emit 65535 zeros, but 65536 of all other values due to the 2^N-1 period (which has the negative consequence of creating an excess of set bits over the entire period).
Haphazardly inverting bits (i.e. 'Not') would shift the short value from zero to something else.
That might be problematic in the case that some 'aware' software assumes only the missing zero, so intentionally discards them, for example.
If that issue needed to be addressed, then the final output would have to be xor'd with the missing value to restore the missing zero, at the end of the output calculation.
Or, perhaps, on-the-fly as the register output is accessed.
On the plus side, if the missing output value were not xor'd with each output, and it happened to be one that contains exactly 8 set bits, then the total 0/1 bit balance would be perfect over the entire period (therefore over the course of an arbitrary number of full periods cycles). That behavior would be ideal for those writing software that flips a lot of coins.
If this makes your head spin... keep the status quo.
The good news is there is a couple of good B=1 candidates right there.
However, it sounds like idea about including zero in the full period is flawed. I think it's worse than expected, as in zero becomes part of the full period but something else is removed. Possibly -1.
I've just been doing some alternative full-period searches:
- When looking for 2^N states as full-period the output is nothing.
- But when looking for 2^N - 1 states as full-period and with s0=0 and s1=0 as start and finish conditions then I get the very same 84 triplets.
The good news is there is a couple of good B=1 candidates right there.
However, it sounds like idea about including zero in the full period is flawed. I think it's worse than expected, as in zero becomes part of the full period but something else is removed. Possibly -1.
Why is B=1 so important? Other values give better pfreq. I haven't looked at any others, but [8,1,9,4] is not very good on zfreq and nzfreq.
An XOR at the end, as already suggested, to make the zero frequency one fewer than the non-zero would be simple, at the cost of an extra instruction.
I'm just following the path Chris (Xoroshironot) is on. B=1 was one of the criteria, part of the Gray Code function. Also, I thought the inversion made zero frequency one more not one less. I've proven an all zero state repeats every 2^N - 1.
I think it's worse than expected, as in zero becomes part of the full period but something else is removed. Possibly -1.
I don't think -1 can be removed, since the 'Not' isn't used on all occurences of 's1'.
Removal of a single zero is a common convention, but in the vast majority of use cases it is not relevant which value is removed.
If I could choose, I would remove an occurrence of 0x5555. That would maintain bit balance over the entire stream (and I like that value less than I do zero).
The fact that any stream is short by the omission of a single value occurrence is only a consideration for smaller state generators.
The good news is there is a couple of good B=1 candidates right there.
B=1 seems rational to me, but that does not make it correct in the context of what you are trying to do, and the fact that 'rational' does always fit with PRNGs.
At the time I did not understand your use of specific criteria on the full period to steer the statistics toward the behavior of a larger state generator (per Scro).
Seba made some comment critical of that, but as long as the end result is a stream that also pushes against ~1GB in PractRand (and passes Small Crush), all is well.
As I recall, if I hash x and y independently before passing them to the ++ scrambler, I can always get to 1GB in Practrand (with options -tf 2 -te 1), but never to 2GB.
Thus my comment about 'status quo', as it might be difficult to squeeze enough net improvement to make the effort worthwhile.
But when looking for 2^N - 1 states as full-period and with s0=0 and s1=0 as start and finish conditions then I get the very same 84 triplets.
That illustrates another negative facet of lacking a missing zero... there must exist a seed state that will result in one iteration leaving the state untouched.
When that bad seed state is zero, as in the normal xoroshiro, you only need to ensure at least one bit is set when seeding.
With the 'Not' applied, you must xor each non-zero seed with the known bad seed value before storing the seed in the state.
Finding the bad seed (even by brute force) is fairly trivial for a 32-bit state generator.
Properly seeding is also trivial once the bad seed is known, as this is already being done with the normal xoroshiro (to avoid all zeros).
But when looking for 2^N - 1 states as full-period and with s0=0 and s1=0 as start and finish conditions then I get the very same 84 triplets.
That illustrates another negative facet of lacking a missing zero... there must exist a seed state that will result in one iteration leaving the state untouched.
When that bad seed state is zero, as in the normal xoroshiro, you only need to ensure at least one bit is set when seeding.
With the 'Not' applied, you must xor each non-zero seed with the known bad seed value before storing the seed in the state.
Finding the bad seed (even by brute force) is fairly trivial for a 32-bit state generator.
Properly seeding is also trivial once the bad seed is known, as this is already being done with the normal xoroshiro (to avoid all zeros).
I see what you mean now. Inversion only applies to s0, though, so setting one bit in s1 should guarantee a good seed?
I see what you mean now. Inversion only applies to s0, though, so setting one bit in s1 should guarantee a good seed?
I'll look at calculating the 1 cycle seed value for any given ABC. The missing output value would depend on D, as well, and even if it wasn't used for output xor, would be necessary to know.
I see what you mean now. Inversion only applies to s0, though, so setting one bit in s1 should guarantee a good seed?
I'll look at calculating the 1 cycle seed value for any given ABC. The missing output value would depend on D, as well, and even if it wasn't used for output xor, would be necessary to know.
Actually it's subtract we need for the output, not xor. To make sure the zero output frequency is 2^N-1, subtract the non-zero 2^N-1 value from each output.
Some [a,b,c,d] for the new engine have very good frequency distributions that are better than the chosen one in the P2. However, a good distribution does not imply a good score in PRNG tests, which I can't perform.
Currently my PractRand analysis finds about 4 of the above pfreq candidates possibly worth considering.
However, the forward bits analysis is slightly worse and the reverse bits is slightly better (as compared to 13,5,10,9) on ALL of the 4 I liked.
That is marginally suspicious.
Presuming you concur... it seems the issue might be related to using '>>' instead of the original '<<'.
I tried a few with '<<' instead and the balance between forward and reverse bits looked better.
Perhaps (and very likely) Seba understands something here that is not obvious to me.
Changing (back) to '<<' from '>>' (and keeping the NOT): A = 16 - A : B = (just) B : C = 16 - C
This creates an entirely new set of streams, thus possible candidates with respect to ABC and D (since the shift direction change results in a subtle asymmetry, I believe).
Switching the shift back to '<<' (so now only a Not has been added to the original code), takes us back to the pre-Not constants, since xor, rotate and shift are unaffected by that change.
As a result, this yields an easy example... your original [13,5,10,9], with these properties:
1. Very similar PractRand results to original.
2. Excellent escape from zero properties (10 iterations, in avg. bits set in output word considering all 32 single-set-bit seeds, no output subtraction):
Original 1.5 5.375 6.96875 7.96875 8.75 8.1875 7.46875 8.6875 8.375 8.5
NOT << 1.5 9.71875 7.96875 7.84375 8.46875 7.3125 9.0625 7.8125 8.0625 7.9375
3. Likely same, or very similar pf, zrf, and nzrf statistics... would need tested (and again with output correction below, if it will be used, just to verify).
4. Must subtract (or xor) 0x3BE1 from output to restore missing zero, if having one less zero output (as original) is required.
5. Must subtract (or xor) these values when non-zero seeding: s[0] = s[0]_seed - 0x2034, s[1]= s[1]_seed - 0xB659
So much effort for so little of a change to code... likely not worth the effort except for #2 and/or if there is some improvement in testing #3.
Comments
That's the same number with b=1 as xoroshiro. How many altogether?
The number of candidates mirrors xoroshiro, which makes sense.
Then the only hope is that the escape-from-zero benefit of the NOT operation would improve other statistics by-proxy.
I guess that is very possible with the right candidate... I'll give it a much better than 50% chance, but then could be higher with the NOT and either '<< B' or '>> B'.
Thanks for digging into this.
However, I already documented the alias for it as Xoroshiroppm, which would allow me to abandon XoroshiroNOT in favor of this new variant, if necessary.
Let us see how this pans out first.
Normally xoroshiro32 would emit 65535 zeros, but 65536 of all other values due to the 2^N-1 period (which has the negative consequence of creating an excess of set bits over the entire period).
Haphazardly inverting bits (i.e. 'Not') would shift the short value from zero to something else.
That might be problematic in the case that some 'aware' software assumes only the missing zero, so intentionally discards them, for example.
If that issue needed to be addressed, then the final output would have to be xor'd with the missing value to restore the missing zero, at the end of the output calculation.
Or, perhaps, on-the-fly as the register output is accessed.
On the plus side, if the missing output value were not xor'd with each output, and it happened to be one that contains exactly 8 set bits, then the total 0/1 bit balance would be perfect over the entire period (therefore over the course of an arbitrary number of full periods cycles). That behavior would be ideal for those writing software that flips a lot of coins.
If this makes your head spin... keep the status quo.
Looking just at pfreq for closest candidates to ideal pfreq0 = pfreq1 = 1580030169
(see http://forums.parallax.com/discussion/comment/1468197/#Comment_1468197)
here are the best ones:
I need to search for my program that reads the binary files and outputs Chi-Square results.
EDIT:
For comparison, here are the corresponding values for xoroshiro32++ [13,5,10,9] in P2 rev. B:
However, it sounds like idea about including zero in the full period is flawed. I think it's worse than expected, as in zero becomes part of the full period but something else is removed. Possibly -1.
- When looking for 2^N states as full-period the output is nothing.
- But when looking for 2^N - 1 states as full-period and with s0=0 and s1=0 as start and finish conditions then I get the very same 84 triplets.
Why is B=1 so important? Other values give better pfreq. I haven't looked at any others, but [8,1,9,4] is not very good on zfreq and nzfreq.
An XOR at the end, as already suggested, to make the zero frequency one fewer than the non-zero would be simple, at the cost of an extra instruction.
P.S.
Earlier reply edited to add chosen xoroshiro32++ values
http://forums.parallax.com/discussion/comment/1468260/#Comment_1468260
Removal of a single zero is a common convention, but in the vast majority of use cases it is not relevant which value is removed.
If I could choose, I would remove an occurrence of 0x5555. That would maintain bit balance over the entire stream (and I like that value less than I do zero).
The fact that any stream is short by the omission of a single value occurrence is only a consideration for smaller state generators. Unfortunately, 2^N period is not possible with a vanilla LFSR.
I am still working to solve that with something more eloquent than other alternatives.
At the time I did not understand your use of specific criteria on the full period to steer the statistics toward the behavior of a larger state generator (per Scro).
Seba made some comment critical of that, but as long as the end result is a stream that also pushes against ~1GB in PractRand (and passes Small Crush), all is well.
As I recall, if I hash x and y independently before passing them to the ++ scrambler, I can always get to 1GB in Practrand (with options -tf 2 -te 1), but never to 2GB.
Thus my comment about 'status quo', as it might be difficult to squeeze enough net improvement to make the effort worthwhile.
When that bad seed state is zero, as in the normal xoroshiro, you only need to ensure at least one bit is set when seeding.
With the 'Not' applied, you must xor each non-zero seed with the known bad seed value before storing the seed in the state.
Finding the bad seed (even by brute force) is fairly trivial for a 32-bit state generator.
Properly seeding is also trivial once the bad seed is known, as this is already being done with the normal xoroshiro (to avoid all zeros).
I see what you mean now. Inversion only applies to s0, though, so setting one bit in s1 should guarantee a good seed?
Actually it's subtract we need for the output, not xor. To make sure the zero output frequency is 2^N-1, subtract the non-zero 2^N-1 value from each output.
However, the forward bits analysis is slightly worse and the reverse bits is slightly better (as compared to 13,5,10,9) on ALL of the 4 I liked.
That is marginally suspicious.
Presuming you concur... it seems the issue might be related to using '>>' instead of the original '<<'.
I tried a few with '<<' instead and the balance between forward and reverse bits looked better.
Perhaps (and very likely) Seba understands something here that is not obvious to me.
Changing (back) to '<<' from '>>' (and keeping the NOT): A = 16 - A : B = (just) B : C = 16 - C
This creates an entirely new set of streams, thus possible candidates with respect to ABC and D (since the shift direction change results in a subtle asymmetry, I believe).
That will work. I'm curious how each approach affects PractRand's FPF analysis... Initial check says some effect, but not very much.
As a result, this yields an easy example... your original [13,5,10,9], with these properties:
1. Very similar PractRand results to original.
2. Excellent escape from zero properties (10 iterations, in avg. bits set in output word considering all 32 single-set-bit seeds, no output subtraction):
Original 1.5 5.375 6.96875 7.96875 8.75 8.1875 7.46875 8.6875 8.375 8.5
NOT << 1.5 9.71875 7.96875 7.84375 8.46875 7.3125 9.0625 7.8125 8.0625 7.9375
3. Likely same, or very similar pf, zrf, and nzrf statistics... would need tested (and again with output correction below, if it will be used, just to verify).
4. Must subtract (or xor) 0x3BE1 from output to restore missing zero, if having one less zero output (as original) is required.
5. Must subtract (or xor) these values when non-zero seeding: s[0] = s[0]_seed - 0x2034, s[1]= s[1]_seed - 0xB659
So much effort for so little of a change to code... likely not worth the effort except for #2 and/or if there is some improvement in testing #3.