Random/LFSR on P2

evanh · 2019-04-03 14:40

What's the 10 iterations mean? Doesn't it escape in one?

xoroshironot · 2019-04-03 15:08

evanh wrote: »

What's the 10 iterations mean? Doesn't it escape in one?

Yes, it does escape 1 bit set after the first iteration. Also, the ++ operation always converts the 1st output to 1.5 bits set, on average.
The question is: how many iterations does it take for the output to return to ~8 bits set, on average?
You can see that the original has recovered by the 4th iteration, whereas the Not has recovered by the 3rd iteration. But it doesn't end there...

The Not requires a klunky output xor or subtraction to restore the missing zero, which I did not apply to those results... and this is where it gets interesting:
Once an output subtraction is applied, all escape-from-zero-like issues disappear, and therefore should not reappear anywhere in the stream (except as would be statistically normal).
That is the only benefit I see with the Not code, and my further research shows it likely that any overall improvements for possible candidates will come from the use of the '<<', rather than the '>>' originally proposed.

evanh · 2019-04-03 15:13

What average? It's a PRNG. It repeats in a fixed way for a given set of constants.

TonyB_ · 2019-04-03 15:19

xoroshironot wrote: »
Switching the shift back to '<<' (so now only a Not has been added to the original code), takes us back to the pre-Not constants, since xor, rotate and shift are unaffected by that change.

As a result, this yields an easy example... your original [13,5,10,9], with these properties:
1. Very similar PractRand results to original.
2. Excellent escape from zero properties (10 iterations, in avg. bits set in output word considering all 32 single-set-bit seeds, no output subtraction):
Original	1.5	5.375	6.96875	7.96875	8.75	8.1875	7.46875	8.6875	8.375	8.5	
NOT 	<<	1.5	9.71875	7.96875	7.84375	8.46875	7.3125	9.0625	7.8125	8.0625	7.9375
3. Likely same, or very similar pf, zrf, and nzrf statistics... would need tested (and again with output correction below, if it will be used, just to verify).
4. Must subtract (or xor) 0x3BE1 from output to restore missing zero, if having one less zero output (as original) is required.
5. Must subtract (or xor) these values when non-zero seeding: s[0] = s[0]_seed - 0x2034, s[1]= s[1]_seed - 0xB659

So much effort for so little of a change to code... likely not worth the effort except for #2 and/or if there is some improvement in testing #3.

Yes, of course, either subtract or xor could be used to 'zero' the output. I don't know whether there is any difference in PractRand scores. There is a big difference in P2 code because the 32-bit PRN is composed of two consecutive 16-bit outputs and xor'ing is much easier, just xor both words with the same value. Subtraction would not be nice as there is carry/borrow to worry about between the two words.

I was wondering about how Not would work with the existing xoroshiro algorithm and perhaps the best name for the Not engine is xoroshiron? In terms of logic in the chip adding Not is nothing.

evanh · 2019-04-03 15:32

Tony,
It doesn't seem like it is producing anything worth naming. I've started a grid run for those top 13 candidates you've listed.

evanh · 2019-04-03 15:44

Oops, I've got the gridding configured for minimal number of aperture sizes. We're getting some quick results ...

evanh · 2019-04-03 16:05

evanh wrote: »

It doesn't seem like it is producing anything worth naming.

Ie: The NOT appears to be just swapping zero for something else. Full-period is still 2^N - 1. There is no gain.

TonyB_ · 2019-04-03 16:25

evanh wrote: »

evanh wrote: »

It doesn't seem like it is producing anything worth naming.

Ie: The NOT appears to be just swapping zero for something else. Full-period is still 2^N - 1. There is no gain.

There might be a slight increase in randomness, which would be a gain.

evanh · 2019-04-03 16:57

Slight doesn't show up very well. Parity was something but even that was tossed out.

evanh · 2019-04-03 17:42

Right, done those. Conveniently, candidate [8 1 9 4] comes out very strong on the grid test.

__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [2 2 7 2].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  128M  128M  256M
 08 |    1G  256M  512M  256M  256M  256M  256M  128M  128M    1G    1G    1G  256M  256M  256M  512M
 04 |  512M  512M  512M  512M  512M  512M  512M    2G    1G  512M  512M  512M  512M    1G    1G  512M
  Lowest Exponent = 27   Exponent Average = 28.833


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [2 3 3 4].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  128M  128M  512M  512M  256M  512M
 08 |    1G  512M    1G    1G  512M    1G    1G  512M  128M  256M  512M   64M  512M  512M  128M  512M
 04 |    1G    2G    2G  512M    1G  512M  512M    1G  512M    2G  256M  512M  256M  512M   64M    1G
  Lowest Exponent = 26   Exponent Average = 28.937


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [3 3 2 12].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  128M  128M  128M  128M  128M  256M    8M    2M    1M    8M   64M  128M  512M  256M  256M  128M
 08 |    4M    4M    4M    2M    2M    2M    2M    1M    4M    8M    8M    8M    8M    8M    8M    8M
 04 |  256M  256M    1G  256M   64M   64M    2M  512K  256K    2M   16M    2G    2G    1G    1G    2G
  Lowest Exponent = 18   Exponent Average = 24.791


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [3 11 2 10].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  256M  256M  128M  128M  128M   32M   32M   16M   32M   64M   64M  512M  512M  512M
 08 |  128M   64M   64M   16M   16M   16M   32M   32M   32M   64M   64M  128M    1G  512M  512M  256M
 04 |  256M    1G   64M   64M   32M   32M   32M    8M    8M    4M    8M   16M   16M  256M    1G  512M
  Lowest Exponent = 22   Exponent Average = 26.354


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [3 13 2 9].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  256M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 08 |  256M  512M  256M  256M  256M  256M  256M  512M  512M  512M  512M    1G    1G  512M  512M  512M
 04 |  256M  512M  256M  256M  256M    1G   64M  128M    2G    1G  256M  256M  128M  512M  512M  512M
  Lowest Exponent = 26   Exponent Average = 28.708


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [6 2 9 1].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |   64M   32M   16M    4M   32M  512M   32M   32M   32M   16M   16M  512M  512M  512M  512M  128M
 08 |  256M  256M   32M    2M   16M  512M  512M  256M  512M  128M    8M  512M  512M  512M  512M  512M
 04 |  128M  256M  128M   64M   32M  512M    1G    1G    1G  256M  256M  512M  512M  512M    1G  512M
  Lowest Exponent = 21   Exponent Average = 27.229


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [6 5 3 2].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |   64M   32M   32M   32M   32M  128M   64M   64M   32M   64M   32M  128M   64M  512M  512M  512M
 08 |   32M   32M   32M    1G    1G  256M  512M  128M   64M  128M  256M  512M  512M  512M  256M   64M
 04 |   16M  128M  256M  512M  512M  512M  256M  256M    2G    2G   32M   32M   16M  128M  256M  256M
  Lowest Exponent = 24   Exponent Average = 27.145


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [7 2 2 3].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  128M   64M  128M  128M  512M
 08 |  256M  512M  512M  512M  256M  256M  512M  512M  256M  256M  256M  128M  256M  256M  512M  512M
 04 |  512M  512M  256M    1G    1G  256M  128M    1G  128M  256M  512M   32M   16M   32M    2G    2G
  Lowest Exponent = 24   Exponent Average = 28.333


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [7 2 2 6].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  256M  512M
 08 |  512M  512M  512M  512M  512M  512M  512M    1G  512M    1G    1G  512M  256M  512M    1G  512M
 04 |  256M    1G    1G  512M  512M  512M    1G    2G    2G  512M  512M    1G    1G    1G   64M    1G
  Lowest Exponent = 26   Exponent Average = 29.187


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [8 1 9 4].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M    1G  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M    1G  512M    1G    1G
 08 |  512M  512M  512M    1G  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 04 |  512M  256M    1G    1G    2G  512M  512M    1G  512M    1G    1G    1G  512M  512M    2G    1G
  Lowest Exponent = 28   Exponent Average = 29.312


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [8 1 9 12].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  256M  512M  512M  512M  512M  256M  256M  256M  128M   64M  512M  512M  512M  512M  512M  512M
 08 |  512M  128M  512M  512M  512M    1G  512M  512M  512M   32M  512M  512M  512M  512M  512M  512M
 04 |  512M   64M  512M    2G    1G    1G  512M    1G    1G  512M    1G  512M  256M    1G    1G  512M
  Lowest Exponent = 25   Exponent Average = 28.812


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [13 3 6 4].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  128M  512M  256M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 08 |  256M    1G  512M  256M    1G    1G    1G    1G    1G  512M    1G  512M  256M    1G    1G    1G
 04 |   32M  128M   64M    2G  256M  512M    1G  512M  512M  256M  512M    2G    1G  512M  512M  128M
  Lowest Exponent = 25   Exponent Average = 28.937


__________________________________________________________________________________________________________
  Gridded scores of single iterated Xorograyro32(16)++ candidate [14 11 13 10].
      Byte Sampled Double Full Period = 8 GB
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  256M  256M  512M    1G  512M  512M  512M  512M  512M  512M  512M  512M  512M    1G  256M
 08 |   16M    8M    8M    4M    2M    2M    2M    2M    4M   16M   64M  128M  256M  512M  512M    1G
 04 |  256M  512M    1G  512M  512M    1G    2G  512M  512M    1G    1G  512M  512M  512M    1G   64M
  Lowest Exponent = 21   Exponent Average = 27.520

xoroshironot · 2019-04-03 18:08

evanh wrote: »

What average? It's a PRNG. It repeats in a fixed way for a given set of constants.

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

With the normal xoroshiro (and other LFSRs), the issues discussed were:
1. Binary matrix, linear complexity and Hamming weight/distance, all solved very adequately by applying the ++ scrambler.
2. Sparse state recovery (aka escape-from-zero), not solved fully by applying ++ scrambler, but solved completely with Not combined with zero correction.
3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

All of the above has been of minor consideration for most applications. Normally, only a purist with strict requirements would worry about such things.
Whether or not any of this is more or less relevant when discussing smaller state generators is debatable.

TonyB_ wrote: »

There might be a slight increase in randomness, which would be a gain.

I guess it would be good to know, one way or another.
At this point, change would likely be justified only by a very measurable statistical improvement, which might occur only with left shift, rather than right, if at all.

evanh · 2019-04-03 18:17

xoroshironot wrote: »

3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

Isn't that a missing state value?

evanh · 2019-04-03 18:19

xoroshironot wrote: »

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

I'd probably call that a rate of change or slope.

evanh · 2019-04-03 18:27

I'm guessing the slope varies with bit distribution in the state store and presumably flattens when there is mostly set bits as well as when mostly clear bits.

xoroshironot · 2019-04-03 18:31

evanh wrote: »

xoroshironot wrote: »

3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

Isn't that a missing state value?

Yes, I should have been speaking in terms of the full state... PCG, for example, deals with this issue by using a full state operation on an LCG, which unfortunately requires multiplication and extra operations to achieve a very reasonable degree of randomness.

evanh wrote: »

xoroshironot wrote: »

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

I'd probably call that a rate of change or slope.

Fair enough... I don't recall PractRand or TestU01 Small Crush attempting to dig out those statistics to find the known fault with LFSRs (excepting perhaps Not), but I could be wrong.

xoroshironot · 2019-04-03 18:47

evanh wrote: »

I'm guessing the slope varies with bit distribution in the state store and presumably flattens when there is mostly set bits as well as when mostly clear bits.

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

evanh · 2019-04-03 18:50

I'm just using engineering terms. I have no knowledge of the science of hashing or cryptography.

evanh · 2019-04-03 18:52

xoroshironot wrote: »

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

That's with the extra NOT, right?

evanh · 2019-04-03 19:01

Btw, Chris, I don't know if you've seen this before, here's an example of what Seba would produce for a given engine and word size: This one being Xoroshiro32(16):

evanh · 2019-04-03 19:03

The polynomials are meaningless to me but the ranking next to the constants indicates likely good candidates. With 11 to 15 being the sweet spot for word size 16.

EDIT: They're also the most prolific rankings too, so not a huge benefit.

evanh · 2019-04-03 19:22

Tony,
Are we wanting to do double iterated (32-bit outputs) distribution and scoring runs of this first algorithm? Or move onto some of the other modifications instead?

TonyB_ · 2019-04-03 20:45

evanh wrote: »

Right, done those. Conveniently, candidate [8 1 9 4] comes out very strong on the grid test.

Thanks for the tests, Evan. [2,2,7,2] and [13,3,6,4] have the best distributions, although I haven't looked at the also-rans based on pfreq.

evanh wrote: »

Tony,
Are we wanting to do double iterated (32-bit outputs) distribution and scoring runs of this first algorithm? Or move onto some of the other modifications instead?

If you're willing to do it, I think we should test Not with xoroshiro32++ [a,b,c,d] because then we could see what difference it makes. Pair frequencies first and then PractRand tests, maybe.

xoroshironot · 2019-04-03 21:02

evanh wrote: »

xoroshironot wrote: »

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

That's with the extra NOT, right?

No, that is without the NOT. The extra NOT, combined with the output correction (back to a single missing zero) should erase most all artifacts of sparse state effect on the next outputs.

xoroshironot · 2019-04-03 22:28

evanh wrote: »

The polynomials are meaningless to me but the ranking next to the constants indicates likely good candidates. With 11 to 15 being the sweet spot for word size 16.
EDIT: They're also the most prolific rankings too, so not a huge benefit.

The polynomial weight and exponents should be the same as Seba's doc for NOT with '<<' (that I have back-peddled to recommend if '>>' does not meet expectations).
I believe that '>>' would maintain the same weight if you remap 16-A and 16-C to his doc (B stays the same), and the exponents would be recalculated as x^(32-exp).
For example, '>>' candidate 2,3,3 would become (16-2),3,(16-3) if '<<'
From Seba's list ('<<'): 14,3,13 17 x^32 + x^30 + x^28 + x^21 + x^20 + x^17 + x^16 + x^13 + x^11 + x^10 + x^9 + x^7 + x^6 + x^5 + x^4 + x^2 + 1
Which translates to ('>>') 2,3,3 17 x^32 + x^30 + x^28 + x^27 + x^26 + x25 + x^23 + x^22 + x^21 + x^19 + x^16 + s^15 + x^12 + x^11 + x^4 + x^2 + 1

As I recall, Seba uses the Fermat Computer Algebra System (which I have only recently taken interest in) to calculate and export these polynomials.
I am just reversing the constants and exponents in my head (which is admittedly unreliable) for '>>'.

xoroshironot · 2019-04-04 02:48

Just to be clear on why I was finding fault in my own choice of '>>' (right-shift) versus the normal left-shift:
In my own further tests with PractRand, in addition to forward testing, I also automate the reverse of 32-bits at a time to review the output backwards.
The reason I do this is that PractRand test are optimized to find common faults, so are not as sensitive to certain issues with high bits on some tests and with low bits on other tests.
The '>>' in several cases I have looked at triggers premature (pesky) FPF failures (at 32MB, for example) only on the reversed bits.
This is an uncommon fault that PractRand is unable to notice early on with the forward bits.
Practically speaking, FPF issues with the reversed bits should not present a problem, 'as one would normally not use random numbers in a way where that matters'.
We have heard that statement before (when the issue is endemic and cannot be easily corrected).
However, in this case, I noticed the issue seems to often dissipate completely when flipping the shift back to the left, yet I still see some further improvements overall with keeping the 'Not' present.
I am not implying that there are no good right-shift candidates, just that you might have to be extremely careful in vetting them, as the current XORO32 is extremely well tuned (both forward and reverse).

evanh · 2019-04-04 05:17

Okay, on to Xoroshiron32. Algorithm attached.

First 20 output values of candidate [13 5 10 9]:

0201
9da5
9819
0bd6
d3b9
0819
fdb5
7f27
58fe
39d4
66cd
9062
72c0
7783
0cd8
3a0a
2db6
dd42
0bdb
8f4f

EDIT: Removed full-period data. Made a mistake with compile directives.

TonyB_ · 2019-04-04 11:57

Evan, I agree with your values for xoroshiron32++ [13,5,10,9].

evanh · 2019-04-04 12:28

Cool. I spent a while today working engine replacement selection into both the scripts and the C sources so that no longer have to duplicate so much.

Right now running all frequency distributions for full-period triplet set. Which presumably is same set as Xoroshiro32 engine. I haven't bothered to sort for compare. EDIT: Confirmed to be identical full-period triplet set.

evanh · 2019-04-04 23:12

Xoroshiron32 frequency distribution data attached.

xoroshironot · 2019-04-05 01:59

[Edited to fix bad values for 'Seed S1 xor'... replacement file is a 7z]
Xoroshiron32 ('<<') constants for seed and output correction attached.

Seed pseudocode:
S[0] = {non-zero seed 0} ^ {Seed S0 xor};
S[1] = {non-zero seed 1} ^ {Seed S1 xor};

Output pseudocode:
rnd = x + rotl(x+y,D) ^ {Output Xor D=#}; // Could be minus instead of ^

Let me know if you see any issues.

Random/LFSR on P2

Comments