What's the 10 iterations mean? Doesn't it escape in one?

Yes, it does escape 1 bit set after the first iteration. Also, the ++ operation always converts the 1st output to 1.5 bits set, on average.
The question is: how many iterations does it take for the output to return to ~8 bits set, on average?
You can see that the original has recovered by the 4th iteration, whereas the Not has recovered by the 3rd iteration. But it doesn't end there...

The Not requires a klunky output xor or subtraction to restore the missing zero, which I did not apply to those results... and this is where it gets interesting:
Once an output subtraction is applied, all escape-from-zero-like issues disappear, and therefore should not reappear anywhere in the stream (except as would be statistically normal).
That is the only benefit I see with the Not code, and my further research shows it likely that any overall improvements for possible candidates will come from the use of the '<<', rather than the '>>' originally proposed.

Switching the shift back to '<<' (so now only a Not has been added to the original code), takes us back to the pre-Not constants, since xor, rotate and shift are unaffected by that change.

As a result, this yields an easy example... your original [13,5,10,9], with these properties:
1. Very similar PractRand results to original.
2. Excellent escape from zero properties (10 iterations, in avg. bits set in output word considering all 32 single-set-bit seeds, no output subtraction):

3. Likely same, or very similar pf, zrf, and nzrf statistics... would need tested (and again with output correction below, if it will be used, just to verify).
4. Must subtract (or xor) 0x3BE1 from output to restore missing zero, if having one less zero output (as original) is required.
5. Must subtract (or xor) these values when non-zero seeding: s[0] = s[0]_seed - 0x2034, s[1]= s[1]_seed - 0xB659

So much effort for so little of a change to code... likely not worth the effort except for #2 and/or if there is some improvement in testing #3.

Yes, of course, either subtract or xor could be used to 'zero' the output. I don't know whether there is any difference in PractRand scores. There is a big difference in P2 code because the 32-bit PRN is composed of two consecutive 16-bit outputs and xor'ing is much easier, just xor both words with the same value. Subtraction would not be nice as there is carry/borrow to worry about between the two words.

I was wondering about how Not would work with the existing xoroshiro algorithm and perhaps the best name for the Not engine is xoroshiron? In terms of logic in the chip adding Not is nothing.

What average? It's a PRNG. It repeats in a fixed way for a given set of constants.

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

With the normal xoroshiro (and other LFSRs), the issues discussed were:
1. Binary matrix, linear complexity and Hamming weight/distance, all solved very adequately by applying the ++ scrambler.
2. Sparse state recovery (aka escape-from-zero), not solved fully by applying ++ scrambler, but solved completely with Not combined with zero correction.
3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

All of the above has been of minor consideration for most applications. Normally, only a purist with strict requirements would worry about such things.
Whether or not any of this is more or less relevant when discussing smaller state generators is debatable.

There might be a slight increase in randomness, which would be a gain.

I guess it would be good to know, one way or another.
At this point, change would likely be justified only by a very measurable statistical improvement, which might occur only with left shift, rather than right, if at all.

3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

I'm guessing the slope varies with bit distribution in the state store and presumably flattens when there is mostly set bits as well as when mostly clear bits.

3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

Isn't that a missing state value?

Yes, I should have been speaking in terms of the full state... PCG, for example, deals with this issue by using a full state operation on an LCG, which unfortunately requires multiplication and extra operations to achieve a very reasonable degree of randomness.

If you generate a distribution equating number of bits set in the state to the number of bits set in the next state, in the output you will find a strong correlation in (fewer) bits set than would be statistically likely.

I'd probably call that a rate of change or slope.

Fair enough... I don't recall PractRand or TestU01 Small Crush attempting to dig out those statistics to find the known fault with LFSRs (excepting perhaps Not), but I could be wrong.

I'm guessing the slope varies with bit distribution in the state store and presumably flattens when there is mostly set bits as well as when mostly clear bits.

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

Btw, Chris, I don't know if you've seen this before, here's an example of what Seba would produce for a given engine and word size: This one being Xoroshiro32(16):

The polynomials are meaningless to me but the ranking next to the constants indicates likely good candidates. With 11 to 15 being the sweet spot for word size 16.

EDIT: They're also the most prolific rankings too, so not a huge benefit.

Tony,
Are we wanting to do double iterated (32-bit outputs) distribution and scoring runs of this first algorithm? Or move onto some of the other modifications instead?

Tony,
Are we wanting to do double iterated (32-bit outputs) distribution and scoring runs of this first algorithm? Or move onto some of the other modifications instead?

If you're willing to do it, I think we should test Not with xoroshiro32++ [a,b,c,d] because then we could see what difference it makes. Pair frequencies first and then PractRand tests, maybe.

The mostly set bits may very often quickly translate to mostly clear bits, thus I believe creates a slightly different distribution, which is likely biased toward creating more sparse (mostly clear) bit states in succession. In any case, there is usually oscillation, and the lag should be expected and somehow accounted for in any meaningful statistic.

That's with the extra NOT, right?

No, that is without the NOT. The extra NOT, combined with the output correction (back to a single missing zero) should erase most all artifacts of sparse state effect on the next outputs.

The polynomials are meaningless to me but the ranking next to the constants indicates likely good candidates. With 11 to 15 being the sweet spot for word size 16.
EDIT: They're also the most prolific rankings too, so not a huge benefit.

The polynomial weight and exponents should be the same as Seba's doc for NOT with '<<' (that I have back-peddled to recommend if '>>' does not meet expectations).
I believe that '>>' would maintain the same weight if you remap 16-A and 16-C to his doc (B stays the same), and the exponents would be recalculated as x^(32-exp).
For example, '>>' candidate 2,3,3 would become (16-2),3,(16-3) if '<<'
From Seba's list ('<<'): 14,3,13 17 x^32 + x^30 + x^28 + x^21 + x^20 + x^17 + x^16 + x^13 + x^11 + x^10 + x^9 + x^7 + x^6 + x^5 + x^4 + x^2 + 1
Which translates to ('>>') 2,3,3 17 x^32 + x^30 + x^28 + x^27 + x^26 + x25 + x^23 + x^22 + x^21 + x^19 + x^16 + s^15 + x^12 + x^11 + x^4 + x^2 + 1

As I recall, Seba uses the Fermat Computer Algebra System (which I have only recently taken interest in) to calculate and export these polynomials.
I am just reversing the constants and exponents in my head (which is admittedly unreliable) for '>>'.

Just to be clear on why I was finding fault in my own choice of '>>' (right-shift) versus the normal left-shift:
In my own further tests with PractRand, in addition to forward testing, I also automate the reverse of 32-bits at a time to review the output backwards.
The reason I do this is that PractRand test are optimized to find common faults, so are not as sensitive to certain issues with high bits on some tests and with low bits on other tests.
The '>>' in several cases I have looked at triggers premature (pesky) FPF failures (at 32MB, for example) only on the reversed bits.
This is an uncommon fault that PractRand is unable to notice early on with the forward bits.
Practically speaking, FPF issues with the reversed bits should not present a problem, 'as one would normally not use random numbers in a way where that matters'.
We have heard that statement before (when the issue is endemic and cannot be easily corrected).
However, in this case, I noticed the issue seems to often dissipate completely when flipping the shift back to the left, yet I still see some further improvements overall with keeping the 'Not' present.
I am not implying that there are no good right-shift candidates, just that you might have to be extremely careful in vetting them, as the current XORO32 is extremely well tuned (both forward and reverse).

Cool. I spent a while today working engine replacement selection into both the scripts and the C sources so that no longer have to duplicate so much.

Right now running all frequency distributions for full-period triplet set. Which presumably is same set as Xoroshiro32 engine. I haven't bothered to sort for compare. EDIT: Confirmed to be identical full-period triplet set.

## Comments

10,423280The question is: how many iterations does it take for the output to return to ~8 bits set, on average?

You can see that the original has recovered by the 4th iteration, whereas the Not has recovered by the 3rd iteration. But it doesn't end there...

The Not requires a klunky output xor or subtraction to restore the missing zero, which I did not apply to those results... and this is where it gets interesting:

Once an output subtraction is applied, all escape-from-zero-like issues disappear, and therefore should not reappear anywhere in the stream (except as would be statistically normal).

That is the only benefit I see with the Not code, and my further research shows it likely that any overall improvements for possible candidates will come from the use of the '<<', rather than the '>>' originally proposed.

10,4231,625Yes, of course, either subtract or xor could be used to 'zero' the output. I don't know whether there is any difference in PractRand scores. There is a big difference in P2 code because the 32-bit PRN is composed of two consecutive 16-bit outputs and xor'ing is much easier, just xor both words with the same value. Subtraction would not be nice as there is carry/borrow to worry about between the two words.

I was wondering about how Not would work with the existing xoroshiro algorithm and perhaps the best name for the Not engine is

xoroshiron? In terms of logic in the chip adding Not is nothing.10,423It doesn't seem like it is producing anything worth naming. I've started a grid run for those top 13 candidates you've listed.

10,42310,4231,625There might be a slight increase in randomness, which would be a gain.

10,42310,423280With the normal xoroshiro (and other LFSRs), the issues discussed were:

1. Binary matrix, linear complexity and Hamming weight/distance, all solved very adequately by applying the ++ scrambler.

2. Sparse state recovery (aka escape-from-zero), not solved fully by applying ++ scrambler, but solved completely with Not combined with zero correction.

3. Missing output value (usually zero by convention), to my knowledge not solvable in any LFSR without invoking a full state operation (which I have been researching).

All of the above has been of minor consideration for most applications. Normally, only a purist with strict requirements would worry about such things.

Whether or not any of this is more or less relevant when discussing smaller state generators is debatable.

I guess it would be good to know, one way or another.

At this point, change would likely be justified only by a very measurable statistical improvement, which might occur only with left shift, rather than right, if at all.

10,42310,42310,42328028010,42310,42310,42310,423EDIT: They're also the most prolific rankings too, so not a huge benefit.

10,423Are we wanting to do double iterated (32-bit outputs) distribution and scoring runs of this first algorithm? Or move onto some of the other modifications instead?

1,625Thanks for the tests, Evan. [2,2,7,2] and [13,3,6,4] have the best distributions, although I haven't looked at the also-rans based on pfreq.

If you're willing to do it, I think we should test Not with xoroshiro32++ [a,b,c,d] because then we could see what difference it makes. Pair frequencies first and then PractRand tests, maybe.

280280I believe that '>>' would maintain the same weight if you remap 16-A and 16-C to his doc (B stays the same), and the exponents would be recalculated as x^(32-exp).

For example, '>>' candidate 2,3,3 would become (16-2),3,(16-3) if '<<'

From Seba's list ('<<'): 14,3,13 17 x^32 + x^30 + x^28 + x^21 + x^20 + x^17 + x^16 + x^13 + x^11 + x^10 + x^9 + x^7 + x^6 + x^5 + x^4 + x^2 + 1

Which translates to ('>>') 2,3,3 17 x^32 + x^30 + x^28 + x^27 + x^26 + x25 + x^23 + x^22 + x^21 + x^19 + x^16 + s^15 + x^12 + x^11 + x^4 + x^2 + 1

As I recall, Seba uses the Fermat Computer Algebra System (which I have only recently taken interest in) to calculate and export these polynomials.

I am just reversing the constants and exponents in my head (which is admittedly unreliable) for '>>'.

280In my own further tests with PractRand, in addition to forward testing, I also automate the reverse of 32-bits at a time to review the output backwards.

The reason I do this is that PractRand test are optimized to find common faults, so are not as sensitive to certain issues with high bits on some tests and with low bits on other tests.

The '>>' in several cases I have looked at triggers premature (pesky) FPF failures (at 32MB, for example) only on the reversed bits.

This is an uncommon fault that PractRand is unable to notice early on with the forward bits.

Practically speaking, FPF issues with the reversed bits should not present a problem, 'as one would normally not use random numbers in a way where that matters'.

We have heard that statement before (when the issue is endemic and cannot be easily corrected).

However, in this case, I noticed the issue seems to often dissipate completely when flipping the shift back to the left, yet I still see some further improvements overall with keeping the 'Not' present.

I am not implying that there are no good right-shift candidates, just that you might have to be extremely careful in vetting them, as the current XORO32 is extremely well tuned (both forward and reverse).

10,423First 20 output values of candidate [13 5 10 9]:

EDIT: Removed full-period data. Made a mistake with compile directives.

1,62510,423Right now running all frequency distributions for full-period triplet set. Which presumably is same set as Xoroshiro32 engine. I haven't bothered to sort for compare. EDIT: Confirmed to be identical full-period triplet set.

10,423280Xoroshiron32 ('<<') constants for seed and output correction attached.

Seed pseudocode:

S[0] = {non-zero seed 0} ^ {Seed S0 xor};

S[1] = {non-zero seed 1} ^ {Seed S1 xor};

Output pseudocode:

rnd = x + rotl(x+y,D) ^ {Output Xor D=#}; // Could be minus instead of ^

Let me know if you see any issues.