Welcome to the Parallax Discussion Forums, sign-up to participate.

- 101.5K All Categories
- 812 Announcements
- 53 Propeller Code
- 23 PASM2/Spin2 (P2)
- 5 PASM/Spin (P1)
- 14 BASIC (for Propeller)
- 61 Forth
- 10 C/C++
- 2.8K Propeller 2
- 27.6K Propeller 1
- 18.9K BASIC Stamp
- 10 micro:bit
- 21.1K General Discussion
- 2K Learn with BlocklyProp
- 8.2K Robotics
- 124 Customer Projects
- 3.3K Accessories

## Comments

304However, a good random sequence should certainly not always produce the ideal for each independent trial, but should produce values that have a certain deviation (MAD and/or Std) from the ideal.

Since the chi^2 throws away the direction of bias, you could have seemingly good results, but all biased in one direction, which would be inferior to the same chi^2 value based on results that had no directional bias.

Hopefully the xoroshiro128** results will shed some light on this so it can be codified at the 16GB mark as an additional statistic for consideration.

I will try to do 30+ trials (and prefer to use StdDev).

304Average frequencies chi^2 total = 0.2 (very close to zero, as I predicted)

Average chi^2 frequencies total = 11.8

12.5(lowest value seen was 4)StdDev chi^2 frequencies total = 5.3 (values from 0-2 above should occasionally be seen, based on this and the above total)

For comparison, here are the pfreq(x) stats for 32 independent trials of xoroacc_rol7^AD65h_revbits(rol7):

Average frequencies chi^2 total = 2.7 (not as close to zero as I had hoped, but very respectable)

Average chi^2 frequencies total = 15.7

14.3(lowest value seen was 4)StdDev chi^2 frequencies total = 5.3 (values from 0-2 above will rarely be seen, based on this and the above total)

I have attached the ods files, just in case the above is not clear.

This should set some clear targets for other variants, and I will process the zfreq and nzfreq data when I have a chance.

There was a bug with the binary data when migrating to Windows (i.e. some of the files were larger than expected), which I have just fixed (by opening files with wb, instead of w, which is now cross-platform compatible).

Edit: Corrected 'Average chi^2 frequencies total' above, and re-attached corrected ods files.304So the 12.5 total for xoroshiro128 in the previous post, would make perfect sense, as ideal pfreq(12.47) = 1.00 and sum of good PRNG pfreqs 0 to 13 =~12.47.

I am not sure if that is an exact expectation (probably documented somewhere), but the values I see would bear out the idea, as well. Therefore, in order to combine the three sums: Average(pfreqchi^2sum / 12.47, zfreqchi^2sum / 21.26, nzfreqchi^2sum / 44.00) = 1 for an 'ideal' PRNG.

The range 0.5-2.0 would be typical. For 32 runs of xoroshiro128** high 32 bits, I got:

Min = 0.54 from Average(0.49, 0.41, 0.70)

Max = 1.85 from Average(1.29, 2.34, 1.91)

Avg = 1.11

Edit: Improved formula, added more notes, xoroshiro128** stats and added quote.Edit2: The ideal divisor values for zfreq=21.26 and nzfreq=44.00 might be slightly wrong, since x=0 is not included in the sum. Xoroshiro128** averaged divisor values are zfreq=23.74 and nzfreq=53.33.However, they are close enough to allow for rational comparisons between pfreq, zfreq and nzfreq chi^2 sums.

Edit3: Attached 3-page 'xoroshiro128** high 32 bits' ods file which shows all pfreq, zfreq and nzfreq data supporting the above.1,808http://onlinestatbook.com/2/chi_square/distribution.html states: Are the few nzfreq(51)=23 values for xoroshiro128** due to the larger state?

If you have time, could you find pfreq/zfreq/nzfreq for xoroacc ^1, no bit reversal when ^AD65H is finished?

304For the moment, let us assume that it is just random noise that had to appear somewhere. In my experience, deviations like that (in non-chaotic PRNGs) can only progress with state size: XORO32 < XOROACC < xoroshiro128. I am working on automating many of the variants by including passable parameters for 'e', rol() and revbits()... so this simplest variant 'xoroacc^1' would be e=1, rol=0, and revbits=0. Since xoroacc+1 is likely superior to ^1, I will have to run those variants as a separate set, but will do ^1 and +1 first.

Initial evidence is that there will be less distinction than expected between most variants at one double-iterated stream, so it will not be as useful as I thought it would be for an 'e' search. Still need to confirm.

I believe that pfreq, etc., performance at 12 to 24 sequential streams will be where most of the distinction will be. But it was never originally my intent to put too much weight into it until I saw how compellingly well some variants perform at it.

Ultimately, weighting factors will have to be created to balance 1 stream freqs, multi-stream pfreqs and PractRand rotation results. Gjrand at 16GB and 32GB seems useful, as it produces fairly easy to understand result... perhaps useful for cross-check or tie-breaker.

Distracted right now, as my car has failed me in a way that it will cost more to fix than it is worth, so I may have to dig into my new server funds to handle it. I will console myself by noting that TSMC's forthcoming 5nm process (and then 3nm that follows) will ensure that whatever I could buy now will be obsolete in a matter of 12-24 months.

304I have much of the simple XOROACC variant data complete, just need to cross-check and package it.

304I have the chi^2 sum data that I automated within the c++ code, but it is not entirely correct and needs to be redone (so I did not attached it).

I finally realized that the chi^2 set formula for multiple independent streams must be: (sum(actuals) - expected * actuals_count)

^2/ (expected * actuals_count)Also been studying Yate's Correction for Continuity, and the impact of misuse of Degrees of Freedom. I think we are good with the latter (which is one of the best explanations on the web), but the former might explain the excessive chi^2 value for the occasional nzfreq(51) hits on xoroshiro128**.

Edit: Fixed multi-stream chi^2 formula1,808I agree it would be better for the baseline to have a ++ scrambler, not **, however I doubt xoroshiro128++ would ever be used on the P2. The more practicable (but considerably slower) software alternative to a hardware xoroacc32 is xoroshiro64++ and it would be fairer to compare these two.

It's interesting that the xoroshiro128** in the P2 rev B needs less logic than xoroshiro128++, which is why we skipped the latter when upgrading from xoroshiro128+ in rev A.

304I ran those over again, and more variants, with the fixed chi^2 multi-stream sums: Keep in mind that having an average ~7.5x greater than xoroshiro128 in 45 double-iterated streams is good considering that known (mostly GAP and autocorrelation) defects exist just after 1 stream (16GB, which is the absolute maximum rating for this PRNG for most stringent applications, or

perhapsonly 8GB for quality hard research-level work).For comparison, here is the recommended variant at 5 streams:

30416GBstreams are run.If it were possible to implement, then the truncated form of XOROACC would also be easy to implement: acc = acc + rol(s0 + s1, D) + 1

That would have near-perfect 2-dimensional 32-bit equidistribution

(and fully 1D), but no 64-bit equidistribution(unless the +1 were applied every other iteration).Much harder to vet constants for such endeavors

, but using Seba's recommended ABC would only require searching for best D constant.I'll run a few hundred on xoroshiro64++, if I can guess a suitable D constant (since I don't think one is published).Edit: Added clarifications.Edit2: I notice xoshiro128++ (32-bit output, like xoroshiro64) uses D=7, which is likely a good starting point for xoroshiro64++.304Much lower averages seem statistically possible. An N=45 average of 4 or less would be a target goal.

I will write a program that will search for all 16-bit primes that have runs of 1 or 2 consecutive 0 or 1 bits, which seems to me related.

1,808A couple of earlier posts that might be of interest:

xoroshiro128++ vs. xoroshiro128**

http://forums.parallax.com/discussion/comment/1448782/#Comment_1448782

xoroshiro64++

http://forums.parallax.com/discussion/comment/1449224/#Comment_1449224

Seba's suggestion for what we call constant d for xoroshiro++ when state too large to iterate fully:

If output width = w, then d ~ w/2, in theory prime number is better, consider w/2 +/- 1 first.

However, this is just a rule-of-thumb and "'not set in stone."

304xoshiro128++ threw me, since it does not follow that logic.My guesses are that the use of 7 in that case:

1. Is a safe minimum value for masking binary matrix / linear complexity issues.

2. Minimizes correlations with the other shift / rotation constants in use.

3. May be more specific to 'xo' rather than 'xoro'-shiro to improve bit mixing (due to 4, instead of 2, state variables).

I had actually thought about exploring an xoshiro32 version of xoroacc (or the truncated version), with a period of 2^40-2^8 (using 5 8-bit state variables).

It could be iterated fully to perhaps help provide answers to some fundamental questions.

304Obviously more complexity would be useful for the 48-bit state versions to work around the comparatively smaller size. Here is what I found:

1. The smallest / fastest version (based on '+') that I am testing with larger states is still about 8x more random at 48-bit state than xoroshiro32++ (and has the added equidistribution and sparse-state-recovery benefits). Since a basic XORACC32 based on '+' is nearly identical in hardware implementation complexity to XORO32, it is unfortunate that this scrambler variant was not previously known.

2. The best version for maximum randomness at any state size (based on the '++' that we have been looking at) requires an additional xor of 's0' against the ACC output before subsequent addition (which can be done in parallel with the rotl(s0+s1) before final addition). This in intriguing, since it forces multiple collision carries with the 'x0' in the ++ part, that I originally assumed would lead to less randomness, but was wrong (obviously so, after observing forward PractRand fail

well afterthe 32GB mark).3. None of the versions I am testing or recommending have an xor/addition constant greater than 1 (+1 is currently recommended, per Tony's original intuition), since large values increase complexity without fully solving the base issue with multi-stream correlation.

4. When creating a double-iterated version for maximum randomness (~32x that of XORO32, and including the addition of near-perfect 32-bit equidistribution), the modification described in #2 coupled with rotl(XOROACC,1) on the second iteration is the best. Revbits(XOROACC) no longer becomes worth considering (also because it created headaches for mathematical analysis and a slower software implementation).

I'll post test code for some of the above variants at various state sizes after I have finished further evaluations.

12,207304In summary, we are not exactly discussing replacement, but if we were, then in that context

(at least 1 of)the XORO32 output registers would become bi-directional, thus not technically using any more state than currently used by XORO32. (Then an explosion of benefits ensues which initially were difficult to believe).304programmaticmechanism exists that I am aware of to prove the loss of random quality that is apparent at 16-bit.30412,207Chip may be able to indicate how hard that would be to do.

12,207304Edit: Improved code execution speed by deferring state[2] reassignment through use of tmp variable.Edit2: Better documentation added here.12,20712,2071,808Modified XORO32 logic is needed in a future P2 or P3 to provide a new XOROACC function, yet only two instructions will be required, the same as now, as follows:

Some of the existing XORO32 adders could be re-used for XOROACC, probably, saving logic.

result is the new 32-bit PRN result, made up of two separate 16-bit accumulations in a similar way to the current XORO32 PRN. The high word of result is used to generate the next result, thus it must not be destroyed and the state is now 48 bits. The extra state bits mean the overall period is larger, consisting of 2^16 streams of 2^33-2 XORO32 iterations. Each stream is correlated with every other stream, with constant differences between outputs for corresponding iterations.

Instead of one stream following another, there is a jump to a different stream after each xoroshiro32++ double-iteration. These jumps produce near-perfect 2D equidistribution, i.e. each possible 32-bit result occurs almost equally over the full extended period. With or without jumps, there is perfect 1D equidistribution, i.e. each 16-bit word occurs exactly equally, including zero. Compare this with the current XORO32, which has almost perfect 1D (16-bit zero occurs one time less than non-zero outputs) and a good approximation to random 32-bit output but nowhere near 2D (e.g. 37% of possible long values never occur).

Chris, thanks for the update. Just to confirm, are you saying that +1 and rotl(XOROACC,1) are acceptable with xoroshiro32++ and anything more complicated is unnecessary? If so, then the +1 could be implemented as a simple add-with-carry.

Note:

This extra +1 every double-iteration is enough to cause a jump to another stream and these jumps are crucial in achieving near-perfect 2D.

12,207304All previous code that Tony and I discussed was based on the original ++, which is about 2x-4x more random than + only. Therefore, we had not discussed altering the underlying ++ to drop the '+ s0'.

Recently, I found that keeping the underlying XORO32 (with its ++) could be improved further by xoring the extended state with state[0] prior to adding the original scrambler +1... but this assumes access to state information that is no longer available from XORO32. Therefore, it would have to be embedded in the base scrambler, which was not on the table as an option. Interesting though. Yes, for the most part. That gets us up to 16GB randomness and acknowledges that any minor attempt (e.g. xor 0xAD65h instead of +1) to mask issues going beyond that is denying the issues with subsequent double-iterated stream correlation that only 'looks' less correlated.

You can optionally add the +1 to the second iteration in addition to the first (if that consistency were somehow advantageous), but it must be prior to the (now required) rotation, else it breaks the 32-bit equidistribution properties. I don't recall comparing the randomness directly with and without the +1 in the second iteration when the rotation is used. Reminder that the second iteration rotation was used to break the early GAP issue and improve overall randomness

from 8GB up to 16GB.30412,207My takeaway is the existing XORO32 is still about as good as it can be without taking longer to execute or using more state store. Probably both.

304Here are the PractRand results for a comparable double-iterated + variant using the rotl(xoroshiro32prol,1) on the second iteration (for which you can extrapolate the 32-bit word variant properties): Notice the fail is very gentle... it could have just as easily failed at 32GB with different seeds.

Chasing the rabbit down the hole with more complicate code using ++ will not extend the forward failure beyond 32GB (though I recall the reverse failure is extended to 128GB): All of these variants are at least 16x more random than XORO32 (and all with the superior equidistribution properties).

12,207