Welcome to the Parallax Discussion Forums, sign-up to participate.

- 101.5K All Categories
- 812 Announcements
- 49 Propeller Code
- 21 PASM2/Spin2 (P2)
- 4 PASM/Spin (P1)
- 13 BASIC (for Propeller)
- 61 Forth
- 10 C/C++
- 2.8K Propeller 2
- 27.6K Propeller 1
- 18.9K BASIC Stamp
- 9 micro:bit
- 21.1K General Discussion
- 2K Learn with BlocklyProp
- 8.2K Robotics
- 124 Customer Projects
- 3.3K Accessories

## Comments

1,803I've been looking at the final pfreq distribution and ignoring individual pair outputs. Just to clarify, the lsb issue is that the xored bit has lower quality than if no xor?

Another issue for me with xoring the high word is that the low and high words are treated differently, with only the latter modified within the same pair. I think it might be better to xor the low word instead, but my current preference is adding 1 to the low word so that both words are one more than they would have been otherwise.

The jump action then becomes clearer: rather than jumping after every double-iteration period, the jumping occurs after every double-iteration, thus the streams are interleaved not sequential. It is jumping that produces 2D equidistribution.

(This will be obvious to you and is intended mainly for others who might be interested.)1,803304It is fine to modify the low word (instead of the high), but my testing has indicated that adding 1 (instead of xor) propagates in a way that exactly repeats every other word in the other half of the next stream. I will post my exact findings shortly.

1,803I prefer modifying the low word and I find it easier to visualize what is happening with +1, but if ^1 is better then no problem as changes to software or hardware are trivial. As mentioned, I haven't been studying pair values to spot undesirable patterns.

1,803Initial state = 1, initial acc = 0

+1 or ^1 applied to low word of acc

304Edit: Likely not necessary, as I verified exactly against your full-size output.All but the first two exampleshave the following remarkable distribution properties:Edit: Various changes for correctness and clarity.304304Forward: Similar forward failures as compared to xor 1.

Reverse (16-bits at a time): The reverse failures are remarkably fewer as compared to xor 1 (which logged nearly 150 failures at 32GB).

1,803Inverting about half the bits of prn before the low accumulation looks optimal, based on nothing more than a gut feeling, and nine seems better than eight for a 16-bit output, which is true for 27D3h = 0010011111010011b. It does have a run of five 1s and is not prime, though.

Are a few more PractRand tests worth doing?

5E2Dh = 24109d = 0101111000101101b is prime, has nine 1s with a longest run of four and is very close to 65536 * 1/e (the expected number of 16-bit outputs that occur once or never for a random generator with period 2^16-1). A couple of other, more control-type xor values are 00FFh and 5555h.

I used QuickBASIC, not the P2 code I posted!

I intend to check xoroshiro16++a with xor value of 115d, on Friday with luck.

304comparatively horrible, BTW), and several things occurred to me while looking at the results:1. The low nibble of the xor value drives the forward failures, with perhaps 3 or 5 being optimal.

2. Perhaps need to use a better seed set (i.e. I suspect than 1/0/0 is not representative enough).

3. Should force-test forward up to 32GB, and reverse up to 64G, in order to provide more data points.

1,803Thus, low difference = x, high difference = x or x-1 and correlation between periods is obvious. I tested many values of x using xor and replacing it with add produced same results for the one or two I tested. initial pair values for each stream.

304In the the last of the example outputs I gave above: ... It is clear that, for example Abs(FD-27)=Abs(8A-B4).

The only questions are related to how desirable it is to have the correlation appear less obvious, and will it affect statistical use case scenarios?

If the answer to either is 'yes', then xor seems to be the better choice.

For example, is a single or double stream actually more random, since the flipped bits produce new carries into ACC?

The answer seems to be 'yes' when looking at forward PractRand results at the 8GB mark where the xor >1 seems to help completion of the single stream without any suspicious results.

However, my original statement on values >1 still partially stands (though could be more broadly applied to using sum, as well): Regarding xor values, I have attached a file of results that gives a clue about PractRand's statistical power.

Spoiler: 0x3725h would be the best constant I have tried (found by using an iterative technique to converge on a seemingly good value).

304I am using the arbitrary seeds (s0 = B49Fh, s1 = 888Eh, acc = 945Bh) for this and re-runs on +1 and ^1, which I will add here and summarize, for comparison.

Tony, given everything we have looked at, let me know which additional data / summaries / BigCrush results, etc. that you need on any specific + or xor constants.

Added more variants using above seeds.1,803Chris, thanks for the PractRand results.

I can't replicate your outputs for xoroshiro16++a when xoring 73h and I see the same correlation between every adjacent stream, for 73h or any other value I tested.

In the P2 hardware, there is enough time for only two cascaded 16-bit additions, where the result from the first adder is used in the second. +1 could be done by add with carry then add, but adding large(r) values would take too long. In contrast, xoring means just inverting some of the first adder inputs and any value would work.

It is clear that +1 or ^1 are enough to hide the 1D equidistribution from PractRand so we get maximum scores, which we never achieved with xoroshiro32++. In theory, all the failures should be 16GB and I'm not sure why some are 32GB.

Adjacent streams that differ by more than 1 make me feel better, even though they are just as correlated. I suggest BigCrush testing with +1 (but no more +), ^1, ^5E2Dh and other ^ you feel are worthwhile. That should be enough to finish ++a.

304That is part of the reason why I consider the 8GB passing results for all variants on fwd/rev and rotations so important as a catch-all for pushing forward to BigCrush (which must find at least some issue(s) with any given variant, I presume). I agree... will do.

1,803I didn't notice s0 & s1 were swapped from the usual seed of 1. Also, my x86 code was calculating PRN after the state iteration. Both now fixed, I can match your data and see the problem. The period is 65535 * 2 as we are dealing with pairs and the start of your third stream is actually start of the second.

304I need to write an accurate XOROACC module for BigCrush... must be XOROACC, since the 32-bit format is required.

Actually, the analysis is mostly 30-bit, which will certainly put an interesting spin on most of the results.

Created and running attached '+ 1' code, following Lemire's convention for testing.Note: There is no mention of ++a, as the transition directly from ++ to XOROACC is cleaner to code in C, as no xor toggled state variable is required to apply e to every other ++ iteration.

304Lemire's naming convention: Any plot result that is entirely within +/- 4 Sigma, with the vast majority of points within +/- 3 Sigma, is considered normal and acceptable...

^1 and +1 did not pass BigCrush, with +1 clearly being the better of the two.

^3725h and ^5E2Dh did pass BigCrush, with ^5E2Dh being marginally better (enough to be recommended).

1,803Many thanks for running the BigCrush tests. The 48 logs are for 3 tests x 16 rotations? The filenames are rather cryptic. xoroshiro32++[13,5,10,9] does not pass all of BigCrush. I'm pleased that ^5E2Dh does and is the best of those tested.

I haven't looked at '2D-8' for xoroshiro16++a yet. Another thing I'm wondering about is how many iterations must be done to output all possible acc pairs.

304, which requires a minimum of about 35 or 36 state bits (if all bits are used optimally)..I ran 16 seed sets (provided by SplitMix and starting seed documented in the filename) for each of forward bits, reverse bits, and reverse bytes, in order to look for p-value bias... so no rotations.

With about 30 of the p-values, I have applied corrections (to compensate for defects in TestU01). However, only a few of those corrections would be required when looking at only 48 BigCrush runs.

When processed through meta-analysis

(required to compete with PractRand), abnormal bias was noticed in only a few of the passing p-values, mostly when the bits were reversed.P-value # 50 (of 254 total) was the most notable offender, but only when bits were reversed: sknuth_CouponCollector: N=1,n=200000000,r=0,d=8

If you feel you must steer other aspects of the distribution (without disrupting the existing equidistribution

or randomness quality) and find you cannot do so when trying other odd constants, then try this:1. Add 1 to the second xoroshiro iteration result AND

2. Use an even xor constant on the first xoroshiro iteration (e.g. 5E2Ch)

I am happy to run a few more PractRand/BigCrush, if you have specific changes / constants in mind.

304remember that the lsb is the master clock, and now that lsb has the highest linear complexity, so its carry is no longer discarded. Have a look at the results: FPs disappeared from forward XOROACC, but popped-up on reverse XOROACC... better overall, though.Then I got bold and tried the same reverse-result-bits experiment, combined with using the methodology described in my previous post, with 5E2Ch (an even xor value) on the first iteration and +1 on the second iteration: Suddenly all FP issues at the failure point for the forward and reverse XOROACC bits have nearly evaporated (though one or two 'unusual' might 'randomly' pop in at ~5e-5 on either forward at 16GB or reverse at 32GB, with some initial seeds).

This leaves only the endemic GAP issue (which likely has no simplistic solution).

Sorry to drag this out... but often these kinds of seemingly incremental

(pun not originally intended)improvements positively impact other aspects of the distribution properties that you may find desirable.AttachedPractRand rotations and will attach BigCrush results of the latter modification that uses [reverse ++, e= ^5E2Ch, f = +1.]Edit: BigCrush results on these variants is a step backward, so will not post.Will do reverse ++, e= ^5E2Dh, f = 0 also... Attached, and also looks very good (e.g. the zip file is even smaller, indicating fewer total failures).Edited multiple times for clarity (and humorous effect).304Accumulate forward bits of xoroshiro++ result Xor'd with 5E2Dh on first iteration (i.e. normal) and then accumulate reverse bits of xoroshiro++ result on second iteration. Same equidistribution properties.

I will post all PractRand and BigCrush results on this variant tomorrow, then I am finally done playing with this.

Edit: PractRand results posted... looks very good!Edit: BigCrush results posted... looks good, with nearly ideal failure rate (i.e. about half of all runs on a good PRNG should show 1 fail). However, meta-analysis shows p-value #50 bias is now visible in both forward and reverse bits (but not reverse bytes).1,803I haven't had time to look at this yet for xoroshiro16++a, perhaps over the weekend. I don't have enough RAM to test xoroshiro32++a[13,5,10,9] but is it possible that different variants might need different numbers of iterations before all 2^32 outputs occur? If so, the lower the better?

304Edit: My intuition says: If 1 full stream is 2^32 double-iterations (16GB), then each stream should accomplish 50% coverage of the unsampled 32-bit values, which should deplete, on average, by the 33rd stream. Therefore, in my example above, complete coverage of 2^17 specific 32-bit values should occur in about 18 streams.My intuition is obviously wrong... decimation of unsampled values is by 1/e, so complete depletion should occur by about the 23rd stream, so therefore complete coverage of 2^16 specific values should occur in about 12 streams.

I tested this logic out on the small version and it works as expected (at about 12 and 6 streams respectively, in that case).

304The PractRand results have no mod3n artifact at 32GB fail on Fwd00, and the BigCrush meta-analysis is the first one I have found for f=Revbits where the p-value #50 bias is only on reversed bits and is minimized to the point where it does not manifest in the full meta-analysis of combined forward, reverse-bits and reverse-bytes.

This might be a keeper... too many primes to test for e. Final hex digit of 3 or 5 seems to minimize most artifacts, if I were to try more.

1,803Rather than one word constant, I think the 64K outputs with both words the same would be better, as it would test the immediate effect of the xor and bit reversal in all of these outputs. I'll make some time this week and also check the equidistribution of low and high bytes separately for xoroshiro16++a to see whether they are the same. Some time ago Evan ran some PractRand tests of xoroshiro32++ with every other word skipped, which is how XORO32 or XOROACC might be used.

304Edited for clarity.1,803It's possible the end user might ignore the low or high word of every XORO32/XOROACC output and the randomness of this could be tested by skipping every other word in PractRand or BigCrush, 2^33-2 iterations producing 2^32-1 used words. Instead of being interleaved, the even and odd words would be separated by a full single-iteration period.

304This makes some sense, as every other word benefits from the bit-mixing of the unused word, but still races twice as fast toward failure. However, I would expect BigCrush would notice more issues with the non-revbits version that PractRand somehow missed.

Edit: Instead of BigCrush, I ran gjrand on 32GB of the various combinations of both/high/low words, forward/reverse output bits, and forward/reverse 2nd iteration using ^5E13. Results attached, w/readme.

There really is not much very exciting (i.e. statistically differentiating) going on with any of the results of gjrand or BigCrush, other than the conclusion that only PractRand seems to be aware that an improvement occurred (within its statistical purview) with GAP on 32-bit outputs (regarding only the lsb) when only the 2nd xoroshiro++ iteration result bits are reversed. My best guess is that reversing the bits of the 2nd iteration is a good thing (beyond the lsb improvement), but it is also a fine point that would not necessarily be relevant in a purely software implementation.

Additionally, by random trial I have demonstrated that there are some odd values of e that might be very slightly better than ^5E13h, but many more that are substantially worse.

Some of those e values are so bad they cause abort issues to occur in PractRand pre0.95 (and might actually trigger a NaN lock-up in 0.94), which I am beta testing only in the random e trial.

Edit: Readme file in attachment mistakenly mentions (twice) 'reversed before xor', which should instead read 'reversed before sum'.304Using ROL16(xoroshiro32++, 1) instead of RevBits16(xoroshiro32++) in the second iteration would accomplish the same goal and be more explicitly compatible/performant with a software only implementation, if desired.

More later, plus I will attach high and low word rotations once I have re-vetted some of the xor constants using this approach, which will take some time.