Welcome to the Parallax Discussion Forums, sign-up to participate.

- 101.5K All Categories
- 812 Announcements
- 51 Propeller Code
- 22 PASM2/Spin2 (P2)
- 4 PASM/Spin (P1)
- 14 BASIC (for Propeller)
- 61 Forth
- 10 C/C++
- 2.8K Propeller 2
- 27.6K Propeller 1
- 18.9K BASIC Stamp
- 9 micro:bit
- 21.1K General Discussion
- 2K Learn with BlocklyProp
- 8.2K Robotics
- 124 Customer Projects
- 3.3K Accessories

## Comments

1,8081,808During a complete result period of xoroshiro16+++, every 8-bit result occurs 65535 times and every 16-bit result pair occurs 255 times for 256 pairs and 256 times for 65536-256 = 65280 pairs. Equidistribution is perfect for 1-D and not quite perfect for 2-D (because the period of 65535 * 256 is not exactly divisible by 65536).

Thus the intrinsic 2-D equidistribution that is destroyed by the adder scrambler can be restored (almost).

304(or 2^8 for small version).1,808Changing the initial value of result acts as a type of jump function and I have used

values other than zero, e.g. 255 for xoroshiro16+++ so that the state periods start with 255, 254, ..., 1, 0.

The pfreq distribution for one state period of xoroshiro16+++ appears to show a significant overall improvement compared to the full (state) period of xoroshiro16++, for the small number of [a,b,c,d] candidates I looked at.

13,6881,808Yes, a xoroshiro16+++ is very fast to test and that's what we have been doing. A xoroshiro32+++ would take 65536 times longer than xoroshiro32++, which is too long to be practical for the whole period but smaller versions could be tested to confirm the principle.

xoroshiro+++ consists of an

enginethat iterates the state, ascrambler(in this case a double addition) and a newaccumulatorwhich can be initialised to any value. If the state width is w bits, then the period between state and accumulator repeating their initial values is (2^w-1)*2^(w/2). For w = 32 as in XORO32, this is (2^32-1)*2^16 and instead of a single period of 2^32-1 there are 65536 such periods that begin with the accumulator one lower/higher than the period before/after.I'm not sure whether there is any advantage in using xoroshiro32+++ over xoroshiro32++ for single 16-bit outputs but there is for paired 32-bit outputs. XORO32 longs are a good approximation to ideal randomness; instead of all values occuring once during a state period, some occur multiple times and others not at all. In fact, 1/e or ~37% of the possible outputs never happen, ~37% once and the remainder twice or more.

In contrast, xoroshiro32+++ outputs all 32-bit paired values the same number of times (2-dimensional equidistribution) or very nearly so. Extrapolating, 65536 values should occur 65535 times and 4294967296-65536 = 4294901760 values should occur 65536 times.

304All of the ground-work was laid out with the current implementation of xoroshiro++.

My opinion is that it is the best general purpose PRNG available, thus worth exploring in further detail.

There are some minor deficits that leave a little room for improvement, which I had previously put considerable thought into, but was never able to simplify to be potentially useful until recently.

All of the current discussion is involving a general purpose equidistibution recovery extension that is a simplification of my original experiments, increasing code complexity by only a small amount.

It has some merit on multiple non-trivial points, with only a few limitations that are seemingly unavoidable without moving to even greater state and/or word size... where the same logic could be applied, as well.

1,808Another thought: if the accumulator (acc) is reset after two state periods (one period of paired outputs), the pair frequency could be closer to the ideal random binomial distribution than with plain XORO32, based on brief testing with 16-bit states. The initial value of acc does not matter, but would the resetting create a discontinuity?

304The only other possible way I have found is to xor mask s1 (s1 = s1 xor M) prior to the scrambler (yielding a non-correlated stream), where M possibly be updated periodically. Does not work when previous result is summed with new result.

1,808vice-versa. Therefore only one state iteration is required per long due to this result repetition.XORO32 does two state iterations of xoroshiro++ to generate a long. An option for xoroshiro+++ as a software extension of XORO32 is that only one word from the long could be used, twice, and the other word ignored. At the end of the state period, which is an odd length, the halves would swap over so that all the outputs would be used in the end. Words output by successive states would be the entire state period apart, i.e. all the high words would come after all the low words or

vice-versa. Tests show that all possible long outputs can be generated but the output is not equidistributed. If xoroshiro+++ uses both words from XORO32 in succession only once the output is similar, i.e. no equidistribution.Was the PractRand testing done with repeating and skipping, or was it a continuous stream?

EDIT1:Updated second paragraph after more testing.

EDIT2:Second paragraph obliterated.

304I am not sure I understand exactly how skipping would work, as the entire state (including 'result') is an even period.

I believe I have actually found a simple method of switching streams (that are de-correlated) after 2^wordsize iterations by counting... and maintain the same 2D at a period of 2^32-2-16 2^64-2^32.

However, the counter itself introduces another (very simple, but powerful) 16-bit state variable.

At that point you at, or very close to, needing to just move on to using xoshiro, instead of xoroshiro.

I'll post the code here, after some more study.

1,808Second paragraph of my previous post should be ignored. Do you agree with the first one?

No need for any skipping in PractRand tests but, if I am right in para. 1 above, all values should be repeated for 2-d equidistribution, which probably won't score well.

304small state verison:Each 8-bit value occurs exactly 65535 times. Each possible pair of 8-bit values occurs exactly 256 times.

However, half of the total 8-bit 2D equidistribution is split between the second half of a 16-bit pair and the first half of the next 16-bit pair.

This breaks all equidistribution at the 16-bit level.

This does produce an excellent normal distribution in the 16-bits, but I see no way back toward actual pair equidistribution without either compromising randomness or using 16-bit variables.

The only way I know of to achieve the (near) perfect 1D 8+8=16 pairing with xoroshiro along with reasonable randomness is to use the ** variant, which brings other issues to the table.

1,808It seems we are seeing the same thing. 1/256th of the pair values occur 255 times, as mentioned earlier.

The period is doubled when dealing with pair values and at the midpoint the high and low halves are swapped due to the odd state period (as shown below).

To get pair equidistribution, I use the following sequence

and record the pair frequencies. The sequence could be rewritten as

which avoids repetitions and thus corresponds to the linear stream sent to PractRand. 65534_0 is the midpoint of the double-length pair period and I wonder whether pair equidistribution could be achieved by reloading the accumulator at this point.

304recovery ofthe pairing (yet **, or just plain 'result=s0', does so effortlessly,but missing a 0).In fact the method I thought I had found using 's1 ^ M' will not work either (I think because s0 ^ M2 needs to be done simultaneously).

304Can you confirm what I did above? I will retool and test with larger state...

On the large state, PractRand and SmallCrush easily see through my trick, even though it passes most tests in the latter..I am surprised that SmallCrush did as well as it did (passing 10 out of 15 tests)... considering that 1 iteration of the engine is being used to generate both the high and low values!

1,808I have pair equidistribution working now for xoroshiro16+++.

Before iteration at * the accumulator must be reloaded with the same value as it has before the preceding ^, then this value must be decremented so that at the next * acc is one lower than previous *, etc. Loading acc takes place before the first ^ and every *.

304304Using xoroshiro32++ as an example, I am thinking xoroshiro48p++, or for Seba's original xoroshiro32+ using xoroshiro48p+.

Of the above two variants, for the 192-bit version they can be expressed in C (if done properly) to be at least as fast as the original (using Seba's benchmark).

1,808My code is x86 and I use Windows Debug.com to run it and then examine RAM - not a setup designed for sharing, to be honest.

I suggest proving first of all that this sequence repeated 256 times produces pair equidistribution: After each xoroshiro++ iteration, the current acc is the high word and the previous acc is the low word (or vice-versa) of the 16-bit pair. The order in which these pairs occur does not affect their frequency, therefore the following modified sequence will give the same pfreq: The important point is that acc must have the same value before the second instance of acc/state X as the first instance of X. This means acc must be reloaded at the midpoints, not just initialised once at the start.

The second sequence is the way XORO32 works with its double-iteration and is a lot more random than the first, which repeats each acc immediately.

1,808Before we knew about the ++ scrambler, we used xoroshiro+ and replaced bit 0 with the parity of (s0+s1)

including the carry, i.e 17-bit parity for xoroshiro32+. The quality of this parity is better than any other bit of the sum and we called our new algorithm xoroshiro32+p. However, it had a very short life because Seba told me about ++ soon afterwards.13,688Here is what we currently have for the P2 chip's XORO32 instruction:

304I easily think in x86 and 65xx, since I started on and mastered the latter, and I've been using the former for over 35 years.

Assuming I get that working and tested... I cannot exactly visualize a P2 implementation, but I gather you are thinking cyclical load/store/decrement is no challenge under that architecture (whereas it will induce a performance penalty in compiled C, which is not really a problem since it likely would not be used extensively in that form).

Regarding naming, for non-double-iterated, I will probably do something like 'rs+' for '+', and 'rs++' for '++'. The 'rs' would probably stand for 'result summation' and help ensure clarity about which base scrambler is being used. If that sounds rational (and jumping ahead), then perhaps 'xoroshiro48rsm++' (i.e. result summation modified) might do for what you are thinking.

304I see 512 values occurred 255 times, 256 values occurred 257 times, and the balance (64768 values) occurred 256 times.

PractRand results on normal code with this mod still fail as expected at 16GB of output using [13R,5R,10R,9R,1], but more severe due to the exact swap of high and low words:

Enjoy.

1,808I get the same results. There are 256 pair values X and X+1 such that pfreq(X) = 257 and pfreq(X+1) = 255 if e = +1 or pfreq(X) = 255 and pfreq(X+1) = 257 if e = -1, instead of both being 256. X and X+1 differ only in the low byte. The accumulator seems to be one less or more than it should be sometimes. I think we should be able to get rid of this error because, as mentioned already, the order in which the pair values occur should not affect their frequency. Maybe a particular X with pfreq(X) = 257 should be trapped each time the frequency is incremented and the iteration numbers recorded.

Re naming, I suggest xoroshiro++a for ++ scrambler and accumulator.

EDIT:Added pfreq(X), pfreq(X+1) for e = -1

304Perhaps more advanced decrementing (e.g. flipping between bit inverted pairs: -2 +1... ) prior to reload? I am worried about getting too complex.

No, that does not work.[Edit]Perhaps change the reload point (full state?) to full period -/+ 1 or some variation on that idea? Harder to investigate without using an even smaller state xoroshiro.[/Edit]

1,808The asterisk above shows precisely when the accumulator must be

adjusted(not loaded) near the midpoint. If the state before the previous iteration is the initial seed state, then acc should be incremented/decremented (assuming e = +/-1) to move to the next/previous stream, where a stream is defined as a single xoroshiro++ period with a unique initial accumulator value.To put it another way, if double-iterating do inc acc or dec acc when state equals next one after initial state. This is how it must be done using XORO32, as the initial state value is not returned and is therefore untestable when it applies to the second iteration.

I have tried different small values of e apart from +/- 1 with acc adjustment and they work successfully, including zero. For non-zero values of e and no acc adjust, 2D (pfreq) looks to be a normal distribution. PractRand testing might show that certain e are better than others.

13,688So, to make XORO32 more perfect, we would need the hardware to detect a certain value (that evolves midway through the overall sequence) and then change it.

1,808Of all the random number ideas we've looked at since P2 revB was fixed, I think this latest one is the best. We could improve XORO32 with or without hardwiring a particular state. The first thing we need is an extra instruction that does the accumulation, say XOROACC, which could use the same opcode as XORO32 except the immediate bit = 1 (similar to WRNZ & MODCZ).

Here is how the code might look:

EDIT1:It is better to hardwire the mid-state acc adjustment.

EDIT2:xoroacc operation corrected.

1,808XORO32(existing) double-iterated xoroshiro++XORO32 and accumulator(proposed) double-iterated "xoroshiro++a"word means bits [31:16] and [15:0] only