xoroshiro++ random number generator

TonyB_TonyB_ Posts: 586
edited April 30 in Propeller 2 Vote Up0Vote Down
xoroshiro++ was created by Sebastiano Vigna in collaboration with David Blackman towards the end of 2017 and is an improvement of their xoroshiro+ algorithm. Details of xoroshiro128+, a 128-bit state version of xoroshiro+ with 64-bit output, can be found at http://xoroshiro.di.unimi.it/

Sebastiano told me about xoroshiro++ in November 2017 and at that time it was a candidate to replace xoroshiro128+. Evan (who has done a huge amount of xoroshiro testing using PractRand) and Chip were informed but the details could not be made public. Sebastiano and David have decided a different algorithm will supersede xoroshiro128+ and now we are able to discuss xoroshiro++ freely.

Our tests have shown that xoroshiro++ is a far better random number generator than xoroshiro+. It deserves to be known about and used and that is the main purpose of this post. The story of how various xoroshiro versions have been implemented on the P2 can be read in this long topic:
https://forums.parallax.com/discussion/166176/random-lfsr-on-p2

What is missing from that thread is the xoroshiro++ algorithm, which is the same as xoroshiro+ with two extra steps at the end. The (s0 + s1) sum is rotated left by the number of bits given by a new constant d, then this rotated sum is added to s0 to give the output. The extra rotation and addition scramble the output bits more than with xoroshiro+, thus making them more random. As there are now two additions the enhanced algorithm is called xoroshiro++ and a quadruple of four constants [a,b,c,d] defines it, instead of the [a,b,c] triple in xoroshiro+.

Here is the xoroshiro++ pseudo code:
;{s1,s0} = state (input and output)
;prn = pseudo-random number (output)
;tmp and prn can be the same register

;xoroshiro+
	xor s1,s0		;s1 = s1 ^ s0
	mov tmp,s1
	rol s0,a		;s0 = s0 rol a
	shl tmp,b		;tmp = (s1 ^ s0) shl b
	xor s0,tmp
	xor s0,s1		;s0 = s0 rol a ^ (s1 ^ s0) shl b ^ s1 ^ s0
	rol s1,c		;s1 = (s1 ^ s0) rol c
	mov prn,s0
	add prn,s1		;prn = s0 + s1
;xoroshiro++ enhancement
	rol prn,d
	add prn,s0		;prn = (s0 + s1) rol d + s0

Note that constant d affects only the PRN output and the state calculations in xoroshiro+ and xoroshiro++ are identical. To test xoroshiro++, the same full-period [a,b,c] triples as in xoroshiro+ are used, then the possible values of d tried in turn. The results can vary greatly for different d, even between those that differ by only one.

The P2 XORO32 D instruction does a double-iteration of xoroshiro32++. The state D is advanced by two iterations and the resulting state is written back to D, the intermediate state is not available to the user. The 32-bit PRN output is a concatenation of two successive 16-bit outputs, the low word is the first and the high word the second. This 32-bit value is copied to the S field of the instruction following XORO32.

To save some time, both in hardware and in software if instructions can be executed in parallel, the PRN output can be calculated at the start of the algorithm, not at the end. The XORO32 logic was modified in this way for the final version and here is the corresponding xoroshiro++ pseudo code:
;{s1,s0} = state (input and output)
;prn = pseudo-random number (output)

	mov prn,s0
	add prn,s1		;prn = s0 + s1
	rol prn,d
	add prn,s0		;prn = (s0 + s1) rol d + s0
	xor s1,s0		;s1 = s1 ^ s0
	mov tmp,s1
	rol s0,a		;s0 = s0 rol a
	shl tmp,b		;tmp = (s1 ^ s0) shl b
	xor s0,tmp
	xor s0,s1		;s0 = s0 rol a ^ (s1 ^ s0) shl b ^ s1 ^ s0
	rol s1,c		;s1 = (s1 ^ s0) rol c

After much testing with PractRand the quadruple [14,2,7,6] was selected in December 2017. Following more tests a decision was made in the last few days to change to [14,2,7,5], however this was probably not really necessary. (The free-running hub-based PRNG uses the original xoroshiro128+ [55,14,36] algorithm.)

An intrinsic property of both xoroshiro+ and xoroshiro++ is equidistribution. All outputs occur the same number of times, except for zero. For xoroshiro32+/++, the frequency of every non-zero 16-bit output is 2^16 and the frequency of zero is 2^16-1. A similar equidistribution applies to bytes and bits. The initial state for seeding can be any non-zero value.

xoroshiro++ can be ported easily to other CPUs and languages, for example x86, Z80, C and BASIC. Tests have shown that good results can be obtained even with a 16-bit state, 8-bit output and period of 2^16-1.
Formerly known as TonyB

Comments

  • 14 Comments sorted by Date Added Votes
  • Heater.Heater. Posts: 20,934
    edited March 13 Vote Up0Vote Down
    Would the real xoroshiro please stand up!

    This is very confusing.

  • Heater. wrote: »
    Would the real xorshiro please stand up!

    This is very confusing.

    That's what makes it so random. Even more than before.
  • evanhevanh Posts: 5,125
    edited March 12 Vote Up0Vote Down
    Thanks Tony, Good coverage.

    Heater, ask some questions. It would be good to know if some terms need explained or more fleshing out of the events and decisions.
    "Are we alone in the universe?"
    "Yes," said the Oracle.
    "So there's no other life out there?"
    "There is. They're alone too."
  • Heater.Heater. Posts: 20,934
    edited March 13 Vote Up0Vote Down
    evanh,

    I don't know what questions to ask. Having been fascinated by this topic for some years I have made some observations:

    1) Back in the early days of computing people slapped all kind of operations together, in a haphazard way, in the hope that the output would be kind of random.

    2) As the years went by all kinds of criticisms were leveled at those early PRNGs. They have short periods, they have a bias, and so on.

    3) Then there were guys like Prof. George Marsaglia, creating new PRNGs but also backing them up with some mathematical proofs about their properties. Like the period and such.

    4) Today, with the xoroshio guys, we seem to be back to step 1). For example today I notice they have xorshift1024*φ, which smacks of "Oh, lets throw in a multiply by some weird number to stir things up a bit" without any explanation. I mean, why φ? Why not π? Or e? Or 2 for that matter?

    5) If your PRNG is supposed to have some huge period, like 2^1024 − 1 or whatever, then please show why that might be. It's impossible to test for this. Even if I don't understand the proof it would be nice to know there is one that somebody does.

    And finally, I do wish creators of PRNG algorithms would provide some test data. Some sequence that is generated from a known initial state. It's so easy for the algorithms to me implemented wrongly when translated to other programming languages. Or just implemented badly through not understanding the specification.

  • TonyB_TonyB_ Posts: 586
    edited April 1 Vote Up0Vote Down
    First 32 outputs for xoroshiro32++ [14,2,7,5] when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at end of algorithm after new state generated
    50AD
    B89A
    A3D9
    0C87
    9BD9
    3CAA
    3502
    8840
    3D24
    7287
    71F5
    90AC
    A818
    B4C1
    2CE3
    CC58
    B31D
    DF3D
    B7A4
    A2E3
    3B2E
    2C20
    BFFF
    AFE9
    5B9A
    3BB8
    35F4
    921C
    6A2D
    CD9A
    A217
    207F
    

    First 32 outputs for xoroshiro32++ [14,2,7,5] when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    0021
    50AD
    B89A
    A3D9
    0C87
    9BD9
    3CAA
    3502
    8840
    3D24
    7287
    71F5
    90AC
    A818
    B4C1
    2CE3
    CC58
    B31D
    DF3D
    B7A4
    A2E3
    3B2E
    2C20
    BFFF
    AFE9
    5B9A
    3BB8
    35F4
    921C
    6A2D
    CD9A
    A217
    

    First 16 outputs for XORO32 when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    50AD0021
    A3D9B89A
    9BD90C87
    35023CAA
    3D248840
    71F57287
    A81890AC
    2CE3B4C1
    B31DCC58
    B7A4DF3D
    3B2EA2E3
    BFFF2C20
    5B9AAFE9
    35F43BB8
    6A2D921C
    A217CD9A
    
    Formerly known as TonyB
  • Reserved for xoroshiro64++ code and test data.
    Formerly known as TonyB
  • TonyB_TonyB_ Posts: 586
    edited March 13 Vote Up0Vote Down
    Heater. wrote: »
    I don't know what questions to ask. Having been fascinated by this topic for some years I have made some observations:

    1) Back in the early days of computing people slapped all kind of operations together, in a haphazard way, in the hope that the output would be kind of random.

    2) As the years went by all kinds of criticisms were leveled at those early PRNGs. They have short periods, they have a bias, and so on.

    3) Then there were guys like Prof. George Marsaglia, creating new PRNGs but also backing them up with some mathematical proofs about their properties. Like the period and such.

    4) Today, with the xoroshiro guys, we seem to be back to step 1). For example today I notice they have xorshift1024*φ, which smacks of "Oh, lets throw in a multiply by some weird number to stir things up a bit" without any explanation. I mean, why φ? Why not π? Or e? Or 2 for that matter?

    5) If your PRNG is supposed to have some huge period, like 2^1024 − 1 or whatever, then please show why that might be. It's impossible to test for this. Even if I don't understand the proof it would be nice to know there is one that somebody does.

    And finally, I do wish creators of PRNG algorithms would provide some test data. Some sequence that is generated from a known initial state. It's so easy for the algorithms to me implemented wrongly when translated to other programming languages. Or just implemented badly through not understanding the specification.

    Heater,

    To answer the last point first, I've posted some test data above, with more to come.

    4) Some of the work started by the late Prof. Marsaglia is being carried on and improved by Sebastiano Vigna and David Blackman. I have no doubt that they know what they are doing so you need to be a bit careful what you write - they might be reading! From my personal experience of it, I think xoroshiro++ is very good and yet a forthcoming paper will not even mention it, which means (a) the replacement for xoroshiro128+ must be rather special and (b) it is up to us here to publicize xoroshiro++ because nobody else will.

    There are some rigorous random number tests available and a representation of φ (arguably the most irrational number) would not have been chosen just on a whim. Multiplication scrambles bits better than addition and lots of odd numbers would do a good job probably, but a discussion of xorshift1024*φ and other generators that require a multiply is off-topic here.

    5) The proof that certain classes of generator with n-bit states have a periods of 2^n-1 is not difficult, I understand, for mathematicians anyway. As I've mentioned before, we know that xoroshiro32 has a period of a 2^32-1 because the full period can be iterated in a matter of seconds.
    Formerly known as TonyB
  • Regarding point 5, I was getting the impression it's easy-as to proof every combination. Seba was providing detailed complete lists of full period candidates, which perfectly matched our empirically found list, on the click of the finger. It didn't seem to matter how big the state space was.

    There was some polynomial associated with each listed candidate. Chris may have provided something for his example too. Certainly he wasn't phased/daunted by large periods.

    It seems these sort of capabilities are well understood in the research field so no one much details them.

    I guess Xoroshiro isn't trying to be an educational product.
    "Are we alone in the universe?"
    "Yes," said the Oracle.
    "So there's no other life out there?"
    "There is. They're alone too."
  • Heater.Heater. Posts: 20,934
    edited March 13 Vote Up0Vote Down
    TonyB_,
    It's even more confusing if the name is wrong. It's xoroshiro - short for xor-rotate-shift-rotate.
    Odd that. I cut and pasted xorshift1024*φ from their web page and did not notice the missing "o" there.

    I am careful what I write. No disrespect to Sebastiano and David, just posing the questions.

    I do like the xoroshiro generators for their small size, speed and lack of multiply, divide or modulo. Which makes them ideal for the Propeller. That is why I posted a Spin version of David Jones' JKISS32 here years ago. Another derivative of Marsaglia's work.
    http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf
    https://forums.parallax.com/discussion/136975/jkiss32-high-quality-prng-with-realrandom-seed-save-a-cog

    As for the maths behind all this. Likely I'm never going to understand that. Which is why I like to see sample output from the algorithms creators. Something to test my, likely buggy, implementations against.

  • TonyB_TonyB_ Posts: 586
    edited May 6 Vote Up0Vote Down
    New paper by David Blackman & Sebastiano Vigna out today:
    Scrambled Linear Pseudorandom Number Generators

    More info and PDF download at http://xoshiro.di.unimi.it/
    PDF also available at https://arxiv.org/abs/1805.01407

    Please read the paper before passing comment. Even if you don't understand some or much of it, the amount of thought and testing that has been done should be apparent.

    It's good advanced publicity for Parallax and we can talk about all of the PRNG logic in the P2 freely now. Each cog has a xoroshiro32++ and the hub has a xoroshiro128**. The scrambler for the latter was changed from + to ** in early April, improving the quality and allowing all 64 output bits to be used equally.

    The P2 does not use the new xoshiro generator, which has four times as many state as output bits. Something for the future, possibly.
    Formerly known as TonyB
  • Heater.Heater. Posts: 20,934
    Fascinating read. Thanks for the heads up TonyB_.

    You are right I understand very little of it.

    Only one niggle:

    Would it be so hard to include complete, working, examples together with some output for test data? In C, pseudo code or whatever language.

    Nice to see that Parallax, Chip and co. get a mention.

  • cgraceycgracey Posts: 9,185
    edited May 4 Vote Up0Vote Down
    That's neat, TonyB_!

    I think we've got the best-bang-for-the-buck PRNG's in the Prop2.

    Thanks for all of your and Evan's work, and also Sebastiano's help.

  • evanhevanh Posts: 5,125
    Reading the paper feels easy but the descriptive maths just completely leaps over my head. This isn't a criticism of the paper, just naming my own limits here.

    Note: For anyone reading my prior postings and sources - Engine and scrambler is what I've been calling iterator and hash respectively.

    "Are we alone in the universe?"
    "Yes," said the Oracle.
    "So there's no other life out there?"
    "There is. They're alone too."
  • We were fortunate to have information about the new algorithms in advance. Having to wait for the paper would have been too late for the P2.
    Formerly known as TonyB
Sign In or Register to comment.