Shop OBEX P1 Docs P2 Docs Learn Events
xoroshiro++ random number generator — Parallax Forums

xoroshiro++ random number generator

TonyB_TonyB_ Posts: 2,196
edited 2022-11-05 10:54 in Propeller 2
xoroshiro++ was created by Sebastiano Vigna in collaboration with David Blackman towards the end of 2017 and is an improvement of their xoroshiro+ algorithm. Details of xoroshiro128+, a 128-bit state version of xoroshiro+ with 64-bit output, can be found at http://xoroshiro.di.unimi.it/

Sebastiano told me about xoroshiro++ in November 2017 and at that time it was a candidate to replace xoroshiro128+. Evan (who has done a huge amount of xoroshiro testing using PractRand) and Chip were informed but the details could not be made public. Sebastiano and David have decided a different algorithm will supersede xoroshiro128+ and now we are able to discuss xoroshiro++ freely.

Our tests have shown that xoroshiro++ is a far better random number generator than xoroshiro+. It deserves to be known about and used and that is the main purpose of this post. The story of how various xoroshiro versions have been implemented on the P2 can be read in this long topic:
https://forums.parallax.com/discussion/166176/random-lfsr-on-p2

What is missing from that thread is the xoroshiro++ algorithm, which is the same as xoroshiro+ with two extra steps at the end. The (s0 + s1) sum is rotated left by the number of bits given by a new constant d, then this rotated sum is added to s0 to give the output. The extra rotation and addition scramble the output bits more than with xoroshiro+, thus making them more random. As there are now two additions the enhanced algorithm is called xoroshiro++ and a quadruple of four constants [a,b,c,d] defines it, instead of the [a,b,c] triple in xoroshiro+.

Here is the xoroshiro++ pseudo code:
;{s1,s0} = state (input and output)
;prn = pseudo-random number (output)
;tmp and prn can be the same register

;xoroshiro+
	xor s1,s0		;s1 = s1 ^ s0
	mov tmp,s1
	rol s0,a		;s0 = s0 rol a
	shl tmp,b		;tmp = (s1 ^ s0) shl b
	xor s0,tmp
	xor s0,s1		;s0 = s0 rol a ^ (s1 ^ s0) shl b ^ s1 ^ s0
	rol s1,c		;s1 = (s1 ^ s0) rol c
	mov prn,s0
	add prn,s1		;prn = s0 + s1
;xoroshiro++ enhancement
	rol prn,d
	add prn,s0		;prn = (s0 + s1) rol d + s0

Note that constant d affects only the PRN output and the state calculations in xoroshiro+ and xoroshiro++ are identical. To test xoroshiro++, the same full-period [a,b,c] triples as in xoroshiro+ are used, then the possible values of d tried in turn. The results can vary greatly for different d, even between those that differ by only one.

The P2 XORO32 D instruction does a double-iteration of xoroshiro32++. The state D is advanced by two iterations and the resulting state is written back to D, the intermediate state is not available to the user. The 32-bit PRN output is a concatenation of two successive 16-bit outputs, the low word is the first and the high word the second. This 32-bit value is copied to the S field of the instruction following XORO32.

To save some time, both in hardware and in software if instructions can be executed in parallel, the PRN output can be calculated at the start of the algorithm, not at the end. The XORO32 logic was modified in this way for the final version and here is the corresponding xoroshiro++ pseudo code:
;{s1,s0} = state (input and output)
;prn = pseudo-random number (output)

	mov prn,s0
	add prn,s1		;prn = s0 + s1
	rol prn,d
	add prn,s0		;prn = (s0 + s1) rol d + s0
	xor s1,s0		;s1 = s1 ^ s0
	mov tmp,s1
	rol s0,a		;s0 = s0 rol a
	shl tmp,b		;tmp = (s1 ^ s0) shl b
	xor s0,tmp
	xor s0,s1		;s0 = s0 rol a ^ (s1 ^ s0) shl b ^ s1 ^ s0
	rol s1,c		;s1 = (s1 ^ s0) rol c

After much testing with PractRand the quadruple [14,2,7,6] was selected in December 2017. Following more tests a decision was made in the last few days to change to [14,2,7,5], however this was probably not really necessary. (The free-running hub-based PRNG uses the original xoroshiro128+ [55,14,36] algorithm.)

An intrinsic property of both xoroshiro+ and xoroshiro++ is equidistribution. All outputs occur the same number of times, except for zero. For xoroshiro32+/++, the frequency of every non-zero 16-bit output is 2^16 and the frequency of zero is 2^16-1. A similar equidistribution applies to bytes and bits. The initial state for seeding can be any non-zero value.

xoroshiro++ can be ported easily to other CPUs and languages, for example x86, Z80, C and BASIC. Tests have shown that good results can be obtained even with a 16-bit state, 8-bit output and period of 2^16-1.

EDIT:
Quadruple changed to [13,5,10,9] for Rev B and later. See post 16 below.

Comments

  • Heater.Heater. Posts: 21,230
    edited 2018-03-13 07:19
    Would the real xoroshiro please stand up!

    This is very confusing.

  • cgraceycgracey Posts: 14,231
    Heater. wrote: »
    Would the real xorshiro please stand up!

    This is very confusing.

    That's what makes it so random. Even more than before.
  • evanhevanh Posts: 16,070
    edited 2018-03-12 22:02
    Thanks Tony, Good coverage.

    Heater, ask some questions. It would be good to know if some terms need explained or more fleshing out of the events and decisions.
  • Heater.Heater. Posts: 21,230
    edited 2018-03-13 07:21
    evanh,

    I don't know what questions to ask. Having been fascinated by this topic for some years I have made some observations:

    1) Back in the early days of computing people slapped all kind of operations together, in a haphazard way, in the hope that the output would be kind of random.

    2) As the years went by all kinds of criticisms were leveled at those early PRNGs. They have short periods, they have a bias, and so on.

    3) Then there were guys like Prof. George Marsaglia, creating new PRNGs but also backing them up with some mathematical proofs about their properties. Like the period and such.

    4) Today, with the xoroshio guys, we seem to be back to step 1). For example today I notice they have xorshift1024*φ, which smacks of "Oh, lets throw in a multiply by some weird number to stir things up a bit" without any explanation. I mean, why φ? Why not π? Or e? Or 2 for that matter?

    5) If your PRNG is supposed to have some huge period, like 2^1024 − 1 or whatever, then please show why that might be. It's impossible to test for this. Even if I don't understand the proof it would be nice to know there is one that somebody does.

    And finally, I do wish creators of PRNG algorithms would provide some test data. Some sequence that is generated from a known initial state. It's so easy for the algorithms to me implemented wrongly when translated to other programming languages. Or just implemented badly through not understanding the specification.

  • TonyB_TonyB_ Posts: 2,196
    edited 2018-04-01 18:15
    First 32 outputs for xoroshiro32++ [14,2,7,5] when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at end of algorithm after new state generated
    50AD
    B89A
    A3D9
    0C87
    9BD9
    3CAA
    3502
    8840
    3D24
    7287
    71F5
    90AC
    A818
    B4C1
    2CE3
    CC58
    B31D
    DF3D
    B7A4
    A2E3
    3B2E
    2C20
    BFFF
    AFE9
    5B9A
    3BB8
    35F4
    921C
    6A2D
    CD9A
    A217
    207F
    

    First 32 outputs for xoroshiro32++ [14,2,7,5] when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    0021
    50AD
    B89A
    A3D9
    0C87
    9BD9
    3CAA
    3502
    8840
    3D24
    7287
    71F5
    90AC
    A818
    B4C1
    2CE3
    CC58
    B31D
    DF3D
    B7A4
    A2E3
    3B2E
    2C20
    BFFF
    AFE9
    5B9A
    3BB8
    35F4
    921C
    6A2D
    CD9A
    A217
    

    First 16 outputs for XORO32 when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    50AD0021
    A3D9B89A
    9BD90C87
    35023CAA
    3D248840
    71F57287
    A81890AC
    2CE3B4C1
    B31DCC58
    B7A4DF3D
    3B2EA2E3
    BFFF2C20
    5B9AAFE9
    35F43BB8
    6A2D921C
    A217CD9A
    
  • Reserved for xoroshiro64++ code and test data.
  • TonyB_TonyB_ Posts: 2,196
    edited 2018-03-13 03:41
    Heater. wrote: »
    I don't know what questions to ask. Having been fascinated by this topic for some years I have made some observations:

    1) Back in the early days of computing people slapped all kind of operations together, in a haphazard way, in the hope that the output would be kind of random.

    2) As the years went by all kinds of criticisms were leveled at those early PRNGs. They have short periods, they have a bias, and so on.

    3) Then there were guys like Prof. George Marsaglia, creating new PRNGs but also backing them up with some mathematical proofs about their properties. Like the period and such.

    4) Today, with the xoroshiro guys, we seem to be back to step 1). For example today I notice they have xorshift1024*φ, which smacks of "Oh, lets throw in a multiply by some weird number to stir things up a bit" without any explanation. I mean, why φ? Why not π? Or e? Or 2 for that matter?

    5) If your PRNG is supposed to have some huge period, like 2^1024 − 1 or whatever, then please show why that might be. It's impossible to test for this. Even if I don't understand the proof it would be nice to know there is one that somebody does.

    And finally, I do wish creators of PRNG algorithms would provide some test data. Some sequence that is generated from a known initial state. It's so easy for the algorithms to me implemented wrongly when translated to other programming languages. Or just implemented badly through not understanding the specification.

    Heater,

    To answer the last point first, I've posted some test data above, with more to come.

    4) Some of the work started by the late Prof. Marsaglia is being carried on and improved by Sebastiano Vigna and David Blackman. I have no doubt that they know what they are doing so you need to be a bit careful what you write - they might be reading! From my personal experience of it, I think xoroshiro++ is very good and yet a forthcoming paper will not even mention it, which means (a) the replacement for xoroshiro128+ must be rather special and (b) it is up to us here to publicize xoroshiro++ because nobody else will.

    There are some rigorous random number tests available and a representation of φ (arguably the most irrational number) would not have been chosen just on a whim. Multiplication scrambles bits better than addition and lots of odd numbers would do a good job probably, but a discussion of xorshift1024*φ and other generators that require a multiply is off-topic here.

    5) The proof that certain classes of generator with n-bit states have a periods of 2^n-1 is not difficult, I understand, for mathematicians anyway. As I've mentioned before, we know that xoroshiro32 has a period of a 2^32-1 because the full period can be iterated in a matter of seconds.
  • evanhevanh Posts: 16,070
    Regarding point 5, I was getting the impression it's easy-as to proof every combination. Seba was providing detailed complete lists of full period candidates, which perfectly matched our empirically found list, on the click of the finger. It didn't seem to matter how big the state space was.

    There was some polynomial associated with each listed candidate. Chris may have provided something for his example too. Certainly he wasn't phased/daunted by large periods.

    It seems these sort of capabilities are well understood in the research field so no one much details them.

    I guess Xoroshiro isn't trying to be an educational product.
  • Heater.Heater. Posts: 21,230
    edited 2018-03-13 07:54
    TonyB_,
    It's even more confusing if the name is wrong. It's xoroshiro - short for xor-rotate-shift-rotate.
    Odd that. I cut and pasted xorshift1024*φ from their web page and did not notice the missing "o" there.

    I am careful what I write. No disrespect to Sebastiano and David, just posing the questions.

    I do like the xoroshiro generators for their small size, speed and lack of multiply, divide or modulo. Which makes them ideal for the Propeller. That is why I posted a Spin version of David Jones' JKISS32 here years ago. Another derivative of Marsaglia's work.
    http://www0.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf
    https://forums.parallax.com/discussion/136975/jkiss32-high-quality-prng-with-realrandom-seed-save-a-cog

    As for the maths behind all this. Likely I'm never going to understand that. Which is why I like to see sample output from the algorithms creators. Something to test my, likely buggy, implementations against.

  • TonyB_TonyB_ Posts: 2,196
    edited 2018-05-06 21:06
    New paper by David Blackman & Sebastiano Vigna out today:
    Scrambled Linear Pseudorandom Number Generators

    More info and PDF download at http://xoshiro.di.unimi.it/
    PDF also available at https://arxiv.org/abs/1805.01407

    Please read the paper before passing comment. Even if you don't understand some or much of it, the amount of thought and testing that has been done should be apparent.

    It's good advanced publicity for Parallax and we can talk about all of the PRNG logic in the P2 freely now. Each cog has a xoroshiro32++ and the hub has a xoroshiro128**. The scrambler for the latter was changed from + to ** in early April, improving the quality and allowing all 64 output bits to be used equally.

    The P2 does not use the new xoshiro generator, which has four times as many state as output bits. Something for the future, possibly.
  • Heater.Heater. Posts: 21,230
    Fascinating read. Thanks for the heads up TonyB_.

    You are right I understand very little of it.

    Only one niggle:

    Would it be so hard to include complete, working, examples together with some output for test data? In C, pseudo code or whatever language.

    Nice to see that Parallax, Chip and co. get a mention.

  • cgraceycgracey Posts: 14,231
    edited 2018-05-04 22:21
    That's neat, TonyB_!

    I think we've got the best-bang-for-the-buck PRNG's in the Prop2.

    Thanks for all of your and Evan's work, and also Sebastiano's help.

  • evanhevanh Posts: 16,070
    Reading the paper feels easy but the descriptive maths just completely leaps over my head. This isn't a criticism of the paper, just naming my own limits here.

    Note: For anyone reading my prior postings and sources - Engine and scrambler is what I've been calling iterator and hash respectively.

  • We were fortunate to have information about the new algorithms in advance. Having to wait for the paper would have been too late for the P2.
  • TonyB_TonyB_ Posts: 2,196
    edited 2018-10-21 00:08
    After further tests the XORO32 instruction was changed in October 2018 to use xoroshiro32++[13,5,10,9].

    The modified Verilog code:
    https://forums.parallax.com/discussion/comment/1448460/#Comment_1448460
  • TonyB_TonyB_ Posts: 2,196
    edited 2018-10-11 15:52
    First 32 outputs for xoroshiro32++ [13,5,10,9] when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    0201
    6269
    AE16
    12A2
    4AE8
    D719
    0C52
    984B
    1DF1
    743C
    DBA0
    BCC6
    34C9
    746C
    3643
    07FF
    BBC0
    C642
    AA85
    594E
    D05A
    701A
    A328
    F3AE
    67EE
    695A
    C140
    93C6
    F5E1
    4964
    5E24
    FC57
    

    First 16 outputs for XORO32 when seed = 1
    PRN calculated at start of algorithm before new state generated
    62690201
    12A2AE16
    D7194AE8
    984B0C52
    743C1DF1
    BCC6DBA0
    746C34C9
    07FF3643
    C642BBC0
    594EAA85
    701AD05A
    F3AEA328
    695A67EE
    93C6C140
    4964F5E1
    FC575E24
    
  • Heater.Heater. Posts: 21,230
    edited 2018-10-11 15:23
    TonyB_,

    I'm not making any sense out of the results you posted above for "First 16 outputs for XORO32 using xoroshiro32++[13,5,10,9] with seed = 1.
    PRN calculated at start of algorithm before new state generated.". The first output you have is 62690201 which according to the present algorithm is not possible.

    If you are doing the PRN calculation first then on the first run it is :

    prn = (s0 + s1) rol d + s0
    = (0x00000001 + 0x00000000) rol 9 + 0x00000001
    = 0x00000201

    Given that the seed is single 1 bit there is no way to get all the bits required to make 0x62690201

    I was trying to make a C++ version of the algorithm above but it produces different results.
    #include <cstdint>
    #include <stdlib.h>
    #include <iostream>
    #include <iomanip>
    
    class XoroshiroPlusPlus
    {
      const uint32_t a = 13;
      const uint32_t b = 5;
      const uint32_t c = 10;
      const uint32_t d = 9;
    
      static inline uint32_t rol(uint32_t x, uint32_t k) noexcept
      {
        return (((x) << (k)) | ((x) >> (32 - (k))));
      }
    
      uint32_t s0;
      uint32_t s1;
    
    public:
      explicit XoroshiroPlusPlus(uint32_t s0 = 1, uint32_t s1 = 0) noexcept
        : s0(s0), s1(s1)
      {
      }
    
      inline uint32_t operator()() noexcept
      {
        uint32_t prn = s0;
        prn + = s1;
        prn = rol(prn, d);
        prn += s0;
        s1 ^= s0;
        uint32_t tmp = s1;
        s0 = rol(s0, a);
        tmp = tmp << b;
        s0 ^= tmp;
        s0 ^= s1;
        s1 = rol(s1, c);
        return prn;
      }
    };
    
    int main(int argc, char* argv[])
    {
      XoroshiroPlusPlus prng(1, 0);
    
      for (int i = 0; i <= 16; i++)
      {
        std::cout << std::hex << std::setfill('0') << std::setw(8) << prng() << std::endl;
      }
    }
    
    The results:
    00000201
    00486221
    2608820a
    c7e20c2e
    08aabc0a
    5f77c44d
    9056f91c
    ed37ad47
    092dfb84
    62bd58fb
    d66e0e30
    106547c8
    b82d3d89
    72bc16ca
    ac4a76b8
    f6d6d4ef
    a5661e00
    
    The only thing that matches is the low 12 bits of the first result, which is the 0x201 I calculated above. Where did all the extra bits in your 62690201 come from?

    Is there an xoroshiro++ in C or C++ with example output anywhere? I could not find one.






  • evanhevanh Posts: 16,070
    Heater,
    I can see your variables and shifters are 32-bit. They should all be 16-bit.


    Tony has always done it as engine iteration first, then scrambler summing. For quite a while Chip had XORO32 also following that order of Tony's, but later changed it to summing first followed by engine iterate when XORO32 had become the longest (critical) timing path.

    It doesn't change the random data, just swaps the shortwords around.

  • Heater.Heater. Posts: 21,230
    edited 2018-10-11 15:59
    evanh,

    Many moons ago I made the comment here: "Would the real xoroshiro please stand up! This is very confusing."

    Well, it's still confusing :(

    The results I was attempting to recreate are for what TonyB_ described above as "XORO32 using xoroshiro32++[13,5,10,9] with seed = 1. PRN calculated at start of algorithm before new state generated."

    There is nothing there that indicates 16 bit processing. The name has "32" in it. The presented code shows 32 bit operations. The present results are 32 bit. He says right there that he is extracting PRN before generating the new state.

    Why only 16 bit variables when we have a 32 bit P2?

    More confused than ever. What is it you guys have been testing and is now in the P2?

  • Heater.,

    XORO32 does a double-iteration of xoroshiro32++. When seed = 1, 0201 is the first output and low word, 6269 the second output and high word, of the first 32-bit XORO32 result.

    I've added single-iterated test outputs to
    http://forums.parallax.com/discussion/comment/1448906/#Comment_1448906
  • evanhevanh Posts: 16,070
    Heater,
    I did post actual C source code for you yesterday - https://forums.parallax.com/discussion/comment/1448759/#Comment_1448759
  • Heater.Heater. Posts: 21,230
    What exactly do I need to my code above to get the same results?
  • evanhevanh Posts: 16,070
    Copy mine! Then build you own when confident.

  • This thread was meant for announcements. Please continue discussion at
    http://forums.parallax.com/discussion/166176/random-lfsr-on-p2#latest
  • evanhevanh Posts: 16,070
    Opps, I was wrong about Tony's output. The reason for the apparent rapid transition is simply because there is two concatenated 16-bit random numbers.
    62690201
    12a2ae16
    d7194ae8
    984b0c52
    743c1df1
    bcc6dba0
    746c34c9
    07ff3643
    c642bbc0
    594eaa85
    701ad05a
    f3aea328
    695a67ee
    93c6c140
    4964f5e1
    fc575e24
    a638d7ad
    181ae233
    7be766b5
    5cbd8445
    
  • Heater.Heater. Posts: 21,230
    Oops, we were cross posting there.

    Somehow it totally slipped my mind that you had posted that code. Been a bit busy around here.

    Let me see...
  • evanhevanh Posts: 16,070
    edited 2019-10-05 00:20
    Screen shot of verilog code for revB XORO32 [13 5 10 9]
    XORO32_Verilog.png
Sign In or Register to comment.