Random/LFSR on P2

1192021222325»

Comments

  • evanh wrote: »
    First twenty 17-bit (Xoroshiro34) sums for triplet [7,8,12] is:
    00001 01181 1608d 0a75d 0bf68 1c71d 158e8 028f2 1c4a8 1c24f 14e6c 1f938 1f135 07a8e 068cd 15404 1d908 02648 1e45c 13161

    First twenty 20-bit (Xoroshiro40) sums for triplet [7,9,12] is:
    00001 01281 c6013 5a2f1 2f5ac 7590c fb594 b7ac1 2a644 33649 ebe23 66f73 18a54 fd4a2 4674a 590d3 37bc4 2cce9 3a9db 0ee4e

    Evan, do you have a list of the best triplets in your opinion for different widths, say Xoroshiro24 to Xoroshiro40? If not, don't worry and if so, score tables not needed.
    Formerly known as TonyB
  • Yep, I've got a set of score tables for all those. Size 20 is the largest I've done though. Anything bigger took forever to brute force search for full period candidates.

    I'll zip them up when home ...
  • Actually, these previous early tables might sufficient for your needs -
    http://forums.parallax.com/discussion/comment/1410683/#Comment_1410683
  • Okay, I completed the current effort a few days later - http://forums.parallax.com/discussion/comment/1411074/#Comment_1411074 - and re-ran all the scoring but didn't post the full set. Here they go:
  • evanhevanh Posts: 4,180
    edited October 6 Vote Up0Vote Down
    Here's my selection. Some of them were a real toss up and only chosen by least number of lowest scores. A method that's somewhat dependent on the sampling variants I've scored with.
     Algorithm      Triplet     Word1   Word2  Byte99  Byte49   Byte2   Byte1    Bit
    =================================================================================
    Xoroshiro24+  [ 8  2  3]     64M     32M      8M              8M      8M      4M
    Xoroshiro26+  [ 9 12  3]     64M    128M     16M             16M     16M     16M
    Xoroshiro28+  [11  5  2]      1G    256M     32M     64M     32M     32M     64M
    Xoroshiro30+  [ 6 10  8]    512M    512M    256M    128M    256M     32M    256M
    Xoroshiro32+  [14  2  7]      1G    512M      1G    512M    512M    128M      1G
    Xoroshiro34+  [ 7  8 12]    512M      4G      2G      2G      2G    128M      4G
    Xoroshiro36+  [ 8 12 13]      1G    512M      2G      8G    512M    128M     16G
    Xoroshiro38+  [11  5 16]      8G      8G     32G      8G      4G    128M     64G
    Xoroshiro40+  [ 7  9 12]     16G     16G     64G     32G     32G    256M    256G
    

    EDIT: The "Bit" column exhibits a perfect scaling of scores! That blows me away. I don't have a clue why it's so rigidly alone like that. Pity it's also the scoring variant that takes many times longer to run.
  • TonyB_TonyB_ Posts: 83
    edited October 7 Vote Up0Vote Down
    Many thanks, Evan.

    I've just been reading this post in which Melissa O'Neill argues (right at the end) that it would be better to multiply the xoroshiro states rather than add them:
    http://www.pcg-random.org/posts/visualizing-the-heart-of-some-prngs.html

    As MUL takes the same time as ADD on the P2 there would be no performance penalty. It's not clear to me whether the product could be truncated to 16 bits as well as 8 bits. If so, would it perform better in randomness tests than the 16-bit sum? If not, two iterations would be needed to get a high-quality 16-bit result, but that's just the same as with adding.
    Formerly known as TonyB
  • evanhevanh Posts: 4,180
    edited October 7 Vote Up0Vote Down
    I can certainly replace the summing with multiplying. Don't even have to re-run the full period searches.

    Tony,
    With a quick look at that webpage, I'm not making much sense of how he's done the Xorshift algorithm. I'm suspicious he's just looking at the state data directly, because otherwise PractRand would be throwing a wobbly on our testing quick smart I'd think.
  • TonyB_ wrote: »
    ... It's not clear to me whether the product could be truncated to 16 bits as well as 8 bits. If so, would it perform better in randomness tests than the 16-bit sum? If not, two iterations would be needed to get a high-quality 16-bit result, but that's just the same as with adding.
    Assuming XORO32 based, why would multiplying two 16-bit values only produce 8 bits? Or is that not what you were talking about?
  • Hmm, what we need is the randogram code so I can reproduce his pretty pictures with selected triplets.
  • evanhevanh Posts: 4,180
    edited October 7 Vote Up0Vote Down
    Well, simply replacing the addition with a multiply produces rubbish PractRand scores. Amusingly, the "bit" variant is still the same scores but most everything else barely makes it passed the starting blocks.

    Here's the s12 multiplying scores for comparison:
        Xorshift24* PractRand Score Table
    
    Combination    Word1   Word2  Byte04  Byte02   Byte2   Byte1    Bit
    ====================================================================
     [ 1  7  1]      8K     16K     64K              8K      2K      4M
     [ 1  7  5]      8K     16K     64K              8K      2K      4M
     [ 2  5 11]      8K     16K     64K              4K      2K      4M
     [ 3  1  8]      8K     16K     64K              8K      2K      4M
     [ 3  2 11]      8K     32K     64K              4K      2K      4M
     [ 3  5  3]      4K     16K     64K              4K      2K      4M
     [ 4  5  1]      8K     32K     64K              8K      2K      4M
     [ 5  1  4]      4K     16K    128K              4K      2K      4M
     [ 5  2 11]      8K     64K     64K              4K      2K      4M
     [ 5  7  9]      8K     32K     64K              8K      2K      4M
     [ 7  5  8]      4K     16K     64K              4K      2K      4M
     [ 8  3  1]      8K     32K    128K              8K      2K      4M
     [ 9  1  2]      8K     32K     64K              4K      2K      4M
     [ 9  1  5]      4K     32K    128K              2K      1K      4M
    
    
        Xoroshiro24* PractRand Score Table
    
    Combination    Word1   Word2  Byte04  Byte02   Byte2   Byte1    Bit
    ====================================================================
     [ 1  1 10]      4K     16K    128K              8K      2K      4M
     [ 1  5  6]      4K      8K     64K              4K      2K      4M
     [ 2  2  3]      4K     16K    128K              8K      2K      4M
     [ 3  2  2]      8K     16K     64K              8K      2K      4M
     [ 3  2  8]      4K     16K     64K              4K      1K      4M
     [ 4  2 11]      4K     16K    128K              4K      2K      4M
     [ 4  4  7]      4K     32K     64K              4K      2K      4M
     [ 4  4  9]      4K     32K     64K              4K      2K      4M
     [ 4  9  7]      8K     32K     64K              8K      1K      4M
     [ 6  5  1]      4K     16K     32K              8K      2K      4M
     [ 7  4  4]      8K     16K     64K              4K      2K      4M
     [ 7  4  8]      8K     16K     64K              8K      1K      4M
     [ 7  9  4]      8K     32K     64K              8K      2K      4M
     [ 7 10 10]      4K     16K     64K              4K      2K      4M
     [ 8  2  3]      4K     32K     64K              8K      2K      4M  (Same triplet as favourites above)
     [ 8  2 11]      4K     16K    128K              4K      2K      4M
     [ 8  3  9]      8K      8K     64K              4K      2K      4M
     [ 8  4  7]      4K     16K     64K              8K      2K      4M
     [ 8  9 11]      4K      4K     64K              8K      2K      4M
     [ 9  3  8]      8K     16K     64K              4K      2K      4M
     [ 9  4  4]      8K     32K    128K              8K      2K      4M
     [10  1  1]      4K     16K     64K              4K      2K      4M
     [10 10  7]      4K     32K     64K              4K      2K      4M
     [11  2  4]      4K     16K     64K              8K      2K      4M
     [11  2  8]      4K     16K     64K              8K      2K      4M
     [11  9  8]      4K     16K    128K              4K      1K      4M
    
  • TonyB_TonyB_ Posts: 83
    edited October 7 Vote Up0Vote Down
    I've added the author's name to my previous post. It's hard to be sure, but I think her "xoroshiro16star8" uses two 8-bit states that are multiplied by a 16-bit magic number ("MCG multiply") and only the high byte of the product kept.
    Formerly known as TonyB
  • Which one of my posts are you answering there Tony?
  • All of them, but let's put this to one side for a while. Try doing a hexdump of the first few Xoroshiro32+ sums (eight is plenty) when seed is $32F3_1EDB :)
    Formerly known as TonyB
  • evanhevanh Posts: 4,180
    edited October 8 Vote Up0Vote Down
    Seeding with s0 = $1EDB and s1 = $32F3, I get the following 16-bit sums: 51ce 6f54 796e fde9 1dbf dac1 7e47 0486 85d8 48f7 7df9 0102 0601 bf23 1f34 c6b5 39be 1909 7824 c941
  • evanhevanh Posts: 4,180
    edited October 12 Vote Up0Vote Down
    evanh wrote: »
    Dang, it just dawned on me, far too late, that DRAM prices have been going through the roof all this year! And even with those prices, supply appears to have gone to virtually nil.

    One of my new 16GB DIMMs failed and there is no exact replacements in stock. I've got the full credit for them (Returned the pair) and can claim it as a refund, but the options for replacement are slim pickings.

    I'm back using me old PC for the moment, and the remainder of the year looks bleak. :(
    Well, things have progressed. I got the full refund for the faulty RAM and went elsewhere to get something faster. I figured if I was going to be paying inflated prices to get a decent set of DIMMs I may as well get something faster than I had. The alternative was something slower but still more expensive than original.

    I've since discovered, and verified, that at least one program was having issues with the SMT flaw in the early Ryzens. I would have been happy to just disable SMT in the BIOS but the silly BIOS has a problem with that - It insists I had an unexpected power down every time it powers up - have to press F1 and ESC and ENTER every time I power it up with SMT disabled!

    So now I'm sending the CPU back for a free exchange tomorrow - never done anything international like this before - AMD have directly emailed me a FedEx number to book it to. Conveniently there is a depot only ten minutes drive away ...
  • TonyB_TonyB_ Posts: 83
    edited October 20 Vote Up0Vote Down
    TonyB_ wrote: »
    evanh wrote: »
    First twenty 17-bit (Xoroshiro34) sums for triplet [7,8,12] is:
    00001 01181 1608d 0a75d 0bf68 1c71d 158e8 028f2 1c4a8 1c24f 14e6c 1f938 1f135 07a8e 068cd 15404 1d908 02648 1e45c 13161

    First twenty 20-bit (Xoroshiro40) sums for triplet [7,9,12] is:
    00001 01281 c6013 5a2f1 2f5ac 7590c fb594 b7ac1 2a644 33649 ebe23 66f73 18a54 fd4a2 4674a 590d3 37bc4 2cce9 3a9db 0ee4e

    Evan, do you have a list of the best triplets in your opinion for different widths, say Xoroshiro24 to Xoroshiro40? If not, don't worry and if so, score tables not needed.
    evanh wrote: »
    Yep, I've got a set of score tables for all those. Size 20 is the largest I've done though. Anything bigger took forever to brute force search for full period candidates.

    I'll zip them up when home ...

    I've done Xoroshiro40+ on the Z80 and the triplet numbers were quite friendly. A final one I'd like to try is Xoroshiro48+ as every bit in two sets of three 8-bit registers would be used. There is a trick for finding full-period triplets without brute-force as that would take forever with Xoroshiro128+, if only we knew what the trick is.
    Formerly known as TonyB
  • Cool. :)

    Yes, Xoroshiro128+ and 1024+ both have "jump" functions supplied in the sources from the authors.

    Starting from bit 0 of both word halves of the PRNG state, it does an overlaying XOR of a working state with the current state based on what looks to be a specially crafted key, then calls the iterator, stepping the current state by one. This repeats for each bit of the word size (ie: half the state length).

    And at the end it copies the working state over the current state.


    Here's the actual code from original Xoroshiro128+ jump() function:
    /* This is the jump function for the generator. It is equivalent
       to 2^64 calls to next(); it can be used to generate 2^64
       non-overlapping subsequences for parallel computations. */
    
    void jump(void) {
    	static const uint64_t JUMP[] = { 0xbeac0467eba5facb, 0xd86b048b86aa9922 };
    
    	uint64_t s0 = 0;
    	uint64_t s1 = 0;
    	for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
    		for(int b = 0; b < 64; b++) {
    			if (JUMP[i] & 1ULL << b) {
    				s0 ^= s[0];
    				s1 ^= s[1];
    			}
    			next();
    		}
    
    	s[0] = s0;
    	s[1] = s1;
    }
    
  • Oops, sorry, it runs the word length again with a second key before copying the working state over the current state.
  • C is mostly gibberish to me. I don't see how a jump function proves a certain triplet is full-period. I've just looked through the prng Google group and it wasn't any help really.
    Formerly known as TonyB
  • Somewhere, some how, some mathematician can prove that some pseudo random number generator algorithm with some parameters will run though all, or most, of its combinations of states before returning to where it started.

    I have no idea how.

    Of course, running through a huge sequence does not imply a random looking sequence. A simple binary counter will run through such a sequence, step by step in order. We need something better to defeat the tests of "randomness".

    I think understanding C is the least of the problems in understanding how these things work.

    Certainly brute force running and checking them does not help. The periods can be longer to work through than the age of the universe!

    What is certainly true is that tweaking the number of bits in the state or messing with the parameters can ruin a good PRNG.

    I start to believe that the search for a better pseudo random number generator drives people insane. As this thread shows :) I felt it myself before I got help from "PRNG Anonymous". For example this from Stanford:

    PCG: A Family of Better Random Number Generators:




  • The top three lines of the source is a comment, so that's plain english at least. :) Basically, this particular jump() config is not useful for finding any full period triplets because even it still has to be called 2^64 times to hop all the way along.

    So, presumably it's possible to built jump functions that steps further than a square-root into the period. But they're not inclined to explain, or maybe it's just really hard to work out. I have no idea.
  • Heater. wrote: »
    What is certainly true is that tweaking the number of bits in the state or messing with the parameters can ruin a good PRNG.
    Full period may be solved at a higher analytical level but the quality certainly isn't. Tony's link to Melissa's visualisation work is a good demo of how quality is accessed. And that's also why PractRand and BigCrush are heavily relied on.
  • Hmm... if it has to be called 2^64 times I would hardly call it a "jump" function.

    Certainly jump functions can make huge strides into the state space. I have no idea about the square-root thing.

    Me too. I have no idea.

Sign In or Register to comment.