Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 42 — Parallax Forums

Random/LFSR on P2

1394042444592

Comments

  • Heater.Heater. Posts: 21,230
    Great, Pseudo Random Number Generator wars!

    As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.


  • TonyB_TonyB_ Posts: 2,099
    edited 2018-04-03 03:08
    TonyB_ wrote: »
    I've just spotted tonight that there is a new post about xoroshiro128+ on Melissa O'Neill's "PCG, A Better Random Number Generator" blog:
    http://www.pcg-random.org/posts/xoroshiro-fails-truncated.html

    I told her about the xoroshiro++ algorithm on February 22nd.

    Notice the length of Melissa's PractRand tests:

    xoroshiro64+ [31:16]
    Failed after 14 hours but kept running for 210 days!

    xoroshiro128+ [63:32]
    Failed after 101 days!
    Evan's [55:0] & [57:2] tests failed after 3 days
  • cgraceycgracey Posts: 14,131
    edited 2018-04-03 04:54
    Here is the new Verilog for the XORO32 instruction.

    I had to rewrite it, to reveal its regularity, in order to manipulate it confidently:

    XORO32.png
    812 x 329 - 10K
  • cgraceycgracey Posts: 14,131
    The new/faster XORO32 outputs these initial 16 longs when seeded with $0000_0001:
    50AD0021
    A3D9B89A
    9BD90C87
    35023CAA
    3D248840
    71F57287
    A81890AC
    2CE3B4C1
    B31DCC58
    B7A4DF3D
    3B2EA2E3
    BFFF2C20
    5B9AAFE9
    35F43BB8
    6A2D921C
    A217CD9A
    

    This looks right to me.
  • cgraceycgracey Posts: 14,131
    edited 2018-04-03 05:46
    Now is the time, if we want to add an adder stage to the XOROSHIRO128+, like we did for the XORO32++.

    Is there much risk in using the guideline given about a prime rotation value near 32?

    Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the files to On Semi.

    I just sent On Semi what are, hopefully, the final Verilog files.
  • evanhevanh Posts: 15,091
    cgracey wrote: »
    Is there much risk in using the guideline given about a prime rotation value near 32?

    XORO32 experience suggest it's okay, assuming the original triplet is truly a good quality candidate. The [14 2 7] triplet produced a wide group of good scores.

    The outcome is likely a big improvement no matter what, so I guess on that basis you are good to do it. But no guarantees.

  • evanhevanh Posts: 15,091
    cgracey wrote: »
    Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the file to On Semi.

    Agreed.

  • cgraceycgracey Posts: 14,131
    Okay. If a bug-fix arises, we could slip it in.
  • evanhevanh Posts: 15,091
    edited 2018-04-03 11:00
    TonyB_ wrote: »
    evanh wrote: »
    Actually, I'll run more selections of 32-bit widths first. That's what the Cogs use after all ...
    It would be best to test the actual 32-bit subsets used by GETRND D or 8-bit (?) subsets used for dithering.
    I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
    length= 256 gigabytes (2^38 bytes), time= 4539 seconds
    Test Name Raw Processed Evaluation
    [Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
    ...and 1567 test result(s) without anomalies

    Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.

    PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.

  • cgracey wrote: »
    The new/faster XORO32 outputs these initial 16 longs when seeded with $0000_0001:
    50AD0021
    A3D9B89A
    9BD90C87
    35023CAA
    3D248840
    71F57287
    A81890AC
    2CE3B4C1
    B31DCC58
    B7A4DF3D
    3B2EA2E3
    BFFF2C20
    5B9AAFE9
    35F43BB8
    6A2D921C
    A217CD9A
    

    This looks right to me.
    TonyB_ wrote: »
    First 16 outputs for XORO32 when seed = 1 {s1 = 0, s0 = 1}
    PRN calculated at start of algorithm before new state generated
    50AD0021
    A3D9B89A
    9BD90C87
    35023CAA
    3D248840
    71F57287
    A81890AC
    2CE3B4C1
    B31DCC58
    B7A4DF3D
    3B2EA2E3
    BFFF2C20
    5B9AAFE9
    35F43BB8
    6A2D921C
    A217CD9A
    

    More confirmation that new XORO32 working as intended.
    The new Verilog looks much cleaner.
  • TonyB_TonyB_ Posts: 2,099
    edited 2018-04-04 23:25
    Heater. wrote: »
    TonyB_ wrote: »
    I've just spotted tonight that there is a new post about xoroshiro128+ on Melissa O'Neill's "PCG, A Better Random Number Generator" blog:
    http://www.pcg-random.org/posts/xoroshiro-fails-truncated.html

    I told her about the xoroshiro++ algorithm on February 22nd.

    Great, Pseudo Random Number Generator wars!

    As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.

    No, just "not-invented-here" or more particularly "not-invented-by-me" syndrome, to a significant extent. xoroshiro+ is old hat now, anyway.

  • cgraceycgracey Posts: 14,131
    I've found a bug that needs fixing. What do we need to do to also upgrade our xoroshiro128?
  • cgracey wrote: »
    I've found a bug that needs fixing. What do we need to do to also upgrade our xoroshiro128?

    PM just sent.

  • cgraceycgracey Posts: 14,131
    edited 2018-04-05 17:55
    Thanks, Guys! We now have an improved Xoroshiro128 implementation.

    I found the bug. It had to do with SETQ2+WRLONG (writing LUT longs to hub). All fixed now. New files going off to OnSemi.
  • evanhevanh Posts: 15,091
    evanh wrote: »
    I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
    length= 256 gigabytes (2^38 bytes), time= 4539 seconds
    Test Name Raw Processed Evaluation
    [Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
    ...and 1567 test result(s) without anomalies

    Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.

    PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.
    Both have just breezed passed 16 TB ... another 3 days 'till 32 TB mark. Not sure how that conforms, I expected at least the [33:2] to fail at or before the earlier [57:2] run.

    I had planned on reflashing the BIOS this weekend too. That sucks a little now cos it might be some weeks.

    RAM usage is now over 15 GB between them.

  • cgraceycgracey Posts: 14,131
    Evanh, what are you testing? Is it some form of Xoroshiro128?
  • evanhevanh Posts: 15,091
    That was started a few days back, it's the original Xoroshiro128+ [55 14 36].

  • cgraceycgracey Posts: 14,131
    edited 2018-04-06 12:03
    Okay. Cool!

    Wait... You mean you are also running the NEW one now? That would be good.
  • evanhevanh Posts: 15,091
    No I cancelled that so it wouldn't slow these two down. I guess now that they haven't failed where I expected them to they can be lowered in priority.

    The short run test, [7:0] sampling is usually a weak point, that I did do of the new algorithm satisfied me it is good to go. I'll do a short run full width 64-bit run now to double check that ...

  • cgraceycgracey Posts: 14,131
    Sounds good.
  • TonyB_TonyB_ Posts: 2,099
    edited 2018-04-07 01:03
    Seba has confirmed our test outputs for the new and much improved xoroshiro128. So new it hasn't been made public yet. All 64 bits of the output are very good and are used equally by the bit scramblers. Previously with xoroshiro128+ the top 62 bits were used unequally and the weak lowest two bits ignored. Everything is fine for xoroshiro128.

    Not necessarily so for XORO32. A little while ago we switched algorithms to xoroshiro32++ [14,2,7,5] from [14,2,7,6]. The only difference is a left rotation of 5, not 6, in the PRN calculation. I asked Seba to check [14,2,7,5] with BigCrush, which is one of the battery tests in TestUI, an alternative random number tester to PractRand that Evan has been using. He says we should not rely on a single test program.

    The main batteries in the TestUI suite are SmallCrush, Crush and BigCrush, which are progressively more stringent and time-consuming. BigCrush results for [14,2,7,5] were much better than [14,2,7,6]. However, they were also totally wrong as xoroshiro64++ [14,2,7,5] was tested by mistake.

    I have been sent the following report files:

    SmallCrush [14,2,7,5] - 1 test failed (of 10)
    Crush [14,2,7,5] - 5 tests failed (of 96)
    BigCrush [14,2,7,6] - 11 tests failed (of 106)

    Unfortunately I can't compare like-for-like. Seba told me that [14,2,7,6] passes all tests in Crush. I really need to see corresponding report files to be 100% sure but as things stand I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.
  • evanhevanh Posts: 15,091
    TonyB_ wrote: »
    ... I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.
    I consider the extended tests more valid, and I didn't test as many sampling variants with the default PractRand settings either.

    I'm not in favour of reverting.

  • evanhevanh Posts: 15,091
    edited 2018-04-07 00:34
    Chip,
    I left the full width sampling test of the updated Xoroshiro128 running overnight. It went past 2 TB without issue.

    EDIT: To put that in context, under same test configuration, the original Xoroshiro128+ falls over at a tiny 4 MB! See https://forums.parallax.com/discussion/comment/1434165/#Comment_1434165

  • cgraceycgracey Posts: 14,131
    Evanh, I hope our new XORO32 is optimal. I don't want to force a change with OnSemi over something like that.

    TonyB_, any change of opinion?
  • evanhevanh Posts: 15,091
    edited 2018-04-06 23:13
    I'm rerunning the same candidates with and without extended testing right now ...

    PS: Seba's earlier scores for [14 2 7 x] would have both been for Xoroshiro64++ not for 32++. My guess is we have no idea how good Xoroshiro32++ [14 2 7 6] scores are on the Crush tests.

  • TonyB_TonyB_ Posts: 2,099
    edited 2018-04-06 23:28
    cgracey wrote: »
    Evanh, I hope our new XORO32 is optimal. I don't want to force a change with OnSemi over something like that.

    TonyB_, any change of opinion?

    Chip, Seba said he would run some more Crush and BigCrush tests and we'll have the output files to make a proper comparison between the two. Results not here yet and might not be now until tomorrow. I hope our XORO32 is optimal and we haven't gone backwards by switching.

    Evan, can you post your extended results here, now that xoroshiro++ is not a secret any more? The problem I have is that the best two quadruples of [5,2,6,9] and [6,2,5,9] do badly on repeat frequency, which makes me doubt the worth of the extended tests. [14,2,7,5] and [14,2,7,6] do well on repeat frequency.
  • evanhevanh Posts: 15,091
    edited 2018-04-06 23:41
    Chip,
    This is certainly not something that requires intervention. The differences are nit picking.


    Having said that, it's even murkier at this end now. :( It would seem I was only just starting my foray into the extended options of PractRand when I generated that score table. It didn't include the -tf 2 option.

    Here's a repost of that truncated table, plus I've added the equivalent table for standard options and also for -te 1 -tf 2 extended options:
    
      Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -te 1 -tlmin 1KB
    __________________________________________________________________________________________________________
                    Sampling apertures of the generator output, labelled as most to least bit-significance
      Candidate    ----------------------------------------------------------------------------------------
    [ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
    ==========================================================================================================
    [ 5  2  6  9]     1G    4G    2G    2G    1G    2G    2G    1G    1G    1G    2G    1G    1G    1G
    [ 6  2  5  9]     1G    4G    2G    4G    1G    2G    1G    2G    1G    1G    1G    1G    1G    2G
    [14  2  7  5]   512M    4G    4G    2G    4G    1G    1G    1G    1G    1G    1G    1G    1G    1G
    [14  2  7  6]   512M    8G    2G    4G    1G    2G    1G  256M  256M    2G  512M  512M  512M  512M
    [15  4 12  9]   512M   16G    8G    8G    2G    1G    1G  512M  512M  512M  512M    1G    1G    1G
    
    
    
      Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -tlmin 1KB
    __________________________________________________________________________________________________________
                    Sampling apertures of the generator output, labelled as most to least bit-significance
      Candidate    ----------------------------------------------------------------------------------------
    [ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
    ==========================================================================================================
    [ 5  2  6  9]     1G    8G   16G    8G    4G    2G    2G    1G    2G    2G    2G    8G    2G    1G
    [ 6  2  5  9]     1G    8G    8G    8G    4G    2G    2G    2G    2G    2G    2G    2G    1G    2G
    [14  2  7  5]   512M   16G   16G    8G   16G    2G    2G    1G    1G    2G    2G    2G    2G    2G
    [14  2  7  6]   512M   16G    8G   16G    8G    4G    2G    1G  512M    2G    1G    1G    1G    2G
    [15  4 12  9]   512M   16G   16G   16G    8G    1G    1G  512M  512M    1G    1G    1G    1G    1G
    
    
    
      Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -te 1 -tf 2 -tlmin 1KB
    __________________________________________________________________________________________________________
                    Sampling apertures of the generator output, labelled as most to least bit-significance
      Candidate    ----------------------------------------------------------------------------------------
    [ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
    ==========================================================================================================
    [ 5  2  6  9]     1G    4G    2G    2G    1G    2G    2G    1G    1G    1G    2G    1G    1G    1G
    [ 6  2  5  9]     1G    4G    2G    4G    1G    2G    1G  256M    1G    1G    1G    1G    1G    2G
    [14  2  7  5]   256M    4G    4G    2G    4G    1G    1G    1G    1G    1G    1G    1G    1G    1G
    [14  2  7  6]   512M    8G    2G    4G    2G    2G    1G  256M  256M    2G  512M  512M  512M  512M
    [15  4 12  9]   256M   16G    8G    8G    2G    1G    1G  512M  512M  512M    1G    1G    1G    1G
    
    Notably, with full extended options, [14 2 7 5] degrades to 256 MB on the full width sampling.

  • evanhevanh Posts: 15,091
    Here's all the full extended "-te 1 -tf 2" report files I just scored.
  • TonyB_TonyB_ Posts: 2,099
    edited 2018-04-07 02:09
    evanh wrote: »
    Chip,
    This is certainly not something that requires intervention. The differences are nit picking.

    It's more than nit-picking if [14,2,7,6] passes all the Crush tests with [14,2,7,5] failing 5. Or if [14,2,7,5] has fewer passes in BigCrush than [14,2,7,6]. The failures are due to the small state, if the generator is good, but that applies to PractRand as well.

    P.S. xoroshiro[x]y++ is in the text in the output files.
  • evanhevanh Posts: 15,091
    No 32-bit state PRNG passes all Crush tests. Their periods just aren't long enough.

    I'm confident we haven't seen any Crush scores for [14,2,7,6], and that it won't be significantly different to [14,2,7,5].

Sign In or Register to comment.