Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 83 — Parallax Forums

Random/LFSR on P2

1808183858692

Comments

  • evanh wrote: »
    All of these variants are at least 16x more random than XORO32 (and all with the superior equidistribution properties).
    But only the last one will fit the constraints when as a single iterated 16-bit implementation.
    Which last one? The only requirement for double iteration is that the carry bit (i.e. +1) be toggled each iteration.
    The rotl(,1) on the second iteration is not a requirement, but does improve randomness.
  • evanhevanh Posts: 16,015
    And, to be honest, even that one is breaking the constraints by using a second store register to get the extended accumulation. Albeit, optionally.
  • evanh wrote: »
    I’ll re-run it on PractRand tomorrow... must sleep.
  • TonyB_TonyB_ Posts: 2,191
    edited 2020-01-10 12:28
    TonyB_ wrote: »
    Evan, if you have time to spare, I suggest reading the posts after the last one you made in this thread before today. Some posts are complicated, but the basic innovations are apparent quite early on.
    Page 77 is where to start:
    http://forums.parallax.com/discussion/166176/random-lfsr-on-p2/p77
    evanh wrote: »
    May as well make it one 3-clock instruction that takes two operands.
    A 3-clock instruction would be odd, very odd. A new XOROACC should be compatible with the existing XORO32. Here is a software version of XOROACC which does +1 but omits the rotl 1 for speed:
    https://forums.parallax.com/discussion/comment/1480364/#Comment_1480364
  • evanhevanh Posts: 16,015
    No odder than RDLUT.
  • TonyB_ wrote: »
    Some of the early discussion on equidistribution was in error regarding 16-bit 2D (it is near-perfect, not perfect like the 1D is), and Tony and I had not yet worked out the idea or details of making 32-bit near-perfect 1D a possibility (which I eventually found that toggling the +1 or ^1 on/off would create).
  • TonyB_ wrote: »
    A new XOROACC should be compatible with the existing XORO32. Here is a software version of XOROACC which does +1 but omits the rotl 1 for speed:
    https://forums.parallax.com/discussion/comment/1480364/#Comment_1480364
    From that simple code, further exploration was concerning extending the randomness throughout the entire double-iterated stream, not just half of it.
    Xor masking with a prime seemed like a good idea, but eventually was eclipsed by rotating the second iteration by 1 before adding (with or without an xor mask), as the rotation solved multiple issues.

    Any complexity beyond that (e.g. xoring the extended state with s0) is mostly for my own research, but I was hoping I would stumble on some new simple way to fully extend the randomness beyond 1 double-iterated 16GB stream, even though it was fairly obvious to me that couldn't actually happen without messing with the underlying engine or making dynamic changes to the underlying ++ scrambler... all too expensive to be useful even if I found something that worked.

    I cannot access my server from work due to blocking here since I changed ports on my server for security reasons (as I was under large-scale attack).
  • xoroshironotxoroshironot Posts: 309
    edited 2020-01-11 01:36
    Here is the PractRand pre0.95(fixed) forward result for XORO32 with no +s0, has +1 on first iteration and no rotation on second:
    rng=RNG_stdin, seed=unknown
    length= 4 gigabytes (2^32 bytes), time= 104 seconds
      Test Name                         Raw       Processed     Evaluation
      DC7-9x1Bytes-1:odd                R=  +8.1  p =  1.8e-4   unusual
      ...and 1694 test result(s) without anomalies
    
    rng=RNG_stdin, seed=unknown
    length= 8 gigabytes (2^33 bytes), time= 242 seconds
      Test Name                         Raw       Processed     Evaluation
      DC7-9x1Bytes-1:even               R= +11.7  p =  6.0e-6   mildly suspicious
      DC7-9x1Bytes-1:odd                R= +16.4  p =  4.0e-8   very suspicious
      DC7-9x1Bytes-1:both               R= +14.4  p =  6.5e-8   very suspicious
      DC7-9x1Bytes-1:indep              R= +27.0  p =   7e-12    VERY SUSPICIOUS
      [Low1/16]FPF-14+6/4:all           R=  -6.3  p =1-1.0e-5   unusual
      [Low4/16]FPF-14+6/16:all          R=  -5.5  p =1-5.9e-5   unusual
      [Low4/16]FPF-14+6/4:all           R=  -7.3  p =1-1.1e-6   mildly suspicious
      ...and 1796 test result(s) without anomalies
    
    rng=RNG_stdin, seed=unknown
    length= 16 gigabytes (2^34 bytes), time= 441 seconds
      Test Name                         Raw       Processed     Evaluation
      BCFN(0+0,13-0,T)                  R= +38.0  p =  8.0e-20    FAIL !
      BCFN(0+1,13-0,T)                  R= +22.2  p =  2.2e-11   VERY SUSPICIOUS
      DC7-9x1Bytes-1:even               R=+104.1  p =  1.4e-55    FAIL !!!!
      DC7-9x1Bytes-1:odd                R=+139.0  p =  4.0e-74    FAIL !!!!
      DC7-9x1Bytes-1:both               R=+120.0  p =  3.4e-62    FAIL !!!!
      DC7-9x1Bytes-1:indep              R=+293.3  p =  0          FAIL !!!!!!!!
      DC7-6x2Bytes-1:both               R= +40.0  p =  2.9e-18    FAIL !
      DC7-6x2Bytes-1:indep              R= +11.1  p =   2e-5    unusual
      FPF-14+6/16:(0,14-0)              R= +66.7  p =  2.3e-61    FAIL !!!!
      FPF-14+6/16:(1,14-0)              R= +64.9  p =  1.1e-59    FAIL !!!!
      FPF-14+6/16:all                   R= +38.7  p =  8.0e-36    FAIL !!!
      FPF-14+6/4:(0,14-0)               R= +67.7  p =  2.8e-62    FAIL !!!!
      FPF-14+6/4:(1,14-0)               R= +61.7  p =  1.1e-56    FAIL !!!!
      FPF-14+6/4:(2,14-0)               R= +48.0  p =  4.9e-44    FAIL !!!
      FPF-14+6/4:(3,14-0)               R= +46.2  p =  2.3e-42    FAIL !!!
      FPF-14+6/4:(4,14-0)               R= +47.3  p =  2.2e-43    FAIL !!!
      FPF-14+6/4:(5,14-0)               R= +45.7  p =  6.6e-42    FAIL !!!
      FPF-14+6/4:(6,14-0)               R= +31.8  p =  4.3e-29    FAIL !!
      FPF-14+6/4:(7,14-0)               R= +30.7  p =  4.6e-28    FAIL !!
      FPF-14+6/4:(8,14-0)               R= +29.3  p =  1.0e-26    FAIL !!
      FPF-14+6/4:(9,14-0)               R= +32.5  p =  1.0e-29    FAIL !!
      FPF-14+6/4:(10,14-0)              R= +14.5  p =  4.9e-13   VERY SUSPICIOUS
      FPF-14+6/4:(11,14-0)              R= +14.6  p =  3.8e-13   VERY SUSPICIOUS
      FPF-14+6/4:(14,14-2)              R=  +9.9  p =  1.9e-8   mildly suspicious
      FPF-14+6/4:all                    R=+134.0  p =  2.8e-125   FAIL !!!!!
      Gap-16:A                          R= +61.3  p =  9.3e-41    FAIL !!!
      Gap-16:B                          R= +79.4  p =  7.5e-68    FAIL !!!!
      [Low1/8]FPF-14+6/4:all            R=  +8.0  p =  6.0e-7   suspicious
      [Low1/8]Gap-16:A                  R= +21.0  p =  7.3e-15    FAIL
      [Low1/8]Gap-16:B                  R= +18.9  p =  4.1e-16    FAIL !
      [Low1/16]FPF-14+6/4:(0,14-0)      R= -11.9  p =1-1.2e-10  very suspicious
      [Low1/16]FPF-14+6/4:all           R= -10.6  p =1-6.2e-10   VERY SUSPICIOUS
      [Low1/16]Gap-16:A                 R= +69.4  p =  1.5e-60    FAIL !!!!
      [Low1/16]Gap-16:B                 R= +45.4  p =  1.0e-38    FAIL !!!
      [Low4/16]FPF-14+6/16:all          R=  -7.5  p =1-6.4e-7   mildly suspicious
      [Low4/16]FPF-14+6/4:(1,14-0)      R=  -9.5  p =1-1.7e-8   mildly suspicious
      [Low4/16]FPF-14+6/4:all           R= -12.2  p =1-1.4e-11   VERY SUSPICIOUS
      [Low4/16]Gap-16:B                 R=  +7.4  p =  3.0e-6   mildly suspicious
      ...and 1866 test result(s) without anomalies
    
    I believe the DC7 test is new in PractRand 0.95.
  • Cluso99Cluso99 Posts: 18,069
    IMHO the RND generator in the P2 is good enough. Please don't change it in later P2 Rev's.
  • Cluso99 wrote: »
    IMHO the RND generator in the P2 is good enough. Please don't change it in later P2 Rev's.
    Please list a specific reason, as that would be helpful at this point.
    I am in favor of keeping XORO32 intact, but possibly adding an extension to it.
    However, I am curious about how current usage scenarios with XORO32 might be broken if replaced entirely (with a superior PRNG that has several definite advantages)?
    For a user seedable PRNG that produces a definite user accessible repeatable sequence, I see reason for keeping the legacy sequence intact. I didn’t think XORO32 worked that way though?
    Good topic to bring up.
  • evanh wrote: »
    I’ll re-run it on PractRand tomorrow... must sleep.
    Evan, the results I just posted above were for toggling the + 1 on and off, so not exactly as you asked.
    As you requested, here are the results with +1, without toggling, D=9, no + s0:
    rng=RNG_stdin, seed=unknown
    length= 2 gigabytes (2^31 bytes), time= 39.2 seconds
      no anomalies in 1606 test result(s)
    
    rng=RNG_stdin, seed=unknown
    length= 4 gigabytes (2^32 bytes), time= 106 seconds
      Test Name                         Raw       Processed     Evaluation
      DC7-9x1Bytes-1:odd                R= +10.1  p =  1.7e-5   mildly suspicious
      DC7-9x1Bytes-1:both               R= +10.6  p =  2.0e-5   mildly suspicious
      DC7-9x1Bytes-1:indep              R= +16.4  p =   1e-7    suspicious
      [Low4/16]FPF-14+6/4:all           R=  -8.7  p =1-4.1e-8   very suspicious
      ...and 1693 test result(s) without anomalies
    
    rng=RNG_stdin, seed=unknown
    length= 8 gigabytes (2^33 bytes), time= 238 seconds
      Test Name                         Raw       Processed     Evaluation
      BCFN(0+0,13-0,T)                  R= +22.3  p =  1.7e-11   VERY SUSPICIOUS
      DC7-9x1Bytes-1:even               R= +18.3  p =  5.1e-9   very suspicious
      DC7-9x1Bytes-1:odd                R= +23.0  p =  3.0e-11   VERY SUSPICIOUS
      DC7-9x1Bytes-1:both               R= +18.3  p =  5.7e-10   VERY SUSPICIOUS
      DC7-9x1Bytes-1:indep              R= +41.3  p =  0          FAIL !!!!!!!!
      [Low1/16]FPF-14+6/4:all           R=  -7.0  p =1-1.9e-6   mildly suspicious
      [Low4/16]FPF-14+6/16:all          R=  -7.5  p =1-7.0e-7   mildly suspicious
      [Low4/16]FPF-14+6/4:(0,14-0)      R= -12.2  p =1-6.2e-11  very suspicious
      [Low4/16]FPF-14+6/4:(1,14-0)      R=  -7.8  p =1-6.4e-7   unusual
      [Low4/16]FPF-14+6/4:(2,14-0)      R=  -8.5  p =1-1.5e-7   mildly suspicious
      [Low4/16]FPF-14+6/4:(3,14-0)      R=  -8.3  p =1-2.4e-7   unusual
      [Low4/16]FPF-14+6/4:all           R= -16.8  p =1-4.7e-16    FAIL !
      ...and 1790 test result(s) without anomalies
    
    Toggling the +1 on/off clearly has superior randomness (and is required to provide near-perfect 32-bit equidistribution).
  • Cluso99Cluso99 Posts: 18,069
    Cluso99 wrote: »
    IMHO the RND generator in the P2 is good enough. Please don't change it in later P2 Rev's.
    Please list a specific reason, as that would be helpful at this point.
    I am in favor of keeping XORO32 intact, but possibly adding an extension to it.
    However, I am curious about how current usage scenarios with XORO32 might be broken if replaced entirely (with a superior PRNG that has several definite advantages)?
    For a user seedable PRNG that produces a definite user accessible repeatable sequence, I see reason for keeping the legacy sequence intact. I didn’t think XORO32 worked that way though?
    Good topic to bring up.

    The specific reason is obvious. Code will then need to be specific for each revision of the P2. There are already differences between the 3 hardware prototype P2's. Definite advantages need to compared with consistency. Just because the outcome is better doesn't make a change worthwhile. This is great theoretical stuff, but in practice it doesn't matter much to those who will use it.
  • Cluso99 wrote: »
    The specific reason is obvious. Code will then need to be specific for each revision of the P2.
    I was assuming that any change, if it did occur, would be implemented in a way that didn't require version specific code.
    Given that, having access to a provably better PRNG might possibly increase the usefulness, and integrity of those uses.
    Honestly, that was why I was asking questions, as I am not that familiar with any of the current uses cases of XORO32.
    We know from history that end users don't always understand how a hidden lack of randomness can negatively impact certain use cases.
    Also, theoreticians are often unable to see the flaws in their own work, or otherwise over-estimate the quality of their own work.
    That being said, this new PRNG we are discussing has some potential to find uses elsewhere (possibly only at larger state / period), even if not here.
  • evanhevanh Posts: 16,015
    edited 2020-01-11 10:54
    I've used XORO32 once so far for procedurally comparing multi-megabyte reads or writes with hyperRAM. It didn't matter the exact data sequence, just as long as it was repeatable. Worked a treat.

  • evanhevanh Posts: 16,015
    rng=RNG_stdin, seed=unknown
    length= 8 gigabytes (2^33 bytes), time= 238 seconds
      Test Name                         Raw       Processed     Evaluation
      BCFN(0+0,13-0,T)                  R= +22.3  p =  1.7e-11   VERY SUSPICIOUS
      DC7-9x1Bytes-1:even               R= +18.3  p =  5.1e-9   very suspicious
      DC7-9x1Bytes-1:odd                R= +23.0  p =  3.0e-11   VERY SUSPICIOUS
      DC7-9x1Bytes-1:both               R= +18.3  p =  5.7e-10   VERY SUSPICIOUS
      DC7-9x1Bytes-1:indep              R= +41.3  p =  0          FAIL !!!!!!!!
      [Low1/16]FPF-14+6/4:all           R=  -7.0  p =1-1.9e-6   mildly suspicious
      [Low4/16]FPF-14+6/16:all          R=  -7.5  p =1-7.0e-7   mildly suspicious
      [Low4/16]FPF-14+6/4:(0,14-0)      R= -12.2  p =1-6.2e-11  very suspicious
      [Low4/16]FPF-14+6/4:(1,14-0)      R=  -7.8  p =1-6.4e-7   unusual
      [Low4/16]FPF-14+6/4:(2,14-0)      R=  -8.5  p =1-1.5e-7   mildly suspicious
      [Low4/16]FPF-14+6/4:(3,14-0)      R=  -8.3  p =1-2.4e-7   unusual
      [Low4/16]FPF-14+6/4:all           R= -16.8  p =1-4.7e-16    FAIL !
      ...and 1790 test result(s) without anomalies
    
    Toggling the +1 on/off clearly has superior randomness (and is required to provide near-perfect 32-bit equidistribution).
    I should grid it. Off to bed right now though.

  • TonyB_TonyB_ Posts: 2,191
    edited 2020-01-11 12:36
    Cluso99 wrote: »
    Cluso99 wrote: »
    IMHO the RND generator in the P2 is good enough. Please don't change it in later P2 Rev's.
    Please list a specific reason, as that would be helpful at this point.
    I am in favor of keeping XORO32 intact, but possibly adding an extension to it.
    However, I am curious about how current usage scenarios with XORO32 might be broken if replaced entirely (with a superior PRNG that has several definite advantages)?
    For a user seedable PRNG that produces a definite user accessible repeatable sequence, I see reason for keeping the legacy sequence intact. I didn’t think XORO32 worked that way though?
    Good topic to bring up.

    The specific reason is obvious. Code will then need to be specific for each revision of the P2. There are already differences between the 3 hardware prototype P2's. Definite advantages need to compared with consistency. Just because the outcome is better doesn't make a change worthwhile. This is great theoretical stuff, but in practice it doesn't matter much to those who will use it.

    XORO32 output will not change. An optional hardware enhancement could improve it, but if you don't want that then just ignore it. Adding seven P2 instructions after XORO32 would emulate this proposed enhancement. If done in hardware in future, total instructions could be reduced to two and cycles four, the same as with XORO32 now.

    Over one-third of the possible 32-bit PRN values never occur with XORO32 as is, over one-third occur once and the remainder occur twice or more. Proposed optional enhancement would output all 32-bit values almost equally and 16-bit words exactly equally (and more randomly than now) over a much longer period.
  • xoroshironotxoroshironot Posts: 309
    edited 2020-01-11 18:43
    evanh wrote: »
    I should grid it. Off to bed right now though.
    For speed, cross-porting and memory access for 32-bit freqs, I moved all of my PRNGs from VB6 to C++ .NET.
    Output piping for PractRand, etc., is using fwrite, which I found on Windows benefits noticably from using a pre-buffer that is optimally about 32 bytes (actually 16 16-bit output words).
    In theory the call overhead for fwrite on Windows should not be an issue because of fwrite's existing internal buffer.
    But in practice it is. Cutting these longer single / parallel PractRand run times down by 20% or more by additional buffering worked a treat.
    Not sure if Linux / GCC would benefit the same.
  • evanhevanh Posts: 16,015
    edited 2020-01-12 00:10
    I used 1kB fwrite() buffers in my generators and the TestU01 tests. I don't think I looked for Practrand's input handler though.

    To be honest, I never compared if it helped vs putchar() or similar.

  • evanhevanh Posts: 16,015
    Evan, the results I just posted above were for toggling the + 1 on and off, so not exactly as you asked.
    As you requested, here are the results with +1, without toggling, D=9, no + s0: ...

    That's [13 5 10 9], right?

  • evanh wrote: »
    I used 1kB fwrite() buffers in my generators
    I tried modifying the default fwrite() buffer, but it only made things worse, on Windows anyway.
    I recall reading that the fwrite() buffer is adaptive and wouldn't benefit from using my own pre-buffer, which seems to be incorrect for high-data-rate sources.
    However, increasing my pre-buffer much beyond 40 bytes created a minor regression.
    Regardless, this (slightly adjusted) code gave the 20% speedup for Windows:
    //	uint16_t buffer[20]; 
    	uint32_t buffer[10];
    //	uint64_t buffer[5];
    	int sbuffer = sizeof(buffer[0]);
    	int buffsize = sizeof(buffer) / sbuffer;
    	_setmode(_fileno(stdout), _O_BINARY); // Only use for Windows!
    	do {
    		for (int j = 0; j < buffsize; j++) {
    			buffer[j] = xoroacc32(); // store 32-bit dword in array
    		}
    	} while (fwrite(&buffer[0], sbuffer, buffsize, stdout) == buffsize); // exit when PractRand is finished accepting data
    
    Tested speed with this command line:
    PRNG_STDOUT | rng_test.exe stdin -tf 2 -te 1 -tlmin 4GB -tlmax 4GB -multithreaded
  • evanhevanh Posts: 16,015
    Pre-buffer is what I've worked with then. I've not looked below.

  • xoroshironotxoroshironot Posts: 309
    edited 2020-01-12 03:53
    evanh wrote: »
    Evan, the results I just posted above were for toggling the + 1 on and off, so not exactly as you asked.
    As you requested, here are the results with +1, without toggling, D=9, no + s0: ...
    That's [13 5 10 9], right?
    Yes, as all indications are [13 5 10 9] is fairly good without the trailing + s0. I haven't tried all values of D, but recall values near W/2 (inclusive, unlike with ++) looked best.

    I had assumed this would be done with the + s0 included, thus D probably must = 9 since the required output values were already available from XORO32, and the randomness is slightly better with s0 included.

    Unfortunately, much of my early testing was on PractRand 0.94 and I hadn't realized how the (fixed) pre0.95 so effortlessly found faults that didn't show in 0.94.
  • evanhevanh Posts: 16,015
    Just compiled Practrand pre0.95 - The library code has a lot of warnings about a constant shift, BITS_PER_BLOCK, being beyond the type size.

    Recompiling 0.94 doesn't have any of those.

  • xoroshironotxoroshironot Posts: 309
    edited 2020-01-12 04:00
    evanh wrote: »
    Just compiled Practrand pre0.95
    I don't recall that specifically... were your able to follow all of the sparse notes on how to fix the various issues with pre-0.95 from the link I provided?
    I didn't see any indication of changes to the zip file source after the discussion the other Chris and I had, but haven't checked lately.
  • evanhevanh Posts: 16,015
    Just completed a grid run on that candidate:
      Gridded scores of single iterated Xoroacc32(16)+1 candidate [13 5 10 9].
          Byte Sampled Double Full Period = 8 GB
          PractRand v0.94 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 128KB
        |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15==
     16 |    8G   16G   16G   16G    8G    8G    8G    8G    8G   16G   16G    4G    8G    8G    8G    8G
     08 |    4G    4G    8G    4G    4G    4G    8G    8G   32M   32M   32M   32M   32M   64M    8G    8G
     04 |    2G    8G    4G    4G    4G    4G    4G    4G    4G    4G    4G    1G    4G    4G    8G    8G
      Exponent Minimum = 25   Exponent Average = 31.666   Exponent Maximum = 34
    

    Strong scores but has a weak spot with rotated 8-bit window. Will need search for a better candidate.

  • evanh wrote: »
    Strong scores but has a weak spot with rotated 8-bit window. Will need search for a better candidate.
    Try toggling +1 on and off each iteration.
    Without that and/or '+ s0', I imagine smaller windows might have more issues...not sure if changing D will help.
    Most of my rotational testing was with XORO32 at 16-bit window.
  • evanhevanh Posts: 16,015
    Wow, it's been a while since I did double iterated gridding. The script for it doesn't handle the newer source building method. Probably the matching source code doesn't either.

  • evanh wrote: »
    Wow, it's been a while since I did double iterated gridding.
    Initially I simply added an int state3 variable to be used in place of 1 and executed: state3 ^= 1;
  • evanhevanh Posts: 16,015
    evanh wrote: »
    Just compiled Practrand pre0.95
    I don't recall that specifically... were your able to follow all of the sparse notes on how to fix the various issues with pre-0.95 from the link I provided?
    They're only warnings so compile and link still succeeds. I haven't tried using it yet.

Sign In or Register to comment.