Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 71 — Parallax Forums

Random/LFSR on P2

1686971737492

Comments

  • evanhevanh Posts: 15,915
    edited 2019-04-15 02:15
    I'm ditching the D=0 means "plus" scrambler special case. The new selection mechanism for which algorithm to test has made it redundant. From now on the data and reports with D constant of 0 will really be the specified algorithm.
  • evanh wrote: »
    Okay so M can be used as evenly spaced offsets to the state. But it won't be as simple as, say, invert msb of M to jump 50% through state space. It won't be that linear, right?
    M is a key to an entirely different sequence, so the values that occur (pick any dozen in a row) with one M will likely never occur with another M present.
    Inverting any bit in M (using the seed function) will switch to a stream unrelated to the previous one (except that all streams form a master set, like threads in a rope that complexly intertwine, but never join each other).
    So, nothing remotely linear.

    Regarding scrambler modifications, it is difficult. The issue is with carry propagation, which is why the original sum rotate sum works so well, so viable options would likely do something similar.
    Honestly, I believe that any small improvements would be easily solved in assembly/hardware (if there were no need to publish comprehensible code in a language like C... is there a need?).
  • evanhevanh Posts: 15,915
    edited 2019-04-15 04:01
    Inverting any bit in M (using the seed function) will switch to a stream unrelated to the previous one (except that all streams form a master set, like threads in a rope that complexly intertwine, but never join each other).
    Ah, nice, I see, more an alternative to a jump.
    Honestly, I believe that any small improvements would be easily solved in assembly/hardware (if there were no need to publish comprehensible code in a language like C... is there a need?).
    Cool, I guess it's less universal that way but I'm interested. There's always new hardware getting designed.
  • xoroshironotxoroshironot Posts: 309
    edited 2019-04-16 21:48
    Updated beta code: Xoroshiro128psp
    I need to restart statistical testing on it.
    128-bit Equivalent to xoroshiro32pp (no streaming) using scrambler: uint16_t result = rotl((s0 + s1) * 5, 8 ) + s0 * 5;
  • evanhevanh Posts: 15,915
    edited 2019-04-16 16:54
    This refactoring road is feeling like a rabbit hole at times. I keep finding more ideas to make it tidier as I move to each script ... causing a recursion. Lots of copying of replacement code that then need tweaking for different job.

    Time for bed I think, I just spent some hours trying to work out why it was reporting every file of a certain aging directory, with over 90,000 files, not to exist. Only to discover it was entirely correct due to an unremembered change that had been made some months back that produced a surprisingly subtle naming change.
  • evanh wrote: »
    This refactoring road is feeling like a rabbit hole at times.
    It is a rabbit hole... only a little to be gained with 32-bit state.
    The above example scrambler should set the upper limit of what is possible at 32-bit, but that is not necessarily what is reasonable for a decent hardware implementation.
    In light of that, the existing xoroshiro32pp has only a little room to grow... I guess an almost 'Pass' on PractRand at 2GB would be considered about double.
    With 128-bit state, I will take that improvement, and the x64 compiler/CPU pipeline efficiency relieves the performance penalty. However, I still have to be concerned with all other details.
  • evanhevanh Posts: 15,915
    evanh wrote: »
    This refactoring road is feeling like a rabbit hole at times.
    It is a rabbit hole... only a little to be gained with 32-bit state.
    The above example scrambler should set the upper limit of what is possible at 32-bit, but that is not necessarily what is reasonable for a decent hardware implementation.
    Oh, I'm talking about posting a more useable versions of my automation scripts and C sources.
  • evanh wrote: »
    Oh, I'm talking about posting a more useable versions of my automation scripts and C sources.
    I understand what you mean. I'm just afraid to touch anything I have created for fear of breaking it.

    I'm gearing up for automating Big Crush again. Pretty ugly coding, because my Big Crush piped version for Windows is too slow, so I use Windows VB auto-generated batch files that call 'Bash on Ubuntu on Windows' shell scripts (from D. Lemire, but modified by me) and automatically parse the output files from Windows with another VB statistical analysis and visualization program I wrote.

    See attached screenshot of AES-counter-mode statistical cloud meta-analysis. Cloud-center deviation is on left and boundary is on right. The various lines are (all-bits, high-bits, low-bits) and (forward-bits, reverse-bits, byte-reverse-bits), and combinations of those groups.

    I never wrote a parser for PractRand output, as I would not know what to do with the full p-value dump (-a), since so many of the the statistics are deeply correlated (unlike Big Crush, which the 254 p-vals are minimally correlated).
    1920 x 1040 - 76K
  • evanhevanh Posts: 15,915
    For Practrand, because it grows the testing as the tests keep passing, I just use the fail point in processed data size as the final score. It's so much easier to handle this way.

    I never put much effort into a solution for Crush but I had observed that counting the number of pass/fail tests was one possible way.

    Late in the effort I went looking for a way to impliement Crush in my testing and got the best answer from Melisa's work. It resulted in two versions: One as a generic piped input, and the other one as a compiled in tuned generator. Both were simpler than what I could glean from the documentation, which never worked anyway.

    Here's the piped version(It's a lot cleaner looking). I've used read() instead of getchar() in the hopes it might have less overhead:
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    
    #include "unif01.h"
    #include "bbattery.h"
    
    
    
    uint32_t  chunk_bits( void )
    {
    	static uint32_t  buff[260];
    	static int  emitidx = 0;
    	uint32_t  output;
    
    
    	if( emitidx == 0 )
    		if( fread( buff, sizeof(uint32_t), 256, stdin ) != 256 )
    		{
    			printf( "Error!  Stream incomplete.\n" );
    			exit(2);
    		}
    
    	// Syphon out the random data.  Maybe more sensitive with all 32-bits swapped.
    	output = buff[emitidx++];
    	if( emitidx >= 256 )
    		emitidx = 0;
    
    	return output;
    }
    
    
    
    int  main( void )
    {
    	// Create TestU01 PRNG object for our generator
    	unif01_Gen  *gen = unif01_CreateExternGenBits( "stdin32", chunk_bits );
    
    	// Run the tests.
    #if CRUSH == 1
    	bbattery_SmallCrush( gen );
    #elif CRUSH == 2
    	bbattery_Crush( gen );
    #elif CRUSH == 3
    	bbattery_BigCrush( gen );
    #else
    	printf( "No test-battery selected!\n" );
    #endif
    
    	// Clean up.
    	unif01_DeleteExternGenBits( gen );
    
    	return 0;
    }
    
  • evanhevanh Posts: 15,915
    Tony,
    Here's another distribution run dataset. This one from the recent streamlined "plusmultplus".
    	rngword_t  result = rotl( s0 + s1, CONSTANT_D ) + s1 + (s1 << 2);
    
  • evanh wrote: »
    Tony,
    Here's another distribution run dataset. This one from the recent streamlined "plusmultplus".
    	rngword_t  result = rotl( s0 + s1, CONSTANT_D ) + s1 + (s1 << 2);
    

    Is s1 * 5 better than s0 * 5?

    I'm calling this xoroshiro32+x+, as cannot use +*+ in filename. Below is overall top 20 for prank+zrank+nzrank, followed by top 10 prank/zrank/nzrank not in overall top 20 plus [13,5,10,9].

    xoroshiro32+x+[a,b,c,d]
    #a, b, c, d,   prank,   zrank,  nzrank,    rank,   pfreq,   zfreq,  nzfreq  
    13, 3,14, 9,      26,       1,      12,       1,      19,      30,     179
     7, 1, 8, 4,       9,       5,      37,       2,      10,      66,     304
     7, 1, 8,13,      56,       3,       9,       3,      54,      60,     162
    13, 5,10,11,      52,       2,      16,       4,      46,      43,     239
     5, 2, 6,10,      61,      23,       7,       5,      58,     142,     156
     2, 1,11, 4,      55,      32,       8,       6,      52,     171,     162
     4, 7,15, 6,      87,       4,       5,       7,     109,      61,     139
     3, 2, 6,10,       2,      36,      68,       8,       7,     180,     401
     9, 8,14,11,      66,      21,      22,       9,      63,     130,     267
     5, 8, 4, 9,      18,      28,      64,      10,      15,     160,     388
    15, 6, 2,10,       5,      73,      41,      11,       8,     282,     327
     7, 2,10, 4,      23,      29,      69,      12,      17,     165,     407
     7, 2,14, 6,     114,       8,       1,      13,     170,      89,      68
    14,11,13,11,      35,      12,      90,      14,      27,     102,     470
    12, 8,11,10,      63,      20,      56,      15,      59,     128,     363
    14, 2, 7,10,      28,      17,      97,      16,      20,     122,     486
    13,12,14, 7,     138,       6,       3,      17,     242,      72,     128
    14,11, 3,13,     128,      18,       6,      18,     210,     124,     156
    13, 5,10, 7,      27,      79,      49,      19,      19,     294,     347
    10, 3,11, 4,      62,      54,      40,      20,      58,     224,     325
    
    11, 7,10, 7,     135,       7,      20,      21,     238,      86,     259
    15, 4,12, 6,     147,      58,       2,      38,     273,     228,     115
    11, 8,12, 9,     189,      50,      10,      48,     460,     215,     166
     2,11, 3, 7,     214,      10,      27,      50,     568,      92,     277
    11, 7,10, 8,     154,       9,      98,      52,     293,      92,     500
    13, 5, 8,10,       1,      93,     201,      57,       6,     332,     789
     8,15, 7, 6,       3,     115,     255,      71,       8,     381,     967
    14, 3,13, 9,     263,     138,       4,      80,     838,     454,     136
    14, 8, 9, 6,       7,     107,     336,      88,       9,     366,    1274
    14, 2, 9, 4,       6,     201,     310,     110,       8,     676,    1163
    13, 5, 8, 5,       8,     290,     305,     144,      10,     991,    1137
    13, 5,10, 9,      17,     283,     368,     180,      15,     961,    1393
    11, 1,14,12,       4,     305,     539,     253,       8,    1091,    2200
    10,10, 7, 6,      10,     900,     940,     653,      11,    13257,  12342
    

    xoroshiro32++ rankings here:
    http://forums.parallax.com/discussion/comment/1468718/#Comment_1468718
  • evanhevanh Posts: 15,915
    edited 2019-04-19 02:22
    I know that [14 8 9 6] is high scoring.
    TonyB_ wrote: »
    Is s1 * 5 better than s0 * 5?
    :shrug: I'll do a distribution run for that too ...
  • evanhevanh Posts: 15,915
    edited 2019-04-20 11:50
    Here we go. I've tacked a zero on the name to distinguish it.

    EDIT: To put that name in context, it's kind of formally named as part of the new collection. See second third attachment. So I'm using this to test out the revised scripts.

    EDIT2: Third Second attachment is three csv files for the three distributions. I don't know if they'll be any use but I've added building these to the automation anyway.

    EDIT4: Refresh the algorithms file.
  • evanhevanh Posts: 15,915
    edited 2019-04-20 00:40
    Here's the 34 grid scores for Tony's "xom" selection above.

    I also bumped into a slightly higher grid score with candidate [14 1 11 8]
    __________________________________________________________________________________________________________
      Gridded scores of single iterated Xoroshiro32(16)plusmultplus candidate [14 1 11 8].
          Byte Sampled Double Full Period = 8 GB
          PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
        |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15==
     16 |  512M    1G  512M  512M    1G    1G  512M  512M  256M  512M  512M  512M  512M  512M  512M  512M
     08 |    1G  512M  512M    1G    1G    2G    1G    2G    1G    2G    2G    1G    1G    1G    2G    2G
     04 |  512M    1G    2G    2G    2G    2G    2G    4G    2G    2G    2G    4G    2G    2G    2G    1G
      Exponent Minimum = 28   Exponent Average = 30.083
    
    
  • evanhevanh Posts: 15,915
    edited 2019-04-20 01:14
    Here's the ten best, I think*, grid scores from last year's ++ effort. Notable is there is three than have minimum's of 29, [13 5 10 9] being one. While, just as notably, the averages are all within 0.5 of each other.

    * We spent a lot of time doing double iterated scoring so I'm not sure where these scores actually came from.

    EDIT: Ah, these were ten best distributions by the looks. Actually, there was twelve but a couple were lower scoring.
  • evanhevanh Posts: 15,915
    edited 2019-04-20 01:28
    Here's the twelve best scores, from the old lists anyway. The average is a fraction higher but there is now seven minimums of 29.

    I'm thinking that with a more exhaustive search of Chris's newest scramblers we'll get some higher averages, say 30.5.
  • evanh wrote: »
    Here we go. I've tacked a zero on the name to distinguish it.

    xom0 rankings:
    #a, b, c, d,   prank,   zrank,  nzrank,    rank,   pfreq,   zfreq,  nzfreq  
    11, 7,10,10,      21,       6,       7,       1,      18,      74,     172
     3, 2, 6, 9,      29,       4,      10,       2,      24,      66,     185
     8, 7,15, 6,       8,      28,      43,       3,      12,     135,     334
    15, 6, 2,10,      17,       8,      47,       4,      16,      86,     353
    15, 7, 4,10,      42,       1,      14,       5,      31,      53,     221
     2, 9, 9, 4,      37,       3,      26,       6,      29,      63,     280
     5, 2, 6, 8,      34,      10,      29,       7,      27,      93,     303
    11, 8,12, 6,      30,      12,      37,       8,      24,     103,     319
     3, 2, 6,12,      47,       2,      17,       9,      36,      56,     230
     4, 8, 5, 7,       1,      47,      91,      10,       5,     192,     457
    13,12,14,11,      20,      43,      57,      11,      18,     185,     373
    13,12,14, 7,      13,      61,      69,      12,      14,     229,     396
    12,10,11,12,      63,      16,      18,      13,      56,     118,     238
     9, 2,14,10,      10,      70,      74,      14,      12,     250,     410
    13, 3,14,12,      75,      13,      20,      15,      75,     108,     244
     4, 1, 9, 6,      43,      58,      40,      16,      32,     214,     327
     3,11,14, 5,      76,       9,      24,      17,      75,      92,     271
     4, 8, 5, 9,      73,      17,      33,      18,      73,     121,     313
     6, 2, 3, 7,      74,      15,      41,      19,      75,     117,     329
     8, 7,15, 5,      18,      37,     132,      20,      16,     163,     586
    
    14, 3,13,11,     103,       7,      19,      23,     114,      74,     239
     2, 1,15,10,     100,      26,       8,      24,     111,     134,     172
    11,14, 6,11,      96,       5,      56,      27,     107,      72,     373
    14,11,13, 7,       6,      30,     215,      29,      10,     138,     780
     5,14,12, 8,     154,      64,       1,      50,     245,     235,     103
     5, 8, 4, 5,       7,     147,     238,      57,      11,     425,     852
     3, 2, 6, 6,       3,     164,     268,      65,       6,     487,     955
     3,11,14, 7,     221,      39,       4,      75,     478,     170,     161
     3,11, 2, 5,     181,     133,       5,      79,     298,     390,     163
     5, 8, 4,11,       5,      96,     442,      93,       9,     303,    1756
    15, 7, 4, 5,       9,     258,     341,     104,      12,     784,    1224
     6, 2, 3, 5,     253,     120,       2,     110,     640,     361,     145
    15, 7, 8, 3,       4,     235,     527,     144,       6,     716,    2235
    10, 7,11,12,     278,     260,       6,     160,     774,     792,     169
    13,11,14,11,     356,     247,       3,     195,    1324,     753,     151
     6, 2, 5, 7,     345,     279,       9,     199,    1241,     848,     173
     9, 2,14, 5,       2,     729,     834,     383,       5,    5351,    7809
    
    13, 5, 8,10,     228,     317,     570,     312,     514,    1017,    2537
    13, 5,10, 9,     986,     886,     872,     944,   51221,   13371,    9797
    
  • evanhevanh Posts: 15,915
    Ha, that selection definitely produce better grid scores. A top average of 30.2 is notably better than anything else so far. There's a lot of variation but seven minimums of 29 also whips the Smile off (s1 * 5), which got none at all.
  • evanhevanh Posts: 15,915
    So, I'm all for focusing on s0 * 5 from now on.
  • evanhevanh Posts: 15,915
    edited 2019-04-20 11:53
    I've now adopted the shorthand scrambler naming, eg: +x+, in the complete names for the algorithms. It seems to flow nicely through the scripts and filenames without issue.

    EDIT: Accordingly, I've updated the algorithms zip file above - https://forums.parallax.com/discussion/comment/1469195/#Comment_1469195
  • evanh wrote: »
    So, I'm all for focusing on s0 * 5 from now on.
    That makes some sense now, since s0 * 5 was required for maintaining 2-D equidistribution of my streaming version across all streams.
    That would tend to indicate a form of hidden order that would extend to the full period statistics you are measuring.
    That can't happen with s1 * 5 (for some reason), but makes for some good PractRand scores at 32-bit.
    Hopefully the combination of s0 * 5, without the ( s0+s1 ) * 5 that I am using, will have enough effect on all stats for some candidates.
  • evanhevanh Posts: 15,915
    edited 2019-04-20 16:12
    And on that note, here's my sources in their latest glory for you to try out. It's not everything but what I've updated for the runs we've been doing in the last couple of weeks.

    All the scripts expect the src directory to be present and they will build all workings in a fresh directory called tests. I'm using Ubuntu (Kubuntu) as my OS so there is likely some assumptions on what is default installed like GCC and Bash and Awk and Grep and likely others of similar ilk.

    General flow is: find-fullp-candidates -> run-distribution -> merge-distribution -> sort the result and trim to acceptable number of candidates (Tony is doing this for me) -> paste into candidates.cfg file -> run-grid-scoring -> assemble-grids

    You will find results like the grid tables I've posted will be created within relevant subdirectory of tests directory.
  • evanhevanh Posts: 15,915
    Oh, I forgot about Practrand. The compiled copy I've made is oddly named as RNG_test.unchanged, and sits in the same base directory as those scripts. The name and path to Practrand can be edited at the top of run-grid-scoring.sh

    Ha, and looking at the top of the scripts, I see I've accidental left enabled the double iteration switch in them all. You'll most likely want to comment that out.
  • xoroshironotxoroshironot Posts: 309
    edited 2019-04-20 22:25
    Thanks!

    I'll have have to study the nix PractRand build, as I have never used it.
    I am curious about that... how many seconds (one instance) to 1GB using: 'stdin -tf 2 -te 1 -tlmin 1KB -multithreaded' on your 8-core 4 GHz CPU?
  • evanhevanh Posts: 15,915
    edited 2019-04-21 01:22
    Long forgotten the answer ... although I did just add something to the report files ... 2 mins 40 sec for 2GB with eight paralleled single-tasking jobs ...

  • evanhevanh Posts: 15,915
    97 seconds under same condition but for 1 GB. Both were double iterated cases.

    Doing a special run (back at single iterated):
    - Used PractRand v0.93 options: stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
    - tests/engine-xom0-s16/pract/a5b14c12/case-a5b14c12d8w16p0 produced a failure score at 1 GB.
    - Forcing a single case at a time netted me 59 seconds.
  • evanhevanh Posts: 15,915
    edited 2019-04-21 02:57
    Allowing parallel tasks again, for the same single iterated candidate, the per case time jumps from 60(59) seconds to 80 seconds for 1 GB score. Which is bang on half the 2:40.

    Not sure what to make of that 97 seconds now. Maybe I miss-read it, I'm working with datestamps. EDIT: Oh, it was during normal use of computer and all these scripts intentionally lower the task priority so I can do other stuff while waiting for results. Maybe I'd done something that was slowing the gridding down for that particular case.
  • evanhevanh Posts: 15,915
    Okay, it's not that simple. I've got a 2 GB score here in just 95 seconds. Looks like there is a lot of variation with Practrand speed, depending on the data stream content. Maybe singletasking is more consistent.

  • Thanks for the results... those seem consistent with Windows binaries.
    Just brought my small workstation to idle, a 6-core/12-thread Intel Xeon W3690 @ 3.8 GHz.
    Piping the output of a 32-bit exe to PractRand:
    1 GB = 80 seconds
    2GB = 105 seconds
    PractRand reporting overhead is quite a substantial fraction until you get to about 32 GB.
    I know my dual CPU workstation is about 25% faster since it can satisfy all of the threads from Practrand.
  • evanhevanh Posts: 15,915
    evanh wrote: »
    Okay, it's not that simple. I've got a 2 GB score here in just 95 seconds. Looks like there is a lot of variation with Practrand speed, depending on the data stream content. Maybe singletasking is more consistent.
    Doh! Of course, although both were running in parallel with 7 other cases, the 160 seconds (2:40) one was without the -multithread option whereas the 95 seconds was with it.

    BTW: I just looked up the W3690 and I can see it's longer in the tooth than I expected, eg: Architecture dated from 2011, with DDR3 RAM. Looks to be the first of the "Core i#" with built-in north-bridge. I presume you got it second hand cheaply. It's performing really well!
Sign In or Register to comment.