Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 75 — Parallax Forums

Random/LFSR on P2

1727375777892

Comments

  • I have been so busy I have not had time to look over much of the data.
    However, I did notice that Xoroshiro32(16)s1+s0- candidate [6 2 3 6] (that you mentioned) fails one of the new tests in Practrand 0.94, but at the same time that it fails other tests (~512MB).
    Still unusual, though, as others I looked at did not exhibit that behavior.
    If you do decide to use 0.94, start testing at 128KB, as the the occasional false-positive issue in 0.94 from 1KB-64KB should be fixed for the eventual 0.95 release, as I understand it.

    I'm busy with meta-analysis of my xoroshiro128psp (which looks good on its own) in Big Crush as part of a TRNG where I sum its output with AES (counter mode) and Lehmer128. I've got 44 threads running 24/7 for the next few weeks.
  • evanhevanh Posts: 16,032
    Ah, 0.94 is out. I wonder if he's fixed the bug I found ...
  • evanhevanh Posts: 16,032
    edited 2019-05-27 08:16
    version 0.94
    	RNGs
    		xsm64
    			revised
    			old version was falling well short of intended quality on some seeds
    
    		xsm32
    			revised
    
    		efiix8x384 / efiix16x384 / efiix32x384 / efiix64x384
    			the large array sizes didn't realy help anything, so I shrunk them down to reduce memory & cache footprints
    			where the old iteration table size used to be 128, it's now 32
    			and where the old indirection table size used to be 256, it's now 16
    			so the names are now efiix8x48 instead of efiix8x384, efiix16x48 instead of efiix16x384, etc
    
    	tools
    		RNG_test
    			adjusted formating of raw outputs to better preserve columns during extreme failures
    			changed listed seed for "RNGs" that don't have meaningful seed
    			fixed off-by-1 bug causing evaluation thresholds to be slightly incorrect
    			added new evalution threshold "normalish" between "normal" and "unusual"
    			SeedingTester and EntropyPoolingTester meta-rngs that were only supported by command line options -ttseed64 and -ttep can now be specified as part of the RNG name
    
    		RNG_output
    			increased buffer size (again)
    			fixed endianness issue (again?)
    			now supports SeedingTester and EntropyPoolingTester meta-rngs
    			adjust to how very large Raw values are printed, to preserve column alignment
    
    	Test Batteries:
    		core tests:
    			added a new test, mod3n
    				it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
    				and it doesn't take much resources (largely because it's set up to only test a small fraction of the data stream)
    				this one is inspired the mod3 test in jgrand
    			added a new test, TMFn
    				it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
    				and it doesn't take much resources (largely because it's set up to only test a small fraction of the data stream)
    				this one primarily targets LCGs
    			FPF
    				removed "all2" subtest
    
    		expanded tests:
    			added a new test, mod3n
    			added a new test, TMFn
    			added a new test, BirthdaySystematic
    			various other changes?
    
    	tests
    		DistC6
    			fixed bug that effected the parameterizations used in the expanded test set
    				bug caused crashes on some platforms, had little to no effect on others
    
    		BRank
    			fixed semi-minor bug (again?)
    
    		transforms
    			FirstNofM
    				bugfixes; however, I don't think this was actually used anywhere in PractRand before, so probably nothing visible
    
    		mod3n
    			based upon the mod3 test in gjrand (by Geronimo Jones)
    			but it targets 8, 16, 32, 64, 128, etc bit blocks, all simultaneously, the same way BCFN does
    			it does not find bias in a very wide variety of PRNGs, but several of the ones catches are otherwise very difficult to find bias in (flea32x1, ara32, ibaa32)
    
    		TMFn
    			intended to catch LCGs.  I may replace it with a different test intended for the same purpose in a later version.
    
    	rng sets for testing
    		adding to ease testing automation
    
  • evanhevanh Posts: 16,032
    Hmm, the one I found was coming from Gap16 tests with very specific data input - I posted a binary input file here for him to work with. It would just lock up in a busy loop due to a divide-by-zero.

    I'm not seeing Gap16 listed in the fixes.

    I'm not sure the false positives are the same ones I'm dealing with either. I found the BCFN tests, though they were most obvious in the smaller sizes, could prematurely exit at any point. I have a whole script that splits off each test case just for sequencing the retries of Practrand.

    Time to try v0.94 out I guess ...
  • evanhevanh Posts: 16,032
    edited 2019-05-27 08:49
    I notified ChrisD of both last year, March and May respectively.

    Here's the binary input data for locking up practrand - https://forums.parallax.com/discussion/comment/1432700/#Comment_1432700
    And piping command used:
    cat ran32out-s8.bin | ./RNG_test stdin -tlmin 1KB
    or
    cat ran32out-s8.bin | ./RNG_test stdin -te 1 -tf 2 -tlmin 1KB -tlmaxonly
    
  • evanhevanh Posts: 16,032
    edited 2019-05-27 09:22
    Bother, 0.94 isn't compiling cleanly for me. Had a couple of case sensitive changes required to find the needed files. This indicates he's not on Linux for his regular development. Now it's complaining about undefined references during linking. :(
    Oh, it's more filename mismatches than I first thought. I'm going to have to edit the sources ...
  • evanhevanh Posts: 16,032
    edited 2019-05-27 09:48
    Practrand 0.94 also locks up on that data file. Here's the output:
    $ cat ran32out-s8.bin | ./RNG_test stdin -te 1 -tf 2 -tlmin 1KB -tlmaxonly
    RNG_test using PractRand version 0.94
    RNG = RNG_stdin, seed = unknown
    test set = expanded, folding = extra
    
    rng=RNG_stdin, seed=unknown
    length= 1 kilobyte (2^10 bytes), time= 0.4 seconds
      no anomalies in 16 test result(s)
    
    rng=RNG_stdin, seed=unknown
    length= 2 kilobytes (2^11 bytes), time= 0.7 seconds
      no anomalies in 21 test result(s)
    
    rng=RNG_stdin, seed=unknown
    length= 4 kilobytes (2^12 bytes), time= 1.0 seconds
      no anomalies in 61 test result(s)
    
    rng=RNG_stdin, seed=unknown
    length= 8 kilobytes (2^13 bytes), time= 1.7 seconds
      Test Name                         Raw       Processed     Evaluation
      DC6-9x1Bytes-1                    R= +10.7  p =  5.9e-5   mildly suspicious
      ...and 125 test result(s) without anomalies
    
    rng=RNG_stdin, seed=unknown
    length= 16 kilobytes (2^14 bytes), time= 2.9 seconds
      Test Name                         Raw       Processed     Evaluation
      DC6-9x1Bytes-1                    R= +10.4  p =  2.9e-5   mildly suspicious
      FPF-14+6/4:(2,14-8)               R= +43.8  p =  1.5e-31    FAIL !!!       
      FPF-14+6/4:(3,14-9)               R= +24.1  p =  1.8e-15    FAIL           
      FPF-14+6/4:(4,14-10)              R= +62.3  p =  9.9e-34    FAIL !!!       
      FPF-14+6/4:all                    R= +39.5  p =  5.6e-27    FAIL !!        
      ...and 189 test result(s) without anomalies
    
    rng=RNG_stdin, seed=unknown
    length= 32 kilobytes (2^15 bytes), time= 4.4 seconds
    ^C
    
  • evanhevanh Posts: 16,032
    edited 2019-05-27 12:35
    Just trying out the -te 10 "Birthday" test. Contrary to what he says in the help text, I'm not seeing a large RAM usage, but it's certainly way more processor hungry. It's sitting just above 60% utilisation of 16 threads with only 4 cases running.

    For some reason it's also creating excessive kernel level burn too. About 40% of a core between them.

    EDIT: Oh, I just realised how much random data was flowing, I noticed this when it clocked over 1 TB for each case! So the kernel burn would have been the pipes operating.

    Okay, I've stopped it now, and don't have a clue what good the reports are. Back to the regular tests I guess.
  • I posted about the lockup issue also, and it will be fixed for 0.95. There is no work-around for that one in 0.94.
    However, it only occurs for me on small state 2-dimensionally equidistributed PRNGs, where the obvious non-randomness creates a loop in one of the statistical tests due to a calculated NAN (not a number): link
    Xoroshiro32++ (and variants we have been reviewing) are unaffected, but still need to start tlmin at 128KB (when using -te 1 -tf 2).
  • evanhevanh Posts: 16,032
    All my automation start at -tlmin 1KB. Only ever had the one issue and that was with an s8 run which was only an experiment anyway. If I hadn't done that I would never have struck the bug.

    Here's my verbatim PM and reply:
    evanh 2018-03-11 - 23:01:45
    Hi Chris,
    Dunno if you'll get this but I'll give it a go.

    I've bumped into a bug in PractRand testing code. Somehow one of my generators produced a random sequence that locks PractRand solid every time. It happens at the 32 KB mark when piping data into RNG_Test using "stdin". When using "stdin16" it occurs at the 64 KB mark. When using "stdin32" it occurs at the 128 KB mark.

    I have a 32 KB binary file of the trigger random data here for testing with. I can post it in the forum if you respond.

    I only know C but I've still managed to follow the cause of the lockup to a -NaN value being passed into sample_to_pvalue() function in tests.cpp. This -NaN is coming from a log() function in g_test() function in math.cpp. It in turn is being fed a zero value which comes from a Gap16 "const Uint64 *count_ = counts.get_array();" array back in test.cpp, which is entirely blank for some reason.

    That's as far as I got.

    scro 2018-03-12 - 11:43:20
    Really! I've never seen one of those before. I'll take a look at fixing it.

    Thanks for letting me know.

    So he did acknowledge it back then. I'm guessing he got sidetracked and forgot.

  • evanhevanh Posts: 16,032
    Looking at the sources for 0.94, the offending zero value is coming from line 1677 of tests.cpp.

    I didn't try to find how that array is supposed to get filled. It felt like a separate, earlier, stage of the testing. I wasn't that keen to understand all the workings.

  • evanh wrote: »
    So he did acknowledge it back then. I'm guessing he got sidetracked and forgot.
    If that is the same -NAN I found, then it is certainly fixed for 0.95, but I suspect other issues/improvements will delay its release.

  • evanhevanh Posts: 16,032
    Ah, your handle is Cerian Knight and that link is to ChrisD's reply to you, isn't it. So v0.95 was already well underway and the bug already corrected in v0.95 before you even reported it.

    That makes more sense now. 0.94 must have been too far along to be patched with what can be considered a low value bug fix.
  • evanhevanh Posts: 16,032
    He must throw a huge amount of testing at each release.
  • xoroshironotxoroshironot Posts: 309
    edited 2019-05-28 02:07
    Yes to all of the above. I am happy to wait for 0.95 (and beyond) if it means:
    1. More accurate and stable small state analysis.
    2. More comprehensive large state analysis.

    As far as #2 goes, the TestU01 Big Crush meta-analysis I am doing should be simple, except for the fact that some of the statistics (at least 20 of 254, some important) are skewed due to the use of approximations in the p-val calculation.
    Excluding those or using corrections, I have refined the process to the point where I see evidence of previously undisclosed flaws in many common PRNGs I have tested... so if I am the only one seeing this, then obviously more testing and validation is required.
    Alternatively, another already fairly prominent tool would find some of the same issues I see...
    I was hoping PractRand would one day grow beyond TestU01, as the former is easy to use, whereas the latter requires much more than my basic knowledge of C programming to fully take advantage of its extended features (e.g. automatic repetitions, etc.).
  • evanhevanh Posts: 16,032
    Hehe, I'm no fan of C++ either but, like most others that have used Practrand, I didn't have to know a thing about it because of the "stdin" command line option. All had to do was build the full program and then use that one feature.

    TestU01 should have come with an equivalent example build. I couldn't make sense of compiling the sources without first reading Melisa's uses. It's almost like the API had changed and they'd never updated the examples.
  • evanhevanh Posts: 16,032
    Here's the ChiSquared sorted distributions for the xom group of algorithm variants in CSV [tab] format. The cfg file contains expanded naming.
  • evanhevanh Posts: 16,032
    ... but still need to start tlmin at 128KB (when using -te 1 -tf 2).
    Ah, this seems to be new to v0.94. Things weren't looking right with the gridding since upgrading Practrand from v0.93. I decided it was time to apply my new found knowledge about sorted Bash arrays to the gridded score averages.

    Below is two lists side by side, left-hand list comes from scoring with Practrand v0.94 and the right-hand list comes from scoring with Practrand v0.93. As you see, there is a large number of <17 minimums for the v0.94 run.

    Looking at the Practrand reports themselves, these new anomalies are all coming from FPF tests. Whereas the prior situation in v0.93 was all from BCFN tests.
    Scoring summary for single iterated sampling of Xoroshiro32(16)+x0+
    Candidate	ExpMin	ExpAvg	ExpMax		Candidate	ExpMin	ExpAvg	ExpMax
    [4 8 5 7]	29	30.437	32		[4 8 5 7]	29	30.187	32
    [15 7 4 10]	28	30.187	32		[3 2 6 6]	29	30.166	32
    [3 11 2 9]	28	30.166	32		[5 2 6 8]	29	30.02	32
    [6 2 3 7]	26	30.416	32		[13 5 10 9]	29	29.916	32
    [15 7 8 3]	26	28.02	31		[15 7 4 10]	28	29.937	32
    [10 5 13 12]	25	29.729	32		[3 11 2 9]	28	29.875	32
    [14 11 13 7]	24	28.895	32		[4 8 5 9]	28	29.812	32
    [4 1 9 8]	23	27.666	32		[11 14 6 11]	28	29.604	31
    [15 6 2 10]	22	29.625	32		[3 2 6 9]	27	30.02	32
    [5 2 6 8]	16	30.229	32		[11 7 10 10]	27	29.833	32
    [3 2 6 6]	16	30.208	33		[8 9 13 6]	27	29.729	32
    [13 5 10 9]	16	29.895	32		[2 9 9 4]	27	29.25	30
    [4 8 5 9]	15	29.5	33		[9 2 14 5]	27	29.104	31
    [5 8 4 5]	15	29.458	32		[6 2 3 7]	26	30.083	32
    [2 6 15 9]	15	29.187	32		[13 5 8 10]	26	29.562	32
    [13 12 14 7]	15	27.25	31		[11 8 12 6]	26	29.541	32
    [13 5 8 10]	14	28.583	32		[3 11 14 5]	26	28.833	32
    [8 7 15 5]	13	28.416	31		[12 10 11 12]	25	29.854	32
    [13 3 14 12]	13	28.291	32		[5 8 4 5]	25	29.729	32
    [13 12 14 11]	13	28.02	32		[14 3 13 11]	25	29.583	32
    [3 11 14 5]	13	27.875	32		[10 5 13 12]	25	29.25	32
    [11 7 10 10]	12	29.27	32		[9 2 14 10]	25	29.125	32
    [14 3 13 11]	12	29.145	32		[8 7 15 5]	25	29.104	31
    [9 2 14 5]	12	27.52	32		[13 3 14 12]	25	29.041	32
    [4 1 9 6]	12	26.812	32		[15 7 8 3]	25	27.75	30
    [8 9 13 6]	10	29.791	32		[14 11 13 7]	24	28.5	31
    [3 2 6 9]	10	29.333	32		[2 6 15 9]	23	29.5	32
    [12 10 11 12]	10	29	32		[13 12 14 7]	23	28.187	31
    [11 8 12 6]	10	28.854	32		[13 12 14 11]	23	28.041	32
    [9 2 14 10]	10	28.166	32		[4 1 9 8]	23	27.687	32
    [5 8 4 11]	10	27.77	32		[4 1 9 6]	23	27.291	32
    [2 9 9 4]	10	27.437	31		[15 6 2 10]	22	29.395	32
    [11 14 6 11]	10	26.75	32		[8 7 15 6]	22	28.333	30
    [8 7 15 6]	10	25.708	31		[5 8 4 11]	22	27.854	32
    [3 2 6 12]	10	25.583	32		[3 2 6 12]	22	27.166	32
    [2 1 15 10]	10	25.333	30		[2 1 15 10]	20	25.416	29
    
  • xoroshironotxoroshironot Posts: 309
    edited 2019-06-03 01:34
    The ExpMin < 17 on FPF 32/64 is a false positive, since when the 32/64 tests were added in 0.94 they were not calibrated below that level.
    I believe even (the extremely rare) 2^17 FPF 32/64 failures should be retested with a different seed (or restarted with tlmin at 256KB, which cannot false positive, in my experience).
    There is a reply to my post on that subject: link
  • evanhevanh Posts: 16,032
    edited 2019-06-03 02:20
    Figures. I hadn't looked that close to notice they were the new tests. The BFCN situation seemed pretty similar in that the pValue was only just bumping over the fail threshold but was sparsely distributed instead.

    And guess what? Looking through the 0.94 reports now I can't see any BCFN false positives. Presumably he retuned those tests in 0.94. I probably wasn't the first to notice it.

    Amusingly, the script logic has noticed some scores are the result of BCFN tests failing but the subsequent automatic retest has confirmed they are all valid fails. I haven't eyeballed every one of these though, so I can't say with 100% certainty.

    EDIT: Huh, the BCFN retests in v0.94 are all confined to just two, of 36, candidates: [4 1 9 6] and [4 1 9 8].
  • evanhevanh Posts: 16,032
    evanh wrote: »
    The BFCN situation seemed pretty similar in that the pValue was only just bumping over the fail threshold but was sparsely distributed instead.
    The detail that I relied on was the false bump was always steep sided. It didn't have a gradual build up and fall off either side of the peak. At most only two sizes were involve, so retesting one size up was usually enough for skipping past it.
  • TonyB_TonyB_ Posts: 2,196
    edited 2019-06-05 11:33
    evanh wrote: »
    Here's the ChiSquared sorted distributions for the xom group of algorithm variants in CSV [tab] format. The cfg file contains expanded naming.

    Thanks for the results, Evan. A few comments:

    1. You've used a weighting of two for prank and one for zrank & nzrank. An equal weighting of one, which I've used in the past, does not change the best result for any +/- combination and makes relatively little difference to each top 20.

    2. I'm not convinced about your names for the different +/- options. For one thing, it is hard to identify the usually similar positive/negative pairs (e.g. +++/---).

    3. The innovation we have been studying is to add some value, call it X for now, to s0 during the same time that s0 and s1 are added, then add the latter sum rotated to the former sum. Here add can mean add or subtract. Thus PRN = rotl(+/- s1 +/- s0, d ) +/- (s0 + X).

    X has been chosen deliberately to be an even multiple of s0 so that (s0 + X) is an odd multiple of s0. We could introduce a fifth constant, e, such that s0 * e = s0 + X. e = 5 produces better distributions than e = 1 (the original xoroshiro++) so does this mean e = 3 would be somewhere in between? Or could we increase e > 5 and get better results?

    Instead of using non-obvious names, [a,b,c,d,e]+++ to [a,b,c,d,e]--- could identify the eight different +/- options, where
    [a,b,c,d,1]+++ = [a,b,c,d] ++ = original xoroshiro++ algorithm
    

    4. The best result for the new distributions you posted is this one:
    Algorithm		Candidate	pchi	zchi	nzchi	prank	zrank	nzrank
    Xoroshiro32(16)-s1+s0-x	[12 4 15 7]	9	46	159	4	2	6
    

    which could be known as xoroshiro[12,4,15,7,5]-+-
  • evanhevanh Posts: 16,032
    I presume it's important that X is a multiple of s0, ie: not a constant.

    As for the naming, that's all a bit hackish anyway. That particular edition was from before I started down the "try all combinations" path. I've just left it alone. The other seven editions, as per your example, are more long-winded naming.

    I've done a number of grid runs now, though not the Xoroshiro32(16)-s1+s0-x run yet. Have been repairing/changing some scripts in the process.
  • evanhevanh Posts: 16,032
    Oh, ah, for point #2, what do you mean by "usually similar positive/negative pairs"?
  • evanh wrote: »
    Oh, ah, for point #2, what do you mean by "usually similar positive/negative pairs"?

    Here's what I mean. Best scores for xoroshiro[a,b,c,d,5]??? are:
    ???,	a,b,c,d
    
    +++,	11,7,10,10
    ---,	11,7,10,10
    
    ++-,	13,3,14,8
    --+,	13,3,14,8
    
    +-+,	12,4,15,7
    -+-,	12,4,15,7
    
    +--,	3,2,6,8
    -++,	3,2,6,8
    
  • At some point, increasing e will decrease quality as fewer bits of s0 are used twice (in different positions) in the s0 * e addition. Is e = 5 best, or is it 3 or 7 or 9?
  • evanhevanh Posts: 16,032
    Chris recommended x5. I just followed instructions.

    Huh, that sign pairing really stands out. I've just completed the first grid run for xom10 (-++) and comparing it to the earlier xom03 (+--) it comes out remarkably close sorting order. And not just the order either, matching candidate grid score averages are close too. So, not surprisingly, whole gridding run averages are also close: 29.584 for xom03, 29.626 for xom10.
  • evanh wrote: »
    TonyB_ wrote: »
    At some point, increasing e will decrease quality as fewer bits of s0 are used twice (in different positions) in the s0 * e addition. Is e = 5 best, or is it 3 or 7 or 9?
    Chris recommended x5. I just followed instructions.

    s0 * 5 is an extra 14-bit adder for each iteration compared to just s0, therefore the double iteration in XORO32 would use 92 add/subtract bits instead of 64 now. No extra routing resources should be needed for s0 + s0 * 4 as both operands are the same.
  • xoroshironotxoroshironot Posts: 309
    edited 2019-06-10 00:38
    TonyB_ wrote: »
    evanh wrote: »
    TonyB_ wrote: »
    At some point, increasing e will decrease quality as fewer bits of s0 are used twice (in different positions) in the s0 * e addition. Is e = 5 best, or is it 3 or 7 or 9?
    Chris recommended x5. I just followed instructions.

    s0 * 5 is an extra 14-bit adder for each iteration compared to just s0, therefore the double iteration in XORO32 would use 92 add/subtract bits instead of 64 now. No extra routing resources should be needed for s0 + s0 * 4 as both operands are the same.

    I notice: s0 + s0 * 4, from a hardware implementation standpoint, would likely be nearly the same as s0 + rotl(s0, 2).

    The worth of those two (normally discarded) bits might be high, with the right constants, but equidistribution would have to be verified.

    I have not tested this idea yet in my own code (which uses two * 5s), but from a software implementation standpoint, I suspect it would likely overburden an already saturated CPU pipeline.
  • evanhevanh Posts: 16,032
    In terms of the execution stages of the pipeline, even though there is two clock cycles allotted, it really all happens in the first of the two clocks. Chip has mentioned the second execution clock is result mux'ing only - Which presumably means there is a wide mass of flops in the middle of execution. I think this was a design decision Chip made to eliminate a nightmare in the critical path analysis.
Sign In or Register to comment.