I have been so busy I have not had time to look over much of the data.
However, I did notice that Xoroshiro32(16)s1+s0- candidate [6 2 3 6] (that you mentioned) fails one of the new tests in PractRand 0.94, but only at the same point that it fails other tests (~512MB).
Still unusual, though, as others I looked at did not exhibit that behavior.
If you do decide to use 0.94, start testing at 128KB, as the occasional false-positive issue in 0.94 from 1KB-64KB should be fixed for the eventual 0.95 release, as I understand it.
I'm busy with meta-analysis of my xoroshiro128psp (which looks good on its own) in Big Crush as part of a TRNG where I sum its output with AES (counter mode) and Lehmer128. I've got 44 threads running 24/7 for the next few weeks.
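For context, Lehmer128 here means a multiplicative congruential generator with 128-bit state. A common formulation in C (my assumption as to the exact variant; the multiplier is the widely circulated 64-bit one):

#include <stdint.h>

/* 128-bit Lehmer MCG: state *= multiplier; output the high 64 bits.
   unsigned __int128 is a GCC/Clang extension. The seed must be odd. */
static unsigned __int128 lehmer_state = 0x853c49e6748fea9bULL;  /* arbitrary odd seed */

uint64_t lehmer128_next(void)
{
    lehmer_state *= 0xda942042e4dd58b5ULL;
    return (uint64_t)(lehmer_state >> 64);
}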
version 0.94
RNGs
xsm64
revised
old version was falling well short of intended quality on some seeds
xsm32
revised
efiix8x384 / efiix16x384 / efiix32x384 / efiix64x384
the large array sizes didn't really help anything, so I shrunk them down to reduce memory & cache footprints
where the old iteration table size used to be 128, it's now 32
and where the old indirection table size used to be 256, it's now 16
so the names are now efiix8x48 instead of efiix8x384, efiix16x48 instead of efiix16x384, etc
tools
RNG_test
adjusted formatting of raw outputs to better preserve columns during extreme failures
changed listed seed for "RNGs" that don't have meaningful seed
fixed off-by-1 bug causing evaluation thresholds to be slightly incorrect
added new evaluation threshold "normalish" between "normal" and "unusual"
SeedingTester and EntropyPoolingTester meta-rngs that were only supported by command line options -ttseed64 and -ttep can now be specified as part of the RNG name
RNG_output
increased buffer size (again)
fixed endianness issue (again?)
now supports SeedingTester and EntropyPoolingTester meta-rngs
adjusted how very large Raw values are printed, to preserve column alignment
Test Batteries:
core tests:
added a new test, mod3n
it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
and it doesn't take many resources (largely because it's set up to only test a small fraction of the data stream)
this one is inspired by the mod3 test in gjrand
added a new test, TMFn
it doesn't help on all that many PRNGs, but the ones it does help with it helps a lot
and it doesn't take many resources (largely because it's set up to only test a small fraction of the data stream)
this one primarily targets LCGs
FPF
removed "all2" subtest
expanded tests:
added a new test, mod3n
added a new test, TMFn
added a new test, BirthdaySystematic
various other changes?
tests
DistC6
fixed bug that affected the parameterizations used in the expanded test set
bug caused crashes on some platforms, had little to no effect on others
BRank
fixed semi-minor bug (again?)
transforms
FirstNofM
bugfixes; however, I don't think this was actually used anywhere in PractRand before, so probably nothing visible
mod3n
based upon the mod3 test in gjrand (by Geronimo Jones)
but it targets 8, 16, 32, 64, 128, etc bit blocks, all simultaneously, the same way BCFN does
it does not find bias in a very wide variety of PRNGs, but several of the ones it catches are otherwise very difficult to find bias in (flea32x1, ara32, ibaa32)
TMFn
intended to catch LCGs. I may replace it with a different test intended for the same purpose in a later version.
rng sets for testing
added to ease testing automation
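On the mod3 idea, for anyone curious: the gist of a mod-3 frequency test is to reduce fixed-size blocks of the stream modulo 3 and chi-square the residue counts against expectation. A minimal single-block-size sketch in C (illustrative only - not PractRand's actual mod3n, which runs the idea over many block sizes at once, BCFN-style):

#include <stdint.h>

/* Count 8-bit blocks by residue mod 3 and form a chi-squared statistic.
   A real test would convert this to a p-value via the framework. */
double mod3_chisq(const uint8_t *data, long n)
{
    long counts[3] = {0, 0, 0};
    for (long i = 0; i < n; i++)
        counts[data[i] % 3]++;
    /* of the 256 byte values, 86 are = 0 (mod 3) and 85 each are = 1, 2,
       so the expected frequencies are not quite uniform */
    double expect[3] = { n * (86 / 256.0), n * (85 / 256.0), n * (85 / 256.0) };
    double chi = 0.0;
    for (int r = 0; r < 3; r++) {
        double d = counts[r] - expect[r];
        chi += d * d / expect[r];
    }
    return chi;  /* ~ chi-squared, 2 degrees of freedom, if the data is random */
}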
Hmm, the one I found was coming from Gap16 tests with very specific data input - I posted a binary input file here for him to work with. It would just lock up in a busy loop due to a divide-by-zero.
I'm not seeing Gap16 listed in the fixes.
I'm not sure the false positives are the same ones I'm dealing with either. I found the BCFN tests, though they were most obvious in the smaller sizes, could prematurely exit at any point. I have a whole script that splits off each test case just for sequencing the retries of Practrand.
Time to try v0.94 out I guess ...
Here's the binary input data for locking up practrand - https://forums.parallax.com/discussion/comment/1432700/#Comment_1432700
And the piping command used:
Bother, 0.94 isn't compiling cleanly for me. A couple of case-sensitivity changes were needed to find the required files, which indicates he's not on Linux for his regular development. Now it's complaining about undefined references during linking.
Oh, it's more filename mismatches than I first thought. I'm going to have to edit the sources ...
Just trying out the -te 10 "Birthday" test. Contrary to what he says in the help text, I'm not seeing large RAM usage, but it's certainly way more processor-hungry. It's sitting just above 60% utilisation of 16 threads with only 4 cases running.
For some reason it's also creating excessive kernel-level burn - about 40% of a core between them.
EDIT: Oh, I just realised how much random data was flowing when it clocked over 1 TB for each case! So the kernel burn would have been the pipes operating.
Okay, I've stopped it now, and don't have a clue what good the reports are. Back to the regular tests I guess.
I posted about the lockup issue also, and it will be fixed for 0.95. There is no work-around for that one in 0.94.
However, it only occurs for me on small-state 2-dimensionally equidistributed PRNGs, where the obvious non-randomness creates a loop in one of the statistical tests due to a calculated NaN (not a number): link
Xoroshiro32++ (and variants we have been reviewing) are unaffected, but still need to start tlmin at 128KB (when using -te 1 -tf 2).
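For reference, a typical invocation with those options would look something like:
mygen | ./RNG_test stdin16 -tlmin 128KB -te 1 -tf 2
(where mygen is a stand-in for whatever pipes the generator's output to stdout).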
All my automation starts at -tlmin 1KB. I've only ever had the one issue, and that was with an s8 run which was only an experiment anyway. If I hadn't done that I would never have struck the bug.
Here's my verbatim PM and reply:
evanh 2018-03-11 - 23:01:45
Hi Chris,
Dunno if you'll get this but I'll give it a go.
I've bumped into a bug in PractRand testing code. Somehow one of my generators produced a random sequence that locks PractRand solid every time. It happens at the 32 KB mark when piping data into RNG_Test using "stdin". When using "stdin16" it occurs at the 64 KB mark. When using "stdin32" it occurs at the 128 KB mark.
I have a 32 KB binary file of the trigger random data here for testing with. I can post it in the forum if you respond.
I only know C but I've still managed to follow the cause of the lockup to a -NaN value being passed into sample_to_pvalue() function in tests.cpp. This -NaN is coming from a log() function in g_test() function in math.cpp. It in turn is being fed a zero value which comes from a Gap16 "const Uint64 *count_ = counts.get_array();" array back in test.cpp, which is entirely blank for some reason.
That's as far as I got.
scro 2018-03-12 - 11:43:20
Really! I've never seen one of those before. I'll take a look at fixing it.
Thanks for letting me know.
So he did acknowledge it back then. I'm guessing he got sidetracked and forgot.
Looking at the sources for 0.94, the offending zero value is coming from line 1677 of tests.cpp.
I didn't try to find how that array is supposed to get filled. It felt like a separate, earlier, stage of the testing. I wasn't that keen to understand all the workings.
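For what it's worth, the mechanism matches the classic zero-count-into-log() problem in a G-test. A minimal sketch of the statistic (illustrative - not PractRand's actual g_test() code) with the guard that would avoid the NaN:

#include <math.h>

/* G-test: G = 2 * sum over bins of O * ln(O / E).
   A naive version calls log() on an empty bin and produces -inf/NaN,
   which then poisons the downstream p-value code - the lockup
   mechanism described above. */
double g_statistic(int nbins, const double *observed, const double *expected)
{
    double g = 0.0;
    for (int i = 0; i < nbins; i++) {
        if (observed[i] > 0.0)  /* guard: the limit of O*ln(O/E) as O -> 0 is 0 */
            g += 2.0 * observed[i] * log(observed[i] / expected[i]);
    }
    return g;
}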
Ah, your handle is Cerian Knight and that link is to ChrisD's reply to you, isn't it? So v0.95 was already well underway and the bug already corrected before you even reported it.
That makes more sense now. 0.94 must have been too far along to be patched with what can be considered a low-value bug fix.
Yes to all of the above. I am happy to wait for 0.95 (and beyond) if it means:
1. More accurate and stable small state analysis.
2. More comprehensive large state analysis.
As far as #2 goes, the TestU01 Big Crush meta-analysis I am doing should be simple, except that some of the statistics (at least 20 of 254, some important) are skewed due to the use of approximations in the p-value calculation.
Excluding those or using corrections, I have refined the process to the point where I see evidence of previously undisclosed flaws in many common PRNGs I have tested... so if I am the only one seeing this, then obviously more testing and validation is required.
Alternatively, another already fairly prominent tool would find some of the same issues I see...
I was hoping PractRand would one day grow beyond TestU01, as the former is easy to use, whereas the latter requires much more than my basic knowledge of C programming to fully take advantage of its extended features (e.g. automatic repetitions, etc.).
Hehe, I'm no fan of C++ either but, like most others that have used PractRand, I didn't have to know a thing about it because of the "stdin" command line option. All I had to do was build the full program and then use that one feature.
TestU01 should have come with an equivalent example build. I couldn't make sense of compiling the sources without first reading Melissa's uses. It's almost like the API had changed and they'd never updated the examples.
... but still need to start tlmin at 128KB (when using -te 1 -tf 2).
Ah, this seems to be new to v0.94. Things haven't looked right with the gridding since upgrading PractRand from v0.93. I decided it was time to apply my new-found knowledge of sorted Bash arrays to the gridded score averages.
Below are two lists side by side: the left-hand list comes from scoring with PractRand v0.94 and the right-hand list from scoring with PractRand v0.93. As you can see, there are a large number of <17 minimums in the v0.94 run.
Looking at the PractRand reports themselves, these new anomalies are all coming from FPF tests, whereas the prior situation in v0.93 was all from BCFN tests.
The ExpMin < 17 on FPF 32/64 is a false positive, since when the 32/64 tests were added in 0.94 they were not calibrated below that level.
I believe even (the extremely rare) 2^17 FPF 32/64 failures should be retested with a different seed (or restarted with tlmin at 256KB, which cannot false positive, in my experience).
There is a reply to my post on that subject: link
Figures. I hadn't looked closely enough to notice they were the new tests. The BCFN situation seemed pretty similar in that the pValue was only just bumping over the fail threshold but was sparsely distributed instead.
And guess what? Looking through the 0.94 reports now I can't see any BCFN false positives. Presumably he retuned those tests in 0.94. I probably wasn't the first to notice it.
Amusingly, the script logic has noticed some scores are the result of BCFN tests failing but the subsequent automatic retest has confirmed they are all valid fails. I haven't eyeballed every one of these though, so I can't say with 100% certainty.
EDIT: Huh, the BCFN retests in v0.94 are all confined to just two of the 36 candidates: [4 1 9 6] and [4 1 9 8].
The detail that I relied on was that the false bump was always steep-sided. It didn't have a gradual build-up and fall-off on either side of the peak. At most only two sizes were involved, so retesting one size up was usually enough to skip past it.
Here are the ChiSquared-sorted distributions for the xom group of algorithm variants in CSV [tab] format. The cfg file contains the expanded naming.
Thanks for the results, Evan. A few comments:
1. You've used a weighting of two for prank and one for zrank & nzrank. An equal weighting of one, which I've used in the past, does not change the best result for any +/- combination and makes relatively little difference to each top 20.
2. I'm not convinced about your names for the different +/- options. For one thing, it is hard to identify the usually similar positive/negative pairs (e.g. +++/---).
3. The innovation we have been studying is to add some value, call it X for now, to s0 at the same time that s0 and s1 are added, then add the latter sum, rotated, to the former sum. Here "add" can mean add or subtract. Thus PRN = rotl(+/- s1 +/- s0, d) +/- (s0 + X).
X has been chosen deliberately to be an even multiple of s0 so that (s0 + X) is an odd multiple of s0. We could introduce a fifth constant, e, such that s0 * e = s0 + X (so e = 5 means X = 4 * s0). e = 5 produces better distributions than e = 1 (the original xoroshiro++), so does this mean e = 3 would be somewhere in between? Or could we increase e > 5 and get better results?
Instead of using non-obvious names, [a,b,c,d,e]+++ to [a,b,c,d,e]--- could identify the eight different +/- options, where
[a,b,c,d,1]+++ = [a,b,c,d]++ = original xoroshiro++ algorithm
4. The best result for the new distributions you posted is this one:
which could be known as xoroshiro[12,4,15,7,5]-+-
I presume it's important that X is a multiple of s0, i.e. not a constant.
As for the naming, that's all a bit hackish anyway. That particular edition was from before I started down the "try all combinations" path. I've just left it alone. The other seven editions, as per your example, use the more long-winded naming.
I've done a number of grid runs now, though not the Xoroshiro32(16)-s1+s0-x run yet. Have been repairing/changing some scripts in the process.
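To pin down the algorithm being gridded, here it is as a minimal C sketch - one sign option only (all plus), with placeholder constants rather than a vetted candidate; [a,b,c,d,1]+++ reduces to the original xoroshiro32++:

#include <stdint.h>

#define A 13  /* placeholder engine constants a, b, c             */
#define B 5
#define C 10
#define D 9   /* output rotation d                                */
#define E 5   /* odd output multiplier e; 1 gives the original ++ */

static uint16_t s0 = 1, s1 = 0;  /* any seed other than all zeros */

static uint16_t rotl16(uint16_t x, int k)  /* k in 1..15 */
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

uint16_t next16(void)
{
    /* PRN = rotl(s1 + s0, d) + s0 * e */
    uint16_t result = (uint16_t)(rotl16((uint16_t)(s1 + s0), D)
                                 + (uint16_t)(s0 * E));
    s1 ^= s0;  /* standard xoroshiro state engine */
    s0 = (uint16_t)(rotl16(s0, A) ^ s1 ^ (uint16_t)(s1 << B));
    s1 = rotl16(s1, C);
    return result;
}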
At some point, increasing e will decrease quality as fewer bits of s0 are used twice (in different positions) in the s0 * e addition. Is e = 5 best, or is it 3 or 7 or 9?
Chris recommended x5. I just followed instructions.
Here's what I mean. Best scores for xoroshiro[a,b,c,d,5]??? are:
Huh, that sign pairing really stands out. I've just completed the first grid run for xom10 (-++) and, comparing it to the earlier xom03 (+--), it comes out with a remarkably close sorting order. And not just the order either; matching candidate grid score averages are close too. So, not surprisingly, the whole gridding run averages are also close: 29.584 for xom03, 29.626 for xom10.
s0 * 5 needs an extra 14-bit adder for each iteration compared to just s0 (the low two bits of s0 * 4 are zero, so only the upper 14 bits take part in the add), therefore the double iteration in XORO32 would now use 92 add/subtract bits instead of 64. No extra routing resources should be needed for s0 + s0 * 4 as both operands are the same.
I notice: s0 + s0 * 4, from a hardware implementation standpoint, would likely be nearly the same as s0 + rotl(s0, 2).
The worth of those two (normally discarded) bits might be high, with the right constants, but equidistribution would have to be verified.
I have not tested this idea yet in my own code (which uses two * 5s), but from a software implementation standpoint, I suspect it would likely overburden an already saturated CPU pipeline.
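To make that near-equivalence concrete, a quick standalone check with an arbitrary test value:

#include <stdint.h>
#include <stdio.h>

static uint16_t rotl16(uint16_t x, int k)
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

int main(void)
{
    uint16_t s0  = 0xC123;                          /* top two bits set        */
    uint16_t mul = (uint16_t)(s0 * 5);              /* = s0 + (s0 << 2): the   */
                                                    /* top two bits fall off   */
    uint16_t rot = (uint16_t)(s0 + rotl16(s0, 2));  /* wraps those bits around */
    printf("s0*5          = 0x%04X\n", mul);
    printf("s0+rotl(s0,2) = 0x%04X\n", rot);        /* differs from mul only by
                                                       the two wrapped-in bits */
    return 0;
}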
In terms of the execution stages of the pipeline, even though there are two clock cycles allotted, it really all happens in the first of the two clocks. Chip has mentioned the second execution clock is result mux'ing only, which presumably means there is a wide mass of flops in the middle of execution. I think this was a design decision Chip made to eliminate a nightmare in the critical path analysis.