A detail: The best possible exponent average is 33.41785.
Worked out as:
+ 8 lines of 16 GB scores (aperture 9-16 bits)
+ 4 lines of 8 GB scores (aperture 5-8 bits)
+ 2 lines of 4 GB scores (aperture 3-4 bits)
+ 1 line of 2 GB scores (aperture 2 bit)
+ 1 line of 1 GB scores (aperture 1 bit)
Hmm, that wasn't right. I can't remember how I got from any tally-up to that best value. The best from that method is (34x8 + 33x4 + 32x2 + 31 + 30) / 16 = 33.0625
But the best exponent average for 32x32 grid is different.
It'll be:
+ 35x16 lines of 32 GB scores (aperture 17-32 bits)
+ 34x8 lines of 16 GB scores (aperture 9-16 bits)
+ 33x4 lines of 8 GB scores (aperture 5-8 bits)
+ 32x2 lines of 4 GB scores (aperture 3-4 bits)
+ 31x1 line of 2 GB scores (aperture 2 bit)
+ 30x1 line of 1 GB scores (aperture 1 bit)
============
1089 / 32 = 34.03125
This basically shifts the bar up by one exponent level, which means the double iterated average score came out worse than the single iterated average score.
Running full double-iterated grid tests for all non-culled candidates might not be the best use of the resources at the moment. I'd like to know:
1. Does the 32-bit distribution (pair/XORO32) vary with different LSBs for [14,2,7,5]? The ones to try are 5, 19 or 20, as these have a 2G score whereas 0 is only 256M. The C code would need a new constant to rotate the 32-bit output before incrementing the 4GB byte array (see the sketch after this list).
2. How long does each 32-bit distribution test take? About a minute? I'd prefer to see all the [31:0] pair and zero distributions before any more PractRand grid tests. Which [a,b,c,d] is closest to the ideal? Currently it's [3,2,6,5] for pair frequency but lots of candidates have not been tested yet.
3. What is the distribution for scro's generator with 32-bit output and max score of 32G? Is it a binomial?
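Regarding point 1, a minimal sketch of the rotation idea (ROT_BITS, rotl32 and the tally routine below are placeholders, not the thread's actual C code): the 32-bit XORO32 output is rotated by a fixed constant before it is used as the index into the 4GB byte array.

#include <stdint.h>

#define ROT_BITS 5u   /* candidate rotations mentioned above: 5, 19 or 20 */

static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));   /* valid for r = 1..31 */
}

/* pairs[] is the 4GB tally array, one byte per possible 32-bit output value */
static inline void tally_rotated(uint8_t *pairs, uint32_t xoro32_out)
{
    pairs[rotl32(xoro32_out, ROT_BITS)]++;
}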
Re point 2, how long would it take to do the distribution for all possible [a,b,c,d]? 84 x 16 = 1344 tests.
From memory, it's 4 minutes per test case, and I can easily run six cases concurrently, so the average is 40 seconds per case.
Is this doable? Or all of [14,2,7,x] to begin with, to see if PractRand scores and distributions continue to be in broad agreement.
Gridding finished a couple of hours ago. Not done much else - verified the rotated distribution is as you say, it comes up with identical results for every case.
I'm not sure how to do double iteration pairs at all. It seems to demand 64-bit indexes because of the 32-bit output word size. This applies equally to Chris's generator too. I've experimented with using reduced width sampling in an attempt to contain the pairing table size but this gives massive distortion to the distribution results. The distribution table expands out to larger and flatter coverage, centred on a 4^x difference between the output and sampling exponents. The table rapidly explodes as a result.
The XORO32 distribution code is just the same as the ones you've done already, with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d]. The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
... with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d].
Doesn't work/make sense. The distribution table would need to be extended from 256 entries, with only the first 10 or so filled, to a table with 32M entries and the spread filling maybe thousands of entries each side of the 16M mark. All very small tallies.
The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
Then that's no longer done with pairs. What does it measure?
... with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d].
Doesn't work/make sense. The distribution table would need to be extended from 256 entries, with only the first 10 or so filled, to a table with 32M entries and the spread filling maybe thousands of entries each side of the 16M mark. All very small tallies.
You've calculated pair and zero distributions for certain [a,b,c,d] already. I'm asking for the other [a,b,c,d] values to be tested so that we know for certain which constants produce the best distribution.
The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
Then that's no longer done with pairs. What does it measure?
It measures the frequency distribution for his generator. Is it a binomial or something else? This test is not essential, but it would be interesting to know. We don't need to use pairs because his output is 32-bit.
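To make the distinction concrete, here is a minimal sketch (the generator hooks are placeholders, not the thread's actual C code) of the two indexing schemes: for XORO32 two successive 16-bit xoroshiro32++ outputs are packed into one 32-bit index into the 4GB byte array, whereas for Chris's generator the single 32-bit output is the index directly. Which output forms the high word is an assumption here.

#include <stdint.h>

/* Placeholder generator hooks, not the real implementations */
extern uint16_t xoroshiro32pp_next(void);   /* one 16-bit output per iteration */
extern uint32_t chris_next(void);           /* Chris's single 32-bit output */

/* XORO32-style pair tally: two successive 16-bit outputs form the 32-bit index */
static void tally_xoro32_pair(uint8_t *pairs)
{
    uint32_t lo = xoroshiro32pp_next();
    uint32_t hi = xoroshiro32pp_next();
    pairs[(hi << 16) | lo]++;
}

/* Chris's generator: no pairing, the 32-bit output is the index */
static void tally_chris(uint8_t *pairs)
{
    pairs[chris_next()]++;
}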
I haven't produced results for any double iteration pairs; I thought you were wanting XORO32. I can certainly do all the single iteration Xoroshiro32++ candidates.
I assumed pairs were an important criterion for making this measurement. I'm not measuring the same thing if all I do is tally up each output occurrence.
What I'm requesting are the distribution reports for all possible xoroshiro32++[a,b,c,d]. That's 84 * 16 = 1344 pair/zero run tests. You said that each one takes 40 seconds, therefore the total time is about 15 hours. The C code doesn't have to change, but did you see the following post? http://forums.parallax.com/discussion/comment/1441480/#Comment_1441480
In summary, nothing new (very minor zero run change excepted), just more of the same.
If you feel like doing it, please also test Chris's generator, which will need a bit of alteration to the C code. His generator is deliberately biased to get a max PractRand score and is not equidistributed. If it were, with a 32-bit state and a 32-bit output, each non-zero output would occur exactly once and zero never. However, this is not the case; the actual distribution is a mystery and might not be as good as XORO32 with the best candidates.
Output is either equidistributed or not, no need to test for it. Single-iterated xoroshiro+ or xoroshiro++ is, but double-iterated is not. The pair distribution test arose from my duplicate (formerly repeat) test. For 16-bit output, the probability of a duplicate (next output same as previous) is 1/2^16. The period is 2^32-1 single iterations, therefore the expected total number of duplicates is (2^32-1)/2^16 ≈ 2^16. If the duplicates were equidistributed each output value would occur exactly once, but they are not, and a binomial frequency distribution is expected, assuming the outputs are independent random values.
If we treat a pair of outputs as the low and high words of a single 32-bit output, the probability of any particular output is 1/2^32. The period is 2^32-1 double iterations; again there is no equidistribution and the expected distribution is binomial.
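For a rough idea of what that expected distribution looks like (this sketch is mine, not from the thread): with 2^32-1 samples spread over 2^32 bins, the per-bin counts are Binomial(2^32-1, 2^-32), which is essentially Poisson(1), so the expected fraction of bins with exactly k hits is e^-1 / k!. The same shape applies to the duplicate tallies above. The pfreq naming follows the thread; everything else is assumed.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double bins = 4294967296.0;   /* 2^32 possible 32-bit pair values */
    double share = exp(-1.0);           /* Poisson(1): P(k hits) = e^-1 / k! */
    for (int k = 0; k <= 8; k++) {
        printf("pfreq%d expected ~ %.0f bins (%.2f%%)\n", k, share * bins, share * 100.0);
        share /= k + 1;                 /* advance e^-1/k! to the next k */
    }
    return 0;
}

The k = 0 and k = 1 terms each come out at about 37% of the 2^32 values, matching the pfreq0/pfreq1 figures quoted later in the thread.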
Our results prove the above mathematics. I can do the repeat tests but I don't have enough RAM for the generalized pair tests. The latter are much more accurate tests of randomness because the sample size is massively bigger.
Ah, of course, equidistribution is applicable to any data, whether it be the generator output directly, the pairs, or the duplicates. Each data type is separately evaluated for equidistribution. I hadn't really noticed the distinction between pairs and duplicates until now either.
In the case of the generator output, I gather we already know all candidates are equidistributed because we've been using the set of engine constants from the full-period culling we did yonks ago. Chip did the first run of that.
So, measuring the "binomial frequency distribution" is what the pairs are for. And duplicates also produce the same, but at less detail, right?
I've tested using s12, s13 and s14 candidates on the single iterated ++ algorithm. Automation is cycling well. Data look typical. See attached. I've changed to 7zip compression because it gives 10x higher compression with so many similar files like this!
Running the s16 candidates right now.
Thanks, Evan. It would help me sort the results if you could combine the separate .bin files into one file for each [a,b,c] starting at d = 1 (as you omit d = 0), then d = 2, etc., with filename in 8.3 format.
s8 would be interesting to see after the s16 and Chris's distributions.
Ouch, what are you using? Surely Windoze doesn't make it hard to use longer naming? Not that I'd know.
Last time I was dealing with 8.3 format, or similar, I was using cassette tapes and 3.0" floppies! No, tell a lie, I have once or twice since written industrial exe's for FAT booting equipment, but that was just a means of booting only.
Done. The numerical naming of the bin files is position dependent. I'm guessing that'll be helpful.
PS: I've not verified the order within each file. The 16 individual bin files were collated using wild cards.
PPS: Huh, as a result of now explicitly concatenating each data file by automated naming I've discovered one case of missing data in the s16 results. Must have happened during one of my interruptions or maybe I've deleted it accidentally. I'll regenerate and rezip it all ...
PPPS: Even weirder, it wasn't deleted accidentally, the test case runs but is short runtime and never generates any data nor even any log reporting. ...
EDIT: Got the logging sorted. It was another one of those console buffering issues that needed "stdbuf" mod.
The actual problem is a little more annoying: a crash due to an index (#332) beyond the zfreq[255] array limit. Also found I managed to introduce a small bug in the late stages of reengineering the C source to handle double iterating distribution testing. It incorrectly double-increments the final pair array index. ... Sadly this means I have to rerun the last couple of days' effort.
EDIT2: Crazy, it was a lot more than just a single index of 332. For this sole case, I've just generated a log file with over 378000 errors saying index is greater than 255! Might have to look into it a bit more ...
EDIT3: Oh, looking at the data now it just looks like a case of the frequency distributions really are that bad. I'll be leaving the bounds in place I guess. Will remove that logging emit though!
EDIT4: Tony, Here's what I've currently got. You can see how I've now got the final loop iteration unrolled. I'm thinking there is also one more improvement with filling the zfreq[] array. See second code snippet.
Wow, I didn't know each Epyc socket fields eight DIMM channels. That's an insane pin count! I had assumed the Threadripper parts were the same packaging as Epyc but that's now clearly not the case.
EDIT: Related to this topic, the 4 minutes to run a s16 distribution case (4 GB pairs array) is actually more like 4.5-5 minutes. Varies between 270 and 290 seconds. I can get up to 7 cases running (7x4=28 GB of randomly and rapidly accessed array space) in parallel to average that down to the same earlier estimate of 40 seconds per case. The caches must be getting thrashed big time. Quite impressive how well the Ryzen handles the workload. Not that I have any comparative measurements.
Funnily, the individual case time speeds up somewhat when there's no contention for main memory. If only a single case is running it's more like 2.5-3 minutes to completion.
To avoid wasting a lot of time renaming files, I'd prefer [a,b,c,d] to be in ASCII hex at the start of the 8.3 filename. Combining [a,b,c] into one file is not as important:
abcds16p.bin / abcds16z.bin for separate files, or abcxs16p.bin / abcxs16z.bin for combined files.
Go have another cup of coffee! Many constants are two digit. Best I could do is modify the existing short names from aabbcc.pb (and .zb) to aabbccdd.pb (and .zb) for uncombined naming.
378000 counts > 255 sounds a lot. A good s16 candidate shouldn't have any pair counts over 12 or maybe 13.
37% of the possible 2^32 output values should have a frequency of zero and 37% a frequency of one, thus pfreq0 and pfreq1 cover ¾ of the outputs.
EDIT: Zero runs exceed 255, not pair frequencies.
It's only a single candidate of the 1260 s16 cases tested in total. I wasn't too concerned in the end. It can be ignored I think. The bounding code stuffs them all into entry #255, so it's a messed-up distribution table, but it's easy to see from the rest of the entries that it's a worthless candidate.
EDIT: The candidate is [2 1 15 1]
Could you save the individual .bin files with your preferred filenames, then write a script to combine the [a,b,c] into .bin files with my preferred filenames? If not, I'll have to rename 84 combined files.
I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append to the same files.
There is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would be better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:
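A minimal sketch of what that amended scan might look like (the array name, the 4GB size and the bounding of long runs at entry 255 are assumptions based on the thread, not the actual snippet):

#include <stdint.h>

#define PAIRS_SIZE (1ULL << 32)   /* 4GB pair array, one byte per 32-bit value */

void zero_run_tally(const uint8_t *pairs, uint64_t zfreq[256])
{
    uint64_t tally = 0;                          /* length of the current zero run */

    for (uint64_t i = 0; i < PAIRS_SIZE; i++) {
        if (pairs[i] == 0) {
            tally++;
        } else if (tally != 0) {                 /* a zero run has just ended */
            zfreq[tally < 256 ? tally : 255]++;  /* long runs bounded at entry 255 */
            zfreq[0]++;                          /* zfreq[0] tracks the number of zero runs */
            tally = 0;
        }
    }
    if (tally != 0) {                            /* flush a run that ends at the last byte */
        zfreq[tally < 256 ? tally : 255]++;
        zfreq[0]++;
    }
}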
I don't know whether or not you've adopted these two points: (1) checking after the zero loop ends to see if there is a zero run that should be stored, which would otherwise be missed because the last byte of the array is zero, and (2) making zfreq[0] = number of zero runs.
Redoing both best exponent averages from earlier, this time averaging the scores themselves before taking log2:
=== Single iterated Xoroshiro32 ===
+ 128G = 8 lines of 16 GB scores (aperture 9-16 bits)
+ 32G = 4 lines of 8 GB scores (aperture 5-8 bits)
+ 8G = 2 lines of 4 GB scores (aperture 3-4 bits)
+ 2G = 1 line of 2 GB scores (aperture 2 bit)
+ 1G = 1 line of 1 GB scores (aperture 1 bit)
=============
log( 171 x 1024^3 / 16 ) / log( 2 ) = 33.41785
=== Double iterated Xoroshiro32 ===
+ 512G = 16 lines of 32 GB scores (aperture 17-32 bits)
+ 128G = 8 lines of 16 GB scores (aperture 9-16 bits)
+ 32G = 4 lines of 8 GB scores (aperture 5-8 bits)
+ 8G = 2 lines of 4 GB scores (aperture 3-4 bits)
+ 2G = 1 line of 2 GB scores (aperture 2 bit)
+ 1G = 1 line of 1 GB scores (aperture 1 bit)
=============
log( 683 x 1024^3 / 32 ) / log( 2 ) = 34.41574
Okay, so still basically raising the bar by one exponent level.
EDIT: Typo fixes
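As a quick cross-check of the two averages above, a small throwaway sketch (compile with -lm; the per-line best scores are just the tallies above restated):

#include <stdio.h>
#include <math.h>

/* best possible grid average: log2 of the mean best score (in bytes) per aperture line */
static double best_average(int lines, const double gb_per_line[])
{
    double total = 0.0;
    for (int i = 0; i < lines; i++)
        total += gb_per_line[i] * 1073741824.0;   /* GB -> bytes (1024^3) */
    return log2(total / lines);
}

int main(void)
{
    double s16[16] = {16,16,16,16,16,16,16,16, 8,8,8,8, 4,4, 2, 1};  /* single iterated */
    double s32[32];                                                  /* double iterated */
    for (int i = 0; i < 16; i++) s32[i] = 32.0;
    for (int i = 0; i < 16; i++) s32[16 + i] = s16[i];
    printf("single: %.5f\n", best_average(16, s16));   /* 33.41785 */
    printf("double: %.5f\n", best_average(32, s32));   /* 34.41574 */
    return 0;
}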