A detail: The best possible exponent average is 33.41785.
Worked out as:
+ 8 lines of 16 GB scores (aperture 9-16 bits)
+ 4 lines of 8 GB scores (aperture 5-8 bits)
+ 2 lines of 4 GB scores (aperture 3-4 bits)
+ 1 line of 2 GB scores (aperture 2 bit)
+ 1 line of 1 GB scores (aperture 1 bit)
Hmm, that wasn't right. I can't remember how I got from any tally-up to that best value. The best from that method is (34x8 + 33x4 + 32x2 + 31 + 30) / 16 = 33.0625
But the best exponent average for 32x32 grid is different.
It'll be:
+ 35x16 lines of 32 GB scores (aperture 17-32 bits)
+ 34x8 lines of 16 GB scores (aperture 9-16 bits)
+ 33x4 lines of 8 GB scores (aperture 5-8 bits)
+ 32x2 lines of 4 GB scores (aperture 3-4 bits)
+ 31x1 line of 2 GB scores (aperture 2 bit)
+ 30x1 line of 1 GB scores (aperture 1 bit)
============
1089 / 32 = 34.03125
This basically shifts the bar up by one exponent level, which means the double iterated average score came out worse than the single iterated average score.
Running full double-iterated grid tests for all non-culled candidates might not be the best use of the resources at the moment. I'd like to know:
1. Does the 32-bit distribution (pair/XORO32) vary with different LSBs for [14,2,7,5]? The ones to try are 5, 19 or 20, as these have a 2G score whereas 0 is only 256M. The C code would need a new constant to rotate the 32-bit output before incrementing the 4GB byte array (see the sketch after this list).
2. How long does each 32-bit distribution test take? About a minute? I'd prefer to see all the [31:0] pair and zero distributions before any more PractRand grid tests. Which [a,b,c,d] is closest to the ideal? Currently it's [3,2,6,5] for pair frequency but lots of candidates have not been tested yet.
3. What is the distribution for scro's generator with 32-bit output and max score of 32G? Is it a binomial?
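Regarding point 1, a minimal sketch of the rotation idea (ROT_BITS, rotl32 and the tally routine below are placeholders, not the thread's actual C code): the 32-bit XORO32 output is rotated by a fixed constant before it is used as the index into the 4GB byte array.

#include <stdint.h>

#define ROT_BITS 5u   /* candidate rotations mentioned above: 5, 19 or 20 */

static inline uint32_t rotl32(uint32_t x, unsigned r)
{
    return (x << r) | (x >> (32u - r));   /* valid for r = 1..31 */
}

/* pairs[] is the 4GB tally array, one byte per possible 32-bit output value */
static inline void tally_rotated(uint8_t *pairs, uint32_t xoro32_out)
{
    pairs[rotl32(xoro32_out, ROT_BITS)]++;
}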
Re point 2, how long would it take to do the distribution for all possible [a,b,c,d]? 84 x 16 = 1344 tests.
From memory, it's 4 minutes per test case, and I can easily run six cases concurrently, so the average is 40 seconds per case.
Is this doable? Or all of [14,2,7,x] to begin with, to see if PractRand scores and distributions continue to be in broad agreement.
Gridding finished a couple of hours ago. Not done much else - verified the rotated distribution is as you say, it comes up with identical results for every case.
I'm not sure how to do double iteration pairs at all. It seems to demand 64-bit indexes because of the 32-bit output word size. This applies equally to Chris's generator too. I've experimented with using reduced width sampling in an attempt to contain the pairing table size but this gives massive distortion to the distribution results. The distribution table expands out to larger and flatter coverage, centred on a 4^x difference between the output and sampling exponents. The table rapidly explodes as a result.
The XORO32 distribution code is just the same as the ones you've done already, with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d]. The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
... with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d].
Doesn't work/make sense. The distribution table would need to be extended from 256 entries, with only the first 10 or so filled, to a table with 32M entries and the spread filling maybe thousands of entries each side of the 16M mark. All very small tallies.
The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
Then that's no longer done with pairs. What does it measure?
... with pairs of 16-bit outputs, so it's a case of running tests for all possible [a,b,c,d].
Doesn't work/make sense. The distribution table would need to be extended from 256 entries, with only the first 10 or so filled, to a table with 32M entries and the spread filling maybe thousands of entries each side of the 16M mark. All very small tallies.
You've calculated pair and zero distributions for certain [a,b,c,d] already. I'm asking for the other [a,b,c,d] values to be tested so that we know for certain which constants produce the best distribution.
The distribution for Chris's generator can be found by replacing the two successive xoroshiro32++ outputs with his single 32-bit output for the 4GB array index.
Then that's no longer done with pairs. What does it measure?
It measures the frequency distribution for his generator. Is it a binomial or something else? This test is not essential, but it would be interesting to know. We don't need to use pairs because his output is 32-bit.
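To make the distinction concrete, here is a minimal sketch (the generator hooks are placeholders, not the thread's actual C code) of the two indexing schemes: for XORO32 two successive 16-bit xoroshiro32++ outputs are packed into one 32-bit index into the 4GB byte array, whereas for Chris's generator the single 32-bit output is the index directly. Which output forms the high word is an assumption here.

#include <stdint.h>

/* Placeholder generator hooks, not the real implementations */
extern uint16_t xoroshiro32pp_next(void);   /* one 16-bit output per iteration */
extern uint32_t chris_next(void);           /* Chris's single 32-bit output */

/* XORO32-style pair tally: two successive 16-bit outputs form the 32-bit index */
static void tally_xoro32_pair(uint8_t *pairs)
{
    uint32_t lo = xoroshiro32pp_next();
    uint32_t hi = xoroshiro32pp_next();
    pairs[(hi << 16) | lo]++;
}

/* Chris's generator: no pairing, the 32-bit output is the index */
static void tally_chris(uint8_t *pairs)
{
    pairs[chris_next()]++;
}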
I haven't produced results for any double iteration pairs; I thought you were wanting XORO32. I can certainly do all the single iteration Xoroshiro32++ candidates.
I assumed pairs were an important criterion for making this measurement. I'm not measuring the same thing if all I do is tally up each output occurrence.
What I'm requesting are the distribution reports for all possible xoroshiro32++[a,b,c,d]. That's 84 * 16 = 1344 pair/zero run tests. You said that each one takes 40 seconds, therefore the total time is about 15 hours. The C code doesn't have to change, but did you see the following post? http://forums.parallax.com/discussion/comment/1441480/#Comment_1441480
In summary, nothing new (very minor zero run change excepted), just more of the same.
If you feel like doing it, please also test Chris's generator, which will need a bit of alteration to the C code. His generator is deliberately biased to get a max PractRand score and is not equidistributed. If it were, with a 32-bit state and a 32-bit output, each non-zero output would occur exactly once and zero never. However, this is not the case; the actual distribution is a mystery and might not be as good as XORO32 with the best candidates.
Output is either equidistributed or not, no need to test for it. Single-iterated xoroshiro+ or xoroshiro++ is, but double-iterated is not. The pair distribution test arose from my duplicate (formerly repeat) test. For 16-bit output, the probability of a duplicate (next output same as previous) is 1/2^16. The period is 2^32-1 single iterations, therefore the expected total number of duplicates is (2^32-1)/2^16 ≈ 2^16. If the duplicates were equidistributed each output value would occur exactly once, but they are not, and a binomial frequency distribution is expected, assuming the outputs are independent random values.
If we treat a pair of outputs as the low and high words of a single 32-bit output, the probability of any particular output is 1/2^32. The period is 2^32-1 double iterations; again there is no equidistribution and the expected distribution is binomial.
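For a rough idea of what that expected distribution looks like (this sketch is mine, not from the thread): with 2^32-1 samples spread over 2^32 bins, the per-bin counts are Binomial(2^32-1, 2^-32), which is essentially Poisson(1), so the expected fraction of bins with exactly k hits is e^-1 / k!. The same shape applies to the duplicate tallies above. The pfreq naming follows the thread; everything else is assumed.

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double bins = 4294967296.0;   /* 2^32 possible 32-bit pair values */
    double share = exp(-1.0);           /* Poisson(1): P(k hits) = e^-1 / k! */
    for (int k = 0; k <= 8; k++) {
        printf("pfreq%d expected ~ %.0f bins (%.2f%%)\n", k, share * bins, share * 100.0);
        share /= k + 1;                 /* advance e^-1/k! to the next k */
    }
    return 0;
}

The k = 0 and k = 1 terms each come out at about 37% of the 2^32 values, matching the pfreq0/pfreq1 figures quoted later in the thread.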
Our results prove the above mathematics. I can do the repeat tests but I don't have enough RAM for the generalized pair tests. The latter are much more accurate tests of randomness because the sample size is massively bigger.
Ah, of course, equidistribution is applicable to any data, whether it be the generator output directly, the pairs, or the duplicates. Each data type is separately evaluated for equidistribution. I hadn't really noticed the distinction between pairs and duplicates until now either.
In the case of the generator output, I gather we already know all candidates are equidistributed because we've been using the set of engine constants from the full-period culling we did yonks ago. Chip did the first run of that.
So, measuring the "binomial frequency distribution" is what the pairs are for. And duplicates also produce the same, but at less detail, right?
I've tested using s12, s13 and s14 candidates on the single iterated ++ algorithm. Automation is cycling well. Data look typical. See attached. I've changed to 7zip compression because it gives 10x higher compression with so many similar files like this!
Running the s16 candidates right now.
Thanks, Evan. It would help me sort the results if you could combine the separate .bin files into one file for each [a,b,c] starting at d = 1 (as you omit d = 0), then d = 2, etc., with filename in 8.3 format.
s8 would be interesting to see after the s16 and Chris's distributions.
Ouch, what are you using? Surely Windoze doesn't make it hard to use longer naming? Not that I'd know.
Last time I was dealing with 8.3 format, or similar, I was using cassette tapes and 3.0" floppies! No, tell a lie, I have once or twice since written industrial exe's for FAT booting equipment, but that was just a means of booting only.
Done. The numerical naming of the bin files is position dependent. I'm guessing that'll be helpful.
PS: I've not verified the order within each file. The 16 individual bin files were collated using wild cards.
PPS: Huh, as a result of now explicitly concatenating each data file by automated naming I've discovered one case of missing data in the s16 results. Must have happened during one of my interruptions or maybe I've deleted it accidentally. I'll regenerate and rezip it all ...
PPPS: Even weirder, it wasn't deleted accidentally, the test case runs but is short runtime and never generates any data nor even any log reporting. ...
EDIT: Got the logging sorted. It was another one of those console buffering issues that needed "stdbuf" mod.
The actual problem is a little more annoying: a crash due to an index (#332) beyond the zfreq[255] array limit. Also found I managed to introduce a small bug in the late stages of reengineering the C source to handle double iterating distribution testing. It incorrectly double-increments the final pair array index. ... Sadly this means I have to rerun the last couple of days' effort.
EDIT2: Crazy, it was a lot more than just a single index of 332. For this sole case, I've just generated a log file with over 378000 errors saying index is greater than 255! Might have to look into it a bit more ...
EDIT3: Oh, looking at the data now it just looks like a case of the frequency distributions really are that bad. I'll be leaving the bounds in place I guess. Will remove that logging emit though!
EDIT4: Tony, Here's what I've currently got. You can see how I've now got the final loop iteration unrolled. I'm thinking there is also one more improvement with filling the zfreq[] array. See second code snippet.
Wow, I didn't know each Epyc socket fields eight DIMM channels. That's an insane pin count! I had assumed the Threadripper parts were the same packaging as Epyc but that's now clearly not the case.
EDIT: Related to this topic, the 4 minutes to run a s16 distribution case (4 GB pairs array) is actually more like 4.5-5 minutes. Varies between 270 and 290 seconds. I can get up to 7 cases running (7x4=28 GB of randomly and rapidly accessed array space) in parallel to average that down to the same earlier estimate of 40 seconds per case. The caches must be getting thrashed big time. Quite impressive how well the Ryzen handles the workload. Not that I have any comparative measurements.
Funnily, the individual case time speeds up somewhat when there's no contention for main memory. If only a single case is running it's more like 2.5-3 minutes to completion.
To avoid wasting a lot of time renaming files, I'd prefer [a,b,c,d] to be in ASCII hex at the start of the 8.3 filename. Combining [a,b,c] into one file is not as important:
abcds16p.bin / abcds16z.bin for separate files, or abcxs16p.bin / abcxs16z.bin for combined files.
Go have another cup of coffee! Many constants are two digit. Best I could do is modify the existing short names from aabbcc.pb (and .zb) to aabbccdd.pb (and .zb) for uncombined naming.
378000 counts > 255 sounds a lot. A good s16 candidate shouldn't have any pair counts over 12 or maybe 13.
37% of the possible 2^32 output values should have a frequency of zero and 37% a frequency of one, thus pfreq0 and pfreq1 cover ¾ of the outputs.
EDIT: Zero runs exceed 255, not pair frequencies.
It's only a single candidate of the 1260 s16 cases tested in total. I wasn't too concerned in the end. It can be ignored I think. The bounding code stuffs them all into entry #255, so it's a messed-up distribution table, but it's easy to see from the rest of the entries that it's a worthless candidate.
EDIT: The candidate is [2 1 15 1]
Could you save the individual .bin files with your preferred filenames, then write a script to combine the [a,b,c] into .bin files with my preferred filenames? If not, I'll have to rename 84 combined files.
I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append to the same files.
There is a small error in the above code. If the last byte of the 4GB pair array is zero, then the last zero run will not be written to the zfreq array. Also, it would be better if zfreq[0] held the number of zero runs, i.e. it is incremented whenever zfreq[tally] is incremented. I think the amended code could be:
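A minimal sketch of what that amended scan might look like (the array name, the 4GB size and the bounding of long runs at entry 255 are assumptions based on the thread, not the actual snippet):

#include <stdint.h>

#define PAIRS_SIZE (1ULL << 32)   /* 4GB pair array, one byte per 32-bit value */

void zero_run_tally(const uint8_t *pairs, uint64_t zfreq[256])
{
    uint64_t tally = 0;                          /* length of the current zero run */

    for (uint64_t i = 0; i < PAIRS_SIZE; i++) {
        if (pairs[i] == 0) {
            tally++;
        } else if (tally != 0) {                 /* a zero run has just ended */
            zfreq[tally < 256 ? tally : 255]++;  /* long runs bounded at entry 255 */
            zfreq[0]++;                          /* zfreq[0] tracks the number of zero runs */
            tally = 0;
        }
    }
    if (tally != 0) {                            /* flush a run that ends at the last byte */
        zfreq[tally < 256 ? tally : 255]++;
        zfreq[0]++;
    }
}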
I don't know whether or not you've adopted these two points: (1) checking after the zero loop ends to see if there is a zero run that should be stored, which would otherwise be missed because the last byte of the array is zero, and (2) making zfreq[0] = number of zero runs.
Redoing both best exponent averages from earlier, this time averaging the scores themselves before taking log2:
=== Single iterated Xoroshiro32 ===
+ 128G = 8 lines of 16 GB scores (aperture 9-16 bits)
+ 32G = 4 lines of 8 GB scores (aperture 5-8 bits)
+ 8G = 2 lines of 4 GB scores (aperture 3-4 bits)
+ 2G = 1 line of 2 GB scores (aperture 2 bit)
+ 1G = 1 line of 1 GB scores (aperture 1 bit)
=============
log( 171 x 1024^3 / 16 ) / log( 2 ) = 33.41785
=== Double iterated Xoroshiro32 ===
+ 512G = 16 lines of 32 GB scores (aperture 17-32 bits)
+ 128G = 8 lines of 16 GB scores (aperture 9-16 bits)
+ 32G = 4 lines of 8 GB scores (aperture 5-8 bits)
+ 8G = 2 lines of 4 GB scores (aperture 3-4 bits)
+ 2G = 1 line of 2 GB scores (aperture 2 bit)
+ 1G = 1 line of 1 GB scores (aperture 1 bit)
=============
log( 683 x 1024^3 / 32 ) / log( 2 ) = 34.41574
Okay, so still basically raising the bar by one exponent level.
EDIT: Typo fixes
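As a quick cross-check of the two averages above, a small throwaway sketch (compile with -lm; the per-line best scores are just the tallies above restated):

#include <stdio.h>
#include <math.h>

/* best possible grid average: log2 of the mean best score (in bytes) per aperture line */
static double best_average(int lines, const double gb_per_line[])
{
    double total = 0.0;
    for (int i = 0; i < lines; i++)
        total += gb_per_line[i] * 1073741824.0;   /* GB -> bytes (1024^3) */
    return log2(total / lines);
}

int main(void)
{
    double s16[16] = {16,16,16,16,16,16,16,16, 8,8,8,8, 4,4, 2, 1};  /* single iterated */
    double s32[32];                                                  /* double iterated */
    for (int i = 0; i < 16; i++) s32[i] = 32.0;
    for (int i = 0; i < 16; i++) s32[16 + i] = s16[i];
    printf("single: %.5f\n", best_average(16, s16));   /* 33.41785 */
    printf("double: %.5f\n", best_average(32, s32));   /* 34.41574 */
    return 0;
}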