Rather than one word constant, I think the 64K outputs with both words the same would be better, as it would test the immediate effect of the xor and bit reversal in all of these outputs.
I see the wisdom in this approach, as it might be useful for vetting constants much faster (and perhaps more reliably), as compared to running PractRand and BigCrush first.
This will test 192GB of output out to ~12 double iterated streams, which I can normalize against the following table, which I would like you to verify:
Total Values with Zero Count of [high 16 bits = low 16 bits]
Stream All 64K
11580670878241192581732119887732140940683267478792744120252899798544261067208816373927634608144548222953197981019578331172054112265180
Assuming the above is correct, here is the results for 'n ^ AD65h/RevBits(n)', as a goodbad example (data here is incorrect, will create a post with good data later):
Stream Avg. of 6 Runs Diff
124126728886933278124120645449761641761182209801030112112111300
Edit: Fixed double-iteration formula, and also notice that AD65h and its bit reverse, A6B5h, are both prime (but not sure of implications). Edit2: There was a correlation issue with my seeding/re-seeding, which somehow managed to make all of the above AD65h results closer to theoretical than they should be.
I will post correct results in response to Tony's request below.
Thanks for the results, I'll attempt to verify. I've been preoccupied with something else recently, but rest assured I haven't lost any interest. Are the results significantly worse if no bit reversal, nor anything else, is applied to the high word?
Attached file (.txt as .csv is not allowed) containing processed results consisting of 2 trials of 16 runs each for several variants on ^AD65h and ^A6B5h, where the total non-occurring value results were summed in each trial for each successive stream, then averaged, and finally subtracted from the theoretical value to create a difference result (which should approach zero).
The three variants tested (for each first iteration xor constant) were only different in how the second iteration value was processed: no-operation [NOP(xororshiro++)], rotate left 1 bit [ROL(xororshiro++,1)], and reverse bits [REVBITS(xororshiro++)]
Larger differences in the first several streams might be tolerable for a PRNG of much larger period (that did not have to satisfy equidistribution requirements in 65536 16GB streams), but (average of 16 run) differences greater than ~15 in the 6th stream or later would be an issue for any PRNG of larger period.
Edit: Keep in mind that the NOP results are not just obviously worse, but also possibly a side-effect indicated by the early GAP fail and FPF issues with PractRand. Edit2: There is a small error in the expected zeros and diff results in the attachment. Streams 1-6 (only) are offset by 10, 8, 4, 2, 0 and 1, respectively, which only minimally impacts the comparison of results.
This will test 192GB of output out to ~12 double iterated streams, which I can normalize against the following table, which I would like you to verify:
Total Values with Zero Count of [high 16 bits = low 16 bits]
Stream All 64K
11580670878241192581732119887732140940683267478792744120252899798544261067208816373927634608144548222953197981019578331172054112265180
I think the expected values above are not quite right for what I call pfreq0, the frequency of pair outputs that occur 0 times. If possible outputs = N and stream = S, then pfreq0 = pfreq1 = N/e^S. Here are my values to the nearest integer:
Stream All 64K
11580030169241092886933263412005442616276082298103111120
As the current emphasis is on the 64K duplicates, I've listed only the Stream 1 expected for All and the actual value for xoroshiro32++[13,5,10,9] = 1580109068, which is better than 99.99% accurate.
I think the expected values above are not quite right for what I call pfreq0, the frequency of pair outputs that occur 0 times...
Thanks for finding that. I replicated the issue... I typed e from memory and accidentally transposed two pairs of digits.
Next time I will just type 'Exp(1)', as reciting e is a dyslexic endeavor for me (and perhaps many others, I guess).
Luckily, the maximum error in my results is still small enough to accurately discern RevBits as the better choice when considering only those two constants.
I planned to write an automated search for least (perhaps weighted) differential sum of 12 (or 13 streams), considering only primes that have 7 to 9 bits set.
However, in order to establish automated vetting criteria (which will be used to expedite the process), I need to create some more reference data using a good PRNG (that is beyond reproach).
I planned to write an automated search for least (perhaps weighted) differential sum of 12 (or 13 streams), considering only primes that have 7 to 9 bits set.
However, in order to establish automated vetting criteria (which will be used to expedite the process), I need to create some more reference data using a good PRNG (that is beyond reproach).
Although it seems logical to consider primes, I am unaware of any proof that primes are directly applicable to this scenario. Prime, or not, bit pattern distribution is certainly relevant.
Additionally, a 'good' PRNG will have a much greater StdDev of trial stream differences, so is not very useful with XOROACC due to the relatively short protraction of the 16/17 bits of state extension.
Therefore, I have written a search for odd xor values that produce an absolute difference (average of 16 trials) sum of less than 80 (which is the current absolute difference sum for AD65h).
I will take the best of those and look at some basic PractRand and gjrand stats (since they are quick to produce), before considering deeper PractRand and BigCrush analysis.
Tony, have a look at the attachment, which is a fairly comprehensive analysis of 'xoroshiro32pp[13,5,10,9]^AD65h,Revbits(xoroshiro32pp[13,5,10,9])'.
In comparison to all of the other variations looked at, this is the best so far.
Let me know if you have any questions about the data, or if any further information is required.
Thanks for these latest results.
Re the ideal vs. actual pfreq0 distributions for the 64K duplicates (both words of 32-bit value the same), I tested xoroshiro64++[26,9,13,17] just to see how it behaved and here are the results for a single run with initial state = 1:
Note that a "stream" for xoroshiro64++ is a period of 2^32-1 iterations to match that for xoroshiro32. (It makes no difference in this case if 2^32 is used instead.) I'm hoping that using the amended ideal values I posted your averaged results for xoroshiro32++a/pp will improve a little.
Being able to record all the outputs to see how many iterations it takes for each of them to occur at least once would be more accurate than the duplicates and it would also give us the distributions of zero and non-zero runs (zfreq and nzfreq). 512MB would be enough to record all 2^32 outputs at one bit per output, but I still couldn't do it because that's all the RAM there is in my current PC.
A Chi-Square sum of (Actual - Expected)^2 / Expected for streams 1-11 would tell us the "goodness of fit" to the ideal pfreq0 distribution, the lower the better.
Here is a more accurate fpreq0 for the above AD65h (in the attachment), based on 112 independent 13 sequential stream trials:
1241372889233265412005448616276082298103112121130
Slight bias in streams 1, 2 and 5 is obvious, and greater than expected given the nature of this type of PRNG... however, I am in the process of trying hundreds of odd values, and have found nothing obviously better as yet, but am exploring a few contenders... I am fairly sure they are all prime, but need to double-check. Edit: 5B69h is NOT PRIME, and by initial inspection of pfreq0, PractRand and gjrand 32GB looks better than AD65!
I can likely perform a full test in 64-bit mode, in order to quickly allocate a full count array. Byte should be fine, I guess. Edit2: Code written (attached... I really dislike writing in C++, but I cannot argue with the performance) and some tests already done.
5B69h is no good, as it takes 27 streams to complete, instead of the expected 23. AD65h is still the top contender.
Now looking at some other prime/non-prime values of seemingly good merit (i.e. low chi-square sum).
Will post some data in the next several days.
Here is pfreq0 w/chi-square sum data on '^AD65h,RevBits', and 'Xoroshiro128** High 32-Bits', for comparison (single test, randomly seeded, scroll to bottom of table for sums):
Stream Ideal XOROACC Chi- Xoroshiro128** Chi-
04294967296 ^AD65h,RevBits Sqr High Bits Sqr
1158003016915799966150.7115800473600.1925812606155812388080.825812865191.1532138338302138373750.062138470090.81478665070786687130.17786611750.19528939262289460201.58289464391.78610646160106498551.28106491890.867391650339178810.4839165540.008144080114411300.0814421421.2595300415297800.135307791.03101949911944941.271952800.431171733714850.86722073.131226389262660.58267224.2013970895024.3798542.1914357134335.3636300.9615131412364.6112305.35164834453.044531.90171781504.351750.041865551.66650.001924210.39250.0420980.08110.5221330.0262.3122120.5432.7123010.7125.5224000.1600.16
Sum = 33.30 Sum = 36.73
By contrast, the simple variants using '^1,rol1', '+1,rol1' and '+1,revbits' all have lower pfreq0 chi-squared sums (20, 27, and 21, respectively)...
Yet PractRand and gjrand agree that the simpler variants are significantly inferior to '^AD65h,RevBits' at 32GB (in multiple ways), which is only 2 streams.
The dozens of the other constants I have considered with any depth have not made it through the gauntlet of tests I have done without showing some obvious weakness.
Due to the vast amount of testing required to adequately explore more candidates, I should say that I believe that '^AD65h,RevBits' must be within the top 99.5%, and that any improvements would be statistically very small by comparison.
I'm hoping that using the amended ideal values I posted your averaged results for xoroshiro32++a/pp will improve a little.
I had used the correct values for '^AD65,revbits' in the last zip attachment here, but did not calculate the chi-squared values or sums in the ods file... all of the required data is there.
Being able to record all the outputs to see how many iterations it takes for each of them to occur at least once would be more accurate than the duplicates and it would also give us the distributions of zero and non-zero runs (zfreq and nzfreq).
Being able to record all the outputs to see how many iterations it takes for each of them to occur at least once would be more accurate than the duplicates and it would also give us the distributions of zero and non-zero runs (zfreq and nzfreq).
That post is a concise description of the distributions and there isn't much more I can say apart from a good distribution does not imply a good score in randomness tests. [13,5,10,9] was chosen for xoroshiro32++ as it had the best (lowest) prank+zrank+nzrank and it performed well in PractRand tests.
XOROACC should be capable of slightly better pfreq, zfreq & nzfreq distributions than xoroshiro32++ [13,5,10,9] within a single stream. In other words, users wanting random 32-bit outputs that are a bit closer to the ideal could use XOROACC for 2^32-1 double-iterations instead of XORO32. Have you compared stream 1 Chi-Square pfreq/zfreq/nzfreq values for '^AD65,revbits' with the simple variants?
Chi-Square for multiple streams is something new and I need to think whether zfreq and nzfreq are still relevant and if so how. Maybe something else should be measured, such as the interval in double-iterations between outputs that have not occurred before. Note that the highest Chi-Square values for '^AD65,revbits' and most of its total are for streams 13-17. How do the simple variants perform for streams 1-12?
That post is a concise description of the distributions and there isn't much more I can say apart from a good distribution does not imply a good score in randomness tests. [13,5,10,9] was chosen for xoroshiro32++ as it had the best (lowest) prank+zrank+nzrank and it performed well in PractRand tests.
To be more clear, I have studied Evan's code for pfreq, zfreq & nzfreq, but my ability to read other's c code is poor and I have little use for or experience with Linux/gcc other than to run TestU01 and Seba's Hamming Weight Dependency (under Windows Subsystem for Linux). Therefore, I must somehow translate his work to mathematical / statistical fundamentals (using a standard reference book), then to Visual C++ .NET, and also expand it to cover the nuances of multiple streams (both sequential and/or randomly selected). I have heavily relied on statistical analysis packages (e.g. TestU01, PractRand, gjrand, RaBiGeTe and HWD) to avoid re-inventing the wheel. However, I have seen the value of implementing focused tests such as pfreq, etc., since it serves well in fast vetting and as a reality check... especially given that the extra streams make a mess out of the normal statistical package results.
So basic questions need answered:
How does pair_frequency(0) equate to what I have done so far with counting zerosnon-occurring values per stream, as I do not see the logical relationship?
Can you provide an example description of what pfreq(1), etc. is testing detailed enough so I can code it?
Note that the highest Chi-Square values for '^AD65,revbits' and most of its total are for streams 13-17. How do the simple variants perform for streams 1-12?
Since that was a single trial, the bubbles in streams 13-17 should be just a normal statistical variance. I will run 16+ randomly selected stream sequences for that, xoroshiro128**, and simple variants, like I did for high16=low16.
However, I need to look into automated seeding under .NET (probably using Crypto APIs, since my VB6 AMLS bit fountain would take a lot of work to translate)... that will take some time to come together.
To be more clear, I have studied Evan's code for pfreq, zfreq & nzfreq, but my ability to read other's c code is poor and I have little use for or experience with Linux/gcc other than to run TestU01 and Seba's Hamming Weight Dependency (under Windows Subsystem for Linux). Therefore, I must somehow translate his work to mathematical / statistical fundamentals (using a standard reference book), then to Visual C++ .NET, and also expand it to cover the nuances of multiple streams (both sequential and/or randomly selected). I have heavily relied on statistical analysis packages (e.g. TestU01, PractRand, gjrand, RaBiGeTe and HWD) to avoid re-inventing the wheel. However, I have seen the value of implementing focused tests such as pfreq, etc., since it serves well in fast vetting and as a reality check... especially given that the extra streams make a mess out of the normal statistical package results.
So basic questions need answered:
How does pair_frequency(0) equate to what I have done so far with counting zerosnon-occurring values per stream, as I do not see the logical relationship?
Can you provide an example description of what pfreq(1), etc. is testing detailed enough so I can code it?]
Evan translated my ideas into C. As I was concerned we were being overly reliant on PractRand, I came up with the pfreq distribution as an additional, reality-based test. Chris Doty-Humphrey, the author of PractRand, suggested also looking at zero and non-zero runs, which led to zfreq and nzfreq.
Some definitions. A pair is two successive 16-bit xoroshiro/xoroacc PRN outputs concatenated in one 32-bit value. Pair frequency(n) or pfreq(n) is the number of different pairs that occur exactly n times, thus pfreq(0) is the number of pairs that occur zero times and the answer to your first question as amended is the two are the same. The expected set of values of pfreq(n) for different n is the well-known binomial distribution.
In our case, 4GB of RAM is required to find the actual pfreq(n) values, configured as an array of 2^32 bytes, called pair, say. Zero the entire pair array, then iterate xoroshiro/xoroacc for 2^32-1 double-iterations. Use each 32-bit pair output as an index into the pair array and increment that particular byte value. Now we need another array, called pfreq, which is array of 256 longs. Zero pfreq first, then for n = 0 to 2^32-1, read byte at pair(n) and use that as index into pfreq and increment that long value. At the end, we will know the pfreq distribution in its entirety and most pfreq(n) will be zero. It is possible to calculate pfreq(0) without a pfreq array, by reading the pair array bytes before writing to check whether they are zero. If so, decrement a 32-bit counter, say pfreq0, which is initially zero.
So far, no mention of zfreq nor nzfreq. The pair array will have runs of consecutive zero or non-zero bytes, e.g. pair(0) to pair(10) might all be non-zero and pair(11) zero, in which case the non-zero run length is 11. When reading all of the pair array from pair(0) to pair(2^32-1) and writing to pfreq, it is quite easy to also record the zero and non-zero runs and write these to two other arrays of 256 longs called zfreq and nzfreq. The final run, zero or non-zero, is terminated by end of data, not a non-zero or zero value. So now we have three real-world distributions which we can compare against the expected values using Chi-Square goodness of fit tests. My maths is not good to prove the expected equations for zfreq and nzfreq, but results show them to be correct.
Evan created pfreq, zfreq & nzfreq distributions for all 84 full-length [a,b,c,d] xoroshiro32++ candidates, except for a handful of complete duds from earlier PractRand tests. I calculated Chi-Square values from the distribution data, ranking each of pfreq/zfreq/nzfreq separately to give prank/zrank/nzrank and adding these using equal weighting the lowest prank+zrank+nzrank was [13,5,10,9]. Its PractRand scores were very acceptable and therefore this [a,b,c,d] was chosen for the P2 rev B silicon.
Thanks for that explanation... it should be enough to move forward.
So far, I have some preliminary pfreq0 results from 14 (only, due to RAM limit, but should suffice) independent trials of xoroshiro128** high bits, ^AD65_RevBits, +1_RevBits and ^1_Revbits.
I will run a few more variants and post an expanded data set based on new code, once it is proofed to my satisfaction.
My initial preview of pfreq0 results indicates that there is indeed greater statistical bias in streams 13-18 with the variants tested so far.
Stream 13 is actually composited of single iterated streams 25 and 26, which might indicate an endemic artifact of the short protraction of equidistribution.
The way it occurred in XORO32 with the failure in PractRand at 512MB-1GB signaled a pending issue with double-iterated pfreq (if it had been carried beyond one double-iterated stream).
With XOROACC being based on that same underlying limit, the best that can be hoped for is to optimize the e xor mask to smooth the avalanche behavior.
For now, here is the pfreq0 data for ^AD65_Revbits (14 trials): Edit: Replaced table with ods attachment.
The explicit near-perfect statistical termination at stream 23 is not something shared in common with simpler variants.
I have compared two extended statistics with 'Xoroshiro128** High 32 bits': chi-squared sum of stream averages, and average of chi-squared sums.
^AD65_RevBits : 39.65 and 43.48
Xoroshiro128**: 1.89 and 19.43
The former value ideally should approach zero with more trials, and I can only guess that ~20 is very close to the ideal normal for the latter value.
My belief is that an optimal e constant may exist which would allow XOROACC to reach a lower limit of ~30 for both statistics while maintaining the stream 23 termination... a marginal improvement for the first statistic and a meaningful improvement for the last statistic.
'+1_Revbits' has just a little above the 30/30 that I propose... but terminates too soon, decidedly at stream 22. I had discarded it early on due the excessive PractRand failures... but still a viable option if one is concerned with only pfreq0 performance on just 1 double-iterated stream, as it out-performs '^AD65_RevBits' by a considerable margin on stream 1 pfreq0 chi-square of the average of 14 streams: 0.0068 vs. 0.42 (the former value nearly ideal, and the latter value being, I believe, much below 0.05 probability).
I have compared two extended statistics with 'Xoroshiro128** High 32 bits': chi-squared sum of stream averages, and average of chi-squared sums.
^AD65_RevBits : 39.65 and 43.48
Xoroshiro128**: 1.89 and 19.43
The former value ideally should approach zero with more trials, and I can only guess that ~20 is very close to the ideal normal for the latter value.
Extrapolating from what I have learned, I tried a few +1 variants.
Here is the best (which excels at PractRand, also):
(xoroshiro32pp ^ AD65h)_RevBits(xoroshiro32pp + 1): 19.51 and 22.88
What these pfreq0 statistics (ods file attached) imply is that an individual 23 stream trial is now similar to an individual xoroshiro128** (also attached) trial, but it only takes several trials to prove that that the per-stream biases are fixed to specific stream #s in the sequence.
I tried (xoroshiro32pp + 1)_RevBits(xoroshiro32pp + 1), and the results on PractRand and pfreq0 were not a step forward.
And yes, these hybrid variants somehow maintain the same equidistribution (which is quite extraordinary).
Tony, I think this type of simple hybrid variant is worth pursuing, but want your take on the implementation complexity (as this one adds carry implementation on one of the double iterations, in addition to the xor on the other).
Edited for clarity and to add attachments. Edit2: I am currently looking at (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1), which shows NO precursor issues in PractRand at 16GB... the only variant that has ever accomplished this! Edit3: (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) is the best when completing 1 double-iterated stream, but decidedly completes pfreq0 two streams too early. I bet it would do well in BigCrush.
For now, here is the pfreq0 data for ^AD65_Revbits (14 trials): Edit: Replaced table with ods attachment.
How do initial state and initial acc differ for each of the 14 trials? Are you sure there is no overlap between any of them for at least 23 streams?
The main reason for using xoroacc is to get 2D equidistribution. I was concerned that outputting all possible outputs might take longer with our 'pseudo' 32-bit generator than with a 'proper' 32-bit or 64-bit one. However, tests have shown this concern to be completely misplaced. Taking one stream or even two fewer than the ideal is erring on the good side and I think we should not be too worried about 22 instead of a perfect 23.
Studying the results, pfreq0 in general is on the low side from stream 9 onwards and more clearly so from streams 13 to 18 or 19. There are ~500 fewer missing outputs after stream 13 compared to the ideal remainder of ~15000. Some trials have very low Chi-Squared errors for streams 1 to 9, say, during which more than 99.99% of the possible outputs occur.
Thus it seems likely that, by choosing good constants and the appropriate initial state and acc values, a single stream of 2^32-1 double-iterations could be extremely close to the ideal random distribution for a 32-bit output. I think further tests should look at the total pfreq, zfreq and nzfreq distributions for stream 1 and these results considered along with those for multi-stream pfreq0.
Re hardware implementation, there ought to be enough time for an addition of X+Y+1, but I think we should avoid too much complexity. I'm not sure what (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) means because the two acc additions are only 16-bit.
How do initial state and initial acc differ for each of the 14 trials? Are you sure there is no overlap between any of them for at least 23 streams?
Non-deterministic random seeding, so there is no guarantee of no overlap, but basic birthday spacings logic applies, so (if I calculated correctly) there is about a 3.3% chance an overlap would occur.
Re Chi-Squared sums, it seems wrong to ignore an actual non-zero value that is expected to be zero. Of course, one cannot divide by zero, but perhaps the expected value could be changed to 1? In the end, though, this would affect the total only slightly.
I am considering the fractional expectation, so the first 0 (stream 23) is actually 0.44.
Taking one stream or even two fewer than the ideal is erring on the good side and I think we should not be too worried about 22 instead of a perfect 23.
Agreed, but it is very compelling to discover that the AD65h constant I previously found though great effort by other means also happens to satisfy the 23 streams so exactly.
Thus it seems likely that, by choosing good constants and the appropriate initial state and acc values, a single stream of 2^32-1 double-iterations could be extremely close to the ideal random distribution for a 32-bit output. I think further tests should look at the total pfreq, zfreq and nzfreq distributions for stream 1 and these results considered along with those for multi-stream pfreq0.
Agreed, but striking a good balance with meshing those results with results from PractRand/BigCrush/gjrand might be problematic, as the current evidence suggests.
However, a perhaps larger issue at the moment concerns how to gather total pfreq, zfreq, and nzfreq: i.e. I am currently averaging 14 of each x statistic (takes about 2.3 hours total), admittedly generated via a seeding method that will eventually cause collisions.
Re hardware implementation, there ought to be enough time for an addition of X+Y+1, but I think we should avoid too much complexity. I'm not sure what (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) means because the two acc additions are only 16-bit.
My notation is poor (thus open to suggestion), but simply indicates that the first xoroshiro result is xor'd with AD65h (prior to accumulation), and the second result has 1 added, then a left rotation by 1 (prior to accumulation).
However, I don't think the extra complexity will be required. For instance, that particular example using the +1 with rotate looks horrible under BigCrush meta-analysis (often exceeding 3 StdDev). Think of the meta-analysis as performing a very similar function to pfreq, zfreq and nzfreq, except that it is meshing a few perhaps similar statistics, along with lots of others.
Limiting BigCrush to just 1 trial (instead of the meta-analysis 16 each of forward bits, reverse bits and byte reverse) is pointless, since most well-conceived variants I have tried pass it easily.
Thus it seems likely that, by choosing good constants and the appropriate initial state and acc values, a single stream of 2^32-1 double-iterations could be extremely close to the ideal random distribution for a 32-bit output. I think further tests should look at the total pfreq, zfreq and nzfreq distributions for stream 1 and these results considered along with those for multi-stream pfreq0.
Agreed, but striking a good balance with meshing those results with results from PractRand/BigCrush/gjrand might be problematic, as the current evidence suggests.
However, a perhaps larger issue at the moment concerns how to gather total pfreq, zfreq, and nzfreq: i.e. I am currently averaging 14 of each x statistic (takes about 2.3 hours total), admittedly generated via a seeding method that will eventually cause collisions.
I appreciate the time this all takes. pfreq, zfreq & nzfreq data could be generated at the same time. Perhaps you could test one of the 14 initially, the one with pfreq0 closest to the ideal? The fourth is 1580027099.
From testing xoroshiro32++, any pfreq0 that begins 15800... or 15799... could be considered excellent.
Re hardware implementation, there ought to be enough time for an addition of X+Y+1, but I think we should avoid too much complexity. I'm not sure what (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) means because the two acc additions are only 16-bit.
My notation is poor (thus open to suggestion), but simply indicates that the first xoroshiro result is xor'd with AD65h (prior to accumulation), and the second result has 1 added, then a left rotation by 1 (prior to accumulation).
Notation understood now, it was the 16 in rotl16 that was confusing me.
I appreciate the time this all takes. pfreq, zfreq & nzfreq data could be generated at the same time. Perhaps you could test one of the 14 initially, the one with pfreq0 closest to the ideal? The fourth is 1580027099.
I was not recording the seeds, but had already started writing code to do so and add the additional statistics.
In any case, it takes the same amount of time to do 1 trial as it does 14 (with 15 perhaps possible).
I do not like the idea of hand picking data. However, for example, with electronic analytical balances, the old 3 of 5 algorithm suffices... so perform 5 independent trials and discard the high and low for each stream #, taking the average of the remaining 3. This approach might allow vetting (15/5) 3 candidate algorithms/constants simultaneously, for a total of ~30 per day (if fully automated).
Regarding other comments: '15800... or 15799' describes all of these... it is simply the perceived expectation of greater precision on stream #1 in particular, and > #1 as a bonus. I think it is within grasp, but that remains to be seen. I used the rotl16 notation, as other bit-size rotls existing in my code (so I have to consciously be careful in constructing logic in my head).
I appreciate the time this all takes. pfreq, zfreq & nzfreq data could be generated at the same time. Perhaps you could test one of the 14 initially, the one with pfreq0 closest to the ideal? The fourth is 1580027099.
I was not recording the seeds, but had already started writing code to do so and add the additional statistics.
In any case, it takes the same amount of time to do 1 trial as it does 14 (with 15 perhaps possible).
I do not like the idea of hand picking data. However, for example, with electronic analytical balances, the old 3 of 5 algorithm suffices... so perform 5 independent trials and discard the high and low for each stream #, taking the average of the remaining 3. This approach might allow vetting (15/5) 3 candidate algorithms/constants simultaneously, for a total of ~30 per day (if fully automated).
Regarding other comments: '15800... or 15799' describes all of these... it is simply the perceived expectation of greater precision on stream #1 in particular, and > #1 as a bonus. I think it is within grasp, but that remains to be seen. I used the rotl16 notation, as other bit-size rotls existing in my code (so I have to consciously be careful in constructing logic in my head).
If 14 trials are as easy as one, then please find the 14 pfreq/zfreq/nzfreq distributions for stream 1 and calculate the Chi-Squared values. It is possible that all 14 might be better than xoroshiro32++[13,5,10,9] which was 6th/5th/1st best for pfreq/zfreq/nzfreq individually and best overall.
... it is simply the perceived expectation of greater precision on stream #1 in particular, and > #1 as a bonus. I think it is within grasp, but that remains to be seen.
As I understand it, when implemented in hardware, reversing the bits is just as easy as not reversing, or as rotating, etc., as these are all topological equivalents. There are even more topological equivalents, such as '(rol(xoroshiro32pp, 1) ^ AD65h)_revbits(rol(xoroshiro32pp, 1))' and many more... dozens of easily expressed ones. And these are just waiting for discovery of any additional constants that surpass AD65h (which I need to get my other server working on again, now that I understand a bit more about the nature of that value).
Taken together, I am quite optimistic this concept should exceed any reasonable expectation (and already is far beyond my original expectations).
I am still working on formatting data in the file output, so I just manually put this pfreq(x) together from 5 random stream trials of: (rotl(xoroshiro32pp, 7) ^ AD65h)_(revbits(rotl(xoroshiro32pp, 7)))
The rotate left by 7 normalizes the highest entropy xoroshiro32++[13,5,10,9] bits to the ends (left for first iteration and right for second, due to revbits).
Edit: Fixed table to use floating point (instead of integer) values for calculation (which had minimal impact).
Tony, for the zfreq(0), how is 'mean run' calculated from it, or is it something different?
For example, I have a zfreq(0) from an arbitrary output file of 998774320, but an ideal 'mean run' from this table is 1.58197671.
I see the result is not used with the chi^2, so perhaps it is not important?
Tony, for the zfreq(0), how is 'mean run' calculated from it, or is it something different?
For example, I have a zfreq(0) from an arbitrary output file of 998774320, but an ideal 'mean run' from this table is 1.58197671.
I see the result is not used with the chi^2, so perhaps it is not important?
Chris, thanks for the pfreq results. It will be interesting to see the corresponding zfreq & nzfreq.
The following post defined mean zero run, see term (i) under Notes: http://forums.parallax.com/discussion/comment/1438409/#Comment_1438409
Mean non-zero run can be calculated in the same way and its expected value = e. I didn't know about Chi-Squared when I wrote this post and I used the incorrect |Actual-Expected|/Expected instead.
zfreq(0) and nzfreq(0) have no real meaning, so I thought of using their table entries for something else. The expected values were easy to deduce from a few results, which gives us another measure of quality or two. I thought at one time that mean run might be the most accurate, but I'm not so sure now because it is a ratio and errors could cancel out (or rather be divided out). zfreq(0)/nzfreq(0) could be used to store the number of zero/non-zero runs when generating the data and the two values will either be the same or differ by one.
P.S.
I think that mean run is important and Chi-Squared could be applied to it.
Also I now regard pfreq, zfreq & nzfreq as having equal worth.
It will be interesting to see the corresponding zfreq & nzfreq.
Thanks for the extra info. Sorry about skipping ahead abruptly in variants... I was curious to try and project limits to the possible randomness, and can back-fill stats for previous / simpler variants later (which may actually catch something I missed).
Here is the zfreq(x) for that same advanced rol7 variant (without addressing the meanrun):
Notice the 5th stream trial glitched at the very end... which only succeeded in demoting that particular stream sum to place it in the same class as 'scrorarns'.
Still manually constructing these tables... nzfreq will be a little bit harder.
Still manually constructing these tables... nzfreq will be a little bit harder.
The scro generator is excellent on pfreq, but your results show it can be beaten on zfreq. I'm not sure that averaging the 7x scores is allowed, though.
If it's too time consuming for you, I could use pfreq/zfreq/nzfreq binary data files (preferably named *.pb/*.zb/*.nzb) as inputs to a QuickBASIC program that generates tables. 256 longs per file, starting at freq(0), with zeroes for zfreq(0) and pfreq(0) if it is easier as total runs and mean run can be derived from freq(1+).
I'm not sure that averaging the 7x scores is allowed, though.
It is the only way I can think of to represent the generator as a whole, though I intend to use 14 (or 15) scores for the average.
To put the question to rest, we can look at scores for xoroshiro128** high 32 bits... I'll run them tomorrow and post the binaries for you to look at.
I was able to get Evan's code to work in .NET, with a few adjustments (e.g. string and monotonic clock related code changed for cross-compatibility... I will not call myself a C++ programmer, so still amazed I can get anything to work right across two platforms).
I'm not sure that averaging the 7x scores is allowed, though.
It is the only way I can think of to represent the generator as a whole, though I intend to use 14 (or 15) scores for the average.
Averaging the chi^2 total is allowable, I think, but not chi^2 for individual frequencies. Summing prank+zrank+nzrank would give a winner for 7A-7E, based on frequency distribution.
Comments
This will test 192GB of output out to ~12 double iterated streams, which I can normalize against the following table, which I would like you to verify:
Total Values with Zero Count of [high 16 bits = low 16 bits] Stream All 64K 1 1580670878 24119 2 581732119 8877 3 214094068 3267 4 78792744 1202 5 28997985 442 6 10672088 163 7 3927634 60 8 1445482 22 9 531979 8 10 195783 3 11 72054 1 12 26518 0
Assuming the above is correct, here is the results for 'n ^ AD65h/RevBits(n)', as a good bad example (data here is incorrect, will create a post with good data later):Stream Avg. of 6 Runs Diff 1 24126 7 2 8886 9 3 3278 12 4 1206 4 5 449 7 6 164 1 7 61 1 8 22 0 9 8 0 10 3 0 11 2 1 12 1 1 13 0 0
Edit: Fixed double-iteration formula, and also notice that AD65h and its bit reverse, A6B5h, are both prime (but not sure of implications).Edit2: There was a correlation issue with my seeding/re-seeding, which somehow managed to make all of the above AD65h results closer to theoretical than they should be.
I will post correct results in response to Tony's request below.
The three variants tested (for each first iteration xor constant) were only different in how the second iteration value was processed: no-operation [NOP(xororshiro++)], rotate left 1 bit [ROL(xororshiro++,1)], and reverse bits [REVBITS(xororshiro++)]
Larger differences in the first several streams might be tolerable for a PRNG of much larger period (that did not have to satisfy equidistribution requirements in 65536 16GB streams), but (average of 16 run) differences greater than ~15 in the 6th stream or later would be an issue for any PRNG of larger period.
Edit: Keep in mind that the NOP results are not just obviously worse, but also possibly a side-effect indicated by the early GAP fail and FPF issues with PractRand.
Edit2: There is a small error in the expected zeros and diff results in the attachment. Streams 1-6 (only) are offset by 10, 8, 4, 2, 0 and 1, respectively, which only minimally impacts the comparison of results.
Stream All 64K 1 1580030169 24109 2 8869 3 3263 4 1200 5 442 6 162 7 60 8 22 9 8 10 3 11 1 12 0
As the current emphasis is on the 64K duplicates, I've listed only the Stream 1 expected for All and the actual value for xoroshiro32++[13,5,10,9] = 1580109068, which is better than 99.99% accurate.Next time I will just type 'Exp(1)', as reciting e is a dyslexic endeavor for me (and perhaps many others, I guess).
Luckily, the maximum error in my results is still small enough to accurately discern RevBits as the better choice when considering only those two constants.
I planned to write an automated search for least (perhaps weighted) differential sum of 12 (or 13 streams), considering only primes that have 7 to 9 bits set.
However, in order to establish automated vetting criteria (which will be used to expedite the process), I need to create some more reference data using a good PRNG (that is beyond reproach).
In comparison to all of the other variations looked at, this is the best so far.
Let me know if you have any questions about the data, or if any further information is required.
Additionally, a 'good' PRNG will have a much greater StdDev of trial stream differences, so is not very useful with XOROACC due to the relatively short protraction of the 16/17 bits of state extension.
Therefore, I have written a search for odd xor values that produce an absolute difference (average of 16 trials) sum of less than 80 (which is the current absolute difference sum for AD65h).
I will take the best of those and look at some basic PractRand and gjrand stats (since they are quick to produce), before considering deeper PractRand and BigCrush analysis.
Re the ideal vs. actual pfreq0 distributions for the 64K duplicates (both words of 32-bit value the same), I tested xoroshiro64++[26,9,13,17] just to see how it behaved and here are the results for a single run with initial state = 1:
Stream Ideal Xoro64 1 24109 24128 2 8869 8886 3 3263 3224 4 1200 1178 5 442 446 6 162 159 7 60 53 8 22 19 9 8 8 10 3 5 11 1 3 12 0 2 13 0 0
Note that a "stream" for xoroshiro64++ is a period of 2^32-1 iterations to match that for xoroshiro32. (It makes no difference in this case if 2^32 is used instead.) I'm hoping that using the amended ideal values I posted your averaged results for xoroshiro32++a/pp will improve a little.Being able to record all the outputs to see how many iterations it takes for each of them to occur at least once would be more accurate than the duplicates and it would also give us the distributions of zero and non-zero runs (zfreq and nzfreq). 512MB would be enough to record all 2^32 outputs at one bit per output, but I still couldn't do it because that's all the RAM there is in my current PC.
A Chi-Square sum of (Actual - Expected)^2 / Expected for streams 1-11 would tell us the "goodness of fit" to the ideal pfreq0 distribution, the lower the better.
EDIT:
Corrected Chi-Square value.
1 24137 2 8892 3 3265 4 1200 5 448 6 162 7 60 8 22 9 8 10 3 11 2 12 1 13 0
Slight bias in streams 1, 2 and 5 is obvious, and greater than expected given the nature of this type of PRNG... however, I am in the process of trying hundreds of odd values, and have found nothing obviously better as yet, but am exploring a few contenders... I am fairly sure they are all prime, but need to double-check.Edit: 5B69h is NOT PRIME, and by initial inspection of pfreq0, PractRand and gjrand 32GB looks better than AD65!
I can likely perform a full test in 64-bit mode, in order to quickly allocate a full count array. Byte should be fine, I guess.
Edit2: Code written (attached... I really dislike writing in C++, but I cannot argue with the performance) and some tests already done.
5B69h is no good, as it takes 27 streams to complete, instead of the expected 23. AD65h is still the top contender.
Now looking at some other prime/non-prime values of seemingly good merit (i.e. low chi-square sum).
Will post some data in the next several days.
Stream Ideal XOROACC Chi- Xoroshiro128** Chi- 0 4294967296 ^AD65h,RevBits Sqr High Bits Sqr 1 1580030169 1579996615 0.71 1580047360 0.19 2 581260615 581238808 0.82 581286519 1.15 3 213833830 213837375 0.06 213847009 0.81 4 78665070 78668713 0.17 78661175 0.19 5 28939262 28946020 1.58 28946439 1.78 6 10646160 10649855 1.28 10649189 0.86 7 3916503 3917881 0.48 3916554 0.00 8 1440801 1441130 0.08 1442142 1.25 9 530041 529780 0.13 530779 1.03 10 194991 194494 1.27 195280 0.43 11 71733 71485 0.86 72207 3.13 12 26389 26266 0.58 26722 4.20 13 9708 9502 4.37 9854 2.19 14 3571 3433 5.36 3630 0.96 15 1314 1236 4.61 1230 5.35 16 483 445 3.04 453 1.90 17 178 150 4.35 175 0.04 18 65 55 1.66 65 0.00 19 24 21 0.39 25 0.04 20 9 8 0.08 11 0.52 21 3 3 0.02 6 2.31 22 1 2 0.54 3 2.71 23 0 1 0.71 2 5.52 24 0 0 0.16 0 0.16 Sum = 33.30 Sum = 36.73
By contrast, the simple variants using '^1,rol1', '+1,rol1' and '+1,revbits' all have lower pfreq0 chi-squared sums (20, 27, and 21, respectively)...Yet PractRand and gjrand agree that the simpler variants are significantly inferior to '^AD65h,RevBits' at 32GB (in multiple ways), which is only 2 streams.
The dozens of the other constants I have considered with any depth have not made it through the gauntlet of tests I have done without showing some obvious weakness.
Due to the vast amount of testing required to adequately explore more candidates, I should say that I believe that '^AD65h,RevBits' must be within the top 99.5%, and that any improvements would be statistically very small by comparison.
Can you provide a link or description that elaborates on pfreq(x), zfreq(x) and nzfreq(x) in addition to what you wrote here: https://forums.parallax.com/discussion/comment/1469463/#Comment_1469463?
That post is a concise description of the distributions and there isn't much more I can say apart from a good distribution does not imply a good score in randomness tests. [13,5,10,9] was chosen for xoroshiro32++ as it had the best (lowest) prank+zrank+nzrank and it performed well in PractRand tests.
XOROACC should be capable of slightly better pfreq, zfreq & nzfreq distributions than xoroshiro32++ [13,5,10,9] within a single stream. In other words, users wanting random 32-bit outputs that are a bit closer to the ideal could use XOROACC for 2^32-1 double-iterations instead of XORO32. Have you compared stream 1 Chi-Square pfreq/zfreq/nzfreq values for '^AD65,revbits' with the simple variants?
Chi-Square for multiple streams is something new and I need to think whether zfreq and nzfreq are still relevant and if so how. Maybe something else should be measured, such as the interval in double-iterations between outputs that have not occurred before. Note that the highest Chi-Square values for '^AD65,revbits' and most of its total are for streams 13-17. How do the simple variants perform for streams 1-12?
So basic questions need answered:
How does pair_frequency(0) equate to what I have done so far with counting zeros non-occurring values per stream, as I do not see the logical relationship?
Can you provide an example description of what pfreq(1), etc. is testing detailed enough so I can code it? Since that was a single trial, the bubbles in streams 13-17 should be just a normal statistical variance. I will run 16+ randomly selected stream sequences for that, xoroshiro128**, and simple variants, like I did for high16=low16.
However, I need to look into automated seeding under .NET (probably using Crypto APIs, since my VB6 AMLS bit fountain would take a lot of work to translate)... that will take some time to come together.
Edited to correct ambiguity, in bold.
Evan translated my ideas into C. As I was concerned we were being overly reliant on PractRand, I came up with the pfreq distribution as an additional, reality-based test. Chris Doty-Humphrey, the author of PractRand, suggested also looking at zero and non-zero runs, which led to zfreq and nzfreq.
Some definitions. A pair is two successive 16-bit xoroshiro/xoroacc PRN outputs concatenated in one 32-bit value. Pair frequency(n) or pfreq(n) is the number of different pairs that occur exactly n times, thus pfreq(0) is the number of pairs that occur zero times and the answer to your first question as amended is the two are the same. The expected set of values of pfreq(n) for different n is the well-known binomial distribution.
In our case, 4GB of RAM is required to find the actual pfreq(n) values, configured as an array of 2^32 bytes, called pair, say. Zero the entire pair array, then iterate xoroshiro/xoroacc for 2^32-1 double-iterations. Use each 32-bit pair output as an index into the pair array and increment that particular byte value. Now we need another array, called pfreq, which is array of 256 longs. Zero pfreq first, then for n = 0 to 2^32-1, read byte at pair(n) and use that as index into pfreq and increment that long value. At the end, we will know the pfreq distribution in its entirety and most pfreq(n) will be zero. It is possible to calculate pfreq(0) without a pfreq array, by reading the pair array bytes before writing to check whether they are zero. If so, decrement a 32-bit counter, say pfreq0, which is initially zero.
So far, no mention of zfreq nor nzfreq. The pair array will have runs of consecutive zero or non-zero bytes, e.g. pair(0) to pair(10) might all be non-zero and pair(11) zero, in which case the non-zero run length is 11. When reading all of the pair array from pair(0) to pair(2^32-1) and writing to pfreq, it is quite easy to also record the zero and non-zero runs and write these to two other arrays of 256 longs called zfreq and nzfreq. The final run, zero or non-zero, is terminated by end of data, not a non-zero or zero value. So now we have three real-world distributions which we can compare against the expected values using Chi-Square goodness of fit tests. My maths is not good to prove the expected equations for zfreq and nzfreq, but results show them to be correct.
Evan created pfreq, zfreq & nzfreq distributions for all 84 full-length [a,b,c,d] xoroshiro32++ candidates, except for a handful of complete duds from earlier PractRand tests. I calculated Chi-Square values from the distribution data, ranking each of pfreq/zfreq/nzfreq separately to give prank/zrank/nzrank and adding these using equal weighting the lowest prank+zrank+nzrank was [13,5,10,9]. Its PractRand scores were very acceptable and therefore this [a,b,c,d] was chosen for the P2 rev B silicon.
So far, I have some preliminary pfreq0 results from 14 (only, due to RAM limit, but should suffice) independent trials of xoroshiro128** high bits, ^AD65_RevBits, +1_RevBits and ^1_Revbits.
I will run a few more variants and post an expanded data set based on new code, once it is proofed to my satisfaction.
My initial preview of pfreq0 results indicates that there is indeed greater statistical bias in streams 13-18 with the variants tested so far.
Stream 13 is actually composited of single iterated streams 25 and 26, which might indicate an endemic artifact of the short protraction of equidistribution.
The way it occurred in XORO32 with the failure in PractRand at 512MB-1GB signaled a pending issue with double-iterated pfreq (if it had been carried beyond one double-iterated stream).
With XOROACC being based on that same underlying limit, the best that can be hoped for is to optimize the e xor mask to smooth the avalanche behavior.
For now, here is the pfreq0 data for ^AD65_Revbits (14 trials):
Edit: Replaced table with ods attachment.
The explicit near-perfect statistical termination at stream 23 is not something shared in common with simpler variants.
I have compared two extended statistics with 'Xoroshiro128** High 32 bits': chi-squared sum of stream averages, and average of chi-squared sums.
^AD65_RevBits : 39.65 and 43.48
Xoroshiro128**: 1.89 and 19.43
The former value ideally should approach zero with more trials, and I can only guess that ~20 is very close to the ideal normal for the latter value.
My belief is that an optimal e constant may exist which would allow XOROACC to reach a lower limit of ~30 for both statistics while maintaining the stream 23 termination... a marginal improvement for the first statistic and a meaningful improvement for the last statistic.
'+1_Revbits' has just a little above the 30/30 that I propose... but terminates too soon, decidedly at stream 22. I had discarded it early on due the excessive PractRand failures... but still a viable option if one is concerned with only pfreq0 performance on just 1 double-iterated stream, as it out-performs '^AD65_RevBits' by a considerable margin on stream 1 pfreq0 chi-square of the average of 14 streams: 0.0068 vs. 0.42 (the former value nearly ideal, and the latter value being, I believe, much below 0.05 probability).
Here is the best (which excels at PractRand, also):
(xoroshiro32pp ^ AD65h)_RevBits(xoroshiro32pp + 1): 19.51 and 22.88
What these pfreq0 statistics (ods file attached) imply is that an individual 23 stream trial is now similar to an individual xoroshiro128** (also attached) trial, but it only takes several trials to prove that that the per-stream biases are fixed to specific stream #s in the sequence.
I tried (xoroshiro32pp + 1)_RevBits(xoroshiro32pp + 1), and the results on PractRand and pfreq0 were not a step forward.
And yes, these hybrid variants somehow maintain the same equidistribution (which is quite extraordinary).
Tony, I think this type of simple hybrid variant is worth pursuing, but want your take on the implementation complexity (as this one adds carry implementation on one of the double iterations, in addition to the xor on the other).
Edited for clarity and to add attachments.
Edit2: I am currently looking at (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1), which shows NO precursor issues in PractRand at 16GB... the only variant that has ever accomplished this!
Edit3: (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) is the best when completing 1 double-iterated stream, but decidedly completes pfreq0 two streams too early. I bet it would do well in BigCrush.
How do initial state and initial acc differ for each of the 14 trials? Are you sure there is no overlap between any of them for at least 23 streams?
The main reason for using xoroacc is to get 2D equidistribution. I was concerned that outputting all possible outputs might take longer with our 'pseudo' 32-bit generator than with a 'proper' 32-bit or 64-bit one. However, tests have shown this concern to be completely misplaced. Taking one stream or even two fewer than the ideal is erring on the good side and I think we should not be too worried about 22 instead of a perfect 23.
Studying the results, pfreq0 in general is on the low side from stream 9 onwards and more clearly so from streams 13 to 18 or 19. There are ~500 fewer missing outputs after stream 13 compared to the ideal remainder of ~15000. Some trials have very low Chi-Squared errors for streams 1 to 9, say, during which more than 99.99% of the possible outputs occur.
Thus it seems likely that, by choosing good constants and the appropriate initial state and acc values, a single stream of 2^32-1 double-iterations could be extremely close to the ideal random distribution for a 32-bit output. I think further tests should look at the total pfreq, zfreq and nzfreq distributions for stream 1 and these results considered along with those for multi-stream pfreq0.
Re hardware implementation, there ought to be enough time for an addition of X+Y+1, but I think we should avoid too much complexity. I'm not sure what (xoroshiro32pp ^ AD65h)_rotl16(xoroshiro32pp + 1, 1) means because the two acc additions are only 16-bit.
However, a perhaps larger issue at the moment concerns how to gather total pfreq, zfreq, and nzfreq: i.e. I am currently averaging 14 of each x statistic (takes about 2.3 hours total), admittedly generated via a seeding method that will eventually cause collisions. My notation is poor (thus open to suggestion), but simply indicates that the first xoroshiro result is xor'd with AD65h (prior to accumulation), and the second result has 1 added, then a left rotation by 1 (prior to accumulation).
However, I don't think the extra complexity will be required. For instance, that particular example using the +1 with rotate looks horrible under BigCrush meta-analysis (often exceeding 3 StdDev). Think of the meta-analysis as performing a very similar function to pfreq, zfreq and nzfreq, except that it is meshing a few perhaps similar statistics, along with lots of others.
Limiting BigCrush to just 1 trial (instead of the meta-analysis 16 each of forward bits, reverse bits and byte reverse) is pointless, since most well-conceived variants I have tried pass it easily.
From testing xoroshiro32++, any pfreq0 that begins 15800... or 15799... could be considered excellent.
Notation understood now, it was the 16 in rotl16 that was confusing me.
In any case, it takes the same amount of time to do 1 trial as it does 14 (with 15 perhaps possible).
I do not like the idea of hand picking data. However, for example, with electronic analytical balances, the old 3 of 5 algorithm suffices... so perform 5 independent trials and discard the high and low for each stream #, taking the average of the remaining 3. This approach might allow vetting (15/5) 3 candidate algorithms/constants simultaneously, for a total of ~30 per day (if fully automated).
Regarding other comments: '15800... or 15799' describes all of these... it is simply the perceived expectation of greater precision on stream #1 in particular, and > #1 as a bonus. I think it is within grasp, but that remains to be seen. I used the rotl16 notation, as other bit-size rotls existing in my code (so I have to consciously be careful in constructing logic in my head).
If 14 trials are as easy as one, then please find the 14 pfreq/zfreq/nzfreq distributions for stream 1 and calculate the Chi-Squared values. It is possible that all 14 might be better than xoroshiro32++[13,5,10,9] which was 6th/5th/1st best for pfreq/zfreq/nzfreq individually and best overall.
Taken together, I am quite optimistic this concept should exceed any reasonable expectation (and already is far beyond my original expectations). Will do... still reviewing Evan's code for details on relevant C++ constructs related to managing the forthcoming flood of data.
I may be out of touch for a little bit... too many balls in the air this week, but rest assured I am working on it.
#Actual #a b c d pfreq0 pfreq1 pfreq2 pfreq3 pfreq4 pfreq5 pfreq6 pfreq7 pfreq8 pfreq9 pfreq10 pfreq11 pfreq12 pfreq13 #ideal 1580030169 1580030169 790015084 263338361 65834590 13166918 2194486 313498 39187 4354 435 40 3 0 #scro 1580017722 1580044833 790021507 263329166 65837043 13165427 2194052 313378 39360 4320 448 36 3 0 13 5 10 9 1580109068 1579940482 789973580 263380902 65840007 13169609 2196174 313409 39161 4403 463 34 4 0 7A 1580015458 1580041521 790027661 263334728 65829677 13168221 2193149 312728 39414 4271 436 30 2 0 7B 1580067602 1579951300 790054791 263344764 65830561 13167558 2192369 314395 39031 4442 435 44 4 0 7C 1580074879 1579931497 790071246 263339092 65834637 13165036 2193291 313852 38857 4409 448 49 3 0 7D 1580018245 1580052681 789996805 263355476 65824356 13167688 2194665 313514 39193 4195 437 38 2 1 7E 1580016772 1580028981 790049016 263320979 65835984 13165412 2193580 312926 38945 4220 446 33 2 0 7A-E Avg. 1580038591 1580001196 790039904 263339008 65831043 13166783 2193411 313483 39088 4307 440 39 3 0 #|Actual-Ideal|^2/Ideal #a b c d pfreq0 pfreq1 pfreq2 pfreq3 pfreq4 pfreq5 pfreq6 pfreq7 pfreq8 pfreq9 pfreq10 pfreq11 pfreq12 pfreq13 total #ideal 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 #scro 0 0 0 0 0 0 0 0 1 0 0 0 0 0 3 13 5 10 9 4 5 2 7 0 1 1 0 0 1 2 1 0 0 24 7A 0 0 0 0 0 0 1 2 1 2 0 2 1 0 10 7B 1 4 2 0 0 0 2 3 1 2 0 0 0 0 15 7C 1 6 4 0 0 0 1 0 3 1 0 2 0 0 19 7D 0 0 0 1 2 0 0 0 0 6 0 0 1 2 12 7E 0 0 1 1 0 0 0 1 1 4 0 1 1 0 12 7A-E Avg. 0 1 1 0 0 0 1 0 0 1 0 0 0 0 3
The rotate left by 7 normalizes the highest entropy xoroshiro32++[13,5,10,9] bits to the ends (left for first iteration and right for second, due to revbits).Edit: Fixed table to use floating point (instead of integer) values for calculation (which had minimal impact).
For example, I have a zfreq(0) from an arbitrary output file of 998774320, but an ideal 'mean run' from this table is 1.58197671.
I see the result is not used with the chi^2, so perhaps it is not important?
Chris, thanks for the pfreq results. It will be interesting to see the corresponding zfreq & nzfreq.
The following post defined mean zero run, see term (i) under Notes:
http://forums.parallax.com/discussion/comment/1438409/#Comment_1438409
Mean non-zero run can be calculated in the same way and its expected value = e. I didn't know about Chi-Squared when I wrote this post and I used the incorrect |Actual-Expected|/Expected instead.
zfreq(0) and nzfreq(0) have no real meaning, so I thought of using their table entries for something else. The expected values were easy to deduce from a few results, which gives us another measure of quality or two. I thought at one time that mean run might be the most accurate, but I'm not so sure now because it is a ratio and errors could cancel out (or rather be divided out). zfreq(0)/nzfreq(0) could be used to store the number of zero/non-zero runs when generating the data and the two values will either be the same or differ by one.
P.S.
I think that mean run is important and Chi-Squared could be applied to it.
Also I now regard pfreq, zfreq & nzfreq as having equal worth.
Here is the zfreq(x) for that same advanced rol7 variant (without addressing the meanrun):
#Actual #a b c d meanrun zfreq1 zfreq2 zfreq3 zfreq4 zfreq5 zfreq6 zfreq7 zfreq8 zfreq9 zfreq10 zfreq11 zfreq12 zfreq13 zfreq14 zfreq15 zfreq16 zfreq17 zfreq18 zfreq19 zfreq20 zfreq21 zfreq22 zfreq23 #ideal 1.58197671 631342768 232258025 85442952 31432706 11563446 4253954 1564942 575710 211792 77914 28663 10544 3879 1427 525 193 71 26 9.62 3.54 1.30 0.48 0.18 #scro 1.58199635 631325497 232246374 85451022 31434743 11560489 4254398 1565717 576225 211365 77684 28938 10692 3807 1451 510 212 63 30 21 7 1 0 0 13 5 10 9 1.58240278 631068452 232245039 85428827 31471299 11583389 4264295 1572133 579737 212366 78766 28980 10799 3953 1534 532 202 69 26 6 4 3 0 0 7A 998774320 631349023 232259106 85448687 31428705 11558563 4255503 1565047 575490 211179 77810 28582 10523 3905 1352 535 188 77 27 14 2 1 0 1 7B 998779783 631346495 232262172 85435049 31438481 11563714 4256536 1567259 575860 211422 77464 28832 10387 3843 1453 535 176 67 28 7 2 1 0 0 7C 998776905 631336812 232266097 85442633 31433127 11562187 4257603 1567525 576209 211951 77484 28760 10407 3855 1453 519 182 63 27 9 1 1 0 0 7D 998765675 631355806 232244264 85439694 31425333 11566605 4256881 1567013 575349 211817 77457 28821 10511 3874 1404 530 196 74 28 11 5 2 0 0 7E 998763416 631329577 232263755 85451874 31430101 11557090 4257396 1564027 575600 210998 77616 28734 10497 3838 1441 557 198 69 29 11 4 1 1 2 7 Avg. 998772020 631343543 232259079 85443587 31431149 11561632 4256784 1566174 575702 211473 77566 28746 10465 3863 1421 535 188 70 28 10 3 1 0 1 #|Actual-Ideal|^2/Ideal #a b c d zfreq1 zfreq2 zfreq3 zfreq4 zfreq5 zfreq6 zfreq7 zfreq8 zfreq9 zfreq10 zfreq11 zfreq12 zfreq13 zfreq14 zfreq15 zfreq16 zfreq17 zfreq18 zfreq19 zfreq20 zfreq21 zfreq22 zfreq23 total #ideal 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 #scro 0 1 1 0 1 0 0 0 1 1 3 2 1 0 0 2 1 1 12 2 0 0 0 30 13 5 10 9 119 1 2 47 34 25 33 28 2 9 4 6 1 8 0 0 0 0 2 0 4 0 0 326 7A 0 0 0 1 2 1 0 0 2 0 0 0 0 4 0 0 0 0 2 1 0 0 4 18 7B 0 0 1 1 0 2 3 0 1 3 1 2 0 0 0 2 0 0 1 1 0 0 0 19 7C 0 0 0 0 0 3 4 0 0 2 0 2 0 0 0 1 1 0 0 2 0 0 0 18 7D 0 1 0 2 1 2 3 0 0 3 1 0 0 0 0 0 0 0 0 1 0 0 0 15 7E 0 0 1 0 3 3 1 0 3 1 0 0 0 0 2 0 0 0 0 0 0 1 19 36 7 Avg. 0 0 0 0 0 2 1 0 0 2 0 1 0 0 0 0 0 0 0 0 0 0 1 8
Notice the 5th stream trial glitched at the very end... which only succeeded in demoting that particular stream sum to place it in the same class as 'scrorarns'.Still manually constructing these tables... nzfreq will be a little bit harder.
The scro generator is excellent on pfreq, but your results show it can be beaten on zfreq. I'm not sure that averaging the 7x scores is allowed, though.
If it's too time consuming for you, I could use pfreq/zfreq/nzfreq binary data files (preferably named *.pb/*.zb/*.nzb) as inputs to a QuickBASIC program that generates tables. 256 longs per file, starting at freq(0), with zeroes for zfreq(0) and pfreq(0) if it is easier as total runs and mean run can be derived from freq(1+).
#Actual #a b c d nzfreq1 nzfreq2 nzfreq3 nzfreq4 nzfreq5 nzfreq6 nzfreq7 nzfreq8 nzfreq9 nzfreq10nzfreq11nzfreq12nzfreq13nzfreq14nzfreq15nzfreq16nzfreq17nzfreq18nzfreq19nzfreq20nzfreq21nzfreq22nzfreq23nzfreq24nzfreq25nzfreq26nzfreq27nzfreq28nzfreq29nzfreq30nzfreq31nzfreq32nzfreq33nzfreq34nzfreq35nzfreq36nzfreq37nzfreq38nzfreq39nzfreq40nzfreq41nzfreq42nzfreq43nzfreq44nzfreq45 #ideal 367426785 232258025 146815072 92804826 58663838 37082618 23440685 14817339 9366345 5920659 3742570 2365756 1495443 945300 597544 377720 238764 150928 95405 60307 38121 24097 15232 9629 6087 3847 2432 1537 972 614 388 245 155 98 62 39 25 16 9.90 6.26 3.96 2.50 1.58 1.00 0.63 0.40 0.25 0.16 #scro 367409237 232249822 146806621 92822979 58653366 37074317 23448505 14822833 9366603 5924351 3742556 2364557 1495255 945073 597224 376964 238020 151211 95677 60491 37941 24158 15310 9608 6148 3831 2415 1568 998 609 350 233 154 100 64 39 19 14 13 2 6 3 0 0 0 0 0 0 13 5 10 9 367302869 232229119 146718311 92791823 58677241 37058196 23440771 14827097 9386855 5930327 3748361 2370708 1497167 948777 596331 378695 238976 150237 95204 60327 37981 24077 15118 9441 6084 3877 2422 1443 958 559 384 265 149 99 63 45 16 17 10 4 5 1 2 1 0 0 0 0 7A 998774319 367417783 232257105 146835946 92802815 58663320 37087037 23438982 14810480 9363967 5918901 3744301 2368118 1495577 945177 598595 377673 238915 150544 95675 60087 37906 24001 15173 9669 5931 3958 2407 1631 964 600 401 279 146 97 52 41 23 9 7 11 7 4 1 2 1 0 0 0 7B 998779784 367464058 232234855 146816929 92792705 58674319 37083660 23437757 14817638 9365247 5918711 3744522 2364345 1496279 945002 596810 377705 238922 150985 95295 60282 38317 24114 15364 9492 6022 3919 2329 1539 969 627 409 247 160 106 49 46 22 9 6 3 3 1 2 1 0 1 0 1 7C 998776905 367466972 232233246 146807516 92803702 58669287 37082057 23435440 14818824 9366086 5919597 3745309 2364460 1495312 945843 596755 377552 238670 151057 95225 60068 38275 24037 15406 9561 6025 3961 2403 1610 944 612 401 244 159 126 63 48 22 11 7 3 3 0 3 0 0 2 0 1 7D 998765675 367427195 232242322 146826350 92800831 58670467 37083860 23432985 14811589 9373958 5923242 3742498 2365205 1494955 947046 597252 377430 238943 150883 95010 60509 37967 23780 15280 9527 6046 3892 2471 1530 1006 607 390 246 159 89 62 37 20 10 9 6 8 2 0 1 0 0 0 0 7E 998763417 367424986 232242529 146826451 92801330 58668706 37083044 23439532 14814047 9364169 5920380 3744943 2366552 1496126 945974 599279 377123 238646 150581 95352 60343 37918 23997 15230 9515 5971 3973 2485 1574 984 661 379 254 122 98 59 43 22 13 8 7 5 2 2 1 1 0 0 0 7 Avg. 998772020 367440199 232242011 146822638 92800277 58669220 37083932 23436939 14814516 9366685 5920166 3744315 2365736 1495650 945808 597738 377497 238819 150810 95311 60258 38077 23986 15291 9553 5999 3941 2419 1577 973 621 396 254 149 103 57 43 22 10 7 6 5 2 2 1 0 1 0 0 #|Actual-Ideal|^2/Ideal #a b c d nzfreq1 nzfreq2 nzfreq3 nzfreq4 nzfreq5 nzfreq6 nzfreq7 nzfreq8 nzfreq9 nzfreq10nzfreq11nzfreq12nzfreq13nzfreq14nzfreq15nzfreq16nzfreq17nzfreq18nzfreq19nzfreq20nzfreq21nzfreq22nzfreq23nzfreq24nzfreq25nzfreq26nzfreq27nzfreq28nzfreq29nzfreq30nzfreq31nzfreq32nzfreq33nzfreq34nzfreq35nzfreq36nzfreq37nzfreq38nzfreq39nzfreq40nzfreq41nzfreq42nzfreq43nzfreq44nzfreq45nzfreq46nzfreq47nzfreq48total #ideal 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 #scro 1 0 0 4 2 2 3 2 0 2 0 1 0 0 0 2 2 1 1 1 1 0 0 0 1 0 0 1 1 0 4 1 0 0 0 0 1 0 1 3 1 0 2 1 1 0 0 0 41 13 5 10 9 42 4 64 2 3 16 0 6 45 16 9 10 2 13 2 3 0 3 0 0 1 0 1 4 0 0 0 6 0 5 0 2 0 0 0 1 3 0 0 1 0 1 0 0 1 0 0 0 266 7A 0 0 3 0 0 1 0 3 1 1 1 2 0 0 2 0 0 1 1 1 1 0 0 0 4 3 0 6 0 0 0 5 1 0 2 0 0 3 1 4 2 1 0 1 0 0 0 0 52 7B 4 2 0 2 2 0 0 0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 1 2 1 1 4 0 0 0 1 0 0 1 3 1 0 3 2 2 0 1 0 0 1 1 0 4 45 7C 4 3 0 0 1 0 1 0 0 0 2 1 0 0 1 0 0 0 0 1 1 0 2 0 1 3 0 3 1 0 0 0 0 8 0 2 0 1 1 2 0 3 1 1 1 6 0 4 58 7D 0 1 1 0 1 0 3 2 6 1 0 0 0 3 0 0 0 0 2 1 1 4 0 1 0 1 1 0 1 0 0 0 0 1 0 0 1 2 0 0 4 0 2 0 1 0 0 0 41 7E 0 1 1 0 0 0 0 1 1 0 2 0 0 0 5 1 0 1 0 0 1 0 0 1 2 4 1 1 0 4 0 0 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 39 7 Avg. 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 1 2 0 1 0 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0 17
Sorry about mangling the headers... too frustrated to fix it properly.To put the question to rest, we can look at scores for xoroshiro128** high 32 bits... I'll run them tomorrow and post the binaries for you to look at.
I was able to get Evan's code to work in .NET, with a few adjustments (e.g. string and monotonic clock related code changed for cross-compatibility... I will not call myself a C++ programmer, so still amazed I can get anything to work right across two platforms).
Averaging the chi^2 total is allowable, I think, but not chi^2 for individual frequencies. Summing prank+zrank+nzrank would give a winner for 7A-7E, based on frequency distribution.