XorSAR PRNG Hardware Implementation
xoroshironot
Posts: 309
I am struggling to advance the development process on my new-concept XorSAR PRNG due to lack of both computing power and mathematical methods to solve for required constants.
I am interested in:
1. Opinions on the usefulness of the PRNG, especially if implemented in hardware, considering its algorithmic simplicity and potential high-degree of pseudo-randomness at greater state sizes.
2. Ideas, resources or help solving for required constants (as I can only search for constants up 28-bit state by brute-force).
Description, source code, list of constants up to 22-bit state size, and example 128-bit state ASM dump are here:
https://groups.google.com/forum/#!topic/xorsar-prngs/rJr046H3jEs
Regards,
Chris
I am interested in:
1. Opinions on the usefulness of the PRNG, especially if implemented in hardware, considering its algorithmic simplicity and potential high-degree of pseudo-randomness at greater state sizes.
2. Ideas, resources or help solving for required constants (as I can only search for constants up 28-bit state by brute-force).
Description, source code, list of constants up to 22-bit state size, and example 128-bit state ASM dump are here:
https://groups.google.com/forum/#!topic/xorsar-prngs/rJr046H3jEs
Regards,
Chris
Comments
I was hoping to get some traction on my new original work, but need something more substantial to consider writing a paper.
Currently I'm working on basic matrix equations to express the lowliest 4-bit state version, but have not spent much time on it.
So far, it is harder for me than I first thought for such seemingly simple/fast code, which is part of why I imagine it may be worth the struggle.
Regards,
Chris
Apologies for not making a comment earlier. I had a great time running the various bits of code for Xoroshiro testing and automating the use of testing suites, Practrand in particular. My maths skills aren't great though so probably can't give much help.
Tony grok'd it better than I did.
Looks good for parallelising over all. A little hard to follow the state flow with the "core" listed separately, imho. I'm mostly interested in easy hardware solutions or low-end microcontroller software so would avoid the ones with multipliers.
The final xor'ing of a constant has a purpose I presume? Melissa tore shreds out of Xoshiro for heavily relying on multiplying a constant outside the iterative engine. I can see you aren't relying on it though.
So that mean A, B and C are the constants of interest for tuning.
The "core" is confusing at another point, as the 1D versions do not use the return value. I'll likely eliminate it, as it no longer serves its original purpose.
The final xor constant (D) is unnecessary, if you consider a non-zero single missing output acceptable. Xoroshiro (all variants) is missing a single zero, which might be required for some uses. The end user could perform the xor themselves, if it were somehow required.
Embedding the A constant plus or multiply in the iterative engine was certainly a conscious choice, but I found no way to fully digest it without the compare with -1. I have consoled myself to the idea that there is a certain obtuse elegance to it: Why would creating very-good statistical randomness be an artifact of 'clean-looking' code?
Indeed, A, B and C are the only constants required for tuning, but just be careful that you don't accidentally discover X and Y while seeding (which I have done, and is equivalent to seeding xoroshiro with 0,0). I eventually started seeding 0,0 and checked state for 0,0 after one call and, if found, seed 0,1 instead.
Discovering all ABC ++ and +* triplet constants up to 12-bit output/24-bit state (as I have already posted) is fairly trivial. It took a total of ~4.5 days on 16 cores/32 threads. Still working on 26-bit state... need significantly more processing power (~1000x) to discover even one triplet at 32-bit state (assuming they exist). I am hoping there is a better way to discover the triplets, but have yet to find it.
Funnily, as I was busy finding each set, Tony came on board and made contact with one of the Xoroshiro authors, Seba. Seba was able to produce the full-period candidate set of any sized N at the drop of the hat. And with this set came a sort of ranking number for each candidate in the set. This helps priorities the subsequent quality testing.
I've seen Melisa produce similar in her blog for completely unrelated algorithms, and Chris of Practrand fame also demonstrated it somewhere. So there is a non-brute-force mathematical solution. I just have no clue what it is though.
My original hope was that I would be able to quickly find a few full-period 30 or 32-bit state candidates in order to more accurately compare against other similar state size PRNGs using SmallCrush, PractRand and gjrand... perhaps even finding a pattern in the constants themselves. This has been less fruitful than I originally thought, as I have only a few 26-bit state constants (++, though +* are actually ~2x faster to find).
Currently, my preliminary analysis of 24-bit state candidates looks promising enough to move forward and continue work on an algorithm for calculating candidates. However, I also realized that the code (especially considering the ad-hoc rar function) might be more well-suited for hardware implementation.
I went ahead and moved the "core" function into the individual PRNGs, for better readability.
Thanks for the comments so far. I welcome any input on this, as I see some reasonable probability that this PRNG may prove useful, if taken to its logical conclusion.
EDIT: Maybe I'm wrong, it is in the iteration, any bit flipping is good there. I dunno.
My current plan is to attempt to write and solve equations describing the 4-bit state PRNG, which again, in theory should be possible. That being the case, scaling upward from there should be relatively easy.
Currently, with one of the few candidates I have for 26-bit state (and likely not a best-case one), I am hard-failing SmallCrush on only 3 test statistics each when looking at upper 8 bits and lower 8 bits. As I recall for milestones, it should be possible to remove 2 of the 3 failures at about 30-bits (passing 15-bits x 2 at a time) and the final failure should disappear at 32-bits, if all goes well.
BTW, as I recall, it was one of Tony's notes that gave me the idea to force a carry back through to the low bit(s). The hard part was to visualize how that could happen without breaking the equidistribution properties and invoking it through C++ (without resulting in overly complicated asm code).
I'll look into it anyway, as (if successful) it would bring much larger state sizes well within reach by my current (exhaustive search) method, hopefully without reducing statistical randomness.