Random/LFSR on P2



  • xoroshironot Posts: 258
    edited 2020-07-30 - 23:23:05
    Several points (forgive my rambling):
    1. I have only performed a preliminary PractRand analysis, with e=0, f=1. It looks to be up to 2x better (judging by GAP and FPF failures) than the previously discussed RevBits(). That is not proof of better freqs performance, which I showed for RevBits plotted per stream across random seeds (you requested that it also be done across sequential streams).

    2. Do not forget the parentheses in the modified code, for absolute clarity: '+ 1' must be the final step (the alternative, summing it with prn[31:16] first, is not as good, as I recall).

    3. Using XORs only (without addition) does not make adjacent bits interact, because XOR has no carry between bit positions; so there is no equidistribution recovery (nor the desired randomness benefit).

    4. My unpublished code currently uses the above 'modified xoroacc' variant (which is extremely simple and fast), but also adds its own final output scrambler (required to reach 2 TB+ of randomness). That final output scrambler is not compatible with achieving (in the case of a 16-bit word size) near-perfect 32-bit equidistribution when prn[15:0]/prn[31:16] are used as a near-perfect 1D source, but it works fine when s0/s1 are used as a near-perfect 2D source. Therefore, once published, it will use its own xoroshiro engine and return two outputs that are near-perfectly equidistributed (with some pairs occurring once less often over the full period). Also, since s0/s1 must be used, the final output scrambler is also needed to fix linear complexity and binary matrix rank issues. This is not an issue when using the prn output from XORO32, where linear complexity and matrix rank have already been dealt with, but it does make it difficult to push XORO32's randomness beyond 16 or 32 GB without resorting to 'ugly' methods.

    5. The 'modified xoroacc' could be further modified to replace either one of the ^ with a +, which (obviously) re-introduces more complexity, adds some noticeable randomness benefit biased toward the high bits (up to 64 GB, perhaps), and (more important for me) gives a small x86/x64 speed improvement (due to enabling micro-op fusion / SIMD). Investigating the modified xoroacc change, and now this other valid possibility, is holding up my research (not to mention the issue of finding my own optimal D value, used differently, which cannot be 0).
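Points 2 and 3 above rest on two properties of C arithmetic, shown in the minimal sketch below. It is not the modified xoroacc code (which is not reproduced in this post), and all function names are hypothetical. `xor_diff` and `add_diff` flip one input bit and report which output bits change: XOR never touches a neighboring bit, while addition can carry the change upward. The `plus1_*` functions are hypothetical stand-ins for a "... + 1" step and show why the parentheses matter: in C, `+` binds tighter than `^`.

```c
#include <assert.h>
#include <stdint.h>

/* Flip bit `bit` of a and return the mask of bits of (a ^ b) that change.
   XOR works on each bit position independently, so only bit `bit` changes. */
static uint16_t xor_diff(uint16_t a, uint16_t b, int bit)
{
    uint16_t a2 = (uint16_t)(a ^ (1u << bit));
    return (uint16_t)((a ^ b) ^ (a2 ^ b));
}

/* Same experiment for (a + b) mod 2^16. The carry chain can push the flip
   into higher bits, which is how addition makes adjacent bits interact. */
static uint16_t add_diff(uint16_t a, uint16_t b, int bit)
{
    uint16_t a2 = (uint16_t)(a ^ (1u << bit));
    return (uint16_t)((uint16_t)(a + b) ^ (uint16_t)(a2 + b));
}

/* Check both properties for every bit position of a 16-bit word. */
static void check_bit_interaction(uint16_t a, uint16_t b)
{
    for (int bit = 0; bit < 16; bit++) {
        /* XOR: exactly the flipped bit changes. */
        assert(xor_diff(a, b, bit) == (uint16_t)(1u << bit));
        /* Addition: the flipped bit always changes and lower bits never do;
           higher bits may change through the carry. */
        unsigned d = add_diff(a, b, bit);
        assert(d & (1u << bit));
        assert((d & ((1u << bit) - 1u)) == 0);
    }
}

/* Hypothetical stand-ins for a "... + 1" step (not the actual modified code).
   In C, + binds tighter than ^, so writing  a ^ b + 1  means a ^ (b + 1). */
static uint16_t plus1_without_parens(uint16_t a, uint16_t b)
{
    return (uint16_t)(a ^ (b + 1));   /* what  a ^ b + 1  parses as */
}

static uint16_t plus1_as_final_step(uint16_t a, uint16_t b)
{
    return (uint16_t)((a ^ b) + 1);   /* '+ 1' applied last */
}
```

For example, with a = 0x00FF and b = 0x0001, flipping bit 0 changes only bit 0 of a ^ b, but bits 0 through 8 of a + b. Likewise, with a = 3 and b = 1, the unparenthesized form gives 1 while '+ 1' as the final step gives 3.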
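To make the s0/s1 versus prn distinction in point 4 concrete, here is a minimal C sketch of a 16-bit-word xoroshiro32 engine. It is not the author's unpublished engine, and the constants [a, b, c, d] = [13, 5, 10, 9] are an assumption for illustration only; substitute your own. `xoro32pp_next` applies the standard xoroshiro++ scrambler, rotl(s0 + s1, d) + s0, while `xoro32_raw_pair` returns the raw state words, which remain purely linear over GF(2) and so, per point 4, still need a final output scrambler. `xoro32pp_double` is a hypothetical 32-bit packing of two outputs; it is not claimed to match XORO32's ordering.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit xoroshiro state held as two 16-bit words (s0, s1). */
typedef struct { uint16_t s0, s1; } xoro32_state;

/* ASSUMED constants [a, b, c, d] = [13, 5, 10, 9], for illustration only. */
enum { XA = 13, XB = 5, XC = 10, XD = 9 };

static uint16_t rotl16(uint16_t x, int k)   /* valid for 1 <= k <= 15 */
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

/* One step of the linear xoroshiro transition (no output scrambling). */
static void xoro32_next(xoro32_state *st)
{
    uint16_t s0 = st->s0;
    uint16_t s1 = (uint16_t)(st->s1 ^ s0);
    st->s0 = (uint16_t)(rotl16(s0, XA) ^ s1 ^ (uint16_t)(s1 << XB));
    st->s1 = rotl16(s1, XC);
}

/* xoroshiro++ scrambler, rotl(s0 + s1, d) + s0, computed on the current
   state before advancing. */
static uint16_t xoro32pp_next(xoro32_state *st)
{
    uint16_t out = (uint16_t)(rotl16((uint16_t)(st->s0 + st->s1), XD) + st->s0);
    xoro32_next(st);
    return out;
}

/* HYPOTHETICAL packing of two successive ++ outputs into 32 bits, first
   output in bits [15:0]; no claim that this matches XORO32's ordering. */
static uint32_t xoro32pp_double(xoro32_state *st)
{
    uint32_t lo = xoro32pp_next(st);
    uint32_t hi = xoro32pp_next(st);
    return (hi << 16) | lo;
}

/* Raw 2D output: the state words themselves, then advance. These are purely
   linear over GF(2), so they still need a final output scrambler. */
static void xoro32_raw_pair(xoro32_state *st, uint16_t *r0, uint16_t *r1)
{
    *r0 = st->s0;
    *r1 = st->s1;
    xoro32_next(st);
}

/* The transition is an invertible linear map, so a nonzero seed can never
   reach the all-zero state (for any choice of a, b, c). */
static void xoro32_check_nonzero(xoro32_state st, long steps)
{
    assert(st.s0 | st.s1);
    for (long i = 0; i < steps; i++) {
        xoro32_next(&st);
        assert(st.s0 | st.s1);
    }
}
```

Assuming constants that give the full period of 2^32 - 1, the state visits every nonzero (s0, s1) pair exactly once per period, so the raw pair output is 2D-equidistributed except that (0, 0) never occurs.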