Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 19 — Parallax Forums

Random/LFSR on P2

1161719212292

Comments

  • I confess I have no capability whatsoever to do what I suggested above. If PractRand tests at intervals of 256 * 63 bits then job done already.

    Two weeks ago I knew nothing about the Propeller and since then I have become strangely fascinated by it, spending far too much time reading various forum threads to the point that I'm beginning to feel my free will ebbing away ...
  • evanhevanh Posts: 15,126
    evanh wrote: »
    I've started an s12 scoring run at iteration intervals of 16. It's generating decent output so far, actually is getting some 128 MB scores. Will take another 30 minutes I think ...
    There's 48 scores that failed at 128 MB. A few even came up as HQ at 64 MB, although all bar one of those is strange a-b-c combination like 6-6-1, 1-1-8 and 1-6-6. Beats the non-intervalled iteration method hands down anyway.
  • evanhevanh Posts: 15,126
    edited 2017-04-16 20:14
    TonyB wrote: »
    I confess I have no capability whatsoever to do what I suggested above. If PractRand tests at intervals of 256 * 63 bits then job done already.
    There's no way for one computer to work a full combination for anything close to that word size (64 bit generator, 128 bit state, 63 bit useful output). I'm proving things with 12 bit generator (24 bit state, 11 bit useful output) because it gets brute force results in minutes and Chip has indicated interest in this size. EDIT: Edited a few times to fix details.
    Two weeks ago I knew nothing about the Propeller and since then I have become strangely fascinated by it, spending far too much time reading various forum threads to the point that I'm beginning to feel my free will ebbing away ...
    Oh, welcome to the Prop2!

    I was far too focused to notice your low post count. I'm not one for paying attention to whether a name is familiar or not.
  • evanhevanh Posts: 15,126
    evanh wrote: »
    evanh wrote: »
    I've started an s12 scoring run at iteration intervals of 16. It's generating decent output so far, actually is getting some 128 MB scores. Will take another 30 minutes I think ...
    There's 48 scores that failed at 128 MB. A few even came up as HQ at 64 MB, although all bar one of those is strange a-b-c combination like 6-6-1, 1-1-8 and 1-6-6. Beats the non-intervalled iteration method hands down anyway.
    Interval 256 is looking practically identical to interval 16 so far.
  • evanhevanh Posts: 15,126
    Might have to take a nap ... It's daylight already, damn, and it's Monday here.
  • evanhevanh Posts: 15,126
    evanh wrote: »
    TonyB wrote: »
    I confess I have no capability whatsoever to do what I suggested above. If PractRand tests at intervals of 256 * 63 bits then job done already.
    There's no way for one computer to work a full combination for anything close to that word size (64 bit generator, 128 bit state, 63 bit useful output). I'm proving things with 12 bit generator (24 bit state, 11 bit useful output) because it gets brute force results in minutes and Chip has indicated interest in this size. EDIT: Edited a few times to fix details.
    To give a little more insight to my exclamation. At the larger sizes, not only is there exponentially (size cubed) more combinations to test but generally the bigger sizes run longer before failing. The author of Xoroshiro128+ has stated testing up 256 TB, I think it was. Running a single combination to that length on my rig will be 30 days plus if I'm guessing right.

    Multiple those two factors and we're at something like 20000 years to produce the full combination set of 63^3.
  • evanhevanh Posts: 15,126
    Chip,
    I've just found out we've missed half the recognised combinations in our testing too. Looking at preceding xorshift* examples, the shift operator can go either direction!

    Like you say, though, there is even more possible combinations with custom operators.
  • cgraceycgracey Posts: 14,133
    It would be interesting to make four bit remap functions for 16 bits and name them r1..r4, then do:

    s1 = r1(s[0]) ^ r2(s[1])
    s0 = r3(s[0]) ^ r4(s[1])
    s[1] = s1
    s[0] = s0

    Experiment by changing r1..r4 and see if you can get anything like a max-run-length of $FFFF_FFFF. If not, add patterns and XORs and try again. Some patterns may replace certain bits with 0's to simulate the effect of a shift, as opposed to a lossless rotate.
  • cgraceycgracey Posts: 14,133
    edited 2017-04-17 06:08
    evanh wrote: »
    Chip,
    I've just found out we've missed half the recognised combinations in our testing too. Looking at preceding xorshift* examples, the shift operator can go either direction!

    Like you say, though, there is even more possible combinations with custom operators.

    Interesting. I bet if you made those rotate-lefts into rotate-rights (which can be done just by 16 minus current shift) and made that shift-left into a shift-right, you'd get the same quality.

    It might work better to just make one bit remap function and rotate it as needed, to be sure bits are on the move.
  • evanhevanh Posts: 15,126
    edited 2017-04-17 10:32
    Here's a s12 max iterate run with the "b" right-shift added. First thing that stood out is there is exactly double the number of found combinations. Second thing is top half and bottom half of the list are basically a mirror image of each other.

    I don't think there is any benefit in testing this any further.
     109 4   1, -9, 4	
     102 4   1, -2, 4	
     1 5 6   1,  5, 6	
     102 8   1, -2, 8	
     1 1 a   1,  1,10	
     2 2 3   2,  2, 3	
     20a 5   2,-10, 5	
     201 b   2, -1,11	
     3 2 2   3,  2, 2	
     303 4   3, -3, 4	
     3 2 8   3,  2, 8	
     304 8   3, -4, 8	
     409 1   4, -9, 1	
     402 1   4, -2, 1	
     403 3   4, -3, 3	
     404 5   4, -4, 5	
     4 4 7   4,  4, 7	
     4 9 7   4,  9, 7	
     4 4 9   4,  4, 9	
     402 9   4, -2, 9	
     4 2 b   4,  2,11	
     50a 2   5,-10, 2	
     504 4   5, -4, 4	
     509 8   5, -9, 8	
     504 8   5, -4, 8	
     6 5 1   6,  5, 1	
     605 b   6, -5,11	
     7 4 4   7,  4, 4	
     7 9 4   7,  9, 4	
     7 4 8   7,  4, 8	
     7 a a   7, 10,10	
     802 1   8, -2, 1	
     8 2 3   8,  2, 3	
     804 3   8, -4, 3	
     809 5   8, -9, 5	
     804 5   8, -4, 5	
     8 4 7   8,  4, 7	
     8 3 9   8,  3, 9	
     8 2 b   8,  2,11	
     8 9 b   8,  9,11	
     9 4 4   9,  4, 4	
     902 4   9, -2, 4	
     9 3 8   9,  3, 8	
     902 a   9, -2,10	
     a 1 1  10,  1, 1	
     a a 7  10, 10, 7	
     a02 9  10, -2, 9	
     b01 2  11, -1, 2	
     b 2 4  11,  2, 4	
     b05 6  11, -5, 6	
     b 2 8  11,  2, 8	
     b 9 8  11,  9, 8	
    
  • evanhevanh Posts: 15,126
    I've been trying to clean up my scripts to make it more user friendly before posting the whole package here. It's taken me quite a while to get over one particular hurdle of making a flexible line field reader that can pull the a,b,c constants from the above table and do a PractRand scoring run. And automatically repeating for each line until there is a score for every line.

    It wasn't a complicated solution but I just don't know bash nor my linux shell commands. I settled on using "awk" command.

    Next step is to have it also fill the table with the final scores. That'll be a whole new level of using awk me thinks.
  • evanhevanh Posts: 15,126
    Ha! Didn't need awk for that at all. "read" can do many more things than I first understood.
  • cgraceycgracey Posts: 14,133
    No worries, Evanh.

    Just showing the best xoroshiro32+ algorithm and its quality score would be great. That's all somebody would need to implement it with sufficient background knowledge.
  • evanhevanh Posts: 15,126
    Hehe, I've made all three scoring variations run in parallel. However, just by the comparative completion times, there is massive differences in quality between the three variations when applied to the max iterate s12 candidate group. I'd not noted this previously.

    Both bit truncated and byte truncated complete the whole set of scoring runs in under 2 minutes while full s12 word scoring takes about 20 minutes.

    It's so stark I'm wondering if I've got a new bug ...
  • evanhevanh Posts: 15,126
    No, it's all fine. Lots of 4 MB scores for bit truncated and mostly rubbish, except for a few 8MB/16MB scores, for byte truncated.
  • evanhevanh Posts: 15,126
    edited 2017-04-23 02:08
    Oh, maybe launching everything in parallel is a bad idea - just gobbled 10 GB of RAM!.

    EDIT: Wow! It really did the trick though. Finished the full word run in under a minute!
    EDIT2: I had 28 copies of PractRand running so, minusing 1.0 GB floor, it must use about 320 MB per instance.

    EDIT3: Lol, took a shot at s16 with it's candidate set of 84 and it nearly broke ... holding at the 29 GB mark ...
  • evanhevanh Posts: 15,126
    edited 2017-04-23 06:45
    Doh! Typical, it was the full word variant that had the bug. I had left in the 256 interval stepping which was not very efficient as a quick hack ... man, what a difference! s16 full word variant now finishes in 2.5 minutes flat. Now it's the other two variants that are comparatively slow. I'm gonna have to rework the balancing again ...
  • evanh wrote: »
    I've been trying to clean up my scripts to make it more user friendly before posting the whole package here. It's taken me quite a while to get over one particular hurdle of making a flexible line field reader that can pull the a,b,c constants from the above table and do a PractRand scoring run. And automatically repeating for each line until there is a score for every line.

    It wasn't a complicated solution but I just don't know bash nor my linux shell commands. I settled on using "awk" command.

    Next step is to have it also fill the table with the final scores. That'll be a whole new level of using awk me thinks.

    Evanh, I think I can help you with this.

    If you show me what is your input and what is the output you want, I can help you write the scripts.

    For simple scripts, I usually mix 'awk', 'cut' and 'tr'. It's simple. Just a few lines wrapped in a 'for' loop in bash gets the job done.

    I use Perl and RegExp when things gets more complicated. There is nothing you cannot do with Perl and RegExp.
  • evanhevanh Posts: 15,126
    edited 2017-04-23 23:57
    Thanks Ramon,
    I'm enjoying it for the moment. I've just learnt another, obvious in hindsight, detail that most operations are on text strings. It took me a while to realise an if[] wasn't working because it wasn't the numerical comparison I had expected. I've now found ["" -lt ""] ... there seems to be many ways to skin a cat in bash!

    I'm off to work right now ...
  • evanhevanh Posts: 15,126
    Ramon,
    Here's all my up to date sources, plus a copy of an s16 working directory, a little out of date but compatible, for an example of a candidate list "maxiter-s16.out" and their scoring output.

    I guess we're wanting to get the equivalent of the earlier tables, eg: http://forums.parallax.com/discussion/comment/1408198/#Comment_1408198. I'm thinking it'd probably be better to go in direction of CSV formatting though.
  • Evanh, you did an amazing job considering what the author of practrand throws as 'output'.

    I read 'bytecase-xoroshiro.sh' first to get a general idea of how the scripts are running.

    Please correct me if I am wrong:

    1) you compile xoroshiro with the 3 values (a,b,c).
    2) Later you run that program and inject into practrand using stdbuf (so I guess there must be some buffering issue to do that).
    3) Then you remove the big output file to save space, and just keep the report ('PRscore*')
    4) (Here goes the interesting part ...) You get the size of the report file. According to that report size you estimate (with SCOREMAP array) the true RNG length.

    I was thinking why just not parse the report file to find the true RNG length instead of doing a guess from the file size. And then I take a look inside one report file:

    OMG, look at the 'Evaluation' field.

    $ find . -type f -iname "PRscore-s16-*" | xargs grep '^ ' | cut -d':' -f2- | cut -c 61- | sort | uniq -c | sort -n
          1   FAIL !!!!
          2   FAIL !!!!!!
          9   FAIL !!
          9   FAIL !!!
         44   FAIL !
         51  VERY SUSPICIOUS
         52 suspicious
         59   FAIL
         67 very suspicious
        104 mildly suspicious
        306 unusual
       1356 Evaluation
      85215 normal
    

    - normal
    - unusual,
    - suspicious, mildly suspicious, very suspiciuous, VERY SUSPICIOUS
    - FAIL !, FAIL !!, FAIL !!!, FAIL !!!!, FAIL !!!!!!

    Is this a joke?? I will not call PractRand as 'script' friendly.

    Is there a single line that describes the RNG maximum lenght?
    How are we supposed to know if a RNG is good or bad?

    OK, I know ... 'mildly good', 'not so good', 'NOT SO GOOD, 'doesn't look bad', 'almost bad', 'BAD', 'BAD!', 'BAD!! , 'BAD!!!' ... :-D
  • cgraceycgracey Posts: 14,133
    edited 2017-04-25 00:28
    The early reports will usually show "pass". Each subsequent report covers double the prior report. As things begin to fail at longer run lengths, results other than "pass" begin to appear.
  • evanhevanh Posts: 15,126
    You got it. :) Except point 3, the deletion is just the small compiled binary after it has finished running. Just a little clutter removal is all.

    stdbuf -L was needed from the outset, I've not attempted to remedy it, due to redirected file buffering not flushing at program exit. The last piece of the PractRand output would always be cut off without stdbuf.

    I've been busy on other details so I haven't really tried to handle the PractRand output yet to be honest. The file size was good enough for displaying status progression.

    Regarding the PractRand output, as Chip says, if you have a look at the grouping of scoring inside a single file you'll see everything is "normal" until the very end of the file. PR spits out additional scoring data as it is evaluating the stream of random numbers it is receiving.

    "FAIL" is the only evaluation I have been interested in. FAIL is only in the final segment of the PR output. PR automatically exits at this point. Actually, I've not tried to find out if there is any options because that worked nicely.
  • I've visited this thread only sporadically of late, so I'm a bit lost as to what is being discussed or accomplished. Can anyone provide a simple tl;dr that summarizes which PRNGs are being considered and how they are faring under test?

    Thanks,
    -Phil
  • evanhevanh Posts: 15,126
    edited 2017-04-25 00:01
    I've visited this thread only sporadically of late, so I'm a bit lost as to what is being discussed or accomplished. Can anyone provide a simple tl;dr that summarizes which PRNGs are being considered and how they are faring under test?
    All done. Chip just has me documenting the process so it's usable for anyone that comes looking to do the same.

    The free running hardware generator is locked in as per original researcher's 128-state-bit algorithm.

    Chip also wanted to throw in a Cog instruction for a programmatically stepped generator that has a smaller state. We brute force solved the constants for a 32-state-bit generator. See http://forums.parallax.com/discussion/comment/1408079/#Comment_1408079
  • jmgjmg Posts: 15,140
    evanh wrote: »
    I've visited this thread only sporadically of late, so I'm a bit lost as to what is being discussed or accomplished. Can anyone provide a simple tl;dr that summarizes which PRNGs are being considered and how they are faring under test?
    All done. Chip just has me documenting the process so it's usable for anyone that comes looking to do the same.

    The free running hardware generator is locked in as per original researcher's 128-state-bit algorithm.

    Chip also wanted to throw in a Cog instruction for a programmatically stepped generator that has a smaller state. We brute force solved the constants for a 32-state-bit generator.

    I think this means
    a) Silicon has hardware PRNG upgraded to Xoroshiro128+ (accessed via GETRND, was 32-bit LFSR)
    IIRC this iterates every SysCLK, from reset, so is not predictable in the pattern, but is a high quality PRNG

    b) There is a smaller 32-state-bit generator that can be run in software.
    This is useful when you want to reproduce or compare testing.

    Did b) get a special opcode (hardware), or is that just a software library ?


  • evanhevanh Posts: 15,126
    Yes, yes, and yes I think there is a new opcode.
  • cgraceycgracey Posts: 14,133
    edited 2017-04-25 00:37
    evanh wrote: »
    Yes, yes, and yes I think there is a new opcode.

    Yes, I'll add to the FUNCS instruction to implement a 32-bit version of xoroshiro128+, which we've found to be the best, using Evanh's testing. We'll call it xoroshiro32+ in this thread, so in case anybody goes looking on the web for something smaller than the full 128-bit version, they'll find our proven 32-bit version. It's relatively small in state, but outputs quality numbers, like the bigger version. For software use, it's seed-able and step-able. All that's needed to iterate it is some XOR's and static shifts. After each iteration, you can get 15 bits of high-quality data out by adding the top and bottom words of the state long and use sum[15:01].
  • evanhevanh Posts: 15,126
    edited 2017-04-25 01:16
    I found SFUNC ... "Functions 0..7 are SPLITB, MERGEB, SPLITW, MERGEW, SEUSSF, SEUSSR, RGBSQZ, RGBEXP." Eek! You've used a two-operand instruction slot for those. Shouldn't those be elsewhere in the encoding map?

    Hmm, reading the individual instruction descriptions I see there is S data involved. Not clear on how the [2:0] bits stay out of the fray though.
  • I don't understand SFUNC either. Is S replaced by the function number, in which case where does S come from? Or does this require two instructions, the first one to set S? In which case D is duplicated.
Sign In or Register to comment.