Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 12 — Parallax Forums

Random/LFSR on P2

191012141592

Comments

  • Heater. wrote: »
    Yeah, that's why I said "philosophical".

    The C language, for example, abstracts only the basic common ideas of number crunching, flow control, and memory into the language. All else is farmed out to functions. Which turns out to be great if you want an easily portable language.

    If portability is not the goal, then anything goes. Just keep inventing keywords to do whatever.


    Shouldn't that conversation be moved to the "New Spin" thread?
  • I think so. Will comment there.





  • Heater.Heater. Posts: 21,230
    evanh,
    Cogs have no need to be secured from each other. Propeller isn't targetting general computing.
    I know. I was just playing with the idea.

    People here have been worrying about the correlation between COGs. My Poker game thing was just a way to put that into perspective.
    Works as good as the original.
    Who says?

    Have you tested both for 2^128 iterations to check?

  • evanhevanh Posts: 16,029
    Heater. wrote: »
    evanh,
    Works as good as the original.
    Have you tested both for 2^128 iterations to check?
    Of course, done it in two seconds. 2^128 in 0.1 seconds, 2^130 in 0.4 seconds, the remaining time was just me fluffing about.
  • cgraceycgracey Posts: 14,206
    edited 2017-03-10 04:30
    Roy Eltham wrote: »
    Spin1 has a lot of what would be consider library stuff built in, like BASIC usually does. So far, Chip is following that model for Spin2.

    I like the idea of separating library stuff from the language, but we need a way that isn't cumbersome.

    If library stuff is external to the language in the current model, then we need objects to contain them and calling them is object.whatever() instead of just whatever(). Then you need to reference the library object(s) in all your objects. It's kind of a slippery slope going down that route.

    I'm not sure if it's a good idea to force the added complexity in this case?

    I think we need this:

    An implied library which is an object, but does not need the object.method() syntax, just the method() syntax. If any method uses an unknown keyword, the implied object's methods get checked for a name match. Any object that uses any of those methods will have that implied object included, which is just a 2-long cost. The top-level file can specify this implied object. That way, Spin can get extended without any tool changes.
  • I've added a comment about this in the New Spin thread.

    -Phil
  • evanh wrote: »
    Heater. wrote: »
    evanh,
    Works as good as the original.
    Have you tested both for 2^128 iterations to check?
    Of course, done it in two seconds. 2^128 in 0.1 seconds, 2^130 in 0.4 seconds, the remaining time was just me fluffing about.

    I'm sorry, but I find this hard to believe.
    If you could do an iteration per clock (upper bound for one cog), with a 128MHz clock, you can do about 2^27 iterations per second, so it should run for 2^101 seconds, not 0.1 seconds. For any reasonable multiplier of parallelism and available clock speeds, it should take way too long to prove the sequence length that way.
  • He was using a quantum computer.

    -Phil
  • Heater.Heater. Posts: 21,230
    I thought he was interpreting "^" as the exclusive OR operation. As in C.

    So it was iterated 130 times in 0.1 seconds and 132 times in 0.4 seconds.

    Perhaps running on an old CP/M machine.

    :)


  • evanhevanh Posts: 16,029
    edited 2017-03-11 12:30
    evanh wrote: »
    I'll do a little more testing this weekend ... see if I can prove anything on small accumulator size that won't take so long to repeat.

    Right, here's where I've got to:
    #include <limits.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <stdint.h>
    
    
    
    #define  RESULT_SIZE  30    // (Min=8, Max=63)  Word size, in bits, of the generated random numbers.
    
    #define  ACCUM_MASK  (((1UL<<RESULT_SIZE)-1)<<1|1)   // One more bit than results.
    
    typedef  unsigned __int128  uint128_t;
    
    
    // Seed the state array.  Seed MUST not be all zero.
    static uint64_t  s[2] = {1, 0};
    
    
    static inline  uint64_t rotl(uint64_t value, int shift) {
    //	return (value << shift) | (value >> (sizeof(value) * CHAR_BIT - shift));
    	return (value << shift) | (value >> (RESULT_SIZE + 1 - shift)) & ACCUM_MASK;
    }
    
    
    static uint64_t  next(void) {
    	uint64_t  s0 = s[0];
    	uint64_t  s1 = s[1];
    //	uint64_t  result = (s0 + s1) >> 1;   // Remove lsb because not quite so random.
    	uint64_t  result = ((s0 + s1) & ACCUM_MASK) >> 1;   // Remove lsb because not quite so random.
    
    	s1 ^= s0;
    //	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14);   // a, b  Preset for 63 bit results
    //	s[1] = rotl(s1, 36);   // c
    	s[0] = rotl(s0, (RESULT_SIZE)*55/64+1) ^ s1 ^ (s1 << ((RESULT_SIZE)*14/64+1)) & ACCUM_MASK;   // a, b
    	s[1] = rotl(s1, (RESULT_SIZE)*36/64+1);   // c
    
    	return result;
    }
    
    
    
    int  main(int argc, char* argv[])
    {
    	uint64_t  random64;
    	int  i, j = 0;
    	uint8_t  k = 0;
    
    
    	// Print some randomness
    	while(1)
    	{
    		i = RESULT_SIZE - 8 + j;   // Reduced by the concatenating size.
    		random64 = next();
    //		printf( "%016lx ", random64 );
    
    		if( j ) {
    			k |= (random64 >> i) & (0xff >> j);   // Concatenate one byte.
    			i -= 8;   // One byte size.
    			putchar( k );
    //			printf( "%02x ", k );
    		}
    
    		while(i >= 0) {
    			k = random64 >> i;   // One byte.
    			i -= 8;   // One byte size.
    			putchar( k );
    //			printf( "%02x ", k );
    		}
    //		printf("\n");
    
    		j = 8 + i;  k = random64 << (8 - j);  // Any remainder.
    //		printf( "%02x %04d ", k, j );
    	}
    	return 0;
    }
    
  • evanhevanh Posts: 16,029
    edited 2017-03-11 13:14
    Using PractRand 0.93, the randomness for smaller word sizes shows a massive drop off the moment the data stream exceeds the specified s0/s1 word size, which is RESULT_SIZE + 1. I've tested RESULT_SIZEs from 21 to 31 so far.

    Extrapolating on this, I predict the original xoroshiro128+ to start failing above 2^64 bytes. This leads to the question of is this the repeat point? If so, then where did the 2^128 figure come from?
    Ah, just found an important statement from Heater's link - http://xoroshiro.di.unimi.it/ - It says "A long period does not imply high quality". So, the obvious conclusion to draw from this is the repeat period cannot be derived from testing failures.

  • evanhevanh Posts: 16,029
    edited 2017-03-11 12:31
    Ah, just worked out how to make PractRand go below 26 bits usefully. There's a -tlmin option for starting it's analysis at a smaller size ... made a calibration tweak to the above source.
  • Heater.Heater. Posts: 21,230
    evanh,
    This leads to the question of is this the repeat point? If so, then where did the 2^128 figure come from?
    From the Xoroshiro128+ web page:
    http://xoroshiro.di.unimi.it/

    Also it's implied in the source code, in the comments on the jump() function:
    http://xoroshiro.di.unimi.it/xoroshiro128plus.c

    And from wikipedia (Which is always correct, right?) :
    https://en.wikipedia.org/wiki/Xoroshiro128+

    Correction: The period seems to be 2^128 - 1

    But whose counting? :)
  • evanhevanh Posts: 16,029
    edited 2017-03-11 13:33
    Bugger, size 32 failed above 2^30 instead of the expected 2^33. That's not good. :(

    Size 33,34 and 35 aren't any better. Of note is the particular test PractRand is failing them on is called "DC6-9x1Bytes-1". Wonder what that is telling me ... other than I don't know what I'm doing that is ...

    Hmm, everything over 32 bits is unhealthy, maybe a bug in my source code ...
  • evanhevanh Posts: 16,029
    edited 2017-03-12 00:20
    Heater. wrote: »
    Correction: The period seems to be 2^128 - 1
    Wow, this forum software is a bit screwy. I've only just now seen your post Heater even though it was long before I finished making my updates to the forum. That reference I made to your link was actually from way back on page 2 of this topic.

    Part of the problem is there is no numbering of posts now. Another problem is the fuzzy datestamping of posts. One very quickly loses perspective on the spacing of posts.
  • evanhevanh Posts: 16,029
    Learnt a little bit of bash and made meself an auto looping script for dumping scores of every size:
    #!/bin/bash
    for i in `seq 8 63`; do
    	gcc -o xoroshiro128plus-test xoroshiro128plus-test.c -O3 -DRESULT_SIZE=${i}
    	./xoroshiro128plus-test | stdbuf -o L ./RNG_test stdin -a -tlmin 1KB >PRscore-size${i}.out
    done
    
  • Heater.Heater. Posts: 21,230
    edited 2017-03-18 21:06
    Could someone verify some results for me?

    I'm running Chip's verilog version of xoroshiro128plus under the icarus verilog simulator. From here: http://forums.parallax.com/discussion/comment/1402326/#Comment_1402326

    I added Chip's 63 bit to 16 times 32 bit shuffling code. From here: http://forums.parallax.com/discussion/comment/1402448/#Comment_1402448 Wrapped in a little module of it's own.

    All of this is wrapped in a little test bench which outputs the first and second 32 bit batches of bits from the shuffled rnd output. As may be delivered to COG's 0 and 1 for example. Like so:
    $ vvp xoroshiro128plus_tb.vvp | head -n 16
    e3fa6f4d 023fbfb6
    e3fa6f4d 023fbfb6
    e3fb6f4f 023f3fb6
    ebfa6f4d 023bbcfe
    ebf32f0f 043f3c74
    f77d670c 02b92fc6
    933d474b 00bc3ffe
    ddc96a0e caae38e0
    676c98ca 7cc86f5b
    1546879a 6439fe36
    1a8506da 8e355c80
    ff3a717d f26b4223
    afa3e2ae 288b6c24
    3cac2ea1 8d9c44d8
    c4c8a522 47fd41b3
    71bb3187 d5ce5309 
    
    This is my first verilog so if anyone can say if those results are correct or not that would be great. The bytes maybe in reverse order but that is no concern.

    This is my shuffle module:
    //
    // Take a 32 bit number in and distribute it's bits over a 1024 bit output bus
    // in some randomish pattern.
    // 
    module shuff
    (
        input  wire[62 : 0]	       x,
        output wire[32*16 - 1 : 0] out
    );
        assign out = { 
            {!x[00], x[62],!x[02], x[28],!x[09], x[55], x[03],!x[34], x[45], x[37],!x[11], x[07],!x[41],!x[50],!x[52],!x[59],!x[27],!x[40], x[01], x[57], x[17],!x[53],!x[24],!x[22],!x[58], x[25], x[16],!x[29],!x[43], x[32],!x[14], x[30]},
            {!x[47], x[49],!x[35],!x[38], x[33],!x[31], x[44], x[13], x[06], x[54],!x[05],!x[10],!x[23],!x[19],!x[51], x[20],!x[39],!x[18],!x[46],!x[42], x[36], x[08], x[04], x[60],!x[48],!x[56],!x[26], x[21],!x[12], x[61],!x[15], x[16]},
            {!x[23],!x[21],!x[20],!x[47], x[10], x[48], x[51], x[07],!x[19], x[03],!x[50], x[60],!x[36], x[38],!x[33],!x[06],!x[28], x[27], x[00], x[53], x[24], x[52],!x[14],!x[29],!x[04], x[08], x[42],!x[13],!x[45],!x[56],!x[54], x[61]},
            {!x[22], x[15], x[41],!x[05], x[59], x[58],!x[35],!x[26],!x[11],!x[12], x[31],!x[01], x[39],!x[43],!x[62],!x[25],!x[55],!x[32],!x[44],!x[02],!x[46],!x[37], x[09], x[17], x[49],!x[18],!x[34], x[40],!x[57], x[30],!x[56],!x[08]},
            {!x[37],!x[42],!x[33], x[35], x[39], x[17],!x[10], x[09], x[23],!x[02],!x[60],!x[36], x[18],!x[00], x[32],!x[21],!x[27], x[34], x[20], x[53], x[48], x[15], x[55],!x[14], x[19],!x[62], x[04],!x[29], x[22],!x[28],!x[06], x[03]},
            {!x[45], x[26], x[47],!x[43],!x[50],!x[11], x[57],!x[44], x[31],!x[12],!x[41], x[58],!x[13], x[25], x[07],!x[30], x[05],!x[61],!x[49],!x[59], x[46], x[01], x[52],!x[51],!x[32],!x[16],!x[24],!x[38],!x[54],!x[21], x[37], x[40]},
            { x[44],!x[30], x[52], x[62],!x[58], x[45], x[38],!x[55], x[00],!x[12], x[14], x[47],!x[13], x[39], x[53], x[40],!x[18], x[31],!x[03], x[41],!x[48],!x[27], x[09],!x[07],!x[17],!x[15],!x[56], x[20], x[16], x[51],!x[59],!x[11]},
            { x[04], x[06], x[22], x[26],!x[50], x[29],!x[34], x[19], x[60], x[05],!x[49], x[57],!x[33], x[10],!x[01], x[46], x[24],!x[08], x[25], x[23], x[35],!x[42], x[54], x[43],!x[36],!x[02],!x[31],!x[28], x[13],!x[15],!x[61],!x[53]},
            {!x[27], x[44],!x[33], x[52],!x[34], x[29],!x[20],!x[35], x[01], x[11], x[60],!x[14],!x[45], x[10],!x[36], x[22],!x[21],!x[55], x[62],!x[46],!x[16],!x[26],!x[07],!x[30],!x[17], x[51], x[02],!x[48],!x[42], x[43],!x[08], x[19]},
            {!x[38], x[49], x[05], x[28],!x[39], x[23],!x[57],!x[25],!x[18],!x[54],!x[61],!x[24],!x[09],!x[03], x[56], x[04], x[41], x[37], x[40],!x[47],!x[00],!x[06],!x[59],!x[50], x[51],!x[26],!x[22], x[32],!x[58], x[27],!x[12], x[48]},
            { x[12],!x[17], x[04],!x[54],!x[18], x[02],!x[35], x[44],!x[40], x[50], x[21], x[37], x[15], x[58],!x[19],!x[42], x[41], x[00],!x[47],!x[56], x[46], x[20], x[34], x[11], x[45], x[32],!x[10],!x[60], x[23],!x[62],!x[61], x[13]},
            { x[03], x[53],!x[08], x[33], x[29],!x[01], x[07],!x[28],!x[31],!x[24],!x[57], x[25], x[43],!x[49], x[55],!x[38],!x[06],!x[09], x[05],!x[59],!x[30], x[39], x[16],!x[52], x[14], x[21],!x[20],!x[36], x[13],!x[04],!x[37],!x[34]},
            {!x[14],!x[36],!x[35],!x[05], x[45],!x[06],!x[54],!x[41], x[23],!x[15],!x[26],!x[49],!x[39], x[30], x[29],!x[38], x[22],!x[01], x[55],!x[43],!x[46], x[52],!x[53], x[40],!x[16], x[27],!x[00],!x[57],!x[18], x[62], x[28],!x[58]},
            { x[07], x[10], x[32], x[50], x[03],!x[12], x[59], x[09],!x[19], x[56],!x[44], x[25], x[47],!x[08], x[31],!x[51], x[17], x[42],!x[61],!x[02],!x[33], x[24], x[30], x[04],!x[43], x[48],!x[49],!x[11],!x[22],!x[29],!x[60],!x[38]},
            { x[42], x[37], x[10], x[08], x[61], x[21],!x[00], x[20], x[51], x[57],!x[33],!x[01],!x[03],!x[26],!x[32],!x[05],!x[13], x[19],!x[48],!x[31],!x[02],!x[60],!x[07],!x[36],!x[62], x[27],!x[34],!x[14], x[50],!x[16],!x[17], x[11]},
            {!x[56],!x[23],!x[28], x[15], x[45], x[09],!x[24],!x[46],!x[53],!x[25],!x[52],!x[39],!x[40], x[44],!x[55], x[54], x[59],!x[21],!x[06], x[08],!x[18],!x[01],!x[11],!x[61], x[47],!x[41], x[12], x[49],!x[20],!x[29], x[35],!x[58]}
        };
    endmodule
    
    This is my test bench:
    module xoroshiro128plus_tb;
    
        /* Make a reset that pulses low once. */
        reg reset = 1;
    
        initial begin
        /* verilator lint_off STMTDLY */
            # 1 reset = 0;
            # 1 reset = 1;
        /* verilator lint_on STMTDLY */
        end
    
        /* Make a regular pulsing clock. */
        reg clk = 0;
        /* verilator lint_off STMTDLY */
        always #5
        /* verilator lint_on STMTDLY */
        begin
            clk = !clk;
        end
    
        wire [62:0]          x;
        wire [32*16 - 1 : 0] y;
    
        always @ (posedge clk)
        begin
            $write ("%h %h\n", y[31:0], y[63:32]); 
        end
    
        rnd r1 (.resn(reset), .clk(clk), .out(x)); 
    
        shuff s1 (.x(x), .out(y));
    
    endmodule
    

    If this is crappy verilog please do say!








  • I swear, we are gonna have an awesome PRNG.

    Personally, I never gave all this too much thought. There are a lot of great used for good random numbers. IMHO, your efforts are worth it.

    :D
  • Heater.Heater. Posts: 21,230
    Every now and then I get a little obsessed with random number generation and PRNGs. It's that little mystery over how do you know they are random enough? It all started a few decades ago when we needed to get some random seed for a secure, portable, military communications box. Turned out to be harder than I would have guessed.

    The P2 will have awesome random numbers. Not up to cryptographic strength but very good.

    And now Chip has got me tinkering with Verilog. I just want to run the actual verilog output through the statistical tests.

    Then I'm going to start designing my own CPU....:)
  • evanhevanh Posts: 16,029
    I've been a little distracted ... For the above single threaded xoroshiro testing I'm getting double the speed - over my ageing Athlon64 - on half the total system power. :)

    The Ryzen is very comfy, I'm liking it ... which is a just as well because I've dropped down somewhat more moola on a single chip than ever before. The basic overclocking multiplier is pretty cool now, it doesn't impact on the dynamic clocking feature so there is no noticeable increase in idle power. It'll make a cheap gaming option when the quad cores arrive.
  • Heater, maybe do a JS CPU. Double precision float as native datatype. :D

    Don't bother with anything else but bytes for strings.

  • Heater.Heater. Posts: 21,230
    Ha, I'm having a hard enough time making the xoroshiro work nicely in Verilog. A CPU is far away. My CPU would probably be a subleq machine.

    Hmm...maybe I should actually try and get xoroshiro ticking on a real FPGA. The Icarus Verilog simulator is very slow. Now where is that DE-0 Nano?....
  • cgraceycgracey Posts: 14,206
    edited 2017-03-19 07:31
    Heater. wrote: »
    Ha, I'm having a hard enough time making the xoroshiro work nicely in Verilog. A CPU is far away. My CPU would probably be a subleq machine.

    Hmm...maybe I should actually try and get xoroshiro ticking on a real FPGA. The Icarus Verilog simulator is very slow. Now where is that DE-0 Nano?....

    Yeah, Verilog started out as a simulation language and was later used for describing hardware to be synthesized. My understanding, anyway. I don't know the simulation side of Verilog, only using it for hardware description. An FPGA will run 1000's of times faster, if not millions of times faster.
  • Heater.Heater. Posts: 21,230
    That is the way I understand of Verilog history as well. Same for VHDL I think.

    The great thing about a simulator like Icarus is that it is less than a second between hitting save on my newly edited module and having the thing running. The edit/debug cycle is very quick. This makes great training wheels for a Verilog newbie.

    I have not tried for some years but I recall that doing this with the Altair tools is thousands of times more complicated and takes forever to compile, synthesize and run.

    Downside is of course speed. I have been pumping the output of my xoroshoiro into the dieharder tests. It's been running all night and only the first test step has passed!

    Next up is using Verilator. That compiles Verilog into C++ which then runs hundreds of times faster.

    Or perhaps go straight to FPGA. I need to see some LEDs flashing :)
  • Heater.Heater. Posts: 21,230
    I presume what I need is Quartus Prime Lite Edition (Free).

    Can't get it though. Altera's web site is under maintenance...
  • Heater.Heater. Posts: 21,230
    Need Windows version for this Surface Pro here.
  • Heater.Heater. Posts: 21,230
    edited 2017-03-19 07:11
    Altera download page is now up again.

    Can't sign up or log in as their back end server is down.

    Is it really so that Intel can't keep a web site on line?



    Edit: Ha! I can't even complain on Altera's "How are we doing" link from their download page:

    "We are sorry. We are unable to accept your feedback at this time."
  • evanhevanh Posts: 16,029
    edited 2017-03-19 09:28
    I've discovered things aren't stable on the Ryzen setup. It seemed okay until I was hammering it ... after a while the display would just go black. Needing a hard reset to reboot.

    I did a search and found out Linux has a known issue with the Zen architecture. Supposedly Kernel 4.10.1 onwards fixes it.

    I discovered I also can't even reliably download a newer iso using the Ryzen - md5sum failed its check. Now I'm wondering how much file corruption has occurred on my main install ... back to the old box for a while maybe ...
  • evanhevanh Posts: 16,029
    edited 2017-03-19 13:27
    Newest Kubuntu daily release worked. Been hammering away for the past hour or so with no problems. Now the mouse wheel reverse direction setting is broken! Damn it!!!! I'd just got used to the Mac way too, it solved all the backwards zooming in newer 3D stuff.

    I should try the earlier January beta release I guess. Actually, good test to do there - downloading another iso and check it's md5 ...


    Spoke too soon. :(
Sign In or Register to comment.