Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 28 — Parallax Forums

Random/LFSR on P2

1252628303192

Comments

  • TonyB_TonyB_ Posts: 2,196
    edited 2017-10-28 21:33
    deleted

  • TonyB_TonyB_ Posts: 2,196
    edited 2017-10-27 22:51
    TonyB_ wrote: »
    Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

    Here's the pseudo-code:
    
    ;set constants
    
    a = 253
    ;b = 8
    ;r = 8	
    
    ;initialise variables
    
    c = 0
    i = 0
    x[0] = 82
    x[1] = 97
    x[2] = 120
    x[3] = 111
    x[4] = 102
    x[5] = 116
    x[6] = 20
    x[7] = 12
    
    ;iterate CMWC
    
    t := a * x[i] + c
    c := t / 256			;high byte of t
    x[i] := (t mod 256) xor 255	;1's complement of low byte of t
    prn := x[i]
    i := (i + 1) and 7
    
    ;a,c,i,x[i] are 8-bit
    ;t is 16-bit
    ;Period > 2^66
    
    

    I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.

    First 16 prn values:
    F5, D1, 07, D6, C3, F6, C8, 0F, D3, 80, 45, 7A, 75, 20, 64, 66
    
  • cgraceycgracey Posts: 14,208
    Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:

    55304084
    769F2C7C
    DFFCA249
    F2F0202C
    4DC85ACB
    ADCDD1D0
    1B310598
    540D8094
    9C9DD6A5
    35B52EAD
    545DCF86
    95D4BE50
    33E7D2EA
    1DCC2F55
    70113655
    08B16602
  • TonyB_TonyB_ Posts: 2,196
    edited 2017-10-27 23:16
    cgracey wrote: »
    Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:

    55304084
    769F2C7C
    DFFCA249
    F2F0202C
    4DC85ACB
    ADCDD1D0
    1B310598
    540D8094
    9C9DD6A5
    35B52EAD
    545DCF86
    95D4BE50
    33E7D2EA
    1DCC2F55
    70113655
    08B16602

    First eight longs are correct, Chip.
  • Other eight longs also correct.
  • TonyB wrote: »
    Renaming XORO32 to RANDOM would make it so much easier to remember!
    cgracey wrote: »
    The XORO32 instruction iterates the state, but to actually get the random number out, you need to add the low and high 16-bit fields and use bits 15..1 of the sum, ignoring bit 0. So, just calling it RANDOM would be too simplistic.
    TonyB wrote: »
    Well, XORO32 will mean nothing to newbies, like me, especially when they find out that XORO16+16 would be more accurate.
    cgracey wrote: »
    Good point. We just used the naming precedent from xoroshiro128+. At first, I named it XORO32P, but that seemed twice as complex as XORO32.

    No excuse now! :)
  • cgraceycgracey Posts: 14,208
    edited 2017-10-27 23:27
    The word RANDOM, while descriptive, is also quite generic regarding implementation. I think our instruction name should reflect the uniqueness, to some degree. I kind of like XORO32, myself. It's got a mysterious aura.

    TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

    Here's the Prop2 code that generated those 16 numbers:
    again	mov	x,#16
    	mov	y,#1
    
    .lp	xoro32	y
    	mov	z,0
    	call	#z_to_hex
    	djnz	x,#.lp
    

    I want to be sure this is what we all think it is.
  • cgracey wrote: »
    TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

    Here's the Prop2 code that generated those 16 numbers:
    again	mov	x,#16
    	mov	y,#1
    
    .lp	xoro32	y
    	mov	z,0
    	call	#z_to_hex
    	djnz	x,#.lp
    

    I want to be sure this is what we all think it is.

    I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

    I use Z80 assembler! :cool:
  • cgraceycgracey Posts: 14,208
    TonyB_ wrote: »
    cgracey wrote: »
    TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

    Here's the Prop2 code that generated those 16 numbers:
    again	mov	x,#16
    	mov	y,#1
    
    .lp	xoro32	y
    	mov	z,0
    	call	#z_to_hex
    	djnz	x,#.lp
    

    I want to be sure this is what we all think it is.

    I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

    I use Z80 assembler! :cool:

    Z-80!!! Anything is possible, I guess. I'm curious to know how old you are. From your exuberance, I'd say you're young, but if you know Z-80...

    Anyway, I guess all I need to know is that this new XORO32 mechanism tests well in the randomness analyzers.
  • TonyB_TonyB_ Posts: 2,196
    edited 2018-07-07 17:17
    I'm kind of like XORO32, myself. I've got a mysterious aura.

    Here are the first 32 values, generated using QuickBASIC!

    55304084
    769F2C7C
    DFFCA249
    F2F0202C
    4DC85ACB
    ADCDD1D0
    1B310598
    540D8094
    9C9DD6A5
    35B52EAD
    545DCF86
    95D4BE50
    33E7D2EA
    1DCC2F55
    70113655
    08B16602
    3AC5CE5C
    8E98D521
    DCECB400
    0F410520
    F177AAA5
    5EE33D17
    5688B513
    CD39F4CD
    3B7D3953
    C9288B3B
    87D0A270
    D4C14834
    F024057E
    6F8986D5
    172C8FC9
    BB56FE23
  • cgraceycgracey Posts: 14,208
    Thanks, TonyB_.

    That all checks out on this end.

    Now, do we know that this scores well?
  • cgraceycgracey Posts: 14,208
    Ia 14,2,7 still the optimal setting for how XORO32 is now working?
  • evanhevanh Posts: 16,039
    TonyB_ wrote: »
    Thanks, Chip. Don't we want the first sum in the low word?
    wire [31:0] xoro32r	= {xoro32b[15:1], ^xoro32b,		// xoro32r = prng result
    			   xoro32a[15:1], ^xoro32a};
    
    No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.
  • evanhevanh Posts: 16,039
    cgracey wrote: »
    Okay. I need one of you guys to tell me if these first 16 values from an initial seed of $00000001 are correct:
    55304084  769F2C7C  DFFCA249  F2F0202C  4DC85ACB  ADCDD1D0  1B310598  540D8094
    9C9DD6A5  35B52EAD  545DCF86  95D4BE50  33E7D2EA  1DCC2F55  70113655  08B16602
    
    Affirmative, my first 40 16-bit sums:
    0001 4084 5530 2c7c 769f a249 dffc 202c f2f0 5acb 4dc8 d1d0 adcd 0598 1b31 8094 540d d6a5 9c9d 2ead 35b5 cf86 545d be50 95d4 d2ea 33e7 2f55 1dcc 3655 7011 6602 08b1 ce5c 3ac5 d521 8e98 b400 dcec 0520
    
  • evanhevanh Posts: 16,039
    edited 2017-10-28 05:36
    TonyB_ wrote: »
    cgracey wrote: »
    TonyB_, are you certain that your number generation is the same as Evanh's? Thanks for checking those values.

    I want to be sure this is what we all think it is.

    I don't know what Evan uses, probably C. I'm sure he'll reply when he's recovered from staying up all night.

    I use Z80 assembler! :cool:

    Yep, C for the tight loops, plus PractRand. And Bash scripts for gluing it all together. The RNG and data formatting code is generalised so I can use any word size from 12 bits up to 64 bits. I've recently split the basic algorithm out of the data formatting code so that multiple versions of the data formatters can quickly be reassigned to any RNG.


    EDIT: Renamed an incorrectly used term.
  • evanhevanh Posts: 16,039
    cgracey wrote: »
    Ia 14,2,7 still the optimal setting for how XORO32 is now working?

    That's a big question right now. One explanation for the notable improvement of triplet [15 3 6], when swapping bit9 for bit1, is another point that Melissa O'Neill makes about all LCGs, like Xoroshift, is that they will, with enough state bits, pass every current statistical test.

    So, a possible way to interpret the good result of moving bit1 up to bit9 position is this has moved the dodginess out to the 512M statistical mark. Does this actually strengthen a small weakness? Dunno.
  • evanhevanh Posts: 16,039
    To see if there anything to compare I've just built score tables for every byte sampling variant, for the s16 triplet set, on both our current algorithm and also with the bit9<->bit1 swap.

    Nothing notable showed up. There looked to be better candidates until I then compared with the word variants and that always had something bad.

    I guess that means I'm testing wrong or the quality of the algorithm still needs work. I should try the PCG types ...
  • evanhevanh Posts: 16,039
    edited 2017-10-28 08:06
    cgracey wrote: »
    Ia 14,2,7 still the optimal setting for how XORO32 is now working?
    Yes, accepting the 128M "Byte1" score, this is the best triplet for current algorithm. All other byte variants of this triplet came out at 512M or 1G.
  • TonyB_TonyB_ Posts: 2,196
    edited 2017-10-28 09:01
    evanh wrote: »
    TonyB_ wrote: »
    Thanks, Chip. Don't we want the first sum in the low word?
    wire [31:0] xoro32r	= {xoro32b[15:1], ^xoro32b,		// xoro32r = prng result
    			   xoro32a[15:1], ^xoro32a};
    
    No change needed there. Chip got that right for little-endian. Remember, for those 32 bits of xoro32r, when accessed as bytes, the end bits written there [7:0] are the first byte. So, the bits from xoro32a are already in low memory.

    Chip edited his original post. I think it is right now and wire[7:0] xoro32r = xoro32a[7:0] = the first byte.

    On another point, in xoroshiro128plus.c at http://xoroshiro.di.unimi.it/ the sum and the state are out of sync by one iteration. The sum is calculated first so that the code could run faster in parallel but the summing should be at the end as in XORO32. This is mentioned in one of the c files.
  • TonyB_ wrote: »
    TonyB_ wrote: »
    Someone is saying that Xoroshiro32+ cannot possibly be as a good as an 8-bit Complementary Multiply-With-Carry PRNG and that CMWC in fact "is in an entirely different league."

    Here's the pseudo-code:
    
    ;set constants
    
    a = 253
    ;b = 8
    ;r = 8	
    
    ;initialise variables
    
    c = 0
    i = 0
    x[0] = 82
    x[1] = 97
    x[2] = 120
    x[3] = 111
    x[4] = 102
    x[5] = 116
    x[6] = 20
    x[7] = 12
    
    ;iterate CMWC
    
    t := a * x[i] + c
    c := t / 256			;high byte of t
    x[i] := (t mod 256) xor 255	;1's complement of low byte of t
    prn := x[i]
    i := (i + 1) and 7
    
    ;a,c,i,x[i] are 8-bit
    ;t is 16-bit
    ;Period > 2^66
    
    

    I like to know how this compares to the top byte of Xoroshiro32+ [14,2,7] but I don't have the testing tools.

    First 16 prn values:
    F5, D1, 07, D6, C3, F6, C8, 0F, D3, 80, 45, 7A, 75, 20, 64, 66
    

    Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.
  • cgraceycgracey Posts: 14,208
    edited 2017-10-28 09:33
    evanh wrote: »
    cgracey wrote: »
    Ia 14,2,7 still the optimal setting for how XORO32 is now working?
    Yes, accepting the 128M "Byte1" score, this is the best triplet for current algorithm. All other byte variants of this triplet came out at 512M or 1G.

    Okay. Great!

    So, we can consider the improved XORO32 instruction to be optimized and DONE.

    I think it is quite complete now and it has maybe realized its full potential as a seedable/steppable 32-bit PRNG for software use. It couldn't be higher quality for 32 bits and it couldn't be any easier to use.

    It's very neat how your guys' ongoing efforts to understand and improve on this PRNG topology all came together, at the last minute, to present a way for encapsulstion, where every aspect of it could be so tidily sewn up. Very nice!

    There are so many neat things in the prop2 that are the result of so many people having so much inspiration. There are interesting stories behind every bit of it. I've got to pm Johannes, the original poster on this thread, so he can see how things have turned out.

    I'm going to add some seeding code in the ROM booter for our central XOROSHIRO 128+ generator. I"ll use P0's ADC calibration mode to garner a bunch of random bits to stuff in it, for like 100us, continuously. If this works, on no power-up will any Prop2 ever seed its PRNG with the same value as it has before, or any other Prop2 ever has or will in the future. We'll have randomness out the wazoo!

    I keep smiling about how TonyB_'s fitful concern over getting a solid 16, 32, or 64 bits of quality per iteration, regardless of how awkward it might have worked using C and Z flags as extra storage, all resulted in some magical confluence of realizations, to make something maybe perfect, in the end.

    And Evanh has been developing all these tests to discover what works best. It's like a big hunt.

    Please keep thinking about PRNG's. I just wanted to acknowledge the current milestone we've reached.
  • Yes, something magical happened to XORO32 in the past day or so and it's N times better than before, where N is quite a big number. Many thanks to David Roberts for the parity idea, Evan for all his testing and Chip for the Verilog change that seemed to appear from nowhere!

    Chip, is the endian-ness of xoro32r definitely correct now?
  • cgraceycgracey Posts: 14,208
    Yes, I had modified my Verilog post after your endian comment. It is what you see.
  • evanhevanh Posts: 16,039
    Thanks Chip and Tony. Glad to help.

    I wonder if Mr Robert's parity calculation includes the carry bit. That is what made the big improvement to the parity idea.
  • evanhevanh Posts: 16,039
    edited 2017-10-28 10:04
    TonyB_ wrote: »
    Evan, if you have the time and the interest, could you please confirm the outputs and run a PractRand test on this? It's the only test of CMWC I'll ever ask you to do.
    I've done the code entry for the algorithm:
    static uint8_t  next(void)  {
    //initialise variables
    	static uint8_t  i = 0;  // table index, incremented per iteration
    	static uint8_t  c = 0;  // high byte of t, held for next iteration
    	static uint8_t  x[8] = {82, 97, 120, 111, 102, 116, 20, 12};  // main state table, one entry changed per iteration
    
    	uint8_t  prn;
    	uint16_t  t;
    
    
    //iterate CMWC
    	t = _A * x[i] + c;
    	c = t / 256;    //high byte of t
    	x[i] = (t % 256) ^ 255;    //1's complement of low byte of t
    	prn = x[i];
    	i = (i + 1) & 7;
    
    	return prn;
    }
    
  • Excellent! Variable c is actually the carry in Complementary Multiply-With-Carry.
  • evanhevanh Posts: 16,039
    Oops, t should be 16 bits. I'll fix that ...
  • TonyB_TonyB_ Posts: 2,196
    edited 2017-10-28 10:10
    Could you simply return x[ i ] (spaces to prevent italics) and get rid of prn? I only added prn to make things clear.
  • Is _A declared as a constant = 253 somewhere else?
  • evanhevanh Posts: 16,039
    edited 2017-10-28 10:47
    Clarity is nice. Optimiser takes care of variable elimination.

    Yep, _A is what's called a #define. It's just a straight text replacement ahead of compiling. It never becomes a compile time symbol. Doing that that way was completely arbitrary. The labelling style of underscore-capital is sort of a convention in C for this type of code.
Sign In or Register to comment.