Shop OBEX P1 Docs P2 Docs Learn Events
Random/LFSR on P2 - Page 15 — Parallax Forums

Random/LFSR on P2

1121315171892

Comments

  • cgraceycgracey Posts: 14,206
    I made a program to use all 16 cogs to search the 16x16x16 space for our made-up xoroshiro32+ PRNG. It's running now, but won't have results for several hours:
    dat
    '
    ' Launch cogs 15..0 with xoroshiro32+ search program
    '
    	org
    
    	setq	##$1000/4-1	'clear 4096 bytes starting at $1000
    	wrlong	#0,##$1000
    
    .loop	coginit	cognum,#@search	'last iteration relaunches cog 0
    	djns	cognum,#.loop
    
    cognum	long	15
    '
    ' Search 1 x 16 x 16 solutions - each cog
    '
    	org
    
    search	cogid	t1		't1 is cogid 0..15
    	mov	t2,#0		't2 ranges 0..15
    	mov	t3,#0		't3 ranges 0..15
    
    new	mov	state,#1	'new test, seed state
    	mov	tests,#0	'reset test counter
    
    loop	getword	s0,state,#0	'extract s0 and s1 words from state long
    	getword	s1,state,#1
    
    	setword	s0,s0,#1	'copy bottom words to top words for ROL
    	setword	s1,s1,#1
    
    	xor	s1,s0		's1 ^= s0
    
    	rol	s0,t1		's0 = rol(s0,t1) ^ s1 ^ (s1 << t2)
    	xor	s0,s1
    	mov	x,s1
    	shl	x,t2
    	xor	s0,x
    
    	rol	s1,t3		's1 = rol(s1,t3)
    
    	setword	state,s0,#0	'pack s0 and s1 back into state long
    	setword	state,s1,#1
    
    	ijz	tests,#fail	'if 2^32 tests, caught in loop, fail
    
    	cmp	state,#1  wz	'loop if not returned to seed value
    if_nz	jmp	#loop
    
    	ijnz	tests,#fail	'if not 2^32-1 iterations, fail
    
    	decod	x,#12		'2^32-1 iterations!
    	setnib	x,t1,#2
    	setnib	x,t2,#1
    	setnib	x,t3,#0
    	wrbyte	#$FF,x		'mark byte in buffer
    
    fail	incmod	t2,#15	wc	'try next solution, inc t2/t3
    if_c	incmod	t3,#15	wc
    if_nc	jmp	#new
    
    	test	t1	wz	'this cog's solution space checked
    if_nz	cogstop	t1		'if not cog0, stop now
    '
    ' Cog 0 reports the results
    '
    	bmask	dirb,#15	'output to leds
    	mov	ptra,#0		'reset ptra
    
    show	rdbyte	x,ptra[##$1000]	wz	'get byte[ptra+$1000], 0?
    
    if_nz	mov	outb,ptra	'show 3 nibbles on leds
    
    if_nz	jp	#57,#$		'wait for pb3 cycle
    if_nz	waitx	##80_000_000/10
    if_nz	jnp	#57,#$
    if_nz	waitx	##80_000_000/10
    
    	incmod	ptra,##$FFF	'find and display next solution
    	jmp	#show
    '
    ' Variables
    '
    state	res	1
    tests	res	1
    t1	res	1
    t2	res	1
    t3	res	1
    s0	res	1
    s1	res	1
    x	res	1
    
  • cgraceycgracey Posts: 14,206
    edited 2017-04-11 09:54
    Heater. wrote: »
    How about this one from encryption guru Bruce Schneier:
    https://www.schneier.com/academic/archives/1994/09/pseudo-random_sequen.html

    Period 2^32 - 1
    int RANDOM ()  {
        static unsigned long register; /*Register must be unsigned so right
                                         shift works properly.*/
        /*Register should be initialized with some random value.*/
        register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/
                   ^ (register >> 6)   /*XOR it with the seventh bit*/
                   ^ (register >> 4)   /*XOR it with the fifth bit*/
                   ^ (register >> 2)   /*XOR it with the third bit*/
                   ^ (register >> 1)   /*XOR it with the second bit*/
                   ^ register)         /*and XOR it with the first bit.*/
                   & 0x0000001)        /*Strip all the other bits off and*/
                   <<31)               /*move it back to the 32nd bit.*/
                   | (register >> 1);  /*Or with the register shifted right.*/
        return register & 0x00000001;  /*Return the first bit.*/
    }
    


    As he says:

    "One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don't fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function."

    That's a simple LFSR. This would be the code for it in PASM:
    '
    ' Iterate lfsr, result in C flag
    '
    iterate	test	lfsr,mask	wc	'get parity of AND result
    _ret_	rcr	lfsr,#1		wc	'new bit in, old bit out
    
    lfsr	long	1
    mask	long	$8000_0056
    
  • evanhevanh Posts: 16,029
    cgracey wrote: »
    I effectively made this and it ran for $7FFF_FFFF iterations before it repeated the seed value of 1:
    Still an excellent result for a shot in the dark. I've not tried that test with any of my runs. It does beg the question of how easy it seems to be to get a long repeat sequence. Looks like it happens quite easy.
    uint16_t next(void) {
    	const uint16_t s0 = s[0];
    	const_uint16_t s1 = s[1];
    	const uint16_t result = s0 + s1;
    
    	s1 ^= s0;
    	s[0] = rotl(s0, 14) ^ s1 ^ (s1 << 4);
    	s[1] = rotl(s1, 9);
    
    	return result;
    }
    
    There is significant variations in the PractRand test results when differing constants are chosen. Here's a collect of Xoroshiro32+ runs I've just tried:
    a=11; b=1; c=6: High quality to 256 KBytes.
    a=11; b=2; c=6: High quality to 128 KBytes.
    a=11; b=3; c=6: High quality to 64 KBytes.
    a=11; b=4; c=6: High quality to 64 KBytes.
    a=11; b=5; c=6: High quality to 64 KBytes.
    a=11; b=1; c=7: High quality to 64 KBytes.
    a=11; b=2; c=7: High quality to 128 KBytes.
    a=11; b=3; c=7: High quality to 256 KBytes.
    a=11; b=4; c=7: High quality to 128 KBytes.
    a=11; b=5; c=7: High quality to 32 KBytes.
    a=11; b=1; c=8: High quality to 128 KBytes.
    a=11; b=2; c=8: High quality to 16 KBytes.
    a=11; b=3; c=8: High quality to 32 KBytes.
    a=11; b=4; c=8: High quality to 128 KBytes.
    a=11; b=5; c=8: High quality to 64 KBytes.
    a=11; b=1; c=9: High quality to 64 KBytes.
    a=11; b=2; c=9: High quality to 64 KBytes.
    a=11; b=3; c=9: High quality to 64 KBytes.
    a=11; b=4; c=9: High quality to 128 KBytes.
    a=11; b=5; c=9: High quality to 128 KBytes.
    a=11; b=1; c=10: High quality to 128 KBytes.
    a=11; b=2; c=10: High quality to 64 KBytes.
    a=11; b=3; c=10: High quality to 64 KBytes.
    a=11; b=4; c=10: High quality to 32 KBytes.
    a=11; b=5; c=10: High quality to 256 KBytes.
    a=12; b=1; c=6: High quality to 256 KBytes.
    a=12; b=2; c=6: *Instant fail!!!
    a=12; b=3; c=6: High quality to 128 KBytes.
    a=12; b=4; c=6: *Instant fail!!!
    a=12; b=5; c=6: High quality to 32 KBytes.
    a=12; b=1; c=7: High quality to 256 KBytes.
    a=12; b=2; c=7: High quality to 64 KBytes.
    a=12; b=3; c=7: High quality to 256 KBytes.
    a=12; b=4; c=7: High quality to 32 KBytes.
    a=12; b=5; c=7: High quality to 64 KBytes.
    a=12; b=1; c=8: High quality to 64 KBytes.
    a=12; b=2; c=8: *Instant fail!!!
    a=12; b=3; c=8: High quality to 32 KBytes.
    a=12; b=4; c=8: *Instant fail!!!
    a=12; b=5; c=8: High quality to 128 KBytes.
    a=12; b=1; c=9: High quality to 128 KBytes.
    a=12; b=2; c=9: High quality to 128 KBytes.
    a=12; b=3; c=9: High quality to 64 KBytes.
    a=12; b=4; c=9: High quality to 64 KBytes.
    a=12; b=5; c=9: High quality to 64 KBytes.
    a=12; b=1; c=10: High quality to 32 KBytes.
    a=12; b=2; c=10: *Instant fail!!!
    a=12; b=3; c=10: High quality to 256 KBytes.
    a=12; b=4; c=10: *Instant fail!!!
    a=12; b=5; c=10: High quality to 128 KBytes.
    a=13; b=1; c=6: High quality to 128 KBytes.
    a=13; b=2; c=6: High quality to 128 KBytes.
    a=13; b=3; c=6: High quality to 64 KBytes.
    a=13; b=4; c=6: High quality to 256 KBytes.
    a=13; b=5; c=6: High quality to 64 KBytes.
    a=13; b=1; c=7: High quality to 64 KBytes.
    a=13; b=2; c=7: High quality to 256 KBytes.
    a=13; b=3; c=7: High quality to 32 KBytes.
    a=13; b=4; c=7: High quality to 512 KBytes. (Winner #1)
    a=13; b=5; c=7: High quality to 64 KBytes.
    a=13; b=1; c=8: High quality to 128 KBytes.
    a=13; b=2; c=8: High quality to 64 KBytes.
    a=13; b=3; c=8: High quality to 256 KBytes. (This one conforms to my earlier description)
    a=13; b=4; c=8: High quality to 32 KBytes.
    a=13; b=5; c=8: High quality to 8 KBytes.
    a=13; b=1; c=9: High quality to 128 KBytes.
    a=13; b=2; c=9: High quality to 64 KBytes.
    a=13; b=3; c=9: High quality to 32 KBytes.
    a=13; b=4; c=9: High quality to 128 KBytes.
    a=13; b=5; c=9: High quality to 64 KBytes.
    a=13; b=1; c=10: High quality to 64 KBytes.
    a=13; b=2; c=10: High quality to 64 KBytes.
    a=13; b=3; c=10: High quality to 8 KBytes.
    a=13; b=4; c=10: High quality to 8 KBytes.
    a=13; b=5; c=10: High quality to 128 KBytes.
    a=14; b=1; c=6: High quality to 256 KBytes.
    a=14; b=2; c=6: *Instant fail!!!
    a=14; b=3; c=6: High quality to 256 KBytes.
    a=14; b=4; c=6: *Instant fail!!!
    a=14; b=5; c=6: High quality to 128 KBytes.
    a=14; b=1; c=7: High quality to 128 KBytes.
    a=14; b=2; c=7: High quality to 256 KBytes.
    a=14; b=3; c=7: High quality to 256 KBytes.
    a=14; b=4; c=7: High quality to 64 KBytes.
    a=14; b=5; c=7: High quality to 512 KBytes. (Winner #2)
    a=14; b=1; c=8: High quality to 64 KBytes.
    a=14; b=2; c=8: *Instant fail!!!
    a=14; b=3; c=8: High quality to 64 KBytes.
    a=14; b=4; c=8: *Instant fail!!!
    a=14; b=5; c=8: High quality to 32 KBytes.
    a=14; b=1; c=9: High quality to 64 KBytes.
    a=14; b=2; c=9: High quality to 128 KBytes.
    a=14; b=3; c=9: High quality to 64 KBytes.
    a=14; b=4; c=9: High quality to 128 KBytes. (This is your above constants)
    a=14; b=5; c=9: High quality to 16 KBytes.
    a=14; b=1; c=10: High quality to 128 KBytes.
    a=14; b=2; c=10: *Instant fail!!!
    a=14; b=3; c=10: High quality to 64 KBytes.
    a=14; b=4; c=10: *Instant fail!!!
    a=14; b=5; c=10: High quality to 32 KBytes.
    a=15; b=1; c=6: High quality to 256 KBytes.
    a=15; b=2; c=6: High quality to 512 KBytes. (Winner #3)
    a=15; b=3; c=6: High quality to 128 KBytes.
    a=15; b=4; c=6: High quality to 64 KBytes.
    a=15; b=5; c=6: High quality to 512 KBytes. (Winner #4)
    a=15; b=1; c=7: High quality to 128 KBytes.
    a=15; b=2; c=7: High quality to 128 KBytes.
    a=15; b=3; c=7: High quality to 128 KBytes.
    a=15; b=4; c=7: High quality to 128 KBytes.
    a=15; b=5; c=7: High quality to 64 KBytes.
    a=15; b=1; c=8: High quality to 256 KBytes.
    a=15; b=2; c=8: High quality to 64 KBytes.
    a=15; b=3; c=8: High quality to 128 KBytes.
    a=15; b=4; c=8: High quality to 64 KBytes.
    a=15; b=5; c=8: High quality to 256 KBytes.
    a=15; b=1; c=9: High quality to 256 KBytes.
    a=15; b=2; c=9: High quality to 64 KBytes.
    a=15; b=3; c=9: High quality to 64 KBytes.
    a=15; b=4; c=9: High quality to 64 KBytes.
    a=15; b=5; c=9: High quality to 64 KBytes.
    a=15; b=1; c=10: High quality to 128 KBytes.
    a=15; b=2; c=10: High quality to 128 KBytes.
    a=15; b=3; c=10: High quality to 64 KBytes.
    a=15; b=4; c=10: High quality to 32 KBytes.
    a=15; b=5; c=10: High quality to 16 KBytes.

    * Note that, no matter the result size, whenever all three constants are even numbers the tests are instant off-the-chart failures for every test of the battery.
  • evanhevanh Posts: 16,029
    edited 2017-04-11 17:10
    Whaaa! I think I've messed up somewhere. I've rebuilt my code and am now getting up to 512 MBytes! You might have to discard most of what I've written there, sorry.

    The even constants rule still applies though.

    Smile, it's 5:10AM too!
  • cgraceycgracey Posts: 14,206
    edited 2017-04-11 17:35
    evanh wrote: »
    Whaaa! I think I've messed up somewhere. I've rebuilt my code and am now getting up to 512 MBytes! You might have to discard most of what I've written there, sorry.

    The even constants rule still applies though.

    Smile, it's 5:10AM too!

    I went to bed late, too, and now I feel lousy.

    When I get into the office today, I'm hoping my 16-cog xoroshiro32+ tester has some full-length results. Then, let's test those. That's when we'll know for sure if we'very got something. Your initial tests showed some promise, anyway.
  • cgraceycgracey Posts: 14,206
    Cogs are starting to COGSTOP themselves on the xoroshiro32+ search program I started running last night. Maybe in an hour, I'll be able to see if we have any results - that is, $FFFF_FFFF iterations before returning to $0000_0001. If we get some good rotate/shift-value sets, I think Evanh will be able to test their actual quality.
  • cgraceycgracey Posts: 14,206
    edited 2017-04-11 21:42
    Evanh,

    Here is the algorithm:
    uint16_t next(void) {
    	const uint16_t s0 = s[0];
    	const_uint16_t s1 = s[1];
    	const uint16_t result = s0 + s1;
    
    	s1 ^= s0;
    	s[0] = rotl(s0, t1) ^ s1 ^ (s1 << t2);
    	s[1] = rotl(s1, t3);
    
    	return result;
    }
    

    It's looking like these values for t1, t2, and t3 produce maximal-length runs:

    10,7,11
    11,1,2
    11,2,6
    11,3,10
    11,7,10

    There's going to be a bunch more possibilities, but could you please test a few of these and discover if there is any qualitative difference among them?
  • cgraceycgracey Posts: 14,206
    edited 2017-04-12 02:28
    After running that 16-cog xoroshiro32+ search program for 15 hours, I got this data out.

    Note that for each 3-digit hex number, the 1st nibble is t1, the second nibble is t2, and the third nibble is t3. So, for the first data set, $05B, t1=0, t2=5, and t3=11.

    I'm hoping Evanh will run a few of these cases through dieharder and find out if we have a quality solution here for a 32-bit state PRNG. I would only bother with sets that have all different digits and no 0's in them.
    05B
    0F1
    128
    1F0
    21B
    21F
    227
    26F
    287
    299
    2B3
    326
    33A
    3B2
    3BE
    419
    47F
    485
    526
    584
    5EC
    623
    625
    62B
    63F
    6EB
    718
    722
    72A
    782
    7AA
    7F8
    817
    821
    85D
    87F
    89D
    8F7
    914
    92E
    98E
    992
    A27
    A33
    A3B
    A5D
    A7B
    AA7
    B12
    B1E
    B26
    B3A
    B50
    B7A
    B8C
    BAC
    BE6
    C4F
    C8B
    CAB
    CAD
    CE5
    D3E
    D58
    D5A
    D98
    DAC
    DBE
    DCE
    DDE
    E1B
    E27
    E29
    E3D
    E89
    EB3
    EBB
    EBD
    ECD
    EDD
    F12
    F36
    F4C
    F62
    F74
    F78
    
  • cgraceycgracey Posts: 14,206
    Evanh,

    Could you please test this one first:

    t1 = 5
    t2 = 2
    t3 = 6

    Thanks. I just did a separate test on that one and it looks really good to me. I wonder what dieharder would say.
  • evanhevanh Posts: 16,029
    Sure thing. I'll be off work in about 3 hours.
  • cgraceycgracey Posts: 14,206
    evanh wrote: »
    Sure thing. I'll be off work in about 3 hours.

    Cool.
  • evanhevanh Posts: 16,029
    Ended up doing overtime at work.

    I've repaired my bug. I wasn't completely taking care of arbitrary length in the rotl() function. I'd suffered from it from day one I think.
    #define  ACCUM_MASK  (((1UL<<RESULT_SIZE)-1)<<1|1)   // One more bit than results.
    
    static inline  uint64_t rotl(uint64_t value, int shift) {
    	return (value << shift) | ((value & ACCUM_MASK) >> (RESULT_SIZE + 1 - shift)) & ACCUM_MASK;
    }
    

    Right, rerunning last night's effort, my best Xoroshiro32+ high quality run was 1 GByte with constants: 14,4,7. Next best was 14,3,7 with one very suspicious case at 1 GByte. Both of these failed at 2 GByte. Massive fix from last night. :)

    Of the ones you've highlighted:
    10,7,11 HQ to 256 MB. Fails at 512 MB.
    11,1,2 HQ to 64 MB. Fails at 256 MB.
    11,2,6 HQ to 256 MB. Fails at 512 MB.
    11,3,10 HQ to 8 MB. Fails at 32 MB.
    11,7,10 HQ to 1 MB. Fails at 4 MB.
    5,2,6 HQ to 64 MB. Fails at 128 MB.
  • evanhevanh Posts: 16,029
    10,3,5 is not quite as clean as 14,4,7 but I'll still give it a HQ to 1 GB. Fails at 2 GB.
  • evanhevanh Posts: 16,029
    I've confirmed same result from using Heater's original rotl() but with uint16_t and also plugging in your above routine without any of my auto sizing tricks.

    Attached is the 14,4,7 full test output. What's notable now is the solid and dramatic failure case at 2 GBytes. Whereas when I had the bug in my rotl() function, all the higher sized tests failed in the same single test case every time - DC6-9x1Bytes-1.
  • cgraceycgracey Posts: 14,206
    edited 2017-04-12 16:41
    Evanh, thanks for trying those.

    How many bits are you sampling at each iteration? I would think bits 15..1 of the addition would be the best.

    When you say a test failed at 2GB, how many iterations was that? Ideally, we should be getting 4G iterations passing, right? All those patterns should repeat at 4G - 1.
  • evanhevanh Posts: 16,029
    edited 2017-04-13 00:30
    Generator result size is 15 bits and I continuously concatenate those together to feed bytes to the Practrand test suit. So that's just over 1 GSamples at the 2 GByte point.

    I'd post my full code now but I'm at work again ...
  • evanhevanh Posts: 16,029
    The author of Xoroshiro says not to expect any relationship between repeat length and quality. Other than, I guess, the quality has to collapse beyond the repeat length.
  • cgraceycgracey Posts: 14,206
    evanh wrote: »
    The author of Xoroshiro says not to expect any relationship between repeat length and quality. Other than, I guess, the quality has to collapse beyond the repeat length.

    But what about random seeding? That effectively jumps you ahead to some unknown point, right?
  • cgraceycgracey Posts: 14,206
    evanh wrote: »
    Generator result size is 15 bits and I continuously concatenate those together to feed bytes to the Practrand test suit. So that's just over 1 GSamples at the 2 GByte point.

    I'd post my full code now but I'm at work again ...

    OK, sounds good. And those 15 bits are the top bits of the sum, right, excluding the LSB?

    I wonder why we hit a wall after so many samples, instead of it just going on forever.
  • evanhevanh Posts: 16,029
    edited 2017-04-13 04:50
    You mean multiple reseedings during a single test? That would invalidate the purpose of the test I'd think.
  • cgraceycgracey Posts: 14,206
    evanh wrote: »
    You mean multiple reseedings during a single test? That would invalidate the purpose of the test I'd think.

    I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?
  • evanhevanh Posts: 16,029
    cgracey wrote: »
    OK, sounds good. And those 15 bits are the top bits of the sum, right, excluding the LSB?
    Yep. I right shift the s0 + s1 sum, then mask 0x7fff to be sure.
    I wonder why we hit a wall after so many samples, instead of it just going on forever.
    The random data stream does go forever. It's just the Practrand test suit is sensitive and self terminates when the accumulated stats go too far out. Makes for easy automation of test combinations.
  • evanhevanh Posts: 16,029
    edited 2017-04-13 05:59
    cgracey wrote: »
    I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?

    As things are done at the moment, the test would start afresh and produce another 512 MBytes before failing again. The initial seed value shouldn't have any impact on quality.

    EDIT: Fixed my fail value.
  • The initial seed value shouldn't have any impact on quality

    This makes rational sense, but I can't seem to visualize it.

    Random numbers are weird.



  • evanhevanh Posts: 16,029
    edited 2017-04-13 06:01
    The way I think about it is the repeat length doesn't change, same as if the total length of 0 -99 is 100 and, assuming 99 rolls over back to 0, no matter where you start, eg: at 50, it still takes 100 iterations to get back to the starting value of 50 again.
  • The roll over visualization helps! Thanks :D

  • cgraceycgracey Posts: 14,206
    So, up to the fail point, was the quality optimal? If so, I think this is suitable for our purpose of making a repeatable software PRNG.
  • cgraceycgracey Posts: 14,206
    evanh wrote: »
    cgracey wrote: »
    I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?

    As things are done at the moment, the test would start afresh and produce another 512 MBytes before failing again. The initial seed value shouldn't have any impact on quality.

    EDIT: Fixed my fail value.

    Okay. Thanks for explaining that. I understand now.
  • cgraceycgracey Posts: 14,206
    edited 2017-04-13 06:57
    evanh wrote: »
    I've confirmed same result from using Heater's original rotl() but with uint16_t and also plugging in your above routine without any of my auto sizing tricks.

    Attached is the 14,4,7 full test output. What's notable now is the solid and dramatic failure case at 2 GBytes. Whereas when I had the bug in my rotl() function, all the higher sized tests failed in the same single test case every time - DC6-9x1Bytes-1.

    I just realized something... 14,4,7 did not show up in my overnight 16-cog test as a setting which had a full-length period of $FFFF_FFFF iterations before returning to the original seed. That might explain the dramatic fail at 2GB. How did you decide to test that case?

    My findings skipped over 14,4,7:

    $E3D = 14,3,13
    $E89 = 13,8,9
  • evanhevanh Posts: 16,029
    I automated 125 test runs of the a,b,c combinations resulting in 125 Practrand result output files. The longer a particular test lasts the bigger the result file becomes. I examined only the biggest files and posted my favourite from them.

    I've since widened the combinations to 500 or so but not got anything better.
Sign In or Register to comment.