Random/LFSR on P2

cgracey · 2017-04-11 09:46

I made a program to use all 16 cogs to search the 16x16x16 space for our made-up xoroshiro32+ PRNG. It's running now, but won't have results for several hours:

dat
'
' Launch cogs 15..0 with xoroshiro32+ search program
'
	org

	setq	##$1000/4-1	'clear 4096 bytes starting at $1000
	wrlong	#0,##$1000

.loop	coginit	cognum,#@search	'last iteration relaunches cog 0
	djns	cognum,#.loop

cognum	long	15
'
' Search 1 x 16 x 16 solutions - each cog
'
	org

search	cogid	t1		't1 is cogid 0..15
	mov	t2,#0		't2 ranges 0..15
	mov	t3,#0		't3 ranges 0..15

new	mov	state,#1	'new test, seed state
	mov	tests,#0	'reset test counter

loop	getword	s0,state,#0	'extract s0 and s1 words from state long
	getword	s1,state,#1

	setword	s0,s0,#1	'copy bottom words to top words for ROL
	setword	s1,s1,#1

	xor	s1,s0		's1 ^= s0

	rol	s0,t1		's0 = rol(s0,t1) ^ s1 ^ (s1 << t2)
	xor	s0,s1
	mov	x,s1
	shl	x,t2
	xor	s0,x

	rol	s1,t3		's1 = rol(s1,t3)

	setword	state,s0,#0	'pack s0 and s1 back into state long
	setword	state,s1,#1

	ijz	tests,#fail	'if 2^32 tests, caught in loop, fail

	cmp	state,#1  wz	'loop if not returned to seed value
if_nz	jmp	#loop

	ijnz	tests,#fail	'if not 2^32-1 iterations, fail

	decod	x,#12		'2^32-1 iterations!
	setnib	x,t1,#2
	setnib	x,t2,#1
	setnib	x,t3,#0
	wrbyte	#$FF,x		'mark byte in buffer

fail	incmod	t2,#15	wc	'try next solution, inc t2/t3
if_c	incmod	t3,#15	wc
if_nc	jmp	#new

	test	t1	wz	'this cog's solution space checked
if_nz	cogstop	t1		'if not cog0, stop now
'
' Cog 0 reports the results
'
	bmask	dirb,#15	'output to leds
	mov	ptra,#0		'reset ptra

show	rdbyte	x,ptra[##$1000]	wz	'get byte[ptra+$1000], 0?

if_nz	mov	outb,ptra	'show 3 nibbles on leds

if_nz	jp	#57,#$		'wait for pb3 cycle
if_nz	waitx	##80_000_000/10
if_nz	jnp	#57,#$
if_nz	waitx	##80_000_000/10

	incmod	ptra,##$FFF	'find and display next solution
	jmp	#show
'
' Variables
'
state	res	1
tests	res	1
t1	res	1
t2	res	1
t3	res	1
s0	res	1
s1	res	1
x	res	1

cgracey · 2017-04-11 09:52

Heater. wrote: »
How about this one from encryption guru Bruce Schneier:
https://www.schneier.com/academic/archives/1994/09/pseudo-random_sequen.html

Period 2^32 - 1
int RANDOM ()  {
    static unsigned long register; /*Register must be unsigned so right
                                     shift works properly.*/
    /*Register should be initialized with some random value.*/
    register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/
               ^ (register >> 6)   /*XOR it with the seventh bit*/
               ^ (register >> 4)   /*XOR it with the fifth bit*/
               ^ (register >> 2)   /*XOR it with the third bit*/
               ^ (register >> 1)   /*XOR it with the second bit*/
               ^ register)         /*and XOR it with the first bit.*/
               & 0x0000001)        /*Strip all the other bits off and*/
               <<31)               /*move it back to the 32nd bit.*/
               | (register >> 1);  /*Or with the register shifted right.*/
    return register & 0x00000001;  /*Return the first bit.*/
}
As he says:

"One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don't fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function."

That's a simple LFSR. This would be the code for it in PASM:

'
' Iterate lfsr, result in C flag
'
iterate	test	lfsr,mask	wc	'get parity of AND result
_ret_	rcr	lfsr,#1		wc	'new bit in, old bit out

lfsr	long	1
mask	long	$8000_0056

evanh · 2017-04-11 15:59

cgracey wrote: »

I effectively made this and it ran for $7FFF_FFFF iterations before it repeated the seed value of 1:

Still an excellent result for a shot in the dark. I've not tried that test with any of my runs. It does beg the question of how easy it seems to be to get a long repeat sequence. Looks like it happens quite easy.

uint16_t next(void) {
	const uint16_t s0 = s[0];
	const_uint16_t s1 = s[1];
	const uint16_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 14) ^ s1 ^ (s1 << 4);
	s[1] = rotl(s1, 9);

	return result;
}

There is significant variations in the PractRand test results when differing constants are chosen. Here's a collect of Xoroshiro32+ runs I've just tried:
a=11; b=1; c=6: High quality to 256 KBytes.
a=11; b=2; c=6: High quality to 128 KBytes.
a=11; b=3; c=6: High quality to 64 KBytes.
a=11; b=4; c=6: High quality to 64 KBytes.
a=11; b=5; c=6: High quality to 64 KBytes.
a=11; b=1; c=7: High quality to 64 KBytes.
a=11; b=2; c=7: High quality to 128 KBytes.
a=11; b=3; c=7: High quality to 256 KBytes.
a=11; b=4; c=7: High quality to 128 KBytes.
a=11; b=5; c=7: High quality to 32 KBytes.
a=11; b=1; c=8: High quality to 128 KBytes.
a=11; b=2; c=8: High quality to 16 KBytes.
a=11; b=3; c=8: High quality to 32 KBytes.
a=11; b=4; c=8: High quality to 128 KBytes.
a=11; b=5; c=8: High quality to 64 KBytes.
a=11; b=1; c=9: High quality to 64 KBytes.
a=11; b=2; c=9: High quality to 64 KBytes.
a=11; b=3; c=9: High quality to 64 KBytes.
a=11; b=4; c=9: High quality to 128 KBytes.
a=11; b=5; c=9: High quality to 128 KBytes.
a=11; b=1; c=10: High quality to 128 KBytes.
a=11; b=2; c=10: High quality to 64 KBytes.
a=11; b=3; c=10: High quality to 64 KBytes.
a=11; b=4; c=10: High quality to 32 KBytes.
a=11; b=5; c=10: High quality to 256 KBytes.
a=12; b=1; c=6: High quality to 256 KBytes.
a=12; b=2; c=6: *Instant fail!!!
a=12; b=3; c=6: High quality to 128 KBytes.
a=12; b=4; c=6: *Instant fail!!!
a=12; b=5; c=6: High quality to 32 KBytes.
a=12; b=1; c=7: High quality to 256 KBytes.
a=12; b=2; c=7: High quality to 64 KBytes.
a=12; b=3; c=7: High quality to 256 KBytes.
a=12; b=4; c=7: High quality to 32 KBytes.
a=12; b=5; c=7: High quality to 64 KBytes.
a=12; b=1; c=8: High quality to 64 KBytes.
a=12; b=2; c=8: *Instant fail!!!
a=12; b=3; c=8: High quality to 32 KBytes.
a=12; b=4; c=8: *Instant fail!!!
a=12; b=5; c=8: High quality to 128 KBytes.
a=12; b=1; c=9: High quality to 128 KBytes.
a=12; b=2; c=9: High quality to 128 KBytes.
a=12; b=3; c=9: High quality to 64 KBytes.
a=12; b=4; c=9: High quality to 64 KBytes.
a=12; b=5; c=9: High quality to 64 KBytes.
a=12; b=1; c=10: High quality to 32 KBytes.
a=12; b=2; c=10: *Instant fail!!!
a=12; b=3; c=10: High quality to 256 KBytes.
a=12; b=4; c=10: *Instant fail!!!
a=12; b=5; c=10: High quality to 128 KBytes.
a=13; b=1; c=6: High quality to 128 KBytes.
a=13; b=2; c=6: High quality to 128 KBytes.
a=13; b=3; c=6: High quality to 64 KBytes.
a=13; b=4; c=6: High quality to 256 KBytes.
a=13; b=5; c=6: High quality to 64 KBytes.
a=13; b=1; c=7: High quality to 64 KBytes.
a=13; b=2; c=7: High quality to 256 KBytes.
a=13; b=3; c=7: High quality to 32 KBytes.
a=13; b=4; c=7: High quality to 512 KBytes. (Winner #1)
a=13; b=5; c=7: High quality to 64 KBytes.
a=13; b=1; c=8: High quality to 128 KBytes.
a=13; b=2; c=8: High quality to 64 KBytes.
a=13; b=3; c=8: High quality to 256 KBytes. (This one conforms to my earlier description)
a=13; b=4; c=8: High quality to 32 KBytes.
a=13; b=5; c=8: High quality to 8 KBytes.
a=13; b=1; c=9: High quality to 128 KBytes.
a=13; b=2; c=9: High quality to 64 KBytes.
a=13; b=3; c=9: High quality to 32 KBytes.
a=13; b=4; c=9: High quality to 128 KBytes.
a=13; b=5; c=9: High quality to 64 KBytes.
a=13; b=1; c=10: High quality to 64 KBytes.
a=13; b=2; c=10: High quality to 64 KBytes.
a=13; b=3; c=10: High quality to 8 KBytes.
a=13; b=4; c=10: High quality to 8 KBytes.
a=13; b=5; c=10: High quality to 128 KBytes.
a=14; b=1; c=6: High quality to 256 KBytes.
a=14; b=2; c=6: *Instant fail!!!
a=14; b=3; c=6: High quality to 256 KBytes.
a=14; b=4; c=6: *Instant fail!!!
a=14; b=5; c=6: High quality to 128 KBytes.
a=14; b=1; c=7: High quality to 128 KBytes.
a=14; b=2; c=7: High quality to 256 KBytes.
a=14; b=3; c=7: High quality to 256 KBytes.
a=14; b=4; c=7: High quality to 64 KBytes.
a=14; b=5; c=7: High quality to 512 KBytes. (Winner #2)
a=14; b=1; c=8: High quality to 64 KBytes.
a=14; b=2; c=8: *Instant fail!!!
a=14; b=3; c=8: High quality to 64 KBytes.
a=14; b=4; c=8: *Instant fail!!!
a=14; b=5; c=8: High quality to 32 KBytes.
a=14; b=1; c=9: High quality to 64 KBytes.
a=14; b=2; c=9: High quality to 128 KBytes.
a=14; b=3; c=9: High quality to 64 KBytes.
a=14; b=4; c=9: High quality to 128 KBytes. (This is your above constants)
a=14; b=5; c=9: High quality to 16 KBytes.
a=14; b=1; c=10: High quality to 128 KBytes.
a=14; b=2; c=10: *Instant fail!!!
a=14; b=3; c=10: High quality to 64 KBytes.
a=14; b=4; c=10: *Instant fail!!!
a=14; b=5; c=10: High quality to 32 KBytes.
a=15; b=1; c=6: High quality to 256 KBytes.
a=15; b=2; c=6: High quality to 512 KBytes. (Winner #3)
a=15; b=3; c=6: High quality to 128 KBytes.
a=15; b=4; c=6: High quality to 64 KBytes.
a=15; b=5; c=6: High quality to 512 KBytes. (Winner #4)
a=15; b=1; c=7: High quality to 128 KBytes.
a=15; b=2; c=7: High quality to 128 KBytes.
a=15; b=3; c=7: High quality to 128 KBytes.
a=15; b=4; c=7: High quality to 128 KBytes.
a=15; b=5; c=7: High quality to 64 KBytes.
a=15; b=1; c=8: High quality to 256 KBytes.
a=15; b=2; c=8: High quality to 64 KBytes.
a=15; b=3; c=8: High quality to 128 KBytes.
a=15; b=4; c=8: High quality to 64 KBytes.
a=15; b=5; c=8: High quality to 256 KBytes.
a=15; b=1; c=9: High quality to 256 KBytes.
a=15; b=2; c=9: High quality to 64 KBytes.
a=15; b=3; c=9: High quality to 64 KBytes.
a=15; b=4; c=9: High quality to 64 KBytes.
a=15; b=5; c=9: High quality to 64 KBytes.
a=15; b=1; c=10: High quality to 128 KBytes.
a=15; b=2; c=10: High quality to 128 KBytes.
a=15; b=3; c=10: High quality to 64 KBytes.
a=15; b=4; c=10: High quality to 32 KBytes.
a=15; b=5; c=10: High quality to 16 KBytes.

* Note that, no matter the result size, whenever all three constants are even numbers the tests are instant off-the-chart failures for every test of the battery.

evanh · 2017-04-11 17:09

Whaaa! I think I've messed up somewhere. I've rebuilt my code and am now getting up to 512 MBytes! You might have to discard most of what I've written there, sorry.

The even constants rule still applies though.

Smile, it's 5:10AM too!

cgracey · 2017-04-11 17:33

evanh wrote: »

Whaaa! I think I've messed up somewhere. I've rebuilt my code and am now getting up to 512 MBytes! You might have to discard most of what I've written there, sorry.

The even constants rule still applies though.

Smile, it's 5:10AM too!

I went to bed late, too, and now I feel lousy.

When I get into the office today, I'm hoping my 16-cog xoroshiro32+ tester has some full-length results. Then, let's test those. That's when we'll know for sure if we'very got something. Your initial tests showed some promise, anyway.

cgracey · 2017-04-11 21:05

Cogs are starting to COGSTOP themselves on the xoroshiro32+ search program I started running last night. Maybe in an hour, I'll be able to see if we have any results - that is, $FFFF_FFFF iterations before returning to $0000_0001. If we get some good rotate/shift-value sets, I think Evanh will be able to test their actual quality.

cgracey · 2017-04-11 21:42

Evanh,

Here is the algorithm:

uint16_t next(void) {
	const uint16_t s0 = s[0];
	const_uint16_t s1 = s[1];
	const uint16_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, t1) ^ s1 ^ (s1 << t2);
	s[1] = rotl(s1, t3);

	return result;
}

It's looking like these values for t1, t2, and t3 produce maximal-length runs:

10,7,11
11,1,2
11,2,6
11,3,10
11,7,10

There's going to be a bunch more possibilities, but could you please test a few of these and discover if there is any qualitative difference among them?

cgracey · 2017-04-12 02:28

After running that 16-cog xoroshiro32+ search program for 15 hours, I got this data out.

Note that for each 3-digit hex number, the 1st nibble is t1, the second nibble is t2, and the third nibble is t3. So, for the first data set, $05B, t1=0, t2=5, and t3=11.

I'm hoping Evanh will run a few of these cases through dieharder and find out if we have a quality solution here for a 32-bit state PRNG. I would only bother with sets that have all different digits and no 0's in them.

05B
0F1
128
1F0
21B
21F
227
26F
287
299
2B3
326
33A
3B2
3BE
419
47F
485
526
584
5EC
623
625
62B
63F
6EB
718
722
72A
782
7AA
7F8
817
821
85D
87F
89D
8F7
914
92E
98E
992
A27
A33
A3B
A5D
A7B
AA7
B12
B1E
B26
B3A
B50
B7A
B8C
BAC
BE6
C4F
C8B
CAB
CAD
CE5
D3E
D58
D5A
D98
DAC
DBE
DCE
DDE
E1B
E27
E29
E3D
E89
EB3
EBB
EBD
ECD
EDD
F12
F36
F4C
F62
F74
F78

cgracey · 2017-04-12 02:45

Evanh,

Could you please test this one first:

t1 = 5
t2 = 2
t3 = 6

Thanks. I just did a separate test on that one and it looks really good to me. I wonder what dieharder would say.

evanh · 2017-04-12 03:27

Sure thing. I'll be off work in about 3 hours.

cgracey · 2017-04-12 03:30

evanh wrote: »

Sure thing. I'll be off work in about 3 hours.

Cool.

evanh · 2017-04-12 10:59

Ended up doing overtime at work.

I've repaired my bug. I wasn't completely taking care of arbitrary length in the rotl() function. I'd suffered from it from day one I think.

#define  ACCUM_MASK  (((1UL<<RESULT_SIZE)-1)<<1|1)   // One more bit than results.

static inline  uint64_t rotl(uint64_t value, int shift) {
	return (value << shift) | ((value & ACCUM_MASK) >> (RESULT_SIZE + 1 - shift)) & ACCUM_MASK;
}

Right, rerunning last night's effort, my best Xoroshiro32+ high quality run was 1 GByte with constants: 14,4,7. Next best was 14,3,7 with one very suspicious case at 1 GByte. Both of these failed at 2 GByte. Massive fix from last night.

Of the ones you've highlighted:
10,7,11 HQ to 256 MB. Fails at 512 MB.
11,1,2 HQ to 64 MB. Fails at 256 MB.
11,2,6 HQ to 256 MB. Fails at 512 MB.
11,3,10 HQ to 8 MB. Fails at 32 MB.
11,7,10 HQ to 1 MB. Fails at 4 MB.
5,2,6 HQ to 64 MB. Fails at 128 MB.

evanh · 2017-04-12 11:14

10,3,5 is not quite as clean as 14,4,7 but I'll still give it a HQ to 1 GB. Fails at 2 GB.

evanh · 2017-04-12 12:29

I've confirmed same result from using Heater's original rotl() but with uint16_t and also plugging in your above routine without any of my auto sizing tricks.

Attached is the 14,4,7 full test output. What's notable now is the solid and dramatic failure case at 2 GBytes. Whereas when I had the bug in my rotl() function, all the higher sized tests failed in the same single test case every time - DC6-9x1Bytes-1.

cgracey · 2017-04-12 16:39

Evanh, thanks for trying those.

How many bits are you sampling at each iteration? I would think bits 15..1 of the addition would be the best.

When you say a test failed at 2GB, how many iterations was that? Ideally, we should be getting 4G iterations passing, right? All those patterns should repeat at 4G - 1.

evanh · 2017-04-12 23:39

Generator result size is 15 bits and I continuously concatenate those together to feed bytes to the Practrand test suit. So that's just over 1 GSamples at the 2 GByte point.

I'd post my full code now but I'm at work again ...

evanh · 2017-04-13 00:22

The author of Xoroshiro says not to expect any relationship between repeat length and quality. Other than, I guess, the quality has to collapse beyond the repeat length.

cgracey · 2017-04-13 04:41

evanh wrote: »

The author of Xoroshiro says not to expect any relationship between repeat length and quality. Other than, I guess, the quality has to collapse beyond the repeat length.

But what about random seeding? That effectively jumps you ahead to some unknown point, right?

cgracey · 2017-04-13 04:44

evanh wrote: »

Generator result size is 15 bits and I continuously concatenate those together to feed bytes to the Practrand test suit. So that's just over 1 GSamples at the 2 GByte point.

I'd post my full code now but I'm at work again ...

OK, sounds good. And those 15 bits are the top bits of the sum, right, excluding the LSB?

I wonder why we hit a wall after so many samples, instead of it just going on forever.

evanh · 2017-04-13 04:49

You mean multiple reseedings during a single test? That would invalidate the purpose of the test I'd think.

cgracey · 2017-04-13 04:57

evanh wrote: »

You mean multiple reseedings during a single test? That would invalidate the purpose of the test I'd think.

I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?

evanh · 2017-04-13 04:57

cgracey wrote: »

OK, sounds good. And those 15 bits are the top bits of the sum, right, excluding the LSB?

Yep. I right shift the s0 + s1 sum, then mask 0x7fff to be sure.

I wonder why we hit a wall after so many samples, instead of it just going on forever.

The random data stream does go forever. It's just the Practrand test suit is sensitive and self terminates when the accumulated stats go too far out. Makes for easy automation of test combinations.

evanh · 2017-04-13 05:00

cgracey wrote: »

I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?

As things are done at the moment, the test would start afresh and produce another 512 MBytes before failing again. The initial seed value shouldn't have any impact on quality.

EDIT: Fixed my fail value.

potatohead · 2017-04-13 05:05

The initial seed value shouldn't have any impact on quality

This makes rational sense, but I can't seem to visualize it.

Random numbers are weird.

evanh · 2017-04-13 05:59

The way I think about it is the repeat length doesn't change, same as if the total length of 0 -99 is 100 and, assuming 99 rolls over back to 0, no matter where you start, eg: at 50, it still takes 100 iterations to get back to the starting value of 50 again.

potatohead · 2017-04-13 06:09

The roll over visualization helps! Thanks

cgracey · 2017-04-13 06:21

So, up to the fail point, was the quality optimal? If so, I think this is suitable for our purpose of making a repeatable software PRNG.

cgracey · 2017-04-13 06:22

evanh wrote: »

cgracey wrote: »

I meant that, in practice, a PRNG may be seeded, and it will iterate from that seed value. If we poop out at 512MB, what would happen to our quality if we seeded with a value from the fail point?

As things are done at the moment, the test would start afresh and produce another 512 MBytes before failing again. The initial seed value shouldn't have any impact on quality.

EDIT: Fixed my fail value.

Okay. Thanks for explaining that. I understand now.

cgracey · 2017-04-13 06:55

evanh wrote: »

I've confirmed same result from using Heater's original rotl() but with uint16_t and also plugging in your above routine without any of my auto sizing tricks.

Attached is the 14,4,7 full test output. What's notable now is the solid and dramatic failure case at 2 GBytes. Whereas when I had the bug in my rotl() function, all the higher sized tests failed in the same single test case every time - DC6-9x1Bytes-1.

I just realized something... 14,4,7 did not show up in my overnight 16-cog test as a setting which had a full-length period of $FFFF_FFFF iterations before returning to the original seed. That might explain the dramatic fail at 2GB. How did you decide to test that case?

My findings skipped over 14,4,7:

$E3D = 14,3,13
$E89 = 13,8,9

evanh · 2017-04-13 07:03

I automated 125 test runs of the a,b,c combinations resulting in 125 Practrand result output files. The longer a particular test lasts the bigger the result file becomes. I examined only the biggest files and posted my favourite from them.

I've since widened the combinations to 500 or so but not got anything better.

Random/LFSR on P2

Comments