Random/LFSR on P2

TonyB_ · 2017-10-30 01:32

Welcome, scro. Did you write PractRand?

TonyB_ · 2017-10-30 01:44

cgracey wrote: »

Anyone out there want a programming challenge?

I think it would be best to take the good 63 bits out of the xoroshiro128+ and come up with sixteen randomly-chosen static 32-bit patterns which each use 32 of those 63 bits. Most of those 63 bits should only be used 8 times across all 16 patterns, while a few will need to be used 9 times, since we only have 63 bits, not 64, from the xoroshiro128+. Each cog will get one of these 16 patterns for its own RND value.

We can get 64 good bits out of xoroshiro128+ by using the parity, which could simplify the bit scrambling for each cog (now only eight of course).

evanh · 2017-10-30 01:53

Another approach is not tap the lower 8 bits. That way the poorer quality bits are kept out of the Cogs altogether.

So, 56 bits of the sum available. 256 bits needed ... 4x56 + most significant 32.

scro · 2017-10-30 02:20

TonyB_ wrote: »

Welcome, scro. Did you write PractRand?

Yes. Once in a while I google PractRand to see how it gets used & talked about, and since the discussion was ongoing when I saw it I dropped it.

Though, truly, I don't understand a lot of what is going on. There seems to be a microcontroller for robots targeted at education & hobbyist markets, I think, with an ongoing custom chip design (though I would expect that to be programmable logic of some form or another, I didn't think the market was big enough to justify designing & fabbing chips just for such 'bots?). And I think it might have multiple subordinate processors in an arrangement vaguely reminiscent of the PlayStation 4's CPUs & APUs. And there is talk of a dedicated PRNG opcode, and PractRand testing to find optimal constants for that. And sometimes it seems to be limited to 32 bits of state, but much of what people are talking about doesn't sound limited to that, so maybe there are multiple things under discussion? So mostly I'm confused.

evanh · 2017-10-30 06:03

Damn! /me bows. Many thanks for making PractRand.

I presume you use the "scro" handle elsewhere? Otherwise I can't see how Tony guessed.

There is two incarnations of Xoroshiro implemented in the working design of the Prop2:
- One is a free running per clock Xoroshiro128+ that is shared by all eight processor cores, named "Cogs". This implementation is verbatim as per known version of Xoroshiro128+. The divvying up of its summed output is one discussion.
- The other is a crafted Xoroshiro32 based instruction, named XORO32. Which is where all the calibration and experimentation effort is going towards.

Regarding what the Propeller is: The general architecture is fully symmetrical. There is no OS focus, no MMU. The only isolation is the physically multiple cores. Library like features, often in the form of independently tasked objects, are the building blocks. There is hardware support for DMA channels, A/D, D/A, counting, pulsing, lots more. Again, all symmetrically spread.

More reading here - http://forums.parallax.com/discussion/162298/prop2-fpga-files/p1 In particular Chip's first link to the Prop2 Google Doc.

evanh · 2017-10-30 10:00

TonyB_ wrote: »

Would it be useful if XORO32 could modify Z, to report an invalid seed? Users could give it a random seed to get started and check after the MOV. "Shoot first, ask questions later" programming.

The invalid zero can be checked with the MOV seeding. The iterator doesn't self-produce a zero.

evanh · 2017-10-30 11:40

scro,
I'm just now reading your large reply to my request for an example. It might take me a while to digest it. Thanks for the input.

BTW: We're touching on 1GB with XORO32 as is. Have a nosy at some scores - http://forums.parallax.com/discussion/comment/1424014/#Comment_1424014

In fact, can you shed any light on why swapping two bits of the sum should affect the PractRand scoring in any significant way?

Heater. · 2017-10-30 12:05

I have a silly question.

How is it possible to get a cycle length of longer than 4G when there is only 32 bits of state?

evanh · 2017-10-30 12:38

You don't. Most PRNG algorithms use a much larger state than the period covers. A significantly underused state space is considered an advantage as long as it appears to be randomly distributed. I believe that's the area that Mellissa targets with her 2D maps. Looking for visible patterns in the used vs unused state space.

Xoroshiro is a bit unusual in that it has a totally uniform utilisation while still achieving decent quality. Every possible state, except for zero, occurs exactly once before repeating (like a plain incrementing counter).

EDIT: Obviously not every triplet combination works to that extent though. Some will have relatively very short periods. Hence the exercise in culling before running quality tests.

evanh · 2017-10-30 12:40

That's my current limited understanding.

Heater. · 2017-10-30 12:47

It's all a mystery to me. I was just puzzling about scro's statement:

...which gives you an effective cycle length of almost 8 gigabytes instead of the almost 16 gigabytes it would otherwise be.

Perhaps I'm not getting the right context when reading that.

TonyB_ · 2017-10-30 12:51

scro wrote: »

Any particular reason for the focus on LFSRs and LFibs and the like? I know in the PractRand docs I included a section on what PRNGs to use on exotic hardware, including this snippet:
...

evanh wrote: »

Damn! /me bows. Many thanks for making PractRand.

I presume you use the "scro" handle elsewhere? Otherwise I can't see how Tony guessed.

Try to keep up at the back!

evanh · 2017-10-30 12:57

Wow, lol, I totally missed that. And I can even remember exactly how I read that piece of the posting too - In my head I only saw "I know in the PractRand docs:" The remainder of the sentence never registered at all.

evanh · 2017-10-30 13:02

Heater. wrote: »

It's all a mystery to me. I was just puzzling about scro's statement:

...which gives you an effective cycle length of almost 8 gigabytes instead of the almost 16 gigabytes it would otherwise be.

Perhaps I'm not getting the right context when reading that.

Ah, right, 8GB is 4Giterations at 16 bits. A PractRand failure may occur at double that - giving 16GB as the score.
Ah, right, 16GB is 4Giterations at 32 bits. A PractRand failure may occur at double that - giving 32GB as the score.

I guess I should actually build that and run tests. Not tonight though.

Heater. · 2017-10-30 13:08

Ah, I get it, PractRand works on streams of bytes. That is probably my confusion.

scro · 2017-10-30 19:31

evanh wrote: »

Damn! /me bows. Many thanks for making PractRand.

I presume you use the "scro" handle elsewhere? Otherwise I can't see how Tony guessed.

Nice to feel appreciated. In my first post here I think I implied that I had written the PractRand documentation I was quoting. Easy to miss, but probably he spotted it.

evanh wrote: »

There is two incarnations of Xoroshiro implemented in the working design of the Prop2:
- One is a free running per clock Xoroshiro128+ that is shared by all eight processor cores, named "Cogs". This implementation is verbatim as per known version of Xoroshiro128+. The divvying up of its summed output is one discussion.

By free running, you mean that it generates one PRNG output per clock, regardless of demand? What is the point? If it's just to let "Cogs" access its output stream when they feel like, from their perspective they'd be losing the advantages of determinism since they can't control when the PRNG gets gated, so at most access to it could function as a substitute for a TRNG opcode. And for that kind of purpose, a CSPRNG would be better, or at least something that could easily pass all statistical tests.

evanh wrote: »

- The other is a crafted Xoroshiro32 based instruction, named XORO32. Which is where all the calibration and experimentation effort is going towards.

And that's where you're limited to 32 bits of state? It's actually a pretty decent choice of algorithm given the limitation. I did offer an alternative with higher quality output, but really there's a lot of judgement calls involved, one could argue that XORO32 is as good as it gets for this purpose since most higher quality output hashing won't be able to clock as fast, plus to get significantly better results on statistical tests of the output I had to bias the output hashing which is of dubious practical utility in this scenario. Though there's probably a few extra options available because of the nature of hardware implementations that I didn't fully consider.

evanh wrote: »

Regarding what the Propeller is: The general architecture is fully symmetrical. There is no OS focus, no MMU. The only isolation is the physically multiple cores. Library like features, often in the form of independently tasked objects, are the building blocks. There is hardware support for DMA channels, A/D, D/A, counting, pulsing, lots more. Again, all symmetrically spread.

Huh. I was confused because I was expecting symmetric cores to be referred to as "cores" or "processors" or something generic sounding, so I was thinking "Cogs" must be some form of subordinate processing unit.

evanh wrote: »

More reading here - http://forums.parallax.com/discussion/162298/prop2-fpga-files/p1 In particular Chip's first link to the Prop2 Google Doc.

I'll take a look.

evanh wrote: »

scro,
I'm just now reading your large reply to my request for an example. It might take me a while to digest it. Thanks for the input.

BTW: We're touching on 1GB with XORO32 as is. Have a nosy at some scores - http://forums.parallax.com/discussion/comment/1424014/#Comment_1424014

In fact, can you shed any light on why swapping two bits of the sum should affect the PractRand scoring in any significant way?

Why swapping bits would effect PractRand scoring? Um... if they were within the same byte it probably wouldn't on DC6, nor on BCFN... probably not on Gap16 either. FPF is a more complicated story though... different kinds of attention will be payed to different bits, also depending upon the content of other bits. And the foldings (all the test results prefixed with "[LowX/Y]" where X and Y are numbers) complicate things because after folding the bit positions will end up with slightly different relationships than they had in the original bit streams.
Wait, I see a comment there "Bits 1 and 9 reversed". Swapping bits between different bytes will effect the DC6 score. It looks for patterns in the hamming weights of nearby bytes. ie is a byte more likely to have a low hamming weight if the previous two bytes had medium-high hamming weights and the 3 before that had low hamming weights? That kind of thing. When you move bits around between bytes the patterns it looks for will change. You'd have to move bits farther than that to effect (unfolded) BCFN or Gap16 though. Probably, sometimes results surprise me.
edit: An answer more specific to xoroshiro128plus: In this PRNG, the biggest problem is that the lowest bits expose raw or almost raw LFSR output, with the excessive linearity issues inherent in LFSRs, which are normally detected by the BRank test. Bit 1 is low enough for that to be an issue (though I'd expect it to be much bigger in bit 0... we are using 0-based numbering schemes, right?). BRank may not be able to detect excessive linearity if bits lacking that flaw are also present in the input. PractRand does "folding", creating secondary bytestreams out of subsets of the original to help tests focus on the most vulnerable bits and not get thrown off by higher quality bits, and the "folding" particularly puts emphasis on the lowest bit positions. Swapping bit positions will effect which bits the folding concentrates on. So folded BRank tests (probably one of the most effective things against something like xoroshiro128plus) might end up concentrating on less vulnerable bits if you moved one of the originally lowest bits to a higher bit position.

TonyB_ · 2017-10-30 19:32

Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.


#a,  b,  c, weight,	xoroshiro32 characteristic polynomial

 4,  1,  9,	 5,	x^32 + x^18 + x^9  + x^2  + 1
 9,  1,  4,	 5,	x^32 + x^18 + x^9  + x^2  + 1

 2,  9,  9,	 7,	x^32 + x^23 + x^19 + x^13 + x^9  + x^8  + 1
 9,  9,  2,	 7,	x^32 + x^23 + x^19 + x^13 + x^9  + x^8  + 1

 1,  2,  8,	 9,	x^32 + x^18 + x^15 + x^13 + x^11 + x^9  + x^6  + x^4  + 1
 8,  2,  1,	 9,	x^32 + x^18 + x^15 + x^13 + x^11 + x^9  + x^6  + x^4  + 1

 4,  8,  5,	 9,	x^32 + x^21 + x^18 + x^16 + x^14 + x^12 + x^10 + x^8  + 1
 5,  8,  4,	 9,	x^32 + x^21 + x^18 + x^16 + x^14 + x^12 + x^10 + x^8  + 1

 5,  2,  6,	 9,	x^32 + x^24 + x^20 + x^19 + x^12 + x^10 + x^8  + x^6  + 1
 6,  2,  5,	 9,	x^32 + x^24 + x^20 + x^19 + x^12 + x^10 + x^8  + x^6  + 1

 5, 14, 12,	 9,	x^32 + x^21 + x^19 + x^16 + x^15 + x^14 + x^13 + x^6  + 1
12, 14,  5,	 9,	x^32 + x^21 + x^19 + x^16 + x^15 + x^14 + x^13 + x^6  + 1

 6, 14, 11, 	 9,	x^32 + x^24 + x^23 + x^22 + x^19 + x^17 + x^14 + x^9  + 1
11, 14,  6, 	 9,	x^32 + x^24 + x^23 + x^22 + x^19 + x^17 + x^14 + x^9  + 1

 7,  1,  8, 	 9,	x^32 + x^29 + x^28 + x^26 + x^24 + x^20 + x^14 + x^8  + 1
 8,  1,  7, 	 9,	x^32 + x^29 + x^28 + x^26 + x^24 + x^20 + x^14 + x^8  + 1

11,  8, 12, 	 9,	x^32 + x^21 + x^18 + x^16 + x^14 + x^12 + x^10 + x^8  + 1
12,  8, 11, 	 9,	x^32 + x^21 + x^18 + x^16 + x^14 + x^12 + x^10 + x^8  + 1

 2,  2,  7,	11,	x^32 + x^26 + x^24 + x^18 + x^16 + x^14 + x^12 + x^11 + x^6  + x^4  + 1
 7,  2,  2,	11,	x^32 + x^26 + x^24 + x^18 + x^16 + x^14 + x^12 + x^11 + x^6  + x^4  + 1

 7,  2, 10,	11,	x^32 + x^26 + x^24 + x^16 + x^14 + x^13 + x^12 + x^9  + x^8  + x^7  + 1
10,  2,  7,	11,	x^32 + x^26 + x^24 + x^16 + x^14 + x^13 + x^12 + x^9  + x^8  + x^7  + 1

 7,  2, 14,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1
14,  2,  7,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1

 7, 10, 10,	11,	x^32 + x^26 + x^24 + x^20 + x^19 + x^15 + x^14 + x^13 + x^11 + x^9  + 1
10, 10,  7,	11,	x^32 + x^26 + x^24 + x^20 + x^19 + x^15 + x^14 + x^13 + x^11 + x^9  + 1

 8,  5, 13,	11,	x^32 + x^19 + x^10 + x^9  + x^8  + x^6  + x^5  + x^4  + x^2  + x    + 1
13,  5,  8,	11,	x^32 + x^19 + x^10 + x^9  + x^8  + x^6  + x^5  + x^4  + x^2  + x    + 1

12,  4, 15,	11,	x^32 + x^24 + x^21 + x^18 + x^15 + x^12 + x^11 + x^10 + x^7  + x^5  + 1
15,  4, 12,	11,	x^32 + x^24 + x^21 + x^18 + x^15 + x^12 + x^11 + x^10 + x^7  + x^5  + 1

12, 10, 13,	11,	x^32 + x^24 + x^22 + x^20 + x^17 + x^16 + x^11 + x^9  + x^7  + x^4  + 1
13, 10, 12,	11,	x^32 + x^24 + x^22 + x^20 + x^17 + x^16 + x^11 + x^9  + x^7  + x^4  + 1

 2,  1, 15,	13,	x^32 + x^30 + x^27 + x^26 + x^22 + x^21 + x^10 + x^6  + x^5  + x^3  + x^2  + x    + 1
15,  1,  2,	13,	x^32 + x^30 + x^27 + x^26 + x^22 + x^21 + x^10 + x^6  + x^5  + x^3  + x^2  + x    + 1

 2,  8,  7,	13,	x^32 + x^27 + x^24 + x^23 + x^22 + x^19 + x^16 + x^14 + x^12 + x^11 + x^7  + x^4  + 1
 7,  8,  2,	13,	x^32 + x^27 + x^24 + x^23 + x^22 + x^19 + x^16 + x^14 + x^12 + x^11 + x^7  + x^4  + 1

 3,  3, 10,	13,	x^32 + x^29 + x^27 + x^26 + x^25 + x^22 + x^21 + x^18 + x^14 + x^7  + x^6  + x^3  + 1
10,  3,  3,	13,	x^32 + x^29 + x^27 + x^26 + x^25 + x^22 + x^21 + x^18 + x^14 + x^7  + x^6  + x^3  + 1

 6,  3, 15,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1
15,  3,  6,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1

 9,  2, 14,	13,	x^32 + x^28 + x^22 + x^20 + x^19 + x^18 + x^16 + x^14 + x^13 + x^9  + x^8  + x^6  + 1
14,  2,  9,	13,	x^32 + x^28 + x^22 + x^20 + x^19 + x^18 + x^16 + x^14 + x^13 + x^9  + x^8  + x^6  + 1

 9,  8, 14,	13,	x^32 + x^27 + x^24 + x^23 + x^22 + x^19 + x^16 + x^14 + x^12 + x^11 + x^7  + x^4  + 1
14,  8,  9,	13,	x^32 + x^27 + x^24 + x^23 + x^22 + x^19 + x^16 + x^14 + x^12 + x^11 + x^7  + x^4  + 1

10,  3, 11,	13,	x^32 + x^25 + x^24 + x^22 + x^21 + x^19 + x^16 + x^14 + x^11 + x^8  + x^6  + x^5  + 1
11,  3, 10,	13,	x^32 + x^25 + x^24 + x^22 + x^21 + x^19 + x^16 + x^14 + x^11 + x^8  + x^6  + x^5  + 1

10,  5, 13,	13,	x^32 + x^27 + x^26 + x^25 + x^21 + x^14 + x^12 + x^10 + x^9  + x^6  + x^5  + x^3  + 1
13,  5, 10,	13,	x^32 + x^27 + x^26 + x^25 + x^21 + x^14 + x^12 + x^10 + x^9  + x^6  + x^5  + x^3  + 1

11,  1, 14,	13,	x^32 + x^27 + x^26 + x^21 + x^20 + x^19 + x^17 + x^10 + x^6  + x^5  + x^4  + x^2  + 1
14,  1, 11,	13,	x^32 + x^27 + x^26 + x^21 + x^20 + x^19 + x^17 + x^10 + x^6  + x^5  + x^4  + x^2  + 1

11, 10, 12,	13,	x^32 + x^19 + x^17 + x^16 + x^13 + x^12 + x^11 + x^10 + x^9  + x^8  + x^6  + x^4  + 1
12, 10, 11,	13,	x^32 + x^19 + x^17 + x^16 + x^13 + x^12 + x^11 + x^10 + x^9  + x^8  + x^6  + x^4  + 1

11, 11, 14,	13,	x^32 + x^27 + x^23 + x^22 + x^18 + x^16 + x^14 + x^12 + x^9  + x^8  + x^7  + x^3  + 1
14, 11, 11,	13,	x^32 + x^27 + x^23 + x^22 + x^18 + x^16 + x^14 + x^12 + x^9  + x^8  + x^7  + x^3  + 1

13, 13, 14,	13,	x^32 + x^26 + x^25 + x^23 + x^22 + x^19 + x^17 + x^15 + x^11 + x^9  + x^7  + x^5  + 1
14, 13, 13,	13,	x^32 + x^26 + x^25 + x^23 + x^22 + x^19 + x^17 + x^15 + x^11 + x^9  + x^7  + x^5  + 1

 2,  1, 11,	15,	x^32 + x^28 + x^27 + x^26 + x^21 + x^20 + x^18 + x^12 + x^10 + x^9  + x^6  + x^5  + x^4  + x^3  + 1
11,  1,  2,	15,	x^32 + x^28 + x^27 + x^26 + x^21 + x^20 + x^18 + x^12 + x^10 + x^9  + x^6  + x^5  + x^4  + x^3  + 1

 2, 11,  3,	15,	x^32 + x^29 + x^27 + x^26 + x^25 + x^22 + x^13 + x^11 + x^10 + x^7  + x^6  + x^5  + x^4  + x^3  + 1
 3, 11,  2,	15,	x^32 + x^29 + x^27 + x^26 + x^25 + x^22 + x^13 + x^11 + x^10 + x^7  + x^6  + x^5  + x^4  + x^3  + 1

 3,  2,  6,	15,	x^32 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^16 + x^15 + x^12 + x^11 + x^10 + x^6  + x^4  + 1
 6,  2,  3,	15,	x^32 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^16 + x^15 + x^12 + x^11 + x^10 + x^6  + x^4  + 1

 4,  7, 15,	15,	x^32 + x^27 + x^25 + x^24 + x^22 + x^20 + x^18 + x^14 + x^12 + x^10 + x^8  + x^6  + x^4  + x^2  + 1
15,  7,  4,	15,	x^32 + x^27 + x^25 + x^24 + x^22 + x^20 + x^18 + x^14 + x^12 + x^10 + x^8  + x^6  + x^4  + x^2  + 1

 6,  2, 11,	15,	x^32 + x^24 + x^23 + x^22 + x^21 + x^19 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^8  + x^5  + 1
11,  2,  6,	15,	x^32 + x^24 + x^23 + x^22 + x^21 + x^19 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^8  + x^5  + 1

 7, 15,  8,	15,	x^32 + x^24 + x^22 + x^20 + x^18 + x^16 + x^15 + x^13 + x^11 + x^9  + x^7  + x^5  + x^3  + x    + 1
 8, 15,  7,	15,	x^32 + x^24 + x^22 + x^20 + x^18 + x^16 + x^15 + x^13 + x^11 + x^9  + x^7  + x^5  + x^3  + x    + 1

10,  7, 11,	15,	x^32 + x^27 + x^25 + x^24 + x^23 + x^22 + x^21 + x^17 + x^16 + x^15 + x^14 + x^12 + x^11 + x^8  + 1
11,  7, 10,	15,	x^32 + x^27 + x^25 + x^24 + x^23 + x^22 + x^21 + x^17 + x^16 + x^15 + x^14 + x^12 + x^11 + x^8  + 1

13, 12, 14,	15,	x^32 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^16 + x^15 + x^13 + x^10 + x^9  + x^7  + x^6  + 1
14, 12, 13,	15,	x^32 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^16 + x^15 + x^13 + x^10 + x^9  + x^7  + x^6  + 1

 2,  6, 15,	17,	x^32 + x^25 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^14 + x^13 + x^12 + x^10 + x^8  + x^7  + 1
15,  6,  2,	17,	x^32 + x^25 + x^24 + x^23 + x^22 + x^21 + x^20 + x^19 + x^18 + x^17 + x^14 + x^13 + x^12 + x^10 + x^8  + x^7  + 1

 8,  7, 15,	17,	x^32 + x^24 + x^23 + x^21 + x^20 + x^18 + x^14 + x^13 + x^11 + x^10 + x^8  + x^7  + x^6  + x^4  + x^3  + x    + 1
15,  7,  8,	17,	x^32 + x^24 + x^23 + x^21 + x^20 + x^18 + x^14 + x^13 + x^11 + x^10 + x^8  + x^7  + x^6  + x^4  + x^3  + x    + 1

13,  3, 14,	17,	x^32 + x^30 + x^28 + x^21 + x^20 + x^17 + x^16 + x^13 + x^11 + x^10 + x^9  + x^7  + x^6  + x^5  + x^4  + x^2  + 1
14,  3, 13,	17,	x^32 + x^30 + x^28 + x^21 + x^20 + x^17 + x^16 + x^13 + x^11 + x^10 + x^9  + x^7  + x^6  + x^5  + x^4  + x^2  + 1

13, 11, 14,	17,	x^32 + x^27 + x^26 + x^24 + x^23 + x^22 + x^20 + x^19 + x^18 + x^17 + x^12 + x^11 + x^9  + x^5  + x^4  + x    + 1
14, 11, 13,	17,	x^32 + x^27 + x^26 + x^24 + x^23 + x^22 + x^20 + x^19 + x^18 + x^17 + x^12 + x^11 + x^9  + x^5  + x^4  + x    + 1

 3, 11, 14,	19,	x^32 + x^26 + x^24 + x^23 + x^22 + x^20 + x^19 + x^18 + x^17 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^6  + x^5  + x^4  + 1
14, 11,  3,	19,	x^32 + x^26 + x^24 + x^23 + x^22 + x^20 + x^19 + x^18 + x^17 + x^15 + x^14 + x^13 + x^12 + x^11 + x^10 + x^6  + x^5  + x^4  + 1

 8,  9, 13,	19,	x^32 + x^28 + x^26 + x^25 + x^22 + x^21 + x^19 + x^18 + x^17 + x^16 + x^14 + x^13 + x^12 + x^10 + x^9  + x^7  + x^6  + x^3  + 1
13,  9,  8,	19,	x^32 + x^28 + x^26 + x^25 + x^22 + x^21 + x^19 + x^18 + x^17 + x^16 + x^14 + x^13 + x^12 + x^10 + x^9  + x^7  + x^6  + x^3  + 1

cgracey · 2017-10-30 20:13

Scro,

We are happy you came here!

The reason the xoroshiro-128+ runs on every clock is because its output is getting distributed to many sub-systems on the chip that may be running continuously, like dither sources for DAC outputs. So, it needs to just keep running.

About its lowest bits being lowest quality...

Could we rotate the output word by one bit position on each clock to spread the lousier bits around, over time? In other words, imagine we have 64 output bits each clock. We run that output word through a 64-bit rotator that steps one bit position each clock, or iteration. So, what is bit 0 first appears as bit 0, then as bit 1, etc., with all other bits rotating along with it. Would that make a uniform quality in all bits?

Thanks.

Chip

TonyB_ · 2017-10-31 02:17

TonyB_ wrote: »
Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.

#a,  b,  c, weight,	xoroshiro32 characteristic polynomial

 ...

 7,  2, 14,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1
14,  2,  7,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1

 ...

 6,  3, 15,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1
15,  3,  6,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1

 ...

I hadn't looked closely at all the full-period triplets before. For every [a,b,c] there is a corresponding [c,b,a] that shares the same characteristic polynomial. Knowing this in advance means brute force searches can be reduced by half.

evanh · 2017-10-31 03:38

TonyB_ wrote: »

For every [a,b,c] there is a corresponding [c,b,a] that shares the same characteristic polynomial. Knowing this in advance means brute force searches can be reduced by half.

That's the least of the time worries. The full period iterator time of each combination very quickly becomes the main search burden. I was all enthused after the easy success with the s16 candidates but the blood drained when I was hitting s20 and gave up at s21.

evanh · 2017-10-31 05:36

cgracey wrote: »

About its lowest bits being lowest quality...

Could we rotate the output word by one bit position on each clock to spread the lousier bits around, over time? In other words, imagine we have 64 output bits each clock. We run that output word through a 64-bit rotator that steps one bit position each clock, or iteration. So, what is bit 0 first appears as bit 0, then as bit 1, etc., with all other bits rotating along with it. Would that make a uniform quality in all bits?

Chip,
Melissa's lecture, that Heater linked, gave some insight on this. I think the recommended approach to truncating the bit width is always just don't use the lower bits. I feel not using the low byte would be a good middle ground between keeping quality and still using a wide selection of the summing output. - http://forums.parallax.com/discussion/comment/1423147/#Comment_1423147

evanh · 2017-10-31 07:30

scro wrote: »

edit: An answer more specific to xoroshiro128plus: In this PRNG, the biggest problem is that the lowest bits expose raw or almost raw LFSR output, with the excessive linearity issues inherent in LFSRs, which are normally detected by the BRank test. Bit 1 is low enough for that to be an issue (though I'd expect it to be much bigger in bit 0... we are using 0-based numbering schemes, right?).

Yes, bit 0 is lsbit. That experiment with swapping bit 1 for bit 9 was conducted just after we'd done the parity trick to replace bit 0, giving us a tidy 16 bits per iteration output, and we were pondering if there was anything that could be done to improve bit 1 as well.

So folded BRank tests (probably one of the most effective things against something like xoroshiro128plus) might end up concentrating on less vulnerable bits if you moved one of the originally lowest bits to a higher bit position.

Thanks for the explanation. This says to me that, by doing that bit swap, I've just hidden a known weakness in Xoroshiro, rather than made the quality any better. It's what I was expecting but good to know for sure.

evanh · 2017-10-31 07:58

scro wrote: »

... so at most access to it could function as a substitute for a TRNG opcode. And for that kind of purpose, a CSPRNG would be better, or at least something that could easily pass all statistical tests.

Yeah, I guess the free running 128+ is very much intended as a TRNG substitute. Chip intends the finished mask ROM to seed it from an internal noise source, an A/D converter, at boot up.

evanh · 2017-10-31 08:09

TonyB_ wrote: »
Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.
#a,  b,  c, weight,	xoroshiro32 characteristic polynomial
14,  2,  7,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1
15,  3,  6,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1

I can't even imagine how to build the jump keys from that.

evanh · 2017-10-31 08:22

scro wrote: »

evanh wrote: »

Regarding what the Propeller is: The general architecture is fully symmetrical. There is no OS focus, no MMU. The only isolation is the physically multiple cores. Library like features, often in the form of independently tasked objects, are the building blocks. There is hardware support for DMA channels, A/D, D/A, counting, pulsing, lots more. Again, all symmetrically spread.

Huh. I was confused because I was expecting symmetric cores to be referred to as "cores" or "processors" or something generic sounding, so I was thinking "Cogs" must be some form of subordinate processing unit.

Actually, even that doc is a little behind current events. The Prop2 was 16 cores for the full sized design, but half the cores had to be dropped when it was discovered the design was far too large to fit the largest target silicon.

evanh · 2017-10-31 09:43

msrobots wrote: »

Same for those ATARI/COMMODORE things, RANDOM was not RANDOM but predictable. Sure you can trick with new seeding at keystrokes, or time BUT the player would simply repeat them keystrokes timely and run the same sequence again.

To add to Heater's answer: True randomness is difficult to do at all, as far as I know. What a pseudo random number generator (PRNG) does is provide super fast procedural number generation that looks as messy as possible, for the purpose of appearing to be random. Getting it really messy is the tricky part.

The nature of it is if started (seeded) at the same state each time it will always produce the same sequence. A messy fractal if you like.

As far as I understand the random generator in the P1 uses some free running -there I got lost - thing but did produce a not repeatable sequence of numbers. That I would call RANDOM.

I assume you mean in the Prop2 still (The Prop1 doesn't have any explicit hardware support for random numbers. Spin has an operator that uses a variable as the state store for programming language based PRNG).

The purpose of the free-running PRNG in the Prop2 is to be less predictable than the Cog based XORO32 instruction. It is still following the same type of procedural sequence (Xoroshiro) but two things make it less predictable. One is the uncontrolled iterations, the other is the random unknown seed value. Also, it has 128 bit state compared with 32 bit state for XORO32.

Heater. · 2017-10-31 10:41

The Prop 1 can indeed generate real random numbers:

http://obex.parallax.com/object/498

It makes use of the jitter in it's phase locked loop.

Whilst not explicitly built to be a source of randomness it's a happy accident that it does.

scro · 2017-10-31 11:24

cgracey wrote: »

Scro,

We are happy you came here!

The reason the xoroshiro-128+ runs on every clock is because its output is getting distributed to many sub-systems on the chip that may be running continuously, like dither sources for DAC outputs. So, it needs to just keep running.

About its lowest bits being lowest quality...

Could we rotate the output word by one bit position on each clock to spread the lousier bits around, over time? In other words, imagine we have 64 output bits each clock. We run that output word through a 64-bit rotator that steps one bit position each clock, or iteration. So, what is bit 0 first appears as bit 0, then as bit 1, etc., with all other bits rotating along with it. Would that make a uniform quality in all bits?

Thanks.

Chip

Yes, that can work. Not well, but it would actually improve quality, at least from PractRand's perspective and probably other statistical test batteries. I wouldn't recommend it. You're basically adding an ultra-low quality hash (varying rotation) on to a low quality hash (adding the halves together) of an LFSR. If the quality just doesn't matter that much you could leave it alone, maybe no one cares about the binary rank of the lowest bits; if the quality does actually matter, even if only rarely, you could either improve the hash in more meaningful ways or improve the LFSR or just replace the whole thing with something better (Trivium, for instance, is actually designed for hardware implementation, fast, generates 64 bits at a time, and much higher quality)... actually is there a good reason to have this thing centralized and pipe the output all over the place? PRNGs can be pretty small in hardware, you could give a dedicated one to your ditherers for instance, especially if you know what kind of quality is needed.
/me suddenly thinks of random and dithering together and vaguelly recalls graphics code I wrote 15-20 years ago that did random dithering each frame when rendering from 32 bit sources to 8, 15, or 16 bit target images... on CRTs 15 bits per pixel was indistinguishable from 24 to the human eye that way, though 8 bpp was pushing it a little too far... no clue how it would look on a modern LCD...

evanh wrote: »

cgracey wrote: »

About its lowest bits being lowest quality...

Could we rotate the output word by one bit position on each clock to spread the lousier bits around, over time? In other words, imagine we have 64 output bits each clock. We run that output word through a 64-bit rotator that steps one bit position each clock, or iteration. So, what is bit 0 first appears as bit 0, then as bit 1, etc., with all other bits rotating along with it. Would that make a uniform quality in all bits?

Chip,
Melissa's lecture, that Heater linked, gave some insight on this. I think the recommended approach to truncating the bit width is always just don't use the lower bits. I feel not using the low byte would be a good middle ground between keeping quality and still using a wide selection of the summing output. - http://forums.parallax.com/discussion/comment/1423147/#Comment_1423147

The most common issues with low bits are seen in power-of-2-modulus-linear-congruential-generators, which are commonly used and extensively studied. In such LCGs, the quality of any given bit is, broadly speaking, directly proportional to its bit position. Bit 0 is garbage, bit 2N is twice as good as bit N. More or less. This means that your overall quality is more or less proportional to how many bits you throw away.
xoroshiro128plus's issues are different. The particular way a bit of it fails BRank is by having the carry bit flowing in to it being too predictable. Bit 0 is horrible because there is no incoming carry, bit 1 is slightly problematic because the incoming carry is pretty simple, once you get a couple bits in though everything is about the same and quality no longer meaningfully improves with bit position. So it's a completely different ramp up in quality with position than the more common case seen in LCGs. With respect to binary rank tests and similar tests. That's not xoroshiro128plus's only issue, but it's the easiest to detect, and the only issue PractRand will notice on normal settings.
Whatever Melissa said was probably heavily influenced by the situation with LCGs, and likely less than fully applicable here.

evanh wrote: »

scro wrote: »

edit: An answer more specific to xoroshiro128plus: In this PRNG, the biggest problem is that the lowest bits expose raw or almost raw LFSR output, with the excessive linearity issues inherent in LFSRs, which are normally detected by the BRank test. Bit 1 is low enough for that to be an issue (though I'd expect it to be much bigger in bit 0... we are using 0-based numbering schemes, right?).

Yes, bit 0 is lsbit. That experiment with swapping bit 1 for bit 9 was conducted just after we'd done the parity trick to replace bit 0, giving us a tidy 16 bits per iteration output, and we were pondering if there was anything that could be done to improve bit 1 as well.

So folded BRank tests (probably one of the most effective things against something like xoroshiro128plus) might end up concentrating on less vulnerable bits if you moved one of the originally lowest bits to a higher bit position.

Thanks for the explanation. This says to me that, by doing that bit swap, I've just hidden a known weakness in Xoroshiro, rather than made the quality any better. It's what I was expecting but good to know for sure.

Um... what's this parity bit 0 thing exactly? I've seen it mentioned a number of times here, but I've never seen in actually defined, unless it was in Verilog code I didn't bother trying to decipher.

TonyB_ · 2017-10-31 12:08

evanh wrote: »
TonyB_ wrote: »
Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.
#a,  b,  c, weight,	xoroshiro32 characteristic polynomial
14,  2,  7,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1
15,  3,  6,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1
I can't even imagine how to build the jump keys from that.

"Further scramblings of Marsaglia's xorshift generators" by Sebastiano Vigna tells us how.
http://vigna.di.unimi.it/ftp/papers/xorshiftplus.pdf

If we could understand it, then it would be easy to do 2^16 jumps with xoroshiro32+.
Relevant excerpt attached.

evanh · 2017-10-31 12:09

deleted

Random/LFSR on P2

Comments