Random/LFSR on P2

Heater. · 2018-04-03 02:24

Great, Pseudo Random Number Generator wars!

As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.

TonyB_ · 2018-04-03 03:00

TonyB_ wrote: »

I've just spotted tonight that there is a new post about xoroshiro128+ on Melissa O'Neill's "PCG, A Better Random Number Generator" blog:
http://www.pcg-random.org/posts/xoroshiro-fails-truncated.html

I told her about the xoroshiro++ algorithm on February 22nd.

Notice the length of Melissa's PractRand tests:

xoroshiro64+ [31:16]
Failed after 14 hours but kept running for 210 days!

xoroshiro128+ [63:32]
Failed after 101 days!
Evan's [55:0] & [57:2] tests failed after 3 days

cgracey · 2018-04-03 04:53

Here is the new Verilog for the XORO32 instruction.

I had to rewrite it, to reveal its regularity, in order to manipulate it confidently:

cgracey · 2018-04-03 05:26

The new/faster XORO32 outputs these initial 16 longs when seeded with $0000_0001:

50AD0021
A3D9B89A
9BD90C87
35023CAA
3D248840
71F57287
A81890AC
2CE3B4C1
B31DCC58
B7A4DF3D
3B2EA2E3
BFFF2C20
5B9AAFE9
35F43BB8
6A2D921C
A217CD9A

This looks right to me.

cgracey · 2018-04-03 05:28

Now is the time, if we want to add an adder stage to the XOROSHIRO128+, like we did for the XORO32++.

Is there much risk in using the guideline given about a prime rotation value near 32?

Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the files to On Semi.

I just sent On Semi what are, hopefully, the final Verilog files.

evanh · 2018-04-03 05:45

cgracey wrote: »

Is there much risk in using the guideline given about a prime rotation value near 32?

XORO32 experience suggest it's okay, assuming the original triplet is truly a good quality candidate. The [14 2 7] triplet produced a wide group of good scores.

The outcome is likely a big improvement no matter what, so I guess on that basis you are good to do it. But no guarantees.

evanh · 2018-04-03 05:46

cgracey wrote: »

Never mind that extra adder question for the central PRNG. What we have is probably much more than we need, anyway, and I need to get the file to On Semi.

Agreed.

cgracey · 2018-04-03 05:48

Okay. If a bug-fix arises, we could slip it in.

evanh · 2018-04-03 10:49

TonyB_ wrote: »

evanh wrote: »

Actually, I'll run more selections of 32-bit widths first. That's what the Cogs use after all ...

It would be best to test the actual 32-bit subsets used by GETRND D or 8-bit (?) subsets used for dithering.

I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
length= 256 gigabytes (2^38 bytes), time= 4539 seconds
Test Name Raw Processed Evaluation
[Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
...and 1567 test result(s) without anomalies

Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.

PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.

TonyB_ · 2018-04-03 13:32

cgracey wrote: »
The new/faster XORO32 outputs these initial 16 longs when seeded with $0000_0001:
50AD0021
A3D9B89A
9BD90C87
35023CAA
3D248840
71F57287
A81890AC
2CE3B4C1
B31DCC58
B7A4DF3D
3B2EA2E3
BFFF2C20
5B9AAFE9
35F43BB8
6A2D921C
A217CD9A
This looks right to me.

TonyB_ wrote: »
First 16 outputs for XORO32 when seed = 1 {s1 = 0, s0 = 1}
PRN calculated at start of algorithm before new state generated
50AD0021
A3D9B89A
9BD90C87
35023CAA
3D248840
71F57287
A81890AC
2CE3B4C1
B31DCC58
B7A4DF3D
3B2EA2E3
BFFF2C20
5B9AAFE9
35F43BB8
6A2D921C
A217CD9A

More confirmation that new XORO32 working as intended.
The new Verilog looks much cleaner.

TonyB_ · 2018-04-04 23:23

Heater. wrote: »

TonyB_ wrote: »

I've just spotted tonight that there is a new post about xoroshiro128+ on Melissa O'Neill's "PCG, A Better Random Number Generator" blog:
http://www.pcg-random.org/posts/xoroshiro-fails-truncated.html

I told her about the xoroshiro++ algorithm on February 22nd.

Great, Pseudo Random Number Generator wars!

As I suspected, random people are throwing randomly generated algorithms at the problem hoping something will stick. With perhaps a smattering of theory to back them up.

No, just "not-invented-here" or more particularly "not-invented-by-me" syndrome, to a significant extent. xoroshiro+ is old hat now, anyway.

cgracey · 2018-04-05 02:17

I've found a bug that needs fixing. What do we need to do to also upgrade our xoroshiro128?

TonyB_ · 2018-04-05 02:28

cgracey wrote: »

I've found a bug that needs fixing. What do we need to do to also upgrade our xoroshiro128?

PM just sent.

cgracey · 2018-04-05 05:43

Thanks, Guys! We now have an improved Xoroshiro128 implementation.

I found the bug. It had to do with SETQ2+WRLONG (writing LUT longs to hub). All fixed now. New files going off to OnSemi.

evanh · 2018-04-06 11:51

evanh wrote: »

I've started a run of 32-bit widths and the first one to drop out is, as expected, the one that has bit1 as lsb, [32:1]. It baled at 256 GB, which is in the ballpark of previous equivalent:
length= 256 gigabytes (2^38 bytes), time= 4539 seconds
Test Name Raw Processed Evaluation
[Low1/32]BRank(18):12K(1) R=+86692 p~= 0 FAIL !!!!!!!!
...and 1567 test result(s) without anomalies

Still running is [33:2] and [34:3], all eight Ryzen cores are active. Not expecting these two to drop out until the 16 TB mark I guess. After that I'll have a shot at an odd bit width - which should go much further.

PS: Just over 2 GB of RAM used by each at the moment. I think the prior 16 TB runs got up to about 6.5 GB of RAM each. I'll keep an eye on this run too.

Both have just breezed passed 16 TB ... another 3 days 'till 32 TB mark. Not sure how that conforms, I expected at least the [33:2] to fail at or before the earlier [57:2] run.

I had planned on reflashing the BIOS this weekend too. That sucks a little now cos it might be some weeks.

RAM usage is now over 15 GB between them.

cgracey · 2018-04-06 11:55

Evanh, what are you testing? Is it some form of Xoroshiro128?

evanh · 2018-04-06 11:58

That was started a few days back, it's the original Xoroshiro128+ [55 14 36].

cgracey · 2018-04-06 12:01

Okay. Cool!

Wait... You mean you are also running the NEW one now? That would be good.

evanh · 2018-04-06 12:10

No I cancelled that so it wouldn't slow these two down. I guess now that they haven't failed where I expected them to they can be lowered in priority.

The short run test, [7:0] sampling is usually a weak point, that I did do of the new algorithm satisfied me it is good to go. I'll do a short run full width 64-bit run now to double check that ...

cgracey · 2018-04-06 12:15

Sounds good.

TonyB_ · 2018-04-06 14:42

Seba has confirmed our test outputs for the new and much improved xoroshiro128. So new it hasn't been made public yet. All 64 bits of the output are very good and are used equally by the bit scramblers. Previously with xoroshiro128+ the top 62 bits were used unequally and the weak lowest two bits ignored. Everything is fine for xoroshiro128.

Not necessarily so for XORO32. A little while ago we switched algorithms to xoroshiro32++ [14,2,7,5] from [14,2,7,6]. The only difference is a left rotation of 5, not 6, in the PRN calculation. I asked Seba to check [14,2,7,5] with BigCrush, which is one of the battery tests in TestUI, an alternative random number tester to PractRand that Evan has been using. He says we should not rely on a single test program.

The main batteries in the TestUI suite are SmallCrush, Crush and BigCrush, which are progressively more stringent and time-consuming. BigCrush results for [14,2,7,5] were much better than [14,2,7,6]. However, they were also totally wrong as xoroshiro64++ [14,2,7,5] was tested by mistake.

I have been sent the following report files:

SmallCrush [14,2,7,5] - 1 test failed (of 10)
Crush [14,2,7,5] - 5 tests failed (of 96)
BigCrush [14,2,7,6] - 11 tests failed (of 106)

Unfortunately I can't compare like-for-like. Seba told me that [14,2,7,6] passes all tests in Crush. I really need to see corresponding report files to be 100% sure but as things stand I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.

evanh · 2018-04-06 21:52

TonyB_ wrote: »

... I think we should revert to [14,2,7,6] which has better scores with normal PractRand tests, but not extended ones.

I consider the extended tests more valid, and I didn't test as many sampling variants with the default PractRand settings either.

I'm not in favour of reverting.

evanh · 2018-04-06 22:15

Chip,
I left the full width sampling test of the updated Xoroshiro128 running overnight. It went past 2 TB without issue.

EDIT: To put that in context, under same test configuration, the original Xoroshiro128+ falls over at a tiny 4 MB! See https://forums.parallax.com/discussion/comment/1434165/#Comment_1434165

cgracey · 2018-04-06 22:42

Evanh, I hope our new XORO32 is optimal. I don't want to force a change with OnSemi over something like that.

TonyB_, any change of opinion?

evanh · 2018-04-06 23:05

I'm rerunning the same candidates with and without extended testing right now ...

PS: Seba's earlier scores for [14 2 7 x] would have both been for Xoroshiro64++ not for 32++. My guess is we have no idea how good Xoroshiro32++ [14 2 7 6] scores are on the Crush tests.

TonyB_ · 2018-04-06 23:13

cgracey wrote: »

Evanh, I hope our new XORO32 is optimal. I don't want to force a change with OnSemi over something like that.

TonyB_, any change of opinion?

Chip, Seba said he would run some more Crush and BigCrush tests and we'll have the output files to make a proper comparison between the two. Results not here yet and might not be now until tomorrow. I hope our XORO32 is optimal and we haven't gone backwards by switching.

Evan, can you post your extended results here, now that xoroshiro++ is not a secret any more? The problem I have is that the best two quadruples of [5,2,6,9] and [6,2,5,9] do badly on repeat frequency, which makes me doubt the worth of the extended tests. [14,2,7,5] and [14,2,7,6] do well on repeat frequency.

evanh · 2018-04-06 23:39

Chip,
This is certainly not something that requires intervention. The differences are nit picking.

Having said that, it's even murkier at this end now.

It would seem I was only just starting my foray into the extended options of PractRand when I generated that score table. It didn't include the -tf 2 option.

Here's a repost of that truncated table, plus I've added the equivalent table for standard options and also for -te 1 -tf 2 extended options:


  Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -te 1 -tlmin 1KB
__________________________________________________________________________________________________________
                Sampling apertures of the generator output, labelled as most to least bit-significance
  Candidate    ----------------------------------------------------------------------------------------
[ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
[ 5  2  6  9]     1G    4G    2G    2G    1G    2G    2G    1G    1G    1G    2G    1G    1G    1G
[ 6  2  5  9]     1G    4G    2G    4G    1G    2G    1G    2G    1G    1G    1G    1G    1G    2G
[14  2  7  5]   512M    4G    4G    2G    4G    1G    1G    1G    1G    1G    1G    1G    1G    1G
[14  2  7  6]   512M    8G    2G    4G    1G    2G    1G  256M  256M    2G  512M  512M  512M  512M
[15  4 12  9]   512M   16G    8G    8G    2G    1G    1G  512M  512M  512M  512M    1G    1G    1G



  Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -tlmin 1KB
__________________________________________________________________________________________________________
                Sampling apertures of the generator output, labelled as most to least bit-significance
  Candidate    ----------------------------------------------------------------------------------------
[ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
[ 5  2  6  9]     1G    8G   16G    8G    4G    2G    2G    1G    2G    2G    2G    8G    2G    1G
[ 6  2  5  9]     1G    8G    8G    8G    4G    2G    2G    2G    2G    2G    2G    2G    1G    2G
[14  2  7  5]   512M   16G   16G    8G   16G    2G    2G    1G    1G    2G    2G    2G    2G    2G
[14  2  7  6]   512M   16G    8G   16G    8G    4G    2G    1G  512M    2G    1G    1G    1G    2G
[15  4 12  9]   512M   16G   16G   16G    8G    1G    1G  512M  512M    1G    1G    1G    1G    1G



  Xoroshiro32(16)++ PractRand Score Table.  PractRand v0.93 options: -multithreaded -te 1 -tf 2 -tlmin 1KB
__________________________________________________________________________________________________________
                Sampling apertures of the generator output, labelled as most to least bit-significance
  Candidate    ----------------------------------------------------------------------------------------
[ A  B  C  D]  15:00 14:00 13:00 15:01 15:02 15:08 14:07 13:06 12:05 11:04 10:03  9:02  8:01  7:00
==========================================================================================================
[ 5  2  6  9]     1G    4G    2G    2G    1G    2G    2G    1G    1G    1G    2G    1G    1G    1G
[ 6  2  5  9]     1G    4G    2G    4G    1G    2G    1G  256M    1G    1G    1G    1G    1G    2G
[14  2  7  5]   256M    4G    4G    2G    4G    1G    1G    1G    1G    1G    1G    1G    1G    1G
[14  2  7  6]   512M    8G    2G    4G    2G    2G    1G  256M  256M    2G  512M  512M  512M  512M
[15  4 12  9]   256M   16G    8G    8G    2G    1G    1G  512M  512M  512M    1G    1G    1G    1G

Notably, with full extended options, [14 2 7 5] degrades to 256 MB on the full width sampling.

evanh · 2018-04-06 23:47

Here's all the full extended "-te 1 -tf 2" report files I just scored.

TonyB_ · 2018-04-07 00:02

evanh wrote: »

Chip,
This is certainly not something that requires intervention. The differences are nit picking.

It's more than nit-picking if [14,2,7,6] passes all the Crush tests with [14,2,7,5] failing 5. Or if [14,2,7,5] has fewer passes in BigCrush than [14,2,7,6]. The failures are due to the small state, if the generator is good, but that applies to PractRand as well.

P.S. xoroshiro[x]y++ is in the text in the output files.

evanh · 2018-04-07 00:11

No 32-bit state PRNG passes all Crush tests. Their periods just aren't long enough.

I'm confident we haven't seen any Crush scores for [14,2,7,6], and that it won't be significantly different to [14,2,7,5].

Random/LFSR on P2

Comments