Random/LFSR on P2

TonyB_ · 2018-05-15 00:59

TonyB_ wrote: »

evanh wrote: »

Ouch, Melisa isn't impressed.

But Seba is unfazed.

evanh wrote: »

I'm not convince her concerns are entirely valid for what we are wanting though:
1: The progressiveness in state changes are hidden by the scrambler. That's the scrambler's job.
2: Algorithmically predictable is a given. Not a concern to us. This isn't for cryptography.
3: Invertible I don't understand. She seems focused on particular multipliers there. Maybe it's similar to predictability in that it can be hacked. I'm unsure how important this is, but it just sounds like it needs new constants. No shortage of those to choose from.

The paper explains the theoretical basis for the multipliers in the * and ** scramblers.

I'm wondering how often people will multiply millions of outputs by the same value of 57.

A new critique of PCG:
http://pcg.di.unimi.it/pcg.php

Heater. · 2018-05-15 06:45

Wow, PRNG wars are heating up.

So, on the one hand, the best, recent, xoro.. generator is rubbish because multiplying it's output by 57 causes it to fail tests of randomness. Or multiply by any of trillions of other numbers.

On the other hand, the best PCG generators are Smile because if you make two copies of one generator, flip a bit in one of then then run them, interleaving the output, the result fails tests of randomness.

Now, seems to me the former failure is more likely to happen in a practical application. The second failure is just a weird thing that nobody would ever do. Which is worse? I don't know.

I find the language used in these discussions to be getting a pit personal. For example: "The trick used by Melissa O'Neill to make people believe she could..." Implies some kind of deliberate dishonesty, like some kind of card sharp. I know academics can get heated but usually the keep it out of their papers.

evanh · 2018-05-15 10:25

It does look a little tit-for-tat, but Seba's blog certainly isn't the published "paper".

evanh · 2018-05-15 10:36

TonyB_ wrote: »

According to your byte theory, 4-bit samples should have better scores than 8-bit samples.

I've just mashed up another pair of scripts to test then build a table of every possible aperture for a given candidate. Here's that poor scoring candidate:

__________________________________________________________________________________________________________
  PractRand scoring of Xoroshiro32(16)++ candidate [4 7 15 11].  Double Full Period = 16 GB.
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
      Scoring ran from 2018-05-15 17:34:01 to 2018-05-15 18:17:40.  Build 2018-05-15 18:18:33 +1200
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |    4M  128M  256M  256M  256M  128M  256M    1G  128M  256M  512M  512M  512M  512M    8M    2M
 15 |  128M  512M    1G  512M  512M  512M  512M    1G  512M    1G  128K  512M  512M  512M  512M  256M
 14 |   64M   64M  256M  512M  256M  256M  256M  512M  256M  512M  512M  512M  512M  512M  256M  256M
 13 |  128M    2G   16G    2G  512M  256M  256M  256M  512M  256M  256M  512M  256M  256M  256M  256M
 12 |   64M    1G    1G    1G  512M  256M   64M   32M   64M   64M   32M   32M   64M   64M   32M   64M
 11 |  128M    4G    2G    2G    4G    4G    4G  256M   64M   64M  128M   64M  128M  128M  128M  128M
 10 |   64M    1G    1G    1G    1G    1G    4G  512M   64M   32M   32M   32M   32M   32M   32M   32M
 09 |   64M    1G    4G    2G    2G    2G    2G    8G    2G   64M   32M   32M   32M   32M   32M   64M
 08 |    4M   64M    1G    1G    1G    1G  512M  512M  512M    1G    8M    4M    4M    4M    4M    8M
 07 |   32M  128M    1G    2G    2G    2G    1G  512M  256M    1G    8G   32M    8M    8M   16M   16M
 06 |    8M  128M  512M  512M  512M    1G  256M  128M  128M  128M  512M    2G    8M    4M    4M    8M
 05 |   16M  256M    1G  512M    2G    4G    2G  256M  256M    1G    4G    2G    2G    8M    8M   16M
 04 |    1M   32M  256M  128M  128M    1G    2G  512M   32M   64M  512M  512M  256M  256M    2M  512K
 03 |    2M   64M    2G    4G    1G    1G    2G    2G    1G  256M    4G    2G    2G    1G    2G    4M
 02 |  512K   16M    1G  512M    2G  128M    1G    1G    2G    2G  256M    2G    2G    1G    1G  256M
 01 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G  512M    1G    1G    1G

And the chosen XORO32:

__________________________________________________________________________________________________________
  PractRand scoring of Xoroshiro32(16)++ candidate [14 2 7 5].  Double Full Period = 16 GB.
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
      Scoring ran from 2018-05-15 18:24:42 to 2018-05-15 19:51:30.  Build 2018-05-15 19:52:30 +1200
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  256M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 15 |    4G    2G    8G    8G    4G    4G    2M    2G    4G    4G    4G    8G    4G    8G    4G   16G
 14 |    4G    4G    4G    2G    2G    2G    1G    2G    1G    2G    2G    2G    4G    4G    4G    2G
 13 |    2G    4G    4G    4G    4G    4G    4G    4G    4G    2G    4G    2G    4G    4G    2G    2G
 12 |  512M  512M    1G    1G  512M  512M    1G    1G    1G  512M    1G    1G    1G    1G  512M  512M
 11 |    2G    2G    4G    4G    4G    2G    4G    4G    4G    2G    2G    4G    4G    1M    4G    4G
 10 |    2G    2G    1G    2G    2G    2G    1G    2G    2G    2G    2G    2G    2G    2G    2G    2G
 09 |    2G    4G    4G    4G    4G    8G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G
 08 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G  256M  512M    1G    1G    1G    1G
 07 |    8G    2G    4G    4G    4G    4G    8G    8G    8G    8G    4G    4G    4G    8G    8G    4G
 06 |    4G    2G  512M    2G    1G    2G    2G    4G    2G    2G    2G    2G    1G    2G    4G    2G
 05 |    4G    2G    4G    4G    4G    2G    4G    2G    4G    4G    4G    4G    2G    2G    2G    4G
 04 |  128M    1G    2G    1G    1G    2G    1G    2G    2G    1G    1G    2G    1G  512M    1G  512M
 03 |    2G    4G    2G    4G    4G    4G    2G    4G    4G    4G    4G    4G    4G    4G    4G    2G
 02 |    1G    2G    2G    2G    2G    2G    2G    2G    2G    1G    1G    2G    2G    2G  512M  512M
 01 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G

Brian Fairchild · 2018-05-15 10:58

If there is so much uncertainty about the quality of these generators then why is one being hard-baked into the P2?

evanh · 2018-05-15 11:07

Brian Fairchild wrote: »

If there is so much uncertainty about the quality of these generators then why is one being hard-baked into the P2?

It's not as bad as it might seem. Melisa even says in her latest blog entry, http://www.pcg-random.org/posts/implausible-output-from-xoshiro256.html, that none of her raised points are likely to affect most real world applications.

TonyB_ · 2018-05-15 11:09

evanh wrote: »

TonyB_ wrote: »

According to your byte theory, 4-bit samples should have better scores than 8-bit samples.

I've just mashed up another pair of scripts to test then build a table of every possible aperture for a given candidate. Here's that poor scoring candidate:

__________________________________________________________________________________________________________
  PractRand scoring of Xoroshiro32(16)++ candidate [4 7 15 11].  Double Full Period = 16 GB.
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
      Scoring ran from 2018-05-15 17:34:01 to 2018-05-15 18:17:40.  Build 2018-05-15 18:18:33 +1200
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |    4M  128M  256M  256M  256M  128M  256M    1G  128M  256M  512M  512M  512M  512M    8M    2M
 15 |  128M  512M    1G  512M  512M  512M  512M    1G  512M    1G  128K  512M  512M  512M  512M  256M
 14 |   64M   64M  256M  512M  256M  256M  256M  512M  256M  512M  512M  512M  512M  512M  256M  256M
 13 |  128M    2G   16G    2G  512M  256M  256M  256M  512M  256M  256M  512M  256M  256M  256M  256M
 12 |   64M    1G    1G    1G  512M  256M   64M   32M   64M   64M   32M   32M   64M   64M   32M   64M
 11 |  128M    4G    2G    2G    4G    4G    4G  256M   64M   64M  128M   64M  128M  128M  128M  128M
 10 |   64M    1G    1G    1G    1G    1G    4G  512M   64M   32M   32M   32M   32M   32M   32M   32M
 09 |   64M    1G    4G    2G    2G    2G    2G    8G    2G   64M   32M   32M   32M   32M   32M   64M
 08 |    4M   64M    1G    1G    1G    1G  512M  512M  512M    1G    8M    4M    4M    4M    4M    8M
 07 |   32M  128M    1G    2G    2G    2G    1G  512M  256M    1G    8G   32M    8M    8M   16M   16M
 06 |    8M  128M  512M  512M  512M    1G  256M  128M  128M  128M  512M    2G    8M    4M    4M    8M
 05 |   16M  256M    1G  512M    2G    4G    2G  256M  256M    1G    4G    2G    2G    8M    8M   16M
 04 |    1M   32M  256M  128M  128M    1G    2G  512M   32M   64M  512M  512M  256M  256M    2M  512K
 03 |    2M   64M    2G    4G    1G    1G    2G    2G    1G  256M    4G    2G    2G    1G    2G    4M
 02 |  512K   16M    1G  512M    2G  128M    1G    1G    2G    2G  256M    2G    2G    1G    1G  256M
 01 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G  512M    1G    1G    1G

And the chosen XORO32:

__________________________________________________________________________________________________________
  PractRand scoring of Xoroshiro32(16)++ candidate [14 2 7 5].  Double Full Period = 16 GB.
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
      Scoring ran from 2018-05-15 18:24:42 to 2018-05-15 19:51:30.  Build 2018-05-15 19:52:30 +1200
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  256M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 15 |    4G    2G    8G    8G    4G    4G    2M    2G    4G    4G    4G    8G    4G    8G    4G   16G
 14 |    4G    4G    4G    2G    2G    2G    1G    2G    1G    2G    2G    2G    4G    4G    4G    2G
 13 |    2G    4G    4G    4G    4G    4G    4G    4G    4G    2G    4G    2G    4G    4G    2G    2G
 12 |  512M  512M    1G    1G  512M  512M    1G    1G    1G  512M    1G    1G    1G    1G  512M  512M
 11 |    2G    2G    4G    4G    4G    2G    4G    4G    4G    2G    2G    4G    4G    1M    4G    4G
 10 |    2G    2G    1G    2G    2G    2G    1G    2G    2G    2G    2G    2G    2G    2G    2G    2G
 09 |    2G    4G    4G    4G    4G    8G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G
 08 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G  256M  512M    1G    1G    1G    1G
 07 |    8G    2G    4G    4G    4G    4G    8G    8G    8G    8G    4G    4G    4G    8G    8G    4G
 06 |    4G    2G  512M    2G    1G    2G    2G    4G    2G    2G    2G    2G    1G    2G    4G    2G
 05 |    4G    2G    4G    4G    4G    2G    4G    2G    4G    4G    4G    4G    2G    2G    2G    4G
 04 |  128M    1G    2G    1G    1G    2G    1G    2G    2G    1G    1G    2G    1G  512M    1G  512M
 03 |    2G    4G    2G    4G    4G    4G    2G    4G    4G    4G    4G    4G    4G    4G    4G    2G
 02 |    1G    2G    2G    2G    2G    2G    2G    2G    2G    1G    1G    2G    2G    2G  512M  512M
 01 |    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G    1G

It's interesting that the lowest [14,2,7,5] score for 16- and 4-bit is when bit 0 is bit 0 in the sample. [4,7,15,11] score for 4-bit is much worse than 1-bit. I'm wondering how [14,2,7,6] and [6,2,3,9] and [6,2,11,6] do in the complete tests. None of these has a triplicated output but [3,2,6,5] and [14,2,7,5] do.

evanh · 2018-05-15 11:18

I'm mostly ignoring the 1-bit apertures.

I'm running [14 2 7 6] now.

There is two very poor scores, of 2M and 1M, in [14 2 7 5]. So I'm keen to see if any candidate has a clean sweep or not.

Heater. · 2018-05-15 11:50

evanh,

Seba's blog certainly isn't the published "paper".

I know that blog post is not a "paper" in the traditional academic, peer reviewed, journal paper way.

And tit-for-tat is OK. Academics and researchers argue over ideas all the time. Hopefully arriving and some consensus on a "correct" theory at some point. That's what we pay academics to do after all.

I'd like technical debates to stay impersonal though, in public.

TonyB_ · 2018-05-15 11:54

evanh wrote: »

I'm mostly ignoring the 1-bit apertures.

I'm running [14 2 7 6] now.

There is two very poor scores, of 2M and 1M, in [14 2 7 5]. So I'm keen to see if any candidate has a clean sweep or not.

Evan, previously here [6,2,3,9] had a terrible 2M score for bit 13 by itself, which you treated as an anomaly and more or less ignored. It ought to be easiest to get good scores with 1-bit samples, as they should have the least amount of correlation between the bits in the samples.

[14,2,7,5] has 2M score for 15 bits with bit 6 as lsb and 1M for 11 bits with bit 13 as lsb. Do these have no more failures until 2G or 4G, say? Could very low anomalies be recorded somehow but the tests allowed to continue automatically until "meaningful" failures occur?

Alternatively, as the number of anomalies is low, it might not take too much extra time to do special one-off tests.

Heater. · 2018-05-15 11:57

Brian,

If there is so much uncertainty about the quality of these generators then why is one being hard-baked into the P2?

A couple of reasons I can think of.

1) Speed. I can only assume the hardware PRNG is a lot faster than coding in assembler. I guess somebody, somewhere, sometime, may possibly. maybe, need the speed for some application that I cannot imagine.

I'd be interested to know how much faster the hardware PRNG is compared to a software implementation.

2) It might ensure that a "good" algorithm gets into general use. In the past people of used all kind of PRNs that were terrible.

3) Because we can!

evanh · 2018-05-15 12:02

Here's [14 2 7 6]

__________________________________________________________________________________________________________
  PractRand scoring of Xoroshiro32(16)++ candidate [14 2 7 6].  Double Full Period = 16 GB.
      PractRand v0.93 options:  stdin -multithreaded -te 1 -tf 2 -tlmin 1KB
      Scoring ran from 2018-05-15 22:12:42 to unknown.  Build 2018-05-15 23:44:21 +1200
    |===00====01====02====03====04====05====06====07====08====09====10====11====12====13====14====15=
 16 |  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M  512M
 15 |    8G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    8G   16G    4G
 14 |    2G    2G    2G    2G    2G    2G    1G    2G    1G    1G    2G    2G    1G    2G    4G    4G
 13 |    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G    8G
 12 |  512M  512M  512M  512M  512M  512M    1G    1G    1G    1G    1G  512M  512M    1G  512M  512M
 11 |    2G    2G    4G    4G    4G    2G    4G    4G    4G    4G    4G    4G  128K    4G    4G    2G
 10 |  128K    1G    2G    2G    2G    2G    2G    2G    2G    2G    2G    1G    2G    2G    2G    2G
 09 |    4G    2G    2G    4G    4G    8G    8G    8G    4G    8G    4G    8G    2G    8G    8G    4G
 08 |  512M  512M  512M  512M    2G  256M  256M    1G    2G    1G    1G    1G    1G    2G    1G  512M
 07 |    4G    2G    2G    4G    4G    4G    2G    2G    8G    8G    4G    4G    4G    4G    4G    4G
 06 |    4G    4G    1G    2G    2G    4G    2G  512M    1G    2G    2G    2G    2G    2G    2G    2G
 05 |    1G    8K    2G    4G    4G    2G    4G    4G    4G    4G    4G    4G    4G    4G    4G    4G
 04 |    1G  512M  512M    2G  512M  256M  128M    1G  512M    1G    1G    1G    1G    4G    1G  256M
 03 |    2G    4G    2G    4G    4G    4G    4G    4G    2G    4G    4G    4G    4G    4G    4G    2G
 02 |  512M    2G    2G    2G    2G    1G  128M    2G    2G    1G    2G    2G    2G    2G    2G  128M
 01 |    1G    1G    1G    1G    1G    1G  512M    1G    1G    1G    1G    1G    1G    1G    1G    1G

Same situation here, three more very low scores of 128K and 8K.

It seems these occasional oddball scores are probably all just borderline threshold cases. Attached is the extended runs for these five exceptions. In all five cases, same as a previous finding, the failure point is only just over the threshold and testing beyond initial failure yields normal readings for the remainder of each case.

EDIT: Corrected broken datestamp and PR options listed in table heading - breakage due to reliance on last test run being the one referenced for table heading.

evanh · 2018-05-15 12:21

I'll go a little further and say the exceptional fails are only occurring on BCFN_FF tests, ie: This particular test needs a small adjustment. It's not something I understand anywhere near enough to be adjusting myself though.

TonyB_ · 2018-05-15 12:43

evanh wrote: »

I'll go a little further and say the exceptional fails are only occurring on BCFN_FF tests, ie: This particular test needs a small adjustment. It's not something I understand anywhere near enough to be adjusting myself though.

I wondering whether or not the BCFN_FF tests affect scores more generally. For [14,2,7,6] do 04 | 06 and 02 | 06 fail other tests at 128M? Or [14,2,7,5] 04 | 00 and 16 | 00?

evanh · 2018-05-15 13:12

Those are correct. Most real fails are on FPF tests. Or at least the FPF tests are also failing at the same time.

Here's reports for what you're asking. Also note on [14 2 7 6], aperture 2+15 almost fails a BCFN_FF test at 32 KB rather than the correct 128 MB.

evanh · 2018-05-15 14:09

Tony,
I think you missed my final pairs/zero runs distribution report you wanted: http://forums.parallax.com/discussion/download/122744/dist_report4.zip

Or at least you didn't mention any more conclusions after I posted it.

potatohead · 2018-05-15 16:11

Speed. I can only assume the hardware PRNG is a lot faster than coding in assembler. I guess somebody, somewhere, sometime, may possibly. maybe, need the speed for some application that I cannot imagine.

Don't we need a robust noise source for some DAC modes, or applications?

Heater. · 2018-05-15 18:40

potatohead,

Don't we need a robust noise source for some DAC modes, or applications?

No idea.

Don't we still have the true random number capability of the P1?

potatohead · 2018-05-15 19:10

Maybe, I don't know that either.

In any case, what we baked in is pretty darn nice. It's going to get used.

TonyB_ · 2018-05-15 19:12

Heater. wrote: »

Brian,

If there is so much uncertainty about the quality of these generators then why is one being hard-baked into the P2?

A couple of reasons I can think of.

1) Speed. I can only assume the hardware PRNG is a lot faster than coding in assembler. I guess somebody, somewhere, sometime, may possibly. maybe, need the speed for some application that I cannot imagine.

I'd be interested to know how much faster the hardware PRNG is compared to a software implementation.

2) It might ensure that a "good" algorithm gets into general use. In the past people of used all kind of PRNs that were terrible.

3) Because we can!

The P2 has two different types of random number generator:

(a) xoroshiro128** in the hub, with 128 state bits of state and 64 output bits.
(b) xoroshiro32++ in each cog, with 32 state bits and 16 output bits. XORO32 does a double iteration of xoroshiro32++ to produce a 32-bit output in two instructions, XORO32 itself followed by an output write instruction (which need not be a simple move).

I doubt that any other microcontroller or microprocessor even has had so much time spent on its inbuilt PRNGs. What the P2 has is very good and nobody need worry about the quality, as those of us involved have used the most up-to-date and most suitable algorithms. We have tried to choose the best set of constants we can for xoroshiro32 (which did not exist until Chip thought of it) and Evan has run thousands and thousands of tests.

Although the P2 does not use the new xoshiro256**, the subject of interesting debate currently, xoroshiro128** does share the same ** scrambler. Originally the hub had a xoroshiro128+ and a fortunate late bug-fix allowed us to change to better constants and implement the new scrambler. The latter produces a much better quality output with all 64 bits usable, instead of only the top 62 with the old + scrambler due to the not-so-good low two bits.

Multiplying by some special value to unscramble the ** output is a non-issue on the P2 as only scrambled subsets are sent to the cogs (and I/O pins) and more on this in the next post. Note that we could have implemented a xoroshiro128++ but it would have needed more logic than xoroshiro128**.

XORO32 is 20 times smaller and faster than a software implementation, which would require 41 instructions and four registers compared to only two instructions and at most two registers. The 16-bit rotations that are easy to do in hardware add 18 instructions in software.

TonyB_ · 2018-05-15 19:29

potatohead wrote: »

Speed. I can only assume the hardware PRNG is a lot faster than coding in assembler. I guess somebody, somewhere, sometime, may possibly. maybe, need the speed for some application that I cannot imagine.

Don't we need a robust noise source for some DAC modes, or applications?

Yes and we have that.

Chip modified his 512-bit RealRandom scrambler to spread the 64-bit xoroshiro128** output equally among the 16 potential cogs and 64 I/O pins. No cog or pin uses the same bit twice and each bit is used eight times in total by 64 pins and by 16 cogs. The first row of 32 bits below is for cog 0 and pins 0-3, probably. If not, I'm sure Chip will correct me.

assign rnd = {
	{ x[17],!x[45],!x[56], x[20],!x[16], x[11],!x[19], x[02],!x[12],!x[51], x[37],!x[06], x[23],!x[07],!x[01],!x[54],!x[44], x[33],!x[24], x[26], x[27], x[05], x[47],!x[43],!x[36],!x[48], x[55],!x[22],!x[52],!x[34],!x[21], x[49]},
	{ x[42],!x[25],!x[61],!x[50], x[41],!x[40], x[29],!x[08],!x[31], x[14], x[39],!x[59], x[28], x[18], x[62],!x[13], x[30], x[10],!x[04],!x[03],!x[58],!x[00],!x[09],!x[32], x[53],!x[60], x[38], x[57], x[63], x[35], x[46],!x[15]},
	{ x[19], x[61], x[48], x[32],!x[39], x[21],!x[09],!x[54], x[59],!x[49],!x[44],!x[11], x[17],!x[62],!x[27], x[13], x[04], x[10], x[43],!x[20], x[08],!x[16],!x[00],!x[58],!x[03],!x[40], x[29], x[31], x[57],!x[41], x[37], x[14]},
	{ x[52],!x[02],!x[15],!x[24], x[56], x[47], x[18],!x[35],!x[42], x[34], x[55], x[53],!x[01],!x[50],!x[26], x[63],!x[45],!x[28],!x[46],!x[30],!x[07],!x[60], x[22], x[12],!x[23], x[51],!x[05], x[25], x[36],!x[38], x[33], x[06]},
	{!x[19], x[02],!x[53],!x[11], x[25], x[06], x[61], x[23], x[09],!x[20],!x[01],!x[49], x[59], x[21], x[30], x[38], x[22],!x[63], x[46], x[42],!x[34],!x[08],!x[43], x[33], x[24],!x[41], x[48], x[12],!x[51], x[16], x[62],!x[29]},
	{ x[44], x[26], x[40], x[18], x[04],!x[52],!x[50], x[03], x[00],!x[54],!x[31], x[32],!x[15], x[36], x[45], x[35], x[47], x[14], x[37], x[55], x[58], x[39],!x[07],!x[13],!x[57], x[60],!x[05], x[17],!x[10], x[56], x[27], x[28]},
	{ x[05], x[57],!x[16], x[15],!x[46], x[34], x[60],!x[27],!x[10], x[19], x[62],!x[02], x[04], x[30],!x[37], x[17],!x[47], x[59],!x[52], x[63],!x[48],!x[06],!x[33],!x[44], x[01],!x[51],!x[09], x[14], x[07],!x[39], x[21], x[13]},
	{!x[38],!x[41],!x[56], x[43], x[22],!x[45], x[12], x[28], x[00], x[54], x[36],!x[42], x[03], x[26], x[24],!x[25], x[61], x[11],!x[40], x[23], x[58],!x[29],!x[53],!x[08],!x[20],!x[31],!x[32], x[50], x[35],!x[49], x[18],!x[55]},
	{!x[32], x[23], x[14],!x[25],!x[40], x[55], x[51],!x[18],!x[29],!x[26], x[28],!x[54],!x[59], x[10],!x[13],!x[16], x[39],!x[37],!x[34], x[12],!x[56], x[27], x[33],!x[05], x[02], x[61], x[24], x[48], x[01],!x[63], x[58],!x[08]},
	{!x[49], x[50], x[53],!x[21], x[07],!x[38],!x[57],!x[19],!x[11], x[17], x[62],!x[09],!x[60],!x[43],!x[44], x[30], x[41],!x[47],!x[45],!x[15],!x[52],!x[04],!x[06], x[22],!x[03],!x[36], x[46], x[35], x[31], x[42],!x[20],!x[00]},
	{!x[18],!x[58],!x[60],!x[11], x[25],!x[48], x[23], x[46], x[20], x[01],!x[02], x[45], x[54],!x[39], x[07],!x[50], x[15], x[62],!x[38], x[19],!x[34], x[13], x[41],!x[31],!x[61],!x[10], x[16],!x[55],!x[56],!x[63], x[33], x[03]},
	{ x[40], x[28],!x[14],!x[09],!x[12],!x[37], x[59],!x[49], x[36],!x[08],!x[17],!x[21],!x[42],!x[24], x[35],!x[52], x[04], x[29],!x[26],!x[57], x[27], x[53],!x[22],!x[51], x[43], x[47],!x[32], x[00], x[06],!x[30], x[44], x[05]},
	{!x[48], x[47],!x[27],!x[38],!x[52],!x[55],!x[31], x[21],!x[00], x[15],!x[23], x[50],!x[51],!x[26],!x[42], x[11], x[06], x[44],!x[22], x[35],!x[04], x[54], x[56],!x[24], x[12], x[37],!x[01], x[18],!x[10],!x[41], x[43],!x[30]},
	{!x[36],!x[57], x[62],!x[59],!x[09],!x[29], x[61],!x[45],!x[63],!x[03],!x[28], x[34], x[19], x[05], x[32],!x[16],!x[49],!x[14], x[07],!x[46], x[53], x[17], x[08], x[40],!x[20], x[60],!x[02],!x[58], x[33], x[13],!x[39], x[25]},
	{!x[01], x[00],!x[07], x[58], x[37],!x[16],!x[18], x[36],!x[41],!x[13], x[20],!x[55], x[35], x[63],!x[57],!x[03],!x[04], x[56], x[21],!x[39], x[44], x[30], x[49],!x[46], x[54],!x[59],!x[28], x[33], x[29], x[12], x[60], x[53]},
	{ x[52],!x[11],!x[34], x[48],!x[50],!x[40], x[27],!x[14],!x[24],!x[51], x[06],!x[17],!x[31], x[45], x[19], x[62], x[23],!x[47],!x[08],!x[05], x[02],!x[22], x[09],!x[43], x[38], x[32],!x[10],!x[26], x[61], x[25], x[42], x[15]} };

Heater. · 2018-05-15 19:56

TonyB_

Thanks for that explanation.

TonyB_ · 2018-05-15 22:26

Compared to a software double-iterated xoroshiro32++, a software xoroshiro64++ is much faster and smaller at 11 instructions and three registers (two for state and one for output). The hardware XORO32 is still over five times faster and smaller than a software implementation. The main reason for using xoroshiro64++ is the much larger period of 2^64-1 and a xoroshiro64+ would need only nine instructions.

It might be possible to implement a xoroshiro64** in hardware in the P2+ or P3 in three instructions and six clock cycles, if a 32-bit multiply of a variable by a constant could be done in three cycles. With a state width twice the output width, xoroshiro is inherently 2-dimensionally equidistributed, which a * or ** scrambler preserves whereas a + or ++ scrambler reduces this to 1-dimensional.

Thus, for xoroshiro64** every possible 32-bit output is repeated 2^32 times (1-dimensional) and every possible output pair occurs exactly once (2-dimensional). xoshiro256** is 4-dimensionally equidistributed, therefore every 64-bit output occurs 2^192 times, every pair 2^128, every triple 2^64 and every quadruple (32 bytes) once. Reduce these frequencies by one if outputs all zeroes in both cases.

TonyB_ · 2018-05-15 22:34

evanh wrote: »

Tony,
I think you missed my final pairs/zero runs distribution report you wanted: http://forums.parallax.com/discussion/download/122744/dist_report4.zip

Or at least you didn't mention any more conclusions after I posted it.

Evan, thanks for the reminder, I did see these reports earlier. The results show zfreq(n)/zfreq(n+1) = e but I need to work out what the zero frequency distribution should be in absolute terms. This requires some clear thinking and I've had lots of other things on my mind. I'm still pondering the meaning of the frequency of n zeroes in a row when n = 0.

potatohead · 2018-05-16 01:22

Bigger sample?

TonyB_ · 2018-05-16 01:33

The frequency of n zeroes in a row when n = 0 has no meaning in terms of zeroes and zfreq(0) should be the total number of non-zero pairs, i.e. zfreq(0) = sum of pfreq(n) for n > 0. Ideally, this should be 2^32(1-1/e) ~ 2714937127 and the typical current value ~ 1718000000 is much too low, by about 10^9.

The reason for the discrepancy is the length of every non-zero run - 1 is added to zfreq(0) and the -1 is the cause of the undercounting. The distribution.c source file needs changing in this section

	// read pair array and increment freq values
	irng = 0;
	do {
		ifreq = pairs[irng++];
		pfreq[ifreq]++;
		if( ifreq != 0 )
		{
			zfreq[ztally]++;
			ztally = 0;
		}
		else  ztally++;
	} while( irng );

In particular this code should be replaced

		{
			zfreq[ztally]++;
			ztally = 0;
		}

by the C-equivalent of the following pseudo-code

		if ztally > 0 then
		  zfreq[ztally] = zfreq[ztally] + 1
		  ztally = 0
		endif
		zfreq[0] = zfreq[0] + 1

potatohead · 2018-05-16 03:28

Thanks for that bit of education. My comment was largely in jest. Truth is, "good random" numbers are significantly more complex than I would have thought.

TonyB_ · 2018-05-16 04:36

Memo to self:
Ideally, if number of samples is N, zfreq(0) = N(1-1/e) and zfreq(x) = N(1-1/e)^2 / e^x for x > 0

evanh · 2018-05-16 13:02

Got it. Revised reports attached.

I've made a multi-case launching script too. It uses all RAM and creates a lot of smaller files now. I mainly chose this approach because I was wary of many programs concurrently trying to append the same files.

	// read pair array and increment freq values
	irng = 0;
	do {
		ifreq = pairs[irng++];
		pfreq[ifreq]++;

		// zero-run freq values
		if( ifreq == 0 )  ztally++;
		else {
			if( ztally != 0 )  zfreq[ztally]++;
			zfreq[0]++;
			ztally = 0;
		}
	} while( irng );

TonyB_ · 2018-05-16 13:52

Thanks a lot, Evan. zfreq(0) is correct now. Looking just at zfreq, [6,2,3,9] and [6,2,11,6] do very well. Not checked others.

Random/LFSR on P2

Comments