Random/LFSR on P2 - Page 39 — Parallax Forums

# Random/LFSR on P2

• Posts: 1,755
evanh wrote: »
TonyB_ wrote: »
TonyB_ wrote: »
Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.

I hadn't looked closely at all the full-period triplets before. For every [a,b,c] there is a corresponding [c,b,a] that shares the same characteristic polynomial. Knowing this in advance means brute force searches can be reduced by half.

Hey Tony, thanks for getting that. It perfectly matches our 84 brute forced findings. I only just really looked at it now.

And the recommendation of targeting only the ones of closely matching "degree" with word size will be valuable with what's coming up ... I've spoken with Sebastiano again and, same as you, I just had to ask and he immediately popped me a complete list of full-period candidates for a Xoroshiro64 iterator! Amazing stuff. It isn't nicely sorted in mirror pairs like your one was though.

Excellent news! Thanks for doing that, Evan. I could sort the list into mirror pairs, if you wish.
• Posts: 11,598
edited 2017-11-12 13:05
TonyB_ wrote: »
Excellent news! Thanks for doing that, Evan. I could sort the list into mirror pairs, if you wish.
29-33 done already while I'm reformatting it for input to the test suite.

• Posts: 1,755
edited 2017-11-12 14:07
cgracey wrote: »
So, we would need to do a separate addition to get the result? The "next s" trick would be used to update one of the 32-bit states?

No and no, but we do need a separate subtract to get s1 on its own.

Code when keeping sum[0]:
```	SUB	prn,s0			' prn = s1
XORO64	s0,prn			' s0,prn = 64-bit state
MOV	prn,0			' prn = sum[31:0] = new s0 + s1

Two registers required.
```

Code when replacing sum[0] with parity:
```	SUB	prn,s0			' prn = s1
XORO64	s0,prn		wz	' s0,prn = 64-bit state; Z = sum[0] ^ parity of sum[32:0] = parity of sum[32:1]
MOV	prn,0			' prn = sum[31:0] = new s0 + s1
MOV	prnp,prn
TESTB	prnp,#0		xorz	' prnp = sum[31:1],parity

Three registers required.
Assumes XORO64 can write Z.
```
• Posts: 1,755
evanh wrote: »
TonyB_ wrote: »
Excellent news! Thanks for doing that, Evan. I could sort the list into mirror pairs, if you wish.
29-33 done already while I'm reformatting it for input to the test suite.

Weights of 29-33 might not give best results. Good starting point, though.

• Posts: 1,755
edited 2017-11-12 14:07
TonyB_ wrote: »
```	SUB	prn,s0			' prn = s1
XORO64	s0,prn		wz	' s0,prn = 64-bit state; Z = sum[0] ^ parity of sum[32:0] = parity of sum[32:1]
MOV	prn,0			' prn = sum[31:0] = new s0 + s1
MOV	prnp,prn
TESTB	prnp,#0		xorz	' prnp = sum[31:1],parity

Three registers required.
Assumes XORO64 can write Z.
```

It is possible to reduce the above to four or only three instructions.

• Posts: 11,598
edited 2017-11-12 14:05
Tony,
In what I presented to Chip, I chose to revert back to just the earlier iterator only type function so as to limit the logic/flops burden in the instruction of the larger number of bits involved. Also, it keeps the execution to a single clock cycle, which I suspect is not the case with your approach.
• Posts: 11,598
edited 2017-11-12 14:04

• Posts: 1,755
edited 2017-11-13 19:21
evanh wrote: »
Tony,
In what I presented to Chip, I chose to revert back to just the earlier iterator only type function so as to limit the logic/flops burden in the instruction of the larger number of bits involved. Also, it keeps the execution to a single clock cycle, which I suspect is not the case with your approach.

If there are no timing issues with doing a couple of XORs before a 32-bit addition then my XORO64 idea will work.

EDIT:
Two parallel xoroshiro32s would be replaced by one xoroshiro64. The inputs to one of the two parallel 16-bit additions have a two xoroshiro delay in XORO32.
• Posts: 11,598
There is the subtraction first I believe.
• Posts: 1,755
The subtraction is a separate instruction.
• Posts: 11,598
Oh, okay.
• Posts: 1,755
edited 2017-11-12 23:52
cgracey wrote: »
What we have now is ideal, I think, and it's easy-to-use as can be. 32 bits is plenty for most uses, I imagine.

And if a *real* random number is needed, you can always just do a GETRND, without any contingencies.

XORO64 could produce 32-bit PRNs in three instructions. There is a new, simple and optional post-processing that should improve its output. I cannot say any more at the moment. We are getting tremendous help from Sebastiano Vigna, who has been testing the following xoroshiro[32]64 triples:
```10-13-19	31

13-3-18		27
14-4-25		25
18-5-7		27
21-3-12		27
27-4-14		27
```

The first one is his current recommendation.
• Posts: 1,755
edited 2017-11-14 13:20
All xoroshiro generators (without additional backend reworking) at some point have some Hamming dependency kicking in. That is, if you look at the number of ones and zeros in a sequence of, say, 10 consecutive outputs, the resulting 10-tuple doesn't have exactly the distribution it should be, as linear generator tend to create correlation between the number of 1s of consecutive outputs.

In xoroshiro128 this happens after several TB of data (at least 32TB), so it's not a real concern. But in smaller-sized generator it will happen before (e.g., "DC6" test from PractRand).

In this case, testing is fundamental because theory cannot really say much except what is full period and what not.
After a quick check, these triples (degree follows) fail DC6 at 128 gigabytes (and, of course, BRank):
```10-13-19	31

13-3-18		27
14-4-25		25
18-5-7		27
21-3-12		27
27-4-14		27
```
The first one has a better degree, so I would go for that.

I'll give a try to the first triple on BigCrush and let you know.
This passes (xoroshiro[32]64+-10-13-19) BigCrush except for the linearity tests.
• Posts: 1,755
edited 2017-11-14 03:15
cgracey wrote: »
Does this look correct for [15,3,6]?
```wire [15:0] xoro32z	= d[31:16] ^ d[15:0];			// first iteration
wire [31:0] xoro32y	= { xoro32z[9:0], xoro32z[15:10],
{d[0], d[15:1]} ^
{xoro32z[12:0], 3'b0} ^ xoro32z };

wire [15:0] xoro32x	= xoro32y[31:16] ^ xoro32y[15:0];	// second iteration
wire [31:0] xoro32	= { xoro32x[9:0], xoro32x[15:10],	// xoro32 = d result
{xoro32y[0], xoro32y[15:1]} ^
{xoro32x[12:0], 3'b0} ^ xoro32x };

wire [16:0] xoro32a	= xoro32y[31:16] + xoro32y[15:0];	// first sum
wire [16:0] xoro32b	= xoro32[31:16] + xoro32[15:0];		// second sum

assign xoro32r		= { xoro32b[15:1], ^xoro32b,		// xoro32r = prng result, next instruction's s value
xoro32a[15:1], ^xoro32a };
```

XORO32 could be improved probably with a little bit of re-ordering. This is not guesswork. PractRand tests are needed to be sure.

Evan, I tried to contact you by private message from prng forum. If you didn't receive it, please try the same, thanks.
• Posts: 11,598
That email address was just a throw away and it's actually stopped by the provided now, I seriously dislike the "social media" tendrils. Not to mention the horrid comment entry interface that Google Groups has. I'll be a very sad puppy if Parallax ever tries to use such rubbish for it's forums.

Fire away, what's the improvement?
• Posts: 11,598
edited 2017-11-14 07:42
It has definitely been helpful reading both Scro's and Seba's comments. It's helped to know what to look for when 100% testing isn't an option ... Here's a summary of what I've got so far:
```  Xoroshiro64+p PractRand Score Table
Summed bit0 removed
Parity prepended at bit31

Candidate   Deg   [31:0]  R=   [31:1]  [31:2]
==============================================
[14  4 25]   25 +   32G   5.8     2T
[25  4 14]   25     32G   3.2     1T
[26  4  9]   25     32G  10.5   256G
[11  2 14]   27     32G  10.4   256G
[15  2 26]   27     32G   4.2   512G
[26  2 15]   27     32G   6.7   512G
[13  3 18]   27 +   32G   1.4   512G
[12  3 21]   27     32G   2.3   512G
[21  3 12]   27 +   32G  -0.6     1T
[14  4 27]   27     32G   5.8     2T
[27  4 14]   27 +   32G   8.3     2T
[18  5  7]   27 +   32G   8.3   256G
[17 10  6]   27     32G  24.8   256G
[20  7 27]   29     32G  20.9   256G
[27  7 20]   29     32G  19.6   256G
[13  5 28]   31     32G  21.8   512G
[28  5 13]   31     32G  12.6     1T
[26  9 13]   31     32G  11.1   512G
[10 13 19]   31 +   32G   3.9   256G

+   Indicative good candidates by Seba.
R=  Raw value of the DC6-9x1Bytes-1 PractRand test for [31:0] sampling variant (Zero is good).

```
• Posts: 11,598
edited 2017-11-14 09:22
Note: The full width 32-bit tests top out at 32 GB. All failed with a massive brick wall BRank. I believe this is due to the summed bit1 being shifted into the more sensitive bit0 position for PractRand.

Example:
[Low1/32]BRank(12):3K(1) R=+21249 p~= 9e-6398 FAIL !!!!!!!!

EDIT:
The second column of scores, [31:1], what I've also called Word1 before, shows more substantial results. These all failed on the DC6-9x1Bytes-1 test in PractRand.

PS: The byte sized sampling variants (except for LSByte) feel like they're going to go a lot further. I tested a pair of low quality candidates out to 4 TB (took more than a day) and it wasn't giving up there.

LSByte, [7:0], always dies at 8 GB (exactly quarter of full 32-bit sampling). These have the same massive BRank fails as the full width variant.

EDIT2:
When placing the parity at bit0 the BRank test fails the [31:0] variant at 512 GB instead of 32G. There is a mix of DC6 failures at 256 GB with these. The group of good scores is diminished and not as obvious.

I'm running the parity at bit0 sweep at the moment.

• Posts: 11,598
Just remember, Tony, shifting a low quality bit further away from bit0 is not improving quality. It's just tricking Practrand into giving higher scores.
• Posts: 1,755
edited 2017-11-14 11:04
deleted
• Posts: 1,755
edited 2017-11-14 12:27
evanh wrote: »
Fire away, what's the improvement?

Email just sent.
• Posts: 1,755
evanh wrote: »
When placing the parity at bit0 the BRank test fails the [31:0] variant at 512 GB instead of 32G. There is a mix of DC6 failures at 256 GB with these. The group of good scores is diminished and not as obvious.

I'm running the parity at bit0 sweep at the moment.

Could you post these Xoroshiro64+p parity at bit0 scores some time?

• Posts: 11,598
edited 2017-11-14 15:18
I overwrote one of the good output files before I'd done the score table. I can post it minus that score ... EDIT: Correction, I did have that score noted already.

Here's the full list, I've only done the Word0 variant:
``` Summed bit0 removed                  Summed bit0 removed
Parity prepended at bit31            Parity appended at bit0

Candidate    [31:0]  [31:1]               [31:0]
============================             ========
[ 1  2 24]    512M    512M                 512M
[24  2  1]    256M    256M                 256M
[ 8  2 25]     32G     32G                  32G
[25  2  8]      8G    128G                 128G
[ 4  2 27]    128M    128M                 128M
[27  2  4]     16G     16G                  16G
[ 1  6 24]     32M     64M                  32M
[24  6  1]      4G      8G                   4G
[19  8 24]     32G    128G                  32G
[24  8 19]      8G     16G                   8G
[14 10 29]      8G      4G                   4G
[29 10 14]     32G     32G                  32G
[ 7 13  8]      2G      8G                   2G
[ 8 13  7]     16M     64M                  16M
[ 2 13 31]     64K      1M                 256K
[31 13  2]      1M      2M                   4M
[24 20 25]    256M    512M                 256M
[25 20 24]     32K      2M                   2M
[19 20 26]      8G      8G                   8G
[26 20 19]      8G      8G                   8G
[ 4 24 27]    256M    256M                 256M
[27 24  4]      4G      8G                   4G
[ 4  3 11]    128M    128M                  64M
[11  3  4]      4G      8G                   4G
[ 7  2 16]      8G     16G                  16G
[16  2  7]     32G     32G                  32G
[ 3  4 22]      8G     16G                   8G
[22  4  3]      8G      8G                   8G
[14  4 25]     32G      2T                 512G
[25  4 14]     32G      1T                 512G
[ 9  4 26]     32G     32G                  32G
[26  4  9]     32G    256G                 256G
[ 5  5 26]    128M    128M                 128M
[26  5  5]    256M    256M                 256M
[ 3  8 10]    256M    512M                 256M
[10  8  3]    256M    256M                 256M
[ 7  8 12]     16G     32G                  16G
[12  8  7]    512M    512M                 512M
[21  8 24]     16G    128G                  16G
[24  8 21]    512M      2G                 512M
[ 5 10 14]     16G     16G                  16G
[14 10  5]      8G     16G                   8G
[20 10 23]     32G     64G                  32G
[23 10 20]    512M      1G                 512M
[24 10 25]      2G      2G                   2G
[25 10 24]     32K     32M                  16M
[26 10 27]      2G      8G                   4G
[27 10 26]     16M     32M                  16M
[11 12 26]     32G     32G                  32G
[26 12 11]     32G     32G                  32G
[13 12 26]    512M    512M                 512M
[26 12 13]     32G    128G                 128G
[ 4 15 23]    512M    512M                 512M
[23 15  4]     16G     16G                   8G
[12 19 25]     32G    128G                  32G
[25 19 12]     16G     16G                  16G
[ 3 24 10]    128M    256M                 128M
[10 24  3]      4G      4G                   4G
[ 2 29 13]     16M     64M                  32M
[13 29  2]      4G      4G                   4G
[13 30 14]      1G      1G                   1G
[14 30 13]      8M      8M                   8M
[14 30 15]      1G      2G                   1G
[15 30 14]      4M      8M                   4M
[12 31 27]     16G     16G                  16G
[27 31 12]     16G     32G                  16G
[19  1 26]     32G     32G                  32G
[26  1 19]     16G     16G                  16G
[ 4  1 31]      2M      1M                   2M
[31  1  4]      2G    256M                   2G
[11  2 14]     32G    256G                 256G
[14  2 11]    256M    256M                 256M
[15  2 26]     32G    512G                 256G
[26  2 15]     32G    512G                 256G
[13  3 18]     32G    512G                 512G
[18  3 13]     16G     32G                  32G
[12  3 21]     32G    512G                 256G
[21  3 12]     32G      1T                 512G
[18  4 19]     16G     32G                   8G
[19  4 18]     64M    256M                 128M
[14  4 27]     32G      2T                 512G
[27  4 14]     32G      2T                 512G
[ 7  5 18]      2G      2G                   2G
[18  5  7]     32G    256G                 256G
[10  6 11]      4G     16G                   8G
[11  6 10]     32M    128M                  32M
[ 4  9 31]     64M     64M                  64M
[31  9  4]      4G      4G                   4G
[ 6 10 17]     32G    128G                  32G
[17 10  6]     32G    256G                  32G
[ 1 11  2]     64K    256K                 128K
[ 2 11  1]     64K    512K                 128K
[10 12 17]     32G     64G                  32G
[17 12 10]     16G     16G                  16G
[ 3 14  8]      4M    512M                 256M
[ 8 14  3]      2G      2G                   2G
[10 19 19]      8G      8G                   8G
[19 19 10]      1G    512M                   1G
[11 21 22]      1G      1G                   4G
[22 21 11]    512M    512M                 512M
[17 21 26]      8G      8G                   8G
[26 21 17]      8G      8G                   8G
[ 2 23 19]     32M     32M                  32M
[19 23  2]      2G      2G                   2G
[18 25 19]      4G      4G                   4G
[19 25 18]      4M      4M                   4M
[ 2 29 25]      8M      8M                   8M
[25 29  2]    512M    512M                 512M
[ 3 31 18]     64M    128M                  64M
[18 31  3]      4G      4G                   4G
[ 4  1 11]    512M    512M                 256M
[11  1  4]      4G      8G                   4G
[ 5  3 12]      1G      2G                   1G
[12  3  5]     16G     32G                  16G
[ 8  5 13]      8G      8G                   8G
[13  5  8]      8G     16G                   4G
[ 5  5 18]    128M    256M                 128M
[18  5  5]    256M    512M                 512M
[ 2  6 21]    128M      1G                 512M
[21  6  2]     16G    128G                  16G
[21  6 24]     16G    256G                  16G
[24  6 21]      1G      8G                   2G
[20  7 27]     32G    256G                  32G
[27  7 20]     32G    256G                  32G
[18  8 21]     32G    128G                  32G
[21  8 18]    512M      1G                 512M
[18 10 23]     32G     32G                  32G
[23 10 18]      4G      8G                   4G
[17 13 20]     32G    128G                  32G
[20 13 17]    128M    128M                 128M
[20 17 25]      4G      8G                   8G
[25 17 20]      1G      2G                   2G
[22 17 29]      2G      4G                   4G
[29 17 22]      8G      8G                   8G
[ 8 19  9]    512M      2G                 512M
[ 9 19  8]     32K      8M                   8M
[ 2 27 17]     16M     16M                  16M
[17 27  2]    256M      1G                   1G
[ 7 28 22]      8G     16G                   8G
[22 28  7]      8G     16G                   8G
[14 31 21]     16G     32G                  16G
[21 31 14]      8G     16G                  16G
[ 5  1 26]      1G      1G                   1G
[26  1  5]     32G     16G                  16G
[13  5 28]     32G    512G                 128G
[28  5 13]     32G      1T                 256G
[ 5  6 20]     16G     32G                  32G
[20  6  5]     16G     32G                  16G
[13  9 26]     16G     32G                  16G
[26  9 13]     32G    512G                 128G
[10 13 19]     32G    256G                 256G
[19 13 10]     32G    128G                 128G
[22 17 27]      4G      4G                   4G
[27 17 22]      1G      1G                   1G
[ 4 23  5]    128M    256M                 128M
[ 5 23  4]      1M      2M                   1M
[18 25 31]    256M    256M                 256M
[31 25 18]    256M    512M                 512M
[18 27 29]      2G      2G                   2G
[29 27 18]      4G      8G                   4G
[ 6 29 11]      4G      8G                   8G
[11 29  6]    512M      1G                 512M
[ 2 31  7]      8M     16M                   8M
[ 7 31  2]     32M     32M                  32M
[18 31 23]      8G     16G                   8G
[23 31 18]      1G      1G                   1G
[ 1 11 10]      4M      4M                   4M
[10 11  1]      1G      1G                   1G
[ 6 21 31]    128M    128M                 128M
[31 21  6]      2G      4G                   2G
[ 9 23 10]      1G      1G                   1G
[10 23  9]      4M      4M                   4M
[13 29 30]      2G      4G                   2G
[30 29 13]      4G      8G                   4G
[14 31 15]      1G      2G                   1G
[15 31 14]      4M      8M                   4M
[ 2  9  7]     16M     32M                  16M
[ 7  9  2]      2G      8G                 256M
[26 18 29]      2G      2G                   2G
[29 18 26]     64M     64M                  64M
[ 1 19 12]     16M     32M                  16M
[12 19  1]      1G      2G                   1G
[ 8 25 17]      8G      8G                   8G
[17 25  8]    128M      8G                   4G
[ 5 15  8]    128M      8G                   4G
[ 8 15  5]    512M    512M                 512M
[ 2 15 25]     32M     32M                  32M
[25 15  2]      2G      2G                   2G
```
• Posts: 11,598
edited 2017-11-14 15:27
Here's the revised high quality group: (BRanks hit at 512 GB for parity at bit0)
``` Xoroshiro64+p PractRand Score Table
Summed bit0 removed                    Summed bit0 removed
Parity prepended at bit31              Parity appended at bit0

Candidate   Deg   [31:0]  R=   [31:1]           [31:0]  R=
======================================         =============
[14  4 25]   25 +   32G   5.8     2T             512G  48.4
[25  4 14]   25     32G   3.2     1T             512G  44.3
[26  4  9]   25     32G  10.5   256G             256G
[11  2 14]   27     32G  10.4   256G             256G
[15  2 26]   27     32G   4.2   512G             256G
[26  2 15]   27     32G   6.7   512G             256G
[13  3 18]   27 +   32G   1.4   512G             512G  48.1
[12  3 21]   27     32G   2.3   512G             256G
[21  3 12]   27 +   32G  -0.6     1T             512G  45.5
[14  4 27]   27     32G   5.8     2T             512G  35.6
[27  4 14]   27 +   32G   8.3     2T             512G  39.3
[18  5  7]   27 +   32G   8.3   256G             256G
[17 10  6]   27     32G  24.8   256G              32G
[20  7 27]   29     32G  20.9   256G              32G
[27  7 20]   29     32G  19.6   256G              32G
[13  5 28]   31     32G  21.8   512G             128G
[28  5 13]   31     32G  12.6     1T             256G
[26  9 13]   31     32G  11.1   512G             128G
[10 13 19]   31 +   32G   3.9   256G             256G

+   Indicative good candidates by Seba.
R=  Raw value of the DC6-9x1Bytes-1 PractRand test for [31:0] sampling variant (Zero is good).

```
• Posts: 11,598
I would class five of those candidates as not making the grade:
[17 10 6]
[20 7 27]
[27 7 20]
[13 5 28]
[26 9 13]
• Posts: 1,755
edited 2017-11-14 15:31
Thanks, Evan. It's unfortunate there are no [31:1] scores for parity at bit 0. Never mind and don't let that disturb your current tests!

I've attached the xoroshiro[32]64+ characteristic polynomials sorted by weight/degree in CSV format.
• Posts: 1,755
evanh wrote: »
Here's the revised high quality group: (BRanks hit at 512 GB for parity at bit0)
``` Xoroshiro64+p PractRand Score Table
Summed bit0 removed                    Summed bit0 removed
Parity prepended at bit31              Parity appended at bit0

Candidate   Deg   [31:0]  R=   [31:1]           [31:0]  R=
======================================         =============
[14  4 25]   25 +   32G   5.8     2T             512G  48.4
[25  4 14]   25     32G   3.2     1T             512G  44.3
[26  4  9]   25     32G  10.5   256G             256G
[11  2 14]   27     32G  10.4   256G             256G
[15  2 26]   27     32G   4.2   512G             256G
[26  2 15]   27     32G   6.7   512G             256G
[13  3 18]   27 +   32G   1.4   512G             512G  48.1
[12  3 21]   27     32G   2.3   512G             256G
[21  3 12]   27 +   32G  -0.6     1T             512G  45.5
[14  4 27]   27     32G   5.8     2T             512G  35.6
[27  4 14]   27 +   32G   8.3     2T             512G  39.3
[18  5  7]   27 +   32G   8.3   256G             256G
[17 10  6]   27     32G  24.8   256G              32G
[20  7 27]   29     32G  20.9   256G              32G
[27  7 20]   29     32G  19.6   256G              32G
[13  5 28]   31     32G  21.8   512G             128G
[28  5 13]   31     32G  12.6     1T             256G
[26  9 13]   31     32G  11.1   512G             128G
[10 13 19]   31 +   32G   3.9   256G             256G

+   Indicative good candidates by Seba.
R=  Raw value of the DC6-9x1Bytes-1 PractRand test for [31:0] sampling variant (Zero is good).

```

Big difference in two sets of R values.
• Posts: 11,598
edited 2017-11-14 16:29
Yeah, It's probably not as bad as it looks because I think the scoring processed value gets scaled down more for larger datasets but those were bad enough to all fail the DC6 at the 512 MB mark as well as the BRank fail.

EDIT: The reason I included them is the 35.6 is the lowest. Just another datapoint to consider.
• Posts: 1,755
edited 2017-11-15 21:20
cgracey wrote: »
The current xoroshiro128+ implementation is as small as can be.

Every cog gets a uniquely-picked/ordered/inverted set of 32 bits from the 63-bit source. Every smart pin gets 8 such bits. Smart pins need these for DAC dithering and noise generation. Cogs need them for the GETRND instruction. I think cogs and pins will all be happy with this arrangement. These patterns really only need to serve as apparently-uncoordinated white noise sources.

On start-up, the PRNG will be 1/4 seeded maybe 64 times with 4 bits of pure thermal noise, along with another 27 bits of process/voltage/temperature-dependent bits.

A few questions:

1. Has the parity improvement been applied to xoroshiro128+ ? That would make a 64-bit source and could mean each bit is used the same number of times for the different cogs and smart pins.

2. Is there enough time to do the 64-bit addition in a single clock?

3. Have the different cog and smart pin scramblings been made public? (I don't want to hold up Chip with this one.)
• Posts: 13,610
edited 2017-11-15 21:44
No problem. Here is the Verilog that sets up 16 longs (one long for each cog, each byte goes to a unique pin):
```assign rnd = {
{!x[00], x[62],!x[02], x[28],!x[09], x[55], x[03],!x[34], x[45], x[37],!x[11], x[07],!x[41],!x[50],!x[52],!x[59],!x[27],!x[40], x[01], x[57], x[17],!x[53],!x[24],!x[22],!x[58], x[25], x[16],!x[29],!x[43], x[32],!x[14], x[30]},
{!x[47], x[49],!x[35],!x[38], x[33],!x[31], x[44], x[13], x[06], x[54],!x[05],!x[10],!x[23],!x[19],!x[51], x[20],!x[39],!x[18],!x[46],!x[42], x[36], x[08], x[04], x[60],!x[48],!x[56],!x[26], x[21],!x[12], x[61],!x[15], x[16]},
{!x[23],!x[21],!x[20],!x[47], x[10], x[48], x[51], x[07],!x[19], x[03],!x[50], x[60],!x[36], x[38],!x[33],!x[06],!x[28], x[27], x[00], x[53], x[24], x[52],!x[14],!x[29],!x[04], x[08], x[42],!x[13],!x[45],!x[56],!x[54], x[61]},
{!x[22], x[15], x[41],!x[05], x[59], x[58],!x[35],!x[26],!x[11],!x[12], x[31],!x[01], x[39],!x[43],!x[62],!x[25],!x[55],!x[32],!x[44],!x[02],!x[46],!x[37], x[09], x[17], x[49],!x[18],!x[34], x[40],!x[57], x[30],!x[56],!x[08]},
{!x[37],!x[42],!x[33], x[35], x[39], x[17],!x[10], x[09], x[23],!x[02],!x[60],!x[36], x[18],!x[00], x[32],!x[21],!x[27], x[34], x[20], x[53], x[48], x[15], x[55],!x[14], x[19],!x[62], x[04],!x[29], x[22],!x[28],!x[06], x[03]},
{!x[45], x[26], x[47],!x[43],!x[50],!x[11], x[57],!x[44], x[31],!x[12],!x[41], x[58],!x[13], x[25], x[07],!x[30], x[05],!x[61],!x[49],!x[59], x[46], x[01], x[52],!x[51],!x[32],!x[16],!x[24],!x[38],!x[54],!x[21], x[37], x[40]},
{ x[44],!x[30], x[52], x[62],!x[58], x[45], x[38],!x[55], x[00],!x[12], x[14], x[47],!x[13], x[39], x[53], x[40],!x[18], x[31],!x[03], x[41],!x[48],!x[27], x[09],!x[07],!x[17],!x[15],!x[56], x[20], x[16], x[51],!x[59],!x[11]},
{ x[04], x[06], x[22], x[26],!x[50], x[29],!x[34], x[19], x[60], x[05],!x[49], x[57],!x[33], x[10],!x[01], x[46], x[24],!x[08], x[25], x[23], x[35],!x[42], x[54], x[43],!x[36],!x[02],!x[31],!x[28], x[13],!x[15],!x[61],!x[53]},
{!x[27], x[44],!x[33], x[52],!x[34], x[29],!x[20],!x[35], x[01], x[11], x[60],!x[14],!x[45], x[10],!x[36], x[22],!x[21],!x[55], x[62],!x[46],!x[16],!x[26],!x[07],!x[30],!x[17], x[51], x[02],!x[48],!x[42], x[43],!x[08], x[19]},
{!x[38], x[49], x[05], x[28],!x[39], x[23],!x[57],!x[25],!x[18],!x[54],!x[61],!x[24],!x[09],!x[03], x[56], x[04], x[41], x[37], x[40],!x[47],!x[00],!x[06],!x[59],!x[50], x[51],!x[26],!x[22], x[32],!x[58], x[27],!x[12], x[48]},
{ x[12],!x[17], x[04],!x[54],!x[18], x[02],!x[35], x[44],!x[40], x[50], x[21], x[37], x[15], x[58],!x[19],!x[42], x[41], x[00],!x[47],!x[56], x[46], x[20], x[34], x[11], x[45], x[32],!x[10],!x[60], x[23],!x[62],!x[61], x[13]},
{ x[03], x[53],!x[08], x[33], x[29],!x[01], x[07],!x[28],!x[31],!x[24],!x[57], x[25], x[43],!x[49], x[55],!x[38],!x[06],!x[09], x[05],!x[59],!x[30], x[39], x[16],!x[52], x[14], x[21],!x[20],!x[36], x[13],!x[04],!x[37],!x[34]},
{!x[14],!x[36],!x[35],!x[05], x[45],!x[06],!x[54],!x[41], x[23],!x[15],!x[26],!x[49],!x[39], x[30], x[29],!x[38], x[22],!x[01], x[55],!x[43],!x[46], x[52],!x[53], x[40],!x[16], x[27],!x[00],!x[57],!x[18], x[62], x[28],!x[58]},
{ x[07], x[10], x[32], x[50], x[03],!x[12], x[59], x[09],!x[19], x[56],!x[44], x[25], x[47],!x[08], x[31],!x[51], x[17], x[42],!x[61],!x[02],!x[33], x[24], x[30], x[04],!x[43], x[48],!x[49],!x[11],!x[22],!x[29],!x[60],!x[38]},
{ x[42], x[37], x[10], x[08], x[61], x[21],!x[00], x[20], x[51], x[57],!x[33],!x[01],!x[03],!x[26],!x[32],!x[05],!x[13], x[19],!x[48],!x[31],!x[02],!x[60],!x[07],!x[36],!x[62], x[27],!x[34],!x[14], x[50],!x[16],!x[17], x[11]},
{!x[56],!x[23],!x[28], x[15], x[45], x[09],!x[24],!x[46],!x[53],!x[25],!x[52],!x[39],!x[40], x[44],!x[55], x[54], x[59],!x[21],!x[06], x[08],!x[18],!x[01],!x[11],!x[61], x[47],!x[41], x[12], x[49],!x[20],!x[29], x[35],!x[58]} };
```

A few bits get used 9 times, while the rest get used 8 times in those longs. This is because, as you know, we only have 63 bits in x[62:0]. No long uses the same bit twice (cog GETRND sources), nor does any byte (smart pin noise sources). I think the distribution is good enough that adding that parity fix wouldn't make any meaningful difference.

Here is the Spin program for the Prop1 that a I wrote to randomly come up with these patterns:
```''*********************************************
''*  Pick 16 random 32-tap sets from 63 bits  *
''*********************************************

CON

_clkmode        = xtal1 + pll16x
_xinfreq        = 5_000_000

OBJ

fds     : "FullDuplexSerial"
rr      : "RealRandom"

VAR

byte poolsize
byte pool[63]
byte uses[63]
byte taps[16*32]

PUB start | i, j, k

'start serial
fds.start(31,30,0,112500)

'start RealRandom
rr.start

'init pool
poolsize := 63
repeat i from 0 to poolsize - 1
pool[i] := i

'pick random tap sets
fds.str(string(13,13,"Picking taps...",13))
repeat j from 0 to 15
'scramble current pool
fds.str(string(13,"Pool "))
fds.hex(j,1)
scramble_pool
optimize_pool
show_pool
'fill a set of taps, track uses
repeat i from 0 to 31
taps[j*32+i] := pool[i]
uses[pool[i]]++

'output tap sets
fds.str(string(13,13,"16 tap sets in Verilog:",13,13))
repeat j from 0 to 15
fds.tx("{")
repeat i from 0 to 31
if rr.random & 1
fds.tx("!")
else
fds.tx(" ")
fds.str(string("x["))
k := taps[j*32+i]
if k < 10
fds.tx("0")
fds.dec(k)
fds.tx("]")
if i <> 31
fds.tx(",")
else
fds.str(string("},",13))

PRI scramble_pool | i, j

repeat 100
repeat i from 0 to poolsize - 1
if rr.random & 3 'mainly rotate
j := pool[i]
if i == poolsize - 1
pool[i] := pool[0]
pool[0] := j
else
pool[i] := pool[i+1]
pool[i+1] := j

PRI optimize_pool | i, j, k

repeat 10
repeat i from 0 to 31
'look for a used pool member in 0..31
if uses[pool[i]] > 0
'see if it can be swapped out with a less-used member in 32+
repeat j from 32 to poolsize - 1
if uses[pool[j]] < uses[pool[i]]
k := pool[i]
pool[i] := pool[j]
pool[j] := k
quit

PRI show_pool | i

'show pool as a bunch of characters with usage underneath, space at 32
fds.tx(13)
repeat i from 0 to poolsize - 1
fds.tx(\$40 + pool[i])
if i == 31
fds.tx(" ")
fds.tx(13)
repeat i from 0 to poolsize - 1
fds.hex(uses[pool[i]],1)
if i == 31
fds.tx(" ")
fds.tx(13)
```
• Posts: 1,755
Thanks a lot, Chip. How about Q2?
TonyB_ wrote: »
2. Is there enough time to do the 64-bit addition in a single clock?

If the answer is yes, I'm wondering whether two 16-bit additions could be done in one clock, with second using result of the first, perhaps with bespoke carry logic.