Random/LFSR on P2

TonyB_ · 2017-10-28 10:44

I've been looking again at the PractRand xoroshiro32+ scores for [14,2,7] and [15,3,6], the best two triplets right from the beginning, I believe.

Bits 0-15 normal:

			  Xoroshiro32+ sum bits, sum[0] = sum[16:0] parity
	       [15:0]  [15:1]  [15:2]  [15:8]  [11:4]   [9:2]   [8:1]   [7:0]	[15] 
====================================================================================
 [14  2  7]	512M	  1G	512M	  1G	512M	512M	128M	512M	 1G
 [15  3  6]	512M      1G    512M    512M    512M    256M    128M      1G	 1G

[14,2,7] has better top byte than bottom, [15,3,6] is the reverse and therefore [14,2,7] is the better choice. Both have better bits[7:0] than [8:1] which suggests bit 8 is not so random but it's how the bits perform as a group that matters.

Bits 1 and 9 reversed:

		          Xoroshiro32+ sum bits, sum[0] = sum[16:0] parity
	       [15:0]  [15:1]  [15:2]  [15:8]  [11:4]   [9:2]   [8:1]   [7:0]	[15] 
====================================================================================
 [14  2  7]	512M	  2G	256M	256M	 16M	512M	256M	512M	 1G
 [15  3  6]	512M	  2G	512M	512M	512M	512M	512M	  1G	 1G

I don't think we can swap arbitrary bits willy-nilly. I'd expect [15,3,6] to perform worse if other bits were swapped.

The bits that really matter to the user are the whole word, top byte, bottom byte and top bit:

		   Xoroshiro32+ sum bits
		 sum[0] = sum[16:0] parity
	       [15:0]  [15:8]  [7:0]	[15] 
============================================
 [14  2  7]	512M	 1G	512M	 1G
 [15  3  6]	512M    512M     1G	 1G

Thanks to the full parity, the bottom byte is now at least as good as the whole word and sometimes better.

Tor · 2017-10-28 10:57

TonyB_ wrote: »

Could you simply return x[ i ] (spaces to prevent italics) and get rid of prn? I only added prn to make things clear.

The nice thing about using even the simplest optimizing compiler is that it allows you to write code so that it makes things clear. The compiler will remove prn for you, so it can safely stay.

evanh · 2017-10-28 10:57

On that reasoning, if the parity bit is really that good then it probably should become the msbit and all others moved down.

TonyB_ · 2017-10-28 11:06

The only reason for making the parity bit the msb is if it gets a 2G score on its own, i.e. better than sum[15]. It would be nice to have a sum[0] test, but I hope there's a CMWC test churning away currently!

evanh · 2017-10-28 11:08

I'm watching tele

evanh · 2017-10-28 11:26

Here's a score table for the parity placed at bit 15 instead of bit 0:

  Xoroshiro32+p PractRand Score Table - Run 2017-10-29 00:20:50 +1300

Combination    Word0   Word1   Word2  msByte  Byte04   Byte2   Byte1   Byte0   msBit
=====================================================================================
 [ 1  2  8]      2M     32M     64M     32M     32M     32M     64M     32M      1G
 [ 2  1 11]      4M      8M      8M    128K    128K    128K    128K    256K      1G
 [ 2  1 15]     64K    128K    128K     64K     32K     64K     64K     32K      1G
 [ 2  2  7]      4M      8M     16M     64M     64M     32M     32M      4M      1G
 [ 2  6 15]      1M      4M      8M     64M    256K    128K    128K    128K      1G
 [ 2  8  7]      8M     16M     16M    512M     64M     32M     32M      4M      1G
 [ 2  9  9]    128K    512K    256K     64K     64K    128K    512K     64M      1G
 [ 2 11  3]      2M      4M      8M    128M    512K    256K    256K    256K      1G
 [ 3  2  6]     32M     64M     32M      8M      2M      2M      2M      1M      1G
 [ 3  3 10]     16M     32M     16M      1M    256K    256K    512K     64M      1G
 [ 3 11  2]    256K    512K    512K    128K    128K     64K     64K     64K      1G
 [ 3 11 14]      8M      8M      8M    256K    256K    256K    512K    512K      1G
 [ 4  1  9]      1M    512K      2M    512K      4M     64M    256K    512K      1G
 [ 4  7 15]     16M     64M     64M    256M    512K    512K    512K    512K      1G
 [ 4  8  5]     64M    256M    256M     16M      8M      8M      4M      2M      1G
 [ 5  2  6]     64M    256M    128M      8M      8M      8M      4M      4M      1G
 [ 5  8  4]      2M     32M     32M      4M      1M    512K    256K    128K      1G
 [ 5 14 12]    128M    128M     64M      8M      4M      4M     16M    128M      1G
 [ 6  2  3]     32M     32M     32M      1M      1M      1M      1M      2M      1G
 [ 6  2  5]      8M     16M     16M    512K    256K    256K    512K    512K      1G
 [ 6  2 11]    256M    512M    256M     64M      8M    256M    256M    128M      1G
 [ 6  3 15]     32M     32M     32M      1M      1M      1M      2M      4M      1G
 [ 6 14 11]     64M    256M    512M     64M     64M    256M    128M     16M      1G
 [ 7  1  8]    512K    128M    128M     16M     64M     64M     16M     16M      1G
 [ 7  2  2]      8M     16M     16M      2M      2M      2M      4M      4M      1G
 [ 7  2 10]    256M    512M    512M    256M    256M    512M     64M     32M      1G
 [ 7  2 14]    256M    512M    128M      8M      8M     16M     32M     32M      1G
 [ 7  8  2]    128M    512M    256M     64M     16M      4M      4M      4M      1G
 [ 7 10 10]     64M    256M    128M     32M     64M     64M    256M    128M      1G
 [ 7 15  8]    512K      2M      2M    256K    512K      1M    512K    256K      1G
 [ 8  1  7]    512K      4M      4M     16M      1M      1M      1M      1M      1G
 [ 8  2  1]      2M     32M    128M      8M      8M      8M     16M     16M      1G
 [ 8  5 13]     64M     64M     32M     32M      4M     16M    128M    128M      1G
 [ 8  7 15]    512K      2M      2M      1M      2M      2M      2M      1M      1G
 [ 8  9 13]    512M      1G    128M    512M    512M    512M    512M    128M      1G
 [ 8 15  7]    512K      1M      1M    256K    512K    512K    512K    256K      1G
 [ 9  1  4]     16M     32M     32M      8M      8M     16M     16M     16M    128M
 [ 9  2 14]    512M      2G      1G    256M    256M     64M     64M    128M      1G
 [ 9  8 14]      8M     32M     64M    256M     64M     64M     32M     64M      1G
 [ 9  9  2]     32M     64M     64M     32M     16M     16M     16M     32M      1G
 [10  2  7]     64M    256M    256M    128M     16M     16M     64M     32M      1G
 [10  3  3]     32M    128M    128M     32M     32M     32M     32M     64M      1G
 [10  3 11]    256M      2G      1G    512M    256M    256M    512M    128M      1G
 [10  5 13]    512M      2G      1G    512M    512M    128M    256M    128M      1G
 [10  7 11]    256M      1G    256M    256M    128M    128M    512M    128M      1G
 [10 10  7]     16M     16M     16M    128M     16M     16M    512M    128M      1G
 [11  1  2]    256M    512M    512M     32M     32M     32M     32M     32M      1G
 [11  1 14]    512M      2G      2G    512M     32M     16M     16M     16M      1G
 [11  2  6]    256M      1G      1G     32M     32M    512M    512M    128M      1G
 [11  3 10]      4M    128M     64M     64M     16M    256K    128K     64K      1G
 [11  7 10]      2M     16M     16M     64M     64K    128K    128K     64K      1G
 [11  8 12]    128M    512M    512M      1G    512M    512M    512M    128M      1G
 [11 10 12]     32M    256M    512M    512M    512M    512M    512M    128M      1G
 [11 11 14]     32M    128M    256M    256M     16M     16M     16M      8M      1G
 [11 14  6]    128M    512M    256M    256M     32M    512M    512M    128M      1G
 [12  4 15]    128M      1G      1G    128M     64M     16M     16M     16M      1G
 [12  8 11]      1M      8M     16M    128M    128K    128K    128K     64K      1G
 [12 10 11]      1M      4M      4M      1M    128K    128K    128K     64K      1G
 [12 10 13]     32M     64M    128M     64M     16M     16M     32M      8M      1G
 [12 14  5]    128M     64M     32M    256M    128M    128M     64M     64M      1G
 [13  3 14]    128M    512M    512M    128M     32M     16M     16M      8M      1G
 [13  5  8]    128M      1G    512M    128M    128M    128M    256M    128M      1G
 [13  5 10]    128M    128M     64M    128M    128M    256K    256K    256K      1G
 [13  9  8]    128M    128M    128M    512M    512M    512M    512M    128M      1G
 [13 10 12]      1M      4M      4M    128K    128K    128K    128K     64K      1G
 [13 11 14]     16M     16M     16M     16M      8M      8M      8M      8M      1G
 [13 12 14]     16M     16M     16M     16M      8M      8M      4M      4M      1G
 [13 13 14]     16M     16M     16M     16M      8M      8M      8M      8M      1G
 [14  1 11]     32M     64M     32M    256K    256K    256K    256K    256K      1G
 [14  2  7]    512M      1G      1G    256M    512M    512M    512M    128M      1G
 [14  2  9]    256M    512M    256M    512M    128M     64M     64M    128M      1G
 [14  3 13]      4M     16M     16M      1M      1M    256K     64K     64K      1G
 [14  8  9]    256M    128M     64M      2G     32M     64M     64M     64M      1G
 [14 11  3]    256M    512M      1G    256M    128M    128M    128M     64M      1G
 [14 11 11]     16M     32M     16M    128K    256K    256K    128K    256K      1G
 [14 11 13]    512K    512K    512K    128K    128K     64K     64K     64K      1G
 [14 12 13]    256K    512K    512K    128K    128K     64K     64K     64K      1G
 [14 13 13]    256K    512K    256K    128K    128K     64K    128K     64K      1G
 [15  1  2]      2M      2M      4M      2M    512K    512K    512K    256K      1G
 [15  3  6]    512M      2G      1G    512M    512M    512M    256M    128M      1G
 [15  4 12]     32M    128M    128M    256M     64M    512K    256K    256K      1G
 [15  6  2]     64M    256M    512M    256M     64M     64M     32M     16M      1G
 [15  7  4]    256M      2G      2G      1G    512M    256M    256M    128M      1G
 [15  7  8]     32M    128M    128M     64M     64M     16M      4M      2M      1G

evanh · 2017-10-28 11:27

It's mostly similar. Only thing of note is the msbit scores are almost entirely 1G, rather than a somewhat more mixed bag previously. And "Byte0" is now the weakest again.

Also, [15 3 6] may now just sneak past [14 2 7].

TonyB_ · 2017-10-28 11:35

Never mind that, what about CMWC?

TonyB_ · 2017-10-28 11:46

evanh wrote: »

It's mostly similar. Only thing of note is the msbit scores are almost entirely 1G, rather than a somewhat more mixed bag previously. And "Byte0" is now the weakest again.

Also, [15 3 6] may now just sneak past [14 2 7].

sum[0] = sum[16:0] parity:

		   Xoroshiro32+ sum bits
		 sum[0] = sum[16:0] parity
	       [15:0]  [15:8]  [7:0]	[15] 
============================================
 [14  2  7]	512M	 1G	512M	 1G
 [15  3  6]	512M    512M     1G	 1G

sum[15] = sum[16:0] parity, other bits shifted right:

		   Xoroshiro32+ sum bits
		 sum[15] = sum[16:0] parity
                   sum[15:1] -> sum[14:0]
	       [15:0]  [15:8]  [7:0]	[15] 
============================================
 [14  2  7]	512M	256M	128M	 1G
 [15  3  6]	512M    512M    128M	 1G

Parity on its own get a 1G so bits 31 or 16 or 15 or 0 of XORO32 output could all be used for a high-quality random bit. Low byte not so good now, therefore best to keep things as they are with parity in bit 0.

evanh · 2017-10-28 12:10

Lol, I've totally forgotten how to use PractRand by hand ... Anyway, result of CMWC8 scoring is 32M. Attached is the raw output of PractRand.

TonyB_ · 2017-10-28 12:20

Thanks, Evan. Apologies for the inconvenience. In a word, then, CMWC8 is worse than top or bottom byte of xoroshiro32+ [14,2,7]?

evanh · 2017-10-28 12:27

Yep. Though, it's probably not very fair comparing to an 8-bit generator. That said, there is substantially more state to CMWC8 compared to Xoroshiro32.

I'd imagine Melissa's geometric tests will be more favourable to CMWC.

TonyB_ · 2017-10-28 12:39

I was told this 8-bit CMWC is "is in an entirely different league" and it is, a much inferior league!

David Roberts has also used CMWC:
https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/51476-complementary-multiply-with-carry-with-lag-prng

Other relevant posts of his:
https://forum.powerbasic.com/forum/user-to-user-discussions/programming/749040-xorshift128-prng
https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/749403-xorshift-prng

TonyB_ · 2017-10-28 12:47

"Random Number Generators" by George Marsaglia, 2003
Download from http://digitalcommons.wayne.edu/jmasm/vol2/iss1/2/

evanh · 2017-10-28 12:49

Here's some initial output of my CMWC8 if you'd like to compare at all.

TonyB_ · 2017-10-28 13:02

evanh wrote: »

I wonder if Mr Robert's parity calculation includes the carry bit. That is what made the big improvement to the parity idea.

Including the carry was another bit of inspired thinking by Chip. I was moving in the other direction with only sum[15:1].

TonyB_ wrote: »

I have found a trick that apparently makes all the bits of the sum pass the binary-rank test:
https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/749403-xorshift-prng?p=751037#post751037 (post #8).

You can see the parity calculation in x86 code in post #9 (for some reason my browser cannot go directly to these posts). I think the carry is not included.

I'll try to get in touch with David Roberts and the inventors of the Xoroshiro+ algorithm over the weekend to tell them what we have done to improve it.

TonyB_ · 2017-10-28 13:05

evanh wrote: »

Here's some initial output of my CMWC8 if you'd like to compare at all.

Thanks Evan. Our first 16 values agree and that's when I stopped!

TonyB_ · 2017-10-28 13:11

Summary of xoroshiro+ enhancement / improvement:

* Replace bit 0 of sum with parity of all sum bits including the carry bit

* Parity is at worst as random as the most-significant bit, the original suggestion for a random Boolean value, from PractRand tests

evanh · 2017-10-28 13:38

TonyB_ wrote: »

evanh wrote: »

I wonder if Mr Robert's parity calculation includes the carry bit. That is what made the big improvement to the parity idea.

Including the carry was another bit of inspired thinking by Chip. I was moving in the other direction with only sum[15:1].

TonyB_ wrote: »

I have found a trick that apparently makes all the bits of the sum pass the binary-rank test:
https://forum.powerbasic.com/forum/user-to-user-discussions/source-code/749403-xorshift-prng?p=751037#post751037 (post #8).

You can see the parity calculation in x86 code in post #9 (for some reason my browser cannot go directly to these posts). I think the carry is not included.

You might have to give me some pointers on where to look in that code. I don't see any mention of parity or carry in that source. The xoroshiro128+ assembly ends at the summing.

PowerBASIC and x86 asm are entirely foreign to me.

evanh · 2017-10-28 13:48

TonyB_ wrote: »

* Parity is as random as the most-significant bit, the original suggestion for a random Boolean value, from PractRand tests

I'd put it as slightly better on the basis of consistency. bit15 wasn't anywhere the same consistency across all the full period triplets.

evanh · 2017-10-28 14:06

TonyB_ wrote: »

I'll try to get in touch with David Roberts and the inventors of the Xoroshiro+ algorithm over the weekend to tell them what we have done to improve it.

Sebastiano Vigna is aware of our work and is mildly amused at a poky 32-bit state. That said, he might be interested in the parity mod though.

EDIT: Actually, he probably deliberately didn't include a parity step because it's so processing costly for a CPU to execute without a custom instruction.

potatohead · 2017-10-28 16:34

Great, interesting work. I've read with interest.

I like that you have agreement across a PC, P2 and a Z80.

Had I the time, I would love to add 6809, 6502 to that list.

Chip says, "so many neat things in P2", inspired by people's thoughts.

Yes.

Will be so very interesting to see what people do with it. Could be next year!

Random numbers are weird. Learned some new concepts following this work. Thank you.

Heater. · 2017-10-28 17:31

Spud,

I don't understand what you say there. I would like to think that any PRNG algorithm is independent of whatever machine runs it. P2, Z80, 6809, whatever.

Actually I have not understood this whole thread.

Seems to be a lot of random tweaking of PRNG generators and expecting the output is more "random" and has long periods.

Why not just use the algorithms as published that have been analysed already?

TonyB_ · 2017-10-28 18:19

Heater. wrote: »

Spud,

I don't understand what you say there. I would like to think that any PRNG algorithm is independent of whatever machine runs it. P2, Z80, 6809, whatever.

I think potatohead is amused at the different ways of testing the new and improved XORO32 algorithm: PASM2, Z80, QuickBASIC and C.

Even though we're suited, I'm slightly surprised nobody has done it in Forth.

Heater. wrote: »

Actually I have not understood this whole thread.

Seems to be a lot of random tweaking of PRNG generators and expecting the output is more "random" and has long periods.

Why not just use the algorithms as published that have been analysed already?

The P2 needed a PRNG and because of this thread it has the best one there is as of October 2017, in terms of ease of implementation, ease of use, small state and high quality.

Until yesterday, we had to sum the XORO32 output to get only 15 high-quality bits as bit 0 was rubbish. Since yesterday, bit 0 is the most random of the lot and XORO32 gives us 32-bit PRNG bits directly.

There is still work to do, though. I want to get the jump function working, which will help us select non-overlapping sequences. As you can speak geek, perhaps you could help with the C code below. What does i < sizeof JUMP / sizeof *JUMP mean?

/* This is the jump function for the generator. It is equivalent
   to 2^64 calls to next(); it can be used to generate 2^64
   non-overlapping subsequences for parallel computations. */

void jump(void) {
	static const uint64_t JUMP[] = { 0xbeac0467eba5facb, 0xd86b048b86aa9922 };

	uint64_t s0 = 0;
	uint64_t s1 = 0;
	for(int i = 0; i < sizeof JUMP / sizeof *JUMP; i++)
		for(int b = 0; b < 64; b++) {
			if (JUMP[i] & 1ULL << b) {
				s0 ^= s[0];
				s1 ^= s[1];
			}
			next();
		}

	s[0] = s0;
	s[1] = s1;
}

potatohead · 2017-10-28 18:20

Correct Tony. Was just fun.

Conga · 2017-10-28 19:16

TonyB_ wrote: »

perhaps you could help with the C code below. What does i < sizeof JUMP / sizeof *JUMP mean?

(sizeof ARRAY_VARIABLE / sizeof ARRAY_VARIABLE[0]), or
(sizeof ARRAY_VARIABLE / sizeof *ARRAY_VARIABLE),
gives the the number of elements in the array variable.
It's idiomatic C.

TonyB_ · 2017-10-28 20:21

Conga wrote: »

TonyB_ wrote: »

perhaps you could help with the C code below. What does i < sizeof JUMP / sizeof *JUMP mean?

(sizeof ARRAY_VARIABLE / sizeof ARRAY_VARIABLE[0]), or
(sizeof ARRAY_VARIABLE / sizeof *ARRAY_VARIABLE),
gives the the number of elements in the array variable.
It's idiomatic C.

Thanks for the reply.

So sizeof JUMP and sizeof *JUMP are in effect the same thing? In this case 64?

evanh · 2017-10-28 22:29

sizeof JUMP is the entire array
sizeof *JUMP is just one element

Conga gave an alternative that I find way clearer in intent: sizeof JUMP / sizeof JUMP[0]

evanh · 2017-10-28 22:31

In this case it resolves to a constant of 2. The outer for() loop only does two iterations.

And normally you'd have {} brackets following a for() but since there isn't the subsequent indentation is a hint that that for() loop only contains the singular inner for() loop. Ie; s[0] = s0; s[1] = s1; are executed only after all loops are completed.

evanh · 2017-10-28 22:50

TonyB_ wrote: »

Until yesterday, we had to sum the XORO32 output to get only 15 high-quality bits as bit 0 was rubbish. Since yesterday, bit 0 is the most random of the lot and XORO32 gives us 32-bit PRNG bits directly.

Heater is doing a whine about the whole effort around XORO32, not just our recent work. He thinks what we have now is rubbish and will laughed at when someone that understands the deep theory has a look at it.

Sound about right, Heater?

Random/LFSR on P2

Comments