Random/LFSR on P2

evanh · 2017-11-11 02:32

How did you come you up with those jump keys?

There will be a way to do the jumping using a double iterator function. But it's not like there is a great need for this with a Xoroshiro32. It's just too small a state space. At 160 MHz and using the REP instruction, a cog can iterate the XORO32 instruction 64k states in 0.4096 ms. So, about 13.1 ms to produce those 32 seeds above.

TonyB_ · 2017-11-11 13:30

TonyB_ wrote: »
Jump functions work with xoroshiro32. It is possible to jump by 64K or 128K states with only 32 single iterations plus some XORs according to the jump bit patterns:
JUMP_64K  = $1A57604D = %00011010010101110110000001001101
JUMP_128K = $2FD3E1B9 = %00101111110100111110000110111001

Bit 31 is the first bit to test. 
The jump function needs single-stepping of states which XORO32 can no longer do, unless it is modified. The easiest way, avoiding use of CZ opcode bits, would be to move XORO32 to an empty D,S slot with S selecting single or double iterations.

evanh wrote: »

How did you come you up with those jump keys?

There will be a way to do the jumping using a double iterator function. But it's not like there is a great need for this with a Xoroshiro32. It's just too small a state space. At 160 MHz and using the REP instruction, a cog can iterate the XORO32 instruction 64k states in 0.4096 ms. So, about 13.1 ms to produce those 32 seeds above.

Any jump, no matter how big, would take under 2 µs at 160 MHz. Here's a good one:

JUMP_1061939636 = FFFFFFFF

1,061,939,636 is 2^30 approx.

EDIT:
Changed big jump above.

TonyB_ · 2017-11-11 14:15

The jump function is quite good fun, if nothing else.

I've iterated xoroshiro32+ for the full 2^32-1 period to check each bit of the sum has one more '1' than '0' (as zero state is invalid). This is guaranteed with the original algorithm but we have changed bit 0 and I tested the other bits anyway. Each of sum[16:0], the parity of sum[15:0] and the parity of sum[16:0] do indeed have one more '1'.

evanh · 2017-11-11 22:49

Clearly, you have some code or tool at your disposal for jump key generation for the regular version of Xoroshiro. I wonder what it would take to modify that to handle a double iterating version like XORO32?

BTW: I tested the REP idea and it functionally works without error and at estimated speed. The S port insertion at the ALU doesn't corrupt the working state.

evanh · 2017-11-11 22:58

Chip,
Many thanks for the CORDIC! It's so painless and convenient to use that you'd hardly know it wasn't in the Cog.

cgracey · 2017-11-11 23:15

evanh wrote: »

Chip,
Many thanks for the CORDIC! It's so painless and convenient to use that you'd hardly know it wasn't in the Cog.

Thanks. With the built-in K-factor compensation, it really became nice to use. To preserve the compensation and meet timing for OnSemi, I had to add 16 stages to it. That seemed like a lot, but to have lost the compensation would have been huge. That thing does stuff (rotate, vector, log, exp) that would take very lengthy routines to accomplish. Having those things in hardware is like sci-fi, to me. You can put a few instructions together and do amazing things. We really need the silicon with analog I/O to demonstrate what can be done with the CORDIC.

evanh · 2017-11-11 23:46

/me ponders what is K-factor ...

evanh · 2017-11-12 00:01

Tony,
For me, the most use I'd put a key generator to is finding all the full-period triplets of larger word sizes, eg: Xoroshiro48. By the looks it can generate any arbitrary position in the state sequence.

Though, we'd have to prove behaviour of when jumping past the end of a given period.

cgracey · 2017-11-12 00:02

evanh wrote: »

/me ponders what is K-factor ...

https://en.wikipedia.org/wiki/CORDIC

With each CORDIC iteration, X and Y grow. With infinite iterations, you get a fixed growth factor of:

$934c4fa1acdf720afcee41a2a26cbe539bba608a$

After maybe a dozen iterations, you are pretty much there. We've got 32 iterations.

Anyway, this growth needs to be compensated for by multiplying the X and Y results by ~0.607. I made a BASIC program to find at what stages we could subtract a right-shifted amount of the current value from the current value, in order to arrive at the right compensation. It took 16 of these simple subtractors, but they each needed a stage, as I couldn't stack up two 40-bit adders in a single clock cycle. The reason the adders are 40 bits, instead of 32, is so that 8 fractional bits can be maintained. With those fractional bits and rounding, you get binary-perfect results.

evanh · 2017-11-12 00:32

Huh, that one is not in the Wikipedia list of disambiguous K-Factors. I'll have a go at adding it ... I put an entry in the talk page instead. I think it would have needed a separate article otherwise.

cgracey · 2017-11-12 00:37

evanh wrote: »

Huh, that one is not in the Wikipedia list of disambiguous K-Factors. I'll have a go at adding it ...

And don't forget about Circle-K and Special-K. Then there's K-Kauppa in Finland.

TonyB_ · 2017-11-12 00:38

evanh wrote: »

How did you come you up with those jump keys?

Brute force and ignorance of how to do it properly. Choose a start state, then iterate xoroshiro32+ the desired jump size to get an end state. Using the start state, run the jump function and test whether the resulting jump state is the end state for every possible 32-bit pattern until a match is found or not.

If the pattern bits are shifted out one at a time, the jump function can exit early when the remaining bits are all zero to save time. Finding the 64K and 128K jump bit patterns was quick as they are fairly close to the starting bit pattern count of zero, however I couldn't find any matching 32-bit pattern for 256K and 512K jumps, which took quite a while to fail.

evanh wrote: »

Clearly, you have some code or tool at your disposal for jump key generation for the regular version of Xoroshiro. I wonder what it would take to modify that to handle a double iterating version like XORO32?

The tool was MASM and the code x86 assembler. The jump function needs single-stepping through the states, as '1' bits could be anywhere in the bit pattern, not just the even bits. I think a double-iterating jump function is impossible. XORO32 that can double- and single-step is possible.

evanh · 2017-11-12 00:57

TonyB_ wrote: »

... whether the resulting jump state is the end state for every possible 32-bit pattern until a match is found or not.

Bugger. You had my hopes up there for a while!

TonyB_ · 2017-11-12 01:50

How about a single-iterated XORO64 using an empty D,S instruction slot and the same next S replacement trick as XORO32?

	XORO64	s0,s1		' s0,s1 = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN

Period is 2^64-1 and it will work!

cgracey · 2017-11-12 01:56

TonyB_ wrote: »
How about a single-iterated XORO64 using an empty D,S instruction slot and the same next S replacement trick as XORO32?
	XORO64	s0,s1		' s0,s1 = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN
Period is 2^64-1 and it will work!

We can only write one register per instruction, not two.

TonyB_ · 2017-11-12 02:02

cgracey wrote: »
TonyB_ wrote: »
How about a single-iterated XORO64 using an empty D,S instruction slot and the same next S replacement trick as XORO32?
	XORO64	s0,s1		' s0,s1 = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN
Period is 2^64-1 and it will work!
We can only write one register per instruction, not two.

Yes, I know.

evanh · 2017-11-12 02:34

The biggest problem there is we have no way to find even one full-period Xoroshiro64 candidate!

cgracey · 2017-11-12 02:37

Hey, if we could come up with some jump-function know-how, we could leap ahead by 2^32 and do tests.

TonyB_ · 2017-11-12 02:44

TonyB_ wrote: »
How about a single-iterated XORO64 using an empty D,S instruction slot and the same next S replacement trick as XORO32?
	XORO64	s0,s1		' s0,s1 = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN
Period is 2^64-1 and it will work!

Another instruction is required:

	SUB	prn,s0		' prn = s1
	XORO64	s0,prn		' s0,prn = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN = s0 + s1

We need only two registers, for 32 bits of state and 32 bits of PRN.
Period is over 20000 years for six clocks at 160 MHz.

Only outstanding issue is parity for bit 0. XORO64 could report changed bit 0 from normal addition in the carry flag.

evanh · 2017-11-12 02:58

Well, the Prop can already do a 32-bit parity in a single instruction. So if we went back to the first incarnation of XORO, and have the second half of the state being forwarded on to the following instruction's S-port, then a XORO64 would work quite well.

Still don't have any way to find full-period triplets though.

cgracey · 2017-11-12 03:04

So, we would need to do a separate addition to get the result? The "next s" trick would be used to update one of the 32-bit states?

evanh · 2017-11-12 03:40

Yep, but the whole state at once. Here's an example snippet for using it:

' iterate the state
	xoro64   s0,s1            ' xoro64 directly updates s0 itself and
	mov      s1,0             '   passes s1 to the following instruction

' perform the hash (3 instructions with a shift to remove bit 0)
	mov      prn,s0
	add      prn,s1     wc    ' primary summing

' extend the hash to 32 bits (+3 instructions)
	bitc     carry,#0         ' save the summed carry
	or       prn,#0     wc    ' generate parity into C flag
	testb    carry,#0   xorc  ' include the carry in the parity
'	bitc     prn,#0           ' replace bit 0 with the new parity
	rcr      prn,#1           '   or shift parity into bit 31

EDIT: Ah, I see, 'next' is next instruction.
EDIT2: Improved commenting.

cgracey · 2017-11-12 04:46

What we have now is ideal, I think, and it's easy-to-use as can be. 32 bits is plenty for most uses, I imagine.

And if a *real* random number is needed, you can always just do a GETRND, without any contingencies.

evanh · 2017-11-12 05:05

Yep. It's academic anyway, we can't produce the needed triplet without getting the Xoroshiro authors to help us out.

evanh · 2017-11-12 12:07

TonyB_ wrote: »
TonyB_ wrote: »
Sebastiano Vigna kindly emailed me all the xoroshiro32 characteristic polynomials, essential info for creating jump functions. The weight is the number of terms in the polynomial and in theory close to 16 is preferable. Our best triplet [14,2,7] has weight of 11 and second best [15,3,6] has weight of 13.

#a,  b,  c, weight,	xoroshiro32 characteristic polynomial

 ...

 7,  2, 14,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1
14,  2,  7,	11,	x^32 + x^28 + x^26 + x^24 + x^18 + x^14 + x^12 + x^11 + x^7  + x^5  + 1

 ...

 6,  3, 15,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1
15,  3,  6,	13,	x^32 + x^28 + x^24 + x^23 + x^19 + x^18 + x^14 + x^12 + x^9  + x^6  + x^5  + x^3  + 1

 ...
I hadn't looked closely at all the full-period triplets before. For every [a,b,c] there is a corresponding [c,b,a] that shares the same characteristic polynomial. Knowing this in advance means brute force searches can be reduced by half.

Hey Tony, thanks for getting that. It perfectly matches our 84 brute forced findings. I only just really looked at it now.

And the recommendation of targeting only the ones of closely matching "degree" with word size will be valuable with what's coming up ... I've spoken with Sebastiano again and, same as you, I just had to ask and he immediately popped me a complete list of full-period candidates for a Xoroshiro64 iterator! Amazing stuff. It isn't nicely sorted in mirror pairs like your one was though.

evanh · 2017-11-12 12:14

Attached is the complete list of full-period triplet candidates for a Xoroshiro64. Not so academic any longer! O_o

TonyB_ · 2017-11-12 12:25

I have asked Sebastiano Vigna for the xoroshiro[32]64 characteristic polynomials.

EDIT:
Written before previous two posts appeared.

evanh · 2017-11-12 12:29

Hmm, 250 in total, and restricting the candidate set to a degree range of 27-33 it is still 110 candidates. I could halve that by taking out all the mirrors but not too keen on doing that just yet.

Restricting the degree range further to 29-33 drops that to 66 candidates.

TonyB_ · 2017-11-12 12:32

Unfortunate timing of posts here led to my bothering Sebastiano Vigna.

TonyB_ · 2017-11-12 12:44

TonyB_ wrote: »
TonyB_ wrote: »
How about a single-iterated XORO64 using an empty D,S instruction slot and the same next S replacement trick as XORO32?
	XORO64	s0,s1		' s0,s1 = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN
Period is 2^64-1 and it will work!
Another instruction is required:
	SUB	prn,s0		' prn = s1
	XORO64	s0,prn		' s0,prn = 64-bit state
	MOV	prn,0		' prn = 32-bit PRN = s0 + s1
We need only two registers, for 32 bits of state and 32 bits of PRN.
Period is over 20000 years for six clocks at 160 MHz.

Only outstanding issue is parity for bit 0. XORO64 could report changed bit 0 from normal addition in the carry flag.

I think my XORO64 idea has not been understood fully. There is no need to save s1 separately because prn = s0 + s1. The three instructions above are the entire code if original sum[0] is acceptable. If XORO64 could flag when sum[0] and full parity are different (Z might be better choice than C), then code would increase to five instructions.

Random/LFSR on P2

Comments