Hey, what if we made another sum from the two 16-bit halves, but did so using their bits in reverse, then we took those two sums and combined them in some way?
Or, we could just concatenate the upper 8 bits of each sum.
This way, all result bits are the sum of at least two whole bytes (no thinly-veiled LFSR bits showing through). Also, the two bytes that make up the result word are sums of different sets of whole bits, with the fractional bits being each other's whole bits.
Chip, xoroshiro+ can run backwards using same amount of logic as forwards: two rotates, one shift and one add. Would that be useful? One of the spare opcode bits could select the direction.
Chip, xoroshiro+ can run backwards using same amount of logic as forwards: two rotates, one shift and one add. Would that be useful? One of the spare opcode bits could select the direction.
Yes, maybe. And we'd only need to run the iterator backwards. The sum can be computed the same.
Wait, there is some shifting, not just rotating in the iteration step. Shifting is not reversible. Are you sure?
Chip, xoroshiro+ can run backwards using same amount of logic as forwards: two rotates, one shift and one add. Would that be useful? One of the spare opcode bits could select the direction.
Yes, maybe. And we'd only need to run the iterator backwards. The sum can be computed the same.
Wait, there is some shifting, not just rotating in the iteration step. Shifting is not reversible. Are you sure?
Yes.
You XOR the same shift going backwards to counteract the XOR of that shift going forwards.
I think the first test to do next should be aimed at improving bit 1 as shown in the last line above. We know that 16-bit parity improves bit 0 and it might do the same for bit 1. We can't keep going to the parity well after that because 15-bit parity or less won't be random.
If [15:0] and [7:0] scores are the same, bit 1 will need testing on its own, before and after, to detect any difference.
The significant thing seems to be that there are at least a few stages of carry underneath the high-quality bits, or the bits are affected by what amounts to a few stages of carry - the more the better.
I'm anxious to see if Evanh can run the test on the normal and reverse-field adds, where every bit of the final result has at least 8 stages of carry under it.
;Reverse xoroshiro algorithm for triplet [a,b,c]
x = s1 ROR c ;x = s0 XOR s1
s0 := (s0 XOR (x SHL b) XOR x) ROR a
s1 := s0 XOR x
I think reverse code needs changing to this:
;Reverse xoroshiro algorithm for triplet [a,b,c]
x = s1 ROR c ;x = s0 XOR s1
s0 := (s0 XOR (x SHL b) XOR x) ROR a
s1 := ((s0 XOR (x SHL b) XOR x) ROR a) XOR x
The significant thing seems to be that there are at least a few stages of carry underneath the high-quality bits, or the bits are affected by what amounts to a few stages of carry - the more the better.
I'm anxious to see if Evanh can run the test on the normal and reverse-field adds, where every bit of the final result has at least 8 stages of carry under it.
It's not a case of either/or, just the order of the tests.
LFSR artifacts have gone from bit 0 now. Try to do the same for bit 1 where they are not so bad, so if any improvement is only a small one that's all we need, probably.
Two PRNGs could start off with the same seed in different directions. They wouldn't meet again for 2^31 or 2^31-1 iterations, the maximum gap possible. If they use all 32-bits as one PRN, the high word in one would be the low word in the other due to the odd period.
Comments
I thought of saying that - good job I didn't!
Adding scrambles the bits to a certain extent and this latest idea unscrambles them, it seems.
Xoroshiro32+p PractRand Score Table - Run 2017-11-01 03:39:40 +1300 Combination Word0 Word1 Word2 msByte Byte04 Byte2 Byte1 Byte0 msBit ===================================================================================== [ 1 2 8] 8K 4K 4K 2K 16M 64M 512M 2M 1K [ 2 1 11] 16K 4K 4K 2K 256K 256K 512K 2M 1K [ 2 1 15] 16K 4K 4K 4K 128K 128K 128K 128K 1K [ 2 2 7] 8K 4K 4K 4K 16M 64M 128M 2M 1K [ 2 6 15] 8K 2K 4K 4K 256K 256K 256K 128K 1K [ 2 8 7] 16K 4K 4K 4K 16M 64M 128M 2M 1K [ 2 9 9] 16K 4K 2K 4K 512K 256M 2G 2M 1K [ 2 11 3] 8K 4K 2K 2K 256K 256K 256K 256K 1K [ 3 2 6] 8K 2K 2K 4K 16M 8M 4M 1M 1K [ 3 3 10] 8K 4K 4K 2K 1M 512M 1G 2M 1K [ 3 11 2] 8K 4K 4K 4K 128K 128K 128K 128K 1K [ 3 11 14] 4K 4K 2K 2K 512K 512K 1M 1M 1K [ 4 1 9] 8K 4K 4K 4K 256M 256M 64M 2M 1K [ 4 7 15] 16K 4K 4K 2K 1M 1M 1M 1M 1K [ 4 8 5] 16K 4K 2K 2K 16M 16M 8M 2M 1K [ 5 2 6] 8K 2K 4K 4K 16M 16M 16M 2M 1K [ 5 8 4] 16K 4K 2K 4K 2M 256K 128K 256K 1K [ 5 14 12] 16K 4K 4K 4K 8M 256M 2G 2M 1K [ 6 2 3] 8K 2K 2K 4K 2M 2M 4M 2M 1K [ 6 2 5] 8K 2K 2K 2K 512K 1M 1M 1M 1K [ 6 2 11] 8K 4K 4K 2K 256M 256M 512M 2M 1K [ 6 3 15] 8K 2K 2K 1K 2M 4M 8M 2M 1K [ 6 14 11] 16K 4K 4K 4K 64M 256M 512M 2M 1K [ 7 1 8] 16K 4K 4K 4K 64M 256M 512M 2M 1K [ 7 2 2] 8K 1K 1K 4K 4M 8M 8M 2M 1K [ 7 2 10] 16K 4K 4K 4K 256M 128M 1G 2M 1K [ 7 2 14] 8K 4K 4K 4K 16M 1G 1G 2M 1K [ 7 8 2] 16K 2K 2K 2K 8M 8M 8M 2M 1K [ 7 10 10] 8K 4K 2K 4K 2M 64M 256M 2M 1K [ 7 15 8] 8K 4K 4K 4K 16M 256M 128M 2M 1K [ 8 1 7] 16K 4K 4K 2K 64M 512M 256M 2M 1K [ 8 2 1] 8K 4K 4K 2K 16M 32M 512M 2M 1K [ 8 5 13] 8K 4K 4K 4K 512M 2G 1G 2M 1K [ 8 7 15] 16K 4K 4K 4K 64M 512M 512M 2M 1K [ 8 9 13] 8K 4K 4K 4K 512M 256M 1G 2M 1K [ 8 15 7] 16K 4K 4K 4K 4M 128M 128M 2M 1K [ 9 1 4] 16K 4K 4K 2K 32M 1G 512M 2M 1K [ 9 2 14] 4K 1K 1K 1K 1G 128M 512M 2M 1K [ 9 8 14] 16K 4K 4K 4K 64M 128M 128M 2M 1K [ 9 9 2] 16K 4K 4K 4K 64M 512M 512M 2M 1K [10 2 7] 8K 4K 4K 4K 64M 256M 256M 2M 1K [10 3 3] 16K 4K 4K 4K 64M 1G 512M 2M 1K [10 3 11] 16K 2K 4K 2K 32M 1G 512M 2M 1K [10 5 13] 16K 4K 4K 4K 256M 1G 512M 2M 1K [10 7 11] 8K 4K 4K 2K 256M 512M 512M 2M 1K [10 10 7] 8K 4K 4K 2K 16M 64M 256M 2M 1K [11 1 2] 8K 4K 2K 2K 64M 64M 64M 2M 1K [11 1 14] 16K 4K 4K 4K 16M 512M 512M 2M 1K [11 2 6] 16K 4K 4K 4K 64M 1G 1G 2M 1K [11 3 10] 16K 4K 2K 2K 8M 256K 128K 64K 1K [11 7 10] 16K 4K 4K 2K 4M 256K 64K 64K 1K [11 8 12] 16K 4K 4K 4K 128M 1G 512M 2M 1K [11 10 12] 8K 4K 4K 4K 128M 1G 512M 2M 1K [11 11 14] 8K 4K 4K 4K 16M 512M 512M 2M 1K [11 14 6] 8K 2K 2K 2K 64M 1G 1G 2M 1K [12 4 15] 8K 4K 2K 4K 128M 32M 32M 2M 1K [12 8 11] 8K 4K 4K 4K 256K 256K 128K 128K 1K [12 10 11] 8K 4K 2K 2K 256K 256K 128K 128K 1K [12 10 13] 8K 4K 2K 4K 512M 512M 512M 2M 1K [12 14 5] 16K 4K 4K 4K 32M 128M 128M 2M 1K [13 3 14] 8K 2K 4K 2K 64M 32M 16M 2M 1K [13 5 8] 8K 4K 4K 4K 256M 1G 256M 2M 1K [13 5 10] 16K 4K 4K 4K 16M 2M 1M 128K 1K [13 9 8] 4K 2K 4K 4K 128M 128M 128M 2M 1K [13 10 12] 8K 2K 2K 2K 256K 256K 128K 128K 1K [13 11 14] 8K 4K 4K 2K 32M 32M 16M 2M 1K [13 12 14] 16K 4K 4K 2K 32M 32M 16M 2M 1K [13 13 14] 8K 4K 4K 2K 32M 32M 16M 2M 1K [14 1 11] 8K 4K 4K 2K 512K 512K 512K 256K 1K [14 2 7] 16K 4K 4K 2K 64M 2G 1G 2M 1K [14 2 9] 16K 4K 4K 4K 512M 256M 256M 2M 1K [14 3 13] 16K 4K 4K 2K 512K 128K 128K 64K 1K [14 8 9] 16K 4K 4K 4K 128M 64M 128M 2M 1K [14 11 3] 8K 4K 4K 2K 256M 256M 256M 2M 1K [14 11 11] 8K 2K 4K 4K 512K 512K 512K 128K 1K [14 11 13] 16K 4K 4K 2K 128K 128K 128K 64K 1K [14 12 13] 16K 8K 8K 4K 128K 128K 128K 64K 1K [14 13 13] 16K 4K 4K 1K 128K 128K 128K 64K 1K [15 1 2] 8K 4K 4K 2K 64M 64M 64M 2M 1K [15 3 6] 16K 4K 4K 4K 64M 1G 1G 2M 1K [15 4 12] 8K 4K 4K 4K 16M 512K 512K 256K 1K [15 6 2] 8K 2K 4K 4K 256M 128M 64M 2M 1K [15 7 4] 16K 4K 4K 4K 1G 512M 512M 2M 1K [15 7 8] 8K 4K 4K 2K 128M 1G 1G 2M 1K
Or, we could just concatenate the upper 8 bits of each sum.
I think you're right about the unscrambling.
sum_a[15:0] = state[31:16] + state[15:0]
sum_b[15:0] = state[16:31] + state[0:15]
result[15:0] = {sum_a[15:8], sum_b[15:8]}
This way, all result bits are the sum of at least two whole bytes (no thinly-veiled LFSR bits showing through). Also, the two bytes that make up the result word are sums of different sets of whole bits, with the fractional bits being each other's whole bits.
Yes, maybe. And we'd only need to run the iterator backwards. The sum can be computed the same.
Wait, there is some shifting, not just rotating in the iteration step. Shifting is not reversible. Are you sure?
Yes.
You XOR the same shift going backwards to counteract the XOR of that shift going forwards.
What does the code look like that does that?
I'm not so good with C-style operators. I hope this pseudo-code is right.
;Reverse xoroshiro algorithm for triplet [a,b,c] x = s1 ROR c ;x = s0 XOR s1 s0 := (s0 XOR (x SHL b) XOR x) ROR a s1 := s0 XOR x
Added forward code as a check.Xoroshiro32+ [14,2,7] sum bits [15:0] [15:8] [7:0] [15] ==================================================================================== 512K 1G 1G sum[0] = sum[0] original algorithm 16M 1G 1G sum[0] = sum[15:0] parity 512M 1G 512M 1G sum[0] = sum[16:0] parity ? 1G ? 1G sum[0] = sum[16:0] parity, sum[1] = sum[16:1] parity
I think the first test to do next should be aimed at improving bit 1 as shown in the last line above. We know that 16-bit parity improves bit 0 and it might do the same for bit 1. We can't keep going to the parity well after that because 15-bit parity or less won't be random.
If [15:0] and [7:0] scores are the same, bit 1 will need testing on its own, before and after, to detect any difference.
I'm anxious to see if Evanh can run the test on the normal and reverse-field adds, where every bit of the final result has at least 8 stages of carry under it.
I think reverse code needs changing to this:
;Reverse xoroshiro algorithm for triplet [a,b,c] x = s1 ROR c ;x = s0 XOR s1 s0 := (s0 XOR (x SHL b) XOR x) ROR a s1 := ((s0 XOR (x SHL b) XOR x) ROR a) XOR x
It's not a case of either/or, just the order of the tests.
LFSR artifacts have gone from bit 0 now. Try to do the same for bit 1 where they are not so bad, so if any improvement is only a small one that's all we need, probably.
Reversing something like this seems complicated. And that shift still bothers me.
If you can go forward 100 iterations and then backward 100 iterations, and wind up where you started, I guess you've got it figured out.
Chip,
64K iterations backwards from state $0000_0001 gives state $0A5A_9287 and you could try that forwards.
Somehow, I'm getting $0001_0009 after 64k iterations, which is 32k XORO32 instructions, right?
Chip, what state do you get after 1024 XORO32's from state $0000_0001?
I'm not at my computer, anymore, for the next few hours. When I get home I'll check.
As xoroshiro32+ can run in reverse on the Z80, I'm sure it would be possible on the P2!
Another thought:
CZ opcode bits in XORO32 could select forward/reverse iterations and bits in each word normal/reversed.
I've just noticed that $0001_0009 is the 32-bit sum, not the state which should be $0000_0001. Nothing was wrong at my end.
Yes! I was thinking that was probably the case on the way back home. I was going to try it out.
So, it looks like it IS possible to reverse our PRNG pretty easily.
Now, I'm wondering if it's useful to do so. Any ideas?