What if you took the parity of the sum[16:1] and XOR'd it into the sum LSB, then used sum[15:0]?
Wow! That's perfect. I'm watching the log file at the moment and there is already 512M scores for word0 ...
EDIT: A detail, I've actually parity'd all bits, sum[16:0]
EDIT2: Doh! I see that is the same as yours.
If it winds up looking good, then do two iterations and feed the result words, concatenated together as a long, into your analyzer. Actually, that's effectively what is happening, right?
Yep, everything is one long unending string to Practrand.
If XORO32 can be used to produce a high-quality 16-bit result per iteration, then I could enhance the instruction to do TWO iterations, get 16-bit sums from each, and substitute the high-quality 32-bit PRNG result into the next instruction's S value. Meanwhile, the double iteration would be written back to the original D in the XORO32 instruction.
So, do a XORO32 iteration and store the 32-bit state in S. Do another XORO32 iteration, then sum the two states as two parallel 16-bit adds, with the parity jiggery-pokery, to give a 32-bit PRN in D. Is that correct and is there time to do all this?
I'm all for XORO32 doing the summing itself, if possible. We'd need a XORO32 write for the seed and a XORO32 read for the PRN.
I think sum[0] = sum[15:1] parity is so bad because sum[0] is no longer a function of itself!
Okay, I'm struggling to see how those two sentences relate to one another but I've run that and not too surprisingly the byte0 scores are a pretty close match to the word0 scores.
However, that table, along with many other score tables, shows similarly confusing column differences that one would not expect to occur, given that the columns share the same poor-quality bits.
So, either the idea of bit positions being of a given quality for a given calibration is wrong - I doubt it - or PractRand has some glaring deficiencies in detecting problems with certain bit positions.
One thought was maybe it was byte oriented in some fashion given that's how I feed it the data stream. So I swapped Xoroshiro32+p result bit 9 with result bit 1. Both bits are moved by a whole byte. That shouldn't have any impact on PractRand, right? Guess what, it has quite a decent impact ...
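For reference, the bit swap described above can be sketched in C as follows. This is only an illustration: the helper name `swap_bits` is hypothetical and is not taken from the test harness used in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: exchange bit i and bit j of a 16-bit result word.
 * If the two bits differ, flipping both swaps them; if they are equal,
 * nothing changes. */
static uint16_t swap_bits(uint16_t x, unsigned i, unsigned j)
{
    uint16_t diff = (uint16_t)(((x >> i) ^ (x >> j)) & 1u);
    return (uint16_t)(x ^ (diff << i) ^ (diff << j));
}
```

Calling `swap_bits(result, 9, 1)` on each 16-bit output before it is streamed would exchange bit 1 of the upper byte with bit 1 of the lower byte, which is the "moved by a whole byte" case mentioned above.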
It's all in one!
Because the iteration is just a bunch of XOR's, we could do a double iteration at once, grabbing results from each to perform the separate 16-bit adds and parity computations, in order to compute the compound PRNG output for the next instruction's S value. The 'XORO32 D' will write the double-iterated value (which is separate from the PRNG output) back into D, which is just a register. To seed the PRNG, just put a non-0 value into the register being used.
XORO32 D 'iterate D and put 32-bit PRNG result into next instruction's S
Example:
XORO32 rnd        'iterate rnd
MOV    outb,0     'write XORO32 PRNG result to outb
Amazing! The 0 in the MOV could be anything, and interrupts are disabled between XORO32 and MOV?
Yes, XORO32 would stave off an interrupt just like SETQ (SETQ+RDLONG) does. The next instruction would wind up with the PRNG result for its S value. This is what you've been working towards, but didn't realize how simple it would turn out. Me neither. This just uses any cog register. Easy peasy.
Comments
Wow! That's perfect. I'm watching the log file at the moment and there is already 512M scores for word0 ...
EDIT: A detail, I've actually parity'd all bits, sum[16:0]
EDIT2: Doh! I see that is the same as yours.
Totally, I was picking up on that when Melissa was describing using the high quality bits to dynamically do rotates.
If it winds up looking good, then do two iterations and feed the result words, concatenated together as a long, into your analyzer. Actually, that's effectively what is happening, right?
int shift;
uint64_t result = s0 + s1;
uint64_t parity = result;

for( shift=1; shift <= ACCUM_SIZE; shift++ ) {
    parity = parity ^ (result >> shift);
}
result = ((result & -2LL) | (parity & 1LL)) & ACCUM_MASK;
And [14 2 7] is still out front
Xoroshiro32p+ PractRand Score Table - Run 2017-10-28 05:11:16 +1300

Combination  Word0  Word1  Word2 msByte Byte04  Byte2  Byte1  msBit
===================================================================
[ 1  2  8]      4M    32M    64M    16M    64M    64M    32M     1G
[ 2  1 11]      8M     8M     8M   128K   128K   128K   256K     1G
[ 2  1 15]     64K   128K   128K    32K    32K    64K    32K     1G
[ 2  2  7]      4M     8M     8M    32M    32M    32M     4M     1G
[ 2  6 15]      1M     2M     4M    16M   128K   128K   128K     1G
[ 2  8  7]      4M     8M    16M   128M    32M    32M     4M     1G
[ 2  9  9]    256K   256K   256K    16K   128K   512K    64M     1G
[ 2 11  3]      2M     2M     4M   512K   256K   256K   256K     1G
[ 3  2  6]     32M    64M    32M     2M     2M     2M     1M     1G
[ 3  3 10]     16M    16M    16M   256K   256K   512K    64M     1G
[ 3 11  2]    256K   256K   512K   128K    64K    64K    64K     1G
[ 3 11 14]      8M     4M     4M   256K   256K   512K   512K     1G
[ 4  1  9]      1M   512K     1M   512K     8M   256K   512K   512M
[ 4  7 15]     16M    64M    64M   128M   512K   512K   512K     1G
[ 4  8  5]     64M   512M   128M    32M     8M     4M     2M     1G
[ 5  2  6]     64M   128M   128M     8M     4M     4M     4M     1G
[ 5  8  4]      1M     4M    16M     4M     1M   256K   128K    64M
[ 5 14 12]    256M   128M    64M     2M     4M    16M   128M     1G
[ 6  2  3]     16M    32M    32M     1M     1M     1M     2M     2M
[ 6  2  5]     16M    16M    16M   256K   512K   512K   512K     4M
[ 6  2 11]    256M   512M   256M     8M     8M   256M   128M     1G
[ 6  3 15]     32M    32M    32M     1M     1M     2M     4M     1G
[ 6 14 11]    128M   128M   128M    32M    64M   128M    16M     1G
[ 7  1  8]      1M    64M   128M    16M   128M    16M    16M     1G
[ 7  2  2]      8M    16M    16M     2M     2M     4M     4M     4M
[ 7  2 10]    512M     1G   512M   128M   256M    64M    32M     1G
[ 7  2 14]    512M   512M   256M     4M     8M    32M    32M   256M
[ 7  8  2]    128M   512M   512M    64M     8M     4M     4M   512M
[ 7 10 10]    128M   128M   256M    32M    32M   256M   128M     1G
[ 7 15  8]      1M     2M     2M   256K     1M   512K   256K     1G
[ 8  1  7]      1M     1M     4M     1M     1M     1M     1M    64M
[ 8  2  1]      4M    32M    64M     8M     8M    16M    16M    16M
[ 8  5 13]     32M    32M    64M    16M     8M   128M   128M     1G
[ 8  7 15]      1M     2M     2M   512K     2M     2M     1M     1G
[ 8  9 13]    256M   512M   128M   512M     1G   512M   128M     1G
[ 8 15  7]    512K   512K     1M   256K   512K   512K   256K     1G
[ 9  1  4]     16M    32M    64M     8M    16M    16M    16M    64M
[ 9  2 14]    512M     2G     1G   128M   256M    64M   128M     1G
[ 9  8 14]      8M    16M    32M   256M    32M    32M    64M     1G
[ 9  9  2]     32M    32M    64M    16M    16M    16M    32M    64M
[10  2  7]    128M   256M   128M    16M    16M    64M    32M     1G
[10  3  3]     32M    64M    64M    32M    32M    32M    64M   256M
[10  3 11]    512M     2G     1G   256M   256M   512M   128M     1G
[10  5 13]    512M     2G   512M     1G   512M   256M   128M   512M
[10  7 11]    512M   512M   128M   512M   128M   512M   128M     1G
[10 10  7]     16M    16M    16M    16M    16M   512M   128M   512M
[11  1  2]    128M   256M   256M    32M    32M    32M    32M     1G
[11  1 14]    512M     1G     1G    32M    16M    16M    16M     1G
[11  2  6]    512M   512M   512M    32M    64M   512M   128M   512M
[11  3 10]     16M    32M    32M    64M     1M   128K    64K   128M
[11  7 10]      2M     4M     8M    32M   128K   128K    64K     1G
[11  8 12]    128M   256M   256M   512M   512M   512M   128M     1G
[11 10 12]     32M   128M   128M     1G   512M   512M   128M     1G
[11 11 14]     32M    64M   128M    16M    16M    16M     8M     1G
[11 14  6]    128M   256M   128M   256M    64M   512M   128M     1G
[12  4 15]    256M   512M     1G   128M    64M    16M    16M     1G
[12  8 11]      2M     4M     4M    64M   128K   128K    64K     1G
[12 10 11]      2M     2M     2M   128K   128K   128K    64K     1G
[12 10 13]     64M    32M    64M    16M    16M    32M     8M     1G
[12 14  5]    128M    32M    32M   128M   128M    64M    64M     1G
[13  3 14]    128M   512M   512M    64M    32M    16M     8M     1G
[13  5  8]    128M   256M   256M   512M   128M   256M   128M     1G
[13  5 10]    256M    64M    32M    64M    16M   256K   256K     1G
[13  9  8]    128M   128M    64M   512M   512M   512M   128M     1G
[13 10 12]      2M     2M     2M   128K   128K   128K    64K   512M
[13 11 14]     16M    16M    16M     8M     8M     8M     8M     1G
[13 12 14]     16M    16M    16M     8M     8M     4M     4M     1G
[13 13 14]      8M    16M    16M     8M     8M     8M     8M     1G
[14  1 11]     32M    64M    32M   256K   256K   256K   256K     1G
[14  2  7]    512M     1G   512M     1G   512M   512M   128M     1G
[14  2  9]    512M   512M   256M   128M   128M    64M   128M     1G
[14  3 13]      4M    16M    16M     1M   256K    64K    64K     1G
[14  8  9]    512M   128M    64M     1G    64M    64M    64M     1G
[14 11  3]    512M   512M   512M   256M   128M   128M    64M     1G
[14 11 11]     16M    16M     8M   256K   256K   128K   256K     1G
[14 11 13]    512K   512K   512K   128K    64K    64K    64K     1G
[14 12 13]    256K   512K   256K   128K    64K    64K    64K     1G
[14 13 13]    128K   128K   256K    64K    64K   128K    64K   256M
[15  1  2]      2M     2M     2M     1M   512K   512K   256K     1G
[15  3  6]    512M     1G   512M   512M   512M   256M   128M     1G
[15  4 12]     64M   128M    64M   128M     8M   256K   256K     1G
[15  6  2]     64M   128M   256M   128M   128M    32M    16M     1G
[15  7  4]    256M   512M   128M   512M   512M   256M   128M     1G
[15  7  8]     16M    64M    64M    32M    32M     4M     2M   512M
Yep, everything is one long unending string to Practrand.
Xoroshiro32+ [14,2,7] sum bits

 [15:0]  [15:1]  [15:2]  [15:8]  [11:4]   [9:2]   [8:1]    [15]
=========================================================================================
   512K      1G    512M      1G    512M    512M    128M      1G   sum[0] = s0[0] + s1[0]
   512K      1G    512M      1G    512M    512M    128M      1G   sum[0] = [s0,s1] parity
    16M      1G    512M      1G    512M    512M    128M      1G   sum[0] = sum[15:0] parity
    16K      1G    512M      1G    512M    512M    128M      1G   sum[0] = sum[15:1] parity
   512M      1G    512M      1G    512M    512M    128M      1G   sum[0] = sum[16:0] parity
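To make the five rows of the table above concrete, here is a rough C sketch of how each sum[0] variant could be computed from the two 16-bit state words. The enum and function names are hypothetical, and the mapping from each row label to a bit mask is my reading of the labels rather than the code that produced the scores.

```c
#include <stdint.h>

/* Parity (XOR of all bits) of a 32-bit value, returned as 0 or 1. */
static unsigned parity32(uint32_t v)
{
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

/* Hypothetical names for the five table rows, in table order. */
enum lsb_variant {
    LSB_PLAIN,         /* sum[0] = s0[0] + s1[0]    */
    LSB_STATE_PARITY,  /* sum[0] = [s0,s1] parity   */
    LSB_SUM_15_0,      /* sum[0] = sum[15:0] parity */
    LSB_SUM_15_1,      /* sum[0] = sum[15:1] parity */
    LSB_SUM_16_0       /* sum[0] = sum[16:0] parity */
};

/* 16-bit output: bits [15:1] of the sum, with bit 0 chosen per variant. */
static uint16_t xoro_output(uint16_t s0, uint16_t s1, enum lsb_variant v)
{
    uint32_t sum = (uint32_t)s0 + s1;   /* 17-bit sum, carry in bit 16 */
    unsigned lsb;

    switch (v) {
    case LSB_STATE_PARITY: lsb = parity32(((uint32_t)s1 << 16) | s0); break;
    case LSB_SUM_15_0:     lsb = parity32(sum & 0x0FFFFu);            break;
    case LSB_SUM_15_1:     lsb = parity32(sum & 0x0FFFEu);            break;
    case LSB_SUM_16_0:     lsb = parity32(sum & 0x1FFFFu);            break;
    default:               lsb = sum & 1u;                            break;
    }
    return (uint16_t)((sum & 0xFFFEu) | lsb);
}
```

Only `LSB_SUM_15_1` leaves the original sum[0] out of the new bit 0, which is the point made in the thread about that row scoring so poorly.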
This would address TonyB_'s longstanding desire to get good long results.
I think sum[0] = sum[15:1] parity is so bad because sum[0] is no longer a function of itself!
That's either the best question ever or you really do need to get some sleep!
So, do a XORO32 iteration and store the 32-bit state in S. Do another XORO32 iteration, then sum the two states as two parallel 16-bit adds, with the parity jiggery-pokery, to give a 32-bit PRN in D. Is that correct and is there time to do all this?
I'm all for XORO32 doing the summing itself, if possible. We'd need a XORO32 write for the seed and a XORO32 read for the PRN.
However, that table, along with many other score tables, shows similarly confusing column differences that one would not expect to occur, given that the columns share the same poor-quality bits.
So, either the idea of bit positions being of a given quality for a given calibration is wrong - I doubt it - or PractRand has some glaring deficiencies in detecting problems with certain bit positions.
One thought was maybe it was byte oriented in some fashion given that's how I feed it the data stream. So I swapped Xoroshiro32+p result bit 9 with result bit 1. Both bits are moved by a whole byte. That shouldn't have any impact on PractRand, right? Guess what, it has quite a decent impact ...
Xoroshiro32+p PractRand Score Table - Run 2017-10-28 06:10:51 +1300

Combination  Word0  Word1  Word2 msByte Byte04  Byte2  Byte1  Byte0
===================================================================
[ 1  2  8]      4M    32M    64M    16M    64M    64M    32M    32M
[ 2  1 11]      8M     8M     8M   128K   128K   128K   256K   512K
[ 2  1 15]     64K   128K   128K    32K    32K    64K    32K    64K
[ 2  2  7]      4M     8M     8M    32M    32M    32M     4M    32M
[ 2  6 15]      1M     2M     4M    16M   128K   128K   128K   256K
[ 2  8  7]      4M     8M    16M   128M    32M    32M     4M   128M
[ 2  9  9]    256K   256K   256K    16K   128K   512K    64M   256M
[ 2 11  3]      2M     2M     4M   512K   256K   256K   256K   256K
[ 3  2  6]     32M    64M    32M     2M     2M     2M     1M     4M
[ 3  3 10]     16M    16M    16M   256K   256K   512K    64M   256M
[ 3 11  2]    256K   256K   512K   128K    64K    64K    64K    64K
[ 3 11 14]      8M     4M     4M   256K   256K   512K   512K     1M
[ 4  1  9]      1M   512K     1M   512K     8M   256K   512K   512K
[ 4  7 15]     16M    64M    64M   128M   512K   512K   512K     1M
[ 4  8  5]     64M   512M   128M    32M     8M     4M     2M     8M
[ 5  2  6]     64M   128M   128M     8M     4M     4M     4M    16M
[ 5  8  4]      1M     4M    16M     4M     1M   256K   128K   128K
[ 5 14 12]    256M   128M    64M     2M     4M    16M   128M     1G
[ 6  2  3]     16M    32M    32M     1M     1M     1M     2M     4M
[ 6  2  5]     16M    16M    16M   256K   512K   512K   512K     1M
[ 6  2 11]    256M   512M   256M     8M     8M   256M   128M   512M
[ 6  3 15]     32M    32M    32M     1M     1M     2M     4M     8M
[ 6 14 11]    128M   128M   128M    32M    64M   128M    16M   128M
[ 7  1  8]      1M    64M   128M    16M   128M    16M    16M    32M
[ 7  2  2]      8M    16M    16M     2M     2M     4M     4M     8M
[ 7  2 10]    512M     1G   512M   128M   256M    64M    32M   128M
[ 7  2 14]    512M   512M   256M     4M     8M    32M    32M     1G
[ 7  8  2]    128M   512M   512M    64M     8M     4M     4M    16M
[ 7 10 10]    128M   128M   256M    32M    32M   256M   128M   512M
[ 7 15  8]      1M     2M     2M   256K     1M   512K   256K   256K
[ 8  1  7]      1M     1M     4M     1M     1M     1M     1M    32M
[ 8  2  1]      4M    32M    64M     8M     8M    16M    16M    16M
[ 8  5 13]     32M    32M    64M    16M     8M   128M   128M   128M
[ 8  7 15]      1M     2M     2M   512K     2M     2M     1M     1M
[ 8  9 13]    256M   512M   128M   512M     1G   512M   128M     1G
[ 8 15  7]    512K   512K     1M   256K   512K   512K   256K   256K
[ 9  1  4]     16M    32M    64M     8M    16M    16M    16M    16M
[ 9  2 14]    512M     2G     1G   128M   256M    64M   128M   512M
[ 9  8 14]      8M    16M    32M   256M    32M    32M    64M   128M
[ 9  9  2]     32M    32M    64M    16M    16M    16M    32M   128M
[10  2  7]    128M   256M   128M    16M    16M    64M    32M    64M
[10  3  3]     32M    64M    64M    32M    32M    32M    64M   512M
[10  3 11]    512M     2G     1G   256M   256M   512M   128M   512M
[10  5 13]    512M     2G   512M     1G   512M   256M   128M   512M
[10  7 11]    512M   512M   128M   512M   128M   512M   128M     1G
[10 10  7]     16M    16M    16M    16M    16M   512M   128M   512M
[11  1  2]    128M   256M   256M    32M    32M    32M    32M   128M
[11  1 14]    512M     1G     1G    32M    16M    16M    16M   512M
[11  2  6]    512M   512M   512M    32M    64M   512M   128M     1G
[11  3 10]     16M    32M    32M    64M     1M   128K    64K    64K
[11  7 10]      2M     4M     8M    32M   128K   128K    64K    64K
[11  8 12]    128M   256M   256M   512M   512M   512M   128M   512M
[11 10 12]     32M   128M   128M     1G   512M   512M   128M   512M
[11 11 14]     32M    64M   128M    16M    16M    16M     8M   512M
[11 14  6]    128M   256M   128M   256M    64M   512M   128M   512M
[12  4 15]    256M   512M     1G   128M    64M    16M    16M    32M
[12  8 11]      2M     4M     4M    64M   128K   128K    64K   128K
[12 10 11]      2M     2M     2M   128K   128K   128K    64K   128K
[12 10 13]     64M    32M    64M    16M    16M    32M     8M   512M
[12 14  5]    128M    32M    32M   128M   128M    64M    64M   128M
[13  3 14]    128M   512M   512M    64M    32M    16M     8M    16M
[13  5  8]    128M   256M   256M   512M   128M   256M   128M   128M
[13  5 10]    256M    64M    32M    64M    16M   256K   256K     2M
[13  9  8]    128M   128M    64M   512M   512M   512M   128M     1G
[13 10 12]      2M     2M     2M   128K   128K   128K    64K   128K
[13 11 14]     16M    16M    16M     8M     8M     8M     8M    16M
[13 12 14]     16M    16M    16M     8M     8M     4M     4M    16M
[13 13 14]      8M    16M    16M     8M     8M     8M     8M    16M
[14  1 11]     32M    64M    32M   256K   256K   256K   256K   512K
[14  2  7]    512M     1G   512M     1G   512M   512M   128M   512M
[14  2  9]    512M   512M   256M   128M   128M    64M   128M   256M
[14  3 13]      4M    16M    16M     1M   256K    64K    64K   128K
[14  8  9]    512M   128M    64M     1G    64M    64M    64M     1G
[14 11  3]    512M   512M   512M   256M   128M   128M    64M   256M
[14 11 11]     16M    16M     8M   256K   256K   128K   256K   512K
[14 11 13]    512K   512K   512K   128K    64K    64K    64K   128K
[14 12 13]    256K   512K   256K   128K    64K    64K    64K   128K
[14 13 13]    128K   128K   256K    64K    64K   128K    64K   128K
[15  1  2]      2M     2M     2M     1M   512K   512K   256K     1M
[15  3  6]    512M     1G   512M   512M   512M   256M   128M     1G
[15  4 12]     64M   128M    64M   128M     8M   256K   256K   512K
[15  6  2]     64M   128M   256M   128M   128M    32M    16M    16M
[15  7  4]    256M   512M   128M   512M   512M   256M   128M    64M
[15  7  8]     16M    64M    64M    32M    32M     4M     2M     2M
And here's what we get with sum[9] <-> sum[1]:
Xoroshiro32+p PractRand Score Table - Run 2017-10-28 06:47:33 +1300

Combination  Word0  Word1  Word2 msByte Byte04  Byte2  Byte1  Byte0  msBit
==========================================================================
[ 1  2  8]     64M    16M    32M    64M    64M    32M    64M   128M     1G
[ 2  1 11]      8M    16M     8M   128K   256K   128K   256K     1M     1G
[ 2  1 15]    128K   128K   256K   128K   128K    32K    64K   128K     1G
[ 2  2  7]      8M     8M     8M   256K     4M     4M   512K   512K     1G
[ 2  6 15]      2M     4M     2M    32M   128K   128K   256K   256K     1G
[ 2  8  7]      8M    16M    16M   256K     4M     4M   256K   512K     1G
[ 2  9  9]    256K   512K   512K    32K   256K    64M   256K   256K     1G
[ 2 11  3]      4M     8M     4M   512K   512K   256K   128K   128K     1G
[ 3  2  6]     32M    64M    32M    16M   512K     1M   128K   512K     1G
[ 3  3 10]     16M    32M    32M     1M   512K    64M     1M     4M     1G
[ 3 11  2]    512K   512K   512K   128K   128K    64K    64K   128K     1G
[ 3 11 14]      8M    16M     8M   512K   256K   256K   512K     2M     1G
[ 4  1  9]      1M   512K     1M   512K    16M   512K   256K   256K   512M
[ 4  7 15]     32M    32M    64M   128M   256K   512K     1M     2M     1G
[ 4  8  5]    128M   512M    64M    32M     1M     2M     4M    16M     1G
[ 5  2  6]    128M   256M    64M     8M     2M     4M   512K     2M     1G
[ 5  8  4]      2M     8M     8M   128K   128K   128K   256K   256K    64M
[ 5 14 12]    256M   512M   256M     8M    16M   512M    32M   128M     1G
[ 6  2  3]     32M    32M    32M     1M     1M     1M   128K   512K     2M
[ 6  2  5]      8M    16M    16M   128K   512K   512K   128K   256K     4M
[ 6  2 11]    256M     1G   256M     8M    16M   256M   128M   512M     1G
[ 6  3 15]     64M    64M    64M     2M     1M     2M     4M     8M     1G
[ 6 14 11]    128M   256M   128M    32M   128M    16M   128M     1G     1G
[ 7  1  8]    128M   256M   128M    32M   256M    16M    16M    64M     1G
[ 7  2  2]     16M    32M    32M     4M     2M     4M   512K     4M     4M
[ 7  2 10]    256M     1G     1G   128M    64M    32M    64M   512M     1G
[ 7  2 14]    256M     1G   256M     8M    32M   512M    32M   256M   256M
[ 7  8  2]     64M     1G   256M    64M     4M     4M     8M    16M   512M
[ 7 10 10]    128M   256M   256M    64M    64M   256M   256M   512M     1G
[ 7 15  8]      1M     2M     2M   512K     2M   256K   512K   512K     1G
[ 8  1  7]      1M     2M     4M   128K   256K     1M    64K   256K    64M
[ 8  2  1]     64M    32M   128M    16M    16M    16M    16M    32M    16M
[ 8  5 13]     32M   128M    64M    32M    16M   128M   128M     1G     1G
[ 8  7 15]      2M     4M     2M     2M     8M     1M     2M     2M     1G
[ 8  9 13]    256M   512M   128M   512M   256M   128M   512M     1G     1G
[ 8 15  7]      1M     1M     1M   128K   256K   256K    64K   256K     1G
[ 9  1  4]     16M    32M    64M     8M    16M    16M     8M    32M    64M
[ 9  2 14]    256M     2G   512M   128M   256M    64M   128M   256M     1G
[ 9  8 14]     16M    16M    32M   256M    64M    32M    64M   256M     1G
[ 9  9  2]     32M    64M    64M    32M    32M    32M    32M   128M    64M
[10  2  7]     64M   256M   128M    16M     8M    32M    64M   256M     1G
[10  3  3]     32M   128M   128M    16M    32M    64M    32M   128M   256M
[10  3 11]    256M     2G   256M   128M   128M   512M   512M     2G     1G
[10  5 13]    512M     2G   512M     1G   256M   256M   512M     1G   512M
[10  7 11]     64M   512M   256M   128M    64M   512M   512M     2G     1G
[10 10  7]     32M    32M    16M    16M     8M   256M    64M   512M   512M
[11  1  2]    256M   512M   512M    16M    64M    16M    64M   128M     1G
[11  1 14]    512M     2G     2G    64M   256M   512M    32M   128M     1G
[11  2  6]    512M     2G   512M    64M    64M   512M   256M     2G   512M
[11  3 10]     16M    16M    16M     2M   256K    64K   256K     1M   128M
[11  7 10]      4M     8M     8M   256K   128K    64K   256K     1M     1G
[11  8 12]     64M   512M   512M   256M   512M   512M   512M   512M     1G
[11 10 12]     64M   128M   128M   512K    64M   512M   512M     1G     1G
[11 11 14]     64M   128M   256M    16M   256M   128M    32M    64M     1G
[11 14  6]    256M   512M   128M     1G    32M   128M   256M     1G     1G
[12  4 15]    512M     1G     1G   256M   128M    16M    32M    64M     1G
[12  8 11]      1M     4M     8M     8M   128K    64K   256K   512K     1G
[12 10 11]      4M     4M     4M     1M   128K    64K   256K   512K     1G
[12 10 13]     64M   128M   256M    16M    16M   128M    64M   128M     1G
[12 14  5]    256M   128M    32M   128M    64M    64M    64M   256M     1G
[13  3 14]    256M   512M   512M   256M    32M     8M    32M    64M     1G
[13  5  8]    256M   512M   256M   512M   512M   512M   256M   512M     1G
[13  5 10]     32M   128M    32M    16M     8M   128K   512K     1M     1G
[13  9  8]    128M   256M   128M     1G   512M   512M   512M     1G     1G
[13 10 12]      4M     4M     4M   512K   128K    64K   128K   256K   512M
[13 11 14]     16M    32M    32M    16M    16M     8M     8M    32M     1G
[13 12 14]     16M    32M    32M   512K    16M     4M     4M    64M     1G
[13 13 14]     16M    32M    32M    16M    16M     8M     8M    32M     1G
[14  1 11]     32M    64M    32M     1M   256K   256K   512K     2M     1G
[14  2  7]    512M     2G   256M   256M    16M   512M   256M   512M     1G
[14  2  9]    128M   512M   256M    64M   128M   256M   128M   128M     1G
[14  3 13]     16M    32M    16M     1M   256K    64K   128K   256K     1G
[14  8  9]    128M   256M   128M     1G    16M   256M    64M    64M     1G
[14 11  3]    512M     1G     2G   256M   128M    64M   128M   256M     1G
[14 11 11]     16M    32M    16M   512K   256K   128K   512K     1M     1G
[14 11 13]      1M     1M     1M   256K    64K    64K   128K   128K     1G
[14 12 13]    512K   512K   512K   256K    64K    64K   128K   128K     1G
[14 13 13]    512K   256K   512K   128K    64K    64K   128K   128K   256M
[15  1  2]      4M     4M     4M     2M     8M   256K   512K     4M     1G
[15  3  6]    512M     2G   512M   512M   512M   512M   512M     1G     1G
[15  4 12]     64M   128M    32M   256M     8M   256K   512K     1M     1G
[15  6  2]    128M   256M   128M     1G    64M    16M    32M    32M     1G
[15  7  4]    512M     2G   128M   512M   256M   128M   512M   512M     1G
[15  7  8]     32M   128M    64M   128M    32M     2M     4M     4M   512M
It's all in one!
Because the iteration is just a bunch of XOR's, we could do a double iteration at once, grabbing results from each to perform the separate 16-bit adds and parity computations, in order to compute the compound PRNG output for the next instruction's S value. The 'XORO32 D' will write the double-iterated value (which is separate from the PRNG output) back into D, which is just a register. To seed the PRNG, just put a non-0 value into the register being used.
XORO32 D 'iterate D and put 32-bit PRNG result into next instruction's S
Example:
XORO32 rnd        'iterate rnd
MOV    outb,0     'write XORO32 PRNG result to outb
Yes, XORO32 would stave off an interrupt just like SETQ (SETQ+RDLONG) does. The next instruction would wind up with the PRNG result for its S value. This is what you've been working towards, but didn't realize how simple it would turn out. Me neither. This just uses any cog register. Easy peasy.
wire [15:0] xoro32z = d[31:16] ^ d[15:0];                    // first iteration
wire [31:0] xoro32y = {xoro32z[8:0], xoro32z[15:9],
                       {d[1:0], d[15:2]} ^ {xoro32z[13:0], 2'b0} ^ xoro32z};

wire [15:0] xoro32x = xoro32y[31:16] ^ xoro32y[15:0];        // second iteration
wire [31:0] xoro32  = {xoro32x[8:0], xoro32x[15:9],          // xoro32 = d result
                       {xoro32y[1:0], xoro32y[15:2]} ^ {xoro32x[13:0], 2'b0} ^ xoro32x};

wire [16:0] xoro32a = xoro32y[31:16] + xoro32y[15:0];        // first iteration sum
wire [16:0] xoro32b = xoro32[31:16] + xoro32[15:0];          // second iteration sum

wire [31:0] xoro32r = {xoro32b[15:1], ^xoro32b,              // xoro32r = prng result
                       xoro32a[15:1], ^xoro32a};
EDIT: I changed the xoro32r output so that the 1st iteration result is in the lower word and the 2nd iteration result is in the upper word.
wire [31:0] xoro32r = {xoro32b[15:1], ^xoro32b,              // xoro32r = prng result
                       xoro32a[15:1], ^xoro32a};
I think so. I'll change it.
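For anyone wanting to cross-check the Verilog in software, here is a rough C model of the double iteration. It is a sketch only: the rotate and shift constants [14,2,7] are read off the bit slices in the Verilog above, and the function and struct names are hypothetical rather than anything from the P2 sources.

```c
#include <stdint.h>

static uint16_t rol16(uint16_t x, unsigned k)   /* valid for k = 1..15 */
{
    return (uint16_t)((x << k) | (x >> (16 - k)));
}

/* One Xoroshiro32 iteration with [a,b,c] = [14,2,7]; s0 = d[15:0], s1 = d[31:16]. */
static uint32_t xoro32_step(uint32_t d)
{
    uint16_t s0 = (uint16_t)d;
    uint16_t s1 = (uint16_t)(d >> 16);
    uint16_t z  = (uint16_t)(s0 ^ s1);
    uint16_t lo = (uint16_t)(rol16(s0, 14) ^ (uint16_t)(z << 2) ^ z);
    uint16_t hi = rol16(z, 7);
    return ((uint32_t)hi << 16) | lo;
}

/* Bits [15:1] of the 17-bit sum, with bit 0 replaced by the parity of sum[16:0]. */
static uint16_t sum_parity16(uint32_t state)
{
    uint32_t sum = (state >> 16) + (state & 0xFFFFu);
    uint32_t p = sum;
    p ^= p >> 16; p ^= p >> 8; p ^= p >> 4; p ^= p >> 2; p ^= p >> 1;
    return (uint16_t)((sum & 0xFFFEu) | (p & 1u));
}

typedef struct {
    uint32_t state;  /* double-iterated value written back to D (xoro32)  */
    uint32_t prn;    /* 32-bit PRNG result for the next S value (xoro32r) */
} xoro32_result;

static xoro32_result xoro32(uint32_t d)
{
    xoro32_result r;
    uint32_t y = xoro32_step(d);   /* first iteration  (xoro32y) */
    uint32_t x = xoro32_step(y);   /* second iteration (xoro32)  */
    r.state = x;
    r.prn = ((uint32_t)sum_parity16(x) << 16) | sum_parity16(y);
    return r;
}
```

As in the Verilog, the first iteration's sum lands in the lower word of the result and the second iteration's sum in the upper word. Working through a seed of 1 by hand, the model gives a state of 0x42A01290 and a result of 0x55304084.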