What if you took the parity of sum[16:1] and XOR'd it into the sum LSB, then used sum[15:0]?
Wow! That's perfect. I'm watching the log file at the moment and there are already 512M scores for word0 ...
EDIT: A detail, I've actually parity'd all bits, sum[16:0]
EDIT2: Doh! I see that is the same as yours.
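To make the two descriptions above concrete, here is a minimal Python sketch. The function names and the 16-bit inputs `s0` and `s1` are illustrative assumptions, not anything from the actual hardware. It only shows that XORing the parity of sum[16:1] into sum[0] gives the same word as setting sum[0] to the parity of sum[16:0].

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def fold_xor_into_lsb(s0: int, s1: int) -> int:
    """Parity of sum[16:1] XOR'd into sum[0]; keep sum[15:0]."""
    total = (s0 & 0xFFFF) + (s1 & 0xFFFF)   # 17-bit sum, bit 16 is the carry
    lsb = (total & 1) ^ parity(total >> 1)  # total >> 1 holds sum[16:1]
    return (total & 0xFFFE) | lsb


def fold_parity_all_bits(s0: int, s1: int) -> int:
    """sum[0] set to the parity of all 17 bits, sum[16:0]."""
    total = (s0 & 0xFFFF) + (s1 & 0xFFFF)
    return (total & 0xFFFE) | parity(total)
```

The two agree because the parity of sum[16:0] is, by definition, sum[0] XOR the parity of sum[16:1].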
If it winds up looking good, then do two iterations and feed the result words, concatenated together as a long, into your analyzer. Actually, that's effectively what is happening, right?
Yep, everything is one long unending string to Practrand.
If XORO32 can be used to produce a high-quality 16-bit result per iteration, then I could enhance the instruction to do TWO iterations, get 16-bit sums from each, and substitute the high-quality 32-bit PRNG result into the next instruction's S value. Meanwhile, the double iteration would be written back to the original D in the XORO32 instruction.
So, do a XORO32 iteration and store the 32-bit state in S. Do another XORO32 iteration, then sum the two states as two parallel 16-bit adds, with the parity jiggery-pokery, to give a 32-bit PRN in D. Is that correct and is there time to do all this?
I'm all for XORO32 doing the summing itself, if possible. We'd need a XORO32 write for the seed and a XORO32 read for the PRN.
I think sum[0] = sum[15:1] parity is so bad because sum[0] is no longer a function of itself!
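A small Python sketch of that point, under the assumption that "sum[0] = sum[15:1] parity" means the LSB is overwritten rather than XOR'd. The helper names are hypothetical. Both variants use the same bit range, sum[15:1], so the only difference is whether the original sum[0] still feeds into the output bit.

```python
def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def lsb_replaced(total: int) -> int:
    """sum[0] overwritten by the parity of sum[15:1]; the old sum[0] is discarded."""
    return (total & 0xFFFE) | parity((total >> 1) & 0x7FFF)


def lsb_xored(total: int) -> int:
    """Parity of sum[15:1] XOR'd into sum[0]; the old sum[0] still contributes."""
    return (total & 0xFFFE) | ((total & 1) ^ parity((total >> 1) & 0x7FFF))
```

Two sums that differ only in bit 0 produce the same word from `lsb_replaced` but different words from `lsb_xored`, which is one way to read "no longer a function of itself".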
Okay, I'm struggling to see how those two sentences relate to one another but I've run that and not too surprisingly the byte0 scores are a pretty close match to the word0 scores.
However, that result, along with many other score tables, shows similarly confusing column differences that one would not expect to occur, given the common use of poor-quality bits.
So, either the idea of bit positions being of a given quality for a given calibration is wrong - I doubt it - or PractRand has some glaring deficiencies in detecting problems with certain bit positions.
One thought was maybe it was byte oriented in some fashion given that's how I feed it the data stream. So I swapped Xoroshiro32+p result bit 9 with result bit 1. Both bits are moved by a whole byte. That shouldn't have any impact on PractRand, right? Guess what, it has quite a decent impact ...
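For reference, the bit swap described above can be sketched in Python as follows. `swap_bits` is a hypothetical helper, not part of any test harness mentioned in the thread. Bit 9 is bit 1 of the upper byte and bit 1 is bit 1 of the lower byte, so each bit keeps its position within a byte and only changes which byte it lives in.

```python
def swap_bits(value: int, i: int, j: int) -> int:
    """Exchange bit i and bit j of value."""
    if ((value >> i) ^ (value >> j)) & 1:   # the two bits differ, so flip both
        value ^= (1 << i) | (1 << j)
    return value


def swap_9_and_1(result16: int) -> int:
    """Apply the sum[9] <-> sum[1] swap to a 16-bit result word."""
    return swap_bits(result16 & 0xFFFF, 9, 1)
```

Applying `swap_9_and_1` twice returns the original word, so the swap loses no information; any change in scores comes purely from where the bits sit in the stream.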
It's all in one!
Because the iteration is just a bunch of XORs, we could do a double iteration at once, grabbing results from each to perform the separate 16-bit adds and parity computations, in order to compute the compound PRNG output for the next instruction's S value. The 'XORO32 D' will write the double-iterated value (which is separate from the PRNG output) back into D, which is just a register. To seed the PRNG, just put a non-0 value into the register being used.
XORO32 D 'iterate D and put 32-bit PRNG result into next instruction's S
Example:
XORO32 rnd 'iterate rnd
MOV outb,0 'write XORO32 PRNG result to outb
Amazing! The 0 in MOV could be anything, and interrupts are disabled between XORO32 and MOV?
Yes, XORO32 would stave off an interrupt just like SETQ (SETQ+RDLONG) does. The next instruction would wind up with the PRNG result for its S value. This is what you've been working towards, but didn't realize how simple it would turn out. Me neither. This just uses any cog register. Easy peasy.
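As a rough software model of the double iteration described above, here is a Python sketch. Several things in it are assumptions rather than confirmed hardware behaviour: the usual xoroshiro update structure, the [14 2 7] constants mentioned elsewhere in the thread being used as rotate/shift/rotate amounts, s0 living in the low word of the state, each sum being taken before that iteration's state update, and the first iteration's result going into the lower word. The names `step` and `xoro32` are illustrative only.

```python
MASK16 = 0xFFFF
A, B, C = 14, 2, 7   # assumed meaning: rotate, shift, rotate amounts


def rotl16(x: int, k: int) -> int:
    """Rotate a 16-bit value left by k bits."""
    return ((x << k) | (x >> (16 - k))) & MASK16


def parity(x: int) -> int:
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def step(state: int):
    """One iteration on a 32-bit state; returns (new_state, 16-bit result)."""
    s0 = state & MASK16               # assumption: s0 is the low word
    s1 = (state >> 16) & MASK16
    total = s0 + s1                   # 17-bit sum, taken before the update (assumed)
    result = (total & 0xFFFE) | parity(total)   # sum[0] = parity of sum[16:0]
    s1 ^= s0
    s0 = rotl16(s0, A) ^ s1 ^ ((s1 << B) & MASK16)
    s1 = rotl16(s1, C)
    return (s1 << 16) | s0, result


def xoro32(d: int):
    """Model of 'XORO32 D': returns (new D, 32-bit value for the next S)."""
    d, lo = step(d)                   # first iteration -> lower word
    d, hi = step(d)                   # second iteration -> upper word
    return d, (hi << 16) | lo
```

Usage mirrors the assembly example: keep the returned state in the same variable, as `rnd` is kept in a register, and consume the second value where the next instruction's S would be. A state of 0 stays at 0 in this model, which is why the seed has to be non-0.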
Comments
Totally, I was picking up on that when Melissa was describing using the high quality bits to dynamically do rotates.
And [14 2 7] is still out front
This would address TonyB_'s longstanding desire to get good long results.
That's either the best question ever or you really do need to get some sleep!
And here's what we get with sum[9] <-> sum[1]:
EDIT: I changed the xoro32r output so that the 1st iteration result is in the lower word and the 2nd iteration result is in the upper word.
I think so. I'll change it.