Thanks Chip.
A long time ago you posted the basic clock processing for the P2. Not sure where that was and if it is even current. How does it work now?
Here's the P1 for example
I d S D e R
I d S D e R
Drawn that way, Prop2 is:
I SD e e R
I SD e e R
I SD e e R
EDIT: The S and D ALU ports of the second instruction can be fed straight from the first instruction completion - effectively a forced immediate addressing mode on the second instruction. Or the second instruction's SD operand substitution can be done after the first's first execution stage.
I'm assuming Chip prefers operand substitution where possible but what use does a random number have as anything other than an immediate number?
EDIT2: Chip did say the second execution stage is basically all mux'ing. I'm not sure the implications there. Does that mean that none of the prefixing instructions, eg: ALTS, can do operand substitution? I've long forgotten how those instructions work ... (Now I'm late for work!)
You might be amused to know, when only using 16-bits or less, the quality scores for our XORO32 doing the even-odd looping trick are generally a little better than straight incremental. There is a few 1 GB scores in the word0 column, and even some 4 GB and 8 GB scores in other columns too. It's also more of a mixed bag, [14 2 7] no longer looks so great for example.
For the testing, I modified the source to double iterate on every call to next(). Nothing else changed.
Remember when you swapped bits 1 and 9? [14,2,7] got worse but [15,3,6] got better. I wonder what the double-iteration score would be with that swapping?
Evan, could you please try a double-iteration test with bits 1 and 9 swapped? Also it would be good to add an lsBit test to this to see how bit 0 performs.
I think this will be the last test, I hope so anyway. PractRand has been an excellent tool for comparing triplets and [14,2,7] has always been the one in the Verilog, however XORO32 has changed and I think [15,3,6] is the better choice now. That's the only modification XORO32 really needs.
In that case, I'd recommend also moving the parity bit to the most significant bit position.
My impression is, without subsequent summing, a final bit shuffle is only hiding things from PractRand. I doubt it's a quality improvement.
To change the subject a little, that last odd-even scoring also had a lowly 32MB score for [15 3 6] as well as [14 2 7]. Tony is picking the score columns that will likely be used in production code but there is many more possible sampling variants that can show differing quality.
I don't know how relevant such aligned scores are to the true quality of an algorithm. Maybe what Tony is picking out is the right answer.
The question isn't all that rigidly defined is it.
The question is one of over-all quality. Doing lots of sampling variations like I've been doing, is there ever going to be a flat set of scores for a given algorithm? Or should we only expect to settle for finding a combination and config that makes the common uses produce good scores?
Evan, could you please try a double-iteration test with bits 1 and 9 swapped? Also it would be good to add an lsBit test to this to see how bit 0 performs.
Ouch, the whole set of candidates is taking up to an hour to run now! That increase will be solely down to the average increase in score sizes. Those old times, ~12 minutes, must have been just before we'd found the full-width parity improvement.
I'll do some more configurations tomorrow if Scro doesn't tell me I'm wasting my time.
I don't know what you'd be doing to wasting your time or not waste your time.
But if your trying xoroshiro32plus with a parity bit replacing one bit of output, could you instead have the parity bit act as the carry-in on the addition that produces the output? That might be better, maybe.
I'll do some more configurations tomorrow if Scro doesn't tell me I'm wasting my time.
I don't know what you'd be doing to wasting your time or not waste your time.
But if your trying xoroshiro32plus with a parity bit replacing one bit of output, could you instead have the parity bit act as the carry-in on the addition that produces the output? That might be better, maybe.
The idea of using the parity is a bit of a novelty and it's probably not easy to see in all our previous discussions where the parity comes from. It's the parity of the full 17-bit result after the addition of the two 16-bit halves of the state.
Evan, could you please try a double-iteration test with bits 1 and 9 swapped? Also it would be good to add an lsBit test to this to see how bit 0 performs.
Ouch, the whole set of candidates is taking up to an hour to run now! That increase will be solely down to the average increase in score sizes. Those old times, ~12 minutes, must have been just before we'd found the full-width parity improvement.
Many thanks for today's results, Evan. I think you have done enough tests.
Here is a summary of the most important scores for the top two triplets with three combinations of sum and parity:
Word size is 16 bits. Parity as msb has the worst scores but was a convenient way of doing a parity-only test and the top bit here has been copied to the bottom bit of the other two sets. Swapping bits 1 and 9 has worse scores overall than not swapping which leaves parity as lsb as the best option and here it is again on its own:
Top Bot Top Bot
Triplet WordByteByte Bit Bit Iter. PRN[15:0] =
------------------------------------------------------------------
[14,2,7] 512M 1G 512M 1G 1G single
[15,3,6] 512M 512M 1G 1G 1G "
sum[15:1],parity
[14,2,7] 512M 256M 1G 1G 1G double
[15,3,6] 1G 1G 2G 1G 1G "
------------------------------------------------------------------
XORO32 does a double iteration and concatenates two 16-bit sum & parity results. Treating those as one long PRN, [14,2,7] is better in the top bytes and [15,3,6] better in the bottom bytes but otherwise they score the same.
If the best possible word or byte PRN is wanted from each XORO32 instruction, then [15,3,6] is clearly superior. The word score of 1G is the best since using all 16 bits and the bottom byte score of 2G is the highest we have had.
We have always relied on the PractRand scores and they are saying choose [15,3,6].
I'll do some more configurations tomorrow if Scro doesn't tell me I'm wasting my time.
I don't know what you'd be doing to wasting your time or not waste your time.
The main attempts right now are shuffling the summing output bit order. This is the question I was trying to articulate earlier - Is there is any real quality advantage to a fixed post-summing shuffle? Or are we just tricking PrandRand?
I'll do some more configurations tomorrow if Scro doesn't tell me I'm wasting my time.
I don't know what you'd be doing to wasting your time or not waste your time.
The main attempts right now are shuffling the summing output bit order. This is the question I was trying to articulate earlier - Is there is any real quality advantage to a fixed post-summing shuffle? Or are we just tricking PrandRand?
Shuffling bit order in the final output? Yeah, that sounds pretty suspicious. Exactly how useless that is depends upon what test failures you're managing to avoid that way, but in general it sounds like a bad idea.
Many thanks for today's results, Evan. I think you have done enough tests.
The thing is, the shuffling is pretty arbitrary. The original bit9<->bit1 swap, intentionally 8 bits apart, was just to test if PracrRand can be fooled - which seems easily done.
If we're going to start down this path we should do a decent amount of it to see what else comes up.
Many thanks for today's results, Evan. I think you have done enough tests.
The thing is, the shuffling is pretty arbitrary. The original bit9<->bit1 swap, intentionally 8 bits apart, was just to test if PracrRand can be fooled - which seems easily done.
If we're going to start down this path we should do a decent amount of it to see what else comes up.
I think you have done enough tests for the decision about switching to [15,3,6]. Someone should double check my summary, though.
You could try swapping bits 1 and 8/10/11/12/13/14 but swapping bits 1 and 9 made things worse overall.
Tony,
You are somewhat cherry picking. There is other triplets that also look better than either of those two when the bits are shuffled appropriately.
Tony,
You are somewhat cherry picking. There is other triplets that also look better than either of those two when the bits are shuffled appropriately.
I chose the triplets that performed best in the three sets of tests that you did as listed in my summary post above. However, in my view the most important set is the first one and we need the best triplet for both single and double iterations when bit 0 is parity, i.e. just a single bit change from the original algorithm. I don't like the idea of shuffling bits about after the addition unless that gives better scores, in which case there is probably something wrong with the algorithm. What I am saying needs checking but I'm confident that [15,3,6] scores the best.
Jump functions work with xoroshiro32. It is possible to jump by 64K or 128K states with only 32 single iterations plus some XORs according to the jump bit patterns:
JUMP_64K = 1A57604D
JUMP_128K = 2FD3E1B9Bit31 is the first bit to test.
The jump function needs single-stepping of states which XORO32 can no longer do, unless it is modified. The easiest way, avoiding use of CZ opcode bits, would be to move XORO32 to an empty D,S slot with S selecting single or double iterations.
Here are the first 32 * 64K jumps from state 00000001:
Comments
You'll need a wide display to see this properly:
clk _________------------____________------------____________------------____________------------____________------------____________------------____________- | | | | | | | rdRAM Ib |-------+ | rdRAM Ic |-------+ | rdRAM Id |-------+ | rdRAM Ie | | | | | | | | | | | latch Da |---+ +----> rdRAM Db |------------> latch Db |---+ +----> rdRAM Dc |------------> latch Dc |---+ +----> rdRAM Dd |------------> latch Dd | latch Sa |---+ +----> rdRAM Sb |------------> latch Sb |---+ +----> rdRAM Sc |------------> latch Sc |---+ +----> rdRAM Sd |------------> latch Sd | latch Ia |---+ +----> latch Ib |------------> latch Ib |---+ +----> latch Ic |------------> latch Ic |---+ +----> latch Id |------------> latch Id | | | | | | | | | | | | +------------------ALU-----------> wrRAM Ra | +------------------ALU-----------> wrRAM Rb | +------------------ALU-----------> wrRAM Rc | | | | | | | | | | stall/done = 'gox' | | stall/done = 'gox' | | stall/done = 'gox' | | 'get' | done = 'go' | 'get' | done = 'go' | 'get' | done = 'go' |
I SD e e R I SD e e R I SD e e R
EDIT: The S and D ALU ports of the second instruction can be fed straight from the first instruction completion - effectively a forced immediate addressing mode on the second instruction. Or the second instruction's SD operand substitution can be done after the first's first execution stage.I'm assuming Chip prefers operand substitution where possible but what use does a random number have as anything other than an immediate number?
EDIT2: Chip did say the second execution stage is basically all mux'ing. I'm not sure the implications there. Does that mean that none of the prefixing instructions, eg: ALTS, can do operand substitution? I've long forgotten how those instructions work ... (Now I'm late for work!)
I SDe R I SDe R I SDe R
Evan, could you please try a double-iteration test with bits 1 and 9 swapped? Also it would be good to add an lsBit test to this to see how bit 0 performs.
I think this will be the last test, I hope so anyway. PractRand has been an excellent tool for comparing triplets and [14,2,7] has always been the one in the Verilog, however XORO32 has changed and I think [15,3,6] is the better choice now. That's the only modification XORO32 really needs.
My impression is, without subsequent summing, a final bit shuffle is only hiding things from PractRand. I doubt it's a quality improvement.
To change the subject a little, that last odd-even scoring also had a lowly 32MB score for [15 3 6] as well as [14 2 7]. Tony is picking the score columns that will likely be used in production code but there is many more possible sampling variants that can show differing quality.
I don't know how relevant such aligned scores are to the true quality of an algorithm. Maybe what Tony is picking out is the right answer.
Scro, is there a rigid answer to this question?
EDIT: Typo
The question is one of over-all quality. Doing lots of sampling variations like I've been doing, is there ever going to be a flat set of scores for a given algorithm? Or should we only expect to settle for finding a combination and config that makes the common uses produce good scores?
Here's the earlier scoring with parity at msBit position - http://forums.parallax.com/discussion/comment/1424022/#Comment_1424022
Scoring with double-iterating, parity at msBit and exchanged bit8 with bit0:
Xoroshiro32+p PractRand Score Table - Run 2017-11-07 23:48:10 +1300 Combination Word0 Word1 Word2 msByte Byte04 Byte2 Byte1 Byte0 msBit ===================================================================================== [ 1 2 8] 64M 128M 256M 128M 256M 16M 4M 128M 1G [ 2 1 11] 64M 128M 64M 128M 128M 32M 256M 512M 1G [ 2 1 15] 32M 64M 32M 1M 1M 512K 512K 1M 1G [ 2 2 7] 64M 256M 128M 128M 128M 32M 8M 32M 1G [ 2 6 15] 128M 512M 512M 128M 256M 2M 1M 2M 1G [ 2 8 7] 256M 16G 8G 128M 1G 1G 512M 512M 1G [ 2 9 9] 128M 2G 512M 16M 8M 32M 128M 32M 1G [ 2 11 3] 64M 1G 1G 128M 8M 4M 4M 2M 1G [ 3 2 6] 512M 8G 8G 128M 1G 1G 1G 512M 1G [ 3 3 10] 64M 2G 1G 64M 128M 16M 128M 64M 1G [ 3 11 2] 64M 256M 64M 128M 2M 1M 2M 512K 1G [ 3 11 14] 512M 16G 8G 128M 128M 512M 512M 1G 1G [ 4 1 9] 512K 512K 1M 512K 64M 32M 512K 512K 1G [ 4 7 15] 64M 512M 512M 128M 32M 512M 512M 256M 1G [ 4 8 5] 512M 16G 8G 128M 1G 1G 512M 512M 1G [ 5 2 6] 512M 16G 4G 128M 2G 1G 128M 1G 1G [ 5 8 4] 128M 128M 128M 8M 1G 512M 512M 512M 1G [ 5 14 12] 512M 8G 8G 128M 1G 1G 512M 512M 1G [ 6 2 3] 1G 16G 16G 128M 1G 1G 1G 1G 1G [ 6 2 5] 256M 4G 4G 128M 64M 512M 32M 64M 1G [ 6 2 11] 512M 8G 8G 128M 2G 1G 512M 1G 1G [ 6 3 15] 1G 16G 8G 128M 1G 64M 128M 64M 1G [ 6 14 11] 512M 2G 4G 64M 64M 1G 128M 128M 1G [ 7 1 8] 32M 128M 256M 32M 64M 16M 8M 16M 1G [ 7 2 2] 64M 256M 512M 64M 256M 64M 8M 32M 1G [ 7 2 10] 128M 512M 2G 128M 512M 64M 16M 64M 1G [ 7 2 14] 512M 16G 8G 128M 1G 1G 32M 128M 1G [ 7 8 2] 512M 8G 8G 128M 1G 256M 256M 512M 1G [ 7 10 10] 256M 4G 4G 128M 128M 128M 128M 256M 1G [ 7 15 8] 512K 1M 1M 256K 2M 256K 64K 256K 1G [ 8 1 7] 16M 64M 64M 16M 128M 16M 4M 16M 1G [ 8 2 1] 64M 128M 256M 16M 256M 16M 8M 8M 1G [ 8 5 13] 64M 128M 128M 128M 8M 32M 64M 512M 1G [ 8 7 15] 4M 16M 16M 8M 32M 8M 2M 4M 1G [ 8 9 13] 512M 8G 4G 128M 256M 512M 128M 256M 1G [ 8 15 7] 512K 1M 1M 256K 1M 256K 128K 256K 1G [ 9 1 4] 8M 8M 32M 4M 16M 64M 2M 8M 128M [ 9 2 14] 512M 8G 4G 128M 512M 128M 64M 128M 1G [ 9 8 14] 512M 16G 8G 128M 256M 128M 32M 256M 1G [ 9 9 2] 512M 4G 2G 128M 256M 256M 256M 512M 1G [10 2 7] 128M 512M 512M 128M 1G 64M 16M 32M 1G [10 3 3] 64M 512M 512M 128M 64M 256M 256M 128M 1G [10 3 11] 512M 8G 8G 128M 2G 1G 128M 512M 1G [10 5 13] 512M 16G 8G 128M 256M 32M 1G 32M 1G [10 7 11] 1G 16G 8G 128M 512M 1G 512M 512M 1G [10 10 7] 256M 4G 4G 128M 16M 128M 128M 256M 1G [11 1 2] 512M 16G 4G 128M 1G 512M 512M 512M 1G [11 1 14] 512M 16G 8G 128M 256M 512M 512M 512M 1G [11 2 6] 512M 16G 8G 128M 1G 512M 512M 256M 1G [11 3 10] 128M 2G 2G 128M 128M 256M 512M 32M 1G [11 7 10] 512M 16G 8G 128M 128M 256M 64M 128M 1G [11 8 12] 512M 16G 16G 128M 1G 1G 512M 512M 1G [11 10 12] 128M 8G 8G 128M 512M 256M 512M 512M 1G [11 11 14] 256M 16G 8G 128M 1G 512M 256M 512M 1G [11 14 6] 512M 8G 8G 128M 128M 512M 512M 256M 1G [12 4 15] 128M 1G 1G 128M 1G 512M 512M 512M 1G [12 8 11] 64M 256M 512M 128M 64M 32M 32M 8M 1G [12 10 11] 512M 1G 2G 128M 32M 512M 64M 64M 1G [12 10 13] 128M 8G 8G 128M 1G 512M 512M 512M 1G [12 14 5] 128M 4G 4G 128M 1G 1G 512M 512M 1G [13 3 14] 512M 4G 1G 128M 512M 512M 512M 512M 1G [13 5 8] 128M 1G 4G 128M 256M 512M 512M 1G 1G [13 5 10] 512M 16G 8G 128M 1G 1G 512M 1G 1G [13 9 8] 512M 2G 8G 128M 256M 1G 512M 1G 1G [13 10 12] 256M 512M 128M 128M 512M 512M 512M 512M 1G [13 11 14] 64M 4G 1G 128M 512M 512M 512M 512M 1G [13 12 14] 16M 1G 256M 128M 128M 512M 512M 512M 1G [13 13 14] 512M 4G 1G 128M 512M 512M 512M 256M 1G [14 1 11] 512M 8G 16G 128M 1G 1G 1G 256M 1G [14 2 7] 256M 16G 8G 128M 64M 1G 32M 64M 1G [14 2 9] 512M 16G 16G 128M 64M 128M 64M 128M 1G [14 3 13] 128M 16M 32M 128M 2M 8M 512K 512K 1G [14 8 9] 512M 16G 8G 128M 2G 1G 512M 512M 1G [14 11 3] 512M 16G 8G 128M 128M 512M 512M 512M 1G [14 11 11] 512M 2G 4G 128M 128M 256M 256M 1G 1G [14 11 13] 128M 32M 16M 128M 1M 512K 1M 512K 1G [14 12 13] 128M 16M 8M 128M 512K 512K 2M 512K 1G [14 13 13] 32M 16M 16M 128M 1M 1M 2M 512K 1G [15 1 2] 512M 512M 4G 128M 32M 64M 64M 256M 1G [15 3 6] 512M 16G 16G 128M 128M 128M 512M 64M 1G [15 4 12] 256M 512M 1G 128M 512M 512M 512M 512M 1G [15 6 2] 512M 16G 8G 128M 512M 128M 256M 512M 1G [15 7 4] 1G 8G 2G 128M 256M 512M 512M 512M 1G [15 7 8] 32M 128M 128M 128M 512M 512M 128M 256M 1G
Xoroshiro32+p PractRand Score Table - Run 2017-11-07 22:30:28 +1300 Combination Word0 Word1 Word2 msByte Byte04 Byte2 Byte1 Byte0 msBit ===================================================================================== [ 1 2 8] 32M 64M 128M 64M 128M 4M 128M 64M 1G [ 2 1 11] 64M 128M 64M 128M 32M 256M 512M 2G 1G [ 2 1 15] 128M 32M 16M 2M 512K 512K 1M 4M 1G [ 2 2 7] 128M 128M 128M 128M 128M 8M 32M 128M 1G [ 2 6 15] 256M 512M 256M 2G 32M 1M 2M 4M 1G [ 2 8 7] 256M 8G 4G 1G 512M 512M 512M 1G 1G [ 2 9 9] 256M 1G 512M 4M 16M 128M 32M 32M 1G [ 2 11 3] 128M 256M 256M 64M 4M 4M 2M 2M 1G [ 3 2 6] 512M 8G 4G 1G 1G 1G 512M 2G 1G [ 3 3 10] 256M 128M 1G 64M 32M 128M 64M 64M 1G [ 3 11 2] 128M 128M 128M 4M 1M 2M 512K 512K 1G [ 3 11 14] 512M 16G 16G 2G 128M 512M 1G 4G 1G [ 4 1 9] 512K 512K 512K 512K 256M 512K 512K 512K 512M [ 4 7 15] 64M 512M 512M 32M 128M 512M 256M 1G 1G [ 4 8 5] 1G 8G 8G 1G 1G 512M 512M 4G 1G [ 5 2 6] 512M 8G 4G 1G 1G 128M 1G 2G 1G [ 5 8 4] 128M 128M 256M 8M 1G 512M 512M 2G 1G [ 5 14 12] 512M 8G 8G 512M 1G 512M 512M 2G 512M [ 6 2 3] 1G 16G 4G 1G 1G 1G 1G 2G 512M [ 6 2 5] 512M 4G 2G 2G 128M 32M 64M 2G 1G [ 6 2 11] 512M 8G 2G 1G 512M 512M 1G 2G 1G [ 6 3 15] 1G 8G 4G 1G 32M 128M 64M 256M 1G [ 6 14 11] 1G 4G 2G 128M 64M 128M 128M 256M 256M [ 7 1 8] 64M 128M 128M 16M 64M 8M 16M 32M 1G [ 7 2 2] 256M 512M 512M 64M 128M 8M 32M 128M 1G [ 7 2 10] 256M 512M 512M 128M 256M 16M 64M 512M 1G [ 7 2 14] 512M 4G 1G 1G 1G 32M 128M 512M 128M [ 7 8 2] 512M 8G 4G 1G 512M 256M 512M 2G 1G [ 7 10 10] 512M 2G 4G 256M 128M 128M 256M 512M 1G [ 7 15 8] 512K 512K 512K 256K 2M 64K 256K 256K 1G [ 8 1 7] 32M 128M 64M 8M 64M 4M 16M 32M 1G [ 8 2 1] 32M 64M 64M 32M 128M 8M 8M 16M 128M [ 8 5 13] 64M 128M 128M 64M 32M 64M 512M 512M 1G [ 8 7 15] 4M 16M 16M 4M 32M 2M 4M 4M 1G [ 8 9 13] 512M 16G 4G 1G 64M 128M 256M 2G 1G [ 8 15 7] 256K 1M 512K 128K 1M 128K 256K 128K 512M [ 9 1 4] 8M 2M 16M 2M 16M 2M 8M 8M 512M [ 9 2 14] 512M 16G 4G 512M 512M 64M 128M 256M 1G [ 9 8 14] 512M 16G 8G 512M 1G 32M 256M 512M 512M [ 9 9 2] 1G 4G 512M 128M 512M 256M 512M 2G 1G [10 2 7] 256M 256M 256M 128M 128M 16M 32M 128M 1G [10 3 3] 256M 128M 256M 16M 32M 256M 128M 1G 1G [10 3 11] 1G 4G 4G 2G 1G 128M 512M 2G 1G [10 5 13] 512M 16G 4G 512M 64M 1G 32M 64M 1G [10 7 11] 1G 8G 8G 1G 512M 512M 512M 1G 1G [10 10 7] 512M 2G 2G 32M 128M 128M 256M 256M 1G [11 1 2] 512M 2G 4G 1G 512M 512M 512M 2G 1G [11 1 14] 512M 8G 4G 1G 256M 512M 512M 2G 1G [11 2 6] 1G 8G 4G 1G 1G 512M 256M 2G 1G [11 3 10] 512M 128M 2G 1G 256M 512M 32M 256M 1G [11 7 10] 512M 8G 8G 256M 128M 64M 128M 256M 1G [11 8 12] 512M 8G 4G 1G 1G 512M 512M 4G 1G [11 10 12] 256M 16G 4G 512M 512M 512M 512M 1G 1G [11 11 14] 512M 16G 8G 1G 1G 256M 512M 1G 1G [11 14 6] 512M 8G 4G 256M 256M 512M 256M 512M 1G [12 4 15] 128M 512M 512M 1G 1G 512M 512M 2G 1G [12 8 11] 64M 64M 128M 512M 64M 32M 8M 32M 1G [12 10 11] 512M 1G 256M 512M 64M 64M 64M 2G 1G [12 10 13] 256M 16G 4G 64M 1G 512M 512M 2G 1G [12 14 5] 256M 4G 2G 1G 1G 512M 512M 2G 1G [13 3 14] 512M 4G 1G 1G 512M 512M 512M 4G 1G [13 5 8] 256M 512M 1G 512M 64M 512M 1G 2G 1G [13 5 10] 512M 8G 4G 1G 64M 512M 1G 2G 1G [13 9 8] 512M 2G 2G 1G 128M 512M 1G 2G 1G [13 10 12] 512M 512M 128M 1G 512M 512M 512M 1G 1G [13 11 14] 128M 2G 256M 1G 512M 512M 512M 128M 1G [13 12 14] 32M 512M 256M 32M 512M 512M 512M 4G 1G [13 13 14] 512M 2G 512M 512M 512M 512M 256M 1G 1G [14 1 11] 1G 8G 4G 1G 512M 1G 256M 1G 1G [14 2 7] 512M 8G 8G 512M 1G 32M 64M 512M 1G [14 2 9] 1G 8G 8G 128M 256M 64M 128M 256M 1G [14 3 13] 256M 16M 8M 256M 4M 512K 512K 16M 1G [14 8 9] 512M 16G 8G 1G 1G 512M 512M 1G 1G [14 11 3] 512M 16G 8G 512M 256M 512M 512M 1G 1G [14 11 11] 512M 2G 2G 512M 512M 256M 1G 1G 1G [14 11 13] 256M 32M 16M 256M 1M 1M 512K 2M 1G [14 12 13] 256M 16M 8M 8M 512K 2M 512K 2M 1G [14 13 13] 64M 16M 8M 8M 512K 2M 512K 2M 1G [15 1 2] 512M 512M 2G 512M 256M 64M 256M 512M 1G [15 3 6] 1G 16G 8G 2G 1G 512M 64M 256M 1G [15 4 12] 512M 1G 256M 2G 512M 512M 512M 1G 1G [15 6 2] 512M 16G 4G 1G 512M 256M 512M 1G 1G [15 7 4] 512M 4G 1G 1G 512M 512M 512M 2G 1G [15 7 8] 32M 128M 128M 1G 512M 128M 256M 2G 1G
Xoroshiro32+p PractRand Score Table - Run 2017-11-08 00:50:09 +1300 Combination Word0 Word1 Word2 msByte Byte04 Byte2 Byte1 Byte0 msBit ===================================================================================== [ 1 2 8] 32M 64M 128M 16M 32M 256M 128M 4M 1G [ 2 1 11] 64M 128M 64M 128M 64M 512M 512M 128M 1G [ 2 1 15] 32M 16M 16M 512K 512K 512K 1M 1M 1G [ 2 2 7] 32M 64M 64M 32M 256M 64M 32M 8M 1G [ 2 6 15] 64M 256M 256M 1G 64M 4M 1M 1M 1G [ 2 8 7] 256M 16G 8G 2G 1G 512M 1G 128M 1G [ 2 9 9] 64M 512M 512M 256M 4M 16M 32M 128M 1G [ 2 11 3] 128M 1G 512M 256M 8M 8M 8M 8M 1G [ 3 2 6] 64M 8G 4G 2G 512M 512M 1G 128M 1G [ 3 3 10] 128M 256M 256M 512M 64M 64M 64M 128M 1G [ 3 11 2] 64M 256M 64M 64M 4M 4M 4M 4M 1G [ 3 11 14] 512M 8G 4G 2G 128M 128M 512M 128M 1G [ 4 1 9] 512K 512K 1M 256K 32M 32M 512K 512K 1G [ 4 7 15] 64M 512M 512M 1G 32M 256M 256M 128M 1G [ 4 8 5] 512M 8G 16G 1G 512M 512M 512M 128M 1G [ 5 2 6] 512M 8G 4G 2G 1G 1G 512M 64M 1G [ 5 8 4] 128M 256M 128M 8M 512M 512M 512M 128M 1G [ 5 14 12] 256M 2G 2G 1G 256M 1G 512M 128M 1G [ 6 2 3] 128M 8G 8G 2G 2G 1G 1G 128M 1G [ 6 2 5] 512M 4G 8G 128M 64M 32M 32M 32M 1G [ 6 2 11] 512M 128M 8G 1G 512M 128M 1G 128M 1G [ 6 3 15] 512M 16G 8G 2G 1G 64M 32M 128M 1G [ 6 14 11] 512M 16G 8G 1G 128M 512M 512M 128M 1G [ 7 1 8] 16M 64M 128M 16M 32M 32M 16M 8M 1G [ 7 2 2] 32M 128M 512M 64M 512M 64M 32M 8M 1G [ 7 2 10] 64M 512M 1G 128M 256M 128M 64M 16M 1G [ 7 2 14] 512M 16G 8G 1G 1G 256M 64M 64M 1G [ 7 8 2] 512M 16G 8G 1G 128M 512M 512M 128M 1G [ 7 10 10] 512M 2G 2G 128M 64M 64M 128M 128M 1G [ 7 15 8] 256K 1M 512K 128K 512K 256K 256K 64K 1G [ 8 1 7] 16M 64M 64M 16M 32M 32M 16M 4M 1G [ 8 2 1] 16M 64M 64M 32M 32M 64M 32M 8M 1G [ 8 5 13] 64M 128M 128M 64M 8M 16M 512M 32M 1G [ 8 7 15] 4M 8M 8M 2M 2M 4M 4M 2M 1G [ 8 9 13] 512M 8G 4G 1G 512M 512M 128M 128M 1G [ 8 15 7] 256K 512K 512K 128K 256K 256K 256K 128K 1G [ 9 1 4] 8M 8M 32M 2M 8M 256M 8M 2M 128M [ 9 2 14] 512M 16G 8G 2G 256M 128M 64M 64M 1G [ 9 8 14] 512M 16G 8G 128M 256M 128M 128M 64M 1G [ 9 9 2] 512M 2G 2G 64M 1G 512M 512M 128M 1G [10 2 7] 32M 256M 512M 128M 256M 64M 32M 16M 1G [10 3 3] 64M 256M 256M 256M 256M 256M 64M 128M 1G [10 3 11] 512M 2G 4G 1G 1G 512M 128M 128M 1G [10 5 13] 512M 8G 8G 512M 1G 32M 32M 128M 1G [10 7 11] 512M 16G 16G 1G 256M 512M 512M 128M 1G [10 10 7] 256M 1G 2G 256M 16M 64M 256M 128M 1G [11 1 2] 512M 16G 8G 2G 1G 512M 512M 128M 1G [11 1 14] 512M 8G 4G 1G 512M 512M 512M 128M 1G [11 2 6] 256M 256M 8G 1G 512M 128M 512M 128M 1G [11 3 10] 256M 512M 512M 1G 128M 512M 1G 128M 1G [11 7 10] 512M 16G 8G 1G 1G 512M 256M 64M 1G [11 8 12] 512M 16G 8G 1G 512M 512M 512M 128M 1G [11 10 12] 512M 8G 4G 2G 256M 256M 256M 128M 1G [11 11 14] 512M 8G 1G 1G 512M 512M 512M 128M 1G [11 14 6] 512M 8G 4G 1G 512M 512M 512M 128M 1G [12 4 15] 128M 512M 1G 1G 512M 512M 512M 128M 1G [12 8 11] 64M 64M 128M 1G 256M 8M 8M 32M 1G [12 10 11] 256M 8G 4G 2G 256M 32M 32M 32M 1G [12 10 13] 512M 16G 8G 1G 512M 512M 512M 128M 1G [12 14 5] 128M 4G 4G 2G 1G 512M 512M 128M 1G [13 3 14] 512M 4G 1G 2G 512M 512M 512M 128M 1G [13 5 8] 128M 1G 2G 2G 1G 128M 1G 128M 1G [13 5 10] 512M 8G 8G 512M 512M 1G 512M 128M 1G [13 9 8] 512M 2G 8G 1G 128M 256M 512M 128M 1G [13 10 12] 128M 128M 128M 2G 512M 128M 1G 128M 1G [13 11 14] 512M 2G 512M 1G 512M 512M 256M 128M 1G [13 12 14] 512M 1G 512M 1G 512M 512M 512M 128M 1G [13 13 14] 512M 1G 512M 1G 512M 256M 256M 128M 1G [14 1 11] 512M 16G 8G 2G 512M 512M 1G 128M 1G [14 2 7] 512M 16G 8G 64M 1G 128M 32M 32M 1G [14 2 9] 512M 16G 8G 1G 256M 128M 64M 64M 1G [14 3 13] 128M 64M 64M 2G 8M 1M 512K 256K 1G [14 8 9] 512M 16G 8G 1G 512M 1G 512M 128M 1G [14 11 3] 512M 16G 8G 1G 512M 512M 512M 128M 1G [14 11 11] 512M 2G 2G 256M 128M 512M 512M 128M 1G [14 11 13] 128M 16M 8M 1G 256K 256K 256K 256K 1G [14 12 13] 128M 8M 8M 1G 256K 256K 256K 512K 1G [14 13 13] 32M 8M 4M 4M 256K 256K 256K 256K 1G [15 1 2] 256M 512M 512M 256M 32M 256M 256M 64M 1G [15 3 6] 512M 16G 8G 1G 1G 64M 32M 128M 1G [15 4 12] 128M 2G 2G 1G 512M 512M 512M 128M 1G [15 6 2] 512M 16G 8G 1G 512M 512M 256M 128M 1G [15 7 4] 512M 4G 2G 1G 128M 512M 512M 128M 1G [15 7 8] 32M 64M 128M 2G 512M 512M 256M 128M 1G
I don't know what you'd be doing to wasting your time or not waste your time.
But if your trying xoroshiro32plus with a parity bit replacing one bit of output, could you instead have the parity bit act as the carry-in on the addition that produces the output? That might be better, maybe.
The idea of using the parity is a bit of a novelty and it's probably not easy to see in all our previous discussions where the parity comes from. It's the parity of the full 17-bit result after the addition of the two 16-bit halves of the state.
Many thanks for today's results, Evan. I think you have done enough tests.
Here is a summary of the most important scores for the top two triplets with three combinations of sum and parity:
Top Bot Top Bot Triplet Word Byte Byte Bit Bit Iter. PRN[15:0] = ------------------------------------------------------------------ [14,2,7] 512M 1G 512M 1G 1G single [15,3,6] 512M 512M 1G 1G 1G " sum[15:1],parity [14,2,7] 512M 256M 1G 1G 1G double [15,3,6] 1G 1G 2G 1G 1G " ------------------------------------------------------------------ [14,2,7] 512M 256M 512M 1G 1G single [15,3,6] 512M 512M 1G 1G 1G " sum[15:10],sum[1],sum[8:2],sum[9],parity [14,2,7] 512M 512M 512M 1G 1G double [15,3,6] 1G 2G 256M 1G 1G " ------------------------------------------------------------------ [14,2,7] 512M 256M 128M 1G ? single [15,3,6] 512M 512M 128M 1G ? " parity,sum[15:1] [14,2,7] 512M 64M 32M 1G ? double [15,3,6] 512M 1G 128M 1G ? " ------------------------------------------------------------------
Word size is 16 bits. Parity as msb has the worst scores but was a convenient way of doing a parity-only test and the top bit here has been copied to the bottom bit of the other two sets. Swapping bits 1 and 9 has worse scores overall than not swapping which leaves parity as lsb as the best option and here it is again on its own:
Top Bot Top Bot Triplet Word Byte Byte Bit Bit Iter. PRN[15:0] = ------------------------------------------------------------------ [14,2,7] 512M 1G 512M 1G 1G single [15,3,6] 512M 512M 1G 1G 1G " sum[15:1],parity [14,2,7] 512M 256M 1G 1G 1G double [15,3,6] 1G 1G 2G 1G 1G " ------------------------------------------------------------------
XORO32 does a double iteration and concatenates two 16-bit sum & parity results. Treating those as one long PRN, [14,2,7] is better in the top bytes and [15,3,6] better in the bottom bytes but otherwise they score the same.
If the best possible word or byte PRN is wanted from each XORO32 instruction, then [15,3,6] is clearly superior. The word score of 1G is the best since using all 16 bits and the bottom byte score of 2G is the highest we have had.
We have always relied on the PractRand scores and they are saying choose [15,3,6].
If we're going to start down this path we should do a decent amount of it to see what else comes up.
And our output is still {sum[15:1], ^sum[16:0]}, right?
wire [15:0] xoro32z = d[31:16] ^ d[15:0]; // first iteration wire [31:0] xoro32y = { xoro32z[9:0], xoro32z[15:10], {d[0], d[15:1]} ^ {xoro32z[12:0], 3'b0} ^ xoro32z }; wire [15:0] xoro32x = xoro32y[31:16] ^ xoro32y[15:0]; // second iteration wire [31:0] xoro32 = { xoro32x[9:0], xoro32x[15:10], // xoro32 = d result {xoro32y[0], xoro32y[15:1]} ^ {xoro32x[12:0], 3'b0} ^ xoro32x }; wire [16:0] xoro32a = xoro32y[31:16] + xoro32y[15:0]; // first sum wire [16:0] xoro32b = xoro32[31:16] + xoro32[15:0]; // second sum assign xoro32r = { xoro32b[15:1], ^xoro32b, // xoro32r = prng result, next instruction's s value xoro32a[15:1], ^xoro32a };
I think you have done enough tests for the decision about switching to [15,3,6]. Someone should double check my summary, though.
You could try swapping bits 1 and 8/10/11/12/13/14 but swapping bits 1 and 9 made things worse overall.
Yes, yes and yes, I reckon.
I'll run my prog to get some states and sums when seed is $0000_0001.
54658048 51BE8BA2 35E3784F AFD08E7D 629FE601 946714EB 302C36C5 5AF40E3A 71FD53EE 1D5B5B24 87F15D3B B5F7674D DFED8C27 F9C387AA 8977BE03 777DEC19
Do these look correct?
You are somewhat cherry picking. There is other triplets that also look better than either of those two when the bits are shuffled appropriately.
Yes, my first 32 results:
54658048 51BE8BA2 35E3784F AFD08E7D 629FE601 946714EB 302C36C5 5AF40E3A 71FD53EE 1D5B5B24 87F15D3B B5F7674D DFED8C27 F9C387AA 8977BE03 777DEC19 FF7837B4 A62D7B9D E33DBEC7 52C6AD4E 34ED75B5 14D188AA A63DD965 C193B6BB AFF7466C 06037367 8FD6A4C4 48EB868F 3DE8A410 B342C19D 97C72985 394CD5BC
I chose the triplets that performed best in the three sets of tests that you did as listed in my summary post above. However, in my view the most important set is the first one and we need the best triplet for both single and double iterations when bit 0 is parity, i.e. just a single bit change from the original algorithm. I don't like the idea of shuffling bits about after the addition unless that gives better scores, in which case there is probably something wrong with the algorithm. What I am saying needs checking but I'm confident that [15,3,6] scores the best.
Links to test results:
Bit 0 is parity, single iteration
http://forums.parallax.com/discussion/comment/1423900/#Comment_1423900
http://forums.parallax.com/discussion/comment/1423918/#Comment_1423918 (bottom byte added)
http://forums.parallax.com/discussion/comment/1424022/#Comment_1424022 (bit 0 test)
Bit 0 is parity, double iteration
http://forums.parallax.com/discussion/comment/1424903/#Comment_1424903
Bit 0 is parity, bits 1 and 9 swapped, single iteration
http://forums.parallax.com/discussion/comment/1423918/#Comment_1423918
Bit 0 is parity, bits 1 and 9 swapped, double iteration
http://forums.parallax.com/discussion/comment/1425052/#Comment_1425052
Bit 15 is parity, single iteration
http://forums.parallax.com/discussion/comment/1424022/#Comment_1424022
Bit 15 is parity, double iteration
http://forums.parallax.com/discussion/comment/1425053/#Comment_1425053
JUMP_64K = 1A57604D JUMP_128K = 2FD3E1B9 Bit 31 is the first bit to test.
The jump function needs single-stepping of states which XORO32 can no longer do, unless it is modified. The easiest way, avoiding use of CZ opcode bits, would be to move XORO32 to an empty D,S slot with S selecting single or double iterations.
Here are the first 32 * 64K jumps from state 00000001:
64K*n jumps for xoroshiro32+ [15,3,6] n, state 0, 00000001 1, 90B3FECF 2, 86FD9DC1 3, 90C5FFCF 4, EA7B0DC1 5, 432C08B9 6, 01186857 7, AFD000FD 8, 9F5ABCD0 9, BAC6246B 10, 0AAAD5CF 11, 1DDB13D4 12, E603101A 13, A61BF445 14, AF3D09E0 15, E4C59E46 16, 8785CE09 17, B80FDDE9 18, 38C4CEC4 19, 4BF8A52C 20, 8EC2FF96 21, F0BCA253 22, B0DFADBA 23, 0099D500 24, 43282641 25, 9F272A87 26, BB131234 27, 96833C01 28, 89D2AE8F 29, A726D929 30, D641DBF1 31, F71F077F 32, 98569497
EDIT:
Added 64K jumps.