On the above screenshot, the CPU Load graph is a sum of all 16 of the CPU's threads, which is just 6.25% of the vertical each. So you can see the vertical wall jumps up when all 15 iterator tasks are launched together (16 running tasks would have filled all 16 threads and hit 100%) and then the stepping down as they finish one by one.

PS: The three colours (Blue, Red, and Yellow) are, respectively, system tasks, user tasks, and "niced" tasks.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

One thing I realized was kind of neat about this xoroshiro topology, as opposed to a simple LFSR, is that by summing its halves together, it's possible to output a zero, even though its internal state is always non-0. And iterating this topology is just as fast as an LFSR, because there's only some shallow XOR operations. The adder gives it its final high quality output, but is not needed for iteration.

You know, you now have a toolset which you can use to come up with new topologies and prove them in the manageable 32-bit domain.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

My favourite of that lot is {8,2,3}. It conforms nicely and I do like that 64MB score.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

To save trying to deliver all three lots of 1331 files, for a total of 53.6 MB, of the much smaller s12 scoring run, here's a directory listing of the s12 byte scores. They're sorted by file size, suitable as a summary of scores. File sizes are listed in left most column in rounded up KByte.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

The hardware xoroshiro generator will create the next value in its sequence every system clock and the tests you have done show that such sequences are close to random but it is unlikely and perhaps impossible for a cog to read every consecutive value.

A cog that needs a regular flow of random numbers without using its own code will be sampling the xoroshiro output at interval n, where n > 1 probably. How do the tests perform for n > 1, for example 8, 16 or 32? Or 7, 15 or 31? Will some of the 63 bits be less random as a result and if so which ones?

The lsb of the summing component is avoided for that very reason. Not all bits are equal for sure. That'll be why the msb bit-truncated scoring is so much flatter than the other two variants. Though, generally the quality is good according to PractRand. I'd think any poor bit positions would trigger a fail in PractRand.

I've been pondering digging up the TestU01 BigCrush suite but haven't got around to it.

As for regular interval sampling, no idea. I'd be surprised if anything shows up but it's certainly something to test ...

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

I've just done some other iteration runs, namely s15 and s17. s15 took 1.5 minutes while s17 took a surprising 30 minutes using the Ryzen flat out! Even the dynamic clocking settled on full rate for every core for the duration - not a bump to be seen. Larger sizes are going to explode. Whaaa!

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

Chip,
It don't look promising. Here's the modified generator code. Attached is the s12 max iterate findings without any filter. It's across the board rapid terminations in less than 100 120 iterations. EDIT: And the scoring is 100% instant fail on every combination. EDIT2: Lol, we have actually have 3 of the 1331 scores that just scraped passed an instant fail.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

s0 = S[0]
s1 = S[1]
S[0] = s0 rol a ^ s0 ^ s1 ^ s0 shl b ^ s1 shl b
S[1] = s0 rol c ^ s1 rol c

You can imagine all the possibilities of rotates, shifts, and XORs. I imagine the xoroshiro topology economized by using the 's1 ^= s0' operation early. This made it practical for efficient software implementation. For hardware, all that matters is that the net XOR depth is not high. Bits don't even need to be ordered. What kind of quality can be economically achieved without software-constrained thinking?

I threw away the call to next() and almost eliminated the intermediate steps from it. EDIT: When I shrunk it down to just that loop I had to declare s0 and s1 as statics to prevent GCC from optimising that loop into oblivion.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

The compilers are too smart now. Out-thinking them is an increasingly tenuous art.

I think the quality of xoroshiro can be attributed to two things:

1) an LFSR-type iterator
2) an adder used only for output

Those are the take-aways.

The xoroshiro algorithm lies at the intersection of a few critical concepts and what can be practically computed by a CPU. If the core matters were to be optimized for gate count, it would look a lot different. But what would it be?

The hardware xoroshiro generator will create the next value in its sequence every system clock and the tests you have done show that such sequences are close to random but it is unlikely and perhaps impossible for a cog to read every consecutive value.

A cog that needs a regular flow of random numbers without using its own code will be sampling the xoroshiro output at interval n, where n > 1 probably. How do the tests perform for n > 1, for example 8, 16 or 32? Or 7, 15 or 31? Will some of the 63 bits be less random as a result and if so which ones?

If anyone thinks it's worthwhile, a couple of tests with n > 1 should be enough, not too close together, say n = 16 and n = 256.

That would be just me - unless you can handle a bunch of Bash scripts?

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

The hardware xoroshiro generator will create the next value in its sequence every system clock and the tests you have done show that such sequences are close to random but it is unlikely and perhaps impossible for a cog to read every consecutive value.

A cog that needs a regular flow of random numbers without using its own code will be sampling the xoroshiro output at interval n, where n > 1 probably. How do the tests perform for n > 1, for example 8, 16 or 32? Or 7, 15 or 31? Will some of the 63 bits be less random as a result and if so which ones?

If anyone thinks it's worthwhile, a couple of tests with n > 1 should be enough, not too close together, say n = 16 and n = 256.

I think the PRNG test batteries check at bit intervals, so this should be covered, already, by xoroshiro128+.

That would be just me - unless you can handle a bunch of Bash scripts?

I've never done Bash scripts. I don't want you to do any work on Easter, either. I'm just pondering what kind of high-quality 32-bit PRNG may be possible in hardware (without software-constrained thinking).

I've started an s12 scoring run at iteration intervals of 16. It's generating decent output so far, actually is getting some 128 MB scores. Will take another 30 minutes I think ...

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

I don't want you to do any work on Easter, either.

This is my holiday fun. I've not had anything I can quickly sink my teeth into for a very long time.

I'm just pondering what kind of high-quality 32-bit PRNG may be possible in hardware (without software-constrained thinking).

I'm not sure there is any real shortcuts as such. Just packing it down for small space, or chopping it up for power efficiency is about all I can imagine.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

I don't want you to do any work on Easter, either.

This is my holiday fun. I've not had anything I can quickly sink my teeth into for a very long time.

I'm just pondering what kind of high-quality 32-bit PRNG may be possible in hardware (without software-constrained thinking).

I'm not sure there is any real shortcuts as such. Just packing it down for small space, or chopping it up for power efficiency is about all I can imagine.

Think about this matter of rotating and shifting. The important part is that bits are changing places, not that they are necessarily all moving in a certain direction. The direction business comes from what a CPU can efficiently do.

I've never written a Bash script in my life before. I'll credit PractRand documentation for getting me started with pipes and automating things. And Google/Stackexchange for giving the answers on doing stuff with Bash, and C for that matter. It's easier to ask Google than try to find the answers in the manuals.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

Imagine a few different bit remaps, instead of those rotates. That's what those SEUSSF/SEUSSF instructions do, with ~half the bits being NOT'd.

The HDL compilers will do well on optimising fixed value rotates, it'll boil down to a fixed mapping too.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

Oh, now I get it. Effectively perform all ops on a register in a single op and maybe make it even better to boot. EDIT: Of course it's anyone's guess as to what changes would be a good improvement. Best I can do is pure trial and error.

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

## Comments

8,033PS: The three colours (Blue, Red, and Yellow) are, respectively, system tasks, user tasks, and "niced" tasks.

By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,73411,734You know, you now have a toolset which you can use to come up with new topologies and prove them in the manageable 32-bit domain.

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

73A cog that needs a regular flow of random numbers without using its own code will be sampling the xoroshiro output at interval n, where n > 1 probably. How do the tests perform for n > 1, for example 8, 16 or 32? Or 7, 15 or 31? Will some of the 63 bits be less random as a result and if so which ones?

8,033I've been pondering digging up the TestU01 BigCrush suite but haven't got around to it.

As for regular interval sampling, no idea. I'd be surprised if anything shows up but it's certainly something to test ...

By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,734If that ROL+ROL+SHL activity were changed to ROL+ROL+ROL, then the algorithm would become reversible, but would it still be any good?

8,033It don't look promising. Here's the modified generator code. Attached is the s12 max iterate findings without any filter. It's across the board rapid terminations in less than 100 120 iterations. EDIT: And the scoring is 100% instant fail on every combination. EDIT2: Lol, we have actually have 3 of the 1331 scores that just scraped passed an instant fail.

By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,73411,734s1 = S[1]

s1 ^= s0;

s[0] = (rotl( s0, CONSTANT_A ) ^ s1 ^ rotl( s1, CONSTANT_B )) & ACCUM_MASK;

s[1] = rotl(s1, CONSTANT_C);

...Could be rewritten as...

s0 = S[0]

s1 = S[1]

S[0] = s0 rol a ^ s0 ^ s1 ^ s0 shl b ^ s1 shl b

S[1] = s0 rol c ^ s1 rol c

You can imagine all the possibilities of rotates, shifts, and XORs. I imagine the xoroshiro topology economized by using the 's1 ^= s0' operation early. This made it practical for efficient software implementation. For hardware, all that matters is that the net XOR depth is not high. Bits don't even need to be ordered. What kind of quality can be economically achieved without software-constrained thinking?

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,734I think the quality of xoroshiro can be attributed to two things:

1) an LFSR-type iterator

2) an adder used only for output

Those are the take-aways.

The xoroshiro algorithm lies at the intersection of a few critical concepts and what can be practically computed by a CPU. If the core matters were to be optimized for gate count, it would look a lot different. But what would it be?

73If anyone thinks it's worthwhile, a couple of tests with n > 1 should be enough, not too close together, say n = 16 and n = 256.

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,734I think the PRNG test batteries check at bit intervals, so this should be covered, already, by xoroshiro128+.

11,734I've never done Bash scripts. I don't want you to do any work on Easter, either. I'm just pondering what kind of high-quality 32-bit PRNG may be possible in hardware (without software-constrained thinking).

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033I'm not sure there is any real shortcuts as such. Just packing it down for small space, or chopping it up for power efficiency is about all I can imagine.

By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,734Think about this matter of rotating and shifting. The important part is that bits are changing places, not that they are necessarily all moving in a certain direction. The direction business comes from what a CPU can efficiently do.

11,7348,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

8,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."

11,73411,7348,033By doing that, we can more accurately measure their mass, and determine whether

scientists have systematically been underestimating how much matter they contain."