Hmm, here's one example from the spreadsheet documentation:

EEEE 1001111 111 DDDDDDDDD 000000111 RGBEXP D Expand 5:6:5 RGB value in S[15:0] into 8:8:8 value in D[31:8]. D = {S[15:11,15:13], S[10:5,10:9], S[4:0,4:2], 8'b0}.

How is the S[15:0] addressing specified?

"We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
By doing that, we can more accurately measure their mass, and determine whether
scientists have systematically been underestimating how much matter they contain."

Chip,
Totally different matter - I was just reading the original Xoroshiro128+ source again and noted a point about how compilers will optimise the convoluted shifts and masks into a rotate instruction where they can. I knew that wasn't much help for my code, since the word size is definable and therefore often won't fit the target processor ... then it dawned on me that that's exactly what a rotate instruction should be able to handle!

Hmm, I'm pondering the idea of adding a definable mask to the output of the ALU. Have a hidden register that holds the mask.

Its impact would be wide-ranging and the details will take time to sort out - something to think about for the Prop3, I guess.

What exactly are you suggesting here?

There are two forms of Rotate in MCUs: one rotates by a single step, and the other takes the rotate count as a parameter.
It would be practical to modify the first form to have a reach, allowing a ROTATE of any element width from 2 to 32 bits; that would have immediate use for rotating 8b and 16b elements.
However, I'm not sure a Rotate with two params, one for rotate-count and another for rotate-reach, would be practical; it seems quite logic hungry.
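To make the "reach" idea concrete, here is a minimal Python sketch of a rotate confined to the low `reach` bits of a 32-bit value. The function names, and the choice to leave the bits above the reach untouched, are assumptions made for illustration; nothing here describes an actual P2 instruction.

```python
def rotl_reach(value: int, count: int, reach: int, total: int = 32) -> int:
    """Rotate the low `reach` bits of `value` left by `count`.

    Bits above the reach are passed through unchanged (an assumption;
    the thread does not say what should happen to them).
    """
    if not 2 <= reach <= total:
        raise ValueError("reach must be between 2 and total bits")
    mask = (1 << reach) - 1
    count %= reach                      # a full turn is a no-op
    field = value & mask
    rotated = ((field << count) | (field >> (reach - count))) & mask
    upper = value & ((1 << total) - 1) & ~mask
    return upper | rotated


def rotl1_reach(value: int, reach: int) -> int:
    """Single-step form: rotate the low `reach` bits left by one."""
    return rotl_reach(value, 1, reach)


if __name__ == "__main__":
    print(hex(rotl_reach(0x81, 1, 8)))      # 8-bit element
    print(hex(rotl_reach(0x8001, 4, 16)))   # 16-bit element
```

Repeating the single-step form `count` times gives the same result as the two-parameter form, which is the trade-off being discussed: less logic, more cycles.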

The rotate logic probably would be a bit hideous, haha.

Don't worry, I was just musing on record.

OK.
The single-bit form could be useful (and compact in logic) if combined with REP.
It trades off a little speed for logic savings: the code stays compact and covers any reach and any count, for those cases that require it.
I think REP can manage multiple lines, for > 32b, N shifts?

Hmm, here's one example from the spreadsheet documentation:

EEEE 1001111 111 DDDDDDDDD 000000111 RGBEXP D Expand 5:6:5 RGB value in S[15:0] into 8:8:8 value in D[31:8]. D = {S[15:11,15:13], S[10:5,10:9], S[4:0,4:2], 8'b0}.

How is the S[15:0] addressing specified?

The RGB value to be expanded is in D[15:0] not S[15:0].
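For reference, a minimal Python sketch of the bit replication that the documentation line describes, reading the 5:6:5 value from D[15:0] as corrected above. The function name `rgbexp` is only a label for this sketch; it models the formula as quoted, not verified hardware behaviour.

```python
def rgbexp(d: int) -> int:
    """Expand a 5:6:5 RGB value in d[15:0] to 8:8:8 in bits [31:8].

    Each channel is widened by appending its own top bits, i.e.
    {R[4:0], R[4:2]}, {G[5:0], G[5:4]}, {B[4:0], B[4:2]}, then 8 zero bits.
    """
    r5 = (d >> 11) & 0x1F
    g6 = (d >> 5) & 0x3F
    b5 = d & 0x1F
    r8 = (r5 << 3) | (r5 >> 2)   # S[15:11], S[15:13]
    g8 = (g6 << 2) | (g6 >> 4)   # S[10:5],  S[10:9]
    b8 = (b5 << 3) | (b5 >> 2)   # S[4:0],   S[4:2]
    return (r8 << 24) | (g8 << 16) | (b8 << 8)


if __name__ == "__main__":
    print(hex(rgbexp(0xFFFF)))   # full white stays full scale
    print(hex(rgbexp(0x8410)))   # mid grey
```

Replicating the top bits into the low bits is what lets full-scale 5-bit and 6-bit values map to a full-scale 0xFF rather than 0xF8 or 0xFC.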

Ah, thanks Oz, so the docs are not quite right. That means they can go with the single-operand instructions, freeing that slot for something else that's needed.

And vanishing SFUNC from the list is a nice bonus too.

I've been running some more tests and not doing so well ... I sat down and went over the numbers and worked out that, give or take quite a large margin for case variations, checking every combination takes about 32x longer for each one-bit increase in word size. So the difference in time taken between an s16 full combination search and an s32 full combination search is a multiplier of 32^16 = 1.2x10^24 (a quadrillion on the long scale, a septillion on the short scale)!

It's impractical to use 100% brute force beyond about 20 bits word size.

This issue applies heavily to the max-iteration search. I've tried to reduce the burden by limiting the combination ranges but it's not anywhere near enough. The innermost loop is 4x per bit as a starting point.
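The arithmetic behind that estimate is easy to check. This small Python sketch takes the 32x-per-bit figure from the post as given (it is the poster's rough estimate, not a measured constant); the function name is made up for the example.

```python
PER_BIT_MULTIPLIER = 32  # rough figure quoted in the post


def search_time_multiplier(from_bits: int, to_bits: int,
                           per_bit: int = PER_BIT_MULTIPLIER) -> int:
    """How many times longer a full search takes when the word size grows."""
    return per_bit ** (to_bits - from_bits)


if __name__ == "__main__":
    m = search_time_multiplier(16, 32)
    print(m == 2 ** 80)    # 32^16 is exactly 2^80
    print(f"{m:.3e}")      # 1.209e+24
```

Even going from s16 to s20 is a factor of 32^4, a bit over a million, which fits the remark that brute force stops being practical around 20 bits.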

It seemed to me that we were both able to discover all max-length shift settings for xoroshiro32+, and then later you tested them all out for quality and determined that a single one was best, right?

Yep, that's all good and sorted - I call that a word size of 16. I'm just trying to make these tools useful up to a word size of 32 bits (Xoroshiro64+) at least.

dat     org
        bmask   dirb,#15                'drive LEDs
loop    getword outb,state,#1           'output sum to LEDs (LSB is low-quality)
        add     outb,state
        waitx   ##80_000_000/10
        xoro32  state                   'iterate xoroshiro32+
        jmp     #loop
state   long    1                       'seed {s1,s0} with 1

I checked it for max-run-length and it certainly looks random. This is the 14,2,7 set.

That's the whole iterator, now realized in the XORO32 instruction. It uses any cog register to track the 32-bit PRNG state. Evanh found that the ROL/SHL settings used in this implementation cause the PRNG to pass all random quality tests out to 1GB, whether from using the top bit, the top byte, or the top 15 bits of the sum of the two 16-bit fields that comprise the state. This is golden.

This all means that you can now have high-quality seedable and repeatable pseudo-random number generators in PASM, without having to implement any algorithm. This was modeled after the best PRNG topology known on the interwebs, and the very best settings were determined through a lot of testing. Thanks to Ahle, Evanh, and whoever came up with the xoroshiro128+ algorithm!
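For anyone wanting to follow along off-chip, here is a minimal Python sketch of one iteration. It assumes the published xoroshiro128+ structure scaled down to two 16-bit halves, that the "14,2,7" set maps to the constants (a, b, c) in that order, and that the state is packed as {s1, s0} as in the seed comment above. Whether this matches the XORO32 hardware bit for bit is not confirmed by the thread.

```python
MASK16 = 0xFFFF


def rotl16(x: int, k: int) -> int:
    """Rotate a 16-bit value left by k (0 < k < 16)."""
    return ((x << k) | (x >> (16 - k))) & MASK16


def xoroshiro32plus_step(state: int, a: int = 14, b: int = 2, c: int = 7):
    """Return (new_state, sum16) for a packed 32-bit state {s1, s0}."""
    s0 = state & MASK16
    s1 = (state >> 16) & MASK16
    total = (s0 + s1) & MASK16          # the "+" output, taken before iterating
    s1 ^= s0
    s0 = rotl16(s0, a) ^ s1 ^ ((s1 << b) & MASK16)
    s1 = rotl16(s1, c)
    return (s1 << 16) | s0, total


if __name__ == "__main__":
    state = 1                            # same seed as the PASM example
    for _ in range(4):
        state, total = xoroshiro32plus_step(state)
        # top bit, top byte and top 15 bits of the sum, as tested in the thread
        print(f"{state:08x} {total >> 15} {total >> 8:02x} {total >> 1:04x}")
```

The state must never be seeded with zero, since the all-zero state maps to itself.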

So that's just another instruction now, as in it's executed through the ALU using D source and D result. It's quite neat to think how compact that becomes.

PS: I've been offline for a week or so. Had the flu - couldn't sleep for a couple of days, then managed to sleep with an eyelash, or something in me eye. Damaged my eye enough that I couldn't sleep another night, went to doctor, got cleaned up but couldn't bear even looking at a display. Just slept for days.

All good now though.

I may as well dump out all the scores I've generated myself I guess ... score tables for word sizes 12 to 20 are attached. s20 took a couple of days on a Ryzen 8-core running at 3800MHz on all cores.

Next step is to look into using a GPU for the full-period candidate searches ... The CPU takes an enormous time when the word size is notched up. With respect to the word size x, the per-combination run time follows a 4^x exponential law (not a square law), and the number of combinations to search follows a cube law. The overall candidate search time is the product of the two.
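A toy version of such a search shows where both laws come from. This Python sketch is not the attached tooling: it assumes the same scaled-down xoroshiro+ structure, tries every (a, b, c) from 1 to w-1 (roughly w^3 combinations, the cube law), and walks each cycle from a fixed seed, which takes up to 2^(2w) - 1 = 4^w - 1 steps (the exponential law). All names are hypothetical.

```python
def cycle_length(w: int, a: int, b: int, c: int) -> int:
    """Steps needed to return to the seed state for word size w."""
    mask = (1 << w) - 1

    def rotl(x: int, k: int) -> int:
        return ((x << k) | (x >> (w - k))) & mask

    s0, s1 = 1, 0
    start = (s0, s1)
    n = 0
    while True:                      # the state update is a bijection, so this ends
        s1 ^= s0
        s0 = rotl(s0, a) ^ s1 ^ ((s1 << b) & mask)
        s1 = rotl(s1, c)
        n += 1
        if (s0, s1) == start:
            return n


def full_period_candidates(w: int) -> list:
    """All (a, b, c) whose cycle covers every non-zero state."""
    full = (1 << (2 * w)) - 1
    return [(a, b, c)
            for a in range(1, w)
            for b in range(1, w)
            for c in range(1, w)
            if cycle_length(w, a, b, c) == full]


if __name__ == "__main__":
    for w in (4, 5, 6):
        print(w, full_period_candidates(w))
```

Small word sizes finish instantly, but each extra bit quadruples the longest possible walk, which is why the search gets out of hand so quickly.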

Yeah, that was a tad short-hand labelling:
- "Word" is the full summed word size minus the lsb, for s16 (Xoroshiro32+) that is the top 15 bits [15:1].
- "Byte3" is the most significant 8 bits of the summed word, for s16 that is bits [15:8].
- "Byte2" is half way down the summed word, for s16 that is bits [11:4].
- "Byte1" is always bits [8:1] for all word sizes.
- "Bit" is msb.
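The column definitions above can be written down as bit-field extractions. In this Python sketch the s16 positions come straight from the list; the general formula for "Byte2" at other word sizes (centred on the middle of the word) is an assumption, and `score_fields` is a made-up name.

```python
def score_fields(total: int, w: int = 16) -> dict:
    """Slice the summed word into the fields used as score-table columns."""

    def bits(hi: int, lo: int) -> int:
        return (total >> lo) & ((1 << (hi - lo + 1)) - 1)

    mid_lo = (w - 8) // 2            # gives [11:4] for w = 16 (assumed rule)
    return {
        "Word":  bits(w - 1, 1),     # everything except the lsb
        "Byte3": bits(w - 1, w - 8), # most significant 8 bits
        "Byte2": bits(mid_lo + 7, mid_lo),
        "Byte1": bits(8, 1),         # always [8:1]
        "Bit":   bits(w - 1, w - 1), # msb
    }


if __name__ == "__main__":
    for name, value in score_fields(0xABCD).items():
        print(f"{name:5s} {value:#06x}")
```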

PS: You'll note I've included Xoroshiro's predecessor, Xorshift, for comparison. It performs notably worse on the PractRand scores even though the author indicates it's still a perfectly okay algorithm. To me that says that Xoroshiro is exceptionally good for its compactness and speed.

Yep, I'd just answered that as you were typing the question.

Thanks Doug,
The mathematicians who do the real work formulating these algorithms must work at an entirely different level though. Somehow, for a period of 2^128-1, the author of Xoroshiro must have picked full-period candidates without ever having empirically tested the period on even one combination.

And, just to emphasise that a little more, there are algorithms with virtually non-repeating periods of 2^(some ten digit power) or something like that.

## Comments

Oh, yeah, that would take forever!

"In Search of Randomness"

Wow! I'm glad you're back now.

Attached are all the sources/scripts, and here's a sample (s16) score table that's been auto-generated:

Could you please explain, again maybe, what the non-combination columns mean?

Was it a predecessor to xoroshiro32+?

And we know we've got good numbers in the P2. The best, probably.

Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another
SKYPE = acuity_doug
Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin

It's crazy. Glad they are around, and we can use their work.

It's amazing what these number theory guys can deduce.