I wonder how Heater's code compiles down, as in does it create tiny state machines?
As it happens Quartus can show the logic schematic that it turns your code into. For example the "hello" module above ends up looking like this:
There is only one state machine in the design I posted. It's the one I created with a "case" statement and state variable in the UART. Again Quartus can display all the state machines in your design as the typical circle and line diagrams.
Now...you got me thinking. How come there is only one state machine in my design? I have another case statement with state variable in the hello module. So I looked at the code and saw that instead of
state <= 1;
Like I had in the UART it had:
state <= state + 1;
Hmm...that "+" is redundant. So I changed it to use constants.
BINGO! Turns out that little "+" was making the logic a lot bigger than it need be. The new version generates this schematic:
Just click on the clock-looking icon up at the top of the Quartus window. Then, double-click "Report Timing" and hit enter to get it started. It will show you the slowest paths. It's pretty self-explanatory.
Ah, that clock icon. I found that button yesterday. But on this Surface Pro screen what comes up is an illegible mess so I gave up. I tried it again and managed to get one violation out of it:
I have seen Chip's awesome screen set up, err, face to screen, as it were. I believe it got even more awesome since then.
I have a big Samsung monitor, but this Surface Pro really needs a 4K display. Or two Perhaps a birthday treat for myself.
If you want to get into Verilog as a beginner I seriously suggest leaving the DE0 in the project bin and skipping Quartus for a while.
Just write some Verilog snippets from tutorials and examples using your favorite editor. Vim has great Verilog syntax highlighting. Then run it under the Icarus Verilog simulator. I run it under the Linux subsystem for Windows.
Must have been some work mounting those to the wall.
I made two brackets from 1" angle iron and bolted the monitors together with them. Now, they just sit on the table like a hinged 3-panel picture frame, with the outer monitors turned inward 15 degrees. It sits quite stably.
Portraits are nice, but hard to see whole width of instruction spreadsheet on one of those, right?
I think best is landscape in the middle with somewhere between 4:3 and 16:9 ratio and then long portraits on the sides...
I forget exactly what my optometrist told me, but it was something like the usual chair to screen distance is easy on the eyes, they don't have to work hard to focus and arm's reach distance...
Can anyone point to a high-quality 32-bit PRNG that we could implement in Spin2? It would be nice to have a good, repeatable sequence.
Hey, some of you guys could even develop one, since you've familiarized yourselves with running those DieHarder tests. What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.
Chip
A while back I made a test program to exercise the new PRNG mechanism.
The program generates random mazes and has a graphical representation of the maze generation.
The completed maze is output in text format over serial.
It can run on single cog FPGA builds without video too.
Larger maze generation is quite hypnotic. :zombie:
A fun exercise and shows the PRNG runs very nicely.
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
I I I I I I I I |
+ +---+ + +---+ + + + + +---+ +---+ + + +
I I I I I I I |
+ + +---+ +---+---+---+---+---+---+ +---+ +---+ + +
I I I I I I I I I |
+---+ + + + + + + +---+---+---+ +---+ + + +
I I I I I I I I I I I |
+ + +---+---+---+ + +---+ + +---+---+ +---+ + +
I I I I I I I |
+---+ +---+---+---+---+ + +---+---+ +---+ + +---+ +
I I I I I I I I I |
+ +---+---+ + + + +---+ + +---+---+---+ + + +
I I I I I I I I |
+---+---+ + +---+ + +---+---+---+ + +---+---+ + +
I I I I I I I I I |
+ +---+---+---+ + +---+ +---+ + +---+---+ +---+ +
I I I I I I I I |
+---+---+ + +---+---+ +---+---+---+ + + +---+ + +
I I I I I I I I I |
+ +---+---+---+ + + + + +---+---+---+ + +---+ +
I I I I I I I |
+---+---+---+ + +---+---+---+---+---+---+ +---+---+ + +
I I I I I I I I |
+ + + +---+---+ + +---+ + + +---+ + +---+ +
I I I I I I I I I |
+---+ +---+---+---+ + + +---+ +---+ +---+---+ +---+
I I I I I I I I |
+ +---+ +---+ +---+ + + +---+ +---+---+ +---+ +
I I I I I I I I |
+ +---+---+ + + + + +---+ +---+---+---+---+ + +
I I I I I I |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
... What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.
Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.
If 31 bits is suitable then I can give some details when I get home tonight.
Chip
A while back I made a test program to exercise the new PRNG mechanism.
The program generates random mazes and has a graphical representation of the maze generation.
The completed maze is output in text format over serial.
It can run on single cog FPGA builds without video too.
Larger maze generation is quite hypnotic. :zombie:
A fun exercise and shows the PRNG runs very nicely.
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
I I I I I I I I |
+ +---+ + +---+ + + + + +---+ +---+ + + +
I I I I I I I |
+ + +---+ +---+---+---+---+---+---+ +---+ +---+ + +
I I I I I I I I I |
+---+ + + + + + + +---+---+---+ +---+ + + +
I I I I I I I I I I I |
+ + +---+---+---+ + +---+ + +---+---+ +---+ + +
I I I I I I I |
+---+ +---+---+---+---+ + +---+---+ +---+ + +---+ +
I I I I I I I I I |
+ +---+---+ + + + +---+ + +---+---+---+ + + +
I I I I I I I I |
+---+---+ + +---+ + +---+---+---+ + +---+---+ + +
I I I I I I I I I |
+ +---+---+---+ + +---+ +---+ + +---+---+ +---+ +
I I I I I I I I |
+---+---+ + +---+---+ +---+---+---+ + + +---+ + +
I I I I I I I I I |
+ +---+---+---+ + + + + +---+---+---+ + +---+ +
I I I I I I I |
+---+---+---+ + +---+---+---+---+---+---+ +---+---+ + +
I I I I I I I I |
+ + + +---+---+ + +---+ + + +---+ + +---+ +
I I I I I I I I I |
+---+ +---+---+---+ + + +---+ +---+ +---+---+ +---+
I I I I I I I I |
+ +---+ +---+ +---+ + + +---+ +---+---+ +---+ +
I I I I I I I I |
+ +---+---+ + + + + +---+ +---+---+---+---+ + +
I I I I I I |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Neat! If you used a simple LFSR, would the maze develop more of a cyclical appearance?
... What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.
Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.
If 31 bits is suitable then I can give some details when I get home tonight.
The point is to have the PRNG work on only 32 bits of data, a long. We could get 32 bits out by running it two or three times.
PS: I've had comparatively good results with 32 bit testing of the Xoroshiro algorithm in my experiments. Obviously, though, the smaller the word size the sooner it'll fail the randomness testing.
The scaling is approximately linear to what I guess is repeat length up to about 28 bit result size, then there is some stronger signal that starts impacting the quality of randomness as the size increases from there, so you don't get as big a benefit from increasing the result size.
Or, the other way to look at it is going lower than 28 bits will dramatically reduce quality.
Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.
If 31 bits is suitable then I can give some details when I get home tonight.
The point is to have the PRNG work on only 32 bits of data, a long. We could get 32 bits out by running it two or three times.
Each iteration of the full 128 bits is actually working on two 64 bit registers then summing those together. There isn't any 128 bit register.
So a xoroshiro64+ would be in two 32 bit registers.
The values 55, 14, and 36 need to be changed. The solution space is 16x16x16, which is only 4096 possibilities. Which combo of three 0..15 values is optimal? That is the question.
Remember, based on the way the original algorithm worked, that the LSB of each sum is just an LFSR bit, and should not be used as output. So, this thing should give 15 bits of quality output per iteration.
I just scaled down from 64 bits for those three constants but with odd/even rule for the two larger ones. Ie: For the original values I had (55 | 1) for the first constant. And (36 & -2) for the last constant. This entirely eliminated an instant fail condition on seemingly random test combination.
I just made a little Prop2 PASM program to run on the Prop123_A9 that tries to make a xoroshiro32+. I kept the rotate/shift values proportional to xoroshiro128+. It looks pretty random on the LEDs:
dat org
bmask dirb,#15
loop getword s0,state,#0 'extract s0 and s1 words from state long
getword s1,state,#1
mov outb,s0 'output sum to LEDs (LSB is low-quality)
add outb,s1
setword s0,s0,#1 'copy bottom words to top words for ROL
setword s1,s1,#1
xor s1,s0 's1 ^= s0
rol s0,#14 's0 = rol(s0,14) ^ s1 ^ (s1 << 3)
xor s0,s1
mov x,s1
shl x,#3
xor s0,x
rol s1,#9 's1 = rol(s1, 9)
setword state,s0,#0 'pack s0 and s1 back into state long
setword state,s1,#1
waitx ##80_000_000/10 'wait 1/10th second, then loop
jmp #loop
state long 1 'seed {s1,s0} with 1
s0 res 1
s1 res 1
x res 1
But I'm looking for something that only has 32 bits of state, so that we can have a postfix math operator to iterate a long variable. In software, it's sometimes important to have a simple, but repeatable PRNG.
That's pretty good, but I don't know what it does if it gets into one of those missing states from a seed of 1. I wonder, does the xoroshiro128+ get all the way to 'power(2, 128) - 1' iterations before it loops? Or, can I find another shift set that will get to $FFFF_FFFF iterations before the original seed of 1 reappears?
Yes, a repeatable PRNG is very useful sometimes. JKISS32 is a repeatable PRNG just like xoroshiro128+ but smaller.
PUB random | t
y ^= (y << 5)
y ^= (y >> 7)
y ^= (y << 22)
t := z + w + c
z := w
c := (t < 0) & 1
w := t & 2147483647
x += 1411392427
return (x + y + w)
(I just mixed it up with real random in the object I liked to so as not need to use a COG for getting what look like real random numbers)
Ah, yes, the postfix math operator my be problematic.
I just found a page that says that the xoroshiro128+ period is, indeed, power(2, 128) - 1. So, I should be able to find a shift set that gets to power(2, 31) - 1.
int RANDOM () {
static unsigned long register; /*Register must be unsigned so right
shift works properly.*/
/*Register should be initialized with some random value.*/
register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/
^ (register >> 6) /*XOR it with the seventh bit*/
^ (register >> 4) /*XOR it with the fifth bit*/
^ (register >> 2) /*XOR it with the third bit*/
^ (register >> 1) /*XOR it with the second bit*/
^ register) /*and XOR it with the first bit.*/
& 0x0000001) /*Strip all the other bits off and*/
<<31) /*move it back to the 32nd bit.*/
| (register >> 1); /*Or with the register shifted right.*/
return register & 0x00000001; /*Return the first bit.*/
}
As he says:
"One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don't fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function."
Comments
You are a genius. I love you. See below:
As it happens Quartus can show the logic schematic that it turns your code into. For example the "hello" module above ends up looking like this:
There is only one state machine in the design I posted. It's the one I created with a "case" statement and state variable in the UART. Again Quartus can display all the state machines in your design as the typical circle and line diagrams.
Now...you got me thinking. How come there is only one state machine in my design? I have another case statement with state variable in the hello module. So I looked at the code and saw that instead of
state <= 1;
Like I had in the UART it had:
state <= state + 1;
Hmm...that "+" is redundant. So I changed it to use constants.
BINGO! Turns out that little "+" was making the logic a lot bigger than it need be. The new version generates this schematic:
As you see it has a lot less in it!
Thanks evanh.
-0.765 hello:h1|uartTx:t1|bitTimer[3] hello:h1|uartTx:t1|shifter[0] CLOCK_50 CLOCK_50 1.000 -0.038 1.714
Seems to not like my bitTimer.
I need a bigger screen for this !
Your success with Quartus has inspired me to give it another try. Now where did I put that DE0..
I have a big Samsung monitor, but this Surface Pro really needs a 4K display. Or two Perhaps a birthday treat for myself.
If you want to get into Verilog as a beginner I seriously suggest leaving the DE0 in the project bin and skipping Quartus for a while.
Just write some Verilog snippets from tutorials and examples using your favorite editor. Vim has great Verilog syntax highlighting. Then run it under the Icarus Verilog simulator. I run it under the Linux subsystem for Windows.
http://iverilog.icarus.com/
It's available as a package for the Ubunto that is in the Linux subsystem.
That makes for a very fast edit, run turn around time.
When you have something that works well under Icarus then it's time to turn to the horrible slow Quartus and dig out the nano.
The P2 engine room:
-Phil
I made two brackets from 1" angle iron and bolted the monitors together with them. Now, they just sit on the table like a hinged 3-panel picture frame, with the outer monitors turned inward 15 degrees. It sits quite stably.
I think best is landscape in the middle with somewhere between 4:3 and 16:9 ratio and then long portraits on the sides...
I forget exactly what my optometrist told me, but it was something like the usual chair to screen distance is easy on the eyes, they don't have to work hard to focus and arm's reach distance...
Did you try simulation runs of P1V ? - curious to get some equivalent clock speed numbers for Verilog Simulators.
Hey, some of you guys could even develop one, since you've familiarized yourselves with running those DieHarder tests. What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.
A while back I made a test program to exercise the new PRNG mechanism.
The program generates random mazes and has a graphical representation of the maze generation.
The completed maze is output in text format over serial.
It can run on single cog FPGA builds without video too.
Larger maze generation is quite hypnotic. :zombie:
A fun exercise and shows the PRNG runs very nicely.
If 31 bits is suitable then I can give some details when I get home tonight.
Neat! If you used a simple LFSR, would the maze develop more of a cyclical appearance?
The point is to have the PRNG work on only 32 bits of data, a long. We could get 32 bits out by running it two or three times.
The scaling is approximately linear to what I guess is repeat length up to about 28 bit result size, then there is some stronger signal that starts impacting the quality of randomness as the size increases from there, so you don't get as big a benefit from increasing the result size.
Or, the other way to look at it is going lower than 28 bits will dramatically reduce quality.
So a xoroshiro64+ would be in two 32 bit registers.
PS: It'd be terrible at this size! Choose something else.
Yes. One long, only. We can iterate it a few times to get the next result.
The values 55, 14, and 36 need to be changed. The solution space is 16x16x16, which is only 4096 possibilities. Which combo of three 0..15 values is optimal? That is the question.
Remember, based on the way the original algorithm worked, that the LSB of each sum is just an LFSR bit, and should not be used as output. So, this thing should give 15 bits of quality output per iteration.
I have a Spin implementation here: http://forums.parallax.com/discussion/136975/jkiss32-high-quality-prng-with-realrandom-seed-save-a-cog
JKISS32 is described in this paper: "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" of May 7th 2010.
http://www.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf
But I'm looking for something that only has 32 bits of state, so that we can have a postfix math operator to iterate a long variable. In software, it's sometimes important to have a simple, but repeatable PRNG.
I effectively made this and it ran for $7FFF_FFFF iterations before it repeated the seed value of 1:
That's pretty good, but I don't know what it does if it gets into one of those missing states from a seed of 1. I wonder, does the xoroshiro128+ get all the way to 'power(2, 128) - 1' iterations before it loops? Or, can I find another shift set that will get to $FFFF_FFFF iterations before the original seed of 1 reappears?
Ah, yes, the postfix math operator my be problematic.
https://www.schneier.com/academic/archives/1994/09/pseudo-random_sequen.html
Period 2^32 - 1
As he says:
"One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don't fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function."