Random/LFSR on P2

Heater. · 2017-03-22 12:58

Post deleted:

Heater. · 2017-03-22 12:59

evanh,

You are a genius. I love you. See below:

I wonder how Heater's code compiles down, as in does it create tiny state machines?

As it happens Quartus can show the logic schematic that it turns your code into. For example the "hello" module above ends up looking like this:

There is only one state machine in the design I posted. It's the one I created with a "case" statement and state variable in the UART. Again Quartus can display all the state machines in your design as the typical circle and line diagrams.

Now...you got me thinking. How come there is only one state machine in my design? I have another case statement with state variable in the hello module. So I looked at the code and saw that instead of

state <= 1;

Like I had in the UART it had:

state <= state + 1;

Hmm...that "+" is redundant. So I changed it to use constants.

BINGO! Turns out that little "+" was making the logic a lot bigger than it need be. The new version generates this schematic:

As you see it has a lot less in it!

Thanks evanh.

Heater. · 2017-03-22 13:23

Chip,

Just click on the clock-looking icon up at the top of the Quartus window. Then, double-click "Report Timing" and hit enter to get it started. It will show you the slowest paths. It's pretty self-explanatory.

Ah, that clock icon. I found that button yesterday. But on this Surface Pro screen what comes up is an illegible mess so I gave up. I tried it again and managed to get one violation out of it:

-0.765 hello:h1|uartTx:t1|bitTimer[3] hello:h1|uartTx:t1|shifter[0] CLOCK_50 CLOCK_50 1.000 -0.038 1.714

Seems to not like my bitTimer.

I need a bigger screen for this !

Tor · 2017-03-22 14:26

Chip has some suggestions about useful screens.. you may need a bigger office though!

Your success with Quartus has inspired me to give it another try. Now where did I put that DE0..

Heater. · 2017-03-22 14:55

I have seen Chip's awesome screen set up, err, face to screen, as it were. I believe it got even more awesome since then.

I have a big Samsung monitor, but this Surface Pro really needs a 4K display. Or two

Perhaps a birthday treat for myself.

If you want to get into Verilog as a beginner I seriously suggest leaving the DE0 in the project bin and skipping Quartus for a while.

Just write some Verilog snippets from tutorials and examples using your favorite editor. Vim has great Verilog syntax highlighting. Then run it under the Icarus Verilog simulator. I run it under the Linux subsystem for Windows.

http://iverilog.icarus.com/

It's available as a package for the Ubunto that is in the Linux subsystem.

That makes for a very fast edit, run turn around time.

When you have something that works well under Icarus then it's time to turn to the horrible slow Quartus and dig out the nano.

The P2 engine room:

cgracey · 2017-03-23 12:12

Here's the improved setup. You can see everything with this:

Heater. · 2017-03-23 12:34

Good grief!

Phil Pilgrim (PhiPi) · 2017-03-23 18:57

Wow, Chip, you must have really good eyesight! Even with my multi-focal glasses, I doubt I'd be able to read half of that display while seated.

-Phil

evanh · 2017-03-23 21:17

Must have been some work mounting those to the wall.

cgracey · 2017-03-23 23:47

evanh wrote: »

Must have been some work mounting those to the wall.

I made two brackets from 1" angle iron and bolted the monitors together with them. Now, they just sit on the table like a hinged 3-panel picture frame, with the outer monitors turned inward 15 degrees. It sits quite stably.

Rayman · 2017-03-24 00:25

Portraits are nice, but hard to see whole width of instruction spreadsheet on one of those, right?

I think best is landscape in the middle with somewhere between 4:3 and 16:9 ratio and then long portraits on the sides...

I forget exactly what my optometrist told me, but it was something like the usual chair to screen distance is easy on the eyes, they don't have to work hard to focus and arm's reach distance...

jmg · 2017-03-24 23:23

Heater. wrote: »

Hacking around with Verilog with a simulator like Icarus has the same edit, run, edit, run feel of writing JS for node.js.

Did you try simulation runs of P1V ? - curious to get some equivalent clock speed numbers for Verilog Simulators.

cgracey · 2017-04-11 02:35

Can anyone point to a high-quality 32-bit PRNG that we could implement in Spin2? It would be nice to have a good, repeatable sequence.

Hey, some of you guys could even develop one, since you've familiarized yourselves with running those DieHarder tests. What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.

ozpropdev · 2017-04-11 03:53

Chip
A while back I made a test program to exercise the new PRNG mechanism.
The program generates random mazes and has a graphical representation of the maze generation.
The completed maze is output in text format over serial.
It can run on single cog FPGA builds without video too.
Larger maze generation is quite hypnotic. :zombie:
A fun exercise and shows the PRNG runs very nicely.



+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
I       I       I       I   I       I               I       I   |
+   +---+   +   +---+   +   +   +   +   +---+   +---+   +   +   +
I   I       I                   I       I       I       I       |
+   +   +---+   +---+---+---+---+---+---+   +---+   +---+   +   +
I           I   I   I       I               I       I   I   I   |
+---+   +   +   +   +   +   +   +---+---+---+   +---+   +   +   +
I   I   I   I       I   I   I       I       I           I   I   |
+   +   +---+---+---+   +   +---+   +   +---+---+   +---+   +   +
I                       I   I       I           I   I       I   |
+---+   +---+---+---+---+   +   +---+---+   +---+   +   +---+   +
I       I       I   I       I       I               I   I   I   |
+   +---+---+   +   +   +   +---+   +   +---+---+---+   +   +   +
I           I   I       I       I       I   I               I   |
+---+---+   +   +---+   +   +---+---+---+   +   +---+---+   +   +
I       I           I   I   I           I       I       I   I   |
+   +---+---+---+   +   +---+   +---+   +   +---+---+   +---+   +
I           I       I       I       I       I       I       I   |
+---+---+   +   +---+---+   +---+---+---+   +   +   +---+   +   +
I           I       I   I   I       I           I   I       I   |
+   +---+---+---+   +   +   +   +   +---+---+---+   +   +---+   +
I               I       I       I           I       I       I   |
+---+---+---+   +   +---+---+---+---+---+---+   +---+---+   +   +
I       I       I       I           I       I       I       I   |
+   +   +   +---+---+   +   +---+   +   +   +---+   +   +---+   +
I   I   I               I   I       I   I       I       I       |
+---+   +---+---+---+   +   +   +---+   +---+   +---+---+   +---+
I       I               I   I   I       I   I           I       |
+   +---+   +---+   +---+   +   +   +---+   +---+---+   +---+   +
I   I       I       I   I   I       I                       I   |
+   +---+---+   +   +   +   +   +---+   +---+---+---+---+   +   +
I               I   I       I       I                   I       |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

evanh · 2017-04-11 06:08

cgracey wrote: »

... What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.

Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.

If 31 bits is suitable then I can give some details when I get home tonight.

cgracey · 2017-04-11 06:11

ozpropdev wrote: »

Chip
A while back I made a test program to exercise the new PRNG mechanism.
The program generates random mazes and has a graphical representation of the maze generation.
The completed maze is output in text format over serial.
It can run on single cog FPGA builds without video too.
Larger maze generation is quite hypnotic. :zombie:
A fun exercise and shows the PRNG runs very nicely.



+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
I       I       I       I   I       I               I       I   |
+   +---+   +   +---+   +   +   +   +   +---+   +---+   +   +   +
I   I       I                   I       I       I       I       |
+   +   +---+   +---+---+---+---+---+---+   +---+   +---+   +   +
I           I   I   I       I               I       I   I   I   |
+---+   +   +   +   +   +   +   +---+---+---+   +---+   +   +   +
I   I   I   I       I   I   I       I       I           I   I   |
+   +   +---+---+---+   +   +---+   +   +---+---+   +---+   +   +
I                       I   I       I           I   I       I   |
+---+   +---+---+---+---+   +   +---+---+   +---+   +   +---+   +
I       I       I   I       I       I               I   I   I   |
+   +---+---+   +   +   +   +---+   +   +---+---+---+   +   +   +
I           I   I       I       I       I   I               I   |
+---+---+   +   +---+   +   +---+---+---+   +   +---+---+   +   +
I       I           I   I   I           I       I       I   I   |
+   +---+---+---+   +   +---+   +---+   +   +---+---+   +---+   +
I           I       I       I       I       I       I       I   |
+---+---+   +   +---+---+   +---+---+---+   +   +   +---+   +   +
I           I       I   I   I       I           I   I       I   |
+   +---+---+---+   +   +   +   +   +---+---+---+   +   +---+   +
I               I       I       I           I       I       I   |
+---+---+---+   +   +---+---+---+---+---+---+   +---+---+   +   +
I       I       I       I           I       I       I       I   |
+   +   +   +---+---+   +   +---+   +   +   +---+   +   +---+   +
I   I   I               I   I       I   I       I       I       |
+---+   +---+---+---+   +   +   +---+   +---+   +---+---+   +---+
I       I               I   I   I       I   I           I       |
+   +---+   +---+   +---+   +   +   +---+   +---+---+   +---+   +
I   I       I       I   I   I       I                       I   |
+   +---+---+   +   +   +   +   +---+   +---+---+---+---+   +   +
I               I   I       I       I                   I       |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

Neat! If you used a simple LFSR, would the maze develop more of a cyclical appearance?

cgracey · 2017-04-11 06:13

evanh wrote: »

cgracey wrote: »

... What are the chances, I wonder, that you could scale down xoroshiro128+ to 32 bits and pass those tests? You'd have to come up with new rotate values. Wait... That thing used 128 bits to get 63 out. At 32 bits, we'd get 15 bits out, only, but we could just run it twice.

Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.

If 31 bits is suitable then I can give some details when I get home tonight.

The point is to have the PRNG work on only 32 bits of data, a long. We could get 32 bits out by running it two or three times.

evanh · 2017-04-11 06:27

PS: I've had comparatively good results with 32 bit testing of the Xoroshiro algorithm in my experiments. Obviously, though, the smaller the word size the sooner it'll fail the randomness testing.

The scaling is approximately linear to what I guess is repeat length up to about 28 bit result size, then there is some stronger signal that starts impacting the quality of randomness as the size increases from there, so you don't get as big a benefit from increasing the result size.

Or, the other way to look at it is going lower than 28 bits will dramatically reduce quality.

evanh · 2017-04-11 06:34

cgracey wrote: »

evanh wrote: »

Remember the xoroshiro128+ algorithm was really 2x 64 bit registers with 63 bit result, so a 32 bit version would be 2x 32 bit registers with 31 bit result.

If 31 bits is suitable then I can give some details when I get home tonight.

The point is to have the PRNG work on only 32 bits of data, a long. We could get 32 bits out by running it two or three times.

Each iteration of the full 128 bits is actually working on two 64 bit registers then summing those together. There isn't any 128 bit register.

So a xoroshiro64+ would be in two 32 bit registers.

evanh · 2017-04-11 06:41

Or do you mean you only want total of 32 bits of internal state?

PS: It'd be terrible at this size! Choose something else.

cgracey · 2017-04-11 06:44

evanh wrote: »

Or do you mean you only want total of 32 bits of internal state?

Yes. One long, only. We can iterate it a few times to get the next result.

cgracey · 2017-04-11 06:49

Here is the algorithm, redone for 32 bits of state, instead of the original 128:

uint16_t next(void) {
	const uint16_t s0 = s[0];
	const_uint16_t s1 = s[1];
	const uint16_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 55) ^ s1 ^ (s1 << 14);
	s[1] = rotl(s1, 36);

	return result;
}

The values 55, 14, and 36 need to be changed. The solution space is 16x16x16, which is only 4096 possibilities. Which combo of three 0..15 values is optimal? That is the question.

Remember, based on the way the original algorithm worked, that the LSB of each sum is just an LFSR bit, and should not be used as output. So, this thing should give 15 bits of quality output per iteration.

evanh · 2017-04-11 07:04

I just scaled down from 64 bits for those three constants but with odd/even rule for the two larger ones. Ie: For the original values I had (55 | 1) for the first constant. And (36 & -2) for the last constant. This entirely eliminated an instant fail condition on seemingly random test combination.

cgracey · 2017-04-11 07:22

I just made a little Prop2 PASM program to run on the Prop123_A9 that tries to make a xoroshiro32+. I kept the rotate/shift values proportional to xoroshiro128+. It looks pretty random on the LEDs:

dat	org

	bmask	dirb,#15

loop	getword	s0,state,#0	'extract s0 and s1 words from state long
	getword	s1,state,#1

	mov	outb,s0		'output sum to LEDs (LSB is low-quality)
	add	outb,s1

	setword	s0,s0,#1	'copy bottom words to top words for ROL
	setword	s1,s1,#1

	xor	s1,s0		's1 ^= s0

	rol	s0,#14		's0 = rol(s0,14) ^ s1 ^ (s1 << 3)
	xor	s0,s1
	mov	x,s1
	shl	x,#3
	xor	s0,x

	rol	s1,#9		's1 = rol(s1, 9)

	setword	state,s0,#0	'pack s0 and s1 back into state long
	setword	state,s1,#1

	waitx	##80_000_000/10	'wait 1/10th second, then loop
	jmp	#loop


state	long	1		'seed {s1,s0} with 1

s0	res	1
s1	res	1
x	res	1

Heater. · 2017-04-11 07:59

Don't go inventing your own PRNG. Just use JKISS32. It is well analysed and tested. It has a period of ~2^121. And fast.

I have a Spin implementation here: http://forums.parallax.com/discussion/136975/jkiss32-high-quality-prng-with-realrandom-seed-save-a-cog

JKISS32 is described in this paper: "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" of May 7th 2010.
http://www.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf

cgracey · 2017-04-11 08:10

Heater. wrote: »

Don't go inventing your own PRNG. Just use JKISS32. It is well analysed and tested. It has a period of ~2^121. And fast.

I have a Spin implementation here: http://forums.parallax.com/discussion/136975/jkiss32-high-quality-prng-with-realrandom-seed-save-a-cog

JKISS32 is described in this paper: "Good Practice in (Pseudo) Random Number Generation for Bioinformatics Applications" of May 7th 2010.
http://www.cs.ucl.ac.uk/staff/d.jones/GoodPracticeRNG.pdf

But I'm looking for something that only has 32 bits of state, so that we can have a postfix math operator to iterate a long variable. In software, it's sometimes important to have a simple, but repeatable PRNG.

cgracey · 2017-04-11 08:31

I changed my PASM program to count how many iterations before the seed value showed up again.

I effectively made this and it ran for $7FFF_FFFF iterations before it repeated the seed value of 1:

uint16_t next(void) {
	const uint16_t s0 = s[0];
	const_uint16_t s1 = s[1];
	const uint16_t result = s0 + s1;

	s1 ^= s0;
	s[0] = rotl(s0, 14) ^ s1 ^ (s1 << 4);
	s[1] = rotl(s1, 9);

	return result;
}

That's pretty good, but I don't know what it does if it gets into one of those missing states from a seed of 1. I wonder, does the xoroshiro128+ get all the way to 'power(2, 128) - 1' iterations before it loops? Or, can I find another shift set that will get to $FFFF_FFFF iterations before the original seed of 1 reappears?

Heater. · 2017-04-11 08:31

Yes, a repeatable PRNG is very useful sometimes. JKISS32 is a repeatable PRNG just like xoroshiro128+ but smaller.

PUB random | t
    y ^= (y <<  5)
    y ^= (y >>  7)
    y ^= (y << 22)
    t := z + w + c
    z := w
    c := (t < 0) & 1
    w := t & 2147483647
    x += 1411392427
    return (x + y + w)

(I just mixed it up with real random in the object I liked to so as not need to use a COG for getting what look like real random numbers)

Ah, yes, the postfix math operator my be problematic.

cgracey · 2017-04-11 08:37

I just found a page that says that the xoroshiro128+ period is, indeed, power(2, 128) - 1. So, I should be able to find a shift set that gets to power(2, 31) - 1.

Heater. · 2017-04-11 09:09

How about this one from encryption guru Bruce Schneier:
https://www.schneier.com/academic/archives/1994/09/pseudo-random_sequen.html

Period 2^32 - 1

int RANDOM ()  {
    static unsigned long register; /*Register must be unsigned so right
                                     shift works properly.*/
    /*Register should be initialized with some random value.*/
    register = ((((register >> 31) /*Shift the 32nd bit to the first bit*/
               ^ (register >> 6)   /*XOR it with the seventh bit*/
               ^ (register >> 4)   /*XOR it with the fifth bit*/
               ^ (register >> 2)   /*XOR it with the third bit*/
               ^ (register >> 1)   /*XOR it with the second bit*/
               ^ register)         /*and XOR it with the first bit.*/
               & 0x0000001)        /*Strip all the other bits off and*/
               <<31)               /*move it back to the 32nd bit.*/
               | (register >> 1);  /*Or with the register shifted right.*/
    return register & 0x00000001;  /*Return the first bit.*/
}

As he says:

"One final note of caution: There are many more feedback arrangements for various-length LSFRs that produce maximum-length sequences, but don't fiddle with the feedback sequences without the proper mathematical theory. The particular bits that are XORed together may seem arbitrary, but they are chosen to ensure that the sequence takes 2n-1 bits to repeat. Blindly choosing a different feedback sequence can easily make the output sequence repeat after only a couple of hundred bits, and you would be better off sticking with your store-bought RND function."

Random/LFSR on P2

Comments