Self modifying code spacer instruction requirement (SETS/SETD)

ozpropdev · 2015-11-15 09:45

Formerly titled: FPGA platforms exhibit different behaviour (SETS issue)
Hi All

I've been playing around with some video stuff and have found some weird differences
between the various FPGA builds.

I have built an NTSC driver for the Nano that has minimal impact on HUB usage.
The font data is stored in the cog, so all that is required is an ~800-byte text buffer.

This driver works differently on the Nano/DE2 platforms compared to the P123-A7 platform.

On the Nano/DE2 build: (see ntsc_nano.jpg)
The characters are displayed in the incorrect position (offset by one; see picture).
The first character displayed is actually the last character in the line buffer.
There is also a random artifact appearing on the first character.
Otherwise the picture is stable and clear.

On the P123-A7 build: (see ntsc_p123.jpg)
The characters are displayed in the correct position, but several positions are
displaying the wrong characters.
The image has lots of random artifacts, which can be cleaned up by inserting a single NOP
in the appropriate spot (see code).

I'm guessing there might be some compile/timing differences between Cyclone IV and Cyclone V.
I suspected my code (and still do) but getting two totally different results on different
platforms has got the old brain foggy again.

Any ideas?

P.S. You may notice a row of '*' and '?' missing.
These are flashing characters and I mistimed taking the picture.

LoopyByteloose · 2015-11-15 10:51

Quartus II is a rather huge application. And the Cyclone IV and Cyclone V do indeed have generational differences that may explain some of this.

At the core of the question is "What exactly is Altera providing?" Altera is unlikely to enter into any discussion unless you are a business with at least the potential to buy a licensed version of Quartus II for about US$3000.

So the community of free users is pretty much left in the dark and can only speculate.

So it pretty much leaves Parallax to work it out and communicate with Altera. We might come up with a satisfactory explanation here, but I wouldn't waste my time trying to engage Altera in a reply.

Rayman · 2015-11-15 12:19

I'd wonder if it's the DACs... Maybe different s (scale) values would get you the same output?

ozpropdev · 2015-11-15 12:19

Loopy
I never expected to get Altera involved at all!
I've always found the Parallax community way more helpful.

ozpropdev · 2015-11-15 12:22

@Rayman
Incorrect signal levels still doesn't explain random bytes though.

rjo__ · 2015-11-15 13:17

I'm getting a correct display on P123-A7
by inserting an extra NOP in .nxtc, so two NOPs there.
Above, in .loop1, I can remove the NOP, leave it as it is, or use two NOPs, and the display is still good.

rjo__ · 2015-11-15 13:31

.....

mindrobots · 2015-11-15 13:31

I can play with this today. I have the different boards to test on.

Really need to identify problems like this so we chase after the correct bugs in the FPGA image.

rjo__ · 2015-11-15 13:49

So... it looks like the issue is in the rep block?

SETS?
mov pixels, 0-0?

I'm guessing SETS inside the REP is being delayed.

rjo__ · 2015-11-15 14:08

I seem to remember that there is a difference in how the cyclone V handles Verilog immediate assignments... In the IV, if you make conflicting assignments, the last one wins and there is nothing to flag the conflict. On the V, it throws a message or error... can't remember which.

Rayman · 2015-11-15 15:21

Are there some instructions that aren't supposed to be in rep loop?

rjo__ · 2015-11-15 15:56

I think Chip posted something about that... can't exactly remember... but that could be the answer.

rjo__ · 2015-11-15 16:01

I only have the P123-A7 available. I am really interested in the fact that the behavior seems to vary depending upon the cyclone variant... and whether this points to something in the Verilog handling by Quartus. I know that I had code that wasn't good form and it got pointed at by Quartus when I ran it for the A7... the problem is that I can't remember exactly what happened:)

mindrobots · 2015-11-15 19:46

Wow! My DE2 output looks identical to your DE0 output.

My P123 is garbage. It has artifacts everywhere. The blinking dots and ?'s are good but everything else is illegible. One NOP or 2 NOP makes no difference. In fact, my most stable P123 display is without any NOP.

For reference, Potatohead's NTSC text driver is rock solid on both DE2 and P123.

Nothing jumps out at me yet, but it's kind of scary that things are that different!

ozpropdev · 2015-11-16 07:00

Both my Nano and DE2 produce the same output and the P123-A7 is different (worse).
I had earlier suspected the REP block but changing it to a DJNZ loop made no difference.
I will keep chipping away at it, pardon the pun

ozpropdev · 2015-11-16 07:47

Eureka!
It appears the problem is that SETS requires two NOPs of spacing, not one!

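		sets	.get_pixels,asx	' patch the S field of .get_pixels with asx
		nop			' spacer 1
		nop			' spacer 2
.get_pixels	mov	pixels,0-0	' 0-0 is a placeholder S field, filled in by SETS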

All tests so far on all three platforms now are correct and stable.
Will test on some more monitors later tonight.
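A toy Python model (not the actual P2 Verilog) can illustrate why the number of spacers matters. It assumes that while one instruction executes, the cog has already read the next `fetch_ahead` instructions from cog RAM, so a SETS aimed at one of those already-fetched copies is missed. `Instr`, `execute`, `program` and `spacers_needed` are hypothetical names, and the fetch depth is a free parameter, not a confirmed property of the FPGA image.

```python
from collections import deque
from dataclasses import dataclass, replace

OLD_S, NEW_S = 0x000, 0x123   # hypothetical S-field values (0-0 placeholder vs. patched)

@dataclass(frozen=True)
class Instr:
    op: str      # "sets", "nop" or "mov" (toy subset, not real encodings)
    d: int = 0   # D field: for "sets", the address of the instruction to patch
    s: int = 0   # S field: for "sets", the new S value (stands in for asx)

def execute(ram, fetch_ahead):
    """Run a straight-line cog image; return the S field each "mov" used.

    Assumption: while instruction pc executes, copies of pc+1 .. pc+fetch_ahead
    have already been read from RAM, so a SETS aimed at one of them is missed.
    """
    ram = list(ram)
    pipe = deque(ram[: fetch_ahead + 1])   # copies fetched before execution starts
    used = []
    for pc in range(len(ram)):
        instr = pipe.popleft()             # possibly stale copy of ram[pc]
        if instr.op == "sets":
            ram[instr.d] = replace(ram[instr.d], s=instr.s)  # patches RAM only
        elif instr.op == "mov":
            used.append(instr.s)
        nxt = pc + fetch_ahead + 1         # the next fetch happens after this write
        if nxt < len(ram):
            pipe.append(ram[nxt])
    return used

def program(spacers):
    """sets .get_pixels,#NEW_S ; `spacers` x nop ; .get_pixels mov pixels,0-0"""
    target = 1 + spacers
    return ([Instr("sets", d=target, s=NEW_S)]
            + [Instr("nop")] * spacers
            + [Instr("mov", s=OLD_S)])

def spacers_needed(fetch_ahead):
    """Smallest number of NOPs after which the patched S value is seen."""
    n = 0
    while execute(program(n), fetch_ahead) != [NEW_S]:
        n += 1
    return n

for depth in (1, 2):
    print(depth, spacers_needed(depth))   # prints "1 1", then "2 2"
```

In this model the spacer count simply equals the fetch-ahead depth, which would match the two-NOP result if the cog reads two instructions ahead. It does not explain why the Cyclone IV and Cyclone V builds failed differently with only one spacer.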

Heater. · 2015-11-16 08:29

Whilst that extra NOP is a solution to get your code working, I would say that it is not the solution but a workaround.

We still don't know the root cause of the different behaviour between Nano/DE2 and P123-A7 builds.

Such a thing is very suspicious, and we should get to the bottom of it. I guess only Chip is in a position to do that. I hope this is reported as a bug somewhere noticeable.

jmg · 2015-11-16 08:39

ozpropdev wrote:

Eureka!
It appears the problem is that SETS requires two NOPs of spacing, not one!
...
All tests so far on all three platforms now are correct and stable.

So both outputs change, converging on a third, correct display?
I suppose some marginal timing (a multicycle path or a race?) could give different ultimate failures on different processes.
If this passed all timing checks, there would seem to be good reason to be nervous that more may be hiding...

ozpropdev · 2015-11-16 11:50

Using ALTI instead of SETS works fine on Nano/DE2-115 and P123-A7.

		alti	asx,#%000_000_100		
.get_pixels	mov	pixels,0-0

Hope that gives you a clue Chip.

78rpm · 2015-11-16 22:42

SETS and SETD modify a field of any instruction in cog RAM, but if that instruction is already in the pipeline, you will miss it.

The instructions ALTI, ALTR, ALTD and ALTS modify the NEXT instruction. This happens internally, within the pipeline stages.
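78rpm's distinction can be sketched with a toy Python model: SETS patches the copy held in cog RAM, while an ALT-style prefix hands its value straight to the next instruction as that instruction executes, so an already-fetched copy does not matter. The `run` function, the fetch depth of 2, and the simplified `alts` semantics (next S = D + S, with D standing in for the value of asx, following Ariba's `alts asx,#0` example) are all assumptions, not the real P2 pipeline.

```python
from collections import deque
from dataclasses import dataclass, replace

OLD_S, NEW_S = 0x000, 0x123   # hypothetical S-field values

@dataclass(frozen=True)
class Instr:
    op: str      # "sets", "alts", "nop" or "mov" (toy subset)
    d: int = 0   # D field
    s: int = 0   # S field

def run(ram, fetch_ahead=2):
    """Return the S field each "mov" actually used.

    Assumed model: copies of the next `fetch_ahead` instructions are already
    fetched, so SETS writes to RAM can be missed. ALTS leaves a pending
    override that is applied to the next instruction as it executes.
    """
    ram = list(ram)
    pipe = deque(ram[: fetch_ahead + 1])
    override = None
    used = []
    for pc in range(len(ram)):
        instr = pipe.popleft()
        if override is not None:               # ALT prefix acts inside the pipeline
            instr, override = replace(instr, s=override), None
        if instr.op == "sets":
            ram[instr.d] = replace(ram[instr.d], s=instr.s)
        elif instr.op == "alts":
            override = instr.d + instr.s        # hypothetical: next S = D + S
        elif instr.op == "mov":
            used.append(instr.s)
        nxt = pc + fetch_ahead + 1
        if nxt < len(ram):
            pipe.append(ram[nxt])
    return used

# No spacers in either case: the target immediately follows the modifier.
with_sets = [Instr("sets", d=1, s=NEW_S), Instr("mov", s=OLD_S)]
with_alts = [Instr("alts", d=NEW_S, s=0), Instr("mov", s=OLD_S)]
print(run(with_sets) == [OLD_S])   # True: SETS missed the already-fetched copy
print(run(with_alts) == [NEW_S])   # True: ALTS altered the next instruction
```

The design point the model captures is that the prefix's effect travels with the instruction stream rather than through RAM, so no spacer is needed however deep the fetch-ahead is.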

Rayman · 2015-11-16 23:21

ALTI is a horrible name for that instruction...
Why isn't it called "ALTDS" ?

cgracey · 2015-11-16 23:35

Rayman wrote:

ALTI is a horrible name for that instruction...
Why isn't it called "ALTDS" ?

Because it can modify the whole instruction, not just D and S.

cgracey · 2015-11-16 23:36

So, is it definite that two NOPs after SETS does the trick and takes all bad behavior away?

Rayman · 2015-11-16 23:42

I was looking at Jakacki's explanation for ALTI and didn't see that...
Actually, I don't see how ALTI could modify the whole instruction.
Guess I have to read the docs again.

I still don't like the name though.

Unless "SETI" also modifies I, D and S that is...

ozpropdev · 2015-11-16 23:44

So from this we arrive at:
SELF MODIFYING CODE RULE #1 - Two spacer instructions are required to allow correct pipeline fetch.
and there are distinct differences in timing between Cyclone IV and V.

potatohead · 2015-11-16 23:47

ALTISD, ALTDSI, ALTINS?

ALTINST?

Just typing these out shows a little ambiguity or difficulty in reading to me.

I personally like ALTISD the best.

Alter Instruction, Source, Destination.

ozpropdev · 2015-11-16 23:59

Thinking some more....
If the pipeline is 2-stage, then surely one spacer instruction is enough?
The results from the P123-A7 (Cyclone V) were ~95% correct with one spacer.
I feel there may still be something else going on here.
Perhaps Chip can shine some light on this.

ozpropdev · 2015-11-17 00:02

cgracey wrote:

So, is it definite that two NOPs after SETS does the trick and takes all bad behavior away?

That's correct Chip

potatohead · 2015-11-17 00:03

Yeah, like a signal arriving just a little late...

Ariba · 2015-11-17 00:06

You can use ALTS here:

		alts	asx,#0		
.get_pixels	mov	pixels,0-0

Perhaps ALTI can be renamed to ALTNEXT or ALTNXT, so it does not suggest a specific field.

Regarding differences between Cyclone IV and Cyclone V:
Maybe the dual-ported RAM behaves differently on a read during a write. Some FPGAs return the newly written value and some return the old value when the same address is written and read in the same cycle.

Andy
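Andy's read-during-write hypothesis can be made concrete with a small Python model. If a SETS write and an instruction fetch hit the same RAM address in the same clock, an "old data" RAM hands the fetch the unpatched word, while a "new data" RAM hands it the patched one. `DualPortRam` and its `mode` parameter are hypothetical, and the thread does not confirm which behaviour either Cyclone build actually has.

```python
class DualPortRam:
    """Toy simple-dual-port RAM: one write port, one read port, one clock.

    mode="old": a read of the address being written returns the old word.
    mode="new": it returns the word being written (write-through).
    Which mode a given Cyclone family or Quartus setting produces is not
    claimed here; that is the open question.
    """

    def __init__(self, size, mode):
        if mode not in ("old", "new"):
            raise ValueError("mode must be 'old' or 'new'")
        self.mem = [0] * size
        self.mode = mode

    def cycle(self, raddr, waddr=None, wdata=None):
        """One clock edge: an optional write plus a read; return the read data."""
        collide = waddr is not None and waddr == raddr
        rdata = wdata if (collide and self.mode == "new") else self.mem[raddr]
        if waddr is not None:
            self.mem[waddr] = wdata
        return rdata

# A SETS-style write patches address 5 in the same cycle the fetch port reads it.
for mode in ("old", "new"):
    ram = DualPortRam(8, mode)
    ram.mem[5] = 0xAAA                          # original instruction word
    fetched = ram.cycle(raddr=5, waddr=5, wdata=0xBBB)
    print(mode, hex(fetched))                   # old 0xaaa, then new 0xbbb
```

This only models the hypothesis; nothing in the thread confirms that the one-NOP code actually hits a same-cycle collision.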

potatohead · 2015-11-17 00:08

That is a significant difference!
