Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

evanh · 2018-12-17 05:09

Chip,
I found another obscure flaw: Branching out of a REP loop at the final instruction of the loop causes a crash of some sort. The cog active LED is still on but it looks stopped otherwise.

EDIT: Update: The branch is not taken, the loop stops. The cog appears to be locked.

ozpropdev · 2018-12-17 05:33

A NOP after the JUMP should fix that.
I believe it's a pipeline thing.

evanh · 2018-12-17 05:38

I've simply changed the position of the branch in the loop.

Something to be fixed though. Can't have it just locking solid.
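
A minimal sketch of what moving the branch can look like. This is not the actual code from this post: pa, pb and the escape label are placeholders, and the block is only meant to show the branch sitting somewhere other than the final slot of the REP block.

```pasm2
		rep	#3, #10			'3-instruction block, 10 passes
		incmod	pb, #7		wz
	if_z	jmp	#escape			'branch is no longer the final instruction
		add	pa, #1			'final instruction of the block is not a branch
```

Note that reordering changes what the loop computes: the pass that takes the branch now skips the ADD, so any count kept in pa ends one lower than it would with the branch last.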

ozpropdev · 2018-12-17 05:59

A quick check here and it appears the REP loop does one extra loop.

evanh · 2018-12-17 07:10

Heh, it might be worse still. I've got a REP @label,#0 (loops forever) occasionally dropping out the bottom of the loop when a repeatedly branching condition breaks out and restarts the REP.

EDIT: Rewrote for clarity.

ozpropdev · 2018-12-17 07:26

Here's what I found with this test code:

		mov	pa,#0
		mov	pb,#0

		rep	#5,#10
		or	pa,pa
		or	pb,pb
		add	pa,#1
		incmod	pb,#7 wz
	if_z	jmp	#escape

		long	0[5]   '5 * NOP's, Must be same length as REP loop

escape		jmp	#$

With the NOPs, the REP loop works as expected.
Without the NOPs, an extra loop is executed.

evanh · 2018-12-17 07:29

I'm gonna say that any existing code designed with a REP breaking branch in it, will be a source of intermittent bugs in that software.

jmg · 2018-12-17 07:37

ozpropdev wrote: »
Here's what I found with this test code:
		mov	pa,#0
		mov	pb,#0

		rep	#5,#10
		or	pa,pa
		or	pb,pb
		add	pa,#1
		incmod	pb,#7 wz
	if_z	jmp	#escape

		long	0[5]   '5 * NOP's, Must be same length as REP loop

escape		jmp	#$
With the NOPs, the REP loop works as expected.
Without the NOPs, an extra loop is executed.

That's weird. If you really need to add trailing padding (NOPs, and as many of them as the REP loop is long), that's far from 'working as expected'.
How can the loop even 'know' how many NOPs follow it, unless the exit does not cleanly exit, but falls out with some size counter still set?

evanh · 2018-12-17 07:42

JMG,
That part is easy enough to explain away. It's just a case of coincidence with both REP and JMP vying for writing the program counter together.

The new variant with intermittent outcome is not so easy to explain though.

evanh · 2018-12-17 07:48

Here's the loop code I'm currently using:

		rep     @.monend, #0          'loop forever (REP reduces monitor loop by two instructions)
.monl
		waitx   delay                 '2    WAITX reduces monitor loop by one instruction
		jse1   #.keyboard             '4    branch on event reduces monitor loop by one instruction

'decimator
		rdlut   samp, #(monitor & $1ff) '7
		sub     samp, diff1           '9
		add     diff1, samp           '11
		sub     samp, diff2           '13
		add     diff2, samp           '15
		sub     samp, diff3           '17
		add     diff3, samp           '19
		sub     samp, diff4           '21
		add     diff4, samp           '23

'spit to DAC for scope
		add     samp, offset          '25   offset centring
		wypin   samp, #1              '27   smartpin 16-bit dither into DAC1
.monend

		cogatn  #1                    'cog #0
		cogstop cid

I've got the JSE1 branch up near the start of the loop. No longer any lock-ups but the COGATN off the bottom can still be triggered just by triggering the SE1 event. "keyboard" code is further back and naturally falls into the REP.

ozpropdev · 2018-12-17 07:49

By "as expected" I meant the correct number of loops and the correct results.

With NOPs:
pa = 8, pb = 0 'correct result

Without NOPs:
pa = 9, pb = 1 'incorrect (extra loop executed)
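
The expected values can be cross-checked without REP. A minimal sketch, assuming a spare register (count, a hypothetical name) and an ordinary DJNZ loop in place of the REP block: INCMOD steps pb from 0 up to 7 over the first seven passes, then wraps it to 0 with Z set on the eighth, so the branch is taken with pa = 8 and pb = 0.

```pasm2
		mov	pa, #0
		mov	pb, #0
		mov	count, #10		'same repeat count as the REP version

loop		add	pa, #1
		incmod	pb, #7		wz	'pb wraps to 0 and sets Z on the 8th pass
	if_z	jmp	#escape
		djnz	count, #loop

escape		jmp	#$

count		long	0			'hypothetical scratch register
```

The two OR instructions from the REP test are left out here because they do not change pa or pb.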

evanh · 2018-12-17 07:51

I only had the COGSTOP earlier. I started noticing that cog was dropping out. So I added the COGATN to verify. Now both that cog is stopping and cog #0 is getting an attention event.

Cog #0 is reporting data over the comport.

jmg · 2018-12-17 08:12

evanh wrote: »

JMG,
That part is easy enough to explain away. It's just a case of coincidence with both REP and JMP vying for writing the program counter together.

hmm.... but one is going back, and one is going forward, you would expect one to simply win, with no knowledge of the size of the other.

cgracey · 2018-12-17 08:22

evanh wrote: »

Chip,
I found another obscure flaw: Branching out of a REP loop at the final instruction of the loop causes a crash of some sort. The cog active LED is still on but it looks stopped otherwise.

EDIT: Update: The branch is not taken, the loop stops. The cog appears to be locked.

I'll look into this. At first glance, I don't see what's causing it.

Roy Eltham · 2018-12-17 08:41

Chip,
Did you see the shared LUT bug mentioned a few messages earlier in this thread?

evanh · 2018-12-17 09:18

Three flaws found today:
1) REP lockup - https://forums.parallax.com/discussion/comment/1458133/#Comment_1458133
2) REP fall-through - https://forums.parallax.com/discussion/comment/1458148/#Comment_1458148
3) RDLUT corruption - https://forums.parallax.com/discussion/comment/1458120/#Comment_1458120

Rayman · 2018-12-17 12:01

Been a while, but weren't there some restrictions when using REP?
Branching inside a REP doesn't sound like a good idea...

evanh · 2018-12-17 12:41

The doc says: "Any branch within the repeating instruction block will cancel REP activity. Interrupts will be ignored during REP looping."

So the only penalty is meant to be termination of the REP looping. Which is perfectly reasonable. And I was counting on it in my code.

ozpropdev · 2018-12-17 13:16

A branch in a REP block terminates correctly except if it's the last instruction in the REP block.

Edit: The issue seems to be related to relative branches.
If the last instruction is an absolute branch, the loop terminates Ok
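
For reference, an immediate branch in PASM2 source can be forced to absolute addressing by writing #\ in front of the target. A minimal sketch of the form reported above as terminating correctly on the v32i FPGA image (the three-instruction block and the escape label are illustrative, not the exact test code):

```pasm2
		rep	#3, #10
		add	pa, #1
		incmod	pb, #7		wz
	if_z	jmp	#\escape		'#\ forces an absolute branch; plain # lets the assembler use relative

escape		jmp	#$
```

This covers the immediate form only; it says nothing about branches that take their target from a register.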

evanh · 2018-12-17 14:28

ozpropdev wrote: »

Edit: The issue seems to be related to relative branches.
If the last instruction is an absolute branch, the loop terminates Ok

Ah, za'very interesting. Testing this with the fall-through variant and it has to specifically be an immediate absolute to start working. Register absolute doesn't work any better than PC-relative.

EDIT: No, that isn't it either. It depends on the branch instruction as to what works and what doesn't. How weird. Relative JMP is not triggering the fall-through flaw. It's the JSE1 instruction that has the problem and it doesn't matter about absolute vs relative.

PS: The fall-through happens, at the most, 1 in 20 events.

cgracey · 2018-12-17 17:25

I'm going to be at Parallax for the next few days getting the P2 Eval boards tested.

We will definitely address these issues. I'm really glad you guys are finding these things.

Rayman · 2018-12-17 18:06

I need to re-learn the fancier P2 tricks like REP and SETQ...

Will REP branching work in hubexec mode? What about lutexec mode?

(Do we still have a LUT exec mode?)

Can you do SETQ inside a REP?

cgracey · 2018-12-17 18:21

evanh wrote: »

Chip,
I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or address, is being garbaged if the sharing cog WRLUTs to the same address on the same sysclock.

In my case, an instruction stall would also mess me up but it would have to be a number of clocks to produce the result I'm getting.

PS: I'm very certain. Testing is on P123 board with v32i image loaded.

This needs to be tested on the actual silicon. Could one of you guys with a P2D2 run Evan's code and see what you get? It's likely that there will be differences between the FPGA and the chip.

David Betz · 2018-12-17 18:27

cgracey wrote: »

evanh wrote: »

Chip,
I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or address, is being garbaged if the sharing cog WRLUTs to the same address on the same sysclock.

In my case, an instruction stall would also mess me up but it would have to be a number of clocks to produce the result I'm getting.

PS: I'm very certain. Testing is on P123 board with v32i image loaded.

This needs to be tested on the actual silicon. Could one of you guys with a P2D2 run Evan's code and see what you get? It's likely that there will be differences between the FPGA and the chip.

Shouldn't we be concerned about any differences in behavior between the FPGA and the silicon? Wouldn't that likely mean some sort of timing problem that could cause intermittent failures?

potatohead · 2018-12-17 18:53

I think this kind of thing comes along with not clock gating, as well as real propagation differences between FPGA and silicon.

jmg · 2018-12-17 18:55

evanh wrote: »

PS: The fall-through happens, at the most, 1 in 20 events.

Did you try lower clock speeds ?
Can you get some test code that collects that ~ 5% failure number, to make it easier for others to run & confirm ?

David Betz wrote: »

Shouldn't we be concerned about any differences in behavior between the FPGA and the silicon? Wouldn't that likely mean some sort of timing problem that could cause intermittent failures?

Of course, but the FPGA is somewhat overclocked, so it may be that there are a few opcodes that do not quite manage things at 80MHz.
Depends on the OnSemi tools, as to which opcodes are going to be the 'slowest' ones in the silicon. (Might even vary, cog to cog, too)

cgracey · 2018-12-17 18:59

There is always ambiguity when it comes to dual-port RAMs, how they react on simultaneous read and write. This is a memory IP issue. There could be differences between ON Semi's compiled memories and Altera's FPGA memory blocks.

potatohead · 2018-12-17 19:04

Makes sense. There are material differences in play then.

Looks like we need to find and explore edge cases where we can.

jmg · 2018-12-17 22:17

cgracey wrote: »

There is always ambiguity when it comes to dual-port RAMs, how they react on simultaneous read and write. This is a memory IP issue. There could be differences between ON Semi's compiled memories and Altera's FPGA memory blocks.

I found this on an Altera forum posting :

--- Data Quote Start ---
Mixed-Port Read-During-Write Mode
This mode applies to a RAM in simple or true dual-port mode, which has one port reading and the other port writing to the same address location with the same clock.
In this mode, you also have two output choices: old data or don't care.
In Old Data Mode, a read-during-write operation to different ports causes the RAM outputs to reflect the old data at that address location.
In Don't Care Mode, the same operation results in a "don't care" or unknown value on the RAM outputs.
--- Data Quote End ---

There are accompanying waveforms as well. Unknown means unknown folks. It does not mean "New Data." If you are going to write and read from the same address at the same time, it had better be because you "Don't Care" what the read data is.

They also say that selecting the 'old data' mode needs extra logic (not all parts/cells have this option).

I guess the issue is new (write) data arrives sometime during the clock, only needing to be early enough to properly latch.
However, read paths see a mix of changing data, (vs normal 100% stable cell info) and it may be the read fails to settle in time.

You might expect that some slower clock speed can write, and then read-new ?
Given P2 is a 2-clock opcode, another option might be to write on one clock and read on the other ?

evanh · 2018-12-17 22:53

Rayman wrote: »

Will REP branching work in hubexec mode? What about lutexec mode?

(Do we still have a LUT exec mode?)

Yes, yes and yes. My debug printf sits in lutRAM.

Can you do SETQ inside a REP?

Yes, REP itself doesn't have a use for SETQ so any other instruction is free to use its gift.

Small detail (my understanding): SETQ must prime a flag to say there is new data in the Q register. This can stay primed indefinitely. Only an instruction that can make use of Q will reset the flag upon use.
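
A minimal sketch of SETQ inside a REP block. The registers mask, dest and src are hypothetical names, and MUXQ is chosen only because it is one of the instructions that consumes Q:

```pasm2
		rep	#3, #8			'3-instruction block, 8 passes
		setq	mask			'load Q with the bit mask
		muxq	dest, src		'for each 1 bit in Q, copy that bit of src into dest
		rol	mask, #1		'move the mask along for the next pass

		jmp	#$			'halt here so execution does not run into the data below

mask		long	$0000_000F
dest		long	0
src		long	$FFFF_FFFF
```

SETQ is placed directly before the instruction that uses Q, and neither of them is the final instruction of the block.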
