Chip,
I found another obscure flaw: Branching out of a REP loop at the final instruction of the loop causes a crash of some sort. The cog active LED is still on but it looks stopped otherwise.
EDIT: Update: The branch is not taken, the loop stops. The cog appears to be locked.
Heh, it might be worse still. I've got a REP @label,#0 (loops forever) occasionally dropping out the bottom of the loop when a repeatedly branching condition breaks out and restarts the REP.
        mov     pa,#0
        mov     pb,#0
        rep     #5,#10
        or      pa,pa
        or      pb,pb
        add     pa,#1
        incmod  pb,#7           wz
 if_z   jmp     #escape
        long    0[5]            '5 NOPs; must be the same length as the REP loop
escape  jmp     #$
With the NOPs, the REP loop works as expected.
Without the NOPs, an extra loop iteration is executed.
That's weird. If you really need to add trailing padding (NOPs, the same length as the REP loop), that's far from 'working as expected'.
How can the loop even 'know' how many NOPs follow it, unless the exit is not clean and it falls out with some size counter still set?
        rep     @.monend, #0            'loop forever (REP reduces monitor loop by two instructions)
.monl
        waitx   delay                   '2  WAITX reduces monitor loop by one instruction
        jse1    #.keyboard              '4  branch on event reduces monitor loop by one instruction
'decimator
        rdlut   samp, #(monitor & $1ff) '7
        sub     samp, diff1             '9
        add     diff1, samp             '11
        sub     samp, diff2             '13
        add     diff2, samp             '15
        sub     samp, diff3             '17
        add     diff3, samp             '19
        sub     samp, diff4             '21
        add     diff4, samp             '23
'spit to DAC for scope
        add     samp, offset            '25 offset centring
        wypin   samp, #1                '27 smartpin 16-bit dither into DAC1
.monend
        cogatn  #1                      'cog #0
        cogstop cid
I've got the JSE1 branch up near the start of the loop. No longer any lock-ups but the COGATN off the bottom can still be triggered just by triggering the SE1 event. "keyboard" code is further back and naturally falls into the REP.
I only had the COGSTOP earlier. I started noticing that cog was dropping out. So I added the COGATN to verify. Now both that cog is stopping and cog #0 is getting an attention event.
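Read literally, each SUB/ADD pair in the decimator section leaves `samp` holding the difference between that stage's current and previous input, and stores the current input in `diffN`. Below is a minimal Python sketch of that arithmetic only. It assumes 32-bit wraparound and zero-initialised `diff1`..`diff4`; the name `comb_stages` is hypothetical, and the sketch models neither the RDLUT/WAITX timing nor the DAC output.

```python
MASK32 = 0xFFFFFFFF  # cog registers are 32 bits wide


def comb_stages(samples, stages=4):
    """Run each input through cascaded first-difference stages.

    Mirrors the SUB samp,diffN / ADD diffN,samp pairs from the loop above.
    The diff registers are assumed to start at zero.
    """
    diffs = [0] * stages
    out = []
    for x in samples:
        samp = x & MASK32
        for i in range(stages):
            samp = (samp - diffs[i]) & MASK32        # sub samp, diffN
            diffs[i] = (diffs[i] + samp) & MASK32    # add diffN, samp (diffN now holds the stage input)
        out.append(samp)
    return out


if __name__ == "__main__":
    # The fourth difference of n**4 is the constant 24 once the stages have filled.
    print(comb_stages([n ** 4 for n in range(8)])[4:])
```

A constant input settles to zero after four samples, which is a quick sanity check that all four stages are being updated on every pass.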
I'll look into this. At first glance, I don't see what's causing it.
Edit: The issue seems to be related to relative branches.
If the last instruction is an absolute branch, the loop terminates OK.
Ah, za'very interesting. Testing this with the fall-through variant and it has to specifically be an immediate absolute to start working. Register absolute doesn't work any better than PC-relative.
EDIT: No, that isn't it either. It depends on the branch instruction as to what works and what doesn't. How weird. Relative JMP is not triggering the fall-through flaw. It's the JSE1 instruction that has the problem and it doesn't matter about absolute vs relative.
PS: The fall-through happens, at the most, 1 in 20 events.
Chip,
I've bumped into a design flaw/bug in lutRAM sharing! RDLUT data, or the address, is being corrupted if the sharing cog WRLUTs to the same address on the same sysclock.
In my case, an instruction stall would also mess me up but it would have to be a number of clocks to produce the result I'm getting.
PS: I'm very certain. Testing is on P123 board with v32i image loaded.
This needs to be tested on the actual silicon. Could one of you guys with a P2D2 run Evan's code and see what you get? It's likely that there will be differences between the FPGA and the chip.
Shouldn't we be concerned about any differences in behavior between the FPGA and the silicon? Wouldn't that likely mean some sort of timing problem that could cause intermittent failures?
Of course, but the FPGA is somewhat overclocked, so it may be that a few opcodes do not quite manage at 80 MHz.
Depends on the OnSemi tools, as to which opcodes are going to be the 'slowest' ones in the silicon. (Might even vary, cog to cog, too)
There is always ambiguity when it comes to dual-port RAMs, how they react on simultaneous read and write. This is a memory IP issue. There could be differences between ON Semi's compiled memories and Altera's FPGA memory blocks.
I found this in an Altera forum posting:
--- Data Quote Start ---
Mixed-Port Read-During-Write Mode
This mode applies to a RAM in simple or true dual-port mode, which has one port reading and the other port writing to the same address location with the same clock.
In this mode, you also have two output choices: old data or don't care.
In Old Data Mode, a read-during-write operation to different ports causes the RAM outputs to reflect the old data at that address location.
In Don't Care Mode, the same operation results in a "don't care" or unknown value on the RAM outputs.
--- Data Quote End ---
There are accompanying waveforms as well. Unknown means unknown, folks. It does not mean "New Data." If you are going to write and read from the same address at the same time, it had better be because you "Don't Care" what the read data is.
They also say that enabling the 'old data' mode needs extra logic (not all parts/cells have this option).
I guess the issue is that the new (write) data arrives sometime during the clock, only needing to be early enough to latch properly.
However, the read path sees a mix of changing data (vs. the normal 100% stable cell contents), and the read may fail to settle in time.
You might expect that, at some slower clock speed, the RAM can write and then read the new data?
Given that a P2 opcode takes 2 clocks, another option might be to write on one clock and read on the other?
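To make the two quoted read-during-write modes concrete, here is a small behavioural sketch in Python. It is a toy model of the quoted description only, not of Altera's or ON Semi's actual memory blocks; the class `DualPortRam`, its `clock` method and the `UNKNOWN` sentinel are all hypothetical names.

```python
UNKNOWN = object()  # stands in for a "don't care" / unknown read result


class DualPortRam:
    """Toy dual-port RAM: one read port and one write port on the same clock."""

    def __init__(self, size, mode="old_data"):
        if mode not in ("old_data", "dont_care"):
            raise ValueError("mode must be 'old_data' or 'dont_care'")
        self.cells = [0] * size
        self.mode = mode

    def clock(self, read_addr=None, write_addr=None, write_data=0):
        """Apply one clock edge and return the read-port result (None if no read)."""
        result = None
        if read_addr is not None:
            collision = write_addr is not None and write_addr == read_addr
            if collision and self.mode == "dont_care":
                result = UNKNOWN                  # same-address read during write: unknown
            else:
                result = self.cells[read_addr]    # contents before this edge's write
        if write_addr is not None:
            self.cells[write_addr] = write_data   # the write itself always lands
        return result


if __name__ == "__main__":
    for mode in ("old_data", "dont_care"):
        ram = DualPortRam(16, mode)
        ram.clock(write_addr=3, write_data=111)
        colliding = ram.clock(read_addr=3, write_addr=3, write_data=222)
        later = ram.clock(read_addr=3)
        print(mode, "unknown" if colliding is UNKNOWN else colliding, later)
```

In both modes the write completes and a later read returns the new value; only the colliding read differs, which is why code that reads and writes the same address on the same clock cannot rely on what it gets back.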
Will REP branching work in hubexec mode? What about lutexec mode?
(Do we still have a LUT exec mode?)
Yes, yes and yes. My debug printf sits in lutRAM.
Can you do SETQ inside a REP?
Yes, REP itself doesn't have a use for SETQ so any other instruction is free to use its gift.
Small detail (my understanding): SETQ must prime a flag to say there is new data in the Q register. This can stay primed indefinitely. Only an instruction that can make use of Q will reset the flag upon use.
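The flag behaviour described above can be sketched as a tiny state machine. This Python model reflects only the understanding stated in the post, which is not confirmed here against the hardware or its documentation; the class `QState` and its methods are hypothetical names.

```python
class QState:
    """Toy model of the Q register plus a 'new data' flag, as described in the post."""

    def __init__(self):
        self.q = 0
        self.primed = False

    def setq(self, value):
        """SETQ: load Q and prime the flag."""
        self.q = value & 0xFFFFFFFF
        self.primed = True

    def execute(self, uses_q):
        """Run one instruction; return the Q value it consumed, or None.

        Assumption from the post: only an instruction that can use Q clears the flag.
        """
        if uses_q and self.primed:
            self.primed = False
            return self.q
        return None


if __name__ == "__main__":
    state = QState()
    state.setq(15)
    print(state.execute(uses_q=False))  # e.g. REP: has no use for Q, flag stays primed
    print(state.execute(uses_q=True))   # a Q-consuming instruction picks up the value
    print(state.execute(uses_q=True))   # flag already cleared, nothing to consume
```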
Comments
I believe it's a pipeline thing.
Something to be fixed though. Can't have it just locking solid.
EDIT: Rewrote for clarity.
That part is easy enough to explain away. It's just a case of coincidence with both REP and JMP vying for writing the program counter together.
The new variant with intermittent outcome is not so easy to explain though.
With NOPs:
pa = 8, pb = 0 'correct result
Without NOPs:
pa = 9, pb = 1 'incorrect (extra loop executed)
Cog #0 is reporting data over the comport.
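The reported numbers can be checked against a small Python model of the REP #5,#10 test loop. This is a sketch of the intended instruction semantics only: INCMOD is modelled as "wrap to 0 when D equals S, otherwise add 1, with Z set on a zero result". The `extra_passes` parameter is a hypothetical knob that replays the loop body after the branch condition is met; it imitates the observed symptom and says nothing about the hardware cause.

```python
MASK32 = 0xFFFFFFFF


def incmod(d, s):
    """INCMOD D,S WZ: wrap D to 0 when D == S, else increment; Z is set on a zero result."""
    new_d = 0 if d == s else (d + 1) & MASK32
    return new_d, new_d == 0


def run_rep_test(repeat_count=10, modulus=7, extra_passes=0):
    """Model the test loop; returns (pa, pb) as seen at the 'escape' label."""
    pa = pb = 0

    def body(pa, pb):
        # 'or pa,pa' and 'or pb,pb' leave the values unchanged, so they are omitted.
        pa = (pa + 1) & MASK32           # add pa,#1
        pb, z = incmod(pb, modulus)      # incmod pb,#7 wz
        return pa, pb, z

    for _ in range(repeat_count):
        pa, pb, z = body(pa, pb)
        if z:                            # if_z jmp #escape
            for _ in range(extra_passes):
                pa, pb, _ = body(pa, pb)  # hypothetical: body runs again before the exit
            break
    return pa, pb


if __name__ == "__main__":
    print(run_rep_test())                 # clean exit
    print(run_rep_test(extra_passes=1))   # one extra pass before leaving
```

With no extra pass the model gives pa = 8, pb = 0, matching the result with NOPs; one extra pass gives pa = 9, pb = 1, matching the result without them.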
Hmm... but one is going back and one is going forward, so you would expect one to simply win, with no knowledge of the size of the other.
Did you see the shared LUT bug mentioned a few messages earlier in this thread?
1) REP lockup - https://forums.parallax.com/discussion/comment/1458133/#Comment_1458133
2) REP fall-through - https://forums.parallax.com/discussion/comment/1458148/#Comment_1458148
3) RDLUT corruption - https://forums.parallax.com/discussion/comment/1458120/#Comment_1458120
Branching inside a REP doesn't sound like a good idea...
So the only penalty is meant to be termination of the REP looping. Which is perfectly reasonable. And I was counting on it in my code.
We will definitely address these issues. I'm really glad you guys are finding these things.
Did you try lower clock speeds?
Can you put together some test code that measures that ~5% failure rate, to make it easier for others to run and confirm?
Looks like we need to find and explore edge cases where we can.