Hardware oddity: Dual-Port Hazard

This is something I noticed a long while ago, but here's a proper demo:
Reading a dual-port RAM cell on the same cycle it is written returns an indeterminate value.
(The stored value is fine; it's just a momentary glitch on the read.)
CON
  _CLKFREQ    = 10_000_000          ' Higher speeds are more susceptible
  HAZARD_CELL = $004                ' Which cogRAM cell to test - can't go lower than 3 with this

DAT
              org
              long    0[HAZARD_CELL-3]
.outer
              rep     @.inner,testlen
              xor     .hazard,#31   ' ---\
              nop                   '    | XOR result written on same cycle as BITH opcode fetch
.hazard       bith    val,#0        ' <--/
.inner
              cmp     val,expect wz
        if_nz jmp     #.cought
              add     loopctr,#1
              test    loopctr,##1023 wz
        if_nz jmp     #.outer
              debug("Nothing yet... ",udec(loopctr))
              jmp     #.outer
.cought
              debug("Hazard cought! ",ubin_long(val),udec(loopctr),uhex_long(.hazard))
              jmp     #$

val           long    0
expect        long    $80000001
loopctr       long    0
testlen       long    1024
              fit     496
You may need to try different HAZARD_CELL values or increase _CLKFREQ to get a hit.
It should also be possible to reproduce this with LUT (only streamer and pair sharing use the 2nd LUT port, so it's harder to run into this).
Comments
I guess one needs two NOPs with self-modifying code to be safe?
Yes, but that's documented and (I hope) well known.
(Most of the time it will just give you the old value, so there'd be a bug anyway.)
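For anyone skimming, here is a rough sketch of what the "two NOPs" version of the demo would look like. It is just the original test with one extra spacer, so the XOR result should already be in the register by the time BITH is fetched. This is an assumption based on the replies above, not something verified on hardware here; the label .target and the HAZARD_CELL value of $005 (chosen so the padding count stays nonzero) are made up for this example.

```
CON
  _CLKFREQ    = 10_000_000
  HAZARD_CELL = $005                ' assumed value; padding below is -4 because of the extra spacer

DAT
              org
              long    0[HAZARD_CELL-4]
.outer
              rep     @.inner,testlen
              xor     .target,#31   ' toggle the bit index in BITH's S field (0 <-> 31)
              nop                   ' spacer 1
              nop                   ' spacer 2: XOR write should land before the BITH fetch
.target       bith    val,#0
.inner
              cmp     val,expect wz
        if_nz jmp     #.mismatch
              jmp     #.outer       ' keep looping while the result is as expected
.mismatch
              debug("Unexpected value: ",ubin_long(val))
              jmp     #$

val           long    0
expect        long    $80000001
testlen       long    1024
              fit     496
```

If the two-instruction spacing really is sufficient, this should loop forever without printing anything, whereas the single-NOP version above eventually reports a hit.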
I was looking around for such "documentation" but couldn't find anything on self-modifying code... Have you seen it somewhere?
It's in the P2 silicon doc, but I don't think that's where I originally got it.

Ok, thanks. I know what happened... I tried to use the Edge search, but that doesn't work in Google Docs; you have to use their search.
Still seems like there should be a section titled "Self-Modifying Code" with that note in it...
New p2docs page: https://p2docs.github.io/errata.html
Featuring all the favorites!
Good to have all this in the one place. Nice work.
For the unhealthily curious and pedantic: An instruction taking more than 2 cycles still performs simultaneous result write / instruction prefetch.
Here demonstrated using RDLUT (3-cycle instruction):
It also really seems to be the case that each chip has its own pattern of which cells are hazardous at what frequency. (Maybe more data is needed.)
Obvious clarification: Branching instructions that take 4 cycles are actually 2-cycle instructions; the extra 2 cycles come from the next instruction, which was already prefetched, being flushed out of the pipeline.
So the behaviour for branches that write registers is that the hazard occurs with the branch target itself. So using CALLPA to call to PA causes a dual-port hazard on PA.
Though this one had some frankly weird quirks (maybe only on the 1 chip I tested?).
Also: LUT sharing hazard. This one is probably easier to run into by accident (when not porting P1 self-modifying code):
EDIT: I THINK THIS ONE IS INCORRECT AND JUST REACTS TO STALE DATA FROM THE LOADER
Didn't take into account that the debugger will delay Cog 1's startup by so long (more than the 8000 cycle waitx)
Ok, further experiments have been unable to confirm the existence of a LUT sharing hazard. I wonder if @cgracey worked around that one in particular.
EDIT: Messing with LUTexec code with sharing has also not revealed any hazards