Hardware oddity: Dual-Port Hazard — Parallax Forums

Wuerfel_21 Posts: 5,362
edited 2025-05-20 19:15 in Propeller 2

This is something I noticed a long while ago, but here's a proper demo:
Reading a dual-port RAM cell on the same cycle it is written returns an indeterminate value.
(The stored value is fine; it's just a momentary glitch on the read.)

CON
_CLKFREQ = 10_000_000 ' Higher speeds are more susceptible
HAZARD_CELL = $004 ' Which cogRAM cell to test - can't go lower than 3 with this

DAT
              org
              long 0[HAZARD_CELL-3]
.outer
              rep @.inner,testlen
              xor .hazard,#31 ' ---\
              nop             '    | XOR result written on same cycle as BITH opcode fetch
.hazard       bith val,#0  ' <-----/
.inner
              cmp val,expect wz
        if_nz jmp #.cought
              add loopctr,#1
              test loopctr,##1023 wz
        if_nz jmp #.outer
              debug("Nothing yet... ",udec(loopctr))
              jmp #.outer

.cought
              debug("Hazard caught! ",ubin_long(val),udec(loopctr),uhex_long(.hazard))
              jmp #$


val           long 0
expect        long $80000001
loopctr       long 0
testlen       long 1024

              fit 496

You may need to try different HAZARD_CELL values or increase _CLKFREQ to get a hit.
It should also be possible to reproduce this with LUT RAM, but since only the streamer and LUT sharing between paired cogs use the second LUT port, it's harder to run into there.
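
For comparison, here is an untested control variant of the demo above, sketched on the assumption that the documented P2 rule for self-modifying code (discussed in the comments: two instructions between the write and execution of the modified instruction) holds. The only change is a second NOP, with the padding reduced to HAZARD_CELL-4 so that .hazard still lands on the chosen cell. If the rule holds, the XOR result is written before .hazard is fetched, so this loop should only ever print the "Nothing yet..." progress lines. The .caught label and its message are my own naming, not part of the original demo.

```spin2
CON
_CLKFREQ = 10_000_000 ' same clock as the hazard demo
HAZARD_CELL = $004    ' cogRAM cell holding the patched instruction; can't go lower than 4 here

DAT
              org
              long 0[HAZARD_CELL-4]   ' pad so that .hazard lands on HAZARD_CELL
.outer
              rep @.inner,testlen
              xor .hazard,#31         ' patch the S field of the BITH below (#0 <-> #31)
              nop                     ' two spacers: the XOR result should be written
              nop                     ' before .hazard is fetched (documented rule)
.hazard       bith val,#0             ' alternately sets bit 0 and bit 31 of val
.inner
              cmp val,expect wz       ' any stray bit means the patched opcode was misread
        if_nz jmp #.caught
              add loopctr,#1
              test loopctr,##1023 wz
        if_nz jmp #.outer
              debug("Nothing yet... ",udec(loopctr))
              jmp #.outer

.caught
              debug("Unexpected value: ",ubin_long(val),udec(loopctr),uhex_long(.hazard))
              jmp #$

val           long 0
expect        long $80000001
loopctr       long 0
testlen       long 1024

              fit 496
```

Removing one NOP turns this back into the hazard case, and removing both should make BITH execute the old, unpatched opcode each time.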

Comments

  • Rayman Posts: 15,223

    Guess one needs two NOPs with self-modifying code to be safe?

  • Wuerfel_21 Posts: 5,362
    edited 2025-05-20 16:29

    @Rayman said:
    Guess one needs two NOPs with self-modifying code to be safe?

    Yes, but that's documented and (I hope) well known.
    (Most of the time it will give you the old value, so there'd be a bug anyway.)

  • Rayman Posts: 15,223

    I was looking around for such "documentation" but couldn't find anything on self-modifying code... Have you seen it somewhere?

  • @Rayman said:
    I was looking around for such "documentation" but couldn't find anything on self-modifying code... Have you seen it somewhere?

    P2 silicon doc, though I don't think that's where I originally got it from.

  • Rayman Posts: 15,223

    OK, thanks. I know what happened... I tried to use Edge's built-in search, but that doesn't work in Google Docs; you have to use its own search :(

    Still seems like there should be a section titled "Self-Modifying Code" with that note in it...

  • New p2docs page: https://p2docs.github.io/errata.html
    Featuring all the favorites!

  • rogloh Posts: 5,911

    @Wuerfel_21 said:
    New p2docs page: https://p2docs.github.io/errata.html
    Featuring all the favorites!

    Good to have all this in the one place. Nice work.

  • For the unhealthily curious and pedantic: an instruction taking more than 2 cycles still performs its result write on the same cycle as an instruction prefetch.

    Here demonstrated using RDLUT (3-cycle instruction):

    CON
    _CLKFREQ = 320_000_000 ' Higher speeds are more susceptible
    HAZARD_CELL = $005 ' Which cogRAM cell to test - can't go lower than 4 with this
    
    DAT
                  org
                  long 0[HAZARD_CELL-4]
                  call #.setup
    .outer
                  rep @.inner,testlen
                  rdlut .hazard,ptra++ ' --\
                  nop               '      | RDLUT result written on same cycle as BITH opcode fetch (?)
    .hazard       bith val,#0  ' <--------/
    .inner
                  cmp val,expect wz
            if_nz jmp #.cought
                  add loopctr,#1
                  test loopctr,##1023 wz
            if_nz jmp #.outer
                  debug("Nothing yet... ",udec(loopctr),udec(#HAZARD_CELL))
                  jmp #.outer
    
    .cought
                  debug("Hazard caught! ",ubin_long(val),udec(loopctr),uhex_long(.hazard),udec(#HAZARD_CELL))
                  jmp #$
    
    .setup
                  mov tmp,.hazard
                  xor tmp,#31
                  mov ptra,#0
                  rep #2,#256
                  wrlut .hazard,ptra++
                  wrlut tmp,ptra++
                  ret
    
    
    val           long 0
    expect        long $80000001
    loopctr       long 0
    testlen       long 1024
    tmp           res 1
    
                  fit 496
    

    It also really seems to be the case that each chip has its own pattern of which cells are hazardous at which frequencies (though more data may be needed).

  • Obvious clarification: branching instructions that take 4 cycles are actually 2-cycle instructions; the extra 2 cycles come from the already-prefetched next instruction being flushed out of the pipeline.
    So for branches that write a register, the hazard occurs on the branch target itself: using CALLPA to call to PA causes a dual-port hazard on PA.

    Though this one had some frankly weird quirks (maybe only on the one chip I tested?).

    CON
    _CLKFREQ = 200_000_000 ' Higher speeds are more susceptible
    
    
    DAT
                  org
                  mov pa,.ins1
                  mov val,expect ' < this is load-bearing for some reason
                                 ' since sometimes it only seems to execute .ins1
    .outer
                  rep @.inner,testlen
                  callpa .ins1,hazard_loc
                  callpa .ins2,hazard_loc
    .inner
                  cmp val,expect wz
            if_nz jmp #.cought
                  add loopctr,#1
                  test loopctr,##1023 wz
            if_nz jmp #.outer
                  debug("Nothing yet... ",udec(loopctr))
                  jmp #.outer
    
    .cought
                  debug("Hazard caught! ",ubin_long(val),udec(loopctr),uhex_long(pa))
                  jmp #$
    
    .ins1   _ret_ bith val,#0
    .ins2   _ret_ bith val,#31
    hazard_loc    long pa
    
    val           long 0
    expect        long $80000001
    loopctr       long 0
    testlen       long 1024
    
                  fit 496
    
  • Wuerfel_21 Posts: 5,362
    edited 2025-05-28 22:57

    Also: LUT sharing hazard. This one is probably easier to run into by accident, even when you're not porting P1 self-modifying code:

    CON
    _CLKFREQ = 200_000_000 ' Higher speeds are more susceptible
    HAZARD_CELL = $020 ' Which LUTRAM cell to test
    
    DAT
                  org
                  coginit #1|COGEXEC, ##@other_entry
                  setluts #1 ' <- receiving cog needs to enable
                  waitx ##8000
    .outer
                  mov tmp,testlen
    .inner
                  rdlut val,hzc_ours
                  cmp val,ones_ours wz
            if_nz cmp val,zero_ours wz
            if_z  djnz tmp,#.inner ' loop length is 11, also prime
    
            if_nz jmp #.cought
                  add loopctr,#1
                  test loopctr,##1023 wz
            if_nz jmp #.outer
                  debug("Nothing yet... ",udec(loopctr),udec(#HAZARD_CELL))
                  jmp #.outer
    
    .cought
                  debug("Hazard caught! ",ubin_long(val),udec(loopctr),udec(#HAZARD_CELL))
                  jmp #$
    
    val           long 0
    ones_ours     long -1
    zero_ours     long 0
    loopctr       long 0
    testlen       long 1024
    hzc_ours      long HAZARD_CELL
    tmp           res 1
    
                  fit 496
    
                  org 0
    other_entry
    
                  rep @.wrloop,#0
                  wrlut ones_other,hzc_other
                  wrlut zero_other,hzc_other
                  waitx #1 ' so loop is 7 cycles, nice prime number
    .wrloop
                  jmp #other_entry
    
    ones_other    long -1
    zero_other    long 0
    hzc_other     long HAZARD_CELL
    
    

    EDIT: I THINK THIS ONE IS INCORRECT AND JUST REACTS TO STALE DATA FROM THE LOADER.
    I didn't take into account that the debugger delays Cog 1's startup for a long time (longer than the 8000-cycle WAITX).

  • Wuerfel_21 Posts: 5,362
    edited 2025-05-28 23:49

    Ok, further experiments have been unable to confirm the existence of a LUT sharing hazard. I wonder if @cgracey worked around that one in particular.

    EDIT: Messing with LUTexec code while LUT sharing is enabled has also not revealed any hazards.
