flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 131 — Parallax Forums

flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler


Comments

  • __deets__ Posts: 216
    edited 2026-02-08 19:21

    Ok so I found the culprit. Years ago I wrote a program that opens the serial port and just prints out what it reads. I call it stail, for serial tail. It detects when the Propeller is being programmed and shuts down the serial port so that loadp2 can do its work, then waits a second and tries to re-open the port.

    And it appears that its behavior has changed, possibly because of my new underlying Linux or something. After fiddling with it, I no longer seem to suffer from this stray reset. And guess what, the not-working.spin2 happens to work.

    DIEZ 2026-02-01: sx1268_init:0
    sx1268-send-test: 0
    Lora1: sx1268: start send test.
    
    Lora1: sx1268: chip is busy.
    
    Lora1: sx1268: set standby failed.
    
    Lora1: sx1268: chip is busy.
    
    sx1268-send-test:failed
    
    

    So more investigation needed.

  • I just tried this:

    VAR
      long  position
      long  timeStamp
    
    PUB GetPosition (): pos | p, t, ct, dt, v, h
      p:= @position
      ORG
            setq    #1              ' atomic read of p+t
            rdlong  p,p
      END
      ct:= getct ()
    ...
    

    position and timeStamp are written by another cog. I assumed that p and t would be consecutive registers, which are used later in another inline assembler section. I think Chip used tricks like this a lot in P1 Spin, relying on the order of stack parameters and local variables in memory. But in P1 Spin those were in hub RAM, not cog RAM. The problem is that FlexSpin assigns registers in the order the variables appear in the code of the function, not in the order of the local variable list in the header. So _var01 is allocated to p and _var02 to ct instead of t.

    I know this is not a bug. It was never promised to work, just my optimistic assumption. o:) But is there a way to make it work? If I do

    PUB GetPosition (): pos | p, t, ct, dt, v, h
      p:= position
      t:= timeStamp
    ...
    

    instead my code works but, of course, the access to p and t is not atomic. SETQ + RDLONG is a nice way to avoid locks and delays.

  • @ManAtWork said:
    But is there a way to make it work?

    Easiest would be to add {++opt(!fast-inline-asm)} between PUB/PRI and the function name. That'll force it to do the whole interpreter-style variable copy (slow and nasty) for ORG/END sections.

    But I think if the variable you read into is a quad, array or structure, it will always place that in consecutive registers, so try that first.

  • evanh Posts: 17,142
    edited 2026-03-04 23:11

    If those locals are used exclusively by the Pasm, then move them into a group of labelled RES lines at the end of the ORG section.

    VAR
      long  position
      long  timeStamp
    
    PUB GetPosition (): pos | ct, dt, v, h
      ct:= @position
      ORG
            setq    #1              ' atomic read of p+t
            rdlong  p,ct
            ...
            ret
    
    p       res 1
    t       res 1
      END
      ct:= getct ()
    
  • evanh Posts: 17,142
    edited 2026-03-05 07:46

    Eric,
    Looks like a regression in the includes at some point. The newer _sdmm_open() base function, used by _vfs_open_sdcardx(), for starting up Flexspin's built-in SD driver is not declared anywhere now. I've previously been directly using this function due to its compatibility with the external driver plug-in mechanism. To get it in scope now, I place the following line in my top source file:

    vfs_file_t *_sdmm_open(int pclk, int pss, int pdi, int pdo) _IMPL("filesys/block/sdmm_vfs.c");
    
  • @Wuerfel_21 said:
    But I think if the variable you read into is a quad, array or structure, it will always place that in consecutive registers, so try that first.

    I didn't know that I can declare local array variables, so far. But I tried and it seems to work. :)

    PUB GetPosition (): pos | p[2], t, ct, dt, v, h
      p:= @position
      ORG
            setq    #1              ' atomic read of p+t
            rdlong  p,p
      END
      t:= p[1]
      ct:= getct ()
    

    @evanh Your idea is also good but unfortunately doesn't work in my case. I have a second inline assembler section after some calculations in Spin2, and I couldn't access the p and t res 1 variables from the second section. Ok, it would work with some additional copy instructions, but that is way more complicated than the solution above. But thanks, anyway.

  • evanh Posts: 17,142
    edited 2026-03-05 11:01

    I'd never have two Pasm sections split. That's double dipping on the Fcache overheads. I sure hope there's a damn good reason not to just merge them.

  • @evanh said:
    I'd never have two Pasm sections split. That's double dipping on the Fcache overheads. I sure hope there's a damn good reason not to just merge them.

    Yeah, you're absolutely right. I could translate the Spin code between the two assembler sections to assembler, too. I kept it because it was easier to debug and also for historical reasons. This was a P1 program, once. I've ported it to the P2 and added the 64 bit arithmetic later.

    PUB GetPosition (): pos | p[2], t, ct, dt, v, h
      p:= @position
      ORG
            setq    #1              ' atomic read of p+t
            rdlong  p,p
      END
      t:= p[1]
      ct:= getct ()
      dt:= ct - lastTime
      if p <> lastPos       ' position changed?
        last2Pos:= lastPos
        last2Time:= lastTime
        lastPos:= p
        lastTime:= t
      elseif dt > maxPeriod ' < min. speed
        last2Pos:= p                ' force v=0
        lastTime:= ct - maxPeriod
        last2Time:= lastTime - maxPeriod    ' avoid timer overflow
      else
        lastTime:= t ' update in the case dithering changed t
      ' v:= (lastPos - last2Pos) * cycleTime / (lastTime - last2Time)
      ' pos:= lastPos + v*dt / cycleTime
      p:= lastPos - last2pos
      t:= lastTime - last2Time
      ct:= cycleTime
      ORG ' use 64 bit arithmetic because v*dt or p*cycleTime could be >2^31
            abs     p wc
            qmul    p,ct ' (lastPos - last2Pos) * cycleTime
            getqx   p
            getqy   h
            setq    h
            qdiv    p,t             ' / (lastTime - last2Time)
            getqx   v
            qmul    v,dt            ' * dt
            getqx   v
            getqy   h
            setq    h
            qdiv    v,ct   ' / cycleTime
            getqx   p
            negc    p
      END
      pos:= lastPos + p
    
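As a cross-check of what the second ORG/END block computes, here is a minimal host-side C sketch of the same arithmetic with 64-bit intermediates. The function name and parameters are hypothetical (the forum code works on object variables such as lastPos and last2Pos). It assumes both divisors are nonzero and that each quotient fits in 32 bits, since GETQX returns a 32-bit result.

```c
#include <stdint.h>

/* Hypothetical host-side model of the second ORG/END block.
 * dp        = lastPos - last2Pos
 * dtSpan    = lastTime - last2Time  (assumed nonzero)
 * dt        = ct - lastTime
 * cycleTime = scaling constant      (assumed nonzero)
 * Returns the value that the Spin2 code adds to lastPos. */
int32_t extrapolate_delta(int32_t dp, uint32_t dtSpan,
                          uint32_t dt, uint32_t cycleTime)
{
    /* abs p wc: remember the sign, continue with the magnitude */
    int negative = dp < 0;
    uint64_t mag = negative ? (uint64_t)(-(int64_t)dp) : (uint64_t)dp;

    /* qmul p,ct / setq h / qdiv p,t: 64-bit product, then 64/32 divide */
    uint32_t v = (uint32_t)((mag * cycleTime) / dtSpan);

    /* qmul v,dt / setq h / qdiv v,ct */
    uint32_t d = (uint32_t)(((uint64_t)v * dt) / cycleTime);

    /* negc p: restore the sign (unsigned negate, then reinterpret) */
    return (int32_t)(negative ? (0u - d) : d);
}
```

For example, extrapolate_delta(100, 1000, 500, 1000) returns 50: a change of 100 counts over 1000 ticks, extrapolated 500 ticks further.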
  • ersmith Posts: 6,279

    @evanh said:
    Eric,
    Looks like a regression in the includes at some point. The newer _sdmm_open() base function, used by _vfs_open_sdcardx(), for starting up Flexspin's built-in SD driver is not declared anywhere now. I've previously been directly using this function due to its compatibility with the external driver plug-in mechanism. To get it in scope now, I place the following line in my top source file:

    vfs_file_t *_sdmm_open(int pclk, int pss, int pdi, int pdo) _IMPL("filesys/block/sdmm_vfs.c");
    

    I don't think that function was ever declared in header files, it was probably getting dragged in as a side effect of something else. Anyway, I've added an explicit declaration now. Thanks for catching this.

  • ersmith Posts: 6,279

    @ManAtWork said:

    @Wuerfel_21 said:
    But I think if the variable you read into is a quad, array or structure, it will always place that in consecutive registers, so try that first.

    I didn't know that I can declare local array variables, so far. But I tried and it seems to work. :)

    Yes, that should work. So should structures, but if you're directly manipulating elements in assembly language using an array might be clearer.

  • @ersmith said:
    I don't think that function was ever declared in header files, it was probably getting dragged in as a side effect of something else. Anyway, I've added an explicit declaration now. Thanks for catching this.

    Remember, I separated out some of the SDMM stuff because it was getting compiled in and bloating binaries even when using an external block driver (SDSD). I think dead code elimination doesn't work well for functions that are referred to by function pointers somewhere.

  • ersmith Posts: 6,279

    @Wuerfel_21 said:

    @ersmith said:
    I don't think that function was ever declared in header files, it was probably getting dragged in as a side effect of something else. Anyway, I've added an explicit declaration now. Thanks for catching this.

    Remember, I separated out some of the SDMM stuff because it was getting compiled in and bloating binaries even when using an external block driver (SDSD). I think dead code elimination doesn't work well for functions that are referred to by function pointers somewhere.

    Right, but I don't think just declaring the function should cause any issues. I looked at your patch that did the re-arrangement, and I don't think sdmm_open was declared before that either (unless I missed something).

  • evanh Posts: 17,142
    .../include/filesys/block/sdmm_vfs.c:60: warning: Redefining function or subroutine _sdmm_open with an incompatible type
    

    I don't think struct vfs was a good choice. :)

  • ersmith Posts: 6,279

    @evanh said:

    .../include/filesys/block/sdmm_vfs.c:60: warning: Redefining function or subroutine _sdmm_open with an incompatible type
    

    I don't think struct vfs was a good choice. :)

    Oops yes, thanks. I've pushed an update.

  • I've just tried to recompile some code I've written in 2022. IIRC it worked back then and didn't throw any error messages. But now I get

    B2I_Axis.c:290: warning: Deleting apparently unused cordic instruction qmul

    void BackpropAsm (AxisBuffEntry* buffer, int wri, int rdi)
    {
        uint32_t dlo = 0;
        uint32_t dhi = 0;
        uint32_t v2lo = halfAcc2;
        uint32_t v2hi = 0;
        fixp30_t v1, vMax;
        uint32_t* ptr;
        uint32_t hA = halfAcc;
        uint32_t nA = nomAcc;
        uint32_t mlo, mhi;
        __asm{
    .loop
            decmod wri,#AXIS_BUFFER_SIZE-1
            mov    ptr,wri
            shl    ptr,#6
            add    ptr,buffer
            add    ptr,#40   // v1= buffer[i].vNom
            rdlong v1,ptr
            mov    vMax,hA
            add    vMax,v1
            qmul   vMax,vMax // vMax² // <- here, line 290
            add    dlo,v1 wc
            addx   dhi,#0
            qmul   v1,nA     // v1 * nomAcc
            add    ptr,#8
            setq   #2-1
            wrlong dlo,ptr   // buffer[i].dDec= dist
            getqx  mlo
            getqy  mhi
            shr    mlo,#30
            mov    v1,mhi
            shl    v1,#2
            shr    mhi,#30
            or     mlo,v1
            cmp    v2lo,mlo wcz // if (v2 > vMax²) break
            cmpx   v2hi,mhi wcz
            getqx  mlo
            getqy  mhi
      if_a  jmp    #.break
            mov    v1,mlo
            shl    mhi,#3
            shl    mlo,#3
            shr    v1,#32-3
            or     mhi,v1
            add    v2lo,mlo wc // v2+= (v1 * nomAcc) * 8
            addx   v2hi,mhi
            cmp    wri,rdi wz
      if_nz jmp    #.loop
    .break
        }
    }
    

    Ok, I can use __asm const. But "improving" the optimization leads to breaking code that has worked, before.
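For readers following the arithmetic in the block above: the shr/shl/or groups are 64-bit shifts built from two 32-bit registers, and the cmp/cmpx pair with if_a is an unsigned 64-bit "greater than". A rough portable C sketch of those three pieces follows. The type and function names are hypothetical and only model the register pairs (mlo/mhi, v2lo/v2hi), they are not part of the original program.

```c
#include <stdint.h>

/* Hypothetical model of a 64-bit value held in two 32-bit registers. */
typedef struct { uint32_t lo, hi; } u64pair;

/* {hi:lo} >> 30, as in: shr mlo,#30 / mov v1,mhi / shl v1,#2 /
 * shr mhi,#30 / or mlo,v1 */
u64pair pair_shr30(u64pair m)
{
    u64pair r;
    r.lo = (m.lo >> 30) | (m.hi << 2);
    r.hi = m.hi >> 30;
    return r;
}

/* {hi:lo} << 3, as in: mov v1,mlo / shl mhi,#3 / shl mlo,#3 /
 * shr v1,#32-3 / or mhi,v1 */
u64pair pair_shl3(u64pair m)
{
    u64pair r;
    r.hi = (m.hi << 3) | (m.lo >> 29);
    r.lo = m.lo << 3;
    return r;
}

/* cmp lo / cmpx hi followed by if_a: unsigned 64-bit a > b */
int pair_above(u64pair a, u64pair b)
{
    return a.hi > b.hi || (a.hi == b.hi && a.lo > b.lo);
}
```

This only models the data flow. It says nothing about the CORDIC timing that the inline block depends on.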

  • @ManAtWork said:
    I've just tried to recompile some code I've written in 2022. IIRC it worked back then and didn't throw any error messages. But now I get

    B2I_Axis.c:290: warning: Deleting apparently unused cordic instruction qmul

    Ok, I can use __asm const. But "improving" the optimization leads to breaking code that has worked, before.

    Pipelined CORDIC operations were never guaranteed to work outside a const/volatile context. There the compiler is free to delete, add, or move instructions, while the semantics of pipelined operations depend on exact timing (i.e. if the result window is missed, an instruction is skipped). This is too difficult to deal with, so CORDIC-related IR transformations are written with the assumption of a single operation in flight at a time. Removing a QMUL when there's no associated GETQX (e.g. has been removed by dead code pass) is necessary for correctness (this was actually causing bugs at some point).

  • @Wuerfel_21 said:
    Removing a QMUL when there's no associated GETQX (e.g. has been removed by dead code pass) is necessary for correctness (this was actually causing bugs at some point).

    Ok, this is true for compiler generated code where optimization and dead code removal makes sense. But if I'm pretty sure that my assembler code is already highly optimized I think it's safer to just use __asm const all the time, right? Or could you imagine a case where this causes problems?

  • evanh Posts: 17,142

    const and volatile are generally a good thing to use when you're crafting it to the hardware. Leaving them out is you saying you're happy to have the optimiser rearrange things.

  • @ManAtWork said:

    @Wuerfel_21 said:
    Removing a QMUL when there's no associated GETQX (e.g. has been removed by dead code pass) is necessary for correctness (this was actually causing bugs at some point).

    Ok, this is true for compiler generated code where optimization and dead code removal makes sense. But if I'm pretty sure that my assembler code is already highly optimized I think it's safer to just use __asm const all the time, right? Or could you imagine a case where this causes problems?

    Basically, for the code you showed, use __asm const or __asm volatile (if you expect it to loop often).

    Using __asm const prevents the compiler from modifying the insides of the block at all (except for register allocation) and can cause missed optimizations around the block, too. If you're just writing a quick couple of instructions that would be hard to express in C in the middle of a function, the plain __asm block is better.
    Random example pulled from a project:

                    #ifdef __propeller2__
                        __asm {
                            sub bc,#1
                            testb d,bc wc
                            bmask bc
                            if_nc sub d,bc
                        }
                    #else
                        bc = 1 << (bc - 1);             /* MSB position */
                        if (!(d & bc)) d -= (bc << 1) - 1;  /* Restore negative value if needed */
                    #endif
    

    A lot of the compiler's builtin functions are also implemented using this. For example the 64 bit compare:

    ' compare signed alo, ahi, return -1, 0, or +1
    pri _int64_cmps(alo, ahi, blo, bhi) : r
      asm
          cmp  alo, blo wc,wz
          cmpsx ahi, bhi wc,wz
    if_z  mov  r, #0
    if_nz negc r, #1
      endasm
    

    If the input values are constants, they can be propagated through the code.
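To make the intended result of that compare concrete, here is a small portable C sketch of the same contract (return -1, 0, or +1 for a signed 64-bit compare given as low/high halves). The function name is hypothetical and this is only a host-side model, not the compiler's actual implementation.

```c
#include <stdint.h>

/* Hypothetical host-side model of the 64-bit signed compare.
 * The high halves carry the sign, so they compare as signed;
 * the low halves only break ties and compare as unsigned. */
int int64_cmps_model(uint32_t alo, int32_t ahi, uint32_t blo, int32_t bhi)
{
    if (ahi != bhi)
        return (ahi < bhi) ? -1 : 1;
    if (alo != blo)
        return (alo < blo) ? -1 : 1;
    return 0;
}
```

With constant arguments a compiler can fold a call like this down to a single constant, which is the kind of propagation the plain asm block allows.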

    To recap:

    Spin syntax  | C syntax          | Does what
    ASM / ENDASM | __asm {}          | Treated the same as compiler-generated instructions
    ORGH / END   | __asm const {}    | Instructions cannot be modified by the optimizer (thus typically assemble as-is into hub RAM)
    ORG / END    | __asm volatile {} | Block is copied into cog RAM before executing; cannot be modified by the optimizer