flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 124 — Parallax Forums




  • ersmith Posts: 6,079

    @evanh Yeah, the inline assembly code really isn't equipped to deal with _RET_ modifiers in any sensible way. Probably it should just throw an error on seeing them rather than trying to insert the RET instruction.

  • evanh Posts: 16,059

    @evanh said:
    ... But I've hit a niggle that is probably something Chip needs to sort out ...

    Huh, LOL, I tried to make an example for Chip to work with, only to find that Pnut handles everything with no problem.
    Output from the Pnut v46 compiled example:

    Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
    Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000
    

    Errors (including the undefined symbols in the debug() calls) when compiling with Flexspin Version 7.0.0-beta2-v7.0.0-beta2-34-gcbe5f1a4 Compiled on: Jan 3 2025

    reg_ref_test.spin2:36: error: Undefined symbol p_clk
    reg_ref_test.spin2:36: error: Undefined symbol p_dat
    reg_ref_test.spin2:36: error: Undefined symbol m_lead
    reg_ref_test.spin2:36: error: Undefined symbol v_nco
    reg_ref_test.spin2:36: error: Undefined symbol m_se2
    reg_ref_test.spin2:36: error: Undefined symbol m_dat
    reg_ref_test.spin2:36: error: Undefined symbol m_crc
    reg_ref_test.spin2:45: error: Undefined symbol p_clk
    reg_ref_test.spin2:45: error: Undefined symbol p_dat
    reg_ref_test.spin2:45: error: Undefined symbol m_lead
    reg_ref_test.spin2:45: error: Undefined symbol v_nco
    reg_ref_test.spin2:45: error: Undefined symbol m_se2
    reg_ref_test.spin2:45: error: Undefined symbol m_dat
    reg_ref_test.spin2:45: error: Undefined symbol m_crc
    reg_ref_test.spin2:49: error: data item is not constant
    
  • ersmith Posts: 6,079

    @evanh said:

    @evanh said:
    ... But I've hit a niggle that is probably something Chip needs to sort out ...

    Huh, LOL, I tried to make an example for Chip to work with, only to find that Pnut handles everything with no problem.
    Output from the Pnut v46 compiled example:

    Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
    Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000
    

    Unfortunately this is where my model of inline assembly (constructed prior to the existence of Spin2) breaks down. I had envisioned it being used for short snippets of inline code, and therefore wanted to make it fast to call and tightly integrated with the rest of the compiler. This made sense because the compiled code is usually almost as fast as inline assembly would be anyway, so asm is mostly needed only for things that are hard to express in the source language. But it meant having to convert the asm into the internal representation used by the compiler, which is somewhat restricted compared to general assembly.

    In PNut, by contrast, the inline assembly blocks are essentially mini-DAT sections. Setting them up and running them is relatively expensive. The interpreter is slow enough that the trade-off is worth it, and people are using inline assembly more and more.

    I already have something like the mini-DAT sections implemented for nucode (-2nu output), and I can probably port that over to the regular inline assembly. It'll be much slower to get into and out of these blocks, because we'll have to move the data in variables to and from temporary registers. On the plus side, the restriction on using variables in memory will be lifted. Is it worth it? I guess PNut compatibility is paramount, so I'll probably go ahead and do it, but the generated code will be ugly. For backwards compatibility (and performance), the current system will continue to be used for plain unadorned asm blocks.

  • evanh Posts: 16,059
    edited 2025-01-05 09:23

    But the DAT-like "variables" aren't scoped outside of register space. They can't be accessed as C locals, and I don't need them to be. I'm struggling to see why a drastic change would be needed.

    All I'm looking for here is for the compiler to accept the symbol label as a register number when filling those DAT-like LONGs. Specifically, a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

    EDIT: I guess you're looking at the other errors with the debug() function. I didn't know they would be a headache. Forget them then. I don't need that to work.

    PS: I'm looking at a C solution here too. The Spin example was done for testing Pnut is all.

    In the listing below, the ptr symbol is currently set from two bit-shifted constants, but I'd like to replace the 0x1f6 with the PA register symbol and, importantly, COGCRCBUFF will become a symbolic reference to one of the later RES symbols. Both are cog RAM addresses and get used by an ALTI instruction.

    clocks      long 1 + 512 * 2    // start-bit + nibble count
    lnco        long 0x8000_0000    // tells RDFAST/WRFAST not to wait for FIFO ready
    endbit      long 0xffff_ffff
    poly        long 0x8408    // CRC-16-CCITT reversed (x16 + x12 + x5 + x0: 0x1021, even parity)
    ptr         long 0x1f6<<19 | COGCRCBUFF<<9    // register PA for ALTI result substitution, and CRC buffer address
    crc3        long 0
    crc2        long 0
    crc1        long 0
    crc0        long 0
    
  • ersmith Posts: 6,079

    @evanh said:
    But the DAT-like "variables" aren't scoped outside of register space. They can't be accessed as C locals, and I don't need them to be. I'm struggling to see why a drastic change would be needed.

    All I'm looking for here is for the compiler to accept the symbol label as a register number when filling those DAT-like LONGs. Specifically, a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

    Yeah, that's the problem: the current inline assembly parser does not know any symbol values, and in general cannot, because it runs before the optimizer and "linker". Of course, for inline assembly that goes into COG memory we can in principle calculate them, but that's adding another patch onto an already creaky structure. The whole LONG/RES support is already something that's kind of bolted on to the compiler's internal representation (IR). That's also the reason that _RET_ prefixes aren't supported: the compiler doesn't generate them, so there's no way to represent them in the IR.

    I think the right solution is to bite the bullet and do a full assembly of the contents of ORG/END to binary, just as though it's inside the DAT section, rather than a partial assembly to IR that's then passed to later compiler passes. I've already got that for the P2 bytecode backend. The only tricky part I have left is accessing the local variables inside assembly; in bytecode the variables are always in HUB memory so we just copy from HUB to a designated part of COG before the inline asm runs, and back again after. We can do something similar for assembly output too. This will also apply to C __asm volatile (but not regular __asm, which will continue to be parsed with the existing code).

  • Wuerfel_21

    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    Unrelatedly, ORGH/END seems to have been broken at some point.

  • ersmith Posts: 6,079

    @Wuerfel_21 said:
    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

    Unrelatedly, ORGH/END seems to have been broken at some point.

    Huh, that's funny. I'll look into it.

  • @ersmith said:

    @Wuerfel_21 said:
    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

    I'd do it the opposite way: the current approach is good enough for most, if not all, inline ASM that isn't super cracked, and it's more efficient. You'd regress all the code that worked fine before by making it slower. So make it an opt-in flag. The way attributes are given doesn't disturb PNut's parsing, so this is fine in all cases.

  • evanh Posts: 16,059

    A switch on the function definition would work fine for me. These big chunks of assembly definitely have their own wrapper each.

  • ersmith Posts: 6,079
    edited 2025-01-06 00:16

    OK, for now the current method of inline assembly is the default for -O1 and above. It's in an optimization flag called -Ofast-inline-asm, so it can be turned off globally with -Ono-fast-inline-asm or in a specific function by placing {++opt(no-fast-inline-asm)} after the PUB or PRI (and similarly in C with an __attribute__(opt(no-fast-inline-asm)) on the function declaration).

  • evanh Posts: 16,059
    edited 2025-01-06 03:12

    Oh noes ... where's the assembly gone?

        callpa  #(@LR__0031-@LR__0030)>>2,fcache_load_ptr_
    LR__0030
        org 0
        byte    $03, $00, $90, $fe, $28, $0c, $64, $fd, $f6, $17, $00, $fb, $0b, $98, $03, $f6
        byte    $0c, $9a, $03, $f6, $0d, $9c, $03, $f6, $0e, $9e, $03, $f6, $0f, $a0, $03, $f6
        byte    $10, $a2, $03, $f6, $11, $a4, $03, $f6, $2d, $00, $64, $fd
        fit 256
    LR__0031
    

    PS: It doesn't work either. I should be getting

       clkfreq = 200000000   clkmode = 0x10009fb
      Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
     t1 = 00000056
     t2 = 00000057
     t3 = 00000000
     t4 = 00000000
     t5 = 00000058
     t6 = 00000059
     t7 = 00000060
    

    but instead get this

       clkfreq = 200000000   clkmode = 0x10009fb
      Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
     t1 = fd63ec01
     t2 = ff000002
     t3 = fce7ec04
     t4 = 00000000
     t5 = 0bebc200
     t6 = 010009fb
     t7 = 00000000
    