flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 124 — Parallax Forums


Comments

  • ersmithersmith Posts: 6,089

    @evanh Yeah, the inline assembly code really isn't equipped to deal with _RET_ modifiers in any sensible way. Probably it should just throw an error on seeing them rather than trying to insert the RET instruction.

  • evanhevanh Posts: 16,095

    @evanh said:
    ... But I've hit a niggle that is probably something Chip needs to sort out ...

    Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
    Output from the Pnut v46 compiled example:

    Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
    Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000
    

    Errors, including debug's undefines, compiling with Flexspin Version 7.0.0-beta2-v7.0.0-beta2-34-gcbe5f1a4 Compiled on: Jan 3 2025

    reg_ref_test.spin2:36: error: Undefined symbol p_clk
    reg_ref_test.spin2:36: error: Undefined symbol p_dat
    reg_ref_test.spin2:36: error: Undefined symbol m_lead
    reg_ref_test.spin2:36: error: Undefined symbol v_nco
    reg_ref_test.spin2:36: error: Undefined symbol m_se2
    reg_ref_test.spin2:36: error: Undefined symbol m_dat
    reg_ref_test.spin2:36: error: Undefined symbol m_crc
    reg_ref_test.spin2:45: error: Undefined symbol p_clk
    reg_ref_test.spin2:45: error: Undefined symbol p_dat
    reg_ref_test.spin2:45: error: Undefined symbol m_lead
    reg_ref_test.spin2:45: error: Undefined symbol v_nco
    reg_ref_test.spin2:45: error: Undefined symbol m_se2
    reg_ref_test.spin2:45: error: Undefined symbol m_dat
    reg_ref_test.spin2:45: error: Undefined symbol m_crc
    reg_ref_test.spin2:49: error: data item is not constant
    
  • ersmithersmith Posts: 6,089

    @evanh said:

    @evanh said:
    ... But I've hit a niggle that is probably something Chip needs to sort out ...

    Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
    Output from the Pnut v46 compiled example:

    Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
    Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000
    

    Unfortunately this is where my model of inline assembly (constructed prior to the existence of Spin2) breaks down. I had envisioned it as being used for short snippets of inline code, and wanted therefore to make it fast to call and tightly integrated with the rest of the compiler. This made sense because the compiled code is usually almost as fast as inline assembly would be anyway, so the asm mostly is needed only for things that are hard to express in the source language. But it meant having to convert it into the internal representation used by the compiler, which is somewhat restricted compared to general assembly.

    In PNut, by contrast, the inline assembly blocks are essentially mini-DAT sections. Setting them up and running them is relatively expensive. The interpreter is slow enough that the trade-off is worth it, and people are using inline assembly more and more.

    I already have something like the mini-DAT sections implemented for nucode (-2nu output) and I can probably port that over to the regular inline assembly. It'll be much slower to get into/out of these blocks, because we'll have to move the data in variables to and from temporary registers. On the plus side that will mean that the restriction on using variables in memory will be lifted. Is it worth it? I guess PNut compatibility is paramount, so I'll probably go ahead and do it, but the generated code will be ugly. For backwards compatibility (and performance) the current system will continue to be used for plain unadorned asm blocks.

  • evanhevanh Posts: 16,095
    edited 2025-01-05 09:23

    But the DAT like "variables" aren't scoped outside of register space though. They can't be, and I don't need them, accessed as C locals. I'm struggling to see why a drastic change would be needed.

    All I'm looking for here is the compiler to accept the symbol label as a register number to fill those DAT like LONGs. Specifically a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

    EDIT: I guess you're looking at the other errors with the debug() function. I didn't know they would be a headache. Forget them then. I don't need that to work.

    PS: I'm looking at a C solution here too. The Spin example was done for testing Pnut is all.

    In the below listing the ptr symbol is currently set with two bit-shifted constants, but I'd like to replace the 0x1f6 with PA register symbol, and, importantly, COGCRCBUFF will become a symbolic reference to one of the later RES symbols. Both are cogRAM addresses and get used by an ALTI instruction.

    clocks      long 1 + 512 * 2    // start-bit + nibble count
    lnco        long 0x8000_0000    // tells RDFAST/WRFAST not to wait for FIFO ready
    endbit      long 0xffff_ffff
    poly        long 0x8408    // CRC-16-CCITT reversed (x16 + x12 + x5 + x0: 0x1021, even parity)
    ptr     long 0x1f6<<19 | COGCRCBUFF<<9    // register PA for ALTI result substitution, and CRC buffer address
    crc3        long 0
    crc2        long 0
    crc1        long 0
    crc0        long 0
    
  • ersmithersmith Posts: 6,089

    @evanh said:
    But the DAT like "variables" aren't scoped outside of register space though. They can't be, and I don't need them, accessed as C locals. I'm struggling to see why a drastic change would be needed.

    All I'm looking for here is the compiler to accept the symbol label as a register number to fill those DAT like LONGs. Specifically a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

    Yeah, that's the problem -- the current inline assembly parser does not know any symbol values, and in general cannot because it runs before the optimizer and "linker". Of course for the case of inline assembly that goes into COG memory we can in principle calculate them, but that's adding another patch onto an already creaky structure. The whole LONG/RES support is already something that's kind of bolted on to the compiler internal representation. That's also the reason that _RET_ prefixes aren't supported: the compiler doesn't generate them, so there's no way to represent them in the IR.

    I think the right solution is to bite the bullet and do a full assembly of the contents of ORG/END to binary, just as though it's inside the DAT section, rather than a partial assembly to IR that's then passed to later compiler passes. I've already got that for the P2 bytecode backend. The only tricky part I have left is accessing the local variables inside assembly; in bytecode the variables are always in HUB memory so we just copy from HUB to a designated part of COG before the inline asm runs, and back again after. We can do something similar for assembly output too. This will also apply to C __asm volatile (but not regular __asm, which will continue to be parsed with the existing code).

  • Wuerfel_21

    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    Unrelatedly ORGH/END seems to have been broken at some point.

  • ersmithersmith Posts: 6,089

    @Wuerfel_21 said:
    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

    Unrelatedly ORGH/END seems to have been broken at some point.

    Huh, that's funny. I'll look into it.

  • @ersmith said:

    @Wuerfel_21 said:
    Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

    We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

    I'd do it the opposite way - the current thing is good enough for most/all inline ASM that isn't super cracked and more efficient. You'd regress all the code that worked fine before to being slower. So make it an opt-in flag. The way attributes are given doesn't disturb PNut's parsing, so this is fine in all cases.

  • evanhevanh Posts: 16,095

    A switch on the function definition would work fine for me. These big chunks of assembly definitely have their own wrapper each.

  • ersmithersmith Posts: 6,089
    edited 2025-01-06 00:16

    OK, for now the current method of inline assembly is the default for -O1 and above. It's in an optimization flag called -Ofast-inline-asm, so it can be turned off globally with -Ono-fast-inline-asm or in a specific function by placing {++opt(no-fast-inline-asm)} after the PUB or PRI (and similarly in C with an __attribute__(opt(no-fast-inline-asm)) on the function declaration).

  • evanhevanh Posts: 16,095
    edited 2025-01-06 03:12

    Oh noes ... where's the assembly gone?

        callpa  #(@LR__0031-@LR__0030)>>2,fcache_load_ptr_
    LR__0030
        org 0
        byte    $03, $00, $90, $fe, $28, $0c, $64, $fd, $f6, $17, $00, $fb, $0b, $98, $03, $f6
        byte    $0c, $9a, $03, $f6, $0d, $9c, $03, $f6, $0e, $9e, $03, $f6, $0f, $a0, $03, $f6
        byte    $10, $a2, $03, $f6, $11, $a4, $03, $f6, $2d, $00, $64, $fd
        fit 256
    LR__0031
    

    PS: It doesn't work either. I should be getting

       clkfreq = 200000000   clkmode = 0x10009fb
      Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
     t1 = 00000056
     t2 = 00000057
     t3 = 00000000
     t4 = 00000000
     t5 = 00000058
     t6 = 00000059
     t7 = 00000060
    

    but instead get this

       clkfreq = 200000000   clkmode = 0x10009fb
      Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
     t1 = fd63ec01
     t2 = ff000002
     t3 = fce7ec04
     t4 = 00000000
     t5 = 0bebc200
     t6 = 010009fb
     t7 = 00000000
    
  • ersmithersmith Posts: 6,089

    It looks like something's going wrong with the calculation of the address of testset in loc pa, #testset. Which isn't surprising, since addresses are not known at that point, but I don't know why there isn't a warning. For now you can work around this by passing in the address in another local variable void *ptr = &testset and then use ptr in place of pa.

  • evanhevanh Posts: 16,095
    edited 2025-01-06 14:31

    Oh, that figures, it is the same with Pnut. And same solution.
    Thanks for looking for me.

    EDIT: Yep, works now. :)

    EDIT2: Now added to the SD card driver in two places and appears to be working flawlessly. The block read and block write low level routines required this to allow automatic packing of the CRC processing buffer tightly against the code sitting in Fcache space. And it also allows the compiler to know the existence of the processing buffer at all.

    So, it should now be able to compile-time warn me if the whole lot doesn't fit ... hmm nope, it compiles without error. Then crashes after reading a block or two ... ah, maybe I need to add a FIT ... right, yeah, that worked.

    Next problem is Fcache size is not known.

  • ersmithersmith Posts: 6,089

    Thanks for testing this @evanh. I've updated the code slightly to fix some bugs (e.g. it was copying things to local memory that it shouldn't have). I also tried to create relocations for unknown HUB addresses. That didn't work, but it did have the happy side effect (which I'd forgotten about) of providing comments in the binary blob showing the original assembly.

  • evanhevanh Posts: 16,095
    edited 2025-01-06 20:16

    I should say, rather, Fcache size is not known when unspecified. Or is there a compiler generated value that could be compared?

    @ersmith said:
    ... but it did have the happy side effect (which I'd forgotten about) of providing comments in the binary blob showing the original assembly.

    Oh, so the hex dump in the .p2asm file is literally that - from an assembled binary file?
    I see the assembly now :)

        callpa  #(@LR__0031-@LR__0030)>>2,fcache_load_ptr_
    LR__0030
        org 0
    '-' 
    '-'         setq #sizeof(testset) / 4 - 1
        byte    $28, $0c, $64, $fd
    '-'         rdlong rp_clk, ptr
        byte    $d3, $19, $00, $fb
    '-' 
    '-'         mov t1, rp_clk
        byte    $0c, $98, $03, $f6
    '-'         mov t2, rp_dat
        byte    $0d, $9a, $03, $f6
    '-'         mov t3, rm_align
        byte    $0e, $9c, $03, $f6
    '-'         mov t4, rv_nco
        byte    $0f, $9e, $03, $f6
    '-'         mov t5, rm_se2
        byte    $10, $a0, $03, $f6
    '-'         mov t6, rm_dat
        byte    $11, $a2, $03, $f6
    '-'         alts altreg, #0
        byte    $00, $16, $94, $f9
    '-'         mov t7, 0-0
        byte    $00, $a4, $03, $f6
    '-'         ret
        byte    $2d, $00, $64, $fd
    '-' 
    '-' altreg long rm_crc
        byte    $12, $00, $00, $00
    '-' rp_clk res 1
    '-' rp_dat res 1
    '-' rm_align res 1
    '-' rv_nco res 1
    '-' rm_se2 res 1
    '-' rm_dat res 1
    '-' rm_crc res 1
        fit 128
    LR__0031
    

    EDIT: Hang-on, there is an auto-generated FIT there. It ain't working though. I only get a compile error when I explicitly put my own FIT 128 in after the large RES 130 I have for the CRC processing buffer.
    Example failure attached.

  • ersmithersmith Posts: 6,089

    @evanh: Yeah, the auto generated fit is inserted after we've already compiled the ORG mini-dat, so it's only testing the size of the binary blob (the bytes from the code and data) and doesn't know about any RES inside. It's better than nothing, but it won't catch RES caused errors.

  • evanhevanh Posts: 16,095
    edited 2025-01-07 19:39

    Back to wanting to know the Fcache size for me to add as a FIT directive then. And even better, an Fcache that can use whatever available cogRAM there is.

  • ersmithersmith Posts: 6,089

    @evanh said:
    Back to wanting to know the Fcache size for me to add as a FIT directive then. And even better, an Fcache that can use whatever available cogRAM there is.

    It's documented to be 256 longs, but for some reason the code was only allocating 128. I've checked in an update to actually provide 256 (which is also what the P2 bytecode interpreter provides) but beware -- it's been 128 for a long time, so I would not be surprised to find that 256 is too big and we have to shrink it.

    I have tried in the past to auto-allocate the amount of fcache, but that is extremely complicated due to all of the things that compete for COG memory, and it's never worked out very well.

  • evanhevanh Posts: 16,095
    edited 2025-01-07 21:36

    Thank you for raising it up again. I don't have to force it on the compile line now. That's also much closer to Pnut's level now too.

    Fair enough about the auto-sizing.

  • evanhevanh Posts: 16,095

    A bonus of the change in assembler, _RET_ is working now. :)

  • evanhevanh Posts: 16,095

    Hmm, Pnut checks for RES allocation size and spits an error if beyond the max allowed. Not much of a need for using FIT there.

  • ersmithersmith Posts: 6,089

    @evanh said:
    Hmm, Pnut checks for RES allocation size and spits an error if beyond the max allowed. Not much of a need for using FIT there.

    I've just added some code so flexspin should do this now too.

  • evanhevanh Posts: 16,095

    Champ! Thank you Eric.
