flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 96 — Parallax Forums



Comments

  • ersmithersmith Posts: 5,333

    @Rayman said:

    @evanh said:
    Ah, that's like Unicode I think.

    I was thinking the same thing... It's essentially the same as UTF-8.

    No, it isn't. In UTF-8 the first byte (if it's not 0-127) holds what's essentially a count of the remaining bytes in its upper bits, and the remaining bytes store 6 bits per byte (their top two bits must be 10). UTF-8 was pretty specifically designed so that ASCII bytes (0-127) never appear in the representation of values above 127, and also so that finding character boundaries is very simple (characters start with a byte whose top bits are either 0 or 11, never 10, and bytes starting with 0 or 11 never appear as anything other than the first byte of a character). The RFVAR representation is much simpler (7 bits per byte, upper bit is 0 when finished).
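
To make the UTF-8 side of that comparison concrete, here is a minimal Python sketch (not from the thread; the function names are made up for illustration). It classifies each byte by its top bits and finds character starts by skipping every byte of the form 10xxxxxx.

```python
def utf8_byte_kind(b):
    """Classify one UTF-8 byte by its top bits (hypothetical helper)."""
    if b < 0x80:
        return "ascii"          # 0xxxxxxx: a complete one-byte character
    if b & 0xC0 == 0x80:
        return "continuation"   # 10xxxxxx: carries 6 payload bits
    return "lead"               # 11xxxxxx: first byte of a multi-byte character


def utf8_char_starts(data):
    """Return the offsets at which characters begin in UTF-8 encoded bytes."""
    return [i for i, b in enumerate(data) if b & 0xC0 != 0x80]


encoded = "a\u00e9\u20ac".encode("utf-8")   # 'a', e-acute, euro sign
print([utf8_byte_kind(b) for b in encoded])
print(utf8_char_starts(encoded))            # [0, 1, 3]
```

Because continuation bytes are always 10xxxxxx, a reader can land anywhere in the stream and resynchronise, which a plain 7-bits-per-byte scheme does not offer.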

    Except this is limited to 24 bits? Why not allow 32-bits like UTF-8?

    I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.

  • TonyB_TonyB_ Posts: 1,922
    edited 2022-06-20 21:38

    @ersmith said:

    @Rayman said:
    Except this is limited to 24 bits? Why not allow 32-bits like UTF-8?

    I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.

    RFVAR/RFVARS limit is 29 bits, as 4th byte can use all eight bits. I think FIFO being long-wide probably prevented a 5th byte and there is always RFLONG if 29 bits aren't enough. At first sight perhaps not very useful, RFVAR and RFVARS are actually a handy pair of space-saving instructions that take two cycles whether 1, 2, 3 or 4 bytes are read.
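
As a rough illustration of the format described above, here is a Python sketch of an unsigned RFVAR-style encoder and decoder. It is not taken from the thread or from any Parallax tool: the function names are hypothetical, and it assumes the first byte holds the least-significant 7 bits, which should be checked against the P2 documentation. The first three bytes carry 7 bits plus a "more follows" flag in bit 7, and the fourth byte contributes all eight bits, giving 7 + 7 + 7 + 8 = 29 bits.

```python
def rfvar_encode(value):
    """Encode an unsigned value of up to 29 bits as 1 to 4 bytes."""
    if not 0 <= value < (1 << 29):
        raise ValueError("value does not fit in 29 bits")
    out = bytearray()
    for _ in range(3):
        low = value & 0x7F
        value >>= 7
        if value == 0:
            out.append(low)         # bit 7 clear: this is the last byte
            return bytes(out)
        out.append(low | 0x80)      # bit 7 set: another byte follows
    out.append(value)               # 4th byte: all eight bits are payload
    return bytes(out)


def rfvar_decode(data, pos=0):
    """Decode one value starting at pos; return (value, next position)."""
    value = 0
    for i in range(3):
        b = data[pos + i]
        value |= (b & 0x7F) << (7 * i)
        if b & 0x80 == 0:
            return value, pos + i + 1
    value |= data[pos + 3] << 21    # no continuation flag in the 4th byte
    return value, pos + 4


print(rfvar_encode(128).hex())           # 8001
print(rfvar_decode(bytes([0xFF] * 4)))   # (536870911, 4)
```

Note that the value 0xE9 encodes here as the two bytes E9 01, whereas UTF-8 encodes the same code point as C3 A9, so the two schemes are not interchangeable.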

  • RaymanRayman Posts: 12,650

    @ersmith, @TonyB_ Thanks for explaining rfvar/rfvars. I wonder now if it'd be better to just copy UTF-8... But, I guess it's a done deal...

  • msrobotsmsrobots Posts: 3,555

    @ersmith,

    but @@@data1 does what I think and puts in absolute addresses, doesn't it? I am quite annoyed with relative addressing.

    mike

  • ersmithersmith Posts: 5,333

    @msrobots said:
    @ersmith,

    but @@@data1 does what I think and puts in absolute addresses, doesn't it? I am quite annoyed with relative addressing.

    Yes, @@@data1 is always absolute; but that's only available in flexspin, I don't think pnut/proptool support it.

  • roglohrogloh Posts: 4,217
    edited 2022-06-23 05:16

    Is there any way to prevent the flexspin optimizer from removing inline PASM register assignments on an individual method basis? Or do I have to resort to removing all optimizations for the whole project with the "-O0" option?

    In my inline PASM code I'm writing to sequential local registers and then performing a setq burst transfer to write all of these registers into HUB. I don't explicitly name the consecutive registers after the first one written by the burst transfer. The flexspin optimizer then removes the instructions that assign to these registers, because it believes the registers are never read again. That's bad, as they are read indirectly by the setq wrlong transfer, and I get corrupted output. I've tried adding some dummy register accesses at the end of my loop that touch all of these registers; sometimes that tricks the optimizer into believing the registers are used so it won't remove the code, but it's a bit hit and miss. I'm thinking it's safer to just disable optimizations in these cases, but ideally on a per-method basis. Is there a way?

    Here's an example of the problem...

    Source is this:

    transparent setq    #7
                rdlong  pix1, srcptr
                add     srcptr, #32
    setpixels   xor     char, #$ff
                skip    char
                mov     pix1, fg
                mov     pix2, fg
                mov     pix3, fg
                mov     pix4, fg
                mov     pix5, fg
                mov     pix6, fg
                mov     pix7, fg
                mov     pix8, fg
                setq    #7
                wrlong  pix1, destptr
    

    and the bad code generated is this, which is missing several intermediate instructions:

    00cb4     28 0E 64 FD |     setq    #7
    00cb8     FB FE 01 FB |     rdlong  local24, local20
    00cbc     20 F6 05 F1 |     add local20, #32
    00cc0                 | LR__0015
    00cc0     FF FA 65 F5 |     xor local22, #255
    00cc4     31 FA 61 FD |     skip    local22
    00cc8     ED FE 01 F6 |     mov local24, local06
    00ccc     ED 00 02 F6 |     mov local25, local06
    00cd0     28 0E 64 FD |     setq    #7
    00cd4     FC FE 61 FC |     wrlong  local24, local21
    

    It should be this (which is what happens when the optimizer is globally disabled):

    00e28     28 0E 64 FD |     setq    #7
    00e2c     9D 42 01 FB |     rdlong  local25, local21
    00e30     20 3A 05 F1 |     add local21, #32
    00e34                 | LR__0020
    00e34     FF 3E 65 F5 |     xor local23, #255
    00e38     31 3E 61 FD |     skip    local23
    00e3c     8F 42 01 F6 |     mov local25, local07
    00e40     8F 44 01 F6 |     mov local26, local07
    00e44     8F 46 01 F6 |     mov local27, local07
    00e48     8F 48 01 F6 |     mov local28, local07
    00e4c     8F 4A 01 F6 |     mov local29, local07
    00e50     8F 4C 01 F6 |     mov local30, local07
    00e54     8F 4E 01 F6 |     mov local31, local07
    00e58     8F 50 01 F6 |     mov local32, local07
    00e5c     28 0E 64 FD |     setq    #7
    00e60     9E 42 61 FC |     wrlong  local25, local22
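
For anyone wondering why an optimizer would drop those moves, here is a toy Python model of backward dead-store elimination. It is only a sketch under loud assumptions: it is not flexspin's actual algorithm, the instruction tuples and function names are invented for illustration, and it ignores the skip instruction entirely. The point is just that a pass which only looks at the explicitly named operand of wrlong sees pix1 as the sole register read, so every other store looks dead.

```python
PIX = [f"pix{i}" for i in range(1, 9)]

# Toy instruction: (opcode, register written or None, registers named, burst count)
PROGRAM = [("mov", p, ["fg"], 0) for p in PIX] + [
    ("wrlong", None, ["pix1", "destptr"], 8),   # models setq #7 + wrlong pix1, destptr
]


def registers_read(instr, burst_aware):
    """Registers an instruction reads; optionally include the implicit burst range."""
    _, _, named, burst = instr
    regs = set(named)
    if burst_aware and burst:
        start = PIX.index(named[0])
        regs.update(PIX[start:start + burst])
    return regs


def eliminate_dead_stores(program, burst_aware):
    """Walk backwards and drop writes to registers that are not live afterwards."""
    live = set()
    kept = []
    for instr in reversed(program):
        dest = instr[1]
        if dest is not None and dest not in live:
            continue                 # nothing reads this value later: drop it
        if dest is not None:
            live.discard(dest)       # the write satisfies the pending read
        live |= registers_read(instr, burst_aware)
        kept.append(instr)
    kept.reverse()
    return kept


naive = eliminate_dead_stores(PROGRAM, burst_aware=False)
aware = eliminate_dead_stores(PROGRAM, burst_aware=True)
print([i[1] for i in naive if i[0] == "mov"])   # ['pix1']
print(len(aware))                               # 9
```

In this model the naive pass keeps only the pix1 move, while the burst-aware pass keeps all eight, which is the behaviour the unoptimized listing shows.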
    
  • roglohrogloh Posts: 4,217

    Ok, found my answer I think. I might need to use this:

    PUB {++opt(!regs)} methodName(args...)

    My code seems to work now.

  • evanhevanh Posts: 13,106

    __asm volatile {
    }

  • evanhevanh Posts: 13,106
    edited 2022-06-23 05:35

    @rogloh said:
    PUB {++opt(!regs)} methodName(args...)

    Oh, is that Spin? I thought Flexspin didn't optimise inline assembly there.

  • roglohrogloh Posts: 4,217

    Yeah I was the same and thought it would only work for other languages but SPIN is included as well now. I found it buried in the docs, seems to work okay. :smile:

  • Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.

    Flexspin-style ASM/ENDASM is treated just like compiler-generated code.

    The 3rd inline ASM style (__asm volatile in C) isn't supported in Spin because uhhhhhhhhh.

    Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?

  • roglohrogloh Posts: 4,217

    @Wuerfel_21 said:
    Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.

    Flexspin-style ASM/ENDASM is treated just like compiler-generated code.

    Ok

    Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?

    Arggh, what is the syntax for the register array? Is it a local or put in the VAR block? I need to do setq burst transfers on it so it needs to be a local register accessible from inline ASM/ENDASM block.

  • I think just a regular local array is sufficient? May not actually work like that, idk.

  • @Wuerfel_21 said:
    Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use.

    Hmm, I've seen many examples of "official" code in the library from Parallax where the addresses of parameters on the stack and also local variables were assumed to be in the order of declaration. Since Spin does not support any composite data types such as structs in C, this makes sense, although it makes some optimisations difficult.

  • If the variables are forced onto the stack (as happens when you ask for the address of one), then yes, they are in order, but the whole function gets slow and bloated, so don't do that if writing for flexspin specifically.

  • roglohrogloh Posts: 4,217

    Yeah I just hit the problem with the ordering of registers... I had to trick the optimizer again by referencing them explicitly in sequential order. You can't define a pix[8] array here instead; the inline PASM code doesn't deal with indexes, so I need explicit names. I don't think this would even be possible with FlexProp - too many args+local variables...?

    PRI {++opt(!regs)} gfxBlend32(src1, stride1, src2, stride2, destaddr, stride3, w, h, blendval, mixval) | pix1, pix2, pix3, pix4, pix5, pix6, pix7, pix8, len, srcptr, destptr, src, dest
    
        w <#= LINEBUFSIZE ' stay inside scratch buffer
        if w == 0 or h == 0
            return
    
        src := @srcbuf 
        dest := @textbuf
    
        asm
                mov     pix1, #0 ' only put here to ensure optimizer assigns registers in the usable order
                mov     pix2, #0
                mov     pix3, #0
                mov     pix4, #0
                mov     pix5, #0
                mov     pix6, #0
                mov     pix7, #0
                mov     pix8, #0
        endasm
        repeat h
            mem.read(@srcbuf, src1, w<<2)
            src1 += stride1
            mem.read(@textbuf, src2, w<<2)
            src2 += stride2
            asm
                mov     len, w 
                setpiv  blendval
                setpix  mixval
                mov     srcptr, src
                mov     destptr, dest
    mixloop     setq    #3
                rdlong  pix1, srcptr
                add     srcptr, #16
                setq    #3
                rdlong  pix5, destptr
                mixpix  pix5, pix1
                mixpix  pix6, pix2
                mixpix  pix7, pix3
                mixpix  pix8, pix4
                cmp     len, #4 wc ' check if less than 4 pixels
        if_c    altd    len, #$1ff
                setq    #3
                wrlong  pix5, destptr
                add     destptr, #16
                sub     len, #4 wcz
        if_nc_and_nz    jmp #mixloop
    endofline
            endasm
            'write back the rendered buffer data
            mem.write(@textbuf, destaddr, w<<2)
            destaddr += stride3
    
    
  • ersmithersmith Posts: 5,333

    @rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).

  • evanhevanh Posts: 13,106
    edited 2022-06-23 20:55

    That makes sense now - I've always used ORG/END for Spin based inline assembly.

  • roglohrogloh Posts: 4,217
    edited 2022-06-24 05:14

    @ersmith said:
    @rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).

    Yeah it can compile that way but I see in the listing that would drag in the fcached inline PASM code for every scan line, slowing down performance during rendering. It might have to be the way though for a PropTool variant if that's the way inline code works there and if this code ever wants to be executed on that platform.

    UPDATE: Actually in this case, if the outer loop SPIN functions are reasonably slow, it may not be that much slower overall. I'll have to benchmark it at some point.
