flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

ersmith · 2022-06-20 11:41

@Rayman said:

@evanh said:
Ah, that's like Unicode I think.

I was thinking the same thing... It's essentially the same as UTF-8.

No, it isn't. In UTF-8 the first byte (if it's not 0-127) carries what's essentially a count of the remaining bytes in its upper bits, and the remaining bytes store 6 bits per byte (their upper two bits must be 1 and then 0). UTF-8 was pretty specifically designed so that ASCII bytes (0-127) do not appear at all in the representation of values above 127, and also so that finding character boundaries is very simple (characters start with a byte whose top bits are 0x or 11, never 10, and bytes whose top bits are 0x or 11 never appear as other than the first byte of a character). The RFVAR representation is much simpler (7 bits per byte, upper bit is 0 when finished).
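To make the UTF-8 layout concrete, here is a minimal C sketch of an encoder together with the continuation-byte test that makes boundary finding simple. The names `utf8_encode` and `utf8_is_continuation` are made up for this illustration, and the encoder does no validation beyond a range check (no surrogate handling), so treat it as a sketch rather than a reference implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true for 10xxxxxx bytes, which never start a character. */
int utf8_is_continuation(uint8_t b)
{
    return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: encode code point cp into out[], return the byte count.
   Illustration only: surrogates are not rejected. */
size_t utf8_encode(uint32_t cp, uint8_t out[4])
{
    assert(out != NULL && cp <= 0x10FFFF);
    if (cp < 0x80) {                       /* 0xxxxxxx: plain ASCII */
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, the value 300 comes out as the two bytes C4 AC: the lead byte announces one following byte, and the second byte carries the 10 marker in its top bits.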

Except this is limited to 24 bits? Why not allow 32-bits like UTF-8?

I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.

TonyB_ · 2022-06-20 21:37

@ersmith said:

@Rayman said:
Except this is limited to 24 bits? Why not allow 32-bits like UTF-8?

I think it's actually limited to 28 bits (4 bytes). I'm not sure why the option for 5 bytes wasn't provided -- it would have been nice to be able to cover the whole range of 32 bit values, but I presume something internally made it hard to go above 4 bytes per read.

RFVAR/RFVARS limit is 29 bits, as 4th byte can use all eight bits. I think FIFO being long-wide probably prevented a 5th byte and there is always RFLONG if 29 bits aren't enough. At first sight perhaps not very useful, RFVAR and RFVARS are actually a handy pair of space-saving instructions that take two cycles whether 1, 2, 3 or 4 bytes are read.
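As a rough host-side model of the format described in this thread (7 bits per byte, top bit set while more bytes follow, and a 4th byte that contributes all eight bits for 29 bits total), here is a C sketch. The function names are hypothetical. Two things are assumptions on my part rather than statements from the thread: that the low-order 7 bits come first, and that RFVARS sign-extends from the top bit of whatever was read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of RFVAR: decode 1..4 bytes from p, store the number
   of bytes consumed in *used. Assumes low-order bits come first. */
uint32_t rfvar_decode(const uint8_t *p, size_t *used)
{
    uint32_t value = 0;
    assert(p != NULL && used != NULL);
    for (size_t i = 0; i < 3; i++) {
        value |= (uint32_t)(p[i] & 0x7F) << (7 * i);
        if ((p[i] & 0x80) == 0) {          /* top bit clear: finished */
            *used = i + 1;
            return value;
        }
    }
    value |= (uint32_t)p[3] << 21;         /* 4th byte supplies all 8 bits */
    *used = 4;
    return value;
}

/* Hypothetical model of RFVARS: same bytes, sign-extended from the highest
   bit read (bit 6, 13, 20 or 28). The sign position is an assumption. */
int32_t rfvars_decode(const uint8_t *p, size_t *used)
{
    uint32_t value = rfvar_decode(p, used);
    unsigned bits = (*used < 4) ? (unsigned)(7 * *used) : 29u;
    uint32_t sign = (uint32_t)1 << (bits - 1);
    return (int32_t)((value ^ sign) - sign);
}
```

Under those assumptions the largest unsigned value is four FF bytes, which decode to $1FFF_FFFF, matching the 29-bit limit mentioned above.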

Rayman · 2022-06-20 22:00

@ersmith, @TonyB_ Thanks for explaining rfvar/rfvars. I wonder now if it'd be better to just copy UTF-8... But, I guess it's a done deal...

msrobots · 2022-06-21 01:46

@ersmith,

but @@@data1 does what I think and puts in absolute addresses, doesn't it? I am quite annoyed with relative addressing.

mike

ersmith · 2022-06-21 12:58

@msrobots said:
@ersmith,

but @@@data1 does what I think and puts in absolute addresses, doesn't it? I am quite annoyed with relative addressing.

Yes, @@@data1 is always absolute; but that's only available in flexspin, I don't think pnut/proptool support it.

rogloh · 2022-06-23 04:54

Is there any way to prevent the flexspin optimizer from removing inline PASM register assignments on an individual method basis? Or do I have to resort to removing all optimizations for the whole project with the "-O0" option?

In my inline PASM code I'm writing to sequential local registers and then performing a setq burst transfer to write all these registers into HUB. I don't explicitly identify the consecutive register names after the first one written by the burst transfer. I'm finding the flexspin optimizer is then removing the instruction assignments to these registers because it believes the registers are no longer accessed so there is no need to perform the instructions, but this is bad as they are indirectly accessed in the setq wrlong transfer, and I get corrupted output. I've been trying to add some dummy register access instructions at the end of my loop that uses all these registers and sometimes it tricks the optimizer into believing the registers are used and so won't remove the code, but it's a little bit hit and miss. I'm thinking it's safer to just disable optimizations in these cases, but ideally do this on a per method basis. Is there a way?

Here's an example of the problem...

Source is this:

transparent setq    #7
            rdlong  pix1, srcptr
            add     srcptr, #32
setpixels   xor     char, #$ff
            skip    char
            mov     pix1, fg
            mov     pix2, fg
            mov     pix3, fg
            mov     pix4, fg
            mov     pix5, fg
            mov     pix6, fg
            mov     pix7, fg
            mov     pix8, fg
            setq    #7
            wrlong  pix1, destptr

and the bad code generated is this, which is missing several intermediate instructions:

00cb4     28 0E 64 FD |     setq    #7
00cb8     FB FE 01 FB |     rdlong  local24, local20
00cbc     20 F6 05 F1 |     add local20, #32
00cc0                 | LR__0015
00cc0     FF FA 65 F5 |     xor local22, #255
00cc4     31 FA 61 FD |     skip    local22
00cc8     ED FE 01 F6 |     mov local24, local06
00ccc     ED 00 02 F6 |     mov local25, local06
00cd0     28 0E 64 FD |     setq    #7
00cd4     FC FE 61 FC |     wrlong  local24, local21

It should be this (which is what happens when the optimizer is globally disabled):

00e28     28 0E 64 FD |     setq    #7
00e2c     9D 42 01 FB |     rdlong  local25, local21
00e30     20 3A 05 F1 |     add local21, #32
00e34                 | LR__0020
00e34     FF 3E 65 F5 |     xor local23, #255
00e38     31 3E 61 FD |     skip    local23
00e3c     8F 42 01 F6 |     mov local25, local07
00e40     8F 44 01 F6 |     mov local26, local07
00e44     8F 46 01 F6 |     mov local27, local07
00e48     8F 48 01 F6 |     mov local28, local07
00e4c     8F 4A 01 F6 |     mov local29, local07
00e50     8F 4C 01 F6 |     mov local30, local07
00e54     8F 4E 01 F6 |     mov local31, local07
00e58     8F 50 01 F6 |     mov local32, local07
00e5c     28 0E 64 FD |     setq    #7
00e60     9E 42 61 FC |     wrlong  local25, local22
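The reasoning behind the dropped instructions can be illustrated with a toy dead-store pass in C. This is not flexspin's optimizer, just a sketch of the idea: a backward scan that keeps a register write only if something later reads that register. The `Insn` type, `count_kept`, and the register numbering are all invented for the illustration. When the final burst write is treated as reading only the one register it names, most of the earlier moves look dead; when its whole range is counted as read, they all survive.

```c
#include <assert.h>
#include <stddef.h>

enum { NREGS = 16 };   /* toy register file size (assumption) */

typedef struct {
    int dest;    /* register written, or -1 if none */
    int src;     /* first register read, or -1 if none */
    int burst;   /* consecutive registers read starting at src (1 = normal) */
} Insn;

/* Toy backward liveness scan. Returns how many instructions survive.
   honor_burst = 0 models a pass that only sees the named source register;
   honor_burst = 1 models a pass that knows the burst reads a whole range. */
int count_kept(const Insn *code, int n, int honor_burst)
{
    int live[NREGS] = {0};
    int kept = 0;
    assert(code != NULL);
    for (int i = n - 1; i >= 0; i--) {
        const Insn *in = &code[i];
        /* A write to a register that nothing reads later looks dead. */
        if (in->dest >= 0 && !live[in->dest])
            continue;
        kept++;
        if (in->dest >= 0)
            live[in->dest] = 0;
        if (in->src >= 0) {
            int span = honor_burst ? in->burst : 1;
            for (int r = 0; r < span && in->src + r < NREGS; r++)
                live[in->src + r] = 1;
        }
    }
    return kept;
}
```

Feeding it eight moves into registers 0..7 followed by one 8-register burst read starting at register 0 keeps only 2 instructions in the naive mode and all 9 when the burst range is honored. The real listing above kept a slightly different set, so the model only shows the general mechanism, not the exact behaviour.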

rogloh · 2022-06-23 05:29

Ok, found my answer I think. I might need to use this:

PUB {++opt(!regs)} methodName(args...)

My code seems to work now.

evanh · 2022-06-23 05:30

__asm volatile {
}

evanh · 2022-06-23 05:32

@rogloh said:
PUB {++opt(!regs)} methodName(args...)

Oh, is that Spin? I thought Flexspin didn't optimise inline assembly there.

rogloh · 2022-06-23 05:52

Yeah I was the same and thought it would only work for other languages but SPIN is included as well now. I found it buried in the docs, seems to work okay.

Wuerfel_21 · 2022-06-23 11:09

Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.

Flexspin-style ASM/ENDASM is treated just like compiler-generated code.

The 3rd inline ASM style (__asm volatile in C) isn't supported in Spin because uhhhhhhhhh.

Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?

rogloh · 2022-06-23 11:54

@Wuerfel_21 said:
Spin2-style ORG/END inline ASM is not optimized and always loads into cog RAM.

Flexspin-style ASM/ENDASM is treated just like compiler-generated code.

Ok

Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use. Should work if you use an array though?

Arggh, what is the syntax for the register array? Is it a local or put in the VAR block? I need to do setq burst transfers on it so it needs to be a local register accessible from inline ASM/ENDASM block.

Wuerfel_21 · 2022-06-23 12:49

I think just a regular local array is sufficient? May not actually work like that, idk.

ManAtWork · 2022-06-23 13:17

@Wuerfel_21 said:
Unrelatedly, do not assume that local variables defined consecutively actually end up in consecutive registers. Register renaming will assign them in order of use.

Hmm, I've seen many examples of "official" code in the library from Parallax where the addresses of parameters on the stack and also local variables were assumed to be in the order of declaration. Since Spin does not support any composed data types such as structs in C, this makes sense, although it makes some optimisations difficult.

Wuerfel_21 · 2022-06-23 14:02

If the variables are forced onto the stack (as happens when you ask for the address of one), then yes, they are in order, but the whole function gets slow and bloated, so don't do that if writing for flexspin specifically.

rogloh · 2022-06-23 14:15

Yeah I just hit the problem with the ordering of registers....I had to trick the optimizer again through referencing them explicitly in sequential order. You can't define a pix[8] array here instead, the inline PASM code doesn't deal with indexes, so I need explicit names. I don't think this would be even possible with FlexProp - too many args+local variables...?

PRI {++opt(!regs)} gfxBlend32(src1, stride1, src2, stride2, destaddr, stride3, w, h, blendval, mixval) | pix1, pix2, pix3, pix4, pix5, pix6, pix7, pix8, len, srcptr, destptr, src, dest

    w <#= LINEBUFSIZE ' stay inside scratch buffer
    if w == 0 or h == 0
        return

    src := @srcbuf 
    dest := @textbuf

    asm
            mov     pix1, #0 ' only put here to ensure optimizer assigns registers in the usable order
            mov     pix2, #0
            mov     pix3, #0
            mov     pix4, #0
            mov     pix5, #0
            mov     pix6, #0
            mov     pix7, #0
            mov     pix8, #0
    endasm
    repeat h
        mem.read(@srcbuf, src1, w<<2)
        src1 += stride1
        mem.read(@textbuf, src2, w<<2)
        src2 += stride2
        asm
            mov     len, w 
            setpiv  blendval
            setpix  mixval
            mov     srcptr, src
            mov     destptr, dest
mixloop     setq    #3
            rdlong  pix1, srcptr
            add     srcptr, #16
            setq    #3
            rdlong  pix5, destptr
            mixpix  pix5, pix1
            mixpix  pix6, pix2
            mixpix  pix7, pix3
            mixpix  pix8, pix4
            cmp     len, #4 wc ' check if less than 4 pixels
    if_c    altd    len, #$1ff
            setq    #3
            wrlong  pix5, destptr
            add     destptr, #16
            sub     len, #4 wcz
    if_nc_and_nz    jmp #mixloop
endofline
        endasm
        'write back the rendered buffer data
        mem.write(@textbuf, destaddr, w<<2)
        destaddr += stride3

ersmith · 2022-06-23 17:28

@rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).

evanh · 2022-06-23 20:54

That makes sense now - I've always used ORG/END for Spin based inline assembly.

rogloh · 2022-06-24 05:11

@ersmith said:
@rogloh : I think if you use ORG/END instead of ASM/ENDASM you won't run into so many optimizer issues (and your code will be compatible with PropTool/PNut).

Yeah it can compile that way but I see in the listing that would drag in the fcached inline PASM code for every scan line, slowing down performance during rendering. It might have to be the way though for a PropTool variant if that's the way inline code works there and if this code ever wants to be executed on that platform.

UPDATE: Actually in this case, if the outer loop SPIN functions are reasonably slow, it may not be that much slower over all. I'll have to benchmark it at some point.

evanh · 2022-07-16 15:26

Hmm, Eric/Ada,
I've just re-run the speed tester C program that I used heavily during the revamping of the SD driver. Problem is, it is now getting consistent file open/close errors at the same place each run. Out of 24 files written and 20 files read there are exactly two open-to-read fails and one open-to-write fail, and it is always the same ones that fail. PS: I've reformatted two SD cards to verify it wasn't corrupt cards. PPS: All three fails are within repeating loops of a file, i.e. they aren't unique file fails. The single open-to-read fail is the final repeat of a 200 KB file. The open-to-write fail (which has a subsequent open-to-read fail) is the first repeat of a 200 byte file.

I haven't used the speed tester in a few weeks. The June update of Flexspin I had also had the issue. I've already deleted that one so can't be sure of its exact date. At any rate, this issue wasn't present during any of the driver work back in May.

I've forgotten how to find commits to bisect. Some hints on using Git again would be helpful. Never mind, Uncle Google told me - git log

Okay, first bad commit is:

commit a69d1e401d53f9a747bcaae37c5876ba0dc55815
Author: Eric Smith <ersmith@totalspectrum.ca>
Date:   Sat Jun 4 16:37:42 2022 -0300

    re-factored SD card code to support multiple cards; this required changing C struct __using code as well to be more class-like

And I've attached my test program. The errors stand out.

Wuerfel_21 · 2022-07-16 16:52

What error code is it giving you? I think that commit increased heap memory usage, so try with increased heap size.

evanh · 2022-07-16 17:03

It's a null pointer. Searching "heap" in docs ... enum {HEAPSIZE = 8192}; fixes it.

EDIT: 4400 bytes is enough.
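For anyone hitting the same thing, a small sketch of how the setting and a quick check might look in a C program. The `enum { HEAPSIZE = ... }` line is the one quoted above; `heap_probe` is a hypothetical helper I made up to test whether an allocation of a given size currently succeeds. On a desktop compiler the enum is just an ordinary constant and has no effect on the heap.

```c
#include <assert.h>
#include <stdlib.h>

/* Heap size request as quoted in the post above; only meaningful to a
   toolchain that looks for this name (assumed to be flexspin here). */
enum { HEAPSIZE = 8192 };

/* Hypothetical helper: returns 1 if nbytes can be allocated right now,
   0 if malloc reports exhaustion. The block is freed again immediately. */
int heap_probe(size_t nbytes)
{
    assert(nbytes > 0);
    void *p = malloc(nbytes);
    if (p == NULL)
        return 0;
    free(p);
    return 1;
}
```

Calling something like `heap_probe(512)` just before the failing open is one way to tell heap exhaustion apart from a genuine file system error.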

Wuerfel_21 · 2022-07-16 17:12

@evanh said:
It's a null pointer.

The error code is in errno when a library call returns null. You can call strerror to get the printable string version.
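A minimal sketch of that pattern in standard C, using a made-up wrapper name `open_or_explain`. It clears `errno` before the call so a stale value is not misreported, and it has a fallback message for runtimes that return null without setting `errno` at all (an assumption that this can happen, not something the standard promises either way).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper: fopen() that writes a printable reason into msg
   when the open fails. msg is left empty on success. */
FILE *open_or_explain(const char *path, const char *mode,
                      char *msg, size_t msglen)
{
    assert(path != NULL && mode != NULL && msg != NULL && msglen > 0);
    errno = 0;                      /* avoid reporting a stale error */
    FILE *f = fopen(path, mode);
    if (f != NULL) {
        msg[0] = '\0';
        return f;
    }
    if (errno != 0)
        snprintf(msg, msglen, "%s: %s", path, strerror(errno));
    else
        snprintf(msg, msglen, "%s: open failed, errno not set", path);
    return NULL;
}
```

The caller can then print `msg` whenever the returned pointer is null instead of guessing at the cause.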

evanh · 2022-07-16 17:23

Ah, interesting, errno is zero too. Definitely raising the heap size fixes it.

Wuerfel_21 · 2022-07-16 17:49

Speaking of, I just tried loading NeoYume over the host VFS and it doesn't work

CRLF issue perhaps? I do open files in binary mode, but idk if that translates across.

ersmith · 2022-07-16 18:08

I checked in a change to bump the default heap size on P2; the new file system code does need a bit more RAM sometimes.

ersmith · 2022-07-16 18:09

@Wuerfel_21 said:
Speaking of, I just tried loading NeoYume over the host VFS and it doesn't work

CRLF issue perhaps? I do open files in binary mode, but idk if that translates across.

Any clues as to what's going wrong? Are the files not being found, or are they found but not read correctly? The binary mode should be carried across, but I don't test as much on Windows, so it's possible I missed something there.

Wuerfel_21 · 2022-07-16 18:11

It is reading fine (but of course really slow at 230400 baud), but the data seems to be bad (black screen).

Wuerfel_21 · 2022-07-16 20:20

Actually, I'm just an idiot and set a bad memory config. Ignore me.

Wuerfel_21 · 2022-07-17 00:23

Also, regarding the last couple of int64 related commits - do note that long is not a 64-bit type on Windows, so it can and will blow up if you try to printf("%ld\n",(int64_t)something);. You need long long.
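Two portable ways to print an `int64_t` in standard C, sketched with made-up helper names: either use the `PRId64` macro from `<inttypes.h>`, or cast to `long long` and use `%lld`. Both avoid assuming that `long` is 64 bits wide.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format v using the PRId64 macro, which expands to
   the right conversion for int64_t on the current platform. */
int format_i64(char *buf, size_t n, int64_t v)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%" PRId64, v);
}

/* Hypothetical helper: same result via an explicit cast to long long,
   which is at least 64 bits wide on every conforming C99 compiler. */
int format_i64_ll(char *buf, size_t n, int64_t v)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%lld", (long long)v);
}
```

Either form keeps the format string and the argument type in agreement regardless of how wide `long` happens to be on the host.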
