flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

ersmith · 2025-01-03 23:42

@evanh Yeah, the inline assembly code really isn't equipped to deal with _RET_ modifiers in any sensible way. Probably it should just throw an error on seeing them rather than trying to insert the RET instruction.

evanh · 2025-01-03 23:50

@evanh said:
... But I've hit a niggle that is probably something Chip needs to sort out ...

Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
Output from the Pnut v46 compiled example:

Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000

Errors, including debug's undefines, compiling wih Flexspin Version 7.0.0-beta2-v7.0.0-beta2-34-gcbe5f1a4 Compiled on: Jan 3 2025

reg_ref_test.spin2:36: error: Undefined symbol p_clk
reg_ref_test.spin2:36: error: Undefined symbol p_dat
reg_ref_test.spin2:36: error: Undefined symbol m_lead
reg_ref_test.spin2:36: error: Undefined symbol v_nco
reg_ref_test.spin2:36: error: Undefined symbol m_se2
reg_ref_test.spin2:36: error: Undefined symbol m_dat
reg_ref_test.spin2:36: error: Undefined symbol m_crc
reg_ref_test.spin2:45: error: Undefined symbol p_clk
reg_ref_test.spin2:45: error: Undefined symbol p_dat
reg_ref_test.spin2:45: error: Undefined symbol m_lead
reg_ref_test.spin2:45: error: Undefined symbol v_nco
reg_ref_test.spin2:45: error: Undefined symbol m_se2
reg_ref_test.spin2:45: error: Undefined symbol m_dat
reg_ref_test.spin2:45: error: Undefined symbol m_crc
reg_ref_test.spin2:49: error: data item is not constant

ersmith · 2025-01-04 23:35

@evanh said:

@evanh said:
... But I've hit a niggle that is probably something Chip needs to sort out ...

Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
Output from the Pnut v46 compiled example:
Cog0  i = $D, p_clk = $22, p_dat = $23, m_lead = $24, v_nco = $0, m_se2 = $0, m_dat = $25, m_crc = $26
Cog0  i = $14, p_clk = $2200_0000, p_dat = $2300_0000, m_lead = $2400_0000, v_nco = $0, m_se2 = $0, m_dat = $2500_0000, m_crc = $2600_0000

Unfortunately this is where my model of inline assembly (constructed prior to the existence of Spin2) breaks down. I had envisioned it as being used for short snippets of inline code, and wanted therefore to make it fast to call and tightly integrated with the rest of the compiler. This made sense because the compiled code is usually almost as fast as inline assembly would be anyway, so the asm mostly is needed only for things that are hard to express in the source language. But it meant having to convert it into the internal representation used by the compiler, which is somewhat restricted compared to general assembly.

In PNut, by contrast, the inline assembly blocks are essentially mini-DAT sections. Setting them up and running them is relatively expensive. The interpreter is slow enough that the trade-off is worth it, and people are using inline assembly more and more.

I already have something like the mini-DAT sections implemented for nucode (-2nu output) and I can probably port that over to the regular inline assembly. It'll be much slower to get into/out of these blocks, because we'll have to move the data in variables to and from temporary registers. On the plus side that will mean that the restriction on using variables in memory will be lifted. Is it worth it? I guess PNut compatibility is paramount, so I'll probably go ahead and do it, but the generated code will be ugly. For backwards compatibility (and performance) the current system will continue to be used for plain unadorned asm blocks.

evanh · 2025-01-05 09:02

But the DAT like "variables" aren't scoped outside of register space though. They can't be, and I don't need them, accessed as C locals. I'm struggling to see why a drastic change would be needed.

All I'm looking for here is the compiler to accept the symbol label as a register number to fill those DAT like LONGs. Specifically a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

EDIT: I guess you're looking at the other errors with the debug() function. I didn't know they would be a headache. Forget them then. I don't need that to work.

PS: I'm looking at a C solution here too. The Spin example was done for testing Pnut is all.

In the below listing the ptr symbol is currently set with two bit-shifted constants, but I'd like to replace the 0x1f6 with PA register symbol, and, importantly, COGCRCBUFF will become a symbolic reference to one of the later RES symbols. Both are cogRAM addresses and get used by an ALTI instruction.

clocks      long 1 + 512 * 2    // start-bit + nibble count
lnco        long 0x8000_0000    // tells RDFAST/WRFAST not to wait for FIFO ready
endbit      long 0xffff_ffff
poly        long 0x8408    // CRC-16-CCITT reversed (x16 + x12 + x5 + x0: 0x1021, even parity)
ptr     long 0x1f6<<19 | COGCRCBUFF<<9    // register PA for ALTI result substitution, and CRC buffer address
crc3        long 0
crc2        long 0
crc1        long 0
crc0        long 0

ersmith · 2025-01-05 13:37

@evanh said:
But the DAT like "variables" aren't scoped outside of register space though. They can't be, and I don't need them, accessed as C locals. I'm struggling to see why a drastic change would be needed.

All I'm looking for here is the compiler to accept the symbol label as a register number to fill those DAT like LONGs. Specifically a fix for this error: reg_ref_test.spin2:49: error: data item is not constant

Yeah, that's the problem -- the current inline assembly parser does not know any symbol values, and in general cannot because it runs before the optimizer and "linker". Of course for the case of inline assembly that goes into COG memory we can in principle calculate them, but that's adding another patch onto an already creaky structure. The whole LONG/RES support is already something that's kind of bolted on to the compiler internal representation. That's also the reason that _RET_ prefixes aren't supported, the compiler doesn't generate them so there's no way to represent them in the IR.

I think the right solution is to bite the bullet and do a full assembly of the contents of ORG/END to binary, just as though it's inside the DAT section, rather than a partial assembly to IR that's then passed to later compiler passes. I've already got that for the P2 bytecode backend. The only tricky part I have left is accessing the local variables inside assembly; in bytecode the variables are always in HUB memory so we just copy from HUB to a designated part of COG before the inline asm runs, and back again after. We can do something similar for assembly output too. This will also apply to C __asm volatile (but not regular __asm, which will continue to be parsed with the existing code).

Wuerfel_21 · 2025-01-05 14:12

Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

Unrelatedly ORGH/END seems to have been broken at some point.

ersmith · 2025-01-05 17:35

@Wuerfel_21 said:
Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

Unrelatedly ORGH/END seems to have been broken at some point.

Huh, that's funny. I'll look into it.

Wuerfel_21 · 2025-01-05 18:09

@ersmith said:

@Wuerfel_21 said:
Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).

We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?

I'd do it the opposite way - the current thing is good enough for most/all inline ASM that isn't super cracked and more efficient. You'd regress all the code that worked fine before to being slower. So make it an opt-in flag. The way attributes are given doesn't disturb PNut's parsing, so this is fine in all cases.

evanh · 2025-01-05 22:08

A switch on the function definition would work fine for me. These big chunks of assembly definitely have their own wrapper each.

ersmith · 2025-01-06 00:15

OK, for now the current method of inline assembly is the default for -O1 and above. It's in an optimization flag called -Ofast-inline-asm, so it can be turned off globally with -Ono-fast-inline-asm or in a specific function by placing {++opt(no-fast-inline-asm)} after the PUB or PRI (and similarly in C with an __attribute__(opt(no-fast-inline-asm)) on the function declaration).

evanh · 2025-01-06 03:09

Oh noes ... where's the assembly gone?

    callpa  #(@LR__0031-@LR__0030)>>2,fcache_load_ptr_
LR__0030
    org 0
    byte    $03, $00, $90, $fe, $28, $0c, $64, $fd, $f6, $17, $00, $fb, $0b, $98, $03, $f6
    byte    $0c, $9a, $03, $f6, $0d, $9c, $03, $f6, $0e, $9e, $03, $f6, $0f, $a0, $03, $f6
    byte    $10, $a2, $03, $f6, $11, $a4, $03, $f6, $2d, $00, $64, $fd
    fit 256
LR__0031

PS: It doesn't work either. I should be getting

   clkfreq = 200000000   clkmode = 0x10009fb
  Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
 t1 = 00000056
 t2 = 00000057
 t3 = 00000000
 t4 = 00000000
 t5 = 00000058
 t6 = 00000059
 t7 = 00000060

but instead get this

   clkfreq = 200000000   clkmode = 0x10009fb
  Compiled with FlexC v7.0.0-beta2-v7.0.0-beta2-47-gb889cbdb
 t1 = fd63ec01
 t2 = ff000002
 t3 = fce7ec04
 t4 = 00000000
 t5 = 0bebc200
 t6 = 010009fb
 t7 = 00000000

ersmith · 2025-01-06 13:12

It looks like something's going wrong with the calculation of the address of testset in loc pa, #testset. Which isn't surprising, since addresses are not known at that point, but I don't know why there isn't a warning. For now you can work around this by passing in the address in another local variable void *ptr = &testset and then use ptr in place of pa.

evanh · 2025-01-06 13:42

Oh, that figures, it is the same with Pnut. And same solution.
Thanks for looking for me.

EDIT: Yep, works now.

EDIT2: Now added to the SD card driver in two places and appears to be working flawless. The block read and block write low level routines required this to allow automatic packing of the CRC processing buffer tightly against the code sitting in Fcache space. And it also allows the compiler to know the existence of the processing buffer at all.

So, it should now be able to compile-time warn me if the whole lot doesn't fit ... hmm nope, it compiles without error. Then crashes after reading a block or two ... ah, maybe I need to add a FIT ... right, yeah, that worked.

Next problem is Fcache size is not known.

ersmith · 2025-01-06 15:25

Thanks for testing this @evanh. I've updated the code slightly to fix some bugs (e.g. it was copying things to local memory that it shouldn't have). I also tried to create relocations for unknown HUB addresses. That didn't work, but it did have the happy side effect (which I'd forgotten about) of providing comments in the binary blob showing the original assembly.

evanh · 2025-01-06 20:02

I should say, rather, Fcache size is not know when unspecified. Or is there a compiler generated value that could be compared?

@ersmith said:
... but it did have the happy side effect (which I'd forgotten about) of providing comments in the binary blob showing the original assembly.

Oh, so the hex dump in the .p2asm file is literally that - from an assembled binary file?
I see the assembly now

    callpa  #(@LR__0031-@LR__0030)>>2,fcache_load_ptr_
LR__0030
    org 0
'-' 
'-'         setq #sizeof(testset) / 4 - 1
    byte    $28, $0c, $64, $fd
'-'         rdlong rp_clk, ptr
    byte    $d3, $19, $00, $fb
'-' 
'-'         mov t1, rp_clk
    byte    $0c, $98, $03, $f6
'-'         mov t2, rp_dat
    byte    $0d, $9a, $03, $f6
'-'         mov t3, rm_align
    byte    $0e, $9c, $03, $f6
'-'         mov t4, rv_nco
    byte    $0f, $9e, $03, $f6
'-'         mov t5, rm_se2
    byte    $10, $a0, $03, $f6
'-'         mov t6, rm_dat
    byte    $11, $a2, $03, $f6
'-'         alts altreg, #0
    byte    $00, $16, $94, $f9
'-'         mov t7, 0-0
    byte    $00, $a4, $03, $f6
'-'         ret
    byte    $2d, $00, $64, $fd
'-' 
'-' altreg long rm_crc
    byte    $12, $00, $00, $00
'-' rp_clk res 1
'-' rp_dat res 1
'-' rm_align res 1
'-' rv_nco res 1
'-' rm_se2 res 1
'-' rm_dat res 1
'-' rm_crc res 1
    fit 128
LR__0031

EDIT: Hang-on, there is an auto-generated FIT there. It ain't working though. I only get a compile error when I explicitly put my own FIT 128 in after the large RES 130 I have for the CRC processing buffer.
Example failure attached.

ersmith · 2025-01-07 17:38

@evanh: Yeah, the auto generated fit is inserted after we've already compiled the ORG mini-dat, so it's only testing the size of the binary blob (the bytes from the code and data) and doesn't know about any RES inside. It's better than nothing, but it won't catch RES caused errors.

evanh · 2025-01-07 19:39

Back to wanting to know the Fcache size for me to add as a FIT directive then. And even better, an Fcache that can use whatever available cogRAM there is.

ersmith · 2025-01-07 21:25

@evanh said:
Back to wanting to know the Fcache size for me to add as a FIT directive then. And even better, an Fcache that can use whatever available cogRAM there is.

It's documented to be 256 longs, but for some reason the code was only allocating 128. I've checked in an update to actually provide 256 (which is also what the P2 bytecode interpreter provides) but beware -- it's been 128 for a long time, so I would not be surprised to find that 256 is too big and we have to shrink it.

I have tried in the past to auto-allocate the amount of fcache, but that is extremely complicated due to all of the things that compete for COG memory, and it's never worked out very well.

evanh · 2025-01-07 21:34

Thank you for raising it up again. I don't have to force it on the compile line now. That's also much closer to Pnut's level now too.

Fair enough about the auto-sizing.

evanh · 2025-01-07 22:06

A bonus of the change in assembler, _RET_ is working now.

evanh · 2025-01-07 22:33

Hmm, Pnut checks for RES allocation size and spits an error if beyond the max allowed. Not much of a need for using FIT there.

ersmith · 2025-01-08 16:41

@evanh said:
Hmm, Pnut checks for RES allocation size and spits an error if beyond the max allowed. Not much of a need for using FIT there.

I've just added some code so flexspin should do this now too.

evanh · 2025-01-08 17:04

Champ! Thank you Eric.

rogloh · 2025-01-23 01:01

Hi @ersmith :
Related to #include and line numbering currently being discussed for PNut also, I think I just found a problem in the reported line numbers when an include file is used in flexspin. I am using the recently updated 7.0.0 build from github on Jan16.

Consider this snippet with a bug:

CON
    Z80FREQ = 4000000
    _clkfreq  = 252000000

#ifdef NONE
#include "z80.h"
#endif

DAT
    orgh
    mov pa, #aa

When I compile the code above in Flexspin I get the error on line 10 which is the orgh line, not the mov line.

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2025 Total Spectrum Software Inc. and contributors
Version 7.0.0 Compiled on: Jan 16 2025
z.spin2
z.spin2
z.spin2:10: error: Unknown symbol 'aa'
Done.

When I disable the include file handling by commenting out the #include with an apostrophe and recompile it then reports the correct line number of the error as line 11, so it looks like the #include is throwing the numbering off by a line in the error report. Interestingly the #include should be ignored in both cases as we have not even defined NONE. So something seems wrong here, causing an offset of 1 in the error listing.

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2025 Total Spectrum Software Inc. and contributors
Version 7.0.0 Compiled on: Jan 16 2025
z.spin2
z.spin2
z.spin2:11: error: Unknown symbol 'aa'
Done.

TonyB_ · 2025-01-23 12:05

@rogloh said:
Hi @ersmith :
Related to #include and line numbering currently being discussed for PNut also, I think I just found a problem in the reported line numbers when an include file is used in flexspin.

I've been using flexspin's #include for years for auto-generated skip pattern constants. I have an #include "xxxx.skp" line as the first thing after CON. I think occasionally error line numbers might be out by one line but that's close enough for me.

evanh · 2025-01-30 22:09

Help, I have this file read/write tester program of mine that I've ported from C to Spin2. It works fine using the built-in libc functions. eg:

PRI  mountsd() | rc
    debug("Driver = sdmm")
    rc := c._sdmm_open(CLK_RL, CS_RL, MOSI_RL, MISO_RL)
    rc := c._vfs_open_fat_handle(rc)
    c.mount(string("/sd"), rc)
    ...

However, when utilising a custom driver, the new C driver loading mechanism requires a custom *_open() function. eg:

PRI  mountsd() | rc
    debug("Driver = sdsd")
    rc := c._sdsd_open(CLK_RL, CMD_RL, DAT0_RL, PWR_RL, LED_RL)
    rc := c._vfs_open_fat_handle(rc)
    c.mount(string("/sd"), rc)
    ...

Since it doesn't exist in the magic libc hierarchy, that unsurprisingly produces the error error: unknown identifier _sdsd_open in class libc

How do I add it? I'm guessing I need to port the *_open() function to a Spin2 equivalent. One problem with that is there is a section of copying function pointers into a C structure. Can't say I know how that would be done. It also includes a bunch of C header files.

Wuerfel_21 · 2025-01-30 23:05

@evanh said:
How do I add it? I'm guessing I need to port the *_open() function to a Spin2 equivalent. One problem with that is there is a section of copying function pointers into a C structure. Can't say I know how that would be done. It also includes a bunch of C header files.

See libc.a (yes it's a text file) for how to load functions like that.

evanh · 2025-01-31 00:07

Where I'm unsure of direction is adding my driver as a separate object means it's outside of the libc object. Can they link together happily that way? Do I need to merge them in a hierarchy or something? I'm clueless on namespace reach here.

ersmith · 2025-01-31 13:03

@evanh said:
Since it doesn't exist in the magic libc hierarchy, that unsurprisingly produces the error error: unknown identifier _sdsd_open in class libc

How do I add it? I'm guessing I need to port the *_open() function to a Spin2 equivalent. One problem with that is there is a section of copying function pointers into a C structure. Can't say I know how that would be done. It also includes a bunch of C header files.

There's nothing magic about libc.a, it's just an object like any other one. I think you should be able to put your open() function into another file and include it as a different object, like:

c: "libc.a"
c2: "myopen.c"
..   
    rc := c2._sdsd_open(CLK_RL, CMD_RL, DAT0_RL, PWR_RL, LED_RL)
    rc := c._vfs_open_fat_handle(rc).

The only downside to this is that c2 may pull in its own copy of some of the standard C functions. How much this matters depends on how many library calls your _sdsd_open() function makes. I think -O2 should go a long way to reducing that duplication though.

evanh · 2025-01-31 20:47

Thanks Eric. It's compiling that way. Still to test it ...

Oddly, it also showed up two bugs:

One was a non-existent symbol being referenced in my driver. I've fixed the bug but am puzzled as to why it didn't produce a compile error when compiling the C version of the SD tester program.
The other is include/fcntl.h:56: error: syntax error, unexpected identifier 'mode_t' It's missing an include of it's own I guess. My current fix is to shift the order of includes in my driver. Again, the C version of the tester didn't produce any error here even though it's the identical driver code.
EDIT2: Come to think of it. A compile warning that I'd earlier dismissed shows up this same way - include/filesys/fatfs/fatfs_vfs.c:96: warning: mixing pointer and integer types in return: expected pointer to _struct__vfs but got int

EDIT: And seems to be running just fine. Good performance too.
File size for -O1 is 84_844 bytes
File size for -O2 is 81_980 bytes

flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

Comments