@evanh Yeah, the inline assembly code really isn't equipped to deal with _RET_ modifiers in any sensible way. Probably it should just throw an error on seeing them rather than trying to insert the RET instruction.
@evanh said:
... But I've hit a niggle that is probably something Chip needs to sort out ...
Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
Output from the Pnut v46 compiled example:
Errors (including the undefined symbols from debug) when compiling with Flexspin Version 7.0.0-beta2-v7.0.0-beta2-34-gcbe5f1a4, compiled on Jan 3 2025:
reg_ref_test.spin2:36: error: Undefined symbol p_clk
reg_ref_test.spin2:36: error: Undefined symbol p_dat
reg_ref_test.spin2:36: error: Undefined symbol m_lead
reg_ref_test.spin2:36: error: Undefined symbol v_nco
reg_ref_test.spin2:36: error: Undefined symbol m_se2
reg_ref_test.spin2:36: error: Undefined symbol m_dat
reg_ref_test.spin2:36: error: Undefined symbol m_crc
reg_ref_test.spin2:45: error: Undefined symbol p_clk
reg_ref_test.spin2:45: error: Undefined symbol p_dat
reg_ref_test.spin2:45: error: Undefined symbol m_lead
reg_ref_test.spin2:45: error: Undefined symbol v_nco
reg_ref_test.spin2:45: error: Undefined symbol m_se2
reg_ref_test.spin2:45: error: Undefined symbol m_dat
reg_ref_test.spin2:45: error: Undefined symbol m_crc
reg_ref_test.spin2:49: error: data item is not constant
@evanh said:
... But I've hit a niggle that is probably something Chip needs to sort out ...
Huh, LOL, I tried to make an example for Chip to work with, only to find Pnut completely handles everything no problem.
Output from the Pnut v46 compiled example:
Unfortunately this is where my model of inline assembly (constructed prior to the existence of Spin2) breaks down. I had envisioned it as being used for short snippets of inline code, and wanted therefore to make it fast to call and tightly integrated with the rest of the compiler. This made sense because the compiled code is usually almost as fast as inline assembly would be anyway, so the asm mostly is needed only for things that are hard to express in the source language. But it meant having to convert it into the internal representation used by the compiler, which is somewhat restricted compared to general assembly.
In PNut, by contrast, the inline assembly blocks are essentially mini-DAT sections. Setting them up and running them is relatively expensive. The interpreter is slow enough that the trade-off is worth it, and people are using inline assembly more and more.
I already have something like the mini-DAT sections implemented for nucode (-2nu output) and I can probably port that over to the regular inline assembly. It'll be much slower to get into/out of these blocks, because we'll have to move the data in variables to and from temporary registers. On the plus side that will mean that the restriction on using variables in memory will be lifted. Is it worth it? I guess PNut compatibility is paramount, so I'll probably go ahead and do it, but the generated code will be ugly. For backwards compatibility (and performance) the current system will continue to be used for plain unadorned asm blocks.
But the DAT-like "variables" aren't scoped outside of register space. They can't be accessed as C locals, and I don't need them to be. I'm struggling to see why a drastic change would be needed.
All I'm looking for here is for the compiler to accept the symbol label as a register number when filling those DAT-like LONGs. Specifically, a fix for this error: reg_ref_test.spin2:49: error: data item is not constant
EDIT: I guess you're looking at the other errors with the debug() function. I didn't know they would be a headache. Forget them then. I don't need that to work.
PS: I'm looking at a C solution here too. The Spin example was done for testing Pnut is all.
In the listing below, the ptr symbol is currently set with two bit-shifted constants, but I'd like to replace the 0x1f6 with the PA register symbol and, importantly, COGCRCBUFF will become a symbolic reference to one of the later RES symbols. Both are cogRAM addresses and get used by an ALTI instruction.
clocks long 1 + 512 * 2 // start-bit + nibble count
lnco long 0x8000_0000 // tells RDFAST/WRFAST not to wait for FIFO ready
endbit long 0xffff_ffff
poly long 0x8408 // CRC-16-CCITT reversed (x^16 + x^12 + x^5 + x^0: 0x1021, even parity)
ptr long 0x1f6<<19 | COGCRCBUFF<<9 // register PA for ALTI result substitution, and CRC buffer address
crc3 long 0
crc2 long 0
crc1 long 0
crc0 long 0
@evanh said:
But the DAT-like "variables" aren't scoped outside of register space. They can't be accessed as C locals, and I don't need them to be. I'm struggling to see why a drastic change would be needed.
All I'm looking for here is for the compiler to accept the symbol label as a register number when filling those DAT-like LONGs. Specifically, a fix for this error: reg_ref_test.spin2:49: error: data item is not constant
Yeah, that's the problem -- the current inline assembly parser does not know any symbol values, and in general cannot because it runs before the optimizer and "linker". Of course for the case of inline assembly that goes into COG memory we can in principle calculate them, but that's adding another patch onto an already creaky structure. The whole LONG/RES support is already something that's kind of bolted on to the compiler internal representation. That's also the reason that _RET_ prefixes aren't supported, the compiler doesn't generate them so there's no way to represent them in the IR.
I think the right solution is to bite the bullet and do a full assembly of the contents of ORG/END to binary, just as though it's inside the DAT section, rather than a partial assembly to IR that's then passed to later compiler passes. I've already got that for the P2 bytecode backend. The only tricky part I have left is accessing the local variables inside assembly; in bytecode the variables are always in HUB memory so we just copy from HUB to a designated part of COG before the inline asm runs, and back again after. We can do something similar for assembly output too. This will also apply to C __asm volatile (but not regular __asm, which will continue to be parsed with the existing code).
Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).
Unrelatedly ORGH/END seems to have been broken at some point.
@Wuerfel_21 said:
Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).
We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?
Unrelatedly ORGH/END seems to have been broken at some point.
@Wuerfel_21 said:
Removing the way it works currently is no good - I use ORG/END a lot to write loops with FIFO or overlapped CORDIC that only work in cogexec. Would be very annoying if those had extra startup overhead heaped on (compared to a regular fcache block).
We can add an option to make it work the current way, but the default should be the new (more compatible) way. Making it an optimization flag would let us change it at function level granularity, which should be good enough I think?
I'd do it the opposite way - the current approach is good enough for most/all inline ASM that isn't super cracked, and it's more efficient. You'd regress all the code that worked fine before by making it slower. So make it an opt-in flag. The way attributes are given doesn't disturb PNut's parsing, so this is fine in all cases.
OK, for now the current method of inline assembly is the default for -O1 and above. It's in an optimization flag called -Ofast-inline-asm, so it can be turned off globally with -Ono-fast-inline-asm or in a specific function by placing {++opt(no-fast-inline-asm)} after the PUB or PRI (and similarly in C with an __attribute__(opt(no-fast-inline-asm)) on the function declaration).
On ORGH/END being broken: huh, that's funny. I'll look into it.
A switch on the function definition would work fine for me. These big chunks of assembly definitely have their own wrapper each.
Oh noes ... where's the assembly gone?
PS: It doesn't work either. I should be getting
but instead get this