flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler

evanh · 2025-05-10 16:24

Err, I should have said static uses instead of global uses but I presume you get the idea.

Wuerfel_21 · 2025-05-10 17:16

@evanh said:

@Wuerfel_21 said:
I have a lot of inline ASM that does this

In your emulators? I'm seeing both local uses and COGINIT'd uses, but nothing that seems to be global uses of PTRB within inlined assembly.

In the emulators that would usually be found in the "upper" part. NeoYume has a load_romdata function that transforms various data types on the fly while loading.
Though I am now realizing that in some recent version I essentially got rid of PTRB usage in the very load-bearing LOAD_CROM implementation (loads all the "C" files, in odd/even pairs). It still uses PTRB, but no more indexing, so that one could actually be moved to a different register.

evanh · 2025-05-10 17:34

@Wuerfel_21 said:
In the emulators that would usually be found in the "upper" part. NeoYume has a load_romdata function that transforms various data types on the fly while loading.

That's just local temporary use. You're not statically holding a running value in PTRB.

Wuerfel_21 · 2025-05-10 19:01

@evanh said:

@Wuerfel_21 said:
In the emulators that would usually be found in the "upper" part. NeoYume has a load_romdata function that transforms various data types on the fly while loading.

That's just local temporary use. You're not statically holding a running value in PTRB.

But if the compiler was using PTRB I'd be trashing its value.

evanh · 2025-05-10 19:19

If it is used as a static system variable, the way the PTRA register is being used, sure.

But there is the intermediate option of reloading it for internal uses. The compiler already uses the PA register this way. Either PTRA or PTRB could be chosen to be reused this way. This approach has overhead, but not nearly as much as using neither register.

evanh · 2025-05-10 20:06

I use this reloading/sharing mechanism in the SD card driver. It nicely cleans up the multiple instances situation. It got refined more recently with the bug fixes in both v1.2 and v1.4. There were two things I needed to get right:

Pin DIR control is a special resource in that one cog can negatively impact the operation of all other cogs. This needed DIR lowered on all used pins upon end of each transaction - exiting each callback. Which meant equally making sure each pin was appropriately controlled again upon entering each callback.
Cog specific resources like SETSE1/WAITSE1 always get init'd upon entering each callback.
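A minimal host-side sketch of that enter/exit discipline, for illustration only. It is not the SD driver's code: `sim_cog_t`, `drv_instance_t`, `drv_enter`, `drv_exit` and `drv_transaction` are hypothetical names, and the DIR and SETSE1 state is simulated with plain variables rather than real P2 registers.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated per-cog state (assumption: stands in for real hardware). */
typedef struct {
    uint32_t dir_bits;    /* 1 = pin is being driven by this cog */
    uint32_t se1_config;  /* whatever SETSE1 was last programmed with */
} sim_cog_t;

/* Per-instance settings that must be re-applied on every entry. */
typedef struct {
    uint32_t pin_mask;    /* pins this driver instance owns */
    uint32_t se1_config;  /* event config this instance needs */
} drv_instance_t;

/* On entry: re-init cog-specific resources, then take control of the pins. */
static void drv_enter(sim_cog_t *cog, const drv_instance_t *inst)
{
    assert(inst->pin_mask != 0);
    cog->se1_config = inst->se1_config;
    cog->dir_bits |= inst->pin_mask;
}

/* On exit: lower DIR on every pin used, so other cogs are not disturbed. */
static void drv_exit(sim_cog_t *cog, const drv_instance_t *inst)
{
    cog->dir_bits &= ~inst->pin_mask;
}

/* One callback: nothing is assumed to survive from the previous call. */
static uint32_t drv_transaction(sim_cog_t *cog, const drv_instance_t *inst)
{
    drv_enter(cog, inst);
    uint32_t active_config = cog->se1_config;  /* the real work would go here */
    drv_exit(cog, inst);
    return active_config;
}
```

Two instances can then share one cog in turn: each call sees its own event config, and DIR is back to zero between calls.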

evanh · 2025-05-12 07:45

Of note: the sharing mechanism, or at least the SETSEx config, relies on cooperative handling. Pre-emptive handling wouldn't work without also adding a serialising mechanism on top of that. You can't do this sharing with an interrupt routine, for example.

EDIT: Hmm, on the other hand, a SETSE1 is probably going to be the IRQ source in that situation. It won't be sharing.

Christof Eb. · 2025-06-06 08:15

Hi,
the goal is to have a fast, simple XBYTE routine executing from COG or LUT memory that is still able to resolve to more complicated routines, either taken from the libraries or self-written.

Perhaps it is worth asking:
Is a small (!) XBYTE interpreter (in assembler) possible that resides permanently in COG/LUT? Perhaps a lookup table of 24 longs, plus code of something like 60 longs, plus about 10 static variables in cog registers.
This would be a state machine that is called from a COG executing C.
If it encounters a special bytecode, it returns from this routine and that bytecode is executed in the calling C routine. (A workaround for calling a C routine from PASM.)
Then the XBYTE interpreter must be restarted. Its state is preserved in the cog static variables.
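The control flow described above can be sketched in portable C, as a rough model only. This is not XBYTE and not flexspin-specific: the opcodes, `interp_state_t`, `interp_run`, `handle_escape` and `run_program` are all hypothetical names, and a static struct stands in for the cog register variables that would hold the interpreter state.

```c
#include <assert.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_ESCAPE = 3 };  /* made-up bytecodes */
enum { RUN_HALTED = 0, RUN_ESCAPED = 1 };
enum { STACK_DEPTH = 16 };

/* Interpreter state that survives between calls (stand-in for cog registers). */
typedef struct {
    const uint8_t *pc;
    int32_t stack[STACK_DEPTH];
    int sp;
} interp_state_t;

static interp_state_t g_state;

static void interp_start(const uint8_t *program)
{
    g_state.pc = program;
    g_state.sp = 0;
}

/* Runs until HALT or until an ESCAPE bytecode hands control back to C. */
static int interp_run(uint8_t *escape_arg)
{
    for (;;) {
        uint8_t op = *g_state.pc++;
        switch (op) {
        case OP_PUSH:
            assert(g_state.sp < STACK_DEPTH);
            g_state.stack[g_state.sp++] = *g_state.pc++;
            break;
        case OP_ADD:
            assert(g_state.sp >= 2);
            g_state.sp--;
            g_state.stack[g_state.sp - 1] += g_state.stack[g_state.sp];
            break;
        case OP_ESCAPE:
            *escape_arg = *g_state.pc++;   /* pc already points past the escape */
            return RUN_ESCAPED;
        default:                           /* OP_HALT or unknown: stop here */
            g_state.pc--;
            return RUN_HALTED;
        }
    }
}

/* The "complicated" routine that lives on the C side (hypothetical). */
static void handle_escape(uint8_t arg)
{
    if (arg == 7 && g_state.sp > 0)
        g_state.stack[g_state.sp - 1] *= 2;
}

/* Caller loop: restart the interpreter after every escape. */
static int32_t run_program(const uint8_t *program)
{
    uint8_t arg = 0;
    interp_start(program);
    while (interp_run(&arg) == RUN_ESCAPED)
        handle_escape(arg);
    return g_state.sp > 0 ? g_state.stack[g_state.sp - 1] : 0;
}
```

Because the program counter and stack live outside `interp_run`, re-entering it after the C handler simply continues with the next bytecode.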

"The first 16 registers of LUT memory (from $200 to $20f) is left free for use by user PASM code, e.g. for the streamer. The remainder of the first half of LUT memory (from $210 to $300) is used for any functions explicitly placed into LUT."
So 128 longs of space is available in LUT at a fixed address. We would need to place the code and the table there somehow.

"Inline assembly may also appear outside of any function. In this case the inline assembly block is similar to a Spin DAT section, and creates a global block of code and/or data."
Is there an example for this? How is this code called? - A call from inline assembler via hardware stack?

While I avoid using SPIN2, if necessary I could try to write the routine that calls the XBYTE interpreter in SPIN2.

Any hints greatly appreciated!
Christof

P.S.
1. It would be good to have addresses in cog memory printed in the list file for mixed source files like it is done for pure assembler files.
2. It would be great to have a description of the assembler language, including ORG, ORGH, $, @@@ somewhere in one place!

Edit:
The basic idea seems to work. Got it running. :-) What I don't understand is the distinction between those 16 longs for a table and the rest up to $300. I think that the table should hold ???50??? instructions or so. I understand that I cannot have any other "functions explicitly placed into LUT".

Christof Eb. · 2025-06-06 10:19

Hi,
how can I do " extern register int *interpreter_pc; " in a SPIN2 file?
That's a static variable inside COG register memory.
Thanks a lot!
Christof

Edit: Seems I can #define names to PR0...7

Christof Eb. · 2025-06-10 14:22

@ersmith
I tried to bring a DAT section with PASM code from a SPIN2 file into a __pasm { section in a C file.
It does not like --ptrb in "rdlong tos, --ptrb" and also not "long 0[64]". It does not complain about ptrb++ in "wrlong tos, ptrb++" though.

ersmith · 2025-06-10 16:31

@"Christof Eb." It certainly should be possible to put an XBYTE interpreter into LUT and/or COG memory (you'll need to use some of the LUT for the XBYTE lookup table, but the top half of LUT is available for this). In theory the interpreter could be written in a C __pasm block or in inline __asm inside a COG or LUT resident C function. In practice this hasn't been tested very well, as you've discovered (thanks for the bug reports, I have checked in a fix for --ptrb and long 0[64]).

Spin2 does not have any equivalent of extern register, although as you discovered it does have some predefined variables that are always resident in COG memory.

evanh · 2025-06-21 17:42

Eric,
Is there a general rule I can use for calculating a safe area of cogRAM that some inline hubexec pasm could use as a block of local registers it copies into from a struct? Basically it's the same old parameter set as before but in this case I can't do a RES allocation to copy into because the code is not Fcache'd. It's actually been a niggle since resolving to using RES for the other routines. I'd ignored this one case during that effort.

This one routine doesn't make use of the FIFO like the others do, so I've kept it as hubexec. It feeds the streamer directly by code, from a call parameter, instead of fetching from hubRAM. It can easily keep up with sysclock/2 transfer rate because it is the CMD bits which are always clocked out one bit at a time. As opposed to four bits per clock for DAT.

At the moment the driver just has a guessed cogRAM start address of 0x70. Far enough in that it stays clear of compiled locals ... ? It's still using the old enum solution for the moment. It only needs six registers so using PR0..PR7 would be an option but I'd rather avoid consuming those.

struct cmd_parms_t {
    uint32_t  p_clk, p_cmd, m_align, v_nco, m_ca, m_se1;
} cmdset;

enum {    // address hack to land presets in the Fcache area, tx_command() isn't assigned to Fcache itself
      rp_clk = 0x70, rp_cmd, rm_align, rv_nco, rm_ca, rm_se1
};

Wuerfel_21 · 2025-06-21 19:49

I think you can generally use the cogRAM starting at zero. That's where the FCACHE blocks load to, so if you're in a const asm block, that should be clobber-able. (Debugger calls also clobber these for some reason.)

ersmith · 2025-06-22 15:16

@evanh said:
Is there a general rule I can use for calculating a safe area of cogRAM that some inline hubexec pasm could use as a block of local registers it copies into from a struct? Basically it's the same old parameter set as before but in this case I can't do a RES allocation to copy into because the code is not Fcache'd. It's actually been a niggle since resolving to using RES for the other routines. I'd ignored this one case during that effort.

As Ada said, the low block of cog memory (starting at 0) is actually what is free, the local variables start at $100 or so. At the moment that memory is in practice available for user use, but I'd suggest avoiding this: someday I'd like to make the fcache truly be a cache, in which case corrupting that memory would be bad.

We document that $1e0 to $1ef is always available for user use (PR0-PR7 are $1e0-$1e7, but those exist for the user to do things with, there's no need to avoid them). The first 16 registers of LUT memory ($200-$20f) are also always available for use by the streamer. This is documented in general.md, in the "Memory Map" section.
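For reference, the ranges quoted above can be written down as constants with a small range check. This is only a sketch built from the addresses in this post; the names (`COG_USER_FIRST`, `pr_reg_addr`, `block_fits_cog_user_area`, and so on) are made up for illustration and are not flexspin identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Addresses as stated in the post; names are hypothetical. */
enum {
    COG_USER_FIRST = 0x1e0,  /* documented user area: $1e0..$1ef */
    COG_USER_LAST  = 0x1ef,
    COG_PR0        = 0x1e0,  /* PR0..PR7 occupy $1e0..$1e7 */
    COG_PR7        = 0x1e7,
    LUT_USER_FIRST = 0x200,  /* first 16 LUT registers: $200..$20f */
    LUT_USER_LAST  = 0x20f
};

/* Address of PRn, for n in 0..7. */
static uint32_t pr_reg_addr(int n)
{
    assert(n >= 0 && n <= 7);
    return (uint32_t)COG_PR0 + (uint32_t)n;
}

/* True if addr is in one of the two documented always-available ranges. */
static bool is_documented_user_reg(uint32_t addr)
{
    return (addr >= COG_USER_FIRST && addr <= COG_USER_LAST) ||
           (addr >= LUT_USER_FIRST && addr <= LUT_USER_LAST);
}

/* True if a block of `count` registers starting at `first` lies entirely
   inside the documented cog user area. */
static bool block_fits_cog_user_area(uint32_t first, uint32_t count)
{
    if (count == 0 || first < COG_USER_FIRST || first > COG_USER_LAST)
        return false;
    return count <= (uint32_t)COG_USER_LAST - first + 1u;
}
```

A six-register block, as asked about above, would fit at $1e8..$1ed, whereas a block at 0x70 falls outside the documented area even if it happens to work today.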

Wuerfel_21 · 2025-06-22 16:01

@ersmith said:
At the moment that memory is in practice available for user use, but I'd suggest avoiding this: someday I'd like to make the fcache truly be a cache, in which case corrupting that memory would be bad.

In such a case I guess there'd have to be an intrinsic to invalidate the cache. Maybe that can be added as a no-op already for future-proofing...

evanh · 2025-06-22 16:36

I don't consider the driver as user level, hence not keen to mess up $1e0..$1ef. The need is for driver internal presets that change at runtime. Primarily for handling pin I/O latencies under various conditions like sysclock frequency and divider ratio. Also for handling a difference in clock-streamer alignment when SD High-Speed access mode is engaged.

I did try using cogRAM $0 initially, way back, but it crashed ...

EDIT: Hmm, it's working at address $0 just fine now. I've even tried the Spin2 tester with it. Enabling debug is no issue, but then my SD testers all use the libc printf() these days anyway. Not much value in trying to maintain Pnut compatibility when using a Flex specific driver.

I now guess I'd messed up earlier.

evanh · 2025-06-22 16:52

I could just put the command gen routine into Fcache, the same as the rest. Then it can make use of the more formal RES allocation scheme.

evanh · 2025-06-22 17:06

Damn, it seems to work fine even when I use $1f0 or $100. Definitely has a problem if I use $1f8 though. I started to wonder if I was even editing the right file for a moment.

Wuerfel_21 · 2025-06-22 17:12

$1f0..$1f5 is the IRQ call/return pointers, so they're technically unused. $1f6 is PA, which will be trampled on FCACHE entry and possibly some other operations.

evanh · 2025-06-22 17:23

@Wuerfel_21 said:
$1f0..$1f5 is the IRQ call/return pointers, so they're technically unused.

Huh, I didn't look them up did I. Tempting but I'll stick with address $0 for the moment.

Rayman · 2025-06-24 11:04

@ersmith Thinking that I'm going to need to use an 8-port version of fullduplex serial for a PLC project... Does one need to worry about the built-in serial driver interfering with that on pins 62 & 63? Does it need to be turned off somehow? Or maybe it is not invoked until called upon?

Now that I think about it some more, maybe use the 8-port driver for all the other ports and use the built-in one for 62 & 63. I guess the only downside there is that the built-in one doesn't have buffers like the cog'd 8-port driver. Buffers could be a big benefit perhaps, although I haven't needed them yet...

Now wondering if there could be a multi-port serial driver that doesn't need a cog... I.e., just more of the smartpin-based drivers...
Probably safer to just use the cog'd version though...

evanh · 2025-06-24 11:14

Leave pins 62/63 for debug/telemetry/terminal access. The extra comports can do their own thing then.

BTW: Modbus, using RS485 interface, can have many slaves on a single master port. Don't have to use one port per slave.

Rayman · 2025-06-24 11:36

What it's for is some pressure gauges with USB output. Am using FTDI VDrive3 to convert USB into serial that is sent to P2. Going to have 2 or 3 of those things in the system...

Also, might want another port for Nextion display...

evanh · 2025-06-24 12:36

Looking up VDrive3 ... gives me a datasheet for Vinculum2 ... which contains this I/O list:

It has the entry "uart_tx_active" explicitly for RS485 turnaround control. Seems the VDrive3 should be able to handle being on a multi-drop RS485 bus with multiple other devices.

ersmith · 2025-06-24 13:56

If you don't use any of the built in serial code or debug then it shouldn't touch the pins. But if you do want to use debug then it's best to leave the standard pins alone and let the 8 port driver handle the other pins.
