FlexSpin: is mixed HUB/LUT exec possible? (again)
The subject is almost identical to this thread, but the application and requirements are different this time, so I have started a new thread.
I'm about to implement the EtherCAT master, which will be strongly connected to the Ethernet PHY driver. The driver already uses two cogs, so to avoid wasting too many cogs I plan to use the sender cog of the Ethernet driver to also do most of the protocol handling for EtherCAT. The receiver cog needs to be ready for an incoming packet at any time. Sending and receiving of packets overlap, so we need full duplex and at least two cogs.
Bit-banging the RMII protocol requires maximum performance and needs to be coded in assembler. Parsing the command queue and building the Ethernet frames has to deal with a lot of data structures, containers and buffers. So coding this in C would be most convenient.
So my plan is to place the sender part of the Ethernet driver permanently in LUT RAM and call it from the C code. I also need some cog RAM/registers for scratch space and at least one parameter to pass to the assembler code. This time I don't need interrupts and the assembler code doesn't use the CORDIC unit.
Is this possible? I think I have to disable the FCACHE feature or at least limit it to the LUT space I do not use.
Comments
There is a nice attribute on function definitions for fixed assignment of a function to either cogRAM, __attribute__((cog)), or lutRAM, __attribute__((lut)). I assume it takes away from the available Fcache space but with sparing use it should be a solution. Funnily, it would be the perfect way to effect an ISR in C too, e.g.:
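A minimal sketch of what such an attributed function might look like. This is not the code from the post above; the function name sum_words and the LUT_FUNC macro are made up for illustration, and the placement of the attribute after the declarator is an assumption based on the FlexSpin general documentation. The macro expands to nothing on other compilers, so the same source can be tried out on a PC.

```c
#include <stdint.h>

/* Assumption: FlexC predefines __FLEXC__. On any other compiler the
 * attribute is dropped, so the function is just ordinary C. */
#ifdef __FLEXC__
#define LUT_FUNC __attribute__((lut))
#else
#define LUT_FUNC
#endif

/* Hypothetical helper: adds up 'count' 16-bit words from 'buf'.
 * With FlexC, the attribute asks for the function to be kept
 * permanently in LUT RAM instead of being executed from hub RAM. */
uint32_t sum_words(const uint16_t *buf, int count) LUT_FUNC
{
    uint32_t sum = 0;
    for (int i = 0; i < count; i++) {
        sum += buf[i];
    }
    return sum;
}
```

Called from normal hub-resident C code it looks like any other function call, e.g. sum_words(frame, 32).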
Ah, that's very interesting. But it doesn't help in my case. I need the C code to stay in hub RAM and the assembler code in LUT RAM, as it normally does. I'm just looking for a way to protect my assembler code from being overwritten by the FCACHE feature. And I'm looking for the documentation which hopefully grants some free cog registers which are not used by compiled code.
Anyway... I think the best method is to start coding and not try to optimize too much at the beginning. I'll simply use 3 cogs. That makes everything a lot easier because I don't need to modify the ASM part each time I change the data structures, which would be error prone. It also doesn't restrict the use of DEBUG or printf() statements, which is handy during development.
When I'm done I'll know exactly how large the C and assembler code is and how many registers I need. Then I can decide which option I choose:
1. Translate the C code manually to assembler so that everything (the sender PHY and the protocol part) runs homogeneously together in one cog without any tricks. This is a lot more work but would give the best performance. If there is not enough LUT/COG RAM I could also place the less time-critical parts in hub RAM.
2. Try to use mixed hub and LUT execution. Advantage: more readable and maintainable code. Disadvantage: not portable to other compilers, restricted use of DEBUG/printf()
Well, C-wrapped inline pasm will work that way, i.e.:
There might be a second way to apply the above attributes to pure pasm too. There is supposed to be a way to compile a pure pasm section that was originally for the purpose of loading into its own cog. But it could possibly be repurposed for this too.
Yep, that sounds wise.
Yes I know, inline pasm is loaded to LUT RAM. This is perfectly OK for short code snippets. But I just want to avoid that it's re-loaded each time I call the driver because the code is relatively long and I call it quite often. Instead, it should stay there.
Inline ASM is loaded into low cog ram, LUT is left alone.
No, when it's attributed like I've done, the pasm is permanently loaded in cogRAM or lutRAM as specified. It is not Fcached at all then.
You can even make it an __asm const {} and it still gets placed in the specified RAM. It's just left unoptimised then.
Functions marked __attribute__((cog)) or __attribute__((lut)) are permanently kept in the respective local memories.
Ah, thanks Eric. Now that you've mentioned it I found it right away in the General doc. Chapter "Functions in COG or LUT memory":
and Chapter "Register usage - P2":
So I think that's all I need. So if I put my assembler code inside an __asm { } block and that in turn inside a function marked as __attribute__((lut)), it should be placed permanently in LUT RAM. I think I could even use function arguments and local variables to pass parameters and return values, right?
Yes.
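For anyone finding this later, the arrangement just described might be sketched roughly as follows. Everything here is illustrative: lut_add_offset and LUT_FUNC are hypothetical names, the two PASM instructions merely stand in for a real RMII sender, and the attribute placement and __asm syntax are my reading of the FlexSpin docs rather than tested code. The plain C fallback is only there so the snippet also builds with other compilers.

```c
#include <stdint.h>

/* Assumption: FlexC predefines __FLEXC__; other compilers see no attribute. */
#ifdef __FLEXC__
#define LUT_FUNC __attribute__((lut))
#else
#define LUT_FUNC
#endif

/* Hypothetical stand-in for the driver entry point. The argument and the
 * local variable are referenced by name inside the inline assembly, which
 * is how a parameter goes in and a result comes back out. */
uint32_t lut_add_offset(uint32_t base, uint32_t offset) LUT_FUNC
{
    uint32_t r;
#ifdef __FLEXC__
    __asm {
        mov r, base      /* copy the first argument into the local */
        add r, offset    /* add the second argument */
    };
#else
    r = base + offset;   /* fallback with the same result, for PC builds */
#endif
    return r;
}
```

The hub-resident C code would then call it like any other function, e.g. lut_add_offset(buffer_start, 14).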