LUT execution (Fastspin)
ManAtWork
I tried to put the code of some functions that I call often into LUT RAM to speed up execution. However, I'm obviously doing something wrong: the program crashes immediately unless I comment out the call ##0x200. I've reduced everything to the absolute minimum to demonstrate it.

#include <stdio.h>
#include <propeller.h>
#define P2_TARGET_MHZ 180
#include "sys/p2es_clock.h"
#ifndef _BAUD
#define _BAUD 230400
#endif

__asm { // runs in LUT RAM
//ORG 0x200 // compiler complains about ORG but doesn't matter as long as code is PC relative
Lut_code
// put code here later, for now we just return...
    ret
}

int InitLut ()
{
    int32_t* s= &Lut_code;
    int32_t* d= 0x00;
    int i= 256;
    int tmp1;
    __asm {
.loop   rdlong tmp1,s
        wrlut tmp1,d
        add s,#4
        add d,#1
        djnz i,#.loop
        rdlut i,#0 // read back the LUT contents to verify
    }
    return i;
}

int CallLut (int in)
{
    __asm {
        call ##0x200
    }
    return in;
}

void main()
{
    clkset(_SETFREQ, _CLOCKFREQ);
    _setbaud(_BAUD);
    int i= InitLut ();
    printf ("after Init i=%08x\n", i);
    i= CallLut (1);
    printf ("after Call i=%08x\n", i);
}

If I comment out the call the output is "i=fd64002d" and "i=00000001". The first number is the ret instruction after the Lut_code label. So I think my code is copied correctly to LUT RAM. What else can be wrong? The docs say
LOOKUP EXECUTION
When the PC is in the range of $00200 and $003FF, the cog is fetching instructions from cog lookup RAM. This is commonly referred to as "lut execution mode." There is no special consideration when taking branches to a cog lookup address,
I know that I can't use the LUT when compiling with -O2 because the compiler will use the LUT itself. But I've checked the .p2asm file and haven't found any conflicts. My djnz instruction gets optimized into a rep (!) but everything else is compiled as expected.
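To make the addressing in the post concrete, here is a rough host-side model in plain C (not Propeller code) of the copy loop and of the address range quoted from the docs. It assumes the LUT is written by long index starting at 0, as the wrlut loop does, and that index 0 is fetched when the PC is $200, which is what the quoted range suggests. The names lut_model, copy_to_lut_model, lut_index_to_pc and pc_is_lut are made up for illustration; the constant 0xFD64002D is the ret encoding reported in the post.

```c
#include <assert.h>
#include <stdint.h>

#define LUT_LONGS   512u         /* $200..$3FF covers 512 long addresses */
#define LUT_PC_BASE 0x200u       /* first PC value that fetches from LUT */
#define LUT_PC_LAST 0x3FFu       /* last PC value that fetches from LUT */
#define P2_RET      0xFD64002Du  /* "ret" encoding as read back in the post */

/* Host-side stand-in for the cog's lookup RAM (illustrative only). */
static uint32_t lut_model[LUT_LONGS];

/* Models the rdlong/wrlut loop: the source advances one long (4 bytes)
 * per step while the LUT index advances by 1. */
static void copy_to_lut_model(const uint32_t *src, unsigned first_index,
                              unsigned count)
{
    assert(first_index + count <= LUT_LONGS);
    for (unsigned n = 0; n < count; n++) {
        lut_model[first_index + n] = src[n];
    }
}

/* LUT index (as used by wrlut/rdlut) to the PC value that executes it. */
static unsigned lut_index_to_pc(unsigned index)
{
    return LUT_PC_BASE + index;
}

/* True when a PC value falls inside the LUT execution range. */
static int pc_is_lut(unsigned pc)
{
    return pc >= LUT_PC_BASE && pc <= LUT_PC_LAST;
}
```

In this model, copying an image whose first long is P2_RET to index 0 and reading lut_model[0] back mirrors the rdlut check in InitLut, and lut_index_to_pc(0) gives the $200 target used by the call.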
Comments
tl;dr; use
Btw it uses the internal stack.
Doh!
I've needed absolute addresses so rarely that I have completely forgotten the use of "\".
That's a bug in the inline assembly (and it will only happen for inline assembly). It's fixed on GitHub now.
If you cannot build fastspin from source, I suggest you wait a few days for the next release. It will have a way to copy inline assembly to LUT automatically (__asm volatile will do this).
BTW, what do I need to build fastspin from source? Unlike FlexGUI it shouldn't need any special libraries. It's a console application that can be compiled with MinGW, isn't it?
I guess this is handled like -O2 optimization. I mean the code is copied into LUT each time before execution. What I'm currently looking for is a feature to copy code to LUT only once and then call it multiple times. If the code is called often but contains no loops the LUT execution would otherwise not give much benefit.
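The "copy once, call many times" idea can be sketched as a simple load-once guard. This is only a host-side illustration in plain C, not a fastspin feature: load_lut_image and lut_routine are hypothetical stand-ins for the copy loop and for the code that would live in LUT, and the sketch assumes nothing else overwrites the LUT between calls (which, as noted in the original post, is not guaranteed when the compiler uses the LUT itself at -O2).

```c
#include <stdbool.h>

/* Counts how often the (hypothetical) copy ran; only for illustration. */
static int  lut_load_count = 0;
static bool lut_loaded = false;

/* Hypothetical stand-in for the rdlong/wrlut copy loop. */
static void load_lut_image(void)
{
    lut_load_count++;
}

/* Hypothetical stand-in for the routine that would execute from LUT. */
static int lut_routine(int in)
{
    return in;
}

/* Copies the code on the first call only, then just calls it. */
static int call_lut_once_loaded(int in)
{
    if (!lut_loaded) {
        load_lut_image();
        lut_loaded = true;
    }
    return lut_routine(in);
}
```

With this pattern the copy cost is paid once, so even a short routine without loops could benefit from repeated calls.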
Putting functions into COG or LUT memory is on my TODO list, but it's not as easy as hijacking the FCACHE mechanism to load inline assembly before executing it. Functions in internal memory will take a while.