Automatic Stack Allocation In Flexprop And Catalina?

Wingineer19 · 2023-08-27 23:00

Hi ersmith and RossH,

Considering how critical Stack allocation is to proper operation when starting new cogs, it should be at the top of every coder's checklist when writing programs that take advantage of the multi-cog capability. It should also be at the top of the fault list to examine if the code fails to execute or crashes.

However, determining the actual Stack size required for a cog is difficult, because just totaling the number of bytes used by the auto variables within the function(s) executed by a particular cog and using that value as the Stack size doesn't work. There's obviously more going on with the compilers and how they assign Stack space.

How does your compiler allocate the Stack space for the initial cog upon power-up? The user doesn't need to do anything for that cog; it just starts up and executes code.

Could a similar technique be used to automatically create the required Stack space when other cogs are activated?

Given the critical nature of Stack allocation, in a perfect world the compilers would allocate the Stack space automatically when new cogs are started and then reclaim that space should the cog ever be stopped.

Not having to worry about Stack allocation would mean coders clear a big hurdle right at the outset of the coding process, without even noticing it was there.

Could you provide your thoughts on this and the best way to calculate required Stack allocation if the compilers can't be configured to do that automatically?

RossH · 2023-08-28 00:12

Hello @Wingineer19

It's all fairly manual at the moment. The best thing I have found to do is instrument the code to print out the frame pointer, stack pointer or a variable address at various points, use these to calculate the stack size, and then allow a bit extra to account for things that you can't easily instrument and measure (such as library calls).

Here is a quick example of how to do this in Catalina - this works on a P1 or P2, in all modes except COMPACT (which uses different PASM):

#include <stdio.h>

// NOTE: we need to allocate a local variable to ensure a function has a 
// frame, and then USE it to ensure the compiler does not optimize it away!
// PASM() is Catalina-specific; here its result is the value left in r0.
void func(void) {
   char buff[100];
   printf("func FP = %u\n", PASM (" mov r0, FP\n"));
   printf("func SP = %u\n", PASM (" mov r0, SP\n"));
   printf("func &buff = %u\n", (unsigned)&buff);
}

int main(void) {
   char buff[100]; 
   printf("main FP = %u\n", PASM (" mov r0, FP\n"));
   printf("main SP = %u\n", PASM (" mov r0, SP\n"));
   printf("main &buff = %u\n", (unsigned)&buff);
   func();
   return 0;
}

I did once write a compiler that could calculate the runtime stack requirements of the programs it compiled, but it was for a much simpler language (a type of ladder logic used in Programmable Logic Controllers) that did not allow recursion. As far as I know, once you allow complex things like recursion all bets are off, and the only thing you can do is actually measure it in specific instances.

Another possibility would be to prefill the stack RAM with a known value that would never be used as an address, run the program, and see how much of the stack remains untouched. In a stack-based language the return address (at least) will be written to the stack even when there are no local variables. This would also be a simple way to include the stack requirements of library function calls.

It occurs to me that on the Propeller 2, the debug capabilities might provide a possible way to measure it in particular instances without having to change the program itself at all - but I have never used the debug facilities, so I am not sure. Also, this capability does not exist on the Propeller 1.

Ross.

RossH · 2023-08-28 00:26

To answer your question about the initial stack pointer - Catalina does not calculate it. It simply allocates the memory required by all the loaded plugins from the top of Hub RAM down during program initialization, and then starts the user program using the next available address as the initial stack pointer. If there is not enough stack space to run the program (a fairly common occurrence on the P1!) then the stack may end up overwriting the program code or data.

Adding code to check this would certainly be possible, and I will think about adding it as an option to Catalina as a debugging aid - but it would take more time and code space, so it would not be something you would want to leave enabled all the time - and in some cases it could not be used at all.

It also means you would probably need double the number of pre-compiled versions of the standard C libraries**, and Catalina has 7 of those already!

Ross.

EDIT: ** On the P2 I could do it entirely in the kernel, but on the P1 (where it would be the most useful) the kernel space is very, very tight, and I would have to sacrifice something else to fit it in.

ersmith · 2023-08-29 12:04

Like Catalina (and PropGCC, and probably all the other C compilers for the P1 and P2) FlexC allocates the initial default stack as being basically "all the memory that's left over", which is fine for the initial thread but doesn't scale to having more COGs.

I have wanted to add automatic stack size calculations, but haven't had the time yet. Recursive functions certainly pose a problem, but they're a relatively rare case in this scenario, and we'd want to print a warning. As a first step we could add a new compile time builtin _Stacksize(f) which would compute an estimate for the stack size of the function f (and if f is recursive it would do something like quadrupling the initial estimate and printing a warning).

Wingineer19 · 2023-08-30 00:00

Eric, Ross,

Thanks for your commentary on this.

I didn't give much thought to the Stack space issue until recently when I was running the Multi-Memory-Model supported by Catalina on a Prop1. One cog is configured to execute XMM code from external memory with HubRam reserved to only handle its Stack. Several other cogs are executing CMM code from HubRam which must also contain their code, data, and Stack. These CMM cogs communicate with the XMM cog via a shared data structure within HubRam.

When running only one or two cogs I could get by with a rough estimate of how much Stack space to assign to each. But with more cogs activated and accessing HubRam, manually assigning Stack space becomes critical and must be done carefully, especially on the Prop1 where memory is at a premium already.

If there's an abundance of memory, then assigning an excessive amount of Stack space shouldn't be an issue. However, with a very limited amount of memory like the Prop1, failure to assign sufficient Stack space to a cog invites disaster. Conversely, assigning too much is wasteful and reduces the amount of memory reserved for code and static data. So a balance must be achieved.

Significant time could be wasted attempting to debug what appears to be perfectly working code, only to find out later the whole problem was caused by a Stack overwrite.

If the compilers could automatically assign the proper amount of Stack space required by each cog it would certainly make the debugging process much easier since the overwrite problem could be ruled out.

Automatic Stack size allotment would be most beneficial on the Prop1 due to its memory constraints, but I think it would benefit the Prop2 as well. Ross previously mentioned to me that automatic Stack assignment would be beneficial not only to languages like C and Spin, but even more so to Forth.

Short of automatic assignment, some type of deterministic formula to allow manual assignment would be a good alternative. As I found out experimentally, just totaling up the number of bytes used by auto variables in all of the functions called by a cog, then setting that as the Stack space, was sadly insufficient. There appears to be some secret sauce within the compiler that requires an additional amount to be reserved, but I haven't figured out how or why.

RossH · 2023-08-30 02:54

@Wingineer19 said:
There appears to be some secret sauce within the compiler that requires an additional amount to be reserved, but I haven't figured out how or why.

It's complicated.

When you call a function, the minimum you need to push onto the stack for a simple function call is the return address (but even in the simplest case there are exceptions - see 'inlining' below). But in more complex cases you generally allocate what is called a "stack frame", which contains all the function arguments and local variables. Then if you have a stack frame you will have to push the previous frame pointer onto the stack as well. Even simple functions may need a stack frame - and it is not always obvious which ones will need one and which ones will not.

While this can be calculated by the compiler for simple (e.g. non-recursive) cases, some of these things can be changed after compile-time. For instance, the Catalina Optimizer can change any or all of these things (e.g. by 'inlining' a function instead of calling it, or by removing unused space from the stack frame). It does this even for library functions, without needing to recompile them.

So even if details about the stack size are known at compile-time, they may end up being wrong at run-time, which makes compiling them into the program itself risky.

Then there are miscellaneous other things that will definitely not be known at compile-time but only at link-time, such as additional stack space that might be needed to accommodate interrupts, or context switching in multi-threaded programs.

I will try and document this a little better, but I do not think it can ever be fully automated. Any attempt to do so will generally end up allocating too much stack space, so if you are running short of space you may still need to resort to manual "tweaking", in which case you are pretty much back to where you started. An initial "guess" is probably the most you can expect.

Ross.

evanh · 2023-08-30 22:32

Huh, I only just bothered to look up how a multitasking OS handles this. Turns out it's no different, just bigger numbers. And running out of stack is a common way to crash.
