State machine vs. context switching
I'm currently writing a driver that handles multiple peripheral devices which require a complex initialization procedure. There may be hundreds of them and each one can be different. Doing the procedure one-by-one and step-by-step would take a long time because I have to wait for the network communication after each step. So I have to process the devices concurrently. There is no way to do this with cogs because I would soon run out of them.
The common way to handle this is with state machines. This means that the code has multiple big and nested switch/case statements to decide what has to be done next for each device and for each network packet. This makes the code look ugly.
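To make the state-machine approach concrete, here is a minimal sketch in C. The three init steps, the `Device` struct and the `reply_ok` flag are hypothetical placeholders, not taken from the actual driver; the point is only that each call advances a device by at most one step and never blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical init steps; a real device would have many more. */
typedef enum { ST_ASSIGN_ADDR, ST_SETUP_BUFFERS, ST_SYNC_CLOCK, ST_DONE } InitState;

typedef struct {
    InitState state;     /* request this device is currently waiting on */
    bool      reply_ok;  /* assumed to be set by the network layer on acknowledge */
} Device;

/* Advance one device by at most one step. Never blocks. */
void device_step(Device *d)
{
    if (d->state == ST_DONE || !d->reply_ok)
        return;                    /* finished, or still waiting for the acknowledge */
    d->reply_ok = false;           /* consume the acknowledge */
    switch (d->state) {
    case ST_ASSIGN_ADDR:   d->state = ST_SETUP_BUFFERS; break; /* send buffer setup request here */
    case ST_SETUP_BUFFERS: d->state = ST_SYNC_CLOCK;    break; /* send clock sync request here */
    case ST_SYNC_CLOCK:    d->state = ST_DONE;          break;
    case ST_DONE:          break;
    }
}

/* One pass over all devices; returns how many are still initializing. */
size_t poll_all(Device *devs, size_t n)
{
    assert(devs != NULL);
    size_t pending = 0;
    for (size_t i = 0; i < n; i++) {
        device_step(&devs[i]);
        if (devs[i].state != ST_DONE)
            pending++;
    }
    return pending;
}
```

The main loop would call `poll_all()` repeatedly until it returns 0. With different device types, each type needs its own switch, which is where the nesting comes from.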
The alternative method would be to do some sort of "soft" multitasking. The idea is to start a process for each device. This way the initialization procedure can be coded sequentially. Each process simply runs from the beginning to the end. Each time it needs to wait it calls a ContextSwitch() function that transfers control to another process by exchanging the stack pointer so that the return from ContextSwitch() jumps to a different location (the point where the other process called ContextSwitch() last time).
This works similarly to the RESIx instructions: the next time the interrupt is invoked, the ISR is not executed from the beginning but resumes after the last RESIx instruction.
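For comparison, the "resume after the last wait point" behaviour can be approximated in plain C without touching the stack pointer. This is a different technique from the stack exchange described above: it stores a resume point per task and jumps to it with a `switch` (the trick used by protothread-style libraries). The macro names, the `Task` struct and `take_reply()` are made up for this sketch. Local variables do not survive across a wait, so anything that must persist has to live in the task struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int  resume_line;  /* 0 = start from the beginning, -1 = finished */
    bool reply_ok;     /* assumed to be set by the network layer on acknowledge */
    int  steps_done;   /* persistent state lives here, not in locals */
} Task;

/* Hypothetical macros: remember the current source line and resume there next call. */
#define TASK_BEGIN(t)      switch ((t)->resume_line) { case 0:
#define TASK_WAIT(t, cond) do { (t)->resume_line = __LINE__; case __LINE__: \
                                if (!(cond)) return false; } while (0)
#define TASK_END(t)        } (t)->resume_line = -1; return true

/* Returns true once and clears the flag if an acknowledge has arrived. */
static bool take_reply(Task *t)
{
    if (!t->reply_ok) return false;
    t->reply_ok = false;
    return true;
}

/* Reads sequentially, but returns to the caller at every wait. */
/* Returns true when the whole sequence has finished. */
bool init_task(Task *t)
{
    assert(t != NULL);
    TASK_BEGIN(t);
    /* send first request here */
    TASK_WAIT(t, take_reply(t));
    t->steps_done = 1;
    /* send second request here */
    TASK_WAIT(t, take_reply(t));
    t->steps_done = 2;
    TASK_END(t);
}
```

Each device gets one `Task`, and the main loop calls `init_task()` on all of them in turn. The per-task cost is a few bytes rather than a full stack, at the price of the restrictions noted above.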
Question @ersmith: Is this possible at all? I mean, when I use FlexProp can I assume that all context information is on the stack or is there some hidden state stored somewhere else (cog registers, LUT etc.)?
Comments
A system runs in real time if and only if the computational power of said system is higher than the requirements of the application. In the dark times I implemented a multitasking system in FORTRAN 60 that ran a loop calling short processes, and all those processes exchanged information via "semaphores". Every process, once started, first checked whether it had been requested and, if not, returned immediately. Since then I have used this technique over and over, even in multi-processor/multi-core systems. Prioritizing is done simply by putting more calls to higher-priority tasks into the loop, depending on the worst-case runtime between such calls.
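A rough C sketch of that loop, under the assumption that a "semaphore" is just a request flag per task. The names `Task`, `task_run` and `run_schedule` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    volatile bool requested;  /* the "semaphore": set by whoever wants the task to run */
    int           runs;       /* counts how often the task did work (illustration only) */
} Task;

/* A task checks its request flag first and returns at once if there is nothing to do. */
void task_run(Task *t)
{
    if (!t->requested)
        return;
    t->requested = false;
    t->runs++;                /* the short piece of real work would go here */
}

/* One pass through the schedule; entries are indices into tasks[]. */
/* A task listed more often is polled more often, i.e. has higher priority. */
void run_schedule(Task *tasks, size_t n_tasks, const int *schedule, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        assert(schedule[i] >= 0 && (size_t)schedule[i] < n_tasks);
        task_run(&tasks[schedule[i]]);
    }
}
```

With a schedule such as `{0, 1, 0, 2}`, task 0 is checked twice per pass, so its worst-case latency is roughly half that of tasks 1 and 2.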
There is state in the COG (e.g. most local variables are in cog registers). Certainly just hacking the stack pointer will not be enough to context switch. In theory setjmp/longjmp should work to save/restore the state you need (that's the "official" C way to do context switching) but in practice it hasn't been tested much, so it may or may not work for you.
I wrote two simple classes for Taqoz Reloaded, the Forth engine for the P2, based on the Mini-OOF library.
1. Timers for Mini-OOF - enables a list of jobs to be run each with their own independent time period. The example shows a small list of jobs intermeshing in time.
2. State machine for Mini-OOF allows a list of state machines to run in turn, each one state at a time, e.g. SM1 state 1, SM2 state 1, SM3 state 1, SM1 state 2, SM2 state 2 etc.
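For readers who don't use Forth, the interleaving in item 2 might look roughly like this in C. This is not the Mini-OOF code, just an illustration with made-up names; each machine's "state" is reduced to a counter, and the order of execution is written to a log so it can be inspected.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int id;        /* e.g. 1 for SM1 */
    int state;     /* next state to run, counted from 1 */
    int n_states;  /* machine is finished once state > n_states */
} Machine;

typedef struct { int id, state; } LogEntry;

/* Run every unfinished machine for exactly one state per round. */
/* Records the execution order in log[] and returns the number of entries written. */
size_t run_interleaved(Machine *m, size_t n, LogEntry *log, size_t log_cap)
{
    assert(m != NULL && log != NULL);
    size_t count = 0;
    int active = 1;
    while (active) {
        active = 0;
        for (size_t i = 0; i < n; i++) {
            if (m[i].state > m[i].n_states)
                continue;                 /* this machine has finished */
            if (count < log_cap) {
                log[count].id = m[i].id;
                log[count].state = m[i].state;
                count++;
            }
            m[i].state++;                 /* the real per-state work would go here */
            active = 1;
        }
    }
    return count;
}
```

For three machines with two states each, the log comes out as SM1 state 1, SM2 state 1, SM3 state 1, SM1 state 2, SM2 state 2, SM3 state 2.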
Hope that stirs some ideas.
Cheers, Bob
@ErNa Real-time performance is not the problem here. The initialization procedures have to be completed before doing any real-time control tasks. Once this is done the communication is very efficient. The bits and bytes relevant to the real-time control loop ("process data") of ALL devices can be mapped to a single logical address space and transmitted all in a single network frame. But before this can take place there's an awful lot of different things to do. Device addresses have to be assigned, buffers allocated with special interlock protections ("sync managers" control double buffering with mutexes), clocks have to be synchronized and process data has to be mapped to the address space (FMMU). Every action has to be processed by the external device and acknowledged. So we have to wait a lot in each process anyway. This is a good application for co-operative multitasking.
For true real-time processing you generally need pre-emptive multitasking and prioritization. But that's a whole different story. Fortunately, the hard real-time tasks normally need far fewer resources and can be implemented with cogs on the Propeller, which offer true parallelism.
@ersmith Thanks for the advice. I'm not sure context switching would have been a good idea for this problem anyway. Because there might be a large number of devices, and therefore processes, I would have to be very stingy with stack space or I would easily run out of memory. But stack overflows are something you definitely want to avoid in machine controllers, where a crash can have expensive consequences. So I think I'll do it the traditional way and use state machines. I was just curious whether it was possible, because it could be an elegant solution for some problems.
Forth is no option here, but interesting anyways. I've also implemented such a timer device for the PC side of my CNC software. For the current project (EtherCAT) I hope to get away with a single, fixed cycle time period. Did you use a simple linear priority queue or some more sophisticated container like a map, binary tree or heap to improve insertion and search performance when the number of queue entries becomes large?
No - just a hard-coded list. This was naive code to prove that interleaved timers and state machines could be done without needing a traditional multitasking engine.