Virtual interrupts using LMM?
Rayman
Posts: 14,815
Just out of curiosity, I wonder if one could add interrupts to LMM code... I haven't looked in depth at LMM, but it seems that the code doing the fetching could check the state of something or other to indicate an interrupt.
Then, maybe it could store the flag states and source location and jump to a different source code location?
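A minimal sketch of the idea, assuming a classic fetch/execute LMM loop (LMM program counter kept in a cog register called pc here) and a hub long - called int_req below, my name, not anything standard - that gets set to a handler's hub address when an "interrupt" is wanted. The loop polls the hub long after every instruction, and when it is non-zero it saves the flags and the LMM PC and redirects the PC to the handler:

LMM_loop
        rdlong  instr, pc           ' fetch the next LMM instruction from hub
        add     pc, #4
instr   nop                         ' execute it in place
        muxc    flags, #1           ' snapshot C before the poll disturbs anything
        muxnz   flags, #2           ' snapshot Z (stored inverted so TEST can restore it)
        rdlong  t1, int_req wz      ' poll the "interrupt" request long
 if_nz  jmp     #dispatch           ' non-zero: its value is the handler's hub address
        shr     flags, #1 wc, nr    ' no request: restore C...
        test    flags, #2 wz        ' ...and Z, then carry on
        jmp     #LMM_loop

dispatch
        mov     saved_pc, pc        ' remember where the LMM program was
        mov     saved_fl, flags     ' and its C/Z
        mov     pc, t1              ' vector: handler address becomes the new LMM PC
        wrlong  zero, int_req       ' acknowledge so we don't immediately re-enter
        jmp     #LMM_loop           ' resume fetching, now inside the handler

' a matching "return" primitive would restore pc from saved_pc and C/Z from
' saved_fl; a real kernel would also mask further requests while the handler runs

zero     long   0
int_req  long   0                   ' hub address of the request long, set at init
flags    long   0
saved_pc long   0
saved_fl long   0
t1       long   0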
Comments
Of course one could add more LMM kernel services as necessary that would
run natively; having an asynchronous mechanism would also be useful.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed ... about living in http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders always get here fast 8)
Two options: add a check on an I/O pin or hub memory location for every fetch/execute of an LMM instruction, but that could halve the speed of LMM execution; or only check for an interrupt after a jump into the LMM kernel, which would give the fastest LMM linear code execution but longer latency. Most LMM code will use a jump into the LMM kernel, but not all (e.g., immediate adds/subs on the LMM PC to do branching wouldn't).
We can't really do a real-time PC load on a pin-state change anyway.
Of course, having IPC for embedded apps, given a multi-tasking ability, would be useful.
But yes, one could have a configurable "Native" poll routine that would at least be able
to quickly check for state changes (less than 200us?) and raise status for the LMM.
It seems like the end jump of the unrolled kernel would be a fair place to put the flag
check and context switch (if enabled), since there will be a sync loss there anyway.
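And a sketch of that placement - the poll moved to the end jump of an unrolled loop, so straight-line LMM code pays nothing per instruction and the worst case is one pass through the loop before the check is seen (int_req, t1 and dispatch as in the earlier sketch; the C/Z save/restore around the poll is left out for brevity, but a real kernel still needs it):

LMM_loop
        rdlong  i0, pc
        add     pc, #4
i0      nop
        rdlong  i1, pc
        add     pc, #4
i1      nop
        rdlong  i2, pc
        add     pc, #4
i2      nop
        rdlong  i3, pc
        add     pc, #4
i3      nop
        rdlong  t1, int_req wz      ' one poll per pass, at the loop's end jump
 if_nz  jmp     #dispatch
        jmp     #LMM_loop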
I've attached a version of an interrupt module. It allows for ISR, IMR, and IPAR.
I know you guys have been doing this longer than me, so suggestions are welcome :)
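The attachment itself isn't reproduced here; purely as a rough idea of the shape such a module can take (all names mine, level-sensitive only, no edge detection or IPAR handling), a small PASM cog can compare INA against a mask (IMR) and OR any hits into a status long (ISR) in hub RAM for the LMM side to poll:

        org     0
watcher mov     imr_ptr, par        ' PAR points at two hub longs: IMR, then ISR
        mov     isr_ptr, par
        add     isr_ptr, #4
:loop   rdlong  imr, imr_ptr        ' re-read the mask so it can be changed on the fly
        mov     t1, ina
        and     t1, imr wz          ' any enabled pin currently high?
 if_z   jmp     #:loop
        rdlong  t2, isr_ptr         ' OR the new events into the status long
        or      t2, t1
        wrlong  t2, isr_ptr
        jmp     #:loop

imr_ptr long    0
isr_ptr long    0
imr     long    0
t1      long    0
t2      long    0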
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed ... about living in http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders always get here fast 8)
Post Edited (jazzed) : 5/1/2008 4:53:42 AM GMT
Thanks for the example code - I've been programming since the 6502, but the Propeller is new to me.
Post Edited (Rayman) : 5/1/2008 12:57:35 PM GMT
You can read more about it at the URL below:
propeller.wikispaces.com/Large+Memory+Model
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Aka: CosmicBob
I built a 6502-based computer on a breadboard in 1987 :)
Looking back at that example, the measured loop time is about 600ns at 80MHz.
Moving IPAR & IMR settings out of the loop and using stop/start instead of
dynamic pin and IMR selection makes idle loop-time ~200ns (250ns measured).
By the way, I'm finding C with LMM to be 4x to 8x faster than equivalent Spin without function calls.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed ... about living in http://en.wikipedia.org/wiki/Silicon_Valley
Traffic is slow at times, but Parallax orders always get here fast 8)
ICC LMM USERn variables are variables that allow you to communicate with the kernel. These are also available if you want to use global variables with in-line asm.
If USER0 is an interrupt "vector" flag and USER1 contains a first-level callback function address, the kernel code can load this first-level callback function after saving the last PC on the stack. The kernel's call into this callback is similar to ficall, but I found that using ficall itself does not work.
The KEY FACTOR in making this work is to have the first-level callback function that the kernel invokes do nothing but call another function. This forces the kernel to maintain context. If you have any variables at all in the first-level callback, the system fails since the register context is not preserved.
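As a sketch of that constraint (the function names are mine, and how the handler's address gets into USER1 depends on the kernel hooks, so take this only as the required shape):

/* second-level handler: free to use locals and call other functions */
void handle_event(void)
{
    /* the real interrupt work goes here */
}

/* first-level callback - the address placed in USER1 - contains nothing but
   a call, so the kernel's register context is not disturbed */
void int_first_level(void)
{
    handle_event();
}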
As far as performance goes, the latency caused by checking the interrupt vector flag at the end of the unrolled instruction fetcher is trivial. The more unrolled the better for kernel processing time in the face of an interrupt storm. The interrupt latency suffers with a larger unroll count, but that is not likely to be critical, since we can use the Propeller cogs for handling tasks in parallel (not referring to SMP, just PASM devices).
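As a rough back-of-envelope (my numbers, assuming the usual hub-synchronised loop that retires one LMM instruction per hub window, i.e. 16 clocks or 200ns at 80MHz): an 8-way unroll means a worst-case wait of about 8 x 200ns = 1.6us before the end-of-loop check is even reached, while the cost of the check itself is spread across those 8 instructions.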
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed
Post Edited (jazzed) : 5/9/2008 4:42:54 PM GMT
Thank you for the rough speed guideline too. Now we can say something concrete rather than just "much faster than Spin". This jibes with LMM C being about 10% to 25% of native-code speed, and Spin being about 40x to 80x slower than native code. The kicker is that we can improve on the LMM C performance; this is the low-water mark. We will implement FCACHE too, and loops like strcpy etc. will scream...
Let's see... so we effectively have a set of eight 8-12 MHz 32-bit processing engines, with the potential to run cog-native routines on a separate cog at native 80 MHz speed. Not bad at all...
1. the "vector" register.
2. first level callback function address (unless you can make a configurable library hook).
3. global interrupt enable/disable - also used for "hardware" interrupts.
The "hardware interrupts" using PASM like shown earlier can probably use a special "vector" cookie like -1 rather than some function address vector. The cookie would have to be reserved if you make a library hook.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed
This was a pain to get working :)
If anyone wants to formalize the method by requesting special variables, speak up.
Richard has already shown willingness to adapt. I think feedback on a solution,
with changes if necessary, would help cement the idea.
Interrupts are the first step toward a pre-emptive multitasking O/S. With the left-over
cogs it would be great to create some form of symmetric multiprocessing.
We'll probably exhaust memory before that point though :(
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
jazzed
I've been reflecting on that. The limitations for ICC apply equally to any LMM which interprets 32-bit wide PASM code.
For a Prop I there can be 8K PASM instructions, which includes the LMM interpreter itself; the Prop II will allow 64K PASM instructions. My gut feeling is that 64K will be enough for most applications, so it's only the Prop I which could have problems, and even there 8K may be good enough for many things - but note that any data RAM used comes out of that same 8K/64K longs.
One option is to go for an Extended-LMM where generated PASM is held in external I2C Eeprom or even SD Card. That adds overhead in a number of places and will slow execution down but should be workable.
I've only looked at execution flow, not data handling, so far, but it certainly seems feasible. With the least kernel changes ICC supports 64K PASM instructions as is; with minor changes that rises to 256K PASM instructions; with more complicated changes it goes up further.
This is based on using ICC with a home-brew assembler and linker working from the generated .s files to create an entirely 'run from Eeprom' linear image. More complicated solutions such as overlays would likely require changes within ICC code generation.
The big overhead with Eeprom stored code is fetching time. That can be mitigated by using a separate Cog to fetch and cache Eeprom code as required, and it shouldn't overly complicate the kernel itself.
Is it practical? No idea. In the worst case of 1MHz I2C it could reduce the Propeller to being a 0.1 MIPS processor, but that could still be entirely acceptable to many if they insist on using a Prop I and want more than 8K.
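One possible shape for that fetch-and-cache arrangement (purely a sketch with made-up names, nothing from ICC or an existing kernel): the kernel treats the LMM PC as an Eeprom address, remembers which block the cache cog currently holds in hub RAM, and only talks to the cache cog - through a hub mailbox - when the PC leaves that block:

fetch   mov     t1, pc
        sub     t1, cache_base      ' offset of the PC within the cached block
        cmp     t1, cache_size wc   ' still inside the block? (C set if t1 < size)
 if_nc  jmp     #miss
        add     t1, cache_hub       ' translate to the hub copy
        rdlong  instr, t1           ' fetch from hub as usual
        add     pc, #4
instr   nop
        jmp     #fetch

miss    wrlong  pc, mbox            ' ask the cache cog for the block holding PC
:wait   rdlong  t1, mbox wz         ' the cache cog clears the mailbox when done...
 if_nz  jmp     #:wait
        rdlong  cache_base, newbase ' ...and reports the new block's Eeprom base
        jmp     #fetch

cache_base long 0                   ' Eeprom address of the block currently cached
cache_size long 512*4               ' cached block size in bytes (arbitrary)
cache_hub  long 0                   ' hub address of the cached copy, set at init
mbox       long 0                   ' hub address of the request mailbox, set at init
newbase    long 0                   ' hub address where the new base is reported, set at init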