Multitasking musings
Dr_Acula
The propeller can run 8 cogs in parallel with Pasm, but can it run multiple Spin programs in parallel too?
On the wikipedia page there is a comment about multithreading but it is not totally clear whether this refers to spin or pasm. I've also been searching the obex for a while but I'm not quite sure what to search for.
Simple concept is that you divide the hub ram into two 16k halves, and each spin program has their own half. There would be two cogs, each running a copy of the spin bytecode interpreter. Would this be possible?
Getting technical, each spin program has a stack, and does this stack end up at the top of hub ram, or the top of that spin program, or in some other location? Also, are there other bits of code that are scattered around the hub, or does the Proptool tend to package it all up into a compact binary file where working variables and the stack and anything else tends to stay within this package?
The reason I ask is that I've been playing around with a circuit using two propeller chips and two sram chips, with a 16 bit data bus, and using a tight pasm routine it is possible to move data from the ram to the hub at a fairly decent speed (maybe a 5 instruction pasm loop at 20,000,000 instructions a second, so 4 MHz, which is 4 mega-words or 8 megabytes a second; the ram is 32k, so it could be filled in roughly 4 ms, well inside 1/100th of a second).
So let's say you have a spin program running and it is a big program and takes 25k, and you want to access an SD card, and the SD card program when compiled takes 12k. It is not possible to combine these two into one program. What you can do, and what I have done in the past, is reboot the propeller, load it with the compiled SD program, move data blocks out to external ram, then reboot the original program. But reboots take a little more time, and I'm wondering if there is another way.
The first idea is to pad out a spin program with 16k of zeros, so that it compiles with the bulk of the code starting at 16k. I don't know if the proptool will do this or will optimise that 16k of zeros and put it at the end, or in the middle of the program rather than the end?
Another idea is to have spin programs that can hibernate - store their working variables outside of hub, eg in external ram, then just go into a small loop that is checking for a particular long up at the top of ram to change. Then bring in a whole lot of data from external ram for a separate spin program. The original hibernating program would need to be running in a known part of hub so it doesn't get overwritten.
If all this could work, I'd couple it with the previous work done separating out pasm code and spin code, so that up to 14k of hub ram is freed up and spin programs load in two parts - the first sets the cogs going with their pasm, and the second part is just spin with no pasm, and it talks to the cogs via data at known fixed locations (or even better, via locations that you pass to the pasm code).
I don't know if any of this is possible. I guess the idea is to try to build an operating system within the limited hub ram available.
For simplicity's sake, all the code is spin. No pasm. But thinking about this, rather than have spin programs that hibernate by jumping to tiny routines in fixed locations in hub ram, I wonder if there is another way. Spin is being run by a pasm interpreter running on a cog. Would it be possible to pause the running of this cog?
So - two spin interpreters, each running in their own cogs. First one is running, second one is paused. First program wants to transfer control to the second program, so it needs to store its own local environment maybe up in hub ram, maybe external ram (stack location etc). Then it sends a message to the second spin interpreter to start running after a small delay, and puts itself into hibernation. Maybe there is a third cog devoted to doing block moves of memory from external ram to hub ram, so when the second spin interpreter comes to life, it just keeps running as if nothing had ever changed.
I've pondered all this before eg http://forums.parallax.com/showthread.php/153351-Overlays-for-large-spin-programs
This time I'm thinking of something a bit simpler. Maybe it is even as simple as a bootloader but it is loading the spin from external ram rather than eeprom, and doing it much faster because it is 16 bits in parallel rather than a serial i2c link.
Maybe I just want to be able to run two spin programs in parallel.
Thoughts would be most appreciated
Comments
Thanks,
I knew I was missing the point.
It almost sounded like Dr_Acula wasn't aware how easy it is to launch a second Spin interpreter.
I figured I wasn't understanding the post correctly. I thought I had deleted my earlier post before it had been read.
The basics are you can have as many spin programs running as you have spare cogs.
You have variable space in hub, and a stack in hub (which you define for all other spin programs other than the first).
In the cog, there are 5 pointers? (dcurr, pcurr, etc) and you have a bunch of internal interpreter variables x, y, etc.
So far, so good.
Now, presuming I understand correctly, you could make your spin object code (with/without variables???) unloadable/reloadable. I suppose you could even do this for the stack too.
Then, you could have two co-operating cogs running spin. Both would share the same code/variable/stack space, and each would have to co-operate to suspend one of them at a time, and have the variable/stack saved off, and the code/variable/stack reloaded. It would be better/simpler/faster to just overlay the object code since you would only have to read in the new hub code (ie not write out the old code).
Now, with intimate knowledge of the interpreter, it would be possible to do this. So, is this what you are trying to do?
The following post will answer some of your questions more directly.
That sounds encouraging
Thinking out loud, the spin interpreter lives in rom and runs in a cog so should be easy to copy to another cog and start it up.
I guess where I'm confused is how the proptool would handle this - what would the code look like? Would it be one spin program, or would you compile it as two separate programs at different locations and then merge them together, or do it some other way?
To launch a new Spin interpreter, you just need to use cognew with a Spin method.
The above code launches a second Spin interpreter to blink a LED on P16. The method "Blink" could be as complex as you'd like (it could call other methods). You'd just need to increase the stack space if you increase the complexity of the code running in the second interpreter.
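The listing that post refers to is not reproduced here. As a stand-in, a minimal sketch of launching a second Spin interpreter with cognew might look like the following. The 5 MHz crystal settings, the 32-long stack and the half-second toggle are assumptions for illustration, not the poster's original code.

```spin
CON
  _clkmode = xtal1 + pll16x       ' assumed 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

  LED_PIN  = 16                   ' LED on P16, as in the post

VAR
  long  blinkStack[32]            ' stack space for the second interpreter (size is a guess)

PUB Main
  ' Start Blink in the next free cog; that cog runs its own copy of the
  ' Spin interpreter and uses blinkStack as its stack.
  cognew(Blink(LED_PIN, clkfreq / 2), @blinkStack)

  repeat                          ' the first interpreter carries on independently here

PRI Blink(pin, halfPeriod)
  dira[pin]~~                     ' make the pin an output (in this cog)
  repeat
    !outa[pin]                    ' toggle the LED
    waitcnt(halfPeriod + cnt)     ' wait half a period
```

If Blink grows to call other methods, the blinkStack array is the thing to enlarge.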
Here's a link to a thread where I investigated stack size (many others have also posted info about stack sizes).
You want to be able to suspend the main spin program while the SD card is accessed. This way you can overlay the "main spin program" in hub with the "sd spin program", and execute the sd card program until it is done. Then you want to suspend the sd card program, reload (overlay) the "sd spin program" with the "main spin program" in hub, and restart/continue the main spin program.
Apart from getting the spin interpreters to "synchronise" (which can be done - just need to think it thru but it is definitely possible), you need to be able to map the hub so that you know where to place the two sets of code ("main spin program" and "sd spin program").
bst and homespun both provide object code listings. I am unsure if OpenSpin can do this.
Some have figured out what the compiler places first. And it is possible to fill hub space using DAT and/or VAR sections (see my 1pin-TV for an example of how I fill hub following spin code so that I can reuse that space as the text buffer).
Answered above.
Easier just to use the bytecode section, at least initially anyway.
As you know, objects are relocatable, so it's only necessary to pass the interpreter a starting address of the object code, and an address of the stack. The object code contains a list of relative addresses for variables, objects etc.
Take a look at my Prop OS that I built using your KyeDos as a base.
Answer is YES. But of course it's a bit more complicated than this.
Maybe ponder my answers. Then we can discuss further.
The simplest question...
Are the objects/overlays standalone? ie Do they need to resume from where they were suspended, or Do they begin fresh?
One thing working with the propeller - I find it easier to think of programs running in parallel. Going over to the Arduino for some coding, the concept of a single program is the same as on the propeller, but when you run two programs, the arduino starts to struggle. Things like dual uarts for instance. And ask the Arduino to run 4 uarts and produce a TV signal at the same time and it is very hard, yet the propeller could do this easily.
Thinking about that a bit more, if you were doing this on the arduino, you could write some uart code, and then you might write a second uart and splice it into the first uart. But each time you do that, you have to revisit old code. It gets complex when you are juggling in your head uart code, servo code, display code all at once.
On the propeller though, the code can be written once and then can go to the obex. It also makes it easier to use code someone else has written since in many cases you never have to look at the source. If you have a servo running, and add a VGA driver, you know the servo timing will still be precisely the same.
But all this is for pasm, and I'm not sure the same parallelism has been explored so much for Spin.
Looking back through some old links, there are certainly some common variables that can be changed to relocate code, but I wonder if there is a simpler concept?
Say we take a spin program, and we say that it has an allocated block of hub ram that it has to run in. It can do what it likes in that block of ram, but it must stay inside that block. The stack must be internal to that block, and so must any variables. If each spin program has its own block, they ought to be able to run in parallel without ever upsetting each other.
So maybe all that needs to be done is to write a Spin program that, say, runs in a block of hub ram at location 4000H to 6000H. Can we pad out the beginning of the spin program with a DAT block so the main spin code starts at a higher ram location?
(cross post with cluso, pondering questions now, but to answer the last one, I'd like several spin programs to be running at the same time, all in parallel)
I'm having a hard time understanding what you're after. You can do this with Spin just as easily as with PASM.
Say you write a PWM program in SPIN. Then you can easily run 3 or 4 identical versions in parallel. They will run at the same speed, etc. No interrupts to worry about.
Drac, you are overthinking it
Seems you can break up what you want into modules (or objects) in SPIN. Then, you want to be able to run different objects, one at a time, while the running module calls the next appropriate module.
To do this just requires a mod to the spin interpreter (probably the easiest).
The best way to test this out would be to get a few modules that could do specific tasks. Compile them all separately so we know they work.
Then marry them together as an exercise. I am in for it
How's this for an example...
1. We write a spin object to wait for a text message from the serial port, terminated in cr/crlf.
2. We write a spin object to take that message and send it out to the serial port, terminating after the cr/crlf.
3. We write a spin object to display the text message on an LCD (I have 2 types of LCD - a Nokia 5110 and a 2.2" parallel color).
Repeat the loop.
Each of these 3 spin programs would be called, one after the other.
Initially, we will use the main spin program in Cog#0 to run each spin program, one following the other, all in Cog#1.
When everything works fine, we will add a new spin program that will be called between each of the above 3 programs. It will move the object program from its normal hub location to a new fixed hub location. Then we will start the "moved" object at the new fixed hub address. This will prove the mechanism to relocate multiple objects (which would allow you to reload from SRAM).
Now, we just need to mod the interpreter (or spin object) so that it can run the next object, without restarting Cog#1. This would give you your total solution.
If you haven't been using multiple Spin interpreters, then you've been missing out on a very powerful feature of the Propeller.
I often have several cogs running Spin code in my projects.
In my hexapod program, I have one cog computing the x, y, z coordinates of the foot positions while another cog computes the IK angles of the legs. Each of these two Spin cogs use their own instance of F32 as a coprocessor. A third Spin cog monitors the com line, Wii Nunchuck input and sends the desired patterns to the LED arrays (aka eyes).
You're in for a treat if you've only been running one Spin interpreter in your projects. IMO, being able to run multiple Spin interpreters is one of the funnest aspects of using the Propeller.
Cluso, if Drac hasn't been using multiple Spin interpreters, IMO, it would be a good idea to start out with some simple examples of how multiple Spin cogs can work together. I think modifying the interpreter should wait until Drac has a better idea of what can be done with the unmodified interpreter.
He wrote some spin coglets years ago. So he's more adept than his post shows.
As for doing the spin mod, I am quite happy to help.
Sounds very promising!
Ok, the Hello World of microprocessors, flashing a led.
Spin program 1 flashing a led once a second on a propeller pin.
Spin program 2 flashing a led on a different propeller pin.
Each spin program running on its own cog spin interpreter.
Each spin program has its own little area of hub ram.
This isn't a pasm program flashing a led, and there is no pasm anywhere in this program. This is spin. What would this look like in spin code? Does it have two "main" routines?
And here is the homespun listing (homespun -d filename)
Can we now push it a bit further. Is it possible to add a 'compile to absolute location' just before "PRI flash2"
Looking at the hex, just for argument's sake, so that line gets compiled to hex 0100. Or if not possible, some way of extracting that location by parsing the source code or the hex. Maybe some unique bytes in a dat section you could search for. But ideally, compile to a fixed location, because what would be really useful is to divide up the hub into separate sections so that each spin program stays in its area and doesn't go outside of it.
If you can do that, then we can start to think about spin programs that can load other spin programs at certain locations. A spin program to load a binary or a hex file and just to copy some of the bytes to a certain location, and then to fire off a spin interpreter cog.
If that works, can we then look at bits of spin code that have a stack, eg flash1 calls an object, and flash2 calls an object. Where does each of those stacks end up, and more specifically, do they end up as separate stacks, or is the compiler combining them together?
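On the stack question, with the unmodified interpreter each cognew of a Spin method is handed the address of a stack that the program declares itself, so the stacks are separate by construction. A minimal sketch of the two-LED case, using the flash1 and flash2 names from this thread (the pins, clock settings and stack sizes are assumptions):

```spin
CON
  _clkmode = xtal1 + pll16x       ' assumed 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long  stack1[32]                ' private stack for the cog running flash1
  long  stack2[32]                ' private stack for the cog running flash2

PUB Main
  ' One object, one top-level method, two extra interpreters.
  cognew(flash1, @stack1)
  cognew(flash2, @stack2)
  ' Main returns here, which stops its own cog; the two flashers keep running.

PRI flash1
  dira[16]~~                      ' P16 is an assumed LED pin
  repeat
    !outa[16]
    waitcnt(clkfreq / 2 + cnt)    ' toggle every half second: one flash a second

PRI flash2
  dira[17]~~                      ' P17 is an assumed LED pin
  repeat
    !outa[17]
    waitcnt(clkfreq / 4 + cnt)    ' a different rate, to show independence
```

Any object called from flash1 uses stack1 for its call frames, and likewise flash2 with stack2; the compiler does not merge them.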
I will do that now and post shortly.
Postedit
Done that. So I can create standalone style objects. But I need to create a base/starter object that has provision for the 1st Object to be our "loadable object(s)". Then any other user objects will be located in hub. Just working out how to do this.
Back in the olden days, compilers had the .org instruction - compile starting at this memory location. Is there an equivalent in spin?
However, we can do I tried [$1000-$/4] but it didn't work.
Hmm - that could be useful. Ok, can an entire spin program be moved in ram to a different location, and will every single instruction still work, or are there a handful of 'gotcha' instructions?
How about a pile of dummy instructions, even just a dat section, to pad out the beginning of a program so the spin starts at a higher location? Sure, you would not want to do this for a program sitting in eeprom as it wastes space, particularly if the program is 2k long and is designed to run in the upper 2k of hub ram, so now it is 32k long. But storing programs on sd cards etc doesn't cost anything. Would that save having to worry about all those header values?
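A hedged sketch of the padding idea: an object's DAT block is usually described as being emitted ahead of its PUB/PRI bytecode, so a block of zeros in DAT should push the bytecode to a higher hub address. That layout is an assumption to check against a bst or homespun listing, and PAD_BYTES and the LED pin are made-up values.

```spin
CON
  PAD_BYTES = $1000                     ' hypothetical amount of padding (4 KB)

DAT
pad           byte      0[PAD_BYTES]    ' zeros that sit between the object header and the bytecode

PUB Main
  dira[16]~~                            ' P16 is an assumed LED pin
  repeat
    !outa[16]
    waitcnt(clkfreq / 2 + cnt)
```

This only shifts the code; it does not pin it to an absolute address, since the exact start still depends on the size of the header and method table. The padded area is ordinary hub ram at runtime and @pad gives its address, so it can be reused as a buffer.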
I'm very impressed with how simple that dual led flasher program is.
What I am trying to do is keep the original hub pbase/vbase/dbase/pcurr/dcurr in hub $0006-000F unchanged, allowing the user to have his spin code following.
But, I also want to allow a loadable module with fixed locations in high hub, with Object Header at say $8000+, the stack at say $8100, and the variables at say $8200, and the code starting at say $8300. To do this, I would need to have a reserved Object Pointer at say $0014 (prt#1) which I can plug in after the hub $8000+++ is loaded, and then issue a coginit (n, xxxx, @xxxx) to run the loaded spin object in cog n.
I just need some more time to think about this.
Using this method I've been able to run 4 Spin programs at the same time under spinix.
When you say the header is adjusted, do you mean the Dbase etc at hub 0006 or the object pointer at about hub 0014?
I will check out Spinix soon.
I know what Drac wants to do is possible, it's just finding the simplest yet most flexible way.
Thanks heaps!
I was looking at Spinix and partially understood what you were doing. Then checked back here and your above post made things gel.
So basically, you load a "compiled object" into hub at any free space (you allocated this). This hub base area (AppAddr) contains the freq(4), xtalmode(1), checksum(1),pbase(2),vbase(2),dbase(2),pcurr(2),dcurr(2), and then the object header, spin bytecode, var. The new stack will be built above this.
Then you add the new AppAddr to the pbase(2),vbase(2),dbase(2),pcurr(2),dcurr(2) to form the correct offset values due to relocating the object load.
Then you calculate the VAR size and clear it.
Then you initialise the stack by clearing the RESULT variable (first on the stack), followed by the 2 x $FFF9FFFF stack frames (is this bit required??? as I have not seen it done to objects other than the first).
Lastly, you start a cog with the COGNEW(spininterpreter, AppAddr+4). AppAddr+4 points to the xtal mode byte, ie addr(pbase)-2, because the parameter has to be long aligned.
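The steps above can be sketched in Spin roughly as follows. This is an illustration under stated assumptions, not the Spinix source: LaunchImage and appAddr are hypothetical names, the image is assumed to be a compiled .binary already copied to a long-aligned hub address with free ram above it, and the stack layout used here (two $FFF9FFFF longs just below dbase, RESULT at dbase) is the arrangement usually attributed to the boot loader, which this thread itself leaves as an open question.

```spin
CON
  INTERPRETER = $F004                   ' ROM address of the Spin interpreter

PUB LaunchImage(appAddr) | i, vbase, dbase
  '' appAddr: long-aligned hub address of a loaded .binary image (hypothetical).
  '' Returns the new cog's ID, or -1 if no cog was free.

  ' Relocate the five header words pbase, vbase, dbase, pcurr, dcurr
  ' (image offsets 6, 8, 10, 12, 14) by the load address.
  repeat i from 6 to 14 step 2
    word[appAddr + i] += appAddr

  vbase := word[appAddr + 8]
  dbase := word[appAddr + 10]

  ' Clear the VAR area, assumed to run from vbase up to the initial stack frame.
  longfill(vbase, 0, (dbase - 8 - vbase) >> 2)

  ' Initial stack frame (assumed layout, see the note above).
  long[dbase - 8] := $FFF9FFFF
  long[dbase - 4] := $FFF9FFFF
  long[dbase] := 0                      ' RESULT of the top-level method

  ' PAR has to be long aligned, so it points 2 bytes below pbase (offset 4).
  return cognew(INTERPRETER, appAddr + 4)
```

Nothing at hub $0006 onwards is touched, so the program that was booted normally keeps its own header.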
I had been thinking I had to predefine some areas and then try to load the object pointers, object code, variables, and stack into fixed locations. I was also worried that I had to change the lower hub pbase/etc at hub 0006+, and the implications if another spin program ran one of those preloaded objects/methods. This solves all these problems
Just looking at the *.binary files created for an object.
I had presumed the VAR and the start of the stack frame (2x FFFFFFF9) had been stored in the binary file.
Isn't the first FFFFFFF9 the RESULT? You are loading
00000000 <- RESULT
FFFFFFF9
FFFFFFF9
When examining the PropTool visual binary they have
FFFFFFF9 <- RESULT
FFFFFFF9
Am I missing something???