Multitasking musings

Dr_Acula · 2015-04-06 17:40

The propeller can run 8 cogs in parallel with Pasm, but can it run multiple Spin programs in parallel too?

On the wikipedia page there is a comment about multithreading but it is not totally clear whether this refers to spin or pasm. I've also been searching the obex for a while but I'm not quite sure what to search for.

Simple concept is that you divide the hub ram into two 16k halves, and each spin program has its own half. There would be two cogs, each running a copy of the spin bytecode interpreter. Would this be possible?

Getting technical, each spin program has a stack, and does this stack end up at the top of hub ram, or the top of that spin program, or in some other location? Also, are there other bits of code that are scattered around the hub, or does the Proptool tend to package it all up into a compact binary file where working variables and the stack and anything else tends to stay within this package?

The reason I ask is that I've been playing around with a circuit using two propeller chips and two sram chips, with a 16 bit data bus, and using a tight pasm routine it is possible to move data from the ram to the hub at a fairly decent speed (maybe a 5-instruction pasm loop at 20,000,000 instructions a second, so 4 MHz, which is 4 mega-words a second or 8 megabytes a second; ram is 32k, so maybe we can fill the ram in 1/100th of a second).
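The back-of-envelope numbers above can be checked with a few lines of arithmetic. This is only a sketch in Python (run on a PC, not on the Propeller). It assumes an 80 MHz clock with 4 clocks per PASM instruction, a 5-instruction loop, and one 16-bit word moved per pass of the loop; the function name is made up for illustration.

```python
def transfer_estimate(clock_hz=80_000_000, clocks_per_instr=4,
                      instrs_per_loop=5, bytes_per_loop=2,
                      block_bytes=32 * 1024):
    """Return (loops per second, bytes per second, seconds to move block_bytes)."""
    instr_per_sec = clock_hz // clocks_per_instr       # 20,000,000 at 80 MHz
    loops_per_sec = instr_per_sec // instrs_per_loop   # one 16-bit word per loop
    bytes_per_sec = loops_per_sec * bytes_per_loop     # two bytes per word
    return loops_per_sec, bytes_per_sec, block_bytes / bytes_per_sec


loops, rate, seconds = transfer_estimate()
print(loops, rate, round(seconds * 1000, 3), "ms")
```

Under those assumptions a 32k block moves in roughly 4 ms, which is inside the 1/100th of a second guess. Any hub access stalls or bus turnaround time would stretch that figure.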

So let's say you have a spin program running and it is a big program and takes 25k, and you want to access an SD card, and the SD card program when compiled takes 12k. It is not possible to combine these two into one program. What you can do, and what I have done in the past, is reboot the propeller, load it with the compiled SD program, move data blocks out to external ram, then reboot the original program. But reboots take a little more time, and I'm wondering if there is another way.

The first idea is to pad out a spin program with 16k of zeros, so that it compiles with the bulk of the code starting at 16k. I don't know if the proptool will do this or will optimise that 16k of zeros and put it at the end, or in the middle of the program rather than the end?

Another idea is to have spin programs that can hibernate - store their working variables outside of hub, eg in external ram, then just go into a small loop that is checking for a particular long up at the top of ram to change. Then bring in a whole lot of data from external ram for a separate spin program. The original hibernating program would need to be running in a known part of hub so it doesn't get overwritten.

If all this could work, I'd couple it with the previous work done separating out pasm code and spin code, so that up to 14k of hub ram is freed up and spin programs load in two parts - the first sets the cogs going with their pasm, and the second part is just spin with no pasm, and it talks to the cogs via data at known fixed locations (or even better, via locations that you pass to the pasm code).

I don't know if any of this is possible. I guess the idea is to try to build an operating system within the limited hub ram available.

For simplicity's sake, all the code is spin. No pasm. But thinking about this, rather than have spin programs that hibernate by jumping to tiny routines in fixed locations in hub ram, I wonder if there is another way. Spin is being run by a pasm interpreter running on a cog. Would it be possible to pause the running of this cog?

So - two spin interpreters, each running in their own cogs. First one is running, second one is paused. First program wants to transfer control to the second program, so it needs to store its own local environment maybe up in hub ram, maybe external ram (stack location etc). Then it sends a message to the second spin interpreter to start running after a small delay, and puts itself into hibernation. Maybe there is a third cog devoted to doing block moves of memory from external ram to hub ram, so when the second spin interpreter comes to life, it just keeps running as if nothing had ever changed.

I've pondered all this before eg http://forums.parallax.com/showthread.php/153351-Overlays-for-large-spin-programs
This time I'm thinking of something a bit simpler. Maybe it is even as simple as a bootloader but it is loading the spin from external ram rather than eeprom, and doing it much faster because it is 16 bits in parallel rather than a serial i2c link.

Maybe I just want to be able to run two spin programs in parallel.

Thoughts would be most appreciated

kwinn · 2015-04-06 19:08

Isn't this already done in one or more of the Propeller operating systems, complete with caching and external memory? If not then perhaps a small kernel program that swaps the two programs in and out from external memory to hub memory as required. A form of co-operative multitasking iow.

Electrodude · 2015-04-06 20:32

What you're talking about, and dynamic spin object linking for that matter, is very possible; it's just that nobody has ever done it. It would basically be a matter of swapping out spin functions and making the method table entries of swapped-out functions all point to one swapper function. The swapper figures out what should have been called, swaps it in (all code addresses in Spin are either relative or indirect, so it can put it pretty much anywhere), fixes the method table entry and then jumps to it. The Spin compiler I'm designing will be able to facilitate this with some macros or libraries.
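To make the swapper idea concrete, here is a loose model of it in Python rather than Spin. It is only an analogy under stated assumptions: a dict stands in for the method table, a "loader" callable stands in for fetching code from external memory, and the class and method names are invented for this sketch. It does not reflect how any existing Spin compiler or interpreter works.

```python
class OverlayTable:
    """Toy method table whose swapped-out entries point at a swapper stub."""

    def __init__(self, loaders):
        # loaders: name -> zero-argument callable that "swaps in" the real method
        self._loaders = dict(loaders)
        self.swap_ins = []  # record of which methods had to be swapped in
        self.table = {name: self._make_stub(name) for name in self._loaders}

    def _make_stub(self, name):
        def stub(*args):
            real = self._loaders[name]()   # figure out what was wanted, swap it in
            self.swap_ins.append(name)
            self.table[name] = real        # fix the method table entry
            return real(*args)             # then jump to the real method
        return stub

    def call(self, name, *args):
        return self.table[name](*args)


table = OverlayTable({"double": lambda: (lambda x: 2 * x)})
print(table.call("double", 21), table.call("double", 4), table.swap_ins)
```

The second call goes straight to the real method because the table entry was patched on the first call, so the swap-in cost is paid once per method until it is swapped out again.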

Electrodude · 2015-04-06 20:38

@Duane Degn: I think he means that everything doesn't fit in hubram at once.

Duane Degn · 2015-04-06 20:41

Electrodude wrote: »

@Duane Degn: I think he means that everything doesn't fit in hubram at once.

Thanks,

I knew I was missing the point.

It almost sounded like Dr_Acula wasn't aware how easy it is to launch a second Spin interpreter.

I figured I wasn't understanding the post correctly. I thought I had deleted my earlier post before it had been read.

Cluso99 · 2015-04-06 21:27

Drac
The basics are you can have as many spin programs running as you have spare cogs.
You have variable space in hub, and a stack in hub (which you define for all other spin programs other than the first).
In the cog, there are 5 pointers? (dcurr, pcurr, etc) and you have a bunch of internal interpreter variables x, y, etc.
So far, so good.

Now, presuming I understand correctly, you could make your spin object code (with/without variables???) unloadable/reloadable. I suppose you could even do this for the stack too.

Then, you could have two co-operating cogs running spin. Both would share the same code/variable/stack space, and each would have to co-operate to suspend one of them at a time, and have the variable/stack saved off, and the code/variable/stack reloaded. It would be better/simpler/faster to just overlay the object code since you would only have to read in the new hub code (ie not write out the old code).

Now, with intimate knowledge of the interpreter, it would be possible to do this. So, is this what you are trying to do?

The following post will answer some of your questions more directly.

Dr_Acula · 2015-04-06 21:28

It almost sounded like Dr_Acula wasn't aware how easy it is to launch a second Spin interpreter.

That sounds encouraging

Thinking out loud, the spin interpreter lives in rom and runs in a cog so should be easy to copy to another cog and start it up.

I guess where I'm confused is how the proptool would handle this - what would the code look like? Would it be one spin program, or would you compile it as two separate programs at different locations and then merge them together, or do it some other way?

Duane Degn · 2015-04-06 21:41

Dr_Acula wrote: »

That sounds encouraging

Thinking out loud, the spin interpreter lives in rom and runs in a cog so should be easy to copy to another cog and start it up.

I guess where I'm confused is how the proptool would handle this - what would the code look like? Would it be one spin program, or would you compile it as two separate programs at different locations and then merge them together, or do it some other way?

To launch a new Spin interpreter, you just need to use cognew with a Spin method.

VAR
  long stack[32]

PUB Main

  cognew(Blink, @stack)
  ' do other stuff with cog #0

PRI Blink

  dira[16] := 1 ' always set I/O pin states from within cog using the I/O pins
  repeat
    !outa[16] ' toggle pin 16
    waitcnt(clkfreq / 4 + cnt)

The above code launches a second Spin interpreter to blink a LED on P16. The method "Blink" could be as complex as you'd like (it could call other methods). You'd just need to increase the stack space if you increase the complexity of the code running in the second interpreter.

Here's a link to a thread where I investigated stack size (many others have also posted info about stack sizes).

Cluso99 · 2015-04-06 21:58

Dr_Acula wrote: »

The propeller can run 8 cogs in parallel with Pasm, but can it run multiple Spin programs in parallel too?

You have 8 total cogs. Each cog can run PASM or it can run SPIN (which is really a PASM interpreter running in the cog, and the SPIN bytecode, variables and stack reside in hub). So you can in fact run 8 spin programs.

On the wikipedia page there is a comment about multithreading but it is not totally clear whether this refers to spin or pasm. I've also been searching the obex for a while but I'm not quite sure what to search for.

Do you mean multi-threading, as in running more than 1 spin program in a single cog???

Simple concept is that you divide the hub ram into two 16k halves, and each spin program has its own half. There would be two cogs, each running a copy of the spin bytecode interpreter. Would this be possible?

Absolutely. But why waste 16KB each? You only need to reserve as much object code + variable space + stack space as required by each spin program. Of course, you can fill space but this takes a bit more understanding of the compiler.

Getting technical, each spin program has a stack, and does this stack end up at the top of hub ram, or the top of that spin program, or in some other location? Also, are there other bits of code that are scattered around the hub, or does the Proptool tend to package it all up into a compact binary file where working variables and the stack and anything else tends to stay within this package?

I don't know, other than the initial spin program places its stack at the end of all the code. If you look at the binary created, there is an FFF9 or similar which is an indicator of the top used hub space. IIRC the stack builds from there up, not from FFFF down.

The reason I ask is that I've been playing around with a circuit using two propeller chips and two sram chips, with a 16 bit data bus, and using a tight pasm routine it is possible to move data from the ram to the hub at a fairly decent speed (maybe a 5-instruction pasm loop at 20,000,000 instructions a second, so 4 MHz, which is 4 mega-words a second or 8 megabytes a second; ram is 32k, so maybe we can fill the ram in 1/100th of a second).

OK

So let's say you have a spin program running and it is a big program and takes 25k, and you want to access an SD card, and the SD card program when compiled takes 12k. It is not possible to combine these two into one program. What you can do, and what I have done in the past, is reboot the propeller, load it with the compiled SD program, move data blocks out to external ram, then reboot the original program. But reboots take a little more time, and I'm wondering if there is another way.

Good. I understand this.

You want to be able to suspend the main spin program while the SD card is accessed. This way you can overlay the "main spin program" in hub with the "sd spin program", and execute the sd card program until it is done. Then you want to suspend the sd card program, reload (overlay) the "sd spin program" with the "main spin program" in hub, and restart/continue the main spin program.

Apart from getting the spin interpreters to "synchronise" (which can be done - just need to think it through, but it is definitely possible), you need to be able to map the hub so that you know where to place the two sets of code ("main spin program" and "sd spin program").

bst and homespun both provide object code listings. I am unsure if OpenSpin can do this. Some have figured out what the compiler places first. And it is possible to fill hub space using DAT and/or VAR sections (see my 1pin-TV for an example of how I fill hub following spin code so that I can reuse that space as the text buffer).

The first idea is to pad out a spin program with 16k of zeros, so that it compiles with the bulk of the code starting at 16k. I don't know if the proptool will do this or will optimise that 16k of zeros and put it at the end, or in the middle of the program rather than the end?

Answered above.

Another idea is to have spin programs that can hibernate - store their working variables outside of hub, eg in external ram, then just go into a small loop that is checking for a particular long up at the top of ram to change. Then bring in a whole lot of data from external ram for a separate spin program. The original hibernating program would need to be running in a known part of hub so it doesn't get overwritten.

Easier just to use the bytecode section, at least initially anyway.

If all this could work, I'd couple it with the previous work done separating out pasm code and spin code, so that up to 14k of hub ram is freed up and spin programs load in two parts - the first sets the cogs going with their pasm, and the second part is just spin with no pasm, and it talks to the cogs via data at known fixed locations (or even better, via locations that you pass to the pasm code).

As you know, objects are relocatable, so it's only necessary to pass the interpreter a starting address of the object code, and an address of the stack. The object code contains a list of relative addresses for variables, objects etc.

I don't know if any of this is possible. I guess the idea is to try to build an operating system within the limited hub ram available.

Take a look at my Prop OS that I built using your KyeDos as a base.

For simplicity's sake, all the code is spin. No pasm. But thinking about this, rather than have spin programs that hibernate by jumping to tiny routines in fixed locations in hub ram, I wonder if there is another way. Spin is being run by a pasm interpreter running on a cog. Would it be possible to pause the running of this cog?

Answer is YES. But of course it's a bit more complicated than this.

So - two spin interpreters, each running in their own cogs. First one is running, second one is paused. First program wants to transfer control to the second program, so it needs to store its own local environment maybe up in hub ram, maybe external ram (stack location etc). Then it sends a message to the second spin interpreter to start running after a small delay, and puts itself into hibernation. Maybe there is a third cog devoted to doing block moves of memory from external ram to hub ram, so when the second spin interpreter comes to life, it just keeps running as if nothing had ever changed.

I've pondered all this before eg http://forums.parallax.com/showthread.php/153351-Overlays-for-large-spin-programs
This time I'm thinking of something a bit simpler. Maybe it is even as simple as a bootloader but it is loading the spin from external ram rather than eeprom, and doing it much faster because it is 16 bits in parallel rather than a serial i2c link.

Maybe I just want to be able to run two spin programs in parallel.

Thoughts would be most appreciated

Maybe ponder my answers. Then we can discuss further.

The simplest question...
Are the objects/overlays standalone? I.e. do they need to resume from where they were suspended, or do they begin fresh?

Dr_Acula · 2015-04-06 22:18

Thanks for all these comments - I think I'm grasping some interesting concepts here.

One thing working with the propeller - I find it easier to think of programs running in parallel. Going over to the Arduino for some coding, the concept of a single program is the same as on the propeller, but when you run two programs, the arduino starts to struggle. Things like dual uarts for instance. And ask the Arduino to run 4 uarts and produce a TV signal at the same time and it is very hard, yet the propeller could do this easily.

Thinking about that a bit more, if you were doing this on the arduino, you could write some uart code, and then you might write a second uart and splice it into the first uart. But each time you do that, you have to revisit old code. It gets complex when you are juggling in your head uart code, servo code, display code all at once.

On the propeller though, the code can be written once and then can go to the obex. It also makes it easier to use code someone else has written since in many cases you never have to look at the source. If you have a servo running, and add a VGA driver, you know the servo timing will still be precisely the same.

But all this is for pasm, and I'm not sure the same parallelism has been explored so much for Spin.

Looking back through some old links, there are certainly some common variables that can be changed to relocate code, but I wonder if there is a simpler concept?

Say we take a spin program, and we say that it has an allocated block of hub ram that it has to run in. It can do what it likes in that block of ram, but it must stay inside that block. The stack must be internal to that block, and so must any variables. If each spin program has its own block, they ought to be able to run in parallel without ever upsetting each other.

So maybe all that needs to be done is to write a Spin program that, say, runs in a block of hub ram at location 4000H to 6000H. Can we pad out the beginning of the spin program with a DAT block so the main spin code starts at a higher ram location?

(cross post with cluso, pondering questions now, but to answer the last one, I'd like several spin programs to be running at the same time, all in parallel)

Duane Degn · 2015-04-06 22:24

Dr_Acula wrote: »

But all this is for pasm, and I'm not sure the same parallelism has been explored so much for Spin.

I'm having a hard time understanding what you're after. You can do this with Spin just as easily as with PASM.

Cluso99 · 2015-04-06 22:37

Yes, SPIN programs can run in parallel, just like PASM programs run in parallel. The only difference is that SPIN is slower and its speed is harder to determine.
Say you write a PWM program in SPIN. Then you can easily run 3 or 4 identical versions in parallel. They will run at the same speed, etc. No interrupts to worry about.
Drac, you are overthinking it

Cluso99 · 2015-04-06 22:41

Drac,
Seems you can break up what you want into modules (or objects) in SPIN. Then, you want to be able to run different objects, one at a time, while the running module calls the next appropriate module.
To do this just requires a mod to the spin interpreter (probably the easiest).

The best way to test this out would be to get a few modules that could do specific tasks. Compile them all separately so we know they work.

Then marry them together as an exercise. I am in for it

Cluso99 · 2015-04-06 22:53

Drac,
How's this for an example...

1. We write a spin object to wait for a text message from the serial port, terminated in cr/crlf.
2. We write a spin object to take that message and send it out to the serial port, terminating after the cr/crlf.
3. We write a spin object to display the text message on an LCD (I have 2 types of LCD - a Nokia 5110 and a 2.2" parallel color).
Repeat the loop.

Each of these 3 spin programs would be called, one after the other.
Initially, we will use the main spin program in Cog#0 to run each spin program, one following the other, all in Cog#1.

When everything works fine, we will add a new spin program that will be called between each of the above 3 programs. It will move the object program from its normal hub location to a new fixed hub location. Then we will start the "moved" object at the new fixed hub address. This will prove the mechanism to relocate multiple objects (which would allow you to reload from SRAM).

Now, we just need to mod the interpreter (or spin object) so that it can run the next object, without restarting Cog#1. This would give you your total solution.

Duane Degn · 2015-04-06 22:54

Dr_Acula,

If you haven't been using multiple Spin interpreters, then you've been missing out on a very powerful feature of the Propeller.

I often have several cogs running Spin code in my projects.

In my hexapod program, I have one cog computing the x, y, z coordinates of the foot positions while another cog computes the IK angles of the legs. Each of these two Spin cogs use their own instance of F32 as a coprocessor. A third Spin cog monitors the com line, Wii Nunchuck input and sends the desired patterns to the LED arrays (aka eyes).

You're in for a treat if you've only been running one Spin interpreter in your projects. IMO, being able to run multiple Spin interpreters is one of the funnest aspects of using the Propeller.

Duane Degn · 2015-04-06 22:59

Cluso99 wrote: »

Drac,
How's this for an example...

Cluso, if Drac hasn't been using multiple Spin interpreters, IMO, it would be a good idea to start out with some simple examples of how multiple Spin cogs can work together. I think modifying the interpreter should wait until Drac has a better idea of what can be done with the unmodified interpreter.

Cluso99 · 2015-04-06 23:15

Duane Degn wrote: »

Cluso, if Drac hasn't been using multiple Spin interpreters, IMO, it would be a good idea to start out with some simple examples of how multiple Spin cogs can work together. I think modifying the interpreter should wait until Drac has a better idea of what can be done with the unmodified interpreter.

I bet Drac has in fact run multiple spin cogs, but just hasn't realised it.

He wrote some spin coglets years ago. So he's more adept than his post shows.

As for doing the spin mod, I am quite happy to help.

Dr_Acula · 2015-04-06 23:16

If you haven't been using multiple Spin interpreters, then you've been missing out on a very powerful feature of the Propeller.

Sounds very promising!

Ok, the Hello World of microprocessors, flashing a led.

Spin program 1 flashing a led once a second on a propeller pin.
Spin program 2 flashing a led on a different propeller pin.
Each spin program running on its own cog spin interpreter.
Each spin program has its own little area of hub ram.

This isn't a pasm program flashing a led, and there is no pasm anywhere in this program. This is spin. What would this look like in spin code? Does it have two "main" routines?

Cluso99 · 2015-04-06 23:31

This will have 3 cogs running spin

CON
  _CLKMODE = XTAL1 + PLL16X     'Set to ext low-speed xtal, 16x PLL
  _XINFREQ = 5_000_000          'Xtal 5MHz

VAR
  long  stack1[16]
  long  stack2[16]

PUB Main                                   'starts in cog 0
  cognew(flash1,@stack1)            'start next cog which s/be #1
  coginit(2,flash2,@stack2)          'start cog #2
  repeat  'just so we loop here instead of shutting down this cog!!

PRI flash1
  dira[1] := 1
  repeat
    outa[1] := !outa[1]
    waitcnt(clkfreq/1 + cnt)

PRI flash2
  dira[2]~~
  repeat
    !outa[2]
    waitcnt(clkfreq * 2 + cnt)

And here is the homespun listing (homespun -d filename)

0000: 00 b4 c4 04 ' Frequency: 80000000 Hz
0004: 6f          ' XTAL mode
0005: bd          ' Checksum
0006: 10 00       ' Base of program
0008: 64 00       ' Base of variables
000a: ec 00       ' Base of stack
000c: 20 00       ' Initial program counter
000e: f0 00       ' Initial stack pointer

'******************************************************************************
'                             flash_3cogspin.spin                              
'******************************************************************************

'=================================== CONs =====================================
_CLKMODE = 1032
_XINFREQ = 5000000
'=============================== Object Header ================================
0010: 54 00 04 00 ' 84 bytes, 4-1 methods, 0 object pointers
0014: 10 00 00 00 ' ptr #1 to $0020: PUB Main (locals size: 0)
0018: 25 00 00 00 ' ptr #2 to $0035: PRI flash1 (locals size: 0)
001c: 3b 00 00 00 ' ptr #3 to $004b: PRI flash2 (locals size: 0)
'============================ Method #1: PUB Main =============================
'PUB Main                        'starts in cog 0
'------------------------------------------------------------------------------
  cognew(flash1,@stack1)        'start next cog which s/be #1
'------------------------------------------------------------------------------
0020: 37 00          PUSH#kp	2 ($2)
0022: 43             PUSH#	VAR+0
0023: 15             MARK 	
0024: 2c             COGISUB	
'------------------------------------------------------------------------------
  coginit(2,flash2,@stack2)     'start cog #2
'------------------------------------------------------------------------------
0025: 37 21          PUSH#kp	3 ($3)
0027: cb 40          PUSH#.L	VAR+64
0029: 15             MARK 	
002a: 37 00          PUSH#kp	2 ($2)
002c: 3f 8f          REGPUSH	$8f?
002e: 37 61          PUSH#kp	-4 ($fffffffc)
0030: d1             POP.L	Mem[][]
0031: 2c             COGISUB	
'------------------------------------------------------------------------------
  repeat                        'just so we loop here instead of shutting down this cog!!
'------------------------------------------------------------------------------
0032: 04 7e          GOTO 	.-2 (dest:$0032)
0034: 32             RETURN	
'=========================== Method #2: PRI flash1 ============================
'PRI flash1
'------------------------------------------------------------------------------
  dira[1] := 1
'------------------------------------------------------------------------------
0035: 36             PUSH#1	
0036: 36             PUSH#1	
0037: 3d b6          REGPOP	DIRA<>
'------------------------------------------------------------------------------
  repeat
'------------------------------------------------------------------------------
'------------------------------------------------------------------------------
    outa[1] := !outa[1]
'------------------------------------------------------------------------------
0039: 36             PUSH#1	
003a: 3d 94          REGPUSH	OUTA<>
003c: e7             BIT_NOT	
003d: 36             PUSH#1	
003e: 3d b4          REGPOP	OUTA<>
'------------------------------------------------------------------------------
    waitcnt(clkfreq/1 + cnt)
'------------------------------------------------------------------------------
0040: 35             PUSH#0	
0041: c0             PUSH.L	Mem[]
0042: 36             PUSH#1	
0043: f6             DIV  	
0044: 3f 91          REGPUSH	CNT
0046: ec             ADD  	
0047: 23             WAITCNT	
0048: 04 6f          GOTO 	.-17 (dest:$0039)
004a: 32             RETURN	
'=========================== Method #3: PRI flash2 ============================
'PRI flash2
'------------------------------------------------------------------------------
  dira[2]~~
'------------------------------------------------------------------------------
004b: 37 00          PUSH#kp	2 ($2)
004d: 3d d6 1c       REGUSING	DIRA<>, POSTSET
'------------------------------------------------------------------------------
  repeat
'------------------------------------------------------------------------------
'------------------------------------------------------------------------------
    !outa[2]
'------------------------------------------------------------------------------
0050: 37 00          PUSH#kp	2 ($2)
0052: 3d d4 47       REGUSING	OUTA<>, BIT_NOT
'------------------------------------------------------------------------------
    waitcnt(clkfreq * 2 + cnt)
'------------------------------------------------------------------------------
0055: 35             PUSH#0	
0056: c0             PUSH.L	Mem[]
0057: 37 00          PUSH#kp	2 ($2)
0059: f4             MPY  	
005a: 3f 91          REGPUSH	CNT
005c: ec             ADD  	
005d: 23             WAITCNT	
005e: 04 70          GOTO 	.-16 (dest:$0050)
0060: 32             RETURN	
0061: 00 00 00    
'================================ VAR Section =================================
0064: 00 00 00 00 00 00 00 00 ' LONG stack1(16)
006c: 00 00 00 00 00 00 00 00 ' 
0074: 00 00 00 00 00 00 00 00 ' 
007c: 00 00 00 00 00 00 00 00 ' 
0084: 00 00 00 00 00 00 00 00 ' 
008c: 00 00 00 00 00 00 00 00 ' 
0094: 00 00 00 00 00 00 00 00 ' 
009c: 00 00 00 00 00 00 00 00 ' 
00a4: 00 00 00 00 00 00 00 00 ' LONG stack2(16)
00ac: 00 00 00 00 00 00 00 00 ' 
00b4: 00 00 00 00 00 00 00 00 ' 
00bc: 00 00 00 00 00 00 00 00 ' 
00c4: 00 00 00 00 00 00 00 00 ' 
00cc: 00 00 00 00 00 00 00 00 ' 
00d4: 00 00 00 00 00 00 00 00 ' 
00dc: 00 00 00 00 00 00 00 00 ' 
00e4: ff ff f9 ff ff ff f9 ff

Dr_Acula · 2015-04-07 01:45

Ah, now I get it. Brilliant!!

Can we now push it a bit further. Is it possible to add a 'compile to absolute location' just before "PRI flash2"

Looking at the hex, just for argument's sake, so that line gets compiled to hex 0100. Or if not possible, some way of extracting that location by parsing the source code or the hex. Maybe some unique bytes in a dat section you could search for. But ideally, compile to a fixed location, because what would be really useful is to divide up the hub into separate sections so that each spin program stays in its area and doesn't go outside of it.

If you can do that, then we can start to think about spin programs that can load other spin programs at certain locations. A spin program to load a binary or a hex file and just to copy some of the bytes to a certain location, and then to fire off a spin interpreter cog.

If that works, can we then look at bits of spin code that have a stack? E.g. flash1 calls an object, and flash2 calls an object: where is each of those stacks ending up, and more specifically, are they ending up as separate stacks, or is the compiler combining them together?

Cluso99 · 2015-04-07 02:16

OK. I think we need to separate the Flash1 & Flash2 objects into separate files, and add a VAR section to each so we have a full object model to play with.
I will do that now and post shortly.

Postedit
Done that. So I can create standalone-style objects, but I need to create a base/starter object that has provision for the 1st object to be our "loadable object(s)". Then any other user objects will be located in hub. Just working out how to do this.

Dr_Acula · 2015-04-07 04:31

Thanks cluso for doing these experiments. I think the propeller has a few tricks yet that we haven't tried out. I see there is a long running language thread of C vs Basic running on the forum - how crazy/nifty/impossible would it be to neatly sidestep the argument by running multiple languages on the propeller... all at the same time? Allocate separate hub memory for each one, a cog per language, and set them all going.
Back in the olden days, compilers had the .org instruction - compile starting at this memory location. Is there an equivalent in spin?

Dave Hein · 2015-04-07 04:33

A Spin binary can be run at an arbitrary location by adjusting the parameters in the Spin header. This technique is used in spinix and a few other loaders that I've seen. The Spin header is as follows:

bytes 0-3   clkfreq
byte  4     clkmode
byte  5     checksum
bytes 6-7   pbase
bytes 8-9   vbase
bytes 10-11 dbase
bytes 12-13 pcurr
bytes 14-15 dcurr

Normally a Spin program is loaded at location 0. If you load it at another location the starting address must be added to the values of pbase, vbase, dbase, pcurr and dcurr. The area from vbase to dbase is the VAR area, and it must be zeroed to ensure proper operation. The long at dbase must also be zeroed, which is where the top object's RESULT value is located. The area after dcurr is the stack area. The stack size must be sufficient to handle the stack usage of the program.
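The pointer adjustment described above can be sketched on the PC side. This is a minimal illustration in Python, not code from spinix or any other loader; the function and constant names are hypothetical. It only offsets the five 16-bit header fields. The zeroing of the VAR area and of the long at dbase, as described above, would still have to be done by whatever copies the image into hub. The checksum byte is copied through unchanged, and whether a given loader cares about it is outside this sketch.

```python
import struct

# clkfreq (4 bytes), clkmode, checksum, then pbase, vbase, dbase, pcurr, dcurr
HEADER_FORMAT = "<IBB5H"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes, little-endian, no padding


def relocate_header(image: bytes, load_addr: int) -> bytes:
    """Return a copy of image with the five header pointers offset by load_addr."""
    clkfreq, clkmode, checksum, *pointers = struct.unpack_from(HEADER_FORMAT, image)
    moved = [p + load_addr for p in pointers]
    if max(moved) > 0xFFFF:
        raise ValueError("relocated pointer does not fit in 16 bits")
    header = struct.pack(HEADER_FORMAT, clkfreq, clkmode, checksum, *moved)
    return header + image[HEADER_SIZE:]


# Header bytes copied from the homespun listing earlier in the thread.
listing_header = bytes.fromhex("00 b4 c4 04 6f bd 10 00 64 00 ec 00 20 00 f0 00")
fields = struct.unpack_from(HEADER_FORMAT, relocate_header(listing_header, 0x4000))
print([hex(v) for v in fields[3:]])  # pbase, vbase, dbase, pcurr, dcurr
```

With a load address of $4000, the flasher's pbase of $0010 becomes $4010, and the other four fields move by the same amount.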

Cluso99 · 2015-04-07 04:42

There are some orgh etc values in the P2 PropTool but we don't have this in our PropTool.
However, we can do

DAT
  long 0[$1000-$1C]

I tried [$1000-$/4] but it didn't work.

Dr_Acula · 2015-04-07 04:51

Dave said

A Spin binary can be run at an arbitrary location by adjusting the parameters in the Spin header.

Hmm - that could be useful. Ok, can an entire spin program be moved in ram to a different location, and will every single instruction still work, or are there a handful of 'gotcha' instructions?

How about a pile of dummy instructions, even just a dat section, to pad out the beginning of a program so the spin starts at a higher location? Sure, you would not want to do this for a program sitting in eeprom as it wastes space, particularly if the program is 2k long and is designed to run in the upper 2k of hub ram, so now it is 32k long. But storing programs on sd cards etc doesn't cost anything. Would that save having to worry about all those header values?

I'm very impressed with how simple that dual led flasher program is.

Cluso99 · 2015-04-07 04:51

Dave,
What I am trying to do is keep the original hub pbase/vbase/dbase/pcurr/dcurr in hub $0006-000F unchanged, allowing the user to have his spin code following.
But, I also want to allow a loadable module with fixed locations in high hub, with Object Header at say $8000+, the stack at say $8100, and the variables at say $8200, and the code starting at say $8300. To do this, I would need to have a reserved Object Pointer at say $0014 (ptr#1) which I can plug in after the hub $8000+++ is loaded, and then issue a coginit (n, xxxx, @xxxx) to run the loaded spin object in cog n.

I just need some more time to think about this.

Dave Hein · 2015-04-07 05:00

Yes, I understand what you want to do. I've done it in spinix by allocating a chunk of memory that is sufficient to hold the program code plus VAR data and the stack, and then reading the program from an SD file into the allocated memory. The header is then adjusted and a new cog is started by specifying the Spin interpreter in ROM at $F004. The PAR parameter points to the CLKMODE value in the header, which is 2 bytes before the PBASE value. This is done because the PAR value must be a multiple of 4.

Using this method I've been able to run 4 Spin programs at the same time under spinix.
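
The PAR arithmetic above can be written out as a tiny PC-side Python sketch. The function names are hypothetical; the only facts used are the ones in this post: PAR points at CLKMODE (header byte 4), PBASE sits 2 bytes after it, and PAR must be a multiple of 4.

```python
SPIN_INTERPRETER_ROM = 0xF004  # interpreter address quoted in the post above

HEADER_WORDS = ("pbase", "vbase", "dbase", "pcurr", "dcurr")


def interpreter_par(app_addr):
    """Return the PAR value for a Spin image loaded at app_addr.

    PAR points at clkmode (app_addr + 4), not at pbase (app_addr + 6),
    because app_addr + 6 would not be a multiple of 4.
    """
    if app_addr % 4 != 0:
        raise ValueError("load address must be a multiple of 4")
    return app_addr + 4


def header_word_addresses(par):
    """Hub addresses of the five pointer words, which start at PAR + 2."""
    return {name: par + 2 + 2 * i for i, name in enumerate(HEADER_WORDS)}
```

So an image loaded at $4000 would be started with PAR = $4004, and its pbase word would then be found at $4006.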

Cluso99 · 2015-04-07 14:13

Thanks Dave.
When you say the header is adjusted, do you mean the Dbase etc at hub 0006 or the object pointer at about hub 0014?

I will check out spinix soon.

I know what Drac wants to do is possible, it's just finding the simplest yet most flexible way.

Dave Hein · 2015-04-07 14:43

All 5 words at the end of the header are adjusted. Here is an edited version of the run routine that's used in spinix.

OBJ
  c : "clibsd"

CON
  SEEK_SET = 0
  STACK_SIZE = 200

PUB run(fname) | infile, size, AppAddr, i, cognum, vbase, dbase, header[4]
 
  ' Open the Spin binary file 
  infile := c.fopen(fname, string("r"))
  ifnot infile
    return -3

  ' Determine the program size including the VAR section
  c.fread(@header, 1, 16, infile)
  size := word[@header][5] ' size is equal to the value of dbase
  c.fseek(infile, 0, SEEK_SET)
  
  ' Allocate the memory plus some for the stack
  AppAddr := c.malloc(size + STACK_SIZE)
  ifnot AppAddr
    c.fclose(infile)
    return -2

  ' Read the program into memory
  c.fread(AppAddr, 1, size, infile)
  c.fclose(infile)

  ' Add offset to the data pointers
  repeat i from 3 to 7
    word[AppAddr][i] += AppAddr

  ' Get VBASE and DBASE
  vbase := word[AppAddr][4]
  dbase := word[AppAddr][5]

  ' Initialize the stack frame and RESULT variable
  long[dbase][-2] := $fff9ffff
  long[dbase][-1] := $fff9ffff
  long[dbase]~

  ' Zero the VAR area
  longfill(vbase, 0, (dbase - vbase - 8) >> 2)

  ' Start up the program in a new cog
  cognum := cognew($f004, AppAddr + 4)
  if cognum == -1
    c.free(AppAddr)
    return -4

  return cognum

Cluso99 · 2015-04-07 15:25

Dave,
Thanks heaps!
I was looking at Spinix and partially understood what you were doing. Then I checked back here and your post above made things gel.

So basically, you load a "compiled object" into hub at any free space (you allocated this). This hub base area (AppAddr) contains the freq(4), xtalmode(1), checksum(1),pbase(2),vbase(2),dbase(2),pcurr(2),dcurr(2), and then the object header, spin bytecode, var. The new stack will be built above this.
Then you add the new AppAddr to the pbase(2),vbase(2),dbase(2),pcurr(2),dcurr(2) to form the correct offset values due to relocating the object load.
Then you calculate the VAR size and clear it.
Then you initialise the stack by clearing the RESULT variable (first on the stack), followed by the 2 x $FFF9FFFF stack frames (is this bit required??? as I have not seen it done to objects other than the first).
Lastly, you start a cog with COGNEW(spininterpreter, AppAddr+4). AppAddr+4 points to the xtal mode byte, i.e. addr(pbase) - 2, because PAR must be long-aligned.

I had been thinking I had to predefine some areas and then try to load the object pointers, object code, variables, and stack into fixed locations. I was also worried that I had to change the lower hub pbase/etc at hub 0006+, and the implications if another spin program ran one of those preloaded objects/methods. This solves all these problems.
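
For anyone who wants to check the sequence on a PC before trying it on the chip, here is a rough Python model of the steps in the run routine, with hub RAM simulated as a bytearray. It is only a sketch: `load_spin_image` and `demo_image` are hypothetical names, the demo header values are made up, and it mirrors the order of writes in Dave's code without claiming to match the interpreter in any other respect.

```python
import struct

HUB_SIZE = 32 * 1024
FRAME_MARK = 0xFFF9FFFF  # value the run routine writes below RESULT


def load_spin_image(hub, image, app_addr, stack_size=200):
    """Copy image into hub at app_addr, patch it, and return the PAR value."""
    # The unrelocated dbase (header bytes 10-11) is the size of code plus VAR.
    (size,) = struct.unpack_from("<H", image, 10)
    if app_addr % 4 or app_addr + size + stack_size > len(hub):
        raise ValueError("bad load address or not enough room")

    # Copy what the file actually contains (it may be shorter than size).
    data = bytes(image[:size])
    hub[app_addr:app_addr + len(data)] = data

    # Add the load address to pbase, vbase, dbase, pcurr, dcurr (words 3..7).
    for i in range(3, 8):
        off = app_addr + 2 * i
        (word,) = struct.unpack_from("<H", hub, off)
        struct.pack_into("<H", hub, off, (word + app_addr) & 0xFFFF)

    vbase, dbase = struct.unpack_from("<HH", hub, app_addr + 8)

    # Two frame longs just below dbase, then a zeroed RESULT at dbase.
    struct.pack_into("<III", hub, dbase - 8, FRAME_MARK, FRAME_MARK, 0)

    # Zero the VAR area, which ends 8 bytes short of dbase.
    var_bytes = dbase - vbase - 8
    hub[vbase:vbase + var_bytes] = bytes(var_bytes)

    return app_addr + 4  # PAR points at clkmode


def demo_image():
    """Made-up 32-byte image: 16-byte header plus 16 bytes of dummy code."""
    header = struct.pack("<IBBHHHHH", 80_000_000, 0x6F, 0,
                         0x10, 0x20, 0x30, 0x14, 0x34)
    return header + bytes([0xAA]) * 16
```

Loading `demo_image()` at $4000 in this model leaves the two $FFF9FFFF longs at $4028 and $402C and the zeroed RESULT long at $4030, which is the layout the Spin code above produces with `long[dbase][-2]`, `long[dbase][-1]` and `long[dbase]~`.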

Cluso99 · 2015-04-07 15:44

Dave,
Just looking at the *.binary files created for an object.

I had presumed the VAR and the start of the stack frame (2x FFFFFFF9) had been stored in the binary file.

Isn't the first FFFFFFF9 the RESULT? You are loading
00000000 <- RESULT
FFFFFFF9
FFFFFFF9
When examining the PropTool visual binary they have
FFFFFFF9 <- RESULT
FFFFFFF9

Am I missing something???
