Overlays for large spin programs

Dr_Acula · 2014-01-22 01:45

I've got this obsession about trying to run large spin programs on the propeller! Searches for this on google seem to recursively point me back to old threads that I have long since forgotten I started, so apologies in advance for sounding like a broken record

Ok, let's say we have a propeller chip, we have some mass storage such as an SD card and we have a display for debugging such as a VGA display. Many propeller boards will be able to do this. Further, just to make the challenge a bit more interesting, can this be done with just the proptool?

I'm going to note at this stage there are at least 4 propeller compilers that may be able to do this - proptool, homespun, BST and sphinx. Maybe openspin. sphinx might be the easiest to modify as it is written in spin and works on the propeller, but I'm going to try the proptool first.

What are overlays? Well, I'm glad you asked. Overlays http://en.wikipedia.org/wiki/Overlay_(programming) were popular before virtual memory drivers came along. In a simple sense, you have a core program, and then you have other parts of the program that are loaded into memory and run as needed. For a very simplistic model, a propeller might have the lower 16k of hub devoted to the core program which would contain the SD driver and the graphics driver, and then other parts of the program are loaded in as needed.

My understanding of overlays is they were used back in the 1980s when memory was typically 64k or less, but they were a reinvention of a technique used in mainframes 20 years earlier. What is interesting is the systems that used overlays had memory that was equal to or less than that available in the propeller. So, if the boffins of yore could do this, we should be able to. Or to translate this to propeller-speak, this is totally impossible *grin*

Let's try hacking the proptool. What we need is to have 2 files. We take file 1, which has three objects, SD, VGA and Overlay1 and we compile that. Then we have file 2, which also has three objects SD, VGA and Overlay2, and we compile that. Now we need to run the program and move Overlay1 or Overlay2 into higher hub memory and see if we can swap them in and out.

First step is to create a common point in both programs. Some of the propeller compilers have the @@@ symbol but let's try the proptool first. Compiled spin programs don't start at address 0 in hub ram, so we need to work out how the compiler is working. Dat sections seem to go first, and with our two programs, the Dat sections are going to be the same so that should not matter. Var seems to be compiled after the first Pub so that makes things easier. But the location of the first Pub seems to move depending on how many objects are added. And further, it doesn't matter where in the code the objects are listed, they all end up at the beginning.

I'm using a string print routine to track where things are ending up and using F8 to compile. If I print a string, and have one object, the string appears at address 0x2A. If I add a second object that moves to address 0x2E. So it appears there is a lookup table of objects at the beginning of the program with one long per object. I think looking further at the bytes, that some of those bytes are the jump location in code.

So if program 1 and program 2 both contain the same number of objects then the jump locations should not change.
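As a rough illustration of that reasoning, here is a small Python sketch. It only extrapolates from the two measured addresses (0x2A with one object, 0x2E with two); the function name and the idea that every further object adds exactly four bytes are assumptions, not something the proptool documents.

```python
# Sketch: predict where the first string lands as objects are added.
# ASSUMPTION: each OBJ entry adds one long (4 bytes) to the table at the
# start of the program, as the two measurements in the post suggest.

BYTES_PER_OBJECT_ENTRY = 4
MEASURED_ADDRESS_ONE_OBJECT = 0x2A  # from the post: one object -> 0x2A


def predicted_string_address(object_count: int) -> int:
    """Extrapolate the string address for a given number of objects (>= 1)."""
    if object_count < 1:
        raise ValueError("the measurements only cover one or more objects")
    extra_entries = object_count - 1
    return MEASURED_ADDRESS_ONE_OBJECT + BYTES_PER_OBJECT_ENTRY * extra_entries


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(count, hex(predicted_string_address(count)))
```

If two builds have the same object count, this model gives the same address for both, which is the property the overlay idea relies on.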

I think what we need now is dummy overlays. So program 1 might be compiled with a call to Overlay1, and from within overlay1 we have an overlay loader (read a binary file from SD) and then continue with the program with calls to Pubs within that overlay. Then the program needs to return and load up Overlay2. I'm not sure of the exact details of this, but I think that if code is swapped in and out of high memory and both programs are compiled with the same core objects and one dummy overlay object, all the jump locations should be the same. Further, it should be possible to pass one long which is a pointer to a list of longs in hub and this can be a way of passing data back and forth.

There are many compiler options but I'm thinking of staying with the proptool, and compiling two binary programs with a simple F8 and then save to SD card. They will be different sizes, but if this works, the core part of the program with the SD and display driver should be the same. If we have some sort of data pointer in the program, eg some text "end program", it should be possible for a program to work out how big it is itself by searching through its own hub. It can then load in another binary program, search for this same string, and then after that string start moving that data into higher memory. So we can then load the top part of the program in and out of high memory without the lower part of the program realising it has happened because the lower part of the program is the same.
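The marker search could look something like the following Python sketch. It is a model of the idea only: the marker text, the function names and the sample bytes are hypothetical, and real images would come off the SD card rather than from byte literals.

```python
# Sketch: split two compiled binaries at a shared marker string.
# ASSUMPTIONS: the marker text, the function names and the sample bytes are
# hypothetical; real binaries would be read from the SD card.

MARKER = b"end program"


def split_at_marker(image: bytes, marker: bytes = MARKER):
    """Return (core, overlay); the core ends just after the marker."""
    position = image.find(marker)
    if position < 0:
        raise ValueError("marker not found in image")
    cut = position + len(marker)
    return image[:cut], image[cut:]


def extract_overlay(resident_image: bytes, other_image: bytes) -> bytes:
    """Return the overlay part of other_image if its core matches ours."""
    resident_core, _ = split_at_marker(resident_image)
    other_core, other_overlay = split_at_marker(other_image)
    if resident_core != other_core:
        raise ValueError("core sections differ; overlay is not swappable")
    return other_overlay


if __name__ == "__main__":
    core = b"\x01\x02SD+VGA drivers" + MARKER
    program1 = core + b"overlay one code"
    program2 = core + b"overlay two, a bit longer"
    print(extract_overlay(program1, program2))
```

The explicit comparison of the two cores is the useful part: if the lower sections are not byte-identical, swapping the upper section would most likely break the running program.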

This is the theory. I'm not entirely sure of the syntax within a spin program, but I think it will end up with a core program with lots of overlays commented out, and you uncomment them one at a time and recompile each time for a new binary file. The structure of the program is up to the programmer, but considering a game for instance, one might have one overlay that involves artificial intelligence (eg in pacman) and a new overlay that involves moving sprites around as per the recalculated positions. This means overlays are not being moved in and out of memory too often as there is obviously a time cost associated with this. At the very simplest level, the code is something like

PUB main
  start sd driver
  start vga driver
  repeat
    load overlay1
    run overlay1
    load overlay2
    run overlay2
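To make the loop a bit more concrete, here is a Python model of it. This is a sketch under stated assumptions: the 16k split follows the simplistic model described earlier, a dict stands in for the files on the SD card, and every name is made up for illustration. It only moves bytes around; it does not run Spin code.

```python
# Sketch: a 32 KB "hub" with a resident core and one overlay slot.
# ASSUMPTIONS: the 16 KB split follows the simplistic model in the post;
# the dict stands in for files on an SD card; all names are hypothetical.

HUB_SIZE = 32 * 1024
OVERLAY_BASE = 16 * 1024          # resident core lives below this address
OVERLAY_SLOT = HUB_SIZE - OVERLAY_BASE


def load_overlay(hub: bytearray, card: dict, name: str) -> int:
    """Copy an overlay image into the overlay slot; return its length."""
    image = card[name]
    if len(image) > OVERLAY_SLOT:
        raise ValueError(f"{name} does not fit in the overlay slot")
    hub[OVERLAY_BASE:OVERLAY_BASE + len(image)] = image
    return len(image)


def run_cycle(hub: bytearray, card: dict, names: list) -> list:
    """Load each overlay in turn and record what ended up in the slot."""
    seen = []
    for name in names:
        length = load_overlay(hub, card, name)
        seen.append(bytes(hub[OVERLAY_BASE:OVERLAY_BASE + length]))
    return seen


if __name__ == "__main__":
    hub = bytearray(HUB_SIZE)
    card = {"overlay1.bin": b"AI code", "overlay2.bin": b"sprite code"}
    print(run_cycle(hub, card, ["overlay1.bin", "overlay2.bin"]))
```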

Will this outperform a cached memory solution using virtual memory? Possibly not, and I suspect the GCC coders have a better solution. But... Spin can't run using cached memory, so there are not really any other options for Spin.

I'd be very interested in thoughts from forum members. Is this crazy?

Cluso99 · 2014-01-22 02:05

IIRC Sphinx is able to compile objects one at a time, and ultimately it links these together. IIRC bst can also compile an object separately too.

I think the relocation of those compiled objects is quite simple. David Betz might be the best person to answer this as I think he worked out what needs to be done to relocate an object, which is after all, what you are trying to do. Each object would then become an overlay.

David Betz · 2014-01-22 04:26

Actually, I have not done this but I did ask a similar question recently and I was told that relocating an object involves adjusting the VBASE, PBASE, etc. values in the object header. I'm sure that the knowledgeable person who answered my question will jump in here and give you the whole story.

Dave Hein · 2014-01-22 06:19

A Spin object can be loaded anywhere in hub RAM and called by properly setting up PBASE, VBASE, DBASE, PCURR and DCURR. The MethodPointer object does this for resident objects by storing the values of the first 4 state variables in a 4-word array, and then setting up DCURR based on the current stack pointer. So it would be fairly easy to implement overlays using this technique. I'm not sure how efficient this would be. It depends on how often you would have to switch overlays.

David Betz · 2014-01-22 06:33

Hi Dave,

Thanks for the summary of how to relocate Spin objects. Could you explain a little what's happening here? For example, if I have two instances of a Spin object, which of these values are the same for both instances and which vary? Also, if I want to run the same object on two different COGs then I must have a separate stack for each COG. Which of these variables indicates where the stack lives?

Thanks!
David

Dave Hein · 2014-01-22 06:53

Two instances of the same object will have the same PBASE and PCURR values, since they are both executing the same code. PBASE is just the absolute starting address of the object, and PCURR is the Spin program counter. Each method within the object uses a value for PCURR that points to the first bytecode in the method.

VBASE is the absolute starting address of the object's VAR variables. This is different for each instance of the object. DBASE is the absolute address of the RESULT variable when calling a method. This is dependent on the location in the stack at the time that a method is called. DCURR is the stack pointer. When a method is called, DCURR is set to the current stack pointer value plus space for the stack frame, local variables and calling parameters.

The stack frame is a 4-word structure that is placed on the stack when calling another method. It contains the values of PBASE, VBASE, DBASE and PCURR just before a method is to be called. The stack frame is used by the RETURN and ABORT bytecodes to restore the Spin state variables. The stack frame also contains two additional bits to indicate whether the return value should be pushed to the stack, and whether an ABORT should stop returning at this point or cause a return to the previous method.
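For readers who find a data layout easier to follow than prose, the stack frame can be modelled roughly as below. This is a Python sketch, not interpreter source. It assumes the two flag bits sit in the low bits of the PBASE word (the task swapper code later in this thread masks PBASE with $fffc, which suggests as much) and it does not try to say which bit means what.

```python
# Sketch: the 4-word stack frame described above, modelled in Python.
# ASSUMPTIONS: the two flag bits live in the low bits of the PBASE word
# (the task swapper masks PBASE with $fffc, which suggests this); the
# meaning of each individual bit is not modelled here.
import struct
from typing import NamedTuple

FLAG_MASK = 0x0003
ADDRESS_MASK = 0xFFFC


class StackFrame(NamedTuple):
    pbase: int
    vbase: int
    dbase: int
    pcurr: int
    flags: int  # two bits: push-result and abort-trap, order not modelled


def pack_frame(frame: StackFrame) -> bytes:
    """Pack a frame into four little-endian 16-bit words."""
    if frame.pbase & FLAG_MASK:
        raise ValueError("pbase must be long-aligned so the flag bits are free")
    first_word = (frame.pbase & ADDRESS_MASK) | (frame.flags & FLAG_MASK)
    return struct.pack("<4H", first_word, frame.vbase, frame.dbase, frame.pcurr)


def unpack_frame(raw: bytes) -> StackFrame:
    """Recover the state variables and flag bits from eight bytes."""
    first_word, vbase, dbase, pcurr = struct.unpack("<4H", raw)
    return StackFrame(first_word & ADDRESS_MASK, vbase, dbase, pcurr,
                      first_word & FLAG_MASK)
```

A round trip such as `unpack_frame(pack_frame(frame))` returns the original values, which is all RETURN and ABORT need in order to restore the caller's state.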

David Betz · 2014-01-22 07:07

Thanks for the details! Is this information posted somewhere so we don't always have to search the forums to find it or ask you? :-)

MagIO2 · 2014-01-22 07:34

Hmmm ... isn't this "load a binary into some RAM and let it run" common sense?

From what I remember, I also answered at least one of your ancient threads, Dr Acula.

What I do in my CogOS and in another big command driven program is:
1. Drivers (like FullDuplex or TV or keyboard, SD card...) are separated in a PASM and a spin part.
2. The PASM mailbox is created at the end of HUB-RAM by a memory manager which also knows which drivers are running
3. The spin part is told by an init function where to find the mailbox

The overlays only need to use those stripped-down spin parts of the drivers. They ask a stripped-down version of the memory manager where to find the mailbox, init the spin driver and then can use the resource.
Passing parameters to the overlay works as well as starting functions in the overlay loader (somehow similar to an INT13 interrupt) and returning a result.

Maybe it would be an improvement if it is possible to share one spin driver between overlay and overlay loader, but so far my overlays are small enough.
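A rough Python model of the mailbox bookkeeping described above might look like this. It is a sketch only: the class and method names, the mailbox sizes and the long alignment are invented for illustration, and CogOS itself may well do this differently.

```python
# Sketch: hand out driver mailboxes from the top of hub RAM downwards.
# ASSUMPTIONS: class and method names, mailbox sizes and long alignment are
# invented for illustration; CogOS itself may work differently.

HUB_SIZE = 32 * 1024


class MailboxManager:
    def __init__(self, hub_size: int = HUB_SIZE):
        self._next_free_top = hub_size   # allocation grows downwards
        self._mailboxes = {}             # driver name -> mailbox address

    def register(self, driver: str, size_bytes: int) -> int:
        """Reserve a long-aligned mailbox for a running driver."""
        if driver in self._mailboxes:
            return self._mailboxes[driver]
        address = (self._next_free_top - size_bytes) & ~3
        if address < 0:
            raise MemoryError("out of hub RAM")
        self._next_free_top = address
        self._mailboxes[driver] = address
        return address

    def lookup(self, driver: str) -> int:
        """What an overlay would call to find an already running driver."""
        return self._mailboxes[driver]


if __name__ == "__main__":
    manager = MailboxManager()
    print(hex(manager.register("serial", 16)))
    print(hex(manager.lookup("serial")))
```

The point of the design is that the overlay never starts a driver itself; it only asks where the mailbox of the already running driver lives.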

LoopyByteloose · 2014-01-22 08:44

Spin objects? I suspect that the same can be done in Forth and run faster. Ask Dave Hein how Spinix might do this with an SDcard. You would also have much more flexibility within Forth than in the .spin compiler. Of course, if your Spin binaries are inclusive of PASM code, that is a very different story as the assembler always outperforms all else.

Dave Hein · 2014-01-22 08:46

Running PASM programs in other cogs is different than running Spin overlays in the same cog.

Heater. · 2014-01-22 08:49

Overlays! Runs screaming!

We had "overlays" in the 1970's on the Marconi Locus 16 embedded mini-computer for radar systems. Except they weren't really overlays in the modern sense but rather a means of switching in chunks of RAM to high memory that had previously been loaded with code.

We had overlays in the late 70's early 80's with CP/M. Sometimes as hardware memory bank switching, as above. Sometimes using actual "overlaying linkers" that would build programs bigger than memory space and swap functions in and out from disk.

We had overlays in the 1980's on the PC, whilst I was working at Racal on a schematic design and PCB layout package called CADSTAR. CADSTAR was written in PL/M (a Pascal/C like language) and had to fit all its code and the design data into 640K. This was done with PLINK, an "overlaying linker".

The thing about an overlaying linker is that you can tell it how to partition your program into independent parts that can be swapped in and out of high memory. Of course those parts can have independent sub-parts and so on in a nested fashion. This was all specified in a big linker configuration file. With that in place your code could call functions in overlays directly. The linker would insert code that would intercept those calls, load the appropriate overlay, find the function in it and call it.

Sweet.

Except as your program grows this becomes a real pain to organize. Then you find that the amount of common, shared, code you need in the "resident" section becomes too big. All in all it was a pain to build programs with overlays.

Then you start adopting the various methods of expanding the PC memory, bank switching basically. That gets you more space for design data. But organizational problems grow and grow!

Then comes 32 bit machines and Linux, virtual memory, shared libraries and swap files. And we are happy at last

Dave Hein · 2014-01-22 08:52

Loopy Byteloose wrote: »

Spin objects? I suspect that the same can be done in Forth and run faster. Ask Dave Hein how Spinix might do this with an SDcard. You would also have much more flexibility within Forth than in the .spin compiler. Of course, if your Spin binaries are inclusive of PASM code, that is a very different story as the assembler always outperforms all else.

In Forth you could load some Forth code, and then remove the Forth code by using the FORGET word, and load more Forth code, and so on. Or you could generate pre-compiled Forth code that runs in a fixed overlay area. That would be similar to the Spin-Overlay idea.

In Spinix, I would use a shell script to run large programs that have been broken up into smaller programs. That's how it runs the Spin compiler. The spc script runs spinit, spasm and splink to compile, assemble and link a spin program.

David Betz · 2014-01-22 09:08

You support shell scripts in Spinix? What does the scripting language look like?

Dave Hein · 2014-01-22 09:16

The Spinix scripting language is a subset of Bash. Here's the spc script that runs the Spin compiler.

#shell
if [ $# -ne 1 ]
then
echo "usage: spc file"
exit
fi
echo spinit $1
spinit $1
if [ $? -ne 0 ]
then
exit
fi
echo spasm $1
spasm $1
if [ $? -ne 0 ]
then
exit
fi
echo splink $1.bin clibsd.bin $1
splink $1.bin clibsd.bin $1
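For anyone who does not read shell, the control flow of the script can be restated in Python as below. This is only a sketch: `run_tool` is a hypothetical callable that runs one command and returns its exit status, so nothing here touches the real Spinix tools.

```python
# Sketch: the control flow of the spc script, with the tools injected.
# ASSUMPTION: run_tool is a hypothetical callable that runs one command and
# returns its exit status; nothing here calls the real Spinix tools.
from typing import Callable, List


def spc(args: List[str], run_tool: Callable[[List[str]], int]) -> List[List[str]]:
    """Run spinit, spasm and splink in order; stop at the first failure."""
    executed: List[List[str]] = []
    if len(args) != 1:
        print("usage: spc file")
        return executed
    name = args[0]
    steps = [
        ["spinit", name],
        ["spasm", name],
        ["splink", f"{name}.bin", "clibsd.bin", name],
    ]
    for step in steps:
        print(" ".join(step))      # mirrors the echo lines in the script
        executed.append(step)
        if run_tool(step) != 0:    # mirrors the "$? -ne 0" checks
            break
    return executed
```

Each stage runs as its own program and hands over through files, which is how a compiler too large for hub RAM can still run on the chip.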

David Betz · 2014-01-22 09:18

Nice to see something like that on the Propeller! Makes me wonder why people are asking for a self-hosting development platform. Looks like we already have it!

LoopyByteloose · 2014-01-23 02:10

David Betz wrote: »

Nice to see something like that on the Propeller! Makes me wonder why people are asking for a self-hosting development platform. Looks like we already have it!

YES !!!! This is a very sweet resource that people are just beginning to see the utility and FUN of using. Spinix is a delightful system setup that makes the Propeller with an SDcard very extensible and powerful.

The dilemma is that people are very wary of Forth. The public perception is that it is passe. But it is better than ever when you apply 32bits, 8 CPUs, and have 32K of ram extended by a SDcard interface.

Dave Hein · 2014-01-23 04:55

Loopy Byteloose wrote: »

The dilemma is that people are very wary of Forth. The public perception is that it is passe. But it is better than ever when you apply 32bits, 8 CPUs, and have 32K of ram extended by a SDcard interface.

I'm not sure what the dilemma is. Spinix can be used without running pfth. If someone really hates Forth they can remove the forth directory from the SD card. On the other hand, if someone is interested in trying Forth I think that Spinix provides a nice environment for it.

LoopyByteloose · 2014-01-23 08:43

Hi Dave,
I wasn't aware that Spinix had progressed that far. I am still focusing on Forth for my own personal learning, and pfth is proving very rewarding.

Dr_Acula · 2014-01-23 16:46

Thanks for all the great ideas!

So - consider a large spin program - object 1,2,3,4 each compiles to 15k so the total size is 60k. This clearly won't fit in the hub. For simplicity let's assume it is all spin with no pasm, and no cogs are being started or stopped. Object 1 stays in memory and objects 2,3,4 are loaded in and out as required.

The program is written in the proptool and so there are no #ifdefs. Blocks of code are commented in and out as needed and the program is compiled 3 times - object 1+2, object 1+3 then object 1+4. These 3 binary files, each 30k in size, are copied to an SD disk and the program starts off running the program in object 1+2.
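The sizes are easy to sanity-check. The Python sketch below uses the round numbers from this post, takes "k" to mean 1024 bytes, and uses made-up names. It ignores anything else that has to live in hub RAM, so the headroom figure is optimistic.

```python
# Sketch: check which builds fit in hub RAM.
# ASSUMPTIONS: sizes are the round numbers from the post, "k" is taken as
# 1024 bytes, and the names are hypothetical.

HUB_BYTES = 32 * 1024
OBJECT_BYTES = {1: 15 * 1024, 2: 15 * 1024, 3: 15 * 1024, 4: 15 * 1024}
RESIDENT_OBJECT = 1


def total_size() -> int:
    """Size of the program if everything were linked into one binary."""
    return sum(OBJECT_BYTES.values())


def build_size(overlay_object: int) -> int:
    """Size of one build: the resident object plus a single overlay."""
    return OBJECT_BYTES[RESIDENT_OBJECT] + OBJECT_BYTES[overlay_object]


def headroom(overlay_object: int) -> int:
    """Bytes left over in hub RAM for stack and anything else."""
    return HUB_BYTES - build_size(overlay_object)


if __name__ == "__main__":
    print("all in one:", total_size(), "fits:", total_size() <= HUB_BYTES)
    for overlay in (2, 3, 4):
        print("1 +", overlay, "->", build_size(overlay), "headroom", headroom(overlay))
```

With these numbers each build leaves about 2k spare, so the stack would be tight.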

The main object passes control to each object in turn.

I think the clue to getting this working is Dave Hein's previous work - in fact I think much of the code might be already written. His code is below and explains the whole pbase vbase concept. The main difference I am looking for is a way of including an SD loader, and making the tasks much bigger.

'******************************************************************************
' Spin Task Swapper Test Program
' Copyright (c) 2011, Dave Hein
' See end of file for terms of use.
'******************************************************************************
con
  _clkfreq = 80_000_000
  _clkmode = xtal1+pll16x

obj
  ser : "FullDuplexSerial"

var
  word task1[5]
  word task2[5]
  word task3[5]
  long stack2[20]
  long stack3[20]

pub Method1 | i
{{
  This program tests the task swapping routines.  This method runs on the
  first task thread.  It sets up the task pointers for the second and third
  tasks, and performs a periodic task swap to the second task.
}}
  ' Wait two seconds so user can start prop terminal
  waitcnt(clkfreq*2 + cnt)

  ' Initialize serial I/O  
  ser.start(31, 30, 0, 115200)

  ' Initialize the task pointers for Method2 and Method3
  SetTaskPtr(@task2, @stack2, 0, 2)
  SetTaskPtr(@task3, @stack3, 0, 3)
  
  ' Do a dummy call to SwapTask to initialize it
  SwapTask(@task1, @task2)

  ' Begin task 1's test loop
  repeat i from 1 to 10
    ser.str(string("task 1, loop "))
    ser.dec(i)
    ser.tx(13)
    SwapTask(@task1, @task2)

  ' Task 1's sleep test loop
  repeat i from 11 to 20
    Sleep(1000, @task1, @task2)
    ser.str(string("task 1, loop "))
    ser.dec(i)
    ser.tx(13)

pub Method2 | i
{{
  This is the method for the second task.  It performs a periodic
  task swap to the third task.
}}
  ' Task 2's test loop
  repeat i from 1 to 10
    ser.str(string("task 2, loop "))
    ser.dec(i)
    ser.tx(13)
    SwapTask(@task2, @task3)

  ' Task 2's sleep test loop
  repeat i from 11 to 100
    Sleep(500, @task2, @task3)
    ser.str(string("task 2, loop "))
    ser.dec(i)
    ser.tx(13)

  ' Do not exit from task.  Switch to next task instead
  repeat
    SwapTask(@task2, @task3)

pub Method3 | i
{{
  This is the method for the third task.  It performs a periodic
  task swap to the first task.
}}
  ' Task 3's test loop
  repeat i from 1 to 10
    ser.str(string("task 3, loop "))
    ser.dec(i)
    ser.tx(13)
    SwapTask(@task3, @task1)

  ' Task 3's sleep test loop
  repeat i from 11 to 100
    Sleep(250, @task3, @task1)
    ser.str(string("task 3, loop "))
    ser.dec(i)
    ser.tx(13)

  ' Do not exit from task.  Switch to next task instead
  repeat
    SwapTask(@task3, @task1)

pub Sleep(msecs, taskptr1, taskptr2) | count
{{
  This routine switches to taskptr2 until the number of milli-seconds defined
  by "msecs" has elapsed, after which it will return to the caller.
}}
  count := (msecs * (clkfreq/1000)) + cnt
  repeat while (count - cnt) > 0
    SwapTask(taskptr1, taskptr2) 

pub SetTaskPtr(taskptr, stackptr, objnum, methnum) | pbase, vbase, doffset, pcurr, dbase, index
{{
  This routine initializes the values in taskptr to the initial state for the method defined by objnum
  and methnum.  objnum and methnum must match the indices used in the calling object's method table for
  the target method.  objnum must be zero for a target method that is local to the calling object.  The
  target method must not have any calling parameters.  stackptr points to the stack that will be used by
  this task.
}}
  dbase := @result                    ' Get current dbase
  pbase := word[dbase][-4] & $fffc    ' Get caller's pbase
  vbase := word[dbase][-3]            ' Get caller's vbase
  if objnum                           ' If non-zero object number update pbase and vbase from new object
    index := (objnum << 1)            ' Word index is two times object number
    vbase += word[pbase][index + 1]   ' Add object's var offset to vbase
    pbase += word[pbase][index]       ' Add object offset to pbase
  index := (methnum << 1)             ' Word index is two times method number
  pcurr := word[pbase][index] + pbase ' Get method's absolute starting address
  doffset := word[pbase][index + 1]   ' Get method's stack variable space size
  word[taskptr][0] := pbase           ' Save pbase in method pointer
  word[taskptr][1] := vbase           ' Save vbase in method pointer
  word[taskptr][2] := stackptr        ' Save dbase
  word[taskptr][3] := pcurr           ' Save method's starting address in method pointer
  word[taskptr][4] := stackptr + doffset + 4 ' Save initial dcurr (stack pointer past the stack variable space)
  long[stackptr] := 0

pub SwapTask(taskptr1, taskptr2)
{{
  This routine performs a context switch from one task to another task.  It saves the return state information
  in taskptr1 and writes the state information in taskptr2 to the Spin interpreter's state variables.
  It currently uses location $7ffc to store the value of taskptr2 so that it will not need to depend on the
  values of pbase, dbase or vbase to access the pointer.  It also calls the WriteOps routine, which overwrites
  the opcode for outb with the values for the Spin state variables.  A dummy call to SwapTask must be done
  to run WriteOps the first time.  Subsequent calls to SwapTask will set pcurr, and WriteOps will not be called.
  The use of location $7ffc and WriteOps can be eliminated by rewriting this routine in Spin bytecodes.
}}
  ' Save the return information in taskptr1
  word[taskptr1][0] := word[@result][-4] & $fffc ' pbase
  word[taskptr1][1] := word[@result][-3]         ' vbase
  word[taskptr1][2] := word[@result][-2]         ' dbase
  word[taskptr1][3] := word[@result][-1]         ' pcurr
  word[taskptr1][4] := @result - 8               ' dcurr
  
  ' Copy the information in taskptr2 to the Spin state variables
  long[$7ffc] := taskptr2        ' Save the task pointer at $7ffc so we don't depend on pbase, dbase or vbase
  outb := word[long[$7ffc]][0]   ' Set pbase
  outb := word[long[$7ffc]][1]   ' Set vbase
  outb := word[long[$7ffc]][2]   ' Set dbase
  outb := word[long[$7ffc]][4]   ' Set dcurr - causes a switch to the new stack
  outb := word[long[$7ffc]][3]   ' Set pcurr - causes a jump to the task entry point
  WriteOps

pub WriteOps | addr
{{
  This routine writes the register address for pbase, vbase, dbase, dcurr and pcurr into the
  code of the calling routine.  It requires that these register addresses are used in the
  calling routine at the precise offsets that are used in this routine.
}}
  addr := word[@result][-1]  ' Get return address
  byte[addr-39] := $ab       ' $1eb - pbase
  byte[addr-31] := $ac       ' $1ec - vbase
  byte[addr-22] := $ad       ' $1ed - dbase
  byte[addr-13] := $af       ' $1ef - dcurr
  byte[addr-4]  := $ae       ' $1ee - pcurr


  
{{
+------------------------------------------------------------------------------------------------------------------------------+
|                                                   TERMS OF USE: MIT License                                                  |
+------------------------------------------------------------------------------------------------------------------------------+
|Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation    |
|files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy,    |
|modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software|
|is furnished to do so, subject to the following conditions:                                                                   |
|                                                                                                                              |
|The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.|
|                                                                                                                              |
|THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE          |
|WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR         |
|COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,   |
|ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.                         |
+------------------------------------------------------------------------------------------------------------------------------+
}}
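The address arithmetic in SetTaskPtr is the part that matters for overlays, so here it is pulled out as a Python model that can be run on a PC. It is a sketch under stated assumptions: the table layout is only what the Spin code above reads (word pairs indexed by method or object number), and the fake hub image and helper names are invented.

```python
# Sketch: the address arithmetic of SetTaskPtr on a fake hub image.
# ASSUMPTIONS: the table layout is only what the Spin code above reads
# (word pairs indexed by method or object number); the sample image and
# helper names are invented.
import struct


def read_word(hub: bytearray, base: int, index: int) -> int:
    """Equivalent of word[base][index] in Spin (little-endian)."""
    return struct.unpack_from("<H", hub, base + 2 * index)[0]


def write_word(hub: bytearray, base: int, index: int, value: int) -> None:
    """Equivalent of word[base][index] := value."""
    struct.pack_into("<H", hub, base + 2 * index, value & 0xFFFF)


def set_task_ptr(hub, caller_pbase, caller_vbase, stackptr, objnum, methnum):
    """Return (pbase, vbase, dbase, pcurr, dcurr) for the target method."""
    pbase, vbase = caller_pbase, caller_vbase
    if objnum:                                   # target is in a child object
        index = objnum << 1
        vbase += read_word(hub, pbase, index + 1)   # object's var offset
        pbase += read_word(hub, pbase, index)       # object's code offset
    index = methnum << 1
    pcurr = read_word(hub, pbase, index) + pbase    # absolute entry point
    doffset = read_word(hub, pbase, index + 1)      # stack variable space
    return pbase, vbase, stackptr, pcurr, stackptr + doffset + 4


if __name__ == "__main__":
    hub = bytearray(0x200)
    write_word(hub, 0x10, 4, 0x40)   # method 2: code offset from pbase
    write_word(hub, 0x10, 5, 8)      # method 2: stack variable space
    print([hex(v) for v in set_task_ptr(hub, 0x10, 0x100, 0x180, 0, 2)])
```

Everything is computed as a base plus an offset read from the table, so the same routine should work wherever the object happens to sit in hub RAM.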

Dave Hein · 2014-01-23 18:32

I had forgotten about the Spin task swapper. It's probably a bit more than is needed to do overlays. The task swapper requires a stack for each task. An overlay swapper could be implemented with a single stack. However, the basic idea of manipulating the Spin state variables is the same for both the task swapper and the overlay swapper.

Dr_Acula · 2014-01-23 19:01

Hi Dave. Thanks for the info - good to hear this is on the right track. I might need a bit of guidance on stacks.
Looking at your code, each task is calling objects like the serial object. Normal spin has limitations too such as objects not being able to reference back to other objects declared in the main routine. But there are other solutions - eg an overlay could replicate all the spin code for running the serial driver, and just pass it the location in hub and a new serial driver can be incrementing head, tail and transmitting and receiving data.
So - for simplicity, each overlay consists of spin code, and if any objects are needed within that spin code, they are included in the overlay object, even if that duplicates code already in the main object (and assuming no conflicts like trying to talk to the same hub locations). So things are as simple as possible - the overlay is a self contained piece of code.
I'm still not sure about jumps in spin - whether they are relative or absolute. If they are relative, then code is easier to relocate to different memory locations. But I suspect jumps are absolute, so if they are, I'm thinking of ways of compiling code.
If an overlay is self contained, could one compile it to always run at a fixed location in ram? Say it runs at location 20,000 in hub. DAT code seems to end up at the start of a program, so could one pad out the beginning of a spin program with a DAT section consisting of 20,000 NOPs?
Hmm - that won't work. If the overlay has any DAT code it will end up at the beginning as well, with an unknown length.
Maybe instead of the overlay sitting in high ram, it would be easier to relocate to low ram? Then it becomes like a standard Spin program.
So we kick off the program with a bootloader that has a jump to a PUB Main, and we put in a DAT with 20,000 NOPs and the compiler then reorders this so the DAT is at the beginning and then jumps to the PUB which is now running from higher memory. Now we load a precompiled overlay into lower memory. Maybe we need to save the object list first and then reload that back in, as the object list sits at the beginning of the program.
Hmm- tricky.

Dave Hein · 2014-01-23 19:38

Jumps in Spin are relative. Variables and code positions are all relative to PBASE, VBASE and DBASE. This makes it very easy to relocate Spin objects.

Dr_Acula · 2014-01-23 20:02

Ah, that simplifies things. I'll think about this some more.

David Betz · 2014-01-23 20:03

Dave Hein wrote: »

Jumps in Spin are relative. Variables and code positions are all relative to PBASE, VBASE and DBASE. This makes it very easy to relocate Spin objects.

I'm not sure what you mean by "Spin objects". You're not talking about the things defined in Spin OBJ statements I guess. I assume you're referring to complete compiled Spin programs in object format. If not, how does one Spin object invoke a method in another?

Dr_Acula · 2014-01-24 03:09

Jumps in Spin are relative

Ok, the challenge is to try to hack the proptool in the simplest way. Let's check that.

Program 1

and now add a little DAT section
Program 2

We know that the DAT section gets moved so it ends up before the program. So this means the program starts at 0x18. Then we can infer that the PUB compiles to 36 65 04 7C 32 00
The "repeat" presumably compiles to a hidden GoTo. So then we can look at this PUB code and confirm that yes, indeed, it is the same code but in a different location. So in fact, the "repeat" is compiling to a "GoTo PC +/- a certain number"
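The byte sequence can be checked with a few lines of Python. This is a sketch built on assumptions: it treats $04 as an unconditional jump and $7C as a 7-bit signed offset counted from the byte after the operand. That reading fits the bytes seen here but is not taken from official documentation, and the second load address is arbitrary.

```python
# Sketch: decode the jump in "36 65 04 7C 32 00" as a relative branch.
# ASSUMPTIONS: $04 is taken to be an unconditional jump and $7C a 7-bit
# signed offset measured from the byte after the operand; this matches the
# observed bytes but is not taken from official documentation.

CODE = bytes([0x36, 0x65, 0x04, 0x7C, 0x32, 0x00])
JUMP_OPCODE = 0x04


def decode_short_offset(operand: int) -> int:
    """Sign-extend a 7-bit offset (bit 6 is the sign bit)."""
    if operand & 0x80:
        raise ValueError("long offsets are not handled in this sketch")
    return operand - 0x80 if operand & 0x40 else operand


def jump_target(code: bytes, load_address: int) -> int:
    """Absolute target of the first jump when code sits at load_address."""
    position = code.index(JUMP_OPCODE)
    offset = decode_short_offset(code[position + 1])
    next_pc = load_address + position + 2   # address after opcode + operand
    return next_pc + offset


if __name__ == "__main__":
    for base in (0x18, 0x5000):
        print(hex(base), "->", hex(jump_target(CODE, base)))
```

Under that reading the jump lands back on the first byte of the PUB wherever the code is loaded, which is what you would expect from a bare `repeat`.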

The code at the beginning of programs is complicated, so let's leave the complicated stuff at the beginning and jump to a simple routine at the top of ram. So we can maybe compile a program which might be

DAT 31000 dummy bytes
PUB main
  repeat
    load overlay1
    jump to location 18H 
    jump back here, possibly with some self modifying code in the loaded overlay
    unload overlay1
    load overlay2 etc

I'm not entirely sure about stacks and the like, but the general idea is to have the main routine kept very simple and running in high memory. The code that actually loads and unloads overlays also needs to be in high memory - e.g. an SD driver, or an SRAM or Flash driver. Probably whichever is the smallest code. I'm kind of drawn to SPI SRAM with 4 data lines.

I'm not sure of the details, but I think there must be a solution where you have a self-contained spin program where the main has been relocated to near the top of ram, then you run it, and then trick the program by moving different code into lower ram, running that, then unloading it and moving the original code back. If the stack etc. are preserved, the program never need know that a new bit of code was moved in and out of lower ram.
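
The load, run, unload cycle described above can be modelled on a PC. The Python sketch below treats hub RAM as a 32 KB byte array and only demonstrates the bookkeeping: it saves whatever is in the low overlay window, copies an overlay in, and later puts the original bytes back. The names `OVERLAY_BASE`, `OVERLAY_LIMIT`, `swap_in` and `swap_out` are hypothetical, the window boundaries are made-up values, and the Spin interpreter state (stack, PBASE and friends) is not modelled at all.

```python
HUB_SIZE = 32 * 1024      # Propeller 1 hub RAM
OVERLAY_BASE = 0x0018     # where the code started in the earlier test
OVERLAY_LIMIT = 0x5000    # hypothetical start of the resident code


def swap_in(hub: bytearray, image: bytes) -> bytes:
    """Copy an overlay into low RAM and return the bytes it replaced."""
    end = OVERLAY_BASE + len(image)
    if end > OVERLAY_LIMIT:
        raise ValueError("overlay does not fit below the resident code")
    saved = bytes(hub[OVERLAY_BASE:end])
    hub[OVERLAY_BASE:end] = image
    return saved


def swap_out(hub: bytearray, saved: bytes) -> None:
    """Put the previously saved bytes back into low RAM."""
    hub[OVERLAY_BASE:OVERLAY_BASE + len(saved)] = saved


hub = bytearray(HUB_SIZE)
# Mark the resident region so any accidental overwrite would show up.
hub[OVERLAY_LIMIT:] = b"\xAA" * (HUB_SIZE - OVERLAY_LIMIT)
original = bytes(hub)

saved = swap_in(hub, bytes.fromhex("36 65 04 7C 32 00"))
resident_intact = hub[OVERLAY_LIMIT:] == original[OVERLAY_LIMIT:]
swap_out(hub, saved)
restored = bytes(hub) == original
print(resident_intact, restored)
```

On real hardware the copy would come from SD, SRAM or Flash rather than a Python bytes object, and the open question of what happens to the stack still applies.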

Cluso99 · 2014-01-24 04:52

Drac,
You might be better off using homespun or bst and look at the listing. There you can see where code is being located and what comes first. IIRC bst is capable of reserving space at the bottom of hub before any spin objects. I don't recall the command, but it was so we could access the lower hub ram using rd/wrbyte/word/long and the immediate hub value mode.

Dave Hein · 2014-01-24 04:57

David Betz wrote: »

I'm not sure what you mean by "Spin objects". You're not talking about the things defined in Spin OBJ statements, I guess. I assume you're referring to complete compiled Spin programs in object format. If not, how does one Spin object invoke a method in another?

Yes, I'm referring to a Spin program. I think the easiest way to implement overlays is to have a main program that stays resident in memory and overlays that are complete Spin programs. The resident program will load an overlay program and call the main method in the overlay. The easiest way to do this is to use the same number of parameters for the main method in each overlay. Also, instead of directly manipulating the Spin state variables, a dummy object can be defined in the method table. Ideally, this would be the last object in the table. I'll see if I can put together a simple example.
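
If overlays are complete compiled Spin programs, a loader would probably want to look at the header of each image first. The Python sketch below assumes the commonly described 16-byte layout at the start of a compiled binary image (clock frequency, clock mode, checksum, then PBASE, VBASE, DBASE, PCURR and DCURR as 16-bit little-endian words). The `SpinHeader` and `parse_header` names are hypothetical, and the sample values are invented purely so the example runs; they are not taken from any program in this thread.

```python
import struct
from typing import NamedTuple


class SpinHeader(NamedTuple):
    clkfreq: int   # clock frequency in Hz
    clkmode: int   # clock mode byte
    checksum: int  # checksum byte
    pbase: int     # start of the top object
    vbase: int     # start of VAR space
    dbase: int     # start of the stack
    pcurr: int     # initial program counter
    dcurr: int     # initial stack pointer


# Assumed layout: long, byte, byte, then five words, little-endian, unpadded.
HEADER = struct.Struct("<IBBHHHHH")


def parse_header(image: bytes) -> SpinHeader:
    """Decode the first 16 bytes of a compiled image."""
    if len(image) < HEADER.size:
        raise ValueError("image is shorter than the 16-byte header")
    return SpinHeader(*HEADER.unpack_from(image))


# Invented sample header, only for demonstration.
sample = HEADER.pack(80_000_000, 0x6F, 0, 0x0010, 0x0020, 0x0028, 0x0018, 0x002C)
header = parse_header(sample)
print(hex(header.pbase), hex(header.vbase), hex(header.dbase))
```

Because code and variables are addressed relative to these base values, a resident loader could in principle adjust them for wherever the overlay ends up, though the details of starting the overlay's main method are not covered here.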

Dr_Acula · 2014-01-24 14:28

@Cluso, yes those other programs will be helpful. Ideally I'd like to see some sort of structure that works on any of the compilers.

Dave, yes I am thinking along the same lines. Overlays are complete spin programs. Same number of parameters so a common skeleton structure.
I'm wondering about something like

PUB start
   main ' gives us a jump to the stay resident code at the top of memory
PUB overlay
   routine1
   routine2
   routine3
   ... routine maximum??
PUB dummy code
   16k to 30k of dummy code here - dat won't work because that will be compiled in front of PUB start.
PUB main
   start serial
   start display
   start SD card
   repeat
     load overlay1
     overlay   ' this jumps back to PUB overlay at the beginning of the program, but it could be a different overlay
     load overlay2
     overlay

If that would work, need to also think about DAT and VAR and OBJ - do they need to be the same structure for the overlays?

Heater. · 2014-01-24 14:54

Surely what you are describing now, a complete spin program invoked via its main method, is not an "overlay" mechanism but a "program loader"?
Sounds like you are building an operating system.

Dr_Acula · 2014-01-24 19:05

Good point, Heater. Kyedos can reload a complete spin program but it is a bit slow. Maybe it is an operating system - I guess you could argue the definitions. I think the main thing is flexibility - the ability to load an overlay that could be a 20k compiled spin program, or it could be just one small PUB. If it works it could do both of these.

Duane Degn · 2014-01-24 20:21

I have several projects where I'm running out of RAM on the Propeller. One of these projects is my robot remote.

I used to take the Kyedos route and load a second program when certain menus were requested, but I've recently figured out how to move all the touchscreen menu data onto the SD card, which has greatly reduced the size of the program. I no longer need to break it up into two programs. I fear my consolidation down to a single program may be short-lived. I can store information about a menu's layout and contents on the SD card, but there are times when there will be something unique about a menu or display that doesn't lend itself to being stored externally and requires a unique method. These unique methods continue to accumulate, threatening my single program arrangement.

It would sure be nice if these unique methods could be stored externally and used as a sort of "overlay".

I'll continue to watch these discussions with interest.
