Post Boot PASM COG Loader?

jazzed · 2009-01-14 19:58

Has anyone created a boot loader that would not include the PASM until after startup?

The PASM code would have well-defined interfaces and be compiled independently
of Spin. Spin would have small API wrappers, as are typically used already for
the PASM interface. The difference would be not having to carry it all in the binary.

It seems that such an approach would let most of the 32K Hub memory be
dedicated to Spin only, except for maybe the SD card code. Carrying all that
32-bit-per-instruction PASM code in the binary seems quite wasteful.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

heater · 2009-01-14 20:53

Funny you should ask that. I was just pondering ways of reclaiming the 2K byte space taken up by the PASM 8080 emulator code. Currently this is loaded once from the binary, and hence from the RAM, and thereafter constitutes a huge waste of HUB RAM given that I never want to stop and restart it. Same goes for all the other bits of PASM.

Now I can fiddle with my code and arrange for the PASM emulator code to sit where the emulated 8080 expects its RAM to be when it runs, thus reclaiming the space. And I probably will one day, but it messes up the code structure and makes it harder to maintain.

Some way of keeping the PASM part of the binary on SD card or other EEPROM would be great. But how to compile it? The spin tool needs some kind of "memory segments" idea so that you can tell it where to put things and still have them linked nicely against the SPIN parts.

In the worst case we have 8 * 2 = 16K of our HUB RAM wasted in "dead" PASM code. 50% !!!
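To put numbers on that worst case, here is a quick back-of-envelope check in Python (run on the PC, not on the Prop). It assumes every one of the 8 cogs is started from a full 2K image, which real drivers rarely reach, so treat it as an upper bound.

```python
# Back-of-envelope check of the worst case, using the thread's round
# figure of 2K bytes (512 longs) per cog image.
HUB_RAM_BYTES = 32 * 1024
COG_IMAGE_BYTES = 512 * 4
COGS = 8

def dead_pasm_bytes(image_sizes):
    """Total hub RAM still held by PASM images after their cogs have started."""
    return sum(image_sizes)

worst_case = dead_pasm_bytes([COG_IMAGE_BYTES] * COGS)
fraction = worst_case / HUB_RAM_BYTES
print(f"{worst_case} bytes of dead PASM, {fraction:.0%} of hub RAM")
```

Passing the actual sizes of your own drivers to dead_pasm_bytes gives a more realistic figure for a given project.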

Ideally the boot EEPROM would have been twice as big (is/was there such a thing?) and the SPIN tool would have put the PASM parts in the extra space when told to.

Perhaps mpark or BradC could add a "segment" feature like this to their already wonderful compilers to allow us to pull out the PASM blobs into SD card or wherever we like.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Carl Jacobs · 2009-01-14 21:35

I've put a fair bit of thought into this, but not yet found the time to write the specific loader code.

I agree with heater that possibly the best place to put the PASM parts is in the top 32K of a 64K EEPROM (you could even use the top section of the bottom 32K).

Consideration needs to be given to how the PASM code is written. All parameters passed to the assembly COG need to be loaded via the PAR register, otherwise the code relocation becomes more of a challenge. The following example is bad for relocatable code:

PUB Start(baudrate)
  ...
  aBaud := CLKFREQ / baudrate
  ...

DAT
  ...
  aBaud    long  0
  ...

The following is much better:

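VAR
  LONG baud_rate                      ' hub variable; its address is handed to the cog

PUB Start(baudrate)
  ...
  baud_rate := CLKFREQ / baudrate
  cognew(@entry, @baud_rate)          ' PAR will hold @baud_rate
  ...

DAT
  ...
entry      rdlong aBaud, PAR          ' fetch the value from hub RAM at startup
  ...
  aBaud    res  1                     ' reserve one cog long (res 0 would reserve nothing)
  ...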

All of my recently written code has been written in this way for when I eventually write a PASM loader/linker/mapper tool. My recently submitted serial port object discussed in another post follows these principles for this reason. In that object I was trying to save even *more* RAM by having the FIFO buffers in the COG.

If the 64K of EEPROM is divided into 2K segments, then there are a total of 32 segments, of which probably at most 7 will be used in the typical large application. This still represents a saving of up to 14K of RAM, although in reality none of the PASM COGs are ever all that full, so the real saving will be more like half of that.

The whole idea of having segments is that it is probably easier to manage the linker if you know that you'll be using segments 0 to 6, which then get hardcoded into the spin file. It would probably be worth including a simple checksum as part of the PASM image to prevent the loading and execution of non-existent code.
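The segment-plus-checksum idea can be sketched on the PC side in Python. Everything specific here is an assumption made for illustration: the checksum lives in the last byte of the segment, and it is chosen so that all bytes of the segment sum to a non-zero magic value (so that a blank all-$00 or erased all-$FF segment is rejected). The names make_segment, segment_is_valid and MAGIC are hypothetical.

```python
SEGMENT_BYTES = 2 * 1024
EEPROM_BYTES = 64 * 1024
SEGMENTS = EEPROM_BYTES // SEGMENT_BYTES     # 32 segments of 2K each
MAGIC = 0xA5                                 # assumed target for the byte sum

def make_segment(pasm: bytes) -> bytes:
    """Pad a PASM image to one segment and append a one-byte checksum."""
    if len(pasm) > SEGMENT_BYTES - 1:
        raise ValueError("PASM image too large for one segment")
    body = pasm.ljust(SEGMENT_BYTES - 1, b"\x00")
    checksum = (MAGIC - sum(body)) & 0xFF    # makes the total sum equal MAGIC mod 256
    return body + bytes([checksum])

def segment_is_valid(segment: bytes) -> bool:
    """True only if the segment has the right size and its bytes sum to MAGIC."""
    return len(segment) == SEGMENT_BYTES and (sum(segment) & 0xFF) == MAGIC

segment = make_segment(b"\x01\x02\x03")
print(SEGMENTS, segment_is_valid(segment), segment_is_valid(bytes(SEGMENT_BYTES)))
```

A loader on the Prop would run the same sum over the bytes it has just read and refuse to start the cog if the check fails.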

Regards,

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Carl Jacobs

JDForth - Forth to Spin Compiler http://www.jacobsdesign.com.au/software/jdforth/jdforth.php
Includes: FAT16 support for SD cards. Bit-bash Serial at 2M baud. 32-bit floating point maths. Fib(28) in 0.86 seconds. ~3x faster than spin, ~40% larger than spin.

jazzed · 2009-01-14 21:36

Ya, I figured I wasn't the only one to consider this :)

Compile the file the ordinary way and extract the PASM. After that it's simple. I've done this often with C.
Load the binary into a Propeller buffer and cognew with the address and the user parameter buffer.
I guess one needs a generic 2K byte buffer to get started; reuse space from the load device DAT section ....
Some way to check the interfaces would be nice, but we have to do that by hand today anyway.
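As a rough illustration of the extract step, here is a host-side Python sketch. It does not parse the compiled binary: the byte offset and length of the PASM block are assumed to be known already (for example from a compiler listing), and the function names are hypothetical.

```python
def extract_blob(binary: bytes, offset: int, length_longs: int) -> bytes:
    """Slice a PASM image out of a compiled binary.

    offset and length_longs are assumptions supplied by the caller;
    this sketch makes no attempt to locate the DAT section itself.
    """
    end = offset + 4 * length_longs
    if offset < 0 or offset % 4 or end > len(binary):
        raise ValueError("blob does not fit inside the binary")
    return binary[offset:end]

def to_c_array(name: str, blob: bytes) -> str:
    """Render the blob as a C array of 32-bit little-endian longs."""
    longs = [int.from_bytes(blob[i:i + 4], "little") for i in range(0, len(blob), 4)]
    body = ",\n".join(f"    0x{value:08X}" for value in longs)
    return f"const unsigned long {name}[{len(longs)}] = {{\n{body}\n}};\n"

fake_binary = bytes(range(16))               # stand-in for a real .binary file
blob = extract_blob(fake_binary, 4, 2)       # two longs starting at byte 4
print(to_c_array("pasm_blob", blob))
```

Writing blob straight to a file instead of calling to_c_array gives a raw image that a loader could copy into a hub buffer before the cognew.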

I was thinking that just about any function that uses local data could be made in PASM and called by Spin.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Post Edited (jazzed) : 1/14/2009 9:45:57 PM GMT

Phil Pilgrim (PhiPi) · 2009-01-14 21:57

You can always reuse the hub's DAT space for data, buffers, stacks, etc., once the ASM code has been "encogged". For example, a serial object's FIFO buffer could easily share the hub RAM occupied by its assembly code. Basically, you just pass the hub address of the code itself via the PAR register when issuing a COGNEW, and let the assembly startup routine initialize it as a buffer area. Of course, this works only if the driver is started once, from a single object.

If this is not the case, the driver could act as its own bootloader, reloading itself directly out of EEPROM. For this method, you'd pass the address of the long just before the code. This long is initialized to zero and points to the last assigned buffer space. In the assembly initialization code, if it reads a zero, it assigns it a value of PAR + 4 * BootloaderLongs + 4. Otherwise, it just increments it by the size of the buffer(s). It can always find itself in EEPROM at PAR + $8004 and begin loading from PAR + $8004 + 4 * BootloaderLongs.

-Phil

heater · 2009-01-14 22:19

Phil, I know what you mean, but doesn't that start to make spaghetti out of your code? Objects that need RAM space then need to know where other, totally unrelated objects' PASM may be. Mixing and matching objects becomes a mess.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2009-01-14 22:29

Phil, the approaches you describe are fine (and I've used the first), but neither one buys more code space for Spin interpretable code. Edit: actually, your second approach would buy Spin space if the driver for the load target can be fully enclosed in one generic PASM loader so that all drivers or other support code could be loaded to different cogs. Then you would have the luxury of loading 7 cogs and could reclaim the original loader area for other purposes if you don't need it. Sweet :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Post Edited (jazzed) : 1/14/2009 10:40:21 PM GMT

heater · 2009-01-14 22:42

Hmm, so what we want is:

a) A bootloader that is the only code that ever gets blown into EEPROM.
b) The bootloader is probably written in PASM and loads PASM blobs to cogs and SPIN code to HUB.
c) The loader can be customized to use something other than SD card.
d) PASM blobs have no linkage to SPIN VAR and DAT variables except via a pointer passed in PAR.
e) The bootloader should be able to fetch new SPIN/blob packages from a PC to put into SD card as per the Prop loader.

Somehow in b) the loader needs to be able to load its own COG with new code as a last step.

Somehow the COGs must not start their processing until all COGs and SPIN have been loaded. Perhaps they could wait on their PAR registers becoming non-zero.

This requires that PASM blobs are written with loader awareness i.e. only PAR linkage and start up procedure.
And that the SPIN parts provide the other end of that loader awareness to set PAR when the COGs should RUN.
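The "wait until told to run" handshake can be modelled in a few lines of Python. This is only a simulation of the idea, not Prop code: the hub is a dictionary, each cog is a generator, and the mailbox addresses are made up. Since PAR itself is fixed when the cog is started, the model polls the hub long that PAR points to.

```python
def cog(hub, par):
    """Model of a loaded cog: idle until the hub long at address par is non-zero."""
    while hub[par] == 0:
        yield "waiting"
    yield "running"

hub = {0x7000: 0, 0x7004: 0}             # hypothetical mailbox longs, one per cog
cogs = [cog(hub, address) for address in hub]

print([next(c) for c in cogs])           # loader still busy: every cog waits
for address in hub:                      # the SPIN side releases the cogs
    hub[address] = 1
print([next(c) for c in cogs])           # now every cog runs
```

On the real chip the polling loop would be a short read-and-test loop at the top of each PASM blob, with the Spin object writing the go value once everything is loaded.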

Now all we need is a compiler that will spit out separate PASM binary blobs and SPIN for the loader to fetch from the PC.

Anything else ?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2009-01-14 22:56

Items b - e make sense. I was thinking a small snippet of Spin could manage the loading to simplify
things, in less than 20 lines + data.

I don't think we have to go as far as a) to have access to more executable spin byte space.
The bootloader only needs to be in one pasm section.
One loader variant per project, specified at compile time (EEloader, SDloader, SerialLoader, etc.).
Bootloader does not have to send spin to HUB and start it, though that is an interesting feature that could boot "anything".
We don't need a compiler, I have a tool that automatically extracts PASM. I'll find it ... it's posted here somewhere.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Carl Jacobs · 2009-01-14 22:59

heater, a couple of comments:

Extraction of PASM images from EEPROM (as opposed to SD) is probably going to be a sufficient start. Not that SD is hard, but it does require extra pin configurations. EEPROM access is pretty much stock-standard.

The only way to load PAR is by using cognew - so I guess what you're saying is wait for the address pointed to by PAR to become non-zero.

My normal method for a synchronised start up is exactly that, i.e. start when the time is ???, implemented quite simply with WaitCnt. I'm sure that most loader processing is going to take less than 53 seconds.
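The 53 second figure falls out of the 32-bit system counter. Here is the arithmetic as a small Python sketch, assuming the common 80 MHz system clock (the clock speed is not stated in the post) and using hypothetical helper names.

```python
CNT_BITS = 32                            # the system counter is 32 bits wide
CNT_MASK = (1 << CNT_BITS) - 1

def cnt_wrap_seconds(clkfreq_hz: int) -> float:
    """Seconds until the free-running counter wraps at the given clock."""
    return (1 << CNT_BITS) / clkfreq_hz

def start_tick(now: int, delay_ticks: int) -> int:
    """Counter value to wait for, with 32-bit wraparound applied."""
    return (now + delay_ticks) & CNT_MASK

print(round(cnt_wrap_seconds(80_000_000), 2))    # 53.69
print(hex(start_tick(0xFFFFFF00, 0x320)))        # 0x220, target lies past the wrap
```

Any common start time chosen less than one wrap ahead is unambiguous, which is why the loader only has to finish inside that window.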

Regards,

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Carl Jacobs

Mike Green · 2009-01-14 23:04

The Propeller OS that I wrote a while ago was designed just this way. The "main" program included all the I/O drivers and the command interpreter. If the I/O drivers were not running, the program loaded and initialized them. "User" programs included only the Spin API for the drivers (keyboard, either TV or VGA display, and I2C driver). The drivers and the Spin programs communicated through common areas in the high end of hub ram. The low level SPI driver was added to the I2C driver later. It should be possible for the "main" program to include Rokicki's SD card routines that would be used to look up (and possibly create) some specific files and stash their absolute SD card addresses in a work area so that the low level routines could access them with minimal memory impact.

heater · 2009-01-14 23:08

Well, what I'm imagining is that we basically replace the Prop's native boot loader with a new one that fetches a file containing concatenated 8 * 2K PASM blobs and 1 * 32K SPIN blob from your PC. This file is the output of the compilation process by whatever means. It can use some nice high speed serial protocol to do this. This downloaded file is placed in SD card. The loader can be customized to use different storage, baud rates, serial protocol, clock settings etc. etc.

Now if there is no file to be downloaded from the PC it starts to fetch the file off the SD card, loading everything into the right place and then starting the main SPIN method. Not sure how the SPIN objects then tell their COGs they can start now.
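One possible layout for such a file, sketched in Python for the PC side. The fixed-slot format (8 zero-padded 2K blob slots followed by one zero-padded 32K Spin image, with no header) is an assumption made for illustration, as are the function names.

```python
BLOB_BYTES = 2 * 1024                    # one cog image slot
SPIN_BYTES = 32 * 1024                   # one full hub image
COGS = 8
PACKAGE_BYTES = COGS * BLOB_BYTES + SPIN_BYTES   # 49152 bytes in total

def pack(pasm_blobs, spin_image):
    """Concatenate up to 8 PASM blobs (zero-padded to 2K each) and the Spin image."""
    if len(pasm_blobs) > COGS:
        raise ValueError("at most 8 PASM blobs")
    if any(len(blob) > BLOB_BYTES for blob in pasm_blobs):
        raise ValueError("PASM blob larger than 2K")
    if len(spin_image) > SPIN_BYTES:
        raise ValueError("Spin image larger than 32K")
    slots = [blob.ljust(BLOB_BYTES, b"\x00") for blob in pasm_blobs]
    slots += [bytes(BLOB_BYTES)] * (COGS - len(slots))   # unused slots stay zero
    return b"".join(slots) + spin_image.ljust(SPIN_BYTES, b"\x00")

def unpack(package):
    """Split a package back into its 8 blob slots and the Spin image."""
    if len(package) != PACKAGE_BYTES:
        raise ValueError("unexpected package size")
    blobs = [package[i * BLOB_BYTES:(i + 1) * BLOB_BYTES] for i in range(COGS)]
    return blobs, package[COGS * BLOB_BYTES:]

package = pack([b"\x01" * 8, b"\x02" * 4], b"\x03" * 100)
print(len(package))
```

Fixed-size slots keep the loader trivial (slot n starts at n * 2K) at the cost of some wasted space on the card.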

Problem is that all those nice OBEX objects would have to be modified to work with this scheme....

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2009-01-14 23:20

Mike, can you post a link to your work so it can be evaluated?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Kye · 2009-01-14 23:22

You guys should be careful not to overcomplicate things. Rewriting clever code is good... but the track you guys plan on getting on will defeat the purpose of using the wonderful propeller chip.

If you think you're running out of ram then trimming your code of all inessential elements will help, not hurt.

It's surprising how much space is wasted in some code...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

heater · 2009-01-14 23:29

OK, my solution may be a bit over the top and complicated. It would be of use in some extreme cases where one has a lot of SPIN code, nearly 32K, and little data. In which case the EEPROM would then be stuffed with PASM code and no place for the SPIN you want.

In a case like my emulator I really have as little SPIN as possible; I want lots of free RAM for the emulator's 8080 RAM space.

As it stands with the Prop tool I have a module/object containing the emulator PASM which is wasting space. I have another object/module that defines the RAM content for the programs the emulator is to run and free RAM space. Basically the emulated CPU's memory map.

To reclaim the PASM space I would have to put the emulator PASM into the memory module somewhere. Then that space would be reused when the emulator runs. This is horribly messy.

Now. It occurs to me that if I could extract the emulator PASM from the Prop tool's binary I could then put that binary into a file. And then in a separate simulator build I could place a "file" statement into the memory map object to include my emulator PASM code. Bingo, I have reused the memory in a very neat way. I only have to get the PASM blob's address out of the memory module and use it to start the emulator COG.

So whoever has the PASM extractor please let me know how to do this.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

heater · 2009-01-14 23:34

Kye, I agree with you. I like simple and elegant. And I'm not into rewriting a lot of existing stuff that works.

But in normal use of the Propeller tool it is wasting space with every piece of PASM code you have. Unless you take steps to place your PASM code in places that will later be used as variable/buffer storage.

The Prop tool sadly does not support such placement of code or overlaying of code/data areas.

So we have to come up with some tricks or end up with spaghetti code.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Kye · 2009-01-15 00:02

Hmm, maybe it would be possible to use a 64k eeprom with a 512 long buffer in memory and an I2C object to sequentially grab pasm code. Put it into the buffer, load it up, and then repeat for the other cogs.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Nyamekye,

Cluso99 · 2009-01-15 00:55

It's all doable, and not that complex.

Need to find the best method to load from anywhere and create a stub that exists in hub. Just on the way out, so will comment later.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Prop Tools under Development or Completed (Index)
http://forums.parallax.com/showthread.php?p=753439

My cruising website http://www.bluemagic.biz

jazzed · 2009-01-15 01:34

Good or bad, almost all my projects need more memory :) My current Prop project has 380 longs left ...
barely enough for stack I think, and I cringe at the moment when I start hitting stack problems ... again.
It's hard to do anything truly useful with a 32K Byte limit. I only need two PASM drivers, so at least
12KB is wasted. Being able to use any extra on-chip space for instructions in any way at all helps.

@Heater, The program I mentioned produces a C array, but I'm sure you could easily make a PASM
binary with some source changes ... be my guest, just post back. Source is here:
http://forums.parallax.com/forums/default.aspx?f=25&m=287818&g=294044#m294044

@Mike, I looked and it looks like I'm forced to use your O/S to harvest extra memory. Not what I wanted.
I want a stand-alone solution. It was kind of you to bring it up though. Please explain if I misunderstand.
Thanks.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Cluso99 · 2009-01-15 04:02

Has anyone looked at the PropLoader and the code in the Prop that burns to EEPROM? What I am thinking is to just have the two compilers modified to allow compilation up to 64KB (as an option). If required, modify the loader so that the EEPROM is loaded with this 64K, instead of 32K as at present.

Now, the writers can place the pasm code above 32KB, and the compiler will accommodate this. There will be some restrictions in that any addresses referred to which are >32KB will be errors, so only pasm DAT would be able to be placed here (for now anyway).

So now all that is needed is a pasm stub that can load the pasm code from EEPROM above 32KB. Par will remain as always, below 32KB.

This method is simple and can be expanded later to load from other places, but let's get this going first.

What do we need...

1. Ask BradC (bst) and MPark (Homespun) to allow an option to compile above 32KB.

2. Check if the Prop Loader and burn code will burn 64KB, else create new object to do this.

3. Write stub code for pasm loading above 32KB - I can do this with zero cog footprint, so all cog memory available. For this to work, I will need about $30 longs of lower hub ram (below 2KB) and probably a 2KB buffer (or maybe much smaller) to load from EEPROM.

Any thoughts???

Post Edited (Cluso99) : 1/15/2009 8:14:07 AM GMT

heater · 2009-01-15 05:09

@Kye, just to reiterate briefly. The real problem is that, from the 32K EEPROM all the way up to the PropTool, we cannot get 32K + 8 * 2K = 48K of code into a Prop. This is one third of the chip's capability and really needs fixing or working around.

@jazzed, That tool sounds useful, I will check. It becomes all the more urgent for me when I move to using 4 COGs for emulation.

@Cluso99, Your proposal appears to be quite sound. I hope BradC and mpark see this :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2009-01-15 05:49

@Cluso99. It would be nice to have a fully integrated solution in the compiler. I hope there are no constraints other than compiler talent or time :)

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

heater · 2009-01-15 05:51

Looks like Cluso99 had exactly that in mind. Where are BradC and mpark?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Cluso99 · 2009-01-15 08:46

Ok, I have checked the Booter code...

The PropLoader can only load 32KB from the pc to hub ram. Anything less than 32KB will be zero filled. If eeprom is selected, it will burn the full 32KB from hub ram to eeprom.

So, the compiler will need to output a separate file for the code above 32KB and a special Loader will need to be used to load this from the pc to eeprom. There is nothing complex about this, just that it has to be done.
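The split itself is simple enough to sketch on the PC side in Python. This assumes the compiler can emit one flat image of up to 64KB, which is a proposal in this thread and not an existing feature; split_image is a hypothetical name.

```python
HALF = 0x8000                            # 32KB boundary between the two EEPROM halves

def split_image(image: bytes):
    """Split a flat image of up to 64KB into a zero-filled low half and a high part."""
    if len(image) > 2 * HALF:
        raise ValueError("image larger than 64KB")
    low = image[:HALF].ljust(HALF, b"\x00")   # zero filled up to the full 32KB
    high = image[HALF:]                       # PASM destined for the upper eeprom
    return low, high

low, high = split_image(b"\x11" * HALF + b"\xAA\xBB")
print(len(low), high.hex())
```

The low part can then go through the standard loader unchanged, and only the high part needs the special loader.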

I will have a think about the prop side code.

Michael and Brad - any comments?

BradC · 2009-01-15 09:01

Cluso99 said...

Michael and Brad - any comments?

Certainly none from me. Aside from some special code in the prop to load the second half of the eeprom I don't see anything that requires assistance from the compiler really.
From what I see you are effectively loading the top half of the eeprom with PASM that you will need to page in 512 longs at a time to load cogs. Is there something else I'm missing?

If you can come up with a concrete proposal and requirement I don't have any problems at all modifying the compiler to suit and working with the loader to do the job.

I already have the ability to load the second half of the eeprom with my USB HID bootloader (which requires the top 16 longs of ram, 2 cogs and 3 port pins), so it's not that hard to do. You could probably modify Chip's boot code to load ram / burn low / verify low / load ram / burn high / load high, and modifying the loader to do this would not be difficult at all.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cardinal Fang! Fetch the comfy chair.

heater · 2009-01-15 10:14

BradC,

Isn't it just that a compiler should be directed to place one or more or all of the PASM blocks from objects into the high end of the binary file, each in their own 512 long space, rather than have them lying around at random within the file?

Of course the NOT PASM code, i.e. actual DAT data, still needs to be in the low 32K as normal so that Spin can find it. How do we know which bits are code and which bits are data? Especially when writing LMM.

The user should somehow specify what goes where.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Carl Jacobs · 2009-01-15 10:41

heater said...
The user should somehow specify what goes where.

This was my thought on it which is why I suggested numbering the memory blocks. Since we can't use the bottom 32K it makes sense that 32K is block 0, and block n is 32K + 2K x n.
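The block numbering maps to EEPROM addresses like this, shown as a small Python sketch. The function name and the limit of 16 blocks (the number of 2K blocks that fit in the top 32K) are spelled out here for illustration.

```python
BLOCK_BASE = 0x8000                      # block 0 starts at 32K
BLOCK_BYTES = 0x800                      # 2K per block
BLOCKS = (0x10000 - BLOCK_BASE) // BLOCK_BYTES   # 16 blocks fit in the top 32K

def block_address(n: int) -> int:
    """EEPROM start address of block n, i.e. 32K + 2K x n."""
    if not 0 <= n < BLOCKS:
        raise ValueError("block number out of range")
    return BLOCK_BASE + BLOCK_BYTES * n

for n in (0, 1, 15):
    print(n, f"${block_address(n):04X}")
```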

The way I would see this working is that block 0 would always get a copy of (or a variation of) FullDuplexSerial. Every time I load my code I wouldn't bother with the top 32K - it's debugged code that just needs to be used.

Development of new PASM COGs would occur in the usual manner that we are all accustomed to, without the requirement for any special tools. All that's needed then is a file something along the lines of:

CONFIG.SPIN
CON
  FullDuplex = 0                ' block number of each driver in the upper EEPROM
  VGA = 1
  PS2 = 2

FullDuplexSerial.spin
OBJ
  cfg : "config.spin"
  ee : "eepromloader.spin"

VAR
  long  cog                     'cog flag/id
  long  rx_head                 '9 contiguous longs
  long  rx_tail
  long  tx_head

PUB Start(rx, tx, mode, baud) : okay
  ...
  okay := cog := ee.LoadCog(cfg#FullDuplex, @rx_head) + 1
  ...

Something like that would do me just fine. Debug and develop as normal, then burn to an upper page for production use.

Regards,

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Carl Jacobs

Cluso99 · 2009-01-15 12:22

Brad: The compiler needs a new option "HubObj $8000" to move the hub pointer up to the next 32KB. It also needs to compile code up to $FFFF (64KB). It would look as though the Prop now has 64KB of ram to the compiler. The programmer would be responsible for ensuring that only cog pasm code was placed above the 32KB limit.

I think this may only be a simple mod to the compiler.

There will need to be a new PropLoader which will load > 32KB programs, and of course there will need to be code in the prop, both for loading and storing in eeprom, and for loading cogs from the upper 32KB eeprom bank. I am happy to do the Prop code (download and write to eeprom and run loader to load from upper 32KB to cog). It will also require the zero footprint code I use in my debugger to load the cog code from the upper 32KB eeprom.

Is anyone interested in writing the new PropLoader?

I can write the spec if required. It would be preferable to be windoze/linux/mac compatible (I'm windoze only).

BradC · 2009-01-15 12:36

Cluso99 said...

Is anyone interested in writing the new PropLoader? I can write the spec if required. It would be preferable to be windoze/linux/mac compatible (I'm windoze only).

Umm hell yeah, I'll write it no problem at all. You give me a spec and prototype and I'll make sure it works across the platforms..

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cardinal Fang! Fetch the comfy chair.

Cluso99 · 2009-01-15 12:50

Brad - did you write the proploader for linux and mac?

Spec by PM. I'll get started my end asap.

Do you want to zero fill up to the 32KB ($7FFF) if your compiler gets the "HubObj $8000" option/line? Would make the loading program easier - I am not sure of the binary or hex formats.

Post Edited (Cluso99) : 1/15/2009 1:13:50 PM GMT

heater · 2009-01-15 13:25

Shouldn't we make that "huborg $8XXX" ?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
