WANTED: PASM > 32k

RossH · 2009-04-21 12:34

All,

I'm currently working on the ability for Catalina C programs to run from external RAM - I have a Hydra Xtreme 512K card that looks like it will be just the biscuit. I'm also still working on reducing the Catalina code size further, but I already have C programs written that I now realize will never fit within 32k.

It should be relatively simple. However ....

I'm hampered by the current 32k SPIN/PASM compiler limit. I can't see any particular reason for this limit, but all the current SPIN/PASM compilers seem to have it (I've tried bstc, homespun, parallax - anybody have another?). Obviously all these compilers will have to extend this limit when the Prop II arrives, but who knows when that will be - and I really don't want to wait. To progress without it, I'll have to resort to all kinds of nasty things like vector tables, demand paging and/or relocatable program segments. All doable - but all time consuming. And it'll all be redundant when the Prop II arrives anyway.

BradC has said that there is no particular reason for the 32k limit in his compiler, except that he didn't see much point in allowing programs to get larger than that - presumably since that's all the Propeller has onboard. I believe Bill Henning is working on an LMM PASM assembler, but I don't know if he is planning to extend the 32k limit, or whether his compiler will be fully compatible with existing SPIN/PASM.

So what about it BradC? Bill? mpark? Anyone else? What I need is a standard SPIN/PASM compiler that will allow the user to generate PASM program images larger than 32k - any format is fine. I'll do the rest (loading, memory management etc) as part of Catalina. If anyone knows of any language design or other limitations that mean PASM images > 32k can't be generated, then please let me know.

Thanks in advance,

Ross.

heater · 2009-04-21 12:42

Not saying it can't be done, but when I looked at the Hydra RAM card interface it seemed to require a lot of code and would be quite slow for random access. Still, it's a good start because it is readily available and easy to hook up. I hope Cluso could let you have a TriBlade to work on; it has a very lean interface for random access.

At the end of the day the Catalina kernel must be modifiable for multiple different external RAM implementations.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

virtuPIC · 2009-04-21 14:07

Uh-oh! You are asking for virtual memory. On the Prop I only know about cog RAM as first level memory and hub RAM as second level memory in LMM. You are asking for second or third level external memory.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Airspace V - international hangar flying!
www.airspace-v.com/ggadgets for tools & toys

Bill Henning · 2009-04-21 14:09

Hi Ross,

LASM will support 32 bits of address space

It would however be best to have separate code, data and stack segments, ideally with relocatable code/data. The layout I am considering is pretty standard:

Code: x1 longs of code
Data: x2 longs of data
Stack/heap: x3 longs of data

It would be possible to have a relocating loader, however I think it best if instead all code is relative to where it is loaded, ditto for data & stack.
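A minimal sketch of the segment layout described above, assuming the three segments are simply placed back to back from a load address chosen at run time. The `Segment` class, the function names, and the sizes are hypothetical and are not taken from LASM; the point is only that code referring to "data long 1" does not need to know where the image was loaded.

```python
from dataclasses import dataclass

LONG_BYTES = 4  # a Propeller long is 32 bits


@dataclass
class Segment:
    name: str
    longs: int     # segment size in longs
    base: int = 0  # absolute byte address, assigned at load time


def lay_out(segments, load_address):
    """Place segments back to back, starting at load_address."""
    address = load_address
    for seg in segments:
        seg.base = address
        address += seg.longs * LONG_BYTES
    return address  # first free byte after the last segment


def resolve(seg, offset_longs):
    """Turn a segment-relative long offset into an absolute byte address."""
    if not 0 <= offset_longs < seg.longs:
        raise ValueError("offset outside segment")
    return seg.base + offset_longs * LONG_BYTES


# Hypothetical sizes, chosen only for illustration.
code = Segment("code", 10000)
data = Segment("data", 2000)
stack = Segment("stack", 1000)
end = lay_out([code, data, stack], load_address=0x8000)
```

Loading the same image at a different address only changes the `load_address` argument; every segment-relative offset stays valid.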

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

Post Edited (Bill Henning) : 4/21/2009 2:14:36 PM GMT

jazzed · 2009-04-21 16:34

The model where .text lives in External Memory (XMM) and all else lives in Propeller HUB is attractive.
In the work I've done, a loader lives in the eeprom that starts up the XMM code ... works.

I'm prototyping a DRAM solution right now that has some speed optimization and allows 1MB memory,
a Keyboard, Mouse, SD CARD, Monaural audio and TV with one Propeller.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve

Propalyzer: Propeller PC Logic Analyzer
http://forums.parallax.com/showthread.php?p=788230

hippy · 2009-04-21 16:45

RossH said...
And it'll all be redundant when the Prop II arrives anyway.

Not necessarily. Unless it is decided to support XMM & Friends, there's no compelling reason the PropTool and Parallax compilers shouldn't be capped to the 256KB of hub memory image.

If you can split Catalina output into Spin plus C-LMM, that would simplify things; you can compile the Spin 'C-LMM launcher / Kernel loading' part using tools which exist, then it's just a case of finding an assembler for C-LMM ( eg, Bill's LASM ) and generate an image from address $00000000 which is placed in Eeprom / SD card / wherever.

The only issue then is how the Spin part knows where the C-LMM parts are. That can be solved by having a set of pointers at the start of the C-LMM image. Spin can then get those indirectly knowing where they will be in the image.
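The pointer-table idea can be sketched as follows. This is an illustration only: the three header fields, their order, and the little-endian 32-bit encoding are assumptions made for the example, not a format defined by Catalina or LASM.

```python
import struct

# Hypothetical header: three 32-bit little-endian pointers at offset 0.
HEADER_FIELDS = ("code_start", "data_start", "entry_point")
HEADER_FORMAT = "<3I"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)


def build_image(code: bytes, data: bytes) -> bytes:
    """Prefix code and data with a table of pointers into the image."""
    code_start = HEADER_SIZE
    data_start = code_start + len(code)
    entry_point = code_start  # assume execution starts at the first code long
    header = struct.pack(HEADER_FORMAT, code_start, data_start, entry_point)
    return header + code + data


def read_pointers(image: bytes) -> dict:
    """What the launcher would do: read the table from a known position."""
    values = struct.unpack_from(HEADER_FORMAT, image, 0)
    return dict(zip(HEADER_FIELDS, values))


image = build_image(code=b"\x01\x02\x03\x04" * 2, data=b"\xAA\xBB\xCC\xDD")
pointers = read_pointers(image)
```

Because the launcher only depends on the header position, the same reader would work for a Spin loader or a PC-based emulator.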

That's a nice split because it also means one can write a PC-based emulator which does whatever the Spin launcher does, the C-LMM being the entire C program in its own right.

The 'special case' is then for small programs where the C-LMM is small enough to fit into the Prop's hub memory.

OwenS · 2009-04-21 17:31

I need to get my old assembler available again. It's a traditional assembler, though: that is, it produces its own object files to be linked by its own linker; I don't know how relevant that is to you (I assume it would remove the need for what you call the "Binder"?). It has an LMM mode, though that probably needs a bit of reworking for compiler support.

I need to revamp its object format a bit though to support more powerful relocations (currently it can just do Symbol + Constant). I also need to do some work on my combined LMM/virtual machine system (I was intending to port LCC to the 16-bit VMM opcodes - which are normally identical to PASM's, but it has some powerful ones like calculated loads & stores and LEA).
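For readers unfamiliar with the term, a "Symbol + Constant" relocation can be sketched like this. The record layout (`offset`, `symbol`, `addend`) and the 32-bit little-endian patch are assumptions for illustration and do not describe the actual PropTools object format.

```python
def apply_relocations(image: bytearray, relocations, symbols):
    """Patch each relocation site with symbol address + constant."""
    for reloc in relocations:
        value = symbols[reloc["symbol"]] + reloc["addend"]
        offset = reloc["offset"]
        if offset + 4 > len(image):
            raise ValueError("relocation site outside image")
        # Write the resolved value as a 32-bit little-endian long.
        image[offset:offset + 4] = (value & 0xFFFFFFFF).to_bytes(4, "little")


# Hypothetical symbol table and relocation list.
symbols = {"buffer": 0x9000}
relocations = [{"offset": 4, "symbol": "buffer", "addend": 8}]
image = bytearray(8)
apply_relocations(image, relocations, symbols)
```

Full arithmetic on symbols would replace the single addition with an evaluated expression, which is why it needs a richer object format.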

The default linker script (actually just a Lua script file :) ) limits the image to 32k, but there's no reason you couldn't change that.

Post Edited (OwenS) : 4/21/2009 5:36:44 PM GMT

RossH · 2009-04-21 23:43

All,

Thanks for the responses.

@virtuPIC, jazzed - I'm thinking more along the lines of overlaid code/data segments, not full blown VM. Something that does to XMM/Hub RAM what the FCACHE idea does for Hub/Cog RAM. I have thought about just putting the text segment in eeprom - it is common practice on the Hydra to use external eeprom for game assets, so I may do that as a first step anyway.

@heater - I'm starting with the Hydra Xtreme because I just happen to have one. I keep trying to hint to Cluso that he should donate a TriBlade Prop to the cause, but maybe I'm being too subtle :) ... however I will certainly try to make any Catalina XMM interface generic to other platforms.

@bill - yes, I'm planning on reorganizing Catalina's memory usage along the lines you describe. Will LASM accept current SPIN/PASM programs unchanged?

@hippy - personally, I'd be happy with 256KB - that would be enough for me to do what I want to do. Beyond that, I suspect it may be better to turn the whole Propeller software architecture on its head and build a SPIN interpreter that can be launched as a Catalina plugin ... but what's missing here is a proper language-independent binary object format plus an assembler and linker that supports better memory management and is also independent of SPIN. Which brings me to ...

@OwenS - you are correct. I was in a hurry with Catalina, and didn't want to spend the time required to define a new object format or write a linker. Writing the Catalina binder only took a few hours, so I'd be prepared to abandon it in favour of something better. Is your assembler/linker currently available anywhere so I could take a look at it? I did a quick forum search but didn't spot it.

Ross.

Dr_Acula · 2009-04-22 02:35

I've found you *always* need more memory. At least with modern programs like .net you don't have to worry about running out of memory and you can define ridiculous sized arrays and get away with it. Even then, the software I wrote to run the accounting system for my medical practice has started slowing up a bit when doing a quicksort on 50,000 consultations.

I've moved to CP/M not just because it is a retro system, but mainly because of the ability to handle bigger programs. Ok, CP/M tops out at about 50k and I've run up to that in the past, but then you can write multiple programs and then compile them and shell or chain from one to the next, and exchange data between programs via text files. Plus you can choose your language, like Mbasic for quick and simple "hello world", and Sbasic and BDS C for more professional structured programming.

The TriBlade Prop is intriguing because it gives you not only access to all the clever Propeller functions, but also access to an operating system that can effectively be running programs that take megabytes of memory. I'll have one working soon, just as soon as I sort out Future Electronics now wanting me to fax them a signed letter on my company letterhead saying I won't be putting the memory chips into a nuclear missile (don't laugh, they really have asked me this!).

Post Edited (Dr_Acula (James Moxham)) : 4/22/2009 2:52:41 AM GMT

OwenS · 2009-04-22 07:58

@RossH - I've put up the source here. Be aware the assembler source is kinda messy (I've got code generation in the parser... uck... although it's perhaps more structured than it could have ended up). Also, I didn't strip out the SVN files (Which reminds me: I think I've accidentally firewalled or turned off my SVN server!)

I'm gonna update it soon so you can do full arithmetic with symbols, and then make it endian independent (It currently assumes little endian hosts)

RossH · 2009-04-22 15:10

OwenS,

Thanks for posting the source - I didn't have time to look at it today, but I hope to get to it sometime in the next couple of days.

Ross.

Baggers · 2009-04-22 15:24

Dr_A, what does your company do? Maybe the company name was something that looked a bit suspicious lol

Do you work for a company called WMD R'us? lol

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
http://www.propgfx.co.uk/forum/ home of the PropGFX Lite

Bill Henning · 2009-04-22 16:08

Hi Ross,

LASM does not accept Spin code at all, however it is more-or-less compatible at the pasm level (the less is not currently supporting things like |< in expressions)

RossH said...

@bill - yes, I'm planning on reorganizing Catalina's memory usage along the lines you describe. Will LASM accept current SPIN/PASM programs unchanged?

Dr_Acula · 2009-04-23 00:54

@Baggers - yes my company is a medical practice. But we all know that is just a front for Dr Evil's lair where we plot our nefarious plan for World Domination.

I'm not the expert on PASM, but would a combination of the TriBlade and PropDOS be able to handle bigger assembly programs?

RossH · 2009-04-23 02:07

Hi Dr_Acula

Yes, Cluso's TriBladeProp should be able to execute larger PASM programs - this looks like being the best solution available at present, but it is not actually necessary (nor is PropDOS). PASM can be executed from any external storage, including eeprom and SD-RAM (albeit very slowly!)

With normal PASM the code is stored in cog RAM, and executed by that cog - but this is limited to 512 LONGs. With LMM PASM the code can be stored in hub RAM, and is executed by a cog that first loads it to cog RAM. This allows LMM programs to be up to 8192 LONGs (i.e. 32K) - which is the size of hub RAM on the current Propeller, and therefore a limit that most PASM compilers currently assume. But if we store the code in external memory (commonly called XMM) this limit is no longer relevant and should be removed. Michael Park has removed the 8192 LONG limit from an experimental version of his Homespun PASM compiler that I am using, so I can now use Catalina to compile C programs that end up with PASM code segments larger than 32k. All I have to do now is to store the result somewhere where it can be fetched a LONG at a time. Initially I will use the Hydra's 128k external eeprom - but this is too slow to be useful. The next step will be to use the Hydra Xtreme 512K RAM card. But obviously the bigger and faster the XMM the better - and until the Prop II arrives, the TriBladeProp is probably the next logical option.
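The size limits above reduce to simple arithmetic, sketched here. The constants are the upper bounds quoted in this post; the function name is hypothetical, and in practice less space is available because code shares cog and hub RAM with registers, data, and stack.

```python
LONG_BYTES = 4    # one Propeller long is 32 bits
COG_LONGS = 512   # cog RAM size quoted above
HUB_LONGS = 8192  # hub RAM size quoted above (32K bytes)


def smallest_memory_for(code_longs):
    """Return the smallest memory level that could hold the code at all."""
    if code_longs <= COG_LONGS:
        return "cog"
    if code_longs <= HUB_LONGS:
        return "hub (LMM)"
    return "external (XMM)"


hub_bytes = HUB_LONGS * LONG_BYTES
# A hypothetical 40K-byte code segment needs 10240 longs.
placement = smallest_memory_for(40 * 1024 // LONG_BYTES)
```

Anything over 8192 longs lands in the last case, which is exactly the situation a 32k compiler limit rules out.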

Ross.

P.S. Please don't pass this information onto Dr Evil!

jazzed · 2009-04-23 03:05

RossH writes:
>> TriBladeProp is probably the next logical option.

Yup unless you have a soldering iron handy :) I've had 2 such painful prototypes for months. There will be no such thing as a LONG at a time XMM of course with P8X32A. Performance with a design similar to TriBladeProp is about 800K LMM Instructions Per Second (LIPS) ... roll over Sarah :)

An optimization can be done to speed up fetch with SRAM to about 2M LIPS ... of course that assumes .text in XMM and all else in HUB. I'm looking at DRAM options again because I really can't stand dedicating an entire Propeller to memory management and DRAM density is just too attractive to pass up.

OwenS · 2009-04-23 10:35

Ross,

Our projects' licensing terms are very similar (PropTools is GPLv3; Runtime support code is X11); they would seem to fit together in a similar way to GCC and Binutils do.

If you were to port Catalina to PropTools, I could probably modify PLink a bit so linker scripts can put .text past 32k.

I'm hosting PropTools in Git now (I was using SVN when I created it, but today prefer Git); you can find a repository browser here, and later tonight you will be able to check out the code by doing a

git clone http://tdn.teknetium.com/git/proptools.git

Perhaps we could merge them both into one repository? (Though I imagine that would require modifying one of our build scripts - I'm happy to port the LCC build system to CMake though - and I imagine CMake would be desirable from a portability standpoint)

How hard is porting LCC to a new architecture? I'd love to add support for my virtual machine to LCC (which reminds me to check in vmm.s); LMM C code could probably also run in the virtual machine's LMM kernel, though that's currently a minimal size implementation (I haven't as of yet unrolled it), and I don't have in-VM floating point (depending upon the code space requirements I could probably add it).

Post Edited (OwenS) : 4/24/2009 11:27:53 PM GMT

RossH · 2009-04-24 05:09

Hi OwenS,

I've been doing a bit of thinking about the whole subject, and I'm slowly coming around to the realization that Catalina cannot remain compatible with the existing SPIN/PASM compilers much longer - they just don't offer enough support for LMM/XMM programs. The problem is not with PASM itself, but with the absence of a common object format and also memory segment management. I guess this is what Bill found, and is why he's now writing his own assembler. Perhaps both Bill and yourself realized this sooner than I did.

However, I think I will do one more release of Catalina where just the constant segments (i.e. text or code or both) can be relocated above 32K - I believe I can use mpark's modified Homespun compiler to do this with fairly minimal effort. Once this is working (and I finish porting the C98 library) then Catalina will be functional enough to develop real-world applications for the Propeller in C.

After that I'll investigate porting Catalina to another assembler/linker - or perhaps someone else will have already done so. At this stage the target could be either your toolset or Bill's (I've not evaluated either!).

Another reason I want Catalina to remain compatible with existing tools for at least a little while yet is that as soon as I move it to another toolset, I am likely to lose the ability to use SPIN during startup - and just being able to use SPIN to bootstrap the Propeller into running C programs has saved me weeks of work. SPIN is really great for getting something up and running quickly.

Ross.

Cluso99 · 2009-04-24 05:42

FYI: Ross and I have been discussing offline - I had responded before the subtle hint.
I have been a bit tied up over the last couple of days to reply here.
Turns out, when I am in Gosford (Sydney) we are fairly close - small world.

LMM & >32KB
Heater is correct with an estimate of 800K LIPS for the TriBladeProp. A 32bit fetch takes approximately 1uS. However, I cannot see how DRAM can be any faster and in fact should be considerably slower. We just don't have the pins for any other methods. If the fetching was mainly sequential then there would be other methods that could be used, but in reality that is not the case.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

· Home of the MultiBladeProps: TriBladeProp, SixBladeProp, website (Multiple propeller pcbs)
· Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
· Prop Tools under Development or Completed (Index)
· Emulators: Micros eg Altair, and Terminals eg VT100 (Index)
· Search the Propeller forums (via Google)
My cruising website is: www.bluemagic.biz   MultiBladeProp is: www.bluemagic.biz/cluso.htm

jazzed · 2009-04-24 06:06

The best possible XMM instruction fetch rate with the P8X32A would be around 2M LIPS with 16-bit SRAM, as I described before (not interesting enough to overcome other momentum at that time :) ). DRAM cannot be faster because of RAS* & CAS* transactions. SDRAM bursts make up for much of that loss, but you still have to weave the data into a long on the Prop.
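Two small sketches of the arithmetic behind these figures. The first converts the LIPS numbers quoted in this thread into time per instruction. The second shows what "weaving" two 16-bit reads into one long means; the low-half-first order is an assumption, since the real order depends on how the memory is wired and read.

```python
def microseconds_per_instruction(lips):
    """Average time budget per LMM instruction at a given LIPS rate."""
    return 1e6 / lips


def weave_long(low_half, high_half):
    """Combine two 16-bit reads into one 32-bit long (low half read first)."""
    return ((high_half & 0xFFFF) << 16) | (low_half & 0xFFFF)


tribladeprop_us = microseconds_per_instruction(800_000)  # figure quoted above
sram16_us = microseconds_per_instruction(2_000_000)      # figure quoted above
instruction = weave_long(0x5678, 0x1234)
```

At 800K LIPS each instruction gets 1.25 microseconds in total, and at 2M LIPS only 0.5 microseconds, so every extra bus transaction per long is significant.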

RossH · 2009-05-01 13:33

All,

Just a quick update - I've just successfully compiled and run my first Catalina C program that uses XMM (i.e. external RAM) to execute a program that won't fit in the Propeller's 32K internal RAM.

I expect to release a new version of Catalina soon that will support C programs with arbitrary code sizes. Currently, this will require the use of a special version of the Homespun compiler - which mpark kindly generated for me - that overcomes the normal PASM 32k limit.

Other than size, Catalina XMM programs are still fully compatible with existing SPIN and PASM compilers and tools. I have a new Catalina XMM Kernel that knows how to load code from eeprom to XMM on startup and then execute it from XMM. Current eeprom sizes probably limit such programs to about 128k (which is the eeprom size on the Hydra) but ultimately I anticipate Catalina will also support loading such programs from SDRAM or other sources.

The Catalina XMM Kernel currently only works with the Hydra Xtreme 512K RAM card (the only XMM hardware I have available) but the kernel should be relatively easy to modify for other XMM hardware. Cluso99 has promised me a TriBladeProp, so that will be my next target.

Ross.

jazzed · 2009-05-01 14:16

Congrats Ross. I'll give it a try on my XMM board after you publish the code. Any plans for an SD Card loader?

Cluso99 · 2009-05-01 23:46

Great work Ross and thanks Michael for doing the homespun mod.

Ross - I am still in Qld - expect to be back in Sydney sometime next week so will chat then.

I have a question... Is 512KB of external SRAM enough (TriBlade has 1MB) ? On another post I mentioned what I am doing but it's a little more than that and am just wondering if 512KB would be sufficient (there are reasons I am not wanting to openly discuss yet).

RossH · 2009-05-02 00:30

Hi jazzed.

It's all a matter of finding the time. But it should be easy for you to add this yourself - as usual, I do all the code loading and cog setup in SPIN before starting the Catalina Kernel (just because it's easier). The Catalina program segments could just as easily be loaded from an image file instead of from eeprom, which is what I currently do. There are several SD card file loaders that can load into hub RAM - you will need to make one aware of where the Catalina program segments are within the image file, and then add the capability to load some parts of the image into hub RAM, and some parts into XMM.
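The loader change described here can be sketched as a segment table that says where each part of the image file belongs. This is a simulation with byte arrays standing in for hub RAM and XMM; the table fields (`file_offset`, `size`, `target`, `address`) are hypothetical and are not Catalina's actual image format.

```python
def load_image(image, segment_table, hub, xmm):
    """Copy each segment of the image into hub RAM or XMM as directed."""
    targets = {"hub": hub, "xmm": xmm}
    for seg in segment_table:
        start, size, dest = seg["file_offset"], seg["size"], seg["address"]
        chunk = image[start:start + size]
        if len(chunk) != size:
            raise ValueError("segment runs past end of image")
        memory = targets[seg["target"]]
        if dest + size > len(memory):
            raise ValueError("segment does not fit in target memory")
        memory[dest:dest + size] = chunk


hub = bytearray(32 * 1024)   # stand-in for hub RAM
xmm = bytearray(512 * 1024)  # stand-in for a 512K XMM card
image = b"KERN" + b"CODE" * 3
table = [
    {"file_offset": 0, "size": 4, "target": "hub", "address": 0},
    {"file_offset": 4, "size": 12, "target": "xmm", "address": 0x100},
]
load_image(image, table, hub, xmm)
```

An SD card loader and an eeprom loader would differ only in how `image` is read, not in how the segments are dispatched.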

Remember that any SPIN code gets overwritten once the kernel takes over - multitasking Catalina programs with SPIN programs may be possible once I can move most of the Catalina program out to XMM, but Catalina currently assumes it has full control of the Prop.

Nearly all of the changes required to support XMM are in the Catalina binder, not the kernel - they are all to do with managing the memory segments appropriately. There are only two small kernel routines that need modifying to support any type of XMM - one initializes the Kernel's XMM access and the other loads data from XMM. The first release will only support read access to XMM - this is ok for code segments and text segments (which are the ones which tend to be larger than 32k).
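A sketch of what a two-routine, read-only XMM interface could look like, with an in-memory stand-in for the hardware. The class and method names are hypothetical; the real routines are PASM inside the kernel, and little-endian byte order is assumed here.

```python
from abc import ABC, abstractmethod


class XmmBackend(ABC):
    """The two hardware-specific operations a read-only XMM needs."""

    @abstractmethod
    def init(self):
        """Prepare the hardware for access."""

    @abstractmethod
    def read_long(self, address):
        """Return the 32-bit long stored at a long-aligned byte address."""


class ByteArrayXmm(XmmBackend):
    """In-memory stand-in for real XMM hardware, for testing only."""

    def __init__(self, contents):
        self._contents = bytes(contents)
        self._ready = False

    def init(self):
        self._ready = True

    def read_long(self, address):
        if not self._ready:
            raise RuntimeError("init() has not been called")
        if address % 4 or address < 0 or address + 4 > len(self._contents):
            raise ValueError("bad XMM address")
        return int.from_bytes(self._contents[address:address + 4], "little")


def fetch_longs(backend, start, count):
    """Fetch consecutive longs, as a kernel fetch loop would."""
    backend.init()
    return [backend.read_long(start + 4 * i) for i in range(count)]


xmm = ByteArrayXmm(bytes([0x78, 0x56, 0x34, 0x12, 0x01, 0x00, 0x00, 0x00]))
longs = fetch_longs(xmm, 0, 2)
```

Supporting another board would then mean writing another `XmmBackend` subclass and leaving `fetch_longs` untouched.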

I don't do any XMM caching or read-ahead yet - I don't have space to do that within the Kernel itself. But once I have all the basic stuff in place it should be possible to create an XMM cache manager that runs in another cog. Again - it's just a matter of finding the time.

I hope to publish the code in the next few days (Unless I find a major problem - I thought debugging LMM programs was bad enough, but debugging XMM programs is much harder! I only really made progress on Catalina once I added LMM support to POD - if I strike any major problems I suspect I will have to add XMM support as well).

Ross.

RossH · 2009-05-02 00:35

Hi Cluso,

I think 512K would be enough for most applications. The Prop is such a wonderful beast that we tend to forget it is just a microcontroller. If I had an application that required more RAM than that I would probably not use the Prop I (but I may use the Prop II!)

Talk to you sometime next week.

Ross.

jazzed · 2009-05-02 01:25

Hi Ross,

Thanks for taking time to write that ... you have answered most of my questions :) I've done the embedded image thing with ICCPROP. I also wrote an intel-hex loader which I'm using now with a serial port that takes a string of bytes ... so loading from SDCARD should be cake (famous last words) assuming one is connected.
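For reference, parsing one Intel HEX record looks roughly like this. It is a generic sketch of the standard record format (length, 16-bit address, type, data, checksum), not the loader mentioned above, and it ignores extended-address record handling.

```python
def parse_hex_record(line):
    """Parse one Intel HEX record into (record_type, address, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    length = raw[0]
    if len(raw) != length + 5:
        raise ValueError("length field does not match record size")
    # All bytes, including the trailing checksum, must sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    address = (raw[1] << 8) | raw[2]
    record_type = raw[3]
    data = raw[4:4 + length]
    return record_type, address, data


data_record = parse_hex_record(":0300300002337A1E")
eof_record = parse_hex_record(":00000001FF")
```

Record type 0 carries data and type 1 marks end of file; a loader would write each data record's bytes at its address and stop at the end-of-file record.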

The ICCPROP loader I'm using has access to certain "kernel" variables for doing loader role and switching to XMM for boot ... this is the main part that I suspect you'll need to share for Catalina. Devil is in the details ... it was a PITA before.
