Have you ever considered generating binaries that would work with the GNU linker? It handles all of that for me in ZOG. You can place .text and .data anywhere you want and you can also create an image of .data in the .text section and copy it into place at startup. That allows me to put the initial values of globals in flash and then move them to their place in SRAM just before calling main().
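To make the flash-to-SRAM idea above concrete, here is a minimal simulation sketch. It is not ZOG's or the GNU linker's actual startup code: the memory sizes, the addresses and the helper names (`build_flash_image`, `startup_copy`) are all hypothetical, and in a real build the load and run addresses would come from the linker script.

```python
# Simulation only: FLASH_SIZE, SRAM_SIZE and DATA_RUN_ADDR are made-up values.
FLASH_SIZE = 1024
SRAM_SIZE = 256
DATA_RUN_ADDR = 0x10  # where .data is expected to live at run time (assumed)


def build_flash_image(text: bytes, data_init: bytes):
    """Place .text first and the .data load image directly after it."""
    data_load_addr = len(text)
    flash = bytearray(FLASH_SIZE)
    flash[:len(text)] = text
    flash[data_load_addr:data_load_addr + len(data_init)] = data_init
    return bytes(flash), data_load_addr


def startup_copy(flash, sram, load_addr, run_addr, size):
    """Copy the initial values of globals from flash into SRAM before main()."""
    sram[run_addr:run_addr + size] = flash[load_addr:load_addr + size]


# Two initialised 32-bit globals, stored little-endian in the load image.
text = b"\x90" * 32
data_init = (1234).to_bytes(4, "little") + (5678).to_bytes(4, "little")

flash, load_addr = build_flash_image(text, data_init)
sram = bytearray(SRAM_SIZE)
startup_copy(flash, sram, load_addr, DATA_RUN_ADDR, len(data_init))

first_global = int.from_bytes(sram[DATA_RUN_ADDR:DATA_RUN_ADDR + 4], "little")
second_global = int.from_bytes(sram[DATA_RUN_ADDR + 4:DATA_RUN_ADDR + 8], "little")
```

The point of the split is that the image in flash survives power cycles, while the copy in SRAM is the one the program actually modifies.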
Hi David,
It's not really practical unless you can somehow convince Parallax to also adopt this format. Catalina binaries are compatible with any other Propeller binaries, and hence are loadable by the built-in loader on any Propeller. Of course on the Prop I this means the total program size must be under 32k (48k if loading from EEPROM) - but on the Prop II the built-in loader will presumably support much larger binaries. I want Catalina to be compatible with that, and I don't really want to have to support two separate binary formats.
Also, I get enough grief from people who believe that LMM C is somehow not "native" - even though it compiles C to PASM, and also executes much faster than SPIN (which somehow is considered "native"). I can just imagine the additional grief I would get if I also insisted on using a non-native binary format!
Ross.
I guess that's a good point. It's unfortunate though since there are so many excellent tools that work with the GNU binary format that would become available if it were adopted for Propeller binaries. Oh well, I guess there's no point in fighting a losing battle...
I'm looking forward to your version of Catalina that generates code for the C3 SPI flash and SRAM. I started working with Catalina a long time ago but switched to ZOG because I couldn't fit big enough C programs in the 32k of hub memory. I wonder if my basic interpreter will fit in 1MB of flash with Catalina? It will be fun to find out.
David,
If your Basic Interpreter won't fit in 1MB of FLASH RAM, then I'll send you a free copy of Catalina!
Ross.
Also, I get enough grief from people who believe that LMM C is somehow not "native" - even though it compiles C to PASM, and also executes much faster than SPIN (which somehow is considered "native").
Who are these people? Where do they live? I've a good mind to get some of the lads together and we can go round and make them write it out a hundred times "C is native to the Propeller" *grin*!
David said
I started working with Catalina a long time ago but switched to ZOG because I couldn't fit big enough C programs in the 32k of hub memory.
Zog running in hub ram will run out of memory fairly quickly too, won't it? So presumably you are running Zog with external memory? And if so, then you could use the same external memory to run Catalina, in XMM mode? Or Big Spin etc etc?
As an aside, I spent last night coding a new IDE that is written in Basic but compiled to Java, and hence is not tied to Windows machines. I'm on the steep part of the learning curve, but today got the serial port working, and we have a skeleton IDE working now. Much of the Catalina IDE should port over from VB.NET. Already there is a significant speed increase compared with VB.NET. I'm hoping the richtextbox will be able to do all the nice code colors like the VB.NET one can. If we keep this open source, then we can add any tweaks to the Spin language that might be needed to get Big Spin to work.
Who are these people? Where do they live? I've a good mind to get some of the lads together and we can go round and make them write it out a hundred times "C is native to the Propeller" *grin*!
Dr_A,
I can give you a list of names - can you find enough horse's heads to put in their beds?
Ross.
IMHO, the more languages, implementations, emulations, etc, etc, the better the prop will be. So keep up the great work, all.
The prop seems to have gained a little momentum recently in exposure. Every bit helps.
Zog running in hub ram will run out of memory fairly quickly too, won't it? So presumably you are running Zog with external memory? And if so, then you could use the same external memory to run Catalina, in XMM mode? Or Big Spin etc etc?
You are correct that ZOG ran out of hub memory fairly quickly as well. However, it was far easier for me to add external memory support for the C3 to ZOG than to Catalina so I tried that approach first. Also, my first attempt at C on the Propeller was using a SPI SRAM board I made for the Hydra and that only had 64k of SRAM. With that much memory I was able to get some of my C programs running under ZOG but they would not have fit in 64k with Catalina. Now that I have access to 1MB of flash on the C3 it makes sense to investigate Catalina again, especially if RossH adds the flash memory support! :-)
I find this discussion about external memory and caches very interesting. Has anybody thought about implementing a memory management unit that could remap blocks of memory? This would allow multiple programs to execute without having to relocate them. Spin programs are naturally relocatable by adjusting the memory pointers in the Spin binary file header. However, some Spin programs use the @@@ operator provided by BST, which generates absolute address values. These Spin programs cannot be easily relocated. I would guess that Catalina C programs and programs from other compilers are not relocatable.
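As a rough illustration of relocating a Spin program by adjusting its header pointers, the sketch below rewrites the five 16-bit pointers in a 16-byte header. The field layout (clkfreq, clkmode, checksum, pbase, vbase, dbase, pcurr, dcurr) is my understanding of the standard Propeller 1 binary header and should be checked before relying on it; the `relocate_header` helper and the example values are hypothetical, and the checksum byte is deliberately left untouched.

```python
import struct

# Assumed Propeller 1 header layout, little-endian, 16 bytes in total:
# clkfreq (long), clkmode (byte), checksum (byte),
# pbase, vbase, dbase, pcurr, dcurr (words).
HEADER_FMT = "<IBBHHHHH"
HEADER_SIZE = struct.calcsize(HEADER_FMT)


def relocate_header(header: bytes, offset: int) -> bytes:
    """Return a copy of the header with every pointer moved by `offset`."""
    clkfreq, clkmode, checksum, *pointers = struct.unpack(HEADER_FMT, header)
    moved = [p + offset for p in pointers]
    if any(not 0 <= p <= 0xFFFF for p in moved):
        raise ValueError("relocated pointer does not fit in 16 bits")
    # The checksum is not recomputed here; that only matters to a loader.
    return struct.pack(HEADER_FMT, clkfreq, clkmode, checksum, *moved)


# Made-up example program: pointers as if it had been built to load at address 0.
original = struct.pack(HEADER_FMT, 80_000_000, 0x6F, 0,
                       0x0010, 0x0100, 0x0108, 0x0020, 0x010C)
relocated = relocate_header(original, 0x2000)
new_pointers = struct.unpack(HEADER_FMT, relocated)[3:]
```

This only works when nothing inside the program holds absolute addresses, which is exactly why code using the @@@ operator is hard to relocate.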
An MMU would allow execution of multiple programs that are built for the same memory space. It may even make it feasible to implement Linux on the Prop.
Currently you are going to need GCC to compile it and run under Zog. So you might not live long enough to see the command prompt come up :)
Maybe we could get one of the younger members of the forum to do the port.
Actually, adding an MMU to the cache logic wouldn't be that hard. The only problem is it would slow every access down since you'd have to go through an extra level of mapping. Also, with the tiny caches we've been using you'd likely get bad performance when switching processes since the entire cache would probably end up being overwritten.
Another issue with Linux would be process scheduling. If Linux uses an interrupt to do the scheduling we would have to emulate that with multiple cogs. This would limit the number of processes that could be run concurrently to the number of available cogs. If process scheduling is cooperative, then it could be done in a single cog. That would require doing a periodic OS call so that processes could be swapped.
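The cooperative option can be sketched with Python generators, where each `yield` stands in for the periodic OS call that hands control back to the scheduler. This is only a model of the idea, not spinix or Linux code; `process` and `run_round_robin` are hypothetical names.

```python
from collections import deque


def process(name, steps, log):
    """A toy process that does `steps` units of work."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # the periodic "OS call": give the scheduler a chance to swap


def run_round_robin(procs):
    """Run all processes on one 'cog', switching only when they yield."""
    ready = deque(procs)
    while ready:
        proc = ready.popleft()
        try:
            next(proc)          # run until the process's next OS call
        except StopIteration:
            continue            # process finished, drop it
        ready.append(proc)      # otherwise put it at the back of the queue


log = []
run_round_robin([process("A", 2, log), process("B", 3, log)])
# log is now ["A:0", "B:0", "A:1", "B:1", "B:2"]
```

A process that never makes the call would starve the others, which is the usual price of cooperative scheduling.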
I've run four concurrent Spin processes in spinix, but there's not enough hub RAM to run any more. BigSpin would allow me to run a few more processes until I run out of cogs. More concurrent processes could be added with cooperative process scheduling. spinix could support mixing Spin, C and PropBasic programs if the relocation issue was resolved. Of course, there would also need to be a consistent OS calling protocol, which would require defining a rendezvous area with a consistent method for using it.
Well, as Heater said, to get Linux to compile we'd need GCC and at the moment that means we'd have to use ZOG. Since ZOG is a virtual machine, we could easily add interrupt capability to it if it isn't already there. This would be more difficult if someone creates a GCC that generates LMM code.
For VMCOG2 (the BIG VM version) I was considering the following strategies:
1) Single Address Space
- add VBASE to every incoming memory request, and check to make sure it does not exceed VLIMIT
This is the simplest way to go, as the check would happen *before* the page present check.
I estimate a 0.6us performance hit per memory access when the access is within the sandbox
2) Split I&D
- add CSEG to all code access, check against CLIMIT
- add DSEG to all data access, check against DLIMIT
- keeps stack in hub
This would have a similar speed penalty to (1), but would require ZOG / other clients to use different messages for accessing code or data
One advantage would be that it would be easy to specify a different number of pages in the working set for code and data
Note: Different virtual addresses could be used to distinguish between code & data, however that would slow down each access by an additional approx. 0.6us
3) Shades of Intel
- add CSEG to all code access, check against CLIMIT
- add DSEG to all data access, check against DLIMIT
- add SSEG to all stack access, check against SLIMIT
This would have a similar speed penalty to (2), but would require ZOG / other clients to use different messages for accessing code, data or stack
It would be easy to specify a different number of pages in the working set for code, data and stack.
Note: Different virtual addresses could be used to distinguish between code & data & stack, however that would slow down each access by an additional approx. 0.6us
4) Ghost Of Minis Past (GOMP: similar to first multi-tasking swapping kernels)
- single virtual space per process, with VBASE and VLIMIT like in 1
- processes are swapped every 10ms-30ms, all dirty pages flushed, and page table, VBASE, VLIMIT for next process loaded
Now all of the above solutions are technically feasible, and it would be cool to run a full multi tasking / multi user mini OS on the Prop, however let's not forget how much speed we would lose.
It could maybe run at the same speed as original Unix did on a low end PDP-11.
Note2: if limit checks were omitted, the overhead for adding a memory base to all incoming memory requests is a single ADD instruction
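The base-and-limit strategies above can be modelled in a few lines. This is a sketch under stated assumptions, not VMCOG code: `Segment`, `Mmu` and `SegmentFault` are hypothetical names, and the limit is assumed to be the highest translated address a request may reach (the posts do not say whether VLIMIT is a size or an address).

```python
class SegmentFault(Exception):
    """Raised when a request falls outside its segment."""


class Segment:
    def __init__(self, base, limit):
        self.base = base    # e.g. VBASE, CSEG, DSEG or SSEG
        self.limit = limit  # e.g. VLIMIT: highest allowed translated address (assumed)

    def translate(self, vaddr, check=True):
        addr = self.base + vaddr  # the single ADD mentioned in Note2
        if check and (vaddr < 0 or addr > self.limit):
            raise SegmentFault(f"address {vaddr:#x} is outside the segment")
        return addr


class Mmu:
    """Strategies 2 and 3: one segment per kind of access."""

    def __init__(self, **segments):
        self.segments = segments

    def translate(self, kind, vaddr):
        return self.segments[kind].translate(vaddr)


# Strategy 1: a single address space with VBASE and VLIMIT.
single = Segment(base=0x4000, limit=0x7FFF)
flat_addr = single.translate(0x10)

# Strategy 2: split instruction and data spaces.
split = Mmu(code=Segment(0x0000, 0x1FFF), data=Segment(0x2000, 0x2FFF))
data_addr = split.translate("data", 0x20)
```

Passing the access kind explicitly corresponds to the clients using different messages for code, data or stack access.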
To run Linux you'd probably need to implement a page table so that virtual addresses would not necessarily need to be contiguous in physical memory. This allows the address space of processes to expand and shrink without having to move everything around in memory. However, I wasn't really serious about running Linux. It is almost certainly overkill for Propeller applications. Also, it is impossible to run Linux on a Propeller!
Note: The word "impossible" was mentioned above to trigger someone to spend countless hours proving me wrong! :-)
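For comparison with a plain base-and-limit scheme, the page table idea mentioned above can be sketched as follows. The page size, the `PageTable` class and the frame numbers are all hypothetical; the point is only that consecutive virtual pages may live in unrelated physical frames, so a process can grow by mapping one more page.

```python
PAGE_SIZE = 512  # assumed page size, for illustration only


class PageTable:
    def __init__(self):
        self.frames = {}  # virtual page number -> physical frame number

    def map(self, vpage, frame):
        self.frames[vpage] = frame

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage not in self.frames:
            raise LookupError(f"virtual page {vpage} is not mapped")
        return self.frames[vpage] * PAGE_SIZE + offset


pt = PageTable()
pt.map(0, 7)  # virtual page 0 lives in physical frame 7
pt.map(1, 2)  # virtual page 1 lives in physical frame 2, not adjacent to 7

start_addr = pt.translate(0)               # 7 * 512
next_page_addr = pt.translate(PAGE_SIZE + 5)  # 2 * 512 + 5

# Growing the process later needs no copying, just another mapping.
pt.map(2, 11)
grown_addr = pt.translate(2 * PAGE_SIZE)
```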
The fft benchmark is on its way. In Spin and C. No internet for my PC today and I can't post it from this phone.
Spin version is 1465ms vs the C version on my PC at 195 microseconds!
Have not tried it with zog yet.
Sorry to raise such an old thread from the dead, but I came across it while googling for something else and was curious to see that Dr_Acula predicted most of the features of fastspin (5 years in advance!):
So it seems to me that what we need is something to translate Spin into LMM, and I see two ways to do this. One is to translate to the bytecode that the Spin interpreter uses. I see most of this as possible but there are still bits missing, though the Spin Simulator might be able to plug those holes. Especially the bit about the flat 32 bit memory map.
The other option I see is Bean's Basic option where you compile to pure Pasm and don't run it through any simulator at runtime. This may well produce faster code and save a cog as well.
fastspin is exactly the second option (just like Bean's Basic, it turns Spin into pure Pasm, either COG or LMM). So all that's missing from fastspin to make it "BigSpin" is an XMM mode or something similar. This would definitely be do-able. But is it still of interest? With the Prop2 just around the corner perhaps BigSpin on Prop1 is a dead issue. I'd be interested in hearing people's opinions on this.
...But is it still of interest? With the Prop2 just around the corner perhaps BigSpin on Prop1 is a dead issue. I'd be interested in hearing people's opinions on this.
The only reason that I started dabbling in Spin again is that I noticed there is a way to get Spin LMM. Now, I am very interested in possibly getting it to do XMM, but I may be the only person who is.
I would not count on P2 being "just around the corner" just yet. Besides, by the time something useful for most of the users here comes on line, it may be just that much longer.
Ray
I think XMM code generation would be great. Note, however, that with the demise of the C3, Parallax no longer sells any boards that can run XMM except from EEPROM, which really doesn't have enough capacity on most boards. In case someone mentions the SD card implementation of XMM, it isn't really practical because of the large SD sector size.
With spin2cpp it's possible to convert Spin to C, and then compile it using the XMM model. So this capability has been around for a while. Unfortunately, XMM has been a bit disappointing because of the low speed. However, I think with flash memory the speed is comparable with Spin speeds. Running from an SD card is quite a bit slower.
Yes, that is true. It is unfortunate that Parallax has chosen not to market boards with SPI flash chips on them. However, Martin Hodge's DNA board does have a socket for a SPI flash chip and should work well in XMM mode.
What about XMM and QuadSPI memory, or even XMM and HyperRAM?
Yes, either of those would need a suitable supporting hardware platform, but maybe Parallax is open to doing a P1 refresh?
QuadSPI Flash memory is now very cheap.
I am interested in seeing results of P1 and HyperRAM - the BGA package is less than ideal for casual use, but it does give a shipload of RAM, and if Parallax made a module, that would come pre-mounted.
QuadSPI works for XMM and HyperRAM should also. Does anyone have a Propeller board with HyperRAM?
Comments
Isn't that going to be a chicken and egg problem since I'll have to already have a copy of Catalina to determine if it will fit! :-)
Thanks for the offer but I'm willing to pay twice the full price for Catalina if my code will fit!
https://mghdesigns.com/propeller/dnartc.html
Yes, there is this one, being tested ...
http://forums.parallax.com/discussion/164923/new-boards-in-for-my-project-p1-max10m25