Simplifying "X" memory models
Jeff Martin
Posts: 760
Hi,
Recent events and internal discussions have led us to the decision to de-support and remove the XMM-SINGLE and XMM-SPLIT memory models in favor of the XMMC memory model in newer versions of Propeller GCC.
As such, we need to ask: Is anyone using either the XMM-SINGLE or XMM-SPLIT memory models in active projects for a specific reason that XMMC will not satisfy or improve upon?
Thank you.
Comments
XMMC limits global data, arrays, stack and heap to whatever memory is available in the hub, so eliminating XMM-SINGLE & XMM-SPLIT means that PropGCC will be limited to less than 24KB data/heap (at least 8KB is used for the code cache).
I am personally not using XMM-SINGLE or XMM-SPLIT, however I thought I'd point out the limitation eliminating them will impose.
Bill
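To make the arithmetic above concrete, here is a minimal sketch. It assumes 32KB of hub RAM and the 8KB minimum code cache mentioned in the post. The names `HUB_RAM_BYTES`, `MIN_CACHE_BYTES` and `xmmc_data_budget` are made up for illustration and are not part of PropGCC. The result is only an upper bound, since the kernel's own hub usage is not subtracted, which is why the real figure is "less than 24KB".

```c
#include <assert.h>
#include <stddef.h>

/* Assumed figures taken from the post: 32KB hub, at least 8KB code cache. */
enum { HUB_RAM_BYTES = 32 * 1024, MIN_CACHE_BYTES = 8 * 1024 };

/* Upper bound on hub bytes left for globals, arrays, stack and heap
 * under XMMC once the code cache has been carved out (hypothetical helper). */
static size_t xmmc_data_budget(size_t hub_bytes, size_t cache_bytes)
{
    assert(cache_bytes <= hub_bytes); /* the cache cannot exceed the hub */
    return hub_bytes - cache_bytes;
}
```

A larger cache shrinks the data budget byte for byte, so tuning the cache size is a direct trade between code speed and room for data.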
P2 external RAM access has been optimized in some way has it not?
I imagine these modes will be a lot more useful on the P2, if not essential to make use of all that ext RAM space in any sensible way. (Would make a nice RAM disk I guess).
Looks like we need a mode for the P2 which is C code in HUB, executing super quick due to the new hubexec mode, and stack/data in external RAM.
Now you guys did it! With all of Yesterday's talk about C and Spin and non-professional P1 and professional P2s and C being a professional language and lacking Spin features in SimpleIDE, blah, blah, blah...they are going to pull X memory models from SimpleIDE and put them into ProIDE which you can purchase to support the "professional" P2 features. The money they raise with this will be used to add the requested Spin features to SimpleIDE. Then everyone will be happy - those professionals wanting a full featured C development environment can pay for that and the unprofessional Spin programmers among us can use the free tool! Just like all the other vendors!!!
(none of this is serious speculation, of course!)
Cool!! Or put the Forth kernel and core words in HubExec mode, put the user dictionary in external RAM and use each COG's internal memory for really fast stack space! You probably already considered that option!
The P2 can now seamlessly glide in and out of executing code in COG or code in HUB. You just have to jump in and out of the HUB address range and there you are.
This leads to the idea of XMM-TURBO mode:
In XMM-TURBO mode you can compile a huge C program, some parts of which live in HUB and big parts of which live in external memory. Code is compiled such that calls to external memory functions fire up whatever kernel is required to do that fetch/execute work. When leaving external memory code, things just drop back to executing from HUB. All done seamlessly by calling and returning.
With XMM-TURBO mode we can locate code that needs the speed in HUB, all else in ext RAM.
For bonus points combine that with the FCACHE idea so as to push execution rate to 11.
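As a rough illustration of "speed-critical code in HUB, all else in ext RAM", here is a sketch in the style of per-function placement annotations. It assumes a `HUBTEXT` macro like the one PropGCC's headers are understood to provide for pinning a function into hub memory. The empty fallback definition is only there so the sketch builds on a host compiler, and both functions are hypothetical examples, not anything from an actual XMM-TURBO implementation (no such mode exists).

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: on the Propeller toolchain a header defines HUBTEXT as a
 * section attribute. This fallback makes it a no-op elsewhere. */
#ifndef HUBTEXT
#define HUBTEXT
#endif

/* Hot inner loop: annotated so it would be placed in hub memory. */
HUBTEXT static uint32_t checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* Cold path: left unannotated, so it would live wherever the memory
 * model puts code by default (external memory in an XMM model). */
static int validate(const uint8_t *buf, size_t len, uint32_t expected)
{
    return checksum(buf, len) == expected;
}
```

The caller does not change either way. Only the annotation decides where the function body ends up, which is the "seamless" part of the idea.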
Ha! You're funny; thanks for sharing. :-)
We've enhanced XMMC to allow multiple XMMC cogs to execute in an application. This was done to give a developer a path to continue on (with the understandable limitations that come with it, of course) if he/she had first been developing a multi-cog application in a lower memory model and then ran out of memory. The XMM-SINGLE and XMM-SPLIT modes, however, are not as compatible with this feature, leaving the developer open to dangerous pitfalls; this is a support issue.
Has anybody actually requested or used the capability to run two or more COGs from code in external memory?
My gut tells me that it would be so slow as to be totally pointless. Might as well combine whatever functionality those two tasks do into one on a single COG.
It sounds like giving up the possibility to run huge code (XMM-SINGLE, XMM-SPLIT) in order to do something that no one will ever use.
Or is there more to this than I can see?
Understood. It was requested and it was pointed out that an existing "lmm" multicog development effort in a lesser memory model suddenly had no opportunity to expand to larger memory short of a complete rewrite.
Each greater model comes with caveats, speed being one of them. What's practical for a particular application has to be decided by the designer involved, of course, but having a sudden imposed road-block of "yes, you can switch to XMM, but only one of your LMM/CMM cogs can come along" seemed too unreasonable.
So far, no one in this thread has said that XMM-SINGLE/XMM-SPLIT is critical to their application's success.
But it has been fun from time to time to compile some big old program and try it out on the Prop with the GadgetGangster 32MB card. No worries about code or data sizes or where anything lives. It's all out there in ext RAM.
The Espruino JavaScript interpreter for example. It really needs a P2 for speed though.
It does seem like giving up useful features to satisfy one odd case.
Noted. Thanks Heater!
As for the COGs in XMM mode, yes, I would be one of the persons that would experiment and probably use that setup. There have been a couple of instances where it would have been nice to use COGs in the XMM mode. Everybody keeps talking about the speed aspect, but since we do not have that as an option, at the moment, we will never know if it is of any use in an application.
Ray
at full speed (cogc driver for LED DMD 128x32, multiple channel wav player in LMM cog mode, driver for the shift registers).
I don't think the full program will fit into 32k HUB RAM, so XMM is needed, but I also think that all the data (especially the double-buffered dmd driver) will not fit into 32k HUB RAM, so I will need additional SRAM.
But that's just an assumption.
Christian
Which is cool. Until one day you find you need some funky new feature of 2.x.x but you can't have it because some other funky feature in 1.x.x that you built your world around is no longer there.
I think you got me wrong. I planned to have all buffers that need to be accessed very fast in HUB RAM, forced by the HUBDATA annotation. A COGC driver will read the dmd buffer and control the dmd. The same goes for the audio buffers.
The other data, variables and objects should be in SRAM, which is the default when using XMM-SPLIT.
I have hundreds of objects (lights, flashers, coils, switches) and various state machines, which are mostly singletons (living for the whole program lifecycle). My fear is that the buffers plus the normal data will not fit into HUB, hence using XMM-SPLIT, which puts all data into SRAM except the data annotated with HUBDATA.
That should work, shouldn't it?
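A minimal sketch of that layout, under some loudly stated assumptions: `HUBDATA` is taken to be the annotation supplied by the Propeller toolchain headers (the empty fallback below only lets the sketch compile on a host), the 128x32 DMD is assumed to be 1 bit per pixel, and `lamp_t`, `lamps`, `dmd_back_buffer`, `dmd_swap` and `lamp_set` are hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the Propeller toolchain defines HUBDATA as a section
 * attribute. This fallback makes it a no-op on other compilers. */
#ifndef HUBDATA
#define HUBDATA
#endif

#define DMD_WIDTH       128
#define DMD_HEIGHT      32
#define DMD_FRAME_BYTES (DMD_WIDTH * DMD_HEIGHT / 8) /* assumes 1 bit/pixel */

/* Fast-access data: double-buffered frame store, forced into hub RAM
 * so a cog driver could read it directly. */
HUBDATA static uint8_t dmd_frames[2][DMD_FRAME_BYTES];
HUBDATA static volatile int dmd_front = 0; /* index of the frame being shown */

/* Ordinary program state: unannotated, so under XMM-SPLIT it would
 * land in external SRAM by default. */
typedef struct { uint8_t id; uint8_t on; } lamp_t;
static lamp_t lamps[64];

/* Buffer the application may draw into while the other one is displayed. */
static uint8_t *dmd_back_buffer(void)
{
    return dmd_frames[1 - dmd_front];
}

/* Flip front and back once a frame has been fully drawn. */
static void dmd_swap(void)
{
    dmd_front = 1 - dmd_front;
}

static void lamp_set(size_t i, int on)
{
    if (i < sizeof lamps / sizeof lamps[0])
        lamps[i].on = (uint8_t)(on != 0);
}
```

With these assumptions the two frames cost 1KB of hub RAM in total, so the display buffers themselves are cheap. It is the hundreds of ordinary objects that would have to compete for the remaining hub space if they could not live in SRAM.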
But doesn't removing the XMM-SPLIT feature make the whole XMM feature useless? My experience is that the bigger the program is, the bigger the data will be. I'm no microcontroller developer at all (mainly Java/.NET), but in my experience the data tends to grow linearly with the code size. The more code, the more variables and objects you have (singletons, local variables, ...).
Providing XMM mode for code but restricting the data size to 32k may limit the code size in some way anyway...
It's not relevant to GCC, but I use Catalina XMM in a commercial product. It is a single cog and runs my RamBlade3 circuit (dedicated prop with 512KB parallel SRAM, no latches, and SD card). We hold a lot of data files on the SD card.
Hi Cluso
Catalina's SMALL memory model (XMM SRAM or XMM FLASH used for code, 32kb Hub RAM used for data and stack) and LARGE memory model (XMM SRAM or XMM FLASH used for code, XMM SRAM used for global data, 32Kb Hub RAM used for local variables and stack) will remain supported on all the XMM boards supported by Catalina.
FYI, I believe Catalina's LARGE mode corresponds to propgcc's XMM-SINGLE mode, Catalina's LARGE mode plus FLASH corresponds to propgcc's XMM-SPLIT mode, and Catalina's SMALL mode corresponds to XMMC.
Ross.
Ross - good news re/ external memory data remaining in Catalina!
Parallax education wants multiple-xmm cog C function capability, and they are getting it. It is up to Parallax to choose whether or not they can support that and XMM SINGLE/SPLIT modes simultaneously. Since Parallax does not promote C with their commercial customers (as we have learned recently), I don't see how they could possibly support more complicated C memory modes. Parallax education is better equipped than the commercial efforts because Andy and Jeff are competent C programmers.
I like XMM-SINGLE model because it allows everything to live in an SRAM, but it does suffer a performance hit. XMM-SPLIT mode has been useful for experiments like running the Javascript interpreter. Whether any of those is really a valid usage of Propeller is partially the subject of this thread. I'm not really a fan of changing the existing behaviour of a program, but we are trying to adapt. One thing we need to consider going forward is what X modes if any will be used for P2 - most likely something will be supported by someone, because it's just our nature.
At this point I'm guessing that it should be easy enough to allow either multi-cog C function XMMC that works the same as CMM or LMM mode (a Parallax requirement) or a separate non multi-cog C function mode of operation for XMM SINGLE/SPLIT (same as today). There will just be a few more knobs to turn.
Cached XMMC is the fastest and most useful XMM technology. It happens to also allow, for example, keeping large chunks of time critical code in HUB memory when performance counts - a wonderful feature only possible with PropellerGCC. Of course any code can be overlaid into a COG - something only possible with the C compilers by design. I think the things we ALL offer are useful (some are more useful than others of course).
It's not really relevant to this topic, but I should point out that this is not true as a blanket statement: many of us find uncached parallel XMM much more useful. It is as fast as cached serial XMM (faster in some cases) and it doesn't consume any additional cogs or Hub RAM.
Ross.
Transfreak, Sort of true for all those little "book keeping" variables programs have. Many times of course we would expect the data to be far bigger than the code. Think graphics programs or in-memory databases or ... see below:
Guys,
Removing those external memory modes means no longer being able to run wonderful things like this:
http://www.megalith.co.uk/8086tiny/
An x86 / IBM PC emulator and MSDOS !
Which is of course essential for the future prosperity of Parallax.
8086tiny might be one of those cases where we would like to see all the code in HUB (the emulator) and all the data in ext RAM (the x86 memory space and disk image). Assuming the compiled binary size can be kept down. It's only 20KB as an Intel executable.
After all, with pointers and such, C is the perfect language to get beginners into trouble already.