Simplifying "X" memory models

Jeff Martin · 2014-01-27 10:11

Hi,

Recent events and internal discussions have led us to the decision to de-support and remove the XMM-SINGLE and XMM-SPLIT memory models in favor of the XMMC memory model in newer versions of Propeller GCC.

As such, we need to ask: Is anyone using either the XMM-SINGLE or XMM-SPLIT memory models in active projects for a specific reason that XMMC will not satisfy or improve upon?

Thank you.

Bill Henning · 2014-01-27 18:18

Jeff,

XMMC limits global data, arrays, stack and heap to whatever memory is available in the hub, so eliminating XMM-SINGLE & XMM-SPLIT means that PropGCC will be limited to less than 24KB data/heap (at least 8KB is used for the code cache).

I am personally not using XMM-SINGLE or XMM-SPLIT, however I thought I'd point out the limitation eliminating them will impose.

Bill

Jeff Martin wrote: »

Hi,

Recent events and internal discussions have led us to the decision to de-support and remove the XMM-SINGLE and XMM-SPLIT memory models in favor of the XMMC memory model in newer versions of Propeller GCC.

As such, we need to ask: Is anyone using either the XMM-SINGLE or XMM-SPLIT memory models in active projects for a specific reason that XMMC will not satisfy or improve upon?

Thank you.

David Betz · 2014-01-27 19:44

Bill Henning wrote: »

Jeff,

XMMC limits global data, arrays, stack and heap to whatever memory is available in the hub, so eliminating XMM-SINGLE & XMM-SPLIT means that PropGCC will be limited to less than 24KB data/heap (at least 8KB is used for the code cache).

I am personally not using XMM-SINGLE or XMM-SPLIT, however I thought I'd point out the limitation eliminating them will impose.

Bill

Yes, that is basically correct. I believe that jazzed has experimented with smaller caches, including as small as 2k, with decent performance for some programs, so it might be possible to have up to 30k of data space. Another way of looking at this is that XMMC provides the extra code space that many other MCUs have internally but also provides more RAM than most of them have. XMMC certainly doesn't make a Propeller into a desktop PC like XMM-SINGLE could with a 32MB SDRAM module connected, but it does bring P1 into the league of other similar MCUs with only the addition of a single flash chip.
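The arithmetic behind the 24KB and 30KB figures above can be sketched in a few lines of C. This is only a rough budget, assuming the 32KB P1 hub and nothing else: the function name `hub_data_budget` is made up for illustration, and the real number will be lower because the kernel, stack and library data also live in the hub.

```c
/* Total hub RAM on the Propeller 1. */
#define HUB_RAM_BYTES (32u * 1024u)

/* Hypothetical helper: hub RAM left for data, stack and heap once the
 * XMMC code cache has been carved out. It deliberately ignores every
 * other consumer of hub RAM, so treat the result as an upper bound. */
unsigned hub_data_budget(unsigned cache_bytes)
{
    if (cache_bytes >= HUB_RAM_BYTES)
        return 0u;                      /* the cache ate the whole hub */
    return HUB_RAM_BYTES - cache_bytes;
}
```

With an 8KB cache this gives 24KB, and with the 2KB cache mentioned above it gives 30KB, matching the numbers quoted in the posts.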

SRLM · 2014-01-28 03:43

I've never had a need for XMM(anything) mode...

Heater. · 2014-01-28 05:12

What is the motivation for wanting to remove features? They are there and they work, I guess.

Heater. · 2014-01-28 05:40

Also, the P2 is to be shipped on a board with 32MB RAM if I understand the plan correctly.
P2 external RAM access has been optimized in some way has it not?

I imagine these modes will be a lot more useful on the P2, if not essential, to make use of all that ext RAM space in any sensible way. (Would make a nice RAM disk I guess).

Looks like we need a mode for the P2 which is C code in HUB, executing super quick due to the new hubexec mode, and stack/data in external RAM.

mindrobots · 2014-01-28 05:57

Heater. wrote: »

Also, the P2 is to be shipped on a board with 32MB RAM if I understand the plan correctly.
P2 external RAM access has been optimized in some way has it not?

I imagine these modes will be a lot more useful on the P2, if not essential, to make use of all that ext RAM space in any sensible way. (Would make a nice RAM disk I guess).

Looks like we need a mode for the P2 which is C code in HUB, executing super quick due to the new hubexec mode, and stack/data in external RAM.

Now you guys did it! With all of yesterday's talk about C and Spin and non-professional P1 and professional P2s and C being a professional language and lacking Spin features in SimpleIDE, blah, blah, blah...they are going to pull X memory models from SimpleIDE and put them into ProIDE which you can purchase to support the "professional" P2 features. The money they raise with this will be used to add the requested Spin features to SimpleIDE. Then everyone will be happy - those professionals wanting a full-featured C development environment can pay for that and the unprofessional Spin programmers among us can use the free tool! Just like all the other vendors!!!

(none of this is serious speculation, of course!)

Looks like we need a mode for the P2 which is C code in HUB, executing super quick due to the new hubexec mode, and stack/data in external RAM.

Cool!! Or put the Forth kernel and core words in HubExec mode, put the user dictionary in external RAM and use each COG's internal memory for really fast stack space!

You probably already considered that option!

Heater. · 2014-01-28 06:11

mindrobots,

Or put the Forth kernel....You probably already considered that option!

I've also considered poking myself in the eye with a hot soldering iron from time to time.

Heater. · 2014-01-28 06:20

A little challenge for the propgcc guys:

The P2 can now seamlessly glide in and out of executing code in COG or code in HUB. You just have to jump in and out of the HUB address range and there you are.

This leads to the idea of XMM-TURBO mode:

In XMM-TURBO mode you can compile a huge C program, some parts of which live in HUB and big parts of which live in external memory. Code is compiled such that calls to external memory functions fire up whatever kernel is required to do that fetch/execute work. When leaving external memory code, things just drop back to executing from HUB. All done seamlessly by calling and returning.

With XMM-TURBO mode we can locate code that needs the speed in HUB, all else in ext RAM.

For bonus points combine that with the FCACHE idea so as to push execution rate to 11.

Jeff Martin · 2014-01-28 07:15

mindrobots wrote: »

Now you guys did it! With all of yesterday's talk about C and Spin and non-professional P1 and professional P2s and C being a professional language and lacking Spin features in SimpleIDE, blah, blah, blah...they are going to pull X memory models from SimpleIDE and put them into ProIDE which you can purchase to support the "professional" P2 features. The money they raise with this will be used to add the requested Spin features to SimpleIDE. Then everyone will be happy - those professionals wanting a full-featured C development environment can pay for that and the unprofessional Spin programmers among us can use the free tool! Just like all the other vendors!!!

(none of this is serious speculation, of course!)

Ha! You're funny; thanks for sharing. :-)

Jeff Martin · 2014-01-28 07:24

Heater. wrote: »

What is the motivation for wanting to remove features? They are there and they work, I guess.

We've enhanced XMMC to allow multiple XMMC cogs to execute in an application. This was done in order to give a developer a path to continue on (with understandable limitations that come with it, of course) if he/she had first been developing a multi-cog application in a lower memory model and then ran out of memory. The XMM-SINGLE and XMM-SPLIT modes, however, are not as compatible with this feature, leaving the developer open to dangerous pitfalls; this is a support issue.

Heater. · 2014-01-28 08:00

That sounds a bit nuts. If I may be so bold.

Has anybody actually requested or used the capability to run two or more COGs from code in external memory?

My gut tells me that it would be so slow as to be totally pointless. Might as well combine whatever functionality those two tasks do into one on a single COG.

It sounds like giving up the possibility to run huge code (XMM-SINGLE, XMM-SPLIT) in order to do something that no one will ever use.

Or is there more to this than I can see?

Jeff Martin · 2014-01-28 08:34

Heater. wrote: »

That sounds a bit nuts. If I may be so bold.

Has anybody actually requested or used the capability to run two or more COGs from code in external memory?

My gut tells me that it would be so slow as to be totally pointless. Might as well combine whatever functionality those two tasks do into one on a single COG.

It sounds like giving up the possibility to run huge code (XMM-SINGLE, XMM-SPLIT) in order to do something that no one will ever use.

Or is there more to this than I can see?

Understood. It was requested and it was pointed out that an existing "lmm" multicog development effort in a lesser memory model suddenly had no opportunity to expand to larger memory short of a complete rewrite.

Each greater model comes with caveats- speed being one of them. What's practical for a particular application has to be decided by the designer involved, of course, but having a sudden imposed road-block of "yes, you can switch to XMM, but only one of your LMM/CMM cogs can come along" seemed to be too unreasonable.

So far, no one in this thread has said that XMM-SINGLE/XMM-SPLIT is critical to their application's success.

Heater. · 2014-01-28 08:50

I have no essential need for any XMM so I should not shout one way or the other.

But it has been fun from time to time to compile some big old program and try it out on the Prop with the GadgetGangster 32MB card. No worries about code or data sizes or where anything lives. It's all out there in ext RAM.

The Espruino JavaScript interpreter for example. It really needs a P2 for speed though.

It does seem like giving up useful features to satisfy one odd case.

Jeff Martin · 2014-01-28 08:55

Heater. wrote: »

I have no essential need for any XMM so I should not shout one way or the other.

But it has been fun from time to time to compile some big old program and try it out on the Prop with the GadgetGangster 32MB card. No worries about code or data sizes or where anything lives. It's all out there in ext RAM.

The Espruino JavaScript interpreter for example. It really needs a P2 for speed though.

It does seem like giving up useful features to satisfy one odd case.

Noted. Thanks Heater!

Rsadeika · 2014-01-28 09:09

Last week when I was using the C3, I tried using XMM-SINGLE/XMM-SPLIT modes, but I ran into some problems. In one case my test program would run in the C3F XMMC combination, but when I tried to run it in the other modes, it would not work correctly. It just might be that my programming skills are not up to par and that I really do not understand how to use those modes. If you get rid of those modes, what would be a compelling reason to buy more C3 boards? Unless, of course, you are discontinuing the C3 and supporting just the Activity and PropBOE boards.

As for the COGs in XMM mode, yes, I would be one of the persons that would experiment and probably use that setup. There have been a couple of instances where it would have been nice to use COGs in the XMM mode. Everybody keeps talking about the speed aspect, but since we do not have that as an option, at the moment, we will never know if it is of any use in an application.

Ray

jazzed · 2014-01-28 09:20

The propellergcc release_1_0 branch will always have the current features, and will also remain in the current repository. It will become inactive however when we move to release_2_0.

trancefreak · 2014-01-28 12:19

I might need multiple XMM cogs with additional SRAM for data. I'm developing a complete pinball application with some tasks which don't have to run at full speed (so maybe XMM cogs) and tasks which need to run at full speed (cogc driver for LED DMD 128x32, multiple channel wav player in LMM cog mode, driver for the shift registers).

I don't think that the full program will fit into 32k HUB ram, so XMM is needed, but I also think that all the data (especially double buffered dmd driver) will not fit into 32k HUB ram, so I will need additional SRAM. But that's just an assumption.

Christian

jazzed · 2014-01-28 12:33

trancefreak wrote: »

I don't think that the full program will fit into 32k HUB ram, so XMM is needed, but I also think that all the data (especially double buffered dmd driver) will not fit into 32k HUB ram, so I will need additional SRAM.

It's not exactly clear that you would need XMM SINGLE/SPLIT pointing to the buffer ram. By managing the buffer with your own driver and API the performance would be much better. PropellerGCC XMM SINGLE/SPLIT models are typically beneficial for large programs with global data chunks declared all over the memory map.

Heater. · 2014-01-28 13:37

jazzed,

The propellergcc release_1_0 branch will always have the current features, and will also remain in the current repository. It will become inactive however when we move to release_2_0.

Which is cool. Until one day you find you need some funky new feature of 2.x.x but you can't have it because some other funky feature in 1.x.x that you built your world around is no longer there.

David Betz · 2014-01-28 14:34

trancefreak wrote: »

I might need multiple XMM cogs with additional SRAM for data. I'm developing a complete pinball application with some tasks which don't have to run at full speed (so maybe XMM cogs) and tasks which need to run at full speed (cogc driver for LED DMD 128x32, multiple channel wav player in LMM cog mode, driver for the shift registers).

I don't think that the full program will fit into 32k HUB ram, so XMM is needed, but I also think that all the data (especially double buffered dmd driver) will not fit into 32k HUB ram, so I will need additional SRAM. But that's just an assumption.

Christian

Unfortunately, neither XMM-SINGLE nor XMM-SPLIT will help you if you need to use the external memory for drivers. Only the propgcc cache driver will be able to access the external memory. You won't be able to use it for video buffers or with other COGs.

David Betz · 2014-01-28 14:40

Heater. wrote: »

What is the motivation for wanting to remove features? They are there and they work, I guess.

The XMM modes that put data in external memory are not very compatible with running multiple XMM COGs. I guess we could modify the library so that it would refuse to start a second XMM COG in the -SPLIT or -SINGLE modes. This is really a support issue more than anything else. If we keep these XMM data modes we have to be prepared to explain to people what combinations of things work in what modes. While -SPLIT and -SINGLE modes will work with multiple XMM COGs, they introduce cache coherency problems that are very difficult to manage and that may limit their usefulness. I guess this boils down to whether we think we should be trying to protect the user by eliminating options that we think might get him/her into trouble.
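To make the cache coherency concern concrete, here is a toy model in plain C. It is not propgcc's cache driver: it simply assumes each XMM COG keeps its own one-entry write-back cache in front of a shared external RAM, and all the names (`cache_t`, `stale_read_demo`, `synced_read_demo`) are invented for the illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define EXT_WORDS 16

static uint32_t ext_mem[EXT_WORDS];      /* simulated external RAM */

/* One-entry write-back cache; one instance per simulated XMM COG. */
typedef struct {
    bool     valid;
    bool     dirty;
    unsigned addr;
    uint32_t value;
} cache_t;

/* Write a dirty entry back to external RAM. */
static void cache_flush(cache_t *c)
{
    if (c->valid && c->dirty)
        ext_mem[c->addr] = c->value;
    c->dirty = false;
}

/* Make sure the cache holds addr, evicting the old entry if needed. */
static void cache_fill(cache_t *c, unsigned addr)
{
    if (c->valid && c->addr == addr)
        return;                          /* hit: keep the cached copy */
    cache_flush(c);
    c->addr  = addr;
    c->value = ext_mem[addr];
    c->valid = true;
}

static uint32_t cache_read(cache_t *c, unsigned addr)
{
    cache_fill(c, addr);
    return c->value;
}

static void cache_write(cache_t *c, unsigned addr, uint32_t v)
{
    cache_fill(c, addr);
    c->value = v;
    c->dirty = true;                     /* external RAM is now out of date */
}

/* COG B reads a variable, COG A updates it, COG B reads it again. */
uint32_t stale_read_demo(void)
{
    cache_t a = {0}, b = {0};
    ext_mem[3] = 1;
    (void)cache_read(&b, 3);             /* B caches the old value */
    cache_write(&a, 3, 42);              /* A's update stays in A's cache */
    return cache_read(&b, 3);            /* B still sees the old value */
}

/* Same sequence, but with an explicit flush and invalidate in between. */
uint32_t synced_read_demo(void)
{
    cache_t a = {0}, b = {0};
    ext_mem[3] = 1;
    (void)cache_read(&b, 3);
    cache_write(&a, 3, 42);
    cache_flush(&a);                     /* push A's update to external RAM */
    b.valid = false;                     /* drop B's stale copy */
    return cache_read(&b, 3);            /* B now sees A's update */
}
```

In this model the second COG keeps reading the old value until somebody flushes one cache and invalidates the other, which is the kind of bookkeeping a user would have to get right for every shared variable held in external data memory.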

trancefreak · 2014-01-28 15:40

David Betz wrote: »

Unfortunately, neither XMM-SINGLE nor XMM-SPLIT will help you if you need to use the external memory for drivers. Only the propgcc cache driver will be able to access the external memory. You won't be able to use it for video buffers or with other COGs.

I think you got me wrong. I planned to have all buffers which need to be accessed very fast in HUB RAM, forced by the HUBDATA annotation. A COGC driver will read the dmd buffer and control the dmd. The same goes for audio buffers. The other data, variables and objects should be in SRAM, which is the default when using XMM-SPLIT.

I have hundreds of objects (lights, flashers, coils, switches) and various state machines, which mostly are singletons (which live for the whole program lifecycle). My fear is that the buffers plus the normal data will not fit into HUB, hence XMM-SPLIT, which puts all data into SRAM except the data annotated with HUBDATA.

That should work, shouldn't it?

But doesn't removing the XMM-SPLIT feature make the whole XMM feature useless? My experience is that the bigger the program is, the bigger the data will be. I'm absolutely no microcontroller developer, mainly Java/.NET, but in my experience the data may grow linearly with the code size. The more code, the more variables and objects you have (singletons, local variables, ...).

Providing XMM mode for code but restricting the data size to 32k may limit the code size in some way anyways...
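The layout described in this post might look roughly like the sketch below. It assumes the `HUBDATA` macro comes from propgcc's `<propeller.h>` and that the compiler defines `__PROPELLER__`; the empty fallback only exists so the sketch also builds on a desktop compiler. The 1-bit-per-pixel frame size, the `lamp_t` type and the helper names are illustrative guesses, not taken from the actual pinball project.

```c
#include <stdint.h>

#ifdef __PROPELLER__
#include <propeller.h>      /* assumed to provide HUBDATA on propgcc */
#endif
#ifndef HUBDATA
#define HUBDATA             /* host fallback: no special section */
#endif

#define DMD_WIDTH       128
#define DMD_HEIGHT      32
/* Assumption: 1 bit per pixel, so one frame is 512 bytes. */
#define DMD_FRAME_BYTES (DMD_WIDTH * DMD_HEIGHT / 8)

/* Double-buffered frame store pinned to hub RAM so a COGC driver can
 * read it directly, whatever memory model the rest of the data uses. */
HUBDATA uint8_t dmd_frames[2][DMD_FRAME_BYTES];
HUBDATA volatile int dmd_front = 0;     /* index of the frame being shown */

/* Ordinary game state with no annotation: under XMM-SPLIT this is the
 * kind of data that would land in external SRAM by default. */
typedef struct {
    uint8_t id;
    uint8_t state;
} lamp_t;

lamp_t lamps[200];

/* Frame the application may draw into while the driver shows the other. */
uint8_t *dmd_back_buffer(void)
{
    return dmd_frames[dmd_front ^ 1];
}

/* Swap front and back frames once drawing is finished. */
void dmd_swap(void)
{
    dmd_front ^= 1;
}
```

Under these assumptions the two frames cost 1KB of hub RAM, so the display buffers are a small part of the budget; it is the unannotated bulk data that decides whether everything still fits once external data memory is gone.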

Cluso99 · 2014-01-28 16:33

Jeff,
It's not relevant to GCC, but I use Catalina XMM in a commercial product. It is a single cog and runs my RamBlade3 circuit (dedicated prop with 512KB parallel SRAM, no latches, and SD card). We hold a lot of data files on the SD card.

RossH · 2014-01-28 17:01

Cluso99 wrote: »

Jeff,
It's not relevant to GCC, but I use Catalina XMM in a commercial product. It is a single cog and runs my RamBlade3 circuit (dedicated prop with 512KB parallel SRAM, no latches, and SD card). We hold a lot of data files on the SD card.

Hi Cluso

Catalina's SMALL memory model (XMM SRAM or XMM FLASH used for code, 32KB Hub RAM used for data and stack) and LARGE memory model (XMM SRAM or XMM FLASH used for code, XMM SRAM used for global data, 32KB Hub RAM used for local variables and stack) will remain supported on all the XMM boards supported by Catalina.

FYI, I believe Catalina's LARGE mode corresponds to propgcc's XMM-SINGLE mode, Catalina's LARGE mode plus FLASH corresponds to propgcc's XMM-SPLIT mode, and Catalina's SMALL mode corresponds to XMMC.

Ross.

Bill Henning · 2014-01-28 17:39

I think it is a very bad move to remove XMM-SPLIT and XMM-SINGLE from propgcc, it will significantly reduce its usefulness.

Ross - good news re/ external memory data remaining in Catalina!

jazzed · 2014-01-28 19:00

I'd like to say that I don't really care one way or another how this works out other than Parallax will get what they want. I personally believe that Parallax doesn't care much for the XMM modes because the P8X32A Propeller is not supposed to be anything more than a deterministic and flexible micro-controller.

Parallax education wants multiple-xmm cog C function capability, and they are getting it. It is up to Parallax to choose whether or not they can support that and XMM SINGLE/SPLIT modes simultaneously. Since Parallax does not promote C with their commercial customers (as we have learned recently), I don't see how they could possibly support more complicated C memory modes. Parallax education's abilities are better equipped than the commercial efforts because Andy and Jeff are competent C programmers.

I like the XMM-SINGLE model because it allows everything to live in an SRAM, but it does suffer a performance hit. XMM-SPLIT mode has been useful for experiments like running the JavaScript interpreter. Whether any of those is really a valid usage of Propeller is partially the subject of this thread. I'm not really a fan of changing the existing behaviour of a program, but we are trying to adapt. One thing we need to consider going forward is what X modes, if any, will be used for P2 - most likely something will be supported by someone, because it's just our nature.

At this point I'm guessing that it should be easy enough to allow either multi-cog C function XMMC that works the same as CMM or LMM mode (a Parallax requirement) or a separate non multi-cog C function mode of operation for XMM SINGLE/SPLIT (same as today). There will just be a few more knobs to turn.

Cached XMMC is the fastest and most useful XMM technology. It happens to also allow, for example, keeping large chunks of time critical code in HUB memory when performance counts - a wonderful feature only possible with PropellerGCC. Of course any code can be overlaid into a COG - something only possible with the C compilers by design. I think the things we ALL offer are useful (some are more useful than others of course).
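As a rough illustration of keeping time-critical code in HUB memory under XMMC, the sketch below marks one function with `HUBTEXT` and leaves another in the default code section. It assumes `HUBTEXT` is supplied by propgcc's `<propeller.h>` and that `__PROPELLER__` is defined by the compiler; the fallback macro is only there so the code builds elsewhere, and both function names are made up.

```c
#include <stdint.h>

#ifdef __PROPELLER__
#include <propeller.h>      /* assumed to provide HUBTEXT on propgcc */
#endif
#ifndef HUBTEXT
#define HUBTEXT             /* host fallback: no special section */
#endif

/* Time-critical inner loop: placed in hub RAM so that running it does
 * not depend on fetching code through the external-memory cache. */
HUBTEXT uint32_t checksum(const uint8_t *p, unsigned n)
{
    uint32_t sum = 0;
    while (n--)
        sum += *p++;
    return sum;
}

/* Not time-critical: left in the default code section, which under
 * XMMC would be the external flash or SRAM. */
uint32_t checksum_with_seed(const uint8_t *p, unsigned n, uint32_t seed)
{
    return seed + checksum(p, n);
}
```

The trade-off is that every function moved into the hub this way uses up some of the same 32KB that XMMC also needs for data, stack and the code cache.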

RossH · 2014-01-28 19:37

jazzed wrote: »

Cached XMMC is the fastest and most useful XMM technology.

It's not really relevant to this topic, but I should point out that this is not true as a blanket statement - many of us find uncached parallel XMM much more useful. It is as fast as (in some cases faster than) cached serial XMM, and it doesn't consume any additional cogs or Hub RAM.

Ross.

Heater. · 2014-01-28 20:09

David,

I guess this boils down to whether we think we should be trying to protect the user by eliminating options that we think might get him/her into trouble.

Do what? This is a C compiler right?

trancefreak,

...that the data may grow linearly with the code size...

Sort of true for all those little "book keeping" variables programs have. Many times of course we would expect the data to be far bigger than the code. Think graphics programs or in-memory databases or ... see below:

Guys,

Removing those external memory modes means no longer being able to run wonderful things like this:

http://www.megalith.co.uk/8086tiny/

An x86 / IBM PC emulator and MSDOS !

Which is of course essential for the future prosperity of Parallax.

8086tiny might be one of those cases where we would like to see all the code in HUB (the emulator) and all the data in ext RAM (The x86 memory space and disk image). Assuming the compiled binary size can be kept down. It's only 20KB as an Intel executable.

jazzed · 2014-01-28 20:33

RossH wrote: »

It's not really relevant to this topic, but I should point out that this is not true as a blanket statement - many of us find uncached parallel XMM much more useful. It is as fast as (in some cases faster than) cached serial XMM, and it doesn't consume any additional cogs or Hub RAM.

Ross.

Chip does want to sell more propellers, so I guess that is useful too. Performance (and COG use) is more murky than usefulness. A board solution that only requires one Propeller is more useful to most of us because that's all Parallax sells.

David Betz · 2014-01-28 20:52

Heater. wrote: »

David,

Do what? This is a C compiler right?

Good point! :-)
After all, with pointers and such, C is the perfect language to get beginners into trouble already.
