LMM-Thumb

ImageCraft · 2008-11-11 22:17

OK, all ye smart Propeller people. As you may know, we are prototyping some ideas for XMM-C (external memory model) for the Propeller (a plug-in module with 128K bytes to multi-megs of memory). The compiler change for that is actually minimal, as in close to zero. The trick is the hardware design.

In any case, I am also thinking of going in the other direction: the C compiler emitting a compact form of LMM. There were some ideas about an LMM "Thumb" mode being bandied about before. The goal is of course to make the resulting program more compact, at the cost of higher interpretation overhead. As we have a 5x-10x speed improvement over Spin (and that is not counting using FCACHE), if we decrease the code size with a proportional slowdown, it may still be very attractive to some users. The normal LCC mode is of course still available.
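To make the size-versus-speed trade-off concrete, here is a minimal sketch in C of what interpreting a compact 16-bit form could look like. The encoding (4-bit opcode, 4-bit destination register, 8-bit operand), the opcode names and `thumb_run` are all hypothetical and invented for illustration; this is not ImageCraft's design and not real PASM. The point is only that each instruction takes half the space of a 32-bit long, while every instruction pays for extra shifting and masking before it can execute.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-bit encoding (illustration only, not ImageCraft's format):
 *   bits 15..12 opcode, bits 11..8 destination register, bits 7..0 operand. */
enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_SUB = 3 };

/* Interprets n halfwords and returns register 0 when it hits OP_HALT,
 * an unknown opcode, or the end of the program. */
static int32_t thumb_run(const uint16_t *code, size_t n)
{
    int32_t reg[16] = {0};

    assert(code != NULL || n == 0);
    for (size_t pc = 0; pc < n; pc++) {
        uint16_t insn = code[pc];
        unsigned op  = insn >> 12;         /* decode step: the extra cost */
        unsigned rd  = (insn >> 8) & 0xF;  /* that a full 32-bit LMM      */
        unsigned arg = insn & 0xFF;        /* instruction does not pay    */

        switch (op) {
        case OP_LOADI: reg[rd] = (int32_t)arg;     break; /* rd = imm8 */
        case OP_ADD:   reg[rd] += reg[arg & 0xF];  break; /* rd += rs  */
        case OP_SUB:   reg[rd] -= reg[arg & 0xF];  break; /* rd -= rs  */
        default:       return reg[0];              /* HALT or unknown  */
        }
    }
    return reg[0];
}
```

With this toy encoding, the four-instruction program `{0x1005, 0x1107, 0x2001, 0x0000}` (load 5, load 7, add, halt) occupies 8 bytes, where four 32-bit instructions would occupy 16.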

In fact, it's probably the case that the NC (non-commercial use) version only supports LMM-Thumb, and the STD version supports both.

Just thinking aloud right now. Appreciate any comments and feedback, and also any technical insight on the LMM-Thumb design.

// richard

Bill Henning · 2008-11-11 23:46

Why don't you target the Spin bytecodes? That would potentially allow mixed C/Spin code...

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com - a new blog about microcontrollers

ImageCraft · 2008-11-12 02:26

Is there a comprehensive list of the Spin bytecodes?

Peter Verkaik · 2008-11-12 08:56

Richard,
Would this external memory be addressed the way the larger internal memory will be when Prop II
comes out, or is it more like using an SD card for memory?

Also, is there a way to generate an array of 512 longs that can be used
to run a cog without LMM? I believe constructing such an array is one of the
ways to run independent cog code. Help file quote:

Another method of specifying Cog code is by using a table of hex code, e.g. 

long VGA_Text_Array[] = { 0xa0fda84f, 0xa0fdb008, 0x5cbdaad4, 0xe4fdb002, 0xa0bdcce7, 0xa0fdca00, 0x627dd804, 0x083d9bf0, 0xec7dba0c, 0xa0bdb0dd, 0x54fc8580, 0x5cfc9c3c, ... 
This is typically gotten by dumping out the content of a native .binary file, for example,
a driver object in the Spin Object Exchange forum. In this case, you just need to cast the argument to cognew_native: id = cognew_native((void (*)())VGA_Text_Array, (void*)&gVgaText);
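As a rough illustration of the "dumping out the content" step in the quote above, the following C sketch turns raw bytes into the comma-separated hex constants of such a table. `format_longs` is a hypothetical helper, not part of ICC. It assumes the bytes are little-endian longs (as on the Propeller) and that the caller has already located the PASM image inside the .binary file; finding that offset is not handled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats little-endian bytes as 32-bit hex constants
 * suitable for pasting into a C array initializer. Trailing bytes that do
 * not make up a whole long are ignored. Returns the number of longs written
 * to out; stops early if out is too small. */
static size_t format_longs(const uint8_t *bytes, size_t nbytes,
                           char *out, size_t outsz)
{
    size_t used = 0, count = 0;

    assert(bytes != NULL && out != NULL);
    if (outsz == 0)
        return 0;
    out[0] = '\0';

    for (size_t i = 0; i + 4 <= nbytes; i += 4) {
        /* Assemble one long, least significant byte first. */
        uint32_t w = (uint32_t)bytes[i]
                   | (uint32_t)bytes[i + 1] << 8
                   | (uint32_t)bytes[i + 2] << 16
                   | (uint32_t)bytes[i + 3] << 24;
        int n = snprintf(out + used, outsz - used, "%s0x%08lx",
                         count ? ", " : "", (unsigned long)w);
        if (n < 0 || (size_t)n >= outsz - used) {
            out[used] = '\0';   /* drop the partially written entry */
            break;
        }
        used += (size_t)n;
        count++;
    }
    return count;
}
```

Feeding it the eight bytes `4f a8 fd a0 08 b0 fd a0` yields the text `0xa0fda84f, 0xa0fdb008`, which matches the first two entries of the table quoted above.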

Could ICC not generate this table from C code without the LMM parts? Seeing how much code
SX/B can generate in an SX18/28 page (512 words, comparable to 512 longs in a cog), that
would let us write cog code without using assembly or manually extracting hex code from a binary
image created by proptool (for which assembly is still necessary).

regards peter

ImageCraft · 2008-11-12 10:51

Peter, I think you are asking whether the C compiler can generate native PASM code without LMM, and the answer is currently no. Most things that you want to dedicate a Cog to running are probably too time critical or too Cog RAM limited to be written in C (I could be wrong). We may add that later on though. There are thousands of things that we could be working on, so we need to be careful in gauging which things are needed and wanted by most people.

Also, remember that a Cog's 512-word "page" is not quite the same as a code page in other micros. A Cog takes time to load its 512 words. It's not a paging scheme per se.
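For a rough sense of that load time, here is a back-of-the-envelope sketch in C. It assumes the cog receives one long per hub access window, that a window comes around every 16 system clocks, and that the system clock is 80 MHz; the function names are made up for this example, and the figures are estimates rather than measurements.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one long is transferred per 16-clock hub access window. */
#define CLOCKS_PER_HUB_WINDOW 16u

/* Estimated system clocks needed to fill a cog with `longs` longs. */
static uint32_t cog_load_clocks(uint32_t longs)
{
    return longs * CLOCKS_PER_HUB_WINDOW;
}

/* The same estimate in microseconds for a given system clock in Hz. */
static double cog_load_us(uint32_t longs, double clk_hz)
{
    assert(clk_hz > 0.0);
    return (double)cog_load_clocks(longs) * 1e6 / clk_hz;
}
```

Under those assumptions, `cog_load_clocks(512)` gives 8192 clocks and `cog_load_us(512, 80e6)` gives about 102 microseconds, which is why swapping a whole Cog image is far more expensive than switching a code page on other micros.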

***
As for XMM, I don't want to speculate too much on how things will look for Prop II, as I have no inside info on Prop II. However, I suspect the change from LMM to XMM is minimal, and the change from LMM/XMM to Prop II is also minimal, but that's just a guess.

Cluso99 · 2008-11-12 11:59

I am seeing an increase in performance of 25% in my ClusoInterpreter. There is some space left to unravel some bytecodes to increase speed, but I have no profiling to know where the speed increase would be the most beneficial. A much greater increase in performance is met in the mathematics section. I haven't released the code as it is not fully debugged and I have been distracted. I have also written a fast LMM style interpreter and overlay loader. My ClusoDebugger also uses an LMM model.

My work with overlays leads me to believe that they will be faster than LMM execution. I was achieving the sweet spot for loading, including variable length.

However, I don't know anything about your code and what could be achieved. Suffice to say that the code footprint stored externally will have more impact than anything else so this is where the work will need to be done.

So I guess what I am saying is
1. External code must be small (highest priority)
2. A mix of cog interpreter, overlay and LMM for the code to interpret the bytecode/wordcode or whatever.

What I see with the spin interpreter is that a lot of time is spent pushing and popping data. The architecture doesn't allow this to be circumvented, and pushing and popping is done to hub so that impacts performance immensely. (I am not criticising spin - we are just pushing the envelope).

Ron Sutcliffe · 2008-11-12 20:11

Having the option to compile code to native PASM without LMM is something I would like to see. I don’t see much value in considering extended memory hardware, when at some stage we are going to see Prop II.

The ability to use ICC to compile driver code could be used for both versions of the Prop in the future. How compact would ICC-compiled native PASM code be?

Ron

Mike Huselton · 2008-11-12 21:12

Cluso,

"I have also written a fast LMM style interpreter and overlay loader. My ClusoDebugger also uses an LMM model."

Have you released any copy or revision of this work on the forum?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH - Electronics: Engineer - Programming: Professional


Cluso99 · 2008-11-13 03:34

The ClusoDebugger is available here
PASM and SPIN debug with Zero Footprint http://forums.parallax.com/forums/default.aspx?f=25&m=290946&p=2

The Overlay loader is here
Assembly Oververlay Loader for Cog FAST (renamed & released) http://forums.parallax.com/forums/default.aspx?f=25&m=272823

I have not released the ClusoInterpreter yet :-( but here is some discussion
Spin Interpreter - Faster??? http://forums.parallax.com/forums/default.aspx?f=25&m=273607&p=1&ord=d

Mike Huselton · 2008-11-13 03:59

Thanks, Cluso. That is exactly what I needed.

