C on the Propeller

RossH · 2010-07-14 06:29

@all,

This is in response to heater's comments in OBC's thread http://forums.parallax.com/showthread.php?p=922187

I thought I'd better start a new thread rather than clog up OBC's one any further (which was really on another topic). Feel free to contribute to this thread instead. I apologize in advance for the length of this post.

Basically, heater thought he'd struck a nerve with me by saying C was not a suitable language for the Propeller. Naturally, I disagree. However, while he didn't strike a nerve, he did hit a slightly tender spot, which I now can't help but keep poking.

Heater's point was mainly that since LMM was not "native" then LMM-based languages wouldn't get used by industrial Propeller users. Well, let's consider this in more detail. On the Prop we basically have three choices of language type (plus some odd mixtures of these three):

- Assembly based languages (like PASM and AAC).
- LMM languages (like PropBASIC and C)
- Byte coded languages (like Zog and SPIN)

I accept that only assembly language can truly be considered "native" (this is true of all chips except for those occasional ill-fated ventures to implement high level languages directly in silicon - such as the various Java chips that come and go occasionally). However, I think LMM languages have a good claim to be considered a far more "native" choice on the Propeller than any byte coded language - including SPIN. In fact, given that LMM C is many times faster than any of the byte-coded languages (including SPIN), and many times less complex than PASM, LMM C is a far better choice for most industrial users than either a byte-coded language or an assembly language.

First of all, there are plenty of situations where assembly language is simply not a viable solution on any micro. Despite the Propeller having a very nice native instruction set, it is still well beyond the capabilities of most people to write significant sized applications in assembly language - and even if this could be done the decision is usually made not to do so because of the high costs associated with maintaining such programs. On the Propeller you also have the additional problem that assembly language programs must be segmented into chunks of 496 instructions or less. Also, the Propeller offers no built-in inter-cog communications, which means you have to do it all yourself - usually via the hub - which has an impact on speed which we will explore in more detail below.

First off, a bit more about LMM not being "native". LMM programs use only PASM instructions - they just run them in a kind of "big cog" environment to overcome the built-in PASM 512 cog address limit. Of course, some PASM instructions (like jumps) that have a built-in range of only those 512 addresses can only be used for local jumps, and have to be augmented by non-PASM equivalents for longer jumps. Aside from speed (which I'll discuss below), that's the main difference between LMM and PASM. But this parallels very closely what happens in many other micros in any case - they often have both short and long addressing modes. Should the use of long addressing modes on such micros be considered somehow "less native" than the use of short addressing modes? Of course not! So the difference between LMM and PASM basically comes down just to speed.
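The "big cog" idea can be sketched as a toy model in C. This is only an illustration under stated assumptions: the opcodes, the `TOY_OP` encoding and the function name are hypothetical, and the `switch` merely stands in for the cog executing the fetched long directly as a real PASM instruction (nothing is decoded in software in real LMM). What it does show is the fetch-from-hub loop and a long jump handled as a kernel primitive whose target sits in the following hub long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy opcodes; real LMM code consists of ordinary PASM longs. */
enum { TOY_ADD_IMM = 1, TOY_LONG_JUMP = 2, TOY_HALT = 3 };

/* Hypothetical encoding: opcode in the top 8 bits, immediate in the low 24. */
#define TOY_OP(op, imm) (((uint32_t)(op) << 24) | ((uint32_t)(imm) & 0xFFFFFFu))

/* Runs a toy program held in "hub" memory and returns the accumulator. */
uint32_t toy_lmm_run(const uint32_t *hub, size_t hub_longs)
{
    uint32_t acc = 0;  /* stands in for a cog register */
    size_t pc = 0;     /* LMM program counter, counted in longs here */

    for (;;) {
        assert(pc < hub_longs);
        uint32_t instr = hub[pc++];  /* fetch one long from hub, advance pc */
        uint32_t op = instr >> 24;
        uint32_t imm = instr & 0xFFFFFFu;

        switch (op) {
        case TOY_ADD_IMM:            /* an ordinary instruction: runs as-is */
            acc += imm;
            break;
        case TOY_LONG_JUMP:          /* kernel primitive for out-of-range jumps */
            assert(pc < hub_longs);
            pc = hub[pc];            /* target is stored in the following long */
            break;
        case TOY_HALT:
            return acc;
        default:
            assert(!"unknown toy opcode");
            return acc;
        }
    }
}
```

In this model only the long jump differs from "plain" execution, which is the short-versus-long addressing comparison made above.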

The difference between byte-coded languages and PASM is, of course, much more significant - with the same byte code being able to be executed equally well on just about any processor, and doing very little that is in any way Propeller-specific (except perhaps where the bytecodes parallel PASM instructions on a 1-for-1 basis, like the SPIN "cognew" etc). This is not anywhere near as "native" as LMM, where all LMM instructions ARE actual PASM instructions.

So - now some more about speed. Heater quite rightly said that people expect C to be fast, and of course LMM C is already on the order of 4 times faster than a byte-coded equivalent SPIN program - so far so good! But is this fast enough? While no-one expects C to be quite as fast as assembly, ideally it should probably come closer than that. So how fast could LMM be on the Propeller compared to PASM? Looking at the basic LMM loop, people (including me) often say that LMM PASM is always going to be something like between 4 times and 8 times slower than the equivalent PASM - but this is not really the case. This ratio only applies to purely internal cog operations. Anytime you need to access hub RAM, LMM typically takes two consecutive hub instructions while PASM takes one - so in the case of consecutive hub operations LMM may only be 2 times slower than PASM, not 4 or 8. Also, we should compare equivalent algorithms. Unless you are writing a device driver (which are usually still written in assembly even when the rest of the system is written in C), most programs will actually have significant hub RAM interactions - and in such cases (as we have seen) LMM may only be 2 times slower.

So what's the best possible outcome? Well, it's hard to say precisely - but with a very good code generator, and while executing an algorithm that requires access to significant hub RAM space (a matrix multiplication algorithm might be a good example), we should be able to get C to execute closer to 2 times slower than PASM than 4 or 8 times slower. Not too shabby! Even better, when some parts of the program (like device drivers or library functions) continue to be written in assembly language, the overall program execution time could in fact be less than two times that of the equivalent PASM program. This is certainly good enough to be seriously considered as a viable alternative to writing the whole application in assembly language - especially as users are likely to already have the algorithms they need fully implemented and debugged in C (e.g. a matrix maths library).
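The 2x-versus-4x argument can be put into a small back-of-envelope model. The clock costs below are assumptions chosen to match the reasoning above (an unrolled LMM loop that manages one instruction per 16-clock hub window, and back-to-back hub operations that each wait one window); they are not measurements, and `lmm_slowdown` is a hypothetical helper name.

```c
#include <assert.h>

/* Assumed clock costs per instruction (a simplified model, not measurements). */
#define PASM_COG_CLOCKS   4.0  /* ordinary cog instruction                   */
#define PASM_HUB_CLOCKS  16.0  /* hub instruction, one hub window            */
#define LMM_COG_CLOCKS   16.0  /* one hub window to fetch, then execute      */
#define LMM_HUB_CLOCKS   32.0  /* one window to fetch, one for the data      */

/* Ratio of LMM time to PASM time for a given fraction of hub instructions. */
double lmm_slowdown(double hub_fraction)
{
    assert(hub_fraction >= 0.0 && hub_fraction <= 1.0);

    double cog_fraction = 1.0 - hub_fraction;
    double pasm = hub_fraction * PASM_HUB_CLOCKS + cog_fraction * PASM_COG_CLOCKS;
    double lmm  = hub_fraction * LMM_HUB_CLOCKS  + cog_fraction * LMM_COG_CLOCKS;
    return lmm / pasm;
}
```

Under these assumptions a program with no hub accesses comes out 4 times slower, one made entirely of hub accesses 2 times slower, and an even mix about 2.4 times slower.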

Now a bit about space. An LMM program should occupy about the same amount of space as a PASM program that does the same job, but both of these will occupy more space than the equivalent byte-coded program. But how much more space? Again, the temptation is to say it takes at least four times as much (since each LMM instruction is one - or sometimes two - longs, whereas each byte code is a byte). But this is also not the case. Even byte coded languages have operands - so they are not all one byte instructions, they are often 2, 3, 4 or even 5 bytes. Also, we have to again compare equivalent programs, and consider that not all programs are code space anyway - most have a significant data component that is the same size in any language. Again, with a good code generator, and also a good choice of the things we implement as LMM "primitives", it should be possible to get LMM programs that are less than 4 times the size of the equivalent byte-coded programs. How much less? Again, difficult to say - but approaching something like 2 times as much space may be possible.
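The effect of the data component on the overall size ratio can be sketched the same way. All inputs here are hypothetical: `code_expansion` is whatever factor a given code generator achieves for code alone, and the data size is taken to be identical in both versions, as argued above.

```c
#include <assert.h>

/*
 * Whole-program size of an LMM build relative to a byte-coded build.
 * bytecode_code_bytes: code size of the byte-coded version
 * data_bytes:          data size, assumed the same in both versions
 * code_expansion:      assumed LMM-to-bytecode ratio for code alone
 */
double lmm_size_ratio(double bytecode_code_bytes, double data_bytes,
                      double code_expansion)
{
    assert(bytecode_code_bytes > 0.0);
    assert(data_bytes >= 0.0);
    assert(code_expansion >= 1.0);

    double bytecode_total = bytecode_code_bytes + data_bytes;
    double lmm_total = bytecode_code_bytes * code_expansion + data_bytes;
    return lmm_total / bytecode_total;
}
```

For example, with equal amounts of code and data and a code expansion of 3, the whole program is only 2 times larger; with no data at all the ratio is just the code expansion.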

Finally, does size really matter? You can buy "another manufacturer's" micro with as little as 0.5k of RAM - and people routinely program such micros in C, even when they only run at a few MHz. So even if programming the Propeller in LMM C takes between 2 and 4 times as much RAM as programming in byte coded C, and is between 2 and 4 times slower, we should still be able to accommodate programs written in C that would require between 8kb and 16kb in "another manufacturer's" chips - even if their assembly language were as efficient as a byte code (which is unlikely).

While I agree that LMM on the Prop I is not as fast as we might ideally like, in situations where PASM is ruled out (basically any application that can't trivially be segmented into a small number of chunks of 496 instructions or less), and SPIN is too slow - then LMM C is the logical and natural choice. Such a solution may only be twice as large as pure byte code, and may only be twice as slow as pure PASM. Pricing issues aside, if you think that a typical industrial micro user wouldn't think this was an entirely reasonable tradeoff, then I can only say that I am one, and I beg to differ!

Of course, price is always an issue that can't be ignored, and also more work needs to be done on the existing LMM C code generators to make them as efficient as possible, but I firmly believe that the problem that many people seem to perceive with C on the Propeller is actually a mindset problem, and not a technical problem at all.

In fact, given the severe size and complexity limitations of PASM, and the severe performance limitations of SPIN, I've come to think of LMM as the true "native" assembly language of the Prop, while PASM is more like a microcode that can be tweaked at will (anybody remember microcode?).

Ross.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina

Post Edited (RossH) : 7/14/2010 6:42:59 AM GMT

markaeric · 2010-07-14 07:20

I don't understand how C with LMM can't be considered a legitimate programming language option on the propeller. There are not many options when it comes to bypassing the 496-long limitation of cog ram, so a speed hit can in many situations be considered a fair trade off. Sure, C on the prop might not act as it does on other micros, but then again, the prop is significantly different from other micros anyways, so an apples to apples comparison will never be truly possible. Just because it requires some overhead to go beyond the cog memory limits doesn't mean it's somehow not C.

Don't all modern processors still contain microcode? I'd imagine that the start sequence code in ROM could be considered microcode of sorts. I however would consider LMM to be the microcode in your example Ross, rather than PASM.

RossH · 2010-07-14 07:28

@markaeric,

heater's point was less about whether it was real C and more about whether it was useful C.

My microcode analogy was more about the various LMM "primitives" that you choose to implement in your "big cog" LMM kernel. For example, the Catalina kernel implements a single LMM PASM instruction that can push up to 24 cog registers onto the stack (and another that can restore them all). I have rewritten that function several times to improve its speed - but provided I don't change the overall functionality this does not affect Catalina programs in any way, and I don't need to modify the compiler or recompile any programs (just relink them with the new kernel).

Ross.


markaeric · 2010-07-14 07:52

Ok, I see what you're saying.

I admit that I'm not much of a programmer. I tried to wet my beak in C several years back, but nothing ever came of it. With that said, I have not tried Catalina, and can't comment out of first (or even second) hand experience. If it's a full (or close to) implementation of C, what I said before still holds true. How could it not be useful? That some people are most comfortable in a C environment is enough proof that it can be considered useful. That's just my thought, for what it's worth.

heater · 2010-07-14 09:21

I should not really do this but I have to quote myself from the other thread to set the context:

heater said...

Oops Ross, I hit a nerve, sorry.

OK: "The Propeller architecture is not at all suited to C."

The way I see it is:
1) C compiled to native PASM is a no go. The instruction set does not suit it and there is no space in the COGs.

Well there is no 2). If you can't compile to the hardware then that's it. Game's over for C in the industrial sector.

As work arounds one can compile to byte codes or such. That's a no go as well as it is too slow. People accept that for Java; I don't know why not for C.

The Prop is special in that one can compile to "almost native" using LMM. But that still lacks the expected speed of C and has a large code size.

Now don't get me wrong, I think Catalina (and ICC) is brilliant. I would happily ignore 1) above and make the trade off of accepting LMM to get to work in C on the Prop. It's just that I'm not sure that industrial users are quite so willing.

On Prop II Catalina will be great. I think Parallax should provide it out of the box along with Spin.

My point was not about "real C and more about whether it was useful C" but more about the perceptions of C on the Prop as observed by those who are used to C on pretty much any other platform. Note my references to "industrial users".

I like your comparison of LMM with short vs long addressing modes on other chips. Basically you are making the case that the RDLONG in the LMM loop is like using a "FAR" pointer in good old segmented Intel code. Where the comparison breaks down is that those chips did not execute code directly from those FAR addresses.

I also like the idea of looking at LMM as the "microcode" of the machine. With that idea in place LMM is the native instruction set.

Which leads me to a bizarre thought:

Imagine that Chip was a bit more "old school" in his thinking when designing the Prop environment.
1) Imagine he wanted it to be programmable in C from the get go.
2) Imagine he thought of LMM, designed a suitable kernel and targeted the GNU C compiler at that kernel.
3) Imagine that LMM kernel was in ROM and started up at boot time instead of the current Spin interpreter.
4) Imagine the Prop tool provided that nice easy to use IDE for C and PASM coding and downloading.

In that scenario LMM would really be the native instruction set of the machine as defined by Parallax. C would have been used from the beginning.

This whole debate about acceptance or not of C on the Prop would never have arisen in the first place.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Cluso99 · 2010-07-14 09:55

RossH: I agree that C is viable on the prop.

Often we are comparing 20MHz micros to the prop. Now, in reality it is executing PASM instructions at 20MHz (80MHz / 4 clocks per instruction). The instructions on the prop are more powerful and I hope this translates into faster code for the prop. Now, in steps LMM to take that away, ok.

The next thing is that most micros that are being compared to the prop may have 32KB flash but usually only have 2-4KB ram. The prop until recently had a lot more RAM than RAM + FLASH/EEPROM on most other micros. This is a feature of the prop because the RAM can be used for both and can be loaded from an external EEPROM or microSD, so it can be more flexible. So, the EEPROM can be both a disadvantage and an advantage.

Another thing to be considered are the Objects running in other cogs. They are in fact Intelligent Peripherals and so execute code that would otherwise be running in the main code. So the main code does not need to handle interrupts which are a burden on any micro, not only in speed but also in complexity. So, in fact, the C code in LMM may not be that bad either.

Now perhaps getting a little off-topic, but as you know I did a faster interpreter. There is no reason that now this couldn't be taken further to expand on the work Dave & co have done with SpinLMM to improve the performance even further. I gained a significant increase in the maths routines once I obtained code space by doing the decoding via a hub table. I was on the way to unravelling some of the other code with the space I made free. Of course, to really make this beneficial, we need some profiling to see where the time delays are.

I would like to see two C variants... Yours and a simpler C that could be compiled to bytecode or LMM on the prop or PC. This would be for the hobbyists to simply program, much like they did on the Palm Pilot over 10 years ago (not the main C programs, just the toy C programs which were coded on the Pilot).

Of course, we are still looking for that killer app. It will most likely be one where the user does not know what micro is under the hood. At least while we are discussing these things, it may prompt someone to think of one.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Links to other interesting threads:

- Home of the MultiBladeProps: TriBlade, RamBlade, SixBlade, website
- Single Board Computer: 3 Propeller ICs and a TriBladeProp board (ZiCog Z80 Emulator)
- Prop Tools under Development or Completed (Index)
- Emulators: CPUs Z80 etc; Micros Altair etc; Terminals VT100 etc; (Index) ZiCog (Z80), MoCog (6809)
- Prop OS: SphinxOS, PropDos, PropCmd
- Search the Propeller forums (uses advanced Google search)
My cruising website is: www.bluemagic.biz MultiBlade Props: www.cluso.bluemagic.biz

RossH · 2010-07-14 10:06

@heater,

Sorry if I misrepresented your comments. I agree it is more of a perception problem - where we may differ is whether this can and should be overcome. I believe we should worry less about whether users may have some philosophical objection to LMM C vs PASM C (especially as I can't really see why anyone would), and instead just make available some more "industry friendly" tools and then promote the Prop by highlighting the special (and valuable) features that only the Prop can bring to the table (8 independent processors, each one capable of video, etc etc).

As to your "Imagine ..." list, I guess we'll only know should Chip choose to chime in here. However, I suspect that given his antipathy to C, then even if he had thought of LMM originally, he would still have come up with something pretty much like SPIN - but maybe he would have implemented SPIN (fully or partially) as an LMM language rather than as a byte coded one.

I also agree with everything you say about gcc, and that if gcc had been available when the Prop was released, we would not be having this discussion. However, I'm not particularly stuck on gcc per se (as some others are) because gcc is only useful if you actually need the gcc-specific extensions (e.g. if you want to run Linux). I don't especially want that, but I know others in these forums do - and I think I've said several times that someone somewhere is probably working on such a thing - and (if they're not) then maybe Parallax themselves should be doing so.

For my purposes, LCC (on which Catalina is based) is actually better, since it gives me many benefits, including the potential to mix SPIN and C, and also use most of the existing Parallax drivers - this would not be so easy with gcc, partly because with gcc you would have to adopt all the object and binary formats which all the existing tools expect.

Also, using LCC gives me a chance to have a play with various models for concurrent computing, which would be either much harder with gcc (being a much more complex compiler) or - if you wanted to maintain compatibility with Linux - not possible at all, since you would pretty much HAVE to implement threads as your concurrency model - a model that is not particularly well suited to the Prop!

Ross.


Christof Eb. · 2010-07-14 10:39

Hi Ross,
(I have to admit that I only tried once, very briefly, to get Catalina going on Windows, and it did not work immediately, so I did not try harder.)

What I really love with PropBasic is that you can write COG-code (with inline assembler code, if you want) and LMM-code within one single source file - for different COGS of course. So you do have the full choice within one language. Isn't this possible with C? So you can write a fast driver in PropBasic Cog code. And if this is still not fast enough, you have got the assembler output of the compiler as a starter for a true assembler driver to optimize.

I have not seen an LMM video driver for example, so there is an absolute need for cog code. Due to the limits of cog code, LMM is a very good work around. I have really never understood the bytecode approach, which burns all that performance. As the power consumption is proportional to clock frequency, you can say a compiler will give much more efficiency.

Best regards
Christof

RossH · 2010-07-14 10:40

@Cluso,

Yes, but my point in my original post was that when you consider a typical algorithm implemented in C (or any other high level language) which requires lots of hub ram interaction because it operates on a data space larger than a few longs, then you aren't going to get 20 mips (80/4) anyway - you will actually get somewhere between that and 4 mips (80/22) - and it will probably be somewhere between 4 and 8 mips - LMM or not!

Of course, with the Prop you have 8 cogs available, so with concurrent processing you still get at least 32 mips, which is one of the big advantages the Prop brings (provided you can take advantage of it!).
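The figures quoted here are simple divisions, and can be checked with a few lines of C. The helper names are hypothetical; the inputs (80 MHz clock, 4 clocks for a cog instruction, 22 clocks for a worst-case hub access, 8 cogs) are the ones used in this post.

```c
#include <assert.h>

/* Millions of instructions per second for one cog. */
double cog_mips(double clock_mhz, double clocks_per_instruction)
{
    assert(clock_mhz > 0.0 && clocks_per_instruction > 0.0);
    return clock_mhz / clocks_per_instruction;
}

/* Aggregate figure when several cogs run concurrently at the same rate. */
double chip_mips(double per_cog_mips, unsigned cogs)
{
    assert(cogs >= 1 && cogs <= 8);  /* the Propeller has 8 cogs */
    return per_cog_mips * cogs;
}
```

`cog_mips(80.0, 4.0)` gives 20, `cog_mips(80.0, 22.0)` gives roughly 3.6 (rounded to 4 above), and `chip_mips(4.0, 8)` gives the 32 mips lower bound.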

And yes, I've been watching the efforts to combine LMM with SPIN with interest, even though I don't have time to contribute. Only time will tell how useful it will prove to be, but I can certainly see synergies between what Dave Hein & co are doing and what Parallax wanted to do with PMC (but maybe at the time didn't really know quite how to tackle!)

Funnily enough, while I have no problem with having multiple versions of C floating around (after all, there are two already!), my main issue with such hybrid languages is in fact very much the same as heater's problem with LMM C - i.e. is it fast enough to be useful, and (especially if it isn't an existing industrial language) then is it really likely to be adopted outside these forums? While I know what the answer is to those questions about Catalina C ("yes", and "yes"), it will take some time yet to find out the answers with either SPIN/LMM or SPIN/C.

Ross.


RossH · 2010-07-14 10:52

@Christof,

No worries - C is not suitable for everybody, and not needed by everybody! However, I would be interested in what you found difficult to get working - if I can make it easier I will do so.

Using either LMM PASM or "cog" PASM is possible with Catalina - but it may not be as simple as it is with PropBasic (I don't know because I've not actually used PropBasic) - check out the section on page 43 in the latest Catalina Reference Manual titled "USING PASM WITH CATALINA".

I absolutely agree there will always be a need for PASM in most embedded applications - video drivers are a good example.

Ross.

P.S. Maybe we should stop calling it LMM PASM and start calling it "hub" PASM - as opposed to the traditional "cog" PASM.


heater · 2010-07-14 10:53

RossH "...multiple versions of C floating around (after all, there are two already!)"

Hmm... I count three. Should Zog be getting angry or ImageCraft? You really don't want the former :)

I know, I know you meant native LMM C compilers, just kidding.


RossH · 2010-07-14 11:03

@heater,

Actually, how do you know I wasn't insulting both Zog and ImageCraft by referring to Dave Hein's C to SPIN translator?

Oh no! - here comes Zog! Heeelp!


heater · 2010-07-14 13:00

So I now make that 5 C compilers for the Propeller.

The world's first C code to run on the Prop was compiled with Leor Zolman's BDSC for the Z80 and run under the ZiCog Z80 emulator. (PropAltair back in the day?)

Plus now the C to Spin translator.

I won't count GCC targeted at 6809, the output of which has also run in the Prop under MoCog, until that emulator is a bit more mature.


Bill Henning · 2010-07-14 13:06

Achm... as the guy who came up with LMM, I get to name it :) so LMM it stays ... not "hub pasm"

I just read this thread, and I agree with pretty much everything you said; however, you are all underestimating the potential speed of LMM. I have yet to see a good implementation of FCACHE with any of the compilers. Free up at least 64 longs for an FCACHE area, compile tight inner loops to direct PASM code for it, and watch most of the speed difference between PASM and LMM PASM disappear.

Want proof? Implement strcpy() etc, memcpy() etc as "pure LMM" and also as FCACHE'd LMM. Look for the tons of small loops (that don't call functions) in C code, and FCACHE those.
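A rough cost model shows why small loops are good FCACHE candidates. Everything here is an assumption made for illustration: 16 clocks per instruction for plain LMM, 4 clocks once the loop runs from cog memory, 16 clocks to copy each long into the cache area, and a 64-long cache area as suggested above. The function names are hypothetical and no particular compiler's FCACHE is being described.

```c
#include <assert.h>

#define LMM_CLOCKS_PER_INSTR   16UL  /* assumed: one hub window per instruction */
#define COG_CLOCKS_PER_INSTR    4UL  /* assumed: native speed inside the cog    */
#define LOAD_CLOCKS_PER_LONG   16UL  /* assumed: one hub read per cached long   */
#define FCACHE_AREA_LONGS      64UL  /* assumed size of the cache area          */

/* Clocks to run a loop of loop_longs instructions as plain LMM. */
unsigned long pure_lmm_clocks(unsigned long loop_longs, unsigned long iterations)
{
    return loop_longs * iterations * LMM_CLOCKS_PER_INSTR;
}

/* Clocks to copy the loop into the cache area once, then run it natively. */
unsigned long fcache_clocks(unsigned long loop_longs, unsigned long iterations)
{
    assert(loop_longs <= FCACHE_AREA_LONGS);  /* the loop must fit in the area */
    return loop_longs * LOAD_CLOCKS_PER_LONG
         + loop_longs * iterations * COG_CLOCKS_PER_INSTR;
}
```

With these numbers a 4-instruction copy loop run 100 times costs 6400 clocks as plain LMM and 1664 clocks when cached; caching only loses when the loop runs a single time.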


▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system

Leon · 2010-07-14 13:13

heater said...
So I now make that 5 C compilers for the Propeller.

The world's first C code to run on the Prop was compiled with Leor Zolman's BDSC for the Z80 and run under the ZiCog Z80 emulator. (PropAltair back in the day?)

Plus now the C to Spin translator.

I won't count GCC targeted at 6809, the output of which has also run in the Prop under MoCog, until that emulator is a bit more mature.

I remember Leor Zolman and his BDSC compiler from about 30 years ago; a copy of it was supplied with some database software I used a long time ago.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Leon Heller
Amateur radio callsign: G1HSM

heater · 2010-07-14 13:24

Leon, this forum has a member called "BrainDamage". That is the one and only Leor Zolman. He honored us with his presence early on in the ZiCog thread. http://forums.parallax.com/showthread.php?p=788511


Bean · 2010-07-14 14:11

RossH said...
Using either LMM PASM or "cog" PASM is possible with Catalina - but it may not be as simple as it is with PropBasic (I don't know because I've not actually used PropBasic)

In PropBasic there is a command "PROGRAM label" that tells the compiler where to start execution. By appending "LMM" to this command, the program uses LMM. That's all there is to it.

The big slowdown I see in LMM is jumps and calls. These are very common (more common than hub access), and are slow in LMM because the address is too large to store in the instruction. So an additional HUB location must be read to get the address.

I wish the Propeller tool would support the @@@ prefix like BST does, then I could make LMM faster because right now I must add an offset to every jump address.

Bean

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Use BASIC on the Propeller with the speed of assembly language.
PropBASIC thread http://forums.parallax.com/showthread.php?p=867134

March 2010 Nuts and Volts article http://www.parallax.com/Portals/0/Downloads/docs/cols/nv/prop/col/nvp5.pdf
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
There are two rules in life:
1) Never divulge all information
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you choose not to decide, you still have made a choice. [RUSH - Freewill]

heater · 2010-07-14 14:17

Bean, isn't it possible to make most of your jumps and calls relative to the LMM Program Counter?
Then you don't have to add an offset all the time.


jazzed · 2010-07-14 14:52

@RossH,

LMM C is 2.5 times or more bigger than equivalent Spin size.

Consider the C GraphicsDemo in OBEX.

#define X_TILES         8 //16
#define Y_TILES         9 //12
#define X_EXPANSION     16 //10
    
#define X_ORIGIN        64 // 128
#define Y_ORIGIN        72 // 96

These define the graphics buffer geometry.
The commented numbers are the sizes that would be used with SPIN.
The defines are what it took to fit the C version in memory.

The LMM C code took up so much memory that the tiles were cut by more than half.
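To put numbers on that, here is a small sketch of the bitmap size for each geometry. It assumes each tile is 16 x 16 pixels at 2 bits per pixel (64 bytes) and counts a single bitmap only; `TILE_BYTES` and `tile_buffer_bytes` are names made up for this example, not taken from the demo source.

```c
#include <assert.h>

/* Assumption: one tile is 16 x 16 pixels at 2 bits per pixel = 64 bytes. */
#define TILE_BYTES 64u

/* Bytes needed for one bitmap of x_tiles by y_tiles tiles. */
unsigned tile_buffer_bytes(unsigned x_tiles, unsigned y_tiles)
{
    assert(x_tiles > 0 && y_tiles > 0);
    return x_tiles * y_tiles * TILE_BYTES;
}
```

Under that assumption the SPIN geometry (16 x 12 tiles) needs 12288 bytes per bitmap, while the C geometry (8 x 9 tiles) needs 4608 bytes, i.e. 72 tiles instead of 192.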

While LMM C has some value, the advantage of having the compiler "organize code
for me" is lost in a small memory footprint. With a larger footprint it would be fine.
I wrote Propeller JVM in PASM/SPIN. If I used LMM C there's no way it would ever fit.

@Bean,
heater is right. Use ADD/SUB for relative jumps within 512 address range.
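The trade-off between the two kinds of jump can be sketched as a code-size decision. This is a simplified model with hypothetical names: a jump whose distance fits the 9-bit immediate field of an instruction is assumed to cost one long (an ADD or SUB on the LMM program counter), and anything further is assumed to cost two longs (a kernel jump primitive followed by the target address).

```c
#include <assert.h>
#include <stdint.h>

/* Largest value that fits in the 9-bit immediate field of a PASM instruction. */
#define MAX_IMMEDIATE 511u

/*
 * Hub longs needed to encode a jump in this model.
 * pc_after_fetch: byte address held in the LMM pc when the jump executes
 * target:         byte address of the destination instruction
 */
unsigned jump_longs(uint32_t pc_after_fetch, uint32_t target)
{
    assert(pc_after_fetch % 4 == 0 && target % 4 == 0);  /* longs are 4-byte aligned */

    uint32_t distance = (target >= pc_after_fetch)
                            ? target - pc_after_fetch    /* forward:  add pc, #distance */
                            : pc_after_fetch - target;   /* backward: sub pc, #distance */

    return (distance <= MAX_IMMEDIATE) ? 1u : 2u;        /* 2 = primitive + address long */
}
```

A compiler or assembler pass could use a check like this to pick the short form wherever the target is close enough, and fall back to the two-long form otherwise.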

Cheers,
--Steve

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM

Post Edited (jazzed) : 7/14/2010 5:16:44 PM GMT

Dave Hein · 2010-07-14 18:18

heater said...
...

"The Propeller architecture is not at all suited to C."
...

I think this is an accurate statement, except I would have left out the words "at all". Some processors are designed to run C efficiently, and other processors are not. The Propeller doesn't run large C programs very efficiently. At best, C programs will run with only 25% efficiency. I have worked on some processors where the C code runs almost as fast as assembly code. Over the past 15 years I have converted a few pieces of C code to assembly to get speed improvements, but it's normally not worth the effort just to get a few percentage points improvement in speed.

The Prop is different. It was not designed to run programs from hub RAM, so we need to implement this indirectly through a small program -- the LMM interpreter. What do the other chips have that the Prop doesn't? It mostly comes down to memory caches. Other processors use hardware-driven memory caches to pull in instructions and data from slower, but larger memory. LMM implements a single-long instruction cache, but there is no data cache.

A future version of the Prop could implement hardware memory caches by using the Cog memory as the L1 cache and the hub memory as the L2 cache. Large programs and data could reside in external DDR memory. The Cog instruction set could remain virtually the same, but immediate jumps would probably have to be done within a 512-long bank. Flat memory jumps would be done using registers or a jump address stored after the instruction.

Edit: Just to clarify my statement. The Propeller can run small C programs efficiently as long as they fit in the Cog's memory, and they do not perform a large number of hub memory accesses. It seems odd to me that the current C compilers seem to ignore this mode of operation, which would be highly useful for writing drivers and compute-intensive algorithms.

Post Edited (Dave Hein) : 7/14/2010 6:27:07 PM GMT

Mike Green · 2010-07-14 18:52

As Bean has shown, it's not hard to produce a compiler that can generate native Prop code. The problem is to make a compiler that produces efficient native Prop code from a high level language, not just execution efficient, but space efficient as well. It's very very hard to produce a compiler that will generate code anywhere near hand coded efficiency on a RISC architecture like the Prop. LMM relaxes the code space constraints somewhat and produces only a modest hit in execution speed.

Spin bytecode is a much better match for things like C or Spin languages and Spin+LMM/CachedProp or C+LMM/CachedProp would be a great match with a suitable optimizing compiler. It wouldn't be hard to take the existing interpreter and modify it slightly to produce a C+LMM/CachedProp interpreter with an emphasis on the cached native code. I would even leave in the Spin operators that are not in C. There could always be some kind of C-compatible syntax sugar provided so you could use them.

The issue that would not be addressed is that of "selling" it to potential industrial users. That's not a technical issue ... it's a perceptual (cultural) one.

Parallax is planning to create a subsidiary to market the Prop to the industrial use sector, complete with reps and engineers that can wear suits and dull literature maybe without the beanie or as fanciful a name as "Propeller". Maybe they'll even leave out the "cog" and "hub" and substitute "processor" and "coordinator" or "scheduler" or "synchronizer". It'll be dull and uninspiring, but it'll be the same hardware and it will make more money for Parallax.

HollyMinkowski · 2010-07-14 19:18

The Prop is so marvelously different than other uC's that I think
it deserves a marvelously different assembler for PASM.

I keep thinking of what such an assembler might be like.
One thing it needs is a code generator which after going
through a series of questions creates a template asm code
file with a great deal of the monotonous code already placed
with comments. This would be an invaluable learning environment
for PASM newbies and really speed up code creation. Basic drivers
could also be generated and well commented. Coders could
add new items to such a code generating assembler for others
to use. This would be a lot like the present code repository but
with automatic integration into the user's PASM code. A very powerful
emulator built into a PASM IDE would be a great help too.

With an assembler just as powerful and unique as the Prop itself
you would not need a high level language at all...PASM would
itself become the best high level language.

All compilers are asm code generators....I just suggest one much
closer to the native asm but with lots of helpful gadgets and generators
to make asm easier to understand. I firmly believe that you simply
must know the asm language for any uC to really do advanced work
with it...it gives you such a solid grounding in the hardware and a feel
for what is possible.

heater · 2010-07-14 19:31

Dave Hein: I've started to agree with RossH about the efficiency of LMM for C.

You see, on the face of it, having an LMM loop to execute your code looks like a huge hit in performance: you need at least 4 instruction times to execute one PASM instruction. So it looks like a 20 MIPS processor is now down to 5 MIPS.
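The arithmetic behind that estimate can be sketched in a few lines of C. The function names are hypothetical, and the figures (20 MIPS native, 4 instruction times per LMM instruction) are simply the ones quoted in this thread, not measurements.

```c
#include <assert.h>

/* Effective instruction rate when every useful instruction costs
   slots_per_instr native instruction times. */
double effective_mips(double native_mips, unsigned slots_per_instr)
{
    assert(slots_per_instr > 0);
    return native_mips / (double)slots_per_instr;
}

/* The same ratio expressed as a percentage of native speed. */
double efficiency_percent(unsigned slots_per_instr)
{
    assert(slots_per_instr > 0);
    return 100.0 / (double)slots_per_instr;
}
```

Calling effective_mips(20.0, 4) gives 5.0, and efficiency_percent(4) gives 25.0, which is where the "20 MIPS down to 5 MIPS" figure comes from.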

Well I think Ross would, and has, argued it this way:

a) Let's assume your application needs more code than 512 instructions, say 1K or 10K or whatever.
b) Let's assume your application needs more data than will fit in COG, say 1K or 10K or whatever.
c) Let's assume that for performance reasons we don't want to use Spin or any byte code system.

Question: How are you going to code that application on the Prop?

Well, immediately we see that, however you code it, the data will mostly reside in HUB and require rdlong/wrlong etc. to access it. Well, there is a huge performance hit straight away.

What about the code. The traditional approach is to use code overlays when constrained for space. That's a huge performance hit again.

But on the Prop we can use LMM. Again a performance hit.

What we begin to see is that for any app bigger than what will fit in COG we are NEVER going to see anywhere near the Prop's 20 MIPS. There will always be a lot of HUBOPs and code swapping going on.

Use the best hand crafted PASM you can muster and you still can't approach that 20 MIPS.
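A rough way to see this is to average the cost of cog-only instructions and hub accesses. The sketch below is only a model under stated assumptions: an 80 MHz clock, 4 clocks per ordinary instruction, and an assumed average of 16 clocks per hub access. The function name mixed_mips and the 25% hub fraction in the usage note are illustrative, not measured.

```c
#include <assert.h>

/* Average MIPS for hand-written cog code in which a fraction
   hub_fraction of the executed instructions are hub accesses.
   All parameters are assumptions supplied by the caller. */
double mixed_mips(double clock_mhz, double cog_clocks,
                  double hub_clocks, double hub_fraction)
{
    assert(hub_fraction >= 0.0 && hub_fraction <= 1.0);
    assert(cog_clocks > 0.0 && hub_clocks > 0.0);

    /* Weighted average clocks per executed instruction. */
    double avg_clocks = (1.0 - hub_fraction) * cog_clocks
                      + hub_fraction * hub_clocks;
    return clock_mhz / avg_clocks;
}
```

With no hub accesses, mixed_mips(80.0, 4.0, 16.0, 0.0) gives the full 20 MIPS. If one instruction in four touches the hub, mixed_mips(80.0, 4.0, 16.0, 0.25) falls to roughly 11.4 MIPS, before counting any overlay loading or code swapping.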

So Ross would argue that asking a C compiler to compete with the speed of PASM within a COG is just not realistic. A well optimizing C compiler should be compared with what you would get by hand coding a "bigish" program. A lot less than it first seems.

I'm starting to agree with Ross on this.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2010-07-14 19:57

Mike Green said...
Parallax is planning to create a subsidiary to market the Prop to the industrial use sector, complete with reps and engineers that can wear suits and dull literature maybe without the beanie or as fanciful a name as "Propeller". ...

This is good news I think. I wish Parallax best of luck in the venture.
Unfortunately "checklist items" may replace "think different" there.

@heater,
LMM is necessary for big C program performance and will shine on PropII.
With write-back caching or VMCOG the SDRAM interface will be fast enough.
Hopefully GCC in some form will be available. Would ZOG be fast enough?

Cheers.
--Steve

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM

Dave Hein · 2010-07-14 20:04

I agree that a human coder can always produce more efficient code than a compiler. However, there are highly optimized compilers that produce extremely efficient assembly code. They can get within a few percent of a human coder on a RISC architecture.

I think it is good for newbies to become familiar with Prop assembly. It's good to understand exactly how the processor works. However, writing in PASM can become very tedious after a while, and it would be much nicer to be able to write cog code in a high level language. Bean's PropBASIC is an excellent tool for this. A C compiler that generates cog PASM would be a nice addition to the tool set.

Heater, my point is that C on the Prop works great for small programs that fit in a cog. It's only 25% efficient when run from hub memory. So it's not a good fit for large C programs. I acknowledge that it is not a good fit for large PASM programs either. If speed is important, then the program must fit in the 8 cogs. If program size is important, then Spin may be a better choice. With that being said, there are a lot of applications where a C program running on a 5 MIPS processor works just fine.

Post Edited (Dave Hein) : 7/14/2010 8:18:46 PM GMT

jazzed · 2010-07-14 20:10

Dave Hein said...
A C compiler that generates cog PASM would be a nice addition to the tool set.

A C compiler that generates COG PASM does have value.
One value add is easing the learning curve for new users.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM

AntoineDoinel · 2010-07-14 21:04

Personally I don't have any problem considering LMM as "native" and PASM as "accessible microcode".

You can use PASM for emulating peripherals, extending LMM with complex "microcoded" instructions, and in the end even for full sections of critical code.

A large range of applications, if not the majority, can fit well in a model where optimizing the inner loops only is enough to reach almost native speed, IMHO.

If LMM is not native, the same could be said of 6502 compilers using zero page, or the whole mid-low range of PICs juggling around with the W register.

Just my two eurocents

heater · 2010-07-14 21:11

Dave Hein: "...It's only 25% efficient when run from hub memory"

25% efficient compared to what? The point of my post was that for "bigger than COG" code and data, you won't get anything like 20 MIPS from the best possible hand crafted solution. So the C compiler will be looking a lot more than 25% efficient.

As for compiling C into real native PASM confined to the 512 longs of COG, it may have some uses somewhere, but given the size of program here it hardly seems worth having a compiler. I can't see anyone wanting to put in the huge effort required to get C compiled to that confined space for such little benefit.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

heater · 2010-07-14 21:30

Jazzed: "LMM is necessary for big C program performance and will shine on PropII. With write-back caching or VMCOG the SDRAM interface will be fast enough..."

Oh indeed, I'm looking forward to that.

"...Hopefully GCC in some form will be available. Would ZOG be fast enough?"

Perhaps the Prop II will inspire someone with the required skills to produce a GCC target for it.

As for Zog. I'm not sure the ZPU interpreter concept has a place on the Prop II unless you really insist on using GCC or C++.

The other initial motivation for trying out a ZPU bytecode machine on the Prop was to see how well it worked with external memory accessed through byte-wide data buses, as on the TriBlade, DracBlade etc. I thought it might have some advantage there compared to having to always fetch 32-bit-wide instructions. Actually that experiment has not been done and may never be done because Bill has provided VMCog, which makes the memory width problem moot.

Then Zog was hopefully to have better code density than an LMM solution. Still not sure about that one.

So what do you think? Does the ZPU interpreter make sense on a PropII with so much more HUB space?

As it stands Zog gives me GCC and C++ on the Prop in a very simple way so I will probably want to continue with it on Prop II when it arrives.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

jazzed · 2010-07-14 21:55

I think ZOG makes lots of sense until a straight up LMM GCC solution comes. After that I'm not so sure, but at least we'll have a "bird in hand" until then. Right now we have a small cache, later we'll have a big cache AND a huge video or whatever buffer. Let's just call it a way to get some killer app going now that can be easily ported later. Now where was that list of killer apps? ...grumble.... :)

Cheers,
--Steve

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM

ImageCraft · 2010-07-14 22:48

Bill, ICC implements FCACHE already. Speed is never mentioned as an issue for ICC.

Heck, ICC does many optimizations such as register packing. It's really quite a far cry from generic LCC.

Again, not really the main issue.
