Ok - since you are the originator of LMM, I'll accede to your wishes and stick with "LMM PASM" and just plain old "PASM". Saves me rewriting all my documentation anyway!
Regarding FCACHE - I thought ICC implemented this? I could be wrong because I've not looked at their compiler in detail. One thing I can say is that implementing it in a general way within a compiler is extremely difficult (so kudos to anyone who has done so!). It's so difficult that I doubt it will ever be done in a C compiler to any good effect (at least not automatically - at best you may be able to allow the user to hint to the compiler that it may consider FCACHEing some small functions, similar to "inline" or "register" - with the compiler free to ignore them if it wants). Also, apart from specialized cases, my "gut feeling" has always been that the overall benefit in a real world C program would be small, and probably not worth the additional compiler complexity.
The idea of using FCACHE to implement pure LMM for library functions (like strcpy) is better, but this can also be done by having a separate "library" cog - in that cog you could implement a simple cached function overlay scheme, since there may be more functions you would like to implement than you can fit in a cog. I think this would give you nearly all the benefits of FCACHE, but at much less cost (this one is on my "todo" list).
@Heater,
Yes, exactly! Comparing C against hand-crafted cog code is not even comparing apples and oranges - it's more like comparing apples and iPhones. No-one is ever going to get anywhere near 20 mips per cog out of the Propeller running a program written in ANY high level language - or even in a hand-crafted PASM program, except in extremely specialized cases where no cog-cog or cog-hub interaction is required.
@Dave & Jazzed,
I keep hoping someone will take up my suggestion of turning "tiny C" into a cog-only PASM compiler - I think this would be a really useful thing to have (for things like drivers, as Dave suggests). I'll do it myself if I get time, but I'd be happy for anyone else to do it. Unfortunately, as I also pointed out, you would have to implement only a very small subset of C to make it work, so it has very little general applicability for larger programs, and probably should not even really be called "C".
@Jazzed,
If I understand correctly your comments about the C graphics demo, then I think you are confirming my point - with the ICC compiler LMM C size is about 2.5 times SPIN code size. Catalina produces smaller code than ICC, so it would seem we are already well on the way to requiring only 2 times the equivalent byte code program size. The difference is that you seem to think this makes C not worth considering, whereas I think it makes C a very attractive alternative to a byte coded language like SPIN - i.e. only 2 times the code size but many, many times the performance!
@Mike,
C implemented as SPIN bytecode will never be able to implement anything like the efficiency of C implemented as LMM. The current SPIN + LMM hybrid proposals will be able to do better, but will approach LMM efficiency only as their code sizes approach LMM code size - i.e. if they are virtually all LMM and very little SPIN bytecode. If instead they are virtually all SPIN bytecode with just the small amount of LMM required to implement those parts of C that are terribly inefficient to implement using SPIN bytecodes then they will be as slow as SPIN (actually, probably slower).
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Our posts crossed! Kudos on implementing FCACHE. And yes, ICC does generate faster (but larger) code than Catalina in some (but not all!) cases.
But both are so much faster than SPIN that I don't really think comparing Catalina to ICC is the battle we should face - if we can make C compilers that generate code that is only twice as large as the equivalent SPIN in code size, but 4-8 times faster in execution speed, then I can't really see why we seem to be having so much trouble getting acceptance that C is a perfectly "natural" language for the Propeller!
Ross.
Edit: OOPS - meant for Richard, not Bill!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
RossH,
I like the Spin / Native Prop / LMM combination because most programs spend 90% of the time in 10% of the code and the 90% of the code where they spend little time can very nicely be implemented in Spin bytecodes (or something similar) and the speed hit in overall execution time is minimal with a significant savings in program size. The 10% of the code where the program spends 90% of its time needs to be done in native Prop code, not even LMM because of the need for speed. We're talking about a relatively small amount of code that, most of the time, would easily fit in a cog. The biggest problem with these observations is that you frequently can't tell ahead of time which parts of the program are which. The best way to handle this is to get the program working in Spin (or something similar), instrument the thing, and get some measurements. You then need to be able to tell the compiler with schemas or something similar that particular pieces of code need to be compiled to native code and run in a cog or, at least, as LMM code.
Generally, I agree. But the "significant savings in code size" turn out to be only around 50% of the equivalent LMM size of some parts of the code.
So is the additional complexity (not just compiler complexity, but also user complexity in instrumenting, measuring and then redesigning the code) worth it to save only 50% of the size of some proportion of the overall program size? Your total saving could actually be 25% or even less (remember that static data segments, dynamic data segments and any code already in PASM - drivers etc - will not be reduced in size).
Ok, if the Prop I was all we were ever going to have, it may be worthwhile - but the Prop II will have 10 times the RAM, and these issues will go away (to return of course once we exhaust the capacity of the Prop II - but by then the Prop III will be on the way!).
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Thanks Richard! I was not aware you were using it, however I am very glad you are using it as it was intended to be THE way of speeding up LMM to almost PASM speeds.
ImageCraft said...
Bill, ICC implements FCACHE already. Speed is never mentioned as an issue for ICC.
Heck, ICC does many optimizations such as register packing. It's really quite a far cry from generic LCC.
Ok - since you are the originator of LMM, I'll accede to your wishes and stick with "LMM PASM" and just plain old "PASM". Saves me rewriting all my documentation anyway!
Thanks
Generally I personally use PASM for cog-only code, and for me, LMM means large model code (hand or compiler generated)
RossH said...
Regarding FCACHE - I thought ICC implemented this? I could be wrong because I've not looked at their compiler in detail.
Richard pointed that out - and I am glad he is utilizing it!
RossH said...
One thing I can say is that implementing it in a general way within a compiler is extremely difficult (so kudos to anyone who has done so!). It's so difficult that I doubt it will ever be done in a C compiler to any good effect (at least not automatically - at best you may be able to allow the user to hint to the compiler that it may consider FCACHEing some small functions, similar to "inline" or "register" - with the compiler free to ignore them if it wants). Also, apart from specialized cases, my "gut feeling" has always been that the overall benefit in a real world C program would be small, and probably not worth the additional compiler complexity.
You are entirely correct, it is a *LOT* of extra work. For best code generation, you have to speculatively compile, throw away once you exceed the fcache buffer size, etc. PropellerBasic is intended to do this; however, consulting and new products have stalled PropellerBasic.
RossH said...
The idea of using FCACHE to implement pure LMM for library functions (like strcpy) is better, but this can also be done by having a separate "library" cog - in that cog you could implement a simple cached function overlay scheme, since there may be more functions you would like to implement than you can fit in a cog. I think this would give you nearly all the benefits of FCACHE, but at much less cost (this one is on my "todo" list).
Here I (respectfully) disagree.
FCACHE saves having to burn an extra cog; and doing the initialization of the routine, and switching to FCACHE for the inner loops, is the biggest win.
Until Chip blesses us with 16-32 cog Props, I am a founding member of the "Save the Cogs Foundation"!
I very much look forward to seeing FCACHE fully implemented in PropellerBasic!
I don't want to sideline my own thread, but I have previously run through a bunch of typical C programs (BASIC may be different) looking for instances where caching things (such as the inner loop of a larger function) would be worth even a fairly modest FCACHE overhead. I excluded driver-type C code because (at least on the Propeller) these will continue to be written in assembly for the foreseeable future anyway.
The few examples I could find were nearly always completely artificial (e.g. benchmarks designed to test the efficiency of register variables) or due to poor programming - including my own! - such as hand-writing C code to do things better done using a library function (e.g. using a 'for' loop to find a character in a string). Also, unless the code deals only with in-cog data (I never found a case of this outside driver code) then the speed up from FCACHE is fairly modest anyway - i.e. around two times normal LMM PASM speeds.
I agree FCACHE is a very useful technique for hand-coded LMM PASM, but it is not meaningful to try and compare what you can achieve in hand-crafted PASM (or LMM PASM) with the LMM PASM generated from a high-level language compiler. I spent years programming in assembly languages, and I've never seen a compiler that could come anywhere close to generating the kind of code we routinely used to save time and space (and of course we thankfully moved to using high-level languages as soon as computers got large enough and fast enough to do so!).
Which brings me back to the original point of this thread - if we want high-level languages on the Propeller (and of course I think we do!) then LMM speeds (and LMM code sizes) are pretty much what we should expect to get. We seem to focus too much on what is possible with PASM within a cog on the Prop (which is amazing!) and forget that there is much more to most real-world applications than that!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
I went back to the "everyday Joe" thread to read the posts that started this thread. The question that was raised is whether the Prop is suited to the C language. C can be implemented on almost any processor, so the question is whether it works well on the Prop. Even Ross stated in that thread that the Prop is not ideal for C. However, it is clear that C does work well on the Prop. The limitations we see in large C programs are the same as large LMM PASM or LMM PropBASIC programs. It has nothing to do with C, but is a limitation of the Prop hardware.
LMM is a great invention, and it has transformed the Prop. It has made possible large C, PropBASIC and PASM programs. It would be nice to squeeze every possible MIP out of the Prop, but for most applications 160 MIPS is not really required. C runs well on the Prop, but it could be slightly improved with a few extra instructions, such as an auto-increment feature and randomly addressable Cog memory.
RossH said...
The difference is that you seem to think this makes C not worth considering, whereas I think it makes C a very attractive alternative to a byte coded language like SPIN - i.e. only 2 times the code size but many, many times the performance!
LMM C is worth considering for small projects with built-in Propeller resources. For the edge of memory cases (most of my projects) SPIN/PASM wins given the choice of cost/performance. With external memory LMM C is more interesting, but that's a different story.
I am somewhat interested in your EMM EEPROM solution. Have you done any performance comparisons with that? Maybe you can summarize how that works. Is it cached for example?
I don't know that size is really an issue - I use LMM C for projects that are simply not practical at all in SPIN - either because SPIN is too slow, or because SPIN lacks floating point (except with a heap of ugly messing around) or because the algorithm I need to use I can already get off the net fully written and tested in C, and I don't have to try and rewrite it (and then debug it!) in SPIN.
As to EMM - this is really a load option, not a run-time option. EMM does not change the execution model, and EMM programs still use the LMM kernel.
What EMM programs do is load the execution environment from the first 32k of EEPROM (i.e. all the device drivers, plugins and the kernel itself), and when all that has been initialized the program code itself is loaded from the second 32k (EMM programs require a 64k EEPROM).
When you use a normal load process, up to 16k (worst case) of your potentially usable Hub RAM may actually be occupied by drivers and plugins that are not needed after initialization time. Even a small LMM program might waste 4kb to 8kb on drivers and plugins, but with EMM the program can actually use the full 32kb of hub RAM for compiled C code (of course, you still need some stack space).
Also, as I said earlier in this thread, I plan to add the option of using one or more "library cogs" to Catalina - when I do that I estimate I could save between 4k and 8k of space currently occupied by various parts of the C library (by putting equivalent functions into a library cog). And they will execute faster as well!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
RossH said...
I don't know that size is really an issue ....
That's usually compensated with fast cars :)
RossH said...
As to EMM - this is really a load option, not a run-time option. EMM does not change the execution model, and EMM programs still use the LMM kernel.
Maybe it should be a run time option. With an AT24C1024B or equivalent, that gives 128KB of flat addressable memory. It would be slower than parallel data XMM, but not so bad with a cache and most of HUB RAM not used for stack or cache could be used for video or whatever. Of course your memory model might not be flexible enough to allow a separation of text/data like GCC. Does Catalina build with Linux? I finally gave Vista the boot.
Catalina supports the usual memory segments common to nearly all compilers (the names sometimes change):
Code : a read-only segment containing program code.
Cnst : a read-only segment containing constant data.
Init : a read/write segment containing initialized data.
Data : a read/write segment containing uninitialized data.
Here's a description of the available segment layouts:
-x0 segments are arranged as: Code, Cnst, Init, Data all in Hub RAM. This is LMM mode.
-x1 segments are arranged as: Code, Cnst, Init, Data all in Hub RAM (same as -x0). This is the current EMM mode.
-x2 segments are arranged as: Cnst, Init, Data in Hub RAM, with Code in External RAM. This is XMM (SMALL) mode.
-x5 segments are arranged as: Code, Cnst, Init, Data in External RAM, with only the stack in hub RAM. This is XMM (LARGE) mode.
You could use -x2 to do what you want, or implement -x3 (not shown because I never fully implemented it, but it is actually "Init & Data in Hub RAM with Code & Cnst in External RAM").
I never thought anyone would want a memory model SLOWER than XMM, but if you wanted to execute code direct from EEPROM using layout -x2 it would be fairly trivial to implement the necessary routines to fetch from EEPROM instead of XMM RAM.
Ross.
P.S. I recently benchmarked -x2 on Cluso99's RamBlade, and it gets 333 kwips (thousand single-precision floating point whetstones per second) using one cog. This is a similar speed to the original MicroVax 1.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
How do you tell the compiler what parts of the C code you want FCACHE'd? Is it automatic, or via something like an 'inline' hint to the compiler?
Ross.
Ross, it's automatic. We did it more than a year ago now so details are hazy, but we detect loops and do the magic. Remember, we don't generate asm code immediately. We generate pseudo-code and then optimize on that so we can detect and rewrite things easily.
No offense, this is why I tend not to compare ICC with Catalina C. I have been writing compilers since '85 and ImageCraft has been in business since '94. We can always do better in compiler code generation of course, but ICC is way beyond LCC. We do global register allocation, we do live analysis etc. There is nothing that can be done in base LCC that we cannot do, while we can do way beyond base LCC - if we choose to. You want smaller size? We can do whole program code compression that identifies common code sequences and squeezes them into subroutines. Most of these are common code and they are implemented on most of our compilers.
Anyway, Catalina serves its functions and no doubt has its fans. It's free so it's immediately more appealing to the hobbyists and of course you have shown that it is quite robust, able to handle large programs. Best of luck to you.
I should add that FCACHE'ing loops can be very effective, since a program is mostly loops in its runtime behavior. I am not entirely convinced that this is the norm for embedded programs.
Don't worry - I have no intention of trying to implement FCACHE!
I've acknowledged previously that ICC's code generation is more advanced than Catalina's, and often (but not always) generates faster code. However, Catalina now tends to generate smaller code (especially when using the new code optimizer) and can also compile programs of arbitrary size and complexity - so we may end up with the situation that if you want small size for arbitrarily large programs use Catalina, but if you want faster small programs use ICC. Or (soon!) if you want super fast but super small programs, use Tiny C!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
I agree that all programs are mostly loops - and I think this is even more true of embedded programs. But my experience is that (outside driver code) these loops are simply too large to be effectively "FCACHE'd" in only a small number of longs. If the FCACHE could be larger and cache several entire functions at once it would be very effective. Then it would behave more like a level 1 cache - but most L1 caches are many times larger than the entire Propeller RAM!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
RossH: "I never thought anyone would want a memory model SLOWER than XMM, but if you wanted to execute code direct from EEPROM using layout -x2 it would be fairly trivial to implement the necessary routines to fetch from EEPROM instead of XMM RAM."
Actually some of us do. Generally XMM solutions involve attaching some RAM chip to the Prop and eating a lot of pins. The extreme example is Cluso's TriBlade Blade #2 RAM solution, where there are only just enough pins left for a serial connection and the SD card has to share pins, which is a pain.
This means that you can't use most of your lovely COGs because they have nowhere to go.
This means you may as well be using some other chip to run your C code on one CPU. It would be simpler and faster.
The solution is to use a serially attached store for all that code that won't fit in HUB. EEPROM or SPI RAM or whatever. Slower but conserves pins and frees up COGS for the fast stuff.
I imagine this will work best with Bill Henning's VMCog. The upside of which is:
1) Allows a lot of code to be kept out of HUB.
2) Frees up most pins normally used for ext RAM.
3) Which in turn allows one to make use of more COGs.
4) VMCog takes care of the hardware so you don't have to produce a different LMM kernel for each new hardware solution. Just let VMCog handle it.
5) Should be faster than direct access what with all that block access to the memory device and caching in VMCog.
6) Is flexible, one can run two or more serial devices in parallel for speed, just let VMCOG handle it.
The down side is:
1) Slower execution compared to a fast XMM RAM interface.
2) Eats one COG for VMCog, but may allow better use of the remaining 6 COGs.
3) Eats a handful of 512 byte blocks in HUB for the VM working set.
For this reason Zog may never contain direct hardware access to ext RAM in its interpreter COG.
Zog already supports a bunch different XMM solutions, most of which I have never had in my hand, as a result of using VMCog.
Now if only I could figure out how to get GCC's linker scripts to separate out code from data...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Dr_Acula's DracBlade shows that it is possible to have XMM RAM on a fairly simple board with lots of spare pins available for other uses - he even fits a full VGA interface on there! Of course, his XMM RAM is about three times slower than Cluso's RamBlade - but I'll bet it is still faster than either executing code direct from EEPROM or via VMCog.
However, either one would be fairly simple to implement in Catalina if there were sufficient demand. Any volunteers?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
That mythical JOE Propeller user (or more likely Brian) has now been sold on the Prop idea and has a Demo Board in his hands. Or some equivalent. Or if he's like me he just has a DIP prop lashed up with some flying leads on a strip board with some LEDs. There are many such Brians already.
Very soon, being keen, he's run out of program space. Especially if he's using Catalina :)
A serial RAM/EEPROM solution to get that bit more code in is a one hour soldering job and zero software effort if using VMCog. Or even just a two minute job fitting a bigger EEPROM. And it does not bugger up all those other peripherals he has welded onto the Prop pins already.
Now I love the TriBlade and DracBlade solutions. But should Brian have to be tied to one of those? Should Brian have to invest all that time and effort into building something similar?
The DracBlade is a gem. It was designed with a specific purpose in mind, running a Z80 emulation and CP/M. It just happens to be perfect for a lot of other applications that require memory, video, SD card, keyboard, serial I/O.
In summary these ext RAM solutions are complicated and expensive and disruptive of any existing design that may find it needs more program space. The serial solutions are not.
Convinced yet?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I'm not really convinced - but when I get some time I'll have a look at VMCog. The "execute direct from EEPROM" solution is quite simple so anyone could do that one (but I'd be hesitant to include it with Catalina as it would be soooooo slooooow).
Much as I may like to believe otherwise, I don't really think "Brian" is going to launch himself straight into Catalina. He's always going to start with SPIN, and only want something else when he goes online (or looks over his mate's shoulder) and sees all those cool things you can do with C on an Arduino.
THEN he's going to see he has to shell out some pocket money on something more than just a Demo board - something like a <<insert your favorite Prop platform here>> - and THEN he's going to use Catalina!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Very soon, being keen, he's run out of program space. Especially if he's using Catalina :)
So rather than move up to a bigger aircraft that actually does what he needs, he straps it to a bathtub, bolts water wings to the propeller and tries to turn it into a flying ship only to find that he's pushing to make 5 knots into a mild headwind and all hopes of again leaving the water are completely dashed.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"I mean, if I went around sayin' I was an emperor just because some moistened bint had lobbed a scimitar at me they'd put me away!"
RossH: I hope my little jibes cause a little chuckle sometimes, if not that they go ignored. I don't mean to offend anyone I'm just being playful.
BradC: That's not a very nice way to talk about the DracBlade :)
(There now, I will have upset Dr Acula).
Both of you: I'm not describing the situation of Brian upgrading to a TriBlade or DracBlade or whatever when he hits the RAM end stops.
No. My mythical Brian, or any other Propeller project developer has just got his head around the Prop. He's developed his own idea of a super wonder gadget, surrounding his Prop with all kinds of peripherals. He might have a board designed around it even. Anyway a lot of effort and time has gone into that so far.
Now he hits the RAM limit. Or he dreams up some new cool feature that just won't fit. What to do?
Upgrade to a Propeller + "bath tub" solution? No, he can't fit his existing hardware design around that.
Upgrade to some other processor that has more RAM? No. He likes the Prop's interrupt-free, simple-to-use environment and does not want to recreate all his code. Besides, where do you connect all the gadgets on that other chip?
Well, if he has a COG free and a couple of pins he can start adding code in C.
Is that a common scenario? No idea.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
RossH said...
Much as I may like to believe otherwise, I don't really think "Brian" is going to launch himself straight into Catalina. He's always going to start with SPIN, and only want something else when he goes online (or looks over his mate's shoulder) and sees all those cool things you can do with C on an Arduino.
Actually, at this point he'll probably just buy an Arduino and not look back...
This isn't to discredit the work that has been done in porting C to the Propeller, which I think is great (for all of the projects). It's just that the Propeller options are a lot harder to get started with than Arduino (IMO).
I'm not sure what it is really. I think that it's a combination of several factors... installation issues, documentation issues, lack of sample code, etc. The Arduino IDE, while simple, is dirt easy to install, and using the Arduino "Wiring" language, the code to blink an LED is...
// note: this is the blink code minus comments taken from arduino.cc
int ledPin = 13;

void setup()
{
    pinMode(ledPin, OUTPUT);
}

void loop()
{
    digitalWrite(ledPin, HIGH);
    delay(1000);
    digitalWrite(ledPin, LOW);
    delay(1000);
}
Okay, so this isn't exactly the same as firing up AVR Studio and programming directly against the ATMega. But that admittedly more advanced option is available if and when it's wanted or needed. However, as the Arduino IDE is basically a front end to GCC, adding user defined functions to a "sketch" is easy and straightforward...
So, given the overall structure of an Arduino sketch (setup & loop functions required), there is still a lot of C programming that can be done before needing to get really dirty. I know that some people tend to slam the Arduino because it's not a Propeller, or because it was designed for artists instead of engineers, but honestly, it is much more accessible to the average Joe than the Propeller is. I don't think it's quite at the BS2 level yet, but it's fast approaching it.
So how does all of this relate to this thread? The fastest way (IMO) to get people using C on the Propeller is to basically copy the Arduino. Port the libraries, and make the install/compile/upload cycle as easy as it is for the Arduino, and keep the advanced stuff there for those that want it.
Oh yeah, and start making hardware devices USB powered, because for some reason, every Parallax-ish development board seems to use a different wall wart...
Post Edited (Kevin Wood) : 7/15/2010 3:49:09 PM GMT
That's when you shell out US$50 on my new Catalina Code Optimizer! Shrinks code size by 10% and improves performance by 5 to 15%! Guaranteed or your money back!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Blimey, Kevin, I was just adding functions to the C library for Zog and they look very similar already. Well, why would they not? We are only setting up and toggling an I/O here.
Where is this Arduino library and "sketch" stuff documented? I might want to make Zog have a lookalike library, to see how much Arduino code can be supported on the Prop with it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Comments
Ok - since you are the originator of LMM, I'll accede to your wishes and stick with "LMM PASM" and just plain old "PASM". Saves me rewriting all my documentation anyway!
Regarding FCACHE - I thought ICC implemented this? I could be wrong because I've not looked at their compiler in detail. One thing I can say is that implementing it in a general way within a compiler is extremely difficult (so kudos to anyone who has done so!). It's so difficult that I doubt it will ever be done in a C compiler to any good effect (at least not automatically - at best you may be able to allow the user to hint to the compiler that it may consider FCACHEing some small functions, similar to "inline" or "register" - with the compiler free to ignore them if it wants). Also, apart from specialized cases, my "gut feeling" has always been that the overall benefit in a real world C program would be small, and probably not worth the additional compiler complexity.
The idea of using FCACHE to implement pure LMM for library functions (like strcpy) is better, but this can also be done by having a separate "library" cog - in that cog you could implement a simple cached function overlay scheme, since there may be more functions you would like to implement than you can fit in a cog. I think ths would give you nearly all the benefits of FCACHE, but at much less cost (this one is on my "todo" list).
@Heater,
Yes, exactly! Comparing C against hand-crafted cog code is not even comparing apples and oranges - it's more like comparing apples and iPhones. No-one is ever going to get anywhere near 20 mips per cog out of the Propeller running a program written in ANY high level language - or even in hand-crafted PASM program except in extremely specialized cases when there is no cog-cog or cog-hub interaction required.
@Dave & Jazzed,
I keep hoping someone will take up my suggestion of turning "tiny C" into a cog-only PASM compiler - I think this would be a really useful thing to have (for things like drivers, as Dave suggests). I'll do it myself if I get time, but I'd be happy for anyone else to do it. Unfortunately, as I also pointed out, you would have to implement only a very small subset of C to make it work, so it has very little general applicability for larger programs, and probably should not even really be called "C"
@Jazzed,
If I understand correctly your comments about the C graphics demo, then I think you are confirming my point - with the ICC compiler LMM C size is about 2.5 times SPIN code size. Catalina produces smaller code than ICC, so it would seem we are already well on the way to requiring only 2 times the equivalent byte code program size. The diifference is that you seem to think this makes C not worth considering, whereas I think it makes C a very attractive alternative to a byte coded language like SPIN - i.e. only 2 times the code size but many, many times the performance!
@Mike,
C implemented as SPIN bytecode will never be able to implement anything like the efficiency of C implemented as LMM. The current SPIN + LMM hybrid proposals will be able to do better, but will approach LMM efficiency only as their code sizes approach LMM code size - i.e. if they are virtually all LMM and very little SPIN bytecode. If instead they are virtually all SPIN bytecode with just the small amount of LMM required to implement those parts of C that are terribly inefficient to implement using SPIN bytecodes then they will be as slow as SPIN (actually, probably slower).
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina - a FREE C compiler for the Propeller - see Catalina
Our posts crossed! Kudos on implementing FCACHE. And yes, ICC does generate faster (but larger) code than Catalina in some (but not all!) cases.
But both are so much faster than SPIN that I don't really think comparing Catalina to ICC is the battle we should face - if we can make C compilers that generate code that is only twice as large as the equivalent SPIN in code size, but 4-8 times faster in execution speed, then I can't really see why we seem to be having so much trouble getting acceptance that C is a perfectly "natural" language for the Propeller!
Ross.
Edit: OOPS - meant for Richard, not Bill!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I like the Spin / Native Prop / LMM combination because most programs spend 90% of the time in 10% of the code and the 90% of the code where they spend little time can very nicely be implemented in Spin bytecodes (or something similar) and the speed hit in overall execution time is minimal with a significant savings in program size. The 10% of the code where the program spends 90% of its time needs to be done in native Prop code, not even LMM because of the need for speed. We're talking about a relatively small amount of code that, most of the time, would easily fit in a cog. The biggest problem with these observations is that you frequently can't tell ahead of time which parts of the program are which. The best way to handle this is to get the program working in Spin (or something similar), instrument the thing, and get some measurements. You then need to be able to tell the compiler with schemas or something similar that particular pieces of code need to be compiled to native code and run in a cog or, at least, as LMM code.
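To put rough numbers on the 90/10 argument above, here is a minimal sketch in C. The function name and the 5x slowdown used in the example are assumptions for illustration only, not measurements from any Propeller compiler or interpreter.

```c
#include <assert.h>

/*
 * Relative run time of a program when its "cold" code is moved to a slower
 * execution model (e.g. bytecode) while the "hot" code keeps its speed.
 *
 * hot_time_fraction: share of run time spent in the hot code (0.9 for the
 *                    "90% of the time in 10% of the code" rule of thumb)
 * cold_slowdown:     how many times slower the cold code becomes
 *                    (hypothetical figure, supplied by the caller)
 *
 * Returns the new run time as a multiple of the original run time.
 */
double relative_runtime(double hot_time_fraction, double cold_slowdown)
{
    assert(hot_time_fraction >= 0.0 && hot_time_fraction <= 1.0);
    assert(cold_slowdown > 0.0);

    double cold_time_fraction = 1.0 - hot_time_fraction;
    return hot_time_fraction + cold_time_fraction * cold_slowdown;
}
```

With `relative_runtime(0.9, 5.0)` the result is about 1.4: the whole program takes roughly 40% longer even though the cold 90% of the code runs five times slower.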
Generally, I agree. But the "significant savings in code size" turn out to be only around 50% of the equivalent LMM size of some parts of the code.
So is the additional complexity (not just compiler complexity, but also user complexity in instrumenting, measuring and then redesigning the code) worth it to save only 50% of the size of some proportion of the overall program? Your total saving could actually be 25% or even less (remember that static data segments, dynamic data segments and any code already in PASM - drivers etc - will not be reduced in size).
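The arithmetic behind that "25% or even less" estimate can be sketched as follows. The function and parameter names are hypothetical, and the 60% and 80% figures in the example are assumptions chosen for illustration; only the 50% bytecode-to-LMM ratio comes from the discussion above.

```c
#include <assert.h>

/*
 * Fraction of the total program image saved by recompiling part of the
 * LMM code as bytecode.
 *
 * code_fraction:        share of the image that is compiled code (the rest
 *                       is data, drivers and other PASM that does not shrink)
 * convertible_fraction: share of that code that can move to bytecode
 * bytecode_ratio:       bytecode size relative to LMM size (0.5 = half)
 */
double overall_saving(double code_fraction,
                      double convertible_fraction,
                      double bytecode_ratio)
{
    assert(code_fraction >= 0.0 && code_fraction <= 1.0);
    assert(convertible_fraction >= 0.0 && convertible_fraction <= 1.0);
    assert(bytecode_ratio >= 0.0 && bytecode_ratio <= 1.0);

    return code_fraction * convertible_fraction * (1.0 - bytecode_ratio);
}
```

For example, if compiled code makes up 60% of the image and 80% of it can be converted, `overall_saving(0.6, 0.8, 0.5)` gives about 0.24, i.e. a 24% saving overall.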
Ok, if the Prop I was all we were ever going to have, it may be worthwhile - but the Prop II will have 10 times the RAM, and these issues will go away (to return of course once we exhaust the capacity of the Prop II - but by then the Prop III will be on the way!).
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (RossH) : 7/15/2010 2:10:50 AM GMT
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
www.mikronauts.com E-mail: mikronauts _at_ gmail _dot_ com
My products: Morpheus / Mem+ / PropCade / FlexMem / VMCOG / Propteus / Proteus / SerPlug
and 6.250MHz Crystals to run Propellers at 100MHz & 5.0" OEM TFT VGA LCD modules
Las - Large model assembler Largos - upcoming nano operating system
Thanks
Generally I personally use PASM for cog-only code, and for me, LMM means large model code (hand or compiler generated)
Richard pointed that out - and I am glad he is utilizing it!
You are entirely correct, it is a *LOT* of extra work. For best code generation, you have to speculatively compile, throw the result away once you exceed the FCACHE buffer size, etc. PropellerBasic is intended to do this; however, consulting and new products have stalled PropellerBasic.
Here I (respectfully) disagree.
FCACHE saves having to burn an extra cog; and doing the initialization of the routine, and switching to FCACHE for the inner loops, is the biggest win.
Until Chip blesses us with 16-32 cog Props, I am a founding member of the "Save the Cogs Foundation"!
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
How do you tell the compiler what parts of the C code you want FCACHE'd? Is it automatic, or via something like an 'inline' hint to the compiler?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I very much look forward to seeing FCACHE fully implemented in PropellerBasic!
I don't want to sideline my own thread, but I have previously run through a bunch of typical C programs (BASIC may be different) looking for instances where caching things (such as the inner loop of a larger function) would be worth even a fairly modest FCACHE overhead. I excluded driver-type C code because (at least on the Propeller) these will continue to be written in assembly for the foreseeable future anyway.
The few examples I could find were nearly always completely artificial (e.g. benchmarks designed to test the efficiency of register variables) or due to poor programming - including my own! - such as hand-writing C code to do things better done using a library function (e.g. using a 'for' loop to find a character in a string). Also, unless the code deals only with in-cog data (I never found a case of this outside driver code) then the speed up from FCACHE is fairly modest anyway - i.e. around two times normal LMM PASM speeds.
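As an illustration of the "hand-written loop versus library function" case mentioned above, here is a minimal sketch. The function names `find_char_loop` and `find_char_lib` are made up for this example; `strchr` is the standard C library function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-written search: the kind of 'for' loop described above. */
const char *find_char_loop(const char *s, char c)
{
    assert(s != NULL);

    for (; *s != '\0'; s++) {
        if (*s == c) {
            return s;
        }
    }
    /* Match strchr(): searching for '\0' finds the terminator. */
    return (c == '\0') ? s : NULL;
}

/* Library version: a single call, which the library implementer is free
 * to hand-optimize once for every caller. */
const char *find_char_lib(const char *s, char c)
{
    assert(s != NULL);
    return strchr(s, c);
}
```

Both functions return the same pointer for the same arguments, so the loop buys nothing except a candidate for caching that the library call would not need.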
I agree FCACHE is a very useful technique for hand-coded LMM PASM, but it is not meaningful to try and compare what you can achieve in hand-crafted PASM (or LMM PASM) with the LMM PASM generated from a high-level language compiler. I spent years programming in assembly languages, and I've never seen a compiler that could come anywhere close to generating the kind of code we routinely used to save time and space (and of course we thankfully moved to using high-level languages as soon as computers got large enough and fast enough to do so!).
Which brings me back to the original point of this thread - if we want high-level languages on the Propeller (and of course I think we do!) then LMM speeds (and LMM code sizes) are pretty much what we should expect to get. We seem to focus too much on what is possible with PASM within a cog on the Prop (which is amazing!) and forget that there is much more to most real-world applications than that!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
LMM is a great invention, and it has transformed the Prop. It has made possible large C, PropBASIC and PASM programs. It would be nice to squeeze every possible MIP out of the Prop, but for most applications 160 MIPS is not really required. C runs well on the Prop, but it could be slightly improved with a few extra instructions, such as an auto-increment feature and randomly addressable Cog memory.
I am somewhat interested in your EMM EEPROM solution. Have you done any performance comparisons with that? Maybe you can summarize how that works. Is it cached for example?
Cheers,
--Steve
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Propeller Pages: Propeller JVM
I don't know that size is really an issue - I use LMM C for projects that are simply not practical at all in SPIN - either because SPIN is too slow, or because SPIN lacks floating point (except with a heap of ugly messing around) or because the algorithm I need to use I can already get off the net fully written and tested in C, and I don't have to try and rewrite it (and then debug it!) in SPIN.
As to EMM - this is really a load option, not a run-time option. EMM does not change the execution model, and EMM programs still use the LMM kernel.
What EMM programs do is load the execution environment from the first 32k of EEPROM (i.e. all the device drivers, plugins and the kernel itself), and when all that has been initialized then the program code itself is loaded from the second 32k (EMM programs require a 64k EEPROM).
When you use a normal load process, up to 16k (worst case) of your potentially usable Hub RAM may actually be occupied by drivers and plugins that are not needed after initialization time. Even a small LMM program might waste 4kb to 8kb on drivers and plugins, but with EMM the program can actually use the full 32kb of hub RAM for compiled C code (of course, you still need some stack space).
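The hub RAM budget described above can be written down as a small calculation. This is only a sketch: the function name is hypothetical and the 1 KB stack reservation in the example is an assumption, not a Catalina default.

```c
#include <assert.h>

#define HUB_RAM_BYTES (32u * 1024u)   /* Propeller 1 hub RAM */

/*
 * Hub RAM left for compiled C code.
 *
 * resident_overhead: bytes still occupied by driver and plugin images that
 *                    were only needed at initialization time (0 for EMM)
 * stack_bytes:       bytes reserved for the stack (caller's estimate)
 */
unsigned hub_bytes_for_code(unsigned resident_overhead, unsigned stack_bytes)
{
    assert(resident_overhead + stack_bytes <= HUB_RAM_BYTES);
    return HUB_RAM_BYTES - resident_overhead - stack_bytes;
}
```

With 8 KB of leftover driver images and a 1 KB stack, a normal load leaves `hub_bytes_for_code(8192, 1024)` = 23552 bytes for code, while the EMM case `hub_bytes_for_code(0, 1024)` leaves 31744 bytes.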
Also, as I said earlier in this thread, I plan to add the option of using one or more "library cogs" to Catalina - when I do that I estimate I could save between 4k and 8k of space currently occupied by various parts of the C library (by putting equivalent functions into a library cog). And they will execute faster as well!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Maybe it should be a run time option. With an AT24C1024B or equivalent, that gives 128KB of flat addressable memory. It would be slower than parallel data XMM, but not so bad with a cache and most of HUB RAM not used for stack or cache could be used for video or whatever. Of course your memory model might not be flexible enough to allow a separation of text/data like GCC. Does Catalina build with Linux? I finally gave Vista the boot.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Catalina supports the usual memory segments common to nearly all compilers (the names sometimes change):
Here's a description of the available segment layouts:
- -x0 segments are arranged as: Code, Cnst, Init, Data all in Hub RAM. This is LMM mode.
- -x1 segments are arranged as: Code, Cnst, Init, Data all in Hub RAM (same as -x0). This is the current EMM mode.
- -x2 segments are arranged as: Cnst, Init, Data in Hub RAM, with Code in External RAM. This is XMM (SMALL) mode.
- -x5 segments are arranged as: Code, Cnst, Init, Data in External RAM, with only the stack in hub RAM. This is XMM (LARGE) mode.
You could use -x2 to do what you want, or implement -x3 (not shown because I never fully implemented it, but it is actually "Init & Data in Hub RAM with Code & Cnst in External RAM"). I never thought anyone would want a memory model SLOWER than XMM, but if you wanted to execute code direct from EEPROM using layout -x2 it would be fairly trivial to implement the necessary routines to fetch from EEPROM instead of XMM RAM.
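For reference, the four layouts listed above can be captured in a small lookup table. This is purely an illustrative data structure built from that list; the type and function names are hypothetical and are not part of Catalina itself. The unfinished -x3 layout is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

enum location { HUB_RAM, EXTERNAL_RAM };

struct layout {
    int           x_option;              /* the N in -xN */
    const char   *mode;                  /* mode name as given above */
    enum location code, cnst, init, data;
};

static const struct layout layouts[] = {
    { 0, "LMM",         HUB_RAM,      HUB_RAM,      HUB_RAM,      HUB_RAM      },
    { 1, "EMM",         HUB_RAM,      HUB_RAM,      HUB_RAM,      HUB_RAM      },
    { 2, "XMM (SMALL)", EXTERNAL_RAM, HUB_RAM,      HUB_RAM,      HUB_RAM      },
    { 5, "XMM (LARGE)", EXTERNAL_RAM, EXTERNAL_RAM, EXTERNAL_RAM, EXTERNAL_RAM },
};

/* Returns the layout for -xN, or NULL if N is not one of the listed options. */
const struct layout *find_layout(int x_option)
{
    size_t i;
    for (i = 0; i < sizeof layouts / sizeof layouts[0]; i++) {
        if (layouts[i].x_option == x_option) {
            return &layouts[i];
        }
    }
    return NULL;
}

/* Where the Code segment lives for a listed layout. */
enum location code_location(int x_option)
{
    const struct layout *l = find_layout(x_option);
    assert(l != NULL);
    return l->code;
}
```

Looking up `find_layout(2)` shows the point of the SMALL mode at a glance: only the Code segment moves to external RAM, so data accesses stay at hub speed.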
Ross.
P.S. I recently benchmarked -x2 on Cluso99's RamBlade, and it gets 333 kwips (thousand single-precision floating point whetstones per second) using one cog. This is a similar speed to the original MicroVax 1.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Ross, it's automatic. We did it more than a year ago now so details are hazy, but we detect loops and do the magic. Remember, we don't generate asm code immediately. We generate pseudo-code and then optimize on that so we can detect and rewrite things easily.
No offense, this is why I tend not to compare ICC with Catalina C. I have been writing compilers since '85 and ImageCraft has been in business since '94. We can always do better in compiler code generation of course but ICC is way beyond LCC. We do global register allocation, we do live analysis etc. There is nothing that can be done in base LCC that we cannot do, while we can do way beyond base LCC - if we choose to. You want smaller size? We can do whole program code compression that identifies common code sequences and squeezes them into subroutines. Most of these are common code and they are implemented on most of our compilers.
Anyway, Catalina serves its functions and no doubt has its fans. It's free so it's immediately more appealing to the hobbyists and of course you have shown that it is quite robust, able to handle large programs. Best of luck to you.
Don't worry - I have no intention of trying to implement FCACHE!
I've acknowledged previously that ICC's code generation is more advanced than Catalina's, and often (but not always) generates faster code. However, Catalina now tends to generate smaller code (especially when using the new code optimizer) and can also compile programs of arbitrary size and complexity - so we may end up with the situation that if you want small size for arbitrarily large programs use Catalina, but if you want faster small programs use ICC. Or (soon!) if you want super fast but super small programs, use Tiny C!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I agree that all programs are mostly loops - and I think this is even more true of embedded programs. But my experience is that (outside driver code) these loops are simply too large to be effectively "FCACHE'd" in only a small number of longs. If the FCACHE could be larger and cache several entire functions at once it would be very effective. Then it would behave more like a level 1 cache - but most L1 caches are many times larger than the entire Propeller RAM!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Actually some of us do. Generally XMM solutions involve attaching some RAM chip to the Prop and eating a lot of pins. The extreme example is Cluso's TriBlade Blade #2 RAM solution, where there are only just enough pins left for a serial connection and the SD card has to share pins, which is a pain.
This means that you can't use most of your lovely COGs because they have nowhere to go.
This means you may as well be using some other chip to run your C code on one CPU. It would be simpler and faster.
The solution is to use a serially attached store for all that code that won't fit in HUB. EEPROM or SPI RAM or whatever. Slower but conserves pins and frees up COGS for the fast stuff.
I imagine this will work best with Bill Henning's VMCog. The upside of which is:
1) Allows a lot of code to be kept out of HUB.
2) Frees up most pins normally used for ext RAM.
3) Which in turn allows one to make use of more COGs.
4) VMCog takes care of the hardware so you don't have to produce a different LMM kernel for each new hardware solution. Just let VMCog handle it.
5) Should be faster than direct access what with all that block access to the memory device and caching in VMCog.
6) Is flexible, one can run two or more serial devices in parallel for speed, just let VMCOG handle it.
The down side is:
1) Slower execution compared to a fast XMM RAM interface.
2) Eats one COG for VMCog, but may allow better use of the remaining 6 COGs.
3) Eats a handful of 512 byte blocks in HUB for the VM working set.
For this reason Zog may never contain direct hardware access to ext RAM in its interpreter COG.
Zog already supports a bunch of different XMM solutions, most of which I have never had in my hand, as a result of using VMCog.
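To make the "512 byte blocks" in the downsides list concrete, here is a sketch of how a virtual address splits into a block number and an offset, and what a working set costs in HUB. The names below are hypothetical and this is not VMCog's actual interface; only the 512-byte block size comes from the list above.

```c
#include <assert.h>
#include <stdint.h>

#define VM_PAGE_BYTES 512u            /* block size quoted above */
#define HUB_RAM_BYTES (32u * 1024u)   /* Propeller 1 hub RAM */

/* Which 512-byte block of the external store holds this address. */
uint32_t vm_page_number(uint32_t vaddr)
{
    return vaddr / VM_PAGE_BYTES;
}

/* Byte position of the address inside its block. */
uint32_t vm_page_offset(uint32_t vaddr)
{
    return vaddr % VM_PAGE_BYTES;
}

/* HUB RAM consumed by keeping this many blocks resident. */
uint32_t working_set_bytes(uint32_t resident_pages)
{
    assert(resident_pages <= HUB_RAM_BYTES / VM_PAGE_BYTES);
    return resident_pages * VM_PAGE_BYTES;
}
```

For example, address 0x1234 falls in block 9 at offset 52, and a working set of 8 blocks costs 4096 bytes of HUB.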
Now if only I could figure out how to get GCC's linker scripts to separate out code from data...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I'm not convinced.
Dr_Acula's DracBlade shows that it is possible to have XMM RAM on a fairly simple board with lots of spare pins available for other uses - he even fits a full VGA interface on there! Of course, his XMM RAM is about three times slower than Cluso's RamBlade - but I'll bet it is still faster than either executing code direct from EEPROM or via VMCog.
However, either one would be fairly simple to implement in Catalina if there were sufficient demand. Any volunteers?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
That mythical JOE Propeller user (or more likely Brian) has now been sold on the Prop idea and has a Demo Board in his hands. Or some equivalent. Or if he's like me he just has a DIP prop lashed up with some flying leads on a strip board with some LEDs. There are many such Brians already.
Very soon, being keen, he's run out of program space. Especially if he's using Catalina :)
A serial RAM/EEPROM solution to get that bit more code in is a one hour soldering job and zero software effort if using VMCog. Or even just a two minute job fitting a bigger EEPROM. And it does not bugger up all those other peripherals he has welded onto the Prop pins already.
Now I love the TriBlade and DracBlade solutions. But should Brian have to be tied to one of those? Should Brian have to invest all that time and effort into building something similar?
The DracBlade is a gem. It was designed with a specific purpose in mind, running a Z80 emulation and CP/M. It just happens to be perfect for a lot of other applications that require memory, video, SD card, keyboard, serial I/O.
In summary these ext RAM solutions are complicated and expensive and disruptive of any existing design that may find it needs more program space. The serial solutions are not.
Convinced yet?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"Brian" could buy an iron.
I have even built a "Dracblade" on a bread board, with lead lengths in excess of 8 inches (mind you, I was gobsmacked when that one ran!)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Why did I think a new, more challenging, job was a good idea ??
But you must admit it's a lot more work than a SPI RAM or bigger EEPROM, which is none. When it comes to work I like none :)
And then there's all those extra chips and the size of the thing.
I'm sure serial devices for code have a place. Whether that place is big enough for Ross to expend effort on, I have no idea.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
I'm not really convinced - but when I get some time I'll have a look at VMCog. The "execute direct from EEPROM" solution is quite simple so anyone could do that one (but I'd be hesitant to include it with Catalina as it would be soooooo slooooow).
Much as I may like to believe otherwise, I don't really think "Brian" is going to launch himself straight into Catalina. He's always going to start with SPIN, and only want something else when he goes online (or looks over his mate's shoulder) and sees all those cool things you can do with C on an Arduino.
THEN he's going to see he has to shell out some pocket money on something more than just a Demo board - something like a <<insert your favorite Prop platform here>> - and THEN he's going to use Catalina!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Did you notice how I politely ignored that jibe about Catalina code size?
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
So rather than move up to a bigger aircraft that actually does what he needs, he straps it to a bathtub, bolts water wings to the propeller and tries to turn it into a flying ship only to find that he's pushing to make 5 knots into a mild headwind and all hopes of again leaving the water are completely dashed.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
"I mean, if I went around sayin' I was an emperor just because some moistened bint had lobbed a scimitar at me they'd put me away!"
That's not a nice way to talk about the Prop II
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
BradC: That's not a very nice way to talk about the DracBlade :)
(There now, I will have upset Dr Acula).
Both of you: I'm not describing the situation of Brian upgrading to a TriBlade or DracBlade or whatever when he hits the RAM end stops.
No. My mythical Brian, or any other Propeller project developer has just got his head around the Prop. He's developed his own idea of a super wonder gadget, surrounding his Prop with all kinds of peripherals. He might have a board designed around it even. Anyway a lot of effort and time has gone into that so far.
Now he hits the RAM limit. Or he dreams up some new cool feature that just won't fit. What to do?
Upgrade to a Propeller + "bath tub" solution? No he can't fit his existing hardware design around that.
Upgrade to some other processor that has more RAM? No. He likes the Prop's interrupt-free, simple to use environment and does not want to recreate all his code. Besides, where do you connect all the gadgets on that other chip?
Well, if he has a COG free and a couple of pins he can start adding code in C.
Is that a common scenario? No idea.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Actually, at this point he'll probably just buy an Arduino and not look back...
This isn't to discredit the work that has been done in porting C to the Propeller, which I think is great (for all of the projects). It's just that the Propeller options are a lot harder to get started with than Arduino (IMO).
I'm not sure what it is really. I think that it's a combination of several factors... installation issues, documentation issues, lack of sample code, etc. The Arduino IDE, while simple, is dirt easy to install, and using the Arduino "Wiring" language, the code to blink an LED is...
Okay, so this isn't exactly the same as firing up AVR Studio and programming directly against the ATMega. But that admittedly more advanced option is available if and when it's wanted or needed. However, as the Arduino IDE is basically a front end to GCC, adding user defined functions to a "sketch" is easy and straightforward...
So, given the overall structure of an Arduino sketch (setup & loop functions required), there is still a lot of C programming that can be done before needing to get really dirty. I know that some people tend to slam the Arduino because it's not a Propeller, or because it was designed for artists instead of engineers, but honestly, it is much more accessible to the average Joe than the Propeller is. I don't think it's quite at the BS2 level yet, but it's fast approaching it.
So how does all of this relate to this thread? The fastest way (IMO) to get people using C on the Propeller is to basically copy the Arduino. Port the libraries, and make the install/compile/upload cycle as easy as it is for the Arduino, and keep the advanced stuff there for those that want it.
Oh yeah, and start making hardware devices USB powered, because for some reason, every Parallax-ish development board seems to use a different wall wart...
Post Edited (Kevin Wood) : 7/15/2010 3:49:09 PM GMT
Agree 100% - see my post in Cluso's thread http://forums.parallax.com/showthread.php?p=922576
@heater,
That's when you shell out US$50 on my new Catalina Code Optimizer! Shrinks code size by 10% and improves performance by 5 to 15%! Guaranteed or your money back!
Ross.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Where is this Arduino library and "sketch" stuff documented? I might want to make Zog have a lookalike library. See how much Arduino code can be supported on the Prop with it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔