PropellerGCC CMM Preview

jazzed · 2012-09-06 12:28

Hi All.

PropellerGCC CMM preview packages are beginning to show up in the SimpleIDE downloads page.

Click here to download the Windows package. Other packages will appear throughout the day.

Planned Packages

Windows
MacOSX x86-64 bit
Debian Linux i686 32bit (Debian, Mint, Ubuntu)
Debian and Fedora x86_64
SUSE x86_64 (possibly)

If you need a special build, post it here.

Below are notes in the download description. Please read them if you plan to try the package.

This is a Windows Propeller-GCC Compressed Memory Model (CMM) preview test package. Unzip the package and double-click the setup program to install.

The package contains the PropellerGCC tool chain and SimpleIDE. SimpleIDE has enhancements for choosing CMM program type and using the tiny library with various memory models.

Programs compiled with CMM are about 41% smaller than a comparable LMM program of 10KB or more. CMM programs run about 37% slower than comparable LMM programs. Smaller LMM programs are not compressed as much because the GCC "kernel" interpreter is loaded with the main program to allow running C functions in multiple Propeller COGs.

There are no SPIN enhancements in SimpleIDE. Such enhancements are currently the subject of internal discussions with Parallax. The installer still requires running as admin by default.

Notes:

* A new __attribute__((fcache)) has been added to let the programmer force small loops to be loaded into the kernel COG so they run at COG speed. It is available in any memory model, including CMM, which is smaller but also slower than LMM.

* Please ensure that binary libraries are compatible with the memory model selected. For example, CMM code and XMM code are not compatible with LMM libraries. Propeller-GCC libraries are provided in all flavors and are automatically selected by the compiler when -m$MODEL is given on the command line.

* The default memory model for gcc is still LMM. To use CMM pass -mcmm on the gcc command line.

* The tiny library has been added to propeller-gcc and is built for all memory models. There are important caveats in using the tiny library:

-- Do not use the tiny library with programs using FILE* pointers such as passed to functions like fprintf, fread, etc....
-- Do not use the tiny library with programs using floating point.
-- Do not use the tiny library with -Dprintf=__simple_printf or programs that use #define printf __simple_printf.
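
As a minimal sketch of the fcache note above, a small loop function could be marked as follows. Only `__attribute__((fcache))` comes from this thread; the `FCACHE` macro, the `BUILD_FOR_PROPELLER` switch and the `sum_bytes` function are hypothetical names used for illustration, and whether a given function actually fits in the cache is something to check on the target.

```c
#include <stddef.h>
#include <stdint.h>

/* BUILD_FOR_PROPELLER is a hypothetical macro you would define yourself
 * (for example with -D) when compiling with propeller-elf-gcc. On any
 * other compiler the attribute is dropped so the sketch still builds. */
#ifdef BUILD_FOR_PROPELLER
#define FCACHE __attribute__((fcache))
#else
#define FCACHE
#endif

/* A small, tight loop: the kind of function the note suggests marking. */
FCACHE uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    while (len--)
        sum += *buf++;
    return sum;
}
```

The macro keeps the attribute in one place, so the same source can be built with and without it when comparing size and speed.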

See issues pages for PropellerGCC and SimpleIDE for known bugs or enhancement requests being considered. Several issues have been resolved but not verified.

Thanks,
--Steve

jazzed · 2012-09-06 12:41

PropellerGCC demos now include SimpleIDE project files where applicable.

http://code.google.com/p/propgcc/downloads/detail?name=propellergcc_v0_3_5_demos.zip&can=2

jazzed · 2012-09-06 12:42

Here are some statistics. These tables compare the much-maligned FIBO program's size and performance on the C3 board (80 MHz) using non-FCACHE flags. GCC was set to -Os (optimize for size) in all cases. There are better benchmarks than FIBO, but it is the simplest for these comparisons. Other comparisons can be made, and the C3Files program, which is a more real-life application, has already been mentioned (this post was prepared after several replies). Code Bytes is the total program code size. xSize and xSpeed are multiples for the Compiler Memory Model -vs- SPIN. The notes below explain the differences in the table entries.

Comparable SPIN TinyLib GCC Sizes¹

Compiler Memory Model  Boardtype  Code Bytes  FIBO(26) ms  xSize -vs- SPIN  xSpeed -vs- SPIN
SPIN                   HUB              2888        10056             1.00               1.0
GCC XMM-SPLIT          C3               4328         7425             1.50               1.4
GCC XMMC               C3F              4172         6560             1.44               1.5
GCC CMM                HUB              3292         4792             1.14               2.1
GCC LMM                HUB              4364         1767             1.51               5.7

Comparable SPIN Default GCC Sizes²

Compiler Memory Model  Boardtype  Code Bytes  FIBO(26) ms  xSize -vs- SPIN  xSpeed -vs- SPIN
SPIN                   HUB              2888        10056             1.00               1.0
GCC XMM-SPLIT          C3              14896         7425             5.16               1.4
GCC XMMC               C3F             13920         6560             4.82               1.5
GCC CMM                HUB              8356         4792             2.89               2.1
GCC LMM                HUB             14104         1767             4.88               5.7

TinyLib GCC and SPIN Sizes³

Compiler Memory Model  Boardtype  Code Bytes  FIBO(26) ms  xSize -vs- SPIN  xSpeed -vs- SPIN
SPIN                   HUB               904        10056             1.00               1.0
GCC XMM-SPLIT          C3               4328         7425             4.79               1.4
GCC XMMC               C3F              4172         6560             4.62               1.5
GCC CMM                HUB              3292         4792             3.64               2.1
GCC LMM                HUB              4364         1767             4.83               5.7

Default GCC and SPIN Sizes⁴

Compiler Memory Model  Boardtype  Code Bytes  FIBO(26) ms  xSize -vs- SPIN  xSpeed -vs- SPIN
SPIN                   HUB               904        10056             1.00               1.0
GCC XMM-SPLIT          C3              14896         7425            16.48               1.4
GCC XMMC               C3F             13920         6560            15.40               1.5
GCC CMM                HUB              8356         4792             9.24               2.1
GCC LMM                HUB             14104         1767            15.60               5.7

1. Comparable: This case compares SPIN and GCC on level ground. That is, both SPIN and GCC program types (except GCC COG mode) require an interpreter kernel to work. In the case of SPIN, the interpreter is kept in the Propeller ROM and is about 1984 bytes, so for an apples-to-apples comparison 1984 is added to the 904 bytes of the FIBO Spin program to get 2888. One could remove the kernel using a special loader to save memory and lose parallelism.

In the case of non-COG GCC, the interpreter is kept in HUB RAM. Of course SPIN can start a function in a new COG using its built-in interpreter ROM to allow multi-processor programs. For GCC to allow multi-processor programs, the interpreter must be kept somewhere too, and that is in HUB RAM by default. A COGNEW instruction can read from HUB RAM or ROM only.

SPIN does not by default include support for floating point or a FILE* handle library, so the Tiny library, which offers neither, is used in this comparison.

It is possible to throw away the GCC kernel and other tricks by using a two stage loader, but that would essentially make Propeller into a single core processor machine that happens to have peripherals running in other COGs. As we have been reminded, that is undesirable.

Just for completeness, it should be noted that the GCC COG fibo program does not use an interpreter, and completes FIBO(26) in 486 ms with a program that is 1904 bytes. Of course hand-crafted PASM would be better, but the programmer army is pretty small for that. GCC COG programs are practical for hardware devices, but there is not much room for anything else.

2. This is similar to 1) above except that the library provides for using FILE* handles and floating point.

3. This is a direct comparison where GCC uses the Tiny library as in 1) above except that the SPIN interpreter is removed from the SPIN program size.

4. This is a default comparison in that SPIN and GCC are used as the tools are provided by default.
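
To make the table arithmetic explicit, here is a short sketch that recomputes the xSize and xSpeed columns from the Code Bytes and FIBO(26) ms figures. The numbers are copied from the tables; the function and macro names are made up for illustration.

```c
#include <stdio.h>

/* Figures copied from the tables above. */
#define SPIN_PROGRAM_BYTES 904    /* FIBO Spin program alone */
#define SPIN_INTERP_BYTES  1984   /* Spin interpreter held in ROM */
#define SPIN_FIBO_MS       10056  /* Spin FIBO(26) run time */

/* Size multiple relative to Spin. Pass 1 to count the ROM interpreter
 * (the "comparable" tables) or 0 to leave it out (the direct tables). */
double x_size(int gcc_bytes, int count_spin_interpreter)
{
    int spin = SPIN_PROGRAM_BYTES +
               (count_spin_interpreter ? SPIN_INTERP_BYTES : 0);
    return (double)gcc_bytes / spin;
}

/* Speed multiple relative to Spin: larger means faster than Spin. */
double x_speed(int gcc_ms)
{
    return (double)SPIN_FIBO_MS / gcc_ms;
}

/* Recompute the GCC CMM row built with the Tiny library. */
void print_cmm_tiny_row(void)
{
    printf("CMM tiny: xSize %.2f (comparable), %.2f (direct), xSpeed %.1f\n",
           x_size(3292, 1), x_size(3292, 0), x_speed(4792));
}
```

Calling `print_cmm_tiny_row()` from a `main` should reproduce the 1.14, 3.64 and 2.1 entries shown for CMM.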

Rsadeika · 2012-09-06 14:02

I just loaded and ran the new SimpleIDE, so far so good. There has been no application crash or freeze up, but I have only run the default "Hello World". Just a simple question about that, when I change the program to:

/*
 * This is a non-traditional hello demo for Propeller-GCC.
 * The demo repeats printing every 100ms with the iteration.
 * It uses waitcnt instead of sleep so it will fit in a COG.
 */
#include <stdio.h>
#include <propeller.h>

int main(void)
{
//    int n = 1;
//    while(1) {
//        waitcnt(CLKFREQ/10+CNT);
//        printf("Hello World %d\n", n);
      waitcnt(CLKFREQ/10+CNT);
      printf("Hello World\n");
//        n++;
//    }
    return 0;
}

I do not get anything on the console window.

Now I know you are featuring CMM with this release, but what does that do for me? I have seen it talked about in other threads, but it is still vague to me as to how I am supposed to appreciate this enhancement. Not sure what to make of this.

As for Tiny lib, there was mention that it contains cout/cin, is that still valid? If it is, does it just work with C++ mode?

Ray

ersmith · 2012-09-06 14:10

Rsadeika wrote: »

Now I know you are featuring CMM with this release, but what does that do for me? I have seen it talked about in other threads, but it is still vague to me as to how I am supposed to appreciate this enhancement. Not sure what to make of this.

CMM stands for "compressed memory model"; in this mode the Propeller instructions that GCC creates are compressed. Basically it makes programs smaller, so that more code fits in the memory. The tradeoff is that the code is slower. Steve will probably be posting detailed comparisons, but very roughly the CMM code is about half the size (except that there's a fixed size for the "kernel") and half the speed of corresponding LMM code, depending on the exact functions and the compiler optimization options given.

I don't know why you're having trouble with the demo, but I checked and it works for me on the command line. What memory model are you building in? What options are given to the compiler? Are you using the tiny library?

Eric

Dave Hein · 2012-09-06 14:20

Ray, you commented out the loop in your version of Hello World, so it is only printing once. The program is waiting one-tenth of a second before it prints, so the print may not be making it to the screen in time.

CMM reduces the size of programs so that they have a better chance of fitting in hub RAM. I recompiled the c3files demo program with CMM and it reduced the binary down from 29,920 bytes with LMM to 17,076 with CMM. The vgademo program must be built as an XMMC program, but I can almost get it to fit in 32K using CMM. I think I can get a subset of the vgademo running under CMM.

Rsadeika · 2012-09-06 14:30

When I run the default "Hello World", I get the continuous stream of "Hello World" in all memory models for the QuickStart board. When I changed it to display "Hello World" just one time, as in the code I posted, I get nothing on the screen. So, basically, by taking it out of the while loop, there is nothing showing up on the screen. I hope that explains it a little better.

Dave Hein wrote: »

Ray, you commented out the loop in your version of Hello World, so it is only printing once. The program is waiting one-tenth of a second before it prints, so the print may not be making it to the screen in time.

But why does the same time length work for the original program? I tried increasing the time length and it does not change anything.

Ray

jazzed · 2012-09-06 14:36

Ray, I use a one second delay and the "hello world" output shows no problem.

Regarding tiny cout, I've run the libtiny_tests in the propellergcc-demos package and it seems to work pretty well, but it still follows the caveats listed in the notes. The cout stream and friends are only valid in C++. You must #include <tinyiostream> to use the tiny library cout stream features.

The notes mentioned the CMM size and performance relative to LMM. The benefit is that programs like the calculator and filetest now have room to grow.

--Steve

Dave Hein · 2012-09-06 14:39

With the original hello.c code I don't see the first print, but I do see the prints after that. If I change the wait to clkfreq/6 I see the first print.

If I comment out the code like you did I still see the print with clkfreq/6. I don't see it if I use clkfreq/7.
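
For reference, the delays being compared can be worked out as below. This is only a sketch: the 80 MHz clock is an assumption (it is the C3 figure quoted earlier in the thread, and the board used here is not stated), `delay_ms` and `hello_once` are hypothetical names, and `BUILD_FOR_PROPELLER` is a macro you would define yourself when building for the target.

```c
#include <stdint.h>

/* Milliseconds spent in waitcnt(CLKFREQ/divisor + CNT), rounded down.
 * clkfreq_hz is the system clock in Hz. */
uint32_t delay_ms(uint32_t clkfreq_hz, uint32_t divisor)
{
    /* Same integer division as the C expression CLKFREQ/divisor. */
    uint32_t ticks = clkfreq_hz / divisor;
    return (uint32_t)(((uint64_t)ticks * 1000u) / clkfreq_hz);
}

#ifdef BUILD_FOR_PROPELLER   /* hypothetical macro, defined by you with -D */
#include <stdio.h>
#include <propeller.h>

/* Print once after a one-second pause, the delay Steve reports using. */
void hello_once(void)
{
    waitcnt(CLKFREQ + CNT);
    printf("Hello World\n");
}
#endif
```

At an assumed 80 MHz, `delay_ms` gives 100 ms for CLKFREQ/10, 166 ms for CLKFREQ/6 and 142 ms for CLKFREQ/7, so the first print in these tests only appeared once the delay was somewhere above roughly 143 ms.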

EDIT: BTW, the Spin version of filetest compiles to 17,116 bytes, so it's virtually the same size as CMM. If I use the -O cgru flags with BST it does get it down to 15,660 bytes, but CMM is still very close to the size of Spin.

Heater. · 2012-09-06 17:03

Oh yeah,

Another 20 hours compiling all this for the Raspberry Pi.

Impressive work guys.

ersmith · 2012-09-06 17:57

Something else that may not be apparent from Steve's table is that CMM offers a wider range of tradeoffs in speed versus space, depending on what compiler flags are used. For example, consider the xxtea demo. In the table below "size" is the size of the btea function in bytes, and "cycles" is the number of cycles to decode the sample string. xSize is the size relative to Spin, and xSpeed is the speed relative to Spin. As you can see, the -Os (optimize for size) and -O2 (optimize for speed) flags produce very different results in CMM, mainly because in -O2 the fcache is enabled. In LMM there is very little difference, because fcache is enabled in both -Os and -O2 (it's disabled in CMM -Os because instructions that go in fcache cannot be compressed).

xxtea is an extreme example, because it is so heavily compute bound, but in general the -O2 option in CMM will offer you somewhat faster and larger code, whereas -Os will offer smaller code. The difference between the two is minimal in LMM (and indeed -Os is often faster than -O2).

xxtea results
               size    cycles  xSize  xSpeed
Spin bstc:      326   1044016   1.00   1.00
GCC CMM -Os:    320    322880   0.98   3.23
GCC CMM -O2:    528     59664   1.62  17.50 
GCC LMM -Os:    696     20944   2.13  49.85
GCC LMM -O2:    808     21408   2.48  48.77

Heater. · 2012-09-06 18:53

I'd say that over 3 times the speed and even a tiny bit smaller code than Spin is pretty impressive.
As is 17 times the speed for 1.6 times bigger code.

I presume you have missed out the CMM kernel size in the above table though.

Can we mix CMM and LMM code (running on different COGs)? Would we want to? We would then have two kernels worth of baggage in the binary.

Circuitsoft · 2012-09-06 21:09

Can't the CMM kernel handle uncompressed code, too?

ersmith · 2012-09-06 21:59

Heater. wrote: »

I presume you have missed out the CMM kernel size in the above table though.

The size is the size of the btea function (only), without any runtime or kernel. That seems to be the standard way we've reported the xxtea size in similar benchmark threads.

Can we mix CMM and LMM code (running on different COGs)? Would we want to? We would then have two kernels worth of baggage in the binary.

I guess in theory it's possible, but it's not something the compiler and linker are set up to do right now -- you'd have to write a custom linker script to do it. Would you want to? It seems unlikely -- CMM compiled with -O2 will usually be fast enough, and if it's not then you can declare small functions with __attribute__((fcache)) to ensure they always run from COG memory, or write COG C code.

ersmith · 2012-09-06 22:12

Circuitsoft wrote: »

Can't the CMM kernel handle uncompressed code, too?

No. It parses compressed instructions only. We could certainly add a simple LMM loop to the kernel, but then we'd have to have a mode switch to say when to use it, and keeping track of switching between compressed and uncompressed mode could be difficult.

All PASM instructions can be represented in the compressed code, and in fact the mapping is pretty direct. I need to finish writing up the documentation, but basically there are 5 cases:

(1) One of the 16 most common arithmetic instructions (add, sub, or, xor, etc.) with destination one of the 16 C registers and source either a C register or an immediate; the encoding is either 2 bytes (register/register or register/4 bit immediate) or 3 bytes (register/9 bit immediate).

(2) Moves: there are 4 move immediate instructions (long, word, byte, and 0, taking 5, 3, 2, and 1 byte respectively), and a mov register,register (taking 2 bytes). It's also possible to "piggyback" a move onto one of the common instructions in (1) above, at the cost of one extra byte in those instructions.

(3) Branches: there are short relative branches (2 bytes) and long jumps (3 bytes), both of them with a possible condition. There are also some short "skip 2 bytes" and "skip 3 bytes" instructions that can be used to implement conditional execution of cases (1) and (2) above.

(4) Other instructions with no conditional execution restriction: encoded as 4 bytes, basically the same as the original PASM but re-arranged to make parsing possible when we see the first byte.

(5) Special combinations: a few of the more common instruction combinations are encoded as "macros" that expand to 1 or 2 PASM instructions; usually these are calls into the kernel. One special macro is a "native" escape which says to treat the next 4 bytes as a verbatim instruction; this allows us to handle anything that the cases above don't (it's extremely rare).
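
The byte counts in cases (1) to (4) can be tabulated in a few lines of C. This is only a size calculator built from the numbers stated above, not the kernel's actual opcode layout (which the post says is still to be documented); the enum and function names are hypothetical. Skips, macros, piggybacked moves and the native escape are left out because their full sizes are not given here.

```c
#include <stddef.h>

/* Hypothetical labels for the encodings whose sizes the post states. */
typedef enum {
    CMM_ALU_REG_REG,    /* case 1: register/register */
    CMM_ALU_REG_IMM4,   /* case 1: register/4 bit immediate */
    CMM_ALU_REG_IMM9,   /* case 1: register/9 bit immediate */
    CMM_MOV_IMM_LONG,   /* case 2: move long immediate */
    CMM_MOV_IMM_WORD,   /* case 2: move word immediate */
    CMM_MOV_IMM_BYTE,   /* case 2: move byte immediate */
    CMM_MOV_IMM_ZERO,   /* case 2: move 0 */
    CMM_MOV_REG_REG,    /* case 2: mov register,register */
    CMM_BRANCH_SHORT,   /* case 3: short relative branch */
    CMM_JUMP_LONG,      /* case 3: long jump */
    CMM_OTHER           /* case 4: any other instruction */
} cmm_kind;

/* Encoded size in bytes, as listed in the post. */
size_t cmm_size(cmm_kind k)
{
    switch (k) {
    case CMM_MOV_IMM_ZERO:
        return 1;
    case CMM_ALU_REG_REG:
    case CMM_ALU_REG_IMM4:
    case CMM_MOV_IMM_BYTE:
    case CMM_MOV_REG_REG:
    case CMM_BRANCH_SHORT:
        return 2;
    case CMM_ALU_REG_IMM9:
    case CMM_MOV_IMM_WORD:
    case CMM_JUMP_LONG:
        return 3;
    case CMM_OTHER:
        return 4;
    case CMM_MOV_IMM_LONG:
        return 5;
    }
    return 0; /* not reached for valid kinds */
}

/* Total encoded size of a sequence of instructions. */
size_t cmm_total(const cmm_kind *seq, size_t n)
{
    size_t bytes = 0;
    for (size_t i = 0; i < n; i++)
        bytes += cmm_size(seq[i]);
    return bytes;
}

/* Example: clear a register, add a small constant, add a register, loop. */
size_t cmm_example_total(void)
{
    static const cmm_kind seq[] = {
        CMM_MOV_IMM_ZERO, CMM_ALU_REG_IMM4, CMM_ALU_REG_REG, CMM_BRANCH_SHORT
    };
    return cmm_total(seq, sizeof seq / sizeof seq[0]);
}
```

The example sequence comes to 7 bytes, against 16 bytes for four uncompressed 4-byte PASM instructions, which gives a feel for where the size savings come from.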

Rayman · 2012-09-09 07:51

Does the size of a.out tell you how much eeprom space the program will take in CMM and LMM mode?
If so, my first test doesn't show much of a size reduction... I used spin2cpp to convert the PTP2 ScaleDemo example to C++.
In LMM mode it comes out to 20kB and in CMM mode it comes out to 17kB.

Perhaps a lot of the size is the interpreter and the assembly driver, but I'd think that should only account for 4kB of uncompressible code...
The Spin version is 4kB total size...

ersmith · 2012-09-09 08:32

Rayman wrote: »

Does the size of a.out tell you how much eeprom space the program will take in CMM and LMM mode?

No. The a.out file includes a symbol table, debugging information, ELF header, and other things that are never loaded into the Propeller and are just used by PC tools.

If you strip the a.out with propeller-elf-strip you'll get closer to the final load size, but it still will have some headers and things that are never downloaded. The way to tell the actual download size is to look at the propeller-load output, or else to run "propeller-load -s a.out" to produce a binary file "a.binary", which will just have the things (both code and data) that are downloaded to the Propeller.

Rayman · 2012-09-09 08:47

Ok. Thanks.
Was just able to convert another PTP2 demo to C++ and have it run under CMM.
This is "PTP2_Bitmap_ScaleDemo", which is 24,536 bytes in Spin (mostly because of an embedded bmp picture).

Weird thing is that it works only in CMM mode... In LMM mode, I can load and run over serial, but if I try to load eeprom, it doesn't work...
It appears to load and verify, but I get a blank screen when it boots..
Binary in this mode is 30,828 bytes, so you'd think it should work...

I think maybe this LMM I2C driver is too fast or something like that...

jazzed · 2012-09-09 09:18

Rayman wrote: »

I think maybe this LMM I2C driver is too fast or something like that...

Could be. CMM is about 1/3 as fast as LMM.

SimpleIDE gives the Code Size and Total Size (code + data size) of a program.
Propeller-load gives the Code Size of a program.

Rayman · 2012-09-09 09:35

Ok, actually the other demo has the same problem, I just didn't notice because I wasn't cycling power... Seems the lcd initialization (over i2c) doesn't work in LMM mode, but it does work in CMM mode... I've tried adding delays in the code, but that hasn't fixed it yet...

Actually, adding a delay did fix it. But now it seems that this is more a Spin2Cpp question than a CMM question, so I'll ask it in the other thread.

Rayman · 2012-09-09 16:02

CMM mode really helps for the "PTP2_Bitmap_ScaleDemo"...
In LMM mode, the size is 31,980 bytes, which I think is way too close to the limit...
CMM mode brings that down to 28,124, giving enough breathing room to add more code.
So, this is doing pretty well considering a lot of this space is the embedded bitmap (20,278 bytes)...

The SPIN version is 24,536 bytes, so CMM is doing a pretty good job.
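
The effect of the embedded bitmap on these figures can be worked out with a little arithmetic. The sketch below assumes the bitmap occupies the same 20,278 bytes in both builds; what remains still includes the kernel, drivers and other data, so it is not a pure code-size figure. The function name is hypothetical.

```c
/* Size reduction in tenths of a percent (rounded down), after removing
 * bytes that are assumed identical in both builds, such as an embedded
 * bitmap. Pass 0 for fixed_bytes to compare the whole binaries. */
unsigned reduction_tenths(unsigned lmm_bytes, unsigned cmm_bytes,
                          unsigned fixed_bytes)
{
    unsigned lmm = lmm_bytes - fixed_bytes;
    unsigned cmm = cmm_bytes - fixed_bytes;
    return (lmm - cmm) * 1000u / lmm;
}
```

With the sizes quoted in this thread, `reduction_tenths(31980, 28124, 0)` gives 120 (12.0% for the whole binary), `reduction_tenths(31980, 28124, 20278)` gives 329 (32.9% once the bitmap is set aside), and the c3files figures `reduction_tenths(29920, 17076, 0)` give 429 (42.9%).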

Christof Eb. · 2012-09-10 11:50

Wow, CMM is great, propgcc is now really usable.
Congratulations!!!
I love the faster download due to smaller code.
Christof

Rayman · 2012-09-11 16:14

I've started looking at "Graphics Demo" to see if there's any way to make it work in C++...

It compiles under LMM, but in CMM mode, I get:

propeller-elf-c++ -o a.out -Os -mcmm -I . -fno-exceptions -fno-rtti graphics.cpp mouse.cpp tv.cpp Graphics_Demo.cpp
C:\Users\Ray\AppData\Local\Temp\ccz7r0Wf.s:204: Error: value of 131072 too large for field of 2 bytes at 389

Any idea what that might be?

(BTW: If my math is right, then LMM is 8,524 bytes too big to run double buffered...)

jazzed · 2012-09-11 16:42

Rayman wrote: »
I've started looking at "Graphics Demo" to see if there's any way to make it work in C++...

It compiles under LMM, but in CMM mode, I get:
propeller-elf-c++ -o a.out -Os -mcmm -I . -fno-exceptions -fno-rtti graphics.cpp mouse.cpp tv.cpp Graphics_Demo.cpp
C:\Users\Ray\AppData\Local\Temp\ccz7r0Wf.s:204: Error: value of 131072 too large for field of 2 bytes at 389
Any idea what that might be?

(BTW: If my math is right, then LMM is 8,524 bytes too big to run double buffered...)

I've been running LMM double buffered at 12x10 tiles without a mouse since ICC first released their compiler.
Got a zip of your C++ package ?

Rayman · 2012-09-11 17:31

I can believe you can get it double buffered by reducing the resolution from 16x12 to 12x10...
But, I'd really like to see CMM replicate the original's 16x12 tile resolution...

BTW: I'm not really sure what that error message is, but it looks like a CMM compiler bug to the untrained eye

jazzed · 2012-09-11 18:46

Rayman wrote: »

BTW: I'm not really sure what that error message is, but it looks like a CMM compiler bug to the untrained eye

Please post a zip so we can evaluate it.

Rayman · 2012-09-12 02:50

I think I've seen Eric say he's compiled Graphics_Demo with Spin2Cpp, so I think he has this already, but just in case I did something different, here's the stock Graphics_Demo after running through Spin2Cpp. Only change I made was to make it single-buffered by changing "bitmap_base" from $2000 to $5000.

CMM_Bug_Graphics.zip

ersmith · 2012-09-12 04:57

Rayman wrote: »

I think I've seen Eric say he's compiled Graphics_Demo with Spin2Cpp, so I think he has this already, but just in case I did something different, here's the stock Graphics_Demo after running through Spin2Cpp. Only change I made was to make it single-buffered by changing "bitmap_base" from $2000 to $5000.

That's funny, I must have a different version of Graphics_Demo -- mine wasn't triggering this bug (and yes, it is a bug; I'll look into it).

Thanks for the bug report!
Eric

Rayman · 2012-09-12 07:20

It's possible that I used the Graphics_Demo from an old Prop Tool (maybe 1.05). I'm not aware of any changes being made in these objects though...

Rayman · 2012-09-13 11:44

Any fixes for this bug? I just tried again with the Graphics_Demo source from the latest Prop Tool and got the same result...

Did notice though, that if I change to -O1 Mixed it works. Just doesn't work for -Os size or -O2 speed...

BTW: It does look promising for getting this working... The -O1 output is smaller, so now I think I only need about 4kB more to make it work...
I'm pretty sure reusing the drivers cog code space will give me that much back...

ersmith · 2012-09-13 12:35

Rayman wrote: »

Any fixes for this bug? I just tried again with the Graphics_Demo source from the latest Prop Tool and got the same result...

The fix is checked in to the propgcc repository. Making a new binary release is probably going to take a while, though.
