PropellerGCC CMM Preview
jazzed
Posts: 11,803
Hi All.
PropellerGCC CMM preview packages are beginning to show up in the SimpleIDE downloads page.
Click here to download the Windows package. Other packages will appear throughout the day.
Planned Packages
- Windows
- MacOSX x86-64 bit
- Debian Linux i686 32bit (Debian, Mint, Ubuntu)
- Debian and Fedora x86_64
- SUSE x86_64 (possibly)
Below are notes in the download description. Please read them if you plan to try the package.
This is a Windows Propeller-GCC Compressed Memory Model (CMM) preview test package. Unzip the download and double-click the setup program to install.
The package contains the PropellerGCC tool chain and SimpleIDE. SimpleIDE has enhancements for choosing CMM program type and using the tiny library with various memory models.
Programs compiled with CMM are about 41% smaller than a comparable LMM program of 10KB or more. CMM programs run at roughly 37% of the speed of comparable LMM programs (about 2.7 times slower). Smaller LMM programs are not compressed as much because the GCC "kernel" interpreter is loaded with the main program to allow running C functions in multiple Propeller COGs.
There are no SPIN enhancements in SimpleIDE. Such enhancements are currently the subject of internal discussions with Parallax. The installer still requires running as admin by default.
Notes:
* A new __attribute__((fcache)) has been added to let the programmer force loading small loops into the kernel COG to run at COG speed. It is available for any model, including CMM, which is smaller but also slower than LMM.
* Please ensure that binary libraries are compatible with the memory model selected. For example, CMM code and XMM code are not compatible with LMM libraries. Propeller-GCC libraries are provided in all flavors and are automatically selected by the compiler when -m$MODEL is given on the command line.
* The default memory model for gcc is still LMM. To use CMM pass -mcmm on the gcc command line.
* The tiny library has been added to propeller-gcc and is built for all memory models. There are important caveats in using the tiny library:
-- Do not use the tiny library with programs using FILE* pointers, such as those passed to functions like fprintf, fread, etc.
-- Do not use the tiny library with programs using floating point.
-- Do not use the tiny library with -Dprintf=__simple_printf or programs that use #define printf __simple_printf.
See issues pages for PropellerGCC and SimpleIDE for known bugs or enhancement requests being considered. Several issues have been resolved but not verified.
Thanks,
--Steve
Comments
http://code.google.com/p/propgcc/downloads/detail?name=propellergcc_v0_3_5_demos.zip&can=2
Comparable SPIN TinyLib GCC Sizes (note 1)

| Compiler / Memory Model | Board Type | Code Bytes | FIBO(26) ms | Size vs. SPIN (x) | Speed vs. SPIN (x) |
|---|---|---|---|---|---|
| SPIN | HUB | 2888 | 10056 | 1.00 | 1.0 |
| GCC XMM-SPLIT | C3 | 4328 | 7425 | 1.50 | 1.4 |
| GCC XMMC | C3F | 4172 | 6560 | 1.44 | 1.5 |
| GCC CMM | HUB | 3292 | 4792 | 1.14 | 2.1 |
| GCC LMM | HUB | 4364 | 1767 | 1.51 | 5.7 |
Comparable SPIN Default GCC Sizes (note 2)

| Compiler / Memory Model | Board Type | Code Bytes | FIBO(26) ms | Size vs. SPIN (x) | Speed vs. SPIN (x) |
|---|---|---|---|---|---|
| SPIN | HUB | 2888 | 10056 | 1.00 | 1.0 |
| GCC XMM-SPLIT | C3 | 14896 | 7425 | 5.16 | 1.4 |
| GCC XMMC | C3F | 13920 | 6560 | 4.82 | 1.5 |
| GCC CMM | HUB | 8356 | 4792 | 2.89 | 2.1 |
| GCC LMM | HUB | 14104 | 1767 | 4.88 | 5.7 |
TinyLib GCC and SPIN Sizes (note 3)

| Compiler / Memory Model | Board Type | Code Bytes | FIBO(26) ms | Size vs. SPIN (x) | Speed vs. SPIN (x) |
|---|---|---|---|---|---|
| SPIN | HUB | 904 | 10056 | 1.00 | 1.0 |
| GCC XMM-SPLIT | C3 | 4328 | 7425 | 4.79 | 1.4 |
| GCC XMMC | C3F | 4172 | 6560 | 4.62 | 1.5 |
| GCC CMM | HUB | 3292 | 4792 | 3.64 | 2.1 |
| GCC LMM | HUB | 4364 | 1767 | 4.83 | 5.7 |
Default GCC and SPIN Sizes (note 4)

| Compiler / Memory Model | Board Type | Code Bytes | FIBO(26) ms | Size vs. SPIN (x) | Speed vs. SPIN (x) |
|---|---|---|---|---|---|
| SPIN | HUB | 904 | 10056 | 1.00 | 1.0 |
| GCC XMM-SPLIT | C3 | 14896 | 7425 | 16.48 | 1.4 |
| GCC XMMC | C3F | 13920 | 6560 | 15.40 | 1.5 |
| GCC CMM | HUB | 8356 | 4792 | 9.24 | 2.1 |
| GCC LMM | HUB | 14104 | 1767 | 15.60 | 5.7 |
1. Comparable: This case compares SPIN and GCC on level ground. That is, both SPIN and GCC program types (except GCC COG mode) require an interpreter kernel to work. In the case of SPIN, the interpreter is kept in the Propeller ROM and is about 1984 bytes, so for apples-to-apples comparisons 1984 is added to the 904 bytes of the FIBO Spin program to get 2888. One could remove the kernel using a special loader to save memory, at the cost of parallelism.
In the case of non-COG GCC, the interpreter is kept in HUB RAM. Of course SPIN can start a function in a new COG using its built-in interpreter ROM to allow multi-processor programs. For GCC to allow multi-processor programs, the interpreter must be kept somewhere too, and that is in HUB RAM by default. A COGNEW instruction can read from HUB RAM or ROM only.
SPIN does not by default include support for floating point and a FILE* handle library, so the Tiny library, which offers neither, is used in this comparison.
It is possible to throw away the GCC kernel, and use other tricks, with a two-stage loader, but that would essentially make the Propeller into a single-core processor machine that happens to have peripherals running in other COGs. As we have been reminded, that is undesirable.
Just for completeness, it should be noted that the GCC COG fibo program does not use an interpreter, and completes FIBO(26) in 486ms with a program that is 1904 bytes. Of course hand-crafted PASM would be better, but the programmer army is pretty small for that. GCC COG programs are practical for hardware devices, but there is not much room for anything else.
2. This is similar to 1) above except that the library provides for using FILE* handles and floating point.
3. This is a direct comparison where GCC uses the Tiny library as in 1) above except that the SPIN interpreter is removed from the SPIN program size.
4. This is a default comparison in that SPIN and GCC are used as the tools are provided by default.
Now I know you are featuring CMM with this release, but what does that do for me? I have seen it talked about in other threads, but it is still vague to me as to how I am supposed to appreciate this enhancement. Not sure what to make of this.
As for Tiny lib, there was mention that it contains cout/cin, is that still valid? If it is, does it just work with C++ mode?
Ray
CMM stands for "compressed memory model"; in this mode the Propeller instructions that GCC creates are compressed. Basically it makes programs smaller, so that more code fits in the memory. The tradeoff is that the code is slower. Steve will probably be posting detailed comparisons, but very roughly the CMM code is about half the size (except that there's a fixed size for the "kernel") and half the speed of corresponding LMM code, depending on the exact functions and the compiler optimization options given.
I don't know why you're having trouble with the demo, but I checked and it works for me on the command line. What memory model are you building in? What options are given to the compiler? Are you using the tiny library?
Eric
CMM reduces the size of programs so that they have a better chance of fitting in hub RAM. I recompiled the c3files demo program with CMM and it reduced the binary down from 29,920 bytes with LMM to 17,076 with CMM. The vgademo program must be built as an XMMC program, but I can almost get it to fit in 32K using CMM. I think I can get a subset of the vgademo running under CMM.
But why does the same time length work for the original program? I tried increasing the time length and it does not change anything.
Ray
Regarding tiny cout, I've run the libtiny_tests in the propellergcc-demos package and it seems to work pretty well, but the caveats listed in the notes still apply. The cout stream and friends are only valid in C++. You must #include <tinyiostream> to use the tiny library cout stream features.
The notes mentioned the CMM size and performance relative to LMM. The benefit is that programs like the calculator and filetest now have room to grow.
--Steve
If I comment out the code like you did I still see the print with clkfreq/6. I don't see it if I use clkfreq/7.
EDIT: BTW, the Spin version of filetest compiles to 17,116 bytes, so it's virtually the same size as CMM. If I use the -O cgru flags with BST it does get it down to 15,660 bytes, but CMM is still very close to the size of Spin.
Another 20 hours compiling all this for the Raspberry Pi.
Impressive work guys.
xxtea is an extreme example, because it is so heavily compute bound, but in general the -O2 option in CMM will offer you somewhat faster and larger code, whereas -Os will offer smaller code. The difference between the two is minimal in LMM (and indeed -Os is often faster than -O2).
As is 17 times the speed for 1.6 times bigger code.
I presume you have missed out the CMM kernel size in the above table though.
Can we mix CMM and LMM code (running on different COGs)? Would we want to? We would then have two kernels worth of baggage in the binary.
I guess in theory it's possible, but it's not something the compiler and linker are set up to do right now -- you'd have to write a custom linker script to do it. Would you want to? It seems unlikely -- CMM compiled with -O2 will usually be fast enough, and if it's not then you can declare small functions with __attribute__((fcache)) to ensure they always run from COG memory, or write COG C code.
No. It parses compressed instructions only. We could certainly add a simple LMM loop to the kernel, but then we'd have to have a mode switch to say when to use it, and keeping track of switching between compressed and uncompressed mode could be difficult.
All PASM instructions can be represented in the compressed code, and in fact the mapping is pretty direct. I need to finish writing up the documentation, but basically there are 5 cases:
(1) One of the 16 most common arithmetic instructions (add, sub, or, xor, etc.) with destination one of the 16 C registers and source either a C register or an immediate; the encoding is either 2 bytes (register/register or register/4 bit immediate) or 3 bytes (register/9 bit immediate).
(2) Moves: there are 4 move immediate instructions (long, word, byte, and 0, taking 5, 3, 2, and 1 byte respectively), and a mov register,register (taking 2 bytes). It's also possible to "piggyback" a move onto one of the common instructions in (1) above, at the cost of one extra byte in those instructions.
(3) Branches: there are short relative branches (2 bytes) and long jumps (3 bytes), both of them with a possible condition. There are also some short "skip 2 bytes" and "skip 3 bytes" instructions that can be used to implement conditional execution of cases (1) and (2) above.
(4) Other instructions with no conditional execution restriction: encoded as 4 bytes, basically the same as the original PASM but re-arranged to make parsing possible when we see the first byte.
(5) Special combinations: a few of the more common instruction combinations are encoded as "macros" that expand to 1 or 2 PASM instructions; usually these are calls into the kernel. One special macro is a "native" escape which says to treat the next 4 bytes as a verbatim instruction; this allows us to handle anything that the cases above don't (it's extremely rare).
If so, my first test doesn't show much of a size reduction... I used spin2cpp to convert the PTP2 ScaleDemo example to C++.
In LMM mode it comes out to 20kB and in CMM mode it comes out to 17kB.
Perhaps a lot of the size is the interpreter and the assembly driver, but I'd think that should only account for 4kB of incompressible code...
The Spin version is 4kB total size...
If you strip the a.out with propeller-elf-strip you'll get closer to the final load size, but it still will have some headers and things that are never downloaded. The way to tell the actual download size is to look at the propeller-load output, or else to run "propeller-load -s a.out" to produce a binary file "a.binary", which will just have the things (both code and data) that are downloaded to the Propeller.
Was just able to convert another PTP2 demo to C++ and have it run under CMM.
This is "PTP2_Bitmap_ScaleDemo", which is 24,536 bytes in Spin (mostly because of an embedded bmp picture).
Weird thing is that it works only in CMM mode... In LMM mode, I can load and run over serial, but if I try to load eeprom, it doesn't work...
It appears to load and verify, but I get a blank screen when it boots..
Binary in this mode is 30,828 bytes, so you'd think it should work...
I think maybe this LMM I2C driver is too fast or something like that...
Could be. CMM is about 1/3 as fast as LMM.
SimpleIDE gives the Code Size and Total Size (code + data size) of a program.
Propeller-load gives the Code Size of a program.
Actually, adding a delay did fix it. But now it seems that this is more a Spin2Cpp question than a CMM question, so I'll ask it in the other thread.
In LMM mode, the size is 31,980 bytes, which I think is way too close to the limit...
CMM mode brings that down to 28,124, giving enough breathing room to add more code.
So, this is doing pretty well considering that a lot of this space is the embedded bitmap (20,278 bytes)...
The SPIN version is 24,536 bytes, so CMM is doing a pretty good job.
Congratulations!!!
I love the faster download due to smaller code.
Christof
It compiles under LMM, but in CMM mode, I get:
Any idea what that might be?
(BTW: If my math is right, then LMM is 8,524 bytes too big to run double buffered...)
I've been running LMM double buffered at 12x10 tiles without a mouse since ICC first released their compiler.
Got a zip of your C++ package ?
But, I'd really like to see CMM replicate the original's 16x12 tile resolution...
BTW: I'm not really sure what that error message is, but it looks like a CMM compiler bug to the untrained eye
Please post a zip so we can evaluate it.
CMM_Bug_Graphics.zip
Thanks for the bug report!
Eric
Did notice though, that if I change to -O1 Mixed it works. Just doesn't work for -Os size or -O2 speed...
BTW: It does look promising for getting this working... The -O1 output is smaller, so now I think I only need about 4kB more to make it work...
I'm pretty sure reusing the drivers cog code space will give me that much back...