C and C++ for the Propeller 2 — Parallax Forums

C and C++ for the Propeller 2

On Thursday, Aug. 6 at 2pm Pacific time I plan to talk about C and C++ compilers available for the P2. I'll be discussing the following compilers:

Catalina 4.3
- https://sourceforge.net/projects/catalina-c/

fastspin 4.2.7
- https://github.com/totalspectrum/spin2cpp/

p2gcc 0.007 with propeller-elf-gcc 4.6.1 (SimpleIDE version)
- https://github.com/davehein/p2gcc/

riscvp2 (GCC 8.3.0)
- https://github.com/totalspectrum/riscvp2/

The basic outline will be:

- Overview of the 4 compilers
- Advantages and disadvantages of each
- Some benchmark comparisons
- Making code portable between compilers
- Using spin2cpp to convert Spin2 drivers to C code

Obviously since I'm the author of fastspin and riscvp2 you'll have to take my analysis with a grain of salt; clearly I am biased, but I will try to be fair to Catalina and p2gcc, which have some enthusiastic users :).

The benchmarks I was planning to run are:

CoreMark 1.0 (https://www.eembc.org/coremark/)
fftbench (https://github.com/ZiCog/fftbench)
proplisp (https://github.com/totalspectrum/proplisp)

Comments

  • ersmithersmith Posts: 6,088
    edited 2020-08-04 08:00
    Hmmm, it looks like the latest version of Heater's fft benchmark uses C99 features, which won't work in Catalina at all and would require changing the p2gcc run script to pass -std=c99 to PropGCC. So I'm going to use a slightly older version of his benchmark. I'll attach it here, along with some preliminary results. The time is measured at 180 MHz for all compilers, and less time to compute the fft is better:

    Edit: updated sizes / times / command lines to reflect RossH's suggestions later in this thread.
    results:       time / size of binary loaded to P2
    
    riscvp2_lut:  8119 us  32412 bytes
    fastspin:    13130 us  25180 bytes
    p2gcc:       22740 us  20584 bytes
    catalina:    24666 us  23552 bytes
    
    
    command lines used:
    
    catalina -lci -ltiny -p2 -O5 -C P2_EVAL -C NATIVE -D __P2__ fft_bench.c
    fastspin -2 -O2 -o fastspin.bin fft_bench.c
    p2gcc -D __P2__=1 fft_bench.c
    riscv-none-embed-gcc -T riscvp2_lut.ld -specs=nano.specs -Os -o lut.elf fft_bench.c
    
    Compilers:
    
    catalina 4.3
    fastspin 4.2.7
    p2gcc 0.006 with PropGCC 4.6.1
    riscvp2 with GCC 8.3.0
    

    This is a small benchmark, so I wouldn't put too much stock into the sizes (they will be dominated by library sizes; both Catalina and riscvp2 have pretty complete libraries, fastspin and p2gcc less so).
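Because every run above used the same 180 MHz clock, the times can be converted to clock cycles or compared as ratios with plain integer arithmetic. The following is a minimal host-side sketch; the helper names `us_to_cycles` and `relative_percent` are made up for illustration and are not part of any of the toolchains.

```c
#include <assert.h>
#include <stdint.h>

/* Clock rate stated for the fft runs in this post. */
#define CLOCK_HZ 180000000u

/* Convert a measured time in microseconds to clock cycles at CLOCK_HZ. */
uint32_t us_to_cycles(uint32_t us)
{
    /* Guard against 32-bit overflow in the multiplication below. */
    assert(us <= UINT32_MAX / (CLOCK_HZ / 1000000u));
    return us * (CLOCK_HZ / 1000000u);
}

/* Time of one result relative to the best result, in percent (100 = equal). */
uint32_t relative_percent(uint32_t us, uint32_t best_us)
{
    assert(best_us > 0u);
    return (uint32_t)(((uint64_t)us * 100u) / best_us);
}
```

With the figures listed above, `us_to_cycles(8119)` gives the cycle count of the riscvp2_lut run, and `relative_percent(24666, 8119)` expresses the Catalina time as a percentage of it.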
  • RossHRossH Posts: 5,503
    Hi Eric

    Can you let me know what Catalina options you are using when compiling your benchmarks?

    Also, can you post the version of fftbench you are using? Version 1.0 works ok, but 1.2 needs some modification to compile with Catalina since it is not strictly ANSI C, and I'd like to make sure we are using the same code.
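The thread does not say which C99 features the newer fftbench uses, so the following is only a generic illustration of the kind of change involved: the same loop written in a form an ANSI C (C89) compiler accepts, next to a C99 form that a strict C89 compiler would reject. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* C89-compatible form: declarations at the start of the block,
   block comments only. */
long sum_c89(const int *v, size_t n)
{
    size_t i;
    long total = 0;
    assert(v != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += v[i];
    return total;
}

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 form: the loop variable is declared inside the for statement,
   which strict C89 does not allow. */
long sum_c99(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
#endif
```

Writing shared benchmark code in the first style avoids needing a `-std=c99` switch on compilers that default to an older dialect.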

    If I get time (not for your presentation) I'd like to modify it to use Catalina's multi-processing capabilities instead of OMP. It would make a nice demo, since it should speed up the execution time according to the number of available cogs.


    Ross.
  • RossHRossH Posts: 5,503
    Hi Eric

    Looks like our posts crossed - you actually answered my questions before I had asked them! :)
    ersmith wrote: »
    catalina:    25018 us  27104 bytes
    

    Why do you use optimization level 3 and not 5 for Catalina? The additional optimization doesn't make a great deal of difference in this case, but it may in some others.

    If you use -O5 in this case, the results are:
    catalina:    24802 us  26304 bytes
    

    Also, you can add -ltiny to use the "tiny" version of printf (does this look familiar? :) ):
    catalina:    24666 us  23552 bytes
    

    Again, not a huge difference, but it does highlight the fact that this particular benchmark is including parts of the standard C library, so if some of the compilers are not implementing the full library function it will make a difference in both the file sizes and the execution times.

    So the command I finally used (note that you don't need the -D PROPELLER) was:
    catalina -lci -ltiny -p2 -O5 -C P2_EVAL -C NATIVE -D __P2__ fft_bench.c
    

    Ross.
  • Thanks for the suggestions, Ross. I hadn't realized that -On went up to 5 on Catalina; I'll use that and -ltiny (when appropriate) in future.
  • RossHRossH Posts: 5,503
    Hi Eric

    Another thing to point out when benchmarking (at least for Catalina, but I am sure it would be true of some of the other compilers as well) is that the file size has little bearing on the actual memory footprint occupied by the program.

    Catalina loads up things like plugins and kernels into cogs, and then re-uses the memory they occupied as stack and heap space where possible. So quoting the file size is not a good indication of the run-time size. On the P1, which had only 32kb, it was crucial to re-use as much Hub RAM as possible, and this complicated the load process (and added to the file sizes) significantly.

    While this is now less critical on the P2, Catalina still reclaims memory where possible, so a better indication of the actual memory "footprint" of a program is the sum of the segment sizes (i.e. code, cnst, init, data) and not the final file size, which is often many kb larger.

    For instance, if you compile this program as a COMPACT program with Catalina, the code size is almost halved, but that difference is not reflected in the file size.
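The footprint estimate described here is just the sum of the four segment sizes. A minimal sketch, assuming the sizes have already been read from the compiler's output; the `struct segments` type and `footprint` function are hypothetical names, not a Catalina API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the four segment sizes named above, in bytes. */
struct segments {
    unsigned long code;
    unsigned long cnst;
    unsigned long init;
    unsigned long data;
};

/* Run-time footprint estimate: the sum of the segment sizes,
   as opposed to the size of the binary file. */
unsigned long footprint(const struct segments *s)
{
    assert(s != NULL);
    return s->code + s->cnst + s->init + s->data;
}
```

Comparing this sum with the file size shows how much of the file is loader, kernel, and plugin material that does not stay resident in Hub RAM.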

    Ross.
  • Hi Eric,

    this all sounds very interesting. I've only used Fastspin so far. I've tried Catalina but I have to admit that I gave up after not being able to run HelloWorld after two hours. It probably would have worked if I had tried harder, but I was too lazy because I'm really happy with Fastspin at the moment.

    However, I've seen that there is a debugger for Catalina called "black box" or "black cat". I think this is quite interesting as there seems to be no other real source level debugger for the P2 at the moment. Will you (or Ross if he also participates) talk about debugging in C?
  • RossHRossH Posts: 5,503
    ManAtWork wrote: »
    Hi Eric,

    this all sounds very interesting. I've only used Fastspin so far. I've tried Catalina but I have to admit that I gave up after not being able to run HelloWorld after two hours. It probably would have worked if I had tried harder, but I was too lazy because I'm really happy with Fastspin at the moment.

    However, I've seen that there is a debugger for Catalina called "black box" or "black cat". I think this is quite interesting as there seems to be no other real source level debugger for the P2 at the moment. Will you (or Ross if he also participates) talk about debugging in C?

    I don't really know how you can simplify "catalina -lc hello_world.c" much further ... but perhaps that's just me! :)

    BlackBox and BlackCat are two different (but similar) source-level debuggers. BlackCat was originally created by Bob Anderson, but with some technical input from me. It is a graphical debugger, but runs only on Windows. I wanted a debugger that worked on all platforms, so I created BlackBox, which is a command-line debugger in the style of gdb.

    I don't think I will be able to participate in this discussion - the time doesn't work well for me because I generally have to work during the day. But I will certainly try and catch up with it afterward.
  • Taking an empirical view of the forum, it seems like there is a major push for Spin2 usage, and a way slower approach to using any flavor of C for the Propeller.

    Since the Simple Tools Library has a wealth of support for all kinds of devices that are used for the P1 and could be possibly used on the P2, should there be a c2spin2 conversion program, if that is even possible?

    Yea, that seems kind of weird and somehow backwards, but the Simple Tools Library has a lot of stuff covered. Sure would hate to see that go to waste, if everybody is going to switch over to using Spin2.

    Ray
  • Rsadeika,

    A conversion program is great if you just need some working code but it's not a good long term solution.

    Spin2 code would be nice for us Propeller-heads but Parallax needs Python code for their education customers and the truth is they are the money makers at the moment.

    I think going forward all documentation should include Pseudo-code that defines all the constants needed along with generic functions that show how to use the device.
    That would make it easier to customize the code for whatever micro or language someone wants to use.
  • ManAtWork wrote: »
    However, I've seen that there is a debugger for Catalina called "black box" or "black cat". I think this is quite interesting as there seems to be no other real source level debugger for the P2 at the moment. Will you (or Ross if he also participates) talk about debugging in C?

    I haven't used Catalina's debugger, so I'm not really qualified to talk about it.
    Rsadeika wrote: »
    Since the Simple Tools Library has a wealth of support for all kinds of devices that are used for the P1 and could be possibly used on the P2, should there be a c2spin2 conversion program, if that is even possible?
    There's not much point: all of the SimpleTools C code is based on Spin objects (or has Spin object equivalents). And it all has to be ported from P1 to P2. Given that it has to be ported anyway, I think most people will start with the original Spin objects rather than the C code.

  • RossH wrote: »
    I don't really know how you can simplify "catalina -lc hello_world.c" much further ... but perhaps that's just me! :)

    I got lots of error messages but was finally able to compile but not to download and run it. (see discussion here). But as I said, I had too little patience, that time. I'll watch the presentation and maybe give it another try.
  • RossHRossH Posts: 5,503
    ManAtWork wrote: »
    RossH wrote: »
    I don't really know how you can simplify "catalina -lc hello_world.c" much further ... but perhaps that's just me! :)

    I got lots of error messages but was finally able to compile but not to download and run it. (see discussion here). But as I said, I had too little patience, that time. I'll watch the presentation and maybe give it another try.

    Oh, yes - I remember. You had a problem with the payload loader. When you are ready to try again, let me know and we'll see if we can track it down. But let's take it up again in the original thread.
  • Another benchmark, this time the EEMBC CoreMark benchmark. These are not official scores, just my unofficial results. The binary sizes are just here because I know people will ask for them, but as Ross has mentioned, they really are more about the libraries than about the compiler's code generation. Unfortunately it's difficult to extract code sizes from some of the compilers, so this will have to do as a really lame estimate of relative sizes.

    I wasn't able to build this with the new llvm based C compiler; while it looks promising, it's still not quite ready to build big applications yet.

    Coremark benchmark results (more iterations/sec is better)
    riscvp2 gcc 8.3.0:   52 iterations/sec SIZE: 51476 bytes
    p2gcc   gcc 4.6.1:   40 iterations/sec SIZE: 21040 bytes
    fastspin 4.2.7:      28 iterations/sec SIZE: 30360 bytes
    catalina 4.3:        17 iterations/sec SIZE: 29952 bytes
    riscvp2_lut 8.3.0:   17 iterations/sec SIZE: 28948 bytes
    

    All tests were run at 180 MHz. fastspin, p2gcc, and riscvp2 binaries were run with:
      loadp2 -f180000000 -PATCH coremark.elf -b230400 -t
    

    Catalina defaults to 180 MHz, so the -PATCH was unnecessary.
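Since all of these runs used the same 180 MHz clock, the scores can also be normalized to iterations per second per MHz, which is a common way to compare CoreMark results across clock speeds. A small sketch using integer thousandths to avoid floating point; the function name is an assumption for illustration, and the results remain unofficial.

```c
#include <assert.h>
#include <stdint.h>

/* Iterations/sec per MHz, scaled by 1000 (so 288 means 0.288),
   rounded down. */
uint32_t score_per_mhz_milli(uint32_t iterations_per_sec, uint32_t clock_mhz)
{
    assert(clock_mhz > 0u);
    return (iterations_per_sec * 1000u) / clock_mhz;
}
```

For example, `score_per_mhz_milli(52, 180)` normalizes the riscvp2 score and `score_per_mhz_milli(17, 180)` the Catalina score.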

    Various quirks:

    I had to disable floating point in core_portme.h in order to get the p2gcc version to build. This was probably good anyway, because it made the binaries smaller. I also had to manually copy a file, because p2gcc couldn't find its .s file in a subdirectory.

    fastspin produced incorrect results when compiled with -O2, so this is exposing an optimizer bug :(.

    riscvp2 has 2 ways of building: with a 32K instruction cache (the default) or with cache in LUT (riscvp2_lut). The latter is much smaller, but also much slower.

    How to build:

    Source code is at https://github.com/totalspectrum/coremark.

    for riscvp2, edit the propeller/core_portme.mak and do
      make PORT_DIR=propeller
    
    for fastspin, edit the propeller/core_portme.mak and do
      make PORT_DIR=propeller PORT_CFLAGS=-O1
    

    For catalina, use the command line:

    catalina -p2 -O5 -lci -ltiny -C NATIVE -I propeller -I . -D 'FLAGS_STR=\"default\"' -D ITERATIONS=0 -D PERFORMANCE_RUN=1 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c propeller/core_portme.c -o catalina

    For p2gcc, you'll need a hack, because it can't find the .s file in propeller/; so link core_portme.c to the current directory and do:

    p2gcc -I propeller -I . -D 'FLAGS_STR="default"' -D ITERATIONS=450 -D PERFORMANCE_RUN=1 -D P2GCC=1 core_list_join.c core_main.c core_matrix.c core_state.c core_util.c core_portme.c -o p2gcc.bin

    (The ITERATIONS count is limited because above a certain amount of time the "Total ticks" line is printed wrong, probably a bug in p2gcc's printf.)
  • Hi,
    it would be interesting to hear, if there is now a path possible to bring P2 into the arduino environment with one of these compilers.
    Not because this is the best tool (it is not) but because of the immense wealth of code and libraries (and library management), and because it would make it easy to switch to the P2. Programming the ESP32 and P2 with the same tool would be great, for example.
  • Hi,
    it would be interesting to hear, if there is now a path possible to bring P2 into the arduino environment with one of these compilers.
    Not because this is the best tool (it is not) but because of the immense wealth of code and libraries (and library management), and because it would make it easy to switch to the P2. Programming the ESP32 and P2 with the same tool would be great, for example.

    The arduino environment requires C++, so for now riscvp2 would be the only feasible option.
  • Arduino also requires a bunch of libs/code to be ported to work on P2, not just having a C++ compiler.
  • Here's another benchmark. This one is dead simple, just simulating sending 512 bytes of data over SPI via bit-banging. Of course on the P2 you'd be more likely to use smart pins for this, but I wanted something very simple and small so that (a) I could look at the output of the various compilers and perhaps discuss this at the Zoom meeting tomorrow, and (b) it might be feasible to try out the new LLVM based compiler, which is still very much under development. The LLVM performance wasn't great for now, but on the other hand it's still very early days and it will improve.

    As a bonus the benchmark was simple enough to translate into Spin easily, and so I could compare the C compilers to Spin2 (PNut and fastspin). All of the C compilers handily beat PNut on this benchmark.

    The SPI code I basically just took from the first google hit I got for "bit banged SPI in C"; I had to tweak it a bit of course (changed macros to work with pinh/pinl instructions, etc.). If I were writing it myself from scratch I would definitely use "int" everywhere instead of "unsigned char", which is slower on most compilers (you can see the difference in fastspin's performance in C vs. Spin; Spin only has "int" parameters). OTOH the compiler should adapt to the human's code, not vice-versa, so perhaps it's good to feed sub-optimal code to the compiler.
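The benchmark source is not reproduced in this thread, so the following is only a sketch of what a bit-banged SPI byte write with an `unsigned char` parameter typically looks like. The pin numbers and the `pin_high` / `pin_low` helpers are hypothetical host-side stand-ins (on the P2 they would map to the pinh/pinl operations mentioned above); here they record what a receiver would sample so the code can be run on a PC.

```c
#include <assert.h>

/* Hypothetical pin assignments; the real ones are not given in the post. */
enum { SPI_CS = 0, SPI_CK = 1, SPI_MOSI = 2, PIN_COUNT = 3 };

/* Host-side simulation state. */
static int pin_level[PIN_COUNT];
static unsigned captured;      /* bits sampled on rising clock edges */
static int captured_bits;

void pin_high(int pin)
{
    assert(pin >= 0 && pin < PIN_COUNT);
    /* A receiver samples MOSI on the rising edge of the clock. */
    if (pin == SPI_CK && pin_level[SPI_CK] == 0) {
        captured = (captured << 1) | (unsigned)pin_level[SPI_MOSI];
        captured_bits++;
    }
    pin_level[pin] = 1;
}

void pin_low(int pin)
{
    assert(pin >= 0 && pin < PIN_COUNT);
    pin_level[pin] = 0;
}

/* Send one byte, most significant bit first. */
void spi_write_byte(unsigned char data)
{
    int i;
    pin_low(SPI_CS);                 /* select the device */
    for (i = 0; i < 8; i++) {
        if (data & 0x80)
            pin_high(SPI_MOSI);
        else
            pin_low(SPI_MOSI);
        pin_high(SPI_CK);            /* clock the bit out */
        pin_low(SPI_CK);
        data <<= 1;                  /* narrowed back to 8 bits each pass */
    }
    pin_high(SPI_CS);                /* deselect */
}
```

Calling `spi_write_byte(0xA5)` on the host leaves `captured` equal to `0xA5` after eight clock pulses, which is a quick way to check the bit order before timing the loop on hardware.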

    Results (fewer cycles is better):
    C:
    
    fastspin:      82174 cycles elapsed
    riscvp2:      185120 cycles elapsed
    p2gcc:        219656 cycles elapsed
    llvm:         294421 cycles elapsed
    catalina:     572413 cycles elapsed
    
    Spin2:
    
    fastspin:     57589 cycles elapsed
    pnut:       1612440 cycles elapsed
    
    Command lines:
    
    catalina -lci -O5 -p2 -C NATIVE -o catalina simplespi.c
    fastspin -2 -O2 -o fastspin.bin simplespi.c
    p2gcc -D P2GCC simplespi.c
    riscv-none-embed-gcc -Triscvp2.ld -O3 -specs=nano.specs -o rv32.elf simplespi.c
    
    LLVM:
    
    ~/Parallax/llvm-propeller2/build/bin/clang -O3 -ffunction-sections -fdata-sections -fno-jump-tables --target=p2 -I ~/Parallax/llvm-propeller2/build/../p2_dev_tests/p2lib -I ~/Parallax/llvm-propeller2/build/../p2-libc/include -D__propeller2__ -D__P2__ -DLLVM -o llvm.o -c simplespi.c
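
Several of these command lines pass `-D` macros (`P2GCC`, `LLVM`, `__P2__`) so that one source file can adapt to each toolchain. A minimal sketch of that pattern follows; it tests only the macros that appear in the command lines in this thread, and the `BUILD_NAME` macro and `build_is` helper are hypothetical names for illustration.

```c
#include <assert.h>
#include <string.h>

/* Select a label from the -D macros used in this thread's command lines.
   Anything else (for example a PC build) falls through to "host". */
#if defined(P2GCC)
#define BUILD_NAME "p2gcc"
#elif defined(LLVM)
#define BUILD_NAME "llvm"
#elif defined(__P2__)
#define BUILD_NAME "p2-other"
#else
#define BUILD_NAME "host"
#endif

/* Returns nonzero when this build matches the given label. */
int build_is(const char *name)
{
    assert(name != NULL);
    return strcmp(BUILD_NAME, name) == 0;
}
```

The same `#if` chain can guard compiler-specific includes or pin macros, keeping the benchmark body itself identical across compilers.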
    
  • Wuerfel_21Wuerfel_21 Posts: 5,124
    edited 2020-08-05 22:04
    Maybe slightly offtopic, but if Chip's Spin2 is anything like Spin1, then your SPI code should(tm) go faster if written like this: (or use PINW, but not sure how that behaves)
    pri spiWriteData(spiData) | spiCount
      pinh(SPI_CS)
      pinl(SPI_CK)
      pinl(SPI_CS)
      repeat 8
        if ((spiData <<= 1) & $100)
          pinh(SPI_MOSI)
        else
          pinl(SPI_MOSI)
        pinh(SPI_CK)
        pinl(SPI_CK)
      pinh(SPI_CS)
      pinl(SPI_MOSI)
    
  • Wuerfel_21 wrote: »
    Maybe slightly offtopic, but if Chip's Spin2 is anything like Spin1, then your SPI code should(tm) go faster if written like this: (or use PINW, but not sure how that behaves)
    Yes, your version actually does go a bit faster on PNut (1448600 cycles vs. 1612440 cycles) while keeping the same speed in fastspin. But that's kind of beside the point: as I mentioned I chose the code by googling and doing the most straightforward translation I could, with a deliberate goal of not trying to optimize for any one compiler. That's because I know the internals of fastspin and riscvp2 quite well and I didn't want to bias towards those.

    If I wanted the fastest possible SPI on P2 I would shift the data left by 24 at the beginning and test the high bit, because I know fastspin can optimize "x & $8000_0000" and "x <<= 1" by using the carry bit, leading to an inner loop of:
    	rep	@LR__0003, #8
    LR__0002
    	shl	local04, #1 wc
    	drvc	#5
    	drvh	#7
    	drvl	#7
    LR__0003
    
    and a benchmark score of 49397 cycles. I didn't want to cheat :).
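
In C, the approach described here (shift the byte left by 24, then test the top bit and shift left once per bit) might look like the sketch below. It is a host-side illustration only: `drive_and_clock` is a hypothetical stand-in that records each bit instead of driving pins, and no claim is made about what any particular compiler generates from it.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t mosi_trace;   /* host stand-in: bits driven on MOSI, in order */
static int clock_pulses;

/* Hypothetical replacement for "drive MOSI, then pulse the clock". */
void drive_and_clock(unsigned bit)
{
    assert(bit <= 1u);
    mosi_trace = (mosi_trace << 1) | bit;
    clock_pulses++;
}

void spi_write_byte_msb(uint32_t data)
{
    int i;
    data <<= 24;                                    /* byte now in bits 31..24 */
    for (i = 0; i < 8; i++) {
        unsigned bit = (data & 0x80000000u) != 0u;  /* test the high bit */
        data <<= 1;                                 /* next bit moves to the top */
        drive_and_clock(bit);
    }
}
```

The point of the rewrite is that the bit being tested is always the one a left shift pushes out of the register, which is what lets a shift with carry replace a separate mask and compare.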
  • Wuerfel_21Wuerfel_21 Posts: 5,124
    edited 2020-08-05 23:32
    Maybe I just have Spin bytecode optimization OCD.
    I've never looked at those colored blocks ever the same since the VentilatorOS DIR sorting incident :)
    IDK.

  • jmgjmg Posts: 15,181
    ersmith wrote: »
    .... That's because I know the internals of fastspin and riscvp2 quite well and I didn't want to bias towards those.

    If I wanted the fastest possible SPI on P2 I would shift the data left by 24 at the beginning and test the high bit, because I know fastspin can optimize "x & $8000_0000" and "x <<= 1" by using the carry bit, leading to an inner loop of:
    	rep	@LR__0003, #8
    LR__0002
    	shl	local04, #1 wc
    	drvc	#5
    	drvh	#7
    	drvl	#7
    LR__0003
    
    and a benchmark score of 49397 cycles. I didn't want to cheat :).

    That's not entirely cheating, as a good language should have pathways to be a "high-level assembler", and if that means the C code becomes less clear, but the ASM is faster, to me that is ok.

    I have quite a lot of MCU C code exactly like that, where I have to whack the compiler about the head, until the ASM it gives, is half-way-sane.

    ie I'd suggest you add a 'crafted fastspin' line, where C clarity is traded for speed. 49397 cycles crafted is faster than 82174 cycles as generic.
    Usually that is only done on inner loops, like in this example.


    Curious why fastspin(spin2) 57589 cycles is different from fastspin(c) 82174 cycles ? Is there an optimize step your spin2 compile path has, that is not yet in c ?
  • ersmithersmith Posts: 6,088
    edited 2020-08-06 00:02
    @RossH : I've been looking at compiler output in some detail as part of doing these benchmarks, and I think there's some low hanging fruit for further optimizing Catalina:

    (1) Subroutine calls are very expensive in Catalina. Is there some way you could modify the code generator to use one of the (many) P2 call instruction variants directly, instead of calling into the kernel for the call and the return?

    (2) As mentioned on another thread, pushing/popping multiple registers is cheaper with setq / rdlong patterns. In fact given the overhead of calling the current push_m / pop_m functions, you may find a saving by using these and saving/restoring all of the eligible registers all the time. Individual rdlong and wrlong are expensive.

    (3) Catalina is generating qmul / getqx for multiplies, which is fine, but you've spent a lot of effort elsewhere in the code to make it re-entrant and that particular sequence isn't (an interrupt could happen between the qmul and the getqx). Frankly I'm not sure the effort to allow ISRs written in C is worth it for the P2, but OTOH it is a unique feature of Catalina.

    (4) mov r22, ##512 is smaller and faster if re-written as "decod r22, #9"
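
Item (4) works because 512 is a power of two (1 << 9): the constant can be produced from its bit index, which is small, instead of being loaded as a long immediate. The sketch below shows the check a code generator could make. It assumes the P2's 9-bit immediate range of 0 to 511 is the reason the original needs the `##` form; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True when exactly one bit is set. */
int is_power_of_two(uint32_t v)
{
    return v != 0u && (v & (v - 1u)) == 0u;
}

/* Bit index n such that (1 << n) == v; v must be a power of two. */
int bit_index(uint32_t v)
{
    int n = 0;
    assert(is_power_of_two(v));
    while (v > 1u) {
        v >>= 1;
        n++;
    }
    return n;
}

/* Assumption: short immediates cover 0..511. */
int fits_9bit_immediate(uint32_t v)
{
    return v <= 511u;
}
```

For 512, `fits_9bit_immediate` is false but `bit_index` returns 9, which does fit, so the shorter single-instruction form applies.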

  • jmg wrote: »
    That's not entirely cheating, as a good language should have pathways to be a "high-level assembler", and if that means the C code becomes less clear, but the ASM is faster, to me that is ok.
    Well yeah, but then the benchmark gets tuned to the compiler, and I wanted all of the compilers on a level playing field, at least for this particular comparison. If I have time tomorrow perhaps I can talk about some of the specific low-level optimizations fastspin is aware of. For bit-banging I don't think any compiler will beat it on P2; it's on higher level algorithms that GCC and LLVM can do better.
    Curious why fastspin(spin2) 57589 cycles is different from fastspin(c) 82174 cycles ? Is there an optimize step your spin2 compile path has, that is not yet in c ?

    I think I mentioned that the C version uses an "unsigned char" variable rather than "int". That means the compiler has to insert a "zerox" instruction in the inner loop of the C version (C requires that unsigned char be extended to int before doing arithmetic on it). Again, *I* know that int is better for fastspin, and indeed that will probably be true for all compilers, but I got the SPI code from Google and I tried to resist any urges to improve it. For Spin the issue didn't come up, parameters in Spin2 are always LONG.
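The promotion rule mentioned here can be seen on any host compiler. In the sketch below, the shift is evaluated in `int`, so bit 8 survives in the expression, but storing the result back into an `unsigned char` discards it. A compiler that keeps the variable in a 32-bit register therefore has to clear the upper bits after the update, which is the extra zero-extension step described above. The function names are made up for illustration, and 8-bit chars are assumed.

```c
#include <assert.h>
#include <limits.h>

/* The operand is promoted to int before the shift, so the result
   can exceed 8 bits. */
int shift_in_int(unsigned char x)
{
    return x << 1;
}

/* Storing back into unsigned char narrows the value to 8 bits again. */
unsigned char shift_in_uchar(unsigned char x)
{
    assert(CHAR_BIT == 8);           /* this sketch assumes 8-bit chars */
    x = (unsigned char)(x << 1);
    return x;
}
```

With an `int` variable there is no narrowing step, so the loop body needs one operation fewer.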
  • jmgjmg Posts: 15,181
    ersmith wrote: »
    I think I mentioned that the C version uses an "unsigned char" variable rather than "int". That means the compiler has to insert a "zerox" instruction in the inner loop of the C version (C requires that unsigned char be extended to int before doing arithmetic on it). Again, *I* know that int is better for fastspin, and indeed that will probably be true for all compilers, but I got the SPI code from Google and I tried to resist any urges to improve it. For Spin the issue didn't come up, parameters in Spin2 are always LONG.

    I can understand the desire to harvest test code, but if one test then skews to use int, because that language cannot support unsigned char, then the benchmarks actually become less useful.
    The best benchmarks compare like-with-like.
  • RossHRossH Posts: 5,503
    edited 2020-08-06 02:29
    ersmith wrote: »
    @RossH : I've been looking at compiler output in some detail as part of doing these benchmarks, and I think there's some low hanging fruit for further optimizing Catalina:

    (1) Subroutine calls are very expensive in Catalina. Is there some way you could modify the code generator to use one of the (many) P2 call instruction variants directly, instead of calling into the kernel for the call and the return?
    Yes, the code was written for the P1 to implement register passing, and I had a terrible time getting it to work with lcc, which didn't want to co-operate at all. I could simplify it considerably for the P2 - I don't think the overheads of register passing make as much sense on the P2, and not doing it would also reduce the code size.

    (2) As mentioned on another thread, pushing/popping multiple registers is cheaper with setq / rdlong patterns. In fact given the overhead of calling the current push_m / pop_m functions, you may find a saving by using these and saving/restoring all of the eligible registers all the time. Individual rdlong and wrlong are expensive.
    There is a problem here. I will have to try and recall the details, but I found you can't do things the obvious way if you want to implement interrupts. I decided in the end that I would rather have Catalina completely functional on the P2, and sacrifice efficiency. But again, I might revisit it. I originally had two mechanisms for saving and loading registers - the slow one that worked with interrupts, and the fast one that didn't. But then I found you can't always tell in advance which code will need interrupts, and it all became too difficult to continue to support both, so I have (for the moment) stuck with the slow but safe way.

    EDIT: I recall now - it was not SETQ that you can't use in an interrupt routine - it is the SKIP operations. I should indeed rewrite the push_m and pop_m functions to use SETQ!

    (3) Catalina is generating qmul / getqx for multiplies, which is fine, but you've spent a lot of effort elsewhere in the code to make it re-entrant and that particular sequence isn't (an interrupt could happen between the qmul and the getqx). Frankly I'm not sure the effort to allow ISRs written in C is worth it for the P2, but OTOH it is a unique feature of Catalina.
    Ah! Thanks for that - I didn't know that - I will have to disable interrupts :(

    (4) mov r22, ##512 is smaller and faster if re-written as "decod r22, #9"

    Good call, thanks.

    My approach with Catalina has always been "functionality first, efficiency later" ... but then there always seems to be more interesting functionality to implement!

    Had I but world enough and time ... :(

  • jmg wrote: »
    ersmith wrote: »
    I think I mentioned that the C version uses an "unsigned char" variable rather than "int". That means the compiler has to insert a "zerox" instruction in the inner loop of the C version (C requires that unsigned char be extended to int before doing arithmetic on it). Again, *I* know that int is better for fastspin, and indeed that will probably be true for all compilers, but I got the SPI code from Google and I tried to resist any urges to improve it. For Spin the issue didn't come up, parameters in Spin2 are always LONG.

    I can understand the desire to harvest test code, but if one test then skews to use int, because that language cannot support unsigned char, then the benchmarks actually become less useful.
    The best benchmarks compare like-with-like.

    Different languages are always going to have different code. The Spin2 version is just a sideshow. I'm benchmarking C compilers here, and I thought the Spin2 comparison would be an interesting contrast and a rough order of magnitude comparison of how hubexec and xbyte stack up, but evidently it's a pretty big distraction :(.
  • Sorry if I was a little short, @jmg; I was kind of hyper-focused on the C presentation at the time and didn't want distractions.

    But you do bring up some interesting points. It might be a good idea to have a "fastest implementation" kind of benchmark where people, for example, try to create the fastest possible SPI implementation for their particular language/compiler. We'd have to set out some rules (like no inline assembly) to really test the language itself, but it would provide a good kind of "upper bound" on how good a particular compiler can be. As you say, sometimes in MCU development you really want to squeeze out all the performance you can and tweak the code to be whatever is best for the exact tool you're using.
  • Yesterday's presentation was really useful for many, Eric. There's no better way to understand all that's going on than a live presentation where people can read and listen, ask questions, and get confirmation on the little details in their head.

    Roy brought up some points about cross-platform code style so it runs in each compiler. Parallax would like to get more involved, host libraries and lend some backing to Eric's efforts. I suggest we have a meeting before the end of August with those interested in C to discuss some of these details so we get more behind the effort.

    I'll schedule that meeting after we get the next couple planned on Zoom: the Free-for-all; SmartPins with Chip, and JonnyMac's "Tao of Spin2 Programming".

    Ken Gracey
  • Ken Gracey wrote: »
    Roy brought up some points about cross-platform code style so it runs in each compiler. Parallax would like to get more involved, host libraries and lend some backing to Eric's efforts. I suggest we have a meeting before the end of August with those interested in C to discuss some of these details so we get more behind the effort.

    That's a great idea, Ken. Having everyone on the same call to talk about making standard C libraries/defines would be really good.

    A small correction: it's not just "my" efforts, but rather the efforts of everyone involved in making C for the Propeller, and there are a lot of us :). Which is why having some coordination would be so helpful.
  • ersmith wrote: »
    Ken Gracey wrote: »
    Roy brought up some points about cross-platform code style so it runs in each compiler. Parallax would like to get more involved, host libraries and lend some backing to Eric's efforts. I suggest we have a meeting before the end of August with those interested in C to discuss some of these details so we get more behind the effort.
    A small correction: it's not just "my" efforts, but rather the efforts of everyone involved in making C for the Propeller, and there are a lot of us :). Which is why having some coordination would be so helpful.

    Noted with much recognition for all contributors!

    Ken Gracey