Radio Shack LCD: C vs Spin vs +PASM grudge match — Parallax Forums

localrogerlocalroger Posts: 3,451
edited 2014-09-14 08:35 in Propeller 1
In the discussion of the Radio Shack touchscreen LCD module Jazzed very helpfully ported the Arduino code to C for the Propeller ASC as seen here:

http://forums.parallax.com/showthread.php/157141-2.8-quot-TFT-Touch-Screen?p=1290310&viewfull=1#post1290310

This is actually an interesting demo to use to compare development languages. It's small enough not to be hindered by the Prop's memory size, and doesn't especially benefit from any of the Prop's special features; it's basically shoveling bits out of I/O pins as fast as possible.

Working mostly line for line I translated the application to pure Spin:

SS_TFT.spin

That pretty much worked the first time. I then added pretty much the simplest possible PASM helpers. You wouldn't think PASM would buy you much in this application because the TFT display has a parallel interface; there are no SPI or IIC loops serializing the data, and it takes 12 I/O lines. Yet you would be wrong.

SS_TFT_PASM.spin

The C version is compiled to LMM because as Jazzed warns its performance is terrible in bytecode. In LMM it takes 2 seconds to clear the screen and about 1.5 seconds to draw three progressively larger lines of "Happy!"

It also takes 8772 bytes of Hub RAM. And the distribution necessary to make sure I get all the files needed to build it is a 2.6 megabyte folder of 299 files.

The pure Spin version is very slow; it takes 20 seconds to clear the screen, but draws the text in a very competitive 2 seconds. But it's also a single self-contained file and only takes 2,500 bytes of Hub RAM.

But look at the PASM-helped version! The amount of PASM is really modest and only increases the Hub RAM usage to 2640 bytes while making minimal changes to the Spin logic. It clears the screen in 2.5 seconds and draws the text in about a quarter of a second.

Edit: Adding just a few more longs of PASM blows out the jams. The whole demo in under 0.2 sec: SS_TFT_PASMX.spin

So suppose we resort to C bytecode? The Arduino code compiled to CMM takes 9 seconds to clear the screen and 6 seconds to draw the three iterations of "Happy!" struggling very noticeably on the third and largest line. And it still takes 5644 bytes of Hub RAM.

So what the hell is going on here?

A lot of C code is informed by the idea that calls are expensive, which is why you get things like this hidden in the .h file:
#define WR_HIGH     {PORT_WR|=WR_BIT;}
#define WR_LOW      {PORT_WR&=~WR_BIT;}

Those are macros. It's bothersome enough that you have to go snorkeling in the .h file to find out what the hell WR_HIGH does when you see it in the main source file, but what's even more bothersome is that every time you use it it generates a little string of byte codes -- or worse LMM instructions. This technique very effectively hides from you just how much RAM you're burning with a sequence like
    CS_LOW;
    RS_HIGH;
    RD_HIGH;
    WR_LOW;

The macro actually isn't a bad idea in a 32-bit or 64-bit system where the cost of a CALL can be 5 or 9 bytes, but in embedded byte code it kills you.

There's also this massive eater of RAM...
    sendCommand(0x0001);
    sendData(0x0100);
    sendCommand(0x0002);
    sendData(0x0700);
    //...repeat and repeat and repeat

In byte code each of those command-data pairs eats at least 10 bytes: Two call tokens with 16-bit arguments, and two push tokens with an 8 and a 16 bit argument. And if you're wondering why they didn't build a bloody DAT table like I did in the conversion for the startup magic number spray, just look at the syntastic gymnastics necessary to encode the font that way.
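
For comparison, here is a minimal sketch of what the table-driven startup could look like on the C side. The sendCommand() and sendData() bodies below are stand-ins that only log their arguments (their real signatures are an assumption), and the InitPair type, init_table and runInitTable() names are made up for illustration. Only the two register pairs quoted above are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver's sendCommand()/sendData();
   here they only log what would be written so the sketch runs on a PC. */
static uint16_t log_buf[64];
static size_t   log_len = 0;

static void sendCommand(uint16_t index) { log_buf[log_len++] = index; }
static void sendData(uint16_t data)     { log_buf[log_len++] = data;  }

/* One table entry per register write (hypothetical type). */
typedef struct {
    uint16_t reg;
    uint16_t value;
} InitPair;

/* Only the two pairs quoted in the post; the real list is much longer. */
static const InitPair init_table[] = {
    { 0x0001, 0x0100 },
    { 0x0002, 0x0700 },
};

/* Walk the table: the two calls appear once in the loop body
   instead of once per register in straight-line code. */
static void runInitTable(const InitPair *table, size_t count)
{
    assert(table != NULL);
    for (size_t i = 0; i < count; i++) {
        sendCommand(table[i].reg);
        sendData(table[i].value);
    }
}
```

Each extra register then costs four bytes of table data rather than another pair of calls, which is the saving the DAT table gets on the Spin side.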

There is also the matter of the impressively execrable (I can't think of a more descriptive word that's OK in PG-13 land here) performance of the routine that draws text characters. I can't even figure out exactly why it's so bad but it's really, really bad, even in LMM. I suspect there is a lot of hidden stack frame overhead or something in those for statements. In any case Spin does it a whole lot better, with nearly the same syntax.

Anyway, that's C vs. Spin vs. Spin with a teeny bit of helper PASM on the Propeller. My urge to get up to speed on the C side of things has wilted considerably.

Comments

  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-10 18:46
    Have you tried using the PASM helper code with C so you're comparing apples to apples?
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-10 19:16
    David Betz wrote: »
    Have you tried using the PASM helper code with C so you're comparing apples to apples?

    In C you don't generally launch PASM helper cogs; you have inline PASM which is actually very cool but still LMM. If it's possible at all to launch helper cogs it doesn't seem to be the simple and baked-in thing it is with the PropTool. If someone who knows the C dev system wants to try it I'm all ears. (If you don't have a RS TFT board I'll try it on mine.) It seems that would just be adding another PASM image on top of the LMM or CMM interpreter and not correcting the fundamental resource usage problems that make the C code so inefficient at this scale.
  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-10 19:31
    localroger wrote: »
    In C you don't generally launch PASM helper cogs; you have inline PASM which is actually very cool but still LMM. If it's possible at all to launch helper cogs it doesn't seem to be the simple and baked-in thing it is with the PropTool. If someone who knows the C dev system wants to try it I'm all ears. (If you don't have a RS TFT board I'll try it on mine.) It seems that would just be adding another PASM image on top of the LMM or CMM interpreter and not correcting the fundamental resource usage problems that make the C code so inefficient at this scale.
    I pretty much never use inline assembler and often launch PASM in separate COGs. In fact, sometimes I steal that PASM from Spin programs. That's what I did to convert JonnyMac's DEFCON badge code to C++. You're right though that on a normal processor where C gets compiled into native assembler you would probably use inline assembly. The Propeller isn't such a processor though unless you use -mcog mode. On a more standard processor there is usually no need to go to inline assembler since the native code generation of a C compiler is usually fast enough. No LMM or CMM or Spin byte code interpreter involved.
  • Heater.Heater. Posts: 21,230
    edited 2014-09-11 00:52
    localroger,

    In summary you have:
                 Size (bytes)   Clear screen (s)   Write "Happy!" (s)
    Spin         2500           20                 2
    Spin + PASM  2640           2.5                0.25
    C (LMM)      8772           2                  1.5
    
    I guess there are no surprises there.

    Spin is a marvel of design. As well as being a simple and elegant language design it gets compiled down to really small byte code programs, maximizing the amount of functionality that can be squeezed into such a small space as the Propeller. The byte code design itself is a gem, the required interpreter fits into the 496 instructions of a COG. Amazing! Try that with a Java run time or even the old Pascal p-code system. The cost for all this capability is speed. Spin is slow.

    But wait. Spin makes it really easy to add assembler code to your program where needed for speed. And the Propeller architecture and PASM syntax are the simplest ways of working in assembler I have ever seen. It's not much harder to work in assembler on the Prop than it is to work in a high level language. It's even easier to write in PASM than Forth:)

    So with the integration of all these elements, the Prop architecture, the Spin language, the interpreter, PASM and its seamless integration into Spin, we get the best of both worlds. Small code for the bulk of an application and fast code where needed. Brilliant!

    Then there is C.

    Years ago, before there was a C compiler for the Propeller, we used to discuss the possibility of having one.

    Some of us argued it was a totally pointless exercise because:

    1) C is normally compiled to native machine instructions. That makes no sense on the Prop because we can only run 496 instructions of native code. What use is that?

    2) LMM gets you bigger code but at the cost of the huge size of those 32 bit instructions and the terrible slow down of fetching them into COG.

    Could it be that you have just discovered that the naysayers were right?

    On the other hand:

    My Fast Fourier Transform exists in C, Spin and PASM. Amazingly it turns out that the C version is not much slower than the PASM version. That FCACHE mechanism really works well there. Sorry I don't have the performance figures to hand.

    I have also written a C version of Full Duplex Serial that fits in the COG and manages 115200 baud.

    So what's up with that LCD driver code?
  • Heater.Heater. Posts: 21,230
    edited 2014-09-11 00:54
    localroger,
    localroger wrote: »
    In C you don't generally launch PASM helper cogs;
    Well why not?

    Certainly you can launch C code into COGs. See my C version of FDS. Certainly that helper code can be written in assembler instead of C.
  • Heater.Heater. Posts: 21,230
    edited 2014-09-11 01:07
    localroger,

    I did not really get your point about macros like:
    #define WR_HIGH     {PORT_WR|=WR_BIT;}
    
    Whenever you use WR_HIGH it will insert some code. After all it has to read the port, read WR_BIT, OR them together and write the PORT. You can't expect the work to be done without some code being generated. How well that gets optimized is another story.

    The intent here is that whatever code the macro inserts, in line, into your program is smaller and faster than making a call to do the same thing. What with the overheads of passing parameters, calling and returning etc.

    If that is not the case using a macro is perhaps a poor choice.

    In Spin we have things like INA, OUTA and DIRA which are not actually variables. They are features baked into the language which no doubt makes the resulting code very tight. This is not really going to happen in C. As a cross platform high level language C cannot have hardware dependent things like that built in.
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-11 05:20
    The macros aren't an entirely bad idea; I'm sure one reason the pure Spin version is so slow is that it's calling routines for things like RD_HIGH, and straight |= statements would be a lot faster. They also help make the code platform agnostic; the actual application Jazzed sent me will run on a real Arduino by changing an #IFDEF.

    I'm also pretty sure the Spin version is so slow to clear the screen because of all the subroutine calling overhead for those routines that were macros in the sketch.

    But it's not really obvious what is happening when you use a macro, and even this demo warns in the C comments that one of the common Arduinos has the pins arranged awkwardly so that it runs slow. You also lose the ability to combine several of these statements into a single AND or OR if ordering is unimportant. I'm seriously tempted to do a search-and-replace manual macro substitution on the Spin version to see what that does to the performance and file size.
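
    To make the "single AND or OR" point concrete, here is a small sketch that runs on a PC. The pin positions, the port variable standing in for the output register, and the portSet()/portClear() helpers are all hypothetical; the real header's bit assignments are not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the output register (OUTA on the Propeller). */
static uint32_t port = 0;
static unsigned port_writes = 0;   /* counts read-modify-write operations */

/* Hypothetical pin assignments, chosen only for illustration. */
#define CS_BIT (1u << 0)
#define RS_BIT (1u << 1)
#define RD_BIT (1u << 2)
#define WR_BIT (1u << 3)

static void portSet(uint32_t mask)
{
    assert(mask != 0);             /* an empty mask is almost certainly a bug */
    port |= mask;
    port_writes++;
}

static void portClear(uint32_t mask)
{
    assert(mask != 0);
    port &= ~mask;
    port_writes++;
}

/* Macro-style sequence: four separate updates, one per control line. */
static void setupSeparate(void)
{
    portClear(CS_BIT);
    portSet(RS_BIT);
    portSet(RD_BIT);
    portClear(WR_BIT);
}

/* Same end state with two updates. Only valid if the display does not
   care about the order in which these lines change. */
static void setupCombined(void)
{
    portClear(CS_BIT | WR_BIT);
    portSet(RS_BIT | RD_BIT);
}
```

    Both functions leave the port in the same state, but the combined one does half the read-modify-write work, which is exactly the merge a per-pin macro hides from you.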
  • jazzedjazzed Posts: 11,803
    edited 2014-09-11 06:55
    localroger wrote: »
    In the discussion of the Radio Shack touchscreen LCD module Jazzed very helpfully ported the Arduino code to C for the Propeller ASC as seen here:

    Up and running in about an hour. Very little pain for a C programmer.

    Get it working first, then optimize as resources permit.

    localroger wrote: »
    This is actually an interesting demo to use to compare development languages. It's small enough not to be hindered by the Prop's memory size, and doesn't especially benefit from any of the Prop's special features; it's basically shoveling bits out of I/O pins as fast as possible.

    Actually doing such comparisons is very helpful. ;-)

    localroger wrote: »
    There's also this massive eater of RAM...
        sendCommand(0x0001);
        sendData(0x0100);
        sendCommand(0x0002);
        sendData(0x0700);
        //...repeat and repeat and repeat
    

    In byte code each of those command-data pairs eats at least 10 bytes: Two call tokens with 16-bit arguments, and two push tokens with an 8 and a 16 bit argument. And if you're wondering why they didn't build a bloody DAT table like I did in the conversion for the startup magic number spray, just look at the syntastic gymnastics necessary to encode the font that way.

    Maybe the original author thought it was easier or clearer - both of which are often requirements for demos.

    Fonts are often created with separate software packages.

    How does the actual Arduino performance compare to what has been presented here?
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-11 07:28
    jazzed wrote: »
    How does the actual Arduino performance compare to what has been presented here?

    That's a pretty good question. Maybe someone who has an actual Arduino can tell us.
  • DavidZemonDavidZemon Posts: 2,973
    edited 2014-09-11 07:34
    I just ordered mine from eBay (will be here Tuesday). I'm very curious to see what a bit of inlined-fcached-assembly can do for it (and of course, PropWare :) ). I'll start porting code over to PropWare's routines tonight and will test it as soon as I get the screen Tuesday.

    Edit: Also, if someone hasn't already tested it by then, I can time it on my Arduino.
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-09-11 08:14
    Great comparison Roger!

    There is something wonky about the PASM clear screen code, it should be faster than LMM.
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-11 09:39
    Bill Henning wrote: »
    There is something wonky about the PASM clear screen code, it should be faster than LMM.

    The PASM version is still using Spin to iterate across all the pixels. When I tried coding that loop in PASM I couldn't get it to work. All of the methods in the example are amenable to full PASM conversion, but that involves a lot more work, and it's frustrating to debug because the source is totally undocumented and doesn't work at all if any tiny thing is wrong.
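
    A rough way to see how much the per-pixel loop overhead matters is to divide the clear-screen time by the pixel count. The sketch below assumes a 240 x 320 panel (the size used in a fillrectangle call elsewhere in this thread); usPerPixel() is a made-up helper name.

```c
#include <assert.h>

#define PANEL_WIDTH  240L   /* assumed panel size */
#define PANEL_HEIGHT 320L

/* Microseconds spent per pixel for a full-screen clear that takes
   clear_ms milliseconds. Integer math, rounded down. */
static long usPerPixel(long clear_ms)
{
    assert(clear_ms >= 0);
    return (clear_ms * 1000L) / (PANEL_WIDTH * PANEL_HEIGHT);
}
```

    With the timings reported above, usPerPixel(20000) gives 260 for pure Spin, usPerPixel(2500) gives 32 for Spin with the PASM helpers, and usPerPixel(2000) gives 26 for C in LMM. Even 26 microseconds is a couple of thousand clocks per pixel if you assume the usual 80 MHz system clock, so the loop around the pixel write is where most of the time goes.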
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-11 15:18
    Well I tried replacing all the calls to routines like WR_LOW with the code as if they were replaced by a macro preprocessor, and it made no noticeable difference, and did make the code a lot harder to read.
  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-11 17:05
    I think the problem here is that the Propeller doesn't have a native architecture that is good for high level languages. This is shown by the fact that Spin runs so slowly and that it is necessary to use hacks like LMM or CMM to implement a C or C++ compiler. One reason that Spin is so nice is that it allows PASM to be easily included in the same module as high level Spin code. However, this is really just a bandaid to get around the fact that Spin is so slow in the first place. On any other MCU you would probably write all of the code in a high level language. If there was a compiler for it, that could even be Spin. There would be little need for PASM blobs. Maybe this will happen with P2 where you will be able to execute code directly from hub memory. This is likely to remove most of the need for PASM code except in the most demanding situations.

    Edit: And I mean "hack" in the good sense! :-)
  • DavidZemonDavidZemon Posts: 2,973
    edited 2014-09-11 18:27
    Well, PropWare's bare-bones implementation is complete. I didn't include any extra methods - the only ones included are those required by the demo. But it's there. I won't be able to test it until Tuesday night when my LCD comes in. The pins in the demo are set up for the Propeller ASC but I left them as configurable so that it's easier to use on another platform.

    I found some possible opportunities for optimization that I will try once I get mine.

    If anyone else wants to give this a try in the meantime, I'm anxious to hear how it compares. I set the sendCommand and SendData methods as fcache so that might help a lot? maybe?
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-11 19:32
    I'm very interested to see how it performs, SwimDude. Seems like the SC and SD methods are small enough to fit in a cog so fcache might work miracles with them.
  • Dr_AculaDr_Acula Posts: 5,484
    edited 2014-09-11 19:51
    localroger said
    The pure Spin version is very slow; it takes 20 seconds to clear the screen, but draws the text in a very competitive 2 seconds. But it's also a single self-contained file and only takes 2,500 bytes of Hub RAM.

    and jazzed said
    Get it working first, then optimize as resources permit.

    I agree. The key is to break it up into manageable bits and gradually transfer things from C or Spin into pasm.

    You should be able to improve on 20 seconds for a screen refresh. Using two external RAM chips and a tight PASM loop I think we got it down to 30 milliseconds on this thread http://forums.parallax.com/showthread.php/137266-Propeller-GUI-touchscreen-and-full-color-display/page9?highlight=touchscreen post #168

    The catch there is it uses too many Propeller pins and too many external TTL chips. I'm working on an FPGA solution - there are some great solutions with a hybrid Propeller/FPGA.
  • ersmithersmith Posts: 5,900
    edited 2014-09-12 04:56
    One obvious optimization for the C++ code is to make some of the smaller, frequently used methods (like all_pin_xxx() and pushData()) inline in the TFT.h file. Merging the writes to OUTA would probably help a lot too.
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-12 15:25
    AND with just a few more instructions to implement a PASM repeat-data loop for the clear screen, vertical, and horizontal line objects, it does the whole demo including clearing the screen and FILLING the large rectangle in under 0.2 seconds! Enjoy!

    SS_TFT_PASMR.spin
  • cavelambcavelamb Posts: 720
    edited 2014-09-12 19:24
    Oh baby, oh baby, Oh yeah!
    But that's just for the speed.

    As to the grudge match?

    How simple it was to add Simple_Numbers.spin, FullDuplexSerial.spin, a few lines of code,
    and my little project is up and running.

    edit:

    At least it looks simple.
    Having trouble with SimpleNumbers.
    But I can probably deal with that.
    The 299 files in the C demo - no way.


    edit again:
    Interesting...
    Writing text that extends past the edge of the display seems to crash the display.
    That was my Simple_Numbers problem.
  • LoopyBytelooseLoopyByteloose Posts: 12,537
    edited 2014-09-12 23:24
    Great thread, Thanks Roger.

    I suppose the C is there for people that want to use it as a starting point; but Spin and PASM will continue to be around to speed things up or to reduce bloat when things don't work out..

    And of course, Forth could do this nicely.. but will likely never have floating point.
  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-13 04:44
    LoopyByteloose wrote: »
    Great thread, Thanks Roger.

    I suppose the C is there for people that want to use it as a starting point; but Spin and PASM will continue to be around to speed things up or to reduce bloat when things don't work out..

    And of course, Forth could do this nicely.. but will likely never have floating point.
    C will perform better than Spin in pretty much every case if you use the same PASM optimizations. It isn't reasonable to compare straight C with Spin+PASM. Please compare C+PASM with Spin+PASM and post those results for a fair comparison. This entire thread is really about how wonderful PASM is for making code run faster. It has nothing to do with either C or Spin. The fact that you have to resort to PASM at all is because the architecture of the Propeller does not lend itself to efficient implementation of compiled languages except for very tiny programs.

    Oh, and another comparison you might make is to compare the performance of C using -mcog mode with Spin. There you will find that C performs almost as well as PASM without having to resort to writing in assembly language. Spin can't touch that performance.
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-13 05:37
    cavelamb wrote: »
    Interesting...
    Writing text that extends past the edge of the display seems to crash the display.
    That was my Simple_Numbers problem.

    FillCircle also seems to crash it. Weird. It should be signal for signal the same as the slow Spin version, which works. And uncommenting the wait loops to make sure SendCommand and SendData are complete doesn't help.
  • localrogerlocalroger Posts: 3,451
    edited 2014-09-13 05:47
    David Betz wrote: »
    C will perform better than Spin in pretty much every case if you use the same PASM optimizations.

    That's true for LMM; my test shows that it's definitely not true for C byte code, which performs very noticeably more poorly than Spin in the character generator. And even giving it credit for the PASM image of the bytecode interpreter (can C reclaim this RAM after the interpreter starts?), the C byte code is still nearly twice the size of the Spin byte code.

    And of course using PASM discards the single biggest advantage of C which is its platform agnosticism -- the fact that you can load this Arduino sketch on a Prop ASC and run it. Once you PASM it up (or even use inline assembly) you can't take the resulting project and run it on a regular Arduino any more. It's then just as Propcentric as a Spin project, and probably bigger, slower, and requiring a lot more files to archive.

    Edit: Probably worth mentioning, one point where C excels and there isn't really any other similarly good solution is for large business logic that needs to be in XMM because it won't fit in Hub RAM, but doesn't need to be fast. I believe gcc is currently the best solution for this sort of thing and the miserable performance of the byte code interpreter doesn't matter when you're accepting miserable performance to load from a serialized XMM solution anyway. But you still need the option of speed for those functions that need it within a project.
  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-13 06:05
    localroger wrote: »
    That's true for LMM; my test shows that it's definitely not true for C byte code, which performs very noticeably more poorly than Spin in the character generator. And even giving it credit for the PASM image of the bytecode interpreter (can C reclaim this RAM after the interpreter starts?), the C byte code is still nearly twice the size of the Spin byte code.
    You could reclaim this space with a two-stage loader and a >32K EEPROM. I'm surprised that CMM would perform worse than Spin without any PASM acceleration. Can you point me to an example of this? Also, you're correct that CMM isn't as compact as Spin bytecodes. However, I believe it is usually faster although you may have found a case where that is not true.
    localroger wrote: »
    And of course using PASM discards the single biggest advantage of C which is its platform agnosticism -- the fact that you can load this Arduino sketch on a Prop ASC and run it. Once you PASM it up (or even use inline assembly) you can't take the resulting project and run it on a regular Arduino any more. It's then just as Propcentric as a Spin project, and probably bigger, slower, and requiring a lot more files to archive.
    I guess in this case you are correct. Essentially, there really is no way to write performant code for the Propeller if that code is larger than will fit in COG memory. In that sense, Spin+PASM is the best platform for writing compact and efficient code on the Propeller. The Propeller architecture just isn't designed for fast compiled high-level language code in any language. You either have dog slow interpreted code or locked-to-the-Propeller assembly code that runs fast. You don't have to make this choice with any other processor that I know of. In most cases, high-level language code can be compiled to perform well enough for most purposes and assembly doesn't have to be resorted to often. Of course, high-level language code can be just as processor specific if it mucks with hardware registers. The only solution to that is to partition that code, be it in assembler or C, into a hardware abstraction layer. The remaining logic may have a chance of being processor neutral and can be ported to another processor if desired. This is why you can port lots of Arduino code to the Propeller. It is also why you can't generally port any Propeller Spin code to any other processor. That hardware abstraction is seldom present.
  • jazzedjazzed Posts: 11,803
    edited 2014-09-13 06:50
    localroger wrote: »
    Once you PASM it up (or even use inline assembly) you can't take the resulting project and run it on a regular Arduino any more.


    This is not entirely correct. The idea of the C preprocessor makes lots of sense in this case. As you mentioned, the demo code is written to run on arduino or propeller. The only reason that is possible is because of the __PROPELLER__ defined symbol. For example, in TFT.h we have this.
    #ifdef __PROPELLER__
    #include "Font.h"
    #define PROPELLER_ASC
    #else
    #define SEEEDUINO
    #endif
    


    There is nothing stopping us from using the same convention for in-line ASM in sendCommand() and sendData() ... except for desire, time, and ability.
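
    A minimal sketch of that convention applied to a bus write, assuming a data bus on the low 8 pins. BUS_MASK, writeBus() and last_written are hypothetical names, and the Propeller branch is only a placeholder for where a tuned inline-asm or PASM version would go. The #else branch is what an Arduino or PC build would compile.

```c
#include <assert.h>
#include <stdint.h>

#define BUS_MASK 0xFFu            /* hypothetical: data bus on the low 8 pins */

static uint32_t last_written = 0; /* records the value so a PC build can check it */

#ifdef __PROPELLER__
#include <propeller.h>            /* PropGCC header that provides OUTA */
/* Propeller-only path: a tuned inline-asm or PASM version would replace this body. */
static void writeBus(uint32_t bits)
{
    assert((bits & ~BUS_MASK) == 0);   /* caller should pass bus bits only */
    last_written = bits & BUS_MASK;
    OUTA = (OUTA & ~BUS_MASK) | last_written;
}
#else
/* Portable fallback: plain C, no Propeller registers touched. */
static void writeBus(uint32_t bits)
{
    assert((bits & ~BUS_MASK) == 0);   /* caller should pass bus bits only */
    last_written = bits & BUS_MASK;
}
#endif
```

    The callers never see the difference, so the rest of the sketch stays portable while the Propeller build gets the fast path.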
  • cavelambcavelamb Posts: 720
    edited 2014-09-13 07:04
    localroger wrote: »
    FillCircle also seems to crash it. Weird. It should be signal for signal the same as the slow Spin version, which works. And uncommenting the wait loops to make sure SendCommand and SendData are complete doesn't help.

    I tried a few things to recover from the display crash, but nothing I did helped.
    It was late. I'll play with it some more later.

    But it's easy enough for now to avoid writing off the edge.
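
    One way to do that check before calling the draw routine, sketched in C so it can be tried on a PC. The 240-pixel width and the 8-pixel character cell are assumptions (the actual font spacing in the driver may differ), and stringFits() is a made-up helper, not part of the driver.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_WIDTH 240   /* assumed panel width in pixels */
#define CHAR_CELL_PX   8   /* assumed width of one character cell at size 1 */

/* Returns 1 if the whole string fits on the line starting at x, 0 otherwise. */
static int stringFits(const char *text, int x, int size)
{
    assert(text != NULL);
    if (x < 0 || size < 1)
        return 0;
    long width = (long)strlen(text) * CHAR_CELL_PX * size;
    return x + width <= SCREEN_WIDTH;
}
```

    The same arithmetic is a one-liner in Spin using strsize(), so the guard could sit at the top of the string-drawing method and simply return when the text would run past the edge.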

    It's usable at this point..
    Even if no one takes it any further, it's usable now.
    There are things I'd like to see happen - SendRegister, for instance.
    And if bounds checking could be done to prevent the display from crashing, that would be nice.
    But if used carefully, at this point, it's working well enough to release.
    And get a driver working for the ADC chip on the ASC board.

    But it works right now as a color graphics display - in SPIN.

    Probably need to set it up to run in another cog at some point.
    An init call, stack, whatever, some way to return a Finished Flag to sync with?.
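
    The usual shape for that kind of sync is a one-word mailbox: the caller writes a nonzero command, the worker clears it when done. Below is a single-threaded C simulation of the handshake only. The names are hypothetical, and on real hardware workerPoll() would be the loop running in the other cog.

```c
#include <assert.h>
#include <stdint.h>

/* Shared word: nonzero means "command pending", zero means "finished". */
static volatile uint32_t mailbox = 0;
static uint32_t last_done = 0;     /* what the worker last executed (demo only) */

/* Caller side: returns 0 if the worker is still busy with a command. */
static int postCommand(uint32_t cmd)
{
    assert(cmd != 0);              /* zero is reserved for "idle" */
    if (mailbox != 0)
        return 0;
    mailbox = cmd;
    return 1;
}

/* Caller side: the finished flag is simply "mailbox is zero again". */
static int isFinished(void)
{
    return mailbox == 0;
}

/* Worker side: on real hardware this would loop forever in its own cog. */
static void workerPoll(void)
{
    uint32_t cmd = mailbox;
    if (cmd == 0)
        return;
    last_done = cmd;               /* ...the drawing work would happen here... */
    mailbox = 0;                   /* clearing the word signals completion */
}
```

    The caller can either spin on isFinished() or go do something else and check back later, which is the whole benefit of moving the drawing into another cog.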

    My test last night was a simple 1 second down-count timer display.
    The paintscreenblack routine is so fast I used it as a CLS function between prints.
    :)
    With the following added it looks like about 806 longs of code,
    22 longs for variables.

    Most Excellent Work!
    obj
      num : "Simple_Numbers"

    var
      byte DisplayDirect
      long PASM_mailbox
      long anum, atX, atY, atSize

    pub main
      init
      doCountDown
      ' doTextSize
      ' doDraw

    pri doCountDown
      atX := 75
      atY := 150
      atSize := 4
      repeat
        anum := 60 + 1                    ' starting value +1 for countdown
        repeat while anum > 0
          paintscreenblack
          anum -= 1
          drawString(num.decx(anum, 3), atX, atY, atSize, YELLOW)
          delay(1000)
        paintscreenblack
        drawString(string("Lift-Off"), 50, 150, 2, GREEN)   ' this is where the crash came from drawing off the edge
        delay(3000)

    pri doDraw
      fillrectangle(0, 0, 240, 320, RED)
      drawcircle(100, 100, 60, BLUE)
      drawcircle(100, 230, 60, WHITE)
      drawline(20, 20, 200, 100, CYAN)
      drawline(20, 200, 100, 20, GREEN)
      drawline(20, 100, 200, 300, CYAN)
      delay(2000)

    pri doTextSize
      drawString(string("Text Size 1"), 0, 100, 1, YELLOW)
      drawString(string("Text Size 2"), 0, 150, 2, GREEN)
      drawString(string("Text Size 3"), 0, 200, 3, BLUE)
      drawString(string("Text Size 4"), 0, 240, 4, WHITE)
      delay(2000)

    As for the language wars?
    This project has validated the cross-platform capability of C.
    With a bit of work from an experienced coder the Arduino sketch could be made to run on a Propeller.

    But it has also shown the down side as well.
    The monster complexity of the Sketch and its lack of speed made it quite unattractive to anyone NOT an expert C coder.
    Unusable, it might be said, if the display speed is critical (and when isn't it?)

    So therein lies the rub.

    But not the end of the language wars, I'm sure... :)
  • David BetzDavid Betz Posts: 14,511
    edited 2014-09-13 07:44
    cavelamb wrote: »
    As for the language wars?
    This project has validated the cross-platform capability of C.
    With a bit of work from an experienced coder the Arduino sketch could be made to run on a Propeller.

    But it has also shown the down side as well.
    The monster complexity of the Sketch and its lack of speed made it quite unattractive to anyone NOT an expert C coder.
    Unusable, it might be said, if the display speed is critical (and when isn't it?)

    So therein lies the rub.

    But not the end of the language wars, I'm sure... :)
    I guess if you're unwilling to use PASM with C then it will perform worse than Spin+PASM. I'm not sure where the complexity of the sketch came in. Are you talking about the sketch itself plus all of the libraries required to use it? Anyway, it is certainly much easier to use Spin+PASM than it is to use C+PASM. However, the need to use Spin+PASM is mostly due to the fact that Spin often doesn't perform well enough to use on its own.
  • DavidZemonDavidZemon Posts: 2,973
    edited 2014-09-13 08:11
    Well, I can't verify the correctness of course, but since this library is write-only, there's no reason I can't run some preliminary tests without anything connected to the board.

    Using no assembly, I got about the same results as Steve with his code (no surprise) - that is ~3 seconds to run the init function (I couldn't tell if you guys were timing the entire init function or just paintScreenBlack) and ~1.5 seconds to print all three lines of text.

    However... when I copied over the PASM routines... :)
    Now, I only bothered implementing sendCommand and sendData - not the multiSend or sendCmdSeq - but it dropped time to .811 and .325 seconds!

    And personally, what I think is the best part - here's all the (C++) code that it took:
    // Symbol for assembly instructions to start a new SPI cog
    extern "C" {
    extern uint32_t _SeeedTftStartCog (void *arg);
    }
    
    
    class FastSeeedTFT : public PropWare::SeeedTFT {
    public:
        virtual void start (const PropWare::Pin::Mask lsbDataPin,
                const PropWare::Port::Mask csMask,
                const PropWare::Port::Mask rdMask,
                const PropWare::Port::Mask wrMask,
                const PropWare::Port::Mask rsMask) {
            this->m_cog = _SeeedTftStartCog((void *) &this->m_mailbox);
            if (-1 == this->m_cog) {
                print("Oh poo! The cog didn't start :(");
                return;
            }
    
    
            SeeedTFT::start(lsbDataPin, csMask, rdMask, wrMask, rsMask);
        }
    
    
    protected:
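        virtual void sendCommand (const uint_fast8_t index) {
            // Parenthesize the shift: '+' binds tighter than '<<' in C++
            this->m_mailbox = (index << 8) + pmb_sendCMD;
        }
    
    
        virtual void sendData (const uint_fast16_t data) {
            // Parenthesize the shift: '+' binds tighter than '<<' in C++
            this->m_mailbox = (data << 8) + pmb_sendDATA;
        }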
    
    
    protected:
        typedef enum {
            pmb_idle,
            pmb_sendCMD,
            pmb_sendDATA,
            pmb_repeat
        } SeeedTftAsmFunc;
    
    
        volatile atomic_t m_mailbox;
        int8_t m_cog;
    };
    

    I will also admit, I had to remove "const" from the end of each method declaration in PropWare::SeeedTFT, which actually increased the runtime of that class from 3/1.56 to 3/1.68. This could be remedied by using a mailbox in the global scope instead of a member variable, though.
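
    One thing worth spelling out about the mailbox word: the value goes in the upper bits and the opcode in the low byte, and in C++ `+` binds tighter than `<<`, so the shift needs its own parentheses. Below is a minimal stand-alone sketch of the packing. The layout is assumed from the sendCommand/sendData code above, and packMailbox, mailboxOp and mailboxValue are hypothetical helper names, not part of PropWare.

    ```cpp
    #include <cassert>
    #include <cstdint>

    // Opcodes in the same order as the SeeedTftAsmFunc enum above.
    enum MailboxOp : uint32_t { pmb_idle, pmb_sendCMD, pmb_sendDATA, pmb_repeat };

    // Pack a value and an opcode into one 32-bit mailbox word.
    // Assumed layout: opcode in bits 0..7, value in bits 8 and up.
    constexpr uint32_t packMailbox(uint32_t value, MailboxOp op) {
        return (value << 8) | op;   // without the parentheses this would shift by (8 + op)
    }

    // Unpack helpers, as the receiving side would see the word.
    constexpr MailboxOp mailboxOp(uint32_t word) {
        return static_cast<MailboxOp>(word & 0xFF);
    }

    constexpr uint32_t mailboxValue(uint32_t word) {
        return word >> 8;
    }
    ```

    For example, packMailbox(0x2C, pmb_sendCMD) gives 0x2C01, whereas the unparenthesized `0x2C << 8 + 1` evaluates to 0x2C << 9, which is 0x5800.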
  • jazzedjazzed Posts: 11,803
    edited 2014-09-13 08:18
    I fail to understand what grudge match or language war has to do with this topic.