Using PropGCC and ASM — Parallax Forums

rwgast_logicdesign Posts: 1,464
edited 2015-09-15 18:04 in Propeller 1
I have only used Spin on the Prop, and I haven't been giving the Prop enough love lately; I've mostly been involved with AVR on the 8-bit side and PSoC 4/5 on the 32-bit side.

I'm taking a job where I will be required to travel for 6 weeks straight and then be home for 10 days. I'm getting a house with a separate lab that has AC, YAY!!! On the road I plan to learn an EDA tool, either DipTrace or KiCad. I would like to develop an OSHW kit, and I would like to use the Prop. I just bought Harprit's book on ASM, and I know C well. What I would like to know is whether there is a way to inline assembly with C, and where I can find resources or examples of using C without the Parallax libraries, i.e. C using just low-level access to registers and bit operations.

I plan on using a 24-bit ADC and a very stable 0.05% voltage reference at 2 volts or under; this will be high-res test gear in kit form on Tindie. It will perform multiple tests and functions along with input and LCD control, so a Prop seems like a good way to go. A PSoC would do, but I need much higher-end op-amps and ADCs than what is built into the PSoC... I want to bootstrap on Tindie.

PS: I've posted here for a long time, and I did not see a separate C forum like there used to be, so I'm sorry if this is in the wrong place...

Comments

  • Likely the best person to ask is David Betz, as he's leading the GCC group on the Prop. I know it's possible to mix PASM and C, but generally you write a function in PASM, run it in its own cog, and then "message" it from another cog in C.

    You *might* be able to inline PASM, but you'd probably be limited to the LMM model, as that one uses native PASM instructions instead of a virtual machine. You'd also need to know what the COG map was for the compiler so you didn't step on anything they use.
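The cog-plus-mailbox pattern described above can be tried out on a desktop compiler before any PASM is written. What follows is a host-side model only, under stated assumptions: the `mailbox_t` layout, the `CMD_DOUBLE` command and all function names are hypothetical, and `driver_poll` stands in for one pass of the loop that a PASM driver would run forever in its own cog.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mailbox layout; on the Propeller it would live in hub RAM. */
typedef struct {
    volatile uint32_t command;   /* CMD_IDLE when free, otherwise a request */
    volatile uint32_t argument;  /* input for the request */
    volatile uint32_t result;    /* written by the driver before it goes idle */
} mailbox_t;

enum { CMD_IDLE = 0, CMD_DOUBLE = 1 };

/* Caller side: fill in the argument first, then publish the command,
 * so the driver never sees a half-written request. */
void mailbox_post(mailbox_t *mb, uint32_t command, uint32_t argument) {
    assert(mb->command == CMD_IDLE);  /* one outstanding request at a time */
    mb->argument = argument;
    mb->command = command;
}

/* Driver side: one pass of the polling loop (assumed to run in its own cog). */
void driver_poll(mailbox_t *mb) {
    if (mb->command == CMD_DOUBLE) {
        mb->result = mb->argument * 2u;
        mb->command = CMD_IDLE;       /* going idle tells the caller the result is ready */
    }
}

/* Caller side: nonzero once the driver has consumed the request. */
int mailbox_done(const mailbox_t *mb) {
    return mb->command == CMD_IDLE;
}
```

On real hardware the C side would start the driver in a cog and then spin on `mailbox_done`; in this model the caller invokes `driver_poll` by hand, which makes the order of the handshake easy to follow.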
  • DavidZemon Posts: 2,973
    edited 2015-09-15 19:08
    Well, I don't mean to be rude, but I'm afraid you're quite wrong, Jason (except for the part where David Betz is a smart guy lol).

    Inline assembly on the Prop is a brilliant concept and, once you get the pattern down, quite easy. I use it a lot in my HAL, PropWare.

    rwgast, since you haven't been around in a long time, you may not have seen PropWare. I encourage you to check out the links in my signature. You may enjoy using it (at least the build system) or you may not - but I can guarantee you'll find a lot of good examples in there.

    For a short-and-sweet example of inlined, fcached assembly, see my function in the UART classes that sends out a single word of data.

    And for I/O operations, check out my Pin and Port classes.

    I don't have any examples of inline assembly without fcache because, in my opinion, if it's not important enough for fcache, it's not important enough to be in assembly.
    Now, that's not entirely true. I wrote a stepper motor driver for a Pick-n-Place machine that runs via cogc mode, so I had no need for fcache. In that case, I had a high level loop written in C++ waiting for instructions and writing a response, and then the tight motor driver loop was written in inlined assembly. Have I mentioned that I love C++ on the Propeller? :D

    When using fcache though, BE WARNED: You only get 64 instructions. If you exceed 64 instructions, it fails at runtime (no compiler error/warning).
  • No offense taken - It's been a while since I've looked at PropGCC, so it's probably matured quite a bit since I saw it last. I'll have to have another go at it.

    Can you use inline asm without CogC or FCache? There may be cases where you can use knowledge of the instruction set to do things that might be hard to accomplish in C (like the REV instruction). Just curious.

    I started writing a robot arm a while back in PropGCC, but I stopped because it was clear I was going to fill up the 32 KB very quickly in LMM mode, and in CMM mode it wasn't much faster than Spin. At the time, the integration of threads and asm was nearly nonexistent, so I'll have to try again.
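To give a sense of what a single REV instruction saves, here is a portable C model of it. It assumes the behaviour described in the Propeller manual as I understand it (reverse the low 32 - bits bits of the value and clear everything above them); `rev_model` is a made-up name for illustration, not a PropGCC function.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of REV under the assumption stated above:
 * reverse the low (32 - bits) bits of value and zero the rest. */
uint32_t rev_model(uint32_t value, unsigned bits) {
    assert(bits < 32u);               /* the instruction only looks at 5 bits */
    unsigned count = 32u - bits;      /* number of LSBs to reverse, 1..32 */
    uint32_t out = 0;
    for (unsigned i = 0; i < count; ++i) {
        out = (out << 1) | (value & 1u);  /* move the current LSB to the other end */
        value >>= 1;
    }
    return out;
}
```

The loop costs one iteration per bit, which is the kind of work that is awkward to express compactly in C and is a natural candidate for a line of inline assembly.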
  • Actually, Eric Smith is a better resource for inline assembly. Yes, you can do it though.
  • DavidZemon Posts: 2,973
    edited 2015-09-17 01:41
    The original content of this reply is below, in the quote.
    I state in the reply that the assembly code works in and out of CogC mode - that is not accurate. It only works when compiled as a CogC module due to the jmp. It also doesn't loop because I'm bad at copy/paste, but that's beside the point.

    The attached assembly code does show the syntax for inline assembly that does not use FCache, though. Pretty straightforward; just remember not to use jmp, djnz, etc.
    JasonDorie wrote: »
    No offense taken - It's been a while since I've looked at PropGCC, so it's probably matured quite a bit since I saw it last. I'll have to have another go at it.

    Can you use inline asm without CogC or FCache? There may be cases where you can use knowledge of the instruction set to do things that might be hard to accomplish in C (like the REV instruction). Just curious.

    I started writing a robot arm a while back in PropGCC, but I stopped because it was clear I was going to fill up the 32 KB very quickly in LMM mode, and in CMM mode it wasn't much faster than Spin. At the time, the integration of threads and asm was nearly nonexistent, so I'll have to try again.

    Here's a meaningless snippet of inline assembly which does not use FCache. It makes no difference if it is CogC or not - I only brought that up because of the opinion I expressed above.

    This snippet of code will read the "limit pin" from INA four times per second and, once the pin goes low, it will exit.
    __asm__ volatile (
            "               jmp #StartPerformMove%=                                     \n\t"
    
            // Declare temporary variables
            "clock%=:                                                                   \n\t"
            "               nop                                                         \n\t"
    
            // START
            "StartPerformMove%=:                                                        \n\t"
            "               mov clock%=, %[_delayAtCurrentRung]                         \n\t"
            "               add clock%=, CNT                                            \n\t"
    
            "ascStepLoop%=:                                                             \n\t"
            "               waitcnt clock%=, %[_delayAtCurrentRung]                     \n\t"
            "               test %[_limitPinMask], INA wz                               \n\t"
            " if_z          jmp #allDone%=                                              \n\t"
    
            // ... some other stuff ... (kept as a comment so it is not passed to the assembler)
    
            // DONE
            "allDone%=:                                                                 \n\t"
            : // No outputs: the assembly only reads both operands
            : [_delayAtCurrentRung] "r"(CLKFREQ / 4), // Pause for 1/4 second
              [_limitPinMask] "r"(this->m_limitPin.get_mask()));
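The timing in the snippet above comes down to two pieces of arithmetic: how many system clock ticks make up a quarter of a second, and what `waitcnt` leaves in its target register afterwards. Here is a small host-side model of both. The function names are invented for illustration, and the 80 MHz clock used in the usage note is an assumption (the thread does not state one).

```c
#include <assert.h>
#include <stdint.h>

/* Ticks between polls for a given system clock and polling rate
 * (the snippet uses CLKFREQ / 4 for four polls per second). */
uint32_t ticks_per_poll(uint32_t clkfreq_hz, uint32_t polls_per_second) {
    assert(polls_per_second != 0u);
    return clkfreq_hz / polls_per_second;
}

/* Model of what "waitcnt target, delta" leaves in target once it returns:
 * the next deadline, wrapping at 32 bits just like the CNT register. */
uint32_t next_deadline(uint32_t target, uint32_t delta) {
    return target + delta;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

With an assumed 80 MHz clock, `ticks_per_poll(80000000u, 4u)` gives 20,000,000 ticks per poll. Because the deadline wraps the same way CNT does, the loop keeps a steady period even when the counter rolls over.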
    
  • Is there an easy path to porting existing PASM code to PropGCC?
  • Here's a meaningless snippet of inline assembly which does not use FCache. It makes no difference if it is CogC or not - I only brought that up because of the opinion I expressed above.

    This snippet of code will read the "limit pin" from INA four times per second and, once the pin goes low, it will exit.
    
    Does this code work if compiled in LMM mode? I would think that the jmp instructions would fail because the code is actually in hub memory not in cog memory.

  • DavidZemon Posts: 2,973
    edited 2015-09-16 12:57
    You know... I'm not sure. I thought it would, but I just realized I've never actually written asm that wasn't fcached or running as a cog module...

    I'll check tonight! :)
  • JasonDorie wrote: »
    Is there an easy path to porting existing PASM code to PropGCC?

    The .pasm directive will tell PropGCC to assemble your PASM code with almost perfect consistency relative to PropTool. See this thread for reference as well.
  • Thanks for all the comments and suggestions; I'll look more into this. My problem is I haven't been around in a while... I left about the time they set up the PropGCC tutorials. While I love C (not C++, I don't care for classes), it seemed to me Parallax was trying to turn PropGCC into Arduino, and if I need to bust out some quick test code I'll go to a BS2 board or an Arduino, depending on what I'm doing. But if I'm going to write C for a CPU as nice as the Prop, I want to be at the bare metal, like on an ARM MCU or a Z80 CPU, etc. I'm not looking to have Parallax's huge newbie libraries get in my way of doing specialized things; if I wanted that I would use Spin, which fits the Propeller architecture better and is compact.

    Sadly I may end up doing this in PASM and Spin. I have to do more research on Prop C; it's been a while...

    Let's just say I'll be using 24-bit DACs and ADCs along with a pair of Kelvin clips... I want to bootstrap this kit on Tindie, and I want the Prop to be in it, not a Cypress or STM chip. I figure if I have to bolt on extra peripherals anyway, why use chips that waste silicon on lower-end DACs and ADCs and, in the case of PSoC, op-amps that aren't instrumentation amps? The Prop is my go-to when I will be using lots of extra chips, because it means I get to use up tiny pieces of the 8 cogs to do a trivial read on the SPI bus and then use the rest for whatever I like.

    I was hoping that PropGCC had evolved to where I could throw a ton of code onto an EEPROM or SD card, then load it all into 1 meg of SPI SRAM chips and have the Prop start crunching as many lines of ASM and C as I'd like, quickly, since C code is nowhere near as compact as Spin, or wasn't a year or two ago.

    I also wanted to build an SDR, and the Prop and Spin actually help out with learning concepts but will never do what I want, so that idea has gone this way... Parallella. The biggest reason for that, besides raw power, is the DSP and handling floating point well. Going this path will totally kill my goal of being cheaper than a HackRF, but it will totally kill the HackRF and put it more in the Ettus USRP type of SDR range, and at least it will be cheaper and most likely way more powerful than an Ettus board.

    I still have my Frank bot built on a Parallax Stingray, which uses a few ATmegas with the Uno bootloader to control mundane things such as relays for battery packs and reading sensors. My idea there was to collect and format all needed data on 8-bit MCUs, then send the data to the Prop, which actually handles motor control and decisions. The top deck is for something like an SBC or maybe a Parallella (maybe I'll make the real Big Brain, MuHAHAHAHA, using Parallella; this is a joke for old-schoolers) to run OpenCV and send info to the Prop. For those of you who may not have ever seen my projects on here, especially since the forum has changed, here's Frank:

    How frank sits now
    EARLY open loop figure 8

    This whole project was to use wheel encoders along with vision, a Wiimote sensor, and a laser to do mapping and dead reckoning. Right now it's basically in don't-hit-the-walls mode or BT control mode. I kind of stopped working on it when my daughter was born and became much more interested in analog (out of need for cheap precision test tools) and SDR/ham radio (still waiting on my call sign, grrrr, slow FCC) out of interest in SATCOM.

    I've recently gotten out of industrial electrical work and am going to work flying around for 6 weeks at a time doing 4G/LTE base stations. So I'll be doing six days a week, 12 hours a day, and staying in hotels away from the family. I plan to bring a lab in a heavy-duty Pelican case, with spare parts, dev boards, wire, a soldering station, and a scope. My biggest hurdles now are learning the idiosyncrasies of PropGCC (please no classes) and PASM, along with an EDA tool. I'll be doing good for myself, making 5 digits every six weeks and then being home for 10 days. I'm in the middle of getting into an old grow house where a guy left a bunch of his stuff, so I basically will have my own 20x20 lab already decked out with ceiling hangers, AC and a swamp cooler, shelves everywhere, and a workbench. I'm hoping that I can start a kit or even full product business and work on electronics only soon. I tried the consulting thing and have been off for a year with max unemployment, but just as it started picking up there was a lot of drama in my life, and I moved home and took my baby mama to court for custody of my daughter. We're going through counseling and trying to reconcile, so hopefully I won't be a full-time parent anymore and can get back to doing what I love, and start making cash off it.

    Sorry that post kind of went on... The point is, I'm hoping the Prop will work out; I actually think better in parallel and deterministic terms than I do the standard way. I swear those interrupts are always causing timing bugs in complex software! Along with libraries like Arduino and now Parallax's "C learning system". I understand that Parallax is geared towards hobbyists and that some of you guys consult with it, using Spin for industrial control, but if they really want to get up there with PIC/AVR/ARM they need to start writing material that doesn't use hand-holding libraries. It really gets on my nerves that they haven't written a PASM book, nor do they carry Harprit's PASM 101 book; it has a plain cover, but it's professionally bound and packed full of info! I know enough x86 to be dangerous and to use things like IDA Pro (THE x86 disassembler) in order to help me hex programs I can't afford, for an extended trial. Since Hanno won't release the PropScope source and I'm interested in turning it into an SDR platform, I have been dissecting the software and doing USB monitoring with Wireshark. The Propeller-powered open scope project seems to have disappeared :/ but I intend to donate my findings to sigrok (an open source, cross-platform project for almost all test instruments), because I'd like anyone with a PropScope like me to be able to plug it into their cell phone and easily diagnose low-bandwidth problems like alternator noise, etc.

    PS: Parallax, or anyone in the forum: my avatar was a pic of my dog and best friend through a lot of hard times; she recently died in her sleep next to me. I don't have that picture anymore, so if there is a way to get my old avatar back, please help.
  • Your comment is ginormous, so I'm going to reply in multiple posts, as I read through yours.
    Since C code is no where near as compact as spin, or wasn't a year or two ago.

    PropGCC's cmm memory model is, in my experience, a very near match with Spin for both code density and speed. You have a little bit more overhead (I think ~2k) because the CMM interpreter isn't in ROM, like the Spin interpreter, but it ends up being surprisingly close. We did see better performance in Spin when running tight loops - C's stack overhead came into play and took its toll.

    But PropGCC has one major improvement: inline assembly. With that, all of a sudden we have the performance of assembly when we need it, and the compactness of Spin (aka CMM) when we don't. Spin can't compete with that.
  • I was hoping that PropGCC had evolved to where I could throw a ton of code onto an EEPROM or SD card, then load it all into 1 meg of SPI SRAM chips and have the Prop start crunching as many lines of ASM and C as I'd like, quickly, since C code is nowhere near as compact as Spin, or wasn't a year or two ago.
    You can certainly run lots of PropGCC XMM code from SPI EEPROM or even SPI SRAM loaded from an SD card. That support has been there pretty much from day one. It isn't promoted by Parallax for some reason and they don't really have any boards that support it other than the C3 but it's pretty easy to wire up a SPI flash chip and start writing lots of C code.

  • You know... I'm not sure. I thought it would, but I just realized I've never actually written asm that wasn't fcached or running as a cog module...

    I'll check tonight! :)

    Well.. it doesn't work.. :(

    I don't know how to loop in LMM or CMM modes.
  • David Betz Posts: 14,511
    edited 2015-09-17 01:43
    You know... I'm not sure. I thought it would, but I just realized I've never actually written asm that wasn't fcached or running as a cog module...

    I'll check tonight! :)

    Well.. it doesn't work.. :(

    I don't know how to loop in LMM or CMM modes.
    You have to use the LMM macros for branching. Also, your local variable won't work because the code isn't running in COG memory. There is no way to use a 9 bit address field of a Propeller instruction to address hub memory. There are also LMM macros for that.

    Edit: Actually, the PC is in a COG register so you can branch short distances by adding and subtracting from the PC.

  • My biggest hurdles now learning the idiosyncrasies of propGCC (please no classes) and PASM

    Sorry that you don't like classes. I'd love to hear your reasons behind that opinion/choice sometime, but it's definitely off-topic in this thread.

    If you don't like Parallax's C-learning system and you don't like OOP, then you will be largely starting from scratch (which sounds like what you want anyway). You will, however, find a lot of helpful examples in the Simple libraries, PropWare, and libpropeller. And as always, when you get down to the code and have specific questions - we're here for you.

    Also, the latest (working) PropGCC builds can be downloaded from my build server - the download links can be found on PropWare's home page and in one of the threads here on the forum... I don't have the link handy.

    You seem to be a genuine power user though, so you might enjoy PropWare's build system a lot. As I state in the homepage, "... provide both novice and expert users alike a single environment that approaches the simplicity of SimpleIDE or Arduino without sacrificing a single ounce of power or flexibility." Without PropWare's build system, you're either creating your own Make template file or using SimpleIDE for everything... which I bet you're not too fond of :P
  • DavidZemon Posts: 2,973
    edited 2015-09-17 02:04
    the propeller powered open scope project has seemed to disappear

    You might check out Peter's recent post about Splat. And also jazzed's Propalyzer.

    And if you need a real LA that is small, (relatively) cheap, and excellent: https://www.saleae.com/
  • Hey, thanks for all the information, great start; long road ahead I guess :) I will definitely check out PropWare. I used to use Propalyzer when I first got started, then bought a Saleae 8 unit and a Siglent scope; it's basically the same as a LeCroy WaveRunner, two different vendors, one OEM. I haven't seen Splat, but 32 channels, that's nice. Peter and Jazzed are very talented with the Propeller. I love the round the props architecture but it seems like it will be pretty nuts to figure out variable access/scope and so forth using C.

    As far as my not being a CPP fan: classes are OK and CPP is OK on a PC, but I think it's just too much overhead and not really needed on a micro most of the time. I mean, John was still using C to write Quake 2. For me it's not so much about what paradigm to use; it's about what gives the most bang for your buck on the hardware you have. As far as higher-level languages, if I work with the Parallella in the future I would like to give Google Go a try. It seems like a nice language, but it obviously is not meant for anything less powerful than an SBC; I hear its threading capabilities are great. I've been into so much straight hardware lately, just using a micro for small stuff, that I have thought about going down the FPGA route, but that's not a skill one just learns in a few weeks!
  • it seems like it will be pretty nuts to figure out variable access/scope and so fourth using C.

    It's actually quite simple. Unless you explicitly use the cogram keyword in front of a variable, everything is stored in HUB RAM. Optimization of course will happen, to use data from the 16 available "registers" instead of reading from HUB RAM all the time, but as far as your C syntax is concerned, it's all in the HUB. For global variables across cogs, just use the volatile keyword, same as you would in a multi-threaded C program on a desktop.
    CPP is ok on a PC but I think its just to much overhead and not really needed on micro

    I thought that too - and in fact I started writing PropWare in C some time ago. Then I saw SRLM's libpropeller written in C++ and gave it a go. So long as you're careful what features to use and when, it's quite handy and doesn't cause any overhead at all (or a negligible amount, like 8 bytes per instance). Writing methods inside the class definition helps a lot though, as SRLM explains here.
    Definitely read this on C++, from the folks that wrote PropGCC. As the document states: be sure to avoid unnecessary virtual methods, exceptions, the standard library, etc., and you'll be good to go. But the basic concepts of classes/objects are no problem at all. Dynamic memory allocation is another big gotcha, but I find ways to avoid it in everything that I write (except the unit tests in PropWare) and I'm sure you can too.
  • You know... I'm not sure. I thought it would, but I just realized I've never actually written asm that wasn't fcached or running as a cog module...

    I'll check tonight! :)

    Well.. it doesn't work.. :(

    I don't know how to loop in LMM or CMM modes.

    In GAS instead of using "jmp" you can use "brs" (for a short branch) or "brl" (for a long branch). These will expand to appropriate LMM instructions; an add/sub to the PC for brs, or a call into the LMM kernel for a brl. (In CMM mode both will be replaced by compressed pseudo-ops.)

    Unfortunately brs and brl only work in LMM/CMM/XMM, and not in any "internal" mode (COGC or fcached code).

    Eric
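To see why a short branch can be done by adjusting the PC while a plain `jmp` cannot work from hub memory, it may help to look at a toy fetch-execute loop. This is only a model of the idea, not PropGCC's LMM kernel: the three opcodes, the `insn_t` layout and both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical three-opcode instruction set; these are not real PASM encodings. */
typedef enum { OP_DEC, OP_BRS_IF_NZ, OP_HALT } opcode_t;

typedef struct {
    opcode_t op;
    int      offset;   /* only used by OP_BRS_IF_NZ: signed distance in instructions */
} insn_t;

/* Fetch-execute loop standing in for an LMM-style kernel.
 * Returns how many instructions were executed. */
unsigned lmm_run(const insn_t *hub, int length, uint32_t counter) {
    int pc = 0;            /* the interpreter's PC is just a variable (a cog register on the Prop) */
    int zero = 0;          /* models the Z flag */
    unsigned executed = 0;

    while (pc >= 0 && pc < length) {
        insn_t insn = hub[pc++];   /* fetch from "hub"; PC now points at the next instruction */
        ++executed;
        switch (insn.op) {
        case OP_DEC:
            --counter;
            zero = (counter == 0u);
            break;
        case OP_BRS_IF_NZ:
            if (!zero) pc += insn.offset;   /* a short branch is an add/sub on the PC */
            break;
        case OP_HALT:
            return executed;
        }
    }
    return executed;
}

/* Runs a three-instruction countdown loop and reports the instruction count. */
unsigned lmm_countdown(uint32_t n) {
    static const insn_t program[] = {
        { OP_DEC,        0 },   /* loop: counter -= 1 */
        { OP_BRS_IF_NZ, -2 },   /* branch back to loop while counter != 0 */
        { OP_HALT,       0 },
    };
    assert(n != 0u);            /* 0 would wrap and loop 2^32 times */
    return lmm_run(program, 3, n);
}
```

The branch never touches the interpreter's own code; it only changes which hub address is fetched next. A native `jmp` would instead redirect the cog that is running the fetch loop, which is why it fails outside COGC or fcached code.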
  • That's great information, thank you.

    I don't have PropGCC in front of me to test this with, so please tell me which parts I screw up :P

    If we want to invoke a jump that runs in any memory model, it might look like....
    __asm__ volatile (
            // START
            "StartPerformMove%=:                                                        \n\t"
            "               mov clock%=, %[_delayAtCurrentRung]                         \n\t"
            "               add clock%=, CNT                                            \n\t"
    
            "ascStepLoop%=:                                                             \n\t"
            "               waitcnt clock%=, %[_delayAtCurrentRung]                     \n\t"
            "               test %[_limitPinMask], INA wz                               \n\t"
    
    #if (defined _PROPELLER_LMM || defined _PROPELLER_CMM)
            // brs used here instead of brl because it's a short function and the jump-to-address is less than 511 bytes (9 bits) away
            " if_nz         brs #StartPerformMove%=                                  \n\t"
    #else
            " if_nz         jmp #StartPerformMove%=                                  \n\t"
    #endif
    : // No outputs: the assembly only reads both operands
    : [_delayAtCurrentRung] "r"(CLKFREQ / 4), // Pause for 1/4 second
      [_limitPinMask] "r"(this->m_limitPin.get_mask()));
    

    This makes me ask something that I've been wondering for a long time: when you write inline assembly in CMM mode, does PropGCC convert your asm to short instructions during the build, or does the CMM interpreter know what instructions are 8-bits and which ones were hand-coded as full 32-bits?
  • So one thing I'm confused about is the mention of 64 lines of PASM: does this apply to inline ASM in XMM mode?
  • Yes, malloc <> good in embedded apps for the most part, and that's one thing I fear about using all these abstracted libraries and CPP.
  • So one thing im confused about it the mention of 64 lines of PASM this applies to inline ASM in XMM mode?

    64 lines is for FCache. No more than 64 lines of assembly in an fcached routine. Of course.... that's a lot of assembly. More than that, and you either need to remove the fcache, or run it in a dedicated cog, as an "internal" module as Eric put it.
  • It was also stated that C was as fast and compact as Spin in one of the memory modes (sorry, in a hurry, don't want to scroll up). I thought that C code was larger but much faster (IDK why I want to say 40x) than Spin?
  • If we want to invoke a jump that runs in any memory model, it might look like....
    

    That looks pretty much right. brs/brl will also work in XMM mode too, so you might want to expand the #if defined stuff a bit.
    This makes me ask something that I've been wondering for a long time: when you write inline assembly in CMM mode, does PropGCC convert your asm to short instructions during the build, or does the CMM interpreter know what instructions are 8-bits and which ones were hand-coded as full 32-bits?

    It gets converted during the build -- the assembler will output either full 32 bit instructions or compressed instructions depending on the memory model selected. Note that "compressed" instructions are actually a superset of the whole instruction set; some 32 bit instructions take 40 bits in CMM mode, but the commonly used ones fit in 8 or 16 bits.

    If you want to avoid the instruction compression for some reason then you can use the ".compress off" assembly directive, but you should only do that if you're sure the instructions will be directly executed and not interpreted. That's why FCACHE code typically is surrounded by ".compress off" / ".compress default".

  • It was also stated that c was as fast and compact as spin in one of the memory modes, sorry in a hurry dont wanna scroll up. I thought that c code was larger but much faster (idk why I want to say 40x) than spin?

    The compiler can actually operate in 3 modes:

    COGC: the code is directly executed in a COG (and has to fit in 2K). This is very fast, but quite a bit bigger than Spin.

    LMM/XMM: the code is interpreted by a very simple interpreter (everything except jumps is basically directly executed). This can be up to 4x slower than COGC, although some small loops will run at nearly COGC speed because they fit in internal memory via the FCACHE mechanism. Approximately the same size as COGC.

    CMM: the code is executed by a more complex interpreter that allows for compressed instructions. Because of instruction decode the code is slower than LMM (but still faster than Spin); size wise it is a little bigger than Spin. FCACHE is available in CMM mode, if you compile with -O2 or -O3 (but this makes the code bigger than the "optimize for size" option -Os).

    Here's a comparison; it may be slightly out of date (it was made a while back) but gives a rough comparison of size/speed for Heater's FFT benchmark:
                 Time (ms)    Total Size (bytes)
    PASM             25          4480
    GCC -O2          47          7288
    GCC CMM -O2      96          5768
    GCC -Os         148          7292
    GCC CMM -Os     537          5460
    Spin           1465          3244
    

    The sizes here are the raw binary sizes to be loaded into the device.
    Note that the CMM code size includes the interpreter, whereas Spin's interpreter is built in to the ROM. With modern GCC builds the CMM interpreter is loaded once and then the HUB memory it uses is freed. So effectively CMM -Os needs a bit more than 3460 bytes versus the 3244 for Spin.
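The speed and size comparisons above reduce to a few lines of arithmetic, sketched here in C using the numbers from the FFT table. One assumption is made loudly: the thread never gives the interpreter's exact size, so the 2000 bytes passed in the usage note is inferred from 5460 - 3460 and from the "~2k" mentioned earlier. The function names are invented for illustration.

```c
#include <assert.h>

/* Figures copied from the FFT benchmark table above. */
enum {
    SPIN_MS = 1465, LMM_O2_MS = 47, CMM_OS_MS = 537,
    SPIN_BYTES = 3244, CMM_OS_BYTES = 5460
};

/* How many times faster than Spin a build with the given run time is. */
double speedup_vs_spin(int time_ms) {
    assert(time_ms > 0);
    return (double)SPIN_MS / (double)time_ms;
}

/* Size left in hub RAM once the interpreter's memory has been freed.
 * interpreter_bytes is an assumption supplied by the caller. */
int cmm_resident_bytes(int image_bytes, int interpreter_bytes) {
    return image_bytes - interpreter_bytes;
}
```

On these figures, LMM at -O2 comes out at roughly 31x the speed of Spin and CMM at -Os at roughly 2.7x, while `cmm_resident_bytes(CMM_OS_BYTES, 2000)` gives 3460 bytes against 3244 for Spin. So the remembered "40x" is in the right neighbourhood for LMM but not for CMM.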