C Expressive Enough for Idiomatic PASM? - Page 2 — Parallax Forums

C Expressive Enough for Idiomatic PASM?


  • Dave Hein Posts: 6,347
    edited 2011-08-17 08:50
    In my test5.c I tried to match the way Phil's assembly code was using the variables. outputbuffer was treated as a hub RAM variable, so I defined it as a global data element. indata, valpred and step had implied previous values and they were also modified, which means that external code may need access to them. So I defined them as global data elements in hub RAM or as explicitly defined registers if the USE_REGISTERS flag was defined. Phil's assembly code used them as registers. I defined indexTable and stepsizeTable as pointers, which could also be either in hub RAM or in registers. This differs from Phil's assembly code, where these were actually tables stored in cog RAM.

    The variables val, diff, delta, vpdiff and index all seemed to be temporary variables since they didn't rely on a previous value, so I defined these as local variables. Of course, if some external code relied on the final values of these variables, they would need to be made global also.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-17 09:10
    I don't know if this will help or not, but I've attached the entire files from which my code snippet was taken. It's an ADPCM encoder/decoder that I'm adapting for the Propeller. At least you'll be able to see which variables are persistent and which are local.

    -Phil
  • Dave Hein Posts: 6,347
    edited 2011-08-17 09:52
    With the tables stored in hub RAM GCC generates 325 lines of code for both the encoder and decoder. That would fit nicely in cog RAM. The 325 locations includes the instructions and constants. The working registers require 16 additional locations, and you would need a main routine to call the encode or decode function. This is a nice example of the usefulness of a C compiler that generates native cog PASM code.

    I attached a Spin file that contains the generated assembly code.

    Edit: The contents of _L_LC2, _L_LC5, _L_LC6 and _L_LC7 needed to be fixed so they properly reference the table addresses. An "@" needs to be inserted and the object offset needs to be added at run time.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-17 10:16
    Dave Hein wrote:
    This is a nice example of the usefulness of a C compiler that generates native cog PASM code.
    I have to agree. ('Didn't think you'd ever see me write that, did you? :) )

    One thing that would be helpful -- or at least interesting -- would be for the compiler to insert each C source code line as a comment above the PASM instructions that it produces.

    -Phil
  • Roy Eltham Posts: 3,000
    edited 2011-08-17 11:00
    Phil,
    With optimizations on, it's difficult to attribute a C line to a set of instructions like you want, because the instructions for a single line of C code could very well be split apart across the output. The Visual Studio C++ compiler has the option to do the C/C++ code interspersed in comments thing, and in optimized code it's all jumbled up and some lines repeat. It's pretty much unusable.
  • RossH Posts: 5,519
    edited 2011-08-17 15:46

    Phil Pilgrim (PhiPi) wrote: »
    One thing that would be helpful -- or at least interesting -- would be for the compiler to insert each C source code line as a comment above the PASM instructions that it produces.

    -Phil

    Sounds good - but in practice I have to agree with Roy - even Catalina (whose optimization is by no means aggressive) can reorganize simple C code so that the lines are not recognizable, are removed altogether, or appear in a different order to the original source code. You want this to happen, but it makes understanding or documenting the generated code quite difficult. I had to tackle this problem when writing the Catalina debugger - eventually, you just have to give up and warn the user that some lines that look like they should generate code may not do so, and that the program flow may jump around between source lines unexpectedly.

    Ross.
  • Heater. Posts: 21,230
    edited 2011-08-17 16:06
    Wow Ross, you even tried. I think every compiler/debugger I have ever used, whatever language, has come with instructions that say "turn optimizations off for debugging".

    Of course that often makes the debugger useless as the code works with optimization off. Or it becomes too big/slow to do what you want when debugging.
  • jmg Posts: 15,185
    edited 2011-08-17 18:12
    Roy Eltham wrote: »
    Phil,
    With optimizations on, it's difficult to attribute a C line to a set of instructions like you want, because the instructions for a single line of C code could very well be split apart across the output. The Visual Studio C++ compiler has the option to do the C/C++ code interspersed in comments thing, and in optimized code it's all jumbled up and some lines repeat. It's pretty much unusable.

    Having no source comments at all is "pretty much unusable" by definition: there is nothing to use!

    Even if they are jumbled, that only occurs in SOME places, and comments are still very useful.

    If they are tagged with a simple optimize token, that can even show users WHICH optimise level shuffled things about.

    To me it is quite simple: Give users the choice, do NOT make the choice for them.
  • jmg Posts: 15,185
    edited 2011-08-17 18:43
    So my questions are these:

    1. Can an optimizing C compiler even hope to produce efficient, idiomatic PASM?

    2. If "no" to #1, are we doing users any favors by providing a C-to-PASM compiler if it misleads them into believing that they're getting the best performance possible from the Propeller?

    3. If "yes" to #1, will sufficient effort be put into the planned compiler to do so?

    To me this is not quite asking the right questions.

    The answer to #1 is Yes, it can get ever-closer to "efficient, idiomatic PASM"
    - but as that will be something that gets better over time, a smarter question covers the middle ground.

    4. Will the compiler support good source comments in PASM, and easy pasting of manually optimised PASM as in-line assembler?

    Sometimes you do want in-line asm, so your flight-critical code does not change with improvements in compilers, or any optimize/debug settings.


    There are also simple things that can be done with most compilers to make the needed leaps less of an ask, so you can converge gracefully on better compiler output.

    eg If you know the PASM can do this at no cost
            if_nc or        delta,#2
    

    then you can code close to PASM in style, (and you will only do this when it really matters)
    if ( diff >= step ) {
         delta |= 2;
         diff -= step;
         vpdiff += step;
    }
    
    becomes
     LocalBooleanScratch = ( diff >= step );     // compiler will use carry, or not carry, here
     if ( LocalBooleanScratch ) delta |= 2;
     if ( LocalBooleanScratch ) diff -= step;
     if ( LocalBooleanScratch ) vpdiff += step;
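
    A minimal sketch of the two styles side by side, so they can be checked against each other on any C compiler. The struct and function names are hypothetical and only bundle the three variables from the snippet; whether a given compiler actually turns the second form into conditionally executed PASM is an assumption that has to be confirmed in the generated assembly.

```c
/* The three values one quantizer step updates; names follow the snippet. */
typedef struct {
    int delta;
    int diff;
    int vpdiff;
} step_state;

/* Conventional form: one branch guarding three updates. */
static step_state step_branch(step_state s, int step)
{
    if (s.diff >= step) {
        s.delta |= 2;
        s.diff -= step;
        s.vpdiff += step;
    }
    return s;
}

/* Predicated form: the comparison is evaluated once, before diff changes,
   and each update is guarded individually by the saved result. */
static step_state step_predicated(step_state s, int step)
{
    int take = (s.diff >= step);

    if (take) s.delta |= 2;
    if (take) s.diff -= step;
    if (take) s.vpdiff += step;
    return s;
}
```

    Saving the comparison in a scratch variable matters: the second update changes diff, so re-testing diff >= step on each line would not be equivalent to the original block.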
    
  • RossH Posts: 5,519
    edited 2011-08-18 02:16
    Heater. wrote: »
    Wow Ross, you even tried. I think every compiler/debugger I have ever used, whatever language, has come with instructions that say "turn optimizations off for debugging".

    Of course that often makes the debugger useless as the code works with optimization off. Or it becomes too big/slow to do what you want when debugging.

    Catalina's debugger is pretty darn clever (thanks mostly to Bob Anderson, not me!) - it was designed to allow Catalina to debug optimized code - but that does make it a little confusing at times (e.g. when the code indicates a function call has occurred but the function has actually been inlined! :))

    Ross.
  • Heater. Posts: 21,230
    edited 2011-08-18 03:06
    RossH,

    Damn cunning. I guess it's still hard to place break points on lines of code or functions that have been entirely removed by optimization:)

    One of the useful things about coding in PASM is direct access/use of the zero and carry flags. Say shifting bits out of one register into another through carry, or muxing C/Z into a register and so on. As jmg points out, conditional execution based on C and/or Z is another important case in the Prop.

    It has always bugged me that despite C being a fairly low level language, a "portable assembler" as they say, when it comes to checking carry outs and such it makes you run through hoops because you cannot get at the status bits.

    Is there a nice way to deal with this in C? I've sometimes imagined a C compiler could support some magic variables like "__carry" and such which would help with this without breaking any standards :)much:).

    And how do we get C to NOT set carry and zero when evaluating expressions, except when we want, so that we can then use __carry as we would in assembler?
  • RossH Posts: 5,519
    edited 2011-08-18 03:38
    Heater. wrote: »
    RossH,

    Damn cunning. I guess it's still hard to place break points on lines of code or functions that have been entirely removed by optimization:)
    Well, it doesn't do that, of course - what it does is indicate which lines it can find code for (regardless of any optimization) and allow you to place breakpoints there. But sometimes it appears inexplicable to a user why (for example) you can place a breakpoint on the entry to one function, but not another, or one line but not another.
    Heater. wrote: »
    One of the useful things about coding in PASM is direct access/use of the zero and carry flags. Say shifting bits out of one register into another through carry, or muxing C/Z into a register and so on. As jmg points out, conditional execution based on C and/or Z is another important case in the Prop.

    It has always bugged me that despite C being a fairly low level language, a "portable assembler" as they say, when it comes to checking carry outs and such it makes you run through hoops because you cannot get at the status bits.

    Is there a nice way to deal with this in C? I've sometimes imagined a C compiler could support some magic variables like "__carry" and such which would help with this without breaking any standards :)much:).

    And how do we get C to NOT set carry and zero when evaluating expressions, except when we want, so that we can then use __carry as we would in assembler?

    This is kind of missing the point - C is intended to make it unnecessary to worry about such details - precisely so that you can't make the kind of silly mistakes we often make on the propeller because on other processors a CMP instruction always sets the Z bit, while on the Propeller it doesn't unless you remember to ask it to do so.

    If you really need that level of control, then C (or in fact any high level language) is the wrong choice - use assembler and be done with it. Trying to use C for this makes some sense if you are on a platform with a MIPS rating that looks like a phone number - but it makes very little sense on a processor with 496 instructions and a decimal point in front of the MIPS rating.

    Ross.
  • jmg Posts: 15,185
    edited 2011-08-18 04:45
    Heater. wrote: »
    Is there a nice way to deal with this in C? I've sometimes imagined a C compiler could support some magic variables like "__carry" and such which would help with this without breaking any standards :)much:).

    I've seen compilers do exactly this, so yes, it can be done.

    IIRC C added a boolean (bit) type, which makes this more explicit.

    Of course this does need care if you are also using full optimisation - that is where clear assembler reporting is important, to make sure you and the compiler are pushing in the same direction.

    Generally, this is kept for those most critical (small) sections of code.
  • ersmith Posts: 6,103
    edited 2011-08-18 05:23
    Heater. wrote: »
    Turns out that having global vars, as my version and Dave's do, prevents that. BUT I think even then it's not foolproof. One would have to add the "volatile" attribute to the inputs and outputs to be really sure. One needs "volatile" anyway, as we assume that the variables in HUB can be read or written at any time, so we don't want the compiler squirreling them away in registers as it works through the code.

    In this case we need to be sure that whatever would be in COG in the intended PASM is "volatile".

    Just to be pedantic (I'm sure you probably know this, but for the benefit of any less experienced C users who come across this thread):

    "volatile" is only needed for variables that the hardware can change or that another process could change without the compiler knowing it. In the cog memory only the 16 hardware registers would need to be marked "volatile". In hub memory any variables shared with another cog would need to be volatile, but variables that aren't shared don't need to be.

    In general a compiler cannot optimize away stores to external memory locations, since they may be used later in other functions. The only exceptions are static variables where it can see the usage patterns.

    Eric
  • Heater. Posts: 21,230
    edited 2011-08-18 05:51
    Me:
    In this case we need to be sure that whatever would be in COG in the intended PASM is "volatile".

    I have been quoted twice now saying this, three times if I quote myself, trouble is it is WRONG. Should be:
    In this case we need to be sure that whatever would be in HUB in the intended PASM is "volatile".

    Never mind, I think we are all on the same page now.
  • ersmith Posts: 6,103
    edited 2011-08-18 06:07
    Heater. wrote: »
    It has always bugged me that despite C being a fairly low level language, a "portable assembler" as they say, when it comes to checking carry outs and such it makes you run through hoops because you cannot get at the status bits.

    Is there a nice way to deal with this in C? I've sometimes imagined a C compiler could support some magic variables like "__carry" and such which would help with this without breaking any standards :)much:).

    As Ross correctly points out, C has to hide some of the machine details from you so that your code can be portable. There are machines that do not actually have carry flags, or condition code registers.

    Having said that, a clever code generator can make use of the flags the machine has. In particular, the machine-independent way to specify a carry is something like:
    void
    sum(unsigned *out, unsigned alo, unsigned ahi, unsigned blo, unsigned bhi)
    {
        unsigned rlo, rhi;
        unsigned carry;
    
        rlo = alo + blo;
        carry = (rlo < alo);
        rhi = ahi + bhi + carry;
    
    
        out[0] = rlo;
        out[1] = rhi;
    }
    

    In principle a compiler could certainly recognize that it should use "add with carry" to produce rhi. In practice not many compilers will :-( although I'm sure that some do, at least in some circumstances.
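
    To see the idiom in isolation, here is a small self-contained sketch that applies it to two 64-bit values supplied as 32-bit halves. The helper name is hypothetical, and fixed-width types are used only so the result can be compared against an ordinary 64-bit add on a desktop compiler.

```c
#include <stdint.h>

/* Hypothetical helper: add two 64-bit values given as 32-bit halves,
   recovering the carry with the comparison idiom instead of a flag. */
static uint64_t add64_by_halves(uint32_t alo, uint32_t ahi,
                                uint32_t blo, uint32_t bhi)
{
    uint32_t rlo   = alo + blo;      /* unsigned add wraps modulo 2^32 */
    uint32_t carry = (rlo < alo);    /* 1 exactly when the low add wrapped */
    uint32_t rhi   = ahi + bhi + carry;

    return ((uint64_t)rhi << 32) | rlo;
}
```

    The comparison works because an unsigned sum that wrapped is always smaller than either operand. More recent GCC and Clang releases also offer __builtin_add_overflow, which states the same intent directly, though it is a compiler extension rather than standard C.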

    Eric
  • RossH Posts: 5,519
    edited 2011-08-18 06:44
    ersmith wrote: »
    "volatile" is only needed for variables that the hardware can change or that another process could change without the compiler knowing it. In the cog memory only the 16 hardware registers would need to be marked "volatile". In hub memory any variables shared with another cog would need to be volatile, but variables that aren't shared don't need to be.

    Eric

    Sorry Eric, but I don't think this is the case - either for hub or cog variables.

    For hub variables, we need to consider the section of the C standard that says "A volatile declaration may be used to describe ... an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as permitted by the rules for evaluating expressions."

    In the propeller we don't have traditional interrupts as such, but any cog can write to any hub ram location in between any two hub operations of our C cog, so in fact every other cog behaves as if it were (or at least could be) an "asynchronously interrupting function". This means that if volatile is specified for any hub ram variable (shared or not), then any operation involving that variable cannot be optimized away or re-ordered - at least not by the compiler, which has no knowledge of what may be executing in other cogs at run-time.

    That same section would appear to leave the compiler free to optimize away or re-order the evaluation of cog ram variables even if they were declared volatile (other than the 16 hardware registers, of course). However, there is another section of the standard that says "All accessible objects have values, and all other components of the abstract machine have state, as of the time the longjmp function was called, except that the values of objects of automatic storage duration that are local to the function containing the invocation of the corresponding setjmp macro that do not have volatile-qualified type and have been changed between the setjmp invocation and longjmp call are indeterminate."

    My reading of this is that if you specify volatile for a cog ram variable then its value has to be preserved across setjmp/longjmp calls. Essentially, it cannot be optimized away (or reordered) in this case either - even if the variable is local to the function.

    So specifying volatile should effectively disable optimization in all cases. I suppose you could argue that the function would actually have to contain a setjmp/longjmp to be sure of this - but that is easy enough to arrange.
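
    The setjmp/longjmp case can be sketched in a few lines. This is only an illustration with hypothetical names: a local counter is modified between setjmp and longjmp, so per the passage quoted above it must be volatile for its value to be determinate after the jump.

```c
#include <setjmp.h>

static jmp_buf retry_point;

/* Re-enters the body via longjmp until max_attempts is reached and
   returns how many passes were made. */
static int count_attempts(int max_attempts)
{
    /* Changed between setjmp and longjmp: without volatile its value
       after the jump would be indeterminate. */
    volatile int attempts = 0;

    (void)setjmp(retry_point);       /* execution resumes here after longjmp */
    attempts++;
    if (attempts < max_attempts)
        longjmp(retry_point, 1);
    return attempts;
}
```

    With the volatile qualifier removed, an optimizing compiler is allowed to keep attempts in a register that longjmp restores to its old contents, so the loop may misbehave.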

    Ross.
  • jazzed Posts: 11,803
    edited 2011-08-18 08:32
    I've always used volatile in C "primarily" for ensuring that a compiler actually does what I ask it to do for device drivers. That is, if I need to do a read/modify/write on a hardware register, I mark the register (pointer etc...) as volatile.

    The effect of volatile is that you can force the optimizer to not ignore certain things. Whether that was the original intent or not is debatable.

    Wikipedia (like it or not) currently gives a nice overview. http://en.wikipedia.org/wiki/Volatile_variable

    Ross offers an interesting clarification to users who may not understand any of it. The statement is just a slightly different (but still correct) interpretation of volatile usage in C.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-18 09:39
    Looking at the code generated for the ADPCM file, it looks as if the compiler is still treating the Propeller as a load/store architecture, with math operations only possible in a small set of working registers, viz:
            mov     r7, _L_LC8      'Should be mins r6,_L_LC8
            mins    r6, r7              
    

    Is this a wired-in characteristic of GCC, or can this behavior be changed?

    And what's up with
    _L_LC2
            long    _stepsizeTable
    

    Literals with values less than 512 (such as cog addresses) do not need long entries.

    Also, I understand the bit about reordering and how that makes source code comments a bit awkward. But one thing that would still be helpful for debugging is to maintain the names of temporary variables in the PASM code, rather than using their working-register designations, r1, r2, etc. Even though the variables are temporary, you can still alias more than one variable name to a given register location.

    I realize this is still a work in progress. Hopefully the aforementioned items are on a to-do list.

    Thanks,
    -Phil
  • Dave Hein Posts: 6,347
    edited 2011-08-18 10:17
    Phil,

    You may have missed the comment that I added to my earlier post. It was
    Edit: The contents of _L_LC2, _L_LC5, _L_LC6 and _L_LC7 needed to be fixed so they properly reference the table addresses. An "@" needs to be inserted and the object offset needs to be added at run time.
    The compiler generates assembly code intended for the GNU assembler, so I used a utility I wrote to convert it to the Spin/PASM format. The two tables in adpcm.c are stored in hub RAM, and my utility did not convert the "long _stepsizeTable" to "long @_stepsizeTable + 16" to get the hub address. Of course that only works for the top object. In general, the statement should be "long @stepsizeTable" and the Spin code should add the object offset before starting the cog.

    The compiler treats the Prop as a load/store machine with 16 registers. I'm not sure if the register moves can be eliminated, but I would guess that it can be done.

    Dave
  • ersmith Posts: 6,103
    edited 2011-08-18 10:23
    RossH wrote: »
    Sorry Eric, but I don't think this is the case - either for hub or cog variables.

    For hub variables, we need to consider the section of the C standard that says "A volatile declaration may be used to describe ... an object accessed by an asynchronously interrupting function. Actions on objects so declared shall not be ‘‘optimized out’’ by an implementation or reordered except as permitted by the rules for evaluating expressions."

    Perhaps I was insufficiently clear. I was saying what things *needed* to be marked as volatile by the programmer, or else the compiler could generate bad code. That is, if you forget to add "volatile" to a hub memory address that another cog changes, then your program may not work correctly. Similarly, if you forget to add "volatile" to a hardware register, your program may not work right (depending on the compiler, optimization settings, etc.). There's generally no need to add "volatile" to a cog ram variable, unless (as you point out) there's some multithreading or setjmp/longjmp going on.

    In other words, I think we're in violent agreement :-). If a variable is marked volatile, the compiler can't optimize accesses to it. If it isn't marked volatile, it's free to do so. So if you're a programmer, remember to put "volatile" on any variables shared with other cogs or threads.
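
    As a concrete illustration of that advice, here is a minimal sketch of a mailbox shared with another cog. The struct layout, field names and the doubling "work" are all hypothetical; the point is only that the shared fields carry the volatile qualifier, so every access in the source becomes a real memory access.

```c
#include <stdint.h>

/* Hypothetical mailbox living in hub RAM and shared with another cog. */
typedef struct {
    volatile uint32_t command;   /* written by the other cog, cleared by us */
    volatile uint32_t result;    /* written by us, read by the other cog */
} mailbox_t;

/* Poll once: if a command is pending, consume it and post a reply.
   Returns 1 if a command was handled, 0 otherwise. */
static int mailbox_service(mailbox_t *mb)
{
    uint32_t cmd = mb->command;  /* volatile: re-read from memory on every call */

    if (cmd == 0)
        return 0;
    mb->result  = cmd * 2u;      /* placeholder for the real work */
    mb->command = 0;             /* tell the other side we are done */
    return 1;
}
```

    On a desktop system with caches and multiple cores, volatile alone does not guarantee atomicity or ordering between threads, and C11 atomics would be the usual tool there.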

    Eric
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-18 10:35
    Dave,

    Actually, what I missed is that the tables are in the hub, not the cog; hence my comment. Is there a way to specify in C that they be put in the cog?

    -Phil
  • ersmith Posts: 6,103
    edited 2011-08-18 10:43
    Phil Pilgrim (PhiPi) wrote: »
    Looking at the code generated for the ADPCM file, it looks as if the compiler is still treating the Propeller as a load/store architecture, with math operations only possible in a small set of working registers, viz:
            mov     r7, _L_LC8      'Should be mins r6,_L_LC8
            mins    r6, r7              
    
    Is this a wired-in characteristic of GCC, or can this behavior be changed?

    It's slowly getting better. GCC is very much oriented towards traditional load/store machines, with assumptions that operations are faster in registers. With some tweaking we'll be able to get it to do many operations in cog variables. But there will always be a few cases where it misses that optimization.
    Phil Pilgrim (PhiPi) wrote: »
    And what's up with
    _L_LC2
            long    _stepsizeTable
    
    stepsizeTable is placed in hub RAM, so it may be > 511.
    Phil Pilgrim (PhiPi) wrote: »
    Also, I understand the bit about reordering and how that makes source code comments a bit awkward. But one thing that would still be helpful for debugging is to maintain the names of temporary variables in the PASM code, rather than using their working-register designations, r1, r2, etc. Even though the variables are temporary, you can still alias more than one variable name to a given register location.
    gcc has a -fverbose-asm option which adds comments to each instruction giving the original variable names. It's not terribly useful, because most of the variables are internally generated by the compiler and have names like "tmp128" and such, but it's better than nothing.

    The gnu assembler gas can also work with gcc to put the C source code into the assembly listing, but I don't think that feature is ready yet.

    Thanks for everyone's comments and feedback. With your help I'm sure that gcc will soon be a worthy addition to the compilers available on the Propeller.

    Regards,
    Eric
  • ersmith Posts: 6,103
    edited 2011-08-18 10:49
    Phil Pilgrim (PhiPi) wrote: »
    Actually, what I missed is that the tables are in the hub, not the cog; hence my comment. Is there a way to specify in C that they be put in the cog?

    gcc doesn't have that feature yet. It's relatively complicated to access arrays in cog memory, since you have to do self-modifying code. Similarly, byte and word writes are pretty awkward, requiring read-modify-write masks. It can be done, but I think it's going to be quite a while before we're at that stage.
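
    The read-modify-write a byte store needs can be sketched in portable C. This is not code from the gcc port, just an illustration with hypothetical helper names of what has to happen when memory is only addressable as 32-bit longs, as cog RAM is; bytes are assumed to be packed little-endian within each long.

```c
#include <stdint.h>

/* Store one byte into memory that can only be read and written as
   whole 32-bit longs. */
static void store_byte(uint32_t *longs, unsigned byte_index, uint8_t value)
{
    unsigned word  = byte_index >> 2;           /* which long holds the byte */
    unsigned shift = (byte_index & 3u) * 8u;    /* bit position inside it */
    uint32_t mask  = (uint32_t)0xFFu << shift;

    /* Read the long, clear the target byte, merge the new value, write back. */
    longs[word] = (longs[word] & ~mask) | ((uint32_t)value << shift);
}

/* The matching read: fetch the long, then shift and mask. */
static uint8_t load_byte(const uint32_t *longs, unsigned byte_index)
{
    unsigned shift = (byte_index & 3u) * 8u;

    return (uint8_t)((longs[byte_index >> 2] >> shift) & 0xFFu);
}
```

    In cog RAM the indexing itself costs extra too, since the long's address has to be patched into an instruction at run time, which is the self-modifying code mentioned above.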

    I will note that gcc has a fairly decent instruction scheduler (not that the propeller gcc port takes full advantage of it yet). You can see, for example, in the adpcm port that there are many places where hub operations are separated by two ordinary instructions, so once the code is synchronized with the hub window it can stay in sync for quite a while. This reduces the cost of putting things in hub memory.
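
    The arithmetic behind "two ordinary instructions" can be sketched with a simplified timing model. The constants are assumptions based on the commonly quoted Propeller 1 figures (a cog's hub window recurs every 16 clocks, a hub instruction that hits its window takes 8 clocks, an ordinary instruction takes 4), and the function name is hypothetical.

```c
enum {
    HUB_PERIOD = 16,   /* assumed clocks between one cog's hub windows */
    HUB_MIN    = 8,    /* assumed clocks for a hub op that hits its window */
    INSTR      = 4     /* assumed clocks for an ordinary instruction */
};

/* Clocks a hub instruction stalls waiting for its window when it follows
   a synchronized hub instruction by n ordinary instructions. */
static int hub_wait_clocks(int n)
{
    int elapsed = HUB_MIN + n * INSTR;      /* time since the previous window */
    int past    = elapsed % HUB_PERIOD;

    return past == 0 ? 0 : HUB_PERIOD - past;
}
```

    Under this model two instructions between hub operations give 8 + 2*4 = 16 clocks, exactly one period, so the second hub operation does not stall; zero, one or three instructions in between would cost 8, 4 or 12 clocks of waiting respectively.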

    Eric
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-18 13:04
    Thanks, Eric. I appreciate the feedback! Can I infer from your knowledgeable comments that you're the person writing the compiler?

    -Phil
  • RossH Posts: 5,519
    edited 2011-08-18 16:05
    Phil Pilgrim (PhiPi) wrote: »
    Looking at the code generated for the ADPCM file, it looks as if the compiler is still treating the Propeller as a load/store architecture, with math operations only possible in a small set of working registers, viz:

    << snip >>

    Is this a wired-in characteristic of GCC, or can this behavior be changed?

    Hi Phil,

    The architecture of GCC - like most compilers - is heavily oriented around making effective use of a limited set of registers. Register allocation is one of the big issues when building any code generator. The way GCC does this by default is apparently pretty good, but it is not "wired in" - you can disable GCC's default register allocation and do it yourself if you want to.

    As to the number of registers, we are lucky on the Propeller to be able to allocate as many or as few as we like - but there are swings and roundabouts. Having lots of registers reduces the code size, but beyond a certain point they will simply be unused by most C programs (and therefore wasted). Also, if you need to preserve registers across function calls, or "spill" registers, then the more registers you have the slower the resulting code will execute.

    For cog-based C on the propeller, where you only have 496 longs to work with, allocating hundreds of them as registers would obviously be a bit dumb - it is unlikely anyone would ever write a C program that ever used them. However, allocating too few is also equally dumb, as the code would have to spill registers to the stack all the time, slowing the code down dramatically.

    It may be possible to figure out the "sweetspot" mathematically, but I suspect the GCC team will do this the same way I did it for LCC - i.e. by trial and error!

    Ross.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2011-08-18 16:50
    Ross,

    Why can't the number of scratch registers be set dynamically during compilation, based on a program's requirements, rather than statically ahead of time? Each program will have its own optimal setting. Since all of the Propeller's cog memory locations are general-purpose registers, there's really no reason to pre-allocate "accumulator" registers ahead of time and demote the rest of cog memory to be load/store only.

    How much can GCC be tweaked to get away from the basic n-register-plus-load/store computing model? It seems unnecessarily crippling, given the Propeller's architecture.

    -Phil
  • RossH Posts: 5,519
    edited 2011-08-18 18:03
    Phil Pilgrim (PhiPi) wrote: »
    Ross,

    Why can't the number of scratch registers be set dynamically during compilation, based on a program's requirements, rather than statically ahead of time? Each program will have its own optimal setting. Since all of the Propeller's cog memory locations are general-purpose registers, there's really no reason to pre-allocate "accumulator" registers ahead of time and demote the rest of cog memory to be load/store only.

    How much can GCC be tweaked to get away from the basic n-register-plus-load/store computing model? It seems unnecessarily crippling, given the Propeller's architecture.

    -Phil

    Hi Phil,

    Good question - the short answer is that every part of the program to be executed would have to be compiled to expect the same number of scratch registers. So to compile even the smallest C program, you would have to recompile every line of C involved - including (potentially) the entire C library. Also, you could not provide program object files or libraries - you would have to provide the entire C source code of your applications.

    I suppose you could decide on a "minimum" and compile all your library code to use only that number of scratch registers - but for many C programs there is more library code than application code, so this would essentially negate the whole point of doing it in the first place!

    Also, it would be difficult (perhaps impossible) to write an algorithm that could determine what would be the appropriate number of scratch registers to use in any particular case. The alternative would be to do this manually - i.e. recompile your program several times using different numbers of scratch registers, and then run it to see whether the code size/performance tradeoff suits your needs.

    This would make for a very interesting compiler - but somehow I don't think it would help sell many Propellers!

    Ross.
  • ersmith Posts: 6,103
    edited 2011-08-18 19:25
    Phil Pilgrim (PhiPi) wrote: »
    Thanks, Eric. I appreciate the feedback! Can I infer from your knowledgeable comments that you're the person writing the compiler?

    -Phil
    It would be a gross exaggeration to say that I'm "writing the compiler" :-). We have a team working on various aspects of getting gcc to run on the propeller. I'm currently working on the part that produces PASM output from gcc's internal description. It's a fun challenge, but it's by no means the only or even the most important part of getting things going!

    Seeing suggestions on how to further optimize the output is always useful, even if some of those may turn out to be difficult to implement because of gcc's structure. gcc is a rather messy and complicated piece of code -- but its machine independent optimization is extremely good.

    Eric
  • ersmith Posts: 6,103
    edited 2011-08-18 19:37
    RossH wrote: »
    The architecture of GCC - like most compilers - is heavily oriented around making effective use of a limited set of registers. Register allocation is one of the big issues when building any code generator. The way GCC does this by default is apparently pretty good, but it is not "wired in" - you can disable GCC's default register allocation and do it yourself if you want to.
    A small nit -- it's probably theoretically possible to change gcc's register allocator (which they call "reload"), but it's not actually practical to do much more than change the number and kinds of registers available. Actually changing the algorithms gets into code that is labelled "black magic" and where even the most experienced hackers on the gcc mailing list fear to tread :-).
    As to the number of registers, we are lucky on the Propeller to be able to allocate as many or as few as we like - but there are swings and roundabouts. Having lots of registers reduces the code size, but beyond a certain point they will simply be unused by most C programs (and therefore wasted). Also, if you need to preserve registers across function calls, or "spill" registers, then the more registers you have the slower the resulting code will execute.
    ...(snip)...
    It may be possible to figure out the "sweetspot" mathematically, but I suspect the GCC team will do this the same way I did it for LCC - i.e. by trial and error!
    We're very lucky that we have some other examples (like your excellent Catalina) to look at, so we're not quite starting from scratch. But since gcc's algorithms are quite different from lcc, you're certainly right that there will be some trial and error involved!

    I guess I'd also like to point out here that every compiler is a different tool, and adding a new one (like gcc) to the Propeller's toolbox in no way means that the other tools are obsolete. There are strengths and weaknesses to both lcc and gcc, and I personally think it's great to have a choice available.

    Regards,
    Eric

    (As usual, I should also add a disclaimer that all of my opinions on this thread, and elsewhere, are my own. I'm working with Parallax, but I certainly can't presume to speak for them!)