Catalina 3.13.2 - smaller and faster, better IDE

Heater. · 2014-04-27 11:03

David,

Where it fails is in driver code where the order of operations might matter.

I'm curious about this.

I guess we are all familiar with the issue of inputs not getting read or variables that are written by other processes not getting read (same thing really). Basically because the compiler thinks it "owns" all the data and it knows it has previously read the thing, it has not changed it, so why read it again, it will be the same right? Oops.. you have just optimized away fetching of input data.

This is normally easily fixed by telling the compiler that it does not control something, by adding "volatile" to the variable declarations.
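
To make the first problem concrete, here is a minimal sketch of the polled-input case. The name input_reg is made up for illustration; on real hardware it would be a device register or a variable that another cog or thread writes. Because it is declared volatile, the compiler must perform a fresh read each time round the loop rather than reuse a value it loaded earlier.

```c
#include <stdint.h>

/* Hypothetical input location (an assumption for this sketch): stands in
   for a hardware register or a variable written by another process. */
static volatile uint32_t input_reg;

/* Poll the input `polls` times and count how often it was non-zero. */
unsigned count_nonzero_reads(unsigned polls)
{
    unsigned hits = 0;
    for (unsigned i = 0; i < polls; i++) {
        if (input_reg != 0)   /* volatile: read again on every iteration */
            hits++;
    }
    return hits;
}
```

Without the volatile qualifier the compiler would be entitled to read input_reg once before the loop and reuse that value.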

But, what you mention there is about order of execution. So I might write:

...
a = 1;
...
b = 1;
...

And for whatever reason find the assignments are done in reverse order. Perhaps even if a and b are volatile. Perhaps a and b are outputs that need to be set in that order. Oops...

Have you ever seen such a thing happen? Do you have an example? Does the C standard even allow that?

I have sometimes glossed over the descriptions of "sequence points" in C, which I presume, perhaps wrongly, let you control this in standard syntax.

David Betz · 2014-04-27 11:23

Heater. wrote: »
David,
Where it fails is in driver code where the order of operations might matter. 
I'm curious about this.

I guess we are all familiar with the issue of inputs not getting read or variables that are written by other processes not getting read (same thing really). Basically because the compiler thinks it "owns" all the data and it knows it has previously read the thing, it has not changed it, so why read it again, it will be the same right? Oops.. you have just optimized away fetching of input data.

This is normally easily fixed by telling the compiler that it does not control something, by adding "volatile" to the variable declarations.

But, what you mention there is about order of execution. So I might write:
...
a = 1;
...
b = 1;
...
And for whatever reason find the assignments are done in reverse order. Perhaps even if a and b are volatile. Perhaps a and b are outputs that need to be set in that order. Oops...

Have you ever seen such a thing happen? Do you have an example? Does the C standard even allow that?

I have sometimes glossed over the descriptions of "sequence points" in C, which I presume, perhaps wrongly, let you control this in standard syntax.

You need to do ugly stuff like described here: http://www.makelinux.net/books/lkd2/ch09lev1sec10
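
For what it is worth, the simplest of those precautions is a compiler-only barrier. The sketch below uses GCC's inline-asm syntax (an extension, not standard C) with a "memory" clobber, in the same spirit as the barrier() macro used in Linux kernel code; the names compiler_barrier and set_in_order are made up here. It only stops the compiler from moving memory accesses across that point. It emits no instruction, so it does nothing about reordering done by the processor itself.

```c
/* Compiler-only barrier (GCC/Clang extension, not standard C).
   The "memory" clobber tells the compiler that memory may have changed,
   so it must not move loads or stores across this point. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

int a, b;

/* Hypothetical helper: store to a, then to b, in source order. */
void set_in_order(void)
{
    a = 1;
    compiler_barrier();   /* the store to a is emitted before the store to b */
    b = 1;
}
```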

Heater. · 2014-04-27 11:39

So, this ordering business is not C's fault. The processors can do reordering of operations on the fly at execution time. If you want your C code to be portable you have to assume this will happen somewhere and take the precautions suggested in your link.

Except the article does say "Complicating these issues is the fact that both the compiler and the processor can reorder reads and writes". So even if you are targeting a processor that is known not to reorder, you still cannot be sure that the compiler does not reorder.

I'd really like to see a piece of C for the Propeller where this happens.

dnalor · 2014-04-27 12:18

Thank you for your detailed answer, Ross!

RossH wrote: »

[...]

Aha! What is happening is that your platform by default is including a serial HMI driver, which is using the same pins the debugger wants to use for its serial comms. You have found the correct answer, which is to either specify NO_HMI, or choose another type of HMI (e.g. VGA or TV). If you need to use a serial HMI, you can still use the debugger, but you need to tell the program to use a different set of pins for the debugger serial comms, and you have to have two USB connections to your platform (I have to do this on the HYDRA or HYBRID, both of which cannot use the standard serial port when XMM memory is in use - instead, I use the mouse port or a keyboard port, with a special cable described on page 23 of the Catalina Reference Manual).

And there is one of my problems. I can't easily see what is activated for my platform. For me it is a trial-and-error game and I feel rather stupid.
Now, after you wrote that a serial HMI driver is included, I searched for where I can see this. Ah, there is the Custom_HMI.inc, and there at the end, in the #else, there it is. Now I understand. But then what is the TTY checkbox in the build options for? I am confused.
Are these config/definition files really easier than a few includes/pragmas in the code? Or only the _DEF.inc plus the build options?

RossH wrote: »

There is a discussion on code size in the Catalina Reference Manual starting on page 118 ("A Note about Catalina Code Sizes"). However, it is mainly concerned with reducing code size, and it really only covers LMM and CMM modes - it does not cover all the various XMM modes.

A table of all possible combinations of even a simple example program compiled with all possible memory models, loader options, XMM configurations, HMI options, and optimizer levels would be pages long (on the C3, for example, there are 4 different ways just to use the same XMM memory) - but it is a good idea, so I will see if I can come up with a sensible subset to include in the next release.

You are right, there are maybe too many options to give total numbers. But maybe start with a table of the build options and their effect on included code/plugins, code size, and the changes/precautions needed in the config files.

None of this is meant as criticism. English is not my mother tongue, so maybe it is all because I do not understand the manuals well enough.

dnalor · 2014-04-27 12:32

RossH wrote: »

Actually if you look at raw (i.e. un-optimized) LCC output compared to un-optimized GCC output, LCC output is much cleaner, simpler and more efficient. I was amazed to find this - the LCC compiler is actually quite good at basic code generation - which helps explain why it is still used as the basis for many compilers - both commercial and free ones - even after all this time.

But although GCC is pretty ordinary at basic code generation, it makes up for it by having an astonishingly good optimizer, whereas I have had to write my own one for LCC. I would be the first to admit mine is nowhere near as good as the GCC one - but it is getting better all the time!

Ross.

Is there any file, where I can see a mixed view (C-Code-assembler) ?

David Betz · 2014-04-27 12:45

Heater. wrote: »

So, this ordering business is not C's fault. The processors can do reordering of operations on the fly at execution time. If you want your C code to be portable you have to assume this will happen somewhere and take the precautions suggested in your link.

Except the article does say "Complicating these issues is the fact that both the compiler and the processor can reorder reads and writes". So even if you are targeting a processor that is known not to reorder, you still cannot be sure that the compiler does not reorder.

I'd really like to see a piece of C for the Propeller where this happens.

I think there are hardware read and write barriers on some processors.

Heater. · 2014-04-27 13:03

David,

I think there are hardware read and write barriers on some processors.

I'm not au fait with how this all works. I just imagine that modern CPUs have long pipelines and multiple execution units and that somehow the processor stuffs instructions into those units in parallel where possible. It has to ensure that data dependencies are not broken. That still allows reads and writes to happen out of order if there are no apparent data dependencies.

Clearly, to fix that one would have to be able to tell the processor's internal instruction scheduler that there are dependencies it cannot work out for itself. Hence the barrier instructions: mfence, lfence, and sfence on x86, I believe.
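
As a rough illustration of asking for such a fence from C rather than assembler, here is a sketch using the C11 <stdatomic.h> interface. It assumes a C11 compiler, which the Propeller toolchains discussed in this thread may well not provide, and the names payload, ready, publish and try_consume are invented for the example. Which instructions (if any) each fence turns into is up to the compiler and the target.

```c
#include <stdatomic.h>

int payload;        /* ordinary data handed from writer to reader */
atomic_int ready;   /* flag that is set only after payload is written */

/* Writer side: store the data, then publish the flag. */
void publish(int value)
{
    payload = value;
    /* Release fence: the payload store may not be moved after the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out if data has been published, else 0. */
int try_consume(int *out)
{
    if (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        return 0;
    /* Acquire fence: the payload load may not be moved before the flag load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Unlike a compiler-only barrier, these fences constrain both the compiler and the processor.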

So far so good.

But there is still that nagging statement that the compiler can reorder things for its own weird optimization purposes.

I'm just curious if there is an example of that we can create somehow. Even on x86 would do, we can always look at the assembler output to see it happening.

Heater. · 2014-04-27 13:26

David,

Well google found it for me. The most simple code can induce GCC to reorder stuff:
http://preshing.com/20120625/memory-ordering-at-compile-time/

Which shows the following example:

int A, B;


void foo()
{
    A = B + 1;
    B = 0;
}

Compile that with -O0 and -O2 for Intel and you can clearly see it in the assembler output.

$ gcc -O0 -S -o reorder0.S reorder.c
$ gcc -O2 -S -o reorder2.S reorder.c

However propgcc is a bit harder to grok. This is the -O0 output:

        .comm   _A,4,4
        .comm   _B,4,4
        .text
        .balign 4
        .global _foo
_foo
        sub     sp, #4
        wrlong  r14, sp
        mov     r14, sp
        mvi     r7,#_B
        rdlong  r7, r7
        mov     r6, r7
        add     r6, #1
        mvi     r7,#_A
        wrlong  r6, r7
        mvi     r7,#_B
        mov     r6, #0
        wrlong  r6, r7
        mov     sp, r14
        rdlong  r14, sp
        add     sp, #4
        lret

And this is the -O2 output:


        .text
        .balign 4
        .global _foo
_foo
        mvi     r6,#_B
        mov     r5, #0
        rdlong  r7, r6
        add     r7, #1
        wrlong  r5, r6
        mvi     r6,#_A
        wrlong  r7, r6
        lret
        .comm   _B,4,4
        .comm   _A,4,4

Err, is that backwards or not?

Ross, what does Catalina do here?

David Betz · 2014-04-27 14:02

Heater. wrote: »
David,

Well google found it for me. The most simple code can induce GCC to reorder stuff:
http://preshing.com/20120625/memory-ordering-at-compile-time/

Which shows the following example:
int A, B;


void foo()
{
    A = B + 1;
    B = 0;
}
Compile that with -O0 and -O2 for Intel and you can clearly see it in the assembler output.
$ gcc -O0 -S -o reorder0.S reorder.c
$ gcc -O2 -S -o reorder2.S reorder.c
However propgcc is a bit harder to grok. This is the -O0 output:
        .comm   _A,4,4
        .comm   _B,4,4
        .text
        .balign 4
        .global _foo
_foo
        sub     sp, #4
        wrlong  r14, sp
        mov     r14, sp
        mvi     r7,#_B
        rdlong  r7, r7
        mov     r6, r7
        add     r6, #1
        mvi     r7,#_A
        wrlong  r6, r7
        mvi     r7,#_B
        mov     r6, #0
        wrlong  r6, r7
        mov     sp, r14
        rdlong  r14, sp
        add     sp, #4
        lret
And this is the -O2 output:


        .text
        .balign 4
        .global _foo
_foo
        mvi     r6,#_B
        mov     r5, #0
        rdlong  r7, r6
        add     r7, #1
        wrlong  r5, r6
        mvi     r6,#_A
        wrlong  r7, r6
        lret
        .comm   _B,4,4
        .comm   _A,4,4
Err, is that backwards or not?

Ross, what does Catalina do here?

I guess it's trying to reuse the _B address in r6. It would be interesting to see what was done on ARM or MIPS. The x86 isn't really a good comparison since it doesn't really have general registers.

Heater. · 2014-04-27 14:14

OK, I think I have stared at it long enough to convince myself it is actually writing A out last.

Now, if we make A and B volatile we get:

        .text
        .balign 4
        .global _foo
_foo
        mvi     r6,#_B
        mvi     r5,#_A
        rdlong  r7, r6
        add     r7, #1
        wrlong  r7, r5
        mov     r7, #0
        wrlong  r7, r6
        lret
        .comm   _B,4,4
        .comm   _A,4,4

Which looks like we are back in order again!

David Betz · 2014-04-27 14:25

Heater. wrote: »
OK, I think I have stared at it long enough to convince myself it is actually writing A out last.

Now, if we make A and B volatile we get:
        .text
        .balign 4
        .global _foo
_foo
        mvi     r6,#_B
        mvi     r5,#_A
        rdlong  r7, r6
        add     r7, #1
        wrlong  r7, r5
        mov     r7, #0
        wrlong  r7, r6
        lret
        .comm   _B,4,4
        .comm   _A,4,4
Which looks like we are back in order again!

That makes sense. In the absence of volatile, the compiler assumes there are no dependencies between A and B so it doesn't matter if they are written in the opposite order. The volatile keyword basically tells it not to make such assumptions.
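
The source for the volatile variant was not posted; presumably it was just the earlier example with the qualifier added, something like the sketch below. Worth noting: volatile only keeps the volatile accesses in order relative to one another. The compiler is still free to move ordinary, non-volatile accesses around them, and volatile says nothing about reordering done by the hardware.

```c
/* Assumed source for the volatile variant (not shown in the thread). */
volatile int A, B;

void foo(void)
{
    A = B + 1;   /* volatile read of B, then volatile write of A */
    B = 0;       /* volatile write of B; must stay after the write of A */
}
```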

RossH · 2014-04-27 19:59

David Betz wrote: »

You can, of course, turn off all optimization and then you get exactly what you wrote. However, as many have mentioned, the code is really *too* redundant to be of much use. There might not be an optimization option that gets rid of the redundant code but doesn't move things around. If that's the case then I guess GCC is not for you. Maybe Ross will say whether Catalina does any code reordering. If not, then it would be a good option. Luckily we have both choices on the Propeller.

Well, Catalina's optimizer does some re-ordering, if you consider "inlining" to be re-ordering.

And LCC does a small amount of re-ordering when it is constructing loops and such things - these can confuse people using the debugger for the first time - but I think these would all be considered "normal" and "expected" for any compiler/optimizer.

Ross.

RossH · 2014-04-27 20:09

dnalor wrote: »

Is there any file, where I can see a mixed view (C-Code-assembler) ?

No - you can see the PASM code output but the C source code is not included.

However, if you add -g to your compile command and look in the resulting .dbg file, you will see the address of each source line, which you can cross-reference to the listing (.lst file).

Ross.

RossH · 2014-04-27 20:15

Heater. wrote: »

Ross, what does Catalina do here?

Errr. Nothing - here is Catalina's output ...

C_foo ' <symbol:foo>
 jmp #LODL
 long @C_B_
 mov r22, RI
 rdlong r22, r22
 adds r22, #1
 jmp #LODL
 long @C_A_
 wrlong r22, RI
 mov r22, #0
 jmp #LODL
 long @C_B_
 wrlong r22, RI
 jmp #RETN

C_B_ ' <symbol:B>
 byte 0[4]
C_A_ ' <symbol:A>
 byte 0[4]

Ross.

jmg · 2014-04-27 20:33

RossH wrote: »

No - you can see the PASM code output but the C source code is not included.

Is it easy to improve that, so Source lines are inserted as comments ?
That helps a lot in both learning new systems, and in tracking possible bugs.

RossH · 2014-04-27 21:10

jmg wrote: »

Is it easy to improve that, so Source lines are inserted as comments ?
That helps a lot in both learning new systems, and in tracking possible bugs.

It's certainly possible, since LCC already identifies the source lines when debugging is enabled. But it doesn't make the source available to the back-end code generator (where the PASM is produced), so I don't think it is simple. I'll have a look when I get time.

Ross.

Heater. · 2014-04-27 22:28

dnalor,

Is there any file, where I can see a mixed view (C-Code-assembler) ?

It's a bit of a pain but GCC can do it. First compile with "-g" so as to include debugging symbols in the binary.
Then use the objdump command to get a listing. Like so:

$ propeller-elf-gcc -g -O0 -o reorder reorder.c
$ propeller-elf-objdump -S ./reorder > reorder.lst

Which produces something like this:

void foo()
{
 668:   0420fc84                        sub     40 <sp>, #4
 66c:   101c3c08                        wrlong  38 <r14>, 40 <sp> nr
 670:   101cbca0                        mov     38 <r14>, 40 <sp>
    A = B + 1;
 674:   37007c5c                        jmp     #dc <__LMM_MVI_r7> nr
 678:   180d0000                        nop
 67c:   070ebc08                        rdlong  1c <r7>, 1c <r7>
 680:   070cbca0                        mov     18 <r6>, 1c <r7>
 684:   010cfc80                        add     18 <r6>, #1
 688:   37007c5c                        jmp     #dc <__LMM_MVI_r7> nr
 68c:   1c0d0000                        nop
 690:   070c3c08                        wrlong  18 <r6>, 1c <r7> nr
    B = 0;
 694:   37007c5c                        jmp     #dc <__LMM_MVI_r7> nr
 698:   180d0000                        nop
 69c:   000cfca0                        mov     18 <r6>, #0
 6a0:   070c3c08                        wrlong  18 <r6>, 1c <r7> nr
}
 6a4:   0e20bca0                        mov     40 <sp>, 38 <r14>
 6a8:   101cbc08                        rdlong  38 <r14>, 40 <sp>
 6ac:   0420fc80                        add     40 <sp>, #4
 6b0:   0f22bca0                        mov     44 <pc>, 3c <lr>

The problem is that this is not the code you want to run: it is not optimized and far too big.
Turning optimization on makes a mess of the listing output.

Heater. · 2014-04-27 22:36

RossH,

Catalina's optimizer does some re-ordering, if you consider "inlining" to be re-ordering.

There is reordering of instruction layout in memory, as in inlining, and there is reordering of the actual run-time execution sequence. We have been discussing the latter. In the example the source says "write A then write B", the compiled code does the opposite.

This is serious because if variables are written out in reverse order at run time your program is not behaving as you might naively expect. I was surprised how easily GCC can be made to do such temporal reordering.

All is made well again by the use of "volatile". I hope...!

RossH · 2014-04-27 23:50

Heater. wrote: »

RossH,

There is reordering of instruction layout in memory, as in inlining, and there is reordering of the actual run-time execution sequence. We have been discussing the latter. In the example the source says "write A then write B", the compiled code does the opposite.

This is serious because if variables are written out in reverse order at run time your program is not behaving as you might naively expect. I was surprised how easily GCC can be made to do such temporal reordering.

All is made well again by the use of "volatile". I hope...!

I don't know a lot about GCC these days, but GCC used to be quite well known for overly-optimistic optimization. Presumably it is better these days, but it still seems to be very aggressive.

Catalina's optimization is far less aggressive, but so far I have not seen an instance of it doing the wrong thing.

Ross.

RossH · 2014-04-27 23:54

Heater. wrote: »

It's a bit of a pain but GCC can do it. First compile with "-g" so as to include debugging symbols in the binary.
Then use the objdump command to get a listing.

Yes, I considered doing it this way for Catalina - essentially providing a "list" utility that used the debugging information to identify the source lines, and then interspersed them in the listing. But it would be better if they just naturally appeared in the normal listing output.

Ross.

Heater. · 2014-04-28 00:49

RossH,

...used to be quite well known for overly-optimistic optimization.

Gosh, when was that? I have been using GCC since 1998 and I don't recall any time our project's code failed due to optimizer bugs. I have managed to crash the compiler a few times though. Not recently. That was mostly using very old GCC versions supplied with Wind River's VxWorks RTOS. One of the many reasons we switched to Linux for all our embedded systems.

dnalor · 2014-04-28 01:19

Heater. wrote: »

RossH,

Gosh, when was that? I have been using GCC since 1998 and I don't recall any time our project's code failed due to optimizer bugs. [...]

Then you never used the Microchip MPLAB C Compiler, which is a port of gcc!

RossH · 2014-04-28 01:39

Heater. wrote: »

RossH,

Gosh, when was that? I have been using GCC since 1998 and I don't recall any time our project's code failed due to optimizer bugs. I have managed to crash the compiler a few times though. Not recently. That was mostly using very old GCC versions supplied with Wind River's VxWorks RTOS. One of the many reasons we switched to Linux for all our embedded systems.

You suddenly make me feel old! I began using GCC in the early eighties, when we eventually had to give up the GCC compiler altogether because of all its problems. But I used it again later (with much more success) indirectly via GNAT.

Like any complex software, of course the GNU optimizer has bugs in it! A quick google search for "gcc optimizer bugs" still shows many instances even in recent versions. Yes, some of them just crash the compiler, but others cause the generation of incorrect code.

But in the case of an optimizer (which is, after all, usually optional; before the Propeller, I had never worked on a program where memory was so tight you absolutely had to use an optimizer), I suspect many people just do as we always did and simply turn off, or at least reduce, the level of optimization until it works again. You might try turning it on again later (e.g. on the next release of the compiler) or you might not.

Ross.

RossH · 2014-04-28 01:39

dnalor wrote: »

Then you never used the Microchip MPLAB C Compiler, which is a port of gcc!

Or the AVR toolchain!

Heater. · 2014-04-28 02:03

dnalor,

Then you never used the Microchip MPLAB C Compiler, which is a port of gcc!

That is true. Microchip is not a supported architecture of the GCC project. Or is it? Who knows what Microchip did with the compiler.

I did try GCC for the AVR briefly. I gave up on them, not the compiler's fault; I discovered the Propeller.

Heater. · 2014-04-28 02:11

RossH,

You suddenly make me feel old! I began using GCC in the early eighties,

Yes, your memory is fading:)

The good news is you are not as old as you think. GCC was not released until 1987. The first "stable" release was in 1991. GCC was not adopted by BSD until 1994.

I was amazed to find it in serious use inside Nokia in 1998. But then, at that time I had only just recently become aware of Linux and the world of Free Software.

Sounds like you were among the brave pioneering early adopters though.

RossH · 2014-04-28 02:50

Heater. wrote: »

RossH,

Yes, your memory is fading:)

Are you sure? (about the date I mean!) I guess it may have been late eighties. I certainly remember GCC 1.x (not sure what "x" was!) But I do remember we had to use multiple C++ compilers to compile the project, because C++ was so new that no single compiler could yet compile the whole language successfully, so we had to use different compilers for different components. What a mess that was! In the end, the Sun C++ compiler won out, because it was the first one that managed to compile the whole project.

Ross.

Heater. · 2014-04-28 03:06

I am hardly sure of anything any more. But in Wikipedia I trust:)

Using C++ in the 1980s was just foolhardy:)

A new and impossible to implement language AND a new compiler. Definitely treading dodgy ground.

Mind you I still think that about C++ today.

RossH · 2014-04-28 03:58

Heater. wrote: »

Using C++ in the 1980s was just foolhardy:)

A new and impossible to implement language AND a new compiler. Definitely treading dodgy ground.

Mind you I still think that about C++ today.

Well, at least we agree on that!

Heater. · 2014-04-28 04:32

C++ is great. I love it.

Did I say "dodgy ground"? It's like hiking out on the pack ice, full of the excitement that at any moment a crack will open up and swallow you whole and all your huskies. Or perhaps parachuting without a reserve. Or is it that experience of building a beautiful arch and then realizing that if you want to change any small part it's all going to collapse? Or just painting yourself into a very small corner of a very large room, when you finally realize your class hierarchy does not model the actual problem you have.

Apart from all that C++ is fine. It redeems itself by the fact that one can write things like JavaScript engines in it.
