PropBASIC Questions:

davidsaunders · 2015-04-19 20:55

jmg wrote: »

True but FPC is a 2015 active language reference, and you should be able to find enough common subset ground.

I'd be wary of creating too many Basics, as that is actually one of the drawbacks....
which is why Pascal makes more sense, than another Basic variant.

Better to have common code generation and libraries, and syntax-options if you really need them.

Does make sense, besides Pascal is easier to write a compiler for than BASIC (I have done both, yuck).

ersmith · 2015-04-20 05:09

davidsaunders wrote: »

I am well aware of the optimizations that GCC, and most others do. Not much use for COG code (optimizing for size can be done fairly easily inline in a recursive descent compiler), maybe a bit of use for LMM, CMM, and XMM code, though that is another story.

I think you're seriously underestimating the amount of optimization a modern compiler like GCC or LLVM can do. Here's a simple array initialization (maybe for an array of digits):

void fillarray( unsigned a[] )
{
    int limit = 10;
    int i;

    for (i = 0; i < limit; i++) {
        a[i] = limit * i;
    }
}

Here's what PropGCC produces (with -mcog -O2):

        .global _fillarray
_fillarray
        mov     r7, #0
        mov     r6, #10
        '' loop_start register r6 level #1
.L2
        wrlong  r7, r0
        add     r7, #10
        add     r0, #4
        djnz    r6,#.L2
        jmp     lr

Note that it has figured out that "limit" is constant, converted array accesses to pointer references, converted the multiplication to a sum, and re-arranged the loop so it can use djnz. This is all pretty useful regardless of what memory model you're in.
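To make those four transformations concrete, here is a rough C sketch of what the optimized loop amounts to if written by hand. The function names `fillarray_naive` and `fillarray_reduced` are made up for illustration, and the register comments are only a reading of the listing above, not anything GCC emits.

```c
/* Original form: index arithmetic plus a multiply on every iteration. */
void fillarray_naive(unsigned a[])
{
    int limit = 10;
    int i;

    for (i = 0; i < limit; i++) {
        a[i] = limit * i;
    }
}

/* Hand-transformed form (illustrative only): constant propagation of
   limit, pointer walking instead of indexing, a running sum instead of
   a multiply, and a down-counting loop that maps onto djnz. */
void fillarray_reduced(unsigned a[])
{
    unsigned value = 0;      /* plays the role of r7 */
    int count = 10;          /* plays the role of r6 */
    unsigned *p = a;         /* plays the role of r0 */

    do {
        *p++ = value;        /* wrlong r7, r0 then add r0, #4 */
        value += 10;         /* add r7, #10 */
    } while (--count != 0);  /* djnz r6, #.L2 */
}
```

Both functions should fill a 10-element array with 0, 10, 20, ... 90.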

Or consider the good old factorial function, defined in the natural recursive way:

int factorial(int x)
{
    if (x <=  0) {
        return 1;
    }
    return x * factorial(x-1);
}

GCC can convert that to a loop:

        .global _factorial
_factorial
        mov     r7, r0
        cmps    r7, #0 wz,wc
        mov     r0, #1
        IF_BE   jmp     #.L8
        mov     r6, r7
        '' loop_start register r6 level #1
.L7
        mov     r1, r7
        sub     r7, #1
        call    #__MULSI
        djnz    r6,#.L7
        jmp     lr
.L8
        jmp     lr

Not quite optimal (the return could have been direct) but a heck of a lot more efficient than the recursive version.
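For comparison, the loop in that listing corresponds roughly to the following hand-written C. This is only a sketch of the equivalent source; `factorial_loop` is a hypothetical name and the recursive version is repeated so the two can be checked against each other.

```c
/* Recursive definition, as in the example above. */
int factorial(int x)
{
    if (x <= 0) {
        return 1;
    }
    return x * factorial(x - 1);
}

/* Iterative equivalent: multiply an accumulator by x, x-1, ... 1.
   No stack frames are needed, so it also suits a cog with little memory. */
int factorial_loop(int x)
{
    int result = 1;

    while (x > 0) {
        result *= x;   /* call #__MULSI in the listing */
        x--;           /* sub r7, #1 */
    }
    return result;
}
```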

Or, given the definition of factorial above, add another function fact5:

int fact5(void) { return factorial(5); }

GCC compiles that to:

_fact5
        mov     r0, #120
        jmp     lr

which is some pretty impressive constant folding.

davidsaunders · 2015-04-20 05:50

ersmith wrote: »
I think you're seriously underestimating the amount of optimization a modern compiler like GCC or LLVM can do. Here's a simple array initialization (maybe for an array of digits):
void fillarray( unsigned a[] )
{
    int limit = 10;
    int i;

    for (i = 0; i < limit; i++) {
        a[i] = limit * i;
    }
}
Here's what PropGCC produces (with -mcog -O2):
        .global _fillarray
_fillarray
        mov     r7, #0
        mov     r6, #10
        '' loop_start register r6 level #1
.L2
        wrlong  r7, r0
        add     r7, #10
        add     r0, #4
        djnz    r6,#.L2
        jmp     lr
Note that it has figured out that "limit" is constant, converted array accesses to pointer references, converted the multiplication to a sum, and re-arranged the loop so it can use djnz. This is all pretty useful regardless of what memory model you're in.

Or consider the good old factorial function, defined in the natural recursive way:
int factorial(int x)
{
    if (x <=  0) {
        return 1;
    }
    return x * factorial(x-1);
}
GCC can convert that to a loop:
        .global _factorial
_factorial
        mov     r7, r0
        cmps    r7, #0 wz,wc
        mov     r0, #1
        IF_BE   jmp     #.L8
        mov     r6, r7
        '' loop_start register r6 level #1
.L7
        mov     r1, r7
        sub     r7, #1
        call    #__MULSI
        djnz    r6,#.L7
        jmp     lr
.L8
        jmp     lr
Not quite optimal (the return could have been direct) but a heck of a lot more efficient than the recursive version.

Or, given the definition of factorial above, add another function fact5:
int fact5(void) { return factorial(5); }
GCC compiles that to:
_fact5
        mov     r0, #120
        jmp     lr
which is some pretty impressive constant folding.

There is no question that when you misuse a variable as a constant, GCC can do some pretty impressive optimization.

I ran your first example through the very first C compiler I ever wrote, and the result was:

fillarray:
      MOV AX, 0
      MOV CX, 10
.Lp:  MOV [DI], AX
      MUL AX,10
      INC DI            ;Op requires 80186 or newer.
      DEC CX
      JNE .Lp
      RET

And that is with a simple recursive descent compiler that does some simple pre-ordering of expressions and some simple optimization during compile, with a very simple peephole post-compile optimizer.

Now, as for the extreme folding shown in your last example: that is great as long as no variable is volatile (the common case on the Propeller is that most variables are accessed by more than one cog unless they are local).
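A small C sketch of that distinction, assuming a hypothetical variable `shared_n` that another cog might change: a literal argument can be folded at compile time, while a `volatile` object has to be read each time the function runs.

```c
int factorial(int x)
{
    if (x <= 0) {
        return 1;
    }
    return x * factorial(x - 1);
}

/* Hypothetical shared variable; volatile tells the compiler that its
   value may change outside this code, so reads cannot be folded away. */
volatile int shared_n = 5;

/* The argument is a literal, so the compiler is free to fold the
   whole call down to a constant. */
int fact_const(void)
{
    return factorial(5);
}

/* The argument is a volatile read, so the value must be loaded and
   the factorial computed at run time. */
int fact_shared(void)
{
    return factorial(shared_n);
}
```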

davidsaunders · 2015-04-20 05:52

There is no question that GCC can do many optimizations that a simple compiler can not, that is a given. It is just a matter of how useful those optimizations are for the target.

Bean · 2015-04-20 06:29

Wow, those optimizations are really impressive.

PropBASIC was designed to teach PASM. You could write code in BASIC and then see what was needed in PASM to do the same thing.
It is a single pass compiler and compiles "a line at a time" so multi-line optimization is not possible. Well not without a major rewrite.

Bean

ersmith · 2015-04-20 06:35

davidsaunders wrote: »
I ran your first example through the very first C compiler I ever wrote, and the result was:
fillarray:
      MOV AX, 0
      MOV CX, 10
.Lp:  MOV [DI], AX
      MUL AX,10
      INC DI            ;Op requires 80186 or newer.
      DEC CX
      JNE .Lp
      RET

That's pretty good... but where does AX get updated? It looks like it's setting all the array entries to 0, instead of to 10*i.

The optimization that changes the multiply into adds is extremely useful for the Propeller, since there's no hardware multiply. There are a ton of other useful optimizations too (like using conditional instructions instead of jumps in if/then/else).
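To show why avoiding a multiply matters, here is a minimal sketch of a software shift-and-add multiply of the kind a chip without a hardware multiplier needs. It is not the actual `__MULSI` routine from PropGCC; `mul_shift_add` is a hypothetical name and the code only illustrates the general technique.

```c
#include <stdint.h>

/* Shift-and-add multiply: one loop iteration per significant bit of b.
   The result wraps modulo 2^32, like ordinary unsigned multiplication. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t product = 0;

    while (b != 0) {
        if (b & 1u) {
            product += a;   /* add the shifted multiplicand for each set bit */
        }
        a <<= 1;
        b >>= 1;
    }
    return product;
}
```

A loop like this costs several instructions per bit, which is why replacing `limit * i` with a single `add` per iteration is such a large saving.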

Having said that, the more variety we have in compilers for the Propeller the better, so we certainly look forward to seeing yours!

davidsaunders · 2015-04-20 09:26

ersmith wrote: »

That's pretty good... but where does AX get updated? It looks like it's setting all the array entries to 0, instead of to 10*i.

The optimization that changes the multiply into adds is extremely useful for the Propeller, since there's no hardware multiply. There are a ton of other useful optimizations too (like using conditional instructions instead of jumps in if/then/else).

Having said that, the more variety we have in compilers for the Propeller the better, so we certainly look forward to seeing yours!

OK, so there is a bug in -O6 level optimization that I had not noticed in my compiler, I will have to fix that.

Thank you for the catch.

davidsaunders · 2015-04-20 09:36

Here is the output with -O5 level optimization:

fillarray:
      PUSHA
      MOV AX, 0
      MOV CX, 10
.Lp:  MOV [DI], AX
      MOV AX, CX
      MUL AX,10
      INC DI            ;Op requires 80186 or newer.
      DEC CX
      JNE .Lp
      POPA
      RET

davidsaunders · 2015-04-20 09:41

OK, I normally only use up to O4, and it has been over 15 years since I did anything with that particular compiler. I see the error in the O5 output: the values come out backwards.

davidsaunders · 2015-04-20 10:00

Now that I realize that there are some extreme errors in the example of my very first C compiler, I am showing something from a more recent C compiler.

As a better example, here is the output from one of my more recent C compilers (the second 32-bit x86 one I did):

NUL_fillarray_UINT_PTR:
         XOR EAX,EAX
         MOV ECX, 0Ah
         ADD EDI, ECX
         SHL ECX, 02h
.L000:   MOV [EDI], EAX
         MOV EAX, ECX
         MUL EAX, 0Ah
         DEC EDI
         SUB ECX, 04h
         JNE .L000
         RET

davidsaunders · 2015-04-20 10:08

I never did get a compiler to realize that multiplying eax by 10 is usually better done as:

   mov ebx, eax
   shl eax,1
   shl ebx, 3
   add eax,ebx

At least as far as speed goes, even with the extreme pipelining of today, mul is slower than an r2r mov, two shl's, and an add.
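The same decomposition in C, as a sketch: 10*x = 2*x + 8*x, so one shift by 1, one shift by 3, and an add. The name `times10_shift` is made up for illustration, and nothing here measures which form is faster on a given CPU.

```c
#include <stdint.h>

/* Multiply by 10 without a multiply instruction: (x * 2) + (x * 8). */
uint32_t times10_shift(uint32_t x)
{
    return (x << 1) + (x << 3);
}
```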

jmg · 2015-04-20 14:03

Bean wrote: »

PropBASIC was designed to teach PASM. You could write code in BASIC and then see what was needed in PASM to do the same thing.
It is a single pass compiler and compiles "a line at a time" so multi-line optimization is not possible. Well not without a major rewrite.

It still generates useful code, and the good thing about any compiler built to teach PASM, is it will output well commented PASM, and can also in-line if someone really decided they need more compact code.
The ideal is an editor paste-able cooperation between PASM list files and source code.

GCC optimises well, but for the Prop it suffers from squeezing all code thru a historic 16 register model.
That's the main reason why PropBASIC code is smaller and faster than GCC.

David Betz · 2015-04-20 14:08

jmg wrote: »

It still generates useful code, and the good thing about any compiler built to teach PASM, is it will output well commented PASM, and can also in-line if someone really decided they need more compact code.
The ideal is an editor paste-able cooperation between PASM list files and source code.

GCC optimises well, but for the Prop it suffers from squeezing all code thru a historic 16 register model.
That's the main reason why PropBASIC code is smaller and faster than GCC.

From some of the code that Eric posted, it looks like propgcc can use variables that are declared as _COGMEM directly rather than loading them into one of the 16 pseudo-registers first.

jmg · 2015-04-20 14:17

David Betz wrote: »

From some of the code that Eric posted, it looks like propgcc can use variables that are declared as _COGMEM directly rather than loading them into one of the 16 pseudo-registers first.

Does that mean the ported FreqCtr example in another thread, can become smaller ?

ersmith · 2015-04-20 16:41

Bean wrote: »

PropBASIC was designed to teach PASM. You could write code in BASIC and then see what was needed in PASM to do the same thing.

And it does that very well. Actually PropBASIC produces pretty darn good PASM code, and its output is quite readable (which is not something GCC is good at, unfortunately).

There are many goals for compilers, and many niches that need filling. I don't think any language (or compiler) can possibly solve all problems, and more choice is always good.

davidsaunders · 2015-04-20 21:24

Yes, PropBASIC does quite a good job of outputting good, human-readable code in the same order as the input, and it is a very good language.

________________________________
As to the question of languages:
If you were to ask me what I would consider to be the ideal completely general purpose programming language, I would say that such a thing does not exist, and likely never will.

On the other hand if you asked me what the ideal language is for a given purpose, I would likely give a list of the few that I am aware of that are best suited to the purpose.

For example, to write an OS kernel using a High level language, I would recommend C, Pascal, or maybe B.

To write a hypertext parser, I would recommend, BCPL, Pascal, Oberon, or Smalltalk.

So what language we use depends very much on what we are attempting to do. A big part of the reason I am writing the 3D printer software in 4 different languages is that I am not sure which is best suited.

__________________
I am working up a EBNF for an implementation of Object Pascal for the Propeller (COG Mode only), and I will go with minimal optimization.

For many applications Object Pascal has advantages over other languages (though as always it depends on the application), as it supports data and procedure pointers, subrange and set types, structured types, strong typing, nesting procedures in procedures, local variables, modular programming, etc.

The advantages over C and C++, for some applications, are that Pascal supports subrange and set data types, as well as supporting nested procedures to any level (nested procedures could be a big advantage on the Propeller).

Now there are obviously disadvantages in relation to C and C++, such as the lack of support for local-to-module globals, and the lack of support for overriding operators (yes, I know that some Pascal implementations add that one).

So I do believe that Pascal will be a well used addition to the Propeller.

______________________
It is too bad that we do not have a complete list of programming languages for the Propeller anywhere. Having such a list would make it easier for programmers to figure out what language best suits their various projects.

abecedarian · 2015-04-20 21:52

davidsaunders wrote: »

...
It is too bad that we do not have a complete list of programming languages for the Propeller anywhere. Having such a list would make it easier for programmers to figure out what language best suits their various projects.

http://forums.parallax.com/showthread.php/113091-ULTIMATE-List-of-Propeller-Languages ?
http://humanoidolabs.blogspot.tw/2012/03/ultimate-list-of-big-brain-languages.html ?

jmg · 2015-04-20 23:11

davidsaunders wrote: »

I am working up a EBNF for an implementation of Object Pascal for the Propeller (COG Mode only), and I will go with minimal optimization.

For many applications Object Pascal has advantages over other languages (though as always it depends on the application), as it supports data and procedure pointers, subrange and set types, structured types, strong typing, nesting procedures in procedures, local variables, modular programming, etc.

Sounds good.
It also has a native boolean type, which can map to Pins, and can pack flags into words, saving valuable space.
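As a rough illustration of the space saving, here is the same idea written out by hand in C: several boolean flags share one 32-bit word instead of taking a long each. The flag names and helper functions are hypothetical; a Pascal compiler with packed booleans or set types would generate this kind of masking for you.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, one bit each, all stored in a single word. */
enum {
    FLAG_READY = 1u << 0,
    FLAG_ERROR = 1u << 1,
    FLAG_BUSY  = 1u << 2
};

/* Return the word with the given flag set or cleared. */
static inline uint32_t flag_set(uint32_t word, uint32_t mask, bool on)
{
    return on ? (word | mask) : (word & ~mask);
}

/* Test whether the given flag is set in the word. */
static inline bool flag_get(uint32_t word, uint32_t mask)
{
    return (word & mask) != 0;
}
```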

I presume this will be Conditional-Compile compatible with FreePascal ?
- and include the clean in-line ASM, with the generated ASM code designed to drop into the source easily ?
Example of fpc asm

  //This is required with Lazarus on x86:
  {$ASMMODE intel}
  asm
    MOV EAX, num
    ADD EAX, 110B //add binary 110
    SUB EAX, 2    //subtract decimal 2
    MOV answer, EAX
  end;

Will you also generate a Symbol-file for Source level Debug support ?

Targeting an assembler with Macros and Conditional assembly would also improve things. I think gas does that ?

davidsaunders · 2015-04-21 05:34

jmg wrote: »
Sounds good.
It also has a native boolean type, which can map to Pins, and can pack flags into words, saving valuable space.

I presume this will be Conditional-Compile compatible with FreePascal ?
- and include the clean in-line ASM, with the generated ASM code designed to drop into the source easily ?
Example of fpc asm
  //This is required with Lazarus on x86:
  {$ASMMODE intel}
  asm
    MOV EAX, num
    ADD EAX, 110B //add binary 110
    SUB EAX, 2    //subtract decimal 2
    MOV answer, EAX
  end;
Will you also generate a Symbol-file for Source level Debug support ?

Targeting an assembler with Macros and Conditional assembly would also improve things. I think gas does that ?

Yes I do intend to implement inline assembly. And as with any compiler there will be an option for generating a symbol file as part of the output (though I am not sure how you are going to do source level debugging on the prop).

As for conditional compilation, I will do something (though this is the territory of a pre-processor, not a compiler).

I have to decide on the form of some things yet, though I am still finishing up the 3D printer stuff.

jmg · 2015-04-21 13:13

davidsaunders wrote: »

Yes I do intend to implement inline assembly. And as with any compiler there will be an option for generating a symbol file as part of the output (though I am not sure how you are going to do source level debugging on the prop).

See some other threads
http://forums.parallax.com/showthread.php/160767-PropGCC-COG-debug-protocol
http://forums.parallax.com/showthread.php/160259-GDB-Baud-rate
http://forums.parallax.com/showthread.php/147104-GDB-over-serial-ready-or-coming
+ 1 other I cannot remember about recent GDB work....

Ahh, here it is, the unrelated title confused my search...
http://forums.parallax.com/showthread.php/160703-Propeller-assembly-in-a-cog-interfaced-to-either-Catalina-or-propeller-gcc.?p=1327187

Source level debug on Prop is closer than you think

- Lazarus has a remote debug feature, and someone has GDB sort of working on the Prop with a GDB helper in the GCC kernel. So there are a lot of building blocks about, just not quite joining all the dots.
In the very simplest form, something that reports the PC from the Prop, then needs the symbol table/File:Line# tags on the host, plus a means to highlight a line on a source file.

Of course, such a system does not really know (or care) what source is in the window, so it can work with any language that creates a compatible symbol file.
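The host-side half of that simplest form could look something like the sketch below: given a PC reported by the Prop, find the source line whose code contains it. The `LineEntry` layout and the `lookup_line` function are assumptions made for illustration; they do not describe the symbol file format of any existing Propeller tool.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table record: the first code address generated
   for one source line. */
typedef struct {
    uint32_t    addr;
    const char *file;
    int         line;
} LineEntry;

/* Binary search for the entry with the greatest addr <= pc.
   The table must be sorted by ascending addr. Returns NULL when the
   table is empty or pc lies before the first entry. */
const LineEntry *lookup_line(const LineEntry *table, size_t count, uint32_t pc)
{
    size_t lo = 0;
    size_t hi = count;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].addr <= pc) {
            lo = mid + 1;   /* candidate found, look further right */
        } else {
            hi = mid;
        }
    }
    return (lo == 0) ? NULL : &table[lo - 1];
}
```

The host would then open the returned file and highlight the returned line, which is all the user sees of the exchange.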

davidsaunders · 2015-04-21 13:40

jmg wrote: »

See some other threads
http://forums.parallax.com/showthread.php/160767-PropGCC-COG-debug-protocol
http://forums.parallax.com/showthread.php/160259-GDB-Baud-rate
http://forums.parallax.com/showthread.php/147104-GDB-over-serial-ready-or-coming
+ 1 other I cannot remember about recent GDB work....

Ahh, here it is, the unrelated title confused my search...
http://forums.parallax.com/showthread.php/160703-Propeller-assembly-in-a-cog-interfaced-to-either-Catalina-or-propeller-gcc.?p=1327187

Source level debug on Prop is closer than you think

- Lazarus has a remote debug feature, and someone has GDB sort of working on the Prop with a GDB helper in the GCC kernel. So there are a lot of building blocks about, just not quite joining all the dots.
In the very simplest form, something that reports the PC from the Prop, then needs the symbol table/File:Line# tags on the host, plus a means to highlight a line on a source file.

Of course, such a system does not really know (or care) what source is in the window, so it can work with any language that creates a compatible symbol file.

That would be a binary debugger, with symbol keys, not a source level debugger. That is fairly simple to do. I had thought that true source level debugging was wanted.

jmg · 2015-04-21 13:58

davidsaunders wrote: »

That would be a binary debugger, with symbol keys, not a source level debugger. That is fairly simple to do.

That's good to hear.

davidsaunders wrote: »

I had thought that true source level debugging was wanted.

I'm not following your semantics ?
- All debuggers I know, run binary in the target, and report binary PC information back to the host,
The Host then correlates that binary address, by finding which source file and line it belongs to, using the symbol file(s). ( - all of this hopefully done with smallest overhead in the target, and very fast... )

To the user's eyes, all that background binary activity appears as Source Level Debug.
They see a source line highlighted.

davidsaunders · 2015-04-21 14:45

jmg wrote: »

That's good to hear.

I'm not following your semantics ?
- All debuggers I know, run binary in the target, and report binary PC information back to the host,
The Host then correlates that binary address, by finding which source file and line it belongs to, using the symbol file(s). ( - all of this hopefully done with smallest overhead in the target, and very fast... )

To the user's eyes, all that background binary activity appears as Source Level Debug.
They see a source line highlighted.

I understand that. Though many languages do have real source level debugging (by using a specialized interpreter, with a means of running any asm inline as native), and there is an effective difference (in the way breakpoints are done, line stepping, etc).

Just because the binary debugger shows you the source, does not mean that it is the same as a source level debugger.

jmg · 2015-04-21 14:59

davidsaunders wrote: »

Just because the binary debugger shows you the source, does not mean that it is the same as a source level debugger.

Someone using it does not care about such semantics : if they see their source code, to them it is Source level debug.

davidsaunders wrote: »

I understand that. Though many languages do have real source level debugging (by using a specialized interpreter, with a means of running any asm inline as native), and there is an effective difference (in the way breakpoints are done, line stepping, etc).

Do you have any links, I'm still not following the point of difference you are trying to make.

You seem to be describing a simulator where you mention "a specialized interpreter, with a means of running any asm inline as native" ?
Simulators can also be very useful, but they have drawbacks at real-time interactions, and are never quite the same as real silicon.
Often both are available on a MCU, and then the designer can choose which tool they use.
eg I like Simulators for design optimisation, cycle counting, and initial verification of the code framework.
Then, for real-world interaction, a Binary Debug on a Target Device is used - with Source level display on the host.

In other cases I will run a function in a test framework on a different host, which is why having Conditional Compile support that matches that host compiler matters.
