PropGCC for P2 - Page 3 — Parallax Forums

  • Quick update:

    Today I got to the stage where gcc/g++ completes a build (not libgcc quite yet) and outputs something resembling usable assembly for P2 in COG execution mode. I'm missing some bits on the GAS/LD side so the next stage will be kind of a back-and-forth verifying that the binaries get linked correctly.

    HUBEXEC will come later once COG mode is working. Also I need to swap the P1 emulated mul/div/etc. routines for the shiny new hardware opcodes.
  • cgracey Posts: 14,133
    edited 2019-09-01 04:47
    Cool! Sounds good.
  • ntosme2 wrote: »
    Quick update:

    Today I got to the stage where gcc/g++ completes a build (not libgcc quite yet) and outputs something resembling usable assembly for P2 in COG execution mode. I'm missing some bits on the GAS/LD side so the next stage will be kind of a back-and-forth verifying that the binaries get linked correctly.

    HUBEXEC will come later once COG mode is working. Also I need to swap the P1 emulated mul/div/etc. routines for the shiny new hardware opcodes.

    Fantastic! Congrats!
  • ntosme2 wrote: »
    Quick update:

    Today I got to the stage where gcc/g++ completes a build (not libgcc quite yet) and outputs something resembling usable assembly for P2 in COG execution mode. I'm missing some bits on the GAS/LD side so the next stage will be kind of a back-and-forth verifying that the binaries get linked correctly.

    HUBEXEC will come later once COG mode is working. Also I need to swap the P1 emulated mul/div/etc. routines for the shiny new hardware opcodes.
    Wow! That’s great progress. Congratulations!
  • Good work, Brian. I'm looking forward to seeing more progress on this. Can't wait for PropGCC on the P2 :) I've been waiting over 2 years for this. Thanks for your efforts.
  • Your efforts are sounding really good so far @ntosme2. Getting a full P2 native GCC toolchain working particularly including Hubexec would be a wonderful achievement and could be incredibly useful for a whole lot of people in the end.

    If you can get it running with the more recent GCC, I for one would very much look forward to comparing its performance against what was achieved with the p2gcc toolchain on something like the Micropython C source. I was hoping for some further gains there, especially with P2 native optimizations on things like prologs/epilogs and local stack frame variable access using the P2 HW pointers where possible. A P2 port could also make more CPU registers available for the compiler's register optimizations than the P1 implementation, which only works with a small set of registers because of the larger hub transfer overhead of preserving/restoring a subset of them in every prolog/epilog. For the P2, each additional register long saved or restored now only adds a single clock if you use setq transfer bursts, instead of 16 clocks per register long for the P1. So I imagine increasing the register range from r0-r14 (plus LR) on P1 to r0-r30 (plus LR) could make good sense for a P2 kernel COG implementation, giving the compiler more registers to play with during its optimization. Something to consider once you hopefully get there... Great stuff, keep it up.
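
To put rough numbers on the register-save argument above, here is a back-of-the-envelope sketch in C. It only models the marginal per-register figures quoted in the post (16 clocks per long on P1, 1 clock per additional long in a P2 setq burst). The function name is made up for illustration, and fixed setup costs and hub timing details are deliberately ignored.

```c
#include <assert.h>

/* Extra clocks per prolog (or epilog) when the saved-register set grows
   from `from_regs` to `to_regs` longs, at `per_long` clocks each.
   The per-long figures are the ones quoted in the post, not measurements. */
static unsigned extra_clocks(unsigned from_regs, unsigned to_regs,
                             unsigned per_long)
{
    assert(to_regs >= from_regs);
    return (to_regs - from_regs) * per_long;
}
```

Under those assumptions, growing from 15 registers (r0-r14) to 31 (r0-r30) would add `extra_clocks(15, 31, 16)` = 256 clocks at the P1 rate but only `extra_clocks(15, 31, 1)` = 16 clocks at the P2 burst rate.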
  • ntosme2 wrote: »
    Quick update:

    Today I got to the stage where gcc/g++ completes a build (not libgcc quite yet) and outputs something resembling usable assembly for P2 in COG execution mode. I'm missing some bits on the GAS/LD side so the next stage will be kind of a back-and-forth verifying that the binaries get linked correctly.

    HUBEXEC will come later once COG mode is working. Also I need to swap the P1 emulated mul/div/etc. routines for the shiny new hardware opcodes.
    How did you do this? Did you make use of any of the existing PropGCC code or did you do it from scratch? I'm impressed that you got GAS assembling P2 code so quickly. Thanks very much for your efforts. I'm sure Parallax will be thrilled!

    Is this work in a GitHub repository? Can any of us check it out and try it?

  • ntosme2 Posts: 38
    edited 2019-09-03 04:29
    It may have been premature to say GAS was "working"! I started with the moxie target and pretty much rewrote it all in stages. Condition codes get set correctly and a useful subset of the instruction types have the opcode/argument/flag bit stuffing implemented. There are a crazy number of different bit patterns for P2! I'm also missing some details involving expression/label handling and probably other things. I diverted my attention to GCC because GAS and LD were able to build and run and output something on valid input, and throw errors on invalid input.

    Also, I currently have the conditional execution flags mixed in with the {wc, wz, wcz, etc.} flags (after the operands) because GAS pre-processes each line and it was dropping leading characters for some reason.

    For example:
    cmps r7, #9 wz,wc
    jmp #.L6 if_a
    

    ARM does this with a suffix on the instruction:
    LSLEQ r0, r0, #24
    ADDEQ r0, r0, #2
    

    I realize it's not the PASM convention but I'm not sure yet how to move it. The answer is probably in the P1 binutils port.
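
The "bit stuffing" mentioned above boils down to packing a handful of fields into a 32-bit word, and the condition ends up in the same place no matter where the assembler syntax puts it. A minimal sketch in C, assuming the common P2 two-operand layout EEEE OOOOOOO CZI DDDDDDDDD SSSSSSSSS (4-bit condition, 7-bit opcode, C/Z/I bits, 9-bit D and S). `p2_pack` is a hypothetical helper, not code from the port, and many P2 instructions use other layouts.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one instruction in the EEEE OOOOOOO CZI DDDDDDDDD SSSSSSSSS layout.
   The condition (if_a and friends) lands in the top four bits; the wc/wz
   effects land in the C and Z bits; `imm` marks S as an immediate. */
static uint32_t p2_pack(uint32_t cond, uint32_t opcode,
                        int wc, int wz, int imm,
                        uint32_t d, uint32_t s)
{
    assert(cond <= 0xF && opcode <= 0x7F && d <= 0x1FF && s <= 0x1FF);
    return (cond << 28) | (opcode << 21)
         | ((uint32_t)(wc != 0) << 20)
         | ((uint32_t)(wz != 0) << 19)
         | ((uint32_t)(imm != 0) << 18)
         | (d << 9) | s;
}
```

With a packer like this, moving the condition from a trailing `if_a` to a leading one is purely a parsing change; the encoded word is identical.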

    For GCC I actually did start with PropGCC (gcc4 version). I've had to comment out or remove the {L,C,X}MM and PASM code that was giving me trouble with deprecated and renamed APIs, but maybe Eric can revive the parts that make sense for P2. Today I took a forum dive into hubexec and decided to have that be the default. Currently I'm using PTRA as the stack pointer, with pusha/popa/calla/reta.

    Yes there are GitLab repos...no it's not quite in a presentable/usable state yet.
    Here's a little teaser though :smiley:
    C++
    int foo;
    int bar;
    
    int bbb(int x) {
        return x - foo;
    }
    int aaa(int x) {
        x++;
        return bbb(x) + bar;
    }
    
    int main() {
        int x = bar;
        asm("adds %[foo], #4"
            : [foo] "=r" (foo));
        return x * aaa(foo);
    }
    
    P2 GAS
    	.file	"test.cpp"
    	.text
    	.balign	4
    	.global	__Z3bbbi
    __Z3bbbi:
    .LFB0:
    	rdlong r7, .LC0
    	sub r0, r7
    	reta
    .LFE0:
    	.size	__Z3bbbi, .-__Z3bbbi
    	.balign	4
    .LC0:
    	.long	_foo
    	.balign	4
    	.global	__Z3aaai
    __Z3aaai:
    .LFB1:
    	rdlong r7, .LC1
    	add r0, #1
    	sub r0, r7
    	rdlong r7, .LC2
    	add r0, r7
    	reta
    .LFE1:
    	.size	__Z3aaai, .-__Z3aaai
    	.balign	4
    .LC1:
    	.long	_foo
    	.balign	4
    .LC2:
    	.long	_bar
    	.balign	4
    	.global	_main
    _main:
    .LFB2:
    	pusha lr
    ' 16 "test.cpp" 1
    	adds r0, #4
    ' 0 "" 2
    	wrlong r0, .LC4
    	calla #__Z3aaai
    	rdlong r7, .LC3
    	muls r0, r7
    	popa lr
    	reta
    .LFE2:
    	.size	_main, .-_main
    	.balign	4
    .LC3:
    	.long	_bar
    	.balign	4
    .LC4:
    	.long	_foo
    	.global	_bar
    	.section	.bss
    	.balign	4
    	.type	_bar, @object
    	.size	_bar, 4
    _bar:
    	.zero	4
    	.global	_foo
    	.balign	4
    	.type	_foo, @object
    	.size	_foo, 4
    _foo:
    	.zero	4
    	.ident	"GCC: (GNU) 9.1.0"
    
    

    DavidZemon wrote: »
    I want Rust.
    P.S. in the near future this may interest you, assuming it will build in cross-compiler mode.
    https://github.com/redbrain/gccrs
  • Ah...just noticed that 'mul' is an unsigned 16x16 multiply. Whoopsi!

    Scheduling that cordic pipeline is going to be really interesting to handle optimally. In -O2 mode I notice that gcc will generate a series of shift and add instructions if it is able to elide one operand to a compile-time constant. For now you guys are going to get:

    qmul a,b
    ... wait for it
    getqx a
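
The shift-and-add expansion mentioned above is easiest to see on a concrete constant. A sketch in C of the kind of rewrite an optimizer may choose for `x * 10`. The exact sequence GCC picks depends on its cost model, so this is an illustration rather than the port's actual output, and the function name is invented.

```c
#include <stdint.h>

/* x * 10 rewritten as (x << 3) + (x << 1): two shifts and one add,
   with no multiply instruction and no CORDIC round trip. */
static uint32_t times10_shift_add(uint32_t x)
{
    return (x << 3) + (x << 1);
}
```

The result matches `x * 10` modulo 2^32 for every input, which is all a 32-bit `int` multiply needs.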
  • cgracey Posts: 14,133
    MUL and SCA are unsigned.

    MULS and SCAS are signed.
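
Since MUL and MULS only look at the low 16 bits of each operand, a full 32-bit product has to be assembled from partial products (or handed to the CORDIC with qmul). A sketch in C that models this. `p2_mul` and `p2_muls` are illustrative emulations of the instruction semantics as described in this thread, and `mul32_lo` is a hypothetical helper, not the port's libgcc routine.

```c
#include <stdint.h>

/* MUL: unsigned 16x16 -> 32, using only the low words of both operands. */
static uint32_t p2_mul(uint32_t d, uint32_t s)
{
    return (d & 0xFFFFu) * (s & 0xFFFFu);
}

/* MULS: signed 16x16 -> 32, low words reinterpreted as int16_t
   (the narrowing conversion wraps on the usual two's-complement targets). */
static int32_t p2_muls(uint32_t d, uint32_t s)
{
    return (int32_t)(int16_t)(d & 0xFFFFu) * (int32_t)(int16_t)(s & 0xFFFFu);
}

/* Low 32 bits of a 32x32 product from three 16x16 partial products.
   The hi*hi term only affects bits 32 and up, so it is dropped. */
static uint32_t mul32_lo(uint32_t a, uint32_t b)
{
    uint32_t lo    = p2_mul(a, b);
    uint32_t cross = p2_mul(a >> 16, b) + p2_mul(a, b >> 16);
    return lo + (cross << 16);
}
```

This is why a bare `muls r0, r7` is only a correct `int` multiply when both values fit in 16 bits.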
  • rogloh Posts: 5,122
    edited 2019-09-03 05:41
    This is great stuff ntosme2! That sample GAS code you pasted looks nice and tight and even includes an inline optimisation for calling bbb from aaa by the looks of it. If you can nail the epilogs/prologs to take maximum advantage of P2 capabilities I think this could work out nicely for fast P2 performance with C. Am sort of hoping there have been improvements in register usage generated from GCC in the recent 9.x versions vs the original 4.x versions the P1 used. When I dug into the PASM code that p2gcc generates for the P2 from the P1 assembly for Micropython, in many places it did not appear to be optimal which I thought was most likely because of its limited working register availability in the original P1 code. This also adds a lot to PASM code size bloat. Your GCC implementation can hopefully improve on this significantly. Excellent.
  • ntosme2 wrote: »
    DavidZemon wrote: »
    I want Rust.
    P.S. in the near future this may interest you, assuming it will build in cross-compiler mode.
    https://github.com/redbrain/gccrs

    Oh that's very cool! :D

    As is your continued work with GCC :)
  • Some more tinkering...

    Here I've created some shorthand macros to P2-specific gcc attributes that specify how to place data in memory, as well as their addressing schemes.
    #define _COGRAM __attribute__((cogram))
    #define _LUTRAM __attribute__((lutram))
    #define _HUBRAM __attribute__((hubram))
    
    int a _COGRAM;
    int b _LUTRAM;
    int c _HUBRAM;
    int d; // default to _HUBRAM when globally-defined
    
    int main() {
        a += 511;
        b += a;
        c += b;
        d += c;
        int e = d+512;
    
        return e;
    }
    

    I'm checking that:
    - a constant 511 is treated as an immediate
    - a constant 512 is stuffed into local cog ram
    - int a is in cog ram and referenced directly
    - int b is in lut ram and referenced directly
    - int c is in hub ram, a local address/pointer is created, and it's referenced via the pointer
    - int d is in hub ram, a local address/pointer is created, and it's referenced via the pointer
    	.file	"test.cpp"
    	.text
    	.section .hubram,"aw",@progbits
    	.balign	4
    	.global	_main
    _main:
    .LFB0:
    	mov r0, _a
    	mov r7, .LC0
    	add r0, #511
    	mov _a, r0
    	add r0, _b
    	mov _b, r0
    	rdlong r6, r7
    	add r0, r6
    	wrlong r0, r7
    	mov r7, .LC1
    	rdlong r6, r7
    	add r0, r6
    	wrlong r0, r7
    	add r0, .LC2
    	reta
    .LFE0:
    	.size	_main, .-_main
    	.section .cogram,"aw",@progbits
    	.balign	4
    .LC0:
    	.long	_c
    	.balign	4
    .LC1:
    	.long	_d
    	.balign	4
    .LC2:
    	.long	512
    	.section .hubram,"aw",@progbits
    	.global	_d
    	.balign	4
    	.type	_d, @object
    	.size	_d, 4
    _d:
    	.zero	4
    	.global	_c
    	.balign	4
    	.type	_c, @object
    	.size	_c, 4
    _c:
    	.zero	4
    	.global	_b
    	.section .lutram,"aw",@progbits
    	.balign	4
    	.type	_b, @object
    	.size	_b, 4
    _b:
    	.zero	4
    	.global	_a
    	.section .cogram,"aw",@progbits
    	.balign	4
    	.type	_a, @object
    	.size	_a, 4
    _a:
    	.zero	4
    	.ident	"GCC: (GNU) 9.1.0"
    

    Thoughts? I think I did that right. The default linker script should accept these sections and "do the right thing".

    Poking at gcc internals reminds me of kernel development where the smallest typo can crash everything immediately.
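
The 511-versus-512 check in this post comes down to whether a constant fits the 9-bit immediate field. A sketch of that decision in C. The enum and function names are invented for illustration and do not come from the port.

```c
#include <stdint.h>

/* How a compile-time constant operand could be materialised. */
enum const_kind {
    CONST_IMM9,      /* encoded in the instruction itself, e.g. add r0, #511 */
    CONST_COG_LONG   /* stored as a .long in cog RAM, e.g. add r0, .LC2 */
};

/* A 9-bit source field holds 0..511; anything else needs a long somewhere. */
static enum const_kind classify_constant(uint32_t value)
{
    return value <= 511u ? CONST_IMM9 : CONST_COG_LONG;
}
```

In the listing above, `add r0, #511` corresponds to the first case and `add r0, .LC2` (with `.LC2` holding 512) to the second.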
  • rogloh Posts: 5,122
    edited 2019-09-04 08:29
    The LUTRAM accesses may be rather tricky and I think you may have to use RDLUT. I don't think you can access it as a normal register directly the way you have above, @ntosme2.

    For your "int c" and "int d" cases, the ideal situation on a P2 in future hubexec mode is not necessarily to just create extra constant pointers at .LC0 and .LC1 but to try to use the ## syntax in the mov's or rdlong/wrlong accesses where it makes sense, otherwise you can end up with multiple hub memory accesses to read/write variables when running hubexec mode and some extra memory space for all these constant pointers is then required in some cases.

    So for a hub exec version of this snippet from your cog based code above...
    	mov r7, .LC0
    ...
    	rdlong r6, r7
    	add r0, r6
    	mov r7, .LC1
    	rdlong r6, r7
    	add r0, r6
    	wrlong r0, r7
    
    one approach is to just access your "c" and "d" variable addresses indirectly using this type of thing...
    	mov r7, ##_c
    ...
    	rdlong r6, r7
    	add r0, r6
    	mov r7, ##_d
    	rdlong r6, r7
    	add r0, r6
    	wrlong r0, r7
    

    or better yet load their addresses directly into the rdlong/wrlongs if and wherever it makes sense...(that may be hard to determine)
    ...
    	rdlong r6, ##_c
    	add r0, r6
    	rdlong r6, ##_d
    	add r0, r6
    	wrlong r0, ##_d
    

    This special P2-specific ## syntax will need to be expanded later in GAS, I expect, so that an AUGS instruction precedes the rdlong/wrlong. It does expand each instruction using it to 8 bytes but avoids the hub stall penalty of reading an extra constant pointer from hub memory. There may still be cases where keeping the pointer address in a register is preferable to using ## everywhere, for example if you access the same pointer address over and over. In that case you could just use the first indirect method. How much of this you get to control vs what the GCC optimizer itself wants to do I don't know.

    I know "premature optimization is the root of all evil" (a saying a colleague of mine told me and stuck when I lived in Silicon Valley), and you are probably not really doing hubexec fully yet but this is still something to be aware of now so you may be able to consider it later in your port of GCC.

    Roger.
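
To make the ## mechanics concrete: assuming, per the P2 documentation, that AUGS supplies the upper 23 bits of a 32-bit immediate and the following instruction's 9-bit S field supplies the lower 9, the split an assembler has to perform can be sketched in C as follows. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct aug_split {
    uint32_t augs23;  /* upper 23 bits, carried by the AUGS prefix */
    uint32_t imm9;    /* lower 9 bits, carried by the instruction's S field */
};

/* Split a 32-bit immediate (for example a resolved hub address). */
static struct aug_split split_immediate(uint32_t value)
{
    struct aug_split parts = { value >> 9, value & 0x1FFu };
    return parts;
}

/* The source operand the cog sees once the prefix has been applied. */
static uint32_t join_immediate(struct aug_split parts)
{
    return (parts.augs23 << 9) | parts.imm9;
}
```

For a symbol such as `_c`, the linker would presumably have to patch both fields once the final address is known, which suggests a dedicated relocation type for this pair.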



  • evanh Posts: 15,126
    Yes, lutram data access, like hubram data, only has load/store instructions. Lutram can, however, be code space like cogram, with no branch penalties, which makes it an excellent place for fast looping subroutines.

  • @evanh

    Good points, I hadn't noticed that restriction on LUT RAM.

    Thanks for the insight about AUGS. It's not super clear at first why it should be used.
  • ntosme2, you may want to look at the spin2 code generated by p2gcc. It should be similar to the code you will need to generate from GCC when running in the hubexec mode. p2gcc contains a utility called s2pasm that converts the assembly code generated by the P1 GCC compiler to P2 assembly.
  • I figured out LUT RAM addressing and cogexec for more cases. One missing case is generation of ## immediates for 10 to 32-bit constants.

    test.cpp
    #define _COGRAM __attribute__((cogram))
    #define _LUTRAM __attribute__((lutram))
    #define _HUBRAM __attribute__((hubram))
    
    int a _COGRAM;
    int b _LUTRAM;
    int c _HUBRAM;
    int d; // default to _HUBRAM when globally-defined
    
    int main() {
        a &= 1;
        b &= 1;
        c &= 511;
        d &= 512;
        return a + b + c + d;
    }
    

    test.s: propeller-elf-g++ test.cpp -S -O2
    	.file	"test.cpp"
    	.text
    	.section .hubram,"aw",@progbits
    	.balign	4
    	.global	_main
    _main:
    .LFB0:
    	mov r6, _a
    	and r6, #1
    	rdlut r7, ##_b
    	and r7, #1
    	mov r0, r6
    	mov _a, r6
    	add r0, r7
    	wrlut r7, ##_b
    	rdlong r6, ##_c
    	and r6, #511
    	add r0, r6
    	rdlong r7, ##_d
    	and r7, .LC3
    	add r0, r7
    	wrlong r6, ##_c
    	wrlong r7, ##_d
    	reta
    .LFE0:
    	.size	_main, .-_main
    	.section .cogram,"aw",@progbits
    	.balign	4
    .LC3:
    	.long	512
    	.section .hubram,"aw",@progbits
    	.global	_d
    	.balign	4
    	.type	_d, @object
    	.size	_d, 4
    _d:
    	.zero	4
    	.global	_c
    	.balign	4
    	.type	_c, @object
    	.size	_c, 4
    _c:
    	.zero	4
    	.global	_b
    	.section .lutram,"aw",@progbits
    	.balign	4
    	.type	_b, @object
    	.size	_b, 4
    _b:
    	.zero	4
    	.global	_a
    	.section .cogram,"aw",@progbits
    	.balign	4
    	.type	_a, @object
    	.size	_a, 4
    _a:
    	.zero	4
    	.ident	"GCC: (GNU) 9.1.0"
    
  • rogloh Posts: 5,122
    edited 2019-09-09 00:01
    Nice! This is steadily getting better and better ntosme2.

    One other thing to mention, if you didn't know about it already: the use of ## for the RDLUT addresses instead of just a single # is redundant (and may even cause errors), given that only 256 of the 512 LUT entries are immediately addressable. For the other 256 locations you do need to keep using register addressing. That could make things tricky with the LUT unless you treat it as two independent halves based on the address position, or only use register addressing for all of it.

    Eg. to access LUT I think you can do this...

    RDLUT dest, #constant ' constant address can only be from 0 to 255
    RDLUT dest, reg ' reg can be a register holding a value from 0-511

    I really like how the actual register usage code generated by GCC with your -O2 setting seems to be optimal for speed with instruction reordering being used (just look how the returned value is calculated above, as a part of the preceding instruction sequence). This already appears to be able to generate some nice and tight C code. If you can customise it wherever you can for taking advantage of any inherent P2 instruction set capabilities such as hub block transfers, HW multiply, ## for larger constants, pointer registers, instruction skip, etc, then GCC is likely to really fly on a P2.
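
One way to act on the immediate-range restriction described in this post is to pick the addressing form per access. A sketch in C, assuming (as stated above) that only LUT addresses 0 to 255 can be given as an immediate and that a register can hold any address from 0 to 511. The names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum lut_mode {
    LUT_IMMEDIATE,   /* rdlut dest, #addr */
    LUT_REGISTER     /* mov tmp, #addr ; rdlut dest, tmp */
};

/* Choose the addressing form for a LUT long at index lut_addr. */
static enum lut_mode choose_lut_mode(uint32_t lut_addr)
{
    assert(lut_addr < 512u);          /* LUT RAM has 512 longs */
    return lut_addr < 256u ? LUT_IMMEDIATE : LUT_REGISTER;
}
```

The simpler alternative mentioned above, always using register addressing, would amount to returning `LUT_REGISTER` unconditionally at the cost of one extra instruction per access.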
  • Cluso99 Posts: 18,066
    IIRC Chip gave RD/WRLUT the same PTRx addressing as the RD/WRxxxx instructions.
  • @ntosme2

    Any news on your GCC for P2? Is there a GitHub repository you can share with us?
  • ntosme2 Posts: 38
    edited 2019-09-14 07:01
    Sorry guys I've been traveling for work a lot recently but I've made you wait long enough. Here are repos and instructions for compiling and running a very alpha p2 gcc.

    Note that the assembler will crash on some input right now, and outputs quite a lot of debug fluff. I suggest playing with gcc using the '-S' flag to output assembly and the `-O2` flag to generate optimized output. Also there appears to be a threading-related issue with line parsing in the assembler I need to track down.

    http://p2gcc.codenthings.com/
  • ntosme2 wrote: »
    Sorry guys I've been traveling for work a lot recently but I've made you wait long enough. Here are repos and instructions for compiling and running a very alpha p2 gcc.

    Note that the assembler will crash on some input right now, and outputs quite a lot of debug fluff. I suggest playing with gcc using the '-S' flag to output assembly and the `-O2` flag to generate optimized output. Also there appears to be a threading-related issue with line parsing in the assembler I need to track down.

    http://p2gcc.codenthings.com/
    Thanks! I'll try building it.

  • Well, my first attempt at compiling p2gcc-dev on the Mac failed. I'll try to track down what went wrong later. Here are the last few lines of the build output:
    gcc /Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../common/gentmap.c -o gentmap -g -O -I. -I/Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller -I../common -I/Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../common -I../../include -I/Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../../include -I../../bfd -I/Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../../bfd -I../../opcodes -I/Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../../opcodes  
    echo "/* generated by Makefile */" > tmp-hw.h
    /bin/sh /Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../../sim/common/create-version.sh /Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/../../gdb \
    	    x86_64-apple-darwin18.6.0 propeller-elf version.c
    sim_hw=""; \
    	for hw in $sim_hw ; do \
    	  echo "extern const struct hw_descriptor dv_${hw}_descriptor[];" ; \
    	done >> tmp-hw.h
    dtc -O dtb -o propeller-gdb.dtb /Users/dbetz/work/p2gcc-dev/build-binutils/../binutils-gdb/sim/propeller/propeller-gdb.dts
    make[3]: dtc: No such file or directory
    echo "const struct hw_descriptor *hw_descriptors[] = {" >> tmp-hw.h
    make[3]: *** [propeller-gdb.dtb] Error 1
    make[3]: *** Waiting for unfinished jobs....
    sim_hw=""; \
    	for hw in $sim_hw ; do \
    	  echo "  dv_${hw}_descriptor," ; \
    	done >> tmp-hw.h
    echo "  NULL," >> tmp-hw.h
    echo "};" >> tmp-hw.h
    mv tmp-hw.h hw-config.h
    make[2]: *** [all] Error 1
    make[1]: *** [all-sim] Error 2
    make: *** [all] Error 2
    
  • ntosme2 Posts: 38
    edited 2019-09-14 15:08
    Ah, you'll probably want the ./gcc/contrib/download_prerequisites script for OSX, Debian, etc.
    I'm on Gentoo so those dependencies are already on the system.

    I updated the instructions to mention that.
  • ntosme2 wrote: »
    Ah, you'll probably want the ./gcc/contrib/download_prerequisites script for OSX, Debian, etc.
    I'm on Gentoo so those dependencies are already on the system.

    I updated the instructions to mention that.

    That didn't seem to fix the issue... But, running ./build_gcc.sh anyway created the gcc binaries. Probably means there is something missing but the compiler(s) do build and will compile C and CPP sources.

    dgately
  • It looks like it might be when it is trying to build gdb. Can we just skip that for now?
  • rogloh Posts: 5,122
    edited 2019-09-15 07:29
    I was able to build some of this on Mac OS X (Yosemite 10.10.5). Like David and dgately also found, it failed to complete its Make operation during the gdb tool compilation due to lack of a "dtc" tool, though propeller-elf-gcc could still be built. I discovered I was also without the propeller-elf-as because binutils failed to complete its full Make.

    EDIT: after I did "brew install dtc" I found it could complete the binutils Make step okay and create the missing tools.

    I did find I needed to edit the two upper-level build scripts to remove the call to the nproc utility (which returns the number of CPU cores), since I don't have it on my Mac. On my 4-core machine I commented out the nproc part and just hard-coded the related lines as:
    #make all-gcc -j `nproc --all`
    make all-gcc -j 4
    

    I also had to patch my readline dynamic library link: I had version 8 installed, but gawk needed version 7 before it would work (probably just my own homebrew stuff out of sync).
    cd /usr/local/opt/readline/lib
    ln -s libreadline.8.0.dylib libreadline.7.dylib
    

    I then tried compiling this simple C test program to see if it could convert it to some PASM2.
    int square(int param)
    {
        return (param * param);
    }
    
    int main()
    {
        int a = 4;
    
        a = square(a+1);
        return a;
    }
    

    When I compiled it to assembly (as unoptimized code) I noticed that it is generating strange code with ## and register names. I know it is early days and likely won't be working right yet but thought I'd point it out anyway.

    So after running:
    propeller-elf-gcc -S main.c
    
    it generates this output assembly file below with ## referencing registers r6 or r7 when I think it should just be using the register r6 or r7 instead without ##. To use ## with rdlong/wrlong you would want to have a symbolic name or immediate constant/expression that resolves to an address in the linker later, but not the register name itself. If the code needs to use the register contents as a computed hub address for a stack frame argument, as this example does, it shouldn't use the ## form.
    	.file	"main.c"
    	.text
    	.global	___mulsi3
    	.section .hubram,"aw",@progbits
    	.balign	4
    	.global	_square
    _square:
    	pusha r14
    	pusha lr
    	mov r14, ptra
    	neg r7, #8
    	add r7, r14
    	wrlong r0, ##r7
    	neg r7, #8
    	add r7, r14
    	rdlong r1, ##r7
    	neg r7, #8
    	add r7, r14
    	rdlong r0, ##r7
    	calla #___mulsi3
    	mov r7, r0
    	mov r0, r7
    	mov ptra, r14
    	popa lr
    	popa r14
    	reta
    	.size	_square, .-_square
    	.balign	4
    	.global	_main
    _main:
    	pusha r14
    	pusha lr
    	mov r14, ptra
    	mov r6, #4
    	neg r7, #8
    	add r7, r14
    	wrlong r6, ##r7
    	neg r6, #8
    	add r6, r14
    	rdlong r7, ##r6
    	add r7, #1
    	mov r0, r7
    	calla #_square
    	mov r7, r0
    	neg r6, #8
    	add r6, r14
    	wrlong r7, ##r6
    	neg r6, #8
    	add r6, r14
    	rdlong r7, ##r6
    	mov r0, r7
    	mov ptra, r14
    	popa lr
    	popa r14
    	reta
    	.size	_main, .-_main
    	.ident	"GCC: (GNU) 9.1.0"
    
    

    Regardless it is great to see something with GCC 9.1.0 actually compiling and see some of GCC for native P2 somewhat functional at this very early stage. You'll definitely get there if you keep at it. Nice one @ntosme2. :smile:

    Cheers,
    Roger
  • Yep that was definitely not intended behavior. I think I've fixed that if you want to pull and try again.

    Here's a similar piece of code that shows it distinguishing between HUB ##symbol addressing and register wrlong/rdlong.

    test.cpp
    int a = 4;
    
    int square(int param)
    {
        return (param * param);
    }
    
    int main()
    {
        int b = a+1;
    
        return square(b);
    }
    

    propeller-elf-g++ test.cpp -S -O0
    	.file	"test.cpp"
    	.text
    	.global	_a
    	.section .hubram,"aw",@progbits
    	.balign	4
    	.type	_a, @object
    	.size	_a, 4
    _a:
    	.long	4
    	.global	___mulsi3
    	.balign	4
    	.global	__Z6squarei
    __Z6squarei:
    	pusha r14
    	mov r7, r14
    	sub r7, #8
    	wrlong r0, r7
    	mov r7, r14
    	sub r7, #8
    	rdlong r1, r7
    	mov r7, r14
    	sub r7, #8
    	rdlong r0, r7
    	calla #___mulsi3
    	mov r7, r0
    	mov r0, r7
    	popa r14
    	reta
    	.size	__Z6squarei, .-__Z6squarei
    	.balign	4
    	.global	_main
    _main:
    	pusha r14
    	rdlong r7, ##_a
    	add r7, #1
    	mov r6, r14
    	sub r6, #8
    	wrlong r7, r6
    	mov r7, r14
    	sub r7, #8
    	rdlong r0, r7
    	calla #__Z6squarei
    	mov r7, r0
    	nop
    	mov r0, r7
    	popa r14
    	reta
    	.size	_main, .-_main
    	.ident	"GCC: (GNU) 9.1.0"
    

    propeller-elf-g++ test.cpp -S -O2
    	.file	"test.cpp"
    	.text
    	.global	___mulsi3
    	.section .hubram,"aw",@progbits
    	.balign	4
    	.global	__Z6squarei
    __Z6squarei:
    	mov r1, r0
    	calla #___mulsi3
    	reta
    	.size	__Z6squarei, .-__Z6squarei
    	.balign	4
    	.global	_main
    _main:
    	rdlong r1, ##_a
    	add r1, #1
    	mov r0, r1
    	calla #___mulsi3
    	reta
    	.size	_main, .-_main
    	.global	_a
    	.balign	4
    	.type	_a, @object
    	.size	_a, 4
    _a:
    	.long	4
    	.ident	"GCC: (GNU) 9.1.0"
    

    A few things are wrong here:

    - It looks like gcc is manually pushing/popping values to/from the stack (for which I've chosen to use ptra) instead of using pusha/popa.
    - Instruction 'pusha' post-increments ptra, so the 'sub r6, #8' should probably be 'add r6, #4'.
    - Register r14 is being used as the frame pointer, except it wasn't initialized with the value of ptra first.
    - I think 'calla #_some_func' needs to be 'calla ##_some_func', assuming the functions are in HUB RAM.
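
The stack-direction point above can be checked with a tiny model. A sketch in C, assuming the documented behaviour that PUSHA writes a long at PTRA and then advances PTRA by 4, and that POPA steps PTRA back by 4 before reading. `struct cog`, `pusha` and `popa` are stand-ins for illustration, with a small array playing the role of hub RAM.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cog {
    uint8_t  hub[64];  /* toy hub RAM */
    uint32_t ptra;     /* stack pointer, byte address */
};

/* PUSHA: store at PTRA, then post-increment (the stack grows upward). */
static void pusha(struct cog *c, uint32_t value)
{
    assert(c->ptra + 4 <= sizeof c->hub);
    memcpy(&c->hub[c->ptra], &value, 4);
    c->ptra += 4;
}

/* POPA: pre-decrement PTRA, then load. */
static uint32_t popa(struct cog *c)
{
    uint32_t value;
    assert(c->ptra >= 4);
    c->ptra -= 4;
    memcpy(&value, &c->hub[c->ptra], 4);
    return value;
}
```

In this model the most recently pushed long always sits at PTRA - 4 and free space starts at PTRA, so a frame pointer only makes sense once it has been loaded from PTRA after the pushes.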
  • evanh Posts: 15,126
    edited 2019-09-16 00:20
    The Pasm2 double # format is only for data immediates. There are separate instructions for 9-bit and 20-bit branching, and they all use the single # format.
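
Putting the operand rules from this exchange together: registers are written bare, hub data addresses given by symbol take ##, and branch targets take a single #. A sketch in C of an operand printer along those lines. The enum and function are hypothetical and only loosely modelled on what a GCC back end's operand-printing hook does.

```c
#include <stdio.h>

enum operand_kind {
    OP_REGISTER,       /* address already held in a register: rdlong r0, r7 */
    OP_DATA_SYMBOL,    /* hub data address by name: rdlong r0, ##_a */
    OP_BRANCH_TARGET   /* call or jump destination: calla #_square */
};

/* Write the assembly text for one operand into buf.
   Returns the value snprintf returns (the length of the full text). */
static int format_operand(char *buf, size_t size,
                          enum operand_kind kind, const char *name)
{
    const char *prefix = "";
    if (kind == OP_DATA_SYMBOL)
        prefix = "##";
    else if (kind == OP_BRANCH_TARGET)
        prefix = "#";
    return snprintf(buf, size, "%s%s", prefix, name);
}
```

Keeping the decision in one place like this makes mistakes such as `wrlong r0, ##r7` harder to reintroduce, since a register operand can never pick up a prefix.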