Shop Learn P1 Docs P2 Docs
LLVM Backend for Propeller 2 - Page 7 — Parallax Forums

LLVM Backend for Propeller 2

145791013

Comments

  • Instructions added.

    Awesome! It looks like the main conflict is the propeller2.h changes. I see you turned a lot of the pin access macros into functions. I don't think I'd want to pull this in, since what would be a single instruction inline now becomes a multi-instruction function call. Adding the leading underscore is fine, I don't think I have enough code that I would not want to change it, and it is more consistent with the other functions in that library.

    I'd also probably make simpletools a separate library, maybe even a separate repo, as it's relatively unrelated to the actual function of the compiler/core libraries to execute code.

  • iseriesiseries Posts: 1,283

    The last changes to the propeller2.h and propeller2.c fixed the issue with the inline functions. I thorough tested all the functions to make sure they were inline instructions.

    I also documented most of the functions so doxygen can be used.

    Yes, I did not want to put that in libp2 but need it to compile with just a reference and have it link in.

    Just need to put it all together.

    Mike

  • Congrats to both of you. @n_ermosh for getting LLVM so far, and @iseries for porting of Simple. Huge steps here!

  • iseriesiseries Posts: 1,283

    Not so fast, I just came up for some air and found I missed a few functions.

    You know in the old simpleIDE GCC it could not remove functions contained in a program so most of the individual functions were separate programs and now putting them back together leads to some missing functions if you don't watch what you're doing....

    Also had to convert the eeprom library functions over to flash since we don't have that.

    Need another assembly function:
    cogstop

    Got to have that one if were going use them.

    So a separate folder called tools with the final library name as Libtools.a should work?

    In the process of testing i2c with a 24lc256 chip to see if the functions came over correctly and the speed doesn't get in the way.

    Mike

  • You know in the old simpleIDE GCC it could not remove functions contained in a program so most of the individual functions were separate programs and now putting them back together leads to some missing functions if you don't watch what you're doing....

    Can you elaborate what you mean by this? You can still have each function be a separate .c file, if that's what you mean.

    cogstop is added, not sure how I missed it.

    Yes, I think a separate top level folder called libtools generating a libtools.a should work. I don't think it should be called just "tools" in the top level, since could be misinterpreted as a folder containing tools for the repo, not the tools library.

    Out of curiosity, what was the motivation for changing from macros to inline functions for pin control? Overall, it does the same thing, but adds complexity to the source.

  • @iseries said:
    So a separate folder called tools with the final library name as Libtools.a should work?

    I'd strongly encourage you to create a new repository for the Simple. If you want help creating a build system for it, I'd be happy to lend a hand, but this probably shouldn't be in the compiler itself (especially not if we ever want this merged upstream).

  • @DavidZemon said:

    @iseries said:
    So a separate folder called tools with the final library name as Libtools.a should work?

    I'd strongly encourage you to create a new repository for the Simple. If you want help creating a build system for it, I'd be happy to lend a hand, but this probably shouldn't be in the compiler itself (especially not if we ever want this merged upstream).

    I agree with making a separate repo, but it won't affect upstream merge if we keep it in the same repo--the llvm-project folder is a submodule that I explicitly keep separate so that we can merge in the future.

  • iseriesiseries Posts: 1,283

    On the P1 there is limited amount of memory and the GCC compiler would keep functions that were contained in a program as a library. So when the program got compiled and linked in the code size was larger than it needed to be because it did not drop the functions not referenced in the main program.

    To get around this they extracted all the functions into small programs and then linked them all together so that the GCC compiler would not include them if not referenced.

    I hate having little programs with one function in them, so I tried to put them back together again just cus.

    For example the simpletools library is 63 little programs.

    Mike

  • got it. Yeah clang does basically the same thing. -ffunction-sections puts every function in it's own section, and then if the section is not referenced, the garbage collector in the linker will throw out that section. If you want to keep everything, you can disable that by removing that flag from compilation.

    I agree that having one small function per file can be a little bit much, but I do think it makes sense to keep code that is functionally unrelated in separate C files. otherwise finding where things occur is a mess. And once the project gets large enough, you can enable must faster compiling because each c file can be compiled in parallel.

  • iseriesiseries Posts: 1,283

    I think I have uncovered an inline bug.

    This program:

    include <propeller.h>
    
    
    int test(int);
    
    int i, j;
    
    
    int main(int argc, char** argv)
    {
        i = 0x0a2829;
        test(i);
    
        while (1)
        {
            _wait(1000);
        }
    }
    
    int test(int x)
    {
        int t;
        char p1, p2, w;
    
        t = x;
        w = t >> 16;
        p2 = (t >> 8) & 0xff;
        p1 = t & 0xff;
    
        _pinh(p1);
        _pinh(p2);
        _pinl(p2);
        _pinl(p1);
    
        return t;
    }
    

    When compiled it thinks its passing character values to the pinh and pinl but in reality it's passing the hole value.

    00000a00 <main>:
         a00: 61 a1 67 fc            wrlong r0, ptra++
         a04: bb 00 00 ff            augs #187
         a08: 98 a0 07 f6            mov r0, #152   
         a0c: 14 05 80 ff            augd #1300
         a10: d0 53 68 fc            wrlong #41, r0
         a14: 14 05 00 ff            augs #1300
         a18: 29 a0 07 f6            mov r0, #41    <---- 0x0a2829
         a1c: 30 0a c0 fd            calla #\2608 <---- 0xa30 (test)
         a20: 01 00 00 ff            augs #1
         a24: e8 a1 07 f6            mov r0, #488   
         a28: ec 6c c0 fd            calla #\27884
         a2c: f8 ff 9f fd            jmp #-8
    
    00000a30 <test>:
         a30: 61 a3 67 fc            wrlong r1, ptra++
         a34: 59 a0 63 fd            drvh r0    <---- 0x0a2829
         a38: d0 a3 03 f6            mov r1, r0 
         a3c: 08 a2 47 f0            shr r1, #8 
         a40: 59 a2 63 fd            drvh r1    <----0x0a28
         a44: 58 a2 63 fd            drvl r1    
         a48: 58 a0 63 fd            drvl r0    
         a4c: d0 df 03 f6            mov r31, r0    
         a50: 5f a3 07 fb            rdlong r1, --ptra  
         a54: 2e 00 64 fd            reta
    

    Mike

  • iseriesiseries Posts: 1,283

    Well, after a day of head banging, I got the 24LC256 memory chip to work with the i2c driver on the P2.

    It turned out that the driver was just fine, the compiler not so much.

    One should not change a whole bunch of code here and there and then try to get something new to work only to find out that nothing was working.

    Linking is great because you don't get to find a new bug based on previous compiled code. With Arduino everything gets compiled every time.

    Mike

  • Your code snippet does look like a bug--I will investigate today. Seems like it's throwing out the &. 0xff part of the operation for some reason.

    Regarding when things get compiled--that is totally up to the user and how they set up their environment. I sometimes make the first step of my make all command a clean step just to be sure I'm compiling the latest code when there could interactions that cause bugs that the make system doesn't know about. Lately I've been using cmake and it is much better about making sure the right dependencies are known.

  • Actually I don't think it's a bug, it's a feature :)

    Because _pinh and _pinl have a char type for their argument, the ABI should dictate that the function itself performs the cast/truncation. As a result, the compiler knows that the callee will perform the truncation to a char, so it removes that operation from the caller to save code execution time and space.

    Take a look at the _pinh or _pinl disassembly and see if it's doing that. If it's not, then it is a bug and I will investigate. I will try to recreate as well to double check.

  • iseriesiseries Posts: 1,283

    I think it's a bug because I tried doing it at the other end.

    asm("and %0, #255\ndrvh %0\n"::"r"(pin));
    

    This cause r0 to be truncated correctly but then the follow on instructs don't restore the register and try to work with truncated values.

    I think the compiler can't see inside the assembly code and just passes the value on.

    In that case the inline will have to push the register, modify the value, use it and pop the registers back off.....

    Mike

  • n_ermoshn_ermosh Posts: 288
    edited 2022-02-15 19:59

    With that inline asm, you'd need to mark pin as being written to with "+r"(pin), then it should mark it as used, and push and pop it correctly, I think.

    So I re-created your example, and I get the following. You can see it does a signx to truncate the value properly. So not sure why yours is compiling differently. And this is what I think you'd expect to happen as well, rather than relying on the callee to do the truncation.

    #include <propeller.h>
    #include <stdio.h>
    
    int test(int);
    int i, j;
    void _pinh(char pin);
    void _pinl(char pin);
    
    int main(int argc, char** argv)
    {
        i = 0x0a2829;
        test(i);
    
        while (1) {
    
        }
    }
    
    int test(int x)
    {
        int t;
        char p1, p2, w;
    
        t = x;
        w = t >> 16;
        p2 = (t >> 8) & 0xff;
        p1 = t & 0xff;
    
        _pinh(p1);
        _pinh(p2);
        _pinl(p2);
        _pinl(p1);
    
        return t;
    }
    
    void __attribute__((noinline)) _pinh(char pin) {
        asm ("dirh %0\n"::"r"(pin));
    }
    
    void __attribute__((noinline)) _pinl(char pin) {
        asm ("dirl %0\n"::"r"(pin));
    }
    
    00000a00 <main>:
         a00: 61 a1 67 fc            wrlong r0, ptra++
         a04: 21 00 00 ff            augs #33
         a08: 98 a0 07 f6            mov r0, #152   
         a0c: 14 05 80 ff            augd #1300
         a10: d0 53 68 fc            wrlong #41, r0
         a14: 14 05 00 ff            augs #1300
         a18: 29 a0 07 f6            mov r0, #41    
         a1c: 20 0a c0 fd            calla #\_Z4testi
    
    00000a20 <_Z4testi>:
         a20: 28 04 64 fd            setq #2
         a24: 61 a1 67 fc            wrlong r0, ptra++
         a28: d0 a3 03 f6            mov r1, r0 
         a2c: d1 a5 03 f6            mov r2, r1 
         a30: 07 a4 67 f7            signx r2, #7   
         a34: d2 a1 03 f6            mov r0, r2 
         a38: 68 0a c0 fd            calla #\_Z5_pinhc
         a3c: d1 a1 03 f6            mov r0, r1 
         a40: 08 a0 47 f0            shr r0, #8 
         a44: 07 a0 67 f7            signx r0, #7   
         a48: 68 0a c0 fd            calla #\_Z5_pinhc
         a4c: 70 0a c0 fd            calla #\_Z5_pinlc
         a50: d2 a1 03 f6            mov r0, r2 
         a54: 70 0a c0 fd            calla #\_Z5_pinlc
         a58: d1 df 03 f6            mov r31, r1    
         a5c: 28 04 64 fd            setq #2
         a60: 5f a1 07 fb            rdlong r0, --ptra  
         a64: 2e 00 64 fd            reta   
    
    00000a68 <_Z5_pinhc>:
         a68: 41 a0 63 fd            dirh r0    
         a6c: 2e 00 64 fd            reta   
    
    00000a70 <_Z5_pinlc>:
         a70: 40 a0 63 fd            dirl r0    
         a74: 2e 00 64 fd            reta
    

    Full compile command (generated by CMake)
    /opt/p2llvm/bin/clang++ -D__P2_GIT_VERSION__=\"1c4ab22-modified\" -fno-exceptions -fno-jump-tables --target=p2 -Dprintf=__simple_printf -ffunction-sections -fdata-sections -Oz -std=gnu++17 -MD -MT src/CMakeFiles/t101.dir/t101.cpp.obj -MF CMakeFiles/t101.dir/t101.cpp.obj.d -o CMakeFiles/t101.dir/t101.cpp.obj -c /Users/nikita/Github/p2llvm/test/src/t101.cpp

    (unrelated, I finally added a symbolizer, so function names are now resolved in disassembly. It only works in on complete programs though, not objects that contain relocations)

  • iseriesiseries Posts: 1,283

    Right, your code is not doing inline so everything works fine. That is what I did to put things back to normal.

    Here is a test program that does not work. It completely removes the assembly code for both examples.

    #include <propeller.h>
    #include <stdio.h>
    
    void high(char);
    void low(char);
    
    typedef struct 
    {
        char clk;
        char dta;
        int wait;
    } i2c_x;
    
    
    int test(i2c_x *);
    int test2(int);
    
    
    int i, j;
    i2c_x tst;
    
    
    int main(int argc, char** argv)
    {
        tst.wait = 10;
        tst.clk = 40;
        tst.dta = 41;
    
        i = 0x0a2829;
    
        test(&tst);
    
        test2(i);
    
        while (1)
        {
            _wait(1000);
        }
    }
    
    int test(i2c_x *x)
    {
        char p1, p2, w;
    
        w = x->wait;
        p2 = x->clk;
        p1 = x->dta;
    
        high(p1);
        high(p2);
        low(p2);
        low(p1);
        _waitus(w);
        return w;
    }
    
    int test2(int x)
    {
        char p1, p2, w;
        p1 = x;
        p2 = (x >> 8);
        w = (x >> 16);
    
        high(p1);
        high(p2);
        low(p2);
        low(p1);
        _waitus(w);
        return w;
    }
    
    
    inline void high(char pin)
    {
        asm("and %0, #255\ndrvh %0\n":"+r"(pin):);
    }
    
    inline void low(char pin)
    {
        asm("and %0, #255\ndrvl %0\n":"+r"(pin):);
    }
    

    This next program works exempt test2 is using the wrong values even though I told the compiler these variables are character.

    #include <propeller.h>
    #include <stdio.h>
    
    void high(char);
    void low(char);
    
    typedef struct 
    {
        char clk;
        char dta;
        int wait;
    } i2c_x;
    
    
    int test(i2c_x *);
    int test2(int);
    
    
    int i, j;
    i2c_x tst;
    
    
    int main(int argc, char** argv)
    {
        tst.wait = 10;
        tst.clk = 40;
        tst.dta = 41;
    
        i = 0x0a2829;
    
        test(&tst);
    
        test2(i);
    
        while (1)
        {
            _wait(1000);
        }
    }
    
    int test(i2c_x *x)
    {
        char p1, p2, w;
    
        w = x->wait;
        p2 = x->clk;
        p1 = x->dta;
    
        high(p1);
        high(p2);
        low(p2);
        low(p1);
        _waitus(w);
        return w;
    }
    
    int test2(int x)
    {
        char p1, p2, w;
        p1 = x;
        p2 = (x >> 8);
        w = (x >> 16);
    
        high(p1);
        high(p2);
        low(p2);
        low(p1);
        _waitus(w);
        return w;
    }
    
    
    inline void high(char pin)
    {
        asm("drvh %0\n"::"r"(pin));
    }
    
    inline void low(char pin)
    {
        asm("drvl %0\n"::"r"(pin));
    }
    

    The underlining problem here is if someone uses this function there could be side effects that are not going to work.

    Mike

  • example 1. For the assembly being removed, change the inline asm constraint to be an input: asm("and %0, #255\ndrvh %0\n"::"+r"(pin)); By putting the constraint as an input, it gets marked as used, and the full function will push/pop the used registers and not throw away the code it thinks is unused:

    00000a4c <_Z4testP5i2c_x>:
         a4c: 28 04 64 fd            setq #2
         a50: 61 a1 67 fc            wrlong r0, ptra++
         a54: d0 a3 03 f6            mov r1, r0 
         a58: 04 a2 07 f1            add r1, #4 
         a5c: d1 a3 03 fb            rdlong r1, r1  
         a60: d0 a5 c3 fa            rdbyte r2, r0  
         a64: 01 a0 07 f1            add r0, #1 
         a68: d0 a1 c3 fa            rdbyte r0, r0  
         a6c: ff a0 07 f5            and r0, #255   
         a70: 59 a0 63 fd            drvh r0    
         a74: ff a4 07 f5            and r2, #255   
         a78: 59 a4 63 fd            drvh r2    
         a7c: ff a4 07 f5            and r2, #255   
         a80: 58 a4 63 fd            drvl r2    
         a84: ff a0 07 f5            and r0, #255   
         a88: 58 a0 63 fd            drvl r0    
         a8c: 1f a2 63 fd            waitx r1   
         a90: 07 a2 67 f7            signx r1, #7   
         a94: d1 df 03 f6            mov r31, r1    
         a98: 28 04 64 fd            setq #2
         a9c: 5f a1 07 fb            rdlong r0, --ptra  
         aa0: 2e 00 64 fd            reta   
    

    example 2. I will take a look. You are right, something funky is happening here--I'll see what it takes to fix it.

    Compiler optimization is a bit of a black box... hard to tell why it's doing something a lot of the time.

  • Wuerfel_21Wuerfel_21 Posts: 3,293
    edited 2022-02-15 21:04

    @n_ermosh said:
    Actually I don't think it's a bug, it's a feature :)

    Because _pinh and _pinl have a char type for their argument, the ABI should dictate that the function itself performs the cast/truncation. As a result, the compiler knows that the callee will perform the truncation to a char, so it removes that operation from the caller to save code execution time and space.

    Take a look at the _pinh or _pinl disassembly and see if it's doing that. If it's not, then it is a bug and I will investigate. I will try to recreate as well to double check.

    But then you'd have to mask with $3F before passing the value into DRVL/DRVH, for those take 11 bit pin-fields.

  • @Wuerfel_21 said:

    @n_ermosh said:
    Actually I don't think it's a bug, it's a feature :)

    Because _pinh and _pinl have a char type for their argument, the ABI should dictate that the function itself performs the cast/truncation. As a result, the compiler knows that the callee will perform the truncation to a char, so it removes that operation from the caller to save code execution time and space.

    Take a look at the _pinh or _pinl disassembly and see if it's doing that. If it's not, then it is a bug and I will investigate. I will try to recreate as well to double check.

    But then you'd have to mask with $3F before passing the value into DRVL/DRVH, for those take 11 bit pin-fields.

    I was actually wrong in that statement. The caller should be the one to prepare the argument properly. There's definitely something wrong--very early in code generation, it breaks the association between the truncated value as a char and the value passed to the inline asm statement, so it throws away the truncation code.

  • iseriesiseries Posts: 1,283

    @Wuerfel_21 ,

    You mean 6 bit fields.
    0x3f = 0b0011 1111

    Right, we assume the lower 8 bits are a valid pin value.

    I tried your example but it doesn't see the and in the assembly code and whacks the r0 register right off the bat.

    This however works though:

    inline void high(char pin)
    {
        pin = pin & 63;
        asm ("drvh %0\n"::"r"(pin));
    }
    
    inline void low(char pin)
    {
        pin = pin & 63;
        asm ("drvl %0\n"::"r"(pin));
    }
    

    Can't use 255 though as the compiler sees it as a character and drops it from the program altogether.

    Mike

  • @iseries said:
    @Wuerfel_21 ,

    You mean 6 bit fields.
    0x3f = 0b0011 1111

    No, I mean 11 bit fields. If you do drvh #$40, it will drive high pin 0 and pin 1 - read the documentation.

  • iseriesiseries Posts: 1,283

    @Wuerfel_21 ,

    The spreadsheet says:

    DRVH    {#}D           {WCZ}    OUT bit of pin D[5:0] = 1.    DIR bit = 1. C,Z = OUT bit.
    

    5:0 = 6 bits right?

    Mike

  • Your spreadsheet is outdated.

    OUT bits of pins D[10:6]+D[5:0]..D[5:0] = 1.    DIR bits = 1. Wraps within OUTA/OUTB. Prior SETQ overrides D[10:6]. C,Z = OUT[D[5:0]].
    
  • iseriesiseries Posts: 1,283

    @Wuerfel_21 ,

    No wander the program is doing funky things with a whole bunch of different pins at the same time.

    Well, my book mark is for Rev A. of the spreadsheet which is what I have.

    Mike

  • Regarding the bug with casting to chars. I think this is a bug in LLVM's code. I created a super simple example in LLVM IR with just two instructions that recreates this, and sent an email to the llvm-dev team. I'll keep digging and see if I'm doing something wrong, but so far can't find anything.

  • RaymanRayman Posts: 12,915

    @n_ermosh Sounds like you are close to getting this C++ compiler working on P2. Keep it up, good work! Will be great when ready.

  • iseriesiseries Posts: 1,283

    I put the and patch into the inline assembly issue so I could retest the i2c code to see if it would work with the inline assembly code.

    It does not. The code generated is fine but it reorders the instructions.

        _dirl(d); // float answer
        _waitus(s);
        _pinh(c);
        _waitus(s);
        d = _pinr(d); //ACK=low
        _pinl(c);
        _waitus(s);
        return d;
    

    Compiles like this:

         c98: 40 a4 63 fd            dirl r2    
         c9c: 00 7c c0 fd            calla #\_waitus
         ca0: 59 a6 63 fd            drvh r3    
         ca4: 00 7c c0 fd            calla #\_waitus
         ca8: 58 a6 63 fd            drvl r3    
         cac: 00 7c c0 fd            calla #\_waitus
         cb0: 40 a4 73 fd            testp r2   wc
         cb4: 6c de 63 fd            wrc r31
    

    It moved the test after the pin was changed which won't work.

    Don't know how they used this product with ARM unless they don't use inline which come to think about it when I was working with arm code everything was a call unless it was sensitive and then it turned into an assembly driver that was called.

    That might be what is needed here.

    Mike

  • @Rayman said:
    @n_ermosh Sounds like you are close to getting this C++ compiler working on P2. Keep it up, good work! Will be great when ready.

    @Rayman I'd say it's in a pretty useable state, with the exception of code with a lot of bit-banging like Mike is working on. Fortunately, he's been a great tester to uncover all these issues :). However, I've been able to work on a few small projects here and there with it and it's been working well, including getting lwIP ported and working to get ethernet connectivity to P2.

    @iseries It looks like it re-orders the instructions because it has no knowledge that something external could modify the INx/OUTx registers, and I don't know if I marked those as implicit operands to those instructions. Can you try adding volatile to the inline assembly statements, maybe that will be enough to tell it not to reorder things? But I don't think so; I'll work on it from the compiler side.

    I haven't run into that issue much yet because like you mentioned, I do what you'd do on ARM, make it a single asm block rather than multiple statements. But, I don't love that approach--what you did above should work.

  • Adding volatile to the ASM statements seems to do the trick. Makes sense in a way--tells the compiler this statement has side effects and not make assumptions about what effects this statement has on the state of the processor

    example:

    //...
    asm volatile  ("dirh %0\n"::"r"(pin));
    //...
    
       //...
       _dirl(d); // float answer
       _waitus(s);
       _pinh(c);
       _waitus(s);
       d = _pinr(d); //ACK=low
       _pinl(c);
       _waitus(s);
       return d;
       //...
    
         a64: 61 a1 67 fc            wrlong r0, ptra++
         a68: 40 a0 63 fd            dirl r0    
         a6c: 1f a4 63 fd            waitx r2   
         a70: 41 a2 63 fd            dirh r1    
         a74: 1f a4 63 fd            waitx r2   
         a78: 40 a0 73 fd            testp r0   wc
         a7c: 6c a0 63 fd            wrc r0
         a80: 40 a2 63 fd            dirl r1    
         a84: 1f a4 63 fd            waitx r2   
         a88: d0 df 03 f6            mov r31, r0    
         a8c: 5f a1 07 fb            rdlong r0, --ptra  
         a90: 2e 00 64 fd            reta   
    
  • iseriesiseries Posts: 1,283

    I tried it at the function level and not at the asm level and it didn't work.

    I will try it at the asm level as you did and see it works that way.

    Mike

Sign In or Register to comment.