LLVM Backend for Propeller 2

Rayman · 2022-02-16 20:03

Anybody build this in Windows yet?

n_ermosh · 2022-02-16 20:57

Mike has been working on Windows; comment #133 in this thread lists the steps he took to get it building. I haven't explicitly tried it yet, though.

n_ermosh · 2022-02-16 21:17

@iseries okay, I found a workaround for the char issue. Basically, I changed the "r" constraint to only allow 32-bit types (ints), since our registers are 32 bits. This means a char value requires an explicit cast to an int. See the example below; it now appears to use the right value. Let me know if that works in your setup. I'll continue discussing this with the LLVM team, since it seems weird that this cast isn't implicit.

int test2(int x);

int main() {
    int i = 0x0a2829;

    test2(i);

    while (1) {
        waitx(CLKFREQ);
    }
}

int test2(int x) {
    char p1, p2;
    p1 = x;
    p2 = (x >> 8);

    asm("drvh %0"::"r"((int)p1));
    asm("drvh %0"::"r"((int)p2));
    asm("drvl %0"::"r"((int)p2));
    asm("drvl %0"::"r"((int)p1));

    return 0;
}

test function disassembly:

     a34: 28 02 64 fd            setq #1
     a38: 61 a1 67 fc            wrlong r0, ptra++
     a3c: d0 a3 03 f6            mov r1, r0 
     a40: 07 a2 67 f7            signx r1, #7   
     a44: 59 a2 63 fd            drvh r1    
     a48: 10 a0 67 f0            shl r0, #16    
     a4c: 18 a0 c7 f0            sar r0, #24    
     a50: 59 a0 63 fd            drvh r0    
     a54: 58 a0 63 fd            drvl r0    
     a58: 58 a2 63 fd            drvl r1    
     a5c: 00 de 07 f6            mov r31, #0    
     a60: 28 02 64 fd            setq #1
     a64: 5f a1 07 fb            rdlong r0, --ptra  
     a68: 2e 00 64 fd            reta

iseries · 2022-02-16 22:26

Here is another bugaboo:

#include <propeller.h>
#include <stdio.h>

int testasm(int, int);


int main(int argc, char** argv)
{
    int p;

    p = testasm(1, 2);

    printf("p: %d\n", p);

    while (1)
    {
        _wait(1000);
    }
}

int testasm(int x, int y)
{
    int i, j, k;
    i = 8;
    j = 0;
    k = 16;

    asm("mov %[i], #8\n"
        "mov %[j], #0\n"
        "loop1: shl %[j], #1\n"
        "drvh %[x]\n"
        "testp %[k], wc\n"
        "if_c or %[j], #1\n"
        "drvl %[y]\n"
        "djnz %[i], #-6\n"
        :[i]"+r"(i), [j]"+r"(j)
        :[k]"r"(k), [x]"r"(x), [y]"r"(y));
    return j;
}

The assembly will not compile; it complains about a duplicate loop1 label. It doesn't matter what you call the label.
The djnz #-6 generates an augs #8388607 instruction.
If you declare a variable but don't assign it a value, the assembly assigns it r0 even if r0 is already in use.

Mike

n_ermosh · 2022-02-16 23:30

Make the label local: .Lloop1. You can also use the special string %= to automatically assign an incrementing number for each instance of the same asm statement. See https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#AssemblerTemplate

n_ermosh · 2022-02-16 23:32

The djnz generating an augs is a bug (one that I thought I had fixed), where it decides it needs an augs because the (unsigned) value is > 511. I'll need to add special parsing for jump instructions where the argument is 9 bits signed, sign extended to 32.

To avoid hitting this bug, you should be able to jump to the label instead of manually counting offsets: djnz #.Lloop1

iseries · 2022-02-17 13:39

I tried adding the period, along with all kinds of other variations, but no luck.

Then I tried counting the offset and saw the other bug. I use this same code in i2c functions with no problems.

Doesn't do this if there is no main though.

Mike

iseries · 2022-02-17 13:49

Dealing with bloat,

I was testing a lot of code and started to notice that my simple test program, which had nothing in it, was over 100k in size. Not only does that take a little longer to load, it simply should not be that big.

I then looked at the object dump to see that the SD card driver code I added was in there as well. So even though I didn't reference it in my program it was including it anyway. Oops there it is.

Well, there is a driver table that contains a list of drivers needed for different devices and I just added the SD card driver to the list. That counts as a reference and now I have this bloated feeling.

To fix the issue I removed the driver from the list and only add it to the list when the sd_mount command is issued.

Now my simple test program is only 25k.

Mike

n_ermosh · 2022-02-17 15:57

Did you add just a period or “.L”? You need both characters to signify it’s a local label so it doesn’t interfere with other labels that might have the same name.

I found that using objcopy to convert the elf to a bin first and loading the bin improves load times significantly. For example, a full 512k program (with my debugger at the end) takes only a couple of seconds to load at a 3 Mbps baud rate.

iseries · 2022-02-17 16:41

I use my own loader, which drops the stack and heap data (all zeros anyway), and I don't use the debugger, so program size is small. I load everything at 230400.

Mike

iseries · 2022-02-17 16:51

Don't know what's going on here, but this code compiles just fine:

int I2C_WriteByte(i2c_t *x, int b)
{
    int s, c, d, i;

    s = x->wait;
    d = x->dta;
    c = x->clk;
    i = 8;

    asm ("shl %[b], #24\n"
        "_isnd: rcl %[b], #1 wc\n"
        "if_c dirl %[d]\n"
        "if_nc dirh %[d]\n"
        "waitx %[s]\n"
        "drvh %[c]\n"
        "waitx %[s]\n"
        "drvl %[c]\n"
        "djnz %[i], #_isnd\n"
        "dirl %[d]\n"
        "waitx %[s]\n"
        "drvh %[c]\n"
        "waitx %[s]\n"
        "testp %[d] wc\n"
        "wrc %[d]\n"
        "drvl %[c]\n"
        "waitx %[s]\n"
        :[d]"+r"(d),[i]"+r"(i), [b]"+r"(b)
        :[c]"r"(c), [s]"r"(s));

    return d;
}

Tried the .L with no joy. I think the problem is that it inlines the whole function and thinks the original label and the inline label are duplicates. When I did get it to compile there were no functions, just one big program.

Mike

Yep, that fixed it. Added enough code so that it didn't inline the function and now it compiles just fine.

n_ermosh · 2022-02-17 17:33

@iseries said:
Tried the .L with no joy. I think the problem is that it inlines the whole function and thinks the original label and the inline label are duplicates. When I did get it to compile there were no functions, just one big program.

I've run into this before. The solution is to make the label something like .Lloop%=. This will create labels .Lloop0, .Lloop1, etc. for every time that asm statement is used, solving the inlining issue. Be careful with this though, because something like .Lloop1%= will evaluate to .Lloop10 on the first instance, and .Lloop%= will evaluate to .Lloop10 on the 11th instance, still creating a duplicate label.

n_ermosh · 2022-02-17 21:00

Also, if you are curious about the char stuff with inline asm, here's the discussion with LLVM devs, with a very lengthy writeup on a very similar issue with AArch64 and X86. Turns out, you can't expect the compiler to sign or zero extend a value when allocating a register for an inline asm expression, and you need to explicitly up-cast. I've changed the target code in LLVM to not allow smaller types to be placed into inline asm. If you try, it will throw an error. Add a cast to explicitly sign or zero extend to 32 bits.

iseries · 2022-02-17 22:27

So if you cast to an integer type and the value you're passing is 8 bits, how does it know to chop the upper bits off?

The example of 0x0a2829 is put in register R0 and you tell the compiler to AND it with 0xff, but it decides it's already a character when it's not, and passes the whole thing to the inline assembly, which has it declared as an integer, so you're going to get the wrong value.

The function, if called, has it defined as a character, so why is it not doing that?

Not following the logic here.

Mike

n_ermosh · 2022-02-18 02:33

So in the case where you call a function, the truncation to char should happen before the call, and then it gets upcast to an int inside the function and all is happy. In fact, it gets upcast implicitly when placed in r0.

The issue we ran into here is where that function gets inlined and the whole idea that the value was supposed to be a char gets tossed out the window. The compiler saw that you wanted a char type and that r0 was valid for char types, so it placed the value from where it was FIRST defined (as an int) straight into r0, skipping the truncation that would happen during the call. Effectively, because I said r0 was valid for storing chars, the compiler assumed the hardware would do the truncation and didn't leave it in the code. So restricting all values to ints forces the programmer to do the truncation in software by adding an & 0xff; alternatively, in my tests, the compiler properly tracked the truncation when inlining.

In the case of and'ing with 0xff, I THINK what was happening was the compiler saw the & 0xff, saw that the register was valid for chars, and decided to let the hardware do the truncation instead of doing it in software. I tested it again with the change to only allow ints, and if I took a large value & 0xff, then upcast to int in the assembly statement, it would properly use the truncated value.

Hopefully that makes sense? I'm still trying to fully grasp how the compiler decides what to do when doing inline assembly. But this is effectively a result of the graph the compiler builds and optimizes.

iseries · 2022-02-18 12:03

Actually, I'm good with these functions not being inlined and being called instead. Heck, that's what the compiler on the P1 does, and that's fine as long as the function does what it's supposed to do.

When it comes to writing drivers, inline assembly was always going to be the way to go.

I'd like us also to be using the same code if we can get there. I like the way the functions show up in Visual Studio Code for code prompting and completion. A big help when trying to write code for the first time.

I would also like to come up with a way that a config file or an environment variable could be set to specify the clock frequency of the program when it gets compiled. People are going to have their reasons for picking the speed they run at, and they will want it set in their environment without having to pass it in or remember to change it. On the P1 we had board configuration files that did just that.

Most people don't care what frequency it is; they just want their program to work. Some people are going to need it set to an exact value for driving some hardware. For me, whether I blink the LED at 300 MHz or 200 MHz, it looks the same.

I think the compiler is ready to go and just needs a few things completed. I see the geometry functions are missing. I have BNO080 and FMX30 that need those functions to work. Some of those functions will need to be rewritten to use CORDIC.

Mike

n_ermosh · 2022-02-18 16:34

All inlining issues should be solved now, and it’s a big performance boost, especially in hub mode, to not do calls. It would be nice to be able to write high performance code without inline assembly, but for very low level drivers that’s probably the way to go.

Which geometry functions? Like trig and math library stuff?

Board configuration is interesting. I’ve been handling it by always including a “board.h” header in my project to define all board specific stuff: clock configuration, pin assignments, etc. and that becomes my “configuration file” and I just call an “init_board” function that it defines and don’t worry about any of that in my actual application.

Otherwise, I think it would need to be patched at load time, which requires custom loader stuff that I’m not ready to take on, yet.

Also, defining a strict workflow for a new user will be a must, otherwise it will be way too easy to get lost. I might write that up as a “Getting Started”-type guide. That also requires getting your changes merged in and into the right state with the library and all that. If I write that up, can you review it, and also make a Windows-specific version? Still need to figure out how to get a Windows-distributable build of the compiler. @DavidZemon is that something you could help with?

DavidZemon · 2022-02-18 16:42

Also, defining a strict workflow for a new user will be a must, otherwise it will be way too easy to get lost. I might write that up as a “Getting Started”-type guide. That also requires getting your changes merged in and into the right state with the library and all that. If I write that up, can you review it, and also make a Windows-specific version? Still need to figure out how to get a Windows-distributable build of the compiler. @DavidZemon is that something you could help with?

I should have time this weekend to look into it, yes. Been swamped over the last couple weeks as I prepared for a big presentation (y'all may find it interesting), but now that it's over I'm hoping to dig into my new prop 2 eval board and rust. I'll see about getting those windows cross compiles for llvm working too.

iseries · 2022-02-18 20:23

I downloaded your fix and built P2LLVM.

Changed all the functions to use int pins and recompiled.

Tested against test case here:

#include <propeller.h>
#include <stdio.h>

void high(char);
void low(char);
void set(int);


int main(int argc, char** argv)
{
    int p;

    p = 0x0a2829;
    set(p);

    while (1)
    {
        _wait(1000);
    }
}


void set(int x)
{
    char p1, p2, w;

    p1 = x & 0xff;
    p2 = (x >> 8) & 0xff;
    w = (x >> 16) & 0xff;

    _pinh(p1);
    _pinl(p2);
    _wait(w*100);
    _pinl(p1);
    _pinh(p2);

}

Did an object dump here:

00000a00 <main>:
     a00: 61 a1 67 fc            wrlong r0, ptra++
     a04: 14 05 00 ff            augs #1300
     a08: 29 a0 07 f6            mov r0, #41    
     a0c: 20 0a c0 fd            calla #\set
     a10: 01 00 00 ff            augs #1
     a14: e8 a1 07 f6            mov r0, #488   
     a18: 90 14 c0 fd            calla #\_wait
     a1c: f8 ff 9f fd            jmp #-8

00000a20 <set>:
     a20: 28 04 64 fd            setq #2
     a24: 61 a1 67 fc            wrlong r0, ptra++
     a28: d0 a3 03 f6            mov r1, r0 <--0a2829
     a2c: 08 a0 67 f0            shl r0, #8 
     a30: 18 a0 c7 f0            sar r0, #24    <--0a
     a34: 64 a4 07 f6            mov r2, #100   
     a38: d2 a1 03 fd            qmul r0, r2
     a3c: 18 a0 63 fd            getqx r0   
     a40: d1 a5 03 f6            mov r2, r1 <--0a2829
     a44: 07 a4 67 f7            signx r2, #7   
     a48: 59 a4 63 fd            drvh r2    <--29
     a4c: 10 a2 67 f0            shl r1, #16    
     a50: 18 a2 c7 f0            sar r1, #24    
     a54: 58 a2 63 fd            drvl r1    <--28
     a58: 90 14 c0 fd            calla #\_wait
     a5c: 58 a4 63 fd            drvl r2    
     a60: 59 a2 63 fd            drvh r1    
     a64: 28 04 64 fd            setq #2
     a68: 5f a1 07 fb            rdlong r0, --ptra  
     a6c: 2e 00 64 fd            reta

Everything looks good.

The message it generates though is a little confusing:

\opt\p2llvm\bin\../libp2/include\propeller2.h:294:10: error: couldn't allocate input reg for constraint 'r'
    asm ("wypin %0, %1\n"::"r"(yval),"r"(pin));

Say, would it be possible to do something like this:

     a00: 61 a1 67 fc            wrlong r0, ptra++
     a04: 14 05 00 ff            augs #1300         #665641
     a08: 29 a0 07 f6            mov r0, #41

Some people might not have a calculator that will shift the number left 9 places and add 41.

Mike

n_ermosh · 2022-02-18 21:54

Yeah it's not the greatest error message but that comes from clang so not something I have control over. But it makes sense to some extent: there are no registers available that can hold the given value type.

I've been wanting to do something about folding a preceding augs/augd into the resulting instruction and printing it as a comment or something, but the way instruction parsing and printing works, it's hard to associate two instructions together; they are each treated independently. But I agree it would be nice. I've memorized the keyboard shortcuts for shift and add for macOS's calculator by now from doing it so much. Worst case, I can write a bash script that will just wrap objdump and insert a comment string with the parsed augs/d value.

Rayman · 2022-02-19 16:57

Trying to do the windows build from @iseries post #133 (or so) here but having a strange problem...

When I do "git submodule update" on p2llvm, it appears to hang and also take out the internet connection.

Very strange. Just installed latest git from here: https://git-scm.com/download/win

Rayman · 2022-02-19 16:59

Going to try just downloading the zips and putting in the respective folders...

iseries · 2022-02-19 17:09

@Rayman ,

Yes, the git submodule update --init will take forever and use all available bandwidth.

Mike

DavidZemon · 2022-02-19 17:16

I tried switching the cross-compiler from GCC (mingw) to Clang today, in the hopes that it would solve the compilation error on the CI server. No. In fact, after finally getting CMake to pass its configure step, it failed compilation on exactly the same line with exactly the same error.

I do not know why llvm refuses to cross-compile Linux -> Windows, but it seems pretty unhappy about the idea.

10:05:32   [  3%] Building CXX object utils/benchmark/src/CMakeFiles/benchmark.dir/benchmark.cc.obj
10:05:32   In file included from /home/teamcity/BuildAgent/work/a2d8435c933ae417/llvm-project/llvm/utils/benchmark/src/benchmark.cc:45:
10:05:32   /home/teamcity/BuildAgent/work/a2d8435c933ae417/llvm-project/llvm/utils/benchmark/src/mutex.h:69:14: error: no type named 'condition_variable' in namespace 'std'
10:05:32   typedef std::condition_variable Condition;
10:05:32           ~~~~~^
10:05:32   /home/teamcity/BuildAgent/work/a2d8435c933ae417/llvm-project/llvm/utils/benchmark/src/mutex.h:81:3: error: no type named 'mutex' in namespace 'std'; did you mean 'Mutex'?
10:05:32     std::mutex& native_handle() { return mut_; }
10:05:32     ^~~~~~~~~~
10:05:32     Mutex
10:05:32   /home/teamcity/BuildAgent/work/a2d8435c933ae417/llvm-project/llvm/utils/benchmark/src/mutex.h:75:27: note: 'Mutex' declared here
10:05:32   class CAPABILITY("mutex") Mutex {
10:05:32                             ^

Rayman · 2022-02-19 17:17

@iseries said:
@Rayman ,

Yes, the git submodule update --init will take forever and use all available bandwidth.

Mike

Ok, I switched to a better computer and it appears to be working.

Rayman · 2022-02-19 17:29

@iseries I get this error running cmake in the build folder:
CMake Error: The source directory "C:/opt/p2llvm/llvm-project/build/llvm" does not exist.
Specify --help for usage, or press the help button on the CMake GUI.

If I back up one folder and run it, it does a bunch of stuff, but says a lot of things are "not found".
Then, says I need Python 3 at the end...

iseries · 2022-02-19 17:34

@Rayman ,

Yes, my bad, run the command from outside the build folder.

The not founds are things it's adding in. This is so if it crashes you can rerun, and it will pick up where it left off.

Mike

Rayman · 2022-02-19 17:36

Ok, it looked a lot better after installing Python 3

iseries · 2022-02-19 18:49

About the clock setting feature.

I looked at your addresses and see they don't match up with what loadp2 is using:

            memcpy(&binbuffer[0x14], &clock_freq, 4);
            memcpy(&binbuffer[0x18], &clock_mode, 4);
            memcpy(&binbuffer[0x1c], &user_baud, 4);

Don't know if this is the unofficial location or not but if you're using loadp2 to load your program they need to match.

#define _clkfreq (*((int*)0x24))
#define _clkmode (*((int*)0x28))

I changed them to:

#define _clkfreq (*((int*)0x20))
#define _clkmode (*((int*)0x24))

I also added this code to crt0.c so that the clock would get configured based on the value in _clkfreq.

_clkfreq = 200000000;
_clkmode = 0x14c00f8;
_Program = 0x4c4c3250;
_Program1 = 0x4d56;

// configure clock to run at the speed selected
_hubset(0);
_hubset(_clkmode); // clock config (20,000,000 * 10)
_waitx(177340);
r = _clkfreq / 1000000 - 1;
_clkmode |= (r << 8) | 3;
_hubset(_clkmode); // set clock pll

So now you could patch libp2.a(crt0.o) with whatever frequency you wanted, and you would be all set every time you compiled your program.

Mike

Rayman · 2022-02-19 18:57

I think maybe it finally built...
Gives this error message at the end though.
