LLVM Backend for Propeller 2

n_ermosh · 2022-01-18 17:05

Are these the steps you are following to build llvm with VS? https://llvm.org/docs/GettingStartedVS.html

If so, in step 12 when running cmake, replace -DLLVM_TARGETS_TO_BUILD=X86 with -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=P2 -DLLVM_TARGETS_TO_BUILD=. This will build only the P2 target to save a decent amount of build time.

If not, please link me what you are doing so I can take a look. Because P2 is experimental it needs to be explicitly enabled somewhere.

iseries · 2022-01-19 13:59

I went back and started over. This time the P2 environment was installed and I was able to build it.

I tried to compile the example blink.cpp program but that fails with ld.lld not found. Apparently on windows the linker is not installed and there is no documentation on what to use for a linker or how to install the linker.

Not a very helpful system. Don't see how anybody is going to install a 5gig compiler to build p2 code.

Mike

n_ermosh · 2022-01-19 16:32

What steps are you using to build and install? There's a few custom things that need to be configured for p2. Specifically, building the experimental target and enabling lld to be built (which is not enabled by default). p2llvm relies on lld for a linker. Like I mentioned, I've never tried this on Windows so I can't give detailed steps on how to get it working. Next week I should have some time to try to create a Windows build and write up the process once I figure it out.

Regarding it being 5GB, that's if you build all the targets, which there isn't really a reason to do; anything except the p2 target isn't really useful, unless you really want to also compile avr, arm, x64, etc with the same installation. I'm hoping folks don't need to build from scratch at all--eventually we'll have prebuilt binaries. My builds are ~300MB (when compiling in Release mode), so 5GB is definitely way too large.

iseries · 2022-01-19 17:45

I did enable lld and got it loaded but now the problem is the custom libp2 and libc. Can't use your scripts because they are for unix.

Don't know how to build the archives since there is no road map for that.

Mike

n_ermosh · 2022-01-19 19:04

Building those is pretty straightforward, take a look at the python script to see how it's built. Basically, for libp2, run cmake (you just need to specify where the llvm install is in p2.cmake) to configure, then make from whatever you pick as a build directory. For libc, I don't have it set up with cmake yet, so just run make (again, need to specify where llvm is installed).

When compiling, you'll need to specify libp2 and libc library and include search paths to clang, as well as the linker script, p2.ld (which lives in p2llvm/libp2). Your compile command should probably look something like this (untested):

clang --target=p2 -I <wherever you installed libp2>/include -I<wherever you installed libc>/include -L<wherever you installed libp2>/lib -L<wherever you installed libc>/lib -T <path to libp2>/p2.ld -o hello.elf  hello.c

I'm assuming you are using WSL? Or purely in visual studio with no underlying unix capabilities? The latter is definitely preferable, but I have very little experience with visual studio, so once I get a windows build going, it will likely rely on wsl.

iseries · 2022-01-19 22:55

WSL does not work with the code. The visual studio code generates exe files and not wsl files and the file system under the covers doesn't work either.

Was able to get the make files to build for command line and built the blink.c example program. Loaded the program from flex prop but it doesn't work.

D:\flexprop\bin>loadp2 -p com10 \documents\p2\blink.elf

Mike

n_ermosh · 2022-01-19 23:35

can you disassemble the binary with llvm-objdump -d blink.elf and post it? last night I found I had introduced a bug into function calling and have fixed it, you might have gotten unlucky and gotten the version with the bug. I can confirm by looking at the assembly.

iseries · 2022-01-20 12:42

Ok, I guess I should have looked at what I was compiling. Spent so much time head banging that I missed the target.

Anyway the code works and the LEDs do blink once you set the correct LEDs. I also compared the generated code with FlexC and it looks like FlexC generated better and faster code.

FlexC blink code:

#include <stdio.h>
#include <propeller.h>

#define LED1 56
#define LED2 57


int main(int argc, char** argv)
{

    _dirh(LED1);
    _dirh(LED2);

    while (1)
    {
        _outnot(LED1);
        _outnot(LED2);
        _waitms(1000);
    }
}

Assembly: BlinkFlexc.txt

Eric set the clock before calling main in his code and also reserves some lower bytes to store clock mode and frequency that was kind of decided on.

LLVM blink program:

#include <propeller.h>
#include <sys/p2es_clock.h>

#define LED1 56
#define LED2 57


int main() {
    _clkset(_SETFREQ, _CLOCKFREQ);

    dirh(LED1);
    dirh(LED2);

    while(1) {
        outnot(LED1);
        outnot(LED2);
        waitcnt(CLKFREQ + CNT);
    }
}

The resulting assembly code: blink.txt

Mike

n_ermosh · 2022-01-20 16:55

Glad you got it working! And good to know it does work on windows (with a little bit of build massaging). If you wouldn't mind writing up a quick summary of the build steps you took, that would be super helpful.

It's slightly an apples to oranges comparison since it's not the same source code. In flexC there's still a few calls made to make waitms happen, while in libp2 it's loading clkfreq and cnt into waitcnt the P1 way. In p2llvm, instead of waitcnt, you could do waitx, which is an inline asm macro and would remove the call to waitcnt (which is simulated in software since there's no waitcnt instruction anymore). It's hard to say from a quick glance which one will perform better, but to me it looks like the waitms system call is significantly larger than my waitcnt version of the code.

Additionally, where you set up the clock doesn't really affect performance or make the code better or worse. It's just more hidden in Eric's tools but basically the same code is still there and executing. I had to change the addresses clock mode/frequency are stored at since the addresses the other compilers use would intersect with the current version of my entry code. I'm okay with leaving that as an implementation specific detail, as long as the headers all use the same name for the values (which is still left as a to-do for me)

iseries · 2022-01-20 18:47

I guess you could say you have a working C compiler that just needs a lot of clean up.

As far as the code generated goes it looks like Flexc removes code that is not needed producing a smaller footprint on the P2 which is needed due to the small amount of ram on these units.

00000a00 <main>:
     a00: 28 04 64 fd            setq #2
     a04: 61 a1 67 fc            wrlong r0, #353
     a08: 04 80 00 ff            augs #32772
     a0c: f8 a1 07 f6            mov r0, #504   
     a10: e1 f5 05 ff            augs #390625
     a14: 00 a2 07 f6            mov r1, #0 
     a18: 10 14 c0 fd            calla #5136  <--- go set the clock frequency
     a1c: 41 70 64 fd            dirh #56   
     a20: 41 72 64 fd            dirh #57   
     a24: 24 a2 07 f6            mov r1, #36    
     a28: 4f 70 64 fd            outnot #56 
     a2c: 4f 72 64 fd            outnot #57 
     a30: d1 a5 03 fb            rdlong r2, r1  
     a34: fc 13 c0 fd            calla #5116 <--- call to wait same as flexc?
     a38: ef a1 03 f6            mov r0, r31    
     a3c: d2 a1 03 f1            add r0, r2 
     a40: 04 14 c0 fd            calla #5124 <---- call to wait?
     a44: e0 ff 9f fd            jmp #-32

From a debug standpoint it would be nice if it kept the label instead of showing the addresses.

Still had to hand build a lot of the code so having some of the stuff prebuilt would be nice. Like provide a libc.a and libp2.a.

I run all my code at 200MHz and don't change it. It's a nice round number and works well. I even change flexc when I get it and set it to 200MHz before I start compiling anything. Would be nice to have the clock set functions put into the startup code and not in main.

Mike

n_ermosh · 2022-01-20 21:00

Yes, I need to figure out labels in disassembly, it's on my to-do list. Probably easy, I just need to figure it out in the LLVM library world. I'm working on getting everything pre-built set up so that you can just download and use everything and not build yourself, so that should come very soon (but maybe not for windows LLVM)

Garbage collection should be working, so if there's a lot of code being left in, it might be getting referenced by something in the startup code inadvertently. Likely something with a printf statement, since that has a lot attached to it.

I've been debating back and forth on setting clocks in startup code vs main application code. Ultimately, the startup code needs to be portable to any hardware with any clock hardware configuration, and the exact clock configuration should be up to the application code (or at the very least, an intermediate "board" layer). I settled on leaving it in main application code for maximum flexibility, but could probably make it settable at load time with some coordination with eric's loadp2 and bootloader. Another option is requiring the application code to provide a board-specific library that the startup code can call, but I think that's just adding complexity. There's no "ideal" option in software because hardware can change.

This hasn't really been a problem so far since most people use a single configuration right now (the p2 eval board, or some derivative), but given the massive range the P2 can do (I run my boards at 300), I foresee lots of different clock hardware configurations existing. For example, I've already stopped putting crystals on any of my P1 boards, I strictly use CMOS oscillators to generate a clock.

And just to answer the question about disassembly, the first call (to 5136) is clock configuration, the second (5116) is to getcnt() (hidden by the CNT macro) and the third (5124) is the actual waitcnt call.
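
For anyone who wants those call targets in hex straight from the raw bytes, here is a minimal host-side sketch in C. It assumes, based only on the encodings visible in the listing above, that the instruction word is stored little-endian, that bits 19..0 hold the address field, and that bit 20 marks a relative branch. The function names are made up for illustration and are not part of p2llvm.

```c
#include <assert.h>
#include <stdint.h>

/* Rebuild the 32-bit instruction word from the four bytes in the order
 * llvm-objdump prints them (assumed little-endian). */
uint32_t p2_word(const uint8_t bytes[4]) {
    assert(bytes != 0);
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Bits 19..0: the raw 20-bit address field (assumed layout). */
uint32_t p2_branch_field(const uint8_t bytes[4]) {
    return p2_word(bytes) & 0xFFFFFu;
}

/* Bit 20: nonzero when the field is a relative offset (assumed layout). */
int p2_branch_is_relative(const uint8_t bytes[4]) {
    return (int)((p2_word(bytes) >> 20) & 1u);
}

/* Sign-extend the 20-bit field, for use with relative branches. */
int32_t p2_branch_offset(const uint8_t bytes[4]) {
    uint32_t field = p2_branch_field(bytes);
    return (int32_t)(field ^ 0x80000u) - 0x80000;
}
```

Fed the bytes 10 14 c0 fd from the listing, p2_branch_field returns 0x1410, which is 5136 in decimal. The bytes e0 ff 9f fd decode as a relative branch with offset -32, matching the jmp #-32 line.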

iseries · 2022-01-21 17:02

Here are the steps I used to load the environment on windows using Visual Studio to build the compiler.

cmake and visual studio required to setup the environment

Installing P2LLVM build system on windows: (about 9Gig)

Get a command prompt
Cd \
Mkdir opt
Cd opt
git clone https://github.com/ne75/p2llvm.git
cd p2llvm
git submodule init
git submodule update <This takes a long time>
cd llvm-project
mkdir build
cd build
cmake -S ..\llvm -B . -DLLVM_ENABLE_PROJECTS="lld;clang" -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=P2 -DLLVM_TARGETS_TO_BUILD="" -DLLVM_COMPILER_JOBS="2" -Thost=x64
Open Visual Studio and select open a project
\opt\p2llvm\llvm-project\build\LLVM.sln <This will load over 500 projects>
Select release version to build
Select Build – Build Solution <This takes forever to complete><select Output to watch progress>

To compile programs:
Copy libp2 to Release
Copy libp2.a to release
Copy p2.ld to release
Add LLVM to command prompt:
path=%path%;\opt\p2llvm\llvm-project\build\release\bin

clang -Os -ffunction-sections -fdata-sections -fno-jump-tables --target=p2 -Dprintf=__simple_printf -DP2_TARGET_MHZ=200 -o %1.o -c %1.cpp
clang --target=p2 -Wl,--gc-sections -Wl,--print-gc-sections -o %1.elf %1.o

Note:
Can not use cmake to build libc or libp2.

Mike

iseries · 2022-01-24 16:23

I was working on the serial test program and was not able to get it to work. Saw the comment about not initializing in startup.

Looked through the code for some time and found that the debug code was stepping on the serial port. The debug code was set to 3m baud with the same ports.

Changed a few things and now the startup code is working the stdin, stdout, and stderr. I changed the baud rate to 115200 as this is the default for loadp2.

When the P2 starts running, its speed is 25MHz, and you are setting up these ports before the clock is changed. But since the default clock speed is set, the baud is set correctly even though the P2 is not running at that speed. I think Eric plays some games with his code. The port speed is not set until you send or receive something, and then he calls the pin setup code on first use. This gets him around someone changing the clock speed before the ports are used.
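
A rough host-side sketch in C of that configure-on-first-use idea, as I understand it. This is a simulation and not FlexC's actual code: every name here (sim_clkfreq_hz, sim_set_clock, sim_putc and so on) is hypothetical, and the "hardware" is just a few variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulated hardware state (hypothetical stand-ins, not real registers). */
uint32_t sim_clkfreq_hz = 25000000u;     /* startup clock assumed in this post */
bool     sim_uart_ready = false;         /* has the port been configured yet? */
uint32_t sim_uart_clocks_per_bit = 0;    /* bit period chosen at configure time */

/* Application changes the clock; the port is deliberately left untouched. */
void sim_set_clock(uint32_t hz) {
    assert(hz > 0);
    sim_clkfreq_hz = hz;
}

/* Configure the port from whatever the clock is right now. */
void sim_uart_config(uint32_t baud) {
    assert(baud > 0);
    sim_uart_clocks_per_bit = sim_clkfreq_hz / baud;
    sim_uart_ready = true;
}

/* First transmit triggers configuration, so a clock change made
 * before the first character is picked up automatically. */
void sim_putc(uint32_t baud, char c) {
    if (!sim_uart_ready) {
        sim_uart_config(baud);
    }
    (void)c; /* a real driver would write the character to the pin here */
}
```

With this shape, calling sim_set_clock(200000000) before the first sim_putc gives a bit period based on 200MHz. A clock change after the first character would still need an explicit reconfigure.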

Mike

n_ermosh · 2022-01-25 17:59

Thanks for the writeup!

Regarding clock configuration, it's still something I'm toying with. Right now, crt0 assumes a 24MHz clock (the datasheet gives no indication of the nominal value, only that it's at least 20MHz; I timed my board and it came out to 24MHz) and configures the serial for 3Mbps. This will allow my debugger program (if linked) to communicate with the host, and allows using printf, etc to print debug messages over serial. I might drop this to 2Mbps as I've heard that's more stable, but I want it to be as fast as possible for the debugger so that it uses as little time as necessary for communicating.

So if you do nothing on entry to main, that's what you get: RCFast and 3M serial. If you reconfigure the clock, obviously, you also need to reconfigure the serial for the new clock settings by calling uart_init. I don't yet have a clean way that I could do autobaud, and unfortunately because of how unstable the RCFast clock is, using that as the default is probably not a reliable option either, so I need to think of something there. If you have ideas, I'd love to hear them.

Regarding the debugging code--it shouldn't do anything unless you link the debug library with -lp2db. In which case it will use pins 62/63 for serial, BUT it uses a lock so using printf on those same pins will still work (and I've done that regularly). My debugger program is still very rough, so no promises on anything working there.

iseries · 2022-01-25 18:52

Well, I have gone through and updated a whole bunch of code trying to get things working.

I'm setting the clock right after startup runs and removed your debug code for now.

Working through examples to see if they all work, and right now I'm working on getting coginit to work.

Compiler seems to work just fine, the only exception being that aug #0 appears before immediates, which is not a biggie.

Updated the smartpin.h file to look like the SPIN version so that they are compatible.

Need to remove some of that P1 code though, since that is not going to be used here.

I guess you could make a P1 C++ compiler which may be of interest since simpleIDE is not being updated.

Mike

n_ermosh · 2022-01-26 04:06

I wouldn't rely too heavily on the examples--those are a bit out of date at this point. I would take a look at the tests folder. I have a bunch of unit tests for features there (including starting cogs). Any changes you are making that you think should be included, please make a PR to my github repo, so I can merge them in if they make sense to do (like the smartpin.h update).

The aug #0 in some places will exist for immediates that can't be known at compile time, but only at link time (like references to external symbols). There's probably some amount of LTO that can be done to remove those but that's outside of my scope of work for now.
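
To illustrate how an aug prefix pairs with the immediate that follows it, here is a small host-side sketch in C. It assumes the aug supplies the upper 23 bits and the next instruction's 9-bit immediate supplies the low 9 bits; the function name is made up. The numbers come from the main disassembly earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Combine an aug prefix value with the 9-bit immediate of the next
 * instruction into the full 32-bit constant (assumed 23 + 9 bit split). */
uint32_t p2_aug_combine(uint32_t aug23, uint32_t imm9) {
    assert(aug23 < (1u << 23));
    assert(imm9 < (1u << 9));
    return (aug23 << 9) | imm9;
}
```

With that split, augs #390625 followed by mov r1, #0 yields 200000000 (the 200MHz clock frequency), and augs #32772 followed by mov r0, #504 yields 0x010009F8. An aug #0 simply leaves the upper bits zero, which is why it looks redundant when the final value turns out to fit in 9 bits.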

There's plenty of P1 code still there for sure--haven't cleaned up the c library fully.

I've thought about making a P1 subtarget as well, but it will be a decent amount of work given the memory constraints of the P1--I'd need to figure out how the various memory models work and port all of that over. Not super trivial, so P1 is out of the scope of this project for now. propgcc seems to work fine as is so that's the main p1 solution for now, I think.

iseries · 2022-01-26 19:06

Was chasing a bug that I could not find. I am having problems with the simple_printf function dumping garbage.

I switched to the standard printf and all is good. Don't know what the issue is.

Chasing a bug with cogstart but have had no luck.

Is this code correct:

00002c54 <cogstart>:
    2c54: 61 a3 67 fc            wrlong r1, #353
    2c58: 61 a7 67 fc            wrlong r3, #353
    2c5c: d2 a7 03 f6            mov r3, r2 
    2c60: 04 a6 07 f1            add r3, #4 
    2c64: d3 a3 63 fc            wrlong r1, r3
    2c68: d2 a1 63 fc            wrlong r0, r2
    2c6c: 10 a2 07 f6            mov r1, #16    
    2c70: 28 a4 63 fd            setq r2
    2c74: d0 a3 f3 fc            coginit r1, r0 wc
    2c78: d1 df 03 36       if_nc    mov r31, r1    
    2c7c: 5f a7 07 fb            rdlong r3, #351    
    2c80: 5f a3 07 fb            rdlong r1, #351    
    2c84: 2e 00 64 fd            reta

At the top is saving R1 and R3 to the same address and then later restoring them from a different address.

Mike

n_ermosh · 2022-01-27 02:59

That code is correct. Those immediates are the special PTRA expressions. So they save to PTRA and increment PTRA, the rdlongs at the end do the reverse. I would pull the latest version from github and rebuild--I updated the instruction printer to actually print out the expression instead of the underlying immediate.
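
A minimal sketch in C of how those two immediates break down, assuming the 9-bit 1SUPNNNNN pointer-expression layout from the P2 documentation (S selects PTRA or PTRB, U means update, P means post, NNNNN is a signed index). The struct and function names are made up for illustration, and the sketch ignores the scaling of the index by access size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_ptr_expr; /* bit 8: immediate is a PTRx expression */
    bool uses_ptrb;   /* bit 7: false = PTRA, true = PTRB */
    bool updates;     /* bit 6: pointer is modified by the access */
    bool post;        /* bit 5: true = use pointer, then apply the index */
    int  index;       /* bits 4..0 as a signed value, -16..15 */
} ptr_expr;

/* Split a 9-bit immediate into its assumed pointer-expression fields. */
ptr_expr decode_ptr_expr(uint32_t imm9) {
    assert(imm9 < 512u);
    ptr_expr e;
    e.is_ptr_expr = (imm9 & 0x100u) != 0;
    e.uses_ptrb   = (imm9 & 0x080u) != 0;
    e.updates     = (imm9 & 0x040u) != 0;
    e.post        = (imm9 & 0x020u) != 0;
    int n = (int)(imm9 & 0x1Fu);
    e.index = (n >= 16) ? n - 32 : n; /* sign-extend the 5-bit field */
    return e;
}
```

Under that layout, #353 decodes to PTRA, update, post, index +1 (that is, ptra++), and #351 decodes to PTRA, update, pre, index -1 (that is, --ptra). So the two wrlongs push and the two rdlongs pop in reverse order.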

I've been using simple_printf exclusively, so not sure why it's printing garbage but normal printf works. It might be related to the function bug that I introduced a little bit ago, which I know is fixed in the latest version, however that seems unlikely.

iseries · 2022-01-27 14:20

Just downloaded the latest and compiled it.

Something got broken.

After a printf the program hangs.

I see you fixed the object dump to now print ptr++. It would be nice if the calls were in hex though. Makes it easier to find what function is being called.

Mike

n_ermosh · 2022-01-27 17:55

I'm currently working on getting the symbolizer working so that function addresses are converted to symbols and the names are printed instead of an address, which should make things even easier. But you're right, I should make all memory operands print as hex. That'll require a little bit of refactoring at the MC layer, since I currently don't distinguish between memory operands and other immediates.

You mentioned you made changes to startup code. What were the changes? I'm curious if there's something broken there now... I just tested what's in the master branch with a clean build of libp2 and libc and it's working correctly against all my tests.

iseries · 2022-01-27 18:07

I made no changes yet, just downloaded master and tested a few programs. Looks like some memory corruption.

Need the ELF for this program:

/*
 * A propgcc-compatible (almost) blinker program. The main incompatibility is setting up
 * the clock, which will have to be P2 specific, but that's okay since most libraries won't
 * use it.
 *
 */
#include <propeller.h>
#include <stdio.h>
#include <sys/p2es_clock.h>


int main() {
    _clkset(_SETFREQ, _CLOCKFREQ);

    dirh(56);
    dirh(57);
    outh(56);
    outh(57);

    printf("Hello World\n");

    while(1) {
        //outnot(56);
        outnot(57);
        waitcnt(CLKFREQ + CNT);
    }
}

Mike

n_ermosh · 2022-01-27 18:18

so if you do _clkset, you must run _uart_init after (if you use the serial printing functions like printf). _uart_init will use the current clock frequency and given baud rate to determine the smart pin settings, otherwise printf will not work. The following works for me:

#include <propeller.h>
#include <stdio.h>
#include <sys/p2es_clock.h>

int main() {
    _clkset(_SETFREQ, _CLOCKFREQ);
    _uart_init(DBG_UART_RX_PIN, DBG_UART_TX_PIN, 230400);

    dirh(56);
    dirh(57);
    outh(56);
    outh(57);

    printf("Hello World\n");

    while(1) {
        //outnot(56);
        outnot(57);
        waitcnt(CLKFREQ + CNT);
    }
}

Compile and load steps:

/opt/p2llvm/bin/clang -Os -ffunction-sections -fdata-sections --target=p2 -Dprintf=__simple_printf -DP2_TARGET_MHZ=200 -o ../build/blink.o -c blink.cpp
/opt/p2llvm/bin/clang --target=p2 -Wl,--gc-sections -o ../build/blink.elf ../build/blink.o
/opt/p2llvm/bin/loadp2 -v -ZERO -b 230400 -FIFO 1024 -t  ../build/blink.elf

You can make baud rate anything obviously, I use 3M usually, I hear 2M is more stable though. Used 230400 for this example.

iseries · 2022-01-27 18:38

Right, that output the right information but I think memory is getting stepped on.

This code does not work for me:

#include <propeller.h>
#include <stdio.h>
#include <sys/p2es_clock.h>


int main() {
    _clkset(_SETFREQ, _CLOCKFREQ);
    _uart_init(DBG_UART_RX_PIN, DBG_UART_TX_PIN, 3000000);

    dirh(56);
    dirh(57);
    outh(56);
    outh(57);

    printf("Hello World %d\n", CLKFREQ);

    while(1) {
        outl(56);
        outnot(57);
        waitcnt(CLKFREQ + CNT);
    }
}

Don't know why I would need the _uart_init since InitIO is calling this function at startup with these settings.
Also if I do a return from main the P2 reboots and reboots and reboots.

Mike

n_ermosh · 2022-01-27 19:41

Returning from main behavior is currently undefined--not sure what it should do but I'm open to suggestions.

You need to reset it because the value of clkfreq changed. when startup calls InitIO, it uses the values that are set at that time, which are 24Mhz clock and 3Mbps, to configure the smart pins. Once you change clock frequency, the smart pins are using the old values they already had, which will no longer work now that the chip is running at a new clock frequency. So, you need to re-initialize the smart pins at the new clock frequency after clkset.
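
To put rough numbers on that, here is a small host-side sketch in C. It treats the bit period as a whole number of system clocks, which is a simplification of the real smart pin setup, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* System clocks per serial bit for a given clock and requested baud rate
 * (integer approximation; the fractional part is ignored here). */
uint32_t clocks_per_bit(uint32_t clkfreq_hz, uint32_t baud) {
    assert(baud > 0);
    return clkfreq_hz / baud;
}

/* Baud rate the pin actually produces if its bit period stays fixed
 * while the system clock changes underneath it. */
uint32_t effective_baud(uint32_t clkfreq_hz, uint32_t bit_period_clocks) {
    assert(bit_period_clocks > 0);
    return clkfreq_hz / bit_period_clocks;
}
```

At 24MHz and 3Mbps the bit period is 8 clocks. If the clock then goes to 200MHz without re-initializing, those same 8 clocks per bit come out as 25Mbps on the wire, so the host sees nothing sensible. Re-running the init at 200MHz with 230400 baud gives a bit period of about 868 clocks instead.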

It's odd that code doesn't work for you. I'm assuming you are defining P2_TARGET_MHZ somewhere before including p2es_clock.h, right?

Can you post the elf that's generated? I'll compare to what I'm generating (with the exact same code) and see what's going on.

iseries · 2022-01-27 21:15

Not getting anywhere here. In the meantime I'm building the environment on a ubuntu server in the clouds.

here is the test blink program that outputs nothing, and the elf file that goes with it.

Mike

n_ermosh · 2022-01-27 21:56

okay so if I compile that source (using the same commands I posted above), it works for me as expected, but if I try to load your elf, nothing works. Looking into why, but it seems like there are quite a few differences in the binaries, including several functions missing from the runtime library (though it shouldn't matter). Can you please post your complete compile commands?

iseries · 2022-01-27 22:16

I use a batch file so this is what I have:

clang -Os -ffunction-sections -fdata-sections -fno-jump-tables --target=p2 -Dprintf=__simple_printf -DP2_TARGET_MHZ=200 -o %1.o -c %1.cpp -v 
clang --target=p2 -Wl,--gc-sections -Wl,--print-gc-sections -o %1.elf %1.o -v

Mike

n_ermosh · 2022-01-27 22:18

I see what's going on in your build. You are linking the debug library (with -lp2db), but it's not linking the entire thing (to link it, you need to do -Wl,--whole-archive -lp2db -Wl,--no-whole-archive. It's annoying to do this but otherwise the linker will throw out the ISRs and I haven't figured out how to make it not do that). But because the library is partially linked, there's a symbol in the symbol table, __enable_p2db, whose existence tells the entry function to enable debugging using hubset before restarting cog 0. As a result, the cog jumps to the debug ISR memory location (which is not present) and hangs.

To remedy, do one of the following:
1. remove -lp2db from your link flags
2. link the whole p2db archive with -Wl,--whole-archive -lp2db -Wl,--no-whole-archive

I recommend #1. the debugger is not yet in a state to be used by everyone.

n_ermosh · 2022-01-27 22:27

Disregard the last, I see you aren't explicitly linking it, but I'm guessing you compiled it into libp2 and the same thing is happening.

How did you compile libp2? Did you include p2db with it? In my build system with cmake, I keep them as two separate archives, so that I don't need to link p2db if I don't need it. I've attached all three libraries below, try with these instead of the ones you built.

iseries · 2022-01-27 22:54

Yes, I see where I included the debug library and removed it. Now everything works....

It's tough to walk through the code when someone drops a piano on your head and you can't see anything wrong. Let's randomly jump somewhere and see what happens.

Now I can start putting all my changes back in and see if all works.

Mike
