libp2 provides three things: the core startup code, which is responsible for actually initializing libraries and so forth (it can be rewritten in Rust, or you can use the existing library); the runtime library (integer math routines that can't be done in hardware, 64-bit support, floating point support, etc.); and hardware access functions (like starting cogs, macros for instructions such as rev that don't have equivalent C operations, etc.). I don't think Rust itself will explicitly rely on any of them (unless you use the features of those libraries), but you won't be able to link programs into executables without libp2 (since you will be missing the custom startup code). There shouldn't be a reliance on libc, but getting memcpy, memset, etc. might be a reason it will be required. It might be worth it to move some of these core memory functions out of libc and into libp2 so that Rust has no reliance on libc.
Like Mike said, these functions live in the LUT for performance. Branching in hubex is much more expensive than in cogex or lutex, and you can't use rdfast/wrfast in hubex. So for algorithmic routines that rely on looping a lot (as many math routines do), it's better to place them in the LUT. memcpy/memset are great examples of this: they perform much faster in the LUT using wrfast than they would doing individual writes in hubex.
To go deeper, I probably don't have too much advice to offer on the rust side. I don't know the language or the compiler system, I just did what I could to get everything hooked up so you could compile for P2 targets, and hoped someone who knew it better could take over. If you want to write your own memset, etc, and not rely on libc, then really all you need to do is define that function to be in the lut section, and LLVM codegen and linker should handle the rest. Not sure how to do that in rust, but in C it's something like __attribute__ ((section ("lut"), no_builtin("memset"))). For reference, here is the list of all functions that the compiler expects to be in the LUT. https://github.com/ne75/llvm-project/blob/master/llvm/include/llvm/IR/RuntimeLibcalls.def
If the function is too big to fit in the LUT, then it needs a wrapper that will call the actual function that resides in hubex. This is how floating point support currently works.
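For the Rust side, a rough counterpart to the C attribute above might look like the sketch below. Treat it as an illustration only: `lut_memset` is a made-up name, chosen so the example can be built and run on a desktop host without clashing with the C library's `memset` (on the P2 you would presumably export it as `memset` with `no_mangle`). The section name `lut` is taken from the C example, and it is an assumption that the Rust P2 target needs nothing beyond the section attribute.

```rust
/// Byte-wise fill placed in the linker section named "lut".
/// `lut_memset` is a hypothetical name used for illustration only.
// On older toolchains the attribute is written `#[link_section = "lut"]`.
#[unsafe(link_section = "lut")]
#[inline(never)]
pub unsafe extern "C" fn lut_memset(dest: *mut u8, value: i32, count: usize) -> *mut u8 {
    let byte = value as u8; // like memset, only the low 8 bits are used
    let mut i = 0;
    while i < count {
        // Volatile writes should keep the optimizer from folding this loop
        // back into a call to the very routine being implemented.
        unsafe { core::ptr::write_volatile(dest.add(i), byte) };
        i += 1;
    }
    dest
}

fn main() {
    let mut buf = [0u8; 8];
    unsafe { lut_memset(buf.as_mut_ptr(), 0x1AB, buf.len()) };
    println!("{:02X?}", buf);
}
```

The volatile writes stand in for what `no_builtin("memset")` does in the C version; whether that is enough with the P2 backend has not been checked here.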
Ok, I guess putting these functions into libp2 makes sense eventually. What I still don’t understand is why this is giving me/us a problem right here. Because nobody knows about memset, why is the call to it into the LUT? Even if it should end up there eventually, I can for example define and call blink2 (the supposed cog bound function) immediately, without a problem. Is that a linker thing?
Please disregard my last post. I just saw that you actually mention how they are expected to be there, this somehow didn’t register yesterday. I’ll try your newer library build and report back.
And the undefined symbol error makes sense--memset is actually defined in libc, not libp2, so if you weren't linking libc, it wouldn't be able to find it. I had included both in my earlier post. Moving those functions to libp2 makes the most sense though--linking libc for a Rust program doesn't really make sense, but memset, memcpy, and memmove are things rustc expects to be built in (because LLVM), rather than part of the C library. I'll do that sometime soon...
I've moved the 3 memory functions (memcpy, memset, memmove) to libp2, so now you should be able to do the above without defining your own memset, as long as you link libp2. Attached here so you don't need to rebuild anything
As usual, things take longer due to other projects being urgent
I'm currently trying to get a bit more systematically into the project with the goal to better understand the code generation and execution. There's a few question marks, maybe you can help me clear them up:
Puzzling together loadp2 and the disassembly of the generated ELF, it seems as if the ELF sections are just laid out in memory as-is, meaning __entry, __start0, __start are placed at the very beginning of RAM. And then executed in "Register Execution"/cogexec. I'm unclear as to how that works. How are these instructions placed into COG0 register memory? And how does this work with the LUT, is just placing it at the $200 offset enough to ensure it will be moved to all the LUTs?
The JMP instruction seems to take either a plain numeric value, in which case it effectively is JMPREL, or the form #\, which is absolute. Is that correct? Is there a reason the disassembly couldn't distinguish between these two (to avoid confusing me)?
My current goal would be to get some sort of p2core crate going which pulls in automatically all the p2 lib code as a submodule or some such (under the assumption the P2 target is available), so people can just start without having to set up much if anything for a new rust P2 project.
JMPREL #D takes a normal 9-bit D parameter (which can be a register, too (JMPREL D)), JMP #A takes a special 20-bit A parameter. There's also the indirect jump opcode JMP D.
JMP D 'Jump indirect absolute
JMPREL D 'Jump indirect relative
JMPREL #D 'Jump direct relative
JMP #A 'Jump direct relative
JMP #\A 'Jump direct absolute
JMPREL #D and JMP #A encode addresses differently for hub exec. JMPREL ##D is a valid instruction albeit rather unlikely.
The other important thing about JMPREL is that the parameter is the number of instructions to skip, not the address (so JMPREL #1 is like JMP #$+1 in cogexec, and JMP #$+4 in hubexec).
@deets said:
Great, thanks for the insights. I'll have to study the instructions a bit more. And hope for the fully annotated assembly instruction manual I guess
Regarding my other question: any insights on how one can prime data in COG/LUT memory? This is still a mystery to me.
I'm not sure about your question but the following might help. loadp2 file loads file into hub RAM beginning at address 0 and starts cog 0, with cog 0 RAM 0-$1F7 copied from 0-$7DF byte addresses in hub RAM. Each cog has to load its own LUT RAM.
For a single cog, I always use a loadp2 file that is exactly 4096 bytes long so that it includes all 2KB of cog RAM plus all 2KB of LUT RAM, making sure I fill any unused space in both. This ensures that the start of LUT RAM is always 2048 bytes after the start of cog RAM in hub RAM. To load LUT RAM from hub RAM, the cog's startup code has the following:
add ptrb,##2048
setq2 #512-1
rdlong 0,ptrb++
Cog 0 could start other cogs, e.g. loadp2 file is 12KB composed of cogs 0+1+2 4KB binaries.
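To make the file layout described above concrete, here is a small host-side sketch that packs cog and LUT longs into a 4096-byte image. It is not part of loadp2 or any existing tool: `build_image` and the constants are invented for this example, the zero fill value is an assumption, and the input longs are placeholders rather than real P2 opcodes.

```rust
const LONG_BYTES: usize = 4;
const COG_LONGS: usize = 512; // 2KB reserved for cog RAM in the file
const COG_LOADED_LONGS: usize = 0x1F8; // registers 0..=$1F7 come from bytes 0..=$7DF
const LUT_LONGS: usize = 512; // 2KB of LUT RAM
const LUT_OFFSET: usize = COG_LONGS * LONG_BYTES; // LUT data starts at byte 2048

/// Packs cog and LUT longs into a 4096-byte image, zero-filling unused space.
/// Hypothetical helper, for illustration only.
fn build_image(cog: &[u32], lut: &[u32]) -> Result<Vec<u8>, String> {
    if cog.len() > COG_LOADED_LONGS {
        return Err(format!("cog code is {} longs, limit is {}", cog.len(), COG_LOADED_LONGS));
    }
    if lut.len() > LUT_LONGS {
        return Err(format!("LUT data is {} longs, limit is {}", lut.len(), LUT_LONGS));
    }
    let mut image = vec![0u8; (COG_LONGS + LUT_LONGS) * LONG_BYTES];
    for (i, long) in cog.iter().enumerate() {
        let at = i * LONG_BYTES;
        // The P2 stores longs little-endian in hub RAM.
        image[at..at + LONG_BYTES].copy_from_slice(&long.to_le_bytes());
    }
    for (i, long) in lut.iter().enumerate() {
        let at = LUT_OFFSET + i * LONG_BYTES;
        image[at..at + LONG_BYTES].copy_from_slice(&long.to_le_bytes());
    }
    Ok(image)
}

fn main() {
    // Placeholder longs, not real P2 opcodes.
    let cog_code = [0x1122_3344u32, 0x5566_7788];
    let lut_code = [0xAABB_CCDDu32];
    let image = build_image(&cog_code, &lut_code).expect("image fits");
    println!("image size: {} bytes", image.len());
    println!("first LUT long at byte offset {}", LUT_OFFSET);
}
```

With the LUT data fixed at byte 2048, the three-instruction loader shown above can always find it regardless of how much cog code there is.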
@deets said:
As usual, things take longer due to other projects being urgent
That’s how it always goes…
Puzzling together loadp2 and the disassembly of the generated ELF, it seems as if the ELF sections are just laid out in memory as-is, meaning __entry, __start0, __start are placed at the very beginning of RAM. And then executed in "Register Execution"/cogexec. I'm unclear as to how that works. How are these instructions placed into COG0 register memory? And how does this work with the LUT, is just placing it at the $200 offset enough to ensure it will be moved to all the LUTs?
On boot, the chip loads the first 2048 bytes into the 512 longs of cog memory and begins executing at register 0. In the specific case of LLVM and my P2 crt0, I wrote it to use a few different start up routines to set things up, including loading the LUT with the runtime library every time a cog boots. I’m not sure there’s a hardware way to load the LUT before/during boot and loading—you always need to do it via software.
I tried finding the place where you load the LUT, but failed. Can you point me to it? And when loading to the LUT you just mean that: write a piece of code (probably in hubexec) that transfers data from the RAM place where the LUT section is placed into the address-space from $200-$400-1 ?
@TonyB_ said:
I'm not sure about your question but the following might help. loadp2 file loads file into hub RAM beginning at address 0 and starts cog 0, with cog 0 RAM 0-$1F7 copied from 0-$7DF byte addresses in hub RAM. Each cog has to load its own LUT RAM.
I have seen the loading part, looking into loadp2.c. I have failed to see the starting cog0 part. How is that accomplished? My naive assumption was that a soft reset just ensures that somehow.
For a single cog, I always use a loadp2 file that is exactly 4096 bytes long so that it includes all 2KB of cog RAM plus all 2KB of LUT RAM, making sure I fill any unused space in both. This ensures that the start of LUT RAM is always 2048 bytes after the start of cog RAM in hub RAM. To load LUT RAM from hub RAM, the cog's startup code has the following:
add ptrb,##2048
setq2 #512-1
rdlong 0,ptrb++
Cog 0 could start other cogs, e.g. loadp2 file is 12KB composed of cogs 0+1+2 4KB binaries.
Nice, that confirms what Nikita already referred to, I just need to locate that code somehow.
P2 ROM boot code starts cog 0. Doc says: "If a program successfully loads serially within 60 seconds execute 'COGINIT #0,#0' to relaunch cog 0 from $00000." loadp2 transmits file serially in correct format.
For reference on how my crt0 works:
1. __entry() and __start0() get loaded into cog0 on chip boot. The linker has placed things such that __entry() ends up at address 0 and __start0() at address 0x40 in hub RAM (address 0x10 in cog RAM).
2. __entry does nothing except jump to 0x10, where __start0 is located. This is so that external tools like loadp2 can patch in the clock configuration without overwriting code space.
3. __start0 does some setup to possibly enable debugging (enabled via a compile flag) and then restarts cog0 at __start().
4. __start is a reusable function used to start any cog in hub mode. If it's running on cog0, it does some initial boot stuff and calls main(); otherwise it just calls the cog function passed to it via coginit.
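The address pair in step 1 (0x40 in hub RAM, 0x10 in cog RAM) follows from cog registers being one long, i.e. four bytes, wide. A tiny sketch of that arithmetic, with `hub_to_cog_register` as a name invented for this example and the $7E0 limit taken from TonyB_'s description of what gets copied at boot:

```rust
/// Maps a hub byte address inside the boot image to the cog register it is
/// loaded into. Hypothetical helper, for illustration only.
fn hub_to_cog_register(hub_byte_addr: u32) -> Option<u32> {
    const LOADED_BYTES: u32 = 0x7E0; // bytes 0..=$7DF fill registers 0..=$1F7
    if hub_byte_addr % 4 != 0 || hub_byte_addr >= LOADED_BYTES {
        return None; // not long-aligned, or outside the copied range
    }
    Some(hub_byte_addr / 4) // one cog register per 4-byte long
}

fn main() {
    for (name, addr) in [("__entry", 0x00u32), ("__start0", 0x40)] {
        match hub_to_cog_register(addr) {
            Some(reg) => println!("{name}: hub {addr:#x} -> cog register {reg:#x}"),
            None => println!("{name}: hub {addr:#x} is not loaded into cog RAM"),
        }
    }
}
```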
I discovered a problem with the inline assembler of Rust. The register size was limited to 8 and 16 bit. So I gr(a|e)bbed around a bit, and found the p2.rs declaration. I changed this to
Nice! I would ask you to make a PR, but if you want to wait until you find other bugs and push a bigger one, that would be fine too. I still haven’t learned Rust, despite wanting to for months now.
You can have the PR if you prefer, no problem. It will be tiny, but there is no harm in that, and establishing a routine here is maybe a good idea.
I also have a repo not yet published with my feeble attempts at creating a p2rt crate that serves as foundation for working with the P2. I'll publish that one of these days. Do you mind being mentioned in it? This is the starting paragraph:
This is my attempt at furthering the Rust implementation on the
Parallax Propeller P2. I'm basing this work off Nikita Ermoshkin's
work who ported the P2 target to LLVM. Anything that works is to his
credit, anything that doesn't probably my mistake.
I keep this readme a bit as a journal, instead of a polished
document. Consolidation might come later, but there are plenty of
topics to understand, like
- P2 code execution model, RAM and flash.
- P2 loading.
- Linker Scripts in both C++ and Rust.
- COG-run code vs hubexec.
- Helper functions in LUT-RAM.
And probably a lot more.
My current game plan: I want to build a p2rt crate that contains
everything that Nikita's P2 library contains. Either by
re-implementing or statically building/linking from his sources via
submodule.
@deets said:
I also have a repo not yet published with my feeble attempts at creating a p2rt crate that serves as foundation for working with the P2. I'll publish that one of these days. Do you mind being mentioned in it?
Great, I'll push it one of these days. Right now I'm stumped though: in continuation of my SPI implementation, I need the fltl instruction. However, declaring it analogous to drvl etc fails miserably:
error: invalid operand for instruction
--> /home/deets/Dropbox/projects/p2-rust-support/p2rt/src/lib.rs:71:18
|
71 | "fltl {pin}",
| ^
|
note: instantiated into assembly here
--> <inline asm>:1:2
|
1 | fltl r0
| ^
I tried comparing the P2 target declarations in the llvm between these two instructions, but don't find anything that I can pinpoint. Do you have an idea?
I found the problem... I looked into the llvm project, and there the instructions were declared. But the p2rust of course has its own llvm project as sub-module, and that wasn't up-to-date. I'm trying to rebuild now.
The last commit on the branch rust has a problem btw, it contains merge markers. I'll try and create a PR.
Hmm yeah I’ve made some updates to llvm since creating the rust branch—I will take a look and try to fix. I have a few things I need to do this week but hopefully I can take a look on Wednesday.
Comments
libp2 provides three things: the core startup code, which is responsible for actually initializing libraries and so forth (it can be rewritten in Rust, or you can use the existing library); the runtime library (integer math routines that can't be done in hardware, 64-bit support, floating point support, etc.); and hardware access functions (like starting cogs, macros for instructions such as rev that don't have equivalent C operations, etc.). I don't think Rust itself will explicitly rely on any of them (unless you use the features of those libraries), but you won't be able to link programs into executables without libp2 (since you will be missing the custom startup code). There shouldn't be a reliance on libc, but getting memcpy, memset, etc. might be a reason it will be required. It might be worth it to move some of these core memory functions out of libc and into libp2 so that Rust has no reliance on libc.
Like Mike said, these functions live in the LUT for performance. Branching in hubex is much more expensive than in cogex or lutex, and you can't use rdfast/wrfast in hubex. So for algorithmic routines that rely on looping a lot (as many math routines do), it's better to place them in the LUT. memcpy/memset are great examples of this: they perform much faster in the LUT using wrfast than they would doing individual writes in hubex.
To go deeper, I probably don't have too much advice to offer on the rust side. I don't know the language or the compiler system, I just did what I could to get everything hooked up so you could compile for P2 targets, and hoped someone who knew it better could take over. If you want to write your own memset, etc, and not rely on libc, then really all you need to do is define that function to be in the lut section, and LLVM codegen and linker should handle the rest. Not sure how to do that in rust, but in C it's something like __attribute__ ((section ("lut"), no_builtin("memset"))). For reference, here is the list of all functions that the compiler expects to be in the LUT. https://github.com/ne75/llvm-project/blob/master/llvm/include/llvm/IR/RuntimeLibcalls.def
If the function is too big to fit in the LUT, then it needs a wrapper that will call the actual function that resides in hubex. This is how floating point support currently works.
Ok, I guess putting these functions into libp2 makes sense eventually. What I still don’t understand is why this is giving me/us a problem right here. Because nobody knows about memset, why is the call to it into the LUT? Even if it should end up there eventually, I can for example define and call blink2 (the supposed cog bound function) immediately, without a problem. Is that a linker thing?
Please disregard my last post. I just saw that you actually mention how they are expected to be there, this somehow didn’t register yesterday. I’ll try your newer library build and report back.
Using your libp2.a from a few posts ago, things don't improve, actually the opposite:
Aaaaand we have a winner! Thanks to your above suggestion, I was able to lookup the way you can specify the linker section using rust - for reference:
https://docs.rust-embedded.org/embedonomicon/memory-layout.html
I had to roll back the libp2.a to the one I build myself & used before.
And thus the following code generates two different blinking LEDs on my P2 eval! Very happy!
Awesome! Great work digging this down.
And the undefined symbol error makes sense--memset is actually defined in libc, not libp2, so if you weren't linking libc, it wouldn't be able to find it. I had included both in my earlier post. Moving those functions to libp2 makes the most sense though--linking libc for a Rust program doesn't really make sense, but memset, memcpy, and memmove are things rustc expects to be built in (because LLVM), rather than part of the C library. I'll do that sometime soon...
I've moved the 3 memory functions (memcpy, memset, memmove) to libp2, so now you should be able to do the above without defining your own memset, as long as you link libp2. Attached here so you don't need to rebuild anything
Nice. I’ll give it a spin in a few days, some vacation is coming up.
As usual, things take longer due to other projects being urgent
I'm currently trying to get a bit more systematically into the project with the goal to better understand the code generation and execution. There's a few question marks, maybe you can help me clear them up:
My current goal would be to get some sort of p2core crate going which pulls in automatically all the p2 lib code as a submodule or some such (under the assumption the P2 target is available), so people can just start without having to set up much if anything for a new rust P2 project.
JMPREL #D is not the same as the normal relative jump JMP #A.
@Wuerfel_21 Oh! I misread the documentation then. What’s the difference?
JMPREL #D takes a normal 9-bit D parameter (which can be a register, too (JMPREL D)), JMP #A takes a special 20-bit A parameter. There's also the indirect jump opcode JMP D.
My take on the different plain jump instructions:
JMPREL #D and JMP #A encode addresses differently for hub exec. JMPREL ##D is a valid instruction albeit rather unlikely.
The other important thing about JMPREL is that the parameter is the number of instructions to skip, not the address (so JMPREL #1 is like JMP #$+1 in cogexec, and JMP #$+4 in hubexec).
Great, thanks for the insights. I'll have to study the instructions a bit more. And hope for the fully annotated assembly instruction manual I guess
Regarding my other question: any insights on how one can prime data in COG/LUT memory? This is still a mystery to me.
I'm not sure about your question but the following might help. loadp2 file loads file into hub RAM beginning at address 0 and starts cog 0, with cog 0 RAM 0-$1F7 copied from 0-$7DF byte addresses in hub RAM. Each cog has to load its own LUT RAM.
For a single cog, I always use a loadp2 file that is exactly 4096 bytes long so that it includes all 2KB of cog RAM plus all 2KB of LUT RAM, making sure I fill any unused space in both. This ensures that the start of LUT RAM is always 2048 bytes after the start of cog RAM in hub RAM. To load LUT RAM from hub RAM, the cog's startup code has the following:
Cog 0 could start other cogs, e.g. loadp2 file is 12KB composed of cogs 0+1+2 4KB binaries.
That’s how it always goes…
On boot, the chip loads the first 2048 bytes into the 512 longs of cog memory and begins executing at register 0. In the specific case of LLVM and my P2 crt0, I wrote it to use a few different start up routines to set things up, including loading the LUT with the runtime library every time a cog boots. I’m not sure there’s a hardware way to load the LUT before/during boot and loading—you always need to do it via software.
I tried finding the place where you load the LUT, but failed. Can you point me to it? And when loading to the LUT you just mean that: write a piece of code (probably in hubexec) that transfers data from the RAM place where the LUT section is placed into the address-space from $200-$400-1 ?
I have seen the loading part, looking into loadp2.c. I have failed to see the starting cog0 part. How is that accomplished? My naive assumption was that a soft reset just ensures that somehow.
Nice, that confirms what Nikita already referred to, I just need to locate that code somehow.
P2 ROM boot code starts cog 0. Doc says: "If a program successfully loads serially within 60 seconds execute 'COGINIT #0,#0' to relaunch cog 0 from $00000." loadp2 transmits file serially in correct format.
In crt0.c, there is a call to a macro INIT_RTLIB (https://github.com/ne75/p2llvm/blob/master/libp2/lib/crt0.c#L50), used to load the LUT RAM with the runtime library code, which is here: https://github.com/ne75/p2llvm/blob/master/libp2/include/propeller2.h#L200. I recommend not messing with the LUT if using the LLVM backend, as it relies on runtime library functions (such as basic math functions that don't have instructions) being there, and nothing catches it if you use the LUT for other purposes.
For reference on how my crt0 works:
1. __entry() and __start0() get loaded into cog0 on chip boot. The linker has placed things such that __entry() ends up at address 0 and __start0() at address 0x40 in hub RAM (address 0x10 in cog RAM).
2. __entry does nothing except jump to 0x10, where __start0 is located. This is so that external tools like loadp2 can patch in the clock configuration without overwriting code space.
3. __start0 does some setup to possibly enable debugging (enabled via a compile flag) and then restarts cog0 at __start().
4. __start is a reusable function used to start any cog in hub mode. If it's running on cog0, it does some initial boot stuff and calls main(); otherwise it just calls the cog function passed to it via coginit.
I discovered a problem with the inline assembler of Rust. The register size was limited to 8 and 16 bit. So I gr(a|e)bbed around a bit, and found the p2.rs declaration. I changed this to
Literally only the ,I32 - I can make a PR for that, but I doubt that's really worth it.
So now code using this code
is compiled to
Nice! This doesn't do much yet, trying to port JonnyMac's smartpin SPI code so I can build up on that.
Nice! I would ask you to make a PR, but if you want to wait until you find other bugs and push a bigger one, that would be fine too. I still haven’t learned Rust, despite wanting to for months now.
You can have the PR if you prefer, no problem. It will be tiny, but there is no harm in that, and establishing a routine here is maybe a good idea.
I also have a repo not yet published with my feeble attempts at creating a p2rt crate that serves as foundation for working with the P2. I'll publish that one of these days. Do you mind being mentioned in it? This is the starting paragraph:
Here you go: https://github.com/ne75/rust/pull/1
Merged!
Not at all, appreciate it
Great, I'll push it one of these days. Right now I'm stumped though: in continuation of my SPI implementation, I need the fltl instruction. However, declaring it analogous to drvl etc fails miserably:
An attempt at using it yields
I tried comparing the P2 target declarations in the llvm between these two instructions, but don't find anything that I can pinpoint. Do you have an idea?
I found the problem... I looked into the llvm project, and there the instructions were declared. But the p2rust of course has its own llvm project as sub-module, and that wasn't up-to-date. I'm trying to rebuild now.
The last commit on the branch rust has a problem btw, it contains merge markers. I'll try and create a PR.
So building worked, but then installing failed with the following error-message:
I have to call it quits for today, maybe you have an idea. I'll look into it again as well in the next few days.
Hmm yeah I’ve made some updates to llvm since creating the rust branch—I will take a look and try to fix. I have a few things I need to do this week but hopefully I can take a look on Wednesday.