Understanding code generation
porcupine
I'm confused how a line of code as simple as this:
int16_t ImmediateRead() {
return buffer_[read_ptr_++];
}
(Where:
int16_t buffer_[256];
volatile uint8_t read_ptr_;
volatile uint8_t write_ptr_;
)
Gets compiled to this very verbose PASM:
.L5
rdbyte r6, r1
and r6,#255
xmov r7,r6 shl r7,#1
add r7, r14
add r6, #1
mov r4, r6
and r4,#255
rdword r6, r7
Shouldn't this really be about 3 lines of PASM? Cycle counts matter a lot here; this is in an inner loop. I'm guessing the AND #255 is a bounds check of some kind? Is there any way to turn that off? And why the SHL? I specifically want the index to wrap at 8 bits, and it does; this is a circular buffer of 256 words. Is there any way to improve the code generation here? Failing that, I'm trying to work out how to hand-write this piece in PASM.
Comments
1) Get the value of read_ptr
2) Multiply that read_ptr value by two (the shift), because buffer_ has two-byte elements.
3) Get the address of buffer_.
4) Add those together to get the address of the array element.
5) Read the value at that address.
6) Increment the value of read_ptr that we read originally.
7) AND that result with 255, because read_ptr is a byte value.
8) Save the value of the incremented read_ptr
9) Return the result we read in 5)
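The steps above can be sketched in C as one statement per step. This is only an illustration: ImmediateReadSteps is a hypothetical name, and the comments point at the roughly corresponding instructions in the posted listing (the final byte store is not shown in that listing).

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* Declarations copied from the question. */
int16_t buffer_[256];
volatile uint8_t read_ptr_;
volatile uint8_t write_ptr_;

/* Hypothetical helper: same behaviour as ImmediateRead(), spelled out step by step. */
int16_t ImmediateReadSteps(void) {
    uint8_t idx = read_ptr_;                                 /* 1) load read_ptr_ (rdbyte) */
    size_t offset = (size_t)idx << 1;                        /* 2) scale by sizeof(int16_t) == 2 (shl #1) */
    const unsigned char *base = (const unsigned char *)buffer_; /* 3) address of buffer_ */
    const int16_t *elem = (const int16_t *)(base + offset);  /* 4) element address (add r7, r14) */
    int16_t value = *elem;                                   /* 5) read the element (rdword) */
    unsigned next = (unsigned)idx + 1u;                      /* 6) increment the value read in 1) */
    next &= 0xFFu;                                           /* 7) keep it within 8 bits (and #255) */
    read_ptr_ = (uint8_t)next;                               /* 8) store the incremented read_ptr_ */
    return value;                                            /* 9) return the value read in 5) */
}
```

Written out like this, the SHL is simply the index being turned into a byte offset, not a bounds check.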
Or something like that. That is a lot of work to do. It seems the compiler is doing well.
Can you write that in three lines of PASM?
That AND 255 is there because adding 1 to r6 can overflow 8 bits (255 + 1 = 256), but you have asked for a byte-sized variable, so the result has to be cut back to 8 bits.
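A small sketch of the C rule behind that AND, assuming nothing beyond standard C. next_index is a hypothetical helper: the uint8_t operand is promoted to int before the addition, so in a 32-bit register 255 + 1 really is 256, and only the conversion back to uint8_t brings it back to 0. Since Propeller registers are 32 bits wide, that conversion has to be emitted as an explicit mask.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical helper: how "read_ptr_++" behaves for a uint8_t. */
uint8_t next_index(uint8_t idx) {
    int widened = idx + 1;     /* integer promotion: 255 + 1 == 256, no wrap yet */
    return (uint8_t)widened;   /* conversion to uint8_t reduces modulo 256: 256 -> 0 */
}
```

So the AND is not a bounds check on the array; it is what makes the byte counter wrap the way you want.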
I suspect that if you made everything 32 bits wide and trimmed the array index with an AND yourself, the generated code might be smaller.
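A minimal sketch of that suggestion, assuming a power-of-two buffer size. The names ring_buf, ring_read, RING_SIZE and ImmediateRead32 are hypothetical, and whether PropGCC actually emits fewer instructions for it would need checking against the generated listing.

```c
#include <stdint.h>
#include <assert.h>

#define RING_SIZE 256u               /* must be a power of two */
#define RING_MASK (RING_SIZE - 1u)   /* 0xFF */

int16_t ring_buf[RING_SIZE];         /* hypothetical buffer */
volatile uint32_t ring_read;         /* hypothetical free-running 32-bit read counter */

/* The counter is never truncated to 8 bits; the index is masked only
   where the array is accessed. */
int16_t ImmediateRead32(void) {
    uint32_t r = ring_read;          /* single load of the volatile counter */
    ring_read = r + 1u;              /* plain 32-bit increment, wraps at 2^32 */
    return ring_buf[r & RING_MASK];  /* wrap the index at 256 entries */
}
```

Because 2^32 is a multiple of 256, the masked index stays consistent even when the 32-bit counter itself wraps.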
I wonder if there's a more efficient circular-buffer implementation, written in pure PASM, that fits the Propeller's instruction set better. Hm.
ryan@sumac:~/prop-synth$ ls -l prop/mcp4922.o
-rw-r--r-- 1 root root 2460 Apr 15 16:18 prop/mcp4922.o
Very frustrating. I spent many hours trying to trim this down to under 2 KB so it would fit in a cog.
But even if I could get the code size under 2 KB, it would still have to hit hub RAM to read from the ring buffer (which other cogs populate). That is probably the main speed bottleneck I have right now.
I'm also unclear on how to build, link and use mixed memory models. I understand SimpleIDE does some of the magic, but I don't use it (emacs and/or Eclipse plus a Makefile is much easier for me), and the docs as they stand only talk about .cogc files and SimpleIDE, without covering the specific linker options needed to make mixed mode work.
(In general I think the project does itself a disservice by assuming the user will only be using SimpleIDE. It's a good tool for beginners but anybody who has any extensive C/C++ background is unlikely to use it.)
Some of that is documented in the demos folder, in the common Makefile. It requires some exploration at this point, though; in theory, someone who knows make should be able to work it out from the examples.