Understanding code generation

porcupine · 2014-04-15 11:33

I'm confused how a line of code as simple as this:

int16_t ImmediateRead() {
  return buffer_[read_ptr_++];
}

(Where:

int16_t buffer_[256];
volatile uint8_t read_ptr_;
volatile uint8_t write_ptr_;
)

Gets compiled to this very verbose PASM:

.L5
rdbyte r6, r1
and r6,#255
xmov r7,r6 shl r7,#1
add r7, r14
add r6, #1
mov r4, r6
and r4,#255
rdword r6, r7

Shouldn't this really be like 3 lines of PASM? Cycle counts matter a lot here, this is in an inner loop. I'm guessing the AND #255 there is a bounds check of some kind? Any way to turn that off? Why the SHL? I specifically want to wrap on the 8th bit, and do; this is a circular buffer 256 words in size. Any way to improve on the code generation here? Failing that, I'm trying to see how to hand write this piece in PASM.

porcupine · 2014-04-15 11:40

(Just noticed that removing 'volatile' eliminated the AND #255, but the xmov/shl is still there.)

Heater. · 2014-04-15 12:06

Let's see, we have to:

1) Get the value of read_ptr
2) Get the value of that read_ptr multiplied by two (shift), because buffer has two byte elements.
3) Get the value of buffer_
4) Add those together to get the address of the array element.
5) Read the value at that address.
6) Increment the value of read_ptr that we read originally.
7) AND that result with 255 because it's a byte value.
8) Save the value of the incremented read_ptr
9) Return the result we read in 5)

Or something like that. We have a lot of work to do there. Seems the compiler is doing well.
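
For illustration, the steps above can be written out one statement at a time in C. This is only a sketch: the name ImmediateReadExpanded and the BUF_LEN macro are made up here, and the compiler is free to order or merge these operations differently from what is shown.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_LEN 256

/* A uint8_t index only covers the whole buffer if it has exactly 256 entries. */
static_assert(BUF_LEN == 256, "uint8_t index assumes a 256-entry buffer");

static int16_t buffer_[BUF_LEN];
static volatile uint8_t read_ptr_;

/* Hypothetical expanded form of ImmediateRead(), one statement per step. */
int16_t ImmediateReadExpanded(void) {
    uint8_t idx = read_ptr_;                                /* 1) load read_ptr_            */
    uintptr_t offset = (uintptr_t)idx << 1;                 /* 2) scale by 2 bytes/element  */
    const uint8_t *base = (const uint8_t *)buffer_;         /* 3) base address of buffer_   */
    const int16_t *elem = (const int16_t *)(base + offset); /* 4) element address           */
    int16_t value = *elem;                                  /* 5) read the element          */
    unsigned next = idx + 1u;                               /* 6) increment the old index   */
    next &= 255u;                                           /* 7) wrap to 8 bits            */
    read_ptr_ = (uint8_t)next;                              /* 8) store the new index       */
    return value;                                           /* 9) return what was read      */
}
```

Steps 2 and 7 correspond to the SHL and the AND #255 in the listing from the first post.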

Can you write that in three lines of PASM?

That AND 255 is there because adding 1 to r6 will overflow 8 bits but you have asked for a byte sized variable.

I suspect that if you made everything 32 bits wide and trimmed the array index with an AND yourself it might be smaller.
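
A minimal sketch of that idea, assuming a free-running 32-bit index that is masked only when it is used. The names ring_, read_idx_, RING_LEN, RING_MASK and ImmediateRead32 are hypothetical, and whether this actually produces fewer instructions has not been measured here.

```c
#include <assert.h>
#include <stdint.h>

#define RING_LEN  256u
#define RING_MASK (RING_LEN - 1u)

/* Masking with RING_LEN - 1 only wraps correctly for power-of-two sizes. */
static_assert((RING_LEN & RING_MASK) == 0, "RING_LEN must be a power of two");

static int16_t ring_[RING_LEN];
static uint32_t read_idx_;   /* 32 bits wide; hypothetical replacement for read_ptr_ */

/* Read one sample; the index is trimmed by hand with an AND. */
int16_t ImmediateRead32(void) {
    int16_t value = ring_[read_idx_ & RING_MASK];
    read_idx_++;             /* free-running, never truncated to 8 bits */
    return value;
}
```

With this layout the writer side would have to mask its own index the same way, since the stored index is no longer guaranteed to stay below 256.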

porcupine · 2014-04-15 12:28

Oh, doh, yes, you are so right and I'm feeling a bit silly here; the piece I was not thinking about was the width of buffer_ entries, and that explains the shift there.

I wonder if there's a more optimal implementation of a circular buffer that could be done in pure PASM that fits more with the Propeller's instruction set. Hm.

Dave Hein · 2014-04-15 12:42

Which memory model and optimization level did you use? Using -mcog -O3 I get 13 instructions for the routine. There seem to be 3 redundant instructions, so it could have been done in 10 instructions in PASM.

porcupine · 2014-04-15 13:17

I would love to use COG but it's unlikely I'll get my code in under 2k, so it's LMM for now.

porcupine · 2014-04-15 13:21

I am close tho:

ryan@sumac:~/prop-synth$ ls -l prop/mcp4922.o
-rw-r--r-- 1 root root 2460 Apr 15 16:18 prop/mcp4922.o

Very frustrating. I fought for many hours to try and trim this down to under 2k to fit.

But even if I could get code size down under 2k, it would still have to hit hub ram to read from the ring buffer (populated by other cogs). That's likely the speed bottleneck I have right now more than anything.

I'm also unclear on how to build, link and use mixed memory models. I understand SimpleIDE does some of the magic, but I don't use it (using emacs and/or eclipse + a Makefile is much easier for me), and the docs as they are just talk about .cogc files & SimpleIDE without talking about the specific linker options to make mixed mode happen.

(In general I think the project does itself a disservice by assuming the user will only be using SimpleIDE. It's a good tool for beginners but anybody who has any extensive C/C++ background is unlikely to use it.)

jazzed · 2014-04-15 14:01

porcupine wrote: »

(In general I think the project does itself a disservice by assuming the user will only be using SimpleIDE. It's a good tool for beginners but anybody who has any extensive C/C++ background is unlikely to use it.)

Conversely, the developers who do not use or develop SimpleIDE would be doing you and others a service by documenting how it's done with Makefiles.

Some of that is documented in the demos folder with the common Makefile. That requires some exploration at this point though, and theoretically one who knows make should understand that by example.

Dave Hein · 2014-04-15 15:07

porcupine, why not just copy the way SimpleIDE does it? I think there's a mixed mode toggle program you could start with. Compile it under SimpleIDE, and copy the commands that it uses to build it. It seems pretty straightforward.

porcupine · 2014-04-15 15:34

Yeah if I can figure it out I'll try putting some notes together.

ersmith · 2014-04-16 05:23

Using "uint16_t" instead of "int16_t" will save you some instructions (the compiler sign extends int16_t, but doesn't have to do that with uint16_t). Changing all the int16_t to uint16_t and getting rid of the volatiles gets the instruction count down to 9 in -mcog mode. One of those is redundant (a mov from COG memory to a register) but I think the rest are all necessary.
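
As a sketch, the suggested variant of the original routine would look something like this. The names ubuffer_, uread_ptr_ and ImmediateReadU are invented for the example, and the instruction counts quoted above are not reproduced or verified here.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t ubuffer_[256];   /* unsigned samples: no sign extension needed on read */
static uint8_t uread_ptr_;       /* 'volatile' dropped, as suggested above             */

/* The uint8_t index wraps at 256, so it must cover the buffer exactly. */
static_assert(sizeof ubuffer_ / sizeof ubuffer_[0] == 256,
              "uint8_t index must cover the whole buffer");

/* Same logic as ImmediateRead(), but with an unsigned element type. */
uint16_t ImmediateReadU(void) {
    return ubuffer_[uread_ptr_++];
}
```

Whether dropping volatile is safe depends on how the other cogs touch these variables, which the thread does not show. If the samples are really signed, the caller would have to convert the returned value back to int16_t itself.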
