New "compiler" for writing Elev8-FC float math
JasonDorie
Posts: 1,930
in Propeller 1
Thought I'd share a fun Prop project I'm working on for the Elev8-FC.
A little background: the Elev8-FC does all the IMU math on the Prop, in floating point. In order to keep it small and fast, I'm using a modified F32 driver to do the computation. I added a stream processor to it so I can feed it a list of instructions to execute - it processes the whole sequence and sets a flag when it's done, leaving the main cog free to do other tasks. The "instructions" are just four byte values: opcode, input1, input2, and output. The input/output values are byte indices into an array of 32-bit variables, mostly floats, but some ints.
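To make that concrete, here's a rough sketch of what one of those instructions might look like as a struct (the field names are mine, not the actual F32 driver's):

#include <cstdint>

// One entry in the instruction stream fed to the modified F32 driver.
// Each field is a single byte; the operand fields are indices into the
// shared array of 32-bit variables rather than addresses.
struct F32Instruction {
    std::uint8_t opcode;   // which F32 operation to run (add, mul, sin, ...)
    std::uint8_t src1;     // index of the first input variable
    std::uint8_t src2;     // index of the second input variable
    std::uint8_t dest;     // index of the output variable
};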
All of this works well, but I've been coding it by hand. That was fine initially, but as the project has grown in scope it's become a bit of a pain point - changing the code isn't trivial, so typically I write in C on the PC until I get the math right, then manually translate to the FPU instructions, and I often make mistakes when translating.
Nothing makes mistakes faster than a computer, so I figured I'd try my hand at automating the process, and it works!
I looked into using Yacc/Lex (Bison/Flex) to generate the parser stage, but the documentation for those tools is terrible, and everything I found online was either "calculator" or "C", with nothing in between. Since I was really only interested in parsing a finite subset of C, with no control statements, I decided to write my own. There's still a bunch of work to do, but it's checked in to GitHub in the Flight-Controller project if you're curious.
First up is converting the code into an expression tree. This is done in two stages: tokenizing and parsing. The tokenizing stage is mostly a pre-filter that reads the source and passes along tokens, like "comment", "label", "operator", "function", "parenthesis", "constant", and so on, along with the relevant token content. For a label, that would be the variable name.
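As a rough illustration (invented names, not the actual Flight-Controller source), the tokenizer output can be thought of as a stream of small records like this:

#include <string>

// The kinds of tokens the pre-filter hands to the parser.
enum class TokenType {
    Comment, Label, Operator, Function,
    OpenParen, CloseParen, Constant, Assignment, EndOfStatement
};

// One token: its kind plus the relevant content,
// e.g. the variable name for a Label or "*" for an Operator.
struct Token {
    TokenType   type;
    std::string text;
};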
The parser takes the token stream and makes sure that the tokens make sense. For example:
This code: rx = gx * const_GyroRate;
Becomes: label, assignment, label, operator*, label, end
Within the parser, I use a technique called "Recursive Descent" to generate an expression tree. This converts an infix expression with operators into a tree that represents the expression, with the order of operations derived from operator precedence. You end up with something like this:
From the expression tree I generate the actual operators, along with a blob of which variables and constants were used, and any const initialization I need. I got it working last night, and actually replaced one of the core math routines in my Elev8 with the generated code, and it works exactly as expected. Even cooler, since the code is just C, I can run the exact same code on the PC in a previewer, feeding it data from the Elev8, and show both results overlaid on each other, like this:
The 3D "X' on the right is actually two models - one computed on the PC, and one computed on the Elev8, showing that they match (this is good). This frees me up to code new math functions in C on the PC, and when I'm happy with the results, have the F32 stream generated automatically from the source.
I've written a code generator before, but never the front-end of a compiler, so this was a fun diversion and I learned a bunch. It might also be useful for others, so if you have a project that needs to run a long sequence of floating point math in a hurry, this might help you.
I've attached an example of a cpp source function and the generated F32 instruction stream.
Comments
But I know it works on the Elev-8!
When I got curious about how one writes a compiler a few years back, even a simple one for a simple language, I also gave up on trying to understand the likes of Yacc/Lex (Bison/Flex) and ended up writing my own recursive descent parser in C. I did not even have a tokenizer; it was a single pass from source to generated assembler. All terribly inefficient and un-optimized, but it worked.
That was hard enough for me!
If I were doing such a thing again I think I would be using peg.js. http://pegjs.org/
Mostly I wanted it for me, and I'm using Qt as the framework, which provides a lot of string handling, containers, etc. I was surprised at how fast it came together - I had working code generation by the third day.
...and that beauty had to be translated to a compiled BASIC dialect.
BASICally (mwhuaahahahahaaa!) I could do the same thing with my own bytecode, but I had to mimic the BASIC interpreter too by adding my own bytecode interpreter.
Lots of fun! \o/
Today I'd crawl thru Flex/Bison examples first too... ;-)
I am posting this question for all the experts here, like JasonDorie and Heater.
All the online information I have seen on C compilers has the stack being used for local variable storage.
I am writing a compiler that uses general memory for local variables, and with each function call, pushes as much of this general memory onto the stack as needed to free up space for the new function's variables. So it's like I am doing it backwards.
Is there a name for this technique? Can't find any info on this. And perhaps this is how it's actually normally done for older micros? (8051, PIC without all the fancy addressing modes)
Functions, as we know them from most programming languages, can be called from many places within a program.
Ignoring local variables for a minute, this dictates that function calls store their return address somewhere. This does not require a stack (yet), only a single location in which to store a return address. Which is what happens in Propeller PASM with the CALL/RET instructions.
The "yet" part comes when you want to allow recursive calls in your program. Function "A" calls "B" calls "C" which calls "A" again. At this point you find you need a stack to remember all the return addresses so that you can return up the call chain and return to a different place each time.
There is a similar problem with local variables. One could do without a stack. Just keep the local variables of each function in a memory space somewhere.
Again, this whole scheme falls down if you want to allow recursive calls in your language. Every recursion requires a new set of local variables.
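A tiny example of the point (mine, just for illustration): each active call below needs its own copy of 'local', which is exactly what the stack provides; a single fixed memory location per local would get overwritten by every deeper level.

#include <cstdio>

static int depthSum(int n) {
    int local = n;                  // one copy per active call, kept on the stack
    if (n == 0) return 0;
    int below = depthSum(n - 1);    // the recursive call gets its own 'local'
    return local + below;           // this call still sees its own value
}

int main() {
    std::printf("%d\n", depthSum(3));   // 3 + 2 + 1 + 0 = 6
}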
Conclusion: Stack is best unless you want to forbid recursion in your language.
Arguably, not supporting recursion is a way to make an efficient code generator on some architectures.
A traditional C program doesn't have this luxury, for exactly the reasons Heater describes above. You can easily do this within a single code block if it doesn't call out to other functions. If other functions *are* called, you'd have to be able to map their variable usage as well, and in turn any functions they might call, all the way down the tree. If the call graph isn't a tree (i.e. there's recursion), you need a stack, end of story, because you simply can't know how the code will behave when running.
What you're doing sounds like it might work, but it's probably really inefficient in memory usage. My guess is (and please correct me if I've misunderstood) that you're giving all local variables a pre-assigned location in memory to use. Then when you call out to a different function, you push those variables onto the stack. My question for you is: what happens if a function is never called? Are the locals used in that function "reserved" somewhere in your memory space? This is what the stack is truly for - preventing wasted space.
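If I've understood the scheme, a minimal sketch of the idea (invented names, just to check my understanding) might look something like this - each function's locals live at fixed slots in general memory, and the caller copies the callee's region to a software stack around the call:

#include <cstdio>
#include <cstring>

static int locals[16];        // fixed "general memory" slots shared by all locals
static int saveStack[64];     // software stack used only around calls
static int saveTop = 0;

// child() owns slots 4..5
static void child(int arg) {
    locals[4] = arg;
    locals[5] = locals[4] * 2;
    std::printf("child: %d\n", locals[5]);
}

// parent() owns slots 0..1 and saves/restores child()'s region around the call
static void parent() {
    locals[0] = 10;
    std::memcpy(&saveStack[saveTop], &locals[4], 2 * sizeof(int));  // "push" child's slots
    saveTop += 2;
    child(locals[0]);
    saveTop -= 2;
    std::memcpy(&locals[4], &saveStack[saveTop], 2 * sizeof(int));  // "pop" them back
}

int main() { parent(); }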
That is nifty.
How does the final code size (Calls + Modified F32) compare with the same equations fed thru PropGCC ?
Each instruction is <opcode>, <source1>, <source2>, <destination>
My assumption is that in CMM, the code would have to do something like: push gx, push const_GyroRate, call #multiply, pop the result into rx.
Assuming best case, those are likely two bytes per instruction (one for opcode, one for local stack index or register). If any of those need to be written to / read from memory, the instruction size now has to include that too. Add in the actual address for #multiply (minimum 2 bytes) and that's roughly 10 bytes against my 4, so I'm better by more than 2:1 for size. Again, unmeasured, but I'd be surprised if I was lower than a 3:1 savings.
GCC could make gains by using opcodes for push/pop to a register index in a single byte, possibly, and if expressions were primarily kept in registers that would save some, but I'd be very surprised if it was able to match the level of compactness I'm at.