It's Time for a New Spin Compiler

Electrodude Posts: 1,658
edited 2015-03-07 17:53 in Propeller 1
I've decided it's time for a new Spin compiler. There's no Spin compiler that I know of with good optimizations like common subexpression elimination, function inlining (when it makes sense), compile-time function execution, or dead-code removal (except BST, to an extent). Standard Spin has no macros, no real user-definable namespaces, no way to export DAT symbols from one object to another (for LMM kernels and such), no parametric types (not that any language does these well), and none of many other features that would be very useful. Largely because of these missing features, more and more people are switching to C/C++.

If you've ever looked at Tachyon's source, you'll find that much of it should be written in a special FORTH Spin block but is instead shoehorned into DAT blocks. There are floating point libraries with floating point functions that would be a lot easier to use if there were a way to program them with actual expressions rather than something like assembly. I have a DAQ program that has to be modified in multiple places to add a field; with a macro compiler, all that would be needed to add a field would be to add a DAQ Spin block that names and registers the field, specifies whether it's an int or a float, and gets the current value for the field (like a PRI block), all in one place. We need a Spin compiler that knows how to do all this.

My goal is not to make a new language. This new compiler/language should be 100% backward compatible with standard Spin. Using Extended Spin should feel as close to using standard Spin as possible. I plan to add types, but they should be completely optional and should only generate compiler warnings by default; they should mainly be used for documentation. Type inference should occur where possible, but if the compiler isn't sure of the type of something, it should just not bother with it. I don't think I'll allow operator overloading (for floats and such) by default, but I might add it as an optional feature.

Of course, there should be a command-line flag to produce output identical to that of the official Propeller Tool when given only standard Spin, and another (separate) flag to reject all non-Parallax-compatible extensions.

Part of the reason I want to do this is for the experience; therefore, I'd like to not use (m)any external libraries, let alone a compiler library. The only exception to this is making compiler plugins explicitly for interfacing with external libraries, for example, for making Lua plugins.

All I have so far is the module loader, which works, and the first-pass Spin block splitter, which is probably going to get removed unless I can figure out what to do when someone tries embedding C in a Spin file using a compiler plugin and finds that {} can't be used for C blocks anymore because it already means Spin comments. To (probably) make matters worse, Spin curly brackets aren't necessarily perfectly matched, for example, "{{ { }}".
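
For the comment-skipping half of that problem, I'm imagining something like this depth-counting scanner - a minimal C sketch, assuming plain { } comments nest and {{ }} doc comments swallow everything up to the first }}; real Spin edge cases may differ:

    /* Return a pointer just past the comment starting at p,
     * or p itself if there is no comment there. */
    const char *skip_spin_comment(const char *p)
    {
        if (p[0] == '{' && p[1] == '{') {            /* doc comment */
            p += 2;
            while (p[0] && !(p[0] == '}' && p[1] == '}'))
                p++;
            return p[0] ? p + 2 : p;                 /* past "}}", or at EOF */
        }
        if (p[0] == '{') {                           /* nestable comment */
            int depth = 1;
            p++;
            while (p[0] && depth > 0) {
                if (p[0] == '{') depth++;
                else if (p[0] == '}') depth--;
                p++;
            }
            return p;
        }
        return p;
    }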



All planned features: https://github.com/electrodude/esc/blob/master/TODO.md

What I have so far (all likely to change): https://github.com/electrodude/esc

Any comments or suggestions are welcome (and I'll probably never finish this without them).

EDIT:
This compiler will definitely be open source. As you can see, it's already on Github. As there's no way I'll ever have the time to finish this by myself, any help would be greatly appreciated.

Comments

  • ksltd Posts: 163
    edited 2015-03-06 14:49
    I have more than a few opinions on this and will likely be doing my own work in this area shortly. If you're interested, send me an e-mail here with your contact info ...
  • Electrodude Posts: 1,658
    edited 2015-03-06 18:24
    Can't we just talk about it here so others can see what's going on? I would ask if we could talk about it on Github, but https://github.com/electrodude/esc has barely enough content to discuss anything yet. I would love to hear your ideas, but so would everyone else. If it's secret or something, or you just prefer email, tell me and I'll PM you my email. Either way, I'm eager to hear your ideas. Many of the features I have planned aren't thought out very well.


    Where's @Heater? I was sure he would have replied to this by now.

    Do @ mentions even do anything on this forum? I see some people using them but they aren't hyperlinks or anything.
  • Cluso99 Posts: 18,069
    edited 2015-03-06 20:02
    Has Roy's OpenSpin been released? Might this be a good place to start?

    I would love to have the #ifdef etc from bst & homespun.
    Also, definitely missing basic macro functions.
    Dead code removal is another requirement.

    Not sure if any of these are implemented in Roy's code.
  • Electrodude Posts: 1,658
    edited 2015-03-06 20:47
    I don't plan to have #ifdef or any preprocessor macros. I really like how D gets away with not having a preprocessor (or anything resembling one) but has other features that definitely make up for it, and I would like to do something similar. Here's a page on how the preprocessor is unnecessary in D. I might add #ifdef as an interim hack, but, ideally, Extended Spin will be able to do anything #ifdef can, only at a higher level than a preprocessor - after AST generation, at symbol resolution time. My macros will be much closer to Lisp macros than C macros.

    I'm not sure what you mean by "released", but I seem to be able to get the source for OpenSpin here. It seems to now have a basic preprocessor. However, I'm pretty sure the internals of my compiler are going to work pretty differently from the original Parallax compiler. As far as I can tell, Chip's compiler's linker works by concatenating the object files together. Mine is going to completely resolve all symbols in all objects before giving anything so much as an address. There will be no explicit link phase.

    I'm planning to have a total of four phases in my compiler. The first is the parse phase, which generates an AST. The second is the symbol resolution phase, in which all symbols, values, and sizes are calculated. The third is the address assignment phase, in which everything (which already has a known size) is assigned a final address, method/object numbers, etc. The fourth and final phase is the placement phase, in which everything is told to dump its contents into the buffer that will get loaded into the Prop. Optimizations will happen throughout. Dead code elimination and memory reuse will happen during the address assignment phase - unused things simply won't get addresses, and things that can be overlaid will get overlapping addresses. The distinction between the second and third phases is fairly blurry, and the two will probably be merged to an extent.
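
    In outline, the driver would look something like this (a hypothetical C sketch - none of these names come from the actual esc source, and the function bodies are omitted):

        #include <stddef.h>
        #include <stdint.h>

        struct ast;                                   /* tree built by the parser  */

        struct ast *parse(const char *src);           /* phase 1: build the AST    */
        void resolve_symbols(struct ast *a);          /* phase 2: values and sizes */
        void assign_addresses(struct ast *a);         /* phase 3: dead code simply gets no address */
        size_t place(struct ast *a, uint8_t *image);  /* phase 4: emit the flat binary */

        size_t compile(const char *src, uint8_t *image)
        {
            struct ast *a = parse(src);               /* optimizations happen      */
            resolve_symbols(a);                       /* throughout, mostly here   */
            assign_addresses(a);
            return place(a, image);
        }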

    I'd really like to allow compiler-plugin-definable blocks, but that would involve mid-parse parser modification, which sounds really scary. Obviously, parser modifications wouldn't go into effect until after they are declared - otherwise, the parser would have to backtrack, leading to Fun problems, probably becoming Turing-complete and in danger of becoming intelligent and killing us all. I have code that splits Spin code into blocks, but, as I mentioned already, it sometimes won't work with curly brackets in inlined C because it will think they are comments and will choke when someone has "{{ { }} }" in their C code. The best solution I've come up with so far is to write my own parser and tell it that certain blocks are to be output as one giant token, and tell it that these giant tokens don't have any tokens in them. This sounds like a horrible abuse of a lexer/parser, but it might be the only way (and it might work!).
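
    The giant-token approach might look something like this in a hand-rolled scanner (a rough C sketch; the keyword list, the case-insensitive match, and the assumption that a new block always starts at the beginning of a line are all mine):

        #include <ctype.h>
        #include <string.h>
        #include <strings.h>   /* strncasecmp (POSIX) */

        /* Does this line begin one of the standard Spin blocks? */
        static int starts_std_block(const char *line)
        {
            static const char *words[] = { "CON", "VAR", "OBJ", "PUB", "PRI", "DAT", 0 };
            for (int i = 0; words[i]; i++) {
                size_t n = strlen(words[i]);
                if (strncasecmp(line, words[i], n) == 0
                    && !isalnum((unsigned char)line[n]))
                    return 1;
            }
            return 0;
        }

        /* Capture a custom block as one opaque token: skip its header line,
         * then stop at the next line that begins a standard block.
         * Returns a pointer to the first line after the block. */
        const char *capture_custom_block(const char *p)
        {
            const char *nl = strchr(p, '\n');
            p = nl ? nl + 1 : p + strlen(p);
            while (p[0] && !starts_std_block(p)) {
                nl = strchr(p, '\n');
                p = nl ? nl + 1 : p + strlen(p);
            }
            return p;
        }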
  • jmg Posts: 15,173
    edited 2015-03-06 21:14
    I don't plan to have #ifdef or any preprocessor macros. I really like how D gets away with not having a preprocessor (or anything resembling one) but has other features that definitely make up for it, and I would like to do something similar. Here's a page on how the preprocessor is unnecessary in D. I might add #ifdef as an interim hack, but, ideally, Extended Spin will be able to do anything #ifdef can, ....

    The problem with blazing a totally new trail is that code cannot be designed to work in both systems.
    That means that if there is already a #ifdef flow in Spin now (even if it uses an external pre-processor), then adding support for #ifdef allows cross-compatible code, so it is a good idea.
  • jmg Posts: 15,173
    edited 2015-03-06 21:18
    Electrodude wrote: »
    I'm not sure what you mean by "released", but I seem to be able to get the source for OpenSpin here. It seems to now have a basic preprocessor. However, I'm pretty sure the internals of my compiler are going to work pretty differently from the original Parallax compiler.

    There was discussion a while ago, about improvements to the ROM-side Spin interpreter in parallel with a new Spin compiler.

    Clearly of limited use in P1 as the ROM comes for free, but there is real potential here in P1V & P2 to
    * Include only what is actually needed, in the Spin interpreter (allowing other stuff to be added)
    * Improve the performance
    *
  • Electrodude Posts: 1,658
    edited 2015-03-06 21:38
    jmg wrote: »
    The problem with blazing a totally new trail is that code cannot be designed to work in both systems.
    That means that if there is already a #ifdef flow in Spin now (even if it uses an external pre-processor), then adding support for #ifdef allows cross-compatible code, so it is a good idea.

    Good point. I guess I should do it both ways, then. I could always make #ifdef disableable through a command-line flag. It's low priority right now, though, at least until I have something that can take something resembling Spin and compile it into something that will actually run on a Propeller. Unless, of course, someone decides that they want to be really nice, write it for me, and PR it to electrodude/esc.

    Does Windows have anything similar to Unix pipes? What if the preprocessor is a separate executable?
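
    I believe it does - _popen in the MSVC CRT is the popen equivalent - so running a separate preprocessor could look something like this rough sketch; the "spinpp" command name is made up:

        #include <stdio.h>

        #ifdef _WIN32
        #define POPEN  _popen
        #define PCLOSE _pclose
        #else
        #define POPEN  popen
        #define PCLOSE pclose
        #endif

        /* Run the hypothetical "spinpp" preprocessor on path and stream
         * its output; a real compiler would feed the lines to its lexer. */
        int preprocess(const char *path)
        {
            char cmd[512];
            snprintf(cmd, sizeof cmd, "spinpp \"%s\"", path);
            FILE *f = POPEN(cmd, "r");
            if (!f)
                return -1;
            char line[1024];
            while (fgets(line, sizeof line, f))
                fputs(line, stdout);
            return PCLOSE(f);
        }
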
    jmg wrote: »
    There was discussion a while ago, about improvements to the ROM-side Spin interpreter in parallel with a new Spin compiler.

    Clearly of limited use in P1 as the ROM comes for free, but there is real potential here in P1V & P2 to
    * Include only what is actually needed, in the Spin interpreter (allowing other stuff to be added)
    * Improve the performance
    *

    There are 5 PASM instructions at the high end of the Spin interpreter ($1E0-$1E4) that can be modified and then run through PNUT using the same register operations you use to mess with special registers and Spin state variables. I haven't really investigated this yet, but instead of including a whole new Spin interpreter, you could theoretically patch the ROM one like this. Or, if there's room, you could just include a custom interpreter...

    A custom interpreter would be nice for many reasons. I've always wanted a variant of PNUT j1 (jmp/call) to calculate and push addresses for things, for finding the address of other objects and methods (and where the method table entry is). Run-time dynamic Spin linking and loading might actually be useful for large programs.

    Did a new Spin compiler accompany the modified Spin interpreter? Which one?
  • Cluso99 Posts: 18,069
    edited 2015-03-06 23:36
    I, and others, have used bst and homespun to do #ifdef extensively, to compile for various hardware and options. Take a look at ZiCog to see what I mean. This was the best feature we had, and the reason we use bst and homespun. The implementations are subtly different: IIRC, bst's #define applies across all files, whereas homespun's is local to each module.

    Spin Interpreter:
    There is only one spare opcode in Spin. I thought it was $3F, but my memory is a little foggy now. I wrote a faster Spin interpreter that used hub for a decode table.
    I use the shadow registers $1F0-$1F4 to house an LMM trace/debugger for both PASM and Spin.
    Hippy devised a way to trap the interpreter and insert new code into it.
  • Heater. Posts: 21,230
    edited 2015-03-07 00:57
    Electrodude,
    Where's @Heater?
    Heater has been out partying. Hey, it was Friday night. So he does not understand anything this morning :)

    I have to read your plans again but:

    1) You seem to want to be able to mix languages in a single program or even source file.

    The problem with that is that, while not extending or changing the Spin language itself, you have created an incompatible system.

    3) You have identified 4 phases in your processing:

    i) Parse.
    ii) Symbol resolution
    iii) Address assignment
    iv) Placement

    I do hope you have a code generation phase in there somewhere! :)

    I have no idea how "placement" is different from "address assignment".

    4) I would like to suggest that whatever compiler you write, you write it in JavaScript. Apart from being much easier to write than, say, C/C++, it means we end up with a tool that can be used anywhere: in a browser as well as on the command line.

    5) If you want to experiment with parsing and mixing up languages in a source file, do try to create a formal grammar definition first.

    That is to say, define it as a syntax that can be understood and parsed by a tool like yacc/bison, just so you have the language specification pinned down, even if you never want to use such a tool in the final product. Actually, spend a few days playing with pegjs (http://pegjs.org/); you can play with the pegjs grammar generator tool online here: http://pegjs.org/online. That will show you how easy or hard it is to define a language syntax. And, no doubt, how hard it is to mix them together.
  • Electrodude Posts: 1,658
    edited 2015-03-07 16:23
    Code generation is sort of mixed in between all of the phases. The code is generated in tree form mostly during the parse and symbol resolution phases. Dead code is removed during the address assignment phase by not being assigned a final address. Most other optimizations occur somewhere during or immediately after the symbol resolution phase.

    The address assignment phase is where everything is given addresses (linker phase), while the placement phase is where the completed tree/graph is finally turned into a flat binary image that can be loaded into a Propeller.

    The placement phase probably lines up most closely with the code generation phase. I'm getting the feeling my compiler works way differently from any other. I'm not sure if this is good or bad.

    Why would I write a compiler in JavaScript if I could write it in a real, compiled language? Why would you want to run a compiler in a web browser when you could use a real computer? If you really want a JS version, there's Emscripten. Even if I were to use a scripting language, why would I not use Lua over JS? Most benchmarks show LuaJIT to be 2-6 times faster than Node, Lua's syntax is way simpler than JavaScript's (and yet still incredibly powerful), and Lua's functions, objects, and environment make way more sense (to me, at least) than JavaScript's. I find it hilarious that you can't easily make a sandbox in JavaScript (a language designed to run on the web), but that it's trivial in Lua. Also, JavaScript seems to be missing some of Lua's most powerful features, like coroutines (though they were apparently just added) and metatables (for attaching methods to objects and doing operator overloads, among tons of other things like making light copies of tables).

    I wrote a Flex/Bison parser for a language I designed and hope to eventually implement. It wasn't all that bad, but I'm sure pegjs would be easier. I suspect that Spin can only be comfortably parsed with an integrated lexer and parser, if for no other reason than because of how weird nested comments can get. For example, "{{ { }}", "{ { } }", and "{{ {{ } }}" are all perfectly valid, but "{{ {{ }} }}" isn't.

    I just realized that I might be able to get a Flex/Bison lexer/parser to understand custom Spin blocks, by telling Flex what the beginning of a custom Spin block looks like and then having it emit one giant token with the entire contents of the block. It's looking really promising, and I will probably try that now.
  • Heater. Posts: 21,230
    edited 2015-03-07 17:53
    Electrodude,

    Interesting stuff. You are obviously more adept at this compiler-writing business than I am. The only compiler I ever managed to produce that actually generated working code (for x86 and Propeller) was for a very simple Pascal/C-like language. It did not even have arrays or structures, and the generated code was very inefficient. I was quite proud of the fact that I managed to write the whole thing in C without the help of anything like lex/bison. Even creating pegjs rules for CON blocks gives me a headache :)
    Why would I write a compiler in JavaScript if I could write it in a real, compiled language?
    Only because then it is immediately usable on any platform with a browser or a JS runtime like node.js, now and well into the foreseeable future. No porting to do, no packages to make.
    Why would you want to run a compiler on a web browser when you could use a real computer?
    You can also run JS on a "real computer" using node.js or another JS runtime, the same as you run Python or Lua, etc.
    If you really want a JS version, there's Emscripten.
    I already did that: http://the.linuxd.org/lab/spine.html, and msrobots made that into a full-up IDE: http://the.linuxd.org/lab/spine.html. Sorry, the version on my page cannot load and save files; msrobots has a working demo somewhere.
    why would I not use Lua over JS?
    Because of that web browser delivery mechanism thing.
    Most benchmarks show LuaJIT to be 2-6 times faster than Node
    That may well be true - for LuaJIT, that is. I find it hard to believe, as C/C++ code "transpiled" to JavaScript is approaching native speed nowadays. But the speed of JS would be quite sufficient for building programs of the size we can expect. Do you have any links to such benchmarks? I was thinking of trying a JS vs Lua web server comparison.
    Lua's syntax is way simpler than JavaScript's
    Maybe. It's been a long time since I looked at Lua, so I'm not going to comment. I remember I quite liked it a lot, though.
    I find it hilarious that you can't easily make a sandbox in Javascript (a language designed to run on the web)
    That is a bit of a problem. JS was designed when the browser was the sandbox, and it's quite a good one. Then people decided they wanted to pull all kinds of code from advertising and analytics and God knows what, from God knows where, into that page you are loading. Fixing that seems to be a bit hard without busting everything we have on the web already. Still, you don't need a sandbox for a compiler.
    Javascript seems to be missing some of Lua's most powerful features, like coroutines...
    I guess you mean those "generators" coming in the next JS standard. I have been scratching my head wondering what use they are. What is the use case for coroutines?
    ..and metatables..
    You might have to explain what they are and why I might want them. Certainly you can add methods to objects in JS. What do you mean by light copies of tables?


    Having said all that it turns out that many folks are looking at running Lua in the browser. Like this chap: http://kripken.github.io/lua.vm.js/lua.vm.js.html. So, full steam ahead, no worries about any of the above!

    Those pesky {...} comments are annoying. For my PASM-in-JS experiment, I was thinking of stripping them out with a bit of pre-processing rather than messing with pegjs rules for them. That, of course, leads to the issue of reporting line numbers correctly. You can't just preserve any newlines in comments, as that will break code that hides line feeds in comments so as to continue long statements. Grrr...