Spin formal grammar specification?
Heater.
Posts: 21,230
OK who did it? Who put a star on my pasm.js github repository? https://github.com/ZiCog/pasm.js. This thing has only just started, does nothing yet, and may never go anywhere, but it gets a star! Own up, I know it was one of you.
What is this all about? It's about a formal specification of the syntax/grammar of Spin. In the manner of the BNF specs. for many other programming languages since ALGOL. pasm.js is where I started experimenting with creating such a grammar for the parts of Spin required to build a PASM assembler. The grammar is specified in the language of pegjs. pegjs is a parser generator that generates a parser in JavaScript for the given grammar specification. One may not ever want a PASM assembler in JavaScript but I though a formal Spin language spec might be interesting and pegjs is as good a way to write it as any with the bonus that it is testable against real Spin source.
How does this work?
1) Write your languages syntax specification for peg js. For the CON section of Spin that might look like this:
2) Compile that specification into a JavaScript function to parse that syntax. Use the pegjs parser generator.
CON a = 5, b = 6
#1,x ,y[2], z
looks like so:
So far I have something that parses DAT and CON blocks in some fashion. You can play with it from the code on github or play with creating your own language syntax parser with pegjs using the interactive onnline pegjs tool http://pegjs.org/online
What is this all about? It's about a formal specification of the syntax/grammar of Spin. In the manner of the BNF specs. for many other programming languages since ALGOL. pasm.js is where I started experimenting with creating such a grammar for the parts of Spin required to build a PASM assembler. The grammar is specified in the language of pegjs. pegjs is a parser generator that generates a parser in JavaScript for the given grammar specification. One may not ever want a PASM assembler in JavaScript but I though a formal Spin language spec might be interesting and pegjs is as good a way to write it as any with the bonus that it is testable against real Spin source.
How does this work?
1) Write your languages syntax specification for peg js. For the CON section of Spin that might look like this:
start = (CONBlank / CONAssignments / CONEnumerations / (white* EOL))* conCON = "CON" CONBlank = conCON white* EOL CONAssignments = conCON? white* conAssignmentList EOL conAssignmentList = constantAssignment white* "," white* conAssignmentList / constantAssignment constantAssignment = symbol white* "=" white* constantExpression CONEnumerations = conCON? white* conEnumerations conEnumerations = "#" white* constantExpression white* "," white* conEnumerationList EOL / "#" white* constantExpression EOL / conEnumerationList EOL conEnumerationList =symbol white* conOffset? white* "," white* conEnumerationList / symbol white* conOffset? conOffset = "[" white* constantExpression white* "]"I have omitted the definition of constantExpression as the specification of that is rather long winded.
2) Compile that specification into a JavaScript function to parse that syntax. Use the pegjs parser generator.
$ pegjs con-grammar.pegjs con-parser.js3) Use the resulting parser function to parse some Spin CON section:
try { ast = parser.parse(conSectionText); console.log(ast); } catch (e) { console.log(e); }Bingo! you have an Abstract Syntax Tree (AST) that represents the input source. The AST could then be passed into a PASM code generator. The AST for a simple CON stection like
CON a = 5, b = 6
#1,x ,y[2], z
looks like so:
[ { "line": 1, "CON": true, "ASSIGMENTS": [ { "column": 5, "symbol": "a", "expression": 5 }, { "column": 12, "symbol": "b", "expression": 6 } ] }, { "line": 2, "CON": null, "ENUMERATIONS": { "HASH": 1, "enums": [ { "column": 4, "constant": "x", "offset": null }, { "column": 7, "constant": "y", "offset": 2 }, { "column": 13, "symbol": "z", "offset": null } ] } } ]N.B. The syntax spec above does not show the parts required to output a nice AST.
So far I have something that parses DAT and CON blocks in some fashion. You can play with it from the code on github or play with creating your own language syntax parser with pegjs using the interactive onnline pegjs tool http://pegjs.org/online
Comments
You might take a look at the OpenSpin compiler. It's based on Chip's assembly version which he said was recursive descent. That being the case, there will be an almost one-to-one correspondence between the recursive procedure calls and the underlying BNF grammar.
-Phil
OpenSpin is a wonderful thing. Roy has done a very good job. OpenSpin is what we compile to JavaScript so as to create web based Propeller Spin IDEs. Very soon one will be able to program a Propeller with just a Chrome book or a Chrome browser. Enabled by OpenSpin.
See here: https://github.com/ZiCog/openspin.js
and herehttp://the.linuxd.org/lab/spine.html
and here http://the.linuxd.org/lab/editor17/editor17.html
But, putting my language purist hat on, I have to say that: "...based on Chip's assembly version.." is not good.
What I mean is that an implementation is not a specification. That is to say that OpenSpin should have been written from the language specification rather than reverse engineering a pile of x86 assembler.
What if someone wants to write a Spin compiler in Java or Pascal or whatever? Should they start again from Chip's x86 assembly language code? Should they start from Roy's C++ interpretation of that?
By the way, as far as I know that x86 "Gold Standard" is not publicly available.
Now, many years ago there was no Spin IDE or compiler for us Linux users. To that end I started to write a compiler for my own toy language into LMM. That I could use for the Propeller. Well, the output of that compiler was PASM source code. How to assemble that into actual binary? I used The PASM assembler by Cliff Biffle.
Then I wanted to write my own assembler. Then I discovered that there is no formal definition of Spin and creating such an assembler is a case of guessing the Spin syntax/semantics by trial and error. Unlike BradC and mpark a gave up.
Now I am a revisiting the whole idea.
The latest version is on github: https://github.com/totalspectrum/spin2cpp
Ah, thank you for that heads up. I guess you mean spin.y in that repo.
Now, forgive me for being new to all this but in the past I have looked briefly at flex/bison. My conclusion was that I'm to dumb to have any idea what is going on there and I soon gave up the idea.
When it came to parsing the source code of my own Noddy language it seemed much easier to just write the parser by hand in C!
spin.y is totally unintelligible as a language specification.
No doubt I am missing a point here.
Of course you have no bug reports, nobody uses spin2cpp
It's not that bad, actually. Yacc is basically BNF with some C code attached that's run as the rules are matched.
I think that's a bit of an exaggeration (probably only a bit). I agree it very much could use more comments, but the basic rules are straightforward. In fact if you just delete all the C code (the stuff inside curly brackets) you'd end up with a simple parser that could verify Spin code without actually doing anything, and really wouldn't be much different from a BNF specification of Spin. Yacc is pretty widely used, and it's worth learning. I'd say it's kind of like the C of parser generators. It's ugly, but it's fast, efficient, and available everywhere.
Probably all too true. :-(
However it seems to say nothing of the semantics only the syntax. For example if you look at the definition of an expression, starting at line 630 in spin.y, it seems to have no idea of operator precedence. Only the syntax of an expression. Well I guess something else takes care of how to evaluate "1 + 2 * 3" in the expected order.
pegjs can parse the syntax and also pull out that kind of semantics. In fact the example on the pegjs.org/online site does exactly that. Straight from source to interpreting the result!
Anyway, given that we already have a bunch of Spin source to Propeller byte code interpreters, the old Prop Tool, BST, HomeSpun, OpenSpin, etc. A more interesting question might be how to compile Spin to any target? As we do with C.
I can imagine a Spin parser, defined by YACC or whatever, that produces an AST for LLVM. Then we can compile Spin to x86 or ARM or whatever back ends LLVM has!
I'm a bit curious about expressions. spin.y seems to treat all operators as equal. But as we know operators have a precedence. spin.y treat operators in a very flat looking way whereas my pegjs parser pulls out the precedence and can build an AST to evaluate them in the correct order. Spin has 12 levels of operator precedence. Where does that get sorted out?
I'm sure I am missing a point or many here.
It's an exaggeration to say that the entire Spin language is defined in spin.y -- it's just (most of) the syntax. There's also some low level token parsing in lexer.c (that parses things like identifiers and numbers) and the preprocessor in preprocess.c. After these have done their things we're left with an abstract syntax tree, which can then be walked to do something else (in spin2cpp's case generate C/C++ code, but in theory you could compile to bytecodes instead).
I got lazy and used the built-in operator precedence handling of bison (and I think other modern yacc's too -- I'm pretty sure I was able to build spin2cpp with byacc). There's a table of operators near the top giving their precendence and whether they are left or right associative (%left and %right, and the order of declarations gives the precedence, with ones that have the same precedence being on the same line). It is of course possible to handle the operator precedence in the rules, it just means adding some more of them.
Or we could imagine converting Spin to some kind of portable assembler, like C or even C++, that could then be compiled on any target with a C compiler. If only there were a tool that could do that :-).
This get's curiouser and curiouser.
As far as I understand the syntax alone is is not enough. Somewhere something has to understand that "1 + 2 * 3" is not only valid syntax but that the expected result is 7 not 9. The pegjs parser generator can do that in one go. From source to AST.
I don't see that in in spin.y. Perhaps I'm missing something.
I'm kind of curious as to how you built the Spin syntax from the available documentation in the first place. It all seems very woolly to me.
Now, I have no intention of diving into actual Spin PUB and PRI methods. But I'm now wondering how on Earth one would deal with the white space indentation with a parser generator like pegjs designed for recursive decent parsing of "normal" languages.?
How far does spin2cpp go?
I mean, does it actually fire up new threads to run what would be COGs running Spin methods as separate processes or what?
-Phil
Well, for sure existing implementations have become a standards. Netscape made JavaScript, warts and all. Microsoft copied JavaScript from Netscape, warts and all. When it came time to standardize JavaScript all those "warts" were baked in. Now the world complains about the JS warts!
Even the simple CON blocks do not lend themselves to recursive decent. They are very line oriented.
As are the DAT blocks of course.
Constant expressions are of course a recursive decent case.
As, no doubt, are spin methods.
-Phil
Not well suited to a tool like pegjs, but never mind. With a little effort that white space indenting of Spin can be accommodated.
But, whilst trying to describe such a simple thing as a CON block you find out how incomplete is the syntax specification in the Propeller manual.
Hence this thread.
It is very wooly, and probably not 100% compatible with the actual Spin compiler. Spin is, unfortunately, under-specified (which is exactly why you started this thread, I'm sure!)
I handled it in lexer.c (the thing that generates tokens like T_IDENTIFIER that Yacc operates on). I keep track of the whitespace indent level there, and when it changes the lexer outputs either a T_INDENT token (when the indentation increases) or T_OUTDENT (when it decreases). The parser can then use those to mark blocks of code.
Eric
It generates a library call to _coginit(). You'd have to provide that function on another platform. Which then opens up the can of worms of what to do with PASM code. Spin2cpp just compiles the PASM to binary blocks. You'd need something like Dave Hein's Spin emulator to run the PASM code on a non-Propeller machine. Spin without PASM isn't very useful in practice.
Yeah, I know, what nut head would want to do that?
But what about compiling PASM instructions into sequences of x86 instructions? The PASM opcode could be used to vector out to whatever sequence implements that instruction.
There would be no idea of any timing determinism but it would sure fly!