PropTiny - Jack Crenshaw's TINY language for the Prop
heater
Posts: 3,370
Some time ago I worked my way through Jack Crenshaw's famous series of articles "Let's Build a Compiler". For those who know a little about programming but nothing about creating compilers this fabulous introduction to that black art. Even if creating a compiler is not your goal the simple parsing techniques described are worth a look at as they have many uses elsewhere.
compilers.iecc.com/crenshaw/
As I worked through the series I worked the examples in C rather than Pascal as used by Jack and generated code for the Propeller rather than the original target, Motorola 68000.
The result is an implementation of the TINY language defined in the series that generates PASM code for the Prop to be run under a Large Memory Model (LMM) virtual machine.
TINY is a very simple block structured language in the style of Pascal and all.
If anyone is interested the results of this effort are attached. See the enclosed README.txt for instructions on building the compiler and compiling/asembling/downloading TINY programs.
A simple example program with most TINY features implemented so far is included.
Just wish I had time available to grow this into the more useful KISS language (or similar) that Jack describes in the series.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
compilers.iecc.com/crenshaw/
As I worked through the series I worked the examples in C rather than Pascal as used by Jack and generated code for the Propeller rather than the original target, Motorola 68000.
The result is an implementation of the TINY language defined in the series that generates PASM code for the Prop to be run under a Large Memory Model (LMM) virtual machine.
TINY is a very simple block structured language in the style of Pascal and all.
If anyone is interested the results of this effort are attached. See the enclosed README.txt for instructions on building the compiler and compiling/asembling/downloading TINY programs.
A simple example program with most TINY features implemented so far is included.
Just wish I had time available to grow this into the more useful KISS language (or similar) that Jack describes in the series.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
zip
60K
Comments
I have been looking over the code (saw it on a previous post), and I have found both your implementation and Jack Crenshaw's series very informative. Thanks to you both (if Jack is still roaming around the forum)!
--Rich
Some years ago I managed to get a nice version of KISS compiling to x86 assembler, again it was written in C rather than Pascal. Supported signed/unsigned BYTES, WORDS, LONGS. The assembler output was GCC syntax for use on Linux.
Sadly I seem to have lost all that project when I had multiple hard disk failures a few years back. Otherwise I would have taken that changed the code generation to PASM. Perhaps I still have it buried in some backups somewhere, it has not turned up yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
I had been thinking of modifying TINY to emit PASM/Spin source (like Catalina) in order to get it to compile using BSTC (adding VAR and DAT sections as you had mentioned). Was there a reason that you had decided to use the propasm complier though?
Now that we have homespun and bst it seems more realistic to have Tiny emit a Spin header to create a proper object for compilation. That way we have a choice of compilers and can easily add the resulting object to a project with other drivers etc.
It would be great if you would do such a modification. I don't really have any time nowadays to work on it. Also it needs multiply and divide fixing up and a way to return values from functions. If you have a moment free [noparse]:)[/noparse]
Perhaps the code generation and the LMM kernel are very inefficient and could use optimization but I'm happy to keep the thing as simple as possible in the spirit of the original TINY.
One thing I did want to do though was add some/most/all of the Spin operators to Tiny as we have an audience here who are familiar with them.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
My long lost version of Crenshaw's KISS language generating code for x86 has been found.
As I mention above, some years back I worked through the Crenshaw tutorials and followed along by writing my compiler in C, generating x86 assembler for GCC to be run on Linux.
After a bout of hard drive failures on different machines in quick succession I was sure that I had lost it forever. But today I found it, in a archive within an archive on a barely functioning old hard drive that was swapped out years ago.
I had forgotten how far I got, this thing supports functions and procedures and signed/unsigned bytes, longs, words and there is quite a test suite for it. No arrays strings or structures.
This might be cheeky but I'm posting it here as a back up Well that and the thought that others may want to see it and that I'd quite like to make a version of a TINY like language for direct compilation to PASM to run in COG.
This also means that you can compile this to run under Zog on the Prop with a C3 or GG_SDRAM card. Which has the freaky result that you can now compile a high'ish language to assembler on the Prop that is targeted at your PC. Weird huh?
But here is the thing. In the back of my mind is to change the code generator to emit PASM for execution directly in a COG. No LMM or such. Probably changing the language as well, no recursive calls, simplified expressions, more operators, whatever, to make it PASM friendly.
Then it world be a C or Pascal like, block structured, language in which to write COG code. All it then needs is an assembler and then we have a system to build COG code on the Prop itself using Zog or Catalina.
I probably should not think about tis to much, I have a lot of other things to be doing.
Edit: Oops. I thought I was posting this to "The Forth Dimension" topic. I guess discussion of Tiny is okay in this topic. Sorry!
Good grief, this is a rave from the grave. That was all started a long time ago. Now I'm a demented old man who hardly remembers doing it
I recall the recursive fibo() was an essential test of function calling that just had to be done.
I'm pretty certain that all parameters were passed on the stack. Certainly I don't recall having registers at all.
I also seem to remember that I never got return values formally worked out. What ever was on the stack when the function returned was the return value. Or something like that.
Was there ever a benchmark? No idea. It's an exercise for the reader to build the compiler and generate the fibo test for the Propeller.
Oh, Smile, thanks, now I'm going to have to dig that code out and try to figure out where I got with it.
There is a David Betz in 2012 apologizing for posting to the "The Forth Dimension" thread when in fact his comment was on topic here.
Then there is this other David Betz who claims it was not him making that post.
Then there is me, who is only slowly remembering I was ever here
Anyway, I'm just searching all my machines for the last possible versions of PropTiny and such. I'd rather have this stuff in github so I don't have to worry about it any more.
@David, (or the other David)
The emphasis here is on "tiny". That is, a very small language syntax and semantics, also a very tiny understanding of compiler technology. So, the language is dead simple, which makes the lexer/parser dead simple and the code generation is dead simple. Stack based is the easiest way to go (Ask the Forth guys). Optimizing that and getting to register allocation, spilling etc is, I guess, an order of magnitude more complex.
So, no, my compiler does not do that. Frankly, I was amazed that I got it to work at all!
But, whilst we are here. Recently I have been using a parser generator to generate parsers for some funky text based data formats we have to deal with. Such a parser generator can be used to parse a programming language and interpret it on the fly or create an AST for later code generation perhaps with intermediate optimization phases.
The parser generator in question is pegjs http://pegjs.org/. Yes, I'm toying with making a parser for Spin in in JavaScript. Perhaps a small step toward a Spin compiler.
It's not clear to me yet how pegjs can handle the white space block scoping of spin though.
Let us into the joke. What is the "Heh...heh...heh..." about?
Anyway thanks for the doc. All the Crenshaw articles in one place.
Jack Crenshaw did honour us with his presence on this forum once or twice. Have not heard from him since.
The Sphinx version of Meta2 was also called Ouroboros
TINY language is merely a 391 page document to read, not exactly tiny. Nonetheless, it is an interesting item I'd not looked at before. I had tried Yacc and Bison tutorials.
Ah, yes, I see what you mean about the "tiny" paper.
Still the Tiny language as described is very small conceptually only requires a small number of lines of code to implement.
Crenshaw's compile paper is long because he spends a lot of time discussing the pros and cons of all kind of language features and how they have worked out historically in actual language designs. For example, the simple problem of comments in code, do you want line comments with "//" or block comments with "/*" and "*/" or something completely different? Up to more advance design decisions, do you want types or not, if so what kind of types?
In reality there is no Tiny language. Jack presents a lot of options for languages features and examples of how they may be implemented in simple code. He encourages the reader to learn the concepts and design their own language not just recreate what he has in Tiny. Or just use what they have learned in other problems that require lexing and parsing.
Thing is, if you are not a computer scientist and have never studied such things, picking up most books on compiler writing is pretty hopeless. Totally impenetrable. But Jacks discourse is understandable by anyone who has learned to write code in a language like Pascal or C etc. He skips the huge complexity of creating abstract syntax trees, intermediate representations, optimization etc etc. He leaves you with something you can build for your self in a reasonable number of hours.
Let's Build a Compiler is brilliant. A highly recommended read.
In truth, it just might drive me to study Chinese more.... actually making a lot of progress with that lately after 20 years of immersion.
Perhaps you were not aware of alternatives to YACC and BISON. I will guess though that you may have written a parser before. If you have ever received ASCII message from a serial line or text from a file and extracted data elements out of it then you have been writing your own hand made parser for whatever the syntax of the message/file was.
The "trick" for me in Let's Build A Compiler was to show how simple it can be to actually generate code for a target processor as the parser parses. How it does not actually require all that heavy weight intermediate representation, ASTs, etc.
There is a point of view that all software is a kind of language parser/compiler. Programs take input, process it and produce output. That input has a well defined structure/format that can, if you squint hard enough, be seen as just another language to be parsed and responded to accordingly.
I too have looked at YACC/BISON and did not like the look of it. Just now I have been playing with a very simple parser generator. You can feed grammar definitions into it that look like this: I think you might immediately see how that defines what integer values look like in your language and then what a simple addition expression looks like. The bits in the curly brackets specify what to do when finding the defined construct in the input text. In this case it immediately performs the addition and returns the the result. The parser code generated from this is an interpreter for your language already! With effort one could have it generate code on the fly for some processor and it would then generate a compiler for you.
This is the pegjs parser generator, you can read about it here: http://pegjs.org/ and try it out with an online version here http://pegjs.org/online. If you simply remove all those bits in the braces it will simply return a data structure that is your abstract syntax tree that can then be processed more like a normal compiler. It really is quite magical.