Spin development tool chain

ruiz · 2012-02-26 13:13

Repost from http://forums.parallax.com/showthread.php?137452-Open-source-Spin-PASM-compiler-in-C-C/page7

Last December I came across the Propeller chip and thought it was interesting. A bit later I came across the propeller based retro computer board 'Hive' and decided to build one: it would be a nice base to do a PDP-11 emulator and make the Hive board run 1970's research unix. Porting one of Fabrice Bellard's C compilers was a relatively quick affair but taught me one thing: debugging tools for PASM/Spin are hard to find and often below par or lack an integrated tool chain. The cluso/jazzed debuggers seem to be okay, but are struggling for compiler support to work; Spinsim seems a good simulator.

What was needed in my opinion was a Spin/PASM compiler that would accept a debug flag and do all the fiddly bits to debug a program. I figured that with Spin being like 'B', with some Pascal blended in and a touch of Python, such a compiler would not be difficult. A BCPL to o-code compiler is about 5000 lines, as is a Pascal to p-code compiler. With the Sphinx compiler available as a guide, I figured I could pump out such a compiler in 10 evenings.

Two days later Roy announced his compiler. After thinking about it for 2 days, I figured there were many reasons for still pushing ahead and so I did. It took about 20 evenings, as I had overestimated how well I understood the PNUT engine, and the expression syntax is somewhat unusual/undocumented. It did come out at about 5000 lines though, and is written in simple C, so that it can compile with Catalina or other, simpler propeller C compilers.

The source is public domain and attached; I will set up a repository shortly. My first goal is to polish the code and to add the debugging features, my second goal is to add dead code elimination

@michael park / sphinx:
- thanks for releasing sphinx - it has saved a ton of reversing engineering work on spin & pnut!
- there is a bug where 'quit' generates a 'jmp' rather than a 'jnz'
- currently I don't see a simple, elegant way to do dead code elimination in a two-pass architecture, but I dislike the idea of going three-pass; penny for your thoughts...

@david hein / spinsim
- thanks for providing spinsim, it has been a great help in developing and debugging hsc!
- there is a bug in spinsim where the 'spr' opcode does not increase the register # with 16
- related, there is a bug where pnut register 0-15 are not handled properly (great loophole for monkey patching pnut on a real propeller, by the way)

@brad / bst
- for example "|<23" is accepted as void expression and leaves garbage on the stack
- a := @a? is accepted and generates strange byte code

@parallax
- what is the intended meaning of spin expression '++a+++b'? It could be '+ + a++ + b' or '++a + ++b' . Or is it intended to be illegal?
- do I now get to be a 'star contributor' :-))

@cluso/jazzed
- your debuggers seem to be a good base to work from. Wouldn't it be nice if one could just type 'hsc -g myprog.spin' and then 'db myprog.binary' and end up in a gdb-like debugging session, switching between spin and pasm, and between cogs as the debugging process may require?
- your help would be gratefully accepted!

@phiphi
- if you want to experiment with syntax & semantic variations for object import, have a look at obj.c: it is short and accessible and should offer a good base for experimentation

Regards,

Paul

Build notes:
- to build hsc just type "gcc -o hsc *.c"
- the code is 32/64 bit clean
- when compiled with "-m32 -Os" options, it is a 40kb binary on x86
- it was developed on osx, but should work on linux and windows
- output file is always named "spin.binary"

Cluso99 · 2012-02-27 17:04

ruiz: Nice work! Unfortunately I know nothing about gcc and gdb but I am willing to help with the debugger in any way I can.

BTW Hanno's ViewPort can do some form of debugging etc using the pc and a high speed serial link. I have not looked at this. It is basically a commercial product.

Roy Eltham · 2012-02-27 17:18

ruiz,
When I compile my stuff with gcc under MinGW, it ends up creating a dependency on a run time lib dll (I forget the name of the dll). What's the option to static link that?
That's part of the reason I currently build my exe with VS2008, because I can set it to static link everything and have no external dependencies.

Roy

ruiz · 2012-02-28 03:14

Some reposts, just to get all lines of thought into one thread:

>>>About the test harness:

I'm not sure what Roy's plan is for a test harness, but my plan is as follows:
- have a bunch of files with small test case spin files
- each file has the 'right' result in a comment, including compiler rejection
- compile each test with hsc, compiler exits with an error code if there is a syntax etc. error
- run output file against spinsim, test return value against expected result
- use access to dcurr to test for push/pop balance
The test harness would be a simple shell script iterating through the test file, using eg. grep or sed to extract the 'right' result

I figure a few hundred tests are sufficient for confidently refactoring and extending hsc.

>>> on debug info

jazzed wrote: »

The SPUD debugger can easily lookup BSTC list file lines generated with the -ls option, so I guess that would be one output target of a spin compiler. SPUD gets lost in the call stack sometimes though and some operations are not supported (random?).

Getting debug info from parsing a listing file is the world upside down and I had to write hsc to get around that issue: a proper compiler should generate usable debug info, period.

Let's forego smart data for a bit, and look at what the debugger needs:
- Find the right source line from a pcurr value. The compiler can generate a 32K array with the line info for each possible value. If the the file cannot be found at the path given, the debugger simply gives up
- Find the value of a symbolic name, given pcurr, dbase, vbase and pbase. Imagine that the compiler generates a list of all names in its symbol table, including local symbols. A name can appear more than once, but each name has a range for the first and last value of pcurr where it is valid. Each symbol has an associated offset versus dbase (local vars), vbase ('global' vars) or pbase (dat vars and labels)
- Backtrace the call stack. Dcurr always points to the start of the current stack frame and this has the previous values of pcurr, dbase, vbase and pbase.
Of course, one would use a smarter encoding in a real compiler/debugger combo.

The only tougher thing -- given a file with proper debug info -- would be finding the value of the counter in REPEAT <expr> statements, as that would require the compiler to keep track of the scratchpad stack to locate the counter location. As it is anonymous, I'm not sure how the programmer would reference it, perhaps by using the keyword COUNTER and the line number of the repeat statement that defines it.

>>> on next steps

Rayman wrote:

ruiz, I think your code is great, nice work. I hope you post it to the forum in a new thread with that fix.

I have already done that, but it sits in the forum moderation queue. Send me your e-mail address and I will send you the source by return.

I'm not sure what causes the 4 byte size difference, but I think it might be differences in struct packing between gcc and visual studio. I have not bothered with MS tools after MSVC 6 and I am not planning too, so I need you input in getting that compile to work.

Note that I am not aiming to generate byte code that is byte for byte identical, although in its current state it will typically do so.

Thank you for your nice comment, but there is a ton of polishing that I still need to do:
- cleaning up the lexer, splitting the lexing and the utf stuff over two files ("one file per concern")
- adding a pool based allocator to replace some of the fixed size tables
- replacing fixed length string arrays with a global string table
- better abstraction of the backend (interface to pnut.c and link.c, splitting parse.h into parse.h and pnut.h)
- merge the constant expression code with the regular expression code -- having two expression parsers is non-sensical
- ....

And that is not counting the enhancements
- adding a preprocessor to the lexer
- generating a debug info file
- generating a listing
- adding dead code elimination
- ...

ruiz · 2012-02-28 03:27

Roy Eltham wrote: »

When I compile my stuff with gcc under MinGW, it ends up creating a dependency on a run time lib dll (I forget the name of the dll). What's the option to static link that?
That's part of the reason I currently build my exe with VS2008, because I can set it to static link everything and have no external dependencies.

Well, I'm rusty on windows but perhaps the following is of help:
- I'm sure you already tried '-static'
- I think even with -static the resulting binary still depends on 'msvcrt.dll', and through that on 'kernel32.dll'. These files ship with every copy of windows for the last 10 or 15 years so there is no point in linking them statically; I even doubt that windows itself can run without them.
- Perhaps VS2008 is different, but MSVC6 cannot produce a dll free executable: when specifying a static build 'kernel32.dll' is still a dynamically linked dependency.
However, -static should statically link in the C++ runtime, which is where typically the problems are (many versions, none ships by default)

Roy Eltham · 2012-02-28 09:54

I should have said "no external dependencies except the obvious windows ones like kernel32.dll".

The one that was happening when I used MinGW's gcc with default options was libgcc_s_dw2-1.dll (which is a 116KB). I had not tried -static, thanks.

When I compile using gcc that comes with MinGW without -static my exe is 157KB, when I compile with -static my exe is 298KB. I tried compiling using the same -m32 -Os options you mentioned and my exe ends up being 132KB.

When I compile using VS2008 in release with static linking of the runtimes my exe is 169KB. Hardly anything uses plain old msvcrt.dll anymore. There are newer versions with each compiler release (and sometimes with service packs or other updates). So you can't rely on the correct one being installed on every machine. Either you have to get the end user to install the proper one, or use static linking of the runtime.

ruiz · 2012-02-28 12:53

Roy Eltham wrote: »

When I compile using gcc that comes with MinGW without -static my exe is 157KB, when I compile with -static my exe is 298KB. I tried compiling using the same -m32 -Os options you mentioned and my exe ends up being 132KB.

I think you may have forgotten to 'strip' the executable, which should take it down to about 100KB, which is what I would expect for your code base.

Hardly anything uses plain old msvcrt.dll anymore

I doubt that's true, but even if it were it doesn't change the fact that it is present on every windows box out there, just like kernel32.dll. Anyhow, I am happy that you now have two choice for dependency free executables.

When I compile using VS2008 in release with static linking of the runtimes my exe is 169KB.

Wow... MSVC6 used to generated slightly smaller code than gcc. I had not realised Microsoft had fallen so far behind in compilers. I certainly hope that the current Sinofsky reorg will get them back on track again.

pedward · 2012-02-28 13:27

If what I think is happening, is happening, you are using something like Cygwin, and they have a compatibility DLL that provides all of the Posix interfaces to Windows. This does not produce a Windows pure executable, but compiles for Unix and uses the DLL to provide the translation.

I just looked up MinGW, and it purports to not need any 3rd party DLL files, other than the standard MSVCRT. However I wonder if you compiled with debugging support and that's what the extra DLL is for?

Maybe you need BC4.5.2

Roy Eltham · 2012-02-28 13:55

I guess you could say I forgot to "strip" the exe, but it's more like I didn't strip it because I've never stripped an exe before.

The VS2008 release build being 169KB was not optimized for size, so it could probably be smaller. It's also static linking it's version of the msvcrt90.

Rayman · 2012-02-28 14:15

ruiz, I'm going to try your compiler again, is the top post here the latest version now?
(It's not stuck in moderation anymore, right?)

I'm going to try Roy's too as a just noticed he's posted an alpha version...

jazzed · 2012-02-28 14:22

Rayman wrote: »

(It's not stuck in moderation anymore, right?)

Rayman do you know what it means?
I'm scratching my head over this since ruiz mentioned it a few days ago.
I don't thik I've never seen an upload stuck in moderation.

Thanks,
--Steve

Phil Pilgrim (PhiPi) · 2012-02-28 14:25

There is no moderation queue. Posts are only moderated, if necessary, after the fact.

-Phil

jmg · 2012-02-28 14:46

ruiz wrote: »

currently I don't see a simple, elegant way to do dead code elimination in a two-pass architecture, but I dislike the idea of going three-pass; penny for your thoughts...

Why the aversion to three-pass, on today's PC's ?? How many milliseconds do you expect that to cost ?
(and it will be optional anyway...)

Roy Eltham · 2012-02-28 14:49

Phil,
The moderation thing he's talking about is the vbulletin thing where new users have their first few posts moderated somehow. I ran into this on another forum. After the first few posts "cleared" then I was able to post normally.

Phil Pilgrim (PhiPi) · 2012-02-28 14:56

How is it, then, that spam posts from new users are able to get by so easily?

-Phil

Dave Hein · 2012-02-28 14:59

jmg wrote: »

Why the aversion to three-pass, on today's PC's ?? How many milliseconds do you expect that to cost ?
(and it will be optional anyway...)

Modern compilers do many passes to implement pre-processing, intermediate language generation, low-level language generation, assembly generation and the final assembly step. There are also optimization steps and linking. Using multiple passes makes each pass easier to code.

It would be nice to add an optimization pass that could convert "x += 1" to "x++", or merge multiple statements into a single statement to eliminate reduncant stack loads.

Phil Pilgrim (PhiPi) · 2012-02-28 15:00

ruiz wrote:

currently I don't see a simple, elegant way to do dead code elimination in a two-pass architecture, but I dislike the idea of going three-pass; penny for your thoughts...

Dead code should have been eliminated as a preprocessor function. So the compiler would never see it anyway. In fact, most language enhancements can be performed in a preprocessor, including optimizations. There's no reason to mess with the compiler code, in most cases, as long as there's a mapping from unoptimized Spin code to optimum code.

-Phil

pedward · 2012-02-28 15:01

Roy Eltham wrote: »

I guess you could say I forgot to "strip" the exe, but it's more like I didn't strip it because I've never stripped an exe before.

The VS2008 release build being 169KB was not optimized for size, so it could probably be smaller. It's also static linking it's version of the msvcrt90.

In Unix, -g debug doesn't do much more than add symbolic information to the object file. However, in windows it may be that -g debug causes the compiler to add preamble information that requires an external DLL to glue everything together. It's best to compile without -g. Stripping an executable only makes it a bit smaller, but it also removes any symbolic information that can be used for bug reporting.

Roy Eltham · 2012-02-28 15:21

Phil Pilgrim (PhiPi) wrote: »

How is it, then, that spam posts from new users are able to get by so easily?

-Phil

Maybe it's a feature that was off before, and was recently turned on?

Phil Pilgrim (PhiPi) · 2012-02-28 15:29

'Has to have been since this weekend, if it was, since there's been reported spam during that interval. Plus, Browser has not received any premoderation requests via email. So I doubt that the feature has been enabled.

-Phil

jmg · 2012-02-28 15:49

Phil Pilgrim (PhiPi) wrote: »

Dead code should have been eliminated as a preprocessor function. So the compiler would never see it anyway. In fact, most language enhancements can be performed in a preprocessor, including optimizations. There's no reason to mess with the compiler code, in most cases, as long as there's a mapping from unoptimized Spin code to optimum code.

-Phil

Depends on the type of 'dead code' - a preprocessor is useful, and can reduce the work of the compiler, but there are some removals/optimize that are best done close to the binary image.
eg Library calls, and linking, can mean you do not know what is 'dead code' until near the end, and local variable overlay is also a late operation.

- and on resource limited parts like Prop, all the optimize you can get matters... especially if it is robotic-housekeeping optimize, which PC's do so much better than humans...

Phil Pilgrim (PhiPi) · 2012-02-28 15:54

My CLEAN pre-processor eliminates unused methods in all external objects. That's about as close as Spin comes to "library calls."

-Phil

ruiz · 2012-02-29 04:34

Why the aversion to three-pass, on today's PC's ?? How many milliseconds do you expect that to cost ?

It is not an issue of speed, it is an issue of architectural beauty. Now beauty is in the eye of the beholder, so you could say it is a personal issue :^)

Actually, it seems that there is no need for a third pass, here is the algorithm:

- Assume that (cross) recursive object importation is illegal. The Spin manual is silent on the topic, but it seems reasonable. BST correctly flags the condition, PropTool bombs out with a nesting depth error, and I just noticed hsc bombs out with an obscure symbol name error; it is structurally impossible in Sphinx.
- Under that assumption, it is possible to sort all the objects in a program so that no later object refers to an earlier object.
- Now we compile the objects in sorted order. Every time a reference is made to an object method it is marked as "alive" in the symbol table.
- When a method is due to be compiled, its "alive"/"dead" flag is checked and when dead its compilation is skipped. The import index table in earlier objects would have to be fixed up, so it is cheating the '2 pass' thing a bit -- but a related fix up already occurs in the current code (see post to Dave below).

Leaves the issue of "aliveness" within a single object. This can be handled in pass 1, where every identifier that appears in the body of a method is flagged as "alive". This would also allow for warnings about unused variables. It would not catch all dead code (it won't mark as dead a group of (cross) recursive methods that cannot be reached from the main method), but I guess this is a rare condition in real code.

The above algorithm would only add some 100200 lines to hsc. It is still brewing in my head, I think it can still be improved upon -- it may even be incorrect.

ruiz · 2012-02-29 04:46

@Dave

You are absolutely right. Actually, I'm cheating with the the '2 pass' claim, because the pnut code generator makes 2 passes over the method AST and the linker makes a pass over the compiled binary to fix up object base addresses -- I guess nobody has actually read the code :^) The 2 pass claim is correct however in the sense that only two passes over the source code are made.

Currently, I chose to build an AST per method, as this is sufficient for most little optimizations:
- efficient SDI resolution
- eliminating jumps to jumps, returns or aborts
- constant propagation, strength reduction, cse elimination, ...
- pnut specific optimizations (var[0] to var, x:=0 to x~, etc.)

One of the things that I am pondering is whether to re-architect the code base from building an AST per method to building a full program AST. It would make it harder to run hsc on the Hive board itself, which I had in the back of my mind, but it could have many benefits as well:
- it might slice some 500 lines of fat out of the current code
- it makes dead code / dead var detection a trivial routine
- it would allow to trivially implement the inlining of accessor methods

merge multiple statements into a single statement to eliminate reduncant stack loads

I'm not sure what you mean by this, please explain.

mpark · 2012-02-29 04:53

Kudos, ruis! I'm glad you were able to get something out of Sphinx.

Are you saying you have a Prop emulating a PDP-11? Do tell. And yeah, definitely worth starting yet another thread for that

ruiz · 2012-02-29 12:42

@mpark
Good to see you in this thread. No, I don't have the pdp-11 emulator (yet). As a precursor I ported Fabrice Bellard's fbcc compiler to the Hive because it can do both K&R and ansi -- kinda handy when you want to play with research Unix. It was easy because fbcc compiles to a VM which is quite similar to ZPU/Zog. The source is in a thread on the Hive forum. However, I did learn that the available debugging tools would be inadequate to comfortably develop the pdp-11 emulator. A proper tool chain required a Spin/PASM compiler that could output debug info -- that did not exist and did not seem to come about: reading through half this forum nobody ever seemed to find comfortable debugging a priority -- possibly except 'jazzed' and 'cluso99'. So I figured I needed to write a compiler/debugger combo to get the pdp-11 project done comfortably. Kinda like developing gdb because you want to debug emacs :^)

Actually, it would be nice to compare notes on debugging info and file formats, so that Roy's compiler, homespun and hsc can use the same formats. Pity that Brad is MIA. A first stab at the required info is in an earlier post to jazzed. I've been reading up on the Unix DWARF format to see prior art. It has a nice algorithm to compress the line number table to about one byte per instruction. Symbol table compression seems to be no more than bit-stuffing and alike.

Have a read through the hsc source code: I think you will find it has a few tricks that translate well to Sphinx (and possibly homespun if it has architectural similarity).

@jazzed
If you are in the final push to complete PropGCC/GDB you are probably up to your eyeballs in DWARF. Do you have time to compare notes?

Paul

Spin development tool chain

Comments