pbtc -- an open source PBASIC tokenization toolchain
AJ Milne
Posts: 12
As per the previous thread, this one has its own tokenizer, so should be fully buildable on any target system that has:
. a decent C++ compiler with STL support (this is any halfway recent g++/gcc)
. a libpcre port (Perl-compatible regular expressions library for C/C++; this dependency may go away before long, but I'm pretty sure the lib's pretty ubiquitous anyway)
. Flex and Bison ports
Note, however, that while it may be of interest at this stage for the deeply binary curious, it is _not_ at all ready for average end users as yet. It generates valid code, but only for a subset of the higher level language, which isn't even PBASIC-proper yet (though the migration to this should be pretty painless from here). It's just up in a pre-alpha state for anyone interested in contributing, mostly.
The repository is at:
http://sourceforge.net/p/pbtc/git/ci/master/tree/
... and to get some idea of the higher-level language support progress, see especially the Bison input at:
http://sourceforge.net/p/pbtc/git/ci/master/tree/pbp/pbas.y
Thanks all. Feedback appreciated, but, again, bear in mind: this is not at all yet expected to be an end-user-friendly tool.
Comments
C++? My ability to read C code stands somewhere between my Latin and my Greek; beyond that, ++?
STL support, g++/gcc, libpcre port, Perl support, no idea, despite ubiquity.
Flex and Bison, isn't that what you do at the gym, and what does it have to do with a shaggy prehistoric-looking mammal that hangs around bubbling geysers?! I stand in awe! But thanks again for taking it on.
Work's got a bit mad again, so I've had to slow down a bit, but I've been able to put bits and pieces of hours into it in the evenings this last little while, chipping away. At this rate, I figure I'll probably have full coverage for the BS2 command set from BASIC itself within a week or so. Then there are still lots of fit-and-finish things to make it a little less user-hostile, which only the very unwise would even try to predict, timewise, but it really does look like getting it to useful shouldn't be that big a thing.
Thanks,
Ken Gracey
Yes, I'm interested, and will take your advice about waiting a while.
In the meantime I do have PBASIC working and can get along well with that.
Thank You
P.S. This site really needs a Linux forum.
ex.bs2:
---
... if you ran the standard tokenizer on it, generating ex.tok, you could then run pbttc (the encoder/decoder) in 'disassembly' mode on the .tok file, as follows:
pbttc d ex.tok > ex.detok
... yielding the following:
ex.detok:
---
... that's a 'portable' version of the mnemonic syntax, with the jump targets abstracted to labels. You can also request a 'literal' format, with the addresses and other bits left in place, like this:
pbttc ex.tok -n > ex.ndetok
... yielding:
ex.ndetok:
---
... pbttc can also 'assemble' files written in the portable mnemonic format (I've taken to calling it pbt), either to the literal format (it just works out the 'header' with the start and return address, and resolves the jump addresses) or to a properly encoded .tok file. And this portable format, again, is the format another tool, pbp, the BASIC parser/compiler, _emits_ when reading BASIC input. So you can feed pbp a file like this:
ex.bss:
---
... like this:
pbpc ex.bss > ex.pbt
... and it will construct the following, from the BASIC:
ex.pbt:
---
... which, yes, is identical (excepting some different indents and names for jump target labels) to the detokenized output from the disassembly of the original .tok up there...
So, unsurprisingly, if you then run the 'assembly' version of pbttc on this, like, say, this:
pbttc a ex.pbt ex.ntok
... that .ntok file will be identical (in this case) to the original .tok file emitted by the original tokenizer:
... so it's simply:
BASIC source -> parse with pbp -> emits .pbt stream
.pbt -> assemble with pbttc -> encodes .tok format file.
... and you can thereby encode valid .tok files entirely using the new toolchain, and this should work on any platform on which the two tools will themselves build.
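For anyone who'd rather script the two stages than type them, here is a rough sketch of a Python wrapper. It only strings together the two command lines already shown above (`pbpc ex.bss > ex.pbt` and `pbttc a ex.pbt ex.ntok`); the helper names `build_steps` and `run_steps` are made up for illustration and are not part of pbtc, and it assumes both binaries are on your PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_steps(source):
    """Return the two pipeline steps for a BASIC source file such as ex.bss."""
    src = Path(source)
    pbt = src.with_suffix(".pbt")    # portable mnemonic stream from the parser
    ntok = src.with_suffix(".ntok")  # encoded token file from the assembler
    return [
        # pbpc writes the .pbt stream to stdout, so it gets redirected to a file.
        {"argv": ["pbpc", str(src)], "stdout": str(pbt)},
        # pbttc in 'a' (assemble) mode takes the input and output paths directly.
        {"argv": ["pbttc", "a", str(pbt), str(ntok)], "stdout": None},
    ]


def run_steps(steps):
    """Run each step in order, stopping at the first failure."""
    for step in steps:
        tool = step["argv"][0]
        if shutil.which(tool) is None:
            raise FileNotFoundError(tool + " is not on PATH")
        if step["stdout"] is None:
            subprocess.run(step["argv"], check=True)
        else:
            with open(step["stdout"], "wb") as out:
                subprocess.run(step["argv"], stdout=out, check=True)
```

Calling `run_steps(build_steps("ex.bss"))` would then run the parser followed by the assembler, leaving ex.pbt and ex.ntok next to the source.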
(... caveat: I say it's identical 'in this case' because, in fact, although it happens to be entirely identical in this example, pbp doesn't _always_ make quite the same choices about certain things as did the original--it deliberately organizes some jumps a bit differently, to try to save some extra/unnecessary code in certain cases, and so on. But it mostly comes up with the same thing, bit for bit, so far (and where it doesn't, your end result should be the same, when it runs, at least, and I'm figuring I'll probably add a 'compatibility' mode, in which it really does crank out identical binaries, if this is what's preferred).)
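If you want to check the 'bit for bit' claim on your own programs, something like the following would do it. This is just a generic byte comparison, not anything shipped with pbtc, and the function names are hypothetical; it reports the offset of the first byte where the two token files disagree.

```python
def first_difference(a, b):
    """Offset of the first differing byte between two byte strings, or None."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # No mismatch in the shared part: one input may be a prefix of the other.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


def compare_files(path_a, path_b):
    """Compare two files (say ex.tok and ex.ntok) byte for byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

`compare_files("ex.tok", "ex.ntok")` returning `None` would mean the new toolchain reproduced the original tokenizer's output exactly; an integer points at where the two diverge, which is where to start looking with the disassembler.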
(... and yes, you'll note the input syntax the pbp stage currently expects isn't _quite_ identical to standard PBASIC. But it's not going to be much trouble to make them the same, actually. I did that mostly because I'm a bit more familiar with languages that look like that myself, and it so happened the Bison example I started from was laid out more like that, and I've been focusing so far on making sure it emits good object code, given the same 'sense' of the input program. But if you take a look at the Bison grammar up there, yes, making it entirely the same should be pretty painless, mostly a matter of switching out the semicolons for end-of-lines, adding the terminal 'loop' and 'next' tokens to the grammar spec, and so on. And this is on the todo list (though honestly, I've been getting a bit attached to the new syntax, and may keep it around as an option, as well, anyway).)
That's it. That's pbtc.
... you could maybe just get (and likewise compose for assembly) something like this:
... which, of course, would be a little more comprehensible, again, and definitely nicer for anyone actually trying to _write_ in .pbt, as opposed to using it as an intermediate format/analysis/optimization tool... (And, in fact, I've a Perl version of the disassembler that already does some of that (it does understand string formatters), but it's a bit down the priority list, again, really making that smooth in the working version.)
(ETA: I have now actually implemented half of this: the disassembler will now glob up a series of simple pushes of ints in the ASCII range in serout targets as strings, as above, and the assembler will expand these properly, so this part is solved. Figure it's an obvious enhancement. Formatters may be along in a bit; guess we'll see.)
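To make the string globbing idea concrete, here is a minimal sketch in Python. It is not the pbttc implementation: it models the pushed values as a plain list of ints rather than real pbt mnemonics, takes 'the ASCII range' to mean printable ASCII (32 to 126), and only globs runs of at least two values. All of those are assumptions made for the example.

```python
def is_text(value):
    # Assumption: "ASCII range" is read here as printable ASCII, 32..126.
    return isinstance(value, int) and 32 <= value <= 126


def glob_strings(pushes, min_run=2):
    """Disassembler side: collapse runs of printable-ASCII ints into strings."""
    out, run = [], []

    def flush():
        # Runs shorter than min_run (an assumed threshold) stay as plain ints.
        if len(run) >= min_run:
            out.append("".join(chr(v) for v in run))
        else:
            out.extend(run)
        run.clear()

    for value in pushes:
        if is_text(value):
            run.append(value)
        else:
            flush()
            out.append(value)
    flush()
    return out


def expand_strings(items):
    """Assembler side: expand each string back into one int per character."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.extend(ord(ch) for ch in item)
        else:
            out.append(item)
    return out
```

The point of pairing the two functions is that they round-trip: expanding whatever the globbing step produced gives back the original sequence of pushes, so the readable form loses nothing.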