pbtc -- an open source PBASIC tokenization toolchain
As per the previous thread, this one has its own tokenizer, so should be fully buildable on any target system that has:
. a decent C++ compiler with STL support (this is any halfway recent g++/gcc)
. a libpcre port (Perl-compatible regular expressions library for C/C++; this dependency may go away before long, but I'm pretty sure the lib's pretty ubiquitous anyway)
. Flex and Bison ports
Note, however, that while it may be of interest at this stage for the deeply binary curious, it is _not_ at all ready for average end users as yet. It generates valid code, but only for a subset of the higher level language, which isn't even PBASIC-proper yet (though the migration to this should be pretty painless from here). It's just up in a pre-alpha state for anyone interested in contributing, mostly.
The repository is at:
http://sourceforge.net/p/pbtc/git/ci/master/tree/
... and to get some idea of the higher-level language support progress, see especially the Bison input at:
http://sourceforge.net/p/pbtc/git/ci/master/tree/pbp/pbas.y
Thanks all. Feedback appreciated, but, again, bear in mind: this is not at all yet expected to be an end-user-friendly tool.
Comments
c++ My ability to read c code stands somewhere between my latin and my greek, beyond that ++?.
STL support, g++/gcc, libpcre port, Perl support, no idea, despite ubiquity.
Flex and Bison, isn't that what you do at the gym, and what does it have to do with a shaggy prehistoric-looking mammal that hangs around bubbling geysers?! I stand in awe! But thanks again for taking it on.
Work's got a bit mad again, so I've had to slow down a bit, but I've been able to put bits and pieces of hours into it in the evenings, last little while, chipping away. At this rate, I figure I'll probably have full coverage for the BS2 command set from BASIC itself within a week or so. Then there are still lots of fit-and-finish things to do to make it a little less user-hostile, and only the very unwise would even try to predict how long those will take, but it really does look like getting it to useful shouldn't be that big a thing.
Thanks,
Ken Gracey
Yes I'm interested, and will take your advice about waiting awhile.
In the meanwhile I do have pbasic working and can well get along with that.
Thank You
P.S. This site really needs a Linux forum.
ex.bs2:
---
a var byte
b var byte
c var word
b = 1
c = 650
for a = 1 to 8
  debug "test: ", dec a, cr
  c = c + 25
  gosub l3
next
freqout 4, 2000, 3000
end
l3:
  pulsout a, c
  pause 20
  return
... if you ran the standard tokenizer on it, generating ex.tok, you could then run pbttc (the encoder/decoder) in 'disassembly' mode on the .tok file, as follows:
pbttc d ex.tok > ex.detok
... yielding the following:
ex.detok:
---
    . 1 . set_var_byte 09 vset
    . 650 . set_var_word 03 vset
    . 1 . set_var_byte 08 vset
label_1:
    . 84 . 16 serout
    116 ee lc 101 ee lc 115 ee lc 116 ee lc 58 ee lc 32 ee lc
    438 . get_var_byte 08 ee lc
    13 ee le
    . get_var_word 03 . 25 opr+ . set_var_word 03 vset
    gosub label_0
    . 8 . 1 loop_cmp_step_jmp get_var_byte 08 . 1 ee set_var_byte 08 ee adr. label_1
    . 4 . 2000 . 3000 freqout
    end
label_0:
    . get_var_word 03 . get_var_byte 08 pulsout
    . 20 pause
    return
... that's a 'portable' version of the mnemonic syntax, with the jump targets abstracted to labels. You can also request a 'literal' format, with the addresses and other bits left in place, like this:
pbttc ex.tok -n > ex.ndetok
... yielding:
ex.ndetok:
---
000.0 s_addr 003.4
001.6 adr. 02b.3
003.4 . 1
004.4 . set_var_byte 09
006.0 vset
006.7 . 650
009.0 . set_var_word 03
00a.3 vset
00b.2 . 1
00c.2 . set_var_byte 08
00d.6 vset
00e.5 . 84
010.3 . 16
011.3 serout
012.2 116
013.7 ee
014.0 lc
014.1 101
015.6 ee
015.7 lc
016.0 115
017.5 ee
017.6 lc
017.7 116
019.4 ee
019.5 lc
019.6 58
01b.2 ee
01b.3 lc
01b.4 32
01c.3 ee
01c.4 lc
01c.5 438
01e.4 . get_var_byte 08
020.0 ee
020.1 lc
020.2 13
021.4 ee
021.5 le
021.6 . get_var_word 03
023.1 . 25
024.5 opr+
025.4 . set_var_word 03
026.7 vset
027.6 gosub 03b.3 r. 02b.3
02b.3 . 8
02c.3 . 1
02d.3 loop_cmp_step_jmp
02e.2 get_var_byte 08
02f.5 . 1
030.5 ee
030.6 set_var_byte 08
032.1 ee
032.2 adr. 00e.5
034.0 . 4
035.0 . 2000
037.2 . 3000
039.5 freqout
03a.4 end
03b.3 . get_var_word 03
03c.6 . get_var_byte 08
03e.2 pulsout
03f.1 . 20
040.5 pause
041.4 return
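For anyone poking at the literal listing, here is a minimal Python sketch of the address arithmetic. It rests on an assumption the post does not state outright: that an address like 03b.3 is a hexadecimal byte offset followed by a bit index from 0 to 7. The digit after the dot never exceeds 7 in the listing, which is at least consistent with that reading. The function names bit_offset and widths are made up for illustration and are not part of pbttc.

```python
def bit_offset(addr):
    """Turn 'HHH.B' into an absolute bit offset.

    Assumption (not confirmed by the post): HHH is a hex byte offset
    and B is a bit index in the range 0-7.
    """
    byte_part, bit_part = addr.split(".")
    bit = int(bit_part)
    if not 0 <= bit <= 7:
        raise ValueError("bit index out of range in " + addr)
    return int(byte_part, 16) * 8 + bit


def widths(entries):
    """Pair each token with the distance in bits to the next entry."""
    result = []
    for (addr, token), (next_addr, _) in zip(entries, entries[1:]):
        result.append((token, bit_offset(next_addr) - bit_offset(addr)))
    return result


# The first few entries of the literal listing, copied from the post.
listing = [
    ("003.4", ". 1"),
    ("004.4", ". set_var_byte 09"),
    ("006.0", "vset"),
    ("006.7", ". 650"),
    ("009.0", ". set_var_word 03"),
    ("00a.3", "vset"),
    ("00b.2", ". 1"),
]

for token, width in widths(listing):
    print(width, "bits:", token)
```

Under that assumption, both vset entries come out at 7 bits and the small literal at 8, which hints at a variable-width bit encoding, though that is an inference from this one listing rather than anything the post documents.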
... pbttc can also 'assemble' files written in the portable mnemonic format (I've taken to calling it pbt), either to the literal format (it just works out the 'header' with the start and return address, and resolves the jump addresses) or to a properly encoded .tok file. And this portable format, again, is the format another tool, pbp, the BASIC parser/compiler, _emits_, when reading BASIC input. So you can feed pbp a file like this:
ex.bss:
---
a var byte;
b var byte;
c var word;
b = 1;
c = 650;
for a = 1 to 8 {
  debug "test: ", dec a, cr;
  c = c + 25;
  gosub l3;
}
freqout 4, 2000, 3000;
end;
l3:
  pulsout a, c;
  pause 20;
  return;
... like this:
pbpc ex.bss > ex.pbt
... and it will construct the following, from the BASIC:
ex.pbt:
---
    . 1 . set_var_byte 09 vset
    . 650 . set_var_word 03 vset
    . 1 . set_var_byte 08 vset
L000:
    . 84 . 16 serout
    116 ee lc 101 ee lc 115 ee lc 116 ee lc 58 ee lc 32 ee lc
    438 . get_var_byte 08 ee lc
    13 ee le
    . get_var_word 03 . 25 opr+ . set_var_word 03 vset
    gosub l3
    . 8 . 1 loop_cmp_step_jmp get_var_byte 08 . 1 ee set_var_byte 08 ee adr. L000
    . 4 . 2000 . 3000 freqout
    end
l3:
    . get_var_word 03 . get_var_byte 08 pulsout
    . 20 pause
    return
... which, yes, is identical (excepting some different indents and names for jump target labels) to the detokenized output from the disassembly of the original .tok up there...
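To check the "identical except for indents and label names" claim mechanically, one option is to rename the labels in both streams to a canonical form and compare token by token. The Python sketch below does that for the two listings in this post. It assumes that label definitions are exactly the tokens ending in ':' and that any other token spelled the same as a label is a reference to it. Both are guesses about the pbt format, and normalize_labels is a hypothetical helper, not something either tool provides.

```python
def normalize_labels(stream):
    """Split a .pbt stream on whitespace and rename labels to T0, T1, ...

    Labels are numbered in order of definition, so two streams that
    differ only in label names and indentation normalize identically.
    Assumption: a token ending in ':' defines a label.
    """
    tokens = stream.split()
    names = {}
    for tok in tokens:
        if tok.endswith(":") and tok[:-1] not in names:
            names[tok[:-1]] = "T%d" % len(names)
    out = []
    for tok in tokens:
        if tok.endswith(":"):
            out.append(names[tok[:-1]] + ":")
        else:
            out.append(names.get(tok, tok))  # rename references, keep the rest
    return out


# Disassembly of the original .tok (ex.detok), as given in the post.
DETOK = """
. 1 . set_var_byte 09 vset . 650 . set_var_word 03 vset
. 1 . set_var_byte 08 vset label_1: . 84 . 16 serout
116 ee lc 101 ee lc 115 ee lc 116 ee lc 58 ee lc 32 ee lc
438 . get_var_byte 08 ee lc 13 ee le
. get_var_word 03 . 25 opr+ . set_var_word 03 vset gosub label_0
. 8 . 1 loop_cmp_step_jmp get_var_byte 08 . 1 ee set_var_byte 08 ee
adr. label_1 . 4 . 2000 . 3000 freqout end
label_0: . get_var_word 03 . get_var_byte 08 pulsout . 20 pause return
"""

# Output of the BASIC parser (ex.pbt), as given in the post.
PBT = DETOK.replace("label_1", "L000").replace("label_0", "l3")

print(normalize_labels(DETOK) == normalize_labels(PBT))
```

Here PBT is built from DETOK by substituting the label names that appear in the ex.pbt listing, which is only valid because the two listings agree on every other token.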
So, unsurprisingly, if you then run the 'assembly' version of pbttc on this, like, say, this:
pbttc a ex.pbt ex.ntok
... that .ntok file will be identical (in this case) to the original .tok file emitted by the original tokenizer:
... so it's simply:
BASIC source -> parse with pbp -> emits .pbt stream
.pbt -> assemble with pbttc -> encodes .tok format file.
... and you can thereby encode valid .tok files entirely using the new toolchain, and this should work on any platform on which the two tools will themselves build.
(... caveat: I say it's identical 'in this case' because, in fact, although it happens to be entirely identical in this example, pbp doesn't _always_ make quite the same choices about certain things as did the original--it deliberately organizes some jumps a bit differently, to try to save some extra/unnecessary code in certain cases, so on. But it mostly comes up with the same thing, bit for bit, so far (and where it doesn't, your end result should be the same, when it runs, at least, and I'm figuring I'll probably add a 'compatibility' mode, in which it really does crank out identical binaries, if this is what's preferred).)
(... and yes, you'll note the input syntax the pbp stage currently expects isn't _quite_ identical to standard PBASIC. But it's not going to be much trouble to make them the same, actually. I did that mostly because I'm a bit more familiar with languages that look like that myself, and it so happened the Bison example I started from was laid out more like that, and I've been focusing so far on making sure it emits good object code, given the same 'sense' of the input program. But if you take a look at the Bison grammar up there, yes, making it entirely the same should be pretty painless, mostly a matter of switching out the semicolons for end-of-lines, adding the terminal 'loop' and 'next' tokens to the grammar spec, so on. And this is on the todo list (though honestly, I've been getting a bit attached to the new syntax, may keep it around as an option, as well, anyway).)
That's it. That's pbtc.
... . 84 . 16 serout 116 ee lc 101 ee lc 115 ee lc 116 ee lc 58 ee lc 32 ee lc 438 . get_var_byte 08 ee ...
... you could maybe just get (and likewise compose for assembly) something like this:
... . 84 . 16 serout "test: " lc dec . get_var_byte 08 ee ...
... which, of course, would be a little more comprehensible, again, and definitely nicer for anyone actually trying to _write_ in .pbt, as opposed to using it as an intermediate format/analysis/optimization tool... (And, in fact, I've a Perl version of the disassembler that already does some of that (it does understand string formatters), but it's a bit down the priority list, again, really making that smooth in the working version.)
(ETA: I have now actually implemented half of this: the disassembler will now glob up a series of simple pushes of ints in the ASCII range in serout targets as strings, as above, and the assembler will expand these properly, so this part is solved. Figure it's an obvious enhancement. Formatters may be along in a bit; guess we'll see.)
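For illustration, the string globbing described above could look roughly like the following Python sketch. It is a guess at the rule based on the two listings in this post, not a copy of what pbttc does: a run of "<int> ee lc" triples, with each int in the printable ASCII range, collapses into one quoted string followed by a single lc, and expansion reverses that. The sketch does not check that it is actually inside a serout target, and it skips the double-quote character (34) so the quoted token stays unambiguous. The names collapse and expand are hypothetical.

```python
def _is_char_push(triple):
    """True for ['<n>', 'ee', 'lc'] with n printable ASCII (and not '"')."""
    if len(triple) != 3 or triple[1:] != ["ee", "lc"]:
        return False
    if not triple[0].isdecimal():
        return False
    code = int(triple[0])
    return 32 <= code <= 126 and code != 34


def collapse(tokens):
    """Replace runs of character pushes with '"text"' followed by 'lc'."""
    out, run, i = [], [], 0
    while i < len(tokens):
        triple = tokens[i:i + 3]
        if _is_char_push(triple):
            run.append(chr(int(triple[0])))
            i += 3
            continue
        if run:
            out += ['"' + "".join(run) + '"', "lc"]
            run = []
        out.append(tokens[i])
        i += 1
    if run:
        out += ['"' + "".join(run) + '"', "lc"]
    return out


def expand(tokens):
    """Inverse of collapse: turn '"text"' + 'lc' back into pushes."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        quoted = len(tok) >= 2 and tok[0] == '"' and tok[-1] == '"'
        if quoted and tokens[i + 1:i + 2] == ["lc"]:
            for ch in tok[1:-1]:
                out += [str(ord(ch)), "ee", "lc"]
            i += 2
        else:
            out.append(tok)
            i += 1
    return out


raw = (". 84 . 16 serout 116 ee lc 101 ee lc 115 ee lc 116 ee lc "
       "58 ee lc 32 ee lc 438 . get_var_byte 08 ee lc 13 ee le").split()
print(collapse(raw))
```

The token list is kept as a Python list rather than one whitespace-joined string, so the space inside "test: " survives. The 438 token is left alone here, since formatters such as dec are the part not yet handled.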