'Unofficial' tokenizer for 64-bit Linux (and others)
AJ Milne
Posts: 12
My 9-year-old son recently started playing with a Boe-Bot kit we had sitting around (he's really having fun; nothing like blinking LEDs and humming servos and breadboards at any age, I'd say), and in setting up an environment he could program from, I discovered the existing tokenizer library for Linux is a) closed source, and b) 32-bit only...
(... this isn't, by the way, completely unworkable, for those of you who know your Linux... building and running a 32-bit binary on a 64-bit system isn't really that huge a headache on most distros, so I did get it working for him: installed the multiarch stuff and did build. I'm a Debian guy, and aptitude made it all pretty painless... And I guess you could always run XP in a VM, if you don't mind having such a thing on one of your boxen somewhere. This I like less, so did not do... I'm a crypto/net security guy by trade; having that binary running anywhere makes me squeamish...)
But in doing so I got to pondering... could something a bit better be managed? The existing tokenizer is functional, but it seems to me this is likely to become more and more of a pain, not having something open source handy, if you still want to program the stamp family from modern systems... And I'm generally cranky about closed-source language generation. I like to know what my compilers are up to, thank you very much...
So, in fact, I have addressed this. Sort of. I've a tokenizer (and, actually, detokenizer) now that can handle putting together binaries for the BS2 family from a sort of 'assembler-level' token stream syntax... The actual PBASIC-to-assemblerish bit is still in progress, as I work out the vagaries of bison and flex, which do seem to be the way to do this (I'm not a huge compiler guy, honestly...) It's been interesting. Pretty sure I can improve upon the code generated by the existing tokenizer, a bit, with a bit of effort, too... (Though this is also about the programmer, of course, but then, just _seeing_ how the object goes together can help you do things less dumb, which wasn't really an option without a tool like this.) Weird little goto chains do seem to crop up pretty regularly in the objects the existing s/w puts together; I figure an optimization stage could probably detangle these without much trouble.
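To make the goto-chain idea concrete, here is a minimal C++ sketch of how an optimization pass might collapse such chains. The `Instr` struct, its fields, and both function names are hypothetical, invented for illustration; they are not the real BS2 token layout or anything taken from the existing tokenizer. The idea is simply to follow each goto through any intermediate gotos and point it straight at the final destination, with a guard against cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical instruction model, invented for this sketch.
// It is NOT the real BS2 token layout.
struct Instr {
    enum Kind { Other, Goto } kind;
    std::size_t target;  // index into the program; meaningful only for Goto
};

// Follow a chain of unconditional gotos beginning at `target` and return the
// index of the first non-goto instruction reached. If the chain loops back
// on itself, give up and return the original target unchanged.
std::size_t resolve_target(const std::vector<Instr>& prog, std::size_t target) {
    std::size_t hops = 0;
    std::size_t cur = target;
    while (true) {
        assert(cur < prog.size());
        if (prog[cur].kind != Instr::Goto) return cur;
        if (++hops > prog.size()) return target;  // cycle detected
        cur = prog[cur].target;
    }
}

// Retarget every goto at the end of its chain.
// Returns the number of gotos that were changed.
std::size_t collapse_goto_chains(std::vector<Instr>& prog) {
    std::size_t changed = 0;
    for (Instr& ins : prog) {
        if (ins.kind != Instr::Goto) continue;
        const std::size_t final_target = resolve_target(prog, ins.target);
        if (final_target != ins.target) {
            ins.target = final_target;
            ++changed;
        }
    }
    return changed;
}
```

After a pass like this, any intermediate goto that nothing jumps to any more could be dropped by a separate dead-code step, which isn't shown here.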
Anyway, point of this question is, I guess:
1) Is this generally going to be of interest to people? I'd seen some grumbling, but don't know if non-Windows people are making do. It's burned a bit of time I don't really have getting it even this far, but I figure maybe if there's interest, and I check it into Sourceforge, I might get some extra hands on deck, or at least beta testing ...
2) Does anyone know if Parallax is going to be real annoyed at this? (Or are there Parallax people around who might be able to address this directly?) I was guessing probably the reason they haven't updated the original is just available hands, again. But I don't want to tick off a company that's done some great little products, between the Stamps and Propeller, if they'd really rather this stuff _weren't_ generally visible. I was actually on the verge of opening up a project on Sourceforge, getting this thing out of my own version control, then got to thinking... What to call it? I note that the Stamps and PBASIC itself are trademarks, so hemmed a bit about that...
Anyway. Anyone? And thanks.
Comments
It's all written in fairly vanilla C++ (assembler/disassembler stage)* and C (bison/flex-based parser, so much as exists as yet), so really any platform that's got a decent toolchain that can handle C++ (including some STL) could probably build it. So given maybe a bit of finagling around byte order at I/O on big-endian architectures, you could probably get it working just about anywhere, and people looking for a tokenizer for MIPS and so on could also be made happy.
*And one lib requirement, so far: libpcre... I don't ever leave home without my regular expressions.
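The usual way to sidestep the byte-order finagling is to build multi-byte values with shifts rather than copying native integers straight to a file. A minimal sketch, assuming a little-endian on-disk order purely for illustration (the byte order of the actual object format isn't stated in this thread), with hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append a 16-bit value low byte first, whatever the host's native order.
// The little-endian choice here is an assumption made for the example.
void put_u16_le(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>((v >> 8) & 0xFF));
}

// Read a 16-bit value stored low byte first, starting at byte `pos`.
std::uint16_t get_u16_le(const std::vector<std::uint8_t>& in, std::size_t pos) {
    assert(pos + 1 < in.size());
    return static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
}
```

Because the shifts operate on values rather than memory layout, the same source produces identical bytes on x86, PowerPC, MIPS, or anything else.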
You took Parallax's existing tokenizer for the BS2 in 32-bit Linux and migrated it to 64-bit. And now you want to know if Parallax will be upset.
IMO, I suspect not. But why not ask them directly?
It seems it pretty much just makes the BS2 useful on more platforms. The only area of sensitivity that I am aware of is if you choose to replicate the internal interpreter that Parallax uses in their BS2.
And I'm not so sure I can say I 'migrated' it so much... Didn't even look at the existing lib, just puzzled out the token formats, and wrote new code entirely. Seemed an easy enough way 'round, given it's hardly a huge language. (Picking nits, and let's hear it for another huge failure in reuse of existing code, but anyway...)
Anyway. Yeah. Phone, I guess, if I don't hear here.
I would like to use it in my Linux platform distro, Mil - Spec. I am checking out every "goody"
available for Parallax products to be included automatically in the software. The target is to
get all the Parallax software I can to run seamlessly on the Linux distro, the first time.
Murat Konar came up with MacBS2, which uses the PowerPC tokenizer. He has a new experimental version using a new tokenizer, but it is missing a lot of features, such as support for multiple slots. I don't know, maybe you can put your heads together. I'm a Mac guy, and I keep a machine running system 10.6.8 mainly so I can continue running MacBS2.
Have you looked at Chuck McManis's page about the operation of the Stamp 1 tokenizer?
http://www.mcmanis.com/chuck/robotics/stamp-decode.html
There was also an incisive pamphlet by Brian Forbes, "Inside the BASIC Stamp 2", but I don't see links to that presently.
PBASIC version 2.5 supports syntax that was not present in PBASIC 2.0, but the latter is closer to the tokens that are actually present in the hardware. That might explain some of the "Weird little goto chains do seem to crop up pretty regularly in the objects the existing s/w puts together".
The token format has changed a bit since what Chuck worked on for the BS1; main thing that's different is the general 'op' tokens are 7 bits long, and just from a casual glance, they're generally different constants. Addresses are now a bit wider, too, due to the size of the EEPROM, I figure (looks otherwise pretty familiar), as is the gosub 'index' field for the return address, and so on...
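Since op tokens that are 7 bits long don't fall on byte boundaries, reading them means pulling arbitrary-width fields out of a byte stream. Below is a rough sketch of such a reader. The `BitReader` class is hypothetical, and the most-significant-bit-first order is an assumption made for illustration; the real BS2 packing may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads fields of 1..16 bits from a byte buffer,
// most significant bit first (an assumed order, not a documented one).
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes)
        : bytes_(bytes), pos_(0) {}

    // Return the next `nbits` bits as an unsigned value.
    std::uint16_t read(unsigned nbits) {
        assert(nbits >= 1 && nbits <= 16);
        assert(pos_ + nbits <= bytes_.size() * 8);
        unsigned v = 0;
        for (unsigned i = 0; i < nbits; ++i) {
            const std::size_t byte = pos_ / 8;
            const unsigned bit = 7u - static_cast<unsigned>(pos_ % 8);
            v = (v << 1) | ((bytes_[byte] >> bit) & 1u);
            ++pos_;
        }
        return static_cast<std::uint16_t>(v);
    }

    // Number of unread bits remaining in the buffer.
    std::size_t bits_left() const { return bytes_.size() * 8 - pos_; }

private:
    const std::vector<std::uint8_t>& bytes_;  // caller keeps the buffer alive
    std::size_t pos_;                         // current position, in bits
};
```

A disassembler loop built on this would call `read(7)` for each op token and then read whatever operand widths that token implies; those widths are exactly the part that has to be puzzled out from real objects.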
My tokenizer doesn't yet support slots, either--I was just working with a vanilla BS2 for h/w, so it wasn't an initial priority--but it's probably pretty easily addressed from here if there are enough people with sufficient interest/need. I've built, among other things, a sort of interactive, curses-based Perl thing for parsing out streams containing partially unknown tokens that should make working out how such things should be assembled pretty fast (I got through the whole of the base BS2 command set (minus the xout encoding... it's also on the todo list... it happened to exhaust my patience, and there were other priorities) in a little under two weeks, over scattered evenings after work, including building this tool, and using it, and a bit of judicious scripting.)
... this is my general notion for approaching this, by the way: get clean, solid, cross-platform binary assembler/disassembler tools bootstrapped; if there's a sufficient interest, a few interested people picking away at it, we can build tools increasingly solid and full-featured on top of this. People want to encode streams from whatever it is they're running, however far from x86 commodity PCs and OSes it may be, fine, here's the code, bring your platform's toolchain, let's see how we can make that work.
I'm meeting tomorrow with Jeff Martin, the developer of the original tokenizer... he has very kindly offered input/explanations... so I'm hoping that getting some very flexible cross-platform support for this hardware going is becoming a very real possibility here. Should be some source up very shortly.
We all have our promotional fantasies. Mil-spec versus Maelstron... both imply a bit of dramatic hard-core. I suspect that mil-spec just might get a lot more hits on a generic search engine. That might be a very clever thing.
help. (........)
AJ, that's quite a project you've taken on. At least it seems so to me, who knows only the barest outline of the tools you are using. Jeff is definitely the guy to talk to about the inner workings. I wonder though about the investment in time you are making at this stage in the maturity of the Stamp. Your 9-year-old could use the existing tools on an older computer, blink those LEDs, hum those servos, and, shall I say it, try out the Propeller. ---- Are father and son baseball fans? How about the World Series (go Giants!)?
... and I hope I'm not stealing anyone's thunder by saying: they're also planning on open-sourcing the original tokenizer; it's just been a matter of time to get it ready for this, so in addition to being happy to answer questions, he's sending along the code, which should speed much of this, especially dealing with encoding odd stuff not yet covered. It may also, obviously, make this project somewhat less useful, since this, too, should bring things along on emerging platforms looking for a toolchain. But I'm going to proceed and get it onto Sourceforge anyway; it does seem to me to provide a number of debug/optimization facilities that probably wouldn't otherwise exist.
And Tracy, yeah, I had that thought, too, about effort vs. age of h/w. (And do actually have a quickstart propeller board in here just arrived the other day, with various plans in mind already for what can be done with more bins, more cores, more speed.) But for all that they _are_ aging, I figure this is still a nice-to-have for the older chips; it really did strike me how easily my son did take to the simplicity of it. (And as I explained to Jeff, I actually find their simplicity is part of their charm; as someone whose work mostly lives in application and protocol layers standing on those very towering stacks of drivers and O/S, dead simple and slow--but also incredibly low-power and reliable--is sometimes just a breath of fresh air). Be nice to think this might open up a few more possibilities, for people still working with them, teaching on them, so on. And the plan isn't so much to give my life to this; just gonna pick away at it when there's time/inclination, or I'm not sleeping anyway, and if there's someone out there sees a way to build on this, toward making their platform work, it gives them that way forward, too. Had the thought, too, discussed with Jeff, that depending a bit on how things are implemented in the interpreter, it may even be doable to map other languages above the token stream, given this kind of tool...
... so, long story short, I'll have something up probably by the end of the weekend, depending a bit on real life, will post here when there's a URL.
Welcome to the forums!
This sounds like a great endeavor, and to meet with the head honcho, even better! That should streamline it.
Hope you, your son, and everybody benefits from it.
Jim
I did get the code cleaned up enough I could stand to put it online, and checked it in. Calling it 'pbtc' for 'a PBASIC tokenization toolchain'. Will announce in its own thread for anyone interested.
I wholeheartedly agree, the BASIC Stamp is an unparalleled teaching tool. Kudos to Parallax for keeping it going. The Stamp has been around and successful for a long time and is remarkably powerful for all its simplicity. It is still in a class by itself in how easy it is to get started and to get results: flash the LED, print "Hello"... I do appreciate your efforts to help it along and to keep it current.