'Unofficial' tokenizer for 64-bit linux (and others) — Parallax Forums

'Unofficial' tokenizer for 64-bit linux (and others)

AJ Milne Posts: 12
edited 2014-10-20 11:52 in BASIC Stamp
My 9-year-old son recently started playing with a Boe-Bot kit we had sitting around (he's really having fun; nothing like blinking LEDs and humming servos and breadboards at any age, I'd say), and in setting up an environment he could program from, I discovered the existing tokenizer library for Linux is a) closed source, and b) 32-bit only...

(... this isn't, by the way, completely unworkable, for those of you who know your Linux... building and running a 32-bit binary on a 64-bit system isn't really that huge a headache on most distros, so I did get it working for him: installed the multiarch stuff and did build (I'm a Debian guy, and aptitude made it all pretty painless)... And I guess you could always run XP in a VM, if you don't mind having such a thing on one of your boxen somewhere (this I like less, so did not do... I'm a crypto/net security guy by trade; having that binary running anywhere makes me squeamish)...)

But in doing so I got to pondering... could something a bit better be managed? The existing tokenizer is functional, but it seems to me this is likely to become more and more of a pain, not having something open source handy, if you still want to program the stamp family from modern systems... And I'm generally cranky about closed-source language generation. I like to know what my compilers are up to, thank you very much...

So, in fact, I have addressed this. Sort of. I've a tokenizer (and, actually, detokenizer) now that can handle putting together binaries for the BS2 family from a sort of 'assembler-level' token stream syntax... The actual PBASIC-to-assemblerish bit is still in progress, as I work out the vagaries of bison and flex, which do seem to be the way to do this (I'm not a huge compiler guy, honestly...) It's been interesting. Pretty sure I can improve upon the code generated by the existing tokenizer, a bit, with a bit of effort, too... (Though this is also about the programmer, of course, but then, just _seeing_ how the object goes together can help you do things less dumb, which wasn't really an option without a tool like this.) Weird little goto chains do seem to crop up pretty regularly in the objects the existing s/w puts together; I figure an optimization stage could probably detangle these without much trouble.
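
To make the "goto chain" idea concrete, here is a minimal sketch of what such an optimization pass might look like. The program representation (a dict of labels mapped to instruction tuples) and the name `collapse_goto_chains` are hypothetical, made up for illustration; this is not pbtc's format or the real BS2 token layout, and it only handles unconditional jumps.

```python
def collapse_goto_chains(program):
    """Retarget each goto at the final destination of its chain.

    `program` is a hypothetical representation: a dict mapping a label to an
    instruction tuple. ("goto", target) is an unconditional jump; any other
    tuple is treated as real work and left alone.
    """
    def final_target(label):
        seen = set()
        # Follow goto -> goto -> ... until we hit a non-goto or detect a cycle.
        while program[label][0] == "goto" and label not in seen:
            seen.add(label)
            label = program[label][1]
        return label

    optimized = {}
    for label, instr in program.items():
        if instr[0] == "goto":
            optimized[label] = ("goto", final_target(instr[1]))
        else:
            optimized[label] = instr
    return optimized


program = {
    "start": ("goto", "a"),
    "a": ("goto", "b"),
    "b": ("goto", "c"),
    "c": ("high", 0),
}
optimized = collapse_goto_chains(program)
```

After the pass, every jump in the example lands directly on `c`, so the intermediate hops at `a` and `b` cost nothing at run time and could be dropped by a later dead-code step if nothing else refers to them.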

Anyway, point of this question is, I guess:

1) Is this generally going to be of interest to people? I'd seen some grumbling, but don't know if non-Windows people are making do. It's burned a bit of time I don't really have getting it even this far, but I figure maybe if there's interest, and I check it into Sourceforge, I might get some extra hands on deck, or at least beta testing ...

2) Does anyone know if Parallax is going to be real annoyed at this? (Or are there Parallax people around who might be able to address this directly?) I was guessing probably the reason they haven't updated the original is just available hands, again. But I don't want to tick off a company that's done some great little products, between the Stamps and Propeller, if they'd really rather this stuff _weren't_ generally visible. I was actually on the verge of opening up a project on Sourceforge, getting this thing out of my own version control, then got to thinking... What to call it? I note that the Stamps and PBASIC itself are trademarks, so hemmed a bit about that...

Anyway. Anyone? And thanks.

Comments

  • AJ Milne Posts: 12
    edited 2014-10-11 07:57
    ... oh, adding:

    It's all written in fairly vanilla C++ (assembler/disassembler stage)* and C (bison/flex-based parser, so much as exists as yet), so really any platform that's got a decent toolchain that can handle C++ (including some STL) could probably build it. So given maybe a bit of finagling around bit order at I/O on big-endian architectures, you could probably get it working just about anywhere, and people looking for a tokenizer for MIPS and so on could also be made happy.

    *And one lib requirement, so far: libpcre... I don't ever leave home without my regular expressions.
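
A rough sketch of the portability point above: if multi-byte values are always written and read in one explicit byte order, the host's endianness stops mattering. The function names are hypothetical, and little-endian is picked purely for illustration; the thread does not say which order the BS2 object format actually uses.

```python
import struct


def write_word(value):
    """Serialize a 16-bit unsigned value in an explicit (little-endian) order."""
    return struct.pack("<H", value)


def read_word(data):
    """Parse a 16-bit unsigned value written by write_word."""
    return struct.unpack("<H", data)[0]


def write_word_by_hand(value):
    """Same result using only shifts and masks, the way portable C or C++
    code would do it without relying on the host's memory layout."""
    return bytes([value & 0xFF, (value >> 8) & 0xFF])


encoded = write_word(0x1234)
decoded = read_word(encoded)
```

The shift-and-mask version is the one that translates most directly to C++: it never reinterprets an integer's bytes in memory, so it behaves the same on x86, MIPS, or PowerPC.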
  • LoopyByteloose Posts: 12,537
    edited 2014-10-11 10:29
    Umm. let's see.
    You took Parallax's existing tokenizer for the BS2 in Linux 32bit and migrated it to 64bit. And now you want to know if Parallax will be upset.

    IMO, I suspect not. But why not ask them directly.

    It seems it pretty much just makes the BS2 useful on more platforms. The only area of sensitivity that I am aware of is if you choose to replicate the internal interpreter that Parallax uses in their BS2.
  • AJ Milne Posts: 12
    edited 2014-10-11 12:04
    Thanks. Thing is, the contact addresses are the usual 'sales' and 'tech support'... Didn't really figure this would route well through either, and hoped maybe there'd be some more dev types who'd know the lay of the land here keeping an eye on their online forum (and there's not much point to this anyway if the community's not interested). But can try the phone on Monday Pacific Time, if the former isn't so much the case, I guess.

    And I'm not so sure I can say I 'migrated' it so much... Didn't even look at the existing lib, just puzzled out the token formats, and wrote new code entirely. Seemed an easy enough way 'round, given it's hardly a huge language. (Picking nits, and let's hear it for another huge failure in reuse of existing code, but anyway...)

    Anyway. Yeah. Phone, I guess, if I don't hear here.
  • AJ Milne Posts: 12
    edited 2014-10-15 16:18
    Following up: I did get in touch with someone at Parallax, and they seem generally friendly to the notion (thanks much), and we're meeting shortly about this thing. So I expect to have something posted publicly fairly imminently, so people looking for a tokenizer for 64-bit Linux and non-x86 architectures should have a path here, anyway (caveat being: it will initially be very alpha s/w, limited as yet, etc. etc., but something to build on, at least).
  • mklrobo Posts: 420
    edited 2014-10-16 06:02
    :cool: I am interested in your methodology, as I understand it, streamlined by LoopyByteloose.
    I would like to use it in my Linux platform distro, Mil - Spec. I am checking out every "goody"
    available for Parallax products to be included automatically in the software. The target is to
    get all the Parallax software I can to run seamlessly on the Linux distro, the first time. :innocent:
  • Dave Hein Posts: 6,347
    edited 2014-10-16 06:58
    mklrobo, I thought you had dropped the misleading "mil-spec" name and changed it to Maelstrom. Why do you continue to use a name that implies that it is endorsed by the U.S. Department of Defense?
  • Tracy Allen Posts: 6,662
    edited 2014-10-16 08:13
    How about a direct call to Ken Gracey at Parallax for the permissions and technical contacts. He's the honcho.

    Murat Konar came up with MacBS2, which uses the PowerPC tokenizer. He has a new experimental version using a new tokenizer, but it is missing a lot of features, such as support for multiple slots. I don't know, maybe you can put your heads together. I'm a Mac guy, and I keep a machine running system 10.6.8 mainly so I can continue running MacBS2.

    Have you looked at Chuck McManis's page about the operation of the Stamp 1 tokenizer?
    http://www.mcmanis.com/chuck/robotics/stamp-decode.html
    There was also an incisive pamphlet by Brian Forbes, "Inside the BASIC Stamp 2", but I don't see links to that presently.

    PBASIC version 2.5 supports syntax that was not present in PBASIC 2.0, but the latter is closer to the tokens that are actually present in the hardware. That might explain some of the "Weird little goto chains do seem to crop up pretty regularly in the objects the existing s/w puts together".
  • AJ Milne Posts: 12
    edited 2014-10-16 11:34
    Thanks Tracy. I did see Chuck McManis's page, as well as yours. His got me to thinking about it a bit more seriously; yours, honestly, made getting started easier. So it's actually pretty awesome to hear from you, as your work was very helpful: not having to pull apart the packet format myself took this from 'might be fun' to 'okay, I'm awake anyway, let's build from there'. The checksum alone would have taken who knows how much trial and error to piece together; I remain very grateful.

    The token format has changed a bit since what Chuck worked on for the BS1; main thing that's different is the general 'op' tokens are 7 bits long, and just from a casual glance, they're generally different constants. Addresses are now a bit wider, too, due to the size of the EEPROM, I figure (looks otherwise pretty familiar), as is the gosub 'index' field for the return address, and so on...
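
Since 7-bit tokens don't line up with byte boundaries, an assembler/disassembler pair needs some kind of bit-level packing. The sketch below shows one generic way to do that. Everything about it is an assumption for illustration: the function names, the MSB-first bit order, and the zero padding at the tail are not taken from the real BS2 format, which also mixes fields of different widths.

```python
def pack_tokens(tokens, width=7):
    """Pack fixed-width tokens MSB-first into bytes, zero-padding the tail."""
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently held in acc
    out = bytearray()
    for tok in tokens:
        if not 0 <= tok < (1 << width):
            raise ValueError("token %r does not fit in %d bits" % (tok, width))
        acc = (acc << width) | tok
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1   # keep only the bits not yet emitted
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def unpack_tokens(data, count, width=7):
    """Inverse of pack_tokens: read `count` tokens of `width` bits each."""
    acc = 0
    nbits = 0
    tokens = []
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= width and len(tokens) < count:
            nbits -= width
            tokens.append((acc >> nbits) & ((1 << width) - 1))
        acc &= (1 << nbits) - 1
    return tokens


packed = pack_tokens([0x7F, 0x00, 0x55])
unpacked = unpack_tokens(packed, 3)
```

Passing `count` explicitly keeps the padding bits at the end from being misread as an extra token; a real format would presumably mark the end of the stream some other way.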

    My tokenizer doesn't yet support slots, either--I was just working with a vanilla BS2 for h/w, so it wasn't an initial priority--but it's probably pretty easily addressed from here if there are enough people with sufficient interest/need. I've built, among other things, a sort of interactive, curses-based Perl thing for parsing out streams containing partially unknown tokens that should make working out how such things should be assembled pretty fast (I got through the whole of the base BS2 command set (minus the xout encoding... it's also on the todo list... it happened to exhaust my patience, and there were other priorities) in a little under two weeks, over scattered evenings after work, including building this tool, and using it, and a bit of judicious scripting.)

    ... this is my general notion for approaching this, by the way: get clean, solid, cross-platform binary assembler/disassembler tools bootstrapped; if there's a sufficient interest, a few interested people picking away at it, we can build tools increasingly solid and full-featured on top of this. People want to encode streams from whatever it is they're running, however far from x86 commodity PCs and OSes it may be, fine, here's the code, bring your platform's toolchain, let's see how we can make that work.

    I'm meeting tomorrow with Jeff Martin, the developer of the original tokenizer... he has very kindly offered input/explanations... so I'm hoping that getting some very flexible cross-platform support going for this hardware is becoming a very real possibility. Should be some source up very shortly.
  • LoopyByteloose Posts: 12,537
    edited 2014-10-16 11:57
    Dave Hein wrote: »
    mklrobo, I thought you had dropped the misleading "mil-spec" name and changed it to Maelstrom. Why do you continue to use a name that implies that it is endorsed by the U.S. Department of Defense?

    We all have our promotional fantasies. Mil-spec versus Maelstrom... both imply a bit of dramatic hard-core. I suspect that mil-spec just might get a lot more hits on a generic search engine. That might be a very clever thing.
  • mklrobo Posts: 420
    edited 2014-10-16 12:13
    Dave Hein wrote: »
    mklrobo, I thought you had dropped the misleading "mil-spec" name and changed it to Maelstrom. Why do you continue to use a name that implies that it is endorsed by the U.S. Department of Defense?
    I stand corrected. The official version name will be "Maelstrom". The group I am in had an internal "discussion" about what to call what. With the forum's input, it will be renamed. I appreciate the
    help. (........)
  • Tracy Allen Posts: 6,662
    edited 2014-10-17 09:47
    I still do a lot of Stamp programming, but at this point in time it is mainly maintenance. Mostly all multislot on the BS2pe. I do far more now with the Propeller.

    AJ, that's quite a project you've taken on. At least it seems so to me, who knows only the barest outline of the tools you are using. Jeff is definitely the guy to talk to about the inner workings. I wonder though about the investment in time you are making at this stage in the maturity of the Stamp. Your 9-year-old could use the existing tools on an older computer, blink those LEDs, hum those servos, and, shall I say it, try out the Propeller. ---- Are father and son baseball fans? How about the World Series (go Giants!).
  • AJ Milne Posts: 12
    edited 2014-10-17 11:25
    So I've met with Jeff over lunch, had a great meeting, swapped war stories of h/w changing beneath towering stacks of code. Upshot is: they're happy to see this happen...

    ... and I hope I'm not stealing anyone's thunder by saying: they're also planning on open-sourcing the original tokenizer; it's just been a matter of time to get it ready for this, so in addition to being happy to answer questions, he's sending along the code, which should speed much of this, especially dealing with encoding odd stuff not yet covered. It may also, obviously, make this project somewhat less useful, since this, too, should bring things along on emerging platforms looking for a toolchain. But I'm going to proceed and get it onto Sourceforge anyway; it does seem to me to provide a number of debug/optimization facilities that probably wouldn't otherwise exist.

    And Tracy, yeah, I had that thought, too, about effort vs. age of h/w. (And do actually have a QuickStart Propeller board in here just arrived the other day, with various plans in mind already for what can be done with more pins, more cores, more speed.) But for all that they _are_ aging, I figure this is still a nice-to-have for the older chips; it really did strike me how easily my son did take to the simplicity of it. (And as I explained to Jeff, I actually find their simplicity is part of their charm; as someone whose work mostly lives in application and protocol layers standing on those very towering stacks of drivers and O/S, dead simple and slow--but also incredibly low-power and reliable--is sometimes just a breath of fresh air). Be nice to think this might open up a few more possibilities, for people still working with them, teaching on them, so on. And the plan isn't so much to give my life to this; just gonna pick away at it when there's time/inclination, or I'm not sleeping anyway, and if there's someone out there who sees a way to build on this, toward making their platform work, it gives them that way forward, too. Had the thought, too, discussed with Jeff, that depending a bit on how things are implemented in the interpreter, it may even be doable to map other languages above the token stream, given this kind of tool...

    ... so, long story short, I'll have something up probably by the end of the weekend, depending a bit on real life, will post here when there's a URL.
  • Publison Posts: 12,366
    edited 2014-10-17 11:59
    AJ,

    Welcome to the forums!

    This sounds like a great endeavour, and to meet with the head honcho, even better! That should streamline it.

    Hope you, your son, and everybody benefits from it.

    Jim
  • AJ Milne Posts: 12
    edited 2014-10-19 05:01
    Thanks Jim/Publison, all.

    I did get the code cleaned up enough I could stand to put it online, and checked it in. Calling it 'pbtc' for 'a PBASIC tokenization toolchain'. Will announce in its own thread for anyone interested.
  • Tracy Allen Posts: 6,662
    edited 2014-10-20 11:52
    AJ Milne wrote: »
    ...; it really did strike me how easily my son did take to the simplicity of it. (And as I explained to Jeff, I actually find their simplicity is part of their charm; as someone whose work mostly lives in application and protocol layers standing on those very towering stacks of drivers and O/S, dead simple and slow--but also incredibly low-power and reliable--is sometimes just a breath of fresh air). ...

    I wholeheartedly agree, the BASIC Stamp is an unparalleled teaching tool. Kudos to Parallax for keeping it going. The Stamp has been around and successful for a long time and is remarkably powerful for all its simplicity. It is still in a class by itself in how easy it is to get started and to get results: flash the LED, print "Hello"... I do appreciate your efforts to help it along and to keep it current.