Saving a .spin file as plain non-Unicode text? bst, Propeller Tool, PropellerIDE
mmgood
Posts: 19
I'm hoping I'm missing something obvious. I'm trying to save out .spin files to .txt files so I can read them back into editing tools that don't handle the Propeller Unicode font properly. I've looked here in the Forum but found nothing explicit so far; I've seen someone mention the possibility, but no details.
I've been working with bst since I'm on a Linux box; I've also tried Propeller Tool 1.3.2 and PropellerIDE. Haven't tried SimpleIDE since it seems to require a project file (and, I admit, I haven't read the documentation for it yet, so I might be wrong there). I've tried changing the font to something more generic like Inconsolata or even Courier and saving out; I've looked in "Preferences" to see if there's a plaintext save option; I've tried "Save As" while specifying .txt extension in the file-type-to-show picker (a long shot, but what the heck).
What am I missing? Thanks.
Mike
[EDIT: tl;dr Per Electrodude: Try UTF8 save option in bst: Tools-->IDE Preferences-->IDE Preferences [yes, really. :P]; click checkbox "Save as UTF8"... See below for comments.]
Comments
Have you tried Select All in BST and then pasting it into whatever program you want to use?
Another option would be to install an ASCII printer on your system and then print the code to a file.
I found a workaround. I can read a .spin file, as-is, into GEdit, which handles the file just fine, and then, e.g., print with "View-->Highlight Mode" set to "Plain Text". No illegible grayscale, no color blocks obscuring things. When I want syntax-directed color for editing, I select View-->Highlight Mode-->Sources-->Spin and it's all good. Thanks to Cody (SRLM) for the Spin colorization*, which I tweaked to suit me.
*http://forums.parallax.com/discussion/140139/gedit-spin-syntax-highlighting-solution
I don't know how to link to him in-thread but I'll be sure to thank him.
I haven't gotten cut&paste to do the right thing for me. And installing an ASCII printer seems, forgive me for saying so, rather off-the-wall and 1980s.
Cheers, and thanks again.
In my opinion, UTF-8 is the perfect character encoding. ASCII characters are unchanged, and non-ASCII characters use exclusively non-ASCII bytes. You can open a UTF-8 file in any ASCII-only editor that won't choke on bytes 128-255 (I've never encountered an ASCII-only editor with this problem), edit the ASCII parts (and even the Unicode parts, if you're very careful), save the file, and all of your Unicode characters will survive. If you view a UTF-8 file containing non-ASCII characters in a terminal that doesn't do Unicode, you will only get a few characters of garbage where each Unicode character should be, and it won't trash your terminal.
I use Linux too, and keep everything in Git repositories. The UTF-16 that Prop Tool saves everything in really annoys me.
https://notepad-plus-plus.org
===Jac
Of course Unicode will be the undoing of us all. Source files should be in ASCII.
Unicode allows one to write JavaScript in which every identifier is built entirely from non-ASCII characters, and such code runs just fine. But WTF?
Edit: Hmm...Never occurred to me before but I wonder if one can write such Unicode-obfuscated code in Spin? Must run home and try it...
OpenSpin could probably be made to do UTF-8 by having it still treat input as ASCII but allowing all bytes >= 128 in identifiers. Then, all codepoints >= 128 would be legal in identifiers, possibly undesirably including things like zero-width spaces.
The main, and as far as I can tell only, reason for Spin to use Unicode is to allow creation of schematic diagrams in the comments with graphical symbols.
Roughly like this C++:
But you can do what you want even in UTF-32. The way you have described it, that would require a "next" array with 4 billion elements. BUT rather than test every input character with an array lookup, one would test for ranges of characters, like isDigit(), isAlphaNum(), isOperator() and so on.
In this way one can handle vast ranges of symbols. For example, isValidVariableNameSymbol() might do something like:
Basically, doing what you want requires neither a huge lookup table nor any dependence on the character encoding being used.
A compiler that allows syntax to change under control of the program being compiled sounds rather ambitious. That implies the source being compiled would have to contain the source of the new compiler functionality within itself. I'm not sure who would want to write such complex programs.
Still if you want to try it one way would be to write yourself a compiler for your proposed language, call it X, in Javascript. Then the X source being compiled could contain new Javascript code that overrides or adds to the existing X compiler code on the fly.
Sounds hairy scary!
Regular arrays obviously can't be used; they'd be 4GB each. I don't think hash tables make sense here: since every entry would be filled, it might as well be a simple array. This leaves some sort of tree as the only viable option. A comparison tree, the data structure equivalent of the code you mentioned, would work but might not be the most efficient thing possible. A tree that is optimized for ASCII values, which make up 99.99% of source code, but can somehow still accommodate Unicode, would be ideal. Instead of coming up with my own compression scheme to stuff UTF-32 codepoints into smaller data sizes (a byte per codepoint seems very reasonable), I might as well just use UTF-8. Not only is it one less thing that I need to design, but it's 100% compatible with ASCII and there's a good chance that the file being compiled is already encoded as UTF-8.
Also, I'm considering caching the symbol table in the lexer. Every time a new symbol is encountered, the symbol will be added to the lexer state machine, and the next time the lexer encounters that symbol, it will already have a pointer to the symbol object without needing to look anything up in a separate symbol table. Some trickery would need to happen when symbols go out of scope, especially if I decide to allow symbol shadowing, so this might be worth it only for global symbols or not at all. This is another reason I don't want to do comparisons: the lexer state machine would get way too complex.
You can write your own self-extending compiler in JS. I'm writing mine in C++ for now, with a data structure based state machine.
For my compiler, I'll define all of Spin in a file I'll ship with the compiler, and I'll make the compiler able to compile language definitions down to something that can be loaded from disk quickly. That way, it won't have to bootstrap Spin every time you run it. For compiling normal Spin programs, nothing special would need to be done, and the extended-Spin compiler definition would contain syntax for adding new operators and assembly macros that would be generally useful for Spin. The user won't have to worry about the bootstrap language unless he wants to write compiler extensions.