Saving a .spin file as plain non-Unicode text? bst, Propeller Tool, PropellerIDE — Parallax Forums

mmgood Posts: 19
edited 2016-03-31 02:59 in Propeller 1
I'm hoping I'm missing something obvious. I'm trying to save out .spin files to .txt files so I can read them back in to editing tools that don't handle the Propeller Unicode font properly. I've looked here in the Forum but found nothing explicit so far; I've seen someone mention the possibility, but no details.

I've been working with bst since I'm on a Linux box; I've also tried Propeller Tool 1.3.2 and PropellerIDE. Haven't tried SimpleIDE since it seems to require a project file (and, I admit, I haven't read the documentation for it yet, so I might be wrong there). I've tried changing the font to something more generic like Inconsolata or even Courier and saving out; I've looked in "Preferences" to see if there's a plaintext save option; I've tried "Save As" while specifying .txt extension in the file-type-to-show picker (a long shot, but what the heck).

What am I missing? Thanks.

Mike

[EDIT: tl;dr Per Electrodude: Try UTF8 save option in bst: Tools-->IDE Preferences-->IDE Preferences [yes, really. :P]; click checkbox "Save as UTF8"... See below for comments.]

Comments

  • kwinn Posts: 8,697
    Have you tried Select All in BST and then pasting it into whatever program you want to use?

    Another option would be to install an ascii printer on your system and then print the code to a file.
  • mmgood Posts: 19
    edited 2016-03-31 02:53
    Kwinn: I appreciate your prompt and courteous reply, and your suggestions.

    I found a workaround. I can read a .spin file, as-is, into GEdit, which reads the font just fine, and then do, e.g., printing with "View-->Highlight Mode" set to "Plain Text". No illegible grayscale, no color blocks obscuring things. When I want syntax-directed color for editing, I select View-->Highlight Mode-->Sources-->Spin and it's all good. Thanks to Cody (SRLM) for the Spin colorization*, which I tweaked to suit me.

    *http://forums.parallax.com/discussion/140139/gedit-spin-syntax-highlighting-solution

    I don't know how to link to him in-thread but I'll be sure to thank him.

    I haven't gotten cut&paste to do the right thing for me. And installing an ASCII printer seems, forgive me for saying so, rather off-the-wall and 1980s :)

    Cheers, and thanks again.
  • Electrodude Posts: 1,661
    edited 2016-03-31 02:42
    In BST, make sure "Tools -> IDE Preferences -> IDE Preferences tab -> Save as UTF8" is checked. Then, it will save everything as UTF-8. A UTF-8 file is identical to an ASCII file if there aren't any non-ASCII characters; otherwise, those characters that are greater than 127 (i.e. those that are non-ASCII) will appear as two or three characters of garbage, but all normal ASCII characters will be fine.
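A minimal sketch of the property described above, in Python (chosen here only for illustration). The helper name `utf8_report` and the sample lines are made up for this example and do not come from bst or the Propeller Tool. It checks that pure-ASCII text produces the same bytes in ASCII and UTF-8, and that each non-ASCII character becomes a multi-byte sequence made up only of bytes 128 and above.

```python
def utf8_report(text):
    """Return (code point, byte count, all bytes >= 128) for each non-ASCII character."""
    report = []
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1:  # only non-ASCII characters need more than one byte
            report.append(
                (f"U+{ord(ch):04X}", len(encoded), all(b >= 0x80 for b in encoded))
            )
    return report


# Hypothetical Spin line, pure ASCII: UTF-8 and ASCII encodings give identical bytes.
ascii_line = "PUB main | i"
same_bytes = ascii_line.encode("utf-8") == ascii_line.encode("ascii")

# Hypothetical comment line with a box-drawing character (U+2500) and a micro sign (U+00B5).
mixed_line = "' \u2500 10 \u00b5F"
report = utf8_report(mixed_line)

print(same_bytes)
print(report)
```

An ASCII-only viewer would show the two or three bytes listed for each entry as separate garbage characters, while the rest of the line stays readable.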
  • Electrodude: Cool! Thanks, will try that, just to have that option in my "back pocket."
  • Yeahp, completely missed that option. Thanks again.
  • Electrodude Posts: 1,661
    edited 2016-03-31 02:55
    You're welcome!

    In my opinion, UTF-8 is the perfect character encoding. ASCII characters are unchanged, and non-ASCII characters use exclusively non-ASCII bytes. You can open a UTF-8 file in any ASCII-only editor that won't choke on bytes 128-255 (I've never encountered an ASCII-only editor with this problem), edit the ASCII parts and even the Unicode parts if you're very careful, and save the file, and all of your Unicode characters will survive. If you view a UTF-8 file containing non-ASCII characters in a terminal that doesn't do Unicode, you will only get a few characters of garbage where each Unicode character should be, and it won't trash your terminal.

    I use Linux too, and keep everything in Git repositories. The UTF-16 that Prop Tool saves everything in really annoys me.
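For files that already exist as UTF-16, a small script is one way to get them into a Git-friendly form. The sketch below is not a Parallax tool: the function name `spin_to_utf8` and the file names are hypothetical, and it assumes a UTF-16 file starts with a byte-order mark. Files without that mark are copied unchanged, since their encoding is not known here.

```python
import codecs
import os
import tempfile


def spin_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 file (with byte-order mark) as UTF-8 without a BOM.

    Files that do not start with a UTF-16 BOM are copied byte for byte.
    Returns True if a conversion took place.
    """
    with open(src_path, "rb") as f:
        raw = f.read()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec uses the BOM to pick the byte order and drops it.
        data = raw.decode("utf-16").encode("utf-8")
        converted = True
    else:
        data = raw
        converted = False
    with open(dst_path, "wb") as f:
        f.write(data)
    return converted


# Demonstration on a temporary file standing in for a UTF-16 save.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.spin")
    dst = os.path.join(tmp, "demo_utf8.spin")
    with open(src, "wb") as f:
        f.write(codecs.BOM_UTF16_LE + "PUB main ' \u2500\r\n".encode("utf-16-le"))
    was_converted = spin_to_utf8(src, dst)
    with open(dst, "rb") as f:
        result = f.read()

print(was_converted, result)
```

Working on raw bytes keeps line endings exactly as they were, which avoids spurious diffs in a repository.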
  • FYI, Notepad++ can do the conversion too.

    https://notepad-plus-plus.org

    ===Jac
  • Heater. Posts: 21,230
    edited 2016-04-01 17:53
    This is a recurring problem. The UTF-16 thing of the Propeller tools is a pain.

    Of course unicode will be the undoing of us all. Source files should be in ASCII.

    Unicode allows one to write code like this JavaScript example:
    let ﻝ = {
        ﺍ: function () {
            return ("Hello world!");
        }
    }
    
    let msg = ﻝ.ﺍ();
    console.log(msg);
    
    ف = (2 + 3) * (3 + 3)
    
    
    console.log(ف);
    
    Which runs just fine. But WTF?


    Edit: Hmm...Never occurred to me before but I wonder if one can write such unicode obfuscated code in Spin? Must run home and try it...
  • Electrodude Posts: 1,661
    edited 2016-04-01 20:20
    Heater. wrote: »
    Hmm...Never occurred to me before but I wonder if one can write such unicode obfuscated code in Spin? Must run home and try it...

    OpenSpin could probably be made to do UTF-8 by having it still treat input as ASCII but allowing all bytes >= 128 in identifiers. Then, all codepoints >= 128 would be legal identifiers, possibly undesirably including things like zero-width spaces.
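A rough sketch of that idea, written in Python rather than taken from OpenSpin's source. The scanner works on raw bytes and accepts any byte >= 128 as part of an identifier. `is_ident_byte`, `scan_identifier`, and the sample input are hypothetical. It also shows the drawback mentioned above: a zero-width space (U+200B) is silently absorbed into the name.

```python
def is_ident_byte(b, first):
    """ASCII letters and underscore may start an identifier; digits may follow.

    Any byte >= 0x80 (part of a UTF-8 multi-byte sequence) is accepted anywhere.
    """
    if b >= 0x80:
        return True
    ch = chr(b)
    if ch == "_" or ch.isalpha():
        return True
    return ch.isdigit() and not first


def scan_identifier(data, pos=0):
    """Return (identifier_bytes, next_pos) for the identifier starting at pos."""
    end = pos
    while end < len(data) and is_ident_byte(data[end], first=(end == pos)):
        end += 1
    return data[pos:end], end


# "caf" + e-acute (U+00E9) + zero-width space (U+200B) + "x", then an assignment.
source = "caf\u00e9\u200bx := 1".encode("utf-8")
name, next_pos = scan_identifier(source)

print(name, next_pos)
```

The scanner never decodes anything, so it cannot tell a letter from an invisible character. Rejecting such code points would require decoding or a list of forbidden byte sequences.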
  • Heater. Posts: 21,230
    Ouch, no. Now you want to mix up character encoding with the syntax and semantics of the language. Different character encodings would mean different things in different places in your code.


    The main reason, and the only one as far as I can tell, for Spin to use Unicode is to allow schematic diagrams with graphical symbols in the comments.
  • Electrodude Posts: 1,661
    edited 2016-04-01 21:22
    My dream Spin compiler (that lets you invent new syntax mid-parse and probably won't ever actually exist) converts everything to UTF-8 first and then feeds that through the lexer. That allows it to do unicode, and still allows it to take the input stream in byte-sized pieces that it simply feeds through a state machine. If it didn't do that, I can't see how it would be able to just take the next byte and look it up in an array to see what to do. Unless I were to allocate tons of 65536*sizeof(pointer) arrays (yeah right), there would be no other good way.

    Roughly like this C++:
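    #include <stdexcept>
    
    struct Transition;
    
    struct State
    {
    	Transition* next[256] = {};  // one entry per possible input byte
    };
    
    // Thrown when the input cannot be lexed from the current state.
    struct ParserError : std::runtime_error
    {
    	ParserError(Transition* t, const char* p)
    		: std::runtime_error("unexpected input"), transition(t), at(p) {}
    	Transition* transition;  // transition that failed (may be null)
    	const char* at;          // position in the input where lexing stopped
    };
    
    struct Transition
    {
    	explicit Transition(State* target = nullptr) : target(target) {}
    	virtual ~Transition() = default;
    	// override me with actual parser stuff
    	virtual State* next(State* curr, const char* p) { return target; }
    private:
    	State* target;  // renamed from "next", which clashed with the method name
    };
    
    void lexer(const char* p, State* state)
    {
    	while (*p)
    	{
    		Transition* t = state->next[(unsigned char)*p];
    		if (!t) { throw ParserError(t, p); }  // no transition for this byte
    		state = t->next(state, p);
    		if (!state) { throw ParserError(t, p); }
    		p++;
    	}
    }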
    
  • Heater. Posts: 21,230
    Electrodude,

    But you can do what you want even in UTF-32. The way you have described it, that would require a "next" array with 4 billion elements. But rather than test every input character with an array lookup, one would test for ranges of characters, with functions like isDigit(), isAlphaNum(), isOperator() and so on.

    In this way one can handle vast ranges of symbols. For example isValidVariableNameSymbol() might do something like:
    symbol = readNextSourceChar()
    if symbol >= minVariableNameSymbolCode and symbol <= maxVariableNameSymbolCode then
        doSomethingWithVariableNameSymbol(symbol)
    else
        doSomethingElse(symbol)
    end
    

    Basically, doing what you want does not require a huge array lookup table, nor does it depend on the character encoding being used.

    A compiler that allows syntax to change under control of the program being compiled sounds rather ambitious. That implies the source being compiled would have to contain the source of the new compiler functionality within itself. I'm not sure who would want to write such complex programs.

    Still, if you want to try it, one way would be to write yourself a compiler for your proposed language, call it X, in JavaScript. Then the X source being compiled could contain new JavaScript code that overrides or adds to the existing X compiler code on the fly.

    Sounds hairy scary !
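The range-test approach Heater. describes above can be sketched in Python as follows. The ranges in `IDENT_RANGES` are illustrative assumptions, not Spin's actual identifier rules, and `in_ranges` and `classify` are hypothetical helpers. Because the tests run on decoded code points, the same source text classifies identically whether it was stored as UTF-8 or UTF-16.

```python
# Illustrative inclusive (start, end) code point ranges; not Spin's real rules.
IDENT_RANGES = [
    (ord("A"), ord("Z")),
    (ord("a"), ord("z")),
    (ord("_"), ord("_")),
    (0x00C0, 0x024F),  # a block of accented Latin characters, picked as an example
]


def in_ranges(code_point, ranges):
    """True if code_point falls inside any inclusive (start, end) range."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


def classify(text):
    """Label each character using simple range and membership tests."""
    labels = []
    for ch in text:
        cp = ord(ch)
        if in_ranges(cp, IDENT_RANGES):
            labels.append("ident")
        elif ord("0") <= cp <= ord("9"):
            labels.append("digit")
        elif ch in " \t\r\n":
            labels.append("space")
        else:
            labels.append("other")
    return labels


# Same text, round-tripped through two different encodings before classification.
text = "\u00e9t\u00e9 = 42"
from_utf8 = classify(text.encode("utf-8").decode("utf-8"))
from_utf16 = classify(text.encode("utf-16").decode("utf-16"))

print(from_utf8)
print(from_utf8 == from_utf16)
```

The cost is a decode step in front of the lexer, and a handful of comparisons per character instead of a single table lookup.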
  • Yes, that can be done for a language whose entire grammar and lexicon is known when the compiler is compiled. But how do you do it for a language that lets you define any grammar you like (new operators, new assembly instructions for P1V and such with funny options and flags, assembler macros, etc.) mid-compile? At least for my run-time-extensible compiler, it seems like it would be so much easier to deal with the UTF-8 directly. Here's my reasoning:

    Regular arrays obviously can't be used; they'd need 4 billion entries each. I don't think hash tables make sense here: since every entry would be filled, it might as well be a simple array. This leaves some sort of tree as the only viable option. A comparison tree, the data structure equivalent of the code you mentioned, would work but might not be the most efficient thing possible. A tree that is optimized for ASCII values, which make up 99.99% of source code, but can somehow still accommodate Unicode, would be ideal. Instead of coming up with my own compression scheme to stuff UTF-32 codepoints into smaller data sizes (a byte per codepoint seems very reasonable), I might as well just use UTF-8. Not only is it one less thing that I need to design, but it's 100% compatible with ASCII and there's a good chance that the file being compiled is already encoded as UTF-8.

    Also, I'm considering caching the symbol table in the lexer. Every time a new symbol is encountered, the symbol will be added to the lexer state machine, and the next time the lexer encounters that symbol, it will already have a pointer to the symbol object without needing to look anything up in a separate symbol table. Some trickery would need to happen when symbols go out of scope, especially if I decide to allow symbol shadowing, so this might be worth it only for global symbols or not at all. This is another reason I don't want to do comparisons - the lexer state machine would get way too complex.


    You can write your own self-extending compiler in JS. I'm writing mine in C++ for now, with a data structure based state machine.



    For my compiler, I'll define all of Spin in a file I'll ship with the compiler, and I'll make the compiler able to compile language definitions down to something that can be loaded from disk quickly. That way, it won't have to bootstrap Spin every time you run it. For compiling normal Spin programs, nothing special would need to be done, and the extended-Spin compiler definition would contain syntax for adding new operators and assembly macros that would be generally useful for Spin. The user won't have to worry about the bootstrap language unless he wants to write compiler extensions.
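The byte-at-a-time tree over UTF-8 described in this comment, extended at run time with new operators, might look roughly like the Python sketch below. It is not Electrodude's compiler, which is stated to be written in C++. `ByteTrie`, `add_token`, and `longest_match` are hypothetical names, and a dict per node stands in for a 256-entry array.

```python
class ByteTrie:
    """Trie keyed on UTF-8 bytes; each node maps a byte value to a child node."""

    def __init__(self):
        self.children = {}   # byte value (0-255) -> ByteTrie
        self.payload = None  # token or symbol object if a token ends at this node

    def add_token(self, token, payload):
        """Register a token (str) at run time, e.g. a newly defined operator."""
        node = self
        for b in token.encode("utf-8"):
            node = node.children.setdefault(b, ByteTrie())
        node.payload = payload

    def longest_match(self, data, pos=0):
        """Return (payload, end_pos) of the longest token at pos, or (None, pos)."""
        node = self
        best = (None, pos)
        i = pos
        while i < len(data) and data[i] in node.children:
            node = node.children[data[i]]
            i += 1
            if node.payload is not None:
                best = (node.payload, i)
        return best


trie = ByteTrie()
trie.add_token(":=", "ASSIGN")
trie.add_token(":", "COLON")
trie.add_token("\u2260", "NOT_EQUAL")  # hypothetical user-defined operator (U+2260)

source = "a \u2260 b := c".encode("utf-8")
match_unicode = trie.longest_match(source, 2)
match_assign = trie.longest_match(source, source.index(b":"))

print(match_unicode, match_assign)
```

ASCII tokens cost one node per character, while a non-ASCII operator simply adds a short chain of nodes for its UTF-8 bytes, so no node ever needs more than 256 children. Storing symbol objects as payloads is one way to read the symbol-caching idea, though handling scope exit and shadowing is left out here.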