16-bit characters in Parallax example Spin code — Parallax Forums

Rick314 Posts: 11
edited 2013-09-20 10:41 in Propeller 1
I was just trying to use Cygwin on my Windows 7 64-bit PC, which has Propeller Tool v1.3.2 installed, to grep the example source code provided in ...\Library\*.spin:

$ pwd
/cygdrive/c/Program Files (x86)/Parallax Inc/Propeller Tool v1.3.2/Library
$ grep "Mouse" Mouse.spin
(no result)

The problem is that the provided Mouse.spin file uses 16-bit characters:

$ cat Mouse.spin | od -t x1 -a
...
0000140 20 00 50 00 53 00 2f 00 32 00 20 00 4d 00 6f 00
sp nul P nul S nul / nul 2 nul sp nul M nul o nul
0000160 75 00 73 00 65 00 20 00 44 00 72 00 69 00 76 00
u nul s nul e nul sp nul D nul r nul i nul v nul
...

This shows text near the top of the file that reads " PS/2 Mouse Driv" in 16-bit characters. This was a big surprise! I thought all .spin files used an 8-bit character set, and I especially didn't expect it in the example files shipped with Propeller Tool. (I know I could use the Windows "findstr" command instead in this case, but I am concerned about other problems when using Linux/Cygwin tools for source code management in general.)
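A few lines of Python can double-check the dump (Python is just a convenient choice here; nothing in the thread uses it). This sketch decodes the 32 bytes shown at octal offsets 0000140 and 0000160 as UTF-16 little-endian and shows why a plain byte search for "Mouse" comes up empty: every ASCII letter is followed by a 00 byte, so the search pattern has to be encoded the same way before it can match.

```python
# The 32 bytes from the od dump above (octal offsets 0000140 and 0000160).
dump = bytes.fromhex(
    "20 00 50 00 53 00 2f 00 32 00 20 00 4d 00 6f 00 "
    "75 00 73 00 65 00 20 00 44 00 72 00 69 00 76 00"
)

# Decoding as UTF-16 little-endian recovers the text od printed.
text = dump.decode("utf-16-le")
print(repr(text))  # ' PS/2 Mouse Driv'

# A raw byte search for the 8-bit pattern (roughly what grep does) fails...
print(b"Mouse" in dump)  # False

# ...but the same pattern encoded as UTF-16LE (M 00 o 00 u 00 ...) matches.
print("Mouse".encode("utf-16-le") in dump)  # True
```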

#1) Is this a problem with my Propeller Tool installation, a problem with something else, or to be expected?

#2) What are all the character encodings I might expect to find in .spin files?

Comments

  • Duane Degn Posts: 10,588
    edited 2013-09-19 21:18
    Rick314 wrote:
    #1) Is this a problem with my Propeller Tool installation, a problem with something else, or to be expected?

    #2) What are all the character encodings I might expect to find in .spin files?

    #1) It's to be expected. There's a term for these characters. Here's my (probably incorrect) guess: UTF-16, though I'm not sure that's right. I'll find out and edit this post unless someone else beats me to it.

    16-bit character sets are very common. It's the special characters in the Parallax font that require the extra bits. If none of the special characters are used, then the file is stored as ASCII characters (I think). Edit: I'm not sure, but I think it's stored as UTF-8 (which is backward compatible with ASCII). (See Phil's post below.)

    I'll find a link to the type of characters they are and edit this post.

    Edit: Now I'm not so sure. I still think the files are stored as UTF-16 characters, but I just recalled where I had to use UTF-16 encoded characters (see below).

    Edit: The reason I remembered UTF-16 is that I wanted to generate a bunch of title overlays in Adobe Premiere. I had become more comfortable programming the Propeller than a PC, so I wrote a program in Spin to generate the desired files and store them on an SD card. The Adobe titles were encoded as UTF-16.

    Edit: Hopefully someone with more than guesses will reply. (See Phil's post below.)
  • Roy Eltham Posts: 3,000
    edited 2013-09-19 21:54
    PropTool saves files in a 16-bit Unicode format that I believe is the older UCS-2 format (only 16-bit characters, ever). UTF-16 is similar to UTF-8 in that a single character can take more than one "unit" (i.e. some characters are two 16-bit values in sequence, called a surrogate pair).

    UTF-16 is backward compatible with UCS-2 for all 16-bit characters, so you can call it UTF-16 if you want, but be aware that the compiler won't handle characters beyond the first 64K code points (the Basic Multilingual Plane).

    Most Spin files will be in this 16-bit Unicode format or in plain ASCII. You might find some rare files in UTF-8 format.
  • Rick314 Posts: 11
    edited 2013-09-19 23:00
    Thanks for both answers. They were just what I needed. I will adjust my other code tools accordingly.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2013-09-20 00:00
    Spin files are saved as 8-bit ASCII unless they include any of the special Spin characters, in which case they're saved as UTF-16 (little endian). In the latter case, the file is prepended with a two-character Byte Order Mark, $ff $fe, which you can use to distinguish between the two encodings. I know of no cases where UTF-8 is used for Spin files.
  • Rick314 Posts: 11
    edited 2013-09-20 00:10
    ... the file is prepended with a two-character Byte Order Mark, $ff $fe ...
    Thanks PhiPi, I saw those two bytes and wondered what they were. It all makes sense now.
  • Roy Eltham Posts: 3,000
    edited 2013-09-20 01:27
    Phil,
    I didn't know that PropTool auto-selects the save format based on the characters used (that part is all in the Delphi code, which I haven't seen). I guess I mostly work with files that contain special characters that trigger the 16-bit Unicode format.

    UTF-8 can now be used with OpenSpin. It came about because the preprocessor outputs its results in UTF-8, so I had to add code to convert from UTF-8 into PASCII for the actual compiler code. Since it's now possible, you might occasionally see UTF-8 Spin files. OpenSpin does require them to start with the UTF-8 BOM sequence: 0xEF, 0xBB, 0xBF.
  • pedward Posts: 1,642
    edited 2013-09-20 10:41
    Roy, couldn't a reasonable default be chosen if a BOM isn't present? What I've read says the BOM is optional, and I had never encountered one before (I had to look it up).

    IIRC, UTF-8 was designed to make ASCII compatibility transparent. I wrote a small state machine to parse raw bytes as UTF-8 for SimpleIDE and didn't encounter any issues by following the published basic rules.
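Roy's UCS-2 versus UTF-16 point can be seen in a short Python sketch. Characters up to U+FFFF (the Basic Multilingual Plane) take one 16-bit unit in both encodings; anything beyond that takes two units (a surrogate pair) in UTF-16 and has no UCS-2 form at all. The helper names are hypothetical, and fits_ucs2 is only a rough pre-check for the compiler limit Roy mentions, not code from PropTool or OpenSpin.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units UTF-16 uses for a single character."""
    return len(ch.encode("utf-16-le")) // 2


def fits_ucs2(text: str) -> bool:
    """True if every character fits in one UCS-2 unit (BMP, not a surrogate)."""
    return all(ord(c) <= 0xFFFF and not 0xD800 <= ord(c) <= 0xDFFF for c in text)


# 'A' and Greek capital omega are in the BMP; the musical G clef (U+1D11E) is not.
for ch in ["A", "\u03a9", "\U0001D11E"]:
    print(f"U+{ord(ch):04X}: {utf16_units(ch)} unit(s), "
          f"bytes {ch.encode('utf-16-le').hex()}, UCS-2 ok: {fits_ucs2(ch)}")
```

For the G clef the bytes come out as 34d81edd, which is the surrogate pair D834 DD1E stored little-endian.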
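Phil's BOM rule gives scripts a simple way to cope with both PropTool encodings. The Python sketch below sniffs the first bytes of a .spin file and decodes accordingly: FF FE means UTF-16LE, and EF BB BF (which Roy says OpenSpin requires for UTF-8 files) means UTF-8; anything else is treated as 8-bit text. The function names and the script name grep_spin.py are made up, and decoding the 8-bit case as Latin-1 is an assumption: it never fails, but the thread doesn't say how bytes above 0x7F map to the Parallax font.

```python
import sys
from pathlib import Path

UTF16LE_BOM = b"\xff\xfe"     # PropTool files containing special characters
UTF8_BOM = b"\xef\xbb\xbf"    # OpenSpin's required marker for UTF-8 files


def detect_spin_encoding(raw: bytes) -> str:
    """Pick a Python codec for .spin bytes based on the byte order mark."""
    if raw.startswith(UTF8_BOM):
        return "utf-8-sig"    # this codec also strips the BOM when decoding
    if raw.startswith(UTF16LE_BOM):
        return "utf-16"       # reads the FF FE BOM, decodes little-endian, drops it
    return "latin-1"          # assumption: plain 8-bit text, one byte per character


def read_spin(path: str) -> str:
    """Return the text of a .spin file, whichever encoding it was saved in."""
    raw = Path(path).read_bytes()
    return raw.decode(detect_spin_encoding(raw))


def grep_spin(pattern: str, paths: list) -> None:
    """Print file:line:text for every line containing pattern (grep-like)."""
    for path in paths:
        for lineno, line in enumerate(read_spin(path).splitlines(), start=1):
            if pattern in line:
                print(f"{path}:{lineno}:{line}")


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Usage: python grep_spin.py PATTERN FILE...
    grep_spin(sys.argv[1], sys.argv[2:])
```

Run from the Library folder, `python grep_spin.py Mouse Mouse.spin` should then report the lines that plain grep missed.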
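Going the other way, Roy's note that OpenSpin wants EF BB BF at the start of a UTF-8 file suggests a conversion like the Python sketch below; the "utf-8-sig" codec writes that BOM when encoding. The helper name is hypothetical, reading BOM-less files as Latin-1 is an assumption, and the thread doesn't say whether PropTool itself will open the converted file.

```python
def to_openspin_utf8(raw: bytes) -> bytes:
    """Re-encode .spin file bytes as UTF-8 with the BOM OpenSpin expects."""
    if raw.startswith(b"\xef\xbb\xbf"):
        return raw                              # already UTF-8 with a BOM
    if raw.startswith(b"\xff\xfe"):
        text = raw[2:].decode("utf-16-le")      # skip the BOM, decode UTF-16LE
    else:
        text = raw.decode("latin-1")            # assumption: 8-bit text
    return text.encode("utf-8-sig")             # utf-8-sig prepends EF BB BF


proptool_bytes = b"\xff\xfe" + "PUB start".encode("utf-16-le")
converted = to_openspin_utf8(proptool_bytes)
print(converted[:3].hex(), converted[3:].decode("utf-8"))  # efbbbf PUB start
```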
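pedward's two points fit together: 7-bit ASCII is already valid UTF-8, so one possible BOM-less default is to treat a file as UTF-8 whenever it validates and fall back to 8-bit text when it doesn't. His SimpleIDE code isn't in the thread; below is a separate, minimal Python sketch of a byte-at-a-time UTF-8 decoding state machine following the RFC 3629 rules (the lead byte sets the sequence length, continuation bytes are 10xxxxxx, and overlong forms, surrogates and values above U+10FFFF are rejected). Both function names are hypothetical.

```python
def decode_utf8(data: bytes) -> list:
    """Decode UTF-8 to a list of code points, one byte at a time.

    Raises ValueError on malformed input (a sketch, not SimpleIDE's code).
    """
    code_points = []
    need = 0      # continuation bytes still expected for the current character
    cp = 0        # code point being assembled
    min_cp = 0    # smallest legal value for this sequence length (no overlongs)
    for offset, b in enumerate(data):
        if need == 0:                       # state: expecting a lead byte
            if b < 0x80:                    # 0xxxxxxx: ASCII passes through as-is
                code_points.append(b)
            elif 0xC0 <= b <= 0xDF:         # 110xxxxx: 2-byte sequence
                need, cp, min_cp = 1, b & 0x1F, 0x80
            elif 0xE0 <= b <= 0xEF:         # 1110xxxx: 3-byte sequence
                need, cp, min_cp = 2, b & 0x0F, 0x800
            elif 0xF0 <= b <= 0xF7:         # 11110xxx: 4-byte sequence
                need, cp, min_cp = 3, b & 0x07, 0x10000
            else:                           # stray 10xxxxxx or invalid byte
                raise ValueError(f"bad lead byte 0x{b:02X} at offset {offset}")
        else:                               # state: inside a multi-byte sequence
            if (b & 0xC0) != 0x80:          # continuation must be 10xxxxxx
                raise ValueError(f"bad continuation 0x{b:02X} at offset {offset}")
            cp = (cp << 6) | (b & 0x3F)
            need -= 1
            if need == 0:                   # sequence complete: range checks
                if cp < min_cp or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
                    raise ValueError(f"illegal code point at offset {offset}")
                code_points.append(cp)
    if need:
        raise ValueError("input ends in the middle of a sequence")
    return code_points


def guess_encoding_without_bom(data: bytes) -> str:
    """Hypothetical default: UTF-8 if the bytes validate, else 8-bit Latin-1."""
    try:
        decode_utf8(data)
        return "utf-8"
    except ValueError:
        return "latin-1"


print(decode_utf8("\u03a9".encode("utf-8")))  # [937]
print(guess_encoding_without_bom(b"PUB main"), guess_encoding_without_bom(b"caf\xe9"))
```

The second line prints `utf-8 latin-1`: the lone 0xE9 in the last sample starts a 3-byte sequence that never finishes, so it can't be UTF-8.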