JavaScript
Heater.
var ﻝ = {
    ﺍ: function () {
        return ("Hello world!");
    }
}
var msg = ﻝ.ﺍ();
console.log(msg);
Comments
var msg = ﻝ.ﺍ(); has the I and J backwards; it should be J.I(). It comes out that way in Notepad++ if I cut and paste your code.
C.W.
Anyway, what it is...
By now someone should have run that code to see that it works, under node.js say, and had a look at it in a hex editor to see what is in there. But here we go, I'll spill the beans:
I was recently checking which characters are valid in JavaScript symbol names. It turns out that rather a lot of the defined Unicode characters can be used. This is perhaps cool if you want to use Greek letters in your code:
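A minimal sketch of Greek-letter identifiers. The names π, Δ and area are picked here purely for illustration and are not from the original post:

```javascript
// Greek letters are ordinary identifier characters in JavaScript.
const π = Math.PI;
const Δ = (a, b) => b - a;        // difference between two values
const area = r => π * r * r;      // area of a circle of radius r

console.log(Δ(3, 10));            // 7
console.log(area(2));
```

This runs unchanged under node.js or in a browser script, provided the file is saved as UTF-8.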
You could use this in minifying and obfuscating JavaScript sent to the browser.
There are a few articles exploring uses of Unicode symbol names in JavaScript, but I have yet to find one that gets on to my next experiment:
So then I thought, what about Unicode characters from languages that are written right to left instead of left to right? Arabic or Hebrew letters, say. Editors, browsers and other programs that display Unicode are supposed to reverse their rendering when they meet characters from "backwards" character sets. This could get interesting, I thought, and so it did.
First I tried a simple case using the Arabic letter "FEH" ف. This is valid JS and runs just fine. It is actually typed into my editor in the normal order of course. But the editor dutifully reverses its rendering direction in some unpredictable ways. I soon found that the nano, vim and kate editors on Linux all show the expression in the same strange way. Entering that into the node.js command line, or the Linux shell, also does some weird reversing tricks as you try to cursor left/right over the line.
OK, what about a weirder example? The following uses Arabic LAM (ﻝ) as an object name and Arabic ALEF (ﺍ) (I think) as a method name. Sure enough, our editors reverse the object.method into method.object when displaying the source code. The code is valid and runs just fine. Try it out under node.js or put it in a web page script. I chose those characters as they look so much like I and J. For maximum confusion potential.
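For anyone without a hex editor to hand, here is a sketch of the same program written with \u escapes, which JavaScript allows inside identifiers. It assumes the two letters are U+FEDD and U+FE8D (the isolated presentation forms); the callSource and codePoints names are made up for the illustration:

```javascript
// The same program written with \u escapes, so the stored (logical) order
// is visible no matter how an editor renders right-to-left letters.
// Assumption: the two identifiers are U+FEDD and U+FE8D.
var \uFEDD = { \uFE8D: function () { return "Hello world!"; } };
var msg = \uFEDD.\uFE8D();
console.log(msg); // Hello world!

// Dump the code points of the call expression in stored order.
var callSource = "\uFEDD.\uFE8D()";
var codePoints = Array.from(callSource, function (ch) {
  return "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
});
console.log(codePoints.join(" ")); // U+FEDD U+002E U+FE8D U+0028 U+0029
```

The object name comes first in the stored text, then the dot, then the method name. Only the on-screen rendering swaps them.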
OK. "What about browsers?" I thought. I checked the code into GitHub and sure enough, if you browse the code on the GitHub pages it shows it all nicely backwards. Great! https://github.com/ZiCog/secure_express_demo/blob/master/feh.js
And so it came to this thread to see how the Parallax forums handle it. Seems just fine. There must be some scope for fun with this...:)
P.S. Notepad++ seems to be broken. As is my Sublime Text editor. But then Unicode breaks everything.
I assumed it was something like that but didn't have time to really dig into it, thanks for the explanation.
C.W.
Can a PegJS parser extend itself at runtime? If not, then crisis averted!
Funny you should say that...
I'm not sure I can stop working on this. It's got me a little obsessed.
Since I posted the above I managed to get home, extend the thing to handle most Spin constant expressions and post it into my pasm.js parser repo on github. It's almost at the point where it can parse all of the Spin DAT section syntax. The output is an abstract syntax tree that can be used to generate actual PASM binary instructions.
To be a useful Spin compatible PASM assembler I need to be able to parse CON blocks but I think that is easy enough now that I have constant expressions mostly working. Of course then I need to be able to parse OBJ statements as well so as to pull in constants from other objects.
Now, I have no intention of writing a Spin compiler. But I can see I might be attracted to continuing on the parser path so as to be able to parse all of the Spin syntax. That would be useful already for openspin.js or perhaps adding preprocessing to Spin.
So if you are volunteering to help out with the Spin grammar definition for pegjs that would be great:)
Here is the pasm.js repository https://github.com/ZiCog/pasm.js. Have a look in dat-grammar.pegjs, at the end are the grammar rules for constantExpression. Not quite finished yet but mostly working already. There are some instructions there as to how to run the test I have for it.
"Can a PegJS parser extend itself at runtime?" Hmm...probably not. It's a node.js module that exports a parser function. I don't think we can mess with it at run time. But given the way you can include JS in the parser rules (i.e. customize the syntax matching rules with JS) and even return JS functions in the syntax tree structure pretty much anything is possible!
I think I don't realize how much I'm asking for when I want a self-extending language. After an Extended Spin file (language needs an actual name!) declares that it's Extended Spin, a DSL (Domain Specific Language, although a different term might be better) block becomes available, with syntax identical to that of OBJ blocks. It declares new Spin blocks and specifies a parser and compiler for them. They should be written in a compilable scripting language, like Lua or JS.
EDIT:
One way to allow run-time parser extension is to have a main parser split the program into Spin blocks and feed each block to the appropriate parser/compiler. Each block compiler can then work however it wants. Any line starting with a previously-declared block name and then a space marks the beginning of a block. This method looks very promising. Ideally, compiler plugins should be able to be written in a variety and combination of scripting and non-scripting languages.
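A rough sketch of that splitting step in JS. Everything here is hypothetical: the function names, the shape of the compiler table and the tiny sample program are invented for the illustration, and a block name standing alone on a line (for example a bare DAT) is also treated as a block start, which goes slightly beyond the "name and then a space" rule:

```javascript
// Split source text into blocks. A line whose first word is a declared
// block name starts a new block; every other line joins the current block.
function splitIntoBlocks(source, names) {
  const blocks = [];
  let current = null;
  for (const line of source.split(/\r?\n/)) {
    const firstWord = line.split(" ", 1)[0];
    if (names.has(firstWord)) {
      // Keep whatever follows the block name as the block's first line.
      current = { name: firstWord, lines: [line.slice(firstWord.length).trim()] };
      blocks.push(current);
    } else if (current !== null) {
      current.lines.push(line);
    }
    // Lines before the first block header are ignored in this sketch.
  }
  return blocks;
}

// Feed each block to the compiler registered under its name.
function compileAll(source, compilers) {
  return splitIntoBlocks(source, compilers).map(
    block => compilers.get(block.name)(block.lines)
  );
}

// Stand-in "compilers" that only count non-empty lines.
const countLines = kind => lines =>
  ({ kind, count: lines.filter(l => l.trim() !== "").length });
const compilers = new Map([
  ["CON", countLines("constants")],
  ["DAT", countLines("data")],
]);

const program = ["CON LED = 16", "  DELAY = 1000", "DAT", "  org 0", "  mov dira, #1"].join("\n");
const compiled = compileAll(program, compilers);
console.log(compiled);
```

Registering a new block type is then just another entry in the Map, which is where a DSL declaration could hook in.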
I like Lua. I have no idea which came first pegjs or lpeg. Pegjs was made by a Czech guy who wanted to write a compiler for his own language. He probably never heard of lpeg.
I have no idea where all this leads but I have some random thoughts:
1) We already have two languages in a Spin file: Spin and PASM. I'm not sure I want to see more mixed in. Especially since many cry out for a pre-processor which itself would be a different language again.
2) I like your idea of splitting Spin objects into separate blocks each handled by their own parser/compiler. I already thought that may be a way to go with what I am doing.
To be honest I have no expertise in all this parsing/compiling business. Having a statically defined language seems hard enough, never mind a language that can redefine itself!
I will be happy if I can generate binary instructions out of PASM using pegjs and some JS.
I can then use that as the assembler for the output of my TINY compiler, which itself will have to be rewritten in JS.
I would prefer not to have a preprocessor if possible. Instead of #if, I'll either have a "static if" like D has, or just have a normal "if" and expect the optimizer to realize that it's static. Instead of #include, I'm going to have an import("path") that compiles the other file separately (to avoid headaches) but then dumps all of the symbols from the other file into the current file's namespace, or under a sub-namespace, and links everything together later as if it were #included. There will be a way to mess with constants in an imported file, probably just importedfile#constant = value. Instead of #define, PRI methods will be automatically inlined when appropriate. There are several other things, though, notably those macros that can't be done with an inlined PRI method, that I'm not sure yet how to do without a preprocessor.
The main compiler program should probably do no more than split the program into blocks, feed each block to the appropriate compiler extension, keep calling symbol resolver functions until all symbols are resolved or until an unresolvable symbol is found (resulting in an error), optimize the now-complete code tree, come up with sizes and final addresses for everything, and finally write all data into the appropriate place. There would be a directory somewhere full of .so files that describe parts of the compiler. The standard Spin blocks would be defined in one (possibly statically linked) library, while there could be other libraries specifically for embedding commonly used languages like C in Spin, and others for interfacing Lua or JS compiler descriptions. That way, the most commonly used languages, like Spin or PASM, can have their compilers written in C/C++, while more rarely used/application specific languages can have their compilers written in Lua or JS (supposing you have the Lua or JS interface library).
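The "keep calling symbol resolver functions until everything is resolved or something is unresolvable" step could look roughly like this in JS. It is only a sketch: the needs/compute shape of a symbol definition and the sample symbols are invented here, and the real modules described above would be native libraries rather than JS objects:

```javascript
// Hypothetical shape: each symbol lists the symbols it needs and a function
// that computes its value once those are known.
function resolveSymbols(definitions) {
  const resolved = new Map();
  const pending = new Map(Object.entries(definitions));
  let progress = true;

  // Keep making passes while the previous pass resolved at least one symbol.
  while (pending.size > 0 && progress) {
    progress = false;
    for (const [name, def] of pending) {
      if (def.needs.every(dep => resolved.has(dep))) {
        const args = def.needs.map(dep => resolved.get(dep));
        resolved.set(name, def.compute(...args));
        pending.delete(name);
        progress = true;
      }
    }
  }

  // A pass with no progress means the rest can never be resolved.
  if (pending.size > 0) {
    throw new Error("Unresolvable symbols: " + [...pending.keys()].join(", "));
  }
  return resolved;
}

const symbols = resolveSymbols({
  MASK: { needs: ["LED"], compute: led => 1 << led },
  LED: { needs: ["BASE"], compute: base => base + 1 },
  BASE: { needs: [], compute: () => 16 },
});
console.log(symbols.get("MASK")); // 131072
```

Declaration order does not matter here, and a circular or missing dependency ends in an error instead of an endless loop.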
I have no expertise in parsing or compiling either, or many of the other skills I'll be needing for this project. The most I can boast is that I successfully got flex/bison to choke its way through an example file for a C-like language I'm designing (completely unrelated to the propeller, my laptop probably isn't powerful enough for it), but it didn't actually make an AST or anything - it just didn't throw any syntax errors.
Yes, the way PASM and Spin are intermingled is just wonderful. So simple to use.
I also think using a preprocessor with "#define", "#ifdef" can get ugly fast. You have some interesting ideas for alternatives there.
Getting Spin, PASM, C, Forth etc to work together like that seems like a mammoth task. You need to fire up a Spin interpreter and a Forth engine etc as and when required. You have to juggle all the different ways of using objects, functions and words from language to language.
One worry I have about such a polyglot system is that one can end up with an ugly mess of different syntaxes all mixed up in the same source. As happens with web pages. See example below that mixes up HTML, CSS, PHP, EJS and SQL. Blech!
Keeping the languages separated into blocks seems like a good idea.
I vaguely remember looking into flex/bison a long while back and deciding that it was far too complex for me and that I did not understand anything. I was surprised how easy it is to get going with pegjs.
Edit: I just took a peek at this flex/bison introductory article: http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/4/. Ouch!
Wow, you got it. Thank you. Check this out:
What I was doing:
What happens with --cache:
And there it is right in the pegjs documentation: And there is even a check box on the interactive page at pegjs.org/online
Exactly the exponential time problem I had. I was just getting too tired and stupid to find that after fighting with this thing all day.
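For the curious, the general idea behind result caching can be shown without pegjs at all. The sketch below is a hand-written backtracking parser for a toy grammar (invented for this illustration, not the pasm.js grammar) that counts rule invocations with and without a per-position cache:

```javascript
// Toy grammar (an assumption, not the pasm.js grammar):
//   expr = term "+" expr / term "-" expr / term
//   term = "(" expr ")" / digit
function makeParser(input, useCache) {
  let calls = 0;
  const cache = new Map();

  // Wrap a rule so its result is remembered per (rule, position) when caching is on.
  function rule(name, fn) {
    return pos => {
      calls++;
      const key = name + "@" + pos;
      if (useCache && cache.has(key)) return cache.get(key);
      const result = fn(pos);
      if (useCache) cache.set(key, result);
      return result;
    };
  }

  // Each rule returns the position after the match, or -1 on failure.
  const term = rule("term", pos => {
    if (input[pos] === "(") {
      const end = expr(pos + 1);
      if (end !== -1 && input[end] === ")") return end + 1;
    }
    return /[0-9]/.test(input[pos] || "") ? pos + 1 : -1;
  });

  const expr = rule("expr", pos => {
    // Each alternative re-parses the same term from the same position.
    for (const op of ["+", "-"]) {
      const left = term(pos);
      if (left !== -1 && input[left] === op) {
        const right = expr(left + 1);
        if (right !== -1) return right;
      }
    }
    return term(pos);
  });

  return { parse: () => expr(0) === input.length, calls: () => calls };
}

const nested = "(".repeat(10) + "1" + ")".repeat(10);
const plain = makeParser(nested, false);
const cached = makeParser(nested, true);
console.log(plain.parse(), plain.calls());
console.log(cached.parse(), cached.calls());
```

Without the cache every level of nesting triples the work, because each alternative of expr parses the same term again. With the cache each rule does its real work once per position, so the count grows linearly with the nesting depth.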
Thanks again. Now we are in business again.
New turbo-charged pasm.js is now up on github.
It compiles, but the only interesting thing it does so far is dlopen a .so file that will eventually be a compiler module and then instantiate a subclass of CompilerModule defined in the .so file. I don't even know that it actually does this - all I know is that it doesn't throw any errors.
I'll start a new thread about this soon.
EDIT: We've been having this discussion in the test forum all this time?
As it is, I have just been playing and experimenting with pegjs. If this ever gets near being a complete PASM parser and code generator I might put it on the Prop forum or somewhere.