JavaScript

Heater. · 2015-01-12 04:57

var ﻝ = {
    ﺍ: function () {
        return ("Hello world!");
    }
}

var msg = ﻝ.ﺍ();
console.log(msg);

ctwardell · 2015-01-12 05:45

What's with the strange characters...

var msg = ﻝ.ﺍ(); has the I and J backwards, should be J.I(). It comes out that way in Notepad++ if I cut/paste your code.

C.W.

Heater. wrote: »

var ﻝ = {
    ﺍ: function () {
        return ("Hello world!");
    }
}

var msg = ﻝ.ﺍ();
console.log(msg);

Heater. · 2015-01-18 12:35

Ah, well spotted, good question. Actually you caught me out there. That was just a test that I was going to delete and then use in a new post on the "Official JavaScript war" thread or in one of my rants against the unicode menace. I got distracted and forgot about it.

Anyway, what it is...

By now someone should have run that code to see that it works, under node.js say, and had a look at it in a hex editor to see what is in there. But here we go, I'll spill the beans:

I was recently checking which characters are valid in JavaScript symbol names (identifiers). Turns out that rather a lot of the defined Unicode characters can be used. This is perhaps cool if you want to use Greek letters in your code:

    var π = Math.PI;

You could use this in minifying and obfuscating JavaScript sent to the browser.
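As a rough way to check which characters are accepted, here is a minimal sketch using Unicode property escapes in a regular expression. It assumes a modern engine (ES2018 or later) and the ES2015 identifier rules. `isIdentifierName` is a hypothetical helper name, and the check ignores reserved words and escape sequences.

```javascript
// Hypothetical helper: approximates the ES2015 IdentifierName rule.
// Assumes a regex engine with Unicode property escapes (ES2018+).
const IDENTIFIER = /^[$_\p{ID_Start}][$\u200C\u200D\p{ID_Continue}]*$/u;

function isIdentifierName(name) {
  return IDENTIFIER.test(name);
}

console.log(isIdentifierName("\u03C0")); // Greek small letter pi
console.log(isIdentifierName("\u0641")); // Arabic letter feh
console.log(isIdentifierName("2fast"));  // starts with a digit
```

The first two calls should report true and the last one false, since a digit may continue an identifier but not start one.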

There are a few articles exploring uses of Unicode symbol names in JavaScript, but I have yet to find one that gets on to my next experiment:

So then I thought what about unicode characters from languages that are written right to left instead of left to right? Arabic, or Hebrew letters say. Editors, browsers and such programs that display unicode are supposed to reverse their rendering when they meet characters from "backwards" character sets. This could get interesting, I thought, and so it did.

First I tried a simple case using the Arabic letter "FEH" ف.

var ف = (2 + 3) * (3 + 3)
console.log(ف);

This is valid JS and runs just fine. It is actually typed into my editor in the normal order of course. But the editor dutifully reverses its rendering direction in some unpredictable ways. I soon found that the nano, vim and kate editors on Linux all show the expression in the same strange way. Entering that into the node.js command line, or the Linux shell, also does some weird reversing tricks as you try to cursor left/right over the line.

OK, what about a weirder example? The following uses Arabic LAM (ﻝ) as an object name and Arabic ALEF (ﺍ) (I think) as a method name. Sure enough, our editors reverse object.method into method.object when displaying the source code. The code is valid and runs just fine. Try it out under node.js or put it in a web page script.

var ﻝ = {
    ﺍ: function () {
        return ("Hello world!");
    }
}

var msg = ﻝ.ﺍ();
console.log(msg);

I chose those characters as they look so much like I and J. For maximum confusion potential.
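For anyone without a hex editor handy, a small sketch like the following lists the code points of a line in the order they are stored, which is the order the JavaScript engine sees regardless of how an editor draws them. `codePoints` is a hypothetical helper, and the two letters are written as escapes so the listing itself is not reordered on screen.

```javascript
// Hypothetical helper: list the code points of a string in stored (logical) order.
function codePoints(source) {
  return Array.from(source, (ch) => {
    const cp = ch.codePointAt(0);
    return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
  });
}

// The call line from the example, with the two Arabic letters as escapes.
const line = "var msg = \uFEDD.\uFE8D();";

// Characters 10..12 are the object name, the dot and the method name.
console.log(codePoints(line).slice(10, 13).join(" "));
// U+FEDD U+002E U+FE8D
```

The object name comes first in memory, then the dot, then the method name. Only the rendering is reversed.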

OK. "What about browsers?" I thought. I checked the code into github and sure enough if you browse the code on the github pages it shows it all nicely backwards. Great! https://github.com/ZiCog/secure_express_demo/blob/master/feh.js

And so it came to this thread to see how the Parallax forums handle it. Seems just fine. There must be some scope for fun with this...:)

P.S. Notepad++ seems to be broken. As is my Sublime Text editor. But then unicode breaks everything.

ctwardell · 2015-01-31 12:19

Very interesting Heater.

I assumed it was something like that but didn't have time to really dig into it, thanks for the explanation.

C.W.

Heater. · 2015-02-18 08:24

Sorry folks, ignore this post. I just have an urgent need to save this code snippet from a public internet machine before I forget how I arrived at this solution.

start
  = expression

expression
  = l:multiplicative "+"  r:expression { return l + r; }
  / l:multiplicative "-"  r:expression { return l - r; }
  / multiplicative

multiplicative
  = l:logitive "*" r:multiplicative { return l * r; }
  / l:logitive "/" r:multiplicative { return l / r; }
  / logitive

logitive
  = l:primary "|" r:logitive { return l | r; }
  / primary

primary
  = integer
  / "(" e:expression ")" { return e; }

integer "integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }

start
  = expression

expression
  = l:multiplicative "+"  r:expression
    {
      return {operator:"+", left:l, right:r}
    }
  / l:multiplicative "-"  r:expression
    {
      return {operator:"-", left:l, right:r}
    }
  / multiplicative

multiplicative
  = l:logitive "*" r:multiplicative
    {
      return {operator:"*", left:l, right:r}
    }
  / l:logitive "/" r:multiplicative
    {
      return {operator:"/", left:l, right:r}
    }
  / logitive

logitive
  = l:primary "|" r:logitive
    {
      return {operator:"|", left:l, right:r}
    }
  / primary

primary
  = integer
  / "(" e:expression ")"
    {
      return e;
    }

integer "integer"
  = digits:[0-9]+ { return parseInt(digits.join(""), 10); }
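The first grammar computes the result directly in its actions, while the second returns a tree of `{operator, left, right}` nodes with plain numbers at the leaves. A minimal sketch of walking such a tree might look like this. `evaluate` is a hypothetical name, and the sample tree is written by hand in the shape the second grammar would return for "2+3*4" rather than produced by a generated parser.

```javascript
// Hypothetical evaluator for the {operator, left, right} nodes built by the
// second grammar. Leaves are plain numbers (the result of the integer rule).
function evaluate(node) {
  if (typeof node === "number") {
    return node;
  }
  const left = evaluate(node.left);
  const right = evaluate(node.right);
  switch (node.operator) {
    case "+": return left + right;
    case "-": return left - right;
    case "*": return left * right;
    case "/": return left / right;
    case "|": return left | right;
    default:
      throw new Error("unknown operator: " + node.operator);
  }
}

// Hand-written tree in the shape expected for the input "2+3*4".
const ast = {
  operator: "+",
  left: 2,
  right: { operator: "*", left: 3, right: 4 },
};

console.log(evaluate(ast)); // 14
```

Keeping evaluation out of the grammar means the same tree can later feed a code generator instead of an interpreter.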

Electrodude · 2015-02-18 13:58

Would you stop working on that before I'm forced to write my Spin compiler in javascript?!?

Can a PegJS parser extend itself at runtime? If not, then crisis averted!

Heater. · 2015-02-18 15:54

Electrodude,

Funny you should say that...

I'm not sure I can stop working on this. It's got me a little obsessed.

Since I posted the above I managed to get home, extend the thing to handle most Spin constant expressions and post it into my pasm.js parser repo on github. It's almost at the point where it can parse all of the Spin DAT section syntax. The output is an abstract syntax tree that can be used to generate actual PASM binary instructions.

To be a useful Spin compatible PASM assembler I need to be able to parse CON blocks but I think that is easy enough now that I have constant expressions mostly working. Of course then I need to be able to parse OBJ statements as well so as to pull in constants from other objects.

Now, I have no intention of writing a Spin compiler. But I can see I might be attracted to continuing on the parser path so as to be able to parse all of the Spin syntax. That would be useful already for openspin.js or perhaps adding preprocessing to Spin.

So if you are volunteering to help out with the Spin grammar definition for pegjs that would be great:)

Here is the pasm.js repository https://github.com/ZiCog/pasm.js. Have a look in dat-grammar.pegjs, at the end are the grammar rules for constantExpression. Not quite finished yet but mostly working already. There are some instructions there as to how to run the test I have for it.

"Can a PegJS parser extend itself at runtime?" Hmm...probably not. It's a node.js module that exports a parser function. I don't think we can mess with it at run time. But given the way you can include JS in the parser rules (i.e. customize the syntax matching rules with JS) and even return JS functions in the syntax tree structure pretty much anything is possible!

Electrodude · 2015-02-18 16:11

A few weeks ago I was looking at LPeg, which is basically PegJS for Lua (I'm not sure which came first). I gave up on it after I discovered it didn't support runtime extension. As much as I prefer Lua over JS, I must admit that PegJS's syntax is much more concise and readable.

I think I don't realize how much I'm asking for when I want a self-extending language. After an Extended Spin file (language needs an actual name!) declares that it's Extended Spin, a DSL (Domain Specific Language, although a different term might be better) block becomes available, with syntax identical to that of OBJ blocks. It declares new Spin blocks and specifies a parser and compiler for them. They should be written in a compilable scripting language, like Lua or JS.

DSL
  FTH : "forth.lua" ' or .js, or whatever

PUB main

  cognew(@entry, @forthmain)

FTH

: forthmain forth inlineable? if yay then ;

DAT
        org 0
entry   ' ...

EDIT:
One way to allow run-time parser extension is to have a main parser split the program into Spin blocks and feed each block to the appropriate parser/compiler. Each block compiler can then work however it wants. Any line starting with a previously-declared block name and then a space marks the beginning of a block. This method looks very promising. Ideally, compiler plugins should be able to be written in a variety and combination of scripting and non-scripting languages.
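The block-splitting idea can be sketched in a few lines of JavaScript. This is only an illustration under stated assumptions: `splitBlocks` and `compileBlocks` are hypothetical names, block names are matched case-sensitively, a block name followed by whitespace or by the end of the line starts a block, and the per-block "compilers" are stand-ins that just tag the text.

```javascript
// Hypothetical main parser: split a source file into blocks by leading keyword.
function splitBlocks(source, blockNames) {
  const blocks = [];
  let current = null;
  for (const line of source.split("\n")) {
    // First whitespace-delimited word; empty string for indented or blank lines.
    const firstWord = line.split(/\s/, 1)[0];
    if (blockNames.has(firstWord)) {
      current = { name: firstWord, lines: [line.slice(firstWord.length)] };
      blocks.push(current);
    } else if (current) {
      current.lines.push(line);
    } else if (line.trim() !== "") {
      throw new Error("text before first block: " + line);
    }
  }
  return blocks;
}

// Feed each block to the compiler registered under its name.
function compileBlocks(source, compilers) {
  const names = new Set(compilers.keys());
  return splitBlocks(source, names).map((block) =>
    compilers.get(block.name)(block.lines.join("\n"))
  );
}

const source = [
  "PUB main",
  "",
  "  cognew(@entry, @forthmain)",
  "",
  "FTH",
  "",
  ": forthmain forth inlineable? if yay then ;",
  "",
  "DAT",
  "        org 0",
  "entry   ' ...",
].join("\n");

// Stand-in compilers: a real one would parse and compile the block text.
const compilers = new Map([
  ["PUB", (text) => ({ kind: "spin", text })],
  ["FTH", (text) => ({ kind: "forth", text })],
  ["DAT", (text) => ({ kind: "pasm", text })],
]);

console.log(compileBlocks(source, compilers).map((o) => o.kind).join(","));
// spin,forth,pasm
```

Registering a new language then amounts to adding one entry to the map, which is roughly what a DSL block declaration would do.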

Heater. · 2015-02-18 16:52

Electrodude,

I like Lua. I have no idea which came first pegjs or lpeg. Pegjs was made by a Czech guy who wanted to write a compiler for his own language. He probably never heard of lpeg.

I have no idea where all this leads but I have some random thoughts:

1) We already have two languages in a Spin file: Spin and PASM. I'm not sure I want to see more mixed in. Especially since many cry out for a pre-processor which itself would be a different language again.

2) I like your idea of splitting Spin objects into separate blocks each handled by their own parser/compiler. I already thought that may be a way to go with what I am doing.

To be honest I have no expertise in all this parsing/compiling business. Having a statically defined language seems hard enough, never mind a language that can redefine itself!

I will be happy if I can generate binary instructions out of PASM using pegjs and some JS.

I can then use that as the assembler for the output of my TINY compiler. Which itself will have to be rewritten in JS.

Electrodude · 2015-02-18 21:59

The main reason I want to support lots of languages is because I love how easy it is to link Spin and PASM together; if only it were that easy for other languages too. It's nearly impossible to link C with Spin or even C with PASM and it's a mess to embed other languages in Spin like Forth or Float32Full.FFunc-ese.

I would prefer to not have a preprocessor if possible. Instead of #if, I'll either have a "static if" like D has, or just have a normal "if" and expect the optimizer to realize that it's static. Instead of #include, I'm going to have an import("path") that compiles the other file separately (to avoid headaches) but then dumps all of the symbols from the other file into the current file's namespace, or under a sub-namespace, and links everything together later as if it were #included. There will be a way to mess with constants in an imported file, probably just importedfile#constant = value. Instead of #define, PRI methods will be automatically inlined when appropriate. There are several other things, though, notably those macros that can't be done with an inlined PRI method, that I'm not sure yet how to do without a preprocessor.

The main compiler program should probably do no more than split the program into blocks, feed each block to the appropriate compiler extension, keep calling symbol resolver functions until all symbols are resolved or until an unresolvable symbol is found (resulting in an error), optimize the now-complete code tree, come up with sizes and final addresses for everything, and finally write all data into the appropriate place. There would be a directory somewhere full of .so files that describe parts of the compiler. The standard Spin blocks would be defined in one (possibly statically linked) library, while there could be other libraries specifically for embedding commonly used languages like C in Spin, and others for interfacing Lua or JS compiler descriptions. That way, the most commonly used languages, like Spin or PASM, can have their compilers written in C/C++, while more rarely used/application specific languages can have their compilers written in Lua or JS (supposing you have the Lua or JS interface library).
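The "keep calling symbol resolver functions until everything is resolved" step is essentially a fixed-point loop. Here is a minimal sketch of that loop in JavaScript, not the design of any existing compiler. The symbol table shape (`deps` plus a `compute` function) and the name `resolveSymbols` are assumptions made for illustration.

```javascript
// Hypothetical symbol table: name -> { deps: [names], compute: (values) => number }.
// Repeat passes until a pass makes no progress, then report what is left over.
function resolveSymbols(symbols) {
  const values = new Map();
  let progress = true;
  while (values.size < symbols.size && progress) {
    progress = false;
    for (const [name, symbol] of symbols) {
      if (values.has(name)) continue;
      if (symbol.deps.every((dep) => values.has(dep))) {
        values.set(name, symbol.compute(values));
        progress = true;
      }
    }
  }
  if (values.size < symbols.size) {
    const missing = [...symbols.keys()].filter((name) => !values.has(name));
    throw new Error("unresolvable symbols: " + missing.join(", "));
  }
  return values;
}

const symbols = new Map([
  ["c", { deps: ["a", "b"], compute: (v) => v.get("b") + v.get("a") }],
  ["b", { deps: ["a"], compute: (v) => v.get("a") * 2 }],
  ["a", { deps: [], compute: () => 4 }],
]);

console.log(resolveSymbols(symbols).get("c")); // 12
```

A symbol that depends on something undefined, or on itself through a cycle, never becomes ready, so the loop stops making progress and the error names it.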

I have no expertise in parsing or compiling either, or many of the other skills I'll be needing for this project. The most I can boast is that I successfully got flex/bison to choke its way through an example file for a C-like language I'm designing (completely unrelated to the propeller, my laptop probably isn't powerful enough for it), but it didn't actually make an AST or anything - it just didn't throw any syntax errors.

Heater. · 2015-02-19 01:54

Electrodude,

Yes, the way PASM and Spin are intermingled is just wonderful. So simple to use.

I also think using a preprocessor with "#define", "#ifdef" can get ugly fast. You have some interesting ideas for alternatives there.

Getting Spin, PASM, C, Forth etc. to work together like that seems like a mammoth task. You need to fire up a Spin interpreter and a Forth engine etc. as and when required. You have to juggle all the different ways to use objects, functions and words from language to language.

One worry I have about such a polyglot system is that one can end up with an ugly mess of different syntaxes all mixed up in the same source. As happens with web pages. See example below that mixes up HTML, CSS, PHP, EJS and SQL. Blech!

Keeping the languages separated into blocks seems like a good idea.

I vaguely remember looking into flex/bison a long while back and deciding that it was far too complex for me and I did not understand anything. I was surprised how easy it is to get going with pegjs.

Edit: I just took a peek at this flex/bison introductory article: http://gnuu.org/2009/09/18/writing-your-own-toy-compiler/4/. Ouch!

<!DOCTYPE html PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML>
   <HEAD>
       <style>
           body {
               background-color: linen;
           }
           h1 {
               color: maroon;
               margin-left: 40px;
           } 
        </style>   
   </HEAD>
<BODY>
    <P>This is very minimal "hello world" HTML document.</P>
    <?php 
        for ($x = 0; $x <= 10; $x++) {
            echo "console.log('The number is:', x); <br>";
        } 
        $sql = "INSERT INTO MyGuests (firstname, lastname, email) VALUES ('John', 'Doe', 'john@example.com')";
    ?>
    <script type="text/javascript">
        //<![CDATA[
        var x = 10;
        if (x < 5) {
            doSomething(x);
            <?php <p> echo "Something $x" ?> 
        }
        //]]>
    </script>
    <P>May name is <%firtsName%> <%lastName%></P>  
</BODY>
</HTML>

Heater. · 2015-02-19 13:09

Drats, I discovered that pegjs was taking forever to parse bracketed sub-expressions, like "x and ((((3)))) and y". It might be the way I have specified the grammar; I'm not sure. So for now I don't handle bracketed sub-expressions:

https://github.com/ZiCog/pasm.js

I hope there is a way around this else my PASM assembler in JS is doomed.

Electrodude · 2015-02-19 13:47

A quick google recommends using pegjs --cache

Heater. · 2015-02-19 14:21

Electrodude,

Wow, you got it. Thank you. Check this out:

What I was doing:

$ pegjs  dat-grammar.pegjs dat-parser.js
$ time node pasm.js
real    0m43.644s
user    0m43.620s
sys     0m0.020s

What happens with --cache:

$ pegjs --cache dat-grammar.pegjs dat-parser.js
$ time node pasm.js
real    0m0.148s
user    0m0.148s
sys     0m0.000s

And there it is right in the pegjs documentation:

--cache
Makes the parser cache results, avoiding exponential parsing time in pathological cases but making the parser slower.

And there is even a check box on the interactive page at pegjs.org/online

Exactly the exponential time problem I had. I was just getting too tired and stupid to find that after fighting with this thing all day.

Thanks again. Now we are in business again.
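To see where the exponential time comes from, here is a hand-rolled backtracking parser for the small arithmetic grammar posted earlier in the thread (expression, multiplicative, logitive, primary, integer). It is a sketch for illustration, not what pegjs generates: `makeParser` and its `useCache` switch are hypothetical names, and the cache simply remembers each rule's result per input position. Every alternative of a rule re-parses the same sub-rule from the same position, so each level of brackets multiplies the work unless results are cached.

```javascript
// Hypothetical demo parser. Rules take a position and return
// { value, pos } on success or null on failure.
function makeParser(input, useCache) {
  const cache = new Map();
  const stats = { calls: 0 };

  // Wrap a rule so its result is remembered per (rule name, position).
  function memo(name, fn) {
    return function (pos) {
      stats.calls++;
      if (!useCache) return fn(pos);
      const key = name + "@" + pos;
      if (!cache.has(key)) cache.set(key, fn(pos));
      return cache.get(key);
    };
  }

  // rule = lower OP rule / ... / lower, re-parsing `lower` for each alternative.
  function binaryRule(name, ops, lower) {
    const self = memo(name, function (pos) {
      for (const [op, apply] of ops) {
        const l = lower(pos);
        if (l && input[l.pos] === op) {
          const r = self(l.pos + 1);
          if (r) return { value: apply(l.value, r.value), pos: r.pos };
        }
      }
      return lower(pos);
    });
    return self;
  }

  const integer = memo("integer", function (pos) {
    let end = pos;
    while (end < input.length && input[end] >= "0" && input[end] <= "9") end++;
    if (end === pos) return null;
    return { value: parseInt(input.slice(pos, end), 10), pos: end };
  });

  const primary = memo("primary", function (pos) {
    const i = integer(pos);
    if (i) return i;
    if (input[pos] !== "(") return null;
    const e = expression(pos + 1);
    if (e && input[e.pos] === ")") return { value: e.value, pos: e.pos + 1 };
    return null;
  });

  const logitive = binaryRule("logitive", [["|", (l, r) => l | r]], primary);
  const multiplicative = binaryRule(
    "multiplicative",
    [["*", (l, r) => l * r], ["/", (l, r) => l / r]],
    logitive
  );
  const expression = binaryRule(
    "expression",
    [["+", (l, r) => l + r], ["-", (l, r) => l - r]],
    multiplicative
  );

  return {
    stats,
    parse() {
      const result = expression(0);
      if (!result || result.pos !== input.length) throw new Error("parse error");
      return result.value;
    },
  };
}

const slow = makeParser("((((3))))", false);
const fast = makeParser("((((3))))", true);
console.log(slow.parse(), "rule calls without cache:", slow.stats.calls);
console.log(fast.parse(), "rule calls with cache:", fast.stats.calls);
```

Both runs return 3, but the uncached run makes orders of magnitude more rule calls, and each extra pair of brackets multiplies that count again.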

Heater. · 2015-02-19 14:35

I should have realized immediately. This is the same problem as the recursive fibo() algorithm we discussed on the JavaScript war thread recently. Takes forever, until you add caching of intermediate results.

New turbo-charged pasm.js is now up on github.
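For comparison, the fibo() case looks like this in miniature. This is a fresh sketch rather than the code from the other thread, and `fibSlow` and `makeCachedFib` are hypothetical names.

```javascript
// Naive recursion: recomputes the same sub-results over and over.
function fibSlow(n) {
  return n < 2 ? n : fibSlow(n - 1) + fibSlow(n - 2);
}

// Same recursion, but each intermediate result is computed only once.
function makeCachedFib() {
  const cache = new Map();
  return function fib(n) {
    if (n < 2) return n;
    if (!cache.has(n)) cache.set(n, fib(n - 1) + fib(n - 2));
    return cache.get(n);
  };
}

const fibFast = makeCachedFib();
console.log(fibSlow(25)); // 75025
console.log(fibFast(50)); // 12586269025
```

The cached version does a linear number of additions, which is the same trade the --cache option makes for the parser: more memory, far fewer repeated calls.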

Electrodude · 2015-02-20 09:49

Posting a link to this somewhere so people know about it will probably encourage me to actually work on it: https://github.com/electrodude/esc

It compiles, but the only interesting thing it does so far is dlopen a .so file that will eventually be a compiler module and then instantiate a subclass of CompilerModule defined in the .so file. I don't even know that it actually does this - all I know is that it doesn't throw any errors.

I'll start a new thread about this soon.

EDIT: We've been having this discussion in the test forum all this time?

Heater. · 2015-02-20 10:04

Electrodude,

We've been having this discussion in the test forum all this time?

Ha! I thought about that too. Let's just say it's a "test discussion"; we can have the real discussion later if this one works.

As it is I have just been playing and experimenting with pegjs. If this ever gets near being a complete PASM parser and code generator I might put it on the Prop forum or somewhere.

Heater. · 2015-02-21 07:36

So I managed to make a grammar description of Spin CON blocks in pegjs. It looks like this:

start
  = (conAssignments / conEnumerations)*  

conAssignments
  = "CON"? white* constantAssignment white* "," white* conAssignmentList EOL 
  / "CON"? white* constantAssignment EOL
  / "CON"? white* EOL

conAssignmentList
  = constantAssignment white* "," white* conAssignmentList
  / constantAssignment

constantAssignment
  = symbol white* "=" white* constantExpression

conEnumerations
  = "CON"? white* "#" white* constantExpression white* "," white* conEnumerationList EOL
  / "CON"? white* "#" white* constantExpression EOL
  / "CON"? white* conEnumerationList EOL
  / "CON"? white* EOL

conEnumerationList 
  = s:symbol white* o:conOffset? white* "," white* sd:conEnumerationList
  / s:symbol white* o:conOffset?

conOffset
  = "[" white* e:constantExpression white* "]"

constantExpression
  = d:[0-9]+
  {
    return parseInt(d.join(""), 10);
  }

symbol
  = s:[a-zA-Z_]+
  {
    return s.join("");
  }

white
  = [ ]+

EOL
  = white* "\n"

The definitions of constantExpression and symbol are dummies here. I have the real constantExpression almost done. It's huge!
