Arc Forumnew | comments | leaders | submitlogin
2 points by Pauan 4413 days ago | link | parent

I just realized something... Nulan's macro system won't work in JS. The reason is because... well, first, here's a macro:

  (mac foo -> {bar qux corge})
This macro will replace (foo) with (bar qux corge). The key thing to note is that it uses the values of bar/qux/corge, not the symbols. Therefore, Nulan's macro system is completely hygienic, just like Kernel.

That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name.

I could, of course, change it so that it expands to the symbols bar, qux, and corge, and then it would work in JS... but... ew, lack of hygiene. It just wouldn't be the same language anymore.

But then I realized something else... to explain this, I'll need to explain how the Nulan compiler works. Firstly, Nulan doesn't have much of a concept of "symbol": macros expand to values (not symbols), quote is discouraged, etc.

But Nulan takes it a step further. The Nulan compiler will actually remove all symbols from the AST before evaling it in Racket.

How this works is, at compile-time, the compiler has this big dictionary which represents the environment. This environment maps symbols to mutable/immutable boxes.

If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime.

This is faaaaaaaast. There's no symbol lookup, not even for globals, thanks to Nulan's hyper-static scope.

Because of this, I'm pretty sure Nulan will end up being significantly faster than Arc. I don't think it'll be faster than Racket though, because Racket's modules are lexical, so I'm sure Racket can do the same sort of optimization. I do think it'll allow Nulan to be just as fast as Racket even though Nulan doesn't use Racket's modules.

Anyways, the point is, symbols are bound to boxes, not to values. And then these boxes are inserted into the AST. That way you can have the same symbol with different meaning:

  (var foo 1)
  (def bar -> foo + 5)

  (var foo 3)
  (bar)
The above first creates a new box and binds it to "foo". Within the function "bar", the compiler replaces the symbol "foo" with the box.

Then it creates a new box and binds it to "foo", but the compiler has already replaced the "foo" inside the function "bar" with a box, so there's no problem, and thus the call to (bar) returns 6, rather than 8.

---

Okay, that works faaaantastic in Racket, but JS doesn't have boxes, and you can't inline values into the AST anyways. So what do I do? Well, there's a little trick I'm using...

I think I'll be able to explain it easiest just by showing what the above compiles into in JS:

  var foo = 1;
  var bar = function () { return foo + 5 };

  var foo2 = 2;
  bar();
Notice that the second call to "var" created a "foo2" variable, not a "foo" variable. If you created a third "foo" variable, it'd be "foo3", etc.

Effectively, it's saving all the different "versions" of the foo variable, so that each variable is unique. And so, that's how I simulate Racket's boxes in JS.

This has another nifty side effect: on the JavaScript side, you can choose which version of the variable to call. So this gives the benefits of hyper-static scope not only to Nulan code, but also to JavaScript code that calls Nulan code.

This reminds me a lot of SSA:

http://en.wikipedia.org/wiki/Static_single_assignment_form

---

So, going back to the macro example way at the top. Consider this:

  (let bar 5 (foo))
Let's macro-expand it:

  (do (var bar 5)
      (bar qux corge))
In Arc, this would cause a name collision, oh noes, lack of hygiene! But what does Nulan compile it to?

  var bar2 = 5;
  bar(qux, corge);
Awesome, the macro remembered which version of "bar" it's using, and the variable "bar" in the "let" is replaced with "bar2". So there's no hygiene violation. With this system of dictionaries mapping symbols to symbols, I simulate boxes, and this makes Nulan-style macros hygienic, even in JS which doesn't let you insert values into the AST.

Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster.

---

I'd still like to hear rocketnia's advice on bytecode, though, for future reference.



2 points by rocketnia 4410 days ago | link

"That works just fantastic in Racket, where you can splice arbitrary things into the AST, but that won't work in JS, where things generally have to be given a name."

Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't.

---

"If the symbol refers to an immutable box, the compiler will unbox it and splice it in directly. If it's a mutable box, it'll put the box into the AST, which will be unboxed at runtime."

I did this in Penknife too, but Penknife only had mutable boxes.

Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?

---

"JS doesn't have boxes"

Sure it does! I usually use { val: _ } when I want a box.

You should probably keep using separate JavaScript variables, but here are your examples using boxes:

  // (var foo 1)
  (function () {
      env.foo = { val: 1 };
  })();
  
  // (def bar -> foo + 5)
  (function () {
      var g1 = env.foo;
      env.bar = { val: function () {
          return nulan.plus_( g1.val, 5 );
      } };
  })();
  
  // (var foo 3)
  (function () {
      env.foo = { val: 3 };
  })();
  
  // (bar)
  (function () {
      return bar();
  })();
  
  // (mac foo -> {bar qux corge})
  (function () {
      var g1 = env.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      env.foo = { val: nulan.macro_( { bar: g1 }, function () {
          return [ g1.val, g2.val, g3.val ];
      } ) };
  })();
  
  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  (function () {
      var g1 = env.foo.val.captures.bar;
      var g2 = env.qux;
      var g3 = env.corge;
      return (function () {
          var bar = 5;
          return (0, g1.val)( g2.val, g3.val );
      })();
  })();
(The calls to nulan.macro_() and nulan.plus_() are just placeholders for whatever you would use there.)

Since I'm not using separate JavaScript variables for separate definitions, the original bar is totally out of scope in (let bar 5 (foo)), and I end up having to smuggle it in as part of foo (env.foo.val.captures.bar). I did this in Penknife, and I called the foo->bar path a "lexid." If I had to do it again, I might consider generating unique variables outside every environment, something like what you're doing.

For an easier solution all around, we can call the 'foo macro at run time, rather than relying on the compiler to expand it beforehand.

  // (let bar 5 (foo))
  // ==>
  // (do (var bar 5)
  //     (bar qux corge))
  nulan.eval_( env, [ "let", "bar", 5, [ "foo" ] ] );
This way, 'foo will return its own view of [ bar, qux, corge ], and we don't have to worry about bar being in scope.

As a bonus, this approach allows side effects during macroexpansion to work properly, rather than being forgotten by the compilation process.

The reason I didn't do this in Penknife is because Penknife macros suppressed parsing, the most expensive step, so this kind of compilation wouldn't have done any good.

---

"Anyways, with that out of the way, I actually think I'm going to go for strategy #2, because I only lose a tiny bit of flexibility, but it should be much much faster."

Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now.

---

"I'd still like to hear rocketnia's advice on bytecode, though, for future reference."

I've only done a few things with bytecode, most of which involve reverse engineering GB/NES/SNES machine code and game-specific binary data. I've taken a look at bytecode generation for the JVM a few times, but I know better than to trouble myself with that. :-p

Bytecode is a term commonly used to describe binary formats for programs that run on virtual machines. It's designed to be computationally cheap to execute (e.g. JIT-compile) on multiple CPU machine code dialects.

JavaScript is the only "machine" code you're targeting, so there's not much need to build a compatibility layer. However, you might be interested in fine-tuning for particular JS engines' strengths and weaknesses. For instance, supposedly V8 compiles JavaScript a lot like C code, so it might encourage large blocks of imperative code, few variables, and heavier use of structured control flow than function calls. (Eh, I almost never write C code, so what do I know?)

-----

1 point by Pauan 4409 days ago | link

"Just use my Function( "x", ... )( x ) pattern. Keep in mind that you should only need to translate to JavaScript once per eval or top-level command, rather than once per macro call. The "AST" {bar qux corge} will still exist as an intermediate representation, since we still need to process 'bar if it's a macro and compile 'qux and 'corge if it isn't."

Ewwww. Slow. And yes I know you only need to eval once per top-level form, but the problem is that means you need to actually keep the top level forms around and the compiler/runtime needs to always be present.

Keep in mind that I want to use Nulan to write Chrome Extensions. Now imagine a popup that occurs when the user presses a button. This popup is coded in Nulan. Now, every time the user presses the button, it has to re-parse, re-compile, and re-eval all of the Nulan code that is inside the popup.

I consider that unacceptable for my desires, where the code should be about as fast as handwritten JS. So, Nulan compiles things ahead of time to a JS file, to achieve those speeds.

---

"Well, I did it for global scope anyway. You say you're doing this for local scope too?! The boxes for a function's parameters don't even exist at compile time, right?"

Well, in Racket, I just use a "let" for local scope, so no, locals aren't boxes. But I could make locals use boxes too. It would just be a lot harder to implement. At that point I might as well just write an interpreter.

And in fact, the interpreted vau version of Nulan does use boxes for all variables, global and local.

---

"Sure it does! I usually use { val: _ } when I want a box."

Yes, but then it has to do the boxing/unboxing at runtime in addition to the variable lookup. Whereas with Nulan's scheme, you get almost all the same benefits, but with just the variable lookup. The only downside is that the boxes are no longer available at runtime (but you can access the fake "boxes" at compile-time).

---

"Just wait until someone comes along and thinks your language is too static. ;)

That approach is also good for debugging. Compiling to so-called idiomatic JS is a sacrifice many JS-targeting languages make right now."

Yes it is a tradeoff. Yes I know it's more static than Arc or the Racket version of Nulan. But with my scheme, I can actually make things feel fairly fluid. As an example, this works:

  (var + (fn ...))

  (+ 1 2)
That is, the runtime function + shadows the macro +, so it doesn't do the macro expansion. And macros can hygienically expand to runtime values, as I already explained, thanks to the variable renaming scheme.

So, the only real downside is that you can't do this:

  (def bar -> ...)

  (mac foo ->
    (bar ...))
That is, you can't actually call a runtime function from inside the macro. But you can still expand to it:

  (mac foo ->
    {bar ...})
And you can also evaluate the "bar" function at compile-time, which would make it available:

  (&eval (def bar -> ...))

  (mac foo ->
    (bar ...))
I consider this to be a perfectly reasonable downside in exchange for increased speed.

-----

1 point by Pauan 4409 days ago | link

Your alternate code looks pretty straightforward, except for these two bits:

  env.foo = { val: nulan.macro_( { bar: g1 }, function () {
      return [ g1.val, g2.val, g3.val ];
  } ) };

  return (function () {
      var bar = 5;
      return (0, g1.val)( g2.val, g3.val );
  })();
Why does it pass { bar: g1 } as the first argument to nulan.macro_?

Why do you do (0, g1.val)? That's the same as just using g1.val.

-----

1 point by rocketnia 4409 days ago | link

"Why does it pass { bar: g1 } as the first argument to nulan.macro_?"

That object becomes the "captures" property in env.foo.val.captures.bar later. In Penknife, that's how I let code have run time access to the lexical scopes of compile-time-expanded macros.

---

"Why do you do (0, g1.val)? That's the same as just using g1.val."

I like to go out of my way not to pass a "this" parameter unless I actually want to pass it. If I were hand-writing this code, I would have probably used a fresh local variable to hold g1.val, but this this (0, g1.val)( ... ) approach is sufficient for compiler output.

-----

1 point by Pauan 4409 days ago | link

You learn something new every day:

  var o = {
    foo: function () {
      return this
    }
  }

  o.foo()       -> Object
  (0, o.foo)()  -> Window
I would have expected JS to be smart enough to return the object in both cases.

-----

2 points by rocketnia 4408 days ago | link

Well, this behavior is the one that surprised me initially:

  o.foo()       -> Object
I expected the property access and the procedure call to be two completely separate steps. Instead, there's a secret avenue that somehow exposes my local variable "o" to the callee without me explicitly passing it in!

After this encounter, I reasoned that a.b( ... ) must be its own syntax, only superficially related to a.b and b( ... ). My choice to use this syntax is explicit. This intuition worked well for a while, but then this behavior surprised me:

  (o.foo)()     -> Object
  ((o.foo))()   -> Object
So it's its own syntax, but it also supports grouping parentheses? What for?

Turns out the spec actually specifies this behavior in terms of "References." An expression doesn't just return a value; it returns a Reference that can be either resolved to a value, assigned to, deleted, passed to typeof, or called as a method call. (Maybe there are some other options too.)

In the spec's own words:

"The Reference type is used to explain the behaviour of such operators as delete, typeof, and the assignment operators. For example, the left-hand operand of an assignment is expected to produce a reference. The behaviour of assignment could, instead, be explained entirely in terms of a case analysis on the syntactic form of the left-hand operand of an assignment operator, but for one difficulty: function calls are permitted to return references. This possibility is admitted purely for the sake of host objects. No built-in ECMAScript function defined by this specification returns a reference and there is no provision for a user-defined function to return a reference. (Another reason not to use a syntactic case analysis is that it would be lengthy and awkward, affecting many parts of the specification.)"

An ECMAScript Reference is similar to what I independently called a "fork" in Penknife. Forks were my approach to unifying Arc-style metafns, setforms, and macros into a single syntactic mechanism. A Penknife syntactic abstraction would return a fork value, and that value would have behaviors for parsing a macro call body, for setting, for getting, and for acting as a top-level command. The result of the parsing behavior would be another fork, so this branching process could handle multiple layers of brackets, metafn style. (Now I understand this as a monadic, coinductive approach. Yay big words.)

I've compared JavaScript's method invocation to metafns before: http://arclanguage.org/item?id=12094. I was somewhat supportive of the combination in that post, but I think Arc, JavaScript, and Penknife just use this technology for name punning, rather than resolving any essential conundrums of language design, so I'm not very enthusiastic about the technique in general.

P.S.: I've also casually explained this part of JavaScript's method invocation to you before: http://arclanguage.org/item?id=14665 But it was technically a different example, so no worries. ;)

-----

1 point by Pauan 4407 days ago | link

Yes, the whole "this" system in JS is pretty insane. Naturally Nulan solves this by requiring you to explicitly pass in the object. This also gives you the "call" and "apply" behavior of JS for free, in an orthogonal way.

P.S. The JS version of Nulan is coming along quite nicely. 25 working macros and 10 semi-working macros. It can handle a ton of stuff already, but it still has a long way to go!

-----