Arc Forum
Small bug in load
5 points by aw 5305 days ago | 13 comments

  (def load (file)
    (w/infile f file
      (w/uniq eof
        (whiler e (read f eof) eof
          (eval e)))))
read, if it isn't passed an eof argument, returns nil when it reaches the end of the file. If load called read until it returned nil, then a nil in the middle of a file would terminate the load, so instead load passes a unique symbol to read. Since the symbol is unique, it won't appear in the file being loaded, and would thus be safe to use as the indicator for the end of the file.
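To make the problem concrete, here is a small sketch that reads two forms from a string, with and without an eof argument. The helper name read-two is hypothetical and only for illustration; read, w/instring and optional arguments are standard Arc 3.1.

  ; Hypothetical helper (not part of Arc): read two forms from str,
  ; passing eof through to read, which returns it once input runs out.
  (def read-two (str (o eof nil))
    (w/instring s str
      (list (read s eof) (read s eof))))

  (read-two "nil")        ; => (nil nil): a literal nil looks just like EOF
  (read-two "nil" 'end)   ; => (nil end): a sentinel tells them apart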

However, the symbol created by w/uniq isn't actually globally unique; uniq just returns a new symbol each time it's called (gs400, gs401, ...). Thus it is possible, if unlikely, for the "unique" symbol to appear by itself in the file being loaded, and so to terminate the load early.
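A quick way to see this, assuming Arc 3.1's uniq (which builds a name like gs400 and interns it as an ordinary symbol): reading a fresh gensym's printed name back in yields the very same symbol. The function name uniq-readable is hypothetical.

  ; Hypothetical check: is a fresh gensym identical (is) to the symbol
  ; obtained by reading its printed name? Under Arc 3.1, uniq interns
  ; its result, so this returns t; with uninterned gensyms it returns nil.
  (def uniq-readable ()
    (let g (uniq)                   ; a fresh gensym such as gs400
      (is g (read (string g)))))    ; read it back from its name

So an input that happened to contain that name would end a load using g as its end marker.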

Since any symbol by itself (whether called gs400 or something else) would be useless in an Arc source code file, I doubt that anyone will ever be bitten by this bug in practice. However, I mention it because someone might copy this pattern of scanning input until reaching an end marked by a unique symbol, and their input data might contain lone symbols.



3 points by akkartik 5305 days ago

I noticed just this morning that gensym in Anarki has been fixed, so it now returns a globally unique identifier. http://github.com/nex3/arc/blob/master/CHANGES/uniq-via-gens...

(after missing the new coerce I was wondering what else I'm unaware of http://arclanguage.org/item?id=12508)

-----

1 point by rocketnia 5304 days ago

Hm, drat. This "fix" means that we have to be more careful with code that coerces symbols to strings and back, like fallintothis's 'sscontract.

I think I prefer original Arc's gensym hack for the time being, since I know that as long as I don't begin any symbols in my own code with "gs", they won't share their name with any gensyms.

With Anarki's approach, wouldn't it be harder to accomplish serializable gensyms? In official Arc, this kind of thing might be done on the reader side by replacing all read-in symbols that start with "gs" so that they don't conflict with gensyms currently in use. In Anarki, uniq!g1 and (uniq) can return gensyms with the same name, so there would have to be some special behavior on the writer side too.

To give a clearer example of what I'm talking about, a Racket-style serialization format would probably look something like this, where #u:X does what it can to create a new, uninterned X every time it's read in (and specifically works when X is a symbol):

  (1 #0=#u:this-is-a-gensym 2 #0#)

-----

1 point by aw 5304 days ago

I suspect that if we want to be able to produce unique symbols that can be serialized, we should use random symbols of sufficient length that the chance of collision is vanishingly small (that is, similar to a UUID or GUID).
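A minimal sketch of that idea, assuming Arc 3.1's rand-string from arc.arc (the same helper srv.arc uses for fnids; it reads /dev/urandom, so Unix-like systems only). The name random-uniq and the default length of 24 are hypothetical choices. With 62 possible characters per position, 24 characters give 62^24, roughly 2^143, possible names.

  ; Hypothetical: a gensym-like symbol made of random characters, so the
  ; same name can be written to several files and still almost certainly
  ; never collide with anything else.
  (def random-uniq ((o n 24))
    (sym (string "u" (rand-string n))))   ; letter prefix: never reads as a number

Unlike a #u:X marker, the same random name can be written to any number of output files and read back as the same interned symbol.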

While a #u:X approach would work for a single output, if I produce multiple output files I have no reliable way to put the same uninterned symbol in different files.

-----

1 point by akkartik 5304 days ago

I'm not sure I follow, but the situation hasn't changed since August '09: http://github.com/nex3/arc/commit/47909c72a3e5ea7c4e2173fe61...

-----

3 points by rocketnia 5304 days ago

Right, but I wasn't familiar with it. This time, thanks for pointing it out to me. ^_^

I meant to convey that I'd rather write Arc code so that sentinel values are generated using [] or whatnot, rather than introducing uninterned symbols to the language. Uninterned symbols make it much less of a pain for multiple threads to generate unique variable names, but they introduce complexity in other places. Besides the hypothetical examples I already mentioned, I'm happy I can enter (uniq) symbols at the REPL when I'm debugging.

I don't really expect this opinion to catch on, but I'm throwing it out there just in case.

-----

1 point by aw 5305 days ago

Aha!

-----

2 points by rocketnia 5304 days ago

How about replacing "w/uniq eof" with "let eof []"?

-----

2 points by aw 5304 days ago

Maybe not []; that relies on the language never getting serializable functions.

-----

1 point by rocketnia 5304 days ago

Agreed.

-----

1 point by akkartik 5304 days ago

I don't follow this exchange. Why would let eof [] work? And what do serializable functions have to do with it?

-----

5 points by aw 5303 days ago

We could use any unique value that compares equal to itself with "is" but isn't the same (by "is") as any possible value read from the input file.

An example of such a value is "(list 'a)". This creates a new pair (cons cell), which "is" itself, but "isnt" any other pair or value.
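Applied to load, a sketch (using only w/infile, whiler and read, as the original does) just swaps w/uniq for a freshly consed list. Everything read returns is either newly read data or an interned atom, so nothing read from the file can be "is" to this pair. A cons is used here rather than []. In Arc 3.1's arc.arc, whiler passes its end value through testify, which treats a function as a predicate. A [] sentinel, which always returns nil, would therefore be called on each form instead of being compared with is, and the loop would not stop at end of file.

  ; Sketch: load with a fresh cons as the end-of-file marker instead of
  ; a gensym. The pair is "is" to itself but "isnt" anything read from f.
  (def load (file)
    (w/infile f file
      (let eof (list 'eof)
        (whiler e (read f eof) eof
          (eval e)))))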

Another example: in Arc 3.1, fn returns a new #<procedure> which "is" itself but not any other function or other value.

"[]" expands into such a function, so it's a cute way of getting a unique value. However, for web applications it would be nice if closures could be serialized (written out to a file and read back in later), so that a server could be restarted without losing users' state.

If this were possible, then "[]", which expands into "(fn (_) nil)", could be written out to a file and read back in later.

Even if we could read in serialized functions, using [] as a sentinel value would still work if every evaluation of (fn ...) continues to produce a new, unique function, much like how "(list 'a)" or "(cons t t)" produces a new, unique pair every time.

Or maybe it would be useful for optimization or for some other reason for two identical functions "(fn (_) nil)" to evaluate to the same function object, much like how every time we read in a symbol 'x it evaluates to the same symbol. Or not... that might be a useless optimization (at least, I can't think of a use for it off the top of my head). But that was my thinking behind "maybe not []".

-----

3 points by waterhouse 5303 days ago

Incidentally, app.arc creates a unique fail* global variable in this way. (Eeeeeaaaaaargh, never use asterisks in text. The word "fail" has an asterisk at the end. Is there a way to escape asterisks?)

  ; (= fail* (uniq))
  
  (def fail* ()) ; couldn't possibly come back from a form

-----

3 points by rocketnia 5303 days ago

You've even italicized the reply button. :-p I think the only way to "escape" an asterisk is to put whitespace after it: If you put a space at the end of "fail* ", presto. You could also subtly tweak your style in order to write fail* without quotation marks or other punctuation afterward. But yeah, escaping would be nice.

-----