Arc Forum | Why isn't copylist tail-recursive?

Arc Forum

Why isn't copylist tail-recursive?

5 points by jazzdev 5804 days ago | 16 comments

I'm wondering why copylist in Arc 3.1 isn't tail-recursive.

  (def copylist (xs)
    (if (no xs) 
        nil 
        (cons (car xs) (copylist (cdr xs)))))

sort copies lists, so sort requires a lot of stack space also.

MzScheme seems to deal with this really well, I suspect it calls malloc() to make the stack bigger when needed.

Unfortunately, Jarc falls over with a StackOverflowError.

So I'm replacing copylist in Jarc with something like

  (def copylist2 (xs (o acch) (o acct))
    (if (no xs)
        acch
        (no acch)
        (do
          (assign acch (cons (car xs) nil))
          (copylist2 (cdr xs) acch acch))
        (copylist2 (cdr xs) acch (scdr acct (cons (car xs) nil)))))

I tried the tail-recursive copylist2 in Arc 3.1 and it runs 9 times slower than the original.

  arc> (let td (n-of 10000 t) (time (copylist td)) 1)
  time: 3 msec.
  1
  arc> (let td (n-of 10000 t) (time (copylist2 td)) 1)
  time: 27 msec.
  1

The scdr solution is so slow that it's faster to iterate the list twice and copy it twice!

  (def copylist4 (xs (o acc))
    (if (no xs)
        (rev acc)
        (copylist4 (cdr xs) (cons (car xs) acc))))

  arc> (let td (n-of 10000 t) (time (copylist4 td)) 1)
  time: 7 msec.

The rev solution takes about twice as long as the original. That makes sense since it does twice as many cons calls.

So copylist appears to be optimal as it is.

Fortunately in Jarc, copylist2 is only twice as slow as copylist. That seems reasonable since it does N scdr calls in addition to the N cons calls, assuming the car and cdr calls are much faster than cons and scdr. I can live with that to avoid the StackOverflowError.

2 points by conanite 5804 days ago | link

I bumped into the same problem with rainbow; my (partial) solution was to revert to the original definition of list (the main caller of copylist)

  (def list args args)

instead of the current (3.1)

  (def list args (copylist args))

arc.arc comments "Can return to this def once Rtm gets ac to make all rest args nil-terminated lists." This is evidently a scheme->arc problem, not one for rainbow, nor, I suspect, for jarc.

At the time I tried implementing copylist using a queue

  (def copylistq (xs)
    (let q (queue)
      (each x xs (enq x q))
      (qlist q)))

It looks like a neater definition, but uses 'atomic internally, which kills performance. It might work with a non-atomic 'enq - but if 'scar is the next bottleneck then maybe it's hopeless.

-----

4 points by akkartik 5804 days ago | link

Yet another reason to totally get rid of the nil/() dichotomy. If we're building atop PLT scheme there's no reason to go against its grain like this for something so purely cosmetic.

I've gotten rid of all that niltree/denil gunk from a private arc, and list operations sped up by 6x. YMMV, but the code looks much cleaner as well.

I'm working on a repackaging/refactoring of arc; will release that at some point.

-----

1 point by rocketnia 5804 days ago | link

I'm working on a repackaging/refactoring of arc; will release that at some point.

I'm glad to hear that. ^_^ I was thinking about doing something similar, but I think too many of my "fixes" would change the grain of the language, so instead I've been focusing my energy into building a new language from scratch (which is something I've been wanting to do for a long time anyway).

-----

3 points by akkartik 5803 days ago | link

"instead I've been focusing my energy into building a new language from scratch"

That's awesome; perhaps we should talk. Then again, perhaps this community is small enough I should just put what I have up here sooner rather than later. Seems stupid to be working alone like an alchemist when I have the perfect community to get feedback from. Let me go first, and then perhaps you can tell me how you've been thinking about entirely different things for your language :)

I'm trying to build a substrate that people can fork off for new languages, a more structured way to monkeypatch code without breaking it. Perhaps the most crazy idea: I'm introducing a transform phase between the read and eval phases (repl -> rtepl). Now I can write operators that modify s-expressions before eval. It seems the biggest reason you need a code-walker in arc is to implement lisp-style macros without stupid phase-separation issues. The transform level seems like the simplest, cleanest way to implement mac.

When arc started out people dismissed it as a 'thin layer of macros atop PLT'. I want to make that as true as possible. As I go over arc's functions and macros, I see a few things that just macros won't fix: if and cond treat () as false; if, + and map can take any number of args; car and cdr don't barf on (); let needs fewer parens and can handle destructuring more transparently. In these cases I plan to make it easy to just replace if with my-if in the transform phase.

This exercise has had a couple of interesting effects. The first is that as I 'explode' ac.scm and move things around, I'm ending up with a rather nice separation of the core language into layers. I've started organizing my arc directory like init.d scripts with files prefixed by a number that specifies the order to load them in. I imagine the arc implementation sitting in the same directory as my app. Say 000 and 099 holds the arc implementation by convention, and 100-999 is available to arc programs[1].

A second interesting effect: I keep running into cases where I can mix scheme code into my arc and it'll just pass right through. When you write a compiler with a cond inside it you have to translate everything. If you build it as a transformation filter, the default operation is to just pass things down to the lower layer. This seems useful so far. I've had to think about the scheme substrate a lot when I hack on arc; arc feels a very leaky abstraction as a compiler. The hard part in programming is switching layers of abstraction; if an abstraction boundary is leaky enough, just taking it out entirely may make things easier. That's the hypothesis, anyway.

Swallowing this red pill changes the language radically, but in new directions. I'm not hung up on backward compatibility, but I'm hoping most of my arc code will just run atop the new arc. I just won't have to deal with nil, or to stare at _variables in error messages.

The new language has a higher learning curve if you're new to lisp. Having just ~10 files made arc much more approachable when I was starting out. I often programmed with my program in a window on one side and arc.arc open in a window next to it. A distribution with a hundred files will be more intimidating. When you make a mistake the error messages may make no sense because you passed through into scheme without realizing it. You really have to think of this as learning a dialect of PLT scheme rather than a new language.

It will be easy to port this implementation to sbcl or jvm, but code written atop it will be hard to port; it's just too easy to use non-portable features of the implementation language.

---

[1] This takes aw's hackinator to the limit. Rather than a big ball of mud and little garnishings atop it, you have a set of ingredients to mix with. You could even imagine having each file encode its dependencies as other files with lower number prefixes, so then you can have multiple implementations of 072json and 072json-akkartik may say it needs 003scheme-aw. Multiple 003's and 072's can live in the same directory; the loader would know to only run one instance of each prefix.

-----

2 points by rocketnia 5802 days ago | link

perhaps you can tell me how you've been thinking about entirely different things for your language :)

Although I'm tempted to talk about Blade, I'm going to put that off until I've responded to your ideas. It's easy for me to get carried away. :-p

I'm introducing a transform phase between the read and eval phases (repl -> rtepl). Now I can write operators that modify s-expressions before eval.

Hmm, something like this?

  arc>
    (def subrepl (transformer name)
      (zap string name)
      (catch:while t
        (on-err [pr "Error: " details]
          (fn ()
            (pr name "> ")
            (iflet (expr) (do.transformer (read))
              (prn eval.expr)
              throw!ok)))))
  #<procedure: subrepl>
  arc> (subrepl [case _ exit nil `((prn ',_))] 'subarc)
  subarc> (1 2 3)
  (1 2 3)
  (1 2 3)
  subarc> exit
  ok
  arc>

I suppose you'd have a bit more infrastructure in there so you could manage having more than one transformation operator, all defined within the same language they're transforming. But if that infrastructure is something more general than macros, then what is it? :)

...I plan to make it easy to just replace if with my-if in the transform phase.

I think that's probably my favorite part about lisps: The way they encourage code transformation.

Say 000 and 099 holds the arc implementation by convention, and 100-999 is available to arc programs.

That seems like a much crazier idea than the RTEPL is. ^_^ But with the way Arc programs are just imperative sequences of expressions, that does seem like the path of least resistance. The part where perceived code order gets soggiest is in a directory structure, IMO, and while I consider that to be a sign that order shouldn't matter (part of Blade's motivation, which also manifests in Lathe's rule precedence system), that numbering technique sure would make the order clearer.

Still, I'm reminded of programming in BASIC. Should all my library's numbers be multiples of 10 so I (or someone else) can come in later and insert another library in between two of them? :)

Hmm... I suppose what you've got there is a convention-over-configuration approach to loading. What if you went the other way around, maximizing the "set of ingredients to mix with" aspect by having lots of little files accompanied by other files full of load statements (like libs.arc), files which can be hackinatored themselves? It would be sort of intimidating to navigate the Arc folder then, since the overall order would be hidden away in a spaghetti network of files, but maybe those files could have a naming convention like "load.arc" so that they all bunch up in one place.

Well, since libs.arc is the status quo, I suppose you've probably already considered that option.

I've had to think about the scheme substrate a lot when I hack on arc; arc feels a very leaky abstraction as a compiler. The hard part in programming is switching layers of abstraction; if an abstraction boundary is leaky enough, just taking it out entirely may make things easier. That's the hypothesis, anyway.

Hmm, I've never thought about that as being the hard part of programming. I think for me, the hard part is deciding upon a sufficiently leak-free way to implement my own abstractions. Dealing with external leaks is a matter of course; living with myself is another matter. :-p

But yeah, it's the same difference. As long as you're hacking on the language, you're the one making the decisions that determine the leaks, so you're the one who has to live with yourself.

It will be easy to port this implementation to sbcl or jvm, but code written atop it will be hard to port; it's just too easy to use non-portable features of the implementation language.

Well, how easy is it to avoid using them? ^_- If all else fails, someone can probably just write a version of traditional Arc that sits on top of your version.

perhaps you can tell me how you've been thinking about entirely different things for your language :)

Back to this again.

I've been kinda quiet about Blade because a) it's currently unusable as a language, and b) it's moving along consistently enough that I don't foresee having to abandon my unfinished ideas to the wild. Soon enough, I expect to be able to demonstrate my ideas rather than just talking about them.

Basically, though, Blade is yet another language-oriented language. It may have more defining characteristics than that soon enough, but for the moment that's the part I'm focusing on. What (maybe) makes Blade special is that all Blade languages (even those defined in the s use the same namespace, and two languages can each refer to things the other language defines.

In a way, the fundamental philosophy behind Blade as a language is that Blade isn't a single language. But any body of self-interpreting code has to have some aspect to it that can be taken as fundamental though, so there has to be at least one core Blade language. Surprisingly enough, a there's-no-syntax mentality has been surprisingly effective for designing the core syntax.

Any file without a .blade extension is ignored. Any column-zero paragraph in a .blade file which doesn't begin with a [ is ignored, and the rest are scanned for [ ] structures. Any text outside [ ] brackets is ignored. The text spanned by each top-level pair of [ ] is checked to see if it matches /\[\s([^\[\]]+?)\s.\]/ (i.e. it has a bracketless, whitespace-terminated word at the beginning), and if it doesn't, it's ignored. Then the words found this way are looked up in a global namespace dedicated for this purpose, and each of those values is applied as a pure function to the declaration text itself, complete with line number information and so forth. Each result of those pure functions will then be executed as the kind of partial calculation Blade's core dependency resolver expects.

Once I get that working (just a tiny bit longer--I already have everything but the word extractor), then it's mainly a problem of writing libraries, and once I have a proof-of-concept language set up (as a library), Blade is born.

Part of Blade's there's-no-core-language philosophy also leads to the idea that there's no one meaning for a program. Once the project has been parsed, resolved, and so forth, it's just a filled-up namespace... and if there are compilation errors, it's a namespace with errors stored in it. It's up to you or the tools you're using to decide what namespaced variables are exported, logged, displayed, executed, inspected with a REPL, etc. So one aspect of Blade in the long term is that a Blade project can have really close interaction with the tools being used to compose it; with a sufficiently efficient incremental compiler, a Blade dialect can provide its own syntax highlighting, code completion, and so forth.

Enough of that. ^^; Like I said, I'm nagged by a notion to just implement the sucker instead of talking about it. Don't let that stop you from giving feedback though. :-p

If you want to see more of what I've got so far, the source is at http://github.com/rocketnia/blade. It's under the GPL 3.0 for now, but I'm planning to pick a new license for it at some point, so suggestions on that front are welcome too.

-----

3 points by akkartik 5802 days ago | link

I looked at it. A groovy project, cool. I liked this from the README:

  Like so many other languages, Blade is so many other languages in theory,
  and it's only so many other languages at the moment.

I'm having trouble understanding parent comment, though. Can you elaborate on the 'why' rather than the 'what'? Are there other examples of language-oriented languages? I looked at the wikipedia page for LOP, but what you're doing isn't about metaprogramming, is it?

---

I got stuck with my project today, so lemme just throw it up on github: http://github.com/akkartik/yrc.

It contains a reference 'minimum arc' that I started out modifying from the top down, and the more experimental yrc.scm interpreter that I started building up from the bottom up. Right now the former isn't much different from vanilla arc, and the latter doesn't do very much at all. Things to read first: Readme at http://github.com/akkartik/yrc#readme, and commit messages starting from the bottom of (currently) http://github.com/akkartik/yrc/commits/master?page=2. I basically stopped working on the top-down version when I started the bottom-up version at commit 8.

Where I got stuck with the bottom-up version: nested macro calls don't work. Check out the failing test case for building def in terms of fn.

I've basically rediscovered why macros need to know something about lisp syntax, and you can't just search and replace blindly wherever you see a macro call. It's a classic beginner error, I imagine, something undergrads have run into thousands of times when building their own little lisp interpreter in lisp.

Comments welcome on where to go from here. Perhaps I need to interleave eval and translate steps more finely (which would require building eval and the more tranditional big-bang interprete-everything structure). Perhaps I end up with a very non-lisp language where I have to 'quote' nested macros.

---

> "Still, I'm reminded of programming in BASIC. Should all my library's numbers be multiples of 10 so I (or someone else) can come in later and insert another library in between two of them? :)"

Yeah I've been imagining having to write scripts to automate creating gaps :) But it should come up less often for files than for line numbers.

It's certainly about personal preference. I like being able to run ls and see the structure of the program[1]. I don't have to update libs.arc when I add more files. I can disable a file simply with rm or mv (and restore with git co). Compare the loaders; yrc.scm seems more timeless than as.scm.

I have this idea that you can assess the complexity of a program by measuring statistical properties of the number of hunks each commit's patch has, the number of distinct places that need to be modified when doing a unit of work. Dynamic languages help keep this complexity down by allowing Aspect-oriented programming (method chaining in ruby, advice in common lisp, before-exec and after-exec in my personal arc). A new feature may modify dozens of functions but keep all the related changes spatially clustered together. Not needing to touch the loader fits into that 'minimize hunks' aesthetic.

I figure I'll try it out and see how much I like it, and if usage uncovers drawbacks. Perhaps most likely, I'll discover I need to refer to filenames somewhere. That would be killing.

[1] Even if we need hierarchies I can imagine attaching numbers to directories, and independent namespaces of numbers within them, all the way down.

-----

3 points by rocketnia 5801 days ago | link

I think I'll reply to each of your sections in a different post. That's because the reply to the first part got to be this long. XD And to make matters more sensible, I'll answer the last question first.

I looked at the wikipedia page for LOP, but what you're doing isn't about metaprogramming, is it?

Ah, yes it is, but I see how that might not have been obvious. It looks like I left part of my post unfinished.

Before: What (maybe) makes Blade special is that all Blade languages (even those defined in the s use the same namespace, and two languages can each refer to things the other language defines.

After: What (maybe) makes Blade special is that all Blade languages (even those defined in the same project) use the same namespace, and two languages can each refer to things the other language defines.

That probably still doesn't make sense, 'cause I also neglected to mention the more important part.

If I were to use one Blade dialect to write another Blade dialect (and I plan to), I could write code in the new dialect in the very same project. The dialect's code will be statically resolved in the same namespaces as its implementation is being resolved, meaning that the implementation can refer to values defined in the dialect and bootstrap itself.

Where you might write a macro in Arc, you might instead write a parser, interpreter, and so forth in Blade. You might also just mix and match, or just instantiate a new interpreter with different parameters, or, well, just write a macro. If you want to write a macro, write a macro. :-p

So metaprogramming--programming DSLs and stuff--should find itself especially well-supported in Blade. In fact, the point of Blade is for all its languages to be interchangeable DSLs like that; that's why I'm aiming to make the core language as unassuming and unobtrusive as possible.

Can you elaborate on the 'why' rather than the 'what'?

Well, it mainly comes down to my personal preference, but I can explain my preference. :) I get fed up with most languages at some point or another. I like modeling my program in the best possible way (according to my own aesthetic), and that's led to things like switching from languages without closures to languages with closures, switching from languages without continuations to languages with continuations, and so forth. There's always some feature or combination of features I miss when I program, even if I've never used them before.

With Blade, I'm hoping to make a programming experience where the abstractions I use are exactly the abstractions I intend to use, whether they're functions without side effects, functions with side effects, functions which expect tail recursion, functions which don't expect continuations, or whatever. I believe this circus of abstractions can work as long as a) they're normalized by coercion depending on their context, and b) new contexts (lexical scopes, dynamic scopes, languages, parameter annotations, etc.) are also easy to specify.

All this should amount to a language (or system of languages) that's hard to get fed up with. As a bonus for me, were I to get fed up with it anyway, I must have had a profound realization about my preference in languages.

Are there other examples of language-oriented languages?

Absolutely. ^_^ PLT Scheme[1] is a language which prides itself on hosting other languages (as the site's front page would have me believe), and XL[2], Kernel[3], and Eight[4] are all fledgling languages with the same high expectations. I'm not sure the extent to which XL provides extensible syntax, but it makes a big deal of it. Kernel and Eight are both founded on s-expressions with fexpr-style syntactic abstraction.

There's also almkglor's hl[5], a lisplike which "allows various libraries to provide their own syntax without conflicting with other libraries ... libraries which may have been written in a paradigm you are unaware of, or are even hostile to." That philosophy lines up very nicely with mine, and I believe I'd call it LOP even though almkglor seems to skip using the word "language" (a vague word anyway) in favor of "syntax" and "paradigm."

In a way it's all sort of the same thing. I can already write syntactic abstractions on top of Arc and call them languages if I want to. The thing is, it would be especially difficult for me to write the mini-languages I want to write in a way that also plays well with other Arc code, or any other language's code for that matter (as far as I can see, but I'd be glad to be proven wrong).

Instead, I'm writing a friendlier foundation, and I don't worry myself with the semantics of the language it's implemented in. I picked Groovy because it's fast, featurific, and familiar to me.

[1] http://plt-scheme.org/ (but you already knew that!)

[2] http://xlr.sourceforge.net/

[3] http://web.cs.wpi.edu/~jshutt/kernel.html

[4] http://github.com/diiq/eight

[5] http://hl-language.sourceforge.net/

-----

1 point by akkartik 5800 days ago | link

"Where you might write a macro in Arc, you might instead write a parser, interpreter, and so forth in Blade."

Oh, very cool. That was kinda what I was trying for as well: the ability to specify new kinds of semantics, and the ability to have languages coexist.

Heh, I just noticed almkglor's hl has files starting with digits as well. Perhaps I just rediscovered his idea (or an even earlier version).

-----

1 point by rocketnia 5800 days ago | link

Where I got stuck with the bottom-up version: nested macro calls don't work. ... Perhaps I need to interleave eval and translate steps more finely (which would require building eval and the more tranditional big-bang interprete-everything structure). Perhaps I end up with a very non-lisp language where I have to 'quote' nested macros.

Hmm, I thought I had a simpler fix for you, but it seems I'm stuck too. ^_^; If it comes to me, I'll be sure to post it.

---

It's certainly about personal preference. I like being able to run ls and see the structure of the program.

Interesting, I tend to consider a flat, fifty-file list as not having a lot of structure to it. In fact, I think the term "structured programming" has something to do with not having flat lists of instructions with line numbers. :-p

Really, though, I can sort of get used to them. It's a bit like being introduced to a language by reading a book; you get a little bit of the language in each page or chapter, and if you get lost, you know which direction to turn to get back to solid footing.

A new feature may modify dozens of functions but keep all the related changes spatially clustered together. Not needing to touch the loader fits into that 'minimize hunks' aesthetic.

Well, those are things we both like. ^_^

-----

2 points by evanrmurphy 5802 days ago | link

> I'm trying to build a substrate that people can fork off for new languages

I sense a common (albeit very general) theme with Readwarp. :)

> As I go over arc's functions and macros, I see a few things that just macros won't fix: if and cond treat () as false; if, + and map can take any number of args; car and cdr don't barf on (); let needs fewer parens and can handle destructuring more transparently. In these cases I plan to make it easy to just replace if with my-if in the transform phase.

Must be misunderstanding you here. I thought most of these (esp. 'nil and '() both being false, those utilities taking any number of args) were well-publicized features of Arc, not bugs. Were you perhaps referring to PLT's functions, or could you otherwise clarify please?

> I often programmed with my program in a window on one side and arc.arc open in a window next to it.

I love doing this.

Definitely an interesting project. I'm glad you decided to share about it here and sorry for not providing richer feedback on the actual idea in this comment.

-----

2 points by akkartik 5802 days ago | link

> "I thought most of these (esp. 'nil and '() both being false, those utilities taking any number of args) were well-publicized features of Arc, not bugs."

Yeah each of these is absolutely an improvement. I meant that these are the reasons arc needs to be a language with a compiler rather than a macro library. So they're the parts that are challenging to reimplement without building a monolithic cond.

-----

1 point by evanrmurphy 5802 days ago | link

> I've had to think about the scheme substrate a lot when I hack on arc; arc feels a very leaky abstraction as a compiler. The hard part in programming is switching layers of abstraction; if an abstraction boundary is leaky enough, just taking it out entirely may make things easier. That's the hypothesis, anyway.

Sounds like an accurate assessment of Arc, though I guess I've been itching for the opposite (probably more obvious) outcome: an entirely platform-independent "arc.arc" with only the handful of axioms in "ac.scm" to bootstrap and guide ports of Arc to other substrates.

I don't see any reason why we can't have both.

-----

2 points by akkartik 5802 days ago | link

Absolutely. One benefit of the transformation layer is that it allows you to grow the language. It could eventually be completely self-sufficient, but you can start with the hard/interesting parts, and other stuff can gradually be abstracted away.

As an example, ac.scm has a comment that ar-gensym is a toy implementation. I don't need to do toy implementations because I can just use PLT's gensym directly until I decide to build my own.

-----

1 point by jazzdev 5804 days ago | link

Yeah, Jarc doesn't use that version of list either. No need.

But copylist is also used by copy which is used by sort. So the sort benchmark ran into it. I guess since Rainbow compiles to CPS the copylist doesn't stack overflow in Rainbow.

-----

3 points by conanite 5804 days ago | link

rainbow grows its stack as needed, and will eventually run out of memory with a non-tail-recursive copylist and a big enough list. Scheme seems to be able to optimise "tail recursion modulo cons" - I've tried copylist with tens of millions of elements and scheme doesn't break; rainbow does.

-----

3 points by jazzdev 5803 days ago | link

I pushed scheme to the limit and it died with an insufficient memory error. I never got a stack error.

-----