Bugs and failures
5 points by aw 2193 days ago | 5 comments
Failures are normal, expected, and unavoidable. I try to open a file and the file doesn't exist. I try to fetch a document from the web and I currently don't have Internet access. I want my program to handle failures.

Bugs are avoidable. I call `(car 10)`. That's a bug. "Can't take car of 10". I want bugs to be reported. If my code is running off in a server somewhere I want the entire core dump. The entire history of my code execution. I want everything. I want to be able to easily find out why my code called `(car 10)`.

Languages conflate failures and bugs by throwing exceptions for both. Handling failures with exception handlers is a pain. Often the internal structure of the exception object isn't even documented. (Where, for example, in the exception object do I find the OS errno such as ENOENT or EACCES?) Bugs go unreported because someone trying to catch failures with an exception handler hasn't successfully battled the language to catch only expected failures.

Exceptions capture too much for failures and too little for bugs. For bugs I get a stack trace but that often doesn't tell me what I need to know. A stack trace tells me that my function `foo` was called by `bar` which was called by `baz`. Better than nothing, but often I already knew that. What's lost is the context. If not the whole core dump I'd at least like to see the function arguments.

Meanwhile for failures I don't need a stack trace. For a lot of failures a single symbol would be enough.

    (fail 'file-not-found)

That's all I need to know. For some failures I might want some more information. But I don't need a history of the execution of my program.

Failures and bugs are related. Not handling a failure is a bug. If I'm using a library and the library has a bug, I'd like to send the library author a bug report (the core dump); and meanwhile, for me, the library having a bug is a failure.

    (bug "Can't take car of" x)

What do I want this to do? First, generate a core dump. Capture everything. Let me (or the library author) find out why there was a bug. And then call `(fail 'bug)`. That's all my calling code cares about. I tried to use a library or I called one of my functions and it failed because of a bug.

Newer languages are starting to differentiate between failures and bugs. Rust returns a `Result` type from operations, containing either the success value or a failure value. But for exploratory programming I don't want to have to always check for failures. (Rust is statically typed, so the compiler won't let me get at the success value without first dealing with the failure case.) I'd rather that failing to handle a failure be a bug.

Perl6 has `Failure` (https://docs.perl6.org/type/Failure), a "soft exception". You can either check an operation's result to see if it's a failure, or, if you try to use the result value without checking it, it turns into a thrown exception. This is closer to what I'm looking for. I think the timing is wrong though for Perl6 failures. A function returns a failure, and some time later the program tries to use the failure value and throws an exception. By then it's too late to capture a core dump: the function has already returned. And the Failure has to generate a stack trace to put in the wrapped exception, in case the failure isn't checked and the exception has to be thrown. So we're again capturing too little for bugs and too much for failures.

Say I want to read a file and get the contents as a string, or nil if the file doesn't exist. I don't know the exact details, but maybe something roughly like:

    (onfail port (infile "foo")
      (file-not-found nil)
      (after (allchars port)
        (close port)))

`port` is assigned the value that `infile` returns. Next is a list of possible failures and what to return for each. In this example I'm returning `nil` if the file isn't found. (I'm not handling other failures, so they'd turn into bugs.) The body is executed on success.

This could be implemented easily in Racket. A parameter (https://docs.racket-lang.org/guide/parameterize.html) could contain a list of failure handlers (i.e. an escape continuation to get back to the onfail), and `onfail` would add a handler to the list for the duration of the execution of the handled code using Racket's `parameterize`. `fail` would look in the list for a handler for its particular symbol. It'd be a bug if there was no handler for the failure.

What's missing from Racket is the "core dump" part. In Racket, if we hit a bug we can get a stack trace, but exceptions give us that already. Adding a mechanism to handle failures would make failures nicer to deal with (no rummaging around in the exception object), but it doesn't do much for bugs.

A language implementation could instrument the code though. Might be unbearably slow for inner loops, but for outer code it might be OK. For example we could capture the value of function arguments during their dynamic execution. If we hit a bug we could then dump the context.

This would provide something different than either Rust or Perl6. Both Rust and Perl6 return a failure, and later failing to handle the failure can become a bug. But with the `onfail` approach the `fail` function itself can tell whether the failure is set up to be handled or not. If it's not, it's then free to do something expensive (like generate a core dump) that wouldn't be practical to do for expected, handled failures.



3 points by rocketnia 2193 days ago | link

In Lathe and in the first version of Penknife (written in Arc), I was calling this kind of feature "failcall." A function could be called with `failcall` to handle its failures, or it could be called normally, in which case its failures would be promoted to errors automatically.

Your example of using Racket parameters leads to a slight difference in behavior from what I would want. Suppose the code in the body contains a call to some function that in turn makes a normal call to another function which fails. With the Racket parameter technique you talk about, the parameter binding would still be in scope at that point, so the failure would be caught, even though I think the author of that normal function call would have expected its unhandled failures to be promoted to bugs.

I remember thinking Racket parameters would be useful, but the technique I ended up with didn't use them at all. There's a full-featured implementation in the Lathe arc/ folder's failcall.arc[1], but here's a short proof of concept for Anarki:

  ; In this example, a "failfn" is a tagged single-argument function
  ; that returns (list t <success-val>) or (list nil <failure-val>).
  (mac failfn (x . body)
    `(annotate 'failfn (fn (,x) ,@body)))
  
  ; To call a function in a way which handles failures, we pass in an
  ; argument and an `on-fail` handler like so. This can be used with
  ; normal functions too, which just never fail.
  (def failcall (f x on-fail)
    (if (isa f 'failfn)
      (let (succeeded val) rep.f.x
        (if succeeded
          val
          (on-fail val)))
      f.x))
  
  ; When a failfn is called normally, it behaves as though it was
  ; failcalled with a handler that always produces an error.
  (defcall failfn (f x)
    (failcall f x
      (fn (failure-val)
        (err:+ "Failed with " (tostring:write failure-val)))))
  
  ; We define an example failfn. We can't use `def` for this since it
  ; defines a normal function.
  (= failure-prone-sqrt
    (failfn x
      (if (< x 0)
        (list nil "Tried to take the square root of a negative number")
        (list t (sqrt x)))))
  
  
  
  arc> (failure-prone-sqrt 4)
  2
  arc> (failure-prone-sqrt -4)
  Failed with "Tried to take the square root of a negative number"
    context...:
     /path/to/anarki/ac.rkt:1327:4
  
  arc> (failcall failure-prone-sqrt 4 idfn)
  2
  arc> (failcall failure-prone-sqrt -4 idfn)
  "Tried to take the square root of a negative number"
  arc> (failcall sqrt 4 idfn)
  2
  arc> (failcall sqrt -4 idfn)
  0+2i

The REPL transcript shows me calling a failfn using a normal call, calling a failfn using a failcall, and calling a normal function using a failcall. The only case that causes an actual error is when the failfn fails and there was no handler to catch it.

Obviously, a more full-featured approach would allow failcalls of arity other than one. And this `failcall` syntax doesn't have the convenient kind of pattern-matching syntax your `onfail` macro does, but that kind of thing could be built as a layer over the top of this example; I'm just keeping the example small.

Racket is just as capable of this technique as Anarki is. Instead of an `annotate` tagged value, the Racket version would use a struct, and instead of `defcall`, it would use the `prop:procedure` structure type property.
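
The same shape carries over to other languages with callable objects. As a minimal sketch in Python, a class with `__call__` can play the role that `defcall` plays in the Anarki example; the class name `FailFn` and the use of `RuntimeError` are choices made for this illustration only.

```python
import math


class FailFn:
    """Wraps a one-argument function that returns (True, value) or (False, failure)."""

    def __init__(self, fn):
        self.fn = fn

    def __call__(self, x):
        # A normal call behaves like a failcall whose handler always errors.
        def on_fail(failure):
            raise RuntimeError(f"Failed with {failure!r}")

        return failcall(self, x, on_fail)


def failcall(f, x, on_fail):
    """Call f with x, routing a FailFn's failure value to on_fail."""
    if isinstance(f, FailFn):
        succeeded, val = f.fn(x)
        return val if succeeded else on_fail(val)
    # Ordinary callables never fail in this sense.
    return f(x)


@FailFn
def failure_prone_sqrt(x):
    if x < 0:
        return (False, "Tried to take the square root of a negative number")
    return (True, math.sqrt(x))
```

As in the transcript, `failure_prone_sqrt(-4)` raises, while `failcall(failure_prone_sqrt, -4, lambda v: v)` returns the failure message and `failcall(abs, -4, lambda v: v)` simply calls `abs`.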

[1] https://github.com/rocketnia/lathe/blob/7127cec31a9e97d27512...

---

As far as making core dumps goes, I've never tried this, but it looks like `gdbdump` might be able to do it for Racket programs on Linux.[2] There's also a Racket built-in called `dump-memory-stats`,[3] which at least in Racket 7.0 appears to give a summary of how many objects of certain kinds are in memory.

[2] https://docs.racket-lang.org/gdbdump/index.html

[3] https://docs.racket-lang.org/reference/garbagecollection.htm...

-----

3 points by aw 2193 days ago | link

That's insanely clever to define a callable custom type to handle the case of calling the failfn without a fail handler. I'm impressed.

-----

3 points by waterhouse 2193 days ago | link

It should be possible to get the continuation from the point of failure and the dynamic variables from the failing thread (basically: the stack), the same information from any other running threads, and the set of global variables (this at least can be gotten with (namespace-mapped-symbols)), and trace the graph of reachable objects from there, and serialize it all to a file. I don't know if Racket provides the ability to do all that, though; for one thing, I don't know if there's a way to access the variables saved in a closure (from outside the closure).[1] (Maybe using unsafe operations could do that.) Since tracing the graph of objects is exactly what a GC does, and a proper moving GC has to be able to learn the type of every object and where all the pointers are, it must have that functionality, whether or not it's exposed. (I think it should be exposed, of course.)
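
For comparison, some runtimes do expose the stack portion of this. The Python sketch below is an analogy only and says nothing about what Racket provides. It walks every live thread's frames and records each frame's local variables, using the CPython-specific `sys._current_frames()`. It stores abbreviated `repr` strings instead of tracing and serializing the reachable object graph, and `snapshot_stacks` is a hypothetical helper name.

```python
import reprlib
import sys
import threading


def snapshot_stacks():
    """Return {thread name: [(function name, {local name: abbreviated repr}), ...]}."""
    thread_names = {t.ident: t.name for t in threading.enumerate()}
    snapshot = {}
    # sys._current_frames() is CPython-specific: thread id -> topmost frame.
    for ident, frame in sys._current_frames().items():
        frames = []
        while frame is not None:
            local_reprs = {}
            # Copy the items first so the mapping cannot change mid-iteration.
            for name, value in list(frame.f_locals.items()):
                local_reprs[name] = reprlib.repr(value)
            frames.append((frame.f_code.co_name, local_reprs))
            frame = frame.f_back
        frames.reverse()  # outermost call first
        snapshot[thread_names.get(ident, str(ident))] = frames
    return snapshot
```

Calling this at the point of a bug gives the function names and local values for every thread, which covers the "stack" part of the list above but not the globals or the heap behind them.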

Barring that, it's possible that the gdbdump rocketnia points at is the easiest way to do it in Racket.

Also, I guess if you're using the FFI at all (which, say, any GUI program would do), then you do need the full core dump if you want to get the state of the C libraries you're using.

[1] https://docs.racket-lang.org/reference/procedures.html isn't promising. https://docs.racket-lang.org/web-server-internal/closure.htm... provides wrapper macros to make serializable lambdas, implying that there is no way to serialize normal lambdas, which is unfortunate.

-----

3 points by akkartik 2193 days ago | link

I'm not sure I grok the precise boundary you're drawing here.

It seems clear that (car 10) is always a bug, so I'm with you there. However, non-existent files may be bugs in some situations. Perhaps you're just proposing giving programmers two distinct labels to use with discretion? If so I shouldn't get hung up on precise examples.

Are all unhandled failures bugs?

-----

2 points by aw 2193 days ago | link

A failure that is unexpected and unplanned for is a bug. Thus it's a bug if a file doesn't exist and my code doesn't handle that situation.

The boundary is what I want to happen in response to a bug vs. a failure. When I hit a bug, an actual bug, I want to capture the entire state and history of my program, to the fullest extent possible, so that I can find out why the bug occurred. I don't care if this core dump is GBs in size or might be expensive to generate. If a bug occurs I want all possible information that might help me, everything that the language runtime can produce.

For failures, for expected failures, for failures I handle, I don't need to capture anything. I don't even need a stack trace. I don't need the language runtime to generate a stack trace every time I hit an expected, handled failure.

Existing languages don't allow me to do this. At the point where, for example, the "file not found" exception is being thrown, there isn't enough information to tell whether that's a failure or a bug, so the two have to be handled the same way.

-----