Arc Forum
A Documentation System for People Who Hate Documentation
3 points by drcode 6013 days ago | 1 comment
The arc philosophy strongly de-emphasizes the use of documentation and comments. I think this is because there are several drawbacks to an excessive use of documentation.

Here is my list of the drawbacks that documentation and comments for code usually have:

1. Comments can clutter up your code. Arc is built on clean code that is not hidden under layers of commenting.

2. Comments cause friction during exploratory programming. If you spend 20% of your time on documentation, then writing your exploratory code takes 20% longer, even though such code is usually rewritten anyway.

3. Comments and documentation often suffer from "comment drift": they partially duplicate the source, so they may not stay in sync with future modifications of the code. This means you can never rely 100% on your comments, since they may not be current, which greatly diminishes their value.

4. It is a difficult cognitive task to determine when you have "enough documentation." A programmer's cognitive energy is better spent on writing code.

I've built a very lightweight/simple/crude documentation system that tries to address all four of these problems with traditional documentation. It consists of only one function called 'checkdoc:

  (def checkdoc (file)
    ; adoc: the whole .adoc file as a single string, lines joined by newlines
    (let adoc (apply string (intersperse #\newline
                              (w/infile s (string file ".adoc")
                                (drain (readline s)))))
      (with (gran 20        ; default granularity: largest allowed undocumented gap
             tokens (rev:accum a        ; every bracketed [reference] in the adoc, in order
                      (forlen x adoc
                        (when (is adoc.x #\[)
                          (a:read:cut adoc (+ 1 x) (pos #\] adoc x)))))
             code (flat:readall:infile file)   ; every token in the arc file, flattened
             cur 0)                            ; index in code of the last matched reference
        ; an optional "granularity ##" first line overrides the default
        (when (litmatch "granularity " adoc)
          (= gran (read:cut adoc 12 (pos #\newline adoc))))
        ; each reference must appear within gran tokens of the previous match
        (map [aif (pos _ (cut code cur (+ cur gran)))
                  (++ cur it)
                  (prn "failed on reference [" _ "]")]
             tokens)
        ; warn if more than gran tokens follow the last documented reference
        (when (len> code (+ cur gran))
          (prn "undocumented code at bottom")))))
To use 'checkdoc, create an "adoc" file for each of your arc files, containing that file's documentation. Every adoc file has a "granularity", which defaults to 20: no more than 20 consecutive tokens in the arc file may go undocumented. To document a token, mention it in the documentation, enclosed in square brackets; the bracketed references must appear in the same order as the tokens do in the code. If a gap of more than 20 undocumented tokens exists, 'checkdoc prints an error message; likewise, referencing a token that can't be found in the code causes an error. In effect, 'checkdoc performs a crude "syntax check" of your documentation against your code.
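To make the format concrete, here is a minimal, hypothetical pairing (the file name util.arc and the function avg2 are invented for illustration). The arc file:

  ; util.arc
  (def avg2 (a b)
    (/ (+ a b) 2))

and its documentation file:

--------------util.arc.adoc-------------------

[avg2] returns the mean of two numbers by dividing their sum by [2].

------------------------------------------------------

With the default granularity of 20, (checkdoc "util.arc") should just return nil. If avg2 were later renamed in the source without updating the adoc, 'checkdoc should instead print "failed on reference [avg2]".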

By strategically referencing the important tokens, you can write documentation that explains the most difficult fragments of your code. If the code changes in the future, 'checkdoc will notice that the documentation has gone out of date. During exploratory programming, you can simply skip running 'checkdoc until the code has settled into a more definite structure.
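For convenience (this is my own sketch, not part of the system above), you could wrap 'checkdoc in a small hypothetical helper that checks several files at once when you do decide to run it:

  ; hypothetical helper: run 'checkdoc over several arc files
  ; (assumes each file has a matching .adoc file)
  (def checkdocs files
    (each f files
      (prn "checking " f)
      (checkdoc f)))

Calling (checkdocs "code.arc" "arc.arc") would then report problems for each file in turn.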

The "granularity" value makes it easy to know if you have "enough" documentation. An adoc file can also have a custom granularity- Simply place "granularity ##" in the first line of the adoc file.

Example:

I have created a simple adoc file for pg's code.arc file. Simply place the text below in "code.arc.adoc", then check the documentation by running (checkdoc "code.arc"). If it just returns nil, the documentation check passed without error.

--------------code.arc.adoc-------------------

granularity 15

[codelines] gives the number of lines in a file with code. It does this by [summing] up non-commented lines, determined by the logic in an [aand] statement.

[codeflat] just gives the length of a [flat]tened file.

[codetree] gives the total number of nodes in the code tree. It's determined by using [treewise] and just using [+] to accumulate the number [1] over every node.

[code-density] calculates the density in tree nodes per line of code.

[tokcount] finds the frequency of tokens in multiple [files]. It does this by building a [table]: On [each] token, it increments a count in the [counts] table, starting with [0].

[common-tokens] creates a list of tokens and frequencies by starting with an empty [ranking] list, then inserting items into this list with [insort] in ranked order.

[nonop] is a predicate that detects "non-operations" such as [quote].

[common-operators] creates a list of only already-[bound] symbols by filtering the result of [common-tokens].

[top40] prints the first [40] items in a given list.

[space-eaters] is similar to common-tokens, but when it builds the [ranking] list, it uses only [bound] symbols, and uses the product of token [len]gth and frequency to determine rank. The ranking is built out of [list]s of token name, frequency, and the aforementioned product.

[flatlen] is like codeflat, in that it lets you check the [len]gth of flattened code, but is a macro that you can wrap around a code fragment.

------------------------------------------------------
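As a quick sanity check (a hypothetical session, not from the post), if codelines were later renamed in code.arc without updating the adoc, running 'checkdoc should print something like:

  arc> (checkdoc "code.arc")
  failed on reference [codelines]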



1 point by drcode 6012 days ago | link

btw - you don't have to mention exactly every 20th token, just tokens no more than 20 tokens apart, if that wasn't clear :)

-----