However, I will point out that the name of the project/community is "Anarki", and so it is very likely to be distributed and each person will probably want their own version anyway.
I also agree with separate branches for each library. For one thing, it takes less effort to make the branch stand alone, and there is less chance of messing things up.
Also, they don't get in the way, as you are not forced to read the list and can choose to track as many or as few remote branches as you like.
How about using one branch per library, and tags for versions? That way everyone can track the development and even add things, but the versions are the things that 'lib and others like it go for.
The only thing I'm wondering about branches for libraries is: how do you find them if you're trying to fetch them via 'lib? I think only major changes need to be in their own branches, like the hygienic macro system. Everything else should either be hosted separately or in the lib folder.
The only thing I'm wondering about branches for libraries is: how do you find them if you're trying to fetch them via 'lib?
I can think of three scenarios:
1. You're using the public, published version of a library that also happens to be in your git repository.
2. You're using an unpublished version of a library in your git repository: for example, you're testing your code against the latest development version of a library to see if they work together.
3. You're working on a library.
For #3, I presume you'd just be using 'load.
For #2, if you want to for example get the latest version of a file "foo.arc" in the "foo" branch, you can use git for that, e.g.:
git show foo:foo.arc
For #1, you don't need to get the library out of git, you can use whatever mechanism we're all using to fetch and use a published library.
I was hoping to use a modified version of 'lib that would allow me to type
(lib binary)
when I meant
(load "lib/binary.arc")
It has the advantage of less typing, and hopefully the concept that functions are not reloaded unless the file has changed. If I'm trying to test a new version of a function, I would rather selectively reload that function than the whole library.
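To make the "don't reload unless the file has changed" idea concrete, here is a rough sketch. It is written in Python rather than Arc only so it can be run standalone; the names `lib`, `_loaded`, and the `load` callback (a stand-in for Arc's 'load) are all hypothetical, and comparing modification times is just one assumed way to detect a change.

```python
import os

# Hypothetical cache: file path -> modification time seen at the last load.
_loaded = {}

def lib(name, lib_dir, load):
    """Load lib_dir/name.arc unless it is unchanged since the last load.

    `load` stands in for Arc's 'load. Returns True if the file was
    (re)loaded and False if the reload was skipped.
    """
    path = os.path.join(lib_dir, name + ".arc")
    mtime = os.stat(path).st_mtime_ns
    if _loaded.get(path) == mtime:
        return False              # unchanged since last load: skip it
    load(path)
    _loaded[path] = mtime
    return True
```

This only avoids reloading whole files; reloading a single function out of a changed file is a separate problem.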
One thing we could do for a "cpan" style system would be, instead of having one designated library server, use more of a debian-package style system, where anyone can host a server if they like, and then the tool can just browse all of the locations in your server list. Then we can have people like CatDancer just add their server to the list on Anarki, so when anyone else pulls the latest version they get the newest server locations.
A server would effectively be an http folder with a list of arc files and folders in it. There could be a file in the directory with a standard name that the tool would look at first to see what libraries were available, and where they were located.
Then the files wouldn't even have to be located on the same server, they could be anywhere else on the internet, as long as some server has a reference to them.
What do you think? How would you design the file format for determining what libraries were available and where? And what if more than one server tried to host the same library? How would you know which was newest, or which one you meant?
While my attempt to use git as a way of sharing libraries and managing library dependencies (as opposed to, or in addition to, using git in the normal way as a version control system) didn't work out, I learned a number of useful things from it. So while I don't have all the details, I do have some important design principles:
Every release of a library should have a globally unique name: the publisher (we can use our Arc forum username), the name of the library, and the release number. An example of the full name of my toerr library would be "catdancer.toerr.3"
Every release of a library has to be immutable. Once I publish "catdancer.toerr.3", I must never change it. The reason for this is meta-data about libraries, conflicts, and dependencies; I might say Fred's foo library release 19 doesn't work with my toerr release 3, but there's no way to know if this meta-data is valid if Fred or I are changing our released versions.
I can unpublish a release if I need to; if "catdancer.toerr.3" has a horrible bug I can remove it from my server and publish a fixed version "catdancer.toerr.4". But I must never reuse "catdancer.toerr.3" for a changed version.
The release number is used internally by the tools. A new release number doesn't mean that the version is better. Compared to "catdancer.toerr.2", "catdancer.toerr.3" may be an alpha version or a release candidate or broken in some way.
If you want to have a meaningful version number for your library, make it part of the library name: "foo0", "foo1", "foo2". That way you can release a fixed version of "foo2" if you need to.
I should also be able to publish meta-data about my libraries. For example, I should be able to publish that "catdancer.toerr.3" is, in my opinion, the best available version of the catdancer.toerr library. Unlike a particular library release, which can be deleted but must never change, meta-data can be updated. I can later say that "catdancer.toerr.4" is now better.
The tools should naturally let me abbreviate library names. If I say (load "toerr.arc") or the equivalent, and there's only a "catdancer" toerr library available (or maybe I have "catdancer" first in my search list), and my meta-data says that "4" is the best one, then it should load "catdancer.toerr.4" for me. But if I want to explicitly load "fred.toerr.19", I should be able to do that.
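A rough sketch of that abbreviation rule, in Python so it can be run on its own. The function name `resolve` and the three data structures it takes are assumptions made for illustration; nothing here is an existing tool.

```python
def resolve(name, available, search_list, best):
    """Expand an abbreviated library name to "publisher.library.release".

    available:   set of full release names that can be fetched
    search_list: publishers to try, in order of preference
    best:        maps "publisher.library" to its recommended release number
    Returns the full name, or None if nothing matches.
    """
    parts = name.split(".")
    if len(parts) == 3:                       # already fully qualified
        return name if name in available else None
    if len(parts) == 2:                       # "publisher.library"
        candidates = [name]
    else:                                     # bare "library"
        candidates = [pub + "." + name for pub in search_list]
    for lib in candidates:
        release = best.get(lib)
        if release is not None:
            full = lib + "." + str(release)
            if full in available:
                return full
    return None
```

With "catdancer" first in the search list and meta-data naming release 4 as best, `resolve("toerr", ...)` would give "catdancer.toerr.4", while an explicit "fred.toerr.19" is returned unchanged.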
Every library should be available at an http location by username. For example, if you're using git and github, you can publish your repository using the github Project Page feature (http://pages.github.com/).
Now we'll have a list associating usernames with http locations:
this list needs to be kept at some central location, but nothing else does.
Now the tool can download any library it needs to. Given "catdancer.toerr.2", it can look up "catdancer" in the list of locations, and download "http://hacks.catdancer.ws/catdancer.toerr.2.arc".
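As a small illustration of that lookup (a Python sketch; the `LOCATIONS` table and the `release_url` name are hypothetical, and the only entry is the one example location mentioned in this post):

```python
# Hypothetical central list: forum username -> base http location.
LOCATIONS = {"catdancer": "http://hacks.catdancer.ws/"}

def release_url(full_name, locations=LOCATIONS):
    """Build the download URL for a release such as "catdancer.toerr.2"."""
    publisher = full_name.split(".", 1)[0]    # text before the first dot
    base = locations[publisher]               # KeyError if publisher unknown
    return base.rstrip("/") + "/" + full_name + ".arc"
```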
I agree with several of your pronouncements, such as the fact that libraries need to be immutable, etc.
I think that the naming system should use "user/libname/version" so that it can be reflected in the directory structure of the lib folder of arc where they will be downloaded, and can be easily turned into a path. I'd prefer to make it a 'sym instead of a 'string for less typing ;)
Thus the library loading code would be effectively:
(use libname)
where libname is just a symbol representing a path to which "$arc_home/lib/" and ".arc" are attached. This makes it easy to load files which are already in your lib folder that you wrote, or downloaded. It works with downloaded libraries, and can hopefully detect if you've already loaded it or not, and avoid reloading it. In theory since it is a library it won't be changing and thus only needs to be loaded once.
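Roughly what I have in mind, sketched in Python so it runs standalone. `lib_path`, `use`, and the `load` callback (standing in for Arc's 'load) are illustrative names only, and the "load once" bookkeeping is the simplest thing that could work.

```python
import posixpath

# Hypothetical record of library paths that have already been loaded.
_already_used = set()

def lib_path(libname, arc_home):
    """Turn a name like "catdancer/erp/0" into a file path under lib/."""
    return posixpath.join(arc_home, "lib", libname + ".arc")

def use(libname, arc_home, load):
    """Load a library once; later calls for the same library do nothing."""
    path = lib_path(libname, arc_home)
    if path in _already_used:
        return False
    load(path)
    _already_used.add(path)
    return True
```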
We can also have other functions that fetch the libraries, or make 'use fetch them automatically. In that case it will need to know how to find them. To that end I propose we add a file to the lib folder called something like "server-list". The file is merely a list of http locations, such as hacks.catdancer.ws, and fred.github.com/arcstuff. Each location will be a folder that contains a file named "libs.arc" or similar, which contains all of the metadata necessary to find the libraries belonging to that server.
Possible meta-data: Name of library, file name and location, date updated, author, dependencies etc.
In theory the package system can determine by reading these files which version of the library is newest (by version number and date) or, if given more particular criteria, find an older version.
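A sketch of how the tool might pick the newest copy once each server's metadata file has been read (Python, for the sake of something runnable). The dictionary keys and the `newest` function are assumptions; the actual format of "libs.arc" is still undecided, and dates are assumed to be ISO strings so that they sort correctly as text.

```python
def newest(indexes, name):
    """Pick the newest entry for `name` across all server indexes.

    indexes: one list per server, each a list of dicts with the
             hypothetical keys "name", "version", "date", and "url".
    Entries are ranked by version number first, then by date.
    """
    entries = [e for index in indexes for e in index if e["name"] == name]
    if not entries:
        return None
    return max(entries, key=lambda e: (e["version"], e["date"]))
```

Because the url is just another field, the file itself can live on a different server from the index that points to it.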
I'm basing my ideas off of the debian package system, which I have found very useful. The nice thing about it is that anyone can host a package server, and a package can be located on any server. They aren't tied to a particular server by their name, for instance. It also automatically downloads prereqs, which can be very handy.
Summary:
1) Library is named by symbol which can be easily converted to a path: catdancer/erp/0
2) Servers which have meta-data on finding packages, a la debian packages/yum/gems, etc. for ease in hosting and finding packages.
3) The 'use function, whatever it's called, should be able to load local libraries as well as online ones. It should probably also be capable of taking in a direct web address like CatDancer's lib function.
Ok. Why not support both? That way someone can create a large and structured library with multiple sub-folders if they need to.
I was also hoping to make a version of your 'lib hack to manage the libs already in the lib folder, instead of just those on the web.
(lib binary)
seems like much less typing than
(load "lib/binary.arc")
and also has the advantage of not reloading it if it's already been loaded.
On a side note, how hard would it be to selectively reload individual functions?
(reload example-fn)
Since the anarki help system keeps track of what file the function was declared in, it could presumably be used to automatically read in the file and eval the proper form. I'm just somewhat tired of working in a large library file and having to reload all of the functions, even if I don't need to. (I'm using a very impotent linux box for development, so it can get rather slow)
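The "read in the file and eval the proper form" step could look something like the sketch below. It is in Python only so it can be tried in isolation; `top_level_forms` and `find_definition` are made-up names, and the scanner is deliberately naive: it understands parentheses, double-quoted strings, and ; comments, but not character literals such as #\( or any other reader syntax.

```python
def top_level_forms(text):
    """Yield each top-level parenthesised form in Arc source text."""
    depth, start, i, n = 0, None, 0, len(text)
    while i < n:
        ch = text[i]
        if ch == ";":                        # comment: skip to end of line
            while i < n and text[i] != "\n":
                i += 1
            continue
        if ch == '"':                        # string: skip to closing quote
            i += 1
            while i < n and text[i] != '"':
                i += 2 if text[i] == "\\" else 1
            i += 1                           # step past the closing quote
            continue
        if ch == "(":
            if depth == 0:
                start = i
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                yield text[start:i + 1]
        i += 1

def find_definition(text, name):
    """Return the source of the (def name ...) or (mac name ...) form."""
    for form in top_level_forms(text):
        words = form[1:].split(None, 2)
        if len(words) >= 2 and words[0] in ("def", "mac") and words[1] == name:
            return form
    return None
```

A 'reload built on this would look up the file for the function, pull out the one matching form, and eval just that.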
a large and structured library with multiple sub-folders
I don't think we need to design this system to do everything that somebody might someday need. There's already plenty of solutions for distributing large collections of files such as zip or tar files; we don't need to invent something to solve that problem.
EDIT: I didn't explicitly say this here, but I will now: I think that the version numbers should make a definite statement that "catdancer.toerr.3" is an inferior predecessor to "catdancer.toerr.4". I think that in the interest of avoiding "dependency hell", people should be expected to use the latest stable version of a library. However, to ease backwards compatibility, if a library developer sees that the next stable version of their library will break compatibility with other libraries which depend upon their library, they should inform the developers of the other libraries about this, so that once the new library is released, other libraries can be quickly updated to work with the new version.
I think that a good system for version numbers (and who doesn't love copying good systems?) is to have two "latest" versions always available -- a stable version, and an alpha/beta version. The stable versions are the odd version numbers, and the alpha/beta are the evens. Obviously that could be switched, but everybody should use the same convention, and I think this one makes sense because a project starts at version 0, and then the first stable version would be 1.
If this system were used, then only the odd-numbered versions would be required to remain constant. The even-numbered versions could vary as the dev(s) fixed bugs or added features. Odd-numbered versions which depended on other libraries would have to depend on odd versions of those libraries. An even version could depend on any library.
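Spelled out as a check a tool could run (a Python sketch; `is_stable` and `dependencies_allowed` are hypothetical names, and versions are assumed to be plain integers):

```python
def is_stable(version):
    """Under this convention, odd version numbers are the stable ones."""
    return version % 2 == 1

def dependencies_allowed(version, dep_versions):
    """Stable versions may depend only on stable versions of other
    libraries; an alpha/beta (even) version may depend on anything."""
    if not is_stable(version):
        return True
    return all(is_stable(d) for d in dep_versions)
```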
I like the idea of using forum nicknames, because they're unique. URLs are (not entirely, thanks to Internet Explorer...) case-insensitive, so maybe pg should change the forum so that two usernames can't be case-insensitively equal? (If that's the case already, scratch what I just said...)
Meta-data can come in a separate file, named the same as the library. It should probably be some form of alist:
This last part could work because libraries would be uniquely identified by a string, as CatDancer explained above.
Also, some form of standard directory structure could be good. Each person would be able to customize where their lib/ directory would be, and what it would be called (in the example, my directory is arc/keystones/). However, within that directory, I think there should be some convention of how libraries would be organized. I think one that makes sense is that each library would have a directory, within which each version would have a separate directory. If this is nested too deeply, it could instead be a wide nesting -- arc/keystones/foo.xyzlib.3/
Within the library directory, the file which gets loaded should have some standard name too -- the most obvious one would be the name of the library. The directory could contain other files containing more code, and those files would be loaded (or required) by foo.xyzlib/3/xyzlib.arc. Meta-data would be in the file meta.arc.
Customizability of the lib folder is probably a good idea, but it will probably be done via hacking the code for the lib functions ;) It shouldn't be that hard to do anyway. With my naming scheme, you'd just change the string that was prepended to the library name.
I'm not sure that odd/even version numbers is such a good idea. It could be very confusing that way. I think that CatDancer's requirement of libraries to be static is a much more reliable concept. Otherwise like he said you'd need to check periodically for updates.
Libraries should also be able to depend on whatever they want. That's the author's decision. If they need the beta version, but they've tested it and know that what they have written is stable, then they should be allowed to publish it that way. They can always make a new version if they need a bug fix.
Also, since the version is just part of the lib name, you can have as many layers of minor version as you want, e.g. 1.5.200906015.
Libraries should also be able to depend on whatever they want
Dependencies should actually be managed outside of the libraries themselves. For example, I have a library foo that depends on bar. Later a new version of Arc comes out that implements what bar did. Now foo doesn't need bar any more. But foo itself hasn't changed, so I shouldn't have to release a new version of foo just to say that it doesn't need bar with this new version of Arc.
Instead we publish dependency information about libraries. For example, I can say that foo needs bar 0 and arc 3, or just arc 4... something like:
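Such a rule could be encoded in many ways. As one purely hypothetical illustration (a Python sketch, with `DEPENDS` and `satisfied` invented for the purpose), each library maps to a list of alternatives, and any one alternative being fully available is enough:

```python
# Hypothetical dependency table kept outside the libraries themselves.
# "foo needs bar 0 and arc 3, or just arc 4":
DEPENDS = {
    "foo": [{("bar", 0), ("arc", 3)}, {("arc", 4)}],
}

def satisfied(lib, available, depends=DEPENDS):
    """True if at least one alternative set of requirements is available.

    available: set of (name, release) pairs present on this system.
    A library with no entry in the table is assumed to need nothing.
    """
    alternatives = depends.get(lib, [set()])
    return any(alt <= available for alt in alternatives)
```

When a new Arc makes bar unnecessary, only this table changes; foo itself is not re-released.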
The odd/even numbering doesn't have to be exactly that way. It could also be something like foo/xyzlib/1b for the beta, and foo/xyzlib/1 for the "stable" version.
only the odd-numbered versions would be required to remain constant
the version numbers should make a definite statement that "catdancer.toerr.3" is an inferior predecessor to "catdancer.toerr.4"
There's a difference between a release number and a version number. A version number, as you say, can be used to indicate that a later release is better, or indicate the stable vs. alpha/beta status of a release, etc. The release number merely identifies releases.
For example, pg had several releases of arc3. Under my naming system, they would have been named "pg.arc3.0", "pg.arc3.1", "pg.arc3.2", etc.
Regardless of the alpha/beta/stable status of a release, two releases of a library should never be released with the same name and release number for several reasons:
- If I'm telling you about a bug in your library, then I can tell you which release I saw the bug in. If you change your library without giving it a new release number, then we won't know if I'm talking about your old release or your new release.
- It's clear when a tool such as my "lib" library which downloads a library from a URL needs to download a new release. If there's a new release number, and I want that new release, then "lib" knows it needs to download the new release. If the release can change at the same URL, then "lib" has to periodically check to see if the file at the URL has changed.
- Just because I think that a release of mine is an "alpha/beta" version doesn't mean that you might not want to keep using it.
I see what you mean -- I was a bit confused about version vs release.
However, I still think that the name of the library should be "arc". Maybe the releases would be named "pg.arc.3.0", "pg.arc.3.1", etc. I just think that the library name should be distinct from the version and release numbers.
Feel free to post whatever you feel like relating to arc on this forum. That's what it's for!
Anarki, as the github fork of arc is called, is a very popular version, as it comes with several libraries and tools which can be very helpful. Basically, when anyone creates an addition to arc and can't wait for pg to adopt it, they put it in Anarki. Currently it's a little bit out of date, because it hasn't caught up to the new arc3, but it's getting there.
The closest thing to the CPAN equivalent would be github, using something like CatDancer's lib function. Unfortunately, while that makes it easy to share libraries, it doesn't make it so easy to find them. That's why many of the libraries are just pushed into the lib folder in Anarki.
How do you organize your hacks in git? I'm working on several libraries, and I'm running into the issue that I want them in separate version control so that I can push them to github separately, if need be, and not pollute Anarki until I'm finished. At the same time, I need them in the arc lib directory for using. How did you solve this one? Just keep them in separate folders near your arc installation and do (load "../blah/lib.arc")?
So if you want your library foo.arc to land in Anarki in the lib directory, create a patch to Anarki that adds foo.arc to the lib directory in a single commit.
One way to do this is to have one branch that tracks Anarki, and then a separate branch for your own development. You can merge new Anarki commits into your private development branch, but avoid ever merging from your development branch back into your Anarki tracking branch directly. Then in your private development branch you can have a messy development history (create foo, change bar, change foo, do something else, then change foo again), but only add clean commits to your Anarki tracking branch that you push to Anarki ("create foo" or "update foo").
Interesting. This will be helpful if I wanted to use a scheme macro (are there any?), but otherwise I would probably just use the mz patch, using the pattern that showed up for the $ macro:
(mz.function args)
instead of
(mz:function __args)
It almost looks like a package reference, and it gets all of its arguments directly from arc, because it's just returning the procedure and calling it on the arc arguments. It won't work for macros, because those transform code, but it works pretty well for everything else.
Here's a new arc-level implementation of len, that works for dotted lists as well, using the mz/ac-tunnel patch:
(def len (x (o c 0))
  (if (isa x 'string) (mz:string-length x)
      (isa x 'vec)    (mz:vector-length x)
      (isa x 'table)  (mz:hash-table-count x)
      (and atom.x x)  (+ c 1)
      acons.x         (len cdr.x (+ c 1))
      c))
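For anyone who wants to check the list-walking part of that logic outside of Arc, here is the same idea as a Python sketch. It is not the Arc code: cons cells are modelled as 2-tuples `(car, cdr)`, nil as `None`, the string/vector/table cases are left out, and `dotted_len` is a made-up name.

```python
def dotted_len(x, count=0):
    """Count the elements of a cons-cell list, proper or dotted.

    Each cons cell contributes one element; a non-nil final cdr
    (the tail of a dotted list) contributes one more.
    """
    while isinstance(x, tuple):      # still on a cons cell: (car, cdr)
        x = x[1]
        count += 1
    return count if x is None else count + 1
```

So the proper list (a b) counts as 2 and the dotted list (a b . c) counts as 3, which is what the clause for a non-nil atom is there to handle.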
I think the reason that lists are usually flat lists is that a flat list is homogeneous -- each car is data, and each cdr is the next element, or nil. Dotted lists add another case which you'd have to handle. It's just simpler to deal with flat lists.
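If you're just dealing with two- or three-element lists, then it might pay off to use dotted lists. In such short lists, you'd be getting a space saving of 50% or 33% per list: a two-element dotted list needs one cons cell instead of two, and a three-element one needs two instead of three.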
Arc uses vectors to implement tagged procedures. Type the name of a macro in on the repl, and it displays a vector:
arc> def
#3(tagged mac #<procedure:.../arc/arc.arc.scm:151:11>)
That's a vector with 3 elements: tagged, mac, and a procedure.
Arc doesn't currently let you construct or manipulate your own vectors, but they could be used to provide transparent meta-data attached to functions and variables, such as where they were defined, and what the source code was that did so.
Er, note that I said stable and official branches, not to the master branch. The master branch is what most people using anarki use. The official branch just tracks pg's released arc tarballs. The stable branch incorporates bugfixes and some simple additions (the arc.sh script). (If you knew all this already, and were just asking whether merging with the stable branch worked, yeah, it should just work.)
Switching over the entire master branch is essentially impossible, as has been discussed before... although there aren't many incompatible changes between arc2 and arc3, there are enough, and moreover, there are enough differences between arc2 and anarki that integrating the changes to arc3 wholesale into anarki is difficult. Anarki simply has so much in it - bugfixes, hacks, additions, libraries, pet projects, and even reimplementations.
The solution to this is for people to port the parts they care about separately. That way, the best parts of anarki will get ported and the cruft that no-one is maintaining will be expunged. I'm working on the help/documentation subsystem at the moment, and after that, the ability to define call behavior for tagged objects, and the many extra functions and macros anarki adds. I'll probably be making a post sometime soon about what I see as the way to move forward, but for now, anarki's master branch is unchanged.
Hmm. Have you started a new branch that is the official target, so that we're all working together instead of separately as we port our favorite things over?
I wouldn't mind helping you with the help/documentation stuff as I wrote some of it (src and the current version of fns), and it's probably my favorite part as well. Right now I'm a bit busy working on a new and improved ppr so that the src function actually returns reasonable looking source code ;)
I have been thinking about a new way to do the source code storage for src. I noticed that every time scheme makes a lambda, it remembers where in the code it was created. That means that the information about where the source is is associated with the lambda object, not the name. What if we modified the way that ac-fn worked, so that it associated each fn with its source code? That way all procedures, macros included, would have their source code available for inspection.
That's the idea, the hard part is making it work, and making it work without leaving the old definitions behind when the lambda no longer exists (temporary code, etc.) Any ideas? Maybe we could use vectors like the tag system? That would make it easy to look them up in a table, but doesn't fix the problem of garbage collection. Also, hacking it on at such a low level means that it probably captures the state of the source code after all of the macro expansions, so it may not be such a good idea after all.
the hard part is making it work, and making it work without leaving the old definitions behind when the lambda no longer exists
I don't know about the first part, but for the second, what you want is a weak hash table (http://download.plt-scheme.org/doc/372/html/mzscheme/mzschem...) with the lambdas stored as the keys. The weak hash table holds the keys "weakly", which means it doesn't prevent the keys (the lambdas) from being garbage collected, and when they are garbage collected, the related value is dropped from the table.
If all the following lines go together as a group, and the arguments on the first line aren't part of that group, then I'll indent the following lines differently to emphasize the grouping:
(each (k v) mytable
  (prn "my key is " k)
  (prn "and my value is " v))
It doesn't matter to me whether it's a macro or a function:
(copy (obj a 1)
  'b 2
  'c 3)
I use the two space indentation a lot more with macros than I do with functions because many macros use the pattern of some arguments followed by a body and not many functions do, but I don't change my indentation depending on whether it's a function or a macro.
If the lines after the first line don't fall into their own group, then I'll line up the arguments:
(+ '(a b c d)
   '(e f g h)
   '(i jk))
(map (fn (x) ...)
     (generate-my-list ...))
For me the important consideration is using indentation to show which arguments go together.
I think there's a general tradition in Lisp indentation, which works essentially as follows:
Function calls are indented with all the arguments starting at the same column, for example:
(calculate-fn-of arg1
                 arg2
                 arg3)
Macro calls are indented with the arguments two spaces in, for example:
(while (< x 10)
  (prn x)
  (++ x))
This is usually done to emphasize the "body" of the macro arguments. So in macros which take several arguments before the "body" code, you'd put those arguments either on the same line, or indented in some way to distinguish them from the body. For example:
(with (var1 val1
       thing2 val2
       var3 expr3)
  (do
    (stuff)
    (with-vars)))
However, some forms are indented differently for clarity. This doesn't seem to be as standardized. An example would be Arc's 'if macro. It makes sense to indent it with all the conditions on one line, but where do the result forms go? Possibilities are:
I just noticed that there is a file named CONVENTIONS in Anarki that states comment and indentation conventions for arc. They say pretty much what has been said so far, but may be worth looking at if you are interested.
I think that your first example is probably best in terms of compactness and clarity (associating each condition with its corresponding code). The completely straight indentation can be difficult to scan quickly if you have more than one condition. IMO, the only problem with your first example is when the conditions span more than one line or are very long.
In terms of 'if, I was looking through some of pg's code just now, and noticed that he does use the wavy form of indentation. Thus, if you're big on proof by authority, the correct form of indentation for 'if would be the wavy one...
However, in this case I agree with Adlai: pg had even planned on getting ppr to "properly indent long ifs". I presume that means to implement the wavy feature. It's also the convention for Anarki, according to the CONVENTIONS file.
So, here's what I'm thinking should be done for ifs:
1) If you only have three expressions, if-then-else, use:
(if a
    b
    c)
2) If you like the format, and the expressions are very short, use:
(if a b
    c d
    ...
    e)
3) If you don't like that form, or the expressions are longer, use:
(if (a )
      (b )
    (c )
      (d )
    (e ))
I'm not sure whether to use one or two spaces between clauses. The anarki conventions file uses two spaces. Elsewhere I've seen only one space. So, which is it? Now or never ;)
(I'm writing a new version of ppr with proper indentation of forms, in case you wanted to know why I'm so interested in the indentation question)
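To pin down the three cases, here is a rough sketch of the decision as a function. It is in Python purely so it can be run and tested in isolation, not a piece of ppr.arc; `format_if`, the `short` threshold, and the assumption that every expression has already been formatted onto a single line are all mine. It uses the two-space version of the wavy form.

```python
def format_if(exprs, short=12):
    """Lay out an Arc `if` form from already-formatted expression strings.

    exprs: the arguments to `if`, in order (test, result, test, result,
           ..., optional else). Multi-line expressions are not handled.
    short: clauses no longer than this may share a line with their test.
    """
    pad = " " * 4                         # the column just after "(if "
    if len(exprs) <= 3:
        lines = list(exprs)               # 1) if-then-else, one per line
    elif all(len(e) <= short for e in exprs):
        # 2) short clauses: each test shares a line with its result
        lines = [" ".join(exprs[i:i + 2]) for i in range(0, len(exprs), 2)]
    else:
        # 3) wavy form: results sit two spaces further in than tests
        lines = [("  " if i % 2 else "") + e for i, e in enumerate(exprs)]
    return "(if " + ("\n" + pad).join(lines) + ")"
```

Switching to one space between clauses would only mean changing the "  " in the wavy case.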
I think two is better, because it a) clearly distinguishes it as indentation (rather than just an accidental #\space), and b) it's consistent with the indentation of macros & co.
Should it depend on whether they fit on one line or not? If they don't fit, should it still be like b., just with the second part spilling onto the next line?
Your first version is the one currently implemented by ppr.arc. If you would like, you can write the indentation function for your second version (it can even include the "; else")
Otherwise I think that the first version is generally clearer, and the second is rarely needed, as the value of the case statement can't usually be very long. Could you give me an example where you use your second version?
Sure; here's something from my tagged-unions.arc on Anarki (Arc 2). It's responsible for parsing the types specified for slots in the tagged union:
(each type-frag types
  ; Assemble the ('type name) pairs into two lists, so that we can iterate
  ; through them.
  (case (type type-frag)
    ; If it's a symbol, then we're defining a new name; add a new set of
    ; values, and go.
    sym
      (do
        (zap [cons type-frag _] names)
        (zap [cons nil _] values))
    ; Otherwise, we're adding a value to an existing variant.
    cons
      ; I changed my mind about the order (now it's (name pred?) instead of
      ; (pred? name)), so I'm reversing it here.
      (zap [cons (rev type-frag) _] (car values))
    ; else
      (err "Invalid member of tagged union declaration.")))
I should also add (just as another data point) that my multi-condition ifs look like that too, and single-condition ifs look like the following:
(def macexn (n expr)
  " Macroexpand `expr' `n' times. NB: `expr' *is* evaluated!
    See also [[macex1]] [[macex]] "
  (if (and (number n) (> n 0))
    (macexn (- n 1) (macex1 expr))
    expr))
I'm not hugely wedded to any of my conventions, though, and I haven't coded in Arc for quite a while anyway; this isn't intended to get you to change anything or convert people to my style. As I said, it's just another data point.
The nil is the return value of the function. Since the last thing you did was an assignment, it is nil. It only shows up like that on the repl, which prints both stdout and return values. It is true that printing a newline first would make it show up on the next line, but it doesn't really matter if you're making it into a web service, since the nil won't be printed.