Arc Forum
A proposition for regular expressions in Arc
4 points by arnoooooo 5963 days ago | 7 comments
I was thinking about using the Scheme pregexp library to get Perl-style regular expressions in Arc.

I am trying to find a nice syntax that would be both short and coherent. Here is what I have come up with so far, which basically involves treating regexps as functions:

  arc> (/ca/ "bobcat")
  ("ca")

  arc> (/(\w{1})/g "bobcat")
  ("b" "o" "b" "c" "a" "t")

  arc> (/cd/ "bobcat")
  ()

  arc> (s/b/d/ "bobcat")
  "dobcat"

  arc> (s/b(\w)/d\\1/g "bobcat")
  "dodcat"
I don't really like this last one, since it would have been nice to also get the list of matches, but returning a list would make it inconsistent with the one above. Returning a list for both cases would be a pain. This is probably one case where returning multiple values would be nice.
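
For comparison, here is a rough Python sketch of the "regexps as functions" idea in the examples above. It is only an emulation: `match_re` and `sub_re` are hypothetical helper names standing in for the `/.../` and `s/.../.../` literals, and Python's `re` module writes a backreference in the replacement as `\1`.

```python
import re

def match_re(pattern, global_=False):
    """Stand-in for /pattern/ and /pattern/g: returns a function of one string."""
    rx = re.compile(pattern)

    def run(text):
        if global_:
            # like /g: every non-overlapping match, left to right
            return [m.group(0) for m in rx.finditer(text)]
        m = rx.search(text)
        return [m.group(0)] if m else []

    return run

def sub_re(pattern, replacement, global_=False):
    """Stand-in for s/pattern/replacement/ and its /g form."""
    rx = re.compile(pattern)

    def run(text):
        # re.sub treats count=0 as "replace all"; count=1 replaces only the first match
        return rx.sub(replacement, text, count=0 if global_ else 1)

    return run

print(match_re("ca")("bobcat"))                                  # ['ca']
print(sub_re(r"b(\w)", r"d\1", global_=True)("bobcat"))          # dodcat
```

Because each helper returns an ordinary function, the result can be stored in a variable and called later, just like the `(= re /ob/)` example.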

  arc> (= re /ob/)
  /ob/

  arc> (re "bobcat")
  ("ob")

  arc> (= c 2)
  2
  arc> (makere "/^\d{" c "}/g")
  /^\d{2}/g
What do you think?

Also, how much sense do you think it would make to have regexps as part of the core language? I'd love to see a real discussion on regular expressions, their usefulness and possible alternatives.



3 points by cchooper 5963 days ago | link

My suggestion: when regexps are used in the functional place, they should do what users most likely want from them.

  arc> (s/b(\w)/d\\1/g "bobcat")
  "dodcat"
but they can also be used as arguments to functions, which will give you all the other behaviour.

  arc> (match s/b(\w)/d\\1/g "bobcat")
  ("bo" "bc")

  arc> (sub s/b(\w)/d\\1/g "bobcat")
  "dodcat"

  arc> (match-and-sub s/b(\w)/d\\1/g "bobcat")
  ("dodcat" ("bo" "bc"))
or something like that. My regexp knowledge is a little rusty. I also agree with stefano's point that they should be part of the standard library. This would be possible if there were easy ways to reprogram the Arc syntax in a library.
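
The split between calling a regexp directly and passing it to `match`, `sub` or `match-and-sub` could look roughly like this in Python. This is a sketch under assumptions, not a proposal for the actual API: the `Regexp` class and the three helper functions are hypothetical names, and "what users most likely want" is assumed to mean substitution when a replacement is present and matching otherwise.

```python
import re

class Regexp:
    """Hypothetical regexp object: callable, but also usable as a plain argument."""

    def __init__(self, pattern, replacement=None, global_=False):
        self.rx = re.compile(pattern)
        self.replacement = replacement
        self.count = 0 if global_ else 1   # re.sub convention: 0 means "all"

    def __call__(self, text):
        # In functional position, do the most likely thing.
        if self.replacement is not None:
            return sub(self, text)
        return match(self, text)

def match(r, text):
    """List the matched substrings (all of them only when the regexp is global)."""
    found = [m.group(0) for m in r.rx.finditer(text)]
    return found if r.count == 0 else found[:1]

def sub(r, text):
    """Return the string with the replacement applied."""
    return r.rx.sub(r.replacement, text, count=r.count)

def match_and_sub(r, text):
    """Return both results as a pair, in place of multiple return values."""
    return sub(r, text), match(r, text)

r = Regexp(r"b(\w)", r"d\1", global_=True)
print(r("bobcat"))            # dodcat
new_text, matches = match_and_sub(r, "bobcat")
```

Returning a pair from `match_and_sub` keeps the plain call consistent (it always returns a string) while still making the matches available to callers who ask for them.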

-----

3 points by almkglor 5963 days ago | link

> This would be possible if there were easy ways to reprogram the Arc syntax in a library.

Again, like I said, this is probably implementable using readermacros, but fooling around with the reader is always troublesome.

Consider some random programmer who uses /ca/ as a variable name in his or her programs for some inexplicable reason. Whether this is read as a regular expression or as a valid symbol will then depend on whether the program is loaded before or after the regular expression library.

If you want nice regular expression syntax, then it must be standardized as part of the language syntax so that everyone knows they should avoid using such variable names. Alternatively, give some method for specifying a reader for each module file. No, this is an exploratory language, and someone will try using /ca/ as a variable name unless you specifically ban it. I promise you that.
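
The load-order problem can be illustrated with a toy reader in Python. This is not Arc's reader: the `Reader` class, its `add_rule` method, and the token pattern are all made up for the illustration. The only point is that the same token is classified differently depending on when a library installs its rule.

```python
import re

class Reader:
    """Toy token classifier with rules that libraries can add at load time."""

    def __init__(self):
        self.rules = []   # (compiled pattern, tag) pairs, in installation order

    def add_rule(self, pattern, tag):
        self.rules.append((re.compile(pattern), tag))

    def read(self, token):
        for rx, tag in self.rules:
            if rx.fullmatch(token):
                return (tag, token)
        return ("symbol", token)   # default: anything else is a symbol

reader = Reader()
before = reader.read("/ca/")                 # ('symbol', '/ca/')
reader.add_rule(r"/.+/[a-z]*", "regexp")     # the regexp library is loaded here
after = reader.read("/ca/")                  # ('regexp', '/ca/')
print(before, after)
```

Code read before the rule is installed treats /ca/ as a variable name, and code read afterwards does not, which is the inconsistency described above.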

This is only partially implementable using ssyntax, but again this may be considered as "fooling around with the reader".

If strings as regular expressions work for you, then it's okay, since strings are already standardized in the syntax:

  arc> ((rex "s/b(\\w)/d\\\\1/g") "bobcat")
  "dodcat"

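A string-based `rex` along these lines could be sketched in Python as follows. The parsing rules are assumptions made for the sketch: the spec is split on slashes that are not preceded by a backslash, only the `g` flag is understood, and the replacement uses Python's `\1` form for backreferences.

```python
import re

def rex(spec):
    """Parse '/pattern/flags' or 's/pattern/replacement/flags' into a function."""
    # Split on "/" characters that are not escaped with a backslash.
    # Limitation: a pattern ending in an escaped backslash is not handled.
    parts = re.split(r"(?<!\\)/", spec)

    if parts[0] == "s" and len(parts) == 4:
        _, pattern, replacement, flags = parts
        rx = re.compile(pattern)
        count = 0 if "g" in flags else 1   # re.sub: 0 means "replace all"
        return lambda text: rx.sub(replacement, text, count=count)

    if parts[0] == "" and len(parts) == 3:
        _, pattern, flags = parts
        rx = re.compile(pattern)
        if "g" in flags:
            return lambda text: [m.group(0) for m in rx.finditer(text)]

        def first(text):
            m = rx.search(text)
            return [m.group(0)] if m else []

        return first

    raise ValueError(f"unrecognised regexp spec: {spec!r}")

print(rex(r"s/b(\w)/d\1/g")("bobcat"))   # dodcat
print(rex("/ca/")("bobcat"))             # ['ca']
```

Since the spec is an ordinary string, nothing in the reader has to change; the cost is the extra escaping whenever the language has no raw string syntax.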
-----

4 points by cchooper 5962 days ago | link

Strings would work ok if Arc had a means of representing unprocessed strings like Perl or C#.

  arc> ((rex @"s/b(\w)/d\\1/g") "bobcat")
This has lots of other uses too, so I think it would be a good feature regardless.
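
Python's raw string literals are one more example of the same feature, shown here only to illustrate how much escaping it saves for regexps:

```python
import re

escaped = "b(\\w)"   # ordinary string: every backslash has to be doubled
raw = r"b(\w)"       # raw string: backslashes are kept exactly as written

assert escaped == raw                    # both spell the same five characters
print(re.sub(raw, r"d\1", "bobcat"))     # dodcat
```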

-----

1 point by almkglor 5962 days ago | link

True, another good place would be docstrings.

Now all we need is to (re)build a reader for Arc. ^^

-----

5 points by stefano 5963 days ago | link

Regexps are a particular tool. They are useful in many cases, but I think that they aren't general enough to put them in the core. They should be in the standard library, though.

-----

1 point by almkglor 5963 days ago | link

A plain expression like /ca/ seems rather straightforward to me.

This requires some kind of special version of the reader.

One thing I suggest would be to use some sort of # readermacro (which you'll have to hack in the underlying Scheme):

  #/ca/
  #/s/foo/bar/
If you want nice syntax for regexps, almost definitely it will have to be part of the axioms, or at least readermacros (which I personally don't like). Otherwise if representing them via strings is acceptable, then we don't need it as part of the axioms.

-----

2 points by bOR_ 5963 days ago | link

I am not sure what the consequences are for possible implementations, but preferably I would want to be able to use regexps in (find, (keep, or (findsubseq just as easily as I would now use strings or functions.

  (keep odd (list 1 2 3 4))
  (keep (reg /nan/) (list "banana" "bonobo" "bandanga"))
  (findsubseq (reg /\d+:\d+/) "The current time is 10:00 am")
Looking at http://arcfn.com/doc/string.html, there are an awful lot of restrictions on when we can use variables or functions as arguments, so maybe a lot of the string operations in there would become obsolete if there were a regexp engine in Arc.

-----