Saturday, June 8, 2013

Rust wc

Continuing my practice of writing POSIX command-line tools in Rust, the next tool I rewrote was wc (word count). It is a simple tool that counts four things:

  • lines (separated by U+000A, LINE FEED)
  • words (separated by any whitespace)
  • characters (multibyte encoded, probably UTF-8)
  • bytes
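As a rough illustration of the four counts, here is a minimal sketch in present-day Rust (not the 2013 dialect used for the original tool, and not the linked source). The names Counts and count are made up for this example. It assumes the input is UTF-8 and decodes invalid sequences leniently, and it treats any Unicode whitespace as a word separator, which may differ from a locale-aware wc.

```rust
/// The four totals a wc-like tool reports (hypothetical type for this sketch).
#[derive(Debug, Default, PartialEq)]
struct Counts {
    lines: usize,
    words: usize,
    chars: usize,
    bytes: usize,
}

/// Counts lines, words, characters, and bytes in a byte buffer.
fn count(input: &[u8]) -> Counts {
    // Assumption: the input is UTF-8; invalid sequences become U+FFFD.
    let text = String::from_utf8_lossy(input);
    Counts {
        // A line is terminated by U+000A (LINE FEED).
        lines: input.iter().filter(|&&b| b == b'\n').count(),
        // Words are maximal runs of non-whitespace characters.
        words: text.split_whitespace().count(),
        // Characters are decoded Unicode scalar values, not bytes.
        chars: text.chars().count(),
        bytes: input.len(),
    }
}

fn main() {
    let c = count("héllo wörld\nsecond line\n".as_bytes());
    println!("{} {} {} {}", c.lines, c.words, c.chars, c.bytes);
}
```

The character and byte counts differ as soon as the input contains a multibyte character, which is why wc reports them separately.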

In the interest of brevity, I'm going to refer the reader to this link for the source code. One thing I learned is that the current std::getopts library doesn't have many features; for example, it doesn't link short options to long options, so you have to check each one separately. While I was writing this, some people in #rust on irc.mozilla.org suggested that I look at docopt. It is still mostly a Python library and has not been ported to Rust yet, but a port would make a great addition to the language.
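To show what checking short and long options separately amounts to, here is a small hand-rolled sketch in present-day Rust. It does not use std::getopts or docopt; the helper is_flag is hypothetical, and the option names -l and --lines are only an illustration.

```rust
/// Returns true if `arg` is either `-<short>` or `--<long>`.
/// Hypothetical helper; it does not handle grouped flags such as `-lw`.
fn is_flag(arg: &str, short: char, long: &str) -> bool {
    match arg.strip_prefix("--") {
        // Long form: everything after "--" must equal the long name.
        Some(name) => name == long,
        // Short form: exactly a dash followed by the one short character.
        None => {
            let mut chars = arg.chars();
            chars.next() == Some('-')
                && chars.next() == Some(short)
                && chars.next().is_none()
        }
    }
}

fn main() {
    let args: Vec<String> = std::env::args().skip(1).collect();
    // One call covers both spellings of the same option.
    let want_lines = args.iter().any(|a| is_flag(a, 'l', "lines"));
    println!("count lines: {}", want_lines);
}
```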

Saturday, June 1, 2013

Rust echo

I recently found the Rust programming language, and I've been having a blast of a time learning it. So far, it seems like a mixture of 50 different languages, most notably C++ and Haskell.

To help myself learn it, I decided a good place to start was with command-line utilities, the most basic of which are defined by POSIX. POSIX echo does not take any options; it just prints all of its arguments back to the user. BSD echo takes one option, '-n'; if it is not given, echo prints a newline after all of the arguments are printed.

This seemed like a good example for learning a language, so this was my first attempt:

fn main() {
    let mut newline = true;
    let mut arguments = os::args();
    arguments.shift();
    if arguments.len() > 0 && arguments[0] == ~"-n" {
        arguments.shift();
        newline = false;
    }
    for arguments.eachi |argi, &argument| {
        print(argument);
        if argi == arguments.len() - 1 {
            if newline {
                print("\n");
            }
        } else {
            print(" ");
        }
    }
}

I talked about this on #rust at irc.mozilla.org and bjz and huonw recommended:

fn main() {
    let args = os::args();
    match args.tail() {
        [~"-n",..strs] => print(str::connect(strs, " ")),
        strs => println(str::connect(strs, " ")),
    }
}

which had some issues with it, so I refactored it to be more functional:

fn unwords(args: &[~str]) -> ~str {
    return str::connect(args, " ");
}
 
fn echo(args: &[~str]) {
    match args.tail() {
        [~"-n", ..strs] => print(unwords(strs)),
        strs => println(unwords(strs)),
    }
}
 
fn main() {
    echo(os::args());
}

which defined a Haskell-like unwords function to help. I am quite pleased with how well Rust supports approaching a problem from either a procedural or a functional paradigm. Hooray for multi-paradigm programming!
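The snippets above use 2013-era syntax (~str, os::args) that no longer compiles. For comparison, here is a sketch of the same functional version in present-day Rust. It is my own translation, not code from the IRC discussion. In this sketch echo returns the text instead of printing it, which is a choice made here so the behavior is easy to check.

```rust
/// Joins arguments with single spaces, like Haskell's `unwords`.
fn unwords(args: &[String]) -> String {
    args.join(" ")
}

/// Builds the text echo would write. `args` excludes the program name.
fn echo(args: &[String]) -> String {
    match args {
        // A leading "-n" suppresses the trailing newline.
        [flag, rest @ ..] if flag.as_str() == "-n" => unwords(rest),
        all => unwords(all) + "\n",
    }
}

fn main() {
    // Skip the program name, which is the first element of env::args().
    let args: Vec<String> = std::env::args().skip(1).collect();
    print!("{}", echo(&args));
}
```

The slice pattern also handles the empty argument list, so no explicit length check is needed.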

Review of previous posts

In this post, I'm going to review my old posts in light of what I've learned over the past few years. The purpose of this analysis is to consolidate my ideas and, hopefully, to come up with some new insights along the way.

This was my first post about XML entity references. The goal was to restrict the DTD grammar to only entities, to facilitate use in other applications that have no concern for elements and attribute lists. Now I would recommend using only DTD since its usage is widespread. I was considering folding this into a macro system I was planning to write, so that one could replace &copy; with © just as easily as it could replace sqrt(4) with 2. I've come to the conclusion that this post was premature and needs to be included in a more detailed discussion of macro systems and templating systems such as M4, Genshi, Mustache, and many others.

This was an example of binary parsing, which is still not possible in many languages and is a topic better suited to C structs. GCC has a large number of attributes and packing parameters that accomplish some of the goals of binary parsing, but I still haven't found a good binary parsing framework to date.

This was a bunch of things trying to make Go something that it wasn't. I have since found the Rust programming language, which has all the features I was looking for in this post and more. So, now I would recommend using Rust.

These posts were just crazy comments.

The following posts were all about RDF:

This was a comment primarily intended to allow for arbitrary Common Lisp lists to be represented in RDF.

This was a comment as well, and I still haven't found a datatype URI for this, perhaps something for a future article.

This is interesting because there isn't an XSD datatype associated with large finite-size integers, although you could define it as xsd:integer, then restrict the domain until it matches int128_t. I have since defined a URI for this on my other website.
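The idea of restricting xsd:integer until it matches a 128-bit range can be sketched in Rust, whose i128 type covers the same range as a signed 128-bit integer. The function name parse_int128 is hypothetical. The sketch leans on Rust's own integer parser, which accepts an optional sign and leading zeros, and it trims surrounding whitespace slightly more liberally than XSD whitespace collapsing does.

```rust
/// Returns the value if `lexical` is a decimal integer within the
/// signed 128-bit range, and None otherwise (hypothetical helper).
fn parse_int128(lexical: &str) -> Option<i128> {
    // parse::<i128>() rejects empty input, stray characters, and any
    // value outside i128::MIN..=i128::MAX.
    lexical.trim().parse::<i128>().ok()
}

fn main() {
    for s in ["42", "-7", "170141183460469231731687303715884105728"] {
        println!("{:?} -> {:?}", s, parse_int128(s));
    }
}
```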

This was fun. I have since written about these algorithms on my other website.

This is a type of post I'd like to write more of. You know, explaining the pros and cons of certain implementation techniques and why one is better or worse than the other.

This was attempting to illustrate the inconsistencies that I found between different XML standards.

This and this were comments about The Web. I would like to add that I've started to use unqualified identifiers to refer to RDF and OWL core concepts in my private notes:

  • Type = rdfs:Class
    • ObjectType = owl:Class
    • DataType = rdfs:Datatype
  • Property = rdf:Property
    • ObjectProperty = owl:ObjectProperty
    • DataProperty = owl:DataProperty

The following posts were unintentionally about Data URIs:

This was attempting to solve a problem that Data URIs already solve. Now I recommend using Data URIs instead.

This was a good comment, but I don't recommend these namespaces anymore. Now I recommend using Data URIs instead, for example:

<data:text/x-scheme;version=R5RS,vector-length>

This mentions a problem regarding xsd:hexBinary and xsd:base64Binary, which can be solved in several ways:

  1. Candle assigned the type identifier "binary" to the combined value space.
  2. The MediaType "application/octet-stream" is naturally isomorphic to the binary datatype.
  3. The Data URI <data:application/octet-stream,DATA> may be used to represent data of type binary.
  4. The URI <http://www.iana.org/assignments/media-types/application/octet-stream> may also be used to represent the binary datatype itself.
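As an illustration of the third option, here is a minimal Rust sketch that builds a Data URI from raw bytes. The function name data_uri is made up for this example. It uses percent-encoding rather than the ;base64 form, and it conservatively escapes every byte outside the URI unreserved set.

```rust
/// Builds a `data:` URI for `data` with the given media type,
/// percent-encoding every byte that is not an unreserved URI character.
fn data_uri(media_type: &str, data: &[u8]) -> String {
    let mut uri = format!("data:{},", media_type);
    for &b in data {
        match b {
            // Unreserved characters may appear literally.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                uri.push(b as char)
            }
            // Everything else is written as %XX in uppercase hex.
            _ => uri.push_str(&format!("%{:02X}", b)),
        }
    }
    uri
}

fn main() {
    println!("{}", data_uri("application/octet-stream", &[0x00, 0xFF, b'A']));
}
```

Because the encoding works on bytes, the same function serves any binary value regardless of whether it originated as xsd:hexBinary or xsd:base64Binary.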