Learning Go (and reading Tar files with it, too)

Over the past couple of days, I’ve taken an interest in Go. I’ve gone through the bulk of the documentation. I’ve read the tutorial, the Effective Go documentation, and a collection of other works available on the documentation page. I’m really taking quite a liking to it. Why?

What’s there not to like? Exactly. While the standard library has yet to reach “batteries included” status, it is getting there. You’ll already find packages for SMTP, networking, HTTP, and IO, among other things. There’s even a growing collection of third-party packages available on the Package Dashboard.

After reading through the available documentation, I decided to try my hand at a few examples. I couldn’t decide where to start, so I opened the standard library reference and started with ‘A.’ In this case, that means the archive packages: tar and zip.

Reading Tar Files with Go

First off, if you don’t have Go installed yet, head over to the getting started guide. Installation is source driven, but it’s truly very simple. The build cycle is slightly bulky, as you’ll need to compile, link, and then execute. That’s a cycle I haven’t used in a while. To speed it up, I’ve defined the following shell function in my .profile.

gogogo() {
  source=$1
  base=$(basename $source .go)
  6g $source
  6l -o $base $base.6
  $(pwd)/$base
}

Of course, this little rule only works when building one file at a time.

Now, let’s set up the environment. This would work with much larger tar files, but for example purposes, we’re going to limit ourselves to some very small ones. The test environment was set up as follows:

$ echo "one" > 1
$ echo "two" > 2
$ echo "three" > 3
$ tar -cvf test.tar ./[123]
a ./1
a ./2
a ./3
$

Notice that we’ve simply created three text files, named 1, 2, and 3. Each file contains the English word for its name. Let’s spice it up a bit and change the contents of “1” to один, which is simply the Russian word for “one.” Re-run the tar command afterward so that the archive picks up the change.

$ echo 'один' > 1
$ cat 1
один
$

Alright. Now, let’s have a look at the Go code needed to read from the tar file we created.

package main

import "archive/tar"
import "fmt"
import "os"

func main() {
  f,_ := os.Open("test.tar", os.O_RDONLY, 0600)
  reader := tar.NewReader(f)
  file_count := 1
  for hdr,err := reader.Next(); err == nil; hdr, err = reader.Next() {
    b := make([]byte, hdr.Size)
    fmt.Printf("%d: %s (%d bytes)\n", file_count, hdr.Name, hdr.Size)
    reader.Read(b)
    fmt.Println("-----> ", string(b))
    file_count++
  }
}

That’s all there is to it. Before we go into detail and explain exactly what’s going on here, let’s build it (using our shortcut) and run it in order to verify the output.

$ gogogo tar.go
1: ./1 (9 bytes)
----->  один

2: ./2 (4 bytes)
----->  two

3: ./3 (6 bytes)
----->  three

As an aside, if you run an ls from the current directory, you’ll see the intermediate files produced by the Go compiler. Our profile function simply saves us from running the compile and link steps by hand. Let’s take a closer look at the code, and then double back and review the results.

The very first line of the file declares that we’re defining package main. Each Go source file requires a package name. Larger applications are simply collections of linked packages. The package that holds the program’s entry point must be named, not surprisingly, main.

Next, we import three packages. These are all used within the file. Note that if we were to import a package that we’re not using, the application would not compile. That ensures we always have clean code without a polluted namespace.

Now, we define a main function. This is the application entry point, much like you’ll see in a C, C++, or Objective-C program.

  1. We open a file by using the os.Open function. In Go, identifiers are only exported from packages if they begin with a capital letter (think static vs. non-static C functions). The os.Open function returns a file object along with an error value, which this example discards by assigning it to the blank identifier (_).
  2. Next, we create a tar reader. The NewReader function expects an object implementing the Reader interface. Note that with Go, there’s no need to declare what you’re implementing. If you add the required methods, you automagically implement the interface. I like this approach. Duck typing light, if you will.
  3. We then iterate through the contents of the tar file by calling reader.Next. In Go, functions and methods can return more than one value. It’s common to see the actual value and a possible error condition passed back. As long as the error value is nil (Go’s None/NULL), we keep reading; the first non-nil error, including the one that signals the end of the archive, ends the loop. The reader.Next method returns a header structure as well as an error value.
  4. Now, we create a slice of bytes. The make syntax creates a slice and an underlying array. For more information, see the Go documentation on arrays and slices.
  5. We print some status information, read the full contents of each element in the tar file, and print the results. We increment our file number counter so we can display how many files we’ve read.
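
The steps above can also be sketched against the current Go standard library, whose API differs from the pre-release one used in this post (os.Open now takes only a file name, for instance). This is a minimal sketch, not the code from the listing: to stay self-contained it builds the archive in memory instead of opening test.tar, and the helper names buildArchive and listArchive are made up for illustration. It checks every error rather than discarding it and uses io.ReadAll so that each entry is read in full.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"log"
)

// buildArchive is a hypothetical helper that stands in for test.tar: it
// writes the given name/content pairs into an in-memory tar archive.
func buildArchive(names, contents []string) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	tw := tar.NewWriter(buf)
	for i, name := range names {
		hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(contents[i]))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write([]byte(contents[i])); err != nil {
			return nil, err
		}
	}
	// Close flushes the archive trailer.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// listArchive (also a hypothetical name) walks every entry readable from r
// and returns one summary line per entry.
func listArchive(r io.Reader) ([]string, error) {
	var lines []string
	tr := tar.NewReader(r)
	for count := 1; ; count++ {
		hdr, err := tr.Next()
		if err == io.EOF {
			return lines, nil // clean end of archive
		}
		if err != nil {
			return nil, err
		}
		// io.ReadAll keeps reading until the current entry is exhausted.
		body, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		lines = append(lines, fmt.Sprintf("%d: %s (%d bytes) %q", count, hdr.Name, hdr.Size, body))
	}
}

func main() {
	archive, err := buildArchive(
		[]string{"./1", "./2", "./3"},
		[]string{"один\n", "two\n", "three\n"},
	)
	if err != nil {
		log.Fatal(err)
	}
	lines, err := listArchive(archive)
	if err != nil {
		log.Fatal(err)
	}
	for _, line := range lines {
		fmt.Println(line)
	}
}
```

Because listArchive accepts any io.Reader, the same function would work on a file opened with os.Open; the in-memory buffer is only there to keep the sketch runnable on its own.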

Just for reference purposes, the Tar header is defined as the following in archive/tar:

type Header struct {
    Name     string
    Mode     int64
    Uid      int
    Gid      int
    Size     int64
    Mtime    int64
    Typeflag byte
    Linkname string
    Uname    string
    Gname    string
    Devmajor int64
    Devminor int64
    Atime    int64
    Ctime    int64
}

Pretty straightforward, no? Now, double back and look at the results from earlier. They should all make sense. Well, maybe except for our little Russian file! Note that this record says that it is 9 bytes in size, while the remaining files report a size equal to the number of letters plus the trailing newline. The answer is pretty simple. Each Cyrillic letter takes up two bytes when UTF-8 encoded. Go transparently handles that complexity for us. So, we’re looking at four two-byte letters, followed by a standard newline.

$ file 1
1: UTF-8 Unicode text
$ file 2
2: ASCII text
$
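
The byte arithmetic can be checked directly in Go. The following is a small sketch using the current standard library; the function name sizes is made up for illustration. It compares the UTF-8 byte length of each file’s contents with the number of characters (runes) it holds.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes (a hypothetical helper) reports the UTF-8 byte length of s and the
// number of runes, i.e. Unicode code points, it contains.
func sizes(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"один\n", "two\n", "three\n"} {
		b, r := sizes(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

For “один” plus its newline this reports 9 bytes but only 5 runes, while the two ASCII strings have equal byte and rune counts (4 and 6).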

So, all in all, I’m starting to like this language. Take a minute and dive into it. Now, of course, the challenge becomes finding something worthwhile to write in Go. I’ve learned a small collection of languages over the past year, only to forget most of them due to lack of use. As an interesting aside, I had been planning on refreshing my C/C++. I think I may defer that a bit and really spend some time on this language.

Update: I’ve been asked how one would manage a gzip-compressed tar file. Well, remember interfaces? The reader we pass into the tar.NewReader function simply has to implement the Reader interface! So, we can update the code above to open a gzip file and pass that reader in, like so:

package main

import (
  "archive/tar"
  "compress/gzip"
  "fmt"
  "os"
)

func main() {
  fhandle, _ := os.Open("test.tar.gz", os.O_RDONLY, 0600)
  zhandle, _ := gzip.NewReader(fhandle)
  thandle := tar.NewReader(zhandle)
  hdr, _ := thandle.Next()
  fmt.Println(hdr.Name)
}

See? All we’ve done here is chain our NewReader calls, as each returned object implements the Reader interface.  Running the code provides the following output.

mcjeff@macbook:~$ ls test.tar.gz
test.tar.gz
mcjeff@macbook:~$ gogogo gunzip.go
./1
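
To make the chaining idea concrete with the current Go standard library (whose API differs from the pre-release one used above), here is a hedged sketch. It builds a small gzip-compressed tar in memory rather than opening test.tar.gz, and it slips a custom reader into the chain. The names countingReader, buildTarGz, and firstName are invented for this example; countingReader satisfies the Reader interface only because it has a Read method with the right signature.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// countingReader is a hypothetical wrapper that tallies the bytes passing
// through it. It never declares that it implements io.Reader; having a
// matching Read method is enough.
type countingReader struct {
	r io.Reader
	n int64
}

func (c *countingReader) Read(p []byte) (int, error) {
	n, err := c.r.Read(p)
	c.n += int64(n)
	return n, err
}

// buildTarGz (hypothetical) returns an in-memory gzip-compressed tar archive
// holding a single entry.
func buildTarGz(name, content string) (*bytes.Buffer, error) {
	buf := new(bytes.Buffer)
	zw := gzip.NewWriter(buf)
	tw := tar.NewWriter(zw)
	hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(content))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(content)); err != nil {
		return nil, err
	}
	// Close the tar writer first, then the gzip writer, so both flush.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf, nil
}

// firstName (hypothetical) chains gzip and tar readers on top of r and
// returns the name of the first entry in the archive.
func firstName(r io.Reader) (string, error) {
	zr, err := gzip.NewReader(r)
	if err != nil {
		return "", err
	}
	defer zr.Close()
	hdr, err := tar.NewReader(zr).Next()
	if err != nil {
		return "", err
	}
	return hdr.Name, nil
}

func main() {
	archive, err := buildTarGz("./1", "один\n")
	if err != nil {
		log.Fatal(err)
	}
	counter := &countingReader{r: archive}
	name, err := firstName(counter)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(name)
	fmt.Println("compressed bytes read:", counter.n)
}
```

The chain here is buffer, then countingReader, then gzip, then tar; each layer only needs the layer beneath it to provide Read.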

There. Hope that clears it up!

Posted on February 14, 2011 at 11:52 pm by Jeff McNeil · Permalink
In: development, Go, open source
