Learning Go (and reading Tar files with it, too)
Over the past couple of days, I’ve taken an interest in Go. I’ve gone through the bulk of the documentation. I’ve read the tutorial, the Effective Go documentation, and a collection of other works available on the documentation page. I’m really taking quite a liking to it. Why?
- It’s quite expressive. This was one of the main things that got me into Python. I liked the fact that I could squeeze a lot of functionality out of just a few lines of well thought out code. Go provides that same functionality and it gives near-C performance (though, of course, there is overhead).
- It’s a small language. This is why I’ve always preferred C to C++. C is a very, very small language. There isn’t much to it at all. Once one gets the idioms down, C programming is not overly difficult. Some of the OS level work might be, but the language itself is very small.
- It’s quite easy to extend. I actually wrapped Python in Go this afternoon after only about an hour of effort. The cgo compiler provides very straightforward means of integrating and linking native C code.
- It provides a very intuitive and easy-to-use threading/parallelism model with the use of goroutines.
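To make that last point a little more concrete, here is a minimal sketch of the goroutine model, assuming a current Go toolchain. The squareAll helper is a hypothetical name made up for illustration; sync.WaitGroup is the standard library's way of waiting for a group of goroutines to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll (hypothetical helper) squares each input in its own goroutine
// and returns the results in input order.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			out[i] = n * n // each goroutine writes only to its own slot
		}(i, n)
	}
	wg.Wait() // block until every goroutine has called Done
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3}))
}
```

Each goroutine writes to a distinct slice element, so no locking is needed in this particular sketch; shared state would call for channels or a mutex.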
What’s there not to like? Exactly. While the standard library has yet to reach “batteries included” status, it is getting there. Already, you’ll find modules for SMTP, networking, HTTP, and IO, among other things. There’s even a growing collection of third party modules available on the Package Dashboard.
After reading through the available documentation, I decided to try my hand at a few examples. I couldn’t very much decide where to start, so I opened the standard library reference and started with ‘A.’ In this case, that means the archive packages – tar and zip.
Reading Tar Files with Go
First off, if you don’t have Go installed as of yet, head over to the getting started guide. Installation is source driven, but it’s truly very simple. The compilation is slightly bulky as you’ll need to compile, link, execute. That’s a cycle I haven’t used in a while. In order to speed it up, I’ve defined the following shell function in my .profile.
gogogo() {
    source=$1
    base=$(basename $source .go)
    6g $source
    6l -o $base $base.6
    $(pwd)/$base
}
Of course, this little rule only works when building one file at a time.
Now, let’s set up the environment. This would work with much larger tar files, but for example purposes, we’re going to limit ourselves to some very small ones. The test environment was set up as follows:
$ echo "one" > 1
$ echo "two" > 2
$ echo "three" > 3
$ tar -cvf test.tar ./[123]
a ./1
a ./2
a ./3
$
Notice that we’ve simply created three text files, named 1, 2, and 3. Each file contains the English word for its name. Let’s spice it up a bit and change the contents of “1” to один, which is simply the Russian word for “one.”
$ echo 'один' > 1
$ cat 1
один
$
Alright. Now, let’s have a look at the Go code needed to read from the tar file we created.
package main

import "archive/tar"
import "fmt"
import "os"

func main() {
	f, _ := os.Open("test.tar", os.O_RDONLY, 0600)
	reader := tar.NewReader(f)
	file_count := 1
	for hdr, err := reader.Next(); err == nil; hdr, err = reader.Next() {
		b := make([]byte, hdr.Size)
		fmt.Printf("%d: %s (%d bytes)\n", file_count, hdr.Name, hdr.Size)
		reader.Read(b)
		fmt.Println("-----> ", string(b))
		file_count++
	}
}
That’s all there is to it. Before we go into detail and explain exactly what’s going on here, let’s build it (using our shortcut) and run it to verify the output.
$ gogogo tar.go
1: ./1 (9 bytes)
-----> один
2: ./2 (4 bytes)
-----> two
3: ./3 (6 bytes)
-----> three
As an aside, if you run an ls from the current directory, you’ll see the intermediate files needed by the go compiler. We’re just skipping that step with our profile function. Let’s take a closer look at the code, and then double back and review the results.
The very first line of the file declares that we’re defining package main. Each Go source file requires a package name. Larger applications are simply collections of linked packages. The idiomatic name for the main package is, not surprisingly, main.
Next we import three packages. These are all used within the file. Note that if we were to import a package that we’re not using, the application would not compile. That ensures we always have clean code without a polluted namespace.
Now, we define a main function. This is the application entry point, much like you’ll see in a C, C++, or Objective-C program.
- We open a file by using the os.Open function. In Go, identifiers are only exported from packages if they begin with a capital letter (think static vs. non-static C functions). The os.Open function returns a file object.
- Next, we create a tar reader. The NewReader function expects an object implementing the Reader interface. Note that with Go, there’s no need to declare what you’re implementing. If you add the required methods, you automagically implement the interface. I like this approach. Duck typing light, if you will.
- We then iterate through the contents of the tar file by calling reader.Next. In Go, functions and methods can return more than one value. It’s common to see the actual value and a possible error condition passed back. The reader.Next method returns a header structure as well as an error value. As long as that error value is nil (Go’s None/NULL), we keep reading; the loop ends as soon as Next returns a non-nil error.
- Now, we create a slice of bytes. The make syntax creates a slice and an underlying array. For more information, see the Go documentation on arrays and slices.
- We print some status information, read the full contents of each element in the tar file, and print the results. We increment our file number counter so we can display how many files we’ve read.
Just for reference purposes, the Tar header is defined as the following in archive/tar:
type Header struct {
	Name     string
	Mode     int64
	Uid      int
	Gid      int
	Size     int64
	Mtime    int64
	Typeflag byte
	Linkname string
	Uname    string
	Gname    string
	Devmajor int64
	Devminor int64
	Atime    int64
	Ctime    int64
}
Pretty straightforward, no? Now, double back and look at the results from earlier. They should all make sense, except maybe for our little Russian file! Note that this record says that it is 9 bytes in size, while the sizes of the remaining files equal their number of characters plus the trailing newline. The answer is pretty simple. Each Cyrillic letter takes up two bytes when UTF-8 encoded. Go transparently handles that complexity for us. So, we’re looking at four two-byte letters, followed by a standard newline.
$ file 1
1: UTF-8 Unicode text
$ file 2
2: ASCII text
$
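The byte arithmetic is easy to check from Go itself. Here is a small sketch, assuming a current Go toolchain; byteAndRuneCount is a hypothetical helper name. len reports the encoded size in bytes, while utf8.RuneCountInString counts Unicode code points.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteAndRuneCount (hypothetical helper) returns the UTF-8 encoded size in
// bytes and the number of Unicode code points in s.
func byteAndRuneCount(s string) (int, int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"один\n", "two\n", "three\n"} {
		b, r := byteAndRuneCount(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
	// "один\n" is 9 bytes but 5 runes: four two-byte letters plus a newline.
}
```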
So, all in all, I’m starting to like this language. Take a minute and dive into it. Now, of course, the challenge becomes finding something worthwhile to write in Go. I’ve learned a small collection of languages over the past year, only to forget most of them due to lack of use. As an interesting aside, I had been planning on refreshing my C/C++. I think I may defer that a bit and really spend some time on this language.
Update: I’ve been asked how one would manage a gzip compressed tar file. Well, remember interfaces? The reader we pass into the tar.NewReader function simply has to implement the Reader interface! So, we can update the code above to open a gzip file and pass that reader in, like so:
package main

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"os"
)

func main() {
	fhandle, _ := os.Open("test.tar.gz", os.O_RDONLY, 0600)
	zhandle, _ := gzip.NewReader(fhandle)
	thandle := tar.NewReader(zhandle)
	hdr, _ := thandle.Next()
	fmt.Println(hdr.Name)
}
See? All we’ve done here is chain our NewReader calls, as each returned object implements the Reader interface. Running the code provides the following output.
mcjeff@macbook:~$ ls test.tar.gz
test.tar.gz
mcjeff@macbook:~$ gogogo gunzip.go
./1
There. Hope that clears it up!

