I’m sick of typing this, too…

So, I’m also sick of Chick-fil-A.

American free speech, quite simply, means the government won’t silence you. You know, like they did in the USSR (or CCCP, for my Russian friends) back when you called the communists a bunch of collective ownership touting douche bags. Or, maybe like when you pissed off Big Bad Daddy Hussein and wound up tossed off of a 3rd story roof.

With regards to Dan Cathy, when the public lashes back at something he said, that is okay. There is no amendment violation at play here. If, however, the government prosecuted him for saying what he said? Yeah, that’s a problem and I’d come to his defense. I’d do this even though I fully disagree with each and every thing that he said in that now famous interview.

Given that tidbit, can we stop spouting off about the first amendment? There is no rights violation at play here. He said what he wanted as per his rights and those that listened responded in kind, as per their rights.

Good? Good. Sure? Good.

Now, boycotts! Fun! I really believe that it won’t cause a material impact to the chain. It’s a zero sum game. For everyone that elects to no longer eat at Chick-fil-A, someone else will drive a block further than the closest McDonald’s in order to show support.

As all of that has been clearly laid out, we can boil this whole thing down to two very concise points:

  1. He’s freely exercising his right, as are those responding and refusing to eat at the establishment.  Put this in another context. If a Target associate insisted that you were in fact a dick, you’d probably no longer shop at Target. Danny did this on a much larger scale.
  2. Boycott or no, I don’t want my money going towards these causes. I don’t give a damn if the chain is monetarily damaged or not.  I don’t want a single cent of my bi-monthly winnings to go to “traditional family organizations.”
Cool? Make sense? Oh, and damn it right-wing associations, give me the word “family” back. Stop using it to ensure your raging Christian-centric group sounds more appealing.

Finally, I can’t argue against him voicing his opinion. That’s cool and all that. Though, I can easily question his leadership ability having done so with such vigor and in such a public manner.

Update (8.1.2011)

A few folks have pointed out that the responses from Boston & Chicago illustrate free speech violations. Consider that the first amendment states “Congress shall make no law…”  In fact, Congress has made no law. Not only that, no one has banned them from any city yet.

As for the Chicago city councilman attempting to block Chick-fil-A’s expansion: might he be taking it too far? Yeah, probably. Is he in the wrong if he blocks expansion? Yeah, technically. That said, I’m far less motivated by a city council member trying to block a restaurant than I am by federal, state, and local governments injecting rights-limiting measures.

Posted on August 1, 2012 at 1:31 pm by Jeff McNeil · Permalink · One Comment
In: politics

Because I’m Sick of Typing This…

I got into Yet Another Social Network Argument© recently regarding gay marriage. Quite honestly, I’m sick of it.  No, I’m not sick of gay marriage. I’m sick of the opposition to it here in the United States.

You know what I’m even more sick of? Typing the same damn argument out each time it comes up.  So, henceforth, I will simply paste this link. If you’re the recipient, congratulations! You’re wrong. A word of warning up front, though. You may find yourself agreeing with me. Run with that.

Marriage exists on two levels. Individuals who are highly religious (the faith in question does not matter) believe that marriage is a divine union, before God. Sometimes the term “sacrament” is used. There’s nothing wrong with that.

Then, we have marriage at the civic level. That’s entirely different. It is just a contractual joining of two individuals into one entity. The forming of a business. There are legal implications, rights, and other associated privileges. Consider taxes, visitation, medical, insurance, child custody, adoption, and so on.

I really care not about the first type when it comes down to this argument. It doesn’t matter to me. Jewish weddings, Christian ceremonies, Muslim Nikah? I couldn’t care less. The point to understand here is that to each member of a practicing faith — or for some, no faith at all — this is just as important as your promises before witnesses and God. I don’t want to take any of that away. I don’t want to make clergy marry gay folks. No one wants that. Some clergy will be willing, however, and that is fine. That’s a question for that specific church.

Furthermore, for the dedicated faithful, this practice should be held in a higher regard than the civic contract. The civic contract is just a detail. The marriage is for eternity. You profess that your faith matters the most to you. Then live by that.

Now, I want to decouple the two. Consider the following points. Really. Consider them with an open mind and reserve your fire and brimstone speech for just a little bit longer. I promise, it makes a lot of sense.

First, from a rights standpoint, everyone is equal. Love isn’t even a requirement for the contract. Sounds bad? It’s not. It’s a legally binding agreement. That’s it. Two men. Two women. Two people in crazy love. Two best friends in their 70s who have no one else and want to share assets and decisions. Contract. Legal age? Paid the fee? I pronounce you legally entered into a binding contract. Pay the exit fee when you’re done. Thanks. Please don’t kiss the other party; we don’t care if you’re in love.

Secondly, the two don’t have to happen simultaneously. Perform the faith driven ceremony and file the paperwork for the contract six months later. Think about all of the divorces we’ll save. There’s no “living in sin” at this point as in the eyes of God, the marriage has the rubber stamp.

The two elements belong distinct. The contract has no religious connotation. Rights are equal. Religious marriages stay untouched. Everyone is happy.

In the spirit of full disclosure, I do want to clarify another point. Unfortunately, my argument isn’t all about how beautiful love is or how everyone deserves to marry a soulmate. None of that “fluff” matters to me.  Then, why so vocal? Simple. I disagree with the legislation of religious beliefs, especially when they result in legally restricting rights of others. Think about that. Legally restricting the rights of others. Ew. That tastes bad.

Finally, I’m in a very happy heterosexual marriage. I love the shit out of my wife. You know, I’d have absolutely no problem going downtown and trading my Province of Ontario marriage certificate in for a State of Georgia Civil Union affidavit. This, of course, is assuming my gay buddies get to do the same thing.

Go on. Tell me what the problem with this approach is.  I’ve got $10 on you failing to do so without a Bible quote.

 

Posted on July 24, 2012 at 9:55 pm by Jeff McNeil · Permalink · 3 Comments
In: politics, religion

Debugging Your Python With GDB (FTW!)

In this post we’ll take a look at how to debug your Python code using GDB. It is a handy thing to understand, especially if you’re confronted with an unexpected SEGV or other less than helpful error. I do realize there is some awesome python-gdb.py integration with GDB. I purposefully ignored that. Sometimes all ya have is the binary.

As an unfortunate note, I started doing this using Python 3.3, but at some point, I switched to 2.6 accidentally. I’ve migrated the earlier pieces to 2.6. If anyone smarter than I notices an inconsistency, this is why. I’m fairly certain I’ve cleaned it all up.

Finally, the GDB formatting is mine. I attempted to make it slightly more readable. Hope it helps.

Update: One thing I forgot to mention was the set of GDB macros that come with the Python source. That automates a good bit of the mechanics outlined here. Thanks, Evgeny!

How Does Python Evaluate Code?

First, a little bit of background. Python implements a stack-based virtual machine.  Python byte code manipulates that stack during normal execution.   For example, let’s take a look at a small application disassembled into byte code:

a = 1
b = 2
c = a + b
print c

This is a fairly trivial example that should show us a good sampling of the “instruction set.” We’re going to skim over this bit as understanding all of the byte code operations really isn’t a necessity here. When we use the dis module, we see that the following code is generated:

jeff@martian:~/cpython$ /usr/bin/python -mdis add.py
1           0 LOAD_CONST               0 (1)
            3 STORE_NAME               0 (a)

2           6 LOAD_CONST               1 (2)
            9 STORE_NAME               1 (b)

3          12 LOAD_NAME                0 (a)
           15 LOAD_NAME                1 (b)
           18 BINARY_ADD
           19 STORE_NAME               2 (c)

4          22 LOAD_NAME                2 (c)
           25 PRINT_ITEM
           26 PRINT_NEWLINE
           27 LOAD_CONST               2 (None)
           30 RETURN_VALUE

This is fairly self-explanatory.  We see on source lines 1 and 2 that the constants 1 and 2 are stored in a and b.  Next, both are loaded onto the stack and BINARY_ADD executes, which triggers the addition of two number objects. Next, STORE_NAME saves the result of the add operation (from the top of the stack) to the name c. Finally, we load c and run the print operations. In Python 3, this would simply call the print function, via CALL_FUNCTION. For an overview of how Python generates bytecode from Python code, see Python/compile.c. The comment at the top of the file is quite helpful.
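If you are following along on Python 3, the same listing can be produced from inside the interpreter. This is a small sketch of my own rather than part of the original session: it compiles the assignment lines with the built-in compile() and collects opcode names through dis.get_instructions (3.4 and later). The file name "<add.py>" is just a label. Opcode names shift between releases (newer interpreters fold BINARY_ADD into a generic BINARY_OP, for instance), so treat the exact names as version dependent.

```python
import dis

SOURCE = "a = 1\nb = 2\nc = a + b\n"

# Compile the source as module-level code, the way a script would be.
code = compile(SOURCE, "<add.py>", "exec")

# Collect the opcode names in the order they appear in the bytecode.
opnames = [ins.opname for ins in dis.get_instructions(code)]

# Run the code object in a scratch namespace to confirm what it computes.
namespace = {}
exec(code, namespace)

if __name__ == "__main__":
    dis.dis(code)           # human-readable listing, like python -mdis
    print(namespace["c"])
```

The STORE_NAME and LOAD_NAME instructions from the 2.6 listing still show up here; only the arithmetic and constant-loading opcodes tend to be renamed.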

Using Python 2.6 as a reference point, all of this happens in Python/ceval.c. The function handling byte code execution is named PyEval_EvalFrameEx.  Generally, this is a big switch statement. In 2.6 that is literally true; in Python 3.1 and later, the switch can instead be compiled as a collection of computed goto labels on compilers that support them, such as GCC on Mac OS and Linux (Visual Studio doesn’t allow that).

Looking at this function, you’ll see various entries such as this:

  case POP_TOP:
     v = POP();
     Py_DECREF(v);
     goto fast_next_opcode;

This is the implementation for the POP_TOP instruction. The POP macro returns the top value of the stack and the subsequent Py_DECREF(v) decrements the reference count. At this point, that could trigger execution of v->ob_type->tp_del & v->ob_type->tp_dealloc, if the reference count of v (v->ob_refcnt) has reached zero. As an aside, note that Python checks for events/thread switches every sys.getcheckinterval() instructions.  If the corresponding implementation of an instruction is complex (and doesn’t release the GIL), we can be left waiting here.
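The reference counting described above can be watched from Python itself. The sketch below is mine and leans on CPython behavior: sys.getrefcount reports the current count (typically inflated by the temporary reference held for the call), and __del__ is the Python-level hook that fires when the last reference disappears. The class name Tracked and the events list are made up for illustration.

```python
import sys

events = []

class Tracked:
    def __del__(self):
        # Runs when CPython tears the instance down.
        events.append("finalized")

obj = Tracked()
alias = obj                                # second reference to the same object
count_with_alias = sys.getrefcount(obj)

del alias                                  # one reference fewer
count_without_alias = sys.getrefcount(obj)

del obj  # last reference gone: the count hits zero and the finalizer runs
```

Other Python implementations may delay the finalizer, so the immediate "finalized" entry is a CPython detail, not a language guarantee.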

Now, we come to the function we’re interested in:

PyObject * PyEval_EvalCodeEx(PyCodeObject *co, PyObject *globals, PyObject
    *locals, PyObject **args, int argcount, PyObject **kws, int
    kwcount, PyObject **defs, int defcount, PyObject *closure);

Essentially, this function builds a frame from the code object being executed and relies on PyEval_EvalFrameEx to handle bytecode instruction evaluation.  The code object contains references to globals, locals, nested scopes (free vars/cell vars, depending on the angle), etc. PyEval_EvalCodeEx “transforms” that into a PyFrameObject.

It is this code object evaluation function we’re interested in as functions and methods are generally boiled down to code objects.
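The frame/code relationship is visible from Python too. Here is a minimal sketch, assuming CPython (sys._getframe is an implementation detail) and using made-up names: a running frame points back at the code object it is executing, and a bare code object can be evaluated against a namespace with exec.

```python
import sys

def current_code():
    # The running frame keeps a reference to the code object it executes.
    frame = sys._getframe()
    return frame.f_code

def add(a, b):
    return a + b

# A code object can also be evaluated directly against a namespace.
module_code = compile("total = add(1, 2)", "<sketch>", "exec")
scope = {"add": add}
exec(module_code, scope)
```

Calling current_code() hands back the very object stored in current_code.__code__, which is the same pairing the C function sets up when it builds a frame.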

Python Data Structures

Now that we’ve covered where to look, we need to take a look at what to look for.   This means building a bit of an understanding around a few data structures.

Type Objects

All of Python’s classes (well, almost) are represented by PyTypeObject objects, which are defined in Include/object.h.  This structure contains a whole lot of fields. Most of these fields will be pretty familiar looking as this is generally how “dunder”, or __methods__ , are implemented.  Standard, generic values are used (see PyType_Ready) if you don’t set up your own. This is a long structure, but including it here is relevant:

typedef struct _typeobject {
    PyObject_VAR_HEAD
    const char *tp_name; /* For printing, in format "<module>.<name>" */
    Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */

    /* Methods to implement standard operations */

    destructor tp_dealloc;
    printfunc tp_print;
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    cmpfunc tp_compare;
    reprfunc tp_repr;

    /* Method suites for standard classes */

    PyNumberMethods *tp_as_number;
    PySequenceMethods *tp_as_sequence;
    PyMappingMethods *tp_as_mapping;

    /* More standard operations (here for binary compatibility) */

    hashfunc tp_hash;
    ternaryfunc tp_call;
    reprfunc tp_str;
    getattrofunc tp_getattro;
    setattrofunc tp_setattro;

    /* Functions to access object as input/output buffer */
    PyBufferProcs *tp_as_buffer;

    /* Flags to define presence of optional/expanded features */
    long tp_flags;

    const char *tp_doc; /* Documentation string */

    /* Assigned meaning in release 2.0 */
    /* call function for all accessible objects */
    traverseproc tp_traverse;

    /* delete references to contained objects */
    inquiry tp_clear;

    /* Assigned meaning in release 2.1 */
    /* rich comparisons */
    richcmpfunc tp_richcompare;

    /* weak reference enabler */
    Py_ssize_t tp_weaklistoffset;

    /* Added in release 2.2 */
    /* Iterators */
    getiterfunc tp_iter;
    iternextfunc tp_iternext;

    /* Attribute descriptor and subclassing stuff */
    struct PyMethodDef *tp_methods;
    struct PyMemberDef *tp_members;
    struct PyGetSetDef *tp_getset;
    struct _typeobject *tp_base;
    PyObject *tp_dict;
    descrgetfunc tp_descr_get;
    descrsetfunc tp_descr_set;
    Py_ssize_t tp_dictoffset;
    initproc tp_init;
    allocfunc tp_alloc;
    newfunc tp_new;
    freefunc tp_free; /* Low-level free-memory routine */
    inquiry tp_is_gc; /* For PyObject_IS_GC */
    PyObject *tp_bases;
    PyObject *tp_mro; /* method resolution order */
    PyObject *tp_cache;
    PyObject *tp_subclasses;
    PyObject *tp_weaklist;
    destructor tp_del;

    /* Type attribute cache version tag. Added in version 2.6 */
    unsigned int tp_version_tag;

#ifdef COUNT_ALLOCS
    /* these must be last and never explicitly initialized */
    Py_ssize_t tp_allocs;
    Py_ssize_t tp_frees;
    Py_ssize_t tp_maxalloc;
    struct _typeobject *tp_prev;
    struct _typeobject *tp_next;
#endif
} PyTypeObject;

The typedef (typedefs? Anyone know the plural of #typedef?) above (e.g. PyNumberMethods) are the C-level equivalents of the double underscore methods required to implement a certain protocol (programmatic interface). They expand into method collections:

typedef struct {
    lenfunc mp_length;
    binaryfunc mp_subscript;
    objobjargproc mp_ass_subscript;
} PyMappingMethods;

These translate into len, subscript, and subscript assignment.
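On the Python side, those three slots line up with familiar dunder methods. The toy class below is only an illustration (Inventory and its contents are invented); the comments name the C slot each method fills when a class is defined in Python.

```python
class Inventory:
    """Toy mapping that forwards everything to a plain dict."""

    def __init__(self):
        self._items = {}

    def __len__(self):                  # mp_length
        return len(self._items)

    def __getitem__(self, key):         # mp_subscript
        return self._items[key]

    def __setitem__(self, key, value):  # mp_ass_subscript (assignment)
        self._items[key] = value

    def __delitem__(self, key):         # mp_ass_subscript (deletion, value is NULL)
        del self._items[key]

inv = Inventory()
inv["katana"] = 2
```

Note that assignment and deletion share one C slot; the interpreter tells them apart by whether a value was passed.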

Instances

All Python instances are implemented as pointers to PyObject values; PyObject is defined as:

typedef struct _object {
    PyObject_HEAD
} PyObject;

PyObject_HEAD, by default, expands to include only a pointer to the object’s type (type objects have a type of type!) and the reference count.

/* PyObject_HEAD defines the initial segment of every PyObject. */
#define PyObject_HEAD                   \
    _PyObject_HEAD_EXTRA                \
    Py_ssize_t ob_refcnt;               \
    struct _typeobject *ob_type;

Wait! Where is all of the per-instance data, you say? For classes that do not define __slots__, the corresponding PyTypeObject structure has a tp_dictoffset member. This gives the offset within the instance at which a pointer to a Python dictionary is stored. This is the __dict__ used to store per-instance information.  If __slots__ is defined, then tp_dictoffset is 0 and the slot values are stored directly in the instance, after the PyObject header, and accessed via descriptors. Generic structures are passed around via casting (and turned back into concrete values via the same method).
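You can poke at this from Python, since type objects expose the field as __dictoffset__. The following is a rough sketch with invented class names; the exact nonzero offset varies by build and version, so the only thing worth relying on is zero versus nonzero.

```python
class WithDict:
    pass

class WithSlots:
    __slots__ = ("weapon",)

plain = WithDict()
plain.weapon = "bo staff"        # lands in plain.__dict__

slotted = WithSlots()
slotted.weapon = "bo staff"      # stored in the instance itself, no __dict__

# The slot is exposed on the class as a descriptor object.
slot_descriptor = WithSlots.__dict__["weapon"]
```

Reading slotted.weapon goes through slot_descriptor.__get__, which is the "accessed via descriptors" part of the story.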

Somewhat related bonus Python trivia: The class dictionary is actually a PyDictProxy_Type that refers to the type’s tp_dict field.  You can’t edit it directly.

To clarify, assuming we have a type NinjaTurtle that is represented by PyTypeObject *ninja, then for an instance donatello, the following is true: ((PyObject *)donatello)->ob_type == ninja. Good. So, naturally, to perform an init call, the corresponding code would look like the following (args and kwds being the positional tuple and the keyword dictionary):

donatello->ob_type->tp_init((PyObject *)donatello, args, kwds);

In fact, this is almost exactly what happens when a type is called directly (à la class instantiation: MyClass()).
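The two-step dance can be spelled out by hand in Python. This is a simplified sketch (the real type call also checks that __new__ returned an instance of the right type before running __init__), and the weapon attribute is just something I made up for the turtles.

```python
class NinjaTurtle:
    def __init__(self, weapon):
        self.weapon = weapon

# Step 1: allocate the instance (tp_new at the C level).
donatello = NinjaTurtle.__new__(NinjaTurtle)

# Step 2: initialise it through the type's init slot (tp_init).
type(donatello).__init__(donatello, "bo staff")

# The usual one-step spelling does both.
leonardo = NinjaTurtle("katana")
```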

Code Objects

Let’s look at one final object, the code object. This is represented by a structure defined in code.h.  It is a rather simple object (though note the first member).

/* Bytecode object */
typedef struct {
    PyObject_HEAD
    int co_argcount;    /* #arguments, except *args */
    int co_nlocals;   /* #local variables */
    int co_stacksize;   /* #entries needed for evaluation stack */
    int co_flags;   /* CO_..., see below */
    PyObject *co_code;    /* instruction opcodes */
    PyObject *co_consts;  /* list (constants used) */
    PyObject *co_names;   /* list of strings (names used) */
    PyObject *co_varnames;  /* tuple of strings (local variable names) */
    PyObject *co_freevars;  /* tuple of strings (free variable names) */
    PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
    /* The rest doesn't count for hash/cmp */
    PyObject *co_filename;  /* string (where it was loaded from) */
    PyObject *co_name;    /* string (name, for reference) */
    int co_firstlineno;   /* first source line number */
    PyObject *co_lnotab;  /* string (encoding addr<->lineno mapping) See
           Objects/lnotab_notes.txt for details. */
    void *co_zombieframe;     /* for optimization only (see frameobject.c) */
    PyObject *co_weakreflist;   /* to support weakrefs to code objects */
} PyCodeObject;

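From here, we can switch into Python. Note the above fields and then have a peek at a function’s func_code attribute (__code__ in 3.x). The session below comes from a 3.x debug build, so it uses __code__ and also lists co_kwonlyargcount, which the 2.6 structure above does not have:

>>>
>>> def f(): pass
...
[66987 refs]
>>> import pprint
[67863 refs]
>>> pprint.pprint(dir(f.__code__))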
['__class__',
'__delattr__',
'__dir__',
'__doc__',
'__eq__',
'__format__',
'__ge__',
'__getattribute__',
'__gt__',
'__hash__',
'__init__',
'__le__',
'__lt__',
'__ne__',
'__new__',
'__reduce__',
'__reduce_ex__',
'__repr__',
'__setattr__',
'__sizeof__',
'__str__',
'__subclasshook__',
'co_argcount',
'co_cellvars',
'co_code',
'co_consts',
'co_filename',
'co_firstlineno',
'co_flags',
'co_freevars',
'co_kwonlyargcount',
'co_lnotab',
'co_name',
'co_names',
'co_nlocals',
'co_stacksize',
'co_varnames']
[67870 refs]
>>>

Perfect. Now we’ve made the connection between Python and C. Now we can take a look at the actual debugging process.
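Before moving on, it helps to see a few of those co_ fields filled in for a function that actually has arguments and locals. A quick sketch in Python 3 (the scale function is invented for the purpose):

```python
def scale(value, factor=2):
    result = value * factor
    return result

code = scale.__code__  # func_code on Python 2

summary = {
    "co_name": code.co_name,          # name of the function
    "co_argcount": code.co_argcount,  # positional arguments: value, factor
    "co_nlocals": code.co_nlocals,    # arguments plus the local "result"
    "co_varnames": code.co_varnames,  # argument names first, then other locals
}
```

These are the same fields we will be reading out of memory through GDB, just with the interpreter doing the pointer chasing for us.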

GDB’ing the Py.

We’ll use the same small bit of code we used above as our test script. We’re referencing /usr/bin/python here, which may vary on your system.

First, we’ll start the interpreter. Note that we’re debugging Python itself, not the script passed to it. GDB will not start if we pass in the Python script as the executable.

jeff@martian:~/cpython$ gdb /usr/bin/python
GNU gdb (GDB) 7.4-gg1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux".

Reading symbols from /usr/bin/python...
Reading symbols from /usr/lib/debug/usr/bin/python...done.
done.

Now we’ll set the appropriate args for the execution of Python — our script. Note that nothing is running at this point.

(gdb) set args add.py

Now, since we want to see how to pick apart the location of our Python code from the C level, we’ll set a breakpoint at PyEval_EvalCodeEx. This forces GDB to up and stop when it gets to our function.

(gdb) break PyEval_EvalCodeEx
Breakpoint 1 at 0x80e1f53: file ../../../Python/ceval.c, line 2767.
(gdb)

Note that if the correct source is available, this gets much easier as there is Python+GDB integration available via python-gdb.py. Now, we can run the executable:

(gdb) run
Starting program: /usr/bin/python add.py

Breakpoint 1, PyEval_EvalCodeEx (co=0xf7de7338, globals=0xf7df313c, 
   locals=0xf7df313c, args=0x0, argcount=0, kws=0x0, kwcount=0, 
   defs=0x0, defcount=0, closure=0x0)
at ../../../Python/ceval.c:2767
2767 ../../../Python/ceval.c: No such file or directory.

Understanding the Object Representation

From here, we can examine the code in question. First, let’s print the value of the first argument to PyEval_EvalCodeEx. From our prototype above, we know this is a code object:

(gdb) p *co
$1 = {ob_refcnt = 1, ob_type = 0x81a1e60, co_argcount = 0, 
  co_nlocals = 0, co_stacksize = 1, co_flags = 64, 
  co_code = 0xf7dedd40, co_consts = 0xf7dedd0c, 
  co_names = 0xf7dc102c, co_varnames = 0xf7dc102c, 
  co_freevars = 0xf7dc102c, co_cellvars = 0xf7dc102c, 
  co_filename = 0xf7de4200, co_name = 0xf7dedd60, 
  co_firstlineno = 1, co_lnotab = 0xf7dc10b0, co_zombieframe = 0x0}
(gdb)

Here, we see the ob_refcnt and the ob_type. If we cast this to a PyObject *, you’ll see that it only prints that information.

(gdb) p *(PyObject *)co
$4 = {ob_refcnt = 1, ob_type = 0x81a1e60}
(gdb)

Ok, let’s step ahead until we see something interesting. We’ll “GDB continue” until we have an args=<value> which is not 0x0, or NULL.  We’ll look at the following frame:

Breakpoint 1, PyEval_EvalCodeEx (co=0xf7d8cc80, globals=0xf7d8a35c, locals=0x0, 
  args=0x81bfe7c, argcount=0, kws=0x81bfe7c, kwcount=0, defs=0x0, 
   defcount=0, closure=0x0)
at ../../../Python/ceval.c:2767
2767 in ../../../Python/ceval.c
(gdb) info frame
Stack level 0, frame at 0xfffec7a0:
eip = 0x80e1f53 in PyEval_EvalCodeEx (../../../Python/ceval.c:2767); 
  saved eip 0x80e0cd2
called by frame at 0xfffec890
source language c.
Arglist at 0xfffec798, args: co=0xf7d8cc80, globals=0xf7d8a35c, 
  locals=0x0, args=0x81bfe7c, 
  argcount=0, kws=0x81bfe7c, kwcount=0, defs=0x0, defcount=0, closure=0x0
Locals at 0xfffec798, Previous frame's sp is 0xfffec7a0
Saved registers:
ebx at 0xfffec78c, ebp at 0xfffec798, esi at 0xfffec790, 
  edi at 0xfffec794, eip at 0xfffec79c
(gdb)

First, let’s have a look at the co value again:

(gdb) p *co
$10 = {ob_refcnt = 2, ob_type = 0x81a1e60, co_argcount = 0, 
       co_nlocals = 0, co_stacksize = 1, 
       co_flags = 99, co_code = 0xf7d8e688, 
       co_consts = 0xf7d8ddac, co_names = 0xf7dc102c,
       co_varnames = 0xf7dc102c, co_freevars = 0xf7dc102c, 
       co_cellvars = 0xf7dc102c, 
       co_filename = 0xf7d8cc38, co_name = 0xf7d8ddc0, 
       co_firstlineno = 51, co_lnotab = 0xf7d8dde0,
       co_zombieframe = 0x0}

Building a Python Friendly Backtrace

Now we can deduce where exactly this code comes from. We can pull the line number, the function name, and the file!

(gdb) p co->co_firstlineno
$16 = 51
(gdb) x/s ((PyStringObject)*co->co_name)->ob_sval
0xf7d8ddd4: "_g"
(gdb) x/s ((PyStringObject)*co->co_filename)->ob_sval
0xf7d8cc4c: "/usr/lib/python2.6/types.py"
(gdb)

So, types.py, line 51, function _g. Let’s take a look:

jeff@martian:~$ head /usr/lib/python2.6/types.py -n 51 | tail -n 1
def _g():

Excellent. This is where our Python function lives! There’s no point in going into it; however, this gives us a starting point to determine where a problem lives.
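For comparison, here is what the same lookup looks like when the interpreter is healthy enough to ask directly. This is a sketch of mine, assuming CPython (sys._getframe again); the helper names are hypothetical. It walks the frame chain and pulls the three fields we just dug out with GDB.

```python
import sys

def python_backtrace():
    """Return (filename, first line, function name) for each active frame."""
    entries = []
    frame = sys._getframe(1)  # skip python_backtrace itself
    while frame is not None:
        code = frame.f_code
        entries.append((code.co_filename, code.co_firstlineno, code.co_name))
        frame = frame.f_back
    return entries

def inner():
    return python_backtrace()

def outer():
    return inner()

trace = outer()
```

The innermost caller comes first, so trace starts with inner, then outer, then whatever called outer.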

Looking up Argument Types and Values

Furthermore, we can pull out information about the arguments passed as well.  Let’s go back and determine what the type is. Remember our ‘info frame’ gave us an args parameter?

(gdb) p *args
$21 = (PyObject *) 0x0

Drat! Null. This function takes no arguments.  Let’s jump down a few more frames until we find a function that includes an argument.

Breakpoint 1, PyEval_EvalCodeEx (co=0xf7d9f8d8, globals=0xf7d8a9bc, locals=0x0, 
  args=0xf7d9e1c8, argcount=4, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0)
at ../../../Python/ceval.c:2767
2767 in ../../../Python/ceval.c
(gdb) info frame
Stack level 0, frame at 0xfffefa50:
eip = 0x80e1f53 in PyEval_EvalCodeEx (../../../Python/ceval.c:2767); 
  saved eip 0x813e70e
called by frame at 0xfffefac0
source language c.
Arglist at 0xfffefa48, args: co=0xf7d9f8d8, globals=0xf7d8a9bc, locals=0x0, 
  args=0xf7d9e1c8, argcount=4, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0
Locals at 0xfffefa48, Previous frame's sp is 0xfffefa50
Saved registers:
ebx at 0xfffefa3c, ebp at 0xfffefa48, esi at 0xfffefa40, edi at 0xfffefa44, 
  eip at 0xfffefa4c
(gdb)

Here we go. Now, using the above “trick”, we learn that this is line 78 in method __new__ in abc.py:

(gdb) p co->co_firstlineno
$24 = 78
(gdb) x/s ((PyStringObject)*co->co_name)->ob_sval
0xf7dc4694: "__new__"
(gdb) x/s ((PyStringObject)*co->co_filename)->ob_sval
0xf7d9f8a4: "/usr/lib/python2.6/abc.py"
(gdb)

Perfect. Now, since __new__ is (sometimes) indicative of a metaclass — and we’re looking at code from the Abstract Base Class module which I happen to know goes metaclass crazy — we should have a class, a name, a bases tuple, and an object dictionary. Let’s look at the object types:

(gdb) x/s args[0]->ob_type.tp_name
0x81590e5 <.LC33+5012>: "type"
(gdb) x/s args[1]->ob_type.tp_name
0x8158d74 <.LC33+4131>: "str"
(gdb) x/s args[2]->ob_type.tp_name
0x8158f43 <.LC33+4594>: "tuple"
(gdb) x/s args[3]->ob_type.tp_name
0x8156ea5 <.LC16+1319>: "dict"
(gdb)

Perfect! We’ve found the location of the code executing and the types of arguments that it takes.  What if we wanted to see, for example, the actual name passed in instead of the “str” type? Simple. We just repeat what we’ve already learned:

(gdb) x/s (*(PyStringObject *)args[1]).ob_sval
0xf7d96054: "Hashable"
(gdb) p (*(PyStringObject *)args[1]).ob_refcnt
$38 = 8
(gdb)

Now we know, without looking at a line of Python, that this is the __new__ method of the metaclass for the Hashable ABC and the name of the class has a reference count of 8.
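To convince yourself that a metaclass __new__ really receives those four types, here is a Python 3 sketch. RecordingMeta is a stand-in I made up, not the actual metaclass from abc.py, and the Hashable class here is an empty placeholder that only borrows the name.

```python
seen = []

class RecordingMeta(type):
    def __new__(mcls, name, bases, namespace):
        # Record the type name of each argument, much like the
        # args[n]->ob_type.tp_name lookups in the GDB session.
        seen.append([type(arg).__name__ for arg in (mcls, name, bases, namespace)])
        return super().__new__(mcls, name, bases, namespace)

class Hashable(metaclass=RecordingMeta):
    pass
```

Defining the class is enough to trigger __new__, so seen is populated as soon as the class statement finishes.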

Accessing Dictionaries

Finally, what about something more detailed? Let’s look at the dictionary passed here.

(gdb) p *((PyDictObject*)args[3])
$51 = {ob_refcnt = 3, ob_type = 0x81854a0, ma_fill = 4, ma_used = 4, ma_mask = 7, 
  ma_table = 0xf7d8aa60, ma_lookup = 0x808c70c , ma_smalltable = {
   {me_hash = 435549560, me_key = 0xf7dc44e0, me_value = 0xf7d9b4fc}, 
   {me_hash = 0, me_key = 0x0, me_value = 0x0}, 
   {me_hash = 1333480578, me_key = 0xf7dc2a20, me_value = 0xf7d9d5a0}, 
   {me_hash = -1120181165,me_key = 0xf7dc2688, me_value = 0xf7dc132c}, 
   {me_hash = 1733367940, me_key = 0xf7d942f0, me_value = 0x81c3e64}, 
   {me_hash = 0, me_key = 0x0, me_value = 0x0}, 
   {me_hash = 0, me_key = 0x0, me_value = 0x0}, 
   {me_hash = 0, me_key = 0x0, me_value = 0x0}}}
(gdb)

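What’s all of this me business? Each slot in the hash table is an entry structure whose fields carry an me_ prefix: the cached hash, the key, and the value. Let’s look at one of the items in the hash table representing the dictionary.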

(gdb) p *((PyTypeObject*)((PyDictObject*)args[3])->ma_smalltable[2].me_key.ob_type)
$64 = {ob_refcnt = 71, ob_type = 0x818a940, ob_size = 0, tp_name = 0x8158d74 "str",  
  tp_basicsize = 24, tp_itemsize = 1, tp_dealloc = 0x809d982 ,
  tp_print = 0x809d74c , tp_getattr = 0, tp_setattr = 0, tp_compare = 0, 
  tp_repr = 0x809ec77 , tp_as_number = 0x8187fe0, tp_as_sequence = 0x8188080,
  tp_as_mapping = 0x81880a8, tp_hash = 0x809c5a9 , tp_call = 0, tp_str = 0x809e602 , 
  tp_getattro = 0x8091091 ,
  tp_setattro = 0x8090e1a , tp_as_buffer = 0x81880b4, tp_flags = 136713723,
  tp_doc = 0x81880e0 
   "str(object) -> string\n\nReturn a nice string representation of the object.\n
        If the argument is a string, the return value is the same object.", 
  tp_traverse = 0,
  tp_clear = 0, tp_richcompare = 0x809cd84 , tp_weaklistoffset = 0, tp_iter = 0, 
  tp_iternext = 0, tp_methods = 0x8188180, tp_members = 0x0, tp_getset = 0x0,
  tp_base = 0x8187c00, tp_dict = 0xf7dc34f4, tp_descr_get = 0, tp_descr_set = 0, 
  tp_dictoffset = 0, tp_init = 0x80ac582 , tp_alloc = 0x80ad345 ,
  tp_new = 0x80a2fcc , tp_free = 0x8094510 , tp_is_gc = 0, tp_bases = 0xf7dc4f0c, 
  tp_mro = 0xf7dc7fa4, tp_cache = 0x0, tp_subclasses = 0x0,
  tp_weaklist = 0xf7dc7fcc, tp_del = 0, tp_version_tag = 0}
(gdb)

Excellent. The key type is a string. What’s the value?

(gdb) x/s ((PyStringObject *)((PyTypeObject*)((
    PyDictObject*)args[3])->ma_smalltable[2].me_value)).ob_sval
0xf7d9d5b4: "_abcoll"
(gdb)

The value of this entry is the string “_abcoll.” Note that the key type doesn’t reference the value type. I left out the step in which I looked up the value’s type.
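One more thing worth pulling out of that dictionary dump: why the four entries sit in slots 0, 2, 3, and 4. The first slot a lookup tries is the hash masked with ma_mask. The sketch below uses the me_hash values copied from the dump; the function name is mine, and it ignores the extra probing that happens on a collision (none were needed here).

```python
# Hash values copied from the ma_smalltable dump above.
ENTRY_HASHES = [435549560, 1333480578, -1120181165, 1733367940]
MA_MASK = 7  # table size (8) minus one

def first_probe(hash_value, mask):
    """Slot the lookup tries first: the low bits of the hash."""
    return hash_value & mask

slots = [first_probe(h, MA_MASK) for h in ENTRY_HASHES]
```

The results match the positions of the non-empty entries in the dump, which is a handy sanity check when you are staring at a table in a core file.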

Closing Notes

The most important step in understanding how to do this is having Python source available. You’re debugging a C program here; you want to access structure members and fields.  Given the above knowledge, you should be able to walk through and display information about almost any Python object in memory. A big help.

What about the shared libraries?

If you’re referencing shared object files that aren’t in standard library paths, you can add them to your GDB shared object search path from your local directory as follows:

find "$PWD" -name '*.so' -exec dirname {} \; | sort -u | tr '\n' ':'

And then…

(gdb) set solib-search-path <the above output>

As always, you should ensure these are the same versions that you’re running or that may be referenced in a core.

What if I Have a Core File?

You’ll use it like you would with any other debug session:

gdb -c <core> /usr/bin/python

All of the standard commands should work at that point: up, down, select, frame, etc…

How do I Get a Core File?

You can force a binary to drop a core by ensuring that the ulimit is set appropriately via ulimit -Sc unlimited. If your core files aren’t where you expect, see man core.
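If you would rather flip that switch from inside the Python process, the standard resource module (Unix only) exposes the same limit. This is a sketch under the assumption that raising the soft limit to the hard limit is enough on your system; if the hard limit itself is zero you will still need the shell or an administrator. The function name is made up.

```python
import resource

def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit for this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)

new_soft, new_hard = allow_core_dumps()
```

Child processes inherit the limit, so calling this early in a launcher script covers anything it spawns.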

Posted on April 19, 2012 at 6:39 pm by Jeff McNeil · Permalink · 2 Comments
In: development, linux, open source, python

How do those nesty-scopy things work for ya?

Over the past few weeks, I’ve been asked about closures.  What are they? How do they work? How do I create one?  Like any good nerd, I’ve decided that that makes it worthy of a blog post.

So, here we go. Let’s make up a definition for a closure.  For our purposes, a closure is a function that references cell variables in an enclosing namespace via free variables. Sounds pretty simple. Now, let’s spend the next few paragraphs stepping through that definition in detail.

As an aside, these examples are all in Python 3. The majority still applies to the 2.x tree, though some of the method attributes are different. For example, __code__ in the Python 3 world can be accessed as func_code in Python 2.

A closure is a function…

Let’s look at a simple closure.  We’ll use variations of this same snippet throughout as our examples.

def make_adder(c):
  def adder(v):
    return v + c
  return adder
func = make_adder(4)
print(func(4))

Here we have a few interesting things going on.  First off, the function make_adder defines a namespace.  In that namespace, we have the variable “c.” Note that this could just as easily have been a variable defined in the function body, rather than having been passed as a formal parameter.

Next, we define the adder function. This is the closure itself. Note that adder references  the variable c that exists in the enclosing scope. This is the magic. That’s it. Really. Running this example results in the value “8.”
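The cell and free variable terminology from the definition is not just jargon; the interpreter records it on the function and code objects. A short sketch using the same make_adder (the three variable names at the bottom are mine):

```python
def make_adder(c):
    def adder(v):
        return v + c
    return adder

func = make_adder(4)

cell_names = make_adder.__code__.co_cellvars   # names the outer function shares
free_names = func.__code__.co_freevars         # names the inner function borrows
captured = func.__closure__[0].cell_contents   # the value held in the cell
```

So c is a cell variable from make_adder’s point of view, a free variable from adder’s, and the cell object in __closure__ is the thing tying the two together.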

There are two additional things I want to point out here. First, you can’t del a variable that’s been defined in an enclosing scope. The del makes the name local to the inner function, so doing so is a means to sudden death.

def make_adder(c):
  def adder(v):
    del c # Boom!
    return v + c
  return adder

func = make_adder(4)
print(func(4))

Running this little gem triggers an UnboundLocalError at the above marked line.

$ ./cpython t.py
Traceback (most recent call last):
  File "t.py", line 8, in <module>
    print(func(4))
  File "t.py", line 3, in adder
    del c
UnboundLocalError: local variable 'c' referenced before assignment
[44434 refs]
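If you're curious why this is an UnboundLocalError rather than something closure related, the code object tells the story. The following is a hedged sketch using the same make_adder as above: it checks how the compiler classified “c” and then catches the error instead of letting it kill the script. The exact wording of the error message differs between Python releases, so it is printed rather than assumed.

```python
def make_adder(c):
    def adder(v):
        del c  # binding operation: c is now a local of adder
        return v + c
    return adder

func = make_adder(4)

# c shows up as a local variable of adder, not as a free variable.
print(func.__code__.co_varnames)  # ('v', 'c')
print(func.__code__.co_freevars)  # ()

try:
    func(4)
except UnboundLocalError as exc:
    print("UnboundLocalError:", exc)
```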

Next, assigning to a variable from an external scope rebinds the variable in your local scope, effectively “hiding” the data in the enclosing scope.

def make_adder(c):
  def adder(v):
    c = 100 # See? The original 4 is hidden!
    return v + c
  return adder

func = make_adder(4)
print(func(4))

If we run this example, the output becomes “104.” Go ahead and run it. I’ll wait.  Back? Good. Remember that the value of 100 only holds true for “c” in the body of adder. In make_adder, the value stays as it was passed.
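To convince yourself that make_adder's own “c” really is untouched, you can hand back a second inner function that only reads it. This is just a sketch; outer_c is a made-up helper that doesn't appear in the original example.

```python
def make_adder(c):
    def adder(v):
        c = 100  # a brand new local name; make_adder's c is untouched
        return v + c

    def outer_c():
        # Hypothetical helper for this sketch: reads make_adder's c.
        return c

    return adder, outer_c

adder, outer_c = make_adder(4)
print(adder(4))   # 104
print(outer_c())  # 4
```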

Honestly, I’m leaving one little thing out. The behavior detailed above is the same as we’d see if we had a global variable and shadowed it in a function. If we simply read from that variable, we get the global value. If we assign to it? We bind a new name in our local namespace and no longer reference the global. If we wish to rebind a global variable inside of a function, we need to declare it using the global keyword in that function. Consider the following example:

global_thing = True

def func_one():
  global_thing = False

def func_two():
  global global_thing
  global_thing = False

# Print default value.
print(global_thing)
# Change it, but don't declare it as global.
func_one()

# See the global variable didn't change!
print(global_thing)
# Run again, this time declare it as global in the function.
func_two()

# Note it *has* changed. Told ya so.
print(global_thing)

The nonlocal keyword gives us this same functionality when referencing a variable in an enclosing function scope, rather than a global variable. The usage can be slightly tricky, so the order of execution is called out below:

def make_adder(c):
  print("Our C is {0}".format(c))
  def adder(v):
    nonlocal c
    c = 500
    return v + c

  adder(0) # We *need* to call this. Simply defining it doesn't execute it!
  print("Our C is now {0}".format(c))
  return adder

func = make_adder(4)
print(func(4))

Now, running the above code provides the following output:

Our C is 4
Our C is now 500
504
[44428 refs]

Make sense? Take close note of the execution here.

  1. We call make_adder, which has an initial value of 4 for “c.”
  2. Next we define the inner function. Note that this function does not execute at this point. We’re simply defining a function object and binding it to the name adder.
  3. Next we call the above function. This function, due to the use of the nonlocal statement, rebinds “c” to 500 in the enclosing scope.
  4. Control returns to our outer function, which prints the modified value.
  5. Finally, the call to func(4) at the bottom runs adder once more and returns 4 + 500, which is the 504 in the output.

That covers most points. For more information, check out PEP 227: Statically Nested Scopes, and PEP 3104, which introduced the nonlocal statement.
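The adder example is a bit artificial, so here is a more typical use of nonlocal: a counter that keeps its state in the enclosing scope. Treat it as a sketch. The names make_counter, increment, counter and other are invented for illustration.

```python
def make_counter(start=0):
    # make_counter and increment are hypothetical names for this sketch.
    count = start

    def increment():
        nonlocal count  # rebind count in make_counter's scope
        count += 1
        return count

    return increment

counter = make_counter()
print(counter())  # 1
print(counter())  # 2

other = make_counter(10)
print(other())    # 11
print(counter())  # 3 (the two counters do not share state)
```

Without the nonlocal line, count += 1 would make count a local of increment and the first call would fail with an UnboundLocalError, just like the del example earlier.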

…that references cell variables in an enclosing namespace via free variables.

Now that we’ve made sense of closures and looked at them from a Python point of view, let’s dive in a bit deeper and examine the implementation.  We’ll use the following snippet of Python code for the following few examples.

def make_adder(c):
  bias = 2
  def f(): pass
  def adder(v):
    c = 500
    return v + c + bias
  return adder

# Func is the INNER function, as returned by make_adder.
func = make_adder(4)

# Calling the inner function.
print(func(4))

Now, it’s important to understand the flow here. When make_adder is called, it is executed like any other function. Part of that means defining, but not calling, the inner adder function. If we disassemble make_adder, we can see this in action:

2 0 LOAD_CONST     1   (2)
  3 STORE_DEREF    0   (bias)
3 6 LOAD_CONST     2   (<code object f at 0x7f993f6e8700, file "t.py", line 3>)
  9 LOAD_CONST     3   ('make_adder.<locals>.f')
  12 MAKE_FUNCTION 0
  15 STORE_FAST    1   (f)
4 18 LOAD_CLOSURE  0   (bias)
  21 BUILD_TUPLE   1
  24 LOAD_CONST    4   (<code object adder at 0x7f993f6e8ac0, file "t.py", line 4>)
  27 LOAD_CONST    5   ('make_adder.<locals>.adder')
  30 MAKE_CLOSURE  0
  33 STORE_FAST    2   (adder)
7 36 LOAD_FAST     2   (adder)
  39 RETURN_VALUE

First off, I do apologize for any formatting irritation with the above. I seem to have gotten into a fight with WordPress (and it won).

Now, have a look at MAKE_FUNCTION through STORE_FAST. That explains why we added the ‘def f(): pass‘ to the original example.  Here, we’re doing things the old fashioned way. We create a function and bind it to ‘f’ in the make_adder namespace.  As an aside, the def keyword is one of the binding operations. The others being ‘=‘, ‘class‘, ‘as‘, ‘except‘, and ‘import.’ This is why we see the STORE_FAST call at the completion of each function build.

If we now look at LOAD_CLOSURE, we’ll see that things are slightly different.  Let’s walk through the remaining steps here to build a solid understanding of what’s going on.

  1. LOAD_CLOSURE pushes a reference to the cell that holds the variable used by the closure function (here, bias) onto the virtual machine stack, incrementing the cell’s reference count, and steps to the next bytecode instruction.
  2. BUILD_TUPLE is called and builds a tuple out of those cells, one for each variable that the closure references in the enclosing scope.
  3. LOAD_CONST is called twice and puts both the code object for the closure function and the qualified name on the stack as well. Note that the __qualname__ attribute is new as of Python 3.3. See PEP 3155 for more information.
  4. Now we call MAKE_CLOSURE.

The MAKE_CLOSURE operation is actually handled by the same C code as MAKE_FUNCTION (Python/ceval.c:2726). Most mechanics are the same, considering that keyword arguments need to be wired up and a function object needs to be allocated.  Though, if we’re building a closure, then PyFunction_SetClosure is called (Objects/funcobject.c:1056). This sets the func_closure field of the PyFunctionObject defining the closure to the tuple created by step #2 above. Plumbing complete.

If you’re interested in how those bytecode instructions get generated during the compilation phase, see Python/compile.c:1380.
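If you want to produce a listing like the one earlier in this post on your own interpreter, the dis module in the standard library does the work. A minimal sketch follows. Keep in mind that opcode names and offsets change between CPython releases (newer releases no longer have a separate MAKE_CLOSURE opcode, for instance), so don't expect your output to match mine line for line.

```python
import dis

def make_adder(c):
    bias = 2
    def f(): pass
    def adder(v):
        c = 500
        return v + c + bias
    return adder

# Disassemble the outer function.
dis.dis(make_adder)

# Disassemble the inner function (the closure) as well.
dis.dis(make_adder(4))

# The same instructions are available programmatically.
opnames = [ins.opname for ins in dis.get_instructions(make_adder)]
print(opnames)
```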

Now that we’ve made sense of that, let’s jump back to the Python level and see what’s going on there.  Remember ‘func’ is the closure and ‘make_adder‘ is the outer function (the nested scopy thingy).

If we look at func.__closure__ and func.__code__.co_freevars on the closure function, we see the following:

(<cell at 0x7f7ddf562060: int object at 0x91cad0>,)
('bias',)
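Those two values can be reproduced, and poked at a little further, with the sketch below. It uses the same make_adder as above. The addresses inside the cell repr will differ on your machine, so the interesting part is the cell_contents attribute, which exposes what each cell currently holds.

```python
def make_adder(c):
    bias = 2
    def f(): pass
    def adder(v):
        c = 500
        return v + c + bias
    return adder

func = make_adder(4)

print(func.__closure__)           # a one-element tuple of cell objects
print(func.__code__.co_freevars)  # ('bias',)

# Each cell lines up with the free variable name at the same index.
for name, cell in zip(func.__code__.co_freevars, func.__closure__):
    print(name, "=", cell.cell_contents)  # bias = 2
```

Note that “c” is missing from both. Because adder assigns to it, “c” is an ordinary local of adder and never goes through a cell.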

To understand these, we need to take a look at the disassembled bytecode of the inner function (our closure):

5 0 LOAD_CONST    1  (500)
  3 STORE_FAST    1  (c)
6 6 LOAD_FAST     0  (v)
  9 LOAD_FAST     1  (c)
  12 BINARY_ADD
  13 LOAD_DEREF   0  (bias)
  16 BINARY_ADD
  17 RETURN_VALUE

Skip on down to LOAD_DEREF. When this instruction is encountered, the interpreter runs the following C code:

  x = freevars[oparg];
  w = PyCell_Get(x);
  if (w != NULL) {
     PUSH(w);
     DISPATCH();
   }

This indexes the zeroth element of the freevars array, which is the “bias” cell object (as stored earlier). The call to PyCell_Get(x) translates roughly to PyCell_Get(PyObject(“bias”)). The value returned is the actual PyObject that the associated cell variable contains (well, a reference to it). The resulting dereferenced value is placed on the virtual machine stack and BINARY_ADD fires as normal. The term LOAD_DEREF ought to be clear now.
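One consequence of going through a cell on every LOAD_DEREF is that a closure sees whatever the cell holds at call time, not a snapshot taken when the function was defined. Here's a small sketch of that behavior; make_reader and reader are made-up names for illustration.

```python
def make_reader():
    # make_reader and reader are hypothetical names for this sketch.
    value = "first"

    def reader():
        return value  # dereferences the shared cell when called

    value = "second"  # rebinding after reader has been defined
    return reader

reader = make_reader()
print(reader())                             # second
print(reader.__closure__[0].cell_contents)  # second
```

This is also why closures created in a loop all tend to see the loop variable's final value.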

We have one more object to look at. Here, we see the contents of make_adder.__code__.co_cellvars. Ready?

('bias',)

Yup, that is it. That comes from the creation of the outer function’s code object, in Python/compile.c:4112.
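To see how the two sides pair up, compare the outer function's co_cellvars with the inner function's co_freevars. In this sketch I've dropped the c = 500 assignment from adder (my change, not part of the original snippet) so that both “c” and “bias” get captured. I compare the names as sets because I'd rather not make a claim about their ordering.

```python
def make_adder(c):
    bias = 2
    def adder(v):
        # No assignment to c here, so both c and bias are free variables.
        return v + c + bias
    return adder

func = make_adder(4)

# Names the outer function stores in cells for its inner functions.
print(make_adder.__code__.co_cellvars)

# Names the inner function expects to receive as cells.
print(func.__code__.co_freevars)

print(set(make_adder.__code__.co_cellvars) == set(func.__code__.co_freevars))  # True
```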

So, hopefully, the definition given initially makes sense. A closure is a function that accesses cell vars of an enclosing scope via local free vars. I’d appreciate any comments, especially corrections. I dug into most of this as a means to solidify my own knowledge. If I’ve got something wrong, pointing it out makes you a good person!

Posted on April 2, 2012 at 1:22 am by Jeff McNeil
In: development, python

Post-PyCon Depression

I was able to attend my third PyCon this weekend in Santa Clara, CA.

Usually, I step through each talk that I’ve attended and document my thoughts.  A large review post, really. High points and low points. I think I’m going to skip that this time around and concentrate on what has really become a bit of a reoccurring problem for me (no, not a real problem, don’t worry). 

Each year I leave PyCon feeling pretty involved in the community.  This usually means studying CPython code for a couple of weeks afterwards, fixing a small collection of bugs found on the issue tracker, and looking for a larger project.   I’ve even contemplated sending in a contributor’s agreement such that I can submit changes back to the little interpreter that has been paying my bills for the past seven years.

Last year, for example, I implemented a “go” keyword — down to the byte code level — just to learn how things are wired up. It was quite a rewarding experience as I got to touch areas of the codebase that usually simply annoy me with cryptic errors.

But, time and time again, I wind up burning out after not finding anything substantial to contribute back. There does seem to be quite a push to get new “core developers”, though I’m not sure where to jump on in.  For example, the following resources are wonderful:

I’m fairly determined to avoid that crash this time through by simply asking. Python world, who wants help? Is there still a need for new people on CPython proper? Or, have we largely hit critical mass there? Would time be better spent on a third party component? Standard library improvements? Is it worth mailing in that contributor agreement in hopes of finding something worthwhile to chip away at?

Posted on March 12, 2012 at 3:35 pm by Jeff McNeil
In: development, open source, pycon, python

Learning Go (and reflect-ing on what’s possible, too)

It’s been a few days since my last Go post; my apologies. I know everyone out there was feverishly clicking refresh. Relax my babies, there’s more content on the way. This time, we’re going to look into the reflect package, which provides the introspection support found in Go.

If you’ll recall, before my last post, my intention was to cover the Go RPC functionality.  I ran into a problem in that the gob package was used internally and I did not yet understand it.  Well, as it turns out, the reflect package is also used. Values are inspected and translated into corresponding gob data. So this time, we’ll look at reflection within Go.  This is an interesting topic as it not only prepares us for an upcoming RPC post, but it also really solidifies understanding of the Go type system.

Also, realize that I’m not an expert on Go. If you happen to be one with your Go-Fujitsu and you spot something incorrect (or even perhaps something not idiomatic), please leave a comment. That will help make me an expert!

The Inspected Code

In this example, we’ll dial back down to one source module; it’s just easier that way. We’ll be using the little gogogo command that we created in the first Go post. The first thing we’ll do is create a small collection of types and methods such that we can see what each component looks like when introspected. Here’s the source that we’ll actually be exploring at run time.

package main

import "reflect"
import "log"

type MailMessage struct {
  subject string
  from string
  to string
  body string
  high_priority bool
}

/* Create a new mail message. */
func NewMailMessage(subject string, from string, to string) (*MailMessage) {
  return &MailMessage{subject, from, to, "", false}
}

/* Set the message body. */
func (m *MailMessage)SetBody(body string) {
  m.body = body
}

/* Set message priority. */
func (m *MailMessage)SetPriority(high bool) {
  m.high_priority = high
}

/* Send the message */
func (m MailMessage) SendMessage() (bool) {
  log.Printf("Message to %s has been sent\n", m.to)
  return true
}

/* Typedef 'int' to a new MessagesSent type. */
type MessagesSent int

Note that at the top of this listing we also declare our package name and include the necessary import statements. If you’re following along and wish to compile, then you’ll want to comment out the line importing reflect as we’re not using it yet, and Go refuses to compile a file with an unused import. As an aside, I’m not sure I like that feature of Go just yet.

A few notes on this example here.

  1. We define a new type, MailMessage, which is based on a structure. The new type contains a series of fields.
  2. We define three methods and one function.  Two of the methods take a pointer to a MailMessage structure, whereas the third simply takes a MailMessage structure itself.
  3. We’re not doing anything. Nothing. This is a highly contrived example meant to highlight reflection. We’ll tackle the SMTP package down the road a little bit, you know, once we finally get that RPC stuff out of the way.

Great. Now that we have enough code that does nothing whatsoever, we’ll write some code that does something (with nothing). First, a bit of a detour.

The Type Switch

First, we’ll look at what Go deems the “Type Switch.” This gives us a mechanism, via a special syntax switch statement, to choose a code branch based on the underlying type of the object. Consider the following listing.

package main

import "log"

func main() {
  x := 1

  switch x_type := x.(type) {
    case int:
      log.Println("It Was An Integer")
  }
}

This looks simple enough. Given x, we only want to print something if it is indeed an integer type. That special dot-type syntax you see above is what qualifies this as a type switch.  However, take a look at what happens when we try to compile this code.

$ 6g type_switch.go
type_switch.go:8: cannot type switch on non-interface value x (type int)
$

That’s slightly confusing. It turns out that, in this case, x is being used as an object with a concrete type of int. That is, it’s not being referenced via an interface. If we update the code, we can get the desired result.

package main

import "log"

func TypeSwitcher(v interface{}) {
  switch v_type := v.(type) {
    case int:
      log.Println("It Was An Integer")
  }
}

func main() {
  TypeSwitcher(1)
}

Now, before we run this, note the interface{} type. That’s an empty interface. An empty interface can stand in for any value, because every type satisfies an interface with no methods. Let’s build this again and give it a run; it should now print out what we’re expecting.

$ gogogo type_switch.go
2011/02/23 22:54:02 It Was An Integer

So, now you understand what a type switch is and how decisions based on type can be made in Go. However, what if we’re not checking against an interface? Or, rather, what if we want to know a little bit more about an object? Perhaps, say, how many fields a structure has? That’s where reflection comes in.

The Reflecting Code

So, like we’ve done in a few other examples, we’ll just dump the entire listing here and then walk through it afterwards.  There are quite a few new elements here. Hang in there, we’ll explain them all.

/* Pull info we care about from reflect.Type */
func GetTypeInfo(t reflect.Type) (string, reflect.Kind, int) {
  return t.Name(), t.Kind(), t.NumMethod()
}

/* Print information about a reflect.Value. */
func ReflectOnValue(v reflect.Value) {
  /* Nothing to do on a nil. */
  if v == nil {
    log.Println("nil value, nothing to do")
    return
  }

  /* Pull the information we require about the Type. */
  name, kind, methods := GetTypeInfo(v.Type())

  /* Now type switch and extract the Value Info. */
  switch v := v.(type) {

    case *reflect.StructValue:
      log.Printf("Discovered %s/%s that has %d fields and %d methods.\n",
        kind, name, v.NumField(), methods)

    case *reflect.PtrValue:
      log.Printf("Discovered a pointer with %d methods, dereferencing...\n", methods)
      ReflectOnValue(v.Elem())

    default:
      log.Printf("Found %s/%s with %d methods.\n", kind, name, methods)
  }
}

/* Make anything an interface. */
func PrintValueData(i interface{}) {
  ReflectOnValue(reflect.NewValue(i))
}

Alright! This code is responsible for digging into the details of our objects and reporting what it finds.  Let’s walk through it (almost) line by line and examine exactly what’s going on.

The Driver

Our goal here is to see how different types and values are viewed by the Go system.  We create a series of variables in our main method and send them to our PrintValueData function.

func main() {
  /* Mail Message Objects */
  m_ptr := NewMailMessage("subject", "from", "to")
  m_std := MailMessage{"subject", "from", "to", "body", false}

  var counter MessagesSent

  /* Print out Information about each. */
  PrintValueData(0)
  PrintValueData(nil)
  PrintValueData(NewMailMessage)
  PrintValueData(m_std)
  PrintValueData(&m_ptr)
  PrintValueData(counter)
}

Running the Application

Now we have a complete application.  Let’s see what happens when we run it.

$ gogogo reflect.go
2011/02/24 00:30:45 Found int/int with 0 methods.
2011/02/24 00:30:45 nil value, nothing to do
2011/02/24 00:30:45 Found func/ with 0 methods.
2011/02/24 00:30:45 Discovered struct/MailMessage that has 5 fields and 1 methods.
2011/02/24 00:30:45 Discovered a pointer with 0 methods, dereferencing...
2011/02/24 00:30:45 Discovered a pointer with 3 methods, dereferencing...
2011/02/24 00:30:45 Discovered struct/MailMessage that has 5 fields and 1 methods.
2011/02/24 00:30:45 Found int/MessagesSent with 0 methods.
$

Wow, neat. This output is almost what we expected.  Let’s again go through this line-by-line and look at why we’ve got what we have. Don’t worry if method information seems a little odd to you, we’ll double back to that afterwards.

  1. Handled by our default case. This is simply an integer with zero methods.
  2. Handled by our nil test.
  3. Handled by our default handler. Go identifies it as a function (with no type name) that has zero of its own methods.
  4. Handled by our structure case. In our main function we pass m_std by value. Go tells us that this structure has five fields and one method associated with it.
  5. Go finds a pointer with zero methods attached to it, and dereferences.
  6. Go finds a pointer with three methods attached to it, and dereferences.
  7. Go uncovers what we pointed to by &m_ptr (the address of a pointer to a structure of type MailMessage). It correctly prints the same information as noted on line four.
  8. Go prints that we have a type MessagesSent with an underlying kind of int, with zero attached methods.

Ok, what’s wrong with this picture? Anyone paying attention? We defined two methods that take a pointer to a MailMessage and one method that takes a MailMessage by value. However, when we dereferenced down to a MailMessage structure, we only accounted for one method. At the same time, we have a pointer (m_ptr after dereferencing &m_ptr), that has three methods! Why?

So, in Go, the method set of a pointer type is the union of the methods declared on the pointer type and the methods declared on the pointed-to type. So, *MailMessage has methods SetBody, SetPriority, and SendMessage associated with it. Of those methods, only SendMessage takes a MailMessage by value, thus the MailMessage type itself only has one method associated with it.

Final Thoughts

So, at this point, we’ve covered a bit about reflection. The Go documentation on this is slightly sparse.  I pieced this entry together using both the Go reflect documentation, and the source code to the Go json module. So, if I’m off base on anything, please leave a comment and correct me.

Next, it is possible to dynamically dispatch methods using the Call method of a reflect.FuncValue object. This method takes a slice of reflect.Value and returns a slice of reflect.Value. Rigging this up would be an interesting experiment.

Lastly, it is not possible in Go to create new types at runtime.

 

Posted on February 24, 2011 at 12:53 am by Jeff McNeil
In: development, Go, open source

Learning Go (and serializing objects with it, too)

I had intended to make my next target the Go RPC services, simply because RPC is cool. Nothing makes me happier than pressing a button on machine A while watching the LED on machine B flash. Nothing.

As I got into the RPC code, I realized that underlying data serialization was handled by the gob module. The gob module appears to be analogous to Python’s pickle approach. Given that, it seems that the gob foundation ought to exist before I spend any amount of time on understanding the RPC mechanism. So, that’s what this post is about. We’ll put together a module that serializes structure data. Of course, we’ll also provide reader functionality in order to close the loop.

Since we’re getting a little fancier (two files this time), the little gogogo test script won’t work here. Instead, we’ll have to do it the old fashioned way.  I promise we’ll just add a Makefile if we add a third file. That’s probably worth a post itself.

The gob Package

The documentation for the package can be found on the Go Language site. Based on those pages, it’s a rather flexible system.  Instead of terminating or triggering an error condition in many cases, Go will make an effort to follow pointers and do other creative things such as leave out elements if they’re not present in the type we’re attempting to reconstruct.  This is probably something to take note of as it could be slightly surprising if it is not a behavior you’re ready for.

The documentation on the encoding format is quite complete and it covers things such as integer encoding, structure formatting, and the handling of signed values.

Manipulating Gob Data

Our example application isn’t terribly complex. It is composed of two files. The first file, goopy.go, contains all of the code necessary for handling the serialization. The second file, main.go, acts as the driver and provides an interface to the user. Here’s a quick rundown of the application flow.

  1. Startup and flag processing. Default to read.
  2. If reading, load the gob file from disk and display it to the screen. We take care to handle missing files gracefully.
  3. If writing, we encode the data in question and stream it to disk.
  4. Application terminates, ensuring all files are closed.

The Stuff at the Top of the goopy.go File

This isn’t overly interesting.  We just include package dependencies and define a structure type that we’ll use for serialization.

package goopy

import "gob"
import "os"

type PersonInfo struct {
  Name string
  Age int
}

Serialization

The following method handles the serialization of data to a local file. The function accepts a pointer to a PersonInfo as its receiver and a path for the target data file.  We’ll return something implementing the os.Error interface if we run into a problem. Otherwise, the return value is nil.

func (p *PersonInfo)SerializePersonInfoToGob(to_file string) (os.Error) {
  /* Open file and check for error state */
  file_handle, err := os.Open(to_file, os.O_WRONLY|os.O_CREAT, 0600)
  if err != nil {
    return err
  }

  /* Automatically close when we finish in this function, consider
   * Python's "with open(to_file) as file_handle". */
  defer file_handle.Close()

  /* Serialize data out. */
  gob.NewEncoder(file_handle).Encode(p)
  return nil
}

Let’s step through this. It’s not overly confusing, but there are a couple of concepts covered here.

  1. The first line opens the target file. Notice the similarities between the flags here and the flags available to the standard open(2) system call. This function returns two values. An *os.File, and a possible error condition.
  2. We check the error condition. If it’s non-nil, we just return it.
  3. Next, since the file was opened, we call ‘defer file_handle.Close().’ What’s this do? When a method exits, for any reason, anything that has been marked with the defer keyword will execute. This ensures that our file is closed.  If you’re familiar with Python, think “leaving a with statement.”
  4. Next, we create an anonymous encoder, handing it file_handle, and call the Encode method. Note that the NewEncoder function expects something implementing the io.Writer interface. That is, anything with a Write method. Calling Encode flushes the serialized data down.

That’s it. We then return nil to signify a successful encoding.

Deserialization

Deserialization is a lot like the serialization process.  Here’s the code that does that for us.

func NewPersonInfoFromGob(from_file string) (*PersonInfo, os.Error) {
  file_handle, err := os.Open(from_file, os.O_RDONLY, 0600)
  if err != nil {
    return nil, err
  }
  defer file_handle.Close()
  var person PersonInfo

  decoder := gob.NewDecoder(file_handle)
  err = decoder.Decode(&person)
  return &person, err
}

Everything should look quite familiar. Note that we now follow the “value, error” idiom here as well, as the returned elements are a pointer to a PersonInfo and something implementing os.Error. The other thing to notice is that we pass a pointer to our PersonInfo variable to the Decode method. Go passes arguments by value, so handing over a pointer is what lets the deserialization routine populate our structure.
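Since I keep leaning on Python comparisons (pickle for gob, the with statement for defer), here is roughly what the same round trip looks like on the Python side. This is only an analogy sketched with the standard library, not a translation of the Go code. The function names serialize_person and load_person, the file name, and the dictionary standing in for PersonInfo are all invented for illustration.

```python
import os
import pickle
import tempfile

def serialize_person(person, to_file):
    # Hypothetical helper. The with block closes the file on exit,
    # which is the role defer file_handle.Close() plays in the Go version.
    with open(to_file, "wb") as file_handle:
        pickle.dump(person, file_handle)

def load_person(from_file):
    # Hypothetical helper mirroring the deserialization routine.
    with open(from_file, "rb") as file_handle:
        return pickle.load(file_handle)

# Write to a scratch directory so nothing in the current directory is touched.
path = os.path.join(tempfile.mkdtemp(), "person.pickle")
serialize_person({"Name": "jeff", "Age": 31}, path)
print(load_person(path))  # {'Name': 'jeff', 'Age': 31}
```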

Creating New Objects

We add one little helper function to create new PersonInfo objects.

func NewPersonInfo(name string, age int) (*PersonInfo){
  return &PersonInfo{Name: name, Age: age}
}

The main.go File

Now, we put together our driver code that handles user options.  Let’s look at this as one big listing and step through the interesting points.

package main

import "log"
import "flag"
import "./goopy"

/* File we read and write from. This is required in any case. */
var gobfile *string = flag.String(
  "gobfile", "data.gob", "A Go Pickle of a Different Flavor.")

/* Read mode? We default to this in order to be non-destructive.
 * we'll pull Gob data out and just dump it to the screen.
 */
var writing *bool = flag.Bool(
  "writing", false, "If specified, we write the command line data.")

/* Required only in writing scenario */
var age *int = flag.Int("age", -1, "Age for person record.")
var name *string = flag.String("name", "", "Name for person record.")

func main() {
  defer func() {
    if err := recover(); err != nil {
       log.Println("Fatal Error Encountered: ", err)
    }
  }()

  flag.Parse()

  if *writing {
    person_info := goopy.NewPersonInfo(*name, *age)
    person_info.SerializePersonInfoToGob(*gobfile)
    log.Println("Serialization Complete")

  } else {

    person_info,err := goopy.NewPersonInfoFromGob(*gobfile)
    if err != nil {
      panic(err)
    }

    log.Printf("Read Complete (Dump): %+v", person_info)
  }
}

Most of this should look familiar if you’ve followed my other posts to date. There are a few newer concepts (to me, as well!) here, so again, let’s walk through.

  1. Required imports and package name. Notice the “./goopy” syntax used. This is because our goopy package is located in the current directory and not in a centralized library location.
  2. Next up, we set up four command line flags. The file to perform IO on, whether we want to write, and then the values for our PersonInfo structure.
  3. Now, look at the first couple of lines of our main function. Different, no? Here, we defer an anonymous function that calls recover(). If you look at the read branch of main, you’ll notice that we call panic when deserialization returns an error. Panic causes the stack to unwind and the application to exit. If, along the way, a recover call is executed, the error passed to panic is returned and control resumes at that function return. Note that since the stack is unwinding, the only valid place (I believe) to stick a recover call is in a deferred function. In this case, we just use it to print our error condition and exit.
  4. We then parse the flags as we’ve done before.
  5. If we’re writing, we call goopy.NewPersonInfo and then proceed to serialize that information.
  6. Otherwise, we default to reading our data out of file and displaying it to the screen.

That’s fundamentally it in terms of our application. Notice that we’re using the log.Printf and log.Println functions here. That’s nice as it causes each line we print to be prefixed with the date and time.

Compiling and Running

While this isn’t as straightforward as our previous tests were, it’s not difficult. First we’ll compile our application, link it, and display the command line options.

mcjeff@martian:~/my_go$ 6g goopy.go
mcjeff@martian:~/my_go$ 6g main.go
mcjeff@martian:~/my_go$ 6l -o gobber main.6
mcjeff@martian:~/my_go$ ./gobber --help
flag provided but not defined: -help
Usage of ./gobber:
  -writing=false: If specified, we write the command line data.
  -name="": Name for person record.
  -gobfile="data.gob": A Go Pickle of a Different Flavor.
  -age=-1: Age for person record.

First, if we run our application without a valid file, we’ll see our error handling in action.

mcjeff@martian:~/my_go$ ./gobber -gobfile=/no/such/file
2011/02/17 15:43:19 Fatal Error Encountered:  open /no/such/file: no such file or directory

Now, let’s run it again with the proper command line arguments needed to write new gob data out.

mcjeff@martian:~/my_go$ ./gobber -writing=true -name=jeff.gob -age=31 -gobfile=jeff.gob
2011/02/17 15:45:12 Serialization Complete
mcjeff@martian:~/my_go$ ls -l jeff.gob
-rw------- 1 mcjeff mcjeff 58 Feb 17 15:45 jeff.gob

Beautiful! It ran as it should, we created the file, and populated it with our command line data.  Finally, let’s run our new utility one more time in order to read the data from disk and ensure it prints the proper contents.

mcjeff@martian:~/my_go$ ./gobber -gobfile=jeff.gob
2011/02/17 15:46:51 Read Complete (Dump): &{Name:jeff.gob Age:31}

That’s all there is to it!

Posted on February 17, 2011 at 3:59 pm by Jeff McNeil
In: development, Go, linux, open source

Learning Go (and integrating with inotify, too)

Go provides a native interface into the Linux inotify system. What’s inotify?  According to the man page, the inotify API provides a mechanism for monitoring file system events.  It can be used to monitor individual files, or directories.  There are a series of system calls available providing access to inotify. Here’s a quick run down of what they are and what they do.

All in all, it’s a fairly straightforward API.  Watches are configured with bitmasks and events are reported by feeding event structures to the read file descriptor.  For more information, see the Wikipedia page.

Where is it used? Well,  consider desktop indexing applications. For a lower level reference, the udevd system monitors udev rules via inotify and automatically reloads them when changes are detected.

Our Example Application

We’re not going to write anything as fancy as udevd, or as useful as a desktop indexing daemon.  Instead, we’ll put together a simple Go application that uses inotify integration and a collection of channels to calculate the size of files written to a directory. Nothing fancy.

What we Stick at the Top of the File

First things first. We simply name our package and import others that we depend on.

package main

import (
  "flag"
  "fmt"
  "os"
  "os/inotify"
)

var watch_target *string = flag.String("watch", ".", "The directory to keep an eye on")

The only other thing going on here worthy of mention is the flag.String line. Since we want our tremendously useful application to be configurable, we’re using the flag module to set up a string flag. This line sets up command line argument processing. The watch_target variable will contain the argument value as set by ‘--watch’, or the default, once the flags have been parsed in main(). The flag package will also print the help strings for all defined arguments when given “--help”. Neat.

Our Changed File Structure

When we fire up our application, we want to read change events, pass them to a goroutine, and then simply forget about it.  When we pass the changes off, we need to tell our system which file changed as well as what to do with our result output.  We define the following structure.

type ChangedFile struct {
  output_channel chan string
  filename string
}

This simply contains two elements. An output channel that we’ll stream our results to and the filename of the changed entity.

Reporting Output

Next, we define a function that handles printing output to the screen. We run this in its own goroutine to keep concurrent routines from stepping on each other.

func notificationReporter(input chan string) {
  for {
    queue_data := <-input
    if len(queue_data) == 0 {
      return
    }
    fmt.Println("Change Received: ", queue_data)
  }
}

This is fairly straightforward. We pass in the channel that we’ll read our input from and enter an infinite loop that prints each message to standard out.  If we receive a zero-length string, we terminate the routine.  That isn’t really going to happen in our little example, though, as we’re not rigging up sentinel values. We just quit on a Control+C.
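For what it’s worth, one common alternative to an empty-string sentinel is to close the channel and let the reader range over it. The sketch below is written against current Go and is not what the example in this post does; reporter and runReporter are hypothetical names.

```go
package main

import "fmt"

// reporter drains input until the channel is closed, then hands back
// everything it saw through done so the caller knows it has finished.
func reporter(input <-chan string, done chan<- []string) {
	var seen []string
	for line := range input { // the loop ends once input is closed and drained
		fmt.Println("Change Received: ", line)
		seen = append(seen, line)
	}
	done <- seen
}

// runReporter is a hypothetical driver that feeds messages and shuts down cleanly.
func runReporter(messages []string) []string {
	input := make(chan string, 5)
	done := make(chan []string)
	go reporter(input, done)
	for _, m := range messages {
		input <- m
	}
	close(input) // takes the place of the empty-string sentinel
	return <-done
}

func main() {
	runReporter([]string{"./a is 0 bytes", "./f is 6 bytes"})
}
```

A side benefit is that an empty string becomes an ordinary message rather than a shutdown signal.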

The Change Processor

Next, we put together the method that handles processing of each individual change. It’s a method because it has a specific receiver; without one, we’d call it a function.  Here we stat the changed file, build a log string, and send that value to the channel attribute of the receiver.

func (c *ChangedFile)ProcessChange() {
  statbuf, _ := os.Stat(c.filename)
  c.output_channel <- fmt.Sprintf("%s is %d bytes", c.filename, statbuf.Size)
}

As you may have guessed by now, c.output_channel is the other end of the channel that we read from in the above notificationReporter function.  We’ll wire it all up in our main driver function.
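The listing above throws away the error from os.Stat, so a file that disappears before we stat it would blow up. Here is a rough sketch of the same idea with the error checked, written for a current Go release (where Size is a method call on the returned FileInfo). The changedFile type, describe, and writeTemp are stand-ins I made up for the demo.

```go
package main

import (
	"fmt"
	"os"
)

type changedFile struct {
	filename string
}

// describe is a method: the (c *changedFile) receiver ties it to the type.
func (c *changedFile) describe() (string, error) {
	info, err := os.Stat(c.filename)
	if err != nil {
		return "", err // for example, the file vanished before we got to it
	}
	return fmt.Sprintf("%s is %d bytes", c.filename, info.Size()), nil
}

// writeTemp is a hypothetical helper for the demo: it writes content to a
// new temporary file and returns its path.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "changed-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	_, err = f.WriteString(content)
	return f.Name(), err
}

func main() {
	path, err := writeTemp("bytes\n")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	c := &changedFile{filename: path}
	msg, err := c.describe()
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```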

The Main Method

Let’s just take a look at this one and then step through it afterwards.

func main() {
  flag.Parse()
  var report_channel = make(chan string, 5)
  go notificationReporter(report_channel)
  watcher, _ := inotify.NewWatcher()
  watcher.AddWatch(*watch_target, inotify.IN_CLOSE_WRITE)
  for {
    select {
      case event := <-watcher.Event:
        change := &ChangedFile{report_channel, event.Name}
        go change.ProcessChange()
      case error := <-watcher.Error:
        panic(error)
    }
  }
}

First off, we create a channel named report_channel, with a buffer of 5. This allows us to queue up to 5 results for writing before the enqueuing would block on a full “pipe.” Next, we fire off the notificationReporter function defined above. By using the go keyword, we allow it to run concurrently with our main line of control.
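A quick way to convince yourself of the buffering behaviour is to fill a buffered channel with nobody reading from it. This is a sketch for current Go; queueDepth is a hypothetical helper, while len and cap on a channel are built in.

```go
package main

import "fmt"

// queueDepth is a hypothetical helper: it sends n values into a channel with
// the given buffer size and reports how many are queued plus the capacity.
// n must not exceed size, or the send would block forever with no receiver.
func queueDepth(size, n int) (queued, capacity int) {
	ch := make(chan string, size)
	for i := 0; i < n; i++ {
		ch <- fmt.Sprintf("result %d", i) // does not block while the buffer has room
	}
	return len(ch), cap(ch)
}

func main() {
	queued, capacity := queueDepth(5, 3)
	fmt.Printf("%d of %d slots used\n", queued, capacity)
}
```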

Next up, our inotify.Watcher is created. We then “load” the watcher with an event.  This takes two arguments – the directory we want to watch and the events we care about.  In this example, we just care about file close events for files that were written.

The for loop is slightly interesting.  Here you’ll see a Go select block. A select is much like a switch statement; however, the individual cases are channel operations (sends or receives).  If a case’s channel is ready to communicate, then the block that follows will execute.  A default block could also be added, which would fire immediately if none of the other cases are ready. Without a default, the select blocks until one of its cases can proceed.
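Here is a tiny sketch of a select with a default case, in current Go. The poll function is hypothetical; it simply reports which of its two channels was ready, or "idle" if neither was.

```go
package main

import "fmt"

// poll is a hypothetical helper showing select with a default case: it returns
// right away with whichever channel is ready, or "idle" if neither is.
func poll(events <-chan string, errs <-chan error) string {
	select {
	case name := <-events:
		return "event: " + name
	case err := <-errs:
		return "error: " + err.Error()
	default:
		return "idle"
	}
}

func main() {
	events := make(chan string, 1)
	errs := make(chan error, 1)

	fmt.Println(poll(events, errs)) // nothing is ready yet

	events <- "./a"
	fmt.Println(poll(events, errs)) // the events case can now proceed
}
```

Drop the default case and the first call would simply sit there until something arrived, which is exactly what the loop in main() relies on.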

Within this loop, we create a new ChangedFile and configure it with our reporting channel and the name of the file that has been modified. We then execute change.ProcessChange by using the go keyword again, which causes it to run concurrently with the main line of control and the reporting function. We now have three concurrent paths executing at the same time.

Lastly, we just panic on error instead of building proper error handling in.

The Output

Alright, we’ll first run our code using the little build script we stuck in our ~/.profile file in the first Go blog entry.

[jeff@martian ~]$ gogogo test.go

There’s not a whole lot going on there. If everything works correctly, the code will compile and execute. From there, it will just block waiting for directory change events.  In another window, we’ll write a few files.

[jeff@martian ~]$ touch a b c d e; echo "bytes" > f

Alright, now if we switch back to the other window, we’ll see something interesting.

Change Received:  ./a is 0 bytes
Change Received:  ./b is 0 bytes
Change Received:  ./c is 0 bytes
Change Received:  ./d is 0 bytes
Change Received:  ./e is 0 bytes
Change Received:  ./f is 6 bytes

Well, that’s it. As each file is written and closed, our events are handled as outlined above.  When the fun is over, simply terminate the Go application with a Control+C.  As I learn a bit more about Go, I’ll get to closing down channels properly and cover that in a later post. Additionally, we didn’t close down our inotify handle correctly, either.  We rely on the operating system to handle all of that for us when the process terminates.

Posted on February 16, 2011 at 11:58 pm by Jeff McNeil · Permalink · One Comment
In: development, Go, linux, open source

Learning Go (and hitting C libraries with it, too)

Update: This is already underway in a real fashion. Check out go-python for more information. Looks like a solid project!

Being the type of guy that doesn’t like to learn anything at the surface level, I decided to look into how one would integrate an existing C library with a Go application. My curiosity was twofold. First, I hoped to learn a bit more about what lives under the Go-hood.  Second, since Go is a new language, my assumption is that there are a lot of native libraries that haven’t yet been wrapped.

I spent a while trying to hunt down a simple, step-by-step guide that would lay out everything needed to do this.  Python has excellent documentation when it comes to handling extending and embedding. I was hoping that I’d come across such a document written from a Go perspective, but I wasn’t able to find one.  Using examples and tool documentation, however, I was able to get some test code put together. Below you’ll find a rundown of what I did and how I did it.

My goal was to link in Python and execute Python code from within a Go executable. I wouldn’t recommend ever doing this for any reason, ever. Actually, you may want to just stop reading now.

Using the cgo Compiler

When building Go code that needs to interface with an existing C library, you’ll compile using the cgo compiler instead of the usual ${num}g.  So, the approach I took was to isolate all of that code into a specific library and then access those functions as I would any other package.

The “C” Package

Generally speaking, C provides a flat, non-hierarchical namespace. Protection is limited to source file scope and is dictated by the programmer’s use of the static keyword. Simply put, everything is visible everywhere unless declared static. This is why C API definitions usually start with an identifying prefix. For Python developers, consider the Py_ prefix. It provides a way to create independent namespaces (though, with no protection between them).

When interfacing with C libraries, Go wraps everything C under the “C” package.  For example, puts as declared in stdio.h becomes C.puts. (Variadic functions such as printf can’t be called directly through cgo.) This provides a nice demarcation between the Go world and the everything-else world.

Including Headers

So, now we know what we’ll use to build the code and how we’ll interface with it.  The next step is to ensure Go knows where it can find the necessary headers.  To do this, we embed the information in comments immediately preceding the import “C” line.

// #include <Python.h>
import "C"

This ensures that the definitions in Python.h are visible to cgo. Everything found here will be made available under the C namespace and is accessible to code within this source module.  Now, as an aside, I noticed that C preprocessor macros were not expanded. So, because PyRun_SimpleString is simply a macro that calls PyRun_SimpleStringFlags, I had to use the latter.

The Makefile

We’ll be using the following makefile.  This comes from the example code under $GOROOT/misc/cgo/gmp.

# Copyright 2009 The Go Authors.  All rights reserved.
# Use of this source code is governed by a BSD-style
# license that can be found in the LICENSE file.

include ${GOROOT}/src/Make.inc

TARG=pygo

# Can have plain GOFILES too, but this example doesn't.

CGOFILES=\
  pygo.go

CGO_LDFLAGS=-lpython2.6
CGO_CFLAGS = -I/usr/include/python2.6

# To add flags necessary for locating the library or its include files,
# set CGO_CFLAGS or CGO_LDFLAGS.  For example, to use an
# alternate installation of the library:
# CGO_CFLAGS=-I/home/rsc/gmp32/include
# CGO_LDFLAGS+=-L/home/rsc/gmp32/lib
# Note the += on the second line.

CLEANFILES+=pygo

include ${GOROOT}/src/Make.pkg

# Simple test programs
pygo:
  $(GC) pygo.go
  $(LD) -o $@ pygo.$O

There’s not very much going on here. If you compare this file to the example, we’re simply changing a few internal variables.

Finally, we add the pygo rule at the bottom and remove the example rules for gmp.

The pygo Package

Now that we have our Makefile set up and ready to go, we need to put together our actual Go package. Below you’ll find the listing as I used it. Note that in a few places I purposefully used the long way as a learning tool.

package pygo

/* "C" isn't a real package. Rather, it's a virtual one created when building
 * via cgo. Cgo, as opposed to 6g, allows us to access C code/libraries from
 * within a Go application. Everything in the C-world is then jammed under the
 * 'C.' namespace.
 */

// #include <Python.h>
// #include <pythonrun.h>
import "C"

/* Some basic file IO helpers live here. */
import "io/ioutil"

/* Create a PythonCode object, which is a struct, containing
 * just one field. There's no reason this couldn't have
 * just been a type PythonCode string. When I started,
 * my intention was to be a bit more complicated and store
 * return values and interpreter state and all that, but
 * there's really no point.
 */
type PythonCode struct {
  code string
}

/* In this case, a pointer to a PythonCode object is the receiver,
 * our function is called ExecPy, and there is no return value.
 * we call our internally defined init/exec/term functions.
 *
 * Two things to note here:
 * 1. i/e/t are lower case, which means that they are the equivalent
 *    of a C top level 'static void F().' Not visible outside of
 *    the package (Encapsulation FTW!).
 *
 * 2. This is a METHOD because it has a defined receiver.
 */
func (p *PythonCode) ExecPy(result chan int) {
  initPython("PYTHON_EXEC")
  py_response := execPython(p.code)
  termPython()
  result <- py_response
}

/* This is our init method. There is no __init__ or __del__
 * equiv.  This builds a new object and returns a pointer to it.
 * Note that here, it's 100% okay that we're returning what
 * appears to be a pointer to a local variable. I know,
 * makes me feel a bit uncomfortable, but the docs clear
 * that up.
 */
func NewPythonCode(src *string) *PythonCode {
  contents, _ := ioutil.ReadFile(*src)
  return &PythonCode{code: string(contents)}
}

/* Here we tickle the Python C API. We set the program name,
 * initialize state, and ensure we init correctly.
 */
func initPython(interp_name string) {
  C.Py_SetProgramName(C.CString(interp_name))
  C.Py_Initialize()
  if C.Py_IsInitialized() == 0 {
    panic("Could not init interp.")
  }
}

/* This executes the code. This probably could also be a method on
 * our structure-based type above, but I wanted to highlight
 * the differences between a Go method and a Go function.
 */
func execPython(python_code string) int {
  return int(C.PyRun_SimpleStringFlags(C.CString(python_code), nil))

}

/* Call Py_Finalize and free up interp. data structures. Again,
 * this probably would work better using an interpreter Go
 * object and then a method to pass in code to exec, but this
 * covers more ground
 */
func termPython() {
  C.Py_Finalize()
}

There’s quite a bit going on in there, so let’s step through it in a bit more detail.

  1. The first few lines are straight forward. We name our package pygo, include the correct C headers, and import io/ioutil.
  2. Next, we create a PythonCode type, which is a struct. This object simply contains a string which will be actual Python code.  As the comment states, we could have done this more simply, but I think the struct approach is probably more representative of idiomatic Go (correct me, someone?).
  3. Next, we define ExecPy. Since it has a capital ‘E’, it is exported from the package and available elsewhere.  This method does three things, all by calling other functions internal to the package.  Notice that it takes a channel of int objects as its sole parameter. We pass the return value of execPython back up the channel instead of simply returning it.  In the real world, a return value would be simpler, however, this illustrates the usage of channels.
  4. After that, we have NewPythonCode. As Go doesn’t provide initializers/constructors, we simply do the build in an appropriately named factory function like this.  Then, in initPython, you see the calls directly into the Python library: C.Py_SetProgramName, C.Py_Initialize, and C.Py_IsInitialized. In actuality, Go is simply calling Py_SetProgramName, Py_Initialize, and Py_IsInitialized. The other thing to note here is the use of C.CString(). This converts a Go string object into a C char *. The copy is allocated with malloc and is not garbage collected, so the caller is responsible for freeing it once the C side no longer needs it.
  5. Next up is execPython. Here, we actually return the int value from C.PyRun_SimpleStringFlags. The int call/cast is necessary as a C int is not the same as a Go int. Again, you’ll see the call to C.CString converting from string to char *.
  6. Finally, we simply call Py_Finalize from within termPython.

Pretty simple if you ask me.  I’m becoming a fan of the “C.$identifier” approach as it allows code authors to avoid having to write stub modules in C. (SWIG Status, anyone?).

The Driver

The driver is pure Go. There’s no C code whatsoever. We simply import our new module and call it as we would any other Go library.

/* Everything goes into a package. */
package main

/* Could import individually, but this is cleaner. Sort of like wrapping Python
 * imports in order to group them on multiple lines.
 */
import ("pygo"
  "flag"
  "fmt"
  "stdio"
)

/* Flag Parsing. This is silly easy. */
var pythonSource = flag.String("pysource", "test.py", "Python Source File")

/* Main entry point. Just like C/C++ */
func main() {
  flag.Parse()

  /* Print a goofy little header so we can visualize the Go exec
   * vs. the Python exec.
   */
  fmt.Printf("Exec Python: %s\n", *pythonSource)
  fmt.Printf("-----------------------------\n\n")
  stdio.Stdout.Flush()

  /* Create a channel, which is like a queue. We'll block on
   * this for our Pythoning to finish.
   */
  ch := make(chan int)

  /* Create, set code, Exec Py. But.. do it in a different thread
   * of control! All we did was fire it after the keyword "go."
   * Sweet.
   */
  go pygo.NewPythonCode(pythonSource).ExecPy(ch)

  /* This says "fill python_result with whatever comes out
   * of the channel." This is a blocking call.
   */
  python_result := <-ch
  fmt.Printf("\n\n-----------------------------\n")

  /* Another way of printing... */
  fmt.Println("Channel Returned Code: ", python_result)
}

That’s all there is to it. Again, let’s walk through this source listing.

  1. We name our package and import our dependencies. Note that our new package appears here and it is listed like any other package.
  2. Next we set up a --pysource flag, so we can specify a Python file on the command line. This lets us run arbitrary Python code from our Go executable.
  3. We print a header and flush standard output. This is to ensure Go finishes printing (buffered?) before Python starts running.
  4. Next, we create a channel of integers. The channel is used to send the result of our Python execution back to our driver code. Note the use of the make function versus the new function here.
  5. The next line passes the location of the Python source file to a new PythonCode object (that we created via NewPythonCode). See how this call is prefixed with the “go” keyword? This allows the Go runtime to run both the driver code as well as the Python system concurrently.
  6. Finally, we wait on our channel and print the exit code.  If we didn’t wait, the driver would complete execution before our Python library ran, much like terminating the main thread before a worker thread has a chance to complete.
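Steps 4 through 6 boil down to a pattern you can try without any Python involved. The following is a sketch in current Go; runAsync is a hypothetical helper and the closure merely pretends to be the interpreter run.

```go
package main

import "fmt"

// runAsync is a hypothetical helper: it starts work in a goroutine and
// blocks until the goroutine sends its result back over the channel.
func runAsync(work func() int) int {
	result := make(chan int) // channels are created with make, not new
	go func() {
		result <- work()
	}()
	return <-result // without this receive, the caller could return before work finishes
}

func main() {
	code := runAsync(func() int {
		fmt.Println("pretend the embedded interpreter ran here")
		return 0
	})
	fmt.Println("Channel Returned Code: ", code)
}
```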

The Test Code

The following snippet of Python code is what I used to test. The metaclassery is not needed, but I was curious as to whether it would cause any issues (I didn’t think it would!).

import sys

class GoofyMeta(type):
  def __new__(*args, **kw):
    print "In Go->Python Metaclass"
    return type.__new__(*args, **kw)

class HelloFromGo(object):
  __metaclass__ = GoofyMeta
  def __str__(self):
    return "Hi, you are running Go."

if __name__ == '__main__':
  h = HelloFromGo()
  print h

  print "See, not kidding: "
  print sys.version

  print "Here's my global namespace: "
  print dir()

So, as a control, let’s run the Python code from the command line first, to ensure it runs as we think it should.

mcjeff@martian:~/pygo$ python test.py
In Go->Python Metaclass
Hi, you are running Go.
See, not kidding:
2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
[GCC 4.4.3]
Here's my global namespace:
['GoofyMeta', 'HelloFromGo', '__builtins__', '__doc__', '__file__', '__name__', '__package__', 'h', 'sys']
mcjeff@martian:~/pygo$

Now that we’re certain everything works, we can move forward with compilation and testing.

Compilation and Running

For this example, I’m going to use the same little shell function I used in my first Go post. Everything else is new.  Let’s take a look at what’s required to build this extension.

mcjeff@martian:~/pygo$ export GOROOT=~/go
mcjeff@martian:~/pygo$ make
CGOPKGPATH= cgo -- -I/usr/include/python2.6 pygo.go
6g -o _go_.6  pygo.cgo1.go _cgo_gotypes.go
6c -FVw -I"/home/mcjeff/go/pkg/linux_amd64" _cgo_defun.c
gcc -m64 -g -fPIC -O2 -o _cgo_main.o -c -I/usr/include/python2.6  _cgo_main.c
gcc -m64 -g -fPIC -O2 -o pygo.cgo2.o -c -I/usr/include/python2.6  pygo.cgo2.c
gcc -m64 -g -fPIC -O2 -o _cgo_export.o -c -I/usr/include/python2.6  _cgo_export.c
gcc -m64 -g -fPIC -O2 -o _cgo1_.o _cgo_main.o pygo.cgo2.o _cgo_export.o -lpython2.6
cgo -dynimport _cgo1_.o >__cgo_import.c && mv -f __cgo_import.c _cgo_import.c
6c -FVw _cgo_import.c
rm -f _obj/pygo.a
gopack grc _obj/pygo.a _go_.6  _cgo_defun.6 _cgo_import.6 pygo.cgo2.o _cgo_export.o
mcjeff@martian:~/pygo$ make install
cp _obj/pygo.a "/home/mcjeff/go/pkg/linux_amd64/pygo.a"
mcjeff@martian:~/pygo$

That’s it. The files included in our Makefile handle the setup and compilation for us. There are a series of intermediary files created that we really don’t have to concern ourselves with.  If you’re curious, they’re left around by the compiler. A simple ‘ls’ will display them.

Finally, we can run our Go binary and execute our small Python example.

mcjeff@martian:~/pygo$ source ~/.profile
mcjeff@martian:~/pygo$ gogogo main.go
Exec Python: test.py
-----------------------------

In Go->Python Metaclass
Hi, you are running Go.
See, not kidding:
2.6.5 (r265:79063, Apr 16 2010, 14:15:55)
[GCC 4.4.3]
Here's my global namespace:
['GoofyMeta', 'HelloFromGo', '__builtins__', '__doc__', '__name__', '__package__', 'h', 'sys']

-----------------------------
Channel Returned Code:  0
mcjeff@martian:~/pygo$ file main
main: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), not stripped

Well, there you have it. A quick rundown on how to access C libraries from a Go application.  I find it rather elegant, honestly. Now, as for the embedded Python?  I wouldn’t use this anywhere, ever, for any reason. I’ve no idea what’s going on with regards to internal thread states and structures. This is a nice little example, but probably not such a hot idea!

Posted on February 15, 2011 at 2:50 pm by Jeff McNeil · Permalink · 2 Comments
In: Go, linux, open source, python

Learning Go (and reading Tar files with it, too)

Over the past couple of days, I’ve taken an interest in Go. I’ve gone through the bulk of the documentation. I’ve read the tutorial, the Effective Go documentation, and a collection of other works available on the documentation page. I’m really taking quite a liking to it. Why?

What’s there not to like? Exactly. While the standard library has yet to reach “batteries included” status, it is getting there. Now, you’ll find modules for SMTP, networking, HTTP,  and IO, among other things.  There’s even a growing collection of third party modules available on the Package Dashboard.

After reading through the available documentation, I decided to try my hand at a few examples. I couldn’t quite decide where to start, so I opened the standard library reference and started with ‘A.’  In this case, that means the archive packages – tar and zip.

Reading Tar Files with Go

First off, if you don’t have Go installed as of yet, head over to the getting started guide. Installation is source driven, but it’s truly very simple.  The compilation is slightly bulky as you’ll need to compile, link, execute.  That’s a cycle I haven’t used in a while.  In order to speed it up, I’ve defined the following shell function in my .profile.

gogogo() {
  source=$1
  base=$(basename $source .go)
  6g $source
  6l -o $base $base.6
  $(pwd)/$base
}

Of course, this little rule only works when building one file at a time.

Now, let’s set up the environment.  This would work with much larger tar files, but for example purposes, we’re going to limit ourselves to some very small ones.  The test environment was set up as follows:

$ echo "one" > 1
$ echo "two" > 2
$ echo "three" > 3
$ tar -cvf test.tar ./[123]
a ./1
a ./2
a ./3
$

Notice that we’ve simply created three text files, named 1, 2, and 3. Each file contains its English representation.  Let’s spice it up a bit and change the contents of “1” to один, which is simply the Russian word for “one.” (The archive has to be rebuilt with the same tar command afterwards so that it picks up the change.)

$ echo 'один' > 1
$ cat 1
один
$

Alright. Now, let’s have a look at the Go code needed to read from the tar file we created.

package main

import "archive/tar"
import "fmt"
import "os"

func main() {
  f,_ := os.Open("test.tar", os.O_RDONLY, 0600)
  reader := tar.NewReader(f)
  file_count := 1
  for hdr,err := reader.Next(); err == nil; hdr, err = reader.Next() {
    b := make([]byte, hdr.Size)
    fmt.Printf("%d: %s (%d bytes)\n", file_count, hdr.Name, hdr.Size)
    reader.Read(b)
    fmt.Println("-----> ", string(b))
    file_count++
  }
}

That’s all there is to it. Before we go into detail and explain exactly what’s going on here, let’s build it (using our shortcut) and run it in order to verify the output.

$ gogogo tar.go
1: ./1 (9 bytes)
----->  один

2: ./2 (4 bytes)
----->  two

3: ./3 (6 bytes)
----->  three

As an aside, if you run an ls from the current directory, you’ll see the intermediate files needed by the go compiler. We’re just skipping that step with our profile function. Let’s take a closer look at the code, and then double back and review the results.

The very first line of the file declares that we’re defining package main. Each Go source file requires a package name. Larger applications are simply collections of linked packages. The idiomatic name for the main package is, not surprisingly, main.

Next we import three more packages. These are all used within the file. Note that if we were to import a package that we’re not using, the application would not compile. That ensures we always have clean code without a polluted namespace.

Now, we define a main function. This is the application entry point, much like you’ll see in a C, C++, or Objective-C program.

  1. We open a file by using the os.Open function. In Go, identifiers are only exported from packages if they begin with a capital letter (think static vs. non-static C functions).  The os.Open function returns a file object.
  2. Next, we create a tar reader. The NewReader function expects an object implementing the Reader interface. Note that with Go, there’s no need to declare what you’re implementing. If you add the required methods, you automagically implement the interface. I like this approach. Duck typing light, if you will.
  3. We then iterate through the contents of the tar file by calling reader.Next. In Go, functions and methods can return more than one value. It’s common to see the actual value and a possible error condition passed back. As long as the error value is nil (Go’s None/NULL), we keep reading. The reader.Next method returns a header structure as well as an error value.
  4. Now, we create a slice of bytes. The make syntax creates a slice and an underlying array. For more information, see the Go documentation on arrays and slices.
  5. We print some status information, read the full contents of each element in the tar file, and print the results. We increment our file number counter so we can display how many files we’ve read.
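The listing above targets the Go release I had installed when writing this. For anyone following along on a current toolchain, the same steps look roughly like the sketch below, which builds a small archive in memory so that it runs on its own. The entry type and the buildTar, readTar, and roundTrip helpers are hypothetical; tar.NewWriter, tar.NewReader, Next, and io.ReadAll are standard library.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

type entry struct{ name, body string }

// buildTar is a hypothetical helper that writes entries into an in-memory archive.
func buildTar(entries []entry) ([]byte, error) {
	var buf bytes.Buffer
	w := tar.NewWriter(&buf)
	for _, e := range entries {
		hdr := &tar.Header{Name: e.name, Mode: 0600, Size: int64(len(e.body))}
		if err := w.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(e.body)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readTar walks the archive and returns each entry's name and contents.
func readTar(r io.Reader) ([]entry, error) {
	reader := tar.NewReader(r) // accepts anything that implements io.Reader
	var out []entry
	for {
		hdr, err := reader.Next()
		if err == io.EOF {
			return out, nil // end of archive
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(reader) // reads to the end of the current entry
		if err != nil {
			return nil, err
		}
		out = append(out, entry{hdr.Name, string(body)})
	}
}

// roundTrip is a hypothetical convenience wrapper: build, then read back.
func roundTrip(entries []entry) ([]entry, error) {
	data, err := buildTar(entries)
	if err != nil {
		return nil, err
	}
	return readTar(bytes.NewReader(data))
}

func main() {
	entries, err := roundTrip([]entry{{"./1", "один\n"}, {"./2", "two\n"}, {"./3", "three\n"}})
	if err != nil {
		panic(err)
	}
	for i, e := range entries {
		fmt.Printf("%d: %s (%d bytes)\n", i+1, e.name, len(e.body))
	}
}
```

Reading with io.ReadAll also sidesteps a subtle point in the original loop: a single Read call is not guaranteed to fill the whole slice.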

Just for reference purposes, the Tar header is defined as the following in archive/tar:

type Header struct {
    Name     string
    Mode     int64
    Uid      int
    Gid      int
    Size     int64
    Mtime    int64
    Typeflag byte
    Linkname string
    Uname    string
    Gname    string
    Devmajor int64
    Devminor int64
    Atime    int64
    Ctime    int64
}

Pretty straightforward, no?  Now, double back and look at the results from earlier. They should all make sense. Well, maybe except for our little Russian file! Note that this record says that it is 9 bytes in size, while the remaining files report sizes equal to their number of characters plus the trailing newline.  The answer is pretty simple. Each Cyrillic letter takes up two bytes when UTF-8 encoded. Go transparently handles that complexity for us. So, we’re looking at four two-byte letters, followed by a standard newline.

$ file 1
1: UTF-8 Unicode text
$ file 2
2: ASCII text
$
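You can check the byte arithmetic from Go itself. In this sketch (current Go), len counts bytes while utf8.RuneCountInString counts characters; the sizes helper is hypothetical.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// sizes returns the byte length and the character (rune) count of s.
func sizes(s string) (byteLen, runeCount int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"один\n", "two\n", "three\n"} {
		b, r := sizes(s)
		fmt.Printf("%q: %d bytes, %d characters\n", s, b, r)
	}
}
```

The Russian string comes out at 9 bytes for 5 characters (four two-byte letters plus the newline), while the ASCII ones have matching counts.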

So, all in all, I’m starting to like this language. Take a minute and dive into it. Now, of course, the challenge becomes finding something worthwhile to write in Go. I’ve learned a small collection of languages over the past year, only to forget most of them due to lack of use. As an interesting aside, I had been planning on refreshing my C/C++. I think I may defer that a bit and really spend some time on this language.

Update: I’ve been asked how one would manage a gzip compressed tar file.  Well, remember interfaces? The reader we pass into the tar.NewReader function simply has to implement the Reader interface! So, we can update the code above to open a gzip file and pass that reader in, like so:

package main

import ("fmt"
        "archive/tar"
        "compress/gzip"
        "os"
       )

func main() {
  fhandle, _ := os.Open("test.tar.gz", os.O_RDONLY, 0600)
  zhandle, _ := gzip.NewReader(fhandle)
  thandle := tar.NewReader(zhandle)
  hdr, _ := thandle.Next()
  fmt.Println(hdr.Name)
}

See? All we’ve done here is chain our NewReader calls, as each returned object implements the Reader interface.  Running the code provides the following output.

mcjeff@macbook:~$ ls test.tar.gz
test.tar.gz
mcjeff@macbook:~$ gogogo gunzip.go
./1

There. Hope that clears it up!
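For anyone on a current Go release, here is roughly the same chaining with the errors checked. It builds a one-file .tar.gz in memory so the sketch is self-contained; makeTarGz, firstName, and firstNameOf are hypothetical helpers, and the gzip and tar calls are the standard library ones.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// makeTarGz is a hypothetical helper that builds a one-file .tar.gz in memory.
func makeTarGz(name, body string) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	tw := tar.NewWriter(zw) // writers chain the same way readers do
	hdr := &tar.Header{Name: name, Mode: 0600, Size: int64(len(body))}
	if err := tw.WriteHeader(hdr); err != nil {
		return nil, err
	}
	if _, err := tw.Write([]byte(body)); err != nil {
		return nil, err
	}
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// firstName returns the name of the first entry in a gzip-compressed tar stream.
func firstName(r io.Reader) (string, error) {
	zr, err := gzip.NewReader(r) // fails if r does not start with a gzip header
	if err != nil {
		return "", err
	}
	defer zr.Close()
	hdr, err := tar.NewReader(zr).Next()
	if err != nil {
		return "", err
	}
	return hdr.Name, nil
}

// firstNameOf is a hypothetical convenience wrapper over an in-memory archive.
func firstNameOf(data []byte) (string, error) {
	return firstName(bytes.NewReader(data))
}

func main() {
	data, err := makeTarGz("./1", "один\n")
	if err != nil {
		panic(err)
	}
	name, err := firstNameOf(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(name)
}
```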

Posted on February 14, 2011 at 11:52 pm by Jeff McNeil · Permalink · 4 Comments
In: development, Go, open source