Python, DB2, and FOR BIT DATA Columns

I’ve been doing some research lately on how well various languages and web application frameworks integrate with IBM DB2 and happened across some less than ideal results. An errant strlen() call nearly killed the project before it ever got off the ground.

As you’d expect, the DB2 drivers for Python and Ruby are written in C and compiled to modules for their respective language targets. What you might not expect is that DB2 has a CHAR/VARCHAR type modifier that is difficult, if not impossible, to work with out of the box in nearly every mainstream framework. It’s called CHAR FOR BIT DATA. Here’s what IBM has to say about it:

A CHAR FOR BIT DATA type allows you to store byte strings of a specified length. It is useful for unstructured data where character strings are not appropriate.

So, these columns basically take a string that looks like '20100322014820681369000000' and turn it into something like ' \x10\x03"\x01H h\x13i\x00\x00\x00'. DB2 also includes a function called generate_unique() that, not surprisingly, generates IDs for these column types that are guaranteed unique across an entire database. Since the Python C API knows how to take a byte array and turn it into a Python string, everything should be good, right? Wrong.

What happens if you try to compress data for one of these columns, but it looks like '20000000000000000000000000' instead? You get this:

>>> from binascii import a2b_hex
>>> a2b_hex('20000000000000000000000000')
' \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

That’s a lot of null bytes, but it’ll store in the database just fine. Retrieving that from the database, however, has some unexpected results: Python only gives you the first of the 13 bytes you stored (the 26 hex digits pack down to 13 bytes). Turns out the driver calls strlen() on the binary data sent from the database engine, which sees a null byte and treats it as a string terminator, truncating your data.
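To see the same truncation outside the driver, here is a minimal sketch using ctypes. It is not the ibm_db code, just an illustration of the failure mode: the names packed, c_buffer, truncated, and complete are my own, and I'm assuming Python 3, where a2b_hex returns a bytes object. Calling ctypes.string_at without a size stops at the first null byte, the same rule strlen() follows, while passing an explicit length copies everything.

```python
import ctypes
from binascii import a2b_hex

# Pack the 26-digit hex string into the 13 raw bytes the column stores.
packed = a2b_hex('20000000000000000000000000')

# Copy the bytes into a C char array, standing in for the buffer a
# driver would receive from the database client library (an assumption
# made for illustration; the real driver manages its own buffers).
c_buffer = (ctypes.c_char * len(packed)).from_buffer_copy(packed)

# With no size argument, string_at() reads up to the first null byte,
# which is the same stopping rule strlen() uses.
truncated = ctypes.string_at(c_buffer)

# With an explicit size, every byte is copied, embedded nulls included.
complete = ctypes.string_at(c_buffer, len(packed))

print(len(packed), len(truncated), len(complete))
```

The takeaway is that binary column data needs its length carried alongside the pointer; anything that infers the length from a terminator will lose whatever comes after the first null.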

The good news is the fix was trivial once a convention was agreed upon. The group responsible for the Ruby driver also picked up the fix and will be releasing a comparable patch in the near future. Big thanks to both groups for helping us get over a substantial hurdle.

Update (3/31/2010): The devs for the Python ibm_db project have released a new version that includes the fix mentioned above. I finally got around to testing the change, and it looks to have solved our problem. Hopefully the Ruby driver is corrected upstream as well shortly.

How easy is the process?

I was doing a bit of catching up in my feed reader today, and one of the latest 37signals posts caught my attention. The post was simple enough, but it was this comment in response to a few other posters that really nailed it:

Who cares about how you do it. That’s just techniques. No matter how you devide[sic] it up into templates and scripts, if the site changes, you have to make those changes. How easy is that process?? That is the question.

Apologies for taking that somewhat out of context to prove a point.

If your software maintenance is painful or leaves you feeling apprehensive, you need to figure out why it’s that way and go after the core of the problem. It only gets worse the longer you ignore it.

Firebug 1.4 Activation Model

I use Gentoo Linux with KDE for my development box at my day job. Gentoo is pretty good about keeping their package repository as close to what’s stable upstream as they can, but Firefox 3.5.x took a good long while to be marked stable, and consequently I was running Firebug 1.3 until just recently.

Upon upgrading, I spent a couple days being pretty annoyed by its new “simplified” activation model. Each dev has a subdomain for a development site, but our issue tracker and wiki are also on subdomains, causing Firebug to automatically activate itself for the wiki or tracker anytime I had activated it for a development site. My google-fu must have been weak because I couldn’t find a good solution to the problem. Only frustrated people.

I stumbled upon the answer today: Activate Same Origin URLs under Tools > Firebug > Options. Unchecking it restored my sanity.

Making It Work Is Only Part of the Project

I’ve been slowly working my way through Coders At Work the last few weeks. So far, it’s been a great read. Seibel did an excellent job leading the conversations, and the peek into the minds of high profile coders and the evolution they’ve undergone has been beyond intriguing. This quote from Joshua Bloch really struck a chord with me tonight:

The older I get, the more I realize it isn’t just about making it work; it’s about producing an artifact that is readable, maintainable, and efficient.

I couldn’t agree more. Except for the ‘older’ part. I’m not that old yet.

My experience with software development in the wild thus far points to the same sentiment. Personal projects tend to be thrown together quite often. Mine are certainly no exception. What I’ve found interesting, however, is that the projects (even small ones going back to my college days) where I’ve put in a little extra effort to better organize my code have proven much, much quicker to change down the road.

Of course, there has to be a balance. Code is worthless if it’s never shipped. The flip side is that it’s not worth a whole heck of a lot if it can’t easily change over time to keep up with the business requirements.

Here’s a test: Find some code written by someone else in a module that you aren’t familiar with. How long does it take you to understand the purpose of that code? If it’s more than a couple minutes, take that into consideration the next time you’re hacking away on a project, even if just to write better comments. *Guaranteed* someone will appreciate the extra effort down the road.

Are aUnit frameworks on the horizon?

Something that’s been rolling around in the back of my mind for some time is the term aUnit Framework. A lot of developers are familiar with the xUnit family, at least by name if nothing else. If not, I highly recommend reading Martin Fowler’s xUnit history writeup. “aUnit” is not the greatest title, but neither was Microsoft Bob if you ask me.

I say it’s a poor choice of names because I’m not talking about unit tests in the traditional sense. The web development community needs a well-designed tool crafted around the constructs that the modern-day web is fashioned from. Selenium is quite powerful, but my experience has been that it’s a royal pain to make any but the simplest tests tolerant of response delays with Ajax calls. In the pure unit testing world, YUI Test and my personal favorite, QUnit, are excellent for running suites against strictly client-side code, and are capable of wrapping tests around Ajax calls, but there isn’t a clean way to tie those into any kind of continuous integration build.

Looking around StackOverflow and the various feeds I subscribe to from other developers, I see a huge desire for something that just works without an undue amount of pain and suffering.
