Python, DB2, and FOR BIT DATA Columns
I’ve been doing some research lately on how well various languages and web application frameworks integrate with IBM DB2 and happened across some less than ideal results. An errant strlen() call nearly killed the project before it ever got off the ground.
As you’d expect, the DB2 drivers for Python and Ruby are written in C and compiled to modules for their respective language targets. What you might not expect is that DB2 has a CHAR/VARCHAR type modifier that makes it difficult, if not impossible, to work out of the box with nearly every mainstream framework. It’s called CHAR FOR BIT DATA. Here’s what IBM has to say about it:
A CHAR FOR BIT DATA type allows you to store byte strings of a specified length. It is useful for unstructured data where character strings are not appropriate.
So, these columns basically take a string that looks like '20100322014820681369000000' and turn it into something like ' \x10\x03"\x01H h\x13i\x00\x00\x00'. DB2 also includes a function called generate_unique() that, not surprisingly, generates IDs for these column types that are guaranteed unique across an entire database. Since the Python C API knows how to take a byte array and turn it into a Python string, everything should be good, right? Wrong.
What happens if you try to compress data for one of these columns, but it looks like '20000000000000000000000000' instead? You get this:
>>> from binascii import a2b_hex
>>> a2b_hex('20000000000000000000000000')
' \x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
That’s a lot of null bytes, but it’ll store in the database just fine. Retrieving that from the database, however, has some unexpected results: Python only gives you the first of the 13 bytes (26 hex characters) you stored. Turns out the driver does a strlen() on the binary data sent from the database engine, which sees a null byte and treats it as a string terminator, truncating your data.
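To make the truncation concrete, here is a minimal sketch that reproduces the same effect in pure Python using ctypes. It is only an illustration of strlen-style versus length-aware reads, not the actual ibm_db driver code; the variable names are made up for the example, and it is written for Python 3, where the packed value is a bytes object.

```python
import ctypes
from binascii import unhexlify

# 26 hex digits pack into 13 raw bytes; 12 of them are NUL.
packed = unhexlify("20000000000000000000000000")

# Copy the bytes into a C char buffer, roughly how a C extension would hold them.
buf = ctypes.create_string_buffer(packed, len(packed))

# .value treats the buffer as a NUL-terminated C string (strlen semantics),
# so it stops at the first null byte.
truncated = buf.value

# .raw returns the whole buffer based on its known length.
full = buf.raw

print(repr(truncated))
print(len(full), full == packed)
```

The point of the contrast is that binary column data needs to be read using the length reported alongside it rather than by scanning for a terminator.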
The good news is the fix was trivial once a convention was agreed upon. The group responsible for the Ruby driver also picked up the fix and will be releasing a comparable patch in the near future. Big thanks to both groups for helping us get over a substantial hurdle.
Update (3/31/2010): The devs for the Python ibm_db project have released a new version that includes the fix mentioned above. I finally got around to testing the change, and it looks to have solved our problem. Hopefully the Ruby driver is corrected upstream as well shortly.
How easy is the process?
I was doing a bit of catching up in my feed reader today, and one of the latest 37signals posts caught my attention. The post was simple enough, but it was this comment in response to a few other posters that really nailed it:
Who cares about how you do it. That’s just techniques. No matter how you devide[sic] it up into templates and scripts, if the site changes, you have to make those changes. How easy is that process?? That is the question.
Apologies for taking that somewhat out of context to prove a point.
If your software maintenance is painful or leaves you feeling apprehensive, you need to figure out why it’s that way and go after the core of the problem. It only gets worse the longer you ignore it.
Making It Work Is Only Part of the Project
I’ve been slowly working my way through Coders At Work the last few weeks. So far, it’s been a great read. Seibel did an excellent job leading the conversations, and the peek into the minds of high profile coders and the evolution they’ve undergone has been beyond intriguing. This quote from Joshua Bloch really struck a chord with me tonight:
The older I get, the more I realize it isn’t just about making it work; it’s about producing an artifact that is readable, maintainable, and efficient.
I couldn’t agree more. Except for the ‘older’ part. I’m not that old yet.
My experience with software development in the wild thus far points to the same sentiment. Personal projects tend to be thrown together quite often. Mine are certainly no exception. What I’ve found interesting, however, is that the projects (even small ones going back to my college days) where I’ve put in a little extra effort to better organize my code have proven much, much quicker to change down the road.
Of course, there has to be a balance. Code is worthless if it’s never shipped. The flip side is that it’s not worth a whole heck of a lot if it can’t easily change over time to keep up with the business requirements.
Here’s a test: Find some code written by someone else in a module that you aren’t familiar with. How long does it take you to understand the purpose of that code? If it’s more than a couple minutes, take that into consideration the next time you’re hacking away on a project, even if just to write better comments. *Guaranteed* someone will appreciate the extra effort down the road.
Are aUnit frameworks on the horizon?
Something that’s been rolling around in the back of my mind for some time is the term aUnit Framework. A lot of developers are familiar with the xUnit family, at least by name if nothing else. If not, I highly recommend reading Martin Fowler’s xUnit history writeup. “aUnit” is not the greatest title, but neither was Microsoft Bob if you ask me.
I say it’s a poor choice of name because I’m not talking about unit tests in the traditional sense. The web development community needs a well-designed tool crafted around the constructs that the modern-day web is fashioned from. Selenium is quite powerful, but my experience has been that it’s a royal pain to make any but the simplest tests tolerant of response delays with Ajax calls. In the pure unit testing world, YUI Test and my personal favorite, QUnit, are excellent for running suites against strictly client-side code and are capable of wrapping tests around Ajax calls, but there isn’t a clean way to tie those into any kind of continuous integration build.
Looking around StackOverflow and the various feeds I subscribe to from other developers, I see a huge desire for something that just works without an undue amount of pain and suffering.
Boredom-Driven Development
I mentioned in a previous post that the Ajax Experience conference crowd showed a great interest in automated testing. During one of the testing sessions, I posted a thought that got retweeted several times:
After the conference, I spent some time looking at what the various frameworks had in the way of built-in testing hooks. While I don’t have working experience with Rails, I was familiar with its console and some of the clever ways it lets you peek into your application to see how things are working (or not working) together. I was curious whether Django had something similar, although I got sidetracked watching Django in the Real World from Django-Con 2009 before I got too far. There are actually quite a few good quotes throughout the presentation, but this one from Kent Beck’s Test Driven Development: By Example stood out:
Tests are the Programmer’s stone, transmuting fear into boredom.
Too often, those of us in the maintenance section of the software life cycle are paralyzed by fear when we come across a problem. Can I fix this without breaking other things? Does anything in my application depend on this broken implementation? We shouldn’t have to ask these questions, and automated tests give us the confidence to make those necessary changes without the element of fear. You want that level of boredom.
Having to agonize over what should have been a simple bug fix is not a fun process. Test-driven development has gotten a bad rap from some who argue that TDD imposes unrealistic principles on real-world development. I disagree with that stance on the basis that most of Beck’s work that I’ve read has argued for practical rather than fanatical testing. If you can get to 100% code coverage, great, but do you really need to test your getter/setter methods? Probably not.
