Programming Limits

Work with old, large systems long enough and you start to get a persistent feeling of déjà vu. Every system that gets large enough seems to grow a set of features that is almost like every other large system's, just implemented in a completely different way, so that code sharing is simply impossible.

How many file-system abstraction mechanisms have we collectively written? How many inter-thread communication mechanisms? How often have we re-implemented configuration management, object-relational mappers, caching, indexing, page layout, and hundreds of other big sub-systems?

When you're starting out, it's just too much work to figure out the "other guy's" byzantine system for handling almost every possible corner case of the problem domain. You only have to do this one thing. You'll never need all that. You can just code up what you need and be done in a few hours instead of spending days or weeks learning how to debug the huge mass of meta-programming in someone else's system.

So you roll your own and it works. You're an elite hacker. Maybe you even pat yourself on the back for having avoided a dependency. Then the customers ask for one more feature. No problem, a day at most. Then another, a bit bigger; you refactor the codebase and now it's elegant and ever so much more general. A few years along, you are the (proud?) owner of a byzantine system that handles every possible case except the five that the other system handles. Unfortunately, with completely different substrates and totally alien structures, the two can't be used together or even talk to each other, so some people use yours, some use the other, and the potential pool of developers and users is halved.

I'm wondering (really more throwing out a straw man) whether the problem is really one of documentation. What if every large system came with two pieces of documentation:

Documentation that lets another meta-programmer (the type who could create such a system themselves) sit down and figure out all the corner cases from a set of principles and rules: something that lets you understand the totality of the codebase and walks you through the exceptions and caveats until you can predict what will be in the codebase as easily as you would with your own code.

Documentation that exhaustively linked to (runnable, tested) code examples for accomplishing every common (and uncommon) task. It could be backed by some sort of Trac-like system so that every bug report, every mailing-list post, every question that came up was turned into a runnable test/example, with people able to push new examples in...
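
As a rough sketch of what one such runnable, tested example might look like, here is a Python docstring whose examples are executed by the standard-library `doctest` module. The function `parse_size`, its behaviour, and the questions quoted in the docstring are all invented for illustration, and `check_examples` is a hypothetical helper, not part of any existing system.

```python
import doctest


def parse_size(text):
    """Convert a human-readable size such as "4k" into a number of bytes.

    Each example below stands in for a (hypothetical) question or bug
    report that was turned into a runnable example.

    A mailing-list question: "how do I pass kilobytes?"

    >>> parse_size("4k")
    4096

    A bug report: "upper-case suffixes are rejected"

    >>> parse_size("2M")
    2097152

    Plain numbers are treated as bytes.

    >>> parse_size("512")
    512
    """
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    text = text.strip().lower()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)


def check_examples():
    """Run the documented examples; return (failed, attempted) counts."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        parse_size.__doc__,            # the documentation itself
        {"parse_size": parse_size},    # names visible to the examples
        "parse_size", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

The point of the sketch is only that the example and the test are the same artifact: if the code changes and an example stops being true, running the documentation reports a failure.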

Would it be enough? Would anyone care? I know the second would be very useful: most programmers seem to start off by copying code samples and altering them to work for their purposes, so maybe it would be enough. Maybe the first document is really a red herring, something you pour your work into but which isn't necessary, because the people who really want to hack on the code have already got some investment and just want to know how to make that one needed effect happen. Maybe it's the code library that needs to be front and center to make your library pull people in and avoid having the other systems split the community.

Or maybe the Romans had the right idea, and the beauty of each thing is the thing itself, not the Greek ideal shining through from some platonic realm. Maybe each large system's implementation of the dozens of similar sub-systems is what makes the whole magical.

Comments

  1. Mark Eichin

    Mark Eichin on 06/16/2008 1:29 a.m.

    You don't even need a large-but-generic subsystem to see this effect - optparse is enough. "Oh, I'll just grab things out of sys.argv" "Oh, we want a --help, ok fine I'll just filter that out first"... it's just a lot easier to slap that down and demand optparse.

    I think the examples are key - putting some simple "no really, just cut&paste this even if you're doing *no* argument parsing" templates on our (internal, corporate, developer) wiki made a huge difference.
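
An illustrative aside, not the template from the wiki mentioned above: a minimal cut-and-paste skeleton of that kind might look roughly like the following, using the standard-library `optparse` module (the newer `argparse` module plays the same role). The `--verbose` option and the function names `parse_args` and `main` are assumptions made up for this sketch.

```python
import optparse
import sys


def parse_args(argv):
    """Return (options, positional_args); --help comes for free."""
    parser = optparse.OptionParser(usage="%prog [options] [FILE...]")
    # Hypothetical first option; add more with parser.add_option().
    parser.add_option("-v", "--verbose", action="store_true",
                      default=False, help="print progress messages")
    return parser.parse_args(argv)


def main(argv=None):
    """Entry point; pass a list of arguments, or None to use sys.argv."""
    if argv is None:
        argv = sys.argv[1:]
    options, args = parse_args(argv)
    if options.verbose:
        print("arguments: %r" % (args,))
    return 0
```

A script built on this would typically end by calling `sys.exit(main())`. Even with no real options defined, `--help` and error reporting for unknown flags already work, which is the case the hand-rolled `sys.argv` approach tends to grow into.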

    I also think "batteries included" makes a huge difference as far as treating existing modules as part of the toolset that you're *obliged* to understand and not get all NIH about...

    (ps. you can probably guess that I picked this up from PlanetPython, even though you've tried to be more generic in your arguments :-)

  2. skierpage

    skierpage on 06/16/2008 4:45 p.m.

    In Perl, what's the problem? You realize you need to do something, you go to cpan.org, you search, you find promising modules, you read their online documentation and browse the code (that part should be better), you get a good feeling for the best module, you install it using the CPAN module, `make test`, and try it out. Done.

    (/me surfs...) Seems there isn't an exact equivalent for Python. RubyForge.org seems just a directory of links to other Web sites — the module browser needs to force every project to conform for documentation, tests, etc. It seems Perl still has the best "code library" system.

    Your idea to integrate tests and bug reports and samples with doc is great. Users shouldn't have to join a project as a committer to contribute these. The doc on php.net has a feedback system where users contribute sample code, but it's a copy 'n' paste mess. Bugzilla's comments and attachments on bugs are a good organizing metaphor but it's standalone, it's not tied to browsing the project's source code and documentation. I think what's needed is a URI scheme for the tests, bug reports, and samples to refer to source code files/functions/lines, and then the source code and doc browser should notice these links from tests, bug reports and samples and display backlinks to them. Dude, make it so! 8-)

    If you're talking about the overall framework rather than reusing specific modules: a) your language should encourage/impose a lot of framework, b) Ruby on Rails shows a buzzword-worthy framework can become a powerful standard, c) most frameworks are easier to write than comprehend. For frameworks, your second kind of documentation helps, your first is a pipe dream. Either there's insufficient documentation or so much it's easier to read the code. Evaluating which framework to use is perhaps the hardest job of a system architect.
