Archive for the ‘Development’ Category

Silence is Golden

Tuesday, November 30th, 2010

It is a mostly well-known fact, yet nearly always overlooked, that large software teams underperform small ones. I’ve been building software professionally for 15 years and continue to be amazed by this simple truth. Fred Brooks, of course, canonized this thinking in his “bible of software engineering”, The Mythical Man-Month, but as Brooks has been known to say of his book, “everybody quotes it, some people read it, and a few people go by it.” The urge to add bodies to expedite software development is just too strong to resist.

This isn’t surprising. It’s human nature to apply learned patterns to problems. You need to build a stone wall to defend your town? Add labor (preferably slave) and the project pace quickens. Trying to hunt a bison? Get all of the able-bodied men to assist. Work, broadly defined, is a function of man-hours. The more men, the more man-hours are chewed-up per day, and the faster we all finish and go home.

Unfortunately, software development works in precisely the opposite way. Throw more people at an effort and the pace drags. And once the pace drags, human instinct takes over and the urge to add “resources” to an effort takes off, thereby slowing it down further. Brooks’s Law is a law for a reason: it’s like gravity. Try as you may to overcome it, you’re going to get pulled down.

So why is this the case? At the most basic level, it’s of course about communication. More people require more communication, and the number of possible person-to-person channels grows much faster than headcount: n people have n(n-1)/2 of them. Communication isn’t inherently bad, but it’s time-consuming and fraught with inconsistencies and errors (“but Tom told me that feature was taken out of this version of the app”). It’s also about decision-making. Larger teams often have more than one decision-maker, which is deadly. Show me a medium- to large-scale software development effort with more than one decision-maker and I will show you a future disappointment.
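To put rough numbers on that growth: if every team member may need to talk to every other, a team of n people has n(n-1)/2 possible pairwise channels, which is the count Brooks uses in The Mythical Man-Month. Here is a minimal Java sketch; the class and method names are invented for illustration.

```java
class CommunicationChannels {
    /** Number of distinct person-to-person channels in a team of the given size. */
    static long pairwiseChannels(int teamSize) {
        if (teamSize < 0) {
            throw new IllegalArgumentException("team size must be non-negative");
        }
        // Each of the n people can pair with n - 1 others; halve to avoid double counting.
        return (long) teamSize * (teamSize - 1) / 2;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1, 2, 5, 10, 20}) {
            System.out.println(size + " people -> " + pairwiseChannels(size) + " channels");
        }
    }
}
```

Doubling a team from 10 to 20 people roughly quadruples the channels (45 to 190), which is the intuition behind keeping the team as small as the required skills allow.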

OK, so if communication and shared decision-making are at fault, then the optimal team size for a development effort is one, correct? Well, sort of. One talented designer/developer with an innate understanding of the product they are trying to develop is the optimal team size, yes. (Chris Dary from Arc90 is one example of that – he single-handedly built the first version of Kindling and took it from idea to prototype to beta version in weeks. No debating features, no iterating on logos (make it bigger!), no defending decisions.) However, you probably recognize a practical constraint here: these people and conditions are very rare, and software development, like every other human endeavor, is becoming more and more specialized. It’s hard to find all of the required skills in one human – a product visionary who knows JavaScript frameworks, can configure Apache, design interactions and write SQL. So let me rephrase the optimal team size for any software effort:

The optimal team size for any software effort is precisely enough people to cover all of the required skills of that effort and not a single human more.

Follow this rule and you will be optimally positioned for success 100% of the time.

Besides reducing the team size down to the bare minimum, are there other things you can do to resist the tug of Brooks’s Law? Thankfully, there are:

  • Have a fully empowered single Product Manager. This is probably the most controversial. Years ago after reading the (must-read) Dreaming In Code, I wrote this:

    Good software requires a singular vision, and a singular keeper of that vision. The Chandler effort was plagued by ambiguity from the start, what were they building? Did anyone know? Did all of them have the same vision? Any successful software efforts that I’ve been a part of had either a benevolent dictator, a less-than-benevolent dictator or a team with a shared brain. Another manifestation of this is the fact that often great software is built by a team of one – where the vision is never lost in the details of communication. Even worse for the Chandler team, not only did the team not have a shared vision of what they were striving towards, it seemed that no individual did either. Rosenberg articulates this point perfectly later in the book:

    “…software development takes simple elegant visions and atomizes them, separating them into millions of implementation details and interface choices and compromises. The art of making software well is, in a sense, the ability to send a vision through that atomizer in such a way that it can eventually be put back together…”

    A good litmus test here would be to walk around and talk to your software team, asking each member what they’re building. Force them to describe the vision, not the details of a specific task. If you hear entirely different things from the team, you’re likely in trouble.

    That comment received the most feedback, as people were uncomfortable with the notion of one decision-maker. But I believe that even more today than I did then.

  • Document versus communicate. This is a lesson I’ve learned over and over. Communication and discussion are fine, but once a call is made, immediately capture it as fact. That might be in a Requirements Document or, better yet, as a ticket in an issue tracking system, but don’t leave decisions floating in the ether. Decisions are gold; when you arrive at one, treat it as such.
  • Design first, build later. And I don’t only mean wireframes and comps, I mean design everything before writing code. The architecture. The API. Whatever can be thought through ahead of coding pays dividends later. This is an obvious and straightforward fact, yet I see it ignored repeatedly. Design is decision-making and arriving at decisions early is the goal. Having a fully documented and stubbed API, for example, represents tens of hours of thinking and hundreds of decisions that, once in place, empower development teams to build without communicating with each other. And communication is the enemy of productivity.
  • Use good issue tracking. It’s almost 2011; get yourself a decent issue tracking system. Imagine a fireman using a dull axe or a painter using worn brushes. Building software is an exercise in communication and the issue tracking system is where nearly all of this communication plays out. That’s how important it is.

But the absolute, don’t-forget-it, most important thing is team size. The optimal team size for any software effort is precisely enough people to cover all of the required skills of that effort and not a single human more. Ignore this at your peril.

MongoNYC

Thursday, May 27th, 2010

If you’re keeping up with emerging industry trends, you’ll be familiar with the non-relational database (aka NoSQL) movement. If you aren’t keeping up, the basic idea is that traditional databases don’t scale well on commodity-level hardware once you get enough traffic to go beyond what a single DB server can handle. By tweaking or removing design constraints, NoSQL databases (there are well over a dozen) offer one or more features to scale horizontally on commodity servers. This is mainly a problem for very large sites, but most developers have at least some interest in the problem since every startup hopes to become one of the very large sites.

At Arc90, there are only a few areas that would really benefit from the performance of a non-relational database, but one of the things I enjoy most about working here is the general enthusiasm for new technology. The developers here have been independently tracking the whole NoSQL movement for a while and a number of us have actually implemented personal projects in both MongoDB and CouchDB and were pleased with the results. That made the decision to go to the one-day MongoNYC event just a few blocks downtown an easy one.

The conference itself was put together well. My main gripes were the typically terrible tech conference wireless connectivity, and, more importantly, that the hallways were too cramped to really facilitate discussions.

The best talk I attended was Kyle Banker’s Schema tutorial. The discussion wasn’t about forcing schemas into a schemaless database, but rather a tutorial on the different ways to store relations between objects and the benefits and drawbacks to the various approaches. I definitely recommend watching it when they release the video online, but in the meantime you can get a more focused take on the concepts by checking out Kyle’s post on MongoDB and E-commerce.

The other highlight was getting in early to the Gilt talk on real-time analytics. The talk started at noon, which is when Gilt has their daily sale, and they had their Hummingbird analytics app running on the main screen. When Hummingbird first came out, I wondered who would need an analytics app that updated 20 times per second. Watching the traffic go from under 100 to over 5000 requests/sec in the course of about 2 minutes was both the answer to my question and very amusing.

All the 10gen speakers were coherent and well spoken, though some topics were drier than others. The other talks varied but my cynical summary is: “We’re using Mongo! It was easy! It runs fast! No, we’re not using it everywhere. No, we’re not using it for financial transactions.” That isn’t meant as a slight to the speakers, just that there were no horror stories. Good for databases, but it makes for a less entertaining conference.

So, at the end of the day, my outlook on Mongo improved a bit. The indexes are more powerful than I’d realized since they go into list values and objects on indexed attributes. I’d also like to see support for couch-style map/reduce views that get updated on insert, but in general, Mongo has always been the NoSQL database that I thought made the smartest tradeoffs for web server use.

Debugging

Tuesday, November 24th, 2009

In Coders at Work, Joe Armstrong cites Joe’s Law of Debugging: all errors will be plus/minus three statements of the place you last changed the program. (Read the entire page here.) If I had remembered this last night, instead of this morning, I could have saved myself 3+ hours of debugging.

Note to self: next time you’re about to start debugging — especially after 23:00 — for the love of God, first check the diffs of the last couple of commits!

This is just one of many nuggets of wisdom in this excellent book, which I recommend highly.

A Dialog on Processes & Threads

Monday, October 26th, 2009

I’ve been a fan of Ryan Tomayko ever since I stumbled across his How I Explained REST to My Wife in 2005. (That piece was so good that I actually performed it live during a company lunch in early 2006, with the inimitable Kamni Khan.) He’s had a special place in my feed reader since before the Avi Flax release of FeedLounge (yes, that actually happened). So as soon as his most recent essay, I like Unicorn because it’s Unix, appeared in my current feed reader, as usual I read it immediately, and as usual I learned something.

The essay’s main points are that core Unix concepts such as fork(), accept(), select(), and Unix sockets had lamentably been ignored or neglected by proponents of Ruby and Python; that Unicorn is undoubtedly one of the best, most densely packed examples of Unix programming in Ruby I’ve come across; and that Ruby (and Python) developers should learn more about these paradigms, and consider using them where appropriate.

While I learned much from the essay, and enjoyed reading it and following the “* is Unix” meme it sparked, after I read it I had some questions rolling around my head which I couldn’t quiet. So, since comments are currently disabled on Ryan’s site, I wrote him an email. He was kind enough to respond quickly and at length, illuminating a subject I’d been fuzzy on for years. In fact, I thought his responses were edifying to the degree that they should be published. So after securing his permission, I’m reproducing our dialog here.


Use a DVCS to Track Anything Instantly

Sunday, August 16th, 2009

I recently stumbled across the essay In brief praise of DVCSs by Aristotle Pagaltzis, and it was a revelation for me. I’d been wanting to try a DVCS for a while, but hadn’t really gotten around to it — the perceived benefit hadn’t justified the perceived effort. But when Aristotle pointed out that Subversion makes the mental overhead of creating a repository very much greater than any DVCS, something clicked. I saw an immediate benefit I’d gain from even simplistic usage of a DVCS, and so I immediately gave it a try.

I had a text file that I wanted to explicitly track versions of — I use Time Machine for incremental backup, which is great, but doesn’t always give me precise enough control of the snapshots, and I don’t always have the snapshots available. Once I read Aristotle’s essay, it only took me a few minutes to download and install Mercurial, and then I started tracking my file with three simple commands: hg init; hg add *; hg commit -m "initial". Quick, easy, and effective. From that point on, my folder was also a repository, and I could track specific revisions of my files by simply typing hg commit -m "update" at any point.

What I think is the killer realization here is that it is now super easy to make any folder into a repository, and instantly start tracking versions of its contents. I’ve used Subversion to make local repositories and track local files in the past, and it really is many more steps, and much more work. It’s also messier, because the repository and the working copy must be separate entities: you have to think about where the repository should live, as opposed to the actual stuff, the working copy. With a DVCS, no such dichotomy exists. The repository and the actual stuff are one and the same. You can move them, zip them, email them; as long as the metadata directory is preserved (for example, .hg or .git), the folder remains a repository.

I’m thankful to Aristotle for helping me get past the initial barrier to entry, and start actually using a DVCS. I’m looking forward to learning more about them, and using them even more going forward.

(Bonus tip: on OS X, some applications save their data using a special type of file called a “package”, which is actually a directory with a special flag. One example is OmniGraffle, which I use frequently. Because these “files” are actually directories, they can be easily made into self-versioning files with their own self-contained repositories. Just navigate to the directory and run the commands above! From that point on, the file will be a repository and can track its own versions. Pretty useful!)

(I can’t explain exactly why I decided to try Mercurial first, before Git or Bazaar. I’ve read many comparisons of them, and something about those comparisons just made me want to try it. I’m not sure exactly how, but I had built up the impression that Mercurial was somehow simpler, cleaner, more elegant, and more approachable than Git or Bazaar. I don’t know if that’s actually true, but the impression was enough that that’s where I started.)

(Aristotle’s mention of “mental overhead” as a decisive factor in using one system over another is something I’ve been thinking about lately as well, although I’ve used the more unwieldy “cognitive overhead.”)

Restlet Newbie FAQ

Sunday, March 22nd, 2009

We’ve been using Restlet for a few years now, so we’ve had the opportunity to introduce a few employees to it. There are a few questions that invariably come up as people start learning Restlet; I thought it might be helpful to post them here, with the answers, for reference.

Note: answers are written against Restlet 1.1.

Q: How do I access URL query parameters?

There are two answers:

A1, for specific expected query parameters

The easiest way to work with query parameters which your application expects and which have a specific name is to tell your Routes about them.

Typically, you create a Router in an Application, and then add Routes to it using attach(String uriPattern, Class<? extends Resource> targetClass). Calling this method returns a newly created Route which represents your mapping of a path to a class.

To have the Route extract a parameter from the URL for each request, call extractQuery(String attribute, String parameter, boolean first), like so:

router.attach("collection", MyCollectionResource.class)
    .extractQuery("page", "page", true)
    .extractQuery("perPage", "perPage", true);

Once that’s all set, when you’re writing your Resource and want to access that parameter, Restlet automatically puts it in the Map returned by Request.getAttributes().

So you can access your parameter like so:

getRequest().getAttributes().get("page");

Keep in mind that if the parameter isn’t specified for a given request, the result of the get() call will be null. Also, if you pass the value false as the last parameter to extractQuery(), the result may be a List.
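Because the attribute can be missing, a single value, or possibly a List, it can help to funnel it through one small helper before using it. The sketch below is plain Java with no Restlet dependency; the class name, method name, and the choice to fall back to a default on bad input are all assumptions made for illustration, not part of Restlet.

```java
import java.util.List;

class QueryAttributes {
    /**
     * Converts a request attribute value to an int.
     * Accepts null, a single value, or a List of values (the first entry is used).
     * Returns defaultValue when the value is absent or not a valid integer.
     */
    static int intAttribute(Object value, int defaultValue) {
        if (value instanceof List<?>) {
            List<?> values = (List<?>) value;
            value = values.isEmpty() ? null : values.get(0);
        }
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.toString().trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }
}
```

Inside a Resource this might be called as QueryAttributes.intAttribute(getRequest().getAttributes().get("page"), 1), so that a request without a page parameter quietly falls back to page 1.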

A2, for arbitrary query parameters

Of course, sometimes you need to just access an arbitrary query parameter on the fly, or check whether one was specified or not. This is easily done anywhere in a Resource, using getQuery(), like so:

String fooParam = getQuery().getFirstValue("foo");

It can seem odd at first that getQuery() returns a “Form”, but the Form class turns out to be a good fit for this use case, and a very helpful class. For example, it has the useful method getValuesMap() to get a Map containing all of its values.

Q: How do I access request headers?

A1, for standard HTTP headers

Restlet is intended to be a comprehensive Java API for building and calling RESTful applications — and not necessarily RESTful web applications. Therefore, Restlet provides classes and methods which represent all the commonly used RESTful paradigms, semantically.

So, to access the values of most standard HTTP headers, you call the appropriate method of the Request class.

A quick reference:

HTTP Header | Method of Request
Referer | getReferrerRef()
Cookie | getCookies()
If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match | getConditions()
Authorization | getChallengeResponse()

(This table is not exhaustive, it’s an example to illustrate how Restlet works.)

The Restlet wiki contains the page Mapping HTTP semantics, which exhaustively maps HTTP semantics to Restlet. Note that it targets Restlet 1.2.

A2, for custom HTTP headers

Of course, a client can send any header as a custom header, and you may need to check if any have been sent, and get their values. For this reason, Restlet puts all request headers into a Form, which is accessible via a special key in the request attributes.

For example:

Form requestHeaders = (Form) getRequest().getAttributes().get("org.restlet.http.headers");
String myAwesomenessHeader = requestHeaders.getFirstValue("X-Awesomeness");

Q: How do I set response headers?

Response headers work the same way as request headers: class methods for standard headers, and a special key in the Response attributes for custom headers.

A1, for standard HTTP headers

HTTP Header | Method of Response
Location | setLocationRef()
Server | setServerInfo()
WWW-Authenticate | setChallengeRequest()

(This table is not exhaustive, it’s an example to illustrate how Restlet works.)

Some headers are derived from the response representation:

HTTP Header | Method of Representation
Last-Modified | setModificationDate()
ETag | setTag()
Content-Length | setSize()
Expires | setExpirationDate()

(This table is not exhaustive, it’s an example to illustrate how Restlet works.)

A2, for custom HTTP headers

// Fetch the existing custom-header Form, creating it if this response has none yet.
Form responseHeaders = (Form) getResponse().getAttributes().get("org.restlet.http.headers");
if (responseHeaders == null)
{
    responseHeaders = new Form();
    getResponse().getAttributes().put("org.restlet.http.headers", responseHeaders);
}
responseHeaders.add("X-Awesomeness", "true");

Updated on March 23, 2009 to use Form.add() instead of Form.createEntry(), which doesn’t seem to produce the desired effect.

Updated on March 23, 2009 to use Resource.getQuery() instead of getRequest().getResourceRef().getQueryAsForm(); added a note about the relevant version of Restlet; added a link to a wiki page mapping HTTP to Restlet. Thanks Jérôme!

Big Improvements Coming Soon to WADL

Tuesday, February 3rd, 2009

Marc Hadley, the creator of WADL, has started working on a new version, which he recently described in Draft WADL Update.

My favorite enhancement is that method elements can now have multiple response child elements, and the status attribute has moved from the representation element to the response element.


Trac Bookmarklet

Tuesday, January 27th, 2009

Already know your Trac ticket number and don’t want to type a URL, scroll through your browser history, or have to search your Trac tickets? Customize server and path and save this bookmarklet for any Trac you use regularly:

If you save the bookmarklet to your browser’s toolbar you simply click once, enter the ticket number, and you’re in your ticket.

Encoding and XML

Tuesday, January 6th, 2009

In speccing a project last month, we discussed the best way to attach PDFs to the XML documents we send between very distributed systems. We quickly decided it would avoid a whole host of atomicity, reliability, and redesign issues to simply update our XML schema to include Base64 encoded documents inside the body of the XML document itself.

For my half of our application ecosystem I researched .NET’s Base64 encoding/decoding support, got curious about character set encoding, and set out to write a universal encoder to make it simple, easy, and guaranteed safe to insert any kind of binary or text data into an XML document.

But first, a quick nano-refresher: Character encoding specifies how a string of bytes should be mapped to specific text characters. In the simplest case, ASCII, one byte maps to one of 128 possible characters (256 in the common 8-bit extensions): 65=A, 66=B, … 90=Z, etc. In Unicode encodings such as UTF-16 and UTF-32, two (or even four) bytes map to thousands or even (theoretically) billions of characters. So when my program reads four bytes from a text file, I need to know if it represents four 1-byte characters, two 2-byte characters, or one 4-byte character. It’s actually even more complicated than that, but the basic problem is making sure I don’t accidentally turn a 400-byte ASCII text file into 200 Japanese characters. Or vice versa. Or garbage.
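The four-byte ambiguity is easy to reproduce. The sketch below is written in Java (the language of the Restlet examples elsewhere in this archive) rather than C#, and its class and method names are invented for illustration. It decodes the same four bytes twice and counts the resulting characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatch {
    /** Decodes the bytes with the given charset and returns the number of characters produced. */
    static int charCount(byte[] bytes, Charset charset) {
        return new String(bytes, charset).length();
    }

    public static void main(String[] args) {
        // The bytes of the ASCII text "ABCD".
        byte[] fileBytes = {0x41, 0x42, 0x43, 0x44};

        // Read as ASCII: four 1-byte characters.
        System.out.println("US-ASCII chars: " + charCount(fileBytes, StandardCharsets.US_ASCII));

        // Read as big-endian UTF-16: two 2-byte characters (U+4142 and U+4344).
        System.out.println("UTF-16BE chars: " + charCount(fileBytes, StandardCharsets.UTF_16BE));
    }
}
```

Nothing in the bytes themselves says which reading is right, which is why the encoding has to travel alongside the data.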

Fortunately, .NET has robust support for character encoding, so all I have to do is load the correct encoding class and ask it to take care of this for me. If I know I will only ever need to deal with Unicode, that class is Encoding.Unicode (which in .NET means little-endian UTF-16), but for maximum flexibility I can call Encoding.GetEncoding(encodingName) and get any encoding by name. Like so:

public string GetStringFromFile(string myFilename, string encodingName)
{
    // Read the raw bytes, then decode them with the named encoding (e.g. "utf-16").
    byte[] fileBytes = System.IO.File.ReadAllBytes(myFilename);
    System.Text.Encoding myEncoding = System.Text.Encoding.GetEncoding(encodingName);
    return myEncoding.GetString(fileBytes);
}

After I’m done modifying that string I can easily convert it back into a byte array and save it, preserving the original character encoding:

public void SaveStringToFile(string myFilename, string encodingName, string myString)
{
    // Encode the string with the same named encoding and write the raw bytes back out.
    System.Text.Encoding myEncoding = System.Text.Encoding.GetEncoding(encodingName);
    byte[] fileBytes = myEncoding.GetBytes(myString);
    System.IO.File.WriteAllBytes(myFilename, fileBytes);
}

But now let’s get back to my actual business problem, storing a PDF or other binary data in my XML document. Because the bytes I encounter are not supposed to represent character data, attempting to map them to characters may result in nonsense. For example, whether I decode the byte sequence 00 00 00 00 as ASCII, Unicode, or UTF-32, I get four, two, or one null characters respectively, any of which will screw up string processing. Note that use of CDATA sections doesn’t help. Only being very lucky about which byte sequences I encounter would avert disaster, and I don’t like writing lucky code.

Enter Base64, which is designed for exactly this purpose: encoding arbitrary binary data into a string guaranteed to consist of only “safe” ASCII characters and decoding that string back to bytes with 100% fidelity later. Microsoft places Base64 functionality under the System.Convert class, not System.Text.Encoding because it’s more of a conversion and translation process, not a direct byte-to-character encoding like those described above.

To read a file into a Base64 string:

public string GetBase64StringFromFile(string myFilename)
{
    // Read the raw bytes and encode them as Base64 text, wrapped with line breaks.
    byte[] fileBytes = System.IO.File.ReadAllBytes(myFilename);
    return System.Convert.ToBase64String(fileBytes, System.Base64FormattingOptions.InsertLineBreaks);
}

And to decode it and save it back to the filesystem:

public void SaveBase64StringToFile(string myFilename, string myString)
{
    // Decode the Base64 text back to the original bytes and write them to disk.
    byte[] fileBytes = System.Convert.FromBase64String(myString);
    System.IO.File.WriteAllBytes(myFilename, fileBytes);
}

I combined both “real” character encodings and Base64 encoding in my XmlFileEncoder class (attached) to provide unified access to both encodings when working with XML documents. If you know you’re dealing with Unicode, simply call XmlFileEncoder.InsertFileIntoXmlDocument with the encoding UTF-16 and the file will safely be inserted as text. If you don’t always know the file format, or you are dealing with binary files, simply call the same method with the encoding Base64. In either case, a new node will be added to contain your file data and the encoding attribute will record the encoding method so XmlFileEncoder.ExtractFileFromXmlDocument will use the correct character/Base64 decoding automatically.

The demo WinForms app starts up displaying an XML document with some UTF-16 data already encoded into it. Use the controls along the bottom to experiment with inserting differently encoded files (provided, or use your own) using different application character encodings. Some files will clearly look wrong in the XML when you select the wrong encoding. Others may look correct, or almost correct, but when you press the SaveEncodedFile to File button it will report Copy accuracy FAILED when it verifies against the source file. However, the files encoded and decoded using Base64 will always copy accurately. The only downside is that Base64 encoded data is always about one third bigger than the source (four output characters for every three input bytes, plus padding and any line breaks) and much less human readable.
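The size overhead and the round-trip guarantee can both be checked in a few lines. This sketch uses Java’s standard java.util.Base64 rather than the .NET classes above; the class and method names are invented for illustration, and the length formula assumes padded output without line breaks.

```java
import java.util.Arrays;
import java.util.Base64;

class Base64Overhead {
    /** Length of padded, unwrapped Base64 output for the given number of input bytes. */
    static int encodedLength(int inputBytes) {
        // Every group of up to 3 input bytes becomes exactly 4 output characters.
        return 4 * ((inputBytes + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] original = new byte[1000];
        for (int i = 0; i < original.length; i++) {
            original[i] = (byte) i; // arbitrary binary content, including zero bytes
        }

        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(encoded);

        System.out.println("input bytes:     " + original.length);
        System.out.println("Base64 chars:    " + encoded.length());
        System.out.println("predicted chars: " + encodedLength(original.length));
        System.out.println("round trip ok:   " + Arrays.equals(original, decoded));
    }
}
```

Wrapped output, such as .NET produces with Base64FormattingOptions.InsertLineBreaks, is slightly longer again because of the added line-break characters.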

You can find the source code here: XmlEncodingDemo.zip

Have a happy, healthy, and correctly encoded 2009!

Configuring Restlet 1.1 with Spring

Monday, September 15th, 2008

Back in May I wrote a post about using Spring with Restlet 1.0.x and promised another about doing the same with Restlet 1.1.x, which provides better Spring support. Well, the time has come, so here goes!

In Restlet 1.1, which is currently at Release Candidate 1, you can now configure an application completely within the Spring application context. This means that the only Java classes you will need to create are one for starting the application, and one for each of the resources in your application. Although the Restlet authors prefer specifying the configuration in code, I like specifying it in an external, non-compiled configuration file, and this provides a great way to do that.
