Archive for the ‘Architecture’ Category

Quality, a User Problem

Monday, February 8th, 2010

Bobby had just created something to be proud of, a real article of creativity. He had been working on his song for days, hunched over guitars, computers and microphones. A laborious process, but after he had his result, he was satisfied. It was an artifact worth the effort he had put into it.

The only thing left to do was to get it out to the world. In the music industry this would probably be more painful than creating the music itself, but luckily for Bobby he wasn’t in it for the money – so he does what any other modern musician would do nowadays. He posts it on social networks. Namely, Facebook.

Bobby's Status Update on Facebook

After submitting it (and breathing a sigh of relief), he moves on to browse elsewhere for a while. In the back of his mind, though, is the constant thought – are people listening? Are people enjoying my work? Creativity is largely a social act, and Bobby is in the final stage of it: validation.

He checks back every few minutes, and watches his update begin its inevitable trudging down the feed wall into oblivion. His work is slowly overtaken with gems such as:

A lame, anonymized status update on Facebook.

This is a tragedy. And Facebook knows it.

The Problem of Worth

The problem here is simple: the gap in quality between these two posts is very wide. One of them is a genuine article of effort – someone truly creating something novel and sharing it with the world (whether your tastes align with the content or not). The other is half-hearted mumbling at best. Even the authors themselves wouldn’t deny the difference in quality.

Facebook has acknowledged this problem recently, with their introduction of the “News Feed” versus the “Live Feed”. The News Feed is meant to be a subset of the Live Feed – the things that are “important”, an algorithm primarily based on number of comments and likes. Both the implementation and the naming are a bit clumsy, but the effort is noted: Facebook understands the noise problem, and that it has only been exacerbated with the prevalence of third-party application notifications. They’ve tried to solve it, but haven’t really nailed a way to determine the quality of a post.

Quality is a User Problem

“Quality”, in this case, is a very nebulous concept. It’s not really measurable in machine terms – this makes it a particularly hard problem to solve by filtering. “Likes” are not exactly newsworthiness, for example. You could even make the argument that this enters the realm of Strong AI—machine intelligence that matches or exceeds a human’s—due to its requirement of very human characteristics like taste and emotion. And Strong AI is a very long way off from reality. This is simply not a problem made for machines to solve.

So if it’s not a machine problem, it’s a user problem. That means we have to rely on the authors, and their readers, to figure out the quality of their posts for us.

One way to implement this is below – a simple “update volume” slider, that asks the user “How loudly do you want to say this thing?”

A mockup of 'status volume' when posting a new update in Facebook.

Depending on the author’s choice (and a few other factors like frequency of posts), the update would be displayed more or less prominently on their readers’ update streams.

The authors themselves aren’t ignorant; by and large they know the worth of their own submissions. Putting the problem into their own hands to solve may seem a bit strange at first, but it provides us with a very valuable data point to begin from: Are your words worth anything?

This is the hard part; this is the user problem. From this bit of data, we can begin to solve the simpler problems programmatically: Is a user being arrogant and loud by posting high-volume content too often? Automatically scale their volume based on post frequency. Are application notifications annoying, but still potentially useful for some users? Set them as the lowest volume by default, but allow a reader to specify individual modifiers on specific Applications. These are all solvable problems once we’ve pushed the intractable one of Quality into the authors’ hands. This single data point provides us a lot of ground to stand on when making otherwise-difficult decisions about the worth of a post.
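To make the “simpler problems” concrete, here is a minimal sketch of how a feed might turn the author’s slider value into a display prominence. Everything in it is an assumption for illustration: the `Post` fields, the 0.0 to 1.0 scale, the `loud_threshold` and `free_loud_posts` numbers, and the per-application modifier dictionary are all hypothetical, not anything Facebook exposes.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    volume: float                     # author's slider choice: 0.0 (whisper) to 1.0 (shout)
    is_app_notification: bool = False
    app_name: str = ""


def effective_volume(post, loud_posts_last_day, reader_app_modifiers=None,
                     free_loud_posts=2, loud_threshold=0.7):
    """Return the prominence (0.0 to 1.0) a reader's stream should give a post.

    All thresholds are made-up defaults for the sketch.
    """
    modifiers = reader_app_modifiers or {}
    if post.is_app_notification:
        # Application notifications start at the lowest volume; a reader can
        # raise specific applications with an individual modifier.
        return min(1.0, max(0.0, modifiers.get(post.app_name, 0.0)))

    volume = min(1.0, max(0.0, post.volume))
    if volume >= loud_threshold and loud_posts_last_day > free_loud_posts:
        # Authors who shout too often are scaled down by how far over the
        # allowance they are.
        excess = loud_posts_last_day - free_loud_posts
        volume = volume / (1 + excess)
    return volume
```

The point of the sketch is only that each rule is a few lines once the author has supplied the volume; the hard judgment call stays with the human.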

I’d be interested to see how this would play out in the Facebook ecosystem. It seems a noticeably noisier system than other social streams like Twitter, due to third-party applications posting junk. In the end, we’ll either need better quality control, or fewer friends.

Big Improvements Coming Soon to WADL

Tuesday, February 3rd, 2009

Marc Hadley, the creator of WADL, has started working on a new version, which he recently described in Draft WADL Update.

My favorite enhancement is that method elements can now have multiple response child elements, and the status attribute has moved from the representation element to the response element.


Mmmm . . . flaky

Friday, January 30th, 2009

You know you’re a tech geek when the most exciting thing that happened to you during the week is GMail’s new offline mode. What struck me most about the announcement, though, is not that they added offline mode, but that they added “flaky connection mode”. Flaky connection mode attempts to bridge the gap between online and offline. Maybe I’m online right this second, maybe I’m not, so GMail works in offline mode and attempts to quietly synchronize in the background. If I’m online, great, but if my connection bombed, no problem, it will just queue up my changes and try again soon.

As devices get smarter and more portable and more connectable, they’re also going to get flakier. Laptops and the various mobile platforms and smart devices continue to push the envelope of the possible, but they all need to deal with the fact that sometimes you’re on an airplane, sometimes you’re in a tunnel, and sometimes, well, your connection is just plain flaky.

Google has built the most powerful software platform on earth by embracing flakiness. When you perform a search, that request may be farmed out to 1,000 different servers, but the platform doesn’t for one nanosecond assume that all those requests are going to succeed. Failures are expected, and the platform compensates by knowing how to identify them (bad responses, no responses) and work around them (re-send the work to a box that isn’t misbehaving). Flaky connection mode simply extends this server-to-server model down to server-to-client. It’s a recognition that we’ve moved incredible computing power from the tightly controlled confines of the data center and put it out there in the flaky, flaky wild.
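The “identify the failure, then re-send the work to another box” idea can be sketched in a few lines. This is only an illustration of the general pattern, not Google’s implementation; `send` is a hypothetical transport function supplied by the caller, and treating `None` as “no response” is an assumption of the sketch.

```python
def call_with_failover(request, servers, send):
    """Try each server in turn and return the first usable response.

    `send(server, request)` is a hypothetical callable that returns a response
    or raises an exception. A return value of None is treated as "no response".
    """
    failures = []
    for server in servers:
        try:
            response = send(server, request)
        except Exception as exc:          # a bad response or a broken connection
            failures.append((server, repr(exc)))
            continue
        if response is None:              # the box never answered
            failures.append((server, "no response"))
            continue
        return response
    raise RuntimeError(f"all servers failed: {failures}")
```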

When you build internet-scale distributed systems, you should always assume you are in flaky connection mode. Maybe the tubes are down today. Maybe your vendor’s server went down. Even with all the contracts and SLAs and angry phone calls in the world, you fundamentally don’t have any control over that box staying up and reachable when you need it.

I’m working on an application right now which communicates with numerous internal and external web services, each owned by different teams and companies. The business process I need to handle isn’t kicked off by a human, so I can’t ask the user to try again if I hit an error, and I can’t force distributed transactions across these systems, so starting over means I’d duplicate lots of calls.

There were two ways for me to meet the business requirements. The first option was to demand fault-tolerant servers, redundant hardware and software, bug-free software from several different companies, and pixie dust to sprinkle on it all to ensure that nothing ever went down. I went with the second option: expect flakiness and deal with it. Using commodity hardware and software I was able to achieve a very high degree of fault/flake-tolerance as follows:

  1. Break the business workflow down into the most atomic units possible. First I need to acknowledge the job request from System A. Then I need to retrieve the data from System B. Then I need to send it to the vendor’s System C. Then I need to retrieve the response from System C and pass it to System D, and so on.
  2. Treat each of those workflow steps transactionally and durably. Either I have stored the job request from System A in my database and sent my acknowledgment of receipt and updated the job’s status in my database, or I must assume that the step was not begun at all.
  3. Deal with the fact that #2 guarantees I will never miss a step, but it may occasionally cause me to execute it twice. Maybe I did store the job request from System A and send my acknowledgment but I blew up at the very end when I updated the job’s status. If I can’t make that step idempotent, it’s my responsibility to deal with the consequences of duplicate runs. The more atomic I make the steps, the easier this is to do.
  4. Have my system understand the difference between system errors and business errors. Because my app is mainly hitting HTTP REST services, this is relatively easy. 400 series errors mean the client (i.e. me) is wrong, and the job is doomed, so I abort it. 500 series errors, garbage responses, or simply no response at all tell me that the server is temporarily acting flaky and I should try again in a little while when I . . .
  5. Have a sweeper process that runs in the background and calls the next step on any job that hasn’t been updated in, say, 15 minutes. No matter where a job died, all this process needs to know is to call the job’s CallNextStep method and it will resume after the last completed step.
  6. Recognize that sometimes, success isn’t in the cards. Let’s say that after 20 tries over 12 hours, I’m going to raise a notification to our operations people that this one was just too flaky for me and they need to resolve it manually.
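The six steps above can be sketched roughly as follows. This is a simplified, in-memory illustration rather than my production code: a real version would persist each `Job` in a database and update it transactionally (step 2). The names `Job`, `check_response`, `sweep`, and the exception classes are hypothetical, `call_next_step` stands in for the CallNextStep method mentioned in step 5, and counting attempts per step is an assumption of the sketch.

```python
import time
from dataclasses import dataclass, field


class TransientError(Exception):
    """System error: the other side is flaky, so try again later."""


class PermanentError(Exception):
    """Business error: the request itself is wrong, so the job is doomed."""


def check_response(status):
    """Step 4: sort an HTTP outcome into a system error or a business error.

    `status` is the HTTP status code, or None when no usable response arrived.
    """
    if status is None or status >= 500:
        raise TransientError(status)
    if 400 <= status < 500:
        raise PermanentError(status)


@dataclass
class Job:
    steps: list                  # ordered callables, one per atomic step (step 1)
    next_step: int = 0           # index of the first step not yet completed
    attempts: int = 0            # tries spent on the current step
    status: str = "pending"      # pending | done | aborted | needs_operator
    updated_at: float = field(default_factory=time.time)

    def call_next_step(self, now=None):
        """Run one step and advance only after it succeeds (steps 2 and 3)."""
        self.updated_at = time.time() if now is None else now
        self.attempts += 1
        try:
            self.steps[self.next_step](self)
        except PermanentError:
            self.status = "aborted"
            return
        except TransientError:
            return               # pointer unchanged; the sweeper will retry
        self.next_step += 1
        self.attempts = 0
        if self.next_step == len(self.steps):
            self.status = "done"


def sweep(jobs, now=None, stale_after=15 * 60, max_attempts=20, notify=print):
    """Steps 5 and 6: resume stale jobs, and escalate the hopeless ones."""
    now = time.time() if now is None else now
    for job in jobs:
        if job.status != "pending" or now - job.updated_at < stale_after:
            continue
        if job.attempts >= max_attempts:
            job.status = "needs_operator"
            notify(f"job stuck at step {job.next_step} after {job.attempts} tries")
            continue
        job.call_next_step(now)
```

Because a step that dies after doing its work but before the pointer advances will be run again, each step callable still has to be idempotent or otherwise tolerate a duplicate run, exactly as step 3 warns.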

My application will never have to worry about the machine on which it’s running going into a metal elevator, the neighbor changing his wi-fi password from “linksys”, or the complexity of running on a 1,000 node Google cluster. But the principles of flaky connection mode apply to any application with any degree of inter-system dependency. Designing for flakiness allows you to build highly reliable and fault tolerant systems out of very flaky parts.

RESTful Hullabaloo

Monday, August 18th, 2008

I’ve always wanted to use the word “hullabaloo” in a published piece of writing, and now I can finally cross that off my list of life goals.

There’s been a lot of hullabaloo about REST in the web-tech-o-sphere lately, and rather than sum it all up here, and then add my 2¢, I just want to point out that Tim Bray has already done so, with his usual insight: REST Questions.

When all is said and done, I agree with the moderate view that REST is a very useful architectural pattern, but nobody wins when it’s turned into a religion. There are pros and cons of following REST for a given situation, and they should be considered. No architectural approach is a good fit for all situations.

That said, I have found REST to be an excellent pattern for building web services, and I think it’s simpler, in a good way, than any RPC style (such as SOAP, XML-RPC, or anything homegrown) once you account for factors such as documentation, caching, and scalability, among many others.

I also think following REST is generally better than building “plain old XML”, “plain old JSON”, or “plain old anything” web services, because following any architectural style is generally better than following none. But even that’s not always true: sometimes there’s value in building something quick and dirty that just gets a job done.

Migrating from TFS to SVN

Thursday, December 27th, 2007

Our 180-day evaluation copy of Microsoft Team Foundation Server expired today. We had decided for a variety of reasons not to purchase it (let the blog comments begin!), and to migrate our .NET source code from TFS to Subversion, which we use for all other projects. Naturally we needed the final copy of the source, and any history we could migrate over easily would be gravy, but we decided that for our needs (and not necessarily yours!), it wasn’t worth investing much time or money in preserving all historical details (let the blog comments continue!). While I found lots of tools and blog posts about migrating from SVN to TFS, I couldn’t find anything about going from TFS to SVN without writing lots of code or buying expensive software we’d use exactly once. With a little tinkering I came up with the following method to export source code snapshots by date from Team Foundation Server so that we could commit changes sequentially by week into Subversion.

Yes, it loses check-in and check-out comments, but migrating source metadata would be a lot more work than migrating the source itself, and again, for our needs it was not critical. After all, our code is well organized and self-documenting. Isn’t yours? Hope this is useful for someone else out there. Yeah, it’s a bit manual, but so what? In maybe an hour or two of total development+execution time I was all done, and TFS was retired. But by all means, feel free to automate this further and post your improvements.

  1. Create work folder “source” wherever you want
  2. In the work folder’s parent, create the batch file TFSGet.bat containing the following three lines:
    cd source
    "c:\Program Files\Microsoft Visual Studio 8\Common7\IDE\TF.exe" get "*.*" /all /version:D%1 /recursive /overwrite
    cd ..
  3. Using the File -> Source Control -> Workspaces command in Visual Studio, point your project workspace to the new work folder.
  4. Run the batch file with the first date, e.g. TFSGet 09/01/2007 (the date must be in MM/DD/YYYY format)
  5. Use Subversion tools such as TortoiseSVN to add this initial revision to Subversion.
  6. Delete all the source code from the work folder, leaving the hidden .svn folder created by Subversion.
  7. Repeat steps 4 through 6, incrementing the date by day, week, month, etc. until you get to present day.
  8. Back up your Team Foundation Server databases through SQL Server’s maintenance tools, just in case!
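For anyone who wants to automate the repetition in steps 4 through 7, here is a rough, untested-against-TFS sketch of the loop. It assumes it is run from the work folder’s parent (next to TFSGet.bat), that the “source” folder is already a Subversion working copy after the initial import in step 5, and that the command-line `svn` client is on the path. The function names are hypothetical. It only builds and optionally runs the commands; it does not perform the step 6 clean-out or tell Subversion about files that disappeared between snapshots.

```python
import subprocess
from datetime import date, timedelta


def snapshot_dates(start, end, step_days=7):
    """Return snapshot dates from start up to end, always finishing on end."""
    dates = []
    current = start
    while current < end:
        dates.append(current)
        current += timedelta(days=step_days)
    dates.append(end)
    return dates


def commands_for(snapshot):
    """Build the commands for one snapshot: fetch from TFS, then commit to SVN."""
    stamp = snapshot.strftime("%m/%d/%Y")      # TFSGet expects MM/DD/YYYY
    return [
        ["cmd", "/c", "TFSGet.bat", stamp],
        ["svn", "add", "--force", "source"],   # pick up files new in this snapshot
        ["svn", "commit", "source", "-m", f"TFS snapshot as of {stamp}"],
    ]


def migrate(start, end, dry_run=True):
    """Print (or, with dry_run=False, execute) the commands for every snapshot."""
    for snapshot in snapshot_dates(start, end):
        for command in commands_for(snapshot):
            if dry_run:
                print(" ".join(command))
            else:
                subprocess.run(command, check=True)
```

Calling `migrate(date(2007, 9, 1), date(2007, 12, 27))` with the default `dry_run=True` just prints the weekly command list, which is a cheap way to sanity-check the dates before touching either repository.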

Whose Value Is It Anyway?

Monday, December 17th, 2007

I recently completed a software development project for a client. I came up with what I thought was a great design to meet their very unique needs, and I worked with a really smart team who thought deeply about every design decision and produced world-class technology. The only negative apparent to me at the time was that my original time estimate ended up looking like the sales tax on the actual effort.

Big deal. Another late IT project, right?

As I combed through the charred wreckage of my project plan for its black box, I thought about the projects I’ve worked on where my estimates were dead-on, and I thought about the ones where I didn’t see my kids for days at a time. This project gave me a brutal reminder that a software development project’s ability to be estimated accurately is inversely proportional to the amount of new technology involved.

Some development projects are really deployment projects. You aren’t creating anything truly new. You’re mainly just gluing together pieces you’ve built or bought many times before. There’s nothing wrong with that. It might not be a thrill a minute, but it’s the lowest risk way to build software . . . if you can meet the requirements.

And that “if” is key. If all projects were that simple, you’d never do any new development. We’d call ourselves Mashup Engineers, not Software Developers, and projects would last a few hours and launch on time every time, instead of an unknowable number of weeks, months, or years. New mashups might come along that blew everyone’s mind and created new zillionaires (Engadget + Amazon = RidiculousGadgetOfTheMonthClub.com !), but the value added by the people actually coding it would be low. After all, how much better can my glue be than yours?

But most projects require some new development. Let’s say the typical project is 90% familiar to you. You always need to design a database, generate reports, write classes to enforce business rules and security, etc. Because this is such cookie-cutter work, you may feel like you’re not doing “valuable” development, but remember that to your client, value = solved business problem, even if to you, value = new data persistence paradigm that’s 40% more performant in low-memory situations.

This is a fundamental disconnect between developers and humans. We developers want to develop, and it’s hard for us to turn that drive off, even when presented with a solved problem. Hell, especially when presented with a solved problem. Come on, if you had to be 5 minutes late for your 10th anniversary to make a chunk of code that nobody but you ever sees or complains about 30% faster, you know you’d at least be tempted.

And what about that 10% of the project that’s new development? I don’t just mean writing yet another CRUD class for yet another project. I mean “new” as in “novel”. As in “uncharted territory”. As in “this will be fun, but let’s face it, you don’t really know exactly what you’re getting yourself into.”

Even assuming your project is perfectly scoped (scope generally doesn’t creep; it just becomes more clear over time), estimating how long it will take to work with a piece of new technology is the riskiest part of software development. Dependencies come up you didn’t anticipate. The new thing doesn’t work the way you had hoped. Any estimate you give for this work is simply a guess, though there is no question that your client will reap great benefits if you manage to deliver it some day. By definition, if this functionality didn’t exist before, none of their competitors has it!

So to developers, it often seems that 90% of the project is boring and adds little value, but 10% is fun and glamorous and adds great value. Alas, the client values the software differently than you, and they cut the checks. Clients care deeply about being able to get back to business when you deliver the software they need. They do not care if their project is 90%, 95%, 99%, or even 100% boring to alpha geeks. They care about 100% complete and delivered.

The correct approach is thus to keep as much of your project out of that lawless 10% area as you can. Wherever possible, put on your Mashup Engineer hat and reuse existing software, libraries and frameworks until all that remains is the innovation directly necessary to address the client-specific business problem you were hired to solve, and no more. You may find that your project isn’t 90/10 after all. Maybe it’s 95/5. You’ve just reduced your project risk by 50%. Hey, maybe it’s 99/1 and you really will ship on time! The less innovation you do, the better your chances of doing it well when it’s actually beneficial to the client, not just intellectually stimulating to you.

If you know your software is going to be deployed numerous times or expanded immediately in ten different directions, investing the additional time, energy, and neurons up front can pay for itself many times over. But if you are building a system for one client, keep your innovation under control and deliver the value they want to get, not the kind you want to write.

So what really happened on my big late project? It had some fairly unique requirements that couldn’t be addressed with the usual tools in my toolbox, nor could we find much existing software that fit the bill either. We thus felt justified in developing more technology than usual. Still, once you scratch an itch it can be hard to stop. We found ourselves rapidly approaching our intended ship date, but instead of a finished client application, we had a far more flexible and powerful world-class software platform . . . but it was only halfway built. We did eventually ship, and I’m extremely proud of the technology we created, but maybe, just maybe, it was the solution to the much bigger and more abstract problem I wish existed, not the one the client actually had.

On Programmers, Programming Languages, and Frameworks

Tuesday, October 16th, 2007

Derek Sivers writes 7 reasons I switched back to PHP after 2 years on Rails, an admirably plain-spoken article, chock-full of gems about programmers, programming languages, and frameworks. He puts a lot of things into perspective.

Many of the hordes of commenters missed Derek’s point by a large margin. It’s edifying to search the web page for “Derek” and read his responses in the comments.

Sometimes you are truly better off starting from scratch

Monday, August 6th, 2007

“Reuse, Reuse, Reuse” is a typical mantra these days, leaving me to wonder if any new ideas are ever written. Now of course I’m being a little sarcastic here, but the call to reuse code, and to use as much free code as you can get your hands on, seems to be the norm. I believe that there are very valid reasons not only to start from scratch but also to build certain code bases in-house.

The reasons why we often opt for prefab code (or previous incarnations) instead of coding from scratch are varied: our urge to rush to market, hesitation to throw away code, etc. Sometimes we feel that it would be a waste to not reuse old code, or even “why waste our time writing code that someone else already wrote?” Some good reasons why you should write from scratch:

  1. Maintenance: Plugging 20-odd libraries from 20 separate sources into a single project can become a maintenance nightmare, as they might step on each other and may not have been written with the most optimized or cleanest code.
  2. Adding Features: Adding features to some prefab code bases can make the code unmanageable, as hacks are frequently introduced to solve simple issues.
  3. Scalability: Do you know for a fact that the old code or the plugged-in library will scale? Do you have the time to perform a thorough code walkthrough of the new code? If you do, you might actually have time to write new code.
  4. Use the Right Tool (screwdrivers don’t make good hammers): Using old code (especially from legacy platforms written in older languages, such as non-web languages used in a web platform) means struggling over hurdles that newer languages were designed to clear.
  5. Learning: There’s nothing wrong with trying to figure out how to accomplish a task on your own; in the end you’ll be rewarded with new knowledge and a firm understanding of how the code you are using actually works.

Don’t take these reasons to mean “you should always write from scratch”, but sometimes it’s just better than dropping <insert code here> into your next project.

RESTful Revert

Wednesday, June 13th, 2007

One of our goals for this blog is to give a glimpse into how we work, and how we think. In that spirit, here’s a recent discussion that took place in a Trac ticket. The task was to add, to a RESTful web service, a way to revert a resource to a previous version.


Finally, a REST Book!

Wednesday, November 22nd, 2006

I was very excited to stumble across a link to the book REST Web Services, by Leonard Richardson and Sam Ruby, which is slated to be released in May 2007. It can’t come soon enough for me.

I’ve been building REST web services for about a year now, and I was interested in the style well beforehand. Over the past year, I’ve become increasingly enamoured of the style, and I’ve enthusiastically embraced it as the way to build web services and APIs. I’ve gone so far as to become Arc’s Architect for Services, with the mandate to ensure that all our web APIs follow RESTful principles and design patterns.

The single biggest problem with REST has been a lack of clearly established best practices that an initiate could follow to get up to speed with the style quickly. Until now, someone wanting to create a REST service had to start with the dissertation, then a Q & A, a marital dialogue, and a FAQ – before they were ready to peruse hundreds of messages at rest-discuss. It’s an exhausting series of hurdles to jump over; it’s no wonder that adoption of REST has been slow and uneven.

But lately things have started getting much better, rapidly. There’s now a publicly available example of a best-practices REST API, Blinksale’s. And REST is getting all sorts of high-profile attention. (Interestingly, much of the recent writing has been in the form of dialogue, which isn’t as common as it should be in net-tech circles. Ryan Tomayko should get a lot of credit for kicking that off.)

I have high hopes that this book will firmly establish REST as a practical, effective architectural style, one that’s as mature in its application as in its theory. I’m really looking forward to seeing it!

Now, since we’re finally getting practice cemented down on top of the ideas, it’s time for some better tool support! More on that later.