Thursday, May 27th, 2010

MongoNYC

By Karl Guertin

If you’re keeping up with emerging industry trends, you’ll be familiar with the non-relational database (aka NoSQL) movement. If you aren’t, the basic idea is that traditional databases don’t scale well on commodity-level hardware once you get enough traffic to go beyond what a single DB server can handle. By relaxing or removing design constraints, NoSQL databases (there are well over a dozen) offer one or more features that let them scale horizontally on commodity servers. This is mainly a problem for very large sites, but most developers have at least some interest in it since every startup hopes to become one of the very large sites.

At Arc90, there are only a few areas that would really benefit from the performance of a non-relational database, but one of the things I enjoy most about working here is the general enthusiasm for new technology. The developers here have been independently tracking the whole NoSQL movement for a while and a number of us have actually implemented personal projects in both MongoDB and CouchDB and were pleased with the results. That made the decision to go to the one-day MongoNYC event just a few blocks downtown an easy one.

The conference itself was put together well. My main gripes were the typically terrible tech conference wireless connectivity, and, more importantly, that the hallways were too cramped to really facilitate discussions.

The best talk I attended was Kyle Banker’s Schema tutorial. The discussion wasn’t about forcing schemas into a schemaless database, but rather a tutorial on the different ways to store relations between objects and the benefits and drawbacks of each approach. I definitely recommend watching it when they release the video online, but in the meantime you can get a more focused take on the concepts by checking out Kyle’s post on MongoDB and E-commerce.
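As a rough illustration of that kind of trade-off (this sketch is mine, not taken from Kyle’s talk), here are two ways to relate a blog post to its comments, with plain Python dicts standing in for documents. Every name in it (`embedded_post`, `referenced_post`, `comments`, `comments_for`) is hypothetical, and no database is involved.

```python
# Plain dicts stand in for documents; nothing here talks to a real database.

# Option 1: embed the related objects inside the parent document.
# One read returns everything, but the document grows with every comment.
embedded_post = {
    "_id": 1,
    "title": "MongoNYC",
    "comments": [
        {"author": "Corey", "text": "Any links to share?"},
        {"author": "Karl", "text": "It depends."},
    ],
}

# Option 2: store comments separately and reference the parent by id.
# The post stays small, but reading a thread takes a second lookup.
referenced_post = {"_id": 1, "title": "MongoNYC"}
comments = [
    {"_id": 10, "post_id": 1, "author": "Corey", "text": "Any links to share?"},
    {"_id": 11, "post_id": 1, "author": "Karl", "text": "It depends."},
    {"_id": 12, "post_id": 2, "author": "Someone", "text": "Unrelated thread."},
]


def comments_for(post_id, all_comments):
    """The second 'query': filter the comment collection by parent id."""
    return [c for c in all_comments if c["post_id"] == post_id]


embedded_authors = [c["author"] for c in embedded_post["comments"]]
referenced_authors = [
    c["author"] for c in comments_for(referenced_post["_id"], comments)
]
```

Both options yield the same list of authors for post 1. The difference is where the cost lands: embedding pays on writes to a growing document, while referencing pays with an extra lookup on reads.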

The other highlight was getting in early to the Gilt talk on real-time analytics. The talk started at noon, which is when Gilt has their daily sale, and they had their Hummingbird analytics app running on the main screen. When Hummingbird first came out, I wondered who would need an analytics app that updated 20 times per second. Watching the traffic go from under 100 to over 5000 requests/sec in the course of about 2 minutes was both the answer to my question and very amusing.

All the 10gen speakers were coherent and well-spoken, though some topics were drier than others. The other talks varied but my cynical summary is: “We’re using Mongo! It was easy! It runs fast! No, we’re not using it everywhere. No, we’re not using it for financial transactions.” That isn’t meant as a slight to the speakers; it’s just that there were no horror stories. Good for databases, but it makes for a less entertaining conference.

So, at the end of the day, my outlook on Mongo improved a bit. The indexes are more powerful than I’d realized, since an index on an attribute also reaches into list values and embedded objects stored under that attribute. I’d still like to see support for Couch-style map/reduce views that get updated on insert, but in general, Mongo has always been the NoSQL database that I thought made the smartest tradeoffs for web server use.
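To make the indexing point concrete, here is a toy sketch of an index that reaches into list values and embedded objects. It is only a conceptual model built on an in-memory dict, not how MongoDB implements its indexes, and the names (`docs`, `get_path`, `build_index`) are made up for the example.

```python
from collections import defaultdict

# Toy documents: "tags" holds a list, "author" holds an embedded object.
docs = [
    {"_id": 1, "tags": ["nosql", "mongodb"], "author": {"name": "Karl"}},
    {"_id": 2, "tags": ["couchdb"], "author": {"name": "Corey"}},
    {"_id": 3, "tags": ["nosql"], "author": {"name": "Karl"}},
]


def get_path(doc, path):
    """Follow a dotted path such as 'author.name' into nested dicts."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def build_index(documents, path):
    """Map each indexed value to the ids of the documents containing it.

    A list value contributes one entry per element, so a lookup on a
    single tag finds every document whose list includes that tag.
    """
    index = defaultdict(set)
    for doc in documents:
        value = get_path(doc, path)
        elements = value if isinstance(value, list) else [value]
        for element in elements:
            index[element].add(doc["_id"])
    return index


tag_index = build_index(docs, "tags")          # index over list elements
name_index = build_index(docs, "author.name")  # index into an embedded object
```

With this in place, `tag_index["nosql"]` and `name_index["Karl"]` each resolve to a set of document ids without scanning every document, which is the practical benefit of being able to index inside lists and subobjects.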

2 Responses

  1. Corey said:

    Hey, Karl.

    I’ve been looking at Mongo among others lately, after hearing one of their founders on the techzing podcast. I’m still struggling with concrete examples of when you would use a NoSQL db. Was this covered? Any links to share?

    Thanks.

  2. Karl Guertin said:

    Depends what you mean by concrete examples. More or less every company presenting at the conference started with a relational database and shifted to Mongo, usually because of performance, so when they do release the videos for the conf, you can get specifics on why they chose to do so.

    I personally would prefer to use a document-oriented database for most applications on the web, which tend to be built from entities that have a core chunk of data (e.g. blog post, user profile) with a number of related properties (title, tags, url) and a minimal need for ad-hoc querying (retrieve comments). The big win is the ability to persist lists/subobjects without having to create tables, perform joins, perform additional queries, etc. This describes a large class of web applications: mail, news feeds, blogs, forums, bug tracking, CMS, etc.

    There are certainly domains where I don’t think the model is as natural: sales, CRM, banking, business reporting. Reporting is actually the biggest deal on this list since virtually every business application wants some sort of analytics to see how well X is working. The more closely your app resembles a relational domain like these, the less sense it makes to use a document DB (though it might fit an object DB very well). It’s certainly doable, just like you can map list attributes to a relational model, but it’s not natural. Also, if you’re following the common pattern of integrating a bunch of apps on the same database, then it doesn’t make business sense to use something else unless it makes financial sense to migrate everything over, which it almost never does.

    The key to deciding between the two comes down to how you choose to model your domain, and that’s where Kyle’s talk was particularly helpful. There’s nothing keeping your app(s) from using both a SQL database and a non-relational one at the same time and, in fact, most of the people presenting were doing just that.

    I suspect that this isn’t as concrete as you’d like, but it really does depend, both on the data and on your environment. This applies even after you’ve chosen to go document-oriented. For server web apps, I really like Mongo. I think they made all the design tradeoffs in favor of running normal internet apps. If, on the other hand, I were going to make some sort of contact database for a small business, I’d probably go Couch so people could run it on their laptops and sync back and forth.

    Sorry for the late response; this got eaten by the spam filter when I posted it on Friday. No clue why.
