
11 / 17 / 2011

My Quest to Define “NoSQL”


“NoSQL”. For the last couple of years it’s been an increasingly hot buzzword in discussions of Cloud applications. This new trend in databases has brought impressive data processing speed and flexibility to traffic-intensive websites and services.

But what are the specifics? What is NoSQL, really? The term, in its current sense, has only been in common use since 2009, after all. I knew it was a label for technologies offering an alternative to the relational database approach that has dominated for decades, championed by well-known systems like Oracle, SQL Server, and MySQL. I knew that NoSQL encompasses exciting new database technologies like MongoDB, HBase, and Cassandra. I also knew that some of the concepts and technologies under the NoSQL umbrella have been around for ages: niche approaches whose time in the mainstream simply hadn’t come. Beyond that, it was a nebulous term in an industry overflowing with products, technologies, and concepts.

However, not being able to concisely define an oft-used term is like having a nagging splinter in my head.

Thus my quest for an end-all definition began. How hard could it be? Light digging betrayed a common tendency to view NoSQL as a saint-like savior from the “oppressive nature” of SQL and relational databases. Freedom from third normal form! Freedom from the tyrannical chains of database schemas! Freedom from associative entities, unintended Cartesian products, foreign key constraints, gox box socks and yill-iga-yaks and everything else “the man” imposed on us all! A veritable revolution!

But there had to be more to it. Surely NoSQL cannot be defined simply in terms of what it’s not? Could a restaurant’s entire menu be defined as “steak and not-steak”? How exactly does one order a “not-steak”? What am I truly implementing with a NoSQL database? What is the concrete, set-in-stone definition of NoSQL?

After much contextual hand-wringing, the answer is…still surprisingly vague. Wikipedia, the world’s leading bastion of irrefutable wisdom (at least for as long as that is convenient for the writing of this post), currently defines NoSQL as:

“…a broad class of database management systems that differ from the classic model of the relational database management system (RDBMS) in some significant ways. These data stores may not require fixed table schemas, usually avoid join operations, and typically scale horizontally.”

In other words, NoSQL refers to a slew of disparate DBMS types that are somehow different from the classic, mainstream relational varieties we all know. Beyond that, it’s a swampy tangle of “some” and “may” and “usually.” As if these waters weren’t muddied enough, some insist that NoSQL databases cannot even be considered truly non-relational.

Oh, and there’s more. Let’s take this to Hudson River levels of murk: consider that NoSQL doesn’t even truly mean “no SQL”…since NoSQL has no unifying language standard, SQL syntax could be a perfectly valid interface to a NoSQL data store. In fact, a language called “UnQL” is being developed as a standard query language for NoSQL environments. Ironically, and perhaps to the chagrin of the anti-SQL crowd, UnQL’s syntax is largely based upon…

…wait for it…

…SQL. It should then come as no surprise that “NoSQL” is now generally considered to stand for “Not Only SQL”, as opposed to the more intuitive (and, in militant circles, perhaps more popular) interpretation.

This was not working. By now my simple search was so clouded and polluted that anyone smoking near this toxic confluence of obfuscation would be in danger of triggering the world’s largest three-eyed fish fry.

By this point it’s pretty clear that the name “NoSQL”, while catchy and quite the hook with its controversial implications, is simply not very accurate or meaningful. It doesn’t describe any particular database technology or concept. Rather, it names a group of database technologies and concepts bound together only by what they are not. The “not-steaks”. In other words, my quest ended up right where it began.

So for the sake of my own sanity, and with the near certainty of disagreement from some contingent or another, I am defining NoSQL as any database system that by design does not encourage the traditional concepts of Edgar F. Codd’s relational model or ACID properties. Or, in layman’s terms, any database system that lets one lay out and extend data with high flexibility, without being overly concerned with the data integrity and consistency considerations of classic relational database design.
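To make that definition a little more concrete, here is a minimal Python sketch (illustrative only, not modeled on any particular NoSQL product) contrasting the two styles. The relational half uses the standard-library `sqlite3` module with a fixed schema and a join; the “document” half is just a hypothetical list of Python dictionaries standing in for a schema-less store, with the author embedded in each record rather than referenced through a key. The table names, field names, and second post are made up for illustration.

```python
import sqlite3

# Relational style: a fixed schema, normalized into two tables.
# posts.author_id is declared as a foreign key (SQLite only enforces it
# when PRAGMA foreign_keys is turned on).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE posts (id INTEGER PRIMARY KEY, "
    "author_id INTEGER NOT NULL REFERENCES authors(id), title TEXT NOT NULL)"
)
conn.execute("INSERT INTO authors (id, name) VALUES (1, 'Andy')")
conn.execute(
    "INSERT INTO posts (id, author_id, title) VALUES (1, 1, 'My Quest to Define NoSQL')"
)

# Reading a post together with its author's name requires a join.
relational_rows = conn.execute(
    "SELECT posts.title, authors.name "
    "FROM posts JOIN authors ON posts.author_id = authors.id"
).fetchall()

# Document style (hypothetical layout): each record is self-contained.
# The author is embedded instead of referenced, so no join is needed, and
# one record can carry an extra "tags" field without any schema change.
documents = [
    {"title": "My Quest to Define NoSQL", "author": {"name": "Andy"}},
    {"title": "Another post", "author": {"name": "Andy"}, "tags": ["nosql", "unql"]},
]
document_rows = [(doc["title"], doc["author"]["name"]) for doc in documents]

print(relational_rows)
print(document_rows)
```

Adding a `tags` column to the relational version would first require an `ALTER TABLE`; the document version simply accepts the new field on one record, with the trade-off that nothing checks its shape or keeps the embedded author names consistent across records.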

I’ll just stick with this. Vague? Sure. Debatable? Absolutely. At least it’s concise. It’s also, for the most part, the same ambiguous definition that is typically offered up for the term. I must concede defeat to the extent of being unable to put a smaller box around the concept of NoSQL. Now if I can just embrace the vagueness, perhaps that splinter in my head will go away.


2 Responses

  1. Cross says:

    Nice post, Andy. I’ve been asking myself the same question over and over. I’m creating my own “NoSQL” solution, keeping in mind that I want to achieve two things that traditional SQL solutions don’t offer directly: unstructured data and exceptional performance. I think the ACID attributes need to be preserved if you are building anything; otherwise, could Blogspot say “oh sorry, the post you spent 3 hours writing was not found, do it again”?
    I think “unstructured data” solves one of the big problems of modern applications. A lot of applications need to break the information posted on a website down into several small pieces, and those pieces mean nothing on their own, so why break them down? Because the RDBMS enforces this unnecessary overhead. It’s better to keep information as it was typed and avoid these transformations (but traditional SQL DBs do not offer a nice way to retrieve the data if it isn’t broken into columns).
    The exceptional performance is a half-truth: you cannot blame the database for poor performance. Most DBs do a very nice job of saving and retrieving data, and most performance problems come from poor design, BUT… the transformation I referred to is the key to this problem.
    Thanks for your effort to create a solid definition. It’s really hard to explain a concept by saying what it is not; as you stated, “not steak” is a lot of things, and “Not only SQL” could mean a car, an airplane… or a steak.

  2. Andy says:

    Thanks for the feedback, Cross. I think the issue is largely due simply to the complete and utter dominance that Codd-model relational database systems have enjoyed for so long. They’ve become so ingrained into the collective mind of the data management world that the very concept of NOT being based on such a model seems so initially earth-shattering as to become a defining characteristic…despite the huge variance, both in terms of purpose and approach, between all of these NoSQL technologies.

    It will be interesting to watch the dust settle over the next few years, as various technologies win out over others and some semblance of a standard (or multiple standards) emerges.
