The Server-Side Pad

by Fabien Tiburce, Best practices and personal experiences with enterprise software

Relational Databases Under Fire

with 4 comments

There is a certain irony to this post.  It’s a bit like a car salesman trying to sell you a bicycle.  My career so far has largely revolved around relational databases.  That is slowly changing, however, as new storage mechanisms and models emerge and prove better suited to certain requirements.  I discuss a number of them here.

1. Distributed file systems.  Out of the box, distributed file systems scale well beyond the capabilities of relational databases.  Hadoop is an open-source framework whose distributed file system (HDFS) was inspired by the Google File System.  Hadoop also implements MapReduce, a distributed computing layer on top of the file system.

2. Enterprise search servers.  The biggest eye opener in recent years (which we implemented for a public library’s “social” catalogue) has to be Solr.  Solr is based on Lucene and also integrates with Hadoop.  Already in widespread use, this product is poised to gain further adoption as more organizations seek to expose their data (including social data) to the world through searches.  The speed and features of Solr alone sell search servers better than I ever could and quite simply leave relational databases in the dust.

3. RDF stores.  While relational databases are governed by an overarching schema and excel at one-to-many relationships, RDF stores are capable of storing disparate data and excel at many-to-many relationships.  Open source products include Jena and Sesame.  Unfortunately, at the present time, the performance of RDF stores falls well short of relational databases for one-to-many data (the most typical kind in enterprise databases), making their widespread enterprise adoption a long shot.

4. Web databases like this recent (and very quiet) Google announcement on Fusion Tables.  While functionally and programmatically limited compared to other stores, the Google product focuses on rapid correlation and visualization of data.  A product to watch.
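To make the contrast in item 3 a little more concrete, here is a minimal sketch of the same customer data held as relational-style rows and as RDF-style triples.  It is a toy in plain Python, not the API of Jena, Sesame or any real store; the data and the names `customers`, `orders`, `triples`, `orders_for_customer` and `match` are all made up for illustration.

```python
# Relational view: a fixed schema, one customer row and many order rows.
customers = {1: {"name": "Acme"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 250.0},
    {"order_id": 101, "customer_id": 1, "total": 75.5},
]


def orders_for_customer(customer_id):
    """One-to-many lookup: filter the orders table on its foreign key."""
    return [o["order_id"] for o in orders if o["customer_id"] == customer_id]


# RDF-style view: every fact is a (subject, predicate, object) triple,
# so a new kind of fact needs no schema change.
triples = {
    ("customer:1", "name", "Acme"),
    ("order:100", "placedBy", "customer:1"),
    ("order:101", "placedBy", "customer:1"),
    ("customer:1", "follows", "customer:2"),  # many-to-many, no join table
    ("customer:2", "follows", "customer:1"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


print(orders_for_customer(1))
print(match(p="placedBy", o="customer:1"))
```

The relational lookup goes straight to a known column, whereas the triple pattern has to be matched against every kind of fact in the store.  That flexibility is what makes disparate and many-to-many data easy to add, and it hints at why plain one-to-many queries can cost more.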

Seismic shift in data storage?  Not quite.  But an evolution is certainly under way.  Relational databases are in widespread use.  They are highly capable of storing data and data relationships, scale reasonably well and are economical for the most part.  Relational databases are not going away.  But the once dominant technology is being challenged by other models that are more capable, more efficient and/or more economical at handling certain tasks.  By evaluating these technologies against your organization’s needs, you may find surprising answers and ROI.

Written by Compliantia

June 12, 2009 at 9:04 am

4 Responses

  1. What do you think of RDF2RDB?

    William Mougayar

    June 12, 2009 at 10:18 am

    • Interesting topic William, thanks for bringing it up. I have focused mostly on RDB2RDF. In fact, I recently registered a number of domains, including rdb2rdf.com, thinking I might want to build general-purpose conversion technologies down the road. The vast majority of the world’s data resides in the “deep web” and within enterprise environments that are not open to the public web (most companies use a VPN to give their mobile workforce access to company resources). This data is mostly stored in relational databases. RDB2RDF makes sense, even at the enterprise level, as it potentially allows database barriers to come down. Most large corporations have lots of data but limited means to access it. Unless programs and facilities are specifically provided, an organization often can’t correlate data from database A with data from database B. RDF potentially changes all that.

      Of course I am still disappointed by the performance of RDF stores, which tend to be one full order of magnitude slower than relational databases at storing and querying one-to-many data (e.g. one customer with many orders). That’s not surprising considering RDF stores were neither engineered nor optimized for this sort of operation, but it is nonetheless the most important type of data most organizations store (organizations don’t store random facts, they store specific attributes about specific entities: customers, suppliers, employees, etc.).

      To your point, there might be an opportunity to bypass the RDF store API and either convert RDF to RDB or use a DB-backed RDF store and query it directly with SQL. That is certainly doable technically and *probably* faster (if the RDF store is already SQL based), but if you have to do that, why use RDF at all? I am cautiously confident that the performance of RDF stores will increase dramatically once commercial interest develops. These technologies are not that fast today because they don’t need to be. The world’s largest software companies will commit the necessary software engineering resources once there is a commercial interest to serve. Which is why, while semantic technologies might not be completely large-enterprise ready today, I certainly keep them on my radar…

      Fabien Tiburce

      June 13, 2009 at 8:52 pm

      • Fabien, your comments about 1:M performance are reminiscent of RDBMSs a few years ago. When faced with heterogeneous queries, all data would be pulled down to the local database instance and queried locally. Any keys or indexes would be ignored! Even today, with more intelligent query optimizers, it is sometimes better to split up joined-table queries, run them individually, and then rejoin the resulting subsets locally.

        With the indexing and sorting capabilities of HBase and Lucene, you could query the disparate data individually and then create a hash lookup locally for the final join. Possibly a higher-performance interim solution?

        Ian Mosley

        June 19, 2009 at 11:58 am
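The local hash lookup suggested in the comment above can be sketched in a few lines.  This is only an illustration under stated assumptions: the two row lists stand in for result sets already fetched from separate stores (no HBase or Lucene calls are made), and `hash_join`, `customers` and `orders` are hypothetical names.

```python
def hash_join(left_rows, right_rows, key):
    """Join two lists of dicts on `key` using an in-memory hash lookup."""
    # Build phase: index the (ideally smaller) left side by its join key.
    index = {}
    for row in left_rows:
        index.setdefault(row[key], []).append(row)

    # Probe phase: look up each right-side row and merge it with its matches.
    joined = []
    for row in right_rows:
        for matched in index.get(row[key], []):
            joined.append({**matched, **row})
    return joined


# Pretend each list came back from a separate query against a separate store.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "order_id": 100},
    {"customer_id": 1, "order_id": 101},
    {"customer_id": 3, "order_id": 102},  # no matching customer, so dropped
]

result = hash_join(customers, orders, "customer_id")
print(result)
```

Building the index takes one pass over the left side and probing takes one pass over the right, so the join avoids comparing every row with every other row.  The trade-off is that the indexed side has to fit in local memory.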

  2. I like this article, as I spend all day focusing on highly scalable architectures. I recently wrote a piece on my site about how Social Media is killing the standard RDBMS … I think you’d like it: http://www.roadtofailure.com

    Bradford

    June 26, 2009 at 2:09 pm

