The Server-Side Pad

by Fabien Tiburce, Best practices and personal experiences with enterprise software

Semantic Technologies will Rise from the Limitations of Relational Databases and the Help of Distributed File Systems

with 4 comments

As an architect of large enterprise systems, I look to the Semantic Web with envy and anticipation. And yet, the more I look into the potential of semantic technologies, the more I realize that semantics are victims of the success of the very technologies they are trying to replace. The Semantic Web is a network of global relations. Semantic content is not bound by a single database schema; it represents globally linked data. However, as an expert in database modelling and database-backed systems, I am forced to concede that, for the purposes of each enterprise, a relational database governed by rules (a schema) mostly internal to the organization and serving a specific functional purpose is often all that’s needed. Semantics are, to a large extent, a solution in need of a problem.

And yet I am a strong believer in a semantic future, though not for reasons pertaining to semantics per se. While actual numbers vary by database vendor, installation and infrastructure, relational databases are inherently limited in how much data they can store, query and aggregate efficiently. Millions of records, yes; billions, no. The world’s largest web properties don’t use relational databases for primary storage; they use distributed file systems. Inspired by Google’s famous Google File System and MapReduce designs, Hadoop is a free, open-source distributed file system and processing framework. It currently supports 2,000 nodes (servers) and, coupled with MapReduce, abstracts away the hardware across a large array of servers while providing failover and distributed computing. While 2,000 servers seems like a lot, even for a large enterprise, I am amazed at how many enterprise clients and partners are dealing with ever-increasing datasets that challenge what relational databases were designed for.

Why does this matter? When dealing with millions of files and billions of “facts” on a distributed file system, semantic technologies start making a lot of sense. In fact, dealing with universally marked, loosely structured content is precisely what semantic technologies were engineered to address. And so I am hopeful. Not that semantic technologies will prevail because of some inherent advantage, but that the future points to gigantic datasets of disparate origins, ill suited both conceptually and technically to being handled by relational databases. It’s not that semantic technologies are better; it’s that they are better suited to the times ahead.
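
To make the idea of aggregating “facts” spread across many servers a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop code: the shard contents, the `ex:` identifiers and every function name are hypothetical, and each in-memory list stands in for a file that would live on a different node. It only illustrates the map, shuffle and reduce steps applied to subject-predicate-object triples.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical "facts" as (subject, predicate, object) triples.
# Each list stands in for a file stored on a different node.
SHARD_A = [
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
]
SHARD_B = [
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:toronto"),
]


def map_triples(shard):
    """Map step: emit a (predicate, 1) pair for every triple in one shard."""
    for _subject, predicate, _obj in shard:
        yield predicate, 1


def shuffle(pairs):
    """Shuffle step: group the emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def count_predicates(shards):
    """Count how often each predicate occurs across all shards."""
    mapped = chain.from_iterable(map_triples(shard) for shard in shards)
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    print(count_predicates([SHARD_A, SHARD_B]))
```

Because no shard needs a shared schema beyond the triple shape itself, new sources can be added as new shards without remodelling anything, which is the property the post is pointing at.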

Written by Compliant IA

June 3, 2009 at 10:04 pm

4 Responses


  1. Great observations. I think you have really nailed it with a non-geeky, clear-eyed view of where we currently are in the transition to semantic technology. Relational databases became the technology of choice for the enterprise because of the need for ACID, but like any technology they have limitations that the marketplace has tried to address over time. You are absolutely correct that a catalyst (gigantic datasets of disparate origins) will cause the enterprise to start the transition.


    June 4, 2009 at 2:01 pm

  2. My company will soon introduce a product to provide large-scale, high-speed RDBMS features with ACID capability, so your observations on the size of datasets and the need for ACID are nice to hear!

    Matthew Fowler

    June 8, 2009 at 6:33 am

  3. […] to databases.  He brought up the interesting notion of Web 3.0, the Semantic Web, and its take on data and context: specifically how data will be sourced intelligently, and be referenced from its original […]
