There is a certain irony to this post. It’s a bit like a car salesman trying to sell you a bicycle. My career so far has largely revolved around relational databases. That is slowly changing, however, as new storage mechanisms and models emerge and demonstrate they are better suited to certain requirements. I discuss a number of them here.
1. Distributed file systems. DFS, out of the box, scale well beyond the capabilities of relational databases. Hadoop is an open-source framework whose distributed file system (HDFS) was inspired by the Google File System. Hadoop also implements MapReduce, a distributed computing layer on top of the file system.
2. Enterprise search servers. The biggest eye opener in recent years (which we implemented for a public library’s “social” catalogue) has to be Solr. Solr is based on Lucene and also integrates with Hadoop. Already in widespread use, this product is poised to gain further adoption as more organizations seek to expose their data (including social data) to the world through searches. The speed and features of Solr alone sell search servers better than I ever could and quite simply leave relational databases in the dust.
3. RDF stores. While relational databases are governed by an overarching schema and excel at one-to-many relationships, RDF stores are capable of storing disparate data and excel at many-to-many relationships. Open source products include Jena and Sesame. Unfortunately, at the present time, the performance of RDF stores falls well short of relational databases for one-to-many data (the most typical kind in enterprise databases), making their widespread enterprise adoption a long shot.
4. Web databases, such as Google’s recently (and very quietly) announced Fusion Tables. While functionally and programmatically limited compared to other stores, the Google product focuses on rapid correlation and visualization of data. A product to watch.
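To make the MapReduce model from item 1 a little more concrete, here is a minimal single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. This is plain Python, not Hadoop's API; the function names and the sample documents are made up for illustration, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical sample input, chosen only for illustration.
counts = word_count(["solr is based on lucene", "lucene is a search library"])
```

Because each document is mapped independently and each key is reduced independently, the work can be spread over as many machines as the file system can feed, which is where the scaling advantage comes from.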
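The many-to-many strength of RDF stores mentioned in item 3 is easier to see with a toy example. The sketch below models a store as a plain Python set of (subject, predicate, object) triples with a simple pattern matcher. It does not use Jena or Sesame, and the data and the `match` helper are hypothetical; real stores would use URIs and a query language such as SPARQL.

```python
# Every fact is one triple; no table schema or join table is declared up front.
# The names and facts below are invented for illustration.
triples = {
    ("alice", "borrowed", "book:1"),
    ("alice", "borrowed", "book:2"),
    ("bob", "borrowed", "book:1"),
    ("book:1", "title", "Dune"),
    ("bob", "follows", "alice"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return sorted triples matching the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for (s, p, o) in store
        if subject in (None, s)
        and predicate in (None, p)
        and obj in (None, o)
    )


# Who borrowed book:1? (many patrons to many books, no join table needed)
borrowers = match(triples, predicate="borrowed", obj="book:1")

# Everything known about bob, regardless of what kind of fact it is.
about_bob = match(triples, subject="bob")
```

Borrowing records, titles and social links all live side by side in the same structure, which is the "disparate data" flexibility described above. The trade-off is that every lookup is a pattern match over generic triples rather than a read from a table laid out for that one purpose.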
Seismic shift in data storage? Not quite. But an evolution is certainly under way. Relational databases are in widespread use. They are highly capable at storing data and data relationships, scale reasonably well and are economical for the most part. Relational databases are not going away. But the once dominant technology is being challenged by other models that are more capable, more efficient and/or more economical at handling certain tasks. By evaluating these technologies against your organization’s needs, you may find surprising answers and ROI.