Archive for the ‘Scalability’ Category
There is a certain irony to this post. It’s a bit like a car salesman trying to sell you a bicycle. My career so far has largely revolved around relational databases. That is slowly changing, however, as new storage mechanisms and models emerge and demonstrate that they are better suited to certain requirements. I discuss a number of them here.
1. Distributed file systems. DFS, out of the box, scale well beyond the capabilities of relational databases. Hadoop is an open-source distributed storage and processing framework whose file system (HDFS) was inspired by Google’s GFS (Google File System). Hadoop also implements MapReduce, a distributed computing layer on top of the file system.
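The MapReduce model itself is simple enough to sketch in a few lines. The following is a toy, in-memory illustration of the map and reduce phases (a stand-in for what Hadoop actually distributes across many machines), counting word occurrences:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: sum the counts emitted for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)

docs = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = reduce_phase(map_phase(docs))
# 'the' occurs three times across the documents, 'fox' twice
```

The point of the real framework is that the map calls run in parallel on the machines holding the data, and the framework shuffles the intermediate pairs to the reducers; the programmer writes only the two functions above.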
2. Enterprise search servers. The biggest eye opener in recent years (which we implemented for a public library’s “social” catalogue) has to be Solr. Solr is based on Lucene and also integrates with Hadoop. Already in widespread use, this product is poised to gain further adoption as more organizations seek to expose their data (including social data) to the world through searches. The speed and features of Solr alone sell search servers better than I ever could and quite simply leave relational databases in the dust.
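What makes a search server so much faster than a LIKE query against a relational table is, at its core, the inverted index: a precomputed map from each term to the documents containing it. A minimal sketch of the idea (nothing like Solr’s actual implementation, just the underlying data structure):

```python
from collections import defaultdict

def build_index(documents):
    """Build an inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, *terms):
    """Return ids of documents containing ALL of the given terms."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

catalogue = {
    1: "The Old Man and the Sea",
    2: "The Sea Wolf",
    3: "Old Yeller",
}
idx = build_index(catalogue)
hits = search(idx, "old", "sea")  # only document 1 contains both terms
```

A query touches only the postings for the requested terms, no matter how large the collection, whereas a LIKE ‘%term%’ query scans every row. Solr layers relevance ranking, faceting and replication on top of this structure.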
3. RDF stores. While relational databases are governed by an overarching schema and excel at one-to-many relationships, RDF stores are capable of storing disparate data and excel at many-to-many relationships. Open source products include Jena and Sesame. Unfortunately, at the present time, the performance of RDF stores falls well short of relational databases for one-to-many data (the most typical in enterprise databases), making their widespread enterprise adoption a long shot.
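The triple model at the heart of RDF (subject, predicate, object) can be illustrated without any RDF library at all. A toy in-memory triple store, showing how many-to-many relationships fall out naturally because any resource can appear as the subject or object of any number of statements, with no schema declared up front:

```python
# Each fact is a (subject, predicate, object) triple -- no schema required.
triples = [
    ("alice", "knows", "bob"),
    ("alice", "knows", "carol"),
    ("bob", "knows", "carol"),
    ("carol", "memberOf", "choir"),
    ("bob", "memberOf", "choir"),
]

def match(triples, s=None, p=None, o=None):
    """Return triples matching the given pattern; None is a wildcard."""
    return [(ts, tp, to) for ts, tp, to in triples
            if (s is None or ts == s)
            and (p is None or tp == p)
            and (o is None or to == o)]

# Whom does alice know?
friends = [o for _, _, o in match(triples, s="alice", p="knows")]
# Who belongs to the choir? (a many-to-many relationship, no join table needed)
members = [s for s, _, _ in match(triples, p="memberOf", o="choir")]
```

Real stores such as Jena and Sesame index the triples and answer SPARQL queries, but the data model is exactly this: a bag of statements rather than rows in schema-bound tables.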
4. Web databases, such as Google’s recently (and very quietly) announced Fusion Tables. While functionally and programmatically limited compared to other stores, the Google product focuses on rapid correlation and visualization of data. A product to watch.
Seismic shift in data storage? Not quite. But an evolution is certainly under way. Relational databases are in widespread use. They are highly capable at storing data and data relationships, scale reasonably well and are economical for the most part. Relational databases are not going away. But the once dominant technology is being challenged by other models that are more capable, more efficient and/or more economical at handling certain tasks. By evaluating these technologies against your organization’s needs, you may find surprising answers and ROI.
Semantic Technologies will Rise from the Limitations of Relational Databases and the Help of Distributed File Systems
As an architect of large enterprise systems, I look to the Semantic Web with envy and anticipation. And yet, the more I look into the potential of semantic technologies, the more I realize semantics are victims of the success of the very technologies they are trying to replace. The semantic web is a network of global relations. Semantic content is not bound by a single database schema; it represents globally linked data. However, as an expert in database modelling and database-backed systems, I am forced to concede that, for the purposes of each enterprise, a relational database governed by rules (a schema) mostly internal to the organization and serving a certain functional purpose is often all that’s needed. Semantics are, to a large extent, a solution in need of a problem.
And yet I am a strong believer in a semantic future, though not for reasons pertaining to semantics per se. While actual numbers vary by database vendor, installation and infrastructure, relational databases are inherently limited in how much data they can store, query and aggregate efficiently. Millions of rows, yes; billions, no. The world’s largest web properties don’t use relational databases for primary storage; they use distributed file systems. Inspired by Google’s famous GFS (Google File System), Hadoop is a free, open-source distributed file system. It currently supports clusters of 2,000 nodes (servers) and, coupled with MapReduce, allows complete abstraction of hardware across a large array of servers, assured failover and distributed computing. While 2,000 servers seems like a lot, even for a large enterprise, I am amazed how many enterprise clients and partners are dealing with ever-increasing datasets that challenge what relational databases were designed for. Why does this matter? When dealing with millions of files and billions of “facts” on a distributed file system, semantic technologies start making a lot of sense.
In fact, dealing with loosely structured, universally identified content is precisely what semantic technologies were engineered to address. And so I am hopeful. Not that semantic technologies will prevail because of some inherent advantage, but because the future points to gigantic datasets of disparate origins, ill-suited conceptually and technically to be handled by relational databases. It’s not that semantic technologies are better; it’s that they are better suited for the times ahead.
This post aims to answer some of the questions we frequently get from executives on what we do, the business and process behind information technology in general and software in particular. First a preamble. I don’t expect an executive to understand the intricacies and details of what we do as software engineers and consultants. My job is to understand what an executive requires, what “pain points” might exist in the operation of the business, what opportunities might lie ahead and to devise and implement solutions through information technology. My job is to understand and communicate the nature of the solution, scope it, price it, build it and integrate it. Our primary expertise is software, more specifically custom software.
What is custom software?
You can buy “off-the-shelf” software. Software of this type is often, quite literally, available on a shelf at a computer or electronics store. Other times it is downloaded or procured from a commercial or open source vendor. Most people are familiar with this type of software because of the ubiquitous availability of some well known off-the-shelf software. If you have used Microsoft Office, you have used off-the-shelf software. Custom software is purpose built, or rather purpose “assembled” from readily available and custom-built libraries. You don’t buy or download it ready-made.
Is custom software built from scratch?
Not at all. Today’s application development is more accurately described as application “assembly”. Architects and developers combine readily available libraries and components to meet the business and functional requirements of the system and the needs of the organization. The widespread availability of these (often open-source) components has created a new breed of software development, one that relies on rapid prototyping and frequent iterations. Good developers don’t reinvent the wheel. They use tried and true readily available components, libraries and best practices. They don’t make, they assemble.
Why do I need custom software, can’t I customize off-the-shelf software?
It does depend on the software but in the vast majority of cases, you can, to some degree. Appearances can be deceiving however. Making changes to a large one-size-fits-all software application or platform can often be more expensive than purposefully assembling an application from loose components. The economics of “buy vs build” hinge on the nature of the application. This is why neither should be a foregone conclusion. Always start with your business and functional requirements, initially ignoring what you think is doable, perceived costs and complexity.
Brand “X” off-the-shelf software does 80% of what I need. How expensive will it be to build the remaining 20%?
As I said above, I can’t quite answer this question for each and every situation without further analysis. But I can say this with absolute certainty: it will cost more and take longer than you could ever imagine, and often more than the vendor is willing to admit. Nowhere does the 80/20 rule apply more than in systems. You will meet 80% of your requirements in 20% of the time and budget. Commercial vendors know this and are quick to sell you those features that come to mind. Don’t assume what you didn’t see is easy to get; it isn’t. The remainder will be expensive and difficult because, by design, off-the-shelf software is meant to fit most organizations’ needs, not yours specifically. Custom-built software has a more predictable and linear complexity curve. While not all features are equal in complexity and scope, building custom software has few or no limitations. Any experienced professional can accurately scope and estimate the time and costs involved in building the features needed.
How do I kick-start a software project?
Every software project needs a mandate. Software exists to serve a business and functional purpose. Eliciting requirements is a job fit for professionals. Any good software consulting organization will put forth experienced individuals in this area. They will meet your stakeholders, interview current and future users, and seek to understand the current business and functional processes the new piece of software is meant to support, alleviate or replace. From this process, a list of mandatory business and functional requirements emerges. Be specific and get everything in writing. Upon delivery, the software will go through user acceptance testing, which ensures that the system meets all stated requirements and is fit for deployment.
How do I measure success on a software project?
Software should be easy to use. What goes into usability is open for debate, but the outcome isn’t. Are your users productive? Do they (the people actually using the software, be it your employees or clients) find it easy to use? Have previously difficult and time-consuming tasks become easier and faster? Is the software intuitive? Does it lend itself to experienced users and novices alike (a difficult balance, by the way)? Usability is important. Make sure you work with people who read, think and speak usability. There are other facets, but this one cannot be overlooked.
Software needs to be fast. Give the most patient person in the world a web browser, make him wait 4 seconds, and you have a frustrated, irate user. Rightly so. People think fast and expect their software to keep up. Customers demand highly responsive interactions or they move on. Fast requires proper software engineering and infrastructure. Don’t assume any piece of software can scale. That is simply not true. Principles of scalability must be embedded in the application itself. We listen to every word Google, YouTube and Facebook software engineers have to say because scalability is very much a science that relies on software patterns, design and infrastructure decisions. You may not be as large as Google, but scaling down is easier than scaling up. In this regard, there is absolutely no substitute for experience. Don’t hire an organization that hasn’t built something comparable in size or scope. They will learn on the job, they won’t meet your expectations and you will miss your target. Software engineers are worth every penny you pay them. Expensive? Adopt the agile methodology and ensure most of your dollars go towards the end product, not superfluous management (not that management is superfluous, but in agile development, extra process can in fact be detrimental).
Software must be easy to change. If I had to pick one symptom of poorly engineered software, it would be, without a doubt, a pattern of “I asked how long it would take to make small change X and they said it would be Y weeks”. The truth is, not all software is created equal. Good software is what we call “declarative”. It can be changed easily because only key functions are “hard” coded; the interactions between code modules and functions that actually create processes are “soft” coded, typically in XML or configuration files. If your vendor consistently tells you it will take days or weeks to do simple things, they may in fact be honest but (regrettably) incompetent. Talk to a vendor’s existing or previous clients. Was the software delivered on time? Did it perform? Were changes easily accommodated? If any of these answers is negative, move on.
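A tiny illustration of what “soft” coding looks like in practice. The processing steps here are declared in a configuration list (in a real system this would live in an XML or properties file) rather than hard-wired into the code, so reordering, removing or adding a step is a configuration change, not a code change. The step names and functions are hypothetical:

```python
# "Hard"-coded building blocks: small, focused functions.
def trim(text):
    return text.strip()

def lowercase(text):
    return text.lower()

def redact_digits(text):
    # Replace every digit with '#'.
    return "".join("#" if c.isdigit() else c for c in text)

STEPS = {"trim": trim, "lowercase": lowercase, "redact_digits": redact_digits}

# "Soft"-coded pipeline: in a real system this list would be read from
# an XML or configuration file, editable without recompiling anything.
PIPELINE = ["trim", "lowercase", "redact_digits"]

def process(text, pipeline=PIPELINE):
    """Apply the configured steps in the configured order."""
    for step_name in pipeline:
        text = STEPS[step_name](text)
    return text

result = process("  Call 555-1234  ")  # -> "call ###-####"
```

Dropping “redact_digits” from the configured pipeline changes behaviour with zero code changes, which is exactly the kind of “small change X” that should take minutes, not weeks.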
Can my IT department write software?
Some can. However, most IT departments are barely keeping up with the ongoing needs of the business. Freeing up resources to write and integrate complex software is often prohibitive. There is another angle: while some IT departments have in-house talent able to write software, writing enterprise software is complex and very much a profession in itself. Technical skills and development methodologies are taxing and time-consuming to learn and master. A little knowledge can indeed be a dangerous thing. If a system is mission-critical and/or will affect your bottom line, leave it to people who do nothing but software development.
How do I choose a vendor?
In operational and logistical areas, the size of the vendor is often proportional to the size of the project. Software is different, however. Software scales not according to the number of people on the team but according to the experience of the engineers who architected it. I once worked for a large consulting organization. The pitch to new clients was often the same: at the first meeting, they’d bring out the “stars”, the experts. The client was wowed. Clearly, this was money well spent, they felt. Unfortunately, the contract would be signed, the stars would disappear, never to be seen again, and the client would be stuck with a team of recently hired “B” developers. Projects at these large consulting houses notoriously go wrong and over budget. That is not to say there isn’t a place for them. But when buying software, get to know who you are working with. And keep in mind that small teams do great things.
What vendor would you recommend?
I thought you’d never ask! We at Betterdot Systems practice what we preach. We’re a small company of ultra-motivated highly-experienced software professionals who do great things. Speaking with us is not cheap. It’s free. We want to understand your business and your needs before commitments are made or sought. There are other vendors out there. In fact if we feel that your requirements don’t fit our expertise and skill set, we’ll happily recommend a few. Speak with your peers and ask them about their experience with software vendors. And as I mentioned above, ask to speak with a vendor’s clients. A good vendor has happy clients. Happy clients are willing to talk.
Building scalable systems is a deliberate, purposeful and complex undertaking. Architected correctly, following certain design principles, a system will scale (handle a high level of concurrency and throughput), albeit with some effort. On the other hand, systems which were not designed to scale often end up being expensive and unreliable when stressed. So unless you are building a small departmental application, it’s often best to sow the seed of scalability right into the core technologies, the base architecture of your application.
So how do you scale?
1. Identify a few design principles and guidelines before you start architecting. Google made the assumption that no single piece of the application could be deemed 100% reliable. What you think is bullet-proof will eventually fail. Knowing this forces the architect to handle failures gracefully and build fall-back systems. This principle applies to software, hardware and network components.
2. Avoid single points of failure. Application servers are load-balanced. Load balancers and switches are paired. Databases are set up using replication, pooling and/or partitioning. Should any given machine fail, traffic is automatically rerouted to the remaining components.
3. Build your system as a set of discrete components. Each component hides its complexity (and error handling!) from the rest of the system. A component can run on all or some machines. For example, background jobs can be handled by one or more dedicated thread-pool servers instead of bogging down an application server used to serve user requests. Note that, using property and descriptor files, a single application can be instantiated in a number of ways declaratively, so the developer does not have to build several applications.
4. Automate your unit and load testing.
5. Build introspection capabilities into your production system. The worst thing about a bug is not being able to understand it. Threshold-based introspection capabilities must be built into systems so a developer can understand what is happening, who is using the system, what resources are being used, and so on.
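As a small illustration of the component idea in point 3, here is how background work can be pushed onto a dedicated worker pool instead of tying up the threads that serve user requests. This is only a sketch using Python’s standard library; the function names are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor

# A dedicated pool for background work, sized independently of the
# threads that serve user-facing requests.
background_pool = ThreadPoolExecutor(max_workers=4)

def send_welcome_email(user):
    # Stand-in for slow I/O: SMTP, report generation, PDF rendering...
    return f"email sent to {user}"

def handle_signup(user):
    """The request handler returns immediately; the slow part runs on
    the background pool, keeping the request thread responsive."""
    return background_pool.submit(send_welcome_email, user)

future = handle_signup("alice")
result = future.result()  # blocking here is for demonstration only
```

The same separation holds at larger scale: the “pool” becomes a fleet of job servers behind a queue, but the principle, isolating slow background work from request-serving threads, is identical.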
In everyday life, 99% of something is often just as good as the whole thing. 99% of a hot dog will fill you up. Not so in systems. An application with 99% availability will be unavailable on average 1% of the time, or roughly 88 hours a year. Downtime is typically not evenly distributed. In fact, it is more likely to occur when your systems are stressed, during peak time. For this reason, availability is generally stated in “nines”, typically from 99.99% (“four nines”) to 99.999% (“five nines”). A business should consider how many 9’s it needs and how many it can afford. How many 9’s are needed is usually dictated by the cost and the effect of downtime on the business. Will downtime cause damage deemed a) catastrophic and irreparable (e.g. loss of life for a medical application), b) significant (e.g. loss of income for an e-commerce site) or c) merely inconvenient (loss of productivity and frustrated customers for a call center)? While no business wants downtime, 99.999% availability is exponentially more expensive than 99.99% availability. Additional uptime requires expensive investments in software, hardware and networking infrastructure, including load balancers and session replicators.
Lastly, any site or application that claims 100% uptime should be treated with extreme caution. No system is perfect. No operator or system administrator is perfect. Even with hot deployments enabled, you sometimes need software updates and planned maintenance windows. A business is often better off expecting “only” 99.999% availability and having contingency measures in place than expecting, paying for and never getting the elusive 100% mark.
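The downtime figures follow directly from the arithmetic of nines, which is worth sketching because the numbers surprise people. A quick calculator:

```python
def annual_downtime_hours(availability_percent):
    """Hours of downtime per year implied by a given availability."""
    hours_per_year = 365 * 24  # 8,760 hours
    return hours_per_year * (1 - availability_percent / 100)

# 99%     -> about 87.6 hours/year (more than three and a half days)
# 99.9%   -> about 8.8 hours/year
# 99.99%  -> about 53 minutes/year
# 99.999% -> about 5 minutes/year
for nines in (99.0, 99.9, 99.99, 99.999):
    print(f"{nines}% availability -> {annual_downtime_hours(nines):.2f} h/year down")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why each one costs disproportionately more to achieve.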