compliantia.com – Technology
This article addresses the technology stack and infrastructure of the compliantia.com service.
As a hosted service, Compliantia needs to be completely configurable, highly secure and scalable. Compliantia is part of a relatively new breed of software known as Software as a Service (SaaS). Gmail and Google Apps have popularized this type of software, which runs on the vendor’s infrastructure. From a client’s perspective, SaaS is a business play, not a technology one. This essentially makes Compliantia a turn-key service, at least from a technical standpoint, which greatly reduces the risk, costs and barriers to entry. It also allows the organization to focus on what it does best: running its business. Compliantia gives organizations the tools to be operationally successful.
Every retailer operates in a slightly different way. The “look” of a certain retail brand, its programs and promotions are often what makes the brand stand out in a consumer’s mind. Retailers must also address the regional specificities of the various markets in which they operate and account for different store formats, different demographics, etc… In other words, operational compliance does not entail a monolithic set of criteria that apply to all stores and all store formats in all regions. Compliantia allows forms to be “narrow-casted” to certain stores and certain regions. The forms themselves are data driven and the form narrow-casting is also data-driven.
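To make the idea of data-driven narrow-casting concrete, here is a minimal sketch in Python. It is not Compliantia's actual implementation (which runs on Java and Grails); the `Store` and `Form` classes, the field names and the sample forms are all hypothetical. The point is only that the targeting rules live in data, so sending a form to a new region means changing a record, not the code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Store:
    store_id: str
    region: str
    store_format: str


@dataclass
class Form:
    name: str
    # Assumption of this sketch: an empty set means "no restriction".
    regions: set = field(default_factory=set)
    formats: set = field(default_factory=set)


def forms_for_store(store, forms):
    """Return the names of the forms narrow-casted to the given store."""
    return [
        f.name
        for f in forms
        if (not f.regions or store.region in f.regions)
        and (not f.formats or store.store_format in f.formats)
    ]


# Hypothetical targeting data; in a real system this would come from a database.
FORMS = [
    Form("Daily opening checklist"),
    Form("Winter promotion audit", regions={"Ontario", "Quebec"}),
    Form("Drive-through inspection", formats={"drive-through"}),
]

downtown = Store("S001", "Ontario", "urban")
highway = Store("S002", "Alberta", "drive-through")

print(forms_for_store(downtown, FORMS))
print(forms_for_store(highway, FORMS))
```

The urban Ontario store receives the general checklist and the regional audit, while the Alberta drive-through receives the general checklist and the format-specific inspection.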
The need for an entirely data-driven architecture really shaped the selection of the technology stack. The stack itself needed to be configurable and have a proven track record in the enterprise as a scalable and secure platform. Having worked with server-side Java and large enterprise and web-based systems for over 13 years, we found the decision easy. We opted for Java, Spring, Hibernate and Grails. Thread pools, load-balancer readiness, distributed caching, a template engine, mail, data exports, security… Spring and Grails have so much built in that is enterprise-grade, highly available, configurable and scalable. These technologies are at the core of the Compliantia stack.
The database is PostgreSQL. PostgreSQL is truly open source and not commercially owned (unlike MySQL, which is now owned by Oracle). More importantly, its performance under load is excellent and the database is programmatically rich. PostgreSQL supports replication and works well with external connection poolers, so you virtually can’t outgrow it.
The operating system is Linux. The flexibility and configurability of the Linux ecosystem continue to amaze us. Deployment is entirely scripted and automated. Ports and access are precisely controlled. Linux is a sysadmin’s dream and continues to meet our every need for web-based applications.
Last but not least, compliantia.com runs in the Rackspace cloud. Cloud computing allows us to provision capacity on demand, gives us the flexibility we and our customers need and the ability to add server instances as we grow. Rackspace has become one of the largest providers of managed and cloud services by combining a world class distributed and secured infrastructure with fanatical customer service which we experienced first hand.
I’ll briefly mention the usual suspects to round out the technology discussion. Compliantia uses SVN hosted by Beanstalk for source control wherever we are, FogBugz for bug reporting and Pingdom to monitor our service’s uptime and performance.
Compliantia runs on a modern, distributed, highly available and scalable technology stack and cloud infrastructure. It was designed to meet the needs of large national retailers on the web, the BlackBerry™, the iPhone™ and other mobile smart phones.
Thanks for reading. Please visit the Compliantia product website for more information on our innovative store walk-through service.
Fresh Thinking
I really enjoyed attending a FreshBooks workshop today, held by Mike McDerment, FreshBooks’ co-founder and CEO. FreshBooks is the leader in online billing, a rapidly growing category that FreshBooks essentially invented and is continually perfecting. The workshop, Building a Web App Business, addressed the pre-launch phase any web startup faces and covered topics ranging from building to marketing, product management, metrics and financing. I literally took ten pages of notes in four hours. Mike is a wealth of information, candid with his answers and refreshingly transparent. The session was truly inspiring and I will be drawing on today’s take-aways for some time. In the meantime, I wanted to whet your appetite with a few tidbits: a number of quotes from Mike McDerment and the context in which they were given. If you find them as thought-provoking as I did, you really owe it to yourself to get to know FreshBooks and perhaps attend a future workshop.
FreshBooks “We get paid to be an aspirin, a pain killer”.
Launch features “Build the least, you’re in a vacuum”. “You need to build the minimum number of features you need and engage people using it [the service]”.
Feedback “Make it easy for people to give it to you”. “Make it easy for people to reach you, by telephone and email”. “You want to remove barriers”.
Ownership “I want people to act and behave as owners”. “Don’t stack chances against yourself by being a control freak or by putting up with one”.
Founding team “Trust, honesty, loyalty, openness”. “Passion is your fuel”. “If you call them at 3AM, are they going to answer the phone?”.
Lawyers “Lawyers get paid”. “Lawyers are risk reducers”. “If the dark clouds come, things break down gracefully”.
Office space “You do not need an office just because you have a business”. “Rent will increase your burn”.
Categories “Don’t underestimate the power of categories”. “What makes you unique?”.
Choosing a name “Easy to remember. Easy to spell. Describes the category. Describes the benefit. Describes the difference”. “A name is a vessel, you can build a lot of meaning into it”.
The story “Your story is going to influence your strategy more than you know”.
The homepage “The home page must answer: what is it? who is it for? why it matters”.
Blogs “Outspend or out-teach”. “If you don’t have something interesting to write, don’t write”. “You want to build a network”. “Start sharing and teaching”.
PR “PR is a pretty hard thing to outsource”.
Support “Everyone [at FreshBooks] does support”. “When you are building a business, you are building a culture”. “Post your phone number”. “Use a forum, use twitter, be everywhere”.
Usability “If you want to be humbled, watch somebody use your application. You will be floored, you will be shamed”.
Surveys “If you want stats, use an online survey, if you want insights, call people”.
Decisions “You are editor/curator”. “You can’t please all people all the time”. “If you do support, you’ll know”. “Remove the pain, stay true to vision”.
Funding “You need to know your formula before investing capital at the top”. “When you know your users better than anyone. When you have a formula for the $. When you’ve got traction. When you know your market size”. “Angels do 27 times more investing than VCs”. “Can you increase your share price by more than dilution?”.
Competition “I never think about competition”. “I don’t believe customer service is going to get out of style”. “If you only look at your competitors, you are only reacting, not leading”.
Startups “You do stuff in a way a big company cannot”.
Relational Databases Under Fire
There is a certain irony to this post. It’s a bit like a car salesman trying to sell you a bicycle. My career so far has largely revolved around relational databases. That is slowly changing, however, as new storage mechanisms and models emerge and demonstrate they are better suited to certain requirements. I discuss a number of them here.
1. Distributed file systems. Out of the box, distributed file systems (DFS) scale well beyond the capabilities of relational databases. Hadoop is an open-source framework whose distributed file system (HDFS) was inspired by the Google File System. Hadoop also implements MapReduce, a distributed computing layer on top of the file system.
2. Enterprise search servers. The biggest eye opener in recent years (which we implemented for a public library’s “social” catalogue) has to be Solr. Solr is based on Lucene and also integrates with Hadoop. Already in widespread use, this product is poised to gain further adoption as more organizations seek to expose their data (including social data) to the world through searches. The speed and features of Solr alone sell search servers better than I ever could and quite simply leave relational databases in the dust.
3. RDF stores. While relational databases are governed by an overarching schema and excel at one-to-many relationships, RDF stores are capable of storing disparate data and excel at many-to-many relationships. Open source products include Jena and Sesame. Unfortunately, at the present time, the performance of RDF stores falls well short of relational databases for one-to-many data (the most typical kind in enterprise databases), making their widespread enterprise adoption a long shot.
4. Web databases, such as Google’s recently (and very quietly) announced Fusion Tables. While functionally and programmatically limited compared to other stores, the Google product focuses on rapid correlation and visualization of data. A product to watch.
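For readers who have not met MapReduce (mentioned in item 1 above), the following single-process Python sketch shows the shape of the programming model using the classic word-count example. It does not use Hadoop and distributes nothing; the function names are my own. In a real cluster, the map and reduce steps would run in parallel on many nodes, with the framework handling the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for doc in documents:  # each document could be mapped on a different node
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle(pairs))


counts = word_count(["the store is open", "the store is closed"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the same logic can be spread across as many machines as the data requires.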
Seismic shift in data storage? Not quite. But an evolution is certainly under way. Relational databases are in widespread use. They are highly capable at storing data and data relationships, scale reasonably well and are economical for the most part. Relational databases are not going away. But the once dominant technology is being challenged by other models that are more capable, more efficient and/or more economical at handling certain tasks. By evaluating these technologies against your organization’s needs, you may find surprising answers and ROI.
Semantic Technologies will Rise from the Limitations of Relational Databases and the Help of Distributed File Systems
As an architect of large enterprise systems, I look to the Semantic Web with envy and anticipation. And yet, the more I look into the potential of semantic technologies, the more I realize semantics are victims of the success of the very technologies they are trying to replace. The semantic web is a network of global relations. Semantic content is not bound by a single database schema; it represents globally linked data. However, as an expert in database modelling and database-backed systems, I am forced to concede that, for the purpose of each enterprise, a relational database governed by rules (a schema) mostly internal to the organization and serving a certain functional purpose is often all that’s needed. Semantics are, to a large extent, a solution in need of a problem. And yet I am a strong believer in a semantic future, but not for reasons pertaining to semantics per se. While actual numbers vary by database vendor, installation and infrastructure, relational databases are inherently limited in how much data they can store, query and aggregate efficiently. Millions yes, billions no. The world’s largest web properties don’t use relational databases for primary storage; they use distributed file systems. Inspired by the famous Google File System, Hadoop is a free, open-source distributed file system. It currently supports 2,000 nodes (servers) and, coupled with MapReduce, allows complete abstraction of hardware across a large array of servers, assured failover and distributed computing. While 2,000 servers seems like a lot, even for a large enterprise, I am amazed how many enterprise clients and partners are dealing with ever increasing datasets that challenge what relational databases were designed for. Why does this matter? When dealing with millions of files and billions of “facts” on a distributed file system, semantic technologies start making a lot of sense.
In fact dealing with universally marked loose content is precisely what semantic technologies were engineered to address. And so I am hopeful. Not that semantic technologies will prevail because of some inherent advantage but that the future points to gigantic datasets of disparate origins, ill suited conceptually and technically to be handled by relational databases. It’s not that semantic technologies are better, it’s that they are better suited for the times ahead.
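To illustrate what a “fact” looks like in this world, here is a toy Python sketch. RDF expresses each fact as a subject-predicate-object triple, and a query is a pattern with some positions left open. This is not how Jena or Sesame are implemented, and the identifiers (`book:1`, `hasAuthor` and so on) are made up for the example; real RDF uses URIs so that facts from different sources can link to each other.

```python
# Each fact is a (subject, predicate, object) triple. No schema is declared
# up front: any subject can carry any predicate.
TRIPLES = {
    ("book:1", "hasAuthor", "person:ann"),
    ("book:1", "hasAuthor", "person:bob"),
    ("book:2", "hasAuthor", "person:ann"),
    ("person:ann", "name", "Ann"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t
        for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Many-to-many in both directions, with no join table:
print(match(TRIPLES, subject="book:1", predicate="hasAuthor"))    # authors of book 1
print(match(TRIPLES, predicate="hasAuthor", obj="person:ann"))    # books by Ann
```

Adding a new kind of fact (say, a publisher) means adding triples, not altering a schema, which is what makes the model attractive for disparate data of mixed origins.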
Helping Machines Read, A Simple Microformat Case Study
I recently made Betterdot’s Contact Us page both human and machine readable by adding hCard microformat markup to the underlying XHTML. This notion of “machine readable” content is arguably abstract and somewhat obscure, however. What do we mean? What do machines see? Perhaps a picture (or three) is worth the proverbial 1,000 words.
When a human reader, using a web browser, looks at the page, he or she sees this:

Contact page, as seen by human readers
Without semantic markups such as the hCard microformat markup, a machine (for example a Google bot crawling the Betterdot site for indexing) sees this:

Contact page as seen by machines (no microformat markup)
With semantic markups such as the hCard microformat markup, the same machine or bot sees this:

Contact page, as seen by machines with microformat markup
In layman’s terms, microformats help machines “read” data marked up with microformat tags on the page. While “reading” falls short of true semantic “understanding”, microformats are certainly a step in the right direction.
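For the curious, here is a rough Python sketch of how a program might pull contact details out of hCard markup, using only the standard library’s `html.parser`. The sample markup and contact details are invented for illustration (they are not Betterdot’s actual page), the parser only knows a handful of hCard class names, and a production consumer would use a full microformats parser instead.

```python
from html.parser import HTMLParser

# A subset of hCard property class names handled by this sketch.
HCARD_PROPERTIES = {"fn", "org", "tel", "email",
                    "street-address", "locality", "region", "postal-code"}
# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open tag: its hCard properties

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append([c for c in classes if c in HCARD_PROPERTIES])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach the text to the innermost enclosing element with a property.
        for props in reversed(self._stack):
            if props:
                for prop in props:
                    self.card[prop] = (self.card.get(prop, "") + " " + text).strip()
                break


# Hypothetical markup in the style of an hCard-enabled contact page.
SAMPLE = """
<div class="vcard">
  <span class="fn org">Example Co.</span>
  <div class="adr">
    <span class="street-address">1 Example Street</span>,
    <span class="locality">Toronto</span>
  </div>
  <span class="tel">555-0100</span>
</div>
"""

parser = HCardParser()
parser.feed(SAMPLE)
print(parser.card)
```

Without the class names, the program would see only a run of text; with them, it can tell which string is the organization, which is the city and which is the phone number.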
The Road to the Semantic Web is Paved with Microformats
Google recently and quietly announced something huge: “rich snippets”. Rich snippets are smart previews, displayed right on a search results page. While Google has long relied on snippets to attach a bit of information to each link (thus letting the user know what he or she might expect on each page represented by a link), rich snippets go a step further: they extract key characteristics of the page, be it the rating in a review or a person’s contact information. Google doesn’t have to guess it, it knows it. Google’s rich snippets are powered by microformats and RDFa, two semantic standards that are rapidly gaining adoption. Google’s implementation allows semantically marked web content (such as reviews and contact information) to be exposed, aggregated and averaged in a Google search results page. In short, after years in the lab, the web is at last, albeit quietly, becoming semantic!
Microformats are not a substitute for the semantic web, they are a stepping stone and a very important one. They demonstrate the feasibility and value of adding semantic meaning to web page content. They do so using existing browsers and standards. They do so today, in the field not in the lab. By making web pages understandable to both humans (also known as readers…) and machines, using current technologies, current browsers and minimal effort, microformats allow web content to be reliably understood and aggregated by search engines. The future is bright. Google could, for example, calculate an average review for a book from a list of semantically compliant sites. Google could also uniquely identify a user as a single human being across sites. The semantic web, a web of meaning, is finally taking shape.
I am convinced the semantic web is going to change the way we publish content, exchange, correlate and aggregate information, both in the public domain and the enterprise. It’s an exciting time for web professionals who can look forward to building companies and next generation systems that leverage semantic data.
In Toronto and interested in the semantic web? Join us at the Toronto Semantic Web group on LinkedIn.
Evangelists, The Semantic Web Needs You!

First, a confession. What started as a curiosity has turned into a bit of an obsession… Artificial intelligence, natural language processing, data interchange and global ontologies are all, directly or indirectly, facets of the semantic web. There is enough in there to excite the geek in me for three lifetimes and therein lies the problem… Let me take a step back.
In broad terms, the semantic web refers to a global web of unequivocal meaning that can be used and queried by machines, programs and ultimately user-facing applications. In equally broad terms, this amounts to turning loose data (words on a page, with no meaning other than their proximity to other words, which can be counted, similarities inferred, etc…) into information (meaning, purpose and inter-operability). Microformats aside, words like ISBN or UPC on most web sites are just that, words. They mean nothing: they are not tied to the same universal concept, and the words that precede or follow them (usually an actual ISBN or UPC code) are not linked to the same resource. The web was built for people, not machines. People scan a page and quickly understand the purpose of the page and the meaning of captions, buttons and other elements on the page. On the other hand, the semantic web refers to a collection (the web is the largest collection of human knowledge ever assembled) understandable to machines. While user-generated tags and meta-data exist, these alone are generally insufficient to be used predictably and reliably by computer programs. XML is widely used around the web but XML schemas (XML contracts which govern the structure and content of XML documents) are often attached to a single document, a single service or a single organization. This point alone gets to the root of the problem: without the semantic web, there doesn’t exist a single, universally accepted way of specifying a person, a UPC code, a financial service or a purchasable item. The fact that product “A” on site x and product “A” on site y are the same product is established by humans (by comparing brands, labels, model numbers, pictures); it cannot be conclusively and reliably determined by a computer program.
Lastly, while search engines have bridged this gap somewhat, short of a complete Artificial Intelligence system, the information on the web will remain in unstructured data form until technologies like the semantic web become prevalent. In conclusion, the semantic web, a term coined by Sir Tim Berners-Lee and spearheaded by the W3C, seeks to attach meaning to page content so this content can be consumed, queried and inter-related by machines. From the largest collection of text in the world, the internet would be elevated to the largest collection of inter-related, meaningful information in the world.
The semantic web is generally believed to be the next version of the web. Whereas Web 1.0 was about basic publishing, Web 2.0 is social, Web 3.0 is expected to be semantic. Yet for all the promises, its ascension remains clouded with doubts and hindered by real world impediments. The semantic web is a technology of the future that, until now, has remained in the future. On paper, all the required building blocks are here. Standards (W3C recommendations) have been published, parsers, query-engines and core-technologies are available and so are global open-source ontologies. What’s missing?
The “social web” is largely being promoted and evangelized by a combination of online marketing and user-experience professionals. Evangelists are tremendously important in spreading the word and encouraging adoption. On the Toronto scene, Web 2.0 evangelists like David Crow, Matthew Milan and Saul Colt come to mind. And yet the semantic web community hasn’t really reached out to Web 2.0 professionals in general. The conversation mostly revolves around the back-end, infrastructure and core technologies. The semantic web talks about schemas, objects and relationships. It talks about machine languages and parsers. It does not directly address the user experience (although its ultimate goal is just that). To succeed, the semantic web needs to leave the lab and the research department. It needs to make itself palatable to early adopters and would-be evangelists. It needs a business plan, promoters and supporters. It needs to reach out, inform and excite the Web 2.0 community. Why bother? While the first iteration of a semantic ecosystem will most likely focus on the “back-end” (similar to back-end-centered Web 1.0 followed by user-centered Web 2.0), this will likely be followed by a second iteration of user-centered services, heavily skewed toward the user experience and powered by semantic web data. While the web does a lot today, imagine the capabilities of a Web 4.0 front-end powered by a semantic web back-end. The potential is mind-boggling. Let’s go semantic, if you catch my meaning.
Resources:
W3C semantic web homepage: http://www.w3.org/2001/sw/
Wikipedia on semantic web: http://en.wikipedia.org/wiki/Semantic_Web
Sample concept from open-source ontology for semantic web (in human readable format): http://sw.opencyc.org/concept/Mx4rvVi1AJwpEbGdrcN5Y29ycA
Open source (created by HP, java-based) semantic web toolkit: http://jena.sourceforge.net
10 Twitter Tips for Professionals
I am an unlikely fan of Twitter, the rapidly growing “micro-blogging” platform (I won’t call it a site, read on…). For starters, I don’t particularly enjoy gossip. I have no interest in celebrities and I think Smalltalk is a computer language. So like many, I hesitated to join Twitter. I was afraid it would amount to pointless chatter, noise. That was then. This is now: in a matter of weeks, Twitter has not only become useful to me, it has become downright essential. Here are 10 Twitter tips I hope professionals find useful.
1. Twitter is a bit like eavesdropping. The conversation is as good as the participants. Follow interesting people, creative thinkers and prominent speakers, and chances are you are going to be enlightened by a constant flow of insightful tweets. Follow “noise” and the pearls of wisdom will be few and far between.
2. Follow your friends and peers, sure. But mostly, seek out people you wouldn’t normally get to converse with. Unlike on LinkedIn, you can follow virtually anyone. This is unique. Following someone on Twitter is a bit like being allowed in his or her inner, albeit public, circle. Twitter has given me new perspectives from people I may not otherwise meet, listen to or learn from on a day to day basis.
3. Everything you say is public. The search engine in Twitter is very good and real time. The appearance of “inner circle” privacy is just that, an appearance. Be candid (most people are) but tweet accordingly.
4. Being a fairly public and open platform, Twitter is very transparent. You can search Twitter (company name, person, idea) using #hashtags. This gives you a pretty good idea of how the company or idea is being perceived. Real time, unfiltered knowledge. Brilliant for marketers, researchers and just about anyone involved in creating and selling a product or service.
5. Tele-presence. There is a great conference in San Francisco you wish to attend but can’t due to prior commitments. No worries. Look up the hashtag and “listen” for tweets on the conference. Key points and take-aways will probably make it on Twitter before they show up anywhere else. Not quite like being there, but close.
6. Twitter is a platform, more than a site. The web interface is one of many ways to get on Twitter. I installed a desktop client called TweetDeck and a BlackBerry client called TwitterBerry. There are countless other clients which is further driving its adoption. And there lies a valuable take-away on success 2.0: play nice with the community and the community will adopt you and make you successful.
7. Twitter is not just about people, it’s about news. I essentially stopped using RSS and now use Twitter to read updates from some favourite technical news sites such as Slashdot. There again, Twitter is a platform more than an application. Its potential is enormous.
8. Twitter can get your questions answered. Sure, LinkedIn has Q&A’s but the answers take days or weeks to come. Answers on LinkedIn tend to be longer and well thought-out (some anyways) but they still take time. Chances are, unless you are writing a research paper, you need answers at the speed of business. The answers you will get on Twitter are more like insights, facets to the complete answer. Quick, opinionated, maybe a follow-up link or two. From there you can form your own opinion.
9. Twitter restricts you to 140 characters. What good can you say in 140 characters? A lot! Twitter forces you to be concise, succinct, to the point. As a writer, it’s a good exercise in concision. As a reader, it’s a great time saver that stimulates the mind.
10. Remember Laurence Fishburne as Morpheus in the Matrix “No one can tell you what The Matrix is, you simply have to experience it for yourself.”? Well so is Twitter. Because of its openness, its choice of interface and who you follow, Twitter is what you want it to be. Try it and you just might like it.
Follow Fabien on Twitter at http://twitter.com/FabienTiburce
Good Map, Bad Map
I recently attended DMTI Spatial’s excellent Expedition 2009 conference. DMTI is a leader in the field of location intelligence, what most of us call maps. Location based systems are quickly marching into the enterprise by combining an innovative, often interactive, visual interface with demonstrable ROI. When a prominent distributor with a fleet of delivery trucks addresses a crowd and, data in hand, explains how a location based system engineered by Descartes has allowed his organization to save gas and run faster routes with fuller trucks, you know the technology is sound and the returns real.
At this same conference, I also had the pleasure of meeting a representative of zoocasa.com, a Toronto based home search service. Which brings me to the title of this article. If you decide to try this yourself, the following will take less than 2 minutes. Go to zoocasa.com and type a neighborhood name. I typed “Cabbagetown”, a Toronto neighborhood I call home. Within one second, a map comes up. The neighborhood boundaries are shown in blue. Each property available for sale is represented as a pin on the map (as is common with Google Maps, which zoocasa uses). The list of properties is also displayed in a scrollable window to the right. In a matter of seconds, I was able to search and locate all properties available in a given neighborhood in a fast, clean and easy to use interface. As a systems engineer, I know this is not easy to accomplish but you certainly wouldn’t know it using zoocasa. Now, if you will, please attempt to do the same thing using MLS.ca (now Realtor.ca). After clicking on Residential Properties, you end up on the quick search page (a misnomer if I have ever seen one). Enter the same neighborhood name. This is where the mess really begins. The system indicates “The search criteria would return more than 500 properties, the maximum the system can display”. It forces me to repeatedly click on a map (starting on the entire GTA), thereby making me drill down to the location I am interested in. This is very time consuming. Besides, who actually knows the exact boundaries of a neighborhood? Only when I have reached the <500 locations threshold does Realtor.ca present me with something reminiscent of the zoocasa experience (albeit with smaller and clunkier maps).
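One small piece of what makes a neighborhood search possible is deciding which listings fall inside the neighborhood boundary. The Python sketch below shows the standard ray-casting test for that. I have no knowledge of how zoocasa actually implements its search; the boundary and listing coordinates are made-up planar values, and a real system would use a spatial index or a geographic database rather than testing every listing.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count how many polygon edges a ray from the point
    toward +x crosses. An odd count means the point is inside."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical neighborhood boundary and listings, in arbitrary planar units.
BOUNDARY = [(0, 0), (4, 0), (4, 4), (0, 4)]
LISTINGS = {"12 Sample St": (2, 2), "99 Far Away Rd": (5, 2)}

in_neighborhood = [
    address for address, location in LISTINGS.items()
    if point_in_polygon(location, BOUNDARY)
]
print(in_neighborhood)
```

With the boundary known to the system, the user only has to type a neighborhood name; nobody has to draw or drill down to the boundary by hand.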
I am not disparaging the Realtor.ca service (which I have used extensively over the years for my own home searches) but frankly the map implementation is poor. Maps might be on everybody’s mind right now but if you are going to implement them, implement them well. Talk to experts and/or study best practices. Ask yourself the following: as a user (everybody is a user), which map services do you like, which ones don’t you like and why? What will the user experience be like with maps? What value will maps add? How will they be implemented and tied to the rest of the system? Also keep in mind that your customers don’t actually care whether your site has maps (Jakob Nielsen famously said “you shouldn’t listen to what users say; you should watch what they do“). When the novelty wears off, what really matters to your customers is whether they can get the service they expect from your organization quickly and effectively. A poorly implemented map is worse than no map at all. For now, off to zoocasa.com I go…
