First published on blog.bizosys.com 21 dec, 2012
As a company engaged in Big data before the term became as common as it is today, we are constantly having conversations around solutions that have a big data problem. Naturally, a lot of talk ranges around Hadoop, NoSQL, and other such technologies.
But what we notice is a pattern in how this is impacting business. There is a company that caters to researchers who till recently were dealing with petabytes of data. This is a client company and we helped implement our HSearch real time big data search engine for Hadoop. Before this intervention, the norm was to wait for upto 3 days at times to receive a report for a query spanning the petabytes of distributed information that was characterized by huge volumes and lot of variety. Today, the norm has changed with big data solution and it is about sub second response times.
Similarly, in a conversation with a Telecom industry veteran, we were told that the health of telecom has always been networks monitored across large volume of transmission towers and together generate over 1 Terabyte of data each day as machine logs, sensor data, etc. The norm here was to receive health reports compiled at a weekly frequency. Now, some players are not satisfied with that and want to receive these reports on a daily basis, and possibly hourly or even in real time.
Not stopping at reporting as it happens, or in near real-time, the next question business is asking, if you can tell so fast, can you predict it will happen, especially in a world of monitoring IT systems and machine generated data. We will leave predicting around human generated data analytics (read – social feed analysis) out of the story for the moment. Predictive analysis could mean predicting that a certain purple shade large size polo neck is soon going to run out of stock for a certain market region given other events. Or it could mean, more feasible, that a machine serving rising number of visitors to a site is likely to go down soon since its current sensor data indicates a historical pattern, therefore, alert the adminstrator or better still bring up a new node on demand and keep it warm and ready.
So it seems the value of big data is in its degree of freshness and actionability, and at most basic level, simply get the analysis or insight out faster by a manifold factor!