The Origins of Big Data

While sharing our thoughts on big data with our communications team, we found ourselves telling a story. The story around big data was impromptu! We realized that the oft-quoted Volume, Variety and Velocity can actually be mapped to Transactions, Interactions and Actions. I have represented it using a Visual.ly infographic background.

Here is a summary –

“The trend we observe is that the problems around big data are increasingly being spoken about in business terms, viz. Transactions, Interactions, Actions, and less in technology terms, viz. Volume, Variety, Velocity. The two views represent complementary aspects and, from a big data perspective, promise better business-IT alignment in 2013, as business gets hungrier still for more actionable information.”

Volume – Transactions

More interestingly, as in a story, it flowed along a timeline. We realized that big data first appears on the scene as an IT challenge to grapple with when Volume happens, driven by a growing rate of transactions. Sometimes transactions arrive at several hundred per second; sometimes billions of database records must be processed in a single batch, where the volume is multiplied by newer, more sophisticated models being applied, as in the case of risk analysis. Big data then becomes a serious IT challenge and project to deal with the associated issues of scale and performance for large volumes of data. Typically, these problems are Operational in nature and internal facing.

These large volumes are often dealt with by relying on public cloud infrastructure such as Amazon, Rackspace, or Azure, or on more sophisticated solutions involving ‘big data appliances’ that combine terabyte-scale RAM at the hardware level with in-memory processing software from large vendors such as HP, Oracle, and SAP.

Variety – Interactions

The next level of big data problems surfaces when dealing with external-facing information arising out of Interactions with customers and other stakeholders. Here one is confronted with a huge variety of information, mostly textual, captured from customer interactions with call centers and emails, along with metadata from videos, logs, etc. The challenge lies in the semantic analysis of huge volumes of text to determine user intent or sentiment, project brand reputation, and so on. However, even with the ability to process this volume and variety, getting a measurement that is reasonably accurate, that is ‘good enough’, still remains a daunting challenge.
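To make the sentiment part of this concrete, here is a minimal, purely illustrative sketch in Python: a toy lexicon-based scorer over call-center-style text. The word lists are invented for illustration; production systems rely on trained NLP models rather than hand-built lexicons.

```python
# Toy lexicon-based sentiment scoring over customer interaction text.
# Illustrative only: real pipelines use trained NLP models, not word lists.

POSITIVE = {"great", "helpful", "resolved", "thanks", "happy"}
NEGATIVE = {"broken", "slow", "refund", "angry", "cancel"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]; negative values suggest an unhappy customer."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

if __name__ == "__main__":
    transcripts = [
        "Thanks, the agent was helpful and my issue is resolved",
        "The app is slow and broken, I want a refund",
    ]
    for t in transcripts:
        print(f"{sentiment_score(t):+.2f}  {t}")
```

Even a crude scorer like this hints at why ‘good enough’ accuracy is hard: the signal per message is thin, so value only emerges when scores are aggregated over very large volumes of interactions.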

Value – Transactions + Interactions

The third level of big data appears when one tries to make sense of all the data that is available – structured and unstructured, transactions and interactions, current and historical. The aim is to enrich the information: cross-pollinate the raw text by extracting business entities from it, linking them to richer structured data and to yet other external sources of information, in order to triangulate and derive a better view of the data for more Value.
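As an illustration of what this cross-pollination might look like, here is a hedged sketch that spots known customer names in raw interaction text and links them to structured records. The customer names and fields are hypothetical; a real pipeline would use named-entity recognition and proper entity resolution rather than simple string matching.

```python
# Toy entity enrichment: spot known customer names in raw interaction text
# and link them to structured transaction records. Names and fields are
# invented for illustration; real systems use NER models and entity resolution.

CUSTOMERS = {  # structured "master data", keyed by customer name
    "acme corp": {"id": 101, "segment": "enterprise", "lifetime_value": 250000},
    "globex": {"id": 102, "segment": "mid-market", "lifetime_value": 40000},
}

def enrich(text: str) -> list:
    """Return structured records for every known entity mentioned in the text."""
    lowered = text.lower()
    return [
        {"mention": name, **record}
        for name, record in CUSTOMERS.items()
        if name in lowered
    ]

print(enrich("Acme Corp called twice this week about a delayed shipment"))
# -> [{'mention': 'acme corp', 'id': 101, 'segment': 'enterprise', 'lifetime_value': 250000}]
```

The point of the join is the triangulation: once a mention in unstructured text is tied to a structured record, the interaction can be weighed by what the relationship is actually worth.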

Velocity – Actions

And finally, we deal with the Velocity of information as it happens. This could be for obvious use cases like fraud detection, but also to surface actionable insights before the information goes stale. It requires every aspect of big data to be addressed as the data flows, within a highly crunched time frame. For example, an equity analyst or broker would like to be informed about trading anomalies or patterns detected as intraday trades happen.
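Here is a minimal sketch of what such intraday anomaly detection could look like, assuming for illustration that an ‘anomaly’ is simply a price tick more than three standard deviations away from a rolling mean. A real deployment would run far richer models on a stream-processing engine rather than a plain loop.

```python
# Minimal sketch of intraday anomaly detection on a stream of trade prices.
# Assumption (illustrative): an anomaly is a tick more than 3 standard
# deviations from the rolling mean of the last `window` ticks.

from collections import deque
from statistics import mean, stdev

def detect_anomalies(prices, window=20, threshold=3.0):
    """Yield (index, price) for ticks that deviate sharply from the rolling mean."""
    recent = deque(maxlen=window)
    for i, price in enumerate(prices):
        if len(recent) == window:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(price - mu) > threshold * sigma:
                yield i, price
        recent.append(price)

ticks = [100 + 0.1 * i for i in range(50)] + [130.0]  # a sudden spike at the end
for idx, px in detect_anomalies(ticks):
    print(f"anomalous tick #{idx}: {px}")
```

The essential constraint is the same one described above: the check has to run on each tick as it arrives, because an alert delivered after the trading window closes has already gone stale.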

The business impact of Big Data

First published on blog.bizosys.com, 21 Dec 2012

As a company engaged in big data before the term became as common as it is today, we are constantly having conversations around solutions to big data problems. Naturally, a lot of the talk revolves around Hadoop, NoSQL, and other such technologies.

But what we notice is a pattern in how this is impacting business. One of our clients caters to researchers who until recently were dealing with petabytes of data; we helped them implement our HSearch real-time big data search engine for Hadoop. Before this intervention, the norm was to wait at times up to 3 days to receive a report for a query spanning petabytes of distributed information characterized by huge volume and a lot of variety. Today, with the big data solution in place, the norm is sub-second response times.

Similarly, in a conversation with a telecom industry veteran, we were told that network health has always been monitored across a large number of transmission towers, which together generate over 1 terabyte of data each day as machine logs, sensor data, etc. The norm here was to receive health reports compiled at a weekly frequency. Now, some players are no longer satisfied with that and want these reports daily, possibly hourly, or even in real time.

Not stopping at reporting as it happens, or in near real time, the next question business is asking is: if you can tell us so fast, can you predict it will happen? This is especially true in the world of monitoring IT systems and machine-generated data. We will leave prediction over human-generated data (read: social feed analysis) out of the story for the moment. Predictive analysis could mean predicting that a certain purple-shade, large-size polo neck is soon going to run out of stock in a certain market region given other events. Or it could mean, more feasibly, that a machine serving a rising number of visitors to a site is likely to go down soon because its current sensor data matches a historical pattern; therefore, alert the administrator or, better still, bring up a new node on demand and keep it warm and ready.
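As a rough illustration of the machine-monitoring case, here is a small sketch that flags a node for proactive scale-out when its CPU utilisation has stayed above a threshold for several consecutive samples. The metric, threshold, and window are invented for the example; a real system would learn the warning pattern from historical sensor data rather than hard-coding it.

```python
# Minimal "predict it before it fails" sketch for machine metrics, assuming
# (purely for illustration) that the warning pattern is sustained high CPU.
# Threshold and window are invented; real systems derive these from history.

def should_scale_out(cpu_samples, threshold=0.85, sustained=5):
    """Return True if utilisation stayed above the threshold for the last
    `sustained` samples, i.e. the node is likely to degrade soon."""
    recent = cpu_samples[-sustained:]
    return len(recent) == sustained and all(u > threshold for u in recent)

cpu = [0.55, 0.62, 0.71, 0.88, 0.90, 0.91, 0.93, 0.95]
if should_scale_out(cpu):
    print("ALERT: node trending toward saturation, provision a warm standby")
```

The value is not in the rule itself but in acting on it early: the alert fires, or the standby node comes up, before users ever see the failure.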

So it seems the value of big data lies in its freshness and actionability, and at the most basic level, in simply getting the analysis or insight out many times faster!