Welcome to the Data Age
The name “big data”, today's buzzword in the software market, is intuitive: it suggests that the data is big. The question everybody asks is how big. The answer is not so simple, because several factors determine what kind of data falls into this category. History tells us of the ages we have lived through, from the Stone Age to the Information Age. It is no exaggeration to say that today we are living in the Data Age. Data is the prime focus of analysis, storage, and retrieval. Big data can be considered a problem domain in today’s fast-growing digital age.
Data Inflow Sources:
We should consider some facts before going deep into the technicalities of the subject. The flood of data is coming from many sources. Professionals estimated the size of the digital universe in 2006 at 0.18 zettabytes and predicted tenfold growth by 2011. Present-day statistics tell us that the data has grown to close to 2.7 zettabytes (1 ZB = 10^21 bytes) in 2014. Some of the organizations contributing to this flood of data are listed below.
➢ The New York Stock Exchange generates one terabyte of data per day.
➢ Facebook, the best-known social networking giant, hosts approximately 10 billion photos
taking up one petabyte of storage.
➢ The Internet Archive’s data is growing at 20 terabytes per month.
The conclusive fact is that we produce 2.5 quintillion bytes of data each day, so much that 90% of the data in the world today has been created in the last two years.
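To get a feel for the scale of these figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: it assumes decimal (SI) units, a 365-day year, and that the 2.5 quintillion figure is measured in bytes. The names `UNITS` and `convert` are made up for this example.

```python
# Decimal (SI) storage units in bytes; 1 ZB = 10**21 bytes, as stated in the text.
UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12,
         "PB": 10**15, "EB": 10**18, "ZB": 10**21}


def convert(value, from_unit, to_unit):
    """Convert a quantity between two decimal storage units (hypothetical helper)."""
    return value * UNITS[from_unit] / UNITS[to_unit]


# New York Stock Exchange: 1 TB per day, accumulated over a 365-day year.
nyse_pb_per_year = convert(1 * 365, "TB", "PB")

# Internet Archive: 20 TB per month, accumulated over 12 months.
archive_tb_per_year = 20 * 12

# Facebook: 1 PB spread over 10 billion photos gives the average photo size.
avg_photo_kb = convert(1, "PB", "KB") / 10_000_000_000

# Daily production, assuming the 2.5 quintillion figure is in bytes.
daily_eb = convert(25 * 10**17, "B", "EB")

# Growth of the digital universe from 0.18 ZB (2006) to 2.7 ZB (2014),
# written as integer byte counts to keep the division exact.
growth_factor = (27 * 10**20) / (18 * 10**19)

print("NYSE, PB per year:", nyse_pb_per_year)
print("Internet Archive, TB per year:", archive_tb_per_year)
print("Average Facebook photo, KB:", avg_photo_kb)
print("Daily data production, EB:", daily_eb)
print("Growth factor 2006 to 2014:", growth_factor)
```

Seen this way, the individual sources are small next to the whole: a year of stock exchange data is well under one petabyte, while the daily total is counted in exabytes.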
Dimensions of Big Data:
Big data has three main dimensions, namely volume, velocity and variety. These days we also have to take into consideration two more factors that are gaining attention, namely variability and complexity. Traditional systems analyze only structured data, which makes up a very small share of all data, and intuitively one can say that the more the data is refined, the less accurate the analysis will be. But if there is a way to analyze raw data, which is unstructured or semi-structured and makes up the largest share of the data in the universe, the results might be interestingly accurate and the analysis right on target. The point to note here is that analysis of raw data can yield surprisingly interesting details and better results.
Volume: The increase in the volume of data today is mainly due to bank transactions, unstructured media streaming from social networks, and sensors, used in almost every field, that record data every second. With the amount of data increasing and storage costs low compared with the high costs of the past, analysis of data kept in low-cost storage is on the rise.
Velocity: The next challenge is the speed at which data flows into a system and how quickly we are able to respond to it. Sometimes even a minute is too slow, for example in fraud detection or response management. This is very important for organizations that have to analyze data quickly and deliver results with minimal delay.
Variety: At a high level, the formats of data that we can think of are structured, semi-structured, and unstructured. Structured data is stored in an organized way, as in the case of an RDBMS; semi-structured data can be thought of as partially structured. Unstructured data is data such as videos, media, email, and photos, whose finer details cannot be stored in a structured format. Experts have estimated that 80-90 per cent of the data in any organization is unstructured. The potential platform for innovation is the ability to analyze such unstructured data.
Variability and Complexity: These two factors are also gaining attention as aspects of big data. Variability corresponds to periodic trends in the flow of data, and complexity refers to data that has to be correlated after being gathered from different sources.
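To make the three formats described under Variety more concrete, here is a minimal Python sketch in which the same fact, a payment amount, appears once in each form. Everything in it is invented for illustration: the `payments` table, the JSON record, and the email text are hypothetical, and an in-memory SQLite database merely stands in for an RDBMS.

```python
import json
import re
import sqlite3

# Structured: a fixed schema in a relational table (SQLite stands in for an RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (id INTEGER, customer TEXT, amount REAL)")
conn.execute("INSERT INTO payments VALUES (?, ?, ?)", (1, "alice", 42.5))
structured_amount = conn.execute(
    "SELECT amount FROM payments WHERE customer = ?", ("alice",)
).fetchone()[0]
conn.close()

# Semi-structured: self-describing JSON; fields may differ from record to record.
raw_json = '{"id": 1, "customer": "alice", "payment": {"amount": 42.5}, "tags": ["web"]}'
record = json.loads(raw_json)
semi_structured_amount = record["payment"]["amount"]

# Unstructured: free text; the value has to be dug out, here with a regular expression.
email_body = "Hi, this is Alice. I paid 42.50 dollars yesterday but got no receipt."
match = re.search(r"paid (\d+(?:\.\d+)?)", email_body)
unstructured_amount = float(match.group(1)) if match else None

print(structured_amount, semi_structured_amount, unstructured_amount)
```

The effort needed to reach the same number grows from a simple query, to navigating a nested record, to pattern matching on text. Real unstructured data such as photos or video is far harder to handle than this one-line email.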
Conclusion: Based on the facts and the challenges organizations are facing today, one can predict that big data is a problem domain in terms of both storage and processing. The challenge lies in finding a plausible solution to this problem domain.
References
➢ http://www.sas.com/en_us/insights/big-data/what-is-big-data.html
➢ http://en.wikipedia.org/wiki/Big_data
➢ Book: Hadoop: The Definitive Guide