Like the explosion of interest in analytics, interest in what is known as big data has recently increased dramatically.
Big data is simply a set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time.
Examples of big data include:
Walmart handles over one million purchase transactions per hour.
Facebook processes more than 250 million picture uploads per day.
Five billion cell-phone owners around the world generate vast amounts of data by calling, texting, tweeting and browsing the web on a daily basis.
As Google CEO Eric Schmidt has noted, the amount of data currently created every 48 hours is equivalent to the entire amount of data created from the dawn of civilization until the year 2003. Perhaps it is not surprising that 90 percent of the data in the world today has been created in the last two years.
Businesses are interested in understanding and using data to gain a competitive advantage.
Although big data represents opportunities, it also presents processing and analytical challenges, and consequently it has itself led to increased use of analytics.
More companies are hiring data scientists who know how to process and analyze massive amounts of data.
Big data issues are a subset of analytics, and many very valuable applications of analytics do not involve big data.
It is through technology that we have truly been thrust into the data age.
Because data can now be collected electronically, the available amounts of it are staggering.
The term "big data" has been created in the midst of vast amounts of data collection from various sources like the Internet, cell phones, retail checkout scanners, surveillance video, and sensors
There is no universally accepted definition of big data, but a commonly accepted one is that it refers to any set of data that is too large or too complex to be handled by standard data-processing techniques and typical desktop software
IBM describes big data through the four Vs: volume, velocity, variety, and veracity.
Volume (data at rest): terabytes to exabytes of existing data to process.
Velocity (data in motion): streaming data, with milliseconds to seconds to respond.
Variety (data in many forms): structured, unstructured, text, and multimedia data.
Veracity (data in doubt): uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations.
Volume:
Electronic data collection makes it possible to amass vast quantities of data.
Many companies now store over 100 terabytes of data (1 terabyte = 1,024 gigabytes)
Velocity:
Real-time capture and analysis of data pose challenges both for data storage and for the speed of analysis.
The New York Stock Exchange collects 1 terabyte of data in a single trading session.
Having current data and real-time rules for trades and predictive modeling is crucial for managing stock portfolios; a simple sketch of one such rule follows.
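As an illustration only (the rule, price stream, and parameter values below are hypothetical assumptions, not from the text), a real-time rule can be as simple as comparing each incoming price against a rolling average of recent prices and flagging large deviations:

from collections import deque

def stream_rule(ticks, window=50, threshold=0.02):
    # Keep a rolling window of recent prices; flag any new price that
    # deviates from the window average by more than the threshold fraction.
    recent = deque(maxlen=window)
    for price in ticks:
        if recent:
            avg = sum(recent) / len(recent)
            yield price, abs(price - avg) / avg > threshold
        else:
            yield price, False
        recent.append(price)

# Hypothetical price stream with one sudden jump.
prices = [100.0, 100.2, 99.9, 100.1, 103.5, 100.0]
for price, alert in stream_rule(prices, window=3):
    print(price, "ALERT" if alert else "")

The point of the sketch is the velocity constraint: each tick must be evaluated as it arrives, using only a small amount of state kept in memory.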
Variety:
In addition to large volumes and high speeds, companies now collect more complicated types of data.
Text data from social media platforms like Twitter, audio data from service calls, and video data from in-store cameras are examples
Analyzing nontraditional data sources is complex because of the processing needed to transform the data into a numerical form suitable for analysis, as sketched below.
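As a minimal sketch of what such a transformation might look like (the posts and the word-counting approach are illustrative assumptions, not the text's method), short text posts can be reduced to word counts that standard numerical tools can then analyze:

import re
from collections import Counter

def to_word_counts(posts):
    # One {word: count} dictionary per post; lowercase the text and keep letters only.
    return [Counter(re.findall(r"[a-z']+", post.lower())) for post in posts]

# Hypothetical social-media posts.
posts = ["Great service today!", "Long wait, great coffee though"]
for counts in to_word_counts(posts):
    print(dict(counts))

Once text is represented as counts (or similar numeric features), it can be combined with structured data in the same analysis.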
Veracity:
Refers to the uncertainty in data
Challenges include missing values, inconsistencies in units of measure, and the lack of reliability of responses leading to bias
Ensuring reliable analysis with uncertain data is a significant challenge; the sketch below shows two simple cleaning steps.
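As a minimal sketch under assumed data (the records, field names, and imputation rule below are hypothetical), two common cleaning steps are standardizing inconsistent units and filling a missing value from the observed ones:

# Hypothetical records with one unit inconsistency and one missing value.
records = [
    {"customer": "A", "weight": 2.0,  "unit": "kg"},
    {"customer": "B", "weight": 1100, "unit": "g"},
    {"customer": "C", "weight": None, "unit": "kg"},
]

# Step 1: standardize all weights to kilograms.
for r in records:
    if r["weight"] is not None and r["unit"] == "g":
        r["weight"] = r["weight"] / 1000.0
        r["unit"] = "kg"

# Step 2: impute the missing weight with the mean of the known weights.
known = [r["weight"] for r in records if r["weight"] is not None]
mean_weight = sum(known) / len(known)
for r in records:
    if r["weight"] is None:
        r["weight"] = mean_weight

print(records)

Real data-quality work involves many more checks, but even this small example shows how cleaning decisions (here, mean imputation) can affect later analysis.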