In order to understand 'Big Data', we first need to know what 'data' is. The Oxford dictionary defines 'data' as "quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media."
Big data is a term used to describe data that is high volume, high velocity, and/or high variety; that requires new technologies and techniques to capture, store, and analyze; and that is used to enhance decision making, provide insight and discovery, and support and optimize processes.
One common application is using big data to better understand customers and their behaviors and preferences. Companies are keen to expand their traditional data sets with social media data, browser logs, text analytics, and sensor data to get a more complete picture of their customers.
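As a rough illustration of that kind of enrichment, a traditional customer table can be joined with event data from another source, such as browser logs. The sketch below uses plain Python dictionaries; all field names and records are invented for the example:

```python
# Hypothetical sketch: enriching a traditional customer table with
# browser-log events to build a fuller picture of each customer.
# All field names and records here are invented for illustration.

customers = {
    101: {"name": "Alice", "segment": "retail"},
    102: {"name": "Bob",   "segment": "wholesale"},
}

browser_logs = [
    {"customer_id": 101, "page": "/pricing"},
    {"customer_id": 101, "page": "/signup"},
    {"customer_id": 102, "page": "/docs"},
]

# Attach each customer's visited pages to their record.
for event in browser_logs:
    record = customers.get(event["customer_id"])
    if record is not None:
        record.setdefault("pages_visited", []).append(event["page"])

print(customers[101]["pages_visited"])  # → ['/pricing', '/signup']
```

In practice the logs would live in a separate store and be aggregated with a dedicated tool, but the merge idea is the same.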
Big Data Sources. Big data sources are repositories of large volumes of data. They bring more information to users' applications without requiring that the data be held in a single repository or a cloud vendor's proprietary data store. Examples of big data sources are Amazon Redshift, HP Vertica, and MongoDB.
The general consensus today is that there are specific attributes that define big data. In most big data circles, these are called the four V's: volume, variety, velocity, and veracity. (Some consider a fifth V, value.)
That's why big data analytics technology is so important to health care. By analyzing large amounts of information – both structured and unstructured – quickly, health care providers can offer lifesaving diagnoses or treatment options almost immediately.
Big data tools: Talend Open Studio. Talend offers an Eclipse-based IDE for stringing together data processing jobs with Hadoop. Its tools are designed to help with data integration, data quality, and data management, all with subroutines tuned to these jobs.
So, 'Big Data' is still data, just at a huge scale. 'Big Data' is a term used to describe a collection of data that is huge in size and growing exponentially with time. In short, such data is so large and complex that none of the traditional data management tools can store or process it efficiently.
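To see why sheer size breaks traditional tools, consider processing a file too large to fit in memory. Rather than loading it all at once, the data can be streamed in fixed-size chunks so memory use stays bounded. A minimal Python sketch (the chunk size and the in-memory stand-in file are arbitrary choices for the example):

```python
import io

def count_bytes_in_chunks(stream, chunk_size=64 * 1024):
    """Read a file-like object in fixed-size chunks so that memory
    use stays bounded regardless of the total data size."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    return total

# Demo on an in-memory stream standing in for a file far too big to load whole.
data = io.BytesIO(b"x" * 200_000)
print(count_bytes_in_chunks(data))  # → 200000
```

Frameworks like Hadoop apply the same principle at cluster scale, splitting the data and processing the pieces in parallel across many machines.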
Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day. This data is mainly generated by photo and video uploads, message exchanges, comments, and so on.
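For a sense of scale, 500 terabytes per day works out to several gigabytes every second, sustained around the clock. A quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope: convert 500 TB/day into a per-second rate.
terabytes_per_day = 500
bytes_per_day = terabytes_per_day * 10**12   # using decimal terabytes
seconds_per_day = 24 * 60 * 60               # 86,400 seconds

bytes_per_second = bytes_per_day / seconds_per_day
print(f"{bytes_per_second / 10**9:.2f} GB/s")  # → 5.79 GB/s
```

No single conventional database server is built to absorb a continuous write load of that magnitude, which is exactly the volume and velocity problem the four V's describe.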