Big Data Concepts – In 5 Minutes

What is Big Data –

If you are looking for a standard definition, refer to the obvious source, i.e. Wikipedia.

As per Wikipedia, the term has been in use since the 1990s, with some giving credit to “John Mashey” for coining it, or at least for making it popular. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. More details are available on Wikipedia.

The definition I prefer is: “When data is too big for OLTP, then it’s Big Data.” Other definitions –

  • When data is in petabytes.
  • 3 Vs (Volume, Velocity, and Variety) or 4 Vs (Volume, Velocity, Variety, and Veracity).
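To make “too big for commonly used tools within a tolerable elapsed time” a bit more concrete, here is a back-of-the-envelope sketch of how long it takes merely to read one petabyte. The 100 MB/s disk throughput and the 1,000-disk cluster are hypothetical numbers chosen for illustration, and the calculation assumes perfect parallelism with no coordination overhead.

```python
# Back-of-the-envelope: how long does it take just to READ one petabyte?
# Assumption (hypothetical): one disk streams data sequentially at ~100 MB/s,
# and data split across several disks can be read fully in parallel.

PETABYTE = 10**15               # bytes, decimal (SI) petabyte
DISK_THROUGHPUT = 100 * 10**6   # bytes per second, assumed 100 MB/s per disk


def read_time_seconds(total_bytes: int, disks: int = 1) -> float:
    """Seconds to scan total_bytes spread evenly over `disks` disks read in parallel."""
    return total_bytes / (DISK_THROUGHPUT * disks)


one_disk = read_time_seconds(PETABYTE)
thousand_disks = read_time_seconds(PETABYTE, disks=1000)

print(f"1 disk:     {one_disk / 86400:.1f} days")
print(f"1000 disks: {thousand_disks / 3600:.1f} hours")
```

Under these assumptions it prints about 115.7 days for a single disk and about 2.8 hours for 1,000 disks, which is the basic reason big data tools spread both storage and processing across many machines.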

What Scenarios Produce It –

Data is produced by the web/internet, social networking/media, phone/mobile towers, and many more sources, as shown in the diagram below.

The point to note is that the notion of big data is not NEW. We always had it; what we hadn’t done is STORE IT and ANALYSE IT. This is now possible because of many factors/enablers.

What Enables It –

If you compare today with a day decades ago, you will observe that the entry barriers have dropped significantly and that the concepts and their enablers have been democratized. For example, buying compute/storage resources is now much cheaper than it was previously. Also, the technologies/solutions required to make sense of big data are more accessible, thanks to open source initiatives and the serious players in the market behind them. Hence, today we have more and more Producers and Consumers of data who are interested in it and its analysis.

I have tried to list a few enablers, but the true list would be far longer. Still, it should give you some initial food for thought.

What It Enables –

  • Analysis – Sentiment, Clickstream, Forensic, etc.
  • Patterns – Buying, Search and Investment.
  • Machine Learning
  • Research – Physics and Healthcare
  • Prediction and Preventive Maintenance.
  • And many more… just Bing/Google it

MapReduce? I heard about it somewhere. What’s that –

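It was developed and refined inside Google and then published to the public. It’s a two-phase process: 1) Map and 2) Reduce. Between the two phases, the framework shuffles and sorts the Map output so that all values for the same key reach the same Reducer. More details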

Let’s understand it quickly via a picture, as “a picture is worth a thousand words”.

The picture is self-explanatory, but I will add an explanation if required and requested.
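For readers who prefer code to pictures, here is a minimal single-process sketch of the Map, shuffle, and Reduce steps using word count, the usual introductory example. It only simulates the data flow in plain Python; it is not the Hadoop or Google MapReduce API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map phase: turn each input record (a line of text) into (key, value) pairs,
# here (word, 1) for every word in the line.
def map_phase(line: str) -> Iterator[tuple[str, int]]:
    for word in line.lower().split():
        yield word, 1


# Shuffle step (handled by the framework between the two phases):
# group every value emitted by the mappers under its key.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: collapse each key's list of values into a single result.
def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # On a real cluster, map_phase calls would run in parallel on different
    # input splits and reduce_phase calls in parallel on different keys;
    # here everything runs sequentially in one process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = ["big data is big", "data is everywhere"]
print(word_count(docs))
```

This prints `{'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}`. The design point is that neither `map_phase` nor `reduce_phase` needs to see the whole data set: each works on one record or one key at a time, which is what lets a framework spread the work across many machines.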