Big Data Concepts – In 5 Minutes

What is Big Data –

If you are looking for a standard definition, then refer to the obvious source, i.e. Wikipedia.

As per Wikipedia, the term has been in use since the 1990s, with some giving credit to John Mashey for coining it, or at least making it popular. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. More details are, as always, on Wikipedia.

The definition I prefer is, “When data is too big for OLTP, then it’s Big Data”. Other definitions –

  • When data is in Petabytes.
  • The 3 Vs (Volume, Velocity and Variety) or 4 Vs (Volume, Velocity, Variety and Veracity).

What Scenario produces it –

Data is getting produced from the web/internet, social networking/media, phones/mobile towers and many more sources, as mentioned in the diagram below.

Point to be noted is, the notion of big data is not NEW. We always had it; what we haven’t done is STORE IT and ANALYSE IT. This is now possible because of many factors/enablers.

What Enables it –

If you compare today with decades ago, you will observe that the entry barriers have reduced significantly and a democratization of the concepts and their enablers has happened. For example, nowadays buying compute/storage resources is relatively cheap compared to what it was previously. Also, the technologies/solutions required to make sense out of big data are more accessible, thanks to open source initiatives and the serious players in the market. Hence, today we have more and more producers and consumers of data who are interested in it and its analysis.

I’m trying to list a few enablers, but the true list would be far longer than this. However, it should give you initial food for thought.

What It Enables –

  • Analysis – Sentiment, Clickstream, Forensic, etc.
  • Patterns – Buying, Search and Investment.
  • Machine Learning
  • Research – Physics and Healthcare
  • Prediction and Preventive Maintenance.
  • And many more… Just Bing/Google it.

MapReduce – I heard about it somewhere, what’s that –

Developed and perfected inside Google, then published to the public. It’s a 2-pass process – 1) Map and 2) Reduce. More details are in Google’s original paper, “MapReduce: Simplified Data Processing on Large Clusters” (2004).

Let’s understand it quickly via a picture, as “a picture is worth a thousand words”.

Although the picture is self-explanatory, I will add an explanation if required and requested.
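To make the two passes concrete, here is a minimal word-count sketch in plain Python. It is illustrative only – a real framework like Hadoop distributes the map, shuffle and reduce phases across many machines; the function names here are just my own labels for the phases.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all values by key (the framework normally does this)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the list of values for each key."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data is big", "data is everywhere"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

The key idea is that map and reduce each operate on independent pieces of data, which is what lets the framework parallelize them across a cluster.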

LAMP and Azure – Misconceptions vs Possibilities

A discussion of the Microsoft platform (Windows, IIS, SQL Server and ASP.NET) vs LAMP (Linux, Apache, MySQL and PHP) covers a large set of topics.

My intent is not to compare them 1:1 but to comment on a scenario.

In many discussions, I realized many people have the perception/misconception that Azure is not really meant for traditional web-based applications built on the LAMP stack (Linux, Apache, MySQL, PHP).

However, the truth is that you can deploy the LAMP stack on Azure to rapidly build, deploy, and dynamically scale websites and web apps, using IaaS (VM scale sets) or PaaS (Azure Web Apps).


So, customers who want to upgrade web apps to the cloud for scalability, high availability and other cloud traits like global presence, and to dynamically scale websites (up and down) cost-effectively, should consider Azure. You get architectural choices for hosting websites from a wide array of options (containers, VMs, PaaS services, Azure Functions, etc.) and languages (Node.js, PHP, Java, etc.). Linux Web Apps let us create fully managed Node.js and Java websites.

Providers like Bitnami provide images which are pre-configured, tested and optimized for Microsoft Azure, and portable across platforms – giving you quick, ready-to-use services.

For more information, please feel free to visit https://azure.microsoft.com/en-in/overview/choose-azure-opensource/

File Storage and Functions – A files import story in Azure

The story goes like this – you have a set of files which should be imported into a solution hosted on Azure.

The idea is to cover the scenario technically – the key players are Azure File Storage and Azure Functions.

If you don’t know them already, here’s a quick summary –

  1. Azure File storage
    It’s a service that offers file shares in the cloud using the standard Server Message Block (SMB) protocol. With Azure File storage, you can migrate legacy applications that rely on file shares to Azure quickly and without costly rewrites. Applications running in Azure virtual machines or cloud services, or on-premises clients, can mount a file share in the cloud just as a desktop application mounts a typical SMB share. Any number of application components can then mount and access the File storage share simultaneously. Since a File storage share is a standard SMB file share, applications running in Azure can access data in the share via file system I/O APIs. For more details, please refer to the Azure docs. [Reference: Azure docs]
  2. Azure Functions – It’s a serverless compute service that enables you to run code on-demand without having to explicitly provision or manage infrastructure. Use Azure Functions to run a script or piece of code in response to a variety of events. So, it’s a solution for easily running small pieces of code, or “functions,” in the cloud. You can write just the code you need for the problem at hand, without worrying about a whole application or the infrastructure to run it. Functions can make development even more productive, and you can use your development language of choice, such as C#, F#, Node.js, Python or PHP. For more details, please refer to the Azure docs. [Reference: Azure docs]

The Overall process –

  • Define the structure for the input files location – In File storage, define a structure for Input, Processed and Failed files by using ‘Share(s)’ and ‘Directory(s)/File(s)’.
  • New file detection mechanism – Check for the presence of new file(s) on a predefined schedule and add a message to a queue for further processing, using a Function triggered by a timer.
  • Import the files/data into the system – A Function which processes the input file(s) and ultimately imports the data.
  • Perform cleanup at the input files location – Mark files as processed, or move files to the Processed/Failed directory for reference/tracking purposes.

Now, the devil is in the detail –

The Azure File service offers four resources: the storage account, shares, directories, and files. The File service REST API provides a way to work with share, directory, and file resources via HTTP/HTTPS operations. So, instead of UNC/file-share mapping, you use the Azure Storage SDK, which is a wrapper over the Azure Storage REST API. This avoids any UNC/mapping-related issues.
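As a sketch of what SDK-based access looks like for the detection step, here is a small helper that lists the files under a share directory. I'm assuming the `azure-storage-file-share` Python package, whose `ShareClient.list_directories_and_files()` yields items carrying 'name' and 'is_directory' properties; the share and directory names are hypothetical. The helper takes the client as a parameter, so it can be exercised with a stub instead of a live storage account:

```python
def list_input_files(share_client, directory="Input"):
    """Return the names of files (not sub-directories) under a share directory.

    `share_client` is expected to behave like an
    azure.storage.fileshare.ShareClient: its list_directories_and_files()
    yields dict-like items exposing 'name' and 'is_directory'.
    """
    return [item["name"]
            for item in share_client.list_directories_and_files(directory)
            if not item["is_directory"]]

# In a real Function you would build the client from configuration, e.g.:
# from azure.storage.fileshare import ShareClient
# share = ShareClient.from_connection_string(conn_str, share_name="imports")
# new_files = list_input_files(share)
```

Keeping the SDK call behind a thin helper like this also makes the Function logic testable without touching the network.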