Memory-Optimized Tables – Helping Performance and Scalability

Today I was performing a code review on one of my projects, which is being developed using many Azure services/technologies.

Being an ‘Internet of Things’/IoT scenario, the requirements demand the use of the “Polyglot Persistence” pattern, because the solution needs to store structured/SQL as well as unstructured/NoSQL data. And as we know, to store structured/relational data, ‘SQL Azure’ is the default technology choice, being on Azure and a Microsoft person 🙂

So, while analyzing the stored procedures’ T-SQL code, I observed that many of the SPs utilize temporary tables for data computation/processing operations to improve overall performance. Using temporary tables, table variables, or table-valued parameters was a reasonable/acceptable practice back when I was a programmer 🙂 But I started wondering whether anything new has been added to this approach/pattern to improve it further. Using Bing, I quickly discovered that we really do have something new and better, namely “Memory-Optimized Tables”. This is part of In-Memory OLTP, the premier technology available in SQL Server and Azure SQL Database for optimizing the performance of transaction processing, data ingestion, data load, and transient data scenarios.

As the MS docs say, memory-optimized tables are tables created using CREATE TABLE with the “MEMORY_OPTIMIZED = ON” option. Memory-optimized tables are fully durable by default and, like transactions on (traditional) disk-based tables, fully durable transactions on memory-optimized tables are fully atomic, consistent, isolated, and durable (ACID). Memory-optimized tables and natively compiled stored procedures support a subset of Transact-SQL. More details.
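To make the CREATE TABLE option concrete, here is a minimal sketch run from Python via pyodbc. The connection string placeholders and the DeviceReadings schema are illustrative assumptions, not from the project under review; note that In-Memory OLTP on Azure SQL Database needs a Premium/Business Critical service tier.

```python
# A minimal sketch, assuming a hypothetical Azure SQL Database and schema:
# create a fully durable memory-optimized table via pyodbc.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<user>;PWD=<password>"
)
conn.autocommit = True  # run the DDL outside an explicit transaction

conn.cursor().execute("""
CREATE TABLE dbo.DeviceReadings (
    ReadingId    INT IDENTITY(1,1) PRIMARY KEY NONCLUSTERED,
    DeviceId     INT       NOT NULL,
    ReadingValue FLOAT     NOT NULL,
    ReadingTime  DATETIME2 NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);  -- fully durable (the default)
""")
```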

Hence, if you are exploring options to enhance your SPs/T-SQL code on SQL Azure, then please refer here for performance and scalability considerations.

The detailed scenarios are documented with instructions at Replace global tempdb ##table and Replace session tempdb #table.

So, the next time you see CREATE TABLE #temptable or CREATE TABLE ##temptable and choose to replace it with the memory-optimized option, make sure you visit this blog to say thank you 🙂
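For the session-level #temptable case, a hedged sketch of the replacement looks like this (same illustrative connection and hypothetical names as above): a memory-optimized table type is created once, and the T-SQL then declares a table variable of that type instead of creating a temp table.

```python
# Sketch: replace a session #temptable with a memory-optimized table type.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=<your-server>.database.windows.net;DATABASE=<your-db>;"
    "UID=<user>;PWD=<password>"
)
conn.autocommit = True

# One-time setup: the table type (it must have at least one index).
conn.cursor().execute("""
CREATE TYPE dbo.SensorBatch AS TABLE (
    DeviceId     INT   NOT NULL INDEX ix_device NONCLUSTERED,
    ReadingValue FLOAT NOT NULL
) WITH (MEMORY_OPTIMIZED = ON);
""")

# In the SP/batch: declare a variable of the type instead of CREATE TABLE #temptable.
rows = conn.cursor().execute("""
SET NOCOUNT ON;
DECLARE @work dbo.SensorBatch;

INSERT INTO @work (DeviceId, ReadingValue)
SELECT DeviceId, ReadingValue
FROM dbo.DeviceReadings
WHERE ReadingValue > 100;

SELECT DeviceId, AVG(ReadingValue) AS AvgReading
FROM @work
GROUP BY DeviceId;
""").fetchall()

print(rows)
```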


Designing High Availability and Disaster Recovery for IoT/Event Hub

Before we jump directly into the topic, it requires some prerequisites. So, make yourself comfortable with them first.

As per the wiki, High Availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. It is measured as a percentage of uptime in a given year (for example, 99.9% availability allows roughly 8.8 hours of downtime per year). For details, please refer.

And, Disaster Recovery (DR) involves a set of policies, procedures and tools to enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster. Disaster recovery focuses on the IT or technology systems supporting critical business functions. For details, please refer.

Azure, like any other cloud provider, has many built-in platform features that support highly available applications. However, you need to design the application-specific logic (checklist) that absorbs fluctuations in availability, load, and temporary failures in dependent services and hardware, so that the overall solution continues to perform acceptably, as defined by business requirements or application service-level agreements (SLAs). For details, please refer.
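As a small illustration of the kind of application-specific logic that absorbs temporary failures, here is a generic retry-with-exponential-backoff wrapper; the operation being retried, the attempt count and the delays are purely illustrative assumptions.

```python
# Sketch: retry a call to a flaky dependent service with exponential backoff.
import random
import time

def call_with_retries(operation, max_attempts=5, base_delay=0.5):
    """Run `operation`; on failure wait, back off exponentially, and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the agreed number of attempts
            # Exponential backoff plus a little jitter to avoid retry storms.
            time.sleep(base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.1))

# Example usage with a hypothetical downstream call:
# result = call_with_retries(lambda: query_downstream_service())
```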

Hoping the above info provides a high-level picture of HA/DR. The rest of the post is focused on a specific scenario in the Internet of Things (IoT), basically the headline 🙂

I’m intentionally skipping the conceptual part – the importance of HA/DR, how to measure it, and its different enablers – as enough literature is already available on the web.

Designing HA/DR for a solution that uses IoT Hub/Event Hub has a few considerations –

  • Devices are Smart – The devices should either have logic to differentiate between the primary and secondary region/site, or should not be declaratively aware of any endpoint at all. One way is for devices to regularly check a concierge service for the current active endpoint. The concierge service can be a web service that is replicated and kept reachable using DNS-redirection techniques (for example, Azure Traffic Manager or AWS Route 53). You also need to ask yourself what will happen to messages when the cloud endpoint is not available – is message loss acceptable or not? If yes, then fine; otherwise you need some offline storage/queue at the device end as well (see the sketch after this list).
  • Device Identities – Generally the endpoint understands the device identities; if so, then all device identities should be geo-replicated/backed up and pushed to the secondary IoT hub before switching the active endpoint for the devices. Accordingly, the concierge service, and ultimately the devices, must be made aware of this change in the endpoint. You also need to develop tools/utilities to quickly upload/push device metadata to the IoT hub.
  • Delta Identification and Merge – Once the primary region becomes available again, all the state and data that have been created in the secondary site must be migrated back to the primary region. This state and data mostly relates to device identities and application metadata, which must be merged with the primary IoT hub and any other application-specific stores in the primary region.
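Here is a small, hedged sketch of the device-side routing logic from the first bullet: the device asks a concierge service (kept reachable via Traffic Manager/Route 53-style DNS redirection) for the currently active endpoint and spools messages to a local queue when nothing is reachable. The URL, message format and in-memory queue are illustrative assumptions, not a real service contract.

```python
# Sketch: device checks a concierge service, buffers messages when offline.
import json
import queue
import urllib.request

CONCIERGE_URL = "https://concierge.example.com/active-endpoint"  # hypothetical URL
offline_buffer = queue.Queue()  # stands in for durable on-device storage

def get_active_endpoint():
    """Ask the concierge service which region's endpoint is currently active."""
    try:
        with urllib.request.urlopen(CONCIERGE_URL, timeout=5) as resp:
            return json.load(resp)["endpoint"]
    except (OSError, ValueError, KeyError):
        return None  # concierge unreachable or response not understood

def send_telemetry(message):
    """Send one message to the active endpoint, or buffer it locally."""
    endpoint = get_active_endpoint()
    if endpoint is None:
        # No reachable endpoint: either drop the message (if loss is
        # acceptable) or, as here, queue it locally for later replay.
        offline_buffer.put(message)
        return
    data = json.dumps(message).encode("utf-8")
    req = urllib.request.Request(
        endpoint, data=data, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)
```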

How much time it should take to fail over to the secondary site and to recover back from it is solution-specific and depends on the solution’s RPOs and RTOs.

The overall approach includes the following considerations in two major areas –

  • Device – IoT Hub
    • A secondary IoT hub
    • Back up identities to a geo-redundant store
    • Device routing logic
    • Merging identities when the primary is back
    • Either an interim message store on the device, or accept message loss
  • Application Components/Storages
    • A secondary App/Services Instance
    • Enable geo-redundancy for all storage
    • Restoration of data/state from the storage used (SQL & NoSQL)
    • Anything custom

Here is the conceptual architecture diagram, which depicts the proposed solution.

Although the diagram is self-explanatory, feel free to comment/ask about anything.


Big Data Concepts – In 5 Minutes

What is Big Data –

If you are looking for the standard definition, then refer to the obvious source, i.e. the wiki.

As per the wiki, the term has been in use since the 1990s, with some giving credit to “John Mashey” for coining it or at least making it popular. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time. More details are at the wiki anyway.

The definition I prefer is, “When data is too big for OLTP then it’s Big Data“. Other definitions –

  • When data is in petabytes.
  • 3 Vs (Volume, Velocity and Variety) or 4Vs (Volume, Velocity, Variety, and Veracity)

What Scenarios Produce it –

Data is getting produced from the web/internet, social networking/media, phone/mobile towers and many more sources, as mentioned in the diagram below.

The point to be noted is that the notion of big data is not NEW. We always had it; what we haven’t done is STORE IT and ANALYSE IT. This is now possible because of many factors/enablers.

What Enables it –

If you compare today with a day decades ago, you will observe that the entry barriers have reduced significantly and a democratization of the concepts and their enablers has happened. For example, nowadays buying compute/storage resources is relatively cheap compared to what it was previously. Also, the technologies/solutions required to make sense out of big data are more accessible, thanks to open-source initiatives and the serious players in the market. Hence, today we have more and more producers and consumers of data who are interested in it and its analysis.

I’m trying to list a few enablers, but the true list would be far greater than this. However, it should give you initial food for thought.

What It Enables –

  • Analysis – Sentiment, clickstream, forensic, etc.
  • Patterns – Buying, Search and Investment.
  • Machine Learning
  • Research – Physics and Healthcare
  • Predictive and preventive maintenance.
  • And many more…Just Bing/Google it

MapReduce – I heard about it somewhere, what’s that –

Developed and perfected inside Google, then published to the public. It’s a 2-pass process – 1) Map and 2) Reduce. More details
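Before the picture, here is a tiny word-count sketch of those two passes in plain Python (not Hadoop or Google code): map emits (word, 1) pairs, a group-by-key step stands in for the shuffle, and reduce sums each group. The input lines are made up for the example.

```python
# Sketch: the map and reduce passes illustrated with a word count.
from collections import defaultdict

def map_phase(lines):
    """Pass 1: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(grouped):
    """Pass 2: sum the counts for each word."""
    for word, counts in grouped.items():
        yield word, sum(counts)

lines = ["the quick brown fox", "the lazy dog", "the fox"]

grouped = defaultdict(list)           # the shuffle / group-by-key step
for word, count in map_phase(lines):
    grouped[word].append(count)

print(dict(reduce_phase(grouped)))    # {'the': 3, 'quick': 1, 'fox': 2, ...}
```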

Let’s understand it quickly via a picture, as “a picture is worth a thousand words”.

Although the picture is self-explanatory, I will add an explanation if required and requested.

The Azure Architecture Center is available now

The Azure Architecture Center is now available in the documentation section of Azure. It’s open to everyone, with no cost to access the information. The Architecture Center is an extremely valuable resource as it brings –

  • Information for all cloud users ranging from beginners to specialists.
  • Best practices for security, availability, scalability, performance, cost, and manageability.
  • Tested, proven, and verified guidance – not theoretical designs; they have been built, run successfully, and are ready for production.
  • Prepared deployment scripts and diagrams that anyone can use to get started quickly

The main areas the Architecture Center covers are –

  • Application Architecture Guide – This guide presents a structured approach for designing applications on Azure that are scalable, resilient, and highly available.
  • Reference Architectures – Scenarios with related architectures grouped together.
  • Cloud Design Patterns – These design patterns are useful for building reliable, scalable, secure applications in the cloud.

One interesting topic is a special section for customers coming from a competing cloud provider, namely AWS. It helps Amazon Web Services (AWS) experts understand the basics of Microsoft Azure accounts, platform, and services. It also covers key similarities and differences between the AWS and Azure platforms, here.

Lastly, the people who are deep in architectural/design work should visit here. This provides resources including icons, Visio templates, PNG files, and SVG files that are useful for producing your own architecture diagrams. A direct link to download.

Google Cloud – Developer’s Sneak Peek

As per Gartner, the Internet search and ad giant has entered the top 3 cloud providers.

If you are already familiar with any cloud provider like Azure or AWS, you will find yourself at home. Google Cloud Platform (GCP) is hosted on the same infrastructure used by Google Search and YouTube.

The fundamentals of PaaS, IaaS, compute, storage, networking and security will help you quickly digest the Google Cloud Platform specifics. However, do refer to Google’s differentiators, as it claims them.

Probably the good news for beginners is that the “Google Cloud Platform Free Tier” is relatively relaxed compared to other cloud providers, IMO. Questions on it? Visit here.

Although Google started late, it seems to have strong IaaS and PaaS capabilities. Many of its cloud services are extended capabilities of existing services. An interesting observation is that, to improve developer productivity, GCP offers the App Engine Flexible Environment (a managed ‘virtual machine’) that operates between IaaS and PaaS. The App Engine flexible environment is based on Google Compute Engine and automatically scales your app up and down while balancing the load.

Another notable aspect is GCP’s significant contribution to ‘Open Source’. A few examples –

  • Kubernetes – System for automating deployment, operations, & scaling of containerized apps.
  • Spanner – Scalable, multi-version, globally-distributed, & synchronously-replicated database.
  • Hadoop MapReduce – It will let users run native C/C++ code in their Hadoop environments.
  • Dataflow – Ability to handle batch/stream processing of large data sets.

The way I see it, from an uber perspective Google has divided its cloud offerings into –

1) Consumer oriented and 2) Developer oriented

For beginners, especially if you are coming from an application architecture/design/development background, the navigation path for Google Cloud could be – more details.

LAMP and Azure – Misconceptions vs Possibilities

A discussion of the Microsoft platform (Windows, IIS, SQL Server and ASP.NET) vs LAMP (Linux, Apache, MySQL and PHP) covers a large set of topics.

My intent is not to compare them 1:1, but to comment on a scenario.

In many discussions, I have realized that many people have the perception/misconception that Azure is not really meant for traditional web-based applications built on the LAMP (Linux, Apache, MySQL, PHP) stack.

However, the truth is that you can deploy the LAMP stack on Azure to rapidly build, deploy, and dynamically scale websites and web apps using IaaS (VM scale sets) and PaaS (Azure Web Apps).


So, customers who want to upgrade web apps to the cloud for scalability, high availability and other cloud traits like global presence, and to dynamically scale websites (up and down) in a cost-effective way, should consider Azure. You get architectural choices for hosting websites from a wide array of architectures (containers, VMs, PaaS services, Azure Functions, etc.) and languages (Node.js, PHP, Java, etc.). Linux Web Apps let us create Node.js and JavaScript websites that are fully managed.

Providers like Bitnami provide images which are pre-configured, tested and optimized for Microsoft Azure and portable across platforms, giving you quick, ready-to-use services.

For more information please feel free to visit @ https://azure.microsoft.com/en-in/overview/choose-azure-opensource/

AWS – 3-Tier Web Application Architecture

Wikipedia says – Three-tier architecture is a client–server software architecture pattern in which the user interface (presentation), functional process logic (business rules), and computer data storage and data access are developed and maintained as independent modules, most often on separate platforms/servers.

AWS provides a reference architecture for “Web Application Hosting”, described as follows: Amazon Web Services provides the reliable, scalable, secure, and high-performance infrastructure required for web applications, while enabling an elastic, scale-out and scale-down infrastructure to match IT costs in real time as customer traffic fluctuates.

However, the provided reference architecture is a little high level in nature, and people probably need more details from the AWS five-pillars perspective. Hence, I’m attempting a shot at it.

The proposed reference architecture is –

When architecting technology solutions, if we neglect the five pillars of Security, Reliability, Performance Efficiency, Cost Optimization, and Operational Excellence, it can become challenging to build a system that delivers on your expectations and requirements. When you incorporate these pillars into your architecture, they help you produce stable and efficient systems, which allows you to focus on the other aspects of design, such as functional requirements. Considering these, the proposed architecture is depicted in the diagram below –

Description of the key components – from the five-pillar perspective

Security

Security of data at rest/in transit is treated with the utmost priority. All the servers are completely isolated by design. One of the most important networking features AWS provides is resource isolation using a Virtual Private Cloud (VPC), where a security group acts as a virtual firewall for instances to control inbound and outbound VPC traffic, and a Network Access Control List (NACL) is another, optional layer of security for the VPC that controls traffic between one or more subnets. Also, use secure protocol listeners and enable SSL termination on the ELB to release load on the backend instances.
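As a hedged illustration of the security-group point, here is a boto3 sketch that creates a web-tier security group inside a VPC and allows HTTPS in only from a load-balancer security group; the region, VPC ID and group IDs are placeholders/assumptions.

```python
# Sketch: a web-tier security group that only accepts HTTPS from the ELB.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="web-tier-sg",
    Description="Web tier: HTTPS from the ELB only",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 443,
        "ToPort": 443,
        # Only traffic from the (hypothetical) ELB security group is allowed in.
        "UserIdGroupPairs": [{"GroupId": "sg-0fedcba9876543210"}],
    }],
)
```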

Reliability

Amazon Web Services brings a lot of built-in features to address business continuity, such as Elastic Load Balancing (ELB) and multiple Availability Zones for servers/instances. ELB effectively distributes load among EC2 servers, and also ensures that services remain unaffected if one data center becomes unavailable for some reason. Per RTO/RPO requirements, the solution could be deployed in two or more AWS regions from a disaster-recovery standpoint (active/passive), with Route 53 for traffic distribution/routing across regions.
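A hedged boto3 sketch of the active/passive, cross-region routing mentioned above: Route 53 failover records send traffic to the primary region while its health check passes, and to the secondary otherwise. The hosted zone ID, health-check ID, domain name and IPs are placeholders, and a real ELB would normally be referenced via an alias record rather than a plain A record.

```python
# Sketch: Route 53 failover routing between a primary and a secondary region.
import boto3

route53 = boto3.client("route53")

def failover_record(identifier, role, ip, health_check_id=None):
    record = {
        "Name": "app.example.com",      # hypothetical domain
        "Type": "A",
        "SetIdentifier": identifier,
        "Failover": role,               # "PRIMARY" or "SECONDARY"
        "TTL": 60,
        "ResourceRecords": [{"Value": ip}],
    }
    if health_check_id:
        record["HealthCheckId"] = health_check_id
    return {"Action": "UPSERT", "ResourceRecordSet": record}

route53.change_resource_record_sets(
    HostedZoneId="Z0HYPOTHETICALZONE",  # placeholder hosted zone
    ChangeBatch={"Changes": [
        failover_record("primary-region", "PRIMARY", "203.0.113.10",
                        health_check_id="<primary-health-check-id>"),
        failover_record("secondary-region", "SECONDARY", "203.0.113.20"),
    ]},
)
```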

Performance Efficiency

The Performance Efficiency pillar focuses on the efficient use of computing resources and on maintaining that efficiency as demand changes and technologies evolve. AWS provides multiple types of EC2 instances – on-demand, reserved (for a specific period) and spot instances (bid on unused capacity) – and also allows flexibility in the choice of size. The solution should start with smaller on-demand instances and, once we understand the level of workload, choose the right size and combine on-demand and reserved instances. The options are endless. In certain scenarios, such as when flash traffic is expected, auto scaling with CloudWatch should be utilized for effective utilization of resources, and the same should be monitored.
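A hedged boto3 sketch of “auto scaling with CloudWatch” as described above: a CPU alarm on the Auto Scaling group triggers a simple scale-out policy. The group name, thresholds and adjustment size are illustrative assumptions.

```python
# Sketch: a CloudWatch CPU alarm that fires a scale-out policy on an ASG.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

policy = autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-tier-asg",   # hypothetical Auto Scaling group
    PolicyName="scale-out-on-cpu",
    AdjustmentType="ChangeInCapacity",
    ScalingAdjustment=2,                   # add two instances when triggered
    Cooldown=300,
)

cloudwatch.put_metric_alarm(
    AlarmName="web-tier-high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-tier-asg"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=70.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[policy["PolicyARN"]],    # fire the scale-out policy
)
```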

Cost Optimization

Cost Optimization is a continual process of refinement and improvement of a system over its entire lifecycle, from the initial design of the very first POC to the ongoing operation of production workloads. The solution should monitor usage and auto-adjust accordingly by using the AWS “CloudWatch” and “Auto Scaling” features. Also, the usage of PaaS services removes the operational burden of maintaining servers for tasks like sending email or managing NoSQL DBs. As PaaS services operate at cloud scale, they can offer a lower cost per transaction or service. Environments can be replicated using CloudFormation templates.

Operational Excellence

Operational Excellence includes the operational practices used to manage production: how planned changes are executed, and how responses to unexpected operational events are handled. Change execution and responses should be automated, documented, tested, and regularly reviewed. In AWS, we can set up source control, a CI/CD pipeline and release management; aggregate logs for centralized monitoring and alerts; and make sure alerts trigger automated responses, including notifications and escalations.