Azure Storage Options


Difference between Azure Storage / File protocols and usages

Azure offers several storage services, such as Blob Storage, ADLS Gen2, Managed Disks and Azure Files, and many of them use specific access protocols suited to particular needs.

I have outlined the differences between these services and protocols in the diagrams below:

ADLS GEN2 and Usages
AKS Service and multiple storage provisioner usage in Azure
Using Azure File Storage
ADLS Gen2 Storage Account Access and Usages
Azure Netapp Files Usages

Governance across Azure Naming Standards/Conventions and Best Practices

Azure Naming Best Practice and Governance


“There are only two hard things in Computer Science: cache invalidation and naming things.” 

Phil Karlton

Preamble: 

As enterprises start to utilize Azure resources, even a reasonably small footprint can begin to accumulate thousands of individual resources. This means that the resource count for much larger enterprises could quickly grow to hundreds of thousands of resources.

Establishing a naming convention during the early stages of establishing Azure architecture for your enterprise is vital for automation, maintenance, and operational efficiency. For most enterprises, these aspects involve both humans and machines, and hence the naming should cater to both of them.

Objective – A naming standard saves teams countless hours otherwise spent thinking up names, making mistakes, living with those mistakes and making others suffer for them, and it encourages discipline across the cloud ecosystem. This thread outlines the technical guardrails we can build into naming so that it becomes a practice. The idea is to follow the KISS principle: “Keep it simple, stupid”.

Naming conventions always end in heated debates, and someone may be arrogant enough to propose a “one-size-fits-all” theory, but chill! Let's first understand what we can do best for our own benefit.

WHY TO ADOPT BEST PRACTICE/ RECOMMENDATIONS

Synchronous / Rhythmic and Orderly / Pleasant to see
Do what you want / Go all over the place?

What are Azure Resources?

Any component that does some work is a resource in the Azure world. Here is the simple resource structure in Azure:

  • Management groups: These groups are containers that help you manage access, policy, and compliance for multiple subscriptions. All subscriptions in a management group automatically inherit the conditions applied to the management group.
  • Subscriptions: A subscription logically associates user accounts and the resources that were created by those user accounts. Each subscription has limits or quotas on the amount of resources you can create and use. Organizations can use subscriptions to manage costs and the resources that are created by users, teams, or projects.
  • Resource groups: A resource group is a logical container into which Azure resources like web apps, databases, and storage accounts are deployed and managed.
  • Resources: Resources are instances of services that you create, like virtual machines, storage, or SQL databases.
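
The hierarchy above also shows up in the ID that Azure Resource Manager assigns to each resource. The sketch below assembles such an ID from its parts; the subscription GUID and the resource names are placeholders chosen for illustration, and management groups do not appear in the path.

```python
def resource_id(subscription_id, resource_group, provider, resource_type, name):
    """Build an Azure Resource Manager style ID from the hierarchy levels."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/{provider}/{resource_type}/{name}"
    )


# Placeholder values, not real resources.
print(resource_id(
    "00000000-0000-0000-0000-000000000000",
    "rg-hosbi-db-dev",
    "Microsoft.Storage",
    "storageAccounts",
    "dev1adls1shared",
))
```

Because the resource group name and the resource name are both part of this path, a careless name is visible in every script, log and portal link that refers to the resource.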

When naming resources in the cloud, a degree of governance and clear guidelines are very important for a tidy, well-thought-out environment.

Hence BOTH tagging and naming are important for developers, administrators and configuration managers to follow.

What important factors drive naming standards?

  • Consistency – makes the DevOps engineer's job easy
  • Remove redundancy
  • Scalability and flexibility
  • Avoid confusion
  • Ease of management – easy to follow, and easy to identify resources and pinpoint product and application owners in a few seconds
  • Billing/chargeback – easy to find resource usage cost by tagging and naming
  • Better monitoring and logging
  • Build the habit – you will live with these names and spend a lot of time with them, so make them a pleasure to work with
  • Renaming resources – Azure resources and resource groups can't be renamed once created. Renaming means creating a new group and moving all your resources from one group to the other, which can potentially impact a lot of places

What is excluded?

The items below will not follow the naming conventions because they are named automatically by Azure itself.

  • Names of resources auto-generated by Azure, like managed disk names (e.g. <VMName>_OsDisk_1_25fbb34c5b074882bcd1179ee8b87eeb)
  • The supporting resource group for Azure Kubernetes Service
  • Some resources require a globally unique name
  • Internal resource group names related to Databricks, e.g. databricks-rg-dev000hidpuse02dbricks001-44gxjl4c7ailk

Azure Regions for Deployment of Resource Group and Resources

  • EAST US2 is 10% cheaper than EAST US and has better resource availability
  • Co-locating resources in the same location helps reduce the ingress and egress cost of data transfer for any integration work

Key Resource Group(KRG) Naming

  • Resource group prefix – e.g. rg (short for resource group); this simplifies search when creating resources
  • Department / Business Unit name – e.g. hosbi, crs; putting the BU name early helps identify the subscription within the merged corporate subscription tree
  • Type of the resources in the resource group – e.g. db, app, devops, ml, k8s
  • Environment type – e.g. dev, prd
  • Regional instance identifier (Azure datacenter) – e.g. eastus, westus; not mandatory, especially when the business has agreed that all resources use a specific region
  • Incrementor – 01, 02, 03 etc. if required (can be used for a redundant copy of an existing resource group, or for a brand new resource group used by a different team under the same business unit)

For example: rg-hosbi-db-dev, rg-hosbi-app-prd
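
A minimal sketch of how the KRG parts could be assembled in a script. The function name `resource_group_name` and its parameters are hypothetical and exist only for this illustration; the ordering follows the bullet list above.

```python
def resource_group_name(business_unit, resource_type, environment,
                        region=None, increment=None):
    """Join the KRG parts: rg-<bu>-<type>-<env>[-<region>][-<nn>]."""
    parts = ["rg", business_unit, resource_type, environment]
    if region:                      # optional when one region is agreed
        parts.append(region)
    if increment is not None:       # optional, zero padded to two digits
        parts.append(f"{increment:02d}")
    return "-".join(p.strip().lower() for p in parts)


print(resource_group_name("hosbi", "db", "dev"))    # rg-hosbi-db-dev
print(resource_group_name("hosbi", "app", "prd"))   # rg-hosbi-app-prd
```

Generating the name from parts, rather than typing it by hand, keeps the separator, case and padding consistent across teams.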

Key Resource Identifier (KRI) Strings

  • Environment type – e.g. dev, tst, prd
  • Azure resource identifier – e.g. graphql, react, druid, postgres, kafka, eventhubs, redis, adls, vm, nsg, lb, aks, airflow, datafactory, databricks, couchbase, oracle, splunk, storage
  • Role / function / primary use of the resource – e.g. operation, config, cache, analytics, oltpds, olapds, db, web, mail, shared (when the resource will be shared across resource groups)
  • Regional instance identifier (Azure datacenter) – e.g. eastus, westus; not mandatory, especially when the business has agreed that all resources use a specific region
  • Incrementor – 01, 02, 03 etc. if required; can be used for HA resources like cluster nodes and VM scale sets

For example: dev-databricks-olap, prd-aks-analytics-eastus, dev-redis-cache, prd-postgres-oltp-config

Note: BU names and product names are subject to change with organizational mergers, so it is better to keep them in tags for each KRI rather than in the name itself.
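
The KRI string can be built the same way. This is a sketch only: `resource_name` and the `ENVIRONMENTS` set are hypothetical helpers, and the set simply lists the three-character environment codes this article uses.

```python
ENVIRONMENTS = {"dev", "tst", "stg", "prd", "prf", "uat", "poc"}


def resource_name(environment, resource_id, role, region=None, increment=None):
    """Join the KRI parts: <env>-<resource>-<role>[-<region>][-<nn>]."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment code: {environment!r}")
    parts = [environment, resource_id, role]
    if region:
        parts.append(region)
    if increment is not None:
        parts.append(f"{increment:02d}")
    return "-".join(parts)


print(resource_name("prd", "aks", "analytics", region="eastus"))  # prd-aks-analytics-eastus
print(resource_name("dev", "redis", "cache"))                     # dev-redis-cache
```

Rejecting unknown environment codes at this point stops variants such as qa or prod from creeping into names.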

Microservice container naming convention

This can follow the standard pod naming when deploying resources using the corresponding Helm charts. Certain names in a pod can't be controlled because they are managed within Kubernetes' own realm.

Naming standard challenges:

There are a lot of naming rules across Azure services. Some resources allow hyphens or periods in names while others do not. Some accept underscores while others prohibit anything except alphanumeric characters.

Some limit names to a certain number of characters while others allow fairly lengthy names, and some allow uppercase while others are strictly lowercase.

Unifying the Naming Construct and Separators

Usage | Description of Rules
--- | ---
CASE | Use lowercase always. camelCase, PascalCase and Initcap are not allowed.
WHITESPACE | No spaces are allowed anywhere in a name.
ABBREVIATION (to shorten names) | Where a resource name is longer than 8 characters, use a conformed and consistent abbreviated form, such as df for datafactory.
HYPHEN or DASH (as separator) | Use a hyphen (“-”) in a resource name or resource group name as the separator between resource identifier values, e.g. prd-react-analytics-eastus. Do not use more than one hyphen in a row (“--”) as a separator, e.g. dev--aks--olapds. Do not use hyphens inside an identifier itself: eventhubs should never be written as event-hubs; use a continuous name, or an abbreviation if the name is long. If a hyphen is not allowed, use an underscore (“_”) instead. If no special characters are allowed at all, use 1 as the separator, e.g. poc1storage1db, poc1storage1olapds, poc1storage1shared. This is a rare case; one example is a storage account name, which accepts neither hyphens nor any other special characters, although the file services, blob services and containers underneath it can be named with hyphens.
START AND END character | Never start a resource name with a hyphen (“-”) or any other special character. Never end a resource name with a hyphen (“-”) or any other special character, such as a period (“.”). Never start with a numeric value; ending with a numeric value is allowed, e.g. dev-couchbase-config-eastus2.
INCREMENTOR | Use only when it makes sense, e.g. clusters, multiple nodes deployed for similar resources, VM scale sets. Do not use 00 when no incrementor is present. Use zero padding: never 1/2/3, always 01/02/03.
ENVIRONMENT name | Should be exactly 3 characters (dev, tst, stg, prd, prf, uat, poc). qa should map to the tst or stg environment.
CHARACTER limit | If the KRI string is too long to fit the allowable limit, use an additional tag to identify the resource. In some cases an abbreviation can be used.
REDUNDANT NAMING across regions | Avoid creating an unnecessary number of resources for redundancy and HA, because some services have built-in high availability (fault domains and update domains) and do not need to be created in different regions.
RESOURCE function | It is very important to identify the purpose of a resource; in the long term this makes the resource easier to recognise.
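
Several of the rules above are mechanical enough to check in a script before anything is deployed. The sketch below covers case, whitespace, repeated hyphens, start and end characters, and incrementor padding. `name_violations` is a hypothetical helper, and it does not check the length limits of individual Azure services.

```python
import re


def name_violations(name):
    """Return the rule violations found in a hyphen-separated resource name."""
    problems = []
    if name != name.lower():
        problems.append("uppercase characters")
    if re.search(r"\s", name):
        problems.append("whitespace")
    if "--" in name:
        problems.append("repeated hyphen")
    if name and not name[0].isalpha():
        problems.append("must start with a letter")
    if name and not name[-1].isalnum():
        problems.append("must end with a letter or digit")
    # A purely numeric last segment is treated as the incrementor.
    last = name.rsplit("-", 1)[-1]
    if last.isdigit():
        if len(last) < 2:
            problems.append("incrementor is not zero padded")
        elif int(last) == 0:
            problems.append("incrementor 00 is not allowed")
    return problems


print(name_violations("prd-react-analytics-eastus"))  # []
print(name_violations("dev--aks--olapds"))            # ['repeated hyphen']
print(name_violations("dev-aks-olapds-1"))            # ['incrementor is not zero padded']
```

Returning a list of problems instead of raising on the first one lets a pipeline report every issue with a proposed name in a single pass.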

Important Limitations

Scope | Resources | Constraint
--- | --- | ---
Alpha-Numeric | Storage Account Name | Cannot have dash, dot
Azure Cloud | SQL Server Name, Storage Account Name | Must be unique across Azure, not just the subscription
Length | Search Service and Virtual Machines | 2 to 15 characters
Lower Case | Storage Account Name | Cannot contain uppercase characters
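
Storage account names are the strictest case here. Azure accepts only lowercase letters and digits for them, with a length of 3 to 24 characters. The sketch below converts a hyphenated name into that form using 1 as the separator, as in poc1storage1db; `to_storage_account_name` is a hypothetical helper, not an Azure API.

```python
import re

# Lowercase letters and digits only, 3 to 24 characters.
STORAGE_NAME = re.compile(r"[a-z0-9]{3,24}")


def to_storage_account_name(hyphenated):
    """Replace hyphens with 1 and check the result against the storage rule."""
    candidate = hyphenated.lower().replace("-", "1")
    if not STORAGE_NAME.fullmatch(candidate):
        raise ValueError(f"not a valid storage account name: {candidate!r}")
    return candidate


print(to_storage_account_name("poc-storage-db"))  # poc1storage1db
```

The check only covers the format. Global uniqueness can only be confirmed against Azure itself when the account is created.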

Sample abbreviated naming for some Azure Resources

Resource/Service | Short Code | Long Code
--- | --- | ---
Subscription | sub | sub
Resource Group | rg | rg
Virtual Machine | vm | vm
Availability Set | avs | avset
Virtual Network | vn | vnet
Subnet | sub | subnet
Public IP Address | pip | publicip
Network Security Group | nsg | Networksg
Storage Account | stg | storage
TrafficManager | tm | trafficmgmr
Load Balancer | lb | loadbalancer
Application Gateway | agw | appgateway
App Service | svc | appsvc
Key Vault | kv | kv
App Service Plan | asp | appplan
Sql Database | sdb | sqldatabase
Sql Server | sql | sqlserver
Disk | dsk | disk
DNS Zone | dns | dnszone
Log Analytics | loa | loganalytics
Logic App | log | logapp
Network Interface | nif | netinterface

Tagging: Very useful for finding resource usage in Azure by tag name. It also helps to categorize similar resources under one application or product.

Another great use of tagging is billing. It is a great way to report cost analysis. Note that a tag is a key-value pair and can be changed at any time, whereas changing a KRG / KRI is not a simple rename.

Tags can be applied at multiple levels:

  • Subscription
  • Resource Group
  • Resource

Maximum tag name / value length: 512 / 256 characters

Tag name rules:

The name is the key; it can't be a duplicate string

Never use a sequence like 1, 2, 3 in the tag name (key)
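
The tag name rules above can be checked before tags are submitted. This is a sketch under stated assumptions: `tag_violations` is a hypothetical helper, tag names are compared case-insensitively when looking for duplicates, and a trailing number in a tag name is treated as a sequence.

```python
import re

MAX_TAG_NAME, MAX_TAG_VALUE = 512, 256


def tag_violations(pairs):
    """Check (name, value) tuples against the tag name rules."""
    problems, seen = [], set()
    for name, value in pairs:
        key = name.lower()  # assumption: names differing only in case collide
        if key in seen:
            problems.append(f"duplicate tag name: {name}")
        seen.add(key)
        if len(name) > MAX_TAG_NAME:
            problems.append(f"tag name too long: {name[:20]}...")
        if len(value) > MAX_TAG_VALUE:
            problems.append(f"tag value too long for: {name}")
        if re.search(r"\d+$", name):
            problems.append(f"sequence number in tag name: {name}")
    return problems


print(tag_violations([("owner", "a"), ("Owner", "b"), ("owner2", "c")]))
# ['duplicate tag name: Owner', 'sequence number in tag name: owner2']
```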

Tagging Naming Conventions for Resource group (All parameters below are mandatory)

A common and good use of tag name and value combinations is shown below:

Name | Sample Value | Allowed Values | Enforced by policy | Description (why needed?)
--- | --- | --- | --- | ---
business-unit | hosbi | should be only one value | Yes | if multiple products, use a hyphen (“-”) separator
owner | debashis-paul | use full name | | identifies who owns what; if there is a middle name, add another hyphen (-)
email | <email address> | valid email address | | identifies the right person when names are the same
team | application | application, analytics, qa, esd, external | Yes | pick one of them
environment | poc | dev, tst, stg, prd, prf, uat, poc | | pick one of them
billingowner | hosbi-<companyname> | hospitality BI division company | | easy to filter and report on the cost sheet
region | eastus2 | eastus, eastus2, westus | | by default policy to be enforced from

Tag name/value pairs set at resource group level are not applied recursively to the resources inside the group, so we still need to apply the tag pairs at individual resource level.

Tagging Naming Conventions for Resource Name  (All parameters below are mandatory)

Name | Sample Value | Description (why needed?)
--- | --- | ---
business-unit | hosbi | if multiple products, use a hyphen (“-”) separator
owner | debashis-paul | identifies who owns what; if there is a middle name, add another hyphen (-)
email | <email id> | key to identifying the person when names are the same
team | analytics | application, analytics, qa, esd, external; pick one of them
resource-type | storage | e.g. storage, network, database, compute, log, monitor, vm, cache, web, application, queue, stream, orchestrator, iaas, paas, saas
product-name | rate | if multiple products, use a hyphen (“-”) separator
environment | poc | which platform it is used for: poc, dev, test, production?
thirdparty-sw | yes/no/all | e.g. couchbase, mongo, kafka, splunk, oracle, which are marketplace products and non-managed resources
billingowner | hosbi-<companyname> | easy to filter and report on the cost sheet
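
Since every tag in the resource-level table is mandatory, and resource group tags do not flow down on their own, a deployment script can copy the group tags onto the resource and then report what is still missing. The helper names below are hypothetical, and the sample values are taken from the tables in this article.

```python
REQUIRED_RESOURCE_TAGS = [
    "business-unit", "owner", "email", "team", "resource-type",
    "product-name", "environment", "thirdparty-sw", "billingowner",
]


def effective_tags(group_tags, resource_tags):
    """Copy resource group tags down; resource-level values win on conflict."""
    return {**group_tags, **resource_tags}


def missing_tags(tags, required=REQUIRED_RESOURCE_TAGS):
    """List the mandatory tag names that are absent or empty."""
    return [name for name in required if not tags.get(name)]


group = {"business-unit": "hosbi", "owner": "debashis-paul", "team": "analytics",
         "environment": "poc", "billingowner": "hosbi-<companyname>"}
resource = {"resource-type": "storage", "product-name": "rate", "thirdparty-sw": "no"}

print(missing_tags(effective_tags(group, resource)))  # ['email']
```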

Bookmark the scripts that create resources and the scripts that enforce such policies. In Non-Prod and Prod subscriptions, resources and resource groups must not be created without a script.

Enforcing the naming and tagging practice:

ARM templates should be deployed via Azure CLI, PowerShell or the Azure API when creating resources and tags, following the rules above, for Prod and Non-Prod environments; the POC environment is the exception.

Resource groups in the POC subscription should be easy to create without scripts, but should still adhere to the naming principles above.
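
For the tags marked as enforced by policy, one option is an Azure Policy rule that denies resources missing the tag. The sketch below only generates the JSON body. It is modelled on the general shape of Azure Policy tag rules (`field`, `exists`, `effect`), so treat the exact structure as an assumption and verify it against the Azure Policy documentation before assigning it. `require_tag_rule` is a hypothetical helper.

```python
import json


def require_tag_rule(tag_name):
    """Build a policy body that denies resources without the given tag."""
    return {
        "mode": "Indexed",  # assumption: only resource types that support tags
        "policyRule": {
            "if": {"field": f"tags['{tag_name}']", "exists": "false"},
            "then": {"effect": "deny"},
        },
    }


print(json.dumps(require_tag_rule("business-unit"), indent=2))
```

The printed JSON can be saved to a file and supplied to whatever tool the team uses to create policy definitions.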

NOTE

Resources with a public endpoint already have an FQDN that describes what they are, so in some cases the resource name is self-explanatory when you look at the default public endpoint URL that Azure creates:

Examples:

Resource type | Public endpoint
--- | ---
App Service (web/logic/function app) | name.azurewebsites.net
Traffic manager | name.trafficmanager.net
Storage account | name.blob.core.windows.net
Azure SQL | name.database.windows.net
Public IP (load balancer, VM, App Gateway etc.) | name.location.cloudapp.azure.com
Cosmos DB | name.documents.azure.com
Service Bus | name.servicebus.windows.net
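
A small lookup makes the point concrete: given a resource name, the default endpoint follows from the suffixes in the table. The dictionary keys such as "storage-blob" are hypothetical labels chosen for this sketch.

```python
ENDPOINT_SUFFIX = {
    "app-service": "azurewebsites.net",
    "traffic-manager": "trafficmanager.net",
    "storage-blob": "blob.core.windows.net",
    "azure-sql": "database.windows.net",
    "cosmos-db": "documents.azure.com",
    "service-bus": "servicebus.windows.net",
}


def public_endpoint(kind, name, location=None):
    """Return the default public FQDN for a resource name."""
    if kind == "public-ip":
        # Public IP DNS labels also carry the region in the FQDN.
        return f"{name}.{location}.cloudapp.azure.com"
    return f"{name}.{ENDPOINT_SUFFIX[kind]}"


print(public_endpoint("storage-blob", "dev1adls1shared"))
# dev1adls1shared.blob.core.windows.net
```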

Ref Doc: 

Organize Azure resource effectively

MS recommendation on Naming and Tagging

Gospel of Azure Storage comparisons


Storage is the backbone of any cloud IaaS, PaaS or SaaS solution and is typically managed by the cloud provider. Identifying the capabilities of the different storage options, differentiating them and picking the right one is key to a successful technology implementation. The differences are listed here at a high level; at lower levels there can be many more.

Note: The scope below is to lay out the differences between the options offered by Microsoft Azure only, not third-party storage options like NetApp, Pure Storage or MinIO.

Synapse vs Snowflake vs Databricks in Azure


All three of these technologies provide a modern approach to cloud data warehousing, but each has a unique set of features for solving problems and poses unique challenges to work with. A modern technology platform for a big enterprise should not take a monolithic approach to data solutions without a clear understanding of the business use case, and a polyglot persistence architecture should be kept in mind when designing the data store.

It's hard to judge up front which data store to use for which purpose, so research, proof of concept and due diligence work are required when architecting the data solution; this helps build the right things in the right way.

To understand the key differences, I have tried to put comparisons of all three technologies together in one frame at a very high level. At a lower level there could be thousands of other feature differences, which are out of scope for this thread. The differences captured below are as of the time of writing and are subject to change as the products evolve.

Let's deep dive into it. Happy to hear feedback/comments below:

Azure Local Zone Geo Redundancy – Connecting the dots


The objective is to simplify the Azure resiliency options by explaining Local Redundancy (LR), Zone Redundancy (ZR) and Geo Redundancy (GR). Redundancy provides a degree of high availability, and so addresses the SLA percentage for faults and point-in-time recovery (PITR) for disasters.

Redundancy is a key objective in the cloud paradigm. It gives application compute and storage the agility to handle any disaster scenario with maximum flexibility, the lowest possible downtime and the most cost-effective option, with minimal impact on platform and infrastructure.

The diagram below tries to address the key standards of redundancy from an architecture standpoint. When we talk about redundancy we should always differentiate storage redundancy from compute redundancy. The cases below mostly apply to storage redundancy; compute redundancy varies across services and offerings.

For example, managed services have redundancy built in by default: data is either replicated synchronously three times in the primary region using LRS (locally redundant storage) and then replicated asynchronously to the secondary region as GRS (geo-redundant storage), or replicated synchronously across three AZs in the primary region using ZRS (zone-redundant storage) and then replicated asynchronously to the secondary region as GZRS (geo-zone-redundant storage).

In the example above, we focus on deploying Azure resources in the US Eastern region (US EAST). Similarly, another geographical region exists that forms a regional pair for geo-redundant storage (GRS).

The primary EAST US region comprises multiple Availability Zones (AZs), each located on different hardware infrastructure. In this case US EAST has three AZs to provide zone redundancy. All AZs are connected through the Azure network to allow synchronous replication. These AZs are logical entities that combine physically separate datacenters.

Within one AZ, hundreds of connected Azure resources and services may be spread across one datacenter, across different floors of a building, or across separate hardware shelves and racks. That is how local redundancy is provided. LRS copies are physically and logically close to each other, which keeps downtime minimal during failures and maintenance patching.
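
The four storage redundancy options described above (LRS, ZRS, GRS, GZRS) can be reduced to two questions: should the data survive the loss of a zone in the primary region, and should it survive the loss of the whole region? The sketch below encodes that choice. `choose_redundancy` is a hypothetical helper that ignores cost and service-specific availability.

```python
def choose_redundancy(survive_zone_outage, survive_region_outage):
    """Pick a storage redundancy option from two resilience requirements."""
    if survive_region_outage:
        # Asynchronous copy to the secondary region on top of LRS or ZRS.
        return "GZRS" if survive_zone_outage else "GRS"
    # Primary region only: synchronous copies across zones, or in one location.
    return "ZRS" if survive_zone_outage else "LRS"


print(choose_redundancy(False, False))  # LRS
print(choose_redundancy(True, False))   # ZRS
print(choose_redundancy(False, True))   # GRS
print(choose_redundancy(True, True))    # GZRS
```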

Hope this simplifies some cloudy areas of redundancy and explains the importance of Redundancy/Replication in cloud.