MySQL on Premise to Azure Cloud Migration

The objective is to analyze an offline migration of a MySQL database instance from on-premise to the cloud without the help of Azure ExpressRoute, a dedicated VPN gateway, or private endpoints. Along the way we will understand the offline migration process, how much data it can handle, troubleshoot different issues, weigh the pros and cons of one data export/import process versus another, and pick the right choices in the cloud (managed vs. VM). This drives the cost assessment and surfaces the limitations of certain choices. For big bulk migrations the right choice is definitely a DMS solution with minimal downtime over a dedicated connection, but those steps are out of scope for this thread.


  1. MySQL DB Instance in Cloud
  2. Identify Sample Data to be migrated from On Premise
  3. Configure Cloud / Network / Database
  4. Export Data dump / Import Data


This PoC will try to achieve following :

  • Create MySQL Database instance in Cloud / Pick right choice  
    • Azure VM using Marketplace MySQL Image
    • Azure Single Server MySQL Managed Instance
  • Pick the right option for MySQL Migration
    • Using mysqldump or PHPMyAdmin (Offline Backup/ Restore)
    • Using MySQLWorkBench ( Offline)
    • Using DMS (Continuous Sync with Minimal Downtime)
    • Using mydumper / myloader tool
      • mydumper to Dump the Database
      • myloader to load to target MySQL instance
  • Understand the network config, firewall, storage, and other dependent setup required for offline migration
    • Network inbound/outbound rules for the Azure VM
    • Network inbound/outbound rules for MySQL DB access
  • Understand data volume and latency for generating the data dump and transferring it to the cloud landing storage
  • How to mount and use the cloud landing shared location and get ready for the data import
  • Import options for loading the data dump
  • Produce benchmark options/analysis to improve performance
  • Cost assessment and comparative cost analysis against on-premise MySQL instances

Platform : 

  • On Premise MySQL Standby Slave Instance (Production  ‘mvh’ database) and ‘mvh_test’ Test Database
  • Cloud Azure
  • Azure VM
  • Any other on-prem VMs for SCP-ing files to the cloud

Note: I have been following the Microsoft documentation, yet I failed to export and import through MySQL Workbench in many ways, with a lot of challenges that I have reserved for a separate thread/blog post.

1.0 Setup Cloud VM for MySQL

Creating Virtual machine with MySQL 5.7 Ubuntu Image

Have the inbound port enabled for access (opening port 22 to Any IP is not recommended for production workloads; restrict it to targeted inbound IPs only)

Attach a separate disk, other than the OS disk, for the MySQL data workload (follow the additional steps outlined later for the OS to recognize the disk)

Install mydumper 

Refer to the mydumper tool documentation and the related GitHub thread

ssh dxpadmin@<publicIP>  and then give the password (configured during VM setup)

Find the OS version and Image :  cat /etc/os-release

If the prebuilt image picked is Ubuntu (MySQL 5.7), SSH to this box from either your local machine or any TC box as above, and install the below packages:

$ wget <mydumper release URL>/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

$ sudo dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

If you hit the below error, follow these commands in sequence:

root@tcmysql:~# dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

dpkg: error: dpkg status database is locked by another process


root@tcmysql:~# lsof /var/lib/dpkg/lock


dpkg    14397 root    3uW  REG    8,1        0 56697 /var/lib/dpkg/lock

root@tcmysql:~# ps cax | grep 14397

14397 pts/0    Ss+    0:07 dpkg

root@tcmysql:~# kill 14397

root@tcmysql:~# ps cax | grep 14397

root@tcmysql:~# sudo rm /var/lib/dpkg/lock

root@tcmysql:~# sudo dpkg --configure -a
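The lsof step above can also be scripted; below is a small sketch (the helper name and the header row are mine, not from the original) that pulls the lock-holder PID out of lsof-style output:

```shell
# Extract the PID column from lsof output for /var/lib/dpkg/lock.
# lsof prints a header row first, so skip line 1 and take field 2.
parse_lock_holder() {
  awk 'NR > 1 { print $2; exit }'
}

# Sample lsof output (header plus the line seen above):
lsof_out='COMMAND   PID USER   FD  TYPE DEVICE SIZE/OFF  NODE NAME
dpkg    14397 root   3uW  REG    8,1        0 56697 /var/lib/dpkg/lock'

printf '%s\n' "$lsof_out" | parse_lock_holder   # prints 14397
```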

If the prebuilt image is RedHat/CentOS, use the below:

yum install <mydumper rpm release URL>

1.1 Extract the Sample dump from MySQL Database

Extracting Rate Fact table (Current and Historical) data from MySQL Production Slave instance

Install mydumper/myloader utility to any TC Host which can be used as Landing zone for data:

RedHat / Centos

yum install <mydumper rpm release URL>

Ubuntu / Debian

wget <mydumper release URL>/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

Log in to any host which can reach the MySQL database and execute the below (see the mydumper usage documentation):

sudo mydumper -B mvh -T hotel_rate_wes2 --host=bidbhotl01.gain.tcprod.local --user=dbuser --password=xxxx --rows=1000000 --outputdir=./hotel_rate_wes2/ --compress --no-locks --build-empty-files --threads=256 --compress-protocol --verbose=3 -L mydumper-logs_hotel_rate_wes2.txt &

This will create the incremental dump files for current rate (hotel_rate_wes2) table from mvh schema. 

Execute similar command to dump files for Historical Changed rate (hotel_rate_ih_wes2) table from mvh schema. 
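Since the two table dumps differ only in table name, the per-table invocations can be scripted. Below is a hedged sketch that only prints each mydumper command for review rather than executing it (the wrapper function is mine; the flags mirror the command above):

```shell
# Print (not run) a mydumper command for a given table, mirroring the options above.
build_dump_cmd() {
  tbl=$1
  echo "mydumper -B mvh -T $tbl --host=bidbhotl01.gain.tcprod.local" \
       "--user=dbuser --password=xxxx --rows=1000000 --outputdir=./$tbl/" \
       "--compress --no-locks --build-empty-files --threads=256" \
       "--compress-protocol --verbose=3 -L mydumper-logs_$tbl.txt"
}

for tbl in hotel_rate_wes2 hotel_rate_ih_wes2; do
  build_dump_cmd "$tbl"
done
```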

A benchmark from the mydumper tool is below:

| Table Name (Production) | Total Records | Dump Time | Dump Size | Threads Used | Files Generated |
|---|---|---|---|---|---|
| hotel_rate_wes2 | 23,360,817 | 3 mins | 19 GB | 256 | 25 |
| hotel_rate_ih_wes2 | 126,273,328 | 12 mins | 320 GB | 256 | 130 |
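From the benchmark above, a quick back-of-the-envelope throughput check (the helper and the MB/s framing are mine; integer shell arithmetic, so ballpark figures only):

```shell
# Approximate dump throughput in MB/s from the table above: size_GB * 1024 / (mins * 60).
throughput_mb_per_s() {
  gb=$1; mins=$2
  echo $(( gb * 1024 / (mins * 60) ))
}

throughput_mb_per_s 19 3     # hotel_rate_wes2:    ~108 MB/s
throughput_mb_per_s 320 12   # hotel_rate_ih_wes2: ~455 MB/s
```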

1.2 Create Azure Storage account Fileshare

This is one of the Preconfigured Storage Account

Inside “dxpfileshare” create “data” directory

Either create Subdirectory “mvh_hotel_rate_wes2” using Azure Portal or Connect to Azure Cloud Shell and create directory to hold the Data Dump files

1.3 Mount Storage Account Fileshare inside Azure VM

This step remote-copies the dump files generated by the mydumper tool over to cloud storage

dpaul@<onpremvmhost> ~ $ ssh dxpadmin@<vmhostip>
dxpadmin@tcmysql:~$ sudo su -

In the storage account in Azure, copy the code snippet to mount the fileshare inside your virtual machine

Execute below commands inside Azure VM (SSH to the VM box)

sudo mkdir /mnt/dxpfileshare
if [ ! -d "/etc/smbcredentials" ]; then
    sudo mkdir /etc/smbcredentials
fi
if [ ! -f "/etc/smbcredentials/dxppocstorageaccount.cred" ]; then
    sudo bash -c 'echo "username=dxppocstorageaccount" >> /etc/smbcredentials/dxppocstorageaccount.cred'
    sudo bash -c 'echo "password=<replaceme>" >> /etc/smbcredentials/dxppocstorageaccount.cred'
fi
sudo chmod 600 /etc/smbcredentials/dxppocstorageaccount.cred

sudo bash -c 'echo "//dxppocstorageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare cifs nofail,vers=3.0,credentials=/etc/smbcredentials/dxppocstorageaccount.cred,dir_mode=0777,file_mode=0777,serverino" >> /etc/fstab'
sudo mount -t cifs //dxppocstorageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare -o vers=3.0,credentials=/etc/smbcredentials/dxppocstorageaccount.cred,dir_mode=0777,file_mode=0777,serverino

1.4 Transfer the file from OnPrem TC Host to Cloud Storage Fileshare

From the local TC host, transfer those files over to the storage account:

dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/1;date' 2>&1 > remote_transfer.log &

dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_ih_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/;date' 2>&1 > remote_transfer_hotel_rate_ih_wes2.log &

Note: for the above commands to work, generate an SSH key on the TC host and add it to authorized_keys inside the Azure VM host. This stops the password prompts.
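To plan the transfer window, a rough estimate of SCP time at an assumed sustained rate can help (the 100 MB/s figure below is my assumption, not a measurement from this migration):

```shell
# Estimated transfer minutes: size_GB * 1024 / rate_MBps / 60 (integer ballpark).
est_transfer_mins() {
  size_gb=$1; rate_mb_s=$2
  echo $(( size_gb * 1024 / rate_mb_s / 60 ))
}

est_transfer_mins 19 100    # ~3 minutes for the 19 GB dump
est_transfer_mins 320 100   # ~54 minutes for the 320 GB dump
```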

1.5 Network and DB access

Network Inbound rule change

To enable the VM host's access to the MySQL database, inbound port 3306 needs to be allowed from the Network option on the VM.

Note: the MySQL DB port 3306 is exposed to the Internet here. It is not recommended to expose database ports to the Internet; expose them only to your front-end tier inside your virtual network.

This practice is for testing purposes only. For a DB migration a dedicated private endpoint is advised.

DB Change

Grants are required for the Azure OS users 'root' and 'dxpadmin' (the user created during VM creation) so they can connect to the MySQL service on port 3306. Note this is also required to run myloader to load data.

To do this, log in to MySQL as the root user:

[root@tcmysql data]# mysql -u root -p

mysql> CREATE USER 'dxpadmin'@'%' IDENTIFIED BY '<password>';

mysql> GRANT ALL ON *.* TO 'dxpadmin'@'%' IDENTIFIED BY '<password>';



mysql> exit

1.6 MySQL DB Config changes

Increase the number of DB connections

This is required to avoid the error "Failed to connect to MySQL server: Too many connections" when running myloader with more threads.

Edit the /etc/my.cnf file to add the below parameter under the [mysqld] section:

max_connections = 512

Disable Strict Mode

This is required to avoid the error "Invalid default value for <Date> columns". The errors below were seen while loading data:

for e.g. : 

** (myloader:13067): CRITICAL **: 16:29:20.016: Error restoring mvh_test.currency from file mvh_test.currency-schema.sql.gz: Invalid default value for ‘last_update_dt’

** (myloader:2728): CRITICAL **: 00:36:22.806: Error restoring mvh_test.hotel_rate from file mvh_test.hotel_rate-schema.sql.gz: Invalid default value for                  ‘arv_dt’

** (myloader:2728): CRITICAL **: 00:36:20.940: Error restoring mvh_test.hotel_prop_data_source_note from file mvh_test.hotel_prop_data_source_note-schema.                 sql.gz: Invalid default value for ‘create_dttm’

Edit the /etc/my.cnf file to add, for example, the below under the [mysqld] section (dropping NO_ZERO_DATE and the strict flags; the exact sql_mode is a judgment call for your workload):

sql_mode = "NO_ENGINE_SUBSTITUTION"


Change default DB packets/ throughput and other parameters

This is required as a performance tuning step.

Edit the /etc/my.cnf file to add the below parameters under the [mysqld] section:

innodb_buffer_pool_size = 40G
max_allowed_packet = 900M
innodb_read_io_threads = 16
innodb_write_io_threads = 32
innodb_purge_threads = 2
innodb_buffer_pool_instances = 16
innodb_log_buffer_size = 512M
innodb_log_file_size = 1G
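As a sanity check on innodb_buffer_pool_size, a common rule of thumb (my assumption, not from the original) is roughly 70-75% of RAM on a dedicated DB host; for a VM of about 64 GB that lands near the 40G used above:

```shell
# Buffer pool rule of thumb: ~70% of RAM on a dedicated MySQL host.
ram_gb=64
echo "$(( ram_gb * 70 / 100 ))G"   # 44G -- same ballpark as the 40G configured above
```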

Restart the MySQL database for the above changes to take effect.

For CentOS/Fedora:

[dxpadmin@tcmysql ~]$ sudo systemctl stop mysqld
[dxpadmin@tcmysql ~]$ sudo systemctl start mysqld
[dxpadmin@tcmysql ~]$ systemctl status mysqld.service

1.7 Attach and Prepare Datadisk Ready for OS and VM

STEP 1 – Attach disk to existing VM

This can be done in two ways. One is via the Disks section of the Azure VM in the Azure portal, adding a new LUN (which we already did earlier).

The other way is executing the below Azure CLI code:

az vm disk attach \
--resource-group myResourceGroupDisk \
--vm-name myVM \
--name myDataDisk \
--new \
--size-gb 128 \
--sku Premium_LRS

STEP 2 – Prepare the Data Disk

As you can see, though we attached a 2 TB data disk in STEP 1, it is still not showing up here:

[root@tcmysql data]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 9.0M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda2 30G 7.0G 23G 24% /
/dev/sda1 494M 65M 430M 13% /boot
/dev/sda15 495M 12M 484M 3% /boot/efi
//dxppocstorageaccount.file.core.windows.net/dxpfileshare 2.0T 349G 1.7T 18% /mnt/dxpfileshare
/dev/sdb1 126G 2.1G 118G 2% /mnt/resource
tmpfs 6.3G 0 6.3G 0% /run/user/1000

So let's execute the below steps to make it available:

  • Create an SSH connection with the virtual machine. Replace the example IP address with the public IP of the virtual machine.
ssh dxpadmin@<Public IP Address of Azure VM>

  • Partition the disk with parted.

sudo parted /dev/sdc --script mklabel gpt mkpart xfspart xfs 0% 100%

  • Write a file system to the partition by using the mkfs command. Use partprobe to make the OS aware of the change.
sudo mkfs.xfs /dev/sdc1
sudo partprobe /dev/sdc1
  • Mount the new disk so that it is accessible in the operating system.
   sudo mkdir /datadrive && sudo mount /dev/sdc1 /datadrive
  • The disk can now be accessed through the /datadrive mountpoint, which can be verified by running the df -h command.
df -h | grep -i "sd"

The output shows the new drive mounted on /datadrive.

Filesystem Size Used Avail Use% Mounted on

/dev/sda1        30G   7.0G   23G   24% /
/dev/sda15       495M  12M  484M   4% /boot/efi
/dev/sdb1        126G   2G   118G   1% /mnt/resource
/dev/sdc1        2.0T   1G    2.0T    1% /datadrive
  • To ensure that the drive is remounted after a reboot, it must be added to the /etc/fstab file. To do so, get the UUID of the disk with the blkid utility.
sudo -i blkid
  • The output displays the UUID of the drive, /dev/sdc1 in this case.

/dev/sdc1: UUID="33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e" TYPE="xfs"

  • Open the /etc/fstab file in a text editor as follows:
sudo nano /etc/fstab
  • Add a line similar to the following to the /etc/fstab file, replacing the UUID value with your own.

UUID=33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e /datadrive xfs defaults,nofail 1 2

  • When you are done editing the file, use Ctrl+O to write the file and Ctrl+X to exit the editor.

Now that the disk has been configured, close the SSH session.
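The UUID-to-fstab step can also be scripted so the UUID never has to be copied by hand. A small sketch (the helper name is mine, and the blkid line below is the sample from above, not a live device):

```shell
# Turn a blkid output line into an /etc/fstab entry for an xfs data disk.
blkid_to_fstab() {
  mnt=$1
  # capture the value inside UUID="..." and emit the fstab line
  sed -n 's|.*UUID="\([^"]*\)".*|UUID=\1 '"$mnt"' xfs defaults,nofail 1 2|p'
}

echo '/dev/sdc1: UUID="33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e" TYPE="xfs"' \
  | blkid_to_fstab /datadrive
```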

Change VM Disk Setting for performance

While the VM is stopped, make changes to the data disk's Advanced settings in the Azure portal

1.8 Move MySQL data directory to Azure Data Disk

STEP 1 – Check current MySQL data directory and Stop services

$ mysql -u root -p
mysql > select @@datadir;

| @@datadir       |
| /var/lib/mysql/ |
1 row in set (0.00 sec)

mysql > exit
$ sudo systemctl stop mysqld
$ sudo systemctl status mysqld

STEP 2 – Use rsync to copy data directory to Azure DD and backup

$ sudo rsync -av /var/lib/mysql /datadrive
$ sudo mv /var/lib/mysql /var/lib/mysql.bak

STEP 3 – Override MySQL Config file

$ sudo vi /etc/my.cnf

. . .
. . .


STEP 4 – Restart MySQL instance and check datadir changes

$ sudo systemctl start mysqld
$ sudo systemctl status mysqld

$ mysql -u root -p

mysql> select @@datadir;


| @@datadir          |
| /datadrive/mysql/  |
1 row in set (0.01 sec)

If you don't do the below steps, MySQL will not start.

STEP 5:  On CentOS or RedHat, modify SELinux settings. On Ubuntu or Debian, modify the AppArmor settings

When you copy the data directory or use rsync, the SELinux context is lost (note: if you move the directory instead, it may stay intact)

If SELinux is running in enforcing mode, you need to persuade it to allow mysqld to use the non-default datadir. Alternatively, put SELinux into permissive mode, or disable it entirely.

Run sestatus to see which mode it’s currently running in.

[root@tcmysql data]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31

Run grep mysql /var/log/audit/audit.log to see if SELinux has blocked any mysql file access attempts or similar.

(But note that this is fine if running in permissive mode, because then it doesn’t actually block anything, it just logs what it would have blocked if you were running in enforcing mode.)

Run setenforce permissive to put SELinux into permissive mode, though note you need to edit /etc/sysconfig/selinux (or /etc/selinux/config) to make the change permanent.

SELinux Context Type for MySQL

This is just a reference for before and after the mysql directory copy. The fix boils down to applying the mysqld_db_t context to the new datadir (semanage fcontext plus restorecon).

If we don't fix the SELinux file context type, mysqld will throw the below error and stop:

# cat /var/log/mysqld.log:
mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
[Warning] Can't create test file /var/lib/mysql/devdb..lower-test
/usr/libexec/mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13)
[ERROR] Aborting
[Note] /usr/libexec/mysqld: Shutdown complete

STEP 6: Bounce the whole VM 

1.9 Load/Import Dump file using myloader

Prerequisite of myloader

Disable foreign key and uniqueness checks to expedite the loading/import process:

SET foreign_key_checks = 0;
SET unique_checks = 0;
SET GLOBAL innodb_stats_on_metadata = 0;

After the import, set those variables back:

SET foreign_key_checks = 1;
SET unique_checks = 1;
SET GLOBAL innodb_stats_on_metadata = 1;

Execute myloader

With the below state of directory structures, execute myloader, which will load mvh_test (all tables) and hotel_rate_wes2 (one massive fact table with hotel rates):

$ myloader -s mvh_test -h <public IP address VM> -u dxpadmin -p <password> -d mvh_test -t 500 -C -v 3 -e 2> myloader_mvh_test.log &

$ myloader --host=<public IP address VM> --user=dxpadmin --password=<password> --database=mvh_test --directory=./hotel_rate_wes2/ --threads=500 --overwrite-tables --compress-protocol --verbose=3 -e 2> myload_hotel_rate_wes2.log &

Monitor Progress of the Data Loading

# Following gives approximate number of data files already restored
grep restoring <myloader-output-logs-path> | grep Thread | grep -v schema | wc -l
# Following gives total number of data files to be restored
ls -l <mydumper-output-dir> |  grep -v schema | wc -l
# Following gives information about errors
grep -i error <myloader-output-logs-path>
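The first two greps above can be combined into one summary helper; a hedged sketch (the function is mine) where `<myloader-output-logs-path>` and `<mydumper-output-dir>` become the two arguments:

```shell
# Summarize myloader progress: data files restored vs. total data files dumped.
loader_progress() {
  log=$1; dumpdir=$2
  done_files=$(grep restoring "$log" | grep Thread | grep -vc schema)
  total_files=$(ls "$dumpdir" | grep -vc schema)
  echo "restored $done_files of $total_files data files"
}
```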

As the data import progresses, we can see the used volume on the /datadrive Azure data disk mount keep increasing, so we have successfully used the Azure data disk to store the MySQL database files.

2.0 Connect MySQL Cloud DB from Workbench & Query

The host address should be the VM's public IP address. Note the inbound rule should allow your local machine's IP to connect to this VM (outlined in previous steps).

Also, to connect to MySQL on port 3306, your local IP address needs GRANT permission from the database itself (the process of giving GRANT privileges for all or certain IPs was outlined before).

Execute sample query as below:

use mvh_test;

select
hr.rate_id as rid,
if(uncompressed_length(hr.<data column>) < 16000000, uncompress(hr.<data column>), hr.<data column>) as data
from hotel_rate_wes2 hr
where hr.shop_data_source = 'WEB'
and hr.guests = 1
and hr.shop_pos_cd = 'US'
and hr.arv_dt <= '2022-02-08'
and hr.arv_dt >= '2019-03-15'
and hr.prop_id in (253,290,291,362)
and hr.shop_currency_cd in ('SGD','USD')
and ((shop_dttm >= date_sub(now(), interval 7 day)) or (err_dttm >= date_sub(now(), interval 7 day)))
and hr.los = 1;


Synapse vs Snowflake vs Databricks in Azure

All three technologies provide a modern approach to cloud data warehousing, but each has a unique set of features for solving problems and poses unique challenges to work with. A modern technology platform for a big enterprise should not take a monolithic approach to data solutions without a clear understanding of the business use case, and a polyglot persistence architecture must be kept in mind when designing the data store.

It is hard to make the judgement up front about which data store to use for which purpose, so research, proof of concept, and due diligence are required when architecting the data solution; this helps build the right things in the right way.

To understand the key differences I have tried to put all three technology comparisons together in one frame, with very high-level differentiation; at a low level there could be thousands of other feature differences, which are out of scope for this thread. The differences captured below are as of my writing and subject to change as the technologies evolve.

Let's dive deep into it; happy to hear your feedback/comments below:

Azure Local Zone Geo Redundancy – Connecting the dots

The objective is to simplify the Azure resiliency options by explaining Local Redundancy (LR), Zone Redundancy (ZR), and Geo Redundancy (GR). Redundancy offers a degree of high availability, and so addresses the SLA % for faults and PITR (point-in-time recovery) for disasters.

Redundancy is a key objective of the cloud paradigm, giving applications agility in compute and storage options so that any disaster scenario can be handled with maximum flexibility, the lowest possible downtime, and the most cost-effective option, with minimal impact to the platform and infrastructure.

The below diagram addresses the key standards of redundancy from an architecture standpoint. When we talk about redundancy we should always differentiate storage vs. compute redundancy. The cases stated below mostly apply to storage redundancy; compute redundancy varies across services and offerings.

For example, managed services have inherent redundancy built in by default: data is either replicated synchronously three times in the primary region using LRS (locally redundant storage) and then replicated asynchronously to the secondary region as GRS (geo-redundant storage), or data is replicated synchronously across 3 AZs in the primary region using ZRS and then replicated asynchronously to the secondary region as GZRS (geo-zone-redundant storage).

In the above example we focus on deploying Azure resources in the US East region (EAST US). A paired geographical region exists as well, making a regional pair for geo-redundant storage (GRS).

The primary EAST US region comprises multiple Availability Zones (AZs) located on different hardware infrastructure. In this case EAST US has 3 AZs to give zone redundancy. All AZs are connected through the Azure virtual network to allow synchronous replication. These AZs are logical entities combining physically different datacenters.

Within one AZ there could be hundreds of connected Azure resources/services spread over one datacenter, across different floors of a building, or separated across hardware shelves/racks. That is how local redundancy is provided. LRS services are physically and logically close to each other to allow minimal downtime for failures and maintenance patches.

Hope this simplifies some cloudy areas of redundancy and explains the importance of redundancy/replication in the cloud.

Connecting Oracle OnPrem from Cloud using Integration Runtime , No Expressroute


This post shows how to leverage the Microsoft self-hosted integration runtime in Azure Data Factory to connect to an on-premise network and view data.

This is a proof of concept showing that, without a hard-wired Azure ExpressRoute setup or site-to-site VPN connectivity, we can still connect on-premise data to cloud infrastructure with the help of Azure Data Factory.


The scope of this PoC is limited to showing connectivity to the data source (Oracle), which is behind the corporate firewall, and viewing data only. The detailed scope of moving data from Oracle to Azure Data Lake Storage Gen2 (ADLS Gen2) will be covered in a separate blog item.


Prerequisites:

  • You need standalone local-machine VPN connectivity established to connect to the source database (in this case Oracle) on-prem.
  • Windows Server 2012/2016/2019, Windows 8.1, or Windows 10

What is Microsoft Self Hosted Integration Runtime

The Microsoft Integration Runtime is a customer managed data integration and scanning compute infrastructure used by Azure Data Factory, Azure Synapse Analytics and Azure Purview to provide data integration and scanning capabilities across different network environments. 

A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It also can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. The installation of a self-hosted integration runtime needs an on-premises machine or a virtual machine inside a private network

Self Hosted Integration Runtime Architecture

The high-level overview of data flow
Self Hosted Integration Runtime Architecture as Defined in Azure Portal


  • Create Data Pipeline in Azure Data Factory
  • Setup Integration Runtime
  • Install Microsoft Self hosted integration runtime in local laptop and Configure the connection to cloud
  • Setup Linked Services
  • View Oracle Data

Step 1 : Create Data Pipeline in Azure Data Factory

Search for Data Factory service in Azure

Create a Data Factory Pipeline

I have created “TestOracleDBConnectADF”. Click Author and Monitor.

Create New Pipeline

Click New Integration Runtimes

Choose SelfHosted option and Copy the Key . Also here we can see direct download link for the integration Runtime installer

Step 2 : Setup Self Hosted Integration runtime in local laptop

Download and install the integration runtime from the download link shown in the portal.

Copy the Key to below section after running the installer and hit Register

Step 3 : Verify the connection established between Local machine and Cloud

The self-hosted runtime configuration should look like this. No additional configuration is required here.

In Azure Portal it should look like below

Step 4 : Connect to VPN and See the Sample table in Exadata schema

Step 5 : Configure Linked Services to connect with Oracle Exadata DB info

In Azure Data factory create New Linked service called “LS_Oracle_onPrem” .

Add the Exadata OnPrem Database connection information there and Test the connection:

Step 6 : Create New Dataset

I have created Data folder called EMPDEPT and inside that new dataset called SRC_EMP 

Pick the linked service and test the connection. Once successful, click the Table dropdown to get the list of schemas and tables to access; you can preview any dataset you want.

This will directly fetch the data from on-prem Oracle to the cloud.

That's all for validating and testing the connection and data access across the two platforms.

Databricks Spark vs Open Source Spark

As architects, we face a lot of challenges in picking the right technology for the right use case, and this is key for the long-term strategy of products and solutions in the cloud.

The intent of this write-up is to help architects and data solution engineers make the right choice and pick the best breed of technology, not to criticize. Again, this is completely my view and opinion, so pick your favorite by doing your own due diligence 🙂

I did a very deep dive into this area of technology to find the differences between open-source Spark and Databricks Spark. Essentially, open-source Spark has been integrated into typical cloud technology frameworks; for example, Azure Synapse adopted Spark as a separate pool, differentiated from the SQL DW pool.

My exercise shows the differences in a bird's-eye view based on my own research. This is reported as of April 1st, 2021, and the information below is subject to change as the technologies evolve. Hope this helps.

If you have any comments, opinions, or feedback, please share.

Couchbase6.5 K8s Installation – Sweet & Simple

The below steps define the Couchbase 6.5 with Autonomous Operator 2.0 installation in an on-premise private-cloud-based Kubernetes env:
  • Installation of Helm
    • This is required to streamline your installation and manage the applications to be deployed on the K8s platform
    • I will touch on this loosely and assume the k8s admin has already made this available and configured on the platform
    • Helm 3 is needed for this version of the Couchbase installation
    • For older Couchbase versions you need Tiller to be deployed in your cluster instance
  • A Helm chart is a deployable packaged module: a collection of files which define Kubernetes resources
  • Add/download the Couchbase Helm chart from the repo
  • Verify the repo
  • Fetch the repo into a remote Linux directory
  • Customize the deployment YAML inside the directory
  • Perform the Helm installation: the operator first, then the cluster
  • The Helm installation will be done in two folds:
    • Couchbase Autonomous Operator (this manages the cluster on the K8s platform)
    • Couchbase Cluster (this is the actual cluster)

What is Helm: Helm is a combined installer and package manager for Kubernetes applications, where resources (services/roles/deployments) can be managed and updated as a single unit, with all dependencies catered for automatically.

Helm needs to be installed and present in the env (typically a Linux shell with a pre-configured kubeconfig file pointing to the right context and namespace) from where you invoke all the commands below.

For the Couchbase 2.0 operator I will be using a Helm 3 build, 3.3.0+. Couchbase is compatible with Helm 3.1+ builds. A Helm build can be downloaded for your OS from the Helm releases page. Some previous releases of the operator are compatible only with lower versions of Helm.

[master] wget https://get.helm.sh/helm-v3.3.0-linux-amd64.tar.gz
--2020-08-28 18:05:46-- https://get.helm.sh/helm-v3.3.0-linux-amd64.tar.gz
Resolving get.helm.sh (get.helm.sh)… …, 2606:2800:11f:1cb7:261b:1f9c:2074:3c
Connecting to get.helm.sh (get.helm.sh)|…|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 12741413 (12M) [application/x-tar]
Saving to: ‘helm-v3.3.0-linux-amd64.tar.gz’
100%[========================>] 12,741,413 --.-K/s in 0.1s
2020-08-28 18:05:47 (106 MB/s) - ‘helm-v3.3.0-linux-amd64.tar.gz’ saved [12741413/12741413]

I have extracted this .tar.gz file into my home directory and will execute my helm3 as ~/helm3/helm going forward

Let's switch to the namespace and the right k8s env where you want to install Couchbase, and set the current context:

kubectl config set-context $(kubectl config current-context) --namespace=<namespace>

The kubectl config file to connect to the right environment is already set inside ~/.kube/config

Check what is available in local repo:

~ [master]
helm repo list

As Helm maintains all the chart repositories, let's add the new couchbase repository:

~ [master]
helm repo add couchbase https://couchbase-partners.github.io/helm-charts/
"couchbase" has been added to your repositories

Update the repo index to get latest available charts from Chart repositories to local:

~ [master]
helm repo update
Hang tight while we grab the latest from your chart repositories…
…Skip local chart repository
…Successfully got an update from the "couchbase" chart repository
…Successfully got an update from the "stable" chart repository
Update Complete. ⎈ Happy Helming!⎈

Check the repo list now:

~ [master]
helm repo list

Ignore it if you see stable and local as other repositories

Helm maintains the local and cache repository index in your Linux file system:

~/.helm/repository [master]
ls -ltr
total 12
drwxr-xr-x 2 dpaul domain users 4096 Apr 19 16:14 cache
drwxr-xr-x 2 dpaul domain users 4096 Jul 10 14:52 local
-rw-r--r-- 1 dpaul domain users 697 Jul 10 15:30 repositories.yaml
~/.helm/repository/cache [master]
confluentinc-index.yaml couchbase-index.yaml incubator-index.yaml local-index.yaml stable-index.yaml

Search the repo to confirm couchbase is available and its version:

~ [master]
helm search repo couchbase
couchbase/couchbase-cluster 0.1.2 1.2 Couchbase Server is a NoSQL document database with a dist…
couchbase/couchbase-operator 2.0.1 2.0.1 A Helm chart to deploy the Couchbase Autonomous Operator …

This command is helpful to find the right chart version you want to install, in case you have multiple charts with different release editions
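That lookup can also be scripted; the helper below is my sketch, fed with the `helm search repo` output shown above:

```shell
# Pick the chart version (2nd column) for a given chart from `helm search repo` output.
chart_version() {
  awk -v c="$1" '$1 == c { print $2 }'
}

printf '%s\n' \
  'couchbase/couchbase-cluster 0.1.2 1.2 Couchbase Server is a NoSQL document database' \
  'couchbase/couchbase-operator 2.0.1 2.0.1 A Helm chart to deploy the Couchbase Autonomous Operator' \
  | chart_version couchbase/couchbase-operator   # prints 2.0.1
```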

Before I proceed, I make sure no previous Couchbase cluster exists in my namespace:

~ [master]
kubectl get cbc
No resources found in bi-dev namespace.

If one exists, it can be cleaned up by issuing the command: kubectl delete cbc <clustername>

Also make sure there is no previous Couchbase operator or deployment, using helm ls:

~ [master]
~/helm3/helm ls | grep operator

~ [master]
~/helm3/helm ls  | grep cluster

~ [master]  kubectl get deployment

The above should return no results. The deployment check will find whether any previous instance of a Couchbase deployment exists in this namespace. If yes, use 'kubectl delete deployment' to purge the previous cluster before proceeding.

If the above returns results, you may already have a Couchbase deployment/instance in the same namespace, or these could be dangling charts with odd names from previously deleted instances. You might want to clean those up using the below to start from a clean slate:

~/couchbase65 [master]
~/helm3/helm delete <chartname>
release "<release name>" deleted

While you can run the installer directly from the repo by issuing the below command, the ideal practice is to customize the installer per your env needs, e.g. to name it differently, change the resource (CPU/memory) allocation, or customize your Couchbase deployment services.

Note: don't be surprised if the below command throws an error; you might not have permission to execute certain things in the cluster because of RBAC privileges

~/helm3/helm install <my-release> couchbase/couchbase-operator

In my case I will fetch the installer from the repo to my remote directory to edit it based on env needs:

~ [master]
~/helm3/helm fetch couchbase/couchbase-operator
~ [master]
~/helm3/helm fetch couchbase/couchbase-cluster

This brings below:

-rw-r--r-- 1 dpaul domain users 15712 Jul 10 16:10 couchbase-operator-2.0.1.tgz
-rw-r--r-- 1 dpaul domain users 4517 Jul 10 16:11 couchbase-cluster-0.1.2.tgz

Now let's extract the .tgz files. I created a directory called couchbase65 before moving the content there.

~ [master]
mkdir couchbase65 ; mv couchbase*.tgz couchbase65 ; cd couchbase65
~/couchbase65 [master]
tar -xvzf couchbase-cluster-0.1.2.tgz ;
~/couchbase65 [master]
tar -xvzf couchbase-operator-2.0.1.tgz

Now you got this:

~/couchbase65 [master]
ls -ltr
total 32
-rw-r--r-- 1 dpaul domain users 15712 Jul 10 16:10 couchbase-operator-2.0.1.tgz
-rw-r--r-- 1 dpaul domain users 4517 Jul 10 16:11 couchbase-cluster-0.1.2.tgz
drwxr-xr-x 3 dpaul domain users 4096 Jul 10 16:25 couchbase-cluster
drwxr-xr-x 5 dpaul domain users 4096 Jul 10 16:25 couchbase-operator

Now the Action Begins ! 🙂

A few things to know: you can install Couchbase from the extracted location with customized parameters, or you can pull the Couchbase instance from the repo directly, which will install both operator and cluster:

Note: for security reasons around managing Couchbase resources in the Kubernetes cluster, our cluster administrator created custom CRDs and an admission controller, so as a Couchbase cluster owner I don't need to bother about the internals of permission management.


Couchbase Operator enables you to run Couchbase deployments natively on Open Source Kubernetes or Enterprise Red Hat OpenShift Container Platform.

The goal of the Couchbase Operator is to fully manage one or more Couchbase deployments by removing operational complexities of running Couchbase by automating the management of common Couchbase tasks such as the configuration, creation, upgrade and scaling of Couchbase clusters.

The Couchbase Operator extends the Kubernetes API by creating a Custom Resource Definition(CRD) and registering a Couchbase specific controller (the Operator) to manage Couchbase clusters.

in couchbase-operator directory you will see below files :

-rw-r--r-- 1 dpaul domain users 888 Aug 20 11:48 Chart.yaml
-rw-r--r-- 1 dpaul domain users 12195 Aug 20 11:48 values.yaml
drwxr-xr-x 2 dpaul domain users 4096 Aug 20 11:48 templates
-rw-r--r-- 1 dpaul domain users 92 Aug 20 11:48 OWNERS
drwxr-xr-x 2 dpaul domain users 4096 Aug 20 11:48 examples
drwxr-xr-x 2 dpaul domain users 4096 Aug 20 11:48 crds

The important file to look at is values.yaml. It holds the built-in values that the Helm templates consume: everything computed inside the charts requires certain values to be passed in. If you want to override some values, create your own values.yaml under a different name. I created one called "tc_operator_values.yaml": just copy values.yaml, rename it, and edit it. Here is my customized operator YAML. Details of the parameters below can be found in the Couchbase Operator 2.0 document link here:

# Default values for couchbase-operator chart.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Select what to install
install:
  # install the couchbase operator
  couchbaseOperator: true
  # install the admission controller
  admissionController: false
  # install couchbase cluster
  couchbaseCluster: false
  # install sync gateway
  syncGateway: false

# couchbaseOperator is the controller for couchbase cluster
couchbaseOperator:
  # name of the couchbase operator
  name: "dev-ilcb-bi-cb"
  # image config
  image:
    repository: couchbase/operator
    tag: 2.0.1
  imagePullPolicy: IfNotPresent
  imagePullSecrets: []
  commandArgs:
    # pod creation timeout
    pod-create-timeout: 10m
  # resources of couchbase-operator
  resources: {}
  nodeSelector: {}
  tolerations: []

# TLS Certs that will be used to encrypt traffic between operator and couchbase
tls:
  # enable to auto create certs
  generate: true
  # Expiry time of CA in days for generated certs
  #expiration: 365
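As an aside, an override file only needs the keys you are actually changing: helm deep-merges your file over the chart's values.yaml defaults. A minimal hypothetical override is sketched below (the key layout follows the chart's values.yaml; treat it as illustrative and check against your chart version):

```yaml
# tc_operator_values.yaml (illustrative): any key not listed here
# keeps its default from the chart's values.yaml
install:
  couchbaseOperator: true
  admissionController: false
couchbaseOperator:
  name: "dev-ilcb-bi-cb"
```

You then pass this file with --values at install time, exactly as shown later in this post.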

Before installing, let's verify the Couchbase operator package with helm3's lint utility. This detects errors or anomalies inside the base operator package. In 2.0 I found the issue below and reached out to Couchbase, who said to ignore it; it will be fixed in a later release:

~/helm3/helm lint couchbase-operator
==> Linting couchbase-operator
[ERROR] templates/: parse error at (couchbase-operator/templates/_helpers.tpl:129): function "lookup" not defined
Error: 1 chart(s) linted, 1 chart(s) failed

To install operator you need to be inside couchbase directory and use below command :

~/helm3/helm install cb-operator couchbase-operator/ --values couchbase-operator/tc_operator_values.yaml --namespace bi-cb --skip-crds --debug --dry-run

Details of each parameter below:

cb-operator is my custom operator name and couchbase-operator/ is the directory name. I am installing into the bi-cb namespace with the --skip-crds parameter (because my K8s admin already had the CRDs installed for me), and I use --debug and --dry-run before execution as a best practice, to see whether the chart has any errors and whether all computed values look okay.

The above verifies that the installation charts are syntactically good; then proceed with the actual installation by removing --debug and --dry-run to build the operator:

~[master] ~/helm3/helm install cb-operator couchbase-operator/ --values couchbase-operator/tc_operator_values.yaml --namespace bi-cb --skip-crds

NAME: cb-operator
LAST DEPLOYED: Fri Aug 28 18:21:52 2020
STATUS: deployed
== Couchbase-operator deployed.
# Check the couchbase-operator logs
kubectl logs -f deployment/cb-operator-dev-ilcb-bi-cb --namespace bi-cb

== Manage this chart
# Upgrade Couchbase
helm upgrade cb-operator -f stable/couchbase

# Show this status again
helm status cb-operator

This is the log of the deployment.

Now verify that the new deployment and the operator Helm chart were created:

kubectl get deployment
cb-operator-dev-ilcb-bi-cb 1/1 1 1 48s
~/helm3/helm ls
cb-operator bi-cb 1 2020-08-28 18:21:52.203954412 -0400 EDT deployed couchbase-operator-2.0.1 2.0.1

Let's see the pods' status:

kubectl get pods
cb-operator-dev-ilcb-bi-cb-98dccf88-ftn5f 1/1 Running 0 6m41s

Let’s see the logs of the Operator :

kubectl logs cb-operator-dev-ilcb-bi-cb-98dccf88-ftn5f
{"level":"info","ts":1598653316.9318385,"logger":"main","msg":"couchbase-operator","version":"2.0.1 (build 130)","revision":"release"}
{"level":"info","ts":1598653316.93203,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1598653316.9731295,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1598653316.9784648,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1598653317.00218,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"couchbase-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1598653317.1025467,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"couchbase-controller"}
{"level":"info","ts":1598653317.202735,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"couchbase-controller","worker count":4}

This is very useful for troubleshooting. Since the operator drives everything in the Couchbase cluster, any cluster-related issue can be triaged further from here.
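Since the operator log lines are structured JSON (one object per line), plain text tools can filter them easily. Here is a minimal sketch, using two log lines copied from the output above and written to a scratch file; against a live cluster you would pipe `kubectl logs` output in instead:

```shell
# Write two sample operator log lines (copied from the output above) to a file.
cat > /tmp/op.log <<'EOF'
{"level":"info","ts":1598653316.9784648,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1598653317.1025467,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"couchbase-controller"}
EOF
# Pull out just the human-readable messages from the JSON records.
msgs=$(grep -o '"msg":"[^"]*"' /tmp/op.log)
echo "$msgs"
```

In live use, `kubectl logs deployment/<operator-deployment> | grep -o '"msg":"[^"]*"'` gives the same quick summary.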


There are two types of deployment: MDS and non-MDS.

In a non-MDS deployment, all Couchbase service nodes get the same resource allocation (CPU, RAM, storage). In real-world use we usually want MDS for production loads, because MDS is customized at a granular level to shuffle allocation across resources based on the anticipated workload: I can customize the CPU, RAM, and storage distribution based on my needs.

In my specification I will go with MDS; the non-MDS helm chart is much simpler.
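To make the contrast concrete, here is a hedged sketch of the two shapes the servers section can take (the group names and exact key layout are illustrative; check the chart's values.yaml for your version):

```yaml
# Non-MDS: a single server group, every service on every node, identical resources
servers:
  all_services:
    size: 3
    services: [data, index, query, search, eventing, analytics]
---
# MDS: one group per service class, each sized and resourced independently
servers:
  data:
    size: 3
    services: [data]
  index:
    size: 2
    services: [index, query]
  analytics:
    size: 1
    services: [search, eventing, analytics]
```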

Now you have two options: either create your own customized values.yaml inside the couchbase-cluster directory (which was extracted from the .tgz file, copying the actual values.yaml), or create the same inside the couchbase-operator directory, making sure the chart values set couchbaseCluster: true and the rest false. I created a file named tc_cluster_values_mds.yaml.

I have provided the Couchbase cluster code inline below. Note that I have taken some parameters out of the Couchbase-provided YAML; for example, I want buckets to be managed from the Admin Console UI, so I set buckets to managed: false.

For detailed specifications of the individual Couchbase resource parameters, refer here:

# Couchbase Multidimension Scaling Cluster Chart
# Maintained by : Debashis Paul
# Default values for couchbase-operator chart.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

# Select what to install
install:
  # install the couchbase operator
  couchbaseOperator: false
  # install the admission controller
  admissionController: false
  # install couchbase cluster
  couchbaseCluster: true
  # install sync gateway
  syncGateway: false

# Default values for couchbase-cluster
cluster:
  # name of the cluster. defaults to name of chart release
  name: "couchbase-cluster-dev"
  # image is the base couchbase image and version of the couchbase cluster
  image: "couchbase/server:6.5.1"
  antiAffinity: true
  security:
    # username of the cluster admin.
    username: Administrator
    # password of the cluster admin.
    # auto-generated when empty
    password: "admin123"
    # adminSecret is name of secret to use instead of using
    # the default secret with username and password specified above
    rbac:
      managed: false
    #ldap: {}
  # networking options
  networking:
    # Option to expose admin console
    exposeAdminConsole: true
    # Services the admin console connects to
    adminConsoleServices:
      - data
    # Specific services to use when exposing ui
    exposedFeatures:
      - client
      # - xdcr
    # Defines how the admin console service is exposed.
    # Allowed values are NodePort and LoadBalancer.
    # If this field is LoadBalancer then you must also define a spec.dns.domain.
    adminConsoleServiceType: NodePort
    # Defines how the per Couchbase node ports are exposed.
    # Allowed values are NodePort and LoadBalancer.
    # If this field is LoadBalancer then you must also define a spec.dns.domain.
    exposedFeatureServiceType: NodePort
    # The dynamic DNS configuration to use when exposing services
    #dns:
    # Custom map of annotations to be added to console and per-pod (exposed feature) services
    #serviceAnnotations: {}
    # The Couchbase cluster tls configuration (auto-generated)
    #tls:
  # The retention period that log volumes are kept for after their associated pods have been deleted.
  logRetentionTime: 604800s
  # The maximum number of log volumes that can be kept after their associated pods have been deleted.
  logRetentionCount: 20
  # xdcr defines remote clusters and replications to them.
  #xdcr:
  #  # managed defines whether the Operator should manage XDCR remote clusters
  #  managed: false
  #  # remoteClusters contains references to any remote clusters to replicate to
  #  remoteClusters:
  # backup defines values for automated backup.
  backup:
    # managed determines whether Automated Backup is enabled
    managed: true
    # image used by the Operator to perform backup or restore
    image: couchbase/operator-backup:6.5.1
    # optional service account to use when performing backups
    # service account will be created if it does not exist
    serviceAccountName: cbbackupuser
  # defines integration with third party monitoring software
  # monitoring:
  #   prometheus:
  #     # defines whether Prometheus metric collection is enabled
  #     enabled: true
  #     # image used by the Operator to perform metric collection
  #     # (injected as a "sidecar" in each Couchbase Server Pod)
  #     image: couchbase/exporter:1.0.1
  #     # Optional Kubernetes secret that clients use to access Prometheus metrics
  #     authorizationSecret:
  # Cluster wide settings for nodes and services
  cluster:
    # The amount of memory that should be allocated to the data service
    dataServiceMemoryQuota: 4096Mi
    # The amount of memory that should be allocated to the index service
    indexServiceMemoryQuota: 4096Mi
    # The amount of memory that should be allocated to the search service
    searchServiceMemoryQuota: 256Mi
    # The amount of memory that should be allocated to the eventing service
    eventingServiceMemoryQuota: 256Mi
    # The amount of memory that should be allocated to the analytics service
    analyticsServiceMemoryQuota: 1Gi
    # The index storage mode to use for secondary indexing
    indexStorageSetting: plasma
    autoCompaction:
      # amount of fragmentation allowed in persistent database [2-100]
      databaseFragmentationThreshold:
        percent: 30
        size: 1Gi
      # amount of fragmentation allowed in persistent view files [2-100]
      viewFragmentationThreshold:
        percent: 30
        size: 1Gi
      # whether auto-compaction should be performed in parallel
      parallelCompaction: false
      # how frequently tombstones may be purged
      tombstonePurgeInterval: 72h
      # optional window when an auto-compaction may start (uncomment below)
      timeWindow: {}
      # start: 02:00
      # end: 06:00
      # abortCompactionOutsideWindow: true

  # configuration of logging functionality
  # for use in conjunction with logs persistent volume mount
  logging:
    # retention period that log volumes are kept after pods have been deleted
    logRetentionTime: 604800s
    # the maximum number of log volumes that can be kept after pods have been deleted
    logRetentionCount: 20
  # kubernetes security context applied to pods
  securityContext:
    # fsGroup of persistent volume mount
    fsGroup: 1000
    runAsUser: 1000
    runAsNonRoot: true
  # cluster buckets
  buckets:
    # Managed defines whether buckets are managed by us or the clients.
    managed: false
  # server groups (MDS): one group per service class
  servers:
    data:
      size: 3
      services:
        - data
      resources:
        limits:
          cpu: "2"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 2Gi
      volumeMounts:
        data: couchbasedata
        default: couchbasedefault
    index:
      size: 2
      services:
        - index
        - query
      resources:
        limits:
          cpu: "2"
          memory: 3Gi
        requests:
          cpu: "1"
          memory: 2Gi
      volumeMounts:
        index: couchbaseindex
        default: couchbasedefault
    analytics:
      size: 1
      services:
        - search
        - eventing
        - analytics
      resources:
        limits:
          cpu: "2"
          memory: 2Gi
        requests:
          cpu: "1"
          memory: 1Gi
      volumeMounts:
        default: couchbasedefault
  volumeClaimTemplates:
    - metadata:
        name: couchbasedata
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: pure-block
        resources:
          requests:
            storage: 70Gi
    - metadata:
        name: couchbaseindex
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: pure-block
        resources:
          requests:
            storage: 50Gi
    - metadata:
        name: couchbasedefault
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: pure-block
        resources:
          requests:
            storage: 5Gi

# RBAC users to create
# (requires couchbase server 6.5.0 and higher)
users: {}

# Uncomment to create an example user named 'developer'
#users:
#  developer:
#    # password to use for user authentication
#    # (alternatively use authSecret)
#    password: password
#    # optional secret to use containing user password
#    # domain of user authentication
#    authDomain: local
#    # roles attributed to group
#    roles:
#      - name: bucket_admin
#        bucket: group360

# TLS Certs that will be used to encrypt traffic between operator and couchbase
tls:
  # enable to auto create certs
  generate: true
  # Expiry time of CA in days for generated certs
  #expiration: 365

Now let's execute the cluster Helm chart build:

~/couchbase65 [master]
~/helm3/helm install cb-cluster couchbase-operator --values couchbase-operator/tc_cluster_values_mds.yaml --namespace bi-dev --skip-crds 

If you run into problems, use --debug --dry-run to test the computed values; I would do that first before executing. Also, --skip-crds is very important.

After execution you will see below

~/helm3/helm install cb-cluster couchbase-operator --values couchbase-operator/tc_cluster_values_mds.yaml --namespace bi-cb --skip-crds
NAME: cb-cluster
LAST DEPLOYED: Mon Aug 31 18:40:42 2020
STATUS: deployed
== Connect to Admin console
kubectl port-forward --namespace bi-cb couchbase-cluster-dev-0000 18091:18091
# open https://localhost:18091
username: Administrator
password: admin123
== Manage this chart
# Upgrade Couchbase
helm upgrade cb-cluster -f stable/couchbase
# Show this status again
helm status cb-cluster

Let's watch the operator logs while the cluster is being created:

~/couchbase65 [master]
kubectl logs -f cb-operator-dev-ilcb-788bc4cb9b-l854f
{"level":"info","ts":1598915123.440226,"logger":"main","msg":"couchbase-operator","version":"2.0.1 (build 130)","revision":"release"}
{"level":"info","ts":1598915123.440417,"logger":"leader","msg":"Trying to become the leader."}
{"level":"info","ts":1598915123.4805334,"logger":"leader","msg":"No pre-existing lock was found."}
{"level":"info","ts":1598915123.4857814,"logger":"leader","msg":"Became the leader."}
{"level":"info","ts":1598915123.511823,"logger":"kubebuilder.controller","msg":"Starting EventSource","controller":"couchbase-controller","source":"kind source: /, Kind="}
{"level":"info","ts":1598915123.6121693,"logger":"kubebuilder.controller","msg":"Starting Controller","controller":"couchbase-controller"}
{"level":"info","ts":1598915123.7123137,"logger":"kubebuilder.controller","msg":"Starting workers","controller":"couchbase-controller","worker count":4}
{"level":"info","ts":1598915182.4643793,"logger":"cluster","msg":"Watching new cluster","cluster":"bi-cb/couchbase-cluster-dev"}
{"level":"info","ts":1598915182.4645112,"logger":"cluster","msg":"Janitor starting","cluster":"bi-cb/couchbase-cluster-dev"}
{"level":"info","ts":1598915182.4669094,"logger":"cluster","msg":"Couchbase client starting","cluster":"bi-cb/couchbase-cluster-dev"}
{"level":"info","ts":1598915182.5306997,"logger":"cluster","msg":"UI service created","cluster":"bi-cb/couchbase-cluster-dev","name":"couchbase-cluster-dev-ui"}
{"level":"info","ts":1598915182.5568848,"logger":"cluster","msg":"Cluster does not exist so the operator is attempting to create it","cluster":"bi-cb/couchbase-cluster-dev"}
{"level":"info","ts":1598915182.6547754,"logger":"cluster","msg":"Creating pod","cluster":"bi-cb/couchbase-cluster-dev","name":"couchbase-cluster-dev-0000","image":"couchbase/server:6.5.1"}

You will see the Couchbase pods being created one after another:

~/couchbase65 [master]
kubectl get pods
cb-operator-dev-ilcb-788bc4cb9b-l854f 1/1 Running 0 113s
couchbase-cluster-dev-0000 0/1 Running 0 52s
couchbase-cluster-dev-0001 0/1 Running 0 22s
tiller-deploy-5fc9fcb64b-dtkx5 1/1 Running 0 24d

Once all 6 pods are in Running state, the cluster installation is successful.

This is how the cluster, deployment, pods and Helm releases finally look:

~/couchbase65 [master]
kubectl get cbc
couchbase-cluster-dev 6.5.1 6 Running 4ea6222a5b4ee6a50613ab0fd589a9f0 3m52s
~/couchbase65 [master]
kubectl get deployment
cb-operator-dev-ilcb 1/1 1 1 4m40s
tiller-deploy 1/1 1 1 383d

~/helm3/helm ls
cb-cluster bi-cb 1 2020-08-31 19:06:04.399745186 -0400 EDT deployed couchbase-operator-2.0.1 2.0.1
cb-operator bi-cb 1 2020-08-31 19:05:20.266942245 -0400 EDT deployed couchbase-operator-2.0.1 2.0.1
~/couchbase65 [master]
kubectl get pods
cb-operator-dev-ilcb-788bc4cb9b-l854f 1/1 Running 0 4m53s
couchbase-cluster-dev-0000 1/1 Running 0 3m52s
couchbase-cluster-dev-0001 1/1 Running 0 3m22s
couchbase-cluster-dev-0002 1/1 Running 0 2m56s
couchbase-cluster-dev-0003 1/1 Running 0 2m30s
couchbase-cluster-dev-0004 1/1 Running 0 2m11s
couchbase-cluster-dev-0005 1/1 Running 0 107s
tiller-deploy-5fc9fcb64b-dtkx5 1/1 Running 0 24d
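The "all pods Running" check can be scripted rather than eyeballed. A minimal sketch below, using the pod listing above as inlined sample input; in live use you would call `kubectl get pods` inside a retry loop instead of a fixed string:

```shell
# Sample 'kubectl get pods' output, copied from the listing above.
pods='couchbase-cluster-dev-0000 1/1 Running 0 3m52s
couchbase-cluster-dev-0001 1/1 Running 0 3m22s
couchbase-cluster-dev-0002 1/1 Running 0 2m56s
couchbase-cluster-dev-0003 1/1 Running 0 2m30s
couchbase-cluster-dev-0004 1/1 Running 0 2m11s
couchbase-cluster-dev-0005 1/1 Running 0 107s'
# Count cluster pods in Running state; we expect 6 for this MDS spec.
running=$(echo "$pods" | grep -c '^couchbase-cluster.*Running')
echo "$running"
```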

Now let's get the NodePort details to access the database (both for Admin and for client API connections):

~/couchbase65 [master]
kubectl get svc

To access the Couchbase Admin URL you need the NodePort service name and port: in my example above, 'couchbase-cluster-dev-ui' is the service, and the external NodePort corresponding to 8091, which is 31255, is what we need to access it from outside the K8s environment. This port is assigned automatically by K8s. Now let's find which IP address we should use.
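Extracting the external NodePort can be scripted too. A hedged sketch below; the service line is a hypothetical sample of `kubectl get svc` output (on a live cluster you would pipe the real command in instead):

```shell
# Hypothetical one-line sample of 'kubectl get svc' output for the UI service.
svc='couchbase-cluster-dev-ui   NodePort   10.0.0.12   <none>   8091:31255/TCP   5m'
# Field 5 is the PORT(S) column ("8091:31255/TCP"); take the part between ':' and '/'.
nodeport=$(echo "$svc" | awk '{print $5}' | cut -d: -f2 | cut -d/ -f1)
echo "$nodeport"
```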

Let's describe the pods from the cluster pod list above; you can pick any one of the node IP addresses and use port 31255.

~ [master]
kubectl describe pod | grep Node

Now let's access the Admin Console UI.

Customizing Apache Superset UI – Config, Theme Changes

Let's talk about something different from traditional BI today: Apache Superset, a modern UI framework based on Flask and React JS.

As of this writing I am on Superset version 0.99.0a. Assuming I have my virtualenv and node modules running, and the Superset installer synced from Docker Hub or cloned from GitHub, I will go straight to customizing the different pieces of the UI.

Touching on the basics of the installation quickly; my local folder where Superset lives is 'ssdev':

pip install virtualenv
virtualenv venvdxp                                        # create the env called ssdev
. ./venvdxp/bin/activate or source venvdxp/bin/activate   # activate the virtualenv ssdev

Create your own Superset dev env directory:
mkdir ssdev
cd ssdev
sudo git clone
sudo git clone
cd incubator-superset
(venv) dpaul@<HOST> ~/ssdev/incubator-superset $
sudo pip install -r requirements.txt
cd ~/superset_dxp
sudo chmod -R 777 incubator-superset/
cd incubator-superset/
sudo pip install -e .
fabmanager create-admin --app superset   # note: you can't have multiple accounts with the same name and email id in one venv
superset db upgrade
superset init or superset-init
superset load_examples
nohup superset runserver -d -p 8080 &
tail -f nohup.out

The version I am working on below is :

cat $HOME/ssdev/incubator-superset/superset/assets/version_info.json
{"GIT_SHA": "0509d7aefd71ee21b417c315c8f29cb51bb6b48c", "version": "0.999.0dev"}
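Since version_info.json is plain JSON, the version can also be read programmatically. A small sketch with the file content inlined from above (against a real install you would `cat` the file into the pipe instead):

```shell
# Parse the version field out of version_info.json content with the stdlib json module.
version=$(echo '{"GIT_SHA": "0509d7aefd71ee21b417c315c8f29cb51bb6b48c", "version": "0.999.0dev"}' \
  | python3 -c 'import json,sys; print(json.load(sys.stdin)["version"])')
echo "$version"
```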

Create a custom folder for the UI in Superset so the custom changes stay localized. In my case the custom changes are placed under the TC folder:
cd ~/ssdev/incubator-superset/superset/assets/stylesheets/less/
mkdir TC
cp -r cosmo/* TC/
cd ~/ssdev/incubator-superset/superset/assets/stylesheets/less/TC
mv cosmoTheme.js TCTheme.js
Comment the below and add the TC config:
//@import "./cosmo/variables.less";
//@import "./cosmo/bootswatch.less";
// TC Branding Config
@import "./TC/variables.less";
@import "./TC/bootswatch.less";
Comment the below and add the TC config:
@import './less/index.less';
//@import "./less/cosmo/variables.less";
// TC Branding Config change
@import "./less/TC/variables.less";
Changing all UI fonts to the "Lato" font:
Replace the below to change all fonts to Lato. (Note: the Lato font files must be copied/deployed to the Superset Unix/Linux env in ~/.fonts.)
Then run:
$ fc-cache -fv
to load the fonts.
If fc-cache is not available in the docker image, use: apt-get install fontconfig
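After copying the fonts and running fc-cache, you can confirm fontconfig actually sees the family. A hedged sketch (fc-list ships with fontconfig; the snippet falls back to a message when fontconfig or the font is absent):

```shell
# List any fontconfig-registered Lato faces; print a fallback if none found.
fonts=$(fc-list 2>/dev/null | grep -i 'lato' || echo 'Lato not found')
echo "$fonts"
```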
Now change the below in: ~/ssdev/incubator-superset/superset/assets/stylesheets/less/TC/variables.less
//@font-family-sans-serif:  Helvetica, Arial;
@font-family-sans-serif:  Lato;
//@font-family-serif:       Georgia, "Times New Roman", Times, serif;
@font-family-serif:       Lato;
//** Default monospace fonts for `<code>`, `<kbd>`, and `<pre>`.
//@font-family-monospace:   Menlo, Monaco, Consolas, "Courier New", monospace;
@font-family-monospace:   Lato;
Changing Logo:
Copy new logo here: ~/ssdev/incubator-superset/superset/assets/images/TClogo.png
Edit ~/ssdev/incubator-superset/superset/
Change the Logo name there:
APP_ICON = '/static/assets/images/TClogo.png'
Shaping the Top Nav Bar(height) plus bgcolor and font color:
Replace below:
//@navbar-height:                    50px;
@navbar-height:                      30px;
//TC Branding colors
@tcbrand-blue:         #0093e0;
//@navbar-inverse-bg:                         #fff;
@navbar-inverse-bg:                         @tcbrand-blue;
//@navbar-inverse-link-color:                 @gray-dark;
@navbar-inverse-link-color:                 #ffffff;
Navbar onhover underline color:
.navbar-inverse .navbar-nav > .active > a {
//  border-bottom: 3px solid @brand-primary;
  border-bottom: 3px solid #F38A00;
}
.navbar-nav > li > a {
//  padding-top: 18px;
//  padding-top: 8px;
}
Changing the caret color:
.caret {
  border: none;
  color: #ffffff;
}
.caret:hover {
  color: #ffffff;
}
Changing the filterbox visualizations:
const backgroundImage = (
  'linear-gradient(to right, lightgrey, ' +
  //  `lightgrey ${perc}%, rgba(0,0,0,0) ${perc}%`
  `lightgrey ${0}%, rgba(0,0,0,0) ${0}%`
);
              label={t('Start Date')}
              description={t('Select starting date')}
              onChange={this.changeFilter.bind(this, since)}
              label={t('End Date')}
              description={t('Select end date')}
              onChange={this.changeFilter.bind(this, until)}
Background blue onhover color:
.VirtualizedSelectFocusedOption {
//  background-color: rgba(0, 0, 0, 0.1);
  background-color: #ffffff;
  color: #0093e0;
}
.VirtualizedSelectFocusedOption:hover {
  cursor: pointer;
  background-color: #0093e0;
  color: #ffffff;
}
Changing the Brand Primary color from SS default green to TC Blue:
Will change Dashboard links to blue , Fav color to blue and export link color to blue:
//@brand-primary:         #00A699;
@brand-primary:         #0093E0;
Changes the style of the input fields:
//** Text color for `<input>`s
//@input-color:                    @text-color;
@input-color:                    #0093e0;
//@input-bg-disabled:              @gray-lighter;
@input-bg-disabled:              #0093e0;
//@input-border-focus:             #66afe9;
@input-border-focus:             #0093e0;
Button shape change:
// Allows for customizing button radius independently from global border radius
//@btn-border-radius-base:         @border-radius-base;
//@btn-border-radius-large:        @border-radius-large;
//@btn-border-radius-small:        @border-radius-small;
@btn-border-radius-base:         30px;
@btn-border-radius-large:        30px;
@btn-border-radius-small:        30px;
Removing the table border color:
//** Border color for table and cell borders.
//@table-border-color:            #ddd;
@table-border-color:            #fff;
Dropdown menu changes:
//@dropdown-border:                rgba(0,0,0,.15);
@dropdown-border:                #0093e0;
//@dropdown-link-color:            @gray-dark;
@dropdown-link-color:            #495057;
Fix custom logo alignment issue:
(Not available in older SS version) And add margin on below section at :
.navbar-brand {
  float: left;
  padding: @navbar-padding-vertical @navbar-padding-horizontal;
  font-size: @font-size-large;
  line-height: @line-height-computed;
  height: @navbar-height;
  &:focus {
    text-decoration: none;
  }
  > img {
    display: block;
    margin-top: -7px;
  }
}
Else change in here :
Add a new entry after the navbar section as below:
.navbar-brand > img {
  display: block;
  margin-top: -7px !important;
  height: 34px !important;
}
Changing Filter control text:
              className="float-left ok"
                onClick={this.setValueAndClose.bind(this, 'now')}
                onClick={this.setValueAndClose.bind(this, '')}
Removing Right side navbar menu items : (Profile / Languages)
Commented out this section:
<!-- li class="dropdown">
    <a class="dropdown-toggle" data-toggle="dropdown" href="javascript:void(0)">
    <ul class="dropdown-menu" id="language-picker">
      <li class="dropdown">
        {% for lang in languages %}
            {% if lang != locale %}
                <a tabindex="-1" href="{{appbuilder.get_url_for_locale(lang)}}">
                     - {{languages[lang].get('name')}}
            {% endif %}
        {% endfor %}
Add the below for a dropdown after: {% if not current_user.is_anonymous() %}
<li class="dropdown">
  <SPAN STYLE="font-family: Lato">Dashboards</SPAN>
  &nbsp;<b class="caret"></b>
  <ul class="dropdown-menu">
    <li><a href="/superset/dashboard/world_health" title="Pricing Analysis" target="_self">
      <SPAN STYLE="font-family: Lato">Pricing</SPAN></a></li>
    <li><a href="/superset/dashboard/births/" title="Supply Analysis" target="_self">
      <SPAN STYLE="font-family: Lato">Supply</SPAN></a></li>
  </ul>
</li>

Add links as tabs in Menu:


add the below after: {% include 'appbuilder/navbar_right.html' %}
      <li>
        <a href="/superset/dashboard/world_health" title="WorldBank" target="_self">
          <SPAN STYLE="font-family: Lato">Pricing</SPAN></a>
      </li>
      <li>
        <a href="/superset/dashboard/births/" title="Birth" target="_self">
          <SPAN STYLE="font-family: Lato">Supply</SPAN></a>
      </li>

Change ~/ssdev/incubator-superset/superset/templates/appbuilder/navbar.html

          width="146" height="43" src="{{ appbuilder.app_icon }}"
          alt="{{ appbuilder.app_name }}"

To remove the right-side navbar menu items (Version info / Superset's Github / Documentation):

Remove all the text below:
          <a href="/static/assets/version_info.json" title="Version info">
            <i class="fa fa-code-fork"></i> &nbsp;

          <a href="" title="Google" target="_blank">
            <i class="fa fa-github"></i> &nbsp;
          <a href="" title="Documentation" target="_blank">
            <i class="fa fa-book"></i> &nbsp;
Superset's default menu item text can be changed by editing the file content below:
Changing the Dashboard in place Button  look and feel to be brand button:
background-color:    #0093e0;
border-radius:     3rem;
border-color:     #0093e0;
Add TC brand color palettes for all visualizations:
Add TC new color palette
In superset/assets/src/modules/colors.js, add this part of code in line 43:
export const tcColors = [
And once that part is added, add this in line 104:
export const ALL_COLOR_SCHEMES = {
To change Heatmap control to add TC color Visualizations:
Add below in line 129 :
tcbrand_blue: [
Add below highlight:
  linear_color_scheme: {
    type: 'ColorSchemeControl',
    label: t('Linear Color Scheme'),
    choices: [
      ['fire', 'fire'],
      ['white_black', 'white/black'],
      ['black_white', 'black/white'],
      ['tcbrand_blue', 'tcbrand_blue'],
      ['dark_blue', 'light/dark blue'],
      ['pink_grey', 'pink/white/grey'],
      ['greens', 'greens'],
      ['purples', 'purples'],
      ['oranges', 'oranges'],
      ['blue_white_yellow', 'blue/white/yellow'],
      ['red_yellow_blue', 'red/yellowish/blue'],
      ['brown_white_green', 'brown/white/green'],
      ['purple_white_green', 'purple/white/green'],
    ],
    default: 'tcbrand_blue',
  },
And change the slice view parameter default to tcbrand_blue, or we can change the default in control.jsx.
Adding a new Superset menu item:
Add the below at line #2752:
#    category_icon='fa-flask',
#    icon='fa-flask',
#    category_icon='fa-flask',
#    icon='fa-flask',
    href='<place any URL here>',
#    category_icon='fa-flask',
#    icon='fa-flask',
I would only do this if you mind the extra space between the container body and the navbar:
div.navbar {
  z-index: 999;
  margin-bottom: 10px;
}
Adding the Lato font:
Add below :
body {
  margin: 0px !important;
  font-family: 'Lato', sans-serif !important;
}
Removing Select[xxx] in dropdown :
Remove space[%s] :
            placeholder={t('Select [%s]', filter)}
Customize free form text to 'Custom Range' :
            title="Free form"
            isSelected={this.state.type === 'free'}
            onSelect={this.setType.bind(this, 'free')}
Title is not in Lato font :
.editable-title input {
    padding: 2px 6px 3px 6px;
    font-family: Lato;
}
.editable-title input[type="button"] {
    border-color: transparent;
    background: transparent;
    white-space: normal;
    text-align: left;
    font-family: Lato;
}
Dashboard and header component change:
Also, a CSS list of changes for individual dashboards is possible from the Superset dashboard pages themselves.

OBIEE/Weblogic : Read timed out

I can continue write a long thread about this issue that we have faced in regard of this SocketTimeoutException. However this time I will keep in concise.
Here is the issue : two of the prompts in our Production environment suddenly broken which was working fine for past 3-5 months. In between we have made bundle patch upgrade on our env , we did network/firewall level changes so we were not sure why those two prompts started behaving weirdly. Also we found we had similar kind of prompt exist elsewhere which was not facing the vulnerabilities.

By broken, I mean that when users click on the prompt they keep getting "Please wait" in the drop-down menu, and the drop-down values are never displayed. And this only happens when you access the application over the web, outside the corporate network, through external DNS. The prompt works fine on the internal VPN network.
Since we have thousands of logins each day, the Production OBIEE environment crashes after a couple of days because of this issue. By "crashes" we mean OBIEE/EM/Console login hangs; you will not find any process down in opmnctl status -l (all alive), and EM/Console doesn't show anything red (if you are able to log in there) to indicate a crash.

Pretty interesting ….. !!! Huh ?

When we inspected the browser console log, we spotted that after 2-3 minutes the browser throws the error below:

"Internal Server error" Ref:
500 Internal Server Error
ERROR Codes for Reference #3.5e7f1cb8.1547495087.4c779890

The above is an Akamai reference number. However, this issue has nothing to do with the Akamai edge or origin server or its cache; we verified that Akamai had no issue with the cached content.

As soon as we receive the above error, the ADR incident log for OBIEE, bi_serverx.log, and biserver_diagnostic.log throw the following:

This appears in the incident log (adjust the path for your environment):

Incident Id: 7997
Incident Source: SYSTEM
Create Time: Tue Jan 15 12:11:11 EST 2019
Problem Key: DFW-99998 [][$][analytics]
ECID: e8398b29c4083075:10e4302f:1685274d016:-8000-0000000000008333
Application Name: analytics
User Name: <WLS Kernel>
Error Message Id: DFW-99998

Context Values
DFW_SERVER_NAME : bi_server1
DFW_DOMAIN_NAME : bifoundation_domain
DFW_APP_NAME : analytics
Incident detected using watch rule "UncheckedException":
Watch time: Jan 15, 2019 12:11:11 PM EST
Watch ServerName: bi_server1
Watch RuleType: Log
Watch Rule: (SEVERITY = 'Error') AND ((MSGID = 'WL-101020') OR (MSGID = 'WL-101017') OR (MSGID = 'WL-000802') OR (MSGID = 'BEA-101020') OR (MSGID = 'BEA-101017') OR (MSGID = 'BEA-000802'))
Watch DomainName: bifoundation_domain
Watch Data:
DATE : Jan 15, 2019 12:11:11 PM EST
SERVER : bi_server1
MESSAGE : [ServletContext@756918633[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException. read is alrady timed out

[WARNING:7] [WL-320068] [Diagnostics] [host: <hostname>] [nwaddr: [] [tid: [ACTIVE].ExecuteThread: '71' for queue: 'weblogic.kernel.Default (self-tuning)'] [userId: <WLS Kernel>] [LOG_FILE: /u00/app/Middleware/user_projects/domains/bifoundation_domain/servers/bi_server2/logs/bi_server2.log] Watch 'UncheckedException' with severity 'Notice' on server 'bi_server2' has triggered at Jan 23, 2019 9:36:00 AM EST. Notification details: [[
WatchRuleType: Log
WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-101020') OR (MSGID = 'WL-101017') OR (MSGID = 'WL-000802') OR (MSGID = 'BEA-101020') OR (MSGID = 'BEA-101017') OR (MSGID = 'BEA-000802'))
WatchData: DATE = Jan 23, 2019 9:36:00 AM EST SERVER = bi_server2 MESSAGE = [ServletContext@1881523291[app:analytics module:analytics path:/analytics spec-version:2.5 version:11.1.1]] Root cause of ServletException. Read timed out
at Method)
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at Method)
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at oracle.dms.servlet.DMSServletFilter.doFilter(
at weblogic.servlet.internal.FilterChainImpl.doFilter(
at weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(
at weblogic.servlet.internal.WebAppServletContext$
at weblogic.servlet.internal.WebAppServletContext.securedExecute(
at weblogic.servlet.internal.WebAppServletContext.execute(
SUBSYSTEM = HTTP USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '38' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = WL-101017 MACHINE = <hostname> TXID = CONTEXTID = 0000MXv8etiFKAw_wDCCyW1SGacg003qNu TIMESTAMP = 1548254160834
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 30000


As you can see, there are two variations of the log:

1) Read timed out
2) read is alrady timed out (note: the typo is not mine; it is written by the OBIEE product itself!)

Essentially both are the same, and the incident log says this is your problem key: "Problem Key: DFW-99998 [][$][analytics]"
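At the socket level, both messages boil down to a blocking read that never receives a response. A minimal Python sketch (a toy client/server, not OBIEE code) reproduces the same condition: the server accepts the connection but never answers, much like a request silently dropped in transit, and the client's read times out:

```python
import socket
import threading

# Toy reproduction of the "Read timed out" condition (not OBIEE code):
# a server that accepts the connection but never sends a byte.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

def silent_server():
    conn, _ = srv.accept()
    threading.Event().wait(2)   # hold the connection open, send nothing
    conn.close()

threading.Thread(target=silent_server, daemon=True).start()

cli = socket.create_connection(("127.0.0.1", port), timeout=0.5)
try:
    cli.recv(1024)              # blocking read, waiting on a reply
    outcome = "got data"
except socket.timeout:          # the condition WebLogic's servlet layer reports
    outcome = "Read timed out"
finally:
    cli.close()

print(outcome)  # Read timed out
```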

When we started investigating, we saw that the prompt code was nothing special: regular SQL results with a SELECT statement and a UNION statement.
The weirdest part: if we use "union all" (in lowercase) it works, and if we add a FETCH ONLY 650001 ROWS clause at the end of the prompt's logical SELECT query it works absolutely fine. These are interesting facts that only made sense later, but they were our findings.

Here is the solution :

After going through several layers of network tracing (Akamai, IdP Ping SSO, external/internal DNS checks, packet transfers across the firewall, and various logs), we saw that when you click the prompt, the request headers never even reach the presentation server session log: the request is getting stuck somewhere in the network, and the network is not able to process it. Finally we learned that we had a security layer from Secureworks (an Intrusion Prevention System) that intercepts the OBIEE browser POST request headers; when its intrusion-prevention rules see what looks like a SQL injection (via SELECT or UNION), it blocks the inbound traffic, which stops the request from reaching the other network layers and causes the WebLogic server to go into an unknown state. We had to whitelist those SQL patterns across the network to stop this "SocketTimeoutException".
As soon as we did this, those two prompts started working fine, but the "SocketTimeoutException" did not completely go away. However, the volume of this exception dropped by 100x-200x, and we have seen no further sign of the OBIEE platform crashing.
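The false positive is easy to reproduce with a toy signature. Below is a hypothetical Python sketch of an IPS-style rule (not Secureworks' actual signature) that scans POST bodies for a SELECT ... UNION pattern; it flags OBIEE's perfectly legitimate logical SQL, and, consistent with our finding, a lowercase "union all" slips past a case-sensitive rule:

```python
import re

# Toy IPS signature (hypothetical, not the actual Secureworks rule):
# flag any POST body containing SELECT ... UNION, a classic SQL-injection
# pattern. Deliberately case-sensitive, since our lowercase "union all"
# variant slipped through in practice.
SQLI_RULE = re.compile(r"SELECT\b.*\bUNION\b", re.DOTALL)

def ips_blocks(post_body: str) -> bool:
    """Return True if the toy signature would drop this request."""
    return bool(SQLI_RULE.search(post_body))

# Legitimate OBIEE prompt traffic carries raw logical SQL in the POST
# body, so the signature matches and the request is silently dropped.
prompt_request = "viewState=...&sql=SELECT region FROM sales UNION SELECT region FROM returns"
lowercase_variant = prompt_request.replace("UNION", "union all")

print(ips_blocks(prompt_request))     # True  -> traffic blocked, prompt hangs
print(ips_blocks(lowercase_variant))  # False -> prompt works
```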

So in Summary :

Root Cause:

The Intrusion Prevention System misidentified application traffic as malicious.

 Contributing factors:

  • OBIEE uses raw SQL statements for normal application functionality, causing the IPS to misidentify the traffic as malicious.
  • The OBIEE product does not utilize TLS, leaving the traffic inspectable in cleartext.
  • Specific filters in OBIEE generate the socket timeout exception, which creates a cascading effect that leaves the product unresponsive.
  • The end result for users: "The server encountered an internal error or mis-configuration and was unable to complete the request."
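The cascading effect in the third point can be illustrated with a toy Python model (hypothetical pool size, not WebLogic's actual execute-thread configuration): once hung requests occupy every worker thread, even a perfectly healthy request just sits in the queue, so the platform looks crashed although every process is alive:

```python
import concurrent.futures
import time

# Toy model of a bounded execute-thread pool (hypothetical size, not
# WebLogic's real configuration): hung prompt requests occupy every
# worker, so a healthy login request queues indefinitely.
POOL_SIZE = 2

def hung_prompt_request():
    time.sleep(2)               # stuck waiting on a read that will time out
    return "irrelevant"

def healthy_login():
    return "login ok"

pool = concurrent.futures.ThreadPoolExecutor(max_workers=POOL_SIZE)
for _ in range(POOL_SIZE):
    pool.submit(hung_prompt_request)   # saturate every worker thread

login = pool.submit(healthy_login)
try:
    outcome = login.result(timeout=0.5)
except concurrent.futures.TimeoutError:
    # Nothing is "down" -- every thread is alive -- yet nothing responds,
    # which is exactly how the platform "crash" looked from outside.
    outcome = "login hangs"

print(outcome)  # login hangs
```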

Unfortunately, because this was a network issue, being on the application team we had to spend loads of hours on root cause analysis before we figured it out. Facts of life for developers!