MySQL on Premise to Azure Cloud Migration


The objective is to analyze an offline migration of a MySQL database instance from on-premise to the cloud without the help of Azure ExpressRoute, a dedicated VPN Gateway, or Private Endpoints. Along the way we will understand the offline migration process, how much data it can handle, how to troubleshoot the issues that come up, and the pros and cons of one data export/import process versus another, so we can make the right choices in the cloud (Managed vs. VM). This drives the cost assessment and surfaces the limitations of certain choices. For big bulk migrations the right choice is definitely a DMS solution, which offers minimal downtime and leverages dedicated connections, but those steps are out of scope for this thread.

Planning:

  1. MySQL DB Instance in Cloud
  2. Identify Sample Data to be migrated from On Premise
  3. Configure Cloud / Network / Database
  4. Export Data dump / Import Data

Goal:

This PoC will try to achieve the following:

  • Create MySQL Database instance in Cloud / Pick right choice  
    • Azure VM using Marketplace MySQL Image
    • Azure Single Server MySQL Managed Instance
  • Pick the right option for MySQL Migration
    • Using mysqldump or phpMyAdmin (Offline Backup/Restore)
    • Using MySQL Workbench (Offline)
    • Using DMS (Continuous Sync with Minimal Downtime)
    • Using the mydumper / myloader tools
      • mydumper to dump the database
      • myloader to load into the target MySQL instance
  • Understand the network config, firewall, storage, and other dependent setup required for offline migration
    • Network inbound/outbound rules for the Azure VM
    • Network inbound/outbound rules for MySQL DB access
  • Understand data volume and latency for generating the data dump and transferring it over to the cloud landing storage
  • How to mount and use the cloud landing shared location and get ready for data import
  • Import options for loading the data dump
  • Produce benchmark options / analysis to improve performance
  • Cost assessment and comparative cost analysis with on-premise MySQL instances

Platform : 

  • On Premise MySQL Standby Slave Instance (Production  ‘mvh’ database) and ‘mvh_test’ Test Database
  • Cloud Azure
  • Azure VM
  • Any other on-prem VMs for SCP file transfer to the cloud

Note: I have been following the Microsoft document “https://docs.microsoft.com/en-us/azure/mysql/concepts-migrate-import-export” and I failed to export and import through MySQL Workbench in many ways, with a lot of challenges that I have reserved for a separate thread/blog.

1.0 Setup Cloud VM for MySQL

Creating Virtual machine with MySQL 5.7 Ubuntu Image

Have the inbound port enabled for access (opening port 22 to Any IP is not recommended for production workloads; restrict it to targeted inbound IPs only).

Attach a separate disk, other than the OS disk, for the MySQL data workload (more steps, outlined in section 1.7, are required for the OS to recognize the disk).

Install mydumper 

Refer to this write-up about the mydumper tool and this GitHub thread.

ssh dxpadmin@<publicIP>  and then give the password (configured during VM setup)

Find the OS version and image: cat /etc/os-release

If the prebuilt image that has been picked is Ubuntu (MySQL 5.7), SSH to this box from either local or any TC box as above, and install the packages below:

$wget https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

$sudo dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb 

If you hit the error below, follow these commands in sequence:

root@tcmysql:~# dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

dpkg: error: dpkg status database is locked by another process

root@tcmysql:~#

root@tcmysql:~# lsof /var/lib/dpkg/lock

COMMAND   PID USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME

dpkg    14397 root    3uW  REG    8,1        0 56697 /var/lib/dpkg/lock

root@tcmysql:~# ps cax | grep 14397

14397 pts/0    Ss+    0:07 dpkg

root@tcmysql:~# kill 14397

root@tcmysql:~# ps cax | grep 14397

root@tcmysql:~# sudo rm /var/lib/dpkg/lock

root@tcmysql:~# sudo dpkg --configure -a

If the prebuilt image is RedHat/CentOS, use the rpm matching your OS major version (el7 or el8):

yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el7.x86_64.rpm
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el8.x86_64.rpm
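Either way, a quick sanity check confirms both tools landed on the PATH (exact version output may differ by release):

$ mydumper --version
$ myloader --version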

1.1 Extract the Sample dump from MySQL Database

Extracting Rate Fact table (Current and Historical) data from MySQL Production Slave instance

Install the mydumper/myloader utilities on any TC host that can be used as a landing zone for the data:

RedHat / CentOS

yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el7.x86_64.rpm
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el8.x86_64.rpm

Ubuntu / Debian

wget https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb

Log in to any host that can reach the MySQL database and execute the command below. See usage here: https://manpages.debian.org/testing/mydumper/mydumper.1.en.html

sudo mydumper -B mvh -T hotel_rate_wes2 --host=bidbhotl01.gain.tcprod.local --user=dbuser --password=xxxx --rows=1000000 --outputdir=./hotel_rate_wes2/ --compress --no-locks --build-empty-files --threads=256 --compress-protocol --verbose=3 -L mydumper-logs_hotel_rate_wes2.txt &

This will create the incremental dump files for the current rate (hotel_rate_wes2) table from the mvh schema.

Execute a similar command to dump files for the historical changed rate (hotel_rate_ih_wes2) table from the mvh schema, as sketched below.
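For reference, a sketch of that second command (same options as above; only the table name, output directory, and log file change, so adjust as needed):

sudo mydumper -B mvh -T hotel_rate_ih_wes2 --host=bidbhotl01.gain.tcprod.local --user=dbuser --password=xxxx --rows=1000000 --outputdir=./hotel_rate_ih_wes2/ --compress --no-locks --build-empty-files --threads=256 --compress-protocol --verbose=3 -L mydumper-logs_hotel_rate_ih_wes2.txt &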

A benchmark from the mydumper tool is below:

Table Name (Production) | Total Records | Dump Time | Dump Size | Threads Used | Files Generated
hotel_rate_wes2         | 23,360,817    | 3 mins    | 19 GB     | 256          | 25
hotel_rate_ih_wes2      | 126,273,328   | 12 mins   | 320 GB    | 256          | 130

1.2 Create Azure Storage account Fileshare

This uses one of the preconfigured storage accounts.

Inside the “dxpfileshare” file share, create a “data” directory.

Either create the subdirectory “mvh_hotel_rate_wes2” using the Azure portal, or connect to Azure Cloud Shell and create the directory to hold the data dump files.
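A minimal Cloud Shell sketch of the directory creation (assuming the storage account is named dxppocstorageaccount, as in the mount credentials later; authentication flags such as --account-key are omitted here):

az storage directory create --account-name dxppocstorageaccount --share-name dxpfileshare --name data
az storage directory create --account-name dxppocstorageaccount --share-name dxpfileshare --name "data/mvh_hotel_rate_wes2"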

1.3 Mount Storage Account Fileshare inside Azure VM

This step mounts the cloud storage file share inside the Azure VM so the dump files generated by the mydumper tool can be remote-copied over to it.

dpaul@<onpremvmhost> ~ $ ssh dxpadmin@<vmhostip>
dxpadmin@tcmysql:~$ sudo su -

In the Storage Account blade in the Azure portal, copy the provided script to mount the file share inside your virtual machine.

Execute the commands below inside the Azure VM (SSH to the VM box):

sudo mkdir /mnt/dxpfileshare
if [ ! -d "/etc/smbcredentials" ]; then
sudo mkdir /etc/smbcredentials
fi
if [ ! -f "/etc/smbcredentials/dxppocstorageaccount.cred" ]; then
    sudo bash -c 'echo "username=dxppocstorageaccount" >> /etc/smbcredentials/dxppocstorageaccount.cred'
    sudo bash -c 'echo "password=<replaceme>" >> /etc/smbcredentials/dxppocstorageaccount.cred'
fi
sudo chmod 600 /etc/smbcredentials/dxppocstorageaccount.cred

sudo bash -c 'echo "//dxp<fake>storageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare cifs nofail,vers=3.0,credentials=/etc/smbcredentials/dxp<fake>storageaccount.cred,dir_mode=0777,file_mode=0777,serverino" >> /etc/fstab'
sudo mount -t cifs //dxp<fake>storageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare -o vers=3.0,credentials=/etc/smbcredentials/dxp<fake>storageaccount.cred,dir_mode=0777,file_mode=0777,serverino
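If the mount succeeded, the share should now show up under the mount point used above:

df -h | grep dxpfileshare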

1.4 Transfer the file from OnPrem TC Host to Cloud Storage Fileshare

From the local TC host, transfer those files over to the Storage Account:

dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/1;date' > remote_transfer.log 2>&1 &

dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_ih_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/;date' > remote_transfer_hotel_rate_ih_wes2.log 2>&1 &

Note: for the above commands to run without password prompts, generate an SSH key on the TC host and add it to the authorized keys inside the Azure VM host, as sketched below.
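A minimal sketch of that key setup (assuming default key paths and the user/host names used above):

# On the TC host: generate a key pair if one does not already exist
ssh-keygen -t rsa -b 4096
# Append the public key to the Azure VM user's authorized_keys
ssh-copy-id dxpadmin@<azure VM Host>
# Verify password-less login works
ssh dxpadmin@<azure VM Host> 'echo ok'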

1.5 Network and DB access

Network Inbound rule change

To enable access to MySQL on the VM host, inbound port 3306 needs to be allowed from the Networking option on the VM.

Note: MySQL DB port 3306 is thereby exposed to the Internet. It is not recommended to expose database ports to the Internet; expose them only to your front-end tier inside your virtual network.

This practice is applicable for testing purposes only. For a real DB migration a dedicated private endpoint is advised. A narrower Azure CLI variant of the inbound rule is sketched below.
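A hedged sketch of that inbound rule, restricted to a single source IP (the resource group, NSG name, and priority are placeholders to adjust):

az network nsg rule create \
  --resource-group <myResourceGroup> \
  --nsg-name <myVmNsg> \
  --name Allow-MySQL-3306 \
  --priority 1010 \
  --access Allow --protocol Tcp --direction Inbound \
  --source-address-prefixes <your client IP>/32 \
  --destination-port-ranges 3306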

DB Change

Grants need to be added for the 'root' and 'dxpadmin' users (the latter created during VM creation) so they can connect to the MySQL service on port 3306. Note this is also required to run myloader to load data.

To do this, log in to MySQL as the root user:

[root@tcmysql data]# mysql -u root -p

mysql> CREATE USER 'dxpadmin'@'%' IDENTIFIED BY '<password>';

mysql> GRANT ALL ON *.* to dxpadmin@'20.36.130.33' IDENTIFIED BY '<password>';

mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<password>' WITH GRANT OPTION;

mysql> FLUSH PRIVILEGES;

mysql> exit
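To confirm the grants took effect, a quick check from the VM shell (user names as created above):

mysql -u root -p -e "SHOW GRANTS FOR 'dxpadmin'@'%';"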

1.6 MySQL DB Config changes

Increase the number of DB connections

This is required to avoid the error “Failed to connect to MySQL server: Too many connections” when running myloader with more threads.

Edit the /etc/my.cnf file and add the parameter below under the [mysqld] section:

max_connections = 512
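Since max_connections is a dynamic variable, it can also be raised without a restart (a sketch; the my.cnf entry is still needed so the value survives restarts):

mysql -u root -p -e "SET GLOBAL max_connections = 512; SHOW VARIABLES LIKE 'max_connections';"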

Disable Strict Mode

This is required to avoid the error “Invalid default value for <date column>”. The default sql_mode in MySQL 5.7 includes NO_ZERO_DATE and NO_ZERO_IN_DATE, which reject the zero-date defaults present in the dumped schemas. For example, myloader fails while loading data as follows:

** (myloader:13067): CRITICAL **: 16:29:20.016: Error restoring mvh_test.currency from file mvh_test.currency-schema.sql.gz: Invalid default value for 'last_update_dt'

** (myloader:2728): CRITICAL **: 00:36:22.806: Error restoring mvh_test.hotel_rate from file mvh_test.hotel_rate-schema.sql.gz: Invalid default value for 'arv_dt'

** (myloader:2728): CRITICAL **: 00:36:20.940: Error restoring mvh_test.hotel_prop_data_source_note from file mvh_test.hotel_prop_data_source_note-schema.sql.gz: Invalid default value for 'create_dttm'

Edit the /etc/my.cnf file and add the parameter below under the [mysqld] section:

        sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
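sql_mode is dynamic as well, so the relaxed mode can be applied immediately while the my.cnf edit covers restarts (already-open sessions keep their old mode):

mysql -u root -p -e "SET GLOBAL sql_mode='NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES';"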

Change default DB packets/ throughput and other parameters

These are performance tuning parameters.

Edit the /etc/my.cnf file and add the parameters below under the [mysqld] section (note max_allowed_packet takes the single-letter M suffix; mysqld rejects “900MB” with an unknown-suffix error):

innodb_buffer_pool_size = 40G
max_allowed_packet = 900M
innodb_read_io_threads = 16
innodb_write_io_threads = 32
innodb_purge_threads = 2
innodb_buffer_pool_instances = 16
transaction-isolation=READ-COMMITTED
innodb_log_buffer_size = 512M
innodb_log_file_size = 1G

Restart the MySQL database for the above changes to take effect.

For CentOS/Fedora use:

Startup:
[dxpadmin@tcmysql~]$ sudo systemctl start mysqld
Shutdown:
[dxpadmin@tcmysql ~]$ sudo systemctl stop mysqld
Status:
[dxpadmin@tcmysql etc]$ systemctl status mysqld.service

1.7 Attach and Prepare the Data Disk for the OS and VM

STEP 1 – Attach disk to existing VM

This can be done two ways: in the Disks section of the Azure VM in the Azure portal by adding a new LUN (which we already did earlier), or by executing the Azure CLI code below.

az vm disk attach \
--resource-group myResourceGroupDisk \
--vm-name myVM \
--name myDataDisk \
--size-gb 128 \
--sku Premium_LRS \
--new

STEP 2 – Prepare the Data Disk

As you can see, although we attached a 2 TB data disk in STEP 1, it is still not showing up here:

[root@tcmysql data]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 9.0M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda2 30G 7.0G 23G 24% /
/dev/sda1 494M 65M 430M 13% /boot
/dev/sda15 495M 12M 484M 3% /boot/efi
//dxp<fake>storageaccount.file.core.windows.net/dxpfileshare 2.0T 349G 1.7T 18% /mnt/dxpfileshare
/dev/sdb1 126G 2.1G 118G 2% /mnt/resource
tmpfs 6.3G 0 6.3G 0% /run/user/1000
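The raw device is still visible to the kernel, though; lsblk lists it without a mount point (the device letter, /dev/sdc here, may vary):

lsblk -o NAME,SIZE,TYPE,MOUNTPOINT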

So let's execute the steps below to make it available:

  • Create an SSH connection with the virtual machine. Replace the example IP address with the public IP of the virtual machine.
ssh dxpadmin@<Public IP Address of Azure VM>

  • Partition the disk with parted.

sudo parted /dev/sdc --script mklabel gpt mkpart xfspart xfs 0% 100%

  • Write a file system to the partition by using the mkfs command. Use partprobe to make the OS aware of the change.
sudo mkfs.xfs /dev/sdc1
sudo partprobe /dev/sdc1
  • Mount the new disk so that it is accessible in the operating system.
   sudo mkdir /datadrive && sudo mount /dev/sdc1 /datadrive
  • The disk can now be accessed through the /datadrive mountpoint, which can be verified by running the df -h command.
df -h | grep -i "sd"

The output shows the new drive mounted on /datadrive.

Filesystem Size Used Avail Use% Mounted on

/dev/sda1        30G   7.0G   23G   24% /
/dev/sda15       495M  12M  484M   4% /boot/efi
/dev/sdb1        126G   2G   118G   1% /mnt/resource
/dev/sdc1        2.0T  1.0G  2.0T    1% /datadrive
  • To ensure that the drive is remounted after a reboot, it must be added to the /etc/fstab file. To do so, get the UUID of the disk with the blkid utility.
sudo -i blkid
  • The output displays the UUID of the drive, /dev/sdc1 in this case.

/dev/sdc1: UUID="33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e" TYPE="xfs"

  • Open the /etc/fstab file in a text editor as follows:
sudo nano /etc/fstab
  • Add a line similar to the following to the /etc/fstab file, replacing the UUID value with your own.

UUID=33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e /datadrive xfs defaults,nofail 1 2

  • When you are done editing the file, use Ctrl+O to write the file and Ctrl+X to exit the editor.
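Before rebooting, it's worth validating the new /etc/fstab entry, since a broken entry without the nofail option can leave the VM unbootable (a quick check):

sudo umount /datadrive
sudo mount -a
df -h | grep datadrive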

Now that the disk has been configured, close the SSH session.

Change VM Disk Settings for performance

While the VM is stopped, make changes to the data disk's Advanced settings in the Azure portal.

1.8 Move MySQL data directory to Azure Data Disk

STEP 1 – Check current MySQL data directory and Stop services

$ mysql -u root -p
mysql > select @@datadir;

Output
+-----------------+
| @@datadir       |
+-----------------+
| /var/lib/mysql/ |
+-----------------+
1 row in set (0.00 sec)

mysql > exit
$ sudo systemctl stop mysqld
$ sudo systemctl status mysqld

STEP 2 – Use rsync to copy the data directory to the Azure data disk and back up the original

$ sudo rsync -av /var/lib/mysql /datadrive
$ sudo mv /var/lib/mysql /var/lib/mysql.bak

(No trailing slash on the rsync source, so the mysql directory itself is created as /datadrive/mysql.)

STEP 3 – Override MySQL Config file

$ sudo vi /etc/my.cnf

[mysqld]
. . .
datadir=/datadrive/mysql
socket=/datadrive/mysql/mysql.sock
. . .

[client]
port=3306
socket=/datadrive/mysql/mysql.sock

STEP 4 – Restart MySQL instance and check datadir changes

$ sudo systemctl start mysqld
$ sudo systemctl status mysqld



$ mysql -u root -p

mysql> select @@datadir;


Output

+-------------------+
| @@datadir         |
+-------------------+
| /datadrive/mysql/ |
+-------------------+
1 row in set (0.01 sec)

If you don’t do the steps below, MySQL will not start.

STEP 5:  On CentOS or RedHat, modify SELinux settings. On Ubuntu or Debian, modify the AppArmor settings

When you copy the data directory (for example with rsync), the SELinux context is lost (note: if you move the directory instead, the context may stay intact).

If SELinux is running in enforcing mode, you need to persuade it to allow mysqld to use the non-default datadir. Alternatively, put SELinux into permissive mode, or disable it entirely.

Run sestatus to see which mode it’s currently running in.

[root@tcmysql data]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31

Run grep mysql /var/log/audit/audit.log to see if SELinux has blocked any mysql file access attempts or similar.

(But note that this is fine if running in permissive mode, because then it doesn’t actually block anything, it just logs what it would have blocked if you were running in enforcing mode.)

Run setenforce permissive to put SELinux into permissive mode, though note you need to edit /etc/sysconfig/selinux (or /etc/selinux/config) to make the change permanent. Alternatively, relabel the new data directory so enforcing mode can stay on, as sketched below.
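A minimal sketch of that relabeling, assuming the new data directory is /datadrive/mysql as configured in section 1.8 (semanage comes from the policycoreutils-python / policycoreutils-python-utils package):

# Register the MySQL file context for the new location
sudo semanage fcontext -a -t mysqld_db_t "/datadrive/mysql(/.*)?"
# Apply the new context recursively
sudo restorecon -Rv /datadrive/mysql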

SELinux Context Type for MySQL

This is just a reference of the SELinux context type before and after the mysql directory copy (screenshots omitted).

If we don't do the steps to change the file context type, mysqld will throw the error below and stop:

# cat /var/log/mysqld.log:
mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
[Warning] Can't create test file /var/lib/mysql/devdb..lower-test
/usr/libexec/mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13)
[ERROR] Aborting
[Note] /usr/libexec/mysqld: Shutdown complete

STEP 6: Bounce the whole VM 

1.9 Load/Import Dump file using myloader

Prerequisites for myloader

Disable foreign key and uniqueness checks to expedite the loading/import process:

SET FOREIGN_KEY_CHECKS=0;
SET GLOBAL FOREIGN_KEY_CHECKS=0;
SET unique_checks = 0;
SET GLOBAL innodb_stats_on_metadata = 0;

After the import, set those variables back:

SET FOREIGN_KEY_CHECKS=1;
SET GLOBAL FOREIGN_KEY_CHECKS=1;
SET unique_checks = 1;
SET GLOBAL innodb_stats_on_metadata = 1;

Execute myloader

With the directory structures in the state below, execute myloader, which will load mvh_test (all tables) and hotel_rate_wes2 (just one massive fact table with hotel rates):

$ myloader -s mvh_test -h <public IP address VM> -u dxpadmin -p <password> -d mvh_test -t 500 -C -v 3 -e 2> myloader_mvh_test.log &

$ myloader --host=<public IP address VM> --user=dxpadmin --password=<password> --database=mvh_test --directory=./hotel_rate_wes2/ --threads=500 --overwrite-tables --compress-protocol --verbose=3 -e 2> myload_hotel_rate_wes2.log &

Monitor Progress of the Data Loading

# Following gives approximate number of data files already restored
grep restoring <myloader-output-logs-path> | grep Thread | grep -v schema | wc -l
# Following gives total number of data files to be restored
ls -l <mydumper-output-dir> |  grep -v schema | wc -l
# Following gives information about errors
grep -i error <myloader-output-logs-path>

As the data import progresses, we can see the Used volume of the /datadrive Azure data disk mount keep increasing. So we are successfully able to use the Azure data disk to store MySQL database files.
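A simple way to watch that growth during the load (refreshes every 60 seconds):

watch -n 60 'df -h /datadrive'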

2.0 Connect MySQL Cloud DB from Workbench & Query

The host address should be the VM public IP address. Note the inbound rule should allow your local machine IP to connect to this VM (this was outlined in previous steps).

Also, to connect to MySQL on port 3306, your local IP address needs to be granted permission by the database itself (the process for giving GRANT privileges to open access for all or certain IPs is outlined above).

Execute a sample query as below:

use mvh_test;

select
hr.rate_id as rid,
hr.shop_request_id,
hr.org_id,
hr.prop_id,
hr.shop_data_source,
hr.shop_currency_cd,
hr.shop_pos_cd,
hr.los,
hr.guests,
hr.arv_dt,
hr.change_dttm,
hr.shop_status,
hr.shop_dttm,
hr.shop_msg,
hr.err_dttm,
hr.err_msg,
hr.chain_cd,
hr.currency_cd,
hr.pos_cd,
if(uncompressed_length(hr.data) < 16000000,uncompress(hr.data),hr.data) as data,
hr.modify_dttm
from hotel_rate_wes2 hr
where hr.shop_data_source = 'WEB'
and hr.guests = 1
and hr.shop_pos_cd = 'US'
and hr.arv_dt <= '2022-02-08'
and hr.arv_dt >= '2019-03-15'
and hr.prop_id in (253,290,291,362)
and hr.shop_currency_cd in ('SGD','USD')
and ((shop_dttm >= date_sub(now(), interval 7 day)) or (err_dttm >= date_sub(now(), interval 7 day)))
and hr.los = 1;

Results (result grid screenshot omitted)