The objective is to analyze an offline migration of a MySQL database instance from on-premise to the cloud without the help of Azure ExpressRoute, a dedicated VPN gateway, or private endpoints. Along the way we will look at the offline migration process, how much data it can handle, how to troubleshoot different issues, and the pros and cons of one data export/import process versus another, and pick the right choices in the cloud (managed vs. VM). This drives a cost assessment and highlights the limitations of certain choices. For big bulk migrations the right choice is definitely a DMS solution, with minimal downtime over dedicated connections, but those steps are out of scope for this thread.
Planning:
- MySQL DB Instance in Cloud
- Identify Sample Data to be migrated from On Premise
- Configure Cloud / Network / Database
- Export Data dump / Import Data
Goal:
This PoC will try to achieve the following:
- Create MySQL Database instance in Cloud / Pick right choice
- Azure VM using Marketplace MySQL Image
- Azure Single Server MySQL Managed Instance
- Pick the right option for MySQL Migration
- Using mysqldump or PHPMyAdmin (Offline Backup/ Restore)
- Using MySQLWorkBench ( Offline)
- Using DMS (Continuous Sync with Minimal Downtime)
- Using mydumper / myloader tool
- mydumper to Dump the Database
- myloader to load to target MySQL instance
- Understand Network Config/ Firewall/Storage and other dependent setup required for Offline migration
- Network Inbound / Outbound Rule for Azure VM
- Network Inbound / Outbound Rule for MySQL DB Access
- Understand Data Volume , Latency for generating data dump and Transfer over to Cloud Landing Storage
- How to mount and use Cloud Landing shared location and get ready for Data Import
- Import options for loading Data dump
- Produce Benchmark options / Analysis to improve performance
- Cost Assessment and Comparative cost analysis with On Premise MySQL instances
Platform :
- On Premise MySQL Standby Slave Instance (Production ‘mvh’ database) and ‘mvh_test’ Test Database
- Cloud Azure
- Azure VM
- Any other on-prem VMs used to SCP files to the cloud
Note: I have been following the Microsoft document https://docs.microsoft.com/en-us/azure/mysql/concepts-migrate-import-export, and I failed to export and import through MySQL Workbench in many ways, with a lot of challenges that I have reserved for a separate thread/blog.
1.0 Setup Cloud VM for MySQL
Creating Virtual machine with MySQL 5.7 Ubuntu Image
Have the inbound port enabled for access (opening port 22 to any IP is not recommended for production workloads – restrict inbound access to target IPs only)
Attach a separate disk, other than the OS disk, for the MySQL data workload (more steps, outlined below, are required for the OS to recognize the disk)
Install mydumper
Refer to the mydumper tool documentation and its GitHub repository for background.
ssh dxpadmin@<publicIP> and then give the password (configured during VM setup)
Find the OS version and Image : cat /etc/os-release
If the prebuilt image picked is Ubuntu with MySQL 5.7, SSH to this box from either a local machine or any TC box and install the packages below:
$wget https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
$sudo dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
If you hit the error below, run the following commands in sequence to clear the stale dpkg lock:
root@tcmysql:~# dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
dpkg: error: dpkg status database is locked by another process
root@tcmysql:~#
root@tcmysql:~# lsof /var/lib/dpkg/lock
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dpkg 14397 root 3uW REG 8,1 0 56697 /var/lib/dpkg/lock
root@tcmysql:~# ps cax | grep 14397
14397 pts/0 Ss+ 0:07 dpkg
root@tcmysql:~# kill 14397
root@tcmysql:~# ps cax | grep 14397
root@tcmysql:~# sudo rm /var/lib/dpkg/lock
root@tcmysql:~# sudo dpkg --configure -a
If the prebuilt image is RedHat/CentOS, use the matching RPM for your release (el7 vs. el8):
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el7.x86_64.rpm
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el8.x86_64.rpm
1.1 Extract the Sample dump from MySQL Database
Extracting Rate Fact table (Current and Historical) data from MySQL Production Slave instance
Install the mydumper/myloader utilities on any TC host that can be used as a landing zone for the data:
RedHat / Centos
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el7.x86_64.rpm
yum install https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper-0.10.1-2.el8.x86_64.rpm
Ubuntu / Debian
wget https://github.com/maxbube/mydumper/releases/download/v0.10.1/mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
dpkg -i mydumper_0.10.1-2.$(lsb_release -cs)_amd64.deb
Log in to any host that can reach the MySQL database and execute the command below (see usage here: https://manpages.debian.org/testing/mydumper/mydumper.1.en.html):
sudo mydumper -B mvh -T hotel_rate_wes2 --host=bidbhotl01.gain.tcprod.local --user=dbuser --password=xxxx --rows=1000000 --outputdir=./hotel_rate_wes2/ --compress --no-locks --build-empty-files --threads=256 --compress-protocol --verbose=3 -L mydumper-logs_hotel_rate_wes2.txt &
This will create the incremental dump files for current rate (hotel_rate_wes2) table from mvh schema.
Execute similar command to dump files for Historical Changed rate (hotel_rate_ih_wes2) table from mvh schema.
A benchmark from the mydumper tool is below:
| Table Name (Production) | Total Records | Dump Time | Dump Size | Number of Threads Used | Number of Files Generated |
|---|---|---|---|---|---|
| hotel_rate_wes2 | 23,360,817 | 3 mins | 19 GB | 256 | 25 |
| hotel_rate_ih_wes2 | 126,273,328 | 12 mins | 320 GB | 256 | 130 |
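From the table above we can derive the effective dump throughput, which is useful for estimating how long a larger dump would take. A quick back-of-envelope calculation:

```shell
# Effective mydumper throughput from the benchmark above (GB per minute)
awk 'BEGIN {
  printf "hotel_rate_wes2:    %.1f GB/min\n", 19 / 3;    # 19 GB in 3 mins
  printf "hotel_rate_ih_wes2: %.1f GB/min\n", 320 / 12;  # 320 GB in 12 mins
}'
```

Roughly 6 vs. 27 GB/min with the same 256 threads; the larger table likely benefits more from parallelism because more files are dumped concurrently.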
1.2 Create Azure Storage account Fileshare
This is one of the preconfigured storage accounts.
Inside the “dxpfileshare” file share, create a “data” directory.
Either create the subdirectory “mvh_hotel_rate_wes2” using the Azure Portal, or connect to Azure Cloud Shell and create the directory to hold the data dump files.
1.3 Mount Storage Account Fileshare inside Azure VM
This step mounts the Azure file share inside the VM so that the dump files generated by the mydumper tool can be remote-copied over to cloud storage.
dpaul@<onpremvmhost> ~ $ ssh dxpadmin@<vmhostip>
dxpadmin@tcmysql:~$ sudo su -
In the Storage Account in Azure, copy the script provided to mount the file share inside your virtual machine.
Execute below commands inside Azure VM (SSH to the VM box)
sudo mkdir /mnt/dxpfileshare
if [ ! -d "/etc/smbcredentials" ]; then
sudo mkdir /etc/smbcredentials
fi
if [ ! -f "/etc/smbcredentials/dxppocstorageaccount.cred" ]; then
sudo bash -c 'echo "username=dxppocstorageaccount" >> /etc/smbcredentials/dxppocstorageaccount.cred'
sudo bash -c 'echo "password=<replaceme>" >> /etc/smbcredentials/dxppocstorageaccount.cred'
fi
sudo chmod 600 /etc/smbcredentials/dxppocstorageaccount.cred
sudo bash -c 'echo "//dxp<fake>storageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare cifs nofail,vers=3.0,credentials=/etc/smbcredentials/dxppocstorageaccount.cred,dir_mode=0777,file_mode=0777,serverino" >> /etc/fstab'
sudo mount -t cifs //dxp<fake>storageaccount.file.core.windows.net/dxpfileshare /mnt/dxpfileshare -o vers=3.0,credentials=/etc/smbcredentials/dxppocstorageaccount.cred,dir_mode=0777,file_mode=0777,serverino
1.4 Transfer the file from OnPrem TC Host to Cloud Storage Fileshare
From the local TC host, transfer those files over to the storage account (via the share mounted on the Azure VM):
dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/1;date' > remote_transfer.log 2>&1 &
dpaul@<host> /u00/dump $ nohup sh -c 'date;scp -r hotel_rate_ih_wes2/ dxpadmin@<azure VM Host>:/mnt/dxpfileshare/data/;date' > remote_transfer_hotel_rate_ih_wes2.log 2>&1 &
Note: for the above commands to run without password prompts, generate an SSH key on the TC host and add it to the authorized keys on the Azure VM host.
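As a sketch, the one-time key setup on the TC host could look like this (the key path and filename are examples, not from the original setup):

```shell
# Generate a dedicated key pair on the on-prem TC host (no passphrase,
# example filename).
mkdir -p "$HOME/.ssh"
ssh-keygen -t rsa -b 4096 -N "" -f "$HOME/.ssh/id_rsa_azure"
# Then install the public key on the Azure VM (replace the host placeholder):
#   ssh-copy-id -i "$HOME/.ssh/id_rsa_azure.pub" dxpadmin@AZURE_VM_HOST
# After this, the scp commands above run without password prompts.
```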
1.5 Network and DB access
Network Inbound rule change
To enable the VM host to reach the MySQL database, inbound port 3306 needs to be allowed in the Network settings of the VM.
Note: MySQL DB port 3306 is exposed to the internet here. Exposing database ports to the internet is not recommended; expose them only to your front-end tier inside your virtual network.
This practice is applicable for testing purposes only. For a DB migration, a dedicated private endpoint is advised.
DB Change
Grants need to be added for the Azure OS users ‘root’ and ‘dxpadmin’ (the user created during VM creation) so they can connect to the MySQL service on port 3306. Note this is also required to run myloader to load data.
To do this, log in to MySQL as the root user:
[root@tcmysql data]# mysql -u root -p
mysql> CREATE USER 'dxpadmin'@'%' IDENTIFIED BY '<password>';
mysql> GRANT ALL ON *.* to dxpadmin@'20.36.130.33' IDENTIFIED BY '<password>';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '<password>' WITH GRANT OPTION;
mysql> FLUSH PRIVILEGES;
mysql> exit
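To verify the grants took effect before moving on, a quick check from the same root session (a sketch; the user names are the ones created above):

```sql
-- Confirm the users exist and inspect their privileges
SELECT user, host FROM mysql.user WHERE user IN ('root', 'dxpadmin');
SHOW GRANTS FOR 'dxpadmin'@'%';
```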
1.6 MySQL DB Config changes
Increase number of DB Connection’s
This is required to avoid the error “Failed to connect to MySQL server: Too many connections” when running myloader with many threads.
edit /etc/my.cnf file to add below params under [ mysqld ] option :
max_connections = 512
Disable Strict Mode
This is required to avoid the error “Invalid default value for <Date> columns”. For example, while loading data:
** (myloader:13067): CRITICAL **: 16:29:20.016: Error restoring mvh_test.currency from file mvh_test.currency-schema.sql.gz: Invalid default value for ‘last_update_dt’
** (myloader:2728): CRITICAL **: 00:36:22.806: Error restoring mvh_test.hotel_rate from file mvh_test.hotel_rate-schema.sql.gz: Invalid default value for ‘arv_dt’
** (myloader:2728): CRITICAL **: 00:36:20.940: Error restoring mvh_test.hotel_prop_data_source_note from file mvh_test.hotel_prop_data_source_note-schema. sql.gz: Invalid default value for ‘create_dttm’
edit /etc/my.cnf file to add below params under [ mysqld ] option :
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
Change default DB packets/ throughput and other parameters
These are performance tuning parameters:
edit /etc/my.cnf file to add below params under [ mysqld ] option :
innodb_buffer_pool_size = 40G
max_allowed_packet = 900M
innodb_read_io_threads = 16
innodb_write_io_threads = 32
innodb_purge_threads = 2
innodb_buffer_pool_instances = 16
transaction-isolation=READ-COMMITTED
innodb_log_buffer_size = 512M
innodb_log_file_size = 1G
Restart the MySQL database for the above changes to take effect.
For CentOS/Fedora use:
Startup:
[dxpadmin@tcmysql~]$ sudo systemctl start mysqld
Shutdown:
[dxpadmin@tcmysql ~]$ sudo systemctl stop mysqld
Status:
[dxpadmin@tcmysql etc]$ systemctl status mysqld.service
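After the restart, it is worth confirming the new settings are live before kicking off a large load; for example:

```sql
-- Values should match what was set in /etc/my.cnf
SELECT @@max_connections, @@sql_mode;
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;
SELECT @@max_allowed_packet / 1024 / 1024 AS max_packet_mb;
```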
1.7 Attach and Prepare Datadisk Ready for OS and VM
STEP 1 – Attach disk to existing VM
This can be done in two ways. One is via the Disks section of the Azure VM in the Azure Portal, by adding a new LUN (which we already did earlier).
The other way is to execute the Azure CLI code below:
az vm disk attach \
--resource-group myResourceGroupDisk \
--vm-name myVM \
--name myDataDisk \
--size-gb 128 \
--sku Premium_LRS \
--new
STEP 2 – Prepare the Data Disk
As you can see, though we attached a 2 TB data disk in STEP 1, it is still not showing up here:
[root@tcmysql data]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 32G 0 32G 0% /dev
tmpfs 32G 0 32G 0% /dev/shm
tmpfs 32G 9.0M 32G 1% /run
tmpfs 32G 0 32G 0% /sys/fs/cgroup
/dev/sda2 30G 7.0G 23G 24% /
/dev/sda1 494M 65M 430M 13% /boot
/dev/sda15 495M 12M 484M 3% /boot/efi
//dxp<fake>storageaccount.file.core.windows.net/dxpfileshare 2.0T 349G 1.7T 18% /mnt/dxpfileshare
/dev/sdb1 126G 2.1G 118G 2% /mnt/resource
tmpfs 6.3G 0 6.3G 0% /run/user/1000
So let's execute the steps below to make it available:
- Create an SSH connection with the virtual machine. Replace the example IP address with the public IP of the virtual machine.
ssh dxpadmin@<Public IP Address of Azure VM>
- Partition the disk with parted.
sudo parted /dev/sdc --script mklabel gpt mkpart xfspart xfs 0% 100%
- Write a file system to the partition by using the mkfs command. Use partprobe to make the OS aware of the change.
sudo mkfs.xfs /dev/sdc1
sudo partprobe /dev/sdc1
- Mount the new disk so that it is accessible in the operating system.
sudo mkdir /datadrive && sudo mount /dev/sdc1 /datadrive
- The disk can now be accessed through the /datadrive mountpoint, which can be verified by running the df -h command.
df -h | grep -i "sd"
The output shows the new drive mounted on /datadrive.
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 30G 7.0G 23G 24% /
/dev/sda15 495M 12M 484M 4% /boot/efi
/dev/sdb1 126G 2G 118G 1% /mnt/resource
/dev/sdc1 2.0T 2.0T 1G 1% /datadrive
- To ensure that the drive is remounted after a reboot, it must be added to the /etc/fstab file. To do so, get the UUID of the disk with the blkid utility.
sudo -i blkid
- The output displays the UUID of the drive, /dev/sdc1 in this case.
/dev/sdc1: UUID="33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e" TYPE="xfs"
- Open the /etc/fstab file in a text editor as follows:
sudo nano /etc/fstab
- Add a line similar to the following to the /etc/fstab file, replacing the UUID value with your own.
UUID=33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e /datadrive xfs defaults,nofail 1 2
- When you are done editing the file, use Ctrl+O to write the file and Ctrl+X to exit the editor.
Now that the disk has been configured, close the SSH session.
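Before rebooting, it can be worth sanity-checking the new fstab line, since a malformed entry (especially one without nofail) can leave the VM stuck at boot. A minimal field-count check, using the example UUID from above:

```shell
# An fstab entry should have 6 whitespace-separated fields:
# device  mountpoint  fstype  options  dump  pass
echo 'UUID=33333333-3b3b-3c3c-3d3d-3e3e3e3e3e3e /datadrive xfs defaults,nofail 1 2' |
awk 'NF == 6 { print "fstab line OK" } NF != 6 { print "unexpected field count: " NF }'
```

The definitive test is `sudo mount -a`, which re-mounts everything in /etc/fstab and reports errors on the new line without a reboot.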
Change VM Disk Setting for performance
While the VM is stopped, make changes to the data disk's advanced settings in the Azure portal.
1.8 Move MySQL data directory to Azure Data Disk
STEP 1 – Check current MySQL data directory and Stop services
$ mysql -u root -p
mysql > select @@datadir;
Output
+-----------------+
| @@datadir |
+-----------------+
| /var/lib/mysql/ |
+-----------------+
1 row in set (0.00 sec)
mysql > exit
$ sudo systemctl stop mysqld
$ sudo systemctl status mysqld
STEP 2 – Use rsync to copy the data directory to the Azure data disk and back up the original
$sudo rsync -av /var/lib/mysql /datadrive
$sudo mv /var/lib/mysql /var/lib/mysql.bak
STEP 3 – Override MySQL Config file
$ sudo vi /etc/my.cnf
[mysqld]
. . .
datadir=/datadrive/mysql
socket=/datadrive/mysql/mysql.sock
. . .
[client]
port=3306
socket=/datadrive/mysql/mysql.sock
STEP 4 – Restart MySQL instance and check datadir changes
$ sudo systemctl start mysqld
$ sudo systemctl status mysqld
$ mysql -u root -p
mysql> select @@datadir;
Output
+-------------------+
| @@datadir         |
+-------------------+
| /datadrive/mysql/ |
+-------------------+
1 row in set (0.01 sec)
If you don't do the steps below, MySQL will not start.
STEP 5: On CentOS or RedHat, modify SELinux settings. On Ubuntu or Debian, modify the AppArmor settings
When you copy the data directory (e.g. with rsync), the SELinux context is lost (note that if you move the directory instead, the context may stay intact).
If SELinux is running in enforcing mode, you need to persuade it to allow mysqld to use the non-default datadir. Alternatively, put SELinux into permissive mode, or disable it entirely.
Run sestatus to see which mode it's currently running in.
[root@tcmysql data]# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Max kernel policy version: 31
Run grep mysql /var/log/audit/audit.log to see if SELinux has blocked any mysql file access attempts or similar.
(Note that this is fine if running in permissive mode, because then it doesn't actually block anything; it just logs what it would have blocked if you were running in enforcing mode.)
Run setenforce permissive to put SELinux into permissive mode, though note you need to edit /etc/sysconfig/selinux (or /etc/selinux/config) to make the change permanent.
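Rather than leaving SELinux permissive, the cleaner long-term fix is to register the new datadir with SELinux. This is a sketch, assuming the semanage tool from policycoreutils is installed; adjust the path to match your datadir:

```shell
# Register the new MySQL data directory with SELinux, restore the
# context recursively, then re-enable enforcing mode.
sudo semanage fcontext -a -t mysqld_db_t "/datadrive/mysql(/.*)?"
sudo restorecon -Rv /datadrive/mysql
sudo setenforce enforcing
```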
SELinux Context Type for MySQL
This is just a reference of the context before and after the mysql directory copy.
If we don't restore the SELinux file context type, mysqld will throw the error below and stop:
# cat /var/log/mysqld.log:
mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
[Warning] Can't create test file /var/lib/mysql/devdb..lower-test
/usr/libexec/mysqld: Can't change dir to '/var/lib/mysql/' (Errcode: 13)
[ERROR] Aborting
[Note] /usr/libexec/mysqld: Shutdown complete
STEP 6: Bounce the whole VM
1.9 Load/Import Dump file using myloader
Prerequisites for myloader
Disable foreign keys to expedite the loading/import process:
SET FOREIGN_KEY_CHECKS=0;
SET GLOBAL FOREIGN_KEY_CHECKS=0;
SET unique_checks = 0;
SET GLOBAL innodb_stats_on_metadata = 0;
After the import, set those variables back:
SET foreign_key_checks = 1;
SET unique_checks = 1;
SET GLOBAL innodb_stats_on_metadata = 1;
Execute myloader
With the directory structures in the state below, execute myloader, which will load mvh_test (all tables) and hotel_rate_wes2 (just one massive fact table with hotel rates):
$ myloader -s mvh_test -h <public IP address VM> -u dxpadmin -p <password> -d mvh_test -t 500 -C -v 3 -e 2> myloader_mvh_test.log &
$ myloader --host=<public IP address VM> --user=dxpadmin --password=<password> --database=mvh_test --directory=./hotel_rate_wes2/ --threads=500 --overwrite-tables --compress-protocol --verbose=3 -e 2> myload_hotel_rate_wes2.log &
Monitor Progress of the Data Loading
# Following gives approximate number of data files already restored
grep restoring <myloader-output-logs-path> | grep Thread | grep -v schema | wc -l
# Following gives total number of data files to be restored
ls -l <mydumper-output-dir> | grep -v schema | wc -l
# Following gives information about errors
grep -i error <myloader-output-logs-path>
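The two counts above can be combined into a rough progress percentage. A sketch, using the log and dump paths from the commands earlier:

```shell
# Rough restore progress: data files restored so far vs. total data files.
LOG=myloader_mvh_test.log        # myloader output log from above
DUMPDIR=./hotel_rate_wes2        # mydumper output directory
done_files=$(grep restoring "$LOG" 2>/dev/null | grep Thread | grep -vc schema)
total_files=$(ls "$DUMPDIR" 2>/dev/null | grep -vc schema)
if [ "${total_files:-0}" -gt 0 ]; then
  awk -v d="$done_files" -v t="$total_files" \
    'BEGIN { printf "%d of %d files (%.0f%%)\n", d, t, 100 * d / t }'
else
  echo "no dump files found in $DUMPDIR"
fi
```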
As the data import progresses, we can see the used space on the /datadrive Azure data disk mount keep increasing, so we are successfully using the Azure data disk to store the MySQL database files.
2.0 Connect MySQL Cloud DB from Workbench & Query
The host address should be the VM's public IP address. Note: the inbound rule should allow your local machine's IP to connect to this VM (this was outlined in previous steps).
Also, to connect to MySQL on port 3306, your local IP address needs to be granted access from the database itself (the process above outlines how to give GRANT privileges to open access for all or certain IPs).
Execute sample query as below:
use mvh_test;
select
hr.rate_id as rid,
hr.shop_request_id,
hr.org_id,
hr.prop_id,
hr.shop_data_source,
hr.shop_currency_cd,
hr.shop_pos_cd,
hr.los,
hr.guests,
hr.arv_dt,
hr.change_dttm,
hr.shop_status,
hr.shop_dttm,
hr.shop_msg,
hr.err_dttm,
hr.err_msg,
hr.chain_cd,
hr.currency_cd,
hr.pos_cd,
if(uncompressed_length(hr.data) < 16000000,uncompress(hr.data),hr.data) as data,
hr.modify_dttm
from hotel_rate_wes2 hr
where hr.shop_data_source = 'WEB'
and hr.guests = 1
and hr.shop_pos_cd = 'US'
and hr.arv_dt <= '2022-02-08'
and hr.arv_dt >= '2019-03-15'
and hr.prop_id in (253,290,291,362)
and hr.shop_currency_cd in ('SGD','USD')
and ((shop_dttm >= date_sub(now(), interval 7 day)) or (err_dttm >= date_sub(now(), interval 7 day)))
and hr.los = 1;
Results