how to leverage Microsoft Self Hosted Runtime integration in Azure Data Factory to connect On-Premise network and view data.
This is a proof of concept that said without being a hard wired Azure ExpressRoute setup or Site to Site VPN connectivity we can still connect to On-Premise Data to Cloud Infrastructure by the help of Azure Data Factory
The scope of this POC is limited to show the connectivity of the data source (Oracle) which is behind corporate firewall and view data part only. Detail scope to move data from Oracle to Azure Data Lake Storage Gen 2( ADLS Gen2) will be covered in separate blog item.
- You need to have standalone local machine VPN connectivity established to connect to the Source Database (in this case Oracle) on-Prem.
- Windows Server 2012/2016/2019 , Windows 8.1 , Windows 10
What is Microsoft Self Hosted Integration Runtime
The Microsoft Integration Runtime is a customer managed data integration and scanning compute infrastructure used by Azure Data Factory, Azure Synapse Analytics and Azure Purview to provide data integration and scanning capabilities across different network environments.
A self-hosted integration runtime can run copy activities between a cloud data store and a data store in a private network. It also can dispatch transform activities against compute resources in an on-premises network or an Azure virtual network. The installation of a self-hosted integration runtime needs an on-premises machine or a virtual machine inside a private network
Self Hosted Integration Runtime Architecture
- Create Data Pipeline in Azure Data Factory
- Setup Integration Runtime
- Install Microsoft Self hosted integration runtime in local laptop and Configure the connection to cloud
- Setup Linked Services
- View Oracle Data
Step 1 : Create Data Pipeline in Azure Data Factory
Search for Data Factory service in Azure
Create a Data Factory Pipeline
I have created “TestOracleDBConnectADF” as and Click Author and Monitor
Create New Pipeline
Click New Integration Runtimes
Choose SelfHosted option and Copy the Key . Also here we can see direct download link for the integration Runtime installer
Step 2 : Setup Self Hosted Integration runtime in local laptop
Download and Install from here : https://www.microsoft.com/en-us/download/details.aspx?id=39717
Copy the Key to below section after running the installer and hit Register
Step 3 : Verify the connection established between Local machine and Cloud
The Selfhosted runtime configuration should be look like this . No additional configuration required to set here
In Azure Portal it should look like below
Step 4 : Connect to VPN and See the Sample table in Exadata schema
Step 5 : Configure Linked Services to connect with Oracle Exadata DB info
In Azure Data factory create New Linked service called “LS_Oracle_onPrem” .
Add the Exadata OnPrem Database connection information there and Test the connection:
Step 6 : Create New Dataset
I have created Data folder called EMPDEPT and inside that new dataset called SRC_EMP
Pick and choose the Linked Service and Test Connection . Once successful Click the Table dropdown to get the list of Schema and Table to access and you can Preview any dataset you want .
This will directly fetch the data from OnPrem Oracle to Cloud .
That’s all for the Validating and testing connection and data access across two platform.