Last week I did some initial tests for Azure Synapse Link for SQL Server 2022. Which is currently in preview.
In addition, I showed a working version of it at Data Saturday Croatia. With this in mind, I thought I would share some of my experiences with it so far.
By the end of this post, you will have an understanding of this feature and know one way to test that it is working.
About Azure Synapse Link for SQL Server 2022
Azure Synapse Link for SQL Server 2022 allows you to replicate your data from a SQL Server 2022 database to an Azure Synapse Analytics dedicated SQL Pool.
It is one of the options for the new Azure Synapse Link for SQL feature that was announced during Microsoft Build. You can read more about this in the Microsoft post which also announced the Public Preview of Azure Synapse Link for SQL.
It is the main reason why I installed SQL Server 2022 on a virtual machine. Which I covered in a previous post about creating a dacpac for a SQL Server 2022 database. However, since than SQL Server 2022 container images are now available. Which opens up some more options.
Azure Synapse Link for SQL Server 2022 recently came out of preview and is now generally available.
Setting up Azure Synapse Link for SQL Server 2022
I setup Azure Synapse Link for SQL Server 2022 by following the instructions by Microsoft. Which shows how to get started with Azure Synapse Link for SQL Server 2022.
Because I had SQL Server 2022 installed in a virtual machine I installed the integration runtime on there before creating the linked service.
I decided to test using the WideWorldImporters sample database. Because I was initially thinking about doing some tests using the WideWorldImporters data generation.
One key point to mention here is that I did not change the database compatibility level after the database restore.
If you are looking to do the same with this database here are some key points to remember
- You need to create a master key in the database. In addition, if you restore the database, you must add this key again.
- You must create all the schemas required in the destination dedicated SQL Pool.
- If you recreate the dedicated SQL Pool, remember to add the master key back again.
- You must make sure you use managed identity when you create a linked service to connect to your landing zone on Azure Data Lake Storage Gen2. If you authenticate using another method, you will not be able to select it whilst setting up Azure Synapse Link. I know this from experience.
Azure Synapse Link table recommendations
One thing I liked when I went to create the Azure Synapse Link for SQL Server 2022 connection was the fact that it highlighted potential issues with tables straight away after selecting the database as a linked service. As you can see below.
After selecting the database to be the Link connection I could manage it in the new ‘Link connection’ option.
Straight away it gave me recommendations to change some tables to be a heap.
Once I clicked on the Start button, I had to wait a while for the connection to start running. So, you might need to be patient here. One good thing about this is that you can drill down into the link to view the progress of each table by clicking on the link name.
It is very clear when Azure Synapse Link for SQL Server 2022 is up and running.
After it was up and running, I checked that it was using the right storage account and container.
Azure Synapse Link Storage Account
After it was running, I checked to make sure the right storage account was being used by Azure Synapse Link. Luckily there is an easy way to check. Because if you scroll to the end of your link connection you can find a continuous run id.
You can than navigate into your storage account and check that the folder name is the same as the run ID.
Testing automatic replication
One question I got asked at Data Saturday Croatia was if it Azure Synapse Link for SQL automatically updated. With this in mind I thought I would show what happens when an update takes place. So that you can see for yourself.
As you can see below the replication first started at 3:50.
In the storage account, there is an initial Manifest file. Which was created at 3:55.
If I go navigate further into the ‘Tables’ folder and go into one of its subfolders I can see a JSON file which contains the table details and a FullCopyData folder.
I downloaded the JSON file so that I could verify which table it was. I than updated the comments column in the PurchaseOrders table within the SQL Server 2022 database.
Here is where things get interesting, because a new folder called ‘ChangeData’ appeared. Plus, it looks like the new file is a CSV file. Whereas the table used for the initial load was a parquet file.
In addition, the Manifest JSON file got updated with a new ‘Last modified’ value.
To check that the link had done its job, I checked that my update in the SQL Server 2022 database had made its way to the dedicated SQL Pool by running the below query.
One thing I will add is that it is probably a good idea come up with your own method to load test Azure Synapse Link for SQL Server 2022. Especially if you intend to test by using the WideWorldImporters database.
Because there are issues if you try and use the ‘DataLoadSimulation.PopulateDataToCurrentDate’ stored procedure within the database after setting it up with Azure Synapse Link for SQL Server 2022.
Final words about initial tests
I hope this post about my initial tests for Azure Synapse Link for SQL Server 2022 has been a good introduction to this feature for some of you.
It is clear that a lot of effort has been made to make it easy to configure and monitor this new feature. However, make sure you understand its known limitations.
If you want to learn more about the Azure Synapse Link for SQL feature there are a couple of sessions relating to it at the next Data Toboggan conference. Including one by myself.
Of course, if you have any comments or queries about this post feel free to reach out to me.