In this post I want to cover how you can transform your SQL Server database schema for a dedicated SQL Pool if you are using Azure DevOps. Because I covered it at Data Toboggan over the weekend and it can be very useful.
By the end of this post, you will know one way you can transform the schema of a database project for SQL Server 2019 if you are using Azure DevOps. So that you can make it optimal for dedicated SQL Pools.
For example, say you have created a database project based on a SQL Server 2019 database. Which is in a Git repository in Azure DevOps and used in a deployment pipeline to deploy to both a SQL Server database and an Azure Synapse Analytics dedicated SQL Pool.
Like the one in a previous post here.
What do you do if you want to make it optimal for Azure Synapse Analytics? For instance, change a table to use hash distribution for performance reasons.
My solution for doing this is very simple. In fact, it’s based on the method I use to deploy to deploy from a SQL Server database project to many different SQL Server related services including dedicated SQL Pools. You can read the original post about that in detail here.
To demonstrate this, I will use a very simple pipeline as below.
In reality, your pipeline will be more complex than above. Like the one that is in a previous post here that covers unit tests.
Transform & build stage
As you can see in the above pipeline, I have a stage called transform & build. To create a dacpac file that sets a table to use hash distribution, the following steps are performed in this stage.
- First sets all variables for that stage and which agent pool to use.
- Afterwards it copies all the files in the repository to a subfolder the staging area. Apart from the yaml file.
- It then runs an inline PowerShell task which replaces the bracket at the end of the SQL file for that particular table. Changing it to add the required WITH syntax instead.
- Afterwards, runs another inline PowerShell task which replaces the DSP line in the copy of the sqlproj file in the staging area. Setting it to be for an Azure SQL Database instead.
- It then runs a build against the new sqlproj file with the changed DSP line.
- Once the build has completed a new artifact is published with only the contents of the ‘bin\release’ folder (release was what I name my configuration).
One of the advantages of doing step 2 is that you only maintain one database project. In addition, it keeps the copies you create behind the scenes to change for other SQL Services nice and secure. So, there will be no drifts apart from transformations you do in your pipeline.
Once the above stage is completed the deploy stage is run to deploy the dacpac to a dedicated SQL Pool.
In step 3 above I used the below PowerShell in an inline PowerShell task.
cd.. cd a cd $(tablesfolder) $Old_end = '\);' $New_end = ') WITH (DISTRIBUTION = HASH (Column1), CLUSTERED COLUMNSTORE INDEX);' $New_end (Get-Content "Table1.sql") -replace $Old_end, $New_end | Set-Content "Table1.sql"
First of all, the task has to navigate to the folder where the SQL scripts for the tables are stored in the copied version of the repository.
Afterwards, I replace the bracket at the end of the SQL script with the required WITH command.
It’s simple, however I will point out some things here.
- You will notice I have used a backslash (\) as an escape character after the parentheses for the search variable, because it is searching for a special character. You only have to do this for the search value, not the replace one.
- Remember to include the full colon (;) after the bracket if you intend to do this. Otherwise, it will replace brackets used elsewhere in the SQL script as well.
- In the above example I have done this for one particular table. However, you could also run a for loop to do this for all your SQL files for tables if required.
- Do your research about what distribution type you need. Microsoft has a good guide about this here.
- You will notice I declared a clustered Columnstore index above. In reality, if you do not declare an index type this is the default that is created for dedicated SQL Pools. You can read more about index types for dedicated SQL Pools here.
I hope my post about how you can transform your SQL Server database schema for a dedicated SQL Pool is useful. Especially since it shows that you can do a lot of things in your yaml pipelines with some imagination.
If you have any questions or comments about this post feel free to reach out to me.