Before my unique session at dataMinds Connect the other week, I created some guidelines for deploying Big Data Clusters using a Visual Studio Subscription.
For those of you who are not aware, Big Data Clusters is a new feature being introduced with SQL Server 2019. You can read my post about learning resources for Big Data Clusters in detail here.
In order to present the right results for various outcomes I attempted to deploy Big Data Clusters multiple times.
When I say multiple times, I mean the number of deployments easily went into double figures, because I was testing various virtual machine sizes in multiple regions.
Hence, I spent many hours testing and verifying the results in order to present them properly.
For testing, I used a couple of methods to deploy Big Data Clusters into Azure Kubernetes Service: the sample Python deployment script available online and the wizard that comes with the Azure Data Studio Insiders Build.
In addition, I tested the defaults from both the script and the wizard in each other’s solutions.
Sometimes the errors appeared immediately. However, sometimes I had to investigate further, for example by using the kubectl command to dive into the issues.
In truth, any of these guidelines could change at any time. In fact, in my session I only presented three guidelines. However, I did mention a fourth, relating to housekeeping.
Anyway, below are the four guidelines I have come up with for deploying Big Data Clusters using a Visual Studio Subscription at this moment in time.
1. Virtual Machine size
The virtual machine size that you specify has to exist in that region. Microsoft will not magically create that virtual machine size in a region for you.
You can check this in a fair few ways. For example, by using PowerShell or by attempting to create a virtual machine of that size in the portal.
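As a rough illustration of that check, here is a minimal Python sketch. It swaps in the Azure CLI rather than PowerShell, and assumes `az` is installed, signed in, and that your CLI version still offers `az vm list-sizes`. The function names are hypothetical, and a size being listed for a region does not guarantee your subscription is allowed to use it.

```python
import json
import subprocess


def size_available(sizes, vm_size):
    """Return True if vm_size appears in a parsed list of VM sizes.

    `sizes` is expected to be a list of dicts with a "name" key,
    which is the shape `az vm list-sizes --output json` returns.
    """
    wanted = vm_size.lower()
    return any(entry["name"].lower() == wanted for entry in sizes)


def fetch_sizes(region):
    """Ask the Azure CLI which VM sizes exist in a region.

    Assumption: the Azure CLI is installed and you are logged in.
    """
    result = subprocess.run(
        ["az", "vm", "list-sizes", "--location", region, "--output", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A typical use would be `size_available(fetch_sizes("westeurope"), "Standard_L8s_v2")`, where the region name is only an example.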
2. Total number of disks
Now this was an interesting discovery. It appears that the combined maximum number of disks that all the virtual machines can handle must be 24 or more.
I first realised this when my deployment was taking longer than expected. So, I decided to run the kubectl command, which gave me an error about a node exceeding its maximum volume count.
I found afterwards that this is covered in the section highlighted as important on the page about deploying with the sample Python script, which you can read in detail here.
For example, if you specify one Standard_L8s_v2 virtual machine it will fail, because the maximum number of disks it can handle is 16. However, if you specify two of them the maximum is 32 disks, which is fine.
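The arithmetic behind that example can be sketched in a few lines of Python. The threshold of 24 is the figure observed above, and the function names are hypothetical.

```python
REQUIRED_DISKS = 24  # minimum combined disk capacity observed in my testing


def total_disk_capacity(node_count, max_disks_per_vm):
    """Combined number of disks all nodes in the cluster can attach."""
    return node_count * max_disks_per_vm


def enough_disks(node_count, max_disks_per_vm, required=REQUIRED_DISKS):
    """True if the cluster as a whole can attach at least `required` disks."""
    return total_disk_capacity(node_count, max_disks_per_vm) >= required


# Standard_L8s_v2 handles up to 16 disks per virtual machine.
print(enough_disks(1, 16))  # False: 16 disks is below 24
print(enough_disks(2, 16))  # True: 32 disks is above 24
```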
3. Housekeeping
Make sure you delete all of the resource groups that are created after you have finished testing the deployment of a Big Data Cluster.
In reality, they do not cost much to deploy for testing purposes if you use a small number of virtual machines. However, if you keep them there, they will start to use up your Azure credit.
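One way to script that clean-up is sketched below in Python, assuming the Azure CLI is installed and signed in. The function names and the resource group name are hypothetical, and the sketch defaults to a dry run that only prints the commands, so nothing is deleted unless you ask for it.

```python
import subprocess


def delete_group_command(resource_group):
    """Build the Azure CLI command that deletes one resource group.

    --yes skips the confirmation prompt and --no-wait returns
    without waiting for the deletion to finish.
    """
    return ["az", "group", "delete", "--name", resource_group, "--yes", "--no-wait"]


def delete_groups(resource_groups, dry_run=True):
    """Delete each resource group, or just print the commands on a dry run."""
    commands = [delete_group_command(name) for name in resource_groups]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)
    return commands


# Dry run with a made-up resource group name.
delete_groups(["my-bdc-test-rg"])
```

Deletion cannot be undone, so it is worth reviewing the dry-run output before calling `delete_groups(..., dry_run=False)`.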
4. Total number of cores
The total number of cores used by all the virtual machines must be 20 or below if you have not made any changes to your Visual Studio Subscription.
This is because 20 is the default quota limit per virtual machine size. However, this particular issue is something you can resolve by submitting a request to Microsoft to increase your quota.
You can read how to do that in detail here.
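The core limit can be checked up front with the same kind of arithmetic. Here is a minimal Python sketch, assuming the default quota of 20 cores described above; the function names are hypothetical.

```python
DEFAULT_CORE_QUOTA = 20  # default limit for an unchanged Visual Studio Subscription


def total_cores(node_count, cores_per_vm):
    """Total cores requested across all nodes in the cluster."""
    return node_count * cores_per_vm


def within_core_quota(node_count, cores_per_vm, quota=DEFAULT_CORE_QUOTA):
    """True if the planned cluster fits inside the core quota."""
    return total_cores(node_count, cores_per_vm) <= quota


# Standard_L8s_v2 has 8 cores per virtual machine.
print(within_core_quota(2, 8))  # True: 16 cores fits within 20
print(within_core_quota(3, 8))  # False: 24 cores exceeds 20
```

Taken together with the disk guideline, two Standard_L8s_v2 virtual machines satisfy both limits, while one has too few disks and three need too many cores.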
I hope you find my guidelines for deploying Big Data Clusters using a Visual Studio Subscription useful.
As you can imagine, this kept me very busy leading up to dataMinds Connect. You can read my post about presenting there in detail here.
Like I said, these guidelines could change at any time, especially with the release of SQL Server 2019 expected soon.