Recently, I have been looking more into the SQL Server 2019 offerings for various reasons. So, I am sharing the SQL Server 2019 Big Data Clusters learning resources that I have discovered.
Now, one of the reasons I am looking into it is that I am the acting product owner of SQL Server for a client with a large SQL Server estate.
Because of this, I organised a workshop about it, so I had to make sure I knew all about the newest offerings, which include Big Data Clusters.
For the benefit of those of you who aren’t aware, Big Data Clusters is a new feature that is being introduced in SQL Server 2019. It will allow you to query big data that is stored either locally or on remote systems using PolyBase.
Like Hadoop, it will also support storing HDFS data within its environment. In addition, Microsoft have extended the PolyBase functionality so that you can query data stored remotely on other systems like Oracle.
In a previous post I shared current SQL Server 2019 learning resources, which you can view in detail here.
However, SQL Server 2019 Big Data Clusters are very involved. So, I thought I had better dedicate a whole post to sharing further learning resources for them.
Because people have different learning methods, I have included references to both documents and videos in this post. In addition, I have created the below links in case somebody wants to go directly to a specific section.
With this in mind, I hope you find the below useful. I thought I would begin with the most obvious resource first.
Overall, Microsoft have done a fairly good job of keeping the documentation for Big Data Clusters updated, which you can view in detail here.
The documentation covers everything that is currently available in the latest SQL Server Community Technology Preview (CTP), which is currently version 2.5.
In addition, I have a tip for those who are hoping to read the documentation whilst travelling.
If you click on the ‘Download PDF’ link in the document, you can download all the available SQL Server documentation as one big PDF file.
The PDF is in a decent format. However, because it contains information about various versions of SQL Server, it is a large file of around 178MB.
If you want more of an overview you can read the SQL Server 2019 and Big Data white paper instead.
It contains similar information to the SQL Server documentation, but it is more condensed. In addition, it provides a good overview and highlights that you can potentially use Big Data Clusters with Azure Stack services.
I did a post previously which covered Azure Stack, which you can read here.
You can sign up to download and read the Big Data white paper in detail here.
Of course, if you want an even shorter version of the white paper without having to sign up for anything you could read this original introduction post. In fact, it was published last September after the original Ignite announcement. You can read it in detail here.
I discussed in my previous post that Microsoft have made various workshops available on GitHub that you can use.
One of these happens to be for Big Data Clusters, which I used as part of my training day with Buck Woody at SQLBits. I found it very useful.
It has since been updated to work with CTP 3.2 and works perfectly well with the latest release candidate. In fact, it’s now easier to deploy than it was before.
You can read about it in detail here.
I read Jeff Levy’s post reviewing his use of Big Data Clusters and I really enjoyed it. He discusses how he got a Big Data Cluster up and running in a Minikube environment because he’s part of the early adoption program.
I find it can be useful to read about other people’s experiences, as they note facts which are not always found in the official documentation. You can read his post in detail here.
Chris Adkin has produced a fair few Kubernetes posts. In fact, Chris was looking at Kubernetes long before it became popular with others.
Recently Chris has been focusing on posts about Big Data Clusters, which give some useful insights. You can read his posts in detail here.
Like I said earlier in this post, Big Data Clusters was demonstrated to the masses at Ignite last year. You can view the videos for these sessions online, and I have found two fairly useful.
The first is the original introduction video, which covers similar material to the documentation. However, it also shows some demos so that you can see it in action. You can watch it here.
The second is the deep dive session, which focuses on the architecture in detail. You can watch it here.
I was lucky enough this year to attend Buck Woody’s training day at SQLBits about Big Data Clusters. In addition, he also did a session which was a shortened version of it.
I will admit there is indeed some overlap between Buck’s content and the Ignite videos. However, Buck does cover some new ground.
In addition, some of you will probably enjoy his unique presentation skills. Plus, his session is shorter than the Ignite counterparts, which some of you might prefer.
You can view Buck’s session at SQLBits here.
You can watch a Big Data Cluster being deployed using Azure Data Studio in my ‘SQL Server related services in Azure’ video here.
If you prefer to read a book, there are some now available.
One I would recommend is the ‘SQL Server Big Data Clusters’ book by Benjamin Weissman and Enrico van de Laar. In May 2020 the second edition of this book was published, which is available from here.
I hope this post encourages some of you to find out more about Big Data Clusters. I think it’s going to prove very useful for companies that have a legitimate business case for it.
If you can think of any other learning resources online, then feel free to add a comment for the benefit of the community.