In this post I want to cover spreading your SQL Server wings with Microsoft Fabric Lakehouses. As part of a series of posts about spreading your SQL Server wings with the Microsoft Intelligent Data Platform.
You can find out more about this series of posts in an introductory post I published about the series.
I thought it was fitting to do this post since I covered spreading your SQL Server wings with Microsoft Fabric Data Warehouses recently.
Plus, it also seems fitting after the recent announcement about the new DP-600 exam for Microsoft Fabric. Since this post contains items that are in the study guide.
By the end of this post, you will have a good overview of Microsoft Fabric Data Lakehouses, including CI/CD options. In addition, where your SQL Server background can prove to be useful and where some Power BI knowledge can come in handy.
Plus, I share plenty of links in this post. For instance, there are a couple of useful resources to help you get started towards the bottom of this post.
Recap about Microsoft Fabric
Microsoft Fabric unifies the functionality you can find in various applications together within one Software as a Service solution to deliver a unified analytics platform.
So, all you have to do is go to one website and all the functionality that you see below will be at your fingertips. Including the Data Engineering experience which Lakehouses are a part of.
To help with some jargon here, you can find out more about what I mean when I refer to Software as a Service (SaaS) in an old blog post called ‘SaaS, PaaS, and IaaS.. Oh my!’.
In reality, even though Microsoft Fabric has numerous data experiences you should configure some other services as well to get the most out of it. Especially if you want to work with it at an enterprise level.
For example, Microsoft Purview for your compliance and governance needs. In addition, Azure DevOps to work with Microsoft Fabric Git integration. As indicated below.
About Microsoft Fabric Lakehouses
Microsoft Fabric Lakehouses allows you to work with the Lakehouse paradigm at scale. Due to the fact that it allows you to work with both structured and unstructured data. Behind the scenes, it is backed by the Spark compute.
Something that will be very familiar to those of you who have worked with other services that support working with Lakehouses. Such as Azure Synapse Analytics and Azure Databricks.
Lakehouse Editor
You can manage the contents of Lakehouses within Microsoft Fabric with the Lakehouse Editor. Which opens up automatically if you select a Lakehouse within Microsoft Fabric.
As soon as you open the Lakehouse Editor you can see that Lakehouses support ingesting raw data as well as curated Delta Tables.
Because it has a ‘Files’ folder where you can store any file types and a ‘Tables’ folder which stores the nicely curated Delta tables.
Which means you can work with the very common pattern of ingesting files containing raw data into the Files folder and from there transform the data to curated Delta tables.
Which will be a concept familiar to those of you who have had experience loading the data from files into SQL Server databases.
Shortcuts
In addition, you can take advantage of a powerful feature called shortcuts. Which allows you to create a link to files stored in another location.
Like various SQL Server applications such as Azure Data Studio it has a variety of additional menu items depending on where you click to in the interface.
For example, in the Lakehouse editor you can create shortcuts to other file locations. Which you can then work with as if they were local objects in the Lakehouse.
You can create a shortcut in the Lakehouse Editor by clicking on the ellipsis (…) and selecting ‘New shortcut’. From there, you can select where your data is stored through a nice clickable wizard. As you can see below.
It is worth noting that you can do this in the ‘Tables’ folder for Delta tables you have created elsewhere. For example, existing Delta tables created with Azure Databricks.
Plus, those of you who are keen on performance can take advantage of the Optimize option that you can use when working with tables. I have been informed that this option is potentially going to be renamed but that change has yet to appear om my tenant.
Clicking this option will provide you with various options you can use to maintain your tables. Because the data is stored in files the options are to either run the optimize command or run the vacuum command.
Those of you with a SQL Server background can think of the optimize command to be the file equivalent of the ALTER INDEX REBUILD command, or DBCC REINDEX depending on when you last worked with SQL Server.
SQL analytics endpoint
When you deploy a Lakehouse in Microsoft Fabric you actually deploy three items. Which are a Lakehouse, a semantic model (which is a new name for a dataset) and a SQL analytics endpoint.
In the top right-hand corner of Lakehouse editor there is a drop-down menu. Where one of the options is the SQL analytics endpoint.
Selecting this option opens up a read-only version of the Warehouse Editor. Which I covered in my previous post about spreading your SQL Server wings with Microsoft Fabric Data Warehouses.
When you open up the SQL analytics endpoint for a particular Lakehouse you can query your delta tables like in Lakehouse using T-SQL. Like in the below example.
One interesting thing I want to highlight here is that even though you can only run SELECT statements through the read-only endpoint you are still able to perform Warehouse modelling directly when working with Warehouse Editor through the SQL analytics endpoint.
It is worth noting that whether you work within the Lakehouse Editor or the read-only SQL analytics endpoint you can create a Power BI report easily within Microsoft Fabric within a few clicks.
Developing in Microsoft Fabric with Notebooks
Notebooks are probably familiar to those of you who have used Azure Data Studio, or other services such as Azure Synapse Analytics.
You can work with various languages within notebooks. Including Spark SQL, which is similar to T-SQL. Like in the below example taken from my Festive Tech Calendar contribution for this year.
So, you can utilize your SQL Server development skills to write code within notebooks. Of course, if you intend to do heavy development in Lakehouses I do recommend learning Python as well.
One key point to remember is that when you run queries in these notebooks it utilizes the Spark compute behind the scenes to do the processing.
Performance tuning Microsoft Fabric Lakehouses
In reality, there is a lot you can do to help with performance tuning when working with Lakehouses.
For instance, Microsoft provides detailed documentation about how you can work with Spark compute. To give you a better understanding of how you can do various things such as create custom Spark pools.
Which you can think of conceptually a bit like selecting the best machine specifications to run SQL Server.
Plus, you can look to translate your SQL Server monitoring skills over to Apache Spark monitoring.
For example, you can view the recent runs of some items just like you can in a lot of SQL Server monitoring tools.
Integration with other Microsoft Fabric experiences
Just like Microsoft Fabric Data Warehouses, Lakehouses integrate with other items within Microsoft Fabric as well.
Below are just some examples of how the Microsoft Fabric Lakehouses interacts with other items and with OneLake.
CI/CD for Microsoft Fabric Lakehouses
Lakehouses are now supported by Git integration. So, if you have worked with source control for SQL Server databases in the past you can work with similar concepts.
However, you need to be aware of what objects are actually stored before considering this as an option. Due to the fact that the details currently stored are limited. You can read more about it in the Lakehouse git integration details.
Same currently applies for working with Lakehouses in deployment pipelines as well.
Microsoft Fabric Data Warehouse material to help spread your SQL Server wings
Microsoft has been working on Microsoft Fabric Lakehouses for some time and it shows.
To align with the original announcement during Microsoft Build a Microsoft Fabric Launch Digital event took place at the same time. You can watch both the Day 1 video and the Day 2 video on YouTube now.
In addition there are some interesting videos from Microsoft Ignite sessions that you can watch.
Plus, Microsoft provides some really good end-to-end tutorials in Microsoft Fabric to help you get up to speed with it. You can also find a learning path to help you get started with Microsoft Fabric in Microsoft Learn.
Furthermore, there is already an online Fabric community that you can join.
Final words about spreading your SQL Server wings with Microsoft Fabric Lakehouses
I hope this post about spreading your SQL Server wings with Microsoft Fabric Data Lakehouses has proved to be useful.
Because this post has been in the works for a while now and I thought it was appropriate to publish it after Microsoft Fabric became Generally Available.
Of course, if you have any comments or queries about this post feel free to reach out to me.
[…] Kevin Chant invites you to a swank lakhouse: […]
[…] For those interested, I covered notebooks and some other elements shown in this video recently in another post. Which covers spreading your SQL Server wings with Microsoft Fabric Lakehouses. […]
[…] addition, I have covered some of the material in various posts already. Such as my one about spreading your SQL Server wings with Microsoft Fabric Lakehouses. Where I covered various aspects that are contained in the […]