
The great “number of workspaces for medallion architecture in Microsoft Fabric” debate

Reading Time: 5 minutes

In this post I want to share my thoughts about the great “number of workspaces for medallion architecture in Microsoft Fabric” debate.

I was asked about it this week during the Learn Together session I did alongside Shabnam Watson. It is also a highly debated topic in our community, so I wanted to share my thoughts about it.

My personal opinion is that it depends. The number you choose depends on a variety of factors, which I cover in this post.

By the end of this post, you will know why I think that, along with plenty of things to consider when deciding on the number of workspaces to implement.

Along the way I also share plenty of links.

Medallion architecture recap

Before I go any further, I will do a short(ish) recap of the medallion architecture in Microsoft Fabric, both as a refresher for those who know it and to help those studying for the DP-600 Microsoft Fabric exam.

Basically, the medallion architecture is a suggested architecture paradigm where you ingest data and transform it in various layers, sometimes referred to as zones.

The aim is to ensure that data is reliable and consistent enough to be consumed elsewhere, and to give more peace of mind that the relevant data is stored securely (more on that later).

In reality, the medallion architecture is based on concepts that have been around for years.

Typically, three layers are recommended when working with the medallion architecture, commonly known as bronze, silver and gold.

Figure: Three layers typically recommended for the medallion architecture

For those wondering, the colors for the layers above were created using the official color codes.

Typical data flow in the medallion architecture

Anyway, these layers have also gone by other names over the years, and still do for some people. Typically, data flows between the layers as follows:

  • Source data gets ingested into the bronze layer in its original format, or the closest format to it.
  • Once in the bronze layer, data is extracted and transformed into a cleansed state in the silver layer. Typically, things that occur at this stage include removing duplicates and converting null values to a standardized value.
  • Afterwards, the data is extracted and transformed from the silver layer to the gold layer. During this stage data tends to be aggregated and prepared to be consumed elsewhere.
    For example, by Power BI for BI purposes or by Microsoft Purview for assessments, like the ESG data estate data I covered in a previous post.
    One recommendation by Microsoft is that the data conforms to the common star schema design in this layer.
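
To make the flow above a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration: the sample rows, the column names, the `UNKNOWN` placeholder and the `to_silver` and `to_gold` helpers are all assumptions of mine, and in Microsoft Fabric you would more likely do this with notebooks, pipelines or Dataflows against Lakehouse tables.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might land in the bronze layer.
bronze = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 1, "region": "EU", "amount": 10.0},  # duplicate row
    {"order_id": 2, "region": None, "amount": 5.0},   # null region
    {"order_id": 3, "region": "EU", "amount": 2.5},
]


def to_silver(rows, null_region="UNKNOWN"):
    """Cleanse bronze rows: drop duplicate order_ids and standardize null regions."""
    seen = set()
    cleansed = []
    for row in rows:
        if row["order_id"] in seen:
            continue  # skip rows whose order_id has already been kept
        seen.add(row["order_id"])
        cleansed.append({**row, "region": row["region"] or null_region})
    return cleansed


def to_gold(rows):
    """Aggregate silver rows into a total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

The point is only the shape of the flow: bronze keeps the data as it arrived, silver removes duplicates and standardizes nulls, and gold aggregates for consumption.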

Now, the reason I said typically is that the above is advisory and you can customize it to suit your business needs.

Anyway, you can find out more about this by going through the “Organize a Fabric lakehouse using medallion architecture design” Microsoft Learn module. Alternatively, you can find resources that explain this on the Microsoft Fabric Career Hub.

The great “number of workspaces for medallion architecture in Microsoft Fabric” debate

I know there has been a lot of debate about the number of workspaces required when looking to implement the medallion architecture.

Some people prefer to have one workspace for all three layers, while others recommend a separate workspace for each layer.

Personally, I think it depends on a few different things. I know it sounds like the typical answer from a data platform engineer, but it really does because there are so many factors involved.

Below are some factors which in my opinion you need to consider when thinking about the number of workspaces required for your medallion architecture.

Environment you are creating your medallion architecture in

You need to consider the environment you are creating your medallion architecture in. For example, building it in your own Microsoft Fabric environment is fine.

However, if the intention is to deploy it to multiple workspaces elsewhere, you should test that multi-workspace setup as well.

Sensitivity of data in your medallion architecture

You must consider sensitivity of data. For example, dummy or sanitized data are good candidates for deploying the medallion architecture to a single workspace.

If you intend to deploy production data to a single workspace you must take security very seriously for all the layers concerned.

For highly sensitive data, or data where there is a clear separation of duties required for each layer, multiple workspaces are more appropriate.

For example, you can end up in a situation where different personas want access to different layers for various purposes, such as data scientists and developers requiring access to the raw (bronze) layer and business analysts wanting access to the gold layer.

In this instance, to make sure that personas only get access to the data that they are required to work with, you can configure the layers in different workspaces.
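
As a rough sketch of that idea, the persona example can be written down as a simple mapping. Everything here is hypothetical: the workspace names, the persona names and the `workspaces_for` helper are my own illustration, and the code does not call any Microsoft Fabric API. It only shows how one workspace per layer lets you reason about access per persona.

```python
# Hypothetical workspace names, one per layer.
LAYER_WORKSPACE = {
    "bronze": "ws-medallion-bronze",
    "silver": "ws-medallion-silver",
    "gold": "ws-medallion-gold",
}

# Hypothetical persona-to-layer needs, based on the example above.
PERSONA_LAYERS = {
    "data_scientist": {"bronze"},
    "developer": {"bronze"},
    "business_analyst": {"gold"},
}


def workspaces_for(persona):
    """Return the sorted workspaces a persona needs; unknown personas get none."""
    layers = PERSONA_LAYERS.get(persona, set())
    return sorted(LAYER_WORKSPACE[layer] for layer in layers)


print(workspaces_for("business_analyst"))
```

With a single workspace for all layers, the same separation has to be enforced at item level instead, which is why security needs extra care in that setup.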

Governance requirements for each layer

You must consider governance requirements. For example, when clear separation of duties is required, like in the previous example, multiple workspaces are a more appropriate solution.

You should also factor in any Enterprise Architecture principles that are in place, along with requirements from your security team.

Capacity requirements for each layer in the medallion architecture

Another point to consider is the capacity requirements for each layer.

For instance, at the time of writing you need an F64 Fabric capacity or above to ingest data sources with either trusted workspaces or managed virtual networks.

With this in mind, you must ask yourself whether all the layers require these security features, and even whether they should have them.

In addition, there are other capacity considerations, including Copilot for Data Factory usage and the data sovereignty requirements for each layer.

Fabric items used for each layer

In my previous diagram I showed the different layers represented as Lakehouses.

However, you can potentially look to implement Data Warehouses for the silver or gold layers, which can be beneficial if T-SQL is required or there is a strong preference to perform in-place Warehouse restores.

Figure: Medallion architecture with Data Warehouses

If so, one important point to consider is whether you want to move your Warehouses to different workspaces in order to avoid accidental restores, even more so since it was recently announced that a UX for restores is coming soon.

Administering multiple Microsoft Fabric workspaces

When you are considering implementing multiple workspaces, you must ensure you are on top of Microsoft Fabric administration, because it does increase the complexity of your Microsoft Fabric estate.

Even more so when you need to implement separate workspaces to cater for Development, Test, Acceptance and Production (DTAP) environments, like I mentioned in a previous post. The number of workspaces you must support then multiplies: the number of layer workspaces times the number of environments.

It makes for some interesting mathematics, especially if the intention is to do it for every data product.
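
To put some numbers on that, here is a quick back-of-the-envelope sketch. It assumes the simplest case, where every data product gets the same number of layer workspaces in every environment. The `workspace_count` function and the figure of ten data products are just my illustration.

```python
def workspace_count(layer_workspaces, environments, data_products=1):
    """Workspaces to support = layer workspaces x environments x data products."""
    return layer_workspaces * environments * data_products


DTAP = 4  # Development, Test, Acceptance and Production

print(workspace_count(1, DTAP))      # one workspace for all layers: 4
print(workspace_count(3, DTAP))      # one workspace per layer: 12
print(workspace_count(3, DTAP, 10))  # the same for ten data products: 120
```

So a workspace per layer across DTAP means twelve workspaces for a single data product, and the total keeps multiplying with every data product you add.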

Final words about the great “number of workspaces for medallion architecture in Microsoft Fabric” debate

I hope my post about the great “number of workspaces for medallion architecture in Microsoft Fabric” debate has given you food for thought.

To clarify, there are a lot of things to consider when deciding on the number of workspaces to implement as part of a medallion architecture. Luckily, there are plenty of resources online to help you make the decision.

Of course, if you have any comments or queries about this post feel free to reach out to me.

Published in DP-600, Microsoft Fabric

6 Comments

  1. John Britto

    When we intend to implement the medallion architecture, we are bound to have 3 lakehouses (Bronze, Silver and Gold).

    Let’s say we keep these 3 lakehouses in 3 different workspaces for better security and governance.

    When we do the deployment to DEV, TEST and PROD, then each environment should have 3 different layers.

    Does this mean we need to have 9 different workspaces to manage?

    Or can we have DEV and QA in a single workspace organized with folders, and only have 3 different workspaces for PROD?

    • Kevin Chant

      Hi John,

      It depends on a few different factors, including the security policies in your organization, the maturity of your governance and the data you will store in your QA environment.

      For example, if your QA data is production grade data then you may need to keep them separate.

      Others have stated that they will keep the three layers in the same workspace and will only separate when absolutely required. I would check your security policies first and take things from there.

      Kind regards

      Kevin

  2. John Britto

    Thanks for the response, Kevin.

    You are right, our QA data will be production grade only.

    I have another question too, though I am unsure if this is the relevant thread.

    When we implement DevOps, I do see a lot of suggestions online as well as from Microsoft that we need to have our notebooks in a different workspace than the lakehouse or warehouse.

    This suggestion further increases the number of workspaces that we need to manage.

    How do you see this? Do we have a better way to implement DevOps without having the headache of managing multiple workspaces?

  3. Matt Weirath

    I wanted to add to the discourse, especially around Deployment Pipelines. If you want to leverage Deployment Pipelines for Dev, Test, and Prod, having fewer workspaces with more Fabric items can also make the use of Deployment Pipelines more challenging. You can easily reach a point where you have to be very careful about Selective Deployments to ensure that you don’t accidentally publish something to a higher-level environment that is not ready. This only becomes more challenging as you increase the number of developers working in a space.

    I could imagine the Deployment Pipelines process becoming very manual. You could end up tracking items on spreadsheets to determine what is ready to go to the next environment.

    Matt
