In this post I want to share the results of my initial testing of the new ESG data estate capability for sustainability reporting in Microsoft Fabric, which is one of the capabilities offered as part of the Sustainability data solutions in Microsoft Fabric.
Please note: After this post was published a new ESG metrics capability was introduced, making a significant part of this post obsolete. There is now an updated post that covers a review of the ESG metrics capability in Microsoft Fabric after the Microsoft Ignite announcement.
To help with some jargon here, ESG stands for Environmental, Social and Governance, which are three of the main topics that companies need to cover when reporting on sustainability.
Sustainability is an important topic at the moment, especially in Europe, where a lot of companies are due to report on sustainability metrics because of the Corporate Sustainability Reporting Directive (CSRD).
I am really excited to deliver this post, because sustainability is something that was taught to me in my earlier years. Now I get to cover how you can utilize Microsoft Fabric to report on it in this post.
By the end of this post, you will know the results of my initial testing of the new ESG data estate capability in Microsoft Fabric. Along the way I share plenty of links.
One key point I want to highlight is that the Sustainability data solutions in Microsoft Fabric are currently in preview and are subject to change. Another point I want to highlight is that I ran everything below on a trial capacity in my own Microsoft Fabric environment.
How I found out the ESG data estate capability was finally available in North and West Europe
In reality, the ability to deploy the Sustainability data solutions within Microsoft Fabric was available in some Azure regions before March this year.
However, it only became available in the North and West Europe Azure regions during the first week of March, which surprised me because sustainability is such an important topic in Europe.
Prior to March I was able to change the tenant settings for Sustainability solutions in the Admin portal. Plus, I deployed the Sustainability data solution to Microsoft Fabric through the Microsoft Cloud Solution Centre.
However, when I went into Microsoft Fabric, I could not see the Industry Solutions option at that time, which is when I checked the Azure region availability for Sustainability data solutions and discovered that it was not yet available.
In fact, I found out it was available just as I was about to present a session for the Microsoft Fabric User Group Denmark last month.
Deploying the new ESG data estate capability in Microsoft Fabric
When I discovered that the Industry Solutions were finally available in Microsoft Fabric, I started the process to deploy the new ESG data estate capability.
I created a new workspace and then selected the “Industry Solutions” experience in the bottom-left hand corner. I then selected “Sustainability solutions” as below.
I then provided the name ESG_data_estate for my solution.
One key point about the solution name is that it will become the prefix for all your items that are deployed as part of the solution. I show this later in this post.
After providing a name I then selected the ESG data estate capability from the provided options. As you can see, there are other options available, including the Microsoft Azure emissions insights capability, which allows you to visualize your Azure emissions data.
After selecting ESG data estate a screen appears explaining the purpose of the solution.
I like this screen because, as well as describing the purpose of the solution, it shows a nice diagram of how the data should ideally flow between the Lakehouses on the right-hand side.
The source data is first ingested into a raw Lakehouse. From there, the raw data is transformed into Delta tables in a processed Lakehouse, which are then used to create metrics in a computed Lakehouse.
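To make that flow concrete, here is a minimal Python sketch of the three-layer pattern using plain dictionaries in place of Lakehouse tables. The column names and values are made up for illustration; the real notebooks work with the ESG data model and Delta tables.

```python
def ingest(raw_rows):
    """Land the source data unchanged in the 'ingested' (raw) layer."""
    return list(raw_rows)

def process(ingested_rows):
    """Clean and type the raw rows into 'processed' records."""
    return [
        {"site": r["site"].strip(), "co2_tonnes": float(r["co2_tonnes"])}
        for r in ingested_rows
    ]

def compute(processed_rows):
    """Aggregate the processed records into a 'computed' metric."""
    return {"total_co2_tonnes": sum(r["co2_tonnes"] for r in processed_rows)}

# Made-up source rows standing in for ingested files
raw = [
    {"site": " Plant A ", "co2_tonnes": "12.5"},
    {"site": "Plant B", "co2_tonnes": "7.5"},
]
metric = compute(process(ingest(raw)))
print(metric)  # {'total_co2_tonnes': 20.0}
```

Each stage only reads from the layer before it, which is the same direction of travel the diagram shows.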
Interesting observation about ESG data estate Lakehouses
One interesting observation is the naming convention for the three main Lakehouses involved. Instead of bronze, silver and gold the capability names them ingested, processed and computed.
As you can see, the screen above also shows that all the items contain the solution prefix, like I mentioned earlier. Just be aware that if you give your capability a long name it will be harder to navigate the objects within the workspace.
With this in mind, when I reference items in the rest of this post I will refer to their prefix as ‘{Prefix}’ for easier reading.
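To illustrate the prefixing, here is a quick sketch assuming a solution named ESG_data_estate. The suffixes are item names mentioned elsewhere in this post; the full list of deployed items is longer.

```python
# Illustrative only: the solution name becomes the prefix on every deployed item.
prefix = "ESG_data_estate"
suffixes = [
    "ProcessedESGData_LH",
    "LoadDemoDataInProcessedESGDataTables_INTB",
]
item_names = [f"{prefix}_{suffix}" for suffix in suffixes]
for name in item_names:
    print(name)
```

As you can see, a long solution name quickly makes these item names unwieldy.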
Anyway, I then clicked on the green ‘Deploy to workspace’ button. Once the deployment had completed, I went into the workspace.
Creating demo data in the ESG data workspace
I wanted to install the ESG data estate demo data within the workspace first. So, I opened up the ‘{Prefix}_LoadDemoDataInProcessedESGDataTables_INTB’ notebook (the name will vary depending on the name of your solution).
I checked that all the cells in the notebook had been pre-populated with the correct information, and that the relevant Lakehouses had already been added to the notebook. I then clicked on the ‘Run all’ button to run all the cells in the notebook.
After all the cells had completed, I went to the ‘{Prefix}_ProcessedESGData_LH’ Lakehouse and confirmed that the tables containing demo data had been created.
This is a good way to get some demo data. However, I must admit that it would be nice to have an alternative demo where data first appears in the raw Lakehouse, to help everybody follow the full flow of data.
Creating the metrics in the ESG data workspace
After checking the tables were there, I then ran the notebooks that create the tables containing the ESG metrics, in order to generate the required tables in the computed Lakehouse.
You can read about these notebooks in detail in the ‘Compute analytical datasets and metrics‘ section of the ESG data estate documentation online.
All the default notebooks follow a similar pattern to create metrics for each area. First, one notebook creates the Fact tables in the Files section of the computed Lakehouse by extracting data stored in tables in the processed Lakehouse.
Afterwards, another notebook creates various metric tables as Delta tables in the computed Lakehouse by extracting data from the Fact tables that were created by the previous notebook. Most of the tables are named after the metrics that are required for CSRD reports.
Like in the below example that creates metrics for Greenhouse Gas (GHG) emissions. There, I first created the Fact tables in the Files location with the ‘{Prefix}_CreateEmissionsFactTables_INTB’ notebook, and then the metrics tables in the Tables location with the ‘{Prefix}_GenerateEmissionsMetricTables_INTB’ notebook.
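As a rough illustration of that Fact-to-metric pattern, here is a small Python sketch that rolls hypothetical emissions fact rows up into a per-scope metric. The column names and scope values are assumptions for illustration only, not the actual ESG data model schema.

```python
from collections import defaultdict

# Hypothetical fact rows; the real Fact tables use the ESG data model schema.
fact_emissions = [
    {"scope": "Scope 1", "co2e_tonnes": 10.0},
    {"scope": "Scope 2", "co2e_tonnes": 4.0},
    {"scope": "Scope 1", "co2e_tonnes": 6.0},
]

def emissions_by_scope(facts):
    """Aggregate the fact rows into a per-scope metric table."""
    totals = defaultdict(float)
    for row in facts:
        totals[row["scope"]] += row["co2e_tonnes"]
    return dict(totals)

print(emissions_by_scope(fact_emissions))  # {'Scope 1': 16.0, 'Scope 2': 4.0}
```

The real metric notebooks do considerably more work, but the shape is the same: read Facts, aggregate, write a metric table.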
Sample CSRD report
After creating all the documented Facts and Metrics I then decided to test the sample CSRD Metrics report that comes with the capability. You can read about it in more detail in the ‘Visualize data and analytics‘ documentation online.
As you can see the report worked as expected.
Of course, this is just a sample report which you can customize to suit your needs.
Creating empty tables
Creating the populated demo tables is all well and good, but what if you want to create empty tables, so that you can map source data in the raw Lakehouse to Delta tables in the processed Lakehouse?
Well, this capability comes with a GenerateESGTables notebook, which allows you to create a group of empty ESG model tables based on one of the below business areas.
- GHG Emissions
- Waste Sustainability
- Water Sustainability
- Environmental Social and Governance
- Circularity
- Business Management
- Party
To identify the schema of a certain table, or to find which business area it belongs to, you can view the ‘ESGSchema.json’ file that exists in the ‘{Prefix}_ConfigandDemo’ Lakehouse. Some tables can belong to multiple business areas.
For example, within the demo tables the information about learning events is not created. However, it is required for certain reporting purposes. To find the table and the business areas it belongs to, you must look it up in the ESGSchema.json file.
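To give an idea of what that lookup can look like in practice, below is a minimal Python sketch that searches a schema file for a table’s business areas. The JSON fragment is a hypothetical stand-in for ESGSchema.json; the real file’s structure, key names and business area assignments may differ.

```python
import json

# Hypothetical fragment standing in for ESGSchema.json; the real file's
# layout and the areas listed per table may differ.
schema_json = """
{
  "tables": [
    {"name": "LearningEvent", "businessAreas": ["Party", "Business Management"]},
    {"name": "GHGEmission", "businessAreas": ["GHG Emissions"]}
  ]
}
"""

def business_areas_for(schema, table_name):
    """Return the business areas a table is listed under, or an empty list."""
    for table in schema["tables"]:
        if table["name"] == table_name:
            return table["businessAreas"]
    return []

schema = json.loads(schema_json)
print(business_areas_for(schema, "LearningEvent"))
```

You could run something like this in a notebook attached to the ‘{Prefix}_ConfigandDemo’ Lakehouse instead of reading the file by eye.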
Because the LearningEvent table is in multiple business areas I chose one of them, ‘Party’, as the parameter in the GenerateESGTables notebook. As you can see in the below example.
TARGET_LAKEHOUSE = "SDS_ESGDE_ESG_TableCreation_Test_ProcessedESGData_LH"
CONFIG_DATALAKE_PATH = "abfss://xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxx@onelake.dfs.fabric.microsoft.com/xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxx"
SELECTED_BUSINESS_AREA = "Party"
Afterwards, I ran all the cells in the notebook which created all the tables that are classed as being in the Party business area for me.
In reality, there are other methods to help you identify tables in the ESG model. However, this is a method you can work with directly in Microsoft Fabric.
Final words about my initial testing of the new ESG data estate capability
I hope that me sharing the results of my initial testing of the new ESG data estate capability for sustainability reporting in Microsoft Fabric inspires some of you to look into this further.
Measuring sustainability is important, especially for companies that are affected by the Corporate Sustainability Reporting Directive (CSRD).
To learn more about it, I recommend going through the “Get started with Sustainability data solutions in Microsoft Fabric” learning path, which is available on Microsoft Learn.
Of course, if you have any comments or queries about this post feel free to reach out to me.