In this post I want to cover one way that you can automate testing Microsoft Fabric Data Pipelines with YAML pipelines in Azure DevOps, by implementing the Data Factory Testing Framework within Azure Pipelines.
I previously shared the results of my initial tests of the Data Factory Testing Framework in the Fabric community blogs.
Now I want to show how you can automate testing Microsoft Fabric Data Pipelines in an efficient manner, so that you can run automated tests whilst performing CI/CD and identify potential issues early. Plus, show how you can view your test results in Azure DevOps.

One thing I want to make clear is that even though the example in this post is based on performing CI/CD with fabric-cicd, you can implement this level of testing with a variety of CI/CD options for Microsoft Fabric, including the Fabric CLI and PowerShell. This is because you can introduce the relevant testing tasks wherever necessary in Azure DevOps.
To manage expectations, this post shows how to perform tests with a YAML pipeline. You can see how to do it with classic pipelines in my previous post. Along the way I share plenty of links.
Sample Git repository to test Microsoft Fabric Data Pipelines with YAML Pipelines
I created a new Git repository to accompany this post called AzureDevOps-fabric-cicd-with-automated-tests.
I purposely decided to create a new repository instead of updating the AzureDevOps-fabric-cicd-sample repository that I previously shared. That way the original repository stays simple, and I can keep adding tests to this new one over time.
Within the AzureDevOps-fabric-cicd-with-automated-tests Git repository there are three different YAML files which can be used to create YAML pipelines for the below scenarios.
- fabric-cicd-demo-variables.yml – Pipeline that contains references to Azure Pipeline variables. For scenarios where all the values are constant.
- fabric-cicd-demo-wsparameters.yml – Pipeline that contains parameters that affect workspace values. Including workspace ID and items to deploy.
- fabric-cicd-demo-gblparameters.yml – Pipeline that contains global parameters. Suited for Option four in the recommended CI/CD options article by Microsoft.
Note: Even though I share three different YAML pipelines in the Git repository, this post is based only on the one that requires variables. I recommend sticking with the variables-based pipeline unless you really need to work with parameters.
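For orientation, below is a rough sketch of what the parameters approach can look like in an Azure Pipelines YAML file. The parameter names and default values here are illustrative rather than copied from the repository, so check fabric-cicd-demo-wsparameters.yml for the actual definitions.

parameters:
- name: WorkspaceId
  displayName: 'Target workspace GUID'   # illustrative name, not from the repository
  type: string
- name: ItemsInScope
  displayName: 'Item types to deploy'    # illustrative name, not from the repository
  type: string
  default: 'Notebook,Environment,Report,SemanticModel'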
One quick way to get started is to import the repository into Azure DevOps. From there, create a pipeline from an existing YAML file.
When you first run any of the YAML pipelines in Azure DevOps you may need to permit them to access resources. Just permit them and let the pipeline resume.
All of the sample Fabric items in this repository are from the original fabric-cicd GitHub repository. I changed the Power BI report to test parameterization.
My only ask is that you give the repository a star on GitHub if it proves to be helpful.
Example in this post
This post is based on a sample YAML pipeline that works entirely with variables, which can be found in the fabric-cicd-demo-variables.yml file within my sample Git repository.
For simplicity, I added the testing of the data pipeline that exists in my Microsoft Fabric development workspace as an additional stage in my YAML pipeline, so that it is easier to visualize the flow of activities.
You are more than welcome to customize where you perform your testing to cater for your needs.
I made some changes to the fabric-cicd samples for the benefit of my previous post. For completeness I list the changes below.
- I added the WithoutSchema sample Lakehouse that is included as part of the samples in the fabric-cicd GitHub repository.
- I added a new pipeline parameter called DirectoryName to the sample “Run Hello World” data pipeline, in order to dynamically change a folder location.
- In addition, I added a new copy data activity to the “Run Hello World” data pipeline, which downloads the sample NYC Taxi data to a folder in the Lakehouse, specifying the new parameter as the folder location.

Azure DevOps variable groups and environment
Before working on the below example, I first verified that the two variable groups referenced in my YAML pipeline existed in Azure DevOps. One that contained sensitive values and one that contained non-sensitive values.
For reference, the variable group that contains sensitive values should ideally link to secrets in Azure Key Vault. Alternatively, you can look to work with the Azure Key Vault task in your YAML pipeline instead.
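As a guide, below is a minimal sketch of the Azure Key Vault task. The service connection and Key Vault names are placeholders, and bear in mind that Key Vault secret names use hyphens rather than underscores.

- task: AzureKeyVault@2
  displayName: 'Fetch secrets from Key Vault'
  inputs:
    azureSubscription: 'my-service-connection'  # placeholder service connection name
    KeyVaultName: 'my-key-vault'                # placeholder Key Vault name
    SecretsFilter: '*'                          # or a comma-separated list of secret names
    RunAsPreJob: false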
In the variable group for sensitive values the below variables are in-place:
- AZURE_CLIENT_ID – Your service principal client ID.
- AZURE_CLIENT_SECRET – Your service principal secret. Note this is the secret value.
- AZURE_TENANT_ID – Your Microsoft Entra tenant ID.
Whereas the variable group for non-sensitive values contains the below:
- ItemsInScope – List of all items you want deployed. For this example, I opted for the below items in the below format:
“Notebook,Environment,Report,SemanticModel”
- ProdEnv – Name of the production environment in the context of fabric-cicd parameters. As opposed to the Azure DevOps environment.
- ProdWorkspace – GUID value of the production workspace.
- resourceUrl – URL value to get a bearer token for the specific API. In this case for “https://api.fabric.microsoft.com”.
- TestEnv – Name of the test environment in the context of fabric-cicd parameters.
- TestWorkspace – GUID value of the test workspace.
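For reference, below is roughly how the two variable groups can be referenced at the top of the YAML pipeline. The group names here are placeholders, so match them to the actual group names in your Azure DevOps project.

variables:
- group: fabric-cicd-secrets   # placeholder name for the group with sensitive values
- group: fabric-cicd-settings  # placeholder name for the group with non-sensitive values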
I already had a Production environment configured in Azure DevOps, along with an approval. One key point I must add is that the approvals and checks for deployment to an environment can take many forms.
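To show where the environment fits in, below is a minimal sketch of a deployment job that targets the Production environment, at which point any approvals and checks configured on that environment apply. The stage and job names are illustrative.

- stage: DeployToProd            # illustrative stage name
  jobs:
  - deployment: 'DeployProduction'
    displayName: 'Deploy to Production'
    environment: 'Production'    # approvals and checks on this environment gate the deployment
    strategy:
      runOnce:
        deploy:
          steps:
          - checkout: self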
Automate testing Microsoft Fabric Data Pipelines with YAML Pipelines
After performing all the necessary checks I was then ready to configure automated testing of the changes to my Microsoft Fabric Data Pipeline with a YAML pipeline.
To do this I added a new stage to my release pipeline along with a new job, as per the below code.
stages:
- stage: TestDP
  displayName: 'Test Data Pipeline'
  jobs:
  - job: 'TestDataPipeline'
    displayName: 'Test Data Pipeline'
I then added the below four tasks to my new stage.
Use Python 3.12
The first task selects the version of Python to work with via the Python version task. I had previously encountered issues when selecting Python 3.13, so opted for Python 3.12 instead.
- task: UsePythonVersion@0
  displayName: 'Use Python 3.12'
  inputs:
    versionSpec: '3.12'
Install necessary libraries
Afterwards I had to install the necessary libraries in a PowerShell task. In this task I ran the below code.
- task: PowerShell@2
  displayName: 'Install necessary libraries'
  inputs:
    targetType: 'inline'
    script: |
      pip install data-factory-testing-framework
      pip install pytest
    pwsh: true
Running the above code installs the Data Factory Testing Framework and pytest libraries, both of which are required to test Data Pipelines this way.
Run sample pipeline test
Once the libraries are installed, the next task runs the Python test script that I cover later in this post. In addition, I specify that the test results should be output to a JUnit XML file so that I can display the results in Azure DevOps.
- task: PowerShell@2
  displayName: 'Run sample pipeline test'
  inputs:
    targetType: 'inline'
    script: |
      pytest Tests\simple-pipeline-tests.py --junitxml=simple-pipeline-test-results.xml
    pwsh: true
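If you want to sanity-check the test before committing, you can run the same command locally from the root of the repository, assuming Python, the Data Factory Testing Framework and pytest are installed:

pytest Tests\simple-pipeline-tests.py --junitxml=simple-pipeline-test-results.xml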
Publish Pipeline Test Results
The last task that I configured in the new stage was a Publish Test Results v2 task, which publishes the XML output that contains the test results into Azure DevOps. I specified to look for the below file(s) with the JUnit result format.
- task: PublishTestResults@2
  displayName: 'Publish Pipeline Test Results'
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/simple-*.xml'
    failTaskOnMissingResultsFile: true
    testRunTitle: 'Sample Data Pipeline Test'
  condition: always()
I set the condition to always() so that the task always runs, even if there is a failure in the previous task.
Doing this means that when there is a test failure this task will still publish the results. However, the stage itself will fail and the YAML pipeline will not continue.
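If you want to make that behaviour explicit, a later stage can declare a dependency on the test stage, as in the sketch below. The DeployToTest stage name is illustrative.

- stage: DeployToTest     # illustrative name for the next stage in the pipeline
  dependsOn: TestDP       # the test stage defined above
  condition: succeeded()  # only runs when the tests passed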
Python script to perform Data Pipeline test
Once everything else was configured the final piece of the puzzle was the Python script to perform the testing. So I checked that the below Python script existed in the Tests folder.
import os

import pytest
from data_factory_testing_framework import TestFramework, TestFrameworkType
from data_factory_testing_framework.models import Pipeline
from data_factory_testing_framework.state import PipelineRunState, RunParameter, RunParameterType


@pytest.fixture
def test_framework(request: pytest.FixtureRequest) -> TestFramework:
    return TestFramework(
        framework_type=TestFrameworkType.Fabric,
        root_folder_path=os.path.dirname(request.fspath.dirname),
    )


@pytest.fixture
def pipeline(test_framework: TestFramework) -> Pipeline:
    return test_framework.get_pipeline_by_name("Run Hello World")


def test_directory_parameter(request: pytest.FixtureRequest, pipeline: Pipeline) -> None:
    # Arrange
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value="SampleData")
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert to check correct directory name is used
    assert (
        activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]["folderPath"].result
        == "SampleData"
    )
In reality, this script is very similar to the one in my original post on the Fabric Community blog, where I test that the output directory is set correctly. With some slight changes, as below.
- I needed to import os to work with os.path.dirname to navigate between folders.
- I had to change the name of the data pipeline to the “Run Hello World” sample.
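If you want to go further, one possible extension is to run the same assertion for several directory names with pytest parametrization. Below is a sketch that reuses the pipeline fixture from the script above; the extra directory names are purely illustrative.

import pytest

from data_factory_testing_framework.state import PipelineRunState, RunParameter, RunParameterType


# Runs the same check once per directory name, reusing the pipeline fixture defined above
@pytest.mark.parametrize("directory_name", ["SampleData", "Archive"])
def test_directory_parameter_values(pipeline, directory_name) -> None:
    # Arrange: evaluate the copy activity with the given DirectoryName parameter value
    activity = pipeline.get_activity_by_name("Copy sample data")
    state = PipelineRunState(
        parameters=[
            RunParameter(RunParameterType.Pipeline, name="DirectoryName", value=directory_name)
        ],
    )

    # Act
    activity.evaluate(state)

    # Assert: the sink folder path should match the supplied parameter value
    assert (
        activity.type_properties["sink"]["datasetSettings"]["typeProperties"]["location"]["folderPath"].result
        == directory_name
    )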
Test results to automate testing Microsoft Fabric Data Pipelines with Azure DevOps
When I ran the pipeline the three stages completed, as in the below example.

In my new “Test Data Pipeline” stage an icon appeared showing completed tests. Clicking on the stage and changing the filter to “Passed” allowed me to see more information.

Of course, my testing would not be complete without testing for failure. So, I updated my Python script so that the expected directory name in my assertion was SampleData2.
Doing this caused my test to fail and my YAML pipeline to stop at my testing stage.

However, because I had set the condition for the publish task to always(), I was still able to view the test results.

Final words about testing Microsoft Fabric Data Pipelines with YAML Pipelines
I hope that showing one way you can automate testing Microsoft Fabric Data Pipelines with YAML pipelines in Azure DevOps helps some of you get started.
Plus, I hope it has introduced a lot of you to the Data Factory Testing Framework, because there are a lot of possibilities with it.
Of course, if you have any comments or queries about this post feel free to reach out to me.