
Create Data Pipeline tests with GitHub Copilot in Visual Studio Code

Reading Time: 7 minutes

In this post I want to cover how you can create Data Pipeline tests with GitHub Copilot in Visual Studio Code, in order to test Microsoft Fabric Data Pipelines.

Ask Copilot chat and Microsoft Fabric Data Pipeline image

I want to do this post to help others realize what is possible when working with GitHub Copilot in Visual Studio Code.

Plus, to highlight the fact that it can save a lot of development time. The way we work is changing, and offerings such as GitHub Copilot enable us to be more productive.

Along the way I share plenty of links.

Data Pipeline example

For the benefit of this post I worked with the Git repository that I created in a previous post, where I shared how to automate testing Microsoft Fabric Data Pipelines with YAML Pipelines.

I focus on my customized “Run Hello World” data pipeline, which contains the default “Run notebook” activity plus a copy data activity that I created. The copy data activity downloads the sample NYC Taxi data to a folder in the Lakehouse, specifying the created parameter as the folder location.

Data Pipeline example to show how to create Data Pipeline tests with GitHub Copilot
Data Pipeline example

If you wish to follow along with this post, you need the following:

Knowledge about the Data Factory Testing Framework will be beneficial if you intend to create your own Data Pipeline tests.
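If you are new to the framework, the general idea is that it can load the metadata for a Microsoft Fabric workspace and expose each Data Pipeline as an object that you can write pytest assertions against. Below is a minimal sketch based on my reading of the framework’s documentation; the root folder path is a placeholder for my repository layout, and the attribute names on the pipeline and activity objects may differ between versions, so treat it as a starting point rather than a finished test.

from data_factory_testing_framework import TestFramework, TestFrameworkType

def test_pipeline_contains_copy_activity():
    # Point the framework at the folder that holds the workspace metadata
    # (the folder containing the *.DataPipeline folders). Placeholder path.
    test_framework = TestFramework(
        framework_type=TestFrameworkType.Fabric,
        root_folder_path="workspace",
    )

    # Load the Data Pipeline by its name and find the copy data activity by name.
    pipeline = test_framework.get_pipeline_by_name("Run Hello World")
    activity = next(a for a in pipeline.activities if a.name == "Copy sample data")

    # Basic sanity check: the activity exists and is a Copy activity.
    assert activity is not None
    assert activity.type == "Copy"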

Create single Data Pipeline test with GitHub Copilot in Visual Studio Code

To create Data Pipeline tests with GitHub Copilot I first opened up the Git repository that contains the metadata for the workspace in Visual Studio Code. I then navigated to the existing test file that I had created for my previous post.

Once I had opened the file I clicked on the GitHub Copilot icon at the top of Visual Studio Code and selected Open Chat.

Selecting Open Chat option to open Copilot
Selecting Open Chat option

After doing this the “Ask Copilot” window appeared.

Ask Copilot in Visual Studio Code
Ask Copilot in Visual Studio Code

First of all, I needed to provide more context for what I wanted. So, I clicked on “Add context” in the chat and selected the “pipeline-content.json” metadata file stored in my data pipeline folder.

This is where my prompt engineering skills were put to the test, because I then asked Copilot the below.

Can you create a new test file written in Python that is similar to the current file which uses the data_factory_testing_framework Python library to check that the destination type in the “Copy sample data” is a Lakehouse?

Below is the response I got, which I opened in a new window for visibility.

Copilot response that contains a Data Pipeline test
Copilot response

As you can see, as well as suggesting the code it also provided me with the option to create the “test_lakehouse_destination.py” file. So, I clicked on the option to create the new file and saved it.
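The screenshot above shows the code that Copilot suggested. To give an idea of the shape of such a test, here is a minimal sketch that performs a similar check by reading the pipeline metadata directly with standard Python rather than the framework. The metadata path and the JSON keys used for the sink (typeProperties, sink, datasetSettings, type) are assumptions based on my repository, so you may need to adjust them for your own pipelines.

import json
from pathlib import Path

# Path to the pipeline metadata in the repository (assumed layout).
PIPELINE_JSON = Path("workspace") / "Run Hello World.DataPipeline" / "pipeline-content.json"

def test_copy_sample_data_destination_is_lakehouse():
    pipeline = json.loads(PIPELINE_JSON.read_text(encoding="utf-8"))

    # Find the copy data activity by its name.
    activities = pipeline.get("properties", {}).get("activities", [])
    copy_activity = next(a for a in activities if a.get("name") == "Copy sample data")

    # The destination (sink) settings sit under typeProperties; the exact key
    # names are assumptions, so check them against your own pipeline-content.json.
    sink = copy_activity.get("typeProperties", {}).get("sink", {})
    destination_type = sink.get("datasetSettings", {}).get("type", "")

    assert destination_type == "Lakehouse", (
        f"Expected a Lakehouse destination but found '{destination_type}'."
    )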

To keep the test simple, I went into my YAML pipeline file and updated the relevant PowerShell task to work with the new test file, as below.

- task: PowerShell@2
  displayName: 'Run sample pipeline test'
  inputs:
    targetType: 'inline'
    script: |
      pytest Tests\test_lakehouse_destination.py --junitxml=simple-pipeline-test-results.xml
    pwsh: true

I then saved the changes to the file for my YAML pipeline and committed all the updates to my local Git repository. When prompted, I synchronized the change to my Azure DevOps repository.

Data Pipeline test results in Azure DevOps for simple test

When I ran the YAML pipeline in Azure DevOps, the three stages completed like in the below example.

Completed YAML Pipeline in Azure DevOps showing a passed Data Pipeline test
Completed YAML Pipeline in Azure DevOps

I clicked on the “Tests Data Pipeline” stage and changed the filter to “Passed” which allowed me to see more information.

Confirming Lakehouse test had passed thanks to Copilot in Visual Studio Code
Confirming Lakehouse test had passed

Of course, my testing would not be complete without testing for failure. So, I changed the destination in the Data Pipeline to be a Data Warehouse instead.

Doing this caused my test to fail and my YAML pipeline to stop at my testing stage. However, because I had configured the condition for the publish task to always run, I was still able to view the test results.

Viewing information about failed test
Viewing information about failed test

Multiple Data Pipeline tests with GitHub Copilot in Visual Studio Code

GitHub Copilot was able to create a test file for one pipeline with ease. But what about a real-life scenario for multiple pipelines?

I decided to create nine copies of my Data Pipeline in my workspace to test another real-world scenario.

The scenario was to check that the copy data activity in all of my Data Pipelines was activated, because in the real world you want to know as soon as possible if an activity has been inadvertently deactivated. This is especially true when working in a workspace with multiple Data Pipelines.

So, I created all the Data Pipelines in Microsoft Fabric, committed the change back to my Git repository and synchronized the changes locally in Visual Studio Code.

I then went back to Ask Copilot in Visual Studio Code whilst I had my Python test file opened. First, I clicked on “Add Context” again and added one of the “pipeline-content.json” files stored in one of the Data Pipeline folders.

I was then ready to ask Copilot another question. Admittedly it took a few attempts to get a test that worked properly. Finally, I got a working script by asking the below question:

Can you create a new test file written in Python that first loops through to identify all the Data Pipelines in my workspace folder and then uses the data_factory_testing_framework Python library to loop through all the Data Pipelines in my workspace folder by name and check that all the activities that have a type of “Copy” do not have a state value of “Inactive”?

Copilot returned the below for me.

It provided the below explanation as well.

Certainly! Here’s a Python test file that:

  1. Discovers all Data Pipeline folders in your workspace directory (looks for folders ending with .DataPipeline).
  2. Loads each pipeline’s JSON (pipeline-content.json).
  3. Checks all activities of type Copy to ensure their state is not "Inactive".
  4. Uses pytest for test structure and data_factory_testing_framework for any helpers (though the main logic is standard Python).

Place this file in your Tests folder (e.g., Tests/test_pipeline_copy_activity_state_not_inactive.py):

import os
import json
import pytest
from data_factory_testing_framework import DataPipelineTestCase  # Optional, for future extension

WORKSPACE_DIR = os.path.join(os.path.dirname(os.path.dirname(__file__)), "workspace")

def find_pipeline_jsons(workspace_dir):
    """Yield (pipeline_name, pipeline_json_path) for all DataPipeline folders."""
    for entry in os.listdir(workspace_dir):
        if entry.endswith(".DataPipeline"):
            pipeline_dir = os.path.join(workspace_dir, entry)
            pipeline_json = os.path.join(pipeline_dir, "pipeline-content.json")
            if os.path.isfile(pipeline_json):
                yield entry, pipeline_json

@pytest.mark.parametrize("pipeline_name,pipeline_json_path", list(find_pipeline_jsons(WORKSPACE_DIR)))
def test_copy_activities_not_inactive(pipeline_name, pipeline_json_path):
    with open(pipeline_json_path, "r", encoding="utf-8") as f:
        pipeline = json.load(f)
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") == "Copy":
            state = activity.get("state", None)
            assert state != "Inactive", (
                f"Pipeline '{pipeline_name}' has a Copy activity '{activity.get('name')}' with state 'Inactive'."
            )

How it works:

  • The test will automatically run for each pipeline in your workspace.
  • It will fail if any Copy activity has "state": "Inactive".

Requirements:

  • pytest must be installed.
  • data_factory_testing_framework is imported for future extensibility, but not strictly required for this check.

Let me know if you want to extend this to use more features from data_factory_testing_framework!

As you can see, Copilot clearly stated that the Data Factory Testing Framework was not required for this particular test. So, I removed the reference to it in the suggested code.
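With that reference removed, the top of the test file is left with just the standard library and pytest imports:

import os
import json
import pytest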

I added the script to my Git repository and updated the file for my YAML pipeline to work with the new test instead.

- task: PowerShell@2
  displayName: 'Run sample pipeline test'
  inputs:
    targetType: 'inline'
    script: |
      pytest Tests\test_pipeline_copy_activity_state.py --junitxml=test-results-all-pipelines.xml
    pwsh: true

- task: PublishTestResults@2
  displayName: 'Publish Pipeline Test Results'
  inputs:
    testResultsFormat: 'JUnit'
    testResultsFiles: '**/test-results-*.xml'
    testRunTitle: 'Data Pipeline Tests'
  condition: always()

Data Pipeline test results in Azure DevOps for multiple Data Pipelines

When I ran the YAML pipeline in Azure DevOps, the three stages completed. I clicked on the “Tests Data Pipeline” stage and changed the filter to “Passed” to allow me to see more information about the passed tests.

Test results when all copy activities are active, showing that the Data Pipeline Tests created with Copilot in Visual Studio Code works
Test results when all copy activities are active

Of course, my testing would not be complete without testing for failure. So I deactivated the Copy activity in two of the Data Pipelines and then ran the YAML pipeline in Azure DevOps again.

This time the pipeline failed. I clicked on the “Tests Data Pipeline” stage and changed the filter to “Failed” to allow me to see more information about the failed tests.

Test results when two copy activities are deactivated, showing that the Data Pipeline Tests created with Copilot in Visual Studio Code works
Test results when two copy activities are deactivated

Final words

I hope that this post on how to create Data Pipeline tests with GitHub Copilot in Visual Studio Code inspires some of you to think about the testing possibilities with Copilot.

One thing I must stress is that you need to be good at prompt engineering for more complex scenarios. Plus, it can take a few attempts to get the desired results.

Luckily, there is a Copilot Chat Cookbook available to help. A special thanks to Thomas Thornton for sharing the details about that just after this post was published.

Of course, if you have any comments or queries about this post feel free to reach out to me.

Published in Copilot, Microsoft Fabric
