Allow list for Databricks Repos

In this post I want to cover the results of some interesting tests I did with the allow list for Databricks Repos.

In short, Databricks Repos is a reasonably new feature in Azure Databricks which allows you to work on a complete set of files that are stored in a Git repository.

To find out how to set up Databricks Repos in Azure Databricks you can read the Microsoft guide about Git integration with Databricks Repos. Alternatively, you can read the Databricks guide on how to set up Git integration with Databricks Repos.

You can view the allow list for Databricks Repos yourself if you have the right permissions, by going to Settings and selecting Admin Console. From there you can scroll down to the Repos section.

Repos section in Admin console

As you can see above, for the benefit of this post I have already added a link to a single Git repository in Azure DevOps to the allow list.

I tested a few different things relating to the allow list in Databricks Repos. By the end of this post, you will know the results of those tests, plus something you need to be aware of if you are using the allow list for Databricks Repos.

Testing entries in the allow list for Databricks Repos

For these tests I set up Databricks to use an existing Azure DevOps organization, which contained various repositories, including the one I had added to the allow list.

I first added the URL of one specific repository that I had created in Azure Repos within Azure DevOps. I then created a feature branch in the GUI because I think it is good practice.

Creating a feature branch in Databricks Repos

Afterwards, I created a new notebook. After adding some code in the cell, all I had to do was commit the changes to my local Git repository and push them to the repository in Azure DevOps. To do this, I clicked on the branch name and filled out the window below.

Doing a commit & push in Databricks Repos

Those with Git experience will recognize that the above GUI shows what has changed in my Git repository and invites me to do the commit and push I described above. I did, and it worked fine.

Once done, a new link appeared inviting me to create a pull request in Azure DevOps.

I must admit I had issues opening that link from the same page in Databricks Repos. However, I can still create pull requests by going directly into Azure DevOps.

Testing an Azure DevOps repository not in the allow list

I then tried to add another Git repository which was not in the allow list. It failed with the error below.

Error message when there is an entry in the Databricks Repos allow list

So, I changed the allow list permissions to only restrict Commit & Push to allowed Git repositories.

Changed Allow List permissions

This time around I was able to clone the repository in Databricks Repos. However, when I went to commit changes to a notebook I got the same error message as above.

This is good, because it means there is a way to let people clone existing repositories without being able to push updates back to the source repositories, if you want to do so.

I then changed my allow list to contain the URL of the Azure DevOps project which contained this repository, in the format “https://dev.azure.com/{my Azure DevOps organization name}/{my Azure DevOps project}”.

Afterwards, I waited a while for the update to take effect. However, I got a bit impatient, so I decided to remove the reference to the Azure DevOps project and then add it back again. I could then update the repository in Azure DevOps.

This highlights a key point. According to the Databricks documentation on how allow lists restrict remote repo usage, it can take up to fifteen minutes for changes in the Admin Console to take effect. Something to bear in mind if you want to make any changes.
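
The fact that a project-level URL also covered a repository inside that project suggests that allow list entries behave like URL prefixes. Below is a minimal sketch of that idea, assuming plain prefix matching. The `is_allowed` function and the example URLs are hypothetical, and this is not Databricks' actual implementation.

```shell
# Hypothetical helper, for illustration only.
# Usage: is_allowed <repo-url> <allowed-prefix>...
# Succeeds (exit status 0) when the URL starts with any of the given prefixes.
is_allowed() {
  url="$1"
  shift
  for prefix in "$@"; do
    case "$url" in
      "$prefix"*) return 0 ;;
    esac
  done
  return 1
}

# Example values (assumptions, not real repositories)
project="https://dev.azure.com/my-org/my-project"

is_allowed "$project/_git/my-repo" "$project" && echo "allowed"
is_allowed "https://github.com/someone/other-repo" "$project" || echo "blocked"
```

If matching really does work this way, an entry without a trailing slash would also match a sibling project whose name merely starts with the same text, so it may be worth ending entries with a `/`.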

Web terminal workaround for the allow list

Something that I want to stress in this post is that the allow list only appears to work for the Databricks Repos feature. You can currently still update repositories that are not in the scope of the allow list by other means.

For example, a while ago I published a post about my first impressions of the web terminal in Azure Databricks. Whilst writing that post I discovered that I could use Git in the web terminal.

Using Git in the Databricks web terminal

So, I decided to test connecting to a separate Git repository in GitHub whilst the allow list that I created in Databricks Repos was in place.

I was able to clone the repository because I had set up the right credentials to access it.

From there I was also able to run the commands below to upload a new file into a GitHub repository, even though the allow list for Databricks Repos only listed an Azure DevOps project.

# Create an empty test file
touch howdy.txt
# Stage it, commit it locally, then push to the GitHub remote
git add .
git commit -m "Added file to test"
git push origin

I decided to include the above code in case anybody wants to test this for themselves. You can see the result in the GitHub repository below.

New file in GitHub even though GitHub is not in the allow list

So, bear in mind that you must take additional steps if you want to limit where your Databricks code can end up. For example, you might want to check your Databricks permissions or configure a firewall somewhere as an extra line of defence.
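
As one small extra safeguard inside the web terminal, a Git pre-push hook can refuse pushes to remotes outside an approved prefix. The sketch below is an example under stated assumptions rather than a recommended control: `ALLOWED_PREFIX` and `check_remote` are hypothetical names. Hooks live in the local clone and can be skipped with `git push --no-verify`, so this only guards against accidents and does not replace permissions or a firewall.

```shell
#!/bin/sh
# Sketch of a .git/hooks/pre-push hook.
# Git calls the hook with the remote name as $1 and the remote URL as $2;
# a non-zero exit status aborts the push.

# Hypothetical value: replace with your own approved location.
ALLOWED_PREFIX="https://dev.azure.com/my-org/my-project/"

# Succeeds only when the given URL starts with ALLOWED_PREFIX.
check_remote() {
  case "$1" in
    "$ALLOWED_PREFIX"*) return 0 ;;
    *)
      echo "Push to $1 blocked: not under $ALLOWED_PREFIX" >&2
      return 1
      ;;
  esac
}

# Only run the check when Git has supplied the remote name and URL.
if [ "$#" -ge 2 ]; then
  check_remote "$2" || exit 1
fi
```

To try it, save the script as `.git/hooks/pre-push` in the cloned repository and make it executable with `chmod +x`.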

Initial thoughts about allow list for Databricks Repos

The allow list is a very useful way to limit which Git repositories you can access with the Databricks Repos feature. Just bear in mind that currently it appears to only limit access to repositories within the Databricks Repos feature itself.

For instance, even with the allow list enabled you can still access other Git repositories hosted elsewhere by using the web terminal. Of course, you can disable the web terminal if needed.

I want to highlight a couple of other interesting observations I noted along the way during these tests as well.

First of all, a new pop-up window appeared notifying me about new visualizations in the Notebook, which I suspect some of you Databricks enthusiasts will be interested in learning more about.

Secondly, if you do not initialize a Git repository before using Databricks Repos the default branch name appears to be Master.

Final words

I hope my post about testing the allow list for Databricks Repos has given you an interesting insight, because the allow list can be useful.

Plus, it highlights some key points, like the fact that even if you create an allow list you can currently still access other Git repositories using other methods. So, if you are looking to limit access to other Git repositories completely within Azure Databricks you must take further steps.

Of course, if you have any comments or queries about this post feel free to reach out to me.
