GitHub hygiene for Microsoft Data Platform repositories

In this post I want to cover some of the ways I have been improving my GitHub hygiene for Microsoft Data Platform repositories by following some best practices.

To clarify, GitHub hygiene is a term that I use to describe the practice of keeping GitHub repositories healthy.

Some of you have probably noticed I have been doing this more recently. With this in mind, I thought I would share what I have been doing in this post for a couple of reasons.

First of all, to help raise awareness about some of the best practices I have been following.

Secondly, because I am interested in getting feedback from other members of the Microsoft Data Platform community about this. For example, do you also follow the same practices?

In reality, you can use the below practices to achieve good GitHub hygiene with other GitHub repositories as well, by customizing the practices for the specific content of those repositories. However, in this post I focus on my own repositories for the Microsoft Data Platform.

By the end of this post, you will know more about steps I have taken to improve the GitHub hygiene for some of my Microsoft Data Platform repositories.

GitHub hygiene

Before I started making my Microsoft Data Platform repositories public, I wanted to create a sensible naming convention for them. So, I came up with the below naming convention:

I also made sure that they contained a relevant README.md file. Like the one in my GitHub-AzureSQLDatabase repository.

Plus, I made sure that there were also helpful comments in the YAML files. For example, in the ‘azure-pipelines-Single-Serverless-Pool.yml’ file in the AzureDevOps-SynapseServerlessSQLPool repository I added comments about the variables required.

After I initially did this, I noticed that my ‘About’ section on the right-hand side looked a bit empty. Like in the below example for a new repository.

An empty About section

With this in mind, I started adding descriptions and topics to them. You can see this in the below example, which is based on my popular AzureDevOps-SynapseServerlessSQLPool repository. It contains a template you can use to perform CI/CD for Azure Synapse Analytics serverless SQL Pools using Azure DevOps.

Example from AzureDevOps-SynapseServerlessSQLPool repository
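
The About section can also be filled in with a script instead of through the browser. The sketch below is one possible way to do that in Python and is not taken from the repositories mentioned above. It only builds the two GitHub REST API requests (one to update the description and one to replace the topics) without sending them. The owner name, topic values, token, and the helper names `normalize_topic` and `build_about_requests` are all assumptions for illustration. The topic clean-up follows my understanding of GitHub's topic rules (lowercase letters, numbers and hyphens, at most 50 characters, at most 20 topics), so check the current documentation before relying on it.

```python
import json
import re
import urllib.request

API_ROOT = "https://api.github.com"


def normalize_topic(label: str) -> str:
    """Turn free text such as 'Azure DevOps' into a topic-style slug."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    # Assumed limit: topics are at most 50 characters long.
    return slug[:50].rstrip("-")


def build_about_requests(owner, repo, description, topics, token):
    """Build (but do not send) the description and topics requests."""
    headers = {
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    base = f"{API_ROOT}/repos/{owner}/{repo}"

    # Normalize, drop empties and duplicates, keep the original order.
    names = []
    for topic in topics:
        slug = normalize_topic(topic)
        if slug and slug not in names:
            names.append(slug)
    names = names[:20]  # Assumed limit: at most 20 topics per repository.

    describe = urllib.request.Request(
        base,
        data=json.dumps({"description": description}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    tag = urllib.request.Request(
        f"{base}/topics",
        data=json.dumps({"names": names}).encode("utf-8"),
        headers=headers,
        method="PUT",
    )
    return describe, tag
```

To apply the changes you would pass each returned request to `urllib.request.urlopen`, using a token that is allowed to administer the repository. Keeping the request building separate from the sending makes it easy to review exactly what would change first.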

In addition, I have added new labels where applicable. For example, in the AzureDevOps-serverlessSQLPoolToFabricDW repository I created a new label called microsoft-fabric, which others can use in their Microsoft Fabric repositories in GitHub.

Wiki for Microsoft Data Platform repositories

Over time I decided to add a Wiki to some of the more popular repositories to provide more details about how to use them. You can see an example in the Wiki section of the GitHub repository that contains a template to perform CI/CD for serverless SQL Pools using Azure DevOps.

In addition, I started to enable Discussions on some of them. Which gave me the opportunity to add things like announcements.

GitHub also hinted that I should add a license to them, even though I had already added text stating that people can use them but that I am not responsible for how they use them. GitHub provides a guide on how to add a license to a repository. In the end I opted to add an MIT license to them.

However, if you find them useful it will make me smile even more if you give them a star. You can do this by clicking on the star button in the top right-hand corner. Plus, I would really appreciate it if you credited the source if you use any of them for your posts or sessions.

Star button

GitHub Actions

Another habit I have started getting into is updating the version numbers of GitHub Actions when sharing new workflows, because I think it is good practice to keep them updated with the latest versions.

For example, the workflow I mention in my post on how to keep your Azure Synapse secrets secret in GitHub has been updated to use the latest version of the sql-Action GitHub Action.
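
If you have a lot of workflows, listing which action versions they reference can make this habit easier to keep up. The sketch below is one possible approach in Python, not a tool used for the repositories in this post. It uses a simple regular expression instead of a full YAML parser, so treat it as a rough check. The sample workflow, its version numbers, and the `latest_known` mapping are placeholders that you would replace with values you have checked yourself.

```python
import re

# Matches lines such as "- uses: owner/action@v1" or "uses: owner/action@v1".
USES_PATTERN = re.compile(
    r"^\s*-?\s*uses:\s*['\"]?([^\s'\"#]+)", re.MULTILINE
)


def list_action_refs(workflow_text):
    """Return (action, ref) pairs for every remote action in a workflow."""
    refs = []
    for match in USES_PATTERN.finditer(workflow_text):
        target = match.group(1)
        # Skip local actions and container images, which have no action version.
        if target.startswith("./") or target.startswith("docker://"):
            continue
        name, _, ref = target.partition("@")
        refs.append((name, ref))
    return refs


def find_outdated(refs, latest_known):
    """Compare refs against versions the reader has looked up by hand."""
    return [
        (name, ref, latest_known[name])
        for name, ref in refs
        if name in latest_known and ref != latest_known[name]
    ]


# Placeholder workflow; the version numbers here are illustrative only.
SAMPLE_WORKFLOW = """\
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Deploy database
        uses: azure/sql-action@v2   # placeholder version
      - uses: ./local-action
"""
```

Calling `find_outdated(list_action_refs(SAMPLE_WORKFLOW), {"actions/checkout": "v4"})` would flag the checkout step, assuming `v4` is the version you want to be on. The script does not look up the latest versions itself, so the mapping stays under your control.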

Community Standards

After doing all of this I was curious about how good my GitHub hygiene was compared to recommended community standards.

With this in mind, I created a new public repository and clicked on the ‘Insights’ tab. From there, I clicked on ‘Community Standards’. What I got back was the below:

Community Standards

It was good to see that I had already done some of the items in this checklist. I must admit that I was a bit surprised that there was not an option to create a Wiki on the checklist, because I personally think that a Wiki makes for better GitHub hygiene.

I do recommend looking at the Code of Conduct options as well. You can see a good version of this in the dbatools Code of Conduct.
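
The same checklist can also be read from a script, which is handy if you want to check several repositories at once. The sketch below is a minimal Python example that assumes the GitHub REST API community profile endpoint returns a `files` object in which missing items are null, alongside a `description` field. The helper names and the sample data in the usage note are illustrative, and unauthenticated calls are subject to rate limits.

```python
import json
import urllib.request


def community_profile_url(owner, repo):
    """Build the URL of the community profile endpoint for a repository."""
    return f"https://api.github.com/repos/{owner}/{repo}/community/profile"


def fetch_community_profile(owner, repo):
    """Download the community profile of a public repository as a dict."""
    request = urllib.request.Request(
        community_profile_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def missing_items(profile):
    """List the checklist items that the profile reports as absent."""
    files = profile.get("files") or {}
    # Assumption: an absent file is reported as null (None after parsing).
    missing = sorted(name for name, value in files.items() if value is None)
    if not profile.get("description"):
        missing.append("description")
    return missing
```

For example, passing a profile whose `files` entry has a README and a license but null values for `code_of_conduct` and `contributing` to `missing_items` returns those two names, plus `description` if that field is empty. Note that, like the checklist on the page, this does not cover whether a Wiki exists.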

Final words about GitHub hygiene for Microsoft Data Platform repositories

I hope that this post about looking to improve my GitHub hygiene for my Microsoft Data Platform repositories has inspired some of you to do the same. Because I think that it is an important thing to do if you are sharing them with others.

Plus, I do recommend creating a Wiki if you are sharing repositories that would benefit from you providing more details.

Like I said before, I would really appreciate feedback from others who strive to do the same. Of course, if you have any other comments or queries about this post feel free to reach out to me.

Published in: Azure Synapse Analytics, GitHub, Microsoft Fabric, SQL Server
