My T-SQL Tuesday contribution for this month is about the SQL Server data type bit versus all other data types. Which is hosted this month by Brent Ozar.
Brent has invited us all to talk about our favourite data types (or least of). In reality, being a regular contributor, I’m expecting a lot more contributions than usual this month. You can find out more about the invite by clicking here or on the image below.
I decided to tweak it a bit for this post, to provide a humorous comparison between the bit data type and others that are available in SQL Server. In reality, this won’t cover every single one.
By the end of this post, you will some see pitfalls to using certain data types and some tips on how to avoid them. Plus, you will find out which data type is my least favourite to use.
You can verify some of the facts relating to these data types in the official documentation here. It’ll be interesting to see if anybody clicks on that link after our host for this month stated that nobody reads the documentation for them.
Let’s start with the other popular numeric type which is int. Int is good and can bring significant performance benefits when used as part of an indexing strategy.
For example, used as a primary key in one table and a foreign key in another to improve the performance of queries that contain a join.
However, it’s fair to say more that there’s been a lot more requests to convert int to bigint then for bit to be changed. I’ve never had a request to convert bit for this reason, so bit wins here.
Tinyint and smallint
Let’s carry on with numeric datatypes. Both tinyint and smallint both have their potential uses. In reality though, most people stick with int “just in case”. I understand it to a degree with tinyint, but surely there’s more user cases for values that will never reach 32,767?
Both are often overlooked when data profiling takes place. Whereas potential issues with a column that probably should be a bit data type is usually picked up straight away.
With this in mind, bit wins here.
Bigint on the other hand is the data type that handles a large range of numbers. In reality, this can be great. Apart from when it’s used for columns that rarely get into four figures. Causing your indexes to be larger than they need to be and potentially causing performance issues.
Luckily, this can be identified by data profiling. However, the bigint data type can still be misused.
In reality, the opposite of what has happened with the int data type can happen with bigint. For instance, an identity column has been declared as a bigint when in reality could be an int or even a smallint.
Again, never happens with bit data type so bit wins here.
To be honest, these have their uses for calculating large values as long as you don’t mind losing some precision. Whereas with a bit value 1 will always be 1.
So, bit clearly wins here because it’s consistent.
Character data types
Now this is where things can get interesting. Again, I should point out that sometimes the char data type is over specified with a higher range of characters than required. Obviously, an issue that never happens with the bit data type.
However, where things get really interesting is when varchar comes into play.
Variable data strings can be great. Allowing your character strings to change in size. With that being said, things some would consider strange have been done with them as well.
For a start, varchar(1). Let’s face it, at one stage or another a lot of us have seen this in use. In fact, I expect others to mention this today as well. Because it is very common.
What will surprise me is if anybody mentions the below, which believe me is perfectly possible.
- First the varchar(1) specified and set to null.
- Followed by an ISNULL function to set the same column to 0.
In reality, these things do end up in code for various reasons. It makes the case for doing peer reviews or pair programming a lot stronger. Because another pair of eyes can identify things like this being done.
In addition, this can also be identified is when pull requests are done in source control. For example, when a pull request is done to merge work into another branch in Azure DevOps. So, if you’re trying to justify using source control to your colleagues you can use this example.
I’m sure this has happened with bit as well. But it’s far more amusing looking back at it happening with something that should really be for more characters, so bit wins here.
It does make me wonder if there should be some sort of warning when people use varchar(1). For example, asking if they want to reconsider using it.
Of course, varchar(max) can be useful for storing LOB data in SQL Server. Unless somebody decides to misuse it for some reason.
I know some people are keen to use it for XML data, but we do have XML datatypes in SQL Server for just that reason. Just for this reason alone the bit data type wins here.
Unicode character strings
Unicode characters can have a lot of uses. However, people have issues when calculating the amount of space that they use.
For this sole reason alone, bit wins here.
Data and time data types
Data and time data types can be useful. Even more so with the newer data types.
Due to the fact that I have had to cast and convert issues in Europe in the past, bit wins here.
XML data types
XML data types were first introduced in SQL Server 2005. In fact, they were a big part of the SQL Server 2005 exams. In addition, you can use special XML indexes with them, which I discussed in an old post here.
However, they do have a slight popularity issue these days. Even ore so with the advances made with other semi-structured offerings.
With this in mind, bit wins here.
First introduced with SQL Server 2000 and I’ve yet to find a legitimate use for the sql_variant data type in any SQL Server database I have worked on. It’s has appeared in a small number of databases that I have supported. However, that’s about it.
Since that’s all I can say about it, suffice to say that bit wins here.
Lack of breadth
Various other SQL Server data types are only supported in Azure SQL Database and various versions of SQL Server. For example, hierarchyid and spatial data types.
Whereas the bit data type is also supported in other services as well. For instance, Azure Synapse Analytics. Making it easier to move data between them.
In reality, bit is a lot easier to declare then hierarchyid as well. With this in mind, bit wins here as well.
So, as you can see in this slightly biased post the bit data type wins here. Of course, in reality most have the data types here have their place in SQL Server apart from sql_variant.
Joking aside, I’ve just never had a legitimate use for sql_variant. If you have done feel free to share with a comment what you have used it for.
As always, if you have any queries or comments about this post feel free to reach out to me.
I laughed so hard. Nicely done.
Now I feel bad that everyone is bashing sql_variant
People are bound to have different opinions but the data types. I’ve just never had a valid use for it myself.
[…] Kevin Chant is a fan of the bit type: […]
Nicely done. Good chuckle.
In all seriousness, sql_variant is useful when you’re writing utilities that hold data that can be of different data types.
[…] Bit versus other data types by Kevin Chant – “So, as you can see in this slightly biased post the bit data type wins here.” (I laughed pretty hard throughout this one because it was not what I was expecting.) […]
I once used bit in a fact table in a data warehouse analytics context. So imagine a table where your have one row per granular item and different measures indicating if it had a status such and such. In that context these columns can only have 0 and 1 so it seemed a good fit for bit. However I ended up converting to tinyint before I loaded the data into analysis service because well you can’t do any arithmetic on bit such as summing it’s more like true or false what is meant I guess.
One drawback for bit type: if you set up an ODBC connection in MS Access to your table, the bit translates to a non-nullable yes/no field. If you have nulls, any users trying to edit a row in Access with a bit field will run into errors.
(Of course the root of the issue is having to support users editing data in Access, but changing hearts and minds is another battle and maybe a topic for a future T-SQL Tuesday!)
“It does make me wonder if there should be some sort of warning when people use varchar(1). For example, asking if they want to reconsider using it.”
THANK YOU. I’ve seen this regularly, both in our home-grown code and databases provided by vendors. Why???