As a part of my work, I frequently get opportunity to review database models. As I am a little more specific about the usage of data types I feel a bit restless when I see a developer using the data type bigint or varchar(max) when a tinyint or varchar(50) would have sufficed.
DB developers who argue that storage space is cheaper these days are under a wrong impression. Remember Data types are not only about storage space because if this was the case DB vendors shouldn’t have introduced various data types with various sizes at the first place.
If you care for performance in the long run choose not only appropriate data type but also define appropriate size as well.
Note: This article is applicable to SQL Server 2008, 2012 and 2014 editions.
While designing a database there are multiple parameters that needs to be taken care of and discussing about them is not in the scope of this article. From my experience I can tell you that the most basic thing that you can learn about database design is that to know a little more about the data types. It’s simple to know them and creates the foundation of your DB design, don’t ignore them.
There are at least 2 basic data type related considerations that needs to be taken care of while designing a database.
- Choose the appropriate data type.
- Choose the right size of the data type (wherever applicable).
Even if you don’t follow the above 2 rules your database will work and application will run. But you must follow it because:
- Performance: It makes a big difference when your database grows.
- Design Practice: Why unnecessarily allocate more storage when you can do it with less?
Exact-number data types : bigint or int or smallint or tinyint
There’s a general tendency with developers to use int, without giving a second thought, wherever they require exact numbers to be entered.
Don’t use bigint/int when you can do it with tinyint.
Before you start designing a table, read the software requirement document thoroughly and have fair idea of what would be stored in the columns of that table. Say for example if you are creating a table to store all the countries of the world in two columns, CountryId CountryName.
Suppose CountryId is the primary key. Which data type would be appropriate for it? I’ve seen developers who unnecessarily use int for this. I would choose tinyint. Why? Because there are indeed less than 200 countries in the world. If you want to take extra precaution to be on the safer side, i.e. if you are someone who thinks it might happen that many countries would disintegrate and number of countries may increase to a figure beyond the capacity of tinyint (up to 255) then choose smallint (up to 32,767).
For defining types like CategoryType, AddressType, DepartmentIds tinyint is an appropriate data type unless you are sure of something which might require more than 255 values.
Tip: Before choosing bigint think if int can suffice the requirement of that field. Before using int think if smallint can suffice it.
String data types: char or varchar
As per the MSDN guidelines:
- Use char when the sizes of the column data entries are consistent.
- Use varchar when the sizes of the column data entries vary considerably.
- Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
Let me describe the above cases with practical examples:
Char should be used basically for fixed length code like Currency code (always 3 digits – USD, INR, GBP etc.). US State code (AZ, CA etc), Application related code like F (for Female) or M (for Male). Do not use this for storing more than 5 characters.
Varchar is the most popular data type. If you declare a column data type as varchar(50) and store a value “CodingThis” in that column then the storage size for this value will be 10 bytes. Varchar(50) means you can store up to 50 characters only. This is not in case of Char as if you declare a column as char(50) and store the same value “CodingThis” it will occupy 50 bytes of storage!
An example follows to describe the above:
For this exercise Create a table named SizeTest with 2 columns namely:
- CharColumn with data type defined as char(10)
- VarCharColumn with data type defined as varchar(10)
I’ve entered the following values in our table named SizeTest
In the following image you can see that char will always occupy the storage space that has been defined whereas varchar will occupy the storage based on number of characters actually stored in the column.
There are developers who straightway define every column as varchar(max). It’s nothing but laziness. Varchar(max) should be used only when you’ve to store huge amount of data that may exceed 8000 characters!
Remember: you can’t create an index on a varchar(max) column.
Datetime: SmallDateTIme or Date or DateTime2 or DateTime
For all practical purposes first consider using Date or SmallDateTime. For example, if you are getting a date input from user like, Journey Date, Date of Birth, Joining Date you are not asking the user about time. So, in these kind of scenario Date is the most appropriate data type. Similarly if you are storing OrderDate, you may not require the precise timing correct up to seconds. If this is the case go for SmallDateTime. If you want a precise timing go for DateTime2. In SQL Server 2008 you can use DateTime2, it stores a date and time, takes 6-8 bytes to store and has a precision of 100 nanoseconds. I would not recommend the usage of DateTime as it might be deprecated in the future editions.
What your favorite data type related tips?