question

SQLGuy10-3060 asked ·

varchar vs nvarchar

As a general rule, should you use varchar instead of nvarchar, if you won't ever be storing foreign language characters?

sql-server-general

Hi @SQLGuy10-3060,

We have not received a response from you. Did the replies help you? If a response helped, please "Accept Answer". By doing so, it will benefit all community members who are having a similar issue. Your contribution is highly appreciated.

Best regards,
Cathy

MartinCairney-6481 answered ·

If you never need to make use of the extended characters available in UNICODE then yes, it does make sense to only use VARCHAR.

However, if you are only using characters in the UNICODE range 0-65535 then they use the same storage space as VARCHAR.

See the discussion about NCHAR and NVARCHAR

Remember also that if you use Unicode then you MUST always prefix string literals with N; otherwise you will introduce an implicit conversion and can also affect the SARGability of your predicates.
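To illustrate the implicit-conversion point, here is a minimal sketch (the table, index, and collation are hypothetical, purely for illustration; whether a seek is lost depends on the collation and on which side of the comparison gets converted):

 -- Hypothetical varchar column under a SQL Server collation
 CREATE TABLE dbo.Customers (CustomerName varchar(100) COLLATE SQL_Latin1_General_CP1_CI_AS);
 CREATE INDEX IX_Customers_Name ON dbo.Customers (CustomerName);

 -- varchar literal vs varchar column: types match, an index seek is possible
 SELECT CustomerName FROM dbo.Customers WHERE CustomerName = 'Smith';

 -- nvarchar literal vs varchar column: nvarchar has higher type precedence,
 -- so the COLUMN is implicitly converted; under a SQL Server collation this
 -- typically turns the seek into a scan
 SELECT CustomerName FROM dbo.Customers WHERE CustomerName = N'Smith';

When the column itself is nvarchar, a missing N prefix converts only the literal, which is cheap, but it risks data loss if the literal contains characters outside the database's code page.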





Martin,

I'm confused by your statement below:

"However, if you are only using characters in the UNICODE range 0-65535 then they use the same storage space as VARCHAR."

The above is, AFAIK, true if you use row compression and the data isn't stored off-row. But without row compression, nvarchar uses double the storage compared to varchar:

 USE tempdb;

 /*
 DROP TABLE IF EXISTS t4;
 DROP TABLE IF EXISTS t5;
 */

 CREATE TABLE t4(c1 int IDENTITY, c2 varchar(40));
 CREATE TABLE t5(c1 int IDENTITY, c2 nvarchar(40));

 INSERT INTO t4 SELECT TOP(50000) REPLICATE('a', 40) FROM sys.columns AS a CROSS JOIN sys.columns AS b;
 INSERT INTO t5 SELECT TOP(50000) REPLICATE('a', 40) FROM sys.columns AS a CROSS JOIN sys.columns AS b;

 EXEC sp_spaceused 't4';
 EXEC sp_spaceused 't5';

Did I misunderstand you?


Hi @TiborKaraszi - see the link to the Docs page in my post in the Remarks section

"when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair. However, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs."

There's no mention of compression there, but after reading it again, it may also depend on the collation that you have set. I haven't experimented to see what differences there are.

On the test, however, I would use variables defined as VARCHAR(x) and NVARCHAR(x) and compare the length of each for the same string. I think for the page storage you would need to dig into DBCC PAGE and see exactly what has been written to validate that test.


"when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair."

A byte-pair is two bytes per character. In varchar, each character in the ASCII range takes up one byte. For characters beyond the ASCII range, it depends on the code page for the collation. But for collations for languages using Latin, Cyrillic or Greek script that are not based on UTF-8, that is one byte for every character.

It is different if you use UTF-8, or a collation for languages like Thai or Chinese.

TiborKaraszi replied to MartinCairney-6481 ·

Sorry if I state the obvious (after Erland's post), but the key word you probably overlooked is pair. I.e. a byte pair is indeed two bytes. :-)

As for a T-SQL example not using a table, here goes:

 SELECT DATALENGTH('Microsoft SQL Server     '), DATALENGTH(N'Microsoft SQL Server     ');

Result

 ----------- -----------
 25          50



"characters in the UNICODE range 0-65535...use the same storage space as VARCHAR" is incorrect. NVARCHAR uses 2 bytes per BMP character (0-65535) and 4 bytes per Supplementary Character (65536-1114111). For details please see my post: How Many Bytes Per Character in SQL Server.

Using the N prefix is great advice, but it's mainly to avoid data loss (when Unicode chars not available in the code page associated with the current database's default collation get converted to either ? or a "best fit" mapping of something similar). The performance issue is specific to a VARCHAR column using a SQL Server collation compared to an NVARCHAR value. For details please see my post: Impact on Indexes When Mixing VARCHAR and NVARCHAR Types.
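A minimal sketch of that data-loss behavior (assuming the database's default collation uses a non-UTF-8 code page such as 1252; with a different code page or a UTF-8 collation the result differs):

 -- The Hebrew letter alef is not in code page 1252. Without the N prefix the
 -- literal is varchar, so the character is lost before it is ever stored.
 SELECT N'א' AS WithPrefix,    -- preserved (nvarchar literal)
        'א'  AS WithoutPrefix; -- returned as '?' (or a "best fit" substitute)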


Hi again Solomon. Did you also miss the word "pair" in that MS statement? I.e., "Unicode range 0-65,535, one character can be stored per each byte-pair." So unless I am out-of-the-ordinary confused, you seem to say the same thing as the documentation: that range uses two bytes.

The fact that both you and Martin (seem to have) missed that "pair" does indeed reflect that there's room for improvement in how MS phrases things, though!


Hello. No, I definitely did not miss that word ;-). I was actually responding to Martin, quoting part of this answer. However, my comment was long and I had to cut parts out to fit within the limit (which is not the 1000 that it claims to be, but closer to 985). I updated my comment earlier today to include more of the quote so it would be clearer what I was responding to. Yes, I'm saying the same thing as the documentation, except I prefer to use number of bytes for everything as it's not only clearer, but is consistent when speaking of the other options.

And yes, there are several areas of the documentation that need improvement, but only so much time in a day, and even less these days :-(

Cathyji-msft answered ·

Hi @SQLGuy10-3060,

If we know that the data to be stored in the column or variable doesn't have any Unicode characters, we can use varchar. But some experts recommend always using nvarchar, because all modern operating systems and development platforms use Unicode internally, so using nvarchar rather than varchar avoids encoding conversions every time you read from or write to the database.

The major differences between varchar and nvarchar:

  1. Nvarchar stores UNICODE data. If you have requirements to store UNICODE or multilingual data, nvarchar is the choice. Varchar stores ASCII data and should be your data type of choice for normal use.

  2. For nvarchar, when using characters defined in the Unicode range 0-65,535, one character can be stored per each byte-pair, however, in higher Unicode ranges (65,536-1,114,111) one character may use two byte-pairs. Whereas varchar only uses 1 byte.
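Point 2 is easy to verify with DATALENGTH; the emoji below is a supplementary character, stored as two byte-pairs (a surrogate pair):

 SELECT DATALENGTH(N'a')  AS BmpChar,   -- 2 bytes: one byte-pair
        DATALENGTH(N'€')  AS BmpChar2,  -- 2 bytes: still in the 0-65,535 range
        DATALENGTH(N'😀') AS SuppChar;  -- 4 bytes: two byte-pairs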

Advantages and disadvantages of varchar and nvarchar in SQL Server (see attached screenshot: 57968-screenshot-2021-01-19-105836.jpg).

By the way, starting with SQL Server 2019 (15.x), consider using a UTF-8 enabled collation to support Unicode and minimize character conversion issues.
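As a sketch of opting in (the table and column names are made up; Latin1_General_100_CI_AS_SC_UTF8 is one of the _UTF8 collations added in SQL Server 2019):

 -- A varchar column that can store all of Unicode, encoded as UTF-8
 CREATE TABLE dbo.Products
 (
     ProductName varchar(200) COLLATE Latin1_General_100_CI_AS_SC_UTF8
 );

Note that varchar(200) still means 200 bytes, not 200 characters, so non-ASCII text fits fewer characters than under a single-byte code page.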

Please refer to the links below for more information.

Difference Between Sql Server VARCHAR and NVARCHAR Data Type
What is the difference between varchar and nvarchar?
SQL Server differences of char, nchar, varchar and nvarchar data types

Best regards,
Cathy


If the response is helpful, please click "Accept Answer" and upvote it, thank you.
Note: Please follow the steps in our documentation to enable e-mail notifications if you want to receive the related email notification for this thread.




"By the way, starting with SQL Server 2019 (15.x), consider using a UTF-8 enabled collation to support Unicode and minimize character conversion issues."

I would not go this route, unless I have some very special circumstances. I strongly encourage anyone considering using UTF-8 to first read this: https://www.sqlservercentral.com/blogs/native-utf-8-support-in-sql-server-2019-savior-or-false-prophet


Hello @TiborKaraszi (and @Cathyji-msft ). Thanks for referring to my post 😺. Unfortunately, the SQL Server Central version of it is woefully outdated as their blog mirroring no longer refreshes when the original is updated. For a lot of posts that doesn't matter so much. But for this one there have been many updates as I modified it to keep up with their changes across the CTPs through RTM, etc. Please review the source post which might have a slightly different recommendation as compared to the older SQL Server Central version: Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?

They ended up improving it a bit, so it's not so bad now, but it's still best suited to only a couple of scenarios. Please see my answer below for details.



Thanks @SolomonRutzky-6202 . I didn't know you had two versions of the article out there. I do point to your article in a couple of my classes, and that is indeed your QuantumLeap link. For the above post, I googled to find your blog quickly, and missed that it wasn't the same one I point to in my classes. Good to know there are two out there (and which one to point to).

ErlandSommarskog answered ·

"As a general rule, should you use varchar instead of nvarchar, if you won't ever be storing foreign language characters?"

The problem is: can you really be sure? All of a sudden your business expands, and the need arises.

If you start with varchar and need to change later, it will be painful. UTF-8 collations can save you, but not all the way, since you still need to review column lengths.
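As a sketch of what that later change involves (hypothetical table and column names; in practice you must also drop and recreate any indexes or constraints on the column and review the N-prefix discipline in all calling code):

 -- Widening an existing varchar column to nvarchar; the table data is rewritten
 ALTER TABLE dbo.Customers
     ALTER COLUMN CustomerName nvarchar(100) NOT NULL; -- restate the original nullability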


SolomonRutzky answered ·

General

It depends mainly on:

  1. What do you consider "foreign language characters"?

  2. How guaranteed are you that this requirement won't change?


Regarding "foreign language characters", will something like emojis ever be an option? Meaning, are you thinking in terms of "no characters outside of the local alphabet and punctuation", or "alphabet and punctuation are local, but emojis might get thrown in"?

For me, I would say that "codes" usually have a stable range of characters and can safely be VARCHAR. For example, postal codes, airline confirmation codes, credit card authorization/confirmation codes, ISO country codes, etc.

But names, whether for people or places, or even products, etc, can increasingly contain "foreign language characters". Even if it's reasonable to expect local street names to contain only local language characters, if you are dealing with a list of customers and the business isn't 100% local, then you can't guarantee only local language characters. So for names, etc it is better to go with NVARCHAR.


UTF-8

UTF-8, introduced in SQL Server 2019, is a tricky option as it helps some scenarios but hurts others. In general, it makes sense to use under one of the following conditions:

  1. You are dealing with app code that submits SQL using VARCHAR and/or string literals without the upper-case N prefix (whether it is a legacy app or an app that works with one or more non-SQL Server RDBMSs that allow for Unicode in VARCHAR and string literals missing the upper-case N prefix).

  2. Your data is primarily US English characters (i.e. standard ASCII / code points 1 - 127) but you either have a small amount of data that isn't standard ASCII or there's at least potential to get non-standard ASCII data, and a significant portion of the data is in NVARCHAR(MAX) columns (i.e. can't benefit from Unicode Compression that is used in Row Compression but doesn't work on NVARCHAR(MAX) ).

For full details, please see my post: Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?


Bytes per Character

Now, how many bytes are used per character is also tricky as it depends on the datatype and collation. In general:

  • VARCHAR can be either 1 or 2, though when using a UTF-8 collation it can be anywhere from 1 to 4.

  • NVARCHAR can be either 2 or 4.

For full details, please see my post: How Many Bytes Per Character in SQL Server: a Completely Complete Guide
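The byte counts above can be observed with DATALENGTH (the _UTF8 collation requires SQL Server 2019 or later; the other collation shown uses code page 1252):

 SELECT
     DATALENGTH(CAST(N'é' COLLATE Latin1_General_100_CI_AS         AS varchar(10))) AS VarcharCp1252, -- 1 byte
     DATALENGTH(CAST(N'é' COLLATE Latin1_General_100_CI_AS_SC_UTF8 AS varchar(10))) AS VarcharUtf8,   -- 2 bytes
     DATALENGTH(N'é')   AS NvarcharBmp,   -- 2 bytes
     DATALENGTH(N'😀')  AS NvarcharSupp;  -- 4 bytes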





Third comment - last one for now :-) For your point 2 above, it seems you are simplifying things, perhaps saying as a blanket statement that nvarchar with row compression is preferred? I.e., without qualifying all the points in the recommendation section of your article...


Hi @TiborKaraszi . I was missing a qualification that I just added to my answer (for point #2): the real issue is NVARCHAR(MAX) since that datatype can't make use of Unicode Compression. And in fact, it only benefits from Page Compression when the values are stored in-row. It would be wonderful if Unicode Compression worked in off-row NVARCHAR data (i.e. where one would have the most data that would benefit the most from compression, as opposed to short values in NVARCHAR(1 - 4000) columns), so please vote for: unicode compression nvarchar(max).



Hi @SolomonRutzky . Voted! :-)
