Use unicode character format to import or export data (SQL Server)

Applies to: SQL Server Azure SQL Database Azure SQL Managed Instance Azure Synapse Analytics Analytics Platform System (PDW)

Unicode character format is recommended for bulk transfer of data between multiple instances of SQL Server by using a data file that contains extended/DBCS characters. The Unicode character data format allows data to be exported from a server by using a code page that differs from the code page used by the client that is performing the operation. In such cases, use of Unicode character format has the following advantages:

  • If the source and destination data are Unicode data types, use of Unicode character format preserves all of the character data.

  • If the source and destination data are not Unicode data types, use of Unicode character format minimizes the loss of extended characters in the source data that cannot be represented at the destination.

In this Topic:
Considerations for Using Unicode Character Format
Special Considerations for Using Unicode Character Format, bcp, and a Format File
Command Options for Unicode Character Format
Example Test Conditions
 ● Sample Table
 ● Sample Non-XML Format File
Examples
 ● Using bcp and Unicode Character Format to Export Data
 ● Using bcp and Unicode Character Format to Import Data without a Format File
 ● Using bcp and Unicode Character Format to Import Data with a Non-XML Format File
 ● Using BULK INSERT and Unicode Character Format without a Format File
 ● Using BULK INSERT and Unicode Character Format with a Non-XML Format File
 ● Using OPENROWSET and Unicode Character Format with a Non-XML Format File
Related Tasks

Considerations for Using Unicode Character Format

When using Unicode character format, consider the following:

  • By default, the bcp utility separates the character-data fields with the tab character and terminates the records with the newline character. For information about how to specify alternative terminators, see Specify Field and Row Terminators (SQL Server).

  • The sql_variant data that is stored in a Unicode character-format data file operates in the same way it operates in a character-format data file, except that the data is stored as nchar instead of char data. For more information about character format, see Collation and Unicode Support.

Special Considerations for Using Unicode Character Format, bcp, and a Format File

Unicode character format data files follow the conventions for Unicode files. The first two bytes of the file are hexadecimal numbers, 0xFFFE. These bytes serve as byte-order marks (BOM), specifying whether the high-order byte is stored first or last in the file. The bcp Utility may misinterpret the BOM and cause part of your import process to fail; you may receive an error message similar as follows:

Starting copy...
SQLState = 22005, NativeError = 0
Error = [Microsoft][ODBC Driver 13 for SQL Server]Invalid character value for cast specification

The BOM may be misinterpreted under the following conditions:

  • The bcp Utility is used and the -w switch is used to indicate Unicode character

  • A format file is used

  • The first field in the data file is non-character

Consider whether any of the following workarounds may be available for your specific situation:

Command Options for Unicode Character Format

You can import Unicode character format data into a table using bcp, BULK INSERT or INSERT ... SELECT * FROM OPENROWSET(BULK...). For a bcp command or BULK INSERT statement, you can specify the data format in the statement. For an INSERT ... SELECT * FROM OPENROWSET(BULK...) statement, you must specify the data format in a format file.

Unicode character format is supported by the following command options:

Command Option Description
bcp -w Uses the Unicode character format.
BULK INSERT DATAFILETYPE ='widechar' Uses Unicode character format when bulk importing data.
OPENROWSET N/A Must use a format file

Note

Alternatively, you can specify formatting on a per-field basis in a format file. For more information, see Format Files for Importing or Exporting Data (SQL Server).

Example Test Conditions

The examples in this topic are based on the table, and format file defined below.

Sample Table

The script below creates a test database, a table named myWidechar and populates the table with some initial values. Execute the following Transact-SQL in Microsoft SQL Server Management Studio (SSMS):

CREATE DATABASE TestDatabase;
GO

USE TestDatabase;
CREATE TABLE dbo.myWidechar ( 
	PersonID smallint NOT NULL,
	FirstName nvarchar(25) NOT NULL,
	LastName nvarchar(30) NOT NULL,
	BirthDate date,
	AnnualSalary money
);

-- Populate table
INSERT TestDatabase.dbo.myWidechar
VALUES 
(1, N'ϴAnthony', N'Grosse', '02-23-1980', 65000.00),
(2, N'❤Alica', N'Fatnowna', '11-14-1963', 45000.00),
(3, N'☎Stella', N'Rossenhain', '03-02-1992', 120000.00);

-- Review Data
SELECT * FROM TestDatabase.dbo.myWidechar;

Sample Non-XML Format File

SQL Server support two types of format file: non-XML format and XML format. The non-XML format is the original format that is supported by earlier versions of SQL Server. Please review Non-XML Format Files (SQL Server) for detailed information. The following command will use the bcp utility to generate a non-xml format file, myWidechar.fmt, based on the schema of myWidechar. To use a bcp command to create a format file, specify the format argument and use nul instead of a data-file path. The format option also requires the -f option. In addition, for this example, the qualifier c is used to specify character data, and T is used to specify a trusted connection using integrated security. At a command prompt, enter the following commands:

bcp TestDatabase.dbo.myWidechar format nul -f D:\BCP\myWidechar.fmt -T -w

REM Review file
Notepad D:\BCP\myWidechar.fmt

Important

Ensure your non-XML format file ends with a carriage return\line feed. Otherwise you will likely receive the following error message:

SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 13 for SQL Server]I/O error while reading BCP format file

Examples

The examples below use the database, and format files created above.

Using bcp and Unicode Character Format to Export Data

-w switch and OUT command. Note: the data file created in this example will be used in all subsequent examples. At a command prompt, enter the following commands:

bcp TestDatabase.dbo.myWidechar OUT D:\BCP\myWidechar.bcp -T -w

REM Review results
NOTEPAD D:\BCP\myWidechar.bcp

Using bcp and Unicode Character Format to Import Data without a Format File

-w switch and IN command. At a command prompt, enter the following commands:

REM Truncate table (for testing)
SQLCMD -Q "TRUNCATE TABLE TestDatabase.dbo.myWidechar;"

REM Import data
bcp TestDatabase.dbo.myWidechar IN D:\BCP\myWidechar.bcp -T -w

REM Review results is SSMS

Using bcp and Unicode Character Format to Import Data with a Non-XML Format File

-w and -f switches and IN command. A workaround will need to be used since this example involves bcp, a format file, Unicode character, and the first data field in the data file is non-character. See Special Considerations for Using Unicode Character Format, bcp, and a Format File, above. The data file myWidechar.bcp will be altered by adding an additional record as a "dummy" record which will then be skipped with the -F 2 switch.

At a command prompt, enter the following commands and follow the modification steps:

REM Truncate table (for testing)
SQLCMD -Q "TRUNCATE TABLE TestDatabase.dbo.myWidechar;"

REM Open data file
Notepad D:\BCP\myWidechar.bcp
REM Copy first record and then paste as new first record.  This additional record is the "dummy" record.
REM Close file.

REM Import data instructing bcp to skip dummy record with the -F 2 switch.
bcp TestDatabase.dbo.myWidechar IN D:\BCP\myWidechar.bcp -f D:\BCP\myWidechar.fmt -T -F 2

REM Review results is SSMS

REM Return data file to original state for usage in other examples
bcp TestDatabase.dbo.myWidechar OUT D:\BCP\myWidechar.bcp -T -w

Using BULK INSERT and Unicode Character Format without a Format File

DATAFILETYPE argument. Execute the following Transact-SQL in Microsoft SQL Server Management Studio (SSMS):

TRUNCATE TABLE TestDatabase.dbo.myWidechar; -- for testing
BULK INSERT TestDatabase.dbo.myWidechar
	FROM 'D:\BCP\myWidechar.bcp'
	WITH (
		DATAFILETYPE = 'widechar'
		);

-- review results
SELECT * FROM TestDatabase.dbo.myWidechar;

Using BULK INSERT and Unicode Character Format with a Non-XML Format File

FORMATFILE argument. Execute the following Transact-SQL in Microsoft SQL Server Management Studio (SSMS):

TRUNCATE TABLE TestDatabase.dbo.myWidechar; -- for testing
BULK INSERT TestDatabase.dbo.myWidechar
   FROM 'D:\BCP\myWidechar.bcp'
   WITH (
		FORMATFILE = 'D:\BCP\myWidechar.fmt'
		);

-- review results
SELECT * FROM TestDatabase.dbo.myWidechar;

Using OPENROWSET and Unicode Character Format with a Non-XML Format File

FORMATFILE argument. Execute the following Transact-SQL in Microsoft SQL Server Management Studio (SSMS):

TRUNCATE TABLE TestDatabase.dbo.myWidechar;  -- for testing
INSERT INTO TestDatabase.dbo.myWidechar
	SELECT *
	FROM OPENROWSET (
		BULK 'D:\BCP\myWidechar.bcp', 
		FORMATFILE = 'D:\BCP\myWidechar.fmt'  
		) AS t1;

-- review results
SELECT * FROM TestDatabase.dbo.myWidechar;

To use data formats for bulk import or bulk export

See Also

bcp Utility
BULK INSERT (Transact-SQL)
OPENROWSET (Transact-SQL)
Data Types (Transact-SQL)
Collation and Unicode Support