INSERT INTO

Inserts new rows into a table. You specify the inserted rows by value expressions or the result of a query.

Syntax

INSERT INTO [ TABLE ] table_identifier [ partition_spec ] [ ( column_list ) ]
    { VALUES ( { value | NULL } [ , ... ] ) [ , ( ... ) ] | query }

Note

When you INSERT INTO a Delta table schema enforcement and evolution is supported. If a column’s data type cannot be safely cast to a Delta table’s data type, a runtime exception is thrown. If schema evolution is enabled, new columns can exist as the last columns of your schema (or nested columns) for the schema to evolve.

Parameters

  • table_identifier

    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`: The location of an existing Delta table.
  • partition_spec

    An optional parameter that specifies a comma-separated list of key and value pairs for partitions.

    Syntax: PARTITION ( partition_col_name = partition_col_val [ , ... ] )

  • column_list

    An optional parameter that specifies a comma-separated list of columns belonging to the table_identifier table.

    Note

    The current behavior has some limitations:

    • All specified columns should exist in the table and not be duplicated from each other. It includes all columns except the static partition columns.
    • The size of the column list should be exactly the size of the data from VALUES clause or query.
    • The order of the column list is alterable and determines how the the position in which data from VALUES clause or query is inserted.
  • VALUES ( { value | NULL } [ , … ] ) [ , ( … ) ]

    The values to be inserted. Either an explicitly specified value or a NULL. Use a comma to separate each value in the clause. You can specify more than one set of values to insert multiple rows.

  • query

    A query that produces the rows to be inserted. One of following formats:

    • A SELECT statement
    • A TABLE statement
    • A FROM statement

Dynamic partition inserts

In part_spec, the partition column values are optional. When the partition specification part_spec is not completely provided, such inserts are called dynamic partition inserts or multi-partition inserts. When the values are not specified, these columns are referred to as dynamic partition columns; otherwise, they are static partition columns. For example, the partition spec (p1 = 3, p2, p3) has a static partition column (p1) and two dynamic partition columns (p2 and p3).

In part_spec, static partition keys must come before the dynamic partition keys. This means all partition columns having constant values must appear before other partition columns that do not have an assigned constant value.

The partition values of dynamic partition columns are determined during the execution. The dynamic partition columns must be specified last in both part_spec and the input result set (of the row value lists or the select query). They are resolved by position, instead of by names. Thus, the orders must be exactly matched.

The DataFrameWriter APIs do not have an interface to specify partition values. Therefore, the insertInto() API is always using dynamic partition mode.

Important

In dynamic partition mode, the input result set could result in a large number of dynamic partitions, and thus generate a large number of partition directories.

OVERWRITE

The values to be inserted. Either an explicitly specified value or a NULL can be inserted. A comma must be used to separate each value in the clause. More than one set of values can be specified to insert multiple rows.

The semantics are different based on the type of the target table:

  • Hive SerDe tables: INSERT OVERWRITE doesn’t delete partitions ahead, and only overwrites those partitions that have data written into it at runtime. This matches Apache Hive semantics. For Hive SerDe tables, Spark SQL respects the Hive-related configuration, including hive.exec.dynamic.partition and hive.exec.dynamic.partition.mode.
  • Native data source tables: INSERT OVERWRITE first deletes all the partitions that match the partition specification (e.g., PARTITION(a=1, b)) and then inserts all the remaining values. You can change the behavior of native data source tables to be consistent with Hive SerDe tables by changing the session-specific configuration spark.sql.sources.partitionOverwriteMode to DYNAMIC. The default mode is STATIC.

Examples

In this section:

Single row insert using a VALUES clause

CREATE TABLE students (name VARCHAR(64), address VARCHAR(64), student_id INT)
    USING PARQUET PARTITIONED BY (student_id);

INSERT INTO students VALUES
    ('Amy Smith', '123 Park Ave, San Jose', 111111);

SELECT * FROM students;
+---------+----------------------+----------+
|     name|               address|student_id|
+---------+----------------------+----------+
|Amy Smith|123 Park Ave, San Jose|    111111|
+---------+----------------------+----------+

Multi-row insert using a VALUES clause

INSERT INTO students VALUES
    ('Bob Brown', '456 Taylor St, Cupertino', 222222),
    ('Cathy Johnson', '789 Race Ave, Palo Alto', 333333);

SELECT * FROM students;
+-------------+------------------------+----------+
|         name|                 address|student_id|
+-------------+------------------------+----------+
|    Amy Smith|  123 Park Ave, San Jose|    111111|
+-------------+------------------------+----------+
|    Bob Brown|456 Taylor St, Cupertino|    222222|
+-------------+------------------------+----------+
|Cathy Johnson| 789 Race Ave, Palo Alto|    333333|
+--------------+-----------------------+----------+

Insert using a SELECT statement

-- Assuming the persons table has already been created and populated.
SELECT * FROM persons;
+-------------+--------------------------+---------+
|         name|                   address|      ssn|
+-------------+--------------------------+---------+
|Dora Williams|134 Forest Ave, Menlo Park|123456789|
+-------------+--------------------------+---------+
|  Eddie Davis|   245 Market St, Milpitas|345678901|
+-------------+--------------------------+---------+

INSERT INTO students PARTITION (student_id = 444444)
    SELECT name, address FROM persons WHERE name = "Dora Williams";

SELECT * FROM students;
+-------------+--------------------------+----------+
|         name|                   address|student_id|
+-------------+--------------------------+----------+
|    Amy Smith|    123 Park Ave, San Jose|    111111|
+-------------+--------------------------+----------+
|    Bob Brown|  456 Taylor St, Cupertino|    222222|
+-------------+--------------------------+----------+
|Cathy Johnson|   789 Race Ave, Palo Alto|    333333|
+-------------+--------------------------+----------+
|Dora Williams|134 Forest Ave, Menlo Park|    444444|
+-------------+--------------------------+----------+

Insert using a TABLE statement

-- Assuming the visiting_students table has already been created and populated.
SELECT * FROM visiting_students;
+-------------+---------------------+----------+
|         name|              address|student_id|
+-------------+---------------------+----------+
|Fleur Laurent|345 Copper St, London|    777777|
+-------------+---------------------+----------+
|Gordon Martin| 779 Lake Ave, Oxford|    888888|
+-------------+---------------------+----------+

INSERT INTO students TABLE visiting_students;

SELECT * FROM students;
+-------------+--------------------------+----------+
|         name|                   address|student_id|
+-------------+--------------------------+----------+
|    Amy Smith|    123 Park Ave, San Jose|    111111|
+-------------+--------------------------+----------+
|    Bob Brown|  456 Taylor St, Cupertino|    222222|
+-------------+--------------------------+----------+
|Cathy Johnson|   789 Race Ave, Palo Alto|    333333|
+-------------+--------------------------+----------+
|Dora Williams|134 Forest Ave, Menlo Park|    444444|
+-------------+--------------------------+----------+
|Fleur Laurent|     345 Copper St, London|    777777|
+-------------+--------------------------+----------+
|Gordon Martin|      779 Lake Ave, Oxford|    888888|
+-------------+--------------------------+----------+

Insert using a FROM statement

-- Assuming the applicants table has already been created and populated.
SELECT * FROM applicants;
+-----------+--------------------------+----------+---------+
|       name|                   address|student_id|qualified|
+-----------+--------------------------+----------+---------+
|Helen Davis| 469 Mission St, San Diego|    999999|     true|
+-----------+--------------------------+----------+---------+
|   Ivy King|367 Leigh Ave, Santa Clara|    101010|    false|
+-----------+--------------------------+----------+---------+
| Jason Wang|     908 Bird St, Saratoga|    121212|     true|
+-----------+--------------------------+----------+---------+

INSERT INTO students
     FROM applicants SELECT name, address, id applicants WHERE qualified = true;

SELECT * FROM students;
+-------------+--------------------------+----------+
|         name|                   address|student_id|
+-------------+--------------------------+----------+
|    Amy Smith|    123 Park Ave, San Jose|    111111|
+-------------+--------------------------+----------+
|    Bob Brown|  456 Taylor St, Cupertino|    222222|
+-------------+--------------------------+----------+
|Cathy Johnson|   789 Race Ave, Palo Alto|    333333|
+-------------+--------------------------+----------+
|Dora Williams|134 Forest Ave, Menlo Park|    444444|
+-------------+--------------------------+----------+
|Fleur Laurent|     345 Copper St, London|    777777|
+-------------+--------------------------+----------+
|Gordon Martin|      779 Lake Ave, Oxford|    888888|
+-------------+--------------------------+----------+
|  Helen Davis| 469 Mission St, San Diego|    999999|
+-------------+--------------------------+----------+
|   Jason Wang|     908 Bird St, Saratoga|    121212|
+-------------+--------------------------+----------+

Insert with a column list

INSERT INTO students (address, name, student_id) VALUES
    ('Hangzhou, China', 'Kent Yao', 11215016);
SELECT * FROM students WHERE name = 'Kent Yao';
+---------+----------------------+----------+
|     name|               address|student_id|
+---------+----------------------+----------+
|Kent Yao |       Hangzhou, China|  11215016|
+---------+----------------------+----------+

Insert with both a partition spec and a column list

INSERT INTO students PARTITION (student_id = 11215017) (address, name) VALUES
    ('Hangzhou, China', 'Kent Yao Jr.');
SELECT * FROM students WHERE student_id = 11215017;
+------------+----------------------+----------+
|        name|               address|student_id|
+------------+----------------------+----------+
|Kent Yao Jr.|       Hangzhou, China|  11215017|
+------------+----------------------+----------+