Bulk Copy (Bulk Insert)
Some database servers provide functionality to insert large amounts of data into a table in more effective way compared to conventional inserts. The downside of this method is that each server has its own view on how this functionality should work; there is no standard interface for it.
To leverage the complexity of work with this operation,
LINQ To DB provides a
BulkCopy method. There are several overrides, but all of them do the same thing - take data and operation options, then perform inserts and return operation status. How insert operations are performed internally depends on the level of provider support and the provided options.
// DataConnectionExtensions.cs BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, BulkCopyOptions options, IEnumerable<T> source) BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, int maxBatchSize, IEnumerable<T> source) BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, IEnumerable<T> source) BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, BulkCopyOptions options, IEnumerable<T> source) BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, int maxBatchSize, IEnumerable<T> source) BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, IEnumerable<T> source)
In addition, there are two asynchronous methods for each of the methods listed above; one accepting an
IEnumerable<T>, and for .Net Standard, one accepting an
IAsyncEnumerable<T>. Each method accepts an optional
CancellationToken parameter to cancel the bulk copy operation. A few of the asynchronous signatures are listed below:
Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this DataConnection dataConnection, BulkCopyOptions options, IEnumerable<T> source, CancellationToken cancellationToken = default) Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this DataConnection dataConnection, BulkCopyOptions options, IAsyncEnumerable<T> source, CancellationToken cancellationToken = default) Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this ITable<T> table, IEnumerable<T> source, CancellationToken cancellationToken = default) Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this ITable<T> table, IAsyncEnumerable<T> source, CancellationToken cancellationToken = default)
Insert methods and support by providers
LINQ To DB allows you to specify one of four insert methods (or three, as Default is not an actual method):
LINQ To DBwill choose the method automatically, based on the provider used. Which method to use for a specific provider can be overriden using the
RowByRow. This method just iterates over a provided collection and inserts each record using separate SQL
INSERTcommands. This is the least effective method, but some providers support only this one.
MultipleRows. Similar to
RowByRow. Inserts multiple records at once using SQL
INSERT FROM SELECTor similar batch insert commands. This one is faster than
RowByRow, but is only available for providers that support such
INSERToperations. If the method is not supported, LINQ To DB will silently fall back to the
ProviderSpecific. Most effective method, available only for a few providers. Uses provider specific functionality, usually not based on
SQLand could have provider-specific limitations, like transactions support. If the method is not supported, LINQ To DB will silently fall back to the
|IBM DB2 (LUW, zOS)||Yes||Yes||Yes (will fallback to
|IBM Informix||Yes||No||Yes (when using IDS provider for DB2 or Informix. Will fallback to
|MySql / MariaDB||Yes||Yes||Yes||MultipleRows||MySqlTools.DefaultBulkCopyType|
|PostgreSQL||Yes||Yes||Yes (read important notes below)||MultipleRows||PostgreSQLTools.DefaultBulkCopyType|
|Microsoft SQL CE||Yes||Yes||No||MultipleRows||SqlCeTools.DefaultBulkCopyType|
|Microsoft SQL Server||Yes||Yes||Yes||ProviderSpecific||SqlServerTools.DefaultBulkCopyType|
|Sybase ASE||Yes||Yes||Yes (using native provider. Also see)||MultipleRows||SybaseTools.DefaultBulkCopyType|
|ClickHouse||Yes||Yes||Yes (except MySqlConnector)||ProviderSpecific||ClickHouseTools.DefaultBulkCopyType|
Note that when using the provider-specific insert method, only some providers support asynchronous operation; other providers will silently fall back to a synchronous operation.
Providers with async support:
- SAP HANA native provider
- System.Data.SqlClient and Microsoft.Data.SqlClient
- ClickHouse.Client and Octonica.ClickHouseClient
PostgreSQL provider-specific bulk copy
BulkCopy uses the
BINARY COPY operation when the
ProviderSpecific method specified. This operation is very sensitive to what types are used. You must always use the proper type that matches the type in target table, or you will receive an error from server (e.g.
"22P03: incorrect binary data format").
Below is a list of types that could result in error without an explicit type specification:
money. Those are two different types, mapped to
System.Decimal. Default mappings will use the
numerictype, so if your column is the
moneytype, you should type it in mappings using
DataType = DataType.Moneyor
DbType = "money"hints.
interval. Those are two different types, mapped to
System.TimeSpan. Default mappings will use the
timetype, so if your column is the
intervaltype, you should type it in mappings using a
DbType = "interval"hint. Or use the
NpgsqlTimeSpantype for intervals.
- any text types/
jsonb. All those types are mapped to
characterwhich is mapped to
System.Char). Default mappings will not work for
jsonbcolumn and you should type it in mappings using
DataType = DataType.BinaryJsonor
DbType = "jsonb"hints.
cidr. If you use
NpgsqlInettype for the mapping column, it could be mapped to both
inetand 'cidr' types. There is no default mapping for this type, so you should explicitly specify it using
DbType = "inet"or
DbType = "cidr"hints. Also for
inetyou can use
IPAddresswhich will be mapped to the
macaddr8. Both types could be mapped to the same
Stringtypes, so you should explicitly specify the column type using
DbType = "macaddr"or
DbType = "macaddr8"hints. Even if you use a provider version without
macaddr8support, you should specify the hint or it will break after the provider updates to a newer version.
datetype. You should use the
NpgsqlDatetype in mappings or specify
DataType = DataType.Dateor
DbType = "date"hints.
time with time zonetype needs the
DbType = "time with time zone"hint.
If you have issues with other types, feel free to create an issue.
See BulkCopyOptions properties and support per-provider. Some options explained below.
KeepIdentity option (default :
This option allows you to insert provided values into the identity column. It is supported by limited set of providers and is not compatible with
RowByRow mode. Hence, if the provider doesn't support any other insert mode, the
KeepIdentity option is not supported.
This option is not supported for
RowByRow because corresponding functionality is not implemented by
LINQ To DB; it could be added upon request.
If you will set this option to
true for an unsupported mode or provider, you will get a
|IBM DB2 (LUW, zOS)||Only for GENERATED BY DEFAULT columns|
|Firebird||No (you need to disable triggers manually, if you use generators in triggers)|
|MySql / MariaDB||Yes|
|Oracle||Partial. Starting from version 12c it will work for GENERATED BY DEFAULT columns (as DB2), for earlier versions you need to disable triggers with generators (as Firebird). Note that for versions prior to 12c, no exception will be thrown if you will try to use it with
|SAP HANA||Depends on provider version (HANA 2 only?)|
|Microsoft SQL CE||Yes|
|Microsoft SQL Server||Yes|
|ClickHouse||No (ClickHouse doesn't have identity/sequence concept)|
MaxDegreeOfParallelism Option (default:
Supported only by
ClickHouse.Client provider and defines number of parallel connections to use for insert. Note that behavior depends on provider implementation, which currently use 4 parallel connections by default and perform parallel insert only for batches larger than 100K records.
WithoutSession Option (default:
Supported only by
ClickHouse.Client provider. When enabled, provider will use session-less connection even if
linq2db connection configured to use sessions. Note that:
- session-less connection requied for parallel inserts (see
- session-less connections doesn't support temporary tables, so you cannot insert into temporary table with this option set to
As an alternative to
BulkCopy, a Merge operation could be used. It allows more flexibility but is not available for some providers and will be always slower than
BulkCopy with native provider support.