Bulk Copy (Bulk Insert)
Some database servers provide functionality to insert large amounts of data into a table in more effective way compared to conventional inserts. The downside of this method is that each server has its own view on how this functionality should work; there is no standard interface for it.
Overview
To leverage the complexity of work with this operation, LINQ To DB provides a BulkCopy method. There are several overrides, but all of them do the same thing - take data and operation options, then perform inserts and return operation status. How insert operations are performed internally depends on the level of provider support and the provided options.
// DataConnectionExtensions.cs
BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, BulkCopyOptions options, IEnumerable<T> source)
BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, int maxBatchSize, IEnumerable<T> source)
BulkCopyRowsCopied BulkCopy<T>(this DataConnection dataConnection, IEnumerable<T> source)
BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, BulkCopyOptions options, IEnumerable<T> source)
BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, int maxBatchSize, IEnumerable<T> source)
BulkCopyRowsCopied BulkCopy<T>(this ITable<T> table, IEnumerable<T> source)
In addition, there are two asynchronous methods for each of the methods listed above; one accepting an IEnumerable<T>, and for .Net Standard, one accepting an IAsyncEnumerable<T>. Each method accepts an optional CancellationToken parameter to cancel the bulk copy operation. A few of the asynchronous signatures are listed below:
Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this DataConnection dataConnection, BulkCopyOptions options, IEnumerable<T> source, CancellationToken cancellationToken = default)
Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this DataConnection dataConnection, BulkCopyOptions options, IAsyncEnumerable<T> source, CancellationToken cancellationToken = default)
Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this ITable<T> table, IEnumerable<T> source, CancellationToken cancellationToken = default)
Task<BulkCopyRowsCopied> BulkCopyAsync<T>(this ITable<T> table, IAsyncEnumerable<T> source, CancellationToken cancellationToken = default)
Insert methods and support by providers
LINQ To DB allows you to specify one of four insert methods (or three, as Default is not an actual method):
Default.LINQ To DBwill choose the method automatically, based on the provider used. Which method to use for a specific provider can be overriden using the<PROVIDER_NAME>Tools.DefaultBulkCopyTypeproperty.RowByRow. This method just iterates over a provided collection and inserts each record using separate SQLINSERTcommands. This is the least effective method, but some providers support only this one.MultipleRows. Similar toRowByRow. Inserts multiple records at once using SQLINSERT FROM SELECTor similar batch insert commands. This one is faster thanRowByRow, but is only available for providers that support suchINSERToperations. If the method is not supported, LINQ To DB will silently fall back to theRowByRowimplementation.ProviderSpecific. Most effective method, available only for a few providers. Uses provider specific functionality, usually not based onSQLand could have provider-specific limitations, like transactions support. If the method is not supported, LINQ To DB will silently fall back to theMultipleRowsimplementation.
| Provider | RowByRow | MultipleRows | ProviderSpecific | Default | Notes |
|---|---|---|---|---|---|
| Microsoft Access | Yes | No | No | MultipleRows | AccessTools.DefaultBulkCopyType |
| IBM DB2 (LUW, zOS) | Yes | Yes | Yes (will fallback to MultipleRows if called in transaction) |
MultipleRows | DB2Tools.DefaultBulkCopyType |
| Firebird | Yes | Yes | No | MultipleRows | FirebirdTools.DefaultBulkCopyType |
| IBM Informix | Yes | No | Yes (when using IDS provider for DB2 or Informix. Will fallback to MultipleRows if called in transaction) |
ProviderSpecific | InformixTools.DefaultBulkCopyType |
| MySql / MariaDB | Yes | Yes | Yes | MultipleRows | MySqlTools.DefaultBulkCopyType |
| Oracle | Yes | Yes | Yes | MultipleRows | OracleTools.DefaultBulkCopyType |
| PostgreSQL | Yes | Yes | Yes (read important notes below) | MultipleRows | PostgreSQLTools.DefaultBulkCopyType |
| SAP HANA | Yes | No | Yes | MultipleRows | SapHanaTools.DefaultBulkCopyType |
| Microsoft SQL CE | Yes | Yes | No | MultipleRows | SqlCeTools.DefaultBulkCopyType |
| SQLite | Yes | Yes | No | MultipleRows | SQLiteTools.DefaultBulkCopyType |
| Microsoft SQL Server | Yes | Yes | Yes | ProviderSpecific | SqlServerTools.DefaultBulkCopyType |
| Sybase ASE | Yes | Yes | Yes (using native provider. Also see) | MultipleRows | SybaseTools.DefaultBulkCopyType |
| ClickHouse | Yes | Yes | Yes (except MySqlConnector) | ProviderSpecific | ClickHouseTools.DefaultBulkCopyType |
Note that when using the provider-specific insert method, only some providers support asynchronous operation; other providers will silently fall back to a synchronous operation.
Providers with async support:
- MySqlConnector
- Npgsql
- SAP HANA native provider
- System.Data.SqlClient and Microsoft.Data.SqlClient
- ClickHouse.Client and Octonica.ClickHouseClient
PostgreSQL provider-specific bulk copy
For PostgreSQL, BulkCopy uses the BINARY COPY operation when the ProviderSpecific method specified. This operation is very sensitive to what types are used. You must always use the proper type that matches the type in target table, or you will receive an error from server (e.g. "22P03: incorrect binary data format").
Below is a list of types that could result in error without an explicit type specification:
decimal/numericvsmoney. Those are two different types, mapped toSystem.Decimal. Default mappings will use thenumerictype, so if your column is themoneytype, you should type it in mappings usingDataType = DataType.MoneyorDbType = "money"hints.timevsinterval. Those are two different types, mapped toSystem.TimeSpan. Default mappings will use thetimetype, so if your column is theintervaltype, you should type it in mappings using aDbType = "interval"hint. Or use theNpgsqlTimeSpantype for intervals.- any text types/
jsonvsjsonb. All those types are mapped toSystem.String(exceptcharacterwhich is mapped toSystem.Char). Default mappings will not work forjsonbcolumn and you should type it in mappings usingDataType = DataType.BinaryJsonorDbType = "jsonb"hints. inetvscidr. If you useNpgsqlInettype for the mapping column, it could be mapped to bothinetand 'cidr' types. There is no default mapping for this type, so you should explicitly specify it usingDbType = "inet"orDbType = "cidr"hints. Also forinetyou can useIPAddresswhich will be mapped to theinettype.macaddrvsmacaddr8. Both types could be mapped to the samePhysicalAddress/Stringtypes, so you should explicitly specify the column type usingDbType = "macaddr"orDbType = "macaddr8"hints. Even if you use a provider version withoutmacaddr8support, you should specify the hint or it will break after the provider updates to a newer version.datetype. You should use theNpgsqlDatetype in mappings or specifyDataType = DataType.DateorDbType = "date"hints.time with time zonetype needs theDbType = "time with time zone"hint.
If you have issues with other types, feel free to create an issue.
Options
See BulkCopyOptions properties and support per-provider. Some options explained below.
KeepIdentity option (default : false)
This option allows you to insert provided values into the identity column. It is supported by limited set of providers and is not compatible with RowByRow mode. Hence, if the provider doesn't support any other insert mode, the KeepIdentity option is not supported.
This option is not supported for RowByRow because corresponding functionality is not implemented by LINQ To DB; it could be added upon request.
If you will set this option to true for an unsupported mode or provider, you will get a LinqToDBException.
| Provider | Support |
|---|---|
| Microsoft Access | No |
| IBM DB2 (LUW, zOS) | Only for GENERATED BY DEFAULT columns |
| Firebird | No (you need to disable triggers manually, if you use generators in triggers) |
| IBM Informix | No |
| MySql / MariaDB | Yes |
| Oracle | Partial. Starting from version 12c it will work for GENERATED BY DEFAULT columns (as DB2), for earlier versions you need to disable triggers with generators (as Firebird). Note that for versions prior to 12c, no exception will be thrown if you will try to use it with KeepIdentity set to true and generated values will be used silently as LINQ To DB don't have Oracle version detection right now. This could be changed in future. |
| PostgreSQL | Yes |
| SAP HANA | Depends on provider version (HANA 2 only?) |
| Microsoft SQL CE | Yes |
| SQLite | Yes |
| Microsoft SQL Server | Yes |
| Sybase ASE | Yes |
| ClickHouse | No (ClickHouse doesn't have identity/sequence concept) |
MaxDegreeOfParallelism Option (default: null)
Supported only by ClickHouse.Client provider and defines number of parallel connections to use for insert. Note that behavior depends on provider implementation, which currently use 4 parallel connections by default and perform parallel insert only for batches larger than 100K records.
WithoutSession Option (default: false)
Supported only by ClickHouse.Client provider. When enabled, provider will use session-less connection even if linq2db connection configured to use sessions. Note that:
- session-less connection requied for parallel inserts (see
MaxDegreeOfParallelismoption) - session-less connections doesn't support temporary tables, so you cannot insert into temporary table with this option set to
true
See Also
As an alternative to BulkCopy, a Merge operation could be used. It allows more flexibility but is not available for some providers and will be always slower than BulkCopy with native provider support.