SQL Server connector — Trino 457 Documentation (2024)

The SQL Server connector allows querying and creating tables in an external Microsoft SQL Server database. This can be used to join data between different systems like SQL Server and Hive, or between two different SQL Server instances.

Requirements#

To connect to SQL Server, you need:

SQL Server 2012 or higher, or Azure SQL Database.
Network access from the Trino coordinator and workers to SQL Server. Port 1433 is the default port.

Configuration#

The connector can query a single database on a given SQL Server instance. Create a catalog properties file that specifies the SQL Server connector by setting the connector.name to sqlserver.

For example, to access a database as example, create the file etc/catalog/example.properties. Replace the connection properties as appropriate for your setup:

connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;databaseName=<databaseName>;encrypt=false
connection-user=root
connection-password=secret

The connection-url defines the connection information and parameters to pass to the SQL Server JDBC driver. The supported parameters for the URL are available in the SQL Server JDBC driver documentation.

The connection-user and connection-password are typically required and determine the user credentials for the connection, often a service user. You can use secrets to avoid storing actual values in the catalog properties files.

Connection security#

The JDBC driver, and therefore the connector, automatically use Transport Layer Security (TLS) encryption and certificate validation. This requires a suitable TLS certificate configured on your SQL Server database host.

If you do not have the necessary configuration established, you can disable encryption in the connection string with the encrypt property:

connection-url=jdbc:sqlserver://<host>:<port>;databaseName=<databaseName>;encrypt=false

Further parameters like trustServerCertificate, hostNameInCertificate, trustStore, and trustStorePassword are detailed in the TLS section of the SQL Server JDBC driver documentation.
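As a minimal sketch of how the connection-url value is put together, the following Python helper joins the host, port, database name, and any further driver parameters with semicolons. The function name build_connection_url and the host name used below are hypothetical and only serve the illustration; the parameter names come from the text above.

```python
def build_connection_url(host, port, database, **params):
    """Assemble a SQL Server JDBC URL in the shape used by connection-url.

    Hypothetical helper for illustration; not part of Trino or the JDBC driver.
    """
    parts = [f"jdbc:sqlserver://{host}:{port}", f"databaseName={database}"]
    for key, value in params.items():
        # Write Python booleans as the lowercase true/false used in the URL.
        if isinstance(value, bool):
            value = str(value).lower()
        parts.append(f"{key}={value}")
    return ";".join(parts)


# Example host name is an assumption; 1433 is the default port named above.
url = build_connection_url("sqlserver.example.com", 1433, "sales", encrypt=False)
print(f"connection-url={url}")
```

The printed line can be pasted into a catalog properties file once the placeholder values are replaced with those of your setup.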

Data source authentication#

The connector can provide credentials for the data source connection in multiple ways:

inline, in the connector configuration file
in a separate properties file
in a key store file
as extra credentials set when connecting to Trino

You can use secrets to avoid storing sensitive values in the catalog properties files.

The following table describes configuration properties for connection credentials:

Property name	Description
`credential-provider.type`	Type of the credential provider. Must be one of `INLINE`, `FILE`, or `KEYSTORE`; defaults to `INLINE`.
`connection-user`	Connection user name.
`connection-password`	Connection password.
`user-credential-name`	Name of the extra credentials property, whose value to use as the user name. See `extraCredentials` in Parameter reference.
`password-credential-name`	Name of the extra credentials property, whose value to use as the password.
`connection-credential-file`	Location of the properties file where credentials are present. It must contain the `connection-user` and `connection-password` properties.
`keystore-file-path`	The location of the Java Keystore file, from which to read credentials.
`keystore-type`	File format of the keystore file, for example `JKS` or `PEM`.
`keystore-password`	Password for the key store.
`keystore-user-credential-name`	Name of the key store entity to use as the user name.
`keystore-user-credential-password`	Password for the user name key store entity.
`keystore-password-credential-name`	Name of the key store entity to use as the password.
`keystore-password-credential-password`	Password for the password key store entity.

Multiple SQL Server databases or servers#

The SQL Server connector can only access a single SQL Server database within a single catalog. Thus, if you have multiple SQL Server databases, or want to connect to multiple SQL Server instances, you must configure multiple instances of the SQL Server connector.

To add another catalog, simply add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the property file sales.properties, Trino creates a catalog named sales using the configured connector.

General configuration properties#

The following table describes general catalog configuration properties for the connector:

Property name	Description
`case-insensitive-name-matching`	Support case insensitive schema and table names. Defaults to `false`.
`case-insensitive-name-matching.cache-ttl`	Duration for which case insensitive schema and table names are cached. Defaults to `1m`.
`case-insensitive-name-matching.config-file`	Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. Defaults to `null`.
`case-insensitive-name-matching.config-file.refresh-period`	Frequency with which Trino checks the name matching configuration file for changes. The duration value defaults to `0s` (refresh disabled).
`metadata.cache-ttl`	Duration for which metadata, including table and column statistics, is cached. Defaults to `0s` (caching disabled).
`metadata.cache-missing`	Cache the fact that metadata, including table and column statistics, is not available. Defaults to `false`.
`metadata.schemas.cache-ttl`	Duration for which schema metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.tables.cache-ttl`	Duration for which table metadata is cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.statistics.cache-ttl`	Duration for which table statistics are cached. Defaults to the value of `metadata.cache-ttl`.
`metadata.cache-maximum-size`	Maximum number of objects stored in the metadata cache. Defaults to `10000`.
`write.batch-size`	Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. Defaults to `1000`.
`dynamic-filtering.enabled`	Push down dynamic filters into JDBC queries. Defaults to `true`.
`dynamic-filtering.wait-timeout`	Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. Defaults to `20s`.

Appending query metadata#

The optional parameter query.comment-format allows you to configure a SQL comment that is sent to the data source with each query. The format of this comment can contain any characters and the following metadata:

$QUERY_ID: The identifier of the query.
$USER: The name of the user who submits the query to Trino.
$SOURCE: The identifier of the client tool used to submit the query, for example trino-cli.
$TRACE_TOKEN: The trace token configured with the client tool.

The comment can provide more context about the query. This additional information is available in the logs of the data source. To include environment variables from the Trino cluster with the comment, use the ${ENV:VARIABLE-NAME} syntax.

The following example sets a simple comment that identifies each query sent by Trino:

query.comment-format=Query sent by Trino.

With this configuration, a query such as SELECT * FROM example_table; is sent to the data source with the comment appended:

SELECT * FROM example_table; /*Query sent by Trino.*/

The following example improves on the preceding example by using metadata:

query.comment-format=Query $QUERY_ID sent by user $USER from Trino.

If Jane sent the query with the query identifier 20230622_180528_00000_bkizg, the following comment string is sent to the data source:

SELECT * FROM example_table; /*Query 20230622_180528_00000_bkizg sent by user Jane from Trino.*/

Note

Certain JDBC driver settings and logging configurations might cause the comment to be removed.
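The substitution described in this section can be modelled with a few lines of Python. This is only a sketch of the visible behavior under stated assumptions, not Trino's implementation: the function names render_comment and append_comment are hypothetical, and the ${ENV:VARIABLE-NAME} syntax is deliberately left out.

```python
def render_comment(fmt, query_id, user, source="", trace_token=""):
    """Replace the documented placeholders in a query.comment-format value.

    Hypothetical helper; environment variable references are not handled.
    """
    values = {
        "$QUERY_ID": query_id,
        "$USER": user,
        "$SOURCE": source,
        "$TRACE_TOKEN": trace_token,
    }
    for placeholder, value in values.items():
        fmt = fmt.replace(placeholder, value)
    return fmt


def append_comment(sql, comment):
    """Append the rendered text as a SQL block comment after the statement."""
    return f"{sql} /*{comment}*/"


comment = render_comment(
    "Query $QUERY_ID sent by user $USER from Trino.",
    query_id="20230622_180528_00000_bkizg",
    user="Jane",
)
print(append_comment("SELECT * FROM example_table;", comment))
```

With the values from the example above, the printed statement matches the comment string shown for Jane's query.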

Domain compaction threshold#

Pushing down a large list of predicates to the data source can compromise performance. Trino compacts large predicates into a simpler range predicate by default to ensure a balance between performance and predicate pushdown. If necessary, the threshold for this compaction can be increased to improve performance when the data source is capable of taking advantage of large predicates. Increasing this threshold may improve pushdown of large dynamic filters. The domain-compaction-threshold catalog configuration property or the domain_compaction_threshold catalog session property can be used to adjust the default value of 32 for this threshold.
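To make the idea of compaction concrete, the following Python sketch collapses a list of discrete values into a single covering range once the list grows past the threshold. It is a conceptual model only: the function name compact_domain is hypothetical, and whether Trino treats a list of exactly threshold values as small or large is an assumption here.

```python
def compact_domain(values, threshold=32):
    """Model predicate compaction for a list of discrete values.

    Returns ("IN", sorted_values) while the list is small, and
    ("BETWEEN", (low, high)) once it exceeds the threshold.
    Conceptual sketch; not Trino's actual implementation.
    """
    distinct = sorted(set(values))
    if len(distinct) <= threshold:
        return ("IN", distinct)
    # The range covers every original value, but may also match extra rows,
    # so the engine still has to apply the exact filter to the returned data.
    return ("BETWEEN", (distinct[0], distinct[-1]))


print(compact_domain([3, 1, 2]))
print(compact_domain(range(1, 41)))
print(compact_domain(range(1, 41), threshold=100))
```

Raising the threshold keeps the exact value list for longer, which is the trade-off the paragraph above describes.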

Specific configuration properties#

The SQL Server connector supports additional catalog properties to configure the behavior of the connector and the queries it issues to the database.

Property name	Description
`sqlserver.snapshot-isolation.disabled`	Control the automatic use of snapshot isolation for transactions issued by Trino in SQL Server. Defaults to `false`, which means that snapshot isolation is enabled.

Case insensitive matching#

When case-insensitive-name-matching is set to true, Trino is able to query non-lowercase schemas and tables by maintaining a mapping of the lowercase name to the actual name in the remote system. However, if two schemas and/or tables have names that differ only in case (such as “customers” and “Customers”) then Trino fails to query them due to ambiguity.

In these cases, use the case-insensitive-name-matching.config-file catalog configuration property to specify a configuration file that maps these remote schemas/tables to their respective Trino schemas/tables:

{
  "schemas": [
    {
      "remoteSchema": "CaseSensitiveName",
      "mapping": "case_insensitive_1"
    },
    {
      "remoteSchema": "cASEsENSITIVEnAME",
      "mapping": "case_insensitive_2"
    }
  ],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }
  ]
}

Queries against one of the tables or schemas defined in the mapping attributes are run against the corresponding remote entity. For example, a query against tables in the case_insensitive_1 schema is forwarded to the CaseSensitiveName schema and a query against case_insensitive_2 is forwarded to the cASEsENSITIVEnAME schema.

At the table mapping level, a query on case_insensitive_1.table_1 as configured above is forwarded to CaseSensitiveName.tablex, and a query on case_insensitive_1.table_2 is forwarded to CaseSensitiveName.TABLEX.
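The forwarding rules above can be traced with a short Python sketch that loads the example mapping and resolves a Trino schema and table name to the remote names. The function resolve_remote is hypothetical, and returning unmapped names unchanged is an assumption made for the illustration rather than documented connector behavior.

```python
import json

# Mapping file content, copied from the example above.
MAPPING_JSON = """
{
  "schemas": [
    {"remoteSchema": "CaseSensitiveName", "mapping": "case_insensitive_1"},
    {"remoteSchema": "cASEsENSITIVEnAME", "mapping": "case_insensitive_2"}
  ],
  "tables": [
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "tablex", "mapping": "table_1"},
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "TABLEX", "mapping": "table_2"}
  ]
}
"""


def resolve_remote(config, trino_schema, trino_table):
    """Return the (remote schema, remote table) pair for Trino-side names.

    Hypothetical helper; names without a mapping entry pass through unchanged.
    """
    schema_map = {s["mapping"]: s["remoteSchema"] for s in config["schemas"]}
    remote_schema = schema_map.get(trino_schema, trino_schema)
    for entry in config["tables"]:
        if entry["remoteSchema"] == remote_schema and entry["mapping"] == trino_table:
            return remote_schema, entry["remoteTable"]
    return remote_schema, trino_table


config = json.loads(MAPPING_JSON)
print(resolve_remote(config, "case_insensitive_1", "table_1"))
print(resolve_remote(config, "case_insensitive_1", "table_2"))
```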

By default, when a change is made to the mapping configuration file, Trino must be restarted to load the changes. Optionally, you can set the case-insensitive-name-matching.config-file.refresh-period catalog configuration property to have Trino refresh the mapping without requiring a restart:

case-insensitive-name-matching.config-file.refresh-period=30s

Non-transactional INSERT#

The connector supports adding rows using INSERT statements. By default, data insertion is performed by writing data to a temporary table. You can skip this step to improve performance and write directly to the target table. Set the insert.non-transactional-insert.enabled catalog property or the corresponding non_transactional_insert catalog session property to true.

Note that with this property enabled, data can be corrupted in rare cases where exceptions occur during the insert operation. With transactions disabled, no rollback can be performed.

Fault-tolerant execution support#

The connector supports fault-tolerant execution of query processing. Read and write operations are both supported with any retry policy.

Querying SQL Server#

The SQL Server connector provides access to all schemas visible to the specified user in the configured database. For the following examples, assume the SQL Server catalog is example.

You can see the available schemas by running SHOW SCHEMAS:

SHOW SCHEMAS FROM example;

If you have a schema named web, you can view the tables in this schema by running SHOW TABLES:

SHOW TABLES FROM example.web;

You can see a list of the columns in the clicks table in the web schema using either of the following:

DESCRIBE example.web.clicks;
SHOW COLUMNS FROM example.web.clicks;

Finally, you can query the clicks table in the web schema:

SELECT * FROM example.web.clicks;

If you used a different name for your catalog properties file, use that catalog name instead of example in the above examples.

Type mapping#

Because Trino and SQL Server each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.

SQL Server type to Trino type mapping#

The connector maps SQL Server types to the corresponding Trino types following this table:

SQL Server type to Trino type mapping#
SQL Server database type	Trino type	Notes
`BIT`	`BOOLEAN`
`TINYINT`	`SMALLINT`	SQL Server `TINYINT` is actually `unsigned TINYINT`
`SMALLINT`	`SMALLINT`
`INTEGER`	`INTEGER`
`BIGINT`	`BIGINT`
`DOUBLE PRECISION`	`DOUBLE`
`FLOAT[(n)]`	`REAL` or `DOUBLE`	See Numeric type mapping
`REAL`	`REAL`
`DECIMAL[(p[, s])]`, `NUMERIC[(p[, s])]`	`DECIMAL(p, s)`
`CHAR[(n)]`	`CHAR(n)`	`1 <= n <= 8000`
`NCHAR[(n)]`	`CHAR(n)`	`1 <= n <= 4000`
`VARCHAR[(n \| max)]`, `NVARCHAR[(n \| max)]`	`VARCHAR(n)`	`1 <= n <= 8000`, `max = 2147483647`
`TEXT`	`VARCHAR(2147483647)`
`NTEXT`	`VARCHAR(1073741823)`
`VARBINARY[(n \| max)]`	`VARBINARY`	`1 <= n <= 8000`, `max = 2147483647`
`DATE`	`DATE`
`TIME[(n)]`	`TIME(n)`	`0 <= n <= 7`
`DATETIME2[(n)]`	`TIMESTAMP(n)`	`0 <= n <= 7`
`SMALLDATETIME`	`TIMESTAMP(0)`
`DATETIMEOFFSET[(n)]`	`TIMESTAMP(n) WITH TIME ZONE`	`0 <= n <= 7`

Trino type to SQL Server type mapping#

The connector maps Trino types to the corresponding SQL Server types following this table:

Trino type to SQL Server type mapping#
Trino type	SQL Server type	Notes
`BOOLEAN`	`BIT`
`TINYINT`	`TINYINT`	Trino only supports writing values belonging to `[0, 127]`
`SMALLINT`	`SMALLINT`
`INTEGER`	`INTEGER`
`BIGINT`	`BIGINT`
`REAL`	`REAL`
`DOUBLE`	`DOUBLE PRECISION`
`DECIMAL(p, s)`	`DECIMAL(p, s)`
`CHAR(n)`	`NCHAR(n)` or `NVARCHAR(max)`	See Character type mapping
`VARCHAR(n)`	`NVARCHAR(n)` or `NVARCHAR(max)`	See Character type mapping
`VARBINARY`	`VARBINARY(max)`
`DATE`	`DATE`
`TIME(n)`	`TIME(n)`	`0 <= n <= 7`
`TIMESTAMP(n)`	`DATETIME2(n)`	`0 <= n <= 7`

Complete list of SQL Server data types.

Numeric type mapping#

For SQL Server FLOAT[(n)]:

If n is not specified, maps to Trino DOUBLE
If 1 <= n <= 24, maps to Trino REAL
If 24 < n <= 53, maps to Trino DOUBLE
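The three rules above can be written as a small Python function. This is a sketch of the documented rule only; the function name float_to_trino is hypothetical, and rejecting values of n outside 1 to 53 is an assumption for the illustration.

```python
def float_to_trino(n=None):
    """Return the Trino type name for SQL Server FLOAT or FLOAT(n).

    Hypothetical helper that mirrors the rules listed above.
    """
    if n is None:
        return "DOUBLE"
    if 1 <= n <= 24:
        return "REAL"
    if 24 < n <= 53:
        return "DOUBLE"
    # Values outside 1..53 are not covered by the rules above.
    raise ValueError(f"FLOAT({n}) is outside the range 1 to 53")


for n in (None, 1, 24, 25, 53):
    print(n, float_to_trino(n))
```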

Character type mapping#

For Trino CHAR(n):

If 1 <= n <= 4000, maps to SQL Server NCHAR(n)
If n > 4000, maps to SQL Server NVARCHAR(max)

For Trino VARCHAR(n):

If 1 <= n <= 4000, maps to SQL Server NVARCHAR(n)
If n > 4000, maps to SQL Server NVARCHAR(max)
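A minimal Python sketch of the character type rules above follows. The function names are hypothetical and the code only restates the listed length boundaries; treating lengths below 1 as an error is an assumption.

```python
def char_to_sqlserver(n):
    """Return the SQL Server type used when writing Trino CHAR(n)."""
    if n < 1:
        raise ValueError("length must be at least 1")
    return f"NCHAR({n})" if n <= 4000 else "NVARCHAR(max)"


def varchar_to_sqlserver(n):
    """Return the SQL Server type used when writing Trino VARCHAR(n)."""
    if n < 1:
        raise ValueError("length must be at least 1")
    return f"NVARCHAR({n})" if n <= 4000 else "NVARCHAR(max)"


print(char_to_sqlserver(10), char_to_sqlserver(4001))
print(varchar_to_sqlserver(4000), varchar_to_sqlserver(4001))
```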

Type mapping configuration properties#

The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.

Property name	Description	Default value
`unsupported-type-handling`	Configure how unsupported column data types are handled: `IGNORE`, column is not accessible. `CONVERT_TO_VARCHAR`, column is converted to unbounded `VARCHAR`. The respective catalog session property is `unsupported_type_handling`.	`IGNORE`
`jdbc-types-mapped-to-varchar`	Allow forced mapping of a comma-separated list of data types to unbounded `VARCHAR`.

SQL support#

The connector provides read access and write access to data and metadata in SQL Server. In addition to the globally available and read operation statements, the connector supports the following features:

INSERT
UPDATE
DELETE
TRUNCATE
Schema and table management

UPDATE#

Only UPDATE statements with constant assignments and predicates are supported. For example, the following statement is supported because the values assigned are constants:

UPDATE table SET col1 = 1 WHERE col3 = 1

Arithmetic expressions, function calls, and other non-constant UPDATE statements are not supported. For example, the following statement is not supported because arithmetic expressions cannot be used with the SET clause:

UPDATE table SET col1 = col2 + 2 WHERE col3 = 1

All column values of a table row cannot be updated simultaneously. For a three-column table, the following statement is not supported:

UPDATE table SET col1 = 1, col2 = 2, col3 = 3 WHERE col3 = 1

SQL DELETE#

If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.

ALTER TABLE RENAME TO#

The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:

ALTER TABLE example.schema_one.table_one RENAME TO example.schema_one.table_two

The following statement attempts to rename a table across schemas, and therefore is not supported:

ALTER TABLE example.schema_one.table_one RENAME TO example.schema_two.table_two

Procedures#

`system.flush_metadata_cache()`#

Flush JDBC metadata caches. For example, the following system call flushes the metadata caches for all schemas in the example catalog:

USE example.example_schema;
CALL system.flush_metadata_cache();

`system.execute('query')`#

The execute procedure allows you to execute a query in the underlying data source directly. The query must use supported syntax of the connected data source. Use the procedure to access features which are not available in Trino or to execute queries that return no result set and therefore cannot be used with the query or raw_query pass-through table function. Typical use cases are statements that create or alter objects, and require native features such as constraints, default values, automatic identifier creation, or indexes. Queries can also invoke statements that insert, update, or delete data, and do not return any data as a result.

The query text is not parsed by Trino, only passed through, and therefore only subject to any security or access control of the underlying data source.

The following example sets the current schema to example_schema of the example catalog. Then it calls the procedure in that schema to drop the default value from your_column on the your_table table using the standard SQL syntax in the parameter value assigned for query:

USE example.example_schema;
CALL system.execute(query => 'ALTER TABLE your_table ALTER COLUMN your_column DROP DEFAULT');

Verify that the specific database supports this syntax, and adapt as necessary based on the documentation for the specific connected database and database version.

Table functions#

The connector provides specific table functions to access SQL Server.

`query(varchar) -> table`#

The query function allows you to query the underlying database directly. It requires syntax native to SQL Server, because the full query is pushed down and processed in SQL Server. This can be useful for accessing native features which are not implemented in Trino or for improving query performance in situations where running a query natively may be faster.

The native query passed to the underlying data source is required to return a table as a result set. Only the data source performs validation or security checks for these queries using its own configuration. Trino does not perform these tasks. Only use passthrough queries to read data.

For example, query the example catalog and select the top 10 percent of nations by population:

SELECT
  *
FROM
  TABLE(
    example.system.query(
      query => 'SELECT
        TOP(10) PERCENT *
      FROM
        tpch.nation
      ORDER BY
        population DESC'
    )
  );

`procedure(varchar) -> table`#

The procedure function allows you to run stored procedures on the underlying database directly. It requires syntax native to SQL Server, because the full query is pushed down and processed in SQL Server. To use this table function, set sqlserver.experimental.stored-procedure-table-function-enabled to true.

Note

The procedure function does not support running stored procedures that return multiple statements, use a non-select statement, use output parameters, or use conditional statements.

Warning

This feature is experimental only. The function has security implications, and its syntax might change in backward-incompatible ways.

The following example runs the stored procedure employee_sp in the example catalog and the example_schema schema in the underlying SQL Server database:

SELECT
  *
FROM
  TABLE(
    example.system.procedure(
      query => 'EXECUTE example_schema.employee_sp'
    )
  );

If the stored procedure employee_sp requires any input, append the parameter value to the procedure statement:

SELECT
  *
FROM
  TABLE(
    example.system.procedure(
      query => 'EXECUTE example_schema.employee_sp 0'
    )
  );

Note

The query engine does not preserve the order of the results of this function. If the passed query contains an ORDER BY clause, the function result may not be ordered as expected.

Performance#

The connector includes a number of performance improvements, detailed in the following sections.

Table statistics#

The SQL Server connector can use table and column statistics for cost-based optimizations, to improve query processing performance based on the actual data in the data source.

The statistics are collected by SQL Server and retrieved by the connector.

The connector can use information stored in single-column statistics. SQL Server Database can automatically create column statistics for certain columns. If column statistics are not created automatically for a certain column, you can create them by executing the following statement in SQL Server Database.

CREATE STATISTICS example_statistics_name ON table_schema.table_name (column_name);

SQL Server Database routinely updates the statistics. In some cases, you may want to force a statistics update (e.g. after defining new column statistics or after changing data in the table). You can do that by executing the following statement in SQL Server Database.

UPDATE STATISTICS table_schema.table_name;

Refer to SQL Server documentation for information about options, limitations and additional considerations.

Pushdown#

The connector supports pushdown for a number of operations:

Join pushdown
Limit pushdown
Top-N pushdown

Aggregate pushdown for the following functions:

avg()
count()
max()
min()
sum()
stddev()
stddev_pop()
stddev_samp()
variance()
var_pop()
var_samp()

Note

The connector performs pushdown where performance may be improved, but in order to preserve correctness an operation may not be pushed down. When pushdown of an operation may result in better performance but risks correctness, the connector prioritizes correctness.

Cost-based join pushdown#

The connector supports cost-based join pushdown to make intelligent decisions about whether to push down a join operation to the data source.

When cost-based join pushdown is enabled, the connector only pushes down join operations if the available table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.

The following table describes catalog configuration properties for join pushdown:

Property name	Description	Default value
`join-pushdown.enabled`	Enable join pushdown. Equivalent catalog session property is `join_pushdown_enabled`.	`true`
`join-pushdown.strategy`	Strategy used to evaluate whether join operations are pushed down. Set to `AUTOMATIC` to enable cost-based join pushdown, or `EAGER` to push down joins whenever possible. Note that `EAGER` can push down joins even when table statistics are unavailable, which may result in degraded query performance. Because of this, `EAGER` is only recommended for testing and troubleshooting purposes.	`AUTOMATIC`
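The interaction of the two properties can be summarized as a small decision function. The sketch below is a simplified model under stated assumptions: the function name should_push_down_join is hypothetical, the cost comparison is reduced to a boolean flag supplied by the caller, and the join is assumed to be otherwise eligible for pushdown.

```python
def should_push_down_join(enabled=True, strategy="AUTOMATIC",
                          stats_available=False, cost_improves=False):
    """Model the join pushdown decision described by the properties above.

    Hypothetical helper; the real optimizer derives the cost comparison
    from table statistics instead of taking it as an argument.
    """
    if not enabled:
        return False
    if strategy == "EAGER":
        # EAGER pushes down whenever possible, even without statistics.
        return True
    if strategy == "AUTOMATIC":
        # Without statistics no pushdown occurs; with them, cost decides.
        return stats_available and cost_improves
    raise ValueError(f"unknown strategy: {strategy}")


print(should_push_down_join(stats_available=False))
print(should_push_down_join(stats_available=True, cost_improves=True))
print(should_push_down_join(strategy="EAGER"))
```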

Predicate pushdown support#

The connector supports pushdown of predicates on VARCHAR and NVARCHAR columns if the underlying columns in SQL Server use a case-sensitive collation.

The following operators are pushed down:

=
<>
IN
NOT IN

To ensure correct results, operators are not pushed down for columns using a case-insensitive collation.

Bulk insert#

You can optionally use the bulk copy API to drastically speed up write operations.

Enable bulk copying and a lock on the destination table to meet minimal logging requirements.

The following table shows the relevant catalog configuration properties and their default values:

Bulk load properties#
Property name	Description	Default
`sqlserver.bulk-copy-for-write.enabled`	Use the SQL Server bulk copy API for writes. The corresponding catalog session property is `bulk_copy_for_write`.	`false`
`sqlserver.bulk-copy-for-write.lock-destination-table`	Obtain a bulk update lock on the destination table for write operations. The corresponding catalog session property is `bulk_copy_for_write_lock_destination_table`. Setting is only used when `sqlserver.bulk-copy-for-write.enabled=true`.	`false`

Limitations:

Column names with leading and trailing spaces are not supported.

Data compression#

You can specify the data compression policy for SQL Server tables with the data_compression table property. Valid policies are NONE, ROW or PAGE.

Example:

CREATE TABLE example_schema.scientists (
  recordkey VARCHAR,
  name VARCHAR,
  age BIGINT,
  birthday DATE
)
WITH (
  data_compression = 'ROW'
);
