The SQL Server connector allows querying and creating tables in an external Microsoft SQL Server database. This can be used to join data between different systems like SQL Server and Hive, or between two different SQL Server instances.
Requirements#
To connect to SQL Server, you need:
SQL Server 2012 or higher, or Azure SQL Database.
Network access from the Trino coordinator and workers to SQL Server. Port 1433 is the default port.
Configuration#
The connector can query a single database on a given SQL Server instance. Create a catalog properties file that specifies the SQL Server connector by setting connector.name to sqlserver.
For example, to access a database as example, create the file etc/catalog/example.properties. Replace the connection properties as appropriate for your setup:
connector.name=sqlserver
connection-url=jdbc:sqlserver://<host>:<port>;databaseName=<databaseName>;encrypt=false
connection-user=root
connection-password=secret
The connection-url defines the connection information and parameters to pass to the SQL Server JDBC driver. The supported parameters for the URL are available in the SQL Server JDBC driver documentation.
The connection-user and connection-password are typically required and determine the user credentials for the connection, often a service user. You can use secrets to avoid storing actual values in the catalog properties files.
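The four properties above can be assembled programmatically. The following is a minimal sketch, not part of the connector: the function name build_catalog_properties and the host db.example.com are hypothetical, and the sketch assumes none of the values need escaping for the properties format or the JDBC URL.

```python
def build_catalog_properties(host, port, database, user, password, encrypt=False):
    """Return the text of a catalog properties file for the SQL Server connector.

    Hypothetical helper for illustration only. It performs no escaping, so it
    assumes the values contain no characters that are special in a properties
    file or in a JDBC URL.
    """
    url = (
        f"jdbc:sqlserver://{host}:{port};"
        f"databaseName={database};"
        f"encrypt={'true' if encrypt else 'false'}"
    )
    lines = [
        "connector.name=sqlserver",
        f"connection-url={url}",
        f"connection-user={user}",
        f"connection-password={password}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # db.example.com is a placeholder host, not a real server.
    print(build_catalog_properties("db.example.com", 1433, "sales", "root", "secret"))
```

Writing the returned text to etc/catalog/example.properties would produce a catalog named example. In a real deployment, prefer secrets over literal passwords.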
Connection security#
The JDBC driver, and therefore the connector, automatically use Transport Layer Security (TLS) encryption and certificate validation. This requires a suitable TLS certificate configured on your SQL Server database host.
If you do not have the necessary configuration established, you can disable encryption in the connection string with the encrypt property:
connection-url=jdbc:sqlserver://<host>:<port>;databaseName=<databaseName>;encrypt=false
Further parameters like trustServerCertificate, hostNameInCertificate, trustStore, and trustStorePassword are detailed in the TLS section of the SQL Server JDBC driver documentation.
Data source authentication#
The connector can provide credentials for the data source connection in multiple ways:
inline, in the connector configuration file
in a separate properties file
in a key store file
as extra credentials set when connecting to Trino
You can use secrets to avoid storing sensitive values in the catalog properties files.
The following table describes configuration properties for connection credentials:
Property name | Description |
---|---|
| Type of the credential provider. Must be one of |
| Connection user name. |
| Connection password. |
| Name of the extra credentials property, whose value to use as the username. See |
| Name of the extra credentials property, whose value to use as the password. |
| Location of the properties file where credentials are present. It must contain the |
| The location of the Java Keystore file, from which to read credentials. |
| File format of the keystore file, for example |
| Password for the key store. |
| Name of the key store entity to use as the user name. |
| Password for the user name key store entity. |
| Name of the key store entity to use as the password. |
| Password for the password key store entity. |
Multiple SQL Server databases or servers#
The SQL Server connector can only access a single SQL Server database within a single catalog. Thus, if you have multiple SQL Server databases, or want to connect to multiple SQL Server instances, you must configure multiple instances of the SQL Server connector.
To add another catalog, simply add another properties file to etc/catalog with a different name, making sure it ends in .properties. For example, if you name the property file sales.properties, Trino creates a catalog named sales using the configured connector.
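The naming rule above (file name minus the .properties suffix) can be sketched as follows. The helper catalog_name is hypothetical and only illustrates the convention; it is not how Trino itself loads catalogs.

```python
from pathlib import PurePosixPath


def catalog_name(properties_path):
    """Derive the catalog name from a catalog properties file path.

    Hypothetical helper: it only mirrors the documented rule that the file
    must end in .properties and that the remainder of the file name becomes
    the catalog name.
    """
    path = PurePosixPath(properties_path)
    if path.suffix != ".properties":
        raise ValueError(f"{properties_path!r} does not end in .properties")
    return path.stem


if __name__ == "__main__":
    print(catalog_name("etc/catalog/sales.properties"))
```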
General configuration properties#
The following table describes general catalog configuration properties for the connector:
Property name | Description |
---|---|
| Support case insensitive schema and table names. Defaults to |
| Duration for which case insensitive schema and table names are cached. Defaults to |
| Path to a name mapping configuration file in JSON format that allows Trino to disambiguate between schemas and tables with similar names in different cases. Defaults to |
| Frequency with which Trino checks the name matching configuration file for changes. The duration value defaults to |
| Duration for which metadata, including table and column statistics, is cached. Defaults to |
| Cache the fact that metadata, including table and column statistics, is not available. Defaults to |
| Duration for which schema metadata is cached. Defaults to the value of |
| Duration for which table metadata is cached. Defaults to the value of |
| Duration for which table statistics are cached. Defaults to the value of |
| Maximum number of objects stored in the metadata cache. Defaults to |
| Maximum number of statements in a batched execution. Do not change this setting from the default. Non-default values may negatively impact performance. Defaults to |
| Push down dynamic filters into JDBC queries. Defaults to |
| Maximum duration for which Trino waits for dynamic filters to be collected from the build side of joins before starting a JDBC query. Using a large timeout can potentially result in more detailed dynamic filters. However, it can also increase latency for some queries. Defaults to |
Appending query metadata#
The optional parameter query.comment-format allows you to configure a SQL comment that is sent to the data source with each query. The format of this comment can contain any characters and the following metadata:
$QUERY_ID: The identifier of the query.
$USER: The name of the user who submits the query to Trino.
$SOURCE: The identifier of the client tool used to submit the query, for example trino-cli.
$TRACE_TOKEN: The trace token configured with the client tool.
The comment can provide more context about the query. This additional information is available in the logs of the data source. To include environment variables from the Trino cluster with the comment, use the ${ENV:VARIABLE-NAME} syntax.
The following example sets a simple comment that identifies each query sent by Trino:
query.comment-format=Query sent by Trino.
With this configuration, a query such as SELECT * FROM example_table; is sent to the data source with the comment appended:
SELECT * FROM example_table; /*Query sent by Trino.*/
The following example improves on the preceding example by using metadata:
query.comment-format=Query $QUERY_ID sent by user $USER from Trino.
If Jane sent the query with the query identifier 20230622_180528_00000_bkizg, the following comment string is sent to the data source:
SELECT * FROM example_table; /*Query 20230622_180528_00000_bkizg sent by user Jane from Trino.*/
Note
Certain JDBC driver settings and logging configurations might cause the comment to be removed.
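The placeholder substitution described in this section can be imitated with Python's string.Template, which happens to use the same $NAME syntax. This is a sketch of the documented behavior, not Trino's implementation: the function append_comment is hypothetical, and the sketch does not handle the ${ENV:VARIABLE-NAME} syntax.

```python
from string import Template


def append_comment(sql, comment_format, metadata):
    """Append a rendered comment to a SQL statement.

    Hypothetical helper: placeholders such as $QUERY_ID and $USER are filled
    from the metadata mapping; unknown placeholders are left untouched.
    """
    comment = Template(comment_format).safe_substitute(metadata)
    return f"{sql} /*{comment}*/"


if __name__ == "__main__":
    print(
        append_comment(
            "SELECT * FROM example_table;",
            "Query $QUERY_ID sent by user $USER from Trino.",
            {"QUERY_ID": "20230622_180528_00000_bkizg", "USER": "Jane"},
        )
    )
```

With the values from the example above, the sketch reproduces the comment string shown for Jane's query.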
Domain compaction threshold#
Pushing down a large list of predicates to the data source can compromise performance. Trino compacts large predicates into a simpler range predicate by default to ensure a balance between performance and predicate pushdown. If necessary, the threshold for this compaction can be increased to improve performance when the data source is capable of taking advantage of large predicates. Increasing this threshold may improve pushdown of large dynamic filters. The domain-compaction-threshold catalog configuration property or the domain_compaction_threshold catalog session property can be used to adjust the default value of 32 for this threshold.
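The idea of compaction can be illustrated with a simplified model. The sketch below is an assumption-laden approximation, not Trino's algorithm: the function compact_domain and its return shape are hypothetical, it only handles lists of discrete values, and whether the real boundary is inclusive or exclusive at the threshold is not stated in this document.

```python
def compact_domain(values, threshold=32):
    """Model compaction of a list of discrete predicate values.

    Simplified illustration: up to `threshold` distinct values are kept as an
    IN-style list; larger lists collapse into a single [low, high] range that
    covers all of them (and possibly values in between).
    """
    distinct = sorted(set(values))
    if len(distinct) <= threshold:
        return {"kind": "in", "values": distinct}
    return {"kind": "range", "low": distinct[0], "high": distinct[-1]}


if __name__ == "__main__":
    print(compact_domain(range(10)))
    print(compact_domain(range(100)))
    print(compact_domain(range(100), threshold=200))
```

The range form is cheaper to send but less selective, which is the trade-off the threshold controls.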
Specific configuration properties#
The SQL Server connector supports additional catalog properties to configure the behavior of the connector and the queries it issues to the database.
Property name | Description |
---|---|
| Control the automatic use of snapshot isolation for transactions issued by Trino in SQL Server. Defaults to |
Case insensitive matching#
When case-insensitive-name-matching is set to true, Trino is able to query non-lowercase schemas and tables by maintaining a mapping of the lowercase name to the actual name in the remote system. However, if two schemas and/or tables have names that differ only in case (such as “customers” and “Customers”) then Trino fails to query them due to ambiguity.
In these cases, use the case-insensitive-name-matching.config-file catalog configuration property to specify a configuration file that maps these remote schemas/tables to their respective Trino schemas/tables:
{
  "schemas": [
    {
      "remoteSchema": "CaseSensitiveName",
      "mapping": "case_insensitive_1"
    },
    {
      "remoteSchema": "cASEsENSITIVEnAME",
      "mapping": "case_insensitive_2"
    }
  ],
  "tables": [
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "tablex",
      "mapping": "table_1"
    },
    {
      "remoteSchema": "CaseSensitiveName",
      "remoteTable": "TABLEX",
      "mapping": "table_2"
    }
  ]
}
Queries against one of the tables or schemas defined in the mapping attributes are run against the corresponding remote entity. For example, a query against tables in the case_insensitive_1 schema is forwarded to the CaseSensitiveName schema and a query against case_insensitive_2 is forwarded to the cASEsENSITIVEnAME schema.
At the table mapping level, a query on case_insensitive_1.table_1 as configured above is forwarded to CaseSensitiveName.tablex, and a query on case_insensitive_1.table_2 is forwarded to CaseSensitiveName.TABLEX.
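The forwarding rules above can be traced with a small lookup over the same JSON. This is a sketch under assumptions: the resolve function is hypothetical, and returning unmapped names unchanged is a simplification rather than the connector's actual fallback behavior.

```python
import json

# Same mapping as in the configuration file example.
MAPPING_JSON = """
{
  "schemas": [
    {"remoteSchema": "CaseSensitiveName", "mapping": "case_insensitive_1"},
    {"remoteSchema": "cASEsENSITIVEnAME", "mapping": "case_insensitive_2"}
  ],
  "tables": [
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "tablex", "mapping": "table_1"},
    {"remoteSchema": "CaseSensitiveName", "remoteTable": "TABLEX", "mapping": "table_2"}
  ]
}
"""

CONFIG = json.loads(MAPPING_JSON)


def resolve(config, trino_schema, trino_table=None):
    """Translate Trino schema/table names to the remote names in the mapping.

    Hypothetical helper: names without a mapping entry are returned unchanged,
    which is an assumption made to keep the sketch short.
    """
    schemas = {e["mapping"]: e["remoteSchema"] for e in config.get("schemas", [])}
    tables = {
        (e["remoteSchema"], e["mapping"]): e["remoteTable"]
        for e in config.get("tables", [])
    }
    remote_schema = schemas.get(trino_schema, trino_schema)
    if trino_table is None:
        return remote_schema, None
    return remote_schema, tables.get((remote_schema, trino_table), trino_table)


if __name__ == "__main__":
    print(resolve(CONFIG, "case_insensitive_1", "table_1"))
    print(resolve(CONFIG, "case_insensitive_1", "table_2"))
    print(resolve(CONFIG, "case_insensitive_2"))
```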
By default, when a change is made to the mapping configuration file, Trino must be restarted to load the changes. Optionally, you can set the case-insensitive-name-mapping.refresh-period to have Trino refresh the properties without requiring a restart:
case-insensitive-name-mapping.refresh-period=30s
Non-transactional INSERT#
The connector supports adding rows using INSERT statements. By default, data insertion is performed by writing data to a temporary table. You can skip this step to improve performance and write directly to the target table. Set the insert.non-transactional-insert.enabled catalog property or the corresponding non_transactional_insert catalog session property to true.
Note that with this property enabled, data can be corrupted in rare cases where exceptions occur during the insert operation. With transactions disabled, no rollback can be performed.
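The difference between the two write paths can be demonstrated in miniature. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it says nothing about SQL Server or the connector internals; the table names target and staging and all function names are hypothetical. It only shows why writing through a temporary table protects the target from partial writes when a row fails midway.

```python
import sqlite3


def make_connection():
    # isolation_level=None selects autocommit mode, so every statement
    # takes effect immediately and nothing can be rolled back afterwards.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE target (id INTEGER NOT NULL)")
    return conn


def insert_direct(conn, rows):
    """Write each row straight into the target table."""
    for row in rows:
        conn.execute("INSERT INTO target (id) VALUES (?)", (row,))


def insert_via_staging(conn, rows):
    """Write rows to a temporary table first, then copy them in one statement."""
    conn.execute("CREATE TEMP TABLE staging (id INTEGER NOT NULL)")
    try:
        for row in rows:
            conn.execute("INSERT INTO staging (id) VALUES (?)", (row,))
        conn.execute("INSERT INTO target (id) SELECT id FROM staging")
    finally:
        conn.execute("DROP TABLE staging")


def count_rows(conn):
    return conn.execute("SELECT count(*) FROM target").fetchone()[0]


if __name__ == "__main__":
    conn = make_connection()
    bad_batch = [1, 2, None]  # None violates the NOT NULL constraint
    for writer in (insert_via_staging, insert_direct):
        try:
            writer(conn, bad_batch)
        except sqlite3.IntegrityError:
            pass
        print(writer.__name__, "left", count_rows(conn), "rows in target")
```

In this model the staged write leaves the target untouched after the failure, while the direct write leaves the rows that were inserted before the failing one.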
Fault-tolerant execution support#
The connector supports Fault-tolerant execution of query processing. Read and write operations are both supported with any retry policy.
Querying SQL Server#
The SQL Server connector provides access to all schemas visible to the specified user in the configured database. For the following examples, assume the SQL Server catalog is example.
You can see the available schemas by running SHOW SCHEMAS:
SHOW SCHEMAS FROM example;
If you have a schema named web, you can view the tables in this schema by running SHOW TABLES:
SHOW TABLES FROM example.web;
You can see a list of the columns in the clicks table in the web schema using either of the following:
DESCRIBE example.web.clicks;
SHOW COLUMNS FROM example.web.clicks;
Finally, you can query the clicks table in the web schema:
SELECT * FROM example.web.clicks;
If you used a different name for your catalog properties file, use that catalog name instead of example in the above examples.
Type mapping#
Because Trino and SQL Server each support types that the other does not, this connector modifies some types when reading or writing data. Data types may not map the same way in both directions between Trino and the data source. Refer to the following sections for type mapping in each direction.
SQL Server type to Trino type mapping#
The connector maps SQL Server types to the corresponding Trino types following this table:
SQL Server database type | Trino type | Notes |
---|---|---|
|
| |
|
| SQL Server |
|
| |
|
| |
|
| |
|
| |
|
| See Numeric type mapping |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
|
| |
|
|
|
|
| |
|
|
|
|
|
|
|
| |
|
|
|
Trino type to SQL Server type mapping#
The connector maps Trino types to the corresponding SQL Server types following this table:
Trino type | SQL Server type | Notes |
---|---|---|
|
| |
|
| Trino only supports writing values belonging to |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| See Character type mapping |
|
| See Character type mapping |
|
| |
|
| |
|
|
|
|
|
|
Complete list of SQL Server data types.
Numeric type mapping#
For SQL Server FLOAT[(n)]:
If n is not specified, maps to Trino DOUBLE
If 1 <= n <= 24, maps to Trino REAL
If 24 < n <= 53, maps to Trino DOUBLE
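The FLOAT[(n)] rules above translate directly into a small function. The name trino_type_for_float is hypothetical, and rejecting values of n outside 1 to 53 is an assumption of this sketch rather than documented connector behavior.

```python
def trino_type_for_float(n=None):
    """Return the Trino type name for SQL Server FLOAT[(n)].

    Hypothetical helper mirroring the three documented cases. Values of n
    outside 1..53 raise ValueError, which is a choice made for this sketch.
    """
    if n is None:
        return "DOUBLE"
    if 1 <= n <= 24:
        return "REAL"
    if 24 < n <= 53:
        return "DOUBLE"
    raise ValueError(f"FLOAT precision out of range: {n}")


if __name__ == "__main__":
    for precision in (None, 1, 24, 25, 53):
        print(precision, "->", trino_type_for_float(precision))
```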
Character type mapping#
For Trino CHAR(n):
If 1 <= n <= 4000, maps to SQL Server NCHAR(n)
If n > 4000, maps to SQL Server NVARCHAR(max)
For Trino VARCHAR(n):
If 1 <= n <= 4000, maps to SQL Server NVARCHAR(n)
If n > 4000, maps to SQL Server NVARCHAR(max)
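The length thresholds for CHAR(n) and VARCHAR(n) can be sketched the same way. Both function names are hypothetical, and the sketch covers only the bounded cases listed above; it makes no claim about unbounded VARCHAR.

```python
MAX_BOUNDED_LENGTH = 4000  # documented cutoff for NCHAR(n) and NVARCHAR(n)


def _check_length(n):
    if n < 1:
        raise ValueError(f"length must be at least 1, got {n}")


def sqlserver_type_for_char(n):
    """Return the SQL Server type for Trino CHAR(n) (hypothetical helper)."""
    _check_length(n)
    return f"NCHAR({n})" if n <= MAX_BOUNDED_LENGTH else "NVARCHAR(max)"


def sqlserver_type_for_varchar(n):
    """Return the SQL Server type for Trino VARCHAR(n) (hypothetical helper)."""
    _check_length(n)
    return f"NVARCHAR({n})" if n <= MAX_BOUNDED_LENGTH else "NVARCHAR(max)"


if __name__ == "__main__":
    print(sqlserver_type_for_char(10))
    print(sqlserver_type_for_char(4001))
    print(sqlserver_type_for_varchar(4000))
    print(sqlserver_type_for_varchar(8000))
```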
Type mapping configuration properties#
The following properties can be used to configure how data types from the connected data source are mapped to Trino data types and how the metadata is cached in Trino.
Property name | Description | Default value |
---|---|---|
| Configure how unsupported column data types are handled:
The respective catalog session property is |
|
| Allow forced mapping of comma separated lists of data types to convert to unbounded |
SQL support#
The connector provides read access and write access to data and metadata in SQL Server. In addition to the globally available and read operation statements, the connector supports the following features:
INSERT
UPDATE
DELETE
TRUNCATE
Schema and table management
UPDATE#
Only UPDATE statements with constant assignments and predicates are supported. For example, the following statement is supported because the values assigned are constants:
UPDATE table SET col1 = 1 WHERE col3 = 1
Arithmetic expressions, function calls, and other non-constant UPDATE statements are not supported. For example, the following statement is not supported because arithmetic expressions cannot be used with the SET command:
UPDATE table SET col1 = col2 + 2 WHERE col3 = 1
A single statement cannot update all column values of a table row simultaneously. For a three-column table, the following statement is not supported:
UPDATE table SET col1 = 1, col2 = 2, col3 = 3 WHERE col3 = 1
SQL DELETE#
If a WHERE clause is specified, the DELETE operation only works if the predicate in the clause can be fully pushed down to the data source.
ALTER TABLE RENAME TO#
The connector does not support renaming tables across multiple schemas. For example, the following statement is supported:
ALTER TABLE example.schema_one.table_one RENAME TO example.schema_one.table_two
The following statement attempts to rename a table across schemas, and therefore is not supported:
ALTER TABLE example.schema_one.table_one RENAME TO example.schema_two.table_two
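The same-schema restriction can be checked before issuing a rename. The helper below is a hypothetical sketch: it assumes fully qualified catalog.schema.table names and does not handle quoted identifiers that contain dots.

```python
def is_supported_rename(source, target):
    """Check that a rename keeps the table in the same catalog and schema.

    Hypothetical helper: both names must be written as catalog.schema.table.
    Quoted identifiers containing dots are not handled in this sketch.
    """
    source_parts = source.split(".")
    target_parts = target.split(".")
    if len(source_parts) != 3 or len(target_parts) != 3:
        raise ValueError("expected names of the form catalog.schema.table")
    # Only the table name (the last part) is allowed to differ.
    return source_parts[:2] == target_parts[:2]


if __name__ == "__main__":
    print(is_supported_rename("example.schema_one.table_one", "example.schema_one.table_two"))
    print(is_supported_rename("example.schema_one.table_one", "example.schema_two.table_two"))
```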
Procedures#
system.flush_metadata_cache()#
Flush JDBC metadata caches. For example, the following system call flushes the metadata caches for all schemas in the example catalog:
USE example.example_schema;
CALL system.flush_metadata_cache();
system.execute('query')#
The execute procedure allows you to execute a query in the underlying data source directly. The query must use supported syntax of the connected data source. Use the procedure to access features which are not available in Trino or to execute queries that return no result set and therefore cannot be used with the query or raw_query pass-through table function. Typical use cases are statements that create or alter objects, and require native features such as constraints, default values, automatic identifier creation, or indexes. Queries can also invoke statements that insert, update, or delete data, and do not return any data as a result.
The query text is not parsed by Trino, only passed through, and therefore only subject to any security or access control of the underlying data source.
The following example sets the current schema to example_schema in the example catalog. Then it calls the procedure in that schema to drop the default value from your_column on the your_table table, using standard SQL syntax in the value assigned to the query parameter:
USE example.example_schema;
CALL system.execute(query => 'ALTER TABLE your_table ALTER COLUMN your_column DROP DEFAULT');
Verify that the specific database supports this syntax, and adapt as necessary based on the documentation for the specific connected database and database version.
Table functions#
The connector provides specific table functions to access SQL Server.
query(varchar) -> table#
The query function allows you to query the underlying database directly. It requires syntax native to SQL Server, because the full query is pushed down and processed in SQL Server. This can be useful for accessing native features which are not implemented in Trino or for improving query performance in situations where running a query natively may be faster.
The native query passed to the underlying data source is required to return a table as a result set. Only the data source performs validation or security checks for these queries using its own configuration. Trino does not perform these tasks. Only use passthrough queries to read data.
For example, query the example catalog and select the top 10 percent of nations by population:
SELECT *
FROM TABLE(
  example.system.query(
    query => 'SELECT TOP(10) PERCENT * FROM tpch.nation ORDER BY population DESC'
  )
);
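When the native query is assembled by a client program, it has to be embedded as a SQL string literal. The following sketch shows one way to do that; wrap_native_query is a hypothetical helper, and it assumes the catalog name is a plain identifier that needs no quoting. Single quotes in the native query are doubled, which is the standard SQL escape inside a string literal.

```python
def wrap_native_query(catalog, native_sql):
    """Wrap a native SQL Server query in a Trino query() table function call.

    Hypothetical helper: it only builds the statement text. The catalog name
    is assumed to be a simple identifier; single quotes inside the native
    query are doubled so the text stays one valid string literal.
    """
    escaped = native_sql.replace("'", "''")
    return (
        "SELECT *\n"
        "FROM TABLE(\n"
        f"  {catalog}.system.query(\n"
        f"    query => '{escaped}'\n"
        "  )\n"
        ")"
    )


if __name__ == "__main__":
    print(
        wrap_native_query(
            "example",
            "SELECT TOP(10) PERCENT * FROM tpch.nation ORDER BY population DESC",
        )
    )
```

Because Trino does not validate the native query, treat any text passed to such a helper as trusted input only.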
procedure(varchar) -> table#
The procedure function allows you to run stored procedures on the underlying database directly. It requires syntax native to SQL Server, because the full query is pushed down and processed in SQL Server. In order to use this table function, set sqlserver.experimental.stored-procedure-table-function-enabled to true.
Note
The procedure function does not support running stored procedures that return multiple statements, use a non-select statement, use output parameters, or use conditional statements.
Warning
This feature is experimental only. The function has security implications, and its syntax might change in backward-incompatible ways.
The following example runs the stored procedure employee_sp in the example catalog and the example_schema schema in the underlying SQL Server database:
SELECT *
FROM TABLE(
  example.system.procedure(
    query => 'EXECUTE example_schema.employee_sp'
  )
);
If the stored procedure employee_sp requires any input, append the parameter value to the procedure statement:
SELECT *
FROM TABLE(
  example.system.procedure(
    query => 'EXECUTE example_schema.employee_sp 0'
  )
);
Note
The query engine does not preserve the order of the results of this function. If the passed query contains an ORDER BY clause, the function result may not be ordered as expected.
Performance#
The connector includes a number of performance improvements, detailed in the following sections.
Table statistics#
The SQL Server connector can use table and column statistics for cost-based optimizations, to improve query processing performance based on the actual data in the data source.
The statistics are collected by SQL Server and retrieved by the connector.
The connector can use information stored in single-column statistics. SQL Server Database can automatically create column statistics for certain columns. If column statistics are not created automatically for a certain column, you can create them by executing the following statement in SQL Server Database:
CREATE STATISTICS example_statistics_name ON table_schema.table_name (column_name);
SQL Server Database routinely updates the statistics. In some cases, you may want to force a statistics update (e.g. after defining new column statistics or after changing data in the table). You can do that by executing the following statement in SQL Server Database:
UPDATE STATISTICS table_schema.table_name;
Refer to SQL Server documentation for information about options, limitations and additional considerations.
Pushdown#
The connector supports pushdown for a number of operations:
Join pushdown
Limit pushdown
Top-N pushdown
Aggregate pushdown for the following functions:
avg()
count()
max()
min()
sum()
stddev()
stddev_pop()
stddev_samp()
variance()
var_pop()
var_samp()
Note
The connector performs pushdown where performance may be improved, but in order to preserve correctness an operation may not be pushed down. When pushdown of an operation may result in better performance but risks correctness, the connector prioritizes correctness.
Cost-based join pushdown#
The connector supports cost-based Join pushdown to make intelligent decisions about whether to push down a join operation to the data source.
When cost-based join pushdown is enabled, the connector only pushes down join operations if the available Table statistics suggest that doing so improves performance. Note that if no table statistics are available, join operation pushdown does not occur to avoid a potential decrease in query performance.
The following table describes catalog configuration properties for join pushdown:
Property name | Description | Default value |
---|---|---|
| Enable join pushdown. Equivalent catalog session property is |
|
| Strategy used to evaluate whether join operations are pushed down. Set to |
|
Predicate pushdown support#
The connector supports pushdown of predicates on VARCHAR and NVARCHAR columns if the underlying columns in SQL Server use a case-sensitive collation.
The following operators are pushed down:
=
<>
IN
NOT IN
To ensure correct results, operators are not pushed down for columns using a case-insensitive collation.
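The rule for character columns can be expressed as a small decision function. This is a sketch of the documented rule only: is_predicate_pushed_down is a hypothetical name, the caller is assumed to already know whether the column's collation is case-sensitive, and other column types are outside its scope.

```python
# Operators listed above as eligible for pushdown on VARCHAR and NVARCHAR columns.
PUSHED_DOWN_OPERATORS = frozenset({"=", "<>", "IN", "NOT IN"})


def is_predicate_pushed_down(operator, collation_is_case_sensitive):
    """Decide whether a predicate on a character column is pushed down.

    Hypothetical helper: pushdown requires both an eligible operator and a
    case-sensitive collation on the underlying SQL Server column.
    """
    # Normalize case and internal whitespace, so "not  in" matches "NOT IN".
    normalized = " ".join(operator.upper().split())
    return collation_is_case_sensitive and normalized in PUSHED_DOWN_OPERATORS


if __name__ == "__main__":
    print(is_predicate_pushed_down("=", True))
    print(is_predicate_pushed_down("=", False))
    print(is_predicate_pushed_down("LIKE", True))
```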
Bulk insert#
You can optionally use the bulk copy API to drastically speed up write operations.
Enable bulk copying and a lock on the destination table to meet minimal logging requirements.
The following table shows the relevant catalog configuration properties and their default values:
Property name | Description | Default |
---|---|---|
| Use the SQL Server bulk copy API for writes. The corresponding catalog session property is |
|
| Obtain a bulk update lock on the destination table for write operations. The corresponding catalog session property is |
|
Limitations:
Column names with leading and trailing spaces are not supported.
Data compression#
You can specify the data compression policy for SQL Server tables with the data_compression table property. Valid policies are NONE, ROW or PAGE.
Example:
CREATE TABLE example_schema.scientists (
  recordkey VARCHAR,
  name VARCHAR,
  age BIGINT,
  birthday DATE
)
WITH (
  data_compression = 'ROW'
);
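A client that generates CREATE TABLE statements might validate the policy before building the WITH clause. The helper below is a hypothetical sketch; normalizing the policy to upper case is an assumption of the sketch, not documented connector behavior.

```python
VALID_POLICIES = ("NONE", "ROW", "PAGE")  # policies listed in this section


def data_compression_clause(policy):
    """Build a WITH clause for the data_compression table property.

    Hypothetical helper: the policy is upper-cased (an assumption of this
    sketch) and rejected unless it is one of the documented values.
    """
    normalized = policy.upper()
    if normalized not in VALID_POLICIES:
        raise ValueError(f"unsupported data compression policy: {policy!r}")
    return f"WITH (data_compression = '{normalized}')"


if __name__ == "__main__":
    print(data_compression_clause("ROW"))
```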