Breaking SQL queries up into temporary physical or virtual “units” is done for many reasons, including performing complex SQL logic, re-using data for multiple DML/ETL operations (for example, multiple INSERTs), or simply for better SQL code readability/maintainability.
This post discusses the three options available in SQL Server for holding temporary data: the Table Variable, the Temporary Table, and the CTE (Common Table Expression) syntax.
We are essentially trying to find the best-performing method of storing data, but as with many SQL Server optimization tips, it really depends on the particular situation/purpose of the query, and on your server environment setup in terms of resources (memory and CPU).
SQL Server Temporary Table Performance
A SQL Server Temporary Table’s scope spans multiple SQL statements or queries.
Temporary Tables are tables created temporarily in SQL Server’s tempdb database, which has several implications:
- Moving data into a materialized table is I/O intensive, since the data is actually written to disk (in a physical table) and stays there until the Temporary Table goes out of scope.
- Because the table lives in tempdb, performance varies with the current load on tempdb. It could therefore be argued that it is more difficult to keep tight control over performance figures.
Temporary tables are generally used in situations where you have a large table, server memory is at a premium and you don’t want to stress it further, and I/O access is rapid and non-restrictive. They can be the slowest-performing option for temporary storage in SQL Server, because processing data into and out of a temporary table requires both I/O and memory resources. Being dependent on tempdb (which is shared across the instance), it is also generally difficult to ensure consistent performance figures from temporary tables, particularly on instances where tempdb is actively used across databases. This may be the only solution, though, if you have a dataset large enough to do some damage to your memory resources if it were fully held there.
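As a minimal sketch of the pattern described above (table and column names here are hypothetical), a temporary table is created in tempdb, survives across multiple statements, and is dropped when you are done or when it goes out of scope:

```sql
-- Hypothetical example: stage a subset of a large Orders table in tempdb
CREATE TABLE #RecentOrders
(
    OrderId    INT      NOT NULL,
    CustomerId INT      NOT NULL,
    OrderDate  DATETIME NOT NULL
);

INSERT INTO #RecentOrders (OrderId, CustomerId, OrderDate)
SELECT OrderId, CustomerId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

-- The table remains available across subsequent statements in the session
SELECT CustomerId, COUNT(*) AS OrderCount
FROM #RecentOrders
GROUP BY CustomerId;

DROP TABLE #RecentOrders;  -- or let it drop when the session/scope ends
```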
Temporary Table Tip: For very large datasets, and IF the query runs as part of a sequential (non-blocking) ETL operation, it might be worth experimenting with creating a “permanent” temporary table in the relevant database (rather than tempdb). Because tempdb is shared across many different databases, you can place tighter performance controls around the database this query is actually relevant to. This permanent table can be TRUNCATEd at the start and end of the query. I have found this leads to more consistent performance.
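The tip above might look like this in practice; a sketch with hypothetical table and column names, assuming a non-concurrent ETL step:

```sql
-- One-time setup of a "permanent" staging table in the user database (not tempdb)
IF OBJECT_ID(N'dbo.Staging_RecentOrders', N'U') IS NULL
    CREATE TABLE dbo.Staging_RecentOrders
    (
        OrderId    INT      NOT NULL,
        CustomerId INT      NOT NULL,
        OrderDate  DATETIME NOT NULL
    );

TRUNCATE TABLE dbo.Staging_RecentOrders;   -- start clean

INSERT INTO dbo.Staging_RecentOrders (OrderId, CustomerId, OrderDate)
SELECT OrderId, CustomerId, OrderDate
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -30, GETDATE());

-- ... ETL work that reads from dbo.Staging_RecentOrders ...

TRUNCATE TABLE dbo.Staging_RecentOrders;   -- leave it empty for the next run
```

Note that because the table is shared, this pattern only works safely when the ETL step cannot run concurrently with itself.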
Additionally, temporary tables can be very powerful since they allow you to create indexes to aid DML operation performance. This is an important feature of temporary tables (one that does not exist in table variables) and should definitely be considered when handling large datasets and SARG-able DML statements.
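For example (again with hypothetical object names), an index can be added to a temporary table after it is populated, to support a subsequent join or filter:

```sql
CREATE TABLE #CustomerTotals
(
    CustomerId INT   NOT NULL,
    Total      MONEY NOT NULL
);

INSERT INTO #CustomerTotals (CustomerId, Total)
SELECT CustomerId, SUM(Amount)
FROM dbo.OrderLines
GROUP BY CustomerId;

-- DDL is allowed on temporary tables, so we can index the join column
CREATE NONCLUSTERED INDEX IX_CustomerTotals_CustomerId
    ON #CustomerTotals (CustomerId);

SELECT c.CustomerName, t.Total
FROM dbo.Customers AS c
JOIN #CustomerTotals AS t
    ON t.CustomerId = c.CustomerId;
```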
SQL Server Table Variable Performance
A SQL Server Table Variable’s scope spans multiple SQL statements within one batch.
A SQL Server Table Variable differs from a Temporary Table in that it *could* be created in memory, with the whole dataset held and queried there. In that case there is no I/O involvement and no tempdb bottleneck. This is only true while the server is not under memory pressure; under memory pressure, SQL Server will push the table variable’s pages to tempdb. Think of it as “paging” to disk when the OS is under memory pressure.
Table Variables can be clearly superior to Temporary Tables when handling smaller datasets, but this comes with a few caveats:
- Table Variables do not allow DDL statements, which means you cannot create indexes to improve the performance of your table; this can lead to lower performance when they are used in joins, aggregations or filtering. You can sometimes work around this limitation by declaring a UNIQUE constraint in the Table Variable definition.
- Table Variables do not maintain statistics; see the notes below for how this might affect performance.
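The constraint workaround mentioned above can be sketched as follows (object names are hypothetical): no CREATE INDEX is allowed on a table variable, but a UNIQUE or PRIMARY KEY constraint declared inline creates an index behind the scenes.

```sql
DECLARE @TopCustomers TABLE
(
    CustomerId INT   NOT NULL PRIMARY KEY,  -- inline constraint = implicit index
    Total      MONEY NOT NULL
);

INSERT INTO @TopCustomers (CustomerId, Total)
SELECT TOP (100) CustomerId, SUM(Amount)
FROM dbo.OrderLines
GROUP BY CustomerId
ORDER BY SUM(Amount) DESC;

-- Visible across subsequent statements in the same batch
SELECT CustomerId, Total
FROM @TopCustomers;
```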
My advice is to use Table Variables over Temporary Tables when you have a small dataset that does not require indexed operations, and ONLY IF you need your temporary data’s scope to span multiple SQL statements.
SQL Server CTE (Common Table Expression) Performance
A SQL Server CTE’s scope is within one SQL statement or query.
CTEs are really just a syntactical way of breaking up a query, which is why their scope is a single query. In reality, all the SQL Server query processing engine is doing in the background is substituting the table alias with the query itself (so in many ways it is similar to creating an “on-the-fly” SQL VIEW). This means CTEs offer the same benefits as a Table Variable, with the added bonus of lower stress on memory (in some queries).
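To illustrate the substitution described above (table and column names are hypothetical), these two queries are effectively equivalent:

```sql
-- A CTE reads like an "on-the-fly" view...
WITH RecentOrders AS
(
    SELECT OrderId, CustomerId
    FROM dbo.Orders
    WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())
)
SELECT CustomerId, COUNT(*) AS OrderCount
FROM RecentOrders
GROUP BY CustomerId;

-- ...and the engine effectively inlines its definition, as in:
SELECT CustomerId, COUNT(*) AS OrderCount
FROM (SELECT OrderId, CustomerId
      FROM dbo.Orders
      WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())) AS RecentOrders
GROUP BY CustomerId;
```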
I would recommend using a CTE in situations where you don’t need the temporary data’s scope to exceed one query (or statement). This will most likely offer the best (and safest) performance with low server memory and CPU utilization, although of course this depends on your query and table setup.
Common Table Expression Tip: It is important to remember that you can chain CTE statements, creating multiple CTE expressions that reference each other. This essentially means you shouldn’t opt out of the CTE option because of a perceived lack of expressiveness in the SQL language (it is expressive); opt out only if the query has multiple destinations (for example, it uses the temp data to INSERT into multiple tables).
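Chaining looks like this in practice (hypothetical object names); each CTE can reference the ones defined before it in the same WITH clause:

```sql
WITH RecentOrders AS
(
    SELECT OrderId, CustomerId, Amount
    FROM dbo.Orders
    WHERE OrderDate >= DATEADD(DAY, -30, GETDATE())
),
CustomerTotals AS
(
    SELECT CustomerId, SUM(Amount) AS Total
    FROM RecentOrders          -- references the first CTE
    GROUP BY CustomerId
)
SELECT CustomerId, Total
FROM CustomerTotals
WHERE Total > 1000;
```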
In any case, whichever option you decide to go for, I would highly recommend thoroughly testing its impact against your own SQL Server setup. There are many variables to consider here, particularly when it comes to high-usage queries (relatively fast queries that run very often, usually on client-facing websites).
Temporary Table vs Table Variable Performance within Stored Procedures
Temporary storage behaves rather unpredictably when used within the context of a parameterized stored procedure. The issue stems from the classic parameter sniffing and statistics mismatch problem that is regularly encountered when optimizing stored procedure performance.
I will not go into much detail on this subject here, since Paul White covered it in excellent depth in his Temporary Tables in Stored Procedures post. In a nutshell, a Temporary Table within a stored procedure sometimes ends up being cached by the engine (it never actually gets dropped; the object is persisted and reused on each execution). When this happens, statistics are maintained and reused on each execution of the stored procedure, which is likely to lead to a statistics mismatch that propagates as a lower (or higher, but that is not a problem) memory allocation for the query. A memory allocation that is too low leads to the infamous tempdb spill on operations such as sorting or aggregation, causing massive performance bottlenecks.
The problem is magnified with Table Variables: since they maintain no statistics and the objects are discarded after each execution, the SQL engine has no option but to use guesswork statistics, which introduces a random factor into performance figures.