SQL Server Table Variable vs Temporary Table vs CTE
Breaking SQL queries up into temporary physical or virtual “units” is done for many reasons, including performing complex SQL logic, re-using data across multiple DML/ETL operations (for example, multiple INSERTs), or simply improving SQL code readability and maintainability.
This post discusses the 3 available options in SQL Server for holding temporary data: the SQL Table Variable, the SQL Temporary Table, and the CTE (Common Table Expression) SQL syntax.
We are trying to essentially find the best method to store data in terms of performance, but as with many SQL Server optimization tips, it really depends on the particular situation/purpose of the query, and the server environment setup you have, in terms of resources (memory and CPU).
SQL Server Temporary Table Performance
SQL Server Temporary Table’s scope is across multiple SQL statements or queries.
Temporary Tables are usually tables that are temporarily created in SQL Server’s tempdb database, which has several implications:
- Moving data to a materialized table is I/O intensive, since the data is actually written to disk (in a physical table) and remains there until the Temporary Table’s scope expires.
- Because the table lives in tempdb, performance varies with the current stress load on tempdb. So it could be argued that it is more difficult to keep tight control over performance figures.
Temporary tables are generally used in situations where you have a large dataset, server memory is of paramount importance and you don’t want to stress it further, and I/O access is rapid and unrestricted. A temporary table can be the slowest-performing option for temporary storage in SQL Server: it requires both I/O and memory resources to move data into and out of the table, and because it depends on tempdb (a database shared across the whole instance), it is generally difficult to ensure consistent performance figures, particularly on instances where tempdb is actively used by other databases. It may be the only viable option, however, if you have a dataset so large that holding it fully in memory would do real damage to your memory resources.
Temporary Table Tip: For very large datasets, and IF the query runs as part of a sequential (non-blocking) ETL operation, it might be worth experimenting with creating a “permanent” temporary table in the relevant database (rather than tempdb). Because tempdb is shared across many different databases, you can usually place tighter performance controls around the database the query actually targets. This permanent table can be TRUNCATEd at the start and end of the query. I found this leads to more consistent performance.
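The staging-table pattern described above can be sketched as follows; the table and column names here are purely illustrative, not from any real schema:

```sql
-- Hypothetical permanent staging table, created once in the user database
-- (not tempdb), so it is subject to that database's own performance controls.
CREATE TABLE dbo.OrderStaging (
    OrderID   INT            NOT NULL,
    OrderDate DATETIME       NOT NULL,
    Total     DECIMAL(18, 2) NOT NULL
);
GO

-- Each ETL run: empty the table, load it, use it, empty it again.
TRUNCATE TABLE dbo.OrderStaging;   -- start of the batch

INSERT INTO dbo.OrderStaging (OrderID, OrderDate, Total)
SELECT OrderID, OrderDate, Total
FROM dbo.Orders
WHERE OrderDate >= '20240101';

-- ... downstream DML reads from dbo.OrderStaging here ...

TRUNCATE TABLE dbo.OrderStaging;   -- end of the batch, ready for the next run
```

Note that TRUNCATE is minimally logged, which is part of why this pattern tends to give consistent run-to-run timings; the sketch assumes only one ETL run uses the staging table at a time, as the post's "sequential (non-blocking)" caveat requires.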
Additionally, temporary tables can be very powerful because they allow you to create indexes to aid DML operation performance. This is an important feature of temporary tables (one that does not exist in table variables) and should definitely be considered when handling large datasets and SARG-able DML statements.
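As a minimal sketch of that indexing capability (table and index names are hypothetical):

```sql
-- Temporary table in tempdb; visible to this session across statements.
CREATE TABLE #Orders (
    OrderID    INT            NOT NULL,
    CustomerID INT            NOT NULL,
    Total      DECIMAL(18, 2) NOT NULL
);

INSERT INTO #Orders (OrderID, CustomerID, Total)
SELECT OrderID, CustomerID, Total
FROM dbo.Orders
WHERE OrderDate >= '20240101';

-- DDL is allowed on temp tables: add an index to support a
-- SARG-able join on CustomerID in the statement below.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON #Orders (CustomerID);

SELECT c.CustomerName, SUM(o.Total) AS CustomerTotal
FROM #Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerName;

DROP TABLE #Orders;   -- or let it drop when the session ends
```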
SQL Server Table Variable Performance
SQL Server Table Variable’s scope is across multiple SQL statements within one query.
SQL Server Table Variable is different from a Temporary Table in that it *could* be created in memory: the whole dataset is loaded into memory and queried from there, meaning no I/O involvement and no tempdb bottlenecks. This is only true while the server is not under memory pressure; under memory pressure, SQL Server will push the table variable’s pages out to tempdb, much as an operating system “pages” to disk when it runs low on memory.
Table Variables can be clearly superior to Temporary Tables when handling smaller datasets, but this comes with a few caveats:
- Table Variables do not allow DDL statements, so you cannot create indexes after the fact to improve the performance of your table, which can lead to lower performance when the table variable is used in joins, aggregations or filtering. You can sometimes work around this limitation by declaring a UNIQUE (or PRIMARY KEY) constraint in the Table Variable’s definition, which creates an index behind the scenes.
- Table Variables do not maintain statistics, see notes below for how this might affect the performance.
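A minimal sketch of the constraint workaround mentioned above (names are illustrative): the inline PRIMARY KEY in the declaration is the only way to get an index on a table variable, since DDL is not permitted afterwards.

```sql
-- Table variable: scope is the current batch, no DDL allowed after this.
DECLARE @RecentOrders TABLE (
    OrderID INT            NOT NULL PRIMARY KEY,  -- inline constraint creates
                                                  -- a unique index on OrderID
    Total   DECIMAL(18, 2) NOT NULL
);

INSERT INTO @RecentOrders (OrderID, Total)
SELECT OrderID, Total
FROM dbo.Orders
WHERE OrderDate >= DATEADD(DAY, -7, GETDATE());

-- The index on OrderID supports this SARG-able point lookup,
-- but note the optimizer still has no statistics on @RecentOrders.
SELECT Total
FROM @RecentOrders
WHERE OrderID = 42;
```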
My opinion is to use Table Variables over Temporary Tables when you have a small dataset that does not require indexed operations, and ONLY IF you need your temporary data’s scope to span multiple SQL statements.
SQL Server CTE (Common Table Expression) Performance
SQL Server CTE’s scope is within one SQL statement or query.
CTEs are actually only a syntactical way of breaking up a query, which is why their scope is a single query: in reality, all the SQL Server query processing engine does in the background is substitute the table alias with the query itself (so in many ways it is similar to creating an “on-the-fly” SQL VIEW). This means they offer the same benefits as a Table Variable, with the added bonus of lower stress on memory (in some queries).
I would recommend using a CTE in situations where you don’t need your temporary data’s scope to exceed one query (or statement). This will most likely offer the best (and safest) performance option with low server memory and CPU utilization, although of course this depends on your query and table setup.
Common Table Expression Tip: It is important to remember that you can chain CTEs, defining multiple CTEs in one expression with each one referencing those before it. What this essentially means is that you shouldn’t opt out of the CTE option because of any perceived lack of expressiveness in the SQL syntax, because it is expressive; opt out only if the query has multiple destinations (for example, it uses the temp data to INSERT into multiple tables).
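The chaining described above can be sketched like this (table and column names are hypothetical): the second CTE references the first, and only the final SELECT is the query’s single destination.

```sql
WITH MonthlySales AS (
    -- First CTE: aggregate orders to one row per customer per month.
    SELECT
        CustomerID,
        DATEFROMPARTS(YEAR(OrderDate), MONTH(OrderDate), 1) AS SalesMonth,
        SUM(Total) AS MonthTotal
    FROM dbo.Orders
    GROUP BY CustomerID, YEAR(OrderDate), MONTH(OrderDate)
),
RankedSales AS (
    -- Second CTE: references the first one by name, like an inline view.
    SELECT
        CustomerID,
        SalesMonth,
        MonthTotal,
        RANK() OVER (PARTITION BY SalesMonth
                     ORDER BY MonthTotal DESC) AS Rnk
    FROM MonthlySales
)
-- Single destination: top three customers per month.
SELECT CustomerID, SalesMonth, MonthTotal
FROM RankedSales
WHERE Rnk <= 3;
```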
In any case, whatever option you decide to go for, I would highly recommend thoroughly testing the impact of each against your own SQL Server setup, there are so many variables to consider here, particularly when it comes down to high usage queries (relatively fast queries that are used very often, usually on client facing websites).
Temporary Table vs Table Variable Performance within Stored Procedures
Temporary storage behaves in a rather unpredictable manner when utilized within the context of a parameterized stored procedure. The issue stems from a classic parameter sniffing and statistics mismatch problem that is regularly encountered when optimizing stored procedure performance.
I will not go into much detail here with regard to this subject, since Paul White covered it in excellent depth in his Temporary Tables in Stored Procedures post. But in a nutshell: a Temporary Table within a stored procedure sometimes ends up being cached by the engine (it never actually gets dropped; the object is persisted and reused after each execution). When this happens, its statistics are maintained and reused on each execution of the stored procedure as well, which is likely to lead to a statistics mismatch that propagates as a lower (or higher, but that is not a problem) memory allocation for the query. A memory allocation that is too low leads to the infamous TempDB spill on operations such as sorting or aggregation, causing massive performance bottlenecks.
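One commonly cited mitigation (my addition, not something the post itself prescribes; procedure and table names are hypothetical) is to add OPTION (RECOMPILE) to the statement that consumes the temp table, so the plan and its memory grant are recompiled against the current row count rather than a cached one:

```sql
CREATE PROCEDURE dbo.GetTopCustomers
    @MinTotal DECIMAL(18, 2)
AS
BEGIN
    CREATE TABLE #Filtered (
        CustomerID INT            NOT NULL,
        Total      DECIMAL(18, 2) NOT NULL
    );

    INSERT INTO #Filtered (CustomerID, Total)
    SELECT CustomerID, SUM(Total)
    FROM dbo.Orders
    GROUP BY CustomerID
    HAVING SUM(Total) >= @MinTotal;

    SELECT CustomerID, Total
    FROM #Filtered
    ORDER BY Total DESC          -- the sort is where an undersized memory
    OPTION (RECOMPILE);          -- grant would spill to tempdb; recompiling
                                 -- sizes the grant to #Filtered's actual rows
END;
```

Recompiling on every execution trades CPU for plan accuracy, so it suits low-frequency reporting procedures better than the high-usage queries mentioned earlier.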
The problem is magnified with Table Variables: since they maintain no statistics and the objects are discarded after each execution, the SQL engine has no option but to use guessed statistics, which introduces a random factor into performance figures.
Thanks for that post. I was trying to figure out some of the methods I should try for an array-like situation, and all the advice was throwing around the term CTE like it was the messiah come back…. As I suspected – fancy sub-queries it is then.
I had a manager a while back who used to suggest we “BCP” everything! Even if we were talking about something completely unrelated, he would still suggest BCP-ing it… I started thinking that he was just using it as a buzzword, hoping 1 in 100 times it might hit home.
Perhaps he wants to open all the table data in his word processor where he can understand it? Did he at least know what a command prompt is? Perhaps he learned that telling you to DOS it was just silly…
“DOS it” –> thats just brilliant, particularly because I live in Scotland and “dos” (urban dic) has a different meaning.
He actually meant BCP as in the Bulk Copy Program (the bcp utility) in SQL Server
Maybe you can confirm or disprove something which I thought I heard (relating CTE’s and Table Variables).
I believe that if you are using a CTE in multiple queries, then each reference to the CTE will re-execute the underlying query.
If you use a table variable, once you have populated it, the data is “in-memory”. Consequently, if there are multiple references, it is more efficient to use a table variable than a CTE.
Similarly, beyond a certain number of rows in your table variable, a temp table will become more efficient, as it can be indexed and has statistics.
Not sure if that’s all utter nonsense, however this is how I’ve been led to believe the differences work.
Overall I tend to agree with your thinking, although there are some subtle ways your hypothesis might be challenged, relating to how SQL Server’s internals work.
When using a CTE in multiple queries, you are correct that SQL Server will execute the CTE multiple times. But you have to remember that SQL Server tries to cache things in memory, and you might notice that the second time the same CTE is used, most of the data will already be available in memory (the underlying tables will have been read into the buffer cache), and so it will return much faster.
When it comes to table variables, SQL Server does not necessarily keep the data “in-memory” and could potentially push it off to disk (into tempdb). If this happens, the cost of querying the table variable will obviously increase, and given that you cannot control indexes and statistics, you might end up in a worse situation than using a temp table.
I would even go further, and recommend using a permanent table local to the database you are operating on, instead of a temp table in tempdb. This will most likely give you more consistent performance figures, by eliminating the dependency on ongoing tempdb activity.