Hash Match is a strategy used by SQL Server to join two tables together using the Hash Bucket and Hashing Algorithm approach, this approach is taken when joining tables or aggregating rows.
In this post we will go through what is a Hash Match, what are the types of Hash Match, why does SQL Server Execution Plan decides to go for a Hash Match, and also how to go about optimizing a Hash Match join or aggregation.
What is Hash Match?
Hash Match is a strategy were SQL Server hashes the columns involved in an aggregation or a join (the argument columns), in order to quickly match these columns (either to each other or to other tables), allowing SQL Server to efficiently perform the required join or aggregation.
Types of Hash Match Graphical Execution Plan Operator
A Hash Match operator can appear in the following situations:
- Hash Match Join: This is when SQL Server creates hash values (using the HASH() internal function) to facilitate joining two tables. In a default scenario, the hashing is initially performed on the joining columns of the smaller table, which is then matched against the corresponding columns of the larger table.
- Hash Match Aggregation (or Distinct): In this scenario SQL Server execution plan decides to build a hash table to facilitate the aggregation of a table. In this case the aggregation argument columns of the aggregated SQL table are hashed, then after the hashing SQL Server scans this new table matching records in order to satisfy the requested criteria.
- Hash Match Union: Hash Match is also used to facilitate a Union operation between two tables, SQL Server builds the hash table out of the first table involved in a union operation, and probes on the second table building hash values and matching them.
When is Hash Match used by SQL Server’s Execution Plan
A Hash Match is likely to pop-up in the following scenarios:
- No covering index is available on the tables involved in join, union or aggregation operations.
- One large table is being joined against a much small table, Hash Match sometimes proves to be very efficient in these cases.
Optimize SQL Server Hash Match operator
Optimizing Hash Match might be difficult due to the nature of the query itself, or Hash Match might actually be the best option for the query you are trying to execute, nevertheless, here are a few options you could try out in order to improve the performance of Hash Match, or possibly alert SQL Server’s execution plan to a better option for joining, concatenating or grouping data.
- Building a Covering Index: SQL Server will not need to create a hash table to hold the hashed values being used in an argument if these columns are properly indexed, hence the best way to rid your query of a Hash Match might be building a covering index that effectively encompasses the argument being used in the Hash Match operator.
- Updating Table Statistics: SQL Server might decide to go for Hash Match due to the lack of updated statistics on the underlying tables, for example if SQL Server wrongly detects a large difference between the sizes of the rows involved in a join operation, the execution plan might select Hash Match as the best join type option, leading to a larger over-all execution time. Ensuring your statistics are up-to-date through your maintenance plan is crucial and could improve the health of many queries.
- Adding the Appropriate Filter Operations (SARG): Filtering queries and making them SARG-able is one of the best ways to ensure efficient results for any query type, remember to include only the required rows (and columns) for the output, sometimes its better to get little pieces of information from a database multiple times (and delayed until needed), rather than getting a large data-set which is being held in full in the application memory resources. Basically if your business requirement allows it, segment the processing of large data-sets through queries with SARG expressions.
- Consider Partitioning your Larger Table: Although this shouldn’t be your sole reason for partitioning, as well as partitioning being a powerful concept with a learning curve, and might be an over-kill for the purpose of simply optimising a Hash Match operator, still, this could solve your optimization requirement by creating a Hash Match Team operator, giving parallelism to your Hash Match operator.
- Calculated Expressions used in SARG or Argument Statement: As always, Calculated Expressions could be the culprit behind a slow Hash Match operation; by materializing calculated expressions through indexed persisted computed columns, you could turn a calculated expression into an indexed column that could be utilized in a much more efficient way to satisfy join or aggregation operations.
It is important to carefully benchmark your results, and ensure that any new indexed or columns you create do not affect any other aspect of your database, such as slowing down a critical insert operation or affecting performance of other queries.