Outsmarting SQL Joins

Reynold Greenlaw

If you work with SQL you will know about the various types of join and what they do:

  • [Inner] Join
  • Left [Outer] Join
  • Right [Outer] Join
  • Full [Outer] Join
  • Cross Join

(Where the Inner and Outer keywords are optional).

But how does SQL actually do the joining? Usually, it picks the most appropriate method (ie the fastest) and everything’s fine. Occasionally, however, it drops the performance ball and everything grinds to a halt. In these situations, knowing what’s going on behind the scenes can help to overrule SQL and get a better performance.

Here’s how SQL does joins:

Nested Loop
Good for joining from a small table to a large indexed table. Each row in the small table is taken in turn and matches found in the large indexed table use an index lookup.

Merge
Good for joining between two large tables. These are both sorted on the columns being used for the join. The first row in one table is then compared with the first row in the second. Depending on which has the bigger value for the join column, the next row is then read from one or other of the tables. This process, which is, essentially, stepping down through the tables one row at a time, matching on the join column, is repeated until the end of one is reached.

Hash
Used when all else fails. Neither table needs to be indexed nor sorted. Each row in the smaller table is hashed and put into a hash bucket. Each row in the larger table is then hashed and checked against the corresponding hash bucket for a match.

Sometimes, we know that one particular type of join will be better when SQL decides to pick another. SQL can be forced to use one option quite simply with the syntax: Left Merge Join, etc. This can result in significant performance improvements so give it a try!

Leave a Reply