SQL Server index refresher

TAB

DBA Julian Fletcher on the key points of good indexing practice in SQL Server

We all know that indexes are good for performance and enforcing uniqueness. Obviously there’s a lot more to them than that. Here are a few key points to refresh the old grey matter (which is apparently pink in a living human being).

1. A table can have up to one clustered index

  • This can be on any column or set of columns – these can be nullable and don’t even have to contain unique values.
    • However, a clustered index can be defined as unique if you want.
    • Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.
  • A clustered index (if there is one) effectively is the table and the table’s data is stored in the order of the clustered index.
    • That’s why there can only be one clustered index per table.
    • This also means that the clustered index should be made on columns that increase with each new row (such as the table’s identity column, if it has one) – otherwise, SQL may have to do a lot of reordering when new rows are added. (So don’t use a GUID or a SPID!)
  • A table without a clustered index is referred to as a heap.
    • The data in a heap is usually stored in the order it’s added – probably randomly (although SQL might decide to move it around at any later point).

2. A table may have any number of non-clustered indexes

  • These can be unique or non-unique.
  • They are separate from the table’s data but reference it.
  • They always contain a row locator back to the base table.
    • If the table has a clustered index, this is used; this means that a non-clustered index never needs to include any of the columns in the clustered index as they are already part of the index (although hidden by SQL).
    • If the table is a heap (that is, it doesn’t have a clustered index), a physical row identifier (RID) is used.

3. Additional columns may be included on a non-clustered index

  • This means that SQL might be able to avoid doing a (costly) key lookup when executing queries.
  • They can be added to unique or non-unique indexes. (They can’t be added to a clustered index because there would be no point in doing so.)

4. A table may also have up to one primary key

  • This can be defined on one or more columns, none of which may be nullable.
  • The values in the column(s) must be unique.
  • It may be clustered (assuming the table doesn’t already have a clustered index) – in fact, SSMS will automatically create a clustered index when making a primary key if one doesn’t already exist.
    • However, it doesn’t have to be clustered.

5. Non-clustered indexes may be filtered

  • This limits the rows included in the index.
  • Filters can be applied to unique or non-unique indexes.
  • They obviously can’t be added to a clustered index nor a table’s primary key (even if it’s not clustered).

6. Tables may be linked by foreign keys

  • You can establish a foreign key from one table (the foreign key table) to another (the primary key table).
  • Each foreign key can link one or more columns.
  • However, the column(s) must be unique in the primary key table via either a primary key or a unique index on that table.

7. Tables may also have up to one identity column

  • It has to be an integer-type column.

8. A covering index is one that includes all the columns referenced by a specific SQL statement

  • If there is one, SQL will use it instead of having to perform a (potentially costly) lookup to the clustered index.

Bringing it all together

This is by no means the only way of doing things, but in ContrOCC the majority of our tables conform to the following design:

  • a table to hold information about things is called T_Thing (or TRef_Thing or even TRefSys_Thing, depending on what type of thing a Thing is and whether OCC or the customer ‘owns’ it)
  • the table will have a ThingID column which is as follows:
    • an identity column (unless it’s a TRefSys table, in which case OCC must be able to control the values in the column)
    • the table’s primary key
    • clustered

These together mean that there’s always a suitable column for making foreign keys between tables.

It will usually also have a Thing column, which is a textual description of what the Thing is, and a Deleted column.