Database

Index

Why do you need to index your tables?

Relational databases use indexes to find data quickly when a query is processed. Creating the proper index can drastically increase the performance of an application.

Because without an index the SQL server has to scan the entire table to return the requested data. It is like the index page in a book. You check within the index for the keyword you want to learn about. From that point forward, you jump directly to the page where the content belongs, instead of scanning page by page for the material you want to read.

A table without a clustered-index is called a heap table”. A heap table has not its data sorted. The SQL server has to scan the entire table in order to locate the data, in a process called a scan”.

The side effects of indexes are related to the cost of INSERTUPDATE, MERGE and DELETE statements. Such statements can take longer to execute, in the presence of indexes, as it alters the data in the table, thus to the indexes too.

Imagine the situation of an INSERT statement. It has to add new rows in a table with a clustered index. In such case the table rows may need to be re-positioned! Remember…? The clustered index needs to order the data pages themselves! This will cause overhead.

So, it is crucial to take into account the overhead of INSERTUPDATE and DELETE statements before designing your indexing strategy. Although there is an overhead to the above statements, you have to take into account, that many times, an UPDATE or DELETE statement will execute in a subset of data. This subset can be defined by a WHERE clause, where indexing may outweigh the additional cost of index updates, because the SQL server will have to find the data before updating them.

As explained above, a non-clustered index includes the clustered index’s key as its row locator, in the presence of a clustered index in the table.

clustered index, the SQL server would have to check the row locator of the non-clustered index, in order to do an additional navigation to the B-tree structure of the clustered index, to retrieve the desired row. You see, the row locator does not contain the RID, but the clustered-index key.

On the other hand, there is a benefit. It has to do with the clustered index updates. Imagine the following situation: Two new rows with index key values of A2 and A3 have to be added in the clustered index below.

Because this is a clustered index page, its physical structure has to be reallocated in order to fit A2 and A3 between A1 and A4. It has to maintain index’s order. Since there is no free space in the index page to accommodate these changes,  a page split will occur. Now, there is enough space to fit A2 and A3 between A1 and A4.

The goal achieved and the order maintained within the index. But imagine what would have happened if the non-clustered index was looking at the RID, instead of the clustered index’s key? It would have to change its row locators to reflect the changes. This could have been a huge performance hit! Especially, in the case of large clustered indexes.

Instead of the RID, the row locators now point at the clustered index key. Meaning, that there is no longer needed to change its values. This is quite a benefit if you think of the large clustered indexes, that are usually maintained in many tables.

CLUSTERED INDEX

In the case of a clustered index, the data are sorted on the key values (columns) of the index. The SQL server is now able to locate the data by navigating down from the root node, to the branch and finally to the leaf nodes of the B-tree structure of the index. This process called a “seek”. The later approach is much faster, when you want to filter or sort the data you want to retrieve.

clustered index is like a phone book. A method of storing table data. There can only be one way in which you can order the data physically. Hence there can only be one clustered index per table. The rows are stored physically on the disk in the same order as the index. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together

A clustered index alters the way that the rows are physically stored. When you create a clustered index on a column (or a number of columns), the SQL server sorts the table’s rows by that column(s).

It is like a dictionary, where all words are sorted in an alphabetical order. Note, that only one clustered index can be created per table. It alters the way the table is physically stored, it couldn’t be otherwise.

In the example below, all rows are sorted by computer_id, as a clustered index on the computer_id column has been created.

Advantages:

Non CLUSTERED INDEX

If a clustered index is like a phone book, a nonclustered index is like the index in the back of a chemistry text book. The data is stored in one place, the index in another, with pointers to the storage location of the data. There is a second list that has pointers to the physical rows. The data is stored in one place, the index in another place. You can have many non clustered indexes, although each new index will increase the time it takes to write new records.

A non-clustered index, on the other hand, is a completely different object in the table. It contains only a subset of the columns. It also contains a row locator looking back to the table’s rows, or to the clustered index’s key.

Because of its smaller size (subset of columns), a non-clustered index can fit more rows in an index page, therefore resulting to an improved I/O performance. Furthermore a non-clustered index can be allocated to a different FileGroup, which can utilize a different physical storage in order to improve performance even more.

A non-clustered index, on the other hand, does not alter the way the rows are stored in the table. Instead, it creates a completely different object within the table, that contains the column(s) selected for indexing and a pointer back to the table’s rows containing the data.

It is like an index in the last pages of a book. All keywords are sorted and contain a reference back to the appropriate page number. A non-clustered index on the computer_id column, in the previous example, would look like the table below:

Advantages:

You have a table that is constantly having values updated. These updated values are used as in your JOINs and WHERE clauses. If our values are constantly changing, SQL only has to update the index and pointers while putting the actual data wherever it has available space on disk. Compare this to a clustered index where it has to put the inserted/updated data in the correct order, meaning potentially lots of operations to shift the data around if available free space doesn’t exist at that location.

 

Heap – Non Index table

You have a small staging table that you will always read all rows from and then truncate. You don’t care about the order. Do you add an index?  No, leave it as a heap — This is one scenario where not adding an index can give you better performance since there is no overhead in SQL having to store things in a sorted order or update indexes to specify the order. If you truly don’t care about the order, and you will always be reading all rows from a table and then truncating the table, then it’s better not to have the overhead of having indexes on the table.