Tools: Clustered vs Non-Clustered Index in SQL(Complete Guide with Examples)

Tools: Clustered vs Non-Clustered Index in SQL(Complete Guide with Examples)

Source: Dev.to

1. Introduction: Why Do We Need Indexes? ## 2. The Foundation: Data Pages and The Heap Structure ## Data Pages: ## The Heap Structure (No Index) ## 3. The Clustered Index ## Physical Sorting: ## The B-Tree Logic: ## Ideal Candidate: ## 4. The Non-Clustered Index ## The Pointer Structure ## The B-Tree Logic ## 5. Direct Comparison: Clustered vs. Non-Clustered ## 6. SQL Syntax and Implementation ## Creating a Clustered Index: ## Creating a Non-Clustered Index: Speed and efficiency are key when it comes to querying a database. An index is a data structure that acts as a guide for your database, giving you quick access to a particular data set to speed up your queries significantly. To understand indexes, let’s consider a couple of analogies. Before we move on to indexes, we should know how SQL stores data logically as well as physically. When you create a new table in your SQL database and start adding data to it, SQL doesn’t store data as a spreadsheet; instead, it stores data in data files on your hard disk, divided into fixed-size 8-Kilobyte blocks called “Pages.” A Data Page Consists Of: If you don't create an index on your table, your data is stored in a Heap Structure. Write Logic: When new data is added to your table, SQL just throws it onto the next available Data Page in the order it was written. They're not sorted. Read Logic (Full Table Scan): Because your data is unsorted, when you want to read a particular record, say Customer ID 14, SQL doesn't know where it is. It has to perform a Full Table Scan, reading every single Data Page until it finds your record. Trade-off: Fast write speed (no sorting necessary), terrible read speed. A Clustered Index changes everything about how your data is stored on your hard disk. A Clustered Index on a particular column (e.g., User_ID) causes SQL to physically sort ALL existing data pages on that column from lowest to highest. SQL uses a default data structure called a B-Tree (Balanced Tree) to navigate this sorted data: Characteristics: Limit: You can only have ONE Clustered Index per table because your data can only be sorted in ONE way. Primary Keys are perfect for Primary Keys. They are unique, and they are never updated. (Updating a clustered index column requires the database to move the row to maintain the sort order, which is very resource intensive). A Non-Clustered Index makes reads faster but never changes the physical order of the underlying table. When you create a non-clustered index, SQL leaves the actual Data Pages exactly as they are, whether they are a Heap or sorted by a clustered index. It simply creates an entirely separate B-Tree structure. Leaf Nodes: Unlike the clustered index, these leaf nodes are Index Pages, not Data Pages. These leaf nodes are simply a list of the actual index values, along with a Row Identifier (RID) [00:17:42]. This RID is an exact address, like File ID, Page Number, and then an actual Offset, which points to where the actual row lives in the actual table. Navigation: To retrieve your actual row, SQL uses the non-clustered index B-Tree, navigates down to the leaf node, finds the RID, and then makes one "jump" to the actual Data Page to retrieve the actual row. Creating indexes in SQL Server uses straightforward syntax. By default, if Primary Keys are defined, SQL automatically creates a clustered index for them and uses B-Tree as default data structure. CREATE CLUSTERED INDEX idx_customers_id ON sales.customers (customer_id); CREATE NONCLUSTERED INDEX idx_customers_lastname ON sales.customers (last_name); (Note: If you do not specify "NONCLUSTERED", SQL Server defaults to Non-Clustered.) I have another post in which I have talked about how to achieve best unique identifier key for your table's cluster index for the fastest Read/Write. Click Me to redirect to the relevant post. Templates let you quickly answer FAQs or store snippets for re-use. Are you sure you want to hide this comment? It will become hidden in your post, but will still be visible via the comment's permalink. Hide child comments as well For further actions, you may consider blocking this person and/or reporting abuse - The Book: You don’t want to go through each page of a book to find a particular chapter or data; instead, you look in the back of the book to find that chapter quickly. - The Hotel: You don’t want to knock on every hotel room on every floor to find room 5001; instead, you look at a map in the lobby to quickly locate that room. - Page Header: This contains metadata related to that particular page. - Data Rows: This contains your actual data. The amount of data that you can store here is determined by your data size. - Offset Array: This is a small internal map that quickly tracks where your data starts on that particular page. - Root Node (Top): An index page that contains a pointer to intermediate pages. - Intermediate Nodes: Index pages that contain a pointer to a particular set of data (e.g., "IDs 1-10 go left, IDs 11-20 go right"). - Leaf Nodes (Bottom): In a Clustered Index, this is your actual Data Page. - Limit: You may have multiple non-clustered indexes on one single table. - Ideal Candidate: Columns you often use in your WHERE clause, like "Last_Name," or often use in your JOIN clause.