Understanding B-Tree Indexes in PostgreSQL, Ineffective for Low Cardinality Columns
Follow up to Unlocking the Power of PostgreSQL Indexes.
Indexes are essential for improving the performance of your PostgreSQL queries. They allow the database to find and retrieve specific rows much faster than without them. In this article, we will delve into the internal structure of PostgreSQL indexes to understand how they work at a low level and explain why PostgreSQL might not use an index in cases of low cardinality.
The Structure of PostgreSQL Indexes
B-Tree Indexes
The most common type of index in PostgreSQL is the B-Tree index. A B-Tree index is a balanced tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time.
Lets see in abstract what happens under the hood when the query gets executed:
SELECT * FROM names WHERE name = 'name250000';
(1) The Query Rolls Down the B-Tree (O(log n) Time Complexity):
When you execute a query like SELECT * FROM names WHERE name = 'name250000';
, PostgreSQL starts at the root node of the B-Tree index and traverses down to the appropriate leaf node. The root node, the top node of the B-Tree, contains pointers to internal nodes. These internal nodes act as intermediaries between the root and leaf nodes and contain pointers to either other internal nodes or leaf nodes.
(2) Finding the Leaf and Value within the Index Page
Once the traversal reaches the correct leaf node, PostgreSQL scans the leaf node to find the index entry corresponding to name250000
.
(3) Using the Pointer to Fetch the Exact Table Row
The index entry provides a pointer to the exact location of the row in the heap table. PostgreSQL then uses this pointer to fetch the row from the heap page.
Why PostgreSQL Might Not Use an Index for Low Cardinality Columns
Low cardinality columns have low selectivity, meaning many rows share the same value. For example, if we have an idx_category
index, the value category1
will point to multiple rows in the actual table, making the index less useful.
PostgreSQL would have to iterate over all these pointers, which can be a lot, making it more efficient to perform a sequential scan instead.
SELECT * FROM names WHERE category = 'category1';
Conclusion
Understanding the internal structure of PostgreSQL indexes provides insight into how the database optimizes query performance. By knowing how B-Tree indexes and page layouts work, you can better appreciate the efficiency of indexed queries and the importance of maintaining up-to-date statistics with ANALYZE.
Indexes are powerful tools, but they also come with costs in terms of disk space and performance overhead for write operations. Always consider the cardinality and access patterns of your data when creating indexes. In cases of low cardinality, PostgreSQL might choose not to use an index due to the lower selectivity and higher efficiency of a sequential scan.