Basics-Wrapup

View on GitHub

indexing in Databases

Sequential File Organization or Ordered Index File

In this, the indices are based on a sorted ordering of the values. These are generally fast and a more traditional type of storing mechanism. These Ordered or Sequential file organization might store the data in a dense or sparse format:

and dense Index means data is ordered or listed in one place near each other

Hash File organization

Indices are based on the values being distributed uniformly across a range of buckets. The buckets to which a value is assigned is determined by a function called a hash function.

There are primarily three methods of indexing:

cluster_index

this type of indexes involves

Non-Clustered or Secondary Indexing

A non clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or references to the location where the data is actually stored. Data is not physically stored in the order of the index. Instead, data is present in leaf nodes. For e.g. the contents page of a book. Each entry gives us the page number or location of the information stored. The actual data here(information on each page of the book) is not organized but we have an ordered reference(contents page) to where the data points actually lie. We can have only dense ordering in the non-clustered index as sparse ordering is not possible because data is not physically organized accordingly. It requires more time as compared to the clustered index because some amount of extra work is done in order to extract the data by further following the pointer. In the case of a clustered index, data is directly present in front of the index.

indexing3

this non-clustered index involved

Multilevel Indexing

With the growth of the size of the database, indices also grow. As the index is stored in the main memory, a single-level index might become too large a size to store with multiple disk accesses. The multilevel indexing segregates the main block into various smaller blocks so that the same can stored in a single block. The outer blocks are divided into inner blocks which in turn are pointed to the data blocks. This can be easily stored in the main memory with fewer overheads.

Untitled-Diagram-41

this process involves dividing the main index file into smaller indices in order to reduce the memory overhead by nesting pointers accordingly to keep index access speed with lower memory overhead

Unique and non-unique indexes

Unique indexes are indexes that help maintain data integrity by ensuring that no two rows of data in a table have identical key values. When you create a unique index for an existing table with data, values in the columns or expressions that comprise the index key are checked for uniqueness. If the table contains rows with duplicate key values, the index creation process fails. When a unique index is defined for a table, uniqueness is enforced whenever keys are added or changed within the index. This enforcement includes insert, update, load, import, and set integrity, to name a few. In addition to enforcing the uniqueness of data values, a unique index can also be used to improve data retrieval performance during query processing.

Non-unique indexes are not used to enforce constraints on the tables with which they are associated. Instead, non-unique indexes are used solely to improve query performance by maintaining a sorted order of data values that are used frequently.

Including and excluding NULL keys

Unique and non-unique indexes can be created so that a key is not inserted into the index object when all columns or expressions of the key are null. Excluding null keys can result in improved storage and performance optimization for cases where you do not want queries to access data associated with null keys. For unique indexes, the enforcement of the uniqueness of table data ignores rows where the index key is null.

Differences between primary key or unique key constraints and unique indexes

It is important to understand that there is no significant difference between a primary key or unique key constraint and a unique index. To implement the concept of primary and unique key constraints, the database manager uses a combination of a unique index and the NOT NULL constraint. Therefore, unique indexes do not enforce primary key constraints by themselves because they allow null values. Although null values represent unknown values, when it comes to indexing, a null value is treated as being equal to other null values.

Therefore, if a unique index consists of a single column, only one null value is allowed-more than one null value would violate the unique constraint. Similarly, if a unique index consists of multiple columns, a specific combination of values and nulls can be used only one time.

Both clustered and non-clustered indexes contain only keys and record identifiers in the index structure. The record identifiers always point to rows in the data pages. With clustered indexes, the database manager attempts to keep the data in the data pages in the same order as the corresponding keys in the index pages. Thus the database manager attempts to insert rows with similar keys onto the same pages. If the table is reorganized, data is inserted into the data pages in the order of the index keys. The database manager does not maintain any order of the data when compared to the order of the corresponding keys of a non-clustered index.

Bidirectional indexes

Bidirectional indexes allow scans in both the forward and reverse directions.

The ALLOW REVERSE SCANS clause of the CREATE INDEX statement enables both forward and reverse index scans. That is, in the order that is defined at index creation time and in the opposite, or reverse order. With this option, you can: