B+trees are self-balancing while BST are not balancing => can lead to a very tall tree
B+trees are much shorter than other balanced binary tree structures such as AVL => fewer disk access.
If we want to retrieve a range of entries in order by key, B+trees is much more efficient than AVL, BST or
Hash table, because each node contains a large number of key with sorted key.
Support full scan: The leaf nodes of B+ trees are linked, so a full scan of all objects in a tree requires
just one linear pass through all the leaf nodes.
Inner node of B+Tree does not have a pointer to data => more keys can be packed in a B+Tree node => reducing
the tree’s height.
Only index on isolating columns. should not be part of an expression or be inside a function in the query
mysql> SELECT id FROM student WHERE id + 1 = 5;
selectivity = number of distinct indexed values / number of rows in table. A highly selective index is good
because it lets DBMS filter out more rows when it looks for matches. Trade-off:
more characters => higher selectivity, more space.
less characters => less selectivity, less space.
Use Covering index: indexes that contain all the data needed to satisfy a query
e.g., if we have a constraint, all customer accounts must have a positive balance. If a transaction brings a customer account into a negative balance, that transaction will be rolled back
Isolation: 2 concurrent transactions are isolated from each other. Each transaction can pretend that it is the
only transaction running on the entire DB.
This highest isolation level is rarely used because of the performance expense.
mysql: default isolation level == repeatable read
Durability: guarantees that once a transaction has been committed, it will remain committed even in the case
of a system failure (e.g., power outage or crash).
How Database Management Systems (DBMS) store data on a hard disk?
How DBMS manages its memory (RAM)?
What is index? How to implement indexes? Best practice for creating indexes?
SQL fix-sized = 8kb. 1 row: column text 7KB => where to store?
What is a database index?
Why don’t we index every column to support fast read?
How do Hash Indexes work?
What are its disadvantages?
Compare B+Tree and B-Tree?
given B+Tree index (C1, C2, C3), can B+Tree support query on C2?
Compare B+Tree index and Hash index?
Should we index boolean column?
selectivity = 2 / #rows => should not.
What happens to index when database machine crashes?
DBMS use an additional data structure on disk called write-ahead log (WAL):
append-only file which every B+Tree index modification must be written before it can be applied
append-only => fast
When the database resumes after a crash, this log is used to restore B+Tree.