Transactions in Databases: Simplifying Theory and Practice
Understanding Transactions in Databases
Welcome back to our insightful series, where we simplify and clarify core database concepts. In our last discussion, we delved into partitioning, successfully implementing hash-based partitioning in our key-value store. This development significantly improved our system's scalability and fault tolerance.
As we approach the conclusion of our database project and this article series, we now turn our attention to another vital topic: transactions.
So far, we haven't tackled the problems that arise when multiple threads attempt to write to the same storage location concurrently or examined the consequences of server failures during write processes. These scenarios lead us to the essential concept of transactions. While relational databases frequently incorporate transactional mechanisms, many NoSQL databases do not. This discrepancy prompts a deeper investigation into the benefits and drawbacks of transaction-based systems.
The Complexity of Transactions
As we examine transactions, we'll also touch on the intricacies of distributed transactions, particularly relevant to our context where data is partitioned across multiple machines. Understanding transactions is crucial for appreciating how databases uphold data integrity and consistency, even during concurrent operations and system failures.
Challenges Faced by Databases
Databases face many sources of trouble: software bugs, hardware malfunctions, clients that crash partway through an operation, network issues, and concurrent writes that can overwrite each other's data. To tackle some of these challenges, our simplified database takes a blunt approach: it locks the entire index and file system for each write.
impl KeyValueStoreImpl {
    pub async fn new(index_engine: Box<dyn IndexEngine>, config_manager: Box<ConfigManager>) -> Self {
        // Bounded channel feeding replication events to the background replicator task.
        let (tx, rx) = channel::<Replication>(1000);
        let config_manager = Arc::new(Mutex::new(config_manager));
        start_replicator(rx, config_manager.clone()).await;
        Self {
            index_engine: Mutex::new(index_engine),
            tx,
        }
    }
}

#[tonic::async_trait]
impl KeyValueService for KeyValueStoreImpl {
    async fn create(&self, request: Request<CreateRequest>) -> Result<Response<CreateResponse>, Status> {
        let key_value = request.into_inner().key_value
            .ok_or_else(|| ServerError::InvalidArgument("key_value must be set".to_string()))?;
        let replication = Replication {
            action: Action::Add,
            key_value: key_value.clone(),
        };
        let key_val = key_value.key
            .ok_or_else(|| ServerError::InvalidArgument("key must be set".to_string()))?;
        let key_bytes = key_val.encode_to_vec();
        let value_val = key_value.value
            .ok_or_else(|| ServerError::InvalidArgument("value must be set".to_string()))?;
        let value_bytes = value_val.encode_to_vec();

        // Lock the entire index and file system for the duration of the write.
        let mut index_engine = self.index_engine.lock().await;
        index_engine.insert(Document {
            id: key_bytes,
            value: value_bytes,
        }).map_err(ServerError::from)?;

        // Ship the write to replicas via the replication channel.
        self.send_replication(replication).await;

        let reply = CreateResponse {
            key_value: Some(KeyValue {
                key: key_val.into(),
                value: value_val.into(),
            }),
        };
        Ok(Response::new(reply))
    }
}
This strategy effectively mitigates race conditions but may impact performance. For instance, Redis employs a single-threaded model and achieves impressive speed, indicating that performance can be enhanced through partial locking or alternative concurrency management techniques.
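As a sketch of what partial locking could look like in our store (the ShardedIndex type here is hypothetical, not part of our actual implementation): keys are hashed onto a fixed number of shards, each guarded by its own lock, so writes to different shards no longer contend with one another.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use tokio::sync::Mutex;

// Hypothetical sketch: the index is split into independently locked shards,
// so concurrent writes to different keys rarely wait on the same lock.
struct ShardedIndex {
    shards: Vec<Mutex<HashMap<Vec<u8>, Vec<u8>>>>,
}

impl ShardedIndex {
    fn new(shard_count: usize) -> Self {
        Self {
            shards: (0..shard_count).map(|_| Mutex::new(HashMap::new())).collect(),
        }
    }

    // Pick a shard by hashing the key, mirroring how we already partition data.
    fn shard_for(&self, key: &[u8]) -> usize {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        (hasher.finish() as usize) % self.shards.len()
    }

    async fn insert(&self, key: Vec<u8>, value: Vec<u8>) {
        // Only the shard that owns this key is locked; all others stay available.
        let shard = self.shard_for(&key);
        self.shards[shard].lock().await.insert(key, value);
    }
}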
ACID Principles and Isolation Levels
Transactions are the fundamental answer to these challenges. A transaction groups a series of operations so that they execute either in full or not at all (atomicity), while preserving consistency (no constraints are violated) and isolation (concurrent transactions do not interfere with one another). Once a transaction commits, its effects survive crashes and power failures (durability). Together these form the ACID properties: Atomicity, Consistency, Isolation, and Durability.
Isolation deserves particular attention, because it comes in several levels, each guarding against different anomalies. A dirty read occurs when a transaction retrieves data written by an uncommitted transaction, which leads to inconsistencies if that transaction is later rolled back. A non-repeatable read happens when a transaction reads the same data twice and receives different results because another transaction changed it in between. A phantom read arises when a transaction re-runs a query over a set of rows and finds a different set of rows, owing to inserts or deletes by another transaction.
Being aware of these isolation levels matters in practice, especially since many databases default to 'read committed', which still permits some of these anomalies if not carefully managed.
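To make the non-repeatable read concrete, here is a small standalone Rust simulation (plain shared memory, not database code): a "transaction" reads the same value twice while another thread commits a change in between, so the two reads disagree.

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

fn main() {
    // Shared "row" with no transaction-level isolation around it.
    let balance = Arc::new(Mutex::new(500));

    let reader = {
        let balance = Arc::clone(&balance);
        thread::spawn(move || {
            let first = *balance.lock().unwrap();
            thread::sleep(Duration::from_millis(50)); // another transaction commits here
            let second = *balance.lock().unwrap();
            // Both reads see committed data, yet they disagree: a non-repeatable read.
            println!("first read = {first}, second read = {second}");
        })
    };

    let writer = {
        let balance = Arc::clone(&balance);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(25));
            *balance.lock().unwrap() += 100; // concurrent committed update
        })
    };

    reader.join().unwrap();
    writer.join().unwrap();
}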
Mitigating Concurrency Challenges: Locks and Versioning
To tackle concurrency issues in databases, such as dirty writes, several techniques can be employed. Locks at the row level are effective in preventing dirty writes. However, utilizing locks to address dirty reads can be inefficient, which leads to the consideration of versioning strategies. Versioning entails the database retaining old values until a new transaction commits, ensuring that only the most recent values are accessed thereafter.
Snapshot isolation is a key concept in addressing concurrency issues. This approach involves creating a consistent snapshot of the database for all read operations, ensuring that modifications made during a transaction are not visible until committed, thus avoiding the aforementioned anomalies. Typically, this entails locking during write operations but not for reads. The underlying principle is that readers and writers do not block each other, known as multi-version concurrency control (MVCC).
To illustrate snapshot isolation, consider someone with two bank accounts holding $500 each, and a transfer of $100 from Account 2 to Account 1. Without MVCC, Transaction 100 might read Account 1's balance as $500, Transaction 101 might then commit the transfer, and Transaction 100 could subsequently read Account 2's balance as $400. The two reads together show $900 instead of $1,000. MVCC prevents this by ensuring Transaction 100 reads the account versions as they existed when the transaction started.
This principle is reflected in our implementation, where an update sets xmax (the ID of the transaction that invalidated the old row version) on the existing row and inserts a new version rather than overwriting in place.
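To make that bookkeeping concrete, here is a deliberately simplified sketch of a version chain (illustrative only; it ignores commit status and concurrent-transaction visibility rules): each version carries xmin and xmax, a read picks the newest version visible to the snapshot, and an update stamps xmax on the old version and appends a new one.

// Simplified version chain: every row version records the transaction that
// created it (xmin) and, once superseded, the transaction that deleted it (xmax).
struct RowVersion {
    xmin: u64,         // id of the transaction that created this version
    xmax: Option<u64>, // id of the transaction that deleted/replaced it, if any
    value: i64,
}

struct VersionedRow {
    versions: Vec<RowVersion>, // oldest first
}

impl VersionedRow {
    // A snapshot taken at `snapshot_txid` sees the newest version that was
    // created at or before the snapshot and not yet deleted at that point.
    fn read(&self, snapshot_txid: u64) -> Option<i64> {
        self.versions
            .iter()
            .filter(|v| v.xmin <= snapshot_txid)
            .filter(|v| v.xmax.map_or(true, |xmax| xmax > snapshot_txid))
            .map(|v| v.value)
            .next_back()
    }

    // An update never overwrites in place: it stamps xmax on the current
    // version and appends a brand-new version created by `txid`.
    fn update(&mut self, txid: u64, new_value: i64) {
        if let Some(current) = self.versions.last_mut() {
            current.xmax = Some(txid);
        }
        self.versions.push(RowVersion { xmin: txid, xmax: None, value: new_value });
    }
}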
Client-Side Anomalies
While our focus has primarily been on server-side issues in databases, it is crucial to also consider client-side anomalies such as lost updates and write skews. Understanding these phenomena is vital for a comprehensive view of database interactions.
A lost update occurs when two clients concurrently read a value, modify it, and write it back: one client's write silently overwrites the other's. This problem can be mitigated in several ways. One is to perform the modification directly on the server, for example:
UPDATE accounts SET balance = balance + 100 WHERE id = 1;
Another method uses an exclusive lock on the data being read and written:
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;
UPDATE accounts SET balance = ? WHERE id = 1;
Alternatively, a versioning technique similar to optimistic locking can be effective, for instance:
UPDATE accounts SET balance = ?, version = version + 1 WHERE id = ? AND version = ?;
With this approach it is essential to check the affected-row count afterwards: if the version no longer matches, the UPDATE simply touches zero rows rather than failing, and the client must re-read and retry.
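Here is the same compare-and-set idea sketched in Rust against a hypothetical in-memory store (illustrative, not our server code): the write only succeeds if the version the client read is still current; otherwise the caller must re-read and retry.

use std::collections::HashMap;

// Hypothetical versioned value: every successful write bumps the version.
#[derive(Clone)]
struct Versioned {
    version: u64,
    balance: i64,
}

struct Store {
    rows: HashMap<u64, Versioned>,
}

impl Store {
    // Optimistic update: succeeds only if the row still carries the version
    // the client observed when it read the value (compare-and-set).
    fn update_if_version(&mut self, id: u64, expected_version: u64, new_balance: i64) -> bool {
        match self.rows.get_mut(&id) {
            Some(row) if row.version == expected_version => {
                row.balance = new_balance;
                row.version += 1;
                true
            }
            // Version mismatch or missing row: the caller must re-read and retry,
            // just like checking the affected-row count of the SQL UPDATE above.
            _ => false,
        }
    }
}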
Write skews, akin to lost updates, involve multiple objects. For example, consider an on-call schedule with a rule that mandates at least one admin must always be on duty. If two clients initiate transactions simultaneously and each reads that two admins are available, both might decide to take one admin off duty, resulting in a situation where no admin is available, thus violating business logic.
Unlike lost updates, our usual techniques with versioning and direct row modifications are less effective in this scenario due to the involvement of multiple objects. The solution here leans towards using locks and constraints to ensure consistency and compliance with business rules.
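To see why a single critical section fixes write skew, here is a small, self-contained Rust sketch of the on-call example (in-memory only, not database code): the availability check and the removal happen under one lock, so two concurrent requests cannot both pass the check.

use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // The whole on-call roster lives behind one lock, so the
    // "is more than one admin on duty?" check and the removal
    // happen atomically.
    let on_call_admins = Arc::new(Mutex::new(vec!["alice".to_string(), "bob".to_string()]));

    let handles: Vec<_> = (0..2)
        .map(|i| {
            let on_call = Arc::clone(&on_call_admins);
            thread::spawn(move || {
                let mut roster = on_call.lock().unwrap();
                // Check and modify under the same lock: no write skew possible.
                if roster.len() > 1 {
                    let removed = roster.pop();
                    println!("request {i}: took {removed:?} off duty");
                } else {
                    println!("request {i}: refused, at least one admin must stay on duty");
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }

    println!("still on duty: {:?}", on_call_admins.lock().unwrap());
}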
The Case for Serializability
In addressing concurrency challenges, one might wonder: why not eliminate concurrency altogether? This approach, embodied by single-threaded servers like Redis, is perfectly valid given the speed of modern CPUs. Multithreading was once a necessity for throughput, but a single-threaded server, particularly when scaled out across multiple partitions, can be quite effective.
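As a sketch of what serial execution could look like in our store, reusing the channel pattern we already use for replication (the WriteCommand type and field names here are illustrative): every write is funneled through one channel to a single task and applied strictly in arrival order, so no locking of the data itself is needed.

use tokio::sync::{mpsc, oneshot};

// Hypothetical write command, acknowledged back to the caller once applied.
struct WriteCommand {
    key: Vec<u8>,
    value: Vec<u8>,
    ack: oneshot::Sender<()>,
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<WriteCommand>(1000);

    // The single writer task: commands are applied strictly in arrival order,
    // so no two writes ever race and no per-key locking is required.
    tokio::spawn(async move {
        let mut store = std::collections::HashMap::new();
        while let Some(cmd) = rx.recv().await {
            store.insert(cmd.key, cmd.value);
            let _ = cmd.ack.send(());
        }
    });

    // Any number of request handlers can submit writes concurrently.
    let (ack_tx, ack_rx) = oneshot::channel();
    tx.send(WriteCommand { key: b"user:1".to_vec(), value: b"500".to_vec(), ack: ack_tx })
        .await
        .unwrap();
    ack_rx.await.unwrap();
}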
Beyond the simplicity of a single write lock, we can consider two-phase locking, built on read-write locks. Multiple transactions can hold a read lock concurrently, since they do not modify data; a transaction that wants to write, however, must wait for all read locks to be released before it can obtain an exclusive write lock. This design favors read operations, which in many applications far outnumber writes.
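In Rust this maps directly onto a read-write lock such as tokio's RwLock: any number of readers can hold the lock at once, while a writer needs exclusive access. A minimal standalone sketch (not part of our store):

use std::sync::Arc;
use tokio::sync::RwLock;

#[tokio::main]
async fn main() {
    let balance = Arc::new(RwLock::new(500_i64));

    // Many readers can hold the lock at the same time.
    let readers: Vec<_> = (0..4)
        .map(|i| {
            let balance = Arc::clone(&balance);
            tokio::spawn(async move {
                let value = *balance.read().await;
                println!("reader {i} sees {value}");
            })
        })
        .collect();

    for r in readers {
        r.await.unwrap();
    }

    // A writer must wait until all read guards are dropped before
    // it gets exclusive access.
    *balance.write().await += 100;
    println!("after write: {}", *balance.read().await);
}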
Read-write locks can still become a bottleneck, so we are left weighing our options: full serializability, which scales only moderately, or weaker isolation levels, which perform better but carry the risk of race conditions. A third alternative is Serializable Snapshot Isolation (SSI), an optimistic concurrency control mechanism. Here, "optimistic" means the system lets transactions proceed despite potential conflicts and verifies only at commit time that isolation was not violated.
Imagine a database as a library filled with people reading books (data). Upon entering, the librarian hands you a magical photograph representing the library's state at that moment. As you use this snapshot to find books, the actual library continues to change around you, but your photograph remains unchanged. This snapshot allows you to work undisturbed by ongoing modifications.
Similarly, initiating a transaction in a database with snapshot isolation provides you with a consistent view of the data as it was at the start of your transaction. You operate on this snapshot, unaffected by other concurrent transactions.
However, the database, like our librarian, must monitor changes that could impact your transaction. If you wish to "borrow a book" (modify data) based on your snapshot, but the actual data has been altered by someone else, the database will notify you of this conflict when you attempt to commit your transaction.
The database checks for two key factors:
- If the data you're reading has been modified by another transaction.
- If the data you're modifying has been accessed or altered by others since your snapshot.
If your transaction passes these checks, you can commit your changes successfully. Otherwise, the transaction is aborted and must be retried, because the data it depended on changed after your snapshot was taken.
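A highly simplified sketch of that commit-time validation (illustrative only; all names are made up): each transaction records what it read and what it intends to write, and commit aborts if any key it read was overwritten after its snapshot was taken.

use std::collections::{HashMap, HashSet};

// Global store: every key remembers the id of the last transaction that wrote it.
struct Store {
    last_writer: HashMap<String, u64>,
    values: HashMap<String, i64>,
    next_txid: u64,
}

struct Transaction {
    snapshot_txid: u64,              // writes at or before this id are visible
    read_set: HashSet<String>,       // keys this transaction read
    write_set: HashMap<String, i64>, // keys this transaction intends to write
}

impl Store {
    fn begin(&self) -> Transaction {
        Transaction {
            snapshot_txid: self.next_txid,
            read_set: HashSet::new(),
            write_set: HashMap::new(),
        }
    }

    // Optimistic commit: abort if any key we read was overwritten by a
    // transaction that committed after our snapshot was taken.
    fn commit(&mut self, tx: Transaction) -> Result<(), &'static str> {
        for key in &tx.read_set {
            if let Some(&writer) = self.last_writer.get(key) {
                if writer > tx.snapshot_txid {
                    return Err("serialization conflict: please retry the transaction");
                }
            }
        }
        self.next_txid += 1;
        let txid = self.next_txid;
        for (key, value) in tx.write_set {
            self.values.insert(key.clone(), value);
            self.last_writer.insert(key, txid);
        }
        Ok(())
    }
}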
In summary, SSI allows multiple transactions to concurrently read and write without interference, maintaining transactional integrity and minimizing conflicts. It strikes a balance between operational efficiency and ensuring coherent, non-conflicting transactions.