distributed c5

Created by

jo Higan

Cards (32)

Synchronization 
How processes can coordinate and cooperate to access shared resources or agree on the ordering of events
Synchronization in distributed systems is often much more difficult compared to synchronization in uniprocessor or multiprocessor systems
Synchronization based on actual time
1. Discussion of the issue
2. Synchronization in which only relative ordering matters rather than ordering in absolute time
Clock synchronization 
Achieving agreement on time in a distributed system
In a centralized system, time is unambiguous
In a distributed system, achieving agreement on time is not trivial
Physical clocks 
Quartz crystal oscillators that generate clock ticks at a well-defined frequency
With a single computer and a single clock, it does not matter much if the clock is off by a small amount
As soon as multiple CPUs are introduced, each with its own clock, the situation changes radically
Clock skew 
Difference in time values between clocks on different machines
External physical clocks are needed in some systems where the actual clock time is important
Logical clocks 
Clocks that track the ordering of events rather than absolute time
Happens-before relation 
Relation that captures the ordering of events in a distributed system
Lamport's logical clock algorithm
1. Assign logical timestamps to events based on the happens-before relation
2. Ensure that if event a happens before event b, then a's timestamp is less than b's timestamp
Vector clocks 
Clocks that capture causality between events, allowing more precise reasoning about the ordering of events
Vector clocks enable ensuring that a message is delivered only if all messages that causally precede it have also been received
Mutual exclusion 
Granting exclusive access to shared resources to prevent concurrent corruption or inconsistency
Distributed mutual exclusion algorithms
Token-based solutions
Permission-based solutions
Token-based solutions can fairly easily ensure that every process will get a chance at accessing the resource, and can easily avoid deadlocks
The main drawback of token-based solutions is that when the token is lost, an intricate distributed procedure is needed to ensure a new token is created
Token-based solutions 
They can fairly easily ensure that every process will get a chance at accessing the resource, avoiding starvation
They can easily avoid deadlocks by which several processes are waiting for each other to proceed, contributing to their simplicity
The main drawback is that when the token is lost (e.g., because the process holding it crashed), an intricate distributed procedure needs to be started to ensure that a new token is created, but above all, that it is also the only token
Permission-based approach 
A process wanting to access the resource first requires the permission of other processes
Centralized algorithm 
1. Process sends request message to coordinator
2. Coordinator grants permission if no other process is accessing the resource
3. When process is finished, it sends release message to coordinator
4. Coordinator grants permission to next queued request
The centralized approach has shortcomings: the coordinator is a single point of failure, processes cannot distinguish a dead coordinator from "permission denied", and the coordinator can become a performance bottleneck
Decentralized algorithm 
1. Each resource is replicated n times, each with its own coordinator
2. Process needs to get a majority vote from m>n/2 coordinators to access the resource
3. When a coordinator crashes, it recovers quickly but forgets any previous votes, potentially granting permission to another process
Distributed algorithm 
1. Process sends request message with resource name, process number, and timestamp to all other processes
2. If receiver is not accessing the resource and does not want to, it sends OK message
3. If receiver is accessing the resource, it queues the request
4. If receiver wants to access the resource as well, it compares timestamps and sends OK if incoming message has lower timestamp
5. Process waits for all permissions before accessing the resource, and sends OK messages to queued processes when finished
Token ring algorithm 
1. Processes are arranged in a logical ring, with a token circulating among them
2. If a process is handed the token and is not interested in the resource, it just passes the token along
3. When no processes need the resource, the token just circulates at high speed around the ring
The centralized algorithm is simplest and most efficient, requiring only 3 messages to enter and leave a critical region
The decentralized algorithm requires m attempts, each with 3m messages
The distributed algorithm requires 2(n-1) messages
The token ring algorithm has a variable number of messages, from 1 per critical region entry to unbounded if the token circulates without anyone needing it
The centralized algorithm has the shortest delay to enter a critical region, taking only 2 message times, while the decentralized algorithm takes 3mk times and the distributed algorithm takes 2(n-1) message times
The token ring algorithm has a delay varying from 0 (token just arrived) to n-1 (token just departed)
All algorithms except the decentralized one suffer badly from crashes, requiring special measures to avoid bringing down the entire system
The decentralized algorithm is less sensitive to crashes, but processes may suffer from starvation and special measures are needed to guarantee efficiency
Geometric overlay networks 
Each node is given a position in an m-dimensional geometric space, such that the distance between two nodes reflects a real-world performance metric like internode latency
Geometric overlay networks enable efficient execution of distributed algorithms like routing, multicasting, data placement, and searching, by using the geometric positions to make local decisions