Topics related to distributed systems
Hardware Concepts
Software Concepts
Design Issues
Kernels
memory management
Distributed shared memory
Transaction management
Communication
IPC
Real-time distributed systems
Processes
process management
Naming
Synchronization
distributed synchronization
Consistency
Replication
Fault tolerance
Fault-tolerant distributed systems
ATM networks
File systems
Distributed File Systems
Distributed file systems (with NFS and X.500)
Distributed Object-Based Systems
Object-based operating systems
Distributed Document-Based Systems
Distributed Coordination-Based Systems
Security
distributed security
Middleware models.
How distributed systems are designed and implemented in real systems.
Case studies of Distributed Operating Systems
Amoeba
Clouds
Mach
Chorus
JavaOS™
OSF/DCE.
CORBA™
OMG's Common Object Request Broker Architecture (CORBA) standard
MICOSec, an open-source implementation of the CORBA Security Service
specification
DCOM™
Microsoft's DCOM
NFS
NFS v4
LDAP
X.500
Kerberos
RSA
DES
SSH
NTP
Real-world examples and case studies:
CORBA
DCOM
Jini
World Wide Web.
Implementation
Sockets
RPC
Threads
Implementation of distributed algorithms using these tools.
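The list above names sockets and RPC as implementation tools. Python's standard `xmlrpc` modules provide a small request/reply layer over sockets, so one sketch can show both. The loopback address and the `add` procedure are illustrative assumptions. The server runs in a background thread so that client and server fit in one script.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def add(a, b):
    """The remote procedure; it runs inside the server thread."""
    return a + b


def start_server():
    # Port 0 asks the OS for any free port; the chosen port is read back later.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(add, "add")
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(server, a, b):
    host, port = server.server_address
    # ServerProxy is the client stub: it marshals the call into an XML-RPC
    # request, sends it over a TCP socket, and unmarshals the reply.
    with ServerProxy(f"http://{host}:{port}/") as proxy:
        return proxy.add(a, b)


if __name__ == "__main__":
    srv = start_server()
    try:
        print(call_add(srv, 2, 3))  # prints 5
    finally:
        srv.shutdown()
        srv.server_close()
```

The proxy call looks like a local function call. That is the transparency RPC aims for, although network failures can still surface as exceptions.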
Fundamental Concepts (Transparency, Service, and Coordination)
-
Chapter 1
introduces a classification of centralized operating systems, network
operating systems, distributed operating systems, and cooperative
autonomous systems. Each class is identified by one key characteristic:
virtuality, interoperability, transparency, and autonomicity,
respectively. It illustrates the evolution that led to the development
of modern distributed operating systems. It also explains the emerging
need for distributed software and the importance of distributed
coordination algorithms.
-
Chapter 2 begins the
discussion of distributed operating systems. It presents the concepts of
transparency and services. Distributed systems and their underlying
communication architectures are introduced. The chapter concludes with a
list of major system design issues that establishes an order for the
presentation of the subsequent chapters.
Distributed Processes (Synchronization, Communication, and Scheduling)
-
Chapter 3
describes concurrent processes and programming. It defines processes
and threads and shows how their interaction can be modeled by using some
fundamental concepts such as a graph, a logical clock, and the client
and server model. Both shared memory and message passing for
synchronization and communication are addressed. They are presented
along with the development of concurrent language constructs. A taxonomy
of these language mechanisms and their implementation is given. This
chapter presents an integrated view of synchronization and
communication.
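Of the modeling concepts listed for Chapter 3, the logical clock is the easiest to make concrete. Below is a minimal sketch of a Lamport clock, assuming one integer counter per process. The class and method names are illustrative and are not taken from the textbook. The update rule guarantees that if event a happened before event b, then a's timestamp is smaller than b's. The converse does not hold.

```python
from dataclasses import dataclass


@dataclass
class LamportClock:
    """Lamport logical clock kept by a single process."""
    time: int = 0

    def tick(self) -> int:
        # Local event: advance the clock by one.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending counts as an event; the message carries the new timestamp.
        return self.tick()

    def receive(self, msg_time: int) -> int:
        # On receipt, move past both the local time and the sender's stamp.
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    p.tick()             # p: 1
    t = p.send()         # p: 2, message stamped 2
    q.tick()             # q: 1
    print(q.receive(t))  # q: max(1, 2) + 1 = 3
```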
-
Chapter 4 extends the
discussion of process interaction from synchronization to communication
and to distributed process coordination using message passing
communication. Three communication models, message passing (socket),
request/reply (RPC), and transaction communication, are presented. A
special emphasis is placed on group communication and coordination. Two
classical distributed coordination problems, mutual exclusion and leader
election using message passing interprocess communication, are
introduced. These problems are further studied in Chapters 10 and 11 in
Part II of the textbook. The chapter also includes a presentation of
name service, an essential facility for communication in distributed
systems.
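Leader election by message passing can be illustrated on a ring. The sketch below simulates a Chang-Roberts-style election on a unidirectional ring. This algorithm is chosen here for illustration and is not necessarily the one the chapter presents. The simulation assumes unique identifiers, reliable FIFO delivery, and that every process starts the election at the same time.

```python
from collections import deque
from typing import List, Tuple


def ring_election(ids: List[int]) -> Tuple[int, int]:
    """Simulate a Chang-Roberts-style election on a unidirectional ring.

    ids[i] is the (assumed unique) identifier of the process at ring
    position i, which sends only to position (i + 1) % n.
    Returns (leader_id, number_of_messages_sent).
    """
    n = len(ids)
    if n == 0:
        raise ValueError("the ring must contain at least one process")
    # Every process starts at once by sending its own id to its successor.
    in_flight = deque(((i + 1) % n, ids[i]) for i in range(n))
    messages = n
    while in_flight:
        pos, candidate = in_flight.popleft()  # FIFO delivery
        if candidate == ids[pos]:
            # An id that travels all the way around must be the largest one.
            return candidate, messages
        if candidate > ids[pos]:
            # Forward larger ids; smaller ids are swallowed here.
            in_flight.append(((pos + 1) % n, candidate))
            messages += 1
    raise RuntimeError("unreachable: the largest id is never swallowed")
```

For example, `ring_election([1, 2, 3])` elects 3 after 5 messages: the three initial sends, plus two forwards of the id 3.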
-
Chapter 5 turns to the third
process management issue, that of process scheduling. The effect of
communication on both static and dynamic process scheduling is
emphasized. The chapter describes distributed computation through
dynamic redistribution of processes by using remote execution and
process migration techniques. It also addresses several unique issues in
real-time scheduling.
Distributed Resources (Files and Memory)
-
Chapter 6
discusses the distributed implementation of file systems, the first of
the two important distributed resources: files and memory. It
demonstrates the use of the concepts of transparency and service in the
design of distributed file systems. Two major implementation issues,
data caching and file replication, are discussed in this chapter. The
chapter also covers distributed transactions as part of the file
service. Management of replicated data touches upon both data and
communication, two central issues in distributed systems, so it is
treated in more detail in Chapter 12.
-
Chapter 7
covers distributed shared memory systems that simulate a logical shared
memory on a physically distributed memory system. The issues studied
are coherence and consistency of data due to memory sharing. The
chapter describes implementation strategies for different memory
consistency requirements. It also shows the significance of the
object-based data sharing models.
-
Chapter 8
addresses unique security issues in network and distributed
environments. These issues are divided into two areas: authorization and
authentication. Authorization includes the study of distributed access
and flow control models. Authentication covers cryptography and its
applications for mutual authentication and key distribution protocols.
Implementations of some security features in modern systems are
illustrated.
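As a small illustration of mutual authentication, the sketch below runs a challenge-response handshake. Each side proves it knows a shared secret key by returning an HMAC of the other side's fresh nonce. The sketch assumes the key has already been distributed, which is the job of key distribution protocols. It is not Kerberos or any specific protocol from the chapter, and the function names are hypothetical. Binding the responder's identity into the MAC keeps an attacker from simply reflecting one side's challenge back to it.

```python
import hashlib
import hmac
import secrets


def respond(key: bytes, identity: bytes, challenge: bytes) -> bytes:
    # Prove knowledge of the shared key by MACing our identity and the challenge.
    return hmac.new(key, identity + b"|" + challenge, hashlib.sha256).digest()


def verify(key: bytes, identity: bytes, challenge: bytes, response: bytes) -> bool:
    expected = respond(key, identity, challenge)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, response)


def mutual_authenticate(key_a: bytes, key_b: bytes) -> bool:
    """Run both directions; key_a and key_b are each side's copy of the key."""
    nonce_a = secrets.token_bytes(16)             # A -> B: fresh challenge
    resp_b = respond(key_b, b"B", nonce_a)        # B proves itself to A
    if not verify(key_a, b"B", nonce_a, resp_b):
        return False                              # A rejects B
    nonce_b = secrets.token_bytes(16)             # B -> A: B's own challenge
    resp_a = respond(key_a, b"A", nonce_b)        # A proves itself to B
    return verify(key_b, b"A", nonce_b, resp_a)   # B checks A
```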
Part II of the textbook discusses distributed
algorithms. The discussion is pragmatic and is intended to give the
reader a solid understanding of common problems and solution techniques.
The topics are organized in five chapters.
Distributed Algorithms
-
Chapter 9
introduces the concepts of time and global states in a distributed
system. The fundamental problem of distributed algorithms is the lack of
a global clock and a global state. Recent research on vector time and
distributed predicates has developed unified models for thinking about
distributed time and the distributed state. This chapter presents the
concepts of causality, vector timestamps, and global states. The
algorithms for implementing these concepts are presented. The
connections between the different models are explored. Finally, a model
for proving the correctness of distributed algorithms is presented.
-
Chapter 10
covers distributed synchronization and distributed election. While the
distributed synchronization algorithms are not considered pragmatic,
they illustrate important algorithm design techniques. For example,
voting algorithms for replicated data management are foreshadowed in
Maekawa's algorithm, and the Chang-Singhal-Liu algorithm illustrates the
ideas behind distributed shared memory (and distributed object)
algorithms. The chapter concludes with algorithms for electing a
computation leader. Election is a critical component of many systems.
The invitation algorithm in particular is a prototype for handling
failures in an asynchronous system and foreshadows the group view
maintenance algorithms of Chapter 12.
-
Chapter 11
discusses the abstract distributed agreement problem. First, Byzantine
agreement is discussed. Next, the Fischer-Lynch-Paterson (FLP) result is
covered in detail: no deterministic algorithm can guarantee distributed
agreement in an asynchronous system in which even one process may fail
by crashing. This is the appropriate point to introduce the FLP result,
because the next chapter covers replicated data management and must
solve distributed agreement in asynchronous systems. The FLP result
leaves open three ways to achieve distributed agreement in an
asynchronous system: hope that it happens, use relative agreement, or
use a randomized algorithm. The chapter discusses these implications of
the FLP result and concludes with some randomized agreement protocols.
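Byzantine agreement can be sketched with the oral-messages algorithm OM(m) of Lamport, Shostak, and Pease. It assumes synchronous rounds and reaches agreement among the loyal processes when there are more than 3m processes and at most m traitors. The simulation below is an illustration, not the chapter's own presentation. The traitor behavior in `send` and the default value on ties are arbitrary assumptions.

```python
from collections import Counter
from typing import Dict, List, Set

DEFAULT = "retreat"  # assumed default when no value has a strict majority


def send(sender: int, receiver: int, value: str, traitors: Set[int]) -> str:
    # Loyal processes relay faithfully. A traitor (hypothetical behavior) tells
    # even-numbered receivers "attack" and odd-numbered ones "retreat".
    if sender in traitors:
        return "attack" if receiver % 2 == 0 else "retreat"
    return value


def majority(votes: List[str]) -> str:
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return DEFAULT  # tie: fall back to the default value
    return counts[0][0]


def om(m: int, commander: int, lieutenants: List[int], value: str,
       traitors: Set[int]) -> Dict[int, str]:
    """Oral-messages algorithm OM(m); returns each lieutenant's decision."""
    received = {l: send(commander, l, value, traitors) for l in lieutenants}
    if m == 0:
        return received
    # Each lieutenant i acts as commander in OM(m - 1) to relay what it got.
    relayed = {
        i: om(m - 1, i, [l for l in lieutenants if l != i], received[i], traitors)
        for i in lieutenants
    }
    # Lieutenant j takes the majority of its direct value and the relayed ones.
    return {
        j: majority([received[j]] + [relayed[i][j] for i in lieutenants if i != j])
        for j in lieutenants
    }
```

With four processes and one traitor, `om(1, 0, [1, 2, 3], "attack", {3})` leaves the loyal lieutenants 1 and 2 both deciding "attack". With a traitorous commander, `{0}`, all three loyal lieutenants still reach the same decision.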
-
Chapter 12
covers replicated data management. Since providing replicated servers
reduces to replicating the state of the servers, this chapter also
discusses the problems and concepts of replication. We cover three main
approaches: the transaction approach, the reliable multicast approach,
and the log propagation approach. The transaction approach includes
discussion of two-phase commit, three-phase commit, one-copy
serializability, voting, and dynamic voting protocols. The reliable
multicast approach includes discussion of virtual synchrony, algorithms
for implementing reliable and causal multicast, algorithms for totally
ordered multicast, and consistent multicast group maintenance
algorithms. The log propagation approach covers naive log propagation,
epidemics, and causal log propagation. This chapter is the culmination
of Part II of the text and draws together the results presented in
previous chapters.
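The core of two-phase commit fits in a few lines. The sketch below models only the failure-free path, using hypothetical `Participant` objects. It has no stable-storage logging, no timeouts, and no coordinator crashes. Those are exactly the cases where two-phase commit can block, and the cases three-phase commit was designed to address.

```python
from typing import List


class Participant:
    """Hypothetical participant; `can_commit` stands in for its local prepare work."""

    def __init__(self, name: str, can_commit: bool):
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        # Phase 1: voting YES is a promise that this site can commit if told to.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def finish(self, commit: bool) -> None:
        # Phase 2: apply the coordinator's global decision.
        self.state = "committed" if commit else "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    # Phase 1 (voting): ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Commit only on a unanimous YES; a single NO aborts the transaction.
    decision = all(votes)
    # Phase 2 (decision): send the outcome to every participant.
    for p in participants:
        p.finish(decision)
    return decision
```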
-
Chapter 13
covers distributed rollback and recovery. These techniques are critical
for implementing fault-tolerant systems and are complementary to the
replicated data management techniques of the previous chapter. By using
the theory developed in the previous chapters (especially Chapter 9),
different rollback and recovery algorithms are presented in a unified
manner and are related to algorithms discussed previously.
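Rollback-recovery algorithms must restore a consistent global state. One standard test applies to a set of checkpoints, one per process. The set is consistent only if no message is recorded as received while its send is not recorded; such a message is called an orphan. The sketch below applies that test. The event-numbering representation is an assumption made for illustration. Messages that are sent but not yet received at the cut are allowed, although a recovery protocol would have to log and replay them.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    sender: int
    send_event: int   # 1-based index of the send event on the sender
    receiver: int
    recv_event: int   # 1-based index of the receive event on the receiver


def is_consistent(cut: List[int], messages: List[Message]) -> bool:
    """cut[p] = number of events process p executed before its checkpoint."""
    for m in messages:
        received = m.recv_event <= cut[m.receiver]
        sent = m.send_event <= cut[m.sender]
        if received and not sent:
            # Orphan: the receiver's checkpoint reflects a message whose send
            # is not in the sender's checkpoint.
            return False
    return True
```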
More topics
http://nereida.deioc.ull.es/html/programming.html
Beowulf
http://nereida.deioc.ull.es/~cicyt/llbeowulf.html
http://nereida.deioc.ull.es/html/beowulf.html
HPC Java programming
http://nereida.deioc.ull.es/html/java.html
Distributed HPC
http://nereida.deioc.ull.es/~cicyt/welcome.html
Distributed Computing
http://en.wikipedia.org/wiki/Distributed_computing
Java DC
http://www.unix.org.ua/orelly/java-ent/dist/index.htm
Parallel Programming
http://nereida.deioc.ull.es/html/pram.html
Shared Memory
http://nereida.deioc.ull.es/html/openmp.html
High Performance Fortran (HPF)
http://nereida.deioc.ull.es/html/hpf.html
Message Passing
http://nereida.deioc.ull.es/html/mppm.html
http://en.wikipedia.org/wiki/Message_passing
Parallel computers
http://nereida.deioc.ull.es/html/models.html
http://nereida.deioc.ull.es/html/operational_r.html
Journals
http://nereida.deioc.ull.es/html/conferences.html#journals
HPC European Research
http://www.hpc-europa.org/index.php?section=Transnational
Communication primitives:
Process concept, process interactions; message passing: primitives,
synchronisation; channels: naming problems, mailboxes, ports, connections; rendezvous.
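Mailbox-style message passing maps naturally onto a thread-safe queue. The sketch below uses a hypothetical `Mailbox` class with asynchronous send and blocking receive. It does not model a rendezvous, in which the sender would also wait until the receiver has taken the message.

```python
import queue
import threading


class Mailbox:
    """A named mailbox: any thread may send; receivers block until a message arrives."""

    def __init__(self, name: str, capacity: int = 0):
        self.name = name
        self._q = queue.Queue(maxsize=capacity)  # capacity 0 means unbounded

    def send(self, msg) -> None:
        self._q.put(msg)  # blocks only if a bounded mailbox is full

    def receive(self, timeout=None):
        # Blocking receive; raises queue.Empty if the timeout expires.
        return self._q.get(timeout=timeout)


def producer(box: Mailbox, items) -> None:
    for item in items:
        box.send(item)
    box.send(None)  # sentinel: no more messages


def consumer(box: Mailbox, out: list) -> None:
    while (msg := box.receive()) is not None:
        out.append(msg)


if __name__ == "__main__":
    box = Mailbox("jobs")
    results = []
    t1 = threading.Thread(target=producer, args=(box, [1, 2, 3]))
    t2 = threading.Thread(target=consumer, args=(box, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    print(results)  # [1, 2, 3]
```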
Distributed processing:
Basic concepts; client/server model; remote procedure call;
partitioning and configuration; MPI standard.
Transaction processing:
Decomposable abstract operations; resource allocation; transactions;
concurrency; distributed transactions; international standards.
http://www.psc.edu/general/education/Lecture_List.html
Introduction to the Cray XT3
http://www.psc.edu/training/XT3_Aug05/
http://charm.cs.uiuc.edu/patHPC/