Topics in distributed systems

Hardware Concepts

Software Concepts

Design Issues

Kernels

Memory management
    Distributed shared memory

Transaction management

Communication
    IPC

Real-time distributed systems

Processes
    process management

Naming

Synchronization
    distributed synchronization

Consistency

Replication

Fault tolerance
Fault-tolerant distributed systems
ATM networks

File systems
    Distributed File Systems
    Distributed file systems (with NFS and X.500)

Distributed Object-Based Systems
    Object-based operating systems

Distributed Document-Based Systems

Distributed Coordination-Based Systems

Security
    distributed security

Middleware models

How distributed systems are designed and implemented in real systems.

Case studies of Distributed Operating Systems
Amoeba
Clouds
Mach
Chorus
JavaOS™
OSF/DCE
CORBA™
    OMG's Common Object Request Broker Architecture (CORBA) standard
    MICOSec, an open source implementation of the CORBA specification

DCOM™
    Microsoft's DCOM

NFS
    NFS v4

LDAP

X.500

Kerberos

RSA

DES

SSH

NTP

Real-world examples and case studies:

CORBA
DCOM
Jini
World Wide Web

Implementation
Sockets
RPC
Threads
Implementation of distributed algorithms using these tools.
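
As a small illustration of how sockets and threads fit together, the sketch below runs a one-shot request/reply exchange between a client and a server thread. It is only a sketch under assumptions: the names serve_once and call are hypothetical, the "service" (upper-casing the request) is made up for the example, and a connected socket pair stands in for a real network connection.

```python
import socket
import threading


def serve_once(conn):
    # Server side: read one request and reply with its upper-cased form.
    # Assumes the whole request arrives in a single recv (fine for tiny messages).
    with conn:
        request = conn.recv(1024)
        conn.sendall(request.upper())


def call(message):
    # Client side: a connected socket pair stands in for a network link.
    client, server = socket.socketpair()
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()
    with client:
        client.sendall(message)
        reply = client.recv(1024)
    worker.join()
    return reply
```

Calling call(b"ping") sends the bytes to the server thread and returns its reply. A real RPC layer would add message framing, argument marshalling, and error handling on top of this exchange.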

Fundamental Concepts (Transparency, Service, and Coordination)

Chapter 1 introduces a classification of centralized operating systems, network operating systems, distributed operating systems, and cooperative autonomous systems, using the key characteristics of virtuality, interoperability, transparency, and autonomicity, respectively, for each system. It illustrates the evolution that led to the development of modern distributed operating systems and explains the emerging need for distributed software and the importance of distributed coordination algorithms.
Chapter 2 begins the discussion of distributed operating systems. It presents the concepts of transparency and services. Distributed systems and their underlying communication architectures are introduced. The chapter concludes with a list of major system design issues that establishes an order for the presentation of the subsequent chapters.

Distributed Processes (Synchronization, Communication, and Scheduling)

Chapter 3 describes concurrent processes and programming. It defines processes and threads and shows how their interaction can be modeled by using some fundamental concepts such as a graph, a logical clock, and the client and server model. Both shared memory and message passing for synchronization and communication are addressed. They are presented along with the development of concurrent language constructs. A taxonomy of these language mechanisms and their implementation is given. This chapter presents an integrated view of synchronization and communication.
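
The logical clock mentioned for Chapter 3 can be sketched as follows. This is a minimal Lamport-style clock, given as an illustration and not necessarily in the form the textbook uses; the class name LamportClock and its method names are assumptions made for this example.

```python
class LamportClock:
    """Logical clock: a counter that respects the happened-before order."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the counter by one.
        self.time += 1
        return self.time

    def send(self):
        # Sending is itself an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, msg_time):
        # Move past the sender's timestamp, counting the receive as an event.
        self.time = max(self.time, msg_time) + 1
        return self.time


p, q = LamportClock(), LamportClock()
p.tick()                        # local event on p
stamp = p.send()                # p sends a message carrying its timestamp
q.tick()                        # unrelated local event on q
received_at = q.receive(stamp)  # q's clock jumps past the sender's stamp
```

The property being illustrated is that the receive event always gets a larger timestamp than the matching send event, so timestamps never contradict causality.
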
Chapter 4 extends the discussion of process interaction from synchronization to communication and to distributed process coordination using message passing communication. Three communication models, message passing (socket), request/reply (RPC), and transaction communication, are presented. A special emphasis is placed on group communication and coordination. Two classical distributed coordination problems, mutual exclusion and leader election using message passing interprocess communication, are introduced. These problems are further studied in Chapters 10 and 11 in Part II of the textbook. The chapter also includes a presentation of name service, an essential facility for communication in distributed systems.
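
To make the leader election problem concrete, here is a small single-process simulation of a ring-based election in which the largest identifier wins. It is one classical approach, shown as a simplified sketch and not necessarily the algorithm the textbook presents; the function name ring_election and its inputs are hypothetical, and failures and the final "elected" announcement round are not modeled.

```python
def ring_election(ids, starter):
    """Simulate one election on a unidirectional ring.

    ids: unique process identifiers in ring order (hypothetical input).
    starter: index of the process that begins the election.
    Returns (leader_id, messages_sent).
    """
    n = len(ids)
    candidate = ids[starter]
    pos = (starter + 1) % n   # the candidate message goes to the next process
    messages = 1
    while ids[pos] != candidate:
        # Each process forwards the larger of its own id and the incoming one.
        candidate = max(candidate, ids[pos])
        pos = (pos + 1) % n
        messages += 1
    # The message has returned to the process whose id it carries: that one wins.
    return candidate, messages


leader, sent = ring_election([3, 7, 2, 5], starter=0)
```

In this run the process with identifier 7 is elected. The message count shows why ring elections cost on the order of one to two trips around the ring when there is a single initiator.
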
Chapter 5 turns to the third process management issue, that of process scheduling. The effect of communication on both static and dynamic process scheduling is emphasized. The chapter describes distributed computation through dynamic redistribution of processes by using remote execution and process migration techniques. It also addresses several unique issues in real-time scheduling.

Distributed Resources (Files and Memory)

Chapter 6 discusses the distributed implementation of file systems, the first of the two important distributed resources: files and memory. It demonstrates the use of the concept of transparency and service in the design of distributed file systems. Two major implementation issues, data caching and file replication, are discussed in this chapter. The chapter also covers distributed transactions as part of the file service. Since management of replicated data touches upon both data and communication, two central issues in distributed systems, it is further detailed in Chapter 12.
Chapter 7 covers distributed shared memory systems that simulate a logical shared memory on a physically distributed memory system. The issues studied are coherence and consistency of data due to memory sharing. The chapter describes implementation strategies for different memory consistency requirements. It also shows the significance of the object-based data sharing models.
Chapter 8 addresses unique security issues in network and distributed environments. These issues are divided into two areas: authorization and authentication. Authorization includes the study of distributed access and flow control models. Authentication covers cryptography and its applications for mutual authentication and key distribution protocols. Implementations of some security features in modern systems are illustrated.
Part II of the textbook discusses distributed algorithms. The discussion is pragmatic and is intended to give the reader a solid understanding of common problems and solution techniques. The topics are organized in five chapters.

Distributed Algorithms

Chapter 9 introduces the concepts of time and global states in a distributed system. The fundamental problem of distributed algorithms is a lack of a global clock and a global state. Recent research on vector time and distributed predicates has developed unified models for thinking about distributed time and the distributed state. This chapter presents the concepts of causality, vector timestamps, and global states. The algorithms for implementing these concepts are presented. The connections between the different models are explored. Finally, a model for proving the correctness of distributed algorithms is presented.
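
The idea of vector timestamps and causality can be sketched in a few lines. The code below is an illustrative sketch only: the class VectorClock and the helpers happened_before and concurrent are names chosen for this example, and the number of processes is assumed to be fixed and known in advance.

```python
class VectorClock:
    def __init__(self, n, pid):
        self.pid = pid       # index of the owning process
        self.v = [0] * n     # one counter per process

    def tick(self):
        # Local or send event: advance only this process's own entry.
        self.v[self.pid] += 1
        return list(self.v)

    def receive(self, other):
        # Merge entry by entry, then count the receive as a local event.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        return self.tick()


def happened_before(a, b):
    # a causally precedes b if a <= b in every entry and the two differ.
    return all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    # Neither event causally precedes the other.
    return not happened_before(a, b) and not happened_before(b, a)


p0, p1 = VectorClock(2, 0), VectorClock(2, 1)
e1 = p0.tick()        # p0 sends a message stamped with e1
e2 = p1.tick()        # independent local event on p1
e3 = p1.receive(e1)   # p1 receives p0's message
```

Here e1 causally precedes e3, while e1 and e2 are concurrent. Unlike a scalar logical clock, comparing two vector timestamps is enough to tell these two cases apart.
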
Chapter 10 covers distributed synchronization and distributed election. While the distributed synchronization algorithms are not considered pragmatic, they illustrate important algorithm design techniques. For example, voting algorithms for replicated data management are foreshadowed in Maekawa's algorithm, and the Chang-Singhal-Liu algorithm illustrates the ideas behind distributed shared memory (and distributed object) algorithms. The chapter concludes with algorithms for electing a computation leader. Election is a critical component of many systems. The invitation algorithm in particular is a prototype for handling failures in an asynchronous system and foreshadows the group view maintenance algorithms of Chapter 12.
Chapter 11 discusses the abstract distributed agreement problem. First, Byzantine agreement is discussed. Next, the Fischer-Lynch-Paterson (FLP) result, that no deterministic algorithm is guaranteed to solve distributed agreement in an asynchronous system if even one process can fail, is covered in detail. This is the appropriate point to introduce the FLP result, because the next chapter covers replicated data management and must solve distributed agreement in asynchronous systems. The FLP result leaves open three ways to achieve distributed agreement in an asynchronous system: hope that it happens, use relative agreement, or use a randomized algorithm. The chapter discusses these implications of the FLP result and concludes with some randomized agreement protocols.
Chapter 12 covers replicated data management. Since providing replicated servers reduces to replicating the state of the servers, this section also discusses the problems and concepts of replication. We cover three main approaches: the transaction approach, the reliable multicast approach, and the log propagation approach. The transaction approach includes discussion of two-phase commit, three-phase commit, one-copy serializability, voting, and dynamic voting protocols. The reliable multicast approach includes discussion of virtual synchrony, algorithms for implementing reliable and causal multicast, algorithms for totally ordered multicast, and consistent multicast group maintenance algorithms. The log propagation approach covers naive log propagation, epidemics, and causal log propagation. This chapter is the culmination of Part II of the text and draws together the results presented in previous chapters.
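
The core of two-phase commit, named above as part of the transaction approach, can be sketched as a voting phase followed by a decision phase. This is a failure-free sketch under assumptions: the Participant class, its can_commit flag, and the function two_phase_commit are hypothetical names for this example, and timeouts, logging, and coordinator crashes (the cases that motivate three-phase commit) are not modeled.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # hypothetical flag: how this site votes
        self.state = "init"

    def prepare(self):
        # Phase 1: vote yes (and become ready) or vote no (and abort locally).
        self.state = "ready" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    # Phase 1 (voting): the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = all(votes)
    # Phase 2 (decision): commit only on a unanimous yes, otherwise abort all.
    for p in participants:
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

With two participants that both vote yes, two_phase_commit returns True and both end in the committed state. A single no vote makes it return False and leaves every participant aborted, which is the atomicity property the protocol is meant to provide.
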
Chapter 13 covers distributed rollback and recovery. These techniques are critical for implementing fault-tolerant systems and are complementary to the replicated data management techniques of the previous chapter. By using the theory developed in the previous chapters (especially Chapter 9), different rollback and recovery algorithms are presented in a unified manner and are related to algorithms discussed previously.

More topics

http://nereida.deioc.ull.es/html/programming.html

Beowulf
http://nereida.deioc.ull.es/~cicyt/llbeowulf.html
http://nereida.deioc.ull.es/html/beowulf.html

HPC Java programming
http://nereida.deioc.ull.es/html/java.html

Distributed HPC
http://nereida.deioc.ull.es/~cicyt/welcome.html

Distributed Computing
http://en.wikipedia.org/wiki/Distributed_computing
Java DC
http://www.unix.org.ua/orelly/java-ent/dist/index.htm

Parallel Programming
http://nereida.deioc.ull.es/html/pram.html

Shared Memory
http://nereida.deioc.ull.es/html/openmp.html

HP Fortran
http://nereida.deioc.ull.es/html/hpf.html

Message Passing
http://nereida.deioc.ull.es/html/mppm.html
http://en.wikipedia.org/wiki/Message_passing

Parallel computers
http://nereida.deioc.ull.es/html/models.html
http://nereida.deioc.ull.es/html/operational_r.html

Journals
http://nereida.deioc.ull.es/html/conferences.html#journals

HPC European Research
http://www.hpc-europa.org/index.php?section=Transnational

Communication primitives:
Process concept, process interactions; message passing: primitives, synchronisation; channels: naming problems, mailbox, ports, connections; rendezvous.

Distributed processing:
Basic concepts; client/server model; remote procedure call; partitioning and configuration; MPI standard.

Transaction processing:
Decomposable abstract operations; resource allocation; transactions; concurrency; distributed transactions; international standards.

http://www.psc.edu/general/education/Lecture_List.html

Introduction to the Cray XT3
http://www.psc.edu/training/XT3_Aug05/
http://charm.cs.uiuc.edu/patHPC/