1. Distributed Computing in Practice: The Condor Experience
CS739, 11/4/2013
Todd Tannenbaum
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
2. The Condor High Throughput Computing System
“Condor is a high-throughput distributed computing system. Like other batch systems, Condor provides a job management mechanism, scheduling policy, priority scheme, resource monitoring, and resource management….
While similar to other conventional batch systems, Condor's novel architecture allows it to perform well in environments where other batch systems are weak: high-throughput computing and opportunistic computing.”
3.
Grids
Clouds
Map-Reduce
eScience
Cyber Infrastructure
SaaS
HPC
HTC
eResearch
Web Services
Virtual Machines
HTPC
MultiCore
GPUs
HDFS
IaaS
SLA
QoS
Open Source
Green Computing
Master-Worker
WorkFlows
Cyber Security
High Availability
Workforce
100Gb
4. Definitional Criteria for a Distributed Processing System
Multiplicity of resources
Component interconnection
Unity of control
System transparency
Component autonomy
P.H. Enslow and T. G. Saponas, “Distributed and Decentralized Control in Fully Distributed Processing Systems,” Technical Report, 1981
5.
A thinker's guide to the most important trends of the new decade
“The goal shouldn't be to eliminate failure; it should be to build a system resilient enough to withstand it”
“The real secret of our success is that we learn from the past, and then we forget it. Unfortunately, we're dangerously close to forgetting the most important lessons of our own history: how to fail gracefully and how to get back on our feet with equal grace.”
“In Defense of Failure” by Megan McArdle, Thursday, Mar. 11, 2010
6. Claims for “benefits” provided by Distributed Processing Systems
High Availability and Reliability
High System Performance
Ease of Modular and Incremental Growth
Automatic Load and Resource Sharing
Good Response to Temporary Overloads
Easy Expansion in Capacity and/or Function
P.H. Enslow, “What is a Distributed Data Processing System?” Computer, January 1978
7. High Throughput Computing
Goal: Provide large amounts of fault tolerant computational power over prolonged periods of time by effectively utilizing all resources
High Performance Computing (HPC) vs High Throughput Computing (HTC)
FLOPS vs FLOPY
FLOPY != FLOPS * (Num of seconds in a year)
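An illustration with assumed numbers: a machine with a peak of 100 GFLOPS that is down, claimed by others, or otherwise unavailable for half the year can deliver at most half of 100 GFLOPS × (seconds in a year) of useful work; HTC aims to maximize that delivered yearly total, not the peak rate.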
8. Opportunistic Computing
Ability to use resources whenever they are available, without requiring one hundred percent availability.
Very attractive due to realities of distributed ownership
[Related: Volunteer Computing]
9. Philosophy of Flexibility
Let communities grow naturally.
Build structures that permit but do not require cooperation.
Relationships, obligations, and schemata will develop according to user necessity.
Leave the owner in control.
Plan without being picky.
Better to be flexible than fast!
Lend and borrow.
10. Todd’s Helpful Tips
Architect in terms of responsibility instead of expected functionality
Delegate like Ronald Reagan. How?
Plan for Failure!
Leases everywhere, 2PC, belt and suspenders, …
Never underestimate the value of code that has withstood the test of time in the field
Always keep the end-to-end picture in mind when deciding upon layer functionality
11. End-to-end thinking
Code reuse creates tensions…
Example: the network layer uses checksums to ensure correctness, but a checksum inside that layer cannot catch corruption introduced elsewhere along the path, so the endpoints must still verify the data end to end.
12. The Condor Kernel
Step 1: User submits a job
13. ClassAds and Matchmaking
How are jobs described?
Name-value pairs
Values can be literal data or expressions
Expressions can refer to the match candidate
Requirements and Rank are treated specially
14. Interesting ClassAd Properties
Semi-structured data
No fixed schema
“Let communities grow naturally…”
Three-valued logic
Expressions evaluate to True, False, or Undefined
Bilateral: both the job and the machine constrain and rank the match
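For illustration, a sketch of a job ClassAd and a machine ClassAd that could match each other (attribute values are assumed, not taken from the slides); each ad's Requirements refers to the other ad via TARGET, and an attribute missing from either side simply evaluates to Undefined:

Job ClassAd:
MyType       = "Job"
Owner        = "todd"
Cmd          = "/bin/hi"
Requirements = (TARGET.OpSys == "LINUX") && (TARGET.Memory >= 1024)
Rank         = TARGET.KFlops

Machine ClassAd:
MyType       = "Machine"
OpSys        = "LINUX"
Memory       = 2048
KFlops       = 450000
LoadAvg      = 0.05
Requirements = (TARGET.Owner == "todd") || (LoadAvg < 0.3)
Rank         = 0

The match succeeds only if both Requirements expressions evaluate to True, which is what makes the relationship bilateral.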
15. What if the schedd crashes during job submission?
ARIES-style recovery log
Two-phase commit (2PC)
Plan for Failure, Lend and borrow
The schedd's job queue log, plus the two-phase commit handshake with the submitting client (Prepare → new Job ID 106 → Commit 106):
Begin Transaction
105 Owner Todd
105 Cmd /bin/hi
…
End Transaction
Begin Transaction
106 Owner Fred
106 Cmd /bin/bye
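A sketch of the reasoning (the standard write-ahead-logging and 2PC argument, not a verbatim Condor detail): the schedd forces the transaction's log records to stable storage before acknowledging the commit, so a crash after the acknowledgement is harmless because replaying the log recreates job 106, while a crash before it leaves an incomplete transaction that is discarded on replay, and the submitting client, never having received the commit acknowledgement, knows the submission must be retried.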
16. The Condor Kernel
Step 2: Matchmaking
17. Matchmaking Protocol
Advertise → Match → Claim
Each service (A = Agent, M = Matchmaker, R = Resource) is independent and has a distinct responsibility
The centralized Matchmaker is very lightweight: after the Match step, it is no longer involved
Claims can be reused, delegated, …
18. Federation via Direct Flocking
Federation many years later was easy because of the clearly defined roles of the services
19. Globus and GRAM
GRAM = Grid Resource Allocation and Management
“De facto” standard protocol for submission to a site scheduler
Problems:
Dropped many end-to-end features
Like exit codes!
Couples resource allocation and job execution
Early binding of a specific job to a specific queue
26. Process Checkpointing
Condor’s process checkpointing mechanism saves the entire state of a process into a checkpoint file
Memory, CPU, I/O, etc.
The process can then be restarted from right where it left off
Typically no changes to your job’s source code are needed; however, your job must be relinked with Condor’s Standard Universe support library
27. Relinking Your Job for Standard Universe
To do this, just place “condor_compile” in front of the command you normally use to link your job:
% condor_compile gcc -o myjob myjob.c
- OR -
% condor_compile f77 -o myjob filea.f fileb.f
28. When will Condor checkpoint your job?
Periodically, if desired (for fault tolerance)
When your job is preempted by a higher priority job
Preempt/Resume scheduling – powerful!
When your job is vacated because the execution machine becomes busy
29. Remote System Calls
I/O system calls are trapped and sent back to submit machine
Allows transparent migration across administrative domains
Checkpoint on machine A, restart on B
No source code changes required
Language independent
Opportunities for application steering
33. Why not use Vanilla Universe for Java jobs?
Java Universe provides more than just inserting “java” at the start of the execute line
Knows which machines have a JVM installed
Knows the location, version, and performance of JVM on each machine
Can differentiate JVM exit code from program exit code
Can report Java exceptions
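A minimal sketch of a Java Universe submit file (file names are illustrative, not from the slides); in this universe the executable is the class file containing main(), and the first argument names that class:

universe   = java
executable = MyJob.class
arguments  = MyJob 100
output     = myjob.out
error      = myjob.err
log        = myjob.log
queue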
34. DAGMan
Directed Acyclic Graph Manager
DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you.
(e.g., “Don’t run job B until job A has completed successfully.”)
35. What is a DAG?
A DAG is the data structure used by DAGMan to represent these dependencies.
Each job is a “node” in the DAG.
Each node can have any number of “parent” or “child” nodes – as long as there are no loops!
(Figure: a diamond-shaped DAG with nodes Job A, Job B, Job C, and Job D: A is the parent of B and C, which are both parents of D.)
36. Defining a DAG
A DAG is defined by a .dag file, listing each of its nodes and their dependencies:
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
Each node will run the Condor job specified by its accompanying Condor submit file (a sketch of one such submit file follows below).
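For illustration, one of the node submit files named in diamond.dag might look like the following (its contents are assumed, not given on the slides):

# a.sub -- submit file for node A (illustrative)
universe   = vanilla
executable = nodeA
output     = a.out
error      = a.err
log        = diamond.log
queue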