1.A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees
Srinath Sridhar
Joint work with: Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz
2.Steiner Tree Problem
Input: Graph G(V, E) with edge weights w: E R and a ‘terminal’ set S V
Output: Subtree T of G connecting all vertices in S
Objective: Minimize |w(T)|
Informally: MST with intermediate vertices
NP-complete, even if G is m-dimensional hypercube with unit edge weights
3.Near-Perfect Phylogenetic Trees
Input: set S of n points on an m-dimensional hypercube (n bit-strings of length m)
Output: Steiner (unrooted) tree T connecting S using intermediate nodes (Steiner nodes) of hypercube
Objective: Minimize |T|
Assumption: |Topt | m + q, constant q
4.Why is this important? (Foster et al., 98)
5.Why is this important? (Wirth et al., 04)
6.Typical Input Data
Rows: different species, languages etc
Columns: yes/no, 0/1 properties of rows
Phenotypes: Each column can represent binary questions: thumbs? color-blind?
DNA: Each position has 2 possibilities (almost always)
8.Perfectness
Annotate tree T with the column flip
Tree T ‘perfect’: annotations occur only once
Evolution is assumed to be (nearly) perfect
0001
Boggart
1001Goblin
1011
Centaur
1000
Steiner
Basilisk
1100
1
4
2
3
9.Perfectness
Annotate tree T with the column flip
Tree T ‘perfect’: annotations occur only once
Evolution is assumed to be (nearly) perfect
q-near-perfect: |Topt | m + q, constant q
0001
Boggart
1001Goblin
1011
Centaur
1000
Dementor
Basilisk
1100
1
4
2
3
Hippogriff
1101
4
1-near
perfect
10.General Phylogeny Problem
Input S: set of n strings in {1, …, k}m
Output: Steiner tree T connecting all of S (Hamming distance)
Objective: Minimize |T|
Variants:
k is bounded by a constant, k is 2
Tree T is perfect
Tree T is near-perfect
11.Some Prior Work
12.Overview
Optimal Tree
Discover O(q) edges, induced topology
13.Overview
Optimal Tree
Discover assignment of rows to super nodes
14.Overview
Optimal Tree
Grow perfect phylogeny within
Each super node
15.Overview
Optimal Tree
Link the super nodes
16.Current/Future Work
Simpler algorithm
States k > 2, near-perfect
Experimental evaluation, useable code
Related harder problem: Input is ‘mixture’ of 2 strings over {0, 1}mInput: 2 0 1 1 2Output: 1 0 1 1 0 0 0 1 1 1