A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees, Srinath

Download this Presentation

0

Presentation Transcript

  • 1.A Faster Reconstruction of Binary Near-Perfect Phylogenetic Trees Srinath Sridhar Joint work with: Kedar Dhamdhere, Guy E. Blelloch, Eran Halperin, R. Ravi and Russell Schwartz
  • 2.Steiner Tree Problem Input: Graph G(V, E) with edge weights w: E  R and a ‘terminal’ set S  V Output: Subtree T of G connecting all vertices in S Objective: Minimize |w(T)| Informally: MST with intermediate vertices NP-complete, even if G is m-dimensional hypercube with unit edge weights
  • 3.Near-Perfect Phylogenetic Trees Input: set S of n points on an m-dimensional hypercube (n bit-strings of length m) Output: Steiner (unrooted) tree T connecting S using intermediate nodes (Steiner nodes) of hypercube Objective: Minimize |T| Assumption: |Topt |  m + q, constant q
  • 4.Why is this important? (Foster et al., 98)
  • 5.Why is this important? (Wirth et al., 04)
  • 6.Typical Input Data Rows: different species, languages etc Columns: yes/no, 0/1 properties of rows Phenotypes: Each column can represent binary questions: thumbs? color-blind? DNA: Each position has 2 possibilities (almost always)
  • 7.Example H W RS B/NB Basilisk: 1 1 0 0 Boggart: 0 0 0 1 Centaur: 1 0 1 1 Goblin: 1 0 0 1 H: Head W: Wings RS: Can read stars B/NB: Bad/not-so-bad 0001 Boggart 1001Goblin 1011 Centaur 1000 Steiner Basilisk 1100
  • 8.Perfectness Annotate tree T with the column flip Tree T ‘perfect’: annotations occur only once Evolution is assumed to be (nearly) perfect 0001 Boggart 1001Goblin 1011 Centaur 1000 Steiner Basilisk 1100 1 4 2 3
  • 9.Perfectness Annotate tree T with the column flip Tree T ‘perfect’: annotations occur only once Evolution is assumed to be (nearly) perfect q-near-perfect: |Topt |  m + q, constant q 0001 Boggart 1001Goblin 1011 Centaur 1000 Dementor Basilisk 1100 1 4 2 3 Hippogriff 1101 4 1-near perfect
  • 10.General Phylogeny Problem Input S: set of n strings in {1, …, k}m Output: Steiner tree T connecting all of S (Hamming distance) Objective: Minimize |T| Variants: k is bounded by a constant, k is 2 Tree T is perfect Tree T is near-perfect
  • 11.Some Prior Work
  • 12.Overview Optimal Tree Discover O(q) edges, induced topology
  • 13.Overview Optimal Tree Discover assignment of rows to super nodes
  • 14.Overview Optimal Tree Grow perfect phylogeny within Each super node
  • 15.Overview Optimal Tree Link the super nodes
  • 16.Current/Future Work Simpler algorithm States k > 2, near-perfect Experimental evaluation, useable code Related harder problem: Input is ‘mixture’ of 2 strings over {0, 1}mInput: 2 0 1 1 2Output: 1 0 1 1 0 0 0 1 1 1