The study of the evolutionary history and relationships among individuals or groups of organisms
Phylogenetic tree
A diagram that depicts the lines of evolutionary descent of different species, organisms, or genes from a common ancestor
Attempt to reconstruct evolutionary ancestors
Estimate time of divergence from ancestor
Phylogenetic analyses have become central to understanding biodiversity, evolution and genomes
Relationships in phylogenetic tree
Captured by the topology (branching order) and amount of evolutionary change (branch lengths) between nodes
The role of the root is to add direction to these relationships and clearly define ancestry
Applications of phylogenetic trees
Classification of organisms
Forensics (e.g. HIV mutation)
Predicting evolution of influenza viruses
Predicting functions of uncharacterized genes (orthologue detection)
Drug discovery
Bioinformatics
Vaccine development
The term 'phylogeny' introduced by Ernst Haeckel
1866
Ernst Haeckel's recapitulation theory was widely accepted during the late 19th century but has since been rejected
Darwin's 'On The Origin of Species' convinced many biologists to accept common ancestry and start building phylogenies
Phylogenetic tree
A two-dimensional graph showing evolutionary relationships between organisms or genes from various organisms
Phylogenetic trees are a hypothesis of the evolutionary past since one cannot go back to confirm the proposed relationships
Types of phylogenetic trees
Rooted trees
Unrooted trees
Rooted tree
Has a single ancestral lineage (typically drawn from the bottom or left) to which all organisms represented in the diagram relate
The three domains (Bacteria, Archaea, and Eukarya) diverge from a single point and branch off
Bifurcating tree
Has exactly two descendants arising from each of the interior nodes
Multi-furcating tree
Has more than two descendants arising from one or more of the interior nodes
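The difference between bifurcating and multi-furcating trees can be checked mechanically. The sketch below assumes a hypothetical representation that is not part of these notes: a leaf is a string and an interior node is a tuple of its children. It treats a bifurcating node as one with exactly two children.

```python
# Assumed representation (for illustration only):
# a leaf is a string, an interior node is a tuple of child subtrees.

def is_bifurcating(tree):
    """Return True if every interior node has exactly two children."""
    if isinstance(tree, str):      # a leaf has no descendants to check
        return True
    return len(tree) == 2 and all(is_bifurcating(child) for child in tree)

bifurcating = (("human", "chimp"), ("mouse", "rat"))
multifurcating = ("human", "chimp", "gorilla")  # one node with three descendants

print(is_bifurcating(bifurcating))     # True
print(is_bifurcating(multifurcating))  # False
```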
Unrooted tree
Doesn't show a common ancestor but does show relationships among taxa
Taxon
A formally named group represented by the leaves of a phylogenetic tree
Branch
Represents the persistence of a lineage through time; may subtend one or many leaves
Node
Represents the last common ancestor of the organisms at the tips of the descendant lineages
External branch
Connects a tip to a node
Internal branch
Connects two nodes
Cladogenesis
Branching on an evolutionary tree, where an ancestral lineage splits to give rise to two or more descendant lineages
Clade
A grouping on a tree that includes a node and all of the lineages descended from that node
Monophyletic clade
A taxon that includes all descendants of an inferred ancestor, characterized by one or more apomorphies (derived character states)
Paraphyletic group
An assemblage that is constructed by taking a clade and removing one or more smaller clades, characterized by one or more plesiomorphies (character states inherited from ancestors but not present in all of their descendants)
Polyphyletic group
An assemblage that is neither monophyletic nor paraphyletic, characterized by one or more homoplasies (character states which have converged or reverted so as to be the same but which have not been inherited from a common ancestor)
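Whether a set of taxa is monophyletic can be tested directly on a rooted tree: the set must match exactly the leaves below some node. The sketch below assumes a hypothetical nested-tuple tree (leaf = string, interior node = tuple of children) and a deliberately simplified amniote tree. It checks monophyly only and does not separate paraphyletic from polyphyletic groups.

```python
# Assumed representation (for illustration only):
# a leaf is a string, an interior node is a tuple of child subtrees.

def leaf_set(tree):
    """Return the set of leaf names below (and including) this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Yield the leaf set of every node; each node defines one clade."""
    yield leaf_set(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)

def is_monophyletic(tree, group):
    """True if `group` is exactly the set of leaves descended from some node."""
    return frozenset(group) in set(clades(tree))

# Simplified example tree: birds are nested inside the traditional "reptiles".
tree = (("lizard", ("crocodile", "bird")), "mammal")

print(is_monophyletic(tree, {"crocodile", "bird"}))            # True
print(is_monophyletic(tree, {"lizard", "crocodile"}))          # False: birds were removed
print(is_monophyletic(tree, {"lizard", "crocodile", "bird"}))  # True
```

The second group is what remains of a clade after removing a smaller clade (birds), which is the paraphyletic pattern described above.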
Phylogenetic inference is the practice of reconstructing the evolutionary history of related species by grouping them in successively more inclusive sets based on shared ancestry
Methods for constructing phylogenetic trees
Character-based methods
Distance methods
Character-based methods
Use the aligned characters, such as DNA or protein sequences, directly during tree inference
Maximum parsimony
An optimality criterion under which the phylogenetic tree that minimizes the total number of character-state changes is selected
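One common way to count character-state changes on a given tree is Fitch's algorithm for unordered characters. The sketch below is a minimal version under stated assumptions: the tree is a rooted, bifurcating nested tuple (leaf = string), the alignment is a small made-up example, and all function names are hypothetical. It scores three candidate topologies rather than searching tree space.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:                        # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one change

def parsimony_score(tree, alignment):
    """Sum the minimum number of changes over all alignment columns."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in alignment.items()})[1]
        for i in range(n_sites)
    )

# Made-up aligned sequences for four taxa.
alignment = {"A": "AAGT", "B": "AAGC", "C": "CTGT", "D": "CTGC"}
candidates = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
scores = {name: parsimony_score(t, alignment) for name, t in candidates.items()}
print(scores)                       # {'((A,B),(C,D))': 4, '((A,C),(B,D))': 5, '((A,D),(B,C))': 6}
print(min(scores, key=scores.get))  # the most parsimonious of the three
```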
Maximum likelihood
A method that evaluates trees under an explicit model of sequence evolution, allowing mutation rates to vary among lineages; it can be used to explore relationships among more diverse sequences and under conditions not well handled by maximum parsimony methods
Maximum Likelihood
An additional opportunity to evaluate trees with variations in mutation rates in different lineages
Maximum Likelihood (ML) in phylogeny
A method used to figure out the most likely evolutionary tree (phylogeny) based on genetic data
Calculates the likelihood of the observed sequences given a particular tree and model of evolution, then searches for the tree that maximizes this likelihood
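The likelihood of the data given a tree can be computed with Felsenstein's pruning algorithm. The sketch below is illustrative only and makes several assumptions not stated in these notes: the Jukes-Cantor (JC69) substitution model, fixed branch lengths, a made-up alignment, and a hypothetical tree encoding in which a leaf is a string and an interior node is a tuple of (child, branch length) pairs. A real ML analysis would also optimize branch lengths and model parameters.

```python
import math

BASES = "ACGT"

def jc69_prob(i, j, t):
    """JC69 probability that base i becomes base j over branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def partial(node, site):
    """P(observed tips below node | node has base x), for each base x."""
    if isinstance(node, str):
        return {x: 1.0 if site[node] == x else 0.0 for x in BASES}
    result = {x: 1.0 for x in BASES}
    for child, length in node:
        below = partial(child, site)
        for x in BASES:
            result[x] *= sum(jc69_prob(x, y, length) * below[y] for y in BASES)
    return result

def site_likelihood(tree, site):
    """Likelihood of one alignment column, with equal base frequencies at the root."""
    return sum(0.25 * p for p in partial(tree, site).values())

def log_likelihood(tree, alignment):
    """Sum of per-site log-likelihoods (sites assumed independent)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        math.log(site_likelihood(tree, {t: s[i] for t, s in alignment.items()}))
        for i in range(n_sites)
    )

alignment = {"A": "ACGT", "B": "ACGT", "C": "ACGA"}  # made-up data
tree_ab = (((("A", 0.1), ("B", 0.1)), 0.05), ("C", 0.2))  # A and B grouped
tree_ac = (((("A", 0.1), ("C", 0.1)), 0.05), ("B", 0.2))  # A and C grouped

print(log_likelihood(tree_ab, alignment))
print(log_likelihood(tree_ac, alignment))
```

With these inputs the tree that groups the identical sequences A and B has the higher log-likelihood, so it would be preferred between the two candidates.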
Distance-based method
Constructs phylogenetic trees from the distance (dissimilarity) between aligned sequences; the sequence data are first transformed into a matrix of pairwise distances, which is then used for tree building
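Turning aligned sequences into pairwise distances can be sketched as follows. Two distance measures are shown as examples, neither of which is prescribed by these notes: the p-distance (proportion of differing sites) and the Jukes-Cantor correction for multiple substitutions at the same site. The function names and the alignment are hypothetical.

```python
import math

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(a, b)
    if p >= 0.75:                 # the correction is undefined at saturation
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_matrix(alignment, dist=p_distance):
    """Return {(name1, name2): distance} for every ordered pair of taxa."""
    names = list(alignment)
    return {(m, n): dist(alignment[m], alignment[n]) for m in names for n in names}

alignment = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}  # made-up data
matrix = distance_matrix(alignment)
print(matrix[("A", "B")])   # 0.25
print(matrix[("A", "C")])   # 0.5
print(jc69_distance("ACGT", "ACGA"))
```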
Algorithms used in distance methods
UPGMA (Unweighted Pair Group Method using Arithmetic Average)
Neighbor-joining (NJ) method
UPGMA
Simplest algorithm for tree construction; repeatedly clusters the species that are most similar in their genetic data, so that the tree reflects these similarities. Assumes a constant rate of evolution across lineages
Steps in UPGMA
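1. Calculate pairwise distances between all species
2. Group the two most similar species (smallest distance) into a cluster
3. Calculate the average distance between the new cluster and every other species or cluster
4. Join the clusters with the smallest average distance
5. Repeat steps 3 and 4 until the complete tree is built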
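The steps above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it returns only the topology as nested tuples (no branch lengths), the input format and the name `upgma` are assumptions, and the distances are made up. The average in step 3 is weighted by cluster size, which is equivalent to averaging over all pairs of members.

```python
from itertools import combinations

def upgma(distances):
    """distances: {(a, b): d} for each unordered pair of taxa.
    Returns the clustering order as nested tuples."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    # Every taxon starts as its own cluster of size 1.
    sizes = {taxon: 1 for pair in distances for taxon in pair}
    while len(sizes) > 1:
        # Find the two clusters separated by the smallest (average) distance.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        # Average distance from the merged cluster to every other cluster.
        for c in sizes:
            if c != a and c != b:
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))

distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
print(upgma(distances))   # ('D', ('C', ('A', 'B')))
```

A and B are joined first, then C, then D, which matches the increasing distances in the input.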
Neighbor-Joining (NJ) method
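Iteratively builds the tree by joining the pair of species judged to be nearest neighbors, using genetic distances corrected for how far each species is from all the others. Aims to find the tree structure that best reflects the evolutionary relationships between species and, unlike UPGMA, does not assume a constant rate of evolution across lineages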
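A single joining step of neighbor joining can be sketched as below. The pair is chosen by the standard rate-corrected criterion Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k), not by the smallest raw distance. The function name `nj_step`, the input format, and the five-taxon distance matrix are illustrative assumptions. A full implementation would replace the joined pair with a new node and repeat.

```python
from itertools import combinations

def nj_step(taxa, distances):
    """Return the pair to join and the branch lengths from each to the new node."""
    d = {frozenset(pair): value for pair, value in distances.items()}
    n = len(taxa)
    # Total distance from each taxon to all the others.
    total = {x: sum(d[frozenset((x, y))] for y in taxa if y != x) for x in taxa}

    def q(pair):
        i, j = pair
        return (n - 2) * d[frozenset(pair)] - total[i] - total[j]

    i, j = min(combinations(taxa, 2), key=q)   # the pair with the lowest Q is joined
    d_ij = d[frozenset((i, j))]
    length_i = d_ij / 2 + (total[i] - total[j]) / (2 * (n - 2))
    length_j = d_ij - length_i
    return (i, j), (length_i, length_j)

taxa = ["a", "b", "c", "d", "e"]
distances = {
    ("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8,
    ("b", "c"): 10, ("b", "d"): 10, ("b", "e"): 9,
    ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3,
}
pair, lengths = nj_step(taxa, distances)
print(pair, lengths)   # ('a', 'b') (2.0, 3.0)
```

Here d and e have the smallest raw distance (3), yet a and b are joined first because the criterion accounts for how far each taxon is from all the others. The unequal branch lengths (2 and 3) show that the two lineages are not forced to have evolved at the same rate.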