Lab Bioinformatics Fly Lab
(Created by Benjamin Carone, Yong Chen and Marina Bogush)
A Hypothesis-Driven Molecular Phylogenetics Exercise in Drosophila
Learning Objectives:
Identify morphological differences between species and recognize the process of evolution driving these differences
Pre-lab:
Learn how to generate a phylogenetic tree by watching the video:
Section 1: Introduction to Phylogeny and Hypothesis Generation
Part I – Identify Morphological Differences in Fruit Flies
Part II – Generating a Hypothesis-driven Phylogeny
Part III – Identifying Drosophila Cytochrome b gene using Flybase.org
Part IV – Getting the cytochrome B gene sequence for the remaining 5 species using Flybase.org BLAST
Section 2: Performing DNA and Protein Alignments and Phylogenies
Part I – Perform a DNA alignment using “MUSCLE Alignment” (online program)
Part II – Generate a phylogeny using the MUSCLE output and compare to predicted phylogeny
Part III – Translate the first 500bp of D. melanogaster cytochrome b gene from DNA to protein (online program)
Part IV – Navigate to NCBI and perform a Protein BLAST
Part V – Identify a total of 7 different cytochrome b sequences from the student's choice of species from BLAST results and download into Microsoft Word
Part VI – Perform an amino acid alignment using “MUSCLE Alignment” (online program)
Part VII – Generate a phylogeny using the MUSCLE output and compare to DNA phylogeny and predictions
Section 1:
Hypothesis Generation
In this section of the lab, students will compare the traits of five species of fruit fly (Drosophila) and the common housefly, Musca domestica. To do so, students will use the Drosophila web resource flybase.org, which houses a wide selection of scientific tools, including an image database of fruit fly strains and genomic data for a variety of species. During the hypothesis generation portion of this laboratory, students will navigate to Flybase.org, identify traits, and form a hypothesis about the species' evolutionary relationships in the form of a predicted phylogenetic tree.
Geographic distribution of commonly studied Drosophila species
Figure S1. Habitat distributions of Drosophila species. The coloured regions indicate the climatic environment for each Drosophila species based on the rainfall, temperature and vegetation: red, yellow, green and purple roughly correspond to equatorial, arid, warm and snowy climates, respectively (Kottek et al. 2006).
Part I – Identifying Morphological Differences in Fruit Flies.
You will be using Flybase.org
Open a Chrome Browser window and navigate to flybase.org.
Click the “Species” drop-down menu at the top and select “Color Illustrations” (you can also reach the page directly at https://flybase.org/static/Drosophilidae ). Click “Illustrations” at the bottom to expand it.
Find the species listed in the table below, compare them, and fill in the Morphological Character section of the table with your observations. (There is no image of M. domestica on this page; you can find one at https://en.wikipedia.org/wiki/Housefly )
Using Figure S1, fill in the Geographical Distribution column (Wet, Dry, or Temperate habitat). Note: D. melanogaster is found in North America (Philly).
Fly Species | Geographical Distribution: Habitat (Wet/Dry/Temperate) | Morphological Character: Eye Color (Red/White) | Morphological Character: Thorax (Long/Short) | Morphological Character: Body Color (Light/Dark & Striped/Plain)
D. ananassae (Fruit Fly) | | | |
D. melanogaster (Fruit Fly) | | | |
D. virilis (Fruit Fly) | | | |
D. mojavensis (Fruit Fly) | | | |
D. willistoni (Fruit Fly) | | | |
M. domestica (House Fly) | | | |
Part II – Generating a hypothesis-driven phylogeny.
Some organisms share common features and characteristics. The more characteristics two organisms share, the more closely related they are likely to be. When comparing multiple organisms and trying to identify their evolutionary relationships, we often draw a “tree” depicting those relationships. This “tree” is called a phylogeny. In this part of the lab, you will individually create your own phylogeny based on the traits that you recorded in Part I of this lab.
Example phylogenies:
Using your data above, create your own individual phylogeny with all six species below:
Hint:
First, calculate the similarity of each pair of species by counting how many features they have in common, and fill in the similarity table below.
For example, D. willistoni and M. domestica have 3 common features (habitat, eye color, and thorax):
D. willistoni: Temperate, red, short, light, striped
M. domestica: Temperate, red, short, dark, plain
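The counting in this example can be sketched in a few lines of Python. This is only an illustration: the tuple layout (habitat, eye color, thorax, body color, pattern) and the function name `shared_features` are assumptions made for the sketch, and the trait values are the ones from the example above.

```python
# Trait tuples in the assumed order:
# (habitat, eye color, thorax, body color, body pattern)
TRAITS = {
    "D. willistoni": ("Temperate", "red", "short", "light", "striped"),
    "M. domestica": ("Temperate", "red", "short", "dark", "plain"),
}


def shared_features(a, b):
    """Count the positions at which two trait tuples agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


score = shared_features(TRAITS["D. willistoni"], TRAITS["M. domestica"])
print(score)  # 3
```

Repeating the call for every pair of species would fill in one cell of the similarity table per pair.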
Similarity table (the maximum is denoted in red)
 | D. ananassae (Fruit Fly) | D. melanogaster (Fruit Fly) | D. virilis (Fruit Fly) | D. mojavensis (Fruit Fly) | D. willistoni (Fruit Fly) | M. domestica (House Fly)
D. ananassae (Fruit Fly) | | | | | |
D. melanogaster (Fruit Fly) | | | | | |
D. virilis (Fruit Fly) | | | | | |
D. mojavensis (Fruit Fly) | | | | | |
D. willistoni (Fruit Fly) | | | | | |
M. domestica (House Fly) | | | | | |
Second: iteratively connect the two species (or clusters) that share the largest number of common features. If several pairs tie for the largest number, you can pick one of them at random.
Third: once species a and b have been merged, treat them as one cluster and update its similarity to every other species or cluster by calculating the mean similarity between the elements of the two clusters. Then return to the second step.
For example, suppose clusters (a,b) and (c,d) have been formed, and the similarities between their elements are s(a,c), s(a,d), s(b,c), and s(b,d). The similarity between (a,b) and (c,d) is the mean similarity of their elements: (s(a,c)+s(a,d)+s(b,c)+s(b,d))/4.
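The merge-and-average procedure described in this hint can be sketched in Python as follows. It is a minimal illustration, not a replacement for drawing the tree by hand: the function names, the nested-tuple tree representation, and the four-species similarity scores are all made up for the example, and ties are broken by taking the first best pair rather than a random one.

```python
from itertools import combinations


def flatten(cluster):
    """Return the species names contained in a (possibly nested) cluster."""
    if isinstance(cluster, str):
        return [cluster]
    return flatten(cluster[0]) + flatten(cluster[1])


def mean_similarity(c1, c2, sim):
    """Mean pairwise similarity between the members of two clusters."""
    total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def build_tree(species, sim):
    """Repeatedly merge the two most similar clusters until one remains."""
    clusters = list(species)
    while len(clusters) > 1:
        best = max(
            combinations(clusters, 2),
            key=lambda p: mean_similarity(flatten(p[0]), flatten(p[1]), sim),
        )
        clusters = [c for c in clusters if c not in best]
        clusters.append((best[0], best[1]))
    return clusters[0]


# Hypothetical similarity scores for four species a, b, c, d.
sim = {
    frozenset(("a", "b")): 4, frozenset(("a", "c")): 1,
    frozenset(("a", "d")): 2, frozenset(("b", "c")): 2,
    frozenset(("b", "d")): 1, frozenset(("c", "d")): 3,
}
tree = build_tree(["a", "b", "c", "d"], sim)
print(tree)  # (('a', 'b'), ('c', 'd'))
print(mean_similarity(["a", "b"], ["c", "d"], sim))  # 1.5
```

Here a and b are merged first (similarity 4), then c and d (similarity 3), and the two clusters are joined last with a mean similarity of (1+2+2+1)/4 = 1.5.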
Please draw your tree below (if several similarity values are equal, choosing different pairs may lead to different trees).
Part III – Identifying Drosophila cytochrome B gene using Flybase.org
Finding genes (Individual)
Open a Chrome Browser window and navigate to flybase.org.
In the top right corner, find the “Jump to gene” input box, type in “cytochrome b”, and hit Go.
Under Genomic Location, click on the icon “Get decorated FASTA file”
Copy and paste this sequence into a Microsoft Word document by highlighting the sequence and selecting Copy under the Chrome browser Edit tab. It should look like this:
>D.melanogaster/mitochondrion_genome:10499..11635
ATGAATAAACCTTTACGAAATTCCCATCCTCTATTTAAAATTGCCAATAATGCTTTAGTA
GATTTACCAGCTCCAATTAATATTTCAAGATGATGAAATTTTGGATCATTACTTGGATTA
TGTTTAATTATTCAAATTTTAACCGGATTATTTTTAGCTATACATTACACAGCTGATATT
AATCTAGCTTTCTATAGTGTTAATCATATTTGTCGAGACGTTAATTATGGTTGATTATTA
CGAACTTTACATGCTAACGGTGCATCATTTTTTTTTATTTGTATTTACTTACATGTAGGA
CGAGGAATTTATTACGGTTCATATAAATTTACTCCAACTTGATTAATTGGAGTAATTATT
TTATTTTTAGTAATAGGAACAGCTTTTATAGGATACGTATTACCTTGAGGACAAATATCA
TTTTGAGGAGCTACTGTAATTACTAATTTATTATCAGCTATCCCTTACTTAGGTATAGAT
TTAGTTCAATGATTATGAGGTGGATTTGCTGTTGATAATGCCACTTTAACTCGATTTTTT
ACATTCCATTTTATTTTACCTTTTATTGTTCTTGCTATAACTATAATTCATTTATTATTC
CTTCATCAAACAGGATCTAATAATCCTATCGGATTAAATTCTAATATTGATAAAATTCCT
TTTCATCCTTATTTTACATTTAAAGATATTGTAGGATTTATTGTAATAATTTTTATTTTA
ATTTCATTAGTATTAATTAGACCAAATTTATTGGGAGACCCTGATAATTTTATTCCAGCA
AATCCTTTAGTAACACCTGCCCATATTCAACCAGAATGATATTTTTTATTTGCTTATGCT
ATTTTACGATCTATTCCAAATAAATTAGGAGGAGTTATTGCATTAGTTTTATCAATTGCA
ATTTTAATAATCCTTCCTTTTTATAATTTAAGAAAATTCCGAGGGATTCAATTTTATCCT
ATTAATCAAGTAATATTCTGATCTATATTAGTAACAGTAATTTTATTAACTTGAATTGGA
GCTCGACCAGTTGAAGAACCTTATGTATTAATTGGACAAATTCTAACTGTTGTATATTTC
TTATATTATTTAGTAAACCCATTAATTACAAAATGATGAGATAATTTATTAAATTAA
Nice! You now have the cytochrome B DNA sequence for Drosophila melanogaster. As you can tell from the header, this gene is located on the mitochondrial genome, and its position within the mitochondrial genome is 10499 bp to 11635 bp.
Note: This sequence is in the format called FASTA. This is a common way for geneticists to display and work with DNA and protein sequences. You will have to work with various DNA and protein sequences in this laboratory often in FASTA format. The defining features of this format are:
1) Each new sequence name begins with a “>”
2) The sequence itself starts on a new line, separated from the name by a hard return (Enter key).
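For readers who want to handle FASTA text outside of Word, the two rules above can be sketched as a small Python reader. This is an illustrative sketch only: `parse_fasta` and the record names `seq1` and `seq2` are hypothetical, and the short sequences are placeholders rather than real data.

```python
def parse_fasta(text):
    """Return a dict mapping each sequence name to its sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Rule 1: a line beginning with ">" names a new sequence.
            name = line[1:]
            records[name] = ""
        elif name is not None:
            # Rule 2: following lines hold the sequence; drop any spaces.
            records[name] += "".join(line.split())
    return records


example = """>seq1
ATGAATAAACCT TTACGAAATTCC
CATCCT
>seq2
ATGAACAAACCT
"""
records = parse_fasta(example)
print(records["seq1"])       # ATGAATAAACCTTTACGAAATTCCCATCCT
print(records["seq1"][:10])  # ATGAATAAAC
```

Slicing with `[:500]` instead of `[:10]` would give the first 500 bases of a full-length sequence.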
Part IV – Getting the cytochrome B gene sequence for the remaining 5 species using Flybase.org BLAST
Select only the first 500 bp of the D. melanogaster cytochrome b gene. (Hint: Microsoft Word has a Word Count tool under the Tools tab that reports the number of characters highlighted.)
Open a Chrome Browser window and navigate to flybase.org.
Click on the “BLAST” icon.
Click in the empty box and paste your 500 bp of cytochrome b gene sequence into it.
Only select the species that we are going to look at today (Table 1) by making sure there is a check mark in the box next to that species.
Click on the BLAST button
A “Graphic Summary” of your results should look like this:
Click each red band, and you will be navigated to its alignment result.
You will now need to identify each species' DNA sequence, download it, and organize the sequences in a Word document. For each matching species, record the Score and Identities values in the table below. If a species has several matched segments, select the one with the largest score. Click on the “Subject FASTA” button to get the actual matching sequence for that species. You will get a result like this.
Fly Species | Score | Identities
D. ananassae (Fruit Fly) | |
D. melanogaster (Fruit Fly) | |
D. virilis (Fruit Fly) | |
D. mojavensis (Fruit Fly) | |
D. willistoni (Fruit Fly) | |
M. domestica (House Fly) | |
Does the percent identity match with your predicted similarities based on morphological features?
Section 2: Performing DNA and Protein Alignments and Phylogenies
Part I – Perform a DNA sequence alignment
Now that you have collected the DNA sequences for the cytochrome B gene from 6 closely related species, it is time to use DNA similarities to identify their evolutionary relationships. The simplest way to accomplish this is to use a program that first aligns the sequences so that corresponding positions line up, then counts the positions at which the sequences agree or differ, and from those counts computes the most likely relationship between the organisms.
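The counting step can be illustrated with a short Python sketch that scores two sequences that are already aligned. It does not perform the alignment itself (MUSCLE does that), and the function name `percent_identity` and the choice to skip columns containing a gap are assumptions made for this illustration. The two inputs are the first 20 bases of the D_mel and D_ana example sequences shown in step 1.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned, gap-free columns at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


d_mel = "ATGAATAAACCTTTACGAAA"  # first 20 bases of the D_mel example
d_ana = "ATGAATAAACCTTTACGAAC"  # first 20 bases of the D_ana example
print(percent_identity(d_mel, d_ana))  # 95.0
```

The two fragments differ only at the last position, so 19 of 20 positions match.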
1. Format your cytochrome B sequences from Day 1 into a FASTA format (it’s OK for now if they aren’t the same length).
Example:
>D_mel
ATGAATAAACCTTTACGAAATTCCCATCCTCTATTTAAAATTGCCAATAATGCTTTAGTAGATTTACCAGCTCCAATTAATATTTCAAGATGATGAAATTTTGGATCATTACTTGGATTATGTTTAATTATTCAAATTTTAACCGGATTATTTTTAGCTATACATTACACAGCTGATATTAATCTAGCTTTCTATAGTGTTAATCATATTTGTCGAGACGTTAATTATGGTTGATTATTACGAACTTTACATGCTAACGGTGCATCATTTTTTTTTATTTGTATTTACTTACATGTAGGACGAGGAATTTATTACGGTTCATATAAATTTACTCCAACTTGATTAATTGGAGTAATTATTTTATTTTTAGTAATAGGAACAGCTTTTATAGGATACGTATTACCTTGAGGACAAATATCATTTTGAGGAGCTACTGTAATTACTAATTTATTATCAGCTATCCCTTACTTAGGTATAGATTTAGTTCAATGATTATGAGG
>D_ana
ATGAATAAACCTTTACGAACTTCCCACCCATTATTTAAAATTGCCAATAACGCATTAGTAGATTTACCAGCTCCTATTAATATTTCAAGATGATAAAACTTTGGATCATTATTAGGATTATGTTTAATTATTCAAATTTTAACAGGATTATTTTTAGCTATACATTATACAGCAGATGTAAATTTAGCTTTTTATAGAGTAAATCATATTTGTCGTAATGTAAATTACGGATTATTATTACGAACTCTACACGCTAATGGTGCATCATTTTTCTTTATTTGTATTTATTTACATGTAGGACGAGGAATATATTATGGTTCATATTTATTTACTCCTACATCATTAGTTCGAGTAATTATTCTATTTTTAGTTATAGGAACTGCTTTTATAGGATATGTTCTTCCTTGAGGACAAATATCATTTTGAGGAGCAACAGTAATTACAAACCTATTATCAGTCATTCCTAAGTTAGGAATAGATTTAGTACAATGAGTATGAGG
>Vir
ATGAACAAACCTTTACGAACCTCCCACCCTTTATTTAAAATTGCTAATAATGCTTTAGTTGATCTTCCTGCACCTGTTAA
TATTTCAAGATGATGAAATTTTGGATCTTTATTAGGATTATGTTTAATTATCCAAATTTTAACTGGATTATTTTTAGCTA
TACACTACACCGCAGATGTAAATTTAGCTTTTAATAGAGTTAATCATATTTGTCGTGATGTAAATTACGGATGATTATTA
CGAACAATACACGCTAACGGTGCTTCTTTTTTCTTTATTTGTATTTATTTACATGTAGGACGAGGAATTTACTACGGATC
TTATTTATTTACACCTACATGAATAATTGGAGTAATTATTTTATTTTTAGTAATAGGAACTGCTTTTATAGGTTATGTAT
TACCATGAGGACAAATATCTTTTTGAGGAGCAACAGTTATTACAAATTTATTATCAGC
2. Open a Chrome browser and navigate to:
https://www.ebi.ac.uk/Tools/msa/muscle/
3. Highlight all 6 of your FASTA sequences in your Word document, copy (Control-C), and paste into the Input Sequences box.
4. Make sure your output parameters are set to ClustalW, and submit your job.
5. The job should process very quickly. You should be able to click on the generated link within 30 seconds to view your results. The alignment results will look like the example below. A * marks a position that is identical in all six sequences. A ‘-’ marks a gap, e.g. a deletion.
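How the * line relates to the aligned rows can be sketched in Python. This is a simplified illustration, not the program's actual code: the function name `conservation_line` and the three short aligned rows are made up, and the sketch only marks fully identical, gap-free columns.

```python
def conservation_line(aligned):
    """Return '*' for columns where all sequences agree, otherwise ' '."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        # A column is conserved if every row has the same non-gap character.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# Hypothetical aligned rows; '-' marks a gap.
rows = ["ATGAAT-AAC", "ATGAAC-AAC", "ATGAATAAAC"]
for row in rows:
    print(row)
print(conservation_line(rows))
```

In this toy alignment the sixth column is a substitution and the seventh contains gaps, so neither receives a *.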
Part II – Generate a DNA sequence-based phylogeny and compare to your previously generated prediction.
1. Navigate back to your main results page ‘Results viewers’.
2. Click on the Send to Simple Phylogeny Button (bottom)
3. Leave all settings as default, except change Clustering Method to UPGMA, click on “Be notified by email” and submit.
Note: UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is a simple clustering method that assumes a constant rate of evolution. The UPGMA algorithm constructs a rooted tree (dendrogram or cladogram) that reflects the structure present in a pairwise similarity matrix (or a dissimilarity matrix). In the UPGMA tree, the numbers indicate evolutionary distances calculated from a distance matrix (usually generated from real sequence data). A smaller number indicates a shorter evolutionary distance (higher similarity) to the inferred ancestor (https://en.wikipedia.org/wiki/UPGMA).
4. View and draw both phylograms below. For the diagram with real branch lengths, your image will look compressed; do your best to draw the image as it is depicted (or copy and insert the figures below).
Branch length – Cladogram:
Branch length – Real:
Part III – Translate the DNA sequence of the cytochrome B gene from D. melanogaster into the Amino Acid sequence
Now that we have identified the relationships among different species of fruit flies, as well as their relationship to the common house fly, it is interesting to see how the sequence of this gene relates to more distantly related species. If you tried to BLAST the DNA sequence of the cytochrome B gene against human DNA sequence, you would likely not see a match. However, if you BLAST the protein sequence, you are much more likely to see a match. If you aren’t sure why by this point in the lab/semester, ask your instructor about degenerate bases in a codon. The goal of this next activity is to convert your D. melanogaster DNA sequence into a protein sequence and perform a protein-specific BLAST.
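Codon degeneracy can be illustrated with a small Python translation sketch. It assumes the Standard genetic code, and the names `CODON_TABLE` and `translate` are hypothetical helpers written for this example. The two short DNA strings are made up to show that sequences differing at three of six bases can still encode the same amino acids.

```python
BASES = "TCAG"
# Standard genetic code, with codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: STANDARD[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; '*' is a stop, 'X' an unrecognized codon."""
    dna = "".join(dna.upper().split())
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


print(translate("TTAGTA"))  # LV
print(translate("CTGGTC"))  # LV
print(translate("ATGAATAAACCT"))  # MNKP
```

The last call translates the first 12 bases of the D. melanogaster sequence. Because several codons map to one amino acid, protein sequences tend to stay recognizable over evolutionary distances at which the DNA sequences no longer match.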
1. Open your Word file with your DNA sequences for cytochrome B.
2. You will be using a program called EMBOSS Transeq to convert your DNA sequence into an amino acid sequence. Navigate to the following website:
https://www.ebi.ac.uk/Tools/st/emboss_transeq/
3. Copy and paste the first 500 bp of your D. melanogaster sequence into the Input space.
4. Leave output parameters set to default Frame 1 and Standard code, click the notify by email box, and submit your job.
5. Copy and paste your amino acid sequence into your DNA sequence Word document under the new heading “Amino acid sequences”.
Your results will look like the following.
>EMBOSS_001_1
MNKPLRNSHPLFKIANNALVDLPAPINISR**NFGSLLGLCLIIQILTGLFLAIHYTADI NLAFYSVNHICRDVNYG*LLRTLHANGASFFFICIYLHVGRGIYYGSYKFTPT*LIGVII LFLVIGTAFIGYVLP*GQISF*GATVITNLLSAIPYLGIDLVQ*L*G
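6. The * characters in the Transeq output are not methionines. They mark codons that the Standard code reads as stop codons. In the D. melanogaster mitochondrial genome, the codon TGA encodes tryptophan (W) rather than a stop, which is why * appears in the middle of the protein. (A few other codons, such as ATA and AGA, are also read differently in the invertebrate mitochondrial code, but they do not produce a * character.) Before proceeding to the next step, manually replace every * with a W in your Word document, or you will not be able to perform a protein BLAST.
>EMBOSS_001_1
MNKPLRNSHPLFKIANNALVDLPAPINISRWWNFGSLLGLCLIIQILTGLFLAIHYTADI NLAFYSVNHICRDVNYGWLLRTLHANGASFFFICIYLHVGRGIYYGSYKFTPTWLIGVII LFLVIGTAFIGYVLPWGQISFWGATVITNLLSAIPYLGIDLVQWLWG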
Part IV – Perform a Protein BLAST of your D. melanogaster cytochrome B amino acid sequence and collect amino acid sequences from five additional species.
1. Open your Word file with your amino acid sequence for D. melanogaster cytochrome B.
2. You will be using a program called Protein Blast to identify related cytochrome b genes in other species. Navigate to the following website:
https://blast.ncbi.nlm.nih.gov/Blast.cgi
Click on the Protein Blast tab
3. Copy and paste your D. melanogaster amino acid sequence into the Input space.
4. Under the Choose Search Set – Database tab, select “Model Organisms”.
Submit your job by clicking on the BLAST button.
5. Your results should load in about 30 seconds. DO NOT NAVIGATE AWAY FROM THIS PAGE while it is refreshing. Your results will look like the following figure.
You will now collect amino acid sequences from five species of your choice
Homo sapiens (human)
Danio rerio (zebrafish)
Mus musculus (house mouse)
Schizosaccharomyces pombe (fission yeast)
Saccharomyces cerevisiae S288C (budding yeast)
6. To collect an amino acid sequence, scroll down to your organism of choice. Click on Download (FASTA – Aligned Sequences). This should download the FASTA sequence into a file called seqdump.txt. If you click on this file, it will open in a text editing program as your FASTA sequence. Copy and paste it into your Word document and repeat for each of your five species, so that your document holds six sequences in total.
Part V – Perform a MUSCLE alignment and UPGMA phylogeny of your D. melanogaster cytochrome B amino acid sequence and the five other organisms you have collected.
Generate phylogeny based on amino acid sequences of cytochrome B gene.
1. Open your Word file with your amino acid sequences for all 6 cytochrome B proteins.
2. Make sure that your sequences are in FASTA format, similar to the format you used for your collection of cytochrome B DNA sequences (the sequences can be found in the Appendix).
Example
>Name
AMINOACIDSEQUENCEHERE
>Name2
SECONDAMINOACIDSEQUENCEHERE
3. Open a Chrome browser and navigate to:
https://www.ebi.ac.uk/Tools/msa/muscle/
4. Highlight all 6 of your FASTA sequences in your Word document, copy (Control-C), and paste into the Input Sequences box.
5. Make sure your output parameters are set to ClustalW, click the “notify by email” box, and submit your job.
6. The job should process very quickly. You should be able to click on the generated link within 30 seconds to view your results.
7. Once the alignment is complete, click on the Send to Simple Phylogeny Button.
8. Leave all settings as default, except change Clustering Method to UPGMA, click on “Be notified by email” and submit.
9. Draw (copy and insert) both Phylograms below
Branch length – Cladogram:
Branch length – Real: