BIO 151 – Sp2020
Exercise #2A – Pairwise Sequence Alignment – NCBI
Resources:
www.ncbi.nlm.nih.gov/BLAST/
https://www.ncbi.nlm.nih.gov/gquery/
Question 1) Michael Crichton’s fantasy about cloning dinosaurs, Jurassic Park, contains a putative dinosaur DNA sequence. Use nucleotide-nucleotide BLAST against the “Nucleotide collection” database to identify the real source of the following sequence:
>DinoDNA “Dinosaur DNA” from Crichton’s JURASSIC PARK p. 103 nt 1-1200
GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGCGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCGTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGCTGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTGCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAAAGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAGATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACTCCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCTGGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGGCCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAACGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCGCACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAACAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGCTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCCGCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGGCCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGGCCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT
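NOTE: Before pasting a long sequence into the BLAST query box, it can help to confirm that the FASTA record was copied cleanly. The following is a minimal Python sketch of such a check, not part of the required exercise. The function names (read_fasta, unexpected_characters) and the toy record are made up for illustration; the sketch assumes plain FASTA text with a ">" header line followed by sequence lines.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]          # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


def unexpected_characters(seq):
    """Return the set of characters that are not A, C, G or T."""
    return set(seq) - set("ACGT")


# Toy record (hypothetical; NOT the sequence from the question).
example = ">toy example\nGCGT\ntgct\n"
for name, seq in read_fasta(example).items():
    print(name, len(seq), unexpected_characters(seq))
```

To check the sequence from the question, paste the header and sequence lines into a text file and pass the file contents to read_fasta.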
Question 2) Mark Boguski of the NCBI noticed this and supplied Crichton with a better sequence for the sequel, The Lost World. (Q2A) Identify the most likely source of this sequence using nucleotide-nucleotide BLAST. Give the genus/species and the common name (Google the genus/species) of this organism.
>DinoDNA “Dinosaur DNA” from Crichton’s THE LOST WORLD p. 135
GAATTCCGGAAGCGAGCAAGAGATAAGTCCTGGCATCAGATACAGTTGGAGATAAGGACGGACGTGTGGCAGCTCCCGCAGAGGATTCACTGGAAGTGCATTACCTATCCCATGGGAGCCATGGAGTTCGTGGCGCTGGGGGGGCCGGATGCGGGCTCCCCCACTCCGTTCCCTGATGAAGCCGGAGCCTTCCTGGGGCTGGGGGGGGGCGAGAGGACGGAGGCGGGGGGGCTGCTGGCCTCCTACCCCCCCTCAGGCCGCGTGTCCCTGGTGCCGTGGGCAGACACGGGTACTTTGGGGACCCCCCAGTGGGTGCCGCCCGCCACCCAAATGGAGCCCCCCCACTACCTGGAGCTGCTGCAACCCCCCCGGGGCAGCCCCCCCCATCCCTCCTCCGGGCCCCTACTGCCACTCAGCAGCGGGCCCCCACCCTGCGAGGCCCGTGAGTGCGTCATGGCCAGGAAGAACTGCGGAGCGACGGCAACGCCGCTGTGGCGCCGGGACGGCACCGGGCATTACCTGTGCAACTGGGCCTCAGCCTGCGGGCTCTACCACCGCCTCAACGGCCAGAACCGCCCGCTCATCCGCCCCAAAAAGCGCCTGCTGGTGAGTAAGCGCGCAGGCACAGTGTGCAGCCACGAGCGTGAAAACTGCCAGACATCCACCACCACTCTGTGGCGTCGCAGCCCCATGGGGGACCCCGTCTGCAACAACATTCACGCCTGCGGCCTCTACTACAAACTGCACCAAGTGAACCGCCCCCTCACGATGCGCAAAGACGGAATCCAAACCCGAAACCGCAAAGTTTCCTCCAAGGGTAAAAAGCGGCGCCCCCCGGGGGGGGGAAACCCCTCCGCCACCGCGGGAGGGGGCGCTCCTATGGGGGGAGGGGGGGACCCCTCTATGCCCCCCCCGCCGCCCCCCCCGGCCGCCGCCCCCCCTCAAAGCGACGCTCTGTACGCTCTCGGCCCCGTGGTCCTTTCGGGCCATTTTCTGCCCTTTGGAAACTCCGGAGGGTTTTTTGGGGGGGGGGCGGGGGGTTACACGGCCCCCCCGGGGCTGAGCCCGCAGATTTAAATAATAACTCTGACGTGGGCAAGTGGGCCTTGCTGAGAAGACAGTGTAACATAATAATTTGCACCTCGGCAATTGCAGAGGGTCGATCTCCACTTTGGACACAACAGGGCTACTCGGTAGGACCAGATAAGCACTTTGCTCCCTGGACTGAAAAAGAAAGGATTTATCTGTTTGCTTCTTGCTGACAAATCCCTGTGAAAGGTAAAAGTCGGACACAGCAATCGATTATTTCTCGCCTGTGTGAAATTACTGTGAATATTGTAAATATATATATATATATATATATATCTGTATAGAACAGCCTCGGAGGCGGCATGGACCCAGCGTAGATCATGCTGGATTTGTACTGCCGGAATTC
(Q2B) What is the E-value for this match? Is this good or bad?
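NOTE: As background for interpreting E-values, the sketch below uses the standard Karlin-Altschul relationship E = m × n × 2^(−S'), where S' is the bit score, m the query length, and n the database length. It is a simplified illustration: BLAST itself uses corrected "effective" lengths, so the numbers it reports will not match this formula exactly. The function name and all numbers are hypothetical.

```python
def expect_value(bit_score, query_len, db_len):
    """Approximate E-value: E = m * n * 2**(-S'), with S' the bit score."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical numbers, chosen only to show the trend:
# a higher bit score gives a smaller E-value for the same search space.
for score in (20, 40, 80):
    print(score, expect_value(score, query_len=1000, db_len=10**9))
```

The E-value estimates how many alignments with at least this score would be expected by chance in a search of this size, so smaller values indicate a match that is less likely to be a chance hit.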
(Q2C) Based on what you know (or can find out) about dinosaur and bird evolution, briefly say why this second sequence would be a better choice than the first.
Question 3) Cytokines are the molecular messengers of the vertebrate immune system, coordinating the local and systemic immune responses to infective organisms. One homologue of the human cytokine macrophage migration inhibitory factor (MIF) has been isolated from the parasitic nematode Brugia malayi.
(Q3A) What is the accession number for the human MIF gene (see instructions)? Give the latest version.
(Q3B) Perform a nucleotide BLAST search (see instructions). What are the results?
(Q3C) Perform a blastx protein search (see instructions). How many “hits,” if any, are obtained? Give the gene name and protein accession number for any hit (since you are searching protein databases, any hits are for the protein sequence, not the nucleotide sequence). You will need these protein accession numbers and sequences later.
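NOTE: blastx translates the nucleotide query in all six reading frames (three on each strand) and compares the translations against a protein database. The sketch below illustrates only the six-frame translation step, using the standard genetic code. It is not how BLAST is implemented; the function names and the toy sequence are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(seq):
    """Return translations for frames +1, +2, +3, -1, -2, -3."""
    frames = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(s[offset:])
    return frames


# Toy sequence (hypothetical), not taken from the MIF gene.
for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```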
(Q3D) Compare the data given for the first 2 hits (Bm-MIF-1 and macrophage migration….): Max Score; Total Score; E-value, etc (NOTE: Acc Len is the length of the sequence [in amino acids]). What do you see? What might be your conclusion?
(Q3E) Give (copy/paste) the amino acid sequence for all three proteins on the list. To find the protein sequences, click on the accession number for the individual protein. This will take you to the gene page for the protein. Scroll down for the sequence. (for question #4, you will also need the amino acid sequences for these proteins)
(Q3F) Compare the E-values for the hits found here with the E-value for the Dino DNA from question #2. What does it tell you about the homology between human MIF and the B. malayi proteins as compared to the homology between the “dino” DNA and its match?
Question 4) Next, perform pairwise alignments (see instructions) of the first two proteins described in Question 3 (Bm-MIF-1 and macrophage migration). (Q4A) What difference is found between the two proteins? (Q4B) How do the E-values compare with the two previous alignments done with the Dino DNA?
Exercise #2B – Pairwise Sequence Alignment – EMBL
Resources:
EMBOSS Matcher ( https://www.ebi.ac.uk/Tools/psa/emboss_matcher/ )
EMBOSS Needle ( https://www.ebi.ac.uk/Tools/psa/emboss_needle/ )
Question 5) We will now align the sequences for human MIF and its nematode homolog using a different set of programs. Rather than NCBI-BLAST, we’ll be using EMBL-EMBOSS.
EMBOSS Matcher ( https://www.ebi.ac.uk/Tools/psa/emboss_matcher/ ), which does local alignments
EMBOSS Needle ( https://www.ebi.ac.uk/Tools/psa/emboss_needle/ ), which does global alignments
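NOTE: To make the local/global distinction concrete, here is a minimal Python sketch of the two scoring schemes (global in the style of Needleman-Wunsch, local in the style of Smith-Waterman). It is not the EMBOSS implementation: the scoring values (match +1, mismatch −1, gap −1) are assumptions chosen for simplicity, whereas the EMBOSS tools use substitution matrices and separate gap-open/gap-extend penalties. The function name and toy sequences are hypothetical.

```python
def alignment_score(a, b, local=False, match=1, mismatch=-1, gap=-1):
    """Best alignment score of a and b by dynamic programming.

    local=False: global alignment, both sequences aligned end to end.
    local=True:  local alignment, best-scoring pair of subsequences.
    """
    # First row: leading gaps cost nothing in a local alignment.
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)   # a local alignment may restart anywhere
                best = max(best, score)
            cur.append(score)
        prev = cur
    return best if local else prev[-1]


# Toy sequences (hypothetical): the short one matches the end of the long one.
print("global:", alignment_score("AAAACGT", "ACGT"))
print("local: ", alignment_score("AAAACGT", "ACGT", local=True))
```

In this toy case the global score is pulled down by the gaps needed to span the whole longer sequence, while the local score reflects only the matching region.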
For both of these you need to use a sequence, not an accession number. You could copy/paste the entire gene sequence into the box for either of these programs. However, since that would include many non-coding sequences, we’ll use the mRNA/coding sequence for our comparison instead. To do so, go to the human MIF gene page in one of two ways: (1) Type/paste the name of the gene (use the same gene name copied from Exercise 2A) into the search engine on the NCBI home page. (2) Type/paste the accession number into the search engine on the NCBI home page. This will take you to a “database” page with the title “Homo sapiens macrophage…….RefSeqGene on chromosome 22.” Clicking on that title will take you to the same MIF gene page as option 1.
On this page, as before, the entire gene sequence is located at the bottom. If you scroll down and look at the sequence, the numbers on the left give the position of the nucleotides in the sequence; these can be used to locate specific sequences. Along the left side of the page you will also see a series of annotations about the gene, which we briefly discussed before. Clicking on an annotation highlights the corresponding region on the sequence at the bottom of the page. During lecture you were shown that clicking on “mRNA” highlights where the mRNA sequence is located on the gene sequence. To the right of each annotation are subheadings with further details.
NOTE: To answer some of the questions you will need to copy/paste the entire mRNA sequence and the coding sequence (start to stop codon). Comments A-D below will take you to various pages where this information is available. It will make answering the questions easier if you copy/save the mRNA and protein sequences as you go through the pages rather than having to go back to them later.
A. Clicking on the first “gene”, for instance, will not give you your MIF gene but an MIF antisense gene transcript. (In the subheadings/wording next to “gene” you’ll see “complement…note=”antisense RNA 1.”) This sequence is highlighted on the 5’→3’ gene sequence at the bottom of the page (scroll down).
B. Clicking on the 2nd “gene” will give you the mRNA sequence for the MIF gene (this is the region copied to give you the mRNA). To the right of “gene” you will see subheadings/wording that say “5006..5205” and “gene=MIF.” In this case, this is the sequence of the mRNA (primary transcript) including exons, introns and 3’ and 5’ untranslated regions.
C. Clicking on “mRNA” will give you the exons and introns as they would appear on the mRNA (scroll down to sequence). The first highlighted base is the first base in the mRNA (this is the start of exon 1) and the last highlighted base is the last base in the mRNA (this is the end of exon 3). To copy the mRNA sequence, you can (1) copy from the beginning to the end of these highlighted blocks OR (2) on the right of this page, find the heading “Reference sequence information” with the hyperlink “RefSeq mRNA.” Clicking on that will take you to the NCBI page for the mRNA, and scrolling down will give you the mRNA sequence, which you can copy. Under the mRNA are several subheadings that are hyperlinks to a number of different resources: (1) db_xref=”GeneID:4282.” Clicking on that link will take you to a “gene” page in NCBI that contains interesting information on that gene and its gene product. (2) db_xref=”HGNC:HGNC:7097” will take you to the HUGO Gene Nomenclature Committee data resource page for MIF. (3) db_xref=”MIM:153602” will link you to the Online Mendelian Inheritance in Man data resource page for MIF.
D. Clicking on “CDS” will give you the codons for the protein. This will be inside the first and last exons since there are 5’ and 3’ untranslated regions on the mRNA (see the next page for a little figure attempting to show this). In the subheadings/wording under CDS you will see the same hyperlinks that you saw under mRNA. You will also see additional hyperlinks. (1) protein_id=”NP_002406.1” – this will take you to the NCBI protein page where you’ll find the amino acid sequence of the protein. FYI you can also find the amino acid sequence at the end of this CDS segment. (2) the db_xref=”CCDS:CCDS13819.1” – This will take you to the NCBI Consensus CDS page for MIF. This page contains (among other things) the continuous mRNA sequence and the protein’s amino acid sequence.
[Figure: diagram labeling the mRNA and the CDS]
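NOTE: The annotation positions on the gene page are 1-based and inclusive (for example, “3..5” means bases 3, 4 and 5), and a feature made of several pieces is written as a join of such ranges. The Python sketch below shows how a coding sequence could be assembled from that kind of location. It is an illustration only: the function names, the toy sequence, and the toy location are hypothetical and are not the MIF coordinates, and the sketch handles forward-strand locations only (no “complement”).

```python
import re


def parse_location(location):
    """Turn a location such as 'join(3..5,12..14)' into (start, end) pairs."""
    return [(int(s), int(e)) for s, e in re.findall(r"(\d+)\.\.(\d+)", location)]


def extract(seq, segments):
    """Join 1-based, inclusive segments of seq into one sequence."""
    return "".join(seq[start - 1:end] for start, end in segments)


# Toy gene (hypothetical): two short "exons" separated by an "intron".
toy_gene = "AAATGCCCTTTTAGGG"
toy_cds = extract(toy_gene, parse_location("join(3..5,12..14)"))
print(toy_cds)  # ATGTAG
```

Python slices are 0-based and exclude the end position, which is why the start is shifted by one and the end is used unchanged.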
For the human MIF gene, answer the following questions:
Question 5A – Click on the first “gene”. Scroll down. At what nucleotide/position does the gene start?
Question 5B – Click on the 2nd “gene”. Scroll down. At what position does the mRNA start? Stop?
Question 5C – Click on mRNA. How many exons are present?
Question 5D – Click on CDS. At what position is the start codon? At what position is the stop codon?
Question 5E – Copy/paste the mRNA sequence for the human MIF gene
Question 5F – Copy/paste the coding sequence (start-to-stop codon) for the human MIF gene
In lecture, we discussed the results for a pairwise sequence alignment when the mRNA sequence for the human MIF gene and the bm-MIF homolog were compared. Both the global and the local alignments were performed. Both gave similar % identity and similarity (high 30% range) but were different in the % of gaps (local<
B. Repeat steps B-G above to perform the needle pairwise alignment.
Question #6
A. What is meant by a local alignment? What is meant by a global alignment?
B. Attach the two screenshots of your pairwise search results. Compare the two: % identity/similarity, gaps, start of homology.
C. How do these coding sequence alignments compare with the whole mRNA comparisons between human MIF and the Bm-MIF homolog given in the PowerPoint?
Question #7
Is a cow more closely related to an elephant or to a walrus? To answer this question, first find the protein sequence of alpha hemoglobin from each of these three organisms. To make sure we all get the same sequence:
Walrus – NCBI home page search: Alpha hemoglobin walrus – Protein database (very important)
First reference – 142 aa protein – click on FASTA and that will give you the sequence
Elephant – NCBI home page search: Alpha hemoglobin elephant – Protein database (very important)
First reference – 142 aa protein – click on FASTA and that will give you the sequence
Cow – NCBI home page search: Alpha hemoglobin cow – Protein database (very important)
First reference – 142 aa protein – click on FASTA and that will give you the sequence
Next, perform pairwise sequence alignments using EMBOSS Matcher, pairing cow with walrus and then cow with elephant. Record the amino acid identities and similarities. What is your conclusion? Which one (walrus or elephant) is the cow more closely related to? Explain your answer. (The more similar two sequences are, the more closely related the two organisms are.)
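NOTE: The identity percentage reported for an alignment is essentially the number of positions where the two aligned sequences carry the same residue, divided by the alignment length. The Python sketch below shows that calculation under the assumption that gap columns count toward the length; the function name and the toy alignment are hypothetical. Similarity is not computed here, since it additionally needs a substitution matrix to decide which non-identical residues count as similar.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns with identical residues.

    Both inputs must already be aligned (same length, gaps as '-').
    Gap columns count toward the alignment length (an assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * matches / len(aligned_a)


# Toy alignment (hypothetical), not real hemoglobin data: 4 of 6 columns match.
print(round(percent_identity("ACGT-A", "ACCTTA"), 1))
```

Running the same calculation on the cow/walrus and cow/elephant alignments lets you compare the two pairs on the same scale.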