Zebrafish Genome Literacy Workshop 2023
Exercise 1 - exploring the genome
Go to the region from 54,470,000 to 54,705,000 bp on chromosome 7.
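The coordinates above can also be written in the compact chromosome:start-end form that can be typed into the browser's location box. The following is a minimal Python sketch of that notation and of the region's size. It assumes 1-based, inclusive coordinates, and the helper names region_string and region_length are invented for this illustration.

```python
def region_string(chrom, start, end):
    """Build a 'chromosome:start-end' location string (hypothetical helper)."""
    if start > end:
        raise ValueError("start must not be greater than end")
    return f"{chrom}:{start}-{end}"


def region_length(start, end):
    """Length in bp, assuming both coordinates are 1-based and inclusive."""
    return end - start + 1


if __name__ == "__main__":
    print(region_string("7", 54470000, 54705000))   # 7:54470000-54705000
    print(region_length(54470000, 54705000))        # 235001
```

Because both end positions are counted, the length is one more than the simple difference of the two coordinates.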
How many contigs make up this region? Do they come from clones, from whole genome shotgun sequence, or from both?
How many genes are in this region?
How many transcripts are in this region?
How many of the transcripts on the forward strand come from manual annotation and how many from automatic annotation? Note that manually annotated transcripts are given a source of "Havana", because that's the name of the team that did the manual annotation. Any other sources, like "Ensembl", indicate automated annotation run by Ensembl.
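If you note down the strand and source of each transcript, the tally described above can be sketched in a few lines of Python. The records below are made up purely for illustration and are not the real transcripts in this region. The field names "strand" and "source" and the function name tally_by_annotation are also assumptions.

```python
from collections import Counter

# Made-up records for illustration only; NOT the real contents of the region.
# strand is 1 for forward and -1 for reverse (an assumed convention).
transcripts = [
    {"id": "tx1", "strand": 1, "source": "havana"},
    {"id": "tx2", "strand": 1, "source": "ensembl"},
    {"id": "tx3", "strand": -1, "source": "havana"},
    {"id": "tx4", "strand": 1, "source": "havana"},
]


def tally_by_annotation(records, strand=1):
    """Count manual vs automatic transcripts on one strand.

    Follows the rule in the exercise: a source of "Havana" means manual
    annotation, any other source is treated as automatic.
    """
    counts = Counter()
    for rec in records:
        if rec["strand"] != strand:
            continue
        kind = "manual" if rec["source"].lower() == "havana" else "automatic"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(tally_by_annotation(transcripts, strand=1))
```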
Zoom in on fgf4. There are several ways to do this. You could estimate the coordinates and type them in. You could draw a box round the gene on the middle panel or on the bottom panel and jump to it. You could draw a box round it, mark it and then jump to the mark, which also highlights the region so you can keep track of where you are as you navigate around. Or you could click on the transcript and then click on its location. Try a few of these methods to familiarise yourself with the navigation options.
Once you've zoomed in on fgf4, export the genomic sequence for this region as FASTA, choosing HTML as the output format. What information is in the header of the FASTA sequence?
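In a FASTA file, each record starts with a header line beginning with ">", followed by one or more lines of sequence. A minimal Python sketch of separating the header from the sequence is given below. The example record is invented and is not the header you will see in the export. The function names parse_fasta and split_header are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_header(header):
    """Split a header into its first word (the ID) and the remaining text."""
    parts = header.split(None, 1)
    seq_id = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return seq_id, description


if __name__ == "__main__":
    # Invented example record, for illustration only.
    example = ">example_id some free-text description\nACGT\nTTGA\n"
    for header, seq in parse_fasta(example):
        print(split_header(header), len(seq))
```

Splitting on the first run of whitespace is a common convention: the first word is treated as the identifier and the rest as a description.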
Export an image of the bottom panel as a PDF file and open it. Are there any differences between the browser and the PDF?
What is the stable ID of the fgf4 gene? What is the stable ID of the only transcript of fgf4? What is the version number of the transcript's stable ID? What's the stable ID of the fgf4 protein? How many amino acids does the protein have?
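Zebrafish stable IDs in Ensembl start with ENSDAR, followed by a letter for the feature type (G for gene, T for transcript, P for protein, E for exon) and eleven digits. The version number, when shown, follows a dot. The sketch below splits an ID of that shape into its parts. The ID used is an all-zero placeholder, not the real fgf4 identifier, and the function name split_stable_id is hypothetical.

```python
import re

# Pattern for zebrafish stable IDs, with an optional ".version" suffix.
STABLE_ID = re.compile(r"^(ENSDAR)([GTPE])(\d{11})(?:\.(\d+))?$")

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def split_stable_id(text):
    """Split 'ENSDART00000000000.1' into stable ID, feature type and version."""
    match = STABLE_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a zebrafish stable ID: {text!r}")
    prefix, letter, digits, version = match.groups()
    return {
        "stable_id": prefix + letter + digits,
        "feature": FEATURE_TYPES[letter],
        "version": int(version) if version is not None else None,
    }


if __name__ == "__main__":
    # Placeholder ID for illustration only; look up the real one in the browser.
    print(split_stable_id("ENSDART00000000000.1"))
```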
Go to the archive site for GRCz10 and find the equivalent region. Hint: You could search for the contig on which fgf4 is located. Look at the surrounding genes. What differences do you see compared with the current annotation on GRCz11?
Use this sequence to run both BLASTN and BLAT against the zebrafish genome:
CAGAAGGCTGCTCCGTCATGCACTAGGTAGTGTGTGATGTAATTTCGCTGCAGATATAAATATCGTCGGC
AAGGGAAAGTATACAGCATGGCTGTGCGTGGCTAGGATTCACAGTCAAAATTTTCCAACTCTTTTTGACA
GCTTGAACAACAGTCTAAAGTATATTGTCTTGAGAGGAAACCCACTGAGCGCGACTTCATGCCGCTCACT
TGGGTTGACCGTAGCTCTGTCCGTGAGTAAGCTGTTTTGTGCGTCTTTTTGGCTGCCGACTCAAGCATAG
AGAAAAACGGTAGCCGGTGTCTTCAACTCCTTTTAGAAGGATGAGTGTCCAGTCGGCCCTCTTGCCAATC
CTGGTCTTAGGACTAATGACAAGCTCTGTGCGCTGCGCTCCGCTGCCCGGTGGACACAGCGGCCCCGTAG
AGCGACGCTGGGAGACCCTCTACTCGCGTTCACTAGCACGAATCCCCGGGGAAAAAAGAGATATCAGCAG
GGACAGTGATTATCTCACGGGCATCAAAAGACTCCGACGTCTCTACTGCAACGTTGGCATTGGGTTTCAT
CTTCAAGTATTACCGGGTGGTAAAATCACCGGCGTACACAACGAAAACCGCTACAGTCTTCTTGAGATAT
CTCCGGTGGAGAGGGGAGTTGTGACACTGTTTGGCGTGCGGAGCGGGCTATTCGTGGCCATGAACAGCAA
AGGGAAGCTTTACGGATCTGAGCAGTTCACAAACGAATGCAAATTCAGAGAAAAGCTCCTCGCAAATAAT
What gene is found? Why does it hit multiple times? Are there any differences between the results for BLASTN and BLAT?
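The query sequence above is wrapped over several lines. Search tools generally cope with that, but if you want to check a pasted sequence first, a minimal Python sketch is shown below. It joins the lines, upper-cases the bases and rejects unexpected characters. The function name clean_sequence and the choice to allow only A, C, G, T and N are assumptions made for this illustration.

```python
def clean_sequence(raw, alphabet="ACGTN"):
    """Join a wrapped nucleotide sequence into one upper-case string.

    Raises ValueError if any character outside `alphabet` is present.
    """
    seq = "".join(raw.split()).upper()
    unexpected = sorted(set(seq) - set(alphabet))
    if unexpected:
        raise ValueError(f"unexpected characters: {unexpected}")
    return seq


if __name__ == "__main__":
    # The first 20 bases of the query, split over two lines as an example.
    fragment = "CAGAAGGCTG\nCTCCGTCATG\n"
    cleaned = clean_sequence(fragment)
    print(cleaned, len(cleaned))
```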
Find your favourite gene on ZFIN and follow one of the links to Ensembl. Switch to the "Region in detail" view. Mark your gene and then explore the region around it. Is it on clone sequence or whole genome shotgun sequence? Are the nearby genes manually or automatically annotated? What types of gene and transcript are represented? How long are the UTRs? If something looks weird, ask your neighbour about it, and if they also think it looks weird, ask us about it!
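UTR lengths can be worked out from a transcript's exon coordinates and the start and end of its coding sequence. The following is a minimal Python sketch under stated assumptions: coordinates are 1-based and inclusive, the coding start and end are genomic positions that fall within exons, and strand is 1 for forward or -1 for reverse. The function name utr_lengths and the example coordinates are invented for illustration.

```python
def utr_lengths(exons, cds_start, cds_end, strand=1):
    """Return (5' UTR length, 3' UTR length) in bp.

    exons: list of (start, end) genomic coordinates, 1-based inclusive.
    cds_start, cds_end: lowest and highest coding genomic positions.
    """
    # Exonic bases to the left of the coding region (lower coordinates).
    left = sum(max(0, min(end, cds_start - 1) - start + 1) for start, end in exons)
    # Exonic bases to the right of the coding region (higher coordinates).
    right = sum(max(0, end - max(start, cds_end + 1) + 1) for start, end in exons)
    # On the reverse strand the 5' end is at the higher coordinates.
    return (left, right) if strand == 1 else (right, left)


if __name__ == "__main__":
    # Invented three-exon transcript, for illustration only.
    exons = [(100, 199), (300, 399), (500, 599)]
    print(utr_lengths(exons, cds_start=150, cds_end=579, strand=1))   # (50, 20)
    print(utr_lengths(exons, cds_start=150, cds_end=579, strand=-1))  # (20, 50)
```

Only exonic bases are counted, so introns lying within a UTR do not add to its length.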