Difference between revisions of "Downloads"

From QuigleyWiki
== Databases ==
*[https://www.dropbox.com/s/emsm5hjxtel6ymo/LAEVIS_7b_1kb.fa?dl=0 ''X. laevis'' genome build v7b] (fasta file)
*[https://www.dropbox.com/s/s1mduhaw9melm90/LAEVIS_7.1_1kb.fa?dl=0 ''X. laevis'' genome build v7b] (fasta file)
*[https://www.dropbox.com/s/u48q18r467ucb8y/Mayball_transcripts.fa?dl=0 ''X. laevis'' Mayball gene models] (fasta file)
*[https://www.dropbox.com/s/00kkxgxk9qciucr/Mayball_v7b.gtf?dl=0 ''X. laevis'' Mayball exon positions in genome v7b] (gtf file)
*[ftp://ftp.xenbase.org/pub/Genomics/JGI/Xenla9.1/Xla.v91.repeatMasked.fa.gz ''X. laevis'' genome build v9.1] (fasta file)
*[https://www.dropbox.com/s/imvo58ej91slk8l/Mayball_XL_v9.1_best_blat.gtf?dl=0 ''X. laevis'' Mayball exon positions in genome v9.1] (gtf file)

Revision as of 08:00, 8 February 2017




The genome

Let's face it, Xenopus rules. Great imaging, easy to manipulate, fantastic biochemistry. However, not everybody knows it is also now an extremely tractable genomics model. Above, find genome build version 7b and resources to work with it. Version 7b is in 17,006 scaffolds over 1 kb (the total number including the dinky ones is more like 400,000). Still, it'll get you 90% of the way there.

While 90% is pretty good, a 100%-dazzling, chromosome-level assembly is very close. For years now, Dan Rokhsar's group, especially Adam Session and Taejoon Kwon, has been hard at work with the Xenopus Genome Project Consortium to produce the highest quality genome possible. It can be pretty hard to put together the last pieces, though.

To help, I generated HiC data from X. laevis embryos. HiC is usually used to obtain information about long-range looping chromosomal interactions - the way it works is that you fix the genome in its glorious, bowl-of-spaghetti in situ state, cut with restriction enzymes, religate and sequence, hoping to catch loops by seeing two distant pieces now stuck together. Most of the data you get, though, are pieces that are linearly right next to each other with no looping required, which happens when you cut them apart and they religate like nothing ever happened. While not informative if you're studying long-range interactions, these data are perfect for figuring out which pieces of DNA are next to each other in a linear sequence and assembling chromosomes.
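To make the neighbor-finding idea concrete, here is a toy sketch in Python. It is not how Lachesis or HiRise work internally; it only shows the intuition that scaffolds sharing many HiC read pairs are likely to sit near each other. The input format (one pair of scaffold names per read pair) and the scaffold names are assumptions made up for illustration.

```python
from collections import Counter


def count_links(read_pairs):
    """Count HiC read pairs joining each unordered pair of distinct scaffolds."""
    links = Counter()
    for scaffold_a, scaffold_b in read_pairs:
        if scaffold_a != scaffold_b:  # same-scaffold pairs say nothing about neighbors
            links[tuple(sorted((scaffold_a, scaffold_b)))] += 1
    return links


def best_partner(links, scaffold):
    """Return the scaffold sharing the most links with `scaffold`, or None."""
    candidates = {
        (a if b == scaffold else b): n
        for (a, b), n in links.items()
        if scaffold in (a, b)
    }
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical read pairs: each tuple names the scaffolds the two mates mapped to.
pairs = [
    ("s1", "s2"), ("s2", "s1"), ("s1", "s2"),
    ("s1", "s3"),
    ("s2", "s3"), ("s3", "s2"),
    ("s1", "s1"),
]
links = count_links(pairs)
print(best_partner(links, "s1"))
```

A real assembler has to do considerably more than this (ordering, orientation, and correcting for scaffold size, for a start), so treat the raw counts as a first look only.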

Even with nice data, it's a tricky problem - I made an assembly with Lachesis from Jay Shendure's lab that was only so-so - but then Nik Putnam took my raw HiC data and knocked it out of the park with his assembler HiRise and made X. laevis v8.0 with Adam and Taejoon, assisted by BAC-FISH data and extensive fine-grain corrections from the Consortium. It's simply gorgeous.

X. laevis v8.0 should be released shortly - sometime early fall 2015 - but for now, you can download v7b above.

The gene models

Genomic sequence is handy, but it's a lot handier if you know what's in it. X. laevis has several EST projects (like this one), but they are older and incomplete. To address this, Taejoon took some 2B RNAseq reads generated and donated by the community (including 1B from me) and generated several collections of gene models. Above, find a fasta file containing Taejoon's Mayball gene model release, which is a nice option to align RNAseq experiments to.

Like the rest of this project, Mayball is an interim release, but it's still excellent. To make comparisons to human biology easier, I inserted the ensembl gene ID of the best-matching human ortholog in the name of each transcript. This means the name format is

gene ID | ensembl ID | unique gene identifier | position of gene in genome build v7b: scaffold_start-end, +/- strand. 
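If you want to pull those fields apart programmatically, a minimal Python sketch follows. It assumes the four fields are separated by "|" and that the position field looks like scaffold_start-end followed by a comma and the strand; the example header, including its IDs and scaffold name, is invented, so check a real header from the fasta file before relying on this.

```python
from typing import NamedTuple


class MayballName(NamedTuple):
    gene: str        # gene ID, e.g. "rfx2"
    ensembl_id: str  # Ensembl ID of the best-matching human ortholog
    unique_id: str   # unique gene identifier
    scaffold: str
    start: int
    end: int
    strand: str      # "+" or "-"


def parse_mayball_name(header: str) -> MayballName:
    """Split one FASTA header into its four '|'-separated fields.

    Assumes the position field looks like 'scaffold_start-end,strand'.
    """
    fields = [f.strip() for f in header.lstrip(">").split("|")]
    if len(fields) != 4:
        raise ValueError(f"expected 4 '|'-separated fields: {header!r}")
    gene, ensembl_id, unique_id, position = fields
    coords, strand = (part.strip() for part in position.rsplit(",", 1))
    # Scaffold names may themselves contain underscores, so split from the right.
    scaffold, span = coords.rsplit("_", 1)
    start, end = (int(x) for x in span.split("-"))
    return MayballName(gene, ensembl_id, unique_id, scaffold, start, end, strand)


# Hypothetical header, made up for illustration only.
example = ">rfx2|ENSG00000000000|XL_000001|Scaffold12_1000-2500,+"
print(parse_mayball_name(example))
```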



Yeah, it's clunky. Sue me.

X. laevis has two pseudogenomes (it's a long story, you'll have to wait for the paper), and historically Xenopus researchers have referred to gene copies (homeologs) as "A" and "B" forms. The Mayball naming convention doesn't discriminate between "A" and "B" forms. (These are now going to be named "L" and "S", referring to the long and short chromosomes of the pseudogenomes in X. laevis. They'll be in future builds but aren't in this one.) As such, you'll see duplicated names for homeologs (e.g., you'll see two rfx2's), but you can discriminate between them using the unique gene identifier or positional information (they're generally on different scaffolds).
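One quick way to spot these duplicated names is to group the transcript names by gene ID and list any gene that appears more than once, along with the unique identifiers and positions that tell the copies apart. The Python sketch below assumes the "|"-separated name format described earlier; the headers are invented for illustration. A repeated name is only a candidate homeolog pair, not proof of one.

```python
from collections import defaultdict


def group_by_gene(headers):
    """Map each gene ID to the (unique ID, position) pairs that share it."""
    groups = defaultdict(list)
    for header in headers:
        gene, _ensembl, unique_id, position = (
            f.strip() for f in header.lstrip(">").split("|")
        )
        groups[gene].append((unique_id, position))
    return dict(groups)


# Hypothetical headers, invented for illustration.
headers = [
    ">rfx2|ENSG00000000001|XL_000001|Scaffold12_1000-2500,+",
    ">rfx2|ENSG00000000001|XL_000002|Scaffold987_400-1900,-",
    ">e2f4|ENSG00000000002|XL_000003|Scaffold3_50-900,+",
]

for gene, copies in group_by_gene(headers).items():
    if len(copies) > 1:  # same name more than once: candidate homeologs
        print(gene, copies)
```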

The annotation

Finally, above find a gtf-formatted file containing all exonic positions of Mayball models in v7b. Since RNAseq reads can have a 3' bias, it's a good idea to have an independent measure of promoters, and one way to confirm active transcriptional start sites is to check for overlap with the histone modification H3K4me3. Simon van Heeringen in Gert Veenstra's group and I each performed ChIPseq on H3K4me3 and a few other marks, and Taejoon used the data to refine the transcriptional start sites of the Mayball models.
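As a rough illustration of that overlap check, the Python sketch below reads exon records from GTF lines, takes each transcript's start site as the lowest exon start on the + strand or the highest exon end on the - strand, and asks whether it falls in or near an H3K4me3 peak. It assumes standard nine-column GTF with a transcript_id attribute; the 500 bp flank, the peak dictionary layout, and all example records are assumptions for illustration, not part of the Mayball release.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_tss(gtf_lines):
    """Return {transcript_id: (seqname, tss, strand)} from GTF exon records.

    GTF coordinates are 1-based and inclusive.
    """
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = TRANSCRIPT_ID.search(cols[8])
        if match is None:
            continue
        exons[match.group(1)].append((cols[0], int(cols[3]), int(cols[4]), cols[6]))
    tss = {}
    for tid, records in exons.items():
        seqname, strand = records[0][0], records[0][3]
        if strand == "+":
            pos = min(start for _, start, _, _ in records)
        else:
            pos = max(end for _, _, end, _ in records)
        tss[tid] = (seqname, pos, strand)
    return tss


def tss_in_peak(tss, peaks, flank=500):
    """True if the TSS lies within `flank` bp of a peak on the same scaffold.

    peaks: {seqname: [(start, end), ...]}, 1-based inclusive (assumed layout).
    """
    seqname, pos, _strand = tss
    return any(s - flank <= pos <= e + flank for s, e in peaks.get(seqname, []))


# Hypothetical GTF records and peaks, invented for illustration.
gtf = [
    'Scaffold12\tMayball\texon\t1000\t1200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'Scaffold12\tMayball\texon\t1500\t2500\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n',
    'Scaffold3\tMayball\texon\t50\t300\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
    'Scaffold3\tMayball\texon\t600\t900\t.\t-\t.\tgene_id "g2"; transcript_id "t2";\n',
]
peaks = {"Scaffold12": [(800, 1400)]}

for tid, site in transcript_tss(gtf).items():
    print(tid, site, tss_in_peak(site, peaks))
```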

You can use this annotation file along with v7b to visualize experiments with IGV or other genome browsers. v7b combined with the Mayball models and my naming convention is also the default X. laevis infrastructure in place if you use HOMER, a fantastic package for sequence manipulation and motif-finding. If you're a frog person and you use HOMER and this infrastructure all together, your life will be a dream.

The data

Last year, we published our first genomics paper. It showed that in X. laevis multiciliated cells, a scaffolding protein binds to the cell cycle regulator e2f4 and changes its targets to discourage cell cycle progression but promote centriole duplication, enabling the hundreds of cilia in that cell type to develop. There are RNAseq and ChIPseq experiments there; please download them, align them to these resources, and see for yourself how they look. We have other papers currently in review; please also examine the data from them above.