Mapping reads


Once you have some sequence, you'll want to map it to a reference. 100M reads that are 50 bases long each really aren't going to help anybody until you put them in context. There are lots of mappers out there. Some are easier to use than others, and for the most part, any one of them will get the job done. What do they do? Basically, they perform a function similar to "find" in your Word doc or web browser, but they're engineered to tolerate a little more slop (mismatches and small gaps between the read and the reference). And they go a lot faster: a good mapper will map millions of reads an hour. Different mappers use slightly different strategies, which you can read about if you're super excited about it. If you're reading this page, though, you are probably more likely to think, "I have RNAseq data and no list of differentially-expressed genes. How do I get one?" So, we can leave detailed discussions of mapping strategies for another time. Since there are a bunch of mappers, which one do you use?


Taejoon Kwon did a nice test of how the different mappers performed on X. laevis. That's a useful test case because X. laevis is allotetraploid and its two subgenomes look very similar, which can confuse mappers. He found that RNA-STAR is probably the best, but the commands to run it are very clunky (see below).


To keep it simple, I am going to recommend that first-timers go with bowtie2. It's a fine mapper, even if it's not as good as RNA-STAR. I'll also give the RNA-STAR commands below in case you're ambitious.
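
As an aside, if the "find, but with a little slop" idea above feels abstract, here is a toy sketch in Python. It is not how bowtie2 or RNA-STAR actually work (real mappers build an index of the reference so they don't have to scan it base by base, and they also handle gaps). The function name find_with_mismatches and the example sequences are made up for illustration.

```python
def find_with_mismatches(reference, read, max_mismatches=1):
    """Return 0-based start positions where `read` lines up against
    `reference` with at most `max_mismatches` substituted bases.

    Toy illustration only: it slides the read along the reference one
    base at a time, which is far too slow for real data.
    """
    hits = []
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = 0
        for ref_base, read_base in zip(window, read):
            if ref_base != read_base:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too sloppy, give up on this position
        else:
            # Loop finished without exceeding the mismatch budget
            hits.append(start)
    return hits


reference = "ACGTTGCATGCA"   # made-up reference sequence
read = "TGGA"                # made-up read with one sequencing error

exact_hits = find_with_mismatches(reference, read, max_mismatches=0)
sloppy_hits = find_with_mismatches(reference, read, max_mismatches=1)
print("exact:", exact_hits)
print("one mismatch allowed:", sloppy_hits)
```

With no mismatches allowed the read doesn't map anywhere, just like an exact "find" would fail. Allow one mismatch and it maps to two places, which is also a small taste of why very similar sequences in a genome can confuse a mapper.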