Pancrustacean MegaTree
From Ucsbgalaxy
Revision as of 00:03, 3 July 2012
Introduction
Methods
Data Sets
6-gene Data Set
Seth
Mitochondrial Genomes
- Accession numbers are available on the NCBI website by searching with the taxon ID for Pancrustacea, which is 197562 (http://www.ncbi.nlm.nih.gov/genomes/OrganelleResource.cgi?taxid=197562). As of July 1, 2012, there were 373 accessions available. Note that some of these are subspecies. To download the list of accessions, click "Download" near the top right of the page.
- The accessions downloaded in May, along with scripts written by THO to pull and parse the data directly, are stored on macroevolution in the following directory: /labdata/nfs/lab/scripts/ATOLmt/
- The accessions pulled at that time are in the file AccList.tx. There are 365 accessions in that list.
- I think Seth and Heather pulled down the proteins manually and aligned them, perhaps before visiting UCSB. In any event, the genes are in individual FASTA files named by gene (*.fa) in the directory above. THO then converted them to tabular format using the shell script 1_make_tables, which calls the Perl script getSpeciesofGB.pl for each gene. That Perl script pulls the species name of each accession from GenBank and writes a tabular file for that gene. The per-gene tabular files are concatenated into the file
mtGenome.tab
- That is the proteome data. For the rDNA data, THO wrote scripts to parse data from GenBank files. These are in the subdirectory gbstrip of the directory listed above.
- The next step is to use BioPerl to download all the GenBank files directly from GenBank. This is done using the script getGB.pl. The actual command is:
./getGB.pl AccList.tx > mtGenomes.gb
This pulls the accessions listed in AccList.tx from GenBank and writes the data to the file mtGenomes.gb.
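For readers without access to getGB.pl, the download step can be sketched in Python. This is an illustration only, not the lab's script: it assumes the NCBI E-utilities efetch endpoint in place of BioPerl, and the function names, batch size, and pause length are all hypothetical choices.

```python
import time
import urllib.parse
import urllib.request

# NCBI E-utilities efetch endpoint (assumption: used here instead of BioPerl).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(lines):
    """Return the non-empty, whitespace-stripped accessions from an iterable of lines."""
    return [line.strip() for line in lines if line.strip()]


def batches(items, size):
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def efetch_url(accessions):
    """Build an efetch URL asking for GenBank flat-file text for the given accessions."""
    query = urllib.parse.urlencode({
        "db": "nuccore",
        "id": ",".join(accessions),
        "rettype": "gb",
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


def download(accessions, out, batch_size=50, pause=0.5):
    """Fetch records batch by batch and write them to the file-like object `out`."""
    for batch in batches(accessions, batch_size):
        with urllib.request.urlopen(efetch_url(batch)) as response:
            out.write(response.read().decode("utf-8"))
        time.sleep(pause)  # pause between requests to stay under NCBI's rate limit


# Example (requires network access):
# with open("AccList.tx") as handle, open("mtGenomes.gb", "w") as out:
#     download(read_accessions(handle), out)
```

Requesting the accessions in batches keeps each URL short and limits the number of requests sent to NCBI.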
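The FASTA-to-table conversion described in the list above (1_make_tables calling getSpeciesofGB.pl) can also be sketched in Python. The actual column layout of mtGenome.tab is not documented here, so the species, gene, accession, sequence order below is an assumption, as is the convention that the first token of each FASTA header is the accession. The species lookup is passed in as a plain dictionary rather than fetched from GenBank, and all function names are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_rows(text, gene, species_by_accession):
    """Return tab-separated rows: species, gene, accession, sequence (assumed layout)."""
    rows = []
    for header, sequence in parse_fasta(text):
        tokens = header.split()
        accession = tokens[0] if tokens else ""
        # Accessions missing from the lookup are marked rather than dropped.
        species = species_by_accession.get(accession, "unknown")
        rows.append("\t".join([species, gene, accession, sequence]))
    return rows
```

Running fasta_to_rows once per gene file and appending the rows to a single output file would mirror the concatenation step that produces one combined table.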