Pancrustacean MegaTree
From Ucsbgalaxy
Introduction
Methods
Data Sets
6-gene Data Set
Seth
Mitochondrial Genomes
- Accession numbers are available on the NCBI website by searching with the taxon ID for Pancrustacea, 197562 [1]. As of July 1, 2012, there were 373 accessions available; note that some of these are subspecies. The list of accessions can be downloaded by clicking "Download" near the top right of the page.
- The accessions downloaded in May, along with scripts written by THO to pull and parse the data directly, are stored on macroevolution in the following directory: /labdata/nfs/lab/scripts/ATOLmt/
- The accessions pulled at that time are listed in the file AccList.tx. There are 365 accessions in that list.
- I think Seth and Heather pulled down the protein sequences manually and aligned them, perhaps before visiting UCSB. In any event, the aligned genes ended up in individual FASTA files, one per gene, named *.fa in the directory above. THO then converted them to tabular format using the shell script 1_make_tables, which calls the Perl script getSpeciesofGB.pl for each gene. That Perl script looks up the species name of each accession in GenBank and writes a tabular file for the gene; these tables are then concatenated into the file
mtGenome.tab
- That covers the mitochondrial protein data. For the rDNA data, THO wrote scripts that parse the GenBank files directly; these are in the gbstrip subdirectory of the directory listed above.
- The next step is to use BioPerl to download the full GenBank records for all accessions directly from GenBank, using the script getGB.pl. The actual command is:
./getGB.pl AccList.tx > mtGenomes.gb
This fetches the accessions listed in AccList.tx from GenBank; getGB.pl writes the records to standard output, and the redirect saves them in the file mtGenomes.gb.
- 5
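For a scripted alternative to the web search for Pancrustacea (taxon 197562), NCBI's E-utilities esearch endpoint can list nucleotide accessions under a taxon. The Python sketch below only builds the request URL. The extra filters (mitochondrion[filter], complete genome[title]), the retmax value and the function name build_esearch_url are assumptions for illustration, not the query behind the 373-accession count, so the number of hits may differ.

```python
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(taxid: int, retmax: int = 1000) -> str:
    """Return a hypothetical esearch URL for mitochondrial genomes under a taxon.

    [Organism:exp] includes all descendant taxa; the mitochondrion and
    complete-genome filters are assumptions, not the original search.
    """
    term = (
        f"txid{taxid}[Organism:exp] AND mitochondrion[filter] "
        "AND complete genome[title]"
    )
    params = {
        "db": "nuccore",
        "term": term,
        "retmax": retmax,
        "idtype": "acc",  # return accession.version instead of GI numbers
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Pancrustacea is NCBI taxon 197562.
    print(build_esearch_url(197562))
```

Opening the printed URL returns XML whose IdList holds the matching accessions; comparing that list with AccList.tx is one way to spot accessions added after May.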
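The FASTA-to-table step performed by 1_make_tables and getSpeciesofGB.pl can be sketched in Python as follows. This is not THO's script. It assumes the first word of each FASTA header is the GenBank accession, uses a plain dictionary (the hypothetical species_of) in place of the GenBank species lookup, and invents a four-column layout (gene, accession, species, aligned sequence); the real columns of mtGenome.tab may differ.

```python
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text; sequences may span lines."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_rows(gene: str, text: str, species_of: Dict[str, str]) -> List[str]:
    """Turn one gene's FASTA text into tab-separated rows.

    Assumes the first word of each header is the accession; species_of is a
    hypothetical stand-in for the GenBank lookup in getSpeciesofGB.pl.
    """
    rows = []
    for header, seq in parse_fasta(text):
        acc = (header.split() or [""])[0]
        rows.append("\t".join([gene, acc, species_of.get(acc, "NA"), seq]))
    return rows


def build_table(fasta_dir: Path, species_of: Dict[str, str], out_path: Path) -> None:
    """Process every <gene>.fa in fasta_dir and concatenate the rows into out_path."""
    rows: List[str] = []
    for fa in sorted(fasta_dir.glob("*.fa")):
        # The file stem (e.g. "cox1" from cox1.fa) is used as the gene name.
        rows.extend(fasta_to_rows(fa.stem, fa.read_text(), species_of))
    out_path.write_text("\n".join(rows) + "\n")
```

Keeping the per-gene conversion as a pure function on text makes it easy to check on a small example before running build_table over the whole directory.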
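The gbstrip scripts for the rDNA data are Perl and are not reproduced here. As a rough illustration of what parsing the GenBank files involves, the Python sketch below splits a multi-record flat file such as mtGenomes.gb on its "//" terminators, reads each record's accession and organism, and cuts out rRNA features whose location is a simple range (start..end or complement(start..end)). Features with join(...) or multi-line locations are skipped. The function names parse_record and parse_genbank are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Simple locations only, with optional < / > partial markers.
SIMPLE_LOC = re.compile(r"^(complement\()?<?(\d+)\.\.>?(\d+)\)?$")
# Base-pairing table for reverse complements.
COMPLEMENT = str.maketrans("acgtnACGTN", "tgcanTGCAN")


def parse_record(record: str) -> Dict[str, object]:
    """Extract accession, organism and rRNA (product, sequence) pairs from one record."""
    acc, organism, section = "", "", None
    features: List[Tuple[str, str, List[str]]] = []
    seq_parts: List[str] = []
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            acc = line.split()[1]
        elif line.startswith("  ORGANISM"):
            organism = line.split(None, 1)[1].strip()
        elif line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif section == "features" and len(line) > 5 and line[:5] == "     " and line[5] != " ":
            # New feature: key in columns 6-21, location from column 22.
            features.append((line[5:21].strip(), line[21:].strip(), []))
        elif section == "features" and line.startswith(" " * 21) and features:
            # Qualifier or continuation line of the most recent feature.
            features[-1][2].append(line.strip())
        elif section == "origin":
            # Sequence lines: drop position numbers and spaces.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    seq = "".join(seq_parts)
    rrnas: List[Tuple[str, str]] = []
    for key, loc, quals in features:
        m = SIMPLE_LOC.match(loc)
        if key != "rRNA" or m is None:
            continue  # not an rRNA, or a location form this sketch does not handle
        start, end = int(m.group(2)), int(m.group(3))
        sub = seq[start - 1:end]  # GenBank coordinates are 1-based and inclusive
        if m.group(1):
            sub = sub.translate(COMPLEMENT)[::-1]
        product = next((q.split("=", 1)[1].strip('"') for q in quals
                        if q.startswith("/product=")), "")
        rrnas.append((product, sub))
    return {"accession": acc, "organism": organism, "rrnas": rrnas}


def parse_genbank(text: str) -> List[Dict[str, object]]:
    """Split a multi-record GenBank flat file on '//' lines and parse each record."""
    records = re.split(r"^//[ \t]*$", text, flags=re.M)
    return [parse_record(r) for r in records if r.strip()]
```

For production use, an established parser (BioPerl's Bio::SeqIO, as the lab scripts already rely on BioPerl) handles complex locations and edge cases that this sketch ignores.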
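getGB.pl uses BioPerl to fetch the records listed in AccList.tx. A comparable download can be sketched in Python with NCBI's E-utilities efetch endpoint (db=nuccore, rettype=gb, retmode=text). This sketch assumes AccList.tx holds one accession per line; the batch size of 200, the pause between requests, and the function names are assumptions, with the pause chosen to stay under NCBI's limit of three requests per second without an API key.

```python
import time
import urllib.request
from pathlib import Path
from typing import List
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_accessions(text: str) -> List[str]:
    """Return accessions (assumed one per line), dropping blanks and repeats."""
    seen, accs = set(), []
    for line in text.splitlines():
        acc = line.strip()
        if acc and acc not in seen:
            seen.add(acc)
            accs.append(acc)
    return accs


def efetch_urls(accessions: List[str], batch_size: int = 200) -> List[str]:
    """Build one efetch URL per batch of accessions, requesting GenBank flat-file text."""
    urls = []
    for i in range(0, len(accessions), batch_size):
        params = {
            "db": "nuccore",
            "id": ",".join(accessions[i:i + batch_size]),
            "rettype": "gb",
            "retmode": "text",
        }
        urls.append(f"{EFETCH_URL}?{urlencode(params)}")
    return urls


def download(acc_list: Path, out_path: Path, pause: float = 0.34) -> None:
    """Fetch every batch and write the concatenated GenBank records to out_path."""
    accs = read_accessions(acc_list.read_text())
    with out_path.open("w") as out:
        for url in efetch_urls(accs):
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode("utf-8"))
            time.sleep(pause)  # stay below 3 requests/second (no API key)
```

Calling download(Path("AccList.tx"), Path("mtGenomes.gb")) performs the network step; read_accessions and efetch_urls are pure and can be checked offline first.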