Phylocatenator.pl
Revision as of 23:32, 21 December 2011
What it does
This tool, written by THO in Perl and implemented in Galaxy, produces a concatenated data set for phylogenetic analysis when not all genes are sampled for all species.
Basic Example
The input data must be in four-column format: column 1 is the species name, column 2 the gene family, column 3 the individual gene name, and column 4 the sequence. Sequences within each gene family must be aligned:
species1 gene1 genenameA acgttagcgcgctatagc
species2 gene1 genenameB acgttag--cgctataaa
species3 gene1 genenameC acgttagcgcgctatagc
species4 gene1 genenameD acgttagcgcgctatagc
species1 gene2 genenameE --gttagtttgcta
species3 gene2 genenameF gtgttagtttgcta
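To make the input requirements concrete, here is a minimal Python sketch (not phylocatenator's Perl code) that reads the four-column records and checks that every gene family is aligned, i.e. that all its sequences have the same length. It assumes columns are separated by whitespace (tabs or spaces); the function names `parse_records` and `check_aligned` are hypothetical.

```python
from collections import defaultdict

# The example input: species, gene family, gene name, aligned sequence.
RECORDS = """\
species1 gene1 genenameA acgttagcgcgctatagc
species2 gene1 genenameB acgttag--cgctataaa
species3 gene1 genenameC acgttagcgcgctatagc
species4 gene1 genenameD acgttagcgcgctatagc
species1 gene2 genenameE --gttagtttgcta
species3 gene2 genenameF gtgttagtttgcta
"""

def parse_records(text):
    """Split each non-blank line into its four columns (assumed whitespace-separated).
    A line without exactly four fields raises ValueError during unpacking."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        species, family, gene_name, seq = line.split()
        records.append((species, family, gene_name, seq))
    return records

def check_aligned(records):
    """Return {gene family: alignment length}; raise if a family's lengths differ."""
    lengths = defaultdict(set)
    for _species, family, _gene_name, seq in records:
        lengths[family].add(len(seq))
    for family, seen in lengths.items():
        if len(seen) != 1:
            raise ValueError(f"{family} is not aligned: lengths {sorted(seen)}")
    return {family: next(iter(seen)) for family, seen in lengths.items()}

print(check_aligned(parse_records(RECORDS)))  # {'gene1': 18, 'gene2': 14}
```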
Two parameters, $gene and $species, set thresholds for including data. $species is the minimum number of species that must contain a particular gene for that gene to be included; $gene is the minimum number of gene families a species must have to be included in the dataset.
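The two thresholds can be illustrated with a short Python sketch. The order of filtering (gene families first, then species) and the use of "at least" comparisons are assumptions, since the page does not say how phylocatenator applies them; `apply_thresholds` is a hypothetical name.

```python
from collections import defaultdict

# (species, gene family) pairs present in the example input.
PRESENT = [
    ("species1", "gene1"), ("species2", "gene1"), ("species3", "gene1"),
    ("species4", "gene1"), ("species1", "gene2"), ("species3", "gene2"),
]

def apply_thresholds(pairs, min_species, min_genes):
    """Hypothetical filter: keep gene families present in >= min_species species
    ($species), then keep species having >= min_genes of those families ($gene)."""
    species_per_family = defaultdict(set)
    for species, family in pairs:
        species_per_family[family].add(species)
    kept_families = {f for f, sp in species_per_family.items() if len(sp) >= min_species}

    families_per_species = defaultdict(set)
    for species, family in pairs:
        if family in kept_families:
            families_per_species[species].add(family)
    all_species = {species for species, _family in pairs}
    kept_species = {s for s in all_species if len(families_per_species[s]) >= min_genes}
    return kept_families, kept_species

families, species = apply_thresholds(PRESENT, 0, 0)
print(sorted(families), sorted(species))
```

With both thresholds at 0 everything is kept. Raising $species to 3 drops gene2 (found in only two species), while raising $gene to 2 keeps only species1 and species3.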
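Running phylocatenator on the above data with 0 for $gene and 0 for $species yields the following output. The first line gives the number of species (4) and the total alignment length (18 + 14 = 32 characters); each gene family a species lacks is filled with question marks: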
4 32
species1 acgttagcgcgctatagc--gttagtttgcta
species2 acgttag--cgctataaa??????????????
species3 acgttagcgcgctatagcgtgttagtttgcta
species4 acgttagcgcgctatagc??????????????
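The concatenation step can be sketched in Python as follows. This is an illustration that reproduces the example output, not the tool's own implementation: gene families are joined in a given order, and every family a species lacks is padded with `?` to that family's alignment length. The function name `concatenate` is hypothetical.

```python
# Aligned sequences from the example, keyed by (species, gene family).
SEQS = {
    ("species1", "gene1"): "acgttagcgcgctatagc",
    ("species2", "gene1"): "acgttag--cgctataaa",
    ("species3", "gene1"): "acgttagcgcgctatagc",
    ("species4", "gene1"): "acgttagcgcgctatagc",
    ("species1", "gene2"): "--gttagtttgcta",
    ("species3", "gene2"): "gtgttagtttgcta",
}

def concatenate(seqs, families, species):
    """Return PHYLIP-style text: an 'ntaxa nchar' header, then one row per species,
    with '?' filling every gene family that species lacks."""
    lengths = {}
    for (_species, family), seq in seqs.items():
        lengths.setdefault(family, len(seq))  # families are aligned, so any member works
    nchar = sum(lengths[f] for f in families)
    lines = [f"{len(species)} {nchar}"]
    for sp in species:
        row = "".join(seqs.get((sp, f), "?" * lengths[f]) for f in families)
        lines.append(f"{sp} {row}")
    return "\n".join(lines)

print(concatenate(SEQS, ["gene1", "gene2"],
                  ["species1", "species2", "species3", "species4"]))
```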
Optional Functionality
I. You may enter a list of species. Species not in this list will not be written to the output file. For example, a species list of:
species1
species2
Would change the above output to:
species1 acgttagcgcgctatagc--gttagtttgcta
species2 acgttag--cgctataaa??????????????
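A minimal Python sketch of the species-list option, assuming the list file holds one species name per line and that output rows keep their original order. `read_species_list` and `keep_listed` are hypothetical names.

```python
# Output rows from the basic example: (species name, concatenated sequence).
ROWS = [
    ("species1", "acgttagcgcgctatagc--gttagtttgcta"),
    ("species2", "acgttag--cgctataaa" + "?" * 14),
    ("species3", "acgttagcgcgctatagcgtgttagtttgcta"),
    ("species4", "acgttagcgcgctatagc" + "?" * 14),
]

def read_species_list(text):
    """One species name per line; blank lines are ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}

def keep_listed(rows, allowed):
    """Drop rows whose species is not in the list, preserving input order."""
    return [(name, seq) for name, seq in rows if name in allowed]

for name, seq in keep_listed(ROWS, read_species_list("species1\nspecies2\n")):
    print(name, seq)
```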
Table of partition models
You may supply a table assigning a model to each gene family (partition). Phylocatenator then sorts the data so that all data sharing a model are placed together, and creates the corresponding partition file specifying each model for RAxML. Currently, data can be partitioned only into valid RAxML models.
The format is a tab-delimited file as follows:
gene1	WAG
gene2	JTT
gene3	DNA
gene4	WAG
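Reading such a table takes only a few lines. The following Python sketch maps each gene family to its model and rejects lines that do not have exactly two tab-separated fields. It is an illustration rather than phylocatenator's own parser, and `parse_model_table` is a hypothetical name.

```python
# The example model table (tab-delimited: gene family, model).
MODEL_TABLE = "gene1\tWAG\ngene2\tJTT\ngene3\tDNA\ngene4\tWAG\n"

def parse_model_table(text):
    """Map gene family -> model name from a two-column tab-delimited table."""
    models = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 tab-separated fields, got {len(fields)}")
        family, model = fields[0].strip(), fields[1].strip()
        models[family] = model
    return models

print(parse_model_table(MODEL_TABLE))
```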
Valid models include the following:
BIN = binary morphological data
MULTI = multistate morphological data
DNA = DNA data
WAG = one of several protein models listed in the RAxML help documentation
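The sorting and partition-file steps can be sketched in Python under several assumptions. Gene families are grouped by model in order of each model's first appearance. Because sorting makes each model's data contiguous, each model becomes one partition. Lines follow RAxML's `MODEL, name = start-end` syntax. The page does not show the exact file phylocatenator writes, so the partition names `p1`, `p2`, … and the gene lengths below are hypothetical. `VALID_MODELS` is only an illustrative subset of what RAxML accepts.

```python
# Illustrative subset; RAxML accepts further protein models (see its help text).
VALID_MODELS = {"BIN", "MULTI", "DNA", "WAG", "JTT", "LG"}

MODELS = {"gene1": "WAG", "gene2": "JTT", "gene3": "DNA", "gene4": "WAG"}
LENGTHS = {"gene1": 18, "gene2": 14, "gene3": 10, "gene4": 12}  # hypothetical lengths

def build_partitions(models, lengths):
    """Group gene families by model (first-appearance order), then return
    (family order for concatenation, RAxML-style partition lines)."""
    for family, model in models.items():
        if model not in VALID_MODELS:
            raise ValueError(f"{family}: {model} is not a supported model")
    model_order = list(dict.fromkeys(models.values()))  # unique models, in order
    family_order = [f for m in model_order for f, fm in models.items() if fm == m]
    lines, start = [], 1
    for i, model in enumerate(model_order, start=1):
        block = sum(lengths[f] for f in family_order if models[f] == model)
        end = start + block - 1
        lines.append(f"{model}, p{i} = {start}-{end}")
        start = end + 1
    return family_order, lines

order, lines = build_partitions(MODELS, LENGTHS)
print(order)             # ['gene1', 'gene4', 'gene2', 'gene3']
print("\n".join(lines))  # WAG, p1 = 1-30 / JTT, p2 = 31-44 / DNA, p3 = 45-54
```

Grouping by model first is what lets each model be described by a single contiguous range. Without the sort, gene1 and gene4 (both WAG) would be separated by gene2 and gene3.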