Implement EvolMap -- Assigned to Roger

From Ucsbgalaxy
Revision as of 10:36, 15 November 2011 by Oakley
  1. Implement EvolMap [1] program in Galaxy.
  2. Write script to parse output (this could also be done by modifying the EvolMap Java)
    1. The goal is to obtain genes that have exactly one representative in each species. For the dataset called "algae_genomes", this information is present in the file:

algae_genomes.ancestors_pass2.rn

This is a large file that lists all gene families, one per line.

The file begins with a line that starts with

ANCESTOR

Following "ANCESTOR" is a list of species. The first ANCESTOR line lists all species in the analysis and refers to their common ancestor. The lines after it are gene families in different categories. Once all gene families for the first ANCESTOR have been listed, the remaining gene families are grouped by ancestral node, each group introduced by its own ANCESTOR line.
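The block structure described above can be sketched in Python as follows. This is only a minimal sketch, not part of EvolMap: the function name `split_by_ancestor` is hypothetical, and the text after the ANCESTOR keyword is kept as a raw string because the delimiter used in the species list is not specified here.

```python
from typing import Iterable, List, Tuple


def split_by_ancestor(lines: Iterable[str]) -> List[Tuple[str, List[str]]]:
    """Group gene-family lines under the ANCESTOR line that precedes them.

    Returns a list of (header_text, family_lines) pairs in file order.
    header_text is whatever follows the ANCESTOR keyword, left unparsed
    (assumption: the species-list delimiter is not known).
    """
    blocks: List[Tuple[str, List[str]]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("ANCESTOR"):
            # A new ancestral node starts here.
            blocks.append((line[len("ANCESTOR"):].strip(), []))
        elif blocks:
            # Any other line belongs to the most recent ANCESTOR block.
            blocks[-1][1].append(line)
    return blocks
```

Since a file object is an iterable of lines, the function could be called as `split_by_ancestor(open("algae_genomes.ancestors_pass2.rn"))`; the first block returned would then correspond to the common ancestor of all species.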

Lines representing gene families below each ANCESTOR begin with the following words:

PRESENT

Indicates that a gene was inferred to be present in the common ancestor. This line then contains a list of all the genes in this gene family that are present in

DIVERGED

The gene is inferred not to be present in the ancestor, but is duplicated from the source gene in one of the descendant lineages.

SINGULAR

The gene is not present in this ancestor and is not gained in any of the descendant lineages [but gained in a later branch].
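A parsing script could use the three keywords above to classify each gene-family line and then test whether a family has exactly one gene per species. The following is a rough sketch under stated assumptions: the names `parse_family_line` and `is_single_copy` are hypothetical, the genes on a line are assumed to be separated by whitespace, and the mapping from a gene identifier to its species is passed in as a function because the identifier format is not described here.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional, Tuple

# Keywords that start a gene-family line, as listed above.
KEYWORDS = ("PRESENT", "DIVERGED", "SINGULAR")


def parse_family_line(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (keyword, genes) for a gene-family line, else None.

    Assumption: the keyword and the genes are whitespace-separated.
    """
    parts = line.split()
    if not parts or parts[0] not in KEYWORDS:
        return None
    return parts[0], parts[1:]


def is_single_copy(
    genes: Iterable[str],
    species_of: Callable[[str], str],
    all_species: Iterable[str],
) -> bool:
    """True if every species has exactly one gene and no other species occur.

    species_of is a caller-supplied (hypothetical) function that maps a
    gene identifier to its species name.
    """
    counts = Counter(species_of(g) for g in genes)
    return dict(counts) == {s: 1 for s in set(all_species)}
```

For example, if gene identifiers happened to look like `speciesA_gene1` (an assumption made only for illustration), `species_of` could be `lambda g: g.split("_")[0]`.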

For now, we are only interested in the