Difference between revisions of "Program to find probes"

Revision as of 10:23, 28 February 2014

Project is on galaxy-dev, under the PIA user.

opsinomics.pl

Program flow:

1. Reads sequence.fasta one sequence at a time
- 1b. Creates a hash mapping each sequence ID to its sequence
2. Reads in tree.tre
3. Reroots the tree on the midpoint
4. Collects all the nodes into a nodes array, sorted by how many generations they are from the root
5. Goes through each node in reverse order (starting farthest from the root)
- 5b. Retrieves the node's sibling(s)
  - If one of the siblings is blank, writes the other sibling's sequence to the output file
- 5c. Uses bl2seq (blastn) to compare the two sibling sequences
- 5d. Uses SimpleAlign to get the best consensus string
  - If the consensus is shorter than $MINLENGTH, writes it to the output file
- 5e. Updates the sequence hash, writing the consensus string to the ancestor node
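
The flow above can be sketched in a few lines. This is an illustration only, not opsinomics.pl itself: it is written in Python rather than Perl, it takes the tree as a hypothetical child-to-parent dictionary instead of parsing and midpoint-rooting tree.tre, and it replaces the bl2seq and SimpleAlign steps with a stand-in that keeps the longest run shared by two sequences. The function names and the `min_length` parameter (standing in for $MINLENGTH) are assumptions.

```python
from difflib import SequenceMatcher


def read_fasta(handle):
    """Steps 1 and 1b: map each sequence id to its full sequence."""
    seqs, current = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # id is the first word of the header
            seqs[current] = ""
        elif current is not None:
            seqs[current] += line
    return seqs


def depth(node, parent):
    """Step 4: number of generations between a node and the root."""
    d = 0
    while parent.get(node) is not None:
        node = parent[node]
        d += 1
    return d


def longest_shared_run(a, b):
    """Stand-in for bl2seq + SimpleAlign (an assumption, not the real method)."""
    m = SequenceMatcher(None, a, b, autojunk=False)
    match = m.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def collapse(seqs, parent, min_length=4):
    """Step 5: visit internal nodes deepest first, merging their children."""
    seqs = dict(seqs)  # work on a copy of the sequence hash
    output = []
    children = {}
    for child, par in parent.items():
        if par is not None:
            children.setdefault(par, []).append(child)
    for node in sorted(children, key=lambda n: depth(n, parent), reverse=True):
        present = [seqs[k] for k in children[node] if seqs.get(k)]
        if len(present) < len(children[node]):
            output.extend(present)  # 5b: a sibling is blank, emit the other
            continue
        cons = present[0]
        for other in present[1:]:
            cons = longest_shared_run(cons, other)  # 5c and 5d
        if len(cons) < min_length:
            output.append(cons)  # 5d, as literally described
        seqs[node] = cons  # 5e: ancestor receives the consensus
    return seqs, output
```

Processing the deepest nodes first guarantees that every ancestor already has its children's consensus available when its own turn comes, which is why the nodes are sorted by distance from the root before the loop.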

Verbal description of program:

Stage 2 (Not written yet)
- Use a 'sliding window' approach to test sub-sequences (putative probes) of each full sequence in the output file.
- Use BLAST to find all full sequences that the putative probe hits, with particular similarity and length parameters.
- Find the PD (phylogenetic diversity, the sum of branch lengths) of the full sequences that were hit.
- From each sequence, use the 1 or 2 putative probes whose hits give the maximum PD.
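
Since Stage 2 is not written yet, the following is only a rough Python sketch of how the steps could fit together, under several assumptions. The BLAST search is replaced by a placeholder that counts exact substring matches. PD is taken as the total length of the branches on the paths from the hit tips to the root, which is one possible reading of "sum of branch lengths". The tree is again a hypothetical child-to-parent dictionary, with a second dictionary of branch lengths. All names and parameters are illustrative.

```python
def windows(seq, size, step=1):
    """Yield (start, subsequence) for every full-length window of seq."""
    for start in range(0, len(seq) - size + 1, step):
        yield start, seq[start:start + size]


def hits(probe, seqs):
    """Placeholder for the BLAST search: ids whose sequence contains the probe."""
    return {sid for sid, s in seqs.items() if probe in s}


def phylo_diversity(tips, parent, branch_length):
    """Sum branch lengths over the union of the tip-to-root paths."""
    seen = set()
    for node in tips:
        # Climb towards the root, stopping early on branches already counted.
        while parent.get(node) is not None and node not in seen:
            seen.add(node)
            node = parent[node]
    return sum(branch_length[n] for n in sorted(seen))


def best_probes(seq, seqs, parent, branch_length, size, keep=2):
    """Return up to `keep` windows of seq whose hits reach the maximum PD."""
    scored = [
        (phylo_diversity(hits(w, seqs), parent, branch_length), start, w)
        for start, w in windows(seq, size)
    ]
    if not scored:
        return []
    top = max(pd for pd, _, _ in scored)
    return [(start, w, pd) for pd, start, w in scored if pd == top][:keep]
```

A real implementation would swap `hits` for a BLAST call with the chosen similarity and length cut-offs; the rest of the scoring would stay the same.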

@@ Line 1: / Line 1: @@
-project on oakley-dev in PIA user
+project on galaxy-dev in PIA user
+opsinomics.pl
 Program flow:
-*1. Creates a BLAST+ database using sequence.fasta
+*1. Reads in sequence.fasta sequence by sequence
-*2. Reads in sequence.fasta sequence by sequence
+**1b. creates a hash of all sequence ids to their corresponding sequences
-**2b. creates a hash of all sequence ids to false
+*2. Reads in tree.tre
-*3. Reads through the sequence.fasta file again, this time running blastn on each sequence
+*3. Reroots the tree on the midpoint
-**3b. For each hit, set the hash with the display_id as key to true
+*4. Gets all the nodes sorted by how many generations they are from the root and adds them to a nodes array
+*5. Goes through each node in reverse order (starting farthest from the root)
+**5b. Retrieves the node's sibling(s)
+*** If one of the siblings is blank, writes the other's sequence to the output file
+**5c. Uses blastn's bl2seq to compare the two sequences
+**5d. Uses SimpleAlign to get the best consensus string
+*** If the consensus is smaller than $MINLENGTH, writes it to the output file
+**5e. Updates the sequence hash, writing the consensus string to the ancestor node
 Verbal description of program: