Step 1 - Separating Multiplexed files into Libraries
Barcode Splitting Workflow Introduction
The workflow can be found at the following link: [1]
After clicking on the link, you can import the workflow into your own Galaxy account in order to use it and/or edit it as required.
The Illumina sequencing center at UC Davis delivers two multiplexed FASTQ format files (among thousands of others).
These two files result from the paired-end sequencing run and are named S_6_1 (left-hand reads) and S_6_3 (right-hand reads). They contain all the data from all the species/samples that have been pooled (multiplexed) into one lane.
They will form the input for our workflow.
The read whose identifier ends in /1 will be used as the left-hand read and the read whose identifier ends in /2 will be used as the right-hand read (see example below).
Fastq Sanger format Left-hand Read:
@HWI-EAS91_1_30788AAXX:7:21:1542:1758/1
GTCAATTGTACTGGTCAATACTAAAAGAATAGGATC
+HWI-EAS91_1_30788AAXX:7:21:1542:1758/1
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
Fastq Sanger format Right-hand Read:
@HWI-EAS91_1_30788AAXX:7:21:1542:1758/2
GCTCCTAGCATCTGGAGTCTCTATCACCTGAGCCCA
+HWI-EAS91_1_30788AAXX:7:21:1542:1758/2
hhhhhhhhhhhhhhhhhhhhhhhh`hfhhVZSWehR
Currently, our output from UC Davis for right-hand reads contains a /3 instead of the standard /2, so you will have to do a simple search and replace to correct this. More on this below.
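The search and replace can be sketched in Python as follows. This is only an illustration, assuming strict four-line FASTQ records in which the identifier lines (the first and third line of each record) end in /3. The function names fix_mate_suffix and fix_fastq_lines are hypothetical and are not part of the Galaxy workflow, which uses its own Find and Replace tool.

```python
def fix_mate_suffix(header: str) -> str:
    """Return the header with a trailing /3 changed to /2."""
    if header.endswith("/3"):
        return header[:-2] + "/2"
    return header


def fix_fastq_lines(lines):
    """Yield FASTQ lines with /3 replaced by /2 on identifier lines only.

    Assumes strict four-line records: @id, sequence, +id, quality.
    """
    for index, line in enumerate(lines):
        text = line.rstrip("\n")
        # Lines 0 and 2 of each record carry the read identifier.
        if index % 4 in (0, 2):
            text = fix_mate_suffix(text)
        yield text


# The right-hand read from the example, as it would arrive with /3.
record = [
    "@HWI-EAS91_1_30788AAXX:7:21:1542:1758/3",
    "GCTCCTAGCATCTGGAGTCTCTATCACCTGAGCCCA",
    "+HWI-EAS91_1_30788AAXX:7:21:1542:1758/3",
    "hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh",
]
fixed = list(fix_fastq_lines(record))
print(fixed[0])
```

Restricting the replacement to identifier lines avoids accidentally changing a sequence or quality line that happens to end in the same two characters.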
Step by Step
Prior to beginning the workflow in Galaxy, you need to upload the paired-end FASTQ files to a history in Galaxy by using the Upload File tool under the Get Data tab. This process will take a while, but can be accelerated if the files are placed in a folder on the macroevolution server [2].
You also have to upload a text file with the names of the respective libraries and the adapter sequences that identify them.
Example:
Dicy GAGCAAT
ScA1 TTGCGAT
LrA3 ACTAGCT
MA1 TGCAACT
MA3 GCATAGT
MA4 CATTCGT
MA5 ATGGCTT
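Before uploading, it can be worth checking that the barcode file is well formed. The sketch below assumes one library per line, with the name and the adapter sequence separated by whitespace (the real file may use tabs). parse_barcodes is a hypothetical helper, not a Galaxy function.

```python
def parse_barcodes(text: str) -> dict:
    """Map each adapter (barcode) sequence to its library name."""
    barcodes = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'name sequence'")
        name, sequence = fields[0], fields[1].upper()
        if set(sequence) - set("ACGT"):
            raise ValueError(f"line {line_number}: non-ACGT character in {sequence}")
        if sequence in barcodes:
            raise ValueError(f"line {line_number}: duplicate barcode {sequence}")
        barcodes[sequence] = name
    return barcodes


example = """Dicy GAGCAAT
ScA1 TTGCGAT
LrA3 ACTAGCT
MA1 TGCAACT
MA3 GCATAGT
MA4 CATTCGT
MA5 ATGGCTT
"""
barcodes = parse_barcodes(example)
print(len(barcodes), "libraries;", "GAGCAAT ->", barcodes["GAGCAAT"])
```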
In order to run a workflow, go to a history with the data you would like to process, then click on the workflow tab. This will take you to a list of all your workflows. Click on the one you want and select run.
Steps 1 & 2
The first steps of any workflow involve single or multiple input datasets.
For barcode splitting, we start with two input datasets: the Step 1 input dataset for the left-hand fastq file and the Step 2 input dataset for the right-hand fastq file.
The right-hand fastq file needs to be corrected from /3 to /2. This will occur automatically with the Find and Replace tool.
Steps 3 & 4
Here you need to select the text file with the adapter sequences.
Steps 5 & 6
Here FASTQ Groomer will convert the files to fastqsanger format, with Illumina 1.3+ selected as the quality score type of the input.
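For orientation, Illumina 1.3+ quality strings store each Phred score as the character with ASCII code score + 64, whereas the Sanger format uses score + 33, so the re-encoding amounts to shifting every quality character down by 31. The following is a minimal sketch of that arithmetic only, not the Groomer's actual code; illumina13_to_sanger is a hypothetical name.

```python
ILLUMINA13_OFFSET = 64  # Phred+64, used by Illumina 1.3+
SANGER_OFFSET = 33      # Phred+33, used by Sanger FASTQ


def illumina13_to_sanger(quality: str) -> str:
    """Re-encode an Illumina 1.3+ quality string as a Sanger quality string."""
    converted = []
    for char in quality:
        score = ord(char) - ILLUMINA13_OFFSET
        if score < 0:
            raise ValueError(f"{char!r} is not a valid Illumina 1.3+ quality character")
        converted.append(chr(score + SANGER_OFFSET))
    return "".join(converted)


# 'h' encodes a Phred score of 40 in Illumina 1.3+ and becomes 'I' in Sanger.
print(illumina13_to_sanger("hfhhVZSWehR"))
```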
Step 7
The final step of this workflow splits the multiplexed Illumina file into multiple files using the adapter sequences (barcodes). For each barcode, a new fastq file is created. The output is an HTML table displaying the split counts and file locations. The files can be downloaded, or the links can be used to upload them into Galaxy.
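The idea behind the split can be sketched as below. This is a simplified illustration under stated assumptions: the barcode sits at the very start of the read, only exact matches count, and reads without a matching barcode go to a group called "unmatched". The Galaxy tool may behave differently (for example by tolerating mismatches), and split_by_barcode and the toy reads are invented for this example.

```python
from collections import defaultdict


def split_by_barcode(records, barcodes):
    """Group FASTQ records by the barcode found at the start of each read.

    records:  iterable of (header, sequence, plus, quality) tuples
    barcodes: dict mapping barcode sequence -> library name
    """
    groups = defaultdict(list)
    for header, sequence, plus, quality in records:
        library = "unmatched"
        for barcode, name in barcodes.items():
            if sequence.startswith(barcode):  # exact match at the read start
                library = name
                break
        groups[library].append((header, sequence, plus, quality))
    return dict(groups)


barcodes = {"GAGCAAT": "Dicy", "TTGCGAT": "ScA1"}
# Toy reads made up for illustration; they are not real sequencing data.
records = [
    ("@toy_read_1/1", "GAGCAATACGTACGT", "+", "I" * 15),
    ("@toy_read_2/1", "TTGCGATGGGCCCAA", "+", "I" * 15),
    ("@toy_read_3/1", "CCCCCCCACGTACGT", "+", "I" * 15),
]
groups = split_by_barcode(records, barcodes)
counts = {name: len(reads) for name, reads in groups.items()}
print(counts)
```

The counts dictionary plays the role of the split-count table: one entry per library plus one for reads that matched no barcode.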