Uploading WI FASTQ sequence data to SRA¶

For each CENDR release, it is important to also upload the FASTQ files to NCBI's Sequence Read Archive (SRA). If a bioproject already exists, you can create a new submission and link to the previous bioproject. If there is no previous bioproject, you can create a new bioproject and add all relevant data. See below for more instructions.

SRA submission¶

Begin submission

In the SRA submission portal, click the button for "New submission" and follow the prompts.
Remember to add the previous bioproject ID if applicable to link this submission to previous submissions.
Select "Model organism or animal" for biosample type, select "upload file using excel" and download the template

Create biosample sheet (an example can be found below)

An easy starting point here is the sample sheet used for alignment-nf. You will keep the id as the sample name (unique identifier) and strain (note strain also needs to be unique - if there are multiple library preps for the same strain, please append "-2" etc. to the strain)
To these two columns, you can add bioproject, organism, developmental stage, sex, and tissue (same across all strains)
Finally, join this data with the WI species master sheet to get collected by, collection date, and latitude/longitude. - Note: latitude/longitude need to be converted into one shared column in the format "34.89 S 56.15 W" (+ refers to North and East)
Copy the data into the relevent columns in the template, save, and upload to the submission portal

Create SRA metadata sheet

Again, select "upload a file using excel" and download the template
An easy starting point here is, again, the sample sheet used for alignment-nf. You will keep the id as the sample name and the lb as library id. You will also keep fq1 and fq2 for filename and filename2.
You will then add the rest of the columns as shown below. Note: the formatting is very specific for this sheet. The title can be found on the bioproject page. The instrument_model can be found by using the "sequencing_folder" (not shown, but part of the original sample sheet) and looking up the instrument that folder was run on in the Sequencing Runs google sheet (here)
Copy and paste the rows from this file into the template to check for correct formatting. Then save the tab as a tsv and upload to the submission portal.

Pre-upload FASTQ files using FTP

Create a list of files to upload to the FTP server by combining the filename1 and filename2 from the SRA metadata sheet (above).
Begin submission by creating an NCBI account (or signing in -- personal account). Then follow the link to the SRA submission portal
Follow the instructions under the "FTP upload":

# establish FTP connection from terminal (on QUEST!)
# ftp <address>
ftp ftp-private.ncbi.nlm.nih.gov

# navigate to your account folder (i.e.)
cd uploads/kathrynevans2015_u.northwestern.edu_YSlKSXQ4

# create new folder for submission (i.e.)
mkdir 20210121_submission

# exit FTP connection
exit

# back on quest, run the following line to transfer every file with path listed in "files_to_upload.tsv" to that folder
# make sure to change your upload folder and files to upload
module load parallel
parallel --verbose lftp -e \"put -O /uploads/kathrynevans2015_u.northwestern.edu_YSlKSXQ4/20210121_submission {}\; bye\" -u subftp,w4pYB9VQ ftp-private.ncbi.nlm.nih.gov < files_to_upload.tsv

Note: it is important that this step is completely finished before you complete your SRA submission

Complete submission

When finished, in the SRA portal you will be asked to select which folder you want to pull files from
Review and submit! If there are any issues they will let you know.

List of current bioprojects associated with the Andersen Lab¶

C. elegans WI genome FASTQ - PRJNA549503 (link here)