Backup data¶
Back up files to Google Cloud¶
1. Download and set up the Google Cloud SDK (only needs to be done once)
# download from google
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-sdk-325.0.0-linux-x86_64.tar.gz
# un-tar
tar -xf google-cloud-sdk-325.0.0-linux-x86_64.tar.gz
# add to path
./google-cloud-sdk/install.sh
# initialize and authenticate
gcloud init
# follow prompts
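Before moving on, it can help to confirm that the SDK tools are actually on your PATH. The snippet below is a minimal sketch, not part of the original setup; `have_cmd` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: succeed if the named command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Check the two tools used in the rest of this page.
for tool in gcloud gsutil; do
    if have_cmd "$tool"; then
        echo "$tool found: $(command -v "$tool")"
    else
        echo "$tool not found; try opening a new shell or re-running ./google-cloud-sdk/install.sh"
    fi
done
```

If either tool is reported as missing right after installation, the PATH change made by the installer has most likely not been picked up by the current shell yet.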
2. Add files from the QUEST folder to a Google Cloud bucket
You can see the structure of the Google Cloud buckets by going here. Most things are currently in the elegansvariation.org bucket.
# copy all files in current directory to specified bucket
gsutil cp * gs://<YOUR_BUCKET_NAME>
# copy all files in parallel (good for multiple files)
gsutil -m cp * gs://<YOUR_BUCKET_NAME>
# view list of files in bucket
gsutil ls gs://<YOUR_BUCKET_NAME>
# example - copy all files in the current directory (e.g. the vcf files) to the variation folder under the 20210121 release
gsutil -m cp * gs://elegansvariation.org/releases/20210121/variation
Note
We store .bam and .vcf files on Google Cloud because they are used by CeNDR. The .bam files are not separated by release, but the .vcf files (and accompanying files for a CeNDR release) are.
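Because the destination path for a release is typed by hand, a small helper can reduce the chance of a typo in the bucket name or release date. This is only a sketch: `release_url` is a hypothetical function, and it assumes the `releases/<date>/<folder>` layout shown in the example above.

```shell
# Hypothetical helper (not part of the original workflow):
# build the gs:// URL for a folder inside a release.
# Usage: release_url <bucket> <release_date> <folder>
release_url() {
    printf 'gs://%s/releases/%s/%s/\n' "$1" "$2" "$3"
}

dest=$(release_url elegansvariation.org 20210121 variation)
echo "Destination: $dest"

# After checking the printed destination, uncomment to upload and list:
# gsutil -m cp -n * "$dest"   # -n skips files that already exist in the bucket
# gsutil ls -l "$dest"        # list uploaded objects with their sizes
```

Printing the destination first and keeping the upload commented out is a deliberate choice, since a copy to a mistyped path is harder to clean up than a mistyped `echo`.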
Local Backup¶
We also store all .fastq files locally in duplicate. If anything ever happens to any downstream data we can always recreate it with the original FASTQ files. This step is extremely important.
A list of all data and which hard drive it is backed up on can be found here. The main copies of all data can be found on Pangolin, Armadillo, Raven, Karasu, and Turkey.
- Choose a hard drive that has enough space for the data you need to back up and plug it into your computer
- Change directories on your local computer into the hard drive
cd /Volumes/{name_of_harddrive}/{path}
- Sync data from QUEST with
rsync -avh <netid>@quest.northwestern.edu:<path_to_folder_quest> .
Don't forget the trailing '.' (it is the destination: the current directory).
- When finished, check that the total file size is the same on the hard drive and on QUEST. You can always re-run the rsync command to confirm that the transfer is complete.
- Repeat with the second hard drive
- Update the data backup Google Sheet with your new backup.
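The size check in the steps above can be sketched as follows. `summarize_dir` is a hypothetical helper introduced here; it sums file sizes in bytes rather than relying on `du`, on the assumption that disk usage reported by `du` can differ between the hard drive and QUEST file systems even when the files are identical.

```shell
# Hypothetical helper: print "<file count> <total bytes>" for a directory.
summarize_dir() {
    # wc -c prints "<bytes> <path>" per file, plus "total" lines that are skipped.
    find "$1" -type f -exec wc -c {} + |
        awk '$2 != "total" {n++; b += $1} END {printf "%d %.0f\n", n, b}'
}

# Example (run inside the backup folder on the hard drive):
# summarize_dir .
#
# Run the same function on QUEST inside the copied folder and compare the
# two lines of output. A dry run (-n) also lists anything still missing:
# rsync -avhn <netid>@quest.northwestern.edu:<path_to_folder_quest> .
```

If the two summaries match and the dry run lists no files to transfer, the copy is very likely complete.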
Adding FASTQ to SRA project¶
Check out this guide for how to upload data to SRA. This should be done with wild isolate FASTQ files after each new CeNDR release. The SRA submission ID might need to be cited in publications.