CeNDR¶
CeNDR User privileges¶
To view or edit a user account, navigate to the 'Admin -> Users' menu. You can also promote existing users to 'Admin' through this form.
Updating Staff/Collaborator/Committee Profiles¶
To modify the personal profiles of individuals associated with the project on the 'Staff', 'Scientific Advisory Committee', and 'Collaborators' pages:
- Admin -> Profile Pages
From there you can create, modify, or delete user profiles and select on which page the profile should be published.
Updating Publications¶
The publications page (/about/publications) is generated from a Google spreadsheet. The spreadsheet can be accessed here or through the 'Admin' menu. You can request access to edit the spreadsheet by visiting that link.
The last row of the spreadsheet contains a function that fetches publication data from PubMed using its API. Fill in column A with the PMID (PubMed Identifier), and the publication data will be fetched.
Once you have retrieved the latest PubMed data, create a new row and copy/paste the values for any new publications so that they are stored as plain values and no longer fetched from the PubMed API.
Alternatively, you can fill in the details for a publication manually. In either case, double-check any details you add. Changes should be instant, but there may be some delay before they appear on the CeNDR website.
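For cross-checking a row by hand, publication metadata for a PMID can also be looked up outside the spreadsheet. The sketch below assumes NCBI's E-utilities `esummary` endpoint with JSON output; it is not the spreadsheet's own function, and the helper names (`pubmed_summary_url`, `parse_pubmed_summary`, `fetch_pubmed_summary`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# NCBI E-utilities summary endpoint (assumed here; the spreadsheet may use a different call).
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def pubmed_summary_url(pmid):
    """Build the esummary URL for a single PMID."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": str(pmid), "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"


def parse_pubmed_summary(payload, pmid):
    """Extract a few fields from an esummary JSON payload that was decoded to a dict."""
    record = payload["result"][str(pmid)]
    return {
        "pmid": str(pmid),
        "title": record.get("title", ""),
        "journal": record.get("source", ""),
        "pub_date": record.get("pubdate", ""),
        "authors": [author.get("name", "") for author in record.get("authors", [])],
    }


def fetch_pubmed_summary(pmid, timeout=30):
    """Fetch and parse the summary for one PMID (requires network access)."""
    with urllib.request.urlopen(pubmed_summary_url(pmid), timeout=timeout) as response:
        payload = json.load(response)
    return parse_pubmed_summary(payload, pmid)
```

Calling `fetch_pubmed_summary(pmid)` returns a small dictionary that can be compared against the values pasted into the spreadsheet.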
Updating Site Tools¶
To change the version of the container that CeNDR uses for a tool:
- Admin -> Tool Versions
The container versions are populated from the andersenlab docker hub repository. Updating the selected container version tag will switch the version that CeNDR uses for all future operations.
Creating a new release¶
Before a new release is possible, you must first complete the following tasks (see Pipeline Overview for details):
- Add new wild isolate sequence data, and process with the trim-fq-nf pipeline.
- Align reads with alignment-nf.
- Call variants with wi-gatk.
- Identify new isotypes using concordance-nf.
- Update the C. elegans WI Strain Info spreadsheet with the new isotypes and update the release column to reflect the release date.
- Perform population genetic analysis with post-gatk-nf.
- Impute the VCF.
- Annotate the VCF with the annotation-nf pipeline.
Pushing a new release requires a series of steps described below.
Uploading BAMs to Google Storage¶
You will need Google Cloud credentials to upload BAMs to Google Storage.
Install the Google Cloud SDK and configure it with your GCP credentials and the caendr project ID.
gcloud init
Once configured, navigate to the BAM location on b1059.
# CD to bams folder...
cd /projects/b1059/data/c_elegans/WI/alignments/
Run this command in screen so that it completes even if your session disconnects (it will take a while):
gsutil rsync . gs://caendr-site-private-bucket/bam/c_elegans/
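Because the bucket is expected to hold both a BAM and a BAI file for each strain, it can help to confirm that every BAM has an index before starting a long sync. The following is a minimal sketch, assuming indexes are named `<name>.bam.bai` and sit next to the BAM; `bams_missing_index` is a hypothetical helper, not part of CeNDR.

```python
from pathlib import Path


def bams_missing_index(directory):
    """Return the names of .bam files in `directory` that lack a matching .bam.bai file."""
    folder = Path(directory)
    missing = []
    for bam in sorted(folder.glob("*.bam")):
        # The index is expected alongside the BAM, e.g. AB1.bam -> AB1.bam.bai
        index = bam.with_name(bam.name + ".bai")
        if not index.exists():
            missing.append(bam.name)
    return missing


if __name__ == "__main__":
    # Check the current directory, i.e. the alignments folder you have cd'ed into.
    for name in bams_missing_index("."):
        print(f"missing index for {name}")
```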
Adding isotype images¶
Isolation photos are initially prepared on Dropbox and are located in the folder here:
~/Dropbox/Andersenlab/Reagents/WormReagents/isolation_photos/c_elegans
Each file should be named using the strain name in the following format:
<strain>.jpg
Upload the image files to Google Storage. Thumbnails for the images will be automatically generated by a cloud function that monitors the photos bucket. Images should be uploaded to:
gs://caendr-photos-bucket/c_elegans
You can drag/drop the photos using the web-based browser or use gsutil:
# First cd to the appropriate directory
cd ~/Dropbox/Andersenlab/Reagents/WormReagents/isolation_photos/c_elegans
gsutil rsync -x ".DS_Store" . gs://caendr-photos-bucket/c_elegans
Uploading Release Data to Google Storage¶
When you run the wi-gatk pipeline, it will create a folder with the format WI-YYYYMMDD. These data are output in a format that CeNDR can read as a release. You must upload the WI-YYYYMMDD folder to Google Storage with a command that looks like this:
Note
Data from several different pipelines are combined to form a CeNDR data release. Check the bottom of the page of each pipeline to see what data will be incorporated.
# first cd into the folder you want to upload
gsutil rsync . gs://caendr-site-public-bucket/dataset_release/c_elegans/20210121/
Important
Use rsync to copy the files up to Google Storage. Note that the WI- prefix has been dropped from the YYYYMMDD declaration.
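The prefix handling described above can be sketched as a small helper that turns a pipeline folder name into the upload destination. The function names are hypothetical; the bucket path is the one used in the example command.

```python
import re

# Destination prefix taken from the gsutil example in this section.
RELEASE_BUCKET_PREFIX = "gs://caendr-site-public-bucket/dataset_release/c_elegans"


def release_version(folder_name):
    """Strip the 'WI-' prefix from a pipeline output folder name such as 'WI-20210121'."""
    match = re.fullmatch(r"WI-(\d{8})", folder_name)
    if match is None:
        raise ValueError(f"expected a folder named WI-YYYYMMDD, got {folder_name!r}")
    return match.group(1)


def release_destination(folder_name):
    """Return the Google Storage destination for a release folder."""
    return f"{RELEASE_BUCKET_PREFIX}/{release_version(folder_name)}/"
```

For example, `release_destination("WI-20210121")` gives the destination used in the rsync command above.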
There are two expected filename and directory structures for dataset releases: V1 (the legacy format) and V2 (the current format). You will need to select the appropriate release format when adding a new dataset release through the admin panel. These directories may contain additional data not listed here, but these are the files that are referenced by CeNDR. Substitute [RELEASE_VERSION] in each filename with the version number for the dataset release (for example, 20210121).
V2 Structure:
- [RELEASE_VERSION]/alignment_report.html
- [RELEASE_VERSION]/concordance_report.html
- [RELEASE_VERSION]/divergent_regions_strain.[RELEASE_VERSION].bed
- [RELEASE_VERSION]/divergent_regions_all.[RELEASE_VERSION].bed
- [RELEASE_VERSION]/gatk_report.html
- [RELEASE_VERSION]/haplotype/haplotype.png
- [RELEASE_VERSION]/haplotype/haplotype.pdf
- [RELEASE_VERSION]/haplotype/sweep.pdf
- [RELEASE_VERSION]/haplotype/sweep_summary.tsv
- [RELEASE_VERSION]/methods.md
- [RELEASE_VERSION]/release_notes.md
- [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.min4.tree
- [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.min4.tree.pdf
- [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.isotype.min4.tree
- [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.isotype.min4.tree.pdf
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz.tbi
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.isotype.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.isotype.vcf.gz.tbi
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz.tbi
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.isotype.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.isotype.vcf.gz.tbi
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.isotype.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.isotype.vcf.gz.tbi
V1 Structure:
- [RELEASE_VERSION]/methods.md
- [RELEASE_VERSION]/haplotype/haplotype.png
- [RELEASE_VERSION]/haplotype/haplotype.thumb.png
- [RELEASE_VERSION]/popgen/tajima_d.png
- [RELEASE_VERSION]/popgen/tajima_d.thumb.png
- [RELEASE_VERSION]/popgen/trees/genome.svg
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz
- [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.vcf.gz
- [RELEASE_VERSION]/multiqc_bcftools_stats.json
- [RELEASE_VERSION]/popgen/trees/genome.pdf
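Before uploading, the V2 list above can be checked against a local copy of the release. This is a minimal sketch, assuming the release sits in a local folder named after the version (YYYYMMDD); `expected_v2_files` and `missing_v2_files` are hypothetical helpers that mirror the V2 structure listed in this section.

```python
from pathlib import Path

# Paths relative to the release folder; "{v}" stands for [RELEASE_VERSION].
V2_TEMPLATES = [
    "alignment_report.html",
    "concordance_report.html",
    "divergent_regions_strain.{v}.bed",
    "divergent_regions_all.{v}.bed",
    "gatk_report.html",
    "haplotype/haplotype.png",
    "haplotype/haplotype.pdf",
    "haplotype/sweep.pdf",
    "haplotype/sweep_summary.tsv",
    "methods.md",
    "release_notes.md",
    "tree/WI.{v}.hard-filter.min4.tree",
    "tree/WI.{v}.hard-filter.min4.tree.pdf",
    "tree/WI.{v}.hard-filter.isotype.min4.tree",
    "tree/WI.{v}.hard-filter.isotype.min4.tree.pdf",
    "variation/WI.{v}.soft-filter.vcf.gz",
    "variation/WI.{v}.soft-filter.vcf.gz.tbi",
    "variation/WI.{v}.soft-filter.isotype.vcf.gz",
    "variation/WI.{v}.soft-filter.isotype.vcf.gz.tbi",
    "variation/WI.{v}.hard-filter.vcf.gz",
    "variation/WI.{v}.hard-filter.vcf.gz.tbi",
    "variation/WI.{v}.hard-filter.isotype.vcf.gz",
    "variation/WI.{v}.hard-filter.isotype.vcf.gz.tbi",
    "variation/WI.{v}.impute.isotype.vcf.gz",
    "variation/WI.{v}.impute.isotype.vcf.gz.tbi",
]


def expected_v2_files(version):
    """Return the V2 file paths for a release, prefixed with the version folder."""
    return [f"{version}/" + template.format(v=version) for template in V2_TEMPLATES]


def missing_v2_files(parent_dir, version):
    """List expected V2 files that are absent under `parent_dir`/`version`."""
    parent = Path(parent_dir)
    return [path for path in expected_v2_files(version) if not (parent / path).is_file()]
```

An empty result from `missing_v2_files` means every file that CeNDR references for a V2 release is present locally; additional files in the folder are ignored.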
Adding the release to the CeNDR website¶
To publish the release data on the CeNDR website, you must be logged into the CeNDR website as an Admin user. If you do not see the 'Admin' menu in the navbar menu of the site, request administrative privileges for your CeNDR user account.
Perform the following steps in order:
1. Update the site's internal database of strains (~5 minutes).
This data is populated from the C. elegans WI Strain Info Google Sheet linked in the 'Admin' menu. Confirm that this data is accurate and formatted correctly before beginning the import operation. While the table is being updated, some portions of the site may have errors or display incorrect data. The status of the table update operation is shown on the Database Operations Admin page, but that status only reflects the success of the database operation; you should also verify the correctness of the updated data. The 'strain' table is used as the data source for the 'Strain Catalog', 'Isotype List', 'Strain Map', 'Strain Issues', etc.
- Admin -> Database Operations
- Click the 'New Operation' button
- Select 'Rebuild strain table from Google Sheet'
- Add any (optional) notes about why the operation is being run
- Click 'Start'
2. (Optional) If the WormBase version used in the release is different from the previous release, the gene data must also be loaded into the site's internal database (~15 minutes). Gene data is compiled from several external databases and used to look up chromosome:start-stop intervals using WormBase Gene IDs, gene names, Homologenes from other species, etc.:
- Admin -> Database Operations
- Click the 'New Operation' button
- Select 'Rebuild wormbase gene table from external sources'
- Enter the WormBase version number to use (e.g., 280 to use version WS280)
- Click 'Start'
3. Finally, you can publish the release files on the site:
- Admin -> Dataset Releases
- Click 'Create Release'
- Enter the RELEASE_VERSION of the release that was uploaded to Google Storage in the previous section (format: YYYYMMDD)
- Enter the WORMBASE_VERSION that the release uses (e.g., 280 to use version WS280)
- Leave the Report Type as 'V2' for new releases (if you are adding a legacy format release created before 20200101, use V1)
- Click 'Save'
The release should now show as a new tab on the 'Data -> Genomic Data' page.
Adding Strain Variant Annotation Data¶
Strain Variant Annotation data must first be gzipped and uploaded to Google Storage. The CSV should be named with the pattern WI.strain-annotation.bcsq.[VERSION_NUMBER].csv. Substitute [VERSION_NUMBER] in the filename with the version number for the dataset release in the form YYYYMMDD (for example, 20210401).
# First cd to the appropriate directory
gzip WI.strain-annotation.bcsq.[VERSION_NUMBER].csv
gsutil cp WI.strain-annotation.bcsq.[VERSION_NUMBER].csv.gz gs://caendr-db-bucket/strain_variant_annotation/c_elegans/WI.strain-annotation.bcsq.[VERSION_NUMBER].csv.gz
You can also upload the gzipped file directly through the Google Cloud Console to this location:
gs://caendr-db-bucket/strain_variant_annotation/c_elegans/
Then update the CeNDR tool through the 'Admin' portal (this operation may take a long time to complete, roughly 24 hours):
- Admin -> Database Operations
- Click the 'New Operation' button
- Select 'Rebuild Strain Annotated Variant table from .csv.gz file'
- Enter the [VERSION_NUMBER] to use (e.g., 20210401)
- Click 'Start'
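The naming and gzip steps for the Strain Variant Annotation file can be sketched as follows. The helper names are hypothetical; the filename pattern and bucket path are the ones given in this section. Unlike the `gzip` command, this sketch keeps the original CSV in place.

```python
import gzip
import shutil
from pathlib import Path

# Upload location given in this section.
ANNOTATION_PREFIX = "gs://caendr-db-bucket/strain_variant_annotation/c_elegans"


def annotation_filename(version):
    """Return the expected CSV name for a version number such as '20210401'."""
    return f"WI.strain-annotation.bcsq.{version}.csv"


def gzip_annotation(directory, version):
    """Write a .csv.gz copy next to the CSV and return the path of the gzipped file."""
    source = Path(directory) / annotation_filename(version)
    target = source.with_name(source.name + ".gz")
    with open(source, "rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


def annotation_destination(version):
    """Return the Google Storage URI the gzipped file should be copied to."""
    return f"{ANNOTATION_PREFIX}/{annotation_filename(version)}.gz"
```

The version string passed to these helpers is the same value that is entered as [VERSION_NUMBER] in the database operation form.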
Google Storage Details¶
caendr-photos-bucket¶
This bucket contains photos of the environment where each strain was isolated.
Photos follow the naming pattern:
'gs://
Examples: 'gs://caendr-photos-bucket/c_elegans/ECA1217.jpg' 'gs://caendr-photos-bucket/c_elegans/WN2082.jpg'
When an image is uploaded to the bucket, a scaled down thumbnail will automatically be generated with the naming pattern:
'
Examples: 'gs://caendr-photos-bucket/c_elegans/ECA1217.thumb.jpg' 'gs://caendr-photos-bucket/c_elegans/WN2082.thumb.jpg'
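The relationship between a photo and its thumbnail can be expressed as a small helper. This is a sketch inferred only from the examples above (`<name>.jpg` becomes `<name>.thumb.jpg`); `photo_uri` and `thumbnail_uri` are hypothetical names, and the actual cloud function may handle other cases differently.

```python
PHOTOS_PREFIX = "gs://caendr-photos-bucket/c_elegans"


def photo_uri(strain):
    """Return the photo URI for a strain, following the examples in this section."""
    return f"{PHOTOS_PREFIX}/{strain}.jpg"


def thumbnail_uri(photo):
    """Derive the thumbnail URI for a photo URI, e.g. X.jpg -> X.thumb.jpg."""
    suffix = ".jpg"
    if not photo.endswith(suffix):
        raise ValueError(f"expected a .jpg photo, got {photo!r}")
    return photo[: -len(suffix)] + ".thumb.jpg"
```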
caendr-db-bucket¶
This bucket is used for storing local backups of external databases (e.g., WormBase) and for backing up CeNDR's internal database. Strain Variant Annotation data must be manually uploaded to the bucket before you can update the 'Strain Variant Annotation' table in the CeNDR database through the 'Admin' portal. The uploaded CSV file must be gzipped. For the example file below, the database operation form would require version '20210401'.
- caendr-db-bucket
    - strain_variant_annotation
        - c_elegans
            - WI.strain-annotation.bcsq.20210401.csv.gz
caendr-main-terraform-state¶
This bucket contains the state information about the cloud infrastructure for CeNDR, as well as a zipped backup of the secret.env file.
caendr-nextflow-work-bucket¶
Work bucket for storing intermediate files between Nextflow stages.
caendr-site-private-bucket¶
This bucket contains any files that may (now or in the future) require custom access permissions and should not necessarily be globally public.
tools - This directory contains any data or file dependencies for an associated tool.
- tools
    - pairwise_indel_primer
        - sv.20200815.bed.gz
        - sv.20200815.bed.gz.tbi
        - sv.20200815.vcf.gz
        - sv.20200815.vcf.gz.csi
        - sv.20200815.vcf.gz.tbi
    - nemascan
        - input_data
            - all_species
            - c_elegans
                - annotations
                - genotypes
                - isotypes
                - phenotypes
            - c_briggsae
                - ...
            - c_tropicalis
                - ...
bam - contains the BAM and BAI files for each sequenced strain of each species. It also contains the 'Download All BAM/BAI Files' script that is automatically generated by CAENDR (bam_bai_signed_download_script.sh)
- bam
    - c_elegans
        - AB1.bam
        - AB1.bam.bai
        - AB4.bam
        - AB4.bam.bai
        - etc...
reports - This directory contains the source data and the output from running the associated tool. The data and results are organized by the container and version (with the exception of the indel primer), and the hash of the tool's input data.
- reports
    - heritability
        - v0.01a
            - 15251d73f683450317089dae6736dee3
                - data.tsv
                - result.tsv
    - indel-primer
        - 00d67eecc8a9ea13b7eb5e615b3a6860
            - input.json
            - result.tsv
    - nemascan-nxf
        - v1.0
            - 00d67eecc8a9ea13b7eb5e615b3a6860
                - data.tsv
                - results
                    - Divergent_and_haplotype
                    - Genotype_Matrix
                    - Mapping
                    - Nextflow
                    - Phenotypes
                    - Plots
                    - Reports
        - v1.0a
            - 3facbf334a99a2cde3b6c4372cbe7da6
                - data.tsv
                - results
                    - Divergent_and_haplotype
                    - Genotype_Matrix
                    - Mapping
                    - Nextflow
                    - Phenotypes
                    - Plots
                    - Reports
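The layout of the reports directory can be sketched as a path-building helper. The 32-character hexadecimal folder names are consistent with an MD5 digest, but the source does not say which hash CeNDR uses, so the use of MD5 here is an assumption; `input_data_hash` and `report_prefix` are hypothetical names.

```python
import hashlib


def input_data_hash(raw_bytes):
    """Hash a tool's input data. MD5 is assumed; CeNDR may use a different function."""
    return hashlib.md5(raw_bytes).hexdigest()


def report_prefix(tool, data_hash, container_version=None):
    """Build the reports/<tool>/[<version>/]<hash> prefix for a tool run.

    `container_version` is omitted for tools that are not organized by
    version (the indel primer in the listing above).
    """
    parts = ["reports", tool]
    if container_version is not None:
        parts.append(container_version)
    parts.append(data_hash)
    return "/".join(parts)
```

Keying the folder on a hash of the input means that identical submissions map to the same location, which is one plausible reason for this layout.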
caendr-site-public-bucket¶
This bucket contains files that have been manually uploaded to the bucket (following an expected naming convention) or created through the 'Admin' portal. The content of several pages on CeNDR depends on these files and their contents.
dataset_release - This is the directory where formal releases of CeNDR data are uploaded
- dataset_release
    - c_elegans
        - 20160408
        - 20170531
        - 20180527
        - 20200815
        - 20210121
            - release_notes.md
            - methods.md
            - alignment_report.html
            - gatk_report.html
            - concordance_report.html
            - divergent_regions_strain.20210121.bed.gz
            - variation
                - WI.20210121.soft-filter.vcf.gz
                - WI.20210121.soft-filter.vcf.gz.tbi
                - WI.20210121.soft-filter.isotype.vcf.gz
                - WI.20210121.soft-filter.isotype.vcf.gz.tbi
                - WI.20210121.hard-filter.vcf.gz
                - WI.20210121.hard-filter.vcf.gz.tbi
                - WI.20210121.hard-filter.isotype.vcf.gz
                - WI.20210121.hard-filter.isotype.vcf.gz.tbi
                - WI.20210121.impute.isotype.vcf.gz
                - WI.20210121.impute.isotype.vcf.gz.tbi
            - tree
                - WI.20210121.hard-filter.min4.tree
                - WI.20210121.hard-filter.min4.tree.pdf
                - WI.20210121.hard-filter.isotype.min4.tree
                - WI.20210121.hard-filter.isotype.min4.tree.pdf
            - haplotype
                - haplotype.png
                - haplotype.pdf
                - sweep.pdf
                - sweep_summary.tsv
profile - User profile photos for Staff, Advisory Committee, etc. are uploaded here when managed using the 'Admin' portal.
- profile
    - photos
        - 0f7394fa0d48431884e673b34f4c4dea.jpg
caendr-site-static-bucket¶
This bucket contains static site resources like images, videos, example data, etc. These files are stored in the CAENDR git repo, but are not accessible from the CeNDR web server. Terraform will automatically upload them to the caendr-site-static-bucket. To reference any of these assets on a page, you can use the Jinja macro ext_asset().
Example:
<img src="{{ ext_asset('img/logo.png') }}">
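To illustrate what a helper like this might resolve to, the sketch below maps an asset path to a public Google Cloud Storage URL. This is an assumption for illustration only: the real `ext_asset()` macro may be implemented differently, and `ext_asset_url` is a hypothetical name. It uses the standard `https://storage.googleapis.com/<bucket>/<object>` URL form, which only works for objects that are publicly readable.

```python
from urllib.parse import quote

STATIC_BUCKET = "caendr-site-static-bucket"


def ext_asset_url(path):
    """Return a public HTTPS URL for an object in the static bucket (illustrative only)."""
    # quote() leaves "/" intact by default, so nested asset paths are preserved.
    return f"https://storage.googleapis.com/{STATIC_BUCKET}/{quote(path.lstrip('/'))}"
```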
caendr-src-bucket¶
This bucket is used by Terraform during the deployment process to store source code before it gets provisioned