CeNDR

CeNDR User privileges

To view or edit a user account, navigate to the 'Admin -> Users' menu. You can also promote existing users to 'Admin' through this form.

Updating Staff/Collaborator/Committee Profiles

To modify the personal profiles of individuals associated with the project on the 'Staff', 'Scientific Advisory Committee', and 'Collaborators' pages:

  • Admin -> Profile Pages

From there you can create, modify, or delete user profiles and select on which page the profile should be published.

Updating Publications

The publications page (/about/publications) is generated from a Google spreadsheet. The spreadsheet can be accessed here or through the 'Admin' menu. You can request access to edit the spreadsheet by visiting that link.

The last row of the spreadsheet contains a function that fetches publication data from PubMed using its API. Fill in column A with the PMID (PubMed Identifier), and the publication data will be fetched.

Once you have retrieved the latest PubMed data, create a new row and copy/paste the values for any new publications so that they are not fetched from the PubMed API again.

Alternatively, you can fill in the details for a publication manually. In either case, double-check any details you add. Changes to the spreadsheet are immediate, but there may be some delay before they appear on the CeNDR website.
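The spreadsheet function itself is not reproduced here. As a rough sketch of the kind of request it might make, the Python below builds an NCBI E-utilities esummary URL for a single PMID. The helper name pubmed_summary_url is hypothetical, and the spreadsheet may use a different endpoint or parameters.

```python
from urllib.parse import urlencode

# NCBI E-utilities summary endpoint (assumption: the spreadsheet may use another).
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def pubmed_summary_url(pmid):
    """Return an esummary URL that requests JSON metadata for one PMID."""
    pmid = str(pmid).strip()
    if not pmid.isdigit():
        raise ValueError(f"PMID must be numeric, got {pmid!r}")
    query = urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"
```

Fetching that URL and comparing the returned title and authors with the spreadsheet row is one way to double-check a fetched entry.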

Updating Site Tools

To change the version of the container that CeNDR uses for a tool:

  • Admin -> Tool Versions

The container versions are populated from the andersenlab docker hub repository. Updating the selected container version tag will switch the version that CeNDR uses for all future operations.

Creating a new release

Before creating a new release, you must complete the following tasks. See Pipeline Overview for details.

  1. Add new wild isolate sequence data, and process with the trim-fq-nf pipeline.
  2. Align reads with alignment-nf.
  3. Call variants with wi-gatk.
  4. Identify new isotypes using concordance-nf.
  5. Update the C. elegans WI Strain Info spreadsheet with the new isotypes and update the release column to reflect the release date.
  6. Perform population genetic analysis with post-gatk-nf.
  7. Impute the VCF.
  8. Annotate the VCF with the annotation-nf pipeline.

Pushing a new release requires a series of steps described below.

Uploading BAMs to Google Storage

You will need Google Cloud credentials to upload BAMs to Google Storage.

Install the Google Cloud SDK and configure with your GCP credentials and the caendr project ID.

gcloud init 

Once configured, navigate to the BAM location on b1059.

# CD to bams folder...
cd /projects/b1059/data/c_elegans/WI/alignments/

Run this command in screen to ensure that it completes (it will take a while):

gsutil rsync . gs://caendr-site-private-bucket/bam/c_elegans/
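Before starting a long upload, it can help to confirm that every BAM has its index. The following is a minimal sketch, assuming each AB1.bam is expected to sit next to an AB1.bam.bai in the same directory; the function name bams_missing_index is hypothetical.

```python
from pathlib import Path


def bams_missing_index(bam_dir):
    """Return names of .bam files in bam_dir that have no matching .bam.bai."""
    missing = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        # The index is assumed to be the BAM file name with ".bai" appended.
        index = bam.with_name(bam.name + ".bai")
        if not index.exists():
            missing.append(bam.name)
    return missing
```

An empty list means every BAM in the directory has an index file next to it.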

Adding isotype images

Isolation photos are initially prepared on dropbox and are located in the folder here:

~/Dropbox/Andersenlab/Reagents/WormReagents/isolation_photos/c_elegans

Each file should be named using the strain name in the following format:

<strain>.jpg

Upload the image files to Google Storage. Thumbnails for the images will be automatically generated by a cloud function that monitors the photos bucket. Images should be uploaded to:

gs://caendr-photos-bucket/c_elegans

You can drag/drop the photos using the web-based browser or use gsutil:

# First cd to the appropriate directory
cd ~/Dropbox/Andersenlab/Reagents/WormReagents/isolation_photos/c_elegans

gsutil rsync -x ".DS_Store" . gs://caendr-photos-bucket/c_elegans
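A quick local check of the photo folder can catch misnamed files before they are synced. This is a sketch only: the pattern below assumes strain names consist of letters and digits (as in ECA1217 or WN2082), which may be too strict, and misnamed_photos is a hypothetical helper.

```python
import re
from pathlib import Path

# Assumed strain-name pattern: letters and digits only, followed by ".jpg".
PHOTO_NAME = re.compile(r"^[A-Za-z0-9]+\.jpg$")


def misnamed_photos(photo_dir):
    """Return names of files in photo_dir that do not look like <strain>.jpg."""
    return sorted(
        p.name
        for p in Path(photo_dir).iterdir()
        if p.is_file() and p.name != ".DS_Store" and not PHOTO_NAME.match(p.name)
    )
```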

Uploading Release Data to Google Storage

When you run the wi-gatk pipeline, it creates a folder named WI-YYYYMMDD. These data are output in a format that CeNDR can read as a release.

Note

Data from several different pipelines are combined to form a CeNDR data release. Check the bottom of the page of each pipeline to see what data will be incorporated.

Upload the WI-YYYYMMDD folder to Google Storage with a command that looks like this:

# First cd into the folder you want to upload
# -r recurses into subdirectories (variation/, tree/, haplotype/)
gsutil rsync -r . gs://caendr-site-public-bucket/dataset_release/c_elegans/20210121/

Important

Use rsync to copy the files to Google Storage. Note that the WI- prefix is dropped from the destination folder name, leaving only YYYYMMDD.
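The prefix handling can be sketched as a small helper that maps a pipeline output folder name to its upload destination. The function name release_destination is hypothetical; the bucket path is the one shown in the command above.

```python
import re

RELEASE_BUCKET = "gs://caendr-site-public-bucket/dataset_release"


def release_destination(folder_name, species="c_elegans"):
    """Map a WI-YYYYMMDD folder name to its destination, dropping the WI- prefix."""
    match = re.fullmatch(r"WI-(\d{8})", folder_name)
    if match is None:
        raise ValueError(f"expected WI-YYYYMMDD, got {folder_name!r}")
    return f"{RELEASE_BUCKET}/{species}/{match.group(1)}/"
```

For example, release_destination("WI-20210121") yields the destination used in the rsync command above.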

There are two expected filename and directory structures for dataset releases: V1 (the legacy format) and V2 (the current format). You will need to select the appropriate release format when adding a new dataset release through the admin panel. These directories may contain additional data not listed here, but these are the files that CeNDR references. Substitute [RELEASE_VERSION] in each filename with the version number for the dataset release (for example, 20210121).

V2 Structure:

  • [RELEASE_VERSION]/alignment_report.html
  • [RELEASE_VERSION]/concordance_report.html
  • [RELEASE_VERSION]/divergent_regions_strain.[RELEASE_VERSION].bed
  • [RELEASE_VERSION]/divergent_regions_all.[RELEASE_VERSION].bed
  • [RELEASE_VERSION]/gatk_report.html
  • [RELEASE_VERSION]/haplotype/haplotype.png
  • [RELEASE_VERSION]/haplotype/haplotype.pdf
  • [RELEASE_VERSION]/haplotype/sweep.pdf
  • [RELEASE_VERSION]/haplotype/sweep_summary.tsv
  • [RELEASE_VERSION]/methods.md
  • [RELEASE_VERSION]/release_notes.md
  • [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.min4.tree
  • [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.min4.tree.pdf
  • [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.isotype.min4.tree
  • [RELEASE_VERSION]/tree/WI.[RELEASE_VERSION].hard-filter.isotype.min4.tree.pdf
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz.tbi
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.isotype.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.isotype.vcf.gz.tbi
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz.tbi
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.isotype.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.isotype.vcf.gz.tbi
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.isotype.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.isotype.vcf.gz.tbi

V1 Structure:

  • [RELEASE_VERSION]/methods.md
  • [RELEASE_VERSION]/haplotype/haplotype.png
  • [RELEASE_VERSION]/haplotype/haplotype.thumb.png
  • [RELEASE_VERSION]/popgen/tajima_d.png
  • [RELEASE_VERSION]/popgen/tajima_d.thumb.png
  • [RELEASE_VERSION]/popgen/trees/genome.svg
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].soft-filter.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].hard-filter.vcf.gz
  • [RELEASE_VERSION]/variation/WI.[RELEASE_VERSION].impute.vcf.gz
  • [RELEASE_VERSION]/multiqc_bcftools_stats.json
  • [RELEASE_VERSION]/popgen/trees/genome.pdf
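As a pre-upload check for the V2 layout listed above, the sketch below generates the expected paths for a release version and reports which ones are absent from a local release folder. Paths are relative to the [RELEASE_VERSION] directory. The function names are hypothetical, and the V1 layout is not covered.

```python
from pathlib import Path

# Files whose names do not contain the release version.
FIXED_FILES = [
    "alignment_report.html",
    "concordance_report.html",
    "gatk_report.html",
    "haplotype/haplotype.png",
    "haplotype/haplotype.pdf",
    "haplotype/sweep.pdf",
    "haplotype/sweep_summary.tsv",
    "methods.md",
    "release_notes.md",
]
TREE_KINDS = ["hard-filter.min4", "hard-filter.isotype.min4"]
VCF_KINDS = [
    "soft-filter",
    "soft-filter.isotype",
    "hard-filter",
    "hard-filter.isotype",
    "impute.isotype",
]


def expected_v2_files(version):
    """List the V2 paths, relative to the [RELEASE_VERSION] directory."""
    files = list(FIXED_FILES)
    files += [f"divergent_regions_{scope}.{version}.bed" for scope in ("strain", "all")]
    for kind in TREE_KINDS:
        tree = f"tree/WI.{version}.{kind}.tree"
        files += [tree, tree + ".pdf"]
    for kind in VCF_KINDS:
        vcf = f"variation/WI.{version}.{kind}.vcf.gz"
        files += [vcf, vcf + ".tbi"]
    return files


def missing_v2_files(release_dir, version):
    """Return the expected V2 paths that are not present under release_dir."""
    root = Path(release_dir)
    return [name for name in expected_v2_files(version) if not (root / name).is_file()]
```

Running missing_v2_files on the local WI-YYYYMMDD folder before the upload gives a list of files still to be produced or copied in from other pipelines.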

Adding the release to the CeNDR website

To publish the release data on the CeNDR website, you must be logged into the CeNDR website as an Admin user. If you do not see the 'Admin' menu in the navbar menu of the site, request administrative privileges for your CeNDR user account.

Perform the following steps in order:

1. Update the site's internal database of strains (~5 minutes). This data is populated from the C. elegans WI Strain Info Google Sheet linked in the 'Admin' menu. Confirm that this data is accurate and formatted correctly before beginning the import operation. While the table is being updated, some portions of the site may show errors or display incorrect data. The status of the table update operation is shown on the Database Operations Admin page, but that status only reflects the success of the database operation, so you should also verify the correctness of the updated data. The 'strain' table is used as the data source for the 'Strain Catalog', 'Isotype List', 'Strain Map', 'Strain Issues', etc.

  • Admin -> Database Operations
  • Click the 'New Operation' button
  • Select 'Rebuild strain table from Google Sheet'
  • Add any (optional) notes about why the operation is being run
  • Click 'Start'

2. (Optional) If the WormBase version used in the release differs from that of the previous release, the gene data must also be loaded into the site's internal database (~15 minutes). Gene data is compiled from several external databases and is used to look up chromosome:start-stop intervals from WormBase Gene IDs, gene names, Homologenes from other species, etc.:

  • Admin -> Database Operations
  • Click the 'New Operation' button
  • Select 'Rebuild wormbase gene table from external sources'
  • Enter the WormBase version number to use (e.g., 280 for version WS280)
  • Click 'Start'

3. Finally, you can publish the release files on the site:

  • Admin -> Dataset Releases
  • Click 'Create Release'
  • Enter the RELEASE_VERSION of the release that was uploaded to Google Storage in the previous section (format: YYYYMMDD)
  • Enter the WORMBASE_VERSION that the release uses (e.g., 280 for version WS280)
  • Leave the Report Type as 'V2' for new releases (if you are adding a legacy format release created before 20200101, use 'V1')
  • Click 'Save'

The release should now appear as a new tab on the 'Data -> Genomic Data' page.
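The form inputs can be sanity-checked before they are entered. The sketch below validates the YYYYMMDD format and suggests a Report Type from the 20200101 cutoff mentioned in the steps above. The function names are hypothetical, and the suggestion is only a heuristic: the actual file layout of the release decides which format applies.

```python
from datetime import date, datetime

# Releases created before this date are described above as legacy (V1).
V2_CUTOFF = date(2020, 1, 1)


def parse_release_version(release_version):
    """Parse a YYYYMMDD release version into a date, or raise ValueError."""
    if len(release_version) != 8 or not release_version.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {release_version!r}")
    return datetime.strptime(release_version, "%Y%m%d").date()


def suggested_report_type(release_version):
    """Suggest 'V1' for releases dated before the cutoff, otherwise 'V2'."""
    return "V1" if parse_release_version(release_version) < V2_CUTOFF else "V2"
```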

Adding Strain Variant Annotation Data

Strain Variant Annotation data must first be gzipped and uploaded to Google Storage. The CSV should be named with the pattern WI.strain-annotation.bcsq.[VERSION_NUMBER].csv. Substitute [VERSION_NUMBER] in the filename with the version number for the dataset release in the form YYYYMMDD (for example, 20210401).

# First cd to the appropriate directory
gzip WI.strain-annotation.bcsq.[VERSION_NUMBER].csv
gsutil cp WI.strain-annotation.bcsq.[VERSION_NUMBER].csv.gz gs://caendr-db-bucket/strain_variant_annotation/c_elegans/WI.strain-annotation.bcsq.[VERSION_NUMBER].csv.gz

You can also upload the gzipped file directly through the Google Cloud Console to this location:

gs://caendr-db-bucket/strain_variant_annotation/c_elegans/
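The naming pattern and the gzip step can also be sketched in Python, for cases where the file is prepared by a script rather than by hand. The helper names are hypothetical. Unlike the gzip command line tool, this sketch leaves the original CSV in place.

```python
import gzip
import shutil
from pathlib import Path

ANNOTATION_PREFIX = "gs://caendr-db-bucket/strain_variant_annotation/c_elegans"


def annotation_filename(version):
    """Return the expected CSV name for a version in the form YYYYMMDD."""
    return f"WI.strain-annotation.bcsq.{version}.csv"


def gzip_annotation(csv_path):
    """Write csv_path + '.gz' next to the CSV and return the new path."""
    src = Path(csv_path)
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def annotation_destination(version):
    """Return the Google Storage URI the gzipped file should be copied to."""
    return f"{ANNOTATION_PREFIX}/{annotation_filename(version)}.gz"
```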

Then update the CeNDR tool through the 'Admin' portal (this operation may take a long time to complete, roughly 24 hours):

  • Admin -> Database Operations
  • Click the 'New Operation' button
  • Select 'Rebuild Strain Annotated Variant table from .csv.gz file'
  • Enter the [VERSION_NUMBER] to use (e.g., 20210401)
  • Click 'Start'

Google Storage Details

caendr-photos-bucket

This bucket contains photos of the environment from which a strain was isolated.

Photos follow the naming pattern: 'gs://<bucket>/<species>/<strain>.jpg'

Examples: 'gs://caendr-photos-bucket/c_elegans/ECA1217.jpg' 'gs://caendr-photos-bucket/c_elegans/WN2082.jpg'

When an image is uploaded to the bucket, a scaled-down thumbnail is automatically generated with the naming pattern: '<strain>.thumb.jpg'

Examples: 'gs://caendr-photos-bucket/c_elegans/ECA1217.thumb.jpg' 'gs://caendr-photos-bucket/c_elegans/WN2082.thumb.jpg'
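The two naming patterns can be expressed as a pair of small helpers, which is handy when checking whether a thumbnail exists for a given strain. The function names are hypothetical; the bucket and species defaults are taken from the examples above.

```python
PHOTOS_BUCKET = "caendr-photos-bucket"


def photo_uri(strain, species="c_elegans", bucket=PHOTOS_BUCKET):
    """Return the Google Storage URI of a strain's isolation photo."""
    return f"gs://{bucket}/{species}/{strain}.jpg"


def thumbnail_uri(strain, species="c_elegans", bucket=PHOTOS_BUCKET):
    """Return the URI of the automatically generated thumbnail."""
    return f"gs://{bucket}/{species}/{strain}.thumb.jpg"
```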

caendr-db-bucket

This bucket is used for storing local backups of external databases (e.g., WormBase) and for backing up CeNDR's internal database. Strain Variant Annotation data must be manually uploaded to the bucket before you can update the 'Strain Variant Annotation' table in the CeNDR database through the 'Admin' portal. The uploaded CSV file must be gzipped. For the example file below, the database operation form would require version '20210401'.

  • caendr-db-bucket
    • strain_variant_annotation
      • c_elegans
        • WI.strain-annotation.bcsq.20210401.csv.gz

caendr-main-terraform-state

This bucket contains the state information for CeNDR's cloud infrastructure, as well as a zipped backup of the secret.env file.

caendr-nextflow-work-bucket

Work bucket for storing intermediate files between Nextflow stages.

caendr-site-private-bucket

This bucket contains any files that may (now or in the future) require custom access permissions and should not necessarily be globally public.

tools - This directory contains any data or file dependencies for an associated tool.

  • tools
    • pairwise_indel_primer
      • sv.20200815.bed.gz
      • sv.20200815.bed.gz.tbi
      • sv.20200815.vcf.gz
      • sv.20200815.vcf.gz.csi
      • sv.20200815.vcf.gz.tbi
    • nemascan
      • input_data
        • all_species
        • c_elegans
          • annotations
          • genotypes
          • isotypes
          • phenotypes
        • c_briggsae
          • ...
        • c_tropicalis
          • ...

bam - This directory contains the BAM and BAI files for each sequenced strain of each species. It also contains the 'Download All BAM/BAI Files' script that is automatically generated by CAENDR (bam_bai_signed_download_script.sh).

  • bam
    • c_elegans
      • AB1.bam
      • AB1.bam.bai
      • AB4.bam
      • AB4.bam.bai
      • etc...

reports - This directory contains the source data and the output from running the associated tool. The data and results are organized by the container and version (with the exception of the indel primer), and the hash of the tool's input data.

  • reports
    • heritability
      • v0.01a
        • 15251d73f683450317089dae6736dee3
          • data.tsv
          • result.tsv
    • indel-primer
      • 00d67eecc8a9ea13b7eb5e615b3a6860
        • input.json
        • result.tsv
    • nemascan-nxf
      • v1.0
        • 00d67eecc8a9ea13b7eb5e615b3a6860
          • data.tsv
          • results
            • Divergent_and_haplotype
            • Genotype_Matrix
            • Mapping
            • Nextflow
            • Phenotypes
            • Plots
            • Reports
      • v1.0a
        • 3facbf334a99a2cde3b6c4372cbe7da6
          • data.tsv
          • results
            • Divergent_and_haplotype
            • Genotype_Matrix
            • Mapping
            • Nextflow
            • Phenotypes
            • Plots
            • Reports
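To illustrate how a report location follows from the tool, its container version, and a hash of the input data, here is a minimal sketch. It assumes the 32-character directory names are MD5 hex digests of the raw input bytes, which the layout suggests but does not confirm; CeNDR may hash different content or use another algorithm. The function name report_prefix is hypothetical.

```python
import hashlib


def report_prefix(tool, data, version=None):
    """Build reports/<tool>[/<version>]/<hash>, given the input data as bytes."""
    # Assumption: the directory name is the MD5 hex digest of the input bytes.
    digest = hashlib.md5(data).hexdigest()
    parts = ["reports", tool]
    if version is not None:  # the indel primer reports have no version level
        parts.append(version)
    parts.append(digest)
    return "/".join(parts)
```

Because the path is derived from the input, submitting identical data to the same tool version maps to the same directory under this scheme.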

caendr-site-public-bucket

This bucket contains files that have been manually uploaded to the bucket (following an expected naming convention) or created through the 'Admin' portal. The content of several pages on CeNDR depends on these files and their contents.

dataset_release - This is the directory where formal releases of CeNDR data are uploaded.

  • dataset_release
    • c_elegans
      • 20160408
      • 20170531
      • 20180527
      • 20200815
      • 20210121
        • release_notes.md
        • methods.md
        • alignment_report.html
        • gatk_report.html
        • concordance_report.html
        • divergent_regions_strain.20210121.bed.gz
        • variation
          • WI.20210121.soft-filter.vcf.gz
          • WI.20210121.soft-filter.vcf.gz.tbi
          • WI.20210121.soft-filter.isotype.vcf.gz
          • WI.20210121.soft-filter.isotype.vcf.gz.tbi
          • WI.20210121.hard-filter.vcf.gz
          • WI.20210121.hard-filter.vcf.gz.tbi
          • WI.20210121.hard-filter.isotype.vcf.gz
          • WI.20210121.hard-filter.isotype.vcf.gz.tbi
          • WI.20210121.impute.isotype.vcf.gz
          • WI.20210121.impute.isotype.vcf.gz.tbi
        • tree
          • WI.20210121.hard-filter.min4.tree
          • WI.20210121.hard-filter.min4.tree.pdf
          • WI.20210121.hard-filter.isotype.min4.tree
          • WI.20210121.hard-filter.isotype.min4.tree.pdf
        • haplotype
          • haplotype.png
          • haplotype.pdf
          • sweep.pdf
          • sweep_summary.tsv

profile - Profile photos for Staff, Advisory Committee members, etc. are uploaded here when managed through the 'Admin' portal.

  • profile
    • photos
      • 0f7394fa0d48431884e673b34f4c4dea.jpg

caendr-site-static-bucket

This bucket contains static site resources such as images, videos, and example data. These files are stored in the CAENDR git repo, but are not accessible from the CeNDR web server. Terraform automatically uploads them to the caendr-site-static-bucket. To reference any of these assets on a page, use the Jinja macro ext_asset().

Example:

<img src="{{ ext_asset('img/logo.png') }}">
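The implementation of ext_asset() is not shown in this document. As an illustration only, a helper of this kind could map an asset path to a public Google Storage URL as sketched below. The name ext_asset_url and the use of the storage.googleapis.com URL form are assumptions, and the real macro may resolve assets differently.

```python
from urllib.parse import quote

STATIC_BUCKET = "caendr-site-static-bucket"


def ext_asset_url(path, bucket=STATIC_BUCKET):
    """Return a public URL for an asset path stored in the static bucket."""
    # quote() keeps "/" by default, so nested paths such as img/logo.png survive.
    return f"https://storage.googleapis.com/{bucket}/{quote(path.lstrip('/'))}"
```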

caendr-src-bucket

This bucket is used by Terraform during the deployment process to store source code before it is provisioned.