VCF-kit is a command-line based collection of utilities for performing analysis on Variant Call Format (VCF) files. A summary of the commands is provided below. See documentation for details on installation and usage.
|Command|Description|
|---|---|
|calc|Obtain frequency/count of genotypes and alleles.|
|call|Compare variants identified from sequences obtained through alternative methods against a VCF.|
|filter|Filter variants with a minimum or maximum number of REF, HET, ALT, or missing calls.|
|geno|Various operations at the genotype level.|
|genome|Reference genome processing and management.|
|hmm|Hidden Markov model for use in imputing genotypes from parental genotypes in linkage studies.|
|phylo|Generate dendrograms from a VCF.|
|primer|Generate primers for variant validation.|
|rename|Add a prefix, suffix, or substitute a string in sample names.|
|tajima|Calculate Tajima’s D.|
|vcf2tsv|Convert a VCF to TSV.|
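For orientation on what the tajima command reports, the following is a minimal Python sketch of the standard Tajima's D statistic computed from summary values. It is not VCF-kit's implementation; the function name `tajimas_d` and its arguments are hypothetical and chosen only for illustration.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from n sequences, the number of segregating sites,
    and pi, the mean number of pairwise differences."""
    # For n < 4 the variance term below is zero, so D is undefined.
    if n < 4 or seg_sites < 1:
        raise ValueError("need at least 4 sequences and 1 segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two estimates of theta, scaled by its
    # estimated standard deviation.
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

A negative value indicates an excess of rare variants relative to neutral expectation, and a positive value indicates an excess of intermediate-frequency variants.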
A set of functions to process phenotype data, perform GWAS, and perform post-mapping data processing for C. elegans.
pheno <- spike(snps, c(80, 1020))
processed_phenotypes = process_pheno(pheno)
mapping_df = gwas_mappings(processed_phenotypes, cores = 4, only_sig = FALSE)
processed_mapping_df = process_mappings(mapping_df, phenotype_df = processed_phenotypes, CI_size = 50, snp_grouping = 200)
manplot(processed_mapping_df)
This package includes all data and functions necessary to complete a mapping for the phenotype of your choice using the recombinant inbred lines from Andersen, et al. 2015 (G3). Included with this package are the cross and map objects for this strain set, as well as a markers.rds file containing a lookup table for the physical positions of all markers used for mapping. See the GitHub page for more information on usage.
# Get the cross object
data("N2xCB4856cross")
cross <- N2xCB4856cross

# Get the phenotype data
pheno <- readRDS("~/Dropbox/AndersenLab/LabFolders/PastMembers/Tyler/ForTrip/RIAILs2_processed.rds")

# Merge the cross object and the phenotype data
cross <- mergepheno(cross, pheno)

# Perform a mapping with only 10 iterations of the phenotype data for FDR calc
map <- fsearch(cross, permutations = 10)

# Annotate the LOD scores
annotatedlods <- annotate_lods(map, cross)
easysorter is effectively version 2 of the COPASutils package (Shimko and Andersen, 2014). This package is specialized for use with worms and adds functionality beyond that provided by COPASutils, including division of recorded objects by larval stage and the ability to regress out control phenotypes from those recorded in experimental conditions. The package is rather specific to use in the Andersen Lab and, therefore, is not available from CRAN. To install it, you will need the devtools package. You can install both the devtools package and easysorter using the commands below:
An R package that presents a logical workflow for the reading, processing, and visualization of data obtained from the Union Biometrica Complex Object Parametric Analyzer and Sorter (COPAS) platform large-particle flow cytometers and a powerful suite of functions for the rapid processing and analysis of large high-throughput screening data sets. It combines the speed of dplyr with the elegance of ggplot2 to make analysis of COPAS data fast and painless.
Liftover is a Python script that wraps the remap_gff_between_releases.pl script by Gary Williams. It expands the set of file types you can lift over:
- VCF/BCF (Requires bcftools)
Additionally, custom file formats can be lifted over by specifying chromosome, start position column, and optionally an end position column.
pip install https://github.com/AndersenLab/liftover-utils/archive/v0.1.tar.gz
Note that the end_pos_column parameter is optional, meaning you only need to specify a chromosome and base pair location to be lifted over.
liftover <file> <release1> <release2> (bcf|vcf|gff|bed)
liftover <file> <release1> <release2> <chrom_col> <start_pos_column> [<end_pos_column>] [options]

Options:
  -h --help          Show this screen.
  --delim=<delim>    File Delimiter; Default is a tab [default: TAB].
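To illustrate the custom-format mode, here is a rough Python sketch of remapping the position columns of a delimited file given a chromosome column, a start column, and an optional end column. It does not reproduce the liftover script itself: `lift_custom` and `shift_remap` are hypothetical names, `shift_remap` is a toy stand-in for the real release-to-release coordinate mapping, and treating column numbers as 1-based is an assumption.

```python
import csv
import io


def shift_remap(chrom, pos, offset=100):
    """Hypothetical stand-in for a real coordinate mapping between releases."""
    return chrom, pos + offset


def lift_custom(text, chrom_col, start_col, end_col=None, delim="\t",
                remap=shift_remap):
    """Remap position columns of delimited text.

    Column numbers are 1-based here (an assumption for this sketch).
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delim, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delim):
        if not row:
            continue  # skip blank lines
        chrom = row[chrom_col - 1]
        _, row[start_col - 1] = remap(chrom, int(row[start_col - 1]))
        # The end column is optional, so a single position can be lifted.
        if end_col is not None:
            _, row[end_col - 1] = remap(chrom, int(row[end_col - 1]))
        writer.writerow(row)
    return out.getvalue()


# Usage: chromosome in column 1, start position in column 2, no end column.
print(lift_custom("I\t1000\tgeneA\n", chrom_col=1, start_col=2), end="")
```

All other columns pass through unchanged, which is what lets arbitrary tabular formats be lifted over without a dedicated parser.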
Scripts and Utilities
Python script for calculating depth of coverage and breadth of coverage from a given BAM file, reported for each chromosome, the nuclear genome, the mtDNA, and the whole genome.
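As a rough illustration of the two metrics, the Python sketch below computes mean depth and breadth (the fraction of positions covered by at least one read) from per-base depths that are assumed to be already extracted. The real script reads a BAM file; the function names, the input layout, and the mitochondrial chromosome name `MtDNA` are assumptions made for this example.

```python
def coverage_stats(depths):
    """Return (mean depth, breadth) for a list of per-base read depths."""
    if not depths:
        return 0.0, 0.0
    covered = sum(1 for d in depths if d > 0)
    return sum(depths) / len(depths), covered / len(depths)


def summarize(depths_by_chrom, mt_name="MtDNA"):
    """Summarize coverage per chromosome, nuclear, mtDNA, and genome-wide.

    mt_name is the assumed name of the mitochondrial chromosome.
    """
    nuclear = [d for chrom, ds in depths_by_chrom.items()
               if chrom != mt_name for d in ds]
    genome = [d for ds in depths_by_chrom.values() for d in ds]
    return {
        "chromosomes": {c: coverage_stats(ds)
                        for c, ds in depths_by_chrom.items()},
        "nuclear": coverage_stats(nuclear),
        "mtDNA": coverage_stats(depths_by_chrom.get(mt_name, [])),
        "genome": coverage_stats(genome),
    }


# Usage with toy per-base depths for one nuclear chromosome and the mtDNA.
report = summarize({"I": [2, 0, 4, 2], "MtDNA": [10, 10]})
print(report["nuclear"], report["mtDNA"])
```

Reporting the mtDNA separately matters because its copy number per cell typically gives it much higher depth than the nuclear chromosomes, which would otherwise skew a genome-wide average.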
Used to produce a BED file of masked ranges within a hard-masked ("NNN") FASTA file.
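The idea can be sketched in a few lines of Python: concatenate each sequence, find every run of `N`, and emit it as a BED interval (0-based start, exclusive end). This is an illustrative version under those assumptions, not the original script; `masked_ranges` and `to_bed` are hypothetical names.

```python
import re


def _n_runs(name, chunks):
    """Yield (name, start, end) for each run of N in the joined sequence."""
    for match in re.finditer(r"N+", "".join(chunks)):
        yield name, match.start(), match.end()


def masked_ranges(fasta_text):
    """Yield BED-style (chrom, start, end) intervals for hard-masked runs."""
    name, chunks = None, []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield from _n_runs(name, chunks)
            parts = line[1:].split()
            # Use the first word of the header as the sequence name.
            name, chunks = (parts[0] if parts else ""), []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        yield from _n_runs(name, chunks)


def to_bed(fasta_text):
    """Format the masked intervals as tab-separated BED lines."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in masked_ranges(fasta_text))


# Usage: a masked run that spans a line break is reported as one interval.
print(to_bed(">I example\nACGTNN\nNNACGT\n>II\nNACGN\n"), end="")
```

Joining the sequence lines before searching is what keeps a masked region that wraps across FASTA lines from being split into several intervals.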
Python script for 'polarizing', or re-calling, heterozygous genotype calls as homozygous reference or homozygous alternate genotypes based on Bayesian probabilities in inbred or largely homozygous individuals.
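The exact decision rule used by the script is not described here, so the following Python sketch shows only one simple way polarization could work: a heterozygous call is replaced by whichever homozygous genotype has the better Phred-scaled likelihood (PL, where lower means more likely). The function name `polarize`, the use of PL values, and the tie handling are all assumptions for illustration.

```python
HET_CALLS = {"0/1", "1/0", "0|1", "1|0"}


def polarize(gt, pl):
    """Re-call a heterozygous genotype as homozygous when the data allow.

    gt is a VCF-style genotype string; pl is a tuple of Phred-scaled
    likelihoods (hom-ref, het, hom-alt). This rule is a simplification
    and is not taken from the original script.
    """
    if gt not in HET_CALLS:
        return gt  # only heterozygous calls are re-called
    pl_ref, _, pl_alt = pl
    if pl_ref < pl_alt:
        return "0/0"
    if pl_alt < pl_ref:
        return "1/1"
    return gt  # tie: no basis for choosing, leave the call unchanged


# Usage: the alternate homozygote is far more likely than the reference one.
print(polarize("0/1", (30, 0, 5)))
```

The het likelihood is deliberately ignored: in a largely homozygous strain, a heterozygous call is assumed to be an artifact, so only the two homozygous states compete.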