VCF-kit - Documentation

VCF-kit is a collection of command-line utilities for analyzing Variant Call Format (VCF) files. A summary of the commands is provided below. See the documentation for details on installation and usage.

Command   Description
calc      Obtain frequency/count of genotypes and alleles.
call      Compare variants identified from sequences obtained through alternative methods against a VCF.
filter    Filter variants with a minimum or maximum number of REF, HET, ALT, or missing calls.
geno      Various operations at the genotype level.
genome    Reference genome processing and management.
hmm       Hidden Markov model for use in imputing genotypes from parental genotypes in linkage studies.
phylo     Generate dendrograms from a VCF.
primer    Generate primers for variant validation.
rename    Add a prefix, suffix, or substitute a string in sample names.
tajima    Calculate Tajima’s D.
vcf2tsv   Convert a VCF to TSV.
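
To make the REF, HET, ALT, and missing categories used by calc and filter concrete, here is a minimal Python sketch that tallies them for a single VCF record. It is not VCF-kit's implementation; the function name count_genotypes and the example record are hypothetical, and the sketch assumes a tab-delimited data line with a GT entry in the FORMAT column.

```python
def count_genotypes(vcf_line):
    """Count REF, HET, ALT, and missing genotype calls in one VCF data line.

    Hypothetical helper for illustration only; not part of VCF-kit.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    counts = {"REF": 0, "HET": 0, "ALT": 0, "MISSING": 0}
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        if "." in alleles:
            counts["MISSING"] += 1
        elif len(set(alleles)) > 1:
            counts["HET"] += 1
        elif alleles[0] == "0":
            counts["REF"] += 1
        else:
            counts["ALT"] += 1
    return counts


# Made-up record with five samples.
example_line = (
    "I\t1000\t.\tA\tT\t50\tPASS\t.\tGT:DP\t"
    "0/0:12\t0/1:9\t1/1:15\t./.:0\t1|1:7"
)
print(count_genotypes(example_line))
```

A filter on, say, a maximum number of missing calls then reduces to comparing counts["MISSING"] against a threshold for each record.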


A set of functions to process phenotype data, perform GWAS, and perform post-mapping data processing for C. elegans.




pheno <- spike(snps, c(80, 1020))

# Process the phenotype data
processed_phenotypes <- process_pheno(pheno)

# Perform the GWAS mappings
mapping_df <- gwas_mappings(processed_phenotypes, cores = 4, only_sig = FALSE)

# Perform post-mapping data processing
processed_mapping_df <- process_mappings(mapping_df, phenotype_df = processed_phenotypes, CI_size = 50, snp_grouping = 200)


This package includes all data and functions necessary to complete a mapping for the phenotype of your choice using the recombinant inbred lines from Andersen et al. 2015 (G3). Included with this package are the cross and map objects for this strain set, as well as a markers.rds file containing a lookup table for the physical positions of all markers used for mapping. See the GitHub page for more information on usage.




# Get the cross object
cross <- N2xCB4856cross

# Get the phenotype data
pheno <- readRDS("~/Dropbox/AndersenLab/LabFolders/PastMembers/Tyler/ForTrip/RIAILs2_processed.rds")

# Merge the cross object and the phenotype data
cross <- mergepheno(cross, pheno)

# Perform a mapping with only 10 permutations of the phenotype data for the FDR calculation
map <- fsearch(cross, permutations = 10)

# Annotate the LOD scores
annotatedlods <- annotate_lods(map, cross)


easysorter is effectively version 2 of the COPASutils package (Shimko and Andersen, 2014). This package is specialized for use with worms and includes additional functionality on top of that provided by COPASutils, including division of recorded objects by larval stage and the ability to regress out control phenotypes from those recorded in experimental conditions. The package is rather specific to use in the Andersen Lab and, therefore, is not available from CRAN. To install it, you will need the devtools package. You can install both the devtools package and easysorter using the commands below:




An R package that presents a logical workflow for the reading, processing, and visualization of data obtained from the Union Biometrica Complex Object Parametric Analyzer and Sorter (COPAS) platform large-particle flow cytometers and a powerful suite of functions for the rapid processing and analysis of large high-throughput screening data sets. It combines the speed of dplyr with the elegance of ggplot2 to make analysis of COPAS data fast and painless.



Additional Resources


Liftover is a Python script that wraps the script by Gary Williams. It expands the number of file types you can lift over:

  • VCF/BCF (Requires bcftools)
  • GFF
  • BED

Additionally, custom file formats can be lifted over by specifying a chromosome column, a start position column, and optionally an end position column.


pip install


Note that the end_pos_column parameter is optional, meaning you only need to specify a chromosome and base pair location to be lifted over.

liftover <file> <release1> <release2> (bcf|vcf|gff|bed)
liftover <file> <release1> <release2> <chrom_col> <start_pos_column> [<end_pos_column>] [options]

  -h --help     Show this screen.
  --delim=<delim>  File Delimiter; Default is a tab [default: TAB].
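
The custom-format mode described above can be pictured as rewriting the chosen coordinate columns row by row. The Python sketch below illustrates that idea only; it is not the liftover script's code. The names lift_columns and toy_convert are hypothetical, the 1-based column numbering is an assumption, and the real coordinate conversion between releases is replaced here by a toy function.

```python
import csv
import io


def lift_columns(text, convert, chrom_col, start_col, end_col=None, delim="\t"):
    """Rewrite the coordinate columns of delimited text using `convert`.

    `convert(chrom, pos)` is a hypothetical callable that returns the
    position in the target release. Column numbers are 1-based here,
    which is an assumption of this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delim, lineterminator="\n")
    for row in csv.reader(io.StringIO(text), delimiter=delim):
        chrom = row[chrom_col - 1]
        row[start_col - 1] = str(convert(chrom, int(row[start_col - 1])))
        # The end column is optional: without it only the start is lifted.
        if end_col is not None:
            row[end_col - 1] = str(convert(chrom, int(row[end_col - 1])))
        writer.writerow(row)
    return out.getvalue()


def toy_convert(chrom, pos):
    """Toy stand-in for a real release-to-release mapping."""
    return pos + 10 if chrom == "I" else pos


data = "I\t100\t200\tgeneA\nII\t50\t75\tgeneB\n"
print(lift_columns(data, toy_convert, chrom_col=1, start_col=2, end_col=3), end="")
```

Leaving out end_col mirrors the optional end_pos_column argument: only the chromosome and start position columns are needed for a row to be converted.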

Scripts and Utilities

depth_of_coverage.py (Daniel Cook)
Python script for calculating depth of coverage and breadth of coverage by chromosome, nuclear, mtDNA, and genome from a given BAM file.
generate_masked_ranges.py (Daniel Cook)
Used to produce a BED file of masked ranges within a hard-masked ("NNN") FASTA file.
het_polarization.py (Daniel Cook)
Python script for 'polarizing', or re-calling, heterozygous genotype calls as homozygous reference or alternate genotypes based on Bayesian probabilities in inbred or largely homozygous individuals.
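
As an illustration of what producing masked ranges involves, the following Python sketch finds runs of "N" in a sequence and reports them as BED-style intervals (0-based start, exclusive end). It is a minimal sketch under stated assumptions, not the code of generate_masked_ranges.py; the function name masked_ranges and the example sequence are hypothetical.

```python
import re


def masked_ranges(chrom, sequence):
    """Return (chrom, start, end) tuples for each run of N in `sequence`.

    Coordinates follow the BED convention: 0-based start, exclusive end.
    Hypothetical helper for illustration only.
    """
    return [(chrom, m.start(), m.end()) for m in re.finditer(r"N+", sequence)]


# Made-up sequence with two hard-masked stretches.
for chrom, start, end in masked_ranges("I", "ACGTNNNNACNNG"):
    print(f"{chrom}\t{start}\t{end}")
```

Python's match offsets already use a 0-based start and an exclusive end, so they can be written to a BED file without adjustment.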