easyFulcrum.Rmd
The easyfulcrum package is a tool to process and analyze ecological field sampling data generated using the Fulcrum mobile application.
The easyfulcrum R package offers an organized workflow for processing ecological sampling data generated using the Fulcrum mobile application. easyfulcrum provides simple and efficient functions to clean, process, and visualize ecological field sampling and isolation data collected using custom Fulcrum applications. It also provides functions to join these data with genotype information if organisms isolated from the field are identified using molecular barcodes. Together, the Fulcrum mobile application and easyfulcrum R package allow researchers to easily implement mobile data-collection, cloud-based databases, and standardized data analysis tools to improve ecological sampling accuracy and efficiency.
Fulcrum is a customizable, geographic data-collection platform compatible with Apple iOS and Google Android devices that allows users to collect rich, location-based data. To facilitate large-scale ecological surveys of nematodes that are difficult to identify in the field, we developed two Fulcrum applications. The “Nematode field sampling” application allows the user to organize various ecological data types associated with the substrate sampled in the field, such as environmental parameters and substrate characteristics. The “Nematode isolation” application helps organize data associated with the specimens isolated from samples after they have been brought into the laboratory.
The Fulcrum data collection application can be downloaded online (https://www.fulcrumapp.com). Fulcrum uses a powerful GUI to help users customize data-collection applications even when they have no coding or database administration knowledge, which makes Fulcrum’s robust, cloud-based database adaptable to sampling nearly any species from nature. If desired, users can customize our field collection and isolation applications following our Fulcrum templates.
Users can use the applications as is, but for easyfulcrum to work with custom applications, users should save their field sampling and isolation applications with a unique identifier in place of our nematode prefix, e.g. fungus field sampling and fungus isolation.
Install the package via devtools (>= 2.4.1):
install.packages("devtools")
devtools::install_github("AndersenLab/easyfulcrum")
Load the package:
library(easyfulcrum)
The makeDirStructure function makes a standardized directory of folders for the easyfulcrum run, taking a base directory (startdir) and the project name (projectdirname) as inputs.
Every collection project should be contained in its own directory. The directory name should follow the YearMonthPlace format used for Fulcrum collection projects, e.g. 2020JanuaryHawaii.
makeDirStructure(startdir = "~/Desktop",
projectdirname = "2020JanuaryHawaii")
The data directory contains the raw and processed subdirectories.
- raw/fulcrum holds the .csv files exported from Fulcrum and raw/fulcrum/photos contains .jpg files exported from Fulcrum.
- raw/annotate can hold spatial location files island.csv, location.csv, and trail.csv that the user generates for mapping the collection sites.
- processed/fulcrum holds easyfulcrum function outputs.
The reports directory holds easyfulcrum function outputs. These outputs will be generated by the user processing script(s) saved in the scripts directory.
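The layout described above can be reproduced with base R alone. The following is a minimal sketch, assuming only the folders named in this section; make_layout is a hypothetical helper written for illustration and is not the package's makeDirStructure implementation.

```r
# Hypothetical helper that mirrors the folder layout described in the text.
# It is NOT the easyfulcrum implementation of makeDirStructure.
make_layout <- function(startdir, projectdirname) {
  base <- file.path(startdir, projectdirname)
  subdirs <- c(
    "data/raw/fulcrum/photos",   # exported .csv files live one level up, photos here
    "data/raw/annotate",         # optional island.csv, location.csv, trail.csv
    "data/processed/fulcrum",    # function outputs
    "reports",
    "scripts"
  )
  for (d in file.path(base, subdirs)) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(base)
}

# Build the layout in a temporary directory so no real project is touched
base <- make_layout(tempdir(), "2020JanuaryHawaii")
list.dirs(base, full.names = FALSE, recursive = TRUE)
```

Listing the directories afterwards is a quick way to confirm that the exported files will have somewhere to go before any processing starts.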
Following makeDirStructure, the user adds collection .csv and .jpg files into the appropriate subfolder locations.
We include example files from a small collection to use with this vignette. To copy these files into the project directory you just made, we include a helper function called loadExampleFiles. This function is not used in the normal easyfulcrum workflow. Note that the startdir and projectdirname arguments should be identical to those passed to makeDirStructure above.
loadExampleFiles(startdir = "~/Desktop",
projectdirname = "2020JanuaryHawaii")
Before processing collection data using easyfulcrum, the raw Fulcrum data must be exported from the Fulcrum database using the Fulcrum website’s data export tool. We recommend exporting the data by selecting the following checkboxes:
- the desired project
- include photos
- include GPS data
- field sampling
- isolation
Exporting with change sets is not currently supported.
After the data are exported, the .csv files must be moved to [your project directory]/data/raw/fulcrum, and the field sampling photos in .jpg format must be moved to [your project directory]/data/raw/fulcrum/photos.
The first group of functions cleans the results from the Fulcrum .csv files.
readFulcrum takes a dir argument that specifies the directory from which to read the Fulcrum .csv files. The same dir value is reused by later functions in this vignette.
dir <- "~/Desktop/2020JanuaryHawaii"
raw_fulc <- readFulcrum(dir = dir)
procFulcrum processes individual data frames and adds flags for unexpected data.
proc_fulc <- procFulcrum(data = raw_fulc)
#> Processing nematode_field_sampling
#> Processing nematode_field_sampling_sample_photo
#> Processing nematode_isolation
#> Processing nematode_isolation_s_labeled_plates
#> Processing nematode_isolation_photos
checkTemperatures identifies flags in three temperature variables. Setting the return_flags option to TRUE will return a list of three data frames, each containing only the rows where one of the three flag types appears. The function automatically prints the rows where the flags exist.
The procFulcrum function assumes that, when raw_substrate_temperature or raw_ambient_temperature values are above 40 degrees, the temperatures were mistakenly entered in Fahrenheit rather than Celsius, and converts these values to Celsius. It also flags cases where both raw_ambient_temperature and raw_ambient_humidity are stuck on the same values for 5 or more measurements in a row. These are the three flags returned by checkTemperatures.
flag_temp <- checkTemperatures(data = proc_fulc, return_flags = TRUE)
#> >>> Checking substrate temperature
#> [1] "There are 1 rows with flagged substrate temperature:"
#> fulcrum_id raw_substrate_temperature
#> 1 a7db618d-44cc-4b4a-bc67-871306029274 65.3
#> proc_substrate_temperature
#> 1 18.5
#> >>> Checking ambient temperature
#> [1] "There are 1 rows with flagged ambient temperature:"
#> fulcrum_id raw_ambient_temperature
#> 1 b1f20ae4-c5c2-426f-894a-e1f46c2fa693 71.2
#> proc_ambient_temperature
#> 1 21.8
#> >>> Checking ambient run temperature
#> [1] "There are 5 rows with flagged ambient run temperature:"
#> fulcrum_id c_label raw_ambient_temperature
#> 1 854b40c4-ac6a-47d1-a737-cf611aa94268 C-5115 16.9
#> 2 de2ae06e-cfff-44bc-869e-8dd9205a65c6 C-5139 23.3
#> 3 b1f20ae4-c5c2-426f-894a-e1f46c2fa693 C-5121 71.2
#> 4 0a049170-e63c-4545-8b02-04bc08a5ea54 C-5117 16.7
#> 5 34848a83-5951-4136-b632-d39aa85c76d3 C-5120 17.4
#> 6 dda77efe-d73c-48e9-aefb-b508e613256b C-5122 17.4
#> 7 93de14a0-40ab-4793-8614-ab1512ab158c C-5118 17.4
#> 8 216cb71a-6470-46eb-950d-366ac3180498 C-5123 17.4
#> 9 920a6a56-7a29-47f4-afce-a5f83787d639 C-5131 17.4
#> 10 6b25c113-4bb6-4bc5-9473-eca1f8075d10 C-5132 17.4
#> 11 e38a295e-7595-4227-887b-74e1514c71bc C-5133 17.4
#> 12 a4afcda9-2d13-4f7f-991a-2299cb3c5a3c C-5135 15.8
#> 13 c0e68d42-5180-4fda-88e8-7721bbc8c96d C-5134 15.3
#> 14 4b7c21ea-2987-49d8-88c3-951417de4965 C-5130 15.3
#> 15 2922c799-0dd7-499c-b674-ea99c72e5ea7 C-5134 15.3
#> proc_ambient_temperature ambient_humidity flag_ambient_temperature_run
#> 1 16.9 85.8 FALSE
#> 2 23.3 65.3 FALSE
#> 3 21.8 77.9 FALSE
#> 4 16.7 85.9 FALSE
#> 5 17.4 80.0 FALSE
#> 6 17.4 80.0 TRUE
#> 7 17.4 80.0 TRUE
#> 8 17.4 80.0 TRUE
#> 9 17.4 80.0 TRUE
#> 10 17.4 80.0 TRUE
#> 11 17.4 87.0 FALSE
#> 12 15.8 90.2 FALSE
#> 13 15.3 91.2 FALSE
#> 14 15.3 90.2 FALSE
#> 15 15.3 87.8 FALSE
#> collection_local_time collection_datetime_UTC
#> 1 42660 2020-01-19 21:51:26
#> 2 43080 2020-01-19 21:58:02
#> 3 43140 2020-01-19 21:59:02
#> 4 43320 2020-01-19 22:02:01
#> 5 44100 2020-01-19 22:15:15
#> 6 45780 2020-01-19 22:43:04
#> 7 45960 2020-01-19 22:46:11
#> 8 46440 2020-01-19 22:54:01
#> 9 38520 2020-01-22 20:42:55
#> 10 39000 2020-01-22 20:50:48
#> 11 40320 2020-01-22 21:12:29
#> 12 45420 2020-01-22 22:37:28
#> 13 45540 2020-01-22 22:39:10
#> 14 47400 2020-01-22 23:10:25
#> 15 51780 2020-01-23 00:23:02
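The two rules behind these flags can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the helper names are hypothetical, the 40-degree threshold comes from the text, and the run rule assumes that a run is counted on identical temperature and humidity pairs and that every reading after the first in a run of at least 5 is flagged.

```r
# Hypothetical helpers for illustration; not part of easyfulcrum.
f_to_c <- function(x) (x - 32) * 5 / 9

# Values above the threshold are assumed to be Fahrenheit and are converted
convert_if_fahrenheit <- function(raw, threshold = 40) {
  ifelse(!is.na(raw) & raw > threshold, f_to_c(raw), raw)
}

# Flag every reading after the first within a run of identical
# temperature/humidity pairs, for runs of at least min_run readings
# (assumed interpretation of "5 or more measurements in a row")
flag_runs <- function(temp, humidity, min_run = 5) {
  runs <- rle(paste(temp, humidity))
  run_len <- rep(runs$lengths, runs$lengths)  # length of the run each row belongs to
  pos_in_run <- sequence(runs$lengths)        # position of each row inside its run
  run_len >= min_run & pos_in_run > 1
}

# Toy readings modelled on the first rows of the output above
raw_temp <- c(16.9, 23.3, 71.2, 16.7, 17.4, 17.4, 17.4, 17.4, 17.4, 17.4, 17.4, 15.8)
humidity <- c(85.8, 65.3, 77.9, 85.9, 80.0, 80.0, 80.0, 80.0, 80.0, 80.0, 87.0, 90.2)

proc_temp <- round(convert_if_fahrenheit(raw_temp), 1)
run_flag <- flag_runs(raw_temp, humidity)

data.frame(raw_temp, proc_temp, humidity, run_flag)
```

On these toy values the third reading (71.2) is converted to 21.8, and rows 6 through 10 are flagged while row 5, the first reading of the run, and row 11, where humidity changes, are not.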
fixTemperatures takes a) fulcrum_ids that need to be reverted to their original values (if readings above 40 degrees were truly in Celsius) for both substrate temperatures (substrate_temperature_ids) and ambient temperatures (ambient_temperature_ids), as well as b) fulcrum_ids for which humidity and temperature readings need to be set to NA due to a stuck measurement device (ambient_temperature_run_ids). In the example below we set all the readings with flag_ambient_temperature_run = TRUE to NA. Also, note that the first observation of the humidity and temperature value for a given run is not flagged as part of the run.
proc_fulc_clean <- fixTemperatures(data = proc_fulc,
substrate_temperature_ids = "a7db618d-44cc-4b4a-bc67-871306029274",
ambient_temperature_ids = "b1f20ae4-c5c2-426f-894a-e1f46c2fa693",
ambient_temperature_run_ids=c("dda77efe-d73c-48e9-aefb-b508e613256b",
"93de14a0-40ab-4793-8614-ab1512ab158c",
"216cb71a-6470-46eb-950d-366ac3180498",
"920a6a56-7a29-47f4-afce-a5f83787d639",
"6b25c113-4bb6-4bc5-9473-eca1f8075d10"))
The flag variables are maintained, so rerunning checkTemperatures on the cleaned, processed Fulcrum data will confirm that the corrections have been implemented as desired.
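The two kinds of correction described for fixTemperatures amount to simple indexed assignments. The sketch below uses a toy data frame with made-up ids; the column names follow the printed output earlier in this section, but which columns the package actually modifies is an assumption here.

```r
# Toy data with made-up ids; not real Fulcrum records
df <- data.frame(
  fulcrum_id = c("id-1", "id-2", "id-3"),
  raw_ambient_temperature = c(71.2, 17.4, 17.4),
  proc_ambient_temperature = c(21.8, 17.4, 17.4),
  ambient_humidity = c(77.9, 80.0, 80.0),
  stringsAsFactors = FALSE
)

revert_ids <- "id-1"  # reading really was in Celsius, so undo the conversion
run_ids <- "id-3"     # reading came from a stuck sensor

# a) revert processed values to the raw values for the chosen ids
revert <- df$fulcrum_id %in% revert_ids
df$proc_ambient_temperature[revert] <- df$raw_ambient_temperature[revert]

# b) blank out temperature and humidity for stuck-sensor ids
stuck <- df$fulcrum_id %in% run_ids
df$proc_ambient_temperature[stuck] <- NA
df$ambient_humidity[stuck] <- NA

df
```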
joinFulcrum joins the Fulcrum dataframes. The function first joins the processed field sampling dataframe to the processed isolation dataframe via unique collection labels (c_label). Following this join, it selects a “best” photo for each unique C-label based on the existence of a matching photo id in the processed field sampling sample photo dataframe, before joining the aggregated data to this dataframe as well. Finally, the merge is completed when the processed isolation S-labeled plates dataframe is joined to this large dataframe on the basis of isolation id (s_label).
If the user is using easyfulcrum on customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, it is recommended that select_vars is set to FALSE, so that joinFulcrum returns all variables rather than only the default ones.
join_fulc <- joinFulcrum(data = proc_fulc, select_vars = TRUE)
#> Attempting to join:
#> proc_fulc$field_sampling_proc
#> proc_fulc$field_sampling_sample_photo_proc
#> proc_fulc$isolation_proc
#> proc_fulc$isolation_s_labeled_plates_proc
#> proc_fulc$isolation_photos_proc
#> Complete fulcrum data detected, joining all data.
#> returning joined and selected data, set select_vars to FALSE if variables are missing
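The order of joins described above can be illustrated with base R merge on toy tables. This is a sketch only: the left-join behaviour (all.x = TRUE), the isolation_id key, and all values are assumptions made for illustration, and the photo selection step is left out.

```r
# Toy tables; column names other than c_label and s_label are hypothetical
field <- data.frame(
  c_label = c("C-0001", "C-0002"),
  substrate = c("leaf litter", "fruit"),
  stringsAsFactors = FALSE
)
isolation <- data.frame(
  c_label = "C-0001",
  isolation_id = "iso-1",
  stringsAsFactors = FALSE
)
plates <- data.frame(
  isolation_id = c("iso-1", "iso-1"),
  s_label = c("S-00001", "S-00002"),
  stringsAsFactors = FALSE
)

# Step 1: field sampling records joined to isolation records by C-label
step1 <- merge(field, isolation, by = "c_label", all.x = TRUE)

# Step 2: S-labeled plates joined on the isolation record key
joined <- merge(step1, plates, by = "isolation_id", all.x = TRUE)

joined
```

A collection with no isolation record (C-0002 here) is kept with NA in the isolation and S-label columns, which is the kind of row the later checks report as a missing isolation record.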
checkJoin conducts 10 different checks for flags in the joined Fulcrum data frame: extreme temperature and altitude values; missing, improper, and/or duplicated C-labels; missing and/or duplicated isolation records corresponding to these C-labels; and unusual sample photo numbers. The return_flags option is the same as in checkTemperatures, and the function automatically prints the rows where the flags exist. If desired, the user can manually edit values or correct mistakes in the underlying data based on these flags and re-run the pipeline.
flag_join <- checkJoin(data = join_fulc, return_flags = TRUE)
#> >>> Checking data classes
#> [1] "There are 1 improperly classified variables"
#> [1] "substrate_other"
#> Improperly classified variables may require manipulation after read-in.
#> See easyfulcrum::fulcrumTypes for expected classes
#> >>> Checking duplicated c labels
#> [1] "There are 2 rows with duplicated c labels, these c labels are:"
#> [1] "C-5134" "C-5134"
#> [1] "Duplicated c labels are found in the ...field_sampling.csv"
#> >>> Checking unusual sample photo number
#> [1] "There are 6 rows with unusual sample photo numbers, their c labels are:"
#> [1] "C-5121" "C-5121" "C-5121" "C-5121" "C-5121" "C-5121"
#> [1] "Unusual sample photo number are found in the ...field_sampling.csv"
#> >>> Checking duplicated isolation for c label
#> [1] "There are 4 rows with duplicated isolation for c label, their c labels are:"
#> [1] "C-5130" "C-5130" "C-5130" "C-5130"
#> [1] "Duplicated isolation for c label are found in the ...isolation.csv"
#> >>> Checking missing isolation records
#> [1] "There are 4 rows with missing isolation records, their c labels are:"
#> [1] "C-5130" "C-5130" "C-5130" "C-5133"
#> [1] "Missing isolation records are found in the ...isolation.csv"
#> >>> Checking extreme substrate temperatures
#> [1] "There are 6 rows with extreme substrate temperatures, their c labels are:"
#> [1] "C-5121" "C-5121" "C-5121" "C-5121" "C-5121" "C-5121"
#> [1] "Extreme substrate temperatures are found in the ...field_sampling.csv"
#> >>> Checking extreme ambient temperatures
#> [1] "There are 3 rows with extreme ambient temperatures, their c labels are:"
#> [1] "C-5126" "C-5126" "C-5126"
#> [1] "Extreme ambient temperatures are found in nematode_field_sampling.csv"
#> >>> Checking extreme collection altitude
#> [1] "There are 1 rows with extreme collection altitudes, their c labels are:"
#> [1] "C-5134"
#> [1] "Extreme collection altitudes are found in nematode_field_sampling.csv"
#> >>> Checking missing s labels
#> [1] "There are 1 rows with missing s labels, their c labels are:"
#> [1] "C-5115"
#> [1] "Missing s labels are found in nematode_isolation_s_labeled_plates.csv"
#> >>> Checking duplicated s labels
#> [1] "There are 2 rows with duplicated s labels, their s labels are:"
#> [1] "S-12776" "S-12776"
#> [1] "Duplicated s labels are found in nematode_isolation_s_labeled_plates.csv"
annotateFulcrum adds spatial information to the joined Fulcrum data frame, noting if samples were collected on specific islands, trails, and/or locations. Example islands, trails, and locations from Hawaii are automatically loaded with the package, but a user can place manually made .csv files in data/raw/annotate. Specifying the base directory as dir in annotateFulcrum will then override the example files.
The hawaii_islands and hawaii_locations dataframes are composed of simple latitude and longitude starts and ends that define a bounding box, and hawaii_trails is composed of a character list of geojson polygon points taken from the geojson output of an online bounding box tool.
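A bounding box defined by latitude and longitude starts and ends can be tested with a few comparisons. The sketch below is illustrative only: the column names (lat_start, lat_end, lon_start, lon_end), the find_box helper, and the coordinates are all made up, and they are not the columns or values of hawaii_islands or hawaii_locations.

```r
# Made-up boxes; coordinates are illustrative, not real island boundaries
boxes <- data.frame(
  name = c("Box A", "Box B"),
  lat_start = c(18.9, 20.5),
  lat_end = c(20.3, 21.1),
  lon_start = c(-156.1, -157.0),
  lon_end = c(-154.8, -155.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the first box containing the point
find_box <- function(lat, lon, boxes) {
  # pmin/pmax make the test independent of which end is listed first
  inside <- lat >= pmin(boxes$lat_start, boxes$lat_end) &
    lat <= pmax(boxes$lat_start, boxes$lat_end) &
    lon >= pmin(boxes$lon_start, boxes$lon_end) &
    lon <= pmax(boxes$lon_start, boxes$lon_end)
  if (any(inside)) boxes$name[which(inside)[1]] else NA_character_
}

find_box(19.5, -155.5, boxes)  # falls inside Box A
find_box(20.8, -156.3, boxes)  # falls inside Box B
find_box(0, 0, boxes)          # outside every box, so NA
```

Trails need a true point-in-polygon test rather than this rectangle check, which is why they are stored as polygon points instead of starts and ends.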
If the user is using easyfulcrum on customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, it is recommended that select_vars is set to FALSE, so that annotateFulcrum returns all variables rather than only the default ones.
anno_fulc <- annotateFulcrum(data = join_fulc, dir = NULL, select_vars = TRUE)
This second group of functions cleans the results from a project-specific Google Sheet that contains genotyping results.
Since easyfulcrum was originally built for processing nematode samples, we provide a profile parameter to toggle the functions between the nematode-specific nematode profile and the more flexible, non-nematode-specific general profile.
Users that want to use the general profile can use our “general” genotyping sheet template.
Users that want to use the nematode profile can use our “nematode” genotyping sheet template.
Details on how to fill out a “nematode” genotyping sheet can be found in the Nematode Collection Protocol; look for “wild_isolate_genotyping_template”.
readGenotypes reads in genotyping data from a Google Sheet identified by the requisite gsKey. The col_types argument specifies the class of each data column. Note that additional columns can be added to either genotyping template if desired.
For more details on reading in genotyping sheets and on how to specify col_types, see the googlesheets4 package, which underlies this function.
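A col_types string holds one letter per column, read left to right. The sketch below decodes the string used for the "general" template, assuming the googlesheets4 shortcodes c (character), d (double), and D (date) and the six column names printed by head() in the example that follows. It only illustrates how the string lines up with the columns and does not call Google Sheets.

```r
# Shortcodes assumed from googlesheets4; only the ones used in this vignette
codes <- c(c = "character", d = "double", D = "date")

col_types <- "cccdcc"
columns <- c("project_id", "s_label", "species_id",
             "possible_new_sp", "strain_name", "notes")

# One letter per column, in sheet order
letters_used <- strsplit(col_types, "")[[1]]
stopifnot(length(letters_used) == length(columns))

decoded <- setNames(unname(codes[letters_used]), columns)
decoded
```

If a column is added to a template, one more letter has to be added at the matching position in col_types, otherwise the string and the sheet no longer line up.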
# read example data from the "general" genotyping template
raw_geno_general <- readGenotypes(gsKey = c("1aXH-8UDvFVddl7JA-R2y8QOcNxESpmseZGjheM8hDro"), col_types = "cccdcc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_general_genotyping_vignette.
#> ✓ Range ''genotyping template''.
head(raw_geno_general)
#> # A tibble: 6 × 6
#> project_id s_label species_id possible_new_sp strain_name notes
#> <chr> <chr> <chr> <dbl> <chr> <chr>
#> 1 2020JanuaryHawaii S-12607 Caenorhabditis br… NA ECA2499 NA
#> 2 2020JanuaryHawaii S-12608 Caenorhabditis br… NA ECA2501 NA
#> 3 2020JanuaryHawaii S-12634 NA NA NA NA
#> 4 2020JanuaryHawaii NA NA NA NA NA
#> 5 2020JanuaryHawaii S-12652 NA NA NA NA
#> 6 2020JanuaryHawaii S-12664 NA NA NA NA
# read example data from the "nematode" genotyping template
raw_geno_nema <- readGenotypes(gsKey = c("1eviRoe0NyIEkIexM6c_oTVTX6U4ndPx-hkXJvkjhqWM"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_nematode_genotyping_vignette.
#> ✓ Range ''genotyping template''.
checkGenotypes both processes the genotyping data (adds flags) and returns information on those flags if desired. Setting the profile parameter to general will check for unexpected or improper use of S-labels, including missing isolations, improper S-label names, and/or duplicated S-labels. Setting the profile parameter to nematode will result in additional nematode-specific checks, including unusual species IDs, strain names, proliferation values, and whether ITS2 genotypes are missing when expected. Setting the return_geno option to TRUE and the return_flags option to FALSE will return the processed genotyping data and print information on the flags. Setting the return_geno option to FALSE and the return_flags option to TRUE will return a list of data frames that detail the rows where the flags appear. These two options cannot both be TRUE at the same time.
proc_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc,
return_geno = TRUE, return_flags = FALSE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"
proc_geno_nema <- checkGenotypes(geno_data = raw_geno_nema, fulc_data = anno_fulc,
return_geno = TRUE, return_flags = FALSE, profile = "nematode")
#> Using "nematode" profile:
#> >>> Checking data classes
#> [1] "There are 0 improperly classified variables"
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"
#> >>> Checking genotyping process
#> [1] "There are 0 rows missing expected proliferation data, these s labels are:"
#> [1] "There are 1 rows missing expected its2 genotype, these s labels are:"
#> [1] "S-12652"
#> [1] "There are 1 rows missing expected species_id, these s labels are:"
#> [1] "S-12668"
#> [1] "There are 1 rows with unusual target species names, these names are:"
#> [1] "O. myriophilus"
#> [1] "There are 1 rows missing expected strain_name, these s labels are:"
#> [1] "S-12775"
flag_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc,
return_geno = FALSE, return_flags = TRUE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"
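The S-label checks reported above are set comparisons between the genotyping sheet and the Fulcrum data. A minimal base R sketch follows, assuming a valid S-label matches "S-" followed by digits; that pattern and the toy vectors are assumptions for illustration, not the package's rules.

```r
# Toy label vectors; a subset of labels in the style of the output above
geno_s <- c("S-12607", NA, "S-12664", "S-12664", "R-12666", "S-12652")
fulc_s <- c("S-12607", "S-12664", "S-12666")

present <- !is.na(geno_s)

# Rows with no S-label at all
missing_rows <- which(!present)

# Every occurrence of a label that appears more than once
dup_labels <- geno_s[present & (duplicated(geno_s) | duplicated(geno_s, fromLast = TRUE))]

# Labels that do not match the assumed "S-<digits>" pattern
unusual <- geno_s[present & !grepl("^S-[0-9]+$", geno_s)]

# Labels present on one side only
not_in_fulcrum <- setdiff(geno_s[present], fulc_s)
not_in_geno <- setdiff(fulc_s, geno_s)

list(missing_rows = missing_rows, dup_labels = dup_labels, unusual = unusual,
     not_in_fulcrum = not_in_fulcrum, not_in_geno = not_in_geno)
```

Using duplicated() in both directions returns both copies of a repeated label, which matches the way duplicates are printed twice in the messages.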
Based on these flags, “fixed” genotyping sheets were made by eliminating rows with blank S-labels, duplicated S-labels, etc. The “fixed” data are read in below and checkGenotypes is re-run on them.
# general example
raw_geno_general_fixed <- readGenotypes(gsKey = c("1AcovAEfQIF46PigrrM_D2QPOpdTQM-YndlFo4MmsQoc"), col_types = "cccdcc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_general_genotyping_vignette_fixed.
#> ✓ Range ''genotyping template''.
proc_geno_general_fixed <- checkGenotypes(geno_data = raw_geno_general_fixed, fulc_data = anno_fulc,
return_geno = TRUE, return_flags = FALSE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 0 rows with missing s labels, these s labels are:"
#> [1] "There are 0 rows with duplicated s labels, these s labels are:"
#> [1] "There are 0 rows with unusual s labels, these s labels are:"
#> [1] "There are 0 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "There are 0 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
# nematode example
raw_geno_nema_fixed <- readGenotypes(gsKey = c("1WaOsAU0Pmf_rOp9BoGmDeYMmyENw0gBppdllfedLG9s"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
#> ✓ Reading from
#> 2020JanuaryHawaii_easyFulcrum_nematode_genotyping_vignette_fixed.
#> ✓ Range ''genotyping template''.
proc_geno_nema_fixed <- checkGenotypes(geno_data = raw_geno_nema_fixed, fulc_data = anno_fulc,
return_geno = TRUE, return_flags = FALSE, profile = "nematode")
#> Using "nematode" profile:
#> >>> Checking data classes
#> [1] "There are 0 improperly classified variables"
#> >>> Checking s labels
#> [1] "There are 0 rows with missing s labels, these s labels are:"
#> [1] "There are 0 rows with duplicated s labels, these s labels are:"
#> [1] "There are 0 rows with unusual s labels, these s labels are:"
#> [1] "There are 0 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "There are 0 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> >>> Checking genotyping process
#> [1] "There are 0 rows missing expected proliferation data, these s labels are:"
#> [1] "There are 0 rows missing expected its2 genotype, these s labels are:"
#> [1] "There are 0 rows missing expected species_id, these s labels are:"
#> [1] "There are 0 rows with unusual target species names, these names are:"
#> [1] "There are 0 rows missing expected strain_name, these s labels are:"
joinGenoFulc joins the joined Fulcrum data frame with the genotyping information. This function will also save the processed genotyping information in data/processed/genotypes if dir is set to the base folder of the project.
If the user is using easyfulcrum on customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, it is recommended that select_vars is set to FALSE, so that joinGenoFulc returns all variables rather than only the default ones.
# general example
join_genofulc_general <- joinGenoFulc(geno = proc_geno_general_fixed, fulc = anno_fulc, dir = dir, select_vars = TRUE)
#> Joining, by = "s_label"
#> returning selected data, set select_vars to FALSE if variables are missing
# nematode example
join_genofulc_nema <- joinGenoFulc(geno = proc_geno_nema_fixed, fulc = anno_fulc, dir = NULL, select_vars = TRUE)
#> Joining, by = "s_label"
#> returning selected data, set select_vars to FALSE if variables are missing
The final function processes and resizes images, adding details to a final dataframe.
procPhotos copies raw sample photos, renames them with the C-label, makes a new directory data/processed/fulcrum/photos and pastes the renamed files there. The function also makes thumbnails for use with interactive maps and places these in the data/processed/fulcrum/photos/thumbnails directory. Setting the CeNDR option to TRUE will rename photos of samples meeting CeNDR criteria with the name of the nematode strains isolated from the sample and paste them in the data/processed/fulcrum/photos/CeNDR directory.
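The copy-and-rename step can be pictured with base R file operations. This sketch works in a temporary directory with empty placeholder files; the photo_map lookup table and folder names are hypothetical, and thumbnail resizing is not shown because it needs an image library.

```r
# Temporary stand-ins for the raw and processed photo folders
raw_dir <- file.path(tempdir(), "ef_photos_raw")
proc_dir <- file.path(tempdir(), "ef_photos_processed")
dir.create(raw_dir, showWarnings = FALSE)
dir.create(proc_dir, showWarnings = FALSE)

# Hypothetical lookup from exported photo id to C-label
photo_map <- data.frame(
  photo_id = c("photo-a", "photo-b"),
  c_label = c("C-0001", "C-0002"),
  stringsAsFactors = FALSE
)

# Empty placeholder files stand in for exported .jpg photos
from <- file.path(raw_dir, paste0(photo_map$photo_id, ".jpg"))
invisible(file.create(from))

# Copy each photo to the processed folder under its C-label name;
# overwrite = TRUE replaces files left by an earlier run
to <- file.path(proc_dir, paste0(photo_map$c_label, ".jpg"))
copied <- file.copy(from, to, overwrite = TRUE)

list.files(proc_dir)
```

Copying rather than moving keeps the raw export untouched, so the step can be repeated safely.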
The function will also accept a public url (pub_url) for hosting the sample photos renamed by C-label. A compatible public url should follow the pub_url/Project/sampling_thumbs/C-label.jpg format. For example, if the full url for C-5133 is https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/2020JanuaryHawaii/sampling_thumbs/C-5133.jpg, the pub_url should be set to https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/. The project name, “sampling_thumbs”, C-label, and file extension will be filled by the function.
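The url pattern above is plain string concatenation. A small sketch, using a hypothetical build_photo_url helper that is not part of the package:

```r
# Hypothetical helper illustrating the pub_url/Project/sampling_thumbs/C-label.jpg pattern
build_photo_url <- function(pub_url, project, c_label) {
  paste0(pub_url, project, "/sampling_thumbs/", c_label, ".jpg")
}

url <- build_photo_url(
  pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  project = "2020JanuaryHawaii",
  c_label = "C-5133"
)
url
```

Note that pub_url ends with a trailing slash in the example; the sketch relies on that and does not add one.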
We’ve included example code for each profile, but note that if you rerun the procPhotos function with the overwrite parameter set to TRUE, the output files will be overwritten.
final_data_general <- procPhotos(dir = dir, data = join_genofulc_general,
max_dim = 500, overwrite = TRUE,
pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
CeNDR = TRUE)
head(final_data_general)
#> # A tibble: 6 × 75
#> project c_label s_label species_id strain_name collection_by
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020JanuaryHawaii C-5115 S-12695 NA NA dec@u.northwestern.e…
#> 2 2020JanuaryHawaii C-5115 NA NA NA dec@u.northwestern.e…
#> 3 2020JanuaryHawaii C-5115 S-12706 NA NA dec@u.northwestern.e…
#> 4 2020JanuaryHawaii C-5115 S-12697 NA NA dec@u.northwestern.e…
#> 5 2020JanuaryHawaii C-5115 S-12776 NA NA dec@u.northwestern.e…
#> 6 2020JanuaryHawaii C-5115 S-12776 NA NA dec@u.northwestern.e…
#> # … with 69 more variables: collection_datetime_UTC <dttm>,
#> # collection_date_UTC <date>, collection_local_time <dbl>,
#> # collection_fulcrum_latitude <dbl>, collection_fulcrum_longitude <dbl>,
#> # exif_gps_latitude <dbl>, exif_gps_longitude <dbl>,
#> # collection_latitude <dbl>, collection_longitude <dbl>,
#> # collection_lat_long_method <chr>, collection_lat_long_method_diff <dbl>,
#> # fulcrum_altitude <dbl>, exif_gps_altitude <dbl>, …
final_data_nema <- procPhotos(dir = dir, data = join_genofulc_nema,
max_dim = 500, overwrite = TRUE,
pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
CeNDR = TRUE)
head(final_data_nema)
#> # A tibble: 6 × 89
#> project c_label s_label species_id strain_name collection_by
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 2020JanuaryHawaii C-5115 S-12695 NA NA dec@u.northwestern.e…
#> 2 2020JanuaryHawaii C-5115 NA NA NA dec@u.northwestern.e…
#> 3 2020JanuaryHawaii C-5115 S-12706 NA NA dec@u.northwestern.e…
#> 4 2020JanuaryHawaii C-5115 S-12697 NA NA dec@u.northwestern.e…
#> 5 2020JanuaryHawaii C-5115 S-12776 NA NA dec@u.northwestern.e…
#> 6 2020JanuaryHawaii C-5115 S-12776 NA NA dec@u.northwestern.e…
#> # … with 83 more variables: collection_datetime_UTC <dttm>,
#> # collection_date_UTC <date>, collection_local_time <dbl>,
#> # collection_fulcrum_latitude <dbl>, collection_fulcrum_longitude <dbl>,
#> # exif_gps_latitude <dbl>, exif_gps_longitude <dbl>,
#> # collection_latitude <dbl>, collection_longitude <dbl>,
#> # collection_lat_long_method <chr>, collection_lat_long_method_diff <dbl>,
#> # fulcrum_altitude <dbl>, exif_gps_altitude <dbl>, …
We include two functions for generating summaries of a finalized collection project. The final dataframes can otherwise be used as needed by the user.
makeSpSheet generates a species-specific .csv file for the species of interest (target_sp) and writes it to the /reports subdirectory. This function simplifies the final dataframe by pulling variables of particular interest for a user-specified species. It is written to standardize the output dataframe to meet the specifications for submitting wild nematode collections to the Caenorhabditis Natural Diversity Resource (CaeNDR). For this reason the makeSpSheet function is likely only applicable to nematode sampling projects.
makeSpSheet also returns a dataframe with flags for these select samples, and prints a description of these flags.
# general example
sp_sheet_general <- makeSpSheet(data = final_data_general, target_sp = "Caenorhabditis briggsae", dir = dir)
#> [1] "There are 9 strains with an email address for sampled_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 9 strains with an email address for isolated_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 0 strains with a species name not in the target species list:"
#> [1] "There are 0 strains with an unusual substrate class:"
#> [1] "There are 0 strains with an unusual landscape class:"
# nematode example
sp_sheet_nema <- makeSpSheet(data = final_data_nema, target_sp = "Caenorhabditis briggsae", dir = dir)
#> [1] "There are 9 strains with an email address for sampled_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 9 strains with an email address for isolated_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 0 strains with a species name not in the target species list:"
#> [1] "There are 0 strains with an unusual substrate class:"
#> [1] "There are 0 strains with an unusual landscape class:"
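The filtering and flagging idea behind a species sheet can be sketched in base R. The data frame, strain names, and flag_email column below are made up for illustration, and the email flag simply looks for an "@" in the collector field, which may differ from the rule the package applies.

```r
# Toy final data; values are invented for illustration
final <- data.frame(
  strain_name = c("STRAIN1", "STRAIN2", "STRAIN3"),
  species_id = c("Caenorhabditis briggsae", "Caenorhabditis briggsae", "Other species"),
  collection_by = c("person@example.org", "A. Collector", "person@example.org"),
  stringsAsFactors = FALSE
)

target_sp <- "Caenorhabditis briggsae"

# Keep only rows for the target species
sheet <- final[final$species_id %in% target_sp, ]

# Flag collector fields that still hold an email address instead of a name
sheet$flag_email <- grepl("@", sheet$collection_by, fixed = TRUE)

# Write the sheet to a temporary file rather than a real reports folder
out <- file.path(tempdir(), "species_sheet_sketch.csv")
write.csv(sheet, out, row.names = FALSE)

sheet
```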
We provide a function, generateReport, that generates an interactive overview of the entire sampling project. generateReport saves a file named sampleReport.Rmd into the /scripts sub-directory, and saves a sampleReport.html file in the /reports sub-directory. The sampleReport.html can be viewed in any web browser and includes: an overview of the collection project (such as who conducted the respective processes and on what dates they were completed), summary tables of collection and isolation data, interactive maps of where the collections in the project were acquired, and box plots showing the distributions of various environmental parameters at all collection sites. These parameters include substrate temperature, ambient temperature, humidity, and elevation.
Please feel free to edit the sampleReport.Rmd as you require once it has been saved into the /scripts sub-directory.
The profile parameter switches between nematode-specific and non-nematode-specific reports.
# general example
generateReport(data = final_data_general, dir = dir, target_sp = c("O. myriophilus", "Caenorhabditis briggsae"),
profile = "general")
#> /Applications/RStudio.app/Contents/MacOS/pandoc/pandoc +RTS -K512m -RTS sampleReport_general.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /Users/tim/Desktop/2020JanuaryHawaii/reports/sampleReport_general.html --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --table-of-contents --toc-depth 4 --variable toc_float=1 --variable toc_selectors=h1,h2,h3,h4 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --include-in-header /var/folders/lm/tby5m_cd0cs789xpcnwjm54h0000gn/T//RtmphBgnGl/rmarkdown-str60aa4ae3ebc.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'
# nematode example
generateReport(data = final_data_nema, dir = dir, profile = "nematode")
#> Warning: Unknown levels in `f`: Caenorhabditis elegans, Caenorhabditis
#> tropicalis
#> Warning: Unknown levels in `f`: Caenorhabditis elegans, Caenorhabditis
#> tropicalis
#> /Applications/RStudio.app/Contents/MacOS/pandoc/pandoc +RTS -K512m -RTS sampleReport.knit.md --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output /Users/tim/Desktop/2020JanuaryHawaii/reports/sampleReport.html --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/pagebreak.lua --lua-filter /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmarkdown/lua/latex-div.lua --self-contained --variable bs3=TRUE --standalone --section-divs --table-of-contents --toc-depth 4 --variable toc_float=1 --variable toc_selectors=h1,h2,h3,h4 --variable toc_collapsed=1 --variable toc_smooth_scroll=1 --variable toc_print=1 --template /Library/Frameworks/R.framework/Versions/4.1/Resources/library/rmarkdown/rmd/h/default.html --no-highlight --variable highlightjs=1 --variable theme=bootstrap --include-in-header /var/folders/lm/tby5m_cd0cs789xpcnwjm54h0000gn/T//RtmphBgnGl/rmarkdown-str60aa7d7231bf.html --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML'