easyFulcrum

The easyfulcrum package is a tool to process and analyze ecological field sampling data generated using the Fulcrum mobile application.

The easyfulcrum R package offers an organized workflow for processing ecological sampling data generated using the Fulcrum mobile application. easyfulcrum provides simple and efficient functions to clean, process, and visualize ecological field sampling and isolation data collected using custom Fulcrum applications. It also provides functions to join these data with genotype information if organisms isolated from the field are identified using molecular barcodes. Together, the Fulcrum mobile application and easyfulcrum R package allow researchers to easily implement mobile data-collection, cloud-based databases, and standardized data analysis tools to improve ecological sampling accuracy and efficiency.

What is Fulcrum?

Fulcrum is a customizable, geographic data-collection platform compatible with Apple iOS and Google Android devices that allows users to collect rich, location-based data. To facilitate large-scale ecological surveys of nematodes that are difficult to identify in the field, we developed two Fulcrum applications. The “Nematode field sampling” application allows the user to organize various ecological data types associated with the substrate sampled in the field, such as environmental parameters and substrate characteristics. The “Nematode isolation” application helps organize data associated with the specimens isolated from samples after they have been brought into the laboratory.

Fulcrum installation and application customization

The Fulcrum data collection application can be downloaded online (https://www.fulcrumapp.com). Fulcrum uses a powerful GUI to help users customize data-collection applications even when they have no coding or database administration knowledge, which makes Fulcrum’s robust, cloud-based database adaptable to sampling nearly any species from nature. If desired, users can customize our field collection and isolation applications following our Fulcrum templates.

The template for the Nematode field sampling application is here: [https://www.fulcrumapp.com/apps/nematode-field-sampling]
The Nematode isolation template is here: [https://www.fulcrumapp.com/apps/nematode-isolation]

Users can use the applications as is. However, for easyfulcrum to work with customized applications, users should save their field sampling and isolation applications with a unique identifier in place of our nematode prefix, e.g. “fungus field sampling” and “fungus isolation”.

easyfulcrum installation:

Install the package via devtools (>= 2.4.1):

install.packages("devtools")
devtools::install_github("AndersenLab/easyfulcrum")

Load the package:

library(easyfulcrum)

Directory structure:

The makeDirStructure function makes a standardized directory of folders for the easyfulcrum run, taking a base directory (startdir) and the project name (projectdirname) as inputs.

Every collection project should be contained in its own directory. The directory name should follow the YearMonthPlace format used for Fulcrum collection projects, e.g. 2020JanuaryHawaii.

makeDirStructure(startdir = "~/Desktop",
                 projectdirname = "2020JanuaryHawaii")

The data directory contains the raw and processed subdirectories.
- raw/fulcrum holds the .csv files exported from Fulcrum and raw/fulcrum/photos contains .jpg files exported from Fulcrum.
- raw/annotate can hold spatial location files island.csv, location.csv, and trail.csv that the user generates for mapping the collection sites.
- processed/fulcrum holds easyfulcrum function outputs.

The reports directory holds easyfulcrum function outputs. These outputs will be generated by the user processing script(s) saved in the scripts directory.
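For orientation, the layout described above can be approximated with base R. This is only a sketch: the folder list is assembled from the descriptions in this vignette (plus data/processed/genotypes, which joinGenoFulc writes to), and the helper name sketch_dir_structure is hypothetical. The real makeDirStructure may create additional or differently named folders, so use it for actual projects.

```r
# Hypothetical approximation of the easyfulcrum project layout;
# use easyfulcrum::makeDirStructure() for real projects.
sketch_dir_structure <- function(startdir, projectdirname) {
  base <- file.path(startdir, projectdirname)
  # Subfolders named in this vignette (assumed, not an exhaustive list)
  subdirs <- c("data/raw/fulcrum/photos",
               "data/raw/annotate",
               "data/processed/fulcrum",
               "data/processed/genotypes",
               "reports",
               "scripts")
  for (d in file.path(base, subdirs)) {
    # recursive = TRUE also creates intermediate folders such as data/raw
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(base)
}

base <- sketch_dir_structure(tempdir(), "2020JanuaryHawaii")
list.dirs(base, full.names = FALSE)
```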

Following makeDirStructure, the user adds collection .csv and .jpg files into the appropriate subfolder locations.

We include example files from a small collection for use with this vignette. To copy these files into the project directory you just made, use the helper function loadExampleFiles. This function is not part of the normal easyfulcrum workflow. Note that startdir and projectdirname should be identical to the arguments passed to makeDirStructure above.

loadExampleFiles(startdir = "~/Desktop",
                 projectdirname = "2020JanuaryHawaii")

Exporting data from Fulcrum:

Before processing collection data with easyfulcrum, export the raw data from the Fulcrum database using the data export tool on the Fulcrum website. We recommend selecting the following checkboxes:
- the desired project
- include photos
- include GPS data
- field sampling
- isolation

Exporting with change sets is not currently supported.

After the data are exported, move the .csv files to [your project directory]/data/raw/fulcrum and the field sampling photos (.jpg) to [your project directory]/data/raw/fulcrum/photos.
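The move can also be scripted. The sketch below assumes the export has been unzipped into a single folder (export_dir, a hypothetical name) containing the .csv files, with photos anywhere beneath it; copy_fulcrum_export is a hypothetical helper, not part of easyfulcrum.

```r
# Hypothetical helper: copy a Fulcrum export into the easyfulcrum layout.
copy_fulcrum_export <- function(export_dir, project_dir) {
  csv_dest   <- file.path(project_dir, "data", "raw", "fulcrum")
  photo_dest <- file.path(csv_dest, "photos")
  dir.create(photo_dest, recursive = TRUE, showWarnings = FALSE)

  csvs <- list.files(export_dir, pattern = "\\.csv$", full.names = TRUE)
  jpgs <- list.files(export_dir, pattern = "\\.jpe?g$", ignore.case = TRUE,
                     full.names = TRUE, recursive = TRUE)

  # file.copy() returns one logical per file; all should be TRUE
  ok <- c(file.copy(csvs, csv_dest, overwrite = TRUE),
          file.copy(jpgs, photo_dest, overwrite = TRUE))
  invisible(all(ok))
}

# Toy export with one .csv and one (empty) photo file
export_dir <- file.path(tempdir(), "fulcrum_export")
dir.create(export_dir, showWarnings = FALSE)
writeLines("fulcrum_id,c_label",
           file.path(export_dir, "nematode_field_sampling.csv"))
file.create(file.path(export_dir, "photo1.jpg"))

project_dir <- file.path(tempdir(), "2020JanuaryHawaii")
copy_fulcrum_export(export_dir, project_dir)
```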

Reading, processing, and joining Fulcrum results:

The first group of functions cleans the results from the Fulcrum .csv files.

readFulcrum:

readFulcrum takes a dir argument specifying the project directory, whose data/raw/fulcrum folder holds the exported Fulcrum .csv files. The same dir value is reused by several later functions.
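Conceptually, reading an export amounts to loading every .csv file in data/raw/fulcrum into a named list of data frames, one per file. The sketch below assumes exactly that; read_fulcrum_sketch is hypothetical, and the real readFulcrum may differ (for example, in how it assigns column types).

```r
# Hypothetical sketch: read each Fulcrum .csv into a named list of data frames.
read_fulcrum_sketch <- function(dir) {
  csv_dir <- file.path(dir, "data", "raw", "fulcrum")
  files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
  out <- lapply(files, read.csv, stringsAsFactors = FALSE)
  # Name each element after its file, e.g. "nematode_field_sampling"
  names(out) <- tools::file_path_sans_ext(basename(files))
  out
}

# Toy project with a single export file
demo_dir <- file.path(tempdir(), "read_demo")
dir.create(file.path(demo_dir, "data", "raw", "fulcrum"),
           recursive = TRUE, showWarnings = FALSE)
write.csv(data.frame(fulcrum_id = "a1", c_label = "C-0001"),
          file.path(demo_dir, "data", "raw", "fulcrum",
                    "nematode_field_sampling.csv"),
          row.names = FALSE)
res <- read_fulcrum_sketch(demo_dir)
str(res)
```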

dir <- "~/Desktop/2020JanuaryHawaii"
raw_fulc <- readFulcrum(dir = dir)

procFulcrum:

procFulcrum processes individual data frames and adds flags for unexpected data.

proc_fulc <- procFulcrum(data = raw_fulc)
#> Processing nematode_field_sampling
#> Processing nematode_field_sampling_sample_photo
#> Processing nematode_isolation
#> Processing nematode_isolation_s_labeled_plates
#> Processing nematode_isolation_photos

checkTemperatures:

checkTemperatures identifies flags for three temperature variables. Setting the return_flags option to TRUE returns a list of three data frames, each containing only the rows where one of the three flag types appears. The function also prints the flagged rows automatically.

The procFulcrum function assumes that raw_substrate_temperature or raw_ambient_temperature values above 40 degrees were mistakenly entered in Fahrenheit rather than Celsius, and converts these values to Celsius (in the output below, a substrate reading of 65.3 becomes 18.5). It also flags cases where both raw_ambient_temperature and raw_ambient_humidity are stuck on the same value for 5 or more measurements in a row. These are the three flags that checkTemperatures reports.
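Both rules can be sketched in a few lines of base R. This is an illustration, not easyfulcrum's code: the function names are hypothetical, and the run rule assumes rows are sorted by collection time and that the run length counts the first (unflagged) reading. The example values are the processed temperatures and humidities from the checkTemperatures output below.

```r
# Rule 1: readings above 40 are assumed to be Fahrenheit and converted to Celsius.
to_celsius_if_high <- function(x, threshold = 40) {
  ifelse(!is.na(x) & x > threshold, (x - 32) * 5 / 9, x)
}

# Rule 2: flag readings whose temperature AND humidity repeat the previous
# reading, within runs of at least `min_run` identical readings. The first
# reading of each run is not flagged.
flag_stuck_runs <- function(temp, humidity, min_run = 5) {
  key <- paste(temp, humidity)                 # one key per temp/humidity pair
  runs <- rle(key)                             # consecutive identical keys
  run_id  <- rep(seq_along(runs$lengths), runs$lengths)
  run_len <- rep(runs$lengths, runs$lengths)
  pos_in_run <- ave(seq_along(key), run_id, FUN = seq_along)
  run_len >= min_run & pos_in_run > 1 & !is.na(temp) & !is.na(humidity)
}

round(to_celsius_if_high(c(65.3, 71.2, 16.9)), 1)
#> [1] 18.5 21.8 16.9

temp <- c(16.9, 23.3, 21.8, 16.7, 17.4, 17.4, 17.4, 17.4,
          17.4, 17.4, 17.4, 15.8, 15.3, 15.3, 15.3)
hum  <- c(85.8, 65.3, 77.9, 85.9, 80.0, 80.0, 80.0, 80.0,
          80.0, 80.0, 87.0, 90.2, 91.2, 90.2, 87.8)
which(flag_stuck_runs(temp, hum))
#> [1]  6  7  8  9 10
```

The flagged rows match the flag_ambient_temperature_run = TRUE rows in the output below.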

flag_temp <- checkTemperatures(data = proc_fulc, return_flags = TRUE)
#> >>> Checking substrate temperature
#> [1] "There are 1 rows with flagged substrate temperature:"
#>                             fulcrum_id raw_substrate_temperature
#> 1 a7db618d-44cc-4b4a-bc67-871306029274                      65.3
#>   proc_substrate_temperature
#> 1                       18.5
#> >>> Checking ambient temperature
#> [1] "There are 1 rows with flagged ambient temperature:"
#>                             fulcrum_id raw_ambient_temperature
#> 1 b1f20ae4-c5c2-426f-894a-e1f46c2fa693                    71.2
#>   proc_ambient_temperature
#> 1                     21.8
#> >>> Checking ambient run temperature
#> [1] "There are 5 rows with flagged ambient run temperature:"
#>                              fulcrum_id c_label raw_ambient_temperature
#> 1  854b40c4-ac6a-47d1-a737-cf611aa94268  C-5115                    16.9
#> 2  de2ae06e-cfff-44bc-869e-8dd9205a65c6  C-5139                    23.3
#> 3  b1f20ae4-c5c2-426f-894a-e1f46c2fa693  C-5121                    71.2
#> 4  0a049170-e63c-4545-8b02-04bc08a5ea54  C-5117                    16.7
#> 5  34848a83-5951-4136-b632-d39aa85c76d3  C-5120                    17.4
#> 6  dda77efe-d73c-48e9-aefb-b508e613256b  C-5122                    17.4
#> 7  93de14a0-40ab-4793-8614-ab1512ab158c  C-5118                    17.4
#> 8  216cb71a-6470-46eb-950d-366ac3180498  C-5123                    17.4
#> 9  920a6a56-7a29-47f4-afce-a5f83787d639  C-5131                    17.4
#> 10 6b25c113-4bb6-4bc5-9473-eca1f8075d10  C-5132                    17.4
#> 11 e38a295e-7595-4227-887b-74e1514c71bc  C-5133                    17.4
#> 12 a4afcda9-2d13-4f7f-991a-2299cb3c5a3c  C-5135                    15.8
#> 13 c0e68d42-5180-4fda-88e8-7721bbc8c96d  C-5134                    15.3
#> 14 4b7c21ea-2987-49d8-88c3-951417de4965  C-5130                    15.3
#> 15 2922c799-0dd7-499c-b674-ea99c72e5ea7  C-5134                    15.3
#>    proc_ambient_temperature ambient_humidity flag_ambient_temperature_run
#> 1                      16.9             85.8                        FALSE
#> 2                      23.3             65.3                        FALSE
#> 3                      21.8             77.9                        FALSE
#> 4                      16.7             85.9                        FALSE
#> 5                      17.4             80.0                        FALSE
#> 6                      17.4             80.0                         TRUE
#> 7                      17.4             80.0                         TRUE
#> 8                      17.4             80.0                         TRUE
#> 9                      17.4             80.0                         TRUE
#> 10                     17.4             80.0                         TRUE
#> 11                     17.4             87.0                        FALSE
#> 12                     15.8             90.2                        FALSE
#> 13                     15.3             91.2                        FALSE
#> 14                     15.3             90.2                        FALSE
#> 15                     15.3             87.8                        FALSE
#>    collection_local_time collection_datetime_UTC
#> 1                  42660     2020-01-19 21:51:26
#> 2                  43080     2020-01-19 21:58:02
#> 3                  43140     2020-01-19 21:59:02
#> 4                  43320     2020-01-19 22:02:01
#> 5                  44100     2020-01-19 22:15:15
#> 6                  45780     2020-01-19 22:43:04
#> 7                  45960     2020-01-19 22:46:11
#> 8                  46440     2020-01-19 22:54:01
#> 9                  38520     2020-01-22 20:42:55
#> 10                 39000     2020-01-22 20:50:48
#> 11                 40320     2020-01-22 21:12:29
#> 12                 45420     2020-01-22 22:37:28
#> 13                 45540     2020-01-22 22:39:10
#> 14                 47400     2020-01-22 23:10:25
#> 15                 51780     2020-01-23 00:23:02

fixTemperatures:

fixTemperatures takes (a) the fulcrum_ids whose temperatures should be reverted to their original values because readings above 40 degrees really were in Celsius, for both substrate temperatures (substrate_temperature_ids) and ambient temperatures (ambient_temperature_ids), and (b) the fulcrum_ids whose humidity and temperature readings should be set to NA because of a stuck measurement device (ambient_temperature_run_ids). In the example below, we set all rows with flag_ambient_temperature_run = TRUE to NA. Note that the first observation of a repeated humidity and temperature value is not flagged as part of the run.

proc_fulc_clean <- fixTemperatures(data = proc_fulc,
                                   substrate_temperature_ids = "a7db618d-44cc-4b4a-bc67-871306029274",
                                   ambient_temperature_ids = "b1f20ae4-c5c2-426f-894a-e1f46c2fa693",
                                   ambient_temperature_run_ids=c("dda77efe-d73c-48e9-aefb-b508e613256b",
                                                                 "93de14a0-40ab-4793-8614-ab1512ab158c",
                                                                 "216cb71a-6470-46eb-950d-366ac3180498",
                                                                 "920a6a56-7a29-47f4-afce-a5f83787d639",
                                                                 "6b25c113-4bb6-4bc5-9473-eca1f8075d10"))

The flag variables are kept, so rerunning checkTemperatures on the cleaned data lets you confirm that the corrections were applied as intended.
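The two kinds of correction can be illustrated on a toy data frame. This is a hypothetical sketch, not fixTemperatures itself. The column names follow the checkTemperatures output above, but which humidity column the real function blanks, and whether it touches other columns, are assumptions.

```r
# Hypothetical sketch of the corrections described for fixTemperatures.
fix_temps_sketch <- function(df, substrate_ids = character(0),
                             ambient_ids = character(0),
                             run_ids = character(0)) {
  # (a) Readings above 40 that really were Celsius: restore the raw value
  s <- df$fulcrum_id %in% substrate_ids
  df$proc_substrate_temperature[s] <- df$raw_substrate_temperature[s]
  a <- df$fulcrum_id %in% ambient_ids
  df$proc_ambient_temperature[a] <- df$raw_ambient_temperature[a]
  # (b) Stuck sensor: blank out both temperature and humidity
  r <- df$fulcrum_id %in% run_ids
  df$proc_ambient_temperature[r] <- NA
  df$ambient_humidity[r] <- NA
  df
}

df <- data.frame(
  fulcrum_id = c("id1", "id2", "id3"),
  raw_substrate_temperature  = c(65.3, 18.0, 17.0),
  proc_substrate_temperature = c(18.5, 18.0, 17.0),
  raw_ambient_temperature    = c(20.0, 17.4, 17.4),
  proc_ambient_temperature   = c(20.0, 17.4, 17.4),
  ambient_humidity           = c(70.0, 80.0, 80.0),
  stringsAsFactors = FALSE
)
fixed <- fix_temps_sketch(df, substrate_ids = "id1", run_ids = "id3")
fixed
```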

joinFulcrum:

joinFulcrum joins the Fulcrum dataframes. It first joins the processed field sampling dataframe to the processed isolation dataframe by unique collection label (c_label). It then selects a “best” photo for each C-label, based on whether a matching photo ID exists in the processed field sampling sample photo dataframe, and joins that dataframe to the combined data as well. Finally, the processed isolation S-labeled plates dataframe is joined to the result by isolation ID (s_label).
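The first step, a join on c_label, can be illustrated with base R's merge(). This toy sketch (hypothetical tables and values, not joinFulcrum's implementation) uses a left join that keeps every collection. It shows why checkJoin later looks for duplicated and missing isolation records: a C-label with two isolation records appears twice, and a C-label with none gets NA isolation fields.

```r
# Toy stand-ins for the processed field sampling and isolation data frames
field <- data.frame(c_label   = c("C-0001", "C-0002", "C-0003"),
                    substrate = c("leaf", "fruit", "soil"),
                    stringsAsFactors = FALSE)
isolation <- data.frame(c_label      = c("C-0001", "C-0002", "C-0002"),
                        isolation_by = c("A", "B", "B"),
                        stringsAsFactors = FALSE)

# Left join: keep every collection, even ones without an isolation record
joined <- merge(field, isolation, by = "c_label", all.x = TRUE)
joined
```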

If you are using easyfulcrum with customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, we recommend setting select_vars to FALSE so that joinFulcrum returns all variables rather than only the default set.

join_fulc <- joinFulcrum(data = proc_fulc, select_vars = TRUE)
#> Attempting to join:
#> proc_fulc$field_sampling_proc
#> proc_fulc$field_sampling_sample_photo_proc
#> proc_fulc$isolation_proc
#> proc_fulc$isolation_s_labeled_plates_proc
#> proc_fulc$isolation_photos_proc
#> Complete fulcrum data detected, joining all data.
#> returning joined and selected data, set select_vars to FALSE if variables are missing

checkJoin:

checkJoin runs 10 checks on the joined Fulcrum data frame: variable classes; duplicated C-labels; unusual sample photo numbers; duplicated or missing isolation records for a C-label; extreme substrate temperatures, ambient temperatures, and collection altitudes; and missing or duplicated S-labels. The return_flags option works as in checkTemperatures, and the function automatically prints the rows where flags occur. If needed, the user can edit values or correct mistakes in the underlying data based on these flags and then re-run the pipeline.
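Two of the label checks can be sketched in base R. Note that duplicated() alone marks only the second and later copies, so it is combined with a fromLast pass to report every copy, as the output below does for C-5134. The helper names are hypothetical, and the expected C-label pattern ("C-" followed by digits) is an assumption based on the labels in this vignette.

```r
# TRUE for every copy of a repeated label, ignoring missing values
flag_duplicated <- function(x) {
  !is.na(x) & (duplicated(x) | duplicated(x, fromLast = TRUE))
}

# TRUE for labels that are missing or do not look like "C-<digits>" (assumed)
flag_improper_c_label <- function(x) {
  is.na(x) | !grepl("^C-[0-9]+$", x)
}

c_labels <- c("C-5133", "C-5134", "C-5134", "c5135")
c_labels[flag_duplicated(c_labels)]
#> [1] "C-5134" "C-5134"
c_labels[flag_improper_c_label(c_labels)]
#> [1] "c5135"
```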

flag_join <- checkJoin(data = join_fulc, return_flags = TRUE)
#> >>> Checking data classes
#> [1] "There are 1 improperly classified variables"
#> [1] "substrate_other"
#> Improperly classified variables may require manipulation after read-in.
#> See easyfulcrum::fulcrumTypes for expected classes
#> >>> Checking duplicated c labels
#> [1] "There are 2 rows with duplicated c labels, these c labels are:"
#> [1] "C-5134" "C-5134"
#> [1] "Duplicated c labels are found in the ...field_sampling.csv"
#> >>> Checking unusual sample photo number
#> [1] "There are 6 rows with unusual sample photo numbers, their c labels are:"
#> [1] "C-5121" "C-5121" "C-5121" "C-5121" "C-5121" "C-5121"
#> [1] "Unusual sample photo number are found in the ...field_sampling.csv"
#> >>> Checking duplicated isolation for c label
#> [1] "There are 4 rows with duplicated isolation for c label, their c labels are:"
#> [1] "C-5130" "C-5130" "C-5130" "C-5130"
#> [1] "Duplicated isolation for c label are found in the ...isolation.csv"
#> >>> Checking missing isolation records
#> [1] "There are 4 rows with missing isolation records, their c labels are:"
#> [1] "C-5130" "C-5130" "C-5130" "C-5133"
#> [1] "Missing isolation records are found in the ...isolation.csv"
#> >>> Checking extreme substrate temperatures
#> [1] "There are 6 rows with extreme substrate temperatures, their c labels are:"
#> [1] "C-5121" "C-5121" "C-5121" "C-5121" "C-5121" "C-5121"
#> [1] "Extreme substrate temperatures are found in the ...field_sampling.csv"
#> >>> Checking extreme ambient temperatures
#> [1] "There are 3 rows with extreme ambient temperatures, their c labels are:"
#> [1] "C-5126" "C-5126" "C-5126"
#> [1] "Extreme ambient temperatures are found in nematode_field_sampling.csv"
#> >>> Checking extreme collection altitude
#> [1] "There are 1 rows with extreme collection altitudes, their c labels are:"
#> [1] "C-5134"
#> [1] "Extreme collection altitudes are found in nematode_field_sampling.csv"
#> >>> Checking missing s labels
#> [1] "There are 1 rows with missing s labels, their c labels are:"
#> [1] "C-5115"
#> [1] "Missing s labels are found in nematode_isolation_s_labeled_plates.csv"
#> >>> Checking duplicated s labels
#> [1] "There are 2 rows with duplicated s labels, their s labels are:"
#> [1] "S-12776" "S-12776"
#> [1] "Duplicated s labels are found in nematode_isolation_s_labeled_plates.csv"

annotateFulcrum:

annotateFulcrum adds spatial information to the joined Fulcrum data frame, noting whether samples were collected on specific islands, trails, or locations. Example islands, trails, and locations from Hawaii are loaded automatically with the package. To use your own, place manually made island.csv, location.csv, and trail.csv files in data/raw/annotate and set dir in annotateFulcrum to the project base directory; these files override the example files.

The hawaii_islands and hawaii_locations dataframes define bounding boxes using simple start and end latitudes and longitudes. hawaii_trails is a character list of GeoJSON polygon points, such as the GeoJSON output of an online bounding-box tool.
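A bounding-box annotation reduces to a range test on latitude and longitude. In the sketch below, the box names, coordinates, and column names (lat_start, lat_end, lon_start, lon_end) are made up and may not match the packaged tables, and in_box is a hypothetical helper. Trail polygons would need a true point-in-polygon test (for example, from the sf package) rather than this simple check.

```r
# Made-up bounding boxes; column names are assumptions
boxes <- data.frame(
  name      = c("Box A", "Box B"),
  lat_start = c(19.0, 21.0),   lat_end = c(20.5, 21.8),
  lon_start = c(-156.1, -158.3), lon_end = c(-154.8, -157.6),
  stringsAsFactors = FALSE
)

# Name of the first box containing each point, or NA if none does.
# pmin/pmax make the test independent of which corner is listed as "start".
in_box <- function(lat, lon, boxes) {
  vapply(seq_along(lat), function(i) {
    hit <- which(lat[i] >= pmin(boxes$lat_start, boxes$lat_end) &
                 lat[i] <= pmax(boxes$lat_start, boxes$lat_end) &
                 lon[i] >= pmin(boxes$lon_start, boxes$lon_end) &
                 lon[i] <= pmax(boxes$lon_start, boxes$lon_end))
    if (length(hit) == 0) NA_character_ else boxes$name[hit[1]]
  }, character(1))
}

res <- in_box(lat = c(19.6, 21.4, 0), lon = c(-155.5, -157.9, 0), boxes)
res
```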

If you are using easyfulcrum with customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, we recommend setting select_vars to FALSE so that annotateFulcrum returns all variables rather than only the default set.

anno_fulc <- annotateFulcrum(data = join_fulc, dir = NULL, select_vars = TRUE)

Reading, processing, and joining the genotyping Google Sheet:

This second group of functions cleans the results from a project-specific Google Sheet that contains genotyping results.

Because easyFulcrum was originally built for processing nematode samples, we provide a profile parameter that switches the functions between the nematode-specific “nematode” profile and the more flexible, non-nematode-specific “general” profile.

Making a project specific “genotyping sheet”:

Users who want to use the general profile can use our “general” genotyping sheet template.

Users who want to use the nematode profile can use our “nematode” genotyping sheet template.

Details on how to fill out a “nematode” genotyping sheet can be found in the Nematode Collection Protocol; look for “wild_isolate_genotyping_template”.

readGenotypes:

readGenotypes reads genotyping data from a Google Sheet identified by its key (gsKey). The col_types argument specifies the class of each column. Additional columns can be added to either genotyping template if desired.
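The col_types string uses googlesheets4's one-letter shorthand, with one letter per column. The hypothetical helper below decodes the most common letters so a string can be checked against a sheet's columns. In the general example, "cccdcc" describes six columns, matching the six columns of raw_geno_general, with possible_new_sp as the only double. Only common codes are covered; consult the googlesheets4 documentation for the full list.

```r
# Decode a googlesheets4-style col_types string into readable class names.
# Hypothetical helper; covers only common codes.
decode_col_types <- function(col_types) {
  codes <- c(c = "character", d = "double", D = "Date", i = "integer",
             l = "logical", T = "datetime", t = "time",
             "?" = "guess", "_" = "skip", "-" = "skip")
  letters_in <- strsplit(col_types, "")[[1]]
  unname(codes[letters_in])   # unknown letters come back as NA
}

general_cols <- c("project_id", "s_label", "species_id",
                  "possible_new_sp", "strain_name", "notes")
setNames(decode_col_types("cccdcc"), general_cols)
```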

For more details on reading genotyping sheets, including how to specify col_types, see the documentation for the googlesheets4 package, which underlies this function.

# read example data from the "general" genotyping template 
raw_geno_general <- readGenotypes(gsKey = c("1aXH-8UDvFVddl7JA-R2y8QOcNxESpmseZGjheM8hDro"), col_types = "cccdcc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_general_genotyping_vignette.
#> ✓ Range ''genotyping template''.
head(raw_geno_general)
#> # A tibble: 6 × 6
#>   project_id        s_label species_id         possible_new_sp strain_name notes
#>   <chr>             <chr>   <chr>                        <dbl> <chr>       <chr>
#> 1 2020JanuaryHawaii S-12607 Caenorhabditis br…              NA ECA2499     NA   
#> 2 2020JanuaryHawaii S-12608 Caenorhabditis br…              NA ECA2501     NA   
#> 3 2020JanuaryHawaii S-12634 NA                              NA NA          NA   
#> 4 2020JanuaryHawaii NA      NA                              NA NA          NA   
#> 5 2020JanuaryHawaii S-12652 NA                              NA NA          NA   
#> 6 2020JanuaryHawaii S-12664 NA                              NA NA          NA

# read example data from the "nematode" genotyping template 
raw_geno_nema <- readGenotypes(gsKey = c("1eviRoe0NyIEkIexM6c_oTVTX6U4ndPx-hkXJvkjhqWM"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_nematode_genotyping_vignette.
#> ✓ Range ''genotyping template''.

checkGenotypes:

checkGenotypes both processes the genotyping data (adding flags) and, if desired, returns information on those flags. Setting the profile parameter to general checks for unexpected or improper use of S-labels, including missing isolations, improper S-label names, duplicated S-labels, and S-labels present in only one of the genotyping and Fulcrum data. Setting profile to nematode adds nematode-specific checks, including unusual species IDs, strain names, and proliferation values, and whether ITS2 genotypes are missing when expected. Setting return_geno to TRUE and return_flags to FALSE returns the processed genotyping data and prints information on the flags. Setting return_geno to FALSE and return_flags to TRUE returns a list of data frames detailing the rows where the flags appear. The two options cannot both be TRUE.
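The general-profile S-label checks map onto a few base R set and pattern operations. This sketch uses toy labels rather than easyfulcrum's code, and the expected pattern ("S-" followed by digits) is an assumption based on the labels in this vignette. Missing labels are dropped before the set comparison so that NA is not reported as a label absent from Fulcrum.

```r
# Toy labels: genotyping sheet vs. Fulcrum data
geno_s <- c("S-0001", NA, "S-0003", "S-0003", "R-0004", "S-0009")
fulc_s <- c("S-0001", "S-0003", "S-0005")

present <- geno_s[!is.na(geno_s)]
missing_s    <- sum(is.na(geno_s))                        # blank S-labels
duplicated_s <- present[duplicated(present) |
                        duplicated(present, fromLast = TRUE)]
unusual_s    <- present[!grepl("^S-[0-9]+$", present)]    # assumed pattern
not_in_fulc  <- setdiff(present, fulc_s)   # genotyped but not in Fulcrum
not_in_geno  <- setdiff(fulc_s, geno_s)    # in Fulcrum but not genotyped
```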

proc_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc, 
                                  return_geno = TRUE, return_flags = FALSE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"

proc_geno_nema <- checkGenotypes(geno_data = raw_geno_nema, fulc_data = anno_fulc, 
                                  return_geno = TRUE, return_flags = FALSE, profile = "nematode")
#> Using "nematode" profile:
#> >>> Checking data classes
#> [1] "There are 0 improperly classified variables"
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"
#> >>> Checking genotyping process
#> [1] "There are 0 rows missing expected proliferation data, these s labels are:"
#> [1] "There are 1 rows missing expected its2 genotype, these s labels are:"
#> [1] "S-12652"
#> [1] "There are 1 rows missing expected species_id, these s labels are:"
#> [1] "S-12668"
#> [1] "There are 1 rows with unusual target species names, these names are:"
#> [1] "O. myriophilus"
#> [1] "There are 1 rows missing expected strain_name, these s labels are:"
#> [1] "S-12775"

flag_geno_general <- checkGenotypes(geno_data = raw_geno_general, fulc_data = anno_fulc, 
                          return_geno = FALSE, return_flags = TRUE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 1 rows with missing s labels, these s labels are:"
#> [1] NA
#> [1] "There are 2 rows with duplicated s labels, these s labels are:"
#> [1] "S-12664" "S-12664"
#> [1] "There are 1 rows with unusual s labels, these s labels are:"
#> [1] "R-12666"
#> [1] "There are 3 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "S-12652" "R-12666" "S-12688"
#> [1] "There are 3 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> [1] "S-12666" "S-12662" "S-12663"

Based on these flags, “fixed” genotyping sheets were made that eliminate rows with blank S-labels, duplicated S-labels, and so on. The “fixed” data are read in below, and checkGenotypes is re-run on them.

# general example
raw_geno_general_fixed <- readGenotypes(gsKey = c("1AcovAEfQIF46PigrrM_D2QPOpdTQM-YndlFo4MmsQoc"), col_types = "cccdcc")
#> ✓ Reading from 2020JanuaryHawaii_easyFulcrum_general_genotyping_vignette_fixed.
#> ✓ Range ''genotyping template''.

proc_geno_general_fixed <- checkGenotypes(geno_data = raw_geno_general_fixed, fulc_data = anno_fulc,
                            return_geno = TRUE, return_flags = FALSE, profile = "general")
#> Using "general" profile, target_sp parameter will not be used:
#> >>> Checking s labels
#> [1] "There are 0 rows with missing s labels, these s labels are:"
#> [1] "There are 0 rows with duplicated s labels, these s labels are:"
#> [1] "There are 0 rows with unusual s labels, these s labels are:"
#> [1] "There are 0 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "There are 0 s labels in the Fulcrum data not in the genotyping data, these s labels are:"

# nematode example
raw_geno_nema_fixed <- readGenotypes(gsKey = c("1WaOsAU0Pmf_rOp9BoGmDeYMmyENw0gBppdllfedLG9s"), col_types = "cDDdcdcddddddDcDDdcdcdddddddcdcccddccc")
#> ✓ Reading from
#>   2020JanuaryHawaii_easyFulcrum_nematode_genotyping_vignette_fixed.
#> ✓ Range ''genotyping template''.

proc_geno_nema_fixed <- checkGenotypes(geno_data = raw_geno_nema_fixed, fulc_data = anno_fulc, 
                                 return_geno = TRUE, return_flags = FALSE, profile = "nematode")
#> Using "nematode" profile:
#> >>> Checking data classes
#> [1] "There are 0 improperly classified variables"
#> >>> Checking s labels
#> [1] "There are 0 rows with missing s labels, these s labels are:"
#> [1] "There are 0 rows with duplicated s labels, these s labels are:"
#> [1] "There are 0 rows with unusual s labels, these s labels are:"
#> [1] "There are 0 rows with s labels not found in the Fulcrum data, these s labels are:"
#> [1] "There are 0 s labels in the Fulcrum data not in the genotyping data, these s labels are:"
#> >>> Checking genotyping process
#> [1] "There are 0 rows missing expected proliferation data, these s labels are:"
#> [1] "There are 0 rows missing expected its2 genotype, these s labels are:"
#> [1] "There are 0 rows missing expected species_id, these s labels are:"
#> [1] "There are 0 rows with unusual target species names, these names are:"
#> [1] "There are 0 rows missing expected strain_name, these s labels are:"

joinGenoFulc:

joinGenoFulc joins the annotated Fulcrum data frame with the processed genotyping data. If dir is set to the project base directory, it also saves the processed genotyping data in data/processed/genotypes.

If you are using easyfulcrum with customized Fulcrum applications other than “Nematode field sampling” and “Nematode isolation”, we recommend setting select_vars to FALSE so that joinGenoFulc returns all variables rather than only the default set.

# general example
join_genofulc_general <- joinGenoFulc(geno = proc_geno_general_fixed, fulc = anno_fulc, dir = dir, select_vars = TRUE)
#> Joining, by = "s_label"
#> returning selected data, set select_vars to FALSE if variables are missing

# nematode example
join_genofulc_nema <- joinGenoFulc(geno = proc_geno_nema_fixed, fulc = anno_fulc, dir = NULL, select_vars = TRUE)
#> Joining, by = "s_label"
#> returning selected data, set select_vars to FALSE if variables are missing

Reading, resizing, and joining collection images:

The final function processes and resizes images, adding details to a final dataframe.

procPhotos:

procPhotos copies the raw sample photos, renames them by C-label, and places the renamed files in a new directory, data/processed/fulcrum/photos. It also makes thumbnails for use with interactive maps and places them in data/processed/fulcrum/photos/thumbnails. Setting the CeNDR option to TRUE renames photos of samples meeting CeNDR criteria with the names of the nematode strains isolated from them and places these in data/processed/fulcrum/photos/CeNDR.
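The copy-and-rename step can be sketched with file.copy(). The sketch assumes a lookup table linking each exported photo file to its C-label. How procPhotos actually builds that link is not described here, so photo_map, the file names, and rename_photos_sketch are all hypothetical. Thumbnail resizing would additionally need an image library and is not shown.

```r
# Hypothetical sketch: copy raw photos under C-label names.
rename_photos_sketch <- function(photo_map, raw_dir, out_dir) {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  from <- file.path(raw_dir, photo_map$photo_file)
  to   <- file.path(out_dir, paste0(photo_map$c_label, ".jpg"))
  # Copy (not move) so the raw export stays untouched
  file.copy(from, to, overwrite = TRUE)
}

raw_dir <- file.path(tempdir(), "photos_raw")
dir.create(raw_dir, showWarnings = FALSE)
file.create(file.path(raw_dir, c("abc123.jpg", "def456.jpg")))
photo_map <- data.frame(photo_file = c("abc123.jpg", "def456.jpg"),
                        c_label    = c("C-0001", "C-0002"),
                        stringsAsFactors = FALSE)
out_dir <- file.path(tempdir(), "photos_out")
rename_photos_sketch(photo_map, raw_dir, out_dir)
```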

The function also accepts a public URL (pub_url) for hosting the sample photos renamed by C-label. A compatible public URL should follow the pub_url/Project/sampling_thumbs/C-label.jpg format. For example, if the full URL for C-5133 is https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/2020JanuaryHawaii/sampling_thumbs/C-5133.jpg, pub_url should be set to https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/. The function fills in the project name, “sampling_thumbs”, the C-label, and the file extension.
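The URL format above can be expressed as a simple paste0() call. thumb_url is a hypothetical helper that reproduces the documented pub_url/Project/sampling_thumbs/C-label.jpg pattern. It is not an easyfulcrum function.

```r
# Assemble a photo URL following the pub_url/Project/sampling_thumbs/C-label.jpg format
thumb_url <- function(pub_url, project, c_label, ext = ".jpg") {
  paste0(pub_url, project, "/sampling_thumbs/", c_label, ext)
}

url <- thumb_url(
  "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
  "2020JanuaryHawaii", "C-5133")
url
```

Because this sketch pastes the pieces together directly, pub_url needs its trailing slash, as in the example.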

We’ve included example code for each profile. Note that rerunning procPhotos with the overwrite parameter set to TRUE overwrites existing output files.


final_data_general <- procPhotos(dir = dir, data = join_genofulc_general,
                         max_dim = 500, overwrite = TRUE,
                         pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
                         CeNDR = TRUE)
head(final_data_general)
#> # A tibble: 6 × 75
#>   project           c_label s_label species_id strain_name collection_by        
#>   <chr>             <chr>   <chr>   <chr>      <chr>       <chr>                
#> 1 2020JanuaryHawaii C-5115  S-12695 NA         NA          dec@u.northwestern.e…
#> 2 2020JanuaryHawaii C-5115  NA      NA         NA          dec@u.northwestern.e…
#> 3 2020JanuaryHawaii C-5115  S-12706 NA         NA          dec@u.northwestern.e…
#> 4 2020JanuaryHawaii C-5115  S-12697 NA         NA          dec@u.northwestern.e…
#> 5 2020JanuaryHawaii C-5115  S-12776 NA         NA          dec@u.northwestern.e…
#> 6 2020JanuaryHawaii C-5115  S-12776 NA         NA          dec@u.northwestern.e…
#> # … with 69 more variables: collection_datetime_UTC <dttm>,
#> #   collection_date_UTC <date>, collection_local_time <dbl>,
#> #   collection_fulcrum_latitude <dbl>, collection_fulcrum_longitude <dbl>,
#> #   exif_gps_latitude <dbl>, exif_gps_longitude <dbl>,
#> #   collection_latitude <dbl>, collection_longitude <dbl>,
#> #   collection_lat_long_method <chr>, collection_lat_long_method_diff <dbl>,
#> #   fulcrum_altitude <dbl>, exif_gps_altitude <dbl>, …

final_data_nema <- procPhotos(dir = dir, data = join_genofulc_nema,
                         max_dim = 500, overwrite = TRUE,
                         pub_url = "https://storage.googleapis.com/elegansvariation.org/photos/isolation/fulcrum/",
                         CeNDR = TRUE)
head(final_data_nema)
#> # A tibble: 6 × 89
#>   project           c_label s_label species_id strain_name collection_by        
#>   <chr>             <chr>   <chr>   <chr>      <chr>       <chr>                
#> 1 2020JanuaryHawaii C-5115  S-12695 NA         NA          dec@u.northwestern.e…
#> 2 2020JanuaryHawaii C-5115  NA      NA         NA          dec@u.northwestern.e…
#> 3 2020JanuaryHawaii C-5115  S-12706 NA         NA          dec@u.northwestern.e…
#> 4 2020JanuaryHawaii C-5115  S-12697 NA         NA          dec@u.northwestern.e…
#> 5 2020JanuaryHawaii C-5115  S-12776 NA         NA          dec@u.northwestern.e…
#> 6 2020JanuaryHawaii C-5115  S-12776 NA         NA          dec@u.northwestern.e…
#> # … with 83 more variables: collection_datetime_UTC <dttm>,
#> #   collection_date_UTC <date>, collection_local_time <dbl>,
#> #   collection_fulcrum_latitude <dbl>, collection_fulcrum_longitude <dbl>,
#> #   exif_gps_latitude <dbl>, exif_gps_longitude <dbl>,
#> #   collection_latitude <dbl>, collection_longitude <dbl>,
#> #   collection_lat_long_method <chr>, collection_lat_long_method_diff <dbl>,
#> #   fulcrum_altitude <dbl>, exif_gps_altitude <dbl>, …

Generating summary and output files:

We include two functions for generating summaries of a finalized collection project. Otherwise, the final dataframes can be used however the user needs.

makeSpSheet:

makeSpSheet generates a .csv file for a species of interest (target_sp) and writes it to the /reports subdirectory. It simplifies the final dataframe by pulling out variables of particular interest for that species. Because it is written to produce output that meets the specifications for submitting wild nematode collections to the Caenorhabditis Natural Diversity Resource (CaeNDR), makeSpSheet is likely only applicable to nematode sampling projects.

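makeSpSheet also returns a dataframe with flags for the selected strains and prints a description of each flag. The flags identify strains whose sampled_by or isolated_by fields contain an email address, strains whose species name is not in the target species list, and strains with an unusual substrate or landscape class.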

# general example
sp_sheet_general <- makeSpSheet(data = final_data_general, target_sp = "Caenorhabditis briggsae", dir = dir)
#> [1] "There are 9 strains with an email address for sampled_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 9 strains with an email address for isolated_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 0 strains with a species name not in the target species list:"
#> [1] "There are 0 strains with an unusual substrate class:"
#> [1] "There are 0 strains with an unusual landscape class:"

# nematode example
sp_sheet_nema <- makeSpSheet(data = final_data_nema, target_sp = "Caenorhabditis briggsae", dir = dir)
#> [1] "There are 9 strains with an email address for sampled_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 9 strains with an email address for isolated_by:"
#> [1] "ECA2499" "ECA2501" "ECA2503" "ECA2505" "ECA2507" "ECA2509" "ECA2511"
#> [8] "ECA2513" "ECA2515"
#> [1] "There are 0 strains with a species name not in the target species list:"
#> [1] "There are 0 strains with an unusual substrate class:"
#> [1] "There are 0 strains with an unusual landscape class:"
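The email-address flag reported above can be approximated in a few lines of base R. The sketch below is a hypothetical illustration, not makeSpSheet's code: flag_email is an invented helper, the strain names and email address are toy values, and treating any value containing "@" as an email address is a simplifying assumption.

```r
# Hypothetical helper: return strains whose person field looks like an email address
flag_email <- function(strain, person) {
  # Simplifying assumption: any non-missing value containing "@" is an email address
  has_email <- !is.na(person) & grepl("@", person, fixed = TRUE)
  strain[has_email]
}

# Toy inputs (hypothetical values)
strain_name <- c("XX0001", "XX0002", "XX0003")
sampled_by  <- c("A. Collector", "collector@example.org", NA)

flagged <- flag_email(strain_name, sampled_by)
print(sprintf("There are %d strains with an email address for sampled_by:",
              length(flagged)))
print(flagged)
```

Flagged strains can then be corrected in the source data so that names, rather than account emails, appear in the final sheet.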

generateReport:

The generateReport function creates an interactive overview of the entire sampling project. It saves a file named sampleReport.Rmd into the /scripts subdirectory and a rendered sampleReport.html file into the /reports subdirectory (when profile = "general", the rendered file is named sampleReport_general.html). The HTML report can be viewed in any web browser and includes an overview of the collection project (such as who conducted each step and on what dates it was completed), summary tables of collection and isolation data, interactive maps of where the project's collections were made, and box plots showing the distributions of environmental parameters at all collection sites. These parameters include substrate temperature, ambient temperature, humidity, and elevation.

Feel free to edit sampleReport.Rmd in the /scripts subdirectory to suit your project.

The profile parameter switches between nematode-specific reports (profile = "nematode") and general, non-nematode reports (profile = "general").
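Because the two profiles write differently named HTML files, a small helper can locate the right report after generateReport has run. This is a hypothetical sketch: report_html_path is not part of easyfulcrum, and the file names are an assumption based on the rendering output recorded when these examples were built (general profile: sampleReport_general.html; nematode profile: sampleReport.html). Check your own /reports subdirectory before relying on them.

```r
# Hypothetical helper: expected path of the rendered report for a given profile
report_html_path <- function(dir, profile) {
  profile <- match.arg(profile, c("nematode", "general"))
  file_name <- if (profile == "general") "sampleReport_general.html" else "sampleReport.html"
  file.path(dir, "reports", file_name)
}

print(report_html_path("2020JanuaryHawaii", "general"))
print(report_html_path("2020JanuaryHawaii", "nematode"))

# In an interactive session, the report could then be opened with:
# utils::browseURL(report_html_path(dir, "nematode"))
```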

# general example
generateReport(data = final_data_general, dir = dir, target_sp = c("O. myriophilus", "Caenorhabditis briggsae"),
               profile = "general")

# nematode example
generateReport(data = final_data_nema, dir = dir, profile = "nematode")
#> Warning: Unknown levels in `f`: Caenorhabditis elegans, Caenorhabditis
#> tropicalis

#> Warning: Unknown levels in `f`: Caenorhabditis elegans, Caenorhabditis
#> tropicalis

Authors: Tim Crombie and Matteo Di Bernardo

02 Jun 2022