Abstract
Emerald ash borer (Agrilus planipennis) has killed millions of trees in the United States. Community managers face treatment or removal decisions for all publicly owned ash (Fraxinus spp.) trees. These decisions are based on the overall condition of each tree. In this study, the U.S. Forest Service trained a Boy Scout troop in Oconomowoc, Wisconsin, U.S., in a tree health assessment protocol that used rubrics designed to measure physiological stress symptoms. The city provided tree inventory data, which included the location of 316 city-owned ash trees. After a two-hour training session, the Scouts and adult leaders assessed all ash trees in August 2015. A tree health expert re-assessed 20% of the trees. The protocol measured diameter at breast height and included a suite of tree stress assessment variables. Researchers used a five-class system for defoliation, leaf discoloration, and overall vigor. Fine-twig dieback was estimated in 5% classes. Digital photographs were taken and automatically processed to measure percent crown transparency. Expert/volunteer agreement for diameter at breast height was within 2.5 cm 92% of the time; defoliation, discoloration, and vigor were within two classes 100%, 93%, and 92% of the time, respectively. Crown dieback estimates were within 10% of each other 76% of the time, and transparency estimates were within 15% of each other 76% of the time. Researchers calculated an overall stress index value and ranked the trees from lowest to highest stress. The volunteer-generated data enabled Oconomowoc to make science-based management decisions for its infested ash trees.
The emerald ash borer (EAB) is an exotic invasive insect that attacks ash trees (Fraxinus spp.). It is an example of a rapidly emerging environmental issue with broad geographic impact. Dead and dying ash trees pose a particular monitoring and management problem for urban and community forests because of the hazard they represent to people and property.
EAB was first detected near Detroit, Michigan, U.S., in 2002, but it probably arrived in the area in the early 1990s (Siegert et al. 2007). During the decade before EAB was detected, it killed or severely stressed 5–7 million ash trees (Poland and McCullough 2006). Since then EAB has spread to 31 U.S. states and two Canadian provinces (Emerald Ash Borer Information Network 2017) and has killed hundreds of millions of trees in North America.
Kovacs et al. (2010) estimate that 37.9 million ash trees grow in urban areas in states with confirmed EAB infestation. For urban areas, the first step in dealing with EAB is to carry out an inventory to identify the location and current condition of publicly owned trees. Priority for removal is given to trees that are in the poorest condition. Some communities use pesticides to preserve trees. This treatment is not recommended if the trees have greater than 50% canopy loss (Herms et al. 2014). Both actions require up-to-date tree health assessment data. Once EAB establishes itself, it can take many years for all the ash trees to die. This provides an opportunity to spread treatment and removal costs over time; however, accurate planning and prioritization depend on periodic monitoring of individual tree health as infested trees decline over time. This monitoring activity requires additional trained personnel that small communities may not have, especially as they face increased tree removal activity.
Citizen science projects are becoming increasingly common (Kosmala et al. 2016) and cover a diversity of scales (global to neighborhood) and disciplines. Community-based monitoring (CBM) falls under the umbrella of citizen science and encompasses efforts that are designed to track and respond to issues of common community concern (Whitelaw et al. 2003). Collaborators on CBM projects typically include scientists, the public, government agencies, community groups, and local institutions.
CBM efforts often focus on areas of environmental concern because of an increasing awareness of anthropogenic impacts on ecosystems (Conrad and Daoust 2008) and in some cases public concern about governmental capacity to monitor ecosystems (Pollock and Whitelaw 2005). The increasing complexity—and in some cases the rapidity—of emerging environmental issues stretches the capacity of monitoring programs to provide decision- and policymakers the information they need to address environmentally important changes (Vaughan et al. 2001).
For CBM efforts to be useful with respect to providing information to management agencies, it is necessary to understand the accuracy and reliability of the data collected by non-experts. Data reliability is a common theme within the citizen science literature (Cox et al. 2012; Kosmala et al. 2016; Lukyanenko et al. 2016; Roman et al. 2017). Regardless of whether experts or non-experts collect data for scientific studies, data quality is constantly monitored, and there are many protocols and procedures that are designed to reduce error. Furthermore, data variability and reliability play key roles in determining how scientific data can and should be used.
Urban forestry is positioned to take advantage of an increasing interest in CBM projects because of the high concentrations of people in cities and the importance of trees and greenspace to urban environments. Studies that have focused on volunteer data quality for urban forestry applications have found that accuracy levels vary for each variable being collected. For instance, Roman et al. (2017) found the highest consistency between experts and non-experts for species ID, DBH, site type, land use, and dieback (all >80% agreement), and lower levels of agreement for transparency and wood condition (71.5% and 55.2%, respectively). Two other studies focusing on volunteer data quality for urban tree inventory applications found similar levels of agreement for species identification (approximately 80% agreement) but lower levels of agreement for volunteers assessing tree condition or maintenance needs (Bloniarz and Ryan 1996; Cozad et al. 2005). Roman et al. (2017) suggest that citizen science is appropriate for some urban tree inventory and monitoring projects but not for those that require extremely high accuracy.
Oconomowoc, Wisconsin, U.S., is a midwestern community facing EAB infestation, and as such, presents an opportunity to use and evaluate a CBM approach for ash tree health assessment.
The primary goal of this project was to use volunteers from a local Boy Scout troop to furnish the City of Oconomowoc with detailed health assessment data for all city-owned ash trees. There were two research objectives: 1) assess the level of agreement between data collected by volunteers and data collected on the same trees by an expert, and 2) use the agreement statistics calculated in Objective 1 to inform a discussion about the potential usefulness of volunteer-collected tree health assessment data for making management decisions about city-owned ash trees.
METHODS
Study Area
This study took place within an eight-week period between July and September 2015 in Oconomowoc (43°6′31″N, 88°29′49″W), a small city in southeastern Wisconsin, 56.3 km west of Milwaukee (Figure 1). EAB was first detected in 2013, and while the city had a tree inventory, it did not have information about current ash tree condition. An Eagle Scout candidate approached the city’s Parks and Forestry Superintendent with a proposal to organize volunteers and assess the physiological condition of all city-owned ash trees using a tree health assessment method developed by the U.S. Forest Service.
Study Design
The city furnished tree inventory data for 3,758 city-owned street and park trees. The inventory included information about species, diameter, and height, along with geographic coordinates for each tree. Researchers extracted location information for all ash trees (n = 316) with a diameter at breast height (DBH) of at least 10 cm, and created an interactive Google Maps™ image showing each tree along with its associated city-assigned identification number. Twenty-two volunteers (17 high school students and 5 adults) visited Oconomowoc’s ash trees (Figure 1) in July and August 2015. A tree health assessment expert revisited 20% (n = 64) of the trees on 9–10 September 2015. The validation sample was randomly selected from the population of city-owned ash trees (Figure 1).
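The random selection of a validation subset can be illustrated with a short sketch. The study does not report how the sample was drawn beyond stating that it was random, so the function name, the fixed seed, and the choice to round the sample size up are all assumptions made for illustration.

```python
import math
import random


def draw_validation_sample(tree_ids, fraction=0.20, seed=2015):
    """Return a sorted random subset of tree IDs, drawn without replacement.

    The seed and the rounding rule are illustrative assumptions; the study
    only states that about 20% of trees (n = 64) were randomly selected.
    """
    ids = list(tree_ids)
    n = math.ceil(len(ids) * fraction)  # assumption: round the sample size up
    rng = random.Random(seed)           # fixed seed makes the draw repeatable
    return sorted(rng.sample(ids, n))


# Hypothetical city-assigned identification numbers for 316 ash trees.
tree_ids = list(range(1, 317))
sample = draw_validation_sample(tree_ids)
print(len(sample))
```

Fixing the seed is a convenience for repeatability; any documented random draw would serve the same purpose.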
Tree Health Assessment Methods
The study adopted tree health assessment methods from Pontius and Hallett (2014). Wulff (2002) found that accuracy of forest damage assessments improved when observers operated in teams. In this study, teams of two worked together, rating each assigned tree independently. If the independent ratings differed, the teams were instructed to discuss and reach consensus. The Eagle Scout candidate made the team assignments. Teams did not always include an adult.
Crown Discoloration and Defoliation
Ocular estimates of leaf discoloration and defoliation were made by estimating the proportion of a tree’s crown that was either discolored or defoliated. Estimates were recorded for each variable using a five-class rating system (Table 1).
Fine-Twig Dieback
Fine-twig dieback was estimated as a proportion of the crown containing dead twigs (no leaves). Estimates were made in 5% classes (Table 2).
Crown Vigor
Crown vigor, which is an overall assessment of tree health, was rated following the rubric in Table 3.
Crown Transparency
Digital photographs were used to quantify the percentage of open versus dark pixels of photographs taken vertically through the crown of a tree (Pontius and Hallett 2014). A digital camera was zoomed in to include only the crown of the subject tree from up to four locations around the tree. Digital images were automatically processed using a script written for CellProfiler™ (Lamprecht et al. 2007), which reports percent transparency for each image. Transparency values from up to four photographs per tree were averaged to represent overall percent crown transparency. Careful records were kept to ensure that a given photo was associated with the correct tree, and the photographs were organized and curated for processing.
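The pixel-counting idea behind the transparency measure can be sketched as follows. This is not the CellProfiler script used in the study; it is a simplified illustration that assumes each image has already been converted to a grid of grayscale values (0 to 255) and that a fixed brightness threshold, chosen here arbitrarily, separates open sky from foliage.

```python
def percent_transparency(gray_pixels, threshold=128):
    """Percent of pixels brighter than `threshold`, treated as open sky.

    `gray_pixels` is a list of rows of 0-255 grayscale values. The threshold
    is a hypothetical value, not one taken from the study.
    """
    total = 0
    open_count = 0
    for row in gray_pixels:
        for value in row:
            total += 1
            if value > threshold:
                open_count += 1
    if total == 0:
        raise ValueError("image has no pixels")
    return 100.0 * open_count / total


def crown_transparency(images, threshold=128):
    """Average transparency over the available photographs of one tree.

    Returns None when a tree has no photographs, so that missing data stay
    visible instead of being recorded as zero.
    """
    if not images:
        return None
    values = [percent_transparency(img, threshold) for img in images]
    return sum(values) / len(values)


# Two tiny made-up images: one pixel in four is bright, then two in four.
photo_a = [[200, 50], [40, 30]]
photo_b = [[200, 200], [50, 50]]
print(crown_transparency([photo_a, photo_b]))
```

Returning None for trees without photographs keeps missing transparency values distinguishable from a fully closed crown.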
Tree Photographs
A portrait of each tree was taken to preserve a visual record of the tree. This was useful for resolving data issues and may help with future assessments.
Volunteer Training
Literature from the education field provides insight into best practices for training raters to use rubrics. The education field relies heavily on rubrics to evaluate student learning in literacy. Consistency between raters is an important factor in achieving an accurate assessment of a student’s learning. Belanger et al. (2015) stress the importance of the norming process after rubrics have been developed. They recommend that raters 1) work together through several examples, 2) independently score examples covering a range of conditions, 3) convene to review individual scores and identify areas of discrepancy, and 4) discuss and reconcile inconsistent scores. Oakleaf (2009) suggests that additional sets of examples be provided, thereby repeating the calibration process in steps two through four, up to four times. This norming process is designed to establish interrater reliability when using rubrics to assess student work.
The study authors created norming training procedures (described below) focused on increasing interrater reliability when using the tree health assessment rubrics. All volunteers participated in this two-hour practical training session, which was conducted by a research ecologist from the U.S. Forest Service.
The training goals were to educate non-experts about signs of stress in trees, familiarize volunteers with rubrics and data collection procedures, and maximize interrater consistency. The training involved five phases:
Phase I: Non-informed tree health assessment
The volunteers were asked to rank four preselected trees in order from most stressed to least stressed and to discuss the criteria they used to complete the ranking.
Phase II: Classroom session
Researchers presented slides illustrating the rating system and rubrics, including pictures of trees at various stages of decline. Researchers also provided guidance on ash tree identification and on how to take and organize transparency and tree photos. Specialized diameter tapes were not used to measure DBH. Instead, volunteers were trained to measure circumference at breast height using cloth measuring tapes. Diameter was calculated after data collection and entry.
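The conversion from a tape-measured circumference to diameter follows from the geometry of a circle (diameter = circumference / π). A minimal sketch, assuming the stem cross-section is approximately circular and that measurements are recorded in centimeters; the function name is hypothetical.

```python
import math


def circumference_to_dbh(circumference_cm):
    """Convert circumference at breast height (cm) to diameter (cm).

    Assumes a roughly circular stem, so diameter = circumference / pi.
    """
    if circumference_cm <= 0:
        raise ValueError("circumference must be positive")
    return circumference_cm / math.pi


# A stem measuring 100 cm around has a diameter of roughly 31.8 cm.
print(round(circumference_to_dbh(100.0), 1))
```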
Phase III: Collaborative tree health assessment
Volunteers used the tree health assessment rubrics they learned in Phase II to assess three ash trees in various stages of decline. The whole group worked together and the trainer led a guided discussion on how to apply the rubrics to living trees.
Phase IV: Paired tree health assessment and evaluation
The volunteers split into teams of two and were asked to independently rate five trees in various stages of decline. Trees rated in Phase III were not included in the exercise. Individuals were asked to rate each tree separately and to reach consensus with their partner if their scores did not match.
Phase V: Evaluation and discussion
Each team submitted their ratings, and these were entered into a spreadsheet that graphed the data, showing bar graphs and error bars for each tree and variable. These graphs were used to discuss areas of greatest variability in the ratings.
Each volunteer received a two-page field guide that contained a short description of each variable along with the classification rubric.
Statistical Methods
A comprehensive tree stress index was created by standardizing the individual health assessment variables using z-scores (Green 1979). Researchers used a tree health database containing 640 ash trees to obtain the mean and standard deviation for each variable. A z-score was calculated for each variable and each tree in the study, using an R-script (R Core Team 2013). The z-scores for each variable were averaged for each tree to create an overall stress index score. To make the stress index score more intuitive, all values were made positive by adding a constant equal to the absolute value of the lowest score in the data set. Lower values represent healthier trees. This procedure is a refinement of the methods outlined in Pontius and Hallett (2014).
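The stress index calculation described above can be sketched in a few lines. The study used an R script and reference statistics from a 640-tree database, neither of which is reproduced here; the reference means and standard deviations below are hypothetical placeholders, and the sketch assumes every variable is oriented so that larger values indicate more stress. Averaging only the variables that were recorded for a tree is also an assumption about how missing values are handled.

```python
# Hypothetical (mean, standard deviation) pairs; NOT values from the study.
REFERENCE = {
    "defoliation": (1.5, 0.8),
    "discoloration": (1.3, 0.6),
    "vigor": (2.0, 1.0),
    "dieback": (10.0, 12.0),
    "transparency": (20.0, 10.0),
}


def raw_stress(tree, reference=REFERENCE):
    """Mean z-score across the variables recorded for one tree."""
    zs = []
    for name, (mean, sd) in reference.items():
        value = tree.get(name)
        if value is not None:  # skip variables that were not measured
            zs.append((value - mean) / sd)
    return sum(zs) / len(zs) if zs else None


def stress_index(trees, reference=REFERENCE):
    """Shift mean z-scores by |lowest score| so the healthiest tree scores 0.

    This follows the text literally and assumes the lowest raw score is
    negative, as expected when some trees are healthier than the reference.
    """
    raw = {tid: raw_stress(t, reference) for tid, t in trees.items()}
    scored = {tid: s for tid, s in raw.items() if s is not None}
    shift = abs(min(scored.values()))
    return {tid: s + shift for tid, s in scored.items()}


# Made-up ratings for two trees, keyed by a hypothetical tree ID.
trees = {
    "A": {"defoliation": 1, "discoloration": 1, "vigor": 1,
          "dieback": 0, "transparency": 12.0},
    "B": {"defoliation": 3, "discoloration": 2, "vigor": 4,
          "dieback": 40, "transparency": None},
}
print(stress_index(trees))
```

Because the index is an average, a tree with a missing variable is scored on fewer inputs, which is one way missing data can shift an index value.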
The Wilcoxon signed-rank test was applied to paired differences between expert and volunteer ratings of the same tree to test whether the median difference between the two rating groups differed from zero. Linear regression was used to assess the strength of the relationship between the expert and volunteer ratings. The JMP® Pro software package version 12.2 (Sall et al. 2001) was used for statistical tests and creating graphs. The threshold for statistical significance was α = 0.05.
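For readers without access to JMP, the paired test can be illustrated with a small standard-library sketch. It uses the large-sample normal approximation with a tie correction and no continuity correction, which may differ in detail from the procedure JMP applies; the function name and example values are hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(expert, volunteer):
    """Return (W+, z, two-sided p) for paired ratings.

    Zero differences are dropped, tied absolute differences receive average
    ranks, and the p-value comes from a normal approximation.
    """
    if len(expert) != len(volunteer):
        raise ValueError("ratings must be paired")
    diffs = [v - e for e, v in zip(expert, volunteer) if v != e]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1      # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1                   # size of this tie group
        tie_term += t ** 3 - t
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p


# Made-up DBH values (cm) in which the volunteer always reads higher.
expert_dbh = [10, 10, 10, 10, 10]
volunteer_dbh = [11, 12, 13, 14, 15]
print(wilcoxon_signed_rank(expert_dbh, volunteer_dbh))
```

With samples as small as this example, an exact test would normally be preferred; the approximation is shown only because the study reports a Z statistic.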
RESULTS
Report
Researchers prepared an ash tree condition report and interactive Google Maps imagery for the City of Oconomowoc. Of the 316 inventoried ash trees, 26 were dead or missing. For living trees, the tree health data were presented in two ways:
1) The trees were sorted by their stress index value and binned into nine classes of approximately 30 trees. A tenth class contained 20 of the most-stressed trees. Class 1 contained the healthiest trees and Class 10 represented the most-stressed trees.
2) The stress index was multiplied by DBH to emphasize the largest and most-stressed ash trees. This rating was binned as previously described and mapped (Figure 2).
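The two presentations above amount to a rank-based binning, optionally preceded by weighting each score by stem size. A minimal sketch, assuming classes are filled in order of increasing score with a fixed class size of 30 (the study says only "approximately 30"); the function names are hypothetical.

```python
def rank_classes(scores, class_size=30):
    """Assign class 1 (lowest scores) upward in groups of `class_size`.

    `scores` maps tree ID to a score. With 290 trees and a class size of 30,
    this yields nine classes of 30 and a tenth class of 20.
    """
    ordered = sorted(scores, key=lambda tid: scores[tid])
    return {tid: pos // class_size + 1 for pos, tid in enumerate(ordered)}


def size_weighted(stress, dbh_cm):
    """Multiply each tree's stress index by its DBH (cm)."""
    return {tid: stress[tid] * dbh_cm[tid] for tid in stress if tid in dbh_cm}


# Made-up scores and diameters for 290 living trees.
stress = {tid: tid * 0.01 for tid in range(290)}
dbh_cm = {tid: 10 + (tid % 50) for tid in range(290)}

by_stress = rank_classes(stress)
by_size_and_stress = rank_classes(size_weighted(stress, dbh_cm))
print(max(by_stress.values()))
```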
The report described the methods used and contained representative pictures of trees in each of the 10 classes. The city used this report to prioritize tree removals.
Data Validation
Of the 64 trees that were re-measured for data validation, three were dead or missing. One was a bitternut hickory (Carya cordiformis) that had been misidentified in the city’s tree inventory, and one was listed as missing or dead by the volunteers but was still alive. This resulted in a validation data set of 59 living trees that were rated twice. This data set represented the full range of DBH, stress, and geography of Oconomowoc’s ash tree population (Table 4; Figure 1). The volunteer data set was missing DBH for 1 tree and transparency for 81 trees, indicating a file management and tracking issue.
For DBH, the volunteers used cloth measuring tapes to measure circumference at breast height; the expert measurement was completed with a standard diameter tape. After conversion to diameter, the measurements were within 2.5 cm 92% of the time.
Table 4 contains the results of the Wilcoxon signed-rank test, indicating that there were differences between volunteer and expert raters for DBH, defoliation, dieback, and transparency, but not for discoloration, vigor, and the overall stress index. Table 5 shows the level of agreement between volunteer and expert health variable ratings in a different but less statistically definitive way. Given that these variables were estimated using a subjective rating system (except for transparency), information is given for the percentage of measurements that were within 5%, 10%, or 15% of each other for dieback and transparency and within one and two classes for defoliation, discoloration, and vigor. The best agreement was achieved for dieback estimates, which were within 10% of each other 76% of the time. Volunteers tended to slightly underestimate dieback at higher percentages (Figure 3a). Crown transparency estimates, calculated from digital photographs, agreed within ±15% more than 75% of the time; at higher levels of transparency, the volunteer data tended to underestimate transparency (Figure 3b). Defoliation, discoloration, and vigor were rated using a five-class rubric, and all estimates were within one class about 50% of the time and within two classes more than 90% of the time.
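The threshold-based agreement percentages reported in Table 5 can be computed with a simple tally. The sketch below is illustrative only: the function name and example ratings are made up, and pairs with a missing value are skipped, which is an assumption about how incomplete records were treated.

```python
def percent_within(expert, volunteer, tolerance):
    """Percent of paired ratings whose absolute difference is <= tolerance.

    Pairs in which either rating is None are excluded from the tally.
    """
    pairs = [(e, v) for e, v in zip(expert, volunteer)
             if e is not None and v is not None]
    if not pairs:
        raise ValueError("no complete pairs to compare")
    hits = sum(1 for e, v in pairs if abs(e - v) <= tolerance)
    return 100.0 * hits / len(pairs)


# Made-up dieback estimates (percent of crown) for four trees.
expert_dieback = [5, 10, 20, 40]
volunteer_dieback = [5, 15, 35, 40]
for tol in (5, 10, 15):
    print(tol, percent_within(expert_dieback, volunteer_dieback, tol))
```

The same function applies to the five-class variables by passing class numbers and a tolerance of one or two classes.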
The overall stress rating standardizes and combines all health variables into a single stress index score for each tree and ranges between 0 (healthy) and 3.7 (declining) (Table 4). The stress index created from data collected by volunteers tended to overestimate stress at the low end of the scale and underestimate stress at the high end of the scale (Figure 3c). In the validation data set, the largest disagreement for stress index was for the trees where the volunteer data set was missing transparency data. The mean difference between volunteer and expert stress index ratings for trees missing transparency data was 0.64 and the mean difference for trees that had all health variables measured was 0.41. A Wilcoxon signed-rank test indicated that volunteer and expert stress index scores were not significantly different (Z = 0.648, P = 0.638). The associated matched pairs cumulative probability plot shows the deviation between volunteer and expert stress indices across the range of stress index scores (Figure 4). The greatest deviation occurred between the 25th and 60th percentiles.
DISCUSSION
Citizen science has the potential to play an important role in developing increased information and in deepening the relationship of citizens to science and to urban and rural forests. Lukyanenko et al. (2016) contend that holding amateurs to scientific standards may be unrealistic, but engaging with citizen scientists may still afford value. On the other hand, there are examples in which environmental engagement benefits are achieved and the environmental monitoring data is useful to scientists or managers (Stokes et al. 1990; Engel and Voshell 2002). An important consideration in designing a successful CBM project (one that yields accurate and usable information) is to use accessible methods (Au et al. 2000). Along with accessible methods, it is important to create clear training and norming protocols. The tree health assessment methods used in the current study rely on non-stressor-specific visual signs and symptoms related to physiological tree stress. The advantage of this approach is that it does not require the rater to have specific knowledge about the pest or disease that may be causing the stress symptoms. Researchers used clear and concise rubrics and guidelines for evaluating each stress symptom.
For many CBM projects, data quality remains an open question. This project presents an opportunity for a detailed examination of the level of agreement between volunteer and expert tree health assessments for the same group of trees. This analysis imparts knowledge that will inform training and data collection practices for future efforts. As with all scientific studies, an analysis of data variability informs how the data may be used and the conclusions that can be drawn from them.
To assess the level of agreement between volunteer and expert raters, researchers applied the Wilcoxon signed-rank test to each variable and determined that there were significant differences between raters for DBH, defoliation, dieback, and transparency, but not for discoloration, vigor, and the stress index (Table 4). These results suggest that while there are discrepancies for some variables, overall stress index scores for the ash trees in this study were statistically reliable.
The Wilcoxon test, while statistically valid for comparing overall levels of agreement between raters, does not give information about the magnitude of differences or where they lie across the range of the variables measured. Table 4 shows the mean difference between expert and volunteer raters for each variable. While DBH measurements were statistically different, the mean difference was only 0.05 cm. The question is: Can volunteer-collected data inform management decisions made by tree care professionals?
Further examination of the data and its relative levels of agreement can move beyond the simple rejection or acceptance of the null hypothesis that there is no difference between expert- and volunteer-collected data. Table 5 defines thresholds for levels of agreement and percentages for when those thresholds were met for each variable. Volunteer and expert DBH measurements were within 2.5 cm 92% of the time. Of the health rating variables, fine-twig dieback had the best overall agreement between volunteer and expert raters (mean difference 3%, Table 5). The dieback rating has 21 possible categories as opposed to 5 categories for each of the other variables determined by ocular estimate. The defoliation and discoloration rubrics require raters to make estimates in 25% classes; volunteer and expert raters were within two classes of each other more than 90% of the time for these variables (Table 5). These results suggest that training methods may need to be adjusted to increase the reliability of defoliation and discoloration estimates. Evidence also suggests that increasing the number of response categories may help with estimate reliability. Preston and Colman (2000) determined that more than five response categories yielded the most reliable results in a study on reliability, validity, and rater preference of rating scales of varying length.
The crown vigor rating requires that the rater follow a more complicated rubric, which is partially based on the dieback, defoliation, and discoloration ratings. The level of agreement for vigor was also within two classes more than 90% of the time. This variable requires careful application of the rubric and may also benefit from more detailed explanation and practice during training.
Percent crown transparency is not based on an ocular estimate but on the automated processing of digital photographs. Agreement between volunteer and expert was not as good as expected: estimates were within 15% of each other 76% of the time. This method is sensitive to how the pictures are taken. The photographs must be taken of the crown of the tree without including areas of sky outside the crown or parts of other tree crowns. Data management for the photographs is complicated. This could explain why 28% of the volunteer-assessed trees had no transparency data. The method depends on taking up to four pictures per tree to fully represent the variability in transparency. The volunteers took an average of 2.1 photographs per tree, not counting trees missing photos. The expert assessment had transparency photos for every tree and averaged 3.7 photos per tree.
The overall stress index rating is an average of the standardized value of each health assessment variable. The stress index was used to create the maps presented to the City of Oconomowoc (Figure 2) and provides a simple, spatially explicit view of ash tree health across the city. A pairwise comparison of expert versus volunteer stress index scores showed no significant differences (Figure 4; Table 4); however, this figure shows areas of greater disagreement in the middle of the frequency distribution. The greatest discrepancy between scores occurred for trees in which the volunteers were missing transparency data. This indicates a sensitivity to missing data in the method.
The greater difference between raters in the middle of the stress index distribution (i.e., volunteer raters underestimate stress) could have an impact on decisions about whether trees should be treated with insecticides or not. Treatment is not recommended for trees with greater than 50% canopy loss (Herms et al. 2014); consequently, managers could use the maps and data provided by the volunteers to locate a selection of trees that meet criteria for treatment but would then need to verify them with an on-the-ground assessment by experts.
This citizen science project was successful in several different ways. The City of Oconomowoc used the data to prioritize ash street tree removal with overall stress index classes 7–10 receiving the highest priority (B. Spencer, Superintendent of Parks and Forestry, personal communication). In this way, the project fulfilled its role as a scientific partnership. This community-based monitoring project also fulfilled the requirements of an Eagle Scout service project that subsequently earned the 2017 National Eagle Scout Association Council service project of the year award. The project deepened the knowledge base and relationship of this Scout team with its local trees and with practical science. Perhaps more importantly, the project created a model for small-scale citizen-science projects, where even young citizens are trained to collect helpful information. Future tree health CBM projects will be able to take advantage of the tree health assessment module in the Healthy Trees, Healthy Cities app (HTHC 2018).
Tree vitality assessment and hazard tree management are priorities for urban tree inventories, although the accuracy of these methods is rarely tested (Nielsen et al. 2014). Nielsen et al. (2014) go on to say that along with how the data is collected, the “why” should be a primary consideration. In this case, non-stressor-specific variables were used to assess the overall physiological health of ash trees that were likely infested by EAB. This methodology says nothing about whether a tree is hazardous or not. Tree hazard ratings should only be completed by certified arborists for liability reasons.
For cities such as Oconomowoc, which are faced with EAB and the death of their ash trees, knowledge of current tree condition is needed to plan and prioritize tree removal. This project was an example of how CBM can leverage urban tree inventory data to create a useful product without further expenditures of municipal personnel or funding.
The overall utility of projects like this can only be determined by policymakers and urban forestry professionals who will ultimately use the information. The study authors provided a nuanced look at levels of agreement between experts and volunteers that includes tables of thresholds that can be used by urban foresters to make a determination as to whether the data is useful based on the questions they need answered.
Acknowledgments
We would like to thank the Oconomowoc, Wisconsin, Boy Scout Troop 169 for the time and effort they dedicated to assessing the health of their community’s ash trees. We also appreciate the data and support from Bryan Spencer, Oconomowoc Parks and Forestry Superintendent.
© 2018, International Society of Arboriculture. All rights reserved.