Abstract
TreesCount! 2015 (TC2015) was the third citizen-participatory inventory of street trees in New York City, New York, U.S. Every ten years, the New York City Department of Parks & Recreation has worked with citizen scientists to record the location, size, species, and condition of all public curbside trees. Volunteer street tree inventories promote awareness of the importance of the urban forest and support municipal urban forest management. New York City’s prior street tree inventories in 1995 and 2005 led to advances in customer service, funding for routine street tree pruning, and urban greening initiatives. TC2015 attracted 2,241 voluntary participants through multiple recruitment efforts, more than doubling involvement from 2005. Fully digital data collection improved data quality and facilitated near-real-time quality assurance, and advanced tree location methods increased spatial data accuracy relative to past inventories. Data-collection events and reward strategies were also implemented to promote volunteer engagement. Citizen scientists collected tree location data with a high level of accuracy (96.1%) after minimal training. All 666,134 street trees surveyed in TC2015 populated NYC Parks’s operational forestry database, as well as a public-facing map (NYC Street Tree Map) for tree stewards. The following paper describes TC2015 project design and execution, outlines some of the key changes made since the first inventory in 1995, and provides results-based recommendations for practitioners planning similar projects.
TreesCount! 2015 (TC2015) was the third decadal street tree inventory and the largest citizen-science project ever undertaken by the New York City Department of Parks & Recreation (NYC Parks). Like its predecessors, which began in 1995 and 2005, TC2015 spanned two years, beginning in May 2015 and finishing in October 2016. The goal of all three inventories was to engage citizens in inventorying NYC’s street trees to facilitate management and stewardship of this natural resource. TC2015 differed from the two prior inventories in its greater spatial accuracy, broader citizen-scientist engagement, and inclusion of fewer but more operationally useful variables.
TC2015 yielded a spatially accurate database of NYC’s street trees that has been incorporated into NYC Parks’s daily operations. These data are also shared with the public through the NYC Street Tree Map (NYC Parks 2017), an interactive web application that displays tree location, species, and size, and allows stewards to record and track their tree care activities and report problems, thus creating a dialogue with city foresters. NYC Parks updates this map daily from its forestry management system. The TC2015 data set has also been made available on the NYC OpenData portal (2017).
TC2015 engaged over 2,000 citizen scientists, many of whom had no prior involvement with NYC Parks. These participants contributed data for 225,595 of the total 666,134 street trees inventoried. The remaining data were collected by NYC Parks staff with assistance from the Student Conservation Association. The following article discusses TC2015 recruitment, training, and reward strategies; variable selection; and data collection methodology, with an overview of the major successes and lessons learned from working with citizen scientists on a project of this size and scope.
PROJECT DESIGN
Recruitment Strategies, Training, and Events
TC2015 recruitment strategies employed both traditional and social media. A citywide advertising campaign included 4,500 posters in buses, trains, and public transportation hubs; local radio and television ads; and social media posts, with the aim of recruiting a broad range of participants. The advertising campaign provided the URL for the official TC2015 website, where interested individuals could learn more about the project and how to participate. NYC Parks also partnered with 64 community groups to recruit their membership for involvement, similar to recruitment efforts implemented by Purcell et al. (2012).
Participation was a multi-step process. First, individuals were required to read a training manual (available in the online supplemental materials) and successfully complete an approximately 30-minute online training module. The next step was to attend a three-hour in-field training exercise in which participants learned the TC2015 data-collection methodology and received a gear pack containing a t-shirt, safety vest, leaf-identification guide, tape measure, and surveyor’s wheel. TC2015 had a shorter training time than other similar efforts (Bloniarz et al. 1996; Cozad et al. 2005; Roman et al. 2017) with the aim of accommodating more people. After this introductory field training, participants could join frequent data collection events in neighborhoods across NYC; after two events, they could request to map independently. Additionally, before TC2015 commencement, one member from each community group was trained to be a team leader to recruit their members and conduct trainings. A total of 983 training and data collection events were held over the course of TC2015. Events were intended to both amplify data collection and create a community around the project. The importance of this community aspect is highlighted by research on volunteer motivation, which suggests a large proportion of long- and short-term volunteers participate to meet new friends (Smith et al. 2010; Haywood 2016).
As a result of these efforts, over 9,000 people created TC2015 accounts, 4,383 completed online training, and 2,241 citizen scientists participated in data collection, more than doubling active participation relative to 2005.
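The recruitment funnel above can be expressed as stage-to-stage conversion rates. The sketch below uses only the counts reported in the text; because the account total is given only as "over 9,000," any percentage computed against it is an upper bound. The function name and table layout are illustrative choices, not part of the TC2015 tooling.

```python
from typing import List, Sequence, Tuple

# Participation funnel reported for TC2015; the account count is a lower bound ("over 9,000").
FUNNEL: Sequence[Tuple[str, int]] = [
    ("created accounts", 9_000),
    ("completed online training", 4_383),
    ("participated in data collection", 2_241),
]


def stage_conversion(funnel: Sequence[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """Return (stage, count, % of previous stage, % of first stage) for each funnel stage."""
    first = funnel[0][1]
    prev = first
    rows = []
    for name, count in funnel:
        rows.append((name, count, 100 * count / prev, 100 * count / first))
        prev = count
    return rows


for name, count, step_pct, overall_pct in stage_conversion(FUNNEL):
    print(f"{name:32s} {count:6d} {step_pct:6.1f}% {overall_pct:6.1f}%")
```

Read this way, roughly half of those who finished online training went on to collect data, which is consistent with the later discussion of retention.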
Participation Incentives
An incentive system was built into the project design and website as a strategy for volunteer recruitment and retention. Rewards were associated with different participation levels—50, 100, 250, 500, and 1,000 surveys (blocks of trees). These prizes consisted of TC2015 branded gear, such as hats, coffee mugs, and bags; the top participant received a park-bench dedication. Additional prizes went to participants who mapped in four of the six seasons the inventory spanned (spring, summer, autumn, and winter 2015, and spring and summer 2016). TC2015 also partnered with an arts festival, which offered a free ticket to any participant who attended one of their sponsored data-collection events. Appreciation events were held for citizen scientists, including a final event at the Museum of the City of New York, where the top participants were recognized, and maps of the final data were displayed. In addition to rewards, the data-collection application was gamified: when signed in, a counter revealed to each participant their reward level, the number of trees surveyed, and the total number of species they identified (Dickinson et al. 2012). Participants were also sent frequent “thank you” emails to inspire volunteer retention (Wolcott et al. 2008), and were made aware of the intended use of the data to inspire interest (Legg and Nagy 2006; Conrad and Hilchey 2011).
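The reward levels and the seasonal bonus described above amount to simple threshold rules. The following is a minimal sketch of how a participant's reward status could be derived from their survey count, assuming rewards unlock at exactly the listed survey counts; the function names and season labels are hypothetical and do not come from the TC2015 application.

```python
from bisect import bisect_right
from typing import Iterable, Optional, Tuple

# Reward thresholds (number of surveys, i.e., blockedges) stated in the text.
REWARD_THRESHOLDS = [50, 100, 250, 500, 1_000]

# The six seasons the inventory spanned, as listed in the text (labels are illustrative).
SEASONS = (
    "spring 2015", "summer 2015", "autumn 2015",
    "winter 2015", "spring 2016", "summer 2016",
)


def reward_status(surveys_completed: int) -> Tuple[int, Optional[int]]:
    """Return (highest threshold reached, or 0 if none; next threshold, or None at the top)."""
    idx = bisect_right(REWARD_THRESHOLDS, surveys_completed)
    reached = REWARD_THRESHOLDS[idx - 1] if idx > 0 else 0
    upcoming = REWARD_THRESHOLDS[idx] if idx < len(REWARD_THRESHOLDS) else None
    return reached, upcoming


def earns_season_bonus(seasons_mapped: Iterable[str]) -> bool:
    """True if the participant mapped in at least four of the six inventory seasons."""
    return len(set(seasons_mapped) & set(SEASONS)) >= 4


print(reward_status(260))
print(earns_season_bonus(["spring 2015", "summer 2015", "winter 2015", "summer 2016"]))
```

Using a set for the seasonal check means repeated mapping within the same season counts only once, which matches the "four of the six seasons" wording.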
Spatial-Data-Collection Methodology & Technology
For TC2015, NYC Parks incorporated new data-collection technologies with the aim of increasing data quality while decreasing the time spent collecting data per tree. Specifically, researchers implemented new spatial-data-collection methods and shifted to fully digital data collection, eliminating the need for the paper datasheets used in past inventories.
During the 1995 and 2005 inventories, tree locations were recorded by street address, which is the most common method used in urban tree monitoring programs (Roman et al. 2013). Although this method is relatively low cost, NYC Parks found it lacked precision, creating operational difficulties at locations such as large buildings fronting multiple streets with multiple trees, or trees in center medians. Imprecise locations caused confusion when tree workers were deployed to specific trees sharing a single address with others. Further, in 2005, the address-based method resulted in over 23,000 trees that could not be geocoded because of incorrect location data.
To increase the accuracy of spatial locations, NYC Parks considered methods developed by TreeKIT (Silva et al. 2013) and OpenTreeMap (OpenTreeMap 2017). TreeKIT uses linear referencing: participants use a surveyor’s wheel to measure along lengths of sidewalk (blockedges) from a start point and enter distance-to-tree measurements into paper datasheets. The data from one blockedge constitute one survey. These linear measurements are then geocoded using linear referencing methods in GIS. OpenTreeMap uses GPS and manual placement of tree points on aerial imagery on a tablet. NYC Parks evaluated both methods in a small pilot with citizen scientists, all of whom preferred the TreeKIT method, citing ease of use and understanding. NYC Parks implemented a simplified version of the TreeKIT method (omitting tree pit measurements) for TC2015.
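The geocoding step of linear referencing can be illustrated with a small, dependency-free sketch: each wheel reading is the distance from the blockedge start point, and the tree is placed at that distance along the blockedge line. This assumes the blockedge is stored as a polyline in a projected coordinate system whose units match the wheel, and it places trees on the line itself; the actual TC2015 processing was done with GIS linear referencing tools, and the function names here are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def locate_along(blockedge: Sequence[Point], distance: float) -> Point:
    """Return the point lying `distance` units along a polyline, measured from its first vertex.

    Coordinates must be in a projected system whose units match the wheel measurements.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    remaining = distance
    for (x0, y0), (x1, y1) in zip(blockedge, blockedge[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if remaining <= seg_len:
            # Interpolate linearly within the segment that contains the measured distance.
            t = remaining / seg_len if seg_len > 0 else 0.0
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        remaining -= seg_len
    raise ValueError("distance exceeds blockedge length")


def geocode_survey(blockedge: Sequence[Point], distances: Sequence[float]) -> List[Point]:
    """Geocode every distance-to-tree measurement recorded for one blockedge (one survey)."""
    return [locate_along(blockedge, d) for d in distances]


# Hypothetical L-shaped blockedge in metres, with three wheel readings from its start point.
edge = [(0.0, 0.0), (100.0, 0.0), (100.0, 50.0)]
print(geocode_survey(edge, [12.5, 100.0, 130.0]))
```

Because every tree on a blockedge is located relative to the same start point, an error in the start point or direction shifts the whole survey, which is one reason the QA process described under Data Quality evaluates surveys as a unit.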
NYC Parks contracted with a software company, Azavea (Philadelphia, Pennsylvania, U.S.), to create a web-based data-collection application that citizen scientists could access on any web-enabled device, as recommended by Silva et al. (2013). The code for this web application is currently available on GitHub (Azavea 2017), a software development platform, and a full open-source data-collection toolkit is in development. The application supported in-field data entry and visualization of collected tree locations overlaid on aerial imagery for review before submission. The application also featured a live interactive map that displayed the collection status of all city blockedges, allowing participants to independently identify locations requiring data collection and reserve them for collection. This shift to digital data collection eliminated the need for manual data entry, an intensive part of both previous inventories that was fraught with transcription errors. The web application also prevented the submission of incomplete data by precluding blank fields, and it enabled both live progress reporting and a near-real-time quality assurance (QA) process (Houston and Heider 2009).
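The rule that blank fields could not be submitted can be sketched as a completeness check applied to every tree record in a survey before it is accepted. This is not the Azavea application's code; the field names below are hypothetical stand-ins for the variables in Table 1, and the sketch treats an empty list (for example, "no problems recorded") as a deliberate answer rather than a blank.

```python
from typing import Any, Dict, List

# Hypothetical required fields for one tree record; the real schema is in the training manual.
REQUIRED_FIELDS = (
    "species", "circumference_in", "condition",
    "stewardship", "guards", "sidewalk", "problems",
)


def blank_fields(record: Dict[str, Any]) -> List[str]:
    """Return required fields that are missing, None, or whitespace-only strings."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(field)
    return missing


def can_submit(survey: List[Dict[str, Any]]) -> bool:
    """A survey (one blockedge) may be submitted only if every tree record is complete."""
    return all(not blank_fields(tree) for tree in survey)


tree = {"species": "Platanus x acerifolia", "circumference_in": 48, "condition": "Good",
        "stewardship": "None", "guards": "None", "sidewalk": "NoDamage", "problems": []}
print(blank_fields(tree), can_submit([tree]))
```

Checking at submission time, rather than during later data cleaning, is what removes the need to chase volunteers for missing values after the fact.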
Inventory Variables
Table 1 outlines the variables included in 1995, 2005, and 2015 by category, and describes changes between inventory years. The general categories of variables are metadata, location, tree, infrastructure, and site. In-depth descriptions of each of the 2015 variables can be found in the training manual, included in the online supplementary material.
NYC Parks weighed the need for accurate and robust data for management against the need for a streamlined collection protocol suitable for laypersons (Ferretti 2009; Kosmala et al. 2016). Variables from 1995 and 2005 were re-evaluated, and Östberg et al.’s (2012; 2013) research was consulted to ensure TC2015’s variables were comparable to preexisting data sets to allow for cross-study comparisons. Four of the top five variables recommended by Östberg et al. were included: tree species, vitality class (Tree Condition), coordinates, and tree ID number. Species, condition, and coordinates (from the TreeKIT method) were included in field data collection, and tree ID number was auto-generated by the web application. Östberg et al. (2012; 2013) also included “risk posed” in their top five recommended variables, although NYC Parks chose not to include this variable because citizen scientists are not qualified to perform tree risk assessments.
The Stewardship variable was a new addition in 2015, based on Lu et al.’s (2010) research suggesting the quantity of stewardship signs present at a tree correlates positively with survivorship of young street trees. Quantity of stewardship signs refers to the number of individual signs of stewardship noted for a single tree and includes indicators such as the presence of plantings, “curb your dog” signs, and mulch. Lu et al. (2010) also found a positive correlation between the presence of a tree guard and survivorship; as such, the Tree Guard variable from 2005 (previously called Vertical Treatment) was retained in an altered form. Other variables altered but retained were Tree Problems, Tree Condition, and Tree Circumference. The Tree Problems variable was renamed from the 2005 variable Infrastructure Conflicts because the new name was perceived as easier to understand. Tree Condition, as collected in TC2015, eliminated the option of “excellent” in order to reserve the top condition category for park trees, which generally have more favorable growing conditions; this also reduced the number of categories citizen scientists had to learn. Tree Circumference to the nearest inch was collected in TC2015 rather than DBH, in order to eliminate confusion over which side of a diameter tape to use. These measurements were converted to DBH post-collection. Additionally, the shift to web-based data collection greatly reduced the number of variables collected by completely eliminating collection of metadata, as described in Table 1.
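The post-collection conversion from circumference to DBH is not spelled out beyond "converted to DBH"; a minimal sketch, assuming a circular stem so that DBH = C / π, is shown below. It also illustrates a side effect relevant to the later accuracy results: recording circumference to the nearest inch means converted diameters fall on a grid about 1/π inch (roughly 0.81 cm) apart.

```python
import math

CM_PER_INCH = 2.54


def dbh_from_circumference(circumference_in: float) -> float:
    """Diameter at breast height (inches) from circumference (inches), assuming a circular stem."""
    if circumference_in < 0:
        raise ValueError("circumference must be non-negative")
    return circumference_in / math.pi


# Consecutive whole-inch circumferences map to diameters about 1/pi inch (~0.81 cm) apart.
for c in (30, 31, 32):
    d_in = dbh_from_circumference(c)
    print(f"C = {c} in -> DBH = {d_in:.2f} in = {d_in * CM_PER_INCH:.1f} cm")
```

Whether TC2015 rounded the converted values afterward is not stated here, so the sketch leaves them unrounded.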
DATA QUALITY
Spatial
Extensive QA was conducted on tree locations by comparing the locations of the trees inventoried, as determined by linear referencing, and the location of trees on current, orthorectified aerial imagery. If the tree locations were not within 3.05 m of the stems on the imagery, the entire survey was rejected. The overall survey rejection rate was low for both NYC Parks’s staff (2.2%) and citizen scientists (3.9%). These numbers do not take into account the ≈7% of data that was edited for minor inconsistencies during the QA process. These overall numbers were calculated by pooling all surveys collected within each data collector group and dividing the total rejected by the total collected. Individual user rejection rates show varied performance between lower- and higher-level contributors. Users who contributed ≤10 surveys had an average rejection rate of 14.9%, whereas those who contributed ≥25 surveys had a rate of 3.9%. These numbers were calculated by averaging the individual user rejection rates within each category.
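The survey-level rejection rule and the two ways of summarizing rejection rates described above can be sketched as follows. The sketch assumes each inventoried tree point has already been paired with its stem on the orthorectified imagery (in TC2015 this matching was part of the QA review) and that coordinates are in metres. The per-user counts in the example are hypothetical; they show why a pooled rate, dominated by prolific contributors, can be much lower than an average of individual users' rates.

```python
import math
from statistics import mean
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]
TOLERANCE_M = 3.05  # maximum allowed offset (3.05 m = 10 ft)


def survey_rejected(pairs: Sequence[Tuple[Point, Point]], tol: float = TOLERANCE_M) -> bool:
    """Reject the whole survey if any tree lies more than `tol` metres from its stem on the imagery.

    Each pair is (linear-referenced tree point, matching stem point digitized from imagery),
    both in the same projected coordinate system measured in metres.
    """
    return any(math.dist(tree, stem) > tol for tree, stem in pairs)


def pooled_rate(per_user: Dict[str, Tuple[int, int]]) -> float:
    """Group rate: total surveys rejected / total surveys collected; values are (rejected, collected)."""
    rejected = sum(r for r, _ in per_user.values())
    collected = sum(n for _, n in per_user.values())
    return rejected / collected


def mean_user_rate(per_user: Dict[str, Tuple[int, int]]) -> float:
    """Category rate: the unweighted mean of each user's own rejection rate."""
    return mean(r / n for r, n in per_user.values())


# Hypothetical users: one prolific contributor and two who contributed only a few surveys.
users = {"a": (1, 200), "b": (2, 5), "c": (0, 4)}
print(f"pooled: {pooled_rate(users):.3f}, mean of users: {mean_user_rate(users):.3f}")
```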
Inventory Variables
To measure volunteer accuracy for non-spatial variables, NYC Parks’ staff resurveyed approximately 1% of volunteer-collected trees. Comparative results can be found in Table 2. Percentage agreement was calculated for all variables, and sensitivity (Sn), specificity (Sp), positive predictive value (PPV), and negative predictive value (NPV) were calculated for all presence/absence variables. All of the calculated statistics are judged based on the following scale: high >85%, moderate = 70%–85%, low <70%. When calculating Sn, Sp, PPV, and NPV, staff data are considered to represent the actual state of the specimen, and volunteer data are treated as the test data. See Table 3 for clarification of Sn, Sp, PPV, and NPV. The variables with high percentage agreement include Genus, DBH within 2.54 cm, and Tree Guards, as well as all Tree Problem sub-variables, except Root Problem: Sidewalk/Stones. Moderate agreement was found for Species, Tree Condition, Root Problem: Sidewalk/Stones, and Sidewalk Damage. Low percentage agreement was seen for exact DBH. Additionally, low PPV was seen for all presence/absence variables except Stewardship, although NPV was high for all, except Sidewalk Damage.
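The statistics defined above follow from a 2 × 2 comparison of volunteer and staff records for each presence/absence variable, with staff data as the reference. The sketch below computes them and applies the study's high/moderate/low scale; the function names and example data are hypothetical, and returning NaN for an empty denominator is a design choice of this sketch.

```python
import math
from typing import Dict, Sequence


def agreement_stats(staff: Sequence[bool], volunteer: Sequence[bool]) -> Dict[str, float]:
    """Percentage agreement, Sn, Sp, PPV, and NPV (in %) for one presence/absence variable.

    Staff observations are treated as the true state and volunteer observations as the test.
    A statistic whose denominator is zero is returned as NaN.
    """
    if len(staff) != len(volunteer) or not staff:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(staff, volunteer))
    tp = sum(1 for s, v in pairs if s and v)
    tn = sum(1 for s, v in pairs if not s and not v)
    fp = sum(1 for s, v in pairs if not s and v)
    fn = sum(1 for s, v in pairs if s and not v)

    def pct(num: int, den: int) -> float:
        return 100 * num / den if den else math.nan

    return {
        "agreement": pct(tp + tn, len(pairs)),
        "Sn": pct(tp, tp + fn),
        "Sp": pct(tn, tn + fp),
        "PPV": pct(tp, tp + fp),
        "NPV": pct(tn, tn + fn),
    }


def rating(value: float) -> str:
    """Apply the study's scale: high >85%, moderate 70%-85%, low <70%."""
    if math.isnan(value):
        return "n/a"
    if value > 85:
        return "high"
    if value >= 70:
        return "moderate"
    return "low"


# Hypothetical resurvey of five trees for one variable.
staff = [True, True, False, False, False]
volunteer = [True, False, True, False, False]
for name, value in agreement_stats(staff, volunteer).items():
    print(name, round(value, 1), rating(value))
```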
DISCUSSION
TC2015 resulted in a spatially accurate data set that was incorporated into NYC Parks’ forestry operational database. The high-quality spatial data were a particular success, indicating the scalability of the TreeKIT method. For street tree inventories, this spatial-positioning method with application-based data collection is highly recommended. This method not only produced more accurate spatial data than previous inventories, it also allowed for an ongoing QA process that greatly increased the quality of NYC Parks’ final spatial data set.
In addition, volunteer data for key tree variables (Genus, Species, DBH [within 2.54 cm], and Tree Condition) showed moderate-to-high agreement with staff data. Genus agreement (86.5%) with staff was higher than that for Species (77.6%). This was an expected result, as the study design called for greater focus on genus by including genus-only options and instructing participants to focus primarily on getting the genus correct if the species was unattainable. This study design was implemented because NYC Parks manages most species within a genus similarly, placing less importance on exact species identification. Overall, these results were similar to those of Roman et al. (2017), who reported mean percentage agreements of 90.7% for Genus and 84.8% for Species, compared with 86.5% and 77.6%, respectively, in the current study. Additionally, agreement of 92.7% for DBH (within 2.54 cm) in the current study was comparable to Roman et al.’s (2017) result of 93.3%, and percentage agreement for exact DBH was 32.0%, compared with Roman et al.’s (2017) result of 20.2%. However, TC2015 measurements were collected as circumference to the nearest inch and converted to DBH, whereas Roman et al. (2017) collected DBH to the nearest tenth of an inch, making comparisons between the two studies tenuous. With that caveat, these DBH results suggest that collecting circumference at breast height and converting it to diameter reduced confusion.
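The difference between "exact" DBH agreement and agreement "within 2.54 cm" is only the tolerance applied to each staff/volunteer pair. A minimal sketch, with hypothetical paired values, is shown below; the small epsilon is an assumption of this sketch, added so that floating-point rounding of converted values does not turn a boundary case into a disagreement.

```python
from typing import Sequence

# Absolute slack added to the tolerance to absorb floating-point rounding of converted values.
EPS = 1e-9


def pct_agreement(staff: Sequence[float], volunteer: Sequence[float], tol: float = 0.0) -> float:
    """Percentage of paired measurements whose absolute difference is at most `tol` (same units)."""
    if len(staff) != len(volunteer) or not staff:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for s, v in zip(staff, volunteer) if abs(s - v) <= tol + EPS)
    return 100 * hits / len(staff)


# Hypothetical paired DBH values in centimetres (staff, volunteer).
staff_cm = [25.0, 30.5, 40.0, 12.0]
volunteer_cm = [25.0, 31.8, 43.0, 12.0]
print(pct_agreement(staff_cm, volunteer_cm))            # exact agreement
print(pct_agreement(staff_cm, volunteer_cm, tol=2.54))  # within 2.54 cm (1 inch)
```

How strongly exact agreement depends on measurement resolution is part of why comparing TC2015's whole-inch circumferences with Roman et al.'s tenth-of-an-inch diameters is tenuous.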
NYC Parks learned some valuable lessons about volunteer engagement from TC2015. In terms of recruitment and volunteer participation, the data-collection events were very successful, with over 75% of citizen scientists participating in the effort exclusively at events. Anecdotally, this event-centric participation appeared to be driven by the social aspect of events, though events alone did not result in volunteer retention (≈50% of individuals participated for a single day/event). The spatial data quality results noted above show that those who contributed ≤10 surveys (approximately one day) had markedly less accurate tree locations than those who contributed ≥25 surveys, suggesting single-time participants may decrease data quality. Presumably those with low contribution levels joined out of passing interest or were motivated by one-off reward offerings, such as free tickets for attending one event. In one case, more than 300 participants from a partner group did not participate again after receiving their reward. The same pattern was not noted among the rest of the citizen-scientist cohort in response to the TC2015 branded rewards, likely because such rewards took more effort to obtain. Overall, the top ten contributors in TC2015 collected data for 57,965 trees (equivalent to 26% of volunteer-contributed trees), with an average spatial data rejection rate of 1.4%. Their combined contribution was greater than that of the lowest-participating 1,500 citizen scientists. These results suggest that to maximize data quality and volunteer retention, catering to a highly involved and motivated contingent may be optimal (Conrad and Daoust 2008; Conrad and Hilchey 2011; Eveleigh et al. 2014).
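The concentration of effort described above (the top ten contributors outweighing the bottom 1,500) can be quantified from a list of per-participant tree counts. The sketch below assumes such a list is available; the per-user counts in the test are hypothetical, and only the 57,965 and 225,595 figures come from this article.

```python
from typing import Sequence


def top_k_share(tree_counts: Sequence[int], k: int) -> float:
    """Fraction of all volunteer-contributed trees collected by the k most productive participants."""
    ranked = sorted(tree_counts, reverse=True)
    return sum(ranked[:k]) / sum(ranked)


def top_k_exceeds_bottom_n(tree_counts: Sequence[int], k: int, n: int) -> bool:
    """True if the top k participants together collected more trees than the bottom n."""
    if not 1 <= n <= len(tree_counts):
        raise ValueError("n must be between 1 and the number of participants")
    ranked = sorted(tree_counts, reverse=True)
    return sum(ranked[:k]) > sum(ranked[len(ranked) - n:])


# Check of the reported figure: 57,965 of 225,595 volunteer-contributed trees.
print(f"{100 * 57_965 / 225_595:.1f}%")
```

The reported share works out to about 25.7%, which rounds to the 26% given in the text.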
Recruiting and retaining long-term volunteers might be accomplished by requiring minimum time contributions (i.e., multiple days), or a minimum data-quality standard. The latter could be accomplished by requiring pre-qualifications for participation (Kosmala et al. 2016), or through monitoring performance, and retraining participants as needed. However, project managers should consider that minimum time requirements can also be a disincentive to participation. Additionally, Wiggins et al. (2011) noted performance monitoring and correction is educational, but also potentially demotivating. Volunteer performance monitoring during TC2015 included gentle reminders to poor performers at events, and rejection of low-quality spatial data (forcing resurvey) without feedback to the participant. This method worked for TC2015, since positive engagement with volunteers was a lasting goal and the data could be re-collected. Projects with a need to maximize data quality may consider more stringent requirements.
Care should also be taken in choosing variable names. The results show widespread assessment errors (Ferretti 2009) associated with the Tree Problems variable. This may be due to a combination of participants misinterpreting the meaning of the variable and training errors. For most of the Tree Problems sub-variables, there is high percentage agreement but low Sn and even lower PPV (max 50%). This means participants did not record the majority of problems (low Sn), and most of the problems they identified were incorrect (low PPV). However, overall percentage agreement was high because absences made up the majority of the data set. Part of the confusion may relate to the name change from Infrastructure Conflicts in 2005 to Tree Problems in 2015, which was intended to provide clarity using simpler wording but potentially led to greater confusion. Some participants indicated they thought it referred to anything wrong with a tree, including structural issues that were outside the parameters of the variable. This tendency to include more problems than required may explain the very low PPV, though the low Sn suggests many were also excluding legitimate infrastructure problems. Both of these results indicate inconsistent or subpar in-field training for this variable. NYC Parks suggests it is preferable to use a potentially unfamiliar but specific term (Infrastructure) rather than a more general term that can be interpreted in multiple ways (Problems).
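How high percentage agreement can coexist with low Sn and lower PPV is easiest to see with a worked 2 × 2 table. The counts below are hypothetical, chosen only to mirror the pattern described (a rare problem, many volunteer false alarms, many missed cases); they are not TC2015 results.

```python
from typing import Dict


def stats_from_counts(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Percentage agreement, Sn, PPV, and NPV (in %) from a 2x2 table with staff data as reference."""
    total = tp + fp + fn + tn
    return {
        "agreement": 100 * (tp + tn) / total,
        "Sn": 100 * tp / (tp + fn),
        "PPV": 100 * tp / (tp + fp),
        "NPV": 100 * tn / (tn + fn),
    }


# Hypothetical sub-variable: staff find the problem at 40 of 1,000 trees; volunteers flag
# 60 trees, only 20 of which staff also flagged.
example = stats_from_counts(tp=20, fp=40, fn=20, tn=920)
for name, value in example.items():
    print(f"{name:9s} {value:5.1f}%")
```

Here agreement is 94% and NPV is about 98% simply because 920 of the 1,000 trees are agreed absences, while Sn is 50% and PPV is about 33%, so agreement alone would hide the problem.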
TC2015 built on the experiences of the 1995 and 2005 inventories to actively engage over 2,000 volunteers in inventorying the street trees of New York City. Novel applications of mapping technology coupled with field-based training and data-collection events provided participants with the tools to collect accurate spatial data and connect with like-minded citizen scientists in group settings. A web application served as the platform for data collection and live progress reporting for volunteers and program managers. TC2015 accomplished one of its main goals of creating an operational data set with highly accurate spatial locations and standard tree variables (DBH, Species, Genus, and Tree Condition), aiding foresters in confidently identifying trees to which they are deployed. The inventory data were also incorporated into the NYC Street Tree Map, a public-facing web application of the forestry operations database that is updated daily (with removed and planted trees) and displays the species, size, and ecological benefits of the urban forest. This map allows participants to access and use the inventory data to record and share their tree care activities, which has resulted in over 10,000 recorded stewardship activities as of May 2017. NYC Parks has carried the momentum forward by continuing to work with a small cadre of highly experienced TC2015 citizen scientists to inventory other components of the urban forest, including trees within landscaped areas of parkland.
Acknowledgments
We would like to thank our project sponsor, NYC Parks First Deputy Commissioner Liam Kavanagh, as well as project contributors, including U.S. Forest Service colleagues at the NYC Urban Field Station, TreeKIT, Azavea, as well as the many NYC Parks staff and TC2015 volunteers. In particular, we would like to acknowledge our top five citizen science surveyors: Jennifer Chen, Joel Berson, Robert Hay, Sarath Sochannam, and Ruth Salas, whose dedication and skill were without equal.
Footnotes
Supplemental Content. The training manual for participating volunteers is available for browsing on the website of the publisher, International Society of Arboriculture (www.isa-arbor.com). The manual is also available as an electronic file (.pdf) upon individual request (editor{at}isa-arbor.com).
- © 2018, International Society of Arboriculture. All rights reserved.