Abstract
Background The inventory of street tree populations has acquired new importance due to interest in the provision of ecosystem services. This paper compares systematic sampling with stratified systematic sampling, using different sizes of sampling units, to estimate five variables of interest: number of trees per kilometer of sidewalk (DF), basal area per kilometer of sidewalk (Dg), mean total height ($\overline{Ht}$), volume per kilometer of sidewalk (DV), and number of species per kilometer of sidewalk (DE). An innovative contribution here is testing new alternative density variables.
Methods In the densely urbanized area of Piracicaba (Sao Paulo State, Brazil), 90 sets of 4 blocks were systematically sampled. They were used to compose sampling units of 1, 2, 3, and 4 blocks. Stratification was based on the percentage of street tree cover obtained with geoprocessing tools. Only public trees with a circumference at breast height greater than or equal to 12 cm and planted on sidewalks or avenue medians were included.
Results The effects of sampling unit size and stratification on estimate accuracy, sample size, and sampling intensity were analyzed. The results show that stratified systematic sampling was the more accurate process, especially for DF, Dg, and DV.
Conclusions Reductions in sample size were more significant when stratified systematic sampling with 2-block sampling units was used.
Introduction
Forest inventories are necessary to quantify the benefits provided by street trees, such as air purification, thermal comfort, reduction of heat island and stormwater effects, landscape and habitat connectivity, biodiversity conservation, carbon sequestration potential, and others (Kim 2016; Tan et al. 2021; Cavender-Bares et al. 2022).
The inventory scope, variables of interest, precision, and process are determined by goals, limited by time and resources, and can be influenced by specific features of an area and study population (Avery and Burkhart 1983; Shiver and Borders 1996). Concerning street tree populations, sampling inventories can provide adequate information for many projects and are less costly than a census. The sample is sufficient to acquire an overview of a population, though it does not provide specific data on all trees (Grey and Deneke 1992; Miller 1996).
To run a sampling inventory, it is first necessary to define the variables of interest, which are the population characteristics that will be estimated. Subsequently, it is necessary to define the sampling unit so that the population is divided into small portions, from each of which a single observed value of the variables of interest can be obtained (Avery and Burkhart 1983; Shiver and Borders 1996). The sampling units can be equal-area or variable-area plots, or non-surface sampling units, such as lines and points (e.g., the trees themselves) (Jaenson et al. 1992; Alvarez et al. 2005; Nagendra and Gopal 2010; Nowak et al. 2015).
In sampling inventories of street tree populations, variables of interest are usually defined per sidewalk length, like trees per kilometer of sidewalk. The sampling unit is a block or street, and its edges are sidewalks. In this way, it is easy to identify and determine which plants should be considered (Jaenson et al. 1992; Alvarez et al. 2005; Nagendra and Gopal 2010; Nowak et al. 2015).
Sampling units of efficient size are those that provide estimates with low variance (S²), which means that the sampling units vary little among themselves and have values close to the sample mean. Increasing sampling unit size can decrease this variability, since larger units become more like the population, but it is necessary to find the size that is cost effective (Avery and Burkhart 1983; Shiver and Borders 1996).
The sample comprises a set of selected sampling units using a determined sampling process. The most common are: simple random sampling (sampling units are randomly obtained within the population); systematic sampling (sampling units are selected at a constant interval from a selected sampling unit that is initially sampled randomly); and stratified systematic sampling (a population is initially divided into strata, also called homogeneous sub-populations, and the sample is thus composed of sub-samples from each stratum) (Avery and Burkhart 1983; Shiver and Borders 1996).
The distribution and composition of street tree populations are influenced by factors such as degree of urbanization, zoning, road infrastructure, municipal investment and management, educational level, and residents’ affinity with plants. For example, city street trees are often grouped in monospecific plantings locally for aesthetic or cost reasons. This is one reason for spatial heterogeneity (Jim 1998; Nagendra and Gopal 2010; Lo and Jim 2012).
Therefore, woody regions occur throughout the city, and variables of interest are grouped in spatial distributions accordingly. In this context, systematic sampling is presented as a suitable process for sample selection, as the sampling units are evenly spaced and can be proportionally distributed in the regions of the area. Thus, a systematic sample assures greater spatial balance and is more likely to cover the range of variable values than a random sample. Another advantage of systematic sampling is that the location of a systematic sampling unit is usually easily identifiable (Avery and Burkhart 1983; Shiver and Borders 1996). Several previous studies about street tree inventorying have used random samples. Still, it is generally accepted that the sampling units may be concentrated in certain areas, and that the sample might omit, underrepresent, or overrepresent large regions of the city (Jaenson et al. 1992; Maco and Mcpherson 2003; Alvarez et al. 2005).
Systematic samples are not necessarily more representative of the population than simple random samples. Their advantage is spatial balance, which raises the probability of covering the range of variable values in the population.
The main potential difficulty of systematic sampling is associated with the periodicity of the population, which may occur if there are heterogeneous regions in the area. Of course, this will only be noted if the sampling units intersect those regions. Under these conditions, estimates of variables of interest will be inaccurate, and variance will be underestimated, since this sample may not specifically resemble the population. On the other hand, stratified systematic sampling seems to be more efficient in heterogeneous populations because the estimates are separately made for each stratum rather than for the entire population. This results in lower standard errors, assuming strata are correctly defined and the homogeneous subpopulations are properly delimited (Avery and Burkhart 1983; Shiver and Borders 1996).
If the stratification is proper, the stratified systematic sample will be smaller than the simple systematic sample, i.e., a sample with fewer sampling units (Avery and Burkhart 1983; Shiver and Borders 1996). Regarding the stratification of street tree population by sociopolitical variables, like socioeconomic level, administrative division, and occupancy date, no previous research examining a correlation between those variables and the distribution of street trees throughout the city was found. Although such stratification could contribute to characterizing city zones, it does not reduce sample size (Jaenson et al. 1992; Alvarez et al. 2005).
Nevertheless, Nagendra and Gopal (2010) obtained efficient stratification using a quantitative variable: road width. The percentage of street tree cover is a quantitative variable as well, and using it as a stratification variable seems to be suitable, given its positive correlation with tree crown area, diameter at breast height, total height, and number of trees (Brix and Mitchell 1983; Sanders 1984; Nowak 1994; O’Brien et al. 1995; Tonini and Arco-Verde 2005).
Traditionally, a street tree inventory is used as a source of data for planning their management in the urban space, collecting variables that capture characteristics like the distribution of species and diameters, plant health, and distance from other urban elements (Grey and Deneke 1992; Miller 1996). However, the expansion of urban ecology and interest in ecosystem services derived from street trees drives the demand for knowledge about efficient processes for accurately estimating underexplored quantitative variables, such as crown area, volume, and biomass of populations or species. These are essential to managing street trees to understand better how to augment their ecosystem services (Nowak et al. 2008; Speak et al. 2018).
The estimation of ecosystem services has been made using allometric equations obtained by regression analysis, in which the dependent variable is the ecosystem service, and the independent variables are measurable characteristics of the population (Woodall et al. 2011; Pretzsch et al. 2023). There is also software developed for this purpose, such as i-Tree, which requires variables like diameter at breast height (DBH) and tree species in its calculations (Bagstad et al. 2013; Zięba-Kulawik 2021).
This manuscript presents a case study comparing systematic sampling with stratified systematic sampling, using the percentage of street tree cover as the stratification variable, for a street tree inventory in Piracicaba City, Brazil. Four different sampling unit sizes are also compared to estimate five variables of interest.
Materials and Methods
Study Area
Piracicaba is a medium-sized city located in the Brazilian state of Sao Paulo. The city has an overall area of 221 km², a human population of 356,743 (1,614 people/km²), and 1,575 km of public streets (IBGE 2010; IPPLAP 2015). The study area covers only the most densely urbanized area of Piracicaba (between 22°39′23″S and 22°46′51″S, and between 47°34′49″W and 47°42′16″W), which is at 554 m in elevation and has an Aw Köppen class (tropical savanna climate with dry winter). It covers only 82 km² (37% of the city’s overall size) and 991 km of public streets (63% of the total length).
To delimit the study area, we used an image of the overall area of Piracicaba that was captured on 2011 April 22 by the WorldView II satellite (0.5-m spatial resolution), composed of R, G, B, and NIR bands, merged, orthorectified, and georeferenced to the WGS 1984 datum and Universal Transverse Mercator (UTM) system, Zone 23 South coordinates. A georeferenced grid with cells of 0.75 km² (862 m × 862 m) was laid over the image in QGIS software, and cells in which more than 50% of the area was urbanized and that formed a continuous area were selected for inclusion in the study area, amounting to 90 cells or 66.8 km² (Figure 1).
Simple Systematic Sampling
Simple systematic sampling was carried out using the georeferenced grid of 90 cells. Then, a grid of points was used to mark the center of the cells, and each point served to locate the nearest set of 4 blocks without squares or parks that were preferably arranged around a crossroad. Using different configurations of the sampling unit, 4 samples with 90 sampling units were determined: 1, 2, 3, and 4 blocks (Figure 2).
The total number of blocks in the study area was 3,759, so for each sampling unit size, the population (N) corresponds to 3,759 sampling units (1-block size); 1,879 sampling units (2-block size); 1,253 sampling units (3-block size); and 940 sampling units (4-block size). Thus, the sampling intensity (i.e., the percentage of the population that has been sampled) from each sampling unit size was, respectively, 2.4%, 4.8%, 7.2%, and 9.6%.
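The sampling intensities quoted above can be checked with a few lines of Python. This is only an arithmetic sketch: the population sizes are copied from the text rather than derived, and the names `POPULATION_SIZES` and `sampling_intensity` are illustrative, not part of the original study.

```python
# Population size N for each sampling unit size (blocks per unit), as reported in the text.
POPULATION_SIZES = {1: 3759, 2: 1879, 3: 1253, 4: 940}
SAMPLED_UNITS = 90  # sampling units in each of the 4 samples


def sampling_intensity(n: int, population: int) -> float:
    """Percentage of the population of sampling units that was sampled."""
    return 100.0 * n / population


intensities = {
    size: round(sampling_intensity(SAMPLED_UNITS, population), 1)
    for size, population in POPULATION_SIZES.items()
}
print(intensities)
```

The rounded values agree with the 2.4%, 4.8%, 7.2%, and 9.6% reported for the 1-, 2-, 3-, and 4-block sampling units.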
For this study, the following estimates of simple random sampling for finite populations described by Shiver and Borders (1996) were used:
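Sample mean ($\bar{x}$): $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (1)
Variance of sample ($S^2$): $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$ (2)
Coefficient of variation (CV): $CV = \frac{S}{\bar{x}} \times 100$ (3)
Variance of sample mean ($S_{\bar{x}}^2$): $S_{\bar{x}}^2 = \frac{S^2}{n}\left(\frac{N - n}{N}\right)$ (4)
Standard error of sample mean ($S_{\bar{x}}$): $S_{\bar{x}} = \sqrt{S_{\bar{x}}^2}$ (5)
1 – α Confidence interval (CI): $CI = \bar{x} \pm t_{\alpha} S_{\bar{x}}$ (6)
Sample size (ne): $n_e = \frac{t_{\alpha}^2 \, CV^2}{(AE\%)^2 + \frac{t_{\alpha}^2 \, CV^2}{N}}$ (7)
Sampling intensity (I%): $I\% = \frac{n_e}{N} \times 100$ (8)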
where xi is the value of ith sampling unit, n is the total number of sampling units in the sample, N is the total number of sampling units in the population, t is Student’s t-value for α obtained from the 2-tailed table, and 1 – α is the probability that the confidence interval will capture the true mean.
The sample size (ne) was determined for 95% and 90% confidence intervals (α equal to 0.05 and 0.1, and $t_{\alpha}$ equal to 1.99 and 1.66, respectively) with 10%, 15%, and 20% of allowable error (AE%).
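The estimators for simple systematic sampling can be sketched in Python as follows. The formulas used here are the usual finite-population forms (sample variance with n − 1, finite population correction (N − n)/N, and sample size computed from the coefficient of variation and the allowable error); treating them as the exact expressions used in the study is an assumption, although the sample size function does reproduce the reported ne values for DF and Dg with 1-block units. The function names are illustrative.

```python
import math


def sss_estimates(values, population, t_value):
    """Mean, variance, CV%, standard error, and confidence interval of a sample.

    Assumes the standard finite-population estimators; not necessarily
    the authors' exact implementation.
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / (n - 1)
    cv = 100.0 * math.sqrt(variance) / mean
    # Variance of the mean with the finite population correction.
    variance_of_mean = (variance / n) * (population - n) / population
    std_error = math.sqrt(variance_of_mean)
    ci = (mean - t_value * std_error, mean + t_value * std_error)
    return {"mean": mean, "variance": variance, "cv": cv,
            "std_error": std_error, "ci": ci}


def required_sample_size(cv, population, t_value, allowable_error):
    """Sampling units needed for a given allowable error (in % of the mean)."""
    numerator = t_value ** 2 * cv ** 2
    return math.ceil(numerator / (allowable_error ** 2 + numerator / population))


# CV% values for 1-block units are taken from the Results (DF and Dg).
print(required_sample_size(54.32, 3759, 1.99, 10))   # DF
print(required_sample_size(102.97, 3759, 1.99, 10))  # Dg
```

Rounding the sample size up to the next whole sampling unit is a design choice made here so that the allowable error is not exceeded.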
Stratified Systematic Sampling
Stratification of the Study Area by the Percentage of Street Tree Cover
First, the urbanized area in each of the 90 cells was separated to stratify the study area by the percentage of street tree cover. For this purpose, areas not considered part of the urban structure were vectorized in ArcGIS software. They were: (1) unoccupied areas (non-urbanized areas); (2) new housing developments (despite having urban infrastructure, these areas do not contain a developed population of street trees, i.e., their circumference at breast height is less than 12 cm); (3) mining and agriculture areas, which are generally much larger than one block and are located in industrial areas or suburbs of the city; and (4) riparian forests and rivers, which are protected areas, such as the Piracicaba River. These areas were subtracted from the area of the cells, leaving only what we called the urbanized area.
Next, the cover area of street trees alone was obtained. The same software was used to delete from the cells the vegetation that was not included, i.e., (1) central block vegetation (garden vegetation); and (2) parks and squares (including their sidewalks). The resulting image contained only streets, sidewalks, avenue medians, and their vegetation. Supervised classification was run with the image processing software MultiSpec to map the following land cover classes: street tree cover, asphalt, and exposed soil. Fifteen training samples (approximately twenty pixels each) of each land cover class were delimited in the image to characterize the spectral signature of the class, and each pixel was then classified (Multispec 2017). Finally, the percentage of street tree cover in each cell was calculated as the ratio of street tree cover area to urbanized area.
To identify the strata, the cells were categorized into the following street tree cover percentages: 1% to 2% (10 cells); 2% to 3% (31 cells); 3% to 4% (18 cells); 4% to 5% (10 cells); 5% to 6% (3 cells); 6% to 7% (8 cells); 7% to 8% (5 cells); 8% to 9% (2 cells); and 9% to 12% (3 cells). The 9% to 12% cells were grouped into the same category so that all categories had more than one cell (Figure 3a).
Two strata were delimited through visual interpretation of clusters of cells with similar street tree cover. Most cells with more than 5% street tree cover were placed into Stratum 1, and most cells with less than 5% into Stratum 2 (Figures 3a and 3b). Some mixing of cells remained. For example, cell #80, which has 6.76% tree cover, lies in the middle of Stratum 2, although its value would put it in Stratum 1. We distributed the cells with values between 4% and 5% between the 2 strata using a neighborhood criterion based on our best judgment of the tree cover. This mixing was useful in maintaining the continuity of the strata. Stratum continuity is not obligatory but can facilitate fieldwork.
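The classification of cells by street tree cover can be illustrated with a short Python sketch. It covers only the numeric part of the procedure: the cover percentage, the cover class, and a preliminary stratum from the 5% threshold. The visual interpretation and neighborhood criterion described above are not modeled, so the preliminary stratum can differ from the final one (cell #80 is an example). Treating class boundaries as lower-inclusive is an assumption, and all function names are hypothetical.

```python
# Number of cells per street tree cover class, as reported in the text.
CLASS_COUNTS = {
    "1-2%": 10, "2-3%": 31, "3-4%": 18, "4-5%": 10, "5-6%": 3,
    "6-7%": 8, "7-8%": 5, "8-9%": 2, "9-12%": 3,
}


def cover_percentage(tree_cover_area: float, urbanized_area: float) -> float:
    """Street tree cover as a percentage of the urbanized area of a cell."""
    return 100.0 * tree_cover_area / urbanized_area


def cover_class(pct: float) -> str:
    """Cover class label; values of 9% and above share one class (assumed lower-inclusive)."""
    lower = min(int(pct), 9)
    upper = 12 if lower == 9 else lower + 1
    return f"{lower}-{upper}%"


def preliminary_stratum(pct: float, threshold: float = 5.0) -> int:
    """Stratum 1 above the threshold, Stratum 2 otherwise (before any manual adjustment)."""
    return 1 if pct > threshold else 2


print(sum(CLASS_COUNTS.values()))               # total number of cells
print(cover_class(6.76), preliminary_stratum(6.76))  # cell #80 from the text
```

For cell #80 the threshold alone gives Stratum 1, whereas the authors placed it in Stratum 2 to keep the strata continuous.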
Estimates of Stratified Systematic Sampling
The following estimates of stratified systematic sampling for finite populations were used for the 4 different samples (Shiver and Borders 1996):
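Sample mean for stratum ($\bar{x}_h$): $\bar{x}_h = \frac{1}{n_h}\sum_{i=1}^{n_h} x_{h,i}$ (9)
Variance of sample for stratum ($S_h^2$): $S_h^2 = \frac{\sum_{i=1}^{n_h} (x_{h,i} - \bar{x}_h)^2}{n_h - 1}$ (10)
Coefficient of variation for stratum ($CV_h$): $CV_h = \frac{S_h}{\bar{x}_h} \times 100$ (11)
Standard error of subsample mean for stratum ($S_{\bar{x}_h}$): $S_{\bar{x}_h} = \sqrt{\frac{S_h^2}{n_h}\left(\frac{N_h - n_h}{N_h}\right)}$ (12)
Mean for stratified sample ($\bar{x}_{st}$): $\bar{x}_{st} = \sum_{h=1}^{L} w_h \bar{x}_h$ (13)
Variance of mean for stratified sample ($S_{\bar{x}_{st}}^2$): $S_{\bar{x}_{st}}^2 = \sum_{h=1}^{L} w_h^2 S_{\bar{x}_h}^2$ (14)
Standard error of mean for stratified sample ($S_{\bar{x}_{st}}$): $S_{\bar{x}_{st}} = \sqrt{S_{\bar{x}_{st}}^2}$ (15)
Upper bound on the error of estimation for sample mean (BM): (16)
1 – α Confidence interval (CI): $CI = \bar{x}_{st} \pm t_{\alpha} S_{\bar{x}_{st}}$ (17)
Sample size (ne): (18)
Sampling intensity (I%): $I\% = \frac{n_e}{N} \times 100$ (19)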
where xh,i is the value of the ith sampling unit within stratum h, nh is the number of sampling units from stratum h included in the sample, Nh is the total number of sampling units in stratum h, N is the number of sampling units in the entire population, L is the number of strata, t is the Student’s t-value for α obtained from the 2-tailed table, 1 – α is the probability that the confidence interval will capture the true mean, and wh is the proportion of sampling units in stratum h (wh = Nh/N).
Sample size (ne) was determined for 95% and 90% confidence intervals (α equal to 0.05 and 0.1, and $t_{\alpha}$ equal to 1.99 and 1.66, respectively) with 10%, 15%, and 20% allowable error (AE%), as in the simple systematic sampling. There is no coefficient of variation for the population, only for strata.
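A minimal Python sketch of the stratified mean and its standard error is given here. It assumes the standard stratified estimators, with stratum weights equal to Nh/N and a finite population correction within each stratum; whether these match the study's expressions term by term is an assumption. The sample size calculation for the stratified design is deliberately left out, because the allocation rule is not stated in this section. Function names and the toy data are illustrative only.

```python
import math


def stratum_statistics(values, stratum_population):
    """Mean, variance, and variance of the mean for one stratum."""
    n_h = len(values)
    mean = sum(values) / n_h
    variance = sum((x - mean) ** 2 for x in values) / (n_h - 1)
    variance_of_mean = (variance / n_h) * (stratum_population - n_h) / stratum_population
    return mean, variance, variance_of_mean


def stratified_estimates(strata, t_value):
    """Stratified mean, standard error, and confidence interval.

    `strata` is a list of (values, N_h) pairs; weights are N_h / N (assumed).
    """
    total_population = sum(n_units for _, n_units in strata)
    mean_st = 0.0
    variance_st = 0.0
    for values, n_units in strata:
        weight = n_units / total_population
        mean, _, variance_of_mean = stratum_statistics(values, n_units)
        mean_st += weight * mean
        variance_st += weight ** 2 * variance_of_mean
    std_error = math.sqrt(variance_st)
    ci = (mean_st - t_value * std_error, mean_st + t_value * std_error)
    return mean_st, std_error, ci


# Toy data, not from the study: two strata with 30 and 70 sampling units.
mean_st, se_st, ci_st = stratified_estimates(
    [([2.0, 4.0, 6.0], 30), ([10.0, 12.0, 14.0], 70)], t_value=1.99
)
print(mean_st, se_st, ci_st)
```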
Collecting Data, Calculating Variables, and Determining Form Factor of Trunk from Street Tree Population
The data to calculate the variables of interest were collected over 3 months, from March to May 2013. Only public street plants with a circumference at breast height (CBH) ≥ 12 cm that were planted on sidewalks or avenue medians up to 3 m wide were included (larger avenue medians were considered squares). It is important to note that shrub species are used as street trees in the study area. In Brazil, they are pruned to have only one stem and to reach about 3 m in height.
The following data were recorded in the field in spreadsheet software: location, species, circumference at breast height (CBH, m), total height (Ht, m), and height of first fork (Hf, m). A sequential number was assigned to each sampling unit, block, and plant, and the street name was recorded; botanical identification was carried out in the field or, when necessary, by an expert in plant taxonomy using botanical material (a collected branch, preferably with flowers and fruits); CBH (m) was measured on the trunk at 1.30 m using a tape (0.1-cm accuracy), but if the plant forked below 1.30 m, all branches with CBH ≥ 12 cm were measured; Ht (m) was measured from the base to the top of the plant using a hypsometer (measurement error up to 0.3 m; Vasilescu 2013); and Hf (m) was measured from the base to the first fork of the trunk or until the trunk tapered to 5.0 cm in diameter using a hypsometer. The perimeter of blocks (m) was measured along curbs using a wheel tape (0.1-m accuracy).
The following variables were calculated for each street tree:
Basal area of the trunk or sum of basal areas of branches (g, m²): $g = \sum_{i=1}^{b} \frac{\pi \, DBH_i^2}{4}$ (20), with $DBH_i = \frac{CBH_i}{\pi}$ (21)
Corresponding diameter from the basal area of the trunk or the sum of basal areas of branches (DBHc, m): $DBH_c = \sqrt{\frac{4g}{\pi}}$ (22)
Estimated volume of trunk (V, m³): $V = g \times Ht \times f_c$ (23)
where g (m²) is the basal area of the trunk or the sum of basal areas of branches; b is the number of branches; DBH (m) is the diameter at breast height of the trunk or each branch, which was calculated from the CBH; CBH (m) is the circumference at breast height of the trunk or each branch, which was measured from the street tree; i is the counter of circumferences and diameters at breast height from the branches; DBHc (m) is the corresponding diameter from the basal area of the trunk or sum of basal areas of branches; V (m³) is the estimated volume of the trunk using fc; Ht (m) is the total height of the plant; and fc is the form factor of the trunk from the street tree population, which was equal to 0.5178 as calculated below. Data analyses were conducted using statistical software (R Core Team 2019).
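The per-tree calculations described above can be sketched in Python. The sketch assumes circular cross sections (DBH = CBH/π, basal area = π·DBH²/4) and a volume equal to basal area × total height × form factor, which follows the variable definitions given in the text; the function names are illustrative and the example tree is invented.

```python
import math

FORM_FACTOR = 0.5178  # form factor of the trunk reported in the text


def basal_area(cbh_values):
    """Basal area (m2) of a trunk, or summed over branches, from CBH values in meters."""
    return sum(math.pi * (cbh / math.pi) ** 2 / 4.0 for cbh in cbh_values)


def corresponding_dbh(g):
    """Diameter (m) of a single circle with the same area as the basal area g."""
    return math.sqrt(4.0 * g / math.pi)


def trunk_volume(g, total_height, form_factor=FORM_FACTOR):
    """Estimated trunk volume (m3) from basal area, total height, and form factor."""
    return g * total_height * form_factor


# Invented example: a plant forked below 1.30 m with two branches of 0.45 m and 0.30 m CBH.
g = basal_area([0.45, 0.30])
print(g, corresponding_dbh(g), trunk_volume(g, total_height=5.0))
```

For a forked plant, the corresponding diameter is larger than either branch diameter but smaller than their sum, because areas rather than diameters are added.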
The form factor (fc) was established from the 10 most frequent species representing 56.6% of the sampled street trees. They were: (1) Murraya paniculata (L.) Jack (16.26%); (2) Licania tomentosa (Benth.) Fritsch. (10.36%); (3) Poincianella pluviosa (DC.) L.P.Queiroz (6.44%); (4) Lagerstroemia indica L. (6.42%); (5) Schinus molle L. (3.13%); (6) Magnolia champaca L. (3.05%); (7) Handroanthus chrysotrichus (Mart. ex DC.) Mattos (2.94%); (8) Syagrus romanzoffiana (Cham.) Glassman. (2.92%); (9) Callistemon viminalis G. Don ex Loud. (2.73%); and (10) Terminalia catappa L. (2.54%) (Table S1).
The range of basal areas (cross-sectional areas at breast height) obtained for those species was divided into 3 classes, and 2 plants that forked above 1.30 m were randomly selected from each class. For those 60 plants (10 species × 3 basal area classes × 2 plants), the trunk scaling was determined using an electronic dendrometer (Laser Technology, Inc., USA), which provides estimates of height and diameter along the trunk with an accuracy up to 0.635 cm (Laser Technology 2016). Cross-section diameters of each selected plant trunk were measured from the base up to the point where the trunk tapered to 5 cm in diameter or until some obstruction arose, such as the canopy or a zone with many forks. The actual volume of the trunks was calculated using Smalian’s formula, as described by Husch et al. (2002), who stated that diameters must be measured at 0.1 m, 0.3 m, 0.7 m, and 1.3 m from the ground, and thereafter at 1-m intervals, to obtain accurate volume data. Finally, the form factor (fc) was established by the following formula (Prodan et al. 1997):
Form factor (fc): $f_c = \frac{1}{60}\sum_{j=1}^{60} \frac{V_{a,j}}{V_{c,j}}$ (24), with $V_c = g \times Ht$ (25)
where fc is the form factor; Va (m³) is the actual volume of the trunk by Smalian’s formula; Vc (m³) is the cylindrical volume of the trunk; g (m²) is the basal area of the trunk; Ht (m) is the total height of the plant; and 60 is the number of scaled trunks.
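The trunk scaling and form factor calculation can be illustrated with a Python sketch. Smalian's formula is applied section by section (the mean of the two end cross-sectional areas times the section length). Averaging the ratio of actual to cylindrical volume over the scaled trunks is an assumption about how the 60 trunks were combined; a ratio of summed volumes would be another plausible reading. Function names and the example measurements are hypothetical.

```python
import math


def smalian_volume(heights, diameters):
    """Trunk volume (m3) from diameters (m) measured at the given heights (m).

    Each section contributes the mean of its two end areas times its length.
    """
    volume = 0.0
    for k in range(len(heights) - 1):
        area_low = math.pi * diameters[k] ** 2 / 4.0
        area_high = math.pi * diameters[k + 1] ** 2 / 4.0
        volume += (area_low + area_high) / 2.0 * (heights[k + 1] - heights[k])
    return volume


def form_factor(actual_volumes, basal_areas, total_heights):
    """Mean ratio of actual volume to cylinder volume (g x Ht); the averaging rule is assumed."""
    ratios = [
        va / (g * ht)
        for va, g, ht in zip(actual_volumes, basal_areas, total_heights)
    ]
    return sum(ratios) / len(ratios)


# Invented measurements following the spacing described in the text.
heights = [0.1, 0.3, 0.7, 1.3, 2.3, 3.3]
diameters = [0.30, 0.27, 0.25, 0.22, 0.15, 0.05]
va = smalian_volume(heights, diameters)
g = math.pi * 0.22 ** 2 / 4.0  # basal area at 1.3 m
print(va, form_factor([va], [g], [6.0]))
```

A perfect cylinder scaled over its whole height returns a form factor of 1, so values below 1 reflect taper and, when total height exceeds the scaled length, the unscaled upper part of the plant.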
Variables of Interest of Forest Inventory
The following variables of interest were calculated for each sampling unit (Table S2):
Number of trees per kilometer of sidewalk (DF, u/km): $DF = \frac{f}{P}$ (26)
Basal area per kilometer of sidewalk (Dg, m²/km): $Dg = \frac{\sum_{i=1}^{f} g_i}{P}$ (27)
Mean total height of sampling unit ($\overline{Ht}$, m): $\overline{Ht} = \frac{\sum_{i=1}^{f} Ht_i}{f}$ (28)
Volume per kilometer of sidewalk (DV, m³/km): $DV = \frac{\sum_{i=1}^{f} V_i}{P}$ (29)
Number of species per kilometer of sidewalk (DE, u/km): $DE = \frac{e}{P}$ (30)
where f (u) is the number of plants in the sampling unit; P (km) is the length of the sampling unit, which is the sum of perimeters of its blocks; g (m²) is the basal area of the trunk; Ht (m) is the total height of the plant; V (m³) is the estimated volume of the plant; and e (u) is the number of species in the sampling unit.
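As an illustration, the five variables of interest for one sampling unit could be computed as in the Python sketch below. It follows the definitions given above (totals divided by the sidewalk length P in kilometers, and height averaged over the plants). The record layout, function name, and example data are hypothetical.

```python
def sampling_unit_variables(trees, block_perimeters_m):
    """Variables of interest for one sampling unit.

    `trees` is a list of dicts with keys 'species', 'g' (m2), 'ht' (m), 'v' (m3);
    `block_perimeters_m` lists the perimeter of each block in meters.
    """
    length_km = sum(block_perimeters_m) / 1000.0  # P, sidewalk length
    count = len(trees)                            # f, number of plants
    species = {tree["species"] for tree in trees}  # distinct species, e
    return {
        "DF": count / length_km,
        "Dg": sum(tree["g"] for tree in trees) / length_km,
        "Ht_mean": sum(tree["ht"] for tree in trees) / count,
        "DV": sum(tree["v"] for tree in trees) / length_km,
        "DE": len(species) / length_km,
    }


# Invented 2-block sampling unit with three plants.
unit = sampling_unit_variables(
    [
        {"species": "Murraya paniculata", "g": 0.02, "ht": 3.0, "v": 0.03},
        {"species": "Licania tomentosa", "g": 0.06, "ht": 7.0, "v": 0.22},
        {"species": "Murraya paniculata", "g": 0.01, "ht": 2.0, "v": 0.01},
    ],
    block_perimeters_m=[400.0, 600.0],
)
print(unit)
```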
Results
Some 5,744 plants were cataloged throughout 360 blocks within the city, corresponding to 9.6% of the total 3,759 blocks. If that value is extrapolated, the study area population could be estimated at approximately 60,000 street trees. From the 360 selected blocks, the mean perimeter was 428.31 m, the smallest perimeter was 155.1 m, and the largest perimeter was 1,164.2 m; the 99% probability range was 428.31 ± 280.23 m, and the coefficient of variation was 32.9%.
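The extrapolation to roughly 60,000 street trees can be reproduced with the short Python check below. It is a simple expansion of the mean number of plants per sampled block to all blocks, which assumes the sampled blocks are representative of the study area; the variable names are illustrative.

```python
SAMPLED_TREES = 5744
SAMPLED_BLOCKS = 360
TOTAL_BLOCKS = 3759

# Share of blocks that were inventoried, in percent.
sampled_share = 100.0 * SAMPLED_BLOCKS / TOTAL_BLOCKS

# Mean plants per sampled block, expanded to every block in the study area.
trees_per_block = SAMPLED_TREES / SAMPLED_BLOCKS
estimated_population = trees_per_block * TOTAL_BLOCKS

print(round(sampled_share, 1), round(estimated_population))
```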
We observed that 1,599 individuals (27.83%) are shrubs and, therefore, belong to species that branch from the base, must be repeatedly pruned to acquire the shape of small trees, and have low wood density. The other 4,145 individuals (72.16%) belong to arboreal species, which have bigger dimensions than shrubs, only 1 or 2 stems, and vary in wood density.
Corresponding diameter at breast height (DBHc) and total height (Ht) show frequency concentration in lower class values, with DBHc up to 25 cm and Ht up to 6 m. The mean DBHc was 21.04 cm, the median was 17.51 cm, and the coefficient of variation was 65.16%, while the mean Ht was 6.20 m, the median was 5.10 m, and the coefficient of variation was 55.59% (Figure 4 and Table 1).
The Results from Variables of Interest
Here, we will reference the estimates of the variables of interest from 4-block sampling units and α = 5% because they have the lowest error (Table 2).
First, the mean total height of sampling unit ($\overline{Ht}$) shows a low sample mean value and the lowest variability among all variables of interest (CV% = 21.73 from SSS). The same behavior remains with the population stratification (CV% = 24.38 for Stratum 1; CV% = 17.86 for Stratum 2) (Table 2).
The number of street trees per kilometer of sidewalk (DF) shows a mean value ($\bar{x}$ = 38 plants/km from both sampling processes) that would represent one street tree every 26.5 m if the trees were homogeneously distributed. This variable of interest shows intermediate variability compared with the others, even with stratification. Stratification reduced the coefficient of variation by approximately 5% compared with the entire population (CV% = 34.08 from Stratum 1, CV% = 35.94 from Stratum 2, CV% = 40.40 of the whole population) (Table 2).
Basal area per kilometer of sidewalk (Dg) and volume per kilometer of sidewalk (DV) present greater variability and a much more noticeable reduction of the coefficient of variation due to the stratification: more than 15% for Dg (CV% = 81.54 from entire population, CV% = 68.48 from Stratum 1, CV% = 55.93 from Stratum 2) and more than 24% for DV (CV% = 115.39 from entire population, CV% = 87.09 from Stratum 1, CV% = 77.44 from Stratum 2) (Table 2).
The number of species per kilometer of sidewalk (DE) shows mean values associated with a high diversity of species around the city, both for the entire population and for each stratum (Table 2). We identified 165 species belonging to 122 genera and 53 families (Table S1).
The Results from Increasing Sampling Unit Size
Now, we will consider only the results of increasing the sampling unit and not that of stratification (Table 2). However, similar behavior of error estimates can be identified from both strata.
Results of increasing the sampling unit show similar trends in error estimates for the variables of interest volume per kilometer of sidewalk (DV), basal area per kilometer of sidewalk (Dg), street trees per kilometer of sidewalk (DF), and mean total height of sampling unit ($\overline{Ht}$). For each, there is a decrease in error estimates from the 1-block sampling units to the 2-block sampling units (decrease in CV value for DF from 54.32% to 45.84%; Dg from 102.97% to 87.62%; DV from 144.07% to 127.90%; and $\overline{Ht}$ from 33.80% to 26.67%) (Table 2).
After that, the 3 error estimates became quite stable, reflected in the similar sample mean values found in sampling units of 2, 3, and 4 blocks (except for $\overline{Ht}$, where sample mean values are similar among all sampling unit sizes). For Dg and DV, the sampling unit increase from 1 to 2 blocks has the greatest impact on decreasing error estimates. For DF and $\overline{Ht}$, the 2-block sampling units give good precision for the 3 error estimates (Table 2).
For the number of species per kilometer of sidewalk (DE), the greatest decrease in error estimates occurs from the 2- to 3-block sampling unit size, and the stability of the sample mean value occurs from the 3- to 4-block sampling unit size.
Now, we will observe the results of sample size (ne) and sampling intensity (I%) according to the increase in sampling unit size. For this, we will focus on 10% allowable error and 95% confidence interval (α = 5%), which require the greatest sample size and let us better note the change in ne and I% due to sampling unit size increase (Table 3).
We note for DF, Dg, and DV that the increase in sampling unit size from 1 to 2 blocks causes a relatively larger decrease in sample size ne (29.82% for DF, 30.69% for Dg, 28.95% for DV from SSS; 22.99% for DF, 33.55% for Dg, 30.19% for DV from StSS). For the 3-block sampling units, the decrease in ne is not as large (15% for DF, 15.27% for Dg, 19.33% for DV from SSS; 14.93% for DF, 11.5% for Dg, 14.59% for DV from StSS), and I% almost doubles compared with a 1-block sampling unit. Thus, 3-block and 4-block sampling unit sizes cause large increases in sampling intensity.
For $\overline{Ht}$, we see the same trend in sample size and sampling intensity as sampling unit size increases, although the variability of the estimates is much lower than for the other variables. For DE, increasing the sampling unit size causes a small decrease in ne and a large increase in I%.
The Results from Stratification by Cover Area
We will focus on the behavior of error estimates from 1-block sampling units to analyze their decrease as a function of stratification, because the 1-block sampling units show greater variability than the other sampling unit sizes (Table 2).
In fact, the strata are much more homogeneous than the population for DV (CV% = 144.07 from entire population, CV% = 98.55 from Stratum 1, CV% = 92.78 from Stratum 2) and Dg (CV% = 102.97 from entire population, CV% = 81.20 from Stratum 1, CV% = 76.78 from Stratum 2), and more homogeneous for DF (CV% = 54.32 from entire population, CV% = 42.54 from Stratum 1, CV% = 49.01 from Stratum 2). The stratification also decreased the values of the other error estimates ($S_{\bar{x}}$ and CI). The error estimates for the other sampling unit sizes follow the same trend as for the 1-block sampling unit.
For DE and $\overline{Ht}$, the strata are as heterogeneous as the population (DE: CV% = 41.23 from the entire population, CV% = 37.14 from Stratum 1, CV% = 39.83 from Stratum 2; $\overline{Ht}$: CV% = 33.80 from the whole population, CV% = 32.00 from Stratum 1, CV% = 31.34 from Stratum 2). For DE, this heterogeneity persists across the other sampling unit sizes, while for $\overline{Ht}$, the heterogeneity is concentrated in Stratum 1, which has a high percentage of street tree cover.
Again, we will focus on the results for 10% allowable error and 95% confidence interval (α = 5%) (Table 3). The sample size and sampling intensity are reduced by at least 20% when stratification is applied to 1-block sampling units for DF, Dg, and DV. For $\overline{Ht}$ and DE, the reduction is smaller, no greater than 11%.
The Results from Both Increasing Sampling Unit Size and Stratification by Cover Area
The use of stratified systematic sampling and 2-block sampling units, for 10% allowable error and 95% confidence interval, reduced the sample size (ne) by 41% for DF (114 to 67 sampling units), by 47% for Dg (378 to 200), and by 45% for DV (677 to 370). In contrast, the sampling intensity (I%) remained quite stable when compared with the use of simple systematic sampling and 1-block sampling units.
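The reductions quoted above, and the comparison of sampling intensities, can be verified with a few lines of Python. The sample sizes are those given in the text, and the population sizes are the 3,759 one-block units and 1,879 two-block units reported in the Methods; the dictionary and function names are illustrative.

```python
# (ne with SSS and 1-block units, ne with StSS and 2-block units), from the text.
SAMPLE_SIZES = {"DF": (114, 67), "Dg": (378, 200), "DV": (677, 370)}
N_ONE_BLOCK = 3759
N_TWO_BLOCK = 1879


def percent_reduction(before: int, after: int) -> float:
    """Relative reduction in sample size, in percent."""
    return 100.0 * (before - after) / before


reductions = {
    name: round(percent_reduction(before, after))
    for name, (before, after) in SAMPLE_SIZES.items()
}

# Sampling intensity before and after, in percent of each population.
intensity_change = {
    name: (round(100.0 * before / N_ONE_BLOCK, 1), round(100.0 * after / N_TWO_BLOCK, 1))
    for name, (before, after) in SAMPLE_SIZES.items()
}

print(reductions)
print(intensity_change)
```

Because a 2-block unit covers twice as many blocks, roughly halving the number of sampling units leaves the share of the population sampled almost unchanged, which is why I% stays close to its original value.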
It is noted that the highest reductions in sample size (ne) and sampling intensity (I%) were obtained by changes in the allowable error (AE% from 10% to 15% and to 20%) and the confidence intervals (1 – α from 95% to 90%)(Table 3).
Discussion
Background of Street Tree Population
The study area encompassed residential, commercial, and industrial settlements of old and recent occupations in Piracicaba, the foundation of which dates back 250 years. Over time, the city did not have a well-defined urbanization pattern of territorial expansion. One of the results of the lack of planning for the urban space occupation is that great differences in the distribution of street trees and a wide range of block perimeters are observed throughout the city. This is the case in many cities around the world.
No surveys were found for Piracicaba regarding the percentage of trees in streets, parks, squares, and private properties. However, in Brazil, residential lots tend to have less room for trees (i.e., yard space) than in North America, and one of the reasons is the unplanned urban development.
A motivation to study street trees in cities like Piracicaba is the Municipality’s ability to manage the street tree population and implement public policy. Furthermore, the inventory of street trees is a niche for calculating ecosystem services and tree benefits, such as biomass estimation, carbon accounting, cooling effects, rainfall interception, and air quality, which are important to urban areas and urban planning. Specifically, in developing countries, carbon accounting could be used for Clean Development Mechanism projects. However, in the case of Brazil, the estimation of ecosystem services is not yet widespread among the municipalities.
In Piracicaba, the street tree population mainly comprises mature plants (i.e., in the reproductive stage). Most of them have small dimensions due to the abundance of shrubs used as street trees (28%), which reach about 3 m in height. The pruning is used to avoid the contact of the canopy with the electrical network at 5 m, which is executed by the electric company. So, a low value of total height mean and median (6.2 m and 5.1 m, respectively) was found. Young trees from new plantings and more mature and post-mature trees were also present (Table 1 and Figure 4).
Estimates of the variables of interest from both sampling processes can tell us more about the variability in the street tree population across the area (Table 2). For the mean total height of the sampling unit, the low mean value (approximately 6.0 m for both sampling processes) and the low variability, even when the population is stratified, show that pruning and shrub planting have been practiced across the entire city.
The number of street trees per kilometer of sidewalk (DF) presents a mean value of 38 plants/km from both sampling processes, representing one street tree every 26.5 m if the trees were evenly distributed. This is a low density of street trees compared with the maximum distance of 12 m between trees that is recommended for some cities in Brazil (RGE 2001; Secretaria Municipal do Verde e do Meio Ambiente 2022). This variable of interest shows intermediate variability, which indicates a heterogeneous distribution of the number of street trees in the area.
That heterogeneous distribution gains prominence when we consider stratification by percentage of street tree cover, since the number of street trees and the tree cover area are directly related. In fact, with stratification, there is a reduction of approximately 5% in the coefficient of variation of DF within the strata compared with the entire population (Table 2).
At the same time, the cover area is influenced by the stage and species of the plants. Basal area per kilometer of sidewalk (Dg) and volume per kilometer of sidewalk (DV) are variables that express the characteristics of each plant. Because of this, they present greater variability and a much more noticeable reduction of the coefficient of variation due to stratification (more than 15% for Dg and 24% for DV). The difference between them can be explained by the fact that Dg is calculated from only 2 spatial dimensions, while DV is derived from 3, which gives DV the highest variability among all variables of interest.
Another characteristic that influences the variability of DV is the homogeneity of total heights, wherein a wide range of basal areas is related to a narrow range of heights. If these plants could grow with fewer prunings to avoid the canopy’s contact with the power lines and more prunings to shape the trunk, removing lower branches and forming canopies above the electrical wires, the basal area and total height would be better correlated, and volume would show less variability for the population.
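The contrast between Dg and DV can be illustrated with a minimal per-tree sketch. It assumes the standard conversion from circumference at breast height to basal area, g = c²/(4π), and a simple volume of basal area times height times a hypothetical form factor of 0.5. The volume equation actually used in the study is not given in this section, and the four trees below are invented for illustration only.

```python
import math
from statistics import mean, stdev


def basal_area_m2(cbh_cm: float) -> float:
    """Basal area (m^2) from circumference at breast height (cm): g = c^2 / (4*pi)."""
    c_m = cbh_cm / 100.0
    return c_m ** 2 / (4.0 * math.pi)


def volume_m3(cbh_cm: float, height_m: float, form_factor: float = 0.5) -> float:
    """Volume (m^3) as basal area x height x form factor (0.5 is an assumed value)."""
    return basal_area_m2(cbh_cm) * height_m * form_factor


def cv_percent(values) -> float:
    """Coefficient of variation (%) = 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Invented trees: (CBH in cm, total height in m); not data from the study.
trees = [(12, 3.0), (40, 5.0), (90, 7.0), (150, 10.0)]
basal_areas = [basal_area_m2(c) for c, _ in trees]
volumes = [volume_m3(c, h) for c, h in trees]

print(f"CV of basal area: {cv_percent(basal_areas):.0f}%")
print(f"CV of volume:     {cv_percent(volumes):.0f}%")
```

With these invented values, volume shows the larger coefficient of variation, because it multiplies the squared circumference by a third dimension, height.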
The number of species per kilometer of sidewalk (DE) demonstrates mean values associated with a high diversity of species around the city ( species/km from the entire population). On the other hand, this variable of interest was not sensitive to the frequency distribution of species: M. paniculata and L. tomentosa alone jointly represent more than 25% of all street trees, the 4 most frequent species make up 40%, and the top 10 almost 60% (Table S1).
Effects of Sampling Unit Size on Accuracy and Error Estimates
The sampling error is the difference between the estimate obtained from the sample and the true value of the population parameter, and it is associated with the variability among sampling units. The smaller this error, the more accurate the estimate. One way to reduce the error is to increase the size of the sampling units, because larger units are expected to be more homogeneous relative to each other.
Many factors may affect street tree distribution, and different situations are possible. For example, an unvegetated street block may exist beside a heavily wooded one. Thus, a sampling unit that comprises more than one block will more effectively encompass population heterogeneity. Standard error of the mean, confidence interval (CI), and coefficient of variation (CV) are estimates sensitive to sampling error and make it possible to assess whether one sampling unit size is better than another.
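The 3 error estimates named above can be computed from the values observed in a set of sampling units. The following is a minimal sketch, not the study's own procedure: it uses the normal quantile in place of Student's t, ignores any finite-population correction, and the function name and input values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def error_estimates(values, confidence: float = 0.95) -> dict:
    """Mean, standard error of the mean, confidence interval, and CV (%) of a sample.

    Assumptions: normal quantile instead of Student's t, and no
    finite-population correction.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least 2 sampling units are required")
    m = mean(values)
    s = stdev(values)                       # sample standard deviation
    se = s / sqrt(n)                        # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * se
    return {
        "mean": m,
        "se": se,
        "ci": (m - half_width, m + half_width),
        "cv_percent": 100.0 * s / m,
    }


# Hypothetical DF values (trees/km) for a handful of sampling units.
print(error_estimates([22.0, 35.0, 41.0, 38.0, 54.0, 29.0]))
```

Running the same function on the values obtained for 1-block, 2-block, 3-block, and 4-block sampling units would allow the sizes to be compared on equal terms.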
We can see the behavior of the error estimates as a function of sampling unit size for each variable of interest (Table 2). Error estimates for volume per kilometer of sidewalk (DV), basal area per kilometer of sidewalk (Dg), street trees per kilometer of sidewalk (DF), and mean total height of the sampling unit show a similar trend. For each, there is a decrease in error estimates from the 1-block sampling units to the 2-block sampling units.
After that, the 3 error estimates became quite stable, which is reflected in the similar sample mean values found for sampling units of 2, 3, and 4 blocks (except for mean total height, where sample mean values are similar among all sampling unit sizes). Nevertheless, Dg and DV present very high variability within the sample, so the increase in sampling unit size has a greater impact on decreasing their error estimates. Thus, using 2-block sampling units proves to be the best procedure for these variables.
For DF and mean total height, the variability within the sample is not as pronounced as for DV and Dg, but even so, using 2-block sampling units gives good precision for the 3 error estimates and proves to be the best option.
The number of species per kilometer of sidewalk (DE) demonstrates low variability compared with DV and Dg, but its error estimates display a peculiar behavior: the greatest decrease occurs between the 2-block and 3-block sampling unit sizes, and the sample mean value stabilizes between the 3-block and 4-block sampling unit sizes. Although the variability is not so high for this variable, the stability of the sample mean value shows that the best procedure was to use the 3-block sampling units. On the other hand, other aspects make using 3-block sampling units a bad option, as we shall discuss later.
Effects of Stratification on the Accuracy and Error Estimates
With respect to stratification, it is important to note that the street tree population strata are not clearly defined, i.e., the tree cover does not have an exact differentiation between regions of the study area. Thus, within the cells that divide the study area, a range of tree cover values from 1.01% to 11.56% was discerned, but boundaries for grouping by class were not evident. In addition, the definition of the strata becomes subjective once the boundary of each stratum is determined by our visual interpretation of the tree cover distribution (Figure 3).
Despite the subjectivity of the process, stratification by tree cover percentage was an effective strategy to improve precision, especially for variables of interest with high variability. We first focus on the behavior of error estimates from 1-block sampling units in Table 2 to analyze their decrease as a function of stratification. After this, we analyze their values relative to other sampling unit sizes.
In fact, the strata are much more homogeneous than the population for the variables DV and Dg and more homogeneous than the population for DF. As expected, stratification also decreased the values of the other error estimates (standard error of the mean and CI). The error estimates for the other sampling unit sizes follow the same trend as for the 1-block sampling unit.
For DE and mean total height, the strata are as heterogeneous as the population. This heterogeneity persists across the other sampling unit sizes for DE, while for mean total height the heterogeneity is concentrated in Stratum 1. For these variables, stratification turned out to be of little use.
Indeed, we achieved good results when stratifying for variables of interest that, according to the literature, are directly correlated with tree cover area (i.e., number, basal area, and volume of trees). The mean total height would be better correlated with tree cover area if the street trees had not been pruned as heavily, thereby reducing height.
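The gain from stratification can be sketched with the textbook estimator for stratified sampling, in which the population mean is the weighted sum of stratum means and its variance is the weighted sum of within-stratum variances. This is an illustrative sketch under stated assumptions: the stratum weights, the sample values, and the function name are hypothetical, and no finite-population correction is applied.

```python
from math import sqrt
from statistics import mean, variance


def stratified_mean_and_se(strata):
    """Stratified estimate of the mean and its standard error.

    strata: list of (weight, values) pairs, where weight is the share of the
    population in the stratum (weights must sum to 1) and values are the
    observations from the sampling units of that stratum.
    """
    total_weight = sum(w for w, _ in strata)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("stratum weights must sum to 1")
    estimate = sum(w * mean(v) for w, v in strata)
    var_of_mean = sum(w ** 2 * variance(v) / len(v) for w, v in strata)
    return estimate, sqrt(var_of_mean)


# Hypothetical DF values (trees/km) in a low-cover and a high-cover stratum.
low_cover = [10.0, 20.0, 30.0]
high_cover = [50.0, 60.0, 70.0]
est, se = stratified_mean_and_se([(0.5, low_cover), (0.5, high_cover)])

# Same observations treated as one unstratified sample, for comparison.
pooled = low_cover + high_cover
pooled_se = sqrt(variance(pooled) / len(pooled))
print(f"stratified: mean={est:.1f}, SE={se:.2f}; unstratified SE={pooled_se:.2f}")
```

When the strata differ in their means, as in this invented example, the between-stratum variability no longer contributes to the standard error, which is why stratification helps most for variables correlated with the stratification variable.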
Effects of Sampling Unit Size and Stratification on Sample Size and Sampling Intensity
In general, the motivation for increasing the sampling units or stratifying the population is to reduce the variability between sampling units, as reducing variability decreases the error estimates and increases the precision of the mean and variance statistics. Another consideration is the cost and time required to execute the sampling. In short, the sampling unit with optimal size and the most adequate sampling process will be one that gives the desired precision with the shortest time requirements and lowest cost. Thus, it is worth evaluating the effects of both sampling processes on the behavior of sample size and sampling intensity.
Sample size and sampling intensity are controlled by the researcher and are not directly dependent on sampling unit size and stratification. However, sampling unit size and stratification can affect the sample size and sampling intensity necessary to achieve desired levels of precision.
With regard to the increase in sampling unit size, it is expected that a larger sampling unit will comprise more of the variability of a population (Avery and Burkhart 1983; Shiver and Borders 1996). Less variability among units means that a smaller number of sampling units (or a smaller sample size) will be sufficient to represent the population within a specific allowable error. On the other hand, when we increase the sampling unit size, the sampling intensity per sampling unit will be larger because each unit occupies a more extensive area. Therefore, in this case, sample size (ne) and sampling intensity (I%) are inversely related (Table 3).
To understand the effect of increased sampling unit size, we will focus on the sample size and the sampling intensity for 10% allowable error and 95% confidence interval (α = 5%) (Table 3). For DF, Dg, and DV, we noted that the increase in sampling unit size from 1 to 2 blocks causes a relatively larger decrease of ne (29.82% for DF, 30.69% for Dg, 28.95% for DV with SSS; 22.99% for DF, 33.55% for Dg, 30.19% for DV with StSS). When using the 3-block sampling units, the decrease of ne is not as large (15% for DF, 15.27% for Dg, 19.33% for DV with SSS; 14.93% for DF, 11.5% for Dg, 14.59% for DV with StSS) and I% doubles compared with a 1-block sampling unit. Thus, 3-block and 4-block sampling unit sizes can cause high increases in sampling intensity that compromise the efficiency of the inventory procedure. For mean total height, we see the same trend, although its variability is much lower than that of DF, Dg, and DV. For DE, increasing the sampling unit size is inefficient, because it causes a small decrease in ne and a large increase in I% from 1-block to 2-block sampling units.
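The trade-off between ne and I% can be sketched with the usual textbook sample size formula, n = (z · CV / AE)², optionally corrected for a finite population. This is a sketch under stated assumptions rather than the study's computation: it uses the normal quantile in place of Student's t, and it defines sampling intensity as the share of blocks sampled, which assumes blocks of equal size. The function names and the numbers in the example are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(cv_percent: float, allowable_error_percent: float,
                         confidence: float = 0.95, population_units=None) -> int:
    """Sampling units needed for a given CV (%) and allowable error (%).

    Assumptions: normal quantile instead of Student's t; a finite-population
    correction is applied only when population_units is given.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n0 = (z * cv_percent / allowable_error_percent) ** 2
    if population_units is not None:
        n0 = n0 / (1.0 + n0 / population_units)
    return ceil(n0)


def sampling_intensity_percent(sample_size: int, blocks_per_unit: int,
                               total_blocks: int) -> float:
    """Share (%) of all blocks that end up in the sample (equal-sized blocks assumed)."""
    return 100.0 * sample_size * blocks_per_unit / total_blocks


# Hypothetical CVs: 50% for 1-block units, 42% for 2-block units.
for blocks, cv in [(1, 50.0), (2, 42.0)]:
    ne = required_sample_size(cv, allowable_error_percent=10.0)
    intensity = sampling_intensity_percent(ne, blocks, total_blocks=5000)
    print(f"{blocks}-block units: ne={ne}, I%={intensity:.2f}")
```

Under this definition, a 30% cut in ne combined with units that are twice as large changes I% by a factor of 0.70 times 2 = 1.40, which is why larger units pay off only while the drop in ne remains substantial.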
With regard to the stratification effect, assuming the sampling unit size does not change, sample size and sampling intensity are directly proportional. We see that the sample size and sampling intensity are reduced by at least 20% when stratification is applied to 1-block sampling units for DF, Dg, and DV (Table 3). For mean total height and DE, the reduction is smaller, no greater than 11%, and stratification is not as efficient. We expected this, because mean total height and DE were not well correlated with street tree cover percentage (the stratification variable), unlike DF, Dg, and DV. Thus, where variability in the variables of interest is high and is correlated with the stratification variable, stratified systematic sampling can substantially reduce the sampling effort.
In fact, using stratified systematic sampling and 2-block sampling units, for 10% allowable error and 95% confidence interval, is an efficient strategy that reduces the sample size (ne) by 41% for DF, by 47% for Dg, and by 45% for DV. At the same time, the sampling intensity (I%) remained quite stable when compared with the use of simple systematic sampling and 1-block sampling units.
Implementing a stratification scheme presents tradeoffs between the different time and monetary costs of computer-based work and field work. In this study, for example, stratification required high-resolution multispectral images, free and owned software, a computer, specialized staff, and time sufficient to run geoprocessing computations. As an advantage, however, working inside an office may be more comfortable and safer than doing fieldwork on city streets. The weather may be variable, subjecting staff to sun overexposure, heavy rainfall, or other events, and staff are also subject to hazards common in the city, including theft and car accidents. Beyond these considerations, field work necessitates a more complex level of organization, with considerations such as transport, fuel, measuring devices, data sheets, suitable clothing, food, and water. Therefore, reducing field time can be advantageous despite the expense of stratification.
In field work, two factors clearly demand time: travel between sampling units and data collection in each. Smaller sampling units require shorter collection and travel times within each unit than larger sampling units. With larger sampling units, besides the reduction in the number of sampling units, time is saved on trips between units and on data collection within the sampling units.
The highest reductions in sample size (ne) and sampling intensity (I%) for the inventory of the variables of interest were obtained by changing the allowable error (AE% from 10% to 15% and later to 20%) and the confidence level (1 – α from 95% to 90%) (Table 3). The desired precision of estimates must be determined by the objectives of an inventory, the availability of financial resources and time, the features of the population, and the variables of interest. Sometimes, a lower precision of estimates can be sufficient to answer the questions at hand about the street tree population. Finally, the increase of AE% resulted in diminishing returns for reducing sample sizes, regardless of confidence level (Table 3). Shiver and Borders (1996) reported a similar pattern, i.e., as sample size increased, smaller reductions of allowable error were observed.
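The diminishing returns follow from the form of the textbook sample size formula, in which the sample size is proportional to (z / AE)². The sketch below assumes that simplified formula (normal quantile, no finite-population correction), so its ratios are indicative only and will not reproduce Table 3 exactly. The function names are hypothetical.

```python
from statistics import NormalDist


def size_ratio_for_error(ae_old_percent: float, ae_new_percent: float) -> float:
    """Ratio new/old sample size when only the allowable error changes (n ~ 1/AE^2)."""
    return (ae_old_percent / ae_new_percent) ** 2


def size_ratio_for_confidence(conf_old: float, conf_new: float) -> float:
    """Ratio new/old sample size when only the confidence level changes (n ~ z^2)."""
    z_old = NormalDist().inv_cdf(0.5 + conf_old / 2.0)
    z_new = NormalDist().inv_cdf(0.5 + conf_new / 2.0)
    return (z_new / z_old) ** 2


print(f"AE 10% -> 15%: {size_ratio_for_error(10, 15):.3f} of the original sample")
print(f"AE 15% -> 20%: {size_ratio_for_error(15, 20):.3f} of the 15% sample")
print(f"95% -> 90%:    {size_ratio_for_confidence(0.95, 0.90):.3f} of the original sample")
```

Under these assumptions, relaxing AE% from 10% to 15% leaves 4/9 (about 44%) of the sample, while the further step from 15% to 20% leaves 9/16 (about 56%) of what remained, so each additional relaxation saves proportionally less.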
Conclusion
Piracicaba, a medium-sized city, is representative of many cities around the world, since it did not have a planned growth pattern during its development. One of the results of that is the heterogeneous distribution of the street tree population, which leads to the need for greater sampling efforts in the forest inventory to get good accuracy of estimates.
Stratifying the area by percentage of street tree cover, to divide the heterogeneous population into more homogeneous subpopulations, proved to be an appropriate technique. It reduced the sample size by more than 20% for the variables of interest that are directly correlated with street tree cover: number of street trees per kilometer of sidewalk, basal area per kilometer of sidewalk, and volume per kilometer of sidewalk. Indeed, an innovative contribution of this paper is testing new alternative density variables.
In the study area, the variable of interest mean total height of the sampling unit presented a low mean value and low variability due to almost one-third of the street tree population being shrubs and due to the practice of pruning to avoid contact with power lines at 5 m. If these trees could grow without frequent pruning, the mean total height of the sampling unit and volume per kilometer of sidewalk would correlate better to the street tree cover; therefore, the sampling effort would be even lower. Number of species per kilometer of sidewalk demonstrated low variability among the sampling units in the city and among the strata.
In fact, combining stratification with an increase in sampling unit size from 1 block to 2 blocks, for 10% allowable error and 95% confidence interval, provided reductions of 41% to 47% in the sample size for the variables of interest that had good correlation with street tree cover.
Since they performed well in Piracicaba, the tested techniques can be applied in other cities worldwide to reduce sampling effort. If there is heterogeneity in the distribution of street trees between regions of the city, stratification by street tree cover can lead to a reduction in the sample size, regardless of whether pruning is carried out to avoid contact of the crowns with power lines. Increasing the sampling unit size was worthwhile only up to 2 blocks, as beyond that, the increase in sampling intensity offsets the efficiency gained by reducing the sample size.
A good follow-up study would compare the random sampling units in the i-Tree software with stratified systematic sampling units or even the weighted method. This could help analyze how certain characteristics would suit i-Tree’s traditional ecological parameters (e.g., storm damage), since using its established sampling units could lead to a miscalculation. Sometimes a lot of sampling effort can be expended in sampling areas with few or no trees.
Another issue to be addressed in future research is how the leaf area index (LAI), a variable heavily used in studies focused on measuring ecosystem services, might correlate with our variables of interest. It would be interesting to see how the volume per kilometer of sidewalk and the basal area per kilometer of sidewalk, for example, are correlated with LAI.
Conflicts of Interest
The authors reported no conflicts of interest.
Acknowledgements
The authors thank FAPESP (Foundation for Research Support of the State of Sao Paulo) for funding this research.
- © 2024 International Society of Arboriculture