## Abstract

Sampling can be used to estimate urban tree inventories. Several sampling methods are available, and the choice among them depends on the place to be studied and the condition of its urban trees. This study compared the results of simple and stratified random sampling methods with those of a total district tree census. The simple random sampling error was 17%, and the coefficient of variation was 47%. The stratified random sampling errors varied from 19% to 60%, and their coefficients of variation ranged from 32% to 70%, depending on the stratum. The Shannon diversity index (SDI) was low in the census (3.07), as in the simple random sampling (1.27). The total number of trees and the number of trees per kilometer of sidewalk calculated by the simple random sampling were similar to those obtained by the census. Because the sampling error obtained by stratified random sampling was higher than that obtained by simple random sampling, stratified random sampling offered no advantage over simple random sampling. Furthermore, the stratified random sampling procedure was more complex.

Urban forest research has focused on tree surveys to assist planning by municipalities. Although a total tree survey is a difficult undertaking, there are statistical sampling methods that describe the whole from population samples.

Sampling can be applied as a research tool by sampling city blocks and recording the number of trees per lineal kilometer. By means of a sampling inventory, it is possible to draw some conclusions, such as the frequency of species, their diversity, and species’ adaptation to the site (Chacalo et al. 1994).

Once the limits and characteristics of the population of street trees are known and the desired level of precision is established, the use of sampling techniques constitutes a procedure of significant efficiency for street tree evaluation. Using rectangular sample plots and number of trees per kilometer of forested sidewalk as main variables, simple random sampling procedures provided a significant efficiency for street tree evaluation, as previously demonstrated by Milano (1994). However, population heterogeneity can introduce sampling errors when the simple random sampling method is used.

Stratification can be used to increase survey precision when results are extrapolated to the total population, provided the characteristics of the strata are taken into account. The population is divided into subpopulations so that each one is more homogeneous. Each of these subpopulations is called a stratum (Cochran 1977).

Couto (1994) showed that stratified random sampling can be very useful for urban forest surveys. The strata can be selected based on the district, street tree density, or a group of blocks, depending on the criteria chosen for grouping the sampling units for stratum composition. To obtain a coherent survey, the measured values within each stratum should be homogeneous. Thus, a precise estimate of any stratum mean can be obtained using a small sample of that stratum. According to Cochran (1977), those estimates can be combined to produce an accurate estimate of the total population.

Jaeson et al. (1992) used a method that combined different-sized blocks to identify the strata. The mean number of trees per randomly selected block was multiplied by the number of blocks to estimate the total number of trees.

The choice of sampling type depends on a previous analysis or pre-sample of the area to be studied. According to Milano et al. (1992), it is necessary to quantify the forest and know its distribution in urban areas by defining its characteristics and quality.

Determining the purpose of the inventory is important in choosing the most appropriate methodology because each methodology presents a different degree of precision (Grey and Deneke 1978). According to Couto (1994), determining the most appropriate sampling type depends on the distribution of the measurements in space and time. A common criterion for choosing sampling techniques is the set of species present and their relative frequency.

This study compares simple and stratified random sampling methods with a total tree survey of a district that has varying characteristics.

## MATERIALS AND METHODS

The city of Piracicaba is located at 22°42′ 30.9″ S latitude and 47°38′ 01″ W longitude. The tree inventory was performed in Santa Cecília district (Figure 1), located in the East Zone of Piracicaba, São Paulo, Brazil. The residents of Santa Cecília are in the middle- and upper-middle-class income groups.

The district of Santa Cecília contains parcels of land that were occupied gradually. The urban district is relatively new. Intensive urbanization began in the 1970s and continues today. This gradual occupation has contributed to the diversity of urban characteristics with regard to human population, socioeconomic level, and presence of vegetation. This heterogeneity must be analyzed in full.

The block was the sampling unit chosen, and the data were recorded as number of trees per lineal kilometer of sidewalk. The variable was defined as the ratio of the total number of existing trees on the sidewalks to the total kilometers of sidewalk. These values were measured on a district map using AutoCAD software.

The total number of blocks in Santa Cecília district is 57, excluding public free spaces. An initial 10 random blocks were sampled, with the later addition of 11 more random blocks. One block was dropped from the total of 21 because it had no trees and would thereby exert undue influence on the analysis. For the stratified random sampling, the same 20 blocks were divided into four strata. The sampling strata were divided to achieve homogeneity, based on date of initial occupancy, physical proximity to other blocks in the stratification group, and purchasing power of residents.
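The two-stage random draw of blocks described above can be sketched as follows. This is a minimal illustration only: the block IDs, the function name `draw_blocks`, and the seed are hypothetical, and the study does not state how its random selection was actually carried out.

```python
import random

def draw_blocks(block_ids, first_wave, second_wave, seed=None):
    """Draw two successive simple random samples of blocks, without replacement."""
    rng = random.Random(seed)
    wave1 = rng.sample(block_ids, first_wave)
    chosen = set(wave1)
    # The second wave is drawn only from blocks not already selected
    remaining = [b for b in block_ids if b not in chosen]
    wave2 = rng.sample(remaining, second_wave)
    return wave1, wave2

blocks = list(range(1, 58))          # hypothetical IDs for the 57 blocks
wave1, wave2 = draw_blocks(blocks, 10, 11, seed=42)
sample = wave1 + wave2               # 21 blocks before dropping the treeless one

# Sampling fraction after one block is dropped: n / N = 20 / 57
sampling_fraction = 20 / 57
```

Drawing the second wave from the remaining blocks keeps the combined sample free of duplicates, so the 21 blocks behave as a single sample drawn without replacement.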

The typical sampling unit was a four-sided block, but other clearly defined shapes were acceptable, especially in stratum 3, which had cul-de-sacs.

The measured variables were number of trees per lineal kilometer of sidewalk and total number of trees in the district. The estimate of the total number of trees in the district (57 blocks) is important, even though it does not express a difference in the tree density. The number of trees per kilometer of sidewalk gives a clearer measure of the presence of the trees per human-occupied space. On the other hand, evaluating the density does not mean that the quality of the urban forest is analyzed (i.e., there could be an excess of inappropriate species and/or low diversity). Therefore, the Shannon diversity index (SDI) was selected to describe species diversity in relation to the total number of trees (Pielou 1975).

### Simple Random Sampling

The number of trees per kilometer of sidewalk was estimated according to Cochran (1977), who defines the population ratio as

$$R = \frac{X_T}{Y_T}$$

where $X_T$ is the number of existing trees in the blocks, and $Y_T$ is the number of lineal kilometers of sidewalk in the blocks.

The sampling ratio is represented by

$$r = \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} y_i}$$

where $x_i$ represents the number of trees in the selected block, and $y_i$ represents the total number of kilometers of sidewalk of the selected block. The 95% confidence interval for each one of the population ratios is given by $[r - 2s(r),\ r + 2s(r)]$.

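The variation coefficient is represented by

$$CV = \frac{s}{\bar{x}} \times 100$$

where $\bar{x}$ is the sample average, and $s^2$ is the variance, whose formula is

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - r\,y_i)^2}{n - 1}, \qquad s^2(r) = \frac{1 - f}{n\,\bar{Y}^2}\, s^2$$

The sampling fraction is represented by

$$f = \frac{n}{N}$$

where *N* is the total number of blocks in the studied area, and *n* is the number of blocks selected for the sample.

The mean value of the total number of kilometers variable is represented by

$$\bar{Y} = \frac{Y_T}{N}$$

and the sampling error, expressed as a percentage, is represented by

$$E\% = \frac{2\,s(r)}{r} \times 100$$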
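The quantities defined in this subsection can be computed together in a few lines. The sketch below assumes the standard ratio-estimator variance given by Cochran (1977), with residuals taken about the line x = r·y, and uses the multiplier 2 from the confidence interval in the text. The function name, its parameters, and the block counts are made up for illustration and are not data from the study.

```python
import math

def ratio_estimate(trees, km, n_blocks_total, km_mean_population=None):
    """Ratio estimate of trees per km of sidewalk from a simple random sample of blocks."""
    n = len(trees)
    f = n / n_blocks_total                       # sampling fraction n / N
    r = sum(trees) / sum(km)                     # sample ratio: trees per km
    # Residual variance of the tree counts about the line x = r * y
    s2 = sum((x - r * y) ** 2 for x, y in zip(trees, km)) / (n - 1)
    # Assumption: fall back to the sample mean of km if the population mean is not supplied
    y_bar = km_mean_population if km_mean_population is not None else sum(km) / n
    s_r = math.sqrt((1 - f) / (n * y_bar ** 2) * s2)   # standard error of r
    x_bar = sum(trees) / n
    return {
        "r": r,
        "ci95": (r - 2 * s_r, r + 2 * s_r),
        "cv_pct": 100 * math.sqrt(s2) / x_bar,
        "error_pct": 100 * 2 * s_r / r,
    }

# Made-up counts for five blocks sampled from a hypothetical 15-block district
trees = [12, 20, 8, 15, 30]
km = [0.40, 0.50, 0.40, 0.50, 0.60]
result = ratio_estimate(trees, km, n_blocks_total=15)
```

When the sample mean of sidewalk length stands in for the population mean, the percentage error reduces to twice the coefficient of variation multiplied by the square root of (1 − f)/n, which makes the link between the two reported statistics explicit.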

### Stratified Random Sampling

According to Cochran (1977), the number of trees per kilometer in stratified random sampling is estimated by the separate and proportional population ratio. The sampling fraction is the same in all strata, an arrangement known as stratification with proportional allocation of the blocks per stratum.

The formula of this ratio is

$$r_{st} = \sum_{h=1}^{L} \frac{Y_h}{Y_T}\, r_h$$

The components of the preceding formula are

$$r_h = \frac{\sum_{i=1}^{n_h} x_{ih}}{\sum_{i=1}^{n_h} y_{ih}}$$

where $x_{ih}$ is the number of existing trees in the $i$th block of the $h$th stratum, $y_{ih}$ is the number of lineal kilometers of sidewalk of the $i$th block of the $h$th stratum, $n_h$ is the number of blocks sampled in the $h$th stratum, and $L$ is the number of strata. The total number of kilometers of sidewalk of the $h$th stratum is

$$Y_h = \sum_{j=1}^{N_h} Y_{jh}$$

where $N_h$ is the total number of blocks of the $h$th stratum, and $Y_{jh}$ is the total kilometers of sidewalk of the $j$th block of the $h$th stratum. The total population of the kilometers of sidewalk variable is

$$Y_T = \sum_{h=1}^{L} Y_h$$
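A separate ratio estimate of this kind weights each stratum's sample ratio by that stratum's share of the total sidewalk length. The following is a minimal sketch under that assumption. The dictionary keys, the function name, and the two-stratum data are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def separate_ratio_estimate(strata):
    """Combine per-stratum ratios, weighting each by its share of total sidewalk km.

    Each stratum is a dict with:
      'trees'    - tree counts in the sampled blocks
      'km'       - sidewalk km of the same sampled blocks
      'km_total' - total sidewalk km of all blocks in the stratum
    """
    y_total = sum(s["km_total"] for s in strata)
    estimate = 0.0
    for s in strata:
        r_h = sum(s["trees"]) / sum(s["km"])         # stratum sample ratio
        estimate += (s["km_total"] / y_total) * r_h  # weight by sidewalk share
    return estimate

# Hypothetical two-stratum example
strata = [
    {"trees": [10, 20], "km": [0.5, 0.5], "km_total": 4.0},  # r_h = 30 trees/km
    {"trees": [5, 5],   "km": [0.5, 0.5], "km_total": 6.0},  # r_h = 10 trees/km
]
trees_per_km = separate_ratio_estimate(strata)   # 0.4 * 30 + 0.6 * 10 = 18
total_trees = trees_per_km * sum(s["km_total"] for s in strata)
```

Multiplying the combined ratio by the total sidewalk length gives an estimate of the total number of trees, which is the same as summing the per-stratum totals.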

The coefficient of variation, the sampling ratio per stratum, and the sampling error for each one of the population ratios per stratum were obtained as in the simple random sampling. The Shannon diversity index (SDI) was calculated to describe the variability of the number of species relative to the total number of trees. The SDI is calculated by the following formula:

$$H' = -\sum_{i=1}^{S} p_i \ln p_i$$

where $p_i$ is the proportion of individuals found in the $i$th species, $S$ is the number of species, and $\ln p_i$ is the natural logarithm of $p_i$.
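The Shannon index can be computed directly from a list of species labels, one per tree. The sketch below is a straightforward implementation of the usual formula with the natural logarithm. The function name and the labels are hypothetical.

```python
import math
from collections import Counter

def shannon_index(species_labels):
    """Shannon diversity index H' = -sum(p_i * ln p_i) over the species present."""
    counts = Counter(species_labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Four equally common hypothetical species give H' = ln 4
even_sample = ["sp_a", "sp_b", "sp_c", "sp_d"] * 5
h_even = shannon_index(even_sample)

# A sample dominated by one species has a much lower index
skewed_sample = ["sp_a"] * 17 + ["sp_b", "sp_c", "sp_d"]
h_skewed = shannon_index(skewed_sample)
```

The index reaches its maximum, ln S, when all S species are equally abundant, and it falls toward zero as a few species come to dominate the sample.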

## RESULTS AND DISCUSSION

Table 1 shows the census, the simple random sampling, and the stratified random sampling results.

The total number of trees and the number of trees per kilometer of sidewalk obtained by simple random sampling are similar to the real numbers obtained by the census. The stratified random sampling values, considering the average of all strata in the total area and obtained from the strata of the total number of trees, presented a larger error in relation to the census results. It is important to remember that the total number of trees was not simply calculated as the sum of the individual values of each stratum but was calculated as the strata average multiplied by the number of strata.

The evaluation of the number of trees should be described together with the analysis of the sampling error. The simple random sampling error is 17%, which was not considered high. The error calculation is important to give an idea of the usefulness of the sampling method for an urban forest inventory. The sampling error values of each stratum were indicators that not all the strata were homogeneous. The sampling error that represented the stratum homogeneity varied from 19% (stratum 1) to 60% (stratum 4). Where sampling error was lower, the number of trees per kilometer of sidewalk was closer to the real value. In strata 1 and 2, where the tree/km of sidewalk average was a little higher than in strata 3 and 4, the sampling errors were also higher. Stratum 1 presented a higher trees per kilometer of sidewalk value (36.22) than stratum 4, which had the lowest of all (15.16). The strata general average was not similar to the census value; however, stratum 1 results approached it.

In the stratified sampling, the estimated total number of trees (779 trees) refers to all 57 blocks, while the average number of trees per kilometer of sidewalk refers to the 20-block sample.

The coefficient of variation represents the precision of the experiment. When the purpose is population sampling, it depicts the sampling validity. The simple random sampling presented a high coefficient of variation (47%), demonstrating an imprecision with regard to sample choice. That variability was due to the variation of the samples relative to each other, but evaluating the entire sampling results provided an accurate estimate when compared to the real values. This is what Cochran (1977) called imprecise and exact sampling.
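As a rough consistency check, the reported coefficient of variation and sampling error can be related to each other. The calculation assumes that the sampling error is twice the relative standard error of the estimate, that a finite population correction applies, and that the sample is the 20 blocks retained from the 57 in the district, so that $f = 20/57$. None of these assumptions is stated as a worked calculation in the study.

$$E\% \approx 2 \cdot CV \cdot \sqrt{\frac{1 - f}{n}} = 2 \times 47\% \times \sqrt{\frac{1 - 20/57}{20}} \approx 2 \times 47\% \times 0.180 \approx 16.9\%$$

Under these assumptions the result agrees with the reported simple random sampling error of 17%, which illustrates how a high between-block variability can still yield a moderate error for the district-level estimate.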

When the strata were analyzed, a higher precision was verified in stratum 1, and stratum 4 showed the highest imprecision. This finding demonstrates that stratum 1 possessed the highest sampling homogeneity, which was confirmed in the field.

The difficulty of obtaining a homogeneous stratum to maintain the criteria previously defined for all strata was shown only in stratum 4. The lots in stratum 4 were found to have unoccupied areas that did not possess any urban forest. Areas with new construction presented a high density of trees, elevating the number of trees in the stratum.

The comparison of a simple and a stratified random sampling on the same blocks raised important facts that could be analyzed purely by their quantitative aspects, in a statistical way, as well as by the qualitative aspects of the urban forest. The identification of the species used in the urban forest was important to define the choice of future species that could be planted. Table 2 shows the species selected in the sampling schemes compared to those verified by the census. Not all the found species are listed—only the main ones obtained in the census.

The Shannon diversity index indicates how diverse the species are relative to the total number of trees, as shown in Figure 2.

An analysis of Table 2 demonstrates that a few species occupy most of the urban forest. The census found that 15 species comprise 84% of the number of trees, while in the simple random sampling and strata 1, 2, 3 and 4, the same species comprised 80%, 83%, 80%, 52%, and 74% of the trees, respectively. Tree species were not evenly distributed within the total district; thus, not all species in the census were found in each sampling unit. It is important to point out that the diversity in stratum 3 was much lower than the others. This fact is probably related to the lower economic level and the poor quality of tree planting in that area.

The Shannon diversity index was a little higher in the census than in the simple random sampling and was much lower in the stratified random sampling (Figure 2). On the other hand, the simple random sampling presented an SDI lower than 3. According to Martins and Santos (2001), the index assumes sampling that includes all the species, but this ideal situation is impossible because there is a high degree of heterogeneity in the urban forest. This fact can explain the differences between the sampling and census values.

A street tree population is complex, with many urban factors interacting to influence the frequency and diversity of that population, thus making it difficult to choose an appropriate inventory method for all situations. Each case is unique, but, in several studies in Brazil, such as that of Rachid and Couto (1999), the stratified random sampling was no better than simple random sampling. In our case, simple random sampling was an advantageous technique because stratified random sampling is a more complex method.

## CONCLUSIONS

The simple random sampling of the district was more appropriate for estimating the total number of trees and the number of trees per kilometer of sidewalk than the stratified random sampling. However, the large coefficient of variation showed that the sample was not uniformly distributed across the district. We concluded that systematic sampling would be more appropriate than simple random sampling.

The stratified random sampling did not represent the sampled universe in a reliable way. There is a need to establish other criteria for grouping blocks into strata that differ from those used in this work. Stratification may not be a suitable method for urban forest analysis because such sampling cannot express the multiplicity of variables and random factors that determine the presence of trees in a city.

- © 2005, International Society of Arboriculture. All rights reserved.