Abstract
A regression-based econometric model was generated from a statewide survey of South Carolina, U.S., residents concerning participation in urban and community forestry programs. The econometric model attempts to estimate the probability of an individual’s participation. Results are intended to increase effectiveness of program planning and organization within state forestry commissions. Model 1 was created as follows: participation = F (gender, age, education, marital status, region, area raised, area reside, household, duties, and income). Because these responses represented qualitative values, a number of dummy variables (0 or 1, for example, for yes or no) were generated to more accurately reflect the values for participation and a logit model was used. Logit regression analysis produces a value between 0 and 1 that can be interpreted as a probability. Model 2, with fewer variables, was later created to reduce possible multicollinearity problems. Model 1 had a pseudo-R2 value of 0.2955 or a 29.55% probability of having a correct prediction for participation. Model 2 had a pseudo-R2 value of 0.2407. The models produced reasonable predictions of participation.
What factors influence participation in urban and community forestry (U&CF) programs? Which participant characteristics are most predictive of participation levels? How likely is a specific forest owner to participate in the program? Econometrics is a tool that helps answer these questions. Econometrics is “the application of statistical and mathematical methods to the analysis of economic data, with a purpose of giving empirical content to economic theories and verifying them or refuting them” (Maddala 2001). We used econometric methods in this study to assist in U&CF program planning and to aid in better identifying the factors that affect participation in the program.
At the turn of the century, over three-fourths of U.S. residents lived in urban areas (Alig et al. 1999; U.S. Department of Commerce 2000) and the urban forest has had a significant impact on their quality of life (Alig et al. 2003). Congress realized this when it amended the Cooperative Forestry Assistance Act of 1978 to authorize financial, technical, and related assistance to state foresters in support of cooperative efforts in U&CF (Cubbage et al. 1993).
Between 1960 and 1997, the nation’s urban area increased from 10.2 to 26.7 million ha (25–66 million ac) (Vesterby and Krupa 2001). Over the 48 contiguous states, in 1992, less than 3% of land area was urban and less than 5% of the land area was considered developed (Heimlich and Anderson 2001). Urban land area in 1997 varied from 10% in the Northeast to 1% in the Mountain Region (Vesterby and Krupa 2001). Urbanization has been tied to population growth, and by 2050, another 16.2 million ha (40 million ac) is expected to be converted into urban and other development uses (Alig et al. 2003). South Carolina followed this national trend (London and Hill 2000). This increased urbanization increased the importance of U&CF programs. Knowledge of the characteristics of people who participate and who do not participate in these programs should allow planners to target an audience for participation.
Assistance from U&CF programs involves U&CF planning, recreational development, air and water quality improvement programs, stormwater management, urban wildlife management, and economic, urban development, and conservation management plans. Within the United States, typical program recipients are local governments, policymakers and elected officials, builders and developers, civic and community groups, neighborhood associations, nonprofit groups, local businesses, and urban forest councils (Urban Forestry South Expo 2005).
An important aspect of U&CF programs is public involvement (SC Forestry Commission 2005). Citizen participation has been shown to be essential to U&CF program success (Cole 1979; Henderson 1984). With tight budgets and other constraints, volunteerism and public participation are key determinants to program success (Bloniarz and Ryan 1996; Sommer 1996). The support of nontraditional audiences is considered crucial to enhancing these programs (Iles 1998) and increased volunteers provide different skills, new ideas, and more effective outreach (Westphal and Childs 1994). For programs like Tree City USA expanded participation is seen as necessary to counter lagging fiscal support (Andresen 1989).
To ensure public participation, one must first establish who participates and who should be targeted when these programs are promoted. The purpose of this study was to provide insight on continued public participation within the U&CF programs. The econometric model created in this study will show the likelihood of participation in these programs for individuals based on personal characteristics. The data used in the model were described and analyzed by Straka et al. (2005). We used the same data to develop a predictive model that will help identify factors that impact participation and aid in projecting individual forest owner participation.
Wall et al. (2006) described a similar econometric study. They also attempted to identify factors that led to U&CF program participation. That study used data from 42 of the states to quantify participation; we used data from a survey of South Carolina residents to attempt to do the same thing. Wall’s study attempted to identify variables that impacted participation, whereas this study produced a probability of participation.
STUDY METHODS
In the fall of 2003, a survey was mailed to 324 South Carolina residents to identify characteristics of participants and nonparticipants in U&CF programs and their attitudes toward the programs (Straka et al. 2005). Past participants were randomly selected from South Carolina Forestry Commission records, whereas nonparticipants were randomly selected from occupational groups that would be expected to exhibit equal interest in U&CF programs. The information on the 192 surveys returned was used to generate the econometric model. This is a 59% response rate; participants were 56% of the respondents.
Econometrics involves the specification of a regression analysis model that forecasts or explains behavior. We developed an econometric or regression model to predict participation in U&CF programs. Specific questions answered by both participants and nonparticipants were used to create the independent and dependent variables. “Regression analysis is concerned with describing and evaluating the relationship between a given variable (often called the explained or dependent variable, in our case participation) and one or more other variables (often called the explanatory variables or independent variable)” (Maddala 2001). The responses to each question were placed in a Microsoft Excel document then imported into SAS 9.0 (Statistical Analysis System for Windows) to create the regression model (SAS Institute 2002).
The primary task was formulating a model to describe the dependent variable, participation. The standard regression model using ordinary least squares could not be used because the dependent variable was nonnumeric, that is, questions were answered by responses like “yes” or “no” or “male” or “female.” A linear probability model was first considered for the analysis with a dichotomous dependent variable, that is, the participation variable would take on a value of 1 or 0, yes or no, respectively (Maddala 2001). Participation would be an indicator variable that shows the incidence of an event or whether the person participated in the program, and we would have some independent variables that determine the likelihood of participation (Maddala 2001). The qualitative nature of the dependent variable proved inappropriate for the linear probability model.
A logit model was used instead. The nonnumeric responses were accounted for by transforming the qualitative values into numeric values (0 or 1). This is achieved by creating dummy variables for each of the independent variables.
Dummy variables were created to define each independent variable (Table 1). For the independent variable “age,” three dummy variables were created. The question was “What is your age?” The possible responses were: a) under 30 years old, b) 30 to 49 years old, c) 50 to 65 years old, or d) 66 years old or older. Of these four answers, three were chosen to become dummy variables. One answer was omitted because its effect can be seen in the model’s intercept. This approach was used throughout the model. The three answers retained for age were a, b, and d. They were defined as age1, age2, and age4.
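The coding scheme for the age question can be sketched in a few lines of Python. This is only an illustration of the approach described above, not the SAS code used in the study; the function name and the dictionary layout are assumptions, while the response letters, the omitted category (c), and the names age1, age2, and age4 come from the text.

```python
# Sketch of the dummy coding described for the "age" question.
# Response letters and the omitted category "c" follow the text;
# the function name and data layout are illustrative assumptions.

AGE_DUMMIES = {"a": "age1", "b": "age2", "d": "age4"}  # "c" (50 to 65) is omitted


def encode_age(response):
    """Return the 0/1 dummy values for one survey response (a, b, c, or d)."""
    if response not in ("a", "b", "c", "d"):
        raise ValueError(f"unexpected response: {response!r}")
    return {name: int(response == letter) for letter, name in AGE_DUMMIES.items()}


print(encode_age("b"))  # {'age1': 0, 'age2': 1, 'age4': 0}
print(encode_age("c"))  # {'age1': 0, 'age2': 0, 'age4': 0}
```

A respondent in the omitted category receives zeros for all three dummies, so that category is absorbed by the intercept.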
The logit regression analysis was computed using the SAS 9.0 system. This type of regression returns a numeric value between 0 and 1 (which can be interpreted as a probability or percent) that describes how likely a certain individual (based on characteristics such as gender, age, and education level) will be to participate in U&CF programs. Once the value for participation of an individual is computed, if it is less than 0.50, we predicted that individual is not likely to participate in U&CF programs. Likewise, a value greater than or equal to 0.50 indicated that the individual is likely to participate in U&CF programs. Another way to interpret the participation value is to consider it a probability. If the value is 0.85, we predicted the individual will likely participate, but one can also say the individual is 85% likely to participate in U&CF programs.
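The decision rule described above is simple enough to state as code. The following is a minimal sketch, assuming the probability has already been computed; the function name is hypothetical and the 0.50 cutoff is the one given in the text.

```python
def predict_participation(probability, threshold=0.50):
    """Apply the cutoff rule: values at or above the threshold predict participation."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return probability >= threshold


print(predict_participation(0.85))  # True
print(predict_participation(0.49))  # False
```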
RESULTS AND DISCUSSION
The logistic regression completed in SAS 9.0 yielded the following model (model 1) for participation:
The corresponding β values can be found in Table 2. The calculation of the probability for participation in U&CF programs is best illustrated with an example. Consider a female, age 35, with a graduate degree, married with two children under 18 years old, living in a rural nonfarm area in the upstate, who lived as a child in a suburb on the Lower Coastal Plain, and is a forestry consultant with an annual household income of $150,000.
Variables that match the individual, like Age2 (because she is 35), become “1’s” and the other variables become “0’s.” If 1’s are plugged into the appropriate areas of the model, the equation becomes participation = 0.721275 + 0.782774 (Gender = 1) + 0.245537 (Age2 = 1) + 0.327203 (Education6 = 1) + 0.686155 (Region1 = 1) + 0.110472 (AreaReside1 = 1) – 0.043248 (Duties3 = 1) – 0.174308 (Income3 = 1).
Participation equals 2.65586. In a logit model, the result must be transformed to equal a probability. In this case, the transformation is exp(2.65586)/(1 + exp(2.65586)) = 0.93437125, which indicates that there is a 93.44% chance she will participate in U&CF programs. Since 0.9344 > 0.5, we predict that she is likely to participate in U&CF programs.
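The arithmetic of this worked example can be checked with a short Python sketch. The coefficients are the ones quoted in the example; the dictionary keys are labels only, and the function name is an assumption introduced for illustration.

```python
import math

# Intercept plus the coefficients of the dummy variables that equal 1
# for the individual in the worked example (values quoted in the text).
terms = {
    "intercept": 0.721275,
    "Gender": 0.782774,
    "Age2": 0.245537,
    "Education6": 0.327203,
    "Region1": 0.686155,
    "AreaReside1": 0.110472,
    "Duties3": -0.043248,
    "Income3": -0.174308,
}


def logistic(z):
    """Transform a logit value into a probability between 0 and 1."""
    return math.exp(z) / (1.0 + math.exp(z))


z = sum(terms.values())  # value of the linear part of the model
p = logistic(z)          # probability of participation

print(round(z, 5))  # 2.65586
print(round(p, 4))  # 0.9344
```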
A likelihood ratio test, based on the χ2 distribution, was used to determine if the model was significant. The likelihood ratio value for the entire model was 43.698. The model proved to be useful at the 10% significance level because the calculated value of 43.698 is less than the χ2 tabulated value of 43.75 with 33 degrees of freedom.
The next step in interpreting the regression results involved the significance of the individual parameter estimates for the independent variables (Table 3). Significance levels were used to determine if the parameter estimates are significantly different from zero. “It is customary to use 0.05 as a low probability and to reject the suggested hypothesis if the probability of obtaining as extreme a t-value as the observed t0 is less than 0.05” (Maddala 2001). In our case, the suggested hypothesis (Ho: βn = 0) is rejected if the approximate probability is less than 0.05; otherwise, we fail to reject it. There were only four variables that were significantly different from 0 for the full model.
This can be misleading when interpreting the results; all of the independent variables are dummy variables, which gives them a value of either 0 or 1. Because these variables only correspond with 10 questions from the survey, there was a high probability that a particular variable would receive more 0’s than 1’s with a sample size of 192. Zeroes indicated that a particular variable was not a characteristic of an individual and 1’s indicated that the variable (characteristic) did apply. This would suggest why so many parameter estimates appeared to be equal to 0. Specific groups of individual independent variables were also examined using the likelihood ratio method to determine how different they were from 0, but these tests were deemed inconclusive to the model for the same reasons described for the individual variables.
There are three pitfalls of econometric models. Each is a potential problem for any regression model, but all are more likely to occur when economic or social data are used in the regression model. One potential problem involves unequal variance in the disturbance terms but was unlikely to occur in our model. A second potential problem is autocorrelation associated with time-series data. Because our data were not from a time series, we did not expect this problem.
A third potential problem is multicollinearity caused by highly correlated independent variables. It can cause large standard errors and can make individual correlated variables appear to have weak impacts when, as a group, they have a strong impact (Allison 1999). This problem was possible in our model and we evaluated the problem examining the independent variables (Table 4). There were three pairs of independent variables that were highly correlated: Education1 (elementary school) and MaritalStat4 (widowed), MaritalStat1 and Household5 (living alone), and Region3 (lower coastal) and Region1 (upstate). These three combinations were all correlated higher than 50%. Each individual variable was examined to determine how it might be affecting the model.
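A pairwise screen of this kind can be sketched in Python. The paper does not say how the correlations in Table 4 were computed, so the use of the Pearson coefficient here is an assumption, as are the function names. The eight respondents below are made-up data for illustration only and are not taken from the survey.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def flag_pairs(columns, cutoff=0.50):
    """List variable pairs whose absolute correlation exceeds the cutoff."""
    names = list(columns)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) > cutoff:
                flagged.append((first, second, round(r, 3)))
    return flagged


# Hypothetical 0/1 responses for eight respondents (not survey data).
columns = {
    "MaritalStat1": [1, 1, 0, 0, 1, 0, 0, 0],
    "Household5": [1, 1, 0, 0, 1, 0, 0, 1],
    "Gender": [1, 0, 1, 0, 1, 0, 1, 0],
}

print(flag_pairs(columns))  # [('MaritalStat1', 'Household5', 0.775)]
```

The 0.50 cutoff mirrors the "correlated higher than 50%" criterion in the text; pairwise correlations will not catch dependence among three or more variables.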
Education1 and all marital status variables were likely sources of multicollinearity and these variables were removed from the original full model to create a second model (model 2) with better explanatory power. The third set of variables that were highly correlated was Region1 and Region3. These variables are in the same category, which would indicate that they should be correlated. Because these variables can never interact, they were retained in the model. Age1 (under 30) and Duties4 (educator) proved to be significant at the 5% level indicating that these variables have a large effect on participation.
The next step in defining the model dealt with goodness of fit as measured by R2. Because the model in this study is logistic, the normal R2 value cannot be used. We used pseudo-R2 measures (Maddala 2001). The Cragg–Uhler R2 is an appropriate pseudo-R2 formula for a logistic model with 0 to 1 values that assesses a proportion of correct predictions (Table 5). These values are not especially high, suggesting that the model was not properly specified and/or other variables not included on the survey were important.
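The paper does not reproduce the Cragg–Uhler formula. A commonly cited form rescales the likelihood-based R2 so that its maximum is 1, and it can be sketched as below. The function name is an assumption, and the log-likelihood values in the usage line are hypothetical, since the fitted log-likelihoods are not reported in the text.

```python
import math


def cragg_uhler_r2(loglik_null, loglik_model, n):
    """Cragg-Uhler pseudo-R2 from the log-likelihoods of the intercept-only
    and fitted models and the number of observations n."""
    # Likelihood-based R2: 1 - (L_null / L_model)^(2/n)
    unscaled = 1.0 - math.exp(2.0 * (loglik_null - loglik_model) / n)
    # Largest value the unscaled measure can reach: 1 - L_null^(2/n)
    upper_bound = 1.0 - math.exp(2.0 * loglik_null / n)
    return unscaled / upper_bound


# Hypothetical log-likelihoods, for illustration only.
print(cragg_uhler_r2(loglik_null=-130.0, loglik_model=-110.0, n=192))
```

The measure is 0 when the fitted model is no better than the intercept-only model and 1 when the fitted model reproduces the data exactly.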
A modified regression model was run without the previously mentioned independent variables (model 2) to evaluate the impact of excluding the variables potentially causing multicollinearity problems (Table 6).
The likelihood ratio test, which uses a χ2 distribution, was used to determine if model 2 was significant. The likelihood ratio value for model 2 was 34.712. The model proved to be useful at the 10% significance level because the calculated value of 34.712 is less than the χ2 tabulated value of 37.92 with 28 degrees of freedom. Model 2 may prove more useful than the full model in estimating participation as a result of the problem of multicollinearity being reduced.
The calculation of the probability for participation in U&CF programs using model 2 was:
If model 2 is applied to the same individual that was used in the earlier example, the parameters result in all 1’s being plugged into the model and participation equals 2.055974. This converts to a probability of 0.88654987, which indicates that there is an 88.65% chance she will participate in U&CF programs. Because 0.8865 > 0.5, this leads to the conclusion that she is very likely to be participating in U&CF programs.
The predicted probability of participation was about 4.8 percentage points lower with model 2 (88.65% versus 93.44%). Both models may be very successful in determining the likelihood of participation in U&CF programs.
Education2 (high school) and Duties4 (educator) proved to be significant at the 5% level indicating that these variables have a large effect on participation. U&CF program planners should pay close attention to the characteristics defined by the previously mentioned variables when targeting individuals for participation.
Our results are consistent with other research on factors affecting participation in volunteer organizations. All the variables identified in the final model are considered primary determinants of participation (Natural Resources Conservation Service 2004). Other studies that discuss determinants of participation consistently use the type of variables in the models discussed here (Smith 1994). Pseudo R2 values are not high. Analysts using biologic or physical data would generally be unhappy with these results. However, for social data of this type and the logit model formulation, these R2 levels are usually considered acceptable (Maddala 2001). We were satisfied that these results are significant and do illustrate valuable explanatory relationships that can be used to estimate participation levels.
Note that we were limited to data included in the 2003 survey. This model is merely a starting point in establishing factors that affect participation. Additional data will surely strengthen the model. Our main contribution is showing that this technique can be used effectively to estimate participation and we provide a starting point for a more detailed study.
Can a model like this be used in day-to-day work of the U&CF professional? Yes, it does provide valuable information. Notice in our prior example of the 35-year-old female forestry consultant that we determined the likelihood of her participation. The variables in the model interact and a simple table of likelihoods by characteristic would be too complex to be usable. However, other variables can be held constant and changes in variables like income level or age can be evaluated. The model certainly can be used to estimate likelihood of participation for any individual and would show the program planner where to best spend his or her time.
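Holding other variables constant while changing one can be sketched as follows, using the model 1 coefficients quoted in the earlier example. The comparison switches Income3 from 1 to 0, which places the same individual in the omitted income category; which income bracket that category represents is not stated here, and the function name is an assumption.

```python
import math

# Model 1 terms for the example individual (values quoted in the text).
base_terms = {
    "intercept": 0.721275,
    "Gender": 0.782774,
    "Age2": 0.245537,
    "Education6": 0.327203,
    "Region1": 0.686155,
    "AreaReside1": 0.110472,
    "Duties3": -0.043248,
    "Income3": -0.174308,
}


def probability(terms):
    """Sum the active terms and apply the logit transformation."""
    z = sum(terms.values())
    return math.exp(z) / (1.0 + math.exp(z))


# Same individual, but with Income3 = 0 (the omitted income category).
alt_terms = {name: value for name, value in base_terms.items() if name != "Income3"}

p_base = probability(base_terms)
p_alt = probability(alt_terms)
print(round(p_base, 3), round(p_alt, 3))  # 0.934 0.944
```

Because the transformation is nonlinear, the size of such a change depends on the values of the other variables, which is why a single table of likelihoods by characteristic is not practical.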
CONCLUSION
The purpose of this study was to provide insight into participation within U&CF programs. A logistic regression model was used with independent variables being qualitative. Two econometric models were evaluated—one using all the available independent variables (model 1) and the other omitting certain variables (model 2). The pseudo-R2 values were not especially high, but they suggest a level of predictability. These low values could mean the model was not properly specified or that relevant variables were omitted. For an econometric study of this type, these are acceptable R2 values.
The two models proved to be significant (at the 10% level) in the prediction of participation. Model 2 may prove more useful than the full model in estimating participation as a result of the problem of multicollinearity being reduced.
Acknowledgments
Support for this research was provided by a USDA Forest Service Urban and Community Forestry Assistance Grant awarded through the South Carolina Forestry Commission.
- © 2006, International Society of Arboriculture. All rights reserved.