4th International Conference on Integrating GIS and Environmental Modeling (GISEM4):
Problems, Prospects and Research Needs. Banff, Alberta, Canada, September 2 - 8, 2000.


Paradigm shifts in theory and methods:

regression quantile analysis enables new insights for ecology

GIS/EM4 No. 118

Sandra L. Haire

Carl E. Bock

Brian S. Cade

Abstract

Ecological systems are heterogeneous across space and time, and variation in observations therefore holds important information. Analysis methods based on central tendency collapse the differences in individual and species responses that create diverse ecological patterns. The inconsistency between the concept of heterogeneity in ecology and commonly used statistical methods created the need to draw upon ideas developed within the larger scientific community. This community includes other disciplines, technological research and advancements, and perspectives brought to science by individuals with diverse worldviews. We examine the advantages of regression quantiles, a statistical method new to ecology, for understanding variation in species abundance and its relationship to landscape heterogeneity in the Boulder Valley, Colorado USA. The use of regression quantiles to model rates of change across data distributions enabled estimation of the limitations imposed on species abundance by landscape characteristics. Inclusion of location variables and distance measurements allowed interpretation of changes in animal abundance in a spatial context. Regression quantiles is one of many ideas and methods that originated from another discipline and subsequently benefited ecological study. Expanding forums for communication that encourage exchange of ideas from a wide range of perspectives will improve our ability as scientists to shift theoretical and methodological paradigms when inconsistencies arise. The resulting innovations will increase the ability of science to be informative and useful in problem solving.

Keywords

Statistical methods, regression quantiles, grassland communities, urbanization, paradigm shifts, alternative philosophies.


Introduction

Even the simplest steps taken are imbued with meaning, assumptions, and epistemological construction that are often hidden to the researchers.  Recognizing this fundamental point can be very liberating, for it allows us to better understand what we can learn from the information we collect, the limits to our knowledge, and new ways of knowing, even for those methodological approaches we have been using for years. (Hodge 1995)

Statistical methods are central to many environmental modeling efforts, and ecology is among the most statistically oriented disciplines. Our hope is that science holds promise for meeting a myriad of challenges in today’s world. Examining our central theories and methodologies may be advantageous at a time when urban growth, climate change, disparate economies, and extinction of species demand our most creative endeavors. Continual questioning of assumptions, theories, and methods is a natural outcome of inconsistencies in observations and expectations.


There has been an historical progression from viewing natural systems as homogeneous in order to simplify research (Wiens 1995) to recognition of the importance of heterogeneity in understanding ecological processes (Kotliar 1996). Do traditional viewpoints and techniques serve our need to understand systems characterized by heterogeneity? The predominance of averages and the methods of least squares regression and maximum likelihood estimation in statistics suggests that understanding what is ‘normal’ is a panacea for questions in ecology. Recognizing the importance of understanding heterogeneity leads us, as scientists, to conclude that there is valuable information in the variation in the data, and that this variation is left unexamined by commonly used statistical techniques.


Understanding variation in species abundance and its relationship to landscape heterogeneity is a primary objective of our research in the Boulder Valley, Colorado USA. Meeting this objective required a shift in thinking away from commonly used statistical techniques that focus on measures of central tendency, to techniques that capture the variation intrinsic to ecology. Innovations to meet the need for change come from a community of scientists, which contains both the traditions of science, and the ability and willingness to change the theories and methods of science (Kuhn 1996). We define the scientific community as inclusive of all scientific disciplines, which bring unique perspectives developed from their distinctive histories and systems of study, and the technology that enables implementation of new techniques. Important sources of ideas and potential solutions to challenges in science can also be found in individuals with diverse backgrounds and perspectives, who bring unique worldviews, theories, and practices to the scientific community.


The implementation of statistical techniques developed by econometricians (Koenker and Bassett 1978, and other work by Roger Koenker) enabled us to meet our study objective. We examine this shift in thinking as it relates to our particular study, and to the practice of science in a larger sense. We first review the assumptions of linear regression and summarize the limits of our conclusions based on ordinary least squares and related methods. The development of regression quantiles, an extension of the usual univariate concept of quantiles to the linear model (Cade et al. 1999), and its applicability to, and advantages for, ecology are illustrated through examples from the Boulder study. In our examples, we use a land-cover database derived from satellite imagery, and field data describing plant communities and the abundance of birds and small mammals in grassland areas managed by Boulder Open Space (see Haire et al. 2000 for more detail, and Endnote 1 for small mammal methods). Regression quantile models for the prairie vole (Microtus ochrogaster), a small mammal species found in open, dry grasslands, and the lark sparrow (Chondestes grammacus), a songbird associated with upland grass types, illustrate the new insights that regression quantiles provided in an ecological application.


In the broader context, we review situations where ideas from other disciplines and technological advances have enabled scientists to respond to paradigmatic change. Scientists from diverse philosophical perspectives have further increased the ability of science to gain understanding and solve problems. We provide some specific examples that are meant to encourage a wider view of the scientific community in order to increase our potential for meeting new challenges in the future.

Broadening the focus of analysis

Statistics can bring order to, and reveal, the complexities of scientific observation. Linear regression is one of many statistical methods that focus on averages, and it represents a commonly used statistical approach. The ability to track the computation of least squares and to estimate functions based on means makes linear regression models very attractive (Koenker and Portnoy 1996). Originally, regression was developed for problems where relevant variables, their order of importance, and the function best representing relationships were known, and theorems in contemporary texts use these same assumptions to justify the use of regression (Freedman 1997). In an historical example consistent with the intended use, Gauss predicted the location of the asteroid Ceres in a regression on 6 variables describing its orbit (Diaconis 1998). Although some questions in ecology may gain insight from an analysis that is based on clear, predictable conditions, a more comprehensive view of ecological systems and their inherent variability is not possible using traditional regression analysis.

 

The implications of emphasizing averages, or central tendencies, in data analysis are apparent in the method of least squares. The chosen prediction line (estimates of intercept and slope) minimizes the sum of the squared errors of prediction for all sample points (Figure 1) (Ott 1992). The familiar model:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

includes the random error $\varepsilon$ that represents the difference between a measurement y and a point on the line, and takes into account all unpredictable and unknown factors that are not included in the model. The average of the error for a given value of x is 0, and a point on the line represents the average of y (or its expected value) for a fixed value of x. In ecology, variance represents measurement error, random error, and temporal and spatial differences between individuals, species, and environmental features (Wiens 1989). Given the wide range of possible biotic and abiotic factors that may influence ecological response, a great deal of information is likely contained within the error term, and the fitted line cannot adequately represent changes over space and time in an ecological system. We recognized the importance of describing varied response among the bird and mammal species in the Boulder Open Space, and were unable to effectively describe the diversity observed in the field using statistics based on mean functions.

Figure 1. The least squares regression line for a data set with a high degree of variability. Residual standard error = 11.26 (60 degrees of freedom).
y = 15.01445 - 0.02646519 (distance to riparian cover type) + $\varepsilon$
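As a minimal sketch of the least squares fit described above, the slope and intercept can be computed in closed form. The data values and the names `distance`, `abundance`, and `fit_least_squares` are hypothetical and chosen only for illustration; they are not the Boulder data.

```python
from statistics import mean


def fit_least_squares(x, y):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical, highly variable observations (not the Boulder data).
distance = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
abundance = [22.0, 3.0, 15.0, 1.0, 8.0, 0.0]

b0, b1 = fit_least_squares(distance, abundance)

# The residuals play the role of the error term; they average to zero,
# so the fitted line describes only the mean response at each x.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(distance, abundance)]
print(b0, b1, sum(residuals))
```

The single fitted line summarizes the average trend, while everything else about the spread of the observations is left in `residuals`.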



Several approaches were taken to gain an understanding of the relationships represented by the Boulder data, but the results, while enlightening to some degree, did not describe many of the patterns we considered important. Our analysis of nesting grassland species using ordinary least squares regression incorporated a term for spatial variation (Haire et al. 1998) but failed to describe differences in individual species' responses to landscape change. Multivariate analyses provided a picture of the bird community along gradients of urbanization and upland and lowland grassland habitats (Haire 1998); however, we did not gain the understanding of diversity that we sought from multivariate analysis. Principal components analysis and related techniques use weighted averaging to develop models that explain more of the variation in the data, while effectively condensing the initial variation in the data along resulting axes in ordination space (Wiens 1989). In a study of raptors, a threshold of 5% to 7% urbanization was identified for 4 diurnal species by examining patterns in scatter plots (Berry et al. 1998). Quantifying a threshold, or limiting, relationship is outside the scope of traditional analyses. The importance of understanding complexities in a grassland community facing the pressures of urbanization created the impetus for seeking a new approach.

Describing variation using regression quantiles

The application of regression quantiles to ecology represents a new technique with a long history of development and application in other disciplines. The history of regression quantiles extends back to the first attempts at regression, before the development of least squares techniques (Koenker and Portnoy 1996). Boscovich wanted to estimate the extent of the earth’s ellipticity using measures of arc-length taken at several locations. He attempted to estimate a fitted line by minimizing the sum of absolute errors, requiring that the fitted line pass through the centroid of the observations, thus estimating the slope parameter as a median. The use of the median, rather than the mean, to determine the regression line distinguishes regression quantiles from least squares regression at the most basic level. The median regression estimator was later extended to other quantiles (represented by the Greek letter tau) (Figure 2). The median, or 0.50th quantile, describes the center of the distribution such that 50% of the observations are less and 50% greater than the estimate; a 0.90th quantile is an estimate such that 90% of the observations are less and 10% greater than the estimate. Ranking and sorting of observations determine parameter estimates for a given univariate quantile, but in the regression setting optimization via linear programming has replaced ranking and sorting (Koenker and Portnoy 1996).
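For reference, the estimator described above is commonly written as the following minimization problem, following Koenker and Bassett (1978). The notation ($\rho_\tau$, $\hat{\beta}(\tau)$, $x_i$, $b$) is introduced here for illustration and may differ from the notation used in the Appendix.

```latex
\hat{\beta}(\tau) \;=\; \arg\min_{b} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top} b\right),
\qquad
\rho_\tau(u) \;=\; u\,\bigl(\tau - I(u < 0)\bigr),
\qquad 0 < \tau < 1 .
```

With $\tau = 0.5$ the loss reduces to $\rho_{0.5}(u) = |u|/2$, so the estimate minimizes the sum of absolute errors (median regression). Other values of $\tau$ weight positive and negative residuals unequally, by $\tau$ and $1 - \tau$ respectively.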

Figure 2. Regression lines for three values of $\tau$, using the same data as in Figure 1.
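The fitting of lines for several quantiles can be sketched as follows. This is not the linear programming routine referred to in the text: for a single predictor and a small data set, it relies on the fact that an optimal regression quantile line passes through at least two observations, and simply tries every pair. The function names and the wedge-shaped data (an upper limit that declines with distance, with responses at fixed fractions of that limit) are hypothetical.

```python
from itertools import combinations


def check_loss(u, tau):
    """Asymmetric absolute loss: tau * u if u >= 0, (tau - 1) * u if u < 0."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_regression_quantile(x, y, tau):
    """Return (intercept, slope) minimizing the summed check loss.

    Brute-force stand-in for linear programming: every line through two
    observations is a candidate, and the one with the smallest loss is kept.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # two points at the same x do not define a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - (intercept + slope * xk), tau)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: the ceiling on abundance declines with distance, and
# unmeasured factors hold observations at 20% to 100% of that ceiling.
distance = [0.0, 100.0, 200.0, 300.0, 400.0]
fractions = [k / 10 for k in range(2, 11)]
x, y = [], []
for d in distance:
    ceiling = 40.0 - 0.08 * d
    for f in fractions:
        x.append(d)
        y.append(ceiling * f)

slopes = {tau: fit_regression_quantile(x, y, tau)[1] for tau in (0.1, 0.5, 0.9)}
for tau, slope in slopes.items():
    print(tau, round(slope, 3))
```

For these constructed data the slopes become more negative from the 0.1 to the 0.9 quantile (about -0.016, -0.048, and -0.08), and the 0.9 quantile recovers the slope of the imposed ceiling.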



Rank scores for each set of observations, for any given quantile, can be used to test hypotheses and construct confidence intervals for regression quantiles in cases of homogeneous or heterogeneous error distributions (Koenker 1994, Cade et al. 1999). The test statistic is evaluated with a chi-square distribution; degrees of freedom are equal to the difference in number of parameters in alternative and hypothesized models. For the special case of 1 degree of freedom, the rank score test calculates a standardized test statistic that is referenced to the standard normal distribution to calculate probabilities under the null hypothesis. The rank score test can be inverted to compute confidence intervals. Model selection techniques (e.g., Akaike's Information Criterion [AICC]; Hurvich and Tsai 1990) can be applied as well. Methods for nonlinear functions are currently being developed to further extend the usefulness of regression quantiles. Mathematical details of regression quantile estimation, rank score tests, and calculation of confidence intervals are provided in the Appendix. Computer code for regression quantile functions is available in the Ecological Archives.


Limiting factor theory proposes that limits are imposed in a heterogeneous fashion across space and time (Cade et al. 1999). Limits can also be visualized in terms of constraints imposed by processes at one scale on those of another scale, in a hierarchical fashion.  For example, changes in slope in quantiles at the extremes of the data distribution may provide different information than changes at the center of the distribution. These changes hold useful information; interaction of unmeasured factors results in greater variation in the data distribution, and estimates at the extremes of the distribution help account for this interaction (Cade et al. 1999). When the measured independent variable (x) is constraining the measured response (y), changes in slope at upper quantiles are useful for estimating this relationship. 

Example 1: Prairie vole models

Models of change in prairie vole abundance led to insights about the relative importance of landscape characteristics for this species, and they provided examples of the types of information gained by using regression quantiles. The negative rate of change in vole numbers appears to be greater at higher quantiles when quantile regression lines are plotted for the prairie vole model that includes distance to riparian cover type; the slope of the 90th quantile is more negative than that of the 50th quantile (Figure 2). Slopes for each quantile (0.01 to 0.99) can be plotted on one graph to illustrate the magnitude and direction of change in slope for a particular model (Figure 3). Increasingly negative rates of change at upper quantiles (0.50 to 0.99) are consistent with a limiting factor effect on prairie vole abundance at some sample locations. This information was key to interpreting the importance of shrub or tree cover around streams and ditches, which provided a network of potential corridors for small mammals in upland grass areas. 

Figure 3. Slope values plotted across all quantile values. Upper and lower bounds of the 95% confidence intervals are shown in dashed lines.

The relationship between prairie vole abundance and landscape urbanization was of particular interest in our study, so we tested the partial effects of distance to riparian and urban landscape composition in a multiple regression analysis. Higher rates of change in abundance at higher quantiles were consistent with a limiting factor relationship for multiple regression models, which were compared using rank score tests and confidence intervals. P-values for the chi-square test statistics for the 50th to 90th quantiles ranged from 0.00007 to 0.023 (distance to riparian only) and from 0.000089 to 0.099 (distance to riparian, given urban composition). Confidence intervals were nearly identical for the model containing only distance to riparian (Figure 3) and the model with urban composition included. Urbanization decreases the area and diversity of grasslands, including riparian areas, and this indirect effect is seen in the regression quantile analysis. Looking at changes across data distributions gives a picture of dynamic processes that vary over space and time, rather than a collapsed view of overall propensity represented by measures of central tendency.

Example 2: Lark sparrow models

Comparing models across quantiles allowed a more comprehensive view of ecological process as represented by lark sparrow abundance in the Boulder Valley. Lark sparrow distribution in the Boulder study area was clumped and localized (Haire et al. 2000). Geographic location, or spatial structure, can be used as an indirect descriptor of various processes that have generated spatial structure, leading to hypotheses about unmeasured influences on phenomena of interest (Borcard and Legendre 1994). We examined three models to test whether including spatial location provided useful information about this species’ distribution and abundance. Models containing coordinate location, and local and landscape plant community descriptors, were compared with a one-parameter (intercept-only) model that is equivalent to conventional univariate quantiles (Figure 4). The location model contained only location coordinates, and the habitat model contained 2 variables: the plant community descriptor for shale types measured at the local scale, and the composition of upland grass cover types in the surrounding landscape. At higher quantiles, the model with location and habitat variables was most parsimonious (smallest AICC). In other applications, model comparisons at central or lower ends of the spectrum may have important implications for study questions. Examination of the differences in AICC across quantiles allows the opportunity to choose a particular quantile model or set of models that best describes relationships of interest.

Figure 4. Difference in AICC values for three lark sparrow models.
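The kind of comparison summarized in Figure 4 can be sketched as follows. The criterion below is an assumption, not necessarily the exact expression of Hurvich and Tsai (1990): it combines the log-likelihood term implied by an asymmetric double exponential error model, 2n ln(SAF/n), where SAF is the minimized sum of weighted absolute deviations at a given quantile, with a generic small-sample penalty. How the number of parameters p is counted is also an assumption, and the SAF values, sample size, and model labels are hypothetical.

```python
import math


def aicc_quantile(saf, n, p):
    """AICc-type criterion for one regression quantile fit (assumed form).

    saf: minimized sum of check-loss-weighted absolute deviations
    n:   number of observations
    p:   number of estimated parameters (counting convention is an assumption)
    """
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for the small-sample penalty")
    return 2.0 * n * math.log(saf / n) + 2.0 * p * n / (n - p - 1)


# Hypothetical fits at a single upper quantile: (SAF, number of parameters).
n = 62
fits = {
    "intercept only": (120.0, 1),
    "location": (110.0, 3),
    "habitat": (100.0, 3),
    "location + habitat": (90.0, 5),
}

aicc = {name: aicc_quantile(saf, n, p) for name, (saf, p) in fits.items()}
best = min(aicc.values())

# Differences from the smallest value; the most parsimonious model has 0.
delta = {name: value - best for name, value in aicc.items()}
for name, d in sorted(delta.items(), key=lambda item: item[1]):
    print(f"{name}: {d:.2f}")
```

Repeating the calculation for each quantile of interest gives a profile of differences across the data distribution, which is the quantity plotted in a figure of this kind.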


Discussion

Other Disciplines as Sources of New Ideas

When our observations contradict familiar paradigms, we need to find new ways to address change creatively. Advancements in technology, associated with computer science, electronics, and satellite technologies have filled this role in many cases. The development of computers made it possible to organize and analyze data efficiently. The availability of remotely sensed images allows an aerial perspective that has resulted in new questions and new methods of study. Software that incorporates spatial data structures has allowed our questions to extend to landscapes and ecosystems. From Ada Lovelace’s early conception of computer programming, an increasingly higher level of abstraction of concepts has enabled implementation of many new ideas. Particularly relevant to the regression quantile methods presented herein are the use of linear programming, and development of the simplex algorithm for optimization problems by George Dantzig in the 1940s (Hadley 1962).


In response to needed paradigm shifts, scientists have drawn upon statistical methods that originated in various disciplines. For example, spatial statistics, including analysis of spatial autocorrelation (e.g. Moran’s I), kriging, and trend surface analysis have roots in mining engineering and geology (Cressie 1991). The quantification of spatial relationships reflects the underlying focus on the importance of location especially in the field of geography. This focus has become increasingly influential in landscape ecology. Other statistical approaches to studying extreme values that were originally developed in geology, climatology, and engineering have recently been applied in ecology. These extensions are another example of the need to go beyond mean and variance in understanding extremes of biological interest, such as temperature, force, and lifespan (Gaines and Denny 1993). Econometricians were pioneers in the development of methods to examine limiting functions of various factors (e.g., cost, productivity) that led to implementation of regression quantiles and improvements in its application.

Diverse Worldviews, Theories, and Methods

Inclusion of alternative viewpoints in the scientific community will influence scientific practice (Longino 1996). An obvious benefit is the increased recognition of the connection between cultural values and theoretical and methodological approaches (Birke 1986, Hodge 1995). All concepts (potential solutions in a paradigm shift) must be in harmony with worldviews. For example, probability theory, with its notion of random events, developed in a society that had discarded the belief that gods directed all events (Lightner 1991).


Alternative descriptions of biological and social systems emerge from different worldviews. Women scientists in primatology brought different perspectives to observation of social interactions at a time when dominance hierarchies with male leadership were thought to be universal among primate species (Rosser 1992). Barbara McClintock’s worldview led her to propose theories that were in sharp contrast with those of her contemporaries in molecular biology (Magada-Ward 1999). Her belief that exceptions or anomalies have meaning in and of themselves led her to discover the ability of normal DNA to rearrange itself in response to environmental pressures.


People of the First Nations in North America hold a view of ecology that differs from Western scientific understanding in many respects (Pierotti and Wildcat 1997). The language used in Western ecology reflects divergent philosophies; terms such as “natural resources” convey a value system based on the idea of commodities, rather than places with spiritual and material importance (Kimmerer 1998). Observations by Native peoples have contributed to understanding relationships in nature (Pierotti and Wildcat 1997). The view that badgers and coyotes cooperated in hunting efforts was long recognized in American Indian stories, while competition theory precluded this understanding in Western ecology. Stories of friendship between wolves and ravens are one example of the greater knowledge of Native people concerning the behavior and ecology of wolves, now recognized to be valuable in understanding these animals (Pierotti and Wildcat 1997).

Conclusion

Changes in thought and practice occur when there is conflict between theory, observation, and method. Our need to understand variability in ecology can lead to considering and applying new techniques. The tension between theory that emphasizes heterogeneity and analysis of central tendency was resolved by a new technique that highlights the complexities of ecological process. Regression quantiles may be the first of many new methods that represent a shift in emphasis toward understanding heterogeneous, dynamic systems.


Regression quantiles are particularly applicable to ecological data because results can be interpreted in the context of ecological theory. Limiting factor theory suggests that when a factor required for biological process is limiting, its abundance determines the magnitude of biological response at particular places and times (Kaiser et al. 1994, Cade et al. 1999). We demonstrated that modeling relationships throughout data distributions is a useful approach, and that focusing on changes along the edge of distributions may capture limiting relationships and account for the interactive effects of unmeasured factors. The development of regression quantiles by econometricians, and the extensive body of useful ideas and techniques brought to ecology from other disciplines, demonstrate the importance of communication in the scientific community. Dialogue at conferences, on-line, and through research that integrates scientific disciplines facilitates exchange of information needs and resources for solutions.


Another valuable resource for ideas can be found in diverse perspectives brought to science by individuals with different worldviews. It is interesting to imagine the approaches that would result from more radical epistemologies, including belief in connectedness of nature and conception of the world as a whole (Rosser 1992). How would the idea of web, or spiral models, rather than hierarchical models, be useful in ecological study? What types of information would be gained using cyclical models of time as an alternative to linear conceptions of time? A scientific community that remains open to criticisms of evidence, methods, assumptions, and reasoning will hold greater potential for gaining information that contributes to problem solving in the future. Creating forums for presenting scientific research outside of traditional paradigms would broaden the scope and influence of science as a comprehensive system of thought and practice.

Acknowledgements

We thank J. Carpenter, M. Hippard, N. Kotliar, B.L. Lamb, C. Melcher, and C. Miller for insightful comments on an earlier draft, and L. Everette for web formatting. City of Boulder Open Space provided imagery and access to properties. Research support by US Geological Survey, University of Colorado, and Colorado State University is gratefully acknowledged.

Endnotes

  1. Methods for quantifying small mammal abundance in the Boulder study have not been published. A 25 x 25-m grid of 25 Sherman live-traps was set out for one week at the center of 62 plots in grassland areas between late May and mid-August, in each of three summers (1994-1996). Number of different individuals for each species was counted for a total of 100 trap-nights per plot per summer.

Literature Cited

Berry ME, Bock CE, Haire SL. 1998. Abundance of diurnal raptors on open space grasslands in an urbanized landscape. The Condor 100:601-608.

Birke L. 1986. Women, feminism, and biology: The feminist challenge. Methuen, New York.

Borcard D, Legendre P. 1994. Environmental control and spatial structure in ecological communities: an example using oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics 1:37-61.

Cade BS, Terrell JW, Schroeder RL. 1999. Estimating effects of limiting factors with regression quantiles. Ecology 80(1):311-323.

Cressie NAC. 1991. Statistics for spatial data. J. Wiley, New York. 900 p.

Diaconis P. 1998. A place for philosophy? The rise of modeling in statistical science. Quarterly of Applied Mathematics LVI(4):797-805.

Freedman D. 1997. From association to causation via regression. Advances in Applied Mathematics 18:59-110.

Gaines SD, Denny MW. 1993. The largest, smallest, highest, lowest, longest, and shortest: extremes in ecology. Ecology 74(6):1677-1692.

Hadley G. 1962. Linear programming. Addison-Wesley Publishing Company. Reading, Massachusetts. 520 p.

Haire SL. 1998. Spatial factors influencing bird distribution in grasslands near Boulder, Colorado. MS Thesis, Colorado State University, Fort Collins. 127 p.

Haire SL, Dean DJ, Bock CE. 1998. Incorporation of spatial dependence in models of bird community response to landscape pattern. Pages 149-163 in H J-H. Whiffen and W C. Hubbard, Editors. SOFOR GIS ’98. Proceedings of the 2nd Southern Forestry GIS Conference. October 28-29, 1998, Athens, Georgia.

Haire SL, Bock CE, Cade BS, Bennett BC. 2000. The role of landscape and habitat characteristics in limiting abundance of grassland nesting songbirds in an urban open space. Landscape and Urban Planning 48(1-2): 65-82.

Hodge DC. 1995. Should women count? The role of quantitative methodology in feminist geographic research. Professional Geographer 47(4):426.

Hurvich CM, Tsai C-L. 1990. Model selection for least absolute deviations regression in small samples. Statistics and Probability Letters 9:259-265.

Kaiser MS, Speckman PL, Jones JR. 1994. Statistical models for limiting nutrient relations in inland waters. Journal of the American Statistical Association 89(426): 410-423.

Kimmerer R. 1998. Bringing the Native perspective into natural resources education. Winds of Change 13(3): 14-18.

Koenker R. 1994. Confidence intervals for regression quantiles. Pages 349-359 in P. Mandl and M. Hušková, Editors. Asymptotic statistics: Proceedings of the Fifth Prague Symposium. Physica-Verlag, Heidelberg, Germany.

Koenker R, Bassett G. 1978. Regression quantiles. Econometrica 46: 33-50.

Koenker R, Portnoy S. 1996. Quantile regression. Office of Research Working Paper Number 97-0100. University of Illinois at Urbana-Champaign, College of Commerce and Business Administration Office of Research.

Kotliar NB. 1996. Scale dependency and the expression of hierarchical structure in Delphinium patches. Vegetatio 127:117-128.

Kuhn TS. 1996. The structure of scientific revolutions, 3rd Edition. University of Chicago Press, Chicago, Illinois.

Lightner JE. 1991. A brief look at the history of probability and statistics. The Mathematics Teacher 84(8): 623-630.

Longino HE. 1996. Subjects, power, and knowledge: description and prescription in feminist philosophies of science. Chapter 17 in Keller, EF. and HE. Longino, Editors. Feminism and Science. Oxford University Press, Oxford, New York.

Magada-Ward M. 1999. Rescuing Keller by abducting her: toward a pragmaticist feminist philosophy of science. Journal of Speculative Philosophy 13(1): 19-38.

Ott RL. 1992. An introduction to statistical methods and data analysis. Duxbury Press, Belmont, California.  1051 p. + Appendices.

Pierotti R, Wildcat DR. 1997. The science of ecology and Native American tradition. Winds of Change (Autumn): 94-97.

Rosser SV. 1992. Are there feminist methodologies appropriate for the natural sciences and do they make a difference? Women’s Studies International Forum 15(5/6): 535-550.

Wiens JA. 1989.  The ecology of bird communities.  Vol. 2. Processes and variations. Cambridge University Press.  316 p.

Wiens JA. 1995. Landscape mosaics and ecological theory. Chapter 1 in Mosaic landscapes and ecological processes. Hansson L., L. Fahrig, and G. Merriam, Editors. Chapman and Hall, London.


Authors

Sandra L. Haire, Wildlife Biologist
US Geological Survey, 4512 McMurry Ave. Fort Collins, Colorado 80525 USA
e-mail: sandy_haire@usgs.gov, Tel: 970-226-9367, Fax: 970-226-9230


Carl E. Bock, Professor of Biology
University of Colorado, Boulder, Colorado 80309
e-mail: bockc@colorado.edu, Tel: 303-492-7184, Fax: 303-492-8699


Brian S. Cade, Biostatistician
US Geological Survey, 4512 McMurry Ave. Fort Collins, Colorado 80525 USA
e-mail: brian_cade@usgs.gov, Tel: 970-226-9326, Fax: 970-226-9230