4th International Conference on Integrating GIS and Environmental Modeling (GISEM4):
Problems, Prospect and Research Needs. Banff, Alberta, Canada, September 2 - 8, 2000.
Paradigm shifts in theory and methods:
regression quantile analysis enables new insights for ecology
GIS/EM4 No. 118
Sandra L. Haire
Carl E. Bock
Brian S. Cade
Abstract
Ecological systems are heterogeneous across space and time, and variation in observations therefore holds important information. Analysis methods based on central tendency collapse the differences in individual and species responses that create diverse ecological patterns. The inconsistency between the concept of heterogeneity in ecology and commonly used statistical methods created the need to draw upon ideas developed within the larger scientific community. This community includes other disciplines, technological research and advancements, and perspectives brought to science by individuals with diverse worldviews. We examine the advantages of regression quantiles, a statistical method new to ecology, for understanding variation in species abundance and its relationship to landscape heterogeneity in the Boulder Valley, Colorado USA. The use of regression quantiles to model rates of change across data distributions enabled estimation of the limitations imposed on species abundance by landscape characteristics. Inclusion of location variables and distance measurements allowed interpretation of changes in animal abundance in a spatial context. Regression quantile analysis is one of many ideas and methods that originated in another discipline and subsequently benefited ecological study. Expanding forums for communication that encourage exchange of ideas from a wide range of perspectives will improve our ability as scientists to shift theoretical and methodological paradigms when inconsistencies arise. The resulting innovations will increase the ability of science to be informative and useful in problem solving.
Keywords
Statistical methods, regression quantiles, grassland
communities, urbanization, paradigm shifts, alternative philosophies.
Introduction
Even the simplest steps taken are imbued with meaning, assumptions, and epistemological construction that are often hidden to the researchers. Recognizing this fundamental point can be very liberating, for it allows us to better understand what we can learn from the information we collect, the limits to our knowledge, and new ways of knowing, even for those methodological approaches we have been using for years. (Hodge 1995)
Statistical methods are central to many environmental modeling efforts, and ecology is among the most statistically oriented disciplines. Our hope is that science holds promise for meeting a myriad of challenges in today’s world. Examining our central theories and methodologies may be advantageous at a time when urban growth, climate change, disparate economies, and extinction of species demand our most creative endeavors. Continual questioning of assumptions, theories, and methods is a natural outcome of inconsistencies in observations and expectations.
There has been an historical progression from
viewing natural systems as homogeneous in order to simplify research (Wiens
1995) to recognition of the importance of heterogeneity in understanding
ecological processes (Kotliar 1996). Do traditional viewpoints and techniques serve our need to understand
systems characterized by heterogeneity? The
predominance of averages and the methods of least squares regression and maximum
likelihood estimation in statistics suggests that understanding what is
‘normal’ is a panacea for questions in ecology. Recognizing the importance of understanding heterogeneity leads us, as
scientists, to conclude that there is valuable information in the variation in
the data, variation that commonly used statistical techniques leave unexamined.
Understanding variation in
species abundance and its relationship to landscape heterogeneity is a primary
objective of our research in the Boulder Valley, Colorado USA. Meeting this objective required a shift in thinking away from commonly
used statistical techniques that focus on measures of central tendency, to
techniques that capture the variation intrinsic to ecology. Innovations to meet the need for change come from a community of
scientists that holds both the traditions of science and the ability and
willingness to change the theories and methods of science (Kuhn 1996). We define the scientific community as inclusive of all scientific
disciplines, which bring unique perspectives developed from their distinctive
histories and systems of study, and the technology that enables implementation
of new techniques. Important
sources of ideas and potential solutions to challenges in science can also be
found in individuals with diverse backgrounds and perspectives, who bring unique
worldviews, theories, and practices to the scientific community.
The implementation of
statistical techniques developed by econometricians (Koenker and Bassett 1978,
and other work by Roger
Koenker) enabled us to meet our study objective, and we
examine this shift in thinking as it relates to our particular study, and to the
practice of science in a larger sense. We
first review the assumptions of linear regression and summarize the limits of
our conclusions based on ordinary least squares and related methods. We then describe regression quantiles, an extension of the usual
univariate concept of quantiles to the linear model (Cade et al. 1999), and
illustrate their applicability to, and advantages for, ecology through examples
from the Boulder study. In our examples, we use a
land-cover database derived from satellite imagery, and field data describing
plant communities and the abundance of birds and small mammals in grassland areas managed by
Boulder Open Space (see Haire et al. 2000 for more detail). Regression quantile models for the prairie vole (Microtus
ochrogaster), a small mammal species found in open, dry grasslands, and the
lark sparrow (Chondestes grammacus), a
songbird associated with upland grass types, illustrate the new insights that regression quantiles provided in an ecological application.
In the broader context, we
review situations where ideas from other disciplines and technological advances
have enabled scientists to respond to paradigmatic change. Scientists from diverse philosophical perspectives have further increased
the ability of science to gain understanding and solve problems. We provide some specific examples that are meant to encourage a wider
view of the scientific community in order to increase our potential for meeting
new challenges in the future.
Broadening the focus of analysis
Statistics can bring order to, and reveal, the complexities of scientific observation. Linear regression, one of the most commonly used statistical approaches, is one of many methods that focus on averages. The computational tractability of least squares and the ease of estimating functions based on means make linear regression models very attractive (Koenker and Portnoy 1996). Originally, regression was developed for problems where the relevant variables, their order of importance, and the function best representing their relationships were known, and theorems in contemporary texts use these same assumptions to justify the use of regression (Freedman 1997). In an historical example consistent with this intended use, Gauss predicted the location of the asteroid Ceres from a regression on 6 variables describing its orbit (Diaconis 1998). Although some ecological questions can be answered by an analysis based on clear, predictable conditions, traditional regression analysis cannot provide a more comprehensive view of ecological systems and their inherent variability.
The implications of emphasizing averages, or central tendencies, in data analysis are apparent in the method of least squares. The chosen prediction line (estimates of intercept and slope) minimizes the sum of the squared errors of prediction for all sample points (Figure 1) (Ott 1992). The familiar model

$$y = \beta_0 + \beta_1 x + \varepsilon$$

includes
the random error, ε, that represents the difference between a measurement y and a
point on the line, and takes into account all unpredictable and unknown factors
that are not included in the model. The
average of the error for a given value of x is 0, and a point on the line
represents the average of y (or its expected value) for a fixed value of x. In ecology, variance represents measurement error, random error, and
temporal and spatial differences between individuals, species, and environmental
features (Wiens 1989). Given the
wide range of possible biotic and abiotic factors that may influence ecological
response, a great deal of information is likely contained within the error term,
and the fitted line cannot adequately represent changes over space and time in
an ecological system. We recognized
the importance of describing varied response among the bird and mammal species
in the Boulder Open Space, and were unable to effectively describe the diversity
observed in the field using statistics based on mean functions.
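The least squares calculation described above can be sketched in a few lines of Python. The data are hypothetical (not from the Boulder study) and `ols_fit` is an illustrative helper, not code from the Ecological Archives. The sketch shows that the residuals average to zero and that the fitted line is tied to the means of x and y, so a single extreme observation pulls the whole line.

```python
# Least squares fit of y = b0 + b1*x + e (illustrative helper, hypothetical data).

def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope


x = [0, 1, 2, 3, 4, 5]
y = [2, 5, 8, 11, 14, 30]  # first five points lie on y = 2 + 3x; the last is extreme
b0, b1 = ols_fit(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"mean residual = {sum(residuals) / len(residuals):.2e}")  # zero up to rounding
```

Although five of the six points lie exactly on y = 2 + 3x, the least squares slope is about 4.86, because every observation, including the extreme one, contributes to the mean function.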
Figure 1. The least squares regression line for a data set with a high degree of variability. Residual standard error = 11.26 (60 degrees of freedom).
Describing variation using regression quantiles
The application of regression
quantiles to ecology represents a new technique with a long history of
development and application in other disciplines. The history of regression quantiles extends back to the first
attempts at regression, before the development of least squares techniques (Koenker
and Portnoy 1996). Boscovich wanted
to estimate the extent of the earth’s ellipticity using measures of arc-length
taken at several locations. He
attempted to estimate a fitted line by minimizing the sum of absolute errors,
requiring that the fitted line pass through the centroid of the observations,
thus estimating the slope parameter as a median. The use of the median, rather than the mean, to determine the regression
line distinguishes regression quantiles from least squares regression at the
most basic level. The median
regression estimator was later extended to other quantiles (represented by the
Greek letter tau) (Figure 2). The median, or
0.50th quantile, describes the center of the distribution such that
50% of the observations are less and 50% greater than the estimate; a 0.90th
quantile is an estimate such that 90% of the observations are less and 10%
greater than the estimate. Ranking
and sorting of observations determine parameter estimates for a given univariate
quantile, but in the regression setting optimization via linear programming has
replaced ranking and sorting (Koenker and Portnoy 1996).
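The link between ranking and sorting and optimization can be illustrated with the check (asymmetric absolute) loss, ρτ(r) = r(τ - I(r < 0)), which weights positive residuals by τ and negative residuals by 1 - τ. In the Python sketch below, the data and function names are hypothetical. Rather than calling a linear-programming solver, `regression_quantile` relies on a property of the linear program: with one predictor, some optimal line passes through two observations, so trying every pair finds a solution. This brute-force search is only suitable for small examples; practical software uses simplex-type or interior-point algorithms.

```python
import math


def check_loss(residuals, tau):
    """Sum of rho_tau(r) = r * (tau - I(r < 0)) over all residuals."""
    return sum(r * (tau - (1 if r < 0 else 0)) for r in residuals)


def sorted_quantile(y, tau):
    """Univariate tau-th quantile by ranking and sorting (ceil(n*tau)-th order statistic)."""
    k = max(1, math.ceil(len(y) * tau))
    return sorted(y)[k - 1]


def loss_quantile(y, tau):
    """Quantile found by minimizing the check loss over the observed values.

    Agrees with sorted_quantile unless n*tau is an integer, when every value
    between the two adjacent order statistics is optimal.
    """
    return min(y, key=lambda c: check_loss([yi - c for yi in y], tau))


def regression_quantile(x, y, tau):
    """(intercept, slope) of a tau-th regression quantile with one predictor.

    Brute force instead of linear programming: every line through two
    observations with distinct x is tried. Needs at least two distinct x
    values; O(n^3) work, so only for small illustrative data sets.
    """
    best = None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue  # a vertical line is not a valid fit
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = check_loss([yk - (intercept + slope * xk) for xk, yk in zip(x, y)], tau)
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]


y_uni = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
print(sorted_quantile(y_uni, 0.75), loss_quantile(y_uni, 0.75))  # 5 5

x = [0, 1, 2, 3, 4, 5]
y = [2, 5, 8, 11, 14, 30]
print(regression_quantile(x, y, 0.5))  # (2.0, 3.0)
```

Median regression recovers y = 2 + 3x for the five points on that line; for the sixth point only the sign of its residual matters, not how far it lies above the line.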
Figure 2. Regression lines for three values of τ.
Rank scores computed from the observations at any given quantile can be used to test hypotheses and
construct confidence intervals for regression quantiles in cases of homogeneous
or heterogeneous error distributions (Koenker 1994, Cade et al. 1999). The test statistic is evaluated with a chi-square distribution;
degrees of freedom are equal to the difference in number of parameters between the
alternative and hypothesized models. For the special case of 1 degree of freedom, the rank score
test calculates a standardized test statistic that is referenced to the standard
normal distribution to calculate probabilities under the null hypothesis. The rank score test can also be inverted to compute confidence intervals. Model selection techniques
(e.g., Akaike's Information Criterion corrected for small samples, AICc; Hurvich and Tsai 1990) can be applied as well. Methods for nonlinear functions are currently being developed
to further extend the usefulness of regression quantiles. Mathematical details of regression quantile estimation, rank score tests,
and calculation of confidence intervals are provided in the Appendix. Computer code for regression quantile functions is available in the
Ecological Archives.
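A minimal Python sketch of the 1-degree-of-freedom rank score test follows, for the simplest case: testing a zero slope against an intercept-only null model. It assumes homogeneous (independent, identically distributed) errors and no tied responses; the heterogeneous-error version described by Koenker (1994) and Cade et al. (1999) is more involved and is not shown. The function name and data are hypothetical, and this is not the code in the Ecological Archives.

```python
import math


def rank_score_test_slope(x, y, tau):
    """Rank score test of H0: slope = 0 at quantile tau, one predictor.

    Assumes i.i.d. errors and no tied y values. Returns (z, two-sided p);
    z**2 is the chi-square statistic with 1 degree of freedom.
    """
    n = len(y)
    # Null (intercept-only) fit: the univariate tau-th quantile.
    k = max(1, math.ceil(n * tau))
    q = sorted(y)[k - 1]
    n_above = sum(1 for yi in y if yi > q)
    # Regression rank scores: 1 above the fit, 0 below, and for the point on
    # the fit the value that makes the scores sum to n * (1 - tau).
    a_on = n * (1 - tau) - n_above
    a = [1.0 if yi > q else 0.0 if yi < q else a_on for yi in y]
    # Centered scores: tau above the fit, -(1 - tau) below.
    b = [ai - (1 - tau) for ai in a]
    # Remove the part of x explained by the null model's intercept.
    x_bar = sum(x) / n
    xc = [xi - x_bar for xi in x]
    s = sum(xci * bi for xci, bi in zip(xc, b))
    var = tau * (1 - tau) * sum(xci ** 2 for xci in xc)
    z = s / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal tail
    return z, p


x = list(range(1, 21))
z, p = rank_score_test_slope(x, x, 0.5)  # response increases with x
print(f"z = {z:.3f}, chi-square = {z * z:.3f}, p = {p:.2g}")
```

For x = y = 1, ..., 20 at τ = 0.5, the score sum is 50 and z = 50/√166.25 ≈ 3.88; squaring z gives the chi-square statistic referred to in the text.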
Limiting factor theory proposes that limits are imposed in a
heterogeneous fashion across space and time (Cade et al. 1999). Limits can also be visualized in terms of constraints imposed by
processes at one scale on those of another scale, in a hierarchical fashion.
For example, changes in slope in quantiles at the extremes of the data distribution may
provide different information than changes at the center of the distribution. These changes hold useful information; interaction of unmeasured factors
results in greater variation in the data distribution, and estimates at the
extremes of the distribution help account for this interaction (Cade et al.
1999). When the measured
independent variable (x) is constraining the measured response (y), changes in
slope at upper quantiles are useful for estimating this relationship.
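One simple model, assumed here only to illustrate the point (it is not the model fitted in the Boulder study), shows why a limiting factor can produce different slopes at different quantiles. Suppose the measured factor sets a ceiling $\beta_0 + \beta_1 x$, positive over the range of x, and unmeasured factors scale the response below that ceiling through a nonnegative variable U that is independent of x:

$$y = (\beta_0 + \beta_1 x)\,U \quad\Longrightarrow\quad Q_y(\tau \mid x) = (\beta_0 + \beta_1 x)\,Q_U(\tau) = \beta_0 Q_U(\tau) + \beta_1 Q_U(\tau)\,x$$

Multiplying by a positive constant preserves quantiles, so the τth conditional quantile is linear in x with slope $\beta_1 Q_U(\tau)$. Because $Q_U(\tau)$ is nondecreasing in τ, a negative $\beta_1$ gives slopes that become more negative toward the upper quantiles, whereas least squares estimates only the single mean slope $\beta_1 E(U)$.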
Example 1: Prairie vole models
Models of change in prairie
vole abundance led to insights about the relative importance of landscape
characteristics for this species, and they provided examples of the types of
information gained by using regression quantiles. The negative rate of change in vole numbers appears to be greater at
higher quantiles when quantile regression lines are plotted for the prairie vole
model that includes distance to riparian cover type; the slope of the 90th
quantile is more negative than that of the 50th quantile (Figure 2). Slopes for each quantile (0.01 to 0.99) can be plotted on one graph to
illustrate the magnitude and direction of change in slope for a particular model
(Figure 3). Increasingly negative rates of change at upper quantiles
(0.50 to 0.99) are consistent with a limiting factor effect on prairie vole
abundance at some sample locations. This information was key to interpreting the
importance of shrub or tree cover around
streams and ditches, which provided a network of potential corridors for small
mammals in upland grass areas.
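A slope-by-quantile plot such as Figure 3 amounts to refitting the model over a grid of τ values and recording each slope. The Python sketch below does this for hypothetical data built so that a declining ceiling, y = (10 - x)u, is scaled at every x by five fixed levels of an unmeasured factor u. A brute-force search over lines through pairs of observations stands in for a linear-programming solver, and the confidence bounds are omitted; all names are illustrative.

```python
def check_loss(residuals, tau):
    """Sum of rho_tau(r) = r * (tau - I(r < 0)) over all residuals."""
    return sum(r * (tau - (1 if r < 0 else 0)) for r in residuals)


def regression_quantile(x, y, tau):
    """(intercept, slope) minimizing the check loss over lines through two observations."""
    best = None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue  # skip vertical lines
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = check_loss([yk - (intercept + slope * xk) for xk, yk in zip(x, y)], tau)
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    return best[1], best[2]


# Hypothetical data: ceiling 10 - x declines with x and is scaled at every x
# by five fixed levels u of an unmeasured factor, so y = (10 - x) * u.
levels = [0.1, 0.3, 0.5, 0.7, 0.9]
x = [xi for xi in range(10) for _ in levels]
y = [(10 - xi) * u for xi in range(10) for u in levels]

taus = [0.1, 0.3, 0.5, 0.7, 0.9]
slopes = {tau: regression_quantile(x, y, tau)[1] for tau in taus}
for tau, slope in slopes.items():
    print(f"tau = {tau:.1f}  slope = {slope:.3f}")
```

For these data the fitted slopes step from about -0.1 at τ = 0.1 to about -0.9 at τ = 0.9, the pattern of increasingly negative upper-quantile slopes associated with a limiting factor; a finer grid such as 0.01 to 0.99 only requires a longer `taus` list.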
Figure 3. Slope values plotted across all quantile values. Upper and lower bounds of the 95% confidence intervals are shown as dashed lines.
The relationship between prairie vole abundance and landscape urbanization was of particular interest in our study, so we tested the partial effects of distance to riparian and urban landscape composition in a multiple regression analysis. Higher rates of change in abundance at higher quantiles were consistent with a limiting factor relationship for multiple regression models, which were compared using rank score tests and confidence intervals. P values from the chi-square rank score tests for the 50th to 90th quantiles ranged from 0.00007 to 0.023 (distance to riparian only) and from 0.000089 to 0.099 (distance to riparian, given urban composition). Confidence intervals were nearly identical for the model containing only distance to riparian (Figure 3) and the model with urban composition included. Urbanization decreases the area and diversity of grasslands, including riparian areas, and this indirect effect is seen in the regression quantile analysis. Looking at changes across data distributions gives a picture of dynamic processes that vary over space and time, rather than the collapsed view of overall propensity represented by measures of central tendency.
Example 2: Lark sparrow models
Comparing
models across quantiles allowed a more comprehensive view of ecological process
as represented by lark sparrow abundance in the Boulder Valley. Lark sparrow distribution in the Boulder study area was clumped and
localized (Haire et al. 2000). Geographic
location, or spatial structure, can be used as an indirect descriptor of various
processes that have generated spatial structure, leading to hypotheses about unmeasured
influences on phenomena of interest (Borcard and Legendre 1994). We examined three models to test whether including spatial
location provided useful information about this species’ distribution and
abundance. Models containing
coordinate location, and local and landscape plant community descriptors were
compared with an intercept-only (one-parameter) model that is equivalent to conventional univariate
quantiles (Figure 4). The location
model contained only location coordinates, and the habitat model contained 2
variables: the plant community descriptor for shale types measured at the local
scale, and the composition of upland grass cover types in the surrounding
landscape. At higher quantiles, the
model with location and habitat variables was most parsimonious (smallest AICc). In other applications, model comparisons at central or lower ends of the
spectrum may have important implications for study questions. Examination of the differences in AICc across quantiles allows
the opportunity to choose a particular quantile model or set of models that best
describes relationships of interest.
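Comparing models across quantiles with AICc can be sketched as follows. The code assumes an asymmetric Laplace likelihood, under which AIC reduces to 2n log σ̂ + 2p (plus a constant shared by all models fitted to the same data at the same τ), with σ̂ the mean check loss, and it adds the familiar small-sample term 2p(p + 1)/(n - p - 1). This is an assumed form; Hurvich and Tsai (1990) should be consulted for the exact criterion. The data, the single predictor standing in for a location or habitat variable, and all function names are hypothetical.

```python
import math


def check_loss(residuals, tau):
    """Sum of rho_tau(r) = r * (tau - I(r < 0)) over all residuals."""
    return sum(r * (tau - (1 if r < 0 else 0)) for r in residuals)


def intercept_only_residuals(y, tau):
    """Residuals from the intercept-only model, i.e. the univariate tau-th quantile."""
    k = max(1, math.ceil(len(y) * tau))
    q = sorted(y)[k - 1]
    return [yi - q for yi in y]


def one_predictor_residuals(x, y, tau):
    """Residuals from a brute-force tau-th regression quantile (lines through pairs)."""
    best = None
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            if x[i] == x[j]:
                continue
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            res = [yk - (intercept + slope * xk) for xk, yk in zip(x, y)]
            loss = check_loss(res, tau)
            if best is None or loss < best[0]:
                best = (loss, res)
    return best[1]


def aicc(residuals, tau, p):
    """Assumed AICc: 2n*log(mean check loss) + 2p + 2p(p + 1)/(n - p - 1)."""
    n = len(residuals)
    sigma = check_loss(residuals, tau) / n
    return 2 * n * math.log(sigma) + 2 * p + 2 * p * (p + 1) / (n - p - 1)


# Hypothetical data: response rises with one predictor, plus an alternating offset.
x = list(range(1, 13))
y = [2 * xi + 0.5 * (-1) ** xi for xi in x]

deltas = {}
for tau in [0.25, 0.5, 0.75]:
    null_aicc = aicc(intercept_only_residuals(y, tau), tau, p=1)
    full_aicc = aicc(one_predictor_residuals(x, y, tau), tau, p=2)
    deltas[tau] = full_aicc - null_aicc  # negative favors the predictor model
    print(f"tau = {tau:.2f}  delta AICc = {deltas[tau]:.1f}")
```

Only differences in AICc at the same τ are meaningful here, because the omitted likelihood constant depends on τ and n.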
Figure 4. Differences in AICc values for three lark sparrow models.
Discussion
Other Disciplines as Sources of New Ideas
When our observations contradict familiar paradigms, we need to find new ways to address change creatively. Technological advances in computer science, electronics, and satellite systems have filled this role in many cases. The development of computers made it possible to organize and analyze data efficiently. The availability of remotely sensed images allows an aerial perspective that has resulted in new questions and new methods of study. Software that incorporates spatial data structures has allowed our questions to extend to landscapes and ecosystems. Since Ada Lovelace’s early conception of computer programming, ever higher levels of abstraction have enabled the implementation of many new ideas. Particularly relevant to the regression quantile methods presented herein are linear programming and the development of the simplex algorithm for optimization problems by George Dantzig in the 1940s (Hadley 1962).
In response to needed paradigm shifts, scientists
have drawn upon statistical methods that originated in various disciplines. For
example, spatial statistics, including analysis of spatial autocorrelation (e.g.
Moran’s I), kriging, and trend surface analysis have roots in mining
engineering and geology (Cressie 1991). The
quantification of spatial relationships reflects the underlying focus on the
importance of location especially in the field of geography. This focus has become increasingly influential in landscape ecology. Other statistical approaches to studying extreme values that were
originally developed in geology, climatology, and engineering have recently been
applied in ecology. These
extensions are another example of the need to go beyond mean and variance in
understanding extremes of biological interest, such as temperature, force, and
lifespan (Gaines and Denny 1993). Econometricians were pioneers in the development of methods to examine
limiting functions of various factors (e.g., cost, productivity) that led to
the implementation of regression quantiles and improvements in their application.
Diverse Worldviews, Theories, and Methods
Inclusion of alternative viewpoints in the scientific community will influence scientific practice (Longino 1996). An obvious benefit is the increased recognition of the connection between cultural values and theoretical and methodological approaches (Birke 1986, Hodge 1995). All concepts (potential solutions in a paradigm shift) must be in harmony with worldviews. For example, probability theory, with its notion of random events, developed in a society that had discarded the belief that gods directed all events (Lightner 1991).
Alternative descriptions of
biological and social systems emerge from different worldviews. Women scientists in primatology brought different perspectives to
observation of social interactions at a time when dominance hierarchies with
male leadership were thought to be universal among primate species (Rosser
1992). Barbara McClintock’s
worldview led her to propose theories that were in sharp contrast with those of her
contemporaries in molecular biology (Magada-Ward 1999). Her belief that exceptions or anomalies have meaning in and of themselves
led her to discover the ability of normal DNA to rearrange itself in
response to environmental pressures.
People of the First Nations in
North America hold a view of ecology that differs from Western scientific
understanding in many respects (Pierotti and Wildcat 1997). The language used in Western ecology reflects divergent philosophies;
terms such as “natural resources” convey a value system based on the idea of
commodities, rather than places with spiritual and material importance (Kimmerer
1998). Observations by Native peoples have contributed to
understanding relationships in nature (Pierotti and Wildcat 1997). The view that badgers and coyotes cooperated in hunting
efforts was long recognized in American Indian stories, while competition theory
precluded this understanding in Western ecology. Stories of friendship between wolves and ravens are one
example of the greater knowledge of Native people concerning the behavior and
ecology of wolves, now recognized to be valuable in understanding these animals
(Pierotti and Wildcat 1997).
Conclusion
Changes in thought and practice occur when there is conflict between theory, observation, and method. Our need to understand variability in ecology can lead to considering and applying new techniques. The tension between theory that emphasizes heterogeneity and analysis of central tendency was resolved by a new technique that highlights the complexities of ecological process. Regression quantiles may be the first of many new methods that represent a shift in emphasis toward understanding heterogeneous, dynamic systems.
Regression quantiles are
particularly applicable to ecological data because results can be interpreted in
the context of ecological theory. Limiting
factor theory suggests that when a factor required for biological process is
limiting, its abundance determines the magnitude of biological response at
particular places and times (Kaiser et al. 1994, Cade et al. 1999). We demonstrated that modeling relationships throughout data distributions
is a useful approach, and focusing on changes along the edge of distributions
may capture limiting relationships and account for the interactive effects of
unmeasured factors. The development
of regression quantiles by econometricians and the extensive body of useful
ideas and techniques brought to ecology from other disciplines demonstrate the
importance of communication in the scientific community. Dialogue at conferences, on-line, and through research that integrates
scientific disciplines facilitates exchange of information needs and resources
for solutions.
Another valuable resource for
ideas can be found in diverse perspectives brought to science by individuals
with different worldviews. It is
interesting to imagine the approaches that would result from more radical
epistemologies, including belief in connectedness of nature and conception of
the world as a whole (Rosser 1992). How
would the idea of web, or spiral models, rather than hierarchical models, be
useful in ecological study? What
types of information would be gained using cyclical models of time as an
alternative to linear conceptions of time? A scientific community that remains open to criticisms of evidence,
methods, assumptions, and reasoning will hold greater potential for gaining
information that contributes to problem solving in the future. Creating forums for presenting scientific research outside of
traditional paradigms would broaden the scope and influence of science as a
comprehensive system of thought and practice.
Acknowledgements
We thank J. Carpenter, M. Hippard, N. Kotliar, B.L. Lamb, C. Melcher, and C. Miller for insightful comments on an earlier draft, and L. Everette for web formatting. City of Boulder Open Space provided imagery and access to properties. Research support by US Geological Survey, University of Colorado, and Colorado State University is gratefully acknowledged.
Literature Cited
Berry ME, Bock CE, Haire SL. 1998. Abundance of diurnal raptors on open space grasslands in an urbanized landscape. The Condor 100:601-608.
Birke L. 1986. Women, feminism, and biology: The feminist challenge. Methuen, New York.
Borcard D, Legendre P. 1994. Environmental control and spatial structure in ecological communities: an example using oribatid mites (Acari, Oribatei). Environmental and Ecological Statistics 1:37-61.
Cade BS, Terrell JW, Schroeder RL. 1999. Estimating effects of limiting factors with regression quantiles. Ecology 80(1):311-323.
Cressie NAC. 1991. Statistics for spatial data. J. Wiley, New York. 900 p.
Diaconis P. 1998. A place for philosophy? The rise of modeling in statistical science. Quarterly of Applied Mathematics LVI(4):797-805.
Freedman D. 1997. From association to causation via regression. Advances in Applied Mathematics 18:59-110.
Gaines SD, Denny MW. 1993. The largest, smallest, highest, lowest, longest, and shortest: extremes in ecology. Ecology 74(6):1677-1692.
Hadley G. 1962. Linear programming. Addison-Wesley Publishing Company. Reading, Massachusetts. 520 p.
Haire SL. 1998. Spatial factors influencing bird distribution in grasslands near Boulder, Colorado. MS Thesis, Colorado State University, Fort Collins. 127 p.
Haire SL, Dean DJ, Bock CE. 1998. Incorporation of spatial dependence in models of bird community response to landscape pattern. Pages 149-163 in H.J-H. Whiffen and W.C. Hubbard, Editors. SOFOR GIS ’98. Proceedings of the 2nd Southern Forestry GIS Conference. October 28-29, 1998, Athens, Georgia.
Haire SL, Bock CE, Cade BS, Bennett BC. 2000. The role of landscape and habitat characteristics in limiting abundance of grassland nesting songbirds in an urban open space. Landscape and Urban Planning 48(1-2): 65-82.
Hodge DC. 1995. Should women count? The role of quantitative methodology in feminist geographic research. Professional Geographer 47(4):426.
Hurvich CM, Tsai C-L. 1990. Model selection for least absolute deviations regression in small samples. Statistics and Probability Letters 9:259-265.
Kaiser MS, Speckman PL, Jones JR. 1994. Statistical models for limiting nutrient relations in inland waters. Journal of the American Statistical Association 89(426): 410-423.
Kimmerer R. 1998. Bringing the Native perspective into natural resources education. Winds of Change 13(3): 14-18.
Koenker R. 1994. Confidence intervals for regression quantiles. Pages 349-359 in P. Mandl and M. Hušková, Editors. Asymptotic statistics: Proceedings of the Fifth Prague Symposium. Physica-Verlag, Heidelberg, Germany.
Koenker R, Bassett G. 1978. Regression quantiles. Econometrica 46: 33-50.
Koenker R, Portnoy S. 1996. Quantile regression. Office of Research Working Paper Number 97-0100. University of Illinois at Urbana-Champaign, College of Commerce and Business Administration Office of Research.
Kotliar NB. 1996. Scale dependency and the expression of hierarchical structure in Delphinium patches. Vegetatio 127:117-128.
Kuhn TS. 1996. The structure of scientific revolutions, 3rd Edition. University of Chicago Press, Chicago, Illinois.
Lightner JE. 1991. A brief look at the history of probability and statistics. The Mathematics Teacher 84(8): 623-630.
Longino HE. 1996. Subjects, power, and knowledge: description and prescription in feminist philosophies of science. Chapter 17 in E.F. Keller and H.E. Longino, Editors. Feminism and science. Oxford University Press, Oxford, New York.
Magada-Ward M. 1999. Rescuing Keller by abducting her: toward a pragmaticist feminist philosophy of science. Journal of Speculative Philosophy 13(1): 19-38.
Ott RL. 1992. An introduction to statistical methods and data analysis. Duxbury Press, Belmont, California. 1051 p. + Appendices.
Pierotti R, Wildcat DR. 1997. The science of ecology and Native American tradition. Winds of Change (Autumn): 94-97.
Rosser SV. 1992. Are there feminist methodologies appropriate for the natural sciences and do they make a difference? Women’s Studies International Forum 15(5/6): 535-550.
Wiens JA. 1989. The ecology of bird communities. Vol. 2. Processes and variations. Cambridge University Press. 316 p.
Wiens JA. 1995. Landscape mosaics and ecological theory. Chapter 1 in L. Hansson, L. Fahrig, and G. Merriam, Editors. Mosaic landscapes and ecological processes. Chapman and Hall, London.
Authors
Sandra L. Haire, Wildlife
Biologist
US Geological Survey, 4512 McMurry Ave. Fort Collins, Colorado 80525 USA
e-mail: sandy_haire@usgs.gov, Tel: 970-226-9367, Fax: 970-226-9230
Carl E. Bock, Professor of Biology
University of Colorado, Boulder, Colorado 80309
e-mail: bockc@colorado.edu, Tel: 303-492-7184, Fax: 303-492-8699
Brian S. Cade, Biostatistician
US Geological Survey, 4512 McMurry Ave. Fort Collins, Colorado 80525 USA
e-mail: brian_cade@usgs.gov, Tel: 970-226-9326, Fax: 970-226-9230