Over the years, there has been a lot of discussion about offering an introductory statistics course as part of the graduate curriculum at the Odum School of Ecology. A few different kinds of introductory statistics courses have been offered in recent years, and I have heard quite a few discussions of their strengths and weaknesses. When I served as the graduate student representative on the graduate program committee two years ago, the idea of offering an introductory statistics course often popped up in conversation. I have been wondering what the ideal statistics course for ecologists would be.
To be clear, we as ecologists face complex data. Almost none of the data we encounter in real research look like standard textbook examples. Expecting one or two classes to solve all these problems is unrealistic. But can we design an introductory statistics course that better prepares us for these complex problems? If we were to require one statistics course for all ecology graduate students, what should that course look like?
The goal of an introductory statistics course for ecologists
It is useful to start the discussion with what we want to get out of a statistics course. One of my committee members, John Drake, summarizes this well. I took his population ecology class, and he talked about his vision for the course at the beginning of the semester. He summarized the goal of the course, and more generally of graduate school, as learning “how to think about problems” and “how to solve the problem”. I like this summary a lot. For a statistics course specifically, this means we want to learn 1) which methods we should use for a particular problem and 2) how to perform the analysis with software. This requires both a theoretical understanding of various statistical methods and the practical skills to carry out the analyses in software. In my opinion, a good statistics course should try to achieve both of these goals.
Common ways of structuring statistics courses
I have taken quite a few statistics courses in graduate school, in both ecology and statistics departments. These courses take very different approaches. As a start, let’s look at these approaches more closely.
1. The statistics major approach. This is how a statistics department trains its graduate students, and it is how I was trained when I did my master’s degree in statistics. You take a lot of courses, each focusing on a specific topic. You start with basic probability, mathematical statistics and maybe some linear algebra. Then you move on to the theory and application of linear models. After these required courses, you take more advanced topics, such as generalized linear models, linear mixed models, nonlinear models, stochastic processes and experimental design. You also take a few application-focused classes, such as statistical programming and statistical consulting. Each course is an in-depth treatment of a particular method or theory. Through this kind of training you get to know many different methods in depth, but it also takes a lot of time. While I personally think this is probably the only way to learn a wide range of methods well, it is not realistic to offer such training in an ecology graduate program. On the one hand, we often don’t have all the expertise needed to teach these courses within an ecology program. On the other hand, we do not have time to go through this many courses. After all, we are ecologists and our focus is on ecology. This might be the preferred way to learn statistics for some, but I don’t think it is feasible as a general approach to teaching statistics in an ecology program.
2. The method survey approach. This is probably the most common approach to teaching introductory statistics in ecology. The course covers a variety of methods without discussing the theory behind them in depth. I have taken a few classes like this. For example, I took a biometry class with John Kelly when I first started graduate school at the University of Kansas. We went through the Biometry textbook by Sokal and Rohlf. The class covered a fairly broad range of statistical methods and often offered a step-by-step recipe for each one. John Drake and Seth Wenger recently taught a general statistics class using the book Modern Applied Statistics with S. I did not take this class but helped some of the students in their data analysis sessions. The class covered many methods briefly, and some of them were fairly advanced. Students had the opportunity to analyze their own data with the help of the instructors. In my opinion, the method survey approach is probably the most appropriate way to offer a statistics class to ecology students. The difficulty in teaching a course like this is balancing the tradeoff between breadth and depth. Most statistics courses in ecology tend to cover a lot of methods, and in doing so they sacrifice the time needed to introduce the basic theory. Without a proper understanding of the theory behind the models, one can easily make mistakes in choosing a method and interpreting the results. Properly balancing depth and breadth is quite a challenging task.
3. The unifying framework approach. I have seen relatively few people approach a statistics course this way. My advisor, Ford Ballantyne, taught a course on likelihood-based inference in ecology a few years ago at UGA, and a similar course together with Mark Holder and John Kelly at KU. Dennis Boos has an excellent textbook called Essential Statistical Inference that covers mathematical statistics based mostly on likelihood theory. Personally, I like the generality and elegance of the unifying framework approach. But approaching a course this way often requires an in-depth treatment of theory. It is often difficult to see how the various methods are linked to the unifying framework without knowing the theory well. As a result, we may be left with some understanding of the theory but still be quite far from putting it to use in real applications.
A combination of approaches 2 and 3 is perhaps the most promising option. The method survey could be the backbone of the class, with an emphasis on a few general themes infused throughout the course. From my experience in statistics courses, I think the following points could be helpful in designing an introductory statistics course for ecology students.
1. Cover the most commonly used statistical methods and treat them in a little more depth. These simple methods can handle quite a large range of problems. More advanced methods can be deferred to more advanced classes. In my opinion, linear models, generalized linear models, linear mixed models and nonlinear models should provide sufficient tools for analyzing most of the data we encounter. These commonly used methods can be discussed in a little more depth. For example, in addition to talking about what a method generally does, we can say more about the model assumptions, the fitting procedure, model evaluation and so on.
2. Put more emphasis on what the model really assumes. For most common statistical models, what we essentially do is specify the mean structure and the variance structure. Making clear what the model is and what we estimate from the data is therefore essential. This also helps us understand which assumptions we need to check when fitting the model. When fitting models with software, it is useful to point out which terms in the model are reported in the output, which further reinforces the understanding of the model assumptions.
3. Emphasize when the model is suitable. I think this follows naturally from an understanding of the model assumptions. For example, when we fit a one-way ANOVA model, we assume that each treatment has its own mean and that the errors are iid normal. What we do is use the data to estimate the mean of each treatment. Seeing the model this way makes it clear what the assumptions are and, naturally, when the model applies.
4. Make connections across methods, unifying them with some general theoretical themes. For example, likelihood-based and sum-of-squares-based hypothesis tests are both very commonly used in many models. Illustrating how they are applied in various methods could help us see the generalities across those methods. Connections can also be made by showing how various methods tackle one particular kind of problem. For example, a contingency table can be analyzed with a Pearson chi-square test, a G test, a generalized linear model or a log-linear model. They essentially do the same thing. But understanding the similarities and subtle differences helps us choose the best method for the job.
5. Demonstrate the whole workflow of an analysis with more realistic data. We often see a lot of examples demonstrating one aspect of an analysis. A lot can be learned from seeing how an analysis is conducted step by step from the very beginning. For example, fitting a simple linear model is straightforward with most software. But checking model assumptions, selecting models, testing linear contrasts, making pairwise comparisons and computing confidence and prediction intervals may not be as familiar to many of us. In my experience, this sort of whole-workflow demonstration is not common in classes. But we have to go through the whole procedure with real data, so seeing how it is done in class could be very helpful.
6. Analyze messy data in class. This really helps us put different pieces of knowledge together. It is in those moments when the suitable method is not obvious, or when it is ambiguous what the best method is, that we come to understand statistical methods better.
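Going back to the contingency table example in point 4, here is a minimal sketch of how close two of those methods are. It computes the Pearson chi-square statistic and the G (likelihood-ratio) statistic from the same expected counts under independence. The table of counts, the habitat and species labels and the function names are all made up for illustration, and only the Python standard library is used. Both statistics are compared to the same chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom, and G is also the deviance you would get from fitting the independence log-linear model.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_chi2(table):
    """Pearson statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def g_statistic(table):
    """G statistic: 2 * sum of observed * ln(observed / expected)."""
    exp = expected_counts(table)
    return 2.0 * sum(o * math.log(o / e)
                     for obs_row, exp_row in zip(table, exp)
                     for o, e in zip(obs_row, exp_row)
                     if o > 0)  # empty cells contribute nothing to the sum

# Hypothetical counts: rows are two habitats, columns are species present / absent.
table = [[30, 20],
         [18, 32]]

df = (len(table) - 1) * (len(table[0]) - 1)
chi2 = pearson_chi2(table)
g = g_statistic(table)
print(f"df = {df}, Pearson chi-square = {chi2:.3f}, G = {g:.3f}")
```

The two statistics differ only in how they measure the distance between observed and expected counts, which is why they usually lead to the same conclusion unless some expected counts are small.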
One more minor point
Many introductory statistics courses cover ANOVA by way of the ANOVA table. For different kinds of models, we use different ratios of mean squares for hypothesis testing. I find this really confusing. An ANOVA table is essentially a particular hypothesis test in a linear model. It is theoretically no different from testing slopes in linear regression or testing linear contrasts in linear models. So why not approach it from a hypothesis testing perspective? It provides a unifying way of looking at all linear models and is clearer about what we are really testing. Christensen’s book Plane Answers to Complex Questions has a great presentation of how ANOVA and hypothesis testing are the same thing.
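As a small illustration of this point, the sketch below computes the one-way ANOVA F statistic in two ways: from the usual ANOVA table (between-group over within-group mean squares), and as a comparison of two nested linear models, a reduced model with a single grand mean and a full model with one mean per treatment. The data, the treatment names and the helper function are invented for the example, and it uses only plain Python. The two routes give the same number.

```python
def rss(values, fitted):
    """Residual sum of squares for a set of fitted values."""
    return sum((y - f) ** 2 for y, f in zip(values, fitted))

# Hypothetical measurements for three treatments.
groups = {
    "control": [4.1, 5.0, 4.6, 5.3],
    "low":     [5.9, 6.4, 5.5, 6.2],
    "high":    [7.1, 6.8, 7.7, 7.4],
}

y = [v for values in groups.values() for v in values]
n, k = len(y), len(groups)
grand_mean = sum(y) / n
group_means = {name: sum(values) / len(values) for name, values in groups.items()}

# Route 1: the ANOVA table, F = MS_between / MS_within.
ss_between = sum(len(values) * (group_means[name] - grand_mean) ** 2
                 for name, values in groups.items())
ss_within = sum((v - group_means[name]) ** 2
                for name, values in groups.items() for v in values)
f_table = (ss_between / (k - 1)) / (ss_within / (n - k))

# Route 2: compare nested models.
# Reduced model: every observation is fitted by the grand mean.
# Full model: every observation is fitted by its own treatment mean.
rss_reduced = rss(y, [grand_mean] * n)
fitted_full = [group_means[name] for name, values in groups.items() for _ in values]
rss_full = rss(y, fitted_full)
f_models = ((rss_reduced - rss_full) / (k - 1)) / (rss_full / (n - k))

print(f"F from ANOVA table:      {f_table:.4f}")
print(f"F from model comparison: {f_models:.4f}")
```

The second route is the same recipe used to test a slope in regression or a linear contrast: fit the model with and without the restriction and compare the residual sums of squares.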