FAQ: ANOVA or linear mixed model in Genstat?
Supakorn and V. Cave
Modern statistics and experimental design can be traced back to 1919, when R. A. Fisher was employed by Rothamsted Experimental Station to analyse field experiments. Fisher’s many remarkable contributions to statistics include developing analysis of variance (ANOVA), a widely used statistical technique for analysing data from designed experiments. In recent decades, however, many analysts have turned to linear mixed models (LMMs), also known as multi-level models, to analyse data from designed experiments instead of ANOVA. But which method should you use?
Like ANOVA, LMMs can analyse data with more than one source of variation, in addition to the usual residual term (e.g. data from a split-plot experiment). However, there are several important advantages of LMMs over ANOVA. LMMs can:
* analyse unbalanced data sets (e.g. unbalanced designs or data sets containing missing values)
* model correlations between observations (e.g. repeated measures data or spatial data)
Genstat has a very powerful set of ANOVA and LMM tools that are straightforward and easy to use. In Genstat, the REML algorithm is used to fit LMMs. For balanced data sets, the REML results are the same as those from ANOVA, but REML cannot provide an analysis-of-variance table. The ANOVA algorithm is also much more efficient than the REML algorithm, so it is better to use ANOVA whenever the data allow it.
Let’s compare the ANOVA and LMM analyses of a balanced data set.
An experiment was conducted to study the effect of two meat-tenderizing chemicals and three temperatures on the force required to break strips of meat. Two hind legs were taken from four carcasses of beef and one leg was treated with chemical 1 and the other with chemical 2. Three sections were then cut from each leg and randomly allocated to three cooking temperatures. All 24 strips of meat (4 carcasses × 2 legs × 3 sections) were cooked in separate ovens.
Notice that this experiment has a hierarchical “split-plot” design, with sections (i.e. sub-plots) nested within legs (i.e. whole plots) nested within carcasses (i.e. blocks). The two chemicals were randomly allocated to the two legs within each carcass and the three temperatures to the three sections within each leg.
Let’s analyse the experiment using both ANOVA and an LMM, and compare the results.
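To make the nesting concrete, here is a minimal Python sketch of the randomisation described above. It is not Genstat code and not the authors' procedure; the function name `randomise_split_plot`, the level labels 1, 2, 3 for the temperatures, and the seed are all assumptions made for illustration.

```python
import random


def randomise_split_plot(n_carcasses=4, chemicals=(1, 2), temps=(1, 2, 3), seed=0):
    """Return one row per strip of meat for a randomised split-plot layout.

    Hypothetical helper: level labels and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    layout = []
    for carcass in range(1, n_carcasses + 1):
        # Whole-plot randomisation: shuffle the chemicals over the legs
        # of this carcass.
        leg_chemicals = rng.sample(chemicals, k=len(chemicals))
        for leg, chemical in enumerate(leg_chemicals, start=1):
            # Sub-plot randomisation: shuffle the temperatures over the
            # sections of this leg.
            section_temps = rng.sample(temps, k=len(temps))
            for section, temp in enumerate(section_temps, start=1):
                layout.append(
                    {
                        "carcass": carcass,
                        "leg": leg,
                        "section": section,
                        "chemical": chemical,
                        "temp": temp,
                    }
                )
    return layout


layout = randomise_split_plot()
print(len(layout))  # number of strips: carcasses x legs x sections
```

Every carcass receives both chemicals (one per leg), and every leg receives all three temperatures (one per section), which is what makes the design balanced.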
To analyse the data in Genstat using ANOVA, from the main menu select Stats|Analysis of Variance|General…. We can customize this menu for the split-plot design by selecting the “Split-plot design” in the Design drop-down list. The model is specified by populating the Y-variate, Treatment structure, Blocks, Whole plots, and Sub-plot fields as shown below:
The ANOVA table shows that we have three strata in the hierarchy, corresponding to the three random terms: carcass (blocks), leg within carcass (whole plots), and section within leg within carcass (sub-plots). The analysis automatically determines the stratum in which each fixed (or treatment) term is estimated and compares it with the correct residual. So, the sum of squares for chemical is compared with the residual that represents the random variability of legs within carcasses. In contrast, temp and the chemical-by-temp interaction are compared with the residual for sections within legs.
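The way the degrees of freedom split across the three strata can be checked by hand. The sketch below is a hypothetical Python helper (`split_plot_df` is not a Genstat function) and assumes the standard balanced split-plot layout: each whole-plot treatment appears once per block and each sub-plot treatment once per whole plot.

```python
def split_plot_df(n_blocks, n_whole_trt, n_sub_trt):
    """Degrees of freedom per stratum for a balanced split-plot design.

    Hypothetical helper for illustration; term names follow the example
    in the text (carcass, leg, section, chemical, temp).
    """
    r, a, b = n_blocks, n_whole_trt, n_sub_trt
    return {
        # Blocks stratum: no treatment terms are estimated here.
        "carcass": {"total": r - 1},
        # Whole-plot stratum: chemical is tested against this residual.
        "carcass.leg": {
            "chemical": a - 1,
            "Residual": (r - 1) * (a - 1),
        },
        # Sub-plot stratum: temp and chemical.temp are tested against
        # this residual.
        "carcass.leg.section": {
            "temp": b - 1,
            "chemical.temp": (a - 1) * (b - 1),
            "Residual": a * (r - 1) * (b - 1),
        },
    }


df = split_plot_df(n_blocks=4, n_whole_trt=2, n_sub_trt=3)
total_df = sum(v for stratum in df.values() for v in stratum.values())
for stratum, terms in df.items():
    print(stratum, terms)
print("total", total_df)
```

With 4 carcasses, 2 chemicals and 3 temperatures, the degrees of freedom across all strata sum to 23, one fewer than the 24 strips of meat.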
To analyse the data using an LMM, from the main menu select Stats|Mixed Models (REML)|Linear Mixed Models…. The response, fixed terms and random terms in the model are specified as shown below:
We can control which output to display by clicking the Options button:
Genstat estimates a variance component for every term in the random model, apart from the residual (green rectangle). The residual term is a random term with a parameter for every unit in the design, here the 24 strips of meat (4 carcasses × 2 legs × 3 sections). The variance component of a term measures the inherent variability of that term over and above the variability of the sub-units of which it is composed. It is usually positive, indicating that units become more variable the larger they are. However, in this example the variance component for carcass.leg is negative, indicating that the legs are less variable than the sections within the legs. (This is the same conclusion we draw from the analysis-of-variance table above by comparing the residuals in the different strata.)
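For a balanced split-plot design, the link between the stratum residual mean squares and the variance components can be written down directly from the expected mean squares, which is why a whole-plot residual smaller than the sub-plot residual produces a negative component. The Python sketch below illustrates that relationship only; the function name is hypothetical and the mean squares passed in are made-up numbers, not the results of the meat experiment.

```python
def variance_components(ms_block, ms_whole, ms_sub, n_whole_per_block, n_sub_per_whole):
    """Method-of-moments variance components for a balanced split-plot design.

    Assumed expected mean squares (balanced case):
      sub-plot residual   : s2
      whole-plot residual : s2 + b * s2_whole
      block stratum       : s2 + b * s2_whole + a * b * s2_block
    where a = whole plots per block and b = sub-plots per whole plot.
    """
    a, b = n_whole_per_block, n_sub_per_whole
    return {
        "block": (ms_block - ms_whole) / (a * b),
        "block.whole": (ms_whole - ms_sub) / b,  # negative if ms_whole < ms_sub
        "residual": ms_sub,
    }


# Illustrative (made-up) mean squares in which legs vary less than
# sections within legs.
components = variance_components(
    ms_block=10.0, ms_whole=1.0, ms_sub=4.0,
    n_whole_per_block=2, n_sub_per_whole=3,
)
print(components)
```

Because the whole-plot mean square (1.0) is smaller than the sub-plot mean square (4.0) in this made-up input, the whole-plot component comes out negative, mirroring the carcass.leg result discussed above.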
The significance of the fixed terms is assessed using F statistics (purple rectangle), or, if the denominator degrees of freedom can’t be estimated, the less reliable Wald tests. For orthogonal designs, the F statistics and p-values will be identical to those generated by ANOVA. For non-orthogonal designs, the F statistics have only approximate F-distributions, and the order in which the fixed terms are tested matters.
As we have seen in the above example, for balanced data, LMM and ANOVA give the same results. However, ANOVA is preferred because it provides an analysis-of-variance table and the ANOVA algorithm is much more efficient than the REML algorithm used to fit the LMM.