In today's blog entry, I will walk through the basics of conducting a repeated-measures MANCOVA in SPSS. I will focus on the most basic steps of this analysis (I will not address more complex side issues, such as assumptions, power, etc.). Full disclosure: the example data come from the SPSS sample/help files, and they can be downloaded below.
Let's get started:
Repeated-Measures MANCOVA is used to examine how a dependent variable (DV) varies over time, using multiple measurements of that variable, with each measurement separated by a given period of time. In addition to determining whether the DV itself varies, a MANCOVA can also determine whether other variables are predictive of variability in the DV over time. If that wasn't crystal clear, don't worry; just keep reading.
Repeated-Measures MANCOVA Example:
In our example, your local stats store, Stats "R" Us, launched a marketing campaign with three different strategies (variable name: promo; value labels: Strategy A, Strategy B, Strategy C). Stats "R" Us launched campaigns in markets of three different sizes (variable name: mktsize; value labels: Small, Medium, and Large), and measured the sales in each store every three months over the course of one year (4 time points; variable names: sales.1, sales.2, sales.3, and sales.4; see data below).
NOTE: Sales are recorded in thousands of dollars (e.g., 70.63 is actually $70,630). Also, your data should be in person-level (a.k.a. "wide") format, as opposed to person-period (a.k.a. "long") format. This means each row of data is a single case (a store, in our example). In person-period (long) format, each case (store) would take up as many rows as there are repeated measures (four, in our example), because the repeated measures (sales.1, sales.2, sales.3, and sales.4) would be stacked into a single variable (Sales). Here is a useful resource for converting data between the two forms: CLICK HERE FOR INFO ABOUT CONVERTING DATA FORMS.
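If your data start out in long format, SPSS's own Data > Restructure wizard can convert them for you. The Python sketch below only illustrates what that conversion does. The store IDs and sales values are invented for illustration, and the `sales.1` through `sales.4` column names simply mirror the variables in this example.

```python
from collections import defaultdict

# Hypothetical person-period ("long") rows: (store_id, time_point, sales in $1,000s).
# These numbers are invented for illustration; they are not the SPSS sample data.
long_rows = [
    (1, 1, 70.63), (1, 2, 72.10), (1, 3, 71.45), (1, 4, 73.02),
    (2, 1, 55.20), (2, 2, 54.80), (2, 3, 56.10), (2, 4, 57.00),
]


def long_to_wide(rows, n_times):
    """Collapse long rows into one row per store with columns sales.1 ... sales.<n_times>."""
    by_store = defaultdict(dict)
    for store, time_point, sales in rows:
        by_store[store][time_point] = sales
    wide = []
    for store in sorted(by_store):
        record = {"store": store}
        for t in range(1, n_times + 1):
            # A missing time point becomes None so incomplete cases stay visible.
            record[f"sales.{t}"] = by_store[store].get(t)
        wide.append(record)
    return wide


def wide_to_long(wide, n_times):
    """Stack sales.1 ... sales.<n_times> back into one row per store per time point."""
    return [
        (record["store"], t, record[f"sales.{t}"])
        for record in wide
        for t in range(1, n_times + 1)
    ]


wide_rows = long_to_wide(long_rows, n_times=4)
for row in wide_rows:
    print(row)
```

Each wide row holds one store. Converting back to long format restores four rows per store, one for each time point.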
To begin your analysis using the SPSS drop-down menus, click on: Analyze > General Linear Model > Repeated Measures... (1, below)
In the Repeated Measures Define Factors dialogue window, do the following:
- Replace the default Within-Subject Factor Name, which is factor1, with your own name for the concept of time. I've chosen to use the name Time (1, below).
- Type the number of times your DV was measured (how many DV variables you have) in the Number of Levels box (2, below) and click the "Add" button.
- Choose a name for your DV (the variable that is measured repeatedly), and type it in the Measure Name box. I chose the name Sales (3, below).
- Click the "Add" button again (4, below).
- Click the "Define" button (5, below).
In the Repeated Measures dialogue window that appears next, move the four sales variables (1, below) to the Within-Subjects Variables (Time) box (2, below).
NOTE: Be sure that they stay in the same order (sales.1, sales.2, sales.3, and sales.4).
Next, move both promo and mktsize to the Between-Subjects Factor(s): box (1, below).
NOTE: Both promo and mktsize were placed into the Between-Subjects Factor(s) box because they are categorical variables (discrete variables). Continuous variables (scale variables) would go into the Covariates box (2, below).
Next, click on the "Model" button (3, above).
In the Repeated Measures: Model dialogue window (1, below), you can specify your model. In other words, you can choose which variables have "main effects" on the DV (individual predictors), and which variables might interact with each other to predict the DV. The default option is the Full factorial (2, below), which will examine every variable's main effect, as well as every possible interaction among all variables.
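For this design, Full factorial means every combination of the between-subjects factors, with each of those terms also crossed with Time on the within-subjects side. The small Python sketch below is my own illustration, not something SPSS exposes. It enumerates those terms, and they match the effects that appear in the output tables later in this post.

```python
from itertools import combinations


def factorial_terms(factors):
    """All main effects and interactions among `factors`, from smallest to largest."""
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append("*".join(combo))
    return terms


between_factors = ["promo", "mktsize"]
within_factors = ["Time"]

between_terms = factorial_terms(between_factors)
within_only = factorial_terms(within_factors)
# Each within-subjects term appears alone and crossed with every between-subjects term.
within_terms = within_only + [w + "*" + b for w in within_only for b in between_terms]

print(between_terms)  # ['promo', 'mktsize', 'promo*mktsize']
print(within_terms)   # ['Time', 'Time*promo', 'Time*mktsize', 'Time*promo*mktsize']
```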
We'll stick with Full factorial for today. However, if you wanted to build your own model, you could choose Custom (1, below) and then use the Build Term(s) tool (2, below) to specify which effects/interactions you want. For this example, leave Full factorial selected (2, above) and click the "Continue" button to exit this dialogue window.
Next, click on the "Contrasts" button (1, below). In the Repeated Measures: Contrasts dialogue window that appears, you can change the type of contrast used for each factor. I recommend leaving Time with its default contrast, "Polynomial" (2, below), and changing both promo and mktsize to a "Simple" contrast with "First" as the reference category. To change each factor, select it, choose "Simple" from the Contrast list, click on "First", and then click on the "Change" button (3, below).
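With a Simple contrast and First as the reference category, each level of a factor is compared with the first level. For promo, that gives Strategy B vs. Strategy A and Strategy C vs. Strategy A. The Python sketch below shows the coefficient vectors these comparisons imply and how they turn group means into contrast estimates. The means are hypothetical values, not results from the sample data.

```python
def simple_contrasts(levels, reference_index=0):
    """Coefficient vectors comparing each non-reference level with the reference level."""
    contrasts = {}
    for i, level in enumerate(levels):
        if i == reference_index:
            continue
        coefs = [0] * len(levels)
        coefs[reference_index] = -1  # subtract the reference level's mean
        coefs[i] = 1                 # add the compared level's mean
        contrasts[f"{level} vs. {levels[reference_index]}"] = coefs
    return contrasts


def apply_contrast(coefs, means):
    """Weighted sum of group means, i.e. the contrast estimate."""
    return sum(c * m for c, m in zip(coefs, means))


promo_levels = ["Strategy A", "Strategy B", "Strategy C"]
# Hypothetical mean Sales ($1,000s) per strategy; not the SPSS sample results.
promo_means = [58.0, 47.0, 52.0]

for name, coefs in simple_contrasts(promo_levels).items():
    print(name, coefs, apply_contrast(coefs, promo_means))
# Strategy B vs. Strategy A [-1, 1, 0] -11.0
# Strategy C vs. Strategy A [-1, 0, 1] -6.0
```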
Next, click on the "Plots" button (1, below). In the Repeated Measures: Profile Plots dialogue window that appears (2, below), you can choose what graphs you'd like to see. In repeated measures models, I like to produce plots with Time on the Horizontal Axis (x-axis; 3, below) and my factor variables as Separate Lines (4, below).
NOTE: There is no place to specify the vertical axis (y-axis) because the DV (i.e., Sales) is automatically plotted on the y-axis in this dialogue window.
As you can see, in our example I've made a Time-by-Factor plot for each of the factors in our model (promo and mktsize).
If you'd like Post Hoc comparisons of the DV between each pair of factor levels, click on the "Post Hoc" button. Once in the dialogue window:
- Move the factors from the Factor(s) box (1, below) to the Post Hoc Tests box (2, below).
- Choose the type of Post Hoc test to use and place a check-mark in its box (you can choose more than one). The most commonly used is Tukey's, which I've chosen here (3, below).
- Click the "Continue" button.
I also recommend clicking on the "Save" button (1, below) and choosing Predicted Values: Unstandardized (2, below) and Residuals: Unstandardized (3, below) in the Repeated Measures: Save dialogue window.
NOTE: By checking these two boxes, your analysis will now produce two new variables in your dataset, called PRED_1 (Predicted) and RES_1 (Residual), which can be used to produce graphs after analysis (if you choose). We will not cover this in this tutorial.
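Because this model is a full factorial with only categorical between-subjects factors, the unstandardized predicted value for a store at a given time point should simply be the mean of that store's promo-by-mktsize cell. The residual is the observed value minus that mean. Below is a minimal Python sketch for one time point, using invented stores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical stores: (promo, mktsize, sales at one time point, in $1,000s). Invented values.
stores = [
    ("Strategy A", "Small", 60.0),
    ("Strategy A", "Small", 64.0),
    ("Strategy B", "Small", 50.0),
    ("Strategy B", "Small", 46.0),
]


def cell_mean_predictions(rows):
    """Return (predicted, residual) per row, where predicted is the promo-by-mktsize cell mean."""
    cells = defaultdict(list)
    for promo, mktsize, sales in rows:
        cells[(promo, mktsize)].append(sales)
    cell_means = {cell: mean(values) for cell, values in cells.items()}
    results = []
    for promo, mktsize, sales in rows:
        predicted = cell_means[(promo, mktsize)]
        results.append((predicted, sales - predicted))
    return results


for predicted, residual in cell_mean_predictions(stores):
    print(f"PRED={predicted:.2f}  RES={residual:.2f}")
```

Within each cell the residuals sum to zero. That makes a quick sanity check if you later graph the saved values.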
Back at the main Repeated Measures dialogue, you can either click "OK" (1, below) to execute the analysis, or click "Paste" (2, below) to paste the analysis commands into a syntax window. I recommend choosing the "Paste" option, as it will allow you to re-create the analysis more easily later.
Below is the syntax window, with the various commands of the analysis, which you specified while going through the dialogue windows. Over time, you may learn to use syntax exclusively, bypassing the need to use the dialogue windows. Learning syntax can dramatically improve your efficiency, especially when you need to create a lot of different types and/or iterations of analyses.
To execute the commands in the syntax, simply highlight all the text you want to run, and push the green play button (1, above). Alternatively, you can use the menus: Run > Selection.
There is a lot to digest in the output file that results from an analysis, so we'll stick to the basics. Below is the Descriptive Statistics table, which simply shows the Mean (1, below), Standard Deviation (Std. Deviation; 2, below), and sample size (N; 3, below) for each DV, broken down by all subgroups of your factors (promo and mktsize).
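The Descriptive Statistics table is essentially a grouped summary. The Python sketch below reproduces the idea for one DV using invented data. It uses the sample standard deviation (n - 1 denominator), which is what SPSS reports, and it omits the Total rows that SPSS adds.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical rows: (promo, mktsize, sales.1 in $1,000s). Invented for illustration.
rows = [
    ("Strategy A", "Small", 60.0),
    ("Strategy A", "Small", 62.0),
    ("Strategy A", "Small", 64.0),
    ("Strategy B", "Small", 48.0),
    ("Strategy B", "Small", 50.0),
]


def describe_by_cell(data):
    """Mean, sample SD (n - 1 denominator), and N for every promo-by-mktsize cell."""
    cells = defaultdict(list)
    for promo, mktsize, value in data:
        cells[(promo, mktsize)].append(value)
    summary = {}
    for cell, values in cells.items():
        sd = stdev(values) if len(values) > 1 else None  # SD is undefined for a single case
        summary[cell] = (mean(values), sd, len(values))
    return summary


for (promo, mktsize), (m, sd, n) in describe_by_cell(rows).items():
    sd_text = f"{sd:.2f}" if sd is not None else "n/a"
    print(f"{promo:<11} {mktsize:<7} Mean={m:.2f}  SD={sd_text}  N={n}")
```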
The next table we'll examine is Mauchly's Test of Sphericity. This test essentially determines whether the variances of the differences between each pair of repeated measures (of your DV) are approximately equal. This is a bit of an over-simplification, but it'll work here. For our purposes, we just need to know whether it is significant (1, below). If it is NOT significant (i.e., Sig. is greater than .05), then sphericity can be assumed. If it IS significant (i.e., Sig. is less than .05), then sphericity cannot be assumed (more on why we care in a moment).
NOTE: I know I said earlier that we wouldn't deal with assumptions today, but this is an exception, because it directly determines how we interpret the next table...
In our example, we CAN NOT assume sphericity (p=.003).
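To make the idea behind sphericity concrete, the Python sketch below computes the variance of the difference scores for every pair of time points, using invented sales for three stores. Under sphericity these six variances would be equal in the population. Mauchly's test is SPSS's formal check of that, and this sketch does not reproduce the test itself.

```python
from itertools import combinations
from statistics import variance

# Hypothetical wide-format sales (sales.1 ... sales.4) for three stores, in $1,000s.
sales = [
    [70.0, 71.0, 72.0, 73.0],
    [71.0, 71.0, 73.0, 73.0],
    [72.0, 74.0, 73.0, 75.0],
]


def pairwise_difference_variances(data):
    """Sample variance of (time i - time j) for every pair of time points."""
    n_times = len(data[0])
    result = {}
    for i, j in combinations(range(n_times), 2):
        differences = [row[i] - row[j] for row in data]
        result[(i + 1, j + 1)] = variance(differences)
    return result


for (i, j), var in pairwise_difference_variances(sales).items():
    print(f"Var(sales.{i} - sales.{j}) = {var:.3f}")
```

Here the variances range from about 0.333 to 2.333. Taken at face value, that would cast doubt on sphericity, although three stores are far too few to test it.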
The reason we note whether sphericity can be assumed is that it directly determines how we interpret our next table, the Tests of Within-Subjects Effects table (1, below). For each effect in our model, there are four estimates (2, below). If sphericity CAN be assumed, then we can reference the first estimate, aptly labeled Sphericity Assumed. If sphericity CAN NOT be assumed, then we'll want to reference one of the other three (Greenhouse-Geisser, Huynh-Feldt, or Lower-bound). The differences between them are somewhat esoteric, but I typically choose Greenhouse-Geisser. In either case, we reference the Sig. column (3, below) to determine whether our effects are significant.
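The Greenhouse-Geisser correction estimates an epsilon between 1/(k - 1) and 1, where k is the number of time points, and multiplies both degrees of freedom by it. The Python sketch below computes epsilon from a covariance matrix of the repeated measures using the standard double-centering formula. It assumes a single group of subjects. My understanding is that, with between-subjects factors as in our example, SPSS works from the pooled within-cell (error) covariance matrix instead, so feeding raw data into this sketch will not reproduce the table exactly. The five-store data are invented.

```python
def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of the columns of `data`."""
    n, k = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(k)]
    return [
        [sum((row[i] - means[i]) * (row[j] - means[j]) for row in data) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def greenhouse_geisser_epsilon(cov):
    """Greenhouse-Geisser epsilon from a k x k covariance matrix of the repeated measures."""
    k = len(cov)
    row_means = [sum(row) / k for row in cov]
    grand_mean = sum(row_means) / k
    # Double-centring is equivalent to projecting onto k - 1 orthonormal contrasts.
    centred = [
        [cov[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(k)]
        for i in range(k)
    ]
    trace = sum(centred[i][i] for i in range(k))
    sum_of_squares = sum(x * x for row in centred for x in row)
    return trace ** 2 / ((k - 1) * sum_of_squares)


def corrected_dfs(epsilon, df_effect, df_error):
    """Greenhouse-Geisser corrected degrees of freedom: both are scaled by epsilon."""
    return epsilon * df_effect, epsilon * df_error


# Compound symmetry (equal variances, equal covariances) satisfies sphericity, so epsilon = 1.
compound_symmetric = [[2.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(greenhouse_geisser_epsilon(compound_symmetric))  # 1.0
print("lower bound:", 1 / (4 - 1))

# Hypothetical single-group data: five stores by four time points (invented values).
hypothetical_sales = [
    [70.0, 71.0, 72.0, 73.0],
    [71.0, 71.0, 73.0, 73.0],
    [72.0, 74.0, 73.0, 75.0],
    [69.0, 70.0, 72.0, 71.0],
    [73.0, 72.0, 74.0, 76.0],
]
eps = greenhouse_geisser_epsilon(covariance_matrix(hypothetical_sales))
# Single group: Time df = k - 1 = 3, error df = (k - 1)(n - 1) = 12.
print(f"epsilon = {eps:.3f}", corrected_dfs(eps, 3, 12))
```

Working backwards from our output, the corrected Time df of 2.743 implies an epsilon of about 2.743 / 3 ≈ .914. The error df of 340.097 is roughly .914 × 3 × 124, where 124 is the error df in the between-subjects table discussed below.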
In our example, none of the within-subjects effects were significant. Since we could NOT assume sphericity, the Greenhouse-Geisser test tells us that Time was not a significant predictor of Sales (i.e., there was no overall upward or downward trend in Sales across the company as a whole), F(2.743, 340.097)=.743, p=.516, ηp2=.006.
We also see that neither promo, F(5.485, 340.097)=.660, p=.668, ηp2=.011, nor mktsize, F(5.485, 340.097)=1.048, p=.391, ηp2=.017, interacted with Time to predict trends in Sales. There was also no significant three-way interaction between Time, promo, and mktsize, F(10.971, 340.097)=.940, p=.502, ηp2=.029. Take note of how I report these statistics, as this is the format APA style calls for.
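Partial eta squared (ηp2) can be recovered from an F ratio and its two degrees of freedom: ηp2 = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies that formula to the Greenhouse-Geisser results reported above. It formats the values without a leading zero, as in this post, and reproduces the ηp2 values from the output.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its effect and error degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


def without_leading_zero(value, digits=3):
    """Format a statistic that cannot exceed 1 (p, eta squared) without a leading zero."""
    text = f"{value:.{digits}f}"
    return text[1:] if text.startswith("0") else text


# Greenhouse-Geisser F ratios and dfs reported in this post.
within_effects = {
    "Time": (0.743, 2.743, 340.097),
    "Time*promo": (0.660, 5.485, 340.097),
    "Time*mktsize": (1.048, 5.485, 340.097),
    "Time*promo*mktsize": (0.940, 10.971, 340.097),
}

for effect, (f_value, df1, df2) in within_effects.items():
    print(effect, without_leading_zero(partial_eta_squared(f_value, df1, df2)))
# Time .006
# Time*promo .011
# Time*mktsize .017
# Time*promo*mktsize .029
```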
The next table was produced because we chose the "Polynomial" contrast for Time earlier. It is very useful in case non-linear relationships exist in your data. More specifically, it tests whether the change over time is Linear or follows a non-linear pattern, such as Quadratic or Cubic (1, below). The more nuanced differences between these effects are beyond the scope of this blog, but Notre Dame's Dr. Richard Williams explains them well in his page on Non-linear Relationships.
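The Polynomial contrast splits the change across the four time points into orthogonal linear, quadratic, and cubic components. The Python sketch below builds orthonormal polynomial coefficients by Gram-Schmidt and applies them to a hypothetical, perfectly linear sales profile. My understanding is that SPSS uses coefficients of this form (up to rounding), but treat that as an assumption.

```python
import math


def orthonormal_polynomial_contrasts(k):
    """Linear, quadratic, ... (k - 1 in total) orthonormal contrasts for k equally spaced times."""
    times = range(1, k + 1)
    basis = [[1.0] * k]  # constant vector: projecting it out makes each contrast sum to zero
    contrasts = []
    for degree in range(1, k):
        v = [float(t ** degree) for t in times]
        # Gram-Schmidt: remove the parts of t**degree explained by lower-degree terms.
        for b in basis:
            projection = sum(x * y for x, y in zip(v, b)) / sum(y * y for y in b)
            v = [x - projection * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        basis.append(v)
        contrasts.append(v)
    return contrasts


contrasts = orthonormal_polynomial_contrasts(4)
names = ["Linear", "Quadratic", "Cubic"]

# Hypothetical mean Sales ($1,000s) that rise by the same amount at every time point.
profile = [50.0, 52.0, 54.0, 56.0]
scores = [sum(c * y for c, y in zip(coefs, profile)) for coefs in contrasts]

for name, coefs, score in zip(names, contrasts, scores):
    print(f"{name:<9} coefficients={[round(c, 3) for c in coefs]}  score={score:.3f}")
# Only the Linear score is meaningfully non-zero (about 4.472) for this straight-line profile.
```

The coefficients are proportional to the familiar (-3, -1, 1, 3), (1, -1, -1, 1), and (-1, 3, -3, 1) patterns, scaled to unit length.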
The Tests of Between-Subjects Effects table (below) shows whether the factors were associated with differences in Sales (overall, as opposed to whether there were differences in trends).
Results indicate that both promo, F(2, 124)=12.837, p<.001, ηp2=.172, and mktsize, F(2, 124)=15.085, p<.001, ηp2=.196, were predictive of differences in overall Sales, while the interaction between the two was not significant, F(4, 124)=.186, p=.945, ηp2=.006. These results may seem confusing because they appear to conflict with the non-significant within-subjects effects reported earlier. The distinction will become clearer when we examine the plots of the effects next.
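The Tests of Between-Subjects Effects are run on each store's average across the four time points (SPSS labels this "Transformed Variable: Average"). The Python sketch below illustrates that idea with a simplified one-way ANOVA on promo alone, using invented sales for six stores. The SPSS table also adjusts for mktsize and the promo-by-mktsize interaction (Type III sums of squares by default), so its F values will not match a one-way analysis like this one.

```python
# Hypothetical stores: promo group and sales.1 ... sales.4 in $1,000s (invented values).
stores = [
    ("Strategy A", [60.0, 62.0, 61.0, 63.0]),
    ("Strategy A", [64.0, 66.0, 65.0, 67.0]),
    ("Strategy B", [50.0, 52.0, 51.0, 53.0]),
    ("Strategy B", [54.0, 56.0, 55.0, 57.0]),
    ("Strategy C", [55.0, 57.0, 56.0, 58.0]),
    ("Strategy C", [59.0, 61.0, 60.0, 62.0]),
]


def one_way_f_on_averages(data):
    """Average each store over time, then compute a one-way ANOVA F across groups."""
    groups = {}
    for group, sales in data:
        groups.setdefault(group, []).append(sum(sales) / len(sales))
    all_values = [v for values in groups.values() for v in values]
    grand_mean = sum(all_values) / len(all_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2 for values in groups.values()
    )
    ss_within = sum(
        (v - sum(values) / len(values)) ** 2 for values in groups.values() for v in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


f_value, df1, df2 = one_way_f_on_averages(stores)
print(f"F({df1}, {df2}) = {f_value:.3f}")  # F(2, 3) = 6.250
```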
The plot below shows the mean Sales at each of the four data collections for stores using each of the three promotional strategies (three lines). The graph shows clear differences in sales between the three strategy groups: Strategy A was highest at every time point, and Strategy B was lowest at every time point. However, the trend for each group (if you were to impose a trendline across the four points for each group) is not dramatically different, and the Time-by-promo interaction was not significant. For both reasons, we can't clearly say that one promotional strategy is superior to the others.
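One informal way to "impose a trendline" is to fit an ordinary least-squares slope through each group's four mean Sales values. The Python sketch below does that for hypothetical group means, which are invented rather than read off the plot. This is a descriptive check only. The Time-by-promo interaction in the Tests of Within-Subjects Effects table is the actual test of whether trends differ.

```python
def ols_slope(values):
    """Least-squares slope of `values` against time points 1, 2, ..., len(values)."""
    times = range(1, len(values) + 1)
    t_mean = sum(times) / len(values)
    y_mean = sum(values) / len(values)
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    denominator = sum((t - t_mean) ** 2 for t in times)
    return numerator / denominator


# Hypothetical mean Sales ($1,000s) at the four time points for each strategy.
group_means = {
    "Strategy A": [58.0, 59.0, 60.0, 61.0],
    "Strategy B": [47.0, 48.0, 48.0, 49.0],
    "Strategy C": [52.0, 53.0, 53.0, 54.0],
}

for group, means in group_means.items():
    print(f"{group}: slope = {ols_slope(means):.2f} per time point")
```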
Since the differences between groups at time points 2, 3, and 4 largely reflect the differences that existed at baseline (time 1), they are more likely attributable to differences in the composition of the groups than to differences in promotional strategy. The graph for mktsize can be interpreted in the same way as the one for promo.
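The baseline argument can be checked directly by subtracting each group's time-1 mean from its later means, so that only change since baseline remains. The Python sketch below does this for hypothetical group means (invented values). If the baseline-adjusted profiles look similar across groups, the raw gaps between the lines mostly reflect where each group started.

```python
def change_from_baseline(profile):
    """Subtract the first (baseline) value from every time point."""
    baseline = profile[0]
    return [value - baseline for value in profile]


# Hypothetical mean Sales ($1,000s) at the four time points for each strategy.
group_means = {
    "Strategy A": [58.0, 59.0, 60.0, 61.0],
    "Strategy B": [47.0, 48.0, 48.0, 49.0],
    "Strategy C": [52.0, 53.0, 53.0, 54.0],
}

for group, means in group_means.items():
    print(group, change_from_baseline(means))
# Strategy A [0.0, 1.0, 2.0, 3.0]
# Strategy B [0.0, 1.0, 1.0, 2.0]
# Strategy C [0.0, 1.0, 1.0, 2.0]
```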
The promo graph further shows why it is better to examine within-subject differences when analyzing change over time, because plotting those effects makes the lack of differences in trend between promo groups clearer. Thanks for reading, and please leave comments and/or questions!