So things have been a bit busy lately... for the last four years. But, I’m back. I’m not going to make some grandiose promise that I probably can’t keep about starting to write weekly entries. Let’s start with trying to beat my last effort, post again in less than four years!

Anyway, I always found one of the toughest parts of blogging to be thinking of a topic. With that in mind, I have an idea: CROWDSOURCE!

If there is a topic you'd like to see explained in a future blog, please submit in the form below. As responses roll in, I'll show them in the bar graph below and we'll see which topics emerge as the favorites!

The core component of all four of these analyses (ANOVA, ANCOVA, MANOVA, and MANCOVA) is the first in the list, the ANOVA. An "Analysis of Variance" (ANOVA) tests three or more groups for mean differences based on a continuous (i.e. scale or interval) response variable (a.k.a. dependent variable). The term "factor" refers to the variable that distinguishes this group membership. Race, level of education, and treatment condition are examples of factors.

There are two main types of ANOVA: (1) a "one-way" ANOVA compares levels (i.e. groups) of a single factor based on a single continuous response variable (e.g. comparing test score by 'level of education') and (2) a "two-way" ANOVA compares levels of two or more factors for mean differences on a single continuous response variable (e.g. comparing test score by both 'level of education' and 'zodiac sign'). In practice, you will see one-way ANOVAs more often and when the term ANOVA is generically used, it often refers to a one-way ANOVA. Henceforth in this blog entry, I use the term ANOVA to refer to the one-way flavor.

One-way ANOVA has one continuous response variable (e.g. Test Score) compared by three or more levels of a factor variable (e.g. Level of Education).

Two-way ANOVA has one continuous response variable (e.g. Test Score) compared by more than one factor variable (e.g. Level of Education and Zodiac Sign).
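If it helps to see the arithmetic behind a one-way ANOVA, here is a minimal Python sketch that computes the F statistic by hand. The test scores are made up for illustration, and `one_way_anova_f` is a hypothetical helper name of my own (it is not an SPSS function). It only computes F and its degrees of freedom; it does not look up a p-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of factor levels
    n = len(all_scores)   # total number of cases
    # Between-groups sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical test scores for three levels of education (made-up numbers).
high_school = [60, 65, 70]
college = [70, 75, 80]
graduate = [80, 85, 90]

f_stat, df_b, df_w = one_way_anova_f([high_school, college, graduate])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The larger the differences between group means relative to the spread within groups, the larger F becomes.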

ALSO CHECK OUT: Wikieducator has a nice set of slides explaining the distinctions between one-way and two-way ANOVA

The obvious difference between ANOVA and ANCOVA is the letter "C", which stands for 'covariance'. Like ANOVA, "Analysis of Covariance" (ANCOVA) has a single continuous response variable. Unlike ANOVA, ANCOVA compares a response variable by both a factor and a continuous independent variable (e.g. comparing test score by both 'level of education' and 'number of hours spent studying'). The term for the continuous independent variable (IV) used in ANCOVA is "covariate".

ANCOVA compares a continuous response variable (e.g. Test Score) by levels of a factor variable (e.g. Level of Education), controlling for a continuous covariate (e.g. Number of Hours Spent Studying).

ANCOVA is also commonly used to describe analyses with a single response variable, continuous IVs, and no factors. Such an analysis is also known as a regression. In fact, you can get almost identical results in SPSS by conducting this analysis using either the "Analyze > Regression > Linear" dialog menus or the "Analyze > General Linear Model (GLM) > Univariate" dialog menus.

A key (but not the only) difference between these methods is that you get slightly different output tables. Also, regression requires that the user dummy-code factors, while GLM handles dummy coding through the "contrasts" option. The linear regression command in SPSS also allows for variable entry in hierarchical blocks (i.e. stages).
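To make "dummy coding" concrete, here is a small Python sketch of what you end up doing by hand before a regression: each level of the factor (except one reference level) becomes its own 0/1 column. The `dummy_code` helper and the education values are hypothetical, purely for illustration.

```python
def dummy_code(values, reference):
    """Turn a factor into 0/1 indicator columns, leaving out the reference level."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical factor with three levels; "high school" is the reference group.
education = ["high school", "college", "graduate", "college"]
dummies = dummy_code(education, reference="high school")

for level, column in dummies.items():
    print(level, column)
```

A case in the reference group scores 0 on every dummy column, which is why a factor with three levels needs only two dummy variables.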

The obvious difference between ANOVA and a "Multivariate Analysis of Variance" (MANOVA) is the “M”, which stands for multivariate. In basic terms, a MANOVA is an ANOVA with two or more continuous response variables. Like ANOVA, MANOVA has both a one-way flavor and a two-way flavor. The number of factor variables involved distinguishes a one-way MANOVA from a two-way MANOVA.

One-way MANOVA compares two or more continuous response variables (e.g. Test Score and Annual Income) by a single factor variable (e.g. Level of Education).

Two-way MANOVA compares two or more continuous response variables (e.g. Test Score and Annual Income) by two or more factor variables (e.g. Level of Education and Zodiac Sign).

When comparing two or more continuous response variables by a single factor, a one-way MANOVA is appropriate (e.g. comparing ‘test score’ and ‘annual income’ together by ‘level of education’). A two-way MANOVA also entails two or more continuous response variables, but compares them by at least two factors (e.g. comparing ‘test score’ and ‘annual income’ together by both ‘level of education’ and ‘zodiac sign’).

MANCOVA

Like ANOVA and ANCOVA, the main difference between MANOVA and MANCOVA is the “C,” which again stands for “covariance.” Both a MANOVA and MANCOVA feature two or more response variables, but the key difference between the two is the nature of the IVs. While a MANOVA can include only factors, an analysis evolves from MANOVA to MANCOVA when one or more covariates are added to the mix.

MANCOVA compares two or more continuous response variables (e.g. Test Scores and Annual Income) by levels of a factor variable (e.g. Level of Education), controlling for a covariate (e.g. Number of Hours Spent Studying).

SPSS NOTE: When running either a MANOVA or MANCOVA, SPSS produces tables that show whether response variables (on the whole) vary by levels of your factor(s). SPSS also produces a table that presents follow-up univariate analyses (i.e. one response variable at a time - ANOVA/ANCOVA). This table shows which response variables in particular vary by level of the factors tested. In most cases, we are only concerned with this table when we find significant differences in the initial multivariate (a.k.a. omnibus) test. In other words, we first determine if our set of response variables differs by levels of our factor(s) and then explore which are driving any significant differences we find.

Imagine we collected a score from every person in your town that measured how much they wanted ice cream at the particular moment of data collection (let's say scores could range from 1 to 100, with 100 meaning REALLY WANT ice cream). Further, let's pretend we did this once a day for 5 days. Our within-subject effect would be a measure of how much individuals in our sample tended to change on their wanting of ice cream over the five days.

Each colored line represents individuals' trend line for change over time in liking of ice cream (each person's within-subject effect). The black line represents the average of the individual trend lines in the sample (which is the sample's within-subject effect).
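One way to picture a within-subject effect in numbers is to fit a trend line to each person's five scores and then average the slopes. The Python sketch below does exactly that with three made-up townspeople. The names, scores, and the `slope` helper are hypothetical, and this is an illustration of the idea, not the repeated-measures model SPSS would actually fit.

```python
from statistics import mean


def slope(xs, ys):
    """Least-squares slope of ys on xs (one person's trend line)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


days = [1, 2, 3, 4, 5]

# Made-up "wanting ice cream" scores (1 to 100), one list per person.
scores = {
    "person_a": [50, 55, 60, 65, 70],   # wants more each day
    "person_b": [80, 80, 80, 80, 80],   # no change
    "person_c": [30, 28, 26, 24, 22],   # wants a little less each day
}

# Each person's own change over time (their within-subject effect).
person_slopes = {name: slope(days, ys) for name, ys in scores.items()}

# The sample's within-subject effect: the average of the individual trends.
average_slope = mean(list(person_slopes.values()))

print(person_slopes)
print(f"Average change per day: {average_slope:.2f}")
```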

Between-persons (or between-subjects) effects, by contrast, examine differences between individuals. This can be between groups of cases when the independent variable (IV) is categorical or between individuals when the IV is continuous. These types of effects can be observed in either the univariate context or the multivariate context (including repeated measures). Either way, between-subjects effects determine if respondents differ on the dependent variable (DV), depending on their group (males vs. females, young vs. old, etc.) or depending on their score on a particular continuous IV.

For example, let's return to our ice cream anecdote. If we want to test whether respondents are more likely to want ice cream if they score highly on an IQ test, we are testing for between-subjects effects. In this example, we are seeing if persons with different IQs also have correspondingly different scores for "wanting ice cream". Of course, the correct answer here is obviously yes.

Townspeople with a higher IQ tended to like ice cream more than those with a comparatively lower IQ, as the blue trend line shows. This is a between-subjects effect - it is comparing "ice cream liking" between people with various levels of intelligence (IQ).
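With a continuous IV like IQ, the simplest between-subjects summary is a correlation across people. Here is a minimal Python sketch using invented numbers (no real townspeople were measured); `pearson_r` is a hypothetical helper written out by hand so you can see each step.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


# Made-up data: one IQ score and one "wanting ice cream" score per person.
iq = [85, 95, 100, 110, 125]
wanting = [40, 50, 55, 70, 85]

r = pearson_r(iq, wanting)
print(f"r = {r:.3f}")
```

Notice that each person contributes a single pair of scores here. The comparison is between people, not within a person over time.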

**Editorial Note:** *Stats Make Me Cry is owned and operated by Dr. Jeremy J. Taylor. The site offers many free statistical resources (e.g. a blog, SPSS video tutorials, and R video tutorials), as well as fee-based statistical consulting and dissertation consulting services to individuals from a variety of disciplines all over the world.*

APA Format Table Example Before and After

Pictured (**above**) are examples of standard SPSS tables (**left**) and tables produced in SPSS after a few adjustments (**right**) to the settings. The table on the right more closely aligns with APA format than the table on the left in several ways:

- The title has been changed from center justified and **bold** to left justified, *italics*, and NOT bold (**[1] above-right;** APA format).

- The table borders have been adjusted appropriately (details of specific changes to follow shortly).

- The default font type and size has been changed to Times New Roman 12pt.

The adjustments needed to produce tables like the ones on the right only have to be made once. After that, SPSS applies them automatically, and all of your future tables are ready for insertion into your APA manuscript immediately after analysis. The necessary changes can be accomplished in 3 steps:

- Produce an initial table for alteration (using any analysis; a simple frequency table is sufficient).

- Create a custom "Table Look Style", by "Editing" the initial table's "Look Style" and saving the changes as a custom "style" ("APA Table" seems like a reasonable choice).

- Adjust your SPSS settings (options) so that SPSS recognizes your newly created "Look Style" as the default table "Look Style".

From there, you can simply run your analyses as you typically would and your tables should be formatted in APA format. Let's get into the specifics about how to accomplish these three steps...

**1) PRODUCE INITIAL TABLE**

The first step to make your SPSS adjustment is to produce an initial table for editing. For our purposes, a simple frequency table does the trick (in the SPSS drop-down menus, navigate to: Analyze > Descriptive Statistics > Frequencies). Once your table is produced (**below**), right click on the table and click on "Edit Content" and then either "In Viewer" or "In Separate Window" (it doesn't really matter which you choose, for our purposes).

SPSS Edit Content Menu

Once your table is in "editing" mode (**below**), right click again and click on "TableLooks..."

Next, the "TableLooks" screen (**below**) should pop-up.

Under "TableLooks Files:", change the selection to "CompactAcademicTimesRoman" (**[1] below**).

While simply making that switch gets us a lot closer to APA format than the "default" SPSS table, we can improve the settings to get us much closer with a few additional changes.

NOTE: "CompactAcademicTimesRoman" is the closest "TableLook" to APA on its own, but luckily we can alter its attributes and save the changes!

Once you've clicked on "CompactAcademicTimesRoman", click on the "Edit Look..." button (**[2] above**). After clicking on "Edit Look...", the "Table Properties" screen should pop-up (**below**).

Within the "Table Properties" screen, we are going to adjust elements of both the "Cell Formats" tab (**above**) and the "Borders" tab (**[1] below**).

First, the "Cell Formats" tab (**above**):

On the "Cell Formats" screen, you are able to adjust: the table's "Text" (font), the "Alignment" (justifications) of the text, the background color (which we will not be adjusting), and the "Inner Margins". We will only be changing the "Text" and "Alignment" settings. We'll deal with the "Text" first.

The default of all text in SPSS tables is 8 pt (**[4] above**), while the appropriate APA format font is 12 point, so the first thing we'll need to do is change all of the text in the table from 8 pt (**[4] above**) to 12 pt.

**Unfortunately, you are required to change each text element separately** by either clicking on the element in the "Sample" table on the right side of the screen (**[1] above**), or by selecting different elements in the "Area" drop-down menu (**[2] above**).

**For example**, click on the "**Table Title**" (**[3] above**) in the "Sample" table to edit that element. After clicking on the element, simply adjust the attributes on the left side of the screen.

**NOTE**: to comply with APA format for table titles, change your font size from 8 pt. to 12 pt. (**[4] above**), make it italics and not bold (**[5] above**), and click on "Left Alignment" (**[6] above**).

Next, switch to the "Borders" tab (**[1] below**).

Once in the "Borders" tab, there are three elements that we are going to adjust:

- Top inner frame (**[2] above**)

- Bottom inner frame

- Data area top

To adjust the "Top inner frame", highlight it in the Border menu section (**[1] below**). Next, click on the "Style" drop-down menu (**[2] below**) and change the style from the double line (not APA format) to the single thin line (**[3] below**; second from the bottom; complies with APA format).

SPSS TableLooks Screen Example Cell Table Properties: Top Inner Frame

Next, repeat the style adjustment for the "Bottom inner frame" (**[1] below**).

Again, repeat the style adjustment for the "Data area top" (**[1] below**).

Next, click the "Apply" button (**[2] above**), followed by the "OK" button (**[3] above**).

**2) CREATE CUSTOM TABLE LOOK STYLE**

After clicking the "OK" button, you should find yourself back at the "TableLooks" screen (**[1] below**). On this screen click on "Save As" (**[2] below**).

In the "Save As" dialogue screen (**below**), give your newly created table "Look" a name, preferably something self-explanatory and easy to remember. As you can see, I chose to call it "APA Table" (**[1] below**).

**Before clicking "Save"**, make sure you are saving the "TableLook" file in the correct directory:

**On a Mac**, the "Looks" directory can be found in the "SPSS" folder (or PASWStatistics folder; **[1] below**) within "Applications" (**[2] below**).

**On a PC**, the "Looks" directory can be found at **C:>Documents and Settings> Program Files>SPSS**

Once inside the "Looks" folder (**below**), you should see various other "TableLooks" files (the files end in ".stt"). If you see that, you know you are in the right folder. From here, check to make sure your "File Name" is what you want it to be and then click "Save" (**[1] below**).

After you've clicked "Save", you should find yourself back in the "TableLooks" dialogue screen (**below**). Also, you should now see a newly available "TableLook" in the "TableLook Files:" area (**[1] below**) (the one you saved above). Next, simply click on that to highlight it (**[1] below**) and then click the "OK" button (**[2] below**).

After clicking "OK", the "TableLooks" screen should disappear and the initial table you created should again be visible, but its format should now reflect the changes we've made and it should more closely resemble APA format (**below**)!

While you could certainly choose to do all of those steps for every table you produce from now until forever, that wouldn't be a very efficient use of your time. Instead, let's change the default SPSS settings to **automatically** use our newly created "TableLook" for all tables that are created in the future.

**3) ADJUST SPSS TABLE "LOOK STYLE" SETTINGS (OPTIONS)**

To adjust the SPSS "TableLook" settings, go to "Options" (**[1] below**), which you'll find under the "Edit" menu.

With the "Options" dialogue screen now visible, select the "Pivot Tables" tab (**[1] below**). Next, select our newly created "Table Look" (I called mine "APA table"; **[2] below**).

**On a side note**, I'd also suggest changing the "Copying wide tables to the clipboard in rich text format" option (**[3] below**) to "Shrink width to fit". Making this change will prevent SPSS from wrapping tables that are too wide for your page to another row (making them appear as two tables, even though they are really just two parts of the same table). I personally find that very irritating. Instead, this will tell SPSS to adjust the width of the cells in the table so that the table can fit within the margins of the page.

**Finally**, click on the "Apply" button (**[4] below**), followed by the "OK" button (**[5] below**). You should now be done and all future tables should be produced in APA format (or closer to it anyway). Happy table making!

RIGHT-CLICK HERE AND "SAVE AS FILE" TO DOWNLOAD THE STT "LOOKS" FILE

on 2011-06-08 20:45 by Jeremy Taylor

A few sharp readers have made a great point about this post: If you have a version of SPSS that is licensed by a University, the instructions may not work. Specifically, when you try to create a "new look", it will likely display an error message that says you don't have "access" to the directory (or something like that).

Thanks to one of our readers (Benjamin Telkamp), we have a solution! Benjamin discovered that you can save the "new look" as one of the existing looks in SPSS (just pick one that you don't think you'll be needing). Thanks for the tip, Benjamin!


*NOTE: This tutorial was created by Jared Knowles, doctoral candidate from the University of Wisconsin and owner of **Jared Knowles: From Data to Decisions**. Mr. Knowles is not affiliated with Stats Make Me Cry in any way and written consent was obtained before this tutorial was posted.*

Here is the handout associated with this presentation: CLICK HERE FOR HANDOUT

Here is the R Code associated with this presentation: CLICK HERE FOR R CODE

*This video was created by Dr. Roger Peng, professor at the Johns Hopkins Bloomberg School of Public Health and author at Simply Statistics. Dr. Peng is not affiliated with Stats Make Me Cry in any way and written consent was obtained before this video was posted.*

To demonstrate this task I'm using one of the sample datasets that comes with SPSS named "demo_cs.sav". To start let's assume that we've already found an interaction effect **(see figure below)**. In this case, we've run a model in which income and gender are predictive of the price of one's vehicle. The figure below also shows us that income and gender interact to predict price of one's car (p<.001), so we have an effect to explore/plot!

The significant interaction term indicates that there is a moderating effect to explore graphically!

As you may or may not know, the above analysis can be run using either the GLM menu dialog or the regression dialog in SPSS. A key difference between the two is that you'll need to manually create the interaction term using the regression method, whereas the GLM will allow you to specify the interaction in the "Model..." dialog **(see 1 in figure below)**.

Click on the "Model..." button to specify main effects and interactions in a Univariate General Linear Model (GLM). Click on "Plots" to produce effect plots, but this only works for categorical/binary predictors (Fixed Factors). How do you do this when a predictor is continuous? Read on...

In the GLM dialog (above) you might've also noticed that there is a "Plots" button that you can click (see 2 in figure above), which seems promising, except you may be disappointed to find that it is only helpful if both predictors are binary or categorical (Fixed Factors in Univariate GLM). If either of the predictors in the interaction you wish to explore graphically are continuous (Covariate in Univariate GLM), then that predictor won't be available to create a plot in the "Plots" dialog (see figure below).

Only the "Fixed Factor(s)" predictor is available in the "Univariate: Profile Plots" dialogue

To obtain the plot you are seeking when one of your predictors is continuous (Covariate in Univariate GLM), you simply need to save your predicted values during analysis and plot them using "Graphs > Legacy Dialogs > Scatter/Dot...".

Let's walk through our example. Whether you used the GLM - Univariate analysis or the Regression - Linear analysis the first step is the same: return to your analysis dialog and click on the "Save..." button (GLM - Univariate example on left below, Regression-Linear example on right below).

Click "Save..." and then click on the "Unstandardized" box in the "Predicted Values" options.


After re-running your analysis while saving the predicted values, you will find a new variable in your dataset named "PRE_1" (which stands for "Predicted Values"; see figure below).

NOTE: each time you re-run this analysis, a new version of this variable will be created with a new numeric suffix (PRE_2, PRE_3, etc.). You will use this new variable to plot your effects.

Next navigate to "Graphs > Legacy Dialogs > Scatter/Dot..." (see figure below).

Graphs > Legacy Dialogs > Scatter/Dot...

Once in the "Scatter/Dot..." dialog, move the newly-created predicted values variable (PRE_1) to the Y-Axis (predicted value for price of car in our example), your continuous predictor to the X-Axis (income in our example) and your categorical variable (gender in our example) to the "Set Markers By" field (see figure below). When done (set "Titles" and change "Options" as desired), click "OK".

Plot "predicted values" from regression or Univariate GLM to explore interaction effects.

You now have your plot, but you'll probably notice immediately that you are missing your trend/regression lines to compare your effects **(see figure left below)**! We need to make some slight modifications here. To add these lines: double click on the plot in the output viewer (or right click and choose "Edit Content > In Separate Window"). Once your new plot editor window appears **(circled in figure center below)**, click on the "Add Fit Line at Subgroups" button.

Upon clicking on the "Add Fit Line at Subgroups" button, you should now see your trend lines for each group (i.e. moderator group; our example has only 2 groups, but this would work the same if there were more than 2 groups).

Initial plot from "Scatter/Dot..." dialog. We need to add trend/fit lines to make it interpretable.

To add the trend/fit lines, click on the "Add Fit Line at Subgroups" button (circled in red above).

To delete the R-squared text, simply click on the text and hit delete on your keyboard

Next, I like to remove the text that appears indicating an R-squared statistic for each group's trend line **(see figure center above)**. I don't find this enormously helpful, so I typically delete it **(see figure right above)**. To delete the R-squared text, simply click on it to select it (it will be outlined in yellow when selected) and press the delete key on your keyboard. You are now done editing your plot. Close your "Chart editor" dialog and your new plot should now be visible in your output viewer **(see figure below)**.

We see above that the interaction effect in this example is not large, as the difference in slopes between males and females is not huge, but according to the initial analysis the difference in slope is statistically significant.
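If you want to see why the two fit lines have different slopes, it can help to write out the predicted-value equation by hand. The Python sketch below uses hypothetical coefficients (they are NOT the estimates from "demo_cs.sav"), and it assumes gender is coded 0/1, which may not match the coding in the sample file. The point is only that the interaction coefficient equals the difference between the two groups' slopes.

```python
def predicted_price(income, female, b0, b_income, b_female, b_interaction):
    """Predicted value from a model with an income-by-gender interaction."""
    return (b0
            + b_income * income
            + b_female * female
            + b_interaction * income * female)


# Hypothetical coefficients, chosen only for illustration.
coef = dict(b0=5.0, b_income=0.30, b_female=-1.0, b_interaction=0.02)

slopes = {}
for label, female in (("male", 0), ("female", 1)):
    low = predicted_price(20, female, **coef)
    high = predicted_price(120, female, **coef)
    # Slope of the fit line for this group: change in predicted price per unit of income.
    slopes[label] = (high - low) / (120 - 20)

slope_difference = slopes["female"] - slopes["male"]
print(slopes)
print(f"Difference in slopes: {slope_difference:.2f}")
```

A small interaction coefficient means nearly parallel lines, which is what we see in the plot above.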

If you wish to make the plot black and white, instead of color (differentiating groups by line type, per APA format), check out my blog on making SPSS automatically create APA formatted plots! Otherwise, thanks for reading and feel free to leave a comment/provide feedback below!

NOTE: observations are not typically as neatly placed along the fit lines as they are in our example above, but in this example from SPSS sample data the predictors explain over 90% of the variability in our dependent variable (very rare in real life analyses). Typically, you will see many observations straying away from the trend lines.

download pdf

This is a fantastic resource for learning to run confirmatory factor analysis (CFA) models and structural equation models (SEM) in R using the lavaan package. The tutorial provides example models, includes example code, discusses multi-group analysis, and even references some advanced functions for producing path diagrams in R. As the presentation is a few years old, there may be some places where the code is out-of-date, but updates for any obsolete code can be found in the lavaan manual. If you are using R, you are probably accustomed to these code adjustments.

This tutorial was posted with the author's permission, so thank you to Dr. Revelle. Find more great resources from Dr. Revelle at personality-project.org/r. To download a copy of R (for Mac or PC) visit the CRAN project or visit my R video tutorial page to learn how to install R (on either a Mac or PC).

For example, a survey measure of depression may include many questions that each measure various aspects of depression, such as:

Assuming the items are worded appropriately and asked of an appropriate sample, we would expect that each of these items would correlate with each of the other items, since they are all indicators of depression (see correlation matrix below).

Internal Consistency Correlation Matrix and Cronbach's Alpha Example High.png

To the extent that this is true, internal consistency would be high, giving us confidence that our measure of depression is reliable (see alpha above, explanation of Cronbach's Alpha to come).

However, if an item is poorly worded or does not belong in there at all, the internal consistency of the scale could be threatened. For example, if we replaced the question about Lethargy in our measure of depression with the new question below, our internal consistency is likely to be threatened.

- Loss of interest in activities (X1)
- Negative Mood (X2)
- Weight Loss/Weight Gain (X3)
- Sleep Problems (X4)
- **Number of letters in your last name (Y1)**

Internal consistency is likely to be threatened because "Number of letters in your last name" is unlikely to be highly correlated with any of the other four items (see low correlation coefficients circled in image below), because it is not really an indicator of depression. Thus, replacing the "Lethargy" question with the "Number of letters in your last name" question will lower the internal consistency of our Depression scale and, ultimately, lower the reliability of our measurement (see alpha below; explanation of Cronbach's Alpha to follow).

Internal Consistency Correlation Matrix and Cronbach's Alpha Example Low.png

Internal consistency is typically measured using Cronbach's Alpha (α). Cronbach's Alpha ranges from 0 to 1, with higher values indicating greater internal consistency (and ultimately reliability). Common guidelines for evaluating Cronbach's Alpha are:

- .00 to .69 = Poor
- .70 to .79 = Fair
- .80 to .89 = Good
- .90 to .99 = Excellent/Strong

…if you get a value of 1.0 then you have "complete agreement" (i.e. redundancy) in your items, so you likely need to eliminate some. Items that are in perfect agreement with each other do not each uniquely contribute to the measurement of the construct they are intended to measure, so they should not both be included in the scale. Occasionally, you may also see a negative Cronbach's Alpha value, but this is usually indicative of a coding error, having too few people in your sample (relative to the number of items in your scale), or REALLY poor internal consistency.
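For the curious, Cronbach's Alpha can be computed from the item variances and the variance of the total score: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. Here is a minimal Python sketch of that standard formula. The item scores are invented, and `cronbach_alpha` is a hypothetical helper name of my own, not an SPSS command.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's Alpha for a list of items.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    sum_item_variances = sum(variance(item) for item in items)
    # Total scale score for each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Made-up responses from five people to three depression items.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 2, 3, 5, 5]
x3 = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([x1, x2, x3])
print(f"Cronbach's Alpha = {alpha:.2f}")
```

When items rise and fall together, the variance of the total score is much larger than the sum of the item variances, and alpha moves toward 1.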

If Cronbach's Alpha (i.e. *internal consistency*) is poor for your scale, there are a couple ways to improve it:

- Eliminate items that are poorly correlated with other items in your scale (e.g. the "Number of letters in your last name" item in the previous example)
- Add highly reliable items to your scale (i.e. that correlate with existing items in your scale, but are not redundant with items already in your scale)
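The first suggestion above can be checked numerically by recomputing alpha with each item left out in turn, similar in spirit to an "alpha if item deleted" check. The Python sketch below uses invented scores, with `y1` standing in for the unrelated "last name" item. The helper name and the data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's Alpha from one list of scores per item (same respondent order)."""
    k = len(items)
    sum_item_variances = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Made-up responses from five people; y1 is unrelated to the other items.
scale = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 2, 3, 5, 5],
    "x3": [1, 3, 3, 4, 4],
    "y1": [5, 3, 6, 4, 5],
}

alpha_all = cronbach_alpha(list(scale.values()))

# Recompute alpha with each item removed, one at a time.
alpha_if_deleted = {
    name: cronbach_alpha([scores for other, scores in scale.items() if other != name])
    for name in scale
}

best_to_drop = max(alpha_if_deleted, key=alpha_if_deleted.get)
print(f"Alpha with all items: {alpha_all:.2f}")
print(f"Dropping {best_to_drop} raises alpha to {alpha_if_deleted[best_to_drop]:.2f}")
```

Dropping a good item lowers alpha, while dropping the misfit item raises it, which is the pattern to look for in your own scale.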

As always, I hope this is helpful and please let me know if you have questions in the comments! What stats terms do you find confusing?

*The following is not a Stats Make Me Cry original, but rather something I came across and found very useful. The article demonstrates how to examine non-linear effects (e.g. quadratic effects) using a regression model in R. If you are interested in the topic, please read the preview and follow the link that follows to the original site.*

In Part 3 we used the lm() command to perform least squares regressions. In Part 4 we will look at more advanced aspects of regression models and see what R has to offer. One way of checking for non-linearity in your data is to fit a polynomial model and check whether the polynomial model fits the data better than a linear model. Or you may wish to fit a quadratic or higher model because you have reason to believe that the relationship between the variables is inherently polynomial in nature...
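The linked article works in R, but the idea can be sketched in a few lines of plain Python as well: fit a straight line and a quadratic to the same data and compare how much variability each leaves unexplained. Everything below is an illustration of my own, not code from the original article. The data are made up, and `fit_polynomial` and `residual_ss` are hypothetical helpers that solve the normal equations directly (fine for a toy example, not how you would do it with real data).

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial coefficients [b0, b1, ...] via the normal equations."""
    n = degree + 1
    # Build (X'X) and (X'y), where column j of X is x**j.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(n):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [ar - factor * ac for ar, ac in zip(a[row], a[col])]
                b[row] -= factor * b[col]
    return [b[i] / a[i][i] for i in range(n)]


def residual_ss(xs, ys, coefs):
    """Sum of squared residuals for a fitted polynomial."""
    return sum(
        (y - sum(c * x ** p for p, c in enumerate(coefs))) ** 2
        for x, y in zip(xs, ys)
    )


# Made-up data that curve upward (roughly y = x**2 + 1 plus a little noise).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 4.9, 10.2, 16.8, 26.1, 37.0]

linear = fit_polynomial(xs, ys, degree=1)
quadratic = fit_polynomial(xs, ys, degree=2)

rss_linear = residual_ss(xs, ys, linear)
rss_quadratic = residual_ss(xs, ys, quadratic)
print(f"Residual SS, linear: {rss_linear:.2f}")
print(f"Residual SS, quadratic: {rss_quadratic:.2f}")
```

A much smaller residual sum of squares for the quadratic model is a hint of non-linearity, though a formal comparison would also account for the extra parameter.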

A scatterplot of these variables will often create a cone-like shape, as the scatter (or variability) of the dependent variable (DV) widens or narrows as the value of the independent variable (IV) increases. The opposite of heteroscedasticity is homoscedasticity, which indicates that a DV's variability is equal across values of an IV.

For example: annual income might be a heteroscedastic variable when predicted by age, because most teens aren't flying around in G6 jets that they bought from their own income. More commonly, teen workers earn close to the minimum wage, so there isn't a lot of variability during the teen years. However, as teens turn into 20-somethings, and 20-somethings into 30-somethings, some will tend to shoot-up the tax brackets, while others will increase more gradually (or perhaps not at all, unfortunately). Put simply, the gap between the "haves" and the "have-nots" is likely to widen with age.

If the above were true and I had a random sample of earners across all ages, a plot of the association between age and income would demonstrate heteroscedasticity, like this:

Plot No. 1 demonstrating heteroscedasticity (heteroskedasticity)

Plot No. 2 demonstrating heteroscedasticity (heteroskedasticity)

By the way, I have no real data behind this example; this is just a hypothetical situation, though it does seem logical.

Heteroscedasticity is most frequently discussed in terms of the assumption of parametric analyses (e.g. linear regression). More specifically, it is assumed that the error (a.k.a residual) of a regression model is homoscedastic across all values of the predicted value of the DV. Put more simply, a test of homoscedasticity of error terms determines whether a regression model's ability to predict a DV is consistent across all values of that DV. If a regression model is consistently accurate when it predicts low values of the DV, but highly inconsistent in accuracy when it predicts high values, then the results of that regression should not be trusted.
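As a rough, informal check of this idea, you can fit a regression, sort the residuals by predicted value, and compare how spread out they are at the low end versus the high end. The Python sketch below does that with invented age and income numbers. It is an eyeball-style comparison under my own assumptions, not a formal test, and `fit_line` is a hypothetical helper.

```python
from statistics import mean, variance


def fit_line(xs, ys):
    """Simple least-squares regression; returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - slope * x_bar, slope


# Made-up data: incomes (in thousands) are tightly bunched for the young
# and widely scattered for older earners.
ages = [16, 18, 20, 22, 24, 40, 44, 48, 52, 56]
incomes = [14, 15, 16, 15, 17, 30, 75, 40, 110, 50]

intercept, slope = fit_line(ages, incomes)
predicted = [intercept + slope * age for age in ages]
residuals = [y - p for y, p in zip(incomes, predicted)]

# Sort cases by predicted value, then compare residual spread in each half.
pairs = sorted(zip(predicted, residuals))
half = len(pairs) // 2
low_variance = variance([r for _, r in pairs[:half]])
high_variance = variance([r for _, r in pairs[half:]])
ratio = high_variance / low_variance

print(f"Residual variance (low predictions):  {low_variance:.1f}")
print(f"Residual variance (high predictions): {high_variance:.1f}")
print(f"Ratio: {ratio:.1f}")
```

A ratio far from 1 suggests the model's errors are not equally spread across predicted values, which is the pattern described above.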

I want to reiterate that the concern about heteroscedasticity, in the context of regression and other parametric analyses, is specifically related to error terms and NOT to the relationship between two individual variables (as in the example of income and age). This is a common misconception, similar to the misconception about normality (IVs or DVs need not be normally distributed, as long as the residuals of the regression model are normally distributed). Now that you know what heteroscedasticity means, try saying it five times fast!

I hope you found this helpful. What stats terms do you find confusing?

Unfortunately, if you use SPSS you've probably already discovered that it produces graphics in color by default. Not to worry, your graphs can be changed easily. Better yet, you can make simple adjustments to your SPSS settings that will force the program to create APA-compliant (i.e. black & white) graphics in all output! Here is how you do it:

First, I'll show you how to change an individual chart (this works for a newly created chart or a chart saved in output that you created previously). In the screenshot below, you can see I'm creating a chart that uses FAKE data about how many times each day people check Facebook. The line graph I am creating examines the number of Facebook checks per day by age group and by gender. Again, I want to be clear: **THIS IS NOT REAL DATA, IT IS 100% FAKE**.

To produce the initial graph I did the following:

- **In the upper left screenshot**, I am choosing a "Line" graph from the "Legacy Dialogs" menu.
- **In the upper right screenshot**, I am choosing the "Multiple" and "Summaries for groups of cases" options in the "Line Charts" dialogue box, and then clicking the "Define" button.
- **In the lower left screenshot**, I am choosing the variables I want to include in my graph (i.e. # of Facebook checks, gender, and age), and then clicking the "OK" button.
- **In the lower right screenshot**, you can see the resulting graph that is produced in SPSS, which is in color by default.

Once the graph is produced in the SPSS output, hover your mouse pointer over the graph, and you should see a message pop-up that says "Double-click to activate" **(below, left)**. Go ahead and double-click. At that point, you should see something similar to what you see in the **screenshot below and on the right**. Next, click on the "Show Properties Window" icon **(circled in the screenshot below and on the right)**.

Once the "Properties" dialogue window appears (top-left screenshot below), use the mouse pointer to click on one of the lines in the line graph (which will highlight both lines in the graph). When the lines are highlighted, you should also notice that more tabs are now available in the "Properties" dialogue window (top-right screenshot below), including the "Variables" tab.

Next, click on the "Variables" tab, bringing it to the foreground (bottom-left screenshot below), and then use the drop-down box to change the "Group Style" from "Color" to "Dash" (bottom-right screenshot below).

Next, click the "Apply" button and you should see the lines change from color to black solid lines and dashes **(below, left)**. On a side note: you can also change which dash/dot patterns are used, by clicking on one of the lines, then clicking on the "Lines" tab in the "Properties" dialogue window **(below, right)**, and changing the option in the drop-down menu.

The great news is that we can make SPSS do this process for us automatically, for every graphic! To accomplish this, navigate to the SPSS "Options" menu, which can be found in the "Edit" menu **(top-left screenshot below)**. Next, click on the "Charts" tab in the large dialogue window that appears **(top-right screenshot below)** and change the "Style cycle preference" from "Cycle through color only" to "Cycle through patterns only". Next, change the "Font" from "SansSerif" to "Times New Roman" **(bottom-left screenshot below)**.

Optionally, you can also adjust the order in which the dash/dot patterns are cycled through groups/categories by clicking on "Lines" in the "Style Cycles" section **(circled in the bottom-left screenshot below)** and adjusting the order as desired **(circled in the bottom-right screenshot below)**. Finally, hit "Apply" and "OK", to close the large dialogue window.

You should be all set! You may need to restart SPSS for the changes to the settings to take effect, but they should be permanent otherwise. You should now see an APA-compliant graph in your output **(screenshot below)**!

Structural equation modeling (SEM) is a complex beast, and can be quite intimidating to someone trying to learn the basics. Fortunately, there are some great resources out there for learning! Unfortunately, I think a lot of beginners don't know what those great resources are, or where to find them.

One example of a wonderful, but I fear under-used, resource is the set of SEM tutorial videos created by the AMOS Development Team. The AMOS Development Team tutorial video page features 17 different videos, covering a variety of topics, including (but not limited to): estimating indirect effects, fitting growth curve models, fitting models with categorical and ordinal variables, working with censored data, Bayesian estimation, and mixture modeling/latent class analysis.

If learning the mechanics of running an SEM model seems like putting the "cart before the horse" to you, because you are still trying to grasp SEM on a conceptual level, check out Principles and Practice of Structural Equation Modeling by Rex B. Kline, PhD. Dr. Kline's book is well written and easy to understand (relative to the topic).

Page for SEM tutorial videos created by the AMOS Development Team:

http://www.amosdevelopment.com/video/index.htm

Guilford Publishing's Principles and Practice of Structural Equation Modeling page:

http://www.guilford.com/pr/kline.htm

Amazon's Principles and Practice of Structural Equation Modeling page:

http://www.amazon.com/Principles-Practice-Structural-Equation-Methodology/dp/1606238760

*It is quite common in political science for researchers to run statistical models, find that a coefficient for a variable is not statistically significant, and then claim that the variable "has no effect." This is equivalent to proposing a research hypothesis, failing to reject the null, and then claiming that the null hypothesis is true (or discussing results as though the null hypothesis is true). This is a terrible idea. Even if you believe the null, you shouldn't use p > 0.05 as evidence for your claim. In this post, I illustrate why. To demonstrate why analysts should not conclude "no effect" from insignificant coefficients, I return to a debate waged over blogs and Twitter about a NYT article. See Seth Masket's original take, my response, and Seth's recasting. The data come from Nate Silver's post, which adopts a more nuanced position that I think is appropriate in light of the data.*
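A quick simulation can make this point tangible. The sketch below is my own illustration in Python and is not taken from the post or the data it discusses. It assumes a small but real effect (0.3 standard deviations), 20 cases per group, and a simple two-sided z-test with a known standard deviation of 1. All of those numbers are hypothetical.

```python
import random
from statistics import NormalDist, fmean

random.seed(1)

TRUE_EFFECT = 0.3   # hypothetical true difference in means (in SD units)
N_PER_GROUP = 20    # hypothetical sample size per group
N_STUDIES = 2000    # number of simulated studies

std_normal = NormalDist()
standard_error = (2 / N_PER_GROUP) ** 0.5  # SD is 1 in both groups

not_significant = 0
for _ in range(N_STUDIES):
    control = [random.gauss(0.0, 1.0) for _ in range(N_PER_GROUP)]
    treated = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(N_PER_GROUP)]
    z = (fmean(treated) - fmean(control)) / standard_error
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    if p_value > 0.05:
        not_significant += 1

miss_rate = not_significant / N_STUDIES
print(f"Share of studies with p > 0.05 despite a real effect: {miss_rate:.2f}")
```

Under these assumptions, most of the simulated studies fail to reach significance even though the effect is real by construction, which is why p > 0.05 on its own says little about whether an effect is absent.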

RIGHT-CLICK HERE AND "SAVE AS FILE" FOR SAMPLE DATA

**Let's get started:**

Repeated-Measures MANCOVA is used to examine how a dependent variable (DV) varies over time, using multiple measurements of that variable, with each measurement separated by a given period of time. In addition to determining whether the DV itself varies, a MANCOVA can also determine whether other variables are predictive of variability in the DV over time. If that wasn't crystal clear, don't worry, just keep reading.

**Repeated-Measures MANCOVA Example:**

In our example, your local stats store *Stats "R" Us* launched a marketing campaign, with three different strategies (**variable name:** *promo*; **value labels:** Strategy A, Strategy B, Strategy C). *Stats "R" Us* launched campaigns in markets of three different sizes (**variable name:** *mktsize*; **value labels:** Small, Medium, and Large), and measured the sales in each store every three months over the course of one year (4 time points; **variable names:** *sales.1*, *sales.2*, *sales.3*, and *sales.4*).

*NOTE: Sales are scaled in "thousands" (e.g. 70.63 is actually $70,630). Also, your data should be in person-level (a.k.a. "wide") format (as opposed to person-period, a.k.a. "long", format), meaning each row of data is a single case (store, in our example). If it were in person-period (long) format, each case (store) would have the number of rows equal to the number of repeated measures (four, in our example), because the repeated measures (sales.1, sales.2, sales.3, and sales.4) would be stacked to form a single variable (Sales). Here is a useful resource for converting data between the two forms: CLICK HERE FOR INFO ABOUT CONVERTING DATA FORMS.*
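If the difference between the two formats is hard to picture, here is a small sketch in plain Python (outside of SPSS) that stacks two made-up stores from wide format into long format. The `wide_to_long` helper, the `store` ID column, and every sales figure other than the 70.63 mentioned above are hypothetical and only there for illustration.

```python
# Hypothetical wide-format rows: one row per store, one column per time point.
wide_rows = [
    {"store": 1, "promo": "Strategy A",
     "sales.1": 70.63, "sales.2": 71.20, "sales.3": 69.85, "sales.4": 72.10},
    {"store": 2, "promo": "Strategy B",
     "sales.1": 55.40, "sales.2": 56.15, "sales.3": 54.90, "sales.4": 57.30},
]


def wide_to_long(rows, id_cols, prefix="sales."):
    """Stack the repeated measures into one 'sales' column with a 'time' index."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key.startswith(prefix):
                record = {col: row[col] for col in id_cols}
                record["time"] = int(key[len(prefix):])
                record["sales"] = value
                long_rows.append(record)
    return long_rows


long_rows = wide_to_long(wide_rows, id_cols=["store", "promo"])
for record in long_rows:
    print(record)
```

Each store contributes one row in the wide data and four rows (one per time point) in the long data, which is exactly the distinction described in the note above.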

To begin your analysis using the SPSS drop-down menus, click on: **Analyze > General Linear Model > Repeated Measures... (1, below)**

In the *Repeated Measures Define Factors* dialogue window, do the following:

- Replace the default *Within-Subject Factor Name*, which is *factor1*, with your own name for the concept of time. I've chosen to use the name *Time* (**1, below**).
- Type the number of times your DV was measured (how many DV variables you have) in the *Number of Levels* box (**2, below**) and click the "Add" button.
- Choose a name for your DV (the variable that is measured repeatedly), and type it in the *Measure Name* box. I chose the name *Sales* (**3, below**).
- Click the "Add" button again (**4, below**).
- Click the "Define" button (**5, below**).

In the *Repeated Measures* dialogue window that appears next, move the four sales variables (**1, below**) to the *Within-Subjects Variables (Time)* box (**2, below**).

*NOTE: Be sure that they stay in the same order (Sales.1, Sales.2, Sales.3, and Sales.4).*

Next, move both *promo* and *mktsize* to the *Between-Subjects Factor(s):* box (**1, below**).

*NOTE: Both promo and mktsize were placed into the Between-Subjects Factor(s) box because they are categorical variables (discrete variables). Continuous variables (scale variables) would go into the Covariates box (2, below).*

Next, click on the "Model" button (**3, above**).

In the *Repeated Measures: Model* dialogue window (**1, below**), you can specify your model. In other words, you can choose which variables have "main effects" on the DV (individual predictors), and which variables might interact with each other to predict the DV. The default option is the *Full factorial* (**2, below**), which will examine every variable's main effect, as well as every possible interaction among all variables.

We'll stick with *Full factorial* for today. However, if you wanted to build your own model, you could choose *Custom* (**1, below**), and then use the *Build Term(s)* tool (**2, below**) to specify what kind of effects/interactions you want. Again, in this example, we'll stick with *Full factorial* (**2, above**). To exit this dialogue window, click the "Continue" button.

Next, you'll need to click on the "Contrasts" button (**1, below**). In the *Repeated Measures: Contrasts* dialogue window that appears, you can change each factor variable's type of contrast. I recommend leaving the *Time* variable with its default contrast "Polynomial" (**2, below**), and changing both * promo* and

Next, click on the "Plots" button (**1, below**). In the *Repeated Measures: Profile Plots* dialogue window that appears (**2, below**), you can choose what graphs you'd like to see. In repeated measures models, I like to produce plots with *Time* on the *Horizontal Axis* (x-axis; **3, below**) and my factor variables as *Separate Lines* (**4, below**).

*NOTE: The reason you don't see anywhere to specify the vertical axis (y-axis), is that the DV (i.e. Sales) is assumed to be on the y-axis in this dialogue window. *

As you can see, in our example I've made a *Time-by-Factor* plot for each of the factors in our model (* promo* and

If you'd like to get *Post Hoc* comparisons of the DV (comparing between each of the factor levels, respectively), click on the "Post Hoc" button. Once in the dialogue window:

- Move the factors from the *Factor(s)* box (**1, below**) to the *Post Hoc Tests* box (**2, below**).
- Choose the type of Post Hoc test to use, and place a check-mark in its respective box (you can choose more than one). The most commonly used is *Tukey's*, which I've chosen below (**3, below**).
- Click the "Continue" button.

I also recommend clicking on the "Save" button (**1, below**), and choosing *Predicted Values: Unstandardized* (**2, below**) and *Residuals: Unstandardized* (**3, below**) in the *Repeated Measures: Save* dialogue window.

*NOTE: By checking these two boxes, your analysis will now produce two new variables in your dataset, called PRED_1 (Predicted) and RES_1 (Residual), which can be used to produce graphs after analysis (if you choose). We will not cover this in this tutorial.*

Back at the main *Repeated Measures* dialogue, you can either click "OK" (**1, below**), to execute the analysis, or you can click "Paste" (**2, below**), to paste the analysis commands into a syntax window. I recommend choosing the "Paste" option, as that will allow you to more easily re-create the analysis later.

Below is the syntax window, with the various commands of the analysis, which you specified while going through the dialogue windows. Over time, you may learn to use syntax exclusively, bypassing the need to use the dialogue windows. Learning syntax can dramatically improve your efficiency, especially when you need to create a lot of different types and/or iterations of analyses.

To execute the commands in the syntax, simply highlight all the text you want to run, and push the green play button (**1, above**). Alternatively, you can use the menus: **Run > Selection**.

**Interpreting Output/Results**

There is a lot to digest in the output file that results from an analysis, so we'll stick to the basics. Below is the *Descriptive Statistics* table, which simply shows the Mean (**1, below**), Standard Deviation (Std. Deviation; **2, below**), and sample size (N; **3, below**) for each DV, broken-down by all subgroups of your factors (*promo* and *mktsize*).

The next table we'll examine is *Mauchly's Test of Sphericity*. This test essentially determines whether the variances of the differences between each pair of repeated measures (of your DV) are approximately equal. This is a bit of an over-simplification, but it'll work here. For our purposes, we just need to be concerned with whether it is significant or not (**1, below**). If it is NOT significant (i.e. *Sig.* is greater than .05), then sphericity can be assumed (more on that soon). If it IS significant (i.e. *Sig.* is less than .05), then sphericity cannot be assumed (more about why we care, in a moment).
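To see what "the variance of the differences between each pair of repeated measures" refers to, here is a short Python sketch. It does not run Mauchly's test itself; it only computes the quantities the test compares. The five stores and their sales values are invented for illustration.

```python
from itertools import combinations
import statistics

# Hypothetical sales (in thousands) for five stores at four time points.
sales = {
    "sales.1": [70.6, 65.2, 80.1, 55.4, 62.3],
    "sales.2": [72.1, 66.0, 79.5, 57.0, 61.8],
    "sales.3": [69.9, 67.3, 82.0, 54.1, 64.0],
    "sales.4": [73.4, 64.8, 81.2, 58.2, 63.5],
}

# For every pair of time points, take each store's difference score,
# then compute the variance of those difference scores.
diff_variances = {}
for first, second in combinations(sales, 2):
    differences = [a - b for a, b in zip(sales[first], sales[second])]
    diff_variances[(first, second)] = statistics.variance(differences)

for pair, var in diff_variances.items():
    print(f"{pair[0]} - {pair[1]}: variance of differences = {var:.2f}")
```

With four time points there are six pairs. Sphericity is the assumption that these six variances are roughly equal in the population, and Mauchly's test is what tells you whether that assumption is tenable.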

*NOTE: I know I said earlier that we wouldn't deal with assumptions today, but this is an exception, because it directly determines how we interpret the next table...*

In our example, we CAN NOT assume sphericity (p=.003).

The reason we want to note whether sphericity can be assumed is that it directly determines how we interpret our next table, the *Tests of Within-Subjects Effects* table (**1, below**). For each effect in our model, there are four estimates present (**2, below**). If sphericity CAN be assumed, then we can reference the first estimate, aptly labeled *Sphericity Assumed*. If sphericity CAN NOT be assumed, then we'll want to reference one of the other three (the differences between them are somewhat esoteric, but I typically choose *Greenhouse-Geisser*). In either case, we reference the *Sig.* column (**3, below**) to determine whether our effects are significant.

In our example, we see that we had no significant effects. Since we could NOT assume sphericity, the *Greenhouse-Geisser* test tells us that *Time* was not a significant predictor of Sales (i.e. there was no overall positive or negative trend in *Sales* in the company as a whole), F(2.743, 340.097)=.743, p=.516, ηp2=.006.

We also see that neither *promo*, F(5.485, 340.097)=.660, p=.668, ηp2=.011, nor *mktsize*, F(5.485, 340.097)=1.048, p=.391, ηp2=.017, interacted with *Time* to predict trends in *Sales*. Additionally, there was no significant three-way interaction between *Time*, *promo*, and *mktsize*, F(10.971, 340.097)=.940, p=.502, ηp2=.029. Take note of how I report those statistics, as it is necessary for APA format.
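If the fractional degrees of freedom look odd, the sketch below shows where they come from: the Greenhouse-Geisser correction multiplies both the effect df and the error df by a factor called epsilon. This is a Python illustration with two assumptions worth flagging. The uncorrected error df is taken to be 3 × 124, using the between-subjects error df of 124 reported in the *Tests of Between-Subjects Effects* results, and epsilon is back-calculated from the reported 2.743 because it is not quoted directly in this post.

```python
# Uncorrected degrees of freedom for the Time effect in this example:
# 4 time points give 4 - 1 = 3 effect df, and the error df is assumed to be
# 3 * 124 (124 being the between-subjects error df reported in the post).
df_time = 4 - 1
df_error = df_time * 124

# Epsilon is NOT reported in the post; it is back-calculated here from the
# reported corrected df (2.743 / 3), so treat it as an approximation.
epsilon = 2.743 / df_time

# The correction scales both df values by the same epsilon.
corrected_df_time = epsilon * df_time
corrected_df_error = epsilon * df_error

print(f"epsilon is roughly {epsilon:.3f}")
print(f"corrected df: ({corrected_df_time:.3f}, {corrected_df_error:.1f})")
```

The corrected error df lands within rounding of the 340.097 reported above. The F value itself does not change under the correction; only the df used to look up its p-value do.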

The next table was produced because we chose the "Polynomial" contrast for *Time* earlier. It is very useful in case non-linear relationships exist in your data. More specifically, it determines whether a *Linear* relationship or a non-linear relationship, such as *Quadratic* or *Cubic*, exists (**1, below**). The more nuanced differences between these effects are beyond the scope of this blog, but Notre Dame's Dr. Richard Williams explains it well in his page on Non-linear Relationships.
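For a rough intuition of what the polynomial contrasts are doing, here is a small Python sketch. It applies the standard integer orthogonal polynomial coefficients for four equally spaced time points to a set of made-up mean sales values. Statistical software typically rescales these coefficients, so only the pattern matters here, and the means are hypothetical rather than taken from the sample data.

```python
# Standard orthogonal polynomial contrast coefficients for four
# equally spaced time points (unscaled integer form).
contrasts = {
    "Linear": [-3, -1, 1, 3],
    "Quadratic": [1, -1, -1, 1],
    "Cubic": [-1, 3, -3, 1],
}

# Hypothetical mean sales (in thousands) at the four time points.
# These rise by the same amount each period, i.e. a purely linear trend.
mean_sales = [60.0, 64.0, 68.0, 72.0]

# Each contrast score is the weighted sum of the means.
scores = {
    name: sum(c * m for c, m in zip(coefs, mean_sales))
    for name, coefs in contrasts.items()
}

for name, score in scores.items():
    print(f"{name}: {score:.1f}")
```

Because the made-up means form a straight line, only the *Linear* contrast is non-zero. A curve with one bend would load on *Quadratic*, and an S-shaped pattern would load on *Cubic*.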

The *Tests of Between-Subjects Effects* table (**below**) shows whether the factors were associated with differences in *Sales* (overall, as opposed to whether there were differences in trends).

Results indicate that both *promo* F(2, 124)=12.837, p<.001, ηp2=.172 and *mktsize* F(2, 124)=15.085, p<.001, ηp2=.196 were predictive of differences in Sales (overall), while the interaction between the two was not significant F(4, 124)=.186, p=.945, ηp2=.006. These results may seem a bit confusing, because they are in direct contrast to the within-subject effects reported earlier, but it will become more clear when we examine the plots of the effects next.
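As a side note, the partial eta squared (ηp2) values can be recovered from the F statistic and its degrees of freedom using the standard identity ηp2 = (F × df effect) / (F × df effect + df error). The short Python sketch below applies that identity to the three between-subjects results just reported. The function name is my own, and this is only a convenience check, not how SPSS computes the values internally.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df effect, df error) as reported for the between-subjects effects.
reported = {
    "promo": (12.837, 2, 124),
    "mktsize": (15.085, 2, 124),
    "promo x mktsize": (0.186, 4, 124),
}

for effect, (f_value, df_effect, df_error) in reported.items():
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{effect}: partial eta squared = {eta:.3f}")
```

This is handy when an article reports F and df but leaves out the effect size.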

The plot below shows the mean *Sales* at each of the four data collections, for stores using each of the three promotional *Strategies* (three lines). The graph demonstrates that there are distinctions between the sales numbers of the three strategy groups (*Strategy A* was highest at every time point and *Strategy B* was lowest at every time point). However, since the trend for each group (if you were to impose a trendline across the four points for each group) is not dramatically different (and because the interaction term was not significant), we can't clearly say that one promotional strategy is superior to the others.

Since the differences between groups at time points 2, 3, and 4 are largely reflective of the differences that existed at baseline (time 1), it seems that the differences between groups are more likely attributable to differences in the composition of the groups than to differences in the promotional strategy. The graph for *mktsize* can be interpreted in the same way as *promo*.

This graph further shows why it is better to examine within-subject differences when analyzing change over time, as plotting those effects makes the lack of differences in trend between *promo* groups more clear. Thanks for reading and please leave comments and/or questions!

This video tutorial demonstrates how to import data into R that is currently in SPSS format. The video also shows how to use a few basic commands on datasets, once they are imported into R. The steps in this video apply whether you are using a Mac or a PC/Windows machine.

This video shows how to obtain and install R on the Windows (PC) platform. It also shows a few basic functions in R, such as how to install packages in R and load them for use.

This video shows how to obtain and install R on the Mac OS X platform. It also shows a few basic functions in R, such as how to install packages in R and load them for use. A PC version is here: How to Install R for Windows

Unfortunately, reality inevitably sets in, in the form of a red-line-filled draft that features more recommendations than the surgeon general and more added work than you ever imagined would be forthcoming. The feeling can be both crushing and disheartening. The good news is: the pain can be avoided, or at least reduced to a minor setback.

The key to limiting the frustration inherent in this process is: **TURN YOUR DRAFT IN!** Relinquish your dreams of producing a perfect draft and turn in the draft you have. The sooner you get a draft on your advisor's desk, the sooner you can receive their valuable feedback, allowing you to spend more time integrating it and less time on details that they may (or may not) ultimately approve of.

Please don't misinterpret what I'm saying here: I'm not suggesting that you should turn in work that is blatantly of sub-standard quality. Certainly an advisor's time is valuable and not to be wasted. However, I am suggesting that you let go of the fantasy that your first draft will be perfect. When you feel that you've made an honest effort to move your manuscript forward, turn in a draft and allow your advisor to direct you to where things can be improved. The reality is that many details that are agonized over in early drafts are made irrelevant by the cuts and revisions your advisor recommends.

As a warning, the concepts I'm discussing here will feel unnatural to many readers. In fact, most doctoral candidates are high-achieving by nature, so striving for anything less than perfection may feel uncomfortable, or even downright wrong. However, you'll be happy you heeded this advice when you receive your advisor's feedback and know that you are moving closer to your degree and have saved yourself a lot of agonizing and wasted energy along the way.

**Understand that major feedback is unquestionably coming and try to get your hands on that feedback sooner, rather than later.** The alternative is spending days, weeks, or even months longer on a draft, only to receive a very similar amount of feedback, with a whole lot more frustration.