The core component of all four of these analyses (ANOVA, ANCOVA, MANOVA, and MANCOVA) is the first in the list, the ANOVA. An "Analysis of Variance" (ANOVA) tests three or more groups for mean differences based on a continuous (i.e. scale or interval) response variable (a.k.a. dependent variable). The term "factor" refers to the variable that distinguishes this group membership. Race, level of education, and treatment condition are examples of factors.

There are two main types of ANOVA: (1) a "one-way" ANOVA compares levels (i.e. groups) of a single factor based on a single continuous response variable (e.g. comparing test score by 'level of education') and (2) a "two-way" ANOVA compares levels of two or more factors for mean differences on a single continuous response variable (e.g. comparing test score by both 'level of education' and 'zodiac sign'). In practice, you will see one-way ANOVAs more often, and when the term ANOVA is used generically, it usually refers to a one-way ANOVA. Henceforth in this blog entry, I use the term ANOVA to refer to the one-way flavor.

One-way ANOVA has one continuous response variable (e.g. Test Score) compared by three or more levels of a factor variable (e.g. Level of Education).

Two-way ANOVA has one continuous response variable (e.g. Test Score) compared by more than one factor variable (e.g. Level of Education and Zodiac Sign).
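For readers who like to see the arithmetic behind the one-way test, here is a minimal sketch in Python (the data are made up for illustration). An ANOVA partitions the total variability into a between-group piece (how far group means sit from the grand mean) and a within-group piece (spread of scores around their own group mean); the F statistic is the ratio of the two:

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: between-group variance
    divided by within-group variance."""
    all_vals = [x for g in groups for x in g]
    grand_mean = sum(all_vals) / len(all_vals)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group means vs. the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: scores vs. their own group mean
    ss_within = sum(sum((x - m) ** 2 for x in g)
                    for g, m in zip(groups, group_means))

    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical test scores for three levels of education
f = one_way_anova_f([62, 65, 58], [72, 75, 70], [85, 88, 82])
```

A large F means the group means differ by more than the within-group noise would suggest; your statistical software converts F (with its two degrees of freedom) into the p-value.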

ALSO CHECK OUT: Wikieducator has a nice set of slides explaining the distinctions between one-way and two-way ANOVA.

The obvious difference between ANOVA and ANCOVA is the letter "C", which stands for 'covariance'. Like ANOVA, "Analysis of Covariance" (ANCOVA) has a single continuous response variable. Unlike ANOVA, ANCOVA compares a response variable by both a factor and a continuous independent variable (e.g. comparing test score by both 'level of education' and 'number of hours spent studying'). The term for the continuous independent variable (IV) used in ANCOVA is "covariate".

ANCOVA compares a continuous response variable (e.g. Test Score) by levels of a factor variable (e.g. Level of Education), controlling for a continuous covariate (e.g. Number of Hours Spent Studying).

ANCOVA is also commonly used to describe analyses with a single response variable, continuous IVs, and no factors. Such an analysis is also known as a regression. In fact, you can get almost identical results in SPSS by conducting this analysis using either the "Analyze > Regression > Linear" dialog menus or the "Analyze > General Linear Model (GLM) > Univariate" dialog menus.

A key (but not the only) difference between these methods is that you get slightly different output tables. Also, regression requires that the user dummy-code factors, while GLM handles dummy coding through the "contrasts" option. The linear regression command in SPSS also allows for variable entry in hierarchical blocks (i.e. stages).
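To make the regression formulation concrete, here is a minimal sketch in Python rather than SPSS syntax (the data are invented and constructed so the fit is exact): the factor is dummy-coded into indicator columns, the covariate is entered as-is, and the ANCOVA is simply the resulting least-squares model. The dummy coefficients are the group differences adjusted for the covariate.

```python
import numpy as np

# Hypothetical data: test score by education level (factor) and
# hours studied (covariate). An ANCOVA is this linear model.
hours = np.array([2, 4, 6, 3, 5, 7, 4, 6, 8], dtype=float)
edu = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])  # 0 = HS, 1 = BA, 2 = MA
score = np.array([60, 64, 68, 70, 74, 78, 80, 84, 88], dtype=float)

# Dummy-code the factor (HS is the reference category)
X = np.column_stack([
    np.ones_like(hours),       # intercept
    (edu == 1).astype(float),  # BA vs. HS
    (edu == 2).astype(float),  # MA vs. HS
    hours,                     # covariate
])
beta, *_ = np.linalg.lstsq(X, score, rcond=None)
# beta holds: intercept, the two covariate-adjusted group
# differences, and the slope for hours studied
```

This hand-coding is exactly what the GLM "contrasts" option automates for you in SPSS.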

The obvious difference between ANOVA and a "Multivariate Analysis of Variance" (MANOVA) is the “M”, which stands for multivariate. In basic terms, a MANOVA is an ANOVA with two or more continuous response variables. Like ANOVA, MANOVA has both a one-way flavor and a two-way flavor. The number of factor variables involved distinguishes a one-way MANOVA from a two-way MANOVA.

One-way MANOVA compares two or more continuous response variables (e.g. Test Score and Annual Income) by a single factor variable (e.g. Level of Education).

Two-way MANOVA compares two or more continuous response variables (e.g. Test Score and Annual Income) by two or more factor variables (e.g. Level of Education and Zodiac Sign).

When comparing two or more continuous response variables by a single factor, a one-way MANOVA is appropriate (e.g. comparing ‘test score’ and ‘annual income’ together by ‘level of education’). A two-way MANOVA also entails two or more continuous response variables, but compares them by at least two factors (e.g. comparing ‘test score’ and ‘annual income’ together by both ‘level of education’ and ‘zodiac sign’).

A more subtle way that MANOVA differs from ANOVA is that a MANOVA can also compare levels of a factor that has only two levels (i.e. a binary factor). When dealing with a single response variable and a binary factor (e.g. gender), one uses an independent-samples t-test. However, a t-test cannot estimate differences for more than one response variable together, so a MANOVA fills that need.
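To sketch that idea numerically: with a binary factor and two response variables, the multivariate test statistic is Hotelling's T², the direct generalization of the t-test (a one-way MANOVA with a two-level factor reports the equivalent result). Here is an illustrative computation in Python; the data are invented, and in practice your software computes this for you:

```python
import numpy as np

# Two hypothetical groups, each measured on two responses:
# (test score, annual income in $1000s)
g1 = np.array([[70, 40], [75, 45], [72, 43], [68, 41]], dtype=float)
g2 = np.array([[82, 55], [85, 58], [80, 52], [83, 56]], dtype=float)
n1, n2 = len(g1), len(g2)

# Difference between the two mean vectors (one mean per response)
diff = g1.mean(axis=0) - g2.mean(axis=0)

# Pooled covariance matrix across the two groups
s_pooled = ((n1 - 1) * np.cov(g1, rowvar=False)
            + (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)

# Hotelling's T^2: a t-test on both responses at once,
# accounting for the correlation between them
t2 = (n1 * n2) / (n1 + n2) * diff @ np.linalg.inv(s_pooled) @ diff
```

T² is then converted to an F statistic to obtain a p-value, which is what the multivariate (omnibus) table in SPSS reports.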

As with ANOVA and ANCOVA, the main difference between MANOVA and MANCOVA is the “C,” which again stands for “covariance.” Both a MANOVA and a MANCOVA feature two or more response variables, but the key difference between the two is the nature of the IVs. While a MANOVA can include only factors, an analysis evolves from MANOVA to MANCOVA when one or more covariates are added to the mix.

MANCOVA compares two or more continuous response variables (e.g. Test Scores and Annual Income) by levels of a factor variable (e.g. Level of Education), controlling for a covariate (e.g. Number of Hours Spent Studying).

SPSS NOTE: When running either a MANOVA or MANCOVA, SPSS produces tables that show whether response variables (on the whole) vary by levels of your factor(s). SPSS also produces a table that presents follow-up univariate analyses (i.e. one response variable at a time - ANOVA/ANCOVA). This table shows which response variables in particular vary by level of the factors tested. In most cases, we are only concerned with this table when we find significant differences in the initial multivariate (a.k.a. omnibus) test. In other words, we first determine if our set of response variables differ by levels of our factor(s) and then explore which are driving any significant differences we find.

Imagine we collected a score from every person in town that measured how much they wanted ice cream at the particular moment of data collection (let's say scores could range from 1 to 100, with 100 meaning REALLY WANT ice cream). Further, let's pretend we did this once a day for 5 days. Our within-subject effect would be a measure of how much individuals in our sample tended to change in their wanting of ice cream over the five days.

Each colored line represents an individual's trend line for change over time in liking of ice cream (that person's within-subject effect). The black line represents the average of the individual trend lines in the sample (which is the sample's within-subject effect).

Between-persons (or between-subjects) effects, by contrast, examine differences between individuals. This can be between groups of cases when the independent variable (IV) is categorical, or between individuals when the IV is continuous. These types of effects can be observed in either the univariate context or the multivariate context (including repeated measures). Either way, between-subjects effects determine if respondents differ on the dependent variable (DV) depending on their group (males vs. females, young vs. old, etc.) or depending on their score on a particular continuous IV.

For example, let's return to our ice cream anecdote. If we want to test whether respondents are more likely to want ice cream if they score highly on an IQ test, we are testing for a between-subjects effect. In this example, we are seeing whether persons with different IQs also have correspondingly different scores for "wanting ice cream". Of course, the correct answer here is obviously yes.

Townspeople with a higher IQ tended to like ice cream more than those with a comparatively lower IQ, as the blue trend line shows. This is a between-subjects effect - it is comparing "ice cream liking" between people with various levels of intelligence (IQ).

**Editorial Note:** *Stats Make Me Cry is owned and operated by Dr. Jeremy J. Taylor. The site offers many free statistical resources (e.g. a blog, SPSS video tutorials, and R video tutorials), as well as fee-based statistical consulting and dissertation consulting services to individuals from a variety of disciplines all over the world.*

APA Format Table Example Before and After

Pictured (**above**) are examples of standard SPSS tables (**left**) and tables produced in SPSS after a few adjustments (**right**) to the settings. The table on the right more closely aligns with APA format than the table on the left in several ways:

- The title has been changed from center-justified and **bold** to left-justified, *italics*, and not bold (**[1] above-right**; APA format).

- The table borders have been adjusted appropriately (details of specific changes to follow shortly).

- The default font type and size have been changed to Times New Roman 12 pt.

The adjustments to SPSS needed to produce tables like the ones on the right only have to be made once; after that, SPSS applies them automatically and you'll find all of your future tables ready for insertion into your APA manuscript immediately after analysis. The necessary changes can be accomplished in 3 steps:

- Produce an initial table for alteration (using any analysis; a simple frequency table is sufficient).

- Create a custom "Table Look Style", by "Editing" the initial table's "Look Style" and saving the changes as a custom "style" ("APA Table" seems like a reasonable choice).

- Adjust your SPSS settings (options) so that SPSS recognizes your newly created "Look Style" as the default table "Look Style".

From there, you can simply run your analyses as you typically would and your tables should be formatted in APA format. Let's get into the specifics about how to accomplish these three steps...

**1) PRODUCE INITIAL TABLE**

The first step in making your SPSS adjustments is to produce an initial table for editing. For our purposes, a simple frequency table does the trick (in the SPSS drop-down menus, navigate to Analyze > Descriptive Statistics > Frequencies). Once your table is produced (**below**), right click on the table, click on "Edit Content", and then choose either "In Viewer" or "In Separate Window" (for our purposes, it doesn't really matter which you choose).

SPSS Edit Content Menu

Once your table is in "editing" mode (**below**), right click again and click on "TableLooks..."

Next, the "TableLooks" screen (**below**) should pop-up.

Under "TableLooks Files:", change the selection "CompactAcademicTimesRoman" (**[1]****below**).

While simply making that switch gets us a lot closer to APA format than the "default" SPSS table, we can improve the settings to get us much closer with a few additional changes.

NOTE: "CompactAcademicTimesRoman" is the closest "TableLook" to APA on its own, but luckily we can alter its attributes and save the changes!

Once you've clicked on "CompactAcademicTimesRoman", click on the "Edit Look..." button (**[2] above**). After clicking on "Edit Look...", the "Table Properties" screen should pop up (**below**).

Within the "Table Properties" screen, we are going to adjust elements of both the "Cell Formats" tab (**above)** and the "Borders" tab (**[1] below**).

First, the "Cell Formats" tab (**above)**:

On the "Cell Formats" screen, you are able to adjust: the tables "Text" (font), the "Alignment" (justifications) of the text, the background color (which we will not be adjusting), and the "Inner Margins". We will only be changing the "Text" and "Alignment" settings. We'll deal with the "Text" first.

The default for all text in SPSS tables is 8 pt (**[4] above**), while the appropriate APA format font size is 12 pt, so the first thing we'll need to do is change all of the text in the table from 8 pt to 12 pt.

**Unfortunately, you are required to change each text element separately**, by either clicking on the element in the "Sample" table on the right side of the screen (**[1] above**) or by selecting different elements in the "Area" drop-down menu (**[2] above**).

**For example**, click on the "**Table Title**" (**[3] above**) in the "Sample" table to edit that element. After clicking on the element, simply adjust the attributes on the left side of the screen.

**NOTE**: to comply with APA format for table titles, change your font size from 8 pt. to 12 pt. (**[4] above**), make it italics and not bold (**[5] above**), and click on "Left Alignment" (**[6] above**).

Next, switch to the "Borders" tab (**[1] below**).

Once in the "Borders" tab, there are three elements that we are going to adjust:

- Top inner frame (**[2] above**)

- Bottom inner frame

- Data area top

To adjust the "Top inner frame", highlight it in the Border menu section (**[1] below**). Next, click on the "Style" drop-down menu (**[2] below**) and change the style from the double line (not APA format) to the single thin line (**[3] below**; second from the bottom; complies with APA format).

Next, repeat the style adjustment for the "Bottom inner frame" (**[1] below**).

Again, repeat the style adjustment for the "Data area top" (**[1] below**).

Next, click the "Apply" button (**[2] above**), followed by the "OK" button (**[3] above**).

**2) CREATE CUSTOM TABLE LOOK STYLE**

After clicking the "OK" button, you should find yourself back at the "TableLooks" screen (**[1] below**). On this screen click on "Save As" (**[2] below**).

In the "Save As" dialogue screen (**below), **give your newly create table "Look" a name, preferably something self-explanatory and easy to remember. As you can see, I choose to call it "APA Table"(**[1] below**).

**Before clicking "Save"**, make sure you are saving the "TableLook" file in the correct directory:

**On a Mac**, the "Looks" directory can be found in the "SPSS" folder (or PASWStatistics folder; **[1] below**) within "Applications" (**[2] below**).

**On a PC**, the "Looks" directory can be found at **C:>Documents and Settings> Program Files>SPSS**

Once inside the "Looks" folder (**below**), you should see various other "TableLooks" files (the files end in ".stt"). If you see that, you know you are in the right folder. From here, check to make sure your "File Name" is what you want it to be and then click "Save" (**[1] below**).

After you've clicked "Save", you should find yourself back in the "TableLooks" dialogue screen (**below**). Also, you should now see a newly available "TableLook" in the "TableLook Files:" area (**[1] below**) (the one you saved above). Next, simply click on that to highlight it (**1] below**) and then click the "OK" button (**[2] below**).

After clicking "OK", the "TableLooks" screen should disappear and the initial table you created should again be visible, but its format should now reflect the changes we've made and it should more closely resemble APA format (**below**)!

While you could certainly choose to do all of those steps for every table you produce from now until forever, that wouldn't seem to be a very efficient use of your time. Instead, let's change the default SPSS settings to **automatically** use our newly created "TableLook" for all tables that are created in the future.

**3) ADJUST SPSS TABLE "LOOK STYLE" SETTINGS (OPTIONS)**

To adjust the SPSS "TableLook" settings, go to "Options" (**[1] below**), which you'll find under the "Edit" menu.

With the "Options" dialogue screen now visible, select the "Pivot Tables" tab (**[1] below**). Next, select our newly created "Table Look" (I called mine "APA table"; **[2] below**).

**On a side note**, I'd also suggest changing the "Copying wide tables to the clipboard in rich text format" option (**[3] below**) to "Shrink width to fit". Making this change will prevent SPSS from wrapping tables that are too wide for your page to another row (making them appear as two tables, even though they are really just two parts of the same table). I personally find that very irritating. Instead, this will tell SPSS to adjust the width of the cells in the table so that the table can fit within the margins of the page.

**Finally**, click on the "Apply" button (**[4] below**), followed by the "OK" button (**[5] below**). You should now be done, and all future tables should be produced in APA format (or closer to it, anyway). Happy table making!

RIGHT-CLICK HERE AND "SAVE AS FILE" TO DOWNLOAD THE STT "LOOKS" FILE

on 2011-06-08 20:45 by Jeremy Taylor

A few sharp readers have made a great point about this post: If you have a version of SPSS that is licensed by a University, the instructions may not work. Specifically, when you try to create a "new look", it will likely display an error message that says you don't have "access" to the directory (or something like that).

Thanks to one of our readers (Benjamin Telkamp), we have a solution! Benjamin discovered that you can save the "new look" as one of the existing looks in SPSS (just pick one that you don't think you'll be needing). Thanks for the tip, Benjamin!


Reproducible research (RR) provides a road map through a study. In other words, conducting RR means engaging in a transparent analytic process. This prospect can be both exciting and terrifying.

Cartoon by Sidney Harris (The New Yorker)

Many of us harbor a burning fear that we will be "found out" as someone trying to "fake it until we make it." No matter how many goals we reach or degrees we earn, a hint of doubt remains. This doubt can keep us humble, vigilant, and hungry for professional growth. For some, a potent mix of doubt and fear is fuel for ambition. Unfortunately, this same combination can also make RR a scary concept.

The RR movement champions accuracy and clarity over "progress" and polish. Unfortunately, for too long much of the research world prioritized the latter over the former. According to a 2013 Economist article, replication studies show that as few as 11 to 25% of selected published biomedical findings held up when re-tested. Similar doubts were cast on psychological priming research. Though explanations for these findings may be many, only uncertainty and doubt are created when science is a black box.

We must learn from these revelations. Though intimidating, RR is a critical step toward restoring confidence in the research community. As colleagues, we must reward transparency with constructive feedback and support. As researchers, we must develop thick skin when feedback is critical. For RR to gain momentum, we must find a delicate balance between constructive feedback and mutual respect for the scientific process. That process is rooted in replication and falsification. Researchers must be free to champion transparency without fear of ridicule or embarrassment. After all, every scientific discovery stands on the shoulders of many disproven theories.

There are many excellent sources of information available. For those looking to educate themselves on this topic, I will post some useful resources below. Promoting RR can be as simple as making your analyses' syntax (i.e. code) publicly available (e.g. in a linked footnote). When possible, making your data available along with your code verifies both your process and results. There are many ways one can promote RR, but the most important step is to educate yourself. Make it a priority to learn what RR is, how it works, and how you can apply it to your own work. In addition to some valuable resources, I've attempted to explain a few key terms below (some used in this blog and some not). In the open-source spirit, please share any other RR terms you'd like discussed. As my work is iterative and undoubtedly flawed, please also let me know if I've gone astray with my explanations below!

**Reproducible Research** - research that is conducted with systematic procedures and documentation, such that it can be reproduced either by the original researcher or by an independent party, for the purposes of verifying the study's results and/or conclusions. According to CRAN, the goal of reproducible research "is to tie specific instructions to data analysis and experimental data so that scholarship can be recreated, better understood and verified."

**Minimum-Threshold Journals** - journals that seek to publish as much quality science as possible, regardless of "statistical significance." These journals' inclusion criteria focus on a study's methodological rigor, instead of whether results are statistically significant. Despite their inclusive intentions, up to 50% of submitted articles are rejected because of poor methodological rigor, so perhaps their threshold is more accurately characterized as "re-calibrated" rather than "minimum." PLoS ONE is a prominent example of a "minimum-threshold journal."

**Literate Programming** - a method of programming (often statistical analysis) that combines typical programming code, graphics, and natural language into a document that is more easily understood by individuals other than the programmer. The product of literate programming is often an html page that presents an analytic process through a combination of natural language narrative, programming code snippets, and table or graphical output of analysis results. Well produced literate programming allows a reader to completely reproduce an analysis as they read along, using embedded links to download data and code snippets to produce the results presented in the document.

CRAN Task View: Reproducible Research. *CRAN R-Project*, 2014. Web. 20 Jan. 2014. http://cran.r-project.org/web/views/ReproducibleResearch.html

Knuth, Donald E. "Literate Programming." *The Computer Journal* 27.2 (1984): 97-111.

Trouble at the lab. *The Economist*, 2013. Web. 19 Oct. 2013. http://www.economist.com/news/briefing/21588057-scientists-think-science-self-correcting-alarming-degree-it-not-trouble

Yihui Xie (2013). knitr: A general-purpose package for dynamic report generation in R. R package version 1.5. http://cran.r-project.org/web/packages/knitr/index.html

**Reproducible Research.net** - a repository of RR resources and links. See more at: http://reproducibleresearch.net/index.php/Main_Page

**The Comprehensive R Archive Network (CRAN) RR Page** - resources for RR with R, which is an open-source statistical programming software platform. See more at: http://cran.r-project.org/web/views/ReproducibleResearch.html

**Johns Hopkins Free Online RR Course (with video)** - Learn the concepts and tools behind reporting modern data analyses in a reproducible manner. This is the fifth course in the Johns Hopkins Data Science Specialization. See more at: https://www.coursera.org/course/repdata

**Johns Hopkins Free Online Courses (with video) on Data Science** - A set of 9 free online courses in data science. Topics include: 1) The Data Scientist’s Toolbox - Get yourself set up, 2) R programming - Learn to code, 3) Getting and Cleaning Data - You need data. Get some. 4) Exploratory Data Analysis - What’s that in my data? 5) Reproducible Research - Did you do what you think you did? 6) Statistical Inference - You don’t have infinite money. Try sampling. 7) Regression Models - The duct tape of data science. 8) Practical Machine Learning - Predict the future with data. Easy. 9) Developing Data Products - There better be an app for that data.

See more at: http://jhudatascience.org

**knitr: Elegant, flexible and fast dynamic report generation with R** - The knitr package was designed to be a transparent engine for dynamic report generation with R, solve some long-standing problems in Sweave, and combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more). See more at: http://yihui.name/knitr/

**The Sweave Homepage** - Sweave is a tool that allows you to embed the R code for complete data analyses in LaTeX documents. The purpose is to create dynamic reports, which can be updated automatically if the data or analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final LaTeX document, which allows for truly reproducible research. See more at: http://www.stat.uni-muenchen.de/~leisch/Sweave/

**Science Exchange** - is a marketplace for scientific collaboration, where researchers can order experiments from the world's best labs and/or have their own research replicated. See more at: https://www.scienceexchange.com

*NOTE: This tutorial was created by Jared Knowles, doctoral candidate from the University of Wisconsin and owner of **Jared Knowles: From Data to Decisions**. Mr. Knowles is not affiliated with Stats Make Me Cry in any way and written consent was obtained before this tutorial was posted.*

Here is the handout associated with this presentation: CLICK HERE FOR HANDOUT

Here is the R Code associated with this presentation: CLICK HERE FOR R CODE

*This video was created by Dr. Roger Peng, professor at the Johns Hopkins Bloomberg School of Public Health and author at Simply Statistics. Dr. Peng is not affiliated with Stats Make Me Cry in any way and written consent was obtained before this video was posted.*

To demonstrate this task I'm using one of the sample datasets that comes with SPSS named "demo_cs.sav". To start let's assume that we've already found an interaction effect **(see figure below)**. In this case, we've run a model in which income and gender are predictive of the price of one's vehicle. The figure below also shows us that income and gender interact to predict price of one's car (p<.001), so we have an effect to explore/plot!

The significant interaction term indicates that there is a moderating effect to explore graphically!

As you may or may not know, the above analysis can be run using either the GLM menu dialog or the regression dialog in SPSS. A key difference between the two is that you'll need to manually create the interaction term using the regression method, whereas the GLM will allow you to specify the interaction in the "Model..." dialog **(see 1 in figure below)**.

Click on the "Model..." button to specify main effects and interactions in a Univariate General Linear Model (GLM). Click on "Plots" to produce effect plots, but this only works for categorical/binary predictors (Fixed Factors). How do you do this when a predictor is continuous? Read on...

In the GLM dialog (above) you might've also noticed that there is a "Plots" button that you can click (see 2 in figure above), which seems promising, except you may be disappointed to find that it is only helpful if both predictors are binary or categorical (Fixed Factors in Univariate GLM). If either of the predictors in the interaction you wish to explore graphically are continuous (Covariate in Univariate GLM), then that predictor won't be available to create a plot in the "Plots" dialog (see figure below).

Only the "Fixed Factor(s)" predictor is available in the "Univariate: Profile Plots" dialogue

To obtain the plot you are seeking when one of your predictors is continuous (Covariate in Univariate GLM), you simply need to save your predicted values during analysis and plot them using "Graphs > Legacy Dialogs > Scatter/Dot...".
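Conceptually, the saved predicted values are just the model's fitted values, so plotting them against the continuous predictor with one line per group exposes the interaction as a difference in slopes. A toy numeric sketch in Python (all coefficients and variable names here are made up for illustration, not taken from the SPSS sample data):

```python
import numpy as np

# Hypothetical fitted model: price ~ income + gender + income x gender
b0, b_income, b_gender, b_interact = 5.0, 0.30, 2.0, 0.15

income = np.linspace(20, 100, 9)  # hypothetical income values

# Predicted values (what SPSS saves as PRE_1), one line per group
pred_men = b0 + b_income * income                               # gender = 0
pred_women = b0 + b_gender + (b_income + b_interact) * income   # gender = 1

# Plotting pred_men and pred_women against income reproduces the
# fit-lines-at-subgroups plot; the slopes differ by the interaction term
slope_men = (pred_men[-1] - pred_men[0]) / (income[-1] - income[0])
slope_women = (pred_women[-1] - pred_women[0]) / (income[-1] - income[0])
```

The gap between the two slopes is exactly the interaction coefficient, which is why a significant interaction shows up visually as non-parallel fit lines.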

Let's walk through our example. Whether you used the GLM - Univariate analysis or the Regression - Linear analysis, the first step is the same: return to your analysis dialog and click on the "Save..." button (GLM - Univariate example on left below, Regression - Linear example on right below).

Click "Save..." and then click on the "Unstandardized" box in the "Predicted Values" options.

Click "Save..." and then click on the "Unstandardized" box in the "Predicted Values" options.

After re-running your analysis while saving the predicted values, you will find a new variable in your dataset named "PRE_1" (which stands for "Predicted Values"; see figure below).

NOTE: each time you re-run this analysis, a new version of this variable will be created with a new numeric suffix (PRE_2, PRE_3, etc.). You will use this new variable to plot your effects.

Next navigate to "Graphs > Legacy Dialogs > Scatter/Dot..." (see figure below).

Graphs > Legacy Dialogs > Scatter/Dot...

Once in the "Scatter/Dot..." dialog, move the newly-created predicted values variable (PRE_1) to the Y-Axis (predicted value for price of car in our example), your continuous predictor to the X-Axis (income in our example) and your categorical variable (gender in our example) to the "Set Markers By" field (see figure below). When done (set "Titles" and change "Options" as desired), click "OK".

Plot "predicted values" from regression or Univariate GLM to explore interaction effects.

You now have your plot, but you'll probably notice immediately that you are missing the trend/regression lines needed to compare your effects **(see figure left below)**! We need to make some slight modifications here. To add these lines: double click on the plot in the output viewer (or right click and choose "Edit Content > In Separate Window"). Once your new plot editor window appears **(circled in figure center below)**, click on the "Add Fit Line at Subgroups" button.

Upon clicking "Add Fit Line at Subgroups", you should now see trend lines for each group (i.e. moderator group; our example has only 2 groups, but this would work the same if there were more than 2 groups).

Initial plot from the "Scatter/Dot..." dialog. We need to add trend/fit lines to make it interpretable.

To add the trend/fit lines, click on the "Add Fit Line at Subgroups" button (circled in red above).

To delete the R-squared text, simply click on the text and hit delete on your keyboard

Next, I like to remove the text indicating an R-squared statistic for each group's trend line **(see figure center above)**. I don't find it enormously helpful, so I typically delete it **(see figure right above)**. To delete the R-squared text, simply click on it to select it (it will be outlined in yellow when selected) and press the delete key on your keyboard **(see figure right above)**. You are now done editing your plot. Close your "Chart Editor" window and your new plot should now be visible in your output viewer **(see figure below)**.

We see above that the interaction effect in this example is not large, as the difference in slopes between males and females is not huge, but according to the initial analysis the difference in slope is statistically significant.

If you wish to make the plot black and white, instead of color (differentiating groups by line type, per APA format), check out my blog on making SPSS automatically create APA formatted plots! Otherwise, thanks for reading and feel free to leave a comment/provide feedback below!

NOTE: observations are not typically as neatly placed along the fit lines as they are in our example above, but in this example from SPSS sample data the predictors explain over 90% of the variability in our dependent variable (very rare in real life analyses). Typically, you will see many observations straying away from the trend lines.

Download PDF

This is a fantastic resource for learning to run confirmatory factor analysis (CFA) models and structural equation models (SEM) in R using the lavaan package. The tutorial provides example models, includes example code, discusses multi-group analysis, and even references some advanced functions for producing path diagrams in R. As the presentation is a few years old, there may be some places where the code is out-of-date, but updates for any obsolete code can be found in the lavaan manual. If you are using R, you are probably accustomed to these code adjustments.

This tutorial was posted with the author's permission, so thank you to Dr. Revelle. Find more great resources from Dr. Revelle at personality-project.org/r. To download a copy of R (for Mac or PC), visit the CRAN project or visit my R video tutorial page to learn how to install R (on either a Mac or PC).

A free Dropbox account only provides 2 gigabytes (GB) of storage, which can be used up quickly when collaborating for work. This can be a problem because Dropbox will no longer sync once you’ve exceeded your storage limit. There are three possible solutions to this problem:

- Pay for a “Pro” account, which provides you with 100GB of storage, instead of 2GB, for $8.25/month ($99 a year). This is my favorite option and I believe it is well worth it, but I am a long-time Dropbox user. Here is the link to upgrade: https://www.dropbox.com/upgrade
- Start referring friends to Dropbox, as you receive additional space for each person you refer who signs up (even for the free account). You get 1GB extra for each person, and the person who signs up gets an extra 500MB too! Here is the link to the referral page: https://www.dropbox.com/referrals
- Clear space in your Dropbox account by archiving files and folders that you are no longer actively working in.

The following is a step-by-step tutorial on how to prevent old files and folders (that you are no longer actively working with) from taking up your precious storage space by archiving them (solution 3 from above).

**Step 1:** Go to www.dropbox.com in your internet browser of choice (e.g. Chrome, Firefox, Internet Explorer, Safari, etc.). Once there, click "Sign in" (arrow 1 below) and then enter your username (email) and password for Dropbox (arrow 2 below).

**Step 2:** Once signed in, click "Sharing" on the left sidebar (arrow 3 below).

**Step 3:** Find an old folder that you are no longer actively working in (don't worry, we will have an archive in case you need the files later). Once you've found the folder (in our example it's called "An Example Shared Folder"), click on the "Options" link to the right of the folder's name and date last modified (arrow 4 below).

**Step 4:** After clicking "Options", you should see the dialog box below pop up. In that box, click the "Leave folder" button (arrow 5 below-left). When the confirmation pop-up appears (2nd image below), be sure the "I still want to keep my copy of these files" box is checked (arrow 6 below-right) and then click "Leave folder" (arrow 7 below-right).

**Step 5:** Close your browser. Next, go to your Dropbox folder on your computer, probably in your Home folder (on PC use Explorer, on a Mac use Finder). Once in your Dropbox folder, right-click on the folder you just "left" on dropbox.com and compress or zip the folder. On PC, click "Send to" -> "Compressed (zipped) folder" (arrow 8 below-left).

On Mac, click "Compress [insert name of your folder]" (arrow 9 below-right).

**Step 6:** A new file should now appear in your Dropbox with the same name as the folder you compressed, but it will have ".zip" at the end (the extension may not be visible on PC, but a little zipper icon should be visible on the file). See PC example below-left (arrow 10 below-top) and Mac example below-right (arrow 11 below-bottom).

**Step 7:** You may now move that newly created zipped archive anywhere on your computer outside of Dropbox. I recommend creating a folder in your Home folder (maybe called "My Documents" on PC), naming the folder something like "Dropbox Archives", and then moving the zipped archive files there.

After the zipped file is moved, the original folder can now be deleted from the Dropbox folder. If a copy of the zipped archive file remains in your Dropbox folder (i.e. you copied it to the "Dropbox Archives" folder instead of moving it), you may also delete the zipped folder from your Dropbox folder (but do not delete it from the "Dropbox Archives" folder that you created in your home folder).

TA-DA! That folder will no longer take up space on your Dropbox! I hope this helps!

For example, a survey measure of depression may include many questions that each measure a different aspect of depression, such as loss of interest in activities, negative mood, weight loss/weight gain, sleep problems, and lethargy.

Assuming the items are worded appropriately and asked of an appropriate sample, we would expect each of these items to correlate with each of the other items, since they are all indicators of depression (see correlation matrix below).

(Figure: correlation matrix with high inter-item correlations and a high Cronbach's Alpha)

To the extent that this is true, internal consistency would be high, giving us confidence that our measure of depression is reliable (see alpha above, explanation of Cronbach's Alpha to come).

However, if an item is poorly worded or does not belong in the scale at all, the internal consistency of the scale could be threatened. For example, if we replaced the question about lethargy in our measure of depression with the new question below (in bold), our internal consistency is likely to be threatened.

- Loss of interest in activities (X1)
- Negative Mood (X2)
- Weight Loss/Weight Gain (X3)
- Sleep Problems (X4)
- **Number of letters in your last name (Y1)**

Internal consistency is likely to be threatened because "Number of letters in your last name" is unlikely to be highly correlated with any of the other four items (see the low correlation coefficients circled in the image below), because it is not really an indicator of depression. Thus, replacing the lethargy question with the "Number of letters in your last name" question will lower the internal consistency of our depression scale and, ultimately, the reliability of our measurement (explanation of Cronbach's Alpha below).

(Figure: correlation matrix with low correlations for the Y1 item and a lower Cronbach's Alpha)

Internal consistency is typically measured using Cronbach's Alpha (α). Cronbach's Alpha ranges from 0 to 1, with higher values indicating greater internal consistency (and ultimately reliability). Common guidelines for evaluating Cronbach's Alpha are:

- .00 to .69 = Poor
- .70 to .79 = Fair
- .80 to .89 = Good
- .90 to .99 = Excellent/Strong

…if you get a value of 1.0, then you have "complete agreement" (i.e. redundancy) among your items, so you likely need to eliminate some. Items that are in perfect agreement with each other do not each uniquely contribute to the measurement of the construct they are intended to measure, so they should not both be included in the scale. Occasionally, you may also see a negative Cronbach's Alpha value, but this is usually indicative of a coding error, having too few people in your sample (relative to the number of items in your scale), or REALLY poor internal consistency.

If Cronbach's Alpha (i.e. *internal consistency*) is poor for your scale, there are a couple of ways to improve it:

- Eliminate items that are poorly correlated with the other items in your scale (e.g. the "Number of letters in your last name" item in the previous example)
- Add highly reliable items to your scale (i.e. items that correlate with existing items in your scale, but are not redundant with them)
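For the computationally inclined, Cronbach's Alpha can be calculated directly from its variance formula, α = k/(k−1) × (1 − Σ item variances / variance of the summed scale). Here is a minimal sketch in Python (the function name and the tiny example data are my own, for illustration only):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's Alpha for an (n_respondents, k_items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # sample variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Two items in perfect agreement -> alpha = 1 (i.e. complete redundancy)
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))  # 1.0
```

In practice you would pass in your full respondents-by-items score matrix; a redundant or poorly correlated item can be spotted by recomputing alpha with that item removed.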

As always, I hope this is helpful and please let me know if you have questions in the comments! What stats terms do you find confusing?

*The following is not a Stats Make Me Cry original, but rather something I came across and found very useful. The article demonstrates how to examine non-linear effects (e.g. quadratic effects) using a regression model in R. If you are interested in the topic, please read the preview and follow the link that follows to the original site.*

In Part 3 we used the lm() command to perform least squares regressions. In Part 4 we will look at more advanced aspects of regression models and see what R has to offer. One way of checking for non-linearity in your data is to fit a polynomial model and check whether it fits the data better than a linear model. Or you may wish to fit a quadratic or higher-order model because you have reason to believe that the relationship between the variables is inherently polynomial in nature...
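The linked article works in R with lm(); as a rough parallel, the same idea can be sketched in Python with simulated data (entirely made up here, not from the article): fit a linear and a quadratic model and compare how well each fits.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data with a genuinely quadratic relationship plus noise
x = np.linspace(-3, 3, 100)
y = 1 + 2 * x + 1.5 * x**2 + rng.normal(0, 1, x.size)

# Fit degree-1 (linear) and degree-2 (quadratic) polynomial models
lin_pred = np.polyval(np.polyfit(x, y, 1), x)
quad_pred = np.polyval(np.polyfit(x, y, 2), x)

# Compare residual sums of squares; a much smaller RSS for the
# quadratic fit is evidence of non-linearity
rss_lin = ((y - lin_pred) ** 2).sum()
rss_quad = ((y - quad_pred) ** 2).sum()
print(rss_quad < rss_lin)  # the quadratic model fits markedly better here
```

A formal comparison would use an F-test of the nested models (as the article does in R), but the drop in residual error already tells the basic story.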

A scatterplot of these variables will often create a cone-like shape, as the scatter (or variability) of the dependent variable (DV) widens or narrows as the value of the independent variable (IV) increases. The inverse of heteroscedasticity is homoscedasticity, which indicates that a DV's variability is equal across values of an IV.

For example, annual income might be a heteroscedastic variable when predicted by age, because most teens aren't flying around in G6 jets that they bought from their own income. More commonly, teen workers earn close to the minimum wage, so there isn't a lot of variability during the teen years. However, as teens turn into 20-somethings, and 20-somethings into 30-somethings, some will tend to shoot up the tax brackets, while others will increase more gradually (or perhaps not at all, unfortunately). Put simply, the gap between the "haves" and the "have-nots" is likely to widen with age.

If the above were true and I had a random sample of earners across all ages, a plot of the association between age and income would demonstrate heteroscedasticity, like this:

Plot No. 1 demonstrating heteroscedasticity (heteroskedasticity)

Plot No. 2 demonstrating heteroscedasticity (heteroskedasticity)

By the way, I have no real data behind this example; this is just a hypothetical situation, though it does seem logical.

Heteroscedasticity is most frequently discussed in terms of the assumptions of parametric analyses (e.g. linear regression). More specifically, it is assumed that the error (a.k.a. residual) of a regression model is homoscedastic across all values of the predicted value of the DV. Put more simply, a test of homoscedasticity of error terms determines whether a regression model's ability to predict the DV is consistent across all values of that DV. If a regression model is consistently accurate when it predicts low values of the DV, but highly inconsistent in accuracy when it predicts high values, then the results of that regression should not be trusted.

I want to reiterate that the concern about heteroscedasticity, in the context of regression and other parametric analyses, is specifically related to the error terms and NOT to the relationship between two individual variables (as in the example of income and age). This is a common misconception, similar to the misconception about normality (IVs and DVs need not be normally distributed, as long as the residuals of the regression model are normally distributed). Now that you know what heteroscedasticity means, try saying it five times fast!
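To make the residual-based idea concrete, here is a small simulation in Python (the data are entirely made up, mirroring the hypothetical age/income example): when the error spread grows with age, the residuals of a fitted regression line fan out too.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: income scatter widens with age (heteroscedastic errors)
age = rng.uniform(16, 65, 500)
income = 1000 * age + rng.normal(0, 200 * age)  # error SD grows with age

# Fit a simple linear regression and inspect the residuals
slope, intercept = np.polyfit(age, income, 1)
resid = income - (slope * age + intercept)

# Compare residual spread for the younger vs. older halves of the sample;
# a clearly larger spread among older earners is the cone shape in numbers
young_spread = resid[age < age.mean()].std()
old_spread = resid[age >= age.mean()].std()
print(young_spread < old_spread)
```

A plot of `resid` against the fitted values would show the same widening cone, which is exactly what formal tests of homoscedasticity look for in the error terms.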

I hope you found this helpful. What stats terms do you find confusing?

Unfortunately, if you use SPSS you've probably already discovered that it produces graphics in color by default. Not to worry: your graphs can be changed easily. Better yet, you can make simple adjustments to your SPSS settings that will force the program to create APA-compliant (i.e. black and white) graphics in all output! Here is how you do it:

First, I'll show you how to change an individual chart (this works for a newly created chart or a chart saved in output that you created previously). In the screenshot below, you can see I'm creating a graph that uses FAKE data about how many times each day people check Facebook. The line graph I am creating examines the number of Facebook checks per day by age group and by gender. Again, I want to be clear: **THIS IS NOT REAL DATA, IT IS 100% FAKE**.

To produce the initial graph I did the following:

**In the upper left screenshot**, I am choosing a "Line" graph from the "Legacy Dialogs" menu. **In the upper right screenshot**, I am choosing the "Multiple" and "Summaries for groups of cases" options in the "Line Charts" dialog box, and then clicking the "Define" button. **In the lower left screenshot**, I am choosing the variables I want to include in my graph (i.e. # of Facebook checks, gender, and age), and then clicking the "OK" button. **In the lower right screenshot**, you can see the resulting graph produced in SPSS, which is in color by default.

Once the graph is produced in the SPSS output, hover your mouse pointer over the graph, and you should see a message pop-up that says "Double-click to activate" **(below, left)**. Go ahead and double-click. At that point, you should see something similar to what you see in the **screenshot below and on the right**. Next, click on the "Show Properties Window" icon **(circled in the screenshot below and on the right)**.

Once the "Properties" dialogue window appears (top-left screenshot below), use the mouse pointer to click on one of the lines in the bar graph (which will highlight both lines in the graph). When the lines are highlighted, you should also notice that more tabs are now available in the "Properties" dialogue window (top-right screenshot below), including the "Variables" tab.

Next, click on the "Variables" tab, bringing it to the foreground (bottom-left screenshot below), and then use the drop-down box to change the Group Style" from "Color" to "Dash" (bottom-right screenshot below).

Next, click the "Apply" button and you should see the lines change from color to black solid lines and dashes **(below, left)**. On a side note: you can also change which dash/dot patterns are used, but clicking on one of the lines and then clicking on the "Lines" tab in the "Properties" dialogue window **(below, right)**, and changing the option in the drop-down menu.

The great news is that we can make SPSS do this process for us automatically, for every graphic! To accomplish this, navigate to the SPSS "Options" menu, which can be found in the "Edit" menu **(top-left screenshot below)**. Next, click on the "Charts" tab in the large dialogue window that appears **(top-right screenshot below)** and change the "Style cycle preference" from "Cycle through color only" to "Cycle through patterns only". Next, change the "Font" from "SansSerif" to "Times New Roman" **(bottom-left screenshot below)**.

Optionally, you can also adjust the order in which the dash/dot patterns are cycled through groups/categories by clicking on "Lines" in the "Style Cycles" section **(circled in the bottom-left screenshot below)**.