Exploratory Factor Analysis (EFA)
Introduction to Exploratory Factor Analysis (EFA)
- Things psychologists are interested in are often unobservable
How psychologists have held onto their jobs:
- Good at coming up with indirect ways of measuring things (questionnaires, experimental tasks, etc. - directly observable responses reflect unobservable psychological constructs)
- Operationalise theoretical constructs - run stat tests on observable measures, use them to discuss psychological constructs (e.g. differences in state anxiety scores described as "differences in anxiety")
- can only do this if measures accurately reflect psych constructs being discussed.
EFA is one way to check whether measures reflect the construct.
EFA uses
- Often collect data on far more variables than we care to talk about
- EFA - simplifies complex data sets - organises similar variables by assessing shared variance in responses - hypothetical constructs aren't directly measured in study - just mathematically examines which variables are related - up to researcher to determine what constructs are being represented, etc. - meaning of the constructs is based on content of variables and relevant theory
- Can help clarify what construct(s) variables are measuring - allows researcher to judge whether variables measure what we think they do
Example
Provide large battery of tests
- lexical decisions
- sentence comprehension
- spelling test
- logical sequences
- running sums
- speeded algebra
Could divide into linguistic and mathematical ability (first three linguistic, last three maths)
- 6 tests, 2 constructs - provides simpler, clearer picture
Summarising data with factors
- Factors directly summarise commonalities among different measures - there is a hypothesised 'latent variable' - a bunch of measures that may relate to it are run - interpretation of a factor (i.e. what it means) depends on the variables the factor is derived from
- Not all measures contribute equally to a factor - factor loading refers to the correlation between an observed measure and the latent (unobserved) factor - the higher the factor loading, the better the measure reflects the latent factor
- EFA can help to clarify questions about variables
Types of research questions
- EFA gives a principled way to divide variables into factors based on maths - e.g. hospital satisfaction questionnaire, EFA could be used to determine areas of satisfaction that items load onto
- EFA could be used to determine underlying structure of a construct - e.g. intelligence - EFA may find that intelligence is uni- or multi-dimensional - if all intelligence tests share a lot of variance, uni-dimensional - if some relate more strongly to each other than others, multi-dimensional
- Are constructs distinct? - e.g. are self-esteem and self-worth the same construct - run the measures and determine whether they load onto the same factor
Two Key Principles: Variables and Variance
- EFA is interested in relationships between different variables - are there patterns in the way our variables correlate with each other?
- Unique variance v Communality - uniqueness is the proportion of variance not shared with any other variables, communality is the proportion of variance shared with other variables (a quick worked example follows this list)
- EFA is interested in the pattern of commonalities and determines the factor structure from it (reflects sub-groups of variables that correlate highly with each other, but have low correlations with other variables)
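A quick worked example of the communality/uniqueness split, as a minimal Python sketch with an invented loading matrix (the numbers are illustrative only): once factor loadings are available, a variable's communality is the sum of its squared loadings and its uniqueness is whatever is left over.

```python
import numpy as np

# Invented loadings for 3 variables on 2 factors (illustration only)
loadings = np.array([
    [0.80, 0.10],   # variable 1
    [0.75, 0.05],   # variable 2
    [0.10, 0.70],   # variable 3
])

communality = (loadings ** 2).sum(axis=1)   # proportion of variance shared via the factors
uniqueness = 1 - communality                # proportion of variance not shared

print(communality)   # [0.65  0.565 0.5  ]
print(uniqueness)    # [0.35  0.435 0.5  ]
```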
From Shared Variance to Factors
- We assume patterns in observed variables are driven by unobserved mental processes - several ways to measure single process (e.g. various language ability tests) - this is reflected in shared variance among variables
- Goal of EFA is to arrive at a parsimonious factor structure - Factors are clearly distinguishable but not too numerous
Two Steps
- Extraction of Factors: capture as much shared variance as possible across all the extracted factors
- Rotation of Factors: simplify the extracted factor structure
Extraction of Factors
- Have a scatterplot of k variables in k-dimensional space
- Question: Can we summarise the data in fewer than k dimensions?
- Pass k eigenvectors through the data to capture shared variance (eigenvectors are a bit like regression lines through the data, but not literally multiple regression lines, because those would all point in the same direction) - all eigenvectors must be orthogonal to one another (at right angles)
- Eigenvectors look like axes - they establish the dimensional structure of the data in k-dimensional space - they represent different factors because each explains variance that no other factor explains
(each eigenvector is oriented to account for as much remaining variance as possible while staying orthogonal to the previous ones - only the last eigenvector's direction is fully determined by those constraints)
Extraction of Factors cont.
- Determine Factor loadings: how strongly does each variable correlate with each factor - big correlations suggest the eigenvector represents a real underlying factor, low correlations suggest the factor probably isn't real
(Eigenvalues: quantifies variance explained by each eigenvector - sum of squared factor loadings)
- Because each eigenvector can only explain variance left over by the earlier ones, earlier eigenvectors account for more variance than later ones and will have larger factor loadings - each eigenvector only explains variance that is not explained by other eigenvectors
- You add eigenvectors until all variance is explained or you reach the k-th eigenvector - it is possible to explain all the variance without using all k eigenvectors (a numerical sketch follows)
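A numerical sketch of extraction, assuming a small invented correlation matrix for six tests (three 'linguistic', three 'maths'): the eigenvectors of the correlation matrix are the orthogonal axes, the unrotated loadings are the eigenvectors scaled by the square root of their eigenvalues, and each eigenvalue equals the sum of squared loadings on that factor.

```python
import numpy as np

# Invented correlation matrix: variables 1-3 correlate highly, as do variables 4-6
R = np.array([
    [1.0, 0.6, 0.5, 0.2, 0.1, 0.1],
    [0.6, 1.0, 0.6, 0.1, 0.2, 0.1],
    [0.5, 0.6, 1.0, 0.1, 0.1, 0.2],
    [0.2, 0.1, 0.1, 1.0, 0.6, 0.5],
    [0.1, 0.2, 0.1, 0.6, 1.0, 0.6],
    [0.1, 0.1, 0.2, 0.5, 0.6, 1.0],
])

# Eigendecomposition: eigenvectors are the orthogonal axes passed through the data
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]                    # largest eigenvalue first
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Unrotated factor loadings: each eigenvector scaled by the square root of its eigenvalue
loadings = eigenvectors * np.sqrt(eigenvalues)

print(eigenvalues)                   # variance explained by each eigenvector, in descending order
print((loadings ** 2).sum(axis=0))   # sum of squared loadings per factor: same values
print(eigenvalues.sum())             # total variance = number of (standardised) variables = 6
```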
Example: What is the skill profile of a wizard?
- 7 variables: concentration, reasoning, animal training, strength, endurance, spell casting, spell learning
- Extraction finds 3 Factors: Factor 1 has substantial loadings from V1, V2, V3 (all other negligible); Factor 2 has V4, V5, V3; Factor 3 has V6, V7, V1 - some variables load on multiple factors
- Limitations of Extraction: Blunt instrument - number crunching exercise - each factor seeks to explain as much variance as possible without interest in which variables contribute to each factor (esp 'late' factors operating under more constraints)
- Therefore, move on to rotation
Rotation
- Simplify the factor structure - maximise high loadings, minimise low loadings
- How: eigenvectors are rotated to better capture subsets of variables with high loadings - do so in a way that makes logical sense but also in line with some mathematical constraints
- Changes pattern of shared variance accounted for by each factor
- Not every eigenvector needs to be rotated
- Constraints: - number of factors must remain the same - total variance explained by the retained factors must remain the same (rotation cannot make the solution explain more or less variance overall, it only redistributes it - increases in high loadings must be matched by decreases in low loadings - the sum of the eigenvalues stays constant)
Rotation cont.
- Rotation makes the biggest difference for variables that loaded onto multiple factors (not much for variables that loaded strongly on a single factor)
- Rotation 'cleans up' the factor structure - pushes variables loading on multiple factors onto one
- Rotation may mean orthogonality is compromised - that's fine
- Rotated factor structure is often more readily interpreted - the 'theme' that runs through individual factors is easier to discern
Important Caveat on Rotation
- Rotation doesn't guarantee a clear factor structure - can only happen if the data have a clear factor structure to begin with
- If there are no clear patterns of correlations among subgroups of variables, no distinct and interpretable factors will emerge
- Two possible reasons rotation may not help: - no distinct constructs underlie the variables - variables might not be assessed properly (measurement error obscures the 'true' amount of shared variance among measures)
Doing EFA
- Broad aim of EFA is to simplify a data set - reduces the number of measured variables to a smaller number of unobserved latent variables (i.e. hypothetical constructs)
Five steps
- Planning for the analysis
- Decide on the number of factors to retain
- Choose an extraction method
- Choose a rotation method
- Interpret the solution
Planning for the analysis: Data collection:
- which variables/items to assess? - depends on which constructs you want to differentiate
- how many variables/items are needed? - depends on who you ask and number of factors you think exist - some say 3 variables per factor, some say 5-6 variables per factor - more variables, more conservative, safer
- How many participants/cases needed? - depends on number of variables - determines power - different people give different answers (2x the n of variables, 5x the n of variables, no fewer than 50 ps, etc.)
Planning for the analysis: Checking assumptions:
- Are the data measured on an interval scale - can handle dichotomous but really want continuous measures with equal intervals between scale points
- Do scores on the measures vary enough - correlations need variance - must have adequate variability
- Do scores have linear correlations with each other - technical point: determines factorability of the data matrix - magnitudes ideally at least .3 - but not too high (problems of invariance and parsimony)
- Are scores (generally) normally distributed - no outliers - not skewed (a quick screening sketch follows this list)
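A rough screening sketch for these checks, assuming pandas and a data frame df of participants x items (the random data here is just a stand-in so the snippet runs):

```python
import numpy as np
import pandas as pd

# Stand-in data frame: participants x items (replace with your own data)
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 6)),
                  columns=[f"item{i}" for i in range(1, 7)])

# Adequate variability: items with near-zero SDs can't correlate with anything
print(df.std())

# Linear correlations: for factorability, look for magnitudes of at least ~.3
r = df.corr()
print((r.abs() >= 0.3).sum() - 1)   # per item, how many other items it correlates with at >= .3

# Roughly normal, not badly skewed
print(df.skew())
```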
Decide on number of factors to retain: Choice in Exploration
- Factors are extracted to account for all variance in the data
- researchers rarely retain all factors - some factors just pick up noise, some explain only very little variance, some may include only a single item
- Need to establish: how many factors you are looking for and/or how many substantive factors you have evidence for - often don't know bc EFA is an exploratory technique
- Various stopping rules are used to this end
Stopping rules: A priori Decision
- probably the approach least consistent with the exploratory spirit of EFA
- based on previous knowledge of constructs (i.e. the literature), you decide on the number of factors you want to extract - analysis is then constrained to extract only that many factors
- Pros: scientifically appropriate to work a priori - constrained analysis might reduce severity of interpretation problems
- Cons: if existing theory is underdeveloped, exploration is limited - can't address questions about number of factors underlying constructs
Stopping rules: Kaiser's Criterion
- in EFA variables are standardised so variance = 1
- if an eigenvalue >1, a factor explains more variance than a single measured variable - achieves some degree of data reduction
- if eigenvalue <1 - factor explains less variance than a single variable - does not achieve data reduction
- Kaiser's Criterion: retain every factor with eigenvalue >1
- Pro and con: permits 'true exploration' of the data - no a priori commitments to factor structure - con is that you just have to hope the data is good
Stopping rules: Scree Test
- Scree plot - shows eigenvalues for each extracted factor (typically steeply descending at first, then flat at the tail)
- Discontinuity principle: retain factors associated with the steeply descending part of the plot
- How: draw a straight line summarising the descending part of the plot - draw a straight line summarising the flat part of the plot - the lines intersect at the point of inflection - retain the number of factors to the left of the point of inflection (see the sketch below)
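A small sketch of both post-hoc stopping rules, assuming numpy and matplotlib and a participants x variables array X (random placeholder data here): Kaiser's criterion counts eigenvalues above 1, and the scree plot is just the eigenvalues plotted in order.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: participants x variables (replace with your own)
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 6))

# Eigenvalues of the correlation matrix, largest first
eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]

# Kaiser's criterion: retain every factor with eigenvalue > 1
print("Kaiser retains", int((eigenvalues > 1).sum()), "factor(s)")

# Scree plot: look for the point of inflection between the steep and flat parts
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1, linestyle="--")   # Kaiser cut-off shown for reference
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.show()
```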
Choosing a Stopping Rule
- If you have strong a priori ideas, specify n of factors
- If you really are exploring, use a post-hoc method: Kaiser or Scree (probs Kaiser - easier sell)
- different approaches might lead to different conclusions - best approach is to try multiple approaches and check for consistency and clarity of interpretation (can get away with that in EFA)
Choosing an Extraction Method
- Several methods: Principal Components Analysis; Principal Axis Factoring; Maximum Likelihood
- Methods differ in their assumptions about variance in the measured variables and in the specific computations involved
Choosing an Extraction Method: Principal Components Analysis
- Communalities are set to 1 for each measured variable (all measured variance is assumed to be shared variance [no error or unique variance]; because of this assumption it analyses total variance - communalities plus unique variance - whereas other methods deal only with communalities)
- Extracts components that are assumed to be uncorrelated (assumes correlations between factors to be zero)
- Pros: Finds the best mathematical solution; typically explains more variance than other methods
- Cons: Measurement assumption is inappropriate for psych; Factor loadings may be artificially high
[in psych, everything correlates with everything - painful but largely true - this method ignores that fact - bad assumptions]
Choosing an Extraction Method: Principal Axis Factoring
- Communalities are estimated from empirical correlation matrix (assumes not all variance is shared - therefore communalities will be <1 - recognises some variance is due to random factors, shouldn't be explained by model)
- Analyses only the variance that is shared between measured variables (leaves out error and variance specific to a variable)
- Assumes factors might be correlated (maybe not but maybe)
- Goal is to maximise the variance in the observed variables that is explained by the extracted factors (explain as much variance as possible, but only shared variance - PCA tries to explain all variance)
- Pros: Appropriate measurement assumptions for psych
Choosing an Extraction Method: Maximum Likelihood
- Similar to PAF - estimates communalities from empirical correlation matrix, analyses only shared variance and allows for the possibility of correlated factors
- Differs from PAF - goal is to maximise the likelihood of reproducing the observed correlations between variables (may result in different solution than just 'fitting' the variance/explaining as much as possible)
- Tends to give similar results to PAF, but the analysis is more rigorous and tends to be viewed as better
- Pros: measurement assumptions appropriate for psych; provides a goodness of fit test (rarely reported) comparing the observed correlation matrix v the one produced by the factor solution (gives an idea of how good the model is)
Choosing an Extraction Method: Overview
- A priori ideas v pure exploration: for a priori use PAF or ML over PCA (bc more flexible and more realistic assumptions)
- Measurement assumptions: if variables have no unique variance, can use PCA - but pretty much never happens in psych so use PAF or ML
- Assumed relationships between test constructs/factors: if there is a reason to believe constructs are independent, use PCA - but probably aren't so use PAF or ML
- Anything PCA can do PAF or ML can probably do better
- So PAF or ML? - usually consistent, use conventions in the subfield as a guide (a sketch of running both follows)
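One way to try common-factor and maximum likelihood extraction side by side in Python is the third-party factor_analyzer package (an assumption here - the notes describe SPSS; factor_analyzer's closest analogue to PAF is its default 'minres' method). The data below are simulated purely for illustration.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed third-party package

# Simulated scores: two correlated latent abilities, three noisy tests each
rng = np.random.default_rng(3)
n = 300
ling = rng.normal(size=n)
maths = 0.4 * ling + rng.normal(size=n)
X = np.column_stack(
    [ling + rng.normal(scale=0.8, size=n) for _ in range(3)]
    + [maths + rng.normal(scale=0.8, size=n) for _ in range(3)]
)

# Common-factor extraction (minres, similar in spirit to PAF), no rotation yet
fa = FactorAnalyzer(n_factors=2, method="minres", rotation=None)
fa.fit(X)
print(fa.loadings_)               # unrotated factor loadings
print(fa.get_communalities())     # estimated communalities (< 1, unlike PCA)

# Maximum likelihood extraction: usually a very similar solution
ml = FactorAnalyzer(n_factors=2, method="ml", rotation=None)
ml.fit(X)
print(ml.loadings_)
```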
Methods of Rotation
- 2 classes of methods - differ on assumptions on how factors may be correlated
- Orthogonal rotation: assumes factors are independent (i.e. uncorrelated); factor axes are perpendicular to one another after rotation
- Oblique rotation: assumes factors can be correlated; factor axes may not be perpendicular after rotation
Orthogonal Rotation Methods
- Independent factor axes - maintains the orthogonality of the original eigenvectors - angle of rotation the same for all axes
Methods
- Varimax rotation - minimises complexity of the factors - identifies clusters of variables that define any one factor (a minimal sketch of the algorithm follows this list)
- Quartimax rotation - minimises complexity of variables - identifies variables defined by only one factor (more common in sociology research where specific measures may be more important than underlying constructs)
- largely a difference in perspective - do we want factors that are easily interpretable or variables that are unambiguously described by the factors
- Equamax rotation - tries to minimise complexity of both factors and variables - results are unstable - don't use it
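A minimal sketch of what varimax does under the hood, using the standard iterative SVD-based algorithm and an invented unrotated loading matrix: the rotation matrix is orthogonal, so the axes stay perpendicular and the total variance explained is unchanged - only its pattern across factors changes.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loading matrix using the classic varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion, projected back to an orthogonal rotation via SVD
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation

# Invented unrotated loadings: a 'general' first factor plus a bipolar second factor
unrotated = np.array([
    [0.65,  0.45],
    [0.70,  0.40],
    [0.60,  0.35],
    [0.60, -0.45],
    [0.65, -0.50],
    [0.55, -0.40],
])
rotated = varimax(unrotated)
print(np.round(rotated, 2))                            # high loadings pushed higher, low ones lower
print((unrotated ** 2).sum(), (rotated ** 2).sum())    # total variance explained is unchanged
```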
Oblique rotation
- Correlated factor axes - different angles of rotation for different axes of the factor solution
- Performed using two kinds of loadings
- Pattern loadings: indexes unique relation between a factor and a variable partialling out effects of other factors (like a partial correlation)
- Structure loadings - Indexes relation between a factor and a variable without accounting for other factors (like a bivariate correlation)
- maths becomes more tedious with oblique rotation but the guiding principles are the same as orthogonal
Oblique Rotation Methods
- Oblimin Rotation - Minimises sum of cross products of pattern loadings to get variables to load on only a single factor - same goal as in orthogonal methods, just more complicated maths
- Promax Rotation - raises orthogonal loadings to a power to reduce small loadings, then rotates axes to accommodate this modified interim solution
Which rotation method should I use?
- Depends on theory and measurement assumptions
- are constructs of interest likely to be correlated or not? (correlated, use oblique - uncorrelated, use orthogonal)
- Varimax is the most common orthogonal method (in psych we care more about simple factors than simple variables)
- Oblimin is the most common oblique method, but the oblique methods generate similar solutions
- Pure exploration is acceptable so can run both orthogonal and oblique and pick the simpler solution (see the sketch below) - goal of EFA is to produce a simple, interpretable factor structure - no right or wrong really
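A sketch of running an orthogonal and an oblique rotation on the same data and comparing the solutions, again assuming the third-party factor_analyzer package and simulated data with two correlated constructs:

```python
import numpy as np
from factor_analyzer import FactorAnalyzer  # assumed third-party package

# Simulated scores: two correlated latent constructs, three items each
rng = np.random.default_rng(4)
n = 300
f1 = rng.normal(size=n)
f2 = 0.5 * f1 + rng.normal(size=n)
X = np.column_stack(
    [f1 + rng.normal(scale=0.8, size=n) for _ in range(3)]
    + [f2 + rng.normal(scale=0.8, size=n) for _ in range(3)]
)

# Same extraction, two different rotations: pick whichever gives the simpler structure
for rot in ("varimax", "oblimin"):
    fa = FactorAnalyzer(n_factors=2, method="ml", rotation=rot)
    fa.fit(X)
    print(rot, "\n", np.round(fa.loadings_, 2))
```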
Navigating Output
SPSS provides:
- Initial factor solution: communalities; variance explained by each factor/component
- Information about extracted factors/components: Factor loadings
- Information about the rotated factor solution: for orthogonal, factor loadings; for oblique, pattern and structure loadings
- If using oblique - also get correlations between factors
What do you do with output
- Identify patterns in the rotated factor solution - loadings are calculated for every variable on every factor - look for groups of variables that load strongly on one factor (>.7) and weakly on others (<.3) (a filtering sketch follows this list)
- Interpret the content of the groups of variables - consider what they have in common
- Identify the construct represented by each factor - based on content of variables and existing theory - these are very subjective decisions, so justify your interpretation!
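A filtering sketch for the first step, assuming pandas and an invented rotated loading matrix: keep variables that load strongly (above .7) on exactly one factor and weakly (below .3) on all the others.

```python
import pandas as pd

# Invented rotated loadings (rows = variables, columns = factors)
loadings = pd.DataFrame(
    {"Factor1": [0.82, 0.78, 0.75, 0.12, 0.05, 0.21],
     "Factor2": [0.10, 0.22, 0.08, 0.79, 0.84, 0.72]},
    index=["lexical", "comprehension", "spelling", "sequences", "sums", "algebra"],
)

strong = loadings.abs() > 0.7
weak = loadings.abs() < 0.3

# Variables that load strongly on exactly one factor and weakly on every other factor
clean = strong.sum(axis=1).eq(1) & (strong | weak).all(axis=1)
print(loadings[clean])
```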
Interpreting loadings: positive and negative
- usually positive but can be negative
- don't need to be concerned with signs of factor loadings of initial (unrotated) factor solution (signs of factors can reflect artifacts of EFA algorithm - unrotated solutions are often not clearly interpretable)
- Do consider signs of the rotated solution - if all signs are consistent it is easier to interpret - if loadings have different signs, interpretation is more difficult
Differing signs on loadings example
- 3 variables load onto a single factor: Funny (0.87); Sad (-0.79); Happy (0.90)
- Factor has something in common with Funny and Happy and shares something in common with the opposite of sad - perhaps factor is positive affect
Troubleshooting
- A variable loads strongly onto multiple factors - antithetical to the data reduction goal of EFA - could reflect a higher order or more complex factor - could just remove variable to increase interpretability of general solution, but need to consider the importance of the variable (and potential theoretical relevance of a higher order factor)
- A variable doesn't load strongly on any factor - variable doesn't have much to do with any construct - drop it and re-run the analysis (typically unproblematic to remove)
- Factor is uninterpretable given its constituent variables - trickier to deal with - uninterpretable factors are not very useful to anyone - can drop the variables making up the factor and re-run analysis
- Heywood cases
Heywood cases
- when the communality for a variable is estimated to be >1 or an eigenvalue for a factor is estimated to be negative - the impossible has happened
- Named after Heywood (1931) - sometimes numerical values that solve the optimisation problem posed by EFA are logically impossible values
- Often happens when researchers haven't put the correct constraints on values that can be identified by the optimisation algorithm
- Why does it happen? - it's just running the maths; if you don't rule out impossible values it will try the maths for those too - often arises when there are too few data points given the n of factors extracted, or variables are highly correlated
- Communalities >1 don't make sense - more than all the variance cannot be shared
- Neg eigenvalues don't make sense - can't explain less than none of the variance (a quick check is sketched below)
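A quick check for Heywood cases in an obtained solution, as a sketch with invented values: flag any communality above 1 or any negative eigenvalue.

```python
import numpy as np

# Invented output from a fitted solution: the second communality and the last eigenvalue are impossible
communalities = np.array([0.62, 1.08, 0.55, 0.47])
eigenvalues = np.array([2.40, 1.10, 0.55, -0.05])

if (communalities > 1).any() or (eigenvalues < 0).any():
    print("Heywood case: consider dropping highly correlated variables, "
          "collecting more data, or switching from ML to PAF")
```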
Dealing with Heywood cases
- Drop highly correlated variables (highly correlated variables can make it difficult to identify factor structure)
- Collect more data (can clarify factor structure)
- Maximum Likelihood (ML) methods are more vulnerable - switch to PAF
Key points to remember
- EFA can only analyse variables submitted to the analysis - factors are discoverable only if relevant variables are entered - factor structure can change depending on the addition or removal of variables
- Default settings may not provide enough iterations for EFA to converge on a good solution - SPSS defaults to 25 - want more like 250+ in practice (iterative process - diminishing returns - narrows in on better and better solutions)
- Take notes as you go - may do multiple analyses (e.g. different extraction/rotation methods, add/remove variables, etc.) - keep track of why you do them
Reporting EFA
Report:
- list of variables (or items)
- choice of extraction and rotation method (and justification for choices)
- n of factors extracted and the rules used to determine this (a priori, scree plot, Kaiser's criterion)
- proportion of variance accounted for by each factor
- Factor labels and summary of variables for each
- Factor loadings for each factor (rotated and unrotated) - list range of loadings in text, full matrix in table - pattern loadings are most important for oblique rotation
- Correlations between factors (if relevant)
Reporting: Decision process or end result?
- Lots of decision points in EFA - researchers are divided: some say report everything, some say just report end result
- Err on the side of reporting more detail rather than less when: EFA results are central to your research question; you are using an established measure/set of variables; you are investigating theoretically motivated a priori ideas
- No hard and fast rules - report what you want, justify, be prepared to change it