Title: | Next Eigenvalue Sufficiency Test |
---|---|
Description: | Determine the number of dimensions to retain in exploratory factor analysis. The main function, nest(), returns the solution, and plot(nest()) returns a plot. |
Authors: | P.-O. Caron [aut, cre, cph] |
Maintainer: | P.-O. Caron <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0 |
Built: | 2025-01-24 20:33:12 UTC |
Source: | https://github.com/quantmeth/rnest |
A list of seven correlation matrices.
achim
A list of correlation matrices.
Achim, A. (personal communication).
A correlation matrix composed of 18 items based on six factors. Four factors have more than three variables, three variables have cross-loadings (items 6, 7 and 13), two are doublet factors (items 13-14, 15-16), and there are two unique variables (17 and 18). Loadings range between .40 and .80.
achim24
An 18 by 18 correlation matrix.
Achim, A. (2024, April 4). Signal cancellation factor analysis. PsyArXiv, 1–13. doi:10.31234/osf.io/h7qwg
BartlettSphericity
Tests whether the variables are orthogonal.
BartlettSphericity(R, n)
R |
the correlation matrix. |
n |
the sample size. |
The test of the correlation matrix R with sample size n.
André Achim (Matlab)
P.-O. Caron (R)
Bartlett, M. S. (1937). Properties of sufficiency and statistical tests. Proceedings of the Royal Society of London, Series A, 160, 268–282.
BartlettSphericity(ex_4factors_corr, 42)
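For readers who want to see what a sphericity test computes, here is a minimal base-R sketch of the textbook Bartlett statistic. It is not the package's implementation: the function name bartlett_sketch and the returned list are hypothetical, and the example matrix is built by hand rather than taken from the package data.

```r
# Textbook Bartlett sphericity test (illustrative sketch, not Rnest code).
# chi2 = -((n - 1) - (2p + 5) / 6) * log(det(R)), with p(p - 1) / 2 degrees of freedom.
bartlett_sketch <- function(R, n) {
  p <- ncol(R)
  logdet <- as.numeric(determinant(R, logarithm = TRUE)$modulus)
  chi2 <- -((n - 1) - (2 * p + 5) / 6) * logdet
  df <- p * (p - 1) / 2
  list(chisq = chi2, df = df,
       p.value = pchisq(chi2, df, lower.tail = FALSE))
}

# Hand-built population matrix: 2 factors, 5 variables each, loadings of .8
L <- kronecker(diag(2), matrix(.8, 5, 1))
R <- tcrossprod(L)
diag(R) <- 1

res_factors <- bartlett_sketch(R, n = 42)          # strongly non-orthogonal
res_identity <- bartlett_sketch(diag(10), n = 42)  # perfectly orthogonal
```

An identity matrix has a determinant of 1, so the statistic is 0 and the hypothesis of orthogonal variables is not rejected.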
A list of three correlation matrices.
briggs_maccallum2003
A list of three correlation matrices.
Briggs, N. E., & MacCallum, R. C. (2003). Recovery of weak common factors by maximum likelihood and ordinary least squares estimation. Multivariate Behavioral Research, 38(1), 25–56. doi:10.1207/S15327906MBR3801_2
A list of six correlation matrices composed of nine variables with three factors and different levels of correlations between factors.
caron2016
A list of six 9 x 9 correlation matrices.
Caron, P.-O. (2016). A Monte Carlo examination of the broken-stick distribution to identify components to retain in principal component analysis. Journal of Statistical Computation and Simulation, 86(12), 2405-2410. doi:10.1080/00949655.2015.1112390
A list of 15 correlation matrices composed of nine variables with three factors and different levels of correlations between factors.
caron2019
A list of fifteen 9 x 9 correlation matrices.
Caron, P.-O. (2019). Minimum average partial correlation and parallel analysis: The influence of oblique structures. Communications in Statistics - Simulation and Computation, 48(7), 2110-2117. doi:10.1080/03610918.2018.1433843
A list containing 120 correlation matrices (R) built to represent different factor structures. Details are found in the 'cormat.l' data.
cormat
A list of 120 correlation matrices.
Caron, P.-O. (2025). A comparison of the Next Eigenvalue Sufficiency Test to other stopping rules for the number of factors in factor analysis. Educational and Psychological Measurement. doi:10.1177/00131644241308528
A list containing 120 lists of correlation matrices (R) built to represent different factor structures. Different levels of loadings (delta: .4, .5, .6, .7, .8), correlations between factors (corrfact: .0, .1, .2, .3), and numbers of factors (nfactors: 1:8) are used. The list contains the matrices (R) and their underlying characteristics (delta, corrfact, and nfactors).
cormat.l
A list containing 120 matrices.
See Caron, P.-O. (2025). A comparison of the Next Eigenvalue Sufficiency Test to other stopping rules for the number of factors in factor analysis. Educational and Psychological Measurement. doi:10.1177/00131644241308528
Compute covariance or correlation matrix with treatments for clusters and missing values
cor_nest(.data, ..., cluster = NULL, missing = "fiml", pvalue = FALSE)
cov_nest(.data, ..., cluster = NULL, missing = "fiml", pvalue = FALSE)
.data |
a data frame or a numeric matrix. |
... |
further arguments. |
cluster |
a variable name defining the clusters in a two-level dataset in the data frame. |
missing |
treatment to deal with missing values. Options are |
pvalue |
an argument to indicate if |
A list of class "covnest"
cov_nest(airquality)
Full Information Maximum Likelihood (FIML) correlation or covariance matrix
covFIML(data, tol = 1e-6, maxiter = 1000, pvalue = FALSE)
corFIML(data, tol = 1e-6, maxiter = 1000, pvalue = FALSE)
data |
a data frame or data matrix. |
tol |
tolerance. |
maxiter |
maximum number of iterations. |
pvalue |
an argument to indicate if |
A list containing the means, the correlation or covariance matrix, and optionally the degrees of freedom and the p-values.
This function is not very efficient. See ?cor_nest instead.
covFIML(airquality)
Empirical Kaiser Criterion (EKC)
EKC(.data = NULL, n = NULL, nv = NULL, lowest.eig = 1, ...)
.data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
n |
the number of cases (subjects, participants, or units) if a covariance matrix is supplied in |
nv |
the number of variables if the critical values are required. |
lowest.eig |
minimal eigenvalue to retain. Default is Kaiser's suggestion of 1. |
... |
further argument for |
The number of factors to retain or the critical eigenvalues.
Braeken, J., & van Assen, M. A. L. M. (2017). An empirical Kaiser criterion. Psychological Methods, 22(3), 450–466. doi:10.1037/met0000074
EKC(ex_4factors_corr, n = 42)
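The reference eigenvalues behind the criterion can be sketched in a few lines of base R. This follows the formula in Braeken and van Assen (2017) as commonly stated and is not the package's code: ekc_sketch and its return structure are hypothetical names, and the floor of 1 corresponds to the default of lowest.eig described above.

```r
# Empirical Kaiser Criterion (illustrative sketch, not Rnest code).
# Reference value for eigenvalue j:
#   max( (p - sum of previous eigenvalues) / (p - j + 1) * (1 + sqrt(p / n))^2, 1 )
ekc_sketch <- function(R, n, lowest.eig = 1) {
  p <- ncol(R)
  ev <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  cum_prev <- c(0, cumsum(ev)[-p])                 # sum of eigenvalues before j
  ref <- (p - cum_prev) / (p - seq_len(p) + 1) * (1 + sqrt(p / n))^2
  ref <- pmax(ref, lowest.eig)
  below <- which(ev <= ref)                        # first eigenvalue not exceeding its reference
  nfactors <- if (length(below)) below[1] - 1 else p
  list(nfactors = nfactors, reference = ref)
}

# Hand-built population matrix: 2 factors, 5 variables each, loadings of .8
L <- kronecker(diag(2), matrix(.8, 5, 1))
R <- tcrossprod(L)
diag(R) <- 1
ekc_result <- ekc_sketch(R, n = 100)
```

Factors are counted until the first observed eigenvalue that does not exceed its reference value.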
A correlation matrix composed of 10 items based on 2 factors with 5 variables each and loadings equal to .80.
ex_2factors
A 10 by 10 correlation matrix.
Caron, P.-O. (2025). Rnest: An R package for the Next Eigenvalue Sufficiency Test. https://github.com/quantmeth/Rnest
A correlation matrix composed of 10 items based on two main factors, among which there are two cross-loadings. There is also a doublet factor and a unique variable.
ex_3factors_doub_unique
A 10 by 10 correlation matrix.
Achim, A. (personal communication).
A correlation matrix composed of 12 items based on 4 factors with 3 variables each. Loadings equal .9, .9, and .3. Factors 1 and 2, and factors 3 and 4, are correlated at .7.
ex_4factors_corr
A 12 by 12 correlation matrix.
Achim, A. (personal communication).
A population correlation matrix composed of 6 items from a two-factor structure. Factor 1 is based on items 1 to 3 and 6, and Factor 2 is based on items 4 to 6.
ex_mqr
A 6 by 6 correlation matrix.
Caron, P.-O. (2024). Méthodes quantitatives avec R. https://mqr.teluq.ca
Speed up the use of MASS::mvrnorm.
genr8(n = 1, R = diag(10), mean = rep(0, ncol(R)), ...)
n |
the number of samples required. |
R |
a positive-definite symmetric matrix specifying the covariance matrix of the variables. |
mean |
an optional vector giving the means of the variables. Default is 0. |
... |
arguments for |
A data frame of size n by ncol(R).
set.seed(19)
R <- caron2016$mat1
mydata <- genr8(n = nrow(R)+1, R = R, empirical = TRUE)
round(mydata, 2)
round(cov(mydata), 2)
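The example above passes empirical = TRUE, which in MASS::mvrnorm makes the sample covariance reproduce the supplied matrix exactly rather than only in expectation. The sketch below calls MASS::mvrnorm directly to show that behavior. It assumes that genr8 forwards the argument unchanged, which the source does not state explicitly, and it uses a hand-built matrix instead of the package data.

```r
library(MASS)  # recommended package shipped with standard R installations

set.seed(19)
# Hand-built 10 x 10 population correlation matrix (2 factors, loadings of .8)
L <- kronecker(diag(2), matrix(.8, 5, 1))
R <- tcrossprod(L)
diag(R) <- 1

# empirical = TRUE: the sample covariance equals R up to rounding error
X_emp <- MASS::mvrnorm(n = 50, mu = rep(0, ncol(R)), Sigma = R, empirical = TRUE)
# empirical = FALSE (default): the sample covariance only approximates R
X_rnd <- MASS::mvrnorm(n = 50, mu = rep(0, ncol(R)), Sigma = R)

max_dev_emp <- max(abs(cov(X_emp) - R))
max_dev_rnd <- max(abs(cov(X_rnd) - R))
```

Comparing max_dev_emp with max_dev_rnd shows the difference between an exact and a sampled covariance structure.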
Returns the maximum number of latent factors in a factor analysis model.
Ledermann(p)
p |
The number of variables. |
The Ledermann bound.
André Achim (Matlab)
P.-O. Caron (R)
Ledermann, W. (1937). On the rank of reduced correlation matrices in multiple factor analysis. Psychometrika, 2, 85–93.
Ledermann(ncol(ex_4factors_corr))
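As a quick illustration, the classical Ledermann bound for p variables is (2p + 1 - sqrt(8p + 1)) / 2. The sketch below computes it in base R. The name ledermann_sketch is hypothetical, and whether the package rounds the result down is an assumption, so the sketch returns the unrounded value.

```r
# Classical Ledermann bound (illustrative sketch, not Rnest code).
ledermann_sketch <- function(p) {
  (2 * p + 1 - sqrt(8 * p + 1)) / 2
}

bound_6 <- ledermann_sketch(6)           # 6 variables
bound_10 <- ledermann_sketch(10)         # 10 variables
bound_12 <- floor(ledermann_sketch(12))  # 12 variables, rounded down to whole factors
```

Rounding down gives the largest whole number of factors that does not exceed the bound.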
Print Loadings in NEST
loadings(x, nfactors = x$nfactors, method = x$method, ...)
x |
an object of class "nest". |
nfactors |
the number of factors to retain. |
method |
a method used to compute loadings and uniquenesses. |
... |
further arguments to methods in "nest" or the |
A matrix containing the loadings, with one row per variable and one column per factor (nfactors).
See stats::loadings for the original documentation.
results <- nest(ex_2factors, n = 100)
loadings(results)
Minimum average partial correlation (MAP)
MAP(.data, ...)
.data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
... |
further argument for |
The number of factors to retain.
Velicer, W. F. (1976). Determining the number of components from the matrix of partial correlations. Psychometrika, 41(3), 321-327. doi:10.1007/BF02293557
D <- genr8(n = 42, R = ex_4factors_corr)
MAP(D)
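To make the criterion concrete, here is a minimal base-R sketch of Velicer's original procedure working on a correlation matrix. It is an illustration only, not the package's implementation, and map_sketch is a hypothetical name. For each number of components m, the first m principal components are partialled out and the average squared partial correlation is recorded; the retained number is the m that minimizes it.

```r
# Velicer's minimum average partial correlation (illustrative sketch, not Rnest code).
map_sketch <- function(R) {
  p <- ncol(R)
  e <- eigen(R, symmetric = TRUE)
  A <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), p)   # component loadings
  avg_sq <- numeric(p - 1)
  # m = 0: average squared off-diagonal correlation of R itself
  avg_sq[1] <- (sum(R^2) - p) / (p * (p - 1))
  for (m in seq_len(p - 2)) {
    C <- R - tcrossprod(A[, seq_len(m), drop = FALSE])  # residual covariance
    Pm <- cov2cor(C)                                    # partial correlations
    avg_sq[m + 1] <- (sum(Pm^2) - p) / (p * (p - 1))
  }
  which.min(avg_sq) - 1                                 # position 1 corresponds to m = 0
}

# Hand-built population matrix: 2 factors, 5 variables each, loadings of .8
L <- kronecker(diag(2), matrix(.8, 5, 1))
R <- tcrossprod(L)
diag(R) <- 1
map_result <- map_sketch(R)
```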
A sample correlation matrix composed of 44 items.
meek_bouchard
A 44 by 44 correlation matrix.
Meek-Bouchard, C. (personal communication).
nest is used to identify the number of factors to retain in exploratory factor analysis.
nest(.data, ..., n = NULL, nreps = 1000, alpha = 0.05, max.fact = TRUE, method = "ml", na.action = "fiml")
.data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
... |
arguments for |
n |
the number of cases (subjects, participants, or units) if a covariance matrix is supplied in |
nreps |
the number of replications to simulate. Default is 1000. |
alpha |
a vector of type I error rates or |
max.fact |
an optional maximum number of factors to extract. Default is |
method |
a method used to compute loadings and uniquenesses. Four methods are implemented in |
na.action |
how should missing data be removed. |
The Next Eigenvalue Sufficiency Test (NEST) extends parallel analysis by adding a sequential hypothesis testing procedure that tests each successive factor until the hypothesis is not rejected.
At the first step, before any factor is extracted, NEST and parallel analysis are identical: both use an identity matrix as the correlation matrix. Once the first hypothesis is rejected, NEST uses a correlation matrix based on the loadings and uniquenesses of the factor structure retained so far. NEST then resamples the eigenvalues of this new correlation matrix. NEST stops when the next eigenvalue is within the confidence interval.
Four methods are already implemented in nest to extract loadings and uniquenesses: maximum likelihood ("ml"; default), principal axis factoring ("paf"), regularized common factor analysis ("rcfa"), and minimum rank factor analysis ("mrfa"). These functions take as arguments covmat, n, factors, and ... (supplementary arguments passed by nest). They return loadings and uniquenesses. Any other user-defined function can be used as long as it is programmed likewise.
nest() returns an object of class nest. The functions summary and plot are used to obtain and show a summary of the results.
An object of class nest is a list containing the following components:
nfactors - The number of factors to retain (one per alpha).
cor - The supplied correlation matrix.
n - The number of cases (subjects, participants, or units).
values - The eigenvalues of the supplied correlation matrix.
alpha - The type I error rate.
method - The method used to compute loadings and uniquenesses.
nreps - The number of replications used.
prob - Probabilities of each factor.
Eig - A list of simulated eigenvalues.
plot.nest - Scree plot of the eigenvalues and the simulated confidence intervals for alpha.
loadings - Extract loadings. It does not overwrite stats::loadings.
summary.nest - Summary statistics for the number of factors.
P.-O. Caron
Achim, A. (2017). Testing the number of required dimensions in exploratory factor analysis. The Quantitative Methods for Psychology, 13(1), 64-74. doi:10.20982/tqmp.13.1.p064
nest(ex_2factors, n = 100)
nest(mtcars)
Parallel analysis
pa(data = NULL, n = NULL, nv = NULL, nreps = 1000, alpha = 0.05, crit = NULL, ...)
data |
a data.frame. |
n |
the number of subjects. |
nv |
the number of variables. |
nreps |
the number of replications. |
alpha |
type I error rate. |
crit |
critical values to compare the eigenvalues. |
... |
other arguments |
nfactors (if data is supplied) and sampled eigenvalues
Horn, J. L. (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30(2), 179–185. doi:10.1007/BF02289447
pa(ex_2factors, n = 42)
pa(n = 10, nv = 2, nreps = 100)
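For orientation, Horn's parallel analysis can be sketched in base R as follows. This is a minimal illustration, not the package's code: pa_sketch is a hypothetical name, the critical values are taken as the (1 - alpha) quantile of eigenvalues from uncorrelated normal data, and factors are counted until the first observed eigenvalue that does not exceed its critical value.

```r
# Horn's parallel analysis (illustrative sketch, not Rnest code).
pa_sketch <- function(R, n, nreps = 200, alpha = 0.05) {
  p <- ncol(R)
  obs <- eigen(R, symmetric = TRUE, only.values = TRUE)$values
  sim <- replicate(nreps, {
    X <- matrix(rnorm(n * p), n, p)              # uncorrelated normal data
    eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  })                                             # p x nreps matrix of eigenvalues
  crit <- apply(sim, 1, quantile, probs = 1 - alpha)
  below <- which(obs <= crit)
  nfactors <- if (length(below)) below[1] - 1 else p
  list(nfactors = nfactors, crit = crit)
}

set.seed(42)
# Hand-built population matrix: 2 factors, 5 variables each, loadings of .8
L <- kronecker(diag(2), matrix(.8, 5, 1))
R <- tcrossprod(L)
diag(R) <- 1
pa_result <- pa_sketch(R, n = 42)
```

The crit element plays the role of the critical values that the crit argument described above would otherwise supply.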
Scree plot of the eigenvalues and the (1-alpha)*100% confidence intervals derived from the resampled eigenvalues supplied to nest.
## S3 method for class 'nest'
plot(x, pa = FALSE, ...)
x |
an object of class "nest". |
pa |
show results of Parallel Analysis. |
... |
further arguments for other methods, ignored for "nest". |
A ggplot output.
This function is more interesting with many alpha values.
results <- nest(ex_2factors, n = 100, alpha = c(.01, .05, .01))
plot(results)
# Return the data used to produce the plot
df <- plot(results)$data
Print the number of factors to retain according to confidence levels.
## S3 method for class 'nest'
print(x, ...)
x |
an object of class "nest". |
... |
further arguments for other methods, ignored for "nest". |
No return value, called for side effects.
results <- nest(ex_2factors, n = 100)
print(results)
Remove unique variables
remove_unique(.data, ..., alpha = 0.05)
.data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
... |
further arguments for |
alpha |
type I error rate. |
A list containing the unique variables, a data frame containing their probabilities, and the .data with the unique variables removed.
remove_unique(ex_3factors_doub_unique, n = 420)
shem estimates the number of principal components via Split-Half Eigenvector Matching (SHEM).
shem(data, nIts = 30)
data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
nIts |
number of iterations. |
shem returns a list containing the number of components (nfactors), whether the additional step in case of zero true latent components was carried out (zeroComponents), the eigenvalues, and the eigenvectors of the solution.
Gladwin, T. E. (2023). Estimating the number of principal components via Split-Half Eigenvector Matching (SHEM). MethodsX, 11, 102286. doi:10.1016/j.mex.2023.102286
jd <- genr8(n = 404, R = ex_4factors_corr)
shem(jd)
Summary method for class "nest".
## S3 method for class 'nest'
summary(object, ...)
object |
an object of class "nest". |
... |
further arguments for other methods, ignored for "nest". |
No return value, called for side effects.
results <- nest(ex_2factors, n = 100)
summary(results)
A sample covariance matrix composed of 11 items based on two factors according to Tabachnick and Fidell (2019, see pp. 576-578).
The first five variables are related to "Verbal IQ", the next five are related to "Performance IQ".
The last variable, CODING, is unique. Loadings range between .39 and .76.
tabachnick_fidell2019
An 11 by 11 covariance matrix.
Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics. Allyn and Bacon. pp. 576-577.
Probability of unique variables
unique_variable(.data, n = NULL, ...)
.data |
a data frame, a numeric matrix, covariance matrix or correlation matrix from which to determine the number of factors. |
n |
the number of cases (subjects, participants, or units) if a covariance matrix is supplied in |
... |
further arguments for |
A data frame containing the F-values and the probabilities that each variable is a unique variable.
P.-O. Caron (R) André Achim (Matlab)
exData <- genr8(n = 420, R = ex_3factors_doub_unique)
unique_variable(exData)