

Hello everyone,
This is my first time posting, so if I need to change anything in this post
please let me know!
I am working on a paper myself and came across this research topic called
AESPI (Aggregated Energy Security Performance Indicator). (The paper can be
found here for those interested:
https://www.sciencedirect.com/science/article/pii/S0306261912007337)
So the same author has successfully applied AESPI to the case of Thailand
( https://www.sciencedirect.com/science/article/pii/S0306261914003985) and
has included the standardized data for all 25 indicators over 45 years.
Here is my dilemma: I have tried to reproduce their results in SPSS by
running a PCA on the standardized data, but including all variables leads
to the "This matrix is not positive definite" error when I request the
KMO test.
Additionally, the eigenvalues reported in the paper differ from mine: I get
only 3 components, while the paper reports 5.
I have included the SPSS file I was using and a picture of the original data.
< http://spssxdiscussion.1045642.n5.nabble.com/file/t341397/DataPic.png>
< http://spssxdiscussion.1045642.n5.nabble.com/file/t341397/Rotated_Values.png>
Thailand_Data_Test_Comparison.sav
< http://spssxdiscussion.1045642.n5.nabble.com/file/t341397/Thailand_Data_Test_Comparison.sav>

Sent from: http://spssxdiscussion.1045642.n5.nabble.com/
=====================
To manage your subscription to SPSSXL, send a message to
[hidden email] (not to SPSSXL), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSXL
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Okay, I'll make a couple of simple points, and I'm sure that if
I am terribly wrong (or even slightly), someone will correct me. So:
(1) You probably have negative eigenvalues. You may not realize
this because, for reasons known only to them, the original
programmers of the Factor procedure decided to print only positive
eigenvalues. If you do a principal components analysis and have
negative eigenvalues, then your correlation matrix (I assume a
25 x 25 matrix based on 45 units of analysis; a prize to the first
person to explain this problem) is not positive definite or
positive semidefinite. But maybe you don't really want to do a
principal components analysis.
(2) I thank God every day for providing us with the UCLA IDRE
center, even though it is far from perfect. Why am I so glad?
Consider the following link, which presents a principal FACTOR
analysis done with SAS:
https://stats.idre.ucla.edu/sas/output/factoranalysis/
Now, you might be asking, "Why is he showing me SAS output when
I'm doing SPSS?" Well, the answer is that the SAS output is better
annotated than the SPSS output. For example, consider the
following quote:
"b. Eigenvalue – This is the initial eigenvalue. An eigenvalue is
the variance of the factor. Because this is an unrotated solution, the
first factor will account for the most variance, the second will account for the
second highest amount of variance, and so on. Some of the eigenvalues are
negative because the matrix is not of full rank. This means that there are
probably only four dimensions (corresponding to the four factors whose
eigenvalues are greater than zero). Although it is strange to have a
negative variance, this happens because the factor analysis is only analyzing
the common variance, which is less than the total variance. *******If we were
doing a principal components analysis, we would have had 1’s on the diagonal,
which means that all of the variance is being analyzed (which is another way of
saying that we are assuming that we have no measurement error), and we would not
have negative eigenvalues. In general, it is not uncommon to have negative
eigenvalues.********"
So, make sure that you don't have any negative eigenvalues
if you are doing a principal components analysis. Otherwise,
you ****might***** want to do a principal factor analysis instead
(which may be what your original source did but did not report
it correctly). I note that the SPSS output for factor does not
provide this warning.
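The difference between the two kinds of eigenvalues in the quote above can be checked numerically. The following is only a minimal sketch in Python with NumPy, not SPSS or SAS output: the data are simulated (an assumption, not the Thailand data), and squared multiple correlations (SMC) are used as the initial communality estimates, which is one common choice for principal factor analysis and may not match what any particular package does by default.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption, not the Thailand data): 500 cases and
# 6 variables driven by one common factor plus unique noise.
n_cases, n_vars = 500, 6
factor = rng.normal(size=(n_cases, 1))
data = 0.7 * factor + np.sqrt(1 - 0.7**2) * rng.normal(size=(n_cases, n_vars))

# PCA analyzes the correlation matrix with 1's on the diagonal.
R = np.corrcoef(data, rowvar=False)
pca_eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Principal factor analysis replaces the diagonal with communality
# estimates; squared multiple correlations are used here.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_eigenvalues = np.linalg.eigvalsh(R_reduced)[::-1]

print("PCA eigenvalues:           ", np.round(pca_eigenvalues, 3))
print("Reduced-matrix eigenvalues:", np.round(paf_eigenvalues, 3))
```

With complete data and more cases than variables, the PCA eigenvalues are all positive and sum to the number of variables, while the reduced matrix shows negative eigenvalues because only the common variance is analyzed.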
(3) The UCLA IDRE center does provide an annotated output.
However, let me point out something that is presented in
the front matter of this webpage. Quoting:
"
Factor analysis is a technique that requires a large sample size.
Factor analysis is based on the correlation matrix of the variables involved,
and correlations usually need a large sample size before they stabilize.
Tabachnick and Fidell (2001, page 588) cite Comrey and Lee’s (1992) advise
regarding sample size: 50 cases is very poor, 100 is poor, 200 is fair, 300 is
good, 500 is very good, and 1000 or more is excellent. As a rule of thumb,
a bare minimum of 10 observations per variable is necessary to avoid
computational difficulties."
You say that you have 45 years, but the table you present
indicates that a few variables do not have values for
certain years. So, if year is the unit of analysis, you
have fewer than 45 cases, or a little more than 1 case
per variable. What is wrong with this picture?
I will leave it to others to suggest ways of dealing with this
situation.
Mike Palij
New York University


I looked at Tables 4 and 5 (your data, I think), and here are some notes.
The columns ("observations") run from 1986 to 2030; however, the set with "complete
data", which is what a factor analysis will use, starts with 2004. Twenty-seven observations
will be highly unstable for components or factors of 25 variables. If you are analyzing
correlations (rather than raw values), you are at about the minimum for full rank. The KMO
result says to me that your data are not full-rank. Somewhere, you have collinearity.
The first three variables (eco1.1 to eco1.3) appear to be practically identical across the
range of years. Are these versions of each other?
Most or all of the variables show a strong "year" trend. Since we haven't yet seen the years
2018 to 2030, I assume that these are projections. The formulas for projecting would produce
collinearity if the later columns are linear combinations of the earlier columns.
A principal component analysis /can/ show you as many components (if full rank) as you have
variables. Getting 3 or 5 from a set of data depends on what you specify as options.
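The three checks described in these notes (how many complete cases remain, whether the correlation matrix is full rank, and how many components a given retention rule keeps) can be sketched as follows. This is an illustration in Python with pandas and NumPy, not SPSS; the table is a made-up stand-in with hypothetical column names (ind1 to ind6), a deliberately collinear column, and one indicator that is missing before 2004.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical stand-in for the indicator table: 45 yearly rows, 6 indicators
# that all follow a "year" trend plus noise.
years = np.arange(1986, 2031)
trend = (years - years.mean()) / years.std()
df = pd.DataFrame(
    {f"ind{i}": trend * rng.uniform(0.5, 1.0) + 0.3 * rng.normal(size=years.size)
     for i in range(1, 6)},
    index=years,
)
# ind6 is an exact linear combination of two other columns (built-in collinearity).
df["ind6"] = 2.0 * df["ind1"] - 0.5 * df["ind2"]
# One indicator has no values before 2004.
df.loc[df.index < 2004, "ind3"] = np.nan

complete = df.dropna()                      # listwise deletion
R = np.corrcoef(complete.to_numpy(), rowvar=False)
eigenvalues = np.linalg.eigvalsh(R)[::-1]

n_complete = len(complete)
rank = int((eigenvalues > 1e-10).sum())     # numerically non-zero eigenvalues
n_kaiser = int((eigenvalues > 1.0).sum())   # kept under an "eigenvalue > 1" rule

print(f"complete cases: {n_complete} of {len(df)}")
print(f"rank of correlation matrix: {rank} of {df.shape[1]} variables")
print(f"components with eigenvalue > 1: {n_kaiser}")
```

The count under the "eigenvalue > 1" rule is only one possible retention option; asking for a fixed number of components instead would give a different answer from the same matrix.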

Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Daradai <[hidden email]>
Sent: Monday, February 5, 2018 7:21:59 PM
To: [hidden email]
Subject: Trying to reproduce PCA analysis of a published paper, but not getting same results


Not positive definite (p.d.) means the correlation matrix has
some zero or some negative eigenvalues (or both). Zero eigenvalues
appear when there are linear dependencies among the variables or when
N<P (the number of cases is less than the number of variables).
Negative eigenvalues may appear if there were missing data that you
deleted in a "pairwise" manner, or when the correlation matrix was
not computed from data but estimated somehow, or simply borrowed and
entered with insufficient precision.
Note also that the KMO index isn't needed in PCA. It is of
value in factor analysis. PCA easily tolerates a non-p.d. matrix, but
factor analysis (most methods) doesn't. If you intend to use PCA as
"factor analysis" (i.e., you are going to interpret the components as
real latent variables generating the data), your matrix should be p.d.
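Both causes can be reproduced on toy data. The sketch below uses Python with NumPy and pandas rather than SPSS, and the numbers are invented for illustration: the first part has fewer cases than variables, and the second part is a small hand-built table in which each pair of variables is observed on a different subset of cases, so that pairwise deletion yields mutually inconsistent correlations.

```python
import numpy as np
import pandas as pd

# Case 1: fewer cases than variables (N < P) gives zero eigenvalues.
rng = np.random.default_rng(2)
few_cases = rng.normal(size=(5, 8))            # 5 cases, 8 variables
R_small = np.corrcoef(few_cases, rowvar=False)
eig_small = np.linalg.eigvalsh(R_small)
n_zero = int((np.abs(eig_small) < 1e-10).sum())
print("zero eigenvalues with N=5, P=8:", n_zero)

# Case 2: pairwise deletion. Each pair of variables is correlated on a
# different subset of cases, so the three correlations need not be consistent.
nan = np.nan
df = pd.DataFrame({
    "x": [1, 2, 3, nan, nan, nan, 1, 2, 3],
    "y": [1, 2, 3, 1, 2, 3, nan, nan, nan],
    "z": [nan, nan, nan, 1, 2, 3, 3, 2, 1],
})
R_pairwise = df.corr()                         # pandas uses pairwise-complete cases
eig_pairwise = np.linalg.eigvalsh(R_pairwise.to_numpy())
print(R_pairwise)
print("eigenvalues under pairwise deletion:", np.round(eig_pairwise, 3))
```

In the second part, x and y correlate +1, y and z correlate +1, but x and z correlate -1, which no single complete data set could produce; the resulting matrix has a negative eigenvalue.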


Thank you, everyone, for your great help. Unfortunately, I have only just
started delving into statistics and SPSS, so it will take me some time to
understand all of the intricacies you have discussed here.
I want to include an answer I just received by mail from someone who found
the solution to my main issue, my results deviating from the source:
"
It seems that the authors of the article used the option "Replace with mean"
under factor analysis Options/Missing Values. In SPSS version 24 this seems
to produce the same summary statistics (Table 7) and rotated loadings (Table
8), but slightly different KMO/Bartlett results (Table 6).
"
Again, thank you everyone for your great help!
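
Why the missing-value option matters so much can be seen in a small sketch. This is Python with pandas and NumPy, not SPSS, and the data are random numbers with hypothetical column names (a to e); it only contrasts dropping incomplete rows with replacing each missing value by its column mean, which is my reading of what "Replace with mean" does.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Hypothetical data: 45 rows, 5 variables, early values missing in two columns.
df = pd.DataFrame(rng.normal(size=(45, 5)), columns=list("abcde"))
df["b"] += 0.8 * df["a"]
df.loc[:17, "c"] = np.nan            # rows 0..17 (label slicing is inclusive)
df.loc[:9, "d"] = np.nan             # rows 0..9


def correlation_eigenvalues(frame):
    """Eigenvalues of the correlation matrix, largest first."""
    R = np.corrcoef(frame.to_numpy(), rowvar=False)
    return np.linalg.eigvalsh(R)[::-1]


listwise = df.dropna()               # drop every row with any missing value
mean_sub = df.fillna(df.mean())      # replace each missing value with its column mean

eig_listwise = correlation_eigenvalues(listwise)
eig_mean_sub = correlation_eigenvalues(mean_sub)

print("cases used, listwise:         ", len(listwise))
print("cases used, mean substitution:", len(mean_sub))
print("eigenvalues, listwise:         ", np.round(eig_listwise, 3))
print("eigenvalues, mean substitution:", np.round(eig_mean_sub, 3))
```

The two treatments use different numbers of cases and give different correlation matrices, so the eigenvalues, and with them the number of components retained, can differ even though the input file is the same.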


