python - How do I convert list of correlations to covariance matrix?
I have a list of correlations generated from a text file, in the following form (the first two values on each row indicate which pair of points the correlation is between):
2 1 -0.798399811877855e-01
3 1  0.357718108972297e+00
3 2 -0.406142457763738e+00
4 1  0.288467030571132e+00
4 2 -0.129115034405361e+00
4 3  0.156739504479856e+00
5 1 -0.756332254716083e-01
5 2  0.479036971438800e+00
5 3 -0.377545460300584e+00
5 4 -0.265467953118191e+00
6 1  0.909003414436468e-01
6 2 -0.363568902645620e+00
6 3  0.482042347959232e+00
6 4  0.292931692897587e+00
6 5 -0.739868576924150e+00
I also have a list of the standard deviations associated with each of the points. How do I combine these two in NumPy/SciPy to create a covariance matrix?
It needs to be an efficient method, since there are 300 points and therefore about 50,000 correlations.
Assuming your table is named df, with the first column labeled a, the second labeled b, and the correlation value labeled correlation:
    df2 = df.pivot(index='a', columns='b', values='correlation')

    >>> df2
    b       1      2      3      4     5
    2 -0.0798    NaN    NaN    NaN   NaN
    3  0.3580 -0.406    NaN    NaN   NaN
    4  0.2880 -0.129  0.157    NaN   NaN
    5 -0.0756  0.479 -0.378 -0.265   NaN
    6  0.0909 -0.364  0.482  0.293 -0.74
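If the correlations are still sitting in the text file, a table like df could be built roughly as follows. This is only a sketch: it assumes one whitespace-separated triple per line, and the `text` string is a stand-in for the real file (you would pass the file path to `pd.read_csv` instead). The column names a, b and correlation are the ones assumed above.

```python
import io

import pandas as pd

# Stand-in for the text file: the first six triples from the question,
# assumed to be stored one per line.
text = """\
2 1 -0.798399811877855e-01
3 1 0.357718108972297e+00
3 2 -0.406142457763738e+00
4 1 0.288467030571132e+00
4 2 -0.129115034405361e+00
4 3 0.156739504479856e+00
"""

# Split on any run of whitespace and name the three columns explicitly.
df = pd.read_csv(
    io.StringIO(text),
    sep=r"\s+",
    header=None,
    names=["a", "b", "correlation"],
)

# Same pivot as above: rows are the first index, columns the second.
df2 = df.pivot(index="a", columns="b", values="correlation")
```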
To convert this to a symmetric square matrix with ones on the diagonal:
    import numpy as np
    import pandas as pd

    # Get a unique, sorted list of all items in the rows and columns.
    items = list(df2)
    items.extend(list(df2.index))
    items = sorted(set(items))

    # Create a square symmetric correlation matrix.
    corr = df2.values.tolist()
    corr.insert(0, [np.nan] * len(corr))        # add the missing first row
    corr = pd.DataFrame(corr)
    corr[len(corr) - 1] = [np.nan] * len(corr)  # add the missing last column
    for i in range(len(corr)):
        corr.iat[i, i] = 1.  # Set the diagonal to 1.00.
        corr.iloc[i, i:] = corr.iloc[i:, i].values  # Mirror the lower triangle.

    # Rename the rows and columns.
    corr.index = items
    corr.columns = items

    >>> corr
            1       2      3      4       5       6
    1  1.0000 -0.0798  0.358  0.288 -0.0756  0.0909
    2 -0.0798  1.0000 -0.406 -0.129  0.4790 -0.3640
    3  0.3580 -0.4060  1.000  0.157 -0.3780  0.4820
    4  0.2880 -0.1290  0.157  1.000 -0.2650  0.2930
    5 -0.0756  0.4790 -0.378 -0.265  1.0000 -0.7400
    6  0.0909 -0.3640  0.482  0.293 -0.7400  1.0000
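Since the question asks for something efficient at around 300 points, the same symmetric matrix can also be filled without a Python loop by using NumPy index arrays. The sketch below is an alternative, not part of the steps above. It assumes the points are labeled with contiguous integers 1..n, and `corr_from_triples` is a hypothetical helper name.

```python
import numpy as np


def corr_from_triples(a, b, values, n):
    """Build an n x n symmetric correlation matrix from 1-based index pairs.

    Assumes the point labels are the contiguous integers 1..n.
    """
    i = np.asarray(a, dtype=int) - 1  # convert to 0-based row indices
    j = np.asarray(b, dtype=int) - 1  # convert to 0-based column indices
    corr = np.eye(n)                  # ones on the diagonal, zeros elsewhere
    corr[i, j] = values               # fill one triangle
    corr[j, i] = values               # mirror into the other triangle
    return corr


# Small example using the first three (rounded) correlations from the question.
corr_small = corr_from_triples([2, 3, 3], [1, 1, 2], [-0.0798, 0.358, -0.406], n=3)
```

With a table like df, this would be called as `corr_from_triples(df['a'], df['b'], df['correlation'], n)`. Any pair missing from the list is left at zero rather than NaN, so it is worth checking that all n(n-1)/2 pairs are present.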
Do the same steps for the standard deviation data if it is not already in matrix form.
Assuming that matrix is named df_std, you can get the covariance matrix as follows:

    df_cov = corr.multiply(df_std.multiply(df_std.T.values))
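The final step relies on the identity cov[i, j] = corr[i, j] * std[i] * std[j]. If the standard deviations are kept as a plain one-dimensional array rather than a matrix, the same product can be sketched in NumPy with an outer product. The helper name `cov_from_corr` and the toy numbers are made up for illustration.

```python
import numpy as np


def cov_from_corr(corr, std):
    """Scale a correlation matrix by the outer product of the std devs."""
    corr = np.asarray(corr, dtype=float)
    std = np.asarray(std, dtype=float)
    # np.outer(std, std)[i, j] == std[i] * std[j]
    return corr * np.outer(std, std)


# Toy example: two points with correlation 0.5 and std devs 2 and 3.
corr_toy = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
std_toy = np.array([2.0, 3.0])
cov_toy = cov_from_corr(corr_toy, std_toy)
# cov_toy is [[4.0, 3.0], [3.0, 9.0]]
```

The order of `std` has to match the row and column order of the correlation matrix. For 300 points this is a single 300 x 300 elementwise multiply.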