# List of features with strongest correlation

The code in this note is useful when there are many features (e.g. 1k+). In that case it is difficult to read a heatmap of the correlation matrix visually (e.g. one plotted with `sns.heatmap()`, see beautiful example here). Instead, we extract the pairs with the strongest correlation.

To get a list of features with the strongest correlation in a pandas DataFrame, you can use the `corr()` method to calculate the correlation between all pairs of columns. Here is the Python code to do so:

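```python
import pandas as pd

# Calculate the correlation matrix
# (df is assumed to be an existing DataFrame with numeric columns)
corr_matrix = df.corr()

# Turn the matrix into a Series indexed by (feature, feature) and sort it
sorted_pairs = corr_matrix.unstack().sort_values(ascending=False)

# Drop self-correlations (always 1.0); otherwise they fill the top of the list
level_0 = sorted_pairs.index.get_level_values(0)
level_1 = sorted_pairs.index.get_level_values(1)
sorted_pairs = sorted_pairs[level_0 != level_1]

# Get the top n pairs with the highest correlation
n = 5  # change this to the number of pairs you want to get
top_pairs = sorted_pairs[:n * 2]  # each pair appears twice: (a, b) and (b, a)

# Create a list to store the top pairs without duplicates
unique_pairs = []

# Iterate over the top pairs and skip a pair if its mirror is already stored
for pair in top_pairs.index:
    if (pair[1], pair[0]) not in unique_pairs:
        unique_pairs.append(pair)

# Create a dataframe with the top pairs and their correlation coefficients
top_pairs_df = pd.DataFrame(columns=['feature_1', 'feature_2', 'corr_coef'])
for i, pair in enumerate(unique_pairs[:n]):
    top_pairs_df.loc[i] = [pair[0], pair[1], corr_matrix.loc[pair[0], pair[1]]]

# Show the top pairs as a dataframe (display() works in Jupyter; use print() elsewhere)
display(top_pairs_df)
```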

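In this code, we use the `unstack()` method to transform the correlation matrix into a Series whose index holds pairs of column names and whose values are the correlation coefficients. We then sort the Series in descending order and take the top `2*n` entries, because every pair appears twice in the correlation matrix, once as (a, b) and once as (b, a). The correlation of a feature with itself appears only once, always equals 1, and is skipped, as is the mirrored copy of each pair. Because the sort uses the signed value, only positive correlations rank at the top; strong negative correlations end up at the bottom of the Series.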
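
The same idea can also be written without the explicit loop. The sketch below is one possible variant, not part of the original snippet: it keeps only the strict upper triangle of the matrix, so the diagonal and the mirrored pairs never enter the ranking, and it can optionally rank by absolute value. The function name `top_correlated_pairs`, the `use_abs` flag and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df, n=5, use_abs=True):
    """Return up to n feature pairs with the strongest correlation (hypothetical helper)."""
    corr = df.corr()
    # Rank by magnitude if requested, so strong negative correlations count too
    ranking = corr.abs() if use_abs else corr
    # Strict upper triangle: True above the diagonal, False elsewhere
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    ranked = (
        ranking.where(mask)   # masked-out cells become NaN
        .stack()              # Series indexed by (row feature, column feature)
        .dropna()             # drop masked cells if stack() kept them
        .sort_values(ascending=False)
        .head(n)
    )
    # Report the signed coefficient for every selected pair
    rows = [(a, b, corr.loc[a, b]) for a, b in ranked.index]
    return pd.DataFrame(rows, columns=['feature_1', 'feature_2', 'corr_coef'])


# Toy data, invented for this example: c is an exact decreasing function of a
toy = pd.DataFrame({
    'a': [1, 2, 3, 4, 5],
    'b': [2, 4, 6, 8, 11],
    'c': [5, 4, 3, 2, 1],
    'd': [3, 1, 4, 1, 5],
})
result = top_correlated_pairs(toy, n=2)
print(result)
```

With `use_abs=True` the first row is the pair `a` and `c`, whose coefficient is -1 up to floating-point error; the loop-based version above would rank that pair last because it sorts by signed value.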