CS156 LBA: Alexanderplatz PCA and Reconstruction.
For this assignment, I chose Alexanderplatz as my landmark of interest. I took several photos of it and performed PCA on them: I grayscaled the images, resized them, and reshaped them into flat arrays so that the entire dataset could be passed to Scikit-Learn's PCA module. I then used the module's inverse_transform method, together with NumPy reshaping, to recompose the images. Finally, I picked the image that lay furthest from the rest of the dataset in the 2D PCA space and reconstructed it separately.
Loading the data
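A minimal sketch of the loading step described above. The folder of Alexanderplatz photos is not included here, so the snippet synthesizes a few placeholder JPEGs to show the shape of the pipeline; the target resolution of 100x100 is a hypothetical choice, not necessarily the one used in the actual notebook.

```python
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image

IMG_SIZE = (100, 100)  # hypothetical target resolution (width, height)

def load_grayscale_images(folder, size=IMG_SIZE):
    """Grayscale, resize, and flatten every JPEG in `folder`
    into one row of the data matrix."""
    rows = []
    for path in sorted(Path(folder).glob("*.jpg")):
        img = Image.open(path).convert("L").resize(size)  # "L" = 8-bit grayscale
        rows.append(np.asarray(img, dtype=float).ravel())
    return np.vstack(rows)

# The real notebook points this at a folder of photos; here we
# synthesize placeholder JPEGs so the sketch runs end to end.
with tempfile.TemporaryDirectory() as folder:
    for i in range(5):
        fake = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)
        Image.fromarray(fake).save(Path(folder) / f"photo_{i}.jpg")
    X = load_grayscale_images(folder)

print(X.shape)  # (n_images, n_pixels) = (5, 10000)
```

Each photo becomes one row of X, which is the layout Scikit-Learn's PCA expects.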
Performing PCA
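Roughly, the fitting step looks like the sketch below. A random matrix stands in for the flattened photo array, so the printed variance ratios will not match the figures discussed next; those come from the real photos.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in for the flattened photo matrix (n_images x n_pixels);
# in the actual notebook this comes from the loading step above.
rng = np.random.default_rng(42)
X = rng.normal(size=(20, 10_000))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # each photo as a point in 2D
print(pca.explained_variance_ratio_)   # per-component share of variance
```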
As we can see, the first two components explain 28% and 14% of the variance, respectively. Together, then, just two components explain 42% of the variance, which is not bad given the low dimension. We can compare this to 3-component PCA to see how much improvement a third component would buy us.
As we can see, the third component explains a further 7% of the variance. Depending on the scenario, we might consider this valuable enough to use 3 components, or judge it not informative enough and stick with 2.
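The 2-vs-3 comparison can be sketched as below, again on stand-in random data (the 42% and 49% cumulative figures in the text come from the real photos):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 10_000))  # stand-in for the photo matrix

# Cumulative explained variance for 2 vs. 3 components.
for k in (2, 3):
    cum = PCA(n_components=k).fit(X).explained_variance_ratio_.sum()
    print(f"{k} components explain {cum:.0%} of the variance")
```

Adding components can only grow the cumulative explained variance; the question is whether the increment justifies the extra dimension.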
Image Reconstruction
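The reconstruction step (inverse_transform plus NumPy reshaping, as described in the introduction) can be sketched as follows; the 100x100 shape and the random stand-in data are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

H, W = 100, 100  # hypothetical resized image shape
rng = np.random.default_rng(42)
X = rng.normal(size=(20, H * W))  # stand-in for the flattened photos

pca = PCA(n_components=2)
scores = pca.fit_transform(X)          # project down to 2 components
X_hat = pca.inverse_transform(scores)  # lift back to pixel space
images = X_hat.reshape(-1, H, W)       # one 2D image per row, ready to plot
print(images.shape)
```

Each reconstructed image is the dataset mean plus that image's two component scores times the component directions, which is why the reconstructions look like blurred averages of the originals.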
"Outlier" Reconstruction
As we can see, the image is still fairly recognizable. There are two reasons for this. First, all images in the dataset were fairly similar to one another, which helped the fit stage of the PCA detect patterns. Second, as the original images show, there is a clear distinction between the main object of interest (the tower and nearby buildings) and the sky, which was mostly uniform gray and therefore contributed little noise. This also explains why this particular image was an outlier: my camera angle here was slightly lower, so the stores and buildings were not captured as well as in the other images, and more of the frame was sky, which the PCA clearly picked up on. It is quite possible that the extreme value on the x-axis relates to properties of the sky's patterns, since the sky is most prominent here, but we cannot guarantee that.