Image Compression with K-Means


Image compression is the process of reducing the size of an image file while preserving as much of its visual quality as possible.

This is important because images take up a lot of memory and bandwidth, and therefore, compressing them can save time, space, and money.

A frequently used technique for image compression is vector quantization, which maps each pixel in an image to a more limited set of representative colors. One method to accomplish this is through K-Means clustering, where the image is divided into \(K\) clusters and each pixel’s color is replaced by the average color of its cluster. This results in the image being represented by \(K\) distinct colors, which significantly reduces its size.


In this section, we give a short primer on the definition of K-Means and the algorithm behind it.

On a high level, K-Means is an unsupervised machine-learning algorithm that is used in clustering analysis. Given a set of data points, the algorithm groups the points into \(K\) clusters, where \(K\), a priori, is a user-defined number. The algorithm calculates the mean of each cluster, known as the cluster centroid, and assigns each data point to the closest cluster. This process is repeated until the cluster centroids converge to stable values.

Let’s make this idea more concrete by introducing the concept with proper notations.

Problem Statement

Given a set \(\mathcal{S}\) containing \(N\) data points:

\( \mathcal{S} = \left\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \dots, \mathbf{x}^{(N)}\right\} \subseteq \mathbb{R}^D \)

where the vector \(\mathbf{x}^{(n)}\) is the \(n\)-th sample with \(D\) number of features, given by:

\(\mathbf{x}^{(n)} = \begin{bmatrix} x_1^{(n)} & x_2^{(n)} & \cdots & x_D^{(n)} \end{bmatrix}^{\mathrm{T}} \in \mathbb{R}^{D} \quad \text{where } n = 1, 2, \ldots, N.\)

The K-Means algorithm aims to group the data points into a set \(\hat{\mathcal{C}}\) containing \(K\) clusters:

\( \hat{\mathcal{C}} = \left\{ \hat{C}_1, \hat{C}_2, \dots, \hat{C}_K \right\} \)

where \( \hat{C}_k \) is the set of data points \( \mathbf{x}^{(n)} \in \mathcal{S}\) assigned by \( \mathcal{A}(\cdot) \) (explained shortly) to the \( k \)-th cluster:

\(\hat{C}_k = \left\{\mathbf{x}^{(n)} \in \mathcal{S} \middle\vert \mathcal{A}(n) = k\right\}.\)

Note that \(\mathcal{A}(\cdot)\) is the assignment map that “predicts” and “classifies” each data point into their respective clusters.

Since \( \mathcal{A}(\cdot) \) is an assignment rule, an intuitive approach is to find a representative vector \( \boldsymbol{v}_k \) for each cluster \( \hat{C}_k \), and assign each data point \( \mathbf{x}^{(n)} \) to the cluster whose representative is closest to it. This is similar to the nearest-neighbour search algorithm.

What follows is the definition of centroids (cluster centers),

\(\boldsymbol{v}_1, \boldsymbol{v}_2, \dots, \boldsymbol{v}_K \in \mathbb{R}^{D}\)

where \(\boldsymbol{v}_k\) is the centroid of cluster \(\hat{C}_k\). These are the representatives of each cluster, and it can be shown that, for a fixed assignment, the optimal centroid is simply the mean vector \(\boldsymbol{\mu}_k\) of the cluster \(\hat{C}_k\).

To this end, we finalize the problem statement by defining an appropriate loss and objective function. More formally, we want to find the assignment \(\mathcal{A}(\cdot)\) and the cluster center \(\boldsymbol{v}_k\) such that the sum of squared distances between each data point and its cluster center is minimized. This means partitioning the data points according to the Voronoi Diagram.

Mathematically, we minimize the following objective function:

\( \begin{alignat}{3} \underset{\mathcal{A}, \boldsymbol{v}_k}{\operatorname{argmin}} &\quad& \hat{\mathcal{J}}_{\mathcal{S}}(\mathcal{A}, \boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_K) &= \sum_{k=1}^{K} \sum_{n \,:\, \mathcal{A}(n) = k} \left\|\mathbf{x}^{(n)} - \boldsymbol{v}_k \right\|^2 \\ \text{subject to} &\quad& \hat{C}_1 \sqcup \hat{C}_2 \sqcup \cdots \sqcup \hat{C}_K &= \mathcal{S} \\ &\quad& \boldsymbol{v}_1, \boldsymbol{v}_2, \ldots, \boldsymbol{v}_K &\in \mathbb{R}^D \\ \end{alignat}\)

The problem in itself seems manageable since we are trying to partition the data points into \(K\) clusters and minimize the intra-cluster distance (variances). However, jointly optimizing both the assignment map \(\mathcal{A}(\cdot)\) and the centroids \(\boldsymbol{v}_k\) is an NP-hard problem and is computationally challenging.

Fortunately, many heuristics are used to solve the problem approximately. We will talk about one of the most popular heuristics, Lloyd’s algorithm, in the next section.

Lloyd’s Algorithm

Lloyd’s algorithm is the best-known heuristic for the K-Means clustering problem. It is a greedy algorithm because each step (assignment or update) minimizes the objective function with the other quantity held fixed, so the objective never increases from one iteration to the next. Consequently, the algorithm converges, but only to a local minimum that depends on the initialization.

What follows is a description of Lloyd’s algorithm.

Step 1. Initialization Step.

Initialize \(K\) cluster centers \(\boldsymbol{\mu}_1^{0}, \boldsymbol{\mu}_2^{0}, \dots, \boldsymbol{\mu}_K^{0}\) randomly (best to be far apart) where the superscript \(0\) denotes the iteration number \(t=0\).

Some things to note:

  • In the very first iteration, there are no data points in any cluster \(\hat{C}_k^{0} = \emptyset\). Therefore, the cluster centers are just randomly chosen for simplicity.
  • By random, we mean that the cluster centers are randomly chosen from the data points \(\mathcal{S} = \left\{\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \dots, \mathbf{x}^{(N)}\right\}\) and not randomly chosen from the feature space \(\mathbb{R}^D\).
  • Subsequent iterations will have data points in the clusters \(\hat{C}_k^{t} \neq \emptyset\) and thus \(\boldsymbol{\mu}_k^{t}\) will be the mean of the data points in cluster \(k\).
  • Each \(\boldsymbol{\mu}_k^{0} = \begin{bmatrix} \mu_{1k}^{0} & \mu_{2k}^{0} & \cdots & \mu_{Dk}^{0} \end{bmatrix}^{\mathrm{T}}\) is a \(D\)-dimensional vector, where \(D\) is the number of features, and represents the mean vector of all the data points in cluster \(k\).
  • Note that \(\mu_{dk}^{0}\) is the mean value of the \(d\)-th feature in cluster \(k\).
  • We denote \(\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}_1 & \boldsymbol{\mu}_2 & \cdots & \boldsymbol{\mu}_K \end{bmatrix}^{T}_{K \times D}\) to be the collection of all \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \dots, \boldsymbol{\mu}_K\).

Step 2. Assignment Step.

Then for \(t=0, 1, 2, \dots\), we have:

Assignment Step (E): Assign each data point \(\mathbf{x}^{(n)}\) to the closest cluster center \(\boldsymbol{\mu}_k^{t}\),

\( \begin{aligned} \hat{y}^{(n), t} := \mathcal{A}^{*, t}(n) &= \underset{k \in \{1, 2, \ldots, K\}}{\operatorname{argmin}} \left\| \mathbf{x}^{(n)} - \boldsymbol{\mu}_k^{t} \right\|^2 \end{aligned} \)

In other words, \(\hat{y}^{(n), t}\) is the output of the optimal assignment rule \(\mathcal{A}^{*}\) at the \(t\)-th iteration and is the index of the cluster center \(\boldsymbol{\mu}_k^{t}\) that is closest to \(\mathbf{x}^{(n)}\).

Step 3. Update Step.

Update Step (M): Update the cluster centers for the next iteration.

\(\begin{aligned} \boldsymbol{\mu}_k^{t+1} &= \frac{1}{\left|\hat{C}_k^{t}\right|} \sum_{\mathbf{x}^{(n)} \in \hat{C}_k^{t}} \mathbf{x}^{(n)} \end{aligned}\)

Notice that the cluster center \(\boldsymbol{\mu}_k^{t+1}\) is the mean of all data points that are assigned to cluster \(k\) at iteration \(t\).
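
A brief sketch of why the mean is the natural choice in the update step, assuming the assignment from iteration \(t\) is held fixed. The symbol \(\hat{\mathcal{J}}_k\) is introduced only for this sketch and denotes the contribution of cluster \(k\) to the objective as a function of a candidate center \(\boldsymbol{v}\):

\( \begin{aligned} \hat{\mathcal{J}}_k(\boldsymbol{v}) &= \sum_{\mathbf{x}^{(n)} \in \hat{C}_k^{t}} \left\| \mathbf{x}^{(n)} - \boldsymbol{v} \right\|^2 \\ \nabla_{\boldsymbol{v}} \hat{\mathcal{J}}_k(\boldsymbol{v}) &= -2 \sum_{\mathbf{x}^{(n)} \in \hat{C}_k^{t}} \left( \mathbf{x}^{(n)} - \boldsymbol{v} \right) = \mathbf{0} \\ \implies \boldsymbol{v} &= \frac{1}{\left|\hat{C}_k^{t}\right|} \sum_{\mathbf{x}^{(n)} \in \hat{C}_k^{t}} \mathbf{x}^{(n)} \end{aligned} \)

Since \(\hat{\mathcal{J}}_k\) is a convex quadratic in \(\boldsymbol{v}\), this stationary point is its minimizer, which is exactly the mean used in the update step (provided the cluster is non-empty).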

Step 4. Repeat till Convergence

Repeat steps 2 and 3 until the centroids stop changing.

\(\begin{aligned}\boldsymbol{\mu}_k^{t+1} = \boldsymbol{\mu}_k^{t}\end{aligned}\)

This is the convergence condition.

In the above algorithm, we have assumed without proof that the assignment map \(\mathcal{A}(\cdot)\) and the centroids \(\boldsymbol{\mu}_k\) do indeed minimize the objective function \(\hat{\mathcal{J}}_{\mathcal{S}}\) at each step. With this, we have defined what K-Means is and the algorithm behind it.
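
The four steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `lloyd_kmeans`, its parameters, and the rule that an empty cluster keeps its previous center are choices made for this sketch.

```python
import numpy as np


def lloyd_kmeans(X, K, max_iter=100, seed=0):
    """Minimal Lloyd's algorithm sketch; returns (centroids, labels, cost history)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Step 1: pick K distinct data points as the initial centers.
    centroids = X[rng.choice(len(X), size=K, replace=False)]
    history = []
    for _ in range(max_iter):
        # Step 2 (assignment): squared distance of every point to every center.
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)
        # Objective value for the current centers and assignment.
        history.append(sq_dists[np.arange(len(X)), labels].sum())
        # Step 3 (update): mean of each cluster; an empty cluster keeps its center.
        new_centroids = centroids.copy()
        for k in range(K):
            members = X[labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)
        # Step 4: stop once the centers no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels, history


# Toy 1-D data with two obvious groups.
X = np.array([[0.0], [1.0], [10.0], [11.0]])
centroids, labels, history = lloyd_kmeans(X, K=2)
print(sorted(centroids.ravel().tolist()))
print(history)
```

On this toy data, every choice of two initial points leads to the centers 0.5 and 10.5, and the recorded objective values never increase. On real data, different initializations can end in different local minima.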

Vector Quantization

Recall that we mentioned that vector quantization is a technique used in image compression to reduce the amount of data needed to represent an image, and that K-Means is a method of vector quantization. We shall see below that they are of equivalent form. Below is an adaptation from “Chapter 21.3. K-Means Clustering.” In Probabilistic Machine Learning: An Introduction, written by Murphy, Kevin P.

Given a sequence of real-valued vectors \(\left\{\mathbf{x}^{(n)}\right\}_{n=1}^N\) where each \(\mathbf{x}^{(n)} \in \mathbb{R}^{D}\), we can use vector quantization to perform a lossy compression as follows:

Step 1.

First, we replace each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) with a discrete symbol \(\mathbf{z}^{(n)} \in \{1, \ldots , K\}\).

Step 2.

Then, we define a codebook of \(K\) prototypes \(\left\{\boldsymbol{\mu}_k\right\}_{k=1}^K \subset \mathbb{R}^D\). Since each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) is replaced with a discrete symbol \(\mathbf{z}^{(n)} \in \{1, \ldots , K\}\), we can think of the codebook as a lookup table that maps each symbol to a prototype.

Step 3.

Finally, we encode each real-valued vector \(\mathbf{x}^{(n)} \in \mathbb{R}^D\) by identifying the prototype that is most similar to it, using Euclidean distance as a measure of similarity:

\(\begin{equation} \text{encode}\left(\mathbf{x}^{(n)}\right) = \arg \min_{k} \left\|\mathbf{x}^{(n)} - \boldsymbol{\mu}_k \right\|^2 \end{equation}\)

The quality of a codebook can be assessed by determining the reconstruction error or the amount of distortion it causes, as expressed by the following equation:

\(\begin{equation} \mathcal{J} = \frac{1}{N} \sum_{n=1}^N \left\|\mathbf{x}^{(n)} - \text{decode}\left(\text{encode}\left(\mathbf{x}^{(n)}\right)\right)\right\|^2 = \frac{1}{N} \sum_{n=1}^N \left\|\mathbf{x}^{(n)} - \boldsymbol{\mu}_{\mathbf{z}^{(n)}}\right\|^2 \end{equation}\)

where \(\text{decode}(k) = \boldsymbol{\mu}_k\).

We see that the compression error has the same form as the K-Means cost function, up to the constant factor \(\frac{1}{N}\). Therefore, we can use K-Means to perform vector quantization and thereby achieve compression.
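
A small sketch of the encode, decode, and distortion computations described above. The codebook and the data points are made-up values for illustration, and the symbols here are 0-based (NumPy indexing) rather than \(1, \ldots, K\).

```python
import numpy as np

# Hypothetical codebook of K = 2 prototypes in D = 2 dimensions.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
X = np.array([[1.0, 0.0], [9.0, 10.0], [0.0, 2.0]])


def encode(X, codebook):
    """Return, for each row of X, the index of the nearest prototype."""
    sq_dists = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return sq_dists.argmin(axis=1)


def decode(z, codebook):
    """Look up the prototype for each symbol."""
    return codebook[z]


z = encode(X, codebook)
X_hat = decode(z, codebook)
# Mean squared reconstruction error over the N points.
distortion = ((X - X_hat) ** 2).sum(axis=1).mean()
print(z.tolist(), distortion)  # [0, 1, 0] 2.0
```

The squared errors of the three points are 1, 1 and 4, so the distortion is their average, 2.0.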

A Primer on Binary Digits (Bits) and 8-bit Unsigned Integers

To better appreciate how image compression works, it is important to understand how images (and any information) are stored in computers.

Information is stored in computers as a series of bits, the smallest units of information used in a computer. They have binary values, meaning they can either be 0 or 1. A series of bits are used to represent data and constitute the backbone of all digital information.

Images are often stored in 8-bit precision, which means each color channel value is represented as an 8-bit unsigned integer. An 8-bit unsigned integer is an integer data type that can store a non-negative value between 0 and 255, inclusive, since no bit is reserved for the sign. Each of its 8 bits occupies one binary position in memory and can only have a value of 0 or 1.

For example, let’s say a pixel value is 100. It can be represented in 8-bit binary form as \(01100100\). In an 8-bit unsigned integer representation, the most significant bit (leftmost bit) represents the value \(2^{7}=128\), the second most significant bit represents \(2^6=64\), the third most significant bit represents \(2^5=32\), and all the way till the least significant bit (rightmost bit) represents \(2^0=1\).
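
The example above can be checked with Python's built-in `format`. The variable names are only for illustration.

```python
value = 100
bits = format(value, "08b")  # zero-padded 8-bit binary string
print(bits)  # 01100100

# Rebuild the value from the place values 2^7, 2^6, ..., 2^0.
place_values = [2 ** p for p in range(7, -1, -1)]
rebuilt = sum(int(b) * w for b, w in zip(bits, place_values))
print(rebuilt)  # 100
```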

This is why pixel values in most images range from 0 to 255: they are stored as 8-bit unsigned integers.

The number of bits needed to store an image depends on the size of the image, the number of color channels, and the number of bits used to represent each color channel.

Number of Bits needed for a Positive Integer

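Given a positive integer \(K\), the number of bits needed to distinguish \(K\) distinct values (for example, the integers \(0, 1, \ldots, K-1\)) is:

\( n_{bits} = \lceil \log_2 K \rceil \)

where

  • \(n_{bits}\) is the number of bits required;
  • \(\lceil \cdot \rceil\) is the ceiling function, which rounds up to the nearest integer.

For example, the \(256\) values from \(0\) to \(255\) need \(\lceil \log_2 256 \rceil = 8\) bits.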

With this formula, we can write a simple function that takes in a NumPy array and calculates the number of bits needed to store it. The function does not handle negative numbers or other edge cases, as it is meant for educational purposes only.

import math
from typing import Tuple

import numpy as np


def calculate_num_bits(arr: np.ndarray, max_int: int) -> Tuple[int, int]:
    """Calculate the number of bits required to store a NumPy array.

    Args:
        arr (np.ndarray): A NumPy array.
        max_int (int): The maximum integer value allowed in this number system.
    Returns:
        num_bits (int): The number of bits required to store each value.
        total_bits (int): The number of bits required to store the array.
    """
    # The values 0..max_int are max_int + 1 distinct values.
    num_bits = int(math.ceil(math.log2(max_int + 1)))
    total_bits = num_bits * arr.size
    return num_bits, total_bits

Image Compression with K-Means

In image compression, we use K-Means to group similar pixels into \(K\) clusters. Each cluster centroid represents a representative color for the pixels in the cluster, and we can map each pixel to the closest centroid. This reduces the number of colors required to represent the image, and thus the size of the image data. The mapping of each pixel to a cluster centroid can be stored using a smaller number of bits compared to the original 24-bit RGB values.

Where did this 24-bit come from? Let’s define the problem statement and find out!

Problem Statement

Let an image be denoted by \(\mathbf{I} \in \mathbb{R}^{m \times n \times 3}\), where \(m\) is the height of the image, \(n\) is the width of the image, and 3 is the number of color channels representing the red, green, and blue colors.

We have the following properties:

  • There are a total of \(N = m \times n\) pixels in the image \(\mathbf{I}\).
  • Each pixel \(\mathbf{x} = \begin{bmatrix} r & g & b \end{bmatrix}^T \in \mathbb{Z}_{rgb}^3\) on the image \(\mathbf{I}\) is a 3-dimensional (3-D) tuple residing in the 3-D integer space comprising the intensities of the red, green and blue channels.
  • \(\mathbb{Z}_{rgb}\) represents all integers from \(0\) to \(255\). Therefore, each color channel in the pixel is of the range \(0\) to \(255\), and is assumed to be an 8-bit unsigned integer.

Based on our earlier discussion of bits, we require 8 bits to represent numbers from \(0\) to \(255\). It follows that each pixel contains \(3 \times 8 = 24\) bits of information. Consequently, the total number of bits required to represent the image is \(24 \times N = 24N\) bits.

Our aim is to find a compression map \(h: \mathbb{R}^{N} \rightarrow \mathbb{R}^{M}\), where \(M < N\), such that the number of bits required to represent the compressed image is less than the number of bits required to represent the original image.

The compressed output \(\mathbf{z} = h(\mathbf{x})\) is therefore called a code (compressed representation) of the input \(\mathbf{x}\) (original representation).

We can then reconstruct the original image \(\mathbf{I}\) from the compressed image \(\mathbf{z}\) by a reconstruction map \(r: \mathbb{R}^{M} \rightarrow \mathbb{R}^{N}\), such that \(\hat{\mathbf{x}} = r(\mathbf{z}) = r(h(\mathbf{x}))\). Note that \(\hat{\mathbf{x}}\) is the reconstructed representation of \(\mathbf{x}\) and may or may not be the same as \(\mathbf{x}\). If they are not the same, then the compression is lossy.

Steps to Compress an Image

Here is a step-by-step guide to using K-Means for image compression. We will later solidify this with a code implementation.

  • Load and read the image: Load the image \(\mathbf{I}\) into memory. Typically \(\mathbf{I}\) will be converted to a NumPy array of shape \(m \times n \times 3\).
  • Reshape the image data: Reshape the image matrix \(\mathbf{I}\) into a 2D array of size \(N \times 3\), where \(N = m \times n\) is the total number of pixels in the image. Each row of the array represents a pixel in the image. This step is necessary because K-Means expects a 2D array as input. One can now think of \(N \times 3\) as a dataset \(\mathcal{S}\) with \(N\) data points, with \(D=3\) features.
  • Apply K-Means clustering: Apply K-Means clustering to the reshaped image data, using a user-defined number of clusters, \(K\). This will group similar pixels into \(K\) clusters and calculate the cluster centroids \(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, \ldots, \boldsymbol{\mu}_K\). Each centroid is a 3-dimensional vector representing the RGB values of the representative color for the cluster.
  • Compress the image: For each pixel in the image, find the closest cluster centroid \(\boldsymbol{\mu}_k\) and replace the pixel with the index \(\hat{y}^{(n)}\) of the centroid.
  • Store the compressed image: Store the compressed image, which includes the cluster centroids and the mapping of each pixel to a centroid, to disk.
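
The reshape step in the list above can be illustrated on a tiny synthetic array. The array values are arbitrary and stand in for a real image.

```python
import numpy as np

# Tiny synthetic "image": height 2, width 3, 3 color channels.
image = np.arange(2 * 3 * 3, dtype=np.uint8).reshape(2, 3, 3)

pixels = image.reshape(-1, 3)  # shape (N, 3) with N = 2 * 3 = 6
print(pixels.shape)  # (6, 3)

# Row i of `pixels` is the pixel at row i // width, column i % width.
assert (pixels[4] == image[1, 1]).all()

# Reshaping back recovers the original layout exactly.
restored = pixels.reshape(image.shape)
print(np.array_equal(restored, image))  # True
```

Because the reshape only changes how the same values are indexed, the cluster labels computed on the \(N \times 3\) array can later be mapped back to pixel positions.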


To put things into perspective, let’s consider a simple example.

Let us consider an image of size \(100 \times 100 \times 3\) pixels.

Then there are \(N = 100 \times 100 = 10,000\) pixels in the image, and each pixel is represented by a 3-dimensional vector.

Assuming that each value in the RGB tuple is stored with 8 bits of precision, the total number of bits required to represent the image is

\( 24N = 24 \times 100 \times 100 = 240,000 \text{ bits} \)

Then suppose we run K-Means with \(K=16\) clusters. We will get \(16\) cluster centroids, each of which is a \(3\)-dimensional vector. This requires us to store \(16 \times 3 = 48\) numbers, each of which is represented with 8 bits of precision. This leads to a total of \(16 \times 3 \times 8 = 384\) bits to store the cluster centroids.

Next, we also need to store the mapping of each pixel in the image to a cluster centroid. This is an \(N \times 1\) vector, and for each pixel, we only store a single number ranging from 0 to 15. This means that we will need \(\lceil \log_2 16 \rceil = 4\) bits to represent the index of the cluster centroid to which a pixel is assigned. This amounts to a total of \(10,000 \times 4 = 40,000\) bits to store the mapping of each pixel to a cluster centroid.

Thus, the total number of bits required to store the compressed image is

\( 40,000 + 384 = 40,384 \text{ bits} \)

compared to the original image size of \(240,000\) bits. This results in a compression ratio of

\(40,384 / 240,000 = 16.8\%\)
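
The arithmetic above can be reproduced with a few lines of Python. The helper name `compressed_size_in_bits` and its parameters are hypothetical and exist only for this sketch.

```python
import math


def compressed_size_in_bits(num_pixels, k, channels=3, bits_per_channel=8):
    """Bits for the K centroids plus one cluster index per pixel."""
    centroid_bits = k * channels * bits_per_channel
    index_bits = num_pixels * math.ceil(math.log2(k))
    return centroid_bits + index_bits


num_pixels = 100 * 100
original_bits = num_pixels * 3 * 8
compressed_bits = compressed_size_in_bits(num_pixels, k=16)
ratio = compressed_bits / original_bits
print(original_bits, compressed_bits, round(ratio, 3))  # 240000 40384 0.168
```

The centroid term is negligible here. The per-pixel index dominates, so the ratio is close to \(\lceil \log_2 K \rceil / 24\).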

Step-by-step code implementation

In this section, we will go through the process of applying K-Means for compressing images. Specifically, we will use the hestain image, a type of tissue image stained with hematoxylin and eosin (H&E). Pathologists utilize this staining technique to differentiate between tissue types that are colored blue-purple and pink.

Image courtesy of Alan Partin, Johns Hopkins University

Load and read the Image

Let’s load the image into memory first.

from urllib.request import urlopen

import numpy as np
import PIL.Image

image_path = "https://storage.googleapis.com/reighns/images/hestain.png"
image = np.array(PIL.Image.open(urlopen(image_path)).convert("RGB"))
image_shape = image.shape
print(f"Image shape: {image_shape}")
print(f"Image dtype: {image.dtype}") # uint8
>>> Image shape: (227, 303, 3)
>>> Image dtype: uint8

Here are the steps for the code:

  • Define the image path as a URL, "https://storage.googleapis.com/reighns/images/hestain.png".
  • Convert the image into a NumPy array using the PIL library, by opening the image using PIL.Image.open method and converting it into an RGB format using the convert method.
  • Store the shape of the image as a tuple in the variable image_shape.
  • We also print out the image data type.

Size of the Image

We use the function calculate_num_bits defined earlier to calculate how many bits this image takes.

_, image_size_in_bits = calculate_num_bits(image, 255)
print(f"Image size in bits: {image_size_in_bits}")
>>> Image size in bits: 1650744

We can verify this number via the formula earlier: \(227 \times 303 \times 24 = 1,650,744\)

Apply Compression via k-means

In what follows, we will write a function compress_image that takes in an image represented as a NumPy array image and other optional parameters as a dictionary kmeans_kwargs, which are passed to the K-Means algorithm.

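from typing import Any, Tuple

import numpy as np
from sklearn.cluster import KMeans


def compress_image(
    image: np.ndarray, **kmeans_kwargs: Any
) -> Tuple[np.ndarray, np.ndarray]:
    # Flatten (height, width, 3) into (N, 3): one row per pixel.
    pixels = image.reshape((image.shape[0] * image.shape[1], 3))

    kmeans = KMeans(**kmeans_kwargs)
    kmeans.fit(pixels)  # learn the K centroids from the pixel data

    # Cluster index per pixel; uint8 only holds labels up to 255 (K <= 256).
    compressed_image = kmeans.labels_.astype(np.uint8)
    centroids = kmeans.cluster_centers_  # shape (K, 3); 3 * 8 * K bits as uint8
    return compressed_image, centroids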

The function performs the following steps:

  1. Reshapes the image from (height, width, depth) to (height * width, depth) to convert it into a 2D array, where each row is a pixel represented by its RGB values. We assign the reshaped image as a variable named pixels.
  2. Instantiate the K-Means algorithm with the given kmeans_kwargs and fit the K-Means model on the pixel data.
  3. Extract the predicted cluster labels for each pixel and name it compressed_image.
  4. Extract the predicted centroids (mean values) of each cluster and name it centroids.
  5. Return compressed_image and centroids. The former has the shape of (N,), where N is the number of pixels in the image, and the latter has the shape of (K, 3), where K is the number of clusters. Note that compressed_image is nothing but a 1D array of cluster labels and centroids is a 2D array of RGB values of the cluster centroids (the mean of each cluster).

Let’s run the code with K=2 on the image.

K = 2
compressed_image, centroids = compress_image(
    image, n_clusters=K, n_init="auto", max_iter=300, random_state=42
)

Let’s see a comparison of the size of the original image, compressed image, centroid, total compressed size, and compression ratio in the table below.

Quantity | Value
Original size | 1650744 bits
Compressed image size | 68781 bits
Centroids size | 48 bits
Compressed size (total) | 68829 bits
Compression ratio | 0.04 (~4 percent)

Table: Image size in bits, before and after.

We see that we managed to reduce our storage by approximately \(96\%\). Let’s see how the compressed image looks by reconstructing it.


In this section, we will write a function reconstruct_image that takes in the compressed image represented as a NumPy array compressed_image, the centroids that were learned by the K-Means algorithm, and original_shape, which is the shape of the original image.

def reconstruct_image(
    compressed_image: np.ndarray,
    centroids: np.ndarray,
    original_shape: Tuple[int, int, int],
) -> np.ndarray:
    reconstructed_image = centroids[compressed_image]
    reconstructed_image = reconstructed_image.reshape(original_shape).astype(np.uint8)
    return reconstructed_image

Let’s see what the arguments represent:

  • compressed_image: A NumPy array of shape (N,) representing the compressed image, where N is the number of pixels in the original image.
  • centroids: A NumPy array of shape (K, 3) representing the cluster centroids obtained after performing K-Means on the original image.
  • original_shape: A tuple representing the shape of the original image, in the format (height, width, depth).

The function performs the following steps to reconstruct the image from its compressed form:

  1. It uses the compressed_image array to index into the centroids array, effectively replacing each label in compressed_image with its corresponding cluster centroid.
  2. It then reshapes the resulting array to match the original shape of the image.
  3. Finally, it casts the result to unsigned 8-bit integers and returns the reconstructed image as a NumPy array.

In summary, the reconstruct_image function takes a compressed image represented as a 1D array of cluster labels and the RGB values of the cluster centroids, and returns a reconstruction of the original image as a 3D NumPy array. Because the compression is lossy, the reconstruction only approximates the original.

reconstructed_image = reconstruct_image(compressed_image, centroids, image.shape)

Comparing the original image and the reconstructed image

[Left] Original Image versus [Right] Reconstructed Image with K=2.

We can clearly see that with \(K=2\) clusters, the image is dichotomized into two distinct segments. This is quite a coarse representation of the original image.

Let’s run the compression technique a few more times, this time with \(K=2, 3, 6\) and \(16\) and observe the results.

Reconstructed image for K=2, 3, 6 and 16

As we can see, the quality of the compressed image will depend on the number of clusters used. In general, a larger number of clusters will result in higher image quality, but at the expense of a larger file size.

Image Segmentation

Image compression using K-Means can be viewed as a form of image segmentation in the RGB space. Why is that the case? Let’s first look at the definition of image segmentation.

Image segmentation is the process of partitioning a digital image into multiple segments or regions, each of which corresponds to a separate object or part of the image.

Extracted from Wikipedia, Image Segmentation

Looking back at our K-Means algorithm, it divides the image into segments based on the similarities of the colors of the pixels in the image. In other words, this coincides with partitioning the image into \(K\) colors, defined by each centroid.

Consequently, this results in segmentation of the image via the color (RGB) space.
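
To make the link to segmentation concrete, the 1D array of cluster labels can be reshaped back to the image's height and width, giving one segment index per pixel. The labels below are made-up values for a 2 x 3 image with K = 2, used only for illustration.

```python
import numpy as np

# Hypothetical labels for a 2 x 3 image clustered into K = 2 colors.
height, width = 2, 3
labels = np.array([0, 0, 1, 1, 0, 1])

# One segment index per pixel position.
segmentation_map = labels.reshape(height, width)
# Boolean mask selecting the pixels that belong to segment 1.
mask_cluster_1 = segmentation_map == 1
print(segmentation_map)
print(int(mask_cluster_1.sum()))  # 3 pixels in segment 1
```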

One should, however, note that this method of segmentation does not take into account the spatial geometry of neighboring pixels, and thus may result in a less meaningful representation of images. Let’s see why.

Although K-Means does not assume any underlying probability distribution (unless you view it as Gaussian Mixture Model), we can still view the clustering method as treating each pixel as an independent data point and grouping them together solely based on their color values. As a result, neighboring pixels with different colors may be assigned to different clusters or segments, leading to a loss of spatial coherence in the resulting segmentation or compression. Therefore, if you think that segmenting an image based on the color space is not the aim, you might need to explore other methods. Modern deep learning tends to handle it better with more fine-grained segments.


In this article, we covered how we can utilize K-Means as a way to compress images through the concept of vector quantization. We further link how the images can be segmented into different color spaces as a consequence of performing the clustering algorithm.

K-Means has many other applications, for example, in the context of Natural Language Processing (NLP), you can use it to cluster documents as well.

For a more formal treatment of the K-Means algorithm as a whole, one can refer to the references section. I have also written a more detailed post on K-Means here, which touches on the formulation of this algorithm.


  • Murphy, Kevin P. “Chapter 21.3. K-Means Clustering.” In Probabilistic Machine Learning: An Introduction. MIT Press, 2022.
  • Bishop, Christopher M. “Chapter 9.1. K-Means Clustering.” In Pattern Recognition and Machine Learning. New York: Springer-Verlag, 2016.


