posh.wiki


Comparing colours mathematically is harder than I thought.

2026-03-19

Tags: Art, Math

Some time ago, I started working on a small project to create cross-stitch patterns from images. As a part of it, I wanted to reduce the number of distinct colours so I knew exactly how much of exactly which colour of thread to buy.

Initially, I assumed that it would be as simple as calculating Euclidean distance between the colour in the source image and nearby thread colours. Boy was I wrong! While it's a valid approach, the results left a lot to be desired. Some colours looked noticably wrong.

This led me down a rabbit hole of different colour spaces and comparison algorithms. A rabbit hole which I will now present to you, in the hopes that you are as insane as I am and might jump into it.

Manhattan Distance

Manhattan distance is a computationally cheap algorithm for determining the distance between 2 points in a Cartesian co-ordinate system, such as RGB colour space.

It's also known as:

Unlike Euclidean distance, which is the length of a line segment between 2 points (often termed "as the crow flies"), Manhattan distance is the sum of absolute differences of the co-ordinates. Imagine how a taxi would drive from A to B in a grid-like city - it can't take a straight line path; it has to go around buildings.

The Manhattan distance between points \(p\) \(p\) and \(q\) \(q\) in an \(n\) \(n\) -dimensional Cartesian co-ordinate system, where \(p=(p_1,p_2...p_n)\) \(p=(p_1,p_2...p_n)\) and \(q=(q_1,q_2...q_n)\) \(q=(q_1,q_2...q_n)\) , is:

\[ d_T(p,q) = ||p - q||_T = \sum_{i=1}^{n}|p_i-q_i| \] \[ d_T(p,q) = ||p - q||_T = \sum_{i=1}^{n}|p_i-q_i| \]

This is one of the fastest options for computing the visual similarity of two colours, but yields the least accurate results due to the limitations of the algorithm and the fact that the RGB colour space is not perceptually uniform. We can improve upon this by using a more fitting algorithm.

Euclidean Distance

Euclidean distance, as previously mentioned, is the length of the line segment between two points in a Cartesian co-ordinate system.

Its implementation differs slightly based on the number of dimensions. Since RGB is a 3-dimensional space, we use:

\[ d(p,q) = \sqrt{(p_1-q_1)^2 + (p_2-q_2)^2 + (p_3-q_3)^2} \] \[ d(p,q) = \sqrt{(p_1-q_1)^2 + (p_2-q_2)^2 + (p_3-q_3)^2} \]

This is a little more computationally expensive and a little more accurate than Manhattan distance, but still relatively inaccurate because of the non-uniformity of the colour space.

There are several adjustments upon this algorithm which attempt to introduce weights, however the improvements are minimal. What we really need is to change the colour space we're using.

CIE76

In 1976, the Commission Internationale de L'Éclairage attempted to define a more perceptually uniform colour space. They came up with LAB (or CIELab), expresses colour in terms of Lightness (\(L\) \(L\) ), green/red spectrum (\(a\) \(a\) ), and blue/yellow spectrum (\(b\) \(b\) ).

The CIE76 colour difference formula is simply the application of the Euclidean distance algorithm to two \(L*a*b*\) \(L*a*b*\) co-ordinates:

\[ \Delta E^{*}_{ab} = \sqrt{(L^*_2-L^*_1)^2 + (a^*_2-a^*_1)^2 + (b^*_2-b^*_1)^2} \] \[ \Delta E^{*}_{ab} = \sqrt{(L^*_2-L^*_1)^2 + (a^*_2-a^*_1)^2 + (b^*_2-b^*_1)^2} \]

Research has shown that a \(\Delta E^*_{ab}\) \(\Delta E^*_{ab}\) of around 2.3 is a "just noticeable difference".

Unfortunately, due to limitations in the CIELab colour space, this algorithm performs more poorly in blue, gray, and low-chroma (saturation) regions.

CIE94

In 1994, the CIE improved upon the CIE76 algorithm in an attempt to address non-uniformities. They introduced some weight parameters \(k_L\) \(k_L\) , \(k_C\) \(k_C\) and \(k_H\) \(k_H\) .

For graphic arts, the parameters are:

\[ k_L = 1 \] \[ k_L = 1 \]
\[ K_1 = 0.045 \] \[ K_1 = 0.045 \]
\[ K_2 = 0.015 \] \[ K_2 = 0.015 \]

For textile arts, they are:

\[ k_L = 2 \] \[ k_L = 2 \]
\[ K_1 = 0.048 \] \[ K_1 = 0.048 \]
\[ K_2 = 0.014 \] \[ K_2 = 0.014 \]

The final algorithm also relies on the following functions:

\[ C^*_n = \sqrt{{a^*_n}^2 + {b^*_n}^2} \] \[ C^*_n = \sqrt{{a^*_n}^2 + {b^*_n}^2} \]
\[ \Delta H* = \sqrt{{\Delta E^*_{ab}}^2 - \Delta L^{*2} - \Delta C^{*2}} = \sqrt{\Delta a^{*2} + \Delta b^{*2} - \Delta C^{*2}} \] \[ \Delta H* = \sqrt{{\Delta E^*_{ab}}^2 - \Delta L^{*2} - \Delta C^{*2}} = \sqrt{\Delta a^{*2} + \Delta b^{*2} - \Delta C^{*2}} \]
\[ S_L = 1 \] \[ S_L = 1 \]
\[ S_C = 1 + K_1C^*_1 \] \[ S_C = 1 + K_1C^*_1 \]
\[ S_H = 1 + K_2C^*_1 \] \[ S_H = 1 + K_2C^*_1 \]

With all that set up, difference between colours \((L^*_1, a^*_1, b^*_1)\) \((L^*_1, a^*_1, b^*_1)\) and \((L^*_2, a^*_2, b^*_2)\) \((L^*_2, a^*_2, b^*_2)\) is defined as:

\[ \Delta E^*_{94} = \sqrt{({{\Delta L*}\over{k_LS_L}})^2 + ({{\Delta C*}\over{k_CS_C}})^2 + ({{\Delta L*}\over{k_HS_H}})^2} \] \[ \Delta E^*_{94} = \sqrt{({{\Delta L*}\over{k_LS_L}})^2 + ({{\Delta C*}\over{k_CS_C}})^2 + ({{\Delta L*}\over{k_HS_H}})^2} \]

While this was better than CIE74, it still retained the problems with blue, grey, and low-chroma regions.

CIEDE2000

Since CIE94 didn't fully resolve perceptual uniformity issues, the CIE introduced CIEDE2000 in 2001. This algorithm applies a series of corrections to better align computed colour differences with human perception, particularly addressing the blue, grey, and low-chroma options.

There's a lot more setup required. We have to adjust (or "prime") some values to work with the final equation.

To compute chroma, then correct it:

\[ C^*_1 = \sqrt{a_1^{*2} + b_1^{*2}}, \quad C^*_2 = \sqrt{a_2^{*2} + b_2^{*2}} \] \[ C^*_1 = \sqrt{a_1^{*2} + b_1^{*2}}, \quad C^*_2 = \sqrt{a_2^{*2} + b_2^{*2}} \]
\[ \bar{C}^* = \frac{C^*_1 + C^*_2}{2} \] \[ \bar{C}^* = \frac{C^*_1 + C^*_2}{2} \]
\[ G = \frac{1}{2} \left(1 - \sqrt{\frac{\bar{C}^{*7}}{\bar{C}^{*7} + 25^7}} \right) \] \[ G = \frac{1}{2} \left(1 - \sqrt{\frac{\bar{C}^{*7}}{\bar{C}^{*7} + 25^7}} \right) \]
\[ a'_1 = (1 + G)a^*_1, \quad a'_2 = (1 + G)a^*_2 \] \[ a'_1 = (1 + G)a^*_1, \quad a'_2 = (1 + G)a^*_2 \]
\[ C'_1 = \sqrt{a_1'^2 + b_1^{*2}}, \quad C'_2 = \sqrt{a_2'^2 + b_2^{*2}} \] \[ C'_1 = \sqrt{a_1'^2 + b_1^{*2}}, \quad C'_2 = \sqrt{a_2'^2 + b_2^{*2}} \]

To compute hue (angles should be expressed in degrees):

\[ h'_1 = \text{atan2}(b^*_1, a'_1), \quad h'_2 = \text{atan2}(b^*_2, a'_2) \] \[ h'_1 = \text{atan2}(b^*_1, a'_1), \quad h'_2 = \text{atan2}(b^*_2, a'_2) \]

To find the necessary differences:

\[ \Delta L' = L^*_2 - L^*_1 \] \[ \Delta L' = L^*_2 - L^*_1 \]
\[ \Delta C' = C'_2 - C'_1 \] \[ \Delta C' = C'_2 - C'_1 \]

Hue difference requires special handling due to angle wrapping:

\[ \Delta h' = \begin{cases} 0 & \text{if } C'_1 C'_2 = 0 \\ h'_2 - h'_1 & \text{if } |h'_2 - h'_1| \le 180^\circ \\ h'_2 - h'_1 + 360^\circ & \text{if } h'_2 - h'_1 < -180^\circ \\ h'_2 - h'_1 - 360^\circ & \text{if } h'_2 - h'_1 > 180^\circ \end{cases} \] \[ \Delta h' = \begin{cases} 0 & \text{if } C'_1 C'_2 = 0 \\ h'_2 - h'_1 & \text{if } |h'_2 - h'_1| \le 180^\circ \\ h'_2 - h'_1 + 360^\circ & \text{if } h'_2 - h'_1 < -180^\circ \\ h'_2 - h'_1 - 360^\circ & \text{if } h'_2 - h'_1 > 180^\circ \end{cases} \]
\[ \Delta H' = 2\sqrt{C'_1 C'_2} \sin\left(\frac{\Delta h'}{2}\right) \] \[ \Delta H' = 2\sqrt{C'_1 C'_2} \sin\left(\frac{\Delta h'}{2}\right) \]

To compute mean values:

\[ \bar{L}' = \frac{L^*_1 + L^*_2}{2}, \quad \bar{C}' = \frac{C'_1 + C'_2}{2} \] \[ \bar{L}' = \frac{L^*_1 + L^*_2}{2}, \quad \bar{C}' = \frac{C'_1 + C'_2}{2} \]
\[ \bar{h}' = \begin{cases} h'_1 + h'_2 & \text{if } C'_1 C'_2 = 0 \\ \frac{h'_1 + h'_2}{2} & \text{if } |h'_1 - h'_2| \le 180^\circ \\ \frac{h'_1 + h'_2 + 360^\circ}{2} & \text{if } |h'_1 - h'_2| > 180^\circ \text{ and } h'_1 + h'_2 < 360^\circ \\ \frac{h'_1 + h'_2 - 360^\circ}{2} & \text{otherwise} \end{cases} \] \[ \bar{h}' = \begin{cases} h'_1 + h'_2 & \text{if } C'_1 C'_2 = 0 \\ \frac{h'_1 + h'_2}{2} & \text{if } |h'_1 - h'_2| \le 180^\circ \\ \frac{h'_1 + h'_2 + 360^\circ}{2} & \text{if } |h'_1 - h'_2| > 180^\circ \text{ and } h'_1 + h'_2 < 360^\circ \\ \frac{h'_1 + h'_2 - 360^\circ}{2} & \text{otherwise} \end{cases} \]

Weighing functions:

\[ S_L = 1 + \frac{0.015(\bar{L}' - 50)^2}{\sqrt{20 + (\bar{L}' - 50)^2}} \] \[ S_L = 1 + \frac{0.015(\bar{L}' - 50)^2}{\sqrt{20 + (\bar{L}' - 50)^2}} \]
\[ S_C = 1 + 0.045\bar{C}' \] \[ S_C = 1 + 0.045\bar{C}' \]
\[ T = 1 - 0.17\cos(\bar{h}' - 30^\circ) + 0.24\cos(2\bar{h}') + 0.32\cos(3\bar{h}' + 6^\circ) - 0.20\cos(4\bar{h}' - 63^\circ) \] \[ T = 1 - 0.17\cos(\bar{h}' - 30^\circ) + 0.24\cos(2\bar{h}') + 0.32\cos(3\bar{h}' + 6^\circ) - 0.20\cos(4\bar{h}' - 63^\circ) \]
\[ S_H = 1 + 0.015\bar{C}'T \] \[ S_H = 1 + 0.015\bar{C}'T \]

Rotational term:

\[ R_T = -2\sqrt{\frac{\bar{C}'^7}{\bar{C}'^7 + 25^7}} \sin\left(60^\circ e^{-((\bar{h}' - 275^\circ)/25)^2}\right) \] \[ R_T = -2\sqrt{\frac{\bar{C}'^7}{\bar{C}'^7 + 25^7}} \sin\left(60^\circ e^{-((\bar{h}' - 275^\circ)/25)^2}\right) \]

And with all that, we can use the final equation:

\[ \Delta E^*_{00} = \sqrt{ \left(\frac{\Delta L'}{k_L S_L}\right)^2 + \left(\frac{\Delta C'}{k_C S_C}\right)^2 + \left(\frac{\Delta H'}{k_H S_H}\right)^2 + R_T \left(\frac{\Delta C'}{k_C S_C}\right)\left(\frac{\Delta H'}{k_H S_H}\right) } \] \[ \Delta E^*_{00} = \sqrt{ \left(\frac{\Delta L'}{k_L S_L}\right)^2 + \left(\frac{\Delta C'}{k_C S_C}\right)^2 + \left(\frac{\Delta H'}{k_H S_H}\right)^2 + R_T \left(\frac{\Delta C'}{k_C S_C}\right)\left(\frac{\Delta H'}{k_H S_H}\right) } \]

CIEDE2000 is considered one of the best available colour comparison algorithms.

Afterthoughts

In my specific use case, there was a lot to optimise beyond this. I had to optimise finding the best match among hundreds or thousands of candidates. That was a whole different can of worms... and maybe one day I'll open it on this blog too.