Extracting All Colors in Images with Python

Nov 28, 2022

Improving your computer’s ability to see a wider range of colors.

Recently I built on top of the Pillow library so that it can detect a wider range of colors in a given image. As we all know, there are many colors in the universe, and I want to help computers recognize more of them. I highly recommend running my Jupyter Notebook first to visualize the detected colors.

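Simply running these lines of code did not do much for me, because getcolors() returns None whenever the image contains more distinct colors than maxcolors (256 here), which is typical for photographs and artwork: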

from PIL import Image

img = Image.open(image_path)
# getcolors() returns None if the image has more than maxcolors distinct colors
colors = img.convert('RGB').getcolors(maxcolors=256)
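To see the limit in action, here is a small sketch using a made-up 20 × 20 image in which every pixel has a different color (400 distinct colors). With the default limit Pillow stops counting and returns None; raising maxcolors to the pixel count guarantees a full count, returned as (count, (r, g, b)) tuples.

```python
from PIL import Image

# Hypothetical 20x20 test image where every pixel has a distinct color
# (400 unique colors, more than getcolors' default limit of 256)
img = Image.new('RGB', (20, 20))
for x in range(20):
    for y in range(20):
        img.putpixel((x, y), (x * 10, y * 10, 128))

# With the default limit, Pillow gives up and returns None
limited = img.getcolors(maxcolors=256)

# Allowing up to one color per pixel guarantees a complete count
width, height = img.size
all_colors = img.getcolors(maxcolors=width * height)

print(limited)          # None
print(len(all_colors))  # 400
```

This gives raw RGB counts, but not color names, which is what the rest of this article adds.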

Here is a reference to Pillow’s documentation, specifically the Image module: https://pillow.readthedocs.io/en/stable/reference/Image.html

To tackle this problem, I first built a web scraper to collect color names and their associated color codes. I collected a total of 865 different colors for my demo and saved them in my colors.csv file. Don’t worry, this part is already done and is accessible in my GitHub repository. The source data can be found here for your reference: https://www.colorhexa.com/

Here is what the colors database looks like.
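The functions below read this data through a pandas DataFrame named `colors` with the columns `color`, `code` (a hex string such as `#ff0000`), `R`, `G` and `B`. A minimal sketch of loading it, assuming those column names; the three rows are illustrative stand-ins for colors.csv, and the hex-to-RGB step is only needed if your copy of the file lacks the R, G and B columns:

```python
import io
import pandas as pd

# Illustrative stand-in for colors.csv (only 'color' and 'code' columns here)
csv_text = """color,code
Black,#000000
White,#ffffff
Red,#ff0000
"""
colors = pd.read_csv(io.StringIO(csv_text))  # in practice: pd.read_csv('colors.csv')

# Derive R, G, B (0-255) from the two-digit hex pairs of each code
hex_digits = colors['code'].str.lstrip('#')
colors['R'] = hex_digits.str[0:2].apply(lambda h: int(h, 16))
colors['G'] = hex_digits.str[2:4].apply(lambda h: int(h, 16))
colors['B'] = hex_digits.str[4:6].apply(lambda h: int(h, 16))
```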

Second, I made a function that resizes the original image so that neither its width nor its height is greater than 100 pixels, while maintaining its aspect ratio to prevent distortion. Because the next step visits every pixel, the work grows with width × height, so smaller dimensions improved runtime significantly.

def resize_image(width, height, threshold):
    """
    Function takes in an image's original dimensions and returns the
    new width and height, preserving the aspect ratio, so that both
    are at most the threshold. Purpose is to reduce runtime without
    distorting the original image too much.

    Parameters
    ----------
    width : int
        original width of image
    height : int
        original height of image
    threshold : int
        max dimension size for both width and height

    Return
    ------
    tuple of int (new_width, new_height)
    """
    if width <= threshold and height <= threshold:
        return width, height
    if height >= width:
        # Height is the longer side: pin it to the threshold, scale width
        # (max(1, ...) avoids a zero-pixel side for extreme aspect ratios)
        new_width = max(1, int(width * threshold / height))
        new_height = threshold
    else:
        # Width is the longer side: pin it to the threshold, scale height
        new_width = threshold
        new_height = max(1, int(height * threshold / width))
    return new_width, new_height
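Pillow also has a built-in helper for this kind of bounded, aspect-preserving shrink: `Image.thumbnail()`. It is a swapped-in alternative, not what the function above uses, and it behaves a little differently: it modifies the image in place, it may round the short side instead of truncating it with int() (so the result can differ by one pixel), and it never enlarges an image that already fits. A minimal sketch with made-up image sizes:

```python
from PIL import Image

# Hypothetical 1200 x 800 blank image standing in for a real photo
img = Image.new('RGB', (1200, 800))

# thumbnail() shrinks in place so the image fits inside a 100 x 100 box,
# keeping the aspect ratio
img.thumbnail((100, 100))
print(img.size)

# An image that already fits is left unchanged (thumbnail never enlarges)
small = Image.new('RGB', (50, 30))
small.thumbnail((100, 100))
print(small.size)  # (50, 30)
```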

Third, I wrote a function that iterates over every pixel of the resized image and uses a hash map (a Python dictionary) to count how many times each color occurs.

def detect_colors(image_path):
    """
    Function returns colors detected in image. 
    
    Parameters
    ----------
    image_path : str
        path to imagefile for detection
        
    Return
    ------
    sorted list of tuples (color, total number detections) 
    """
    
    # Read image
    image = Image.open(image_path)
    
    # Convert image into RGB
    image = image.convert('RGB')

    # Get width and height of image
    width, height = image.size
    print(f'Original dimensions: {width} x {height}')
    
    # Resize image to improve runtime
    width, height = resize_image(width, height, threshold=100)
    print(f'New dimensions: {width} x {height}')
    image = image.resize((width, height))
 
    # Iterate through each pixel
    detected_colors = {} # hash-map
    for x in range(0, width):
        for y in range(0, height):
            # r,g,b value of pixel
            r, g, b = image.getpixel((x, y))
            rgb = f'{r}:{g}:{b}'
            if rgb in detected_colors:
                detected_colors[rgb] += 1
            else: 
                detected_colors[rgb] = 1
 
    # Sort colors from most common to least common
    detected_colors = sorted(detected_colors.items(), key=lambda x: x[1], reverse=True)
    return detected_colors
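The dictionary bookkeeping in `detect_colors` can also be written with `collections.Counter`, which handles the "seen before or not" check for you and sorts with `most_common()`. A sketch assuming the same 'r:g:b' string keys as above; `count_colors` is a hypothetical name and the 4 × 4 image is made-up test data:

```python
from collections import Counter
from PIL import Image

def count_colors(image):
    """Return ('r:g:b', count) pairs sorted from most to least common."""
    image = image.convert('RGB')
    width, height = image.size
    # Counter tallies each (r, g, b) tuple returned by getpixel
    counts = Counter(image.getpixel((x, y)) for x in range(width) for y in range(height))
    # most_common() sorts by count, highest first (ties keep first-seen order)
    return [(f'{r}:{g}:{b}', n) for (r, g, b), n in counts.most_common()]

# Hypothetical 4x4 test image: red everywhere except a blue top row
img = Image.new('RGB', (4, 4), (255, 0, 0))
for x in range(4):
    img.putpixel((x, 0), (0, 0, 255))

print(count_colors(img))  # [('255:0:0', 12), ('0:0:255', 4)]
```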

Fourth, for each detected color, I calculated the absolute differences between its (R,G,B) values and those of every reference color in colors.csv, and added them up into a single distance. I stored these distances in a list of dictionaries and picked the reference color with the shortest distance as the best match.
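Written out, the distance used here is the sum of absolute differences (also called the Manhattan or L1 distance) between a detected color $c$ and a reference color $i$:

$$
d(c, i) = |R_c - R_i| + |G_c - G_i| + |B_c - B_i|, \qquad \text{best match}(c) = \arg\min_i \, d(c, i)
$$

For example, a detected color (250, 10, 5) is at distance $|250-255| + |10-0| + |5-0| = 20$ from pure red (255, 0, 0), but $265$ from black (0, 0, 0), so red is the better match.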

def get_color_codes(detected_colors):
    """
    Function finds the best matches between detected color codes
    and source color codes from: https://www.colorhexa.com

    Parameters
    ----------
    detected_colors : list
        list of (color, count) tuples returned by detect_colors

    Return
    ------
    color_codes : list
        list of best-matching hex codes (without '#'), no duplicates
    """
    # Relies on a global `colors` DataFrame loaded from colors.csv
    # with columns 'color', 'code', 'R', 'G', 'B'
    color_codes = []
    for detected_color, _count in detected_colors:
        r0, g0, b0 = (int(v) for v in detected_color.split(':'))

        # Calculate absolute differences against every reference color
        color_map = []
        for _, row in colors.iterrows():
            r = abs(r0 - row['R'])
            g = abs(g0 - row['G'])
            b = abs(b0 - row['B'])

            # Map results
            color_map.append({
                'color': row['color'],
                'code': row['code'].replace('#', ''),
                'distance': r + g + b,
            })

        # Get best match (shortest distance)
        best_match = min(color_map, key=lambda x: x['distance'])

        # Keep each matched color code only once
        color_code = best_match['code']
        if color_code not in color_codes:
            color_codes.append(color_code)

    return color_codes
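Because `get_color_codes` loops over every reference row with `iterrows()` for every detected color, it does M × N Python-level iterations. One possible speed-up (a swapped-in technique, not the method above) is to compute all distances at once with NumPy broadcasting. In this sketch, `nearest_reference_codes`, `ref_rgb` and `ref_codes` are hypothetical names; with the real data they could be built as `colors[['R', 'G', 'B']].to_numpy(dtype=np.int64)` and `colors['code'].str.replace('#', '').tolist()`. Memory grows with M × N, so very long color lists may need to be processed in chunks.

```python
import numpy as np

def nearest_reference_codes(detected_colors, ref_rgb, ref_codes):
    """Hypothetical vectorized variant of get_color_codes.

    detected_colors: list of ('r:g:b', count) tuples
    ref_rgb: (N, 3) int64 array of reference R, G, B values
    ref_codes: length-N list of hex codes without '#'
    """
    # (M, 3) array of detected colors; int64 avoids uint8 wrap-around on subtraction
    detected = np.array([[int(v) for v in c.split(':')] for c, _ in detected_colors],
                        dtype=np.int64)
    # (M, N) matrix of sum-of-absolute-differences distances via broadcasting
    distances = np.abs(detected[:, None, :] - ref_rgb[None, :, :]).sum(axis=2)
    # Index of the closest reference color for each detected color
    best = distances.argmin(axis=1)
    # Deduplicate while preserving order, like the loop version
    codes = []
    for i in best:
        if ref_codes[i] not in codes:
            codes.append(ref_codes[i])
    return codes

# Made-up reference palette and detections
ref_rgb = np.array([[0, 0, 0], [255, 255, 255], [255, 0, 0]], dtype=np.int64)
ref_codes = ['000000', 'ffffff', 'ff0000']
detected = [('250:10:5', 40), ('3:2:1', 25), ('240:240:240', 10), ('200:30:30', 5)]
print(nearest_reference_codes(detected, ref_rgb, ref_codes))
# ['ff0000', '000000', 'ffffff']
```

Like Python's `min()`, `argmin()` returns the first index on ties, so both versions pick the same match.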

To improve runtime further, I recommend slicing the list of detected colors. Many detected colors are only slight variations of one another and end up matching the same reference color, so analyzing every 10th detected color (rather than every one) is usually enough, at the cost of possibly skipping a rare but distinct color.

color_codes = get_color_codes(detected_colors[0::10])  # list slice: every 10th detected color

Finally, I looked up the color name associated with each closest match and stored the results in a pandas DataFrame.

import pandas as pd

def get_association(color_codes):
    """
    Function returns color name associated w/ detected color codes.

    Parameters
    ----------
    color_codes : list
        list of detected color codes in image (hex, without '#')

    Return
    ------
    res : pandas.DataFrame
        one row per color code, with columns 'color name' and 'color code'
    """
    # Relies on the global `colors` DataFrame loaded from colors.csv
    res = []
    for color_code in color_codes:
        # Query color name associated with color code
        match = colors[colors['code'] == f'#{color_code}']
        color_name = match['color'].values[0]

        # Append results
        res.append({
            'color name': color_name,
            'color code': f'#{color_code}'
        })

    # Generate pandas dataframe (empty, with the same columns, if nothing matched)
    return pd.DataFrame(res, columns=['color name', 'color code'])
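The same lookup can also be done in a single pandas join instead of one filter per color code. This is a hypothetical variant (`get_association_merge` is not part of the original code), sketched with a three-row stand-in for colors.csv. A left merge keeps the order of the left table, so the most dominant color stays first. Unlike the loop, a code missing from colors.csv yields NaN instead of raising an IndexError, and duplicated codes in the file would produce duplicate rows.

```python
import pandas as pd

# Illustrative stand-in for the colors.csv DataFrame
colors = pd.DataFrame({
    'color': ['Black', 'White', 'Red'],
    'code': ['#000000', '#ffffff', '#ff0000'],
})

def get_association_merge(color_codes, colors):
    """Hypothetical merge-based variant: look up all color names in one join."""
    wanted = pd.DataFrame({'color code': [f'#{c}' for c in color_codes]})
    # how='left' preserves the order of `wanted` (most to least dominant)
    merged = wanted.merge(colors[['color', 'code']], how='left',
                          left_on='color code', right_on='code')
    return merged.rename(columns={'color': 'color name'})[['color name', 'color code']]

print(get_association_merge(['ff0000', '000000'], colors))
```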

Read color strips from left to right, then top to bottom to get most dominant to least dominant.
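The strips are swatches laid out row by row in order of dominance. A hypothetical sketch of drawing such a grid with Pillow's `ImageColor.getrgb` and `Image.paste`; the function name `render_strips`, 10 swatches per row and the 20-pixel swatch size are assumptions, not taken from the original notebook:

```python
import math
from PIL import Image, ImageColor

def render_strips(hex_codes, per_row=10, size=20):
    """Draw one square swatch per hex code, filling rows left to right, then top to bottom."""
    rows = max(1, math.ceil(len(hex_codes) / per_row))
    canvas = Image.new('RGB', (per_row * size, rows * size), 'white')
    for i, code in enumerate(hex_codes):
        row, col = divmod(i, per_row)
        x0, y0 = col * size, row * size
        # paste() with a color tuple fills the given box with that color
        canvas.paste(ImageColor.getrgb(code), (x0, y0, x0 + size, y0 + size))
    return canvas

strips = render_strips(['#ff0000', '#00ff00', '#0000ff'], per_row=2, size=10)
# strips.save('strips.png')  # hypothetical output path
```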

I got a total of 89 detected color tones and shades from Andy Warhol’s Marilyn Monroe, but the top 10 are usually enough for most use cases. I simply wanted to display them all to show the full range of colors my code can detect.

Warning: Lighting and shadows can affect how the algorithm detects colors. Other than that, thanks for reading! My source code and Jupyter Notebook can be accessed via GitHub. Please give me a follow and let me know if you run into any issues with my code. Feel free to contact me here.

Thank you so much for making it to the end of this page. If you found me helpful, please feel free to support me and Shegocodes by giving me a follow here on Medium and/or buying me a cup of coffee so I can continue to contribute to open source work and build. Happy Coding!

Written by Shegocodes