How to sort generative art patterns by beauty (A simple clustering example with Python and sklearn)
Introduction
Some time ago I created this small script to convert numbers into patterns. I'm not going to explain how the script works in detail, but it is inspired by Stephen Wolfram's Elementary Cellular Automata: it converts a number like 30 into binary (00011110) and then interprets each digit as turning ON or OFF one of 8 basic rules that define when a pixel in the image is turned ON or OFF (in this case 4 rules are activated: rules 4, 5, 6 and 7).
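As a rough illustration of that encoding, here is a sketch of just the number-to-rules step (this is not the actual pattern-generation script):

# Sketch of the number-to-rules encoding described above (not the real script)
def active_rules(number, n_rules=8):
    bits = format(number, '0{}b'.format(n_rules))  # e.g. 30 -> '00011110'
    # Each binary digit turns one of the 8 basic rules ON (1) or OFF (0)
    return [i + 1 for i, b in enumerate(bits) if b == '1']

print(active_rules(30))  # [4, 5, 6, 7]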
Using this I can generate an endless stream of different patterns. The problem is that most of them are not really interesting, and I have no time to check them one by one. That's why in this post I explain how I tried to automate the process of finding the most interesting/beautiful cellular automata.
Clustering
My goal is to group the patterns by their beauty. I do this using a clustering algorithm based on features frequently associated with beauty, such as fractal dimension and compression efficiency. You can read more about these features in Forsythe, Alex, et al., "Predicting beauty: fractal dimension and visual complexity in art."
The Code
The full code is here, but I also uploaded it to Colab here so you can run everything from your web browser.
Defining clustering attributes
First I define the previously mentioned attributes: fractal dimension (code taken from here) and compression score (the size of the raw TIFF image divided by its size when compressed as a GIF image).
from fractaldimension import fractal_dimension
import cv2
import os

def fractalDimension(number):
    # Load the pattern image in grayscale
    im = cv2.imread('images/' + str(number) + '.tiff', cv2.IMREAD_GRAYSCALE)
    return fractal_dimension(im, 0.9)

def compressionScore(number):
    # Size (in bytes) of the GIF-compressed version of the pattern
    gif = os.stat('images/' + str(number) + '.gif').st_size
    # Size (in bytes) of the raw TIFF version of the pattern
    tiff = os.stat('images/' + str(number) + '.tiff').st_size
    # Higher ratio = the pattern compresses better (lower visual complexity)
    return tiff / gif
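With these two functions you can build the feature table used in the next section. The original code builds this DataFrame elsewhere; the sketch below is one possible way to do it, and the range of pattern numbers is an assumption you should adjust to your own generator:

# A possible way to build the feature table `df` used below (a sketch;
# the range of pattern numbers is hypothetical)
import pandas as pd

numbers = list(range(256))  # hypothetical set of pattern numbers
df = pd.DataFrame({
    'Fractal Dimension': [fractalDimension(n) for n in numbers],
    'Compression Efficiency': [compressionScore(n) for n in numbers],
}, index=numbers)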
Applying the clustering algorithm
There are several clustering algorithms; you can choose the one that best fits your use case.
In my case I ended up using Agglomerative Clustering, which captures the clusters generated by this dataset better than the alternatives I tried.
You need to specify the number of clusters. I experimented with different values and in the end chose 5, since it grouped the patterns well, from null patterns to crazy and chaotic ones.
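If you want a more systematic way to compare cluster counts than eyeballing the groups (which is what I did here), something like the silhouette score can help. This is just an optional sketch, not part of the original analysis:

# Optional: compare cluster counts with the silhouette score (a sketch;
# the choice of 5 above was made by visual inspection, not by this metric)
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

X = df[['Fractal Dimension', 'Compression Efficiency']].values
for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(k, silhouette_score(X, labels))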
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# Apply the clustering algorithm to the two features
features = df[['Fractal Dimension', 'Compression Efficiency']].values
clustering = AgglomerativeClustering(n_clusters=5).fit(features)
df['cluster'] = clustering.labels_

# Plot the results, one color per cluster label
fig, ax = plt.subplots()
df.plot(kind='scatter', x='Fractal Dimension', y='Compression Efficiency',
        c='cluster', cmap='gist_rainbow', ax=ax)
plt.show()
Results
Here I show some samples of each cluster, sorted from the simplest to the most complex. As you can see, this method is useful for identifying and discarding uninteresting patterns such as the ones from Cluster 0. It is also useful for finding the most beautiful patterns: most of the best patterns I found are from Cluster 3, the one with high complexity but not the highest fractal dimension.
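In practice this means the interesting candidates can be selected automatically. Note that cluster labels are arbitrary between runs, so the labels 3 and 0 below are just the ones this particular run happened to assign:

# Select candidates by cluster label (labels are arbitrary per run;
# check the scatter plot above to see which label is which)
interesting = df[df['cluster'] == 3].index  # high-complexity cluster
boring = df[df['cluster'] == 0].index       # null / near-empty patterns
print(len(interesting), 'candidates to review,', len(boring), 'discarded')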
Comments

I have as many as 30 numbers measuring the attributes of each of the horses in thousands and thousands of horse races. I want to know if I can use clustering to study the patterns that emerge from these numbers and classify horses into three groups: winners, top 3, and losers. Or would other Python/machine learning approaches be more appropriate?
Yes, that might work. Another approach, if you already have historical data, is to tag the horses as winners, top 3 and losers based on past results, then use a classifier such as a random forest to learn how to identify each type of horse from the attributes you chose.
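For example, something along these lines (a minimal sketch; the `races` DataFrame, the file name, the column names and the label column are all hypothetical placeholders for your own data):

# Minimal sketch of the supervised approach (hypothetical data and columns)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

races = pd.read_csv('races.csv')            # hypothetical file of past races
feature_columns = ['speed', 'age', 'odds']  # placeholders for your ~30 numbers
X = races[feature_columns]
y = races['result_group']                   # 'winner', 'top3' or 'loser'

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier().fit(X_train, y_train)
print(clf.score(X_test, y_test))            # accuracy on held-out races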
Thanks for your reply. I have little knowledge about machine learning. However, I am thinking that supervised learning would be appropriate for studying how successful the most popular theoretical variables are at classifying groups of horses (or postdicting/predicting winners). On the other hand, would it also be possible to use unsupervised learning, in the hope that new insights/patterns about the relationships between measurements and variables might be uncovered?
Yes, unsupervised learning can be good for identifying new groups. For example, if you are not sure that "winners", "top 3" and "losers" is a good or realistic partition, you could try a clustering algorithm and see which clusters arise; maybe it groups horses into a new but still useful set of groups.
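Continuing with the hypothetical `races` data and `feature_columns` from the previous sketch, that could look like this (k-means is just one possible choice of algorithm):

# Sketch: let the data suggest its own groups (hypothetical columns as above)
from sklearn.cluster import KMeans

labels = KMeans(n_clusters=3, n_init=10).fit_predict(races[feature_columns])
races['cluster'] = labels
# Compare the discovered clusters against the known race outcomes
print(races.groupby('cluster')['result_group'].value_counts())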