How to sort generative art patterns by beauty (A simple clustering example with Python and sklearn)
Introduction
Some time ago I created this small script to convert numbers into patterns. I'm not going to explain how the script works in detail, but it is inspired by Stephen Wolfram's Elementary Cellular Automata: it converts a number like 30 into binary (00011110) and then interprets each digit as turning ON or OFF one of 8 basic rules that define when a pixel in the image is turned ON or OFF (in this case 4 rules are activated: rules 4, 5, 6 and 7).
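As a rough illustration of that encoding, here is a sketch of just the number-to-rules step (this is not the actual pattern-generation script):

# Sketch of the number-to-rules encoding described above (not the real script)
def active_rules(number, n_rules=8):
    bits = format(number, '0{}b'.format(n_rules))  # e.g. 30 -> '00011110'
    # Each binary digit turns one of the 8 basic rules ON (1) or OFF (0)
    return [i + 1 for i, b in enumerate(bits) if b == '1']

print(active_rules(30))  # [4, 5, 6, 7]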
Using this I can generate an endless stream of different patterns. The problem is that most of them are not really interesting, and I have no time to check them one by one. That's why in this post I explain how I tried to automate the process of finding the most interesting/beautiful cellular automata.
Clustering
My goal is to group the patterns by their beauty. I do this using a clustering algorithm based on features frequently associated with beauty, such as fractal dimension and compression efficiency. You can read more about these features in Forsythe, Alex, et al., "Predicting beauty: fractal dimension and visual complexity in art."
The Code
The full code is here, but I also uploaded it to Colab here so you can run everything from your web browser.
Defining clustering attributes
First I define the previously mentioned attributes: fractal dimension (code taken from here) and compression score (the size of the raw TIFF image divided by its size when compressed as a GIF image).
from fractaldimension import fractal_dimension
import cv2
import os

def fractalDimension(number):
    # Load the pattern image in grayscale
    im = cv2.imread('images/' + str(number) + '.tiff', cv2.IMREAD_GRAYSCALE)
    return fractal_dimension(im, 0.9)

def compressionScore(number):
    # Size (in bytes) of the GIF-compressed version of the pattern
    gif = os.stat('images/' + str(number) + '.gif').st_size
    # Size (in bytes) of the raw TIFF version of the pattern
    tiff = os.stat('images/' + str(number) + '.tiff').st_size
    # Higher ratio = the pattern compresses better (lower visual complexity)
    return tiff / gif
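With these two functions you can build the feature table used in the next section. The original code builds this DataFrame elsewhere; the sketch below is one possible way to do it, and the range of pattern numbers is an assumption you should adjust to your own generator:

# A possible way to build the feature table `df` used below (a sketch;
# the range of pattern numbers is hypothetical)
import pandas as pd

numbers = list(range(256))  # hypothetical set of pattern numbers
df = pd.DataFrame({
    'Fractal Dimension': [fractalDimension(n) for n in numbers],
    'Compression Efficiency': [compressionScore(n) for n in numbers],
}, index=numbers)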
Applying the clustering algorithm
There are several clustering algorithms; you can choose the one that best fits your use case.
In my case I ended up using Agglomerative Clustering, which captures the clusters generated by this dataset better than the alternatives I tried.
You need to specify the number of clusters. I experimented with different values and in the end chose 5, since it grouped the patterns well, from null patterns to crazy and chaotic ones.
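If you want a more systematic way to compare cluster counts than eyeballing the groups (which is what I did here), something like the silhouette score can help. This is just an optional sketch, not part of the original analysis:

# Optional: compare cluster counts with the silhouette score (a sketch;
# the choice of 5 above was made by visual inspection, not by this metric)
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

X = df[['Fractal Dimension', 'Compression Efficiency']].values
for k in range(2, 8):
    labels = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    print(k, silhouette_score(X, labels))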
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt

# Apply the clustering algorithm to the two features
features = df[['Fractal Dimension', 'Compression Efficiency']].values
clustering = AgglomerativeClustering(n_clusters=5).fit(features)
df['cluster'] = clustering.labels_

# Plot the results, one color per cluster label
fig, ax = plt.subplots()
df.plot(kind='scatter', x='Fractal Dimension', y='Compression Efficiency',
        c='cluster', cmap='gist_rainbow', ax=ax)
plt.show()
Results
Here I show some samples of each cluster, sorted from the simplest to the most complex. As you can see, this method is useful for identifying and discarding uninteresting patterns such as the ones from Cluster 0. It is also useful for finding the most beautiful patterns: most of the best patterns I found are from Cluster 3, the one with high complexity but not the highest fractal dimension.
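In practice this means the interesting candidates can be selected automatically. Note that cluster labels are arbitrary between runs, so the labels 3 and 0 below are just the ones this particular run happened to assign:

# Select candidates by cluster label (labels are arbitrary per run;
# check the scatter plot above to see which label is which)
interesting = df[df['cluster'] == 3].index  # high-complexity cluster
boring = df[df['cluster'] == 0].index       # null / near-empty patterns
print(len(interesting), 'candidates to review,', len(boring), 'discarded')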
Comments

I have as many as 30 numbers measuring the attributes of each of the horses in thousands and thousands of horse races. I want to know if I can use clustering to study the patterns that emerge from these numbers and classify horses into three groups: winners, top 3, and losers. Or would other Python/machine learning approaches be more appropriate?
Yes, that might work. Another approach, if you already have historical data, is to tag the horses as winners, top 3 and losers based on past results, then use a classifier such as a random forest to learn how to identify each type of horse from the attributes you chose.
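For example, something along these lines (a minimal sketch; the `races` DataFrame, the file name, the column names and the label column are all hypothetical placeholders for your own data):

# Minimal sketch of the supervised approach (hypothetical data and columns)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

races = pd.read_csv('races.csv')            # hypothetical file of past races
feature_columns = ['speed', 'age', 'odds']  # placeholders for your ~30 numbers
X = races[feature_columns]
y = races['result_group']                   # 'winner', 'top3' or 'loser'

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
clf = RandomForestClassifier().fit(X_train, y_train)
print(clf.score(X_test, y_test))            # accuracy on held-out races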
Thanks for your reply. I have little knowledge about machine learning. However, I am thinking that supervised learning would be appropriate for studying how successful the most popular theoretical variables are at classifying groups of horses (or postdicting/predicting winners). On the other hand, would it also be possible to use unsupervised learning, in the hope that new insights/patterns about the relationships between measurements and variables might be uncovered?
Yes, unsupervised learning can be good for identifying new groups. For example, if you are not sure that "winners", "top 3" and "losers" is a good or realistic partition, you could try a clustering algorithm and see which clusters arise; maybe it groups horses into a new but still useful set of groups.
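Continuing with the hypothetical `races` data and `feature_columns` from the previous sketch, that could look like this (k-means is just one possible choice of algorithm):

# Sketch: let the data suggest its own groups (hypothetical columns as above)
from sklearn.cluster import KMeans

labels = KMeans(n_clusters=3, n_init=10).fit_predict(races[feature_columns])
races['cluster'] = labels
# Compare the discovered clusters against the known race outcomes
print(races.groupby('cluster')['result_group'].value_counts())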