Onicescu correlation coefficient-Python - Alexandru Daia

Published Nov 15, 2017

This post presents the Onicescu correlation coefficient, which is based on kinetic energy; see here for more details.

My idea is to use this new correlation coefficient as a performance metric in place of cross entropy for neural networks, or as a fitness function for genetic algorithms.

This will act as a building block for the next implementation that I will post.
A simple function for kinetic energy :

import numpy as np

def kin_energy(random_vec):
    """Return the kinetic energy of a random vector represented as an (n,)-dimensional array."""
    # freq[0] holds the unique values, freq[1] how often each one occurs
    freq = np.unique(random_vec, return_counts=True)
    prob = freq[1] / random_vec.shape[0]
    energy = np.sum(prob**2)
    return energy

a = np.array([1, 3, 2, 2])
print(kin_energy(a))
# case with total energy: a single class, so the energy is 1
a = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
print(kin_energy(a))
# case converging to zero: every value is distinct
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(kin_energy(a))
0.375
1.0
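
As a quick check on these values, the sketch below recomputes the same quantity with exact fractions. It uses `collections.Counter` instead of `np.unique`, and the helper name `kin_energy_exact` is hypothetical (it is not part of the notebook). It illustrates that k equally likely classes give an energy of 1/k, so the value falls toward 0 as the number of distinct values grows and reaches 1 only when a single value occurs.

```python
from collections import Counter
from fractions import Fraction

def kin_energy_exact(values):
    """Sum of squared class probabilities, computed with exact fractions."""
    n = len(values)
    counts = Counter(values)
    return sum(Fraction(c, n) ** 2 for c in counts.values())

print(kin_energy_exact([1, 3, 2, 2]))        # 3/8, i.e. 0.375
print(kin_energy_exact([1] * 14))            # 1
print(kin_energy_exact(list(range(1, 11))))  # 1/10
```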

The issue with correlation :

Having 2 random vectors

$X = x_1, x_2, \ldots, x_N$

and

$Y = y_1, y_2, \ldots, y_N$

the informational correlation IC (this does not make use of kinetic energy) is

$C(X, Y) = C(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} p(x_i)\, p(y_i)$

The IC is bounded in [0, 1]; it is exactly 0 if both vectors are zero (the system is totally indifferent).
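
A small worked example may make the sum concrete. The vectors below are made up for illustration, and the sketch assumes the i-th class of x is paired with the i-th class of y in sorted order, which is the order `np.unique` returns them in.

```python
import numpy as np

x = np.array([1, 1, 2, 2])   # classes 1 and 2, each with probability 0.5
y = np.array([3, 3, 3, 4])   # classes 3 and 4, probabilities 0.75 and 0.25

# class probabilities = counts of each unique value / vector length
p_x = np.unique(x, return_counts=True)[1] / x.shape[0]
p_y = np.unique(y, return_counts=True)[1] / y.shape[0]

ic_value = float(np.dot(p_x, p_y))   # 0.5*0.75 + 0.5*0.25
print(ic_value)                      # 0.5
```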

The informational correlation coefficient (ICC) :
Similar to other statistical correlation coefficients, the IC can be normalised using kinetic energy, resulting in :

$O(X, Y) = \frac{\sum_{i=1}^{N} p(x_i)\, p(y_i)}{\sqrt{\sum_{i=1}^{N} p(x_i)^2 \cdot \sum_{i=1}^{N} p(y_i)^2}}$

Notice that the two sums in the denominator are just the kinetic energies, so the ICC can be written as :

$O(X, Y) = \frac{IC}{\sqrt{kinetic(X) \cdot kinetic(Y)}}$
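
Why the normalised value stays within [0, 1] can be seen from the Cauchy-Schwarz inequality, provided the denominator is taken under a square root (as the implementation further down in this post does). A short derivation, using the shorthand $p_i = p(x_i)$ and $q_i = p(y_i)$, which is introduced here only for brevity:

```latex
\[
\sum_{i} p_i q_i \;\le\; \sqrt{\sum_{i} p_i^{2}}\;\sqrt{\sum_{i} q_i^{2}}
\quad\Longrightarrow\quad
0 \;\le\; \frac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^{2}\,\sum_{i} q_i^{2}}} \;\le\; 1 .
\]
```

The lower bound holds because all probabilities are non-negative. The upper bound is reached when the two probability vectors are proportional, which for vectors that each sum to 1 means they are equal.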

The real issue is in the numerator :

To compute the dot product there, the two random vectors must have the same number of unique events (classes).

For example, if $x = (1, 2, 1)$ and $y = (4, 2, 5)$, then $IC := 1 \cdot 4 + 2 \cdot 2 + 1 \cdot 5$. But what if the two random vectors look like $x = (1, 2, 1, 4)$ and $y = (7, 5, 1)$? Not having the same shape prevents the correlation coefficient from working until a workaround is found.
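
The failure is easy to reproduce. In the sketch below (the vectors are illustrative, chosen so that the number of distinct values differs), the two probability vectors have lengths 3 and 2, and `np.dot` refuses to multiply them.

```python
import numpy as np

x = np.array([1, 2, 1, 4])   # 3 distinct values -> 3 probabilities
y = np.array([7, 7, 1])      # 2 distinct values -> 2 probabilities

p_x = np.unique(x, return_counts=True)[1] / x.shape[0]
p_y = np.unique(y, return_counts=True)[1] / y.shape[0]

try:
    np.dot(p_x, p_y)
    mismatch_detected = False
except ValueError as err:
    # NumPy reports that shapes (3,) and (2,) are not aligned
    mismatch_detected = True
    print("dot product failed:", err)
```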

Setting up the implementation :
def ic(vector1, vector2):
    """Return the information coefficient IC for 2 random variables,
    defined as the dot product of the probabilities corresponding to each class.
    """
    a = vector1
    b = vector2
    # get the class probabilities in order to take their dot product
    prob1 = np.unique(a, return_counts=True)[1] / a.shape[0]
    prob2 = np.unique(b, return_counts=True)[1] / b.shape[0]
    p1 = list(prob1)
    p2 = list(prob2)
    # pad the shorter list with zeros so both have the same length
    diff = len(p1) - len(p2)
    if diff > 0:
        p2.extend([0] * diff)
    if diff < 0:
        p1.extend([0] * -diff)
    return np.dot(np.array(p1), np.array(p2))
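
To see the zero-padding at work, here is a compact restatement of the same idea using `np.pad`. The name `ic_padded` and the input vectors are hypothetical and only serve the illustration; the padded class simply contributes nothing to the dot product.

```python
import numpy as np

def ic_padded(vector1, vector2):
    """Dot product of class probabilities, zero-padding the shorter vector."""
    p1 = np.unique(vector1, return_counts=True)[1] / len(vector1)
    p2 = np.unique(vector2, return_counts=True)[1] / len(vector2)
    size = max(len(p1), len(p2))
    p1 = np.pad(p1, (0, size - len(p1)))  # append zeros on the right
    p2 = np.pad(p2, (0, size - len(p2)))
    return float(np.dot(p1, p2))

x = np.array([1, 2, 1, 4])   # probabilities 0.5, 0.25, 0.25
y = np.array([7, 7, 1, 1])   # probabilities 0.5, 0.5 -> padded to 0.5, 0.5, 0

print(ic_padded(x, y))       # 0.5*0.5 + 0.25*0.5 + 0.25*0 = 0.375
```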

Finally, with functions for the kinetic energy of a vector and for the information correlation in place, we can define a new function that computes the kinetic correlation :

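def o(vector1, vector2):
    """Return the Onicescu information correlation based on kinetic energy."""
    i_c = ic(vector1, vector2)
    # normalise by the square root of the product of the two kinetic energies
    coefficient = i_c / np.sqrt(kin_energy(vector1) * kin_energy(vector2))
    return coefficient

Testing some toy use cases :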

Notice that I updated the formula so that the denominator contains a square root, which keeps the coefficient bounded between 0 and 1.

Example 1

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 3, 1, 5, 7, 9, 1])
o(a, b)
0.88191710368819676

Example 2

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 3, 1, 5, 7, 9, 11])
o(a, b)
1
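
The value in Example 1 can be checked by hand. Vector a has seven equally likely classes (kinetic energy 1/7), while b has one class with probability 2/7 and five with 1/7 (kinetic energy 9/49). The padded dot product is 1/7, so the coefficient is (1/7) / sqrt((1/7)(9/49)) = sqrt(7)/3 ≈ 0.8819. The sketch below confirms this and two basic properties; `onicescu` is a hypothetical, self-contained restatement of the functions above, not code from the notebook.

```python
import numpy as np

def onicescu(v1, v2):
    """Padded dot product of class probabilities over sqrt of the energies."""
    p1 = np.unique(v1, return_counts=True)[1] / len(v1)
    p2 = np.unique(v2, return_counts=True)[1] / len(v2)
    size = max(len(p1), len(p2))
    p1 = np.pad(p1, (0, size - len(p1)))
    p2 = np.pad(p2, (0, size - len(p2)))
    return float(np.dot(p1, p2) / np.sqrt(np.sum(p1**2) * np.sum(p2**2)))

a = np.array([1, 2, 3, 4, 5, 6, 7])
b = np.array([2, 3, 1, 5, 7, 9, 1])

print(np.isclose(onicescu(a, b), np.sqrt(7) / 3))   # True: closed form of Example 1
print(np.isclose(onicescu(a, b), onicescu(b, a)))   # True: the coefficient is symmetric
print(np.isclose(onicescu(b, b), 1.0))              # True: a vector against itself gives 1
```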

With this in place, one could simply use it instead of cross entropy, or as a fitness function for genetic algorithms.
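
As a rough sketch of the fitness-function idea, the snippet below scores a toy population of candidate label vectors against a target and keeps the best one. Everything here is an assumption made for illustration: the helper `onicescu_score` (a compact restatement of `o`), the target, and the made-up candidates. It is not the implementation announced for the next post.

```python
import numpy as np

def onicescu_score(v1, v2):
    """Hypothetical helper: same computation as o(), written compactly."""
    p1 = np.unique(v1, return_counts=True)[1] / len(v1)
    p2 = np.unique(v2, return_counts=True)[1] / len(v2)
    size = max(len(p1), len(p2))
    p1 = np.pad(p1, (0, size - len(p1)))
    p2 = np.pad(p2, (0, size - len(p2)))
    return float(np.dot(p1, p2) / np.sqrt(np.sum(p1**2) * np.sum(p2**2)))

target = np.array([0, 0, 1, 1, 2, 2])

# A toy "population" of candidate label vectors (made-up data).
population = {
    "single_class": np.array([0, 0, 0, 0, 0, 0]),
    "skewed": np.array([0, 0, 0, 1, 1, 2]),
    "balanced": np.array([0, 1, 2, 0, 1, 2]),
}

# Fitness of each candidate = its coefficient against the target.
fitness = {name: onicescu_score(cand, target) for name, cand in population.items()}
best = max(fitness, key=fitness.get)
print(best)   # balanced
```

In this toy run the balanced candidate wins although only two of its six labels match the target position by position. The score compares class-frequency profiles rather than element-wise agreement, which is worth keeping in mind when choosing where to use it.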

The associated notebook can be found here :
https://github.com/alexandrudaia/kinetic-correlation/blob/master/KInetic_Correlation.ipynb
