Onicescu correlation coefficient-Python - Alexandru Daia
Here is the Onicescu correlation coefficient, based on kinetic energy; see here for more details.
My idea is to use this new correlation coefficient as a performance metric instead of cross entropy in neural networks, or as a fitness function in genetic algorithms.
This will act as a building block for the next implementation that I will post.
A simple kinetic energy function:
import numpy as np

def kin_energy(random_vec):
    """Return the kinetic energy of a random vector given as an (n,) array."""
    # counts of each distinct value in the vector
    freq = np.unique(random_vec, return_counts=True)
    # relative frequency (probability) of each distinct value
    prob = freq[1] / random_vec.shape[0]
    # kinetic energy: sum of the squared probabilities
    energy = np.sum(prob ** 2)
    return energy

a = np.array([1, 3, 2, 2])
print(kin_energy(a))
# case with total energy
a = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
print(kin_energy(a))
# case converging to zero
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(kin_energy(a))
0.375
1.0
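These values can be checked by hand. Writing $p_k$ for the relative frequency of the $k$-th distinct value (a symbol introduced here for illustration; it corresponds to `prob` in the code), the function computes the sum of the squared frequencies:

```latex
\begin{align*}
E(X) &= \sum_{k} p_k^{2} \\
E([1,3,2,2]) &= \left(\tfrac{1}{4}\right)^{2} + \left(\tfrac{1}{2}\right)^{2} + \left(\tfrac{1}{4}\right)^{2} = \tfrac{3}{8} = 0.375 \\
E([1,1,\ldots,1]) &= 1^{2} = 1 \\
E([1,2,\ldots,10]) &= 10 \cdot \left(\tfrac{1}{10}\right)^{2} = 0.1
\end{align*}
```

So the energy is 1 when a single value carries all the mass, and it equals $1/n$ for $n$ equally likely classes, which shrinks toward 0 as the number of classes grows.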
The issue with correlation :
Having 2 random vectors
$X = x_1, x_2, \ldots, x_N$
and
$Y = y_1, y_2, \ldots, y_N$
- The informational correlation IC (this does not make use of kinetic energy)
$$C(X,Y) = C(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_N) = \sum_{i=1}^{N} p(x_i)\, p(y_i)$$
The IC is bounded in [0,1], being exactly 0 if both vectors are zero (the system is totally indifferent).
- The INFORMATIONAL CORRELATION COEFFICIENT
Similar to other statistical correlation coefficients, the IC can be normalized using kinetic energy, resulting in:
$$O(X,Y) = \frac{\sum_{i=1}^{N} p(x_i)\, p(y_i)}{\sum_{i=1}^{N} p(x_i)^2 \cdot \sum_{i=1}^{N} p(y_i)^2}$$
Notice the denominator is just the product of the two kinetic energies, so the ICC coefficient can be written as:
$$O(X,Y) = \frac{IC}{\mathrm{kinetic}(X) \cdot \mathrm{kinetic}(Y)}$$
The real issue is in the numerator:
To compute the dot product there, the 2 random vectors must have the same cardinality of unique events (classes).
For example, if $x = 1, 2, 1$ and $y = 4, 2, 5$, then $IC := 1 \cdot 4 + 2 \cdot 2 + 1 \cdot 5$. But what if the 2 random vectors look like $x = 1, 2, 1, 4$ and $y = 7, 5, 1$? Not having the same shape prevents the correlation coefficient from working until something is figured out.
Setting up the implementation :
def ic(vector1, vector2):
    """Return the information coefficient IC for 2 random variables,
    defined as the dot product of the probabilities corresponding to each class.
    """
    a = vector1
    b = vector2
    # get the probs in order to do the dot product with them
    prob1 = np.unique(a, return_counts=True)[1] / a.shape[0]
    prob2 = np.unique(b, return_counts=True)[1] / b.shape[0]
    p1 = list(prob1)
    p2 = list(prob2)
    # pad the shorter probability list with zeros so both have the same length
    diff = len(p1) - len(p2)
    if diff > 0:
        p2.extend([0] * diff)
    if diff < 0:
        p1.extend([0] * -diff)
    return np.dot(np.array(p1), np.array(p2))
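Note that the zero-padding in `ic` pairs probabilities by their position in the sorted list of distinct values, not by the values themselves. As a point of comparison only, here is a minimal sketch of a variant that matches classes by label instead. The name `ic_by_label` is hypothetical and this is not the method used in the rest of this post:

```python
import numpy as np

def ic_by_label(vector1, vector2):
    """Hypothetical variant: dot product of class probabilities matched by label."""
    a = np.asarray(vector1)
    b = np.asarray(vector2)
    # sorted union of the classes seen in either vector
    labels = np.union1d(a, b)
    # probability of each label in each vector (0 where the label is absent)
    p1 = np.array([np.mean(a == lab) for lab in labels])
    p2 = np.array([np.mean(b == lab) for lab in labels])
    return float(np.dot(p1, p2))

x = np.array([1, 2, 1, 4])
y = np.array([7, 5, 1])
print(ic_by_label(x, y))
```

For these two vectors only the class 1 is shared, so the label-matched value is 0.5 * 1/3 = 1/6, whereas the positional `ic` gives 1/3 because both vectors happen to have three distinct values. Which behaviour is preferable depends on whether the labels of the two variables are meant to be comparable.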
And finally, having functions for the kinetic energy of a vector and for the information correlation, we can define a new function that computes the kinetic correlation:
def o(vector1, vector2):
    """Return the Onicescu information correlation based on kinetic energy."""
    i_c = ic(vector1, vector2)
    return i_c / np.sqrt(kin_energy(vector1) * kin_energy(vector2))
Testing some toy use cases.
Notice I updated the formula so that the denominator contains a sqrt, in order to keep the coefficient bounded between 0 and 1.
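The bound follows from the Cauchy-Schwarz inequality. Treating the zero-padded class probabilities as two vectors $p$ and $q$ (symbols introduced here for illustration), all entries are non-negative, so:

```latex
\begin{gather*}
0 \le \sum_{i} p_i q_i \le \sqrt{\sum_{i} p_i^{2}} \, \sqrt{\sum_{i} q_i^{2}} \\
\Longrightarrow \quad 0 \le \frac{\sum_{i} p_i q_i}{\sqrt{\sum_{i} p_i^{2} \cdot \sum_{i} q_i^{2}}} \le 1
\end{gather*}
```

Without the square root the ratio can exceed 1: for two vectors with seven distinct values each, the numerator is 1/7 while the product of the two energies is 1/49, which gives 7.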
#Example 1
a=np.array([1,2,3,4,5,6,7])
b=np.array([2,3,1,5,7,9,1])
o(a,b)
0.88191710368819676
#Example 2
a=np.array([1,2,3,4,5,6,7])
b=np.array([2,3,1,5,7,9,11])
o(a,b)
1
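Both results can be reproduced by hand. Assuming the zero-padding done in `ic`, and writing $p$ and $q$ for the class probabilities of `a` and `b` (symbols introduced here for illustration), Example 1 works out as:

```latex
\begin{align*}
p &= \left(\tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}\right), \qquad
q = \left(\tfrac{2}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, \tfrac{1}{7}, 0\right) \\
IC &= \tfrac{1}{7}\left(\tfrac{2}{7} + 5 \cdot \tfrac{1}{7}\right) = \tfrac{1}{7} \\
E(a) &= 7 \cdot \left(\tfrac{1}{7}\right)^{2} = \tfrac{1}{7}, \qquad
E(b) = \left(\tfrac{2}{7}\right)^{2} + 5 \cdot \left(\tfrac{1}{7}\right)^{2} = \tfrac{9}{49} \\
O(a,b) &= \frac{1/7}{\sqrt{\tfrac{1}{7} \cdot \tfrac{9}{49}}} = \frac{\sqrt{7}}{3} \approx 0.8819
\end{align*}
```

In Example 2 both vectors have seven distinct values, so $p = q$, the numerator and both energies equal 1/7, and the coefficient is exactly 1.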
With this in place, one could simply use it instead of cross entropy, or as a fitness function for genetic algorithms.
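As a rough illustration of the fitness-function idea, here is a minimal sketch that scores a small population of candidate label vectors against a target. The `fitness` wrapper, the `target` and the `population` are hypothetical and only serve the example; the three helper functions are compact restatements of the ones above so the block runs on its own:

```python
import numpy as np

def kin_energy(v):
    """Sum of the squared class probabilities."""
    counts = np.unique(v, return_counts=True)[1]
    return np.sum((counts / v.shape[0]) ** 2)

def ic(v1, v2):
    """Dot product of class probabilities, zero-padded to equal length."""
    p1 = np.unique(v1, return_counts=True)[1] / v1.shape[0]
    p2 = np.unique(v2, return_counts=True)[1] / v2.shape[0]
    n = max(len(p1), len(p2))
    p1 = np.pad(p1, (0, n - len(p1)), mode="constant")
    p2 = np.pad(p2, (0, n - len(p2)), mode="constant")
    return np.dot(p1, p2)

def o(v1, v2):
    """Onicescu correlation with the sqrt-normalized denominator."""
    return ic(v1, v2) / np.sqrt(kin_energy(v1) * kin_energy(v2))

def fitness(candidate, target):
    """Hypothetical fitness: higher is better, with 1 as the maximum."""
    return o(candidate, target)

# hypothetical target labels and candidate individuals
target = np.array([0, 0, 1, 1, 2, 2])
population = [
    np.array([0, 0, 0, 0, 0, 0]),
    np.array([0, 0, 0, 1, 1, 2]),
    np.array([2, 2, 0, 0, 1, 1]),
]
scores = [fitness(c, target) for c in population]
best = population[int(np.argmax(scores))]
print(scores)
print(best)
```

The third candidate wins even though it is a reordering of the target rather than an element-wise match: the coefficient compares class frequencies, not per-sample agreement, which is worth keeping in mind before swapping it in for cross entropy.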