Distance in Machine Learning
1. Euclidean Distance
The Euclidean distance between two data points X and Y is given as: \[ d = \sqrt{\sum_i(x_i-y_i)^2} \]
Euclidean distance is one of the most frequently used distances in machine learning because it is direct and intuitive.
Euclidean distance treats all dimensions the same, so it is affected by the scale of each variable. To eliminate this effect, we can use the standardized Euclidean distance, replacing each data value x with: \[ x' = \frac{x-\bar{x}}{s} \]
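As a sketch of both formulas above (function names and sample data here are illustrative), the plain and standardized Euclidean distances can be computed with NumPy:

```python
import numpy as np

def euclidean(x, y):
    """Plain Euclidean distance: square root of summed squared differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def standardized_euclidean(x, y, data):
    """Euclidean distance after standardizing each feature by the mean and
    standard deviation estimated from `data` (rows = samples)."""
    data = np.asarray(data, dtype=float)
    mean, std = data.mean(axis=0), data.std(axis=0)
    return euclidean((np.asarray(x) - mean) / std, (np.asarray(y) - mean) / std)

# The raw distance is dominated by the large-scale second feature;
# standardization removes that scale effect.
data = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(euclidean(data[0], data[2]))                # ~200.01, driven by feature 2
print(standardized_euclidean(data[0], data[2], data))
```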
2. Manhattan Distance
Imagine walking between two points in a city district: the Euclidean distance between the start and the end is the straight line connecting them. However, since we cannot walk straight through buildings, we must move vertically or horizontally along the blocks. The resulting distance is called the Manhattan distance.
The Manhattan distance is the sum of per-axis differences between two data points. It is given as: \[ d = \sum_{i=1}^m |x_i-y_i| \]
- The Manhattan distance involves only addition and subtraction (no squares or square roots), so it is faster than floating-point computation and, on integer data, incurs no precision loss.
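A minimal sketch in plain Python (the function name is illustrative); note that on integer inputs the computation stays entirely in exact integer arithmetic:

```python
def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute per-axis differences.
    On integer inputs this uses only integer addition, so it is exact."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan([1, 2, 3], [4, 0, 3]))  # 3 + 2 + 0 = 5
```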
3. Chebyshev Distance
The Chebyshev distance is the maximum of per-axis differences between two data points. It is given as: \[ D = \max_i |x_i-y_i| \]
4. Minkowski Distance
Minkowski distance is the distance between two points in a p-norm space: \[ d = \left(\sum_i|x_i-y_i|^p\right)^\frac{1}{p} \]
- when p = 1, MK distance equals Manhattan Distance
- when p = 2, MK distance equals Euclidean Distance
- when p = \(\infty\), MK distance equals Chebyshev Distance
The order p is a hyperparameter that can be adjusted. This makes the Minkowski distance more flexible, but choosing a good p can be difficult.
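The three special cases above can be checked numerically with a small sketch (illustrative names; `p = inf` is handled as the Chebyshev limit):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p; p = inf gives the Chebyshev limit."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float(np.sum(diff ** p) ** (1.0 / p))

# Per-axis differences here are [3, 4, 0]:
x, y = [1.0, 5.0, 2.0], [4.0, 1.0, 2.0]
print(minkowski(x, y, 1))       # 7.0 -> Manhattan distance
print(minkowski(x, y, 2))       # 5.0 -> Euclidean distance
print(minkowski(x, y, np.inf))  # 4.0 -> Chebyshev distance
```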
5. Mahalanobis Distance
Mahalanobis distance is the Euclidean distance in a standardized principal-component space (the space after PCA decomposition and scaling). \[ D(X) = \sqrt{(X-\mu)^TS^{-1}(X-\mu)} \] where:
- X is the feature vector of a sample
- S is the covariance matrix of all features
- \(\mu\) is the mean vector of all samples
The Mahalanobis distance between two data points can be given as: \[ D(X,Y) = \sqrt{(X-Y)^TS^{-1}(X-Y)} \]
- The Mahalanobis distance is not affected by the scale of the data, and it removes the effect of correlation among features. Thus it can be a better metric for measuring the distance between a data point and a central point.
- The Mahalanobis distance is often applied in outlier detection, for example combined with Grubbs's test.
- When the covariance matrix is the identity matrix, which means the features are uncorrelated and have unit variance, the Mahalanobis distance equals the Euclidean distance.
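A minimal sketch of the two-point formula, estimating S from a sample (names and data are illustrative; the sample below is chosen so that S is a scaled identity, making the expected value easy to verify by hand):

```python
import numpy as np

def mahalanobis(x, y, data):
    """Mahalanobis distance between x and y, with the covariance matrix S
    estimated from `data` (rows = samples, columns = features)."""
    S = np.cov(np.asarray(data, dtype=float), rowvar=False)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

# For this sample, S = (2/3) * I, so S^{-1} = 1.5 * I and
# D([0,0],[3,4]) = sqrt(1.5 * (9 + 16)) = sqrt(37.5) ~ 6.1237
data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(mahalanobis([0, 0], [3, 4], data))
```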
6. Cosine Distance
The cosine distance uses the cosine of the angle between two vectors to represent their distance: \[ d = 1- \cos(\theta) = 1- \frac{a\cdot b}{\|a\|\,\|b\|} \]
The cosine similarity of two vectors follows the rule "same direction → 1, orthogonal → 0, opposite → -1". This rule holds even when the dimensionality is extremely high. Since the Euclidean distance can be affected by the number of dimensions, the cosine distance can be a better measure for high-dimensional data. For example, a long sentence and a short sentence sharing the same meaning can have a large Euclidean distance but a small cosine distance.
The cosine distance is not a strictly defined distance in the mathematical sense, as it does not satisfy the triangle inequality.
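A minimal sketch of the formula above (illustrative names); the first example shows that scaling a vector leaves the cosine distance unchanged, unlike the Euclidean distance:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(theta) between vectors a and b: 0 for aligned vectors,
    1 for orthogonal vectors, 2 for opposite vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_distance([1, 2, 3], [2, 4, 6]))  # ~0.0: same direction
print(cosine_distance([1, 0], [0, 1]))        # 1.0: orthogonal
print(cosine_distance([1, 0], [-1, 0]))       # 2.0: opposite
```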
7. Set Distances (MMD and Jaccard Index)
Minimal Matching Distance
Minimal matching distance counts the matching elements in two collections: \[ MM(A,B) = 1 - |A \cap B| \]
Jaccard Distance
The Jaccard distance uses the Jaccard similarity to represent distance: \[ d = 1-J(A,B) = 1-\frac{|A \cap B|}{|A \cup B|} \] where J(A,B) is the ratio of the size of the two sets' intersection to the size of their union.
- The Jaccard distance represents the similarity of two sets rather than two points. For example, it can measure the similarity of two customers' shopping records, and can be applied in user segmentation and recommendation.
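The shopping-record example can be sketched directly from the formula (the customer names and item sets below are hypothetical):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are considered identical
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical shopping records of two customers:
alice = {"milk", "bread", "eggs", "tea"}
bob = {"milk", "bread", "coffee"}
print(jaccard_distance(alice, bob))  # 1 - 2/5 = 0.6
```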
8. Sequence Distances (Hamming Distance & Edit Distance)
Hamming Distance
The Hamming distance measures two sequences of the same length. Two elements are zero-distant only if their positions and values are both identical:
For two strings s1 and s2, the Hamming distance is the minimum number of single-character replacements needed to transform s1 into s2: \[ H('abcde','acrdb') = 3 \]
Edit Distance:
The edit distance can measure two sequences of different lengths. For two strings s1 and s2, the edit distance is the minimum number of single-character insertions, deletions, or replacements needed to transform s1 into s2. \[ E('aecfg','akgh') = 4 \]
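Both sequence distances can be sketched in a few lines (illustrative names; the edit distance uses the standard single-row dynamic-programming formulation of the Levenshtein distance):

```python
def hamming(s1, s2):
    """Hamming distance: number of positions where equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def edit_distance(s1, s2):
    """Levenshtein edit distance via dynamic programming: the minimum number
    of insertions, deletions, and replacements turning s1 into s2."""
    m, n = len(s1), len(s2)
    dp = list(range(n + 1))  # dp[j] = distance from "" to s2[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]  # dp[i-1][j], needed as prev for the next j
            if s1[i - 1] == s2[j - 1]:
                dp[j] = prev
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[n]

print(hamming("abcde", "acrdb"))       # 3 (positions b/c, c/r, e/b differ)
print(edit_distance("aecfg", "akgh"))  # 4
```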
9. Other Measures Similar to Distance
Some measures, like the cosine distance, are not strictly defined distances in mathematics, but can serve a similar purpose, including:
- Correlation distance (for two variables)
- KL divergence (for two distributions)
- Mutual information (for two variables)
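As one sketch from this list (illustrative names), the KL divergence between two discrete distributions also shows why these measures are not strict distances: it is asymmetric, so D(P‖Q) generally differs from D(Q‖P).

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(P || Q) between two discrete distributions given as
    probability vectors. Asymmetric, hence a divergence rather than a distance."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # 0.5*ln(0.5/0.9) + 0.5*ln(0.5/0.1) ~ 0.511
print(kl_divergence(q, p))  # a different value: KL is asymmetric
```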