Distance in Machine Learning
1. Euclidean Distance
The Euclidean distance between two data points X and Y is given as: \[ d = \sqrt{\sum_i(x_i-y_i)^2} \]
Euclidean distance is one of the most frequently used distances in machine learning because it is direct and intuitive.
Euclidean distance treats all dimensions the same, so it is affected by the scale of each variable. To eliminate this effect, we can use the standardized Euclidean distance, replacing each data value x with: \[ x' = \frac{x-\bar{x}}{s} \]
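As a sketch of both formulas above (function names and sample data here are illustrative), the plain and standardized Euclidean distances can be computed with NumPy:

```python
import numpy as np

def euclidean(x, y):
    """Plain Euclidean distance: square root of summed squared differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

def standardized_euclidean(x, y, data):
    """Euclidean distance after standardizing each feature by the mean and
    standard deviation estimated from `data` (rows = samples)."""
    data = np.asarray(data, dtype=float)
    mean, std = data.mean(axis=0), data.std(axis=0)
    return euclidean((np.asarray(x) - mean) / std, (np.asarray(y) - mean) / std)

# The raw distance is dominated by the large-scale second feature;
# standardization removes that scale effect.
data = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
print(euclidean(data[0], data[2]))                # ~200.01, driven by feature 2
print(standardized_euclidean(data[0], data[2], data))
```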
2. Manhattan Distance
Imagine walking between two points in a city district: the Euclidean distance between the start and the end is the straight line connecting them. However, since we cannot walk straight through buildings, we must move vertically or horizontally along the blocks. The resulting distance is called the Manhattan distance.
The Manhattan distance is the sum of per-axis differences between two data points. It is given as: \[ d = \sum_{i=1}^m |x_i-y_i| \]
- The Manhattan distance involves only addition and subtraction (no squares or square roots), so it is faster than floating-point computation and, on integer data, incurs no precision loss.
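A minimal sketch in plain Python (the function name is illustrative); note that on integer inputs the computation stays entirely in exact integer arithmetic:

```python
def manhattan(x, y):
    """Manhattan (L1) distance: sum of absolute per-axis differences.
    On integer inputs this uses only integer addition, so it is exact."""
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan([1, 2, 3], [4, 0, 3]))  # 3 + 2 + 0 = 5
```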
3. Chebyshev Distance
The Chebyshev distance is the maximum of per-axis differences between two data points. It is given as: \[ D = \max_i |x_i-y_i| \]
4. Minkowski Distance
Minkowski distance is the distance between two points in a p-norm space: \[ d = \left(\sum_i|x_i-y_i|^p\right)^\frac{1}{p} \]
- when p = 1, MK distance equals Manhattan Distance
- when p = 2, MK distance equals Euclidean Distance
- when p = \(\infty\), MK distance equals Chebyshev Distance
The order p is a hyperparameter that can be adjusted. This makes the Minkowski distance more flexible, but choosing a good p can be difficult.
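The three special cases above can be checked numerically with a small sketch (illustrative names; `p = inf` is handled as the Chebyshev limit):

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance of order p; p = inf gives the Chebyshev limit."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return float(diff.max())
    return float(np.sum(diff ** p) ** (1.0 / p))

# Per-axis differences here are [3, 4, 0]:
x, y = [1.0, 5.0, 2.0], [4.0, 1.0, 2.0]
print(minkowski(x, y, 1))       # 7.0 -> Manhattan distance
print(minkowski(x, y, 2))       # 5.0 -> Euclidean distance
print(minkowski(x, y, np.inf))  # 4.0 -> Chebyshev distance
```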
5. Mahalanobis Distance
Mahalanobis distance is the Euclidean distance in a standardized principal-component space (the space after PCA decomposition and scaling). \[ D(X) = \sqrt{(X-\mu)^TS^{-1}(X-\mu)} \] where:
- X is the feature vector of a sample
- S is the covariance matrix of all features
- \(\mu\) is the mean vector of all samples
The Mahalanobis distance between two data points can be given as: \[ D(X,Y) = \sqrt{(X-Y)^TS^{-1}(X-Y)} \]
- The Mahalanobis distance is not affected by the scale of the data, and it removes the effect of correlation among features. Thus it can be a better metric for measuring the distance between a data point and a central point.
- The Mahalanobis distance is often applied in outlier detection, for example combined with Grubbs's test.
- When the covariance matrix is the identity matrix, which means the features are uncorrelated and have unit variance, the Mahalanobis distance equals the Euclidean distance.
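A minimal sketch of the two-point formula, estimating S from a sample (names and data are illustrative; the sample below is chosen so that S is a scaled identity, making the expected value easy to verify by hand):

```python
import numpy as np

def mahalanobis(x, y, data):
    """Mahalanobis distance between x and y, with the covariance matrix S
    estimated from `data` (rows = samples, columns = features)."""
    S = np.cov(np.asarray(data, dtype=float), rowvar=False)
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(S) @ diff))

# For this sample, S = (2/3) * I, so S^{-1} = 1.5 * I and
# D([0,0],[3,4]) = sqrt(1.5 * (9 + 16)) = sqrt(37.5) ~ 6.1237
data = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
print(mahalanobis([0, 0], [3, 4], data))
```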
6. Cosine Distance
The cosine distance uses the cosine of the angle between two vectors to represent their distance: \[ d = 1- \cos(\theta) = 1- \frac{a\cdot b}{\|a\|\,\|b\|} \]
The cosine similarity of two vectors follows the rule "same direction → 1, orthogonal → 0, opposite → -1". This rule holds even when the dimensionality is extremely high. Since the Euclidean distance can be affected by the number of dimensions, the cosine distance can be a better measure for high-dimensional data. For example, a long sentence and a short sentence sharing the same meaning can have a large Euclidean distance but a small cosine distance.
The cosine distance is not a strictly defined distance in the mathematical sense, as it does not satisfy the triangle inequality.
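A minimal sketch of the formula above (illustrative names); the first example shows that scaling a vector leaves the cosine distance unchanged, unlike the Euclidean distance:

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(theta) between vectors a and b: 0 for aligned vectors,
    1 for orthogonal vectors, 2 for opposite vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_distance([1, 2, 3], [2, 4, 6]))  # ~0.0: same direction
print(cosine_distance([1, 0], [0, 1]))        # 1.0: orthogonal
print(cosine_distance([1, 0], [-1, 0]))       # 2.0: opposite
```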
7. Set Distances (MMD and Jaccard Index)
Minimal Matching Distance
Minimal matching distance counts the matching elements in two collections: \[ MM(A,B) = 1 - |A \cap B| \]
Jaccard Distance
The Jaccard distance uses the Jaccard similarity to represent distance: \[ d = 1-J(A,B) = 1-\frac{|A \cap B|}{|A \cup B|} \] where J(A,B) is the ratio of the size of the two sets' intersection to the size of their union.
- The Jaccard distance represents the similarity of two sets rather than two points. For example, it can measure the similarity of two customers' shopping records, and can be applied in user segmentation and recommendation.
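The shopping-record example can be sketched directly from the formula (the customer names and item sets below are hypothetical):

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A intersect B| / |A union B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention: two empty sets are considered identical
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical shopping records of two customers:
alice = {"milk", "bread", "eggs", "tea"}
bob = {"milk", "bread", "coffee"}
print(jaccard_distance(alice, bob))  # 1 - 2/5 = 0.6
```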
8. Sequence Distances (Hamming Distance & Edit Distance)
Hamming Distance
The Hamming distance measures two sequences of the same length. Two elements are zero-distant only if their positions and values are both identical:
For two strings s1 and s2, the Hamming distance is the minimum number of single-character replacements needed to transform s1 into s2: \[ H('abcde','acrdb') = 3 \]
Edit Distance:
The edit distance can measure two sequences of different lengths. For two strings s1 and s2, the edit distance is the minimum number of single-character insertions, deletions, or replacements needed to transform s1 into s2. \[ E('aecfg','akgh') = 4 \]
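Both sequence distances can be sketched in a few lines (illustrative names; the edit distance uses the standard single-row dynamic-programming formulation of the Levenshtein distance):

```python
def hamming(s1, s2):
    """Hamming distance: number of positions where equal-length sequences differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def edit_distance(s1, s2):
    """Levenshtein edit distance via dynamic programming: the minimum number
    of insertions, deletions, and replacements turning s1 into s2."""
    m, n = len(s1), len(s2)
    dp = list(range(n + 1))  # dp[j] = distance from "" to s2[:j]
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i  # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]  # dp[i-1][j], needed as prev for the next j
            if s1[i - 1] == s2[j - 1]:
                dp[j] = prev
            else:
                dp[j] = 1 + min(prev, dp[j], dp[j - 1])
            prev = cur
    return dp[n]

print(hamming("abcde", "acrdb"))       # 3 (positions b/c, c/r, e/b differ)
print(edit_distance("aecfg", "akgh"))  # 4
```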
9. Other Measures Similar to Distance
Some measures, like the cosine distance, are not strictly defined distances in mathematics, but can serve a similar purpose, including:
- Correlation distance (for two variables)
- KL divergence (for two distributions)
- Mutual information (for two variables)
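As one sketch from this list (illustrative names), the KL divergence between two discrete distributions also shows why these measures are not strict distances: it is asymmetric, so D(P‖Q) generally differs from D(Q‖P).

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(P || Q) between two discrete distributions given as
    probability vectors. Asymmetric, hence a divergence rather than a distance."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.5]
q = [0.9, 0.1]
print(kl_divergence(p, q))  # 0.5*ln(0.5/0.9) + 0.5*ln(0.5/0.1) ~ 0.511
print(kl_divergence(q, p))  # a different value: KL is asymmetric
```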