scipy.stats.weightedtau

scipy.stats.weightedtau(x, y, rank=True, weigher=None, additive=True)

Computes a weighted version of Kendall’s \(\tau\).

The weighted \(\tau\) is a weighted version of Kendall’s \(\tau\) in which exchanges of high weight are more influential than exchanges of low weight. The default parameters compute the additive hyperbolic version of the index, \(\tau_\mathrm h\), which has been shown to provide the best balance between important and unimportant elements [R648].

The weighting is defined by means of a rank array, which assigns a nonnegative rank to each element, and a weigher function, which assigns a weight to each element based on its rank. The weight of an exchange is then the sum or the product of the weights of the ranks of the exchanged elements. The default parameters compute \(\tau_\mathrm h\): an exchange between elements with rank \(r\) and \(s\) (starting from zero) has weight \(1/(r+1) + 1/(s+1)\).
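The exchange weight described above can be sketched in a few lines of plain Python. The names `hyperbolic` and `exchange_weight` are illustrative only and are not part of SciPy; the sketch assumes ranks start from zero, as stated above.

```python
def hyperbolic(r):
    """Hyperbolic weigher: rank r (zero-based) maps to weight 1 / (r + 1)."""
    return 1.0 / (r + 1)


def exchange_weight(r, s, weigher=hyperbolic, additive=True):
    """Weight of an exchange between elements of rank r and s.

    Illustrative helper, not a SciPy function: the two rank weights are
    added when additive is True and multiplied otherwise.
    """
    wr, ws = weigher(r), weigher(s)
    return wr + ws if additive else wr * ws


# An exchange between the two most important elements (ranks 0 and 1)
# weighs far more than one between ranks 8 and 9.
top_pair = exchange_weight(0, 1)   # 1/1 + 1/2
low_pair = exchange_weight(8, 9)   # 1/9 + 1/10
```

With the hyperbolic weigher, an exchange near the top of the ranking dominates one near the bottom, which is the intended emphasis on important elements.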

Specifying a rank array is meaningful only if you have in mind an external criterion of importance. If, as usually happens, you do not have a specific rank in mind, the weighted \(\tau\) is defined by averaging the values obtained using the decreasing lexicographical rank by (x, y) and by (y, x). This is the behavior with default parameters.

Note that if you are computing the weighted \(\tau\) on arrays of ranks rather than of scores (i.e., a larger value implies a lower rank), you must negate the ranks, so that elements of higher rank are associated with a larger value.
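As a small illustration of the negation step, the sketch below turns two arrays of ranks into score-like arrays. The data are hypothetical leaderboard positions, where 1 is the best place.

```python
# Hypothetical positions from two leaderboards: 1 is the best place,
# so a larger value means a less important element.
places_a = [1, 2, 3, 4, 5]
places_b = [2, 1, 3, 5, 4]

# Negate so that the best-placed element carries the largest value,
# which is the convention this function expects for scores.
scores_a = [-p for p in places_a]
scores_b = [-p for p in places_b]
```

The negated arrays can then be passed as `x` and `y` in place of the raw positions.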

Parameters:

x, y : array_like

Arrays of scores, of the same shape. If arrays are not 1-D, they will be flattened to 1-D.

rank : array_like of ints or bool, optional

A nonnegative rank assigned to each element. If it is None, the decreasing lexicographical rank by (x, y) will be used: elements of higher rank will be those with larger x-values, using y-values to break ties (in particular, swapping x and y will give a different result). If it is False, the element indices will be used directly as ranks. The default is True, in which case this function returns the average of the values obtained using the decreasing lexicographical rank by (x, y) and by (y, x).

weigher : callable, optional

The weigher function. Must map nonnegative integers (zero representing the most important element) to a nonnegative weight. The default, None, provides hyperbolic weighting, that is, rank \(r\) is mapped to weight \(1/(r+1)\).

additive : bool, optional

If True, the weight of an exchange is computed by adding the weights of the ranks of the exchanged elements; otherwise, the weights are multiplied. The default is True.

Returns:

correlation : float

The weighted \(\tau\) correlation index.

pvalue : float

Presently np.nan, as the null distribution of the statistic is unknown (even in the additive hyperbolic case).

See also

kendalltau: Calculates Kendall’s tau.
spearmanr: Calculates a Spearman rank-order correlation coefficient.
theilslopes: Computes the Theil-Sen estimator for a set of points (x, y).

Notes

This function uses an \(O(n \log n)\), mergesort-based algorithm [R648] that is a weighted extension of Knight’s algorithm for Kendall’s \(\tau\) [R649]. It can compute Shieh’s weighted \(\tau\) [R650] between rankings without ties (i.e., permutations) by setting additive and rank to False, as the definition given in [R648] is a generalization of Shieh’s.

NaNs are considered the smallest possible score.
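As a cross-check on the definition, the following is a naive \(O(n^2)\) sketch in plain Python. It is not the mergesort-based algorithm used by this function and does not handle NaNs. The helper names are illustrative, and the tie order among elements with identical (x, y) pairs is an assumption of the sketch; swapping the ranks of two such elements does not change the result.

```python
import math


def hyperbolic(r):
    """Hyperbolic weigher: rank r (zero-based) maps to weight 1 / (r + 1)."""
    return 1.0 / (r + 1)


def sign(a):
    """Return -1, 0 or 1 according to the sign of a."""
    return (a > 0) - (a < 0)


def lexicographic_rank(x, y):
    """Decreasing lexicographic rank by (x, y): rank 0 is the largest pair.

    Assumption of this sketch: fully tied pairs are ranked by position.
    """
    order = sorted(range(len(x)), key=lambda i: (x[i], y[i]))  # ascending
    rank = [0] * len(x)
    for r, i in enumerate(reversed(order)):
        rank[i] = r
    return rank


def naive_weighted_tau(x, y, rank, weigher=hyperbolic, additive=True):
    """Weighted tau computed pair by pair from an explicit rank array."""
    n = len(x)
    w = [weigher(r) for r in rank]
    xy = xx = yy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pair_w = w[i] + w[j] if additive else w[i] * w[j]
            sx = sign(x[i] - x[j])
            sy = sign(y[i] - y[j])
            xy += sx * sy * pair_w   # concordant minus discordant weight
            xx += sx * sx * pair_w   # weight of pairs not tied in x
            yy += sy * sy * pair_w   # weight of pairs not tied in y
    return xy / math.sqrt(xx * yy)


x = [12, 2, 1, 12, 2]
y = [1, 4, 7, 1, 0]
tau_xy = naive_weighted_tau(x, y, lexicographic_rank(x, y))
tau_yx = naive_weighted_tau(y, x, lexicographic_rank(y, x))
tau_avg = (tau_xy + tau_yx) / 2   # mirrors the default rank=True behavior

print(round(tau_xy, 4))    # -0.4158
print(round(tau_yx, 4))    # -0.7181
print(round(tau_avg, 4))   # -0.5669
```

On this data the sketch agrees with the values shown in the Examples section for `rank=None` and for the default parameters. Being quadratic in the input length, it is only suitable for small arrays.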

New in version 0.19.0.

References

[R648]

Sebastiano Vigna, “A weighted correlation index for rankings with ties”, Proceedings of the 24th international conference on World Wide Web, pp. 1166-1176, ACM, 2015.

[R649]

W.R. Knight, “A Computer Method for Calculating Kendall’s Tau with Ungrouped Data”, Journal of the American Statistical Association, Vol. 61, No. 314, Part 1, pp. 436-439, 1966.

[R650]

Grace S. Shieh, “A weighted Kendall’s tau statistic”, Statistics & Probability Letters, Vol. 39, No. 1, pp. 17-24, 1998.

Examples

>>> from scipy import stats
>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> tau, p_value = stats.weightedtau(x, y)
>>> tau
-0.56694968153682723
>>> p_value
nan
>>> tau, p_value = stats.weightedtau(x, y, additive=False)
>>> tau
-0.62205716951801038

NaNs are considered the smallest possible score:

>>> import numpy as np
>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, np.nan]
>>> tau, _ = stats.weightedtau(x, y)
>>> tau
-0.56694968153682723

With a constant weigher, the result is exactly Kendall’s tau:

>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> tau, _ = stats.weightedtau(x, y, weigher=lambda x: 1)
>>> tau
-0.47140452079103173

With rank=None, swapping x and y gives a different result:

>>> x = [12, 2, 1, 12, 2]
>>> y = [1, 4, 7, 1, 0]
>>> stats.weightedtau(x, y, rank=None)
WeightedTauResult(correlation=-0.4157652301037516, pvalue=nan)
>>> stats.weightedtau(y, x, rank=None)
WeightedTauResult(correlation=-0.71813413296990281, pvalue=nan)