scipy.optimize.isotonic_regression#
- scipy.optimize.isotonic_regression(y, *, weights=None, increasing=True)[source]#
Nonparametric isotonic regression.
A (not strictly) monotonically increasing array x with the same length as y is calculated by the pool adjacent violators algorithm (PAVA), see [1]. See the Notes section for more details.
- Parameters:
- y(N,) array_like
Response variable.
- weights(N,) array_like or None
Case weights.
- increasingbool
If True, fit monotonic increasing, i.e. isotonic, regression. If False, fit a monotonic decreasing, i.e. antitonic, regression. Default is True.
- Returns:
- resOptimizeResult
The optimization result represented as a
OptimizeResult
object. Important attributes are:x
: The isotonic regression solution, i.e. an increasing (or decreasing) array of the same length than y, with elements in the range from min(y) to max(y).weights
: Array with the sum of case weights for each block (or pool) B.blocks
: Array of length B+1 with the indices of the start positions of each block (or pool) B. The j-th block is given byx[blocks[j]:blocks[j+1]]
for which all values are the same.
Notes
Given data \(y\) and case weights \(w\), the isotonic regression solves the following optimization problem:
\[\operatorname{argmin}_{x_i} \sum_i w_i (y_i - x_i)^2 \quad \text{subject to } x_i \leq x_j \text{ whenever } i \leq j \,.\]For every input value \(y_i\), it generates a value \(x_i\) such that \(x\) is increasing (but not strictly), i.e. \(x_i \leq x_{i+1}\). This is accomplished by the PAVA. The solution consists of pools or blocks, i.e. neighboring elements of \(x\), e.g. \(x_i\) and \(x_{i+1}\), that all have the same value.
Most interestingly, the solution stays the same if the squared loss is replaced by the wide class of Bregman functions which are the unique class of strictly consistent scoring functions for the mean, see [2] and references therein.
The implemented version of PAVA according to [1] has a computational complexity of O(N) with input size N.
References
[1] (1,2)Busing, F. M. T. A. (2022). Monotone Regression: A Simple and Fast O(n) PAVA Implementation. Journal of Statistical Software, Code Snippets, 102(1), 1-25. DOI:10.18637/jss.v102.c01
[2]Jordan, A.I., Mühlemann, A. & Ziegel, J.F. Characterizing the optimal solutions to the isotonic regression problem for identifiable functionals. Ann Inst Stat Math 74, 489-514 (2022). DOI:10.1007/s10463-021-00808-0
Examples
This example demonstrates that
isotonic_regression
really solves a constrained optimization problem.>>> import numpy as np >>> from scipy.optimize import isotonic_regression, minimize >>> y = [1.5, 1.0, 4.0, 6.0, 5.7, 5.0, 7.8, 9.0, 7.5, 9.5, 9.0] >>> def objective(yhat, y): ... return np.sum((yhat - y)**2) >>> def constraint(yhat, y): ... # This is for a monotonically increasing regression. ... return np.diff(yhat) >>> result = minimize(objective, x0=y, args=(y,), ... constraints=[{'type': 'ineq', ... 'fun': lambda x: constraint(x, y)}]) >>> result.x array([1.25 , 1.25 , 4. , 5.56666667, 5.56666667, 5.56666667, 7.8 , 8.25 , 8.25 , 9.25 , 9.25 ]) >>> result = isotonic_regression(y) >>> result.x array([1.25 , 1.25 , 4. , 5.56666667, 5.56666667, 5.56666667, 7.8 , 8.25 , 8.25 , 9.25 , 9.25 ])
The big advantage of
isotonic_regression
compared to callingminimize
is that it is more user friendly, i.e. one does not need to define objective and constraint functions, and that it is orders of magnitudes faster. On commodity hardware (in 2023), for normal distributed input y of length 1000, the minimizer takes about 4 seconds, whileisotonic_regression
takes about 200 microseconds.