SciPy Roadmap

Most of this roadmap is intended to provide a high-level view on what is most needed per SciPy submodule in terms of new functionality, bug fixes, etc. Part of those are must-haves for the 1.0 version of Scipy. Furthermore it contains ideas for major new features - those are marked as such, and are not needed for SciPy to become 1.0. Things not mentioned in this roadmap are not necessarily unimportant or out of scope, however we (the SciPy developers) want to provide to our users and contributors a clear picture of where SciPy is going and where help is needed most urgently.

When a module is in a 1.0-ready state, it means that it has the functionality we consider essential and has an API and code quality (including documentation and tests) that’s of high enough quality.

General

This roadmap will be evolving together with SciPy. Updates can be submitted as pull requests. For large or disruptive changes you may want to discuss those first on the scipy-dev mailing list.

API changes

In general, we want to take advantage of the major version change to fix some known warts in the API. The change from 0.x.x to 1.x.x is the chance to fix those API issues that we all know are ugly warts. Example: unify the convention for specifying tolerances (including absolute, relative, argument and function value tolerances) of the optimization functions. More API issues will be noted in the module sections below. However, there should be clear value in making a breaking change. The 1.0 version label is not a license to just break things - see it as a normal release with a somewhat more aggressive/extensive set of cleanups.

It should be made more clear what is public and what is private in SciPy. Everything private should be underscored as much as possible. Now this is done consistently when we add new code, but for 1.0 it should also be done for existing code.

Test coverage

Test coverage of code added in the last few years is quite good, and we aim for a high coverage for all new code that is added. However, there is still a significant amount of old code for which coverage is poor. Bringing that up to the current standard is probably not realistic, but we should plug the biggest holes. Additionally the coverage should be tracked over time and we should ensure it only goes up.

Besides coverage there is also the issue of correctness - older code may have a few tests that provide decent statement coverage, but that doesn’t necessarily say much about whether the code does what it says on the box. Therefore code review of some parts of the code (stats and signal in particular) is necessary.

Documentation

The documentation is in decent shape. Expanding of current docstrings and putting them in the standard NumPy format should continue, so the number of reST errors and glitches in the html docs decreases. Most modules also have a tutorial in the reference guide that is a good introduction, however there are a few missing or incomplete tutorials - this should be fixed.

Other

Scipy 1.0 will likely contain more backwards-incompatible changes than a minor release. Therefore we will have a longer-lived maintenance branch of the last 0.X release.

Regarding Cython code:

  • It’s not clear how much functionality can be Cythonized without making the .so files too large. This needs measuring.
  • Cython’s old syntax for using NumPy arrays should be removed and replaced with Cython memoryviews.
  • New feature idea: more of the currently wrapped libraries should export Cython-importable versions that can be used without linking.

Regarding build environments:

  • NumPy and SciPy should both build from source on Windows with a MinGW-w64 toolchain and be compatible with Python installations compiled with either the same MinGW or with MSVC.
  • Bento development has stopped, so will remain having an experimental, use-at-your-own-risk status. Only the people that use it will be responsible for keeping the Bento build updated.

A more complete continuous integration setup is needed; at the moment we often find out right before a release that there are issues on some less-often used platform or Python version. At least needed are Windows (MSVC and MingwPy), Linux and OS X builds, coverage of the lowest and highest Python and NumPy versions that are supported.

Modules

cluster

This module is in good shape.

constants

This module is basically done, low-maintenance and without open issues.

fftpack

Needed:

  • solve issues with single precision: large errors, disabled for difficult sizes
  • fix caching bug
  • Bluestein algorithm (or chirp Z-transform)
  • deprecate fftpack.convolve as public function (was not meant to be public)

There’s a large overlap with numpy.fft. This duplication has to change (both are too widely used to deprecate one); in the documentation we should make clear that scipy.fftpack is preferred over numpy.fft. If there are differences in signature or functionality, the best version should be picked case by case (example: numpy’s rfft is preferred, see gh-2487).

integrate

Needed for ODE solvers:

  • Documentation is pretty bad, needs fixing
  • A promising new ODE solver interface is in progress: gh-6326. This needs to be finished and merged. After that, older API can possibly be deprecated.

The numerical integration functions are in good shape. Support for integrating complex-valued functions and integrating multiple intervals (see gh-3325) could be added, but is not required for SciPy 1.0.

interpolate

Needed:

  • Both fitpack and fitpack2 interfaces will be kept.
  • splmake is deprecated; is different spline representation, we need exactly one
  • interp1d/interp2d are somewhat ugly but widely used, so we keep them.

Ideas for new features:

  • Spline fitting routines with better user control.
  • Integration and differentiation and arithmetic routines for splines
  • Transparent tensor-product splines.
  • NURBS support.
  • Mesh refinement and coarsening of B-splines and corresponding tensor products.

io

wavfile;

  • PCM float will be supported, for anything else use audiolab or other specialized libraries.
  • Raise errors instead of warnings if data not understood.

Other sub-modules (matlab, netcdf, idl, harwell-boeing, arff, matrix market) are in good shape.

linalg

Needed:

  • Remove functions that are duplicate with numpy.linalg
  • get_lapack_funcs should always use flapack
  • Wrap more LAPACK functions
  • One too many funcs for LU decomposition, remove one

Ideas for new features:

  • Add type-generic wrappers in the Cython BLAS and LAPACK
  • Make many of the linear algebra routines into gufuncs

misc

scipy.misc will be removed as a public module. The functions in it can be moved to other modules:

  • pilutil, images : ndimage
  • comb, factorial, logsumexp, pade : special
  • doccer : move to scipy._lib
  • info, who : these are NumPy functions
  • derivative, central_diff_weight : remove, replace with more extensive functionality for numerical differentiation - likely in a new module scipy.diff (see below)

ndimage

Underlying ndimage is a powerful interpolation engine. Unfortunately, it was never decided whether to use a pixel model ((1, 1) elements with centers (0.5, 0.5)) or a data point model (values at points on a grid). Over time, it seems that the data point model is better defined and easier to implement. We therefore propose to move to this data representation for 1.0, and to vet all interpolation code to ensure that boundary values, transformations, etc. are correctly computed. Addressing this issue will close several issues, including #1323, #1903, #2045 and #2640.

The morphology interface needs to be standardized:

  • binary dilation/erosion/opening/closing take a “structure” argument, whereas their grey equivalent take size (has to be a tuple, not a scalar), footprint, or structure.
  • a scalar should be acceptable for size, equivalent to providing that same value for each axis.
  • for binary dilation/erosion/opening/closing, the structuring element is optional, whereas it’s mandatory for grey. Grey morphology operations should get the same default.
  • other filters should also take that default value where possible.

odr

Rename the module to regression or fitting, include optimize.curve_fit. This module will then provide a home for other fitting functionality - what exactly needs to be worked out in more detail, a discussion can be found at https://github.com/scipy/scipy/pull/448.

optimize

Overall this module is in reasonably good shape, however it is missing a few more good global optimizers as well as large-scale optimizers. These should be added. Other things that are needed:

  • deprecate the fmin_* functions in the documentation, minimize is preferred.
  • clearly define what’s out of scope for this module.

signal

Convolution and correlation: (Relevant functions are convolve, correlate, fftconvolve, convolve2d, correlate2d, and sepfir2d.) Eliminate the overlap with ndimage (and elsewhere). From numpy, scipy.signal and scipy.ndimage (and anywhere else we find them), pick the “best of class” for 1-D, 2-D and n-d convolution and correlation, put the implementation somewhere, and use that consistently throughout SciPy.

B-splines: (Relevant functions are bspline, cubic, quadratic, gauss_spline, cspline1d, qspline1d, cspline2d, qspline2d, cspline1d_eval, and spline_filter.) Move the good stuff to interpolate (with appropriate API changes to match how things are done in interpolate), and eliminate any duplication.

Filter design: merge firwin and firwin2 so firwin2 can be removed.

Continuous-Time Linear Systems: remove lsim2, impulse2, step2. Make lsim, impulse and step “just work” for any input system. Improve performance of ltisys (less internal transformations between different representations). Fill gaps in lti system conversion functions.

Second Order Sections: Make SOS filtering equally capable as existing methods. This includes ltisys objects, an lfiltic equivalent, and numerically stable conversions to and from other filter representations. SOS filters could be considered as the default filtering method for ltisys objects, for their numerical stability.

Wavelets: what’s there now doesn’t make much sense. Continous wavelets only at the moment - decide whether to completely rewrite or remove them. Discrete wavelet transforms are out of scope (PyWavelets does a good job for those).

sparse

The sparse matrix formats are getting feature-complete but are slow ... reimplement parts in Cython?

  • Small matrices are slower than PySparse, needs fixing

There are a lot of formats. These should be kept, but improvements/optimizations should go into CSR/CSC, which are the preferred formats. LIL may be the exception, it’s inherently inefficient. It could be dropped if DOK is extended to support all the operations LIL currently provides. Alternatives are being worked on, see https://github.com/ev-br/sparr and https://github.com/perimosocordiae/sparray.

Ideas for new features:

  • Sparse arrays now act like np.matrix. We want sparse arrays.

sparse.csgraph

This module is in good shape.

sparse.linalg

Arpack is in good shape.

isolve:

  • callback keyword is inconsistent
  • tol keyword is broken, should be relative tol
  • Fortran code not re-entrant (but we don’t solve, maybe re-use from PyKrilov)

dsolve:

  • add sparse Cholesky or incomplete Cholesky
  • look at CHOLMOD

Ideas for new features:

  • Wrappers for PROPACK for faster sparse SVD computation.

spatial

QHull wrappers are in good shape.

Needed:

  • KDTree will be removed, and cKDTree will be renamed to KDTree in a backwards-compatible way.
  • distance_wrap.c needs to be cleaned up (maybe rewrite in Cython).

special

Though there are still a lot of functions that need improvements in precision, probably the only show-stoppers are hypergeometric functions, parabolic cylinder functions, and spheroidal wave functions. Three possible ways to handle this:

  1. Get good double-precision implementations. This is doable for parabolic cylinder functions (in progress). I think it’s possible for hypergeometric functions, though maybe not in time. For spheroidal wavefunctions this is not possible with current theory.
  2. Port Boost’s arbitrary precision library and use it under the hood to get double precision accuracy. This might be necessary as a stopgap measure for hypergeometric functions; the idea of using arbitrary precision has been suggested before by @nmayorov and in gh-5349. Likely necessary for spheroidal wave functions, this could be reused: https://github.com/radelman/scattering.
  3. Add clear warnings to the documentation about the limits of the existing implementations.

stats

stats.distributions is in good shape.

gaussian_kde is in good shape but limited. It should not be expanded probably, this fits better in Statsmodels (which already has a lot more KDE functionality).

stats.mstats is a useful module for worked with data with missing values. One problem it has though is that in many cases the functions have diverged from their counterparts in scipy.stats. The mstats functions should be updated so that the two sets of functions are consistent.

weave

This module is deprecated and will be removed for Scipy 1.0. It has been packaged as a separate package “weave”, which users that still rely on the functionality provided by scipy.weave should use.

Also note that this is the only module that was not ported to Python 3.

New modules under discussion

diff

Currently Scipy doesn’t provide much support for numerical differentiation. A new scipy.diff module for that is discussed in https://github.com/scipy/scipy/issues/2035. There’s also a fairly detailed GSoC proposal to build on, see here.

There is also approx_derivative in optimize, which is still private but could form a solid basis for this module.

transforms

This module was discussed previously, mainly to provide a home for discrete wavelet transform functionality. Other transforms could fit as well, for example there’s a PR for a Hankel transform . Note: this is on the back burner, because the plans to integrate PyWavelets DWT code has been put on hold.