# pysindy package

## pysindy.pysindy module

class pysindy.pysindy.SINDy(optimizer=None, feature_library=None, differentiation_method=None, feature_names=None, t_default=1, discrete_time=False)

Bases: sklearn.base.BaseEstimator

Sparse Identification of Nonlinear Dynamical Systems (SINDy). Uses sparse regression to learn a dynamical systems model from measurement data.
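
The sparse-regression idea can be sketched in a few lines of NumPy. The block below is an illustrative sequentially thresholded least squares loop written for this page, not PySINDy's STLSQ implementation; the helper name `stlsq_sketch`, the threshold value, and the toy system x' = -2x are assumptions chosen for the example.

```python
import numpy as np

def stlsq_sketch(theta, x_dot, threshold=0.1, max_iter=10):
    """Illustrative thresholded least squares (hypothetical helper)."""
    # Start from the ordinary least-squares solution.
    xi = np.linalg.lstsq(theta, x_dot, rcond=None)[0]
    for _ in range(max_iter):
        # Zero out coefficients below the threshold ...
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        # ... and refit each target on the surviving library terms only.
        for k in range(x_dot.shape[1]):
            big = ~small[:, k]
            if big.any():
                xi[big, k] = np.linalg.lstsq(
                    theta[:, big], x_dot[:, k], rcond=None
                )[0]
    return xi

t = np.linspace(0, 2, 201)
x = np.exp(-2 * t)[:, None]                     # solution of x' = -2 x
x_dot = -2 * x                                  # exact derivatives
theta = np.hstack([np.ones_like(x), x, x**2])   # candidate library [1, x, x^2]
xi = stlsq_sketch(theta, x_dot)                 # shape (n_library_terms, n_targets)
```

Only the coefficient of the `x` term should survive, which is the sense in which the regression is sparse.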

Parameters
• optimizer (optimizer object, optional) – Optimization method used to fit the SINDy model. This must be a class extending pysindy.optimizers.BaseOptimizer. The default is STLSQ.

• feature_library (feature library object, optional) – Feature library object used to specify candidate right-hand side features. This must be a class extending pysindy.feature_library.base.BaseFeatureLibrary. The default option is PolynomialLibrary.

• differentiation_method (differentiation object, optional) – Method for differentiating the data. This must be a class extending pysindy.differentiation_methods.base.BaseDifferentiation. The default option is centered difference.

• feature_names (list of string, length n_input_features, optional) – Names for the input features (e.g. ['x', 'y', 'z']). If None, will use ['x0', 'x1', ...].

• t_default (float, optional (default 1)) – Default value for the time step.

• discrete_time (boolean, optional (default False)) – If True, dynamical system is treated as a map. Rather than predicting derivatives, the right hand side functions step the system forward by one time step. If False, dynamical system is assumed to be a flow (right-hand side functions predict continuous time derivatives).
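
To illustrate the map interpretation, here is a minimal NumPy sketch, not PySINDy code. In the discrete-time setting the regression targets are the next states x[k+1] rather than derivatives. The logistic map, the two-term library [x, x^2], and the variable names are assumptions made for the example.

```python
import numpy as np

# Generate data from the logistic map x[k+1] = r * x[k] * (1 - x[k]).
r = 3.6
x = np.empty(100)
x[0] = 0.5
for k in range(99):
    x[k + 1] = r * x[k] * (1 - x[k])

# Discrete time: features come from x[k], targets are x[k+1].
# (For a flow, the targets would be time derivatives instead.)
theta = np.column_stack([x[:-1], x[:-1] ** 2])
coef_map = np.linalg.lstsq(theta, x[1:], rcond=None)[0]
```

Under these assumptions `coef_map` recovers the map coefficients, approximately `[r, -r]`.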

Attributes
• model (sklearn.multioutput.MultiOutputRegressor object) – The fitted SINDy model.

• n_input_features_ (int) – The total number of input features.

• n_output_features_ (int) – The total number of output features. This number is a function of n_input_features_ and the feature library being used.

• n_control_features_ (int) – The total number of control input features.

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> model = SINDy()
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -10.000 x0 + 10.000 x1
x1' = 27.993 x0 + -0.999 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
>>> model.coefficients()
array([[ 0.        , -9.99969193,  9.99961547,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        , 27.99344519, -0.99905338,  0.        ,  0.        ,
         0.        , -0.99980268,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , -2.66645651,  0.        ,
         0.99990257,  0.        ,  0.        ,  0.        ,  0.        ]])
>>> model.score(x, t=t[1]-t[0])
0.999999985520653

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> u = lambda t : np.sin(2 * t)
>>> lorenz_c = lambda z,t : [
...     10 * (z[1] - z[0]) + u(t) ** 2,
...     z[0] * (28 - z[2]) - z[1],
...     z[0] * z[1] - 8 / 3 * z[2],
... ]
>>> t = np.arange(0,2,0.002)
>>> x = odeint(lorenz_c, [-8,8,27], t)
>>> u_eval = u(t)
>>> model = SINDy()
>>> model.fit(x, u=u_eval, t=t[1]-t[0])
>>> model.print()
x0' = -10.000 x0 + 10.000 x1 + 1.001 u0^2
x1' = 27.994 x0 + -0.999 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
>>> model.coefficients()
array([[ 0.        , -9.99969851,  9.99958359,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  1.00120331],
       [ 0.        , 27.9935177 , -0.99906375,  0.        ,  0.        ,
         0.        ,  0.        , -0.99980455,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        , -2.666437  ,  0.        ,
         0.        ,  0.99990137,  0.        ,  0.        ,  0.        ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
>>> model.score(x, u=u_eval, t=t[1]-t[0])
0.9999999855414495

fit(x, t=None, x_dot=None, u=None, multiple_trajectories=False, unbias=True, quiet=False, ensemble=False, library_ensemble=False, replace=True, n_candidates_to_drop=1, n_subset=None, n_models=None, ensemble_aggregator=None)

Fit a SINDy model.

Parameters
• x (array-like or list of array-like, shape (n_samples, n_input_features)) – Training data. If training data contains multiple trajectories, x should be a list containing data for each trajectory. Individual trajectories may contain different numbers of samples.

• t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – If t is a float, it specifies the timestep between each sample. If array-like, it specifies the time at which each sample was collected. In this case the values in t must be strictly increasing. In the case of multi-trajectory training data, t may also be a list of arrays containing the collection times for each individual trajectory. If None, the default time step t_default will be used.

• x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the training data. If not provided, the time derivatives of the training data will be computed using the specified differentiation method. If x_dot is provided, it must match the shape of the training data and these values will be used as the time derivatives.

• u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables/inputs. Include this variable to use sparse identification for nonlinear dynamical systems for control (SINDYc). If training data contains multiple trajectories (i.e. if x is a list of array-like), then u should be a list containing control variable data for each trajectory. Individual trajectories may contain different numbers of samples.

• multiple_trajectories (boolean, optional (default False)) – Whether or not the training data includes multiple trajectories. If True, the training data must be a list of arrays containing data for each trajectory. If False, the training data must be a single array.

• unbias (boolean, optional (default True)) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. If the optimizer (self.optimizer) applies any type of regularization, that regularization may bias coefficients toward particular values, improving the conditioning of the problem but harming the quality of the fit. Setting unbias==True enables an extra step wherein unregularized linear regression is applied, but only for the coefficients in the support identified by the optimizer. This helps to remove the bias introduced by regularization.

• quiet (boolean, optional (default False)) – Whether or not to suppress warnings during model fitting.

• ensemble (boolean, optional (default False)) – This parameter is used to allow for “ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random temporal subset of the input data (n_subset) for each sparse regression. This often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.

• library_ensemble (boolean, optional (default False)) – This parameter is used to allow for “library ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random subset of the candidate library terms to truncate. So, n_models are generated by solving n_models sparse regression problems on these “reduced” libraries. Once again, this often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.

• replace (boolean, optional (default True)) – If ensemble is True, whether to sample the time points with replacement.

• n_candidates_to_drop (int, optional (default 1)) – Number of candidate terms in the feature library to drop during library ensembling.

• n_subset (int, optional (default len(time base))) – Number of time points to use for ensemble

• n_models (int, optional (default 20)) – Number of models to generate via ensemble

• ensemble_aggregator (callable, optional (default numpy.median)) – Method to aggregate model coefficients across different samples. This method argument is only used if ensemble or library_ensemble is True. The method should take in a list of 2D arrays and return a 2D array of the same shape as the arrays in the list. Example: lambda x: np.median(x, axis=0)
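
The ensembling idea described above can be sketched with plain NumPy. This is an illustration under stated assumptions, not PySINDy's implementation: ordinary least squares stands in for the sparse regression step, the synthetic data and the 0.1 inclusion cutoff are invented for the example, and only the names `n_models`, `n_subset`, `replace`, and `ensemble_aggregator` mirror the parameters listed here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_models, n_subset = 500, 20, 200

# Synthetic library matrix and noisy targets (assumed for illustration).
theta = rng.normal(size=(n_samples, 3))
true_coef = np.array([1.0, 0.0, -2.0])
y = theta @ true_coef + 0.01 * rng.normal(size=n_samples)

coef_list = []
for _ in range(n_models):
    # Random temporal subset, sampled with replacement (replace=True).
    rows = rng.choice(n_samples, size=n_subset, replace=True)
    coef = np.linalg.lstsq(theta[rows], y[rows], rcond=None)[0]
    coef_list.append(coef[np.newaxis, :])   # one 2D array per model

# Aggregator: takes a list of 2D arrays, returns one 2D array of the same shape.
ensemble_aggregator = lambda x: np.median(x, axis=0)
coef_agg = ensemble_aggregator(coef_list)

# Fraction of models in which each library term is "included".
inclusion = np.mean([np.abs(c) > 0.1 for c in coef_list], axis=0)
```

Taking the median across models corresponds to the "bragging" mentioned above; swapping in `np.mean` would give bagging.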

Returns

self

Return type

a fitted SINDy instance
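
The effect of the unbias option described in the parameter list can be illustrated with a small NumPy sketch. It is not the library's code: ridge regression stands in for a regularized optimizer, and the synthetic data, the penalty `alpha`, and the 0.5 support cutoff are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 4))
true_xi = np.array([2.0, 0.0, -1.5, 0.0])
y = theta @ true_xi

# Regularized fit: the ridge penalty shrinks coefficients toward zero.
alpha = 50.0
ridge = np.linalg.solve(theta.T @ theta + alpha * np.eye(4), theta.T @ y)

# Identify the support, then refit without regularization on it only.
support = np.abs(ridge) > 0.5
unbiased = np.zeros(4)
unbiased[support] = np.linalg.lstsq(theta[:, support], y, rcond=None)[0]
```

The regularized coefficients are noticeably smaller in magnitude than the true values, while the refit on the identified support recovers them.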

predict(x, u=None, multiple_trajectories=False)

Predict the time derivatives using the SINDy model.

Parameters
• x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples.

• u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables. If multiple_trajectories==True then u must be a list of control variable data from each trajectory. If the model was fit with control variables then u is not optional.

• multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.

Returns

x_dot – Predicted time derivatives

Return type

array-like or list of array-like, shape (n_samples, n_input_features)
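
Conceptually, a prediction is the library evaluated at the samples multiplied by the coefficient matrix. The sketch below shows this with NumPy for the exact Lorenz coefficients and a hand-built degree-2 polynomial library; `poly_library` is a hypothetical helper written for this example, not a PySINDy function.

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_library(x):
    """Degree-2 polynomial features: 1, x_i, then x_i * x_j for i <= j."""
    n = x.shape[1]
    cols = [np.ones(len(x))] + [x[:, i] for i in range(n)]
    cols += [x[:, i] * x[:, j]
             for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)

# Feature order: 1, x0, x1, x2, x0^2, x0 x1, x0 x2, x1^2, x1 x2, x2^2
coef = np.zeros((3, 10))
coef[0, [1, 2]] = [-10.0, 10.0]          # x0' = -10 x0 + 10 x1
coef[1, [1, 2, 6]] = [28.0, -1.0, -1.0]  # x1' = 28 x0 - x1 - x0 x2
coef[2, [3, 5]] = [-8.0 / 3.0, 1.0]      # x2' = -8/3 x2 + x0 x1

x = np.array([[-8.0, 8.0, 27.0]])        # one sample, three input features
x_dot = poly_library(x) @ coef.T         # shape (n_samples, n_input_features)
```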

equations(precision=3)

Get the right hand sides of the SINDy model equations.

Parameters

precision (int, optional (default 3)) – Number of decimal places to include for each coefficient in the equation.

Returns

equations – List of strings representing the SINDy model equations for each input feature.

Return type

list of strings
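
As a rough illustration of how such strings can be assembled, the sketch below joins the nonzero coefficients with their feature names. `format_equations` is a hypothetical helper, and the three-term library is a reduced one chosen for brevity; the library's actual formatting logic may differ.

```python
import numpy as np

def format_equations(coef, feature_names, precision=3):
    """Build one right-hand-side string per row of coef (illustrative)."""
    equations = []
    for row in coef:
        terms = [
            f"{c:.{precision}f} {name}"
            for c, name in zip(row, feature_names)
            if c != 0          # skip terms removed by the sparse regression
        ]
        equations.append(" + ".join(terms))
    return equations

names = ["x0", "x1", "x0 x2"]
coef = np.array([[-9.99969193, 9.99961547, 0.0],
                 [27.99344519, -0.99905338, -0.99980268]])
eqs = format_equations(coef, names)
```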

print(lhs=None, precision=3)

Print the SINDy model equations.

Parameters
• lhs (list of strings, optional (default None)) – List of variables to print on the left-hand sides of the learned equations. By default self.input_features are used.

• precision (int, optional (default 3)) – Precision to be used when printing out model coefficients.

score(x, t=None, x_dot=None, u=None, multiple_trajectories=False, metric=<function r2_score>, **metric_kws)

Returns a score for the time derivative prediction produced by the model.

Parameters
• x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples from which to make predictions.

• t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. Optional, used to compute the time derivatives of the samples if x_dot is not provided. If None, the default time step t_default will be used.

• x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the samples. If provided, these values will be used to compute the score. If not provided, the time derivatives of the samples will be computed using the specified differentiation method.

• u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables. If multiple_trajectories==True then u must be a list of control variable data from each trajectory. If the model was fit with control variables then u is not optional.

• multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.

• metric (callable, optional) – Metric function with which to score the prediction. Default is the R^2 coefficient of determination. See Scikit-learn for more options.

• metric_kws (dict, optional) – Optional keyword arguments to pass to the metric function.

Returns

score – Metric function value for the model prediction of x_dot.

Return type

float
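
For reference, the default R^2 metric can be written out for a single output as below. This is a hand-rolled sketch for illustration; `r2` is a hypothetical helper, whereas the method itself uses scikit-learn's `r2_score`, which by default averages the per-output scores when there are several outputs.

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination for one output (illustrative)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

x_dot_true = np.array([1.0, 2.0, 3.0, 4.0])
perfect = r2(x_dot_true, x_dot_true)                        # exact prediction
baseline = r2(x_dot_true, np.full(4, x_dot_true.mean()))    # predict the mean
```

A perfect prediction scores 1, and always predicting the mean scores 0.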

differentiate(x, t=None, multiple_trajectories=False)

Apply the model’s differentiation method (self.differentiation_method) to data.

Parameters
• x (array-like or list of array-like, shape (n_samples, n_input_features)) – Data to be differentiated.

• t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. If None, the default time step t_default will be used.

• multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.

Returns

x_dot – Time derivatives computed by using the model’s differentiation method

Return type

array-like or list of array-like, shape (n_samples, n_input_features)
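
The two accepted forms of t (a scalar step or an array of collection times) can be illustrated with `numpy.gradient`, used here as a stand-in for the model's differentiation method rather than as what PySINDy calls internally. The test signal is an assumption for the example.

```python
import numpy as np

t = np.linspace(0, 1, 101)
x = np.column_stack([np.sin(t), np.cos(t)])   # shape (n_samples, n_input_features)

# t given as an array of collection times ...
x_dot = np.gradient(x, t, axis=0)
# ... or as a single time step between samples.
x_dot_dt = np.gradient(x, t[1] - t[0], axis=0)

# Compare with the exact derivatives away from the endpoints.
exact = np.column_stack([np.cos(t), -np.sin(t)])
interior_error = np.max(np.abs(x_dot[1:-1] - exact[1:-1]))
```

The result has the same shape as the input, with one derivative estimate per sample and per feature.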

coefficients()

Get an array of the coefficients learned by the SINDy model.

Returns

coef – Learned coefficients of the SINDy model. Equivalent to $$\Xi^\top$$ in the literature.

Return type

np.ndarray, shape (n_input_features, n_output_features)

get_feature_names()

Get a list of names of features used by the SINDy model.

Returns

feats – A list of strings giving the names of the features in the feature library, self.feature_library.

Return type

list

simulate(x0, t, u=None, integrator='solve_ivp', stop_condition=None, interpolator=None, integrator_kws={'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12}, interpolator_kws={})

Simulate the SINDy model forward in time.

Parameters
• x0 (numpy array, size [n_features]) – Initial condition from which to simulate.

• t (int or numpy array of size [n_samples]) – If the model is in continuous time, t must be an array of time points at which to simulate. If the model is in discrete time, t must be an integer indicating how many steps to predict.

• u (function from R^1 to R^{n_control_features} or list/array, optional (default None)) – Control inputs. If the model is continuous time, i.e. self.discrete_time == False, u should be a function that takes in a time and outputs the values of each of the n_control_features control features as a list or numpy array. Alternatively, if the model is continuous time, u can also be an array of control inputs at each time step. In this case the array is fit with the interpolator specified by interpolator. If the model is discrete time, i.e. self.discrete_time == True, u should be a list (with len(u) == t) or array (with u.shape[0] == t) giving the control inputs at each step.

• integrator (string, optional (default solve_ivp)) – Function to use to integrate the system. Default is scipy.integrate.solve_ivp. The only options currently supported are solve_ivp and odeint.

• stop_condition (function object, optional) – If model is in discrete time, optional function that gives a stopping condition for stepping the simulation forward.

• interpolator (callable, optional (default interp1d)) – Function used to interpolate control inputs if u is an array. Default is scipy.interpolate.interp1d.

• integrator_kws (dict, optional (default {'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12})) – Optional keyword arguments to pass to the integrator

• interpolator_kws (dict, optional (default {})) – Optional keyword arguments to pass to the control input interpolator

Returns

x – Simulation results

Return type

numpy array, shape (n_samples, n_features)
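
The discrete-time case, including the optional stopping condition and per-step control inputs, can be sketched as a simple loop. `simulate_map` and the toy step functions are hypothetical and written only for illustration; whether the library keeps the sample that triggers the stop is an assumption here.

```python
import numpy as np

def simulate_map(step, x0, n_steps, u=None, stop_condition=None):
    """Iterate a one-step map n_steps - 1 times from x0 (illustrative)."""
    x = np.zeros((n_steps, len(x0)))
    x[0] = x0
    for i in range(1, n_steps):
        # With control inputs, u[i - 1] is applied when stepping from x[i - 1].
        x[i] = step(x[i - 1]) if u is None else step(x[i - 1], u[i - 1])
        if stop_condition is not None and stop_condition(x[i]):
            return x[: i + 1]      # keep the state that triggered the stop
    return x

x0 = np.array([1.0])
full = simulate_map(lambda x: 0.5 * x, x0, 5)
stopped = simulate_map(lambda x: 0.5 * x, x0, 5,
                       stop_condition=lambda x: x[0] < 0.2)
controlled = simulate_map(lambda x, u: 0.5 * x + u, x0, 3, u=[1.0, 1.0, 1.0])
```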

property complexity

Complexity of the model measured as the number of nonzero parameters.
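
Counting nonzero parameters can be illustrated on a coefficient array with the sparsity pattern of the Lorenz example. This is a sketch under the assumption that every nonzero entry of the coefficient matrix counts as one parameter.

```python
import numpy as np

# Coefficient matrix with the Lorenz sparsity pattern (3 targets, 10 features).
coef = np.zeros((3, 10))
coef[0, [1, 2]] = [-10.0, 10.0]
coef[1, [1, 2, 6]] = [28.0, -1.0, -1.0]
coef[2, [3, 5]] = [-8.0 / 3.0, 1.0]

complexity = np.count_nonzero(coef)   # number of active terms across all equations
```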

## Module contents

class pysindy.SINDy(optimizer=None, feature_library=None, differentiation_method=None, feature_names=None, t_default=1, discrete_time=False)

Bases: sklearn.base.BaseEstimator

The same class as pysindy.pysindy.SINDy, exposed at the package level. Its parameters, attributes, examples, and methods are documented in the pysindy.pysindy module section above.

class pysindy.BaseDifferentiation

Bases: sklearn.base.BaseEstimator

Base class for differentiation methods.

Simply forces differentiation methods to implement a _differentiate function.

class pysindy.FiniteDifference(order=2, d=1, axis=0, is_uniform=False, drop_endpoints=False, periodic=False)

Finite difference derivatives.

Parameters
• order (int, optional (default 2)) – The order of the finite difference method to be used. Currently, only centered differences (for even order) and left-off-centered differences (for odd order) are implemented.

• d (int, optional (default 1)) – The order of derivative to take. Must be positive integer.

• axis (int, optional (default 0)) – The axis to differentiate along.

• is_uniform (boolean, optional (default False)) – Tells the differentiation method that, although an N-dim grid is passed, the grid is uniform, so dx can be used instead of the full grid array.

• drop_endpoints (boolean, optional (default False)) – Whether to drop the derivatives at the endpoints. If True, the endpoint derivatives are not computed and are set to np.nan. Note that which points are endpoints depends on the method being used.

• periodic (boolean, optional (default False)) – Whether to use periodic boundary conditions for endpoints. With periodic=False, forward differences are used at the boundaries; with periodic=True, centered differences with periodic boundaries are used. No effect if drop_endpoints=True.

Examples

>>> import numpy as np
>>> from pysindy.differentiation import FiniteDifference
>>> t = np.linspace(0, 1, 5)
>>> X = np.vstack((np.sin(t), np.cos(t))).T
>>> fd = FiniteDifference()
>>> fd._differentiate(X, t)
array([[ 1.00114596,  0.00370551],
       [ 0.95885108, -0.24483488],
       [ 0.8684696 , -0.47444711],
       [ 0.72409089, -0.67456051],
       [ 0.53780339, -0.84443737]])
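The printed values are consistent with second-order centered differences in the interior and four-point one-sided differences at the two endpoints. The NumPy sketch below makes that explicit; it is inferred from the numbers in the example rather than taken from the pysindy source, and centered_difference is a hypothetical helper name.

```python
import numpy as np


def centered_difference(x, t):
    """First derivative on a uniform grid: centered in the interior, one-sided at the ends."""
    dt = t[1] - t[0]
    dx = np.empty_like(x, dtype=float)
    # Second-order centered difference at interior points.
    dx[1:-1] = (x[2:] - x[:-2]) / (2 * dt)
    # Four-point forward and backward differences at the two endpoints.
    dx[0] = (-11 / 6 * x[0] + 3 * x[1] - 3 / 2 * x[2] + x[3] / 3) / dt
    dx[-1] = (11 / 6 * x[-1] - 3 * x[-2] + 3 / 2 * x[-3] - x[-4] / 3) / dt
    return dx


t = np.linspace(0, 1, 5)
X = np.vstack((np.sin(t), np.cos(t))).T
dX = centered_difference(X, t)
```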

class pysindy.SINDyDerivative(**kwargs)[source]

Bases: sklearn.base.BaseEstimator

Wrapper class for differentiation classes from the derivative package. This class is meant to provide all the same functionality as the dxdt method.

This class also has _differentiate and __call__ methods which are used by PySINDy.

Parameters

derivative_kws (dictionary, optional) – Keyword arguments to be passed to the dxdt method.

Notes

See the derivative documentation for acceptable keywords.

set_params(**params)[source]

Set the parameters of this estimator. Modification of the pysindy method to allow unknown kwargs. This allows using the full range of derivative parameters that are not defined as member variables in sklearn grid search.

Returns

Return type

self

get_params(deep=True)[source]

Get parameters.

class pysindy.SmoothedFiniteDifference(smoother=<function savgol_filter>, smoother_kws={}, **kwargs)[source]

Smoothed finite difference derivatives.

Perform differentiation by smoothing input data then applying a finite difference method.

Parameters
• smoother (function, optional (default savgol_filter)) – Function to perform smoothing. Must be compatible with the following call signature: x_smoothed = smoother(x, **smoother_kws)

• smoother_kws (dict, optional (default {})) – Arguments passed to smoother when it is invoked.

• **kwargs (kwargs) – Additional parameters passed to the pysindy.FiniteDifference.__init__ function.

Examples

>>> import numpy as np
>>> from pysindy.differentiation import SmoothedFiniteDifference
>>> t = np.linspace(0,1,10)
>>> X = np.vstack((np.sin(t),np.cos(t))).T
>>> sfd = SmoothedFiniteDifference(smoother_kws={'window_length': 5})
>>> sfd._differentiate(X, t)
array([[ 1.00013114e+00,  7.38006789e-04],
       [ 9.91779070e-01, -1.10702304e-01],
       [ 9.73376491e-01, -2.20038119e-01],
       [ 9.43001496e-01, -3.26517615e-01],
       [ 9.00981354e-01, -4.29066632e-01],
       [ 8.47849424e-01, -5.26323977e-01],
       [ 7.84260982e-01, -6.17090177e-01],
       [ 7.11073255e-01, -7.00180971e-01],
       [ 6.29013295e-01, -7.74740601e-01],
       [ 5.39752150e-01, -8.41980082e-01]])
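The smooth-then-differentiate idea and the smoother(x, **smoother_kws) call signature can be sketched with NumPy alone. In this sketch a centered moving average is swapped in for savgol_filter, np.gradient stands in for the finite difference step, and moving_average and smoothed_derivative are hypothetical names.

```python
import numpy as np


def moving_average(x, window_length=5):
    """Smooth each column with a centered moving average (a simple stand-in smoother)."""
    kernel = np.ones(window_length) / window_length
    return np.column_stack(
        [np.convolve(col, kernel, mode="same") for col in x.T]
    )


def smoothed_derivative(x, t, smoother=moving_average, smoother_kws=None):
    """Smooth first, then differentiate with centered differences."""
    x_smoothed = smoother(x, **(smoother_kws or {}))
    return np.gradient(x_smoothed, t, axis=0)


rng = np.random.default_rng(0)
t = np.linspace(0, 2, 201)
clean = np.sin(t)[:, None]
noisy = clean + 0.01 * rng.standard_normal(clean.shape)

raw = np.gradient(noisy, t, axis=0)
smooth = smoothed_derivative(noisy, t, smoother_kws={"window_length": 5})

inner = slice(5, -5)  # ignore edge samples distorted by the moving average
err_raw = np.abs(raw[inner] - np.cos(t)[inner, None]).mean()
err_smooth = np.abs(smooth[inner] - np.cos(t)[inner, None]).mean()
```

On noisy data like this, smoothing before differentiating should reduce the derivative error away from the edges, which is the motivation for this class.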

class pysindy.SpectralDerivative(d=1, axis=0)[source]

Spectral derivatives. Assumes uniform grid, and utilizes FFT to approximate a derivative. Works well for derivatives in periodic dimensions. Equivalent to a maximal-order finite difference, but runs in O(NlogN).

Parameters
• d (int) – The order of derivative to take

• axis (int, optional (default 0)) – The axis to differentiate along

Examples

>>> import numpy as np
>>> from pysindy.differentiation import SpectralDerivative
>>> t = np.arange(0,1,0.1)
>>> X = np.vstack((np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))).T
>>> sd = SpectralDerivative()
>>> sd._differentiate(X, t)
array([[ 6.28318531e+00,  2.69942771e-16],
       [ 5.08320369e+00, -3.69316366e+00],
       [ 1.94161104e+00, -5.97566433e+00],
       [-1.94161104e+00, -5.97566433e+00],
       [-5.08320369e+00, -3.69316366e+00],
       [-6.28318531e+00,  7.10542736e-16],
       [-5.08320369e+00,  3.69316366e+00],
       [-1.94161104e+00,  5.97566433e+00],
       [ 1.94161104e+00,  5.97566433e+00],
       [ 5.08320369e+00,  3.69316366e+00]])
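The FFT approach can be sketched in a few lines of NumPy, assuming a uniform grid and a signal that is periodic on the sampled interval, such as sin(2πt) and cos(2πt) on [0, 1). This is an illustration of the technique, not the pysindy implementation, and spectral_derivative is a hypothetical name.

```python
import numpy as np


def spectral_derivative(x, t, d=1):
    """d-th derivative along axis 0 via the FFT, assuming a uniform, periodic grid."""
    n = x.shape[0]
    dt = t[1] - t[0]
    # Angular wavenumbers matching np.fft.fft's frequency ordering.
    k = 2 * np.pi * np.fft.fftfreq(n, d=dt)
    x_hat = np.fft.fft(x, axis=0)
    # Differentiation becomes multiplication by (i k)^d in Fourier space.
    dx_hat = (1j * k[:, None]) ** d * x_hat
    return np.real(np.fft.ifft(dx_hat, axis=0))


t = np.arange(0, 1, 0.1)
X = np.vstack((np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))).T
dX = spectral_derivative(X, t)
```

For these inputs the result agrees with the analytic derivatives 2π cos(2πt) and -2π sin(2πt) to floating-point precision.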

class pysindy.ConcatLibrary(libraries: list, library_ensemble=False, ensemble_indices=[0])[source]

Concatenate multiple libraries into one library. All settings provided to individual libraries will be applied.

Parameters
• libraries (list of libraries) – Library instances to be applied to the input matrix.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.

Attributes
• libraries_ (list of libraries) – Library instances to be applied to the input matrix.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the sum of the numbers of output features for each of the concatenated libraries.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import ConcatLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_concat = ConcatLibrary([lib_custom, lib_fourier])
>>> lib_concat.fit(x)
>>> lib_concat.transform(x)
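Conceptually, concatenation places the feature matrices of the individual libraries side by side, so the number of output features is the sum of the individual counts. A minimal NumPy sketch of that idea follows; the two arrays are stand-ins for library outputs and are not produced by pysindy.

```python
import numpy as np

x = np.array([[0.0, -1.0], [1.0, 0.0], [2.0, -1.0]])

# Stand-ins for the outputs of two fitted libraries' transform(x).
custom_features = np.exp(x)                            # 2 columns
fourier_features = np.hstack([np.sin(x), np.cos(x)])   # 4 columns

# Concatenation stacks the feature matrices side by side.
concat_features = np.hstack([custom_features, fourier_features])
```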

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Transform data with the provided libraries.

Parameters

x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.

Returns

xp – The matrix of features, where NP is the total number of features generated by the concatenated libraries.

Return type

np.ndarray, shape [n_samples, NP]

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

class pysindy.TensoredLibrary(libraries: list, library_ensemble=False, inputs_per_library=None, ensemble_indices=[0])[source]

Tensor multiple libraries together into one library. All settings provided to individual libraries will be applied.

Parameters
• libraries (list of libraries) – Library instances to be applied to the input matrix.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.

Attributes
• libraries_ (list of libraries) – Library instances to be applied to the input matrix.

• inputs_per_library_ (numpy nd.array) – Array that specifies which inputs should be used for each of the libraries you are going to tensor together. Used for building GeneralizedLibrary objects.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the product of the numbers of output features for each of the libraries that were tensored together.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import TensoredLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_tensored = lib_custom * lib_fourier
>>> lib_tensored.fit(x)
>>> lib_tensored.transform(x)
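Tensoring multiplies every feature of one library with every feature of the other, sample by sample, which is why the output count is the product of the individual counts. The NumPy sketch below shows the idea with stand-in arrays; the column ordering used here is an assumption and may differ from the one pysindy produces.

```python
import numpy as np

x = np.array([[0.0, -1.0], [1.0, 0.0], [2.0, -1.0]])
a = np.exp(x)                           # stand-in for library A: 2 features
b = np.hstack([np.sin(x), np.cos(x)])   # stand-in for library B: 4 features

# Multiply every column of a with every column of b, sample by sample.
tensored = (a[:, :, None] * b[:, None, :]).reshape(x.shape[0], -1)
```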

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Transform data with the provided libraries.

Parameters

x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.

Returns

xp – The matrix of features, where NP is the total number of features generated by the tensored libraries.

Return type

np.ndarray, shape [n_samples, NP]

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

class pysindy.GeneralizedLibrary(libraries: list, tensor_array=None, inputs_per_library=None, library_ensemble=False, ensemble_indices=[0])[source]

Put multiple libraries into one library. All settings provided to individual libraries will be applied. Note that this class allows one to specifically choose which input variables are used for each library, and take tensor products of any pair of libraries. Tensored libraries inherit the same input variables specified for the individual libraries.

Parameters
• libraries (list of libraries) – Library instances to be applied to the input matrix.

• tensor_array (2D list of booleans, optional, (default None)) – Default is to not tensor any of the libraries together. Shape equal to the # of tensor libraries and the # feature libraries. Indicates which pairs of libraries to tensor product together and add to the overall library. For instance if you have 5 libraries, and want to do two tensor products, you could use the list [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]] to indicate that you want two tensored libraries from tensoring libraries 0 and 3 and libraries 1, 3, and 4.

• inputs_per_library (2D np.ndarray, optional (default None)) – Shape should be equal to # feature libraries by # variable input. Can be used to specify a subset of the variables to use to generate a feature library. If number of feature libraries > 1, then can be used to generate a large number of libraries, each using their own subsets of the input variables. Note that this must be specified for all the individual feature libraries.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.

Attributes
• libraries_ (list of libraries) – Library instances to be applied to the input matrix.

• tensor_array_ (2D list of booleans (default None)) – Indicates which pairs of libraries to tensor product together and add to the overall library. For instance if you have 5 libraries, and want to do two tensor products, you could use the list [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]] to indicate that you want two tensored libraries from tensoring libraries 0 and 3 and libraries 1, 3, and 4. Shape equal to # of tensor libraries to make by the # feature libraries.

• inputs_per_library_ (2D np.ndarray, (default None)) – Default is that all inputs are used for every library. Can be used to specify a subset of the variables to use to generate a feature library. If number of feature libraries > 1, then can be used to generate a large number of libraries, each using their own subsets of the input variables. Note that this must be specified for all the individual feature libraries. The shape is equal to # feature libraries, # variable inputs.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the sum of the numbers of output features for each of the concatenated libraries.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import GeneralizedLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_generalized = GeneralizedLibrary([lib_custom, lib_fourier])
>>> lib_generalized.fit(x)
>>> lib_generalized.transform(x)
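The tensor_array mask from the parameter description can be decoded with a short NumPy sketch. Each row is read as a mask over the list of libraries, as the text describes. The per-library feature counts in n_out are made-up numbers used only to show how the total size adds up.

```python
import numpy as np

tensor_array = [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]]

# Each row is a mask over the list of libraries; nonzero entries are tensored together.
groups = [np.flatnonzero(row).tolist() for row in tensor_array]

# Hypothetical output sizes of the five individual libraries.
n_out = [2, 3, 4, 5, 6]

# A tensored library has the product of its members' sizes; the overall
# library holds the individual libraries plus the tensored ones.
tensored_sizes = [int(np.prod([n_out[i] for i in g])) for g in groups]
total_features = sum(n_out) + sum(tensored_sizes)
```

With these numbers, groups is [[0, 3], [1, 3, 4]], matching the pairing spelled out in the parameter description.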

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Transform data with the provided libraries.

Parameters

x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.

Returns

xp – The matrix of features, where NP is the total number of features generated by the individual and tensored libraries.

Return type

np.ndarray, shape [n_samples, NP]

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

class pysindy.CustomLibrary(library_functions, function_names=None, interaction_only=True, library_ensemble=False, ensemble_indices=[0], include_bias=False)[source]

Generate a library with custom functions.

Parameters
• library_functions (list of mathematical functions) – Functions to include in the library. Default is to use same functions for all variables. Can also be used so that each variable has an associated library, in this case library_functions is shape (n_input_features, num_library_functions)

• function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return $$\sin(x)$$ given $$x$$ as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using $$[ f_0(x),f_1(x), f_2(x), \ldots ]$$.

• interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form $$f(x,x)$$ and $$f(x,y,x)$$ will be omitted, but those of the form $$f(x,y)$$ and $$f(x,y,z)$$ will be included. If False, all combinations will be included.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.

• include_bias (boolean, optional (default False)) – If True, then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions, because if the system is not 1D, lambdas will generate duplicates.

Attributes
• functions (list of functions) – Mathematical library functions to be applied to each input feature.

• function_names (list of functions) – Functions for generating string representations of each library function.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import CustomLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib = CustomLibrary(library_functions=functions).fit(x)
>>> lib.transform(x)
array([[ 1.        ,  0.36787944, -0.84147098],
       [ 2.71828183,  1.        ,  0.84147098],
       [ 7.3890561 ,  0.36787944,  0.84147098]])
>>> lib.get_feature_names()
['f0(x0)', 'f0(x1)', 'f1(x0,x1)']
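The way library functions are applied to combinations of input columns, and the effect of interaction_only, can be sketched in plain NumPy. The helper custom_features is hypothetical, and using itertools.combinations versus combinations_with_replacement is an assumption made to match the description of interaction_only. With the default setting the sketch reproduces the three columns printed above.

```python
from inspect import signature
from itertools import combinations, combinations_with_replacement

import numpy as np


def custom_features(x, functions, interaction_only=True):
    """Apply each function to every admissible combination of input columns."""
    pick = combinations if interaction_only else combinations_with_replacement
    columns = []
    for f in functions:
        n_args = len(signature(f).parameters)
        # One output column per combination of n_args input columns.
        for idx in pick(range(x.shape[1]), n_args):
            columns.append(f(*(x[:, i] for i in idx)))
    return np.column_stack(columns)


x = np.array([[0.0, -1.0], [1.0, 0.0], [2.0, -1.0]])
functions = [lambda x: np.exp(x), lambda x, y: np.sin(x + y)]

features = custom_features(x, functions)
all_features = custom_features(x, functions, interaction_only=False)
```

With interaction_only=False the two-argument function is also evaluated on (x0, x0) and (x1, x1), so all_features has five columns instead of three.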

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – Measurement data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to custom features

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.

Return type

np.ndarray, shape (n_samples, n_output_features)

class pysindy.FourierLibrary(n_frequencies=1, include_sin=True, include_cos=True, library_ensemble=False, ensemble_indices=[0])[source]

Generate a library with trigonometric functions.

Parameters
• n_frequencies (int, optional (default 1)) – Number of frequencies to include in the library. The library will include functions $$\sin(x), \sin(2x), \dots \sin(n_{frequencies}x)$$ for each input feature $$x$$ (depending on which of sine and/or cosine features are included).

• include_sin (boolean, optional (default True)) – If True, include sine terms in the library.

• include_cos (boolean, optional (default True)) – If True, include cosine terms in the library.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.

Attributes
• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is 2 * n_input_features_ * n_frequencies if both sines and cosines are included. Otherwise it is n_input_features_ * n_frequencies.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary
>>> x = np.array([[0.],[1.],[2.]])
>>> lib = FourierLibrary(n_frequencies=2).fit(x)
>>> lib.transform(x)
array([[ 0.        ,  1.        ,  0.        ,  1.        ],
       [ 0.84147098,  0.54030231,  0.90929743, -0.41614684],
       [ 0.90929743, -0.41614684, -0.7568025 , -0.65364362]])
>>> lib.get_feature_names()
['sin(1 x0)', 'cos(1 x0)', 'sin(2 x0)', 'cos(2 x0)']
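The feature layout shown above can be reproduced with a small NumPy sketch. The helper fourier_features is hypothetical. The column ordering matches the single-input example above; for several input features it is an assumption.

```python
import numpy as np


def fourier_features(x, n_frequencies=1, include_sin=True, include_cos=True):
    """Build sin(k x) and cos(k x) columns for k = 1..n_frequencies."""
    columns, names = [], []
    for k in range(1, n_frequencies + 1):
        for j in range(x.shape[1]):
            if include_sin:
                columns.append(np.sin(k * x[:, j]))
                names.append(f"sin({k} x{j})")
            if include_cos:
                columns.append(np.cos(k * x[:, j]))
                names.append(f"cos({k} x{j})")
    return np.column_stack(columns), names


x = np.array([[0.0], [1.0], [2.0]])
features, names = fourier_features(x, n_frequencies=2)
```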

get_feature_names(input_features=None)[source]

Return feature names for output features

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to Fourier features

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

xp – The matrix of features, where n_output_features is the number of Fourier features generated from the inputs.

Return type

np.ndarray, shape (n_samples, n_output_features)

class pysindy.IdentityLibrary(library_ensemble=False, ensemble_indices=[0])[source]

Generate an identity library which maps all input features to themselves.

Parameters
• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.

Attributes
• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is equal to the number of input features.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import IdentityLibrary
>>> x = np.array([[0,-1],[0.5,-1.5],[1.,-2.]])
>>> lib = IdentityLibrary().fit(x)
>>> lib.transform(x)
array([[ 0. , -1. ],
       [ 0.5, -1.5],
       [ 1. , -2. ]])
>>> lib.get_feature_names()
['x0', 'x1']

get_feature_names(input_features=None)[source]

Return feature names for output features

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Perform identity transformation (return a copy of the input).

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

x – The matrix of features, which is just a copy of the input data.

Return type

np.ndarray, shape (n_samples, n_features)

class pysindy.PolynomialLibrary(degree=2, include_interaction=True, interaction_only=False, include_bias=True, order='C', library_ensemble=False, ensemble_indices=[0])[source]

Bases: sklearn.preprocessing._polynomial.PolynomialFeatures, pysindy.feature_library.base.BaseFeatureLibrary

Generate polynomial and interaction features.

This is the same as sklearn.preprocessing.PolynomialFeatures, but also adds the option to omit interaction features from the library.

Parameters
• degree (integer, optional (default 2)) – The degree of the polynomial features.

• include_interaction (boolean, optional (default True)) – Determines whether interaction features are produced. If false, features are all of the form x[i] ** k.

• interaction_only (boolean, optional (default False)) – If true, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).

• include_bias (boolean, optional (default True)) – If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones - acts as an intercept term in a linear model).

• order (str in {'C', 'F'}, optional (default 'C')) – Order of output array in the dense case. ‘F’ order is faster to compute, but may slow down subsequent estimators.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.

Attributes
• powers_ (array, shape (n_output_features, n_input_features)) – powers_[i, j] is the exponent of the jth input in the ith output.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. This number is computed by iterating over all appropriately sized combinations of input features.

property powers_

Exponent for each of the inputs in the output.
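How an exponent table like powers_ relates to degree, include_bias, include_interaction and interaction_only can be sketched without pysindy. The helper polynomial_powers is hypothetical. The row ordering follows the familiar 1, a, b, a^2, ab, b^2 pattern and is an assumption about the library's ordering.

```python
from itertools import combinations_with_replacement

import numpy as np


def polynomial_powers(n_inputs, degree, include_bias=True,
                      include_interaction=True, interaction_only=False):
    """Exponent table: row i holds the power of each input in output feature i."""
    rows = []
    for d in range(0 if include_bias else 1, degree + 1):
        for combo in combinations_with_replacement(range(n_inputs), d):
            powers = [combo.count(j) for j in range(n_inputs)]
            mixed = sum(p > 0 for p in powers) > 1
            if not include_interaction and mixed:
                continue  # keep only pure powers x[i] ** k
            if interaction_only and max(powers) > 1:
                continue  # keep only products of distinct inputs
            rows.append(powers)
    return np.array(rows)


x = np.array([[2.0, 3.0]])
powers = polynomial_powers(2, degree=2)
# Each output feature is the product of the inputs raised to that row's exponents.
features = np.prod(x[:, None, :] ** powers[None, :, :], axis=2)

pure_powers = polynomial_powers(2, degree=2, include_interaction=False)
distinct_only = polynomial_powers(2, degree=2, interaction_only=True)
```

For the sample row [2, 3], features holds 1, 2, 3, 4, 6 and 9. Setting include_interaction=False drops the mixed row [1, 1], and interaction_only=True drops the squared rows instead.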

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to polynomial features.

Parameters

x (array-like or CSR/CSC sparse matrix, shape (n_samples, n_features)) – The data to transform, row by row. Prefer CSR over CSC for sparse input (for speed), but CSC is required if the degree is 4 or higher. If the degree is less than 4 and the input format is CSC, it will be converted to CSR, have its polynomial features generated, then converted back to CSC. If the degree is 2 or 3, the method described in “Leveraging Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices Using K-Simplex Numbers” by Andrew Nystrom and John Hughes is used, which is much faster than the method used on CSC input. For this reason, a CSC input will be converted to CSR, and the output will be converted back to CSC prior to being returned, hence the preference of CSR.

Returns

xp – The matrix of features, where n_output_features is the number of polynomial features generated from the combination of inputs.

Return type

np.ndarray or CSR/CSC sparse matrix, shape (n_samples, n_output_features)

class pysindy.PDELibrary(library_functions=[], derivative_order=0, spatial_grid=None, interaction_only=True, function_names=None, include_bias=False, include_interaction=True, is_uniform=False, library_ensemble=False, ensemble_indices=[0], periodic=False)[source]

Generate a PDE library with custom functions.

Parameters
• library_functions (list of mathematical functions, optional (default None)) – Functions to include in the library. Each function will be applied to each input variable (but not their derivatives)

• derivative_order (int, optional (default 0)) – Order of derivative to take on each input variable, can be arbitrary non-negative integer.

• spatial_grid (np.ndarray, optional (default None)) – The spatial grid for computing derivatives

• function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return $$\sin(x)$$ given $$x$$ as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using $$[ f_0(x),f_1(x), f_2(x), \ldots ]$$.

• interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form $$f(x,x)$$ and $$f(x,y,x)$$ will be omitted, but those of the form $$f(x,y)$$ and $$f(x,y,z)$$ will be included. If False, all combinations will be included.

• include_bias (boolean, optional (default False)) – If True, then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions, because if the system is not 1D, lambdas will generate duplicates.

• include_interaction (boolean, optional (default True)) – This differs from its use in PolynomialLibrary. If True, all the mixed derivative terms are generated. If False, the library will consist of only pure no-derivative terms and pure derivative terms, with no mixed terms.

• is_uniform (boolean, optional (default False)) – If True, assume the grid is uniform in all spatial directions, so that uniform grid spacing can be used for the derivative calculations.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.

Attributes
• functions (list of functions) – Mathematical library functions to be applied to each input feature.

• function_names (list of functions) – Functions for generating string representations of each library function.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import PDELibrary
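The three families of terms described under include_interaction (pure no-derivative terms, pure derivative terms, and mixed terms) can be illustrated for a single 1D snapshot with NumPy alone. This sketch does not call PDELibrary: np.gradient stands in for the library's differentiation, and both the variable names and the column ordering are assumptions.

```python
import numpy as np

x_grid = np.linspace(0, 2 * np.pi, 201)   # plays the role of spatial_grid
u = np.sin(x_grid)                        # one snapshot of a 1D field

library_functions = [lambda u: u, lambda u: u ** 2]
derivative_order = 2

# Pure no-derivative terms: f(u) for each library function.
function_terms = [f(u) for f in library_functions]

# Pure derivative terms: u_x, u_xx, ... from repeated finite differences.
derivative_terms = []
du = u
for _ in range(derivative_order):
    du = np.gradient(du, x_grid, edge_order=2)
    derivative_terms.append(du)

# Mixed terms (include_interaction=True): every f(u) times every derivative.
mixed_terms = [d * f for d in derivative_terms for f in function_terms]

with_interaction = np.column_stack(function_terms + derivative_terms + mixed_terms)
without_interaction = np.column_stack(function_terms + derivative_terms)
```

With two library functions and derivative_order=2, this gives eight candidate terms with mixed terms and four without.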

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” are used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – Measurement data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to PDE features.

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

xp – The matrix of features, where n_output_features is the number of features generated from the tensor product of the derivative terms and the library_functions applied to combinations of the inputs.

Return type

np.ndarray, shape (n_samples, n_output_features)

class pysindy.WeakPDELibrary(library_functions=[], derivative_order=0, spatiotemporal_grid=None, function_names=None, interaction_only=True, include_bias=False, include_interaction=True, is_uniform=False, K=100, num_pts_per_domain=100, H_xt=None, p=4, library_ensemble=False, ensemble_indices=[0], periodic=False)[source]

Generate a weak formulation library with custom functions and, optionally, any spatial derivatives in arbitrary dimensions.

Parameters
• library_functions (list of mathematical functions, optional (default None)) – Functions to include in the library. Each function will be applied to each input variable (but not their derivatives)

• derivative_order (int, optional (default 0)) – Order of derivative to take on each input variable, can be arbitrary non-negative integer.

• spatiotemporal_grid (np.ndarray (default None)) – The spatiotemporal grid for computing derivatives. This variable must be specified with at least one dimension corresponding to a temporal grid, so that integration by parts can be done in the weak formulation.

• function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return $$\sin(x)$$ given $$x$$ as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using $$[ f_0(x),f_1(x), f_2(x), \ldots ]$$.

• interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form $$f(x,x)$$ and $$f(x,y,x)$$ will be omitted, but those of the form $$f(x,y)$$ and $$f(x,y,z)$$ will be included. If False, all combinations will be included.

• include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions because, if the system is not 1D, lambdas will generate duplicates.

• include_interaction (boolean, optional (default True)) – This differs from its use in the PolynomialLibrary. If True, the library generates all the mixed derivative terms. If False, the library consists only of pure no-derivative terms and pure derivative terms, with no mixed terms.

• is_uniform (boolean, optional (default False)) – If True, assume the grid is uniform in all spatial directions, so that uniform grid spacing can be used for the derivative calculations.

• K (int, optional (default 100)) – Number of domain centers, corresponding to subdomain squares with half-length H_xt. If K is not specified, defaults to 100.

• H_xt (array of floats, optional (default None)) – Half of the length of the square subdomains in each spatiotemporal direction. If H_xt is not specified, defaults to H_xt = L_xt / 20, where L_xt is the length of the full domain in each spatiotemporal direction. If H_xt is specified as a scalar, this value will be applied to all dimensions of the subdomains.

• p (int, optional (default 4)) – Positive integer to define the polynomial degree of the spatial weights used for weak/integral SINDy.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
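The requirement that spatiotemporal_grid include a temporal axis comes from integration by parts: the time derivative is moved off the data and onto a smooth weight that vanishes at the subdomain boundary. The following is a minimal one-dimensional sketch of that identity. The weight (t^2 - 1)^p on the reference interval [-1, 1], the helper name trapezoid, and the trapezoidal rule are illustrative assumptions, not a description of the library's internals.

```python
import math

def trapezoid(values, h):
    """Composite trapezoidal rule on a uniform grid with spacing h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))

p = 4                      # polynomial degree of the weight (assumed form)
n = 2001                   # number of grid points on [-1, 1]
h = 2.0 / (n - 1)
t = [-1.0 + i * h for i in range(n)]

# Weight that vanishes at t = -1 and t = 1, and its derivative.
phi = [(ti**2 - 1.0) ** p for ti in t]
dphi = [2.0 * p * ti * (ti**2 - 1.0) ** (p - 1) for ti in t]

u = [math.sin(3.0 * ti) for ti in t]           # stand-in for measured data
du = [3.0 * math.cos(3.0 * ti) for ti in t]    # exact derivative, for comparison only

# Integration by parts: int(phi * u') = -int(phi' * u), boundary terms vanish.
lhs = trapezoid([a * b for a, b in zip(phi, du)], h)
rhs = -trapezoid([a * b for a, b in zip(dphi, u)], h)
```

The right-hand side never differentiates u, which is why a weak formulation can be less sensitive to noise in the measurements.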

Attributes
• functions (list of functions) – Mathematical library functions to be applied to each input feature.

• function_names (list of functions) – Functions for generating string representations of each library function.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import WeakPDELibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib = WeakPDELibrary(library_functions=functions).fit(x)
>>> lib.transform(x)
array([[ 1.        ,  0.36787944, -0.84147098],
[ 2.71828183,  1.        ,  0.84147098],
[ 7.3890561 ,  0.36787944,  0.84147098]])
>>> lib.get_feature_names()
['f0(x0)', 'f0(x1)', 'f1(x0,x1)']

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – Measurement data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to custom features

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.

Return type

np.ndarray, shape (n_samples, n_output_features)

class pysindy.SINDyPILibrary(library_functions=None, t=None, x_dot_library_functions=None, function_names=None, interaction_only=True, differentiation_method=None, include_bias=False, library_ensemble=False, ensemble_indices=[0])[source]

Generate a library with custom functions. The library takes custom libraries for X and Xdot respectively, and then tensor-products them together. For a 3D system, a library of constant and linear terms in x_dot, i.e. [1, x_dot0, …, x_dot2], is good enough for most problems and implicit terms. The function names list should include both X and Xdot functions, without the mixed terms.
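As a rough illustration of the tensor product, the sketch below combines the names of an x library with the names of an x_dot library. The name lists are hypothetical inputs written out by hand, and the ordering (x terms varying fastest) is chosen to match the feature names shown in this class's example; it is not taken from the library's source.

```python
# Names produced by the x library (constant term listed once).
x_names = ["", "exp(x0)", "exp(x1)", "sin(x0x1)"]

# Names produced by the x_dot library: constant and linear terms.
x_dot_names = ["", "x0_dot", "x1_dot"]

# Tensor product: every x term is multiplied by every x_dot term.
feature_names = [xn + xd for xd in x_dot_names for xn in x_names]
```

With 4 terms in x and 3 terms in x_dot, the combined library has 4 * 3 = 12 columns, which is why only the unmixed names need to be supplied.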

Parameters
• library_functions (list of mathematical functions) – Functions to include in the library. Each function will be applied to each input variable x.

• x_dot_library_functions (list of mathematical functions) – Functions to include in the library. Each function will be applied to each input variable x_dot.

• t (np.ndarray of time slices) – Time base to compute Xdot from X for the implicit terms

• differentiation_method (differentiation object, optional) – Method for differentiating the data. This must be a class extending pysindy.differentiation_methods.base.BaseDifferentiation class. The default option is centered difference.

• function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return $$\sin(x)$$ given $$x$$ as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using $$[ f_0(x),f_1(x), f_2(x), \ldots ]$$. For SINDy-PI, function_names should include the names of the functions in both the x and x_dot libraries (library_functions and x_dot_library_functions), but not the mixed terms, which are computed in the code.

• interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form $$f(x,x)$$ and $$f(x,y,x)$$ will be omitted, but those of the form $$f(x,y)$$ and $$f(x,y,z)$$ will be included. If False, all combinations will be included.

• include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions because, if the system is not 1D, lambdas will generate duplicates.

• library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)

• ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
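The effect of interaction_only on a multi-argument library function can be pictured with itertools. This is a sketch under the assumption that argument order does not matter, so that "all combinations" means combinations with replacement; the helper argument_tuples is hypothetical.

```python
from itertools import combinations, combinations_with_replacement

def argument_tuples(variables, n_args, interaction_only=True):
    """List the variable tuples an n_args-argument function is applied to."""
    chooser = combinations if interaction_only else combinations_with_replacement
    return list(chooser(variables, n_args))

variables = ["x", "y", "z"]

# interaction_only=True: f(x,y), f(x,z), f(y,z); no repeated variables.
without_self = argument_tuples(variables, 2, interaction_only=True)

# interaction_only=False: additionally f(x,x), f(y,y), f(z,z).
with_self = argument_tuples(variables, 2, interaction_only=False)
```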

Attributes
• functions (list of functions) – Mathematical library functions to be applied to each input feature.

• function_names (list of functions) – Functions for generating string representations of each library function.

• n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher so we check the sklearn.__version__ and switch to n_features_in if needed.

• n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.

Examples

>>> import numpy as np
>>> from pysindy.feature_library import SINDyPILibrary
>>> t = np.linspace(0, 1, 5)
>>> x = np.ones((5, 2))
>>> functions = [lambda x: 1, lambda x : np.exp(x),
...              lambda x,y : np.sin(x+y)]
>>> x_dot_functions = [lambda x: 1, lambda x : x]
>>> function_names = [lambda x: '',
...                   lambda x : 'exp(' + x + ')',
...                   lambda x, y : 'sin(' + x + y + ')',
...                   lambda x: '',
...                   lambda x : x]
>>> lib = SINDyPILibrary(library_functions=functions,
...                      x_dot_library_functions=x_dot_functions,
...                      function_names=function_names, t=t
...                      ).fit(x)
>>> lib.transform(x)
[[ 1.00000000e+00  2.71828183e+00  2.71828183e+00  9.09297427e-01
2.22044605e-16  6.03579815e-16  6.03579815e-16  2.01904588e-16
2.22044605e-16  6.03579815e-16  6.03579815e-16  2.01904588e-16]
[ 1.00000000e+00  2.71828183e+00  2.71828183e+00  9.09297427e-01
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
[ 1.00000000e+00  2.71828183e+00  2.71828183e+00  9.09297427e-01
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
[ 1.00000000e+00  2.71828183e+00  2.71828183e+00  9.09297427e-01
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00
0.00000000e+00  0.00000000e+00  0.00000000e+00  0.00000000e+00]
[ 1.00000000e+00  2.71828183e+00  2.71828183e+00  9.09297427e-01
-2.22044605e-16 -6.03579815e-16 -6.03579815e-16 -2.01904588e-16
-2.22044605e-16 -6.03579815e-16 -6.03579815e-16 -2.01904588e-16]]
>>> lib.get_feature_names()
['', 'exp(x0)', 'exp(x1)', 'sin(x0x1)', 'x0_dot', 'exp(x0)x0_dot',
'exp(x1)x0_dot', 'sin(x0x1)x0_dot', 'x1_dot', 'exp(x0)x1_dot',
'exp(x1)x1_dot', 'sin(x0x1)x1_dot']

get_feature_names(input_features=None)[source]

Return feature names for output features.

Parameters

input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.

Returns

output_feature_names

Return type

list of string, length n_output_features

fit(x, y=None)[source]

Compute number of output features.

Parameters

x (array-like, shape (n_samples, n_features)) – Measurement data.

Returns

self

Return type

instance

transform(x)[source]

Transform data to custom features

Parameters

x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.

Returns

xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.

Return type

np.ndarray, shape (n_samples, n_output_features)

class pysindy.BaseOptimizer(max_iter=20, normalize_columns=False, fit_intercept=False, initial_guess=None, copy_X=True)[source]

Bases: sklearn.linear_model._base.LinearRegression, pysindy.optimizers.base.ComplexityMixin

Base class for SINDy optimizers. Subclasses must implement a _reduce method for carrying out the bulk of the work of fitting a model.

Parameters
• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• initial_guess (np.ndarray, shape (n_features,) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, the initial guess is obtained via a least-squares fit.
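To make the normalize_columns option concrete, here is a small sketch of dividing each library column by its L2 norm. The function name and the list-of-rows representation are illustrative assumptions; the optimizers themselves operate on arrays.

```python
import math

def normalize_columns(theta):
    """Scale each column of theta (a list of rows) to unit L2 norm."""
    n_cols = len(theta[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in theta)) for j in range(n_cols)]
    scaled = [[row[j] / norms[j] for j in range(n_cols)] for row in theta]
    return scaled, norms

theta = [[3.0, 1.0],
         [4.0, 0.0]]          # hypothetical 2-sample, 2-feature library
scaled, norms = normalize_columns(theta)
```

Since theta @ w equals scaled @ (norms * w), coefficients fitted against the scaled columns need to be divided by the column norms to express them in the original units.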

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).

• ind_ (array, shape (n_features,) or (n_targets, n_features)) – Array of 0s and 1s indicating which coefficients of the weight vector have not been masked out.

• history_ (list) – History of coef_ over iterations of the optimization algorithm.

• Theta_ (np.ndarray, shape (n_samples, n_features)) – The Theta matrix to be used in the optimization. We save it as an attribute because access to the full library of terms is sometimes needed for various applications.

fit(x_, y, sample_weight=None, **reduce_kws)[source]

Fit the model.

Parameters
• x_ (array-like, shape (n_samples, n_features)) – Training data

• y (array-like, shape (n_samples,) or (n_samples, n_targets)) – Target values

• sample_weight (float or numpy array of shape (n_samples,), optional) – Individual weights for each sample

• reduce_kws (dict) – Optional keyword arguments to pass to the _reduce method (implemented by subclasses)

Returns

self

Return type

returns an instance of self

class pysindy.SINDyOptimizer(optimizer, unbias=True)[source]

Bases: sklearn.base.BaseEstimator

Wrapper class for optimizers/sparse regression methods passed into the SINDy object.

Enables single target regressors (i.e. those whose predictions are 1-dimensional) to perform multi target regression (i.e. predictions are 2-dimensional). Also adds an _unbias step to reduce bias when regularization is used.

Parameters
• optimizer (estimator object) – The optimizer/sparse regressor to be wrapped, implementing fit and predict. optimizer should also have the attributes coef_, fit_intercept, and intercept_. Note that attribute normalize is deprecated as of sklearn versions >= 1.0 and will be removed in future versions.

• unbias (boolean, optional (default True)) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. For example, if optimizer=STLSQ(alpha=0.1) is used then the learned coefficients will be biased toward 0 due to the L2 regularization. Setting unbias=True will trigger an additional step wherein the nonzero coefficients learned by the optimizer object will be updated using an unregularized least-squares fit.
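The bias that unbias=True is meant to remove can be seen in the simplest case, where the identified support contains a single library column. The sketch below uses the closed-form ridge solution for one column; the data and the value of alpha are hypothetical, and this is not the wrapper's actual code path.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

x = [1.0, 2.0, 3.0]    # the single library column left in the support
y = [2.0, 4.0, 6.0]    # targets generated with true coefficient 2
alpha = 0.5            # hypothetical ridge penalty

# Ridge fit: minimizes ||y - x*w||^2 + alpha * w^2, so w is shrunk toward 0.
w_ridge = dot(x, y) / (dot(x, x) + alpha)

# Unregularized refit on the same support recovers the unshrunk value.
w_unbiased = dot(x, y) / dot(x, x)
```

The support (which coefficients are nonzero) is kept from the regularized fit; only the magnitudes are refitted.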

fit(x, y)[source]
predict(x)[source]
property coef_
property intercept_
property complexity
class pysindy.SR3(threshold=0.1, thresholds=None, nu=1.0, tol=1e-05, thresholder='L0', trimming_fraction=0.0, trimming_step_size=1.0, max_iter=30, fit_intercept=False, copy_X=True, initial_guess=None, normalize_columns=False, verbose=False)[source]

Sparse relaxed regularized regression.

Attempts to minimize the objective function

$0.5\|y-Xw\|^2_2 + \lambda R(u) + (0.5 / \nu)\|w-u\|^2_2$

where $$R(u)$$ is a regularization function. See the following references for more details:

Zheng, Peng, et al. “A unified framework for sparse relaxed regularized regression: SR3.” IEEE Access 7 (2018): 1404-1423.

Champion, K., Zheng, P., Aravkin, A. Y., Brunton, S. L., & Kutz, J. N. (2020). A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access, 8, 169259-169271.

Parameters
• threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the L0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).

• nu (float, optional (default 1)) – Determines the level of relaxation. Decreasing nu encourages w and u to be close, whereas increasing nu allows the regularized coefficients u to be farther from w.

• tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.

• thresholder (string, optional (default 'L0')) – Regularization function to use. Currently implemented options are ‘L0’ (L0 norm), ‘L1’ (L1 norm), ‘L2’ (L2 norm) and ‘CAD’ (clipped absolute deviation). Note by ‘L2 norm’ we really mean the squared L2 norm, i.e. ridge regression

• trimming_fraction (float, optional (default 0.0)) – Fraction of the data samples to trim during fitting. Should be a float between 0.0 and 1.0. If 0.0, trimming is not performed.

• trimming_step_size (float, optional (default 1.0)) – Step size to use in the trimming optimization procedure.

• max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.

• initial_guess (np.ndarray, shape (n_features) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, least-squares is used to obtain an initial guess.

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix $$\Xi$$ such that $$\dot{X} \approx \Theta(X)\Xi$$. thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of $$\Xi$$. That is to say it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.
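The relation lambda = threshold^2 / (2 * nu) stated for the threshold parameter can be checked coefficient by coefficient: for fixed w, the L0-regularized subproblem either keeps u = w (cost lambda) or sets u = 0 (cost w^2 / (2 * nu)). The sketch below compares the two costs directly. The function names are hypothetical and the code only illustrates the scalar subproblem, not the full SR3 iteration.

```python
def l0_cost(u, w, lam, nu):
    """Scalar cost lam * [u != 0] + (0.5 / nu) * (w - u)**2."""
    return lam * (1.0 if u != 0.0 else 0.0) + (0.5 / nu) * (w - u) ** 2

def hard_threshold(w, threshold):
    """Keep w if its magnitude exceeds the threshold, otherwise zero it."""
    return w if abs(w) > threshold else 0.0

threshold, nu = 0.1, 1.0
lam = threshold**2 / (2.0 * nu)

for w in [0.05, 0.09, 0.11, -0.5]:
    keep_cost = l0_cost(w, w, lam, nu)      # best nonzero choice is u = w
    drop_cost = l0_cost(0.0, w, lam, nu)    # alternative is u = 0
    best_u = w if keep_cost < drop_cost else 0.0
    assert best_u == hard_threshold(w, threshold)
```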

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the u in the objective function.

• coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.

• history_ (list) – History of sparse coefficients. history_[k] contains the sparse coefficients (u in the optimization objective function) at iteration k.

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import SR3
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = SR3(threshold=0.1, nu=1)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -10.004 x0 + 10.004 x1
x1' = 27.994 x0 + -0.993 x1 + -1.000 x0 x2
x2' = -2.662 x2 + 1.000 x0 x1

enable_trimming(trimming_fraction)[source]

Enable the trimming of potential outliers.

Parameters

trimming_fraction (float) – The fraction of samples to be trimmed. Must be between 0 and 1.

disable_trimming()[source]

Disable trimming of potential outliers.

class pysindy.STLSQ(threshold=0.1, alpha=0.05, max_iter=20, ridge_kw=None, normalize_columns=False, fit_intercept=False, copy_X=True, initial_guess=None, verbose=False)[source]

Sequentially thresholded least squares algorithm. Defaults to doing Sequentially thresholded Ridge regression.

Attempts to minimize the objective function $$\|y - Xw\|^2_2 + \alpha \|w\|^2_2$$ by iteratively performing least squares and masking out elements of the weight array w that are below a given threshold.

See the following reference for more details:

Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems.” Proceedings of the national academy of sciences 113.15 (2016): 3932-3937.
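The alternation between a ridge fit and thresholding can be sketched in plain Python for a single target. This is an illustrative reimplementation under simplifying assumptions (dense lists, a hand-written linear solver, stopping when the support no longer changes); the names solve_ridge and stlsq are hypothetical and do not reproduce the library's code.

```python
def solve_ridge(X, y, alpha):
    """Solve (X^T X + alpha I) w = X^T y by Gaussian elimination."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) + (alpha if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    for k in range(n):                       # forward elimination with pivoting
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            A[r] = [arc - f * akc for arc, akc in zip(A[r], A[k])]
            b[r] -= f * b[k]
    w = [0.0] * n
    for k in reversed(range(n)):             # back substitution
        w[k] = (b[k] - sum(A[k][c] * w[c] for c in range(k + 1, n))) / A[k][k]
    return w

def stlsq(X, y, threshold=0.1, alpha=0.05, max_iter=20):
    """Sequentially thresholded ridge regression for one target."""
    n = len(X[0])
    active = list(range(n))                  # columns not yet masked out
    w = [0.0] * n
    for _ in range(max_iter):
        if not active:
            break
        sub = [[row[j] for j in active] for row in X]
        w_active = solve_ridge(sub, y, alpha)
        kept = [j for j, wj in zip(active, w_active) if abs(wj) >= threshold]
        w = [0.0] * n
        for j, wj in zip(active, w_active):
            if abs(wj) >= threshold:
                w[j] = wj                    # small coefficients stay zero
        if kept == active:                   # support stopped changing
            break
        active = kept
    return w

# Hypothetical data: y depends strongly on columns 0 and 2, weakly on column 1.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0],
     [1.0, 1.0, 1.0], [1.0, -1.0, 2.0]]
y = [2.0 * r[0] + 0.01 * r[1] - 1.5 * r[2] for r in X]
w = stlsq(X, y)
```

In this example the weak column is masked out in the first pass, and the remaining coefficients are slightly shrunk toward zero by the ridge penalty.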

Parameters
• threshold (float, optional (default 0.1)) – Minimum magnitude for a coefficient in the weight vector. Coefficients with magnitude below the threshold are set to zero.

• alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.

• max_iter (int, optional (default 20)) – Maximum iterations of the optimization algorithm.

• ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• initial_guess (np.ndarray, shape (n_features) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, least-squares is used to obtain an initial guess.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).

• ind_ (array, shape (n_features,) or (n_targets, n_features)) – Array of 0s and 1s indicating which coefficients of the weight vector have not been masked out, i.e. the support of self.coef_.

• history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of sequentially thresholded least-squares.

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import STLSQ
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = STLSQ(threshold=.1, alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -9.999 x0 + 9.999 x1
x1' = 27.984 x0 + -0.996 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1

property complexity
class pysindy.ConstrainedSR3(threshold=0.1, nu=1.0, tol=1e-05, thresholder='l0', max_iter=30, trimming_fraction=0.0, trimming_step_size=1.0, constraint_lhs=None, constraint_rhs=None, constraint_order='target', normalize_columns=False, fit_intercept=False, copy_X=True, initial_guess=None, thresholds=None, inequality_constraints=False, verbose=False, verbose_cvxpy=False)[source]

Sparse relaxed regularized regression with linear equality constraints.

Attempts to minimize the objective function

$0.5\|y-Xw\|^2_2 + \lambda R(u) + (0.5 / \nu)\|w-u\|^2_2 \quad \text{subject to} \quad Cw = d$

over u and w, where $$R(u)$$ is a regularization function, C is a constraint matrix, and d is a vector of values. See the following reference for more details:

Champion, Kathleen, et al. “A unified sparse optimization framework to learn parsimonious physics-informed models from data.” IEEE Access 8 (2020): 169259-169271.

Parameters
• threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the l0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).

• nu (float, optional (default 1)) – Determines the level of relaxation. Decreasing nu encourages w and u to be close, whereas increasing nu allows the regularized coefficients u to be farther from w.

• tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.

• thresholder (string, optional (default 'l0')) – Regularization function to use. Currently implemented options are ‘l0’ (l0 norm), ‘l1’ (l1 norm), ‘l2’ (l2 norm), ‘cad’ (clipped absolute deviation), ‘weighted_l0’ (weighted l0 norm), ‘weighted_l1’ (weighted l1 norm), and ‘weighted_l2’ (weighted l2 norm).

• max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• constraint_lhs (numpy ndarray, optional (default None)) – Shape should be (n_constraints, n_features * n_targets). The left-hand side matrix C of Cw <= d. There should be one row per constraint.

• constraint_rhs (numpy ndarray, shape (n_constraints,), optional (default None)) – The right hand side vector d of Cw <= d.

• constraint_order (string, optional (default "target")) – The format in which the constraints constraint_lhs were passed. Must be one of “target” or “feature”. “target” indicates that the constraints are grouped by target: i.e. the first n_features columns correspond to constraint coefficients on the library features for the first target (variable), the next n_features columns to the library features for the second target (variable), and so on. “feature” indicates that the constraints are grouped by library feature: the first n_targets columns correspond to the first library feature, the next n_targets columns to the second library feature, and so on.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed. Note that this parameter is incompatible with the constraints!

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• initial_guess (np.ndarray, optional (default None)) – Shape should be (n_features) or (n_targets, n_features). Initial guess for coefficients coef_ (u in the objective function). If None, least-squares is used to obtain an initial guess.

• thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix $$\Xi$$ such that $$\dot{X} \approx \Theta(X)\Xi$$. thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of $$\Xi$$. That is to say it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.

• inequality_constraints (bool, optional (default False)) – If True, CVXPY methods are used to solve the problem.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.

• verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to CVXPY solve function to indicate if output should be verbose or not. Only relevant for optimizers that use the CVXPY package in some capacity.
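The two column layouts described for constraint_order can be written as an index formula. The sketch below builds one row of a constraint matrix under each layout; the helper names and the particular constraint are hypothetical, chosen only to show where the nonzero entries land.

```python
def column_index(target, feature, n_targets, n_features, constraint_order="target"):
    """Column of constraint_lhs holding the coefficient of `feature` in `target`."""
    if constraint_order == "target":
        # Columns grouped by target: all features of target 0, then target 1, ...
        return target * n_features + feature
    if constraint_order == "feature":
        # Columns grouped by feature: all targets of feature 0, then feature 1, ...
        return feature * n_targets + target
    raise ValueError("constraint_order must be 'target' or 'feature'")

n_targets, n_features = 3, 4

def equality_row(order):
    """Row encoding: coef(target 0, feature 2) - coef(target 1, feature 1) = 0."""
    row = [0.0] * (n_targets * n_features)
    row[column_index(0, 2, n_targets, n_features, order)] = 1.0
    row[column_index(1, 1, n_targets, n_features, order)] = -1.0
    return row

row_by_target = equality_row("target")
row_by_feature = equality_row("feature")
```

Either row would be paired with a right-hand side entry of 0 in constraint_rhs; only the column positions differ between the two layouts.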

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the u in the objective function.

• coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.

• unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.

class pysindy.TrappingSR3(evolve_w=True, threshold=0.1, eps_solver=1e-07, relax_optim=True, inequality_constraints=False, eta=None, alpha_A=None, alpha_m=None, gamma=-0.1, tol=1e-05, tol_m=1e-05, thresholder='l1', thresholds=None, max_iter=30, accel=False, normalize_columns=False, fit_intercept=False, copy_X=True, m0=None, A0=None, objective_history=None, constraint_lhs=None, constraint_rhs=None, constraint_order='target', verbose=False, verbose_cvxpy=False)[source]

Trapping variant of sparse relaxed regularized regression. This optimizer can be used to identify systems with stable (bounded) solutions.

Attempts to minimize one of two related objective functions

$0.5\|y-Xw\|^2_2 + \lambda R(w) + 0.5\|Pw-A\|^2_2/\eta + \delta_0(Cw-d) + \delta_{\Lambda}(A)$

or

$0.5\|y-Xw\|^2_2 + \lambda R(w) + \delta_0(Cw-d) + 0.5 \, \text{maximum eigenvalue}(A)/\eta$

where $$R(w)$$ is a regularization function, which must be convex, $$\delta_0$$ is an indicator function that provides a hard constraint of $$Cw = d$$, and $$\delta_{\Lambda}$$ is a term to project the $$A$$ matrix onto the space of negative definite matrices. See the following references for more details:

Kaptanoglu, Alan A., et al. “Promoting global stability in data-driven models of quadratic nonlinear dynamics.” arXiv preprint arXiv:2105.01843 (2021).

Zheng, Peng, et al. “A unified framework for sparse relaxed regularized regression: SR3.” IEEE Access 7 (2018): 1404-1423.

Champion, Kathleen, et al. “A unified sparse optimization framework to learn parsimonious physics-informed models from data.” IEEE Access 8 (2020): 169259-169271.

Parameters
• evolve_w (bool, optional (default True)) – If false, don’t update w and just minimize over (m, A)

• threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the L0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).

• eta (float, optional (default 1.0e20)) – Determines the strength of the stability term ||Pw-A||^2 in the optimization. The default value is very large so that the algorithm default is to ignore the stability term. In this limit, this should be approximately equivalent to the ConstrainedSR3 method.

• alpha_m (float, optional (default eta * 0.1)) – Determines the step size in the prox-gradient descent over m. For convergence, need alpha_m <= eta / ||w^T * PQ^T * PQ * w||. Typically 0.01 * eta <= alpha_m <= 0.1 * eta.

• alpha_A (float, optional (default eta)) – Determines the step size in the prox-gradient descent over A. For convergence, need alpha_A <= eta, so typically alpha_A = eta is used.

• gamma (float, optional (default -0.1)) – Determines the negative interval that matrix A is projected onto. For most applications, a gamma with magnitude between 0.1 and 1.0 works well.

• tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm over w.

• tol_m (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm over m.

• thresholder (string, optional (default 'L1')) – Regularization function to use. For current trapping SINDy, only the L1 and L2 norms are implemented. Note that other convex norms could be straightforwardly implemented, but L0 requires reformulation because of nonconvexity.

• thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix $$\Xi$$ such that $$\dot{X} \approx \Theta(X)\Xi$$. thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of $$\Xi$$. That is to say it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.

• eps_solver (float, optional (default 1.0e-7)) – If threshold != 0, this specifies the error tolerance in the CVXPY (OSQP) solve. Default is 1.0e-3 in OSQP.

• relax_optim (bool, optional (default True)) – If relax_optim = True, use the relax-and-split method. If False, try a direct minimization on the largest eigenvalue.

• inequality_constraints (bool, optional (default False)) – If True, relax_optim must be false or relax_optim = True AND threshold != 0, so that the CVXPY methods are used.

• max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.

• accel (bool, optional (default False)) – Whether or not to use accelerated prox-gradient descent for (m, A).

• m0 (np.ndarray, shape (n_targets), optional (default None)) – Initial guess for vector m in the optimization. Otherwise each component of m is randomly initialized in [-1, 1].

• A0 (np.ndarray, shape (n_targets, n_targets), optional (default None)) – Initial guess for matrix A in the optimization. Otherwise A is initialized as A = diag(gamma).

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.

• verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to CVXPY solve function to indicate if output should be verbose or not. Only relevant for optimizers that use the CVXPY package in some capacity.
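The index convention for thresholds described above can be checked with plain NumPy. The following is a minimal sketch, assuming a hypothetical system with 2 measurement variables and 3 library functions. Hard thresholding is used here only to make the indexing visible; it is not the regularization the optimizer itself applies, and all sizes and values are illustrative.

```python
import numpy as np

n_targets, n_features = 2, 3  # hypothetical sizes

# One threshold per (measurement variable, library function) pair.
thresholds = np.full((n_targets, n_features), 0.1)
# Stricter threshold for the 3rd library function in the 1st equation.
thresholds[0, 2] = 0.5

# Xi has shape (n_features, n_targets), so thresholds lines up with Xi.T:
# thresholds[i, j] applies to Xi[j, i].
Xi = np.array([[1.0, 0.05],
               [0.3, 2.0],
               [0.4, 0.2]])
keep = np.abs(Xi.T) >= thresholds          # True where a coefficient survives
Xi_thresholded = np.where(keep.T, Xi, 0.0)
print(Xi_thresholded)
```

Here Xi[2, 0] = 0.4 is removed by thresholds[0, 2] = 0.5, while Xi[2, 1] = 0.2 survives under the default 0.1.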

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the v in the objective function.

• history_ (list) – History of sparse coefficients. history_[k] contains the sparse coefficients (v in the optimization objective function) at iteration k.

• objective_history_ (list) – History of the value of the objective at each step. Note that the trapping SINDy problem is nonconvex, meaning that this value may increase and decrease as the algorithm works.

• A_history_ (list) – History of the auxiliary variable A that approximates diag(PW).

• m_history_ (list) – History of the shift vector m that determines the origin of the trapping region.

• PW_history_ (list) – History of PW = A^S, the quantity we are attempting to make negative definite.

• PWeigs_history_ (list) – History of diag(PW), a list of the eigenvalues of A^S at each iteration. Tracking this allows us to ascertain if A^S is indeed being pulled towards the space of negative definite matrices.

• PL_unsym_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_features)) – Unsymmetrized linear coefficient part of the P matrix in ||Pw - A||^2

• PL_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_features)) – Linear coefficient part of the P matrix in ||Pw - A||^2

• PQ_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_targets, n_features)) – Quadratic coefficient part of the P matrix in ||Pw - A||^2
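The eigenvalue histories above are useful for checking whether a matrix is being driven toward negative definiteness. A minimal sketch of that check follows, assuming the superscript S denotes the symmetric part of a square matrix. The helper name and the example matrix are hypothetical and not part of the PySINDy API.

```python
import numpy as np

def symmetric_part_eigs(A):
    """Eigenvalues (ascending) of the symmetric part of a square matrix."""
    A_s = 0.5 * (A + A.T)
    return np.linalg.eigvalsh(A_s)

# Hypothetical 2x2 matrix: the antisymmetric part (+4 / -4) does not
# affect definiteness, only the symmetric part diag(-1, -2) does.
A = np.array([[-1.0, 4.0],
              [-4.0, -2.0]])
eigs = symmetric_part_eigs(A)
is_negative_definite = bool(eigs.max() < 0)
print(eigs, is_negative_definite)
```

Applied to each stored iterate, a largest eigenvalue that stays below zero would suggest the negative-definiteness condition holds at that iteration.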

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import TrappingSR3
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = TrappingSR3(threshold=0.1)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -10.004 1 + 10.004 x0
x1' = 27.994 1 + -0.993 x0 + -1.000 1 x1
x2' = -2.662 x1 + 1.000 1 x0

class pysindy.SSR(alpha=0.05, max_iter=20, ridge_kw=None, normalize_columns=False, fit_intercept=False, copy_X=True, criteria='coefficient_value', kappa=None, verbose=False)[source]

Stepwise sparse regression (SSR) greedy algorithm.

Attempts to minimize the objective function $$\|y - Xw\|^2_2 + \alpha \|w\|^2_2$$ by iteratively eliminating the smallest coefficient.

See the following reference for more details:

Boninsegna, Lorenzo, Feliks Nüske, and Cecilia Clementi. “Sparse learning of stochastic dynamical equations.” The Journal of chemical physics 148.24 (2018): 241723.
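The greedy elimination described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation under simplifying assumptions (single target, the "smallest coefficient" rule only, a closed-form ridge solve), not the library's own code. The function name ssr_sketch and the n_keep argument are hypothetical.

```python
import numpy as np

def ssr_sketch(X, y, alpha=0.05, n_keep=1):
    """Greedy stepwise sparse regression, dropping the smallest coefficient.

    Returns a list of coefficient vectors, one per elimination step.
    """
    n_features = X.shape[1]
    active = np.ones(n_features, dtype=bool)
    history = []
    while True:
        Xa = X[:, active]
        # Ridge solve on the active columns: (Xa^T Xa + alpha I) w = Xa^T y
        w_active = np.linalg.solve(
            Xa.T @ Xa + alpha * np.eye(Xa.shape[1]), Xa.T @ y
        )
        w = np.zeros(n_features)
        w[active] = w_active
        history.append(w)
        if active.sum() <= n_keep:
            break
        # Remove the active term whose coefficient has the smallest magnitude.
        idx_active = np.flatnonzero(active)
        active[idx_active[np.argmin(np.abs(w_active))]] = False
    return history

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3]   # only columns 1 and 3 matter
history = ssr_sketch(X, y, alpha=0.05)
```

Each entry of history has one fewer nonzero term than the previous one, which is why a separate error measure is needed to decide which model in the sequence to keep.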

Parameters
• max_iter (int, optional (default 20)) – Maximum iterations of the optimization algorithm.

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• kappa (float, optional (default None)) – If passed, compute the MSE errors with an extra L0 term with strength equal to kappa times the condition number of Theta.

• criteria (string, optional (default "coefficient_value")) – The criterion used to truncate a coefficient at each iteration. Must be “coefficient_value” or “model_residual”. “coefficient_value”: zero out the smallest coefficient. “model_residual”: choose the N-1 term model with the smallest residual error.

• alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.

• ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.
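The kappa parameter above adds a sparsity penalty to the error recorded for each candidate model. One plausible reading of that description is sketched below: the MSE plus kappa times the condition number of Theta times the number of nonzero coefficients. The exact formula used by the library is an assumption here, and penalized_error is a hypothetical helper.

```python
import numpy as np

def penalized_error(Theta, y, w, kappa=None):
    """MSE of the fit, optionally with an L0 penalty scaled by cond(Theta)."""
    mse = np.mean((y - Theta @ w) ** 2)
    if kappa is None:
        return mse
    # Assumed form: penalty grows with the number of nonzero coefficients.
    return mse + kappa * np.linalg.cond(Theta) * np.count_nonzero(w)

Theta = np.eye(3)                      # condition number is exactly 1
y = np.array([1.0, 0.0, 2.0])
w = np.array([1.0, 0.0, 2.0])          # perfect fit with two nonzero terms

err_plain = penalized_error(Theta, y, w)
err_l0 = penalized_error(Theta, y, w, kappa=0.1)
```

With a penalty of this kind, a model with more terms needs a correspondingly lower MSE to be preferred over a sparser one.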

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).

• history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of SSR.

• err_history_ (list) – History of the errors. err_history_[k] contains the MSE of each coef_ at iteration k of SSR.

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import SSR
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = SSR(alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -9.999 1 + 9.999 x0
x1' = 27.984 1 + -0.996 x0 + -1.000 1 x1
x2' = -2.666 x1 + 1.000 1 x0

class pysindy.FROLS(normalize_columns=False, fit_intercept=False, copy_X=True, kappa=None, max_iter=10, alpha=0.05, ridge_kw=None, verbose=False)[source]

Forward Regression Orthogonal Least-Squares (FROLS) optimizer.

Attempts to minimize the objective function $$\|y - Xw\|^2_2 + \alpha \|w\|^2_2$$ by iteratively selecting the most correlated function in the library. This is a greedy algorithm.

See the following reference for more details:

Billings, Stephen A. Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons, 2013.
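The idea of greedily selecting the most correlated library function can be illustrated with a simplified orthogonal forward selection. The sketch below is not the error-reduction-ratio formulation from the reference and not the library's implementation. It assumes a single target, and forward_selection_sketch and n_terms are hypothetical names.

```python
import numpy as np

def forward_selection_sketch(X, y, n_terms=2, alpha=0.0):
    """Pick n_terms columns greedily, then ridge-fit on the chosen columns."""
    n_features = X.shape[1]
    selected, Q = [], []          # chosen indices and their orthogonalized columns
    residual = y.copy()
    for _ in range(n_terms):
        best_j, best_score, best_q = None, -np.inf, None
        for j in range(n_features):
            if j in selected:
                continue
            q = X[:, j].copy()
            for qk in Q:          # Gram-Schmidt against already chosen terms
                q -= (qk @ q) / (qk @ qk) * qk
            norm_q = np.linalg.norm(q)
            if norm_q < 1e-12:    # column is redundant with chosen terms
                continue
            score = abs(q @ residual) / norm_q
            if score > best_score:
                best_j, best_score, best_q = j, score, q
        if best_j is None:
            break
        selected.append(best_j)
        Q.append(best_q)
        # Remove the part of the residual explained by the new term.
        residual = residual - (best_q @ residual) / (best_q @ best_q) * best_q
    Xs = X[:, selected]
    w = np.zeros(n_features)
    w[selected] = np.linalg.solve(
        Xs.T @ Xs + alpha * np.eye(len(selected)), Xs.T @ y
    )
    return selected, w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 1] - 2.0 * X[:, 3]
selected, w = forward_selection_sketch(X, y, n_terms=2)
```

In this sketch n_terms plays the role that max_iter plays in the class above: it fixes how many nonzero terms the final model contains.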

Parameters
• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• kappa (float, optional (default None)) – If passed, compute the MSE errors with an extra L0 term with strength equal to kappa times the condition number of Theta.

• max_iter (int, optional (default 10)) – Maximum iterations of the optimization algorithm. This determines the number of nonzero terms chosen by the FROLS algorithm.

• alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.

• ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.

• verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).

• history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of FROLS.

Examples

>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import FROLS
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = FROLS(alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -9.999 1 + 9.999 x0
x1' = 27.984 1 + -0.996 x0 + -1.000 1 x1
x2' = -2.666 x1 + 1.000 1 x0

class pysindy.SINDyPI(threshold=0.1, tol=1e-05, thresholder='l1', max_iter=10000, fit_intercept=False, copy_X=True, thresholds=None, model_subset=None, normalize_columns=False, verbose_cvxpy=False)[source]

SINDy-PI optimizer

Attempts to minimize the objective function

$0.5\|X-Xw\|^2_2 + \lambda R(w)$

over w where $$R(w)$$ is a regularization function. See the following reference for more details:

Kaheman, Kadierdan, J. Nathan Kutz, and Steven L. Brunton. SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Proceedings of the Royal Society A 476.2242 (2020): 20200279.
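To make the objective above concrete, the sketch below minimizes it with the l1 norm as R, using proximal gradient descent (ISTA) in plain NumPy. This is a swapped-in solver chosen for illustration. It assumes the diagonal of w is held at zero so that each library column is expressed through the others rather than through itself. The names sindy_pi_sketch and soft_threshold are hypothetical.

```python
import numpy as np

def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def sindy_pi_sketch(Theta, lam=0.1, n_iter=500):
    """Minimize 0.5*||Theta - Theta @ W||_F^2 + lam*||W||_1 with diag(W) = 0."""
    n_features = Theta.shape[1]
    W = np.zeros((n_features, n_features))
    G = Theta.T @ Theta
    step = 1.0 / np.linalg.norm(G, 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = G @ W - G                # gradient of the least-squares term
        W = soft_threshold(W - step * grad, step * lam)
        np.fill_diagonal(W, 0.0)        # rule out the trivial solution W = I
    return W

rng = np.random.default_rng(0)
a, b = rng.normal(size=100), rng.normal(size=100)
Theta = np.column_stack([a, b, a + b])  # third column is the sum of the first two
W = sindy_pi_sketch(Theta, lam=0.1)
print(np.round(W[:, 2], 2))             # coefficients expressing column 2
```

Column j of W holds one candidate model, with library function j on the left-hand side, which is the sense in which one model is computed per candidate function.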

Parameters
• threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the l0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).

• tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.

• thresholder (string, optional (default 'l1')) – Regularization function to use. Currently implemented options are ‘l1’ (l1 norm), ‘weighted_l1’ (weighted l1 norm), ‘l2’ (l2 norm), and ‘weighted_l2’ (weighted l2 norm).

• max_iter (int, optional (default 10000)) – Maximum iterations of the optimization algorithm.

• fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.

• normalize_columns (boolean, optional (default False)) – This parameter normalizes the columns of Theta before the optimization is done. This tends to standardize the columns to similar magnitudes, often improving performance.

• copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.

• thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix $$\Xi$$ such that $$\dot{X} \approx \Theta(X)\Xi$$. thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of $$\Xi$$. That is to say it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.

• model_subset (np.ndarray, shape (n_models), optional (default None)) – List of indices to compute models for. If list is not provided, the default is to compute SINDy-PI models for all possible candidate functions. This can take a long time for 4D systems or larger.

• verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to CVXPY solve function to indicate if output should be verbose or not. Only relevant for optimizers that use the CVXPY package in some capacity.
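The effect of normalize_columns can be illustrated independently of any optimizer. The sketch below assumes normalization means dividing each column of Theta by its L2 norm, as stated for the other optimizers in this module. It fits by ordinary least squares and then rescales the coefficients back to the original units. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
# Three library columns with very different magnitudes.
Theta = rng.normal(size=(50, 3)) * np.array([1.0, 100.0, 0.01])
w_true = np.array([2.0, 0.0, -3.0])
y = Theta @ w_true

norms = np.linalg.norm(Theta, axis=0)   # L2 norm of each column
Theta_n = Theta / norms                 # every column now has unit norm

w_n, *_ = np.linalg.lstsq(Theta_n, y, rcond=None)
w = w_n / norms                         # undo the scaling on the coefficients
```

After normalization, a single threshold value acts on coefficients of comparable scale, which is one reason the option often improves performance.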

Attributes
• coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the w in the objective function.

• unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.