pysindy package
Subpackages
- pysindy.deeptime package
- pysindy.differentiation package
- pysindy.feature_library package
- Submodules
- pysindy.feature_library.base module
BaseFeatureLibrary
BaseFeatureLibrary.validate_input
BaseFeatureLibrary.reshape_samples_to_spatial_grid
BaseFeatureLibrary.correct_shape
BaseFeatureLibrary.calc_trajectory
BaseFeatureLibrary.get_spatial_grid
BaseFeatureLibrary.fit
BaseFeatureLibrary.transform
BaseFeatureLibrary.get_feature_names
BaseFeatureLibrary.size
x_sequence_or_item
ConcatLibrary
TensoredLibrary
- pysindy.feature_library.custom_library module
- pysindy.feature_library.fourier_library module
- pysindy.feature_library.generalized_library module
- pysindy.feature_library.identity_library module
- pysindy.feature_library.parameterized_library module
- pysindy.feature_library.pde_library module
- pysindy.feature_library.polynomial_library module
- pysindy.feature_library.sindy_pi_library module
- pysindy.feature_library.weak_pde_library module
- Module contents
- pysindy.optimizers package
- Submodules
- pysindy.optimizers.base module
- pysindy.optimizers.constrained_sr3 module
- pysindy.optimizers.frols module
- pysindy.optimizers.miosr module
- pysindy.optimizers.sindy_optimizer module
- pysindy.optimizers.sindy_pi module
- pysindy.optimizers.sr3 module
- pysindy.optimizers.ssr module
- pysindy.optimizers.stable_linear_sr3 module
- pysindy.optimizers.stlsq module
- pysindy.optimizers.trapping_sr3 module
- Module contents
- pysindy.utils package
- Submodules
- pysindy.utils.axes module
- pysindy.utils.base module
flatten_2d_tall
validate_input
validate_no_reshape
validate_control_variables
drop_nan_samples
reorder_constraints
prox_l0
prox_weighted_l0
prox_l1
prox_weighted_l1
prox_l2
prox_weighted_l2
prox_cad
get_prox
get_regularization
capped_simplex_projection
print_model
equations
supports_multiple_targets
- pysindy.utils.odes module
linear_damped_SHO
cubic_damped_SHO
linear_3D
van_der_pol
duffing
lotka
cubic_oscillator
rossler
hopf
logistic_map
logistic_map_control
logistic_map_multicontrol
lorenz
lorenz_u
lorenz_control
meanfield
oscillator
mhd
burgers_galerkin
enzyme
bacterial
yeast
pendulum_on_cart
f_steer
f_acc
kinematic_commonroad
double_pendulum
- Module contents
AxesArray
SampleConcatter
concat_sample_axis
wrap_axes
comprehend_axes
capped_simplex_projection
drop_nan_samples
equations
get_prox
get_regularization
print_model
prox_cad
prox_l0
prox_weighted_l0
prox_l1
prox_weighted_l1
prox_l2
prox_weighted_l2
reorder_constraints
supports_multiple_targets
validate_control_variables
validate_input
validate_no_reshape
flatten_2d_tall
linear_damped_SHO
cubic_damped_SHO
linear_3D
lotka
van_der_pol
duffing
rossler
cubic_oscillator
hopf
lorenz
lorenz_control
lorenz_u
meanfield
oscillator
burgers_galerkin
mhd
enzyme
yeast
bacterial
pendulum_on_cart
kinematic_commonroad
double_pendulum
Submodules
pysindy.pysindy module
- class pysindy.pysindy.SINDy(optimizer=None, feature_library=None, differentiation_method=None, feature_names=None, t_default=1, discrete_time=False)
Bases:
BaseEstimator
Sparse Identification of Nonlinear Dynamical Systems (SINDy). Uses sparse regression to learn a dynamical systems model from measurement data.
- Parameters
optimizer (optimizer object, optional) – Optimization method used to fit the SINDy model. This must be an instance of a class extending pysindy.optimizers.BaseOptimizer. The default is STLSQ.
feature_library (feature library object, optional) – Feature library object used to specify candidate right-hand side features. This must be an instance of a class extending pysindy.feature_library.base.BaseFeatureLibrary. The default option is PolynomialLibrary.
differentiation_method (differentiation object, optional) – Method for differentiating the data. This must be an instance of a class extending pysindy.differentiation.base.BaseDifferentiation. The default option is centered difference (FiniteDifference).
feature_names (list of string, length n_input_features, optional) – Names for the input features (e.g. ['x', 'y', 'z']). If None, ['x0', 'x1', ...] will be used.
t_default (float, optional (default 1)) – Default value for the time step.
discrete_time (boolean, optional (default False)) – If True, the dynamical system is treated as a map: rather than predicting derivatives, the right-hand side functions step the system forward by one time step. If False, the dynamical system is assumed to be a flow (the right-hand side functions predict continuous-time derivatives).
- Attributes
model (sklearn.multioutput.MultiOutputRegressor object) – The fitted SINDy model.
n_input_features_ (int) – The total number of input features.
n_output_features_ (int) – The total number of output features. This number is a function of self.n_input_features_ and the feature library being used.
n_control_features_ (int) – The total number of control input features.
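STLSQ, the default optimizer, is built on sequentially thresholded least squares: fit all library terms by least squares, zero out coefficients whose magnitude falls below a threshold, and refit on the surviving terms. The NumPy sketch below illustrates only that core loop under simplifying assumptions (no ridge penalty, a fixed number of iterations, and a hypothetical helper name `stlsq_sketch`); it is not pysindy's STLSQ class. Its return value uses the same (n_input_features, n_output_features) layout as coefficients().

```python
import numpy as np


def stlsq_sketch(theta, x_dot, threshold=0.1, max_iter=10):
    """Simplified sequentially thresholded least squares (hypothetical helper).

    theta: (n_samples, n_terms) candidate library evaluated on the data.
    x_dot: (n_samples, n_targets) time derivatives.
    Returns coefficients of shape (n_targets, n_terms), i.e. Xi^T.
    """
    # Dense least-squares fit of theta @ xi ~= x_dot.
    xi = np.linalg.lstsq(theta, x_dot, rcond=None)[0]
    for _ in range(max_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0  # prune coefficients below the threshold
        # Refit each target using only its surviving library terms.
        for k in range(x_dot.shape[1]):
            keep = ~small[:, k]
            if keep.any():
                xi[keep, k] = np.linalg.lstsq(theta[:, keep], x_dot[:, k], rcond=None)[0]
    return xi.T


# Usage: data generated by x' = -2 x, with the candidate library [1, x, x^2].
x = np.linspace(-1.0, 1.0, 200)[:, None]
theta = np.hstack([np.ones_like(x), x, x ** 2])
coef = stlsq_sketch(theta, -2.0 * x)  # approximately [[0, -2, 0]]
```

The threshold is what produces sparsity: the terms 1 and x^2 are removed exactly rather than left with tiny nonzero values.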
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> lorenz = lambda z, t: [10 * (z[1] - z[0]),
...                        z[0] * (28 - z[2]) - z[1],
...                        z[0] * z[1] - 8 / 3 * z[2]]
>>> t = np.arange(0, 2, .002)
>>> x = odeint(lorenz, [-8, 8, 27], t)
>>> model = SINDy()
>>> model.fit(x, t=t[1] - t[0])
>>> model.print()
x0' = -10.000 x0 + 10.000 x1
x1' = 27.993 x0 + -0.999 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
>>> model.coefficients()
array([[ 0. , -9.99969193, 9.99961547, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 27.99344519, -0.99905338, 0. , 0. , 0. , -0.99980268, 0. , 0. , 0. ],
       [ 0. , 0. , 0. , -2.66645651, 0. , 0.99990257, 0. , 0. , 0. , 0. ]])
>>> model.score(x, t=t[1] - t[0])
0.999999985520653
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> u = lambda t: np.sin(2 * t)
>>> lorenz_c = lambda z, t: [
...     10 * (z[1] - z[0]) + u(t) ** 2,
...     z[0] * (28 - z[2]) - z[1],
...     z[0] * z[1] - 8 / 3 * z[2],
... ]
>>> t = np.arange(0, 2, 0.002)
>>> x = odeint(lorenz_c, [-8, 8, 27], t)
>>> u_eval = u(t)
>>> model = SINDy()
>>> model.fit(x, u=u_eval, t=t[1] - t[0])
>>> model.print()
x0' = -10.000 x0 + 10.000 x1 + 1.001 u0^2
x1' = 27.994 x0 + -0.999 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
>>> model.coefficients()
array([[ 0. , -9.99969851, 9.99958359, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1.00120331],
       [ 0. , 27.9935177 , -0.99906375, 0. , 0. , 0. , 0. , -0.99980455, 0. , 0. , 0. , 0. , 0. , 0. , 0. ],
       [ 0. , 0. , 0. , -2.666437 , 0. , 0. , 0.99990137, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]])
>>> model.score(x, u=u_eval, t=t[1] - t[0])
0.9999999855414495
- fit(x, t=None, x_dot=None, u=None, multiple_trajectories=False, unbias=True, quiet=False, ensemble=False, library_ensemble=False, replace=True, n_candidates_to_drop=1, n_subset=None, n_models=None, ensemble_aggregator=None)
Fit a SINDy model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Training data. If training data contains multiple trajectories, x should be a list containing data for each trajectory. Individual trajectories may contain different numbers of samples.
t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – If t is a float, it specifies the time step between samples. If array-like, it specifies the time at which each sample was collected; in this case the values in t must be strictly increasing. For multi-trajectory training data, t may also be a list of arrays containing the collection times for each individual trajectory. If None, the default time step t_default will be used.
x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the training data. If provided, x_dot must match the shape of the training data and its values will be used as the time derivatives. If not provided, the time derivatives will be computed using the specified differentiation method.
u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables/inputs. Include this variable to use sparse identification of nonlinear dynamical systems with control (SINDYc). If the training data contains multiple trajectories (i.e. x is a list of array-like), then u should be a list containing the control variable data for each trajectory. Individual trajectories may contain different numbers of samples.
multiple_trajectories (boolean, optional (default False)) – Whether the training data includes multiple trajectories. If True, the training data must be a list of arrays containing data for each trajectory. If False, the training data must be a single array.
unbias (boolean, optional (default True)) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. If the optimizer (self.optimizer) applies any type of regularization, that regularization may bias coefficients toward particular values, improving the conditioning of the problem but harming the quality of the fit. Setting unbias=True enables an extra step in which unregularized linear regression is applied, but only to the coefficients in the support identified by the optimizer. This helps remove the bias introduced by regularization.
quiet (boolean, optional (default False)) – Whether to suppress warnings during model fitting.
ensemble (boolean, optional (default False)) – If True, enables “ensembling”: n_models SINDy models are generated, each by sparse regression on a random temporal subset of n_subset samples of the input data. This often improves robustness, because the average (bagging) or median (bragging) of the resulting models usually performs well. The ensemble can also be treated as a distribution of models, for example to calculate how often each library term is included in a model.
library_ensemble (boolean, optional (default False)) – If True, enables “library ensembling”: n_models SINDy models are generated, each by sparse regression on a “reduced” library from which a random subset of candidate terms has been dropped. As with ensembling, this often improves robustness, because the average (bagging) or median (bragging) of the resulting models usually performs well, and the resulting distribution of models can be used to calculate how often each library term is included.
replace (boolean, optional (default True)) – If ensemble is True, whether to sample time points with replacement.
n_candidates_to_drop (int, optional (default 1)) – Number of candidate terms in the feature library to drop during library ensembling.
n_subset (int, optional (default len(time base))) – Number of time points to use for each model in the ensemble.
n_models (int, optional (default 20)) – Number of models to generate via ensembling.
ensemble_aggregator (callable, optional (default numpy.median)) – Method to aggregate model coefficients across different samples. This argument is only used if ensemble or library_ensemble is True. The method should take in a list of 2D arrays and return a 2D array of the same shape as the arrays in the list. Example: lambda x: np.median(x, axis=0)
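The ensembling options above can be pictured with the following NumPy sketch. It assumes plain least squares as the per-model regression (the sparse regression step is omitted to keep it short), and `ensemble_fit` is a hypothetical helper, not part of pysindy. It draws n_models random subsets of n_subset time points, with or without replacement, and aggregates the per-model coefficient arrays with the median by default, mirroring the ensemble_aggregator default.

```python
import numpy as np


def ensemble_fit(theta, x_dot, n_models=20, n_subset=None, replace=True,
                 aggregator=lambda coefs: np.median(coefs, axis=0), seed=0):
    """Hypothetical ensembling helper: one regression per random time subset."""
    rng = np.random.default_rng(seed)
    n_samples = theta.shape[0]
    if n_subset is None:
        n_subset = n_samples  # use as many points as the full time base
    coef_list = []
    for _ in range(n_models):
        idx = rng.choice(n_samples, size=n_subset, replace=replace)
        xi = np.linalg.lstsq(theta[idx], x_dot[idx], rcond=None)[0]
        coef_list.append(xi.T)  # each entry has shape (n_targets, n_terms)
    # The aggregator takes a list of 2D arrays and returns one 2D array.
    return aggregator(coef_list), np.array(coef_list)


# Usage: noisy samples of x' = -2 x with the library [1, x, x^2].
rng = np.random.default_rng(1)
x = rng.uniform(-1.0, 1.0, size=(300, 1))
theta = np.hstack([np.ones_like(x), x, x ** 2])
x_dot = -2.0 * x + 0.01 * rng.normal(size=x.shape)
median_coef, all_coefs = ensemble_fit(theta, x_dot, n_models=20, n_subset=100)
```

Passing aggregator=lambda c: np.mean(c, axis=0) switches from bragging to bagging. The stacked all_coefs array (n_models, n_targets, n_terms) is the "distribution" of models from which inclusion frequencies could be computed once a sparse regression is used.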
- Returns
self
- Return type
a fitted SINDy instance
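The unbias step described for fit can be illustrated as follows: keep only the support (the nonzero pattern) chosen by a regularized fit, then refit those coefficients with ordinary least squares. In this sketch a ridge-penalized solve followed by hard thresholding stands in for a regularized optimizer; `ridge_then_threshold` and `unbias` are hypothetical helpers, not pysindy functions.

```python
import numpy as np


def ridge_then_threshold(theta, y, alpha, threshold):
    # Regularized fit: solve (theta^T theta + alpha I) w = theta^T y, which shrinks w toward 0.
    n_terms = theta.shape[1]
    w = np.linalg.solve(theta.T @ theta + alpha * np.eye(n_terms), theta.T @ y)
    w[np.abs(w) < threshold] = 0.0  # sparsify; this fixes the support
    return w


def unbias(theta, y, w):
    # Unregularized least squares restricted to the support of w.
    support = w != 0
    w_unbiased = np.zeros_like(w)
    if support.any():
        w_unbiased[support] = np.linalg.lstsq(theta[:, support], y, rcond=None)[0]
    return w_unbiased


rng = np.random.default_rng(0)
theta = rng.normal(size=(500, 4))
y = theta @ np.array([0.0, 3.0, 0.0, -1.5])  # noise-free targets
w_biased = ridge_then_threshold(theta, y, alpha=200.0, threshold=0.5)
w_fixed = unbias(theta, y, w_biased)
```

Here the penalty shrinks the true coefficients 3.0 and -1.5 noticeably toward zero while still selecting the correct support; the refit on that support recovers them exactly because the data are noise-free.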
- predict(x, u=None, multiple_trajectories=False)
Predict the time derivatives using the SINDy model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples.
u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables. If multiple_trajectories=True, then u must be a list of control variable data from each trajectory. If the model was fit with control variables, then u is required.
multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
- Returns
x_dot – Predicted time derivatives
- Return type
array-like or list of array-like, shape (n_samples, n_input_features)
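For a continuous-time model with a degree-2 polynomial library, prediction amounts to evaluating the library on each sample and multiplying by the coefficients, x_dot ≈ Θ(x) Ξ, where coefficients() returns Ξ^T. The sketch below rebuilds by hand the feature ordering seen in the Lorenz example (1, x0, x1, x2, x0^2, x0 x1, x0 x2, x1^2, x1 x2, x2^2) and uses the idealized Lorenz values that the fitted coefficients approximate. `poly2_library` and `predict_sketch` are hypothetical helpers that only assume that ordering.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly2_library(x):
    """Hypothetical degree-2 polynomial library: columns 1, x_i, x_i * x_j (i <= j)."""
    n = x.shape[1]
    cols = [np.ones(x.shape[0])]
    cols += [x[:, i] for i in range(n)]
    cols += [x[:, i] * x[:, j] for i, j in combinations_with_replacement(range(n), 2)]
    return np.column_stack(cols)


# Idealized Lorenz coefficients, shape (n_input_features, n_output_features) = (3, 10).
coef = np.array([
    [0, -10, 10, 0, 0, 0, 0, 0, 0, 0],    # x0' = -10 x0 + 10 x1
    [0, 28, -1, 0, 0, 0, -1, 0, 0, 0],    # x1' = 28 x0 - x1 - x0 x2
    [0, 0, 0, -8 / 3, 0, 1, 0, 0, 0, 0],  # x2' = -8/3 x2 + x0 x1
], dtype=float)


def predict_sketch(x):
    # Theta(x) has shape (n_samples, 10); coef.T has shape (10, 3).
    return poly2_library(x) @ coef.T


x_dot = predict_sketch(np.array([[-8.0, 8.0, 27.0]]))  # ≈ [[160, -16, -136]]
```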
- equations(precision=3)
Get the right-hand sides of the SINDy model equations.
- Parameters
precision (int, optional (default 3)) – Number of decimal places to include for each coefficient in the equation.
- Returns
equations – List of strings representing the SINDy model equations for each input feature.
- Return type
list of strings
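The equation strings in the examples follow a simple pattern: each nonzero coefficient is rounded to precision decimal places, paired with its feature name, and the terms are joined with " + ". A hypothetical formatter reproducing that pattern (an illustration, not pysindy's own code; the "0.000" fallback for an all-zero row is an assumption) might look like this:

```python
def format_equation(coef_row, feature_names, precision=3):
    # Keep nonzero terms only, formatted as "<coefficient> <feature name>".
    terms = [
        f"{c:.{precision}f} {name}"
        for c, name in zip(coef_row, feature_names)
        if c != 0
    ]
    # Assumed placeholder when every coefficient is zero.
    return " + ".join(terms) if terms else f"{0:.{precision}f}"


names = ["1", "x0", "x1", "x2", "x0^2", "x0 x1", "x0 x2", "x1^2", "x1 x2", "x2^2"]
row = [0.0, 27.99344519, -0.99905338, 0.0, 0.0, 0.0, -0.99980268, 0.0, 0.0, 0.0]
print(format_equation(row, names))
# 27.993 x0 + -0.999 x1 + -1.000 x0 x2
```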
- print(lhs=None, precision=3)
Print the SINDy model equations.
- Parameters
lhs (list of strings, optional (default None)) – List of variables to print on the left-hand sides of the learned equations. By default, self.input_features are used.
precision (int, optional (default 3)) – Number of decimal places used when printing the model coefficients.
- score(x, t=None, x_dot=None, u=None, multiple_trajectories=False, metric=<function r2_score>, **metric_kws)
Returns a score for the time derivative prediction produced by the model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples from which to make predictions.
t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. Used to compute the time derivatives of the samples if x_dot is not provided. If None, the default time step t_default will be used.
x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the samples. If provided, these values are used as the reference derivatives when computing the score. If not provided, the time derivatives of x will be computed using the specified differentiation method.
u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables. If multiple_trajectories=True, then u must be a list of control variable data from each trajectory. If the model was fit with control variables, then u is required.
multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
metric (callable, optional) – Metric function with which to score the prediction. Default is the R^2 coefficient of determination (sklearn.metrics.r2_score). See scikit-learn’s sklearn.metrics module for more options.
metric_kws (dict, optional) – Optional keyword arguments to pass to the metric function.
- Returns
score – Metric function value for the model prediction of x_dot.
- Return type
float
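The default metric compares the predicted derivatives with the reference derivatives column by column. The NumPy sketch below computes R^2 per output column and averages uniformly, which matches scikit-learn's default multioutput behavior for r2_score; `r2_uniform` is a hypothetical helper and assumes every reference column has nonzero variance (scikit-learn treats constant columns specially).

```python
import numpy as np


def r2_uniform(y_true, y_pred):
    """R^2 per output column, averaged uniformly (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)               # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)  # total sum of squares
    return float(np.mean(1.0 - ss_res / ss_tot))


x_dot_ref = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
x_dot_pred = np.array([[1.1, 2.0], [1.9, 4.0], [3.0, 6.0]])
print(r2_uniform(x_dot_ref, x_dot_pred))  # ≈ 0.995 (0.99 for column 0, 1.0 for column 1)
```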
- differentiate(x, t=None, multiple_trajectories=False)
Apply the model’s differentiation method (self.differentiation_method) to data.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Data to be differentiated.
t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. If None, the default time step t_default will be used.
multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
- Returns
x_dot – Time derivatives computed by using the model’s differentiation method
- Return type
array-like or list of array-like, shape (n_samples, n_input_features)
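With multiple trajectories, each array in the list is differentiated on its own, using either a shared time step or its own array of collection times. The sketch below uses numpy.gradient (second-order centered differences in the interior) as a stand-in for the model's differentiation method; `differentiate_trajectories` is a hypothetical helper.

```python
import numpy as np


def differentiate_trajectories(xs, ts):
    """Differentiate each trajectory separately along the time axis (axis 0).

    xs: list of arrays, each of shape (n_samples_i, n_features).
    ts: list of time arrays (one per trajectory), or a single float time step.
    """
    if np.isscalar(ts):
        ts = [ts] * len(xs)  # share one time step across all trajectories
    return [np.gradient(x, t, axis=0) for x, t in zip(xs, ts)]


# Two trajectories with different lengths and sampling rates.
t1 = np.linspace(0.0, 1.0, 101)
t2 = np.linspace(0.0, 2.0, 51)
x1 = np.column_stack([np.sin(t1), t1 ** 2])
x2 = np.column_stack([np.sin(t2), t2 ** 2])
d1, d2 = differentiate_trajectories([x1, x2], [t1, t2])
```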
- coefficients()
Get an array of the coefficients learned by the SINDy model.
- Returns
coef – Learned coefficients of the SINDy model. Equivalent to \(\Xi^\top\) in the literature.
- Return type
np.ndarray, shape (n_input_features, n_output_features)
- get_feature_names()
Get a list of names of features used by the SINDy model.
- Returns
feats – A list of strings giving the names of the features in the feature library, self.feature_library.
- Return type
list
- simulate(x0, t, u=None, integrator='solve_ivp', stop_condition=None, interpolator=None, integrator_kws={'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12}, interpolator_kws={})
Simulate the SINDy model forward in time.
- Parameters
x0 (numpy array, size [n_features]) – Initial condition from which to simulate.
t (int or numpy array of size [n_samples]) – If the model is in continuous time, t must be an array of time points at which to simulate. If the model is in discrete time, t must be an integer indicating how many steps to predict.
u (function from R^1 to R^{n_control_features} or list/array, optional (default None)) – Control inputs. If the model is continuous time, i.e. self.discrete_time == False, this function should take in a time and output the values of each of the n_control_features control features as a list or numpy array. Alternatively, if the model is continuous time, u can also be an array of control inputs at each time step; in this case the array is fit with the interpolator specified by interpolator. If the model is discrete time, i.e. self.discrete_time == True, u should be a list (with len(u) == t) or array (with u.shape[0] == t) giving the control inputs at each step.
integrator (string, optional (default solve_ivp)) – Function to use to integrate the system. Default is scipy.integrate.solve_ivp. The only options currently supported are solve_ivp and odeint.
stop_condition (function object, optional) – If the model is in discrete time, an optional function that gives a stopping condition for stepping the simulation forward.
interpolator (callable, optional (default interp1d)) – Function used to interpolate control inputs if u is an array. Default is scipy.interpolate.interp1d.
integrator_kws (dict, optional (default {'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12})) – Optional keyword arguments to pass to the integrator.
interpolator_kws (dict, optional (default {})) – Optional keyword arguments to pass to the control input interpolator.
- Returns
x – Simulation results
- Return type
numpy array, shape (n_samples, n_features)
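For a continuous-time model, simulation means integrating the learned right-hand side from x0 with scipy.integrate.solve_ivp, using the default integrator_kws shown in the signature (LSODA, atol = rtol = 1e-12). A minimal sketch of that idea, where the hypothetical `rhs` stands in for the fitted model's predicted derivatives (here the idealized Lorenz system that the example model approximates):

```python
import numpy as np
from scipy.integrate import solve_ivp


def rhs(t, z):
    # Stand-in for the learned model; solve_ivp calls fun(t, y).
    x0, x1, x2 = z
    return [10 * (x1 - x0), x0 * (28 - x2) - x1, x0 * x1 - 8 / 3 * x2]


t = np.arange(0, 2, 0.002)
sol = solve_ivp(rhs, (t[0], t[-1]), [-8, 8, 27], t_eval=t,
                method="LSODA", atol=1e-12, rtol=1e-12)
x_sim = sol.y.T  # shape (n_samples, n_features), the same layout simulate returns
```

Note the transpose: solve_ivp stores states as (n_features, n_samples), whereas the documented return shape is (n_samples, n_features).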
- property complexity
Complexity of the model measured as the number of nonzero parameters.
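Counting nonzero parameters is a one-liner on the coefficient array. For the coefficients in the first Lorenz example, the rows contain 2, 3, and 2 nonzero terms, so the complexity is 7. This sketch assumes there is no separate nonzero intercept to count.

```python
import numpy as np

# Coefficients from the first Lorenz example, shape (3, 10).
coef = np.array([
    [0.0, -9.99969193, 9.99961547, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 27.99344519, -0.99905338, 0.0, 0.0, 0.0, -0.99980268, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, -2.66645651, 0.0, 0.99990257, 0.0, 0.0, 0.0, 0.0],
])
complexity = np.count_nonzero(coef)
print(complexity)  # 7
```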
pysindy.version module
Module contents
- class pysindy.SINDy(optimizer=None, feature_library=None, differentiation_method=None, feature_names=None, t_default=1, discrete_time=False)[source]¶
Bases:
BaseEstimator
Sparse Identification of Nonlinear Dynamical Systems (SINDy). Uses sparse regression to learn a dynamical systems model from measurement data.
- Parameters
optimizer (optimizer object, optional) – Optimization method used to fit the SINDy model. This must be a class extending
pysindy.optimizers.BaseOptimizer
. The default isSTLSQ
.feature_library (feature library object, optional) – Feature library object used to specify candidate right-hand side features. This must be a class extending
pysindy.feature_library.base.BaseFeatureLibrary
. The default option isPolynomialLibrary
.differentiation_method (differentiation object, optional) – Method for differentiating the data. This must be a class extending
pysindy.differentiation_methods.base.BaseDifferentiation
class. The default option is centered difference.feature_names (list of string, length n_input_features, optional) – Names for the input features (e.g.
['x', 'y', 'z']
). If None, will use['x0', 'x1', ...]
.t_default (float, optional (default 1)) – Default value for the time step.
discrete_time (boolean, optional (default False)) – If True, dynamical system is treated as a map. Rather than predicting derivatives, the right hand side functions step the system forward by one time step. If False, dynamical system is assumed to be a flow (right-hand side functions predict continuous time derivatives).
- Attributes
model (
sklearn.multioutput.MultiOutputRegressor
object) – The fitted SINDy model.n_input_features_ (int) – The total number of input features.
n_output_features_ (int) – The total number of output features. This number is a function of
self.n_input_features
and the feature library being used.n_control_features_ (int) – The total number of control input features.
Examples
>>> import numpy as np >>> from scipy.integrate import solve_ivp >>> from pysindy import SINDy >>> lorenz = lambda z,t : [10*(z[1] - z[0]), >>> z[0]*(28 - z[2]) - z[1], >>> z[0]*z[1] - 8/3*z[2]] >>> t = np.arange(0,2,.002) >>> x = solve_ivp(lorenz, [-8,8,27], t) >>> model = SINDy() >>> model.fit(x, t=t[1]-t[0]) >>> model.print() x0' = -10.000 1 + 10.000 x0 x1' = 27.993 1 + -0.999 x0 + -1.000 1 x1 x2' = -2.666 x1 + 1.000 1 x0 >>> model.coefficients() array([[ 0. , 0. , 0. ], [-9.99969193, 27.99344519, 0. ], [ 9.99961547, -0.99905338, 0. ], [ 0. , 0. , -2.66645651], [ 0. , 0. , 0. ], [ 0. , 0. , 0.99990257], [ 0. , -0.99980268, 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ], [ 0. , 0. , 0. ]]) >>> model.score(x, t=t[1]-t[0]) 0.999999985520653
>>> import numpy as np >>> from scipy.integrate import solve_ivp >>> from pysindy import SINDy >>> u = lambda t : np.sin(2 * t) >>> lorenz_c = lambda z,t : [ 10 * (z[1] - z[0]) + u(t) ** 2, z[0] * (28 - z[2]) - z[1], z[0] * z[1] - 8 / 3 * z[2], ] >>> t = np.arange(0,2,0.002) >>> x = solve_ivp(lorenz_c, [-8,8,27], t) >>> u_eval = u(t) >>> model = SINDy() >>> model.fit(x, u_eval, t=t[1]-t[0]) >>> model.print() x0' = -10.000 x0 + 10.000 x1 + 1.001 u0^2 x1' = 27.994 x0 + -0.999 x1 + -1.000 x0 x2 x2' = -2.666 x2 + 1.000 x0 x1 >>> model.coefficients() array([[ 0. , -9.99969851, 9.99958359, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. , 1.00120331], [ 0. , 27.9935177 , -0.99906375, 0. , 0. , 0. , 0. , -0.99980455, 0. , 0. , 0. , 0. , 0. , 0. , 0. ], [ 0. , 0. , 0. , -2.666437 , 0. , 0. , 0.99990137, 0. , 0. , 0. , 0. , 0. , 0. , 0. , 0. ]]) >>> model.score(x, u_eval, t=t[1]-t[0]) 0.9999999855414495
- fit(x, t=None, x_dot=None, u=None, multiple_trajectories=False, unbias=True, quiet=False, ensemble=False, library_ensemble=False, replace=True, n_candidates_to_drop=1, n_subset=None, n_models=None, ensemble_aggregator=None)[source]¶
Fit a SINDy model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Training data. If training data contains multiple trajectories, x should be a list containing data for each trajectory. Individual trajectories may contain different numbers of samples.
t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – If t is a float, it specifies the timestep between each sample. If array-like, it specifies the time at which each sample was collected. In this case the values in t must be strictly increasing. In the case of multi-trajectory training data, t may also be a list of arrays containing the collection times for each individual trajectory. If None, the default time step
t_default
will be used.x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the training data. If not provided, the time derivatives of the training data will be computed using the specified differentiation method. If x_dot is provided, it must match the shape of the training data and these values will be used as the time derivatives.
u (array-like or list of array-like, shape (n_samples, n_control_features), optional (default None)) – Control variables/inputs. Include this variable to use sparse identification for nonlinear dynamical systems for control (SINDYc). If training data contains multiple trajectories (i.e. if x is a list of array-like), then u should be a list containing control variable data for each trajectory. Individual trajectories may contain different numbers of samples.
multiple_trajectories (boolean, optional, (default False)) – Whether or not the training data includes multiple trajectories. If True, the training data must be a list of arrays containing data for each trajectory. If False, the training data must be a single array.
unbias (boolean, optional (default True)) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. If the optimizer (
self.optimizer
) applies any type of regularization, that regularization may bias coefficients toward particular values, improving the conditioning of the problem but harming the quality of the fit. Settingunbias==True
enables an extra step wherein unregularized linear regression is applied, but only for the coefficients in the support identified by the optimizer. This helps to remove the bias introduced by regularization.quiet (boolean, optional (default False)) – Whether or not to suppress warnings during model fitting.
ensemble (boolean, optional (default False)) – This parameter is used to allow for “ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random temporal subset of the input data (n_subset) for each sparse regression. This often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.
library_ensemble (boolean, optional (default False)) – This parameter is used to allow for “library ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random subset of the candidate library terms to truncate. So, n_models are generated by solving n_models sparse regression problems on these “reduced” libraries. Once again, this often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.
replace (boolean, optional (default True)) – If ensemble true, whether or not to time sample with replacement.
n_candidates_to_drop (int, optional (default 1)) – Number of candidate terms in the feature library to drop during library ensembling.
n_subset (int, optional (default len(time base))) – Number of time points to use for ensemble
n_models (int, optional (default 20)) – Number of models to generate via ensemble
ensemble_aggregator (callable, optional (default numpy.median)) – Method to aggregate model coefficients across different samples. This method argument is only used if
ensemble
orlibrary_ensemble
is True. The method should take in a list of 2D arrays and return a 2D array of the same shape as the arrays in the list. Example:lambda x: np.median(x, axis=0)
- Returns
self
- Return type
a fitted
SINDy
instance
- predict(x, u=None, multiple_trajectories=False)[source]¶
Predict the time derivatives using the SINDy model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples.
u (array-like or list of array-like, shape(n_samples, n_control_features), (default None)) – Control variables. If
multiple_trajectories==True
then u must be a list of control variable data from each trajectory. If the model was fit with control variables then u is not optional.multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
- Returns
x_dot – Predicted time derivatives
- Return type
array-like or list of array-like, shape (n_samples, n_input_features)
- equations(precision=3)[source]¶
Get the right hand sides of the SINDy model equations.
- Parameters
precision (int, optional (default 3)) – Number of decimal points to include for each coefficient in the equation.
- Returns
equations – List of strings representing the SINDy model equations for each input feature.
- Return type
list of strings
- print(lhs=None, precision=3)[source]¶
Print the SINDy model equations.
- Parameters
lhs (list of strings, optional (default None)) – List of variables to print on the left-hand sides of the learned equations. By default
self.input_features
are used.precision (int, optional (default 3)) – Precision to be used when printing out model coefficients.
- score(x, t=None, x_dot=None, u=None, multiple_trajectories=False, metric=<function r2_score>, **metric_kws)[source]¶
Returns a score for the time derivative prediction produced by the model.
- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Samples from which to make predictions.
t (float, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. Optional, used to compute the time derivatives of the samples if x_dot is not provided. If None, the default time step
t_default
will be used.x_dot (array-like or list of array-like, shape (n_samples, n_input_features), optional (default None)) – Optional pre-computed derivatives of the samples. If provided, these values will be used to compute the score. If not provided, the time derivatives of the training data will be computed using the specified differentiation method.
u (array-like or list of array-like, shape(n_samples, n_control_features), optional (default None)) – Control variables. If
multiple_trajectories==True
then u must be a list of control variable data from each trajectory. If the model was fit with control variables then u is not optional.multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
metric (callable, optional) –
Metric function with which to score the prediction. Default is the R^2 coefficient of determination. See Scikit-learn for more options.
metric_kws (dict, optional) – Optional keyword arguments to pass to the metric function.
- Returns
score – Metric function value for the model prediction of x_dot.
- Return type
float
- differentiate(x, t=None, multiple_trajectories=False)[source]¶
Apply the model’s differentiation method (
self.differentiation_method
) to data.- Parameters
x (array-like or list of array-like, shape (n_samples, n_input_features)) – Data to be differentiated.
t (int, numpy array of shape (n_samples,), or list of numpy arrays, optional (default None)) – Time step between samples or array of collection times. If None, the default time step
t_default
will be used.multiple_trajectories (boolean, optional (default False)) – If True, x contains multiple trajectories and must be a list of data from each trajectory. If False, x is a single trajectory.
- Returns
x_dot – Time derivatives computed by using the model’s differentiation method
- Return type
array-like or list of array-like, shape (n_samples, n_input_features)
- coefficients()[source]¶
Get an array of the coefficients learned by SINDy model.
- Returns
coef – Learned coefficients of the SINDy model. Equivalent to \(\Xi^\top\) in the literature.
- Return type
np.ndarray, shape (n_input_features, n_output_features)
- get_feature_names()[source]¶
Get a list of names of features used by SINDy model.
- Returns
feats – A list of strings giving the names of the features in the feature library,
self.feature_library
.- Return type
list
- simulate(x0, t, u=None, integrator='solve_ivp', stop_condition=None, interpolator=None, integrator_kws={'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12}, interpolator_kws={})[source]¶
Simulate the SINDy model forward in time.
- Parameters
x0 (numpy array, size [n_features]) – Initial condition from which to simulate.
t (int or numpy array of size [n_samples]) – If the model is in continuous time, t must be an array of time points at which to simulate. If the model is in discrete time, t must be an integer indicating how many steps to predict.
u (function from R^1 to R^{n_control_features} or list/array, optional (default None)) – Control inputs. If the model is continuous time, i.e. self.discrete_time == False, this function should take in a time and output the values of each of the n_control_features control features as a list or numpy array. Alternatively, if the model is continuous time, u can also be an array of control inputs at each time step. In this case the array is fit with the interpolator specified by interpolator. If the model is discrete time, i.e. self.discrete_time == True, u should be a list (with len(u) == t) or array (with u.shape[0] == t) giving the control inputs at each step.
integrator (string, optional (default solve_ivp)) – Function to use to integrate the system. Default is scipy.integrate.solve_ivp. The only options currently supported are solve_ivp and odeint.
stop_condition (function object, optional) – If the model is in discrete time, an optional function that gives a stopping condition for stepping the simulation forward.
interpolator (callable, optional (default interp1d)) – Function used to interpolate control inputs if u is an array. Default is scipy.interpolate.interp1d.
integrator_kws (dict, optional (default {'atol': 1e-12, 'method': 'LSODA', 'rtol': 1e-12})) – Optional keyword arguments to pass to the integrator.
interpolator_kws (dict, optional (default {})) – Optional keyword arguments to pass to the control input interpolator.
- Returns
x – Simulation results
- Return type
numpy array, shape (n_samples, n_features)
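For a discrete-time model, the stepping described for u and stop_condition can be sketched as follows. This is a hypothetical helper, not PySINDy's simulate: it assumes the model is available as a function step(x, u) that returns the next state, that u supplies one row per step (u.shape[0] == t), and that stop_condition returns True when the simulation should halt.

```python
import numpy as np


def simulate_discrete(step, x0, t, u=None, stop_condition=None):
    """Iterate x[i] = step(x[i-1], u[i-1]) for t steps (hypothetical helper).

    step: callable (x, u) -> next state; u is None when there is no control.
    u: control inputs with u.shape[0] == t (only the first t - 1 rows are used).
    stop_condition: optional callable on the new state; returning True stops early.
    """
    x0 = np.asarray(x0, dtype=float)
    if u is not None:
        u = np.asarray(u, dtype=float)
        if u.shape[0] != t:
            raise ValueError("u must provide one row of control inputs per step")
    x = np.zeros((t,) + x0.shape)
    x[0] = x0
    for i in range(1, t):
        x[i] = step(x[i - 1], None if u is None else u[i - 1])
        if stop_condition is not None and stop_condition(x[i]):
            return x[: i + 1]  # truncate the trajectory at the stopping point
    return x


# Hypothetical linear map x_{k+1} = 0.5 x_k + u_k with constant control u_k = 1.
traj = simulate_discrete(lambda x, u: 0.5 * x + u, [1.0], 4, u=np.ones((4, 1)))
print(traj[:, 0])  # [1.    1.5   1.75  1.875]
```

Raising an error when u has the wrong length mirrors the len(u) == t requirement stated above instead of failing later with an index error.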
- property complexity¶
Complexity of the model measured as the number of nonzero parameters.
- class pysindy.AxesArray(input_array, axes)[source]¶
Bases:
NDArrayOperatorsMixin, ndarray
A numpy-like array that keeps track of the meaning of its axes.
- Parameters
input_array (array-like) – the data to create the array.
axes (dict) – A dictionary mapping axis labels to shape indices. Allowed keys:
  - ax_time: int
  - ax_coord: int
  - ax_sample: int
  - ax_spatial: List[int]
- Raises
AxesWarning – If axes does not match the shape of input_array.
- property n_spatial¶
- property n_time¶
- property n_sample¶
- property n_coord¶
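One way to build a NumPy array that remembers what its axes mean is to subclass ndarray and copy the labels in __array_finalize__. The sketch below only illustrates that idea: LabeledAxesArray and AxesMismatchWarning are hypothetical names, this is not how AxesArray is implemented, and unlike a full implementation it does not update the labels when indexing removes an axis.

```python
import warnings

import numpy as np


class AxesMismatchWarning(UserWarning):
    """Hypothetical warning for axis labels that do not cover the array's dimensions."""


class LabeledAxesArray(np.ndarray):
    """Minimal ndarray subclass that carries an axis-label dictionary (sketch only)."""

    def __new__(cls, input_array, axes):
        obj = np.asarray(input_array).view(cls)
        # Collect every axis index named in the dictionary (ax_spatial may be a list).
        named = []
        for value in axes.values():
            named.extend(value if isinstance(value, list) else [value])
        if sorted(named) != list(range(obj.ndim)):
            warnings.warn("axes do not match the shape of input_array", AxesMismatchWarning)
        obj.axes = dict(axes)
        return obj

    def __array_finalize__(self, obj):
        # Called for views, slices, and ufunc outputs: inherit the parent's labels.
        if obj is None:
            return
        self.axes = dict(getattr(obj, "axes", {}))

    def _size_of(self, key):
        index = self.axes.get(key)
        return 1 if index is None else self.shape[index]

    @property
    def n_time(self):
        return self._size_of("ax_time")

    @property
    def n_coord(self):
        return self._size_of("ax_coord")


a = LabeledAxesArray(np.zeros((5, 3)), {"ax_time": 0, "ax_coord": 1})
print(a.n_time, a.n_coord)  # 5 3
print(a[1:].n_time)         # 4: slicing keeps the labels
```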
- class pysindy.BaseDifferentiation[source]¶
Bases:
BaseEstimator
Base class for differentiation methods.
Simply forces differentiation methods to implement a _differentiate function.
- class pysindy.FiniteDifference(order=2, d=1, axis=0, is_uniform=False, drop_endpoints=False, periodic=False)[source]¶
Bases:
BaseDifferentiation
Finite difference derivatives.
- Parameters
order (int, optional (default 2)) – The order of the finite difference method to be used. Currently, centered differences are implemented for even orders and left-off-centered differences for odd orders.
d (int, optional (default 1)) – The order of derivative to take. Must be a positive integer.
axis (int, optional (default 0)) – The axis to differentiate along.
is_uniform (boolean, optional (default False)) – Tells the differentiation method that, although an N-dimensional grid is passed, the grid is uniform, so dx can be used instead of the full grid array.
drop_endpoints (boolean, optional (default False)) – Whether to drop the derivatives at the endpoints. If True, endpoint derivatives are set to np.nan; if False, they are computed with one-sided or periodic differences (see periodic). Note that which points are endpoints depends on the method being used.
periodic (boolean, optional (default False)) – Whether to use periodic boundary conditions for endpoints. If periodic=False, one-sided (forward) differences are used on the boundaries; if periodic=True, centered differences with periodic boundaries are used. No effect if drop_endpoints=True.
Examples
>>> import numpy as np
>>> from pysindy.differentiation import FiniteDifference
>>> t = np.linspace(0, 1, 5)
>>> X = np.vstack((np.sin(t), np.cos(t))).T
>>> fd = FiniteDifference()
>>> fd._differentiate(X, t)
array([[ 1.00114596,  0.00370551],
       [ 0.95885108, -0.24483488],
       [ 0.8684696 , -0.47444711],
       [ 0.72409089, -0.67456051],
       [ 0.53780339, -0.84443737]])
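The centered scheme behind the default order=2 can be written directly in NumPy. This sketch assumes a uniform grid, and its second-order one-sided boundary formulas are an illustrative choice that may differ from what FiniteDifference does at the endpoints; second_order_difference is a hypothetical helper.

```python
import numpy as np


def second_order_difference(x, t, drop_endpoints=False):
    """Second-order finite-difference derivative along axis 0 (sketch; uniform t assumed)."""
    x = np.asarray(x, dtype=float)
    dt = t[1] - t[0]
    x_dot = np.full_like(x, np.nan)
    # Centered differences at interior points: (x[i+1] - x[i-1]) / (2 dt).
    x_dot[1:-1] = (x[2:] - x[:-2]) / (2 * dt)
    if not drop_endpoints:
        # Second-order one-sided differences at the two boundaries.
        x_dot[0] = (-3 * x[0] + 4 * x[1] - x[2]) / (2 * dt)
        x_dot[-1] = (3 * x[-1] - 4 * x[-2] + x[-3]) / (2 * dt)
    return x_dot


t = np.linspace(0, 1, 11)
print(second_order_difference(t**2, t))                       # matches 2 t at every point
print(second_order_difference(t**2, t, drop_endpoints=True))  # nan at both ends
```

Both stencils are exact for quadratics, which makes t**2 a convenient check: any deviation from 2t would indicate a bug rather than truncation error.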
- class pysindy.SINDyDerivative(**kwargs)[source]¶
Bases:
BaseDifferentiation
Wrapper class for differentiation classes from the derivative package. This class is meant to provide all the same functionality as the dxdt method.
This class also has _differentiate and __call__ methods, which are used by PySINDy.
- Parameters
derivative_kws (dictionary, optional) –
Keyword arguments to be passed to the dxdt method.
Notes
See the derivative documentation for acceptable keywords.
- class pysindy.SmoothedFiniteDifference(smoother=<function savgol_filter>, smoother_kws={}, **kwargs)[source]¶
Bases:
FiniteDifference
Smoothed finite difference derivatives.
Perform differentiation by smoothing input data then applying a finite difference method.
- Parameters
smoother (function, optional (default savgol_filter)) – Function to perform smoothing. Must be compatible with the following call signature: x_smoothed = smoother(x, **smoother_kws)
smoother_kws (dict, optional (default {})) – Arguments passed to smoother when it is invoked.
**kwargs (kwargs) – Additional parameters passed to the pysindy.FiniteDifference.__init__ function.
Examples
>>> import numpy as np
>>> from pysindy.differentiation import SmoothedFiniteDifference
>>> t = np.linspace(0,1,10)
>>> X = np.vstack((np.sin(t),np.cos(t))).T
>>> sfd = SmoothedFiniteDifference(smoother_kws={'window_length': 5})
>>> sfd._differentiate(X, t)
array([[ 1.00013114e+00,  7.38006789e-04],
       [ 9.91779070e-01, -1.10702304e-01],
       [ 9.73376491e-01, -2.20038119e-01],
       [ 9.43001496e-01, -3.26517615e-01],
       [ 9.00981354e-01, -4.29066632e-01],
       [ 8.47849424e-01, -5.26323977e-01],
       [ 7.84260982e-01, -6.17090177e-01],
       [ 7.11073255e-01, -7.00180971e-01],
       [ 6.29013295e-01, -7.74740601e-01],
       [ 5.39752150e-01, -8.41980082e-01]])
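The smooth-then-differentiate idea can be sketched without SciPy. Note the swapped-in technique: this sketch uses a centered moving average in place of the default savgol_filter smoother, and np.gradient in place of FiniteDifference; moving_average and smoothed_difference are hypothetical helpers.

```python
import numpy as np


def moving_average(x, window=5):
    """Centered moving average of a 1-D signal; points without a full window are left as-is."""
    if window % 2 == 0:
        raise ValueError("window must be odd so the average is centered")
    half = window // 2
    kernel = np.ones(window) / window
    smoothed = np.asarray(x, dtype=float).copy()
    smoothed[half : len(smoothed) - half] = np.convolve(smoothed, kernel, mode="valid")
    return smoothed


def smoothed_difference(x, t, window=5):
    """Smooth the signal first, then take a finite-difference derivative."""
    return np.gradient(moving_average(x, window), t)


# Noisy samples of sin(2 pi t); compare derivative errors with and without smoothing.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 201)
true_derivative = 2 * np.pi * np.cos(2 * np.pi * t)
noisy = np.sin(2 * np.pi * t) + 1e-3 * rng.standard_normal(t.size)
raw_error = np.gradient(noisy, t) - true_derivative
smooth_error = smoothed_difference(noisy, t, window=9) - true_derivative
interior = slice(10, -10)  # skip points near the unsmoothed edges
```

Smoothing typically trades a small bias (the average slightly flattens peaks) for a large reduction in the noise that differencing amplifies.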
- class pysindy.SpectralDerivative(d=1, axis=0)[source]¶
Bases:
BaseDifferentiation
Spectral derivatives. Assumes a uniform grid and uses the FFT to approximate a derivative. Works well for derivatives in periodic dimensions. Equivalent to a maximal-order finite difference, but runs in O(N log N).
- Parameters
d (int, optional (default 1)) – The order of derivative to take.
axis (int, optional (default 0)) – The axis to differentiate along
Examples
>>> import numpy as np
>>> from pysindy.differentiation import SpectralDerivative
>>> t = np.arange(0,1,0.1)
>>> X = np.vstack((np.sin(2 * np.pi * t), np.cos(2 * np.pi * t))).T
>>> sd = SpectralDerivative()
>>> sd._differentiate(X, t)
array([[ 6.28318531e+00,  2.69942771e-16],
       [ 5.08320369e+00, -3.69316366e+00],
       [ 1.94161104e+00, -5.97566433e+00],
       [-1.94161104e+00, -5.97566433e+00],
       [-5.08320369e+00, -3.69316366e+00],
       [-6.28318531e+00,  7.10542736e-16],
       [-5.08320369e+00,  3.69316366e+00],
       [-1.94161104e+00,  5.97566433e+00],
       [ 1.94161104e+00,  5.97566433e+00],
       [ 5.08320369e+00,  3.69316366e+00]])
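The FFT-based derivative can be sketched in NumPy: transform along the time axis, multiply each mode by \((2\pi i f)^d\), and transform back. This assumes uniformly spaced, periodic samples; spectral_derivative is a hypothetical helper, and zeroing the Nyquist mode for odd derivatives is a common convention assumed here rather than taken from SpectralDerivative.

```python
import numpy as np


def spectral_derivative(x, dt, d=1):
    """d-th derivative along axis 0 of uniformly spaced, periodic samples via the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    freqs = np.fft.fftfreq(n, d=dt)  # frequencies in cycles per unit time
    multiplier = (2j * np.pi * freqs) ** d
    if d % 2 == 1 and n % 2 == 0:
        multiplier[n // 2] = 0.0  # the Nyquist mode has no well-defined sign for odd d
    # Reshape so the multiplier broadcasts along axis 0 for 1-D or 2-D input.
    multiplier = multiplier.reshape((n,) + (1,) * (x.ndim - 1))
    x_hat = np.fft.fft(x, axis=0)
    return np.real(np.fft.ifft(multiplier * x_hat, axis=0))


# One full period of sin(2 pi t) and cos(2 pi t) on 10 uniform samples.
t = np.linspace(0, 1, 10, endpoint=False)
X = np.column_stack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
dX = spectral_derivative(X, dt=t[1] - t[0])
```

Because the signal is band-limited and exactly periodic on the grid, the result matches \(2\pi\cos(2\pi t)\) and \(-2\pi\sin(2\pi t)\) to rounding error, consistent with the example output above.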
- class pysindy.ConcatLibrary(libraries: list, library_ensemble=False, ensemble_indices=[0])[source]¶
Bases:
BaseFeatureLibrary
Concatenate multiple libraries into one library. All settings provided to individual libraries will be applied.
- Parameters
libraries (list of libraries) – Library instances to be applied to the input matrix.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.
- Attributes
libraries_ (list of libraries) – Library instances to be applied to the input matrix.
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. The number of output features is the sum of the numbers of output features for each of the concatenated libraries.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import ConcatLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_concat = ConcatLibrary([lib_custom, lib_fourier])
>>> lib_concat.fit(x)
>>> lib_concat.transform(x)
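Conceptually, concatenation stacks each library's feature columns side by side, so the output feature counts add. A NumPy sketch, with plain functions standing in for fitted libraries (concat_features and the (transform, names) pairs are hypothetical, not PySINDy API):

```python
import numpy as np


def concat_features(x, libraries):
    """Column-stack the outputs of several feature maps (hypothetical sketch).

    libraries: list of (transform, names) pairs, where transform maps x to a
    2-D feature matrix and names lists that library's feature names.
    """
    blocks = [transform(x) for transform, _ in libraries]
    names = [name for _, lib_names in libraries for name in lib_names]
    return np.hstack(blocks), names


x = np.array([[0., -1.], [1., 0.], [2., -1.]])
identity = (lambda x: x, ["x0", "x1"])
squares = (lambda x: x**2, ["x0^2", "x1^2"])
theta, names = concat_features(x, [identity, squares])
print(theta.shape)  # (3, 4): 2 + 2 output features
```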
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data with the provided libraries.
- Parameters
x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.
- Returns
xp – The matrix of features, where NP is the total number of features generated by the concatenated libraries.
- Return type
np.ndarray, shape [n_samples, NP]
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- class pysindy.TensoredLibrary(libraries: list, library_ensemble=False, inputs_per_library=None, ensemble_indices=[0])[source]¶
Bases:
BaseFeatureLibrary
Tensor multiple libraries together into one library. All settings provided to individual libraries will be applied.
- Parameters
libraries (list of libraries) – Library instances to be applied to the input matrix.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.
- Attributes
libraries_ (list of libraries) – Library instances to be applied to the input matrix.
inputs_per_library_ (numpy ndarray) – Array that specifies which inputs should be used for each of the libraries you are going to tensor together. Used for building GeneralizedLibrary objects.
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. The number of output features is the product of the numbers of output features for each of the libraries that were tensored together.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import TensoredLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_tensored = lib_custom * lib_fourier
>>> lib_tensored.fit(x)
>>> lib_tensored.transform(x)
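A tensor product multiplies every feature of one library by every feature of the other, so the output feature counts multiply. The NumPy sketch below assumes two already-computed feature matrices; tensor_features and its "a b" naming convention are hypothetical and may not match TensoredLibrary's feature names or column order.

```python
import numpy as np


def tensor_features(theta_a, names_a, theta_b, names_b):
    """All pairwise products of the columns of two feature matrices (sketch)."""
    n_samples = theta_a.shape[0]
    # products[:, i, j] = theta_a[:, i] * theta_b[:, j]
    products = theta_a[:, :, None] * theta_b[:, None, :]
    # Row-major reshape puts column i * n_b + j at position (i, j), matching the names.
    names = [f"{a} {b}" for a in names_a for b in names_b]
    return products.reshape(n_samples, -1), names


x = np.array([[0., -1.], [1., 0.], [2., -1.]])
fourier = np.column_stack([np.sin(x[:, 0]), np.cos(x[:, 0])])
theta, names = tensor_features(x, ["x0", "x1"], fourier, ["sin(x0)", "cos(x0)"])
print(theta.shape)  # (3, 4): 2 * 2 output features
print(names)        # ['x0 sin(x0)', 'x0 cos(x0)', 'x1 sin(x0)', 'x1 cos(x0)']
```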
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data with the provided libraries.
- Parameters
x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.
- Returns
xp – The matrix of features, where NP is the number of features generated from the tensor product of the libraries.
- Return type
np.ndarray, shape [n_samples, NP]
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- class pysindy.GeneralizedLibrary(libraries: list, tensor_array=None, inputs_per_library=None, library_ensemble=False, ensemble_indices=[0], exclude_libraries=[])[source]¶
Bases:
BaseFeatureLibrary
Combine multiple libraries into one library. All settings provided to individual libraries will be applied. Note that this class allows one to choose which input variables are used for each library and to take tensor products of any combination of libraries. Tensored libraries inherit the input variables specified for the individual libraries.
- Parameters
libraries (list of libraries) – Library instances to be applied to the input matrix.
tensor_array (2D list of booleans, optional (default None)) – Default is to not tensor any of the libraries together. Shape is (number of tensored libraries, number of feature libraries). Each row indicates which libraries to tensor together and add to the overall library. For instance, if you have 5 libraries and want to do two tensor products, you could use the list [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]] to get two tensored libraries: one from tensoring libraries 0 and 3, and one from tensoring libraries 1, 3, and 4.
inputs_per_library (2D np.ndarray, optional (default None)) – Shape should be (number of feature libraries, number of input variables). Can be used to specify a subset of the variables to use to generate each feature library. If there is more than one feature library, this can be used to generate a large number of libraries, each using its own subset of the input variables. Note that this must be specified for all the individual feature libraries.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library. For instance, if ensemble_indices = [0], it chops off the first column of the library.
- Attributes
libraries_ (list of libraries) – Library instances to be applied to the input matrix.
tensor_array_ (2D list of booleans (default None)) – Indicates which libraries to tensor together and add to the overall library. For instance, if you have 5 libraries and want to do two tensor products, you could use the list [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]] to get two tensored libraries: one from tensoring libraries 0 and 3, and one from tensoring libraries 1, 3, and 4. Shape is (number of tensored libraries, number of feature libraries).
inputs_per_library_ (2D np.ndarray, (default None)) – Default is that all inputs are used for every library. Can be used to specify a subset of the variables to use to generate each feature library. If there is more than one feature library, this can be used to generate a large number of libraries, each using its own subset of the input variables. Note that this must be specified for all the individual feature libraries. Shape is (number of feature libraries, number of input variables).
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. The number of output features is the sum of the numbers of output features for each of the concatenated libraries.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary, CustomLibrary
>>> from pysindy.feature_library import GeneralizedLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib_custom = CustomLibrary(library_functions=functions)
>>> lib_fourier = FourierLibrary()
>>> lib_generalized = GeneralizedLibrary([lib_custom, lib_fourier])
>>> lib_generalized.fit(x)
>>> lib_generalized.transform(x)
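To see how tensor_array affects the size of the combined library, the sketch below concatenates all library blocks and then appends one tensored block per row of tensor_array. It is a hypothetical helper that assumes every library has already been evaluated to a feature matrix and ignores inputs_per_library; it is not GeneralizedLibrary's implementation. For the 5-library tensor_array example in the parameter list, it shows how many extra columns each tensored row contributes.

```python
import numpy as np


def tensor_blocks(blocks):
    """All products of one column from each block, in row-major order."""
    product = blocks[0]
    for block in blocks[1:]:
        product = (product[:, :, None] * block[:, None, :]).reshape(product.shape[0], -1)
    return product


def generalized_features(blocks, tensor_array=None):
    """Concatenate library blocks, then append one tensored block per tensor_array row."""
    out = list(blocks)
    if tensor_array is not None:
        for row in tensor_array:
            selected = [block for block, flag in zip(blocks, row) if flag]
            out.append(tensor_blocks(selected))
    return np.hstack(out)


# Five hypothetical libraries with 1, 2, 3, 2, and 2 features; library i is filled with i + 1.
sizes = [1, 2, 3, 2, 2]
blocks = [np.full((4, k), float(i + 1)) for i, k in enumerate(sizes)]
combined = generalized_features(blocks, [[1, 0, 0, 1, 0], [0, 1, 0, 1, 1]])
print(combined.shape)  # (4, 20): 10 concatenated + 1*2 + 2*2*2 tensored columns
```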
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data with the provided libraries.
- Parameters
x (array-like, shape [n_samples, n_features]) – The data to transform, row by row.
- Returns
xp – The matrix of features, where NP is the total number of features generated by the combined libraries.
- Return type
np.ndarray, shape [n_samples, NP]
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- class pysindy.CustomLibrary(library_functions, function_names=None, interaction_only=True, library_ensemble=False, ensemble_indices=[0], include_bias=False)[source]¶
Bases:
BaseFeatureLibrary
Generate a library with custom functions.
- Parameters
library_functions (list of mathematical functions) – Functions to include in the library. By default, the same functions are used for all variables. Can also be used so that each variable has an associated library; in this case library_functions has shape (n_input_features, num_library_functions).
function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return \(\sin(x)\) given \(x\) as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using \([ f_0(x),f_1(x), f_2(x), \ldots ]\).
interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form \(f(x,x)\) and \(f(x,y,x)\) will be omitted, but those of the form \(f(x,y)\) and \(f(x,y,z)\) will be included. If False, all combinations will be included.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions, because if the system is not 1D, lambdas will generate duplicates.
- Attributes
functions (list of functions) – Mathematical library functions to be applied to each input feature.
function_names (list of functions) – Functions for generating string representations of each library function.
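n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. For single-argument library functions, this is the product of the number of library functions and the number of input features; each multi-argument function instead contributes one feature per combination of input features (in the example below, two functions of two inputs yield three features).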
Examples
>>> import numpy as np
>>> from pysindy.feature_library import CustomLibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib = CustomLibrary(library_functions=functions).fit(x)
>>> lib.transform(x)
array([[ 1.        ,  0.36787944, -0.84147098],
       [ 2.71828183,  1.        ,  0.84147098],
       [ 7.3890561 ,  0.36787944,  0.84147098]])
>>> lib.get_feature_names()
['f0(x0)', 'f0(x1)', 'f1(x0,x1)']
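The feature names in the example above follow from applying each function to combinations of input indices. The sketch below reproduces that example under two assumptions: interaction_only=True corresponds to itertools.combinations and False to combinations_with_replacement, and each function's arity is read from its signature. custom_features is a hypothetical helper, not CustomLibrary's code.

```python
import inspect
from itertools import combinations, combinations_with_replacement

import numpy as np


def custom_features(x, functions, interaction_only=True):
    """Apply each function to every combination of input columns matching its arity."""
    n_features = x.shape[1]
    comb = combinations if interaction_only else combinations_with_replacement
    columns, names = [], []
    for i, f in enumerate(functions):
        n_args = len(inspect.signature(f).parameters)
        for idx in comb(range(n_features), n_args):
            columns.append(f(*[x[:, j] for j in idx]))
            names.append(f"f{i}(" + ",".join(f"x{j}" for j in idx) + ")")
    return np.column_stack(columns), names


x = np.array([[0., -1.], [1., 0.], [2., -1.]])
functions = [lambda x: np.exp(x), lambda x, y: np.sin(x + y)]
theta, names = custom_features(x, functions)
print(names)  # ['f0(x0)', 'f0(x1)', 'f1(x0,x1)']
_, all_names = custom_features(x, functions, interaction_only=False)
print(all_names)  # adds the self-interaction terms f1(x0,x0) and f1(x1,x1)
```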
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – Measurement data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data to custom features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.
- Return type
np.ndarray, shape (n_samples, n_output_features)
- class pysindy.FourierLibrary(n_frequencies=1, include_sin=True, include_cos=True, library_ensemble=False, ensemble_indices=[0])[source]¶
Bases:
BaseFeatureLibrary
Generate a library with trigonometric functions.
- Parameters
n_frequencies (int, optional (default 1)) – Number of frequencies to include in the library. The library will include functions \(\sin(x), \sin(2x), \dots \sin(n_{frequencies}x)\) for each input feature \(x\) (depending on which of sine and/or cosine features are included).
include_sin (boolean, optional (default True)) – If True, include sine terms in the library.
include_cos (boolean, optional (default True)) – If True, include cosine terms in the library.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
- Attributes
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. The number of output features is 2 * n_input_features_ * n_frequencies if both sines and cosines are included. Otherwise it is n_input_features_ * n_frequencies.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import FourierLibrary
>>> x = np.array([[0.],[1.],[2.]])
>>> lib = FourierLibrary(n_frequencies=2).fit(x)
>>> lib.transform(x)
array([[ 0.        ,  1.        ,  0.        ,  1.        ],
       [ 0.84147098,  0.54030231,  0.90929743, -0.41614684],
       [ 0.90929743, -0.41614684, -0.7568025 , -0.65364362]])
>>> lib.get_feature_names()
['sin(1 x0)', 'cos(1 x0)', 'sin(2 x0)', 'cos(2 x0)']
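A NumPy sketch of the same feature map, which reproduces the example above for a single input. With several inputs, the loop order used here (frequency outer, input feature inner) is an assumption, and fourier_features is a hypothetical helper.

```python
import numpy as np


def fourier_features(x, n_frequencies=1, include_sin=True, include_cos=True):
    """Sine/cosine features at integer multiples of each input (sketch)."""
    x = np.asarray(x, dtype=float)
    columns, names = [], []
    for k in range(1, n_frequencies + 1):
        for j in range(x.shape[1]):
            if include_sin:
                columns.append(np.sin(k * x[:, j]))
                names.append(f"sin({k} x{j})")
            if include_cos:
                columns.append(np.cos(k * x[:, j]))
                names.append(f"cos({k} x{j})")
    return np.column_stack(columns), names


x = np.array([[0.], [1.], [2.]])
theta, names = fourier_features(x, n_frequencies=2)
print(names)  # ['sin(1 x0)', 'cos(1 x0)', 'sin(2 x0)', 'cos(2 x0)']
```

The column count is 2 * n_input_features * n_frequencies when both sines and cosines are kept, matching the n_output_features_ formula above.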
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data to Fourier features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
xp – The matrix of features, where n_output_features is the number of Fourier features generated from the inputs.
- Return type
np.ndarray, shape (n_samples, n_output_features)
- class pysindy.IdentityLibrary(library_ensemble=False, ensemble_indices=[0])[source]¶
Bases:
BaseFeatureLibrary
Generate an identity library which maps all input features to themselves.
- Parameters
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library).
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
- Attributes
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. The number of output features is equal to the number of input features.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import IdentityLibrary
>>> x = np.array([[0,-1],[0.5,-1.5],[1.,-2.]])
>>> lib = IdentityLibrary().fit(x)
>>> lib.transform(x)
array([[ 0. , -1. ],
       [ 0.5, -1.5],
       [ 1. , -2. ]])
>>> lib.get_feature_names()
['x0', 'x1']
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Perform identity transformation (return a copy of the input).
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
x – The matrix of features, which is just a copy of the input data.
- Return type
np.ndarray, shape (n_samples, n_features)
- class pysindy.PolynomialLibrary(degree=2, include_interaction=True, interaction_only=False, include_bias=True, order='C', library_ensemble=False, ensemble_indices=[0])[source]¶
Bases:
PolynomialFeatures, BaseFeatureLibrary
Generate polynomial and interaction features.
This is the same as sklearn.preprocessing.PolynomialFeatures, but also adds the option to omit interaction features from the library.
- Parameters
degree (integer, optional (default 2)) – The degree of the polynomial features.
include_interaction (boolean, optional (default True)) – Determines whether interaction features are produced. If False, features are all of the form x[i] ** k.
interaction_only (boolean, optional (default False)) – If True, only interaction features are produced: features that are products of at most degree distinct input features (so not x[1] ** 2, x[0] * x[2] ** 3, etc.).
include_bias (boolean, optional (default True)) – If True (default), then include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model).
order (str in {'C', 'F'}, optional (default 'C')) – Order of output array in the dense case. ‘F’ order is faster to compute, but may slow down subsequent estimators.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
- Attributes
powers_ (array, shape (n_output_features, n_input_features)) – powers_[i, j] is the exponent of the jth input in the ith output.
n_input_features_ (int) – The total number of input features. WARNING: This attribute is deprecated in scikit-learn 1.0 and higher, so sklearn.__version__ is checked and n_features_in_ is used instead when needed.
n_output_features_ (int) – The total number of output features. This number is computed by iterating over all appropriately sized combinations of input features.
- property powers_¶
Exponent for each of the inputs in the output.
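The exponent table exposed by powers_ can be enumerated with itertools. The sketch below is a hypothetical helper that lists one exponent tuple per output feature and shows how include_interaction=False and interaction_only=True prune that list. It is not PolynomialLibrary's implementation, though for two inputs at degree 2 with the default options it yields the same ordering as sklearn.preprocessing.PolynomialFeatures (1, x0, x1, x0^2, x0 x1, x1^2).

```python
from itertools import combinations_with_replacement


def polynomial_powers(n_features, degree=2, include_interaction=True,
                      interaction_only=False, include_bias=True):
    """Exponent tuples, one per polynomial feature (sketch of a powers_-style table)."""
    powers = []
    start = 0 if include_bias else 1
    for total_degree in range(start, degree + 1):
        for combo in combinations_with_replacement(range(n_features), total_degree):
            exponents = [0] * n_features
            for j in combo:
                exponents[j] += 1
            n_distinct = sum(e > 0 for e in exponents)
            if not include_interaction and n_distinct > 1:
                continue  # keep only pure powers x[i] ** k
            if interaction_only and max(exponents) > 1:
                continue  # keep only products of distinct inputs
            powers.append(tuple(exponents))
    return powers


print(polynomial_powers(2))
# [(0, 0), (1, 0), (0, 1), (2, 0), (1, 1), (0, 2)]
```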
- get_feature_names(input_features=None)[source]¶
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)[source]¶
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data.
- Returns
self
- Return type
instance
- transform(x_full)[source]¶
Transform data to polynomial features.
- Parameters
x (array-like or CSR/CSC sparse matrix, shape (n_samples, n_features)) – The data to transform, row by row. Prefer CSR over CSC for sparse input (for speed), but CSC is required if the degree is 4 or higher. If the degree is less than 4 and the input format is CSC, it will be converted to CSR, have its polynomial features generated, then converted back to CSC. If the degree is 2 or 3, the method described in “Leveraging Sparsity to Speed Up Polynomial Feature Expansions of CSR Matrices Using K-Simplex Numbers” by Andrew Nystrom and John Hughes is used, which is much faster than the method used on CSC input. For this reason, a CSC input will be converted to CSR, and the output will be converted back to CSC prior to being returned, hence the preference of CSR.
- Returns
xp – The matrix of features, of shape (n_samples, n_output_features), where n_output_features is the number of polynomial features generated from the combination of inputs.
- Return type
np.ndarray or CSR/CSC sparse matrix, shape (n_samples, n_output_features)
- class pysindy.PDELibrary(library_functions=[], derivative_order=0, spatial_grid=None, temporal_grid=None, interaction_only=True, function_names=None, include_bias=False, include_interaction=True, library_ensemble=False, ensemble_indices=[0], implicit_terms=False, multiindices=None, differentiation_method=<class 'pysindy.differentiation.finite_difference.FiniteDifference'>, diff_kwargs={}, is_uniform=None, periodic=None)[source]¶
Bases:
BaseFeatureLibrary
Generate a PDE library with custom functions.
- Parameters
library_functions (list of mathematical functions, optional (default [])) – Functions to include in the library. Each function will be applied to each input variable (but not their derivatives).
derivative_order (int, optional (default 0)) – Order of derivative to take on each input variable, can be arbitrary non-negative integer.
spatial_grid (np.ndarray, optional (default None)) – The spatial grid for computing derivatives
temporal_grid (np.ndarray, optional (default None)) – The temporal grid if using SINDy-PI with PDEs.
function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return \(\sin(x)\) given \(x\) as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using \([ f_0(x),f_1(x), f_2(x), \ldots ]\).
interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form \(f(x,x)\) and \(f(x,y,x)\) will be omitted, but those of the form \(f(x,y)\) and \(f(x,y,z)\) will be included. If False, all combinations will be included.
include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with just lambda functions, because if the system is not 1D, lambdas will generate duplicates.
include_interaction (boolean, optional (default True)) – This differs from its use in PolynomialLibrary. If True, all the mixed derivative terms are generated. If False, the library consists of only pure no-derivative terms and pure derivative terms, with no mixed terms.
library_ensemble (boolean, optional (default False)) – Whether or not to use library bagging (regress on subset of the candidate terms in the library)
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
implicit_terms (boolean, optional (default False)) – Flag to indicate if SINDy-PI (temporal derivatives) is being used for the right-hand side of the SINDy fit.
multiindices (list of integer arrays, (default None)) – Overrides the derivative_order to customize the included derivative orders. Each integer array indicates the order of differentiation along the corresponding axis for each derivative term.
differentiation_method (callable, optional (default FiniteDifference)) – Spatial differentiation method.
diff_kwargs (dictionary, optional (default {})) – Keyword options to supply to differentiation_method.
- Attributes
functions (list of functions) – Mathematical library functions to be applied to each input feature.
function_names (list of functions) – Functions for generating string representations of each library function.
n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher, so we check sklearn.__version__ and switch to n_features_in_ if needed.
n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import PDELibrary
- get_feature_names(input_features=None)
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – Measurement data.
- Returns
self
- Return type
instance
- transform(x_full)
Transform data to PDE features.
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
xp – The matrix of features, where n_output_features is the number of features generated from the tensor product of the derivative terms and the library_functions applied to combinations of the inputs.
- Return type
np.ndarray, shape (n_samples, n_output_features)
- class pysindy.WeakPDELibrary(library_functions=[], derivative_order=0, spatiotemporal_grid=None, function_names=None, interaction_only=True, include_bias=False, include_interaction=True, K=100, H_xt=None, p=4, library_ensemble=False, ensemble_indices=[0], num_pts_per_domain=None, implicit_terms=False, multiindices=None, differentiation_method=<class 'pysindy.differentiation.finite_difference.FiniteDifference'>, diff_kwargs={}, is_uniform=None, periodic=None)
Bases:
BaseFeatureLibrary
- Generate a weak formulation library with custom functions and,
optionally, any spatial derivatives in arbitrary dimensions.
The features in the weak formulation are integrals of derivatives of input data multiplied by a test function phi, which are evaluated on K subdomains randomly sampled across the spatiotemporal grid. Each subdomain is initially generated with half-width H_xt along each axis, and is then shrunk such that the left and right boundaries lie on spatiotemporal grid points. The expressions are integrated by parts to move as many derivatives as possible off the input data and onto the test functions.
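The subdomain construction described above can be sketched in one dimension. This is a simplified illustration, not the library's implementation. It assumes that subdomain centers are drawn uniformly, so that each interval [c - H, c + H] stays inside the grid, and it snaps each interval inward to the grid points it encloses. `sample_subdomains` is a hypothetical helper. H follows the documented default H_xt = L_xt / 20.

```python
import numpy as np


def sample_subdomains(grid, K, H, seed=0):
    """Hypothetical 1D helper (not pysindy API): draw K subdomain centers and
    shrink each interval [c - H, c + H] so both endpoints lie on grid points."""
    rng = np.random.default_rng(seed)
    grid = np.asarray(grid, dtype=float)
    # Assumption: uniform centers, keeping the full interval inside the grid.
    centers = rng.uniform(grid[0] + H, grid[-1] - H, size=K)
    lo = np.searchsorted(grid, centers - H, side="left")       # first point >= c - H
    hi = np.searchsorted(grid, centers + H, side="right") - 1  # last point <= c + H
    return centers, grid[lo], grid[hi]


t = np.linspace(0.0, 10.0, 201)
H = (t[-1] - t[0]) / 20  # documented default: H_xt = L_xt / 20
centers, left, right = sample_subdomains(t, K=100, H=H)
print(left[:3], right[:3])
```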
The weak integral features are calculated assuming that the function f(x), which is integrated against derivatives of the test function dphi(x), is linear between the grid points provided by the data: \(f(x) = f_i + \frac{x - x_i}{x_{i+1} - x_i}(f_{i+1} - f_i)\) for \(x_i \le x \le x_{i+1}\). Thus f(x)*dphi(x) is approximated as a piecewise polynomial, and the piecewise components are integrated analytically. To improve performance, the complete integral is expressed as a dot product of weights against the input data f_i, which enables vectorized evaluations.
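The piecewise-linear quadrature can be sketched in one dimension. Two assumptions here are not taken from the library. First, the test function is \(\phi(x) = (1 - x^2)^p\) on \([-1, 1]\). Second, the data are integrated against a single polynomial psi, such as \(\phi'\).

Because f is linear on each cell and psi is a polynomial, each cell integral is exact. The whole integral therefore collapses to a dot product of weights with the samples f_i. `piecewise_linear_weights` is a hypothetical helper.

```python
import numpy as np
from numpy.polynomial import Polynomial as P


def piecewise_linear_weights(x, psi):
    """Hypothetical helper (not pysindy API): return w such that w @ f equals the
    exact integral of f_lin(x) * psi(x) over [x[0], x[-1]], where f_lin linearly
    interpolates the samples f at grid points x and psi is a numpy Polynomial."""
    x = np.asarray(x, dtype=float)
    w = np.zeros_like(x)
    for i in range(len(x) - 1):
        a, b = x[i], x[i + 1]
        h = b - a
        # On [a, b]: f_lin(x) = f_i * (b - x) / h + f_{i+1} * (x - a) / h
        left = (psi * P([b / h, -1.0 / h])).integ()
        right = (psi * P([-a / h, 1.0 / h])).integ()
        w[i] += left(b) - left(a)
        w[i + 1] += right(b) - right(a)
    return w


x = np.linspace(-1.0, 1.0, 21)
p = 4
phi = P([1.0, 0.0, -1.0]) ** p  # assumed test function (1 - x^2)^p
w = piecewise_linear_weights(x, phi.deriv())
f = 2.0 * x + 1.0  # linear data, so the interpolation is exact
weak_value = w @ f
# Integration by parts (phi vanishes at +-1): int f phi' dx = -2 * int phi dx
print(weak_value, -2.0 * (phi.integ()(1.0) - phi.integ()(-1.0)))
```

With psi = 1 the weights reduce to the trapezoidal rule, which is a quick sanity check on the construction.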
- Parameters
library_functions (list of mathematical functions, optional (default None)) – Functions to include in the library. Each function will be applied to each input variable (but not their derivatives)
derivative_order (int, optional (default 0)) – Order of derivative to take on each input variable, can be arbitrary non-negative integer.
spatiotemporal_grid (np.ndarray (default None)) – The spatiotemporal grid for computing derivatives. This variable must be specified with at least one dimension corresponding to a temporal grid, so that integration by parts can be done in the weak formulation.
function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return \(\sin(x)\) given \(x\) as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using \([ f_0(x),f_1(x), f_2(x), \ldots ]\).
interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form \(f(x,x)\) and \(f(x,y,x)\) will be omitted, but those of the form \(f(x,y)\) and \(f(x,y,z)\) will be included. If False, all combinations will be included.
include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with lambda functions alone, because if the system is not 1D, lambdas will generate duplicates.
include_interaction (boolean, optional (default True)) – This has a different meaning than in PolynomialLibrary. If True, the library includes all the mixed terms. If False, the library will consist of only pure non-derivative terms and pure derivative terms, with no mixed terms.
K (int, optional (default 100)) – Number of subdomain centers. Each subdomain is a square extending H_xt on either side of its center in each spatiotemporal direction. Defaults to 100.
H_xt (array of floats, optional (default None)) – Half of the length of the square subdomains in each spatiotemporal direction. If H_xt is not specified, defaults to H_xt = L_xt / 20, where L_xt is the length of the full domain in each spatiotemporal direction. If H_xt is specified as a scalar, this value will be applied to all dimensions of the subdomains.
p (int, optional (default 4)) – Positive integer to define the polynomial degree of the spatial weights used for weak/integral SINDy.
library_ensemble (boolean, optional (default False)) – Whether to use library bagging, i.e. regressing on a subset of the candidate terms in the library.
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
num_pts_per_domain (int, deprecated (default None)) – Included here to retain backwards compatibility with older code that uses this parameter. However, it merely raises a DeprecationWarning and then is ignored.
implicit_terms (boolean) – Flag to indicate if SINDy-PI (temporal derivatives) is being used for the right-hand side of the SINDy fit.
multiindices (list of integer arrays, (default None)) – Overrides the derivative_order to customize the included derivative orders. Each integer array indicates the order of differentiation along the corresponding axis for each derivative term.
differentiation_method (callable, (default FiniteDifference)) – Spatial differentiation method.
diff_kwargs (dictionary, (default {})) – Keyword options to supply to differentiation_method.
- Attributes
functions (list of functions) – Mathematical library functions to be applied to each input feature.
function_names (list of functions) – Functions for generating string representations of each library function.
n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher, so we check sklearn.__version__ and switch to n_features_in_ if needed.
n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import WeakPDELibrary
>>> x = np.array([[0.,-1],[1.,0.],[2.,-1.]])
>>> functions = [lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> lib = WeakPDELibrary(library_functions=functions).fit(x)
>>> lib.transform(x)
array([[ 1.        ,  0.36787944, -0.84147098],
       [ 2.71828183,  1.        ,  0.84147098],
       [ 7.3890561 ,  0.36787944,  0.84147098]])
>>> lib.get_feature_names()
['f0(x0)', 'f0(x1)', 'f1(x0,x1)']
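The interaction_only flag controls which argument tuples are passed to multi-argument library functions. Below is a minimal sketch. It assumes that tuples are drawn with itertools.combinations when interaction_only=True and with itertools.combinations_with_replacement otherwise, and it passes each function's arity explicitly. `feature_names` is a hypothetical helper. For the True case it reproduces the names in the example above.

```python
from itertools import combinations, combinations_with_replacement


def feature_names(n_inputs, arities, interaction_only=True):
    """Hypothetical helper (not pysindy API): name each term f_k(x_i, x_j, ...)
    produced by library functions with the given arities on n_inputs variables."""
    choose = combinations if interaction_only else combinations_with_replacement
    names = []
    for k, arity in enumerate(arities):
        for idx in choose(range(n_inputs), arity):
            args = ",".join(f"x{i}" for i in idx)
            names.append(f"f{k}({args})")
    return names


# One unary and one binary function, as in the example above.
only_cross = feature_names(2, arities=[1, 2])
with_self = feature_names(2, arities=[1, 2], interaction_only=False)
print(only_cross)  # ['f0(x0)', 'f0(x1)', 'f1(x0,x1)']
print(with_self)   # ['f0(x0)', 'f0(x1)', 'f1(x0,x0)', 'f1(x0,x1)', 'f1(x1,x1)']
```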
- convert_u_dot_integral(u)
Takes a full set of spatiotemporal fields u(x, t) and finds the weak form of u_dot.
- get_feature_names(input_features=None)
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – Measurement data.
- Returns
self
- Return type
instance
- transform(x_full)
Transform data to custom features
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.
- Return type
np.ndarray, shape (n_samples, n_output_features)
- class pysindy.SINDyPILibrary(library_functions=None, t=None, x_dot_library_functions=None, function_names=None, interaction_only=True, differentiation_method=None, include_bias=False, library_ensemble=False, ensemble_indices=[0])
Bases:
BaseFeatureLibrary
WARNING: This library is deprecated in PySINDy versions > 1.7. Please use the PDE or WeakPDE libraries instead.
Generate a library with custom functions. The library takes custom libraries for X and Xdot respectively, and then tensor-products them together. For a 3D system, a library of constant and linear terms in x_dot, i.e. [1, x_dot0, x_dot1, x_dot2], is good enough for most problems and implicit terms. The function names list should include both X and Xdot functions, without the mixed terms.
- Parameters
library_functions (list of mathematical functions) – Functions to include in the library. Each function will be applied to each input variable x.
x_dot_library_functions (list of mathematical functions) – Functions to include in the library. Each function will be applied to each input variable x_dot.
t (np.ndarray of time slices) – Time base to compute Xdot from X for the implicit terms
differentiation_method (differentiation object, optional) – Method for differentiating the data. This must be a class extending the pysindy.differentiation.base.BaseDifferentiation class. The default option is centered difference.
function_names (list of functions, optional (default None)) – List of functions used to generate feature names for each library function. Each name function must take a string input (representing a variable name), and output a string depiction of the respective mathematical function applied to that variable. For example, if the first library function is sine, the name function might return \(\sin(x)\) given \(x\) as input. The function_names list must be the same length as library_functions. If no list of function names is provided, defaults to using \([ f_0(x),f_1(x), f_2(x), \ldots ]\). For SINDy-PI, function_names should include the names of the functions in both the x and x_dot libraries (library_functions and x_dot_library_functions), but not the mixed terms, which are computed in the code.
interaction_only (boolean, optional (default True)) – Whether to omit self-interaction terms. If True, function evaluations of the form \(f(x,x)\) and \(f(x,y,x)\) will be omitted, but those of the form \(f(x,y)\) and \(f(x,y,z)\) will be included. If False, all combinations will be included.
include_bias (boolean, optional (default False)) – If True, include a bias column, the feature in which all polynomial powers are zero (i.e. a column of ones, which acts as an intercept term in a linear model). This is hard to do with lambda functions alone, because if the system is not 1D, lambdas will generate duplicates.
library_ensemble (boolean, optional (default False)) – Whether to use library bagging, i.e. regressing on a subset of the candidate terms in the library.
ensemble_indices (integer array, optional (default [0])) – The indices to use for ensembling the library.
- Attributes
functions (list of functions) – Mathematical library functions to be applied to each input feature.
function_names (list of functions) – Functions for generating string representations of each library function.
n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher, so we check sklearn.__version__ and switch to n_features_in_ if needed.
n_output_features_ (int) – The total number of output features. The number of output features is the product of the number of library functions and the number of input features.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import SINDyPILibrary
>>> t = np.linspace(0, 1, 5)
>>> x = np.ones((5, 2))
>>> functions = [lambda x: 1, lambda x : np.exp(x), lambda x,y : np.sin(x+y)]
>>> x_dot_functions = [lambda x: 1, lambda x : x]
>>> function_names = [lambda x: '', lambda x : 'exp(' + x + ')', lambda x, y : 'sin(' + x + y + ')', lambda x: '', lambda x : x]
>>> lib = SINDyPILibrary(library_functions=functions, x_dot_library_functions=x_dot_functions, function_names=function_names, t=t).fit(x)
>>> lib.transform(x)
[[ 1.00000000e+00 2.71828183e+00 2.71828183e+00 9.09297427e-01 2.22044605e-16 6.03579815e-16 6.03579815e-16 2.01904588e-16 2.22044605e-16 6.03579815e-16 6.03579815e-16 2.01904588e-16]
 [ 1.00000000e+00 2.71828183e+00 2.71828183e+00 9.09297427e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [ 1.00000000e+00 2.71828183e+00 2.71828183e+00 9.09297427e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [ 1.00000000e+00 2.71828183e+00 2.71828183e+00 9.09297427e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [ 1.00000000e+00 2.71828183e+00 2.71828183e+00 9.09297427e-01 -2.22044605e-16 -6.03579815e-16 -6.03579815e-16 -2.01904588e-16 -2.22044605e-16 -6.03579815e-16 -6.03579815e-16 -2.01904588e-16]]
>>> lib.get_feature_names()
['', 'exp(x0)', 'exp(x1)', 'sin(x0x1)', 'x0_dot', 'exp(x0)x0_dot', 'exp(x1)x0_dot', 'sin(x0x1)x0_dot', 'x1_dot', 'exp(x0)x1_dot', 'exp(x1)x1_dot', 'sin(x0x1)x1_dot']
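The feature names in the example above follow a tensor product: every x-library term is multiplied by every x_dot-library term, with the x_dot term in the outer loop. Below is a minimal sketch of that ordering. The ordering is inferred from the example output, not from the library source. `tensor_names` and `tensor_columns` are hypothetical helpers.

```python
import numpy as np


def tensor_names(x_names, x_dot_names):
    """Hypothetical helper: concatenate names, with the x_dot term in the outer loop."""
    return [xn + dn for dn in x_dot_names for xn in x_names]


def tensor_columns(theta_x, theta_x_dot):
    """Hypothetical helper: column j * a + i equals theta_x[:, i] * theta_x_dot[:, j],
    where a is the number of x-library columns."""
    n, a = theta_x.shape
    b = theta_x_dot.shape[1]
    return np.einsum("ni,nj->nji", theta_x, theta_x_dot).reshape(n, a * b)


x_names = ["", "exp(x0)", "exp(x1)", "sin(x0x1)"]
x_dot_names = ["", "x0_dot", "x1_dot"]  # constant term first, with an empty name
names = tensor_names(x_names, x_dot_names)
print(names)

# One sample with x = (1, 1) and x_dot = (0, 0), like the interior rows above.
theta_x = np.array([[1.0, np.e, np.e, np.sin(2.0)]])
theta_x_dot = np.array([[1.0, 0.0, 0.0]])
cols = tensor_columns(theta_x, theta_x_dot)
print(cols)
```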
- get_feature_names(input_features=None)
Return feature names for output features.
- Parameters
input_features (list of string, length n_features, optional) – String names for input features if available. By default, “x0”, “x1”, … “xn_features” is used.
- Returns
output_feature_names
- Return type
list of string, length n_output_features
- fit(x_full, y=None)
Compute number of output features.
- Parameters
x (array-like, shape (n_samples, n_features)) – Measurement data.
- Returns
self
- Return type
instance
- transform(x_full)
Transform data to custom features
- Parameters
x (array-like, shape (n_samples, n_features)) – The data to transform, row by row.
- Returns
xp – The matrix of features, where n_output_features is the number of features generated from applying the custom functions to the inputs.
- Return type
np.ndarray, shape (n_samples, n_output_features)
- class pysindy.ParameterizedLibrary(parameter_library=PolynomialLibrary(degree=1), feature_library=PolynomialLibrary(), num_parameters=3, num_features=3, library_ensemble=False, ensemble_indices=[0])
Bases:
GeneralizedLibrary
Construct a SINDyCP library to fit multiple trajectories with variable control parameters. The library is composed of a tensor product of a feature library, applied to the input data, and a parameter library, applied to the input control. If the input libraries are weak, the temporal derivatives are automatically rescaled by the appropriate domain volumes.
- Parameters
feature_library (BaseFeatureLibrary, optional (default PolynomialLibrary)) – Specifies the library function to apply to the input data features.
parameter_library (BaseFeatureLibrary, optional (default PolynomialLibrary)) – Specifies the library function to apply to the input control features.
num_features (int, optional (default 3)) – Specifies the number of features in the input data.
num_parameters (int, optional (default 3)) – Specifies the number of features in the input control.
- Attributes
libraries_ (list of libraries) – Library instances to be applied to the input matrix. Equal to [parameter_library,feature_library].
tensor_array_ (2D list of booleans) – Indicates which pairs of libraries to tensor product together and add to the overall library. Equal to [0,1]
inputs_per_library_ (2D np.ndarray) – Can be used to specify a subset of the variables to use to generate a feature library. Value determined by num_parameters and num_features.
n_input_features_ (int) – The total number of input features. WARNING: This is deprecated in scikit-learn version 1.0 and higher, so we check sklearn.__version__ and switch to n_features_in_ if needed.
n_output_features_ (int) – The total number of output features. The number of output features is the sum of the numbers of output features for each of the concatenated libraries.
Examples
>>> import numpy as np
>>> from pysindy.feature_library import ParameterizedLibrary,PolynomialLibrary
>>> from pysindy import AxesArray
>>> xs=[np.random.random((5,3)) for n in range(3)]
>>> us=[np.random.random((5,3)) for n in range(3)]
>>> feature_lib=PolynomialLibrary(degree=3)
>>> parameter_lib=PolynomialLibrary(degree=1)
>>> lib=ParameterizedLibrary(feature_library=feature_lib,
...     parameter_library=parameter_lib,num_features=3,num_parameters=3)
>>> xus=[AxesArray(np.concatenate([xs[i],us[i]],axis=-1)) for i in range(3)]
>>> lib.fit(xus)
>>> lib.transform(xus)
- class pysindy.BaseOptimizer(max_iter=20, normalize_columns=False, fit_intercept=False, initial_guess=None, copy_X=True)
Bases:
LinearRegression
,ComplexityMixin
Base class for SINDy optimizers. Subclasses must implement a _reduce method for carrying out the bulk of the work of fitting a model.
- Parameters
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
initial_guess (np.ndarray, shape (n_features,) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, the initial guess is obtained via a least-squares fit.
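The normalize_columns option can be illustrated with plain least squares. Each library column is divided by its L2 norm before the regression, and the fitted coefficients are then divided by the same norms so that they apply to the original columns. Below is a minimal sketch under that assumption, with numpy least squares standing in for a subclass's _reduce step. `fit_with_normalized_columns` is a hypothetical helper.

```python
import numpy as np


def fit_with_normalized_columns(X, y):
    """Hypothetical helper: regress y on column-normalized X, then map the
    coefficients back to the scale of the original columns."""
    norms = np.linalg.norm(X, axis=0)  # L2 norm of each library column
    coef_scaled, *_ = np.linalg.lstsq(X / norms, y, rcond=None)
    return coef_scaled / norms  # coefficients for the original X


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3)) * np.array([1.0, 1e3, 1e-3])  # badly scaled columns
true_coef = np.array([2.0, -0.5e-3, 4.0e2])
y = X @ true_coef
coef = fit_with_normalized_columns(X, y)
print(coef)
```

For plain least squares the rescaling does not change the answer. It presumably matters more for sparsity-promoting optimizers, where one threshold is then compared against coefficients of comparably scaled columns.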
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
ind_ (array, shape (n_features,) or (n_targets, n_features)) – Array of 0s and 1s indicating which coefficients of the weight vector have not been masked out.
history_ (list) – History of coef_ over iterations of the optimization algorithm.
Theta_ (np.ndarray, shape (n_samples, n_features)) – The Theta matrix to be used in the optimization. We save it as an attribute because access to the full library of terms is sometimes needed for various applications.
- fit(x_, y, sample_weight=None, **reduce_kws)
Fit the model.
- Parameters
x (array-like, shape (n_samples, n_features)) – Training data
y (array-like, shape (n_samples,) or (n_samples, n_targets)) – Target values
sample_weight (float or numpy array of shape (n_samples,), optional) – Individual weights for each sample
reduce_kws (dict) – Optional keyword arguments to pass to the _reduce method (implemented by subclasses)
- Returns
self
- Return type
returns an instance of self
- class pysindy.EnsembleOptimizer(opt: BaseOptimizer, bagging: bool = False, library_ensemble: bool = False, n_models: int = 20, n_subset: Optional[int] = None, n_candidates_to_drop: int = 1, replace: bool = True, ensemble_aggregator: Optional[Callable] = None)
Bases:
BaseOptimizer
Wrapper class for ensembling methods.
- Parameters
opt (BaseOptimizer) – The underlying optimizer to run on each ensemble
bagging (boolean, optional (default False)) – This parameter is used to allow for “ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random temporal subset of the input data (n_subset) for each sparse regression. This often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.
library_ensemble (boolean, optional (default False)) – This parameter is used to allow for “library ensembling”, i.e. the generation of many SINDy models (n_models) by choosing a random subset of the candidate library terms to truncate. So, n_models are generated by solving n_models sparse regression problems on these “reduced” libraries. Once again, this often improves robustness because averages (bagging) or medians (bragging) of all the models are usually quite high-performing. The user can also generate “distributions” of many models, and calculate how often certain library terms are included in a model.
n_models (int, optional (default 20)) – Number of models to generate via ensemble
n_subset (int, optional (default len(time base))) – Number of time points to use for ensemble
n_candidates_to_drop (int, optional (default 1)) – Number of candidate terms in the feature library to drop during library ensembling.
replace (boolean, optional (default True)) – If bagging is True, whether or not to sample time points with replacement.
ensemble_aggregator (callable, optional (default numpy.median)) – Method to aggregate model coefficients across different samples. This argument is only used if bagging or library_ensemble is True. The method should take in a list of 2D arrays and return a 2D array of the same shape as the arrays in the list. Example: lambda x: np.median(x, axis=0)
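The bagging idea can be sketched with ordinary least squares standing in for the wrapped optimizer. The sketch draws n_models random row subsets of size n_subset (with replacement when replace=True) and fits each subset. It then aggregates the list of coefficient arrays with the median, which is the documented default aggregator. `bagged_coefficients` is a hypothetical helper, not the EnsembleOptimizer API. Note that numpy's lstsq returns coefficients with shape (n_features, n_targets).

```python
import numpy as np


def bagged_coefficients(X, y, n_models=20, n_subset=None, replace=True,
                        aggregator=lambda c: np.median(c, axis=0), seed=0):
    """Hypothetical helper: fit least squares on random row subsets and aggregate."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    n_subset = n_samples if n_subset is None else n_subset
    coefs = []  # list of 2D arrays, shape (n_features, n_targets) each
    for _ in range(n_models):
        rows = rng.choice(n_samples, size=n_subset, replace=replace)
        c, *_ = np.linalg.lstsq(X[rows], y[rows], rcond=None)
        coefs.append(c)
    return aggregator(coefs), np.array(coefs)


rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
true_coef = np.array([[1.0, 0.0], [0.0, -2.0], [0.5, 0.0], [0.0, 0.0]])
y = X @ true_coef + 0.01 * rng.normal(size=(200, 2))
median_coef, all_coefs = bagged_coefficients(X, y, n_subset=100)
print(median_coef.round(2))
```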
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the v in the objective function.
coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.
unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.
- class pysindy.SINDyOptimizer(optimizer, unbias=True)
Bases:
BaseEstimator
Wrapper class for optimizers/sparse regression methods passed into the SINDy object.
Enables single target regressors (i.e. those whose predictions are 1-dimensional) to perform multi target regression (i.e. predictions are 2-dimensional). Also provides an _unbias function to reduce bias when regularization is used.
- Parameters
optimizer (estimator object) – The optimizer/sparse regressor to be wrapped, implementing fit and predict. optimizer should also have the attributes coef_, fit_intercept, and intercept_. Note that the attribute normalize is deprecated as of sklearn versions >= 1.0 and will be removed in future versions.
unbias (boolean, optional (default True)) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. For example, if optimizer=STLSQ(alpha=0.1) is used, then the learned coefficients will be biased toward 0 due to the L2 regularization. Setting unbias=True will trigger an additional step wherein the nonzero coefficients learned by the optimizer object will be updated using an unregularized least-squares fit.
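The unbias step can be sketched as follows, assuming for illustration that ridge regression plus a hard threshold stands in for the wrapped optimizer. After the optimizer picks a support, the nonzero coefficients are refit by unregularized least squares on just those columns, which undoes the shrinkage toward 0. `ridge` and `unbias` here are hypothetical standalone functions, not the private _unbias method.

```python
import numpy as np


def ridge(X, y, alpha):
    """Ridge regression: minimize ||y - Xw||^2 + alpha * ||w||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def unbias(X, y, coef):
    """Hypothetical helper: refit the nonzero entries of coef by ordinary least squares."""
    support = coef != 0
    refit = np.zeros_like(coef)
    sol, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
    refit[support] = sol
    return refit


rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([3.0, 0.0, -1.0, 0.0])  # noiseless; true support is {0, 2}
biased = ridge(X, y, alpha=200.0)        # strong regularization shrinks coefficients
biased[np.abs(biased) < 0.3] = 0.0       # crude thresholding picks the support
unbiased = unbias(X, y, biased)
print(biased.round(3), unbiased.round(3))
```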
- property coef_
- property intercept_
- property complexity
- class pysindy.SR3(threshold=0.1, thresholds=None, nu=1.0, tol=1e-05, thresholder='L0', trimming_fraction=0.0, trimming_step_size=1.0, max_iter=30, fit_intercept=False, copy_X=True, initial_guess=None, normalize_columns=False, verbose=False)
Bases:
BaseOptimizer
Sparse relaxed regularized regression.
Attempts to minimize the objective function
\[0.5\|y-Xw\|^2_2 + \lambda R(u) + (0.5 / \nu)\|w-u\|^2_2\]
where \(R(u)\) is a regularization function. See the following references for more details:
Zheng, Peng, et al. “A unified framework for sparse relaxed regularized regression: SR3.” IEEE Access 7 (2018): 1404-1423.
Champion, K., Zheng, P., Aravkin, A. Y., Brunton, S. L., & Kutz, J. N. (2020). A unified sparse optimization framework to learn parsimonious physics-informed models from data. IEEE Access, 8, 169259-169271.
- Parameters
threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the L0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).
nu (float, optional (default 1)) – Determines the level of relaxation. Decreasing nu encourages w and u to be close, whereas increasing nu allows the regularized coefficients u to be farther from w.
tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.
thresholder (string, optional (default 'L0')) – Regularization function to use. Currently implemented options are ‘L0’ (L0 norm), ‘L1’ (L1 norm), ‘L2’ (L2 norm) and ‘CAD’ (clipped absolute deviation). Note that by ‘L2 norm’ we really mean the squared L2 norm, i.e. ridge regression.
trimming_fraction (float, optional (default 0.0)) – Fraction of the data samples to trim during fitting. Should be a float between 0.0 and 1.0. If 0.0, trimming is not performed.
trimming_step_size (float, optional (default 1.0)) – Step size to use in the trimming optimization procedure.
max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.
initial_guess (np.ndarray, shape (n_features) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, least-squares is used to obtain an initial guess.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix \(\Xi\) such that \(\dot{X} \approx \Theta(X)\Xi\). thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of \(\Xi\). That is to say, it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.
verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.
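For the L0 thresholder, the objective above can be minimized by alternating two closed-form steps, in the spirit of the SR3 relaxation. The sketch below omits trimming, per-coefficient thresholds, and the intercept, and it is not the library's implementation.

The two steps are:

- Minimizing over w gives the linear system \((X^T X + I/\nu) w = X^T y + u/\nu\).
- Minimizing over u with \(\lambda = \text{threshold}^2 / (2\nu)\) keeps a coefficient only when \((0.5/\nu) w_i^2 > \lambda\), i.e. \(|w_i| > \text{threshold}\). This is hard thresholding at threshold.

`sr3_l0` is a hypothetical function, and its stopping rule is an assumption.

```python
import numpy as np


def sr3_l0(X, y, threshold=0.1, nu=1.0, tol=1e-5, max_iter=30):
    """Hypothetical sketch of SR3 with an L0 penalty for a single target y.

    Alternates an exact w-update with a hard-threshold u-update and returns the
    sparse coefficients u and the relaxed coefficients w."""
    n_features = X.shape[1]
    A = X.T @ X + np.eye(n_features) / nu       # w-update matrix, fixed across iterations
    u, *_ = np.linalg.lstsq(X, y, rcond=None)   # least-squares initial guess
    w = u.copy()
    for _ in range(max_iter):
        w = np.linalg.solve(A, X.T @ y + u / nu)
        u_new = np.where(np.abs(w) > threshold, w, 0.0)  # prox of the L0 term
        converged = np.linalg.norm(u_new - u) < tol      # assumed stopping rule
        u = u_new
        if converged:
            break
    return u, w


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, 0.0, -0.7, 0.0, 0.05]) + 0.01 * rng.normal(size=200)
u, w = sr3_l0(X, y, threshold=0.1, nu=1.0)
print(u.round(3))  # the 0.05 term falls below the threshold and is removed
```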
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the u in the objective function.
coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.
history_ (list) – History of sparse coefficients. history_[k] contains the sparse coefficients (u in the optimization objective function) at iteration k.
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import SR3
>>> lorenz = lambda z,t : [10 * (z[1] - z[0]),
...                        z[0] * (28 - z[2]) - z[1],
...                        z[0] * z[1] - 8 / 3 * z[2]]
>>> t = np.arange(0, 2, .002)
>>> x = odeint(lorenz, [-8, 8, 27], t)
>>> opt = SR3(threshold=0.1, nu=1)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1] - t[0])
>>> model.print()
x0' = -10.004 1 + 10.004 x0
x1' = 27.994 1 + -0.993 x0 + -1.000 1 x1
x2' = -2.662 x1 + 1.000 1 x0
- class pysindy.STLSQ(threshold=0.1, alpha=0.05, max_iter=20, ridge_kw=None, normalize_columns=False, fit_intercept=False, copy_X=True, initial_guess=None, verbose=False)
Bases:
BaseOptimizer
Sequentially thresholded least squares algorithm. Defaults to doing sequentially thresholded ridge regression.
Attempts to minimize the objective function \(\|y - Xw\|^2_2 + \alpha \|w\|^2_2\) by iteratively performing least squares and masking out elements of the weight array w that are below a given threshold.
See the following reference for more details:
Brunton, Steven L., Joshua L. Proctor, and J. Nathan Kutz. “Discovering governing equations from data by sparse identification of nonlinear dynamical systems.” Proceedings of the national academy of sciences 113.15 (2016): 3932-3937.
- Parameters
threshold (float, optional (default 0.1)) – Minimum magnitude for a coefficient in the weight vector. Coefficients with magnitude below the threshold are set to zero.
alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.
max_iter (int, optional (default 20)) – Maximum iterations of the optimization algorithm.
ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to false, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
initial_guess (np.ndarray, shape (n_features) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_. If None, least-squares is used to obtain an initial guess.
verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.
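The sequential thresholding loop can be sketched for a single target. It solves the ridge problem \((X_a^T X_a + \alpha I) w = X_a^T y\) on the currently active columns \(X_a\) and zeros out coefficients whose magnitude is below threshold. It repeats until the support stops changing.

This is a simplified sketch with no normalization, intercept, ridge_kw, or multi-target handling. `stlsq` is a hypothetical function, and treating alpha exactly as in the objective above is an assumption about the library's scaling.

```python
import numpy as np


def stlsq(X, y, threshold=0.1, alpha=0.05, max_iter=20):
    """Hypothetical sketch of sequentially thresholded ridge regression for one target."""
    n_features = X.shape[1]
    active = np.ones(n_features, dtype=bool)  # start with every library term
    coef = np.zeros(n_features)
    for _ in range(max_iter):
        Xa = X[:, active]
        # Ridge solve on the active columns: (Xa^T Xa + alpha I) w = Xa^T y
        coef_a = np.linalg.solve(Xa.T @ Xa + alpha * np.eye(Xa.shape[1]), Xa.T @ y)
        coef = np.zeros(n_features)
        coef[active] = coef_a
        new_active = np.abs(coef) >= threshold  # mask out small coefficients
        coef[~new_active] = 0.0
        if np.array_equal(new_active, active) or not new_active.any():
            break
        active = new_active
    return coef


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = X @ np.array([0.0, 2.0, 0.0, -1.0, 0.03, 0.0]) + 0.01 * rng.normal(size=200)
coef = stlsq(X, y, threshold=0.1, alpha=0.05)
print(coef.round(3))
```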
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
ind_ (array, shape (n_features,) or (n_targets, n_features)) – Array of 0s and 1s indicating which coefficients of the weight vector have not been masked out, i.e. the support of self.coef_.
history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of sequentially thresholded least-squares.
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import STLSQ
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = STLSQ(threshold=.1, alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -9.999 1 + 9.999 x0
x1' = 27.984 1 + -0.996 x0 + -1.000 1 x1
x2' = -2.666 x1 + 1.000 1 x0
- property complexity
- class pysindy.ConstrainedSR3(threshold=0.1, nu=1.0, tol=1e-05, thresholder='l0', max_iter=30, trimming_fraction=0.0, trimming_step_size=1.0, constraint_lhs=None, constraint_rhs=None, constraint_order='target', normalize_columns=False, fit_intercept=False, copy_X=True, initial_guess=None, thresholds=None, equality_constraints=False, inequality_constraints=False, constraint_separation_index=0, verbose=False, verbose_cvxpy=False)
Bases:
SR3
Sparse relaxed regularized regression with linear (in)equality constraints.
Attempts to minimize the objective function
\[0.5\|y-Xw\|^2_2 + \lambda R(u) + (0.5 / \nu)\|w-u\|^2_2\]
\[\text{subject to } Cw = d\]
over u and w, where \(R(u)\) is a regularization function, C is a constraint matrix, and d is a vector of values. See the following reference for more details:
Champion, Kathleen, et al. “A unified sparse optimization framework to learn parsimonious physics-informed models from data.” IEEE Access 8 (2020): 169259-169271.
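To see why threshold and nu are tied together through lambda = threshold^2 / (2 * nu), it helps to write out the relax-and-split iteration for the objective above with the l0 regularizer. The w-update minimizes 0.5||y - Xw||^2 + (0.5/nu)||w - u||^2, i.e. solves (X^T X + I/nu) w = X^T y + u/nu. The u-update is the proximal operator of the l0 penalty, which keeps an entry only if its magnitude exceeds sqrt(2 * nu * lambda), and that equals threshold. The sketch below is an illustration under stated assumptions: it drops the constraint Cw = d, handles a single target, and is not the ConstrainedSR3 code path. The function name sr3_l0_sketch is hypothetical.

```python
import numpy as np


def sr3_l0_sketch(X, y, threshold=0.1, nu=1.0, max_iter=100, tol=1e-10):
    """Hypothetical unconstrained SR3 loop with an l0 regularizer (single target)."""
    lam = threshold**2 / (2.0 * nu)  # lambda implied by threshold and nu
    n_features = X.shape[1]
    u = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares initial guess
    # The w-update solves (X^T X + I / nu) w = X^T y + u / nu.
    H = X.T @ X + np.eye(n_features) / nu
    for _ in range(max_iter):
        w = np.linalg.solve(H, X.T @ y + u / nu)
        # Prox of nu * lam * ||.||_0: keep w_i only if w_i**2 / (2 nu) > lam,
        # i.e. |w_i| > sqrt(2 * nu * lam), which equals threshold.
        u_new = np.where(np.abs(w) > np.sqrt(2.0 * nu * lam), w, 0.0)
        converged = np.sum((u_new - u) ** 2) < tol
        u = u_new
        if converged:
            break
    return w, u, lam


rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.5, 0.0, -2.0, 0.0]) + 0.01 * rng.normal(size=100)
w, u, lam = sr3_l0_sketch(X, y, threshold=0.1, nu=1.0)
print(lam, np.round(u, 3))
```

Here u plays the role of the sparse, regularized coefficients and w the unregularized ones, matching the roles of coef_ and coef_full_ described below.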
- Parameters
threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the l0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).
nu (float, optional (default 1)) – Determines the level of relaxation. Decreasing nu encourages w and u to be close, whereas increasing nu allows the regularized coefficients u to be farther from w.
tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.
thresholder (string, optional (default 'l0')) – Regularization function to use. Currently implemented options are ‘l0’ (l0 norm), ‘l1’ (l1 norm), ‘l2’ (l2 norm), ‘cad’ (clipped absolute deviation), ‘weighted_l0’ (weighted l0 norm), ‘weighted_l1’ (weighted l1 norm), and ‘weighted_l2’ (weighted l2 norm).
max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
constraint_lhs (numpy ndarray, optional (default None)) – Shape should be (n_constraints, n_features * n_targets). The left-hand side matrix C of the constraints Cw = d (or Cw <= d when inequality_constraints is True). There should be one row per constraint.
constraint_rhs (numpy ndarray, shape (n_constraints,), optional (default None)) – The right-hand side vector d of the constraints.
constraint_order (string, optional (default "target")) – The format in which the constraints constraint_lhs were passed. Must be one of “target” or “feature”. “target” indicates that the constraints are grouped by target: i.e. the first n_features columns correspond to constraint coefficients on the library features for the first target (variable), the next n_features columns to the library features for the second target (variable), and so on. “feature” indicates that the constraints are grouped by library feature: the first n_targets columns correspond to the first library feature, the next n_targets columns to the second library feature, and so on.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed. Note that this parameter is incompatible with the constraints!
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
initial_guess (np.ndarray, optional (default None)) – Shape should be (n_features) or (n_targets, n_features). Initial guess for coefficients coef_ (u in the mathematical equations). If None, least-squares is used to obtain an initial guess.
thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix \(\Xi\) such that \(\dot{X} \approx \Theta(X)\Xi\). thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of \(\Xi\). That is to say, it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.
inequality_constraints (bool, optional (default False)) – If True, CVXPY methods are used to solve the problem.
verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.
verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to the CVXPY solve function to indicate whether output should be verbose. Only relevant for optimizers that use the CVXPY package in some capacity.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the u in the objective function.
coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.
unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.
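As a concrete illustration of constraint_order, suppose a constraint involves the coefficient of library feature j in the equation for target i. Following the layout described above, with “target” ordering that coefficient occupies column i * n_features + j of constraint_lhs, and with “feature” ordering it occupies column j * n_targets + i. The sketch below builds one such row both ways and checks it against a coefficient matrix of shape (n_targets, n_features), the shape of coef_. The helper constraint_row and the toy sizes are hypothetical; the resulting arrays are what would be passed as constraint_lhs and constraint_rhs together with the matching constraint_order.

```python
import numpy as np


def constraint_row(i, j, n_targets, n_features, order="target"):
    """Hypothetical helper: one row of C selecting coefficient (target i, feature j)."""
    row = np.zeros(n_targets * n_features)
    if order == "target":
        row[i * n_features + j] = 1.0  # columns grouped by target
    elif order == "feature":
        row[j * n_targets + i] = 1.0  # columns grouped by library feature
    else:
        raise ValueError("order must be 'target' or 'feature'")
    return row


n_targets, n_features = 3, 10
# Stand-in coefficient matrix with shape (n_targets, n_features), like coef_.
coef = np.arange(n_targets * n_features, dtype=float).reshape(n_targets, n_features)

# Select the coefficient of feature 4 in the equation for target 1.
row_t = constraint_row(1, 4, n_targets, n_features, order="target")
row_f = constraint_row(1, 4, n_targets, n_features, order="feature")

# "target" order lines up with a row-major flattening of coef,
# "feature" order with a column-major (Fortran-order) flattening.
print(row_t @ coef.ravel(), row_f @ coef.ravel(order="F"), coef[1, 4])

# A single constraint forcing that coefficient to zero.
constraint_lhs = np.vstack([row_t])
constraint_rhs = np.array([0.0])
```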
- class pysindy.StableLinearSR3(threshold=0.1, nu=1.0, tol=1e-05, thresholder='l1', max_iter=30, trimming_fraction=0.0, trimming_step_size=1.0, constraint_lhs=None, constraint_rhs=None, constraint_order='target', normalize_columns=False, fit_intercept=False, copy_X=True, initial_guess=None, thresholds=None, equality_constraints=False, inequality_constraints=False, constraint_separation_index=0, verbose=False, verbose_cvxpy=False, gamma=-1e-08)
Bases: ConstrainedSR3
Sparse relaxed regularized regression for building a priori stable linear models. This requires making a matrix negative definite, which can be challenging. Here we use a method similar to the TrappingSR3 algorithm. Linear equality and linear inequality constraints are both allowed, as in the ConstrainedSR3 optimizer.
Attempts to minimize the objective function
\[0.5\|y-Xw\|^2_2 + \lambda R(u) + (0.5 / \nu)\|w-u\|^2_2\]
\[\text{subject to } Cu = d, \quad Du = e, \quad w \text{ negative definite}\]
over u and w, where \(R(u)\) is a regularization function, C and D are constraint matrices, and d and e are vectors of values. NOTE: This optimizer is intended for building purely linear models that are guaranteed to be stable.
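One standard way to keep a symmetric matrix negative definite is to clip its eigenvalues: replacing every eigenvalue above some gamma < 0 by gamma gives the nearest matrix, in the Frobenius norm, among symmetric matrices whose eigenvalues are all at most gamma. The sketch below shows that projection, with a small negative gamma in the spirit of this class's gamma=-1e-08 default. It is an illustration assuming a symmetric input, not necessarily how StableLinearSR3 enforces stability internally, and the name project_eigs_below is hypothetical.

```python
import numpy as np


def project_eigs_below(A, gamma=-1e-8):
    """Hypothetical helper: clip the eigenvalues of the symmetric part of A to <= gamma.

    For symmetric A this is the Frobenius-nearest symmetric matrix whose
    eigenvalues are all <= gamma (negative definite when gamma < 0).
    """
    A_sym = 0.5 * (A + A.T)  # work with the symmetric part
    eigvals, eigvecs = np.linalg.eigh(A_sym)
    clipped = np.minimum(eigvals, gamma)  # push eigenvalues above gamma down to gamma
    return (eigvecs * clipped) @ eigvecs.T  # V diag(clipped) V^T


A = np.array([[0.5, 0.2], [0.2, -1.0]])  # symmetric, with one positive eigenvalue
A_proj = project_eigs_below(A, gamma=-0.1)
print(np.linalg.eigvalsh(A), np.linalg.eigvalsh(A_proj))
```

The negative eigenvalue of A is left untouched and only the positive one is moved to gamma, which is why this projection changes the matrix as little as possible.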
- Parameters
threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the l0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).
nu (float, optional (default 1)) – Determines the level of relaxation. Decreasing nu encourages w and u to be close, whereas increasing nu allows the regularized coefficients u to be farther from w.
tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.
thresholder (string, optional (default 'l1')) – Regularization function to use. Currently implemented options are ‘l1’ (l1 norm), ‘l2’ (l2 norm), ‘cad’ (clipped absolute deviation), ‘weighted_l1’ (weighted l1 norm), and ‘weighted_l2’ (weighted l2 norm). Note that the thresholder must be convex here.
max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
constraint_lhs (numpy ndarray, optional (default None)) – Shape should be (n_constraints, n_features * n_targets). The left-hand side matrix C of Cw <= d. There should be one row per constraint.
constraint_rhs (numpy ndarray, shape (n_constraints,), optional (default None)) – The right-hand side vector d of Cw <= d.
constraint_order (string, optional (default "target")) – The format in which the constraints constraint_lhs were passed. Must be one of “target” or “feature”. “target” indicates that the constraints are grouped by target: i.e. the first n_features columns correspond to constraint coefficients on the library features for the first target (variable), the next n_features columns to the library features for the second target (variable), and so on. “feature” indicates that the constraints are grouped by library feature: the first n_targets columns correspond to the first library feature, the next n_targets columns to the second library feature, and so on.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed. Note that this parameter is incompatible with the constraints!
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
initial_guess (np.ndarray, optional (default None)) – Shape should be (n_features) or (n_targets, n_features). Initial guess for coefficients coef_ (u in the mathematical equations). If None, least-squares is used to obtain an initial guess.
thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix \(\Xi\) such that \(\dot{X} \approx \Theta(X)\Xi\). thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of \(\Xi\). That is to say, it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.
inequality_constraints (bool, optional (default False)) – If True, CVXPY methods are used to solve the problem.
verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.
verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to the CVXPY solve function to indicate whether output should be verbose. Only relevant for optimizers that use the CVXPY package in some capacity.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the u in the objective function.
coef_full_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s) that are not subjected to the regularization. This is the w in the objective function.
unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.
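StableLinearSR3 requires a convex thresholder such as 'l1' or 'weighted_l1'. The proximal step for an l1 penalty is soft thresholding, and with the weighted variant each coefficient gets its own threshold from the thresholds array, whose entry [i, j] lines up with coef_[i, j] (feature j in the equation for target i). The sketch below is a generic soft-thresholding illustration under that assumption. It is not the source of pysindy's prox_weighted_l1 helper listed in pysindy.utils, and how pysindy scales the thresholds by nu inside its prox is not shown.

```python
import numpy as np


def soft_threshold(coef, thresholds):
    """Elementwise soft thresholding: shrink each entry toward zero by its threshold."""
    return np.sign(coef) * np.maximum(np.abs(coef) - thresholds, 0.0)


# coef has shape (n_targets, n_features), matching coef_.
coef = np.array([[0.30, -0.05, 2.00],
                 [-1.20, 0.40, 0.08]])
# thresholds[i, j] is the threshold for feature j in the equation for target i.
thresholds = np.array([[0.10, 0.10, 0.50],
                       [0.10, 0.50, 0.10]])
result = soft_threshold(coef, thresholds)
print(result)
```

Unlike the hard thresholding of the l0 prox, surviving coefficients are also shrunk (2.00 becomes 1.50 under a threshold of 0.50), which is the price paid for convexity.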
- class pysindy.TrappingSR3(evolve_w=True, threshold=0.1, eps_solver=1e-07, relax_optim=True, inequality_constraints=False, eta=None, alpha_A=None, alpha_m=None, gamma=-0.1, tol=1e-05, tol_m=1e-05, thresholder='l1', thresholds=None, max_iter=30, accel=False, normalize_columns=False, fit_intercept=False, copy_X=True, m0=None, A0=None, objective_history=None, constraint_lhs=None, constraint_rhs=None, constraint_order='target', verbose=False, verbose_cvxpy=False)
Bases: SR3
Trapping variant of sparse relaxed regularized regression. This optimizer can be used to identify systems with globally stable (bounded) solutions.
Attempts to minimize one of two related objective functions
\[0.5\|y-Xw\|^2_2 + \lambda R(w) + 0.5\|Pw-A\|^2_2/\eta + \delta_0(Cw-d) + \delta_{\Lambda}(A)\]
or
\[0.5\|y-Xw\|^2_2 + \lambda R(w) + \delta_0(Cw-d) + 0.5\,\text{maximum eigenvalue}(A)/\eta\]
where \(R(w)\) is a regularization function, which must be convex, \(\delta_0\) is an indicator function that provides a hard constraint of Cw = d, and \(\delta_{\Lambda}\) is a term to project the \(A\) matrix onto the space of negative definite matrices. See the following references for more details:
Kaptanoglu, Alan A., et al. “Promoting global stability in data-driven models of quadratic nonlinear dynamics.” arXiv preprint arXiv:2105.01843 (2021).
Zheng, Peng, et al. “A unified framework for sparse relaxed regularized regression: SR3.” IEEE Access 7 (2018): 1404-1423.
Champion, Kathleen, et al. “A unified sparse optimization framework to learn parsimonious physics-informed models from data.” IEEE Access 8 (2020): 169259-169271.
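The advice to use alpha_A = eta can be read off the first objective. The only smooth term involving A is 0.5||Pw - A||^2 / eta, whose gradient with respect to A is (A - Pw) / eta, so a projected-gradient step with step size alpha_A is A <- Proj_Lambda(A + (alpha_A / eta)(Pw - A)). With alpha_A = eta this lands exactly on Proj_Lambda(Pw), and any alpha_A <= eta moves A part of the way there. The sketch below checks this with a random symmetric stand-in for Pw and, as one plausible reading of gamma, a projection that clips eigenvalues to at most gamma. It assumes a plain (unaccelerated) step with Pw held fixed, uses hypothetical helper names, and is not TrappingSR3's actual update.

```python
import numpy as np


def project_eigs_below(M, gamma=-0.1):
    """Hypothetical helper: clip the eigenvalues of a symmetric matrix M to <= gamma."""
    eigvals, eigvecs = np.linalg.eigh(M)
    return (eigvecs * np.minimum(eigvals, gamma)) @ eigvecs.T


def a_step(A, PW, eta, alpha_A, gamma=-0.1):
    """Hypothetical projected-gradient step on 0.5 * ||PW - A||_F^2 / eta over A."""
    grad = (A - PW) / eta
    return project_eigs_below(A - alpha_A * grad, gamma)


rng = np.random.default_rng(0)
B = rng.normal(size=(3, 3))
PW = 0.5 * (B + B.T)            # symmetric stand-in for the matrix PW
A0 = np.diag(np.full(3, -0.1))  # A initialized as diag(gamma)
eta = 1.0e3

A1 = a_step(A0, PW, eta, alpha_A=eta)  # full step: lands on the projection of PW
print(np.allclose(A1, project_eigs_below(PW)))
```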
- Parameters
evolve_w (bool, optional (default True)) – If False, w is not updated and the optimization runs only over (m, A).
threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the L0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).
eta (float, optional (default 1.0e20)) – Determines the strength of the stability term ||Pw-A||^2 in the optimization. The default value is very large so that the algorithm default is to ignore the stability term. In this limit, this should be approximately equivalent to the ConstrainedSR3 method.
alpha_m (float, optional (default eta * 0.1)) – Determines the step size in the prox-gradient descent over m. For convergence, need alpha_m <= eta / ||w^T * PQ^T * PQ * w||. Typically 0.01 * eta <= alpha_m <= 0.1 * eta.
alpha_A (float, optional (default eta)) – Determines the step size in the prox-gradient descent over A. For convergence, need alpha_A <= eta, so typically alpha_A = eta is used.
gamma (float, optional (default -0.1)) – Determines the negative interval that matrix A is projected onto. For most applications, values of gamma between -1.0 and -0.1 work well.
tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm over w.
tol_m (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm over m.
thresholder (string, optional (default 'l1')) – Regularization function to use. For current trapping SINDy, only the L1 and L2 norms are implemented. Note that other convex norms could be straightforwardly implemented, but L0 requires reformulation because of nonconvexity.
thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix \(\Xi\) such that \(\dot{X} \approx \Theta(X)\Xi\). thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of \(\Xi\). That is to say, it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.
eps_solver (float, optional (default 1.0e-7)) – If threshold != 0, this specifies the error tolerance in the CVXPY (OSQP) solve. Default is 1.0e-3 in OSQP.
relax_optim (bool, optional (default True)) – If relax_optim = True, use the relax-and-split method. If False, try a direct minimization on the largest eigenvalue.
inequality_constraints (bool, optional (default False)) – If True, either relax_optim must be False, or relax_optim must be True and threshold != 0, so that the CVXPY methods are used.
max_iter (int, optional (default 30)) – Maximum iterations of the optimization algorithm.
accel (bool, optional (default False)) – Whether or not to use accelerated prox-gradient descent for (m, A).
m0 (np.ndarray, shape (n_targets), optional (default None)) – Initial guess for vector m in the optimization. Otherwise each component of m is randomly initialized in [-1, 1].
A0 (np.ndarray, shape (n_targets, n_targets), optional (default None)) – Initial guess for matrix A in the optimization. Otherwise A is initialized as A = diag(gamma).
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
verbose (bool, optional (default False)) – If True, prints out the different error terms every max_iter / 10 iterations.
verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to the CVXPY solve function to indicate whether output should be verbose. Only relevant for optimizers that use the CVXPY package in some capacity.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the v in the objective function.
history_ (list) – History of sparse coefficients. history_[k] contains the sparse coefficients (v in the optimization objective function) at iteration k.
objective_history_ (list) – History of the value of the objective at each step. Note that the trapping SINDy problem is nonconvex, meaning that this value may increase and decrease as the algorithm works.
A_history_ (list) – History of the auxiliary variable A that approximates diag(PW).
m_history_ (list) – History of the shift vector m that determines the origin of the trapping region.
PW_history_ (list) – History of PW = A^S, the quantity we are attempting to make negative definite.
PWeigs_history_ (list) – History of diag(PW), a list of the eigenvalues of A^S at each iteration. Tracking this allows us to ascertain if A^S is indeed being pulled towards the space of negative definite matrices.
PL_unsym_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_features)) – Unsymmetrized linear coefficient part of the P matrix in ||Pw - A||^2
PL_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_features)) – Linear coefficient part of the P matrix in ||Pw - A||^2
PQ_ (np.ndarray, shape (n_targets, n_targets, n_targets, n_targets, n_features)) – Quadratic coefficient part of the P matrix in ||Pw - A||^2
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import TrappingSR3
>>> lorenz = lambda z,t : [10*(z[1] - z[0]),
...                        z[0]*(28 - z[2]) - z[1],
...                        z[0]*z[1] - 8/3*z[2]]
>>> t = np.arange(0,2,.002)
>>> x = odeint(lorenz, [-8,8,27], t)
>>> opt = TrappingSR3(threshold=0.1)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1]-t[0])
>>> model.print()
x0' = -10.004 x0 + 10.004 x1
x1' = 27.994 x0 + -0.993 x1 + -1.000 x0 x2
x2' = -2.662 x2 + 1.000 x0 x1
- class pysindy.SSR(alpha=0.05, max_iter=20, ridge_kw=None, normalize_columns=False, fit_intercept=False, copy_X=True, criteria='coefficient_value', kappa=None, verbose=False)
Bases: BaseOptimizer
Stepwise sparse regression (SSR) greedy algorithm.
Attempts to minimize the objective function \(\|y - Xw\|^2_2 + \alpha \|w\|^2_2\) by iteratively eliminating the smallest coefficient.
See the following reference for more details:
Boninsegna, Lorenzo, Feliks Nüske, and Cecilia Clementi. “Sparse learning of stochastic dynamical equations.” The Journal of Chemical Physics 148.24 (2018): 241723.
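With criteria="coefficient_value", one pass of SSR amounts to the following: fit a ridge regression on the active terms, record the coefficients and the fit error, then drop the active term with the smallest coefficient magnitude, repeating until one term remains or max_iter is reached. The NumPy sketch below assumes a single target and a closed-form ridge solve standing in for ridge_kw, and it omits the kappa model-selection step. The name ssr_sketch is hypothetical and this is not the pysindy implementation.

```python
import numpy as np


def ssr_sketch(theta, y, alpha=0.05, max_iter=20):
    """Hypothetical SSR (coefficient_value criterion) for a single target."""
    n_features = theta.shape[1]
    active = np.ones(n_features, dtype=bool)
    history, err_history = [], []
    for _ in range(min(max_iter, n_features)):
        # Ridge fit, min ||y - Xw||^2 + alpha ||w||^2, on the active terms only.
        A = theta[:, active]
        coef = np.zeros(n_features)
        coef[active] = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)
        history.append(coef.copy())
        err_history.append(np.mean((y - theta @ coef) ** 2))  # MSE of this model
        if active.sum() == 1:
            break
        # Drop the active term with the smallest coefficient magnitude.
        idx = np.flatnonzero(active)
        active[idx[np.argmin(np.abs(coef[idx]))]] = False
    return history, err_history


rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 5))
y = 2.0 * theta[:, 0] - 3.0 * theta[:, 2]
history, err_history = ssr_sketch(theta, y, alpha=0.05)
for k, (c, e) in enumerate(zip(history, err_history)):
    print(k, np.flatnonzero(c), e)
```

The error stays near zero until a true term is removed and then jumps, which is the kind of signal a model-selection rule applied to err_history_ can use.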
- Parameters
max_iter (int, optional (default 20)) – Maximum iterations of the optimization algorithm.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
kappa (float, optional (default None)) – If passed, compute the MSE errors with an extra L0 term with strength equal to kappa times the condition number of Theta.
criteria (string, optional (default "coefficient_value")) – The criterion to use for truncating a coefficient at each iteration. Must be “coefficient_value” or “model_residual”. “coefficient_value”: zero out the smallest coefficient. “model_residual”: choose the N-1 term model with the smallest residual error.
alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.
ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.
verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of SSR.
err_history_ (list) – History of the model error. err_history_[k] contains the MSE of coef_ at iteration k of SSR.
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import SSR
>>> lorenz = lambda z,t : [10 * (z[1] - z[0]),
...                        z[0] * (28 - z[2]) - z[1],
...                        z[0] * z[1] - 8 / 3 * z[2]]
>>> t = np.arange(0, 2, .002)
>>> x = odeint(lorenz, [-8, 8, 27], t)
>>> opt = SSR(alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1] - t[0])
>>> model.print()
x0' = -9.999 x0 + 9.999 x1
x1' = 27.984 x0 + -0.996 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
- class pysindy.FROLS(normalize_columns=False, fit_intercept=False, copy_X=True, kappa=None, max_iter=10, alpha=0.05, ridge_kw=None, verbose=False)
Bases: BaseOptimizer
Forward Regression Orthogonal Least-Squares (FROLS) optimizer.
Attempts to minimize the objective function \(\|y - Xw\|^2_2 + \alpha \|w\|^2_2\) by iteratively selecting the most correlated function in the library. This is a greedy algorithm.
See the following reference for more details:
Billings, Stephen A. Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons, 2013.
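The greedy selection can be sketched as classical forward orthogonal least squares in the style of the Billings reference: at each step, orthogonalize every remaining library column against the columns already chosen (Gram-Schmidt), score each candidate by its error reduction ratio ERR = (q^T y)^2 / ((q^T q)(y^T y)), and keep the best one. Here max_iter would cap the number of selected terms. The sketch below ignores the ridge term alpha and the kappa criterion and refits the chosen columns by ordinary least squares, so it is an illustration, not pysindy's FROLS code; frols_sketch is a hypothetical name.

```python
import numpy as np


def frols_sketch(theta, y, n_terms=2):
    """Hypothetical forward orthogonal least squares for a single target."""
    n_features = theta.shape[1]
    selected, Q = [], []  # chosen column indices and their orthogonalized versions
    for _ in range(n_terms):
        best_err, best_j, best_q = -1.0, None, None
        for j in range(n_features):
            if j in selected:
                continue
            q = theta[:, j].astype(float)
            for q_prev in Q:  # Gram-Schmidt against already-selected columns
                q = q - (q_prev @ q) / (q_prev @ q_prev) * q_prev
            qq = q @ q
            if qq < 1e-12:
                continue  # column is (numerically) in the span of the selected ones
            err = (q @ y) ** 2 / (qq * (y @ y))  # error reduction ratio
            if err > best_err:
                best_err, best_j, best_q = err, j, q
        if best_j is None:
            break
        selected.append(best_j)
        Q.append(best_q)
    coef = np.zeros(n_features)
    coef[selected] = np.linalg.lstsq(theta[:, selected], y, rcond=None)[0]
    return coef, selected


rng = np.random.default_rng(0)
theta = rng.normal(size=(200, 5))
y = 2.0 * theta[:, 0] - 3.0 * theta[:, 2]
coef, selected = frols_sketch(theta, y, n_terms=2)
print(selected, np.round(coef, 3))
```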
- Parameters
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
kappa (float, optional (default None)) – If passed, compute the MSE errors with an extra L0 term with strength equal to kappa times the condition number of Theta.
max_iter (int, optional (default 10)) – Maximum iterations of the optimization algorithm. This determines the number of nonzero terms chosen by the FROLS algorithm.
alpha (float, optional (default 0.05)) – Optional L2 (ridge) regularization on the weight vector.
ridge_kw (dict, optional (default None)) – Optional keyword arguments to pass to the ridge regression.
verbose (bool, optional (default False)) – If True, prints out the different error terms every iteration.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
history_ (list) – History of coef_. history_[k] contains the values of coef_ at iteration k of FROLS.
Examples
>>> import numpy as np
>>> from scipy.integrate import odeint
>>> from pysindy import SINDy
>>> from pysindy.optimizers import FROLS
>>> lorenz = lambda z,t : [10 * (z[1] - z[0]),
...                        z[0] * (28 - z[2]) - z[1],
...                        z[0] * z[1] - 8 / 3 * z[2]]
>>> t = np.arange(0, 2, .002)
>>> x = odeint(lorenz, [-8, 8, 27], t)
>>> opt = FROLS(alpha=.5)
>>> model = SINDy(optimizer=opt)
>>> model.fit(x, t=t[1] - t[0])
>>> model.print()
x0' = -9.999 x0 + 9.999 x1
x1' = 27.984 x0 + -0.996 x1 + -1.000 x0 x2
x2' = -2.666 x2 + 1.000 x0 x1
- class pysindy.SINDyPI(threshold=0.1, tol=1e-05, thresholder='l1', max_iter=10000, fit_intercept=False, copy_X=True, thresholds=None, model_subset=None, normalize_columns=False, verbose_cvxpy=False)
Bases: SR3
SINDy-PI optimizer
Attempts to minimize the objective function
\[0.5\|X-Xw\|^2_2 + \lambda R(w)\]
over w, where \(R(w)\) is a regularization function. See the following reference for more details:
Kaheman, Kadierdan, J. Nathan Kutz, and Steven L. Brunton. SINDy-PI: a robust algorithm for parallel implicit sparse identification of nonlinear dynamics. Proceedings of the Royal Society A 476.2242 (2020): 20200279.
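The objective above is solved once per candidate left-hand-side function: column j of the library is regressed onto the other library columns. The coefficient of a column on itself has to be excluded, since otherwise the identity matrix drives the residual term to zero. The sketch below illustrates that structure on a toy rational ODE, x' = -x / (1 + x), which is implicit in a library containing x, x' and x x' because x' = -x - x x'. It uses plain least squares followed by a hard threshold in place of the regularized convex solve this class performs, and sindy_pi_sketch and the toy library are hypothetical.

```python
import numpy as np


def sindy_pi_sketch(theta, threshold=0.1, model_subset=None):
    """Hypothetical SINDy-PI loop: regress each chosen column on all the others."""
    n_features = theta.shape[1]
    if model_subset is None:
        model_subset = range(n_features)  # one model per candidate function
    coefs = {}
    for j in model_subset:
        others = [k for k in range(n_features) if k != j]  # exclude column j itself
        w = np.linalg.lstsq(theta[:, others], theta[:, j], rcond=None)[0]
        w[np.abs(w) < threshold] = 0.0  # crude sparsification step
        full = np.zeros(n_features)
        full[others] = w
        coefs[j] = full
    return coefs


# Toy implicit dynamics x' = -x / (1 + x), i.e. x' = -x - x * x'.
x = np.linspace(0.1, 2.0, 50)
x_dot = -x / (1.0 + x)
theta = np.column_stack([x, x_dot, x * x_dot])  # library [x, x', x x']
coefs = sindy_pi_sketch(theta, model_subset=[1])  # left-hand side: x'
print(coefs[1])
```

Choosing x' as the left-hand side recovers x' = -1.0 x - 1.0 x x', a rational model that an explicit polynomial library in x alone could not represent exactly.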
- Parameters
threshold (float, optional (default 0.1)) – Determines the strength of the regularization. When the regularization function R is the l0 norm, the regularization is equivalent to performing hard thresholding, and lambda is chosen to threshold at the value given by this parameter. This is equivalent to choosing lambda = threshold^2 / (2 * nu).
tol (float, optional (default 1e-5)) – Tolerance used for determining convergence of the optimization algorithm.
thresholder (string, optional (default 'l1')) – Regularization function to use. Currently implemented options are ‘l1’ (l1 norm), ‘weighted_l1’ (weighted l1 norm), ‘l2’ (l2 norm), and ‘weighted_l2’ (weighted l2 norm).
max_iter (int, optional (default 10000)) – Maximum iterations of the optimization algorithm.
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
normalize_columns (boolean, optional (default False)) – This parameter normalizes the columns of Theta before the optimization is done. This tends to standardize the columns to similar magnitudes, often improving performance.
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
thresholds (np.ndarray, shape (n_targets, n_features), optional (default None)) – Array of thresholds for each library function coefficient. Each row corresponds to a measurement variable and each column to a function from the feature library. Recall that SINDy seeks a matrix \(\Xi\) such that \(\dot{X} \approx \Theta(X)\Xi\). thresholds[i, j] should specify the threshold to be used for the (j + 1, i + 1) entry of \(\Xi\). That is to say, it should give the threshold to be used for the (j + 1)st library function in the equation for the (i + 1)st measurement variable.
model_subset (np.ndarray, shape (n_models), optional (default None)) – List of indices to compute models for. If not provided, the default is to compute SINDy-PI models for all possible candidate functions. This can take a long time for 4D systems or larger.
verbose_cvxpy (bool, optional (default False)) – Boolean flag which is passed to the CVXPY solve function to indicate whether output should be verbose. Only relevant for optimizers that use the CVXPY package in some capacity.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Regularized weight vector(s). This is the w in the objective function.
unbias (boolean) – Whether to perform an extra step of unregularized linear regression to unbias the coefficients for the identified support. unbias is automatically set to False if a constraint is used and is otherwise left uninitialized.
- class pysindy.MIOSR(target_sparsity=5, group_sparsity=None, alpha=0.01, regression_timeout=10, fit_intercept=False, constraint_lhs=None, constraint_rhs=None, constraint_order='target', normalize_columns=False, copy_X=True, initial_guess=None, verbose=False)
Bases: BaseOptimizer
Mixed-Integer Optimized Sparse Regression.
Solves the sparsity-constrained regression problem
\[\|y-Xw\|^2_2 + \alpha \|w\|^2_2\]
\[\text{subject to } \|w\|_0 \leq k\]
to provable optimality by using type-1 specially ordered sets (SOS1) to encode the support of the coefficients. Here \(\alpha\) is the optional ridge weight alpha and k is the sparsity level set by target_sparsity (or per target by group_sparsity). Can optionally add additional constraints on the coefficients or access the Gurobi model directly for advanced usage. See the following reference for additional details:
Bertsimas, D. and Gurnee, W., 2022. Learning Sparse Nonlinear Dynamics via Mixed-Integer Optimization. arXiv preprint arXiv:2206.00176.
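For intuition about what MIOSR certifies, note that for very small libraries the same sparsity-constrained ridge problem can be solved exactly by brute force: enumerate every support of size at most k, solve the ridge problem restricted to it, and keep the best objective value. MIOSR reaches a provably optimal answer through a mixed-integer formulation in Gurobi, which scales far beyond enumeration. The sketch below is only a swapped-in exhaustive search for a single target, with k standing in for target_sparsity and alpha for the ridge weight, and best_subset_sketch is a hypothetical name.

```python
from itertools import combinations

import numpy as np


def best_subset_sketch(X, y, k=2, alpha=0.01):
    """Exhaustive search for min ||y - Xw||^2 + alpha * ||w||^2 s.t. ||w||_0 <= k."""
    n_features = X.shape[1]
    best_obj, best_w = float(y @ y), np.zeros(n_features)  # the all-zero model
    for size in range(1, k + 1):
        for support in combinations(range(n_features), size):
            cols = list(support)
            A = X[:, cols]
            # Ridge solution restricted to this support.
            w_s = np.linalg.solve(A.T @ A + alpha * np.eye(size), A.T @ y)
            obj = np.sum((y - A @ w_s) ** 2) + alpha * np.sum(w_s**2)
            if obj < best_obj:
                best_obj = obj
                best_w = np.zeros(n_features)
                best_w[cols] = w_s
    return best_w, best_obj


rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
y = 1.5 * X[:, 1] - 2.0 * X[:, 4]
w, obj = best_subset_sketch(X, y, k=2, alpha=0.01)
print(np.flatnonzero(w), np.round(w, 3))
```

The number of supports grows combinatorially with the library size, which is why enumeration is only viable for toy problems and a mixed-integer solver with a regression_timeout is used instead.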
- Parameters
target_sparsity (int, optional (default 5)) – The maximum number of nonzero coefficients across all dimensions. If set, the model will fit all dimensions jointly, potentially reducing statistical efficiency.
group_sparsity (int tuple, optional (default None)) – Tuple of length n_targets constraining the number of nonzero coefficients for each target dimension.
alpha (float, optional (default 0.01)) – Optional L2 (ridge) regularization on the weight vector.
regression_timeout (int, optional (default 10)) – The timeout (in seconds) for the Gurobi optimizer to solve and prove optimality (either per dimension or jointly, depending on the above sparsity settings).
fit_intercept (boolean, optional (default False)) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.
constraint_lhs (numpy ndarray, optional (default None)) – Shape should be (n_constraints, n_features * n_targets). The left-hand side matrix C of Cw <= d. There should be one row per constraint.
constraint_rhs (numpy ndarray, shape (n_constraints,), optional (default None)) – The right-hand side vector d of Cw <= d.
constraint_order (string, optional (default "target")) – The format in which the constraints constraint_lhs were passed. Must be one of “target” or “feature”. “target” indicates that the constraints are grouped by target: i.e. the first n_features columns correspond to constraint coefficients on the library features for the first target (variable), the next n_features columns to the library features for the second target (variable), and so on. “feature” indicates that the constraints are grouped by library feature: the first n_targets columns correspond to the first library feature, the next n_targets columns to the second library feature, and so on.
normalize_columns (boolean, optional (default False)) – Normalize the columns of x (the SINDy library terms) before regression by dividing by the L2-norm. Note that the ‘normalize’ option in sklearn is deprecated in sklearn versions >= 1.0 and will be removed. Note that this parameter is incompatible with the constraints!
copy_X (boolean, optional (default True)) – If True, X will be copied; else, it may be overwritten.
initial_guess (np.ndarray, shape (n_features) or (n_targets, n_features), optional (default None)) – Initial guess for coefficients coef_ to warm-start the optimizer.
verbose (bool, optional (default False)) – If True, prints out the Gurobi solver log.
- Attributes
coef_ (array, shape (n_features,) or (n_targets, n_features)) – Weight vector(s).
ind_ (array, shape (n_features,) or (n_targets, n_features)) – Array of 0s and 1s indicating which coefficients of the weight vector have not been masked out, i.e. the support of self.coef_.
model (gurobipy.Model) – The raw Gurobi model being solved.
- property complexity