Working with Scikit-learn
This notebook shows how PySINDy objects interface with some useful tools from Scikit-learn.
Setup
[1]:
import numpy as np
from scipy.integrate import solve_ivp
from pysindy.utils import lorenz
import pysindy as ps
# ignore user warnings
import warnings
warnings.filterwarnings("ignore", category=UserWarning)
warnings.filterwarnings("ignore", category=FutureWarning)
# Integrator keywords for solve_ivp
integrator_keywords = {}
integrator_keywords['rtol'] = 1e-15
integrator_keywords['method'] = 'LSODA'
integrator_keywords['atol'] = 1e-10
Let’s generate some training data from the Lorenz system with which to experiment.
[2]:
# Generate training data
dt = .002
t_train = np.arange(0, 10, dt)
t_train_span = (t_train[0], t_train[-1])
x0_train = [-8, 8, 27]
x_train = solve_ivp(lorenz, t_train_span, x0_train,
t_eval=t_train, **integrator_keywords).y.T
# Evolve the Lorenz equations in time using a different initial condition
t_test = np.arange(0, 15, dt)
t_test_span = (t_test[0], t_test[-1])
x0_test = np.array([8, 7, 15])
x_test = solve_ivp(lorenz, t_test_span, x0_test,
t_eval=t_test, **integrator_keywords).y.T
Cross-validation
PySINDy supports Scikit-learn-type cross-validation with a few caveats.
We must use uniform timesteps using the
t_default
parameter. This is because thefit
andscore
methods ofSINDy
differ from those used in Scikit-learn in the sense that they both have an optionalt
parameter. Settingt_default
is a workaround.We have to be careful about the way we split up testing and training data during cross-validation. Because the
SINDy
object needs to differentiate the data, we need the training and test data to consist of sequential intervals of time. If we randomly sample the data, then the computed derivatives will be horribly inaccurate. Luckily, Scikit-learn has aTimeSeriesSplit
object for such situations. If we really want to randomly sample the data during cross-validation, there is a way to do so. However, it’s more complicated.
Note that we need to prepend optimizer__
, feature_library__
, or differentiation_method__
to the parameter names.
Cross-validation with TimeSeriesSplit
[3]:
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import TimeSeriesSplit
model = ps.SINDy(t_default=dt)
param_grid = {
"optimizer__threshold": [0.001, 0.01, 0.1],
"optimizer__alpha": [0.01, 0.05, 0.1],
"feature_library": [ps.PolynomialLibrary(), ps.FourierLibrary()],
"differentiation_method__order": [1, 2]
}
search = GridSearchCV(
model,
param_grid,
cv=TimeSeriesSplit(n_splits=5)
)
search.fit(x_train)
print("Best parameters:", search.best_params_)
search.best_estimator_.print()
Best parameters: {'differentiation_method__order': 1, 'feature_library': PolynomialLibrary(), 'optimizer__alpha': 0.01, 'optimizer__threshold': 0.01}
(x0)' = -10.021 x0 + 9.993 x1
(x1)' = 0.227 1 + 27.601 x0 + -0.611 x1 + -0.983 x0 x2 + -0.020 x1 x2
(x2)' = 0.590 1 + 0.045 x0 + -0.018 x1 + -2.691 x2 + 0.965 x0 x1 + 0.026 x1^2
Some extra care must be taken when working with differentiation methods from the derivative
package (i.e. those accessed via the SINDyDerivative
class). See the example below.
[4]:
model = ps.SINDy(
t_default=dt,
differentiation_method=ps.SINDyDerivative(kind='spline', s=1e-2)
)
param_grid = {
"optimizer__threshold": [0.001, 0.01, 0.1],
"differentiation_method__kwargs": [
{'kind': 'spline', 's': 1e-2},
{'kind': 'spline', 's': 1e-1},
{'kind': 'finite_difference', 'k': 1},
{'kind': 'finite_difference', 'k': 2},
]
}
# This part is identical to what we did before
search = GridSearchCV(
model,
param_grid,
cv=TimeSeriesSplit(n_splits=5)
)
search.fit(x_train)
print("Best parameters:", search.best_params_)
search.best_estimator_.print()
Best parameters: {'differentiation_method__kwargs': {'kind': 'spline', 's': 0.01}, 'optimizer__threshold': 0.1}
(x0)' = -10.000 x0 + 10.000 x1
(x1)' = 28.003 x0 + -1.001 x1 + -1.000 x0 x2
(x2)' = -2.667 x2 + 1.000 x0 x1
Cross-validation without TimeSeriesSplit
If we want to use another cross-validation splitter, we’ll need to (a) define a wrapper class which uses the argument “y” instead of “x_dot” and (b) precompute the derivatives. Note that (b) means that we will not be able to perform cross-validation on the parameters of the differentiation method.
[5]:
from sklearn.metrics import r2_score
class SINDyCV(ps.SINDy):
def __init__(
self,
optimizer=None,
feature_library=None,
differentiation_method=None,
feature_names=None,
t_default=1,
discrete_time=False,
):
super(SINDyCV, self).__init__(
optimizer=optimizer,
feature_library=feature_library,
differentiation_method=differentiation_method,
feature_names=feature_names,
t_default=t_default,
discrete_time=discrete_time,
)
def fit(self, x, y, **kwargs):
return super(SINDyCV, self).fit(x, x_dot=y, **kwargs)
def score(
self,
x,
y,
t=None,
u=None,
multiple_trajectories=False,
metric=r2_score,
**metric_kws
):
return super(SINDyCV, self).score(
x,
x_dot=y,
t=t,
u=u,
multiple_trajectories=multiple_trajectories,
metric=metric,
**metric_kws
)
[6]:
from sklearn.model_selection import ShuffleSplit
model = SINDyCV()
x_dot = model.differentiate(x_train, t=t_train)
param_grid = {
"optimizer__threshold": [0.002, 0.01, 0.1],
"optimizer__alpha": [0.01, 0.05, 0.1],
"feature_library__degree": [1, 2, 3],
}
search = GridSearchCV(
model,
param_grid,
cv=ShuffleSplit(n_splits=3, test_size=0.25)
)
search.fit(x_train, y=x_dot)
print("Best parameters:", search.best_params_)
search.best_estimator_.print()
Best parameters: {'feature_library__degree': 2, 'optimizer__alpha': 0.01, 'optimizer__threshold': 0.002}
(x0)' = -9.999 x0 + 9.999 x1
(x1)' = 27.992 x0 + -0.999 x1 + -1.000 x0 x2
(x2)' = -2.666 x2 + 1.000 x0 x1
Sparse optimizers
Any of Scikit-learn’s linear models can be used for the optimizer
parameter of a SINDy
object, though we only recommend using those designed for sparse regression.
In the examples below we set fit_intercept
to False
since the default feature library (polynomials of degree up to two) already includes constant functions.
[7]:
from sklearn.linear_model import ElasticNet
model = ps.SINDy(optimizer=ElasticNet(l1_ratio=0.9,
fit_intercept=False),
t_default=dt)
model.fit(x_train)
model.print()
(x0)' = -10.005 x0 + 10.003 x1
(x1)' = 27.990 x0 + -0.997 x1 + -1.000 x0 x2
(x2)' = -2.665 x2 + 0.001 x0^2 + 0.999 x0 x1
[8]:
from sklearn.linear_model import OrthogonalMatchingPursuit
model = ps.SINDy(
optimizer=OrthogonalMatchingPursuit(n_nonzero_coefs=8,
fit_intercept=False),
t_default=dt
)
model.fit(x_train)
model.print()
(x0)' = -10.005 x0 + 10.003 x1
(x1)' = 27.990 x0 + -0.997 x1 + -1.000 x0 x2
(x2)' = -2.665 x2 + 0.001 x0^2 + 0.999 x0 x1