
(Cross-posting from statistics stackexchange)

Say we have a permanent-magnet DC motor that roughly obeys the system equation $$\ddot{x}(t) = \alpha \dot{x}(t) + \beta u(t) + \gamma $$ where $x(t)$ is the displacement of the rotor and $u(t)$ the applied voltage at time $t$.

Say we wish to determine the values of $\alpha, \beta$ and $\gamma$ experimentally. If we can only directly measure $x$ and not $\dot{x}$ or $\ddot{x}$, how should we go about estimating these parameters from a set of timeseries measurements of $u$ and $x$?

One naive approach is to compute the derivatives through some central finite-difference scheme and then perform an OLS regression - but it is not obvious how the derivative calculation interacts with the regression. Additionally, I have found in practice that this suffers from significant regression dilution if the test is allowed to run too long at steady state (the derivatives vanish there, so all that is left is noise).
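For concreteness, a minimal sketch of this naive approach (all numbers are invented for the demo: the "true" parameters, the input signal, and the noise level):

```python
import numpy as np

# Assumed ground truth for the demo: xdd = alpha*xd + beta*u + gamma
alpha, beta, gamma = -2.0, 5.0, 1.0
dt, n = 1e-3, 20000
t = np.arange(n) * dt
u = np.sign(np.sin(2 * np.pi * 0.7 * t))          # square-wave voltage input

# Simulate the system with forward Euler and add measurement noise
rng = np.random.default_rng(0)
x = np.zeros(n)
xd = np.zeros(n)
for k in range(n - 1):
    xd[k + 1] = xd[k] + dt * (alpha * xd[k] + beta * u[k] + gamma)
    x[k + 1] = x[k] + dt * xd[k]
x_meas = x + 1e-5 * rng.standard_normal(n)

# Central finite differences (interior points only)
xd_fd = (x_meas[2:] - x_meas[:-2]) / (2 * dt)
xdd_fd = (x_meas[2:] - 2 * x_meas[1:-1] + x_meas[:-2]) / dt**2

# OLS regression: xdd ≈ [xd, u, 1] @ [alpha, beta, gamma]
A = np.column_stack([xd_fd, u[1:-1], np.ones(n - 2)])
theta, *_ = np.linalg.lstsq(A, xdd_fd, rcond=None)
```

Even with this tiny noise level, the second-difference term amplifies the noise by roughly $1/\Delta t^2$, and the estimates degrade rapidly as the noise grows or as steady-state data dominates the record.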

Is there any more "complete" method for analyzing systems like this that handles the differentiation as part of the construction of the regression model? Is there a good theory of the correlations between derivatives of time-series data?

user3716267

2 Answers


The process of system identification consists of the following steps:

  • Choose an appropriate input for the system. The input should satisfy the persistence-of-excitation condition: the signal should be sufficiently rich that all relevant frequencies of the system are excited. This leads to a collection of output data which are rich in the information they provide about the system.
  • Determine whether to use a black-box or grey-box model. If you can model the system from first principles such as Newton's second law or Lagrange's equations, then you can use the derived differential equation to perform the system identification. In contrast, if you don't have any information about the physics of the system, then you will have to use a black-box model.
  • Appropriately process the acquired data. Before you do anything else you have to process the data. For example, you can use a lowpass filter to remove high-frequency noise added by the sensors. You should normalise so that input and output data are of the same order of magnitude. You may need to downsample, remove outliers, or even choose which portions of the collected data to use. This step follows some fixed guidelines but also depends on your experience working with data.
  • Choose an identification algorithm. First decide whether you want to perform online or offline parameter estimation of your system. Having made this decision, the next step is to determine which algorithm to use. OLS, RLS and TLS are some examples of offline algorithms; steepest descent, Lyapunov's parallel configuration, Lyapunov's series-parallel configuration etc. are some examples of online identification schemes.

Let's now suppose that you have collected the data and appropriately processed them. One possible approach is to use the OLS method in the following form. The differential equation of the system can be written as: $$ \begin{align} \ddot{x} &= \theta^{*T}\cdot \Delta \\ \theta^{*} &= \begin{bmatrix} \alpha & 0 & \beta & \gamma \end{bmatrix}^T \\ \Delta &= \begin{bmatrix} \dot{x} & x & u & 1 \end{bmatrix}^T \end{align} $$ Since only measurements of $x(t)$ and $u(t)$ are available, we have to filter the signals through a second-order stable filter in order to obtain the essential elements of the $\Delta$ vector. Suppose we choose the filter polynomial $$ \Lambda(s) = s^2+3s+2 $$ (stable, with roots at $-1$ and $-2$). Then the system can be transformed into the following one:

$$\begin{align} y &= \theta_{\lambda}^{T}\phi \\ \theta_{\lambda} &= \begin{bmatrix} (\theta_1^{*}+\lambda)^T & \theta_{2}^{*T} \end{bmatrix}^T = \begin{bmatrix} \alpha+3 & 2 & \beta & \gamma \end{bmatrix}^T \\ \phi &= \begin{bmatrix} \frac{s}{\Lambda(s)}x & \frac{1}{\Lambda(s)}x & \frac{1}{\Lambda(s)}u & \frac{1}{\Lambda(s)}\end{bmatrix}^T \end{align}$$ where $y = x$ is the measured output, $\theta_1^{*} = \begin{bmatrix} \alpha & 0 \end{bmatrix}^T$, $\theta_2^{*} = \begin{bmatrix} \beta & \gamma \end{bmatrix}^T$, and $\lambda = \begin{bmatrix} 3 & 2 \end{bmatrix}^T$ collects the filter coefficients (adding $(3s+2)x$ to both sides of the system equation gives $\Lambda(s)x = (\alpha+3)sx + 2x + \beta u + \gamma$). This form of the system is called the plant parametric model, which is linear in the unknown parameters $\theta_{\lambda}$. The vector $\phi$ is called the regression vector, and every entry of it can be constructed purely from the collected input and output measurements, since each filter is proper and stable. You can now apply the OLS algorithm and obtain the estimate of the unknown parameter vector $\theta_{\lambda}$: $$ \theta_{0} = \Big(\frac{1}{N}\sum_{i=1}^{N}\phi(t_i)\phi^{T}(t_i)\Big)^{-1}\cdot \Big(\frac{1}{N}\sum_{i=1}^{N}\phi(t_i)y(t_i)\Big) $$ where $N$ is the total number of measurements. The estimated vector $\theta_{0}$ gives the values of the unknown parameters that minimise the cost function: $$ V_{N}(\theta) =\frac{1}{N}\sum_{i=1}^{N} \frac{(y(t_i)-\theta^{T}\phi(t_i))^2}{2} $$ which is minimised at the point where $\frac{\partial V_{N}(\theta)}{\partial \theta} = 0$. This whole procedure is the least-squares estimate, and it is what you perform when you have decided to use a grey-box model to identify the system's parameters.
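A minimal numerical sketch of this filtered-regressor OLS in Python, using `scipy.signal.lsim` to realise $s/\Lambda(s)$ and $1/\Lambda(s)$ (the "true" parameter values, input, and noise level are invented for the demo). Note that adding $(3s+2)x$ to both sides of the system equation gives $\Lambda(s)x = (\alpha+3)sx + 2x + \beta u + \gamma$, so regressing the measured $x$ on $\phi$ recovers $\theta_\lambda = \begin{bmatrix}\alpha+3 & 2 & \beta & \gamma\end{bmatrix}^T$:

```python
import numpy as np
from scipy import signal

# Assumed ground truth: xdd = alpha*xd + beta*u + gamma
alpha, beta, gamma = -2.0, 5.0, 1.0
dt, n = 1e-3, 20000
t = np.arange(n) * dt
u = np.sign(np.sin(2 * np.pi * 0.7 * t))

# Simulate (forward Euler) and add measurement noise
rng = np.random.default_rng(0)
x = np.zeros(n)
xd = np.zeros(n)
for k in range(n - 1):
    xd[k + 1] = xd[k] + dt * (alpha * xd[k] + beta * u[k] + gamma)
    x[k + 1] = x[k] + dt * xd[k]
x_meas = x + 1e-4 * rng.standard_normal(n)

# Build the regression vector phi by filtering the measured signals
# through s/Lambda(s) and 1/Lambda(s), with Lambda(s) = s^2 + 3s + 2
Lam = [1.0, 3.0, 2.0]
filt = lambda num, sig: signal.lsim((num, Lam), sig, t)[1]
phi = np.column_stack([
    filt([1.0, 0.0], x_meas),       # (s/Lambda) x
    filt([1.0], x_meas),            # (1/Lambda) x
    filt([1.0], u),                 # (1/Lambda) u
    filt([1.0], np.ones(n)),        # (1/Lambda) 1
])

# Discard early samples where filter transients could bias the fit
# (harmless here since all initial conditions are zero, but important
# in general), then solve OLS: x ≈ phi @ theta_lambda
m = 5000
theta_lam, *_ = np.linalg.lstsq(phi[m:], x_meas[m:], rcond=None)
alpha_hat = theta_lam[0] - 3.0      # theta_lambda[0] = alpha + 3
beta_hat, gamma_hat = theta_lam[2], theta_lam[3]
```

Because every signal is passed through a proper, stable lowpass filter, no numerical differentiation of the noisy data is ever needed.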

Another possible way to perform the identification is to fit a simple ARX model. This is the case when you decide to use a black-box model to identify the system's parameters. You can take advantage of the mathematical model you have derived and impose the ARX model to be of second order (which means two poles) with no zeros. These two characteristics for the ARX model are translated as: $$ \begin{align} na &= 2 \rightarrow \text{ Number of Poles} \\ (nb - 1) &= 0 \rightarrow \text{ Number of Zeros} \\ nc &= 0 \end{align}$$ As a result you will try to fit the data to the model below: $$ \begin{align} G(z) &= \frac{B(z)}{A(z)} = \frac{b_1z^{-nk}}{1+a_1z^{-1}+a_2z^{-2}} \\ H(z) &= \frac{1}{A(z)} = \frac{1}{1+a_1z^{-1}+a_2z^{-2}} \\ A(z)y(t) &= B(z)u(t)+e(t) \end{align}$$ with $z^{-1}$ being the delay operator. The parameter $nk$ is the number of sampling intervals corresponding to any dead time of the system. For example, if your system has no dead time then $nk = 0$. If your system has a dead time $\theta = 0.1\,\text{s}$ and your sampling period (the one used when you collected the data) is $T_s = 0.05\,\text{s}$, then $nk = 2$. However, if the sampling period was $T_s = 0.08\,\text{s}$, then $nk = 1$.
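A minimal sketch of the ARX route in Python (all numbers are invented; the constant offset $\gamma$ is handled by appending an intercept column, and the data are generated noise-free with a forward-Euler discretisation so that the assumed ARX structure holds exactly):

```python
import numpy as np

# Assumed ground truth and forward-Euler data generation (noise-free)
alpha, beta, gamma = -2.0, 5.0, 1.0
dt, n = 1e-2, 5000
t = np.arange(n) * dt
u = np.sign(np.sin(2 * np.pi * 0.2 * t))

x = np.zeros(n)
xd = np.zeros(n)
for k in range(n - 1):
    xd[k + 1] = xd[k] + dt * (alpha * xd[k] + beta * u[k] + gamma)
    x[k + 1] = x[k] + dt * xd[k]

# ARX regression with an intercept; the input enters delayed by two
# samples, as the double integration under forward Euler implies:
# x[k] = a1*x[k-1] + a2*x[k-2] + b1*u[k-2] + c
Phi = np.column_stack([x[1:-1], x[:-2], u[:-2], np.ones(n - 2)])
a1, a2, b1, c = np.linalg.lstsq(Phi, x[2:], rcond=None)[0]

# Map the discrete-time coefficients back to (alpha, beta, gamma):
# under forward Euler, a1 = 2 + alpha*dt, b1 = beta*dt^2, c = gamma*dt^2
alpha_hat = (a1 - 2.0) / dt
beta_hat = b1 / dt**2
gamma_hat = c / dt**2
```

With real (noisy) data one would use a dedicated ARX estimator rather than this hand-rolled regression, but the structure of the problem is the same.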

You will end up with a discrete-time model; after that, standard techniques apply if you want to transform it into a continuous-time model.
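As a small sketch of one such conversion (assuming a zero-order-hold relationship, and using the companion-form state matrix of the motor equation with an assumed $\alpha = -2$), the continuous-time state matrix can be recovered from the discrete-time one with a matrix logarithm:

```python
import numpy as np
from scipy.linalg import expm, logm

Ts = 0.05                                  # assumed sampling period
Ac = np.array([[0.0, 1.0],                 # continuous-time state matrix of
               [0.0, -2.0]])               # xdd = alpha*xd (+ inputs), alpha = -2
Ad = expm(Ac * Ts)                         # what a ZOH discretisation yields
Ac_rec = logm(Ad).real / Ts                # recovered continuous-time matrix
```

The matrix logarithm is well defined here because the discrete-time poles are real and positive; for poles near the negative real axis the conversion is more delicate.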

Of course, you could/should use software packages to perform the identification. However, the OLS method proposed is easily coded as well. I strongly encourage you to read the theory behind these techniques in order to gain a deep insight into how they work, since this answer is just a quick introduction to them. These are some references:

  1. Robust Adaptive Control, P. Ioannou & J. Sun (available free online)
  2. Modeling and Identification of Dynamic Systems, L. Ljung & T. Glad
  3. System Identification: An Introduction, K. Keesman
Teo Protoulis

Another approach that has not been mentioned here, but which is applicable also to nonlinear systems, is to use a gradient-based optimization scheme over a simulation. Say that you simulate the (potentially nonlinear) system $$ \begin{aligned} \dot x &= f(x, u, p) \\ \hat y &= h(x, u, p) \end{aligned} $$ where $p$ denotes the parameters of the model. Then form a cost function $c(y_m, \hat y)$, where $y_m$ denotes measured outputs and $\hat y$ the estimated / simulated outputs, for example $$c(y_m, \hat y) = \sum_{t} (y_m(t) - \hat{y}(t))^2$$ Estimating the parameters then amounts to minimizing $c$ with respect to the parameters $p$. This can be done by, e.g., following the negative gradient $$-\dfrac{dc}{dp}$$ where $c$ depends implicitly on $p$ through $\hat y$. This procedure is called "gradient descent". Obtaining this gradient by hand can be tedious and error-prone, especially since it involves simulating the ODE $\dot x = f(x, u, p)$, but many modern programming languages have good support for automatic differentiation, which makes obtaining the gradient trivial even for very complicated simulators such as PDE and DAE solvers. For a system with as few parameters as this one, simply running a gradient-free optimization algorithm would likely do the trick.
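A minimal gradient-free version of this in Python (all numbers are invented; `scipy.optimize.minimize` with Nelder-Mead stands in for the gradient-based scheme, which for three parameters is entirely adequate):

```python
import numpy as np
from scipy.optimize import minimize

dt, n = 1e-2, 2000
t = np.arange(n) * dt
u = np.sign(np.sin(2 * np.pi * 0.2 * t))

def simulate(p):
    """Forward-Euler simulation of xdd = alpha*xd + beta*u + gamma."""
    alpha, beta, gamma = p
    x = np.zeros(n)
    xd = 0.0
    for k in range(n - 1):
        xd += dt * (alpha * xd + beta * u[k] + gamma)
        x[k + 1] = x[k] + dt * xd
    return x

# "Measured" data from assumed true parameters, plus noise
rng = np.random.default_rng(1)
y_meas = simulate([-2.0, 5.0, 1.0]) + 1e-3 * rng.standard_normal(n)

def cost(p):
    y_hat = simulate(p)
    if not np.all(np.isfinite(y_hat)):     # guard against unstable candidates
        return 1e12
    return np.sum((y_meas - y_hat) ** 2)

res = minimize(cost, x0=[-1.0, 1.0, 0.0], method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-10, "fatol": 1e-12})
```

Replacing Nelder-Mead with a gradient-based optimizer and obtaining $dc/dp$ via automatic differentiation gives the scheme described above.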

This approach replaces the need for differentiating the data by instead using an (ODE) integrator. It often has good low-frequency properties: any error in the model is integrated over the simulation horizon and thus inflated, so the optimizer is pushed to match the slow dynamics well.

This documentation has a large number of tutorials where automatic differentiation (AD) is used to optimize cost functions that involve simulating different kinds of differential equations. For linear systems, AD can be used to find the parameters of a linear state-space model using the prediction-error method (PEM), documentation available here (full disclosure: I'm the author of this open-source software). PEM has the added benefit of estimating not only the model parameters but also a noise model / Kalman filter at the same time. Apart from the open-source software linked above, PEM is also available in widely used commercial system-identification packages.