stats::reg -- regression (general least square fit)

Introduction

Consider a ``model function'' f with n parameters p.1, .., p.n relating a dependent variable y and m independent variables x.1, .., x.m:

y = f(x.1, .., x.m; p.1, .., p.n).

Given k different measurements x.1.j, .., x.k.j for the independent variables x.j and corresponding measurements y.1, .., y.k for the dependent variable y, one fits the parameters p.1, .., p.n by minimizing the ``weighted quadratic deviation''

Delta^2(p.1,..,p.n) = sum(w.i*abs(y.i - f(x.i.1,..,x.i.m; p.1,..,p.n))^2, i = 1..k).

stats::reg(..data.., f,..)

computes numerical approximations of the fit parameters.
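
As a concrete illustration of this definition (not of the internal algorithm used by stats::reg), the deviation can be evaluated directly for a small data set. The names X, Y and Delta2 below are chosen only for this sketch; the data are those of Example 1 below, with all weights w.i = 1:

>> X := [0, 1, 2, 3]: Y := [1, 3, 5, 7]:   // data of Example 1, weights w.i = 1
   Delta2 := (p1, p2) ->
       _plus((Y[i] - (p1 + p2*X[i]))^2 $ i = 1..4):
   Delta2(1.0, 2.0), Delta2(1.0, 1.5)
   
                                  0.0, 3.5
      

The parameters p1 = 1.0, p2 = 2.0 reproduce the data exactly, so the deviation vanishes; any other choice of parameters yields a positive value.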

Call(s)

stats::reg([x.1.1,..,x.k.1], .., [x.1.m,..,x.k.m], [y.1,..,y.k] <, [w.1,..,w.k]>, f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg([[x.1.1,..,x.1.m, y.1 <, w.1>], .., [x.k.1,..,x.k.m, y.k <, w.k>]], f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg(s, c.1, .., c.m, cy <, cw>, f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

stats::reg(s, [c.1, .., c.m, cy <, cw>], f, [x.1,..,x.m], [p.1,..,p.n] <, StartingValues = [p.1(0),..,p.n(0)]>)

Parameters

x.1.1,..,x.k.m - numerical sample data for the independent variables. The entry x.i.j represents the i-th measurement of the independent variable x.j.
y.1,..,y.k - numerical sample data for the dependent variable. The entry y.i represents the i-th measurement of the dependent variable.
w.1,..,w.k - numerical weight factors. The entry w.i is used as a weight for the data (x.i.1,..,x.i.m,y.i) of the i-th measurement. If no weights are provided, then w.i = 1 is used.
f - the model function: an arithmetical expression representing a function of the independent variables x.1, .., x.m and the fit parameters p.1, .., p.n. The expression must not contain any symbolic objects apart from x.1, .., x.m and p.1, .., p.n.
x.1,..,x.m - the independent variables: identifiers or indexed identifiers.
p.1,..,p.n - the fit parameters: identifiers or indexed identifiers.
p.1(0),..,p.n(0) - The user can assist the internal numerical search by providing numerical starting values p.i(0) for the fit parameters p.i. These should be reasonably close to the optimal fit values. If the user provides no starting values, p.i(0) = 1.0 is used.
s - a sample of domain type stats::sample containing the data x.i.j for the independent variables, the data y.i for the dependent variable and, optionally, the weights w.i.
c.1,..,c.m - positive integers representing column indices of the sample s. Column c.j provides the measurements x.i.j for the independent variable x.j.
cy - a positive integer representing a column index of the sample s. This column provides the measurements y.i for the dependent variable.
cw - a positive integer representing a column index of the sample s. This column provides the weight factors w.i.

Returns

a list [[p.1,..,p.n], Delta^2] containing the optimized fit parameters p.i minimizing the quadratic deviation. The minimized value of this deviation is given by Delta^2; it indicates the quality of the fit. All returned data are floating-point values. FAIL is returned if a least squares fit of the data is not possible with the given model function or if the internal numerical search fails.
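
For instance, the two components of the returned list can be extracted by indexing. The following sketch reuses the data of Example 1 below; the name r is chosen only for this illustration:

>> r := stats::reg([0, 1, 2, 3], [1, 3, 5, 7], 
                   p1 + p2*x1, [x1], [p1, p2]):   // cf. Example 1
   r[1], r[2]
   
                               [1.0, 2.0], 0.0
      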

Side Effects

The function is sensitive to the environment variable DIGITS, which determines the numerical working precision.
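
For example, the working precision of the internal search can be increased before the call. The following sketch reuses the data of Example 1 below; the fit is then carried out with 20 significant digits (the printed result is not reproduced here), and deleting DIGITS afterwards restores the default precision:

>> DIGITS := 20:
   stats::reg([0, 1, 2, 3], [1, 3, 5, 7], p1 + p2*x1, [x1], [p1, p2]);
   delete DIGITS:    // restore the default precision
   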

Related Functions

stats::linReg, stats::sample

Details

Option: StartingValues = [p.1(0),..,p.n(0)] - numerical starting values assisting the internal search; see the description of p.1(0), .., p.n(0) in the ``Parameters'' section above.
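
A minimal sketch of the option's use, reusing the data of Example 1 below (output not reproduced here). Since that model is linear in p1 and p2, the quadratic deviation has a unique minimum, so the starting values merely assist the numerical search:

>> stats::reg([0, 1, 2, 3], [1, 3, 5, 7], p1 + p2*x1, [x1], [p1, p2],
              StartingValues = [0.5, 0.5])
   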

Example 1

We fit a linear function y = p1 + p2*x1 to four data pairs (x.i.1,y.i) given by two lists:

>> stats::reg([0, 1, 2, 3], [1, 3, 5, 7], 
              p1 + p2*x1, [x1], [p1, p2])
   
                             [[1.0, 2.0], 0.0]
      

The parameter values p1=1.0, p2=2.0 provide a perfect fit: the quadratic deviation vanishes.
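
This can be checked by substituting the fitted parameters into the model function; the following quick verification is independent of stats::reg:

>> map([0, 1, 2, 3], x1 -> 1.0 + 2.0*x1)   // predicted values at the data points
   
                            [1.0, 3.0, 5.0, 7.0]
      

The predicted values coincide with the measured values [1, 3, 5, 7], so every term of the quadratic deviation is zero.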

Example 2

We fit an exponential function y = a*exp(b*x) to five data pairs (x.i,y.i). Weights are used to decrease the influence of the ``exceptional pair'' (x,y) = (5.0, 6.5*10^6) on the fit:

>> stats::reg([[1.1, 54, 1], [1.2, 73, 1], [1.3, 98, 1], 
               [1.4, 133, 1], [5.0, 6.5*10^6, 10^(-4)]], 
              a*exp(b*x), [x], [a, b])
   
                [[1.992321622, 2.999602426], 0.2001899629]
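
For comparison, one could repeat the fit with unit weight also for the exceptional pair (a sketch only; the output is not reproduced here). Since the y value of that pair exceeds the other measurements by more than four orders of magnitude, the unweighted sum of squares is then governed almost exclusively by this single measurement:

>> stats::reg([[1.1, 54, 1], [1.2, 73, 1], [1.3, 98, 1], 
               [1.4, 133, 1], [5.0, 6.5*10^6, 1]], 
              a*exp(b*x), [x], [a, b])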
      

Example 3

We create a sample with four columns. The first column is a counter labeling the measurements. The second and third columns provide measured data for two variables x1 and x2, respectively. The last column provides the corresponding measurements of the dependent variable.

>> s := stats::sample([[1, 0, 0,  1.1], [2, 0, 1,  5.4], 
                       [3, 1, 1,  8.5], [4, 1, 2, 18.5], 
                       [5, 2, 1, 15.0], [6, 2, 2, 24.8]]) 
   
                              1  0  0   1.1
                              2  0  1   5.4
                              3  1  1   8.5
                              4  1  2  18.5
                              5  2  1  15.0
                              6  2  2  24.8
      

First, we try to model the data provided by the columns 2, 3, and 4 with a function that is linear in the variables x1 and x2. We specify the data columns by a list of column indices:

>> stats::reg(s, [2, 3, 4], p1 + p2*x1 + p3*x2, 
              [x1, x2], [p1, p2, p3])
   
         [[-0.9568181818, 4.688636364, 7.272727273], 15.23613636]
      

The quadratic deviation is rather large, indicating that a linear function is not appropriate for fitting the data. Next, we extend the model and consider a polynomial fit function of degree 2. This is still a linear regression problem, because the fit parameters enter the model function linearly. We specify the data columns by a sequence of column indices:

>> stats::reg(s, 2, 3, 4,
              p1 + p2*x1 + p3*x2 + p4*x1^2 + p5*x2^2, 
              [x1, x2], [p1, p2, p3, p4, p5])
   
                   [[1.1, 1.525, 1.5, 1.625, 2.8], 0.01]
      

Finally, we include a further term p6*x1*x2 in the model, obtaining a perfect fit:

>> stats::reg(s, 2, 3, 4, 
              p1 + p2*x1 + p3*x2 + p4*x1^2 + p5*x2^2 + p6*x1*x2, 
              [x1, x2], [p1, p2, p3, p4, p5, p6])
   
           [[1.1, 1.6, 1.35, 1.7, 2.95, -0.2], 4.267632241e-34]
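
The fitted polynomial indeed reproduces all six measurements. This can be checked by evaluating it at some of the sample points; the name f is chosen only for this verification:

>> // fitted model with the parameters found above:
   f := (x1, x2) -> 1.1 + 1.6*x1 + 1.35*x2 + 1.7*x1^2
                    + 2.95*x2^2 - 0.2*x1*x2:
   f(0, 0), f(1, 2), f(2, 2)
   
                              1.1, 18.5, 24.8
      

These are exactly the measured values in the corresponding rows of the sample.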
      
>> delete s:
   

Example 4

We create a sample with two columns:

>> s := stats::sample([[1, -1.44], [2, -0.82], 
                       [3, 0.97], [4, 1.37]])
   
                                1  -1.44
                                2  -0.82
                                3   0.97
                                4   1.37
      

The data are to be modeled by a function of the form y = a*sin(b*x), where the first column contains measurements of x and the second column contains the corresponding data for y. Note that in this example there is no need to specify column indices, because the sample contains only two columns:

>> stats::reg(s, a*sin(b*x), [x], [a, b])
   
              [[-1.499812823, 1.281963381], 0.00001255632629]
      

Fitting a periodic function may be problematic. We provide starting values for the fit parameters and obtain quite a different set of parameters approximating the data with the same quality:

>> stats::reg(s, a*sin(b*x), [x], [a, b], StartingValues = [2, 5])
   
              [[1.499812823, 5.001221926], 0.00001255632629]
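
The two results do not contradict each other: the measured x values are the integers 1, 2, 3, 4, and for integer x one has sin((2*PI - b)*x) = -sin(b*x). Hence the parameter sets (a, b) and (-a, 2*PI - b) yield identical model values at all sample points and therefore the same quadratic deviation. Indeed, the second value of b is just 2*PI minus the first one, and a changes its sign:

>> float(2*PI - 1.281963381)
   
                                 5.001221926
      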
      
>> delete s:
   

Background

Changes




Copyright © SciFace Software GmbH & Co. KG 2000