TeamQuest Corporation

What is Nonlinear Regression: Linear Least Square Error Regression of Multi Variable Function (Part II)

A simple form of a multi variable function is a 2-variable function expressed as

f(x_1,x_2)=a_0+a_1x_1+a_2x_2.
(1)

The Sum of Square Error function for a set of measurements \{(\{x_{1j},x_{2j}\},\hat{f}(x_{1j},x_{2j}))|j=1,2,3,...,M\}, where \hat{f} is the measurement of f and M is the number of measurements, is

S(a_0,a_1,a_2)=\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_0-a_1x_{1j}-a_2x_{2j}]^2.
(2)
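As a minimal sketch of Eq.(2), the Python function below evaluates the Sum of Square Error for a list of measurements. The function name `sum_square_error`, the tuple layout, and the sample data are assumptions made only for illustration.

```python
def sum_square_error(a0, a1, a2, measurements):
    """Evaluate Eq.(2): sum over j of [f_hat_j - a0 - a1*x1j - a2*x2j]^2.

    measurements is a list of (x1, x2, f_hat) tuples (hypothetical layout).
    """
    return sum((f_hat - a0 - a1 * x1 - a2 * x2) ** 2
               for x1, x2, f_hat in measurements)


# Made-up, noise-free measurements generated from f = 1 + 2*x1 + 3*x2.
data = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (0.0, 1.0, 4.0), (1.0, 1.0, 6.0)]

exact = sum_square_error(1.0, 2.0, 3.0, data)  # the generating coefficients
off = sum_square_error(0.0, 2.0, 3.0, data)    # a0 is wrong by 1
print(exact, off)
```

With the generating coefficients every residual is zero, so the error vanishes; shifting a_0 by 1 leaves a residual of 1 in each of the four measurements.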

The expanded form of the right hand side (RHS) of Eq.(2) is

\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_0-a_1x_{1j}-a_2x_{2j}]^2=a_0^2\sum_{j=1}^M1+ a_1^2\sum_{j=1}^Mx_{1j}^2+a_2^2\sum_{j=1}^Mx_{2j}^2-2a_0\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})-2a_1\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{1j}-2a_2\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{2j}+2a_0a_1\sum_{j=1}^Mx_{1j}+2a_0a_2\sum_{j=1}^Mx_{2j}+2a_1a_2\sum_{j=1}^Mx_{1j}x_{2j}+\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})]^2.
(3)

To further simplify the example, assume that the constant term in the linear function of Eq.(1) is zero (i.e., a_0=0). Then Eq.(3) is rewritten as

\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_1x_{1j}-a_2x_{2j}]^2=a_1^2\sum_{j=1}^Mx_{1j}^2+a_2^2\sum_{j=1}^Mx_{2j}^2-2a_1\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{1j}-2a_2\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{2j}+2a_1a_2\sum_{j=1}^Mx_{1j}x_{2j}+\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})]^2.
(4)
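The expansion in Eq.(4) can be checked numerically. The sketch below, using made-up integer measurements and hypothetical function names, computes the six sums on the right hand side of Eq.(4) and compares the result with the direct evaluation of the left hand side.

```python
def expanded_form(a1, a2, measurements):
    """Right hand side of Eq.(4), built from the six sums over the data."""
    s11 = sum(x1 * x1 for x1, x2, f in measurements)  # sum of x1j^2
    s22 = sum(x2 * x2 for x1, x2, f in measurements)  # sum of x2j^2
    s12 = sum(x1 * x2 for x1, x2, f in measurements)  # sum of x1j*x2j
    sf1 = sum(f * x1 for x1, x2, f in measurements)   # sum of f_hat*x1j
    sf2 = sum(f * x2 for x1, x2, f in measurements)   # sum of f_hat*x2j
    sff = sum(f * f for x1, x2, f in measurements)    # sum of f_hat^2
    return (a1 * a1 * s11 + a2 * a2 * s22 - 2 * a1 * sf1 - 2 * a2 * sf2
            + 2 * a1 * a2 * s12 + sff)


def direct_form(a1, a2, measurements):
    """Left hand side of Eq.(4): the sum of squared residuals."""
    return sum((f - a1 * x1 - a2 * x2) ** 2 for x1, x2, f in measurements)


# Made-up integer measurements (x1, x2, f_hat), so the comparison is exact.
data = [(1, 2, 9), (2, 1, 6), (3, 3, 14), (0, 1, 4)]

for a1, a2 in [(0, 0), (1, 2), (-3, 5), (2, 3)]:
    assert expanded_form(a1, a2, data) == direct_form(a1, a2, data)
```

Because the six sums depend only on the measurements, the right hand side is a quadratic polynomial in a_1 and a_2, which is why its graph is a bowl.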

In the previous post, the graph was like a vertical slice of a bowl. Figure 1 shows the inside of the seed holder of an avocado as an example of a bowl shape.

Figure 1: Seed holder surface as an example of a bowl shape.

 

Figure 2 shows what a vertical slice of a bowl looks like, using a lengthwise quarter slice of an avocado.

Figure 2: Seed holder surface of the quarter slice as the shape of a vertical slice of a bowl.

 

In this case, the graph is like a 3-dimensional bowl as shown in Figure 3. The heat map surface of Figure 3 is similar to the seed holder surface of the half avocado in Figure 1.

Figure 3: Quadratic polynomial Square Error function with positive leading coefficient for a function of 2 random variables

 

In Figure 3, the graph of 4x^2+4y^2-8x-8y+2xy+4 is displayed as an example of a polynomial of two input variables. To compare it with the Sum of Square Error function of a function with two input random variables, replace x by a_1 and y by a_2. The minimum is the point at the bottom of the bowl in Figure 3. The base of the bowl is not a perfect circle. The heat map of the surface in the top part of Figure 3 suggests that the lowest value of the surface is negative. In the lower part of Figure 3, the red contour line at the center is for the value -0.1. A contour line joins all the points at the same height along the z-axis, assuming that the bottom of the bowl touches the horizontal plane formed by the x-axis and y-axis. Contours for smaller values are too small to draw as ellipses, and a point-sized ellipse, which would represent the minimum of the surface, cannot be drawn either. The smallest ellipse, for the value -0.1, indicates that the lowest point lies in the area enclosed by that contour line. In the lower part of Figure 3, the center of the contour line at level -0.1 appears to be very close to the origin of the horizontal plane formed by the x-axis and y-axis. The values of the other contour lines are listed on the right side, above the lower part of Figure 3.

In the previous post, the conditional expression at the minimum for one random variable was

\frac{dS}{da}=0.

An extension of the above condition to take into account 2 random variables will be

\frac{\partial S(a_1,a_2)}{\partial a_i}=0, \text{ for } i=1,2.
(5)

To test this on the function of Figure 3, write S(a_1,a_2) as

S(a_1,a_2)=4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4.
(6)

When the condition of Eq.(5) is applied to the example graph of Figure 3, the following equations are obtained

\frac{\partial}{\partial a_1}(4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4)=8a_1-8+2a_2=0.
(7a)

\frac{\partial}{\partial a_2}(4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4)=8a_2-8+2a_1=0.
(7b)

As a result, the following system of linear equations is obtained

4a_1+a_2=4.
(8a)

a_1+4a_2=4.
(8b)

The solution of the system of linear equations of Eq.(8) is a_1=0.8 and a_2=0.8, that is, the point (0.8,0.8) on the horizontal plane formed by the x-axis and y-axis in Figure 3. So the precise location of the minimum is the point (0.8,0.8) in the x-y plane of Figure 3, and it is encircled by the smallest red contour at level -0.1 near the origin of the contour plots in the lower part. The precise value of the minimum is found by substituting 0.8 for both a_1 and a_2 in the expression of Eq.(6):

4\times(0.8)^2+4\times(0.8)^2-8\times(0.8)-8\times(0.8)+2\times(0.8)\times(0.8)+4=-2.4.

The above derived value of -2.4 as minimum explains why the contour line is so small for the value -0.1.
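The worked example above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `solve_2x2` is hypothetical, Cramer's rule is used as one convenient way to solve Eq.(8), and exact fractions avoid rounding in the check.

```python
from fractions import Fraction


def S(a1, a2):
    """The example Square Error function of Eq.(6)."""
    return 4 * a1**2 + 4 * a2**2 - 8 * a1 - 8 * a2 + 2 * a1 * a2 + 4


def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = e and c*x + d*y = f by Cramer's rule."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("the system has no unique solution")
    return Fraction(e * d - b * f, det), Fraction(a * f - e * c, det)


# Eq.(8a): 4*a1 + a2 = 4 and Eq.(8b): a1 + 4*a2 = 4
a1, a2 = solve_2x2(4, 1, 1, 4, 4, 4)
minimum = S(a1, a2)
print(a1, a2, minimum)  # 4/5 4/5 -12/5, i.e. 0.8, 0.8 and -2.4
```

Evaluating S at nearby points such as (0.9, 0.8) or (0.8, 0.7) gives larger values, which is consistent with (0.8, 0.8) being the bottom of the bowl.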

A generalized multi variable linear function is formally stated as

f(x_1,x_2,...,x_n)=a_0+a_1x_1+a_2x_2+...+a_nx_n.
(9)

The coefficients a_0,a_1,a_2,...,a_n need to be estimated using Linear Least Square Error Regression from a set of measurements of the random variables x_1,x_2,...,x_n and the corresponding measurements of f(x_1,x_2,...,x_n). The theory explained in the previous post is extended here to deal with the multi variable function of Eq.(9). As was done in the previous post, let the measurement of f(x_1,x_2,...,x_n) be denoted by \hat{f}(x_1,x_2,...,x_n).

The Sum of Square Error function is expressed as

S(a_0,a_1,a_2,...,a_n)=\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j},...,x_{nj})-a_0-a_1x_{1j}-a_2x_{2j}-...-a_nx_{nj}]^2.
(10)

The graph of S(a_0,a_1,a_2,...,a_n) of Eq.(10) needs more than 3 dimensions and cannot be displayed, so the two-coefficient example at the beginning and in Figure 3 serves to give an understanding of its shape.

The extension of the conditional expression of Eq.(5) is

\frac{\partial S(a_0,a_1,a_2,...,a_n)}{\partial a_i}=0,\text{for } i=0,1,2,...,n.
(11)

Substituting the RHS of Eq.(10) into Eq.(11) and expressing (x_1,x_2,x_3,...,x_n) as \overrightarrow{x}, we obtain

\frac{\partial S(a_0,a_1,a_2,...,a_n)}{\partial a_i}=\frac{\partial}{\partial a_i}\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-a_2x_{2j}-...-a_nx_{nj}]^2=0.

\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_nx_{nj}]\frac{\partial}{\partial a_i}[\hat{f}(\overrightarrow{x}_j)-a_0-...-a_ix_{ij}-...-a_nx_{nj}]=0.

\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_nx_{nj}][\frac{\partial}{\partial a_i}\hat{f}(\overrightarrow{x}_j)-\frac{\partial}{\partial a_i}a_0-...-\frac{\partial}{\partial a_i}a_ix_{ij}-...-\frac{\partial}{\partial a_i}a_nx_{nj}]=0.

\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_ix_{ij}-...-a_nx_{nj}][0-0-...-0-x_{ij}-0-...-0]=0.

\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_ix_{ij}-...-a_nx_{nj}](-x_{ij})=0.

\Rightarrow -2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}-a_0x_{ij}-a_1x_{1j}x_{ij}-...-a_ix^2_{ij}-...-a_nx_{nj}x_{ij}]=0.

Dividing both sides of the above expression by -2

\Rightarrow \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}]-\sum_{j=1}^M[a_0x_{ij}]-\sum_{j=1}^M[a_1x_{1j}x_{ij}]-...-\sum_{j=1}^M[a_ix^2_{ij}]-...-\sum_{j=1}^M[a_nx_{nj}x_{ij}]=0.

\Rightarrow a_0\sum_{j=1}^Mx_{ij}+a_1\sum_{j=1}^M(x_{1j}x_{ij})+...+a_i\sum_{j=1}^Mx^2_{ij}+...+a_n\sum_{j=1}^M(x_{nj}x_{ij})=\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}].

The above expression holds for i=1,2,...,n. For i=0 the same steps apply with x_{ij} replaced by 1, which gives a_0\sum_{j=1}^M1+a_1\sum_{j=1}^Mx_{1j}+...+a_n\sum_{j=1}^Mx_{nj}=\sum_{j=1}^M\hat{f}(\overrightarrow{x}_j). Together, these n+1 equations can be expressed as a matrix equation as shown below.

\begin{pmatrix}\sum_{j=1}^M1 & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \sum_{j=1}^Mx_{2j} & \sum_{j=1}^Mx_{1j}x_{2j} & \ldots & \sum_{j=1}^Mx_{2j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{pmatrix}\times\begin{pmatrix}a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n\end{pmatrix}=\begin{pmatrix}\sum_{j=1}^M\hat{f}(\overrightarrow{x}_j) \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}] \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{2j}] \\ \vdots \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{pmatrix}.
(12)

The solution of the above matrix equation will give the desired estimates of a_0,a_1,a_2,...,a_n. The solution involves finding the inverse of the matrix (assuming it is invertible)

\begin{pmatrix}\sum_{j=1}^M1 & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \sum_{j=1}^Mx_{2j} & \sum_{j=1}^Mx_{1j}x_{2j} & \ldots & \sum_{j=1}^Mx_{2j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{pmatrix}.
(13)

Once the inverse is found, multiplying both sides of Eq.(12) on the left by the inverse of the matrix of Eq.(13) gives the solution

\begin{pmatrix}a_0 \\ a_1 \\ a_2 \\ \vdots \\ a_n\end{pmatrix}=\begin{pmatrix}\sum_{j=1}^M1 & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \sum_{j=1}^Mx_{2j} & \sum_{j=1}^Mx_{1j}x_{2j} & \ldots & \sum_{j=1}^Mx_{2j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{pmatrix}^{-1}\times\begin{pmatrix}\sum_{j=1}^M\hat{f}(\overrightarrow{x}_j) \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}] \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{2j}] \\ \vdots \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{pmatrix}.
(14)
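The procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function names and sample data are hypothetical, a constant 1 is prepended to each measurement so that the i=0 equation of Eq.(11) for a_0 is included, and the system is solved by Gauss-Jordan elimination with exact fractions instead of forming the inverse matrix explicitly.

```python
from fractions import Fraction


def normal_equations(xs, fs):
    """Build the matrix and right hand side obtained from Eq.(11), i=0..n.

    xs: list of M input tuples (x_1j, ..., x_nj); fs: the M measurements.
    """
    rows = [[Fraction(1)] + [Fraction(v) for v in x] for x in xs]
    size = len(rows[0])
    matrix = [[sum(r[i] * r[k] for r in rows) for k in range(size)]
              for i in range(size)]
    rhs = [sum(Fraction(f) * r[i] for r, f in zip(rows, fs))
           for i in range(size)]
    return matrix, rhs


def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Made-up, noise-free measurements of f = 1 + 2*x1 + 3*x2.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
fs = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]

coeffs = solve(*normal_equations(xs, fs))
print(coeffs)  # estimates of a_0, a_1, a_2
```

Because the sample measurements contain no noise, the estimates reproduce the generating coefficients exactly. With real measurements the result is the set of coefficients that minimizes the Sum of Square Error, and a singular matrix (for example, when one input is a multiple of another) raises an error instead of returning a solution.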

In the next post, we will find out how to apply the technique of linear least square error regression for linear functions of multiple variables to nonlinear functions of a single variable.



Launching TeamQuest Global Services

In the beginning of the year, a decision was made to update our professional services offering. The new Global Services organization will be responsible for developing, delivering and supporting a full suite of services centered on the core competences of TeamQuest. The ambition is to help our customers become even more successful and to further strengthen our position as thought leaders.

To achieve this, we will offer the following categories of services

  • Training Services
  • Implementation Services
  • Analysis Services
  • Managed Services
  • Strategic Services

With a comprehensive suite of services like this, we believe we will be able to say “Yes” to more of the things you, our customers, ask for. It will also get us more involved with you, learning about your needs and using that knowledge to influence our future product directions.



Burger Joints and Queuing Theory

One of the frequently asked questions I receive is regarding queuing theory and how it applies to computing infrastructures. I think a good way to explain how computer systems have evolved to address queuing issues is through a burger joint analogy.

When I was a kid, there were numerous “mom and pop” burger joints along the roads. If you wanted a burger, you drove in, went up to the window, placed your order and paid your money. The people inside cooked up what you requested while you stood there.

At more popular burger joints, too many people had to wait because it was a one-in, one-out business model. The busier ones came up with the idea that they would have an “order” window and a “pick-up” window. This permitted multiple orders to be processed concurrently, reducing the line waiting to be served. The problem was that the order went in quickly, but during busy periods, people queued up at the “pick-up” window instead of the “order” window.

Then large chains used analysis tools (some computer-based) to determine the distribution of orders based on time-of-day and day-of-week. Based on their analyses, management adjusted manpower schedules and scheduled making up orders in advance. Heat lamps kept the food hot. At peak times, the burger joints could quickly satisfy demand, keeping order and pick-up queues low. Even if there was wasted food, it was acceptable because they were providing faster service and attracting a greater number of customers. The additional volume and associated profits more than offset the rare losses.

Computer technology addressing queuing issues has matured in similar fashion. Old computers using the DOS operating system could only process one job at a time. Then Multiple Virtual operating systems came about. Multiple jobs could be run; however, they would vie for the same resources, causing queuing (and thus longer wait times).

With the advent of cheap memory in data storage (disk) controllers, you could fix data in storage at different points in time so that it was readily available and sped the execution of jobs. Read-ahead technology was developed at the same time, further reducing job execution times. Current technologies use a wide variety of the above solutions to speed execution of jobs and transactions.

So as you can see, burger joints and computer queuing have a lot in common!

Until the next time

Ron



Gartner Data Center Conference: CMDB Highlights

Thinking of implementing a CMDB? Maybe you should wait. According to Gartner, only 3-5% of organizations have a fully operational CMDB. From what I heard, even the vendors are struggling. No one has a mature, comprehensive offering. I did learn that there were two components of a CMDB, the database and the dependency mapping tool, and that there were a number of vendors in various positions in the marketplace.

The Gartner presentation team, Ronni Colville and Patricia Adams, emphasized that you need to understand the business problem you are trying to solve before you even think about building a CMDB. You should have people and processes in place and be operational before you think about buying a tool or tools.

They also stressed that you need to set the correct expectations on implementation as it could take 18 months to two years or more to fully implement. From listening to Ronni and Patricia, it seems to me that CMDB and related tools are in their infancy, that much up-front thought needs to be done and a lot of internal discussions must happen before even thinking about building a CMDB. Considering all that, it might be better to focus on process and delay tool selection until they have matured further.

At this point most conference attendees are suffering from data overload. I am no exception. It will take me weeks to go back over my notes and think about everything that was said.

Gartner has outdone themselves this year in the quality and applicability of the content. From my perspective, they deserve a standing ovation!

This is my last post of the conference. Hope you have found my posts interesting. For more details regarding these sessions, contact your Gartner representative.

Thanks for listening!

Ron
