
May 31, 2013

A simple form of multivariable function is a linear function of 2 variables, expressed as

[math]!f(x_1,x_2)=a_0+a_1x_1+a_2x_2. [/math](1)

The Sum of Square Error function for a set of measurements [math]\{(\{x_{1j},x_{2j}\},\hat{f}(x_{1j},x_{2j}))|j=1,2,3,...,M\},[/math] where [math]\hat{f}[/math] is the measurement of [math]f[/math] and [math]M[/math] is the number of measurements, is

[math]!S(a_0,a_1,a_2)=\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_0-a_1x_{1j}-a_2x_{2j}]^2.[/math](2)

The expanded form of the right hand side (RHS) of Eq.(2) is

[math]!\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_0-a_1x_{1j}-a_2x_{2j}]^2=a_0^2\sum_{j=1}^M1+ a_1^2\sum_{j=1}^Mx_{1j}^2+a_2^2\sum_{j=1}^Mx_{2j}^2-2a_0\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})-2a_1\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{1j}-2a_2\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{2j}+2a_0a_1\sum_{j=1}^Mx_{1j}+2a_0a_2\sum_{j=1}^Mx_{2j}+2a_1a_2\sum_{j=1}^Mx_{1j}x_{2j}+\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})]^2.[/math](3)
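As a sanity check on the algebra of Eq.(3), the Python sketch below evaluates the sum of squared errors directly from Eq.(2) and also from the expanded form of Eq.(3), then confirms that the two agree. The measurements and coefficient values are hypothetical, chosen only for illustration.

```python
# Numerical check of Eq.(3): the expanded polynomial must equal the direct
# sum of squared errors of Eq.(2) for any data and any coefficients.

def sse_direct(a0, a1, a2, xs1, xs2, fs):
    """Eq.(2): sum of squared residuals, computed term by term."""
    return sum((f - a0 - a1 * x1 - a2 * x2) ** 2
               for x1, x2, f in zip(xs1, xs2, fs))


def sse_expanded(a0, a1, a2, xs1, xs2, fs):
    """Eq.(3): the same quantity written through sums over the data."""
    m = len(fs)
    s11 = sum(x * x for x in xs1)
    s22 = sum(x * x for x in xs2)
    sf = sum(fs)
    sf1 = sum(f * x for f, x in zip(fs, xs1))
    sf2 = sum(f * x for f, x in zip(fs, xs2))
    s1 = sum(xs1)
    s2 = sum(xs2)
    s12 = sum(x1 * x2 for x1, x2 in zip(xs1, xs2))
    sff = sum(f * f for f in fs)
    return (a0 ** 2 * m + a1 ** 2 * s11 + a2 ** 2 * s22
            - 2 * a0 * sf - 2 * a1 * sf1 - 2 * a2 * sf2
            + 2 * a0 * a1 * s1 + 2 * a0 * a2 * s2 + 2 * a1 * a2 * s12
            + sff)


# Hypothetical measurements, made up for illustration only.
xs1 = [0.0, 1.0, 2.0, 3.0]
xs2 = [1.0, 0.5, 2.5, 1.5]
fs = [1.2, 2.1, 5.9, 5.0]

direct = sse_direct(0.3, 1.1, 0.9, xs1, xs2, fs)
expanded = sse_expanded(0.3, 1.1, 0.9, xs1, xs2, fs)
print(abs(direct - expanded) < 1e-9)  # True
```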

To further simplify the example, assume that the constant term in the linear function of Eq.(1) is zero (i.e., [math]a_0=0[/math]). Then Eq.(3) is rewritten as

[math]!\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})-a_1x_{1j}-a_2x_{2j}]^2=a_1^2\sum_{j=1}^Mx_{1j}^2+a_2^2\sum_{j=1}^Mx_{2j}^2-2a_1\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{1j}-2a_2\sum_{j=1}^M\hat{f}(x_{1j},x_{2j})x_{2j}+2a_1a_2\sum_{j=1}^Mx_{1j}x_{2j}+\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j})]^2.[/math](4)
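With [math]a_0=0[/math], Eq.(4) is a quadratic polynomial in [math]a_1[/math] and [math]a_2[/math] whose six coefficients are sums over the data. The sketch below (the function name `bowl_coefficients` is illustrative) computes those coefficients for a small, made-up set of three measurements.

```python
def bowl_coefficients(xs1, xs2, fs):
    """Coefficients of Eq.(4) (a0 = 0), returned as (c11, c22, c1, c2, c12, c0)
    so that S(a1, a2) = c11*a1**2 + c22*a2**2 + c1*a1 + c2*a2 + c12*a1*a2 + c0."""
    c11 = sum(x1 * x1 for x1 in xs1)                     # sum of x_1j^2
    c22 = sum(x2 * x2 for x2 in xs2)                     # sum of x_2j^2
    c1 = -2 * sum(f * x1 for f, x1 in zip(fs, xs1))      # -2 * sum of f x_1j
    c2 = -2 * sum(f * x2 for f, x2 in zip(fs, xs2))      # -2 * sum of f x_2j
    c12 = 2 * sum(x1 * x2 for x1, x2 in zip(xs1, xs2))   # 2 * sum of x_1j x_2j
    c0 = sum(f * f for f in fs)                          # sum of f^2
    return c11, c22, c1, c2, c12, c0


# Hypothetical measurements (three points), made up for illustration.
xs1, xs2, fs = [1, 0, 1], [0, 1, 1], [1, 2, 3]
print(bowl_coefficients(xs1, xs2, fs))  # (2, 2, -8, -10, 2, 14)
```

For these three points, [math]S(a_1,a_2)=2a_1^2+2a_2^2-8a_1-10a_2+2a_1a_2+14[/math], a polynomial of the same form as the one plotted in Figure 3.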

In the previous post, the graph looked like a vertical slice of a bowl. Figure 1 shows the inside of the seed holder of an avocado as an example of a bowl shape.

[caption id="attachment_660" align="alignleft" width="600" caption="Figure 1: Seed holder surface as an example of a bowl shape."][/caption]

Figure 2 shows what a vertical slice of a bowl looks like, using a lengthwise quarter slice of an avocado.

[caption id="attachment_710" align="alignleft" width="600" caption="Figure 2: Seed holder surface of the quarter slice as the shape of a vertical slice of a bowl."][/caption]

In this case, the graph is like a 3-dimensional bowl, as shown in Figure 3. The heat map surface of Figure 3 is similar to the seed holder surface of the avocado half in Figure 1.

[caption id="attachment_658" align="alignleft" width="554" caption="Figure 3: Quadratic polynomial Square Error function with positive leading coefficient for a function of 2 random variables"][/caption]

In Figure 3, the graph of [math]4x^2+4y^2-8x-8y+2xy+4[/math] is displayed as an example of a quadratic polynomial of two input variables. To compare it with the Sum of Square Error function of a function with two input random variables, replace [math]x[/math] by [math]a_1[/math] and [math]y[/math] by [math]a_2[/math]. Note that this polynomial only illustrates the bowl shape: a real Sum of Square Error function is a sum of squares and can never be negative, whereas this polynomial takes negative values. The minimum is the point at the bottom of the bowl. Because of the cross term [math]2xy[/math], the horizontal cross-sections of the bowl are ellipses rather than perfect circles. The heat map in the top part of Figure 3 suggests that the lowest value of the surface is negative. A contour line joins all the points of the surface that are at the same height on the z-axis, drawn on the horizontal plane formed by the x-axis and the y-axis. In the lower part of Figure 3, the innermost contour line, drawn in red, is at level -0.1. The plot shows no contour for any lower level, and the minimum itself cannot be drawn as a contour, since it would be an ellipse shrunk to a single point. The ellipse at level -0.1 therefore indicates that the lowest point lies in the region it encloses. At the scale of the lower part of Figure 3, the center of this contour line appears to be very close to the origin of the x-y plane. The levels of the other contour lines are listed on the right side, above the lower part of Figure 3.
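Before solving for the minimum analytically, a brute-force grid search gives a rough idea of where the bottom of the bowl lies. In this sketch the search range [math][-2,2]\times[-2,2][/math] and the grid spacing of 0.1 are arbitrary choices; the search only inspects grid points, so it can locate the minimum only as precisely as the grid allows.

```python
def surface(x, y):
    """The example polynomial plotted in Figure 3."""
    return 4 * x**2 + 4 * y**2 - 8 * x - 8 * y + 2 * x * y + 4


# Evaluate the surface on a grid with spacing 0.1 over [-2, 2] x [-2, 2]
# and keep the lowest value found, together with its (x, y) location.
step = 0.1
grid = [-2 + i * step for i in range(41)]
best_value, best_x, best_y = min(
    (surface(x, y), x, y) for x in grid for y in grid
)
print(round(best_x, 6), round(best_y, 6), round(best_value, 6))  # 0.8 0.8 -2.4
```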

In the previous post, the condition satisfied at the minimum for a function of one random variable was

[math]!\frac{dS}{da}=0.[/math]

Extending this condition to a function of 2 random variables gives

[math]!\frac{\partial S(a_1,a_2)}{\partial a_i}=0, \text{ for } i=1,2.[/math](5)

To test this on the function of Figure 3, write [math]S(a_1,a_2)[/math] as

[math]!S(a_1,a_2)=4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4.[/math](6)

When the condition of Eq.(5) is applied to the example function of Figure 3, the following equations are obtained:

[math]!\frac{\partial}{\partial a_1}(4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4)=8a_1-8+2a_2=0.[/math](7a)

[math]!\frac{\partial}{\partial a_2}(4a_1^2+4a_2^2-8a_1-8a_2+2a_1a_2+4)=8a_2-8+2a_1=0.[/math](7b)
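The partial derivatives in Eq.(7a) and Eq.(7b) can be checked numerically with central finite differences. In the sketch below, the step [math]h=10^{-5}[/math] and the test point [math](1.5,-0.5)[/math] are arbitrary choices. For a quadratic such as Eq.(6), the central difference matches the exact derivative up to rounding error.

```python
def S(a1, a2):
    """Eq.(6)."""
    return 4 * a1**2 + 4 * a2**2 - 8 * a1 - 8 * a2 + 2 * a1 * a2 + 4


def dS_da1(a1, a2):
    return 8 * a1 - 8 + 2 * a2   # Eq.(7a)


def dS_da2(a1, a2):
    return 8 * a2 - 8 + 2 * a1   # Eq.(7b)


def central_difference(func, a1, a2, h=1e-5):
    """Approximate both partial derivatives of func with central differences."""
    d1 = (func(a1 + h, a2) - func(a1 - h, a2)) / (2 * h)
    d2 = (func(a1, a2 + h) - func(a1, a2 - h)) / (2 * h)
    return d1, d2


point = (1.5, -0.5)  # arbitrary test point
numeric = central_difference(S, *point)
analytic = (dS_da1(*point), dS_da2(*point))
print(all(abs(n - a) < 1e-6 for n, a in zip(numeric, analytic)))  # True
```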

Dividing each equation by 2 gives the following system of linear equations:

[math]!4a_1+a_2=4.[/math](8a)

[math]!a_1+4a_2=4.[/math](8b)
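A 2×2 system such as Eq.(8) can be solved by Cramer's rule. The sketch below uses Python's `fractions.Fraction` so that the arithmetic is exact; the helper name `solve_2x2` is hypothetical, introduced only for illustration.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the system has no unique solution")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y


# Eq.(8a): 4*a1 + a2 = 4;  Eq.(8b): a1 + 4*a2 = 4
a1, a2 = solve_2x2(4, 1, 1, 4, 4, 4)
print(a1, a2)  # 4/5 4/5
```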

The solution of the system of linear equations of Eq.(8) is [math]a_1=0.8[/math] and [math]a_2=0.8[/math], i.e., the point [math](0.8,0.8)[/math] on the horizontal plane formed by the x-axis and the y-axis in Figure 3. So the exact location of the minimum is the point [math](0.8,0.8)[/math], which lies inside the innermost red contour line at level -0.1 in the lower part of Figure 3. The exact value of the minimum is found by substituting 0.8 for [math]a_1[/math] and 0.8 for [math]a_2[/math] in Eq.(6):

[math]!4\times(0.8)^2+4\times(0.8)^2-8\times(0.8)-8\times(0.8)+2\times(0.8)\times(0.8)+4=-2.4.[/math]
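The arithmetic above can be repeated exactly with fractions. A second-derivative test also confirms that the stationary point is a minimum rather than a maximum or a saddle point: the Hessian of Eq.(6) is the constant matrix with rows [math](8,2)[/math] and [math](2,8)[/math], whose leading entry and determinant are both positive. This sketch is added for illustration and is not part of the original derivation.

```python
from fractions import Fraction


def S(a1, a2):
    """Eq.(6)."""
    return 4 * a1**2 + 4 * a2**2 - 8 * a1 - 8 * a2 + 2 * a1 * a2 + 4


a1 = a2 = Fraction(4, 5)       # the stationary point found from Eq.(8)
minimum = S(a1, a2)
print(minimum)                 # -12/5, i.e. -2.4

# Second-derivative test. The Hessian of Eq.(6) is constant:
# d2S/da1^2 = 8, d2S/da1da2 = 2, d2S/da2^2 = 8.
h11, h12, h22 = 8, 2, 8
is_minimum = h11 > 0 and h11 * h22 - h12 * h12 > 0
print(is_minimum)              # True
```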

The minimum value of -2.4 is below the level -0.1, which confirms that the minimum point lies inside the region enclosed by the -0.1 contour line.

A general multivariable linear function is formally stated as

[math]!f(x_1,x_2,...,x_n)=a_0+a_1x_1+a_2x_2+...+a_nx_n.[/math](9)

The coefficients [math]a_0,a_1,a_2,...,a_n[/math] need to be estimated using Linear Least Square Error Regression from a set of measurements of the random variables [math]x_1,x_2,...,x_n[/math] and the corresponding measurements of [math]f(x_1,x_2,...,x_n)[/math]. The theory explained in the previous post is extended here to deal with the multi variable function of Eq.(9). As was done in the previous post, let the measurement of [math]f(x_1,x_2,...,x_n)[/math] be denoted by [math]\hat{f}(x_1,x_2,...,x_n)[/math].

The Sum of Square Error function is then

[math]!S(a_0,a_1,a_2,...,a_n)=\sum_{j=1}^M[\hat{f}(x_{1j},x_{2j},...,x_{nj})-a_0-a_1x_{1j}-a_2x_{2j}-...-a_nx_{nj}]^2.[/math](10)
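Eq.(10) translates directly into code. In this sketch (names are illustrative), `coeffs` holds [math][a_0,a_1,...,a_n][/math], `xs[j]` holds the j-th measured input [math](x_{1j},...,x_{nj})[/math], and `fs[j]` holds the corresponding measurement. The data are hypothetical and generated without noise from [math]f=1+2x_1-3x_2+0.5x_3[/math], so the error is zero at the true coefficients.

```python
def sum_square_error(coeffs, xs, fs):
    """Eq.(10): coeffs = [a0, a1, ..., an], xs[j] = [x_1j, ..., x_nj], fs[j] = f_hat."""
    a0, rest = coeffs[0], coeffs[1:]
    total = 0.0
    for x_row, f in zip(xs, fs):
        prediction = a0 + sum(a * x for a, x in zip(rest, x_row))
        total += (f - prediction) ** 2
    return total


# Hypothetical noise-free measurements of f = 1 + 2*x1 - 3*x2 + 0.5*x3.
xs = [[0, 0, 0], [1, 0, 2], [0, 1, 4], [2, 3, 1]]
fs = [1 + 2 * x1 - 3 * x2 + 0.5 * x3 for x1, x2, x3 in xs]

print(sum_square_error([1, 2, -3, 0.5], xs, fs))  # 0.0 at the true coefficients
print(sum_square_error([0, 2, -3, 0.5], xs, fs))  # 4.0: every residual is 1
```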

The case [math]n=2[/math] (with [math]a_0=0[/math]) was explained at the beginning of this post and illustrated in Figure 3 to give an understanding of the graph of [math]S(a_0,a_1,a_2,...,a_n)[/math] of Eq.(10); for larger [math]n[/math], this graph needs more than 3 dimensions and cannot be displayed.

The extension of the conditional expression of Eq.(5) is

[math]!\frac{\partial S(a_0,a_1,a_2,...,a_n)}{\partial a_i}=0,\text{ for } i=0,1,2,...,n.[/math](11)

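To treat [math]a_0[/math] in the same way as the other coefficients, define [math]x_{0j}=1[/math] for every [math]j[/math], so that [math]a_0=a_0x_{0j}[/math]. Substituting the RHS of Eq.(10) into Eq.(11), and writing the [math]j[/math]-th measured input [math](x_{1j},x_{2j},...,x_{nj})[/math] as [math]\overrightarrow{x}_j[/math], we obtain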

[math]!\frac{\partial S(a_0,a_1,a_2,...,a_n)}{\partial a_i}=\frac{\partial}{\partial a_i}\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-a_2x_{2j}-...-a_nx_{nj}]^2=0.[/math]

[math]!\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_nx_{nj}]\frac{\partial}{\partial a_i}[\hat{f}(\overrightarrow{x}_j)-a_0-...-a_ix_{ij}-...-a_nx_{nj}]=0.[/math]

[math]!\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_nx_{nj}][\frac{\partial}{\partial a_i}\hat{f}(\overrightarrow{x}_j)-\frac{\partial}{\partial a_i}a_0-...-\frac{\partial}{\partial a_i}a_ix_{ij}-...-\frac{\partial}{\partial a_i}a_nx_{nj}]=0.[/math]

[math]!\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_ix_{ij}-...-a_nx_{nj}][0-0-...-0-x_{ij}-0-...-0]=0.[/math]

[math]!\Rightarrow 2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}...-a_ix_{ij}-...-a_nx_{nj}](-x_{ij})=0.[/math]

[math]!\Rightarrow -2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}-a_0x_{ij}-a_1x_{1j}x_{ij}-...-a_ix^2_{ij}-...-a_nx_{nj}x_{ij}]=0.[/math]

Dividing both sides of the above expression by -2 and writing the sum of each term separately gives

[math]!\Rightarrow \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}]-\sum_{j=1}^M[a_0x_{ij}]-\sum_{j=1}^M[a_1x_{1j}x_{ij}]-...-\sum_{j=1}^M[a_ix^2_{ij}]-...-\sum_{j=1}^M[a_nx_{nj}x_{ij}]=0.[/math]

[math]!\Rightarrow a_0\sum_{j=1}^Mx_{ij}+a_1\sum_{j=1}^M(x_{1j}x_{ij})+...+a_i\sum_{j=1}^Mx^2_{ij}+...+a_n\sum_{j=1}^M(x_{nj}x_{ij})=\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{ij}].[/math]
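The derivation above says that [math]\partial S/\partial a_i=-2\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)-a_0-a_1x_{1j}-...-a_nx_{nj}]x_{ij}[/math], where for [math]i=0[/math] the factor [math]x_{0j}[/math] is taken to be 1. The sketch below checks this gradient formula against central finite differences; the data, coefficients, and step size are hypothetical choices for illustration.

```python
def residuals(coeffs, xs, fs):
    """f_hat(x_j) - a0 - a1*x_1j - ... - an*x_nj for every measurement j."""
    return [f - coeffs[0] - sum(a * x for a, x in zip(coeffs[1:], row))
            for row, f in zip(xs, fs)]


def sse(coeffs, xs, fs):
    """Eq.(10)."""
    return sum(r * r for r in residuals(coeffs, xs, fs))


def gradient(coeffs, xs, fs):
    """dS/da_i = -2 * sum_j residual_j * x_ij, with x_0j = 1 for a0."""
    res = residuals(coeffs, xs, fs)
    grad = [-2 * sum(res)]                       # i = 0, since x_0j = 1
    for i in range(len(coeffs) - 1):             # i = 1, ..., n
        grad.append(-2 * sum(r * row[i] for r, row in zip(res, xs)))
    return grad


def numeric_gradient(coeffs, xs, fs, h=1e-6):
    """Central-difference approximation of the same gradient."""
    grad = []
    for i in range(len(coeffs)):
        up = list(coeffs)
        up[i] += h
        down = list(coeffs)
        down[i] -= h
        grad.append((sse(up, xs, fs) - sse(down, xs, fs)) / (2 * h))
    return grad


# Hypothetical data with n = 2 input variables and M = 4 measurements.
xs = [[0.5, 1.0], [1.5, -0.5], [2.0, 2.0], [-1.0, 0.5]]
fs = [1.0, 2.5, 4.0, -0.5]
coeffs = [0.2, 0.7, -0.3]
analytic = gradient(coeffs, xs, fs)
numeric = numeric_gradient(coeffs, xs, fs)
print(max(abs(a - b) for a, b in zip(analytic, numeric)) < 1e-5)  # True
```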

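Writing the last equation of the derivation for each [math]i=0,1,2,...,n[/math], with [math]x_{0j}=1[/math] in the equation for [math]i=0[/math] (so that [math]\sum_{j=1}^Mx_{0j}=M[/math]), gives the following matrix equation.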

[math]!\left(\begin{matrix}M & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{matrix}\right)\times\begin{pmatrix}a_0 \\ a_1 \\ \vdots \\ a_n\end{pmatrix}=\begin{pmatrix}\sum_{j=1}^M\hat{f}(\overrightarrow{x}_j) \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}] \\ \vdots \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{pmatrix}.[/math](12)
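The matrix and right-hand side of Eq.(12) can be assembled directly from the data. This sketch prepends a column of ones ([math]x_{0j}=1[/math]) so that row and column 0 correspond to [math]a_0[/math]; the function name `normal_equations` and the three data points are illustrative. The resulting matrix is symmetric, since the entry in row [math]i[/math] and column [math]k[/math] is [math]\sum_{j=1}^Mx_{ij}x_{kj}[/math].

```python
def normal_equations(xs, fs):
    """Return (A, b) of Eq.(12).

    xs[j] = [x_1j, ..., x_nj] and fs[j] is the measurement f_hat(x_j).
    A column of ones (x_0j = 1) is prepended so that row/column 0
    belongs to the constant term a0.
    """
    rows = [[1.0] + [float(x) for x in row] for row in xs]  # x_0j, ..., x_nj
    size = len(rows[0])
    A = [[sum(r[i] * r[k] for r in rows) for k in range(size)]
         for i in range(size)]
    b = [sum(r[i] * f for r, f in zip(rows, fs)) for i in range(size)]
    return A, b


A, b = normal_equations([[1, 0], [0, 1], [1, 1]], [1, 2, 3])
print(A)  # [[3.0, 2.0, 2.0], [2.0, 2.0, 1.0], [2.0, 1.0, 2.0]]
print(b)  # [6.0, 4.0, 5.0]
```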

Solving the matrix equation of Eq.(12) gives the desired estimates of [math]a_0,a_1,a_2,...,a_n[/math]. The solution involves finding the inverse of the [math](n+1)\times(n+1)[/math] matrix

[math]!\left(\begin{matrix}M & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{matrix}\right).[/math](13)

Once the inverse is found (it exists only when the matrix is nonsingular), multiplying both sides of Eq.(12) on the left by the inverse matrix of Eq.(13) gives the solution

[math]!\begin{pmatrix}a_0 \\ a_1 \\ \vdots \\ a_n\end{pmatrix}=\begin{pmatrix}M & \sum_{j=1}^Mx_{1j} & \ldots & \sum_{j=1}^Mx_{nj} \\ \sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \vdots & \vdots & \ddots & \vdots \\ \sum_{j=1}^Mx_{nj} & \sum_{j=1}^Mx_{1j}x_{nj} & \ldots & \sum_{j=1}^M(x_{nj})^2\end{pmatrix}^{-1}\times\begin{pmatrix}\sum_{j=1}^M\hat{f}(\overrightarrow{x}_j) \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}] \\ \vdots \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{pmatrix}.[/math](14)
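Putting the pieces together, the sketch below estimates [math]a_0,a_1,...,a_n[/math] following Eq.(14). It builds the matrix of Eq.(13) (with [math]x_{0j}=1[/math]), inverts it by Gauss-Jordan elimination with partial pivoting (a standard method chosen here for illustration), and multiplies the inverse by the right-hand side vector. The data are hypothetical and generated without noise from [math]f=1+2x_1-3x_2[/math], so the fit should recover these coefficients up to rounding.

```python
def normal_equations(xs, fs):
    """Matrix of Eq.(13) and right-hand side of Eq.(12), using x_0j = 1."""
    rows = [[1.0] + [float(x) for x in row] for row in xs]
    size = len(rows[0])
    A = [[sum(r[i] * r[k] for r in rows) for k in range(size)]
         for i in range(size)]
    b = [sum(r[i] * f for r, f in zip(rows, fs)) for i in range(size)]
    return A, b


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment [A | I]; row operations turn it into [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == k else 0.0 for k in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular (or nearly so)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear(xs, fs):
    """Estimate [a0, a1, ..., an] as in Eq.(14): inverse(A) times b."""
    A, b = normal_equations(xs, fs)
    A_inv = invert(A)
    return [sum(a * v for a, v in zip(row, b)) for row in A_inv]


# Hypothetical noise-free data from f = 1 + 2*x1 - 3*x2.
xs = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
fs = [1 + 2 * x1 - 3 * x2 for x1, x2 in xs]
coeffs = fit_linear(xs, fs)
print([round(c, 6) for c in coeffs])  # [1.0, 2.0, -3.0]
```

Forming the explicit inverse mirrors Eq.(14). In numerical practice, solving Eq.(12) directly (for example, by Gaussian elimination or with a library least-squares routine such as `numpy.linalg.lstsq`) is usually preferred, because it is cheaper and more accurate than computing the inverse.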

In the next post, we will see how to apply the technique of linear least square error regression for linear functions of multiple variables to nonlinear functions of a single variable.
