• Home
    • English
    • News
    • Blog
    • What is Nonlinear Regression (Part III):Linear Least Square Error Regression of Single Variable Nonlinear Function

    July 16, 2013
    A simple form of single variable nonlinear function is a quadratic polynomial function expressed as


    Suppose this is a general model of throughput as a function [math]f(x)[/math] of the population or the number of concurrent users in a server as [math]x[/math]. Without knowing the constants [math]a[/math]and [math]b[/math], the model cannot be used. If a set of measurements


    is available then the linear least square error regression can be used to estimate [math]a[/math]and [math]b[/math]of Eq.(1) from the measurements. Here [math]\hat{f}(x_{j})[/math]is the measurement and distinguished from [math]f(x)[/math]by the 'hat' over [math]f[/math].

    In the previous posts (part I and part II), the functions were all linear, meaning that the exponent or the power of the independent variable (i.e. [math]x[/math]in Eq.(1)) was 1. How do you apply the linear least square error regression to a nonlinear function which has a value 2 as exponent or power of the independent variable? This obstacle is overcome by using the definition of random variable in statistics.

    A formal definition of random variable states that "a random variable is real valued function defined on a sample space" in page 34 of "Probability Statistics and Queuing Theory with Computer Science Applications" by Arnold O Allen. Since, both the throughput and the number of concurrent users are random variables on the sample space or the set defined in Eq.(2). A new random variable [math]x_2[/math]can be introduced as


    The random variable [math]x[/math]in Eq.(1) is same as the sample space. Replacing [math]x^2[/math]of Eq.(1) by [math]x_2[/math]of Eq.(3), the function [math]f(x)[/math]is rewritten as linear function of two random variables


    The methodology of previous post can be applied to the function of Eq.(4) to get the estimates of [math]a[/math]and [math]b[/math]. This approach can be extended to any polynomial function of the form


    by turning each term of the series polynomial of above Eq.(5) into a new random variable to get a multi variable linear function


    where [math]\tilde{x}_i=x^{\frac{1}{i}}[/math]for [math]i=2,3,4,...,n[/math]and [math]x_j=x^j[/math]for [math]j=2,3,...,m[/math].

    A somewhat different example is a function of exponential form


    To be able to estimate [math]a[/math]and [math]b[/math]by linear least square error regression, the function has to be transformed into a linear function of new random variables

    [math]!\ln[f(x)]=\ln[ae^{bx}]=\ln[a]+\ln[e^{bx}]\text{ (because } \ln[p\times q]=\ln[p]+\ln[q]\text{)}[/math]

    [math]!\Rightarrow \ln[f(x)]=\ln[a]+bx\ln[e]\text{ (because }\ln[g^k]=k\ln[g]\text{)}[/math]

    [math]!\Rightarrow \ln[f(x)]=\ln[a]+bx\text{ (because }\ln[e]=1\text{)}[/math](8)

    Define a new function or random variable


    and a symbolic substitution of the constant


    Using the new constant of Eq.(11) and the new function of Eq.(10), the expression in Eq.(9) can be written as


    Using the methodology of the first post of this series for single variable linear function, the estimates of [math]b_0[/math]and [math]b[/math]of Eq.(12) can be found from a sample space of [math]\{(x_k,\hat{f}(x_k)):k=1,2,3,...,M\}[/math]. The actual estimate of [math]a[/math]can be found from the estimate of [math]b_0[/math]using the inverse relation of Eq.(11) as shown below


    A more complicated but real life example is to estimate the constants [math]\sigma[/math]and [math]\kappa[/math]of Universal Scalability Law

    [math]C(n)=\frac{n}{1+\sigma (n-1)+\kappa n(n-1)}[/math](12)

    where [math]C(n)[/math]is the scalability and [math]n[/math]is the number of concurrent users. The measured data is in Table 1. It is not obvious what new random variables to define such that the constants [math]\sigma[/math]and [math]\kappa[/math]of Eq.(12) can be estimated from the data of Table 1 using the linear least square error regression. The process will be explained in two stages. The first stage was initially developed by Dr. Neil Gunther in his book, titled "Guerrilla Capacity Planning''. The terms of the formula of Eq.(12) can be rearranged as

    [math]!C(n)=\frac{n}{1+\sigma (n-1)+\kappa n(n-1)}.[/math]

    [math]!\Rightarrow\frac{C(n)}{n}=\frac{1}{1+\sigma (n-1)+\kappa n(n-1)}\text{ [divide by }n\text{ on both sides]}.[/math]

    [math]!\Rightarrow\frac{n}{C(n)}=1+\sigma (n-1)+\kappa n(n-1)\text{ [equating the reciprocals of both sides]}.[/math]

    [math]!\Rightarrow\frac{n}{C(n)}-1=\sigma (n-1)+\kappa n(n-1)\text{ [subtracting 1 from both sides]}.[/math]

    [math]!\Rightarrow\frac{n}{C(n)}-1=\sigma (n-1)+\kappa (n-1+1)(n-1)\text{ [writing }n=n-1+1\text{ on the right hand side]}.[/math]

    [math]!\Rightarrow\frac{n}{C(n)}-1=\sigma (n-1)+\kappa (n-1)^2+\kappa(n-1)\text{ [opening the parenthesis of }(n-1+1)\text{]}.[/math](13)

    Define new random variables



    Rewrite Eq.(13) in terms of [math]X[/math]and [math]Y[/math]using the definitions of Eq.(14) and Eq.(15)

    [math]!\Rightarrow Y=\sigma X+\kappa X^2+\kappa X.[/math](16)

    The first stage is complete at this point.

    The second stage is from the article, titled "Mythbuster for the Guerrillas" that has been published in the proceedings of CMG 2012 International Conference. In this stage, rearrange the expression of Eq.(16) further into a new expression

    [math]!\Rightarrow Y=\sigma X+\kappa (X^2 + X).[/math](17)

    Define a new random variable [math]X_1[/math] as


    In terms of [math]X_1[/math] and [math]X[/math], the expression of Eq.(17) using the definition of Eq.(18) is

    [math]!f(X)= Y=\sigma X+\kappa X_1.[/math](19)
    From the data of Table 1, compute [math]X_1[/math]using the following expression derived by combining Eq.(18) and Eq.(15)


    Similarly, from the data of Table 1 compute [math]Y[/math]using the definition of Eq.(14) and compute [math]X[/math]using the definition of Eq.(15)

    [math]!Y=\frac{n}{C(n)}-1, X=n-1.[/math]

    The values of [math]Y,X[/math]and [math]X_1[/math]corresponding to the values of [math]n[/math]is also included in Table 1.

    The matrix formula to estimate coefficients of a multi-variable linear function that was derived in Eq.(12) of the previous post is

    [math]!\left(\begin{matrix}\sum_{j=1}^Mx_{1j} & \sum_{j=1}^M(x_{1j})^2 & \ldots & \sum_{j=1}^Mx_{1j}x_{nj} \\ \sum_{j=1}^Mx_{2j} & \sum_{j=1}^Mx_{1j}x_{2j} & \ldots & \sum_{j=1}^Mx_{2j}x_{nj} \\ \vdots & \vdots & \ddots &\vdots\\\sum_{j=1}^Mx_{nj}&\sum_{j=1}^Mx_{1j}x_{nj} &\ldots & \sum_{j=1}^M(x_{nj})^2\end{matrix}\right)\times\begin{pmatrix}a_0 \\ a_1 \\ \vdots \\ a_n\end{pmatrix}=\begin{pmatrix}\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}] \\ \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{2j}] \\ \vdots\cr \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{pmatrix}.[/math]

    In the above matrix, the left most column of the left hand side matrix is for a constant term, [math]a_0[/math]. Since there is no constant term in Eq.(19) that column should be removed and get a reduced column matrix equation as following

    [math]!\left(\begin{matrix}\sum_{j=1}^M(x_{1j})^2&\ldots&\sum_{j=1}^Mx_{1j}x_{nj}\cr\sum_{j=1}^Mx_{1j}x_{2j} &\ldots&\sum_{j=1}^Mx_{2j}x_{nj}\cr\vdots & \ddots &\vdots\cr\sum_{j=1}^Mx_{1j}x_{nj} &\ldots &\sum_{j=1}^M(x_{nj})^2\end{matrix}\right)\left(\begin{matrix}a_1\cr \vdots\cr a_n\end{matrix}\right)=\left(\begin{matrix}\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}]\cr\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{2j}]\cr \vdots\cr \sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{nj}]\end{matrix}\right).[/math]

    There are only two random variables in the expression of Eq.(19), so [math]n=2[/math]and the the above matrix equation should have the matrix with 2 column and 2 rows on the left hand side

    [math]!\left(\begin{matrix}\sum_{j=1}^M(x_{1j})^2&\sum_{j=1}^Mx_{1j}x_{2j}\cr\sum_{j=1}^Mx_{1j}x_{2j} &\sum_{j=1}^M(x_{2j})^2\end{matrix}\right)\left(\begin{matrix}a_1\cr a_2\end{matrix}\right)=\left(\begin{matrix}\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{1j}]\cr\sum_{j=1}^M[\hat{f}(\overrightarrow{x}_j)x_{2j}]\end{matrix}\right).[/math]

    The [math]x_1[/math]in the above matrix equation corresponds to [math]X[/math]and [math]x_2[/math] in the above matrix equation corresponds to [math]X_1[/math]in Eq.(19). Also, [math]\hat{f}(\overrightarrow{x}_j)[/math]in the above matrix equation corresponds to [math]Y[/math]in Eq.(19) and [math]a_1[/math]and [math]a_2[/math]in the above matrix equation correspond to [math]\sigma[/math]and [math]\kappa[/math]respectively in Eq.(19). So, the final matrix equation to be used for estimating [math]\sigma[/math]and [math]\kappa[/math]of Eq.(19) is [math]!\left(\begin{matrix}\sum_{j=1}^M(X_{j})^2&\sum_{j=1}^MX_{j}X_{1j}\cr\sum_{j=1}^MX_{j}X_{1j} &\sum_{j=1}^M(X_{1j})^2\end{matrix}\right)\left(\begin{matrix}\sigma\cr\kappa\end{matrix}\right)=\left(\begin{matrix}\sum_{j=1}^M[Y_jX_{j}]\cr\sum_{j=1}^M[Y_jX_{1j}]\end{matrix}\right).[/math]

    The above can be written as a system of two linear equations of two unknowns


    The final solutions are



    From the data in Table 2, the following sums are computed






    These sums are then used to compute the estimates of




    from the formulas of Eq.(21) and Eq.(22) respectively. Let the user load at which throughput is maximum be denoted by [math]N_{\max}[/math]. A formula for computing it has been derived by Dr.Neil Gunther as a corollary from the Universal Scalability Law.


    Substituting the estimates of [math]\sigma[/math]and  [math]\kappa[/math] found here into the formula of Eq.(23) the estimate for [math]N_{\max}[/math] is


    [caption id="attachment_771" align="alignleft" width="772" caption="Figure 1: Nonlinear model based on Universal Scalability Law with the estimates of the constants from the data of Table 1"][/caption]

    A graphical representation of the estimated Universal Scalability Law based model from the data of Table 1 is shown in Figure 1. In this figure, the red color line is the model, the boxes connected with dashed lines are the measurements. The maxim throughput is at the point where the dashed vertical line with arrow at the head at [math]n=80.[/math]

    There are some other factors that needs to be considered before trusting the result and those will be discussed in the next post.

    Category: uncategorized