Introduction: Linear regression is a fundamental technique in statistics and machine learning for modeling the relationship between a dependent variable and one or more independent variables. In this blog post, we’ll explore how to use the Gurobi optimization library to perform linear regression, comparing the results with the popular scikit-learn implementation.

Installing Gurobi: To get started, make sure gurobipy is installed by running the following command in a notebook cell (the leading ! runs it as a shell command):

!pip install gurobipy &> /dev/null
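
If the installation succeeded, importing the package and printing the solver version is a quick sanity check; the exact version tuple will depend on your environment.

import gurobipy as gp

# prints the installed Gurobi version as a (major, minor, technical) tuple
print(gp.gurobi.version())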

Setting up the Problem: Let’s generate some random data for our linear regression problem. We’ll create a dataset with 40 examples and 3 features, then prepend a column of ones to serve as the bias (intercept) term. The dependent variable y is a random vector, so the fit is purely illustrative.

import gurobipy as gp
import numpy as np
from gurobipy import GRB

m = 40  # number of examples
n = 3   # number of features
X = np.random.rand(m, n)
X = np.hstack((np.ones((m, 1)), X))  # add a ones column

y = np.random.randn(m, 1)

Building the Gurobi Model: Now, let’s set up the Gurobi optimization model. We declare the coefficient vector as free continuous variables, set the expanded least-squares loss as the objective, and optimize.

model = gp.Model()

# free continuous variables: one coefficient per column of X (bias + 3 features)
beta = model.addMVar(shape=n + 1, vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY)

# expanded least-squares loss: (y - X beta)^T (y - X beta)
# y is flattened to 1-D so the whole expression reduces to a scalar
yf = y.ravel()
obj = yf @ yf - 2 * yf @ X @ beta + beta @ (X.T @ X) @ beta

model.setObjective(obj, GRB.MINIMIZE)
model.optimize()

Displaying Gurobi Results: After optimization, we can display the coefficients of our linear regression model.

display(beta.X.tolist())
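
Because the objective was written as the fully expanded residual sum of squares (including the constant yf @ yf term), the optimal objective value reported by Gurobi should match the residual sum of squares computed directly from the fitted coefficients, up to solver tolerance. A quick check:

# consistency check: the optimal objective value equals the residual sum of squares
residuals = y.ravel() - X @ beta.X
print(model.ObjVal, residuals @ residuals)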

Scikit-learn Comparison: Next, we compare the Gurobi results with scikit-learn’s linear regression implementation.

from sklearn.linear_model import LinearRegression

clf = LinearRegression()
# drop the ones column: LinearRegression fits its own intercept
clf.fit(X[:, 1:], y)
# assemble [intercept, coef_1, coef_2, coef_3] to mirror beta's ordering
parameters = [clf.intercept_[0]] + clf.coef_.flatten().tolist()
display(parameters)
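
Since both approaches minimize the same least-squares objective, the two coefficient vectors should agree up to numerical tolerance. A quick check (the tolerance is a loose choice meant to absorb solver differences):

# Gurobi and scikit-learn coefficients should match up to solver tolerance
print(np.allclose(beta.X, parameters, atol=1e-4))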

Visualizing the Results: Finally, let’s visualize the actual vs. predicted values using matplotlib.

import matplotlib.pyplot as plt

x_axis = X[:, 1:].mean(axis=1)      # collapse the three features onto a single axis for plotting
order = np.argsort(x_axis)          # sort so the prediction line is drawn left to right
y_hat = X @ np.asarray(parameters)  # predictions from the fitted coefficients

plt.scatter(x_axis, y, label='Actual (y)')
plt.plot(x_axis[order], y_hat[order], c='red', label='Predicted (y_hat)')
plt.legend(loc='lower right')
plt.show()

Conclusion: In this blog post, we’ve explored how to implement linear regression using the Gurobi optimization library and compared the results with scikit-learn. Gurobi provides a powerful tool for optimization problems, and its use extends beyond linear regression to more complex models and scenarios.

Appendix: Derivation and Interpretation of the Linear Regression Objective Function

The expression \( (y - X B)^T (y - X B) \) is the sum of squared residuals in linear regression. In this section, we derive and interpret this expression.

The linear regression objective function, denoted as \( J(B) \), is conventionally defined as the sum of squared residuals:

\[ J(B) = (y - X B)^T (y - X B) \]

Here, \( J(B) \) signifies the objective function to be minimized, \( B \) is the vector of coefficients, \( X \) is the input matrix, and \( y \) is the target vector.

Let’s dissect the expression \( (y - X B)^T (y - X B) \) step by step:

\[ (y - X B)^T (y - X B) = (y^T - B^T X^T) (y - X B) \]

Expanding the product:

\[ y^T y - y^T X B - B^T X^T y + B^T X^T X B \]

Since \( y^T X B \) is a scalar, it equals its own transpose \( B^T X^T y \), so the two middle terms combine:

\[ y^T y - 2 B^T X^T y + B^T X^T X B \]

This expression is exactly the sum of squared residuals. In linear regression, the goal is to minimize it: the optimal coefficients \( B \) are obtained by differentiating \( J(B) \) with respect to \( B \), setting the gradient to zero, and solving for \( B \).
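
Carrying out that differentiation (assuming \( X^T X \) is invertible) yields the normal equations and the familiar closed-form solution:

\[ \nabla_B J(B) = -2 X^T y + 2 X^T X B = 0 \quad \Rightarrow \quad B = (X^T X)^{-1} X^T y \]

The Gurobi solution should coincide with this closed-form answer up to solver tolerance.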

In the provided Gurobi code snippet, this objective appears in its expanded form:

obj = yf @ yf - 2 * yf @ X @ beta + beta @ (X.T @ X) @ beta

which is exactly \( y^T y - 2 B^T X^T y + B^T X^T X B \). The constant term \( y^T y \) does not affect the minimizer, but including it makes the optimal objective value equal to the residual sum of squares. Gurobi determines the values of \( B \) (the beta variables) that minimize this objective, aligning with the fundamental principles of linear regression optimization.

The optimization problem represented by the objective function \( (y - X B)^T (y - X B) \) is a quadratic optimization problem. Specifically, it is an unconstrained convex quadratic program, since the quadratic term involves \( X^T X \), which is positive semidefinite.

In a quadratic optimization problem, the objective function is quadratic, and the goal is to find the values of the decision variables (in this case, the coefficients \( B \)) that minimize or maximize the quadratic expression subject to any constraints.

In the given linear regression objective function, the term \( (y - X B)^T (y - X B) \) is a quadratic expression in terms of the coefficients \( B \). The optimization task is to adjust the values of \( B \) to minimize the sum of squared residuals, which is a fundamental objective in linear regression.
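
One practical advantage of formulating the regression in Gurobi is that constraints can be added directly to the same model. As a rough sketch (not part of the comparison above), here is how non-negativity constraints on the feature coefficients might be imposed, reusing the X and y defined earlier; the variable names are illustrative:

# sketch: least squares with non-negative feature coefficients
# (the bias term beta_c[0] is left unconstrained)
con_model = gp.Model()
beta_c = con_model.addMVar(shape=n + 1, lb=-GRB.INFINITY)
con_model.addConstr(beta_c[1:] >= 0)  # constrain the three feature coefficients

yf = y.ravel()
con_model.setObjective(
    yf @ yf - 2 * yf @ X @ beta_c + beta_c @ (X.T @ X) @ beta_c, GRB.MINIMIZE
)
con_model.optimize()
print(beta_c.X.tolist())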