This version of the tutorial uses Python and requires two Python libraries: Scikit-Learn and Matplotlib.
Start by creating a new Python project and placing the helper.py file in the same folder. This file contains some simple functions that are not specific to Seldonian algorithms, but which will be useful to us:
Function Name | Description |
---|---|
tinv | tinv(p, nu) returns the 100(1-p) percentile of the Student's t-distribution with nu degrees of freedom. |
stddev | stddev(v) returns the sample standard deviation of the vector v, with Bessel's correction: \(\operatorname{stddev}(v) = \sqrt{\frac{1}{|v|-1} \sum_{i=1}^{|v|}(v_i-\operatorname{mean}(v))^2}\), where \(\operatorname{mean}(v)=\frac{1}{|v|}\sum_{i=1}^{|v|} v_i\) is the sample mean of v. |
ttestUpperBound | ttestUpperBound(v, delta) returns a (1-delta)-confidence upper bound on the mean of the distribution that generated the i.i.d. values in the vector v. It assumes that \(\operatorname{mean}(v)\) is normally distributed. It is used when checking whether each of the behavioral constraints passes the safety test. |
predictTTestUpperBound | predictTTestUpperBound(v, delta, k) works like ttestUpperBound, but returns a more conservative upper bound. This makes the Seldonian algorithm less confident that a given candidate solution is safe, and thus makes the generated candidate solutions more conservative. Such behavior helps when searching for candidate solutions that are likely to pass the safety test. This function uses the data in the vector v to compute all relevant statistics (mean and standard deviation), but assumes that the number of points being analyzed is k instead of \(|v|\). |
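The contents of helper.py are not reproduced in this tutorial. If you want a sense of what these functions might look like, the following minimal sketch is consistent with the descriptions above, but it is only an illustration and may differ from the actual helper.py. It assumes that scipy.stats is available (SciPy is installed alongside Scikit-Learn) and that predictTTestUpperBound inflates the standard-deviation term by a factor of two as its way of being more conservative:

```python
# A minimal sketch consistent with the helper.py descriptions above (not the official file).
import numpy as np
from scipy.stats import t


def tinv(p, nu):
    # 100(1-p) percentile of the Student's t-distribution with nu degrees of freedom
    return t.ppf(1.0 - p, nu)


def stddev(v):
    # Sample standard deviation with Bessel's correction
    return np.std(v, ddof=1)


def ttestUpperBound(v, delta):
    # (1-delta)-confidence upper bound on the mean, assuming mean(v) is normally distributed
    n = v.size
    return np.mean(v) + stddev(v) / np.sqrt(n) * tinv(delta, n - 1)


def predictTTestUpperBound(v, delta, k):
    # Like ttestUpperBound, but more conservative: the statistics come from v, while the
    # bound is computed as though only k points will be available. The factor of 2 below
    # is an assumption, one simple way to make the bound more conservative.
    return np.mean(v) + 2.0 * stddev(v) / np.sqrt(k) * tinv(delta, k - 1)
```

Note that main.py below relies on helper.py importing NumPy as np, since it only contains the line from helper import *.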
Now that you have access to these basic functions, create your main file, main.py, and copy in the following code:
```python
from helper import *

if __name__ == "__main__":
    np.random.seed(0)                   # Create the random number generator to use, with seed zero
    numPoints = 5000                    # Let's use 5000 points
    (X, Y) = generateData(numPoints)    # Generate the data

    # Create the behavioral constraints - each is a gHat function and a confidence level delta
    gHats = [gHat1, gHat2]              # The 1st gHat requires MSE < 2.0. The 2nd gHat requires MSE > 1.25
    deltas = [0.1, 0.1]

    (result, found) = QSA(X, Y, gHats, deltas)  # Run the Quasi-Seldonian algorithm
    if found:
        print("A solution was found: [%.10f, %.10f]" % (result[0], result[1]))
        print("fHat of solution (computed over all data, D):", fHat(result, X, Y))
    else:
        print("No solution found")
```
The main block above is set up to run a simple experiment. Read through it and notice that it relies on several things that we still need to write:
- generateData will be a function that generates a data set for us to run the algorithm on.
- gHat1 will be \(\hat g_1\) and gHat2 will be \(\hat g_2\).
- QSA will be our quasi-Seldonian algorithm. The pair of objects returned by QSA is the solution (first element) and a Boolean flag indicating whether a solution that satisfies all behavioral constraints was found (second element); a sketch of this expected interface follows this list.
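To make that interface concrete, here is a hypothetical placeholder showing the call signature and return value that the main block assumes. It is not the algorithm itself; the real QSA is implemented later in this tutorial:

```python
# Hypothetical placeholder illustrating the interface assumed by the main block above.
# The real quasi-Seldonian algorithm, QSA, is implemented later in this tutorial.
def QSA(X, Y, gHats, deltas):
    candidateSolution = np.zeros(2)   # Placeholder solution [theta0, theta1]
    passedSafetyTest = False          # Placeholder flag; no real safety test is run here
    return (candidateSolution, passedSafetyTest)
```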
We now implement the regression problem that we defined earlier in the tutorial. The following functions should all be placed at the beginning of the file main.py.
First, let us write the generateData function, which samples data as described in the problem description:
```python
# Generate numPoints data points
def generateData(numPoints):
    X = np.random.normal(0.0, 1.0, numPoints)      # Sample x from a standard normal distribution
    Y = X + np.random.normal(0.0, 1.0, numPoints)  # Set y to be x, plus noise from a standard normal distribution
    return (X, Y)
```
Next, we write the function that takes in a solution \(\theta\) and an input \(X\), and produces as output the prediction of \(Y\). In other words, this function will implement \(\hat y(X,\theta)\):
```python
# Uses the weights in theta to predict the output value, y, associated with the provided x.
# This function assumes we are performing linear regression, so that theta has
# two elements: the y-intercept (first parameter) and the slope (second parameter)
def predict(theta, x):
    return theta[0] + theta[1] * x
```
Next, we write the function \(\hat f\), which specifies our primary objective: to minimize the sample mean squared error. Since we are attempting to maximize \(\hat f\), however, we need to return the negative sample mean squared error, so that maximizing \(\hat f\) corresponds to minimizing the sample mean squared error:
```python
# Estimator of the primary objective, in this case the negative sample mean squared error
def fHat(theta, X, Y):
    n = X.size          # Number of points in the data set
    res = 0.0           # Used to store the sample MSE we are computing
    for i in range(n):  # For each point X[i] in the data set ...
        prediction = predict(theta, X[i])                 # Get the prediction using theta
        res += (prediction - Y[i]) * (prediction - Y[i])  # Add the squared error to the result
    res /= n            # Divide by the number of points to obtain the sample mean squared error
    return -res         # Return the negative sample mean squared error
```
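The explicit loop above is easy to follow, but the same computation can be written with vectorized NumPy operations, which is typically much faster for large data sets. The function below is a sketch of this alternative and is not part of the tutorial's code; fHatVectorized is a name introduced here for illustration:

```python
# Vectorized alternative to fHat (a sketch, not part of the tutorial's code)
def fHatVectorized(theta, X, Y):
    predictions = theta[0] + theta[1] * X    # Predictions for every point at once
    return -np.mean((predictions - Y) ** 2)  # Negative sample mean squared error
```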
Next, we write the functions \(\hat g_1\) and \(\hat g_2\) that will be provided as input to the Seldonian algorithm. Recall from the Simple Problem tutorial that for every \(j\): $$ \hat g_{1,j}(\theta,D) = (\hat y(X_j,\theta)-Y_j)^2-2.0, $$ and $$ \hat g_{2,j}(\theta,D) = 1.25-(\hat y(X_j,\theta)-Y_j)^2. $$ Compare these definitions to the code below.
```python
# Returns unbiased estimates of g_1(theta), computed using the provided data
def gHat1(theta, X, Y):
    n = X.size         # Number of points in the data set
    res = np.zeros(n)  # We will get one estimate per point; initialize res to store these estimates
    for i in range(n):
        prediction = predict(theta, X[i])                   # Compute the prediction for the i-th data point
        res[i] = (prediction - Y[i]) * (prediction - Y[i])  # Compute the squared error for the i-th data point
    res = res - 2.0    # We want the MSE to be less than 2.0, so g(theta) = MSE - 2.0
    return res


# Returns unbiased estimates of g_2(theta), computed using the provided data
def gHat2(theta, X, Y):
    n = X.size         # Number of points in the data set
    res = np.zeros(n)  # We will get one estimate per point; initialize res to store these estimates
    for i in range(n):
        prediction = predict(theta, X[i])                   # Compute the prediction for the i-th data point
        res[i] = (prediction - Y[i]) * (prediction - Y[i])  # Compute the squared error for the i-th data point
    res = 1.25 - res   # We want the MSE to be at least 1.25, so g(theta) = 1.25 - MSE
    return res
```
Later in this tutorial we will want the ordinary least-squares solution to be used as a starting point during the search for a candidate solution. The following code implements least squares linear regression:
```python
from sklearn.linear_model import LinearRegression  # Needed for leastSq; place this import at the top of main.py


# Run ordinary least squares linear regression on the data (X, Y)
def leastSq(X, Y):
    X = np.expand_dims(X, axis=1)       # Places the input data in a matrix
    Y = np.expand_dims(Y, axis=1)       # Places the output data in a matrix
    reg = LinearRegression().fit(X, Y)
    theta0 = reg.intercept_[0]          # Gets theta0, the y-intercept coefficient
    theta1 = reg.coef_[0][0]            # Gets theta1, the slope coefficient
    return np.array([theta0, theta1])
```
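As a quick sanity check, you can fit the least-squares solution to a generated data set and evaluate the primary objective. This snippet is not part of the tutorial's code, but given how the data are generated, the fitted parameters should be close to an intercept of 0 and a slope of 1, and the negative sample MSE should be close to -1.0 (the variance of the noise):

```python
# Quick sanity check (a sketch, not part of the tutorial's code)
np.random.seed(0)
(X, Y) = generateData(5000)
thetaLS = leastSq(X, Y)                                    # Should be close to [0.0, 1.0]
print("Least-squares solution:", thetaLS)
print("fHat (negative sample MSE):", fHat(thetaLS, X, Y))  # Should be close to -1.0
```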
We now have all of the libraries and functions needed to implement the problem that we specified earlier, so we're ready to start writing our Seldonian algorithm. From here it's easy: by line count, you've already written most of the code.