AI Safety · Tutorial 4

Coding Environment Setup

This version of the tutorial will use C++. This tutorial assumes that you are already familiar with C++ and have your own toolchain/IDE. We have tested the code in this tutorial using recent versions of gcc and MSVC. This tutorial uses the Eigen library for linear algebra and uses Boost to create a C++ version of Matlab's tinv function. For instructions regarding installing Eigen, click here. For instructions regarding installing Boost, click here.

Create a new C++ project, and place HelperFunctions.hpp in the include directory. This file contains some simple functions that are not specific to Seldonian algorithms, but which will be useful to us:

Function Name	Description
tinv	`double tinv(double p, unsigned int nu)` returns the 100(1-`p`) percentile of the Student t distribution with `nu` degrees of freedom. This is a C++ implementation of Matlab's tinv function. This is the only function that uses Boost. If you have your own tinv implementation that does not use Boost, then you could use that here and avoid installing Boost.
stddev	`double stddev(const VectorXd& v)` provides the sample standard deviation of the vector `v`, with Bessel's correction: $\operatorname{stddev}(v)\sqrt{\frac{1}{\|v\|-1} \sum_{i=1}^{\|v\|}(v_i-\operatorname{mean}(v))^2}$, where $\operatorname{mean}(v)=\frac{1}{\|v\|}\sum_{i=1}^{\|v\|} v_i$ is the sample mean of `v`.
ttestUpperBound	`double ttestUpperBound(const VectorXd& v, const double& delta, const int numPoints = -1)` returns a (1-`delta`)-confidence upper bound on the mean of the distribution that generated the independent and identically distributed values in the vector `v`, assuming that $\operatorname{mean}(v)$ is normally distributed. If `numPoints` is provided as an argument, then this function predicts what its output would be if it were called with a new vector `v` that contains `numPoints` samples from the same distribution that generated `v`. This functionality will be useful when the candidate selection mechanism is predicting what the future output of the safety test will be.
CMAES	`VectorXd CMAES(...)` is an implementation of the black-box optimization algorithm CMA-ES. Its inputs are: `initialMean`: The starting point for the search. `initialSigma`: The initial standard deviation for sampling solutions around initialMean. This should generally be set slightly larger than the width of the region you hope to search. `numIterations`: The number of iterations of CMA-ES to run. Larger values often result in better solutions, but requires more computation time. `f`: The function to be minimized or maximized. This function should have inputs `theta`, `params`, and `generator`. Here `theta` is the point to be evaluated, `params` should contain any objects that the objective function requires (such as data), and `generator` should be the random number generator to use. `params`: This object will be passed through CMA-ES to all of the calls to `f`. `minimize`: If true, CMA-ES attempts to minimize `f`. If false, CMA-ES attempts to maximize f. `generator`: The random number generator to use. This implementation of CMA-ES is quite basic. It does not include restarts and is not a state-of-the-art variant like BIPOP-aCMA-ES.

Create main.cpp and copy in the following code:

// Standard includes that should come with your compiler
#include <iostream>     // For console i/o
#include <vector>       // For vectors, not including linear algebra
#include <random>       // For random number generation

// Additional library that you should have downloaded.
#include <Eigen/Dense>

#include "HelperFunctions.hpp"            // General functions not specific to our Seldonian algorithm implementation

using namespace std;                      // To avoid writing std::vector all the time
using namespace Eigen;                    // To avoid writing Eigen::VectorXd all the time

// We will place code for the Seldonian algorithm here.

// Entry point for the program
int main(int argc, char* argv[]) {
  mt19937_64 generator(0);                // Create the random number generator to use, with seed zero
  unsigned int numPoints = 5000;          // Let's use 5000 points
  vector<Point> Data = generateData(numPoints, generator);              // Generate the data

  // Create the behavioral constraints - each is a gHat function and a confidence level delta. Put these in vector objects, one element per constraint
  vector<VectorXd(*)(const VectorXd&, const vector<Point>&)> gHats(2);  // The array of gHat functions to use
  gHats[0] = gHat1;                       // This gHat requires the MSE to be less than 2.0
  gHats[1] = gHat2;                       // This gHat requires the MSE to be at least 1.25
  vector<double> deltas(2, 0.1);          // The array of confidence levels, delta, to use. Initialize with two values, both equal to 0.1.
  
  pair<VectorXd, bool> result = QSA(Data, gHats, deltas, generator);    // Run the Seldonian algorithm.
  if (result.second) cout << "A solution was found: " << result.first.transpose() << endl;
  else cout << "No solution found." << endl;
}

The code above includes the Eigen library and HelperFunctions.hpp (which in turn includes Boost). The function main(...) is set up to run a simple experiment. Read through the function main, which relies on some things that we will need to write. Point will be an object that holds one data point, generateData will be a function that generates a data set for us to run the algorithm on, gHat1 will be $\hat g_1$, gHat2 will be $\hat g_2$, and QSA will be our quasi-Seldonian algorithm. The pair of objects returned by QSA is the solution (first element) and a Boolean flag indicating whether a solution was found (second element). The first element is only guaranteed to be set if the second element is true (if a solution was found).

Problem Implementation

We now implement the problem that we defined earlier in the tutorial [go back]. The following functions should all be placed above the function int main(...). First, we define a Point object that will contain a single data point:

// We will store individual data points in this object
struct Point {
  double x;                 // Input value
  double y;                 // Output value
};

Next, we write the generateData function, which samples data as described in the problem description:

// Generate numPoints data points. Here generator is the random number generator to use
vector<Point> generateData(int numPoints, mt19937_64 & generator)
{
  vector<Point> result(numPoints);          // Create the vector of data points that we will return, of length numPoints
  normal_distribution<double> d(0, 1);      // Create a standard normal distribution (mean 0, variance 1)
  for (Point& p : result) {                 // Loop over each point p in result
    p.x = d(generator);                     // Sample x from a standard normal distribution
    p.y = p.x + d(generator);               // Set y to be x, plus noise from a standard normal distribution
  }
  return result;
}

Next, we write the function that takes in a solution $\theta$ and an input $X$, and produces as output the prediction of $Y$, i.e., this function implements $\hat y(X,\theta)$:

// Using the weights in theta, predict the y-value associated with the provided x.
// This function assumes we are performing linear regression, so that theta has
// two elements, the slope (first parameter) and y-intercept (second parameter)
double predict(const VectorXd & theta, const double & x) {
  return theta[0] * x + theta[1];
}

Next, we write the function $\hat f$, which specifies the primary objective: minimize the sample mean squared error. Since we attempt to maximize $\hat f$, we return the negative sample mean squared error, so that maximizing $\hat f$ corresponds to minimizing the sample mean squared error:

// Estimator of the primary objective, in this case, the negative sample mean squared error
double fHat(const VectorXd& theta, const vector<Point> & Data) {
  double result = 0, prediction;       // We will store the sample MSE in result. Prediction will store the prediction for each point in the data set
  for (const Point& p : Data) {        // Loop over points p in the data set (Data)
    prediction = predict(theta, p.x);  // Get the prediction using theta
    result += (prediction - p.y) * (prediction - p.y);  // Add the squared error to result
  }
  result /= (double)Data.size();       // We want the sample mean squared error, not the sum of squared errors, so divide by the number of samples
  return -result;                      // Return the value that we have computed
}

Next, we write the functions $\hat g_1$ and $\hat g_2$, which will be provided as input to the Seldonian algorithm. Recall from the Simple Problem tutorial that for every $i$ and $j$: $$ \hat g_{1,j}(\theta,D) = (\hat y(X_j,\theta)-Y_j)^2-2.0, $$ and $$ \hat g_{2,j}(\theta,D) = 1.25-(\hat y(X_j,\theta)-Y_j)^2. $$ Compare the above definitions to the code below.

// Returns unbiased estimates of g_1(theta), computed using the provided data
VectorXd gHat1(const VectorXd& theta, const vector<Point>& Data) {
  VectorXd result(Data.size());                     // We will get one estimate per data point, so initialize the result to have length equal to the number of data points
  for (unsigned int i = 0; i < Data.size(); i++) {  // Loop over the data points
    double prediction = predict(theta, Data[i].x);  // Compute the prediction for the i'th data point
    result[i] = (Data[i].y - prediction) * (Data[i].y - prediction);  // Compute the squared error for the i'th data point, and store in the i'th element of result.
  }
  result.array() -= 2.0;                            // We want the MSE to be less than 2.0, so g(theta) = MSE-2.0.
  return result;                                    // Return the result that we have computed
}

// Returns unbiased estimates of g_2(theta), computed using the provided data
VectorXd gHat2(const VectorXd& theta, const vector<Point>& Data) {
  VectorXd result(Data.size());                     // We will get one estimate per data point, so initialize the result to have length equal to the number of data points
  for (unsigned int i = 0; i < Data.size(); i++) {  // Loop over the data points
    double prediction = predict(theta, Data[i].x);  // Compute the prediction for the i'th data point
    result[i] = (Data[i].y - prediction) * (Data[i].y - prediction);  // Compute the squared error for the i'th data point, and store in the i'th element of result.
  }
  result.array() = 1.25 - result.array();            // We want the MSE to be at least 1.25, so g(theta) = 1.25-MSE.
  return result;                                    // Return the result that we have computed
}

Finally, later we will want the ordinary least-squares solution to use as a starting point during the search for a candidate solution. This code implements least squares linear regression:

// Run ordinary least squares linear regression
VectorXd leastSquares(const vector<Point> & Data) {
  // Put data into an input matrix X (Data.size() rows, 2 cols), and vector y (of Data.size()).
  MatrixXd X = MatrixXd::Ones(Data.size(), 2);                // Initialize X to be a matrix of Data.size() rows and 2 cols, filled with ones.
  VectorXd y(Data.size());                                    // Initialize y to be a vector of length Data.size()
  for (unsigned int i = 0; i < Data.size(); i++) {            // Loop over data points
    X(i, 0) = Data[i].x;                                      // Copy the x-value over the entry in the first colum of the i'th row.
    y[i] = Data[i].y;                                         // Copy the target value into the y-vector
  }
  return X.jacobiSvd(ComputeThinU | ComputeThinV).solve(y);   // Return the least squares solution using Eigen's Jacobi SVD function.
}

We now have all of the libraries that we need and all of the functions to implement the problem that we specified earlier. Now we're ready to start writing our Seldonian algorithm! From here it's easy—by line-count, you've already written most of the code.

Previous: A simple Seldonian algorithm

C++