Running your quasi-Seldonian algorithm one time and observing that it found a solution is encouraging—certainly more so than if it had returned No Solution Found (NSF)! However, a single run doesn't tell us much about the algorithm's behavior.
An algorithm that guarantees safety and/or fairness will typically have slightly worse performance than an algorithm that focuses purely on optimizing performance. The left column of plots in Figure 3 of our paper shows the slight decrease in predictive accuracy of our (quasi-)Seldonian classification algorithm due to enforcing the behavioral constraints. Similarly, Fig. S19 in the supplementary materials of our paper shows how ensuring fairness in our GPA regression example results in a slight increase in the mean squared error of predictions. For the problem we are solving here, the mean squared error will certainly increase, since our second behavioral constraint explicitly requires it. Still, it is worth comparing the mean squared error of the solutions returned by QSA
to the mean squared error of the solutions that would be returned by a standard algorithm, in this case least squares linear regression (LS).
Each of the plots below shows how a different property of the algorithm varies with the amount of training data, \(m\). We first plot performance for different amounts of data: the MSE of the solutions returned by QSA (the quasi-Seldonian algorithm you've written) and by leastSquares (ordinary least squares linear regression). This raises the question: how should we compute the mean squared errors of these estimators (both those produced by our algorithm and those produced by least squares linear regression)? Typically we won't have an analytic expression for this. However, for synthetic problems we can usually generate more data, so we will evaluate the performance and safety/fairness of our algorithm by generating significantly more data than we used to train it. Specifically, in our implementation we train with up to 65,536 samples and use 100 times as many points (around 6,500,000 samples) to evaluate the generated solutions.
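To make this concrete, here is a minimal Python sketch of that evaluation strategy (Python is used here even though the implementation discussed below is in C++). The helpers generate_data, train_qsa, and train_least_squares are hypothetical placeholders, and the two-parameter linear model is an assumption; only the idea of scoring each returned solution on a much larger ground-truth dataset comes from the text above.

    import numpy as np

    def evaluate_mse(theta, X_eval, Y_eval):
        # Mean squared error of the two-parameter linear estimator
        # y_hat = theta[0] + theta[1] * x on a large ground-truth dataset.
        predictions = theta[0] + theta[1] * X_eval
        return float(np.mean((predictions - Y_eval) ** 2.0))

    # Hypothetical usage (generate_data, train_qsa, and train_least_squares are
    # placeholders for your own data generator and training routines):
    # m = 65536                                  # largest training-set size
    # X_train, Y_train = generate_data(m)
    # X_eval,  Y_eval  = generate_data(100 * m)  # ~6.5 million evaluation samples
    # theta_qsa = train_qsa(X_train, Y_train)    # may be None (NSF)
    # theta_ls  = train_least_squares(X_train, Y_train)
    # if theta_qsa is not None:
    #     print("QSA MSE:", evaluate_mse(theta_qsa, X_eval, Y_eval))
    # print("LS  MSE:", evaluate_mse(theta_ls,  X_eval, Y_eval))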
The MSE plot we obtained is shown below; it includes standard error bars and thin dotted lines indicating the desired MSE range of \([1.25,2.0]\).
The previous plot made it clear that QSA didn't always return a solution. How much data does QSA require to return a solution for this problem? To answer this, we create a plot similar to the one above, but reporting the probability that each method returned a solution.
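Estimating that probability is just a matter of counting. Here is a short sketch, assuming you have recorded each trial's result as either a solution or None for NSF; the names are illustrative, not part of the tutorial code.

    def solution_probability(solutions_per_trial):
        # Fraction of trials in which the algorithm returned a solution.
        # Each entry is a returned solution, or None if the algorithm
        # returned No Solution Found (NSF).
        found = [s is not None for s in solutions_per_trial]
        return sum(found) / len(found)

    # Example with made-up placeholder results for one value of m:
    # trials = [(0.1, 1.0), None, (0.2, 0.9), None]
    # print(solution_probability(trials))  # 0.5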
Next, we plot the probability that each algorithm produced undesirable behavior: i.e., the probability that each algorithm caused \(g_1(a(D)) > 0\) and the probability that each algorithm caused \(g_2(a(D)) > 0\).
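These probabilities can be estimated the same way: for every trial, check whether the returned solution violates each constraint when the constraint is computed on the large ground-truth dataset, and average over trials. The sketch below assumes the two constraints correspond to the desired MSE range of \([1.25, 2.0]\) mentioned above, i.e. \(g_1(\theta)=\mathrm{MSE}(\theta)-2.0\) and \(g_2(\theta)=1.25-\mathrm{MSE}(\theta)\); if your constraint definitions differ, adjust accordingly.

    import numpy as np

    def constraint_failure_rates(solutions, X_eval, Y_eval):
        # Estimate Pr(g1(a(D)) > 0) and Pr(g2(a(D)) > 0) over trials, assuming
        # the constraints correspond to the desired MSE range [1.25, 2.0]:
        #   g1(theta) = MSE(theta) - 2.0   (MSE must not exceed 2.0)
        #   g2(theta) = 1.25 - MSE(theta)  (MSE must not fall below 1.25)
        # A trial that returned No Solution Found (None) counts as no violation.
        g1_failures, g2_failures = [], []
        for theta in solutions:
            if theta is None:
                g1_failures.append(False)
                g2_failures.append(False)
                continue
            mse = float(np.mean((theta[0] + theta[1] * X_eval - Y_eval) ** 2.0))
            g1_failures.append(mse - 2.0 > 0.0)
            g2_failures.append(1.25 - mse > 0.0)
        return float(np.mean(g1_failures)), float(np.mean(g2_failures))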
In these plots, the curve for QSA is the one of greater interest: if it goes above \(0.1\), then the algorithm did not satisfy the behavioral constraints (in theory this could happen here, due to our reliance on the aforementioned normality assumption). That curve is not completely flat at zero: it goes up to a value of \(0.005\), which is well below our threshold of \(0.1\). You can download the source code used to create the plots on this page by clicking the download button below:
This .zip file includes both our updated main.cpp and the outputs we obtained in the directory /output
, along with Matlab code to create all of the plots in this tutorial (simply run main.m
in Matlab). Other than the line #include<fstream>
, all of our changes are below the QSA
function. Notice in int main(...)
that for two different values of \(m\), say \(m_1\) and \(m_2\) where \(m_1 < m_2\), within the same trial the first \(m_1\) samples in the training data will be the same for both values of \(m\). This use of common random numbers reduces the variance in plots.
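One simple way to implement common random numbers, sketched here in Python rather than the C++ of main.cpp: seed a generator with the trial index, draw one dataset of the largest size, and take prefixes of it for each smaller \(m\). The data-generating process shown is only a placeholder.

    import numpy as np

    def make_trial_data(trial_index, m_max=65536):
        # Generate one dataset of the largest size for this trial, seeded by
        # the trial index so that every value of m reuses the same samples.
        rng = np.random.default_rng(seed=trial_index)
        # Placeholder synthetic data; substitute your own data-generating process.
        X = rng.normal(0.0, 1.0, size=m_max)
        Y = X + rng.normal(0.0, 1.0, size=m_max)
        return X, Y

    # Within one trial, the first m1 samples are identical for every m >= m1:
    # X, Y = make_trial_data(trial_index=0)
    # for m in (64, 256, 1024, 4096, 16384, 65536):
    #     X_m, Y_m = X[:m], Y[:m]  # common random numbers across values of m
    #     ...  # train and evaluate with (X_m, Y_m)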
To run this demo, you will need two additional libraries: Ray, to allow our code to run in parallel, and Numba, a Just-in-Time (JIT) compiler that accelerates Python code.
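Both libraries can be installed with pip. The sketch below is not taken from main.py; it only illustrates, using assumed toy data, how the two libraries are typically combined: Ray runs independent trials in parallel while Numba JIT-compiles a numeric inner loop.

    import numpy as np
    import ray
    from numba import njit

    @njit
    def mean_squared_error(theta0, theta1, X, Y):
        # Numba-compiled MSE of a two-parameter linear model.
        total = 0.0
        for i in range(X.shape[0]):
            diff = theta0 + theta1 * X[i] - Y[i]
            total += diff * diff
        return total / X.shape[0]

    @ray.remote
    def run_trial(trial_index, m):
        # One independent trial, executed in parallel by Ray (toy data only).
        rng = np.random.default_rng(seed=trial_index)
        X = rng.normal(0.0, 1.0, size=m)
        Y = X + rng.normal(0.0, 1.0, size=m)
        return mean_squared_error(0.0, 1.0, X, Y)

    if __name__ == "__main__":
        ray.init()
        futures = [run_trial.remote(t, 1024) for t in range(8)]
        print(ray.get(futures))  # one MSE per parallel trial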
This .zip file includes both the original main.py
file, for running our first, simpler experiment, and main_plotting.py
, for reproducing the plots and analyses above. The results of each experiment (including output CSV files) are stored in the experiment_results/
folder. To generate the graphs above, please run main_plotting.py
and then create_plots.py
. All plots are saved in the images/
folder.