AI Safety · Tutorial 1

Disclaimer

We recommend that you read Section 3 of the supplementary materials of our paper for the formal definition of our framework and Seldonian algorithms. Here we provide an extremely brief recap. Although this tutorial summarizes our framework and Seldonian algorithms, it does not walk through why the framework is the way it is, or how it compares to the previous standard approach for designing machine learning algorithms—that is the purpose of the paper.

Seldonian Algorithms: Formal Definition

Notation:

$\mathcal D$ is the set of all possible inputs (data sets) to the algorithm.
$\Theta$ is the set of all possible outputs of the algorithm, each of which we refer to as a solution $\theta \in \Theta$.
$D$ is the data set given as input to the algorithm, which we view as a random variable.
$a:\mathcal D \to \Theta$ is a machine learning algorithm, which takes a data set as input and returns a solution. That is, $a(D)$ is the solution (an element of $\Theta$) output by the algorithm when run on input $D$. Any random numbers required by the algorithm should be included in $D$.
$\mathcal A$ is the set of all possible machine learning algorithms.
$f:\mathcal A \to \mathbb R$ is the objective function of the algorithm designer. The goal of the designer of the algorithm is to find an algorithm that maximizes $f$, subject to the behavioral constraints.
$n$ is the number of behavioral constraints.
$(g_i,\delta_i)_{i=1}^n$ is a set of $n$ behavioral constraints, each of which contains a constraint function $g_i:\Theta\to\mathbb R$ and a confidence level $\delta_i$.

The constraint function measures undesirable behavior: we say that $\theta \in \Theta$ produces undesirable behavior if and only if $g_i(\theta) > 0$.
The confidence level specifies the maximum probability that the algorithm can return a solution $\theta$ where $g_i(\theta)>0$.

A Seldonian algorithm ensures that for all $i \in \{1,2,\dotsc,n\}$: $$\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$ In other words, a Seldonian algorithm $a$ returns, with high probability, a solution that does not break any of the behavioral constraints. An algorithm is quasi-Seldonian if it relies on reasonable but possibly false assumptions regarding the data being analyzed. These assumptions often appeal, for instance, to the central limit theorem, common in scientific research. As quasi-Seldonian algorithms tend to be more practical and data-efficient than Seldonian algorithms, we focus on creating a quasi-Seldonian algorithm here.

After we have created our first quasi-Seldonian algorithm in this setting, we describe more sophisticated interfaces that make it easier for users of the algorithm to specify undesirable behaviors.

Seldonian Framework: Summary

Recall that our framework has three steps:

Define the goal of the designer of the algorithm. In this step, the designer of the algorithm writes down an expression that captures their goals when constructing a safe machine learning algorithm. This expression should have the form: $$\arg\max_{a \in \mathcal A} f(a)\\\text{s.t. }\forall i \in \{1,\dotsc,n\},\quad\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$ This expression means that the goal of the designer is to identify an algorithm $a$ that maximizes the objective function $f$ while ensuring that, with high probability, it will only return solutions that satisfy all behavioral constraints. This differs from the standard framework wherein we typically write down the goals of the algorithm. If this is not clear, notice that the expression $\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i$ is satisfied by a given candidate algorithm $a$, constructed by the designer, not by a given candidate solution that the algorithm may return. By contrast, in the standard framework we begin by writing the goal of the algorithm—to find a solution $\theta \in \Theta$ that maximizes $f$: $$\arg\max_{\theta \in \Theta}f(\theta).$$ where the constraint that $\theta \in \Theta$ is over the candidate solutions that will be analyzed by the algorithm, not over the algorithms that the designer may construct.

Define the interface that the user of the algorithm will use. In this step the designer of the algorithm specifies how future users of the algorithm will be able to specify each behavioral constraint $g_i$. This interface should be as easy to use as possible. For example, we should avoid requiring the user to provide code for $g_i$ or the set of solutions $\theta$ such that $g_i(\theta) \leq 0$, as the user typically will not know these quantities. Examples of interfaces include allowing users to provide data-based unbiased estimators of each $g_i$, allowing users to simply label outcomes that contain undesirable behavior, or allowing the user to write a mathematical expression for undesirable behavior in terms of a set of common variables, such as:

g($\theta$) = abs(False_Positive_Male - False_Positive_Female) - 0.01

Later in these tutorials we will describe how each of these interfaces can function. We will begin with the first and most simple to implement: requiring the user to provide data-based unbiased estimators, $\hat g_i$, of each behavioral constraint function $g_i$.

Create the algorithm. In this step, the designer of the algorithm creates an algorithm $a$ that is an approximate solution to the optimization problem specified in the first step, while ensuring that the algorithm satisfies the behavioral constraints. This algorithm must also be compatible with the interface defined in the second step.