Seldonian Algorithms: Formal Definition
Notation:
- \(\mathcal D\) is the set of all possible inputs (data sets) to the algorithm.
- \(\Theta\) is the set of all possible outputs of the algorithm, each of which we refer to as a solution \(\theta \in \Theta\).
- \(D\) is the data set input to the algorithm, which we view as a random variable.
- \(a:\mathcal D \to \Theta\) is a machine learning algorithm, which takes a data set as input and returns a solution. That is, \(a(D)\) is the solution (an element of \(\Theta\)) output by the algorithm when run on input \(D\). Any random numbers required by the algorithm should be included in \(D\).
- \(\mathcal A\) is the set of all possible machine learning algorithms.
- \(f:\mathcal A \to \mathbb R\) is the objective function of the algorithm designer. That is, the algorithm designer's goal is to find an algorithm that maximizes \(f\), subject to the behavioral constraints.
- \(n\) is the number of behavioral constraints.
- \((g_i,\delta_i)_{i=1}^n\) is the set of \(n\) behavioral constraints, each of which contains a constraint function \(g_i:\Theta\to\mathbb R\) and a confidence level \(\delta_i\).
- The constraint function measures undesirable behavior: we say that \(\theta \in \Theta\) produces undesirable behavior if and only if \(g_i(\theta) > 0\).
- The confidence level \(\delta_i\) specifies the maximum probability with which the algorithm may return a solution \(\theta\) where \(g_i(\theta)>0\).
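The notation above can be encoded directly in code. A minimal Python sketch follows; the scalar solution type, the data-set type, and the example constraint (requiring \(\theta \leq 0.5\)) are illustrative choices, not part of the formal definition:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Theta = float            # illustrative: solutions are scalars here
Dataset = Sequence[float]

@dataclass
class BehavioralConstraint:
    """One behavioral constraint (g_i, delta_i): g measures undesirable
    behavior (theta is undesirable iff g(theta) > 0), and delta is the
    maximum allowed probability of returning such a theta."""
    g: Callable[[Theta], float]
    delta: float

# Hypothetical example: require theta <= 0.5 with probability >= 1 - 0.05.
constraints = [BehavioralConstraint(g=lambda theta: theta - 0.5, delta=0.05)]
print(constraints[0].g(0.7) > 0)   # theta = 0.7 is undesirable here
```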
A Seldonian algorithm ensures that for all \(i \in \{1,2,\dotsc,n\}\):
$$\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$
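To make this guarantee concrete, the sketch below estimates \(\Pr(g(a(D))\leq 0)\) by Monte Carlo over many independently drawn data sets. The algorithm `toy_algorithm`, the data distribution, and the constraint threshold are all hypothetical choices for illustration:

```python
import numpy as np

def toy_algorithm(data):
    # Hypothetical algorithm a: return the sample mean as the solution theta.
    return float(np.mean(data))

def g(theta):
    # Hypothetical constraint function: behavior is undesirable iff
    # g(theta) > 0, i.e., iff theta exceeds 0.5.
    return theta - 0.5

rng = np.random.default_rng(0)
delta = 0.1
trials = 10_000
# Draw many independent data sets D and count how often g(a(D)) > 0.
failures = sum(g(toy_algorithm(rng.normal(0.3, 0.2, size=50))) > 0
               for _ in range(trials))
print(f"Estimated Pr(g(a(D)) <= 0) = {1 - failures / trials:.3f} "
      f"(Seldonian requires >= {1 - delta})")
```

Here the data mean (0.3) is far below the threshold (0.5), so the estimated probability is close to 1 and the guarantee holds for this toy setup.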
An algorithm is quasi-Seldonian if it relies on assumptions that are reasonable but not guaranteed to hold, such as the appeals to the central limit theorem that are common in scientific research. Because quasi-Seldonian algorithms tend to be more practical than Seldonian algorithms, we focus on creating a quasi-Seldonian algorithm here, but we also discuss how it could be transformed into a Seldonian algorithm.
In this tutorial we begin by assuming that the following two functions have been provided:
- \(\hat f:\Theta \times \mathcal D \to \mathbb R\) is the sample objective function, such that \(\hat f(a(D),D')\) is an estimate of \(f(a)\) created from data sets \(D\) and \(D'\).
- \(\hat g_i:\Theta \times \mathcal D \to \mathbb R\) is defined such that \(\hat g_i(\theta,D)\) is an unbiased estimator of \(g_i(\theta)\).
After we have created our first quasi-Seldonian algorithm in this setting, we describe more sophisticated interfaces that do not require \(\hat g_i\) to be provided by the user.
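For intuition, here is one way a quasi-Seldonian safety check might use unbiased estimates of \(g_i(\theta)\): compute a one-sided high-confidence upper bound on \(g_i(\theta)\) from i.i.d. samples of \(\hat g_i\) using a normal (CLT-based) approximation, and accept \(\theta\) only if the bound is at most zero. The function names and the specific normal-quantile bound are illustrative assumptions, not the interface defined above:

```python
import numpy as np
from statistics import NormalDist

def upper_bound(g_hats, delta):
    """One-sided (1 - delta)-confidence upper bound on g_i(theta), computed
    from i.i.d. unbiased estimates g_hats via a normal (CLT-based)
    approximation -- the kind of reasonable-but-not-guaranteed assumption
    that makes an algorithm quasi- rather than fully Seldonian."""
    g_hats = np.asarray(g_hats, dtype=float)
    m = g_hats.size
    z = NormalDist().inv_cdf(1 - delta)   # standard normal quantile
    return g_hats.mean() + z * g_hats.std(ddof=1) / np.sqrt(m)

def safety_test(g_hats, delta):
    # Accept theta only if the high-confidence upper bound certifies
    # g_i(theta) <= 0; otherwise a Seldonian algorithm would return
    # "No Solution Found" rather than risk undesirable behavior.
    return bool(upper_bound(g_hats, delta) <= 0.0)

rng = np.random.default_rng(1)
estimates = rng.normal(-0.2, 0.1, size=200)   # hypothetical samples of g-hat
print(safety_test(estimates, delta=0.05))
```

In this toy run the estimates are centered well below zero, so the upper bound is negative and the safety check passes.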