Seldonian Algorithms: Formal Definition
Notation:
- \(\mathcal D\) is the set of all possible inputs (data sets) to the algorithm.
- \(\Theta\) is the set of all possible outputs of the algorithm, each of which we refer to as a solution \(\theta \in \Theta\).
- \(D\) is the data set given as input to the algorithm, which we view as a random variable.
- \(a:\mathcal D \to \Theta\) is a machine learning algorithm, which takes a data set as input and returns a solution. That is, \(a(D)\) is the solution (an element of \(\Theta\)) output by the algorithm when run on input \(D\). Any random numbers required by the algorithm should be included in \(D\).
- \(\mathcal A\) is the set of all possible machine learning algorithms.
- \(f:\mathcal A \to \mathbb R\) is the objective function of the algorithm designer. The goal of the designer of the algorithm is to find an algorithm that maximizes \(f\), subject to the behavioral constraints.
- \(n\) is the number of behavioral constraints.
- \((g_i,\delta_i)_{i=1}^n\) is a set of \(n\) behavioral constraints, each of which contains a constraint function \(g_i:\Theta\to\mathbb R\) and a confidence level \(\delta_i\).
- The constraint function measures undesirable behavior: we say that \(\theta \in \Theta\) produces undesirable behavior if and only if \(g_i(\theta) > 0\).
- The confidence level specifies the maximum probability that the algorithm can return a solution \(\theta\) where \(g_i(\theta)>0\).
A
Seldonian algorithm ensures that for all \(i \in \{1,2,\dotsc,n\}\):
$$\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$
In other words, a Seldonian algorithm \(a\) returns, with high probability, a solution that does not break any of the behavioral constraints. An algorithm is
quasi-Seldonian if it relies on reasonable but possibly false assumptions regarding the data being analyzed. These assumptions often appeal, for instance, to the central limit theorem, common in scientific research. As quasi-Seldonian algorithms tend to be more practical and data-efficient than Seldonian algorithms, we focus on creating a quasi-Seldonian algorithm here.
After we have created our first quasi-Seldonian algorithm in this setting, we describe more sophisticated interfaces that make it easier for users of the algorithm to specify undesirable behaviors.