Disclaimer

We recommend that you read Section 3 of the supplementary materials of our paper[link] for the formal definition of our framework and of Seldonian algorithms; here we provide an extremely brief recap. Although this page summarizes our framework and Seldonian algorithms, it does not explain why the framework is designed the way it is, or how it compares to the previous standard approach for designing machine learning algorithms; that is the purpose of the paper.

Seldonian Algorithms: Formal Definition

Notation:

  • \(\mathcal D\) is the set of all possible inputs (data sets) to the algorithm.
  • \(\Theta\) is the set of all possible outputs of the algorithm, each of which we refer to as a solution \(\theta \in \Theta\).
  • \(D\) is the data set input to the algorithm, which we view as a random variable.
  • \(a:\mathcal D \to \Theta\) is a machine learning algorithm, which takes a data set as input and returns a solution. That is, \(a(D)\) is the solution (an element of \(\Theta\)) output by the algorithm when run on input \(D\). Any random numbers required by the algorithm should be included in \(D\).
  • \(\mathcal A\) is the set of all possible machine learning algorithms.
  • \(f:\mathcal A \to \mathbb R\) is the objective function of the algorithm designer. That is, the algorithm designer's goal is to find an algorithm that maximizes \(f\), subject to the behavioral constraints.
  • \(n\) is the number of behavioral constraints.
  • \((g_i,\delta_i)_{i=1}^n\) is the set of \(n\) behavioral constraints, each of which contains a constraint function \(g_i:\Theta\to\mathbb R\) and a confidence level \(\delta_i\).
    • The constraint function measures undesirable behavior: we say that \(\theta \in \Theta\) produces undesirable behavior if and only if \(g_i(\theta) > 0\).
    • The confidence level \(\delta_i\) specifies the maximum admissible probability that the algorithm returns a solution \(\theta\) where \(g_i(\theta)>0\).
A Seldonian algorithm ensures that for all \(i \in \{1,2,\dotsc,n\}\): $$\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$ An algorithm is quasi-Seldonian if it relies on reasonable but possibly false assumptions, such as appeals to the central limit theorem, which are common in scientific research. As quasi-Seldonian algorithms tend to be more practical than Seldonian algorithms, we focus on creating a quasi-Seldonian algorithm here, and discuss how it could be transformed into a Seldonian algorithm.
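To make the probabilistic guarantee concrete, the sketch below estimates \(\Pr(g_i(a(D))>0)\) by Monte Carlo: it runs the algorithm on many independently sampled data sets and counts how often the returned solution exhibits undesirable behavior. This is a minimal sketch of ours, assuming you can resample data sets; the names a, g, and sample_dataset are hypothetical stand-ins, not part of any library.

    import numpy as np

    def estimate_failure_rate(a, g, sample_dataset, num_trials=1000, seed=0):
        """Monte Carlo estimate of Pr(g(a(D)) > 0).

        a              -- hypothetical algorithm: data set -> solution theta
        g              -- hypothetical constraint function: theta -> float
        sample_dataset -- hypothetical sampler: rng -> data set D
        """
        rng = np.random.default_rng(seed)
        failures = 0
        for _ in range(num_trials):
            D = sample_dataset(rng)   # draw a fresh random data set
            theta = a(D)              # run the algorithm on it
            if g(theta) > 0:          # undesirable behavior occurred
                failures += 1
        return failures / num_trials

For a Seldonian algorithm with confidence level \(\delta_i\), this estimated rate should be at most \(\delta_i\), up to Monte Carlo error.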

In this tutorial we begin by assuming that the following functions have been provided:
  • \(\hat f:\Theta \times \mathcal D \to \mathbb R\) is the sample objective function, such that \(\hat f(a(D),D')\) is an estimate of \(f(a)\) created from data sets \(D\) and \(D'\).
  • \(\hat g_i:\Theta \times \mathcal D \to \mathbb R\) is defined such that \(\hat g_i(\theta,D)\) is an unbiased estimator of \(g_i(\theta)\).
After we have created our first quasi-Seldonian algorithm in this setting, we describe more sophisticated interfaces that do not require \(\hat g_i\) to be provided by the user.
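To illustrate what these provided functions might look like, here is a minimal sketch for a hypothetical regression setting of our own choosing (not one from this page): \(\theta\) parameterizes a linear model, \(\hat f\) is the negative sample mean squared error, and \(g(\theta)\) is the expected squared error minus a tolerance \(\tau\), so each per-point squared error minus \(\tau\) is an unbiased estimate of \(g(\theta)\).

    import numpy as np

    TAU = 2.0  # hypothetical tolerance: behavior is undesirable if expected squared error exceeds TAU

    def f_hat(theta, D):
        """Sample objective: negative mean squared error of the linear model theta on D."""
        X, y = D
        return -np.mean((X @ theta - y) ** 2)

    def g_hat(theta, D):
        """Per-point estimates of g(theta) = E[(x @ theta - y)^2] - TAU.

        Each returned value is an unbiased estimate of g(theta), and so is
        their mean, matching the interface assumed above.
        """
        X, y = D
        return (X @ theta - y) ** 2 - TAU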

Seldonian Framework: Summary

Recall that our framework has three steps:

  1. Define the goal for the algorithm design process. In this step, the designer of the algorithm writes down an expression that captures their goals. This expression should have the form: $$\arg\max_{a \in \mathcal A} f(a)\\\text{s.t. }\forall i \in \{1,\dotsc,n\},\quad\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i.$$ Notice that here the objective function and the behavioral constraints are over algorithms, not solutions: it is the algorithm \(a\) that must satisfy \(\Pr(g_i(a(D))\leq 0)\geq 1-\delta_i\), not any individual solution. This differs from the standard framework, in which we begin by writing down the goal of the algorithm, which is to find a solution in $$\arg\max_{\theta \in \Theta}f(\theta),$$ where the constraint that \(\theta \in \Theta\) is a constraint satisfied by solutions.
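    For concreteness, consider a hypothetical instantiation of ours with a single behavioral constraint: if \(g_1(\theta)\) is the expected loss of \(\theta\) minus a tolerance \(\tau\), then the goal above asks for an algorithm satisfying $$\Pr\big(g_1(a(D))\leq 0\big)\geq 1-\delta_1,$$ that is, with probability at least \(1-\delta_1\) the returned solution has expected loss at most \(\tau\). Note that the probability here is over the random data set \(D\): the constraint is a property of the algorithm's behavior across data sets, not a property of any one solution.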

  2. Define the interface that the user will use. In this step the designer of the algorithm specifies how future users of the algorithm will be able to specify each \(g_i\). This interface should be as easy to use as possible. For example, we should avoid requiring the user to provide code for \(g_i\), or the set of solutions for which \(g_i(\theta) \leq 0\), as the user typically will not know these quantities. Examples of interfaces include allowing users to provide unbiased estimators of each \(g_i\), allowing users to label outcomes as containing undesirable behavior or not, or allowing users to write a mathematical expression for undesirable behavior in terms of a set of common variables, such as:

    g(θ) = abs(False_Positive_Male - False_Positive_Female) - 0.01

    Later in these tutorials we will describe how each of these interfaces can function. We will begin with the first, which is the simplest to implement: requiring the user to provide unbiased estimators, \(\hat g_i\) (the sketch below previews how an expression like the one above could be evaluated from data).
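    As a preview of the expression interface, the following sketch of ours evaluates the expression above on a data set to produce a sample estimate of \(g(\theta)\). The data layout and the classify function are hypothetical assumptions, and because abs is nonlinear this plug-in estimate is not unbiased; handling that properly is deferred to the later tutorial on this interface.

      import numpy as np

      def g_hat_fairness(theta, D, classify):
          """Plug-in sample estimate of
          abs(False_Positive_Male - False_Positive_Female) - 0.01.

          D is assumed (hypothetically) to hold features X, binary labels y,
          and a boolean attribute is_male; classify(theta, X) is assumed to
          return 0/1 predicted labels.
          """
          X, y, is_male = D
          preds = classify(theta, X)
          negatives = (y == 0)                              # actual negatives
          fp_male = np.mean(preds[negatives & is_male])     # false-positive rate, males
          fp_female = np.mean(preds[negatives & ~is_male])  # false-positive rate, females
          return abs(fp_male - fp_female) - 0.01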

  3. Create the algorithm. In this step, the designer of the algorithm creates an algorithm that approximately solves the optimization problem specified in the first step, while guaranteeing that, at a minimum, the algorithm satisfies the behavioral constraints. This algorithm must also be compatible with the interface defined in the second step.
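    To preview what this step can produce, here is a heavily simplified sketch of a quasi-Seldonian algorithm; it is our own illustration, not the paper's method. It picks, from a finite set of candidate solutions, the one that maximizes \(\hat f\) on one data set, and returns it only if a one-sided Student's t upper confidence bound on each \(g_i\) (the appeal to the central limit theorem mentioned earlier), computed from a second data set, is at most zero; otherwise it returns no solution.

      import numpy as np
      from scipy import stats

      def quasi_seldonian(candidates, f_hat, g_hats, deltas, D_cand, D_safety):
          """Pick the best candidate on D_cand, then check each constraint on D_safety.

          candidates -- finite list of candidate solutions theta (hypothetical)
          g_hats     -- functions returning per-point unbiased estimates of g_i(theta)
          deltas     -- confidence levels delta_i
          Returns a solution, or None if no candidate can be shown to be safe.
          """
          theta = max(candidates, key=lambda t: f_hat(t, D_cand))
          for g_hat, delta in zip(g_hats, deltas):
              z = np.asarray(g_hat(theta, D_safety))  # unbiased estimates of g_i(theta)
              n = len(z)
              # One-sided (1 - delta) Student's t upper bound on E[z] = g_i(theta):
              ub = z.mean() + stats.t.ppf(1 - delta, n - 1) * z.std(ddof=1) / np.sqrt(n)
              if ub > 0:  # cannot conclude g_i(theta) <= 0 with the required confidence
                  return None
          return theta

    Replacing the t-based bound with a concentration inequality that does not appeal to the central limit theorem, such as Hoeffding's inequality, is one way a sketch like this could be made fully Seldonian rather than quasi-Seldonian.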
Previous: Tutorial Overview Next: Example problem