## Safety Test

Before we implement the safety test, let us write a shell for our quasi-Seldonian algorithm, which we will call QSA. This shell code will show how the safety test will be used. At a high level, we are simply partitioning the data, getting a candidate solution, and running the safety test.

• Notice that in the code below we are using the more descriptive names candidateData for $$D_1$$ and safetyData for $$D_2$$.
• Notice also that we are placing 40% of the data in candidateData and 60% in safetyData. This is an arbitrary choice, and it remains an open question how best to partition the data.
```python
from sklearn.model_selection import train_test_split

# Our Quasi-Seldonian linear regression algorithm operating over data (X, Y).
# The pair of objects returned by QSA is the solution (first element)
# and a Boolean flag indicating whether a solution was found (second element).
def QSA(X, Y, gHats, deltas):
    # Put 40% of the data in candidateData (D1), and the rest in safetyData (D2)
    candidateData_len = 0.40
    candidateData_X, safetyData_X, candidateData_Y, safetyData_Y = train_test_split(
        X, Y, test_size=1 - candidateData_len, shuffle=False)

    # Get the candidate solution
    candidateSolution = getCandidateSolution(candidateData_X, candidateData_Y,
                                             gHats, deltas, safetyData_X.size)

    # Run the safety test
    passedSafety = safetyTest(candidateSolution, safetyData_X, safetyData_Y, gHats, deltas)

    # Return the result and success flag
    return [candidateSolution, passedSafety]
```


Now recall the pseudocode for the safety test:

3. Safety Test: Return $$\theta_c$$ if $$\forall i \in \{1,2,\dotsc,n\}, \quad \hat \mu(\hat g_i(\theta_c,D_2)) + \frac{\hat \sigma(\hat g_i(\theta_c,D_2))}{\sqrt{|D_2|}}t_{1-\delta_i,|D_2|-1} \leq 0,$$ and No Solution Found (NSF) otherwise.
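
The inequality in this step is exactly what the ttestUpperBound helper evaluates for each constraint. For readers following along in isolation, a minimal sketch of that helper (an assumption about its implementation, using NumPy and SciPy) might look like:

```python
import numpy as np
from scipy import stats

# One-sided Student's t upper bound on the mean of g_samples:
#   mean + (sample std / sqrt(n)) * t_{1-delta, n-1}
def ttestUpperBound(g_samples, delta):
    n = g_samples.size
    return (g_samples.mean()
            + g_samples.std(ddof=1) / np.sqrt(n) * stats.t.ppf(1.0 - delta, n - 1))
```

Here ddof=1 gives the sample (Bessel-corrected) standard deviation, matching $$\hat \sigma$$ in the bound above.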

Given the helper functions that we already have, this function is straightforward to write:

```python
# Run the safety test on a candidate solution. Returns True if the test is passed.
#   candidateSolution: the solution to test.
#   (safetyData_X, safetyData_Y): data set D2 to be used in the safety test.
#   (gHats, deltas): vectors containing the behavioral constraints and confidence levels.
def safetyTest(candidateSolution, safetyData_X, safetyData_Y, gHats, deltas):

    for i in range(len(gHats)):   # Loop over behavioral constraints, checking each
        g     = gHats[i]          # The current behavioral constraint being checked
        delta = deltas[i]         # The confidence level of the constraint

        # This is a vector of unbiased estimates of g(candidateSolution)
        g_samples = g(candidateSolution, safetyData_X, safetyData_Y)

        # Check if the i-th behavioral constraint is satisfied
        upperBound = ttestUpperBound(g_samples, delta)

        if upperBound > 0.0:  # If the current constraint was not satisfied, the safety test failed
            return False

    # If we get here, all of the behavioral constraints were satisfied
    return True
```
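
To see the safety test in action, here is a small self-contained check on synthetic data. The constraint gHat_mse, its 2.0 threshold, and the synthetic data are hypothetical illustrations rather than the tutorial's running example, and ttestUpperBound and safetyTest are restated compactly so the snippet runs on its own:

```python
import numpy as np
from scipy import stats

# Restated compactly so this snippet is standalone
def ttestUpperBound(g_samples, delta):
    n = g_samples.size
    return (g_samples.mean()
            + g_samples.std(ddof=1) / np.sqrt(n) * stats.t.ppf(1.0 - delta, n - 1))

def safetyTest(candidateSolution, safetyData_X, safetyData_Y, gHats, deltas):
    for g, delta in zip(gHats, deltas):
        if ttestUpperBound(g(candidateSolution, safetyData_X, safetyData_Y), delta) > 0.0:
            return False
    return True

# Hypothetical behavioral constraint: the mean squared error of the linear model
# theta[0] + theta[1]*x should not exceed 2.0. Each per-sample value below is an
# unbiased estimate of g(theta) = MSE(theta) - 2.0.
def gHat_mse(theta, X, Y):
    predictions = theta[0] + theta[1] * X
    return (predictions - Y) ** 2 - 2.0

rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = X + rng.normal(scale=0.5, size=500)  # synthetic data: Y is roughly X plus noise
theta = np.array([0.0, 1.0])             # a candidate solution close to the truth
print(safetyTest(theta, X, Y, [gHat_mse], [0.05]))  # True: the constraint holds with confidence
```

A wildly wrong candidate (say, theta = [10, -3]) would produce large squared errors, a positive upper bound, and a failed safety test.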


We're almost there. All that's left is the function getCandidateSolution!