Before we implement the safety test, let us write a shell for our quasi-Seldonian algorithm, which we will call `QSA`. This shell code shows how the safety test will be used. At a high level, we are simply partitioning the data, getting a candidate solution, and running the safety test. In the code, we use `candidateData` for \(D_1\) and `safetyData` for \(D_2\). We place 40% of the data in `candidateData` and 60% in `safetyData`. This is an arbitrary choice, and it remains an open question how best to optimize this partitioning of the data.

```python
from sklearn.model_selection import train_test_split

# Our Quasi-Seldonian linear regression algorithm operating over data (X,Y).
# The pair of objects returned by QSA is the solution (first element)
# and a Boolean flag indicating whether a solution was found (second element).
def QSA(X, Y, gHats, deltas):
    # Put 40% of the data in candidateData (D1), and the rest in safetyData (D2)
    candidateData_len = 0.40
    candidateData_X, safetyData_X, candidateData_Y, safetyData_Y = train_test_split(
        X, Y, test_size=1-candidateData_len, shuffle=False)

    # Get the candidate solution
    candidateSolution = getCandidateSolution(candidateData_X, candidateData_Y, gHats, deltas, safetyData_X.size)

    # Run the safety test
    passedSafety = safetyTest(candidateSolution, safetyData_X, safetyData_Y, gHats, deltas)

    # Return the result and success flag
    return [candidateSolution, passedSafety]
```
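To see what the unshuffled 40/60 partition does to the data, here is a minimal plain-Python sketch. The function `split_candidate_safety` is a hypothetical stand-in written for illustration, not part of the tutorial's code or of scikit-learn, and its rounding rule (rounding the candidate count down) is an assumption that may differ from `train_test_split` on data set sizes that do not divide evenly.

```python
def split_candidate_safety(X, Y, candidate_fraction=0.40):
    """Hypothetical stand-in for an unshuffled candidate/safety split."""
    if len(X) != len(Y):
        raise ValueError("X and Y must have the same length")
    # Assumption: round the number of candidate points down
    n_candidate = int(candidate_fraction * len(X))
    # Without shuffling, D1 is the leading slice and D2 is the remainder
    return X[:n_candidate], X[n_candidate:], Y[:n_candidate], Y[n_candidate:]


# Made-up data: ten points with Y = 2 * X
X = list(range(10))
Y = [2 * x for x in X]

candidate_X, safety_X, candidate_Y, safety_Y = split_candidate_safety(X, Y)
print(candidate_X)  # the first 4 points go to the candidate data
print(safety_X)     # the remaining 6 points go to the safety data
```

Because nothing is shuffled, the order of the data matters: the candidate data is always the first 40% of the points as given.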
Now recall the pseudocode for the safety test:
3. Safety Test: Return \(\theta_c\) if $$ \forall i \in \{1,2,\dotsc,n\}, \quad \hat \mu(\hat g_i(\theta_c,D_2)) + \frac{\hat \sigma(\hat g_i(\theta_c,D_2))}{\sqrt{|D_2|}}t_{1-\delta_i,|D_2|-1} \leq 0, $$ and No Solution Found (NSF) otherwise.
Given the helper functions that we already have, this function is straightforward to write:
```python
# Run the safety test on a candidate solution. Returns true if the test is passed.
#   candidateSolution: the solution to test.
#   (safetyData_X, safetyData_Y): data set D2 to be used in the safety test.
#   (gHats, deltas): vectors containing the behavioral constraints and confidence levels.
def safetyTest(candidateSolution, safetyData_X, safetyData_Y, gHats, deltas):
    for i in range(len(gHats)):  # Loop over behavioral constraints, checking each
        g = gHats[i]             # The current behavioral constraint being checked
        delta = deltas[i]        # The confidence level of the constraint

        # This is a vector of unbiased estimates of g(candidateSolution)
        g_samples = g(candidateSolution, safetyData_X, safetyData_Y)

        # Check if the i-th behavioral constraint is satisfied
        upperBound = ttestUpperBound(g_samples, delta)

        if upperBound > 0.0:  # If the current constraint was not satisfied, the safety test failed
            return False

    # If we get here, all of the behavioral constraints were satisfied
    return True
```
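To make the per-constraint check concrete, here is a small numeric sketch of the upper bound for a single constraint. It is not the tutorial's `ttestUpperBound`: the helper `upper_bound_sketch` is hypothetical, it takes the Student t critical value as an argument instead of computing it, and it assumes \(\hat \sigma\) is the sample standard deviation with the \(n-1\) denominator. The sample values are made up, and 1.833 is the tabulated value of \(t_{0.95,9}\), which corresponds to \(\delta = 0.05\) and \(|D_2| = 10\).

```python
import math
import statistics


def upper_bound_sketch(g_samples, t_critical):
    """Hypothetical helper: mean + (std / sqrt(n)) * t_critical."""
    n = len(g_samples)
    mean = statistics.mean(g_samples)
    std = statistics.stdev(g_samples)  # sample standard deviation (n - 1 denominator)
    return mean + std / math.sqrt(n) * t_critical


T_095_9 = 1.833  # tabulated t quantile for 1 - delta = 0.95 and 9 degrees of freedom

# Made-up estimates that are clearly negative with little spread
tight_samples = [-0.9, -1.1, -1.0, -0.8, -1.2, -1.0, -0.9, -1.1, -1.0, -1.0]
tight_bound = upper_bound_sketch(tight_samples, T_095_9)
print(tight_bound, tight_bound <= 0.0)  # roughly -0.933, so this constraint passes

# Made-up estimates with a slightly negative mean but a large spread
noisy_samples = [-0.1, 0.3, -0.4, 0.2, -0.3, 0.1, -0.2, 0.0, -0.1, -0.5]
noisy_bound = upper_bound_sketch(noisy_samples, T_095_9)
print(noisy_bound, noisy_bound <= 0.0)  # roughly 0.050, so this constraint fails
```

In the second case the sample mean is negative, yet the bound is positive, so the safety test would return `False` for that constraint. A negative average alone is not enough; the estimates must be negative with sufficient confidence.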
We're almost there. All that's left is the function `getCandidateSolution`!