rapid_models.gp_diagnostics.cv

Module Contents

Functions

multifold(K, Y_train, folds, noise_variance=0, check_args=True)

Compute multifold CV residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise.

multifold_cholesky(L, Y_train, folds, check_args=True)

Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train

loo(K, Y_train, noise_variance=0, check_args=True)

Compute Leave-One-Out (LOO) residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise.

loo_cholesky(L, Y_train, check_args=True)

Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train

check_folds_indices(folds, n_max)

Check that the list of index subsets (list of lists) is valid

check_lower_triangular(arr, argname='arr')

Check that the argument is a 2d numpy array which is lower triangular

check_numeric_array(arr, dim, argname='arr')

Check that the argument is a numpy array of correct dimension

_multifold_inv(K, Y_train, folds)

Compute multifold cv residuals using matrix inverse (for testing)

rapid_models.gp_diagnostics.cv.multifold(K, Y_train, folds, noise_variance=0, check_args=True)[source]

Compute multifold CV residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise. (residual = observed - predicted)

Parameters
  • K (2d array) – GP prior covariance matrix

  • Y_train (array) – training observations

  • folds (list of lists) – The index subsets

  • noise_variance – variance of the observational noise. Set noise_variance = 0 for noiseless observations

  • check_args (bool) – Check (assert) that arguments are well-specified before computation

Returns
  • mean – Mean of CV residuals

  • cov – Covariance of CV residuals

  • residuals_transformed – The residuals transformed to the standard normal space

This function just calls ‘multifold_cholesky()’ with the appropriate Cholesky factor. It is based on the formulation derived in:

[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
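As a rough illustration of the inputs, the sketch below builds a small prior covariance matrix, a fold list, and the Cholesky factor of K + noise_variance * I. That this is the "appropriate" factor is an assumption based on the note under multifold_cholesky(). The helper `rbf_kernel` and all numeric values are hypothetical, and only NumPy is used, not rapid_models itself.

```python
import numpy as np


def rbf_kernel(x, lengthscale=1.0):
    """Hypothetical squared-exponential kernel on 1d inputs."""
    d = x[:, None] - x[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


x = np.linspace(0.0, 1.0, 6)           # six training locations
K = rbf_kernel(x, lengthscale=0.3)     # prior covariance of the latent GP
noise_variance = 0.01                  # fixed iid Gaussian noise variance

# Assumed form of the factor: lower triangular L with L @ L.T = K + v * I
L = np.linalg.cholesky(K + noise_variance * np.eye(len(x)))

# Index subsets: every index 0..5 appears in exactly one fold
folds = [[0, 1], [2, 3], [4, 5]]
```

With these inputs, a call of the form multifold(K, Y_train, folds, noise_variance=noise_variance) would follow the signature documented above.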

rapid_models.gp_diagnostics.cv.multifold_cholesky(L, Y_train, folds, check_args=True)[source]

Compute multifold CV residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train (residual = observed - predicted)

Parameters
  • L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)

  • Y_train (array) – training observations

  • folds (list of lists) – The index subsets

  • check_args (bool) – Check (assert) that arguments are well-specified before computation

Returns
  • mean – Mean of CV residuals

  • cov – Covariance of CV residuals

  • residuals_transformed – The residuals transformed to the standard normal space

Note:
  • The matrix L L.T is the covariance matrix of the observations Y_train.

  • For observations that include Gaussian noise with fixed variance v, this matrix is K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations.

This implementation uses the Cholesky factor instead of the inverse covariance (precision) matrix, but is otherwise equivalent to the formulas derived in

[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
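The block formulas can be sketched in NumPy as follows. This is a minimal illustration assuming a zero-mean GP, not the library's actual code: with Q the inverse of L L.T, the residuals of a fold are obtained by solving with the fold's diagonal block of Q against the matching entries of Q y. The function name `multifold_residuals_sketch` and the toy data are hypothetical.

```python
import numpy as np


def multifold_residuals_sketch(L, y, folds):
    """Fold-wise CV residuals (observed - predicted) and their covariance.

    Assumes a zero-mean GP and that L @ L.T is the covariance of y.
    """
    n = len(y)
    L_inv = np.linalg.solve(L, np.eye(n))   # L^{-1}; a triangular solver would also work
    Q = L_inv.T @ L_inv                     # inverse of L @ L.T
    B = np.zeros((n, n))
    for fold in folds:
        block = np.ix_(fold, fold)
        B[block] = np.linalg.inv(Q[block])  # inverse of the fold's diagonal block of Q
    mean = B @ (Q @ y)                      # residuals, in the original index order
    cov = B @ Q @ B                         # covariance of those residuals
    return mean, cov


# Hypothetical toy data: squared-exponential prior plus noise variance 0.01
x = np.linspace(0.0, 1.0, 6)
C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.3) ** 2) + 0.01 * np.eye(6)
L = np.linalg.cholesky(C)
y = np.sin(2.0 * np.pi * x)
folds = [[0, 3], [1, 4], [2, 5]]

mean, cov = multifold_residuals_sketch(L, y, folds)
```

Forming the explicit inverse keeps the sketch short. For larger problems, triangular solves against L would be the more usual choice.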

rapid_models.gp_diagnostics.cv.loo(K, Y_train, noise_variance=0, check_args=True)[source]

Compute Leave-One-Out (LOO) residuals for GP regression with noiseless (noise_variance = 0) or fixed variance iid Gaussian noise. (residual = observed - predicted) This function just calls ‘loo_cholesky()’ with the appropriate Cholesky factor.

Parameters
  • K (2d array) – GP prior covariance matrix

  • Y_train (array) – training observations

  • noise_variance – variance of the observational noise. Set noise_variance = 0 for noiseless observations

  • check_args (bool) – Check (assert) that arguments are well-specified before computation

Returns
  • mean – Mean of LOO residuals

  • cov – Covariance of LOO residuals

  • residuals_transformed – The residuals transformed to the standard normal space

rapid_models.gp_diagnostics.cv.loo_cholesky(L, Y_train, check_args=True)[source]

Compute Leave-One-Out (LOO) residuals from the Cholesky factor L of the observation covariance matrix and the training data Y_train (residual = observed - predicted)

Parameters
  • L (2d array) – lower triangular Cholesky factor of covariance matrix (L L.T = covariance matrix)

  • Y_train (array) – training observations

  • check_args (bool) – Check (assert) that arguments are well-specified before computation

Returns
  • mean – Mean of LOO residuals

  • cov – Covariance of LOO residuals

  • residuals_transformed – The residuals transformed to the standard normal space

Note:
  • The matrix L L.T is the covariance matrix of the observations Y_train.

  • For observations that include Gaussian noise with fixed variance v, this matrix is K + v*I, where K[i, j] is the prior covariance of the latent GP between the i-th and j-th training locations.

This implementation uses the Cholesky factor instead of the inverse covariance (precision) matrix, but is otherwise equivalent to the formulas derived in

[O. Dubrule. Cross validation of kriging in a unique neighborhood. Journal of the International Association for Mathematical Geology, 15 (6):687-699, 1983.]
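For the leave-one-out case every fold is a single index, so the block formulas reduce to element-wise operations: with Q the inverse of L L.T, the i-th residual is (Q y)[i] / Q[i, i] and its variance is 1 / Q[i, i]. The sketch below illustrates this under a zero-mean assumption. The name `loo_residuals_sketch` and the toy data are hypothetical, and the code is not taken from the library.

```python
import numpy as np


def loo_residuals_sketch(L, y):
    """LOO residuals (observed - predicted) and their covariance.

    Assumes a zero-mean GP and that L @ L.T is the covariance of y.
    """
    n = len(y)
    L_inv = np.linalg.solve(L, np.eye(n))  # L^{-1}
    Q = L_inv.T @ L_inv                    # inverse of L @ L.T
    d = np.diag(Q)
    mean = (Q @ y) / d                     # residual_i = (Q y)_i / Q_ii
    cov = Q / np.outer(d, d)               # cov_ij = Q_ij / (Q_ii * Q_jj)
    return mean, cov


# Hypothetical toy data: squared-exponential prior plus noise variance 0.01
x = np.linspace(0.0, 1.0, 6)
C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.3) ** 2) + 0.01 * np.eye(6)
L = np.linalg.cholesky(C)
y = np.sin(2.0 * np.pi * x)

mean, cov = loo_residuals_sketch(L, y)
```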

rapid_models.gp_diagnostics.cv.check_folds_indices(folds, n_max)[source]

Check that the list of index subsets (list of lists) is valid

Parameters
  • folds (list of lists) – The index subsets.

  • n_max (int) – Total number of indices.

Raises

AssertionError – if ‘folds’ does not split the n_max indices 0, ..., n_max-1 into non-overlapping subsets
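The condition being checked can be illustrated with a small stand-alone helper. The name `folds_are_valid` is hypothetical, and unlike the documented function it returns a bool instead of raising AssertionError.

```python
def folds_are_valid(folds, n_max):
    """True if the folds contain each index 0..n_max-1 exactly once."""
    flat = [i for fold in folds for i in fold]
    # A missing, repeated, or out-of-range index makes the sorted list differ
    return sorted(flat) == list(range(n_max))


ok = folds_are_valid([[0, 2], [1, 3]], 4)        # disjoint and complete
overlap = folds_are_valid([[0, 1], [1, 2]], 3)   # index 1 appears twice
missing = folds_are_valid([[0, 1]], 3)           # index 2 is missing
```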

rapid_models.gp_diagnostics.cv.check_lower_triangular(arr, argname='arr')[source]

Check that the argument is a 2d numpy array which is lower triangular

Parameters

  • arr – the object to check

Raises

AssertionError – if ‘arr’ does not represent a lower triangular matrix
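One way such a check could be written with NumPy is sketched below. The helper name `is_lower_triangular` is hypothetical, and it returns a bool rather than asserting. The tolerance-based comparison via np.allclose is an assumption, since the library may test for exact zeros above the diagonal.

```python
import numpy as np


def is_lower_triangular(arr):
    """True if arr is a 2d numpy array with (numerically) zero upper triangle."""
    if not isinstance(arr, np.ndarray) or arr.ndim != 2:
        return False
    # np.tril zeroes everything above the main diagonal
    return bool(np.allclose(arr, np.tril(arr)))


factor = np.linalg.cholesky(np.array([[2.0, 1.0], [1.0, 2.0]]))
check_factor = is_lower_triangular(factor)           # Cholesky factors are lower triangular
check_full = is_lower_triangular(np.ones((2, 2)))    # nonzero above the diagonal
```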

rapid_models.gp_diagnostics.cv.check_numeric_array(arr, dim, argname='arr')[source]

Check that the argument is a numpy array of correct dimension

Parameters

  • arr – the object to check

  • dim – the required number of array dimensions

Raises

AssertionError – if ‘arr’ is not a ‘dim’-dimensional numpy array
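A comparable stand-alone check might look like the following. The helper name `is_numeric_array` is hypothetical and returns a bool rather than asserting. The dtype test is an assumption suggested by the word "numeric" in the function name and is not confirmed by the description above.

```python
import numpy as np


def is_numeric_array(arr, dim):
    """True if arr is a numpy array with `dim` dimensions and a numeric dtype."""
    return (
        isinstance(arr, np.ndarray)
        and arr.ndim == dim
        and bool(np.issubdtype(arr.dtype, np.number))  # assumed dtype requirement
    )


vector_ok = is_numeric_array(np.array([1.0, 2.0]), 1)
wrong_dim = is_numeric_array(np.array([1.0, 2.0]), 2)
not_array = is_numeric_array([1.0, 2.0], 1)
```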

rapid_models.gp_diagnostics.cv._multifold_inv(K, Y_train, folds)[source]

Compute multifold cv residuals using matrix inverse (for testing) (residual = observed - predicted)

Parameters
  • K (2d array) – covariance matrix

  • Y_train (array) – training observations

  • folds (list of lists) – The index subsets.

Returns
  • mean – Mean of CV residuals

  • cov – Covariance of CV residuals

  • residuals_transformed – The residuals transformed to the standard normal space

[D. Ginsbourger and C. Schaerer (2021). Fast calculation of Gaussian Process multiple-fold crossvalidation residuals and their covariances. arXiv:2101.03108]
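When testing fast formulas of this kind, an independent brute-force reference can be useful: remove each fold, predict it from the remaining data, and subtract. The sketch below does this for a zero-mean GP. The name `multifold_direct` and the toy data are hypothetical, and the code is not part of rapid_models.

```python
import numpy as np


def multifold_direct(C, y, folds):
    """Brute-force fold-wise residuals (observed - predicted), zero-mean GP."""
    n = len(y)
    residuals = np.empty(n)
    for fold in folds:
        train = [i for i in range(n) if i not in fold]
        # Kriging weights for predicting the held-out fold from the rest
        weights = np.linalg.solve(C[np.ix_(train, train)], C[np.ix_(train, fold)])
        residuals[fold] = y[fold] - weights.T @ y[train]
    return residuals


# Hypothetical toy data: squared-exponential covariance with a small nugget
x = np.linspace(0.0, 1.0, 6)
C = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.3) ** 2) + 0.01 * np.eye(6)
y = np.cos(3.0 * x)
folds = [[0, 1, 2], [3, 4, 5]]

residuals = multifold_direct(C, y, folds)
```

This costs one linear solve per fold, so it is only practical as a check on small problems.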