Settings

class linear_operator.settings.cg_tolerance(value)[source]

Relative residual tolerance to use for terminating CG.

(Default: 1)
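
To make the stopping rule concrete, here is a minimal pure-Python sketch of conjugate gradients that terminates on a relative residual tolerance. It is an illustration only, not the library's implementation: the function name `conjugate_gradients`, the dense list-of-lists matrix, and the exact stopping test (`||r|| / ||b|| < tolerance`) are assumptions made for this example.

```python
import math


def conjugate_gradients(A, b, tolerance=1.0, max_iterations=1000):
    """Solve A x = b for a symmetric positive-definite A (list of lists).

    Stops once ||r|| / ||b|| < tolerance or after max_iterations steps.
    Returns the approximate solution and the number of iterations performed.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                      # residual b - A x for the initial guess x = 0
    p = list(r)                      # first search direction
    b_norm = math.sqrt(sum(v * v for v in b)) or 1.0
    rs_old = sum(v * v for v in r)
    iterations = 0
    while iterations < max_iterations and math.sqrt(rs_old) / b_norm >= tolerance:
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs_old / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(v * v for v in r)
        p = [r[i] + (rs_new / rs_old) * p[i] for i in range(n)]
        rs_old = rs_new
        iterations += 1
    return x, iterations


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_loose, iters_loose = conjugate_gradients(A, b, tolerance=1.0)    # stops early
x_tight, iters_tight = conjugate_gradients(A, b, tolerance=1e-10)  # runs to convergence
```

A loose tolerance ends the loop as soon as the residual has shrunk at all, while a tight one keeps iterating until the solve is accurate, which is the trade-off this setting controls.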

class linear_operator.settings.cholesky_jitter(float_value=None, double_value=None, half_value=None)[source]

The jitter value used by psd_safe_cholesky when using Cholesky solves.

Default for float: 1e-6
Default for double: 1e-8

class linear_operator.settings.cholesky_max_tries(value)[source]

The max_tries value used by psd_safe_cholesky when using Cholesky solves.

(Default: 3)
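
The interaction between cholesky_jitter and cholesky_max_tries can be sketched as a retry loop: attempt a plain Cholesky factorization and, if it fails, add increasing amounts of jitter to the diagonal. This is a pure-Python illustration, not the code of psd_safe_cholesky. The helper names `cholesky` and `jittered_cholesky` are hypothetical, and the tenfold growth of the jitter between attempts is an assumption of this sketch.

```python
import math


def cholesky(A):
    """Plain Cholesky factorization; raises ValueError if A is not positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def jittered_cholesky(A, jitter=1e-6, max_tries=3):
    """Retry Cholesky with growing diagonal jitter (hypothetical helper).

    Returns the factor and the amount of jitter that was needed.
    """
    n = len(A)
    try:
        return cholesky(A), 0.0
    except ValueError:
        pass
    amount = 0.0
    for attempt in range(max_tries):
        amount = jitter * 10 ** attempt  # assumed schedule: 1x, 10x, 100x, ...
        shifted = [[A[i][j] + (amount if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            return cholesky(shifted), amount
        except ValueError:
            continue
    raise ValueError(f"not positive definite after adding jitter up to {amount:g}")


singular = [[1.0, 1.0], [1.0, 1.0]]
factor, used = jittered_cholesky(singular)  # succeeds on the first jittered attempt
```

A matrix that is only slightly indefinite succeeds after one or two retries, while a strongly indefinite one exhausts max_tries and raises.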

class linear_operator.settings.ciq_samples(state=True)[source]

Whether to draw samples using Contour Integral Quadrature or not. This may be slower than standard sampling methods for N < 5000. However, it should be faster with larger matrices.

As described in the paper:

Fast Matrix Square Roots with Applications to Gaussian Processes and Bayesian Optimization.

(Default: False)

class linear_operator.settings.debug(state=True)[source]

Whether or not to perform “safety” checks on the supplied data (for example, that the correct training data is supplied in Exact GP training mode). Trade-offs of disabling the checks: Pros: fewer data checks, fewer warning messages. Cons: possibility of supplying incorrect data, model accidentally in the wrong mode.

(Default: True)

class linear_operator.settings.deterministic_probes(state=True)[source]

Whether or not to resample probe vectors every iteration of training. If True, we use the same set of probe vectors for computing log determinants each iteration. This introduces a small amount of bias into the MLL, but allows us to compute a deterministic estimate of it, which makes optimizers like L-BFGS more viable choices.

NOTE: Currently, probe vectors are cached in a global scope. Therefore, this setting cannot be used if multiple independent GP models are being trained in the same context (i.e., it works fine with a single GP model).

(Default: False)
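
The effect of reusing probe vectors can be illustrated with a small pure-Python sketch. Everything here is hypothetical: `get_probes`, `stochastic_objective`, and the module-level cache are stand-ins, and a probe-based trace estimate replaces the log-determinant estimate for brevity. The point is only that cached probes turn a noisy objective into a deterministic function of its inputs.

```python
import random

_rng = random.Random(0)
_cached_probes = None  # module-level cache, shared by every caller


def get_probes(n, count, deterministic):
    """Return `count` random sign vectors of length n (hypothetical helper)."""
    global _cached_probes
    if deterministic and _cached_probes is not None:
        return _cached_probes          # reuse the probes drawn on the first call
    probes = [[_rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(count)]
    if deterministic:
        _cached_probes = probes
    return probes


def stochastic_objective(A, deterministic):
    """Probe-based estimate of trace(A), standing in for a log-determinant estimate."""
    n = len(A)
    probes = get_probes(n, 4, deterministic)
    total = 0.0
    for z in probes:
        total += sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))
    return total / len(probes)


A = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 3.0]]
first = stochastic_objective(A, deterministic=True)
second = stochastic_objective(A, deterministic=True)   # same probes, same value
resampled = [stochastic_objective(A, deterministic=False) for _ in range(5)]
```

Because the cache in this sketch is global and ignores which matrix asked for it, a second model with a different size would receive probes of the wrong shape, which mirrors the caveat in the note above.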

class linear_operator.settings.fast_computations(covar_root_decomposition=True, log_prob=True, solves=True)[source]

This feature flag controls whether or not to use fast approximations to various mathematical functions used in GP inference. The functions that can be controlled are:

covar_root_decomposition
This feature flag controls how matrix root decompositions (\(K = L L^\top\)) are computed (e.g. for sampling, computing caches, etc.).
- If set to True,
  covariance matrices \(K\) are decomposed with low-rank approximations \(L L^\top\), (\(L \in \mathbb R^{n \times k}\)) using the Lanczos algorithm. This is faster for large matrices and exploits structure in the covariance matrix if applicable.
- If set to False,
  covariance matrices \(K\) are decomposed using the Cholesky decomposition.
log_prob
This feature flag controls how to compute the marginal log likelihood for exact GPs and log_prob for multivariate normal distributions.
- If set to True,
  log_prob is computed using a modified conjugate gradients algorithm (as described in GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration). This is a stochastic computation, but it is much faster for large matrices and exploits structure in the covariance matrix if applicable.
- If set to False,
  log_prob is computed using the Cholesky decomposition.
solves
This feature flag controls how to compute the solves of positive-definite matrices.
- If set to True,
  solves are computed with preconditioned conjugate gradients.
- If set to False,
  solves are computed using the Cholesky decomposition.

Warning

Setting this to False will compute a complete Cholesky decomposition of covariance matrices. This may be infeasible for GPs with structured covariance matrices.

By default, approximations are used for all of these functions (except for solves). Setting any of them to False will use exact computations instead.

See also:

linear_operator.settings.max_root_decomposition_size
(to control the size of the low rank decomposition used)
linear_operator.settings.num_trace_samples
(to control the stochasticity of the fast log_prob estimates)

covar_root_decomposition: alias of _fast_covar_root_decomposition

log_prob: alias of _fast_log_prob

solves: alias of _fast_solves

class linear_operator.settings.linalg_dtypes(default=torch.float64, symeig=None, cholesky=None)[source]

Whether to perform less stable linalg calls in double precision or in a lower precision. Currently, the default is to apply all symeig calls and cholesky calls within variational methods in double precision.

(Default: torch.double)

class linear_operator.settings.max_cg_iterations(value)[source]

The maximum number of conjugate gradient iterations to perform (when computing matrix solves). A higher value rarely results in more accurate solves – instead, lower the CG tolerance.

(Default: 1000)

class linear_operator.settings.max_cholesky_size(value)[source]

If the size of a LinearOperator is less than max_cholesky_size, then root_decomposition and solve of LinearOperator will use Cholesky rather than Lanczos/CG.

(Default: 800)

class linear_operator.settings.max_lanczos_quadrature_iterations(value)[source]

The maximum number of Lanczos iterations to perform when doing stochastic Lanczos quadrature. This is ONLY used for log determinant calculations and computing \(\mathrm{Tr}(K^{-1} \, dK/d\theta)\).

(Default: 20)

class linear_operator.settings.max_preconditioner_size(value)[source]

The maximum size of the preconditioner to use. 0 corresponds to turning preconditioning off. When enabled, a value of around 10 usually works fairly well.

(Default: 15)

class linear_operator.settings.max_root_decomposition_size(value)[source]

The maximum number of Lanczos iterations to perform. This is used when 1) computing variance estimates, 2) drawing from MVNs, or 3) performing kernel multiplication. Larger values result in higher accuracy.

(Default: 100)

class linear_operator.settings.memory_efficient(state=True)[source]

Whether or not to use Toeplitz math with gridded data, grid inducing point modules. Pros: memory efficient, faster on CPU. Cons: slower on GPUs with < 10000 inducing points.

(Default: False)

class linear_operator.settings.min_preconditioning_size(value)[source]

If the size of a LinearOperator is less than min_preconditioning_size, then we won’t use pivoted Cholesky based preconditioning.

(Default: 2000)
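
The size thresholds max_cholesky_size, min_preconditioning_size, and max_preconditioner_size can be read together as a small decision rule. The sketch below is one plausible reading of the descriptions in this section, not the library's dispatch logic: the function `plan_solve` and its return format are hypothetical, and the interaction with fast_computations is ignored.

```python
def plan_solve(n, max_cholesky_size=800, min_preconditioning_size=2000,
               max_preconditioner_size=15):
    """Pick a solve strategy for an n x n operator (hypothetical helper)."""
    if n < max_cholesky_size:
        # Small operators: a dense Cholesky factorization, no preconditioner needed.
        return {"method": "cholesky", "preconditioner_rank": 0}
    # Larger operators: CG, preconditioned only when the operator is big enough
    # and preconditioning has not been turned off with a size of 0.
    use_preconditioner = n >= min_preconditioning_size and max_preconditioner_size > 0
    rank = min(max_preconditioner_size, n) if use_preconditioner else 0
    return {"method": "cg", "preconditioner_rank": rank}


small = plan_solve(500)
medium = plan_solve(1000)
large = plan_solve(5000)
```

With the defaults listed in this section, an operator of size 1000 falls between the two thresholds and would use CG without a preconditioner under this reading.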

class linear_operator.settings.minres_tolerance(value)[source]

Relative update term tolerance to use for terminating MINRES.

(Default: 1e-4)

class linear_operator.settings.num_contour_quadrature(value)[source]

The number of quadrature points to compute CIQ.

(Default: 15)

class linear_operator.settings.num_trace_samples(value)[source]

The number of samples to draw when stochastically computing the trace of a matrix. Larger values result in more accurate trace estimates. If the value is set to 0, the trace is computed deterministically.

(Default: 10)
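
As a rough illustration of what the sample count controls, here is a pure-Python sketch of a stochastic (Hutchinson-style) trace estimator. It is not the library's implementation: the function name `estimate_trace`, the `seed` argument, and the choice of random sign (Rademacher) probe vectors are assumptions made for this example.

```python
import random


def estimate_trace(A, num_trace_samples=10, seed=None):
    """Estimate trace(A) as the average of z^T A z over random sign vectors z.

    With num_trace_samples == 0 the trace is computed exactly from the diagonal.
    """
    n = len(A)
    if num_trace_samples == 0:
        return sum(A[i][i] for i in range(n))
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_trace_samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        total += sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))
    return total / num_trace_samples


A = [[2.0, 0.5, 0.1], [0.5, 1.0, 0.3], [0.1, 0.3, 3.0]]
exact = estimate_trace(A, num_trace_samples=0)
noisy = estimate_trace(A, num_trace_samples=10, seed=0)
```

Each sample contributes the true trace plus a zero-mean error that comes from the off-diagonal entries, so averaging more samples tightens the estimate at the cost of more matrix-vector products.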

class linear_operator.settings.preconditioner_tolerance(value)[source]

Diagonal trace tolerance to use for checking preconditioner convergence.

(Default: 1e-3)
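
To show how a rank cap (as in max_preconditioner_size) and a trace tolerance can jointly stop a pivoted Cholesky factorization, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not the library's routine: the function name `pivoted_cholesky` and the relative form of the stopping test (residual trace divided by the original trace) are choices made for this example.

```python
def pivoted_cholesky(A, max_rank=15, tolerance=1e-3):
    """Greedy low-rank factor of a symmetric PSD matrix A (list of lists).

    Returns a list of k columns (each of length n) such that A is approximately
    the sum of their outer products. Stops at max_rank columns or once
    trace(residual) < tolerance * trace(A).
    """
    n = len(A)
    diag = [A[i][i] for i in range(n)]   # diagonal of the residual A - L L^T
    orig_trace = sum(diag)
    columns = []
    while len(columns) < min(max_rank, n) and sum(diag) >= tolerance * orig_trace:
        p = max(range(n), key=lambda i: diag[i])   # pivot on the largest residual entry
        if diag[p] <= 0.0:
            break
        pivot = diag[p] ** 0.5
        col = [(A[i][p] - sum(c[i] * c[p] for c in columns)) / pivot
               for i in range(n)]
        columns.append(col)
        diag = [max(diag[i] - col[i] ** 2, 0.0) for i in range(n)]
    return columns


rank_one = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]
factor = pivoted_cholesky(rank_one)               # tolerance reached after one column
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
capped = pivoted_cholesky(identity, max_rank=2)   # stopped by the rank cap instead
```

A rank cap of 0 returns no columns at all, which corresponds to running without a preconditioner.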

class linear_operator.settings.skip_logdet_forward(state=True)[source]

This feature does not affect the gradients returned by linear_operator.distributions.MultivariateNormal.log_prob() (used by linear_operator.mlls.MarginalLogLikelihood). The gradients remain unbiased estimates, and therefore can be used with SGD. However, the actual likelihood value returned by the forward pass will skip certain computations (i.e. the logdet computation), and will therefore be improper estimates.

If you’re using SGD (or a variant) to optimize parameters, you probably don’t need an accurate MLL estimate; you only need accurate gradients. So this setting may give your model a performance boost.

(Default: False)

class linear_operator.settings.terminate_cg_by_size(state=True)[source]

If set to True, CG will terminate after n iterations for an n x n matrix.

(Default: False)

class linear_operator.settings.trace_mode(state=True)[source]

If set to True, we will generally try to avoid calling our built-in PyTorch functions, because these cannot be run through torch.jit.trace.

Note that this will sometimes involve explicitly evaluating lazy tensors and various other slowdowns and inefficiencies. As a result, you really shouldn’t use this feature context unless you are calling torch.jit.trace on a GPyTorch model.

Our hope is that this flag will not be necessary long term, once https://github.com/pytorch/pytorch/issues/22329 is fixed.

(Default: False)

class linear_operator.settings.tridiagonal_jitter(value)[source]

The (relative) amount of noise to add to the diagonal of tridiagonal matrices before eigendecomposing. root_decomposition becomes slightly more stable with this, as we need to take the square root of the eigenvalues. Any eigenvalues still negative after adding jitter will be zeroed out.

(Default: 1e-6)

class linear_operator.settings.use_toeplitz(state=True)[source]

Whether or not to use Toeplitz math with gridded data, grid inducing point modules. Pros: memory efficient, faster on CPU. Cons: slower on GPUs with < 10000 inducing points.

(Default: True)

class linear_operator.settings.verbose_linalg(state=True)[source]

Print out information whenever running an expensive linear algebra routine (e.g. Cholesky, CG, Lanczos, CIQ, etc.)

(Default: False)