Data-Sparse LinearOperators
BlockDiagLinearOperator
- class linear_operator.operators.BlockDiagLinearOperator(base_linear_op: LinearOperator | Tensor, block_dim: int = -3)[source]
Represents a lazy tensor that is the block diagonal of square matrices. The block_dim attribute specifies which dimension of the base LinearOperator indexes the blocks. For example, with block_dim=-3, a k x n x n tensor represents k n x n blocks (a kn x kn matrix), and a b x k x n x n tensor represents k b x n x n blocks (a b x kn x kn batch matrix).
- Args:
base_linear_op (LinearOperator or Tensor) – Must be at least 3 dimensional.
block_dim (int) – The dimension that specifies the blocks.
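The shape convention above can be checked with a small sketch. It uses plain PyTorch only (torch.block_diag and batched matmul) to build the dense equivalent; it is not the library's lazy implementation, and the variable names are illustrative assumptions.

```python
import torch

# Plain-PyTorch sketch of the block_dim=-3 convention (assumed names, dense equivalent only).
k, n = 3, 2
blocks = torch.arange(k * n * n, dtype=torch.float32).reshape(k, n, n)  # k blocks, each n x n

# Dense equivalent of the block-diagonal matrix: kn x kn
dense = torch.block_diag(*blocks)

# Multiplying block by block matches the dense product,
# without touching the zero off-diagonal blocks.
rhs = torch.ones(k * n, 1)
blockwise = (blocks @ rhs.reshape(k, n, 1)).reshape(k * n, 1)

# Batched case: a b x k x n x n tensor corresponds to a b x kn x kn batch matrix.
b = 4
batched_blocks = torch.randn(b, k, n, n)
batched_dense = torch.stack([torch.block_diag(*m) for m in batched_blocks])

print(dense.shape, batched_dense.shape)
```

The blockwise product costs O(k n^2) per right-hand-side column instead of O(k^2 n^2), which is the kind of saving a data-sparse representation is meant to expose.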
CholLinearOperator
- class linear_operator.operators.CholLinearOperator(chol, upper=False)[source]
A LinearOperator (… x N x N) that represents a positive definite matrix given a lower triangular Cholesky factor \(\mathbf L\) (or upper triangular Cholesky factor \(\mathbf R\)).
- Parameters:
chol (TriangularLinearOperator (... x N x N)) – The Cholesky factor \(\mathbf L\) (or \(\mathbf R\)).
upper (bool) – Whether the Cholesky factor is upper triangular (i.e. the matrix is \(\mathbf R^\top \mathbf R\)). If False, the factor is assumed to be lower triangular (i.e. the matrix is \(\mathbf L \mathbf L^\top\)).
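As a rough illustration of the two orientations, the following plain-PyTorch sketch (not using CholLinearOperator itself; the test matrix is an assumption made for the example) checks that both factors reproduce the same matrix and that a solve needs only the factor.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4)
K = A @ A.T + 4.0 * torch.eye(4)  # symmetric positive definite test matrix

L = torch.linalg.cholesky(K)              # lower factor: K = L L^T
R = torch.linalg.cholesky(K, upper=True)  # upper factor: K = R^T R

lower_ok = torch.allclose(L @ L.T, K, atol=1e-4)
upper_ok = torch.allclose(R.T @ R, K, atol=1e-4)

# With the factor in hand, solving K x = b reduces to two triangular solves.
b = torch.randn(4, 1)
x = torch.cholesky_solve(b, L)  # upper=False by default
solve_ok = torch.allclose(K @ x, b, atol=1e-4)

print(lower_ok, upper_ok, solve_ok)
```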
ConstantDiagLinearOperator
- class linear_operator.operators.ConstantDiagLinearOperator(diag_values, diag_shape)[source]
Diagonal lazy tensor with constant entries. Supports arbitrary batch sizes. Used e.g. for adding jitter to matrices.
- Parameters:
diag_values (torch.Tensor) – A … x 1 Tensor representing a (batch of) diag_shape x diag_shape diagonal matrix.
diag_shape (int) – The (non-batch) dimension of the (square) matrix.
- abs()[source]
Returns a DiagLinearOperator with the absolute value of all diagonal entries.
- Return type:
LinearOperator
- exp()[source]
Returns a DiagLinearOperator with all diagonal entries exponentiated.
- Return type:
LinearOperator (… x M x N)
- inverse()[source]
Returns the inverse of the DiagLinearOperator.
- Return type:
LinearOperator (… x N x N)
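The jitter use case mentioned above can be sketched in plain PyTorch. This shows the dense equivalent of a constant-diagonal matrix, not the library class; the singular test matrix and the jitter value are assumptions chosen for illustration.

```python
import torch

diag_shape = 4
diag_values = torch.tensor([1e-3])  # a single stored value, shape (1,)

# Dense equivalent: the one stored value repeated along the diagonal.
jitter = diag_values * torch.eye(diag_shape)

K = torch.ones(diag_shape, diag_shape)  # rank 1, so singular
K_jittered = K + jitter                 # smallest eigenvalue is shifted up by 1e-3

min_eig_before = torch.linalg.eigvalsh(K).min()
min_eig_after = torch.linalg.eigvalsh(K_jittered).min()

# The inverse of a constant-diagonal matrix only needs the reciprocal of that one value.
inv_jitter = diag_values.reciprocal() * torch.eye(diag_shape)

print(min_eig_before.item(), min_eig_after.item())
```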
DiagLinearOperator
- class linear_operator.operators.DiagLinearOperator(diag)[source]
Diagonal linear operator (… x N x N).
- Parameters:
diag (torch.Tensor (... x N)) – Diagonal elements of LinearOperator.
- abs()[source]
Returns a DiagLinearOperator with the absolute value of all diagonal entries.
- Return type:
LinearOperator
- exp()[source]
Returns a DiagLinearOperator with all diagonal entries exponentiated.
- Return type:
LinearOperator (… x M x N)
- inverse()[source]
Returns the inverse of the DiagLinearOperator.
- Return type:
LinearOperator (… x N x N)
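For intuition, each method above reduces to an elementwise operation on the stored diagonal. The sketch below compares against dense PyTorch equivalents rather than calling DiagLinearOperator; note that a diagonal result with entries exp(d_i) matches the matrix exponential of the dense matrix, not its elementwise exp (whose off-diagonal entries would be 1).

```python
import torch

diag = torch.tensor([-1.0, 2.0, 4.0])
D = torch.diag(diag)  # dense N x N equivalent

# abs: zeros stay zero, so only the diagonal changes
abs_dense = D.abs()
abs_diag = torch.diag(diag.abs())

# exp: diag(exp(d)) equals the matrix exponential of D
exp_dense = torch.linalg.matrix_exp(D)
exp_diag = torch.diag(diag.exp())

# inverse: reciprocal of each diagonal entry
inv_dense = torch.linalg.inv(D)
inv_diag = torch.diag(diag.reciprocal())

print(torch.equal(abs_dense, abs_diag))
```

Working on the length-N diagonal costs O(N) per operation, while the dense routines used here for comparison cost up to O(N^3).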
IdentityLinearOperator
- class linear_operator.operators.IdentityLinearOperator(diag_shape, batch_shape=torch.Size([]), dtype=torch.float32, device=None)[source]
Identity linear operator. Supports arbitrary batch sizes.
- Parameters:
diag_shape (int) – The size of the identity matrix (i.e. \(N\)).
batch_shape (torch.Size, optional) – The size of the batch dimensions. It may be useful to set these dimensions for broadcasting.
dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).
device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).
KernelLinearOperator
- class linear_operator.operators.KernelLinearOperator(x1, x2, covar_func, num_outputs_per_input=(1, 1), num_nonbatch_dimensions=None, **params)[source]
Represents the kernel matrix \(\boldsymbol K\) of data \(\boldsymbol X_1 \in \mathbb R^{M \times D}\) and \(\boldsymbol X_2 \in \mathbb R^{N \times D}\) under the covariance function \(k_{\boldsymbol \theta}(\cdot, \cdot)\) (parameterized by hyperparameters \(\boldsymbol \theta\)) so that \(\boldsymbol K_{ij} = k_{\boldsymbol \theta}([\boldsymbol X_1]_i, [\boldsymbol X_2]_j)\).
The output of \(k_{\boldsymbol \theta}(\cdot,\cdot)\) (covar_func) can either be a torch.Tensor or a LinearOperator.
Note
All hyperparameters have some number of batch dimensions (which broadcast with the batch dimensions of x1 and x2) and some number of non-batch dimensions (dimensions that would exist if we were computing a single covariance matrix).
By default, each hyperparameter is assumed to have 2 (potentially singleton) non-batch dimensions. However, the number of non-batch dimensions can be specified on a per-hyperparameter basis through the optional num_nonbatch_dimensions dictionary argument.
For example, to implement the RBF kernel
\[o^2 \exp\left( -\tfrac{1}{2} (\boldsymbol x_1 - \boldsymbol x_2)^\top \boldsymbol D_\ell^{-2} (\boldsymbol x_1 - \boldsymbol x_2) \right),\]
where \(o\) is an outputscale parameter and \(\boldsymbol D_\ell\) is a diagonal lengthscale matrix, we would expect the following shapes:
x1: (*batch_shape x N x D)
x2: (*batch_shape x M x D)
lengthscale: (*batch_shape x 1 x D)
outputscale: (*batch_shape) # Note this parameter does not have non-batch dimensions
We would then supply the dictionary num_nonbatch_dimensions = {"outputscale": 0}. (We do not need to include lengthscale in the dictionary since it has 2 non-batch dimensions.)
import torch
from linear_operator.operators import KernelLinearOperator

# NOTE: _covar_func intentionally does not close over any parameters
def _covar_func(x1, x2, lengthscale, outputscale):
    # RBF kernel function
    # x1: ... x N x D
    # x2: ... x M x D
    # lengthscale: ... x 1 x D
    # outputscale: ...
    x1 = x1.div(lengthscale)
    x2 = x2.div(lengthscale)
    sq_dist = (x1.unsqueeze(-2) - x2.unsqueeze(-3)).square().sum(dim=-1)
    kern = sq_dist.div(-2.0).exp().mul(outputscale[..., None, None].square())
    return kern

# Batches of data
x1 = torch.randn(3, 5, 6)
x2 = torch.randn(3, 4, 6)
# Broadcasting lengthscale and output parameters
lengthscale = torch.randn(2, 1, 1, 6)  # Batch shape is 2 x 1, with 2 non-batch dimensions
outputscale = torch.randn(2, 1)  # Batch shape is 2 x 1, no non-batch dimensions
kern = KernelLinearOperator(
    x1, x2, lengthscale=lengthscale, outputscale=outputscale,
    covar_func=_covar_func,  # pass the function defined above
    num_nonbatch_dimensions={"outputscale": 0},
)
# kern is of size 2 x 3 x 5 x 4
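To see where the 2 x 3 x 5 x 4 size comes from, the shape bookkeeping described in the note can be reproduced by hand. This is a plain-PyTorch sketch under the shapes used in the example; the helper batch_shape_of is hypothetical and is not part of the library.

```python
import torch

x1 = torch.randn(3, 5, 6)
x2 = torch.randn(3, 4, 6)
params = {
    "lengthscale": torch.randn(2, 1, 1, 6),  # 2 non-batch dimensions (the default)
    "outputscale": torch.randn(2, 1),        # 0 non-batch dimensions
}
num_nonbatch_dimensions = {"outputscale": 0}

def batch_shape_of(tensor, num_nonbatch):
    # Hypothetical helper: strip the trailing non-batch dimensions.
    return tensor.shape[: tensor.dim() - num_nonbatch]

param_batch_shapes = [
    batch_shape_of(p, num_nonbatch_dimensions.get(name, 2))
    for name, p in params.items()
]

# Batch dimensions of the data and of every hyperparameter broadcast together.
batch_shape = torch.broadcast_shapes(x1.shape[:-2], x2.shape[:-2], *param_batch_shapes)
kernel_shape = (*batch_shape, x1.shape[-2], x2.shape[-2])

print(kernel_shape)
```

Here both hyperparameters contribute a batch shape of 2 x 1, which broadcasts with the data batch shape of 3 to give 2 x 3; without the {"outputscale": 0} entry, outputscale would be treated as having no batch dimensions at all.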
Warning
covar_func should not close over any parameters. Any parameters that are closed over will not have propagated gradients.
See the example above: the lengthscale and outputscale of _covar_func are passed in as arguments, rather than being externally defined variables.
- Parameters:
x1 (torch.Tensor (... x M x D)) – The data \(\boldsymbol X_1.\)
x2 (torch.Tensor (... x N x D)) – The data \(\boldsymbol X_2.\)
covar_func (Callable[... -> torch.Tensor (... x M x N) or LinearOperator (... x M x N)]) – The covariance function \(k_{\boldsymbol \theta}(\cdot, \cdot)\). Its arguments should be x1, x2, **params, and it should output the covariance matrix between \(\boldsymbol X_1\) and \(\boldsymbol X_2\).
num_outputs_per_input ((int, int)) – The number of outputs per data point. This parameter should be 1 for most kernels, but will be >1 for multitask kernels, gradient kernels, and any other kernels that require cross-covariance terms for multiple domains. If a tuple is passed, there will be a different number of outputs per input dimension for the rows/cols of the kernel matrix.
params (torch.Tensor or Any) – Additional hyperparameters (\(\boldsymbol \theta\)) or keyword arguments passed into covar_func.
RootLinearOperator
ToeplitzLinearOperator
ZeroLinearOperator
- class linear_operator.operators.ZeroLinearOperator(*sizes, dtype=None, device=None)[source]
Special LinearOperator representing zero.
- Parameters:
sizes ((int, ...)) – The size of each dimension (including batch dimensions).
dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).
device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).