# Data-Sparse LinearOperators

## BlockDiagLinearOperator

class linear_operator.operators.BlockDiagLinearOperator(base_linear_op: LinearOperator | Tensor, block_dim: int = -3)[source]

Represents a lazy tensor that is the block diagonal of square matrices. The block_dim attribute specifies which dimension of the base LinearOperator indexes the blocks. For example, with block_dim=-3, a k x n x n tensor represents k n x n blocks (a kn x kn matrix). A b x k x n x n tensor represents k b x n x n blocks (a b x kn x kn batch matrix).

Args:
base_linear_op (LinearOperator or Tensor):

Must be at least 3 dimensional.

block_dim (int):

The dimension that specifies the blocks.

## CholLinearOperator

class linear_operator.operators.CholLinearOperator(chol, upper=False)[source]

A LinearOperator (… x N x N) that represents a positive definite matrix given a lower triangular Cholesky factor $$\mathbf L$$ (or upper triangular Cholesky factor $$\mathbf R$$).

Parameters:
• chol (TriangularLinearOperator (... x N x N)) – The Cholesky factor $$\mathbf L$$ (or $$\mathbf R$$).

• upper (bool) – If true, chol is treated as an upper triangular factor (i.e. the operator represents $$\mathbf R^\top \mathbf R$$). If false, chol is treated as a lower triangular factor (i.e. the operator represents $$\mathbf L \mathbf L^\top$$).

inverse()[source]

Returns the inverse of the CholLinearOperator.

Return type:

LinearOperator (… x N x N)

## ConstantDiagLinearOperator

class linear_operator.operators.ConstantDiagLinearOperator(diag_values, diag_shape)[source]

Diagonal lazy tensor with constant entries. Supports arbitrary batch sizes. Used e.g. for adding jitter to matrices.

Parameters:
• diag_values (torch.Tensor) – A … x 1 Tensor, representing a (batch of) diag_shape x diag_shape diagonal matrix.

• diag_shape (int) – The (non-batch) dimension of the (square) matrix

abs()[source]

Returns a DiagLinearOperator with the absolute value of all diagonal entries.

Return type:

LinearOperator

exp()[source]

Returns a DiagLinearOperator with all diagonal entries exponentiated.

Return type:

LinearOperator (… x M x N)

inverse()[source]

Returns the inverse of the DiagLinearOperator.

Return type:

LinearOperator (… x N x N)

log()[source]

Returns a DiagLinearOperator with the log of all diagonal entries.

Return type:

LinearOperator (… x M x N)

sqrt()[source]

Returns a DiagLinearOperator with the square root of all diagonal entries.

Return type:

LinearOperator (… x M x N)

## DiagLinearOperator

class linear_operator.operators.DiagLinearOperator(diag)[source]

Diagonal linear operator (… x N x N).

Parameters:

diag (torch.Tensor (... x N)) – Diagonal elements of LinearOperator.

abs()[source]

Returns a DiagLinearOperator with the absolute value of all diagonal entries.

Return type:

LinearOperator

exp()[source]

Returns a DiagLinearOperator with all diagonal entries exponentiated.

Return type:

LinearOperator (… x M x N)

inverse()[source]

Returns the inverse of the DiagLinearOperator.

Return type:

LinearOperator (… x N x N)

log()[source]

Returns a DiagLinearOperator with the log of all diagonal entries.

Return type:

LinearOperator (… x M x N)

sqrt()[source]

Returns a DiagLinearOperator with the square root of all diagonal entries.

Return type:

LinearOperator (… x M x N)

## IdentityLinearOperator

class linear_operator.operators.IdentityLinearOperator(diag_shape, batch_shape=torch.Size([]), dtype=torch.float32, device=None)[source]

Identity linear operator. Supports arbitrary batch sizes.

Parameters:
• diag_shape (int) – The size of the identity matrix (i.e. $$N$$).

• batch_shape (torch.Size, optional) – The size of the batch dimensions. It may be useful to set these dimensions for broadcasting.

• dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).

• device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).

## KernelLinearOperator

class linear_operator.operators.KernelLinearOperator(x1, x2, covar_func, num_outputs_per_input=(1, 1), num_nonbatch_dimensions=None, **params)[source]

Represents the kernel matrix $$\boldsymbol K$$ of data $$\boldsymbol X_1 \in \mathbb R^{M \times D}$$ and $$\boldsymbol X_2 \in \mathbb R^{N \times D}$$ under the covariance function $$k_{\boldsymbol \theta}(\cdot, \cdot)$$ (parameterized by hyperparameters $$\boldsymbol \theta$$) so that $$\boldsymbol K_{ij} = k_{\boldsymbol \theta}([\boldsymbol X_1]_i, [\boldsymbol X_2]_j)$$.

The output of $$k_{\boldsymbol \theta}(\cdot,\cdot)$$ (covar_func) can either be a torch.Tensor or a LinearOperator.

Note

All hyperparameters have some number of batch dimensions (which broadcast with the batch dimensions of x1 and x2) and some number of non-batch dimensions (dimensions that would exist if we were computing a single covariance matrix).

By default, each hyperparameter is assumed to have 2 (potentially singleton) non-batch dimensions. However, the number of non-batch dimensions can be specified on a per-hyperparameter basis through the optional num_nonbatch_dimensions dictionary argument.

For example, to implement the RBF kernel

$o^2 \exp\left( -\tfrac{1}{2} (\boldsymbol x_1 - \boldsymbol x_2)^\top \boldsymbol D_\ell^{-2} (\boldsymbol x_1 - \boldsymbol x_2) \right),$

where $$o$$ is an outputscale parameter and $$D_\ell$$ is a diagonal lengthscale matrix, we would expect the following shapes:

• x1: (*batch_shape x N x D)

• x2: (*batch_shape x M x D)

• lengthscale: (*batch_shape x 1 x D)

• outputscale: (*batch_shape) # Note this parameter does not have non-batch dimensions

We would then supply the dictionary num_nonbatch_dimensions = {"outputscale": 0}. (We do not need to include lengthscale in the dictionary since it has 2 non-batch dimensions.)

    import torch
    from linear_operator.operators import KernelLinearOperator

    # NOTE: _covar_func intentionally does not close over any parameters
    def _covar_func(x1, x2, lengthscale, outputscale):
        # RBF kernel function
        # x1: ... x N x D
        # x2: ... x M x D
        # lengthscale: ... x 1 x D
        # outputscale: ...
        x1 = x1.div(lengthscale)
        x2 = x2.div(lengthscale)
        # Pairwise squared distances: ... x N x M
        sq_dist = (x1.unsqueeze(-2) - x2.unsqueeze(-3)).square().sum(dim=-1)
        kern = sq_dist.div(-2.0).exp().mul(outputscale[..., None, None].square())
        return kern

    # Batches of data
    x1 = torch.randn(3, 5, 6)
    x2 = torch.randn(3, 4, 6)
    # Broadcasting lengthscale and output parameters
    lengthscale = torch.randn(2, 1, 1, 6)  # Batch shape is 2 x 1, with 2 non-batch dimensions
    outputscale = torch.randn(2, 1)  # Batch shape is 2 x 1, no non-batch dimensions
    kern = KernelLinearOperator(
        x1, x2, lengthscale=lengthscale, outputscale=outputscale,
        covar_func=_covar_func, num_nonbatch_dimensions={"outputscale": 0}
    )

    # kern is of size 2 x 3 x 5 x 4


Warning

covar_func should not close over any parameters. Any parameters that are closed over will not have propagated gradients.

See the example above: the lengthscale and outputscale of _covar_func are passed in as arguments, rather than being externally defined variables.

Parameters:
• x1 (torch.Tensor (... x M x D)) – The data $$\boldsymbol X_1.$$

• x2 (torch.Tensor (... x N x D)) – The data $$\boldsymbol X_2.$$

• covar_func (Callable[... -> torch.Tensor (... x M x N) or LinearOperator (... x M x N)]) – The covariance function $$k_{\boldsymbol \theta}(\cdot, \cdot)$$. Its arguments should be x1, x2, **params, and it should output the covariance matrix between $$\boldsymbol X_1$$ and $$\boldsymbol X_2$$.

• num_outputs_per_input ((int, int)) – The number of outputs per data point. This parameter should be 1 for most kernels, but will be >1 for multitask kernels, gradient kernels, and any other kernels that require cross-covariance terms for multiple domains. If a tuple is passed, there will be a different number of outputs per input dimension for the rows/cols of the kernel matrix.

• params (torch.Tensor or Any) – Additional hyperparameters ($$\boldsymbol \theta$$) or keyword arguments passed into covar_func.

## RootLinearOperator

class linear_operator.operators.RootLinearOperator(root)[source]

## ToeplitzLinearOperator

class linear_operator.operators.ToeplitzLinearOperator(column)[source]

## ZeroLinearOperator

class linear_operator.operators.ZeroLinearOperator(*sizes, dtype=None, device=None)[source]

Special LinearOperator representing zero.

Parameters:
• sizes ((int, ...)) – The size of each dimension (including batch dimensions).

• dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).

• device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).