Data-Sparse LinearOperators

BlockDiagLinearOperator

class linear_operator.operators.BlockDiagLinearOperator(base_linear_op: LinearOperator | Tensor, block_dim: int = -3)[source]

Represents a lazy tensor that is the block diagonal of square matrices. The block_dim attribute specifies which dimension of the base LinearOperator indexes the blocks. For example, with block_dim=-3, a k x n x n tensor represents k blocks of size n x n (a kn x kn matrix), and a b x k x n x n tensor represents k blocks of size b x n x n (a b x kn x kn batch matrix).

Args:
base_linear_op (LinearOperator or Tensor):

Must be at least 3 dimensional.

block_dim (int):

The dimension that specifies the blocks.
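As a rough illustration of the shapes described above, the following plain-PyTorch sketch builds the dense matrix that a block-diagonal operator with block_dim=-3 stands for. The helper name block_diag_dense is hypothetical and is not part of linear_operator; it only materializes the matrix for inspection, which the lazy operator is meant to avoid.

```python
import torch


def block_diag_dense(base: torch.Tensor) -> torch.Tensor:
    """Dense (... x kn x kn) matrix from a (... x k x n x n) tensor of blocks.

    Hypothetical helper for illustration only (assumes block_dim=-3).
    """
    *batch, k, n, _ = base.shape
    out = base.new_zeros((*batch, k * n, k * n))
    for i in range(k):
        # Copy block i onto the diagonal; everything else stays zero.
        out[..., i * n:(i + 1) * n, i * n:(i + 1) * n] = base[..., i, :, :]
    return out


blocks = torch.arange(12.0).reshape(3, 2, 2)          # k = 3 blocks of size 2 x 2
dense = block_diag_dense(blocks)                      # 6 x 6 matrix
batched = block_diag_dense(torch.randn(4, 3, 2, 2))   # 4 x 6 x 6 batch matrix
```

In the non-batch case the result agrees with torch.block_diag applied to the individual blocks.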

CholLinearOperator

class linear_operator.operators.CholLinearOperator(chol, upper=False)[source]

A LinearOperator (… x N x N) that represents a positive definite matrix given a lower triangular Cholesky factor \(\mathbf L\) (or upper triangular Cholesky factor \(\mathbf R\)).

Parameters:
  • chol (TriangularLinearOperator (... x N x N)) – The Cholesky factor \(\mathbf L\) (or \(\mathbf R\)).

  • upper (bool) – Whether the Cholesky factor is upper triangular (i.e. the represented matrix is \(\mathbf R^\top \mathbf R\)). If False, the factor is assumed to be lower triangular (i.e. the represented matrix is \(\mathbf L \mathbf L^\top\)).

inverse()[source]

Returns the inverse of the CholLinearOperator.

Return type:

LinearOperator (… x N x N)
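The two orientations of the Cholesky factor, and the inverse they give access to, can be checked densely in plain PyTorch. This is a sketch of the underlying identities only; it does not use CholLinearOperator, and the test matrix K is an arbitrary positive definite matrix made up for the example.

```python
import torch

torch.manual_seed(0)
A = torch.randn(4, 4, dtype=torch.float64)
K = A @ A.T + 4 * torch.eye(4, dtype=torch.float64)  # positive definite by construction

L = torch.linalg.cholesky(K)              # lower triangular factor
R = torch.linalg.cholesky(K, upper=True)  # upper triangular factor

K_from_lower = L @ L.T   # upper=False convention: K = L L^T
K_from_upper = R.T @ R   # upper=True convention:  K = R^T R

# The inverse of K computed from the lower factor alone.
K_inv = torch.cholesky_inverse(L)
```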

ConstantDiagLinearOperator

class linear_operator.operators.ConstantDiagLinearOperator(diag_values, diag_shape)[source]

Diagonal lazy tensor with constant entries. Supports arbitrary batch sizes. Used e.g. for adding jitter to matrices.

Parameters:
  • diag_values (torch.Tensor) – A (… x 1) Tensor representing a (batch of) diag_shape x diag_shape diagonal matrix.

  • diag_shape (int) – The (non-batch) dimension of the (square) matrix.
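To make the two parameters concrete, here is a minimal dense sketch in plain PyTorch of the matrix such an operator stands for, together with the jitter use case mentioned above. The helper constant_diag_dense is hypothetical and not part of linear_operator.

```python
import torch


def constant_diag_dense(diag_values: torch.Tensor, diag_shape: int) -> torch.Tensor:
    """Dense (... x n x n) matrix c * I from a (... x 1) tensor of constants.

    Hypothetical helper for illustration only.
    """
    eye = torch.eye(diag_shape, dtype=diag_values.dtype, device=diag_values.device)
    # (... x 1) -> (... x 1 x 1), then broadcast against the n x n identity.
    return diag_values.unsqueeze(-1) * eye


jitter = constant_diag_dense(torch.tensor([1e-4]), 3)            # 3 x 3
batched = constant_diag_dense(torch.tensor([[0.1], [0.2]]), 3)   # 2 x 3 x 3

K = torch.ones(3, 3)      # rank-one, hence singular
K_jittered = K + jitter   # small constant added to the diagonal only
```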

abs()[source]

Returns a DiagLinearOperator with the absolute value of all diagonal entries.

Return type:

LinearOperator

exp()[source]

Returns a DiagLinearOperator with all diagonal entries exponentiated.

Return type:

LinearOperator (… x M x N)

inverse()[source]

Returns the inverse of the DiagLinearOperator.

Return type:

LinearOperator (… x N x N)

log()[source]

Returns a DiagLinearOperator with the log of all diagonal entries.

Return type:

LinearOperator (… x M x N)

sqrt()[source]

Returns a DiagLinearOperator with the square root of all diagonal entries.

Return type:

LinearOperator (… x M x N)

DiagLinearOperator

class linear_operator.operators.DiagLinearOperator(diag)[source]

Diagonal linear operator (… x N x N).

Parameters:

diag (torch.Tensor (... x N)) – Diagonal elements of LinearOperator.

abs()[source]

Returns a DiagLinearOperator with the absolute value of all diagonal entries.

Return type:

LinearOperator

exp()[source]

Returns a DiagLinearOperator with all diagonal entries exponentiated.

Return type:

LinearOperator (… x M x N)

inverse()[source]

Returns the inverse of the DiagLinearOperator.

Return type:

LinearOperator (… x N x N)

log()[source]

Returns a DiagLinearOperator with the log of all diagonal entries.

Return type:

LinearOperator (… x M x N)

sqrt()[source]

Returns a DiagLinearOperator with the square root of all diagonal entries.

Return type:

LinearOperator (… x M x N)
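For a diagonal matrix, applying a function to the diagonal entries coincides with the corresponding matrix function, which is why these methods can return another diagonal operator. The plain-PyTorch sketch below checks this densely; it does not call DiagLinearOperator, and the diagonal d is an arbitrary positive example. Note that these are not elementwise functions of the dense matrix, whose off-diagonal zeros would map to, for example, exp(0) = 1.

```python
import torch

d = torch.tensor([1.0, 4.0, 9.0], dtype=torch.float64)  # example diagonal
D = torch.diag(d)

inv_D = torch.diag(d.reciprocal())  # diagonal of 1 / d
sqrt_D = torch.diag(d.sqrt())       # diagonal of sqrt(d)
exp_D = torch.diag(d.exp())         # diagonal of exp(d)
log_D = torch.diag(d.log())         # diagonal of log(d)

# Each one matches the dense matrix function of D.
inv_matches = torch.allclose(inv_D, torch.linalg.inv(D))
sqrt_matches = torch.allclose(sqrt_D @ sqrt_D, D)
exp_matches = torch.allclose(exp_D, torch.linalg.matrix_exp(D))
log_matches = torch.allclose(torch.linalg.matrix_exp(log_D), D)
```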

IdentityLinearOperator

class linear_operator.operators.IdentityLinearOperator(diag_shape, batch_shape=torch.Size([]), dtype=torch.float32, device=None)[source]

Identity linear operator. Supports arbitrary batch sizes.

Parameters:
  • diag_shape (int) – The size of the identity matrix (i.e. \(N\)).

  • batch_shape (torch.Size, optional) – The size of the batch dimensions. It may be useful to set these dimensions for broadcasting.

  • dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).

  • device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).

KernelLinearOperator

class linear_operator.operators.KernelLinearOperator(x1, x2, covar_func, num_outputs_per_input=(1, 1), num_nonbatch_dimensions=None, **params)[source]

Represents the kernel matrix \(\boldsymbol K\) of data \(\boldsymbol X_1 \in \mathbb R^{M \times D}\) and \(\boldsymbol X_2 \in \mathbb R^{N \times D}\) under the covariance function \(k_{\boldsymbol \theta}(\cdot, \cdot)\) (parameterized by hyperparameters \(\boldsymbol \theta\)) so that \(\boldsymbol K_{ij} = k_{\boldsymbol \theta}([\boldsymbol X_1]_i, [\boldsymbol X_2]_j)\).

The output of \(k_{\boldsymbol \theta}(\cdot,\cdot)\) (covar_func) can either be a torch.Tensor or a LinearOperator.

Note

All hyperparameters have some number of batch dimensions (which broadcast with the batch dimensions of x1 and x2) and some number of non-batch dimensions (dimensions that would exist if we were computing a single covariance matrix).

By default, each hyperparameter is assumed to have 2 (potentially singleton) non-batch dimensions. However, the number of non-batch dimensions can be specified on a per-hyperparameter basis through the optional num_nonbatch_dimensions dictionary argument.

For example, to implement the RBF kernel

\[o^2 \exp\left( -\tfrac{1}{2} (\boldsymbol x_1 - \boldsymbol x_2)^\top \boldsymbol D_\ell^{-2} (\boldsymbol x_1 - \boldsymbol x_2) \right),\]

where \(o\) is an outputscale parameter and \(D_\ell\) is a diagonal lengthscale matrix, we would expect the following shapes:

  • x1: (*batch_shape x N x D)

  • x2: (*batch_shape x M x D)

  • lengthscale: (*batch_shape x 1 x D)

  • outputscale: (*batch_shape) # Note this parameter does not have non-batch dimensions

We would then supply the dictionary num_nonbatch_dimensions = {"outputscale": 0}. (We do not need to include lengthscale in the dictionary since it has 2 non-batch dimensions.)

import torch
from linear_operator.operators import KernelLinearOperator

# NOTE: _covar_func intentionally does not close over any parameters
def _covar_func(x1, x2, lengthscale, outputscale):
    # RBF kernel function
    # x1: ... x N x D
    # x2: ... x M x D
    # lengthscale: ... x 1 x D
    # outputscale: ...
    x1 = x1.div(lengthscale)
    x2 = x2.div(lengthscale)
    sq_dist = (x1.unsqueeze(-2) - x2.unsqueeze(-3)).square().sum(dim=-1)
    kern = sq_dist.div(-2.0).exp().mul(outputscale[..., None, None].square())
    return kern


# Batches of data
x1 = torch.randn(3, 5, 6)
x2 = torch.randn(3, 4, 6)
# Broadcasting lengthscale and outputscale parameters
lengthscale = torch.randn(2, 1, 1, 6)  # Batch shape is 2 x 1, with 2 non-batch dimensions
outputscale = torch.randn(2, 1)  # Batch shape is 2 x 1, no non-batch dimensions
kern = KernelLinearOperator(
    x1, x2, lengthscale=lengthscale, outputscale=outputscale,
    covar_func=_covar_func, num_nonbatch_dimensions={"outputscale": 0}
)

# kern is of size 2 x 3 x 5 x 4
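The 2 x 3 x 5 x 4 size follows from broadcasting the batch dimensions of the data and hyperparameters, then appending the N and M dimensions of x1 and x2. The sketch below reproduces that arithmetic with standard PyTorch broadcasting; it assumes the shapes from the example and does not call KernelLinearOperator itself.

```python
import torch

x1_shape = (3, 5, 6)               # batch (3,),   N = 5, D = 6
x2_shape = (3, 4, 6)               # batch (3,),   M = 4, D = 6
lengthscale_shape = (2, 1, 1, 6)   # batch (2, 1), 2 non-batch dimensions
outputscale_shape = (2, 1)         # batch (2, 1), 0 non-batch dimensions

# Strip the non-batch dimensions, then broadcast what is left.
batch_shape = torch.broadcast_shapes(
    x1_shape[:-2],
    x2_shape[:-2],
    lengthscale_shape[:-2],
    outputscale_shape,  # no non-batch dimensions to strip
)
kern_shape = (*batch_shape, x1_shape[-2], x2_shape[-2])
```

This is also why outputscale needs an entry in num_nonbatch_dimensions: under the default of 2 non-batch dimensions, its 2 x 1 shape would be read as having no batch dimensions at all.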

Warning

covar_func should not close over any parameters. Any parameters that are closed over will not have propagated gradients.

See the example above: the lengthscale and outputscale of _covar_func are passed in as arguments, rather than being externally defined variables.

Parameters:
  • x1 (torch.Tensor (... x M x D)) – The data \(\boldsymbol X_1.\)

  • x2 (torch.Tensor (... x N x D)) – The data \(\boldsymbol X_2.\)

  • covar_func (Callable[... -> torch.Tensor (... x M x N) or LinearOperator (... x M x N)]) – The covariance function \(k_{\boldsymbol \theta}(\cdot, \cdot)\). Its arguments should be x1, x2, **params, and it should output the covariance matrix between \(\boldsymbol X_1\) and \(\boldsymbol X_2\).

  • num_outputs_per_input ((int, int)) – The number of outputs per data point. This parameter should be 1 for most kernels, but will be >1 for multitask kernels, gradient kernels, and any other kernels that require cross-covariance terms for multiple domains. If a tuple is passed, there will be a different number of outputs per input dimension for the rows/cols of the kernel matrix.

  • params (torch.Tensor or Any) – Additional hyperparameters (\(\boldsymbol \theta\)) or keyword arguments passed into covar_func.

RootLinearOperator

class linear_operator.operators.RootLinearOperator(root)[source]
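The entry above carries no description. As a sketch only, and assuming the operator stands for the matrix root @ root^T built from a (… x N x R) root, the plain-PyTorch example below shows the dense matrix and the cheaper way to multiply by it when R is much smaller than N. It does not call RootLinearOperator, and the sizes are made up for illustration.

```python
import torch

torch.manual_seed(0)
root = torch.randn(5, 2, dtype=torch.float64)  # N = 5, R = 2
v = torch.randn(5, 3, dtype=torch.float64)     # right-hand side

dense = root @ root.T   # N x N matrix (assumed meaning of the operator)

# Multiply without ever forming the N x N matrix:
# two thin products, O(N R) each per column of v.
lazy_product = root @ (root.T @ v)
```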

ToeplitzLinearOperator

class linear_operator.operators.ToeplitzLinearOperator(column)[source]
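The entry above carries no description either. The sketch below assumes, and this is an assumption rather than something stated here, that column is the first column of a symmetric Toeplitz matrix, so that entry (i, j) equals column[|i - j|]. The helper sym_toeplitz_dense is hypothetical and written in plain PyTorch; it is not part of linear_operator.

```python
import torch


def sym_toeplitz_dense(column: torch.Tensor) -> torch.Tensor:
    """Dense (... x n x n) symmetric Toeplitz matrix from a (... x n) first column.

    Hypothetical helper; assumes T[i, j] = column[|i - j|].
    """
    n = column.shape[-1]
    idx = torch.arange(n, device=column.device)
    # |i - j| for every (i, j) pair, used to gather from the column.
    offsets = (idx[:, None] - idx[None, :]).abs()
    return column[..., offsets]


column = torch.tensor([4.0, 1.0, 0.5])
T = sym_toeplitz_dense(column)
```

Under this assumption the whole n x n matrix is determined by n numbers, which is what makes the representation data-sparse.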

ZeroLinearOperator

class linear_operator.operators.ZeroLinearOperator(*sizes, dtype=None, device=None)[source]

Special LinearOperator representing zero.

Parameters:
  • sizes ((int, ...)) – The size of each dimension (including batch dimensions).

  • dtype (torch.dtype, optional) – Dtype that the LinearOperator will be operating on. (Default: torch.get_default_dtype()).

  • device (torch.device, optional) – Device that the LinearOperator will be operating on. (Default: CPU).