Convex Non-Negative Matrix Factorization (Convex NMF)

Convex Non-Negative Matrix Factorization (Convex NMF) is a variant of NMF that constrains the dictionary (D) to be a convex combination of the input data (A). This ensures that the learned basis components are directly interpretable in terms of the original data, and that the dictionary elements lie in the conical hull of the data.

The factorization takes the form $A \approx Z D$ with $D = W A$, $Z \geq 0$ and $W \geq 0$, where:

  • A (Data Matrix): Input data, shape (n_samples, n_features).
  • Z (Codes Matrix): Latent representation, shape (n_samples, nb_concepts), constrained to be non-negative.
  • W (Coefficient Matrix): Convex coefficients, shape (nb_concepts, n_samples), constrained to be non-negative. The dictionary is computed as D = W A.
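
To make the shapes concrete, here is a minimal NumPy sketch of how the three matrices fit together. It is independent of the library, and the sizes and random values are placeholders chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features, nb_concepts = 100, 16, 5

A = rng.random((n_samples, n_features))    # data matrix
W = rng.random((nb_concepts, n_samples))   # non-negative coefficients
Z = rng.random((n_samples, nb_concepts))   # non-negative codes

D = W @ A        # dictionary: each atom is a non-negative mix of data rows
A_hat = Z @ D    # reconstruction, same shape as A

print(D.shape, A_hat.shape)
```

Each row of D has the same dimensionality as a data point, which is what makes the atoms readable in terms of the original samples.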

Basic Usage

from overcomplete import ConvexNMF

# define a Convex NMF model with 10k concepts using
# the multiplicative update solver
convex_nmf = ConvexNMF(nb_concepts=10_000, solver='mu')

# fit the model to input data A
Z, D = convex_nmf.fit(A)

# encode new data
Z = convex_nmf.encode(A)
# decode (reconstruct) data from codes
A_hat = convex_nmf.decode(Z)

Solvers

The Convex NMF module supports different optimization strategies:

  • MU (Multiplicative Updates): Standard Convex NMF update rule.
  • PGD (Projected Gradient Descent): Allows for sparsity control via an L1 penalty on Z.
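
As an illustration of the MU strategy, the following NumPy sketch alternates multiplicative updates on Z and W for non-negative data. It is a simplified variant and not the library's implementation: the rule in the referenced paper also handles mixed-sign data by splitting matrices into positive and negative parts, which is omitted here. The function name `convex_nmf_mu` and its arguments are illustrative.

```python
import numpy as np

def convex_nmf_mu(A, nb_concepts, n_iter=200, eps=1e-8, seed=0):
    """Alternating multiplicative updates for A ~= Z @ (W @ A), assuming A >= 0."""
    rng = np.random.default_rng(seed)
    n_samples = A.shape[0]
    Z = rng.random((n_samples, nb_concepts))
    W = rng.random((nb_concepts, n_samples))
    G = A @ A.T  # Gram matrix of the samples, non-negative when A >= 0
    losses = []
    for _ in range(n_iter):
        D = W @ A
        # Z update: ratio of the negative and positive parts of the gradient
        Z *= (A @ D.T) / (Z @ (D @ D.T) + eps)
        # W update: same idea applied to the coefficients
        W *= (Z.T @ G) / ((Z.T @ Z) @ W @ G + eps)
        losses.append(np.linalg.norm(A - Z @ W @ A) ** 2)
    return Z, W, losses

# Usage on small synthetic non-negative data
A = np.random.default_rng(1).random((60, 10))
Z, W, losses = convex_nmf_mu(A, nb_concepts=4, n_iter=100)
print(losses[0], losses[-1])
```

Because every factor in the update ratios is non-negative, Z and W remain non-negative without any explicit projection.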

For further details, see the original reference [1].

ConvexNMF

PyTorch Convex NMF-based Dictionary Learning model.

__init__(self,
         nb_concepts,
         device='cpu',
         tol=0.0001,
         strict_convex=False,
         solver='pgd',
         verbose=False,
         l1_penalty=0.0,
         **kwargs)

Parameters

  • nb_concepts : int

    • Number of components to learn.

  • device : str, optional

    • Device to use for tensor computations, by default 'cpu'.

  • tol : float, optional

    • Tolerance value for the stopping criterion, by default 1e-4.

  • strict_convex : bool, optional

    • Whether to enforce the convexity constraint, by default False.

  • solver : str, optional

    • Optimization solver to use, either 'mu' (Multiplicative Updates) or 'pgd' (Projected Gradient Descent), by default 'pgd'.

  • verbose : bool, optional

    • Whether to print optimization information, by default False.

  • l1_penalty : float, optional

    • L1 penalty coefficient, by default 0.0. Only used with the 'pgd' solver.
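
The effect of `strict_convex` is not detailed here. One common way to turn non-negative coefficients into convex ones is to normalize each row of W to sum to one, so that every dictionary atom becomes a weighted average of data points. The NumPy sketch below illustrates that idea; whether the library enforces the constraint this way is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((50, 8))    # data, shape (n_samples, n_features)
W = rng.random((3, 50))    # non-negative coefficients

# Conical case: atoms are non-negative combinations, so their scale is free.
D_conic = W @ A

# Convex case (assumed): rows of W sum to one, atoms stay in the convex hull.
W_convex = W / W.sum(axis=1, keepdims=True)
D_convex = W_convex @ A

# A convex combination can never leave the per-feature range of the data.
inside = bool(
    (D_convex >= A.min(axis=0) - 1e-12).all()
    and (D_convex <= A.max(axis=0) + 1e-12).all()
)
print(inside)
```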

init_random_z(self,
              A)

Initialize the codes Z using non-negative random values.

Parameters

  • A : torch.Tensor

    • Input tensor of shape (batch_size, n_features).

Return

  • Z : torch.Tensor

    • Initialized codes tensor.


get_dictionary(self)

Return the learned dictionary components from Convex NMF.

Return

  • torch.Tensor

    • Dictionary components D = W A.


encode(self,
       A,
       max_iter=300,
       tol=None)

Encode the input tensor (the data) using Convex NMF.

Parameters

  • A : torch.Tensor

    • Input tensor of shape (n_samples, n_features).

  • max_iter : int, optional

    • Maximum number of iterations, by default 300.

  • tol : float, optional

    • Tolerance value for the stopping criterion, by default the value set at initialization.

Return

  • torch.Tensor

    • Encoded features (the codes Z).
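
Encoding new data amounts to finding non-negative codes for a fixed dictionary. The sketch below shows one way to do this with projected gradient descent in NumPy, including an optional L1 term. The function name `encode_pgd`, the step size, and the stopping rule based on the relative change of Z are assumptions made for illustration, not the library's actual procedure.

```python
import numpy as np

def encode_pgd(A, D, l1_penalty=0.0, max_iter=300, tol=1e-4):
    """Minimize 0.5 * ||A - Z D||_F^2 + l1_penalty * sum(Z) over Z >= 0, D fixed."""
    DDt = D @ D.T
    ADt = A @ D.T
    step = 1.0 / (np.linalg.norm(DDt, 2) + 1e-12)  # 1 / Lipschitz constant
    Z = np.zeros((A.shape[0], D.shape[0]))
    for _ in range(max_iter):
        grad = Z @ DDt - ADt + l1_penalty
        Z_new = np.maximum(Z - step * grad, 0.0)   # projection onto Z >= 0
        # stop when the relative change of the codes falls below tol
        change = np.linalg.norm(Z_new - Z) / (np.linalg.norm(Z) + 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z

# Usage: recover codes for data generated from a known dictionary
rng = np.random.default_rng(0)
D = rng.random((4, 10))
A = rng.random((30, 4)) @ D
Z = encode_pgd(A, D, max_iter=5000, tol=1e-10)
Z_sparse = encode_pgd(A, D, l1_penalty=(A @ D.T).max() + 1.0)
```

On Z >= 0 the L1 norm reduces to a plain sum, so the penalty only adds a constant to the gradient. A penalty larger than every entry of A Dᵀ keeps all codes at zero, as `Z_sparse` shows.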


sanitize_np_input(self,
                  x)

Ensure the input tensor is a NumPy array of shape (batch_size, dims). Convert from a PyTorch tensor or DataLoader if necessary.

Parameters

  • x : torch.Tensor or Iterable

    • Input tensor of shape (batch_size, dims).

Return

  • torch.Tensor

    • Sanitized input tensor.


fit(self,
    A,
    max_iter=500)

Fit the Convex NMF model to the input data.

Parameters

  • A : torch.Tensor

    • Input tensor of shape (n_samples, n_features).

  • max_iter : int, optional

    • Maximum number of iterations, by default 500.


init_semi_nmf(self,
              A,
              max_snmf_iters=100)

Initialize the Convex NMF model using Semi-NMF.

Parameters

  • A : torch.Tensor

    • Input tensor of shape (n_samples, n_features).

  • max_snmf_iters : int, optional

    • Maximum number of iterations for Semi-NMF, by default 100.

Return

  • Z : torch.Tensor

    • Initialized codes tensor.

  • D : torch.Tensor

    • Initialized dictionary tensor.
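
Semi-NMF, described in the same reference, factorizes A ≈ Z D with only Z constrained to be non-negative. The following NumPy sketch follows the alternating scheme from that paper (a least-squares step for D, a multiplicative step for Z). It is not the library's code, and the function name `semi_nmf` and its defaults are illustrative.

```python
import numpy as np

def pos(M):
    """Element-wise positive part."""
    return (np.abs(M) + M) / 2

def neg(M):
    """Element-wise negative part (returned as non-negative values)."""
    return (np.abs(M) - M) / 2

def semi_nmf(A, nb_concepts, n_iter=100, eps=1e-8, seed=0):
    """Semi-NMF sketch: A ~= Z @ D with Z >= 0 and D unconstrained."""
    rng = np.random.default_rng(seed)
    Z = rng.random((A.shape[0], nb_concepts))
    losses = []
    for _ in range(n_iter):
        # D step: unconstrained least squares given Z
        D = np.linalg.pinv(Z) @ A
        # Z step: multiplicative rule that keeps Z non-negative
        ADt, DDt = A @ D.T, D @ D.T
        num = pos(ADt) + Z @ neg(DDt)
        den = neg(ADt) + Z @ pos(DDt) + eps
        Z = Z * np.sqrt(num / den)
        losses.append(np.linalg.norm(A - Z @ D) ** 2)
    return Z, D, losses

# Usage on mixed-sign data, which Semi-NMF is designed to handle
A = np.random.default_rng(1).standard_normal((80, 12))
Z, D, losses = semi_nmf(A, nb_concepts=4)
```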


init_random_w(self,
              A)

Initialize the coefficient matrix W with non-negative random values.

Parameters

  • A : torch.Tensor

    • Input tensor of shape (n_samples, n_features).

Return

  • W : torch.Tensor

    • Initialized coefficient tensor.


sanitize_np_codes(self,
                  z)

Ensure the codes tensor (Z) is a NumPy array of shape (batch_size, nb_concepts). Convert from a PyTorch tensor or DataLoader if necessary.

Parameters

  • z : torch.Tensor

    • Encoded tensor (the codes) of shape (batch_size, nb_concepts).

Return

  • torch.Tensor

    • Sanitized codes tensor.


decode(self,
       Z)

Decode the input tensor (the codes) using Convex NMF.

Parameters

  • Z : torch.Tensor

    • Codes tensor of shape (n_samples, nb_concepts).

Return

  • torch.Tensor

    • Decoded output (the approximation of A).
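
A quick way to judge a decoded output is the relative Frobenius error against the original data. The helper below is a hypothetical utility, not part of the library, and it assumes decoding amounts to the product Z D.

```python
import numpy as np

def relative_error(A, A_hat):
    """Relative Frobenius reconstruction error ||A - A_hat||_F / ||A||_F."""
    return np.linalg.norm(A - A_hat) / np.linalg.norm(A)

rng = np.random.default_rng(0)
Z = rng.random((20, 3))    # codes
D = rng.random((3, 6))     # dictionary
A = Z @ D                  # data the model can represent exactly

err_exact = relative_error(A, Z @ D)
err_noisy = relative_error(A, (Z + 0.1) @ D)   # perturbed codes
print(err_exact, err_noisy)
```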



  1. Convex and Semi-Nonnegative Matrix Factorizations by Ding, Li, and Jordan (2008).