Welcome to PNMF’s documentation!
PNMF (Probabilistic Non-negative Matrix Factorization) is a Python package that implements variational inference for Poisson factorization with a scikit-learn compatible API.
Note
This project is licensed under the GNU General Public License v2.0 (GPL-2.0).
Installation
Install from PyPI:
pip install pnmf
Or install from source:
git clone https://github.com/luisfcd/Probabilistic-NMF.git
cd Probabilistic-NMF
pip install -e .
Quick Start
from PNMF import PNMF
import numpy as np
# Create sample data
X = np.random.rand(100, 50)
# Initialize and fit
model = PNMF(n_components=5, random_state=42, verbose=True)
transformed = model.fit_transform(X)
# Access results
print(f"Components shape: {model.components_.shape}") # (5, 50)
print(f"Transformed shape: {transformed.shape}") # (100, 5)
print(f"ELBO: {model.elbo_}")
Mathematical Formulation
Model
Given a non-negative data matrix \(Y \in \mathbb{R}_{+}^{N \times D}\), the PNMF model assumes each entry follows a Poisson distribution:
where the rate parameter \(\lambda_{ij}\) is given by:
with the following components:
\(W \in \mathbb{R}_{+}^{D \times L}\) are the loadings (learned parameters)
\(F \in \mathbb{R}^{N \times L}\) are the latent factors (random variables)
The latent factors follow a Gaussian prior:
Variational Inference
We use variational inference to approximate the posterior distribution over the latent factors \(F\). The variational distribution \(q(F)\) is a Gaussian with learnable mean and scale parameters:
The Evidence Lower BOund (ELBO) is:
Expected Log-Likelihood Expansion
The expected log-likelihood term expands as follows. For a Poisson observation model, the log-likelihood is:
Substituting the NSF parameterization \(\lambda_{ij} = \sum_{\ell} W_{j\ell} e^{F_{i\ell}}\), the expected log-likelihood becomes:
The second expectation can be computed analytically using the moment-generating function of the Gaussian:
This gives the final form:
where: - The first term (log of sum) is estimated via Monte Carlo sampling with the reparameterization trick - The second term (sum of exponentials) is computed analytically - The third term is constant with respect to the parameters
Optimization
The model is optimized by maximizing the ELBO using gradient ascent. The variational parameters \(\mu_{i\ell}\) and \(\sigma_{i\ell}\) and the loadings \(W_{j\ell}\) are learned jointly.