MCMC Samplers

For most users it suffices to use the predefined Bayesian variable selection classes like millipede.NormalLikelihoodVariableSelector. However advanced users also have the option of using the MCMC samplers directly. Note that the samplers are only partially documented. For example usage see the source code.

NormalLikelihoodSampler

class NormalLikelihoodSampler(X, Y, X_assumed=None, sigma_scale_factor=None, S=5.0, prior='isotropic', include_intercept=True, tau=0.01, tau_intercept=0.0001, c=100.0, nu0=0.0, lambda0=0.0, explore=5, precompute_XX=False, compute_betas=False, verbose_constructor=True, xi_target=0.2, subset_size=None, anchor_size=None)[source]

MCMC sampler for Bayesian variable selection for a linear model with a Normal likelihood. The likelihood variance is controlled by a Inverse Gamma prior. This class supports continuous-valued responses.

The details of the available models in NormalLikelihoodSampler are as follows. The covariates \(X\) and responses \(Y\) are defined as follows:

\[X \in \mathbb{R}^{N \times P} \qquad \qquad Y \in \mathbb{R}^{N}\]

The inclusion of each covariate is governed by a Bernoulli random variable \(\gamma_p\). In particular \(\gamma_p = 0\) corresponds to exclusion and \(\gamma_p = 1\) corresponds to inclusion. The prior probability of inclusion is governed by \(h\) or alternatively \(S\):

\[h \in [0, 1] \qquad \rm{with} \qquad S \equiv hP\]

Alternatively, if \(h\) is not known a priori we can put a prior on \(h\):

\[h \sim {\rm Beta}(\alpha, \beta) \qquad \rm{with} \qquad \alpha > 0 \;\;\;\; \beta > 0\]

Putting this together, the model specification for an isotopric prior (with an intercept \(\beta_0\) included) is as follows:

\[ \begin{align}\begin{aligned}&\gamma_p \sim \rm{Bernoulli}(h) \qquad \rm{for} \qquad p=1,2,...,P\\&\sigma^2 \sim \rm{InverseGamma}(\nu_0 / 2, \nu_0 \lambda_0 / 2)\\&\beta_0 \sim \rm{Normal}(0, \sigma^2\tau_\rm{intercept}^{-1})\\&\beta_\gamma \sim \rm{Normal}(0, \sigma^2 \tau^{-1} \mathbb{1}_\gamma)\\&Y_n \sim \rm{Normal}(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma, \sigma^2)\end{aligned}\end{align} \]

Note that the dimension of \(\beta_\gamma\) depends on the number of covariates included in a particular model (i.e. on the number of non-zero entries in \(\gamma\)).

For a gprior the prior over the coefficients is instead specified as follows:

\[\beta_{\gamma} \sim \rm{Normal}(0, c \sigma^2 (X_\gamma^{\rm{T}} X_\gamma)^{-1})\]

Usage of this class is only recommended for advanced users. For most users it should suffice to use NormalLikelihoodVariableSelector.

Parameters
  • X (tensor) – A N x P torch.Tensor of covariates.

  • Y (tensor) – A N-dimensional torch.Tensor of continuous responses.

  • X_assumed (tensor) – A N x P’ torch.Tensor of covariates that are always assumed to be part of the model. Defaults to None.

  • sigma_scale_factor (tensor) – A N-dimensional torch.Tensor of positive scale factors that are used to scale the standard deviation of the Normal likelihood for each datapoint. For example, specifying 2.0 for a particular datapoint results in \(\sigma \rightarrow 2 \times \sigma\). Defaults to None.

  • S – Controls the expected number of covariates to include in the model a priori. Defaults to 5.0. To specify covariate-level prior inclusion probabilities provide a P-dimensional torch.Tensor of the form (h_1, …, h_P). If a tuple of positive floats (alpha, beta) is provided, the a priori inclusion probability is a latent variable governed by the corresponding Beta prior so that the sparsity level is inferred from the data. Note that for a given choice of alpha and beta the expected number of covariates to include in the model a priori is given by \(\frac{\alpha}{\alpha + \beta} \times P\). Also note that the mean number of covariates in the posterior can vary significantly from prior expectations, since the posterior is in effect a compromise between the prior and the observed data.

  • prior (str) – One of the two supported priors for the coefficients: ‘isotropic’ or ‘gprior’. Defaults to ‘isotropic’.

  • include_intercept (bool) – Whether to include an intercept term. If included the intercept term is is included in all models so that the corresponding coefficient does not have a PIP.

  • tau (float) – Controls the precision of the coefficients in the isotropic prior. Defaults to 0.01.

  • tau_intercept (float) – Controls the precision of the intercept in the isotropic prior. Defaults to 1.0e-4.

  • c (float) – Controls the precision of the coefficients in the gprior. Defaults to 100.0.

  • nu0 (float) – Controls the prior over the precision in the Normal likelihood. Defaults to 0.0.

  • lambda0 (float) – Controls the prior over the precision in the Normal likelihood. Defaults to 0.0.

  • explore (float) – This hyperparameter controls how greedy the MCMC algorithm is. Defaults to 5.0.

  • precompute_XX (bool) – Whether the matrix X^t @ X should be pre-computed. Defaults to False. Note that setting this to True may result in out-of-memory errors for sufficiently large covariate matrices.

  • verbose_constructor (bool) – Whether the class constructor should print some information to stdout upon initialization.

  • xi_target (float) – This hyperparameter controls how often \(h\) MCMC updates are made if \(h\) is a latent variable. Defaults to 0.2.

  • subset_size (int) – If subset_size is not None subset_size controls the amount of computational resources to use in Subset wTGS. Otherwise if subset_size is None vanilla wTGS is used. This argument is intended to be used for datasets with a very large number of covariates (e.g. tens of thousands or more). A typical value might be ~5-10% of the total number of covariates; smaller values result in more MCMC iterations per second but may lead to high variance PIP estimates. Defaults to None.

  • anchor_size (int) – If subset_size is not None anchor_size controls how greedy Subset wTGS is. If anchor_size is None it defaults to half of subset_size. For expert users only. Defaults to None.

CountLikelihoodSampler

class CountLikelihoodSampler(X, Y, X_assumed=None, TC=None, psi0=None, S=5.0, tau=0.01, tau_intercept=0.0001, explore=5.0, log_nu_rw_scale=0.05, omega_mh=True, xi_target=0.25, init_nu=5.0, verbose_constructor=True, subset_size=None, anchor_size=None)[source]

MCMC algorithm for Bayesian variable selection for a generalized linear model with a Binomial or Negative Binomial likelihood (note that a Bernoulli likelihood is a special case of a Binomial likelihood). This class supports count-valued responses.

To define a Binomial model specify TC but not psi0. To define a Negative Binomial model specify psi0 but not TC.

The details of the available models in CountlLikelihoodSampler are as follows. For both likelihoods the covariates and responses are defined as:

\[X \in \mathbb{R}^{N \times P} \qquad \qquad Y \in \mathbb{Z}_{\ge 0}^{N}\]

The inclusion of each covariate is governed by a Bernoulli random variable \(\gamma_p\). In particular \(\gamma_p = 0\) corresponds to exclusion and \(\gamma_p = 1\) corresponds to inclusion. The prior probability of inclusion is governed by \(h\) or alternatively \(S\):

\[h \in [0, 1] \qquad \rm{with} \qquad S \equiv hP\]

Alternatively, if \(h\) is not known a priori we can put a prior on \(h\):

\[h \sim {\rm Beta}(\alpha, \beta) \qquad \rm{with} \qquad \alpha > 0 \;\;\;\; \beta > 0\]

Putting this together, the model specification for the Binomial case is as follows:

\[ \begin{align}\begin{aligned}&\gamma_p \sim \rm{Bernoulli}(h) \qquad \rm{for} \qquad p=1,2,...,P\\&\beta_0 \sim \rm{Normal}(0, \tau_\rm{intercept}^{-1})\\&\beta_\gamma \sim \rm{Normal}(0, \tau^{-1} \mathbb{1}_\gamma)\\&Y_n \sim \rm{Binomial}(T_n, \sigma(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma))\end{aligned}\end{align} \]

where \(\sigma(\cdot)\) is the logistic or sigmoid function and \(T_n\) denotes the \(N\)-dimensional vector of total counts. That is each Binomial likelihood is equivalent to \(T_n\) corresponding Bernoulli likelihoods.

The Negative Binomial case is similar but includes a latent variable \(\nu > 0\) that governs the dispersion of the Negative Binomial distribution:

\[ \begin{align}\begin{aligned}&\log \nu \sim \rm{ImproperPrior}(-\infty, \infty)\\&Y_n \sim \rm{NegBinomial}(\rm{mean}=\rm{exp}(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma + \psi_{0, n}), \nu)\end{aligned}\end{align} \]

The vector \(\psi_0 \in \mathbb{R}^N\) allows the user to supply a datapoint-specific offset. We note that we use a parameterization of the Negative Binomial distribution where the variance is given by

\[\rm{variance} = \rm{mean} + \rm{mean}^2 / \nu\]

so that small values of \(\nu\) correspond to large dispersion/variance and \(\nu \to \infty\) recovers the Poisson distribution.

Note that above the dimension of \(\beta_\gamma\) depends on the number of covariates included in a particular model (i.e. on the number of non-zero entries in \(\gamma\)).

Usage of this class is only recommended for advanced users. For most users it should suffice to use one of BinomialLikelihoodVariableSelector, BernoulliLikelihoodVariableSelector, and NegativeBinomialLikelihoodVariableSelector.

Parameters
  • X (tensor) – A N x P torch.Tensor of covariates. This is a required argument.

  • Y (tensor) – A N-dimensional torch.Tensor of non-negative count-valued responses. This is a required argument.

  • X_assumed (tensor) – A N x P’ torch.Tensor of covariates that are always assumed to be part of the model. Defaults to None.

  • TC (tensor) – A N-dimensional torch.Tensor of non-negative counts. This is a required argument if you wish to specify a Binomial model. Defaults to None.

  • psi0 (tensor) – A N-dimensional torch.Tensor of offsets psi0. This is a required argument if you wish to specify a Negative Binomial model. If the user specifies a float, psi0 will be expanded to a N-dimensional vector internally. Defaults to None.

  • S – Controls the expected number of covariates to include in the model a priori. Defaults to 5.0. To specify covariate-level prior inclusion probabilities provide a P-dimensional torch.Tensor of the form (h_1, …, h_P). If a tuple of positive floats (alpha, beta) is provided, the a priori inclusion probability is a latent variable governed by the corresponding Beta prior so that the sparsity level is inferred from the data. Note that for a given choice of alpha and beta the expected number of covariates to include in the model a priori is given by \(\frac{\alpha}{\alpha + \beta} \times P\). Also note that the mean number of covariates in the posterior can vary significantly from prior expectations, since the posterior is in effect a compromise between the prior and the observed data.

  • tau (float) – Controls the precision of the coefficients in the isotropic prior. Defaults to 0.01.

  • tau_intercept (float) – Controls the precision of the intercept in the isotropic prior. Defaults to 1.0e-4.

  • explore (float) – This hyperparameter controls how greedy the MCMC algorithm is. Defaults to 5.0.

  • log_nu_rw_scale (float) – This hyperparameter controls the proposal distribution for nu updates. Defaults to 0.05. Only applicable to the Negative Binomial case.

  • omega_mh (bool) – Whether to include Metropolis-Hastings corrections during Polya-Gamma updates. Defaults to True. Only applicable to the Binomial case.

  • xi_target (float) – This hyperparameter controls how frequently the MCMC algorithm makes Polya-Gamma updates. It also controls how often \(h\) updates are made if \(h\) is a latent variable. Defaults to 0.25.

  • init_nu (float) – This hyperparameter controls the initial value of the dispersion parameter nu. Defaults to 5.0. Only applicable to the Negative Binomial case.

  • verbose_constructor (bool) – Whether the class constructor should print some information to stdout upon initialization.

  • subset_size (int) – If subset_size is not None subset_size controls the amount of computational resources to use in Subset PG-wTGS. Otherwise if subset_size is None vanilla PG-wTGS is used. This argument is intended to be used for datasets with a very large number of covariates (e.g. tens of thousands or more). A typical value might be ~5-10% of the total number of covariates; smaller values result in more MCMC iterations per second but may lead to high variance PIP estimates. Defaults to None.

  • anchor_size (int) – If subset_size is not None anchor_size controls how greedy Subset PG-wTGS is. If anchor_size is None it defaults to half of subset_size. For expert users only. Defaults to None.