MCMC Samplers
For most users it suffices to use the predefined Bayesian variable selection classes like millipede.NormalLikelihoodVariableSelector. However, advanced users also have the option of using the MCMC samplers directly. Note that the samplers are only partially documented; for example usage see the source code.
NormalLikelihoodSampler
class NormalLikelihoodSampler(X, Y, X_assumed=None, sigma_scale_factor=None, S=5.0, prior='isotropic', include_intercept=True, tau=0.01, tau_intercept=0.0001, c=100.0, nu0=0.0, lambda0=0.0, explore=5, precompute_XX=False, compute_betas=False, verbose_constructor=True, xi_target=0.2, subset_size=None, anchor_size=None)
MCMC sampler for Bayesian variable selection for a linear model with a Normal likelihood. The likelihood variance is controlled by an Inverse Gamma prior. This class supports continuous-valued responses.
The details of the available models in NormalLikelihoodSampler are as follows. The covariates \(X\) and responses \(Y\) are defined as:
\[X \in \mathbb{R}^{N \times P} \qquad \qquad Y \in \mathbb{R}^{N}\]The inclusion of each covariate is governed by a Bernoulli random variable \(\gamma_p\). In particular \(\gamma_p = 0\) corresponds to exclusion and \(\gamma_p = 1\) corresponds to inclusion. The prior probability of inclusion is governed by \(h\) or alternatively \(S\):
\[h \in [0, 1] \qquad \rm{with} \qquad S \equiv hP\]Alternatively, if \(h\) is not known a priori we can put a prior on \(h\):
\[h \sim {\rm Beta}(\alpha, \beta) \qquad \rm{with} \qquad \alpha > 0 \;\;\;\; \beta > 0\]Putting this together, the model specification for an isotropic prior (with an intercept \(\beta_0\) included) is as follows:
\[ \begin{align}\begin{aligned}&\gamma_p \sim \rm{Bernoulli}(h) \qquad \rm{for} \qquad p=1,2,...,P\\&\sigma^2 \sim \rm{InverseGamma}(\nu_0 / 2, \nu_0 \lambda_0 / 2)\\&\beta_0 \sim \rm{Normal}(0, \sigma^2\tau_\rm{intercept}^{-1})\\&\beta_\gamma \sim \rm{Normal}(0, \sigma^2 \tau^{-1} \mathbb{1}_\gamma)\\&Y_n \sim \rm{Normal}(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma, \sigma^2)\end{aligned}\end{align} \]Note that the dimension of \(\beta_\gamma\) depends on the number of covariates included in a particular model (i.e. on the number of non-zero entries in \(\gamma\)).
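As a concrete illustration, the isotropic generative process above can be simulated directly. This is a minimal NumPy sketch with hypothetical hyperparameter values (here nu0 and lambda0 are set to nonzero values so that the Inverse Gamma prior is proper); the sampler itself infers \(\gamma\) from data rather than simulating it:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 10
S, tau, tau_intercept = 5.0, 0.01, 1.0e-4
h = S / P                                  # prior inclusion probability, S = h * P
nu0, lambda0 = 3.0, 1.0                    # hypothetical values; defaults are 0.0

# gamma_p ~ Bernoulli(h) for p = 1, ..., P
gamma = rng.binomial(1, h, size=P).astype(bool)
# sigma^2 ~ InverseGamma(nu0/2, nu0*lambda0/2): draw a Gamma variate and invert it
sigma2 = 1.0 / rng.gamma(nu0 / 2.0, 2.0 / (nu0 * lambda0))
# beta_0 ~ Normal(0, sigma^2 / tau_intercept) and beta_gamma ~ Normal(0, sigma^2 / tau)
beta0 = rng.normal(0.0, np.sqrt(sigma2 / tau_intercept))
beta_gamma = rng.normal(0.0, np.sqrt(sigma2 / tau), size=int(gamma.sum()))
# Y_n ~ Normal(beta_0 + X_{n,gamma} . beta_gamma, sigma^2)
X = rng.normal(size=(N, P))
Y = rng.normal(beta0 + X[:, gamma] @ beta_gamma, np.sqrt(sigma2))
```

Drawing \(\sigma^2\) as the reciprocal of a Gamma draw is the standard way to sample an Inverse Gamma variate.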
For a gprior the prior over the coefficients is instead specified as follows:
\[\beta_{\gamma} \sim \rm{Normal}(0, c \sigma^2 (X_\gamma^{\rm{T}} X_\gamma)^{-1})\]Usage of this class is only recommended for advanced users. For most users it should suffice to use NormalLikelihoodVariableSelector.
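For intuition, the gprior covariance can be constructed explicitly for a small hypothetical design matrix (all values below are illustrative); it is symmetric and positive definite whenever \(X_\gamma\) has full column rank:

```python
import numpy as np

rng = np.random.default_rng(0)
N, k = 50, 3                     # k = number of covariates included by gamma
c, sigma2 = 100.0, 1.0           # hypothetical values of c and sigma^2
X_gamma = rng.normal(size=(N, k))
# gprior covariance: c * sigma^2 * (X_gamma^T X_gamma)^{-1}
cov = c * sigma2 * np.linalg.inv(X_gamma.T @ X_gamma)
beta_gamma = rng.multivariate_normal(np.zeros(k), cov)
```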
- Parameters
X (tensor) – An N x P torch.Tensor of covariates.
Y (tensor) – An N-dimensional torch.Tensor of continuous responses.
X_assumed (tensor) – An N x P' torch.Tensor of covariates that are always assumed to be part of the model. Defaults to None.
sigma_scale_factor (tensor) – An N-dimensional torch.Tensor of positive scale factors used to scale the standard deviation of the Normal likelihood for each datapoint. For example, specifying 2.0 for a particular datapoint results in \(\sigma \rightarrow 2 \times \sigma\). Defaults to None.
S – Controls the expected number of covariates to include in the model a priori. Defaults to 5.0. To specify covariate-level prior inclusion probabilities provide a P-dimensional torch.Tensor of the form (h_1, …, h_P). If a tuple of positive floats (alpha, beta) is provided, the a priori inclusion probability is a latent variable governed by the corresponding Beta prior so that the sparsity level is inferred from the data. Note that for a given choice of alpha and beta the expected number of covariates to include in the model a priori is given by \(\frac{\alpha}{\alpha + \beta} \times P\). Also note that the mean number of covariates in the posterior can vary significantly from prior expectations, since the posterior is in effect a compromise between the prior and the observed data.
prior (str) – One of the two supported priors for the coefficients: ‘isotropic’ or ‘gprior’. Defaults to ‘isotropic’.
include_intercept (bool) – Whether to include an intercept term. If included, the intercept term is included in all models so that the corresponding coefficient does not have a PIP. Defaults to True.
tau (float) – Controls the precision of the coefficients in the isotropic prior. Defaults to 0.01.
tau_intercept (float) – Controls the precision of the intercept in the isotropic prior. Defaults to 1.0e-4.
c (float) – Controls the precision of the coefficients in the gprior. Defaults to 100.0.
nu0 (float) – Controls the prior over the precision in the Normal likelihood. Defaults to 0.0.
lambda0 (float) – Controls the prior over the precision in the Normal likelihood. Defaults to 0.0.
explore (float) – This hyperparameter controls how greedy the MCMC algorithm is. Defaults to 5.0.
precompute_XX (bool) – Whether the matrix X^t @ X should be pre-computed. Defaults to False. Note that setting this to True may result in out-of-memory errors for sufficiently large covariate matrices.
verbose_constructor (bool) – Whether the class constructor should print some information to stdout upon initialization. Defaults to True.
xi_target (float) – This hyperparameter controls how often \(h\) MCMC updates are made if \(h\) is a latent variable. Defaults to 0.2.
subset_size (int) – If subset_size is not None, subset_size controls the amount of computational resources used in Subset wTGS; if subset_size is None, vanilla wTGS is used. This argument is intended for datasets with a very large number of covariates (e.g. tens of thousands or more). A typical value might be ~5-10% of the total number of covariates; smaller values result in more MCMC iterations per second but may lead to high-variance PIP estimates. Defaults to None.
anchor_size (int) – If subset_size is not None, anchor_size controls how greedy Subset wTGS is. If anchor_size is None, it defaults to half of subset_size. For expert users only. Defaults to None.
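The claim above that a Beta(alpha, beta) prior on \(h\) yields \(\frac{\alpha}{\alpha + \beta} \times P\) expected covariates a priori can be checked with a quick Monte Carlo estimate (hypothetical values of alpha, beta, and P):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta, P = 1.0, 9.0, 100            # hypothetical Beta hyperparameters
expected = alpha / (alpha + beta) * P     # closed form: 10 covariates a priori
# draw h ~ Beta(alpha, beta), then count how many gamma_p ~ Bernoulli(h) are 1
h = rng.beta(alpha, beta, size=100_000)
counts = rng.binomial(P, h)
```

The empirical mean of counts converges to the closed-form expectation as the number of Monte Carlo draws grows.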
CountLikelihoodSampler
class CountLikelihoodSampler(X, Y, X_assumed=None, TC=None, psi0=None, S=5.0, tau=0.01, tau_intercept=0.0001, explore=5.0, log_nu_rw_scale=0.05, omega_mh=True, xi_target=0.25, init_nu=5.0, verbose_constructor=True, subset_size=None, anchor_size=None)
MCMC algorithm for Bayesian variable selection for a generalized linear model with a Binomial or Negative Binomial likelihood (note that a Bernoulli likelihood is a special case of a Binomial likelihood). This class supports count-valued responses.
To define a Binomial model specify TC but not psi0. To define a Negative Binomial model specify psi0 but not TC.
The details of the available models in CountLikelihoodSampler are as follows. For both likelihoods the covariates and responses are defined as:\[X \in \mathbb{R}^{N \times P} \qquad \qquad Y \in \mathbb{Z}_{\ge 0}^{N}\]The inclusion of each covariate is governed by a Bernoulli random variable \(\gamma_p\). In particular \(\gamma_p = 0\) corresponds to exclusion and \(\gamma_p = 1\) corresponds to inclusion. The prior probability of inclusion is governed by \(h\) or alternatively \(S\):
\[h \in [0, 1] \qquad \rm{with} \qquad S \equiv hP\]Alternatively, if \(h\) is not known a priori we can put a prior on \(h\):
\[h \sim {\rm Beta}(\alpha, \beta) \qquad \rm{with} \qquad \alpha > 0 \;\;\;\; \beta > 0\]Putting this together, the model specification for the Binomial case is as follows:
\[ \begin{align}\begin{aligned}&\gamma_p \sim \rm{Bernoulli}(h) \qquad \rm{for} \qquad p=1,2,...,P\\&\beta_0 \sim \rm{Normal}(0, \tau_\rm{intercept}^{-1})\\&\beta_\gamma \sim \rm{Normal}(0, \tau^{-1} \mathbb{1}_\gamma)\\&Y_n \sim \rm{Binomial}(T_n, \sigma(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma))\end{aligned}\end{align} \]where \(\sigma(\cdot)\) is the logistic (sigmoid) function and \(T_n\) denotes the total count for the \(n^{\rm{th}}\) datapoint. That is, each Binomial likelihood is equivalent to \(T_n\) corresponding Bernoulli likelihoods.
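The Binomial generative process above can be sketched in a few lines of NumPy (hypothetical hyperparameter values; the sampler infers \(\gamma\) rather than simulating it). The logits are clipped purely for numerical stability in this illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 10
h, tau, tau_intercept = 0.2, 0.01, 1.0e-4

# gamma_p ~ Bernoulli(h); coefficients drawn from the isotropic prior
gamma = rng.binomial(1, h, size=P).astype(bool)
beta0 = rng.normal(0.0, np.sqrt(1.0 / tau_intercept))
beta_gamma = rng.normal(0.0, np.sqrt(1.0 / tau), size=int(gamma.sum()))
X = rng.normal(size=(N, P))
T = rng.integers(1, 20, size=N)                      # total counts T_n
logits = np.clip(beta0 + X[:, gamma] @ beta_gamma, -30.0, 30.0)
p = 1.0 / (1.0 + np.exp(-logits))                    # sigmoid link
# Y_n ~ Binomial(T_n, sigmoid(beta_0 + X_{n,gamma} . beta_gamma))
Y = rng.binomial(T, p)
```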
The Negative Binomial case is similar but includes a latent variable \(\nu > 0\) that governs the dispersion of the Negative Binomial distribution:
\[ \begin{align}\begin{aligned}&\log \nu \sim \rm{ImproperPrior}(-\infty, \infty)\\&Y_n \sim \rm{NegBinomial}(\rm{mean}=\rm{exp}(\beta_0 + X_{n, \gamma} \cdot \beta_\gamma + \psi_{0, n}), \nu)\end{aligned}\end{align} \]The vector \(\psi_0 \in \mathbb{R}^N\) allows the user to supply a datapoint-specific offset. We note that we use a parameterization of the Negative Binomial distribution where the variance is given by
\[\rm{variance} = \rm{mean} + \rm{mean}^2 / \nu\]so that small values of \(\nu\) correspond to large dispersion/variance and \(\nu \to \infty\) recovers the Poisson distribution.
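This mean/dispersion parameterization can be checked against NumPy's (n, p) parameterization of the Negative Binomial, using the correspondence \(n = \nu\) and \(p = \nu / (\nu + \rm{mean})\) (illustrative values below):

```python
import numpy as np

rng = np.random.default_rng(0)
mean, nu = 4.0, 2.0
# NumPy parameterizes the Negative Binomial by (n, p), with mean n*(1-p)/p;
# matching the mean/nu parameterization gives n = nu and p = nu / (nu + mean)
samples = rng.negative_binomial(nu, nu / (nu + mean), size=500_000)
empirical_mean = samples.mean()   # should be close to mean = 4.0
empirical_var = samples.var()     # should be close to mean + mean**2 / nu = 12.0
```

With the small dispersion \(\nu = 2\) above, the variance (12.0) is three times the mean, illustrating the overdispersion relative to a Poisson distribution.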
Note that above the dimension of \(\beta_\gamma\) depends on the number of covariates included in a particular model (i.e. on the number of non-zero entries in \(\gamma\)).
Usage of this class is only recommended for advanced users. For most users it should suffice to use one of BinomialLikelihoodVariableSelector, BernoulliLikelihoodVariableSelector, or NegativeBinomialLikelihoodVariableSelector.
- Parameters
X (tensor) – An N x P torch.Tensor of covariates. This is a required argument.
Y (tensor) – An N-dimensional torch.Tensor of non-negative count-valued responses. This is a required argument.
X_assumed (tensor) – An N x P' torch.Tensor of covariates that are always assumed to be part of the model. Defaults to None.
TC (tensor) – An N-dimensional torch.Tensor of non-negative total counts. This is a required argument if you wish to specify a Binomial model. Defaults to None.
psi0 (tensor) – An N-dimensional torch.Tensor of offsets psi0. This is a required argument if you wish to specify a Negative Binomial model. If the user specifies a float, psi0 will be expanded to an N-dimensional vector internally. Defaults to None.
S – Controls the expected number of covariates to include in the model a priori. Defaults to 5.0. To specify covariate-level prior inclusion probabilities provide a P-dimensional torch.Tensor of the form (h_1, …, h_P). If a tuple of positive floats (alpha, beta) is provided, the a priori inclusion probability is a latent variable governed by the corresponding Beta prior so that the sparsity level is inferred from the data. Note that for a given choice of alpha and beta the expected number of covariates to include in the model a priori is given by \(\frac{\alpha}{\alpha + \beta} \times P\). Also note that the mean number of covariates in the posterior can vary significantly from prior expectations, since the posterior is in effect a compromise between the prior and the observed data.
tau (float) – Controls the precision of the coefficients in the isotropic prior. Defaults to 0.01.
tau_intercept (float) – Controls the precision of the intercept in the isotropic prior. Defaults to 1.0e-4.
explore (float) – This hyperparameter controls how greedy the MCMC algorithm is. Defaults to 5.0.
log_nu_rw_scale (float) – This hyperparameter controls the proposal distribution for nu updates. Defaults to 0.05. Only applicable to the Negative Binomial case.
omega_mh (bool) – Whether to include Metropolis-Hastings corrections during Polya-Gamma updates. Defaults to True. Only applicable to the Binomial case.
xi_target (float) – This hyperparameter controls how frequently the MCMC algorithm makes Polya-Gamma updates. It also controls how often \(h\) updates are made if \(h\) is a latent variable. Defaults to 0.25.
init_nu (float) – This hyperparameter controls the initial value of the dispersion parameter nu. Defaults to 5.0. Only applicable to the Negative Binomial case.
verbose_constructor (bool) – Whether the class constructor should print some information to stdout upon initialization. Defaults to True.
subset_size (int) – If subset_size is not None, subset_size controls the amount of computational resources used in Subset PG-wTGS; if subset_size is None, vanilla PG-wTGS is used. This argument is intended for datasets with a very large number of covariates (e.g. tens of thousands or more). A typical value might be ~5-10% of the total number of covariates; smaller values result in more MCMC iterations per second but may lead to high-variance PIP estimates. Defaults to None.
anchor_size (int) – If subset_size is not None, anchor_size controls how greedy Subset PG-wTGS is. If anchor_size is None, it defaults to half of subset_size. For expert users only. Defaults to None.
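As a rough illustration of how log_nu_rw_scale and init_nu interact, the parameter names suggest a Gaussian random-walk proposal in \(\log \nu\) space. This is an assumption for illustration only; the authoritative update (including the Metropolis-Hastings accept/reject step) lives in the sampler source:

```python
import numpy as np

rng = np.random.default_rng(0)
init_nu, log_nu_rw_scale = 5.0, 0.05     # the constructor defaults
# hypothetical sketch of one dispersion proposal: a Gaussian random walk on
# log(nu); the actual accept/reject logic is inside CountLikelihoodSampler
log_nu = np.log(init_nu)
nu_proposed = np.exp(log_nu + log_nu_rw_scale * rng.normal())
```

Working in log space keeps every proposed \(\nu\) strictly positive, and a small scale like 0.05 keeps proposals close to the current value.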