This package provides the boilerplate code for the Correlated Pseudo-Marginal method of (Deligiannidis, Doucet, and Pitt 2015). This vignette demonstrates its use by providing two examples that are closely related to ones given in their paper.

Assume that you are a Bayesian and have some data \(y\), and that you would like to perform inference for a collection of parameters \(\theta\). After specifying a likelihood \(p(y \mid \theta)\) and a prior \(p(\theta)\), you are interested in sampling from the posterior \(p(\theta \mid y)\) using a Markov chain Monte Carlo (MCMC) algorithm. Such an algorithm will draw correlated samples \(\theta^1, \theta^2, \ldots, \theta^{N_{S}}\) from a Markov chain, and under suitable regularity conditions, the average of your large collection of samples will approximate posterior expectations such as the posterior mean: \(E[\theta \mid y]\).

This package implements Metropolis-Hastings (MH) type algorithms, a specific sub-class of MCMC methods. An MH algorithm moves from iteration to iteration by following a two-step procedure. Step one *proposes* a new value for the chain. Step two decides whether or not to *accept* this proposal. In the event that the proposal is rejected, the chain stays in the same place. Other MCMC methods, such as Gibbs sampling, produce samples that cannot stay in the same place.
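To make the two steps concrete, here is a minimal, self-contained sketch of a random walk Metropolis algorithm for a toy target (a standard normal). The names `logTarget` and `metropolisHastings` are made up for illustration and are not part of this package:

```
# toy target: log-density of a standard normal
logTarget <- function(theta) dnorm(theta, log = TRUE)

metropolisHastings <- function(numIters, init, propSd = 1){
  samples <- numeric(numIters)
  current <- init
  for(i in seq_len(numIters)){
    proposal <- rnorm(1, mean = current, sd = propSd)   # step 1: propose
    logRatio <- logTarget(proposal) - logTarget(current) # symmetric proposal cancels
    if(log(runif(1)) < logRatio){                        # step 2: accept or reject
      current <- proposal                                # accept: the chain moves
    }                                                    # reject: the chain stays put
    samples[i] <- current
  }
  samples
}

set.seed(1)
draws <- metropolisHastings(5000, init = 0)
mean(draws) # should be near 0, the target's mean
```

Because the random walk proposal is symmetric, the proposal densities cancel in the acceptance ratio; the general MH ratio would include them, as `makeCPMSampler()`'s inputs do.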

This software is more specifically tailored to an implementation of the Correlated Pseudo-Marginal (CPM) sampler. CPM is an extension of the Pseudo-Marginal Metropolis-Hastings (PMMH) algorithm (Andrieu and Roberts 2009), which is itself an extension of the Metropolis-Hastings (MH) algorithm. CPM and PMMH are useful for sampling from the posterior of a model with an *intractable likelihood*: while the original MH algorithm requires the ability to evaluate the model's likelihood, CPM and PMMH only require unbiased approximations of it.

This package provides a function, `makeCPMSampler()`, that implements all three of these algorithms (although it is most suited to the PMMH and CPM algorithms). It abstracts away many implementation details, so, mathematically speaking, only a small degree of familiarity with the algorithms is required to use this software.

Programmatically speaking, `makeCPMSampler()` is a *function factory*. This means that it is a function that creates and returns another function. The returned function will be the function that generates parameter samples.
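For readers unfamiliar with the pattern, here is a tiny function factory unrelated to this package (`makePowerFun` is a made-up name for the demonstration):

```
# a function factory: calling it returns a brand-new function
makePowerFun <- function(exponent){
  function(x) x^exponent # the returned function "remembers" `exponent`
}

square <- makePowerFun(2)
cube   <- makePowerFun(3)
square(4) # 16
cube(2)   # 8
```

`makeCPMSampler()` works the same way: you call it once with your model's ingredients, and it hands back a sampler function.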

Furthermore, many of `makeCPMSampler()`'s required inputs are functions, as well. Here is a description of all the inputs it takes.

Any MH algorithm (e.g. MH, PMMH, CPM) requires that you are able to sample from a proposal distribution and to evaluate its density. You provide these to `makeCPMSampler()` in the first two inputs: `paramKernSamp=` and `logParamKernEval=`.

You will also need to provide a function that evaluates the *logarithm* of the prior distribution. This is the third argument: `logPriorEval=`.

Unlike the standard Metropolis-Hastings algorithm, you are not required to provide a function that evaluates the logarithm of the likelihood. Recall from earlier that PMMH and CPM only require an unbiased estimate of the likelihood. The logarithm of an unbiased approximation is provided as the fourth argument: `logLikeApproxEval=`. If you'd like, you may provide for this argument a function giving *exact* evaluations. In this case, `makeCPMSampler()` will return a standard MH sampler. This can be useful, for instance, if you are comparing the relative efficiency of pseudo-marginal and non-pseudo-marginal methods.

Finally, you must provide the non-functional inputs. `yData` is the entire data set of observations. `numU` is the number of standard univariate normal samples used for each approximate log-likelihood evaluation. `numIters` is the total number of samples drawn; this does not count burn-in or thinning. `rho` is the correlation between the standard normal variates used at successive MCMC iterations. Finally, change `storeEvery` to some integer greater than \(1\) if you are concerned about memory on your computer and would like to use thinning. If you set it to \(2\), say, then every other sample will be retained.

The first example is the same as the one provided in this package's documentation. This section breaks up the code into chunks in order to explain it more easily. The model has two parameters: \(\theta = (\theta_1, \theta_2)\). The conditional likelihood is

\[ \begin{eqnarray} p(y_{1:N} \mid x_{1:N}, \theta) = \prod_{i=1}^N p(y_{i} \mid x_{i}, \theta) \tag{1} \end{eqnarray} \]

where \[ \begin{eqnarray} y_{i} \mid x_{i}, \theta \sim \text{Normal}(x_i, \theta_1 - \theta_2) \end{eqnarray} \] for \(i=1,\ldots,N\). Here \(x_1, \ldots, x_N := x_{1:N}\) is a collection of latent/hidden/unobserved random variables. They have the following distribution:

\[ \begin{eqnarray} p(x_{1:N} \mid \theta) = \prod_{i=1}^N p( x_{i} \mid \theta) \tag{2} \end{eqnarray} \]

where

\[ \begin{eqnarray} x_{i} \mid \theta \sim \text{Normal}(0, \theta_2) \end{eqnarray} \]

for \(i=1,\ldots,N\).

We can simulate \(N=10\) observations from the model with the following code.

```
realTheta1 <- .2 + .3
realTheta2 <- .2
realParams <- c(realTheta1, realTheta2)
numObs <- 10
realX <- rnorm(numObs, mean = 0, sd = sqrt(realTheta2))
realY <- rnorm(numObs, mean = realX, sd = sqrt(realTheta1 - realTheta2))
```

Next, construct the sampler function `sampler` using `makeCPMSampler()`. We assume a uniform prior over the region \(50 > \theta_1 > \theta_2 > 0\). A simple multivariate normal is used for the proposal distribution.

```
library(cPseudoMaRg)
numImportanceSamps <- 1000
numMCMCIters <- 1000
randomWalkScale <- 1.5
recordEveryTh <- 1

# create the function that performs sampling
sampler <- makeCPMSampler(
  paramKernSamp = function(params){
    params + rnorm(2)*randomWalkScale
  },
  logParamKernEval = function(oldTheta, newTheta){
    dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) +
      dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
  },
  logPriorEval = function(theta){
    if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
      -7.130899 # - log of 50^2/2
    }else{
      -Inf
    }
  },
  logLikeApproxEval = function(y, thetaProposal, uProposal){
    if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
      xSamps <- uProposal*sqrt(thetaProposal[2])
      logCondLikes <- sapply(xSamps,
                             function(xsamp) {
                               sum(dnorm(y,
                                         xsamp,
                                         sqrt(thetaProposal[1] - thetaProposal[2]),
                                         log = TRUE))
                             })
      m <- max(logCondLikes)
      log(sum(exp(logCondLikes - m))) + m - log(length(xSamps))
    }else{
      -Inf
    }
  },
  realY,
  numImportanceSamps,
  numMCMCIters,
  .99, # change to 0 for original pseudo-marginal method
  recordEveryTh)
```

The approximate log-likelihood, provided to the `logLikeApproxEval=` parameter, uses importance sampling: \[
\begin{align*}
\log \hat{p}(y_{1:N} \mid \theta)
&= \log \left\{
N_X^{-1} \sum_{i=1}^{N_X} \frac{p(y_{1:N} \mid X^i_{1:N}, \theta) p(X^i_{1:N} \mid \theta)}{q(X^i_{1:N} \mid y_{1:N}, \theta)}
\right\}
\end{align*}
\] where \(X^1_{1:N}, \ldots, X^{N_X}_{1:N} \overset{\text{iid}}{\sim} q( \cdot \mid y_{1:N}, \theta)\).

Unlike many importance sampling implementations that call pseudo-random number generating functions (e.g. `rnorm`) directly, this one calculates the approximation from the standard normal samples generated within `makeCPMSampler()`. We choose \(q(x_{1:N} \mid y_{1:N}, \theta) = p(x_{1:N} \mid \theta)\), despite it not being the optimal proposal/instrumental distribution. Note the use of the “log-sum-exp” trick to avoid numerical underflow.
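The log-sum-exp step deserves a closer look. Here is a small, self-contained illustration (the `logCondLikes` values are made up for the demonstration) of why exponentiating very negative log-likelihoods directly fails:

```
# made-up log conditional likelihoods, far too negative to exponentiate directly
logCondLikes <- c(-1000, -1001, -1002)

# naive average: exp() underflows to 0, so the log is -Inf
naive <- log(mean(exp(logCondLikes)))

# log-sum-exp: subtract the maximum before exponentiating, then add it back
m <- max(logCondLikes)
stable <- log(sum(exp(logCondLikes - m))) + m - log(length(logCondLikes))

naive  # -Inf
stable # finite, about -1000.69
```

Subtracting the maximum guarantees the largest term inside `exp()` is exactly \(0\), so at least one summand is representable in double precision.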

Finally, sample the Markov chain. Call the samples `res`, and examine them.

```
res <- sampler(realParams)
print(res)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 1.772459 1.357395
#> acceptance rate: 0.076
plot(res)
```

You might have noticed that this particular model has a tractable likelihood, so it is more efficient to use regular MH. The likelihood is equal to the following

\[ \begin{align*} p(y_{1:N} \mid \theta) &= \int p(y_{1:N} \mid x_{1:N}, \theta) p(x_{1:N} \mid \theta) \text{d}x_{1:N} \\ &= \prod_{i=1}^N \text{Normal}(y_i ; 0, \theta_1) \end{align*} \]

```
samplerExact <- makeCPMSampler(
  paramKernSamp = function(params){
    return(params + rnorm(2)*randomWalkScale)
  },
  logParamKernEval = function(oldTheta, newTheta){
    dnorm(newTheta[1], oldTheta[1], sd = randomWalkScale, log = TRUE) +
      dnorm(newTheta[2], oldTheta[2], sd = randomWalkScale, log = TRUE)
  },
  logPriorEval = function(theta){
    if( (50 > theta[1]) & (theta[1] > theta[2]) & (theta[2] > 0) ){
      -7.130899 # - log of 50^2/2
    }else{
      -Inf
    }
  },
  logLikeApproxEval = function(y, thetaProposal, uProposal){
    # this is exact now!
    if( (50 > thetaProposal[1]) & (thetaProposal[1] > thetaProposal[2]) & (thetaProposal[2] > 0) ){
      sum(dnorm(y, mean = 0, sd = sqrt(thetaProposal[1]), log = TRUE))
    }else{
      -Inf
    }
  },
  realY,
  numImportanceSamps, # this doesn't matter because Us are not used
  numMCMCIters,
  .99, # this doesn't matter because Us are not used
  recordEveryTh)
res2 <- samplerExact(realParams)
print(res2)
#> CPM Results: at a glance...
#> 1000 iterations performed
#> 1000 samples retained
#> posterior means (in the supplied parameterization): 0.5457526 0.2342012
#> acceptance rate: 0.034
plot(res2)
```

Consider the following stochastic volatility model (Taylor 1982), which is simpler than the one explored in (Deligiannidis, Doucet, and Pitt 2015). It assumes a latent AR(1) process, and that, after conditioning on contemporaneous values of the latent process, the observed returns are normal.

In other words, let \(Z_1, \ldots, Z_N\) and \(W_1, \ldots, W_N\) be iid standard normal random variables, and for \(t > 1\), define the latent process as

\[ X_t = \phi X_{t-1} + \sigma Z_t, \] and \(X_1 \sim \text{Normal}(0, \sigma^2/(1-\phi^2))\). The distribution of the observed returns is defined by \[ Y_t = \beta \exp(X_t/2) W_t \] for \(t = 1, \ldots, N\). The collection of all unknown parameters is \(\theta = (\phi, \beta, \sigma^2)\).

\(Y_{1:N} \mid X_{1:N}, \theta\) has the same factorization as equation (1), but unlike equation (2), we have the following Markovian factorization for the distribution of all latent variables:

\[ \begin{eqnarray} p(x_{1:N} \mid \theta) = p(x_1 \mid \theta) \prod_{t=2}^N p( x_{t} \mid x_{t-1}, \theta) \tag{3}. \end{eqnarray} \] Assume that all three parameters are independent a priori: \(\pi(\phi)\pi(\beta)\pi(\sigma^2)\), and select a uniform, Gaussian, and Inverse-Gamma distribution for these three distributions, respectively. For the proposal distribution, we use a multivariate normal in a transformed parameter space: \(\phi\) is transformed with \(\text{logit}((\phi+1)/2)\), \(\beta\) is transformed with the identity map, and \(\sigma^2\) is transformed with the natural logarithm.
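As a sanity check on the \(\phi\) transformation, the following self-contained snippet (using base R's `qlogis`/`plogis` rather than any package code; the function names are made up) confirms that the map is invertible on \((-1, 1)\), and that the log-Jacobian of the inverse map is \(\log 2 + u - 2\log(1 + e^u)\) — the expression whose leading constant is \(\log 2 \approx 0.6931472\):

```
toUnconstrained <- function(phi) qlogis(.5*(phi + 1)) # logit((phi + 1)/2)
toConstrained   <- function(u) 2*plogis(u) - 1        # inverse map

phi <- .9
u <- toUnconstrained(phi)
toConstrained(u) # recovers .9

# log-Jacobian of the inverse map, analytically and by finite differences
logJac <- log(2) + u - 2*log(1 + exp(u))
numJac <- log((toConstrained(u + 1e-6) - toConstrained(u))/1e-6)
c(logJac, numJac) # agree to several decimal places
```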

The primary bottleneck in the previous example was the function passed to the `logLikeApproxEval=` input of `makeCPMSampler()`. It was written entirely in R, and so was not as fast as it could have been. In this example, we provide a compiled function for this input that makes use of C++ code taken from the PF C++ library (Brown 2020). A growing collection of examples of calling PF code from R is provided as an R package called `pfexamplesinr` that is hosted on GitHub. Interfacing to C++ from R was facilitated by the RcppEigen package (Bates and Eddelbuettel 2013).

The particle filter used to compute the approximate log-likelihood is described in (Deligiannidis, Doucet, and Pitt 2015). Like the example above, it calculates the approximation using common random normal variates generated from within `makeCPMSampler()`. As detailed in (Deligiannidis, Doucet, and Pitt 2015), the particle filter that calculates this number uses a resampling technique that makes use of the pseudo-inverse of a Hilbert space-filling curve. This curve was implemented with C++ code lightly edited from the C code provided in (Skilling 2004).

Since version 1.0.1, this package provides an extra input to `makeCPMSampler()` called `nansInLLFatal=`. If set to `FALSE`, then whenever `NaN` is returned from the approximate log-likelihood function, the parameter proposal is disregarded. Alternatively, if it is set to `TRUE`, then the entire chain comes to a halt, and all the samples obtained up until that iteration are disregarded. When using particle filters to produce approximations to the likelihood, I recommend setting this to `FALSE`, especially when the proposal distribution has a high amount of dispersion. This is because, despite one's best efforts to guard against it, particle filters can frequently generate `NaN`s.
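The effect of disregarding a `NaN`-producing proposal can be pictured with a small wrapper. This is only a sketch, and `rawLogLike` is a made-up stand-in for a particle filter that occasionally returns `NaN`: mapping `NaN` to `-Inf` forces the MH acceptance probability to zero, so the offending proposal is simply rejected.

```
guardAgainstNaN <- function(logLikeFun){
  function(...){
    val <- logLikeFun(...)
    if(is.nan(val)) -Inf else val # a -Inf log-likelihood is always rejected
  }
}

# stand-in log-likelihood that fails half the time
rawLogLike <- function(y) if(runif(1) < .5) NaN else sum(dnorm(y, log = TRUE))
safeLogLike <- guardAgainstNaN(rawLogLike)

set.seed(42)
vals <- replicate(20, safeLogLike(c(0, 1)))
any(is.nan(vals)) # FALSE: only finite values or -Inf remain
```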

Please also note that the number of particles used in this particle filter is selected in two places: one is the code below, and the other is a C++ `#define` directive in the file `src/likelihoods.cpp` in the `pfexamplesinr` package. If you are interested in increasing or decreasing the number of particles, fork `pfexamplesinr`, edit your copy of the files, and run the following example after replacing the appropriate line with `devtools::install_github("<your Github username>/pfexamplesinr")`.

```
library(cPseudoMaRg)
devtools::install_github("tbrown122387/pfexamplesinr@e4e2a80")
library(pfexamplesinr)

returnsData <- read.csv("data/return_data.csv", header=F)[,1]
numParticles <- 500 # THIS MUST MATCH "#define NP 500" in src/likelihoods.cpp
numMCMCIters <- 500
randomWalkScale <- .1
recordEveryTh <- 1
numUs <- length(returnsData)*(numParticles+1)

# some helper functions
transformParams <- function(untrans){
  p <- vector(mode = "numeric", length = 3)
  p[1] <- boot::logit(.5*(untrans[1] + 1))
  p[2] <- untrans[2]
  p[3] <- log(untrans[3])
  return(p)
}
revTransformParams <- function(trans){
  p <- vector(mode = "numeric", length = 3)
  p[1] <- 2*boot::inv.logit( trans[1] )-1
  p[2] <- trans[2]
  p[3] <- exp(trans[3])
  return(p)
}

sampler <- makeCPMSampler(
  paramKernSamp = function(params){
    revTransformParams(transformParams(params) + rnorm(3)*randomWalkScale)
  },
  logParamKernEval = function(oldTheta, newTheta){
    unconstrainedNew <- transformParams(newTheta)
    unconstrainedOld <- transformParams(oldTheta)
    dnorm(unconstrainedNew[1], unconstrainedOld[1], sd = randomWalkScale, log = TRUE) + # phi
      0.6931472 + unconstrainedNew[1] - 2*log(1 + exp(unconstrainedNew[1])) + # phi jacobian
      dnorm(unconstrainedNew[2], unconstrainedOld[2], sd = randomWalkScale, log = TRUE) + # beta
      dnorm(unconstrainedNew[3], unconstrainedOld[3], sd = randomWalkScale, log = TRUE) - # sigmaSquared
      unconstrainedNew[3] # sigmaSquared jacobian
  },
  logPriorEval = function(theta){
    if( (abs(theta[1]) >= 1.0) || theta[3] <= 0.0 ){
      -Inf
    }else{
      log(.5) +
        dnorm(theta[2], mean = 0, sd = 10, log = T) +
        dgamma(x = 1/theta[3], shape = 1.3, rate = .3, log = T)
    }
  },
  logLikeApproxEval = svolApproxLL, # C++ function from pfexamplesinr
  returnsData, numUs, numMCMCIters,
  .99, recordEveryTh, FALSE)
```

```
svolSampleResults <- sampler(c(.9, 1, .1))
mean(svolSampleResults)
print(svolSampleResults)
```

Andrieu, Christophe, and Gareth O. Roberts. 2009. “The Pseudo-Marginal Approach for Efficient Monte Carlo Computations.” *The Annals of Statistics* 37 (2): 697–725. http://www.jstor.org/stable/30243645.

Bates, Douglas, and Dirk Eddelbuettel. 2013. “Fast and Elegant Numerical Linear Algebra Using the RcppEigen Package.” *Journal of Statistical Software* 52 (5): 1–24. https://www.jstatsoft.org/v52/i05/.

Brown, Taylor R. 2020. “A Short Introduction to PF: A c++ Library for Particle Filtering.” *Journal of Open Source Software* 5 (54): 2599. https://doi.org/10.21105/joss.02599.

Deligiannidis, George, Arnaud Doucet, and Michael K. Pitt. 2015. “The Correlated Pseudo-Marginal Method.”

Skilling, John. 2004. “Programming the Hilbert curve.” In *Bayesian Inference and Maximum Entropy Methods in Science and Engineering*, edited by Gary J. Erickson and Yuxiang Zhai, 707:381–87. American Institute of Physics Conference Series. https://doi.org/10.1063/1.1751381.

Taylor, S. J. 1982. “Financial Returns Modelled by the Product of Two Stochastic Processes-a Study of the Daily Sugar Prices 1961-75.” *Time Series Analysis : Theory and Practice* 1: 203–26. https://ci.nii.ac.jp/naid/10018822959/en/.