So last time I talked about rejection sampling for generating random samples from a specific distribution, starting from another distribution. We talked about what made it suck, and really, if you are going to do rejection sampling, you might as well head straight to Metropolis-Hastings; it's basically an improved version with no real downsides compared to rejection sampling.
So the Metropolis-Hastings algorithm is another Monte Carlo method for generating samples, and really, it's basically rejection sampling with a minor change. So I'll go over the gist of it, then point out the change, and why Metropolis-Hastings even works.
The algorithm is as follows.
Given a target PDF f(x) and a SYMMETRIC proposal density g(x | x_t) (symmetric meaning g(x | y) = g(y | x)) that we are drawing samples from, where g is conditioned on the previous sample x_t we had accepted,
1. Get a sample x from g(x | x_t), and a sample u from a uniform random variable between 0 and 1, i.e. u ~ Uniform(0, 1), which we MIGHT use to decide whether to keep or reject the new sample.
2. Check if we keep the sample; if f(x) ≥ f(x_t), such that the PDF of the new sample is at least as large as that of the previous one, we keep our new sample straight up. Otherwise, we need to do a check against our uniformly distributed value; if our uniformly distributed random sample u ≤ f(x) / f(x_t), then we keep the new sample x. Otherwise, we reject x.
3. If we kept the new sample, x_t is replaced with the x we just got. Go back to step 1 and keep doing it until we have enough values.
A common example of g is a Gaussian distribution with a mean of x_t. Then the resulting process will be something of a random walk, which very nicely lines up with the Markov chain proof of correctness for this algorithm. I'm not going to prove it's right, but there certainly is a proof; this process is pretty old.
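The steps above can be sketched in a few lines. This is a minimal sketch, not my earlier function; the Gaussian random-walk proposal and the function name are my own choices here, and the target only needs to be known up to a constant:

```python
import math
import random

def metropolis_hastings(target_pdf, x0, n_samples, step=1.0):
    """Draw n_samples from target_pdf (known up to a constant) using a
    symmetric Gaussian random-walk proposal centered on the last state."""
    samples = []
    x_t = x0  # previous accepted sample; start where target_pdf > 0
    for _ in range(n_samples):
        # 1. Propose x from g(x | x_t), a Gaussian centered at x_t (symmetric).
        x = random.gauss(x_t, step)
        u = random.random()
        # 2. Keep x outright if f(x) >= f(x_t); otherwise keep it only
        #    if u <= f(x) / f(x_t).
        if target_pdf(x) >= target_pdf(x_t) or u <= target_pdf(x) / target_pdf(x_t):
            x_t = x  # 3. The accepted sample becomes the new state.
        # On rejection the old state is recorded again, which is what
        # makes the chain's stationary distribution come out right.
        samples.append(x_t)
    return samples
```

For instance, feeding it an unnormalized standard normal, `lambda x: math.exp(-x * x / 2)`, produces draws whose mean and spread settle toward 0 and 1.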
So what’s the big deal? We changed our criterion for keeping samples. On inspection, you’ll note that we don’t need a dumb scaling constant anymore, and assuming we choose nice source and target distributions to work with, our acceptance rates will be much nicer than before.
But you might think, hey, if we accept more stuff like this, how can we guarantee Metropolis-Hastings even gives us samples pertaining to our target distribution?
In truth, the Metropolis-Hastings algorithm is loosely constructing a Markov chain system like the ones we’ve suggested before. What we are doing is moving from state to state as we accept samples, and our Markov chain is constructed in such a way that its stationary distribution will be the target distribution! Of course, for a continuous target PDF this is not intuitive to see, but working with a discrete PMF it’s easy to check. Take something trivial like simulating a fair 6-sided die by choosing a conditional such that each roll is biased (evenly, like the PMF is simply being shifted over a single value at a time from 1 to 6) more towards the number last rolled; it’s very possible to construct the transition matrix and see that the end stationary distribution of the system will be that of a fair die. An example would be a transition matrix like this…
Even though the die is a biased turd, if we generated dice rolls over time with this transition matrix, the stationary distribution in the end would be a nice uniform fair die.
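As a quick sketch of that claim, here is one symmetric "sticky die" matrix fitting the description (the exact probabilities are my own choice: each face repeats with probability 1/2 and moves to an adjacent face, wrapping 6 back to 1, with probability 1/4 each way), plus a power iteration to watch it settle:

```python
# Symmetric 6x6 transition matrix: biased toward the last rolled face.
P = [[0.5 if i == j else
      0.25 if (i - j) % 6 in (1, 5) else
      0.0
      for j in range(6)] for i in range(6)]

# Power iteration: start from a point mass on face 1 and apply P repeatedly.
dist = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(200):
    dist = [sum(dist[i] * P[i][j] for i in range(6)) for j in range(6)]

print([round(p, 4) for p in dist])  # every face approaches 1/6
```

Because the matrix is symmetric, it's doubly stochastic, and the uniform distribution is its stationary distribution, so each entry converges to 1/6 no matter where the die starts.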
So what are the downsides to this? Well, samples tend to be correlated with their neighboring samples, so if you want uncorrelated samples, you sort of have to give the samples a good shuffle. Additionally, the first bunch of samples generated clearly won’t be distributed as we’d want because the Markov chain hasn’t had time to reach its stationary distribution, so expect to have a burn-in period of running the algorithm, throwing things away, and then getting samples to use.
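In practice both fixes are just slicing. As a sketch (the burn-in and thinning amounts here are made-up defaults; tune them for your chain):

```python
def post_process(chain, burn_in=1000, thin=10):
    """Drop the burn-in prefix, then keep every thin-th sample to
    cut down on correlation between neighboring draws."""
    return chain[burn_in::thin]

# e.g. 5000 raw draws -> 400 usable, less-correlated samples
usable = post_process(list(range(5000)))
```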
Hell, I’m lazy, you can fix my previous function to work with Metropolis-Hastings if you really want, the change is trivial and my previous function wasn’t great anyways. I wrote it in maybe a couple minutes.
Next time, if I get the chance, I will discuss Gibbs sampling and wrap up what I want to say on generating samples of a target distribution.