An Intuitive Guide to MCMC (Part 1): The Metropolis-Hastings Algorithm
Towards Data Science
💼 Business
#mcmc
#metropolis-hastings
#tip
#markov-chain
#monte-carlo
#probability-distribution
Original source: Towards Data Science · Summarized and analyzed by Genesis Park
Summary
This article introduces Markov Chain Monte Carlo (MCMC) through its foundational algorithm, Metropolis-Hastings. It explains why sampling directly from a known density is hard, derives the acceptance rule from the detailed balance condition, and demonstrates the algorithm in Python on a 1-D Gaussian and a 2-D 'volcano' distribution. Source: Towards Data Science.
Body
Despite the intimidating acronym, Markov Chain Monte Carlo is a combination of two straightforward concepts:

- A Markov Chain is a stochastic process where the next state of the system depends entirely on its current state and not on the sequence of events that preceded it. This property is usually referred to as memorylessness.
- A Monte Carlo method simply refers to any algorithm that relies on repeated random sampling to obtain numerical results.

In this series, we will present the core algorithms used in MCMC frameworks, focusing primarily on those used for Bayesian methods. We begin with Metropolis-Hastings: the foundational algorithm that enabled the field's earliest breakthroughs. But before diving into the mechanics, let's discuss the problem MCMC methods help solve.

The Problem

Suppose we want to sample values from a probability distribution whose density formula we know. In this example we use the standard normal distribution. Let's call the function that samples from it `rnorm`. For `rnorm` to be considered helpful, it must generate values \(x\) that, over the long run, match the probabilities of our target distribution. In other words, if we were to run `rnorm` \(100{,}000\) times, collect these values and plot them by the frequency with which they appeared (a histogram), the shape would resemble the standard normal distribution. How can we achieve this?

We start with the formula for the unnormalised density of the normal distribution:

\[p(x) = e^{-\frac{x^2}{2}}\]

This function returns a density for a given \(x\), not a probability. To get probabilities, we need to normalise our density function by a constant, so that the total area under the curve integrates (sums) to \(1\). To find this constant we need to integrate the density function across all possible values of \(x\):

\[\int^\infty_{-\infty}e^{-\frac{x^2}{2}}\,dx\]

There is no closed-form solution to the indefinite integral of \(e^{-x^2}\). However, mathematicians solved the definite integral (from \(-\infty\) to \(\infty\)) by moving to polar coordinates (because, apparently, turning a \(1D\) problem into a \(2D\) one makes it easier to solve). Squaring the integral and substituting \(x = r\cos\theta\), \(y = r\sin\theta\), \(dx\,dy = r\,dr\,d\theta\):

\[\left(\int^\infty_{-\infty}e^{-\frac{x^2}{2}}\,dx\right)^2 = \int^\infty_{-\infty}\int^\infty_{-\infty}e^{-\frac{x^2+y^2}{2}}\,dx\,dy = \int_0^{2\pi}\!\int_0^\infty e^{-\frac{r^2}{2}}\,r\,dr\,d\theta = 2\pi\]

so the total area is \(\sqrt{2\pi}\). Therefore, to make the area under the curve sum to \(1\), the constant must be the inverse:

\[C = \frac{1}{\sqrt{2\pi}}\]

This is where the well-known normalisation constant \(C\) for the normal distribution comes from. OK great, we have the mathematician-given constant that makes our distribution a valid probability distribution. But we still need to be able to sample from it.

Since our scale is continuous and infinite, the probability of getting exactly a specific number (e.g. \(x = 1.2345\ldots\) to infinite precision) is actually zero. This is because a single point has no width, and therefore contains no 'area' under the curve. Instead, we must speak in terms of ranges, i.e. what is the probability of getting a value \(x\) that falls between \(a\) and \(b\) (\(a < b\))? That probability is the area under the curve between \(a\) and \(b\), and to sample directly from it we would need to invert the cumulative distribution function, which is rarely available in closed form. Metropolis-Hastings sidesteps this entirely: it only ever needs to evaluate the unnormalised density. The algorithm builds a Markov chain as follows. Given the current state \(x\), each iteration:

1. Draws a candidate \(x'\) from a proposal distribution \(q(x'|x)\), for example a normal distribution centred at \(x\).
2. Computes the acceptance ratio \[R = \frac{p(x')\,q(x|x')}{p(x)\,q(x'|x)}.\] The unknown normalisation constant cancels in this ratio, which is why the unnormalised density is enough.
3. Accepts the candidate with probability \(a(x',x) = \min(1, R)\); otherwise the chain stays at \(x\).

For the chain's long-run frequencies to match the target, the transition rule must satisfy detailed balance,

\[p(x)\,q(x'|x)\,a(x',x) = p(x')\,q(x|x')\,a(x,x'),\]

which rearranges to

\[\frac{a(x',x)}{a(x,x')} = R.\]

We can check the two cases. For \(R \le 1\):

- our forward acceptance is \(a(x',x) = \min(1,R) = R\)
- our reverse acceptance is \(a(x,x') = \min(1,\frac{1}{R}) = 1\)

Thus:

\[\frac{a(x',x)}{a(x,x')} = \frac{R}{1} = R.\]

For \(R > 1\), the forward acceptance is \(1\) and the reverse acceptance is \(\frac{1}{R}\), so the ratio is again \(R\), and the equality is satisfied in both cases.

Implementation

Let's implement the MH algorithm in Python on two example target distributions.
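Here is a minimal sketch of a random-walk Metropolis sampler following the steps above; the function name `metropolis_hastings` and its parameters are illustrative choices, not the article's original code:

```python
import numpy as np

def metropolis_hastings(log_density, x0, n_iterations, proposal_scale=1.0, seed=0):
    """Random-walk Metropolis sampler.

    log_density: returns the log of the (unnormalised) target density.
    x0: starting state (scalar or 1-D array).
    Returns an array of states, one row per iteration.
    """
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    samples = np.empty((n_iterations, x.size))
    log_p_x = log_density(x)
    for i in range(n_iterations):
        # Propose a new state from a symmetric Gaussian centred at x.
        # With a symmetric proposal, q(x|x') = q(x'|x), so the
        # acceptance ratio reduces to R = p(x') / p(x).
        x_new = x + proposal_scale * rng.standard_normal(x.size)
        log_p_new = log_density(x_new)
        # Accept with probability min(1, R); comparing in log space
        # avoids overflow and underflow for extreme densities.
        if np.log(rng.uniform()) < log_p_new - log_p_x:
            x, log_p_x = x_new, log_p_new
        samples[i] = x
    return samples

# Target: unnormalised standard normal, log p(x) = -x^2 / 2.
# The constant 1/sqrt(2*pi) cancels in R, so we can omit it.
samples = metropolis_hastings(lambda x: -0.5 * np.sum(x**2),
                              x0=0.0, n_iterations=100_000)
print(samples.mean(), samples.std())  # should approach 0 and 1
```

Because the Gaussian proposal is symmetric, the \(q\) terms cancel and \(R\) reduces to \(p(x')/p(x)\); working with log densities keeps the computation stable when densities are vanishingly small.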
I. Estimating a Gaussian Distribution

Running the sampler on the unnormalised density \(p(x) = e^{-\frac{x^2}{2}}\) and plotting the samples against the true normal distribution, the histogram traces out the familiar bell curve. Now you might be wondering why we bothered running an MCMC method for something we can do using `np.random.normal(n_iterations)`. That is a very valid point! In fact, for a 1-dimensional Gaussian, the inverse-transform solution (using trigonometry) is much more efficient and is what NumPy actually uses. But at least we know that our code works! Now, let's try something more interesting.

II. Estimating the 'Volcano' Distribution

Let's try to sample from a much less 'standard' distribution, constructed in two dimensions, with the third dimension representing the distribution's density. Since the sampling happens in \(2D\) space (the algorithm only knows its x-y location, not the 'slope' of the volcano), we get a pretty ring around the mouth of the volcano; a code sketch of a stand-in for this target appears after the summary table below.

Summary of Mathematical Conditions for MCMC

Now that we've seen the basic implementation, here's a quick summary of the mathematical conditions an MCMC method requires to actually work:

| Condition | Mechanism |
|---|---|
| Stationary distribution: there exists a set of probabilities that, once reached, will not change. | Detailed balance: the algorithm is designed to satisfy the detailed balance equation. |
| Convergence: the chain is guaranteed to eventually reach the stationary distribution. | Ergodicity: the system must satisfy the conditions in Table 2 to be ergodic. |
| Uniqueness of the stationary distribution: there exists only one solution to the detailed balance equation. | Ergodicity: guaranteed if the system is ergodic. |

And here's how the MH algorithm meets them: detailed balance holds by construction of the acceptance rule, and a proposal that can reach every region of the state space makes the chain ergodic.
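To make the volcano example concrete: the article's exact density isn't given above, so as a stand-in we can assume a hypothetical ring-shaped target whose mass concentrates at radius \(r_0\) (the crater rim), reusing `metropolis_hastings` from the sketch above:

```python
import numpy as np

# Hypothetical stand-in for the 2-D 'volcano' target: an unnormalised
# ring-shaped density whose mass concentrates at radius r0 (the rim).
def log_volcano(xy, r0=2.0, width=0.3):
    r = np.sqrt(np.sum(xy ** 2))
    return -0.5 * ((r - r0) / width) ** 2

# The sampler only ever sees its current (x, y) location and the density
# there, exactly as described in section II.
samples = metropolis_hastings(log_volcano, x0=np.zeros(2),
                              n_iterations=100_000, proposal_scale=0.5)
radii = np.sqrt((samples ** 2).sum(axis=1))
print(radii.mean())  # should land near r0 = 2.0, i.e. on the rim
```

Plotted, these samples form the ring around the crater rim described in section II, even though the sampler never sees the 'slope' of the volcano.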
This analysis was produced by the Genesis Park editorial team with the help of AI. The original article is available via the source link.