I feel like I need to publish something quick, so I'll just discuss stationary distributions for Markov chains before my orientation stuff at Cisco in California. I don't know if I'll have time to make a post while I'm there.
We've discussed Markov chains previously. Given a set of states, say $n$ states such that $|S| = n$ where $S$ is the set of all states we care about, we have a transition matrix with $n \times n$ elements, say a matrix $A$ like the following:

$$A = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix}$$
where the transition matrix entry $A_{ij} = P(X_{t+1} = j \mid X_t = i)$ is the probability of being AT state $i$ and moving to state $j$. REMEMBER THIS, because some books like to define it the other way around, with $i$ as the target state and $j$ as the previous state, and that changes our notation up entirely here.
For simplicity, you can assume our Markov chain is time-invariant, meaning the probabilities in the transition matrix don't change over time. I believe what we are discussing here would still hold otherwise, but definitely not in an obvious and conveniently calculable way. There's going to be symbols and stuff, and you'd probably have to prove that the Markov chain remains positive recurrent over time, weird shit, let's not get into that. Let's assume we have a nice matrix full of constants that don't change with each time step.
So say we have a starting distribution of what state we might be in, a row vector $x_0$ such that $\sum_i (x_0)_i = 1$. The subscript of $x$ indicates which time step we're talking about, and 0 is obviously the start. We can find the probability of being in whatever state at time step $t$ by doing the following:

$$x_t = x_0 A^t$$
so we have a row vector multiplied by our $A$ transition matrix to get us another row vector with the changed probabilities of being at each state at whatever time step. So that's cool, right? Very handy in terms of calculation. Assuming your transition matrix is legit, i.e. the rows add up to 1 properly, then the elements of $x_t$ should also sum to 1 all the time. (The quick argument: rows summing to 1 means $A\mathbf{1} = \mathbf{1}$, so $x_t \mathbf{1} = x_0 A^t \mathbf{1} = x_0 \mathbf{1} = 1$.)
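Here's a minimal numpy sketch of that propagation, using a made-up 3-state chain (the numbers are just illustrative):

```python
import numpy as np

# A hypothetical 3-state chain: rows index the current state,
# columns the next state, so each row sums to 1.
A = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

x = np.array([1.0, 0.0, 0.0])  # x_0: start in state 0 for sure

# x_t = x_0 A^t, computed as repeated right-multiplication by A
for _ in range(10):
    x = x @ A

print(x)        # probabilities of being in each state at t = 10
print(x.sum())  # still sums to 1
```

Note the row-vector convention from above: the distribution goes on the left of `A`, matching $A_{ij}$ being the probability of moving from $i$ to $j$.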
So a stationary distribution is when we have some row vector $Y$ such that $Y = YA$, so when you multiply the row vector $Y$ with the transition matrix, there is no change to the elements of $Y$: the probabilities remain the same. This stationary distribution typically comes up as a limiting distribution where $Y = \lim_{t \to \infty} x_0 A^t$. So yeah, if you want to brute force your way to a stationary distribution, you can calculate it on a computer by doing matrix multiplies over and over until hitting some convergence point.
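The brute-force approach looks like this, again with a made-up 3-state chain; we just iterate until the distribution stops moving:

```python
import numpy as np

A = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

x = np.array([1.0, 0.0, 0.0])
for _ in range(10_000):
    x_next = x @ A
    # stop once a multiply no longer changes the distribution
    if np.max(np.abs(x_next - x)) < 1e-12:
        break
    x = x_next

print(x)                      # approximate stationary distribution Y
print(np.allclose(x @ A, x))  # Y A == Y, so it's stationary: True
```

For a nice chain like this one, convergence is fast; for chains with the pathologies discussed below, this loop may never settle.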
But linear algebra people should recognize that $Y$ here is actually a LEFT eigenvector of $A$ with eigenvalue 1! You can solve for this simply by computing the (right) eigenvectors of $A^T$, the transpose of the transition matrix, finding the eigenvector associated with eigenvalue 1, and normalizing it such that it is a legit discrete probability vector. So there's a clear relation of linear algebra to Markov chains.
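That eigenvector route, sketched with numpy on the same hypothetical chain:

```python
import numpy as np

A = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])

# Right eigenvectors of A^T are left eigenvectors of A.
eigvals, eigvecs = np.linalg.eig(A.T)

# Grab the eigenvector whose eigenvalue is (numerically) 1.
idx = np.argmin(np.abs(eigvals - 1.0))
y = np.real(eigvecs[:, idx])

# Normalize so entries sum to 1, making it a legit probability vector
# (this also fixes an overall sign flip, since the entries share a sign).
y = y / y.sum()

print(y)
print(np.allclose(y @ A, y))  # stationary: True
```

No iteration needed: one eigendecomposition gives you the answer the power iteration above was crawling toward.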
Obviously, there has to be a catch to this bit, right? So there are three things that could happen with the long-run behavior of a Markov chain:

1. a unique stationary distribution
2. multiple distinct stationary distributions
3. oscillating/alternating final distributions
The uniqueness of the stationary distribution (and actual convergence to it) is only guaranteed if the MC is irreducible, aperiodic, and positive recurrent; for a finite chain, irreducible already implies positive recurrent. What does that even mean? The simplest way to check it is by making sure every state is reachable from every other state, i.e. they all communicate and are recurrent, and that the possible return times of each state to itself have a greatest common divisor of 1 (like if a state can return to itself in either 2 steps or 3 steps, the GCD of its return times is 1, so it's aperiodic). In linear algebra terms, this effectively guarantees that eigenvalue 1 has geometric multiplicity 1 (there's only 1 eigenvector associated with eigenvalue 1), and that there's no weird thing involving eigenvalue -1 or other eigenvalues on the unit circle.
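You can eyeball those eigenvalue conditions numerically. Here's a small sketch (chains are made up): count the eigenvalues of $A^T$ sitting on the unit circle. A nice chain has only the single eigenvalue 1 there; a periodic chain picks up extras like -1.

```python
import numpy as np

def unit_circle_eigs(A, tol=1e-9):
    """Eigenvalues of A^T with modulus (numerically) equal to 1."""
    eigvals = np.linalg.eigvals(A.T)
    return eigvals[np.abs(np.abs(eigvals) - 1.0) < tol]

# Irreducible and aperiodic: only eigenvalue 1 on the unit circle.
A_good = np.array([
    [0.7, 0.2, 0.1],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
])
print(unit_circle_eigs(A_good))      # just 1.0

# Period-2 chain that flips between two states every step:
# picks up eigenvalue -1 alongside eigenvalue 1.
A_periodic = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
])
print(unit_circle_eigs(A_periodic))  # 1.0 and -1.0
```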
How do 2 and 3 even occur then? For 2: if you have multiple closed classes, like completely separable Markov chains you happened to put into the same matrix (with 5 states, say states 1-3 communicated only amongst themselves and 4-5 only between themselves), you will end up with more than one stationary distribution at the end, and any mixture of them is stationary too. (Transient states on their own don't cause this, but they do mean the limit you reach depends on where you started.) In case 3, if there's some periodicity going on, then clearly it's possible that the Markov chain will keep alternating with each time step and not actually be stationary by the end, though it will have converged to a set of probability vectors that it cycles through. In eigenvalue terms, you end up with an eigenvector for eigenvalue 1, and then an eigenvector for eigenvalue -1 (or other eigenvalues on the unit circle) that represents the change in the distribution with every time step.
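Case 2 is easy to see numerically. A sketch with a made-up 4-state matrix holding two separable chains: eigenvalue 1 of $A^T$ shows up twice, once per closed class, and each eigenvector normalizes to a stationary distribution living on its own block.

```python
import numpy as np

# Two separable chains packed into one matrix: states {0,1} and {2,3}
# never communicate across the blocks.
A = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.6, 0.4],
    [0.0, 0.0, 0.2, 0.8],
])

eigvals, eigvecs = np.linalg.eig(A.T)
ones = np.where(np.abs(eigvals - 1.0) < 1e-9)[0]
print(len(ones))  # eigenvalue 1 appears twice: two stationary distributions

for i in ones:
    y = np.real(eigvecs[:, i])
    y = y / y.sum()
    print(y, np.allclose(y @ A, y))  # each one satisfies y A = y
```

Which of the two (or which mixture of them) you actually converge to depends entirely on the starting distribution $x_0$.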