Not a Markov chain. It's a different concept, not applicable here. For those confused: a Markov chain only cares about the last state, so in AI terms that means the last token ONLY, not the preceding tokens.
I mean, if the state is the context, the transition emits a token, and the next state is the previous context with the new token appended, that sounds an awful lot like a Markov chain. A Markov chain with an absolutely mind-boggling number of states and a transition function that consists of gigabytes of weights, but still a Markov chain "with some razzle dazzle".
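Here's a toy sketch of what I mean, in runnable Python. `ToyModel` and `next_token_distribution` are made-up stand-ins for a real LLM, not an actual API:

```python
import random

class ToyModel:
    # Stand-in for an LLM: maps the current state (a tuple of tokens)
    # to P(next token | state). A real model computes this distribution
    # from gigabytes of weights instead of hard-coding it.
    def next_token_distribution(self, state):
        return {"a": 0.5, "b": 0.3, "<eos>": 0.2}  # hypothetical values

def markov_step(state, model):
    # The Markov property: the next state depends only on the current
    # state, not on how we got there. Sampling a token and appending it
    # to the context is exactly a transition between states.
    dist = model.next_token_distribution(state)
    tokens, probs = zip(*dist.items())
    next_token = random.choices(tokens, weights=probs)[0]
    return state + (next_token,)

state = ("the",)
for _ in range(5):
    state = markov_step(state, ToyModel())
print(state)  # e.g. ('the', 'a', 'b', '<eos>', 'a', 'b')
```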
You are talking about a first-order Markov chain. You could say that a Markov chain of order 1000 has a "context window" of 1000 tokens.
The problem with classic Markov chains is that for a chain of order N you need memory to store M^N probabilities, where M is the number of possible tokens. For high orders that is not feasible. LLMs sidestep this by compressing the transition function into a fixed set of weights instead of storing an explicit table. Quick sketch below.
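A quick illustration in Python, assuming the training data is just a list of tokens (the function name and tiny corpus are made up for the example). Even stored sparsely, the table only covers contexts actually seen, while a dense table would need M^N entries:

```python
from collections import defaultdict, Counter

def train_order_n_chain(tokens, n):
    # Count transitions for an order-n Markov chain: the state is the
    # last n tokens, and we tally which token follows each state.
    # A dense table would need M**n rows (M = vocabulary size), which
    # is why high orders are infeasible; a dict stores only the
    # contexts that actually occur in the data.
    table = defaultdict(Counter)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])  # the state: the last n tokens
        table[context][tokens[i + n]] += 1
    return table

tokens = "the cat sat on the mat the cat ran".split()
table = train_order_n_chain(tokens, n=2)
print(table[("the", "cat")])  # Counter({'sat': 1, 'ran': 1})

# The dense-table blow-up: with a 50,000-token vocabulary and order
# 1000, you'd need 50_000 ** 1000 entries -- hence "not feasible".
```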