# symbolic dynamical systems-an example

Here in this post I will not give a precise definition of what is a symbolic dynamical system. I will just give an example of it, so that we can get just feeling of a symbolic dynamical system.

One of the most studied example in dynamical system is perhaps the unit circle, $S^1$. There are various transformations that we can define on it, like the rotation transformation, the reflexion transformation, and so on. When we view the unit circle as the quotient space of the unit interval, that is $S^1=[0,1]/(0\equiv 1)$, perhaps we can invent more, for example, the multiplication transformation, that is $T_N: S^1\rightarrow S^1, z\mapsto z^N$(or, equivalently $T_N: \mathbb{R}/\mathbb{Z}\rightarrow \mathbb{R}/\mathbb{Z}, x\mapsto Nx$), just multiplying the argument of the complex number $z$ by $N$, where $N$ is a positive integer, since for non integers this map would not make obvious sense.

Which kind of numbers in the unit interval(latter in this post, we will mix the unit interval and the unit circle when there is no confusion) behaves well under $T_N$? First of all, which number has finite orbit under $T_N$? After a simple speculation, we see that the orbit $\mathfrak{O}(x)$ of a point $x\in [0,1]$ is finite if and only if $x\in \mathbb{Q}$. What are the possible cardinality of these finite orbits? This is not hard, either. But all this has something to do with $N$. Perhaps a natural way of thinking is to express a real number on basis of $N$(here we assume that $N>1$, otherwise $T_N$ is the identity map, things get much easier), that is to say, write $x=\sum_{n\in\mathbb{Z}}\frac{a_n}{N^n}$. And we can prove easily that $x$ is a rational number if and only if the sequence $(a_n)$ is periodic. So if $x$ is rational, then suppose that $(a_n)$ has a period $\tau$, then we see that $T_N^{\tau}(x)=x$. Which means that $\#\mathfrak{O}(x)\leq \tau$. An interesting result.

Perhaps we can concentrate our attention on this $N$-cimal expansion. In fact, we see that all expansions of numbers in the unit interval correspond to a sequence of integers $(a_n)_{n\in\mathbb{N}}$ such that each term $a_n\in\{0,1,...,N-1\}$. Conversely, given a sequence, $(a_n)_{n\in\mathbb{N}}(a_n\in\{0,1,...,N-1\})$, there is always a number $x=\sum_n\frac{a_n}{N^n}$ which lies in the unit interval. Note that, for a real number in the unit interval, there are sometimes two possibilities to write its $N$-cimal expansion(for example, $x=\frac{1}{N}=\sum_{n>1}\frac{N-1}{N^n}$). So we had better define a map from $\{0,1,...,N-1\}^{\mathbb{N}}$ to the unit interval, not in the inverse direction. That is, we define, $\pi: S_N=\{0,1,...,N-1\}^{\mathbb{N}}\rightarrow \mathbb{R}/\mathbb{Z}, (a_n)\mapsto \sum_n\frac{a_n}{N^n}$.

What is missing is a topology on $S_N$. And that is what we are going to do right now. For any element $a=(a_n)\in S_N$, if it is not zero, then there is a least $n(a)$ such that $a_{n(a)}\neq 0$. And thus we define the order of $a$ to be $ord(a)=n(a)$. And we set $ord((0))=+\infty$. It is easy to see that $ord(x-y)\geq ord(x)+ord(y)(x,y\in S_N)$. Note that these numbers are positive numbers, so we have an inequality $e^{-ord(x-y)}\leq e^{-ord(x)}+e^{-ord(y)}$, which shows that $d(x,y)=e^{-ord(x-y)}$ is in fact a distance on $S_N$. If we denote $p_i$ the projection that takes an element in $S_N$ to its $i$-th component, then we see that $p_i(S_N)=\{0,1,...,N-1\}$, which, as a finite subset of $S_N$, is compact. Note also that $S_N=\prod_ip_i(S_N)$, so $S_N$ is the product of compact spaces, so $S_N$ is compact.(There is some similarity between $S_p$ and the $p$-adic integers $\mathbb{Z}_p$, the expansion, the order of an element, the metric(we can show that both two metrics are in fact ultrametrics), the compacity, the total disconnectivity, etc.,but their addition and multiplication rules are not the same, which is a pity. In other words, they share some topological properties, but don’t share algebraic properties).

Having equipped $S_N$ with an ultrametric, we wonder if $\pi: S_N\rightarrow S^1$ is continuous or not. It is not hard to do. In deed, note that $\pi(a)\leq N^{-ord(a)}\sum_{n>0}\frac{N-1}{N^n}=d(a,0)^{\text{ln}N}$, which show that $\pi$ is indeed continuous.

Now we can study some properties of these two transformations, $T_N$. Let’s first decide if the $T_N$ can be transported to $S_N$. Suppose $x=\sum_n\frac{x_n}{N^n}$. So $T_N(x)=\sum_n\frac{x_{n+1}}{N^n}$. So in fact, we can define a transformation on $S_N$, that is $\sigma: S_N\rightarrow S_N, (a_n)_{n\geq0}\mapsto (a_{n+1})_{n\geq0}$. It is easy to verify that there is $T_N\circ \pi=\pi \circ\sigma$. Recall that $\pi$ is surjective, and we can this kind of map(like $\pi$, which is continuous, surjective, commutes two dynamical systems) a semi-conjugation.

Then like those irrational rotations on the unit circle, we want to see if there are some real number in the unit interval has a dense orbit. Without the idea of $N$-cimal expansion, this problem can be very hard. However, with this idea, things become much easier. Note that $S_N$ is in fact a separable space(one countable dense subset of $S_N$ is $B=\{x\in S_N|\exists m\in\mathbb{N},\text{s.t.} \forall n>m, x_n=0\}$). We can put the effective parts of all the elements in $B$ in one element of $S_N$, and this element has a dense orbit. What is more, we can see that are infinite way to generate this kind of elements.

So, in this example, we abstract the space $S^1$ to get a purely symbolic dynamical system. After giving it some proper metric, this abstract symbolic dynamical system gives us a satisfactory answer to a hard problem.

Now we want to say something on some particular $\Sigma_N$, that is $\Sigma_2$. Before that, let’s study the expansions of the real numbers in the interval $[0,1)$. We, as we have done in the above, identify $S=[0,1)\mathbb{R}/\mathbb{Z}$, equipped with the Lebesgue measure. Then we see that the multiplication operator $T_N:S\rightarrow S, x\mapsto Nx$ is ergodic with respect to the Lebesgue measure. That is to say, the only measurable set $A\subset S$ such that $T_N^{-1}(A)=A$ is a set of measure $0$ or $1$, or equivalently, the only measurable functions invariant under $T$ are the constant functions. Note that the level set of an invariant function is a set invariant under $T$, too. This gives the equivalence between these two. What we want to do now is to show that all words of length $k$ in $\Sigma_N$ have an equal probability of appearance in the expansion of almost all elements of $S$ in the $N$-base expansion. How to express the probability of appearance of a word in the expansion of a number? Suppose there is a word $x=(x_1x_2...x_k)(0\leq x_1,...,x_k\leq N-1)$, and a real number $a=\sum_{i>0}a_i/N^i$. That $x$ appears in the first $k$ digitals of $a$ is equivalent to saying that $a\in I_x=[b,b+1/N^k)$, where $b=b(x)=\sum_{1\leq i\leq k}x_i/N^i$. And the second $k$ digitals of $a$ is the word $x$ means that $T_N(a)\in [b,b+1/N^k)$. So, we have found a method in counting the number of appearance of the word $x$ in the expansion of the number $a$. That is, in the first $n+k$ digitals of $a$, the number of appearance of $x$ is just $\sum_{1\leq i\leq n}\chi_{I_x}(T^i_N(a))$. Divided by $n$, then the quotient means the frequency of appearance of $x$ in the first $n+k$ digitals of $a$. So, does this frequency have a limit? If so, what is the limit? Here we need the mean ergodic theorem of von Neumann. Note that the characteristic function $\chi_{I_x}$ belongs to $L^2(S)$. So, the mean ergodic theorem of von Neumann says that, under the condition that $T$ is ergodic, the sequence of functions $\frac{1}{n}\sum_{0\leq k\leq n-1}f(T^k_N(x))$ converges in $L^2(S)$ to a constant for any $f\in L^2(S)$, so does $f=\chi_{I_x}$ here(Or we can also use the mean ergodic theorem of Birkhoff). And this constant is just the orthogonal projection of $f$ on the space of functions measurable with respect to the $\sigma$-algebra generated by the measurable sets invariant under $T$. Thus here, it is the trivial algebra, $\{\emptyset, S\}$, and a function measurable with respect to this algebra is just constant function. So, the orthogonal of $f$ onto this space is just the integral of $f$. So, we have that $\lim_n\frac{1}{n}\sum_{0\leq k\leq n-1}f(T^k_N(a))$ converges in $L^2(S)$ to $\int f$. Note also that each term in this sequence is bounded from above by $1$. Without loss of generality, we can assume that $\int f=0$. So, we conclude that for almost all points $a\in S$, this sequence converges to $0$(we can argue by contradiction. Suppose that there is a measurable set $A$ of measure $m(A)>0$ such that for almost all $a\in A$, we have that $\limsup |\frac{1}{n}\sum_kf(T^k_N(a))|>\epsilon>0$. So we must have that $\limsup \int |\frac{1}{n}\sum_kf(T^k_N)|^2>\epsilon^2m(A)>0$(theorem of dominant convergence), which is a contradiction). So, in this way, we show that for almost all points $a\in S$, we have that in its expansion, the word $x$ appears with a frequency $\int \chi_{I_x}=\frac{1}{N^k}$. This means that for each word $x$, there is a set of measure $1$ such that each point in it has an expansion in which $x$ appears infinitely many times. One consequence of this result is that, for all the words(which are countable), there is a set of measure $1$ such that each point in it has an expansion in which every word appears infinitely many times. This is the same as that the set $S'=\{a\in S|\omega(a)=S\}$.

A little digression here. Note that equipped with the quotient topology, $S$ is a compact space, thus is complete. So, the Baire category theorem applies here. Baire category theorem says that in a complete metric space $S$, countable intersection of dense open sets is still dense. Note that ”density’ is really a topological term, which means that, it only measures the cardinality of the set, but says nothing about the ‘volume’ of the set. We call the countable intersection of dense open sets a generic set. So, a generic set is dense, but not the converse. For the present problem, we can show that $S'$ is a generic set. This is not hard to see, and we will omit it.

Another digression is useful. It is the concept of mixed dynamical systems. We say a system $(X,T,m)$(preserving the measure) is mixed, if for any measurable sets $A,B\subset X$, we have that $\lim_nm(A\bigcap T^{-n}(B))=m(A)m(B)$. What does this condition mean? It means that, after a long time($n>>1$), a non trivial measurable set $A$ will be spread to almost everywhere with equal probability. This condition is stronger than ergodicity in that ergodicity doesn’t guarantee the equal probability. Indeed, suppose that a system is mixed, then for any invariant set $A$, $X-A$ is invariant, too. So, we have that $\lim_nm(A\bigcap T^{-n}(X-A))=m(A)m(X-A)$. But we have that $T^{-1}(X-A)=X-A$, so $m(A\bigcap T^{-n}(X-A))=m(A\bigcap X-A)=0$, which shows that either $m(A)=0$ or $m(A)=1$, thus showing the ergodicity of the system. According to the comments above, it is not hard to find examples of dynamical systems which are ergodic but not mixed. The examples must not spread its subset (at least one subset). One example is the rotation on the unit circle.

Now return to the problem of symbolic dynamical systems. Using its factor system $(S,T_N,m)$, we have shown that $(\Sigma_N,\sigma_n,d)$ is ergodic. But note that we haven’t defined any Borel measure on $\Sigma_N$. For any $p_0,p_2,...,p_{N-1}\in [0,1]$ such that $\sum_ip_i=1$, and any cylinder for a word $x=(x_1,x_2,...,x_k)$, $[x]=\{a\in \Sigma_N, a_1=x_x,...,a_k=x_k\}$, we set $m([x])=p_{x_1}p_{x_2}...p_{x_k}$. What is more, the set $\mathbb{X}=\{[x]\subset X|x \text{ is a word}\}\bigcup{\emptyset}$(we also consider the empty word $\emptyset$, then $[\emptyset]=X$), forms a basis for the topology on $\Sigma_N$ induced by the metric defined above. What is more, $\mathbb{X}$ generates the $\sigma$-algebra on $\Sigma_N$ which is compatible with its topology. Moreover, for any two elements $A,B\in \mathbb{X}$, we have that $A\bigcap B\in \mathbb{X}$(either the empty set, either one of them), $A-B$ is the finite union of elements in $\mathbb{X}$. And if $A_n\in \mathbb{X}(n\in \mathbb{N})$ are disjoint one to another, and there is $A\in\mathbb{X}$ such that $A=\bigcup_nA_n$, then the word $x$ for $A$ must be a part of each of the word $x_n$ for $A_n$. So, without loss of generality, we can suppose that these words have no common sub-word. So, we have that $1=m([\emptyset])=\sum_{0\leq k\leq N-1}m(\bigcup_{A_n, (x_n)_1=k}A_n)$. We can continue in this way for the second digital, the third digital, etc.. And finally we have that $m(\bigcup_n A_n)=\sum_nm(A_n)$. So, according to the Caratheodory extension theorem, we have that there is a unique $\sigma$-finite measure($m'$) on $X$ compatible with the topology on it such that $m'|_{\mathbb{X}}=m$. And for simplicity, we still write $m$ for $m'$. So, for each $N$-tuple, $p=(p_0,p_1,...,p_{N-1})$, there is a unique measure on $\Sigma_N$, and we denote this measure by $m_p$.

Now we want to show that $m_p$ is $\sigma$-invariant. Indeed, for any cylinder $[x]$, we have that $T^{-1}([x])=[0x]\bigcup[1x]\bigcup...\bigcup[(N-1)x]$, and $\sum_km([kx])=\sum_kp_km([x])=m([x])$. This completes the proof. Next we want to prove that $(\Sigma_N,\sigma, m_p)$ is ergodic. First, we can prove that it is mixed. For that, consider any two cylinder $A=[x]=[(x_1x_2...x_k)],B=[y]=[y_1...y_l]$, then for large $n>>1$, $T^{-n}([y])=\{a\in\Sigma_N|a_n=y_1,...,a_{n+l}=y_l\}$, so we must have that $A\bigcap T^{-n}(B)=\{a\in\Sigma_N|a_1=x_1,...,a_k=x_k,a_n=y_1,...,a_{n+l}=y_l\}$, thus $m_p(A\bigcap T^{-n}(B))=p_{x_1}...p_{x_k}p_{y_1}...p_{y_l}=m_p(A)m_p(B)$. So this system is mixed, thus is ergodic.

Now we can repeat what we have done for $(S,T_N,m)$: for any word $x=(x_1...x_k)$, and any element $a\in \Sigma_N$, the frequency that $x$ appears in the first $n+k$ digitals of $a$ is $\frac{1}{n}\sum_{0\leq j\leq n-1}\chi_{B(x,1/N^k)}(\sigma^j(a))$. Note that $B(x,1/N^k)=[x]$, so we have that this sequence converges for almost all points $a$ to the integral $\int \chi_{[x]}=m([x])=p_{x_1}...p_{x_k}$(since the system is ergodic). So, in this system, different words appear with a possibly different probability. The reason for this, is of course that the measure is not uniform. But note that, even though these measures are not uniform, these systems are still mixed.

So we prove that as long as none of these $p_i$ is zero, then the set of elements $a\in \Sigma_N$ such that $\omega(a)=\Sigma_N$ has a measure $1$. Is this set the full $\Sigma_N$? Not really. For example, the element $a=(0000...)$ whose digitals are all zero, so, $\omega(a)=\emptyset$. In fact, we can show something much stronger, that is: there is a generic set whose elements are all this sort. First, we want to construct an element $a$ such that the limit of the sequence $\frac{1}{n}\sum_k\chi_{[x]}(\sigma^k(a))$ doesn’t exist for all words( that is to say, for some words, this limit doesn’t exist). For example, we choose $x=(1)$. How to construct an element such that the frequency of ones doesn’t have a limit? One way is, for an element $a=(a_1a_2...)$, we put $2^{2k-1}$ many ones on $a_{2^{2k}+1},a_{2^{2k}+1},...,a_{2^{2k+1}}$, and we put no ones on $a_{2^{2k+1}+1},...,a_{2^{2k+2}}$. So, on one hand, from $a_1$ to $a_{2^{2n+1}}$ there are $\sum_{1\leq k\leq n}2^{2k-1}$ many ones, so the frequency of ones is $f_n\geq 2^{2n-1}/2^{2n+1}=1/4$. On the other hand, from $a_1$ to $a_{2^{2n+2}}$, there are $\sum_{1\leq k\leq n}2^{2k-1}<2^{2n+1}/3$, so the frequency of ones is $g_n\leq 1/6$. So, we have found two sequences of frequencies $f_n,g_n$, such that $\liminf_nf_n\geq1/4$ while $\limsup_ng_n\leq1/6$. So, the limit doesn’t exist. Now we want to show that there is a generic set whose elements are all this kind. We define $W_n=\{x\in\Sigma_N| \exists l,k>n, \text{from } a_{2^{2k}+1+l} \text{ to } a_{2^{2k+1}+2l}, \text{there are } 2^{2k-1}+l/2 \text{ many ones, from } a_{2^{2k+1}+2l+1} \text{ to } a_{2^{2k+2}+4l} \text{ there are no ones}\}$. It is easy to verify that each $W_n$ is open and dense. Now look at their intersection $W=\bigcap_n W_n$, this is a generic set. We can show that each element in $W$ has the same property as the element constructed above. For each word, we can construct a series of $W_n$ as above, and since there are only countable words, so we conclude that the set of elements in whose expansions the frequency of each word doesn’t converge is a generic set. Of course, this generic set must have a measure $0$, with respect to the above measures $m_p$.

Next we want to restrict our attention to $\Sigma_2$, which consists of elements of a sequence of $0,1$. A note for all these $\Sigma_N$. The distance function takes only discrete values in the real number field. What is more, the whole space $\Sigma_N$ is compact. So, if $C\subset\Sigma_N$ is a closed subset, then it is compact. For each point $c\in C$, we can take a ball $B(c,r_c)$ around it such that $B(c,r_c)\subset C$. And using the compactness of $C$, we conclude that $C$ is at the same time open. So the characteristic function of it, $\chi_C$, is continuous. Now suppose that, there is an element $x\in \Sigma_2$ such that there is a positive number $\delta\in (0,1)$, and for any $n\in\mathbb{N}$, there is $b_n,a_n\in\mathbb{N}$, $b_n-a_n>n$, and $\frac{1}{b_n-a_n}\sum_{a_n+1\leq k\leq b_n}x_k>\delta$. This condition means that for any large number $n$, there is an interval of length greater than $n$ such that the number $1$ appears in this interval with a frequency greater than $\delta$. We want to use this to construct a measure $m$ on $\Sigma_2$. Note $K=\overline{\mathfrak(O)_{\sigma}(x)}$ the closure of the orbit of $x$ under $\sigma$. We want that $m(K)=1$. The bricks to construct measures is the Dirac measures. Here we can try, $m_n=\frac{1}{b_n-a_n}\sum_{a_n\leq k\leq b_n}\delta_{\sigma^{k}(x)}$. Note that, these are all measures, thus there is a subsequence $m_{n_k}$ which converges weakly to a measure $m$. Moreover, for any $n$, there is $m_n(K)=1$, then $\int \chi_Kdm_n=1$. Since $\chi_K$ is continuous, so we must have $1=\int \chi_K dm_{n_k}\rightarrow \int \chi_Kdm$, thus $m(K)=1$. Now to express the idea that $\frac{1}{b_n-a_n}\sum_{a_n+1\leq k\leq b_n}x_k>\delta$, we set $K'=[1]\bigcap K$. Then this inequality becomes $\frac{1}{b_n-a_n}\sum_{a_n+1\leq k\leq b_n}\chi_{[1]}(\sigma^k(x))>\delta$. In other words, $\frac{1}{b_n-a_n}\sum_{a_n+1\leq k\leq b_n}\delta_{\sigma^k(x)}(K')>\delta$. So, $\int \chi_{K'}\geq\delta$. Note that $[1]$ is closed and open, $K$ is closed, so $K'$ is closed, thus $\chi_{K'}$ is continuous. This implies that $\delta\leq m_{n_k}(K')\rightarrow m(K')$, that is $m(K')\geq \delta>0$.

Now we want to use a great theorem of $Furstenberg$, the multiple recurrence theorem:

For any probabilistic dynamical system $(X,T,m)$ which preserves the measure, suppose that $B\subset X$ is a measurable set with $m(B)>0$.Then for any integer $k$, there are infinitely many $n$ such that $m(B\bigcap T^{-n}(B)\bigcap...\bigcap T^{-kn}(B))>0$

This is a refinement of the recurrence theorem of Poincare. Poincare’s theorem says that almost all points in $B$ shall return to $B$ for infinitely many times. Furstenberg’s theorem states that one part of the returns has some rule inside, that is the arithmetic progression rule. More precisely, there is an arithmetic progression of any length, but the common difference is not known, a fortiori, although it guarantees that the common length can be arbitrarily large. This theorem doesn’t say that there exists an arithmetic progression of any common difference. But, in some sense, is also the default of this result.

Now apply this result to our system $(\Sigma_2,\sigma,m)$ and to the set $K'$. Thus for any $k$, there is infinitely many $n$ such that $m(K'\bigcap \sigma^{-n}(K')\bigcap...\bigcap \sigma^{-kn}(K'))>0$. What does this mean? If $y$ lies in such a set, then $y,\sigma^n(y),...,\sigma^{kn}(y)\in K'=[1]\bigcap K$. Since $K$ is the closure of the orbit of $x$, then there is a $j$ such that $d(\sigma^j(x),y)<1/2^{kn+n}$, this means that the first $kn$ digitals of $\sigma^j(x)$ and $y$ coincide. Since we have that $y_1=y_{n+1}=y_{2n+1}=...=y_{kn+1}=1$, this implies that $x_{j+1}=x_{j+n+1}=...=x_{j+kn+1}=1$. And there are infinitely many arithmetic progression of length $k+1$.

For summary, we have proven a theorem of Szemeredi,

If $f:\mathbb{N}\rightarrow \{0,1\}$ such that there is a positive real number $\delte\in (0,1)$, and for any $n$ there is $b_n,a_n$ with $b_n-a_n>n$ and $\frac{1}{b_n-a_n}\sum_{a_n+1\leq k\leq b_n}f(k)>\delta$. Then for any $k$, there are infinitely many arithmetic progression as a subsequence in the sequence $(f(n))$ of the form $1=f(a)=f(a+n)=f(a+2n)=...=f(a+kn)$, and the common difference can be arbitrarily large.

This is an interesting result. It says that, as long as the ones is not too sparse, there exists always infinitely many arithmetic progressions of any length.