algebraic number theory-part 3

This post is mainly based on the book, ‘Fermat’s dream’ by three Japanese mathematicians Kazuya Kato, Nobushige Kurokawa and Takeshi Saito. This post is concerned with the local and global problems in algebraic number theory.

We already know that in Dedekind rings, such as rings of algebraic integers, the most important objects are the non-zero prime ideals. Any non-zero ideal I in a Dedekind ring R has a decomposition I=\prod_{P,\text{prime ideal}}P^{e_P} with e_P\in\mathbb{N}, almost all of them zero. This decomposition still works for fractional ideals in the fraction field K of R, now with e_P\in\mathbb{Z}. So if we fix a prime ideal P, then any fractional ideal I gives an integer e_P. Since P is fixed, we can write this number as v_P(I) (when there is no confusion, we will write it simply as v(I)). It is clear that for two fractional ideals I,J we have v_P(IJ)=v_P(I)+v_P(J). Moreover, it is not hard to show that v_P(I+J)=\min\{v_P(I),v_P(J)\}, and since R(k+k')\subset Rk+Rk', this gives v_P(R(k+k'))\geq\min\{v_P(Rk),v_P(Rk')\} for elements k,k'\in K^{\times} with k+k'\neq 0. In terms of commutative algebra, we say that v_P:K^{\times}\rightarrow\mathbb{Z},k\mapsto v_P(Rk) defines a discrete valuation on K (here we require that the map be surjective). We can set v_P(0)=+\infty so that the two conditions above are satisfied on the whole of K. We call K, equipped with v_P, a discrete valuation field; its valuation ring, introduced below, is a discrete valuation ring (in general it is larger than R).

The most basic example is perhaps the p-adic valuation on \mathbb{Q}. Suppose that R=\mathbb{Z}, K=\mathbb{Q}. Then any non-zero prime ideal corresponds to a unique prime number p. Any non-zero rational number a can be written as a=p^nA/B where n\in\mathbb{Z} and A,B are two integers prime to p. Then this n is just v_p(a). It is easy to verify that the valuation thus defined is indeed a discrete valuation.
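As a small illustration of this example, here is a minimal Python sketch of the p-adic valuation on \mathbb{Q}. The function name `vp` is ours (it does not come from the book), and p is simply assumed to be prime. The printed values check the two properties v(xy)=v(x)+v(y) and v(x+y)\geq\min\{v(x),v(y)\} on one pair of rationals.

```python
from fractions import Fraction

def vp(a, p):
    """p-adic valuation of a non-zero rational a (p is assumed to be prime)."""
    a = Fraction(a)
    if a == 0:
        raise ValueError("v_p(0) is +infinity by convention")
    n, num, den = 0, a.numerator, a.denominator
    while num % p == 0:      # powers of p in the numerator count positively
        num //= p
        n += 1
    while den % p == 0:      # powers of p in the denominator count negatively
        den //= p
        n -= 1
    return n

x, y = Fraction(50, 3), Fraction(7, 25)
print(vp(x, 5), vp(y, 5))    # 2 -2
print(vp(x * y, 5))          # 0  = 2 + (-2)
print(vp(x + y, 5))          # -2 >= min(2, -2)
```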

The really interesting point is that we can define a topology on K using this discrete valuation. Indeed, using the properties of the discrete valuation, we can define a distance d_v on K. For any k\neq k'\in K, we set d_v(k,k')=e^{-v(k-k')} (and we set d_v(k,k)=0, compatible with the convention v(0)=+\infty). This is indeed a distance thanks to the inequality above, which even gives the strong triangle inequality d_v(k,k'')\leq\max\{d_v(k,k'),d_v(k',k'')\}. Note that we chose the base e here. In fact, we can replace e by any real number greater than 1, and the resulting distance is equivalent to the one defined above in the sense that they define the same topology on K.
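To get a feeling for this distance, here is a hedged Python sketch for K=\mathbb{Q} and v=v_p, using the base p instead of e (which, as said above, gives the same topology). The helper names `vp`, `dist` and `strong_triangle_holds` are ours. The sketch checks on a few sample points that d(x,z)\leq\max\{d(x,y),d(y,z)\}, and shows that high powers of p are close to 0.

```python
from fractions import Fraction

def vp(a, p):
    """p-adic valuation of a non-zero rational a (p assumed prime)."""
    a = Fraction(a)
    n, num, den = 0, a.numerator, a.denominator
    while num % p == 0:
        num //= p
        n += 1
    while den % p == 0:
        den //= p
        n -= 1
    return n

def dist(x, y, p):
    """Distance p^(-v_p(x-y)), with dist(x, x) = 0."""
    if x == y:
        return 0.0
    return float(p) ** (-vp(Fraction(x) - Fraction(y), p))

def strong_triangle_holds(x, y, z, p):
    return dist(x, z, p) <= max(dist(x, y, p), dist(y, z, p))

points = [Fraction(0), Fraction(1), Fraction(4), Fraction(82),
          Fraction(1, 3), Fraction(5, 9)]
print(all(strong_triangle_holds(x, y, z, 3)
          for x in points for y in points for z in points))
print([dist(3 ** k, 0, 3) for k in range(1, 5)])   # shrinking distances to 0
```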

There is an important concept to introduce before going any further. We define A=\{k\in K|v(k)\geq0\} to be the valuation ring of K. Note that this is indeed a ring: it is closed under addition, subtraction and multiplication. Since the map v:A-\{0\}\rightarrow \mathbb{N} is surjective (this condition is essential), we can choose a\in A such that v(a)=1. Now for any non-zero ideal I\subset A, we set n=\min_{k\in I}v(k). Then for any k\in I, since v(k/a^n)\geq0, we must have, according to the definition of A, k/a^n\in A, that is, a^n divides k for any k\in I, so I\subset(a^n). Moreover, if an element k\in A has v(k)=0, then v(1/k)=0, which implies that 1/k\in A, so k is a unit in A. Now take k\in I with v(k)=n. Then v(k/a^n)=0, so k/a^n is a unit, hence a^n\in I, which means that I=(a^n). So any non-zero ideal in A is of the form (a^n) for some n\in\mathbb{N}. The valuation ring of a discrete valuation is thus rather simple. In the above example, we have A=\{a/b|a,b\in\mathbb{Z},p\nmid b\}. So in general, A\neq R.

Now return to the topology defined above. It is not hard to see that the algebraic operations on K are continuous under this topology, so this makes K into a topological field, and A into a topological ring. K being a metric space, a natural question is whether K is complete with respect to this distance, and how to complete it if it is not. Perhaps we can take inspiration from the above example. One way to complete A is to consider the inverse limit of the system A/(a^n). So we define \hat{A}=\varprojlim_n A/(a^n), the completion of A (when the residue field A/(a) is finite, each A/(a^n) is finite, so \prod_n A/(a^n) is compact by Tychonoff’s theorem; \hat{A}, being a closed subspace of it, is compact and as a result complete). And we set K_v to be the fraction field of \hat{A}. We can show that in this case K_v is locally compact: the discrete valuation extends to a discrete valuation on K_v, still denoted by v, whose restriction to K coincides with v, and \hat{A}=\{k\in K_v|v(k)\geq 0\} is a neighborhood of the origin in K_v which is compact and open at the same time. This is a very good sign, since it tells us that we can do integration on K_v once we have a Haar measure on K_v.
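The inverse limit can be made concrete. The following Python sketch (our own illustration, not taken from the book) builds an element of \varprojlim_n \mathbb{Z}/(7^n) which does not come from a rational number: a square root of 2. It starts from 3^2\equiv 2 (mod 7) and lifts the solution modulo higher and higher powers of 7 by a Newton step. The function name `sqrt2_in_Z7` is hypothetical, and `pow(x, -1, mod)` for modular inverses assumes Python 3.8 or later.

```python
def sqrt2_in_Z7(levels):
    """Return [x_1, ..., x_levels] with x_n^2 = 2 mod 7^n and x_{n+1} = x_n mod 7^n."""
    p = 7
    x = 3                      # 3*3 = 9 = 2 mod 7
    seq = [x]
    for n in range(1, levels):
        mod = p ** (n + 1)
        # Newton step: the error x*x - 2 is divisible by 7^n,
        # so the correction only changes x by a multiple of 7^n.
        x = (x - (x * x - 2) * pow(2 * x, -1, mod)) % mod
        seq.append(x)
    return seq

seq = sqrt2_in_Z7(6)
print(seq)
# Each term solves the equation modulo its own power of 7 ...
print(all((x * x - 2) % 7 ** (k + 1) == 0 for k, x in enumerate(seq)))
# ... and the terms are compatible under the projections Z/7^(n+1) -> Z/7^n.
print(all(seq[k + 1] % 7 ** (k + 1) == seq[k] for k in range(len(seq) - 1)))
```

The compatible sequence (x_n) is exactly one point of the inverse limit; since 2 has no rational square root, this point lies in the completion but not in A itself.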

We call R/P the residue field of K with respect to v_P (it is indeed a field: P is a non-zero prime ideal, thus maximal). Note that when K is a number field (a finite extension of \mathbb{Q}), R is a free \mathbb{Z}-module of rank [K:\mathbb{Q}], and so is every non-zero ideal of R. Thus R/P is a finite set, hence a finite field. So, all in all, given a prime ideal P\neq 0, we can define a valuation, thus a distance on K, together with a residue field R/P. We call a number field a global field, while a local field refers to a field with a discrete valuation that is complete with respect to the distance induced by the valuation, with the additional condition that the residue field is finite. Note that we haven’t yet defined what a residue field is in general. Suppose that K is a field with a discrete valuation v, and consider its valuation ring A=\{k\in K|v(k)\geq0\}. We saw that A has only one non-zero prime ideal, namely (a). We define A/(a) to be the residue field of K (for K the fraction field of R and v=v_P, one checks that A/(a) is isomorphic to R/P, so the two definitions agree). So we say that a field is a local field if it is complete with respect to a discrete valuation and its residue field is finite. One question is: are there many local fields? Not so many. In fact one can show that any local field of characteristic zero is the completion of some number field K with respect to the discrete valuation v induced by some prime ideal of \mathfrak{O}_K (in positive characteristic, local fields are fields of Laurent series \mathbb{F}_q((t)), which are completions of function fields rather than of number fields). Note that, using the projective-limit definition of \hat{A}, we can see that the residue field of K with respect to v is the same as that of its completion K_v. So, in fact, local fields appear in a rather limited range of situations.

So, in this post, we have associated any non-zero prime ideal of an algebraic integer ring with a discrete valuation, and thus defined a distance, and at the end, completed the field to get a local field.

h-cobordism-part 2

This series of posts is mainly based on ‘Lectures on the h-cobordism theorem’ by John Milnor.

This second post is concerned with Morse functions. A Morse function f on a manifold M is intuitively a height function on M. Recall the definition of a cobordism between two manifolds M,M': it is a quintuple (N,m,m';F,F') with \partial N=m\bigcup m' a disjoint union of m,m'. One natural question is: can we separate these two parts by a function? Note that N is compact, and m,m' are two disjoint closed (thus compact) subsets of N. Since N is a compact Hausdorff (indeed metrizable) space, we know that there is a continuous function f:N\rightarrow [0,1] such that f^{-1}(\{0\})=m,f^{-1}(\{1\})=m'. We can modify this function so that it is smooth and still satisfies the above property. There are some related concepts. We say that x\in M is a critical point of a function h: M\rightarrow\mathbb{R} if dh_x=0, and then we say that h(x) is a critical value of h. If at a critical point x the determinant det(\frac{\partial^2 h}{\partial x_j\partial x_i}) of the Hessian of h at x is not zero, then we say that x is a non-degenerate critical point of h. The problem is: for the function defined above on N, does it have all its critical points in the interior of N? Are these critical points all non-degenerate? Here we introduce the concept of a Morse function, which gathers all these conditions into one: a smooth function f:N\rightarrow[0,1] is a Morse function if f^{-1}(\{0\})=m,f^{-1}(\{1\})=m', and all critical points of f are in the interior of N and are non-degenerate. Note that if x\in N is a non-degenerate critical point of f, then in a neighborhood U of x, f can be written as f=f(x)-x_1^2-...-x_k^2+x^2_{k+1}+...+x^2_n in some chart \phi:U\subset N\rightarrow \mathbb{R}^n centered at x. This result is called the Morse lemma. It is easy to see that the integer k, called the index of f at x, is independent of the choice of the chart. From this lemma, we see easily that non-degenerate critical points are isolated. So if f is a Morse function, all its critical points are isolated, and since N is compact, f has only finitely many critical points.
And we call the number of critical points the Morse number of f. The fundamental result concerning the Morse function is perhaps the following result:

For any cobordism (N,m,m';F,F'), there is a Morse function on N.
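Before turning to the proof, the notions of critical point, non-degeneracy and index can be illustrated on a toy example. The sketch below is only a local picture on \mathbb{R}^2 and not a function on a cobordism: the sample function f(x,y)=x^3-3x+y^2 and the helper names are our own choices. Its critical points are (1,0) and (-1,0), and the sign pattern of the Hessian gives the index, that is, the number of minus signs in the Morse lemma.

```python
def grad(x, y):
    """Gradient of f(x, y) = x^3 - 3x + y^2."""
    return (3 * x * x - 3, 2 * y)

def hessian(x, y):
    """Hessian matrix of f at (x, y)."""
    return ((6 * x, 0.0), (0.0, 2.0))

def classify(x, y):
    """Return the index of a non-degenerate critical point, or 'degenerate'."""
    assert grad(x, y) == (0, 0), "not a critical point"
    (a, b), (c, d) = hessian(x, y)
    det = a * d - b * c
    if det == 0:
        return "degenerate"
    if det < 0:
        return 1               # one negative and one positive eigenvalue
    return 0 if a > 0 else 2   # both eigenvalues positive, or both negative

for point in [(1, 0), (-1, 0)]:
    print(point, classify(*point))   # (1, 0) has index 0, (-1, 0) has index 1
```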

The proof of this result is a bit technical. First we will show that there exists a function on N none of whose critical points lies on the boundary of N. Then we will show that after some modification, we can obtain a function all of whose critical points are non-degenerate. For this, we first work in the Euclidean case, and then use a finite cover of N to modify the function step by step until we get a Morse function. For the first step, we show that

There exists a smooth function f:N\rightarrow [0,1] with f^{-1}(\{0\})=m,f^{-1}(\{1\})=m' such that all its critical points lie in N-V where V is a neighborhood of the boundary m\bigcup m'.

Note that we can find a finite open cover \{U_i\}_{1\leq i\leq k} of N by coordinate charts such that no U_i meets both m and m'. We identify each of these open sets with its image in \mathbb{R}^n. If U_i meets m, then according to the definition of the boundary of a manifold, U_i lies in the half space \mathbb{R}^n_{\geq 0}=\{x\in\mathbb{R}^n|x_n\geq 0\} and U_i\bigcap m is the intersection of U_i with the hyperplane x_n=0 of \mathbb{R}^n. So we define f_i: U_i\rightarrow [0,1], x\mapsto c_ix_n for some positive constant c_i such that f_i(U_i)\subset[0,1/2] (shrinking U_i if necessary so that x_n is bounded on it). If U_j meets m', then similarly we define f_j:U_j\rightarrow [0,1],x\mapsto 1-c_jx_n for some positive number c_j such that f_j(U_j)\subset [1/2,1]. Otherwise, we define f_l:U_l\rightarrow [0,1], x\mapsto 1/2. Then we take a partition of unity \{\phi_i\} subordinate to the cover \{U_i\}. Each f_i\phi_i extends to all of N (just extend by zero), and f=\sum_if_i\phi_i is a smooth function on N. Since 0\leq f_i(x)\leq 1 for all i, we have 0\leq \sum_if_i(x)\phi_i(x)\leq \sum_i\phi_i(x)=1, so f:N\rightarrow [0,1]. What is more, suppose that f(x)=0, and let I be the set of indices k such that U_k meets m. If x\in N-\bigcup_{k\in I}U_k, then \phi_{k}(x)=0 for all k\in I, so f(x)=\sum_{i\not\in I}f_i(x)\phi_i(x)\geq \frac{1}{2}\sum_{i\not\in I}\phi_{i}(x)=\frac{1}{2}\sum_i\phi_i(x)=1/2, a contradiction. Thus x\in \bigcup_{k\in I}U_k. If f_k(x)>0 for all k\in I with x\in U_k, then since f(x)=0 we must have \phi_k(x)=0 for all k\in I. Yet \sum_i\phi_i(x)=1, so there is a k\not\in I with \phi_k(x)>0. But this time f_k(x)\geq1/2, so again f(x)>0, another contradiction. So there must be some i\in I with x\in U_i and f_i(x)=0. In the coordinate expression this means x_n=0, that is, x\in m. Conversely, if x\in m then every U_i containing x meets m, so f_i(x)=0 for all such i and f(x)=0. So we get f^{-1}(\{0\})=m. Similarly, f^{-1}(\{1\})=m'.
Next we will show that in a neighborhood of m\bigcup m' there are no critical points of f. Suppose that x\in m', and work in a chart around x in which x_n is the coordinate pointing into the interior. Then \frac{\partial f}{\partial x_n}(x)=\sum_{i}\frac{\partial (f_i\phi_i)}{\partial x_n}(x)=\sum_i f_i(x)\frac{\partial \phi_i}{\partial x_n}(x)+\sum_{i}\phi_i(x)\frac{\partial f_i}{\partial x_n}(x). Consider the first sum. If x\in U_i, then U_i meets m' and f_i(x)=1. Otherwise \phi_i vanishes in a neighborhood of x, so \frac{\partial\phi_i}{\partial x_n}(x)=0 and we may replace f_i(x) by 1 without changing the sum. Thus the first sum reduces to \frac{\partial (\sum_i\phi_i)}{\partial x_n}(x), and since \sum_i\phi_i(y)=1 for all y\in N, it is zero. Then we consider the second sum. For each i with \phi_i(x)>0, the function f_i=1-c_ix_n equals 1 on m' and decreases towards the interior, so \frac{\partial f_i}{\partial x_n}(x)<0. Hence \frac{\partial f}{\partial x_n}(x)<0, and in the same way \frac{\partial f}{\partial x_n}(x)>0 for x\in m. This shows that the derivative of f at boundary points is not zero. By continuity df does not vanish in a neighborhood V of m\bigcup m', which concludes the proof of this result.

Next we need the following result,

If f is a C^2 mapping from an open set U\subset\mathbb{R}^n to \mathbb{R}, then for almost all linear mappings L:\mathbb{R}^n\rightarrow\mathbb{R} (‘almost all’ means all outside a set of measure zero in Hom_{\mathbb{R}}(\mathbb{R}^n,\mathbb{R})=\mathbb{R}^n), f+L has only non-degenerate critical points.

This result is due to Morse. Every time the term ‘almost all’ appears in differential topology, we should think of Sard’s theorem. So perhaps we should construct a mapping whose target is U_n=Hom_{\mathbb{R}}(\mathbb{R}^n,\mathbb{R}) and whose critical points correspond to the degenerate critical points of the functions f+L. The following proof is an ingenious idea. We consider M=\{(x,L)\in U\times U_n|d(f+L)(x)=0\}, and the projection p:M\rightarrow U_n,(x,L)\mapsto L. Since L is linear, dL=L, so (x,L)\in M if and only if L=-df(x). Hence M is the graph of x\mapsto -df(x), and we have a natural bijection i:U\rightarrow M,x\mapsto(x,-df(x)). Thus M is a sub-manifold, i is a diffeomorphism (of class C^1), and p:M\rightarrow U_n is the same thing as q=p\circ i:U\rightarrow U_n,x\mapsto -df(x). A point x\in U is a critical point of q if and only if d^2f(x), as a square matrix, is not of full rank, that is, has zero determinant. Since d^2(f+L)=d^2f, this says exactly that x is a degenerate critical point of f+L for L=q(x). Now Sard’s theorem says that the set of critical values of q has measure zero in the target. In other words, the set of L\in U_n such that L=q(x)=-df(x) for some x\in U with det(d^2f(x))=0 has measure zero. For every L outside this set, f+L has only non-degenerate critical points, which proves the above result.
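The simplest instance of this statement is in dimension one. The sketch below (our own toy example, with hypothetical helper names) takes f(x)=x^3, which has a degenerate critical point at 0, and the linear maps L(x)=ax. The critical points of f+L solve 3x^2+a=0 and the second derivative is 6x, so the only bad parameter is a=0, a set of measure zero in Hom(\mathbb{R},\mathbb{R})=\mathbb{R}.

```python
import math

def critical_points(a):
    """Critical points of x -> x^3 + a*x, i.e. real solutions of 3x^2 + a = 0."""
    if a > 0:
        return []
    if a == 0:
        return [0.0]
    r = math.sqrt(-a / 3)
    return [-r, r]

def only_nondegenerate(a):
    """True if the second derivative 6x is non-zero at every critical point."""
    return all(6 * x != 0 for x in critical_points(a))

for a in [-1.0, -0.01, 0.0, 0.01, 1.0]:
    print(a, critical_points(a), only_nondegenerate(a))
```

Here q(x)=-f'(x)=-3x^2, whose only critical point is x=0 with critical value 0, in agreement with the proof above.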

The next result says that we can define a C^2 topology, and in this topology every Morse function has a neighborhood consisting of Morse functions. That is to say, the set of Morse functions on N is open in this topology. It is easy to define a C^0 topology. Yet for C^1, C^2 and so on, it is not so easy, since we are no longer in a Euclidean space. The solution is to pull the functions on N back locally to Euclidean spaces. So we choose a finite cover \{U_i\}_{1\leq i\leq k} of N by coordinate charts together with compact sets K_i\subset U_i which still cover N. Note that the space C^2(N,\mathbb{R}) is at least a vector space. So, to make it into a topological vector space, we just need to define the neighborhoods of the origin. For any positive number \delta>0, we set N(\delta)=\{g\in C^2(N,\mathbb{R})| \forall i=1,2,...,k,\forall x\in K_i, |g_{U_i}(x)|<\delta,|\frac{\partial g_{U_i}}{\partial x_j}(x)|<\delta, |\frac{\partial^2 g_{U_i}}{\partial x_j\partial x_l}(x)|<\delta, \forall j,l=1,...,n\}, where g_{U_i} denotes the expression of g in the chart U_i. We can use these N(\delta) as a base of neighborhoods of the origin and thus generate a topology on C^2(N,\mathbb{R}). We call this topology the C^2 topology (or Whitney topology) on C^2(N,\mathbb{R}). Note that in the above definition we have chosen a particular open covering of N. In fact, this topology is independent of this choice: by the chain rule, if the derivatives of g up to order two are small on a compact set in one coordinate system, then they are also small in another coordinate system, up to a constant depending only on the change of coordinates. Now we are ready to state our next result:

Suppose U\subset\mathbb{R}^n is an open set, and K\subset U is compact. If f:U\rightarrow \mathbb{R} is C^2 and has only non-degenerate critical points in K, then there exists a positive number \delta>0 such that any C^2 function g:U\rightarrow\mathbb{R} with g\in N(f,\delta) (in other words, all derivatives of g-f of order at most two are smaller than \delta in absolute value on K) has only non-degenerate critical points in K.

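This is not hard to show. We have to show that the gradient (\frac{\partial g}{\partial x_j})_j and the Hessian determinant det(\frac{\partial^2 g}{\partial x_j\partial x_l}) cannot vanish at the same point of K. This holds for f, so the continuous function |df|+|det(d^2f)| is strictly positive on K, and since K is compact it is bounded from below by a positive constant. The same quantity for g depends continuously on the derivatives of g up to order two, so it stays positive on K once \delta is small enough.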

Now let’s try to prove the theorem:

For any cobordism (N,m,m';F,F'), there is a Morse function f: N\rightarrow[0,1].

Note that we have shown that there exists a function f:N\rightarrow[0,1] with f^{-1}(\{0\})=m, f^{-1}(\{1\})=m' which has no critical points in a neighborhood of the boundary. We can choose an open neighborhood U\supset m\bigcup m' such that f has no critical points on a neighborhood U' of the compact set \overline{U}. Then, in the C^2 topology, there is a neighborhood V of f such that none of its elements has a critical point on U'. Now f may have degenerate critical points on N-U. We want to eliminate these points by modifying f a little at the level of local coordinates. Suppose that N-U has a finite open cover \{U_i\}_{1\leq i\leq k} such that each U_i is identified with an open set in \mathbb{R}^n, has compact closure, and does not meet m\bigcup m'. Now look at the first open set U_1. As the above result says, for almost all linear functionals L:\mathbb{R}^n\rightarrow\mathbb{R}, the map f+L:U_1\rightarrow\mathbb{R} has no degenerate critical points. Clearly we can choose such an L with coefficients as small as we like, to guarantee that f+L is not too far from f. So now we have constructed a function f+L on U_1 which has no degenerate critical points. But the problem is: how to extend f+L to the other U_i? We can extend by zero with the help of a cut-off function: we construct a smooth function h_1:N\rightarrow[0,1] such that h_1=1 on a neighborhood of \overline{U_1} and h_1=0 outside a larger compact neighborhood (in other words, there are U_1\subset\overline{U_1}\subset V_1\subset\overline{V_1}\subset W_1 with compact closures, W_1 contained in the same coordinate chart and disjoint from m\bigcup m', and h_1|_{V_1}=1,h_1|_{N-W_1}=0), and we set f_1=f+h_1L. Note that on U_1 we still have f_1=f+L, so f_1 has no degenerate critical points on U_1. Choosing L still smaller, so that f_1 lies in the neighborhood V of f, f_1 has no critical points on U' and no degenerate critical points on U_1. Now we continue this process with U_2, which may well intersect U_1.
Then we can choose a linear functional L and a cut-off function h_2:N\rightarrow[0,1], constructed just as h_1, such that f_1+h_2L has no degenerate critical points on U_2 and no critical points on U'. We have to show that f_1+h_2L has no degenerate critical points on U_1 either. This is not hard to show: we can choose L small enough that the derivatives of f_1+h_2L up to order two are still close to those of f_1, and then the openness result above shows that no degenerate critical point appears (strictly speaking one should work with compact sets here, as Milnor does, so that this result applies). So in this way we have constructed f_2=f_1+h_2L which has no degenerate critical points on U_1\bigcup U_2 and no critical points on U'. This process can be continued, and finally we get a function f_k which has no degenerate critical points on \bigcup_i U_i\supset N-U and no critical points on U'. So we have proven that this f_k has only non-degenerate critical points and has no critical points in a neighborhood of m\bigcup m'. Note that in each step above the modification takes place away from the boundary, so f_i|_{m\bigcup m'}=f|_{m\bigcup m'}. What is more, 0<f(x)<1 for all x\in N-U, and since N-U is compact, f is bounded away from 0 and 1 there, so we can choose L so small that 0<f_1<1 still holds wherever f_1 differs from f. We can do the same in the other steps, so that all the f_i share this property. Thus f_k^{-1}(\{0\})=m and f_k^{-1}(\{1\})=m'. So this f_k is just the Morse function that we are seeking. In fact, we see that f_k can be taken as close to f as we like. So we can say that the set of Morse functions on (N,m,m';F,F') is a dense subset of C'=\{f\in C^2(N,[0,1])|f^{-1}(\{0\})=m, f^{-1}(\{1\})=m'\}. What is more, the set of Morse functions is open in C', since non-degeneracy of the critical points and the absence of critical points near the boundary are both preserved under small perturbations in the C^2 topology.

In fact, we can say something more. We can modify f such that the new f has no two critical points with the same value. Indeed, suppose that x\neq x'\in N are critical points with f(x)=f(x'). Using the Morse lemma, we see that in a neighborhood of x there is only one point (namely x itself) where df=0 (recall that near x we have f(y)=f(x)-y_1^2-...-y^2_k+y^2_{k+1}+...+y^2_n). So we can choose two neighborhoods x\in V\subset U of x, with \overline{V}\subset U, such that x is the only critical point of f in \overline{U}. Then K=\overline{U}-V is compact and |df(y)|=\sum_{i=1,...,n}|\frac{\partial f}{\partial y_i}| is bounded from below by some \delta>0 on K. Now choose a smooth function h:N\rightarrow[0,1] such that h|_V=1 and h|_{N-U}=0. We can choose \lambda>0 such that \lambda |dh|<\delta/2 on N and \lambda is smaller than all the non-zero differences between critical values of f. Then we set g=f+\lambda h. On V and on N-\overline{U} we have dg=df, while on K we have |dg|=|df+\lambda dh|\geq|df|-\lambda|dh|>\delta/2>0. So g has exactly the same critical points as f, and they are still non-degenerate. Moreover g(x)=f(x)+\lambda, while g(y)=f(y) for every other critical point y, so by the choice of \lambda the value g(x) differs from g(x') and from all the other critical values. All in all, we don’t create any new critical points and at the same time we separate the values of two critical points. Repeating this finitely many times gives a Morse function whose critical points all have distinct values.

Now suppose that f is a Morse function on (N,m,m';F,F'). Take any c\in(0,1) such that f^{-1}(c) contains no critical points (there are plenty of such c, since there are only finitely many critical points). Then f^{-1}(c) is a sub-manifold of N of codimension one, by the implicit function theorem, since f is regular along it. So f^{-1}([0,c]) and f^{-1}([c,1]) are both manifolds with boundary, with boundaries m\bigcup f^{-1}(c) and f^{-1}(c)\bigcup m' respectively, and (after rescaling) f restricts to a Morse function on each of them. The sum of their Morse numbers is equal to that of f. If moreover the critical points of f have distinct values, as we may assume by the previous paragraph, we can choose such values c successively between consecutive critical values, so that each piece in the final decomposition has Morse number 1. This is the last result of this post:

Any cobordism can be decomposed into cobordisms with Morse number 1.

So, in this post, we have shown that there are in fact many Morse functions: they form an open dense subset of C' in the C^2 topology.

Voevodsky’s fascinating interview

In this interview, Voevodsky talks about his past work, his present work and some of the reasons why he changed his research direction around 2009. It seems that having nobody to talk to is not always a good thing, even in mathematical research (there were only ten people or so working in the same area as Voevodsky at the time, and he felt ‘lonely’ and frustrated at having to explain things to non-experts).

episodic thoughts

The October issue of Gazette des Mathématiciens has a transcript in French of a really fascinating interview of Vladimir Voevodsky, as part of a dossier on Théorie des types et mathématiques certifiées.

The hour long video of that interview, in English and conducted by Gaël Octavia from Fondation Sciences Mathématiques de Paris (see also a blog set up for the ICM for context) is the following on Vimeo (and a must-see) :


algebraic number theory-part 1

The most important arithmetic property of the integer ring \mathbb{Z} is perhaps that it is a principal ideal domain, and thus a unique factorization domain: every non-zero non-unit element has a decomposition into a product of irreducible elements, and this decomposition is unique up to permutation and multiplication by units. Yet this property is rather rare. Consider the ring of algebraic integers \mathfrak{O}_K of an algebraic number field K. In general it is not a unique factorization domain. For example, there are only finitely many imaginary quadratic integer rings that are unique factorization domains (a conjecture of Gauss, now proved; we have mentioned this result in a previous post). The easiest counterexample is \mathbb{Z}[\sqrt{-5}]=A_{-20}=\mathfrak{O}_{\mathbb{Q}[\sqrt{-5}]}. We clearly have 2\times 3=(1+\sqrt{-5})\times (1-\sqrt{-5}). The norm N(a+b\sqrt{-5})=a^2+5b^2 is multiplicative, and N(2)=4,N(3)=9,N(1+\sqrt{-5})=N(1-\sqrt{-5})=6. Since no element has norm 2 or 3, these four numbers are all irreducible. Yet, comparing their norms, neither 2 nor 3 is a unit multiple of 1\pm\sqrt{-5}, so these are two genuinely different decompositions of 6.
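The norm argument is easy to check by machine. In the following Python sketch (ours, with hypothetical helper names) an element a+b\sqrt{-5} is represented by the pair (a, b). The sketch verifies that both products equal 6, that no element has norm 2 or 3 (so the four factors are irreducible), and that the only units are \pm1.

```python
from math import isqrt

def norm(a, b):
    """Norm of a + b*sqrt(-5)."""
    return a * a + 5 * b * b

def mul(u, v):
    """Product of two elements of Z[sqrt(-5)] given as pairs (a, b)."""
    (a, b), (c, d) = u, v
    return (a * c - 5 * b * d, a * d + b * c)

def elements_of_norm(n):
    """All (a, b) with a^2 + 5 b^2 = n (a finite search, since |a|, |b| <= sqrt(n))."""
    bound = isqrt(n)
    return [(a, b)
            for a in range(-bound, bound + 1)
            for b in range(-bound, bound + 1)
            if norm(a, b) == n]

print(mul((2, 0), (3, 0)), mul((1, 1), (1, -1)))   # (6, 0) (6, 0)
print(elements_of_norm(2), elements_of_norm(3))    # [] []
print(elements_of_norm(1))                         # [(-1, 0), (1, 0)]
```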

How to rescue this situation? One ingenious idea is to consider ideals instead of elements: we consider the decomposition of ideals into maximal ideals (note that irreducible elements correspond to p-maximal ideals, where a p-maximal ideal is an ideal that is maximal among the proper principal ideals), instead of the decomposition of an element into irreducible elements.

Before going on, let’s reexamine the definition of a unique factorization domain. One common definition is that, just as above, each non-zero non-unit element can be decomposed into a product of irreducible elements, and this decomposition is unique up to permutation of these irreducible elements and multiplication by units. Sometimes this uniqueness is hard to verify. In fact, irreducible elements are suitable for the existence of a decomposition, but not for its uniqueness. Recall the definition: an element r (non-zero and not a unit) is irreducible in a ring R if for any expression r=st (s,t\in R), s is a unit or t is a unit. So this definition only says that r itself cannot be decomposed further; it says nothing about how r behaves inside the decompositions of other elements. But look at prime elements. An element r\in R (non-zero and not a unit) is prime if whenever r|st (s,t\in R), we have r|s or r|t. This definition does deal with decompositions of other elements. So there is an equivalent definition of a unique factorization domain: a domain is a unique factorization domain if and only if all its non-zero non-unit elements can be decomposed into a product of prime elements. This time we don’t require the uniqueness of the decomposition: it is already guaranteed by the property of prime elements (from these two equivalent definitions we can see that a domain in which every non-zero non-unit element is a product of irreducible elements, for example a Noetherian domain, is a unique factorization domain if and only if all its irreducible elements are prime). So from now on, we will always talk about the decomposition into prime elements or prime ideals, according to the situation.

So sometimes irreducible elements and prime elements are hard to distinguish, and so are maximal ideals and prime ideals. Yet the good news is that in a ring of algebraic integers, all non-zero prime ideals are maximal, just as in the case of \mathbb{Z}. This involves another concept, the Dedekind domain. A domain R is a Dedekind domain if it is Noetherian, all its non-zero prime ideals are maximal, and it is integrally closed in its fraction field. The simplest example is the integer ring \mathbb{Z}. Other examples are the principal ideal domains. There is a result concerning the relation between unique factorization domains (UFD) and Dedekind domains: a Dedekind domain is a UFD if and only if it is a principal ideal domain (PID). Or, equivalently, a UFD is Dedekind if and only if it is a PID. This means that any UFD that is not a PID is not Dedekind. An easy example is the polynomial ring R=k[x,y] over a field k: the ideal Rx generated by x is a prime ideal, yet it is not maximal (Rx\subsetneq (x,y)). One important result from commutative algebra is the following: if R is a Dedekind domain, K is its fraction field and L is a finite extension of K, then the integral closure R' of R in L is again Dedekind. This last proposition gives us directly that all rings of algebraic integers are Dedekind. Dedekind domains are very suitable for talking about prime decompositions of ideals. One reason is that they are Noetherian and their non-zero prime ideals are maximal. Before going on, we have to introduce another concept: fractional ideals.

For any domain R with fraction field K, a subset I\subset K is called a fractional ideal if there is a non-zero element r\in R such that rI\subset R is a non-zero ideal of R. If R is Noetherian, we have an equivalent definition: I is a fractional ideal if it is a non-zero R-submodule of K of finite type. From now on, we will restrict ourselves to the case R=\mathfrak{O}_K for some algebraic number field K. We denote the set of fractional ideals by C(R). There is an operation on this set, multiplication, just as we did in this post. Now we want to define the inverse of a fractional ideal. Suppose that I is a fractional ideal; we define I'=\{k\in K|kI\subset R\}. Note that this is indeed a fractional ideal: I' is clearly an R-submodule of K; it is non-zero, since it contains any r\neq 0 with rI\subset R; and if e is a non-zero element of I, then eI'\subset R, so eI' is a non-zero ideal of R. Since R is Dedekind, one can show that II'=R, so I' is the inverse of I. So we have given C(R) a group structure, an Abelian group structure more precisely, with R as the identity element. As we said in the post, this group is often too large. So we take the quotient by the subgroup of principal fractional ideals. That is, we have a natural map f:K^{\times}\rightarrow C(R),k\mapsto Rk=(k). Then the co-kernel of this map, cok(f)=C(R)/im(f), is just the set Cl(R) we defined in that post (of course, there we only considered ideals, not fractional ideals, so not all elements had inverses and it was not obvious that we get a group). We denote this quotient again by Cl(R); this time it is a group.
Using the concept of fractional ideals, we can state the theorem of prime decomposition of fractional ideals:

If I is a fractional ideal of R, then there is a unique expression I=\prod_{P}P^{e_P} where the product is over all the non-zero prime ideals of R, e_P\in\mathbb{Z} for all P, and only finitely many of the e_P are non-zero.

This is really a fascinating theorem, since it points out that the law of prime decomposition works for Dedekind domains, in particular for rings of algebraic integers. So, in some sense, we don’t lose much. We have noticed that C(R) is often too large, and that we have to consider Cl(R). It is easy to see that \#Cl(R)=1 if and only if R is a PID (since R is Dedekind, this is also equivalent to R being a UFD). We call Cl(R) the ideal class group of R. So this group measures the difference between R and a principal ideal domain.

To measure the difference between R and \mathbb{Z}, we have to consider another thing, the units of the ring. They form the unit group, denoted by R^{\times}. Clearly, this group contains at least the two elements \pm1. Another way to view this unit group is to reconsider the morphism f:K^{\times}\rightarrow C(R),k\mapsto Rk. We see easily that ker(f)=R^{\times}. Concerning the unit group, there is a result which describes explicitly the structure of R^{\times}. Recall that we denote by \Sigma(K) the set of field morphisms from K to \mathbb{C}, by \Sigma_1(K)=\{\sigma\in\Sigma(K)|\sigma=\overline{\sigma}\} the set of real embeddings, and by \Sigma_2(K) a set of representatives of the morphisms that are not equal to their conjugates (just pick one from each pair of conjugate embeddings). Moreover we set r_1=\#\Sigma_1(K),r_2=\#\Sigma_2(K). Then the theorem says

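We define r=r_1+r_2-1, and G=\{k\in R^{\times}| |\sigma(k)|=1 for all \sigma\in\Sigma(K)\}, which is the finite cyclic group of roots of unity in K. Then we have R^{\times}\cong\mathbb{Z}^{r}\bigoplus G.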

As for the ideal class group, we have also

Cl(R) is a finite Abelian group. We call \#Cl(R) the class number of R.

Perhaps some examples will help illustrate these two results.
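Here is a hedged numerical sketch for quadratic fields. For \mathbb{Q}(\sqrt{2}) we have r_1=2, r_2=0, so r=1, and 1+\sqrt{2} is a unit of infinite order: its powers a+b\sqrt{2} all have norm a^2-2b^2=\pm1. For \mathbb{Q}(\sqrt{-5}) we have r_1=0, r_2=1, so r=0 and the unit group is finite. The class number computation below relies on a classical fact that is not proved in this post and is taken here as an assumption: for a negative fundamental discriminant D, the class number equals the number of reduced primitive binary quadratic forms (a,b,c) with b^2-4ac=D. All function names are ours.

```python
from math import gcd

def unit_rank(r1, r2):
    """Rank r = r1 + r2 - 1 of the unit group."""
    return r1 + r2 - 1

def power_of_fundamental_unit(n):
    """Return (a, b) with (1 + sqrt(2))^n = a + b*sqrt(2)."""
    a, b = 1, 0
    for _ in range(n):
        a, b = a + 2 * b, a + b      # multiply by 1 + sqrt(2)
    return a, b

def class_number(D):
    """Count reduced primitive forms (a, b, c) of discriminant D < 0."""
    assert D < 0 and D % 4 in (0, 1)
    count = 0
    a = 1
    while 3 * a * a <= -D:                   # reduced forms have 3a^2 <= |D|
        for b in range(-a + 1, a + 1):       # -a < b <= a
            if (b * b - D) % (4 * a) != 0:
                continue
            c = (b * b - D) // (4 * a)
            if c < a or (a == c and b < 0):  # need a <= c, and b >= 0 if a == c
                continue
            if gcd(gcd(a, abs(b)), c) == 1:
                count += 1
        a += 1
    return count

print(unit_rank(2, 0), unit_rank(0, 1))          # 1 0
print([a * a - 2 * b * b for a, b in map(power_of_fundamental_unit, range(1, 6))])
print(class_number(-4), class_number(-20))       # 1 2
```

The value 2 for D=-20 is consistent with the failure of unique factorization in \mathbb{Z}[\sqrt{-5}], while D=-4 corresponds to the Gaussian integers, which form a PID.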

Dirichlet L-functions

As we know, there are three important properties of the Riemann \zeta-function: the infinite product expression, the analytic continuation and the functional equation.

One natural generalization of the Riemann \zeta-function is the Dirichlet L-function. And indeed these L-functions share the three properties mentioned above.

First let’s introduce Dirichlet characters. Given a positive integer N, consider the group homomorphisms from the multiplicative group (\mathbb{Z}/N\mathbb{Z})^{\times} to the multiplicative group \mathbb{C}^{\times}. We extend such a map to all of \mathbb{Z}/N\mathbb{Z} by setting its value at the missing points to be 0. We call such a map a Dirichlet character modulo N (in this post we also say ‘of conductor N’). If n is a divisor of N, then we have a natural map from (\mathbb{Z}/N\mathbb{Z})^{\times} to (\mathbb{Z}/n\mathbb{Z})^{\times}, namely reduction modulo n. So every character of conductor n can be extended to a character of conductor N by composing with this map. But the converse doesn’t hold: in general there are characters \chi of conductor N which are not the extension of any character of conductor n for a proper divisor n of N. We call such a character a primitive character of \mathbb{Z}/N\mathbb{Z}. Every character modulo N is the extension of exactly one primitive character modulo some divisor n of N, so if \psi(n) denotes the number of primitive characters modulo n, we get the counting formula \varphi(N)=\sum_{n|N}\psi(n), in the spirit of the Euler formula N=\sum_{n|N}\varphi(n); one deduces that primitive characters modulo N exist exactly when N\not\equiv 2 (mod 4). These facts can be deduced from the structure theorem of finite abelian groups, and we omit the proof.
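The number of primitive characters can be computed recursively. The sketch below assumes the standard fact (not proved in this post) that every character modulo N is the extension of exactly one primitive character modulo a divisor of N, so that the number of characters modulo N, which is Euler's \varphi(N), is the sum of the numbers of primitive characters over the divisors of N. The function names `phi` and `primitive_count` are ours.

```python
from functools import lru_cache
from math import gcd

def phi(n):
    """Euler's totient: the order of (Z/nZ)^x, hence the number of characters mod n."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)

@lru_cache(maxsize=None)
def primitive_count(N):
    """Number of primitive characters mod N, from phi(N) = sum over n | N of primitive_count(n)."""
    proper_divisors = [n for n in range(1, N) if N % n == 0]
    return phi(N) - sum(primitive_count(n) for n in proper_divisors)

print({N: primitive_count(N) for N in range(1, 13)})
# Moduli without any primitive character, up to 40:
print([N for N in range(1, 41) if primitive_count(N) == 0])
```

In this range the moduli without primitive characters are exactly those with N\equiv 2 (mod 4), so some care is needed with the existence of primitive characters for a given N.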

Now we define the Gauss sum of a character \chi by \Gamma(\chi,m)=\sum_{n\in\mathbb{Z}/N\mathbb{Z}}\chi(n)e^{2\pi imn/N}.

Note we can make an analogy between the group (\mathbb{Z}/N\mathbb{Z})^{\times} and the group \mathbb{R}_{>0}. Recall the \Gamma-function, \Gamma(s)=\int_{0}^{\infty}e^{-t}t^sdt/t. Note that \mathbb{R}_{>0}\rightarrow\mathbb{C}^{\times},t\mapsto t^s is a character of the multiplicative group \mathbb{R}_{>0}. It is easy to show that all the continuous characters of \mathbb{R}_{>0} are of the form t\mapsto t^s for some complex number s. This is just like \chi on (\mathbb{Z}/N\mathbb{Z})^{\times}. Moreover, the exponential function is a character of the additive group \mathbb{R}, just like n\mapsto e^{2\pi imn/N} on \mathbb{Z}/N\mathbb{Z}. So, in some sense, we can also define \Gamma(s,m)=\int_0^{\infty}t^se^{-mt}dt/t for m>0. The change of variables t\mapsto t/m shows that \Gamma(s,m)=m^{-s}\Gamma(s,1). We will show that there is a similar relation for the \Gamma-functions on \mathbb{Z}/N\mathbb{Z} when \chi is primitive:

\Gamma(\chi,m)=\overline{\chi(m)}\Gamma(\chi,1)

There are two cases. The first is gcd(m,N)=1. Since n\mapsto mn is then a bijection of \mathbb{Z}/N\mathbb{Z}, we have that \Gamma(\chi,m)=\sum_n\chi(mn)\chi(m)^{-1}e^{2\pi imn/N}=\chi(m)^{-1}\sum_n\chi(n)e^{2\pi in/N} =\chi(m)^{-1}\Gamma(\chi,1)=\overline{\chi(m)}\Gamma(\chi,1). So, the first case is easily done. As for the second case, we suppose that gcd(m,N)=d>1, m=dm',N=dN'. We can find c such that c\equiv1\ (\text{mod }N'), gcd(c,N)=1 and \chi(c)\neq1 (if not, then for all c with c\equiv1\ (\text{mod }N'), gcd(c,N)=1, we have that \chi(c)=1. This means that \chi can be defined on \mathbb{Z}/N'\mathbb{Z}; in other words, \chi is an extension of some character of \mathbb{Z}/N'\mathbb{Z}, which contradicts the fact that \chi is primitive). Then note that, since mn/N=m'n/N', \Gamma(\chi,m)=\sum_{r\ (\text{mod }N')}\sum_{n\equiv r\ (N')}\chi(n)e^{2\pi inm'/N'}=\sum_{r\ (N')}(\sum_{n\equiv r\ (N')}\chi(n))e^{2\pi irm'/N'}, where the inner sums run over the n\in\mathbb{Z}/N\mathbb{Z} with n\equiv r modulo N'. Since multiplication by c permutes these n, \sum_{n\equiv r\ (N')}\chi(n)=\sum_{n\equiv r\ (N')}\chi(nc)\chi(c)^{-1}=\chi(c)^{-1}\sum_{nc\equiv r\ (N')}\chi(nc)=\chi(c)^{-1}\sum_{n\equiv r\ (N')}\chi(n). Since \chi(c)\neq1, this sum is zero, and so \Gamma(\chi,m)=0=\overline{\chi(m)}\Gamma(\chi,1). So, the result is proved.
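The identity can be checked numerically. The following is a minimal sketch, assuming the concrete choice N=9 and the character sending the generator 2 of (\mathbb{Z}/9\mathbb{Z})^{\times} to e^{2\pi i/6}; this character is primitive because it is non-trivial on 4\equiv1\ (\text{mod }3). The function names are hypothetical. The check covers both cases of the proof, including m=0,3,6 where both sides vanish.

```python
import cmath

N = 9
GENERATOR = 2  # 2 generates (Z/9Z)^x, which is cyclic of order 6

def build_character(n_mod, gen, order):
    """Character sending gen^k to exp(2*pi*i*k/order); zero off the unit group."""
    chi = [0j] * n_mod
    power = 1
    for k in range(order):
        chi[power] = cmath.exp(2j * cmath.pi * k / order)
        power = (power * gen) % n_mod
    return chi

def gauss_sum(chi, m):
    """Gamma(chi, m) = sum over n mod N of chi(n) * exp(2*pi*i*m*n/N)."""
    n_mod = len(chi)
    return sum(chi[n] * cmath.exp(2j * cmath.pi * m * n / n_mod)
               for n in range(n_mod))

chi = build_character(N, GENERATOR, 6)
g1 = gauss_sum(chi, 1)

# Largest deviation from Gamma(chi, m) = conj(chi(m)) * Gamma(chi, 1) over all m mod N.
max_error = max(abs(gauss_sum(chi, m) - chi[m].conjugate() * g1)
                for m in range(N))
```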

Now we can define the Dirichlet L-function for the primitive character \chi, L(s,\chi)=\sum_{n\geq1}\chi(n)n^{-s}. Using the multiplicativity of \chi, we have that L(s,\chi)=\prod_{p,\text{prime}}(1-\frac{\chi(p)}{p^s})^{-1} for Re(s)>1. As for \chi(-1), there are two possibilities, \chi(-1)=1 or \chi(-1)=-1. We define \epsilon(\chi)=0 if \chi(-1)=1 and \epsilon(\chi)=1 if \chi(-1)=-1. And we define the completed Dirichlet L-function to be \hat{L}(s,\chi)=N^{s/2}\pi^{-(s+\epsilon(\chi))/2}\Gamma(\frac{s+\epsilon(\chi)}{2})L(s,\chi). Then we have the corresponding result for this L-function:

L(s,\chi) can be extended to the whole complex plane for any non-trivial character on \mathbb{Z}/N\mathbb{Z}. And if \chi is a primitive character, then we have the functional equation \hat{L}(s,\chi)=\frac{\Gamma(\chi,1)}{i^{\epsilon(\chi)}\sqrt{N}}\hat{L}(1-s,\overline{\chi}).

Note that we have used the same letter \Gamma to refer to both the \Gamma-function on \mathbb{R}_{>0} and those on \mathbb{Z}/N\mathbb{Z}. When the variables of the function are (s,m), we refer to the first kind, and when the variables are (\chi,m), we refer to the second. Not too confusing.

First of all, let’s look at what \hat{L}(s,\chi) is (we first consider the case \chi(-1)=1). We have \hat{L}(s,\chi)=(\frac{N}{\pi})^{s/2}\sum_{n>0}\int_0^{\infty} e^{-t}t^{s/2}\frac{\chi(n)}{n^s}dt/t=\sum_{n>0}\chi(n)\int_0^{\infty} e^{-t}(\frac{Nt}{\pi n^2})^{s/2}dt/t =\int_0^{\infty} t^{s/2}dt/t \sum_{n>0}\chi(n)e^{-t\pi n^2/N}=\int_0^{\infty} t^{s/2}f(t,\chi)dt/t, where we have changed variables t\mapsto t\pi n^2/N and where we denote f(t,\chi)=\sum_{n>0}\chi(n)e^{-t\pi n^2/N}=\frac{1}{2}\sum_{n\in\mathbb{Z}}\chi(n)e^{-t\pi n^2/N} (for the last identity we need the condition \chi(-1)=1). The form of f(t,\chi) reminds us of the Poisson summation formula. If there were no factor \chi(n), we could apply this formula directly. Recall the formula \Gamma(\chi,m)=\overline{\chi}(m)\Gamma(\chi,1). In other words, \Gamma(\overline{\chi},m)=\chi(m)\Gamma(\overline{\chi},1). Thus we have that \Gamma(\overline{\chi},1)f(t,\chi)=1/2\sum_n e^{-t\pi n^2/N}(\Gamma(\overline{\chi},1)\chi(n))=1/2\sum_ne^{-t\pi n^2/N}\Gamma(\overline{\chi},n) =1/2\sum_{k(N)}\overline{\chi}(k)\sum_ne^{-t\pi n^2/N+2\pi ikn/N}. If we define F(x)=e^{-t\pi x^2/N+2\pi ikx/N} (which depends on k), then we get that \Gamma(\overline{\chi},1)f(t,\chi)=1/2\sum_{k(N)}\overline{\chi}(k)\sum_nF(n). So, now we can apply the Poisson summation formula to F(x), and we get \sum_nF(n)=\sum_n\tilde{F}(n)=\sqrt{N/t}\sum_ne^{-\pi(k-nN)^2/Nt}. So, we have \Gamma(\overline{\chi},1)f(t,\chi)=1/2\sqrt{N/t}\sum_{k(N)}\overline{\chi}(k)\sum_{n}e^{-\pi(k-nN)^2/Nt}. Note that \overline{\chi}(k)=\overline{\chi}(k-nN) for any character. Moreover, k-nN (k=0,1,...,N-1;n\in\mathbb{Z}) runs through the whole of \mathbb{Z} once and only once, so the last term becomes 1/2\sqrt{N/t}\sum_{m\in\mathbb{Z}}\overline{\chi}(m)e^{-\pi m^2/Nt}. As a result, we get that \Gamma(\overline{\chi},1)f(t,\chi)=\sqrt{N/t}f(1/t,\overline{\chi}). Or, equivalently, \Gamma(\overline{\chi},1)f(1/t,\chi)=\sqrt{Nt}f(t,\overline{\chi}). So the next step is to express \hat{L}(s,\chi) in a more symmetric form. 
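The transformation law \Gamma(\overline{\chi},1)f(t,\chi)=\sqrt{N/t}f(1/t,\overline{\chi}) can be tested numerically. The sketch below assumes the specific example N=5 with \chi the Legendre symbol modulo 5, which is real (so \overline{\chi}=\chi), even and primitive; the series for f is truncated, and the function names are hypothetical.

```python
import cmath
import math

N = 5
CHI = [0, 1, -1, -1, 1]  # Legendre symbol mod 5: chi(-1) = chi(4) = 1

def gauss_sum(chi, m):
    """Gamma(chi, m) = sum over n mod N of chi(n) * exp(2*pi*i*m*n/N)."""
    n_mod = len(chi)
    return sum(chi[n] * cmath.exp(2j * cmath.pi * m * n / n_mod)
               for n in range(n_mod))

def f(t, chi, terms=200):
    """f(t, chi) = sum_{n>0} chi(n) * exp(-t*pi*n^2/N), truncated after `terms` terms."""
    n_mod = len(chi)
    return sum(chi[n % n_mod] * math.exp(-t * math.pi * n * n / n_mod)
               for n in range(1, terms + 1))

def theta_defect(t):
    """|Gamma(chi_bar,1) f(t,chi) - sqrt(N/t) f(1/t,chi_bar)|, with chi_bar = chi here."""
    lhs = gauss_sum(CHI, 1) * f(t, CHI)
    rhs = math.sqrt(N / t) * f(1.0 / t, CHI)
    return abs(lhs - rhs)

defects = [theta_defect(t) for t in (0.5, 1.0, 2.0, 3.7)]
```

All the defects are at the level of floating-point rounding, as the formula predicts.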
For the \theta function one splits the integral into two parts; here we first try something formal. In fact, \Gamma(\overline{\chi},1)\hat{L}(s,\chi)=\int t^{s/2}dt/t f(t,\chi)\Gamma(\overline{\chi},1)=\int t^{s/2}dt/t \sqrt{N/t}f(1/t,\overline{\chi}). After the change of variables t\mapsto 1/t, we get that \Gamma(\overline{\chi},1)\hat{L}(s,\chi)=\sqrt{N}\int t^{(1-s)/2}f(t,\overline{\chi})dt/t=\sqrt{N}\hat{L}(1-s,\overline{\chi}) (we have to say that this process is purely formal: it doesn’t consider any convergence. The rigorous process should still split the integral into two parts, \hat{L}(s,\chi)=\int_0^1t^{s/2}dt/tf(t,\chi)+\int_1^{\infty}t^{s/2}dt/tf(t,\chi)=\int_1^{\infty}dt/t(f(1/t,\chi)t^{-s/2}+f(t,\chi)t^{s/2}), and then use the transformation formula for f to show the symmetry of \hat{L}(s,\chi)). So we have really finished our proof except for the last identity

\Gamma(\overline{\chi},1)\Gamma(\chi,1)=\chi(-1)N.

The proof is a bit tricky. We note that \overline{\chi}(n)\Gamma(\chi,1)=\Gamma(\chi,n), \chi(n)\Gamma(\overline{\chi},1)=\Gamma(\overline{\chi},n). So, we have |\chi(n)|^2\Gamma(\overline{\chi},1)\Gamma(\chi,1)=\Gamma(\overline{\chi},n)\Gamma(\chi,n). Summing over n\in\mathbb{Z}/N\mathbb{Z}, and using that |\chi(n)|^2=1 for the \phi(N) invertible n and 0 otherwise, we get that \phi(N)\Gamma(\overline{\chi},1)\Gamma(\chi,1)=\sum_{a,b}\overline{\chi}(a)\chi(b)\sum_ne^{2\pi i(a+b)n/N} (where \phi is the Euler function). Note that, if a+b\neq 0 in \mathbb{Z}/N\mathbb{Z}, then \sum_ne^{2\pi i(a+b)n/N}=0, and if a+b=0, we have that \sum_ne^0=N. So, at last we get that \phi(N)\Gamma(\overline{\chi},1)\Gamma(\chi,1)=\sum_aN\overline{\chi}(a)\chi(-a)=\phi(N)N\chi(-1). That is, \Gamma(\overline{\chi},1)\Gamma(\chi,1)=\chi(-1)N. Note also that \chi(-1)=1 (our first assumption); thus, plugging all this into the above equation \Gamma(\overline{\chi},1)\hat{L}(s,\chi)=\sqrt{N}\hat{L}(1-s,\overline{\chi}), we have that \hat{L}(s,\chi)=\frac{\Gamma(\chi,1)}{\sqrt{N}}\hat{L}(1-s,\overline{\chi}).
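As a sanity check of \Gamma(\overline{\chi},1)\Gamma(\chi,1)=\chi(-1)N, here is a short sketch that runs over all non-trivial characters modulo 7. The choice N=7 (prime, so every non-trivial character is primitive) and the generator 3 are assumptions made for the example, and the function names are hypothetical.

```python
import cmath

N = 7
GENERATOR = 3  # 3 generates the cyclic group (Z/7Z)^x of order 6

def character(j):
    """The character chi_j with chi_j(3^k) = exp(2*pi*i*j*k/6), and chi_j(0) = 0."""
    chi = [0j] * N
    power = 1
    for k in range(N - 1):
        chi[power] = cmath.exp(2j * cmath.pi * j * k / (N - 1))
        power = (power * GENERATOR) % N
    return chi

def gauss_sum(chi, m=1):
    """Gamma(chi, m) = sum over n mod N of chi(n) * exp(2*pi*i*m*n/N)."""
    return sum(chi[n] * cmath.exp(2j * cmath.pi * m * n / N) for n in range(N))

errors = []
for j in range(1, N - 1):  # j = 1..5: the non-trivial characters mod 7
    chi = character(j)
    chi_bar = [z.conjugate() for z in chi]
    product = gauss_sum(chi_bar) * gauss_sum(chi)
    errors.append(abs(product - chi[N - 1] * N))  # chi[N - 1] is chi(-1)
```

Both even characters (\chi(-1)=1) and odd characters (\chi(-1)=-1) occur among these five, so the sign in the identity is exercised as well.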

As for the case \chi(-1)=-1, we cannot directly write \hat{L}(s,\chi) in a symmetric form as above, since now \sum_{n\in\mathbb{Z}}\chi(n)e^{-t\pi n^2/N}=0. But let’s try anyway. We have that \hat{L}(s,\chi)=N^{s/2}\pi^{-(s+1)/2}\Gamma((s+1)/2)L(s,\chi)=\frac{1}{\sqrt{N}}\int t^{(s+1)/2}dt/t\sum_{n>0}n\chi(n)e^{-t\pi n^2/N}. Thus we denote g(t,\chi)=\sum_{n>0}n\chi(n)e^{-t\pi n^2/N}=1/2\sum_{n\in\mathbb{Z}}n\chi(n)e^{-t\pi n^2/N}. Using the same technique as above, we have that \Gamma(\overline{\chi},1)g(t,\chi)=1/2\sum_{k(N)}\overline{\chi}(k)\sum_nne^{-t\pi n^2/N+2\pi ikn/N}. So we define G(x)=xe^{-t\pi x^2/N+2\pi ikx/N} (which depends on k, of course). And applying the Poisson summation formula, we get that \sum_nG(n)=\sum_n\tilde{G}(n) =\sum_n\int_{\mathbb{R}} xe^{-t\pi x^2/N+2\pi ikx/N-2\pi inx}dx=\sum_n \frac{i\sqrt{N}}{\sqrt{t^3}}(k-nN)e^{-\pi(k-nN)^2/Nt}. Thus we get that \Gamma(\overline{\chi},1)g(t,\chi)=\frac{i\sqrt{N}}{2\sqrt{t^3}}\sum_{k(N)}\sum_n\overline{\chi}(k-nN)(k-nN)e^{-\pi(k-nN)^2/Nt} =\frac{i\sqrt{N}}{2\sqrt{t^3}}\sum_{m\in\mathbb{Z}}m\overline{\chi}(m)e^{-\pi m^2/Nt}=i\sqrt{N/t^3}g(1/t,\overline{\chi}). So, we have that \Gamma(\overline{\chi},1)\hat{L}(s,\chi)=\frac{1}{\sqrt{N}}\int t^{(s+1)/2}\Gamma(\overline{\chi},1)g(t,\chi)dt/t=i\int t^{(s-2)/2}g(1/t,\overline{\chi})dt/t. This time, we prove the symmetry of the L-function rigorously. 
\Gamma(\chi,1)\hat{L}(1-s,\overline{\chi})=\frac{1}{\sqrt{N}}(\int_0^1t^{(2-s)/2}\Gamma(\chi,1)g(t,\overline{\chi})dt/t+\int_1^{\infty}t^{(2-s)/2}\Gamma(\chi,1)g(t,\overline{\chi})dt/t)=\frac{1}{\sqrt{N}}\int_1^{\infty}(t^{(s-2)/2}\Gamma(\chi,1)g(1/t,\overline{\chi})+t^{(2-s)/2}\Gamma(\chi,1)g(t,\overline{\chi}))dt/t =\frac{1}{\sqrt{N}}\int_1^{\infty}(t^{(s-2)/2}i\sqrt{Nt^3}g(t,\chi)+t^{(2-s)/2}\Gamma(\chi,1)g(t,\overline{\chi}))dt/t=i\int_1^{\infty}t^{(s+1)/2}g(t,\chi)dt/t+\frac{1}{\sqrt{N}}\int_0^1t^{(s-2)/2}\Gamma(\chi,1)g(1/t,\overline{\chi})dt/t =i\int_1^{\infty}t^{(s+1)/2}g(t,\chi)dt/t+\frac{1}{\sqrt{N}}\int_0^1t^{(s-2)/2}i\sqrt{Nt^3}g(t,\chi)dt/t=i\int_0^{\infty} t^{(s+1)/2}g(t,\chi)dt/t=i\sqrt{N}\hat{L}(s,\chi), where we have used several times the identity proved above, \Gamma(\overline{\chi},1)g(t,\chi)=i\sqrt{N/t^3}g(1/t,\overline{\chi}), in the form obtained by exchanging \chi with \overline{\chi} and t with 1/t, namely \Gamma(\chi,1)g(1/t,\overline{\chi})=i\sqrt{Nt^3}g(t,\chi). Dividing by i\sqrt{N} gives \hat{L}(s,\chi)=\frac{\Gamma(\chi,1)}{i\sqrt{N}}\hat{L}(1-s,\overline{\chi}), which is the functional equation for \epsilon(\chi)=1.

In the above proofs, we have used substantially the identity

\overline{\chi}(n)\Gamma(\chi,1)=\Gamma(\chi,n) for a primitive character \chi. Note that, we really should understand this identity in this way \Gamma(\chi,n)=\chi(n)^{-1}\Gamma(\chi,1), which is a perfect analogy of the real number case, \Gamma(s,m)=m^{-s}\Gamma(s,1).

The fact that L(s,\chi) can be analytically continued to the whole complex plane for non-trivial characters is a consequence of the fact that \sum_{n\ (\text{mod }N)}\chi(n)=0 and of Abel’s summation formula. We omit the proof here.

ideal classes of an integral ring

Suppose that R is an integral domain. There are many operations that we can do on the ideals of R. For example, if I,J\subset R are two ideals, then I+J=<I,J>, the sum of I,J, is the smallest ideal containing both I and J. Also, IJ=<IJ>, the product of I,J, is the ideal generated by the products of any two elements from I and J. In this post, we mainly concern ourselves with the second operation, the product. We denote by C(R) the set of non-zero ideals of R. Why exclude the zero ideal? One reason is that it annihilates every ideal, 0I=0, which is not so interesting. We give this set the product structure defined above. There is a neutral element, R, since RI=I for any ideal I. Sometimes this set is too large. For example, even for the simplest ring, R=\mathbb{Z}, we have that C(R)=\{(n)|n\in\mathbb{Z}-0\}. This set is almost the same as the ring itself. On one hand, this is good, since it reflects all the information of R, yet the problem is that it is too large. Studying C(R) is almost the same as studying R itself, so this approach will not give us much. In fact, sometimes we are concerned only with some particular properties of R, for example, whether R is a principal ring. We have seen that this kind of ring has very nice properties; in particular, the modules over these rings behave much the same way as vector spaces over a field. So perhaps we can define some equivalence relation on C(R) to identify some elements and make it smaller, and at the same time to reflect whether R is principal or not. We wish to express the idea that a ring R is principal if and only if C(R) contains only one equivalence class of ideals. That is, all the principal ideals are equivalent. In other words, (i)\equiv(j). How to express this idea? We know that, for any i,j\in R-0, we have that xi=yj for some x,y\in R-0 (for example, we can take x=j,y=i). This is the equivalence relation we are looking for. More precisely, if I,J are two ideals, we say that I\equiv J if there exist x,y\in R-0 such that xI=yJ. 
This is indeed an equivalence relation: reflexivity, symmetry and transitivity can all be verified easily. So, we denote Cl(R)=C(R)/\equiv. Note that this equivalence relation is compatible with the product operation: if I\equiv J,I'\equiv J', then we have that II'\equiv JJ'. So, now we have a much smaller set Cl(R); at least it identifies all the principal ideals. What is more, we see that Cl(R) contains only one element if and only if R is principal. This is what we were looking for.

In general, Cl(R) can still be very large. Yet for some important rings, we can show that Cl(R) is finite. This is the case when R is a subring of some algebraic integer ring, of finite type as a module over \mathbb{Z}, that is, R\subset \mathfrak{O}_K where K is an algebraic number field. This is the following result:

For an algebraic integer ring R, Cl(R) is finite.

The proof of this important result relies on the following proposition:

For R as above, there is a positive integer c, such that for any non-zero ideal I\subset R, there is an element r\in I with |I/rR|\leq c.

Indeed, for any I\in C(R), we choose r\in I such that N=|I/rR|\leq c. Since N kills the finite group I/rR, we must have that NI\subset rR. Moreover, since r\in I, we have NrR\subset NI, so that NrR\subset NI\subset rR. Dividing by r, the ideal J=NI/r satisfies NR\subset J\subset R and J\equiv I (because rJ=NI). Since N\leq c, N divides c!, so c!R\subset J\subset R. This means that every non-zero ideal I is equivalent to an ideal sandwiched between c!R and R, where c! does not depend on I. Note that, since R is a \mathbb{Z}-module of finite type, R/c!R is finite. So, this quotient has only finitely many ideals. Since J/c!R is an ideal of R/c!R and J is determined by J/c!R, there are only finitely many possibilities for J, and thus there are only finitely many equivalence classes in Cl(R).

So all things are reduced to proving this proposition. First of all, let’s have a close look at these ideals. We can first consider the case R=\mathfrak{O}_K for some algebraic number field K. For example, R=\mathbb{Z}[i], the ring of Gaussian integers. Note that R is in fact a lattice in \mathbb{C}=\mathbb{R}^2. It is also the case for R=\mathbb{Z}[\omega] (\omega^3=1,\omega\neq1). Yet it is not the case for R=\mathbb{Z}[\sqrt{2}], which is dense in \mathbb{R}. Can we still give it a lattice structure? Wait a minute, why do we consider a lattice structure on R? This is a good question. We take the case R=\mathbb{Z}[i] to illustrate its use. As a ring, R is generated by i. As a lattice, it is generated by 1=(1,0),i=(0,1). Then for any non-zero ideal I\subset R, and any non-zero element r\in I, r,ri are still linearly independent over \mathbb{R}. So, I is a sub-lattice of R. Then intuitively, we have that covol(I)/covol(R)=|R/I|=N(I). For any element r\in R, we define the norm N(r) of r to be the square of the modulus of r as a complex number, N(r)=|r|^2. To prove the proposition for R, for each I, we have to find a non-zero element x\in I such that |I/xR| is bounded by a constant c independent of I. Note that |I/xR|=covol(xR)/covol(I). So we want covol(xR)\leq c\,covol(I). Can we express covol(xR) in terms of the norm of x and the covolume of R? Note that covol(xR)/covol(R)=|R/xR|, and we guess that |R/xR|=N(x). So, in this way, the condition becomes |x|^2=N(x)\leq c\,covol(I)/covol(R). So here we should consider a ball B_t=\{r\in \mathbb{R}^2=\mathbb{C}||r|\leq t\}. Recall Minkowski’s convex body theorem (cf. the post on the geometry of numbers): if for some t we have that vol(B_t)\geq 2^2covol(I), then there must be some element r\in I-0 such that r\in B_t, or equivalently, |r|\leq t. For the best bound, we consider the t' such that vol(B_{t'})=4covol(I), that is, \pi {t'}^2=4covol(I). 
So, for this t', there is a non-zero element r in I such that |r|^2\leq t'^2=\frac{4}{\pi}covol(I). Since N(r)=|r|^2, we find an element r\in I-0 such that N(r)\leq \frac{4}{\pi}covol(I). Comparing with the above guess, we can take c/covol(R)=\frac{4}{\pi}. Recall that our original goal is to show that |I/rR| is bounded. Now |I/rR|=covol(rR)/covol(I)=N(r)covol(R)/covol(I)\leq \frac{4}{\pi}covol(R). So we have shown that there exists r\in I-0 with |I/rR|\leq \frac{4}{\pi}covol(R), a bound independent of I. Note that in this process there seems to be some coincidence: N(x) is the square of |x|, while the volume of the ball B_t is proportional to the square of t. All this is due to the fact that the lattice is in \mathbb{R}^2, a vector space of dimension 2. This again results from the fact that K=\mathbb{Q}[i] is an extension of \mathbb{Q} of degree 2. Or more pertinently, this results from the fact that R is a \mathbb{Z}-module of rank 2.

How about the general case? We can generalize the above argument without any difficulty once we can identify R with a lattice in a Euclidean space whose dimension is the rank of R as a \mathbb{Z}-module. How to do this? An interesting problem. Note that from the above remarks we see that sometimes the direct identification doesn’t work (R=\mathbb{Z}[\sqrt{2}] is such a case). So, we have to consider other methods. Here is one. Suppose that K is the quotient field of R. So K is an algebraic number field. Then we know that the degree of the extension K over \mathbb{Q} is the same as the rank of R as a \mathbb{Z}-module, that is to say [K:\mathbb{Q}]=rank_{\mathbb{Z}}(R). Then we consider the set of field morphisms from K to \mathbb{C}, that is \Sigma(K). Note that if \sigma\in \Sigma(K), then \overline{\sigma}, defined by \overline{\sigma}(z)=\overline{\sigma(z)} (z\in K), also lies in \Sigma(K). We separate two cases: the first is that \overline{\sigma}=\sigma, the other is that the conjugate of \sigma is not equal to \sigma. So, we define \Sigma_1(K)=\{\sigma\in \Sigma(K)|\sigma=\overline{\sigma}\}. The remaining elements of \Sigma(K) come in conjugate pairs, so we can choose one element from each pair (\Sigma(K) is a finite set, so this choice poses no problem), and they form the set \Sigma_2(K). We set r_1=\#\Sigma_1(K), r_2=\#\Sigma_2(K), n=[K:\mathbb{Q}]. So we have that r_1+2r_2=n. Note that for \sigma\in \Sigma_1(K), we have that \sigma(z)\in\mathbb{R}. For convenience, we denote the elements in these two sets as \Sigma_1(K)=\{\sigma_1,...,\sigma_{r_1}\}, \Sigma_2(K)=\{\sigma_{r_1+1},...,\sigma_{r_1+r_2}\}. And we define a map f:R\rightarrow V=\mathbb{R}^{r_1}\times \mathbb{C}^{r_2}, r\mapsto(\sigma_1(r),...,\sigma_{r_1+r_2}(r)). Note that dim_{\mathbb{R}}(V)=r_1+2r_2=n. So it remains to show that f(R) is a lattice in V. This is not hard to see. We have to take a \mathbb{Z}-basis (e_1,...,e_n) of R, and then show that the ‘lattice’ generated by f(e_1),...,f(e_n) has positive co-volume. 
Note that for each r\in R, f(r) is a vector whose components are conjugates of r. We identify \mathbb{C}=\mathbb{R}\bigoplus \mathbb{R}i, and we let P=(f(e_1)^{\dagger},...,f(e_n)^{\dagger}) be the matrix of the images of the basis of R. Then we have that |det(P)|=2^{-r_2}|disc_{K/\mathbb{Q}}(e_1,...,e_n)|^{1/2}. Since the discriminant is non-zero, f(R) has positive co-volume. Then we have to show that f(R) is discrete in V. We give V the norm ||v||=\sup_{1\leq i\leq r_1+r_2}|v_i| (where v_1,...,v_{r_1}\in\mathbb{R} while v_{r_1+1},...,v_{r_1+r_2}\in\mathbb{C}). So, we have to show that for any t>0, the ball D_t=\{v\in V| ||v||\leq t\} contains only finitely many elements of f(R). But for any r\in R, note that ||f(r)||\leq t is the same as |\sigma_i(r)|\leq t (\forall i). Are there only finitely many algebraic integers in K whose conjugates are all bounded by a number t? How to attack this problem? Note that we have a very strong restriction: r is an algebraic integer. So, we can consider its minimal polynomial, which has integer coefficients and degree at most n. Its coefficients are symmetric functions of the conjugates of r, so we see immediately that the minimal polynomials of all such algebraic integers have coefficients bounded by some constant depending only on t and n. Clearly, we have only finitely many such polynomials, and thus f(R) is a lattice. Besides, covol(f(R))=|det(P)|=2^{-r_2}|disc(R)|^{1/2}. This done, the rest proceeds much the same way as above. Hence in the following we just copy the results:
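The covolume formula can be checked by hand in the two quadratic examples mentioned earlier. The sketch below is only an illustration under stated assumptions: it treats R=\mathbb{Z}[\sqrt{2}] with basis 1,\sqrt{2} (r_1=2, r_2=0, discriminant 8) and R=\mathbb{Z}[i] with basis 1,i (r_1=0, r_2=1, discriminant -4), and the function names are hypothetical.

```python
import math

def covolume_real_quadratic(d):
    """Covolume of f(Z[sqrt(d)]) in R^2 for the basis 1, sqrt(d), d > 0 not a square."""
    s = math.sqrt(d)
    # Rows of P: f(1) = (1, 1) and f(sqrt(d)) = (sqrt(d), -sqrt(d)).
    return abs(1 * (-s) - 1 * s)

def covolume_imaginary_quadratic(d):
    """Covolume of Z[sqrt(-d)] in C = R^2 for the basis 1, sqrt(-d), d > 0."""
    # Rows of P: 1 = (1, 0) and sqrt(-d) = (0, sqrt(d)).
    return abs(1 * math.sqrt(d) - 0 * 0)

def predicted(disc, r2):
    """The formula 2^(-r2) * |disc|^(1/2)."""
    return 2 ** (-r2) * math.sqrt(abs(disc))

# disc(1, sqrt(2)) = 8 and disc(1, i) = -4.
check_sqrt2 = (covolume_real_quadratic(2), predicted(8, 0))
check_gauss = (covolume_imaginary_quadratic(1), predicted(-4, 1))
```

In both cases the two numbers agree: 2\sqrt{2} for \mathbb{Z}[\sqrt{2}] and 1 for \mathbb{Z}[i].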

For any non-zero ideal I\subset R, I has the same rank as R as a \mathbb{Z}-module, and thus the quotient field of I is also K, the same as that of R. So, we can see that |R/I|=\sqrt{\frac{|disc(I)|}{|disc(R)|}}. Note that the smaller an ideal is, in some sense, the larger its discriminant. We can think of it this way: the smaller an ideal, the sparser its lattice in V, and thus the larger its co-volume. And we have seen that this implies that the discriminant of the ideal is larger. For any non-zero ideal I in R, we define N(I)=|R/I|, the norm of I. This definition is compatible with the norm N=N_{K/\mathbb{Q}} in that N(rI)=|N(r)|N(I) (r\in R-0). With this, we have the following result due to Minkowski

Suppose that I\subset R is a non-zero ideal, then there exists a non-zero element r\in I such that |N(r)|\leq C(r_2,n)N(I)|disc(R)|^{1/2} where C(r,n)=(\frac{4}{\pi})^r\frac{n!}{n^n}.

Note that with this and |N(r)|=N(rR), we have that |N(r)|/N(I)=N(rR)/N(I)=\frac{|R/rR|}{|R/I|}=|I/rR|. So, we have that |I/rR|\leq C(r_2,n)|disc(R)|^{1/2}, proving the theorem in the general case, and showing that Cl(R) is a finite set for any algebraic integer ring.
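To get a feeling for the size of the bound C(r_2,n)|disc(R)|^{1/2}, one can evaluate it for a few quadratic rings. The sketch below assumes the standard discriminants -4 for \mathbb{Z}[i], 8 for \mathbb{Z}[\sqrt{2}] and -20 for \mathbb{Z}[\sqrt{-5}] (these values are supplied for the example, not taken from the text), and the function names are hypothetical.

```python
import math

def minkowski_constant(r2, n):
    """C(r2, n) = (4/pi)^r2 * n! / n^n."""
    return (4 / math.pi) ** r2 * math.factorial(n) / n ** n

def index_bound(r2, n, disc):
    """Upper bound C(r2, n) * |disc(R)|^(1/2) for |I/rR|."""
    return minkowski_constant(r2, n) * math.sqrt(abs(disc))

bound_gauss = index_bound(1, 2, -4)      # R = Z[i]
bound_sqrt2 = index_bound(0, 2, 8)       # R = Z[sqrt(2)]
bound_sqrt_m5 = index_bound(1, 2, -20)   # R = Z[sqrt(-5)]
```

For \mathbb{Z}[i] the value is 4/\pi, the same constant found in the direct argument above. For \mathbb{Z}[i] and \mathbb{Z}[\sqrt{2}] the bound is smaller than 2, so |I/rR|=1, that is I=rR, and every ideal is principal. For \mathbb{Z}[\sqrt{-5}] the bound lies between 2 and 3, which leaves room for non-principal ideals such as (2,1+\sqrt{-5}).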

linear Schrodinger equation

This post is about the existence and uniqueness of the solution to the linear Schrodinger equation, that is to say, i\partial_{t}u+\Delta u=0, (t,x)\in\mathbb{R}\times\mathbb{R}^{d}, u(t,x)\in\mathbb{C}, with the initial condition u |_{t=0}=u_{0}. The main tool is due to Strichartz, in particular his norms. This post serves as an introduction to this method, using the linear Schrodinger equation as the example.

First of all, since this equation is linear, the first idea that comes to mind is to use the Fourier transform with respect to x. In fact, there is a group structure in the evolution process. That is what we are going to see right now.

Take the Fourier transform of the above Schrodinger equation, then we obtain that

i \partial_t\hat{u} + (i \xi )^{2} \hat{u}=0 where we denote by \hat{u} the Fourier transform of u with respect to x (and \xi^{2}=|\xi|^{2}). So, we get that \partial_{t} \hat{u} +i \xi^{2} \hat{u} =0. We solve this ordinary differential equation very easily: \hat{u}(t,\xi ) =e^{-i \xi^{2} t}\widehat{u_{0}}(\xi ). Then we take the inverse Fourier transform and obtain that u(t,x)=S_{t} \ast u_{0} where \widehat{S_{t}} =e^{-i \xi^{2} t} for all t\in\mathbb{R}; in particular \widehat{S_{0}} =1, so S_{0}=\delta_{0}, the Dirac distribution at the origin. The group structure of this evolution, S_{t+s}=S_{t}\ast S_{s}, can be seen very easily using the Fourier transform. The real problem is the regularity.

This is not very hard. Note that we have |\hat{u} (t,\xi )|=|\widehat{u_{0}}(\xi)|. Have we seen something like this before, which concerns the modulus of the Fourier transform of a function? Yes, in the Sobolev spaces. Indeed, if we suppose that u_{0} \in H^{s} ( \mathbb{R}^{d} ), then we see easily that u(t,\cdot) \in H^{s} (\mathbb{R}^{d} ) for each fixed time t, due to the above comment. Note that here we do not require that the time t be non-negative. This also guarantees that the evolution process has a group structure. Now we can have a closer look at the operator S_{t}. We have seen that it is in fact an evolution operator. This said, if there are two initial conditions u_{0} ,v_{0}, then using the inner product <u,v>=\int_{\mathbb{R}^{d}} ( 1+ |\xi |^{2} )^{s} \overline{\hat{u}(\xi)}\hat{v}(\xi)d\xi of the Hilbert space H^{s}, we have that < S_{t} u_{0},S_{t} v_{0}>=< u_{0},v_{0}>. Using the group structure, we see that the adjoint of S_{t} is just S_{t}^{\ast}=S_{-t}. So far, we have not seen any explicit formula for S_{t}. We only know that \widehat{S_{t}} =e^{-i \xi^{2} t}. We cannot calculate the inverse Fourier transform of e^{-i \xi^{2}t} directly as an integral, since it doesn’t lie in L^{1} (\mathbb{R}^{d} ). So, we must solve this problem in the sense of distributions. We can construct a sequence of Gaussian-like functions f_{n} (\xi ) =e^{(-{it} -1/n )\xi^{2}}. Each f_{n} has an inverse Fourier transform. And this sequence converges to f=e^{-i \xi^{2} t} in the topology of S' ( \mathbb{R}^{d}) (that is to say, for any g \in S ( \mathbb{R}^{d}), we have (f_{n} ,g ) \rightarrow ( f,g ) by the dominated convergence theorem). After some calculations, we get that S_{t} = \frac{1}{( 4 \pi{it})^{d/2}} e^{i \frac{x^{2}}{4t}} for t\neq0. What is more, we verify easily that as t\rightarrow 0, we have that S_{t} \rightarrow S_{0} = \delta_{0} in the topology of S' ( \mathbb{R}^{d} ). 
This, combined with the comments above about the isometry property of S_{t}, shows that the map \mathbb{R}\rightarrow H^{s} ( \mathbb{R}^{d} ) ,t \mapsto S_{t} u_{0} is continuous. This result is very natural, and in some sense it has to be so: since we are solving a differential equation, the solution u(t) must be continuous with respect to the time variable. Now we have that \hat{u} ( t, \xi ) =e^{-i\xi^{2} t} \widehat{u_{0}} ( \xi). What does this expression mean? Note that, for the initial data u_{0}, its Fourier transform has an amplitude \widehat{u_{0}} ( \xi ) at the frequency \xi. At time t, the absolute value of this amplitude hasn’t changed, yet its phase oscillates, at a rate proportional to \xi^{2}. We can interpret this oscillation as the propagation of the component at the frequency \xi of the original wave. In other words, the higher the frequency, the faster its propagation.

Yet the problem is to control this ‘dispersion’. But how? Which norms should we use? It is a very good yet hard question. Perhaps we can use the L^p norms. Which L^p? For example, for L^2, we know already that the norm doesn’t change with time, that is ||S_tu_0||_{L^2}=||u_0||_{L^2}. There are other norms which we can use, L^1 and L^{\infty}. Note that u_0\mapsto S_t\ast u_0 is in fact a convolution operator, so this reminds us of the inequality of Young, that is ||S_tu_0||_{L^{\infty}}\leq ||S_t||_{L^{\infty}}||u_0||_{L^1}. Note that ||S_t||_{L^{\infty}}=\frac{1}{(4\pi |t|)^{d/2}} for t\neq0. So, we have estimates from L^2 to L^2 and from L^1 to L^{\infty}, and now we can use the interpolation theorem on exponents (Riesz-Thorin): if 1/p=a/2+(1-a)/\infty, 1/p'=a/2+(1-a)/1 with a\in[0,1], then we have that ||S_tu_0||_{L^p}\leq 1^a(\frac{1}{(4\pi |t|)^{d/2}})^{1-a}||u_0||_{L^{p'}}. Note that 1/p'-1/p=1-a, so for p\in[2,\infty] with conjugate exponent p' we have that ||S_tu_0||_{L^p}\leq (\frac{1}{(4\pi |t|)^{d/2}})^{1/p'-1/p}||u_0||_{L^{p'}}.
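Both endpoint estimates can be observed on an explicit solution. The sketch below assumes d=1 and the Gaussian initial datum u_0(x)=e^{-x^2}, for which u(t,x)=(1+4it)^{-1/2}e^{-x^2/(1+4it)} solves i\partial_tu+\partial_x^2u=0 (this closed form is supplied for the example and is not derived in the text). The function names and the quadrature parameters are hypothetical choices.

```python
import cmath
import math

def u(t, x):
    """Explicit solution of i u_t + u_xx = 0 in d = 1 with u(0, x) = exp(-x^2)."""
    a = 1 + 4j * t
    return cmath.exp(-x * x / a) / cmath.sqrt(a)

def l2_norm(t, half_width=60.0, steps=24000):
    """L^2 norm in x of u(t, .), by the midpoint rule on [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = sum(abs(u(t, -half_width + (k + 0.5) * h)) ** 2 for k in range(steps))
    return math.sqrt(total * h)

def sup_norm(t):
    """sup over x of |u(t, x)|, which is attained at x = 0."""
    return abs(u(t, 0.0))

def dispersive_bound(t):
    """(4*pi*|t|)^(-1/2) * ||u_0||_{L^1}, using ||exp(-x^2)||_{L^1} = sqrt(pi)."""
    return math.sqrt(math.pi) / math.sqrt(4 * math.pi * abs(t))

times = (0.25, 1.0, 3.0)
l2_norms = [l2_norm(t) for t in times]   # conserved in time
decay_ok = [sup_norm(t) <= dispersive_bound(t) for t in times]
```

The L^2 norm stays equal to (\pi/2)^{1/4} while the sup norm decays like |t|^{-1/2}, in agreement with the two estimates being interpolated.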

This estimation is for every instant t. Can we say something global? That is to say, can we have some norm to measure the whole time global behavior of u(t,x)?

This is what Strichartz did. He defines a norm on functions of (t,x)\in\mathbb{R}\times\mathbb{R}^d. That is, for a smooth function f(t,x), we define ||f||_{L^q_tL^r_x}=(\int_{\mathbb{R}}||f(t, .)||_{L^r_x}^qdt)^{1/q} for q<\infty and similarly (with a supremum over t) for q=\infty. Is this norm surprising? Not really. First, we want to know the L^r information of u(t,x) for each instant t. So, we get a non-negative function of t. Then we want to control this function using some norm L^q. So, this is a very natural approach. Now perhaps some dimensional analysis is helpful. Suppose that f_{a,b}(t,x)=f(at,bx) with a,b>0; then we can verify easily that ||f_{a,b}||_{L^q_tL^r_x}=a^{-1/q}b^{-d/r}||f||_{L^q_tL^r_x}. So, if we want an inequality like ||S_tu_0||_{L^q_tL^r_x}\leq C||u_0||_{L^p}, it must also be valid for u_{0,b}(x)=u_0(bx). Note that, for the right-hand side, we have that ||u_{0,b}||_{L^p}=b^{-d/p}||u_0||_{L^p}. While for the left-hand side, we have that (S_tu_{0,b})(x)=(S_{tb^2}u_0)(bx) (this is really due to the fact that in the expression of S_t(x), in the exponent, there is x^2/t, while in the prefactor, there is 1/t^{d/2}). Thus, according to the above remark with a=b^2, we have that ||S_tu_{0,b}||_{L^q_tL^r_x}=b^{-2/q-d/r}||S_tu_0||_{L^q_tL^r_x}. So, for the inequality to hold for any reasonable u_0 and every b>0, we must have that -2/q-d/r=-d/p, or 2/q+d/r=d/p. The easiest case for L^p is of course L^2. And thus, we have the inequality of Strichartz,

If u_0\in L^2(\mathbb{R}^d), then we have that ||S_tu_0||_{L^q_tL^r_x}\leq C||u_0||_{L^2} for some constant C independent of u_0, provided that 2/q+d/r=d/2 and (q,r,d)\neq(2,\infty,2),q>2.
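The scaling condition 2/q+d/r=d/2 determines r once q and d are fixed. Here is a small sketch with exact rational arithmetic; the helper names `admissible_r` and `is_admissible` are hypothetical, and only the scaling relation and q>2 are checked.

```python
from fractions import Fraction

def admissible_r(q, d):
    """Solve 2/q + d/r = d/2 for r, given q > 2 and the dimension d.

    Returns r as a Fraction, or None when 1/r would be <= 0 (no finite positive r).
    """
    inv_r = Fraction(1, 2) - Fraction(2) / (Fraction(q) * d)
    if inv_r <= 0:
        return None
    return 1 / inv_r

def is_admissible(q, r, d):
    """Check the scaling condition 2/q + d/r = d/2 together with q > 2."""
    lhs = Fraction(2) / Fraction(q) + Fraction(d) / Fraction(r)
    return q > 2 and lhs == Fraction(d, 2)

r_d1_q6 = admissible_r(6, 1)   # d = 1, q = 6
r_d2_q4 = admissible_r(4, 2)   # d = 2, q = 4
r_d3_q4 = admissible_r(4, 3)   # d = 3, q = 4
```

For instance, in dimension 1 the pair (q,r)=(6,6) satisfies the condition, and in dimension 2 so does (4,4).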

Note that this result says that the linear operator T:H=L^2(\mathbb{R}^d)\rightarrow B=L^q(\mathbb{R},L^r(\mathbb{R}^d)), u_0\mapsto(t\mapsto S_tu_0), is continuous, where it is easy to verify that H is a Hilbert space and B is a Banach space with the norms defined above. This important observation suggests using some operator theory. To show that T is bounded, we can consider its adjoint, T^*. That is to say, T^*:B'\rightarrow H'=\overline{H}. There is a general result from operator theory,

Suppose that T:H\rightarrow B is a linear operator from a Hilbert space (its scalar product satisfies <u,av>=a<u,v> (a\in\mathbb{C}), and the scalar product induced on H' satisfies <u',av'>=\overline{a}<u',v'> (u',v'\in H'=\overline{H})) to a Banach space (the action of a linear functional on the elements is written (f,b) (f\in B',b\in B)), with adjoint T^*:B'\rightarrow H' defined by <T^*(b'),h'>=(b',\overline{T(h')}). Then we have that ||TT^*||_{L(B',B)}=||T||_{L(H,B)}^2=||T^*||_{L(B',H')}^2.

This is not hard to prove. We just have to utilize the expression like ||T(h)||=\sup_{||b'||\leq1}|(b',T(h))| and things like that. We will not prove this result here. One remark is that the composition T^*T doesn’t make sense in general.

Now we want to see what T^*,TT^* are in our context. It is not hard to see that B'=L^{q'}(\mathbb{R},L^{r'}(\mathbb{R}^d)). On one hand, <T^*(b'),h'>_{H'}=\int_{\mathbb{R}^d}T^*(b')\overline{h'}dx. On the other hand, <T^*(b'),h'>_{H'}=(b',\overline{T(h')})_{B',B}=\int_{\mathbb{R}\times\mathbb{R}^d}b'(t,x)\overline{(S_th')(x)}dxdt=\int_{\mathbb{R}}<S_th',b'(t,.)>_Hdt. But we have seen above that <S_th',b'(t,.)>_H=<h',S_{-t}b'(t,.)>_H. So, we get that <T^*(b'),h'>_{H'}=\int_{\mathbb{R}} <h',S_{-t}b'(t,.)>_Hdt=<h',\int_{\mathbb{R}}S_{-t}b'(t,.)dt>_H, thus we see that T^*(b')=\int_{\mathbb{R}} S_{-t}b'(t,.)dt. Thus, TT^*(b')(t)=S_t(\int S_{-s}b'(s,.)ds)=\int S_{t-s}b'(s,.)ds, just a convolution in time. So to prove the continuity of T, we have to show the continuity of T^* or TT^*. But for T^*, perhaps the difficulty is the same as for T. So, we first consider TT^*, seemingly the most complex one. For f\in B', we have that TT^*(f)(t)=\int S_{t-s}f(s,.)ds. So, we first estimate ||TT^*(f)(t)||_{L^r_x}\leq\int_{\mathbb{R}}||S_{t-s}f(s,.)||_{L^r_x}ds. Note that for almost all s, f(s,.)\in L^{r'}(\mathbb{R}^d), so using the dispersive estimate above, we get that ||S_{t-s}f(s,.)||_{L^r}\leq c(\frac{1}{|t-s|^{d/2}})^{1/r'-1/r}||f(s,.)||_{L^{r'}}. Now we get another convolution, ||TT^*(f)(t)||_{L^r_x}\leq c\int_{\mathbb{R}}(\frac{1}{|t-s|^{d/2}})^{1/r'-1/r}||f(s,.)||_{L^{r'}_x}ds=c\int_{\mathbb{R}}(\frac{1}{|t-s|^{d/2}})^{1/r'-1/r}h(s)ds, where h(s)=||f(s,.)||_{L^{r'}_x}. Note that \frac{d}{2}(1/r'-1/r)=\frac{d}{2}(1-2/r)=d/2-d/r=2/q, so all in all, we have that ||TT^*(f)(t)||_{L^r_x}\leq c\int \frac{1}{|t-s|^{2/q}}h(s)ds=c(\frac{1}{|s|^{2/q}}\ast h(s))(t). Now comes the last thing, ||TT^*(f)||_B\leq c||\frac{1}{|t|^{2/q}}\ast h(t)||_{L^q_t}.

Now we have a convolution which involves a special function 1/|t|^{2/q}. This function doesn’t lie in any L^p. Perhaps we can use the inequality of Hardy-Littlewood-Sobolev,

If a\in(0,n), (p,w)\in(1,\infty)^2 such that 1/p+a/n=1+1/w then for all f\in L^p(\mathbb{R}^n), we have that ||\frac{1}{|x|^a}\ast f(x)||_{L^w(\mathbb{R}^n)}\leq C||f||_{L^p(\mathbb{R}^n)} for some constant  C independent of f.

Now apply this theorem to our context; here n=1,a=2/q, w=q,p=q' (indeed h(t)=||f(t,.)||_{L^{r'}_x} lies in L^{q'}(\mathbb{R}), since f\in B'). Besides, we have that 1/p+a/n=1/q'+2/q=1+1/q=1+1/w. What a relief, the condition is satisfied, and we conclude that ||TT^*(f)||_B\leq cC||h||_{L^{q'}_t}=cC||f||_{B'}. That is to say, the operator TT^*:B'\rightarrow B is continuous, and so is T:H\rightarrow B. And we proved the result. Note that in the above argument, we use implicitly that q\neq\infty (a condition also in the Hardy-Littlewood-Sobolev inequality). For the case q=\infty, we must have that r=2, thus ||TT^*(f)||_{L^{\infty}_tL^2_x}=\sup_t||TT^*(f)(t,.)||_{L^2}=\sup_t||\int S_{t-s}f(s,.)ds||_{L^2}\leq \sup_t\int ||S_{t-s}f(s,.)||_{L^2}ds=\int ||f(s,.)||_{L^2}ds=||f||_{L^1_tL^2_x}, where we have used the isometry property of S_t for the L^2 norm. So, this case is also proved. And thus the whole result is proved. What is more, re-examining the condition in the Hardy-Littlewood-Sobolev inequality, we have to ensure that a=2/q\in (0,n)=(0,1). This means that q>2. So, the above result is really valid for the case q>2, just as the condition in the theorem says.
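The bookkeeping of exponents in this application of the Hardy-Littlewood-Sobolev inequality is easy to get wrong, so here is a tiny sketch that checks it with exact fractions. The function names are hypothetical; the choices n=1, a=2/q, w=q, p=q' are the ones made above.

```python
from fractions import Fraction

def hls_exponents(q):
    """Exponents for the time convolution: n = 1, a = 2/q, w = q, p = q'."""
    q = Fraction(q)
    p = q / (q - 1)  # conjugate exponent q', defined by 1/q + 1/q' = 1
    a = 2 / q
    return {"n": 1, "a": a, "p": p, "w": q}

def hls_condition_holds(q):
    """Check a in (0, n), p and w in (1, infinity), and 1/p + a/n = 1 + 1/w."""
    e = hls_exponents(q)
    in_range = 0 < e["a"] < e["n"] and e["p"] > 1 and e["w"] > 1
    return in_range and 1 / e["p"] + e["a"] / e["n"] == 1 + 1 / e["w"]
```

For every finite q>2 the relation 1/p+a/n=1+1/w holds automatically, while q=2 is rejected because then a=1 falls outside (0,n).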

Now we want to generalize the above analysis for the case where the equation is not homogeneous, that is:

i\partial_tu+\Delta u=f, with f\in C(\mathbb{R},S(\mathbb{R}^d)) and initial condition u|_{t=0}=u_0\in S(\mathbb{R}^d).

First we can write the explicit formula for this solution: u(t)=S_tu_0-i\int_0^tS_{t-s}f(s)ds. A little digression here. We see that S_t serves as an integral kernel, which means that it takes into account the effects of stimulation. For example, if f=0, then at time t the effect of the stimulation u_0 is S_tu_0. If at the start we add some stimulation f(0), then its effect at time t becomes S_tf(0), and if at time t_1 we add another stimulation f(t_1), then this stimulation will have an effect S_{t-t_1}f(t_1) at time t. So the whole effect of all the stimulations f(s) (0\leq s\leq t) will be \int_0^t S_{t-s}f(s)ds. Then why don't we take a similar integral for the stimulations u(s) (0\leq s\leq t)? The point is that, when f=0, S_{t-s}u(s)=S_{t-s}S_su_0=S_tu_0 for all s, so the effects of these u(s) are all the same; that is why it is enough to consider only the starting point. Now we return to our work. Our result for the inhomogeneous case is:

Suppose that (q_2,r_2) satisfies 2/q_2+d/r_2=d/2 with q_2,r_2\in[2,\infty] and q_2>2. Then the solution u to the equation i\partial_tu+\Delta u=f, with f\in L^{q_2'}(\mathbb{R},L^{r_2'}(\mathbb{R}^d)) and initial condition u|_{t=0}=0, satisfies u\in L^{q_1}(\mathbb{R},L^{r_1}(\mathbb{R}^d)) for any (q_1,r_1) satisfying the same conditions as (q_2,r_2). What is more, for each such pair, there is a constant C independent of f such that ||u||_{L^{q_1}_tL^{r_1}_x}\leq C||f||_{L^{q_2'}_tL^{r_2'}_x}.

The main idea of the proof is as follows. We first show that ||u||_{L^{q_2}_tL^{r_2}_x}\leq c||f||_{L^{q_2'}_tL^{r_2'}_x} for some constant c, using the same argument as above, and then that ||u||_{L^{\infty}_tL^2_x}\leq c'||f||_{L^{q_2'}_tL^{r_2'}_x} for another constant c', using the continuity of the operator T^*:B'\rightarrow H'. Then we can apply the interpolation theorems to obtain that, for any q_1\in [q_2,\infty], ||u||_{L^{q_1}_tL^{r_1}_x}\leq c''||f||_{L^{q_2'}_tL^{r_2'}_x} for some constant c'' depending on q_1. As for the case q_1\in(2,q_2), we will prove it later.

So, in this way, we have shown a global property of the solutions to the linear Schrödinger equation.
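To make the explicit formula u(t)=S_tu_0-i\int_0^tS_{t-s}f(s)ds a little more concrete, here is a minimal numerical sketch on a single Fourier mode. It assumes the convention in which S_t acts on the mode of frequency \xi as multiplication by e^{-it|\xi|^2}, so that the equation reduces to the ODE iu'-ku=f with k=|\xi|^2. The frequency, the initial datum, the forcing and all function names are hypothetical choices made only for illustration.

```python
import cmath

def simpson(g, a, b, n=2000):
    """Composite Simpson rule for a complex-valued g on [a, b]; n must be even."""
    h = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3

def duhamel_mode(t, k, u0, f):
    """u(t) = S_t u0 - i * int_0^t S_{t-s} f(s) ds for one Fourier mode.

    Assumption: S_t is multiplication by exp(-i t k), with k = |xi|^2.
    """
    free = cmath.exp(-1j * t * k) * u0
    forced = simpson(lambda s: cmath.exp(-1j * (t - s) * k) * f(s), 0.0, t)
    return free - 1j * forced

def residual(t, k, u0, f, h=1e-4):
    """|i u'(t) - k u(t) - f(t)|, with u' approximated by a central difference."""
    du = (duhamel_mode(t + h, k, u0, f) - duhamel_mode(t - h, k, u0, f)) / (2 * h)
    return abs(1j * du - k * duhamel_mode(t, k, u0, f) - f(t))

k, u0 = 2.0, 1.0 + 0.5j          # hypothetical frequency k = |xi|^2 and initial datum
f = lambda s: cmath.exp(3j * s)  # hypothetical forcing term

print(abs(duhamel_mode(0.0, k, u0, f) - u0))  # the initial condition is recovered
print(residual(0.7, k, u0, f))                # should be small (discretization error only)
```

The residual measures how well the formula satisfies the mode-by-mode equation; it is limited only by the quadrature and the finite-difference step, not by the formula itself.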