Top > Prospective Students > Focus on Mathematics > Yutaka Ishii, Associate Professor

Focus on Mathematics

インターネットを支える数学 石井豊

I’m sure all of you have had some experience in searching the Internet using keywords. At present, the search engine having the largest share in the world is Google. The main reason as to why Google has been so successful lies in the accuracy of its search results in which websites of high importance are returned with near-certainty. The index expressing this importance of a website is called PageRank—a website with a higher PageRank will appear higher in search results and be noticed by more people.Did you know that the algorithm for calculating PageRank makes very effective use of mathematical theory? Here, I would like to give you a brief explanation of the kind of mathematics supporting Google technology.

To begin with, just what is the Internet? Roughly speaking, it refers to individual websites and all the links interconnecting between them. Here, if we regard one website as one vertex and indicate the link relationships among vertices by arrows, we can represent the world of the Internet as a finite number of vertices (though that number would actually be on the order of several ten billion!) and as the arrows between those vertices. In mathematics, this type of object is called a directed graph, which we can use as a mathematical model of the Internet.

With the above in mind, just how is PageRank, which has a great effect on the order of search results, calculated?

Well, it’s calculated according to the principle that a website with
1)links from more websites and
2)links from more websites with higher PageRank values and
3)links from more websites selected as link destinations
will have a higher PageRank. The idea here is that calculating the importance of a certain website does not simply consider the number of links to that website but also the quality of those links. That is to say, links from websites that garner much support are of higher quality and selected links are more reliable than simply a large number of links. However, we can ask whether this calculation principle would face the dilemma of “which came first: the chicken or the egg?” For example, given two websites i and j linked to each other, calculating the PageRank of i would require knowledge of the PageRank of j, and conversely, calculating the PageRank of j would require knowledge of the PageRank of i. In such a case, can PageRank be reliably determined?

(1) Link relationships among websites

(2) Directed graph representing website relationships

(3) Matrix representing website relationships

In actuality, this calculation principle can be expressed as H

H r = r

where H is a supergiant matrix (whose number of both rows and columns corresponds to the number of websites) determined only by the structure of the Internet as a directed graph and r is a column vector whose elements consist of the PageRank of each website (*1).This relationship can also be described in the following way using terms from “linear algebra” that a first-year university student would learn. In general, when a scalar λ and a non-zero column vector x satisfy the relation Ax =λx for a square matrix A, λ is called an eigenvalue of A and x is called an eigenvector with respect to the eigenvalue λ. The above relation H r = r can therefore be interpreted as “PageRank can be calculated by finding an eigenvector r with respect to the eigenvalue 1 of H.”

However, can we say for sure that eigenvector r satisfying the above expression actually exists? If it doesn’t, then it would follow that PageRank itself doesn’t exist. Next, even if such a vector should happen to exist, would it be the only one (up to constant multiples) to satisfy the above expression? If not, it would be necessary to allocate multiple PageRank values to a single website. Additionally, even if we were able to determine in an abstract manner that only one vector r exists, would there be a method that could efficiently calculate that vector? The number of rows and columns in matrix H can be on the order of several tens of billions, so failure to find an efficient calculation method could result in an astronomically long time to perform that calculation. Fortunately, it’s mathematical theory that provides us with answers to these questions. Specifically, the existence and the uniqueness of the eigenvector r is guaranteed by the Perron-Frobenius theorem while the ability of calculating eigenvector r with good accuracy by a technique called the power method is guaranteed by the convergence theorem of Markov chains. In short, Google’s PageRank concept and its calculation method are first and foremostjustified by these mathematical theorems.

In the above way, Google has grown into a megacorporation doing more than 80 billion dollars in annual sales based on technology that skillfully combines several(andelementary) mathematical facts.Similar to its contribution to PageRank, mathematics has made substantial contributions to a number of convenient technologies that we have come to use on a routine basis in modern society.To put it another way, studying mathematics at a university means that you will be given the opportunity to discover a new technology comparable to PageRank (*2).

*1 Specifically, element (i, j) of H is the value obtained by dividing the number of links from website j to website i by the total number of links emanating from j. In reality, however, it is necessary to slightly modify matrix H to satisfy the property of “primitiveness.”

*2 Of course, “discovering a new technology” is not the only purpose of studying mathematics.