Published on December 21, 2010.
Google Books is controversial for several reasons; in this ambitious corporate attempt to digitize as many books as possible, copyright and monopoly issues may be only the most vexing. These and other issues are contentious even though, or especially because, casual reader and scholarly researcher alike already enjoy the benefits of digitized books directly.
Many of the books are available only in Preview format, but even the limits of this format can be liberating: some books offer a few pages (so we can read Fr. Miguel Bernad on “The Nature of Rizal’s Farewell Poem”); others several dozen, perhaps even a couple of hundred (such is the case, for instance, with the massive and minutely detailed Indonesian-English dictionary by Alan M. Stevens and A. Ed.
Last week, Google announced one more benefit, and it is staggering. In short, “computational analysis” of the millions of books already digitized is now possible. In the first version of Google’s N-gram viewer (a 1-gram is a “string of characters uninterrupted by a space,” and an n-gram is a sequence of n such strings), we can “search” over 500 billion words contained in over 5 million books (to be exact: 5,195,769), “containing ~4 percent of all books ever published.” What does this all mean? It means we can “observe cultural trends and subject them to quantitative investigation.” (All the quotes are from the paper published last week in Science magazine, co-written by 13 authors and “The Google Books Team.”)
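For readers who want to see what an n-gram looks like in practice, here is a minimal Python sketch. It assumes the simplest possible tokenization, splitting on whitespace only; the function name `ngrams` and the sample sentence are my own illustrations, and Google's actual processing of punctuation and scanned text is certainly more involved than this.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive whitespace-separated tokens.

    Assumption: a token is any string of characters uninterrupted by a
    space; punctuation handling is deliberately ignored in this sketch.
    """
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative sentence, not taken from any digitized book.
sample = "Jose Rizal wrote novels and Jose Rizal wrote poems"

one_gram_counts = Counter(ngrams(sample, 1))
two_gram_counts = Counter(ngrams(sample, 2))

print(one_gram_counts["Rizal"])       # occurrences of the 1-gram "Rizal"
print(two_gram_counts["Jose Rizal"])  # occurrences of the 2-gram "Jose Rizal"
```

A name like “Jose Rizal” is therefore a 2-gram, which is why a search for it counts the two words together rather than each word on its own.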
That sounds forbidding, but in fact the viewer was designed to be fun, informative and ridiculously easy to use.
For instance: Searching the 361 billion words available in the English corpus (the largest of seven language corpora, and the most reliable) for mentions of Jose Rizal, Andres Bonifacio and Emilio Aguinaldo in the 110 years between 1896 and 2006 yields a visual picture of the ebb and flow of historical reputation, if by reputation we include the sense of contemporary relevance.
(In the graphic above, Rizal is represented by the line in dark shade, Bonifacio by the line in medium shade, Aguinaldo by the line in light shade.)
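How might lines like these be drawn from raw counts? A rough Python sketch follows. It rests on an assumption of mine, not on anything stated in this column: that each year's mentions of a name are divided by the total number of words printed that year, so that years with many books do not automatically look like years of great fame. The function name and all the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_frequency(mentions_by_year, totals_by_year):
    """Divide each year's mention count by that year's total word count.

    Assumption: this per-year normalization approximates what a trend
    line shows; years with no recorded words are skipped.
    """
    return {
        year: mentions_by_year.get(year, 0) / total
        for year, total in sorted(totals_by_year.items())
        if total > 0
    }


# Made-up numbers for illustration only; not real corpus data.
totals = {1900: 1000, 1901: 2000}
mentions = {1900: 2, 1901: 1}

freq = relative_frequency(mentions, totals)
for year, share in freq.items():
    print(year, share)
```

In this toy case the name appears half as often in raw terms in the second year, but its share of all words falls to a quarter, which is the kind of difference a normalized graph is meant to expose.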
We can see that Aguinaldo was much more famous and written about than Rizal during the last years of the 19th century and the first decade of the 20th century. His role as a revolutionary general and first president of the first Asian republic, which was reported in the international news columns of the newspapers at the time, is recorded in the books published in those very years. Note the first two peaks in his reputation: the first occurred at the height of the Philippine-American War, and the second (for reasons I can only guess at) immediately before and during the first Philippine Assembly elections of 1907.
Rizal’s reputation reached its peak in the early 1960s; in 1961, the international community (not just the Philippine nation) celebrated the centenary of his birth. (The peak that followed soon after may have been propelled by the government’s publication of the now-standard compilation of Rizal’s writings.)
We can see other milestones in Rizal studies reflected in the graph: for instance, the controversy over the law requiring the reading of Rizal’s novels in the early 1950s is caught in the web of data, while the fire lit by Benedict Anderson’s “Imagined Communities” (one of the most influential books of the late 20th century) seems to have sparked an enormous amount of renewed interest in Rizal after its publication in 1983.
Bonifacio’s graph shows the extraordinary influence of the compelling revisionist views (considerably in error, in my layman’s view) of Teodoro Agoncillo. In 1956, his “Revolt of the Masses,” the still-standard reference on Bonifacio and the Katipunan, was published. I think the graph shows the immediate and impressive impact of Agoncillo’s bracing history.
The centennial commemorations of 1996 (the revolution, Rizal’s execution) to 1998 (the proclamation of independence) register in the graphs only as a modest and sustained rise, a modesty that should invite further reflection, of both the serious and the fun variety.