Monday, December 27, 2010

"Quants" for the Humanities

I read in the news that robots are reading the news. They comb through billions of words to detect trends and moods in markets and then trigger automatic trades on Wall Street.

When I was in junior high in the early 1970s, my private school decided that we were all going to spend a few weeks learning to "speed read." Speed reading went out of style long ago and any benefits that I received have faded away, although I and many others find the technique of "skimming" text quite useful. The new technologies that traders are using to skim the news take that concept to the google power.

Wall Street technologies are doing very much the same thing that scholars are doing in the digital humanities. Counting words has been an important part of scholarship. Milman Parry revolutionized classical scholarship in the early 20th century by meticulously counting patterns and variations of words and syllables in Homer's Iliad and Odyssey. (See The Making of Homeric Verse.) Now we can skim millions of books and analyze patterns of usage over time in an entire culture.

A colleague and I are now thinking about how this might be applied to architectural elements of culture. Many digital imaging applications, e.g. Google's Picasa, have built-in face recognition. In Picasa, if you mouse over the image of someone's face, a frame appears around the face asking whether you would like to "tag" this person. The same software might be redesigned to recognize columns, friezes, gargoyles, etc.

Friday, December 17, 2010

Carefully Chosen Words in the Age of Digital Humanities

As a classicist, I read Greek and Latin very slowly. I don't write scholarly pieces any more, but when I did, I wrote very slowly with carefully chosen words and then edited my piece interminably. Very different from a blog.

Good ideas will always take a long time to work out. But does anyone value these lovingly turned phrases, these intricate, elegant concepts in a world of Twitter, Facebook, WikiLeaks and blogs? Apparently the answer is yes, because The Atlantic, a high-brow magazine for intellectuals, has begun to turn a profit, according to The New York Times. And that profit is coming from an unlikely source: online advertising.

Advertisers have decided that painstaking research, fact-checked articles and elaborate prose will indeed "attract eyeballs" online. The magazine still gets revenue from print ads and subscriptions but the move to online readership has saved the day.

At the same time, a field called "Digital Humanities" has begun to take off thanks to Google's obsessive scanning of all printed materials in the universe. See this article in The Atlantic. Google estimates that it has so far scanned about five percent of all literature. Scholars then find patterns in word usage over time and geography to map cultural changes. It is like shredding all the books in the world, grinding them into fine powder, suspending them in a chemical solution, running them through a magical centrifuge, waiting for them to precipitate into a crystal pattern and then thinking about the pattern that appears on the spectrometer.

This kind of spectral analysis of literature requires the same degree of careful, critical thinking as the hand-crafted essay, and it ends up being published in magazines like Science, which caters primarily to scholars and scientists. So while the digital humanities takes a very different approach to analyzing the words, it takes the same approach when it comes to synthesizing them.
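For readers who like to see the gears turning, the counting step behind this sort of analysis can be sketched in a few lines of Python. This is only a toy illustration and not Google's actual method: the function name relative_frequency_by_year and the three-sentence corpus are invented for the example, and a real study would work from millions of scanned books rather than a handful of made-up sentences.

```python
import re
from collections import Counter


def relative_frequency_by_year(corpus, word):
    """Return {year: share of that year's words that equal `word`}.

    `corpus` is a list of (year, text) pairs. Both the function and
    the data format are hypothetical, chosen only for illustration.
    """
    totals = Counter()  # total number of words seen per year
    hits = Counter()    # occurrences of the target word per year
    target = word.lower()
    for year, text in corpus:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no words at all to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# A made-up miniature "library" standing in for the scanned books.
toy_corpus = [
    (1900, "the telegraph brought the news"),
    (1900, "the news came by telegraph"),
    (2000, "the web brought the news"),
]

result = relative_frequency_by_year(toy_corpus, "telegraph")
print(result)
```

Dividing by the total number of words in each year matters because far more books survive from some years than from others, so raw counts alone would mostly measure how much was printed. The interpretation of the resulting curve is still left to the scholar.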

The Atlantic online is putting out very fine prose by extremely well-read authorities in their fields. All the thoughts are hand-made, as it were, and result in 500 to 1,000 words of beautiful sentences. Meanwhile, Google's Ngram Viewer is sifting 500,000,000 words as though they were DNA molecules, completely disregarding the authority of the sources. Scholars interpret the results and produce carefully crafted thoughts and prose. All that carefully crafted prose will, someday, be fed back into the word grinder of Google or whatever replaces Google and be reduced to dust again.