Monday, July 03, 2006

Data Mining for Imbeciles 

We can't believe this happened:
The Air Force Office of Scientific Research recently began funding a new research area that includes a study of blogs. Blog research may provide information analysts and warfighters with invaluable help in fighting the war on terrorism.

Dr. Brian E. Ulicny, senior scientist, and Dr. Mieczyslaw M. Kokar, president, Versatile Information Systems Inc., Framingham, Mass., will receive approximately $450,000 in funding for the 3-year project entitled “Automated Ontologically-Based Link Analysis of International Web Logs for the Timely Discovery of Relevant and Credible Information.”

“It can be challenging for information analysts to tell what’s important in blogs unless you analyze patterns,” Ulicny said.

Patterns include the content of the blogs as well as what hyperlinks are contained within the blog.

Within blogs, hyperlinks act like reference citations in research papers thereby allowing someone to discover the most important events bloggers are writing about in just the same way that one can discover the most important papers in a field by finding which ones are the most cited in research papers . . . .

To some degree blog interpretation, he said, involves understanding a different form of communication.

“Blog entries have a different structure,” Ulicny said. “They are typically short and are about something external to the blog posting itself, such as a news event. It’s not uncommon for a blogger to simply state, ‘I can’t believe this happened,’ and then link to a news story.”

In this example, Ulicny said, there might not be much of interest in the blog posting, yet the fact that the blogger called attention to this story can be significant to understanding what matters.

And those are just a few of the startling discoveries Dr. Ulicny's investigations are sure to yield!
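
For the record, the citation-style counting described in the excerpt fits in a few lines. What follows is only a minimal sketch, not Dr. Ulicny's method: the post names, the URLs, and the `most_linked` helper are all made up for illustration.

```python
from collections import Counter

# Hypothetical data: each blog post mapped to the URLs it links to.
posts = {
    "blog-a/post-1": ["news.example/story-1"],
    "blog-b/post-7": ["news.example/story-1", "news.example/story-2"],
    "blog-c/post-3": ["news.example/story-1"],
}

def most_linked(posts, top_n=3):
    """Rank link targets by how many distinct posts link to them."""
    counts = Counter()
    for links in posts.values():
        # A set counts each target once per post, like one citation per paper.
        counts.update(set(links))
    return counts.most_common(top_n)

print(most_linked(posts))
```

The story that three different posts point to comes out on top, which is the whole trick.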

We certainly hope the AO-BLAIWLTDRCI project is some sort of front for a clandestine effort to track and catalogue dissident blogs (like this one). While we are quite willing to believe that our terrorist foes have never heard of phone taps, SWIFT, etc., it pains us to imagine that the Air Force Office of Scientific Research is about to drop half a million clams because nobody told them about Technorati.

If, however, Dr. Ulicny is on the up-and-up, we hope he has allotted a portion of his grant for field research. For a mere $50K, we would be happy to bring him along to the next BARBARian beer blast and introduce him to the gang. In the wild, as it were. What ho!

UPDATE (via our distinguished colleague Erin Roof): AO-BLAIWLTDRCI is not to be confused with a recently unveiled NSA program that would track other personal data -- e.g., what kind of kisser you are, the phrase you use most often on IM, your pimp name, and the superhero you most resemble:
[F]ast-growing social networking websites such as MySpace and Friendster are a snoop's dream.

New Scientist [our third-favorite magazine -- SS] has discovered that Pentagon's National Security Agency, which specialises in eavesdropping and code-breaking, is funding research into the mass harvesting of the information that people post about themselves on social networks. And it could harness advances in internet technology - specifically the forthcoming "semantic web" championed by the web standards organisation W3C - to combine data from social networking websites with details such as banking, retail and property records, allowing the NSA to build extensive, all-embracing personal profiles of individuals . . . .

[Phone logs, which the NSA has been maintaining since 2001] can only be used to build a very basic picture of someone's contact network, a process sometimes called "connecting the dots". Clusters of people in highly connected groups become apparent, as do people with few connections who appear to be the intermediaries between such groups. The idea is to see by how many links or "degrees" separate people from, say, a member of a blacklisted organisation.

By adding online social networking data to its phone analyses, the NSA could connect people at deeper levels, through shared activities, such as taking flying lessons. Typically, online social networking sites ask members to enter details of their immediate and extended circles of friends, whose blogs they might follow. People often list other facets of their personality including political, sexual, entertainment, media and sporting preferences too. Some go much further, and a few have lost their jobs by publicly describing drinking and drug-taking exploits. Young people have even been barred from the orthodox religious colleges that they are enrolled in for revealing online that they are gay . . . .

Other data the NSA could combine with social networking details includes information on purchases, where we go (available from cellphone records, which cite the base station a call came from) and what major financial transactions we make, such as buying a house.

Right now this is difficult to do because today's web is stuffed with data in incompatible formats. Enter the semantic web, which aims to iron out these incompatibilities over the next few years via a common data structure called the Resource Description Framework (RDF). W3C hopes that one day every website will use RDF to give each type of data a unique, predefined, unambiguous tag . . . .

On the downside, this ease of use will also make prying into people's lives a breeze. No plan to mine social networks via the semantic web has been announced by the NSA, but its interest in the technology is evident in a funding footnote to a research paper delivered at the W3C's WWW2006 conference in Edinburgh, UK, in late May.
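
The "degrees" idea in the excerpt amounts to a shortest-path search over a contact graph. Here is a minimal sketch using breadth-first search; the names, the `contacts` graph, and the `degrees_of_separation` helper are invented for illustration and say nothing about how any agency actually does it.

```python
from collections import deque

# Hypothetical contact graph: who has been in touch with whom.
contacts = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
    "erin": [],
}

def degrees_of_separation(graph, start, target):
    """Return the fewest links between start and target, or None if unconnected."""
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, dist = queue.popleft()
        for friend in graph.get(person, []):
            if friend == target:
                return dist + 1
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, dist + 1))
    return None

print(degrees_of_separation(contacts, "alice", "dave"))
print(degrees_of_separation(contacts, "alice", "erin"))
```

In this toy graph bob and carol are the intermediaries: remove either and alice can no longer reach dave.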
