The title of this blurb is a question whose answer is "An IBM computer system that recently won over seventy thousand dollars on Jeopardy against two stalwart human champions, Ken Jennings and Brad Rutter."
The match followed more than a decade of work by IBM computer scientists developing and harnessing capabilities that enabled Watson to do “open question answering”: answering questions delivered in unstructured formats, about unknown topics, with all of the nuance and uncertainty of natural language. The Jeopardy format is particularly challenging because it requires understanding the puns and deliberately obscure allusions common in Jeopardy categories. To decide when to press the buzzer, human contestants filter, interpret, and cross-reference information. Watson needed these same skills, all of which are part of “data analytics.”
Years ago, a machine would have been puzzled by the phrase "Time flies like an arrow." It would have struggled to consider the phrase's various meanings and would have had little chance of using contextual clues to choose among them. Watson stores a huge amount of information (for example, that Paganini's capricci were written for the violin and that the Church Lady is a Dana Carvey character from Saturday Night Live). But to succeed at Jeopardy, Watson also needed to see quickly past distracting wordplay, as in the clue "With much 'gravity', this young fellow of Trinity became the Lucasian Professor of Mathematics in 1669" (correct response: "Who is Isaac Newton?").
Watson's hardware is extensive (nearly three thousand processor cores and 16 terabytes of RAM), but its software is even more impressive. IBM scientists programmed Watson without outside assistance, but they built upon research partly sponsored by various U.S. agencies. CCICADA researchers at the Information Sciences Institute (ISI) at USC were involved in some of this earlier question-answering research, sponsored by the Advanced Research and Development Activity (ARDA) in its AQUAINT program.
Watson is one example from a growing industry of increasingly sophisticated query-and-response software. Consider the discovery process in complex litigation: each side gains access to potentially huge stores of information that must be sorted and analyzed. According to a recent article in the New York Times (http://www.nytimes.com/2011/03/05/science/05legal.html), with such software "one lawyer would suffice for work that once required 500" in handling the millions of documents commonly encountered in complex actions. The article also notes an obvious advantage: "People get bored, people get headaches. Computers don't." Related software has a natural role in homeland security as well, enabling law enforcement and intelligence officers to sift through large volumes of information to find clues and put the pieces together during investigations.
CCICADA faculty member Dr. Hans Chalupsky and CCICADA Research Director Dr. Eduard Hovy are now working with some of the IBM scientists who created Watson on their next venture, the RACR project, which aims to improve upon Watson. RACR (an acronym whose precise meaning is still to be determined) will accumulate information, respond to queries, and evaluate the reliability of that information based on the reputation of its sources. It will combine elements of Watson's reasoning with the abilities to acquire, search, and classify data that commercial products such as legal discovery engines possess. Dr. Hovy added that RACR is planned as a system with "deeper reasoning, inference, and intelligent question answering" that will be able to "learn from mistakes." This technology will have important applications to complex DHS questions involving very large amounts of information of varying reliability, where the questions, the information, and its reliability all evolve over time.
Natural language processing is central to both CCICADA’s research program and its education program. Dr. Hovy was an organizer and principal lecturer for the CCICADA/VACCINE Reconnect Conference on "Extracting and Visualizing Information from Natural Language Text" in June 2010. The Reconnect Conference, as reported in the June 2010 iVAC, introduced faculty who teach undergraduates to the methods underlying natural language processing and to its role in homeland security. Natural language and reasoning are also important topics in the Data Sciences Summer Institute (DSSI), a six-week residential summer program sponsored by CCICADA partners at the University of Illinois at Urbana-Champaign. Dr. Hovy is also a frequent speaker at DSSI, which brings together more than 20 computer science juniors, seniors, and graduate students for an intensive introduction to key topics in data analytics. Details on the 2010 DSSI program were reported in the September 2010 issue of the iVAC Newsletter.