Thursday, March 18, 2021

Project Debater, an autonomous debating system that can engage in a competitive debate with humans

An autonomous debating system. Noam Slonim et al. Nature 591, 379–384 (17 March 2021). https://www.nature.com/articles/s41586-021-03215-w

Abstract: Artificial intelligence (AI) is defined as the ability of machines to perform tasks that are usually associated with intelligent beings. Argument and debate are fundamental capabilities of human intelligence, essential for a wide range of human activities, and common to all human societies. The development of computational argumentation technologies is therefore an important emerging discipline in AI research1. Here we present Project Debater, an autonomous debating system that can engage in a competitive debate with humans. We provide a complete description of the system’s architecture, a thorough and systematic evaluation of its operation across a wide range of debate topics, and a detailed account of the system’s performance in its public debut against three expert human debaters. We also highlight the fundamental differences between debating with humans as opposed to challenging humans in game competitions, the latter being the focus of classical ‘grand challenges’ pursued by the AI research community over the past few decades. We suggest that such challenges lie in the ‘comfort zone’ of AI, whereas debating with humans lies in a different territory, in which humans still prevail, and for which novel paradigms are required to make substantial progress.

Popular version: https://www.nature.com/articles/d41586-021-00539-5


Discussion

Research in AI and in natural language processing is often focused on so-called ‘narrow AI’, consisting of narrowly defined tasks. The preference for such tasks has several reasons. They require fewer resources to pursue; typically have clear evaluation metrics; and are amenable to end-to-end solutions such as those stemming from the rapid progress in the study of deep learning techniques37. Conversely, ‘composite AI’ tasks—namely, tasks associated with broader human cognitive activities, which require the simultaneous application of multiple skills—are less frequently tackled by the AI community. Here, we break down such a composite task into a collection of tangible narrow tasks and develop corresponding solutions for each. Our results demonstrate that a system that properly orchestrates such an arsenal of components can meaningfully engage in a complex human activity, one which we presume is not readily amenable to a single end-to-end solution.

Since the 1950s, AI has advanced in leaps and bounds, thanks, in part, to the ‘grand challenges’, in which AI technologies performed tasks of growing complexity. Often, this was in the context of competing against humans in games that were thought to require intuitive or analytic skills particular to humans. Examples range from chequers38, backgammon39, and chess40, to Watson winning in Jeopardy!41 and AlphaZero winning at Go and shogi42.

We argue that all these games lie within the ‘comfort zone’ of AI, whereas many real-world problems are inherently more ambiguous and fundamentally different, in several ways. First, in games there is a clear definition of a winner, facilitating the use of reinforcement learning techniques39,42. Second, individual game moves are clearly defined, and the value of such moves can often be quantified objectively (for example, see ref. 43), enabling the use of game-solving techniques. Third, while playing a game an AI system may come up with any tactic to ensure winning, even if the associated moves could not be easily interpreted by humans. Finally, for many AI grand challenges, such as Watson41 and AlphaStar44, massive amounts of relevant structured data (for example, in the form of complete games played by humans) were available and imperative for the development of the system.

These four characteristics do not hold in competitive debate, which requires an advanced form of using human language, one with much room for subjectivity and interpretation. Correspondingly, often there is no clear winner. Moreover, even if we had a computationally efficient ‘oracle’ to determine the winner of a debate, the sheer complexity of a debate—such as the amount of information required to encode the ‘board state’ or to enumerate all possible ‘moves’—prohibits the use of contemporary game-solving techniques. In addition, it seems implausible to win a debate using a strategy that humans cannot follow, especially when it is a human audience that determines the winner. And finally, structured debate data are not available at the scale required for training an AI system. Thus, the challenge taken on by Project Debater seems to reside outside the AI comfort zone, in a territory where humans still prevail, and where many questions are yet to be answered.
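The discussion's central engineering idea is decomposing a composite task into narrow sub-tasks and orchestrating the resulting components. The sketch below illustrates that orchestration pattern in a minimal Python form; the component names, data fields, and pipeline order are hypothetical placeholders for illustration only, not Project Debater's actual modules or APIs.

```python
# Hypothetical sketch of composite-task orchestration: several narrow
# components share a context object and run in sequence. All names and
# behaviors here are illustrative, not the paper's implementation.

from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DebateContext:
    """Shared state passed between narrow components (illustrative)."""
    motion: str
    evidence: List[str] = field(default_factory=list)
    arguments: List[str] = field(default_factory=list)
    speech: str = ""


def mine_evidence(ctx: DebateContext) -> DebateContext:
    # Placeholder for a narrow task: retrieve argumentative text for the motion.
    ctx.evidence = [f"Claim related to: {ctx.motion}"]
    return ctx


def cluster_arguments(ctx: DebateContext) -> DebateContext:
    # Placeholder for a narrow task: group evidence into distinct themes.
    ctx.arguments = sorted(set(ctx.evidence))
    return ctx


def generate_speech(ctx: DebateContext) -> DebateContext:
    # Placeholder for a narrow task: assemble arguments into a narrative.
    ctx.speech = " ".join(ctx.arguments)
    return ctx


def orchestrate(ctx: DebateContext,
                pipeline: List[Callable[[DebateContext], DebateContext]]) -> DebateContext:
    """Run the narrow components in order; the orchestration logic lives here."""
    for component in pipeline:
        ctx = component(ctx)
    return ctx


if __name__ == "__main__":
    result = orchestrate(DebateContext(motion="We should subsidize preschools"),
                         [mine_evidence, cluster_arguments, generate_speech])
    print(result.speech)
```

The point of the sketch is structural: no single end-to-end model is trained for the whole activity; instead, each component solves one well-defined sub-task, and a separate orchestration layer composes them.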

