Three general types of dialogue manager exist [62] [49]. State-based systems are the simplest. The dialogue is modelled as a sequence of states, with the strategies chosen by the user represented by the transitions. State-based systems therefore have a hard-coded dialogue strategy and no user model. A more flexible design is one based on a frame representation of the dialogue state. Frame-based systems are useful for the style of dialogue in which a number of information items must be elicited from the user. The strategy is more flexible than in state-based systems, since the system can form strategies dynamically by checking which slots still need to be filled. For example, if the name and credit-card number are still not known, the system could formulate a single question to elicit both. The most complex design is that of an agent-based system, which uses an explicit model of the system and the user in terms of a belief, desire and intention (BDI) architecture (see Section 2.3). These systems choose dialogue strategies by planning.
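The frame-based behaviour described above can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the slot names, the prompt fragments and the functions `unfilled_slots` and `next_question` are all hypothetical, and the sketch assumes that a frame is simply a mapping from slot names to values, with `None` marking a slot that has not yet been filled.

```python
from typing import Dict, List, Optional

# Hypothetical slots and prompt fragments, chosen for illustration only.
PROMPTS: Dict[str, str] = {
    "name": "your name",
    "card_number": "your credit-card number",
    "expiry": "the card's expiry date",
}


def unfilled_slots(frame: Dict[str, Optional[str]]) -> List[str]:
    """Return the slots that still need a value, in prompt order."""
    return [slot for slot in PROMPTS if frame.get(slot) is None]


def next_question(frame: Dict[str, Optional[str]], max_items: int = 2) -> Optional[str]:
    """Form one question covering up to max_items unfilled slots.

    Returns None once every slot is filled, which ends the elicitation.
    """
    missing = unfilled_slots(frame)[:max_items]
    if not missing:
        return None
    return "Please tell me " + " and ".join(PROMPTS[s] for s in missing) + "."


# The name and card number are unknown, so a single question elicits both.
frame = {"name": None, "card_number": None, "expiry": "12/27"}
print(next_question(frame))
```

A state-based system would instead fix the order of the questions in its transition table. Here the question is recomputed from the frame on every turn, so the strategy adapts to whatever the user has already supplied.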
One of the objectives of this thesis is to create a dialogue planning system that is domain independent, in that it acts as a shell that supports the execution of a set of dialogue plan rules and automatically maintains the user model that is used in generating the dialogue. No changes to the system should be necessary to support a new set of dialogue plan rules. A number of dialogue systems already exist that have a similar objective. Based on a finite state design, VoiceXML [46] uses an XML description of a finite state machine. For each state, a set of transitions is given, one for each of the possible user inputs. These inputs are specified using a grammar. VoiceXML is intended as an analogue to HTML, so that forms can be filled using voice rather than a web browser. Just as web pages are served from a web server, VoiceXML pages are served from a VoiceXML server, which runs the automaton, generates speech output, interprets speech input, and returns the information gathered in the dialogue.
Another domain-independent system that supports dialogue management is the BGP-MS system of Kobsa and Pohl [40]. This is a user modelling shell system that stores and infers user beliefs and goals. It provides a protocol through which an application program can feed in beliefs about the user and reports of the user's actions. Working on a specified knowledge base, the system can pass interesting inferences to the application system, search for misconceptions, and formulate questioning strategies that the application system can use to acquire the user model. Using action specifications, inferences are drawn about the preconditions and effects of the dialogue acts that are observed in the application's interaction. A mechanism for resolving inconsistency in the user model has been proposed for the system, since user beliefs can change or be misconceived. Stereotypes can be used, allowing each member of a stereotype to inherit its user model from the stereotype model. Kass and Finin [37] also developed a user modelling shell, GUMS, which has much the same set of features as BGP-MS.
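The idea of stereotype inheritance can be sketched as follows. This is only an illustration of the general principle and makes no claim about how BGP-MS or GUMS represent stereotypes internally. The belief names, the `NOVICE` stereotype and the `user_model` helper are hypothetical, and the sketch assumes that beliefs observed for an individual user should override the defaults inherited from the stereotype.

```python
from collections import ChainMap
from typing import Dict, Optional

# Hypothetical stereotype: default beliefs shared by all of its members.
NOVICE: Dict[str, bool] = {
    "knows_voice_commands": False,
    "wants_confirmation": True,
}


def user_model(stereotype: Dict[str, bool],
               observed: Optional[Dict[str, bool]] = None) -> ChainMap:
    """Layer an individual's observed beliefs over the stereotype defaults.

    Lookups consult the individual layer first and fall back to the
    stereotype; writes only ever touch the individual layer.
    """
    return ChainMap(observed if observed is not None else {}, stereotype)


alice = user_model(NOVICE)
alice["knows_voice_commands"] = True  # an observation overrides the default

print(alice["knows_voice_commands"])   # individual belief
print(alice["wants_confirmation"])     # inherited from the stereotype
print(NOVICE["knows_voice_commands"])  # the stereotype itself is unchanged
```

Keeping the individual layer separate means that a later revision of the stereotype is visible to every member that has not overridden the belief in question.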
COLLAGEN [56] is a dialogue planning system that is used in managing a collaborative process between a user and a separate agent. It does not choose strategies itself; instead it operates as a mediator between the two dialogue participants, recording the dialogue history and parsing it into a plan structure. The dialogue is modelled using hierarchical plan rules, and the model assumes that acts are added to the plan in a focussed manner, similar to the model employed by Carberry (Section 2.5) and to the model that will be used in this thesis. Since COLLAGEN records the structure and changing focus of the dialogue, it can help the user in a number of ways. First, it can display the dialogue acts from which the user can choose at a given point in the dialogue, by checking the applicable plan rules at the plan's focus point. The user can stop an incomplete plan so that the focus point can be moved elsewhere, and can return to stopped points later on. A plan can also be abandoned, by backtracking and taking a different alternative at an earlier choice point. Segments can be replayed, allowing their reuse in different contexts. COLLAGEN separates the generation of allowable strategies, which are derived from the discourse model, from the choice among those strategies, which is made by the agent. A similar principle will be used in the planner presented in this thesis, where the allowable strategies are first generated, and a separate module is used to choose a strategy from those alternatives.
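The separation between generating the allowable acts and choosing among them can be sketched as below. This is not COLLAGEN's interface, nor the planner developed in this thesis. The plan rules, act names and utility values are hypothetical, and the sketch assumes that each rule decomposes a goal into an ordered list of acts, so that the acts allowable at the focus point are the first acts of the rules applicable to the goal in focus.

```python
from typing import Callable, Dict, List

# Hypothetical hierarchical plan rules: each goal maps to a list of
# alternative decompositions, and each decomposition is an ordered list of acts.
PLAN_RULES: Dict[str, List[List[str]]] = {
    "pay": [
        ["ask_card_number", "confirm_payment"],
        ["ask_voucher_code"],
    ],
}


def allowable_acts(focus: str) -> List[str]:
    """Generation step: the first act of every rule applicable at the focus."""
    return [steps[0] for steps in PLAN_RULES.get(focus, [])]


def choose(acts: List[str], utility: Callable[[str], float]) -> str:
    """Selection step: kept separate so that any scoring policy can be used."""
    return max(acts, key=utility)


# Hypothetical utilities supplied by a separate strategy-choosing module.
UTILITY: Dict[str, float] = {"ask_card_number": 0.4, "ask_voucher_code": 0.7}

acts = allowable_acts("pay")
print(acts)                       # the alternatives offered at the focus point
print(choose(acts, UTILITY.get))  # the act preferred under these utilities
```

Because `allowable_acts` knows nothing about utilities, the same list could equally be displayed to a user as a menu of possible acts, which is how the mediator role described above uses it.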
The TRAINS system [2] is an example of a natural language collaborative planning system. It is a kind of meta-level planning system (see Section 2.6), in that the planned dialogue is one that supports the choice of a domain-level plan. A human planner uses the system to answer questions about the domain and to evaluate proposed alternatives. The architecture of the TRAINS system is agent-based and, in common with the planner described in this thesis, uses the BDI model to represent the state of the dialogue manager. Using this model, the system maintains a set of nested beliefs about the user, so that as the dialogue progresses a model of the user's domain plan alternatives can be developed, and the system can provide cooperative contributions to the dialogue in the context of these alternatives. TRAINS is a complete natural language system, addressing the challenge of understanding natural user input that is relatively unconstrained because of its mixed-initiative dialogue strategy.
Walker [75] discusses BDI planning using Bratman's IRMA [7] architecture. IRMA is designed to accommodate an agent's resource limitations in performing deliberation every time a change is made in the environment. Walker explains that a substantial fraction of dialogue contains redundancies, that is, information that needs to be communicated only because the hearer's resource limitations prevent him from inferring it himself. A dialogue partner may be limited in working memory or in inferential capacity. A resource-bounded BDI architecture is useful for generating dialogue with redundancies, since it can be used to model the deliberation process of the hearer.