Method and apparatus for separating processing for language-understanding from an application and its functionality

Data processing: speech signal processing, linguistics, language



Details

U.S. classes: C704S010000, C704S251000, C704S275000
Type: Reexamination Certificate
Status: active
Patent number: 06513010

ABSTRACT:

The present invention pertains to a method and an apparatus for separating processing for language-understanding from an application and its functionality, the application containing functionality within a provided domain.
TECHNICAL FIELD
The present invention pertains to a method and a system for separating processing for language-understanding from an application and its functionality, said application containing functionality within a provided domain.
BACKGROUND ART
Conventional speech recognition application programming interfaces (APIs), such as Microsoft Speech API™ and Java Speech API™, take input in the form of a grammar and a lexicon, with little other information about the context or application domain in which the language interface is to operate. The output of such APIs is typically a stream of words, and an application designer must build a substantial amount of custom code to interpret the words and make appropriate application calls.
As illustrated in FIG. 1 of the attached drawings, the conventional speech recognizer with its API is, so to speak, glued to the application itself with custom code. The custom code provides the “intelligence” in translating a stream of words received from the speech recognizer into appropriate application calls. Any translation to actual application objects, methods, etc. has to be done on a per-case basis in the custom code.
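The glue-code pattern described above can be illustrated with a minimal sketch in Java. All names here (CustomGlueCode, CalendarApp and its methods) are hypothetical stand-ins for whatever a concrete recognizer API and application actually provide; real APIs such as the Java Speech API deliver recognized words through comparable listener callbacks.

// Hypothetical glue code between a speech recognizer and an application.
// All names are illustrative stand-ins, not part of any real recognizer API.
public class CustomGlueCode {

    private final CalendarApp calendar = new CalendarApp();

    // Called by the recognizer API with the recognized stream of words.
    public void onResult(String[] words) {
        // Per-case interpretation: every phrase the application should
        // react to must be anticipated and mapped to a call by hand.
        String utterance = String.join(" ", words);
        if (utterance.startsWith("add appointment")) {
            calendar.addAppointment(utterance.substring("add appointment".length()).trim());
        } else if (utterance.equals("next day")) {
            calendar.showNextDay();
        } else {
            // Anything not anticipated here is simply lost.
            System.out.println("Not understood: " + utterance);
        }
    }
}

// Stand-in for the actual application functionality.
class CalendarApp {
    void addAppointment(String what) { System.out.println("Adding: " + what); }
    void showNextDay() { System.out.println("Showing next day"); }
}

Every new phrase the user should be able to say requires another hand-written branch in this glue code, which is exactly the per-case translation burden described above.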
Other speech APIs aim at reducing the amount of custom code by allowing the use of modal dialogs. For example, the Philips SpeechMania® 99 product has been demonstrated with a pizza ordering application, where a user goes through dialog modes involving, for instance, selecting pizza toppings. A disadvantage of this type of technology is that the system will only understand the utterances expected in the given mode. If the user changes his drink order while he is expected to select pizza toppings, the system may fail to understand this. The degree to which the system ‘understands’ the utterances in this kind of interaction is limited; each mode, and the utterances valid therein, must be anticipated by the developers and directly related to the action the system takes in response to the user input. This also means that a substantial amount of interface design work is required, with extensive studies (such as “Wizard of Oz”-type settings) to determine every possible phrase a user might come up with in a given situation.
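The modal limitation can likewise be sketched. The mode names and vocabularies below are hypothetical, but the behavior mirrors the pizza-ordering example: an utterance that is valid in one mode is rejected in another.

import java.util.Map;
import java.util.Set;

// Hypothetical modal dialog manager illustrating the limitation described
// above: only utterances anticipated for the *current* mode are understood.
public class ModalDialog {

    enum Mode { SELECT_TOPPINGS, SELECT_DRINK }

    // Per-mode vocabulary, fixed at design time.
    private static final Map<Mode, Set<String>> VALID = Map.of(
            Mode.SELECT_TOPPINGS, Set.of("mushrooms", "olives", "ham"),
            Mode.SELECT_DRINK,    Set.of("cola", "water", "juice"));

    private Mode mode = Mode.SELECT_TOPPINGS;

    public void handle(String utterance) {
        if (VALID.get(mode).contains(utterance)) {
            System.out.println("OK in mode " + mode + ": " + utterance);
        } else {
            // "cola" while selecting toppings fails, even though the
            // system "knows" the word in another mode.
            System.out.println("Not understood in mode " + mode + ": " + utterance);
        }
    }

    public static void main(String[] args) {
        ModalDialog d = new ModalDialog();
        d.handle("olives"); // accepted
        d.handle("cola");   // rejected: wrong mode
    }
}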
A widely distributed application of speech recognition and language-understanding today is various forms of telephony services. These systems are typically built around a central server, which accepts incoming voice calls over standard telephone lines. The users are presented with an interactive voice-based interface and can make choices, navigate through menus, etc., by uttering voice commands. The complete set of software, ranging from speech recognition, through language-understanding, to application calls, database searches, and audio feedback, resides on the central server. This puts high demands on the central server hardware and software, which must also support a large number of simultaneous interactive voice sessions. Typical applications for this type of system are ticket booking, general information services, banking systems, etc. An example of such a system is the “SJ Passenger traffic timetable information system”, in use by the Swedish Railway.
Many speech- and language-enabled applications do not use speech recognizer APIs (see the discussion of “conventional speech recognition APIs” above). Instead, they implement the whole range of technologies required, from speech recognition through syntactic and semantic (linguistic) processing to the actual application calls and effects. Such designs are called monolithic, since they do not make use of specified APIs to distinguish between different interchangeable modules of the language interaction system, but rather put all components in “one design”. An example of such a design is disclosed by Bertenstam, J., et al., “The Waxholm Application Data-Base”, Proc. of Eurospeech '95, Vol. 1, pp. 833-836, Madrid, 1995. The “Waxholm system” is a speech-controlled system for search and retrieval of information on boat timetables and services in the Stockholm archipelago. The system implements all relevant linguistic components, such as speech recognition, lexicon, grammar, semantics, and application functionality, internally.
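In code terms, a monolithic design keeps every stage in one unit, as in the following hypothetical sketch. The point is the absence of API boundaries at which a component could be exchanged, not the stub implementations.

// Hypothetical illustration of a monolithic design: recognition, parsing,
// semantics and application effects live in one unit, with no API
// boundaries at which a component could be swapped out.
public class MonolithicTimetableSystem {

    public void processAudio(byte[] audio) {
        String words = recognize(audio);        // speech recognition, built in
        ParseTree tree = parse(words);          // syntactic processing, built in
        Meaning meaning = interpret(tree);      // semantic processing, built in
        execute(meaning);                       // application call, in-line
    }

    private String recognize(byte[] audio) { return "boat to Waxholm"; }
    private ParseTree parse(String words)  { return new ParseTree(words); }
    private Meaning interpret(ParseTree t) { return new Meaning(t); }
    private void execute(Meaning m)        { System.out.println("Searching timetable..."); }

    // Minimal stand-in types.
    record ParseTree(String words) {}
    record Meaning(ParseTree tree) {}
}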
The field of distributed systems in general deals with the distribution of databases, object repositories, etc., over computer networks. The general intent is to provide unified high-level platforms to be used by computer applications that require runtime data to be presented and distributed over a network. One effort to provide a standardized framework for the design of distributed systems is the Common Object Request Broker Architecture (CORBA), proposed by the Object Management Group (OMG). The CORBA architecture is centered on the Object Request Broker (ORB), which handles application (client) calls to a distributed object by providing object stubs (or proxies) on the client side, on which remote procedure calls are made and transferred to the actual object implementation (server) over the network.
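A minimal sketch of the client side of this pattern follows, assuming a hypothetical IDL-compiled interface “Timetable” with its generated “TimetableHelper”. The ORB initialization and name-service lookup are standard CORBA calls (a CORBA implementation must be on the classpath; the org.omg packages were removed from the JDK itself in Java 11); everything else is illustrative.

import org.omg.CORBA.ORB;
import org.omg.CosNaming.NamingContextExt;
import org.omg.CosNaming.NamingContextExtHelper;

// Sketch of a CORBA client. "Timetable"/"TimetableHelper" are hypothetical
// IDL-generated types; the ORB and name-service calls are standard CORBA.
public class CorbaClient {
    public static void main(String[] args) throws Exception {
        ORB orb = ORB.init(args, null);

        // Obtain the naming service and resolve a remote object reference.
        NamingContextExt naming = NamingContextExtHelper.narrow(
                orb.resolve_initial_references("NameService"));

        // narrow() returns a local stub (proxy); method calls on it are
        // forwarded over the network to the server-side implementation.
        Timetable timetable = TimetableHelper.narrow(naming.resolve_str("Timetable"));
        System.out.println(timetable.nextDeparture("Waxholm"));
    }
}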
The present invention addresses some fundamental problems that currently arise when language-based interaction is to be performed with multiple application entities present. These can be summarized in three main issues:
1) The lack of a consistent natural language interaction model for different application entities. This means that a multitude of different applications exist with different and mutually inconsistent linguistic interfaces. The interpretation of the recognized strings of words received from the speech recognizers is done by custom code (see the discussion of “conventional speech recognition APIs” above), or even with the complete speech recognition and linguistic processing as an integral part of the application (see the discussion of “monolithic applications with language-based interaction” above), and thus with application-specific solutions. As a result, the way users speak to machines varies and is inconsistent.
2) The lack of transparent interaction using natural language with multiple application entities. Given multiple natural language-enabled applications, there is a lack of unifying methods to bring the language interfaces together so as to make them accessible at once by the user. Application-specific solutions to distinguish between different sub-functionalities of a system exist (such as prefixing an utterance with “telephone, . . . ” or “calendar, . . . ” to indicate the context of a command), but these are still limited to customized solutions of particular application designs, and the parsing and linguistic processing is still left to each particular application once the destination of an utterance is determined. Thus, there is a lack of “unification of linguistic processing and execution”, given different accessible applications. As an example of where this type of interaction is problematic, consider a situation where a user wants to control different electronic systems integrated in a car, such as a stereo and a climate control system. Rather than prefixing each utterance with a destination (by saying things such as “radio, louder” or “climate, cooler”), the system should be able to resolve sentences in the context of both applications simultaneously and understand that the verb “louder” is addressed to the radio and “cooler” is addressed to the climate control system, something that can currently only be achieved by building the two applications as one single application unit (a sketch of such cross-application resolution follows this list).
3) The requirement to build natural language processing into all entities. Since there are no methods of unifying the linguistic processing of disparate applications in one design (see the two previous points), the full linguistic processing must, with conventional techniques, be built into each application. This is generally a problem.
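What points 2) and 3) call for can be sketched as follows; all names are hypothetical. Each application publishes the vocabulary it understands to one shared interpreter, which resolves an unprefixed utterance against all registered applications at once, so no linguistic processing needs to be rebuilt per application.

import java.util.List;
import java.util.Set;

// Hypothetical sketch: several applications register their vocabularies with
// one shared interpreter, which resolves an unprefixed utterance against all
// of them at once (no "radio, ..." / "climate, ..." prefixes needed).
public class SharedInterpreter {

    interface SpeechEnabledApp {
        Set<String> verbs();          // vocabulary the app publishes
        void perform(String verb);    // application call
    }

    private final List<SpeechEnabledApp> apps;

    SharedInterpreter(List<SpeechEnabledApp> apps) { this.apps = apps; }

    public void resolve(String verb) {
        for (SpeechEnabledApp app : apps) {
            if (app.verbs().contains(verb)) {
                app.perform(verb);    // "louder" -> radio, "cooler" -> climate
                return;
            }
        }
        System.out.println("No application understands: " + verb);
    }

    public static void main(String[] args) {
        SpeechEnabledApp radio = new SpeechEnabledApp() {
            public Set<String> verbs() { return Set.of("louder", "quieter"); }
            public void perform(String v) { System.out.println("Radio: " + v); }
        };
        SpeechEnabledApp climate = new SpeechEnabledApp() {
            public Set<String> verbs() { return Set.of("cooler", "warmer"); }
            public void perform(String v) { System.out.println("Climate: " + v); }
        };
        SharedInterpreter s = new SharedInterpreter(List.of(radio, climate));
        s.resolve("louder");  // routed to the radio
        s.resolve("cooler");  // routed to the climate control
    }
}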
