Merging of speech interfaces from concurrent use of devices...

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application


Details

US classification: C704S270000
Type: Reexamination Certificate
Status: active
Patent number: 06615177

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
This invention relates to a speech interface in a home network environment that can dynamically and actively extend its vocabulary on the basis of device- or medium-dependent vocabulary transmitted by network devices. In particular, it relates to a method of extending the vocabulary within a speech unit realizing such a speech interface.
In this context, a device can either be hardware, e.g. a VCR, or software, e.g. an electronic programming guide.
2. Description of the Related Art
EP-A-97 118 470 describes that, in a home network environment, a device can send its vocabulary, describing its functionality and its speech interface, to a speech unit. The speech unit can then translate a received and recognized user utterance into a corresponding user-network-command to control said network device on the basis of the received vocabulary.
In September 1998 Motorola published a language reference for VoxML 1.0, a language based on XML. This language is used for describing dialog systems by specifying dialog steps that, in general, consist of prompts and a list of possible options. Basically, VoxML offers the same ease of production to voice applications as HTML does as a vehicle for the presentation of rich text as well as images, hypertext links and simple GUI input controls. The VoxML language reference contains information on the syntax of the elements and their attributes, example usage, the structure of VoxML documents or dialogs, and pointers to other reference documentation that may be helpful when developing applications using VoxML.
Similarly, the paper “A Markup Language for Text-to-Speech Synthesis” by Richard Sproat et al. (ESCA Eurospeech 97, Rhodes, Greece, ISSN 1018-4074, page 1774) discloses a spoken text markup language (STML) to provide some knowledge of text structure to text-to-speech (TTS) synthesizers. The STML text can, e.g., set the language and the default speaker for that language in a multi-lingual TTS system, so that the appropriate language- and speaker-specific tables can be loaded.
Furthermore, Philips has developed a dialog description language called HDDL specifically for the kind of dialogs that are encountered in automatic enquiry systems. HDDL is used to build a dialog system off-line, before it is sold.
SUMMARY OF THE INVENTION
The object underlying the present invention is to provide a method to transmit the functionality and speech interface of a network device to the speech unit of the network and to handle such functionalities and speech interfaces of several network devices within the speech unit.
Therefore, the present invention provides an easy, quick and flexible method to control a network device connected to a network with a speech unit within said network that translates user-commands into user-network-commands to control said network device via said network, on the basis of the functionality and a speech interface that come, e.g., with said network device.
The inventive method is defined in independent claim 1; preferred embodiments thereof are respectively defined in the following dependent claims 2 to 24.
According to the present invention, every network device connected to a network is associated with at least one device-document that defines the functionality and speech interface of said network device. Such a device-document can come, e.g., with said network device and comprises one or several pairs of a user-command interpretation element and an associated user-network-command. Furthermore, the speech unit is able to receive such device-documents and merge them together into one speech interface description. This speech interface description then comprises the language of said speech unit. It can also be referred to as a general document for the whole network. The spoken user-commands that are received and recognized by the speech unit are then translated into user-network-commands on the basis of all pairs of a user-command interpretation element and an associated user-network-command included in said general document. Such a user-command interpretation element can, e.g., comprise a vocabulary element, a definition of a grammar, a pronunciation definition, or several of these or other elements.
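To make the structure of these documents concrete, the following is a minimal Python sketch of a device-document, the general document and the merge step. It is only an illustration under assumed names (InterpretationPair, DeviceDocument, GeneralDocument, merge, translate); the patent itself does not prescribe any particular representation.

from dataclasses import dataclass, field

@dataclass
class InterpretationPair:
    """One pair of a user-command interpretation element (reduced here to a
    single vocabulary-element) and the associated user-network-command."""
    vocabulary_element: str    # e.g. the spoken word "play"
    user_network_command: str  # e.g. the command string sent over the network

@dataclass
class DeviceDocument:
    """Document with which a network device describes its speech interface."""
    device_id: str
    pairs: list[InterpretationPair] = field(default_factory=list)

@dataclass
class GeneralDocument:
    """Speech interface description for the whole network, held by the speech unit."""
    pairs: list[InterpretationPair] = field(default_factory=list)

    def merge(self, document: DeviceDocument) -> None:
        """Merge a received device-document into the general document."""
        self.pairs.extend(document.pairs)

    def translate(self, utterance: str) -> str | None:
        """Translate a recognized spoken user-command into a user-network-command."""
        for pair in self.pairs:
            if pair.vocabulary_element == utterance:
                return pair.user_network_command
        return None  # the utterance is not part of the currently accepted language

In this sketch the general document is simply the collection of all pairs received so far, which is exactly what the translation of recognized user-commands operates on.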
As described in EP-A-97 118 470, documents for devices might alternatively be fetched by the speech unit from its own memory or a (distant) database.
The adaptation of the general document within the speech unit can be done purely syntactically at runtime after a device-document is received from a device. Of course, it is also possible that a network device comprises several documents that individually describe a part of its functionality and a part of its speech interface, and only those documents are transferred to the speech unit that are really needed, e.g. only those documents describing the functionality of a network device for a certain language, or only documents that define a part of the functionality of the network device. If the speech unit or an individual network device itself then recognizes that further speech capabilities of said individual device are needed, e.g. an additional language, the corresponding device-document(s) can be sent to the speech unit, which adapts its general document on the basis of this further document and can generate the corresponding user-network-commands on the basis of the adapted general document at run-time.
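As a rough, self-contained illustration of this run-time adaptation (plain dictionaries; the command strings and the German vocabulary are assumptions made for the example), the general document can simply be extended whenever a further device-document arrives:

# General document of the speech unit: recognized utterance -> user-network-command.
general_document: dict[str, str] = {
    "play": "CD_PLAYER:PLAY",
    "stop": "CD_PLAYER:STOP",
}

def adapt(general: dict[str, str], device_document: dict[str, str]) -> None:
    """Merge a newly received device-document into the general document at run-time."""
    general.update(device_document)

# A further device-document is sent, e.g. because an additional language is needed;
# it covers only the part of the device's speech interface for that language.
german_document = {"abspielen": "CD_PLAYER:PLAY", "anhalten": "CD_PLAYER:STOP"}
adapt(general_document, german_document)

print(general_document["abspielen"])  # -> CD_PLAYER:PLAY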
The present invention and its advantages will be better understood from the following description of exemplary embodiments thereof. These subsequently discussed embodiments are described on the basis of a case where two devices are connected to a network that also includes one speech unit. Initially, the general document of the speech unit is empty, and both device-documents, each of which defines the language of one device, are merged together into one interface description within the speech unit. Obviously, these examples can be extended to a case with n devices being connected to a network, or also to the case that the speech unit comprises a general document that gets adapted on the basis of a newly received device-document. Furthermore, for the sake of simplicity, a device-document here comprises user-command interpretation elements consisting of only one vocabulary-element.
DETAILED DESCRIPTION OF THE INVENTION
In the following examples, L₁ is the accepted language, i.e. the vocabulary-elements and associated commands, of a first device 1, and L₂ is that of a second device 2. Mathematically speaking, L₁ is a set of at least one vocabulary-element, i.e. a word wᵢ (although a word wᵢ need not be a single word; it could also be a complete utterance consisting of several words), together with the associated user-network-command. In addition to the vocabulary-elements, the language can, for example, also contain elements about pronunciation, grammar for word sequences and/or rules for speech understanding and dialog.
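As an illustration of such an enriched language element (the field names and the pronunciation string are assumptions, not taken from the patent), one entry of L₁ could carry optional pronunciation and grammar information besides the vocabulary-element and its command:

from dataclasses import dataclass

@dataclass(frozen=True)
class LanguageEntry:
    """One element of an accepted language L: a vocabulary-element w_i (possibly a
    multi-word utterance), its user-network-command and optional pronunciation
    and grammar information."""
    vocabulary_element: str           # w_i, e.g. "MTV" or "record the movie"
    user_network_command: str         # command sent to the device
    pronunciation: str | None = None  # e.g. a phonetic transcription
    grammar_rule: str | None = None   # e.g. a rule for accepted word sequences

# L1 of a hypothetical first device, modelled as a set of such entries.
L1 = {
    LanguageEntry("MTV", "TV:SELECT_CHANNEL MTV", pronunciation="em tee vee"),
    LanguageEntry("CNN", "TV:SELECT_CHANNEL CNN"),
}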
In a first example, L₁ and L₂ contain no common vocabulary-elements, i.e. no common words, so that L₁ ∩ L₂ = { }; the merged accepted language L for the interface description is L = L₁ ∪ L₂, i.e. the general document within the speech unit is built by adding the pairs of vocabulary-elements and associated commands from document 1, comprising the language L₁ of the first device 1, and document 2, comprising the language L₂ of the second device 2. As the vocabulary-elements together with the associated commands implicitly define for which device they were meant (since L₁ and L₂ contain no common vocabulary-elements), user-network-commands can be generated and sent to the correct corresponding devices.
In this first example the two network devices might be a TV set and a CD player. In this case L₁, associated with the TV set, and L₂, associated with the CD player, respectively comprise the following vocabulary-elements within the device-documents:

L₁ = {MTV, CNN}, and
L₂ = {play, stop}.
Since L₁ and L₂ do not contain the same vocabulary-elements, i.e. L₁ ∩ L₂ = { }, the merged accepted language for the general document is L = L₁ ∪ L₂.
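A small, self-contained sketch of this first example (the device identifiers and command strings are illustrative assumptions): the two disjoint vocabularies are merged into one general document, and a recognized utterance is dispatched to the device it implicitly belongs to.

# Device-documents of the first example: vocabulary-element -> (device, command).
tv_document = {
    "MTV": ("tv_set", "SELECT_CHANNEL MTV"),
    "CNN": ("tv_set", "SELECT_CHANNEL CNN"),
}
cd_document = {
    "play": ("cd_player", "PLAY"),
    "stop": ("cd_player", "STOP"),
}

# L1 and L2 are disjoint, so the general document is simply their union.
assert tv_document.keys() & cd_document.keys() == set()
general_document = {**tv_document, **cd_document}

def dispatch(utterance: str) -> None:
    """Translate a recognized utterance and send the command to the corresponding device."""
    device, command = general_document[utterance]
    print(f"send '{command}' to {device}")

dispatch("play")  # -> send 'PLAY' to cd_player
dispatch("MTV")   # -> send 'SELECT_CHANNEL MTV' to tv_set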
