Speech interface for computer application programs

Data processing: speech signal processing – linguistics – language – Speech signal processing – Application

Reexamination Certificate


Details

C704S275000, C704S266000


active

06289312


FIELD OF THE INVENTION
This invention relates generally to computer systems, and more particularly to computer systems that are operated interactively by users.
BACKGROUND OF THE INVENTION
A traditional computer system typically uses visual communication to interact with the user. For example, most application programs are designed to display output on a monitor screen. In response to viewing the output, the user enters commands to control the application program.
The complexity of the visual output has increased substantially with modern application programs. Consider the many programs that use windows, pull-down menus, tool bars, icons, slide buttons, pointers, multiple fonts, underlining, change bars, bolding, color cues, timers, and status bars, often all on a single display screen. The intent is to make user interaction with the application easier.
However, many currently available computing devices which could be used to access modern applications interactively, such as hand-held personal digital assistants and palm-top computers, have limited display capabilities. Also, the most widely available communication device, the telephone, typically has none. In addition, in certain environments where computers are used there may be light restrictions which preclude visual interaction with the user, and certain users may prefer to have some, if not all, of the interaction with the application program be in a non-visual mode. Consequently, in many instances, the increased visual complexity of modern applications has decreased their accessibility for user interaction. Therefore, it is desired to have modern application programs interact with users in other communication modes.
For example, some prior art application programs have been modified to provide some, if not all of the output in aural mode. Then, a relatively unsophisticated loudspeaker or earphones can replace the color graphic display monitor. However, it is a problem to modify application programs, and in most cases, application programs are not accessible for user modification. With mass produced “shrink-wrap” software, it is economically and technically impracticable to enable all applications for non-visual user interaction.
Instead of modifying application programs, it is known that screen readers can be used to provide spoken output for application programs. Prior art screen-readers typically intercept the character stream that is communicated to the monitor by the application program. The intercepted characters are converted to audible letters, and where possible, a contiguous set of characters are spoken as words or sentences. Special embedded control characters that are intended to format the display are converted into distinctive sounds.
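The interception-and-conversion step described above can be sketched in a few lines. This is a minimal illustration of the general prior-art technique, not the patented implementation; the function name and the control-character-to-sound table are assumptions.

```python
# Minimal sketch of prior-art screen-reader behavior: intercept the
# character stream bound for the monitor and turn it into "speech"
# tokens. Names and sound labels are illustrative assumptions.

# Embedded control characters that format the display are mapped to
# distinctive (here, symbolic) sounds rather than spoken as letters.
CONTROL_SOUNDS = {
    "\n": "<newline-tone>",
    "\t": "<tab-tone>",
    "\x07": "<bell-tone>",
}

def read_screen(char_stream):
    """Convert an intercepted character stream into spoken tokens.

    Contiguous printable characters are grouped and spoken as words;
    embedded control characters become distinctive sounds.
    """
    tokens, word = [], []
    for ch in char_stream:
        if ch in CONTROL_SOUNDS:
            if word:
                tokens.append("".join(word))
                word = []
            tokens.append(CONTROL_SOUNDS[ch])
        elif ch == " ":
            if word:
                tokens.append("".join(word))
                word = []
        else:
            word.append(ch)
    if word:
        tokens.append("".join(word))
    return tokens
```

For example, the stream `"Total:\t42\n"` would be rendered as the word "Total:", a tab tone, the word "42", and a newline tone. Note that nothing in this process knows why the application emitted those characters.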
Screen-readers are generally designed to operate separately from the application program. Thus, as an advantage, screen-readers can often be used with many applications programs without making any modifications to the application programs themselves. However, the user is entirely responsible for interpreting what the application is trying to convey, and how the sounds relate to the displayed images. In other words, the context of the application with respect to the arrangement of the displayed output is lost by screen-readers.
In addition, traditional screen-readers are difficult to adapt to modern display monitors. Modern display monitors use a bit stream instead of a character stream to create a screen image. The screen image, or graphic user interface, is generated from a bit pattern stored in a memory as pixels. The on and off states of the bits of the pixels determine how the information is presented. In designing screen-readers for use with graphic display systems, for example, IBM OS/2, Microsoft Windows 3.1, and Unix X-Windows, a significant effort is expended in extracting meaningful information from a complex graphic display, and constructing data structures to form a model of what is being displayed. However, in most cases, recovering the full application context from spoken words remains difficult.
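The model-building effort described above can be illustrated for the simplest case, a character-cell screen buffer; a screen-reader for a true graphic display must first recover text from pixels, which is far harder. The function name and the region representation here are assumptions for illustration only.

```python
# Sketch of building a simple model of a character-cell screen buffer:
# group runs of non-blank characters into "regions" tagged with their
# row and column, so later stages can reason about screen layout.

def model_screen(grid):
    """Return a list of (row, col, text) regions found in the buffer."""
    regions = []
    for r, row in enumerate(grid):
        c = 0
        while c < len(row):
            if row[c] != " ":
                start = c
                while c < len(row) and row[c] != " ":
                    c += 1
                regions.append((r, start, "".join(row[start:c])))
            else:
                c += 1
    return regions
```

Given the buffer `["File Edit", "  OK"]`, the model records "File" and "Edit" on row 0 and "OK" indented on row 1; positional data of this kind is exactly what a flat character stream discards.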
As an additional drawback, most commercial screen-readers are designed to operate in a specific operating system environment, for example, DOS. This makes it difficult to use the screen-readers with applications designed for other widely used operating systems such as UNIX. Typically, the prior art screen-readers execute on a stand-alone PC running in terminal emulation mode. The PC is usually connected to the output driver used by the application programs. As a result, the screen-readers slavishly regurgitate sounds independent of the application context.
For example, an appointment calendar application program can arrange the dates of the month as cells organized into rows and columns on the monitor. Each row of seven cells represents a week, and like days are vertically arranged in columns. The user wishing to book or look up an appointment must form a mental image of the entire display as the calendar is spoken by the screen-reader. Only then can the user determine, for example, “what day of the week is the third Sunday of March.” To completely capture a mental image of the calendar, the user may have to “display” the month several times. In addition, the user may have to navigate through the cells using a cursor to determine the relative position of the cells. This is cumbersome and a waste of resources.
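By contrast, an interface with access to the application's context could answer such a calendar query directly, sparing the user the mental reconstruction of the row-and-column grid. A minimal sketch using Python's standard calendar module (the helper name is an assumption):

```python
import calendar

def nth_weekday(year, month, weekday, n):
    """Date of the n-th given weekday in a month.

    weekday uses the calendar module's constants (MONDAY=0 .. SUNDAY=6).
    """
    count = 0
    # monthcalendar() returns the month as rows of seven day numbers,
    # the same week-per-row layout the appointment calendar displays.
    for week in calendar.monthcalendar(year, month):
        day = week[weekday]
        if day:  # 0 means this cell falls outside the month
            count += 1
            if count == n:
                return day
    raise ValueError("no such weekday in month")
```

For instance, `nth_weekday(2024, 3, calendar.SUNDAY, 3)` answers "the third Sunday of March 2024" in one call, with no need to speak the whole grid aloud.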
FIG. 1 shows a typical arrangement of a prior art speech enabled application programming system. An interactive application program 10 needs to interact with the user. Therefore, the application program 10 includes output calls 11 for generating output and soliciting input. Associated with the calls are parameters which indicate how the output and input should be handled.
A general purpose input/output (I/O) driver 20 includes processes or functions 22. The calls 11 of the application 10 transfer execution control to the functions 22. The functions 22, depending on the parameters of the calls 11 and the hardware characteristics of a monitor 30, generate “visual” output on line 21, and receive input. The output and input can be in the form of a digitally encoded character stream. A monitor 30 connected to the output driver 20 by line 21 converts the character stream to a displayed image 40. An input device, for example a keyboard or mouse, can be used to generate input.
In order to enable the application to “speak,” a screen-reader 50 is also connected to line 21. Typically, the screen-reader executes on a stand-alone computer emulating a text terminal. The screen-reader 50 receives as input the same visual output generated by the functions 22 of the output driver 20, e.g., the character stream. The screen-reader 50 generates aural output on line 51. The aural output can also be in the form of an encoded character stream. A speech synthesizer 60 connected to the screen-reader 50 by line 51 processes the aural output to produce speech 70, hopefully, representative of the image 40.
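The data flow of FIG. 1 can be modeled as a toy sketch. The class names are illustrative assumptions; the reference numerals in the comments follow the figure.

```python
# Toy model of the FIG. 1 arrangement: the screen-reader taps the same
# character stream the driver sends to the monitor, then feeds a
# synthesizer. Names are illustrative, not the patent's implementation.

class Monitor:
    """Monitor (30): accumulates the displayed image (40)."""
    def __init__(self):
        self.image = ""
    def receive(self, chars):
        self.image += chars

class Synthesizer:
    """Speech synthesizer (60): records what gets spoken (70)."""
    def __init__(self):
        self.spoken = []
    def speak(self, chars):
        self.spoken.append(chars)

class ScreenReader:
    """Screen-reader (50): forwards the character stream as speech."""
    def __init__(self, synthesizer):
        self.synthesizer = synthesizer
    def receive(self, chars):
        # No application context is available here: the reader can only
        # pass along whatever characters happened to be displayed.
        self.synthesizer.speak(chars)

class OutputDriver:
    """I/O driver (20): fans output calls (11) out on line (21)."""
    def __init__(self, *listeners):
        self.listeners = listeners
    def write(self, chars):
        for listener in self.listeners:
            listener.receive(chars)

synth = Synthesizer()
monitor = Monitor()
driver = OutputDriver(monitor, ScreenReader(synth))
driver.write("Mon Tue Wed")  # an application output call (11)
```

After the call, the synthesizer has received exactly the monitor's text and nothing more; in particular, it has no way to know the application laid those words out as calendar column headings.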
In this arrangement, the screen-reader 50 merely intercepts the characters of the visual output without regard to the context of the application 10 when the calls 11 are made. Consequently, known screen-readers cannot provide the user with any application specific assistance to interpret the spoken words. Therefore, users listening to the output must attempt to build a mental picture of the image 40 on the screen from the letters and words of the speech 70. However, if the application uses specific arrangements of the words and letters to convey meaning, then it is difficult to determine how the spoken words relate to that arrangement and what responses are appropriate to interact with the application.
Therefore, there is a need for a speech interface which integrates the spoken words with the context of an application program as the program is executing. The interface should provide rich, context-sensitive feedback to the user so that interaction with the application program can be facilitated. In addition, the interface s
