Data processing: speech signal processing – linguistics – language – Speech signal processing – Application
Reexamination Certificate
2000-02-16
2002-08-13
Šmits, Tālivaldis Ivars (Department: 2654)
C704S270000, C704S257000
active
06434529
ABSTRACT:
CROSS REFERENCE TO RELATED APPLICATIONS
N/A
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
N/A
BACKGROUND OF THE INVENTION
The present invention relates generally to speech recognition, and more specifically to a system for providing actions to be performed during processing of recognition results.
Speech recognition systems are available today which allow a computer system user to communicate with an application computer program using spoken commands. In order to perform command and control operations on the application program in response to a user's speech, existing speech recognition systems employ a type of speech recognition grammar referred to as a “rule grammar”, which is loaded into a speech recognizer program. Rule grammars are also sometimes referred to as “command and control” or “regular” grammars. Rule grammars are often written in a special grammar language that is different from the programming language used to write the application program itself. For example, in a system in which the application program is written in the Java™ programming language provided by Sun Microsystems™, the Java Speech API grammar format (JSGF) may be used to write the rule grammar. Accordingly, application programs and grammars are typically developed and maintained in separate files by different programmers or teams. As a result, the application code that handles the logic of the application is usually separate from the files that define the rule grammars. These factors result in a parallel maintenance problem: changes to the application code may require changes to the rule grammar files and vice versa.
An example of a simple rule grammar for an application program associated with a hypothetical multi-media player is as follows:
<play>=(play|go|start){PLAY};
<stop>=(stop|halt|quit running){STOP};
<lineno>=1{ONE}|2{TWO}|3{THREE}|4{FOUR};
<goto>=go to line<lineno>{GOTO};
public <command>=<play>|<stop>|<goto>;
In the above illustrative grammar, each line is a recognition rule having a rule name within < > on the left side of the equation, specifically <play>, <stop>, <lineno>, <goto>, and <command>. Rule definitions are on the right side of the equations. In the above example, the rule definitions include a set of one or more alternative utterances (sometimes referred to as “tokens”) or rule names separated by “|”. The utterances and rule names in each rule definition define the speech patterns matching the rule.
The above rule grammar may be loaded into a speech recognizer program to enable the speech recognizer program to listen for any of the following spoken commands: “play”, “go”, “start”, “stop”, “halt”, “quit running”, “go to line 1”, “go to line 2”, “go to line 3”, and “go to line 4”. In existing systems, for an application program to respond to the above commands, the application program must include program logic mapping the specific speech patterns defined by the rule grammar to the appropriate actions. This is sometimes accomplished by embedding static strings, referred to as “tags”, in the rule grammar, which are passed to the application program within recognition results from the speech recognizer program. In the above illustrative rule grammar, the tags associated with each rule are shown within curly brackets {}, specifically “{PLAY}”, “{STOP}”, “{ONE}”, “{TWO}”, “{THREE}”, “{FOUR}”, and “{GOTO}”.
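The way the illustrative grammar yields the ten spoken commands listed above can be sketched in Java. This is only an illustration under stated assumptions: the class name GrammarExpansionSketch, the map-based encoding of the rules, and the expand method are hypothetical and are not part of JSGF or of any speech recognizer API. Tags are omitted, and each alternative is stored as a plain string in which a rule reference is written as <name>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GrammarExpansionSketch {
    // Hypothetical in-memory form of the illustrative grammar (tags omitted).
    static final Map<String, List<String>> RULES = new LinkedHashMap<>();
    static {
        RULES.put("play", Arrays.asList("play", "go", "start"));
        RULES.put("stop", Arrays.asList("stop", "halt", "quit running"));
        RULES.put("lineno", Arrays.asList("1", "2", "3", "4"));
        RULES.put("goto", Arrays.asList("go to line <lineno>"));
        RULES.put("command", Arrays.asList("<play>", "<stop>", "<goto>"));
    }

    /** Returns every utterance matched by the named rule. */
    static List<String> expand(String ruleName) {
        List<String> results = new ArrayList<>();
        for (String alternative : RULES.get(ruleName)) {
            results.addAll(expandAlternative(alternative));
        }
        return results;
    }

    /** Replaces the first rule reference in an alternative, then recurses. */
    static List<String> expandAlternative(String alternative) {
        int open = alternative.indexOf('<');
        if (open < 0) {
            // No rule reference left: the alternative is a plain utterance.
            return Collections.singletonList(alternative);
        }
        int close = alternative.indexOf('>', open);
        String prefix = alternative.substring(0, open);
        String suffix = alternative.substring(close + 1);
        List<String> results = new ArrayList<>();
        for (String sub : expand(alternative.substring(open + 1, close))) {
            results.addAll(expandAlternative(prefix + sub + suffix));
        }
        return results;
    }

    public static void main(String[] args) {
        // Prints one utterance per line, starting from the public rule.
        for (String phrase : expand("command")) {
            System.out.println(phrase);
        }
    }
}
```

Starting the expansion from the public <command> rule mirrors the fact that only that rule is exposed to the recognizer; the other rules contribute utterances only through it.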
When the speech recognizer program recognizes a command defined by the rule grammar, the speech recognizer program sends the application program recognition result information describing what was spoken by the user. Result information is passed to the application program in what is sometimes referred to as a “recognition result object.” The result information in the recognition result object includes any tag or tags associated with the grammar rule or rules matching what was spoken by the user. The application program then must determine what action or actions are to be performed in response to what was spoken by the user by interpreting the tags in the recognition result.
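The tag interpretation that the application program must perform can be sketched as follows. The sketch assumes the static tags of the first illustrative grammar arrive as a list of strings; the TagDispatchSketch class, the Player class, and the gotoLine method are hypothetical names introduced for illustration and do not come from any real recognizer API.

```java
import java.util.Arrays;
import java.util.List;

public class TagDispatchSketch {
    /** Hypothetical player that records the last action it performed. */
    static class Player {
        String lastAction = "";
        void play() { lastAction = "play"; }
        void stop() { lastAction = "stop"; }
        void gotoLine(int line) { lastAction = "goto " + line; }
    }

    /** Maps a number tag such as "THREE" to its value, or -1 if unknown. */
    static int lineFromTag(String tag) {
        switch (tag) {
            case "ONE": return 1;
            case "TWO": return 2;
            case "THREE": return 3;
            case "FOUR": return 4;
            default: return -1;
        }
    }

    /** Interprets the tags of one recognition result and drives the player. */
    static void interpretTags(List<String> tags, Player player) {
        if (tags.contains("PLAY")) {
            player.play();
        } else if (tags.contains("STOP")) {
            player.stop();
        } else if (tags.contains("GOTO")) {
            // The line number arrives as a separate tag from the <lineno> rule.
            for (String tag : tags) {
                int line = lineFromTag(tag);
                if (line > 0) {
                    player.gotoLine(line);
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        Player player = new Player();
        // "go to line 3" matches <lineno> and <goto>, so both tags are present.
        interpretTags(Arrays.asList("THREE", "GOTO"), player);
        System.out.println(player.lastAction); // prints "goto 3"
    }
}
```

Every tag string appears twice, once in the grammar file and once in the application code, which is the parallel maintenance problem described earlier.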
In more sophisticated existing systems, tags embedded in rule grammars may include or consist of portions of scripting language. For example, a more elaborate rule grammar for a hypothetical multi-media player application program might include the following rules:
<lineno>=(1|2|3|4) {line=this.tokens;};
<goto>=go to line <lineno>{action=goto;
lineno=<lineno>.line;};
<play>=(play|go|start) {action=play;};
<stop>=(stop|halt|quit running) {action=stop;};
public <command>=<play>|<stop>|<goto>;
In the above example, the tag for the <lineno> rule is “line=this.tokens;”, which is a scripting language command for assigning the value of the number that was spoken (1, 2, 3 or 4) to a “line” feature field within a feature/value table. Similarly, the tag for the <goto> rule in the above rule grammar is the scripting language “action=goto; lineno=<lineno>.line;”. When a user says “go to line 3”, the speech recognizer generates a recognition result including the tags “line=this.tokens” and “action=goto; lineno=<lineno>.line;”. The application program receives the recognition result, and, for example, passes the tags it contains to a tags parser program for interpretation of the scripting language they contain. The application program may, for example, pass the result object to the tags parser program using the following command:
FeatureValueTable fv=TagsParser.parseResult(recognitionResult);
The above command loads the result from the tag parser program (TagsParser.parseResult), operating on the above tags for example, into the feature/value table “fv”. In this example, the tags parser would first associate the value “3” with the “line” field. The tags parser would then associate the value “goto” with the “action” feature and copy the value “3” from the “line” field to the “lineno” feature. This results in logical feature/value pairs stored in the feature/value table fv as follows:
Feature   Value
action    goto
lineno    3
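The evaluation order just described can be sketched in Java. This is a minimal illustration and not the tags parser of any actual system: the TagsParserSketch class, the MatchedRule type, and the scoping convention (each rule's tag is evaluated in its own scope, and the scope of the last, outermost rule becomes the table) are assumptions chosen to reproduce the table above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagsParserSketch {
    /** Hypothetical record of one matched rule: name, matched tokens, tag text. */
    static class MatchedRule {
        final String name;
        final String tokens;
        final String tag;
        MatchedRule(String name, String tokens, String tag) {
            this.name = name;
            this.tokens = tokens;
            this.tag = tag;
        }
    }

    /** Evaluates tags innermost rule first; returns the outermost rule's features. */
    static Map<String, String> parse(List<MatchedRule> rules) {
        Map<String, Map<String, String>> scopes = new HashMap<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (MatchedRule rule : rules) {
            current = new LinkedHashMap<>();
            for (String statement : rule.tag.split(";")) {
                String[] parts = statement.trim().split("=", 2);
                if (parts.length < 2) {
                    continue; // skip empty or malformed statements
                }
                current.put(parts[0].trim(), evaluate(parts[1].trim(), rule, scopes));
            }
            scopes.put(rule.name, current);
        }
        return current;
    }

    /** Resolves "this.tokens", "<rule>.field", or a literal value. */
    static String evaluate(String expr, MatchedRule rule,
                           Map<String, Map<String, String>> scopes) {
        if (expr.equals("this.tokens")) {
            return rule.tokens;
        }
        int close = expr.indexOf(">.");
        if (expr.startsWith("<") && close > 0) {
            Map<String, String> scope = scopes.get(expr.substring(1, close));
            return scope == null ? null : scope.get(expr.substring(close + 2));
        }
        return expr; // a literal such as "goto"
    }

    public static void main(String[] args) {
        // The user said "go to line 3": <lineno> matched "3", <goto> matched the phrase.
        List<MatchedRule> rules = Arrays.asList(
            new MatchedRule("lineno", "3", "line=this.tokens;"),
            new MatchedRule("goto", "go to line 3", "action=goto; lineno=<lineno>.line;"));
        System.out.println(parse(rules)); // prints {action=goto, lineno=3}
    }
}
```

Keeping a separate scope per rule is what lets the <goto> tag read <lineno>.line without the intermediate “line” field appearing in the final table.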
Upon receipt of the feature/value table, the application program must employ specialized post-processing code to interpret the feature/value pairs it contains. An example of such post-processing is as follows:
public void interpretResult(RecognitionResult recognitionResult) {
    // Parse the scripting-language tags into a feature/value table.
    FeatureValueTable fv = TagsParser.parseResult(recognitionResult);
    String action = fv.getvalue("action");
    if (action.equals("goto")) {
        // Convert the spoken line number from a string to an integer.
        String lineno = fv.getvalue("lineno");
        int line = Integer.parseInt(lineno);
        player.goto(line);
    } else if (action.equals("play")) {
        player.play();
    } . . .
}
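A self-contained variant of this post-processing can be sketched as follows, with a plain Map standing in for the feature/value table. The InterpretResultSketch class, the Player interface, and the RecordingPlayer class are hypothetical names introduced for illustration. Because goto is a reserved word in Java and cannot be used as a method name, the sketch calls the method gotoLine instead.

```java
import java.util.HashMap;
import java.util.Map;

public class InterpretResultSketch {
    /** Hypothetical player interface assumed for this sketch. */
    interface Player {
        void play();
        void stop();
        void gotoLine(int line);
    }

    /** Test double that records each call as text. */
    static class RecordingPlayer implements Player {
        final StringBuilder log = new StringBuilder();
        public void play() { log.append("play;"); }
        public void stop() { log.append("stop;"); }
        public void gotoLine(int line) { log.append("goto ").append(line).append(";"); }
    }

    /** Maps the feature/value pairs to calls on the player. */
    static void interpretResult(Map<String, String> fv, Player player) {
        String action = fv.get("action");
        if ("goto".equals(action)) {
            // The line number is stored as a string and must be converted.
            int line = Integer.parseInt(fv.get("lineno"));
            player.gotoLine(line);
        } else if ("play".equals(action)) {
            player.play();
        } else if ("stop".equals(action)) {
            player.stop();
        }
    }

    public static void main(String[] args) {
        Map<String, String> fv = new HashMap<>();
        fv.put("action", "goto");
        fv.put("lineno", "3");
        RecordingPlayer player = new RecordingPlayer();
        interpretResult(fv, player);
        System.out.println(player.log); // prints "goto 3;"
    }
}
```

Each new command in the grammar needs another branch here, so the grammar and the application code must still be changed together.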
The above code example illustrates the complexity required in the application program to process the feature/value table. Some existing scripting language systems permit class references to be embedded in the scripting language of a tag. Class references are references to globally defined “static” objects as may be obtained using the “static” keyword in the Java programming language. The following rule definition shows such an approach:
<lineno>=(1|2|3|4) {line=HelperClass.parseInt(this.tokens);};
The above recognition rule contains a class reference to a static object named “HelperClass”. The class reference performs, for example, conversion of the string spoken by the user to an integer value. However, the only permitted references in such systems are class references, and object instanc
Adams Stuart J.
Hunt Andrew J.
Walker William D.
Azad Abul K.
Sun Microsystems Inc.
Weingarten Schurgin, Gagnebin & Lebovici LLP
Šmits Tālivaldis Ivars