Multiscript database system and method

Data processing: database and file management or data structures – Database design – Data structure types

Reexamination Certificate

C707S793000, C709S223000

active

06560596

ABSTRACT:

FIELD OF THE INVENTION
The invention relates generally to database systems and, more particularly, to systems for generating, managing, and operating databases in a multi-script environment.
BACKGROUND OF THE INVENTION
Text is stored in computers in a wide variety of encodings. One of the earliest is ASCII (American Standard Code for Information Interchange), in which alphanumeric characters are represented by 7-bit numeric values. Thus, as illustrated by the ASCII encoding in Table 1 below, the character ‘A’ is represented by the 7-bit value ‘100 0001’ (hexadecimal 0x41).
TABLE 1
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
Rows give the high-order hexadecimal digit and columns the low-order digit.
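As a quick check of the Table 1 entry for ‘A’, the following Python sketch uses only the built-in `ascii` codec to produce a character's 7-bit binary form and hexadecimal value. The helper name `ascii_code` is illustrative, not part of any existing API.

```python
def ascii_code(ch: str) -> tuple[str, str]:
    """Return the 7-bit binary and hexadecimal forms of an ASCII character."""
    # encode() raises UnicodeEncodeError for characters outside 7-bit ASCII
    value = ch.encode("ascii")[0]
    return format(value, "07b"), f"0x{value:02X}"


binary, hexa = ascii_code("A")
print(binary, hexa)  # prints: 1000001 0x41
```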
Another character encoding is the 8-bit EBCDIC (Extended Binary Coded Decimal Interchange Code) used in traditional IBM Corporation mainframe computers, in which alphanumeric characters are represented by 8-bit numeric values. Thus, with reference to the EBCDIC encoding illustrated in Table 2 below, the character ‘A’ is represented by the 8-bit value ‘1100 0001’ (hexadecimal 0xC1). Notice that the EBCDIC alphanumeric encodings illustrated in Table 2 differ from the ASCII encodings illustrated in Table 1.
TABLE 2
     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  PF   HT   LC   DEL  GE   RLF  SMM  VT   FF   CR   SO   SI
1    DLE  DC1  DC2  TM   RES  NL   BS   IL   CAN  EM   CC   CU1  IFS  IGS  IRS  IUS
2    DS   SOS  FS        BYP  LF   ETB  ESC            SM   CU2       ENQ  ACK  BEL
3              SYN       PN   RS   UC   EOT                 CU3  DC4  NAK       SUB
7                                                 `    :    #    @    '    =    "
8         a    b    c    d    e    f    g    h    i
9         j    k    l    m    n    o    p    q    r
A         ~    s    t    u    v    w    x    y    z
C    {    A    B    C    D    E    F    G    H    I
D    }    J    K    L    M    N    O    P    Q    R
E    \         S    T    U    V    W    X    Y    Z
F    0    1    2    3    4    5    6    7    8    9    |                        EO
Rows give the high-order hexadecimal digit and columns the low-order digit. Rows 4, 5, 6 and B are not shown.
Historically, as illustrated by the ASCII and EBCDIC encodings, computer representations of text were focused entirely on English. Over time, new encodings were designed that allowed text in many languages, with many different character sets, to be represented. In this document, we use the term ‘script’ to refer to the representation of one or more languages in terms of a set of written character forms. An ‘encoding’ is a binary representation that allows text in one or more scripts to be stored in the memory of a computer.
The many script encodings grew up haphazardly and do not form a coherent system. In particular, they are not collectively self-descriptive: it is not possible to look at an arbitrary stream of data and determine what, if any, text encoding is in use. For example, if a computer receives the data value 0x7A representing a text character, and both ASCII and EBCDIC are possible encodings, then without knowing which encoding was used the computer cannot determine whether the value refers to the ASCII character ‘z’ or the EBCDIC character ‘:’. The data value itself does not convey the encoding used to create it. In addition, most encodings cannot represent text in multiple scripts within the same logical document.
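To illustrate this ambiguity, the sketch below decodes one byte, 0x7A, under both encodings using Python's standard codecs. It assumes Python's `cp037` codec (IBM's US/Canada EBCDIC code page) as the EBCDIC variant; other EBCDIC code pages assign some punctuation differently. The helper `interpretations` is a hypothetical name.

```python
def interpretations(data: bytes, encodings: list[str]) -> dict[str, str]:
    """Decode the same bytes under each candidate encoding.

    Encodings that cannot decode the bytes are skipped. Nothing in the
    bytes themselves says which result, if any, the sender intended.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding (e.g. ASCII bytes >= 0x80)
    return results


print(interpretations(b"\x7a", ["ascii", "cp037"]))
# prints: {'ascii': 'z', 'cp037': ':'}
```

Both decodings succeed, and nothing in the single byte lets the receiver choose between them.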
The existing art includes several standards that attempt to bring some order to this chaos. The ISO-8859 family of standards provides a series of one-byte-per-character encodings for European languages, yet these encodings are not self-descriptive. The ISO-2022 standard attempts to provide a complete, self-descriptive encoding that can be extended to cover all languages. However, this standard is so complex and unwieldy that it is not used in practice in its full, self-descriptive, multi-script form. Many other standards provide encodings that use one, two, or more bytes per character to represent text.
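Both points can be seen with Python's standard codecs in the sketch below: a single ISO-8859 byte means a different character in each part of the family, while ISO-2022-JP (one profile of ISO-2022) marks script changes with escape sequences embedded in the byte stream. The byte 0xE4 and the sample character are arbitrary illustrations.

```python
# One byte, three members of the ISO-8859 family: the byte alone does not
# say which character was meant.
byte = b"\xe4"
for codec in ("iso8859_1", "iso8859_5", "iso8859_7"):
    print(codec, byte.decode(codec))
# prints:
# iso8859_1 ä
# iso8859_5 ф
# iso8859_7 δ

# ISO-2022-JP switches character sets in-band: ESC $ B selects JIS X 0208
# and ESC ( B switches back to ASCII.
encoded = "A\u3042".encode("iso2022_jp")  # Latin 'A' followed by Hiragana 'a'
print(encoded)
print(encoded.decode("iso2022_jp"))  # the escapes let the decoder recover the text
```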
Unicode, standardized by ISO as ISO/IEC 10646-1:1993, provides a representation that can store most commonly used languages in a single encoding. Unicode is self-descriptive, so text encoded in Unicode also carries information identifying its script. Unicode thus overcomes many of the shortcomings of preexisting script encodings. But while Unicode is becoming widespread, many applications have serious difficulty accommodating existing non-Unicode information alongside it. In addition, many commonly installed computer systems do not handle Unicode at all and can handle only a single encoding at a time. These legacy systems will remain in use for a long time.
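As a contrast, the sketch below (Python, standard library only) stores Latin, Cyrillic, and Chinese characters in one string and encodes it as UTF-8, one of the Unicode encoding forms. It then shows that a single-script legacy encoding, here ISO-8859-1, cannot represent the same text. The sample string is arbitrary; `unicodedata.name` reports each character's Unicode name, which includes its script.

```python
import unicodedata

text = "A\u0416\u4e2d"  # Latin A, Cyrillic Zhe, CJK ideograph meaning "middle"

# UTF-8 represents all three scripts in one byte sequence and round-trips it.
utf8 = text.encode("utf-8")
assert utf8.decode("utf-8") == text

# Each code point has a single identity, and its name names the script.
for ch in text:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# A single-script legacy encoding cannot hold the same text.
try:
    text.encode("iso8859_1")
except UnicodeEncodeError as err:
    print("ISO-8859-1 cannot encode:", text[err.start:err.end])
```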
The problem of multiple encodings is traditionally addressed in software applications by building multiple versions of a software system, one per encoding. One version may be adapted to handle English-based scripts while another handles Chinese-based scripts. Each user interacts with the software using the encoding native to their particular computer system.
This model fails to cope with the needs of international business, particularly on the World Wide Web. In the emerging international marketplace, businesses need to present an interface to users in many languages and accept responses from them. Furthermore, as the marketplace becomes more global, the mechanism for information exchange between these divergent markets (and thus divergent computer systems) must be able to handle a wide variety of scripts. Until and unless the majority of users use Unicode-enabled systems and software, these business interfaces must cope with the existing inventory of text encodings.
This problem is particularly acute on the World Wide Web, where the standards for information exchange and presentation have well-known inadequacies in the area of character encodings. For instance, pages of information sent to users can be marked with an encoding so that the text may be correctly displayed. However, responses from the user to the business server are not marked with any encoding at all. This impairs the development of truly worldwide software applications.
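The sketch below (Python) shows why an unlabeled response is a problem: the same user-typed word arrives as different bytes depending on the encoding the user's browser chose, and a server that guesses wrong silently produces garbled text. The word and the two encodings are arbitrary examples.

```python
typed = "caf\u00e9"  # the word "café" as the user typed it

# Two browsers submitting the same form in different encodings.
as_utf8 = typed.encode("utf-8")      # b'caf\xc3\xa9'
as_latin1 = typed.encode("latin-1")  # b'caf\xe9'

# The response carries no encoding label, so the server must guess.
print(as_utf8.decode("latin-1"))  # wrong guess: prints cafÃ©
print(as_latin1.decode("utf-8", errors="replace"))  # wrong guess: "caf" + U+FFFD
```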
As a result, the existing art offers no good means of accepting user responses in an arbitrary text encoding and processing them. The problem is particularly acute for database lookups. While existing database management systems (DBMSs) can store text in the many national encodings (and, in some cases, Unicode), they provide no assistance in looking up a string whose encoding is unknown.
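One naive workaround for the lookup problem, sketched below in Python, is to try several candidate decodings of the incoming bytes and look each result up in a table whose keys are stored as text. This is an illustrative assumption, not the method of the invention; `CANDIDATES`, `customers`, and `lookup_unknown` are hypothetical names.

```python
# Hypothetical candidate encodings a server might expect from its users.
CANDIDATES = ("utf-8", "shift_jis", "latin-1")

# Hypothetical table keyed by names stored as text.
customers = {"caf\u00e9": "record 17", "\u6771\u4eac": "record 42"}


def lookup_unknown(raw: bytes, table: dict[str, str]) -> list[tuple[str, str]]:
    """Return (encoding, value) pairs for every decoding of raw found in table."""
    hits = []
    for enc in CANDIDATES:
        try:
            key = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # these bytes are not valid in this encoding
        if key in table:
            hits.append((enc, table[key]))
    return hits


print(lookup_unknown("\u6771\u4eac".encode("shift_jis"), customers))
print(lookup_unknown("caf\u00e9".encode("latin-1"), customers))
```

Because `latin-1` assigns a character to every byte, it always decodes successfully, so a hit is evidence of the sender's encoding rather than proof, and the cost grows with every encoding added to the candidate list.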
Any solution to the problem of processing user responses in arbitrary encodings has to be compatible with existing databases.
SUMMARY OF THE INVENTION
It is a goal of the present invention to provide a system and method for handling multiple script encodings.
It is a further goal of the present invention to facilitate the use of multiple script encodings in software and database applications.
It is an additional goal of the present invention to provide a system and method for augmenting existing databases to handle multiple script encodings.
It is an additional object of the invention to provide a system and method to manage an Internet Protocol Domain Name Service utilizing a variety of script encodings.
It is a further object of the invention to modify existing Internet Protocol Domain Name Service databases to manage both legacy script encodings and additional non-legacy script encodings.
The present invention is directed to a database system and method for storing information referenced by a name encoded according to at least two scripts. The database system includes a first database containing first information pertaining to the name retrieved from the first database via a first key. The first key contains the name encoded in a first script. The database system further includes a second database containin
