Unicode conversion into multiple encodings

Coded data generation or conversion – Digital code to digital code converters – To or from alphanumeric code formats

Reexamination Certificate

active

06204782

ABSTRACT:

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a system for converting between character codes for printed or displayed text and, more particularly, to a code converter for converting one character set to multiple character sets.
2. Description of the Related Art
Computers and other electronic devices typically use text to interact with users. The text is usually displayed on a monitor or some other type of display device. Because the text must be represented in digital form inside the computer or other electronic device, a character set encoding must be used. Generally speaking, a character set encoding assigns each character of the character set a unique digital representation. The encoded characters correspond to letters, numbers and various text symbols, and are assigned numeric codes for use by computers or other electronic devices. The most popular character set for use with computers and other electronic devices is the American Standard Code for Information Interchange (ASCII). ASCII uses 7-bit sequences for its encodings. In other countries, different character sets are used. In Europe, the dominant character encoding standards are the ISO 8859-X family, especially ISO 8859-1 (called “Latin-1”), developed by the International Organization for Standardization (ISO). In Japan, the dominant character encoding standard is JIS X0208, where JIS refers to the Japanese Industrial Standard, developed by the Japanese Standards Association (JSA). Examples of other existing character sets include Mac™ OS Standard Roman encoding (by Apple Computer, Inc.), Shift-JIS (Japan), Big5 (Taiwan), and many more.
With the ongoing globalization of business and networks, it has become important for computers or other electronic devices to be able to handle multiple character encodings. For example, the same computer or electronic device may be used by persons of different nationalities who wish to interact with the computer or other electronic device in a different language. For each such language a different character set encoding is usually needed. However, character sets for the same language can also differ.
There is also a need to be able to convert from one character set encoding to another encoding. For example, a user in France using ISO 8859-1 may want to send an electronic mail message in French to a user in Israel who is using ISO 8859-8. Because the sender and receiver are using different character set encodings, the non-ASCII characters in the message will be garbled for the user in Israel. Ideally, one of the computers or electronic devices would convert from one character set to another character set. This has been achieved to a limited extent between a few character sets, but is largely not possible with modern computers or electronic devices. Code conversion is made difficult because of the numerous different character standards and the often conflicting or inconsistent national standards.
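The garbling described above can be reproduced with a short sketch. It assumes Python's built-in `iso8859_1` and `iso8859_8` codecs as stand-ins for the sender's and receiver's character sets; the sample message is purely illustrative.

```python
# Illustrative message containing non-ASCII French characters.
message = "caf\u00e9 \u00e0 Paris"  # "café à Paris"

# The sender in France encodes the text with ISO 8859-1 (Latin-1).
wire_bytes = message.encode("iso8859_1")

# The receiver in Israel interprets the same bytes as ISO 8859-8.
# ASCII bytes survive, but bytes above 0x7F map to Hebrew letters
# (or to nothing at all, hence errors="replace").
garbled = wire_bytes.decode("iso8859_8", errors="replace")

print(garbled)  # the accented letters now show up as Hebrew letters
```

Only the two accented letters change, which is why such messages often look mostly readable yet are still wrong.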
The Unicode™ standard (hereafter simply Unicode or Unicode standard) was developed to provide an international character encoding standard. The designers of the Unicode standard sought, and provided, a more efficient and flexible method of character identification. The Unicode standard includes the characters of all major international standards approved and published before Dec. 31, 1990, as well as other characters not in previous standards. The characters are encoded in the Unicode standard without duplication. The codes within the Unicode standard are 16 bits (or 2 bytes) wide.
A character code standard such as the Unicode standard facilitates code conversion and enables the implementation of useful processes operating on textual data. For example, in accordance with the above example, the computer or other electronic device in France can transmit Unicode characters, and the computer or other electronic device in Israel can convert the Unicode characters it receives into a Hebrew-based character set that is compatible with the computer or other electronic device in Israel. For additional detail about the Unicode standard, see, e.g., The Unicode Standard, Worldwide Character Encoding, Version 2.0, Addison-Wesley 1996, which is hereby incorporated by reference in its entirety.
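As a rough illustration of using Unicode as the interchange form, the sketch below transmits text as 16-bit code units and converts it on the receiving side to a Hebrew-based character set. The choice of `utf-16-be` for the wire format and `iso8859_8` for the receiver's local character set are assumptions made for the example, not details taken from the source.

```python
# Text to exchange: ASCII letters followed by the Hebrew word "shalom".
text = "Shalom \u05e9\u05dc\u05d5\u05dd"

# Sender: transmit the text as 16-bit Unicode code units (assumed wire format).
wire = text.encode("utf-16-be")

# Receiver: recover the Unicode characters, then convert them to the
# local Hebrew-based character set (assumed here to be ISO 8859-8).
received = wire.decode("utf-16-be")
local_bytes = received.encode("iso8859_8")

# Each character took 2 bytes on the wire and 1 byte in the local set.
print(len(text), len(wire), len(local_bytes))
```

The conversion succeeds here because every character in the sample exists in the receiver's character set; text that mixes scripts would not fit a single target encoding.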
One problem with Unicode is that when the Unicode text originates from multiple different encodings, it is difficult to convert the Unicode text back to the original multiple different encodings. In particular, some computer systems, or applications that execute on computer systems, do not support Unicode encodings. Hence, when such computer systems or applications receive Unicode text, they are not able to properly utilize the text. Consequently, code conversion of the Unicode text to a target encoding understood by the computer system or application is needed. The difficulty is that when the Unicode text originates from multiple different encodings, the computer system (e.g., operating system) would not normally understand how to convert the Unicode text back to the original multiple different encodings. In some cases, font or style information might be available and associated with the Unicode text so as to provide a suggestion as to the originating encodings. However, often such font or style information is not available.
Thus, there is a need for improved approaches to converting Unicode text to multiple different encodings.
SUMMARY OF THE INVENTION
Broadly speaking, the invention relates to techniques for converting source text (e.g., Unicode text) to multiple different encodings. The invention operates without any font or style information that would suggest the original encoding types. The invention is able to intelligently determine which of a variety of available target encodings are most appropriate for the given source text. The determination of the most appropriate target encodings can be flexible enough to accommodate different criteria or tolerance levels in performing its conversion. The criteria can, for example, be determined according to the intended use for the converted text, namely printing or displaying of the converted text. The various tolerance levels can, for example, include strict, loose or fallbacks. Another aspect of the invention pertains to the automatic identification of those target encodings that are available.
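This section names the tolerance levels but does not define them, so the following is only one hypothetical reading, not the patented behavior. In this sketch, "strict" accepts only a direct mapping, "loose" additionally accepts a compatibility-normalized form, and "fallback" substitutes a placeholder as a last resort. The function name `convert_char`, the use of NFKC normalization, and the `?` placeholder are all assumptions introduced for illustration.

```python
import unicodedata


def convert_char(ch, encoding, tolerance="strict"):
    """Return the encoded bytes for ch, or None if the tolerance level forbids it.

    Hypothetical levels: "strict", "loose", "fallback".
    """
    # Strict: the character must map directly into the target encoding.
    try:
        return ch.encode(encoding)
    except UnicodeEncodeError:
        pass

    # Loose (assumed meaning): accept a compatibility-normalized equivalent,
    # e.g. the ligature U+FB01 becomes the two letters "fi".
    if tolerance in ("loose", "fallback"):
        candidate = unicodedata.normalize("NFKC", ch)
        try:
            return candidate.encode(encoding)
        except UnicodeEncodeError:
            pass

    # Fallback (assumed meaning): substitute a placeholder character.
    if tolerance == "fallback":
        return b"?"
    return None


print(convert_char("\ufb01", "iso8859_1", "strict"))
print(convert_char("\ufb01", "iso8859_1", "loose"))
print(convert_char("\u05d0", "iso8859_1", "fallback"))
```

A caller preparing text for display might prefer the looser levels, while one that needs faithful round-tripping might insist on the strict one.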
The invention can be implemented in numerous ways, including as a system, an apparatus, a method, or computer readable medium. Several embodiments of the invention are summarized below.
As a code conversion system for converting a source string to a target string, an embodiment of the invention includes: a target encoding list containing available target encodings for the code conversion system; and a multi-encoding code converter that receives the source string and converts the source string into the target string, the target string including a plurality of encoding runs of different ones of the available target encodings.
As a computer-implemented method for converting a source encoding to target encodings selected from available target encodings, an embodiment of the invention includes the acts of: receiving a source text block, the source text block including a series of text elements; selecting one of the available target encodings; selecting one of the text elements from the source text block; determining whether the selected text element can be converted into the selected target encoding; selecting a next one of the text elements from the source text block and repeating the determining when the selected text element can be converted into the selected target encoding; and selecting another one of the available target encodings and repeating the determining for the selected text element when the selected text element cannot be converted into the selected target encoding.
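The acts recited above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the claimed implementation: text elements are taken to be single characters, "can be converted" is approximated by whether Python's codec accepts the character, and the names `can_convert` and `convert_to_runs` are hypothetical.

```python
def can_convert(element, encoding):
    """Assumed test: the element converts if the codec can encode it."""
    try:
        element.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def convert_to_runs(source_text, available_encodings):
    """Split source_text into runs of (encoding, bytes).

    Stays with the currently selected target encoding while elements
    convert, and selects another available encoding when one does not.
    """
    if not available_encodings:
        raise ValueError("no target encodings available")

    runs = []
    current = available_encodings[0]  # select one of the available encodings
    for element in source_text:       # select each text element in turn
        if not can_convert(element, current):
            # Select another available target encoding for this element.
            current = next(
                (enc for enc in available_encodings if can_convert(element, enc)),
                None,
            )
            if current is None:
                raise ValueError(f"no target encoding for {element!r}")
        data = element.encode(current)
        if runs and runs[-1][0] == current:
            runs[-1] = (current, runs[-1][1] + data)  # extend the current run
        else:
            runs.append((current, data))              # start a new run
    return runs


# French text followed by Hebrew text, with two assumed target encodings.
text = "caf\u00e9 \u05e9\u05dc\u05d5\u05dd"
runs = convert_to_runs(text, ["iso8859_1", "iso8859_8"])
for encoding, data in runs:
    print(encoding, data)
```

Keeping the current encoding until an element fails to convert tends to produce longer runs, since shared characters such as spaces stay in whichever run is already open.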
As a computer-implemented method for producing a target encoding list for use by a code conversion system in converting characters in a source encoding to at least one target encoding, the target encoding list and the code conversion system being associated with a computer system
