Method and apparatus for mRNA assembly

Data processing: measuring – calibrating – or testing – Measurement system in a specific environment – Biological or biochemical

Reexamination Certificate

Details

C702S020000, C435S006120, C536S023100

active

06625545

ABSTRACT:

FIELD OF THE INVENTION
The present invention relates to automatic assembly of mRNA sequences from databases containing large numbers of partial cDNA sequences.
BACKGROUND OF THE INVENTION
In human cells, genetic material is stored as DNA in a nucleus of the cell. When a certain protein is needed by the cell, a portion of the DNA is transcribed as mRNA, which is transported to the cytoplasm of the cell. In the cytoplasm, ribosomes create proteins, using the mRNA as a template. Generally, the mRNA comprises a long sequence of bases, each triplet (codon) of which encodes a specific amino acid. Thus, a sequence of triplets encodes a sequence of amino acids, which form a protein.
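The triplet reading described above can be illustrated with a minimal Python sketch. The function name `translate` and the example sequence are illustrative assumptions, not part of the original text, and the table lists only a handful of the 64 codons of the standard genetic code.

```python
# Partial codon table (standard genetic code); only a few of the 64 codons are listed.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "GGC": "Gly", "GCU": "Ala",
    "AAA": "Lys", "UGG": "Trp", "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def translate(mrna):
    """Read an mRNA string three bases at a time and return the amino acids."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        # Codons missing from this partial table are reported as "???".
        amino_acid = CODON_TABLE.get(codon, "???")
        if amino_acid == "Stop":
            break  # a stop codon ends the protein
        protein.append(amino_acid)
    return protein


# Hypothetical example sequence: AUG UUU GGC AAA UAA
example = translate("AUGUUUGGCAAAUAA")
print(example)
```

Reading starts at the first base here for simplicity; a real reading frame would be located by searching for the start codon.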
Cell function can, theoretically, be analyzed by determining the type of and ratio between the proteins in the cell. However, proteins are very delicate materials, which are difficult to analyze. mRNA, which controls the creation of the proteins, is easier to separate and analyze. Although several different mRNA sequences may encode similarly acting proteins, each mRNA sequence encodes only a single protein. In addition, there is usually a good correlation between the relative amount of different types of mRNA and the relative amounts of protein. It is thus possible to analyze cell function by analyzing the mRNA in a cell.
It should be noted that mRNA contains two types of information which are not evident from DNA. First, the relative concentration of the mRNA indicates the abundance of a particular protein. Second, in the process of transcribing DNA, changes, especially deletions, are made to the nucleotide sequence.
Differential analysis is used to generate standardized databases of human cellular activity by determining differences between gene expression in sick cells and healthy cells and between cells from different tissues. The result of a differential analysis between two cells is the difference in the type and expression level of mRNA sequences. In some cells, for example cancer cells, there is a higher concentration of certain proteins than in healthy cells of the same tissue. Determining these differences can help researchers determine how a cancer cell functions differently from healthy cells. Analysis of mRNA is currently being used to generate drug leads. For example, by selectively blocking those proteins which are more common in cancer cells, using designer pharmaceuticals, it may be possible to disrupt the functioning of cancer cells without significantly affecting the functionality of regular cells. Also, when developing pharmaceuticals for bacterial, prion and viral infections, it is useful to design a pharmaceutical which selectively blocks proteins which are necessary for the life and/or reproduction of the disease agent, but which does not block proteins necessary for human cell survival.
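As a rough illustration of what a differential analysis between two cells yields, the following Python sketch subtracts per-sequence expression levels. The function name, the sequence identifiers and the expression values are all made up for illustration and do not come from any real data set.

```python
def differential_expression(sample_a, sample_b):
    """Return {sequence_id: level_in_a - level_in_b} for every sequence in either sample.

    A sequence absent from a sample is treated as having expression level 0.
    """
    ids = set(sample_a) | set(sample_b)
    return {
        seq_id: sample_a.get(seq_id, 0) - sample_b.get(seq_id, 0)
        for seq_id in sorted(ids)
    }


# Hypothetical expression levels (arbitrary units) for two cells of the same tissue.
healthy = {"mRNA_1": 10, "mRNA_2": 5}
tumor = {"mRNA_1": 10, "mRNA_2": 40, "mRNA_3": 7}

diff = differential_expression(tumor, healthy)
# Sequences more abundant in the tumor cell are candidate drug leads.
overexpressed = [seq_id for seq_id, delta in diff.items() if delta > 0]
print(diff)
print(overexpressed)
```

A plain difference is used only to keep the sketch short; ratios or statistical tests would be natural alternatives.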
Thus, it can easily be appreciated why pharmaceutical companies, research institutes and biotechnology companies maintain large databases of partial mRNA sequences. Such sequences, known as ESTs (Expressed Sequence Tags), often have associated information, such as the tissue type and/or disease type where the EST is expressed and/or the expression level of the EST in these situations. Some databases include complete mRNA sequences. In some cases, a genomic database can be analyzed to yield mRNA sequences, if the introns are correctly identified.
ESTs are generated using the following (greatly simplified) process: a cell is selected and disrupted; proteins and other cell structures are selectively disintegrated; mRNA sequences are isolated and converted to cDNA sequences; cDNA sequences are inserted into host cells, which can be cultured; individual host cells are disrupted; and a segment of DNA which includes the cDNA or original mRNA sequence at a known location thereof is located and read out.
Unfortunately, the art of reading mRNA sequences is not yet completely developed. The error rate of the reading increases with increasing length of the mRNA sequence. The common errors are insertion or deletion of bases, and errors in the identification of individual bases. At a certain sequence length, the error rate increases to a point where further reading is not possible. As a result, most ESTs are only 200-600 bases long, while an average mRNA sequence is typically 1000-3000 bases long.
In addition, EST databases contain many other types of errors, which may be accumulated during the complicated process of EST generation, in addition to features inherent in the mRNA which make the assembly difficult. These causes of difficulty include:
(a) Chimeric sequences. During the process of extracting and replicating the mRNA and cDNA, chimeric sequences may be inadvertently inserted into the nucleotide sequences. Such chimeric sequences include ribosomal RNA, junk sequences from the extraction and replication process, contamination from external sources, such as human cells, and contamination from the host cells.
(b) Intron Contamination. Introns are portions of the DNA which are not expressed in the final mRNA product and are usually removed from the mRNA during the middle of the transcription process (splicing). However, since the cell is disrupted in the middle of its normal activity, the transcription process may be incomplete or otherwise disrupted, for example by introns being incorporated in the mRNA sequences.
(c) Broken and respliced sections. During the process of extraction and replication the mRNA sequences may be broken and, in some cases, may be reconnected, not necessarily correctly. In addition, whole sections of mRNA sequences may be inadvertently removed.
(d) Alternative splicing. This is not an error in the ESTs but it is an important cause of mismatch between ESTs. The transcription of DNA to mRNA does not follow a one-to-one correspondence. Depending on various conditions in the cell, a single DNA sequence may be transcribed as several different mRNA sequences. The different transcriptions, named alternative splice variants, are usually achieved by certain segments of the DNA being selectively spliced out. Thereafter, selected portions of the mRNA, named alternative spliced regions, are selectively spliced out of the mRNA sequence. As a result, there may be two mRNA sequences which do not exactly match, even though they originate from the same DNA sequence and contain no errors.
(e) Redundancy Level. The process of extracting the ESTs includes replication of mRNA sequences and there is usually more than one copy of each mRNA in a living cell. In addition, as most databases contain ESTs extracted in many experiments, many ESTs can be expected to appear in several experiments. As a result, there is a high redundancy of ESTs in the raw database. However, due to the errors in reading out the ESTs, the ESTs will not exactly match. Also, even though there may be significant overlap between two or more ESTs, they will usually have different start and end points and different lengths. This lack of consistency makes the task of assembly more difficult.
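The difficulty described in item (e), overlapping ESTs that do not match exactly and that start and end at different points, can be sketched in Python as a search for the longest suffix-to-prefix overlap that tolerates a few misread bases. This is only an illustrative sketch and not the method of the invention. The function names, the thresholds `min_len` and `max_mismatch_rate`, and the two example sequences are assumptions; only substitutions are tolerated, whereas inserted or deleted bases would require a proper alignment.

```python
def best_overlap(left, right, min_len=8, max_mismatch_rate=0.1):
    """Length of the longest suffix of `left` matching a prefix of `right`.

    A candidate overlap qualifies when its number of mismatched bases does not
    exceed max_mismatch_rate * length. Returns 0 if no overlap of at least
    `min_len` bases qualifies. Insertions and deletions are not handled.
    """
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        suffix, prefix = left[-length:], right[:length]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * length:
            return length
    return 0


def merge(left, right, overlap):
    """Join two ESTs, keeping the bases of `left` inside the overlapping region."""
    return left + right[overlap:]


# Hypothetical ESTs sharing a 12-base region; est_b carries one misread base in it.
est_a = "TTACGCGGATTCAGGCTA"
est_b = "GGATTGAGGCTACCGTAT"

overlap = best_overlap(est_a, est_b)
contig = merge(est_a, est_b, overlap)
print(overlap, contig)
```

The search is quadratic in the sequence length, which is acceptable for a pair of short ESTs but not for comparing every pair in a large database.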
As an end result, EST databases generally contain only short ESTs, which must then be correctly associated and assembled into the original mRNA sequences. However, due to the above-described problems, it is very difficult to correctly match up the ESTs. In general, the limiting factor in this field is information analysis, rather than information volume.
If the ESTs are correctly matched, the discovery and/or development of new pharmaceuticals is made easier and faster. For example, assuming 20 ESTs are determined by differential analysis to be found in a cancer cell rather than a healthy cell, 20 leads must be pursued to find a drug which may disrupt the cancer cell. However, if the 20 ESTs are combined to form 2 complete mRNA sequences, only 2 leads need to be pursued, reducing the volume of work by a factor of 10.
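The arithmetic of this example can be reproduced with a small Python sketch that groups ESTs into clusters once pairs of them have been judged to overlap. The union-find grouping, the function name `count_clusters` and the particular list of overlapping pairs are illustrative assumptions, chosen so that 20 ESTs fall into 2 groups as in the example above.

```python
def count_clusters(n_ests, overlapping_pairs):
    """Count groups of ESTs connected, directly or indirectly, by overlaps."""
    parent = list(range(n_ests))

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlapping_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    return len({find(i) for i in range(n_ests)})


# Hypothetical overlaps: ESTs 0-9 chain together, as do ESTs 10-19.
pairs = [(i, i + 1) for i in range(0, 9)] + [(i, i + 1) for i in range(10, 19)]

leads = count_clusters(20, pairs)
reduction = 20 // leads
print(leads, reduction)
```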
SUMMARY OF THE INVENTION
It is an object of some embodiments of the present invention to provide a method of mRNA assembly which reduces existing raw EST databases, removes errors therefrom and facilitates the creation of longer and/or complete
