Spam email detection based on n-grams with feature selection

Electrical computers and digital processing systems: multicomput – Computer conferencing – Demand based messaging

Reexamination Certificate

Details

C706S052000

Reexamination Certificate

active

07912907

ABSTRACT:
A similarity measurement manager uses n-gram analysis to identify spam email messages. The similarity measurement manager tokenizes an email message into a plurality of overlapping n-grams, wherein n is large enough to identify uniqueness of artifacts. The similarity measurement manager employs feature selection by comparing the created n-grams to n-grams of known artifacts, which were created according to the same methodology. Created n-grams that match an n-gram of a known artifact are ignored. The similarity measurement manager compares the remaining created n-grams to pluralities of n-grams of known spam email messages, the n-grams of the known spam email messages being themselves created by executing the same steps. The similarity measurement manager determines whether the email message comprises spam based on whether or not the n-gram comparison indicates that it is substantially similar to a known spam email message.
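The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions made only for illustration: the n-grams are character-based, `n` defaults to 8, "substantially similar" is approximated as the fraction of the message's remaining n-grams that also occur in a known spam message, and the cutoff is a hypothetical `threshold` of 0.8. The function names (`ngrams`, `similarity`, `is_spam`) are likewise hypothetical.

```python
from typing import Iterable, Set


def ngrams(text: str, n: int = 8) -> Set[str]:
    """Return the set of overlapping character n-grams in text.

    Character n-grams and n=8 are assumptions; the abstract only
    requires n to be large enough to make artifacts distinguishable.
    """
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def similarity(features: Set[str], spam_features: Set[str]) -> float:
    """Fraction of the message's n-grams that also occur in a spam message.

    This containment ratio is one possible similarity measure, chosen
    here for simplicity.
    """
    if not features:
        return 0.0
    return len(features & spam_features) / len(features)


def is_spam(message: str,
            known_artifacts: Iterable[str],
            known_spam: Iterable[str],
            n: int = 8,
            threshold: float = 0.8) -> bool:
    """Classify message by n-gram comparison after feature selection."""
    # Build the artifact n-grams with the same tokenization as the message.
    artifact_ngrams: Set[str] = set()
    for artifact in known_artifacts:
        artifact_ngrams |= ngrams(artifact, n)

    # Feature selection: ignore n-grams that match a known artifact.
    features = ngrams(message, n) - artifact_ngrams

    # Compare against each known spam message, processed the same way.
    for spam in known_spam:
        spam_features = ngrams(spam, n) - artifact_ngrams
        if similarity(features, spam_features) >= threshold:
            return True
    return False


if __name__ == "__main__":
    header = "Content-Type: text/plain; charset=us-ascii"
    spam_corpus = [header + "\nBuy cheap watches now at our online store today"]
    candidate = header + "\nBuy cheap watches now at our online store today!!"
    print(is_spam(candidate, [header], spam_corpus))
```

In this sketch, the shared header is treated as a known artifact, so two messages are not judged similar merely because they carry the same boilerplate. Only the n-grams that survive feature selection contribute to the similarity score.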

REFERENCES:
patent: 6167434 (2000-12-01), Pang
patent: 6249807 (2001-06-01), Shaw et al.
patent: 6282565 (2001-08-01), Shaw et al.
patent: 6289416 (2001-09-01), Fukushima et al.
patent: 6324569 (2001-11-01), Ogilvie et al.
patent: 6487586 (2002-11-01), Ogilvie et al.
patent: 6493007 (2002-12-01), Pang
patent: 6546416 (2003-04-01), Kirsch
patent: 6640301 (2003-10-01), Ng
patent: 6643685 (2003-11-01), Millard
patent: 6650890 (2003-11-01), Irlam et al.
patent: 6654787 (2003-11-01), Aronson et al.
patent: 6687740 (2004-02-01), Gough
patent: 6691156 (2004-02-01), Drummond et al.
patent: 6697942 (2004-02-01), L'Heureux
patent: 6701347 (2004-03-01), Ogilvie
patent: 6711608 (2004-03-01), Ogilvie
patent: 6732157 (2004-05-01), Gordon et al.
patent: 6757713 (2004-06-01), Ogilvie et al.
patent: 6757830 (2004-06-01), Tarbotton et al.
patent: 7272853 (2007-09-01), Goodman et al.
patent: 7421498 (2008-09-01), Packer
patent: 2002/0087641 (2002-07-01), Levosky
patent: 2002/0138581 (2002-09-01), MacIntosh et al.
patent: 2003/0149726 (2003-08-01), Spear
patent: 2003/0167311 (2003-09-01), Kirsch
patent: 2003/0191969 (2003-10-01), Katsikas
patent: 2003/0200334 (2003-10-01), Grynberg
patent: 2003/0220978 (2003-11-01), Rhodes
patent: 2003/0229672 (2003-12-01), Kohn
patent: 2003/0233415 (2003-12-01), Beyda
patent: 2004/0003283 (2004-01-01), Goodman et al.
patent: 2004/0024823 (2004-02-01), Del Monte
patent: 2004/0054887 (2004-03-01), Paulsen et al.
patent: 2004/0064734 (2004-04-01), Ehrlich
patent: 2004/0068534 (2004-04-01), Angermayr et al.
patent: 2004/0073617 (2004-04-01), Milliken et al.
patent: 2004/0093383 (2004-05-01), Huang et al.
patent: 2004/0093384 (2004-05-01), Shipp
patent: 2004/0111480 (2004-06-01), Yue
patent: 2004/0148358 (2004-07-01), Singh et al.
patent: 2004/0205173 (2004-10-01), Hall
patent: 2005/0262210 (2005-11-01), Yu
patent: 2006/0031346 (2006-02-01), Zheng et al.
patent: 2006/0149820 (2006-07-01), Rajan et al.
patent: 2006/0218115 (2006-09-01), Goodman et al.
Nicholas et al. (“Spotting Topics with the Singular Value Decomposition”, 1998, http://www.springerlink.com/content/pmnxa3681qurn1gq/, pp. 82-91).
CAUCE.org web pages [online] Coalition Against Unsolicited Commercial Email [retrieved Mar. 17, 2003] Retrieved from the Internet: <URL: http://www.cauce.org/about/problem.shtml> U.S.A.
Outlook.spambully.com web pages [online] Spam Bully [retrieved Jan. 16, 2003] Copyright 2002, Retrieved from the Internet <URL: http://outlook.spambully.com/about.php>.
NBEC/NWOCA Anti-Spam Tools, [online] [retrieved Jul. 7, 2004] retrieved from http://home.nwoca.org, Ohio, U.S.A., Jul. 7, 2004.
Kularski, C. “Compound Procedures for Spam Control,” Highland School of Technology, Gastonia, NC, U.S.A., Jan. 2004.
“Technical Responses to Spam,” Nov. 2003, Taughannock Networks, Trumansburg, New York, U.S.A.
Cranor, Faith, L., LaMacchia, Brian A., “Spam!” Communications of the ACM, vol. 41, No. 8, pp. 74-83, Aug. 1998. U.S.A.
How it Works:Spam Recognition, http://www.death2spam.net/docs/classifier.html, retrieved Aug. 18, 2005, U.S.A.
Cavnar, William B. et al., “N-Gram-Based Text Categorization”, Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, NV., USA, Apr. 13, 1994.
“N-Gram-Based Text Categorization”, 2 pages, downloaded from http://citeseer.ist.psu.edu/68861.html, Aug. 25, 2005 U.S.A.
TextCat Language Guesser, 2 pages, downloaded from http:/odur.letsug.nl/˜vannoord/Textcat/ on Aug. 25, 2005., U.S.A.
SpamAssassin, The Apache SpamAssassin Project, 2 pages, downloaded from http:/spamassasin.apache.org on Aug. 25, 2005, U.S.A.
Basis Technology's Rosette Language Identifier, 2 pages, downloaded from http:/www.basistech.com/language-identification/ on Aug. 25, 2005, U.S.A.
Karp-Rabin algorithm, 3 pages, downloaded from http:/www-igm.univ-mlv.fr/˜lecroq/string/node5.html on Sep. 1, 2005, U.S.A.
Rabin-Karp string search algorithm, 5 pages, downloaded from http://en.wikipedia.org/wiki/Rabin-Karp_string_search_alogrithm on Aug. 31, 2005 U.S.A.
The Rabin-Karp algorithm, String searching via Hashing, 5 pages, downloaded from http://www.eecs.harvard.edu/˜ellard/Q-97/HTML/root/node43 on Aug. 31, 2005 U.S.A.
