Forgot your password?  

Not What You Meant?  There are 21 definitions for String.  Also try: Pattern search.

String-Matching Algorithms | Research & Encyclopedia Articles

Print-Friendly   Order the PDF version   Order the RTF version
About 2 pages (716 words)
String searching algorithm Summary

 


String-Matching Algorithms

One of the important basic tasks in computing is the finding of all occurrences of a particular alphanumeric string in a document, or the finding of all documents that have one or more occurrences of a given alphanumeric string. One common example where such a task needs to be carried out is on the World Wide Web, where search engine software needs to find all occurrences of a given query term or keyword, and return the results based on some rules of priority and relevance. It is also common for there to be find-and-replace macros or features in text-editing programs, allowing for all occurrences of a certain string to be replaced with another string. For example, in writing many form letters having the same nature, it is possible to just change the salutation: Dear Ms. Smith, Dear Mr. Jones, Dear Mrs. Fowler, Dear Dr. Koopman, etc., while keeping the rest of the letter the same. String-matching algorithms are also applied in genetic engineering to find matching patterns in DNA sequences.

The applications of string-matching are thus diverse, but they all have a common point--the efficiency of algorithms used is extremely important. This is especially true in cases like web searches--search engines receive thousands of search requests each minute, and even with the software running on very high-end computing equipment, it does not pay to delay the results. The algorithm must return results very quickly to free up the hardware to service other queries. It is also possible that users of search engines will be disappointed and turn away, if their queries are not answered very quickly. Thus, there is in such cases a very great need for speed, so the algorithms used must be designed with this in mind.

The naive string-matching algorithm simply uses a brute-force comparison by sliding the test string along the string to be compared with, and making a comparison at each and every possible point. This is a meaningful but very inefficient way of determining a match. A more efficient method was created by Richard Karp and Michael O. Rabin; their algorithm is almost as bad as the naive algorithm in the worst case, but has much better best-case and average performance. Other algorithms have also been proposed; the best deterministic one known and accepted widely at this time is the Boyer-Moore algorithm of Robert S. Boyer and J. Strother Moore.

In many cases, it is necessary to have fast on-line approximate string matching. This is the problem of searching a pattern in a text allowing errors in the pattern or in the text. Algorithms to do this are commonly based on a very fast string-matching kernel which is able to search short patterns using a nondeterministic finite automaton. A number of techniques and heuristics are possible to extend this kernel for longer patterns. However, the techniques can be integrated in many ways and there is no single optimal solution.

In many applications, it is necessary to determine the similarity—or lack thereof--of two strings. Similarity is in a broad sense vague and intuitive, but a practical definition of the abstraction that is widely used is the so-called edit distance--the minimum number of insertions, deletions, and substitutions required to transform one string into the other. A number of stochastic models have been proposed to determine the string edit distance.

A more important general case of the problem of string matching is that of pattern matching. This is a combinatorial problem that addresses issues of searching and matching strings and more complicated patterns such as trees, regular expressions, graphs, point sets, and arrays. The goal is to derive non-trivial combinatorial properties for such structures and then to exploit these properties in order to achieve improved performance for the corresponding computational problem of matching patterns.

The area of pattern matching has attracted a great deal of intensive research; the subject itself has thus changed from a sparse set of isolated results into a full-fledged area of algorithmics with some important practical applications. The subject is expected to grow even further due to the increasing demand for speed and efficiency that come from molecular biology and genetic engineering, but also from areas such as information retrieval, pattern recognition, biometric authentication (such as speech and speaker recognition, feature recognition, and the like), program compilation, data compression, program analysis, and system security.

This is the complete article, containing 716 words (approx. 2 pages at 300 words per page).

Ask any question on String searching algorithm and get it answered FAST!
Answer questions in BookRags Q&A and earn points toward
discounted or even FREE Study Guides and other BookRags products!
Learn more about BookRags Q&A
Copyrights
String-Matching Algorithms from World of Computer Science. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.

Join BookRagslearn moreJoin BookRags

Join BookRagslearn moreJoin BookRags