


 


Optical Character Recognition (OCR)

Although computer hardware has made tremendous strides over the last few decades, especially in processor speed and sophistication, some things have changed far less. Chief among them is the constraint of the WIMP interface used in human-computer interaction, which is much the same now as it was twenty years ago. WIMP is an acronym for window, icon, menu, and pointing device, and describes the sort of interface that has been common for quite some time. A human user typically interacts with the computer through a screen, a keyboard, and a mouse. The screen is the most important, and often the only, means for the computer to communicate its output. It typically shows one or more windows, each of which may be used for a separate file or other environment, along with icons for various objects and menus from which the user may choose options. The user selects icons, menus, or windows with a pointing device (typically a mouse).

The WIMP interface certainly has its good points, but the fact that it has changed so little over such a long period, while other aspects of computing have improved greatly, is troublesome. In many contexts, being able to take ordinary notes in longhand and have a computer interpret them would be far easier than using a keyboard and formatting text. Handwriting recognition systems that serve limited purposes do exist, but they are not widespread at present, for several reasons. First, human handwriting is very varied, and it is difficult to create a system that handles the diverse range of possible styles. Second, a single individual's handwriting can change with age, illness, injury, mood, and the like. Lastly, certain specialized subjects such as medicine or mathematics have their own involved styles and notations, which differ so much from common language that the computer would almost need to be a physician or a mathematician to understand the writings of professionals in those subjects.

Although full-scale computer comprehension of human writing is impossible at this time, systems that can, to a limited extent, translate writing into computer files do exist. Such systems are also useful for reading printed text (either computer printouts for which the software and files are unavailable, or printed text not generated by a computer) and creating a computer file that can be edited or used for other purposes. The common feature of such systems is optical character recognition (OCR). The term covers character recognition by a machine in general, including the transformation of anything humanly readable into a machine-manipulable representation. As noted, character recognition may involve either machine-printed characters or human handwriting. At this time, significant advances have been made only in machine-printed character recognition, and most software available on the market is geared toward that function. For this reason, some people prefer a separate term, ICR, or intelligent character recognition, for systems that are explicitly meant to recognize handwritten characters as well. ICR is very much an active research area, and a complete solution does not seem imminent. Most working models rely on the constraints of a limited vocabulary and "clean" handwriting. Some also involve "training" the software with large, known samples of a person's writing before it can recognize characters produced by that person's hand.
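To illustrate why a limited vocabulary helps, here is a minimal Python sketch of one possible approach: a recognizer's raw guess is snapped to the closest word in a small known word list. The word list, the function name snap_to_vocabulary, and the similarity cutoff are all assumptions made for illustration; the article does not describe how any particular system applies its vocabulary.

```python
import difflib

# Hypothetical limited vocabulary; a real system would load a domain word list.
VOCABULARY = ["aspirin", "ibuprofen", "insulin"]


def snap_to_vocabulary(raw_word, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the closest vocabulary word, or raw_word if none is similar enough.

    The cutoff (0 to 1) is an assumed similarity threshold, not a tuned value.
    """
    matches = difflib.get_close_matches(raw_word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_word
```

For example, if a recognizer misreads one letter and produces "asplrin", snap_to_vocabulary("asplrin") returns "aspirin", while a string that resembles nothing in the list is passed through unchanged. The smaller the vocabulary, the fewer plausible candidates each misread word has, which is one reason restricted vocabularies make the problem more tractable.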

With OCR software, converting printed text on paper into electronic files can be almost as easy as feeding the pages to a scanner. An OCR package takes an image (actually, a bitmap graphic) from the scanner, analyzes it, and tries to translate it into a text file. Each character is recognized by its shape, and a corresponding character is generated and placed in the file. This file can be edited and formatted with a word processing package, just like any other text file. This is a significant advantage when large quantities of printed text have to be converted into electronic data. However, although characters are recognized well enough for most words and even sentences to appear correctly in the generated file, a significant limitation is that most OCR software today does not correctly recognize footnotes, page numbers, marginal comments, headers, italicized text, and the like. Unusual fonts and drop caps may also produce incorrect results. Contemporary OCR also does not format the text into a replica of the original document; all it achieves is a computer file containing all the text on the original page. A fair amount of post-processing by hand is thus needed to obtain a version that is close or identical to the original.
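The idea of recognizing each character by its shape can be sketched with a toy template matcher in Python. This is only an illustration under strong assumptions: the 5x3 glyph shapes, the fixed glyph width, and the single blank column between characters are all invented here, and commercial OCR packages use far more elaborate methods than a pixel-by-pixel comparison.

```python
# Toy glyph templates: 5 rows x 3 columns, '#' = ink, '.' = blank.
# These shapes are invented for illustration; real fonts are far richer.
TEMPLATES = {
    "O": ["###", "#.#", "#.#", "#.#", "###"],
    "C": ["###", "#..", "#..", "#..", "###"],
    "R": ["##.", "#.#", "##.", "#.#", "#.#"],
}

GLYPH_WIDTH = 3  # assumed fixed width of every character, in pixels


def mismatches(glyph, template):
    """Count pixels where the scanned glyph differs from the template."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the character whose template is closest in shape."""
    return min(TEMPLATES, key=lambda ch: mismatches(glyph, TEMPLATES[ch]))


def recognize_line(bitmap):
    """Recognize a line of fixed-width glyphs separated by one blank column."""
    step = GLYPH_WIDTH + 1
    width = len(bitmap[0])
    chars = []
    for left in range(0, width, step):
        glyph = [row[left:left + GLYPH_WIDTH] for row in bitmap]
        chars.append(recognize_glyph(glyph))
    return "".join(chars)


# A "scanned" bitmap containing three glyphs side by side.
SCANNED = [
    "###.###.##.",
    "#.#.#...#.#",
    "#.#.#...##.",
    "#.#.#...#.#",
    "###.###.#.#",
]

# An "O" with one corner pixel missing, as a smudge or scanning flaw might cause.
SMUDGED_O = ["###", "#.#", "#.#", "#.#", "##."]
```

Here recognize_line(SCANNED) yields the text "OCR", and recognize_glyph(SMUDGED_O) still yields "O" because that template remains the nearest match. The same sketch hints at the limitations noted above: an italic or unusual font would not line up with any stored template, and nothing in the output records where on the page the text appeared.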

The success of an OCR effort depends on several factors, among them the quality of the original text and the quality of the scanned image: if the original page is smudged, or if the image is not scanned at an adequate resolution, the results may not be satisfactory. A higher-quality image often helps, but it does not guarantee better results.

Another major research area in OCR is the recognition of text in languages that do not use the Roman script (common to modern English, French, German, etc.). Scripts whose characters are based on phonetic syllables or pictographs are especially difficult, even for printed text, and much work remains to be done on them. Such work also has the potential to one day allow automatic machine translation of printed text in foreign languages into English.

    Copyrights
    Optical Character Recognition (Ocr) from World of Computer Science. ©2005-2006 Thomson Gale, a part of the Thomson Corporation. All rights reserved.
