Document Processing
Documents serve to archive and communicate information. Document processing is the activity of operating on information captured in some form of persistent medium. Traditionally, that medium is paper, and documents are bundles of paper with information captured in print or in writing.
Document processing may serve to coordinate and conduct business transactions. When a customer submits an order to purchase a certain product, the order becomes a document for processing. The manufacturing company coordinates the activities of acquiring the raw materials, making the product, and finally delivering it to the customer with an invoice to collect payment—all by passing documents from one department to another, from one party to another.
Humans, endowed with the capacity to read, write, and think, are the principal actors in document processing. The invention of the modern digital computer, supported by various key technologies, has revolutionized document processing. Because information can be coded in other media that are read and written by the computer (from punched cards in the early 1960s to magnetic tapes, disks, and optical compact discs, or CDs, today), it is not always necessary for documents to be on paper for processing.
Automatic Data Processing
If one can build the decision-making into the logic of a computer program, and the relevant information in the documents is coded in some medium that the computer can read and write, the computer running the program can process the documents automatically. Unless the decisions in processing the documents require the intelligence of a human expert, the computer is much faster and more reliable than a human worker.
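As a rough illustration of decision-making coded into program logic, the following sketch routes an order document to its next step. The Order fields, the decision names, and the review_limit threshold are all hypothetical, chosen only for this example; a real system would use its own business rules.

```python
from dataclasses import dataclass


@dataclass
class Order:
    """A purchase order coded in a form the program can read (hypothetical fields)."""
    order_id: str
    quantity: int
    unit_price: float
    in_stock: int


def process_order(order: Order, review_limit: float = 10_000.0) -> str:
    """Return the next step for an order document.

    The rules and the review_limit value are assumptions for illustration.
    """
    if order.quantity <= 0:
        return "reject"
    total = order.quantity * order.unit_price
    if total > review_limit:
        # The decision needs human judgment, so the program only routes it.
        return "refer_to_expert"
    if order.quantity > order.in_stock:
        return "backorder"
    return "ship_and_invoice"


orders = [
    Order("A1", 2, 50.0, 10),
    Order("A2", 500, 40.0, 1000),
    Order("A3", 20, 5.0, 3),
]
decisions = {order.order_id: process_order(order) for order in orders}
print(decisions)
```

Routine orders are decided entirely by the program, while the large order is only forwarded, which mirrors the division of labor between the computer and the human expert.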
The repository for the information is a database. Since the information in the database is readily accessible by the computer, one can generate the paper documents with the desired information any time it is necessary. Automatic data processing and the database technologies for information maintenance and archival have existed since the 1960s. For decisions that require the judgment of a human expert, document processing must bring in the knowledge workers—human users with the expertise in the relevant field of knowledge.
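The idea of generating a document on demand from a database can be sketched as follows, using Python's built-in sqlite3 module with an in-memory database. The table layout, the sample order, and the invoice wording are assumptions made up for this example.

```python
import sqlite3


def build_database() -> sqlite3.Connection:
    """Create an in-memory database holding one sample order (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "order_id TEXT PRIMARY KEY, customer TEXT, product TEXT, "
        "quantity INTEGER, unit_price REAL)"
    )
    conn.execute(
        "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
        ("A1", "Acme Corp", "Widget", 3, 19.5),
    )
    conn.commit()
    return conn


def render_invoice(conn: sqlite3.Connection, order_id: str) -> str:
    """Generate the text of an invoice from the stored information."""
    row = conn.execute(
        "SELECT customer, product, quantity, unit_price "
        "FROM orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()
    if row is None:
        raise KeyError(order_id)
    customer, product, quantity, unit_price = row
    total = quantity * unit_price
    return "\n".join([
        f"INVOICE {order_id}",
        f"Customer: {customer}",
        f"Item: {product} x {quantity} @ {unit_price:.2f}",
        f"Total due: {total:.2f}",
    ])


conn = build_database()
print(render_invoice(conn, "A1"))
```

The invoice is never stored as a document; it is produced from the database whenever someone asks for it.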
Typographics and Reprographics
The computer is also a versatile tool for the preparation and reproduction of documents. During the early 1980s, as a result of advances in printing technology, text formatting and typesetting tools became available on the computer. People can use these tools to create document content while at the same time specifying the presentation layout, including typesetting details. People can keep all the information in some persistent medium such as a disk file. This is called a source document, since the computer tool can use it as input to generate the printed document as output.
Commonly the source document contains coded information in a markup language: tags that specify typesetting and presentation layout information. Markup languages may also incorporate the use of images and graphical drawings supported by the printing technologies. Low-cost laser printers became available in the mid-1980s. Together, these formatting tools and printers greatly enhance one's ability to produce documents readily on demand. It is necessary to keep only the source documents in a computer-readable medium.
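To make the relationship between a source document and its formatted output concrete, here is a minimal sketch of a formatter. The two tags, .title and .para, are invented for this example and do not belong to any real markup language; the formatter simply centers titles and wraps paragraphs to a fixed width.

```python
import textwrap


def render(source: str, width: int = 40) -> str:
    """Turn a tagged source document into laid-out plain text.

    Each source line starts with a hypothetical tag:
      .title  centered, upper-case heading
      .para   paragraph wrapped to the page width
    """
    blocks = []
    for line in source.splitlines():
        tag, _, text = line.partition(" ")
        if tag == ".title":
            blocks.append(text.upper().center(width).rstrip())
        elif tag == ".para":
            blocks.append(textwrap.fill(text, width=width))
        elif line.strip():
            raise ValueError(f"unknown tag in line: {line!r}")
    return "\n\n".join(blocks)


SOURCE = (
    ".title Quarterly Report\n"
    ".para Sales rose in every region, and the new product line "
    "accounted for most of the growth."
)
print(render(SOURCE))
```

Only SOURCE needs to be kept; the laid-out text can be regenerated at any width by running the formatter again.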
Interactive Graphics and Multimedia
A document does not need to be printed on paper in order for people to view it. Since the bit-mapped monitor screen was invented in the 1970s, people have also been able to view a document on the monitor screen. This allows people to interact with the document directly on the screen. The printed document is called a hard copy, and a displayed document on the monitor screen is known as a soft copy. Both copies are generated from the source document.
When documents, particularly images, are processed, their color and contrast can be enhanced. Printing the result on a color laser printer can produce a clear, crisp image.
Using interactive graphics and window interfaces, users can treat the monitor screen as a desktop and retrieve any document for viewing, or interact with one document to bring up another document. Multiple users can easily share documents and view related documents on the computer at the same time. This also means that someone can use the computer to mediate and coordinate the timing and sequencing of people working on documents. A workflow system can implement the business rules of operation to coordinate multiple parties working together in document processing. It is conceivable that an office may have employees working on documents without ever needing to print out the documents on paper. That is the idea of document processing in a paperless office.
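A workflow system of this kind can be sketched as a small state machine. The states, the roles, and the rule table below are hypothetical; they stand in for whatever business rules an office actually follows.

```python
# Hypothetical business rules: for each state, which role may act
# and which state the document moves to afterwards.
WORKFLOW = {
    "draft": ("author", "awaiting_review"),
    "awaiting_review": ("reviewer", "awaiting_approval"),
    "awaiting_approval": ("manager", "filed"),
}


class Document:
    """A document moving through the workflow."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.state = "draft"
        self.history: list[tuple[str, str]] = []


def advance(doc: Document, role: str) -> str:
    """Let `role` complete the current step, enforcing the sequencing rules."""
    if doc.state not in WORKFLOW:
        raise ValueError(f"{doc.title!r} has already finished the workflow")
    required_role, next_state = WORKFLOW[doc.state]
    if role != required_role:
        raise PermissionError(
            f"{doc.state!r} must be handled by {required_role!r}, not {role!r}"
        )
    doc.history.append((doc.state, role))
    doc.state = next_state
    return doc.state


order = Document("Purchase order")
for actor in ("author", "reviewer", "manager"):
    print(actor, "->", advance(order, actor))
```

Because the rules live in one table, the program rather than the participants decides whose turn it is, and no paper has to change hands.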
Also worth noting is the changing concept of a document. The source document kept in a disk file may incorporate document content with graphical drawings, images, and the typesetting and layout information, as well as audio and video clips. On a computer equipped with the proper hardware, the soft copy of the document can show a video clip or play an audio segment. Such a multimedia document is a new concept of the document: it is no longer a physical bundle of papers.
Telecommunications and E-Commerce
Since people can view a document on a monitor screen to work on it, and they can print out the document on paper only when a hard copy is needed, they can easily share documents by sending them across computer networks. Electronic mail (e-mail) is a document sent from one person to another over a network. The Internet was originally proposed in the early 1980s for the purpose of communication between researchers, connecting the computers in research institutions across the nation. But as the Internet has rapidly grown with documents shared by more and more people, the network has become a channel for publishing. The parties involved, however, need to jointly observe certain standards for the communication protocol and the format for source documents.
Servers are the computers that send documents out on request, and browsers are the tools that are used to make the requests and view the documents received. Servers and browsers must observe the same standards for communication protocol and document format. In the 1990s, Hypertext Transfer Protocol (HTTP) was established as the standard for communication and Hypertext Markup Language (HTML) as the standard for source documents. Computers supporting these standards on the Internet formed the World Wide Web.
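The request-and-response exchange between a browser and a server can be sketched with Python's standard library. This example runs a tiny server on the local machine and plays the browser's role with http.client; the page content and the fetch_once helper are made up for illustration, and a real browser would go on to lay out the HTML it receives.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# A small HTML source document (made-up content).
PAGE = b"<html><body><h1>Price List</h1></body></html>"


class DocumentHandler(BaseHTTPRequestHandler):
    """Server side: send the document out on request."""

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once() -> tuple[int, str, bytes]:
    """Start a local server, request one document over HTTP, and return it."""
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Client side: the part of the exchange a browser performs.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        try:
            conn.request("GET", "/")
            response = conn.getresponse()
            return response.status, response.getheader("Content-Type"), response.read()
        finally:
            conn.close()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, content_type, body = fetch_once()
    print(status, content_type)
    print(body.decode("utf-8"))
```

Both sides work only because they agree on the protocol (HTTP) and on the declared document format (HTML), which is the point made above.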
The Internet continues to grow, virtually covering the whole world today. Document processing on the web can readily involve anybody in the world. Documents can be published and made available for public access from a web server.
The web has become a marketplace for business. E-commerce is a major application of document processing on the World Wide Web. A company may publish a document on a web server to advertise itself and attract customers. Viewers of the document may then interact with it to go to other documents to seek more information. A viewer may also submit an order to make a purchase, sending the order as a document to the company to initiate trading.
Document Structures and Formats
As more and more large, complex documents appear on the Internet, people want to be able to process most of these documents automatically. They want to mark up the structure of document content, so that computer programs can process the content guided by the markup tags. The generation of a soft copy for viewing is simply one of the functions of processing the document.
HTML is a document format designed primarily for viewing using a web browser. Using HTML, people mark up the content of a document with tags for presentation and layout information. A new document format, called Extensible Markup Language (XML), was drafted in November 1996 and has gone through many revisions. XML is a meta-markup language in the sense that it allows one to design the right tags to mark up the content of a document to indicate the structure of its content. Different application domains use different vocabularies of markup tags: molecular biology researchers may use one set of tags, while lawyers use a different set. The style of presentation can be specified according to content structure, and a computer program will be able to display the document for viewing. XML is now emerging as the standard format for documents on the World Wide Web.
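A short sketch may help show how domain-specific tags and a separate style specification fit together. The legal vocabulary below (case, title, court, holding) and the STYLE table are invented for this example; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# A document marked up with a hypothetical legal vocabulary.
# The tags describe what each piece of content is, not how it looks.
SOURCE = """<case>
  <title>Smith v. Jones</title>
  <court>District Court</court>
  <holding>The contract was enforceable.</holding>
</case>"""

# Presentation is specified separately, keyed by content structure.
STYLE = {
    "title": lambda text: text.upper(),
    "court": lambda text: f"({text})",
    "holding": lambda text: f"Holding: {text}",
}


def render(xml_text: str, style: dict) -> list[str]:
    """Display each child element according to the style for its tag."""
    root = ET.fromstring(xml_text)
    lines = []
    for child in root:
        formatter = style.get(child.tag, str)  # unknown tags pass through as-is
        lines.append(formatter((child.text or "").strip()))
    return lines


for line in render(SOURCE, STYLE):
    print(line)
```

A different program could read the same source and ignore presentation entirely, for instance by collecting every holding element, because the tags carry structure rather than layout.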
Intelligent Agents
There is now a vast amount of information on the Internet, and the information changes quickly. It can be difficult to find useful information, to track changes, and to monitor certain situations. For example, a user might be interested in collecting information on stock prices and want to pay attention only to those that change quickly, or that reach a very high or very low price. Even when the user can gather the information, it is difficult to watch too many stocks at the same time.
An active research area today is that of intelligent agents. An intelligent agent is like a software robot. It is an active program that processes information from documents on the web. An agent may actively watch changes in stock prices on behalf of the owner who launched it, or it may determine the right combination of plane tickets and hotel reservations for a travel itinerary specified by its owner. It becomes even more interesting when these intelligent agents interact with one another. An agent may be trying to sell some product while another agent may be looking for the right product to buy. The two agents may make the trade, each serving its particular owner. XML is one of the key technologies that make this possible, because these agents need to process the contents of documents intelligently.
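The stock-watching agent described above can be sketched in a few lines, assuming the prices arrive as small XML documents. The quotes and quote tags, the symbols, the prices, and the thresholds are all hypothetical; a real agent would also have to fetch the documents and run continuously.

```python
import xml.etree.ElementTree as ET


def read_quotes(xml_text: str) -> dict[str, float]:
    """Extract {symbol: price} from a quotes document (hypothetical format)."""
    root = ET.fromstring(xml_text)
    return {q.get("symbol"): float(q.get("price")) for q in root.iter("quote")}


def watch(previous, current, max_change=0.05, low=None, high=None) -> list[str]:
    """Report only the stocks the owner asked to hear about.

    max_change is the relative move that counts as a quick change;
    low and high are optional price limits. All are assumptions.
    """
    alerts = []
    for symbol, price in sorted(current.items()):
        old = previous.get(symbol)
        if old and abs(price - old) / old > max_change:
            alerts.append(f"{symbol}: moved from {old:.2f} to {price:.2f}")
        if low is not None and price < low:
            alerts.append(f"{symbol}: price {price:.2f} below {low:.2f}")
        if high is not None and price > high:
            alerts.append(f"{symbol}: price {price:.2f} above {high:.2f}")
    return alerts


EARLIER = '<quotes><quote symbol="AAA" price="100.00"/><quote symbol="BBB" price="20.00"/></quotes>'
LATER = '<quotes><quote symbol="AAA" price="101.00"/><quote symbol="BBB" price="23.00"/></quotes>'

for alert in watch(read_quotes(EARLIER), read_quotes(LATER), high=100.5):
    print(alert)
```

The agent can pick out symbols and prices without guessing at page layout because the markup identifies them, which is why structured formats such as XML matter for this kind of program.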
With the Internet, the amount of information, and therefore the number of documents that people need to deal with, is much larger than ever before. It is often said that the world is in the Information Age. Document processing will continue to be a major activity of people working with information. The possibilities for harnessing the power of information are endless.
See also: Input Devices; Markup Languages.