NaturallySpeaking

A sample dictation in DragonPad, the included text editor.
Developer: Nuance Communications
Latest release: 9.5 / January 2007
OS: Microsoft Windows
Genre: Voice recognition
License: Proprietary
Website: Nuance Communications website
For brevity, most of this article refers to the product simply as NaturallySpeaking.

Dragon NaturallySpeaking is a speech recognition software package produced by Nuance Communications for Windows PCs. It was among the first programs to make speech recognition practical on a PC.[1] NaturallySpeaking uses a minimal visual interface: dictated words appear in a floating tooltip as they are spoken, and when the speaker pauses, the program transcribes them into the active window at the cursor position. Like other speech recognition software, NaturallySpeaking has three primary areas of functionality: dictation, in which spoken language is transcribed to written text; command and control, in which spoken language is recognized as a command to operate widgets (controls); and text-to-speech, in which written text is converted to a synthesized audio stream. Early versions of the software had to be trained for approximately 10 minutes to recognize the user's voice; version 9 no longer requires this initial training. Nuance claims that dictating a 900-word essay with NaturallySpeaking takes about 6 minutes, whereas typing the same essay at 40 words per minute takes about 22 minutes.
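The arithmetic behind Nuance's timing claim can be checked with a short sketch. The function names below are illustrative only, and the inputs are the figures quoted in the claim (900 words, 6 minutes, 40 words per minute); nothing here measures real dictation speed.

```python
def minutes_to_write(words: int, words_per_minute: float) -> float:
    """Time needed to produce `words` at a steady rate."""
    return words / words_per_minute


def implied_rate(words: int, minutes: float) -> float:
    """Words per minute implied by producing `words` in `minutes`."""
    return words / minutes


ESSAY_WORDS = 900        # essay length quoted in the claim
TYPING_WPM = 40          # typing speed quoted in the claim
DICTATION_MINUTES = 6    # dictation time quoted in the claim

typing_minutes = minutes_to_write(ESSAY_WORDS, TYPING_WPM)
dictation_wpm = implied_rate(ESSAY_WORDS, DICTATION_MINUTES)
speedup = typing_minutes / DICTATION_MINUTES

print(f"Typing time: {typing_minutes} minutes")
print(f"Implied dictation rate: {dictation_wpm} words per minute")
print(f"Dictation is {speedup}x faster under these figures")
```

Under these figures the typing time works out to 22.5 minutes, which the claim rounds to 22, and the 6-minute dictation time implies a sustained rate of 150 words per minute.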

Common user profiles

  • Health-care industry - This is likely the most profitable sector for speech recognition vendors. The high cost of labor, the specialized polysyllabic vocabulary of medicine, the formalized input, and the need to access a computer while using the hands for other tasks make speech recognition a compelling tool for health-care workers. (Scheuer, Armin. Speech recognition experts meet in Berlin. Retrieved on 2007-11-25.)
  • Legal industry - The appeal is similar to that in health care.
  • Accessibility - Speech recognition is the most effective means of using a computer for those with limited or no ability to use their hands. Many people start using speech recognition after suffering from the symptoms of RSI, although voice strain and RSI of the vocal cords are possible side effects. Note that NaturallySpeaking is not designed specifically for disabled users: a few key functions have no default voice trigger and therefore require keyboard input.

Accuracy

The highest accuracy is achieved with the following, in approximate order of effectiveness:

  • A quality input signal.
  • A powerful computer system.
  • Adding phrases to NaturallySpeaking's vocabulary.
  • Using NaturallySpeaking's Acoustic Optimizer.
  • Correcting NaturallySpeaking's misinterpretations.
  • Feeding NaturallySpeaking many proofread documents.
  • Training NaturallySpeaking.[1]

Accuracy in practice varies widely among intelligent system programs (compare self-learning systems). In this self-learning program, the developers made use of a hidden Markov model of natural human speech. The accuracy of speech-to-text conversion depends especially on how faithfully the single-user voice profile matches the words and sound of the speaker's normal voice. Such profiles are intended to provide a large spoken vocabulary rendered into text for only one profiled user, or a small set of them. The accuracy of a profile is also sensitive to the accuracy of early program input, that is, of the program's early "learning".[2] Hardware and environmental conditions should likewise be optimized for best results.

  • Any noise introduced in the path from throat to computer can reduce accuracy by degrading signal quality. Speech is produced by the larynx, travels through the microphone cable, and ends at the sound chip inside the computer. Sources of noise include poor-quality microphones, too much ambient noise around the speaker, and excessive noise inside the computer case.
  • The integrated sound cards in many computers, such as laptops and desktops from Dell, Compaq, and Hewlett-Packard, do not have shielding.
  • Noise-canceling microphones are often considered preferable, and many inexpensive ones offer excellent performance.
  • Traditionally, such a microphone has not been included in the boxed sets; replacing the starter microphone is recommended instead.

The key concept is that NaturallySpeaking learns mostly through prevention, then through moderate correction (for example, about 1 word in 100), and lastly through training. Prevention, meaning adding unusual words to NaturallySpeaking's vocabulary before dictating them, is generally favored. Correcting misinterpretations afterwards works more effectively when the correction includes adjacent words (context), which helps the program distinguish similar sounds.
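Why adjacent words help can be illustrated with a toy word-pair (bigram) model. This is only a sketch of the general idea, not a description of NaturallySpeaking's internals: the counts in BIGRAM_COUNTS are invented for illustration, and the function names are hypothetical.

```python
from typing import Optional, Sequence

# Hypothetical counts of how often one word follows another in some text.
# These numbers are made up purely for illustration.
BIGRAM_COUNTS = {
    ("in", "there"): 30,
    ("in", "their"): 20,
    ("their", "house"): 40,
    ("there", "house"): 1,
    ("over", "there"): 50,
    ("over", "their"): 1,
}


def context_score(previous_word: str, candidate: str, next_word: Optional[str]) -> int:
    """Score a candidate word by how well it fits its neighbors.

    One is added to each count so that an unseen pair does not zero out the score.
    If no following word is known, only the preceding word is used.
    """
    left = BIGRAM_COUNTS.get((previous_word, candidate), 0) + 1
    right = 1
    if next_word is not None:
        right = BIGRAM_COUNTS.get((candidate, next_word), 0) + 1
    return left * right


def pick_word(previous_word: str, candidates: Sequence[str], next_word: Optional[str]) -> str:
    """Return the candidate with the highest context score."""
    return max(candidates, key=lambda word: context_score(previous_word, word, next_word))


homophones = ["there", "their"]
print(pick_word("in", homophones, None))     # only the preceding word is known
print(pick_word("in", homophones, "house"))  # the following word is known too
```

With these invented counts, the preceding word alone favors "there", while adding the following word "house" tips the choice to "their". Supplying more context gives a statistical model more evidence for separating words that sound alike.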

Editions and versions

There is a range of editions of Dragon NaturallySpeaking, each of which comes with a noise-canceling headset microphone:

  • Dragon NaturallySpeaking Standard Edition - An entry-level version of the product that enables command and control of the PC, as well as speech-to-text input for email, instant messaging clients, and word processing.
  • Dragon NaturallySpeaking Preferred Edition - Adds robust integration with Microsoft Office (Word, Excel, etc.) and Corel WordPerfect, as well as support for digital recorders and Bluetooth headset microphones. This version also allows users to control the formatting of documents by voice (increase font size, set colors, set columns, insert tables, etc.).
  • Dragon NaturallySpeaking Mobile Edition - The Preferred edition with a digital voice recorder. Users can dictate on the recorder, upload the recording to their PC, and have Dragon NaturallySpeaking convert the recording into a text or Microsoft Word document.
  • Dragon NaturallySpeaking Professional Edition - Adds robust support for custom voice commands (where the user can associate a word or phrase with predefined text or graphics) and scripting (which enables automated operation and integration with more specialized applications such as electronic medical records and case management applications).
  • Dragon NaturallySpeaking Professional Medical and Professional Legal - Versions of the Professional edition with extended vocabularies for those domains.
  • Dragon NaturallySpeaking Client and Server Developer Toolkits (SDKs) - Used by commercial and in-house developers to integrate speech with their applications. This can include front-end speech recognition (adding speech input to the application) and back-end speech recognition (batch processing of recorded speech for search, transcription, or other application areas).

Total command-and-control requires a lot of research and support, and Nuance has chosen not to go down that road. Nuance provides the tools to create commands but charges for command support. This has led to a prevalence of value-added resellers (VARs), who develop commands that solve problems such as reducing a repetitive series of actions to a few spoken words.

NaturallySpeaking can be extended by other programs. NatLink, for instance, is a tool that allows NaturallySpeaking to interact with the Python programming language. In addition, certain versions of NaturallySpeaking (7, 8, and 9) can be run on Linux operating systems using Wine, an open-source implementation of the Windows API. This setup does not give full Windows functionality; it concentrates instead on text recognition using the included DragonPad.
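The idea of a custom voice command that associates a spoken phrase with predefined text can be sketched as a simple lookup. This is a hypothetical illustration only: it does not use the NaturallySpeaking scripting tools, the SDKs, or NatLink, and the command phrases, the replacement text, and the function name are all invented.

```python
# Hypothetical table mapping spoken command phrases to predefined text.
TEXT_COMMANDS = {
    "insert signature": "Kind regards,\nDr. A. Example",
    "insert clinic address": "1 Example Street, Sampletown",
}


def handle_utterance(utterance: str, commands: dict = TEXT_COMMANDS) -> str:
    """Return the predefined text for a known command, else the words as dictated."""
    # Normalize case and spacing so small variations still match a command.
    key = " ".join(utterance.lower().split())
    return commands.get(key, utterance)


print(handle_utterance("Insert Signature"))      # treated as a command
print(handle_utterance("the patient is stable"))  # treated as ordinary dictation
```

A recognized phrase is replaced by its stored text, and anything else passes through unchanged as dictation. A real command system also has to decide when an utterance is a command at all, which this sketch ignores.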

Version  Release date       Editions
1.0      June 1997          Personal
2.0      November 1997      Standard, Preferred, Deluxe
3.0      October 1998       Point & Speak, Standard, Preferred, Professional (with optional Legal and Medical add-on products)
3.01     -                  Teens
4.0      August 4, 1999     Essentials, Standard, Preferred, Professional, Legal, Medical, Mobile
5.0      July 2001          Essentials, Standard, Preferred, Professional, Legal, Medical
6.0      November 15, 2001  Essentials, Standard, Preferred, Professional, Legal, Medical
7.0      March 2003         Essentials, Standard, Preferred, Professional, Legal, Medical
8.0      November 2004      Standard, Preferred, Professional, Legal, Medical
9.0      July 2006          Standard, Preferred, Professional, Legal, Medical, SDK client, SDK server

History

NaturallySpeaking has passed through four companies and evolved considerably since its beginnings in the early 1980s as a research prototype called DRAGON. The married couple Dr. James Baker and Dr. Janet Baker founded Dragon Systems in 1982, deciding to commercialize DRAGON when DARPA cut their funding. Their first product, DragonDictate, was sold for a number of years. Dr. James Baker departed from conventional AI and was a pioneer in hidden Markov models, a way of using statistics for speech recognition. His wife developed the expert system named Hearsay.

In March 1990, Dragon Systems began selling DragonDictate (for DOS) at a cost of $9000 for a single-user license. As hardware became less expensive over the next several years the price decreased, and by the time NaturallySpeaking 1.0 was released, the price of DragonDictate for Windows was about $2000. The hardware of the time was not yet powerful enough to address the difficult problem of word segmentation: it could not determine the boundaries of words in the continuous signal that constitutes the human voice. Users had to pronounce one word at a time, each clearly separated from the next by a small pause. DragonDictate is based on a trigram model and is known as a discrete speech recognition engine.

In 1997, advances in hardware technology allowed NaturallySpeaking version 1.0 to launch as the first available continuous dictation system. During this time the speech recognition industry enthusiastically promoted the notion that speech input was "the" natural modality that would eventually supersede more "primitive" methods such as keyboards. Trying to reach a mass market, vendors dropped prices to levels that were unsustainable.

Lernout & Hauspie bought Dragon Systems in 2000. The dictation system bubble burst in 2001, and Lernout & Hauspie went through a spectacular bankruptcy. ScanSoft Inc. bought the rights to the Dragon products. In 2005, ScanSoft bought Nuance Communications and changed the name of the newly combined entity to Nuance. The move reflects the company's drive to push further into the enterprise speech arena. The software is currently advertised as potentially up to 99% accurate.

Features missing since DragonDictate

Later versions of NaturallySpeaking include a feature to ignore some types of external noise. This is the Nothing But Speech (NBS) technology, originally ported over from the L&H product Voice Xpress. Although individual noises cannot be trained as they could with DragonDictate, NBS runs in the background of NaturallySpeaking 8 and suppresses them.

Before version 8 Preferred, it was impossible to have several language versions of Dragon NaturallySpeaking (DNS) installed on one system (for example, German and French), although all non-English versions of DNS also contain the functionality to dictate in English. This problem was rectified in DNS 8 Preferred, where all languages can coexist and function fully on a single installation.

Competition

Philips SpeechMagic

The market leader in the medical industry according to Frost & Sullivan,[3][4] SpeechMagic is a recognition engine that may be run either as a stand-alone product or integrated into other applications.

Speech API in Office, Tablet PCs, and Vista

Speech recognition functionality built on Microsoft's Speech API (SAPI) 5.1 is included as part of Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It may also be downloaded as part of the Speech SDK 5.1 for Windows® applications; but since that is aimed at developers building speech applications, it lacks any user interface, and thus is unsuitable for end users. Windows Vista includes version 8.0 of the Microsoft speech recognition engine along with a completely new speech experience, known as Windows Speech Recognition.

ViaVoice

IBM ViaVoice was licensed to Nuance (formerly ScanSoft) a few years ago. Control and development remain in the hands of IBM. Functionality is similar to NaturallySpeaking, but ViaVoice is considered inferior. It is available on Linux and Mac OS X (although these versions are no longer maintained).

iListen

iListen is the leading OS X speech recognition program, but it is generally regarded as inferior to NaturallySpeaking.

References

  1. ^ Sindya N. Bhanoo. July 16, 2007. Language quest moves to Hopkins: Speech-to-text technology expert joins defense work. Baltimore Sun. Retrieved on July 20, 2007.

Copyrights
NaturallySpeaking from Wikipedia. ©2006 by Wikipedia. Licensed under the GNU Free Documentation License.
