What is a transcription error?
A transcription error is a type of data entry error commonly made by human operators or optical character recognition (OCR) programs. Human transcription errors are often the result of typographical mistakes caused by striking the wrong key on a keyboard or by pressing two or more wrong keys due to finger-keyboard misalignment. Electronic or non-human transcription errors usually happen because a program attempts to scan material that it is unfamiliar with or cannot read.
Transcription errors happen when data (words, letters, numbers, special characters) is incorrectly entered into an information system. The system is usually a computer text file or some kind of electronic data system. These errors are usually unintentional and may occur when a transcriber (human or machine) reads the source information incorrectly or enters the data incorrectly into the digital system.
Transcription errors have been the bane of authors and editors for many years. Other users, such as medical and legal offices, also commonly experience such errors. This is because they transcribe large quantities of handwritten notes, audio tapes and other kinds of unstructured text documents into digital formats, and errors occur during the transcription process. This can happen whether the transcriber is a human or a machine.
Here are some examples of transcription errors:
- ZIP code: 54829 (wrong) instead of 54729 (correct)
- Name: Stamley (wrong) instead of Stanley (correct)
- Date: Jun 42, 2003 (wrong) instead of Jun 24, 2003 (correct)
Human transcription errors vs. machine transcription errors
As more printed matter is transcribed into digital format and the workload on transcribers (both human and electronic) grows, this problem is likely to get worse before it gets better.
In most transcription projects, transcription errors happen for one of two reasons. One is simple human carelessness or lack of attention to detail. Human misunderstandings can also result in errors. A common cause of misunderstandings is accent differences; another is the speaker not speaking or enunciating clearly. Other common human causes include the following:
- Transcribers are not looking at the computer screen when typing.
- Transcribers can’t accurately read (or hear) the source material.
- Transcribers are unfamiliar with the transcription equipment or the source material (or its subject matter).
- The source material has an excessive amount of jargon (technical terms) or uses too many long, complicated sentences.
- Transcribers misplace their fingers on the keyboard.
Using OCR software can also lead to transcription errors. This is because the software cannot comprehend language or understand context. Instead, it matches the input it receives against information in its database. If no match is found, it interprets the new input incorrectly, resulting in a transcription error. Such errors are common when software tries to transcribe the letters and words in a scanned image of a document to convert the document into digital form. The software may be unable to transcribe accurately if any of the following occurs:
- The source document contains illegible handwriting or blotches.
- The source document is wrinkled.
- There’s dirt on the scanner.
- The lighting is poor.
Detecting and measuring transcription errors
Transcription errors can be measured with the word error rate (WER). WER refers to the number of errors in a piece of text divided by the total number of words.
WER = number of errors ÷ number of words
The WER can be calculated by adding all the insertions, deletions and substitutions occurring in a piece of text (which contains a sequence of recognized words). That sum is then divided by the total number of words in the text to derive the WER percentage.
WER = ((insertions + deletions + substitutions) ÷ number of words) × 100%
The following applies to this formula:
- Substitution = a letter in a word getting replaced to create a new word. Example: chamcoal (incorrect) instead of charcoal (correct).
- Deletion = a letter in a word getting removed to create a new word. Example: mose (incorrect) instead of mouse (correct).
- Insertion = a letter in a word getting added or a new word getting added. Example: we’ve um got a new uh uh car (incorrect) instead of we’ve got a new car (correct).
Suppose an original audio file (to be transcribed) contains 85 words. The transcription included a total of 17 substitutions, insertions and deletions.
WER = (17 ÷ 85) × 100% = 0.2 × 100% = 20%
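The calculation above can be sketched in code. The following is a minimal illustration, not a reference implementation: it assumes errors are counted at the word level using a standard edit-distance (Levenshtein) alignment between the correct text and the transcription, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference word count.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words processed
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from transcript
            insertion = current[j - 1] + 1  # extra word in transcript
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Three inserted filler words against a five-word reference.
wer = word_error_rate("we've got a new car", "we've um got a new uh uh car")
print(f"{wer:.0%}")  # 3 errors / 5 words -> prints 60%
```

When the error counts are already known, as in the 85-word example, the result is simply 17 / 85, or 20%.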
In many situations, an acceptable WER is set for data entry workers. This number can vary depending on the transcription use case. The acceptable WER is always low in critical use cases. For example, in the medical field, even a small transcription error can be harmful, so the acceptable WER is set at a low threshold.
Detecting and reducing transcription errors
Some transcription errors can be detected using spell-checking programs. However, many transcription errors, particularly those involving numeric data, are difficult or impossible to detect. That said, it is possible to reduce the potential for transcription errors with double data entry of the same source material. This refers to multiple people transcribing the same material and then comparing the transcriptions to verify accuracy. However, this method increases transcription effort, time and costs because it requires more human resources.
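The comparison step in double data entry can be automated. The sketch below is one possible approach, assuming the two transcriptions are plain strings compared word by word; it uses Python's standard difflib module, and the function name find_disagreements is hypothetical. It only flags where the two versions differ; a person still has to check the source to decide which version is right.

```python
from difflib import SequenceMatcher


def find_disagreements(first: str, second: str) -> list[tuple[str, str]]:
    """Return (first_version, second_version) pairs where two transcriptions differ."""
    a = first.split()
    b = second.split()
    # autojunk=False disables difflib's heuristic that ignores very frequent items.
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    disagreements = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            disagreements.append(
                (" ".join(a[a_start:a_end]), " ".join(b[b_start:b_end]))
            )
    return disagreements


# Two transcribers keyed the same record; one mistyped the name.
print(find_disagreements("Ship to Stanley at ZIP 54729",
                         "Ship to Stamley at ZIP 54729"))
# prints [('Stanley', 'Stamley')]
```

Note that this catches an error only when the two transcribers disagree. If both make the same mistake, the comparison finds nothing.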

Another way to detect and reduce errors is to use automated quality control software that checks sentence syntax and context to find incorrect letters or words. Software with automated transcription capabilities or powered by artificial intelligence, machine learning or APIs can generate more accurate transcriptions. In general, a robust quality control process can reduce transcription errors. Training transcribers to properly read or hear source material and follow transcription best practices can also reduce errors.
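As a much simpler stand-in for the automated checks described above, the sketch below applies rule-based validation to individual fields. It is an illustration under stated assumptions, not how any particular quality control product works: the helper names are hypothetical, ZIP codes are assumed to be five digits with an optional four-digit extension, and dates are assumed to follow the "Jun 24, 2003" style with English month abbreviations.

```python
import re
from datetime import datetime

# Assumption: five digits, optionally followed by a hyphen and four digits.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def is_plausible_zip(value: str) -> bool:
    """Check only the format of a ZIP code, not whether it is the right one."""
    return ZIP_PATTERN.fullmatch(value) is not None


def is_valid_date(value: str, fmt: str = "%b %d, %Y") -> bool:
    """Return True if the value parses as a real calendar date.

    Note: %b depends on the locale; English month abbreviations are assumed.
    """
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


print(is_valid_date("Jun 42, 2003"))  # False: there is no 42nd day
print(is_valid_date("Jun 24, 2003"))  # True
print(is_plausible_zip("54829"))      # True, even if 54729 was intended
```

The last line shows why numeric errors are hard to catch: a mistyped ZIP code that is still well formed passes a format check, so it can only be found by comparing against the source or a second entry.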
Transcription errors vs. transposition errors
Transcription errors aren’t the same as transposition errors, although both are common error types that happen during data entry and transcription. A transcription error occurs when the wrong values or letters are entered by a human or computer program. In contrast, a transposition error occurs when certain characters or letters are interchanged (transposed).
These are examples of transposition errors:
- ZIP code: 57429 (wrong) instead of 54729 (correct)
- Name: Stnaley (incorrect) instead of Stanley (correct)
- Date: Jnu 23, 2003 (incorrect) instead of Jun 23, 2003 (correct)
Transposition errors are almost always human in origin, whereas transcription errors can be caused by both humans and machines (e.g., OCR software).
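The distinction can be expressed as a simple check. The sketch below assumes the narrowest form of transposition, a single swap of two adjacent characters, and the function name is_adjacent_transposition is hypothetical. Any other difference between the entered and correct values is treated here as a non-transposition error.

```python
def is_adjacent_transposition(entered: str, correct: str) -> bool:
    """True if `entered` is `correct` with exactly one adjacent pair of characters swapped."""
    if len(entered) != len(correct):
        return False
    # Positions where the two strings disagree.
    diffs = [i for i, (e, c) in enumerate(zip(entered, correct)) if e != c]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and entered[diffs[0]] == correct[diffs[1]]
        and entered[diffs[1]] == correct[diffs[0]]
    )


print(is_adjacent_transposition("57429", "54729"))      # True: 4 and 7 swapped
print(is_adjacent_transposition("Stnaley", "Stanley"))  # True: a and n swapped
print(is_adjacent_transposition("Stamley", "Stanley"))  # False: n replaced by m
```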