Control your language


Posted by mbadger on Mar 20 2007 | How To

Warning: approximately 2,000 words of practical, ready-to-use technical communication theory follow.

As global communication becomes the norm, the technical communicator’s audience naturally expands to include users who speak English as a second language and users who translate the text. We have a responsibility to author documents that reduce ambiguity; merging controlled language characteristics with sound technical writing practices aids the creation of documentation that is usable by a greater number of people.

Background

According to Developing Quality Technical Information: A Handbook for Writers and Editors, Second Edition, “English is the native language of less than half of Internet users, although 85% of Web pages are in English.” The implication is obvious. If material is available online, the chances are good that the documentation will be read by somebody who speaks English as a second language or by somebody who translates the text, and this is when a natural language’s problems come to the forefront. Unrestricted language is inherently ambiguous and contextual, and decoding it properly requires cultural knowledge. Problems arise with idiomatic speech, with slang, and with words that have multiple meanings or can act as more than one part of speech.

Communicating technical information to a global marketplace is nothing new, and various mechanisms have evolved to aid the delivery of information in multiple languages, including human translation, controlled languages, and machine translation. A technical communicator can aid all of these processes by consistently writing the source text in a controlled language. A controlled language by definition reduces a natural language by restricting the available lexicon and grammar, thereby reducing ambiguity.

When creating a controlled language, one must decide the purpose of the source text. Will the source text be read, human translated, machine translated, or a combination of all three? Ursula Reuther, in her paper, “Two in One – Can it Work? Readability and Translatability by Means of Controlled Language” implies that the controlled language rules that are developed for translation are constructed based on the translation tool, and the rules for human readers emphasize “readability, comprehensibility, clarity, and consistency of text.” Ultimately, she concludes that adhering to a controlled language improves both readability and translatability. Reuther’s caveat is “translatability ensures readability. The reverse statement is only true to some extent.”

C.K. Ogden’s controlled language, dubbed Basic English, is meant to be taught to non-English speakers to aid in business and government communication. Basic English’s lexicon is restricted to 850 words, and Ogden argues in Basic English and Grammatical Reform that these 850 words can be used to say “everything we normally say with fifteen or twenty thousand.”

Caterpillar Fundamental English (CFE) builds on the Basic English lexicon by adding industry-specific terminology. Like Basic English, CFE is meant to be taught to non-English speakers and acts as a spoken controlled language. Caterpillar uses CFE to train service repair technicians around the world, so the technicians can understand the English documentation.

Caterpillar Technical English (CTE) replaces CFE and moves the controlled language emphasis to machine translation with the help of the Center for Machine Translation at Carnegie Mellon University. In “Controlled English for Knowledge-Based MT: Experience with the KANT System,” Teruko Mitamura and Eric H. Nyberg shift the focus of controlled language to a computer science problem by not only refining a controlled language rule set but by developing an authoring environment and a controlled translation environment. The authoring environment contains grammar and vocabulary checkers to help the author adhere to the controlled language rules.

Machine translation requires effort beyond the authoring of source texts in a controlled language. In his paper “Controlled Language for Multilingual Machine Translation,” Mitamura warns that target languages need controlled language versions of their own to increase translation accuracy and to reduce the need for editing outside the controlled language. Mapping a controlled source language to a controlled target language makes translation more efficient because less time is spent on post-translation editing. However, the process to implement a machine translation system is beyond the budget of all but the largest companies.

The Internet and global software communities, such as open source software, are driving a need for more accessible communication strategies. Ideally, an open source software suite would exist to facilitate the creation of documentation in a controlled language and translate the text into controlled target languages. When the tools become available, a substantial set of controlled language rules can be implemented and customized to meet specific industry needs. Until that time, we should focus our attention on writing that helps control ambiguity, which will in turn aid comprehension and translation.

Writing Strategies

As John Hutchins remarks in his report on “Computer-Based Translation Systems and Tools,” machine translation “has to contend with widespread misunderstanding and ridicule from users of online MT services.” When I was answering support emails for a Linux-based software company, I would often use Google to translate support requests and hope that I could find a few words or a phrase that would let me make an educated guess at the problem. In return, I would encourage users to write in any English they could muster because their worst attempts at English would be better than anything I could get through the online translator. The problem is the translation tool’s inability to understand context and convert one unrestricted, idiomatic language to another unrestricted, idiomatic language.

Hutchins suggests there is a market for translation systems to provide real time web translations or to translate article summaries, rather than the whole article. The value in these applications is to acquire the meaning or an overview of the document quickly, when precise translations do not matter. Controlled language could be used to reduce the complexity of a document and require less contextual decision making to get a usable solution. When writing to my foreign language customers, I never knew if they would run my response through a translator or not. However, I would make sure I responded in a controlled and consistent way.

Merging my support experience with my research into controlled language, I highlight the following guidelines as a way to reduce doubt and create a document that is easier for everyone to interpret, including native language users.

Define a Glossary

Analyze the domain or industry to create a list of words that might be included in the controlled language, including industry- and company-specific terms. A domain identifies the target group that a language is to be used for. Heavy machinery is one example of a domain, and open source software is another. In their paper, controlled language researchers Mitamura and Nyberg list “technical phrases,” “technical words,” and “technical symbols” as the three groups of words to identify during the glossary building stage. For example, “man page” is a technical phrase that describes online Linux help files.

After the glossary is compiled, the words must be edited so that the glossary adheres to the ideal of one word, one meaning.

Associate One Word with One Meaning

The goal is to map each word in the lexicon to one meaning. For example, do not use “terminal” to refer to both a computer workstation and a Linux command line environment. Likewise, do not refer to computer software interchangeably as a “program” and an “application.” Choose a term and be consistent.

If more than one word is used to describe the same item or concept, the additional words can be used to form a synonym list. As Mitamura and Nyberg point out, the goal of one word, one meaning may not always be possible, and the same word will sometimes need to be defined with more than one meaning.
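Without a full authoring environment, a synonym list can still be checked with a few lines of code. The following is a minimal sketch in Python, not a tool described in any of the cited papers. The glossary entries and the function name are hypothetical; the sketch assumes the writer has chosen “program” as the approved term.

```python
import re

# Hypothetical synonym list: each unapproved word maps to the one approved term.
PREFERRED_TERMS = {
    "application": "program",
    "app": "program",
}

def find_unapproved_terms(text, preferred=PREFERRED_TERMS):
    """Return (found_word, approved_term) pairs in order of appearance."""
    findings = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in preferred:
            findings.append((word, preferred[word]))
    return findings

if __name__ == "__main__":
    for found, approved in find_unapproved_terms("Open the application."):
        print(f"Replace '{found}' with '{approved}'.")
```

A check like this only catches synonyms. It cannot tell when one word is used with two meanings, so that review still falls to the writer.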

Defining a glossary that adheres to the goal of one word, one meaning is the most difficult advice to follow for an individual technical communicator. Analysis takes time and can be subjective. Without automated checking, some mistakes are likely.

Use Active Voice in the Present Tense

Active voice is a staple technical writing rule because active sentences help the reader identify what needs to be done and who or what needs to do the action. The Developing Quality Technical Information handbook explains that passive voice requires longer and more complicated verb phrases. Consider the passive sentence: “The program was opened by Mike.” The verb phrase “was opened” precedes the person doing the action. Rewritten in the active voice, the sentence becomes: “Mike opened the program.” Written in the present tense, the sentence becomes: “Mike opens the program.”
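A rough passive voice check can be automated. The sketch below is a heuristic of my own choosing, not a real grammar checker: it assumes that a form of “to be” followed by a word ending in “ed” or “en” is probably passive. The function name is hypothetical. The pattern misses irregular participles such as “was run” and can flag adjectives such as “is tired,” so treat each result as a prompt to reread the sentence.

```python
import re

# Heuristic pattern: a form of "to be", then a word ending in "ed" or "en".
PASSIVE_PATTERN = re.compile(
    r"\b(?:is|are|was|were|be|been|being)\s+\w{2,}(?:ed|en)\b",
    re.IGNORECASE,
)

def flag_passive_phrases(text):
    """Return each phrase that looks like a passive construction."""
    return [match.group(0) for match in PASSIVE_PATTERN.finditer(text)]

if __name__ == "__main__":
    print(flag_passive_phrases("The program was opened by Mike."))
    print(flag_passive_phrases("Mike opens the program."))
```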

Limit Sentences to a Single Action

As the writing’s complexity increases, the opportunity for misunderstanding increases. Rather than assign an arbitrary sentence length, authors should limit each sentence to one action or thought. Avoid compound sentences. In her paper “Controlling Controlled English: An Analysis of Several Controlled Language Rule Sets,” Sharon O’Brien studies the rule sets of eight controlled languages and finds that a limit on sentence length is the only rule common to all eight.
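One way to spot candidate compound sentences is to look for coordinating conjunctions. The following Python sketch is an assumption-laden approximation: the conjunction list and function names are hypothetical, the sentence splitter only looks for end punctuation followed by whitespace, and a conjunction that joins two nouns is flagged along with one that joins two actions.

```python
import re

# Hypothetical list of conjunctions that often join two actions.
CONJUNCTIONS = {"and", "but", "or", "so"}

def split_sentences(text):
    """Naive split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def flag_compound_sentences(text):
    """Return the sentences that contain a coordinating conjunction."""
    flagged = []
    for sentence in split_sentences(text):
        words = {w.lower() for w in re.findall(r"[A-Za-z']+", sentence)}
        if words & CONJUNCTIONS:
            flagged.append(sentence)
    return flagged

if __name__ == "__main__":
    sample = "Open the terminal and type the command. Press Enter."
    for sentence in flag_compound_sentences(sample):
        print("Review:", sentence)
```

The check reports the first sample sentence, which could be rewritten as two sentences: “Open the terminal. Type the command.”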

Eliminate Pronouns

Pronouns are “referential, thus posing a problem for comprehensibility and translatability,” says Reuther. The Developing Quality Technical Information handbook identifies three reasons why the pronoun and noun reference is vague: The noun and pronoun are far apart; the pronoun may apply to more than one noun; the pronoun may refer to an implied noun.

Eliminating pronouns will add redundancy and minimally inflate the word count, but the benefit is clearer text. In contrast, the same repetition would irritate the reader of a literary work.
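Pronouns are easy to find mechanically, even though deciding what each one refers to is not. The sketch below, in Python, simply lists the third-person pronouns in a passage so the writer can replace each one with its noun. The pronoun set and function name are hypothetical choices; the set leaves out “you” on the assumption that instructions address the reader directly.

```python
import re

# Hypothetical set of third-person pronouns to review.
PRONOUNS = {"it", "they", "them", "he", "she", "him", "her"}

def find_pronouns(text):
    """Return the pronouns found in the text, lowercased, in order."""
    words = re.findall(r"[A-Za-z]+", text)
    return [w.lower() for w in words if w.lower() in PRONOUNS]

if __name__ == "__main__":
    print(find_pronouns("Save the file before you close the editor. It is lost otherwise."))
```

In the sample, “It” could refer to the file or to the editor. Writing “The file is lost otherwise” removes the doubt.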

Avoid Ellipsis

When clear documentation is the goal, do not construct sentences that require the reader to speculate about the meaning of the sentence. Reuther warns that using ellipsis will not only introduce readability problems but also produce failed translations. As with pronouns, elliptical constructions require the reader or the translator to fill in the missing parts. Avoiding ellipsis will also increase redundancy and word count, but will produce clearer text.

I once asked a non-native English speaker the question, “Where are you at?” To which he responded, “I am at home.” Had I asked, “Where are you at in the installation?” I would have received the answer I was looking for.

Future Considerations

While controlled language would sterilize creative writing or dialog within a culture, the implied use is for technology and business. The technical communication community needs a common controlled language grammar with an expandable lexicon to accommodate a wide range of companies and industries. If such an ideal ever came to be, software to support the authoring and translation of documents could become more accessible.

Controlled language is analogous to many of the technologies we document in that they are governed by standards or open guidelines. The World Wide Web Consortium (W3C) is an easily identifiable example. The W3C’s mission is to “lead the world wide web to its full potential by developing protocols and guidelines that ensure long-term growth for the web.” The consortium’s guidelines help web developers create sites that are accessible to a wide range of users regardless of the browser a customer might use.

O’Brien touches on the proprietary nature of controlled language rule sets. In response to sixteen requests, she received the rule sets, or partial rule sets, for eight languages. Of the eight languages O’Brien surveyed, six required confidentiality agreements, which stipulated that she could not identify specific rules in relation to the controlled language. The real lesson of her research is how subjective the rules are from company to company. The only rule shared by all eight of the surveyed languages was to keep sentence length short. O’Brien’s study reveals only seven common rules, each shared by at least four languages. This kind of fragmentation results when companies repeatedly solve the same problem with different approaches. No consistency exists across the profession, and no global communication progress is made.

Imagine what the web would be like if developers had to choose between Firefox, Opera, or Internet Explorer users. Granted, some developers still ignore users and deploy sites that require proprietary solutions, but for the most part, access to information is more important than the application used.

In the absence of a controlled language standard and accessible authoring software, writers must focus on creating documentation that is clear and consistent for human readers, with the understanding that documentation that adheres to a controlled language, even a self-imposed one, will be understandable to a greater number of users.

Creative Commons License

This essay was originally written in May 2006 by Michael Badger.

This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 2.5 License.

One Response to “Control your language”

  1. I wrote this essay as my undergraduate thesis with the intent of trying to publish it in a traditional magazine. I decided instead to let it live on my blog.

    22 Mar 2007 at 7:23 am
