Natural Language Processing

August 25, 2014 by Robyn DeAngelis
Filed under: Language, Translation Services

Natural Language Processing (NLP) can best be described as the field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human languages. NLP ties into translation through Machine Translation (MT): the task of automatically converting one natural language into another while preserving the meaning of the input text and producing fluent text in the output language. Other applications of NLP include information extraction, sentiment analysis, and question answering.

NLP's roots trace back to the 1950s and Alan Turing, whose "Turing Test" (a test of a machine's ability to exhibit intelligent behavior equal to, or indistinguishable from, that of a human) remains one of the determinants of intelligence.  Later, in 1954, the Georgetown experiment (involving the fully automated translation of more than 60 Russian sentences into English) was conducted with the hope that within five years, machine translation would be a solved problem.

In 1964, a committee of seven scientists, the Automatic Language Processing Advisory Committee (ALPAC), was established by the US government to evaluate progress in machine translation. The committee's findings, issued in 1966 as the ALPAC report, gained notoriety for being very skeptical of the machine translation research done so far; the report emphasized the need for basic research and eventually caused the U.S. Government to reduce its funding of the field dramatically.

Later in the 1960s, many other advances in NLP were made, most notably Joseph Weizenbaum's ELIZA computer program, an early example of primitive NLP.  ELIZA operated by matching users' responses against scripts, the most famous of which was DOCTOR, a simulation of a Rogerian psychotherapist (modeled on Carl Rogers and his client-centered brand of talk therapy).  Using almost no information about human thought or emotion, DOCTOR sometimes provided a startlingly human-like interaction.
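The core of ELIZA's trick was surprisingly simple: match a keyword pattern, reflect the user's own words back, and fill them into a canned template. Here is a minimal sketch of that idea in Python (the rules and reflections below are hypothetical toys, not the actual DOCTOR script, which was far larger):

```python
import re

# Toy ELIZA-style rules: a regex keyword pattern paired with a response
# template. The real DOCTOR script contained many more of these.
RULES = [
    (r"I need (.*)", "Why do you need {0}?"),
    (r"I am (.*)", "How long have you been {0}?"),
    (r"(.*) mother(.*)", "Tell me more about your family."),
]

# First-person words are swapped for second-person ones before being
# echoed back, which is what makes the responses feel conversational.
REFLECTIONS = {"i": "you", "my": "your", "am": "are", "me": "you"}

def reflect(fragment):
    return " ".join(REFLECTIONS.get(w, w) for w in fragment.lower().split())

def respond(statement):
    for pattern, template in RULES:
        match = re.match(pattern, statement, re.IGNORECASE)
        if match:
            return template.format(*[reflect(g) for g in match.groups()])
    return "Please go on."  # fallback when no keyword matches

print(respond("I need a vacation"))  # Why do you need a vacation?
```

Note that nothing here models thought or emotion; the "understanding" is an illusion produced entirely by pattern matching, which is exactly why ELIZA counts as primitive NLP.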

Throughout the 1970s and '80s, new trends in NLP started to emerge. In the 1970s, many programmers began to write "conceptual ontologies," which structured real-world information into computer-understandable data, and chatterbots (computer programs designed to simulate an intelligent conversation with one or more human users via auditory or textual methods, primarily for engaging in "small talk") were also employed. Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. During the late 1980s, however, we started to see the development of the cache language models upon which many speech recognition systems now rely.  While Google has made great strides with NLP and MT, maintaining the integrity of sentence structure remains the challenge.
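The cache language models mentioned above work by blending a fixed, corpus-wide word-probability estimate with counts from the words seen most recently, on the theory that a word you just used is likely to come up again. A minimal sketch of that interpolation, with made-up probabilities for illustration:

```python
from collections import Counter

# Illustrative background probabilities; a real model would be trained
# on a large corpus.
STATIC = {"the": 0.05, "translation": 0.001, "cat": 0.002}

def cache_prob(word, history, lam=0.7, vocab_size=50_000):
    # Background (static) estimate, with a tiny floor for unseen words.
    static = STATIC.get(word, 1.0 / vocab_size)
    # Cache estimate: relative frequency of the word in recent history.
    recent = Counter(history)[word] / len(history) if history else 0.0
    # Interpolate the two estimates.
    return lam * static + (1 - lam) * recent

history = "the translation of the translation".split()
# "translation" is rare in the static model but frequent in recent
# history, so the cache component boosts it well above "cat".
print(cache_prob("translation", history) > cache_prob("cat", history))
```

This recency effect is what made cache models useful for speech recognition: the vocabulary of a conversation drifts, and the cache lets the model drift with it.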

Moving forward, try to imagine talking to your computer and having it talk back to you in a fluid style.  Apple's Siri is a great example of where NLP is headed, but it is only the tip of the iceberg. As NLP technology advances, computers will have a much easier time understanding us.



Rise of the Machines?


Machine translation is a topic that’s come up a lot recently – not just in conversations with clients curious about cutting-edge translation solutions, but in other aspects of everyday life.  We hear about MT innovations in the media, find it embedded in web browsers and probably come across materials in our day-to-day lives that it has generated.  And practical applications aside, it’s a topic that people want to learn more about because it appeals to something basic in our lives – the urge to communicate; to hear and be heard.

As we’ve discussed in the past here on the ABLE blog, it’s our collective opinion that Machine Translation technologies are not yet mature enough to serve as a reliable translation solution for client-facing or otherwise quality-sensitive materials.  That said, for more informal uses, MT can be a useful and practical tool.

While a number of methodologies exist for MT, two leaders have emerged in terms of reliability and quality.

Rule-Based Machine Translation:  RBMT systems are built largely from linguistic resources such as bilingual dictionaries, grammar libraries, and syntactic knowledge-bases, which the tools rely on to assemble translations.  One of the primary benefits of RBMT is that it can be used in conjunction with client-customized glossaries to enhance the translation's subjective quality.
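In miniature, the rule-based approach amounts to dictionary lookup plus hand-written syntactic rules. The sketch below uses a toy English-to-French dictionary and a single reordering rule (real RBMT systems use large grammar libraries, and the vocabulary here is purely illustrative):

```python
# Toy bilingual dictionary; a real RBMT system would have a large one,
# plus morphology handling.
DICTIONARY = {"the": "le", "cat": "chat", "black": "noir"}

def reorder(words):
    # One hand-written syntactic rule: French typically places the
    # adjective after the noun, so swap adjective-noun pairs.
    adjectives = {"black"}
    out = list(words)
    for i in range(len(out) - 1):
        if out[i] in adjectives:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

def translate(sentence):
    # Apply syntactic rules first, then look each word up, passing
    # unknown words through unchanged.
    words = reorder(sentence.lower().split())
    return " ".join(DICTIONARY.get(w, w) for w in words)

print(translate("the black cat"))  # le chat noir
```

A client-customized glossary fits naturally into this design: it is simply extra entries merged into the dictionary, which is why RBMT pairs so well with glossary work.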

Statistical Machine Translation:  SMT systems are likewise built using existing bilingual dictionaries and grammar libraries.  Instead of applying a fixed set of syntactic rules, however, an SMT system relies on statistics and probability to select and apply the best grammatical approach.  The statistical approach is calibrated against a corpus, a body of existing translations.
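The statistical idea can be sketched very compactly: score each candidate translation by combining a translation-model probability (learned from the bilingual corpus) with a language-model probability (learned from monolingual text), then pick the highest-scoring candidate. All of the probabilities below are made up for illustration:

```python
# Translation-model scores for candidate English outputs of one
# hypothetical source sentence (illustrative numbers only).
CANDIDATES = {
    "the house is small": 0.6,
    "the house is little": 0.4,
}

# Language-model scores: how fluent each candidate is as English,
# estimated from a monolingual corpus (again, illustrative).
LM = {
    "the house is small": 0.02,
    "the house is little": 0.005,
}

def best_translation(candidates, lm):
    # Pick the candidate maximizing translation score * fluency score.
    return max(candidates, key=lambda e: candidates[e] * lm[e])

print(best_translation(CANDIDATES, LM))  # the house is small
```

The key contrast with RBMT is that no human wrote a rule preferring "small" here; the preference falls out of corpus statistics, which is both SMT's strength (coverage) and its weakness (it is only as good as its corpus).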

While both of these approaches have their benefits, a so-called hybrid approach is often taken, relying on a mix of rule-based and statistical principles to achieve the best results for a given client.

In previous blogs, we’ve discussed the drawbacks to MT when it comes to sensitive materials. That said, there are several uses for existing technologies.

Gist-translation of large bodies of content: It’s not uncommon for clients to have repositories of content that are not in need of perfect-quality translation, but still must be understandable for internal use.  Consider an international law firm with boxes of legal documents totaling over a million words.  Not every word of that material will be directly relevant to a case, but it’s critical to review the entire body of text to identify relevant information.  This first pass is an excellent use for MT.  Once the relevant areas of text have been identified, these can then be sent for human re-translation or editing.

First-pass translation: If the source text is sufficiently literal and straightforward, Machine Translation can be adequate for the translation stage of client-facing materials.  This initial translation pass would then be followed by a human editing and quality assurance pass.  Technical manuals are often well suited to this approach.

Ancillary translation:  Existing (and often free) MT solutions can benefit clients in many ways, and social media and collaboration clients may find these to be especially useful.  Since these platforms rely on the ability of users of all linguistic backgrounds to interact and share content they themselves have created, formal human translation is impractical or entirely impossible.  This creates a frustrating challenge: it's easy and affordable to reach out to new markets by simply translating a site's User Interface, help, and other in-house content through traditional methods, but it's often not worth it if it simply creates a body of segregated linguistic communities.  However, with embedded, easy-to-use tools like Google Translate at everyone's fingertips, the ability for end users to connect across linguistic boundaries is just a click away.

If you have questions about how Machine Translation may be right for you, ABLE’s highly-experienced team of experts is here to help.  We look forward to discussing your team’s specific challenges and suggesting a customized solution using one or more of these approaches.


