
The Statistical Strategy: Is translation an art or a math problem?

#1
C C
http://www.nytimes.com/2015/06/07/magazi...oblem.html

EXCERPT: [...] The possibility of machine translation, Schwartz explained, emerged from World War II. Warren Weaver, an American scientist and government administrator, had learned about the work of the British cryptographers who broke the Germans’ Enigma code. It occurred to him that cryptographic investigations might solve an immediate postwar problem: keeping abreast of Russian scientific publications. There simply weren’t enough translators around, and even if there were, it would require an army of them to stay current with the literature. “When I look at an article in Russian,” Weaver wrote, “I say: ‘This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.’ ” In this view, Russian was merely English in frilly Cyrillic costume, only one small step removed from pig Latin.

Within a year or two, this idea was understood as absurd, and yet the broader notion of algorithmic processing held. By 1954 the American public was treated to a demonstration of the first nonnumerical application of computing. A secretary typed a Russian sentence onto a series of punch cards; the computer whirred and spat out an English equivalent. The Christian Science Monitor wrote that the “electronic brain” at the demonstration “didn’t even strain its superlative versatility and flicked out its interpretation with a nonchalant attitude of assumed intellectual achievement.”

That demonstration, however, was basically rigged. The computer had been given a pidgin vocabulary (a total of 250 words) and fed a diet of simple declarative sentences.

In 1960, one of the earliest researchers in the field, the philosopher and mathematician Yehoshua Bar-Hillel, wrote that no machine translation would ever pass muster without human “post-editing”; he called attention to sentences like “The pen is in the box” and “The box is in the pen.” For a translation machine to be successful in such a situation of semantic ambiguity, it would need at hand not only a dictionary but also a “universal encyclopedia.” The brightest future for machine translation, he suggested, would rely on coordinated efforts between plodding machines and well-trained humans. The scientific community largely came to accept this view: Machine translation required the help of trained linguists, who would derive increasingly abstract grammatical rules to distill natural languages down to the sets of formal symbols that machines could manipulate.

This paradigm prevailed until 1988, year zero for modern machine translation, when a team of IBM’s speech-recognition researchers presented a new approach. What these computer scientists proposed was that Warren Weaver’s insight about cryptography was essentially correct — but that the computers of the time weren’t nearly powerful enough to do the job. “Our approach,” they wrote, “eschews the use of an intermediate mechanism (language) that would encode the ‘meaning’ of the source text.”

All you had to do was load reams of parallel text through a machine and compute the statistical likelihood of matches across languages. If you train a computer on enough material, it will come to understand that 99.9 percent of the time, “the butterfly” in an English text corresponds to “le papillon” in a parallel French one. One researcher quipped that his system performed incrementally better each time he fired a linguist. Human collaborators, preoccupied with shades of “meaning,” could henceforth be edited out entirely.
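The “statistical likelihood of matches” idea can be made concrete with a minimal sketch of expectation-maximization in the style of IBM Model 1, the simplest of the word-alignment models later published by the IBM group. The four-sentence corpus is invented for illustration, the usual NULL word is left out, and the function name train_ibm_model1 is hypothetical. Nothing about meaning is encoded anywhere, yet co-occurrence counts alone push t(papillon | butterfly) toward 1.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus (not from the article): English/French sentence pairs.
corpus = [
    ("the butterfly".split(), "le papillon".split()),
    ("the book".split(), "le livre".split()),
    ("a butterfly".split(), "un papillon".split()),
    ("a book".split(), "un livre".split()),
]


def train_ibm_model1(pairs, iterations=10):
    """Estimate t(f | e), the probability that English word e translates to French
    word f, with IBM Model 1-style EM (simplified sketch: no NULL word)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start uniform over the French vocabulary for every co-occurring (f, e) pair.
    t = {(f, e): 1.0 / len(f_vocab) for es, fs in pairs for e in es for f in fs}
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of f-e links
        total = defaultdict(float)  # expected number of links from e to any f
        # E-step: share each French word's unit of credit among the English words
        # in the same sentence, in proportion to the current t(f | e).
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise expected counts into probabilities per English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t


t = train_ibm_model1(corpus)
best = max((p, f) for (f, e), p in t.items() if e == "butterfly")
print(f"{best[1]}: {best[0]:.3f}")  # expected: papillon, with a probability near 1
```

Because “papillon” turns up every time “butterfly” does, while “le” and “un” are each shared with another English word, each round hands “papillon” a larger share of the credit.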

Though some researchers still endeavor to train their computers to translate Dante with panache, the brute-force method seems likely to remain ascendant. This statistical strategy, which supports Google Translate and Skype Translator and any other contemporary system, has undergone nearly three decades of steady refinement. The problems of semantic ambiguity have been lessened — by paying pretty much no attention whatsoever to semantics.

The English word “bank,” to use one frequent example, can mean either “financial institution” or “side of a river,” but these are two distinct words in French. When should it be translated as “banque,” when as “rive”? A probabilistic model will have the computer examine a few of the other words nearby. If your sentence elsewhere contains the words “money” or “robbery,” the proper translation is probably “banque.” (This doesn’t work in every instance, of course — a machine might still have a hard time with the relatively simple sentence “A Parisian has to have a lot of money to live on the Left Bank.”) Furthermore, if you have a good probabilistic model of what standard sentences in a language do and don’t look like, you know that the French equivalent of “The box is in the ink-filled writing implement” is encountered approximately never....
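The “look at the other words nearby” strategy for “bank” can be sketched as a small naive Bayes classifier with add-one smoothing. The co-occurrence counts, priors, window size and the function name translate_bank are all hypothetical stand-ins for statistics a real system would estimate from parallel text. The last example reproduces the Left Bank trouble the excerpt mentions: the model sees “money” and picks “banque”, although the Left Bank is the Rive Gauche.

```python
import math
from collections import Counter

# Hypothetical counts of context words seen near "bank" in (imaginary) training
# sentences, split by which French word the parallel text used.
CONTEXT_COUNTS = {
    "banque": Counter({"money": 40, "robbery": 12, "loan": 25, "account": 30}),
    "rive": Counter({"river": 45, "water": 20, "fishing": 10, "grass": 6}),
}
PRIORS = {"banque": 0.7, "rive": 0.3}  # hypothetical: financial sense assumed more common
VOCAB = set().union(*CONTEXT_COUNTS.values())


def translate_bank(sentence, window=10):
    """Choose 'banque' or 'rive' for 'bank' from the words within `window` positions.
    Assumes the sentence contains the word 'bank'; unknown context words are ignored."""
    words = [w.strip('.,!?"').lower() for w in sentence.split()]
    i = words.index("bank")
    context = words[max(0, i - window):i] + words[i + 1:i + 1 + window]
    scores = {}
    for sense, counts in CONTEXT_COUNTS.items():
        n = sum(counts.values())
        score = math.log(PRIORS[sense])
        for w in context:
            if w in VOCAB:
                # Add-one smoothing so a cue never seen with this sense is not impossible.
                score += math.log((counts[w] + 1) / (n + len(VOCAB)))
        scores[sense] = score
    return max(scores, key=scores.get)


for s in [
    "The robbers took the money from the bank.",                         # banque
    "We went fishing on the bank of the river.",                         # rive
    "A Parisian has to have a lot of money to live on the Left Bank.",   # banque (wrong)
]:
    print(translate_bank(s), "<-", s)
```

Nothing here models what “Left Bank” means; the classifier only weighs which cue words it has counted, which is exactly why the third sentence goes wrong.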