That holds especially for machine translation, or MT–the software that translates Web pages for the likes of Google and AltaVista. In recent years a handful of search engines have come to dominate the globe, even though at best they do a poor job of translating Web pages. This performance gap is keeping millions of non-English-speaking people from getting access to English-language Web pages–which currently account for about 35 percent of the billions of Web pages now available via search engines. If engineers can solve some of the more vexing problems of machine translation–and many think they eventually will–it could transform the competitive landscape for search-engine firms.

It would also be the biggest advance in MT since U.S. Sovietologists used computers to make sense of Russian-language documents during the cold war. They made swift advances, but couldn’t crack the tougher problems–the ambiguities of meaning and the complexities of grammar, to name two. With the advent of the Internet and powerful computer chips, their technology made its way to the common man, warts and all. In recent years MT firms like Systran of San Diego, California, which currently supplies Google and AltaVista, and Language Weaver of Marina del Rey, California, have incorporated advances in linguistics and statistics to render texts in languages from Croatian to Mapudungun, spoken by the Mapuches in Chile. Still, these are incremental improvements, not a breakthrough.

That may soon change. In the 1990s, IBM researchers developed so-called statistical MT programs, which take identical texts in two different languages and apply statistics to “learn” how to translate between them. This method is used for high-end translation services, and has begun to trickle down to the search engines. The so-called war on terror has also unleashed funds from the U.S. Defense Department, which is anxious to improve the translation of texts in Arabic and other languages. “We’re more aware now that the rest of the world is important to us, and we now have the resources available to us,” says David Yarowsky, head of Johns Hopkins University’s machine-translation efforts. Researchers have recently made programs that can translate entire phrases rather than one word at a time and handle trickier grammar. And in the next year or two they expect big improvements in the way machines translate the names of people, places and organizations, which now cause computer hiccups.
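To make the idea of "learning" from identical texts concrete, here is a minimal sketch of one simple form of statistical MT: word-level probabilities estimated from sentence pairs, in the spirit of the method usually called IBM Model 1. It is an illustration under stated assumptions, not the software any company mentioned here actually runs. The three-sentence German-English corpus and the names `train_word_translation` and `best_translation` are invented for the example.

```python
from collections import defaultdict

# Hypothetical toy corpus: (source sentence, target sentence) pairs.
TOY_CORPUS = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]


def train_word_translation(pairs, iterations=10):
    """Estimate P(target word | source word) from sentence pairs.

    Simplified expectation-maximization: no NULL word, no word order,
    no phrases. Assumes every sentence is non-empty.
    """
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}

    # Start with no knowledge: every target word is equally likely.
    uniform = 1.0 / len(target_vocab)
    prob = {s: {t: uniform for t in target_vocab} for s in source_vocab}

    for _ in range(iterations):
        counts = defaultdict(float)  # expected count of (source, target) links
        totals = defaultdict(float)  # expected count of links per source word
        for src, tgt in pairs:
            for t in tgt:
                # Split one unit of credit for t among the source words
                # in the same sentence, in proportion to current beliefs.
                norm = sum(prob[s][t] for s in src)
                for s in src:
                    share = prob[s][t] / norm
                    counts[(s, t)] += share
                    totals[s] += share
        # Re-estimate the probabilities from the expected counts.
        for s in source_vocab:
            for t in target_vocab:
                prob[s][t] = counts[(s, t)] / totals[s]
    return prob


def best_translation(prob, source_word):
    """Return the most probable target word for a source word."""
    row = prob[source_word]
    return max(row, key=row.get)


model = train_word_translation(TOY_CORPUS)
for word in sorted(model):
    print(word, "->", best_translation(model, word))
```

No dictionary is supplied. The program works out that "das" pairs with "the" only because the two words keep turning up in the same sentence pairs, which is also why such systems depend on large volumes of parallel text.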

That would leave plenty of tough problems, including ambiguity and anarchic “jumble”–jokes, creative phrasing, and colloquialisms–that stymie even the most powerful machines. But researchers are confident that machines will eventually match the quality of human translators. Yarowsky’s goal at Johns Hopkins is to develop, within the next five years, programs that can translate 100 languages to the point where a Brit could get the gist of, say, an editorial in Bengali. When those advances make their way to search engines, the Internet will truly pull the world together.