The Use of Machine Translation in eDiscovery

Written By
Oasis Discovery

Talking Legal Tech with Bart Holladay of Linguistic Systems Inc.

This article covers some interesting takeaways from a conversation with Bart Holladay, a Language Technology & Services Consultant at Linguistic Systems, Inc. It’s not intended to be a summary of our conversation, nor am I attempting to editorialize his comments. To listen to the entire discussion, visit thedatfile.com.

That said…

Read on if you want to know:

  • Why machine and auto translators can be risky
  • The breakthroughs that got them to usability
  • Where the future of algorithms will take your firm

Can we Trust the Machines to Tell it to us Straight?

I’ve been on the technology side of legal for almost 20 years. Early on, I learned to start backpedaling right out of the gate when it comes to machine translation of foreign language documents. “Machine translation is not perfect, so lower your expectations,” I’d find myself telling clients. “Also, it’s expensive.”

For as long as I can remember I’ve heard customers land somewhere between “it worked just fine” and “it was pretty rough” on the satisfaction scale. Basically, machine translations are better than nothing, but not by much.

But maybe it’s time to rethink this notion. Technology is changing the way we drive cars, buy food, watch movies – everything. So why wouldn’t it be getting better at translating languages? I sat down with Bart Holladay to get an update on where machine translation is today, how it got here, and where it’s going.

The evolution of machine translation

Phase 1: The 1940s: Basic Translation

One of the first problems the inventors of the computer tackled was translation. It seems like the perfect task for a computer. After all, language has clear rules. What’s so hard about teaching a computer that cat = ネコ? Boom. Your computer should now speak fluent Japanese.

But that didn’t work, because there are too many exceptions. Context means everything, language evolves, regional differences matter, and slang is more prevalent than you think.

What, exactly, is the translation for, “I’ve had it up to here”? If you ever excelled in a high school language class only to be baffled by a native speaker, you can relate to the computer’s plight.
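To make the problem concrete, here is a minimal Python sketch of the one-word-in, one-word-out approach. It is only an illustration: the tiny lexicon is hypothetical, and it uses Spanish rather than the Japanese example above simply because the ambiguity is easy to show. Each source word gets exactly one fixed translation, so “bank” always comes out as the financial kind, even when the sentence is about a river.

```python
# Hypothetical toy lexicon: exactly one fixed translation per English word.
# Entries are illustrative assumptions, not taken from any real MT system.
LEXICON = {
    "cat": "gato",
    "river": "río",
    "bank": "banco",  # the financial institution; a river bank would be "orilla"
}


def translate_word_for_word(sentence, lexicon=LEXICON):
    """Swap each word for its single dictionary entry, ignoring context.

    Words missing from the lexicon are kept and flagged with a question mark.
    """
    translated = []
    for word in sentence.lower().split():
        translated.append(lexicon.get(word, f"[{word}?]"))
    return " ".join(translated)


single_word = translate_word_for_word("cat")
phrase = translate_word_for_word("river bank")
print(single_word)
print(phrase)
```

The single word works fine, but the phrase pairs “río” with the wrong sense of “bank,” because nothing in this approach looks at the neighboring words.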

Phase 2: The 1990s: Statistical Translation

Eventually, some genius named Kevin Knight of Language Weaver started applying statistics to deal with exceptions. Here is a gross oversimplification of the change. The old model: this = that. The statistical model: this = that, that, or that.

Not only were multiple translation options considered for each source-word, but the actual source word was now considered in context. If words co-occurred with other words, those co-occurrences would be considered when evaluating the options. Because sometimes you need the words around a word to determine what a word means.

This is called Statistical Machine Translation, and it was a huge breakthrough. The translation tools at Google and Bing are both based on this technology.
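Here is a rough Python sketch of the “this = that, that, or that” idea, with context deciding among the options. Everything in it is an assumption made for illustration: the candidate translations, the base probabilities, the co-occurrence weights, and the function name `choose_translation` are all invented, and real statistical systems learn such numbers from large bilingual corpora using far more elaborate models.

```python
# Hypothetical candidates for one source word, with made-up base probabilities.
CANDIDATES = {
    "bank": {"banco": 0.8, "orilla": 0.2},
}

# Hypothetical co-occurrence weights: how strongly a nearby source word
# favors a given candidate. Pairs not listed default to 1.0 (no effect).
CONTEXT_WEIGHTS = {
    ("river", "orilla"): 10.0,
    ("money", "banco"): 5.0,
}


def choose_translation(word, context_words):
    """Pick the candidate with the highest score given the surrounding words.

    Score = base probability multiplied by the weight of each context word.
    """
    scores = {}
    for candidate, base_probability in CANDIDATES[word].items():
        score = base_probability
        for context_word in context_words:
            score *= CONTEXT_WEIGHTS.get((context_word, candidate), 1.0)
        scores[candidate] = score
    return max(scores, key=scores.get)


print(choose_translation("bank", []))         # no context: most common sense wins
print(choose_translation("bank", ["river"]))  # "river" nearby shifts the choice
```

With no context, the more common translation wins. Put “river” next to it and the score for “orilla” overtakes “banco,” which is the co-occurrence effect described above in miniature.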


Back on the topic of legal.

Statistical Machine Translation (SMT) is pretty incredible, but if you’ve ever used Google Translate, you know it’s clunky. It’s not how people actually talk. It’s very useful, but you can almost always tell that the translation is computer-generated.

And this is where legal struggles. MT is great for general purposes, but 90% accuracy doesn’t cut it when you’re making privilege calls. The text in the reviewer’s hands needs to reflect the speaker’s original intent, and that’s still a lofty goal for MT.

“It was postulated in the 40’s, and could never really even be tested until computers got to the IC chip speed that they got to in about 2010. Lo and behold, by 2014, these guys write the paper using neural network processing to do what the statistical processing was doing before.”
Bart Holladay, Linguistic Systems, Inc.

Phase 3: Now and Beyond

The latest innovation in the translation space is the neural network, a system designed like a brain and backed by processors so fast they were once unthinkable. The computing power available today is growing exponentially, and this powerful network of “nodes” can perform incredible amounts of cross-reference checks before the translation output is selected. Before it translates the word “cat,” it will use a network of super-computers that check sources, find similar examples, and see context at warp speed.

Where does that leave eDiscovery?

“That readability and that fluency of the output spells usability in eDiscovery. It’s usable as-is for early case assessment, some of the stuff is usable as a substitute for the foreign language to run analytics on. For review? Yeah we’re seeing simple reviews, with a little consulting, sometimes as-is will work.”
Bart Holladay, Linguistic Systems, Inc.

Today’s machine translations are good for:

  • Early Case Assessment. Getting a general idea of the documents, key issues, etc.
  • Predictive coding and analytics. The statistical approach to machine translation is similar to the indexing methods used by predictive coding engines. They’re language agnostic, but need consistency – MT works just fine.
  • 1st pass review. Simple documents? Yes. Complicated documents? Eh, not just yet. But it’s getting there.

With these ongoing improvements, the output of MT will be good enough for reviewers to make quality decisions on documents within the next few years. After that? The future may as well be a Star Trek episode. Fingers crossed.
