Since most competitors to Google offerings aren't going to have a hugely profitable core business with which to fund all the data collection and normalization that goes into building a high quality ML system, the future for poorly capitalized competitors seems bleak to me. This seems to support some of the growing rumblings about enforcing antitrust laws against the large tech companies.
Edit: better, not bigger.
In this case, the author claims pretty good accuracy, almost on par with Google Brain's!
On my test set of 3,000 sentences, the translator obtained a BLEU score of 0.39. This score is the benchmark scoring system used in machine translation, and the current best I could find in English to French is around 0.42 (set by some smart folks at Google Brain). So, not bad.
The basic idea is to use word vector embeddings to build a source<->target dictionary, then combine this with a language recognition model to iteratively bootstrap a set of source<->target training examples for use with a conventional ML approach.
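For anyone who wants a feel for the dictionary step, here is a rough sketch of how I understand it, not the author's actual code. It assumes the English and French word vectors already live in one shared space (in practice you have to align two separately trained embedding spaces first, which is the hard part). The vectors are made-up toy values and the function names `build_dictionary` and `word_by_word` are mine. The iterative bootstrapping with the language model is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def build_dictionary(src_vecs, tgt_vecs):
    """Map each source word to its nearest target word by cosine similarity.

    Assumes both sets of vectors are already in the same embedding space.
    """
    dictionary = {}
    for src_word, src_vec in src_vecs.items():
        best = max(tgt_vecs, key=lambda w: cosine(src_vec, tgt_vecs[w]))
        dictionary[src_word] = best
    return dictionary


def word_by_word(sentence, dictionary):
    """Crude first-pass translation: swap each word, keep unknown words as-is."""
    return " ".join(dictionary.get(w, w) for w in sentence.split())


# Toy 3-dimensional vectors, invented purely for illustration.
english = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.1, 0.9, 0.0],
    "house": [0.0, 0.1, 0.9],
}
french = {
    "chat": [0.8, 0.2, 0.1],
    "chien": [0.2, 0.8, 0.1],
    "maison": [0.1, 0.0, 0.9],
}

dictionary = build_dictionary(english, french)
draft = word_by_word("cat dog house", dictionary)
print(dictionary)
print(draft)
```

The word-by-word drafts are obviously terrible translations on their own. As I read it, the point is that they are good enough to seed the bootstrapping loop, which then produces the training pairs for the conventional model.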
To that, though, I'm definitely not holding my breath :)
Also, people make the assumption that as soon as we make strong AI comparable to a human we will be able to translate anything and everything (let's say we are excluding the last mile for argument's sake). That assumption ignores an important fact: sometimes translation is a team effort where certain words, phrases or concepts are debated among multiple translators to reach a consensus. It's not always done by a single intelligence.
Some people might argue that's because people have far more limited capacity to consider all the examples in the corpus, whereas a machine can consider all of them lightning fast and thus arrive at the right answer.
A perfect edge case that illustrates why that doesn't matter, and where multiple human intelligences will often grapple with how something should be translated, is what name to give a movie you are translating for an international audience. The same movie often has quite different names depending on which language it gets translated into. There isn't actually a correct answer; there are just answers that are deemed 'good enough'.
Nature was able to do this. Sure, it took a couple billion years of evolution to get to this point, but it is doable. I'm betting that us inventing strong AI within the next 100 years is a near certainty.
Modern neural translation techniques don’t impose a 1:1 mapping on translations, and this works reasonably well between major European languages (English/German/French/Spanish/Italian) where there are large cross-language corpora and large monolingual corpora available. In these languages you’ll often get single words translated to short phrases.
It’s true that this doesn’t solve the Japanese examples you give below, but this seems a question of degree rather than an impossibility.
The problem most people seem to have understanding this is two-fold:
1. They assume if you just had more data and better algorithms you could get better results.
2. They have never translated things themselves and come across a case where something didn't have a translation.
Remove the machines from the equation entirely. Even for people, it's not possible in 100% of cases.
Naturally, linguistically similar languages have more overlap and hence better success overall, but that's really just a nice-to-have.
No matter how similar English and French are, if you ask someone to translate a meme that started on 4chan or Reddit into French you will quickly encounter a case where attempting to do so just doesn't work. I'm sure there are plenty of better examples than that, but I don't know French.
It's an elephant in the room and stunningly few people seem to see it standing there.
それは部屋の象で、驚くほど少数の人々がそこに立っているのを見ているようです。
Lol Google really??
In fact, case in point: if 'elephant in the room' were used inside a joke that relied on the elephant itself as part of the joke, it would not be possible to translate. It just wouldn't make sense.
I remember watching a program back in high school that subtitled music videos with their lyrics machine translated from English to Hungarian. The absurdity was indeed hilarious for a brief period.