How does machine translation get better?
Thread poster: Baran Keki
Baran Keki
Baran Keki  Identity Verified
Türkiye
Local time: 20:02
Member
English to Turkish
May 21, 2022

Hi,
I've been seeing a lot of posts and topics about machine translation recently. I know next to nothing about machine translation (never used it, never been interested in it), so please excuse me if my question sounds stupid or naive.
I've observed over the years that Google Translate has come a long way (DeepL is not available in my language). I want to understand how machine translation gets better and more accurate. Is it improved by translators themselves when they use web-based CAT tools like Matecat? Because I know that Matecat is free and connected to Google Translate (or some other MT engine). So, is it that people using Matecat and similar CAT tools are actually contributing to the 'improvement' of machine translation?
I remember one colleague here saying that DeepL greatly benefitted from TM Town through the leaking of hundreds of translation memories. I have no idea how that is possible.
So, who is responsible for improving machine translation? Is it the translators - by using free web-based CAT tools like Matecat that seem to feed thousands of translated segments into MT engines, or by doing the same through MT plugins in their licensed CAT tools - or is it the AI companies?
I hope my question makes sense, and please forgive my ignorance on the subject.

[Edited at 2022-05-21 11:54 GMT]


expressisverbis
Matthias Brombach
 
Aranglish
Aranglish
Türkiye
English to Arabic
+ ...
My two cents May 21, 2022

If you have never used MT, it has probably already used and benefitted from your translated texts, say, when you used a service like Gmail to send attachments. For my language pair, English-Arabic, I have observed that Google first started by scraping the web for content translated from English; the UN's website is a treasure trove for these comparative texts. To get better, it relies on a reinforcement learning model involving predictive statistics and a modified version of Bayes' theorem to classify text. This model gets better with intervention from humans. Google Translate users, for example, have been teaching this model for free for years on end, correcting machine translations whenever they come across inaccurate ones.

So far ML is good with general texts but still wanting when it comes to creative translation/transcreation and domain-specific texts. I am not complacent about it as it's getting better by the second and continues to tap into language idiosyncrasies and culture. Very exciting stuff ahead.

For proprietary ML, humans are certainly needed to hand over their intelligence to the machine to take care of repetitive tasks. There are data mining companies paying pennies to have collected data tagged and fed to their learning models. Data sweat shops are all the rage these days. Tapping into the collective intelligence will surely see ML getting better worldwide across cultures and markets.


 
expressisverbis
expressisverbis
Portugal
Local time: 18:02
Member (2015)
English to Portuguese
+ ...
Translators, linguists and IT experts, I believe... May 21, 2022

Regarding your question: who feeds machine translation tools in order to improve complex linguistic algorithms?
Yes, I believe that, as with every other case related to technology, machine translation is no exception: in our case it also needs translators, linguists and IT experts to improve its output.
However, machine translation will hardly ever fully understand and interact with communication contexts and all their hidden nuances as a human does. This is my firm conviction.
What we all need to bear in mind when discussing MT, CAT tools and the like is that almost every translator makes use of these sorts of software and computer resources. They often form part of translators' academic and professional training, and they are designed simply to assist translators with their tasks; they don't do our job.
This is just my simple and humble opinion.


Kevin Fulton
P.L.F. Persio
Rachel Waddington
Baran Keki
Stepan Konev
Philip Lees
 
Baran Keki
Baran Keki  Identity Verified
Türkiye
Local time: 20:02
Member
English to Turkish
TOPIC STARTER
Just want to find out May 21, 2022

I just want to find out if there is any truth to the claim/theory, which I remember hearing elsewhere, that using free, web-based, MT-integrated CAT tools such as Matecat (perhaps Memsource?) serves to improve machine translation.
I'm not the most tech-savvy person in the world, but it doesn't take a genius to figure out that if people translate hundreds of privacy policies or sales contracts on Matecat over 5 to 10 years, you eventually end up getting brilliant (jaw-dropping, as some would say) results with Google Translate, or with MT in general. I really want to know, beyond doubt, if that's the case.
I've also heard from a colleague at work (while working in-house) that using MT plugins with MemoQ results in a similar outcome, that is, all the segments you translate go directly into the MT engine and improve its algorithms. This sounds like a bit of a conspiracy theory to me, and raises the question of copyrights and intellectual property. But who knows? Maybe licensed tools like MemoQ and Trados are also doing their bit to improve MT?


 
expressisverbis
expressisverbis
Portugal
Local time: 18:02
Member (2015)
English to Portuguese
+ ...
Copyright and intellectual property May 21, 2022

Baran Keki wrote:

This sounds like a bit of a conspiracy theory to me, and raises the question of copyrights and intellectual property. But who knows? Maybe licensed tools like MemoQ and Trados are also doing their bit to improve MT?


I'm always very careful when using MT. Only a couple of my agencies allow me to use Language Weaver. Apart from those, I don't use it for any other translation-related tasks.
As far as I am aware, it is widely accepted that machine translations are not protected by copyright or intellectual property rights, because they aren't personal intellectual creations.
On the other hand, machine-translated content can violate someone else's copyright and intellectual property, which is why more and more companies are adding clauses/provisions about it to their contracts and/or NDAs.
We also have to consider the copyright owner, who is the only person who can authorize a translation that will be distributed.

[Edited at 2022-05-21 14:15 GMT]


Baran Keki
 
Stepan Konev
Stepan Konev  Identity Verified
Russian Federation
Local time: 20:02
English to Russian
It is up to you May 21, 2022

I don't use DeepL or Google websites or desktop apps directly because I prefer a dedicated app for that (QTranslate), but I know that they both have options to improve your MT output.
In Trados, you can tick the 'Update' checkbox against their MT engine to send your translation into their database.
I use an offline MT engine, OPUS CAT MT Engine. Initially, it is built on the basis of the UN corpora. Then I fine-tune it ("train") with my own TM so that the MT engine begins to speak my language and uses terms that I use in my TM. If you know how to fine-tune the MT engine with your TM, you don't need to delete and fix terms now and then in your MT output as "seasoned translators" like to mention every time when they make MT a problem. After fine-tuning, MT just suggests your own translations from your own TM. That's it.
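
For anyone wondering what "fine-tuning with your own TM" needs as input: it generally starts from aligned source/target segment pairs. Below is a minimal sketch in Python of pulling such pairs out of a TMX export, assuming the usual TMX layout (`tu` / `tuv` / `seg`). The helper name `tmx_to_pairs` and the sample data are made up for illustration, and this is not a description of what OPUS-CAT does internally, only of the kind of data a fine-tuning step consumes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_pairs(tmx_text, src_lang, tgt_lang):
    """Return (source, target) segment pairs from a TMX document string.

    Hypothetical helper for illustration; language codes are compared on
    their primary part only, so "en-GB" counts as "en".
    """
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # Older TMX files use a plain "lang" attribute instead of xml:lang.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # itertext() also collects text sitting around inline tags.
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        src, tgt = segs.get(src_lang), segs.get(tgt_lang)
        if src and tgt:  # skip units that are missing either side
            pairs.append((src, tgt))
    return pairs


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en"/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Privacy policy</seg></tuv>
      <tuv xml:lang="tr"><seg>Gizlilik politikası</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en-GB"><seg>Terms of use</seg></tuv>
      <tuv xml:lang="tr-TR"><seg>Kullanım koşulları</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Untranslated segment</seg></tuv>
    </tu>
  </body>
</tmx>"""

if __name__ == "__main__":
    for src, tgt in tmx_to_pairs(SAMPLE_TMX, "en", "tr"):
        print(f"{src}\t{tgt}")
```

The third unit has no Turkish side, so it is dropped: an engine can only learn from segments that actually have both a source and a target.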


expressisverbis
Baran Keki
 
expressisverbis
expressisverbis
Portugal
Local time: 18:02
Member (2015)
English to Portuguese
+ ...
Exactly! May 21, 2022

Stepan Konev wrote:
In Trados, you can tick the 'Update' checkbox against their MT engine to send your translation into their database.


I forgot to mention that very important and technical function!


 
Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 18:02
English to French
+ ...
MT’s got layers… May 22, 2022

To answer Baran’s question, we’ve got to look under the hood too.
MT has progressed due to a number of factors. The way it works has changed.
Engines used to be statistical (SMT) or rule-based (RBMT). These systems work quite well for specialized content or when the corpora used to create/train them are relatively small.
But then neural nets came into play and made it possible to get better results from vast amounts of training data; that’s when neural MT (NMT) appeared. Since then, all sorts of flavours of MT have been created, such as Transformer models, which are quite popular.
Modern MT engines (in general, not just “ModernMT”) are the result of the work of data scientists and programmers, of algorithms, etc., along with the collection of huge amounts of text. This can be parallel (bilingual) corpora but also monolingual corpora. The initial training of an MT engine requires a lot of computing power and a lot of time (24 hours or more, depending on the case).
Now, how are translators involved? Very little. Most MT developers rely on existing corpora available for free on the web. Then they use special techniques (back-translation, word weighting, etc.) to fine-tune their engine.
Some will then use a few translators to flag errors so they can further tweak their MT engine.
Some companies (e.g. automobile manufacturers) will have kept TMs for a decade and decide to create an MT engine in-house. That engine will rely on all the work done by translators over the years.
Regarding MT engines available to the public at large, the current trend is to try to adapt the output to suit the translator’s work through glossaries or through the translator’s own translations/edits. Outside of that, in most cases the MT engine only sees the source text and not the final version, so it’s not improved by your work.
If you use adaptive functions then the MT engine will be “steered” to use your vocabulary/past translations BUT ONLY on a top layer NOT ON the core engine training (as I explained, this process needs time and processing power). I demonstrate this in a video on my channel showing how adaptive features work in ModernMT.
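
A toy sketch in Python of that "top layer" idea. It does not reproduce how ModernMT or any real engine implements adaptation; `base_engine`, `AdaptiveLayer` and the 0.85 similarity threshold are all invented for illustration. The only point is that corrections are stored and consulted at translation time, while the core engine stays exactly as it was trained.

```python
from difflib import SequenceMatcher


def base_engine(source):
    """Stand-in for a trained core engine: a fixed lookup that never changes."""
    frozen_model = {
        "privacy policy": "politique de vie privée",
        "sales contract": "contrat de vente",
    }
    return frozen_model.get(source.lower(), "<no translation>")


class AdaptiveLayer:
    """Keeps the user's corrections on top of the engine (hypothetical design)."""

    def __init__(self, engine, threshold=0.85):
        self.engine = engine
        self.threshold = threshold
        self.memory = {}  # source segment -> translator's preferred target

    def learn(self, source, corrected_target):
        # Only this memory is updated; the core engine is not retrained.
        self.memory[source.lower()] = corrected_target

    def translate(self, source):
        key = source.lower()
        best, best_score = None, 0.0
        for stored_source, target in self.memory.items():
            score = SequenceMatcher(None, key, stored_source).ratio()
            if score > best_score:
                best, best_score = target, score
        # Reuse a stored correction only when the segment is similar enough.
        if best is not None and best_score >= self.threshold:
            return best
        return self.engine(source)


if __name__ == "__main__":
    mt = AdaptiveLayer(base_engine)
    print(mt.translate("Privacy policy"))   # raw engine output
    mt.learn("Privacy policy", "politique de confidentialité")
    print(mt.translate("Privacy policy"))   # steered by the stored correction
    print(base_engine("Privacy policy"))    # core engine is unchanged
```

Retraining the core engine would mean rebuilding `frozen_model` itself, which is the slow, compute-heavy step; the layer on top only needs a lookup.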
So, data being the new gold, you need to think smart about where you put your data (TM etc.) but, unless you use adaptive functions, your work is not sent to the MT, only the source is. That’s how it’s supposed to be anyway.
Good topic Baran! It was an interesting question. 😊


expressisverbis
Rachel Waddington
Stepan Konev
Philippe Etienne
Cecilia Yalangozian
 
Metin Demirel
Metin Demirel  Identity Verified
Türkiye
Local time: 20:02
Member (2018)
Italian to Turkish
+ ...
Human and machine collaboration May 22, 2022

I know of a project where separate teams of translators are employed. While one team provides two separate human translations for each segment, the other team edits MT output. The end goal, I believe, is to teach the machine to translate like a human being.

LIZ LI
 
Baran Keki
Baran Keki  Identity Verified
Türkiye
Local time: 20:02
Member
English to Turkish
TOPIC STARTER
Thank you Philippe May 22, 2022

Thank you for your comprehensive reply and explaining things slightly in layman's terms, though I've got to admit it's still kind of over my head.
So this notion I have that translators using free online CAT tools and feeding their translations into MT databases/engines, and thereby digging their/our own graves is a groundless one?
But, surely, whatever they're producing in target segments that go to the MT database should add to those corpora you're talking about? They (the translated segments) become available on the internet for exploitation, do they not?
I occasionally check Google Translate, and I'm fairly impressed by how good it is with certain texts, such as privacy policies, user manuals, agreements etc.
Not being a niche subject translator, I have cause for concern.
What about the role of MTPE (machine translation post-editing)? Doesn't that serve to fine-tune MT engines?

[Edited at 2022-05-22 18:32 GMT]


 
Lieven Malaise
Lieven Malaise
Belgium
Local time: 19:02
Member (2020)
French to Dutch
+ ...
No expert, but... May 23, 2022

I'm no expert, but I use DeepL (a paid version, by the way, so that alone should guarantee that your input isn't saved and used), and I've come across job offers for translators several times. So I'm very sure humans are used to edit their material to improve it.

Philippe Locquet
expressisverbis
Philippe Etienne
Sylvia Hatzl
 
LIZ LI
LIZ LI  Identity Verified
China
Local time: 01:02
French to Chinese
+ ...
Won't say no to MT May 23, 2022

Imagine us coaching different algorithms to tango.

MT can be helpful for experienced translators, but it is opium for young translators and for our profession.
We can't be faster than a machine, we can't match its memory, and we can't even beat an elephant...
But we're still the jurors on the jury panel.


Multiverse Solutions s.r.o. (X)
 
Mr. Satan (X)
Mr. Satan (X)
English to Indonesian
@Philippe Locquet May 23, 2022

Hi, Philippe.

Would you mind sharing some papers that explain this in more detail? That sounds like an interesting read for this weekend.

Cheers, bud.


expressisverbis
Philippe Locquet
 
Christopher Schröder
Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...
This is all very vague... May 23, 2022

If I used MT, I would want to know FOR CERTAIN whether my translations were being used to train the MT engine or for any other purpose, all of which would probably upset my clients as much as me.

I do not understand how the MT and CAT companies manage not to be completely transparent about this.


Baran Keki
Christine Andersen
 
Christopher Schröder
Christopher Schröder
United Kingdom
Member (2011)
Swedish to English
+ ...
Vicious circle May 23, 2022

Given that most MT won't be edited to the highest standards, isn't MT just going to eat itself?

Anton Konashenok
Christine Andersen
Metin Demirel
 