Help concerning bilingual aligned texts (Corpus)
Thread poster: GabLuz
Jul 17, 2007

Hi, folks.

I have about 60 GB of plain-text files to align (English to Portuguese, Portuguese to English, Japanese to English, Japanese to Portuguese), and I'd really like to index all of this material, but I don't know how to proceed!
I tried Google Desktop, but it's really annoying because it doesn't show exactly what I need in the way I need it.
The best solution I could think of was a corpus system such as COMPARA, but I want to build my own corpus database and use it offline. I'm trying my best, but I can't find a useful one.

I really want something like this: http://www.linguateca.pt/COMPARA/Welcome.html

Here I just have to type the text and all results are displayed. That's quick and simple!

I already tried:
- Google Desktop (it works but it requires manual searching);
- mkAlign (almost there, but it really needs A LOT of improvement!);
And many others...

Yes, I have a really fast computer (for me, it's enough).

I have a desktop PC with these specs:
AMD Athlon 3200+, 512 MB RAM, 250 GB 7200 RPM HDD.

If anybody knows of a similar tool, please let me know.
I hope I'm not breaking any forum rules.


 
Vito Smolej
two issues in one mail Jul 18, 2007

o aligning 60 GB - my maximum so far has been a few MB, so I can't really tell...
o mining the aligned material (aka using the translation memory).

The important thing is to get the result (whatever its size) into TMX (Translation Memory eXchange) format; that way the mining is decoupled from the alignment part of the job.
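
To illustrate the decoupling, here is a minimal Python sketch that writes already-aligned sentence pairs to a TMX 1.4 document using only the standard library. The function name `build_tmx`, the tool name in the header and the sample sentences are made up for illustration, and the sketch assumes the alignment step has already produced (source, target) pairs.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the attribute "xml:lang".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang="en", tgt_lang="pt"):
    """Build a TMX 1.4 tree from (source, target) sentence pairs.

    Hypothetical helper: the header values below are placeholders.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "align-sketch",   # placeholder tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain-text",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        # One translation unit per aligned pair, one variant per language.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, src), (tgt_lang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


if __name__ == "__main__":
    sample = [("Good morning.", "Bom dia."), ("Thank you.", "Obrigado.")]
    tree = build_tmx(sample)
    print(ET.tostring(tree.getroot(), encoding="unicode"))
```

Calling `build_tmx(pairs).write("aligned.tmx", encoding="utf-8", xml_declaration=True)` would save the result to disk, after which any TMX-aware tool could take over the mining step. For a corpus this large it would probably make sense to write one TMX file per source document rather than a single huge file.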

In any case, I have yet to hear of a TM application that can handle 60 GB of material. There's something in Canada/Nepean/Montreal (?) that goes in this direction, but I honestly can't remember the name.
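
Failing a ready-made TM application, a do-it-yourself concordance lookup in the spirit of COMPARA could be sketched with Python's built-in `sqlite3` module. This is only one possible approach, not a tested solution for 60 GB: the table layout and the function names (`open_corpus`, `add_pairs`, `concordance`) are invented here, and the plain `LIKE` query scans every row, so a corpus of that size would likely need a full-text index (for example SQLite's FTS5 extension, where it is available).

```python
import sqlite3


def open_corpus(path=":memory:"):
    """Open (or create) a small database of aligned sentence pairs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pairs ("
        "src_lang TEXT, tgt_lang TEXT, src TEXT, tgt TEXT)"
    )
    return conn


def add_pairs(conn, src_lang, tgt_lang, pairs):
    """Store (source, target) sentence pairs for one language direction."""
    conn.executemany(
        "INSERT INTO pairs VALUES (?, ?, ?, ?)",
        [(src_lang, tgt_lang, s, t) for s, t in pairs],
    )
    conn.commit()


def concordance(conn, term, limit=20):
    """Return pairs whose source or target contains the search term.

    LIKE is case-insensitive for ASCII letters only, and '%' or '_'
    in the term are treated as wildcards in this simple sketch.
    """
    pattern = "%" + term + "%"
    cur = conn.execute(
        "SELECT src, tgt FROM pairs WHERE src LIKE ? OR tgt LIKE ? LIMIT ?",
        (pattern, pattern, limit),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = open_corpus()
    add_pairs(conn, "en", "pt", [("Good morning.", "Bom dia."),
                                 ("Good night.", "Boa noite.")])
    for src, tgt in concordance(conn, "night"):
        print(src, "|", tgt)
```

Passing a file path to `open_corpus` instead of the default `:memory:` keeps the database on disk, so the corpus only has to be loaded once and can be searched offline afterwards.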

You can start with OmegaT and the alignment tools that accompany it (see Wikipedia for OmegaT).

+Tools could also be good for aligning:
http://www.global-tm.net/index.php?whichpage=plustools&lang=engb

Keep us posted

regards

Vito