Tags spanning several segments / put everything between two strings in a single segment
Thread poster: CAK
Germany
English to German
Aug 20, 2018

I'm trying to get parts of a text that are enclosed by {} and don't need translation out of the way. This can be words, sentences or several paragraphs. I'd prefer not having to change the original document.

With custom tags and a regex, this works fine for words and sentences, as long as the content is part of a single segment. However, the algorithm doesn't seem to consider content that spans more than one segment; at least I couldn't get it to work using the multiline switch/mode. Is my assumption correct that this is not possible? I'm not very good at using regular expressions, I'm afraid.

Alternatively, I tried to create segmentation exceptions, with very little success. I managed to ignore the first period between two sentences after the opening bracket (or before the closing bracket, depending on whether I used greedy or lazy matching; conditions don't seem to be supported), but I had no success with line breaks whatsoever. Is this possible with segmentation rules at all?
Would it be possible by adjusting file filters?

[Edited at 2018-08-20 10:11 GMT]

[Edited at 2018-08-20 12:43 GMT]


 
Samuel Murray
Netherlands
Member (2006)
English to Afrikaans
@CAK Aug 20, 2018

CAK wrote:
I'm trying to get parts of a text that are enclosed by {} and don't need translation out of the way. This can be words, sentences or several paragraphs. I'd prefer not having to change the original document.


Do you mind changing the original document if the change is 100% reversible?

For example, if you were to replace all spaces that were between curly brackets with e.g. " ###" (space plus ###) (assuming that "###" does not occur anywhere else in your file), then at least you would be able to identify that text within OmegaT even though you'd still see it. For example, {The rain in Spain.} would become {The ###rain ###in ###Spain.}. Then afterwards you can just delete all ### from the target file.
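If the text can be exported to plain text, the marking and the later clean-up could be sketched roughly as below. This is only an illustration under assumptions: the function names are made up, the marker "###" is assumed not to occur anywhere else in the file, braces are assumed not to be nested, and the script works on plain text only (it does not preserve Word formatting).

```python
import re

MARKER = "###"  # assumption: this string occurs nowhere else in the text


def mark_spaces_in_braces(text, marker=MARKER):
    """Insert the marker after every space that lies between { and }."""
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another one")

    def mark(match):
        # Only the matched {...} span is changed; text outside stays as it is.
        return match.group(0).replace(" ", " " + marker)

    # [^{}]* also matches line breaks, so a brace pair may span paragraphs.
    return re.sub(r"\{[^{}]*\}", mark, text)


def unmark(text, marker=MARKER):
    """Reverse the change by deleting every marker from the (target) text."""
    return text.replace(marker, "")


original = "Translate this. {The rain in Spain.} And this."
marked = mark_spaces_in_braces(original)
print(marked)  # Translate this. {The ###rain ###in ###Spain.} And this.
assert unmark(marked) == original
```

The check for an existing marker is what keeps the change reversible: if "###" already appeared in the file, deleting all markers afterwards would also delete the original occurrences.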

What kind of a file is it -- is it a plain text file, or an MS Word file, or what? Do you have access to MS Word, by the way? Or, what kind of a text editor do you have?


 
CAK
Germany
English to German
TOPIC STARTER
Title Aug 21, 2018

@ Samuel Murray

Thanks for your reply!
I do have MS Word, and the files are of various origins. I'd prefer not to save in a text editor at all, since there are all kinds of Word clones out there with slight incompatibilities. As I understand it, OmegaT does resave the file, but touches the metadata and formatting only minimally, if at all.
But then again, if there is no other way, it would at least be helpful to know how to do it. Even having just a visual marker isn't something I had thought about, so thanks for that.

I wonder if I could replace line breaks etc. in Word and still get the old formatting back without problems.


 
Didier Briel
France
English to French
Custom tags only work one segment at a time Aug 27, 2018

CAK wrote:

I'm trying to get parts of a text that are enclosed by {} and don't need translation out of the way. This can be words, sentences or several paragraphs. I'd prefer not having to change the original document.

With custom tags and a regex, this works fine for words and sentences, as long as the content is part of a single segment. However, the algorithm doesn't seem to consider content that spans more than one segment; at least I couldn't get it to work using the multiline switch/mode. Is my assumption correct that this is not possible?

This is correct. Custom tags only work one segment at a time.
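To illustrate why a multiline switch cannot help here, the following is a rough sketch, not OmegaT's actual code: it assumes a made-up tag pattern and simply applies it to each segment on its own. Once the text has been split, neither segment contains both braces, so there is nothing for the pattern to match. (In common regex flavours, the multiline mode only changes how ^ and $ behave; it does not let a pattern see beyond the string it is given.)

```python
import re

# Hypothetical custom-tag pattern: everything between { and }, no nesting.
CUSTOM_TAG = re.compile(r"\{[^{}]*\}")


def protected_spans(segments):
    """Apply the pattern to every segment separately, one at a time."""
    return [CUSTOM_TAG.findall(segment) for segment in segments]


# The same text, once as a single segment and once split after the period.
one_segment = ["Keep {do not translate. Really not.} this."]
two_segments = ["Keep {do not translate.", "Really not.} this."]

print(protected_spans(one_segment))   # [['{do not translate. Really not.}']]
print(protected_spans(two_segments))  # [[], []]
```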

Alternatively, I tried to create segmentation exceptions, with very little success. I managed to ignore the first period between two sentences after the opening bracket (or before the closing bracket, depending on whether I used greedy or lazy matching; conditions don't seem to be supported), but I had no success with line breaks whatsoever. Is this possible with segmentation rules at all?
Would it be possible by adjusting file filters?

What is considered a paragraph depends on the file filter. This cannot be changed by segmentation rules.
For some of the filters (e.g., the Text filter, the HTML filter or the OpenXML filter), you can use some options to change what starts a new paragraph.
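As a rough illustration of why the paragraph setting matters for text in braces, here is a small sketch. It is not the code of any OmegaT filter, and the mode names are made up for the example rather than being real option names. It only shows that the same text yields different paragraphs depending on what counts as a paragraph break, and that a {…} block can only be handled as a unit if it ends up inside a single paragraph.

```python
import re


def split_paragraphs(text, mode):
    """Split text into paragraphs; the mode names are hypothetical labels."""
    if mode == "line_breaks":
        parts = text.split("\n")
    elif mode == "empty_lines":
        parts = re.split(r"\n\s*\n", text)
    elif mode == "never":
        parts = [text]
    else:
        raise ValueError("unknown mode: " + mode)
    # Drop paragraphs that contain nothing but whitespace.
    return [part for part in parts if part.strip()]


text = "Intro.\n{Skip this line\nand this one.}\n\nOutro."

for mode in ("line_breaks", "empty_lines", "never"):
    paragraphs = split_paragraphs(text, mode)
    # True if some paragraph holds both the opening and the closing brace.
    intact = any("{" in p and "}" in p for p in paragraphs)
    print(mode, len(paragraphs), intact)
# line_breaks 4 False
# empty_lines 2 True
# never 1 True
```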

Didier


 
CAK
Germany
English to German
TOPIC STARTER
Title Aug 27, 2018

Thanks for the clarification, Didier!
I'll try to look into file filters, then.

[Edited at 2018-08-27 14:05 GMT]


 

