The role of CAT tools in Patent Translation
Degree thesis in CAT Laboratory (Laboratorio CAT)
Serena CUSCIANNA
Università del Salento, Academic Year 2018/2019
Extracts (omitted parts are marked with *) from:
5. ANALYSIS OF THE TRANSLATION OF A PATENT WITH A CAT TOOL
In this chapter, a range of analyses and evaluations of patents translated during my internship will be carried out. The chapter will also present the steps of patent translation management followed at the host translation company (Global Voices Ltd.). The translation process will then be explained in detail, in order to indicate explicitly the stages in which I was involved as a translator. To show concretely how a CAT system can help the patent translator, one of the patents I translated will be selected, and both the type and degree of the translator's intervention after MT and the management of the patent in SDL Trados Studio will be evaluated.
5.1 The patent translation management process in a flowchart
Among the many language services it offers, Global Voices provides professional assistance in the translation of intellectual property, including patents. In order to guarantee high quality and fast service, the company relies on two key tools: SDL Trados Studio as the CAT tool, and DeepL as the external MT application (integrated into Trados Studio as a plug-in). During the first weeks of my internship, after a period of job-shadowing and before the company granted me a Trados license to activate on my personal computer, I helped colleagues with their assigned patent translation projects. Since I did not have access to Trados, however, my translation activity was limited to word processors such as Microsoft Word. Once the license was granted, the contribution of SDL Trados Studio was immediately recognizable in a much smoother workflow than when working without a CAT tool. In the translation company where I worked as a linguist, the workflow is divided into phases involving different professional figures: account managers, project managers, and linguists.
In phase 1, an Account Manager (AM) has the first contact with the client; after agreeing on the quote with the client, he or she receives the patent to be translated and processes it with Trados. In this phase, a TM previously supplied by the client is associated with the project. With the support of the TM and the DeepL plug-in (which allows the DeepL MT tool to be used directly within the Trados environment), each TU of the document is pre-translated. Next, in phase 2, the patent passes to a Project Manager (PM), who is responsible for allocating the project to in-house or external linguists, depending on the languages involved, the subject field, and the availability of the chosen linguist. Phase 3 involves at least two linguists: a translator and a proofreader. At the end of the translation process, which will be described in more detail in the following paragraphs, the translator sends the document to the proofreader. As stated by Mossop (2001, p. 149), a proofreader should make as few changes as necessary, taking into account the expected use and the receivers of the translated text. The proofreader should always be able to justify his or her choices by referring to reliable sources, and should therefore ask not whether the translation can be improved, but whether it needs to be improved (Mossop, 2001, p. 149). In the translation company where my internship took place, linguists have to comply with a Key Performance Indicator (KPI) set by the company, corresponding to 7,000 post-edited words per day. Patents vary in length from about 8,000 words to about 100,000 words. New translation projects, whether patent-related or not, are assigned to each linguist every day. In addition, during the workday in-house linguists are also asked to revise and correct translations produced by external linguists.
Due to the high workload and the length of many documents, it is not always possible to review patents in their entirety. Proofreaders often check only the most important part of a patent document, i.e., the claims (see chap. 1.2). The revised document is then sent back to the translator, who decides whether or not to apply the suggested changes. The final version of the translation is sent to the PM, who generates one or more editable target files for desktop publishing (DTP). The workflow ends with the AM generating a PDF target file and delivering the finished product to the client, together with the corresponding invoice. The handovers from one department to another would have been much more difficult and considerably slower without an advanced CAT tool such as Trados: project packages make it easier to transfer translation files while minimizing the risk of unintentional alterations to the text.
Figure 9 provides an overview of the patent translation management process in the form of a flowchart that summarizes the phases outlined above and organizes them by the professional figure involved. The flowchart uses symbols from the standard workflow/flowchart notation, as explained in Figure 8. Note that the action symbol is drawn as a rectangle with rounded corners instead of the regular rectangle: the standard allows this slight variation, and it is more readable.
FIG. 8 – Symbol legend (Source: Gliffy)
FIG. 9 – Flowchart of the patent translation management process
As a linguist, I was involved in two of the tasks mentioned above: the translation of new patents, and the review of the patent claims translated by my colleagues. As far as the translation process is concerned, once I accepted a project and received the project package with all the original files, I imported the package into the Trados work environment. At this stage, all TUs already had a pre-translation in the TL. Before starting work on the text, I loaded the TM and the TB. With the first project, I received a TM already populated with hundreds of TUs from translations produced for the same client in previous months. Consulting that TM often made further terminology research unnecessary, as it provided either an already accepted translation or a starting point for more targeted research. Whenever Trados reported a match with the associated TMs, the procedure varied according to the match percentage. For exact matches and context matches (100-101%), I applied the translation suggested by the TM after a quick check. For fuzzy matches between 70% and 99%, I assessed the type of intervention required, e.g., a simple punctuation change or a terminological change. If no match was detected, I post-edited the whole pre-translation produced by DeepL.
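The decision rule just described can be summarized as a small function. This is a minimal sketch for illustration only: the function name `review_action` and its return strings are hypothetical, the 70% threshold is the project setting reported in this chapter, and nothing here reflects how Trados itself is configured.

```python
MIN_MATCH = 70  # minimum match value used in the project (see 5.2.1)

def review_action(match_percent: float) -> str:
    """Return the handling procedure for a segment, given its TM match value.

    100-101:  exact or context match, apply the TM translation after a quick check.
    70-99:    fuzzy match, assess the edit needed (punctuation, terminology, ...).
    below 70: no TM match is offered, so post-edit the MT pre-translation.
    """
    if not 0 <= match_percent <= 101:
        raise ValueError("match value must be between 0 and 101")
    if match_percent >= 100:
        return "apply TM translation after a quick check"
    if match_percent >= MIN_MATCH:
        return "assess the edit needed"
    return "post-edit MT pre-translation"


for value in (101, 100, 84, 70, 0):
    print(value, "->", review_action(value))
```

Keeping the threshold in a single constant mirrors the fact that the boundary is a per-project setting rather than a fixed property of the tool.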
As far as terminology is concerned, creating a TB for each project proved particularly useful. The texts were very long and often required several days of work, so it was essential to have a tool able to suggest the correct translation of a technical term, in order to ensure consistency across the text and drastically reduce working time. I preferred to populate separate TBs by project rather than by topic, because the same term was often used with different meanings in different patents and contexts. Having a separate TB for each project allowed me to be more confident in my translations and to limit terminology research to the first occurrence of each term. Once I had finished my translation and sent it to the proofreader, I received the proofreader's suggested changes and revised the translation accordingly. Finally, I generated the return packages (i.e., zipped archives containing the translated or reviewed files) to be sent to the PMs. Figure 10 offers an overview of the translation process in the form of a flowchart summarizing the phases just described, using the graphical notation presented in Figure 8.
FIG. 10 – Flowchart of the translation process
It should be stressed that combining an MT engine with a CAT tool is especially effective in patent translation, as it facilitates the translation of technical-scientific content with a high degree of repetitiveness while minimizing unintended alterations in the translation. The benefits of this approach are described in the following section.
5.2 Case study: translation of patent EP 2 600 235
Among the European patents translated during the internship, European Patent No. 2 600 235 (2019) has been selected for the present analysis because its pre-translation in Trados returned the highest number of matches. For this reason, this patent is well suited to a more in-depth evaluation based on the quality of the MT output and the type of intervention carried out by the human translator using a CAT tool.
The patent was granted on May 31, 2019, to its applicant, LG Electronics Inc., the multinational electronics company headquartered in Seoul, South Korea. As required by the application procedure, the patent was drafted in one of the official languages of the EPO (i.e., English) and, in order to be validated in the countries designated by the applicant, it had to be translated into their national languages, in this specific case Italian. The patent relates to a “mobile terminal and controlling method thereof” (patent title) and, according to the IPC, it falls under “electric digital data processing” (G06F), more specifically “interaction techniques based on graphical user interfaces […], based on specific properties of the displayed interaction object or a metaphor-based environment […], for the control of specific functions or operations […], using a touch-screen or digitizer […]” (G06F3/0481, G06F3/0484, G06F3/0488).
As regards the structure of the text, the patent contains all the structural elements mentioned in section 1.2 except the abstract. After the cover page, there is a section named “Description”, which is divided into several parts:
- “Background of the invention”: subdivided into “Field of the invention” (par. [0001]) and “Discussion of the Related Art” (par. [0002]-[0007]);
- “Summary of the invention” (par. [0008]-[0013]);
- “Brief description of the drawings” (par. [0014]);
- “Detailed description of the invention” (par. [0015]-[0201]).
The last section of the patent consists of the claims, also translated into German and French, as established by Article 14(6) EPC: “Specifications of European patents shall be published in the language of the proceedings and shall include a translation of the claims in the other two official languages of the European Patent Office”. Finally, pages 22 to 39 contain the attachments, i.e., the drawings described in the text.
5.2.1 Matches overview
When the package received from the PM is imported into Trados, the first useful information about the project is shown in the “Review Package Contents” window (FIG. 11): for example, the total number of words in the project (in this case, 15,812), the words not yet translated (15,745) and, consequently, the words already translated and confirmed (67). Clicking on “Report View” opens the “Analyze Files Report”, which helps estimate the time a translator may need to translate the file, based on the number and type of translation memory matches found. For this specific project, the most relevant settings were as follows: the minimum match value was 70%, the search mode was set to “Use best matches from all translation sources”, and the minimum number of words for a match was 2 on the whole TU. Table 10 shows match information in terms of the number of words. In the rest of the analysis, however, I preferred to use the number of segments as the unit of measurement for each type of match, in order to keep the same counting and archiving method used by TMs.
FIG. 11 – Review Package Contents window
According to the analysis report, although more than half of the text has no matches, there is a fairly high number of repetitions (23 segments if only the first occurrence is counted, rising to 93 if all repetitions are counted), as well as matches in the 95-99% band (59 segments), the 75-84% band (34 segments) and the 50-74% band (23 segments, which, given the 70% minimum match value, all fall within 70-74%). As stated above, the project was set with a minimum match value of 70%: below this threshold, the intervention required of the human translator would be so extensive that the TM results would be of no help. Figure 12 is a pie chart giving an overview of the match types detected by Trados, using the number of segments as the unit of measurement and keeping the match grouping used by Trados.
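The difference between counting only the first occurrence of a repeated segment and counting all of its occurrences can be illustrated with a short sketch. It assumes, for simplicity, that two segments are repetitions only if their text is identical; the function `repetition_stats` and the sample segments are hypothetical, and the Trados analysis report applies its own counting conventions, which may not coincide with this one.

```python
from collections import Counter

def repetition_stats(segments):
    """Count repeated segments in two ways.

    Returns (distinct, total): `distinct` counts each repeated segment once
    (first occurrence only), `total` counts every occurrence of it.
    Assumption: segments repeat only if their text is identical.
    """
    counts = Counter(segments)
    repeated = [n for n in counts.values() if n > 1]
    return len(repeated), sum(repeated)


# Hypothetical segments, not taken from the patent.
sample = [
    "Referring to FIG. 2,",
    "The controller displays a list.",
    "Referring to FIG. 2,",
    "The display unit is turned off.",
    "The controller displays a list.",
    "Referring to FIG. 2,",
]
print(repetition_stats(sample))  # (2, 5)
```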
*
As shown in the chart, only 21.27% of the segments have some form of match. However, almost half of the matched segments require minimal intervention by the translator (95-99% matches, 8.71% of all segments) or no intervention at all (Perfect Match, 1.48%, and Context Match, 0.44%). The remaining half require greater, sometimes significant, intervention by the translator, as in the case of 70-74% matches, which account for 3.4% of the total number of segments.
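The statement that almost half of the matched segments need little or no intervention can be checked directly from the shares reported in the chart:

$$8.71\% + 1.48\% + 0.44\% = 10.63\%, \qquad \frac{10.63\%}{21.27\%} \approx 0.50$$

so roughly one matched segment in two falls into the 95-99%, Perfect Match or Context Match categories.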
5.2.2 Matches by patent section
Analyzing the distribution of matches across the sections of the patent document shows that the majority of matches (almost 90%) occur in the “Description” macro-section, in particular in the “Detailed description of the invention”, which is also the longest section of the patent.
*
According to Table 11, the “Background of the invention” section contains segments that the translator can accept with minimal intervention, most of them titles at the beginning of a paragraph. This confirms what was stated in the previous chapters (see chaps. 1.2 and 2.3): the patent document has a standard structure and is highly repetitive, so that even section titles vary only slightly from one patent to another. In the “Summary of the invention” section, the matches correspond to the opening and closing formulas of the paragraph. The fact that Trados identified high-percentage matches in the TM associated with the project confirms the stylistic rigidity of patents. In the “Brief description of the drawings” section, the number of matches grows with the length of the paragraph, and most matches fall in the 95-99% range. This is explained by the typical structure of this paragraph and by the way a CAT tool usually divides a text into segments (as stated in Chapter 3, punctuation is the primary segmentation rule). The description of the drawings often takes the form of a list in which the drawings are explained one at a time with the expression “Fig./Figs [x]. is/are”. As a result, Trados often splits the sentence, creating a first segment containing only the abbreviation and the reference number, followed by a second segment with the rest of the explanation.
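Why punctuation-driven segmentation cuts drawing descriptions apart can be shown with a toy segmenter. This is a sketch under simplifying assumptions, not Trados's segmentation rules: `naive_segment` breaks after every full stop followed by a space, while `segment_with_exceptions` skips tokens listed in a hypothetical `ABBREVIATIONS` set. The exact split Trados produces depends on its own rule set and on the punctuation of the source.

```python
import re

# Hypothetical abbreviation list; real CAT tools ship much richer rule sets.
ABBREVIATIONS = {"FIG.", "FIGS.", "Fig.", "Figs.", "e.g.", "i.e."}

def naive_segment(text):
    """Break after every full stop that is followed by whitespace."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]

def segment_with_exceptions(text):
    """Same rule, but never break after a token listed in ABBREVIATIONS."""
    segments, current = [], []
    for token in text.split():
        current.append(token)
        if token.endswith(".") and token not in ABBREVIATIONS:
            segments.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that has no final full stop
        segments.append(" ".join(current))
    return segments


text = "FIG. 1 is a block diagram of a mobile terminal. FIG. 2 is a front view."
print(naive_segment(text))
# ['FIG.', '1 is a block diagram of a mobile terminal.', 'FIG.', '2 is a front view.']
print(segment_with_exceptions(text))
# ['FIG. 1 is a block diagram of a mobile terminal.', 'FIG. 2 is a front view.']
```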
As previously mentioned, the “Detailed description of the invention” section contains the highest number of matches (105), the largest group being matches in the 95-99% range. Scrolling through the segments in Trados Edit View, one immediately notices that the highest-percentage matches are segments at the beginning of a new paragraph. The content of these segments is not strictly related to the object of the patent, but to the organization of the description of the invention: recurrent segments are, for example, “referring to Fig.” and “with reference to Fig.”. Finally, the “Claims” section, while representing only a small part of the total number of segments, has 16 matches, 13 of which are 76% matches. Although this is not a high percentage, the translator can gain a great advantage from it. Each segment with this match is a dependent claim and uses the same phrase, “The mobile terminal of claim”, the only difference being the number of the claim referred to. As soon as the translator translates and confirms the first occurrence, Trados automatically propagates the translation with the correct reference number, so that all the 76% fuzzy matches become 100% exact matches and the translator can continue translating the dependent claims more quickly.
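The effect of propagating a confirmed claim translation while updating the claim number can be sketched by treating numbers as placeables. This is a simplified illustration, not Trados's implementation: the function `propagate` is hypothetical, and the Italian wording is an illustrative translation, not the one used in the patent.

```python
import re

NUMBER = re.compile(r"\d+")

def propagate(confirmed_source, confirmed_target, new_source):
    """Reuse a confirmed translation for a segment that differs only in its numbers.

    Numbers are treated as placeables: if both source segments are identical
    once their numbers are masked, each number in the target is replaced by
    the corresponding number of the new source. Otherwise return None.
    """
    if NUMBER.sub("#", confirmed_source) != NUMBER.sub("#", new_source):
        return None
    mapping = dict(zip(NUMBER.findall(confirmed_source), NUMBER.findall(new_source)))
    return NUMBER.sub(lambda m: mapping.get(m.group(), m.group()), confirmed_target)


# Hypothetical Italian translation, used only for illustration.
src = "The mobile terminal of claim 1, wherein"
tgt = "Il terminale mobile secondo la rivendicazione 1, in cui"
print(propagate(src, tgt, "The mobile terminal of claim 3, wherein"))
# Il terminale mobile secondo la rivendicazione 3, in cui
print(propagate(src, tgt, "The method of claim 3, wherein"))
# None
```

Masking the numbers before comparing is what turns a segment that differs only in its claim reference from a fuzzy match into a reusable one.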
5.2.3 Match examples
The match percentage is inversely related to the degree of the translator's intervention: the lower the percentage, the more changes the translator has to make, and vice versa (see chap. 3.2.1). To show that matches of the same class require a similar type of intervention, some TU samples have been selected from the text. For each group of matches, the following stages are analyzed: the original SL segment, the SL segment retrieved from the TM with a given match percentage, the TL segment after MT, and the final TL translation following human intervention. Six groups of matches have been identified: 70-74%, 75-84%, 85-94%, 95-99%, 100% and 101%. Finally, TUs were also selected for which the MT output required no adjustment. The examples progress from cases where a more “invasive” intervention by the translator is necessary to cases where the translator can be more confident in the proposed match and the resulting MT output.
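The inverse relation between match percentage and editing effort can be made concrete with a generic similarity measure. The sketch below is an assumption for illustration: Trados's actual match formula is proprietary and is not reproduced here. It computes a word-level Levenshtein distance (the minimum number of word insertions, deletions and substitutions) and turns it into a percentage with the hypothetical functions `word_edit_distance` and `fuzzy_score`; the sample sentences are invented.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two segments, counted in words."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        cur = [i]
        for j, wy in enumerate(y, start=1):
            cur.append(min(prev[j] + 1,                # delete wx
                           cur[j - 1] + 1,             # insert wy
                           prev[j - 1] + (wx != wy)))  # substitute or keep
        prev = cur
    return prev[-1]

def fuzzy_score(a, b):
    """Similarity in percent: 100 * (1 - edits / words in the longer segment)."""
    longest = max(len(a.split()), len(b.split()))
    if longest == 0:
        return 100
    return round(100 * (1 - word_edit_distance(a, b) / longest))


new = "the controller displays the image on the touch screen"
print(fuzzy_score(new, new))                                                      # 100
print(fuzzy_score(new, "the controller displays the photo on the touch screen"))  # 89
print(fuzzy_score(new, "the controller outputs a photo on the display"))          # 44
```

One substituted word yields a high score, while five edits drop it below the 70% threshold used in the project, which is the pattern the examples below illustrate.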
Table 12 provides two examples from the 70-74% group: example [1] is a 70% match, while example [2] is a 74% match. Since 70% is the minimum threshold accepted by the project settings, these matches require significant intervention by the translator. In example [1], the SL segment shares only the words display, display (output) information and mobile terminal with the matching segment in the TM. With such a low match percentage, even MT does not produce a reliable result: a comparison of the segment after MT with the final translation accepted by the translator shows that the translator practically had to type the translation from scratch. In this case, therefore, the translator did not benefit from the TM or the MT.
In example [2], the match is slightly higher: 74%. Although the difference in percentage between examples [1] and [2] is minimal, in the latter the elements shared by the SL segment and the associated TM segment are more significant. The segment extracted from the TM differs from the original SL segment in the following respects: the absence of the verb receives; the replacement of of with such as; the replacement of the relative clause with the past participle of the verb to obtain; and the replacement of the term photographing with its synonym capturing. Nevertheless, these differences are easy to identify and do not drastically alter the general meaning of the segment, so the translator can fix the MT output with basic corrections.
*
Table 13 provides two examples from the 75-84% group: example [3] is an 80% match, while example [4] is an 84% match. Example [3] is not very different from the 74% match in Table 12, but the segment is half as long as the previous one, so the differences between the original SL segment and the segment found in the TM are also fewer. The TM segment differs from the original SL segment in the following respects: the past participle of the verb associate is replaced by that of relate, and the active form of the verb exist is replaced by the passive form be implemented. Here too, the translator only needs to make simple changes. In example [4], the match is slightly higher: 84%. Apart from the paragraph reference number, the sentence structure of the original SL segment is the same as that of the TM segment: an action corresponds to a reaction. In this case, the MT produced good output. The translator only had to render explicitly in Italian both the singular and the plural of the word signal, which the SL expresses with the s in brackets, and to make the verb agree accordingly.
*
Table 14 provides two examples from the 85-94% group: example [5] is an 88% match, while example [6] is a 93% match. In example [5], the SL segment consists of a sequence of technical terms from the field of IT. The same sequence occurs, in a virtually unchanged order, in another segment in the TM, resulting in very good MT output. The translator was thus able to drastically reduce the time spent searching for equivalent terminology and to complete the translation of the segment quickly after light punctuation corrections.
In example [6], the correspondence between the SL segment and the TM segment is even clearer: the only textual difference is examples (SL segment) vs. preferred embodiments (TM segment). Once this difference, clearly highlighted by Trados, has been identified, the translator can easily intervene and reuse a translation whose source already matches the new segment at 93%.
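The kind of difference highlighting Trados offers for fuzzy matches can be approximated with Python's standard `difflib` module, comparing the two source segments word by word. This is a sketch, not Trados's comparison algorithm; the function `word_differences` and both sentences are hypothetical, built only to mirror the examples / preferred embodiments case.

```python
import difflib

def word_differences(tm_source, new_source):
    """List the word-level edits that turn the TM source segment into the new one.

    Each item is (operation, words in TM segment, words in new segment).
    """
    a, b = tm_source.split(), new_source.split()
    changes = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag != "equal":
            changes.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


tm_seg = "The invention is described in detail with reference to preferred embodiments."
new_seg = "The invention is described in detail with reference to examples."
print(word_differences(tm_seg, new_seg))
# [('replace', 'preferred embodiments.', 'examples.')]
```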
*
Table 15 provides two examples from the 95-99% group: example [7] is a 95% match, while example [8] is a 99% match. In example [7], the only difference is the modal verb. Nevertheless, the MT output contains a paragraph reference number that does not appear in the SL segment. This shows that the translator's attention should never drop during the translation phase. Even when matches have a high percentage and MT produces good output, the translation remains computer-aided: the machine is only an aid, and the human translator must always keep its work under control.
Example [8] is a 99% match. Matches with such high percentages generally differ only in formatting or punctuation (see 3.2.1). In this case, for example, the semicolon has to be replaced with a colon to respect the original punctuation. Once again, the translator's attention is put to the test, as the MT output contains a superfluous comma after the reference number of the mobile terminal.
*
Table 16 provides an example of an exact match (i.e., 100%), example [9]. When Trados automatically confirms consecutive 100% matches through auto-propagation within the same document, the translator may overlook some segments. For the reasons given above, however, the translator must check each segment individually, albeit quickly in the case of exact matches.
*
Table 17 provides two examples of context matches (i.e., 101%). In Trados, a CM is detected by taking into account the segment preceding the one currently selected (see 3.2.1). For this reason, examples [10] and [11] are two consecutive segments from the text. Generally, a CM must be preceded by an exact match in order to be detected; in this case, however, it is preceded by a 99% match. The value below 100% is due to a different paragraph reference number in the SL segment and in the segment extracted from the TM. This difference does not affect the text content, and Trados automatically detects and corrects the mismatch, turning the segment into a 100% match.
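The rule that a context match depends on the preceding segment can be illustrated with a small classifier. This is a sketch under stated assumptions, not the Trados algorithm: the TM is modeled as a hypothetical list of `(previous_source, source, target)` entries, and numbers are masked before comparison to mimic the way a different paragraph reference number did not prevent the CM in example [11].

```python
import re

def normalize(text):
    # Assumption: numbers behave as placeables, so "[0042]" equals "[0057]".
    return re.sub(r"\d+", "#", text.strip())

def match_type(segment, previous_segment, tm):
    """Return 101 (context match), 100 (exact match) or None (no exact match).

    `tm` is a hypothetical list of (previous_source, source, target) entries.
    A context match requires the same source text AND the same preceding segment.
    """
    result = None
    for prev_src, src, _target in tm:
        if normalize(src) != normalize(segment):
            continue
        if (prev_src is not None and previous_segment is not None
                and normalize(prev_src) == normalize(previous_segment)):
            return 101
        result = 100
    return result


tm = [("[0042] Referring to FIG. 3,", "the controller displays a list.", "(Italian target)")]
print(match_type("the controller displays a list.", "[0057] Referring to FIG. 3,", tm))  # 101
print(match_type("the controller displays a list.", "In addition,", tm))                 # 100
print(match_type("the display is turned off.", "In addition,", tm))                      # None
```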
*
Sometimes, even when there is no match, the MT produces output that requires no modification by the translator beyond the usual checks. Table 18 provides two examples of segments whose machine translation (labeled AT, Automated Translation, in Trados) was accepted by the translator without any modification.
*
In other no-match cases, where the MT output needed to be reworked, in-depth terminology research was conducted by consulting generic and specialist online dictionaries and glossaries.