{"id":3639,"date":"2024-08-02T17:41:57","date_gmt":"2024-08-02T15:41:57","guid":{"rendered":"https:\/\/innovalang.eu\/?p=3639"},"modified":"2024-08-08T16:00:32","modified_gmt":"2024-08-08T14:00:32","slug":"computational-linguistics","status":"publish","type":"post","link":"https:\/\/innovalang.eu\/en\/blog-en\/computational-linguistics\/","title":{"rendered":"Introduction to Computational Linguistics"},"content":{"rendered":"<div>\n<p><span lang=\"EN-US\">A few years ago, amidst unfulfilled hopes of the pandemic&#8217;s end and new initiatives by <a href=\"https:\/\/innovalang.eu\/en\/\">InnovaLang<\/a> in translation innovation, we started a discussion with experts in artificial intelligence, machine translation, computer-assisted translation (CAT and beyond), and computational linguistics. We resume the conversation on the latter with Marco Tomatis, an expert in applied linguistics and linguistic engineering, employed at the University of Turin.<\/span><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">The aforementioned discussion aimed to establish a connection between these areas of research and work, laying theoretical and conceptual foundations useful for developing our Machine Translation (MT) Engine. It also sought to confirm the substantial absence of &#8220;syncretism&#8221; among artificial intelligence, machine translation, computer-assisted translation systems, and computational linguistics, with the perspective of academic research to develop a convergence point to be formalized as a new theoretical starting point on the automation of translation processes.<\/span><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">Marco, tell us something about yourself first!<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">I graduated in Modern Foreign Languages and Literatures from the University of Turin back in 1997, after a technical diploma in electronics. After a brief work experience in machine translation at the Dima Group and a more substantial one at the Regional Ethnographic Linguistic Center of Turin, where I dealt with the digitization and filtering of original sound material, I obtained my Ph.D. in &#8220;Linguistics, Applied Linguistics, and Linguistic Engineering&#8221; from the University of Turin in 2005. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Following this, I had the opportunity to actively engage in research on various projects related to the diverse world of Natural Language Processing (NLP), from the development of Corpora to the encoding of texts according to TEI standards. I also served as an adjunct professor of &#8220;English Language,&#8221; &#8220;Applied Computer Science for Multimedia Communication,&#8221; and &#8220;General Linguistics&#8221; at the University of Turin, as well as &#8220;Computational Linguistics&#8221; at the University of International Studies (Unint) of Rome.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">How did you acquire these skills?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Acquiring diverse skills, all belonging to the world of linguistics and natural language processing, is the result of passion and experience gained in the field over the years. It&#8217;s worth noting how technological evolution in terms of computing power and data storage capacity has inevitably influenced and modified the theoretical and practical approach to the more delicate and problematic aspects that the design of NLP systems requires addressing. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">For example, from the late &#8217;90s to today, I have observed an evolution in the approach to machine translation, characterized by the gradual shift from rule-based models active on various levels of linguistic structure to models focused on stochastic analysis of translation data. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">This evolution has also involved the realm of programming languages used: over this period, we have seen the great success of Prolog, whose logic-based setup was soon replaced by an approach more closely linked to &#8220;regular expressions,&#8221; a symbolic representation system for character sequences originating from the Unix operating environment and now commonly implemented in all computer sectors involved in human-machine interaction.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">How would you introduce computational linguistics to someone unfamiliar with it but working in a linguistic environment?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">The main difficulty in introducing computational linguistics lies in the fact that it is a hybrid and multidisciplinary field that requires in-depth knowledge of linguistics (particularly the analysis of structure at all levels), statistics, and computer science (operating systems and programming languages) to be properly mastered. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Unfortunately, in Italy, the humanities and hard sciences struggle to integrate and communicate with each other, often due to a theoretical foundation that is divergent and unable to overcome certain rigid schemas traditionally imposed by the discipline itself. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">My experience, on the contrary, has taught me that the points of contact between the disciplines involved are significantly greater than one might believe, but a radical change in perspective is necessary: in this sense, Noam Chomsky&#8217;s insights are the most evident example. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Therefore, my suggestion for those interested in taking their first steps in this field is to be guided on a path that, in addition to providing a solid theoretical foundation, offers a practical approach to solving elementary (though absolutely essential) problems such as the process of tokenizing an electronic text.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">What aspects of this discipline do you find most interesting?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Computational linguistics presents captivating challenges, primarily on the linguistic level: there are still problematic areas of linguistic analysis that could find a solution precisely through the use of automatic systems, which, as such, impose a clear stance in terms of categorization. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Closely related to this aspect is the potential of models based on the stochastic approach (a mathematical approach to identifying probabilities related to random events) to &#8220;guess&#8221; the nature of a given term unknown to the system simply by referring to the quantization of the term itself within the text portion under examination. Since different approaches can produce different results, I find it highly interesting to identify the best balance between natural language processing through rules and its statistical management by creating a database as large and complete as possible. <\/span><\/i><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">From this perspective, the possibility of improving individual disciplines by leveraging the potential of integrated research represents an interesting and impactful challenge.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">What are its possible fields of application?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Natural language processing has countless fields of application. Just to mention the most well-known, we go from systems supporting humanistic research and improving the usability of digital texts in electronic libraries (TEI encoding) to speech recognition and synthesis systems, the increasingly widespread &#8220;chatbots&#8221; for automatic user support of a given service, integrated and individual e-learning platforms, and machine translation and computer-assisted translation systems.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">Do you find a significant gap between the academic approach and practical applications?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Unfortunately, I have noticed a certain disconnect between the traditional approach to problems prevalent in the academic world and the decidedly more pragmatic one that characterizes applied solutions: with a few exceptions, academic research generally struggles to respond to the private sector&#8217;s demands with innovative solutions capable of solving concrete problems quickly.<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">Do you have a funny anecdote to share about your activity in this field?<\/span><\/p>\n<\/div>\n<div>\n<p><i><span lang=\"EN-US\">Even before graduating, a professor (now retired for several years) who later supervised my thesis used to call me &#8220;The computer man&#8221; because of my interests that went far beyond the classical boundaries of linguistics: when I presented my thesis project on the automatic creation of an English-Italian machine dictionary, she realized too late that what I was doing had nothing to do with lexicography in the strict sense&#8230;<\/span><\/i><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">Thank you, Marco!<\/span><\/p>\n<\/div>\n<div>\n<p><span lang=\"EN-US\">Marco Tomatis&#8217; LinkedIn profile <a href=\"https:\/\/www.linkedin.com\/in\/marco-stefano-tomatis-36257053\/\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/span><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>A few years ago, amidst unfulfilled hopes of the pandemic&#8217;s [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":2639,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"rank_math_lock_modified_date":false,"footnotes":""},"categories":[11],"tags":[113,162,117],"class_list":["post-3639","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-blog-en","tag-innovalang-en","tag-computationallinguistics","tag-translation"],"acf":[],"_links":{"self":[{"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/posts\/3639"}],"collection":[{"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/comments?post=3639"}],"version-history":[{"count":4,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/posts\/3639\/revisions"}],"predecessor-version":[{"id":3659,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/posts\/3639\/revisions\/3659"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/media\/2639"}],"wp:attachment":[{"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/media?parent=3639"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/categories?post=3639"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/innovalang.eu\/en\/wp-json\/wp\/v2\/tags?post=3639"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}