Language technologies for a multilingual public administration in Spain

Iria de-Dios-Flores, José Ramom Pichel Campos, Adina Ioana Vladu, Pablo Gamallo Otero

Abstract


Interactions between citizens and the public administration increasingly take place electronically, a mode often referred to as e-government or e-administration. In Spain, many of these interactions must be monolingual, in Spanish, when they involve the central administration, but they can be bilingual or even multilingual in the autonomous communities that have their own official language. In this article, we aim to show how the latest speech and text technologies for the co-official languages of Spain would allow speakers of these languages to use them in a large share of their administrative dealings with any Spanish public body. This would facilitate the transformation of Spain's largely monolingual administration into a multilingual one, thereby promoting digital language equality and guaranteeing the linguistic rights of speakers of minoritised languages. We present an overview of the language technologies that are most promising from the perspective of multilingual communication between citizens and the administration. We also review the technologies that already exist for the co-official languages of Spain and offer some ideas on how they could be integrated to advance the multilingual transformation of Spanish public administrations, without overlooking some of the ethical and legal issues concerning workers. The article is intended as an introductory and accessible overview for legislators, administrators, or anyone else interested in the potential of language technologies to help develop a multilingual public administration.


Keywords


language technologies; public administration; digital language equality; multilingualism



DOI: http://dx.doi.org/10.58992/rld.i79.2023.3943



 

Attribution - NonCommercial - NoDerivatives (by-nc-nd): Commercial use of the original work and the creation of derivative works are not permitted.