Pedro Paulo Balage


Pedro Paulo BALAGE FILHO


Researcher - Natural Language Processing

I am a research scientist at Priberam, Portugal, and a PhD student at ICMC-USP, Brazil. My interests are natural language processing and opinion mining. My research project is on aspect-based sentiment analysis, and my supervisor is Prof. Dr. Thiago A. S. Pardo, a member of NILC.

Personal Information

Date of birth
June 11, 1985
Address
Lisbon, Portugal
Email
pedrobalage@gmail.com

Education

2012 -

PhD @ University of São Paulo - Brazil

from July 2012 to present day

Doctor of Philosophy in Natural Language Processing

Aspect-Based Sentiment Analysis: Information Extraction and Applications for Portuguese Texts

2010 - 2012

MSc @ University of Wolverhampton - UK
MSc @ University of Algarve - Portugal

from 2010 to 2012

Master of Science in Computational Linguistics

Use of Discourse Knowledge to Improve Lexicon-based Sentiment Analysis

2005 - 2010

BS @ University of São Paulo - Brazil

from 2005 to 2010

Bachelor of Science in Computer Science

Employment

Current

Research Scientist - Priberam

from April 2016 to present day

Priberam

At Priberam I work on state-of-the-art machine learning algorithms for natural language processing.

2015

Founder - Busca Opiniões

from September 2014 to December 2015

Busca Opiniões

Busca Opiniões was a startup company dedicated to open-domain opinion search in Brazilian Portuguese.

2013

Researcher - Dicionário Criativo

from January 2013 to September 2013

Dicionário Criativo

Dicionário Criativo is a Brazilian startup company conceived to offer linguistic tools that help users with creative writing. I was responsible for organizing the website's linguistic resources and for developing the first onomasiological thesaurus available online.


2011

Research Intern - XRCE

from May 2011 to September 2011

Xerox Research Centre Europe (XRCE)

During this internship I developed a Java interface to support the Xerox opinion mining system. My work was awarded 2nd best internship project of that summer, and it led to a paper published at the NAACL-HLT 2012 Demonstration Session.

2009

Project Management Intern - Gemalto

from March 2009 to January 2010

Gemalto

During this internship I was responsible for project management activities, including project forecasting, billing, documentation, and ROI analysis.

Languages

Fluent/Advanced
Portuguese English Spanish
Intermediate
French German

Skills

Programming Languages
Python NLTK Scrapy Shell Scripting C/C++ Java
Databases
MySQL PostgreSQL MongoDB
Coding and Writing
LaTeX VIM Git
Servers
Linux Server Apache

Hobbies

Swimming Bicycling Running Traveling

Publications

Book Chapters (1)

  1. A. D. Felippo, L. H. M. Rino, T. A. S. Pardo, P. C. F. Cardoso, E. R. M. Seno, P. P. Balage Filho, A. P. Rassi, M. S. Dias, M. L. D. R. C. Jorge, E. G. Maziero, A. C. I. Zacarias, J. W. d. C. Souza, R. T. Camargo, V. Agostini, "Corpus Annotation of Textual Aspects in Multi-Document Summaries", in New Language Technologies and Linguistic Research: A Two-Way Road, S. M. Aluísio, S. E. O. Tagnin, Eds., Cambridge Scholars Publishing, 2014, pp. 171-192. [bibtex] [pdf]

Articles in Scientific Journals (3)

  1. M. A. S. Cabezudo, E. Maziero, J. Souza, M. Dias, P. C. Cardoso, P. P. B. Filho, V. Agostini, F. A. Nóbrega, C. d. Barros, A. D. Felippo, T. A. Pardo, "Anotação de Sentidos de Verbos em Textos Jornalísticos do Corpus CSTNews", Revista de Estudos da Linguagem, vol. 23, no. 3, 2015, pp. 797-832. [bibtex] [pdf] [doi]

  2. N. Konstantinova, C. Orasan and P. P. Balage, "A Corpus Based Method for Product Feature Ranking for Interactive Question Answering Systems", International Journal of Computational Linguistics and Applications, vol. 3, no. 1, mar 2012, pp. 57-70. [bibtex] [pdf]

  3. P. P. Balage Filho, "Use of Discourse Knowledge to Improve Lexicon-based Sentiment Analysis", BULAG Natural Language Processing and Human Language Technology 2012, vol. 1, 2012, pp. 3-22. [bibtex] [pdf] [slides]

Conference Papers (14)

  1. R. Liepins, U. Germann, G. Barzdins, A. Birch, S. Renals, S. Weber, P. van der Kreeft, H. Bourlard, J. Prieto, O. Klejch, P. Bell, A. Lazaridis, A. Mendes, S. Riedel, M. S. C. Almeida, P. Balage, S. B. Cohen, T. Dwojak, P. N. Garner, A. Giefer, M. Junczys-Dowmunt, H. Imran, D. Nogueira, A. Ali, S. Miranda, A. Popescu-Belis, L. Miculicich Werlen, N. Papasarantopoulos, A. Obamuyide, C. Jones, F. Dalvi, A. Vlachos, Y. Wang, S. Tong, R. Sennrich, N. Pappas, S. Narayan, M. Damonte, N. Durrani, S. Khurana, A. Abdelali, H. Sajjad, S. Vogel, D. Sheppey, C. Hernon, J. Mitchell, "The SUMMA Platform Prototype", in Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain: Association for Computational Linguistics, 2017, pp. 116-119. [bibtex] [pdf]

  2. P. Paikens, G. Barzdins, A. Mendes, D. Ferreira, S. Broscheit, M. S. C. Almeida, S. Miranda, D. Nogueira, P. Balage, A. F. T. Martins, "SUMMA at TAC Knowledge Base Population Task 2016", in Proceedings of the Text Analysis Conference - TAC, 2017, pp. 1-9. [bibtex] [pdf]

  3. R. Lopez, T. Pardo, L. Avanço, P. P. Balage Filho, A. Bokan, P. Cardoso, M. Dias, F. Nóbrega, M. Cabezudo, J. Souza, A. Zacarias, E. Seno, A. Di Felippo, "A Qualitative Analysis of a Corpus of Opinion Summaries based on Aspects", in Proceedings of The 9th Linguistic Annotation Workshop, Denver, Colorado, USA: Association for Computational Linguistics, 2015, pp. 62-71. [bibtex] [pdf] [source_code]

  4. P. P. Balage Filho, T. A. S. Pardo, "NILC_USP: Aspect Extraction using Semantic Labels", in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), P. Nakov, T. Zesch, Eds., Dublin, Ireland: Association for Computational Linguistics and Dublin City University, 2014, pp. 433-436. [bibtex] [pdf] [source_code] [poster]

  5. P. P. Balage Filho, L. V. Avanço, M. d. G. V. Nunes, T. A. S. Pardo, "NILC_USP: An Improved Hybrid System for Sentiment Analysis in Twitter Messages", in Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), P. Nakov, T. Zesch, Eds., Dublin, Ireland: Association for Computational Linguistics and Dublin City University, 2014, pp. 428-432. [bibtex] [pdf] [source_code] [poster]

  6. N. S. Hartmann, L. V. Avanço, P. P. Balage Filho, M. Duran, M. d. G. V. Nunes, T. A. S. Pardo, S. M. Aluísio, "A Large Opinion Corpus in Portuguese: Tackling Out-Of-Vocabulary Words", in Proceedings of the 9th edition of the Language Resources and Evaluation Conference (LREC), 2014, pp. 3865-3871. [bibtex] [pdf] [demo]

  7. M. S. Dias, A. Y. B. Garay, C. Chuman, C. D. de Barros, E. G. Maziero, F. A. A. Nobrega, J. W. d. C. Souza, M. A. S. Cabezudo, M. Delege, M. L. D. R. C. Jorge, N. L. Silva, P. C. F. Cardoso, P. P. Balage Filho, R. E. L. Condori, A. D. Felippo, M. d. G. V. Nunes, T. A. S. Pardo, "Enriquecendo o Córpus CSTNews - a Criação de Novos Sumários Multidocumento", in Proceedings of the I Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish - ToRPorEsp, 2014, pp. 1-8. [bibtex] [pdf]

  8. M. A. S. Cabezudo, E. G. Maziero, J. W. d. C. Souza, M. S. Dias, P. C. F. Cardoso, P. P. Balage Filho, V. Agostini, F. A. A. Nobrega, "Anotação de Sentidos de Verbos em Notícias Jornalísticas em Português do Brasil", in Proceedings of the XII Encontro de Linguística de Corpus - ELC, 2014, pp. 1-7. [bibtex] [pdf]

  9. P. P. Balage Filho, T. A. S. Pardo, "NILC_USP: A Hybrid System for Sentiment Analysis in Twitter Messages", in Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 2: Proceedings of the Seventh International Workshop on Semantic Evaluation (SemEval 2013), Atlanta, Georgia, USA: Association for Computational Linguistics, 2013, pp. 568-572. [bibtex] [pdf] [source_code]

  10. P. P. Balage Filho, T. Pardo and S. Aluísio, "An Evaluation of the Brazilian Portuguese LIWC Dictionary for Sentiment Analysis", in Proceedings of the 9th Brazilian Symposium in Information and Human Language Technology (STIL), S. M. Aluísio, V. D. Feltrim, Eds., Fortaleza-CE, Brazil: Sociedade Brasileira de Computação, 2013, pp. 215-219. [bibtex] [pdf] [source_code] [poster]

  11. P. P. Balage Filho, C. Brun and G. Rondeau, "A Graphical User Interface for Feature-Based Opinion Mining", in Proceedings of the NAACL-HLT 2012: Demonstration Session, Montréal, Canada: Association for Computational Linguistics, 2012, pp. 5-8. [bibtex] [pdf] [slides]

  12. P. P. Balage Filho, T. A. S. Pardo and M. d. G. V. Nunes, "Summarizing Scientific Texts: Experiments with Extractive Summarizers", in Seventh International Conference on Intelligent Systems Design and Applications - ISDA, Rio de Janeiro-RJ, Brazil: IEEE Computer Society, 2007, pp. 520-524. [bibtex] [pdf] [doi]

  13. P. P. Balage Filho, V. R. de Uzêda, T. A. S. Pardo, M. d. G. V. Nunes, "Experiments on Applying a Text Summarization System for Question Answering", in Proceedings of the Cross Language Evaluation Forum 2006 Workshop - CLEF (Lecture Notes in Computer Science 4730), C. Peters et al., Eds., Springer Berlin / Heidelberg, 2007, pp. 372-376. [bibtex] [pdf] [doi]

  14. P. P. Balage Filho, V. R. de Uzêda, T. A. S. Pardo, M. d. G. V. Nunes, "Using a Text Summarization System for Monolingual Question Answering", in Proceedings of the Cross Language Evaluation Forum 2006 Workshop - CLEF, 2006, pp. 1-4. [bibtex] [pdf]

[Extended] Abstracts (6)

  1. P. P. Balage Filho, T. A. S. Pardo, "BuscaOpinioes: Searching for Opinions over the Internet", in Proceedings of the 11th International Conference on Computational Processing of Portuguese Language. Software Demonstration, 2014, pp. 1-3. [bibtex] [pdf] [demo] [poster]

  2. P. P. Balage Filho, T. A. S. Pardo, "A Web Interface for Browsing the CSTNews Corpus", in Extended Activities of the 9th International Conference on Computational Processing of Portuguese Language - PROPOR. Software Demonstration., 2010, pp. 1-2. [bibtex] [pdf] [demo]

  3. P. P. Balage Filho, T. A. S. Pardo, "Aquisição Automática de Conhecimento Discursivo para o Português do Brasil", in Anais do 16o Simpósio Internacional de Iniciação Científica da Universidade de São Paulo - SIICUSP, Universidade de São Paulo, 2008. [bibtex] [pdf]

  4. P. P. Balage Filho, T. A. S. Pardo and M. d. G. V. Nunes, "Sumarização Automática de Artigos Científicos", in Anais do 15o Simpósio Internacional de Iniciação Científica da Universidade de São Paulo - SIICUSP, Universidade de São Paulo, 2007. [bibtex] [pdf]

  5. P. P. Balage Filho, T. A. S. Pardo and M. d. G. V. Nunes, "Experimentos com Sumarização Automática Extrativa de Textos Científicos", in I Workshop de Iniciação Científica e Tecnológica da Computação - ICMC USP - WICT-Comp, São Carlos: Instituto de Ciências Matemáticas e de Computação - ICMC/USP, 2007. [bibtex] [pdf]

  6. P. P. Balage Filho, T. A. S. Pardo and M. d. G. V. Nunes, "Sumarização Automática de Textos Estruturados", in Anais do 14o Simpósio Internacional de Iniciação Científica da Universidade de São Paulo - SIICUSP, São Paulo-SP, Brazil: Universidade de São Paulo, 2006. [bibtex] [pdf]

Technical Reports (5)

  1. M. A. S. Cabezudo, E. G. Maziero, J. W. d. C. Souza, M. S. Dias, P. C. F. Cardoso, P. P. Balage Filho, V. Agostini, F. A. A. Nobrega, C. D. de Barros, A. D. Felippo, T. A. S. Pardo, "Anotação de Sentidos de Verbos no Córpus CSTNews", Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. Série de Relatórios do Núcleo Interinstitucional de Linguística Computacional: São Carlos-SP, Brazil, 2014. [bibtex] [pdf]

  2. A. P. Rassi, A. C. I. Zacarias, E. G. Maziero, J. W. d. C. Souza, M. S. Dias, M. L. D. R. C. Jorge, P. C. F. Cardoso, P. P. Balage Filho, R. T. Camargo, V. Agostini, A. D. Felippo, E. R. M. Seno, L. H. M. Rino, T. A. S. Pardo, "Anotação de Aspectos Textuais em Sumários do Córpus CSTNews", Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. Série de Relatórios do Núcleo Interinstitucional de Linguística Computacional.: São Carlos-SP, Brazil, 2013. [bibtex] [pdf]

  3. P. C. F. Cardoso, A. P. Rassi, E. G. Maziero, F. A. A. Nobrega, J. W. d. C. Souza, M. S. Dias, M. L. D. R. C. Jorge, P. P. Balage Filho, R. T. Camargo, V. Agostini, A. D. Felippo, L. H. M. Rino, T. A. S. Pardo, "Anotação de subtópicos do córpus multidocumento CSTNews", Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. Série de Relatórios do Núcleo Interinstitucional de Linguística Computacional.: São Carlos-SP, Brazil, 2012. [bibtex] [pdf]

  4. P. P. Balage Filho, T. A. S. Pardo and M. d. G. V. Nunes, "Sumarização Automática de Textos Científicos: Estudo de Caso com o Sistema GistSumm", Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. Série de Relatórios do Núcleo Interinstitucional de Linguística Computacional.: São Carlos-SP, Brazil, 2007. [bibtex] [pdf]

  5. P. P. Balage Filho, V. R. de Uzêda, T. A. S. Pardo, M. d. G. V. Nunes, "Estrutura textual e multiplicidade de tópicos na sumarização automática: o caso do sistema GistSumm", Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo. Série de Relatórios do Núcleo Interinstitucional de Linguística Computacional.: São Carlos-SP, Brazil, 2006. [bibtex] [pdf]

Monographs (1)

  1. P. P. Balage Filho, "Use of Discourse Knowledge to Improve Lexicon-based Sentiment Analysis", Master's thesis, Universidade do Algarve, University of Wolverhampton, 2012. [bibtex] [pdf] [slides]

Other Publications (3)

  1. P. P. Balage Filho, "Aspect-based Sentiment Analysis for Portuguese: Exploring Syntactic and Semantic Features", in Advances in Machine Learning for Social Media Analysis 2014. Summer Training Workshop in Wroclaw, Poland. Poster Presentation. [bibtex] [poster]

  2. P. P. Balage Filho, "Aspect-based Sentiment Analysis: Information Extraction and Applications for Portuguese texts", in Workshop of PhD and MSc Research (WTD) - ICMC-USP. Poster Presentation, 2013. [bibtex] [poster]

  3. S. M. Bruschi, V. C. Xavier and P. P. Balage Filho, "Teste USP: Multifuncional imprime até fotos.", in Caderno de Informática, Folha de São Paulo, 2007, vol. 1, p. F6. [bibtex] [pdf]