Textual analysis and machine learning with applications to economics and finance
Perugia, 11-15 July 2022
- Thomas Renault, University Paris 1 Panthéon-Sorbonne (France)
- Matthieu Picault, University of Orléans (France) and Laboratoire d’Économie d’Orléans
The in-person course will only run if at least 15 participants enrol.
A maximum of 30 participants can attend in person.
Participants should have a basic knowledge of statistics and a basic understanding of computer programming. The tutorial available at https://www.learnpython.org/ can be used to learn or review the basics of programming in Python. Before the course begins, participants must install Anaconda (https://www.anaconda.com/products/individual) so that they have a working programming environment.
Reference readings for the course:
- Altig, D., Baker, S., Barrero, J. M., Bloom, N., Bunn, P., Chen, S., ... & Thwaites, G. (2020). Economic uncertainty before and during the COVID-19 pandemic. Journal of Public Economics, 191, 104274.
- Kearney, C., & Liu, S. (2014). Textual sentiment in finance: A survey of methods and models. International Review of Financial Analysis, 33, 171-185.
- Picault, M., Pinter, J., & Renault, T. (2022). Media sentiment on monetary policy: determinants and relevance for inflation expectations. Journal of International Money and Finance, Forthcoming.
- Picault, M., & Renault, T. (2017). Words are not all created equal: A new measure of ECB communication. Journal of International Money and Finance, 79, 136-156.
- Loughran, T., & McDonald, B. (2016). Textual analysis in accounting and finance: A survey. Journal of Accounting Research, 54(4), 1187-1230.
- Renault, T. (2020). Sentiment analysis and machine learning in finance: a comparison of methods and models on one million messages. Digital Finance, 2(1), 1-13.
- Renault, T. (2017). Intraday online investor sentiment and return patterns in the US stock market. Journal of Banking & Finance, 84, 25-40.
- Thorsrud, L. A. (2020). Words are the new numbers: A newsy coincident index of the business cycle. Journal of Business & Economic Statistics, 38(2), 393-409.
- Mitchell, R. (2018). Web scraping with Python: Collecting more data from the modern web. O'Reilly Media.
- Bengfort, B., Bilbro, R., & Ojeda, T. (2018). Applied text analysis with Python: Enabling language-aware data products with machine learning. O'Reilly Media.
- Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O'Reilly Media.
The objective of this course is to study how the millions of texts published every day on the Internet and on social media can be used to improve our understanding of various economic and financial phenomena. After an introduction to the Python programming language, we will see how online content can be extracted through existing APIs or by implementing web scraping tools. We will build an application to collect articles from a major media website, and we will use an API to extract tweets from a social network dedicated to finance. Next, we will see how to analyse a text using Natural Language Processing (NLP) methods. We will apply these methods to speeches delivered by the European Central Bank to show how unstructured data can be given structure. The following session will be dedicated to sentiment analysis and will present the main methods (the dictionary approach and machine learning). We will analyse Twitter data to build a sentiment indicator capturing the well-being of individuals in a country. Then, we will introduce unsupervised methods of textual analysis, with a particular focus on topic modelling, and we will apply Latent Dirichlet Allocation (LDA) to a large corpus of Wikipedia articles. Finally, the last session will be devoted to advanced methods of textual analysis, broadening the range of possible applications by introducing further machine learning methods, word embeddings and data structuring techniques.
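To give a concrete idea of the dictionary approach to sentiment analysis mentioned above, here is a minimal Python sketch: each message is tokenized, positive and negative words are counted, a net tone (positive minus negative, divided by their sum) is computed, and the tones are averaged into a single index for a collection of messages (for example, one day of posts). The two word lists are small hypothetical examples chosen for illustration only; they are not the lexicons used in the course or in the cited papers, and a real application would rely on a published dictionary or on a machine learning classifier.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical toy word lists, for illustration only. A real study would
# load a published (e.g. finance-specific) sentiment dictionary instead.
POSITIVE = {"gain", "growth", "strong", "bullish", "improve"}
NEGATIVE = {"loss", "decline", "weak", "bearish", "risk"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep only alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def net_tone(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 if no dictionary word appears."""
    counts = Counter(tokenize(text))
    pos = sum(counts[word] for word in POSITIVE)
    neg = sum(counts[word] for word in NEGATIVE)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def sentiment_index(messages: list[str]) -> float:
    """Average net tone over a collection of messages (0.0 if it is empty)."""
    if not messages:
        return 0.0
    return sum(net_tone(m) for m in messages) / len(messages)


if __name__ == "__main__":
    day = [
        "Strong growth expected, very bullish on this stock",
        "Weak earnings and rising risk: I expect a decline",
        "Markets were flat today",
    ]
    for message in day:
        print(f"{net_tone(message):+.2f}  {message}")
    print("Daily index:", round(sentiment_index(day), 3))
```

Even this toy version shows a known weakness of simple word counting: a phrase such as "not strong" is still scored as positive, which is one motivation for the machine learning methods covered in the same session.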
For each session, we will first present the related theories and methods, in language accessible to non-mathematicians, together with their latest applications in the economic and financial literature. We will then go through and share with participants the scripts and code used to carry out the various tasks in Python. We will also offer participants the opportunity to present their research and/or projects and, where possible, we will assist them with both data collection and data analysis.
Schedule of the course:
Mon 11 Jul 9:00 - 12:30 Introduction to Python
14:30 - 18:00 Application: How to get data from APIs and websites
Tue 12 Jul 9:00 - 12:30 Natural Language Processing
14:30 - 18:00 Application: NLP to analyse central bank speeches
Wed 13 Jul 9:00 - 12:30 Sentiment Analysis
14:30 - 18:00 Application: Measuring well-being on Twitter
Thu 14 Jul 9:00 - 12:30 Unsupervised methods for textual analysis
14:30 - 18:00 Application: Latent Dirichlet Allocation on Wikipedia
Fri 15 Jul 9:00 - 12:30 Paper presentations
14:30 - 18:00 Advanced methods in text mining
Venue and timetables
The module will be held at the Bank of Italy's Scuola di Automazione per Dirigenti Bancari (S.A.Di.Ba.), via San Marco n.54, Perugia. Participants will be accommodated at S.A.Di.Ba. (if not enough rooms are available at the Centre, they will be accommodated in local hotels).
Lectures and tutorials will be in English, with the following schedule:
- Monday to Friday: lectures 9:00-12:30, 14:30-18:00
- For more information: Laura Urraci (e-mail: firstname.lastname@example.org)
- For administrative issues: Laura Urraci (email@example.com), Alessandra Picariello (firstname.lastname@example.org), phone: +39 375 5112161
- For travel and accommodation: Maria Assunta Colacci (email@example.com), phone: 075-5447613, mobile: 3498725455