Computational Approaches to Text Preparation and Analysis

Are you interested in textual analysis but unsure where to start? Join us for an interactive “no experience required” introduction to the fundamental concepts, processes, and methods for preparing and analyzing text computationally. Following a general introduction to the topic, participants will be guided through prepared exercises that demonstrate how different software packages (OpenRefine, Python) can be used to prepare for and perform textual analysis.

Presented by Jay Brodeur (Associate Director, Digital Scholarship Infrastructure & Services and Administrative Director of the Sherman Centre for Digital Scholarship) and Devon Mordell (Educational Developer, The MacPherson Institute for Teaching and Learning).

Workshop Preparation

For this workshop, you will need OpenRefine and a web browser. Follow the instructions provided by Library Carpentry to install OpenRefine on your system (Windows, Mac, or Linux).

NOTE: When opening OpenRefine for the first time on a Mac, you may need to open your security preferences and permit OpenRefine to run. See this article from Apple Support about opening a Mac app from an unidentified developer.

Segment	Time Allotted	Key Topics / Activities
Introductory remarks	20 minutes	Introduction to text preparation and analysis; overview of concepts and methods; key considerations for different source materials and analyses
OpenRefine	40 minutes	Introduction to OpenRefine; manual cleanup (e.g. find and replace); faceting
Getting Programmatic with Python	20 minutes	Overview of programmatic approaches; the ‘what’ and ‘when’ to program; using Python for text preparation (Link to notebook)
Break	10 minutes	Break
Sampling of text analysis methods	75 minutes	Named entity recognition (Link to notebook); topic modeling (Link to notebook); sentiment analysis (Link to notebook)
Q & A; Final Thoughts	10 minutes	Questions and wrap-up; where to learn more

Workshop Notebooks

Most of our work will be done using Jupyter notebooks hosted in Google Colab.

Workshop Recording

View the original here.

Workshop Slides

Download as PDF.

Links and Resources

Here is a variety of helpful resources for exploring and learning more.

OpenRefine

Library Carpentry lesson on OpenRefine
University of Toronto Libraries OpenRefine tutorials
OpenRefine Manual on Regular Expressions
Using regular expressions in OpenRefine: Tutorial by Peter Green, includes non-Latin script.
Regular expression testers
- https://www.regular-expressions.info/
- https://regex101.com/
- Regexr: Interactive regular expression (regex) coder and explainer
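
For readers who want to try a regular expression outside OpenRefine, here is a minimal Python sketch of the same find-and-replace idea. It uses only the standard-library `re` module; the function name `clean_text` and the sample string are illustrative assumptions, not part of the workshop notebooks.

```python
import re


def clean_text(raw):
    """Rejoin words hyphenated across line breaks, then normalize whitespace."""
    # Rejoin a word split by a hyphen at the end of a line,
    # e.g. "analy-\nsis" becomes "analysis".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", raw)
    # Collapse any run of spaces, tabs, or newlines into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


# Hypothetical sample of messy text, such as OCR output might contain.
sample = "Textual analy-\nsis   is  fun.\n"
cleaned = clean_text(sample)
print(cleaned)  # Textual analysis is fun.
```

The patterns can be pasted into one of the testers listed above to see what each part matches before running the code. Note that the first pattern would also join genuinely hyphenated words that happen to break across lines, so results are worth spot-checking.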

Python & NLP

Python Integrated Development Environments

There are many, many different Python IDEs. Find which one is best for you. Jay is partial to Pyzo.

Python packages for text prep and Natural Language Processing

PyTesseract: Simple Python Optical Character Recognition
spaCy NLP library and documentation
NLTK NLP library and documentation
natas: Library for processing historical English corpora, especially for studying neologisms
Python phonetics package, which includes methods for matching and clustering words by phonetic similarity
pyspellchecker: A simple Python-based spell checking algorithm
BookNLP: A natural language processing pipeline that scales to books and other long documents (in English).
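
To give a rough sense of the kind of preparation these packages automate, here is a small sketch that tokenizes text and counts word frequencies using only the Python standard library. It does not use spaCy, NLTK, or any package listed above; the function names, the stopword list, and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens."""
    # Keeps internal straight apostrophes ("don't"); curly apostrophes
    # and accented letters are not handled in this simplified pattern.
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def top_words(text, n=3, stopwords=frozenset()):
    """Return the n most frequent tokens that are not stopwords."""
    tokens = [t for t in tokenize(text) if t not in stopwords]
    return Counter(tokens).most_common(n)


# A tiny, hypothetical stopword list; real libraries ship much longer ones.
STOPWORDS = frozenset({"the", "a", "and", "of", "is", "on"})

sample = "The cat and the hat. The cat sat on the hat, and the hat is red."
print(top_words(sample, n=2, stopwords=STOPWORDS))  # [('hat', 3), ('cat', 2)]
```

Dedicated NLP libraries go well beyond this, for example with language-aware tokenization and lemmatization, so this sketch is best read as a starting point for experimentation.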