Computational Approaches to Text Preparation and Analysis

Are you interested in textual analysis but unsure about where to start? Join us for an interactive “no experience required” introduction to the fundamental concepts, processes, and methodological approaches for preparing and analyzing text computationally. Following a general introduction to the topic, participants will be guided through prepared exercises that demonstrate how different software packages (OpenRefine, Python) can be used to prepare for and perform textual analysis.

Presented by Jay Brodeur (Associate Director, Digital Scholarship Infrastructure & Services and Administrative Director of the Sherman Centre for Digital Scholarship) and Devon Mordell (Educational Developer, The MacPherson Institute for Teaching and Learning).

Workshop Preparation

For this workshop, you will need OpenRefine and a web browser. Follow the instructions provided by Library Carpentry to install OpenRefine on your system (Windows, Mac, or Linux).

  • NOTE: When opening OpenRefine for the first time on a Mac, you may need to open your security preferences and permit OpenRefine to run. See this article from Apple Support about opening a Mac app from an unidentified developer.

Contents

Segment (time allotted), followed by key topics and activities:

Introductory remarks (20 minutes)

  • Introduction to text preparation and analysis
  • Overview of concepts and methods
  • Key considerations for different source materials and analyses

OpenRefine (40 minutes)

  • Introduction to OpenRefine
  • Manual cleanup (e.g. find and replace)
  • Faceting

Getting Programmatic with Python (20 minutes)

  • Overview of programmatic approaches
  • The ‘what’ and ‘when’ to program
  • Using Python for text preparation
  • Link to notebook

Break (10 minutes)

Sampling of text analysis methods (75 minutes)

  • Named entity recognition (Link to notebook)
  • Topic Modeling (Link to notebook)
  • Sentiment analysis (Link to notebook)

Q & A; Final Thoughts (10 minutes)

  • Questions and wrap-up
  • Where to learn more

Workshop Notebooks

Most of our work will be done using Jupyter notebooks hosted in Google Colab.

Workshop Recording

View the original here.

Workshop Slides

Download as PDF.

Here is a variety of helpful resources to explore and learn more:

OpenRefine

Python & NLP

Python Integrated Development Environments

  • There are many, many different Python IDEs. Find which one is best for you. Jay is partial to Pyzo.

Python packages for text prep and Natural Language Processing

  • PyTesseract: Simple Python Optical Character Recognition
  • spaCy NLP library and documentation
  • NLTK NLP library and documentation
  • natas: Library for processing historical English corpora, especially for studying neologisms
  • Python phonetics package, which includes methods for matching and clustering words by phonetic similarity
  • pyspellchecker: A simple Python-based spell checking algorithm
  • BookNLP: A natural language processing pipeline that scales to books and other long documents (in English).
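
To give a feel for the kind of text preparation these packages build on, here is a minimal sketch that uses only the Python standard library. It is not taken from the workshop notebooks: the function name `prepare_text` and the sample sentence are made up for illustration, and the tokenizing rule is a deliberately simple assumption (it handles plain English letters, digits, and apostrophes only).

```python
import re
from collections import Counter


def prepare_text(raw):
    """Lowercase the text and split it into simple word tokens.

    Hypothetical helper for illustration only. Punctuation and any
    character outside a-z, 0-9, and the apostrophe acts as a separator,
    so accented letters would be split out; a library such as spaCy or
    NLTK handles those cases far better.
    """
    lowered = raw.lower()
    return re.findall(r"[a-z0-9']+", lowered)


# Made-up sample text with mixed case and punctuation.
sample = "The cat sat. The CAT slept; the dog didn't!"

tokens = prepare_text(sample)
counts = Counter(tokens)

# Show the two most frequent tokens after cleanup.
print(counts.most_common(2))  # prints [('the', 3), ('cat', 2)]
```

Counting tokens after cleanup is similar in spirit to a text facet in OpenRefine: variants such as "The", "the", and "CAT" only group together once the text has been normalized.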

Other tutorials and resources