Identifying Proper Nouns with Named Entity Recognition belongs to a series of workshops on computational text analysis.
Names - of people or places, for instance - are a common feature of interest when working with texts at scale. We can use a search function to locate occurrences of names that we are expecting to find, such as “Frederick Douglass” in issues of The North Star, but how do we go about searching for all names in the text - even those we do not know to look for?
The natural language processing technique of named entity recognition (NER) identifies words - or tokens - that may be the names of people, places, or organizations within unstructured text. Some NER tools also identify tokens that may represent dates, currency and so on. NER can support exploratory data analysis by highlighting entities for further investigation through close reading, or by producing a list of entities that can then be counted and compared.
In this workshop, we will explore how NER works and apply it to a text corpus using a Python library named spaCy.
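As a rough preview of what counting entities with spaCy can look like, here is a minimal sketch. It is not the workshop's own code: the function names `entity_counts` and `find_entities` are hypothetical, and the model name `en_core_web_sm` is an assumption (it is one of spaCy's small English models and has to be installed separately).

```python
from collections import Counter


def entity_counts(doc):
    """Count (text, label) pairs for the entities in a processed document.

    `doc` is anything exposing `.ents`, where each entity has `.text` and
    `.label_` attributes, as a spaCy Doc does.
    """
    return Counter((ent.text, ent.label_) for ent in doc.ents)


def find_entities(text, model_name="en_core_web_sm"):
    """Run a spaCy pipeline over `text` and return its entity counts.

    Assumes spaCy and the named model are installed, for example with
    `python -m spacy download en_core_web_sm` (an assumption, not a
    requirement stated by the workshop).
    """
    # Imported here so that entity_counts can be used without spaCy installed.
    import spacy

    nlp = spacy.load(model_name)
    return entity_counts(nlp(text))
```

Calling `find_entities("Frederick Douglass published The North Star in Rochester, New York.")` should return a `Counter` keyed by pairs of entity text and label. Which entities are found, and how they are labelled, depends on the model, so no output is shown here.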
By the end of the workshop, participants will be able to:
- Describe how NER identifies possible entities within a text corpus
- Identify potential names, places and organizations using NER tools
- Explain why different NER tools may produce different results from each other
Working through the workshop from start to finish (though you need not do so) takes approximately 1 hour, depending on your familiarity with Python and whether you are working with your own dataset alongside the sample corpus.