
OCR Error Correction with Python

We have discussed how to export your pre-processing steps from OpenRefine as JSON, which serves as a reusable program of sorts. If you are already familiar with programming concepts, or would like to learn them, you may find it easier to perform your pre-processing tasks in Python instead. Python also has a number of natural language processing (NLP) libraries that can be used for exploratory data analysis.
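To give a sense of what such pre-processing might look like, here is a minimal sketch that uses only Python's built-in `re` module. It is not taken from the workshop materials: the function name `clean_ocr_text` and the substitution table `COMMON_FIXES` are hypothetical, and the error pairs are illustrative examples of the kind of misreadings OCR can produce rather than a vetted list.

```python
import re

# Hypothetical substitution table: illustrative OCR misreadings only.
# Build your own list from errors you actually observe in your corpus.
COMMON_FIXES = {
    "tbe": "the",
    "aud": "and",
}

def clean_ocr_text(text: str) -> str:
    """Apply a few simple, rule-based corrections to OCR output."""
    # Rejoin words split by a hyphen at a line break ("la-\nzy" -> "lazy").
    # Caution: this also merges genuine hyphenated compounds that happen
    # to break across lines.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)

    # Replace whole-word misreadings; \b keeps matches from firing inside
    # longer words. Matching is case-sensitive.
    for wrong, right in COMMON_FIXES.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)

    # Collapse runs of whitespace (including remaining line breaks) into
    # single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

sample = "tbe quick brown fox jumps over tbe la-\nzy dog  aud   runs."
print(clean_ocr_text(sample))
# the quick brown fox jumps over the lazy dog and runs.
```

Rules like these are easy to audit and rerun, much like an exported OpenRefine operation history, but they only catch the errors you list explicitly. Check the output against the page images before applying them to a whole collection.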

The Sherman Centre’s Jay Brodeur and Alexandra Provo of NYU Libraries have created a Text Preparation and Analysis workshop that includes a section on programmatic approaches with Python. It also includes an OpenRefine tutorial, if you would like more practice with the skills you have developed in the current workshop.