ABBYY FineReader 10 tutorial


Experiment with different color settings for your project. People debate the efficacy of binarized (black-and-white) versus color formats, so test both on a sample of your own pages and keep whichever gives better recognition. The benefits can be high.
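To make the binarized option concrete, here is a minimal sketch of one common way to binarize a grayscale page: Otsu's method, which picks the gray level that best separates dark ink from light paper. This is an illustration, not what ABBYY does internally. The function names `otsu_threshold` and `binarize` are my own, and the page is assumed to be a plain list of rows of 0-255 gray values; in practice you would load pixels with an imaging library.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that best splits pixels into dark and light."""
    hist = [0] * 256
    for value in pixels:
        hist[value] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: larger means a cleaner ink/paper split.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def binarize(image):
    """Map each pixel to 0 (ink) or 255 (paper) using the Otsu threshold."""
    flat = [value for row in image for value in row]
    threshold = otsu_threshold(flat)
    return [[0 if value <= threshold else 255 for value in row] for row in image]


# Hypothetical 2x3 page fragment: three dark ink pixels, three light paper pixels.
sample = [[30, 40, 200], [210, 35, 220]]
print(binarize(sample))
```

Running the binarized and the original color versions of the same few pages through your OCR program is the simplest way to see which one your material prefers.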

The straighter your page, the more likely programs are to recognize tabular content.

Pre-process scans to remove stains and borders, fix page orientation, deskew, and normalize page illumination.
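As a rough illustration of one of these steps, border removal, the sketch below crops away edge rows and columns that are almost entirely dark, as scanner borders often are. The function name `trim_dark_border` and the `cutoff` value of 60 are assumptions chosen for the example, and the page is again assumed to be a list of rows of 0-255 gray values. Real scans usually need a tuned cutoff, and the other steps (deskewing, illumination correction) need more than this.

```python
def trim_dark_border(image, cutoff=60):
    """Crop edge rows and columns whose mean gray level is below `cutoff`."""

    def is_dark(values):
        return sum(values) / len(values) < cutoff

    # Walk inward from the top and bottom until a non-dark row is found.
    top, bottom = 0, len(image)
    while top < bottom and is_dark(image[top]):
        top += 1
    while bottom > top and is_dark(image[bottom - 1]):
        bottom -= 1
    rows = image[top:bottom]
    if not rows:
        return []  # the whole image was dark

    # Do the same from the left and right edges of the remaining rows.
    left, right = 0, len(rows[0])
    while left < right and is_dark([row[left] for row in rows]):
        left += 1
    while right > left and is_dark([row[right - 1] for row in rows]):
        right -= 1
    return [row[left:right] for row in rows]


# Hypothetical 4x4 scan: a black frame around a 2x2 patch of light paper.
scan = [
    [0, 0, 0, 0],
    [0, 250, 240, 0],
    [0, 245, 200, 0],
    [0, 0, 0, 0],
]
print(trim_dark_border(scan))
```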

Cleaning images improves text recognition.

This is especially true for text from the early 1900s and prior. The older the text, the harder OCRing will be.

PDFs are often unavoidable, but use less "lossy" formats when possible. Try to work with scans that are at least 300 DPI, saved in TIFF format.

(From the Mad Men Mondays "Data's First Class Economy Set" Repository: Hartman Center, Rubenstein Library, Duke University.)
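If a file does not report its resolution, the 300 DPI guideline can be checked with simple arithmetic: divide the pixel dimensions by the physical page size in inches. The sketch below does that. The function names and the example page sizes are illustrative assumptions, not part of any particular scanner or OCR tool.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Return the lower of the horizontal and vertical pixels-per-inch values."""
    return min(width_px / width_in, height_px / height_in)


def meets_minimum(width_px, height_px, width_in, height_in, minimum_dpi=300):
    """True if the scan has at least `minimum_dpi` pixels per inch in both directions."""
    return effective_dpi(width_px, height_px, width_in, height_in) >= minimum_dpi


# A US Letter page (8.5 x 11 inches) scanned to 2550 x 3300 pixels.
print(effective_dpi(2550, 3300, 8.5, 11))
# The same page scanned to 1275 x 1650 pixels falls short of 300 DPI.
print(meets_minimum(1275, 1650, 8.5, 11))
```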


Tutorial: A Beginner's Guide to Scraping Historic Table Data

This is a simple introduction to scraping tables from historic (scanned) documents. It is a broad overview aimed at researchers with minimal programming experience tackling smaller digitization projects, say, nothing more than 200 pages.

I focus on OCRing material with ABBYY FineReader, a popular commercial program for OCRing. ABBYY has a relatively gentle learning curve and, importantly, straightforward table functionality. For those more comfortable with the command line and programming, or for open source advocates, I suggest free programmatic alternatives for each tutorial step. Larger, more complex digitization projects often entail more technical elbow grease and more advanced use of such tools.