Welcome
My name is Rebecca (Mengyue Zhao). I am a data scientist now working in the private sector. I completed an MSc in Economic and Social History at the University of Oxford in 2018, and have been working on this project since 2016. Please feel free to get in touch if you want to learn more about my projects.
- email: beczhaozmy@gmail.com
- Get in touch through LinkedIn
- Read my paper on this OCR project
- Read my paper on town-level, nationwide postal revenue in nineteenth-century America: The Economic Geography of Nineteenth-Century America – Mapping from National-Scale, Town-Level Postal Revenue
The Machine Vision Project in Action
I built a stable and efficient workflow that uses machine vision techniques to handle the poor layouts of historical tables. OCR on tabular data requires extremely accurate page layout analysis, because columns and rows must be aligned correctly. Commercial software packages do not handle these tasks well because they are not trained to detect old (nineteenth-century) table formats. My workflow extracts the page layout information first and then feeds it into a deep learning OCR engine. The result is a dramatic increase in OCR accuracy.
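To illustrate why column alignment depends on layout analysis, here is a minimal sketch of one classical technique, a vertical projection profile, which finds the blank gutters between table columns in a binarized page. It is not the method used in this project, only an illustration of the kind of layout information that can be passed to an OCR engine. The function names `find_column_gaps` and `split_into_columns` and the `min_gap_width` parameter are hypothetical, and the sketch assumes the page has already been deskewed and binarized so that ink pixels are nonzero.

```python
import numpy as np


def find_column_gaps(binary_page, min_gap_width=3):
    """Return (start, end) x-ranges of blank vertical gutters.

    binary_page: 2-D array that is nonzero wherever there is ink.
    min_gap_width: hypothetical threshold; narrower blank runs are ignored.
    """
    # Count ink pixels in every pixel column (the vertical projection profile).
    ink_per_column = (np.asarray(binary_page) != 0).sum(axis=0)
    is_blank = ink_per_column == 0

    gaps = []
    start = None
    for x, blank in enumerate(is_blank):
        if blank and start is None:
            start = x  # a blank run begins
        elif not blank and start is not None:
            if x - start >= min_gap_width:
                gaps.append((start, x))  # end is exclusive
            start = None
    # Close a blank run that reaches the right edge of the page.
    if start is not None and len(is_blank) - start >= min_gap_width:
        gaps.append((start, len(is_blank)))
    return gaps


def split_into_columns(binary_page, min_gap_width=3):
    """Return (start, end) x-ranges of the table columns between gutters."""
    width = np.asarray(binary_page).shape[1]
    columns = []
    cursor = 0
    for gap_start, gap_end in find_column_gaps(binary_page, min_gap_width):
        if gap_start > cursor:
            columns.append((cursor, gap_start))
        cursor = gap_end
    if cursor < width:
        columns.append((cursor, width))
    return columns


# Synthetic 10x30 "page" with three blocks of ink standing in for columns.
page = np.zeros((10, 30), dtype=np.uint8)
page[:, 2:8] = 1
page[:, 14:20] = 1
page[2:5, 25:28] = 1
columns = split_into_columns(page)
```

Each returned range can be cropped out and sent to the OCR engine separately, so that recognized cells stay in the correct column. Real nineteenth-century tables are often skewed, faded, or ruled with lines, so a plain projection profile like this one would likely need deskewing and noise handling before it became reliable.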
Visualized Data - The Economic Geography of Nineteenth-Century America
Example of New York
Introduction
Solution Architecture
Performance Comparison
Page Segmentation Results Comparison
OCR Results Comparison