OCR-D

Optical Character Recognition Development (OCR-D)

The Union Catalogues of Books printed in the German-speaking countries in the 16th, 17th and 18th centuries (VD 16, VD 17, VD 18) are compiling a retrospective national bibliography of early modern writings from the German-speaking countries. To make these texts accessible for research, considerable coordinated efforts have been, and continue to be, made to provide digitised copies of the recorded titles, either in full or as key pages.

The main goal of the project is the conceptual and technical preparation of the full-text transformation of the VD. The task of automatic full-text recognition is broken down into its individual process steps, each of which can be traced in the open-source OCR-D software. This makes it possible to create optimal workflows for the historical prints to be processed and thus to generate full texts suitable for scholarly use.

For this purpose, a coordination project was set up, which identified development needs during the first project phase. In the second project phase, a total of eight module projects addressed these needs. In the current third project phase, the focus is on the conceptual preparation for the automatic generation of full texts for VD 16, VD 17 and VD 18. In addition, four implementation projects are integrating OCR-D into existing applications and infrastructures, while three module projects are further optimising OCR-D tools.

Role of GWDG

  • Providing infrastructure and consulting in all phases
  • Workflow development in phase 3
  • Software development of a long-term archiving system (OLA-HD) in phases 2 and 3
  • Software development for an implementation project in phase 3 (OPERANDI)

Website

https://ocr-d.de/

Community

GitHub
Gitter
Twitter
Docker Hub

Contact

Prof. Dr Philipp Wieder
Triet Ho Anh Doan

Duration

Phase 1: 01.09.2015 - 28.02.2018
Phase 2: 01.03.2018 - 30.06.2020
Phase 3: 01.04.2021 - 31.03.2024

Funding

Project number: 460675868

Deutsche Forschungsgemeinschaft (DFG)