Open Character Recognition Development (OCR-D)
With the Union Catalogue of Books of the 16th-18th century (VD 16, VD 17, VD 18) published in the German-speaking countries, a retrospective national bibliography of early modern writings from the German-speaking countries is being compiled. To facilitate research access to these texts, great concerted efforts have been and are being undertaken to make fully digitised copies, or at least key pages, of the recorded titles available.
The main goal of the project is the conceptual and technical preparation of the full-text transformation of the VD. The task of automatic full-text recognition is broken down into its individual process steps, each of which can be traced in the open source OCR-D software. This makes it possible to create optimal workflows for the old prints to be processed and thus to generate full texts that are usable for scholarly research.
For this purpose, a coordination project was formed that identified development needs in the first project phase. In the second project phase, a total of eight module projects addressed these needs. In the current third project phase, the focus is on the conceptual preparation for the automatic generation of full texts for VD 16, VD 17, and VD 18. In addition, four implementation projects are working on integrating OCR-D into existing applications and infrastructures, while three module projects are further optimising OCR-D tools.
Role of GWDG
- Providing infrastructure and consulting in all phases
- Workflow development in phase 3
- Software development of a long-term archiving system (OLA-HD) in phases 2 and 3
- Software development of an implementation project in phase 3 (OPERANDI)
Project partners
- Herzog August Bibliothek
- Berlin-Brandenburgische Akademie der Wissenschaften
- Staatsbibliothek zu Berlin - Preußischer Kulturbesitz
- Niedersächsische Staats- und Universitätsbibliothek Göttingen