Zeig Dich: A Dataset for Typeface Recognition in Historical German-Brazilian Newspapers

Authors

  • Lucas Sulzbach, Universidade Federal do Paraná, Curitiba, Paraná, Brazil
  • Thomas Bianchi Todt, Universidade Federal do Paraná, Curitiba, Paraná, Brazil
  • Thalita Maria do Nascimento, Universidade Federal do Paraná, Curitiba, Paraná, Brazil
  • Eduardo Todt, Universidade Federal do Paraná, Curitiba, Paraná, Brazil
  • Pedro Domingos Tricossi dos Santos, Universidade Federal do Paraná, Curitiba, Paraná, Brazil

DOI:

https://doi.org/10.14210/cotb.v14.p447-449

Abstract

This paper addresses the challenge of typeface recognition within
the broader scope of optical character recognition (OCR) of historical
German-Brazilian periodicals. It presents a dataset of words annotated
with font types and transcriptions, intended for training neural
networks for typeface and text recognition. By enabling word-level
typeface and text recognition, the authors plan to later develop
techniques for high-precision OCR of historical prints typeset in
heterogeneous font styles. The value of the dataset is supported by the
excellent results obtained by artificial neural networks trained on it.
The authors also note that further improvement may come from
exploring new ways of organizing the dataset prior to training
and from modifying the architecture of the networks used.

Published

03-05-2023

Section

Extended Abstracts