Synthetic Documents for Layout Recognition

This dataset contains synthetic document images together with labels and bounding boxes for their layout elements. The documents span three domains: articles, resumes, and forms. We focus mainly on document structure and produce visually unique samples that capture complex and diverse layouts. The layout categories include generic elements such as titles, sections, headers/footers, tables, and figures, as well as domain-specific elements such as equations, skills, profiles, questions, and answers.
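To make the description above concrete, here is a minimal Python sketch of how one annotated document could be held in memory: a domain plus a list of labeled bounding boxes. The class names, field names, category spellings, and the pixel-coordinate convention are all assumptions made for illustration; the dataset's actual annotation format is not described here and may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical category names taken from the description above.
# The dataset's real label vocabulary may be spelled differently.
GENERIC_LABELS = {"title", "section", "header", "footer", "table", "figure"}
DOMAIN_LABELS = {"equation", "skill", "profile", "question", "answer"}


@dataclass
class LayoutElement:
    """One layout element: a label and an axis-aligned bounding box.

    Coordinates are assumed to be pixels with the origin at the top-left.
    """
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        # Clamp at zero so a degenerate box never yields a negative area.
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


@dataclass
class DocumentAnnotation:
    """All layout elements of one synthetic document image."""
    domain: str  # assumed to be "article", "resume", or "form"
    elements: List[LayoutElement]

    def by_label(self, label: str) -> List[LayoutElement]:
        # Return every element carrying the requested label.
        return [e for e in self.elements if e.label == label]


# Illustrative values only; these are not taken from the dataset.
doc = DocumentAnnotation(
    domain="article",
    elements=[
        LayoutElement("title", 50, 40, 550, 90),
        LayoutElement("equation", 120, 300, 480, 340),
    ],
)
```

Keeping generic and domain-specific labels in separate sets mirrors the split in the description and makes it easy to filter elements by either group.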

Sample Synthetic Document With Annotation

References

1.  Synthetic Document Generator for Annotation-free Layout Recognition.
    N Raman, S Shah, and M Veloso.
    Pattern Recognition, 2022.
