Document Q&A

Doc Q&A utilizes natural language processing and computer vision to extract information from images, including distorted or low-quality ones. With our LayoutLM and Donut (CORD) models, we offer accurate results across diverse image types and scenarios. Our text extraction feature provides output in JSON format for easy integration into other models and systems.

Doc Q&A
Generate answers from images like a pro. Just upload an image and ask a question about it.
Image to JSON
Our text extraction feature provides output in JSON format for easy integration into other models and systems.
Free
We provide our services free of charge and do not offer any paid plans.

Doc Q&A

  • Doc Q&A is a web application developed in Python that uses natural language processing and computer vision to extract information from images uploaded by the user in any format.
  • It can identify and decode text in distorted or low-quality images, making it easier for users to find answers to their questions.
  • With Doc Q&A, users can effortlessly obtain accurate information from images without the need for manual transcription or data entry.
  • Doc Q&A is focused on achieving high accuracy in image analysis. To that end, we have extensively tested our models on various types of images with relevant questions to ensure that the answers are precise.
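For illustration, a document-QA flow like the one described above can be sketched with the Hugging Face `pipeline` API. This is a minimal sketch, not the app's actual code: the checkpoint name `impira/layoutlm-document-qa` and the `best_answer` helper are assumptions introduced here.

```python
# Hedged sketch of a Doc Q&A answer flow. The heavy model call is shown
# in comments; the helper below post-processes its candidate answers.
#
# from transformers import pipeline
# qa = pipeline("document-question-answering",
#               model="impira/layoutlm-document-qa")  # assumed checkpoint
# candidates = qa(image="invoice.png", question="What is the invoice total?")

def best_answer(candidates, min_score=0.5):
    """Return the highest-confidence answer, or None if all are weak.

    `candidates` is a list of dicts like {"answer": str, "score": float},
    the shape returned by the document-question-answering pipeline.
    """
    if not candidates:
        return None
    top = max(candidates, key=lambda c: c["score"])
    return top["answer"] if top["score"] >= min_score else None

# Mocked pipeline output, for demonstration:
mock = [{"answer": "$154.06", "score": 0.97},
        {"answer": "$54.06", "score": 0.02}]
print(best_answer(mock))  # -> $154.06
```

Thresholding on the score lets the app say "no confident answer found" instead of returning a low-quality guess.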

Models Used

  • LayoutLM:  Pre-training of Text and Layout for Document Image Understanding
  • Donut (CORD):  Donut is a multimodal sequence-to-sequence model with a vision encoder (Swin Transformer) and a text decoder (BART), here fine-tuned on the CORD receipt dataset.

Research Papers

  • arXiv (Cornell University):  LayoutLM: Pre-training of Text and Layout for Document Image Understanding (Visit)
  • Hugging Face:  Official documentation for these models. (Visit)

Donut 🍩

  • Donut is a web application developed in Python that uses natural language processing and computer vision to extract structured JSON data from images uploaded by the user in any format.
  • It can identify and decode text in distorted or low-quality images, making it easier for users to extract the JSON data they need.
  • With Donut, users can effortlessly obtain accurate information from images without the need for manual transcription or data entry.
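Donut does not emit JSON directly: it generates a tagged token sequence, which is then converted to JSON (normally via `DonutProcessor.token2json` from the `transformers` library). A simplified, flat version of that conversion step might look like this sketch; the example sequence is hypothetical, and the real implementation also handles nested and repeated fields.

```python
import re

def tokens_to_json(seq):
    """Simplified sketch of Donut's token-to-JSON step: turn flat
    <s_key>value</s_key> pairs into a dict. The real
    DonutProcessor.token2json also handles nesting and repeated groups."""
    return {m.group(1): m.group(2).strip()
            for m in re.finditer(r"<s_(\w+)>(.*?)</s_\1>", seq)}

# A hypothetical CORD-style output sequence for one receipt line item:
seq = "<s_nm>Latte</s_nm><s_cnt>1</s_cnt><s_price>4.50</s_price>"
print(tokens_to_json(seq))  # -> {'nm': 'Latte', 'cnt': '1', 'price': '4.50'}
```

The short field names (`nm`, `cnt`, `price`) follow the CORD dataset's schema for item name, count, and price.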

Models Used

  • Donut (CORD):  Donut is a multimodal sequence-to-sequence model with a vision encoder (Swin Transformer) and a text decoder (BART), here fine-tuned on the CORD receipt dataset.

Research Papers

  • arXiv (Cornell University):  OCR-free Document Understanding Transformer (Visit)
  • Hugging Face:  Official documentation for these models. (Visit)

Developers

Varsha Sahu

Final Year Student at Government Engineering College, Bilaspur

Rishabh Dwivedi

Final Year Student at Government Engineering College, Bilaspur

Prankit Sahu

Final Year Student at Government Engineering College, Bilaspur

Prachi Sahu

Final Year Student at Government Engineering College, Bilaspur

Mentor

Sanchita Chourawar

Assistant Professor at Government Engineering College, Bilaspur