Document Q&A
Doc Q&A uses natural language processing and computer vision to extract information from images, including distorted or low-quality ones. With our LayoutLM and Donut Cord models, we deliver accurate results across diverse image types and scenarios. Our text extraction feature returns its output as JSON for easy integration into other models and systems.
Doc Q&A
Get answers from images like a pro. Just upload an image and ask a question about it.
Image to JSON
Our text extraction feature returns its output as JSON for easy integration into other models and systems.
Free
We provide our services free of charge and do not offer any paid plans.
Doc Q&A
- Doc Q&A is a web application developed in Python that uses natural language processing and computer vision to extract information from images uploaded by the user, in any format.
- It can identify and decode text even in distorted or low-quality images, making it easier for users to find answers to their questions.
- With Doc Q&A, users can obtain accurate information from images without manual transcription or data entry.
- Doc Q&A focuses on achieving high accuracy in image analysis. To that end, we have tested our models extensively with various types of images and relevant questions to check that the output is precise and accurate.
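How does a model "find the answer" inside an image? One common approach, and an assumption here since the page does not say how the answer is decoded, is extractive question answering: a LayoutLM-style model fine-tuned for QA scores every OCR'd token as a possible answer start and answer end, and the answer is the highest-scoring valid span. The sketch below shows only that decoding step in plain Python. The function name `best_span`, the `max_answer_len` limit, and the token scores are hypothetical toy values, not real model output.

```python
from typing import List, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Hypothetical sketch of extractive-QA decoding.

    Returns (start, end, score) maximizing start_logits[s] + end_logits[e]
    subject to s <= e < s + max_answer_len.
    """
    if len(start_logits) != len(end_logits):
        raise ValueError("start and end logits must have the same length")
    best = (0, 0, float("-inf"))
    n = len(start_logits)
    for s in range(n):
        # Only consider ends at or after the start, within the length limit.
        for e in range(s, min(n, s + max_answer_len)):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


# Toy OCR tokens and made-up scores for the question "What is the invoice number?"
tokens = ["Invoice", "No", ":", "INV-1042", "Date", ":", "2023-03-01"]
start = [0.1, 0.2, -1.0, 3.5, 0.0, -1.0, 1.0]
end = [0.0, 0.1, -1.0, 3.0, 0.0, -1.0, 1.2]
s, e, score = best_span(start, end)
print(" ".join(tokens[s:e + 1]))
```

The brute-force double loop is O(n × max_answer_len), which is fine for a single page of tokens; production pipelines typically prune to the top-k start and end candidates first.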
Models Used
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
- Donut Cord: Donut is a multimodal sequence-to-sequence model with a vision encoder (Swin Transformer) and text decoder (BART).
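LayoutLM combines each word's text with its position on the page, so OCR output has to be converted into the model's coordinate system first. LayoutLM expects each word's bounding box `(x0, y0, x1, y1)` scaled to a 0 to 1000 grid relative to the page width and height. The minimal sketch below performs that scaling. The OCR words, pixel boxes, and page size are hypothetical examples, and the clamping step is an added safety measure rather than something the page describes.

```python
from typing import List, Sequence


def normalize_bbox(bbox: Sequence[float], width: float, height: float) -> List[int]:
    """Scale a pixel box (x0, y0, x1, y1) to LayoutLM's 0-1000 coordinate grid."""
    x0, y0, x1, y1 = bbox
    scaled = [
        int(1000 * x0 / width),
        int(1000 * y0 / height),
        int(1000 * x1 / width),
        int(1000 * y1 / height),
    ]
    # Clamp so slightly out-of-page OCR boxes stay within the valid range.
    return [min(max(v, 0), 1000) for v in scaled]


# Hypothetical OCR output for a 1240 x 1754 px page: (word, pixel box).
ocr_words = [("Total", (100, 1500, 220, 1540)), ("$42.00", (900, 1500, 1060, 1540))]
page_w, page_h = 1240, 1754
for word, box in ocr_words:
    print(word, normalize_bbox(box, page_w, page_h))
```

When a word is split into several subword tokens, each token usually reuses the word's normalized box.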
Donut 🍩
- Donut is a web application developed in Python that uses natural language processing and computer vision to extract JSON data from images uploaded by the user, in any format.
- It can identify and decode text even in distorted or low-quality images, making it easier for users to obtain structured JSON data.
- With Donut, users can obtain accurate information from images without manual transcription or data entry.
Models Used
- Donut Cord: Donut is a multimodal sequence-to-sequence model with a vision encoder (Swin Transformer) and text decoder (BART).
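Donut's decoder does not emit JSON directly. A model fine-tuned on CORD receipts generates a tagged sequence such as `<s_nm>Latte</s_nm>`, and that sequence is then converted into nested JSON. The sketch below is a simplified, hypothetical re-implementation of that conversion in plain Python, in the spirit of Hugging Face's `DonutProcessor.token2json` but not its exact code. It assumes `<sep/>` separates repeated groups, such as menu items. It does not handle a tag nested inside another tag of the same name, or `<sep/>` appearing at more than one nesting level. The receipt string is a made-up example.

```python
import json
import re
from typing import Any, Dict


def tokens_to_json(seq: str) -> Dict[str, Any]:
    """Hypothetical sketch: parse '<s_key>value</s_key>' markup into nested dicts.

    '<sep/>' splits repeated groups or values into a list.
    """
    result: Dict[str, Any] = {}
    while True:
        start = re.search(r"<s_(.*?)>", seq)
        if start is None:
            break
        key = start.group(1)
        end_tag = f"</s_{key}>"
        end_idx = seq.find(end_tag, start.end())
        if end_idx == -1:
            # Unclosed tag (e.g. a leftover task-prompt token): skip it.
            seq = seq[start.end():]
            continue
        inner = seq[start.end():end_idx]
        if "<s_" in inner:
            # Nested fields: parse each <sep/>-separated group recursively.
            groups = [tokens_to_json(part) for part in inner.split("<sep/>")]
            groups = [g for g in groups if g]
            result[key] = groups[0] if len(groups) == 1 else groups
        else:
            # Leaf value(s): plain text, possibly several separated by <sep/>.
            values = [v.strip() for v in inner.split("<sep/>") if v.strip()]
            result[key] = values[0] if len(values) == 1 else values
        seq = seq[end_idx + len(end_tag):]
    return result


receipt = (
    "<s_menu><s_nm>Latte</s_nm><s_cnt>1</s_cnt><s_price>4.50</s_price><sep/>"
    "<s_nm>Bagel</s_nm><s_cnt>2</s_cnt><s_price>6.00</s_price></s_menu>"
    "<s_total><s_total_price>10.50</s_total_price></s_total>"
)
print(json.dumps(tokens_to_json(receipt), indent=2))
```

Keeping every value as a string mirrors how the tokens are generated. Any type conversion, such as parsing prices as numbers, is left to the consuming system.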
Developers
Varsha Sahu
Final Year Student at Government Engineering College, Bilaspur
Rishabh Dwivedi
Final Year Student at Government Engineering College, Bilaspur
Prankit Sahu
Final Year Student at Government Engineering College, Bilaspur
Prachi Sahu
Final Year Student at Government Engineering College, Bilaspur
Mentor
Sanchita Chourawar
Assistant Professor at Government Engineering College, Bilaspur