This project is a Python document-parsing application developed in Jupyter Notebook. It extracts text, metadata, and tables from PDF, DOCX, and TXT files, and ...
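As a rough illustration of the text and metadata extraction described above, the following is a minimal sketch using only the Python standard library. It is not the project's actual code. The function names (`extract_txt`, `extract_docx`, `parse_document`), the `EXTRACTORS` registry, and the shape of the returned dictionary are all hypothetical. It covers TXT and DOCX only; PDF parsing and table extraction would need a third-party library and are left out.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml of a DOCX file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_txt(path: Path) -> str:
    """Read a plain-text file as UTF-8, replacing undecodable bytes."""
    return path.read_text(encoding="utf-8", errors="replace")


def extract_docx(path: Path) -> str:
    """Join the text runs of each paragraph in the DOCX body."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):
        # A paragraph's visible text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


# Hypothetical registry mapping file extensions to extractor functions.
# A ".pdf" entry would be added here once a PDF library is chosen.
EXTRACTORS = {".txt": extract_txt, ".docx": extract_docx}


def parse_document(path_str: str) -> dict:
    """Return extracted text plus basic file metadata (assumed schema)."""
    path = Path(path_str)
    suffix = path.suffix.lower()
    if suffix not in EXTRACTORS:
        raise ValueError(f"unsupported file type: {suffix}")
    return {
        "name": path.name,
        "type": suffix.lstrip("."),
        "size_bytes": path.stat().st_size,
        "text": EXTRACTORS[suffix](path),
    }
```

Dispatching on the file extension keeps each format's logic isolated, so supporting another format only means registering one more function.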
Standalone implementation of RAGFlow's backend pipeline that transforms raw documents into searchable, embedded chunks through parsing, chunking, embedding, tokenization, and indexing. This project ...
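To make the stages named above concrete, here is a toy sketch of a chunk, tokenize, embed, and index flow in plain Python. It is an illustration under stated assumptions, not RAGFlow's implementation. Every name (`chunk`, `tokenize`, `embed`, `build_index`, `search`) is hypothetical, parsing is assumed to have already produced plain text, and the hashed bag-of-words vector is only a stand-in for a real embedding model.

```python
import hashlib
import math
import re
from collections import defaultdict

DIM = 16  # toy embedding width, chosen only for illustration


def chunk(text: str, max_words: int = 8) -> list:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(chunk_text: str) -> list:
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", chunk_text.lower())


def embed(tokens: list) -> list:
    """Hashed bag-of-words vector, L2-normalised (stand-in for a real model)."""
    vec = [0.0] * DIM
    for tok in tokens:
        bucket = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(text: str) -> dict:
    """Chunk the text, then store a vector and inverted-index entries per chunk."""
    chunks = chunk(text)
    inverted = defaultdict(set)
    vectors = []
    for chunk_id, chunk_text in enumerate(chunks):
        tokens = tokenize(chunk_text)
        vectors.append(embed(tokens))
        for tok in tokens:
            inverted[tok].add(chunk_id)
    return {"chunks": chunks, "vectors": vectors, "inverted": dict(inverted)}


def search(index: dict, query: str) -> list:
    """Rank chunks sharing a token with the query by cosine similarity."""
    q_tokens = tokenize(query)
    q_vec = embed(q_tokens)
    candidates = set()
    for tok in q_tokens:
        candidates |= index["inverted"].get(tok, set())

    def score(chunk_id: int) -> float:
        # Vectors are unit length, so the dot product is the cosine similarity.
        return sum(a * b for a, b in zip(q_vec, index["vectors"][chunk_id]))

    return sorted(candidates, key=score, reverse=True)
```

A call such as `search(build_index(text), "fox")` returns the ids of chunks containing the query token, best match first. In a real pipeline the inverted index and the vectors would typically live in a search engine or vector store rather than in an in-memory dictionary.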