This project implements a simple ETL (Extract, Transform, Load) pipeline in Python that uses Tabula (through the tabula-py wrapper) to extract tables from PDF files and then process them. The pipeline is built for two tables: segment_table on ...
tabula-py requires a Java runtime for table extraction. If Java is unavailable, the script falls back to pdfplumber's text and table extraction, so PDFs can still be parsed.
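A minimal sketch of how such a fallback could work. It assumes Java is detected by checking for a `java` executable on `PATH` and that only tables are needed (pdfplumber's `page.extract_text()` would cover plain text). The function names `java_available` and `extract_tables`, and the assumption that each table's first row is its header, are hypothetical and not the project's actual code.

```python
import shutil


def java_available() -> bool:
    """Return True if a `java` executable is on PATH (tabula-py shells out to it)."""
    return shutil.which("java") is not None


def extract_tables(pdf_path: str) -> list:
    """Return a list of pandas DataFrames, one per table found in the PDF.

    Uses tabula-py when Java is available; otherwise falls back to pdfplumber.
    """
    if java_available():
        import tabula  # tabula-py: wraps the Java-based Tabula library

        # read_pdf returns a list of DataFrames when multiple_tables=True
        return tabula.read_pdf(pdf_path, pages="all", multiple_tables=True)

    # Fallback path: pdfplumber is pure Python, so no JVM is required
    import pandas as pd
    import pdfplumber

    frames = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_tables() returns each table as a list of rows (lists of cell values)
            for table in page.extract_tables():
                if not table:
                    continue
                # Hypothetical assumption: the first row of each table is its header
                header, *rows = table
                frames.append(pd.DataFrame(rows, columns=header))
    return frames
```

Usage would be along the lines of `tables = extract_tables("report.pdf")` (the file name is illustrative). Checking `PATH` is cheap but does not prove the JVM actually starts, so a more defensive version might also catch errors raised by `tabula.read_pdf` and retry with pdfplumber.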