---
library_name: PaddleOCR
tags:
- table-extraction
- paddleocr
- huggingface
license: mit
---
# **Table Extraction Tool: OCR & Computer Vision for Structured Data**
## Overview
Table Transformer is an open-source tool that combines OCR and computer vision to extract structured tabular data from images. It is suited to LLM preprocessing, data analysis pipelines, and automated data extraction tasks.
## Features
- **Automatic Table Detection**: Effortlessly detect tables in images.
- **OCR-based Document Processing**: Extract text with high accuracy.
- **Integrated Models**: Seamlessly combine OCR and table detection models.
- **Flexible Export Options**: Export data as DataFrame, HTML, CSV, and more.
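
To illustrate the export options listed above, here is a minimal sketch that uses only pandas. The `export_table` helper and the sample cell values are hypothetical and are not taken from this repository; they stand in for whatever cell text the tool extracts.

```python
import pandas as pd


def export_table(rows, header):
    """Build a DataFrame from cell text and render it as CSV and HTML strings."""
    df = pd.DataFrame(rows, columns=header)
    csv_text = df.to_csv(index=False)    # CSV as a string (no file path given)
    html_text = df.to_html(index=False)  # HTML <table> markup as a string
    return df, csv_text, html_text


# Hypothetical cell text standing in for OCR output from a small table.
header = ["Item", "Qty"]
rows = [["Apple", "3"], ["Pear", "5"]]

df, csv_text, html_text = export_table(rows, header)
print(csv_text)
```

Returning strings rather than writing files keeps the sketch easy to wire into a download button or a further processing step.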
---
## **Tool Overview**
<div align="center">
<!-- First Row -->
<img src="images/image1.png" alt="Image upload" width="45%" style="margin: 10px;">
<img src="images/image2.png" alt="Table detection & extraction" width="45%" style="margin: 10px;">
<!-- Second Row -->
<img src="images/image3.png" alt="Table in HTML format" width="45%" style="margin: 10px;">
<img src="images/image4.png" alt="Table exported as CSV" width="45%" style="margin: 10px;">
</div>
---
## **Open-Source Tools Used**
- **[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR)**: For text extraction.
- **[Hugging Face Table Detection](https://huggingface.co/foduucom/table-detection-and-extraction)**: For table structure detection.
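
One way the two tools can be combined is to keep only the OCR words that fall inside a detected table box and then group them into rows. The sketch below is an assumption about how such a step might look, not the logic in this repository: it assumes the OCR output has already been reduced to `(x, y, text)` tuples of word centers, that the detector yields an `(x1, y1, x2, y2)` box, and the `group_into_rows` name and `row_tolerance` value are hypothetical.

```python
def group_into_rows(words, table_box, row_tolerance=10.0):
    """Group (x, y, text) word centers inside table_box into rows of text.

    Words are sorted top to bottom; a word joins the current row when its y
    is within row_tolerance of that row's first word, otherwise it starts a
    new row. Each row is then ordered left to right.
    """
    x1, y1, x2, y2 = table_box
    inside = [w for w in words if x1 <= w[0] <= x2 and y1 <= w[1] <= y2]

    rows = []
    for x, y, text in sorted(inside, key=lambda w: w[1]):
        if rows and abs(y - rows[-1]["y"]) <= row_tolerance:
            rows[-1]["cells"].append((x, text))
        else:
            rows.append({"y": y, "cells": [(x, text)]})

    return [[text for _, text in sorted(row["cells"])] for row in rows]


# Hypothetical word centers; "Page 1" lies outside the table box.
words = [
    (120, 52, "Qty"),
    (20, 50, "Item"),
    (20, 90, "Apple"),
    (120, 91, "3"),
    (300, 10, "Page 1"),
]
table = group_into_rows(words, table_box=(0, 40, 200, 100))
print(table)
```

A fixed pixel tolerance is the simplest choice here; scanned pages with skew or varying font sizes would likely need a tolerance derived from the text height instead.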
---
## **Installation**
### **Prerequisites**
- Python 3.8+
- Conda
### **Setup**
1. **Clone the Repository**
Clone the repository to your local machine:
```bash
git clone https://github.com/Sudhanshu1304/table-transformer.git
cd table-transformer
```
2. **Create and Activate Conda Environment**
Create a new conda environment and activate it:
```bash
conda create --name myenv python=3.12.7
conda activate myenv
```
3. **Install PaddlePaddle**
Install PaddlePaddle in the conda environment:
```bash
python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
```
4. **Install PaddleOCR**
Install PaddleOCR:
```bash
pip install paddleocr
```
5. **Install Additional Dependencies**
Install other required packages:
```bash
pip install ultralytics pandas
pip install streamlit
```
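
After the steps above, a quick check like the following can confirm that the packages are importable in the active environment. This is a small sketch, not part of the repository; the `missing_packages` helper is hypothetical, and the list uses the import names (`paddlepaddle` is imported as `paddle`).

```python
from importlib.util import find_spec

# Import names for the packages installed in the setup steps.
REQUIRED = ["paddle", "paddleocr", "ultralytics", "pandas", "streamlit"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates each package without importing it, so the check stays fast even though PaddlePaddle itself is slow to import.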
### **Project Structure**
```
project/
├── src/
│   ├── streamlit_app.py       # Streamlit application
│   ├── table_creator/
│   │   └── processing.py      # Core processing logic
│   └── models/
│       └── text.py            # Table detection and text recognition
│
├── requirements.txt           # Dependencies
├── README.md                  # Project documentation
└── .gitignore                 # Git ignore configuration
```
### **Usage**
Run the Streamlit app to interact with the tool:
```bash
streamlit run src/streamlit_app.py
```
### **Contributions**
Contributions are welcome! Please fork the repository and submit a pull request with your improvements or new features.
### **License**
This project is licensed under the MIT License.
---
## **Connect with Us**
Stay updated and connect for any queries or contributions:
- **GitHub**: [Sudhanshu1304](https://github.com/Sudhanshu1304)
- **LinkedIn**: [Sudhanshu Pandey](https://www.linkedin.com/in/sudhanshu-pandey-847448193/)
- **Medium**: [@sudhanshu.dpandey](https://medium.com/@sudhanshu.dpandey)
---
## **Support**
If you find this tool useful, please consider giving it a ⭐ on GitHub. Your support is greatly appreciated!
Happy Extracting!