Ar86Bat/Finance-Document-Text-Classification

@@ -1,75 +1,66 @@
-# 📄 Finance Document Classification API
-A fine-tuned [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) model served via **FastAPI** for classifying finance-related documents. It uses a DistilBERT base model fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset. The API can be run locally or inside Docker and offers both `/predict` and `/health` endpoints.
-## 🚀 Features
-- Fine-tuned DistilBERT-based classification model
-- REST API with `/predict` and `/health` endpoints
-- Docker-ready for easy deployment
-- High accuracy with production-ready code
-## 📊 Model Details
-- **Base Model:** distilbert-base-uncased
-- **Task:** Multi-class finance document classification
-- **Language:** English
-- **Dataset:** Synthetic PII Finance Multilingual (English subset)
-- **Framework:** Hugging Face Transformers
-- **Metrics:**
-  | Metric      | Score   |
-  |-------------|---------|
-  | Accuracy    | 98.65%  |
-  | Precision   | 98.70%  |
-  | Recall      | 98.65%  |
-  | F1          | 98.65%  |
-## 📂 Project Structure
-```
-finance_document_classification/
-├─ app/
-│  └─ main.py               # FastAPI app
-├─ final_model/              # Saved model & tokenizer
-├─ requirements.txt
-├─ Dockerfile
-├─ .dockerignore
-└─ README.md
-```
-## 🛠 Installation
-Clone the repository:
-```bash
-git clone https://github.com/Ar86Bat/Finance-Document-Text-Classification.git
-cd Finance-Document-Text-Classification
-```
-Create a virtual environment:
-```bash
-python3 -m venv .venv
-source .venv/bin/activate   # Windows: .venv\Scripts\activate
-```
-Install dependencies:
-```bash
-pip install --upgrade pip
-pip install -r requirements.txt
-```
-## ▶️ Run Locally
-```bash
-uvicorn app.main:app --reload
-```
-The API will be available at:
-```
-http://127.0.0.1:8000/docs
-```
-## 🐳 Run with Docker
-```bash
-docker build -t finance-doc-classifier .
-docker run -p 8000:8000 finance-doc-classifier
 ```
-## 📡 API Endpoints
-### `POST /predict`
 **Request:**
 ```json
 {
@@ -84,28 +75,17 @@ docker run -p 8000:8000 finance-doc-classifier
 }
 ```
-### `GET /health`
-Returns API health status.
-## 📦 Use the Model in Python
-```python
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-import torch
-model_path = "Ar86Bat/Finance-Document-Text-Classification"
-tokenizer = AutoTokenizer.from_pretrained(model_path)
-model = AutoModelForSequenceClassification.from_pretrained(model_path)
-text = "Client requested details about investment restrictions."
-inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
-with torch.no_grad():
-    outputs = model(**inputs)
-    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
-    pred_id = torch.argmax(probs, dim=1).item()
-print("Predicted class ID:", pred_id)
 ```
-## 📜 License
-MIT License.

+---
+license: mit
+language:
+- en
+base_model:
+- distilbert/distilbert-base-uncased
+tags:
+- finance
+- document-classification
+datasets:
+- gretelai/synthetic_pii_finance_multilingual
+metrics:
+- accuracy
+pipeline_tag: text-classification
+---
+# 📄 Finance Document Classification
+A fine-tuned DistilBERT model for classifying finance-related documents. This model is based on `distilbert-base-uncased` and fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset. It is suitable for multi-class document classification tasks in the finance domain.
+## Model Details
+- **Base Model:** distilbert-base-uncased
+- **Task:** Multi-class finance document classification
+- **Language:** English
+- **Dataset:** Synthetic PII Finance Multilingual (English subset)
+- **Framework:** Hugging Face Transformers
+## Metrics
+| Metric      | Score   |
+|-------------|---------|
+| Accuracy    | 98.65%  |
+| Precision   | 98.70%  |
+| Recall      | 98.65%  |
+| F1          | 98.65%  |
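The table does not say how precision, recall, and F1 were averaged across classes. Recall matching accuracy exactly (98.65%) is what support-weighted averaging would produce, since weighted recall is algebraically equal to accuracy. The following is a minimal sketch of that computation in plain Python. The `weighted_metrics` helper and the toy labels are illustrative assumptions, not the author's evaluation code.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)        # number of true examples per class
    predicted = Counter(y_pred)      # number of predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(correct.values()) / n
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = correct[label]
        p = tp / predicted[label] if predicted[label] else 0.0
        r = tp / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n           # each class weighted by its share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Toy labels, invented for illustration only.
y_true = ["invoice", "invoice", "loan", "loan", "loan", "report"]
y_pred = ["invoice", "loan", "loan", "loan", "report", "report"]
scores = weighted_metrics(y_true, y_pred)
```

For any input, `scores["recall"]` equals `scores["accuracy"]` up to floating-point error, while precision and F1 can differ from it. If the reported numbers were macro-averaged instead, that equality would be a coincidence rather than a guarantee.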
+## How to Use
+```python
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+import torch
+model_id = "Ar86Bat/Finance-Document-Text-Classification"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+model = AutoModelForSequenceClassification.from_pretrained(model_id)
+text = "Client requested details about investment restrictions."
+inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+with torch.no_grad():
+    outputs = model(**inputs)
+    probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+    pred_id = torch.argmax(probs, dim=1).item()
+print("Predicted class ID:", pred_id)
 ```
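The snippet above prints only a numeric class ID. A small helper can turn raw logits into ranked label names with probabilities, which is usually what a downstream workflow needs. This is a sketch under assumptions: `top_k_labels` is a hypothetical helper, and the label names and logits below are placeholders, because the model card does not list the actual classes.

```python
import math

def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs for one document."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Placeholder mapping and logits, invented for illustration only.
example_id2label = {0: "class_a", 1: "class_b", 2: "class_c"}
example_logits = [0.2, 2.5, -1.0]
top = top_k_labels(example_logits, example_id2label, k=2)
```

With the loaded model, the same helper could be called as `top_k_labels(outputs.logits[0].tolist(), model.config.id2label)`. Whether `id2label` holds meaningful names depends on how the checkpoint was saved; Transformers falls back to generic names such as `LABEL_0` when none were provided.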
+## Intended Uses & Limitations
+- **Intended use:** Automated classification of finance-related documents for compliance, organization, or workflow automation.
+- **Not suitable for:** Non-financial or out-of-domain documents without further fine-tuning.
+## Example API Usage
+This model can be served via FastAPI or other REST frameworks. Example request/response:
 **Request:**
 ```json
 {
 }
 ```
+## Citation
+If you use this model, please cite the repository:
+```
+@misc{ar86bat_finance_doc_classification_2025,
+  author = {Arif Hizlan},
+  title = {Finance Document Text Classification},
+  year = {2025},
+  howpublished = {\url{https://huggingface.co/Ar86Bat/Finance-Document-Text-Classification}}
+}
 ```
+## License
+MIT License