Ar86Bat committed · Commit 1a705c7 · verified · 1 parent: 3b85c1e

Update README.md

Files changed (1)
  1. README.md +111 -3
README.md CHANGED
@@ -1,3 +1,111 @@
- ---
- license: mit
- ---
+ # 📄 Finance Document Classification API
+
+ A fine-tuned [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) model served via **FastAPI** for classifying finance-related documents. It uses a DistilBERT base model fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset. The API can be run locally or inside Docker and offers both `/predict` and `/health` endpoints.
+
+ ## 🚀 Features
+ - Fine-tuned DistilBERT-based classification model
+ - REST API with `/predict` and `/health` endpoints
+ - Docker-ready for easy deployment
+ - High accuracy (98.65%, see Model Details) with production-ready code
+
+ ## 📊 Model Details
+ - **Base Model:** distilbert-base-uncased
+ - **Task:** Multi-class finance document classification
+ - **Language:** English
+ - **Dataset:** Synthetic PII Finance Multilingual (English subset)
+ - **Framework:** Hugging Face Transformers
+ - **Metrics:**
+ | Metric    | Score  |
+ |-----------|--------|
+ | Accuracy  | 98.65% |
+ | Precision | 98.70% |
+ | Recall    | 98.65% |
+ | F1        | 98.65% |
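The table does not say how precision, recall, and F1 were averaged across classes. Recall matching accuracy exactly (98.65%) is what support-weighted averaging produces, because weighted recall always reduces to accuracy. Below is a minimal pure-Python sketch of that relationship on made-up labels; the toy data and the `weighted_scores` helper are illustrative and not taken from the repository.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = hits[label] / predicted[label] if predicted[label] else 0.0
        r = hits[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # class share of the data
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(hits.values()) / n
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for illustration only.
y_true = ["loan", "loan", "loan", "tax", "tax", "audit"]
y_pred = ["loan", "loan", "tax", "tax", "tax", "loan"]
scores = weighted_scores(y_true, y_pred)
print(scores)
```

If the reported numbers were computed with a different averaging scheme (macro, for example), recall and accuracy would generally differ.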
+
+ ## 📂 Project Structure
+ ```
+ finance_document_classification/
+ ├─ app/
+ │  └─ main.py          # FastAPI app
+ ├─ final_model/        # Saved model & tokenizer
+ ├─ requirements.txt
+ ├─ Dockerfile
+ ├─ .dockerignore
+ └─ README.md
+ ```
+
+ ## 🛠 Installation
+ Clone the repository:
+ ```bash
+ git clone https://github.com/Ar86Bat/Finance-Document-Text-Classification.git
+ cd Finance-Document-Text-Classification
+ ```
+
+ Create a virtual environment:
+ ```bash
+ python3 -m venv .venv
+ source .venv/bin/activate   # Windows: .venv\Scripts\activate
+ ```
+
+ Install dependencies:
+ ```bash
+ pip install --upgrade pip
+ pip install -r requirements.txt
+ ```
+
+ ## ▶️ Run Locally
+ ```bash
+ uvicorn app.main:app --reload
+ ```
+ The interactive API docs will be available at:
+ ```
+ http://127.0.0.1:8000/docs
+ ```
+
+ ## 🐳 Run with Docker
+ ```bash
+ docker build -t finance-doc-classifier .
+ docker run -p 8000:8000 finance-doc-classifier
+ ```
+
+ ## 📡 API Endpoints
+ ### `POST /predict`
+ **Request:**
+ ```json
+ {
+   "text": "Client requested details about investment restrictions."
+ }
+ ```
+ **Response:**
+ ```json
+ {
+   "label": "Investment Restrictions",
+   "confidence": 0.987
+ }
+ ```
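A minimal client sketch for this endpoint, using only the Python standard library. It assumes the server is running at `http://127.0.0.1:8000` as in the local run instructions, and that the response carries the `label` and `confidence` fields shown above. The helper names `build_predict_request`, `parse_prediction`, and `classify` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumed local address from "Run Locally"


def build_predict_request(text, base_url=BASE_URL):
    """Build a POST /predict request with a JSON body {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_prediction(raw_body):
    """Decode the JSON response into a (label, confidence) pair."""
    payload = json.loads(raw_body)
    return payload["label"], float(payload["confidence"])


def classify(text, base_url=BASE_URL, timeout=10.0):
    """Send the text to a running server and return (label, confidence)."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_prediction(response.read())
```

With the server running, `classify("Client requested details about investment restrictions.")` would return a pair shaped like the response above.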
+
+ ### `GET /health`
+ Returns API health status.
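The response body of `/health` is not documented here, so the sketch below relies only on the HTTP status code. It polls the endpoint until it answers with 200, which can be useful right after `docker run` while the model is still loading. The function names, retry count, and delay are illustrative assumptions.

```python
import time
import urllib.error
import urllib.request


def is_healthy(base_url="http://127.0.0.1:8000", timeout=2.0):
    """Return True if GET /health answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an error status: not ready yet.
        return False


def wait_until_healthy(check=is_healthy, retries=10, delay=1.0, sleep=time.sleep):
    """Call `check` up to `retries` times, pausing `delay` seconds between tries."""
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(delay)
    return False
```

Passing `check` and `sleep` as parameters keeps the retry loop testable without a live server.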
+
+ ## 📦 Use the Model in Python
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model_path = "Ar86Bat/Finance-Document-Text-Classification"
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(model_path)
+
+ text = "Client requested details about investment restrictions."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     pred_id = torch.argmax(probs, dim=1).item()
+
+ print("Predicted class ID:", pred_id)
+ ```
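The snippet above prints only a numeric class ID. Transformers configs expose an `id2label` mapping (`model.config.id2label`); whether this checkpoint stores human-readable names there, rather than generic `LABEL_0`-style entries, is an assumption to verify. Below is a small framework-free sketch of turning raw logits into ranked labels, where `top_k_labels`, the example mapping, and the example logits are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return [(id2label[i], p) for i, p in ranked[:k]]


# Made-up logits and label names, for illustration only.
example_id2label = {0: "Loan Application", 1: "Investment Restrictions", 2: "Tax Notice"}
example_logits = [0.2, 4.1, -1.3]
for label, prob in top_k_labels(example_logits, example_id2label, k=2):
    print(f"{label}: {prob:.3f}")
```

With the model loaded as in the snippet above, the inputs would be `outputs.logits[0].tolist()` and `model.config.id2label`.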
+
+ ## 📜 License
+ MIT License.