Ar86Bat committed on
Commit eba7307 · verified · 1 Parent(s): 1a705c7

Update README.md

Files changed (1)
  1. README.md +67 -87
README.md CHANGED
@@ -1,75 +1,66 @@
- # 📄 Finance Document Classification API
-
- A fine-tuned [Hugging Face Transformers](https://huggingface.co/docs/transformers/index) model served via **FastAPI** for classifying finance-related documents. It uses a DistilBERT base model fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset. The API can be run locally or inside Docker and offers both `/predict` and `/health` endpoints.
-
- ## 🚀 Features
- - Fine-tuned DistilBERT-based classification model
- - REST API with `/predict` and `/health` endpoints
- - Docker-ready for easy deployment
- - High accuracy with production-ready code
-
- ## 📊 Model Details
- - **Base Model:** distilbert-base-uncased
- - **Task:** Multi-class finance document classification
- - **Language:** English
- - **Dataset:** Synthetic PII Finance Multilingual (English subset)
- - **Framework:** Hugging Face Transformers
- - **Metrics:**
- | Metric | Score |
- |-------------|---------|
- | Accuracy | 98.65% |
- | Precision | 98.70% |
- | Recall | 98.65% |
- | F1 | 98.65% |
-
- ## 📂 Project Structure
- ```
- finance_document_classification/
- ├─ app/
- │ └─ main.py # FastAPI app
- ├─ final_model/ # Saved model & tokenizer
- ├─ requirements.txt
- ├─ Dockerfile
- ├─ .dockerignore
- └─ README.md
- ```

- ## 🛠 Installation
- Clone the repository:
- ```bash
- git clone https://github.com/Ar86Bat/Finance-Document-Text-Classification.git
- cd Finance-Document-Text-Classification
- ```

- Create a virtual environment:
- ```bash
- python3 -m venv .venv
- source .venv/bin/activate # Windows: .venv\Scripts\activate
- ```

- Install dependencies:
- ```bash
- pip install --upgrade pip
- pip install -r requirements.txt
- ```

- ## ▶️ Run Locally
- ```bash
- uvicorn app.main:app --reload
- ```
- The API will be available at:
- ```
- http://127.0.0.1:8000/docs
- ```

- ## 🐳 Run with Docker
- ```bash
- docker build -t finance-doc-classifier .
- docker run -p 8000:8000 finance-doc-classifier
  ```

- ## 📑 API Endpoints
- ### `POST /predict`
  **Request:**
  ```json
  {
@@ -84,28 +75,17 @@ docker run -p 8000:8000 finance-doc-classifier
  }
  ```

- ### `GET /health`
- Returns API health status.
-
- ## 📦 Use the Model in Python
- ```python
- from transformers import AutoTokenizer, AutoModelForSequenceClassification
- import torch
-
- model_path = "Ar86Bat/Finance-Document-Text-Classification"
- tokenizer = AutoTokenizer.from_pretrained(model_path)
- model = AutoModelForSequenceClassification.from_pretrained(model_path)
-
- text = "Client requested details about investment restrictions."
- inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
-
- with torch.no_grad():
-     outputs = model(**inputs)
-     probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
-     pred_id = torch.argmax(probs, dim=1).item()

- print("Predicted class ID:", pred_id)
  ```

- ## 📜 License
- MIT License.
 
+ ---
+ license: mit
+ language:
+ - en
+ base_model:
+ - distilbert/distilbert-base-uncased
+ tags:
+ - finance
+ - document-classification
+ datasets:
+ - gretelai/synthetic_pii_finance_multilingual
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
+ ---
+
+ # 📄 Finance Document Classification
+
+ A fine-tuned DistilBERT model for classifying finance-related documents. This model is based on `distilbert-base-uncased` and fine-tuned on the English subset of the Synthetic PII Finance Multilingual dataset. It is suitable for multi-class document classification tasks in the finance domain.
+
+ ## Model Details
+ - **Base Model:** distilbert-base-uncased
+ - **Task:** Multi-class finance document classification
+ - **Language:** English
+ - **Dataset:** Synthetic PII Finance Multilingual (English subset)
+ - **Framework:** Hugging Face Transformers
+
+ ## Metrics
+ | Metric | Score |
+ |-------------|---------|
+ | Accuracy | 98.65% |
+ | Precision | 98.70% |
+ | Recall | 98.65% |
+ | F1 | 98.65% |
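
The card does not state how Precision, Recall and F1 were averaged over classes. Recall matching Accuracy (98.65%) is consistent with support-weighted averaging. Weighting each class recall `TP_c / n_c` by its share `n_c / N` sums to the total of `TP_c` over `N`, which is exactly accuracy. That is an inference from the numbers, not something the card confirms. The minimal sketch below computes support-weighted scores in plain Python on made-up labels, following the same idea as scikit-learn's `average="weighted"`. The function name `weighted_prf` is hypothetical.

```python
from collections import Counter


def weighted_prf(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Support-weighted precision, recall and F1, plus accuracy.

    Each per-class score is weighted by that class's share of y_true.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    n = len(y_true)
    support = Counter(y_true)      # number of true examples per class
    predicted = Counter(y_pred)    # number of predictions per class
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct hits per class

    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        w = n_c / n
        p_c = tp[c] / predicted[c] if predicted[c] else 0.0
        r_c = tp[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        precision += w * p_c
        recall += w * r_c
        f1 += w * f_c

    accuracy = sum(tp.values()) / n
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only; not the model's evaluation data.
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 1, 1, 1, 2]
    scores = weighted_prf(y_true, y_pred)
    print(scores)
    # Weighted recall equals accuracy by construction.
    assert abs(scores["recall"] - scores["accuracy"]) < 1e-12
```

If Recall and Accuracy diverge on your own evaluation, the averaging was not support-weighted. Macro averaging, for example, gives every class equal weight regardless of its support.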
+
+ ## How to Use

+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch

+ model_id = "Ar86Bat/Finance-Document-Text-Classification"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForSequenceClassification.from_pretrained(model_id)

+ text = "Client requested details about investment restrictions."
+ inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)

+ with torch.no_grad():
+     outputs = model(**inputs)
+     probs = torch.nn.functional.softmax(outputs.logits, dim=-1)
+     pred_id = torch.argmax(probs, dim=1).item()

+ print("Predicted class ID:", pred_id)
  ```
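
The snippet stops at a numeric class ID. Two details about that last step are worth noting. First, taking `argmax` over the softmax probabilities selects the same class as taking it over the raw logits, because softmax preserves ordering. The softmax therefore only matters if you also want a confidence score. Second, Transformers configs store the ID-to-name mapping in `model.config.id2label`, which defaults to `LABEL_0`, `LABEL_1`, and so on when no names were saved. The following is a minimal stdlib sketch of that post-processing on one row of logits. The label names in `ID2LABEL` and the helper names `softmax` and `decode` are hypothetical; the real labels come from this checkpoint's config.

```python
import math

# Hypothetical label names for illustration only; the real mapping lives in
# model.config.id2label of the fine-tuned checkpoint.
ID2LABEL = {0: "label_a", 1: "label_b", 2: "label_c"}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: list[float], id2label: dict[int, str]) -> tuple[int, str, float]:
    """Return (class ID, label name, probability) for one row of logits.

    Falls back to the Transformers-style default name LABEL_<id> when the
    ID is missing from id2label.
    """
    probs = softmax(logits)
    pred_id = max(range(len(probs)), key=probs.__getitem__)
    return pred_id, id2label.get(pred_id, f"LABEL_{pred_id}"), probs[pred_id]


if __name__ == "__main__":
    row = [0.2, 2.5, -1.0]  # made-up logits for a single input
    pred_id, label, confidence = decode(row, ID2LABEL)
    # Softmax does not change which index is largest.
    assert pred_id == max(range(len(row)), key=row.__getitem__)
    print(pred_id, label, round(confidence, 3))
```

With the model loaded as in the snippet above, the equivalent lookup is `model.config.id2label[pred_id]`.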

+ ## Intended Uses & Limitations
+ - **Intended use:** Automated classification of finance-related documents for compliance, organization, or workflow automation.
+ - **Not suitable for:** Non-financial or out-of-domain documents without further fine-tuning.
+
+ ## Example API Usage
+ This model can be served via FastAPI or other REST frameworks. Example request/response:
+
  **Request:**
  ```json
  {

  }
  ```

+ ## Citation
+ If you use this model, please cite the repository:

+ ```
+ @misc{ar86bat_finance_doc_classification_2025,
+ author = {Arif Hizlan},
+ title = {Finance Document Text Classification},
+ year = {2025},
+ howpublished = {\url{https://huggingface.co/Ar86Bat/Finance-Document-Text-Classification}}
+ }
  ```

+ ## License
+ MIT License