ismail-h committed · verified · Commit 196c45c · 1 Parent(s): 02ad4be

Update the README with details

Files changed (1): README.md (+149, −1)
README.md CHANGED
@@ -4,6 +4,10 @@ datasets:
 - BothBosu/scam-dialogue
 - BothBosu/Scammer-Conversation
 - BothBosu/youtube-scam-conversations
+- BothBosu/multi-agent-scam-conversation
+- BothBosu/single-agent-scam-conversations
+- an19352/scam-baiting-conversations
+- scambaitermailbox/scambaiting_dataset
 language:
 - en
 metrics:
@@ -11,11 +15,155 @@ metrics:
 - perplexity
 - f1
 - rouge
+- distinct-n
+- dialogrpt
 base_model:
 - meta-llama/LlamaGuard-7b
-- meta-llama/Llama-Guard-3-8B
 - meta-llama/Meta-Llama-Guard-2-8B
+- meta-llama/Llama-Guard-3-8B
 - OpenSafetyLab/MD-Judge-v0.1
 pipeline_tag: text-generation
 library_name: transformers
 ---

# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**.
The models are trained and evaluated as part of the paper:
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://arxiv.org/abs/)**

---

## Model Details

- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)
- **Languages:** English
- **License:** MIT
- **Finetuned from:** LlamaGuard family & MD-Judge

### Model Sources
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)
- **Paper:** [ArXiv version](#)

---

## Uses

### Direct Use
- Real-time scam classification (scam vs. non-scam conversations)
- Conversational **scam-baiting** to waste scammer time safely
- **PII risk scoring** to filter unsafe outputs
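
To illustrate the PII-filtering idea, a minimal regex-based risk scorer might look like the sketch below. The patterns, function names, and the zero threshold are all hypothetical, for illustration only — they are not the card's actual scorer:

```python
import re

# Hypothetical patterns; a deployed PII scorer would be far more thorough.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def pii_risk_score(text: str) -> float:
    """Fraction of PII categories detected in the text (0.0 = clean)."""
    hits = sum(1 for pattern in PII_PATTERNS.values() if pattern.search(text))
    return hits / len(PII_PATTERNS)

def filter_unsafe(text: str, threshold: float = 0.0) -> bool:
    """Return True if the candidate output should be blocked."""
    return pii_risk_score(text) > threshold
```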

### Downstream Use
- Integration into messaging platforms for scam prevention
- Benchmarks for **AI safety alignment** in adversarial contexts
- Research in **federated privacy-preserving LLMs**

### Out-of-Scope Use
- Should **not** be used as a replacement for law enforcement tools
- Should **not** be deployed without safety filters and human-in-the-loop monitoring
- Not intended for **financial or medical decision-making**

---

## Bias, Risks, and Limitations

- Models may **over-engage with scammers** in rare cases
- Possible **false positives** in benign conversations
- Cultural/linguistic bias: trained primarily on **English data**
- Risk of **hallucination** when generating long responses

### Recommendations
- Always deploy with **safety thresholds (δ, θ1, θ2)**
- Use in **controlled environments** first (research, simulations)
- Extend to **multilingual settings** before real-world deployment
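
As a rough sketch of how such thresholds could be wired together — the decision rule and the semantics assumed for δ, θ1, and θ2 below are illustrative assumptions, not the released implementation:

```python
def gate_response(scam_prob: float, pii_risk: float, engagement: float,
                  delta: float = 0.5, theta1: float = 0.3, theta2: float = 0.6) -> str:
    """Illustrative gating: only engage when the scam classifier is confident
    (>= delta), and only release a candidate reply whose PII risk stays below
    theta1 and whose engagement score reaches theta2."""
    if scam_prob < delta:
        return "pass_through"   # treat as a benign conversation
    if pii_risk >= theta1:
        return "block"          # candidate reply leaks PII; regenerate
    if engagement < theta2:
        return "regenerate"     # reply unlikely to keep the scammer engaged
    return "send"
```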

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace <x> with 2 or 3, or drop "-<x>" entirely (for llama-guard-multi-task)
model_id = "supreme-lab/ai-in-the-loop/llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset

### Training Procedure
- **Fine-tuning setup:**
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- **Federated Learning (FL):**
  - Simulated 10 clients, 30 rounds FedAvg
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)
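
The FL setup above can be sketched as a single FedAvg round with optional Gaussian noise. This is illustrative NumPy, not the released training code; the clipping constant and noise placement are assumptions:

```python
import numpy as np

def fedavg(client_updates, noise_multiplier=0.0, clip_norm=1.0, rng=None):
    """Average per-client parameter updates; optionally clip each update's
    L2 norm and add Gaussian noise (DP-style; the card lists multipliers
    0.1 and 0.8)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in client_updates]
    avg = np.mean(clipped, axis=0)
    if noise_multiplier > 0.0:
        sigma = noise_multiplier * clip_norm / len(client_updates)
        avg = avg + rng.normal(0.0, sigma, size=avg.shape)
    return avg

# One FedAvg round over 10 simulated clients:
updates = [np.full(4, float(i)) for i in range(10)]
global_update = fedavg(updates)  # no noise: deterministic clipped average
```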

---

## Evaluation

### Metrics
- **Classification:** F1, AUPRC, FPR, FNR
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
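
Among the generation metrics, Distinct-n is simple enough to show inline: the ratio of unique n-grams to total n-grams across generated texts. The sketch below assumes whitespace tokenization:

```python
def distinct_n(texts, n=1):
    """Ratio of unique n-grams to total n-grams across generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

print(distinct_n(["the cat sat", "the dog sat"], n=1))  # 4 unique unigrams / 6 total
```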

### Results
- **Classification:** BiGRU/BiLSTM reach > 0.99 F1; RoBERTa is competitive
- **Instruction-tuned LLMs:** MD-Judge best overall (F1 = 0.89+), LlamaGuard3 strong for moderation
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, **highest engagement (0.79)**, and **96% safety compliance** in human evaluations

---

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** ~30 hours across models
- **Federated Setup:** 10 simulated clients, 30 rounds

---

## Technical Specifications

- **Architecture:** Instruction-tuned transformer (decoder-only)
- **Objective:** Multi-task (classification, risk scoring, safe generation)

---

## Citation

If you use these models, please cite our paper:

```bibtex
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Talukder, Sajedul and Alam, Md Jahangir},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```

---

## Contact

- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)

---