---
license: mit
datasets:
- BothBosu/scam-dialogue
- BothBosu/Scammer-Conversation
- BothBosu/youtube-scam-conversations
- BothBosu/multi-agent-scam-conversation
- BothBosu/single-agent-scam-conversations
- an19352/scam-baiting-conversations
- scambaitermailbox/scambaiting_dataset
language:
- en
metrics:
- bertscore
- perplexity
- f1
- rouge
- distinct-n
- dialogrpt
base_model:
- meta-llama/LlamaGuard-7b
- meta-llama/Meta-Llama-Guard-2-8B
- meta-llama/Llama-Guard-3-8B
- OpenSafetyLab/MD-Judge-v0.1
pipeline_tag: text-generation
library_name: transformers
---

# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**.  
The models are trained and evaluated as part of the paper:  
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)**  

---

## Model Details

- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale  
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)  
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam  
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)  
- **Languages:** English  
- **License:** MIT  
- **Finetuned from:** LlamaGuard family & MD-Judge  

### Model Sources
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)  
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)  
- **Paper:** [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)  

---

## Uses

### Direct Use
- Real-time scam classification (scam vs. non-scam conversations)  
- Conversational **scam-baiting** to waste scammer time safely  
- **PII risk scoring** to filter unsafe outputs  

### Downstream Use
- Integration into messaging platforms for scam prevention  
- Benchmarks for **AI safety alignment** in adversarial contexts  
- Research in **federated privacy-preserving LLMs**  

### Out-of-Scope Use
- Should **not** be used as a replacement for law enforcement tools  
- Should **not** be deployed without safety filters and human-in-the-loop monitoring  
- Not intended for **financial or medical decision-making**  

---

## Bias, Risks, and Limitations

- Models may **over-engage with scammers** in rare cases  
- Possible **false positives** in benign conversations  
- Cultural/linguistic bias: trained primarily on **English data**  
- Risk of **hallucination** when generating long responses  

### Recommendations
- Always deploy with **safety thresholds (δ, θ1, θ2)**; an illustrative gating sketch follows this list  
- Use in **controlled environments** first (research, simulations)  
- Extend to **multilingual settings** before real-world deployment  
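
A minimal sketch of such threshold gating. The paper defines the exact semantics of δ, θ1, and θ2; the roles assumed below (scam confidence, PII risk, harmfulness) and all names are illustrative assumptions, not the authors' implementation:

```python
def gate_response(scam_conf: float, pii_risk: float, harm_score: float,
                  delta: float = 0.5, theta1: float = 0.3, theta2: float = 0.3) -> str:
    """Illustrative safety gate (threshold roles are assumptions).

    Engage only when the classifier is confident the conversation is a scam
    (>= delta) and the drafted reply stays below the PII-risk and harmfulness
    thresholds (theta1, theta2); otherwise defer to human-in-the-loop review.
    """
    if scam_conf < delta:
        return "no_engagement"           # not confidently a scam; stay silent
    if pii_risk > theta1 or harm_score > theta2:
        return "block_and_escalate"      # unsafe draft; route to a human
    return "send_baiting_reply"
```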

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The fine-tuned variants appear to live in subfolders of the repo.
# Replace <x> with 2 or 3, or drop "-<x>" entirely for the
# llama-guard-multi-task variant.
repo_id = "supreme-lab/ai-in-the-loop"
subfolder = "llama-guard-<x>-multi-task"

tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)

inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)  
- **Generation:** SBC (254 real scam-baiting conversations), ASB (over 37k messages), YTSC (YouTube scam transcriptions)  
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset  

### Training Procedure
- **Fine-tuning setup** (a configuration sketch follows this list):  
  - 3 epochs, batch size = 8  
  - LoRA rank = 8, α = 16  
  - Mixed precision (bf16)  
  - Optimizer: AdamW  
- **Federated Learning (FL)** (a FedAvg sketch follows this list):  
  - Simulated 10 clients, 30 rounds FedAvg  
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)  
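
The following is a minimal sketch of this fine-tuning configuration using `transformers` and `peft`, with the hyperparameters listed above (3 epochs, batch size 8, LoRA rank 8, α = 16, bf16, AdamW). The base checkpoint and `target_modules` are assumptions, and dataset loading is omitted; this is not the authors' training script:

```python
import torch
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Base checkpoint is a placeholder; the card lists the LlamaGuard family and MD-Judge.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-Guard-3-8B", torch_dtype=torch.bfloat16
)

# LoRA hyperparameters from the card: rank 8, alpha 16.
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],  # typical for Llama-style models; an assumption
)
model = get_peft_model(base, lora)

# Training setup from the card: 3 epochs, batch size 8, bf16, AdamW.
args = TrainingArguments(
    output_dir="checkpoints",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    bf16=True,
    optim="adamw_torch",
)
```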

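A correspondingly simplified sketch of the federated aggregation step: FedAvg over client weights with optional Gaussian noise scaled by a noise multiplier (0.1 or 0.8 above). This illustrates the averaging rule only; a faithful DP implementation would clip and noise per-client updates during training rather than perturb the averaged weights:

```python
import torch

def fedavg(client_states: list[dict], noise_multiplier: float = 0.0,
           clip_norm: float = 1.0) -> dict:
    """Average client state dicts (FedAvg); optionally add Gaussian noise."""
    n = len(client_states)
    averaged = {}
    for key in client_states[0]:
        stacked = torch.stack([state[key].float() for state in client_states])
        mean = stacked.mean(dim=0)
        if noise_multiplier > 0:
            # Simplified DP-style perturbation, not full DP-SGD accounting.
            mean = mean + torch.randn_like(mean) * noise_multiplier * clip_norm / n
        averaged[key] = mean
    return averaged
```
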
---

## Evaluation

### Metrics
- **Classification:** F1, AUPRC, FPR, FNR  
- **Generation:** Perplexity, Distinct-1/2 (see the Distinct-n sketch below), DialogRPT, BERTScore, ROUGE-L, HarmBench  
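
For reference, Distinct-n is the ratio of unique n-grams to total n-grams across the generated replies; higher values indicate less repetitive output. A minimal sketch (whitespace tokenization is a simplification):

```python
def distinct_n(texts: list[str], n: int = 2) -> float:
    """Distinct-n: unique n-grams / total n-grams over a set of generations."""
    ngrams = []
    for text in texts:
        tokens = text.split()  # whitespace tokenization; a simplification
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

# Example: distinct_n(["i am fine", "i am here"], n=2)
# -> 3 unique bigrams / 4 total = 0.75
```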

### Results
- **Classification:** BiGRU/BiLSTM exceed 0.99 F1; RoBERTa is competitive  
- **Instruction-tuned LLMs:** MD-Judge performs best overall (F1 ≥ 0.89); LlamaGuard-3 remains strong for moderation  
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, **highest engagement (0.79)**, and **96% safety compliance** in human evals  

---

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs  
- **Training Time:** ~30 hrs across models  
- **Federated Setup:** 10 simulated clients, 30 rounds 

---

## Technical Specifications

- **Architecture:** Instruction-tuned transformer (decoder-only)  
- **Objective:** Multi-task (classification, risk scoring, safe generation)

---

## Citation

If you use these models, please cite our paper:

```bibtex
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
  journal={arXiv preprint arXiv:2509.05362},
  year={2025}
}
```

---

## Contact

- **Authors:** [email protected], [email protected], [email protected]  
- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)
- **Personal Web:** [https://ismail102.github.io/](https://ismail102.github.io/)

---