---
license: mit
datasets:
- BothBosu/scam-dialogue
- BothBosu/Scammer-Conversation
- BothBosu/youtube-scam-conversations
- BothBosu/multi-agent-scam-conversation
- BothBosu/single-agent-scam-conversations
- an19352/scam-baiting-conversations
- scambaitermailbox/scambaiting_dataset
language:
- en
metrics:
- bertscore
- perplexity
- f1
- rouge
- distinct-n
- dialogrpt
base_model:
- meta-llama/LlamaGuard-7b
- meta-llama/Meta-Llama-Guard-2-8B
- meta-llama/Llama-Guard-3-8B
- OpenSafetyLab/MD-Judge-v0.1
pipeline_tag: text-generation
library_name: transformers
---
# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting
This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**.
The models were trained and evaluated as part of the paper:
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)**
---
## Model Details
- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)
- **Languages:** English
- **License:** MIT
- **Finetuned from:** LlamaGuard family & MD-Judge
### Model Sources
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)
- **Paper:** [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)
---
## Uses
### Direct Use
- Real-time scam classification (scam vs. non-scam conversations)
- Conversational **scam-baiting** to waste scammer time safely
- **PII risk scoring** to filter unsafe outputs
### Downstream Use
- Integration into messaging platforms for scam prevention
- Benchmarks for **AI safety alignment** in adversarial contexts
- Research in **federated privacy-preserving LLMs**
### Out-of-Scope Use
- Should **not** be used as a replacement for law enforcement tools
- Should **not** be deployed without safety filters and human-in-the-loop monitoring
- Not intended for **financial or medical decision-making**
---
## Bias, Risks, and Limitations
- Models may **over-engage with scammers** in rare cases
- Possible **false positives** in benign conversations
- Cultural/linguistic bias: trained primarily on **English data**
- Risk of **hallucination** when generating long responses
### Recommendations
- Always deploy with **safety thresholds (δ, θ1, θ2)**
- Use in **controlled environments** first (research, simulations)
- Extend to **multilingual settings** before real-world deployment
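
The thresholds (δ, θ1, θ2) are not defined in this card; the paper is the authoritative source for their meaning. Purely as an illustration, the sketch below assumes δ is the minimum scam probability required before the model engages, and that θ1 ≤ θ2 split a PII risk score into send / regenerate / block bands. The function name `gate_reply`, the `Action` enum, the score ranges, and the default values are all hypothetical.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"          # conversation not judged to be a scam
    SEND = "send"              # generated reply is considered safe to send
    REGENERATE = "regenerate"  # borderline PII risk: ask the model for a new reply
    BLOCK = "block"            # high PII risk: withhold the reply, escalate to a human


def gate_reply(scam_prob: float, pii_risk: float,
               delta: float = 0.5, theta1: float = 0.3,
               theta2: float = 0.7) -> Action:
    """Hypothetical gating rule; threshold semantics and defaults are assumptions."""
    if not (0.0 <= scam_prob <= 1.0 and 0.0 <= pii_risk <= 1.0):
        raise ValueError("scores must lie in [0, 1]")
    if theta1 > theta2:
        raise ValueError("theta1 must not exceed theta2")
    if scam_prob < delta:
        return Action.IGNORE
    if pii_risk < theta1:
        return Action.SEND
    if pii_risk < theta2:
        return Action.REGENERATE
    return Action.BLOCK
```

In a deployment, `scam_prob` and `pii_risk` would come from the model's classification and risk-scoring heads, and a `BLOCK` result is where human-in-the-loop review would fit.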
---
## How to Get Started
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
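# Assumption: each variant is stored in a subfolder of the "supreme-lab/ai-in-the-loop" repo.
# Replace <x> with 2 or 3, or drop "-<x>" entirely to load "llama-guard-multi-task".
repo_id = "supreme-lab/ai-in-the-loop"
subfolder = "llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)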
inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
---
## Training Details
### Training Data
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset
### Training Procedure
- **Fine-tuning setup:**
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- **Federated Learning (FL):**
  - Simulated 10 clients, 30 rounds FedAvg
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)
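
The FedAvg aggregation step mentioned above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the training code used for these models: parameters are plain Python lists, the name `fedavg` and its arguments are hypothetical, and the optional Gaussian noise is a simplification. A real differentially private setup would also clip each client update and scale the noise as noise multiplier × clipping norm, so this sketch gives no formal privacy guarantee.

```python
import random
from typing import Dict, List, Optional

Weights = Dict[str, List[float]]  # parameter name -> flattened values


def fedavg(client_weights: List[Weights], client_sizes: List[int],
           noise_std: float = 0.0,
           rng: Optional[random.Random] = None) -> Weights:
    """Average client parameters weighted by local dataset size.

    If noise_std > 0, zero-mean Gaussian noise is added to every averaged
    value (illustrative only; no clipping is applied).
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need exactly one dataset size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")
    rng = rng or random.Random()
    merged: Weights = {}
    for name, reference in client_weights[0].items():
        averaged = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(reference))
        ]
        if noise_std > 0.0:
            averaged = [v + rng.gauss(0.0, noise_std) for v in averaged]
        merged[name] = averaged
    return merged
```

In the setup described above, the server would call such a function once per round (30 rounds) on the LoRA parameters returned by each of the 10 simulated clients, then send the merged parameters back for the next round of local training.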
---
## Evaluation
### Metrics
- **Classification:** F1, AUPRC, FPR, FNR
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
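
For readers unfamiliar with some of these metrics, the sketch below shows the standard definitions of F1, false-positive rate (FPR), false-negative rate (FNR), and Distinct-n. It assumes binary labels with 1 = scam and 0 = non-scam, and lowercased whitespace tokenization for Distinct-n; the tokenization used in the paper may differ, and the function names are hypothetical.

```python
from typing import Dict, List


def classification_metrics(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """F1, FPR and FNR for binary labels (1 = scam, 0 = non-scam)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0  # benign flagged as scam
    fnr = fn / (fn + tp) if (fn + tp) else 0.0  # scams that were missed
    return {"f1": f1, "fpr": fpr, "fnr": fnr}


def distinct_n(responses: List[str], n: int) -> float:
    """Unique n-grams divided by total n-grams across all responses."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0
```

A higher Distinct-1/2 indicates less repetitive generated replies, while FPR and FNR separate the two error types that matter in deployment: benign conversations flagged as scams versus scams that slip through.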
### Results
- **Classification:** BiGRU/BiLSTM > 0.99 F1, RoBERTa competitive
- **Instruction-tuned LLMs:** MD-Judge best overall (F1 = 0.89+), LlamaGuard3 strong for moderation
- **Generation:** MD-Judge achieved **lowest perplexity (22.3)**, **highest engagement (0.79)**, **96% safety compliance** in human evals
---
## Environmental Impact
- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** ~30 hrs across models
- **Federated Setup:** 10 simulated clients, 30 rounds
---
## Technical Specifications
- **Architecture:** Instruction-tuned transformer (decoder-only)
- **Objective:** Multi-task (classification, risk scoring, safe generation)
---
## Citation
If you use these models, please cite our paper:
```bibtex
@article{hossain2025aiintheloop,
title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
journal={arXiv preprint arXiv:2509.05362},
url={https://arxiv.org/abs/2509.05362},
year={2025}
}
```
---
## Contact
- **Authors:** [email protected], [email protected], [email protected]
- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)
- **Personal Web:** [https://ismail102.github.io/](https://ismail102.github.io/)
---