|
|
--- |
|
|
license: mit |
|
|
datasets: |
|
|
- BothBosu/scam-dialogue |
|
|
- BothBosu/Scammer-Conversation |
|
|
- BothBosu/youtube-scam-conversations |
|
|
- BothBosu/multi-agent-scam-conversation |
|
|
- BothBosu/single-agent-scam-conversations |
|
|
- an19352/scam-baiting-conversations |
|
|
- scambaitermailbox/scambaiting_dataset |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- bertscore |
|
|
- perplexity |
|
|
- f1 |
|
|
- rouge |
|
|
- distinct-n |
|
|
- dialogrpt |
|
|
base_model: |
|
|
- meta-llama/LlamaGuard-7b |
|
|
- meta-llama/Meta-Llama-Guard-2-8B |
|
|
- meta-llama/Llama-Guard-3-8B |
|
|
- OpenSafetyLab/MD-Judge-v0.1 |
|
|
pipeline_tag: text-generation |
|
|
library_name: transformers |
|
|
--- |
|
|
|
|
|
# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting |
|
|
|
|
|
This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**. |
|
|
The models are trained and evaluated as part of the paper: |
|
|
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)** |
|
|
|
|
|
--- |
|
|
|
|
|
## Model Details |
|
|
|
|
|
- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale |
|
|
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012) |
|
|
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam |
|
|
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation) |
|
|
- **Languages:** English |
|
|
- **License:** MIT |
|
|
- **Finetuned from:** LlamaGuard family & MD-Judge |
|
|
|
|
|
### Model Sources |
|
|
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop) |
|
|
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop) |
|
|
- **Paper:** [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)
|
|
|
|
|
--- |
|
|
|
|
|
## Uses |
|
|
|
|
|
### Direct Use |
|
|
- Real-time scam classification (scam vs. non-scam conversations) |
|
|
- Conversational **scam-baiting** to waste scammer time safely |
|
|
- **PII risk scoring** to filter unsafe outputs |
|
|
|
|
|
### Downstream Use |
|
|
- Integration into messaging platforms for scam prevention |
|
|
- Benchmarks for **AI safety alignment** in adversarial contexts |
|
|
- Research in **federated privacy-preserving LLMs** |
|
|
|
|
|
### Out-of-Scope Use |
|
|
- Should **not** be used as a replacement for law enforcement tools |
|
|
- Should **not** be deployed without safety filters and human-in-the-loop monitoring |
|
|
- Not intended for **financial or medical decision-making** |
|
|
|
|
|
--- |
|
|
|
|
|
## Bias, Risks, and Limitations |
|
|
|
|
|
- Models may **over-engage with scammers** in rare cases |
|
|
- Possible **false positives** in benign conversations |
|
|
- Cultural/linguistic bias: trained primarily on **English data** |
|
|
- Risk of **hallucination** when generating long responses |
|
|
|
|
|
### Recommendations |
|
|
- Always deploy with **safety thresholds (δ, θ1, θ2)** (see the illustrative gating sketch after this list)
|
|
- Use in **controlled environments** first (research, simulations) |
|
|
- Extend to **multilingual settings** before real-world deployment |
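
The exact definitions of δ, θ1, and θ2 are given in the paper. The sketch below only illustrates one plausible deployment pattern, assuming δ gates the scam classification while θ1 and θ2 bound the PII risk of the incoming message and of the drafted reply; the callables, threshold values, and names are placeholders, not the paper's implementation.

```python
from typing import Callable, Optional

def gate_reply(
    message: str,
    scam_score: Callable[[str], float],  # hypothetical classifier wrapper
    pii_risk: Callable[[str], float],    # hypothetical PII-risk scorer
    generate: Callable[[str], str],      # hypothetical scam-baiting generator
    delta: float = 0.5,                  # placeholder for the paper's δ
    theta1: float = 0.3,                 # placeholder for θ1
    theta2: float = 0.3,                 # placeholder for θ2
) -> Optional[str]:
    """Engage only when the message looks like a scam and both the incoming
    message and the drafted reply stay under the PII-risk thresholds."""
    if scam_score(message) < delta:
        return None                      # benign conversation: do not engage
    if pii_risk(message) > theta1:
        return None                      # escalate to a human instead of replying
    reply = generate(message)
    if pii_risk(reply) > theta2:
        return None                      # never send a reply that could leak PII
    return reply
```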
|
|
|
|
|
--- |
|
|
|
|
|
## How to Get Started |
|
|
|
|
|
```python |
|
|
from transformers import AutoModelForCausalLM, AutoTokenizer |
|
|
# Replace <x> with 2 or 3, or remove "<x>-" altogether for the llama-guard-multi-task checkpoint
|
|
model_id = "supreme-lab/ai-in-the-loop/llama-guard-<x>-multi-task" |
|
|
tokenizer = AutoTokenizer.from_pretrained(model_id) |
|
|
model = AutoModelForCausalLM.from_pretrained(model_id) |
|
|
|
|
|
inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt") |
|
|
outputs = model.generate(**inputs, max_new_tokens=100) |
|
|
print(tokenizer.decode(outputs[0], skip_special_tokens=True)) |
|
|
``` |
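
If GPU memory is tight, the checkpoints can usually be loaded in half precision with layer placement handled by `accelerate`, e.g. `AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")` (requires `torch` and `accelerate`); this is an optional loading pattern, not a requirement of these models.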
|
|
|
|
|
--- |
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues) |
|
|
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37K messages), YTSC (YouTube scam transcriptions)
|
|
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset |
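
The scam corpora above are public Hugging Face datasets (their IDs are listed in this card's metadata). A minimal loading sketch follows; split and column names differ between corpora, so inspect a record before wiring them into a pipeline (the `"train"` split below is an assumption).

```python
from datasets import load_dataset

# Dataset IDs come from this card's metadata; splits and field names vary
# per corpus, so print one record to discover the text/label columns.
scam_dialogue = load_dataset("BothBosu/scam-dialogue")
scam_baiting = load_dataset("an19352/scam-baiting-conversations")

print(scam_dialogue)              # available splits and sizes
print(scam_dialogue["train"][0])  # one raw example (field names vary)
```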
|
|
|
|
|
### Training Procedure |
|
|
- **Fine-tuning setup** (a configuration sketch follows this list):
|
|
- 3 epochs, batch size = 8 |
|
|
- LoRA rank = 8, α = 16 |
|
|
- Mixed precision (bf16) |
|
|
- Optimizer: AdamW |
|
|
- **Federated Learning (FL):** |
|
|
- Simulated 10 clients, 30 rounds FedAvg |
|
|
- Optional **Differential Privacy** (noise multipliers: 0.1, 0.8) |
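
The sketch below mirrors the hyperparameters above using `peft` and `transformers`; `target_modules` and the dropout value are assumptions not stated in this card, and the `fedavg` helper is a simplified illustration of per-round weight averaging with optional Gaussian noise, not the paper's exact federated pipeline.

```python
import torch
from peft import LoraConfig
from transformers import TrainingArguments

# LoRA and trainer settings from this card; target_modules and lora_dropout
# are assumptions (the card does not list them).
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

train_args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    bf16=True,            # mixed precision
    optim="adamw_torch",  # AdamW
)

def fedavg(client_states: list[dict[str, torch.Tensor]],
           noise_multiplier: float = 0.0) -> dict[str, torch.Tensor]:
    """One FedAvg round: average client weights; the optional Gaussian noise
    loosely mimics the differential-privacy variants (0.1 / 0.8)."""
    avg = {}
    for key in client_states[0]:
        stacked = torch.stack([state[key].float() for state in client_states])
        avg[key] = stacked.mean(dim=0)
        if noise_multiplier > 0:
            avg[key] = avg[key] + noise_multiplier * torch.randn_like(avg[key])
    return avg
```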
|
|
|
|
|
--- |
|
|
|
|
|
## Evaluation |
|
|
|
|
|
### Metrics |
|
|
- **Classification:** F1, AUPRC, FPR, FNR |
|
|
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench |
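
The snippet below shows one way to compute a few of the generation metrics with the `evaluate` library plus a simple Distinct-n helper; the predictions and references are placeholders, and this is not the paper's exact evaluation harness (BERTScore and ROUGE need the `bert-score` and `rouge-score` backends installed).

```python
import evaluate

predictions = ["Let me check my account details first..."]             # model outputs (placeholder)
references = ["Let me look into my account before sharing anything."]  # gold replies (placeholder)

bertscore = evaluate.load("bertscore").compute(
    predictions=predictions, references=references, lang="en")
rouge = evaluate.load("rouge").compute(
    predictions=predictions, references=references)

def distinct_n(texts: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to all n-grams across the generated texts."""
    ngrams = [tuple(tokens[i:i + n])
              for tokens in (t.split() for t in texts)
              for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

print(bertscore["f1"], rouge["rougeL"], distinct_n(predictions))
```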
|
|
|
|
|
### Results |
|
|
- **Classification:** BiGRU/BiLSTM reach F1 > 0.99; RoBERTa is competitive
|
|
- **Instruction-tuned LLMs:** MD-Judge performs best overall (F1 ≥ 0.89); LlamaGuard-3 is strong for moderation
|
|
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, **highest engagement (0.79)**, and **96% safety compliance** in human evaluations
|
|
|
|
|
--- |
|
|
|
|
|
## Environmental Impact |
|
|
|
|
|
- **Hardware:** NVIDIA H100 GPUs |
|
|
- **Training Time:** ~30 hours across all models
|
|
- **Federated Setup:** 10 simulated clients, 30 rounds |
|
|
|
|
|
--- |
|
|
|
|
|
## Technical Specifications |
|
|
|
|
|
- **Architecture:** Instruction-tuned transformer (decoder-only) |
|
|
- **Objective:** Multi-task (classification, risk scoring, safe generation) |
|
|
--- |
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use these models, please cite our paper: |
|
|
|
|
|
```bibtex |
|
|
@article{hossain2025aiintheloop, |
|
|
title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning}, |
|
|
  author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
|
|
  journal={arXiv preprint arXiv:2509.05362},
|
|
year={2025} |
|
|
} |
|
|
``` |
|
|
|
|
|
--- |
|
|
|
|
|
## Contact |
|
|
|
|
|
- **Authors:** [email protected], [email protected], [email protected] |
|
|
- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html) |
|
|
- **Personal Web:** [https://ismail102.github.io/](https://ismail102.github.io/) |
|
|
|
|
|
--- |