ismail-h committed · verified · Commit 196c45c · 1 Parent(s): 02ad4be

Update the README with details

Files changed (1): README.md (+149, −1)
README.md CHANGED
@@ -4,6 +4,10 @@ datasets:
 - BothBosu/scam-dialogue
 - BothBosu/Scammer-Conversation
 - BothBosu/youtube-scam-conversations
+- BothBosu/multi-agent-scam-conversation
+- BothBosu/single-agent-scam-conversations
+- an19352/scam-baiting-conversations
+- scambaitermailbox/scambaiting_dataset
 language:
 - en
 metrics:
@@ -11,11 +15,155 @@ metrics:
 - perplexity
 - f1
 - rouge
+- distinct-n
+- dialogrpt
 base_model:
 - meta-llama/LlamaGuard-7b
-- meta-llama/Llama-Guard-3-8B
 - meta-llama/Meta-Llama-Guard-2-8B
+- meta-llama/Llama-Guard-3-8B
 - OpenSafetyLab/MD-Judge-v0.1
 pipeline_tag: text-generation
 library_name: transformers
 ---

# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

This repository contains **instruction-tuned large language models (LLMs)** designed for **real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning**.
The models are trained and evaluated as part of the paper:
**[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://arxiv.org/abs/)**

---

## Model Details

- **Developed by:** Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
- **Funded by:** U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
- **Shared by:** Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
- **Model type:** Multi-task instruction-tuned LLMs (classification + safe text generation)
- **Languages:** English
- **License:** MIT
- **Finetuned from:** LlamaGuard family & MD-Judge

### Model Sources
- **Repository:** [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)
- **Hugging Face:** [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)
- **Paper:** [ArXiv version](#)

---

## Uses

### Direct Use
- Real-time scam classification (scam vs. non-scam conversations)
- Conversational **scam-baiting** to waste scammer time safely
- **PII risk scoring** to filter unsafe outputs
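
To illustrate the PII-filtering idea, a minimal regex-based risk scorer might look like the sketch below. The patterns, function names, and the zero threshold are all hypothetical, for illustration only — they are not the card's actual scorer:

```python
import re

# Hypothetical patterns; a deployed PII scorer would be far more thorough.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def pii_risk_score(text: str) -> float:
    """Fraction of PII categories detected in the text (0.0 = clean)."""
    hits = sum(1 for pattern in PII_PATTERNS.values() if pattern.search(text))
    return hits / len(PII_PATTERNS)

def filter_unsafe(text: str, threshold: float = 0.0) -> bool:
    """Return True if the candidate output should be blocked."""
    return pii_risk_score(text) > threshold
```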

### Downstream Use
- Integration into messaging platforms for scam prevention
- Benchmarks for **AI safety alignment** in adversarial contexts
- Research in **federated privacy-preserving LLMs**

### Out-of-Scope Use
- Should **not** be used as a replacement for law enforcement tools
- Should **not** be deployed without safety filters and human-in-the-loop monitoring
- Not intended for **financial or medical decision-making**

---

## Bias, Risks, and Limitations

- Models may **over-engage with scammers** in rare cases
- Possible **false positives** in benign conversations
- Cultural/linguistic bias: trained primarily on **English data**
- Risk of **hallucination** when generating long responses

### Recommendations
- Always deploy with **safety thresholds (δ, θ1, θ2)**
- Use in **controlled environments** first (research, simulations)
- Extend to **multilingual settings** before real-world deployment
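
As a rough sketch of how such thresholds could be wired together — the decision rule and the semantics assumed for δ, θ1, and θ2 below are illustrative assumptions, not the released implementation:

```python
def gate_response(scam_prob: float, pii_risk: float, engagement: float,
                  delta: float = 0.5, theta1: float = 0.3, theta2: float = 0.6) -> str:
    """Illustrative gating: only engage when the scam classifier is confident
    (>= delta), and only release a candidate reply whose PII risk stays below
    theta1 and whose engagement score reaches theta2."""
    if scam_prob < delta:
        return "pass_through"   # treat as a benign conversation
    if pii_risk >= theta1:
        return "block"          # candidate reply leaks PII; regenerate
    if engagement < theta2:
        return "regenerate"     # reply unlikely to keep the scammer engaged
    return "send"
```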

---

## How to Get Started

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace <x> with 2 or 3, or drop "-<x>" entirely (for llama-guard-multi-task)
model_id = "supreme-lab/ai-in-the-loop/llama-guard-<x>-multi-task"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## Training Details

### Training Data
- **Classification:** SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
- **Generation:** SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
- **Auxiliary:** ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset

### Training Procedure
- **Fine-tuning setup:**
  - 3 epochs, batch size = 8
  - LoRA rank = 8, α = 16
  - Mixed precision (bf16)
  - Optimizer: AdamW
- **Federated Learning (FL):**
  - Simulated 10 clients, 30 rounds FedAvg
  - Optional **Differential Privacy** (noise multipliers: 0.1, 0.8)
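
The FL setup above can be sketched as a single FedAvg round with optional Gaussian noise. This is illustrative NumPy, not the released training code; the clipping constant and noise placement are assumptions:

```python
import numpy as np

def fedavg(client_updates, noise_multiplier=0.0, clip_norm=1.0, rng=None):
    """Average per-client parameter updates; optionally clip each update's
    L2 norm and add Gaussian noise (DP-style; the card lists multipliers
    0.1 and 0.8)."""
    rng = rng if rng is not None else np.random.default_rng(0)
    clipped = [u * min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
               for u in client_updates]
    avg = np.mean(clipped, axis=0)
    if noise_multiplier > 0.0:
        sigma = noise_multiplier * clip_norm / len(client_updates)
        avg = avg + rng.normal(0.0, sigma, size=avg.shape)
    return avg

# One FedAvg round over 10 simulated clients:
updates = [np.full(4, float(i)) for i in range(10)]
global_update = fedavg(updates)  # no noise: deterministic clipped average
```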

---

## Evaluation

### Metrics
- **Classification:** F1, AUPRC, FPR, FNR
- **Generation:** Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
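
Among the generation metrics, Distinct-n is simple enough to show inline: the ratio of unique n-grams to total n-grams across generated texts. The sketch below assumes whitespace tokenization:

```python
def distinct_n(texts, n=1):
    """Ratio of unique n-grams to total n-grams across generated texts."""
    ngrams = []
    for text in texts:
        tokens = text.split()
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)

print(distinct_n(["the cat sat", "the dog sat"], n=1))  # 4 unique unigrams / 6 total
```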

### Results
- **Classification:** BiGRU/BiLSTM reach > 0.99 F1; RoBERTa is competitive
- **Instruction-tuned LLMs:** MD-Judge best overall (F1 = 0.89+), LlamaGuard3 strong for moderation
- **Generation:** MD-Judge achieved the **lowest perplexity (22.3)**, **highest engagement (0.79)**, and **96% safety compliance** in human evaluations

---

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** ~30 hours across models
- **Federated Setup:** 10 simulated clients, 30 rounds

---

## Technical Specifications

- **Architecture:** Instruction-tuned transformer (decoder-only)
- **Objective:** Multi-task (classification, risk scoring, safe generation)

---

## Citation

If you use these models, please cite our paper:

```bibtex
@article{hossain2025aiintheloop,
  title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
  author={Hossain, Ismail and Puppala, Sai and Talukder, Sajedul and Alam, Md Jahangir},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2025}
}
```

---

## Contact

- **Lab:** [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)

---