	---
	license: mit
	datasets:
	- BothBosu/scam-dialogue
	- BothBosu/Scammer-Conversation
	- BothBosu/youtube-scam-conversations
	- BothBosu/multi-agent-scam-conversation
	- BothBosu/single-agent-scam-conversations
	- an19352/scam-baiting-conversations
	- scambaitermailbox/scambaiting_dataset
	language:
	- en
	metrics:
	- bertscore
	- perplexity
	- f1
	- rouge
	- distinct-n
	- dialogrpt
	base_model:
	- meta-llama/LlamaGuard-7b
	- meta-llama/Meta-Llama-Guard-2-8B
	- meta-llama/Llama-Guard-3-8B
	- OpenSafetyLab/MD-Judge-v0.1
	pipeline_tag: text-generation
	library_name: transformers
	---

	# Model Card: AI-in-the-Loop for Real-Time Scam Detection & Scam-Baiting

	This repository contains instruction-tuned large language models (LLMs) designed for real-time scam detection, conversational scam-baiting, and privacy-preserving federated learning.
	The models are trained and evaluated as part of the paper:
	[AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning](https://supreme-lab.github.io/ai-in-the-loop/)

	---

	## Model Details

	- Developed by: Supreme Lab, University of Texas at El Paso & Southern Illinois University Carbondale
	- Funded by: U.S. National Science Foundation (Award No. 2451946) and U.S. Nuclear Regulatory Commission (Award No. 31310025M0012)
	- Shared by: Ismail Hossain, Sai Puppala, Sajedul Talukder, Md Jahangir Alam
	- Model type: Multi-task instruction-tuned LLMs (classification + safe text generation)
	- Languages: English
	- License: MIT
	- Finetuned from: LlamaGuard family & MD-Judge

	### Model Sources
	- Repository: [GitHub – supreme-lab/ai-in-the-loop](https://github.com/supreme-lab/ai-in-the-loop)
	- Hugging Face: [supreme-lab/ai-in-the-loop](https://huggingface.co/supreme-lab/ai-in-the-loop)
	- Paper: [arXiv:2509.05362](https://arxiv.org/abs/2509.05362)

	---

	## Uses

	### Direct Use
	- Real-time scam classification (scam vs. non-scam conversations)
	- Conversational scam-baiting to waste scammer time safely
	- PII risk scoring to filter unsafe outputs

	### Downstream Use
	- Integration into messaging platforms for scam prevention
	- Benchmarks for AI safety alignment in adversarial contexts
	- Research in federated privacy-preserving LLMs

	### Out-of-Scope Use
	- Should not be used as a replacement for law enforcement tools
	- Should not be deployed without safety filters and human-in-the-loop monitoring
	- Not intended for financial or medical decision-making

	---

	## Bias, Risks, and Limitations

	- Models may over-engage with scammers in rare cases
	- Possible false positives in benign conversations
	- Cultural/linguistic bias: trained primarily on English data
	- Risk of hallucination when generating long responses

	### Recommendations
	- Always deploy with safety thresholds (δ, θ1, θ2)
	- Use in controlled environments first (research, simulations)
	- Extend to multilingual settings before real-world deployment

	---

	## How to Get Started

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
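	# Replace <x> with 2 or 3, or drop "<x>-" to get "llama-guard-multi-task".
	# Assumption: each variant is a subfolder of the supreme-lab/ai-in-the-loop repo,
	# so it is passed via `subfolder=` (a Hub repo id only has the form "namespace/name").
	repo_id = "supreme-lab/ai-in-the-loop"
	subfolder = "llama-guard-<x>-multi-task"
	tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
	model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)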

	inputs = tokenizer("Scammer: Hello, I need your SSN.", return_tensors="pt")
	outputs = model.generate(**inputs, max_new_tokens=100)
	print(tokenizer.decode(outputs[0], skip_special_tokens=True))
	```

	---

	## Training Details

	### Training Data
	- Classification: SSD, SSC, SASC, MASC (synthetic scam/non-scam dialogues)
	- Generation: SBC (254 real scam-baiting conversations), ASB (>37k messages), YTSC (YouTube scam transcriptions)
	- Auxiliary: ConvAI, DailyDialog (engagement), HarmfulQA, Microsoft PII dataset

	### Training Procedure
	- Fine-tuning setup:
	  - 3 epochs, batch size = 8
	  - LoRA rank = 8, α = 16
	  - Mixed precision (bf16)
	  - Optimizer: AdamW
	- Federated Learning (FL):
	  - Simulated 10 clients, 30 rounds FedAvg
	  - Optional Differential Privacy (noise multipliers: 0.1, 0.8)
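
The federated step listed above can be illustrated with a minimal, dependency-free sketch of FedAvg aggregation followed by optional Gaussian noise. This is not the training code used for these models: the `Weights` type, the function names, the `clip_norm` parameter, and the toy `lora_A` values are all hypothetical, and the noise step omits per-client clipping, so on its own it gives no formal differential-privacy guarantee.

```python
import random
from typing import Dict, List, Sequence

Weights = Dict[str, List[float]]  # parameter name -> flat list of values (hypothetical layout)


def fedavg(client_weights: Sequence[Weights], client_sizes: Sequence[int]) -> Weights:
    """Average client parameters, weighting each client by its example count."""
    total = sum(client_sizes)
    averaged: Weights = {}
    for name, first in client_weights[0].items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(len(first))
        ]
    return averaged


def add_gaussian_noise(weights: Weights, noise_multiplier: float,
                       clip_norm: float = 1.0, seed: int = 0) -> Weights:
    """Add zero-mean Gaussian noise with std = noise_multiplier * clip_norm.

    Simplification: per-client update clipping is omitted, so this alone
    does not give a formal differential-privacy guarantee.
    """
    rng = random.Random(seed)
    sigma = noise_multiplier * clip_norm
    return {
        name: [v + rng.gauss(0.0, sigma) for v in values]
        for name, values in weights.items()
    }


# Two toy clients holding 1 and 3 examples respectively.
clients = [{"lora_A": [1.0, 2.0]}, {"lora_A": [3.0, 4.0]}]
global_weights = fedavg(clients, client_sizes=[1, 3])
print(global_weights)  # {'lora_A': [2.5, 3.5]}
noisy_weights = add_gaussian_noise(global_weights, noise_multiplier=0.1)
```

In a simulation with 10 clients and 30 rounds, `fedavg` would be called once per round on the 10 locally updated parameter sets, and the result would be broadcast back to the clients before the next round.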

	---

	## Evaluation

	### Metrics
	- Classification: F1, AUPRC, FPR, FNR
	- Generation: Perplexity, Distinct-1/2, DialogRPT, BERTScore, ROUGE-L, HarmBench
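
For readers unfamiliar with some of these metrics, the sketch below shows how F1, FPR, FNR, Distinct-n, and perplexity are commonly computed. It is an illustration under stated assumptions, not the evaluation code behind the reported results: label `1` is assumed to mean "scam", Distinct-n uses lowercase whitespace tokenization, and perplexity takes natural-log token probabilities that a language model would supply. AUPRC, DialogRPT, BERTScore, ROUGE-L, and HarmBench need external models or libraries and are left out.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """F1, false-positive rate, and false-negative rate (label 1 = scam, assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return {"f1": f1, "fpr": fpr, "fnr": fnr}


def distinct_n(responses: Sequence[str], n: int) -> float:
    """Unique n-grams divided by total n-grams, pooled over all responses."""
    ngrams = []
    for text in responses:
        tokens = text.lower().split()  # whitespace tokenization is an assumption
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the negative mean natural-log probability of the tokens."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


scores = classification_metrics(y_true=[1, 1, 0, 0, 1], y_pred=[1, 0, 0, 1, 1])
print(scores)  # tp=2, fp=1, fn=1, tn=1 -> F1 = 4/6, FPR = 1/2, FNR = 1/3
print(distinct_n(["hello there friend", "hello there"], n=1))  # 3 unique / 5 total
print(perplexity([math.log(0.25)] * 4))  # close to 4: every token has probability 1/4
```

Lower perplexity means the model assigns higher probability to the reference text, while higher Distinct-1/2 means less repetitive responses.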

	### Results
	- Classification: BiGRU and BiLSTM reach F1 > 0.99; RoBERTa is competitive
	- Instruction-tuned LLMs: MD-Judge performs best overall (F1 of 0.89 or higher); LlamaGuard3 is strong for moderation
	- Generation: MD-Judge achieved the lowest perplexity (22.3), the highest engagement score (0.79), and 96% safety compliance in human evaluations

	---

	## Environmental Impact

	- Hardware: NVIDIA H100 GPUs
	- Training Time: ~30 hrs across models
	- Federated Setup: 10 simulated clients, 30 rounds

	---

	## Technical Specifications

	- Architecture: Instruction-tuned transformer (decoder-only)
	- Objective: Multi-task (classification, risk scoring, safe generation)

	---

	## Citation

	If you use these models, please cite our paper:

	```bibtex
	@article{hossain2025aiintheloop,
	title={AI-in-the-Loop: Privacy Preserving Real-Time Scam Detection and Conversational Scambaiting by Leveraging LLMs and Federated Learning},
	author={Hossain, Ismail and Puppala, Sai and Alam, Md Jahangir and Talukder, Sajedul},
	journal={arXiv preprint arXiv:2509.05362},
	url={https://arxiv.org/abs/2509.05362},
	year={2025}
	}
	```

	---

	## Contact

	- Authors: [email protected], [email protected], [email protected]
	- Lab: [Supreme Lab](https://www.cs.utep.edu/stalukder/supremelab/index.html)
	- Personal Web: [https://ismail102.github.io/](https://ismail102.github.io/)

	---