metadata
license: mit
tags:
  - log-analysis
  - anomaly-detection
  - bert
  - cybersecurity
  - multiclass-classification
language:
  - en
datasets:
  - custom-log-dataset
metrics:
  - f1
  - accuracy
pipeline_tag: text-classification

XGBoost-Log-Anomaly-Detection - Log Anomaly Detection

This model is part of the Log Anomaly Detection System, which classifies system logs into 7 categories: one normal class and six anomaly types.

Model Description

XGBoost-Log-Anomaly-Detection is an XGBoost classifier trained on BERT features for multi-class log anomaly detection. It classifies logs from 16+ different sources (Apache, SSH, Hadoop, etc.) into 7 categories:

  1. Normal (0): Benign operations
  2. Security Anomaly (1): Authentication failures, unauthorized access
  3. System Failure (2): Crashes, kernel panics
  4. Performance Issue (3): Timeouts, slow responses
  5. Network Anomaly (4): Connection errors, packet loss
  6. Config Error (5): Misconfigurations, invalid settings
  7. Hardware Issue (6): Disk failures, memory errors
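The mapping above can be kept in a small lookup table when decoding predictions. This is a minimal sketch that assumes the snake_case class names given under Performance Metrics; `ID2LABEL`, `LABEL2ID`, and `decode` are hypothetical helper names, not part of the released model.

```python
# Class ids and names as listed in this model card.
ID2LABEL = {
    0: "normal",
    1: "security_anomaly",
    2: "system_failure",
    3: "performance_issue",
    4: "network_anomaly",
    5: "config_error",
    6: "hardware_issue",
}

# Reverse lookup from class name to id.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def decode(class_id: int) -> str:
    """Return the class name for a predicted id (hypothetical helper)."""
    if class_id not in ID2LABEL:
        raise ValueError(f"unknown class id: {class_id}")
    return ID2LABEL[class_id]
```

For example, `decode(1)` returns `"security_anomaly"`.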

Performance Metrics

  • F1-Score (Macro): 0.885
  • Accuracy: 0.912
  • Model Type: XGBoost Classifier with BERT Features
  • Classes: 7 (normal, security_anomaly, system_failure, performance_issue, network_anomaly, config_error, hardware_issue)
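For reference, macro F1 averages the per-class F1 scores with equal weight, so rare classes count as much as common ones, while accuracy is the fraction of correct predictions. The sketch below computes both in plain Python on made-up labels. It is illustrative only and is not the evaluation script behind the numbers above; the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up labels for three of the seven classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(f"accuracy={accuracy(y_true, y_pred):.3f}, macro_f1={macro_f1(y_true, y_pred):.3f}")
```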

Usage

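import torch
from transformers import AutoTokenizer

# Load the model. This assumes model.pt stores a full pickled module
# (not just a state_dict), which needs weights_only=False on recent
# PyTorch versions. Only load files from a source you trust.
model = torch.load('model.pt', weights_only=False)
model.eval()  # disable dropout and other training-only behavior
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

# Example usage
log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)

with torch.no_grad():
    outputs = model(**inputs)
    # Class probabilities over the 7 categories, then the most likely class id (0-6)
    predictions = torch.softmax(outputs.logits, dim=-1)
    predicted_class = torch.argmax(predictions, dim=-1)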

Training Data

  • Sources: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
  • Size: ~32,000 labeled logs
  • Classes: 7 anomaly categories
  • Features: BERT embeddings + template features + statistical features
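This card does not specify how the template and statistical features are computed. As one plausible illustration, the sketch below masks variable fields (IP addresses, hex values, numbers) to recover a log template and derives a few simple statistics from the raw line. The regular expressions, the `<IP>`/`<HEX>`/`<NUM>` placeholders, and the function names are assumptions for illustration, not the project's actual feature pipeline.

```python
import re

# Masking rules, applied in order. Patterns and placeholders are
# illustrative assumptions, not taken from the project.
_MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\d+"), "<NUM>"),
]


def to_template(log_line: str) -> str:
    """Replace variable fields with placeholders to get a log template."""
    for pattern, placeholder in _MASKS:
        log_line = pattern.sub(placeholder, log_line)
    return log_line


def statistical_features(log_line: str) -> dict:
    """Compute a few simple per-line statistics (hypothetical feature set)."""
    return {
        "num_chars": len(log_line),
        "num_tokens": len(log_line.split()),
        "num_digits": sum(ch.isdigit() for ch in log_line),
        "num_upper": sum(ch.isupper() for ch in log_line),
    }


line = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
print(to_template(line))
print(statistical_features(line))
```

In a pipeline like the one described, features of this kind would be encoded numerically and concatenated with the BERT embedding of the line before being passed to the classifier.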

Citation

@misc{log-anomaly-detection-2024,
  title={Log Anomaly Detection System},
  author={Krishna Sharma},
  year={2024},
  url={https://github.com/krishnasharma4415/log-anomaly-detection}
}

License

MIT License - see LICENSE file for details.