krishnas4415 committed on
Commit
e957c99
·
verified ·
1 Parent(s): 012dc87

Upload models/XGBoost-Log-Anomaly-Detection/README.md with huggingface_hub

models/XGBoost-Log-Anomaly-Detection/README.md ADDED
@@ -0,0 +1,84 @@
+ ---
+ license: mit
+ tags:
+ - log-analysis
+ - anomaly-detection
+ - bert
+ - cybersecurity
+ - multiclass-classification
+ language:
+ - en
+ datasets:
+ - custom-log-dataset
+ metrics:
+ - f1
+ - accuracy
+ pipeline_tag: text-classification
+ ---
+
+ # XGBoost-Log-Anomaly-Detection - Log Anomaly Detection
+
+ This model is part of the **Log Anomaly Detection System**, which classifies system logs into 7 categories: one normal class and six anomaly types.
+
+ ## Model Description
+
+ XGBoost-Log-Anomaly-Detection is an XGBoost classifier over BERT features, trained for multi-class log anomaly detection. It classifies logs from 16+ different sources (Apache, SSH, Hadoop, etc.) into 7 categories:
+
+ 1. **Normal** (0): Benign operations
+ 2. **Security Anomaly** (1): Authentication failures, unauthorized access
+ 3. **System Failure** (2): Crashes, kernel panics
+ 4. **Performance Issue** (3): Timeouts, slow responses
+ 5. **Network Anomaly** (4): Connection errors, packet loss
+ 6. **Config Error** (5): Misconfigurations, invalid settings
+ 7. **Hardware Issue** (6): Disk failures, memory errors
+
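The class IDs in the list above can be captured in a small lookup table for decoding predictions. This is a minimal sketch, assuming the snake_case names on the README's "Classes" line map to the IDs in the order listed; `ID2LABEL`, `LABEL2ID`, and `label_for` are hypothetical names and are not part of the released model.

```python
# Hypothetical lookup tables: the IDs follow the numbered category list,
# and the snake_case names follow the README's "Classes" line.
ID2LABEL = {
    0: "normal",
    1: "security_anomaly",
    2: "system_failure",
    3: "performance_issue",
    4: "network_anomaly",
    5: "config_error",
    6: "hardware_issue",
}

# Reverse mapping, useful when encoding string labels for training.
LABEL2ID = {name: idx for idx, name in ID2LABEL.items()}


def label_for(class_id: int) -> str:
    """Return the category name for a predicted class ID (0-6)."""
    if class_id not in ID2LABEL:
        raise ValueError(f"unknown class ID: {class_id}")
    return ID2LABEL[class_id]
```

For example, `label_for(1)` returns `"security_anomaly"`, and an ID outside 0-6 raises `ValueError` instead of silently mislabeling a log.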
+
+ ## Performance Metrics
+
+ - **F1-Score (Macro)**: 0.885
+ - **Accuracy**: 0.912
+ - **Model Type**: XGBoost Classifier with BERT Features
+ - **Classes**: 7 (normal, security_anomaly, system_failure, performance_issue, network_anomaly, config_error, hardware_issue)
+
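Macro F1 averages the per-class F1 scores with equal weight, so a rare anomaly class counts as much as the dominant class, while accuracy is simply the fraction of correct predictions. The sketch below computes both in plain Python. The function names and the toy labels are illustrative assumptions; they do not reproduce the scores reported above or the author's evaluation code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class has no hits.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Made-up labels for three of the seven classes (0=normal, 1=security, 2=system).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 1]

toy_accuracy = accuracy(y_true, y_pred)
toy_macro_f1 = macro_f1(y_true, y_pred)
```

On this toy data accuracy is 0.75 while macro F1 is about 0.685, because the two minority classes each lose a prediction. This is why an imbalanced log dataset is usually reported with both numbers.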
+
+ ## Usage
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer
+
+ # Load the serialized model. This snippet assumes model.pt is a fully pickled
+ # PyTorch module whose output exposes `.logits`. PyTorch 2.6+ defaults to
+ # weights_only=True, which rejects such files; only load files you trust.
+ model = torch.load('model.pt', weights_only=False)
+ model.eval()
+ tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
+
+ # Example usage
+ log_text = "Apr 15 12:34:56 server sshd[1234]: Failed password for admin"
+ inputs = tokenizer(log_text, return_tensors='pt', max_length=128, truncation=True, padding=True)
+
+ with torch.no_grad():
+     outputs = model(**inputs)
+     predictions = torch.softmax(outputs.logits, dim=-1)  # class probabilities
+     predicted_class = torch.argmax(predictions, dim=-1)  # class ID, 0-6
+ ```
+
+ ## Training Data
+
+ - **Sources**: 16 log types (Apache, SSH, Hadoop, HDFS, Linux, Windows, etc.)
+ - **Size**: ~32,000 labeled logs
+ - **Classes**: 7 anomaly categories
+ - **Features**: BERT embeddings + template features + statistical features
+
+ ## Citation
+
+ ```bibtex
+ @misc{log-anomaly-detection-2024,
+   title={Log Anomaly Detection System},
+   author={Krishna Sharma},
+   year={2024},
+   url={https://github.com/krishnasharma4415/log-anomaly-detection}
+ }
+ ```
+
+ ## License
+
+ MIT License - see LICENSE file for details.