bareethul committed on
Commit
5e3db0b
·
verified ·
1 Parent(s): 1c4dfb4

Create README.md

Files changed (1)
  1. README.md +122 -0
README.md ADDED
@@ -0,0 +1,122 @@
+ ---
+ license: cc
+ datasets:
+ - jennifee/HW1-tabular-dataset
+ language:
+ - en
+ metrics:
+ - accuracy
+ base_model:
+ - autogluon/tabpfn-mix-1.0-classifier
+ pipeline_tag: tabular-classification
+ tags:
+ - automl
+ - classification
+ - books
+ - tabular
+ - autogluon
+ ---
+
+ # Model Card for AutoML Books Classification
+
+ This model card documents the **AutoML Books Classification** model, trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books.
+ The task is to predict whether a book is **recommended to everyone** based on tabular features.
+
+ ---
+
+ ## Model Details
+
+ - **Developed by:** Bareethul Kader
+ - **Framework:** AutoGluon (v1.1)
+ - **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
+ - **License:** CC BY 4.0
+
+ ---
+
+ ## Intended Use
+
+ ### Direct Use
+ - Educational demonstration of AutoML on a small tabular dataset.
+ - Comparison of multiple classical ML models through automated search.
+ - Understanding validation vs. test performance trade-offs.
+
+ ### Out-of-Scope Use
+ - Not designed for production use or for book recommendation engines.
+ - The dataset is too small to generalize beyond a classroom context.
+
+ ---
+
+ ## Dataset
+
+ - **Source:** https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
+ - **Task:** Classification (`RecommendToEveryone` = 0/1).
+ - **Size:** 30 original samples + ~300 augmented rows.
+ - **Features:**
+   - `Pages` (integer)
+   - `Thickness` (float)
+   - `ReadStatus` (categorical: read/started/not read)
+   - `Genre` (categorical: fiction/nonfiction)
+   - `RecommendToEveryone` (binary target)
+
+ ---
+
+ ## Training Setup
+
+ - **AutoML framework:** AutoGluon TabularPredictor
+ - **Evaluation metric:** Accuracy
+ - **Budget:** ~1 minute of training time, small-scale search
+ - **Hardware:** Google Colab (CPU is sufficient; a T4 GPU is not required)
+ - **Search Space:**
+   - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
+   - Neural nets: Torch, FastAI (small MLPs)
+   - Bagging and ensembling across stack layers (L1, L2, L3)
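The setup listed above could be reproduced roughly as in the sketch below. The label column, metric, and one-minute budget come from this card; the `best_quality` preset and the helper name `train_and_score` are assumptions made for illustration, since the card does not state the exact `fit` arguments that were used.

```python
"""Sketch of a short AutoGluon run; the settings are assumptions, not the card's exact config."""

LABEL = "RecommendToEveryone"  # target column named in the Dataset section

# Assumed settings mirroring the bullets above (accuracy metric, ~1 minute budget).
PREDICTOR_KWARGS = {"label": LABEL, "eval_metric": "accuracy"}
FIT_KWARGS = {
    "time_limit": 60,           # seconds; the card only says "~1 minute"
    "presets": "best_quality",  # assumption: a preset that enables bagging and stacking
}


def train_and_score(train_df, test_df):
    """Fit a TabularPredictor and return its leaderboard on held-out data.

    Both arguments are pandas DataFrames that include the label column.
    """
    # Imported lazily so the constants above can be inspected without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(**PREDICTOR_KWARGS).fit(train_df, **FIT_KWARGS)
    # With test data passed in, the leaderboard reports score_test next to score_val.
    return predictor.leaderboard(test_df)
```

Calling `train_and_score(train_df, test_df)` with two DataFrames split from the dataset returns a table with one row per trained model, which is the kind of output the leaderboard in the Results section summarizes.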
+
+ ---
+
+ ## Results
+
+ ### Mini Leaderboard (Top 3 Models)
+
+ | Rank | Model                   | Test Accuracy | Validation Accuracy |
+ |------|-------------------------|---------------|---------------------|
+ | 1    | RandomForestEntr_BAG_L1 | **0.55**      | ~0.65               |
+ | 2    | LightGBM_r96_BAG_L2     | 0.53          | ~0.72               |
+ | 3    | LightGBMLarge_BAG_L2    | 0.53          | ~0.74               |
+
+ - **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
+ - **Test Accuracy:** ~0.55
+ - **Validation Accuracy (best across runs):** up to ~0.75 (LightGBM variants)
+
+ Note: The **“best model”** may vary depending on random splits and seeds.
+ While AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run, LightGBM models sometimes reached higher validation accuracy but scored lower on the test set.
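The validation vs. test trade-off in the table can be made explicit by computing the gap per model. The sketch below only uses the three rows reported above (validation values are approximate, as in the table); the helper name `validation_test_gaps` is made up for illustration.

```python
# (model, test accuracy, approximate validation accuracy) copied from the table above
LEADERBOARD = [
    ("RandomForestEntr_BAG_L1", 0.55, 0.65),
    ("LightGBM_r96_BAG_L2", 0.53, 0.72),
    ("LightGBMLarge_BAG_L2", 0.53, 0.74),
]


def validation_test_gaps(rows):
    """Return validation-minus-test accuracy per model, rounded to two decimals."""
    return {name: round(val - test, 2) for name, test, val in rows}


GAPS = validation_test_gaps(LEADERBOARD)
for name, gap in GAPS.items():
    print(f"{name}: validation - test = {gap:.2f}")
```

The LightGBM variants show roughly twice the gap of the random forest, which is consistent with the note that they generalized less well in this run.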
+
+ ---
+
+ ## Limitations, Biases, and Ethical Notes
+
+ - **Small dataset size** → models may overfit, and performance metrics are unstable.
+ - **Augmented data** → synthetic rows may not reflect true variability.
+ - **Task scope** → purely educational, not for real-world recommendation.
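To give a sense of how unstable an accuracy estimate is on a small test set, the sketch below computes a Wilson score interval for the reported test accuracy of 0.55. The test-set size is not stated in this card, so `N_TEST = 30` is a purely hypothetical value chosen for illustration.

```python
import math


def wilson_interval(accuracy, n, z=1.96):
    """Wilson score interval for a proportion observed on n samples (z=1.96 is about 95%)."""
    denom = 1 + z**2 / n
    center = (accuracy + z**2 / (2 * n)) / denom
    half = z * math.sqrt(accuracy * (1 - accuracy) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


N_TEST = 30  # hypothetical test-set size; the card does not report it
LOW, HIGH = wilson_interval(0.55, N_TEST)
print(f"95% interval for accuracy 0.55 on {N_TEST} samples: [{LOW:.2f}, {HIGH:.2f}]")
```

Under that assumed size the interval spans well below and above 0.5, so small differences between models (such as 0.55 vs. 0.53) should not be read as meaningful.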
+
+ ---
+
+ ## AI Usage Disclosure
+
+ - ChatGPT (GPT-5) assisted in:
+   - Helping with the code and the AutoGluon AutoML approach during development
+   - Polishing the Colab notebook for clarity
+   - Refining this model card
+
+ ---
+
+ ## Citation
+
+ **BibTeX:**
+ ```bibtex
+ @model{bareethul_books_classification,
+   author = {Kader, Bareethul},
+   title = {AutoML Books Classification},
+   year = {2025},
+   framework = {AutoGluon},
+   repository = {https://huggingface.co/bareethul/AutoML-books-classification}
+ }
+ ```