+---
+license: cc-by-4.0
+datasets:
+- jennifee/HW1-tabular-dataset
+language:
+- en
+metrics:
+- accuracy
+base_model:
+- autogluon/tabpfn-mix-1.0-classifier
+pipeline_tag: tabular-classification
+tags:
+- automl
+- classification
+- books
+- tabular
+- autogluon
+---
+# Model Card for AutoML Books Classification
+This model card documents the **AutoML Books Classification** model trained with **AutoGluon AutoML** on a classmate’s dataset of fiction and nonfiction books.
+The task is to predict whether a book is **recommended to everyone** based on tabular features.
+---
+## Model Details
+- **Developed by:** Bareethul Kader
+- **Framework:** AutoGluon (v1.1)
+- **Repository:** [bareethul/AutoML-books-classification](https://huggingface.co/bareethul/AutoML-books-classification)
+- **License:** CC BY 4.0
+---
+## Intended Use
+### Direct Use
+- Educational demonstration of AutoML on a small tabular dataset.
+- Comparison of multiple classical ML models through automated search.
+- Understanding validation vs. test performance trade-offs.
+### Out-of-Scope Use
+- Not designed for production or book recommendation engines.
+- Dataset too small to generalize beyond classroom context.
+---
+## Dataset
+- **Source:** https://huggingface.co/datasets/jennifee/HW1-tabular-dataset
+- **Task:** Classification (`RecommendToEveryone` = 0/1).
+- **Size:** 30 original samples + ~300 augmented rows.
+- **Features:**
+  - `Pages` (integer)
+  - `Thickness` (float)
+  - `ReadStatus` (categorical: read/started/not read)
+  - `Genre` (categorical: fiction/nonfiction)
+  - `RecommendToEveryone` (binary target)
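The feature list above can be turned into a small schema check that runs before training. This is a minimal sketch in plain Python, not part of the original notebook: the `SCHEMA` dict, the `validate_row` helper, and the example row are hypothetical, and the exact string spellings of the categorical values are an assumption that should be checked against the real data.

```python
from typing import Any

# Schema as documented in this card. The exact string spellings of the
# categorical values are an assumption; check them against the real data.
SCHEMA = {
    "Pages": int,
    "Thickness": float,
    "ReadStatus": {"read", "started", "not read"},
    "Genre": {"fiction", "nonfiction"},
    "RecommendToEveryone": {0, 1},
}


def validate_row(row: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one row."""
    problems = []
    for column, rule in SCHEMA.items():
        if column not in row:
            problems.append(f"missing column: {column}")
            continue
        value = row[column]
        if isinstance(rule, set):
            # Categorical or binary column: value must be an allowed one.
            if value not in rule:
                problems.append(f"{column}: unexpected value {value!r}")
        elif rule is float:
            # Accept ints where a float is expected (e.g. a thickness of 2).
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{column}: expected a number, got {value!r}")
        elif isinstance(value, bool) or not isinstance(value, rule):
            problems.append(f"{column}: expected {rule.__name__}, got {value!r}")
    return problems


# Hypothetical example row, not taken from the dataset.
example = {
    "Pages": 320,
    "Thickness": 2.4,
    "ReadStatus": "read",
    "Genre": "fiction",
    "RecommendToEveryone": 1,
}
print(validate_row(example))  # []
```

Running the check over every row before fitting would surface unexpected category spellings or missing columns early, which matters more than usual on a dataset this small.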
+---
+## Training Setup
+- **AutoML framework:** AutoGluon TabularPredictor
+- **Evaluation metric:** Accuracy
+- **Budget:** ~1 minute of training time, small-scale search
+- **Hardware:** Google Colab (CPU is sufficient; the T4 GPU is not required)
+- **Search Space:**
+  - Tree-based models: LightGBM, XGBoost, ExtraTrees, RandomForest
+  - Neural nets: Torch, FastAI (small MLPs)
+  - Bagging and ensembling across layers (L1, L2, L3)
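The setup listed above could be reproduced roughly as follows. This is a sketch under stated assumptions rather than the notebook's actual code: the `best_quality` preset is a guess based on the bagged, multi-layer model names in the results, and `fit_books_predictor` and `FIT_KWARGS` are hypothetical names. The label column, metric, and time budget come from this card.

```python
# Column and settings taken from this card; the preset is an assumption.
LABEL = "RecommendToEveryone"
FIT_KWARGS = {
    "time_limit": 60,            # roughly the 1-minute budget noted above
    "presets": "best_quality",   # assumed: enables bagging and stacked layers
}


def fit_books_predictor(train_df, fit_kwargs=FIT_KWARGS):
    """Fit a TabularPredictor on a pandas DataFrame that includes LABEL."""
    # Imported lazily so the settings above can be inspected without AutoGluon.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=LABEL, eval_metric="accuracy")
    predictor.fit(train_df, **fit_kwargs)
    return predictor
```

With a held-out `test_df`, calling `predictor.leaderboard(test_df)` on the returned predictor reports per-model `score_test` and `score_val`, which is the kind of comparison summarized in the results.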
+---
+## Results
+### Mini Leaderboard (Top 3 Models)
+| Rank | Model                     | Test Accuracy | Validation Accuracy |
+|------|---------------------------|---------------|----------------------|
+| 1    | RandomForestEntr_BAG_L1   | **0.55**      | ~0.65               |
+| 2    | LightGBM_r96_BAG_L2       | 0.53          | ~0.72               |
+| 3    | LightGBMLarge_BAG_L2      | 0.53          | ~0.74               |
+- **Best model (AutoGluon selected):** `RandomForestEntr_BAG_L1`
+- **Test Accuracy:** ~0.55
+- **Validation Accuracy (best across runs):** up to ~0.75 (LightGBM variants)
+Note: The **“best model”** may vary with random splits and seeds.
+AutoGluon reported `RandomForestEntr_BAG_L1` as best in this run. The LightGBM models reached higher validation accuracy (~0.72 to ~0.74) but lower test accuracy (0.53 vs. 0.55), so they generalized slightly worse.
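The validation vs. test trade-off can be made concrete by computing the gap for each row of the table above. The sketch below uses only the numbers reported in this card (the validation values are the approximate ones from the table); the `generalization_gaps` helper is a hypothetical name introduced for illustration.

```python
# (model, test accuracy, approximate validation accuracy) from the table above.
LEADERBOARD = [
    ("RandomForestEntr_BAG_L1", 0.55, 0.65),
    ("LightGBM_r96_BAG_L2", 0.53, 0.72),
    ("LightGBMLarge_BAG_L2", 0.53, 0.74),
]


def generalization_gaps(rows):
    """Return (model, validation - test) pairs, largest gap first."""
    gaps = [(model, round(val - test, 2)) for model, test, val in rows]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


for model, gap in generalization_gaps(LEADERBOARD):
    print(f"{model}: {gap:+.2f}")
# LightGBMLarge_BAG_L2: +0.21
# LightGBM_r96_BAG_L2: +0.19
# RandomForestEntr_BAG_L1: +0.10
```

The model with the highest validation accuracy also has the largest gap, which is consistent with overfitting to the validation folds on a small dataset.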
+---
+## Limitations, Biases, and Ethical Notes
+- **Small dataset size** → models may overfit, and performance metrics are unstable across splits and seeds.
+- **Augmented data** → synthetic rows may not reflect true variability.
+- **Task scope** → purely educational, not for real-world recommendation.
+---
+## AI Usage Disclosure
+- ChatGPT (GPT-5) assisted in:
+  - Assisting with code and the AutoGluon AutoML approach during development
+  - Polishing the Colab notebook for clarity
+  - Refining this model card
+---
+## Citation
+**BibTeX:**
+```bibtex
+@misc{bareethul_books_classification,
+  author       = {Kader, Bareethul},
+  title        = {AutoML Books Classification},
+  year         = {2025},
+  note         = {Trained with AutoGluon},
+  howpublished = {\url{https://huggingface.co/bareethul/AutoML-books-classification}}
+}