llm.c instruction pretraining explorations
Exploring to find a successful training recipe interleaving pre-training with instruction data
jrahn/gpt2_350M_edu_hermes Text Generation • 0.4B • Updated Jul 24, 2024 • 3 • 1
jrahn/gpt3_125M_edu_hermes Text Generation • 0.1B • Updated Aug 1, 2024 • 2

llm.c Exploration
Fun experiments with llm.c
yuchenj/gpt2_124M_100B_FinewebEdu_hf Text Generation • 0.1B • Updated Jul 26, 2024
yuchenj/gpt2_350M_100B_FinewebEdu_hf Text Generation • 0.4B • Updated Jul 26, 2024
yuchenj/gpt2_774M_100B_FinewebEdu_hf Text Generation • 0.8B • Updated Jul 26, 2024 • 2 • 1
yuchenj/gpt2_1558M_100B_FinewebEdu_hf Text Generation • 2B • Updated Aug 25, 2024 • 3 • 1