Yes, this sounds like the model simply learns not to attend to earlier tokens once an EOS token separates them. With a packed-sequence mask you can enforce this explicitly by masking out the previous tokens.
Hey @shantanuagarwal, glad you enjoyed the article! Even though I haven't tried it out myself, you should be able to leverage PyTorch's FlexAttention API for this. Have a look at the tutorial here: https://pytorch.org/blog/flexattention/. The section "Document Masking/Jagged Sequences" covers these packed-sequence masks.
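For reference, here is a minimal (untested) sketch of what such a packed-sequence/document mask could look like with FlexAttention. It assumes a recent PyTorch with `torch.nn.attention.flex_attention` and a CUDA device; the sequence length, document lengths, and the `document_id` layout are made up purely for illustration:

```python
# Sketch: causal attention restricted to each packed document via FlexAttention.
# Assumes PyTorch >= 2.5 with torch.nn.attention.flex_attention and a CUDA device.
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

B, H, SEQ_LEN, HEAD_DIM = 1, 4, 128, 64

# Hypothetical packing: three documents of lengths 40, 56, 32 in one sequence.
document_id = torch.cat([
    torch.full((40,), 0),
    torch.full((56,), 1),
    torch.full((32,), 2),
]).to("cuda")

def document_causal_mask(b, h, q_idx, kv_idx):
    # Attend only within the same document, and only to earlier positions.
    same_doc = document_id[q_idx] == document_id[kv_idx]
    causal = q_idx >= kv_idx
    return same_doc & causal

block_mask = create_block_mask(
    document_causal_mask, B=None, H=None, Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN
)

q = torch.randn(B, H, SEQ_LEN, HEAD_DIM, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flex_attention(q, k, v, block_mask=block_mask)
```

In practice you would build `document_id` from wherever your packing logic places the EOS boundaries, and wrap `flex_attention` in `torch.compile` for performance, as the tutorial describes.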