danielz01's Collections

Efficient LLM

 - FlashDecoding++: Faster Large Language Model Inference on GPUs
   Paper • 2311.01282 • 37
 - S-LoRA: Serving Thousands of Concurrent LoRA Adapters
   Paper • 2311.03285 • 32
 - Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
   Paper • 2311.06243 • 22
 - FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
   Paper • 2311.05908 • 16
 - Tied-Lora: Enhacing parameter efficiency of LoRA with weight tying
   Paper • 2311.09578 • 16
 - I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization
   Paper • 2311.10126 • 10
 - SparQ Attention: Bandwidth-Efficient LLM Inference
   Paper • 2312.04985 • 40
 - A Survey of Resource-efficient LLM and Multimodal Foundation Models
   Paper • 2401.08092 • 3
 - SliceGPT: Compress Large Language Models by Deleting Rows and Columns
   Paper • 2401.15024 • 74
 - EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
   Paper • 2401.15077 • 20
 - Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
   Paper • 2403.15447 • 16
 - A Controlled Study on Long Context Extension and Generalization in LLMs
   Paper • 2409.12181 • 45