Long latency and low GPU utilization
#8 opened by Scott0612
I am running this model with the MLX example script mistral.py, using the model, tokenizer, and config files I downloaded. The model loads fine, but GPU utilization stays in the single digits, and the first 10 output tokens took about 5 minutes. I'm using an M2 MacBook Air with 16 GB. What is the issue?
