Inference Time

#287
by XxLOLxX - opened

Can anyone help make the inference time faster? I have a VM with 4 T4 GPUs.
My script passes to the model:
1. Original query
2. User's question
3. 10 instructions to follow

The inference time varies a lot: some requests take 7-10 seconds,
others may take 50-60 seconds.

Any ideas on how I can use the 4 GPUs to get a faster response?
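
For reference, here is a minimal sketch of one common way to spread a model across all visible GPUs with Transformers + Accelerate's `device_map="auto"`. It assumes a causal LM checkpoint; the model name is a placeholder, not the model from this thread:

```python
# Minimal sketch: shard one model across all visible GPUs using
# Transformers + Accelerate's device_map="auto".
# NOTE: "your-model-checkpoint" is a placeholder, not a specific model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-checkpoint"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",          # splits layers across the 4 T4s
    torch_dtype=torch.float16,  # fp16 halves memory use and is faster on T4s
)

# Prompt = original query + user's question + the 10 instructions.
prompt = "..."  # assembled prompt goes here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Worth noting: `device_map="auto"` mainly helps when the model is too big for a single T4; if it fits on one GPU, running an independent replica per GPU and spreading requests across them usually does more for latency. The large variance between requests is typically driven by how many tokens get generated, so capping `max_new_tokens` is another lever.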
