Inference Time
#287 by XxLOLxX
Can anyone help me make the inference time faster? I have a VM with 4 T4 GPUs.
My script passes the following to the model:
1- The original query
2- The user's question
3- 10 instructions to follow
The inference time varies a lot: some requests take 7-10 seconds, while others take 50-60 seconds.
Any ideas on how I can use the 4 GPUs to get a faster response?
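
For reference, here is a minimal sketch of the kind of multi-GPU loading I mean, assuming a Hugging Face `transformers` causal LM with `accelerate` installed; the model id, prompt, and token cap are placeholders, not my actual setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-model-id"  # placeholder: the model being served
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" shards the weights across all visible GPUs (the 4 T4s),
# so a model too big for one 16 GB card can still be loaded in fp16.
# Requires `pip install accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # T4s have fast fp16 but no bf16 support
)

prompt = "..."  # original query + user's question + the 10 instructions
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Capping max_new_tokens bounds the worst-case latency: autoregressive
# decoding time grows roughly linearly with the number of generated tokens,
# which is the usual reason some requests take 7-10 s and others 50-60 s.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```

Note that `device_map="auto"` splits layers across cards pipeline-style, which mainly helps the model fit rather than speeding up a single request; for an actual per-request latency cut, a tensor-parallel server such as vLLM (e.g. `LLM(model=model_id, tensor_parallel_size=4)`) is usually the bigger win on a 4-GPU box.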