©RillNews
Real-time LLM Inference on Standard GPUs: 3k tokens/s per request
blog.kog.ai
10 points by NicoConstant 49 minutes ago | 0 comments