This author successfully replaced OpenAI's API with an open-source alternative to reduce the cost of running large-scale AI applications. They first tried Ollama on a local machine to generate text summaries, but its limited support for concurrent requests led them to switch to vLLM (a fast inference engine). To handle large request volumes, the author deployed vLLM on a Kubernetes cluster with load balancing.
Wednesday, May 15, 2024
Requesting JSON or other structured output from language models is challenging. This new feature in OpenAI's API supports structured output from language model generation that can be consumed by downstream deterministic programs.
OpenAI has introduced Structured Outputs in its API, which lets developers constrain model-generated outputs to a supplied JSON schema. This guarantees that responses are consistent and schema-compliant. The feature is available today.
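As a sketch of what such a request might look like, the example below builds a Chat Completions request body with a `response_format` of type `json_schema`. The event-extraction schema, prompt, and model id are illustrative assumptions, not taken from the announcement:

```python
import json

# Hypothetical schema for extracting a calendar event; any valid JSON Schema
# object works, but Structured Outputs requires additionalProperties: false
# and all properties listed as required when strict mode is on.
event_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "date": {"type": "string"},
        "participants": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "date", "participants"],
    "additionalProperties": False,
}

# Request body for POST /v1/chat/completions with Structured Outputs enabled.
request_body = {
    "model": "gpt-4o-2024-08-06",  # assumed: a model id that supports Structured Outputs
    "messages": [
        {"role": "system", "content": "Extract the event details."},
        {"role": "user", "content": "Alice and Bob meet for lunch on Friday."},
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "event", "strict": True, "schema": event_schema},
    },
}

print(json.dumps(request_body["response_format"], indent=2))
```

With `"strict": True`, the API constrains generation so the returned message content parses as JSON matching the schema, which is what makes the output safe to hand to downstream deterministic code.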
Mistral has released a free API tier, dramatically reduced its prices, improved the performance of its small model, and brought its vision model to Le Chat.