AI/ML News & Innovations Hub

Saving time and costs when combining Zilliz Cloud with GPTCache

Repeatedly asking an LLM the same or similar questions is costly, wastes resources, and takes time, especially at peak times when responses are slow. To save time and money when building AI applications, developers can pair Zilliz Cloud with GPTCache, an open-source semantic cache that stores LLM responses. With this architecture, when a user asks a question, Zilliz Cloud first checks GPTCache for an answer. If it finds one, Zilliz Cloud returns it to the user right away. Otherwise, Zilliz Cloud sends the query to the LLM and stores the response in GPTCache for future use.
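The check-then-fallback flow described above can be sketched in a few lines of Python. This is a minimal illustration, not GPTCache's or Zilliz Cloud's actual API: the names ToySemanticCache, embed, cosine, answer_question, and call_llm are hypothetical, the "embedding" is a simple bag-of-words count rather than a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts.
    A real deployment would use an embedding model and a vector store."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """Hypothetical in-memory stand-in for a semantic cache."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # minimum similarity to count as a hit
        self.entries = []           # list of (embedding, answer) pairs

    def lookup(self, question):
        """Return the answer of the most similar stored question, or None."""
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vector, answer in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def store(self, question, answer):
        self.entries.append((embed(question), answer))


def answer_question(question, cache, call_llm):
    """Check the cache first; on a miss, ask the LLM and store the result.
    Returns (answer, cache_hit)."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached, True
    answer = call_llm(question)  # call_llm is a placeholder for a real LLM client
    cache.store(question, answer)
    return answer, False


if __name__ == "__main__":
    def fake_llm(question):
        # Stand-in for an expensive LLM call.
        return "answer to: " + question

    cache = ToySemanticCache()
    print(answer_question("What is a vector database", cache, fake_llm))
    print(answer_question("what is a vector database", cache, fake_llm))
```

In this sketch the second call is served from the cache because the two questions produce identical vectors; only cache misses reach the (simulated) LLM, which is where the time and cost savings come from.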

Retrieval Augmented Generation
