Chinese artificial intelligence start-up DeepSeek has rolled out a major upgrade to its flagship V4 model aimed at sharply accelerating AI response generation, as competition among Chinese developers increasingly shifts to

reducing serving costs and enhancing user experience

.

AI models’ conventional token-by-token output often slowed when responses were lengthy, leading to low utilisation of graphics processing units (GPU) and high user-perceived waiting time, which was a “primary bottleneck in serving AI”, the company said in research published on Saturday.

DeepSeek said the DSpark module accelerated AI response generation – also known as AI inference, which refers to serving a trained model to respond to user queries – by using a lightweight draft model to propose candidate responses and then verifying them in batches with a larger model, speeding up output.

DSpark further refined the approach with a semi-autoregressive generation method, allowing the model to produce small chunks of tokens rather than strictly one at a time.

The new technique could reduce the computing resources needed to serve AI systems, according to a programmer. Shutterstock

It also introduced a confidence-based scheduling system that dynamically adjusted how much verification was applied based on computing demand, helping balance speed and output quality.