Cost Harvesting safeguards LLM usage by monitoring and limiting the number of tokens consumed by individual users. If a user exceeds a defined token limit, the system blocks further requests to avoid unnecessary cost spikes.
The policy tracks the prompt and response tokens consumed by each user on a per-minute basis. If the tokens exceed the configured threshold, all additional requests for that minute will be denied.
Threshold Range: 0 - 100,000,000 prompt and response tokens per minute.
Default: 100,000 prompt and response tokens per minute.
If the number of prompt and response tokens exceeds the defined threshold within a minute, all additional requests from that user will be blocked for the remainder of that minute, including history.
To ensure this policy functions correctly, the user should provide a unique User ID to activate the policy. Without the User ID, the policy will not function.
The User ID parameter should be passed in the request body as user:.