Where a retail support agent creates a dispute.
A representative Voice Agent evaluation of an e-commerce support line handling order status, returns, and exchanges. Illustrative throughout — it shows the depth of a real evaluation, not a named client result.
System evaluated
Order and returns voice agent
Domain
E-commerce customer support
Reporting period
Single evaluation cycle
Channel
Voice · inbound
Intent handling is strong, so the evaluation pressured the edges: return-policy questions on edge cases, lookups when the caller lacks an order number, and callers who have already had one order go wrong.
Scope
210 human-tested calls · 5 caller profiles · 10 scenarios · English
Performance
Weighted overall: 3.9 / 5 — Good, with one sharp risk.
Diagnosis
Caller
Can I return this after 40 days if it is unopened?
Agent
Yes — returns are accepted any time.
Impact. The agent states a policy that is not the actual 30-day window, confidently. That sets up a refused return and a dispute later.
Recommendation. Ground every policy answer in the current return rules and decline to guess on edge cases.
Caller
I do not have the order number — just the email I used.
Agent
I need the order number to continue.
Impact. A recoverable lookup is dead-ended. The email alone is enough to find the order, but the agent will not accept it.
Recommendation. Support fallback identifiers — email or phone — before requiring the order number.
Caller
This is the third time the wrong item shipped — I want a person.
Agent
I can help you with that right here.
Impact. A justified human handoff is refused, escalating frustration on an order that has already failed twice.
Recommendation. Route explicit human requests and repeat-failure signals straight to a live agent.
An excerpt of the per-call log — every finding carries a reproducible Call ID, a failure type, and a severity on the 4-band scale.
Resolution
68 findings total — scored on the Voice rubric and ranked by severity, the same way every evaluation reports.
Ground policy answers in live rules
Removes the confident wrong answers that turn into refused returns and disputes. Highest expected lift.
Add fallback order identifiers
Recovers the lookups that currently dead-end when the caller lacks an order number.
Route repeat-failure escalations
Stops the agent refusing a justified human handoff on an already-failed order.
Overall 3.9 / 5 — Good. Intent recognition and conversation quality are strong; the exposure is response accuracy on policy edge cases, where confident wrong answers create downstream disputes. Grounding policy responses and loosening the order-lookup path clears the main risks. It is ready for a pilot across order-status and returns once policy grounding lands.
Prepared by KNK Global · evaluation services
See this on your own support line.
A pilot returns an evaluation in this exact shape, scored on your live voice agent under real callers.