Case study

Where a retail support agent creates a dispute.

A representative Voice Agent evaluation of an e-commerce support line handling order status, returns, and exchanges. Illustrative throughout — it shows the depth of a real evaluation, not a named client result.

Engagement · Voice agent · ILLUSTRATIVE

System evaluated

Order and returns voice agent

Domain

E-commerce customer support

Reporting period

Single evaluation cycle

Channel

Voice · inbound

The challenge · ILLUSTRATIVE

Intent handling is strong, so the evaluation pressured the edges: return-policy questions on edge cases, lookups when the caller lacks an order number, and callers who have already had one order go wrong.

Scope

210 human-tested calls · 5 caller profiles · 10 scenarios · English

Performance

Performance scorecard · Voice rubric · ILLUSTRATIVE
Intent Recognition: 4.3 / 5
Accent Handling: 3.8 / 5
Response Accuracy: 3.5 / 5
Context Retention: 3.6 / 5
Conversation Quality: 4.0 / 5

Weighted overall: 3.9 / 5. Good, with one sharp risk.
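As a rough illustration of how a weighted overall can be derived from the five dimension scores, the sketch below applies a weight to each dimension and normalizes by the total weight. The case study does not publish the Voice rubric's weights, so the `weights` values here are placeholders picked for illustration only, and `weighted_overall` is a hypothetical helper name.

```python
# Dimension scores from the scorecard (out of 5).
scores = {
    "Intent Recognition": 4.3,
    "Accent Handling": 3.8,
    "Response Accuracy": 3.5,
    "Context Retention": 3.6,
    "Conversation Quality": 4.0,
}

# ASSUMPTION: placeholder weights, NOT the actual Voice rubric weights,
# which the case study does not state.
weights = {
    "Intent Recognition": 0.30,
    "Accent Handling": 0.15,
    "Response Accuracy": 0.20,
    "Context Retention": 0.15,
    "Conversation Quality": 0.20,
}


def weighted_overall(scores, weights):
    """Return the weight-normalized average of the dimension scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


overall = weighted_overall(scores, weights)
unweighted = sum(scores.values()) / len(scores)
print(f"Weighted overall: {overall:.1f} / 5")
print(f"Unweighted mean:  {unweighted:.2f} / 5")
```

With different weights the overall figure shifts, which is why a plain average of the five scores need not match the weighted number reported above.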

Diagnosis

Top failure themes · ILLUSTRATIVE
Confident wrong policy answer · 29% of failures

Caller

Can I return this after 40 days if it is unopened?

Agent

Yes — returns are accepted any time.

Impact. The agent confidently states a policy that contradicts the actual 30-day return window. That sets up a refused return and a dispute later.

Recommendation. Ground every policy answer in the current return rules and decline to guess on edge cases.
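One way this recommendation could look in practice is sketched below: the agent answers only from a policy record and hands off when the record has no rule for the caller's situation. The 30-day window comes from this case study; `ReturnPolicy`, `answer_return_question`, the `documented_exceptions` field, and the reply wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ReturnPolicy:
    # 30 days is the window named in the case study; the rest is assumed.
    window_days: int = 30
    # Conditions the policy text explicitly covers, mapped to the approved
    # answer. Empty here: no exceptions are documented in this sketch.
    documented_exceptions: dict = field(default_factory=dict)


def answer_return_question(days_elapsed, condition, policy):
    """Answer only from the policy record; hand off rather than guess."""
    if days_elapsed <= policy.window_days:
        return f"Yes, that is inside the {policy.window_days}-day return window."
    if condition in policy.documented_exceptions:
        return policy.documented_exceptions[condition]
    # Outside the window with no documented rule for this condition:
    # state the known rule and decline to improvise an exception.
    return (
        f"Our return window is {policy.window_days} days, so this is outside it. "
        "I can't confirm an exception, so let me connect you with someone who can."
    )


reply = answer_return_question(40, "unopened", ReturnPolicy())
print(reply)
```

For the 40-day unopened-item call above, this path states the 30-day rule and hands off instead of asserting that returns are accepted any time.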

Lookup dead-ends without an order number · 26% of failures

Caller

I do not have the order number — just the email I used.

Agent

I need the order number to continue.

Impact. A recoverable lookup is dead-ended. The email alone is enough to find the order, but the agent will not accept it.

Recommendation. Support fallback identifiers — email or phone — before requiring the order number.
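A minimal sketch of the fallback lookup, assuming orders can be matched on email or phone as well as order number. The in-memory `ORDERS` list, its field names, and `find_orders` are hypothetical stand-ins for whatever order service the agent actually calls.

```python
# Hypothetical in-memory order store; a real agent would query an order service.
ORDERS = [
    {"order_number": "A1001", "email": "pat@example.com", "phone": "555-0100"},
    {"order_number": "A1002", "email": "sam@example.com", "phone": "555-0101"},
]


def find_orders(order_number=None, email=None, phone=None):
    """Try each identifier in turn and return matches from the first that hits."""
    candidates = (
        ("order_number", order_number),
        ("email", email),
        ("phone", phone),
    )
    for key, value in candidates:
        if value is None:
            continue
        wanted = value.strip().lower()
        matches = [order for order in ORDERS if order[key].lower() == wanted]
        if matches:
            return matches
    # Nothing matched: only now should the agent ask for another identifier.
    return []


print(find_orders(email="Pat@Example.com"))
```

The order number is still tried first when the caller has it; the change is that its absence no longer ends the lookup.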

Escalation refused on a repeat failure · 19% of failures

Caller

This is the third time the wrong item shipped — I want a person.

Agent

I can help you with that right here.

Impact. A justified human handoff is refused, escalating frustration on an order that has already failed twice.

Recommendation. Route explicit human requests and repeat-failure signals straight to a live agent.
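The routing rule could be sketched roughly as follows, assuming the agent has the caller's utterance and a count of earlier failures on the same order. The phrase list, the threshold of two prior failures, and `should_escalate` are illustrative assumptions, not part of the evaluation.

```python
import re

# Hypothetical phrases that signal an explicit request for a human.
HUMAN_REQUEST = re.compile(
    r"\b(a person|a human|real person|live agent|representative)\b",
    re.IGNORECASE,
)


def should_escalate(utterance, prior_failures_on_order, failure_threshold=2):
    """Escalate on an explicit human request or a repeat-failure signal."""
    asked_for_human = bool(HUMAN_REQUEST.search(utterance))
    repeat_failure = prior_failures_on_order >= failure_threshold
    return asked_for_human or repeat_failure


# The call from the example above: an explicit request plus two earlier failures.
print(should_escalate("This is the third time the wrong item shipped, I want a person.", 2))
```

Either signal alone is enough to hand off in this sketch, so a caller who asks for a person is routed even on a first failure.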

Sample failure log · ILLUSTRATIVE

Call ID | Issue type | Severity | Description
R-024 | Factual error | CRITICAL | Stated a returns window that did not match the actual policy.
R-051 | Flow breakdown | MAJOR | Dead-ended a lookup when the caller had no order number.
R-077 | Escalation failure | MAJOR | Refused a human handoff on a repeat shipping failure.
R-103 | Consistency | MINOR | Gave two different exchange timelines across separate calls.

An excerpt of the per-call log — every finding carries a reproducible Call ID, a failure type, and a severity on the 4-band scale.

Resolution

Severity distribution · 4-band · ILLUSTRATIVE
Critical: 3 · 4%
Major: 11 · 16%
Minor: 24 · 35%
Observations: 30 · 44%

68 findings total — scored on the Voice rubric and ranked by severity, the same way every evaluation reports.
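The percentages in the distribution follow directly from the counts. A short sketch of the arithmetic, with each share rounded independently to a whole percent (variable names are illustrative):

```python
# Finding counts per severity band, from the distribution above.
counts = {"Critical": 3, "Major": 11, "Minor": 24, "Observations": 30}

total = sum(counts.values())

# Share of all findings per band, rounded to the nearest whole percent.
shares = {band: round(100 * n / total) for band, n in counts.items()}

for band, n in counts.items():
    print(f"{band}: {n} · {shares[band]}%")
print(f"Total findings: {total}")
```

Because each share is rounded on its own, the four percentages add up to 99 rather than 100.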

Improvement priorities · ILLUSTRATIVE

1. Ground policy answers in live rules

Removes the confident wrong answers that turn into refused returns and disputes. Highest expected lift.

2. Add fallback order identifiers

Recovers the lookups that currently dead-end when the caller lacks an order number.

3. Route repeat-failure escalations

Stops the agent refusing a justified human handoff on an already-failed order.

Management summary · ILLUSTRATIVE

Overall: 3.9 / 5 (Good). Intent recognition and conversation quality are strong; the exposure is response accuracy on policy edge cases, where confident wrong answers create downstream disputes. Grounding policy responses and loosening the order-lookup path clear the main risks. The agent is ready for a pilot across order status and returns once policy grounding lands.

Prepared by KNK Global · evaluation services

See this on your own support line.

A pilot returns an evaluation in this exact shape, scored on your live voice agent under real callers.