KNK Global — Independent human evaluation for conversational AI

Case study

Where a retail support agent creates a dispute.

A representative Voice Agent evaluation of an e-commerce support line handling order status, returns, and exchanges. Illustrative throughout — it shows the depth of a real evaluation, not a named client result.

Engagement · Voice agent · ILLUSTRATIVE

System evaluated: Order and returns voice agent

Domain: E-commerce customer support

Reporting period: Single evaluation cycle

Channel: Voice · inbound

The challenge · ILLUSTRATIVE

Intent handling is strong, so the evaluation pushed on the edges: unusual return-policy questions, lookups when the caller lacks an order number, and callers who have already had an order go wrong.

Scope

210 human-tested calls · 5 caller profiles · 10 scenarios · English

Performance

Performance scorecard · Voice rubric · ILLUSTRATIVE

Intent Recognition: 4.3 / 5

Accent Handling: 3.8 / 5

Response Accuracy: 3.5 / 5

Context Retention: 3.6 / 5

Conversation Quality: 4.0 / 5

Weighted overall: 3.9 / 5 — Good, with one sharp risk.
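The rubric's actual weights are not stated in this case study, so the sketch below only shows the arithmetic behind a weighted overall. The weights are hypothetical, picked so the result lands on the reported figure; many other weightings would do the same. For comparison, the plain unweighted mean of the five scores is 3.84.

```python
# Dimension scores taken from the scorecard above (out of 5).
scores = {
    "intent_recognition": 4.3,
    "accent_handling": 3.8,
    "response_accuracy": 3.5,
    "context_retention": 3.6,
    "conversation_quality": 4.0,
}

# HYPOTHETICAL weights: the real rubric weighting is not given in the source.
weights = {
    "intent_recognition": 0.30,
    "accent_handling": 0.15,
    "response_accuracy": 0.20,
    "context_retention": 0.15,
    "conversation_quality": 0.20,
}


def weighted_overall(scores, weights):
    """Return the weighted mean of the dimension scores."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight


overall = weighted_overall(scores, weights)
unweighted = sum(scores.values()) / len(scores)
print(f"weighted: {overall:.1f} / 5, unweighted: {unweighted:.2f} / 5")
```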

Diagnosis

Top failure themes · ILLUSTRATIVE

Confident wrong policy answer · 29% of failures

Caller

Can I return this after 40 days if it is unopened?

Agent

Yes — returns are accepted any time.

Impact. The agent confidently states a policy that contradicts the actual 30-day return window. That sets up a refused return and a dispute later.

Recommendation. Ground every policy answer in the current return rules and decline to guess on edge cases.
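A minimal sketch of what grounding could look like, assuming the return rules are available to the agent as structured data. `ReturnPolicy`, `answer_return_question` and the decline wording are hypothetical names for illustration, not the evaluated system's design. The agent answers only from the loaded rules and declines when they are missing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReturnPolicy:
    """Hypothetical snapshot of the current return rules."""
    window_days: int


DECLINE = ("I can't confirm that from the current policy, "
           "so let me pass you to someone who can.")


def answer_return_question(days_elapsed: Optional[int],
                           policy: Optional[ReturnPolicy]) -> str:
    """Answer only from the loaded policy; decline when anything is missing."""
    if policy is None or days_elapsed is None:
        return DECLINE
    if days_elapsed <= policy.window_days:
        return f"Yes, that is inside the {policy.window_days}-day return window."
    return (f"No, returns are accepted within {policy.window_days} days, "
            f"and this is day {days_elapsed}.")


# The caller's question from the example above: day 40 against a 30-day window.
print(answer_return_question(40, ReturnPolicy(window_days=30)))
```

Conditions the rules do not cover, such as whether an unopened item changes the window, would take the same decline path instead of producing a guess.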

Lookup dead-ends without an order number · 26% of failures

Caller

I do not have the order number — just the email I used.

Agent

I need the order number to continue.

Impact. A recoverable lookup is dead-ended. The email alone is enough to find the order, but the agent will not accept it.

Recommendation. Support fallback identifiers — email or phone — before requiring the order number.
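The fallback could be sketched as a lookup that tries each identifier in turn. The in-memory `ORDERS` list, its field names and `find_orders` are assumptions made for illustration; a real agent would query its order system instead.

```python
from typing import Dict, List, Optional

# Hypothetical order records standing in for the real order system.
ORDERS: List[Dict[str, str]] = [
    {"order_number": "A1001", "email": "sam@example.com", "phone": "555-0100"},
    {"order_number": "A1002", "email": "lee@example.com", "phone": "555-0101"},
]


def find_orders(order_number: Optional[str] = None,
                email: Optional[str] = None,
                phone: Optional[str] = None) -> List[Dict[str, str]]:
    """Try the order number first, then fall back to email, then phone."""
    candidates = (
        ("order_number", order_number),
        ("email", email),
        ("phone", phone),
    )
    for field, value in candidates:
        if not value:
            continue  # caller did not supply this identifier
        wanted = value.strip().lower()
        matches = [order for order in ORDERS if order[field].lower() == wanted]
        if matches:
            return matches
    return []  # nothing matched; ask for another identifier, do not dead-end


# The caller from the example above has only an email address.
print(find_orders(email="sam@example.com"))
```

An email or phone number can match more than one order, so the function returns a list and leaves it to the dialogue to confirm which order the caller means.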

Escalation refused on a repeat failure · 19% of failures

Caller

This is the third time the wrong item shipped — I want a person.

Agent

I can help you with that right here.

Impact. A justified human handoff is refused, escalating frustration on an order that has already failed twice.

Recommendation. Route explicit human requests and repeat-failure signals straight to a live agent.
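One way to express that routing rule, as a sketch only: escalate when the caller explicitly asks for a person, or when the order has already failed a set number of times. The phrase list, the threshold of two prior failures and the function name are all hypothetical; a production agent would likely rely on its intent model, not a regular expression.

```python
import re

# Hypothetical phrases that signal an explicit request for a human.
HUMAN_REQUEST = re.compile(
    r"\b(a person|a human|an agent|a representative|someone real)\b",
    re.IGNORECASE,
)

# Hypothetical threshold: two earlier failures on the same order.
REPEAT_FAILURE_THRESHOLD = 2


def should_escalate(utterance: str, prior_failures: int) -> bool:
    """Return True when the call should go straight to a live agent."""
    asked_for_human = bool(HUMAN_REQUEST.search(utterance))
    repeat_failure = prior_failures >= REPEAT_FAILURE_THRESHOLD
    return asked_for_human or repeat_failure


# The caller from the example above: explicit request plus two earlier failures.
print(should_escalate("This is the third time the wrong item shipped, I want a person.", 2))
```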

Sample failure log · ILLUSTRATIVE

Call ID | Issue type | Severity | Description
R-024 | Factual error | CRITICAL | Stated a returns window that did not match the actual policy.
R-051 | Flow breakdown | MAJOR | Dead-ended a lookup when the caller had no order number.
R-077 | Escalation failure | MAJOR | Refused a human handoff on a repeat shipping failure.
R-103 | Consistency | MINOR | Gave two different exchange timelines across separate calls.

An excerpt of the per-call log — every finding carries a reproducible Call ID, a failure type, and a severity on the 4-band scale.

Resolution

Severity distribution · 4-band · ILLUSTRATIVE

Critical: 3 (4%)

Major: 11 (16%)

Minor: 24 (35%)

Observations: 30 (44%)

68 findings total — scored on the Voice rubric and ranked by severity, the same way every evaluation reports.
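The shares above follow directly from the counts. A short check, assuming each share is rounded to the nearest whole percent, which is why they add up to 99% and not 100%:

```python
# Finding counts per severity band, taken from the distribution above.
counts = {"Critical": 3, "Major": 11, "Minor": 24, "Observations": 30}

total = sum(counts.values())

# Share of all findings per band, rounded to the nearest whole percent.
shares = {band: round(100 * n / total) for band, n in counts.items()}

for band, n in counts.items():
    print(f"{band}: {n} ({shares[band]}%)")
print(f"Total findings: {total}")
```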

Improvement priorities · ILLUSTRATIVE

Ground policy answers in live rules

Removes the confident wrong answers that turn into refused returns and disputes. Highest expected lift.

Add fallback order identifiers

Recovers the lookups that currently dead-end when the caller lacks an order number.

Route repeat-failure escalations

Stops the agent refusing a justified human handoff on an already-failed order.

Management summary · ILLUSTRATIVE

Overall: 3.9 / 5 (Good). Intent recognition and conversation quality are strong; the exposure is response accuracy on policy edge cases, where confident wrong answers create downstream disputes. Grounding policy responses and loosening the order-lookup path clear the main risks. The agent is ready for a pilot across order status and returns once policy grounding lands.

Prepared by KNK Global · evaluation services

See this on your own support line.

A pilot returns an evaluation in this exact shape, scored on your live voice agent under real callers.
