KNK Global — Independent human evaluation for conversational AI

Case study

Where a phone-banking agent gets rigid.

A representative Voice Agent evaluation of a fintech phone line fielding balance, transaction, and dispute calls. Illustrative throughout — it shows the depth of a real evaluation, not a named client result.

Engagement · Voice agent · ILLUSTRATIVE

System evaluated: Phone-banking voice agent

Domain: Fintech phone banking

Reporting period: Single evaluation cycle

Channel: Voice · inbound

The challenge · ILLUSTRATIVE

Core accuracy is high, so the evaluation pressure-tested the points where the agent has to flex: recovering when verification stalls, capturing spoken account digits, and handling a caller who reports a fraudulent charge. These are questions of scripted behavior, not of model accuracy.

Scope

190 human-tested calls · 4 caller profiles · 9 scenarios · English

Performance

Performance scorecard · Voice rubric · ILLUSTRATIVE

Intent Recognition: 3.9 / 5

Accent Handling: 3.5 / 5

Response Accuracy: 4.2 / 5

Context Retention: 3.3 / 5

Conversation Quality: 3.4 / 5

Weighted overall: 3.7 / 5 — Acceptable, with a critical gap on the dispute path.
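The case study does not publish the rubric weights, so the sketch below assumes equal weights purely for illustration. Under that assumption the five dimension scores average 3.66, which rounds to the reported 3.7. The names `SCORES`, `EQUAL_WEIGHTS`, and `weighted_overall` are hypothetical.

```python
# Illustrative sketch: the weights are an assumption, not the published rubric.
SCORES = {
    "Intent Recognition": 3.9,
    "Accent Handling": 3.5,
    "Response Accuracy": 4.2,
    "Context Retention": 3.3,
    "Conversation Quality": 3.4,
}

# Hypothetical equal weights; a real rubric may weight dimensions differently.
EQUAL_WEIGHTS = {name: 1.0 for name in SCORES}


def weighted_overall(scores, weights):
    """Return the weighted mean of the dimension scores, rounded to one decimal."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return round(weighted_sum / total_weight, 1)


overall = weighted_overall(SCORES, EQUAL_WEIGHTS)
print(overall)
```

Swapping in a different weight dictionary shows how sensitive the headline number is to the weighting choice.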

Diagnosis

Top failure themes · ILLUSTRATIVE

Sensitive intent deflected · 32% of failures

Caller: I think there is a fraudulent charge on my account.

Agent: You can review your recent transactions in the app.

Impact. An urgent fraud report is deflected to self-service instead of following the defined fraud-handling script. This is the critical finding.

Recommendation. Detect distress and fraud intents and follow the client script for escalation immediately, without a self-service detour.
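A minimal sketch of that routing rule, assuming a naive keyword match stands in for the agent's real intent classifier. The phrase list and the route names `fraud_escalation_script` and `self_service_app` are hypothetical placeholders for the client's own configuration.

```python
# Hypothetical phrases and route names; a production agent would use its own
# intent classifier and the client's defined escalation script.
SENSITIVE_PHRASES = ("fraud", "stolen", "unauthorized", "scam")

FRAUD_ESCALATION = "fraud_escalation_script"
SELF_SERVICE = "self_service_app"


def is_sensitive(utterance):
    """Flag an utterance that contains any sensitive phrase (substring match)."""
    text = utterance.lower()
    return any(phrase in text for phrase in SENSITIVE_PHRASES)


def route(utterance, default_route=SELF_SERVICE):
    """Send sensitive intents straight to escalation, never to self-service."""
    if is_sensitive(utterance):
        return FRAUD_ESCALATION
    return default_route


print(route("I think there is a fraudulent charge on my account."))
```

The point of the design is ordering: the sensitive-intent check runs before any default route is considered, so a fraud report cannot fall through to self-service.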

Verification recovery loop · 28% of failures

Caller: I already gave you my date of birth twice.

Agent: For security, please provide your date of birth.

Impact. The agent loops on a verification step the caller has already cleared. Callers abandon before reaching their actual request.

Recommendation. Track completed verification steps in-session and never re-request a check that is already satisfied.
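One way to hold that state is a small per-call object that records satisfied checks and only ever returns the next outstanding one. This is a sketch under assumptions: the step names and the `VerificationState` class are hypothetical, and the real verification policy belongs to the client.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical step names; the real verification policy belongs to the client.
REQUIRED_STEPS = ("date_of_birth", "last_four_digits")


@dataclass
class VerificationState:
    """Tracks which verification steps have been satisfied in this call."""

    completed: set = field(default_factory=set)

    def mark_completed(self, step: str) -> None:
        if step not in REQUIRED_STEPS:
            raise ValueError(f"unknown verification step: {step}")
        self.completed.add(step)

    def next_step(self) -> Optional[str]:
        """Return the first outstanding step, or None when fully verified."""
        for step in REQUIRED_STEPS:
            if step not in self.completed:
                return step
        return None


state = VerificationState()
state.mark_completed("date_of_birth")
print(state.next_step())  # the date of birth is not requested again
```

Because the prompt is derived from `next_step()` rather than from a fixed script position, a cleared check cannot be requested a second time within the same call.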

Spoken digits misread · 22% of failures

Caller: The last four are five-zero-nine-three.

Agent: I have five-zero-nine-five. Is that right?

Impact. Digit strings are misheard under accent variation, stalling account lookups on repeated re-confirmation.

Recommendation. Confirm digit strings as a single grouped read-back rather than re-prompting digit by digit.
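A grouped read-back can be sketched as two small helpers: one that turns the isolated digit phrase into a digit string, and one that reads the whole group back in a single prompt. The function names are hypothetical, and the parser assumes it receives only the digit phrase, not a full sentence (in "the last four are…", the word "four" would otherwise be ambiguous).

```python
import string

WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}
DIGIT_TO_WORD = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def parse_spoken_digits(phrase):
    """Convert an isolated digit phrase like 'five-zero-nine-three' to '5093'."""
    cleaned = phrase.lower().replace("-", " ")
    tokens = [t.strip(string.punctuation) for t in cleaned.split()]
    unknown = [t for t in tokens if t not in WORD_TO_DIGIT]
    if unknown:
        raise ValueError(f"not digit words: {unknown}")
    return "".join(WORD_TO_DIGIT[t] for t in tokens)


def grouped_readback(digits):
    """Build one confirmation prompt covering the whole digit group."""
    spoken = "-".join(DIGIT_TO_WORD[d] for d in digits)
    return f"I have {spoken}. Is that right?"


print(grouped_readback(parse_spoken_digits("five-zero-nine-three")))
```

The caller confirms or corrects the group once, instead of being re-prompted for each digit.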

Sample failure log · ILLUSTRATIVE

Call ID | Issue type | Severity | Description
B-019 | Escalation failure | CRITICAL | Deflected a fraud report to self-service instead of the defined script.
B-046 | Flow breakdown | MAJOR | Looped on a verification step the caller had already cleared.
B-072 | Capture error | MAJOR | Misread spoken account digits under accent variation.
B-101 | Noted behavior | OBSERVATION | Routine balance and transaction calls completed cleanly.

An excerpt of the per-call log — every finding carries a reproducible Call ID, a failure type, and a severity on the 4-band scale.
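As a rough sketch, a log entry of this kind could be modeled as a small record with an ordered severity band, which makes ranking by severity a plain sort. The `Finding` and `Severity` names and field layout are assumptions for illustration; the rows are the four entries from the excerpt.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """4-band scale; a lower value means more severe."""

    CRITICAL = 1
    MAJOR = 2
    MINOR = 3
    OBSERVATION = 4


@dataclass(frozen=True)
class Finding:
    call_id: str
    issue_type: str
    severity: Severity
    description: str


# Rows from the excerpt, deliberately out of order.
LOG = [
    Finding("B-101", "Noted behavior", Severity.OBSERVATION,
            "Routine balance and transaction calls completed cleanly."),
    Finding("B-046", "Flow breakdown", Severity.MAJOR,
            "Looped on a verification step the caller had already cleared."),
    Finding("B-019", "Escalation failure", Severity.CRITICAL,
            "Deflected a fraud report to self-service."),
    Finding("B-072", "Capture error", Severity.MAJOR,
            "Misread spoken account digits under accent variation."),
]

# sorted() is stable, so findings in the same band keep their input order.
ranked = sorted(LOG, key=lambda f: f.severity)
for f in ranked:
    print(f.call_id, f.severity.name, f.issue_type)
```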

Resolution

Severity distribution · 4-band · ILLUSTRATIVE

Critical: 6 · 8%

Major: 14 · 20%

Minor: 22 · 31%

Observations: 29 · 41%

71 findings total — scored on the Voice rubric and ranked by severity, the same way every evaluation reports.
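The percentages follow directly from the counts. A short check, assuming each band is rounded independently to the nearest whole percent (the names `COUNTS` and `distribution` are illustrative):

```python
# Counts per severity band, taken from the distribution above.
COUNTS = {"Critical": 6, "Major": 14, "Minor": 22, "Observations": 29}


def distribution(counts):
    """Return each band's share of all findings as a whole-number percentage."""
    total = sum(counts.values())
    return {band: round(100 * n / total) for band, n in counts.items()}


total_findings = sum(COUNTS.values())
shares = distribution(COUNTS)
print(total_findings, shares)
```

Independent rounding does not always sum to 100; with these counts it happens to.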

Improvement priorities · ILLUSTRATIVE

Make the fraud path follow the script

Closes the one critical gap — an urgent fraud report must follow the defined escalation script, not self-service. Highest priority.

Track in-session verification state

Ends the re-verification loop that drives caller abandonment.

Improve grouped digit read-back

Reduces the lookup stalls caused by misread account digits.

Management summary · ILLUSTRATIVE

Overall 3.7 / 5: Acceptable. Core intent recognition and response accuracy hold up, but the agent is rigid where it most needs to flex (verification recovery and sensitive-intent handling), and one critical gap sits on the fraud path. These are scripted-behavior fixes, not model retraining. Once the fraud path and verification state are addressed, the agent is pilot-ready on routine balance and transaction calls; the fraud and dispute path should stay supervised until remediated.

Prepared by KNK Global · evaluation services

See this on your own phone line.

A pilot returns an evaluation in this exact shape, scored on your live voice agent under real callers.
