Where a phone-banking agent gets rigid.
A representative Voice Agent evaluation of a fintech phone line fielding balance, transaction, and dispute calls. Illustrative throughout — it shows the depth of a real evaluation, not a named client result.
System evaluated
Phone-banking voice agent
Domain
Fintech phone banking
Reporting period
Single evaluation cycle
Channel
Voice · inbound
Core accuracy is high, so the evaluation pressure-tested the flex points: recovering when verification stalls, capturing spoken account digits, and handling a caller who reports a fraudulent charge. These are questions of scripted behavior, not of model accuracy.
Scope
190 human-tested calls · 4 caller profiles · 9 scenarios · English
Performance
Weighted overall: 3.7 / 5 — Acceptable, with a critical gap on the dispute path.
Diagnosis
Caller
I think there is a fraudulent charge on my account.
Agent
You can review your recent transactions in the app.
Impact. An urgent fraud report is deflected to self-service instead of following the defined fraud-handling script. This is the critical finding.
Recommendation. Detect distress and fraud intents and follow the client script for escalation immediately, without a self-service detour.
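As a rough illustration of this recommendation, the sketch below checks for a fraud report before any self-service routing is considered. The phrase list, the function names, and the path labels (`fraud_escalation_script`, `standard_flow`) are hypothetical and are not taken from the evaluated system; a production agent would rely on its own intent model and the client's actual script.

```python
import re

# Hypothetical phrase patterns; a real agent would use its own intent model.
FRAUD_PATTERNS = [
    r"\bfraud\w*\b",
    r"\bunauthori[sz]ed\b",
    r"\bstolen\b",
    r"\bdid(?:n't| not) make\b",
]


def is_fraud_report(utterance: str) -> bool:
    """Return True if the utterance matches any fraud-related pattern."""
    text = utterance.lower()
    return any(re.search(pattern, text) for pattern in FRAUD_PATTERNS)


def route(utterance: str) -> str:
    """Pick the next dialog path; the fraud check runs before anything else."""
    if is_fraud_report(utterance):
        return "fraud_escalation_script"  # hypothetical path label
    return "standard_flow"  # hypothetical path label


print(route("I think there is a fraudulent charge on my account."))
```

The design point is ordering: the sensitive-intent check sits ahead of the default routing, so a fraud report can never fall through to a self-service answer.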
Caller
I already gave you my date of birth twice.
Agent
For security, please provide your date of birth.
Impact. The agent loops on a verification step the caller has already cleared. Callers abandon before reaching their actual request.
Recommendation. Track completed verification steps in-session and never re-request a check that is already satisfied.
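One way to picture this recommendation is a small per-call state object that records which checks have passed and only ever prompts for the next outstanding one. This is a minimal sketch under stated assumptions: the class name, the step names (`date_of_birth`, `last_four_digits`), and their order are hypothetical and do not describe the evaluated agent.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VerificationState:
    """Tracks verification progress for a single call (hypothetical steps)."""

    required: tuple = ("date_of_birth", "last_four_digits")
    completed: set = field(default_factory=set)

    def mark_passed(self, step: str) -> None:
        """Record a check the caller has cleared."""
        if step not in self.required:
            raise ValueError(f"unknown verification step: {step}")
        self.completed.add(step)

    def next_prompt(self) -> Optional[str]:
        """Return the first outstanding step, or None when all are cleared."""
        for step in self.required:
            if step not in self.completed:
                return step
        return None


state = VerificationState()
state.mark_passed("date_of_birth")
print(state.next_prompt())  # the date of birth is not requested again
```

Because the prompt is derived from the recorded state rather than from a fixed script position, a cleared step cannot be requested a second time within the same call.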
Caller
The last four are five-zero-nine-three.
Agent
I have five-zero-nine-five — is that right?
Impact. Digit strings are misheard under accent variation, stalling account lookups on repeated re-confirmation.
Recommendation. Confirm digit strings as a single grouped read-back rather than re-prompting digit by digit.
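The grouped read-back could look something like the sketch below, which turns a spoken digit phrase into a digit string and builds one confirmation prompt for the whole group. The function names and the prompt wording are assumptions for illustration, and the parser expects only the digit portion of the utterance (for example "five-zero-nine-three"), not the full sentence.

```python
DIGIT_TO_WORD = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}
WORD_TO_DIGIT = {word: digit for digit, word in DIGIT_TO_WORD.items()}
WORD_TO_DIGIT["oh"] = "0"  # common spoken variant of zero


def parse_spoken_digits(phrase: str) -> str:
    """Convert a spoken digit phrase into a digit string, skipping other words."""
    tokens = phrase.lower().replace("-", " ").split()
    cleaned = (token.strip(".,?!") for token in tokens)
    return "".join(WORD_TO_DIGIT[t] for t in cleaned if t in WORD_TO_DIGIT)


def grouped_read_back(digits: str) -> str:
    """Build a single confirmation prompt covering the whole digit group."""
    spoken = "-".join(DIGIT_TO_WORD[d] for d in digits)
    return f"I have {spoken}. Is that right?"


digits = parse_spoken_digits("five-zero-nine-three")
print(grouped_read_back(digits))
```

If the caller rejects the read-back, the same two functions can be reused on a single repeat of the full group, which avoids a digit-by-digit re-prompt.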
An excerpt of the per-call log — every finding carries a reproducible Call ID, a failure type, and a severity on the 4-band scale.
Resolution
71 findings in total, scored on the Voice rubric and ranked by severity, as in every evaluation report.
Make the fraud path follow the script
Closes the one critical gap — an urgent fraud report must follow the defined escalation script, not self-service. Highest priority.
Track in-session verification state
Ends the re-verification loop that drives caller abandonment.
Improve grouped digit read-back
Reduces the lookup stalls caused by misheard account digits.
Overall: 3.7 / 5 (Acceptable). Core intent and accuracy hold up, but the agent is rigid where it most needs to flex (verification recovery and sensitive-intent handling), and one critical gap sits on the fraud path. These are scripted-behavior fixes, not model retraining. With the fraud path and verification state addressed, it is pilot-ready on routine balance and transaction calls; the fraud and dispute path should stay supervised until remediated.
Prepared by KNK Global · evaluation services
See this on your own phone line.
A pilot returns an evaluation in this exact shape, scored on your live voice agent under real callers.