TL;DR
Code-switching in voice refers to speakers alternating between two or more languages during a spoken conversation, often within a single sentence. Over 250 million Indians engage in code-switched communication daily, blending Hindi with English (Hinglish), Tamil with English (Tanglish), and other combinations. Standard monolingual ASR models suffer word error rates of roughly 42% on code-switched speech, making this the single biggest technical barrier to deploying voice AI in multilingual markets like India. Understanding code-switching is essential for anyone building or buying voice AI for financial services, customer support, or outbound calling.
What Is Code-Switching in Voice?
Code-switching is the practice of alternating between two or more languages during a single conversation. In linguistics, it’s defined as “the use of more than one linguistic variety in a manner consistent with the syntax and phonology of each variety.” It can happen between sentences, within a single sentence, or even at the level of individual words and word fragments.
In the voice AI context, code-switching voice refers specifically to this behavior as it occurs in spoken interactions: the kind that automatic speech recognition (ASR), natural language understanding (NLU), and text-to-speech (TTS) systems must process. When a borrower calls about a late EMI payment and says, “My payment is late. Kya aap mujhe due date bata sakte hain?” that’s code-switching voice in action.
This is not the same as sociological code-switching, which describes adjusting your behavior, tone, or mannerisms for different social settings. The voice AI meaning is strictly linguistic: mixing languages in speech.
And it’s not a sign of confusion or poor language skills. Linguists consistently find that code-switching indicates strong metalinguistic awareness. Speakers who code-switch are demonstrating fluency in multiple languages simultaneously.
The scale is enormous. India’s 2011 Census recorded 314.9 million bilingual speakers, representing 26% of the population. More recent estimates suggest over 250 million Indians engage in code-switched communication daily, particularly blending English with Hindi. With 870 million Indians accessing the internet in Indic languages as of 2024, the gap between how people actually speak and how voice systems are built has never been wider.
Types of Code-Switching in Spoken Language
Linguists identify three main types of code-switching, each posing different challenges for voice AI systems.
Intersentential Switching
This is switching languages between complete sentences. One sentence is entirely in Hindi, the next entirely in English.
Example: “My account balance seems wrong. Kya aap check kar sakte hain?”
This is the easiest type for ASR to handle because language boundaries align with sentence boundaries. The system can detect the language shift at a natural pause and adjust its recognition model accordingly.
Intrasentential Switching
This is switching languages within a single sentence, blending the grammar of one language with words from another.
Example: “Mujhe flight book karni hai” (I need to book a flight), where Hindi sentence structure wraps around English nouns.
Practitioners at Hamming AI include this exact pattern, “मुझे flight book करनी है,” in their testing framework for code-switched voice agents. It represents the most common type of Hindi-English switching: technical or commercial terms in English embedded within Hindi syntax. It’s also the hardest type for ASR systems because there’s no clean boundary where the language changes.
Tag-Switching (Extrasentential)
This involves inserting a short tag, filler, or discourse marker from another language into an otherwise monolingual sentence.
Example: “You know, yeh thoda mushkil hai.” Or ending a Hindi sentence with “right?”
Tag switches are brief, unpredictable, and easy for humans to process but surprisingly difficult for machines to catch.
| Type | What Happens | AI Difficulty | Example |
|---|---|---|---|
| Intersentential | Language changes between sentences | Moderate | “Please check my balance. Mera last payment kab hua tha?” |
| Intrasentential | Languages blend within one sentence | High | “Mujhe loan EMI ka reminder set karna hai” |
| Tag-switching | Short tag/filler from another language | Moderate-High | “Haan, so basically, yeh kaam nahi kar raha” |
For teams building or evaluating conversational AI for contact centers, understanding these three types helps set realistic expectations for what voice agents can and cannot handle today.
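One way to see why the three types differ in difficulty is a script-based heuristic. The sketch below (illustrative only, not part of any system cited here) flags an utterance as intrasentential when Devanagari and Latin scripts mix within it. Note the built-in limitation called out in the comments: romanized Hindi like “Mujhe flight book karni hai” is entirely Latin script, so this trick fails on exactly the input production ASR sees most.

```python
import unicodedata

def token_script(token: str) -> str:
    """Label a token as 'deva' (Devanagari), 'latin', or 'other' by its first letter."""
    for ch in token:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name.startswith("DEVANAGARI"):
                return "deva"
            if "LATIN" in name:
                return "latin"
            return "other"
    return "other"

def classify_switching(utterance: str) -> str:
    """Crude switch detector: intrasentential if scripts mix within one utterance.

    Romanized Hindi (Hindi written in Latin script) defeats this heuristic
    entirely, which is part of why real systems need learned language ID.
    """
    scripts = {token_script(t) for t in utterance.split()}
    scripts.discard("other")
    return "intrasentential" if len(scripts) > 1 else "monolingual-script"

print(classify_switching("मुझे flight book करनी है"))   # mixed Devanagari + Latin
print(classify_switching("Mujhe flight book karni hai"))  # all Latin: heuristic is blind
```

The second call illustrates the gap: the utterance is just as code-switched, but nothing at the character level says so.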
Code-Switching vs. Code-Mixing: What’s the Difference?
In strict linguistic terminology, code-switching refers to alternating between languages at sentence or clause boundaries, while code-mixing refers to blending languages within a single utterance or phrase. The distinction matters in academic research.
In the voice AI industry, though, the two terms are used interchangeably. Vendor documentation, product specs, and most research papers use “code-switching” as the umbrella term that covers both phenomena. If you see a voice AI platform claiming it “handles code-mixing,” it means the same thing as handling code-switching.
For practical purposes, don’t worry about the distinction. What matters is whether the system can accurately process speech that freely mixes languages, regardless of where the switch occurs.
Why Code-Switching Matters in Voice AI
Code-switching voice isn’t a niche edge case. It’s the default mode of communication for hundreds of millions of people, and it’s the primary reason voice AI systems fail in multilingual markets.
The Accuracy Problem
Monolingual ASR models, the kind trained on clean single-language datasets, fall apart when they encounter code-switched speech. The HiACC benchmark study found that standard monolingual models experience approximately 42% word error rate (WER) on code-switched Hinglish. Broader evaluations show a 30 to 50% relative increase in WER when ASR models are exposed to code-switched speech compared to monolingual input.
A 42% WER means nearly half the words are transcribed incorrectly. At that error rate, intent detection becomes guesswork.
The Business Impact
When ASR fails on code-switched voice input, the damage cascades through the entire system. Wrong transcription leads to wrong intent classification, which leads to wrong responses, which leads to failed tasks. In financial services, where voice AI handles collections calls, KYC verification, and EMI reminders, a failed task isn’t just a bad experience. It’s lost revenue, compliance risk, and eroded trust.
Practitioners at HuskyVoice put it bluntly: global voice AI platforms assume one language per conversation, clean grammar, and predictable phrasing. India violates all three. They note that the challenges compound when you add Indian name pronunciations, numeric format variations (“25 thousand” vs. “25 hazaar”), and indirect yes/no responses to the code-switching problem.
Gladia, a speech-to-text provider, warns that voice agents can mishear or misinterpret code-switched conversations and switch languages even when the customer isn’t multilingual, creating what they call “exceptionally poor” customer experiences.
The BFSI Angle
Financial services is where code-switching voice matters most. A collections agent calling about an overdue EMI in Bihar will hear Hinglish. A KYC verification call in Chennai will involve Tanglish. A loan origination call in Kolkata might blend Bengali and English.
Research published through the National Institutes of Health describes a bilingual banking assistant that handles English, Hindi, and Hinglish, processing natural language banking queries with automatic language detection. The system supports voice input and output, reflecting the reality that banking customers in India don’t stick to one language.
How Code-Switching Affects the Voice AI Pipeline
Code-switching doesn’t just challenge one component. It stresses every layer of a voice AI stack.
ASR (Speech-to-Text)
The ASR layer must detect language switches in real time, sometimes mid-word. It needs to handle mixed-script phonemes, like Hindi phonological patterns with English loanwords. NVIDIA’s NeMo framework demonstrates that multilingual ASR models can transcribe code-switched speech using language ID tagging and aggregate tokenizers, but this requires purpose-built architecture, not a standard monolingual model with a second language bolted on.
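To make the language-ID-tagging idea concrete, here is a minimal sketch that splits a tagged ASR hypothesis into per-language segments. The `<hi>`/`<en>` tag syntax is invented for illustration; frameworks like NeMo emit language IDs in their own formats.

```python
import re

# Hypothetical tag format for illustration only; real multilingual ASR
# frameworks represent language IDs differently.
TAG = re.compile(r"<(\w{2})>")

def split_by_language(tagged: str) -> list[tuple[str, str]]:
    """Split a language-tagged transcript into (lang, text) segments."""
    # re.split with a capturing group interleaves tags and text:
    # ['', 'hi', ' मुझे ', 'en', ' flight book ', 'hi', ' करनी है']
    parts = TAG.split(tagged)
    segments = []
    for lang, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            segments.append((lang, text))
    return segments

print(split_by_language("<hi> मुझे <en> flight book <hi> करनी है"))
```

Downstream components can then route each segment to language-appropriate post-processing, which is the practical payoff of tagging over a single undifferentiated transcript.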
The Shunya Labs engineering blog captures the core tradeoff in multilingual ASR: broad language coverage with poor accuracy, or high accuracy in a few languages. This is sometimes called the “curse of multilinguality,” where adding more languages to a single model dilutes the performance on each one.
NLU (Natural Language Understanding)
Even if ASR transcribes code-switched voice correctly, the NLU layer must extract the right intent. When a customer says “Mujhe refund kab milega?” (When will I get my refund?), the key entity “refund” is in English while the question structure is Hindi. The NLU must not treat this as two separate fragments.
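A minimal sketch of that requirement, with invented intent names and keyword lists: the matcher scores the whole utterance against bilingual keyword sets, so the English entity and the Hindi question frame feed one intent decision instead of two fragments.

```python
# Illustrative intent lexicon: each intent mixes English and romanized
# Hindi keywords, so a code-switched utterance matches as a single unit.
INTENT_KEYWORDS = {
    "refund_status": {"refund", "vapsi"},
    "balance_check": {"balance", "shesh"},
}

def detect_intent(utterance: str) -> str:
    """Match one intent across the whole mixed-language utterance."""
    tokens = {t.strip("?.,!").lower() for t in utterance.split()}
    for intent, keywords in INTENT_KEYWORDS.items():
        if tokens & keywords:
            return intent
    return "unknown"

print(detect_intent("Mujhe refund kab milega?"))
```

Production NLU uses learned classifiers rather than keyword sets, but the design point carries over: the model must treat the code-switched utterance as one semantic unit.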
TTS (Text-to-Speech)
The overlooked piece. Voice AI must not only understand code-switching but also produce it naturally. A 2025 study in Frontiers in Computer Science examined how bilingual listeners perceive code-switched TTS across different synthesis methods. The finding: synthesis method significantly impacts comprehension. A voice agent that responds in stilted, mono-language output after receiving a fluid code-switched query sounds robotic and untrustworthy.
For organizations running AI-powered call centers in India, failure at any of these three layers means failed conversations at scale.
How Modern Voice AI Handles Code-Switching
The industry is moving in two broad directions.
End-to-End Multilingual Models
Rather than routing calls to separate language-specific models, newer systems use a single model trained on multilingual and code-switched data. This approach avoids the latency and errors introduced by running language detection as a separate step. Practitioners on Bolna AI’s platform report that GPT-4o mini handles code-mixed Indian language conversations well, and that Sarvam’s India Multi model can automatically identify caller language including code-mixed speech.
Domain-Specific Fine-Tuning
General-purpose multilingual models still struggle with domain-specific vocabulary. A voice agent for loan collections needs to understand “EMI,” “overdue,” “principal,” and “moratorium” as they appear in Hinglish, not just in clean English. Fine-tuning on domain-specific, code-switched data is what separates voice AI that works in production from demos that work in controlled settings.
The Dataset Problem
A major bottleneck is data. Publicly available code-switched speech datasets remain scarce. The AI4Bharat IndicVoices project represents the largest Indian speech dataset, with 23,700 hours of speech from 51,000 speakers across 22 languages. The HiACC corpus specifically targets Hinglish code-switching with 5.24 hours of annotated code-switched speech. These are valuable resources, but still small compared to the monolingual English datasets that power mainstream ASR.
For a deeper look at how multilingual voice AI works across the full technology stack, see this complete guide to multilingual conversational AI.
Code-Switching Examples in Real Voice Interactions
Here are examples of code-switching voice as it actually occurs in Indian customer service and financial conversations.
| Language Pair | Context | What the Caller Says | Notes |
|---|---|---|---|
| Hindi-English | EMI reminder | “Mera last EMI payment miss ho gaya, can you check?” | Intrasentential, common in collections |
| Hindi-English | Refund inquiry | “Mujhe refund kab milega? It’s been two weeks.” | Intersentential switch |
| Tamil-English | KYC verification | “Enna documents upload pannanum?” (What documents should I upload?) | English noun in Tamil syntax |
| Bengali-English | Loan inquiry | “Amar loan application er status ki? I applied last Monday.” | Intersentential, mixes Bengali question with English context |
| Hindi-English | Payment confirmation | “Twenty-five hazaar ka payment ho gaya kya?” | Numeric code-switching, “25 hazaar” instead of “25 thousand” |
| Hindi-English | Account issue | “You know, mera account mein kuch gadbad hai” | Tag-switching with English filler |
These patterns appear daily in the banking customer experience, and they are the norm rather than the exception. Any voice AI deployed in Indian financial services will encounter some version of every row in this table.
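Numeric code-switching like “25 hazaar” is one of the more mechanically tractable rows above. Here is a hedged sketch of a normalizer; the scale words and behavior are illustrative, not a production lexicon, and spelled-out numbers (“twenty-five”) are deliberately left unhandled to show where the easy approach stops.

```python
# Illustrative scale-word lexicon for code-switched amounts; a real
# system would also need spelled-out numbers, spelling variants, and
# Devanagari numerals.
SCALE_WORDS = {
    "hazaar": 1_000, "hazar": 1_000, "thousand": 1_000,
    "lakh": 100_000, "lac": 100_000,
    "crore": 10_000_000,
}

def normalize_amount(phrase: str):
    """Turn '25 hazaar' or '25 thousand' into 25000; None if unparseable."""
    tokens = phrase.lower().split()
    for i, tok in enumerate(tokens):
        if tok.isdigit() and i + 1 < len(tokens):
            scale = SCALE_WORDS.get(tokens[i + 1])
            if scale:
                return int(tok) * scale
    return None

print(normalize_amount("25 hazaar"))           # digit + Hindi scale word
print(normalize_amount("Twenty-five hazaar"))  # spelled-out: not handled here
```

Both utterances mean the same amount to the caller, which is exactly why normalization has to happen before any downstream payment logic.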
How to Test for Code-Switching in Voice AI
Testing code-switching voice capabilities requires a structured approach. Two metrics matter most.
Word Error Rate (WER)
WER is the standard metric for ASR accuracy. The formula:
WER = (Substitutions + Deletions + Insertions) / Words in the Reference × 100
For code-switching evaluation, WER should be measured separately on monolingual and code-switched utterances. The gap between the two numbers reveals how much accuracy the system loses when languages mix. A baseline monolingual model will show roughly 42% WER on Hinglish. A system built for code-switching should cut that significantly.
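The formula above is computed with a word-level Levenshtein distance. This is the standard calculation, shown here as a self-contained sketch with an illustrative Hinglish reference/hypothesis pair:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref) * 100

ref = "mujhe flight book karni hai"
hyp = "mujhe flight book karna he"   # two words misrecognized
print(round(wer(ref, hyp), 1))      # 2 errors / 5 words = 40.0
```

Running the same function over separate monolingual and code-switched test sets gives the accuracy gap described above.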
Task Completion Rate
WER alone doesn’t tell the full story. What matters for business outcomes is whether the voice agent completes the task: did the borrower confirm their payment date? Did the KYC data get collected? Hamming AI recommends testing with 10 to 20 code-switched utterances per language pair, in both directions, with a pass threshold of 80% or higher task completion.
Building a Test Protocol
A practical testing approach:
- Create 10 to 20 code-switched utterances per language pair relevant to your use case
- Include all three types (intersentential, intrasentential, tag-switching)
- Test both directions (Hindi sentence with English words, and English sentence with Hindi words)
- Measure ASR accuracy, intent recognition accuracy, and end-to-end task completion
- Compare against monolingual baselines
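The steps above can be wired into a simple harness. Everything here is illustrative: `run_agent` stands in for whatever calls your actual voice agent, and the test cases and intent names are placeholders for your own use case.

```python
# Hypothetical test cases; a real suite would hold 10-20 per language
# pair, covering all three switch types in both directions.
TEST_CASES = [
    {"utterance": "Mera last EMI payment miss ho gaya, can you check?",
     "expected_intent": "payment_status"},
    {"utterance": "Mujhe refund kab milega?",
     "expected_intent": "refund_status"},
]

PASS_THRESHOLD = 0.80  # the 80% task-completion bar recommended above

def evaluate(run_agent, cases=TEST_CASES):
    """Return (completion_rate, passed) for a batch of code-switched cases."""
    completed = sum(
        run_agent(case["utterance"]) == case["expected_intent"]
        for case in cases
    )
    rate = completed / len(cases)
    return rate, rate >= PASS_THRESHOLD
```

For end-to-end task completion (did the payment date get confirmed?), `run_agent` would return a task outcome rather than an intent label, but the pass/fail accounting stays the same.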
The cost implications of code-switching failures, including rework, escalation to human agents, and repeat calls, can be quantified using a framework for calculating call center cost per minute.
Frequently Asked Questions
Is code-switching the same as being bilingual?
Not exactly. Bilingualism means a person can speak two languages. Code-switching is a specific behavior that bilingual (and multilingual) speakers engage in: mixing those languages during a single conversation. All code-switchers are multilingual, but not all multilingual speakers code-switch frequently.
Why do people code-switch during phone calls?
It’s natural. Speakers code-switch for many reasons: certain concepts are easier to express in one language, they want to match the formality level of the conversation, or they simply think in mixed languages. In India, where education often happens in English but daily life happens in regional languages, mixing is the default for hundreds of millions of speakers.
Can voice AI learn to code-switch in its responses?
Yes. Modern TTS systems can generate code-switched output, though the quality varies. Research published in Frontiers in Computer Science shows that the synthesis method significantly impacts how natural and intelligible code-switched TTS sounds to bilingual listeners. The best systems match the caller’s language mix rather than forcing a single language.
What is Hinglish and why does it matter for voice AI?
Hinglish is the informal blend of Hindi and English spoken by hundreds of millions of Indians. It’s the most common form of code-switching voice in Indian markets. For voice AI systems serving Indian customers, especially in BFSI, Hinglish support is not optional. It’s a baseline requirement.
How much does code-switching hurt ASR accuracy?
Significantly. Benchmark studies show monolingual ASR models experience a 30 to 50% relative increase in word error rate on code-switched speech. On Hinglish specifically, the HiACC benchmark recorded approximately 42% WER for monolingual-trained models.
What datasets exist for training code-switched voice AI?
The two most significant publicly available resources are AI4Bharat’s IndicVoices dataset (23,700 hours across 22 Indian languages) and the HiACC corpus (5.24 hours of annotated Hinglish code-switched speech from adults and children). Both are available for research, but the scarcity of large-scale, domain-specific code-switched datasets remains a major bottleneck.
Does code-switching only matter in India?
No. Code-switching is common wherever multilingual populations exist: Spanish-English in the United States, French-Arabic in North Africa, Malay-English in Southeast Asia. But India, with over 250 million code-switching speakers and 22 official languages, is the largest and most complex market for code-switching voice AI.
Build Voice AI That Handles Code-Switching
Code-switching voice is the reality of how multilingual populations communicate. For any organization deploying voice AI in India, particularly in financial services, getting code-switching right is the difference between a voice agent that completes tasks and one that frustrates callers.
Awaaz AI builds multilingual voice AI agents that support 8+ Indian languages with code-switching capabilities, purpose-built for BFSI use cases like collections, KYC, and EMI reminders. To see how it handles code-switched conversations in production, book a demo or explore more voice AI insights on the blog.
