Cross-Platform Text-to-Speech with Real-time Highlighting (Kotlin Multiplatform + Swift Interoperability)

In this tutorial, we’ll walk through how to build a cross-platform Text-to-Speech (TTS) app targeted for Android and iOS, using Kotlin Multiplatform (KMP). The standout feature is real-time highlighting: as the app reads text aloud, it highlights the currently spoken word, providing a rich, accessible reading experience.

Let’s get started!

Project Setup

If you haven’t already created a Compose Multiplatform project, head over to the Kotlin Multiplatform Wizard website.

  • Select the platforms: Android and iOS.
  • Make sure that the Share UI option is selected for iOS.
    (This ensures your Compose UI code is reused across all platforms.)
  • Project Name: You can set this to TextToSpeech-CMP (or any name you like)
  • Project ID: You can use org.example.texttospeech (or customize as needed)

After configuring your options, download the generated project template.

Once downloaded, open the project in Android Studio or IntelliJ IDEA.
Now you’re ready to implement cross-platform text-to-speech!

Step 1: Define the TTSProvider Interface

We start by creating a common interface in the shared module. This interface defines the contract for text-to-speech functionality across Android and iOS. By keeping it simple, we ensure that both platforms follow the same structure while allowing platform-specific implementations underneath.

// composeApp/src/commonMain/kotlin/your_package_name/TTSProvider.kt

interface TTSProvider {
    fun initialize(onInitialized: () -> Unit)

    fun speak(
        text: String,
        onWordBoundary: (Int, Int) -> Unit,
        onStart: () -> Unit,
        onComplete: () -> Unit
    )

    fun stop()
    fun pause()
    fun resume()
    fun isPlaying(): Boolean
    fun isPaused(): Boolean
    fun release()
}

Next, we define an expect function that will be implemented differently on each platform:

// composeApp/src/commonMain/kotlin/your_package_name/TextToSpeechManager.kt
expect fun getTTSProvider(): TTSProvider

Note: expect means there will be an actual implementation for every platform (Android, iOS). Shared code calls the same API everywhere, and the compiler resolves it to the platform-specific implementation when it builds each target.

This interface defines the essential TTS operations like initialize, speak, pause, resume, and stop. It also supports real-time word boundary callbacks for highlighting text as it is spoken.
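
To make the contract concrete, here is a minimal sketch of an in-memory implementation that could stand in for a real engine in tests or previews. FakeTTSProvider is a hypothetical name and is not part of this tutorial's project. The sketch assumes word boundaries are reported as inclusive start and end indices and that (-1, -1) clears the highlight, which is how the platform implementations later in this tutorial behave. The interface is repeated so the snippet compiles on its own.

```kotlin
interface TTSProvider {
    fun initialize(onInitialized: () -> Unit)
    fun speak(
        text: String,
        onWordBoundary: (Int, Int) -> Unit,
        onStart: () -> Unit,
        onComplete: () -> Unit
    )
    fun stop()
    fun pause()
    fun resume()
    fun isPlaying(): Boolean
    fun isPaused(): Boolean
    fun release()
}

// Hypothetical test double: "speaks" synchronously by walking the words.
class FakeTTSProvider : TTSProvider {
    private var playing = false
    private var paused = false

    override fun initialize(onInitialized: () -> Unit) = onInitialized()

    override fun speak(
        text: String,
        onWordBoundary: (Int, Int) -> Unit,
        onStart: () -> Unit,
        onComplete: () -> Unit
    ) {
        playing = true
        paused = false
        onStart()
        // Report every run of non-whitespace characters as one word,
        // using inclusive start/end indices into the text.
        Regex("\\S+").findAll(text).forEach { match ->
            onWordBoundary(match.range.first, match.range.last)
        }
        onWordBoundary(-1, -1) // clear the highlight
        playing = false
        onComplete()
    }

    override fun stop() { playing = false; paused = false }
    override fun pause() { if (playing) { playing = false; paused = true } }
    override fun resume() { if (paused) { paused = false; playing = true } }
    override fun isPlaying() = playing
    override fun isPaused() = paused
    override fun release() = stop()
}

fun main() {
    val events = mutableListOf<String>()
    FakeTTSProvider().speak(
        text = "Hello big world",
        onWordBoundary = { start, end -> events += "$start..$end" },
        onStart = { events += "start" },
        onComplete = { events += "complete" }
    )
    println(events) // [start, 0..4, 6..8, 10..14, -1..-1, complete]
}
```

A real engine delivers these callbacks asynchronously, so the fake only illustrates the order and meaning of the calls, not their timing.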

Step 2: Add actual Implementations for Each Platform

After you declare the expect function, your IDE (IntelliJ IDEA/Android Studio) will show a warning like:

Expected function ‘getTTSProvider’ has no actual declaration in module TextToSpeech.composeApp.iosArm64Main…

You will see a lightbulb or a popup with the option:
“Add missing actual declarations”

  1. Hover over getTTSProvider or the warning to see this popup.
     IDE warning: “Expected function ‘getTTSProvider’ has no actual declaration…”

  2. Click on “Add missing actual declarations.” A new dialog will open where you can select the source sets.

  3. Select the source sets (androidMain, iosMain) for your actual implementations and click OK.

After this, the IDE will create platform-specific actual function stubs where you’ll add the real TTS code for Android and iOS.

Step 3: Android Implementation: Text-to-Speech Needs Context!

On Android, Text-to-Speech requires a Context (usually an Activity). To make it accessible across the app, we use an Activity Provider approach.

1. Create an Activity Provider


We define a provider function in the Android source set to supply the current Activity.

// composeApp/src/androidMain/kotlin/your_package_name/TextToSpeechManager.android.kt
import android.app.Activity

// Returns the current Activity, or null until MainActivity registers itself
private var activityProvider: () -> Activity? = { null }

fun setActivityProvider(provider: () -> Activity?) {
    activityProvider = provider
}

2. Set the Provider in MainActivity


In your MainActivity, call setActivityProvider so that the TTS manager can always access the activity context.

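// composeApp/src/androidMain/kotlin/your_package_name/MainActivity.kt
import android.os.Bundle
import androidx.activity.ComponentActivity
import androidx.activity.compose.setContent

class MainActivity : ComponentActivity() {
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Register the Activity before any composable asks for the TTS provider
        setActivityProvider { this }
        setContent {
            App()
        }
    }
}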

3. Create AndroidTTSProvider


We first create the AndroidTTSProvider class inside the androidMain source set. This class implements the shared TTSProvider interface and uses the Android TextToSpeech API.

It manages:

  • Initializing the TTS engine with the current locale.
  • Speaking text with real-time word boundary callbacks for highlighting.
  • Handling pause, resume, and stop logic.
  • Tracking playback state (isPlaying, isPaused).
  • Cleaning up resources with release.

// composeApp/src/androidMain/kotlin/your_package_name/TextToSpeechManager.android.kt

// Imports used by this class (place them at the top of the file)
import android.speech.tts.TextToSpeech
import android.speech.tts.UtteranceProgressListener
import java.util.Locale

class AndroidTTSProvider : TTSProvider {
    private var tts: TextToSpeech? = null
    private var context = activityProvider.invoke()

    private var isPausedState = false
    private var originalText: String = ""
    private var pausedPosition = 0
    private var resumeOffset = 0

    // Callback blocks
    private var onWordBoundaryCallback: ((Int, Int) -> Unit)? = null
    private var onCompleteCallback: (() -> Unit)? = null
    override fun initialize(onInitialized: () -> Unit) {
        println("🚀 Android TTS Initialized")
        context?.let { ctx ->
            tts = TextToSpeech(ctx) { status ->
                if (status == TextToSpeech.SUCCESS) {
                    tts?.language = Locale.getDefault()
                    println("✅ Android TTS engine ready")
                    onInitialized()
                } else {
                    println("❌ Android TTS initialization failed with status: $status")
                }
            }
        }
    }

    override fun speak(
        text: String,
        onWordBoundary: (wordStart: Int, wordEnd: Int) -> Unit,
        onStart: () -> Unit,
        onComplete: () -> Unit
    ) {
        println("🗣️ Android Speak called with text: '${text.take(50)}...'")
        println("📊 Current state - isPaused: $isPausedState, resumeOffset: $resumeOffset")

        // Store callbacks for resume functionality
        onWordBoundaryCallback = onWordBoundary
        onCompleteCallback = onComplete

        // Check if originalText is empty to determine if this is first time or resume
        val isFirstTimeSpeak = originalText.isEmpty()

        if (isFirstTimeSpeak) {
            println("🆕 First time speaking - resetting state")
            originalText = text
            pausedPosition = 0
            resumeOffset = 0
        } else {
            println("🔄 Resume speaking - keeping resumeOffset: $resumeOffset")
        }

        // Set paused state to false after checking
        isPausedState = false

        tts?.let { textToSpeech ->
            val utteranceId = "tts_utterance_${System.currentTimeMillis()}"
            println("🎬 Starting utterance with ID: $utteranceId")

            textToSpeech.setOnUtteranceProgressListener(object : UtteranceProgressListener() {
                override fun onStart(utteranceId: String?) {
                    println("🎤 Android TTS Started")
                    onStart()
                }

                override fun onDone(utteranceId: String?) {
                    println("✅ Android TTS Finished - isPaused: $isPausedState")
                    if (!isPausedState) {
                        println("🏁 Speech finished normally")
                        onWordBoundary(-1, -1) // Reset highlight
                        onComplete()
                        // Reset everything after completion
                        originalText = ""
                        pausedPosition = 0
                        resumeOffset = 0
                        println("🔄 State reset after completion")
                    } else {
                        println("⏸️ Speech finished due to pause - keeping state")
                    }
                }

                override fun onError(utteranceId: String?) {
                    println("❌ Android TTS Error occurred")
                    // Reset state so the next speak() is not mistaken for a resume
                    isPausedState = false
                    originalText = ""
                    pausedPosition = 0
                    resumeOffset = 0
                    onWordBoundary(-1, -1)
                    onComplete()
                }

                // Available from API 26; `end` is exclusive and relative to the text passed to speak()
                override fun onRangeStart(utteranceId: String?, start: Int, end: Int, frame: Int) {
                    if (!isPausedState) {
                        // Calculate position in original text for resume functionality
                        val actualStart = resumeOffset + start
                        val actualEnd = resumeOffset + end - 1

                        println("🎯 Android word boundary: local($start-$end) -> actual($actualStart-$actualEnd)")
                        println("📏 Original text length: ${originalText.length}, resumeOffset: $resumeOffset")

                        // Bounds check
                        if (actualStart >= 0 && actualStart < originalText.length) {
                            // Find word boundaries in original text
                            val wordStart = findWordStart(originalText, actualStart)
                            val wordEnd =
                                findWordEnd(originalText, minOf(actualEnd, originalText.length - 1))

                            // Update paused position for future resume
                            pausedPosition = wordStart

                            println("✨ Android highlighting: $wordStart-$wordEnd, updated pausedPosition: $pausedPosition")

                            // Show highlighted text
                            if (wordStart <= wordEnd && wordEnd < originalText.length) {
                                val highlightedText = originalText.substring(wordStart, wordEnd + 1)
                                println("📝 Highlighted text: '$highlightedText'")
                            }

                            onWordBoundary(wordStart, wordEnd)
                        } else {
                            println("⚠️ Android word boundary actualStart($actualStart) out of bounds!")
                        }
                    }
                }
            })

            textToSpeech.speak(text, TextToSpeech.QUEUE_FLUSH, null, utteranceId)
        }
    }

    override fun stop() {
        println("🛑 Android Stop called")
        tts?.stop()
        isPausedState = false
        pausedPosition = 0
        resumeOffset = 0
        originalText = ""
        onWordBoundaryCallback?.invoke(-1, -1)
        println("🔄 All state reset after stop")
    }

    override fun pause() {
        println("⏸️ Android Pause called")
        if (tts?.isSpeaking == true) {
            println("📍 Pausing at position: $pausedPosition")
            isPausedState = true
            tts?.stop()
        } else {
            println("⚠️ Cannot pause - TTS not speaking")
        }
    }

    override fun resume() {
        println("▶️ Android Resume called")
        println("📊 Resume state - isPaused: $isPausedState, pausedPos: $pausedPosition, originalText.length: ${originalText.length}")

        if (isPausedState && originalText.isNotEmpty()) {
            // Find the remaining text from paused position
            val remainingText = if (pausedPosition < originalText.length) {
                // Find the start of the word at paused position to avoid cutting words
                val wordStartPos = findWordStart(originalText, pausedPosition)
                resumeOffset = wordStartPos // Set offset for correct highlighting
                println("📍 Resume offset set to: $resumeOffset")

                val remaining = originalText.substring(wordStartPos)
                println("📝 Remaining text: '${remaining.take(50)}...'")
                remaining
            } else {
                println("⚠️ No remaining text to speak")
                return // Nothing left to speak
            }

            // Resume speaking with the remaining text
            onWordBoundaryCallback?.let { callback ->
                onCompleteCallback?.let { complete ->
                    println("🔄 Calling speak with remaining text, resumeOffset should stay: $resumeOffset")
                    speak(remainingText, callback, {}, complete)
                    println("📍 After speak call, resumeOffset is: $resumeOffset")
                }
            }
        } else {
            println("⚠️ Cannot resume - not in paused state or no original text")
        }
    }

    override fun isPlaying(): Boolean {
        val playing = tts?.isSpeaking == true && !isPausedState
        println("❓ Android isPlaying: $playing (speaking: ${tts?.isSpeaking}, paused: $isPausedState)")
        return playing
    }

    override fun isPaused(): Boolean {
        println("❓ Android isPaused: $isPausedState")
        return isPausedState
    }

    override fun release() {
        println("🗑️ Android Release called")
        tts?.shutdown()
        tts = null
        isPausedState = false
        pausedPosition = 0
        resumeOffset = 0
        originalText = ""
        println("🔄 Android TTS completely released")
    }

    // Walks left from position until the previous character is whitespace
    private fun findWordStart(text: String, position: Int): Int {
        var start = maxOf(0, minOf(position, text.length - 1))
        while (start > 0 && !text[start - 1].isWhitespace()) {
            start--
        }
        return start
    }

    // Walks right from position until the next character is whitespace
    private fun findWordEnd(text: String, position: Int): Int {
        var end = maxOf(0, minOf(position, text.length - 1))
        while (end < text.length - 1 && !text[end + 1].isWhitespace()) {
            end++
        }
        return end
    }
}
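
The highlighting and resume logic above comes down to a small piece of index arithmetic: ranges reported by the engine are relative to the text that was passed to it, so they are shifted by resumeOffset and then snapped to whole words in the original text. The sketch below isolates that arithmetic as plain Kotlin so it can be checked without a device. toOriginalRange is a hypothetical helper name introduced for illustration; the class above performs the same steps inline in onRangeStart.

```kotlin
// Walk left from position until the previous character is whitespace.
fun findWordStart(text: String, position: Int): Int {
    var start = maxOf(0, minOf(position, text.length - 1))
    while (start > 0 && !text[start - 1].isWhitespace()) start--
    return start
}

// Walk right from position until the next character is whitespace.
fun findWordEnd(text: String, position: Int): Int {
    var end = maxOf(0, minOf(position, text.length - 1))
    while (end < text.length - 1 && !text[end + 1].isWhitespace()) end++
    return end
}

// Hypothetical helper: maps an engine range (start inclusive, end exclusive,
// relative to the text handed to the engine) to an inclusive word range
// in the full original text.
fun toOriginalRange(
    original: String,
    resumeOffset: Int,
    start: Int,
    endExclusive: Int
): IntRange {
    val actualStart = resumeOffset + start
    val actualEnd = resumeOffset + endExclusive - 1
    return findWordStart(original, actualStart)..findWordEnd(original, actualEnd)
}

fun main() {
    val original = "The quick brown fox"

    // Suppose playback was paused while "quick" was being spoken (index 6 is the 'i').
    val resumeOffset = findWordStart(original, 6)
    val remaining = original.substring(resumeOffset)
    println(remaining) // quick brown fox

    // After resuming, the engine reports "brown" as 6 until 11, relative to `remaining`.
    val range = toOriginalRange(original, resumeOffset, start = 6, endExclusive = 11)
    println(range) // 10..14
    println(original.substring(range.first, range.last + 1)) // brown
}
```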

4. Implement the actual getTTSProvider for Android


Finally, we connect the shared expect fun getTTSProvider() to our Android implementation using the actual keyword:

// composeApp/src/androidMain/kotlin/your_package_name/TextToSpeechManager.android.kt

actual fun getTTSProvider(): TTSProvider {
    return AndroidTTSProvider()
}

Step 4: Implement the actual getTTSProvider for iOS

On iOS, TTS uses AVFoundation’s AVSpeechSynthesizer. Here we bridge Kotlin Multiplatform with Swift: the Kotlin actual function returns a provider that the Swift side supplies.

1. Kotlin (iosMain) Side


We declare an actual function and a setTTSProvider method so Swift can provide the real implementation:

// composeApp/src/iosMain/kotlin/your_package_name/TextToSpeechManager.ios.kt

// Holds the provider supplied by Swift; returns null until setTTSProvider is called
private var ttsProvider: () -> TTSProvider? = { null }

fun setTTSProvider(provider: () -> TTSProvider) {
    ttsProvider = provider
}

actual fun getTTSProvider(): TTSProvider {
    return ttsProvider.invoke() ?: throw IllegalStateException("TTS provider not set")
}
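
One consequence of this pattern is an ordering requirement: setTTSProvider must run before anything calls getTTSProvider, otherwise the call throws. The following is a minimal, platform-independent sketch of that behaviour. The one-method TTSProvider here is a trimmed stand-in for the real interface, used only to keep the example short.

```kotlin
// Trimmed stand-in for the tutorial's interface (assumption for brevity).
interface TTSProvider {
    fun stop()
}

private var ttsProvider: () -> TTSProvider? = { null }

fun setTTSProvider(provider: () -> TTSProvider) {
    ttsProvider = provider
}

fun getTTSProvider(): TTSProvider =
    ttsProvider() ?: throw IllegalStateException("TTS provider not set")

fun main() {
    // Asking for the provider before one is registered fails fast.
    val early = runCatching { getTTSProvider() }
    println(early.exceptionOrNull()?.message) // TTS provider not set

    // Once a provider is registered, the same call succeeds.
    setTTSProvider { object : TTSProvider { override fun stop() {} } }
    println(runCatching { getTTSProvider() }.isSuccess) // true
}
```

Failing loudly here is a deliberate choice: a missing provider is a wiring mistake in the iOS entry point, and an exception at first use points to it directly.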

2. Swift Side (iosApp)


We build the actual implementation using AVSpeechSynthesizer.

  • TTSManagerIOS (singleton) implements TTSProvider
  • Uses a delegate (TTSSynthesizerDelegate) to track start, word boundaries, and finish events
  • Supports pause, resume, stop, and release
  • Handles real-time highlighting by mapping word boundaries correctly

// iosApp/iosApp/TTSManagerIOS.swift
import ComposeApp
import AVFoundation
import Foundation

class TTSManagerIOS: ComposeApp.TTSProvider {
    static let shared = TTSManagerIOS()

    private let synthesizer = AVSpeechSynthesizer()
    private var delegateHandler: TTSSynthesizerDelegate?

    private var isPausedState = false
    private var originalText = ""
    private var pausedPosition = 0
    private var resumeOffset = 0

    // Callback blocks
    private var onWordBoundaryCallback: ((KotlinInt, KotlinInt) -> Void)?
    private var onStartCallback: (() -> Void)?
    private var onCompleteCallback: (() -> Void)?
    private func setupDelegate() {
        delegateHandler = TTSSynthesizerDelegate(
            onStart: { [weak self] in
                print("🎤 TTS Started")
                self?.onStartCallback?()
            },
            onWordBoundary: { [weak self] start, end in
                print("📍 Word boundary: \(start)-\(end)")
                self?.handleWordBoundary(start: start, end: end)
            },
            onFinish: { [weak self] in
                print("✅ TTS Finished")
                self?.handleFinish()
            }
        )
        synthesizer.delegate = delegateHandler
    }

    func initialize(onInitialized: @escaping () -> Void) {
        print("🚀 TTS Initialized")
        setupDelegate()
        onInitialized()
    }

    func speak(
        text: String,
        onWordBoundary: @escaping (KotlinInt, KotlinInt) -> Void,
        onStart: @escaping () -> Void,
        onComplete: @escaping () -> Void
    ) {
        print("🗣️ Speak called with text: '\(text.prefix(50))...'")
        print("📊 Current state - isPaused: \(isPausedState), resumeOffset: \(resumeOffset)")

        // Store callbacks
        onWordBoundaryCallback = onWordBoundary
        onStartCallback = onStart
        onCompleteCallback = onComplete

        // Check if originalText is empty to determine if this is first time or resume
        let isFirstTimeSpeak = originalText.isEmpty

        if isFirstTimeSpeak {
            print("🆕 First time speaking - resetting state")
            originalText = text
            pausedPosition = 0
            resumeOffset = 0
        } else {
            print("🔄 Resume speaking - keeping resumeOffset: \(resumeOffset)")
            // This is a resume call - don't reset anything
        }

        // Set paused state to false after checking
        isPausedState = false

        let utterance = AVSpeechUtterance(string: text)
        utterance.rate = 0.5

        synthesizer.speak(utterance)
    }

    func stop() {
        print("🛑 Stop called")
        synthesizer.stopSpeaking(at: .immediate)
        isPausedState = false
        pausedPosition = 0
        resumeOffset = 0
        originalText = ""
        onWordBoundaryCallback?(-1, -1)
    }

    func pause() {
        print("⏸️ Pause called")
        print("📍 Pausing at position: \(pausedPosition)")
        // Uses stopSpeaking rather than pauseSpeaking; resume() restarts from the paused word
        synthesizer.stopSpeaking(at: .immediate)
        isPausedState = true
    }

    func resume() {
        print("▶️ Resume called")
        print("📊 Resume state - isPaused: \(isPausedState), pausedPos: \(pausedPosition), originalText.count: \(originalText.count)")

        if isPausedState && !originalText.isEmpty {
            let remainingText = getRemainingText()
            print("📝 Remaining text: '\(remainingText.prefix(50))...'")

            if !remainingText.isEmpty {
                let wordStartPos = findWordStart(text: originalText, position: pausedPosition)
                resumeOffset = wordStartPos
                print("📍 Resume offset set to: \(resumeOffset)")

                if let wordBoundary = onWordBoundaryCallback,
                   let start = onStartCallback,
                   let complete = onCompleteCallback {

                    // Set paused state to false BEFORE calling speak
                    isPausedState = false
                    print("🔄 Calling speak with remaining text, resumeOffset should stay: \(resumeOffset)")
                    speak(text: remainingText, onWordBoundary: wordBoundary, onStart: start, onComplete: complete)
                    print("📍 After speak call, resumeOffset is: \(resumeOffset)")
                }
            }
        }
    }

    func isPlaying() -> Bool {
        let playing = synthesizer.isSpeaking && !isPausedState
        print("❓ isPlaying: \(playing) (speaking: \(synthesizer.isSpeaking), paused: \(isPausedState))")
        return playing
    }

    func isPaused() -> Bool {
        print("❓ isPaused: \(isPausedState)")
        return isPausedState
    }

    func release() {
        print("🗑️ Release called")
        synthesizer.stopSpeaking(at: .immediate)
        synthesizer.delegate = nil
        delegateHandler = nil
        isPausedState = false
        pausedPosition = 0
        resumeOffset = 0
        originalText = ""
    }

    private func handleWordBoundary(start: Int, end: Int) {
        if !isPausedState {
            // Calculate position in original text.
            // `end` is already inclusive here: the delegate subtracts 1 from the NSRange end.
            let actualStart = resumeOffset + start
            let actualEnd = resumeOffset + end

            print("🎯 Handling word boundary: local(\(start)-\(end)) -> actual(\(actualStart)-\(actualEnd))")
            print("📏 Original text length: \(originalText.count), resumeOffset: \(resumeOffset)")

            guard actualStart >= 0 && actualStart < originalText.count else {
                print("⚠️ Word boundary actualStart(\(actualStart)) out of bounds!")
                return
            }

            // Find word boundaries in original text (same as Android)
            let wordStart = findWordStart(text: originalText, position: actualStart)
            let wordEnd = findWordEnd(text: originalText, position: min(actualEnd, originalText.count - 1))

            // Update paused position for future resume (same as Android)
            pausedPosition = wordStart

            print("✨ Highlighting: \(wordStart)-\(wordEnd), updated pausedPosition: \(pausedPosition)")
            // Guard the debug slice so an empty range cannot trap
            if wordStart <= wordEnd {
                print("📝 Highlighted text: '\(String(Array(originalText)[wordStart...wordEnd]))'")
            }

            onWordBoundaryCallback?(KotlinInt(integerLiteral: wordStart), KotlinInt(integerLiteral: wordEnd))
        }
    }

    private func handleFinish() {
        print("🏁 Speech finished - isPaused: \(isPausedState)")
        if !isPausedState {
            print("🏁 Speech finished normally")
            onWordBoundaryCallback?(-1, -1)
            onCompleteCallback?()
            originalText = ""
            pausedPosition = 0
            resumeOffset = 0
        } else {
            print("⏸️ Speech finished due to pause - keeping state")
            // Don't reset state when paused, keep everything for resume
        }
    }

    private func getRemainingText() -> String {
        if pausedPosition < originalText.count {
            let wordStartPos = findWordStart(text: originalText, position: pausedPosition)
            let startIndex = originalText.index(originalText.startIndex, offsetBy: wordStartPos)
            let remaining = String(originalText[startIndex...])
            print("📝 getRemainingText: pausedPos=\(pausedPosition), wordStart=\(wordStartPos), remaining='\(remaining.prefix(30))...'")
            return remaining
        }
        print("📝 getRemainingText: No remaining text")
        return ""
    }

    // Note: these helpers index by Character, while AVSpeechSynthesizer reports
    // UTF-16 offsets. The two agree only for text without multi-unit characters
    // such as emoji.
    private func findWordStart(text: String, position: Int) -> Int {
        let safePosition = max(0, min(position, text.count - 1))
        var start = safePosition

        let textArray = Array(text)
        while start > 0 && !textArray[start - 1].isWhitespace {
            start -= 1
        }
        return start
    }

    private func findWordEnd(text: String, position: Int) -> Int {
        let safePosition = max(0, min(position, text.count - 1))
        var end = safePosition

        let textArray = Array(text)
        while end < textArray.count - 1 && !textArray[end + 1].isWhitespace {
            end += 1
        }
        return end
    }
}

private class TTSSynthesizerDelegate: NSObject, AVSpeechSynthesizerDelegate {
    let onStart: () -> Void
    let onWordBoundary: (Int, Int) -> Void
    let onFinish: () -> Void

    init(
        onStart: @escaping () -> Void,
        onWordBoundary: @escaping (Int, Int) -> Void,
        onFinish: @escaping () -> Void
    ) {
        self.onStart = onStart
        self.onWordBoundary = onWordBoundary
        self.onFinish = onFinish
        super.init()
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didStart utterance: AVSpeechUtterance) {
        onStart()
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, willSpeakRangeOfSpeechString characterRange: NSRange, utterance: AVSpeechUtterance) {
        // Convert the NSRange to inclusive start/end offsets
        let start = characterRange.location
        let end = characterRange.location + characterRange.length - 1
        onWordBoundary(start, end)
    }

    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer, didFinish utterance: AVSpeechUtterance) {
        onFinish()
    }
}

3. Connecting Swift with Compose Multiplatform


We pass the iOS TTS provider into the KMP side via ContentView.swift:

// iosApp/iosApp/ContentView.swift
import SwiftUI
import ComposeApp

struct ComposeView: UIViewControllerRepresentable {
    func makeUIViewController(context: Context) -> UIViewController {
        // Hand the Swift TTS implementation to the Kotlin side
        MainViewControllerKt.MainViewController(
            ttsProvider: TTSManagerIOS.shared
        )
    }

    func updateUIViewController(_ uiViewController: UIViewController, context: Context) {}
}

And then consume it in Kotlin:

// composeApp/src/iosMain/kotlin/your_package_name/MainViewController.kt
import androidx.compose.ui.window.ComposeUIViewController

fun MainViewController(ttsProvider: TTSProvider) = ComposeUIViewController(
    configure = {
        // Register the provider before App() is composed
        setTTSProvider { ttsProvider }
    }
) { App() }

Step 5: Create TTSState and TTSViewModel

In this step, we introduce a state management layer to control and observe our Text-to-Speech functionality.

1. Define TTSState (Enum)


This represents the current status of the TTS engine.

// composeApp/src/commonMain/kotlin/your_package_name/TTSState.kt

enum class TTSState {
    IDLE, PLAYING, PAUSED
}

We create an enum TTSState with three values:

  • IDLE → Nothing is being spoken.
  • PLAYING → Text-to-Speech is currently speaking.
  • PAUSED → Speech is paused and can be resumed.

This makes it easier to know what the current status of TTS is.
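
The three states are connected by a small set of transitions, and the view model in the next part guards pause and resume so that they only apply from the right state. As an illustration, here is a minimal sketch of those transitions written as a pure function. TTSEvent and next are hypothetical names introduced for this sketch; the tutorial's view model updates the state directly inside its methods and callbacks instead.

```kotlin
enum class TTSState { IDLE, PLAYING, PAUSED }

// Hypothetical event type, introduced only for this sketch.
enum class TTSEvent { SPEAK_STARTED, PAUSE, RESUME, STOP, COMPLETED }

// Returns the next state, or the current one when the event does not apply.
fun next(state: TTSState, event: TTSEvent): TTSState = when (event) {
    TTSEvent.SPEAK_STARTED -> TTSState.PLAYING
    TTSEvent.PAUSE -> if (state == TTSState.PLAYING) TTSState.PAUSED else state
    TTSEvent.RESUME -> if (state == TTSState.PAUSED) TTSState.PLAYING else state
    TTSEvent.STOP, TTSEvent.COMPLETED -> TTSState.IDLE
}

fun main() {
    var state = TTSState.IDLE
    val events = listOf(
        TTSEvent.PAUSE,          // ignored: nothing is playing yet
        TTSEvent.SPEAK_STARTED,
        TTSEvent.PAUSE,
        TTSEvent.RESUME,
        TTSEvent.COMPLETED
    )
    for (event in events) {
        state = next(state, event)
        println("$event -> $state")
    }
}
```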

2. Create TTSViewModel


In this step, we create a TTSViewModel that manages the Text-to-Speech (TTS) state and interaction with our TTSProvider.

// composeApp/src/commonMain/kotlin/your_package_name/TTSViewModel.kt

import androidx.lifecycle.ViewModel
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.StateFlow
import kotlinx.coroutines.flow.update

class TTSViewModel : ViewModel() {
    // Track highlighted word range while speaking
    private val _currentWordRange = MutableStateFlow(-1..-1)
    val currentWordRange: StateFlow<IntRange> = _currentWordRange

    // Manage TTS state (IDLE, PLAYING, PAUSED)
    private val _ttsState = MutableStateFlow(TTSState.IDLE)
    val ttsState: StateFlow<TTSState> = _ttsState

    // Track initialization status
    private val _isInitialized = MutableStateFlow(false)
    val isInitialized: StateFlow<Boolean> = _isInitialized

    // Get platform-specific TTS provider (Android/iOS)
    private val ttsManager = getTTSProvider()

  • currentWordRange → Tracks the currently highlighted word (e.g., 10..15).
  • ttsState → Stores whether TTS is idle, playing, or paused.
  • isInitialized → Lets the UI know when TTS is ready.
  • ttsManager → Gets the platform-specific implementation (Android/iOS).

    init {
        ttsManager.initialize {
            _isInitialized.value = true
        }
    }

  • Calls initialize on TTSManager (Android/iOS implementation).
  • Once ready, _isInitialized becomes true.

    fun speak(text: String) {
        // Reset highlight immediately when starting
        _currentWordRange.update {
            -1..-1
        }

        ttsManager.speak(
            text = text,
            onWordBoundary = { wordStart, wordEnd ->
                _currentWordRange.update {
                    wordStart..wordEnd
                }
            },
            onStart = {
                _ttsState.update {
                    TTSState.PLAYING
                }
            },
            onComplete = {
                _ttsState.update {
                    TTSState.IDLE
                }
                _currentWordRange.update {
                    -1..-1
                }
            }
        )
    }

  • Starts TTS, highlights words as they are spoken, and updates state (PLAYING → IDLE when complete).

    fun stop() {
        ttsManager.stop()
        _ttsState.update {
            TTSState.IDLE
        }
        _currentWordRange.update {
            -1..-1
        }
    }

    fun pause() {
        if (_ttsState.value == TTSState.PLAYING) {
            ttsManager.pause()
            _ttsState.update {
                TTSState.PAUSED
            }
        }
    }

    fun resume() {
        if (_ttsState.value == TTSState.PAUSED) {
            ttsManager.resume()
            _ttsState.update {
                TTSState.PLAYING
            }
        }
    }

    fun release() {
        ttsManager.release()
    }

  • pause() → Pauses playback if it’s currently PLAYING, sets state to PAUSED.
  • resume() → Resumes playback if it’s PAUSED, sets state back to PLAYING.
  • stop() → Stops playback, clears highlights, resets state to IDLE.
  • release() → Cleans up resources when the ViewModel is no longer needed.

    fun isPlaying(): Boolean = ttsManager.isPlaying()
    fun isPaused(): Boolean = ttsManager.isPaused()
    fun isIdle(): Boolean = _ttsState.value == TTSState.IDLE
} // end of TTSViewModel

  • Helper methods for checking the current state.

Step 6: Create a HighlightedText Composable

Now that we have the TTSViewModel tracking the current spoken word range, we need a UI component that visually highlights the active word in the text.

// composeApp/src/commonMain/kotlin/your_package_name/HighlightedText.kt

import androidx.compose.material3.MaterialTheme
import androidx.compose.material3.Text
import androidx.compose.runtime.Composable
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.text.SpanStyle
import androidx.compose.ui.text.buildAnnotatedString
import androidx.compose.ui.text.font.FontWeight
import androidx.compose.ui.text.withStyle
import androidx.compose.ui.unit.sp

@Composable
fun HighlightedText(
text: String,
highlightRange: IntRange,
modifier: Modifier = Modifier,
normalTextColor: Color = MaterialTheme.colorScheme.onSurface,
highlightColor: Color = MaterialTheme.colorScheme.primary.copy(alpha = 0.3f),
highlightTextColor: Color = MaterialTheme.colorScheme.onPrimary
) {
val annotatedString = buildAnnotatedString {
// Check if we have a valid highlight range
if (highlightRange.first >= 0 &&
highlightRange.last >= highlightRange.first &&
highlightRange.first < text.length
) {

val safeStart = maxOf(0, highlightRange.first)
val safeEnd = minOf(text.length - 1, highlightRange.last)

// Text before highlight
if (safeStart > 0) {
withStyle(SpanStyle(color = normalTextColor)) {
append(text.substring(0, safeStart))
}
}

// Highlighted text with enhanced styling
if (safeStart <= safeEnd) {
withStyle(
SpanStyle(
background = highlightColor,
color = highlightTextColor,
fontWeight = FontWeight.Bold,
letterSpacing = 0.5.sp
)
) {
append(text.substring(safeStart, safeEnd + 1))
}
}

// Text after highlight
if (safeEnd + 1 < text.length) {
withStyle(SpanStyle(color = normalTextColor)) {
append(text.substring(safeEnd + 1))
}
}
} else {
// No highlight, show normal text
withStyle(SpanStyle(color = normalTextColor)) {
append(text)
}
}
}

Text(
text = annotatedString,
modifier = modifier,
fontSize = 16.sp,
lineHeight = 28.sp,
style = MaterialTheme.typography.bodyLarge
)
}

  • highlightRange → defines which part of the text should be highlighted (word currently being spoken).
  • Normal text styling → default color using normalTextColor.
  • Highlighted text styling → uses highlightColor, bold font, and letter spacing for better visibility.
  • Safety checks → ensures no crash when range values are invalid (like -1..-1).
  • Fallback → if no word is being spoken, the entire text is shown normally.

💡 With this composable, whenever the TTSViewModel updates currentWordRange, the highlight automatically updates in real time.
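
The range handling inside HighlightedText can be exercised without any UI. The sketch below mirrors the same validity check and clamping as a plain function that returns the three pieces the composable styles separately. splitForHighlight is a hypothetical helper name used for illustration and is not part of the tutorial's code.

```kotlin
// Hypothetical helper: splits text into (before, highlighted, after)
// using the same checks as the composable above.
fun splitForHighlight(text: String, range: IntRange): Triple<String, String, String> {
    val valid = range.first >= 0 &&
        range.last >= range.first &&
        range.first < text.length
    if (!valid) return Triple(text, "", "") // no highlight: everything is "before"

    // Clamp the inclusive end so an over-long range cannot run past the text.
    val safeEnd = minOf(text.length - 1, range.last)
    return Triple(
        text.substring(0, range.first),
        text.substring(range.first, safeEnd + 1),
        text.substring(safeEnd + 1)
    )
}

fun main() {
    val (before, word, after) = splitForHighlight("Hello big world", 6..8)
    println("[$before][$word][$after]") // [Hello ][big][ world]

    // The reset value -1..-1 leaves the text unhighlighted.
    println(splitForHighlight("Hello big world", -1..-1).second.isEmpty()) // true

    // A range that overshoots the text is clamped to the last character.
    println(splitForHighlight("Hello", 3..99).second) // lo
}
```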

Step 7: Build the TTS Screen (Compose UI)

In this step, we build the TTSScreen using Compose Multiplatform. The screen connects to TTSViewModel and provides text input, highlighted speech display, and full playback controls (Play, Pause, Resume, Stop). It also shows a status indicator and handles initialization gracefully.

// composeApp/src/commonMain/kotlin/your_package_name/TTSScreen.kt

import androidx.compose.animation.*
import androidx.compose.foundation.*
import androidx.compose.foundation.layout.*
import androidx.compose.foundation.lazy.*
import androidx.compose.foundation.shape.*
import androidx.compose.material.icons.Icons
import androidx.compose.material.icons.automirrored.filled.*
import androidx.compose.material.icons.filled.*
import androidx.compose.material3.*
import androidx.compose.runtime.*
import androidx.compose.runtime.saveable.rememberSaveable
import androidx.compose.ui.*
import androidx.compose.ui.Modifier
import androidx.compose.ui.graphics.Color
import androidx.compose.ui.text.font.FontWeight
import androidx.compose.ui.unit.dp
import androidx.lifecycle.compose.LifecycleResumeEffect
import androidx.lifecycle.viewmodel.compose.viewModel

@OptIn(ExperimentalMaterial3Api::class)
@Composable
fun TTSScreen() {
val viewModel = viewModel {
TTSViewModel()
}
val currentWordRange by viewModel.currentWordRange.collectAsState()
val ttsState by viewModel.ttsState.collectAsState()
val isInitialized by viewModel.isInitialized.collectAsState()
val sampleTexts = listOf(
"Welcome to Text-to-Speech with real-time highlighting. This demonstration shows how words are highlighted as they are spoken.",
"The quick brown fox jumps over the lazy dog. This sentence contains every letter in the English alphabet.",
"Technology has revolutionized the way we communicate, learn, and work in the modern world.",
"Reading aloud helps improve pronunciation, comprehension, and overall language skills."
)

var customText by rememberSaveable { mutableStateOf(sampleTexts[0]) }
var selectedSampleIndex by rememberSaveable { mutableStateOf(0) }
var showSettings by rememberSaveable { mutableStateOf(false) }

DisposableEffect(Unit) {
onDispose {
viewModel.release()
}
}
LifecycleResumeEffect(Unit) {
onPauseOrDispose { viewModel.pause() }
}
Scaffold(
topBar = {
TopAppBar(
title = {
Column {
Text(
text = "Text-to-Speech",
style = MaterialTheme.typography.titleLarge,
color = MaterialTheme.colorScheme.onPrimary
)
Text(
text = "Real-time word highlighting",
style = MaterialTheme.typography.bodySmall,
color = MaterialTheme.colorScheme.onPrimary.copy(alpha = 0.8f)
)
}
}, actions = {
IconButton(
onClick = { showSettings = !showSettings }) {
Icon(
if (showSettings) Icons.Default.ExpandLess else Icons.Default.Settings,
contentDescription = "Settings",
tint = MaterialTheme.colorScheme.onPrimary
)
}
}, colors = TopAppBarDefaults.topAppBarColors(
containerColor = MaterialTheme.colorScheme.primary
)
)
},
containerColor = MaterialTheme.colorScheme.background,
contentColor = MaterialTheme.colorScheme.onBackground
) { paddingValues ->
Column(
modifier = Modifier.fillMaxSize().padding(paddingValues)
.verticalScroll(rememberScrollState()).padding(16.dp),
verticalArrangement = Arrangement.spacedBy(10.dp)
) {
// Settings Panel
AnimatedVisibility(
visible = showSettings,
enter = slideInVertically() + fadeIn(),
exit = slideOutVertically() + fadeOut()
) {
Card(
modifier = Modifier.fillMaxWidth(), colors = CardDefaults.cardColors(
containerColor = MaterialTheme.colorScheme.secondaryContainer
), elevation = CardDefaults.cardElevation(defaultElevation = 6.dp)
) {
Column(
modifier = Modifier.padding(16.dp),
verticalArrangement = Arrangement.spacedBy(12.dp)
) {
Text(
"Sample Texts",
style = MaterialTheme.typography.titleMedium,
color = MaterialTheme.colorScheme.onSecondaryContainer,
fontWeight = FontWeight.Bold
)

LazyRow(
horizontalArrangement = Arrangement.spacedBy(8.dp)
) {
itemsIndexed(sampleTexts) { index, text ->
FilterChip(
onClick = {
if (viewModel.isIdle()) {
selectedSampleIndex = index
customText = text
}
},
label = {
Text(
"Sample ${index + 1}", fontWeight = FontWeight.Medium
)
},
selected = selectedSampleIndex == index,
enabled = ttsState == TTSState.IDLE,
colors = FilterChipDefaults.filterChipColors(
selectedContainerColor = MaterialTheme.colorScheme.primary,
selectedLabelColor = MaterialTheme.colorScheme.onPrimary
)
)
}
}
}
}
}

// Text Input Section
Card(
modifier = Modifier.fillMaxWidth(),
elevation = CardDefaults.cardElevation(defaultElevation = 4.dp),
colors = CardDefaults.cardColors(
containerColor = MaterialTheme.colorScheme.surface
)
) {
Column(
modifier = Modifier.padding(20.dp)
) {
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.SpaceBetween,
verticalAlignment = Alignment.CenterVertically
) {
Text(
"Text to speak:",
style = MaterialTheme.typography.titleMedium,
color = MaterialTheme.colorScheme.primary,
fontWeight = FontWeight.Bold
)

Text(
"${customText.length} characters",
style = MaterialTheme.typography.bodySmall,
color = MaterialTheme.colorScheme.onSurfaceVariant
)
}

Spacer(modifier = Modifier.height(12.dp))

OutlinedTextField(
value = customText,
onValueChange = {
customText = it
},
modifier = Modifier.fillMaxWidth(),
minLines = 4,
maxLines = 8,
enabled = ttsState == TTSState.IDLE,
placeholder = {
Text(
"Enter your text here to convert to speech...",
color = MaterialTheme.colorScheme.onSurfaceVariant.copy(alpha = 0.6f)
)
},
colors = OutlinedTextFieldDefaults.colors(
focusedBorderColor = MaterialTheme.colorScheme.primary,
cursorColor = MaterialTheme.colorScheme.primary,
disabledBorderColor = MaterialTheme.colorScheme.outline.copy(alpha = 0.5f)
)
)

if (ttsState != TTSState.IDLE) {
Text(
"Text editing disabled during speech",
style = MaterialTheme.typography.bodySmall,
color = Color(0xFFED6C02),
modifier = Modifier.padding(top = 8.dp)
)
}
}
}

// Live Text Display
Card(
modifier = Modifier.fillMaxWidth(), colors = CardDefaults.cardColors(
containerColor = MaterialTheme.colorScheme.surfaceVariant
), elevation = CardDefaults.cardElevation(defaultElevation = 4.dp)
) {
Column(
modifier = Modifier.padding(20.dp)
) {
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.SpaceBetween,
verticalAlignment = Alignment.CenterVertically
) {
Text(
"Live Speech:",
style = MaterialTheme.typography.titleMedium,
color = MaterialTheme.colorScheme.primary,
fontWeight = FontWeight.Bold
)

// Enhanced Status Indicator
Row(
verticalAlignment = Alignment.CenterVertically,
horizontalArrangement = Arrangement.spacedBy(8.dp)
) {
val (statusColor, statusText, statusIcon) = when (ttsState) {
TTSState.PLAYING -> Triple(
Color(0xFF2E7D32),
"Speaking",
Icons.AutoMirrored.Filled.VolumeUp
)

TTSState.PAUSED -> Triple(
Color(0xFFED6C02), "Paused", Icons.Default.Pause
)

TTSState.IDLE -> Triple(
MaterialTheme.colorScheme.onSurfaceVariant,
"Ready",
Icons.AutoMirrored.Filled.VolumeOff
)
}

Icon(
statusIcon,
contentDescription = null,
tint = statusColor,
modifier = Modifier.size(18.dp)
)

Box(
modifier = Modifier.size(10.dp).background(statusColor, CircleShape)
)

Text(
text = statusText,
style = MaterialTheme.typography.bodyMedium,
color = statusColor,
fontWeight = FontWeight.Bold
)
}
}

Spacer(modifier = Modifier.height(16.dp))

HighlightedText(
text = customText,
highlightRange = currentWordRange,
modifier = Modifier.fillMaxWidth(),
normalTextColor = MaterialTheme.colorScheme.onSurfaceVariant,
highlightColor = MaterialTheme.colorScheme.primary.copy(alpha = 0.3f),
highlightTextColor = MaterialTheme.colorScheme.primary
)
}
}

// Control Buttons
Card(
modifier = Modifier.fillMaxWidth(), colors = CardDefaults.cardColors(
containerColor = MaterialTheme.colorScheme.surface
), elevation = CardDefaults.cardElevation(defaultElevation = 4.dp)
) {
Column(
modifier = Modifier.padding(20.dp),
verticalArrangement = Arrangement.spacedBy(16.dp)
) {
Text(
"Controls",
style = MaterialTheme.typography.titleMedium,
color = MaterialTheme.colorScheme.primary,
fontWeight = FontWeight.Bold
)

// Main Control Buttons
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.spacedBy(12.dp)
) {
// Play Button
Button(
onClick = {
viewModel.speak(customText)
},
enabled = isInitialized && ttsState == TTSState.IDLE && customText.isNotBlank(),
modifier = Modifier.weight(1f).height(48.dp),
colors = ButtonDefaults.buttonColors(
containerColor = MaterialTheme.colorScheme.primary,
contentColor = MaterialTheme.colorScheme.onPrimary,
disabledContainerColor = MaterialTheme.colorScheme.outline.copy(
alpha = 0.3f
),
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
elevation = ButtonDefaults.buttonElevation(defaultElevation = 4.dp)
) {
Icon(
Icons.Default.PlayArrow,
contentDescription = null,
modifier = Modifier.size(20.dp)
)
Spacer(modifier = Modifier.width(6.dp))
Text("Play", fontWeight = FontWeight.Bold)
}

// Pause Button
Button(
onClick = { viewModel.pause() },
enabled = ttsState == TTSState.PLAYING,
modifier = Modifier.weight(1f).height(48.dp),
colors = ButtonDefaults.buttonColors(
containerColor = Color(0xFFED6C02),
contentColor = Color.White,
disabledContainerColor = MaterialTheme.colorScheme.outline.copy(
alpha = 0.3f
),
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
elevation = ButtonDefaults.buttonElevation(defaultElevation = 4.dp)
) {
Icon(
Icons.Default.Pause,
contentDescription = null,
modifier = Modifier.size(20.dp)
)
Spacer(modifier = Modifier.width(6.dp))
Text("Pause", fontWeight = FontWeight.Bold)
}
}

Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.spacedBy(12.dp)
) {
// Resume Button
Button(
onClick = { viewModel.resume() },
enabled = ttsState == TTSState.PAUSED,
modifier = Modifier.weight(1f).height(48.dp),
colors = ButtonDefaults.buttonColors(
containerColor = Color(0xFF2E7D32),
contentColor = Color.White,
disabledContainerColor = MaterialTheme.colorScheme.outline.copy(
alpha = 0.3f
),
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
elevation = ButtonDefaults.buttonElevation(defaultElevation = 4.dp)
) {
Icon(
Icons.Default.PlayArrow,
contentDescription = null,
modifier = Modifier.size(20.dp)
)
Spacer(modifier = Modifier.width(6.dp))
Text("Resume", fontWeight = FontWeight.Bold)
}

// Stop Button
Button(
onClick = { viewModel.stop() },
enabled = ttsState != TTSState.IDLE,
modifier = Modifier.weight(1f).height(48.dp),
colors = ButtonDefaults.buttonColors(
containerColor = MaterialTheme.colorScheme.error,
contentColor = MaterialTheme.colorScheme.onError,
disabledContainerColor = MaterialTheme.colorScheme.outline.copy(
alpha = 0.3f
),
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
elevation = ButtonDefaults.buttonElevation(defaultElevation = 4.dp)
) {
Icon(
Icons.Default.Stop,
contentDescription = null,
modifier = Modifier.size(20.dp)
)
Spacer(modifier = Modifier.width(6.dp))
Text("Stop", fontWeight = FontWeight.Bold)
}
}

// Quick Actions Row
Row(
modifier = Modifier.fillMaxWidth(),
horizontalArrangement = Arrangement.spacedBy(12.dp)
) {
// Clear Text Button
OutlinedButton(
onClick = {
customText = ""
},
enabled = ttsState == TTSState.IDLE && customText.isNotEmpty(),
modifier = Modifier.weight(1f).height(44.dp),
colors = ButtonDefaults.outlinedButtonColors(
contentColor = MaterialTheme.colorScheme.error,
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
border = BorderStroke(
2.dp,
if (ttsState == TTSState.IDLE && customText.isNotEmpty()) MaterialTheme.colorScheme.error
else MaterialTheme.colorScheme.outline.copy(alpha = 0.5f)
)
) {
Icon(
Icons.Default.Clear,
contentDescription = null,
modifier = Modifier.size(18.dp)
)
Spacer(modifier = Modifier.width(4.dp))
Text("Clear", fontWeight = FontWeight.Medium)
}

// Random Sample Button
OutlinedButton(
onClick = {
val randomIndex = sampleTexts.indices.random()
selectedSampleIndex = randomIndex
customText = sampleTexts[randomIndex]
},
enabled = ttsState == TTSState.IDLE,
modifier = Modifier.weight(1f).height(44.dp),
colors = ButtonDefaults.outlinedButtonColors(
contentColor = MaterialTheme.colorScheme.primary,
disabledContentColor = MaterialTheme.colorScheme.onSurfaceVariant.copy(
alpha = 0.5f
)
),
border = BorderStroke(
2.dp,
if (ttsState == TTSState.IDLE) MaterialTheme.colorScheme.primary
else MaterialTheme.colorScheme.outline.copy(alpha = 0.5f)
)
) {
Icon(
Icons.Default.Shuffle,
contentDescription = null,
modifier = Modifier.size(18.dp)
)
Spacer(modifier = Modifier.width(4.dp))
Text("Random", fontWeight = FontWeight.Medium)
}
}
}
}

// Initialization Status
if (!isInitialized) {
Card(
modifier = Modifier.fillMaxWidth(), colors = CardDefaults.cardColors(
containerColor = MaterialTheme.colorScheme.errorContainer
), elevation = CardDefaults.cardElevation(defaultElevation = 4.dp)
) {
Row(
modifier = Modifier.padding(20.dp),
verticalAlignment = Alignment.CenterVertically,
horizontalArrangement = Arrangement.spacedBy(16.dp)
) {
CircularProgressIndicator(
modifier = Modifier.size(24.dp),
strokeWidth = 3.dp,
color = MaterialTheme.colorScheme.onErrorContainer
)

Column {
Text(
"Initializing Text-to-Speech",
style = MaterialTheme.typography.titleSmall,
color = MaterialTheme.colorScheme.onErrorContainer,
fontWeight = FontWeight.Bold
)
Text(
"Please wait while we set up the speech engine...",
style = MaterialTheme.typography.bodySmall,
color = MaterialTheme.colorScheme.onErrorContainer.copy(alpha = 0.8f)
)
}
}
}
}
}
}
}

Here's what the TTSScreen includes:

  • ViewModel Binding
    We initialize the TTSViewModel and collect its state flows (currentWordRange, ttsState, and isInitialized) so that the UI updates in real time as the TTS engine runs.
  • Text Input & Sample Texts
    A text field allows users to type custom text, while a few sample texts are provided for quick testing.
  • Highlighted Text Display
    As the TTS engine speaks, the current word is highlighted in the displayed text, making it easy to follow along.
  • Status Indicator
    An indicator in the header of the Live Speech card shows the current TTS state: Ready, Speaking, or Paused.
  • Initialization State
    If the TTS engine is not yet initialized, the screen shows a loader and message until it's ready.
  • Control Buttons
    The screen provides buttons for:
    • Play → viewModel.speak(customText)
    • Pause → viewModel.pause()
    • Resume → viewModel.resume()
    • Stop → viewModel.stop()
    • Clear / Random → for managing text input
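
To make the Highlighted Text Display item more concrete, here is a minimal plain-Kotlin sketch of the underlying idea: given the text and the range of the word currently being spoken, split the text into the part before, the highlighted part, and the part after. This is not the article's HighlightedText composable. The names Segments and splitForHighlight are hypothetical, and the sketch assumes the current word is available as a nullable IntRange with an inclusive end.

```kotlin
// Hypothetical helper, not part of the article's code: splits the text around
// the currently spoken range so a UI layer can style the middle part.
data class Segments(val before: String, val highlighted: String, val after: String)

fun splitForHighlight(text: String, range: IntRange?): Segments {
    // No range (nothing is being spoken) or empty text: nothing to highlight.
    if (range == null || text.isEmpty()) return Segments(text, "", "")
    // Clamp both ends so a stale range cannot index past the text.
    val start = range.first.coerceIn(0, text.length)
    val endExclusive = (range.last + 1).coerceIn(start, text.length)
    return Segments(
        before = text.substring(0, start),
        highlighted = text.substring(start, endExclusive),
        after = text.substring(endExclusive)
    )
}

fun main() {
    // Characters 6..10 of the sample sentence cover the word "brave".
    println(splitForHighlight("Hello brave world", 6..10))
    // With no active range, the whole text stays in the first segment.
    println(splitForHighlight("Hello brave world", null))
}
```

In a Compose UI, the three segments could then be appended to an annotated string with a different style for the middle one. The clamping matters because the text field can change between a boundary callback and the next recomposition.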

Finally, integrate the screen into your app:

// composeApp/src/commonMain/kotlin/your_package_name/App.kt

@Composable
@Preview
fun App() {
    MaterialTheme {
        TTSScreen()
    }
}

That's it!

You can explore the complete source code and video on GitHub:


GitHub - Coding-Meet/TextToSpeech-CMP

Conclusion

By following this guide, you've created a fully functional Text-to-Speech app using Kotlin Multiplatform that runs natively on both Android and iOS, complete with real-time word highlighting and clean, shared logic.

What you now have:

  • A shared TTS architecture, powered by expect/actual and unified ViewModel logic.
  • Platform-specific speech synthesis via Android's TextToSpeech and iOS's AVSpeechSynthesizer.
  • A Compose UI (TTSScreen) that offers intuitive playback control, dynamic highlighting, and state awareness (playing, paused, idle).
  • Swift interoperability for iOS, ensuring both platforms make use of the same shared codebase.
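
The state awareness mentioned above comes down to a handful of boolean rules in TTSScreen. As a rough sketch, those rules can be gathered into one pure function, which also makes them easy to unit test. ControlAvailability and availabilityFor are hypothetical names, the enum is redeclared here only to keep the snippet self-contained, and the Resume rule assumes that "paused" corresponds to TTSState.PAUSED.

```kotlin
// Redeclared for a self-contained snippet; the three values mirror the
// states handled by the status indicator in TTSScreen.
enum class TTSState { IDLE, PLAYING, PAUSED }

// Hypothetical holder for the enabled flag of each control.
data class ControlAvailability(
    val play: Boolean,
    val pause: Boolean,
    val resume: Boolean,
    val stop: Boolean,
    val clear: Boolean
)

// Mirrors the `enabled = ...` expressions used by the buttons in TTSScreen.
fun availabilityFor(
    state: TTSState,
    isInitialized: Boolean,
    text: String
): ControlAvailability = ControlAvailability(
    play = isInitialized && state == TTSState.IDLE && text.isNotBlank(),
    pause = state == TTSState.PLAYING,
    resume = state == TTSState.PAUSED, // assumption: paused means PAUSED
    stop = state != TTSState.IDLE,
    clear = state == TTSState.IDLE && text.isNotEmpty()
)

fun main() {
    // Idle with text: only Play and Clear are available.
    println(availabilityFor(TTSState.IDLE, isInitialized = true, text = "Hi"))
    // While speaking: only Pause and Stop are available.
    println(availabilityFor(TTSState.PLAYING, isInitialized = true, text = "Hi"))
}
```

Keeping the rules in one place is a design choice rather than a requirement; the screen works the same with the conditions inlined on each button.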

If you're interested in learning more about Kotlin Multiplatform and Compose Multiplatform, check out my playlist on my YouTube channel:
Kotlin Multiplatform & Compose Multiplatform

Thank you for reading! 🙌🙏✌ I hope you found this guide useful.

Don't forget to clap 👏 to support me and follow for more insightful articles about Android Development, Kotlin, and KMP. If you need any help related to Android, Kotlin, and KMP, I'm always happy to assist.

Explore More Projects


If you're interested in seeing full applications built with Kotlin Multiplatform and Jetpack Compose, check out these open-source projects:

  • Pokemon App: MVI Compose Multiplatform Template (Supports Android, iOS, Windows, macOS, Linux):
    A beautiful, modern Pokemon application built with Compose Multiplatform featuring MVI architecture, type-safe navigation, and dynamic theming. Explore Pokemon, manage favorites, and enjoy a seamless experience across Android, Desktop, and iOS platforms.
    GitHub Repository: CMP-MVI-Template
  • News Kotlin Multiplatform App (Supports Android, iOS, Windows, macOS, Linux):
    News KMP App is a Kotlin Compose Multiplatform (KMP) project that aims to provide a consistent news reading experience across multiple platforms, including Android, iOS, Windows, macOS, and Linux. This project leverages Kotlin's multiplatform capabilities to share code and logic while using Compose for UI, ensuring a seamless and native experience on each platform.
    GitHub Repository: News-KMP-App
  • Gemini AI Kotlin Multiplatform App (Supports Android, iOS, Windows, macOS, Linux, and Web):
    Gemini AI KMP App is a Kotlin Compose Multiplatform project designed by Gemini AI where you can retrieve information from text and images in a conversational format. Additionally, it allows storing chats group-wise using SQLDelight and KStore, and facilitates changing the Gemini API key.
    GitHub Repository: Gemini-AI-KMP-App

Follow me on


My Portfolio Website, YouTube, GitHub, Instagram, LinkedIn, Buy Me a Coffee, Twitter, DM Me For Freelancing Project

Cross-Platform Text-to-Speech with Real-time Highlighting (Kotlin Multiplatform + Swift… was originally published in ProAndroidDev on Medium, where people are continuing the conversation by highlighting and responding to this story.
