Jose Garcia
This is Part 4 of a series of articles where I explain how to implement GenAI on Android. [Click here to view the full series.]
![Upload a Photo, Get a Caption. Google’s On-Device AI Is Getting Crazy Good](https://cdn-images-1.medium.com/max/1024/1*vtQ5JnAHejWb7w9XlCh8rw.png)
ML Kit GenAI also comes with computer vision
This is the coolest bit of SmartWriter so far: pick a photo and the app describes what it sees — entirely on device, no cloud. On my Galaxy S25 Ultra it’s quick too: typically ~1–3 seconds per image after the first model download.

What you can build with this
- Accessibility / alt‑text: auto‑generate descriptive text for images.
- Smart gallery captions: save human‑like captions with photos.
- Notes with pictures: drop a photo into a note and get a first‑draft description.
- Private visual search: tag/cluster images locally for personal search.
- Social posting helpers: suggest captions users can tweak.
All of this runs locally, so it’s private, fast, and works offline once the model is installed.
Setup
Add the dependency to your Version Catalog (`libs.versions.toml`, under `[libraries]`):

```toml
mlkit-genai-image-description = "com.google.mlkit:genai-image-description:1.0.0-beta1"
```
Then reference it from your module’s `build.gradle.kts`:

```kotlin
dependencies {
    implementation(libs.mlkit.genai.image.description)
}
```
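If your catalog uses the table form rather than the shorthand string, the entry might look like this instead (a sketch of a `libs.versions.toml` fragment; the version-ref name is my own choice):

```toml
[versions]
mlkitGenaiImageDescription = "1.0.0-beta1"

[libraries]
mlkit-genai-image-description = { module = "com.google.mlkit:genai-image-description", version.ref = "mlkitGenaiImageDescription" }
```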
You’ll need a supported device (e.g., Galaxy S25 Ultra, Pixel 9+, …). Emulators don’t run these GenAI models.
ViewModel — how it works
Below are the important pieces of my ImageDescViewModel and what each one does. (This is the exact implementation used in the app; I’m just showing the key sections here. The full source is in the repo.)
1) User picks an image, then we call the API
We store the selected Uri, create the on‑device client and hand off to the feature‑status flow:
```kotlin
fun onImageSelected(uri: Uri) {
    _uiState.update { it.copy(imageUri = uri) }
}

fun describe(context: Context) {
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val options = ImageDescriberOptions.builder(context).build()
            imageDescriber = ImageDescription.getClient(options)
            prepareAndStartImageDesc(context)
        } catch (e: Exception) {
            _uiEvent.emit(ImageDescUiEvent.Error("Error: ${e.message}"))
        }
    }
}
```
2) Check model availability and handle download
On first run the model may need to be downloaded. We check FeatureStatus and react:
```kotlin
suspend fun prepareAndStartImageDesc(context: Context) {
    val featureStatus = imageDescriber?.checkFeatureStatus()?.await()
    when (featureStatus) {
        FeatureStatus.DOWNLOADABLE -> downloadFeature(context)
        FeatureStatus.DOWNLOADING -> {
            // Inference requests are queued and run once the download finishes,
            // so we can fire the request right away.
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.AVAILABLE -> {
            _uiState.update { it.copy(isLoading = true) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.UNAVAILABLE, null -> {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Your device does not support this feature.")
            )
        }
    }
}
```
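The branching above boils down to a small decision table. Here is a pure-Kotlin sketch of that dispatch logic; the enum mirrors ML Kit’s `FeatureStatus` constants, while `Action` and `nextAction` are my own illustration, not part of the API:

```kotlin
// Mirrors the four ML Kit FeatureStatus cases (plus the null result).
enum class FeatureStatus { DOWNLOADABLE, DOWNLOADING, AVAILABLE, UNAVAILABLE }

// Hypothetical summary of what the ViewModel does in each branch.
enum class Action { DOWNLOAD_MODEL, RUN_INFERENCE, SHOW_UNSUPPORTED_ERROR }

fun nextAction(status: FeatureStatus?): Action = when (status) {
    FeatureStatus.DOWNLOADABLE -> Action.DOWNLOAD_MODEL
    // While downloading, inference requests are queued and run once the
    // model is ready, so we can kick one off immediately.
    FeatureStatus.DOWNLOADING, FeatureStatus.AVAILABLE -> Action.RUN_INFERENCE
    FeatureStatus.UNAVAILABLE, null -> Action.SHOW_UNSUPPORTED_ERROR
}
```

Keeping the decision pure like this also makes the status handling trivial to unit-test without any Android dependencies.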
3) Download callbacks (first‑time only)
We show progress and immediately run inference once the model is ready:
```kotlin
private fun downloadFeature(context: Context) {
    imageDescriber?.downloadFeature(object : DownloadCallback {
        override fun onDownloadStarted(bytesToDownload: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadProgress(totalBytesDownloaded: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadCompleted() {
            _uiState.update { it.copy(isLoading = false) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }

        override fun onDownloadFailed(e: GenAiException) {
            _uiState.update { it.copy(isLoading = false) }
            _uiEvent.tryEmit(
                ImageDescUiEvent.Error("Download failed: ${e.message}")
            )
        }
    })
}
```
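The callbacks above only toggle `isLoading`, but `onDownloadStarted` and `onDownloadProgress` give you enough data to show a real percentage if you want one. A minimal sketch, assuming you stash `bytesToDownload` from `onDownloadStarted` (`progressPercent` is my own helper, not an ML Kit API):

```kotlin
// Converts the byte counts from DownloadCallback into a 0–100 percentage.
// Integer math avoids floating-point rounding; the zero guard avoids
// division by zero before onDownloadStarted has reported a total.
fun progressPercent(totalBytesDownloaded: Long, bytesToDownload: Long): Int {
    if (bytesToDownload <= 0L) return 0
    return ((totalBytesDownloaded * 100) / bytesToDownload)
        .toInt()
        .coerceIn(0, 100)
}
```

You could surface this through an extra `downloadProgress: Int` field on the UI state and drive a determinate progress bar instead of a spinner.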
4) Run inference (decode → request → await)
Decode the Uri to a Bitmap, wrap it in a request, then await the natural‑language description:
```kotlin
fun startImageDescRequest(
    uri: Uri,
    context: Context,
    imageDescriber: ImageDescriber,
) {
    // Note: decodeBitmap is a blocking call; for large images consider
    // moving it off the main thread (e.g. withContext(Dispatchers.IO)).
    val bitmap = ImageDecoder.decodeBitmap(
        ImageDecoder.createSource(context.contentResolver, uri)
    )
    val request = ImageDescriptionRequest.builder(bitmap).build()
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val description = imageDescriber.runInference(request).await().description
            _uiState.update { it.copy(description = description) }
        } catch (e: Exception) {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Error describing the image: ${e.message}")
            )
        } finally {
            _uiState.update { it.copy(isLoading = false) }
        }
    }
}
```
Tip: Very large images can be memory‑heavy. Consider down‑scaling before building the request if you hit OOMs.
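One way to act on that tip is to cap the longest edge before building the request. The size calculation is pure and easy to test; the `maxEdge` value of 1024 is an arbitrary choice of mine, not something the API requires:

```kotlin
// Computes dimensions that fit (width x height) inside maxEdge while
// preserving aspect ratio; returns the original size if it already fits.
// Integer math keeps the result exact for the common downscale cases.
fun fitWithin(width: Int, height: Int, maxEdge: Int): Pair<Int, Int> {
    val longest = maxOf(width, height)
    if (longest <= maxEdge) return width to height
    val w = (width.toLong() * maxEdge / longest).toInt().coerceAtLeast(1)
    val h = (height.toLong() * maxEdge / longest).toInt().coerceAtLeast(1)
    return w to h
}

// On Android you would then do something like:
// val (w, h) = fitWithin(bitmap.width, bitmap.height, maxEdge = 1024)
// val scaled = Bitmap.createScaledBitmap(bitmap, w, h, true)
```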
Exposing data with UiState
Your ImageDescUiState carries:
- imageUri — the user’s chosen image
- description — the generated caption / alt‑text
- isLoading — drives the progress indicator
Transient errors go through SharedFlow<ImageDescUiEvent> so you can show a Snackbar/toast without polluting state.
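Stripped down, the state and event shapes look roughly like this. It’s a simplified sketch: the real `imageUri` is an `android.net.Uri` (shown here as a `String` so the shape is easy to see), and errors travel over a `MutableSharedFlow`:

```kotlin
// Simplified sketch of the screen state exposed by the ViewModel.
data class ImageDescUiState(
    val imageUri: String? = null,      // the user's chosen image
    val description: String? = null,   // the generated caption / alt-text
    val isLoading: Boolean = false,    // drives the progress indicator
)

// Transient, one-shot events that should not be replayed as state.
sealed interface ImageDescUiEvent {
    data class Error(val message: String) : ImageDescUiEvent
}
```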
Latency (real‑world)
On a Galaxy S25 Ultra, I’m seeing ~1–3s per image after the first run. Once the model is on device, the feature works offline.
Recap
- Fully on‑device image descriptions with ML Kit GenAI.
- Minimal code if you’ve already implemented the other three features — the feature‑status/download pattern is the same.
- Great for accessibility, captions, and private photo workflows.
Thanks for reading!
That’s the end of the SmartWriter series — I hope you found it useful (and a bit fun). If you enjoyed this, follow me on Medium and hit Subscribe so you don’t miss future Android + Kotlin experiments. I’m planning more hands-on pieces soon.
If you want to try everything yourself, the code’s in the repo — and you can read the other parts below:
- Part 0 — Intro: Google Just Gave Android Developers Superpowers — Here’s How I’m Using Them
- Part 1 — Summarisation: This One Line of Code Made My Android App Summarise Anything Instantly
- Part 2 — Proofreading: Google’s AI Just Proofread My Writing Better Than I Ever Could
- Part 3 — Rewriting: I Built a Button That Rewrites Text in Any Tone. Now My App Sounds Like a CEO!
Got suggestions or questions? Drop a comment or ping me — I’d love to hear how you’re using ML Kit GenAI in your apps.