Jose Garcia
This is Part 4 of a series of articles where I explain how to implement GenAI on Android. [Click here to view the full series.]
![Upload a Photo, Get a Caption. Google’s On-Device AI Is Getting Crazy Good](https://cdn-images-1.medium.com/max/1024/1*vtQ5JnAHejWb7w9XlCh8rw.png)
ML Kit GenAI also comes with computer vision
This is the coolest bit of SmartWriter so far: pick a photo and the app describes what it sees — entirely on device, no cloud. On my Galaxy S25 Ultra it’s quick too: typically ~1–3 seconds per image after the first model download.

What you can build with this
- Accessibility / alt‑text: auto‑generate descriptive text for images.
- Smart gallery captions: save human‑like captions with photos.
- Notes with pictures: drop a photo into a note and get a first‑draft description.
- Private visual search: tag/cluster images locally for personal search.
- Social posting helpers: suggest captions users can tweak.
All of this runs locally, so it’s private, fast, and works offline once the model is installed.
Setup
Add the dependency to your Version Catalog (`libs.versions.toml`, under `[libraries]`):

```toml
mlkit-genai-image-description = "com.google.mlkit:genai-image-description:1.0.0-beta1"
```
Then reference it from your module’s `build.gradle.kts`:

```kotlin
dependencies {
    implementation(libs.mlkit.genai.image.description)
}
```
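If your catalog uses the table form rather than the shorthand string, the entry might look like this instead (a sketch of a `libs.versions.toml` fragment; the version-ref name is my own choice):

```toml
[versions]
mlkitGenaiImageDescription = "1.0.0-beta1"

[libraries]
mlkit-genai-image-description = { module = "com.google.mlkit:genai-image-description", version.ref = "mlkitGenaiImageDescription" }
```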
You’ll need a supported device (e.g., Galaxy S25 Ultra, Pixel 9+, …). Emulators don’t run these GenAI models.
ViewModel — how it works
Below are the important pieces of my ImageDescViewModel and what each one does. (This is the exact implementation used in the app; I’m just showing the key sections here. The full source is in the repo.)
1) User picks an image, then we call the API
We store the selected Uri, create the on‑device client and hand off to the feature‑status flow:
```kotlin
fun onImageSelected(uri: Uri) {
    _uiState.update { it.copy(imageUri = uri) }
}

fun describe(context: Context) {
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val options = ImageDescriberOptions.builder(context).build()
            imageDescriber = ImageDescription.getClient(options)
            prepareAndStartImageDesc(context)
        } catch (e: Exception) {
            _uiEvent.emit(ImageDescUiEvent.Error("Error: ${e.message}"))
        }
    }
}
```
2) Check model availability and handle download
On first run the model may need to be downloaded. We check FeatureStatus and react:
```kotlin
suspend fun prepareAndStartImageDesc(context: Context) {
    val featureStatus = imageDescriber?.checkFeatureStatus()?.await()
    when (featureStatus) {
        FeatureStatus.DOWNLOADABLE -> downloadFeature(context)
        FeatureStatus.DOWNLOADING -> {
            // Inference requests are queued and run once the download finishes,
            // so we can fire the request right away.
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.AVAILABLE -> {
            _uiState.update { it.copy(isLoading = true) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.UNAVAILABLE, null -> {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Your device does not support this feature.")
            )
        }
    }
}
```
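The branching above boils down to a small decision table. Here is a pure-Kotlin sketch of that dispatch logic; the enum mirrors ML Kit’s `FeatureStatus` constants, while `Action` and `nextAction` are my own illustration, not part of the API:

```kotlin
// Mirrors the four ML Kit FeatureStatus cases (plus the null result).
enum class FeatureStatus { DOWNLOADABLE, DOWNLOADING, AVAILABLE, UNAVAILABLE }

// Hypothetical summary of what the ViewModel does in each branch.
enum class Action { DOWNLOAD_MODEL, RUN_INFERENCE, SHOW_UNSUPPORTED_ERROR }

fun nextAction(status: FeatureStatus?): Action = when (status) {
    FeatureStatus.DOWNLOADABLE -> Action.DOWNLOAD_MODEL
    // While downloading, inference requests are queued and run once the
    // model is ready, so we can kick one off immediately.
    FeatureStatus.DOWNLOADING, FeatureStatus.AVAILABLE -> Action.RUN_INFERENCE
    FeatureStatus.UNAVAILABLE, null -> Action.SHOW_UNSUPPORTED_ERROR
}
```

Keeping the decision pure like this also makes the status handling trivial to unit-test without any Android dependencies.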
3) Download callbacks (first‑time only)
We show progress and immediately run inference once the model is ready:
```kotlin
private fun downloadFeature(context: Context) {
    imageDescriber?.downloadFeature(object : DownloadCallback {
        override fun onDownloadStarted(bytesToDownload: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadProgress(totalBytesDownloaded: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadCompleted() {
            _uiState.update { it.copy(isLoading = false) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }

        override fun onDownloadFailed(e: GenAiException) {
            _uiState.update { it.copy(isLoading = false) }
            _uiEvent.tryEmit(
                ImageDescUiEvent.Error("Download failed: ${e.message}")
            )
        }
    })
}
```
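The callbacks above only toggle `isLoading`, but `onDownloadStarted` and `onDownloadProgress` give you enough data to show a real percentage if you want one. A minimal sketch, assuming you stash `bytesToDownload` from `onDownloadStarted` (`progressPercent` is my own helper, not an ML Kit API):

```kotlin
// Converts the byte counts from DownloadCallback into a 0–100 percentage.
// Integer math avoids floating-point rounding; the zero guard avoids
// division by zero before onDownloadStarted has reported a total.
fun progressPercent(totalBytesDownloaded: Long, bytesToDownload: Long): Int {
    if (bytesToDownload <= 0L) return 0
    return ((totalBytesDownloaded * 100) / bytesToDownload)
        .toInt()
        .coerceIn(0, 100)
}
```

You could surface this through an extra `downloadProgress: Int` field on the UI state and drive a determinate progress bar instead of a spinner.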
4) Run inference (decode → request → await)
Decode the Uri to a Bitmap, wrap it in a request, then await the natural‑language description:
```kotlin
fun startImageDescRequest(
    uri: Uri,
    context: Context,
    imageDescriber: ImageDescriber,
) {
    // Note: decodeBitmap is a blocking call; for large images consider
    // moving it off the main thread (e.g. withContext(Dispatchers.IO)).
    val bitmap = ImageDecoder.decodeBitmap(
        ImageDecoder.createSource(context.contentResolver, uri)
    )
    val request = ImageDescriptionRequest.builder(bitmap).build()
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val description = imageDescriber.runInference(request).await().description
            _uiState.update { it.copy(description = description) }
        } catch (e: Exception) {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Error describing the image: ${e.message}")
            )
        } finally {
            _uiState.update { it.copy(isLoading = false) }
        }
    }
}
```
Tip: Very large images can be memory‑heavy. Consider down‑scaling before building the request if you hit OOMs.
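One way to act on that tip is to cap the longest edge before building the request. The size calculation is pure and easy to test; the `maxEdge` value of 1024 is an arbitrary choice of mine, not something the API requires:

```kotlin
// Computes dimensions that fit (width x height) inside maxEdge while
// preserving aspect ratio; returns the original size if it already fits.
// Integer math keeps the result exact for the common downscale cases.
fun fitWithin(width: Int, height: Int, maxEdge: Int): Pair<Int, Int> {
    val longest = maxOf(width, height)
    if (longest <= maxEdge) return width to height
    val w = (width.toLong() * maxEdge / longest).toInt().coerceAtLeast(1)
    val h = (height.toLong() * maxEdge / longest).toInt().coerceAtLeast(1)
    return w to h
}

// On Android you would then do something like:
// val (w, h) = fitWithin(bitmap.width, bitmap.height, maxEdge = 1024)
// val scaled = Bitmap.createScaledBitmap(bitmap, w, h, true)
```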
Exposing data with UiState
Your ImageDescUiState carries:
- imageUri — the user’s chosen image
- description — the generated caption / alt‑text
- isLoading — drives the progress indicator
Transient errors go through SharedFlow<ImageDescUiEvent> so you can show a Snackbar/toast without polluting state.
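Stripped down, the state and event shapes look roughly like this. It’s a simplified sketch: the real `imageUri` is an `android.net.Uri` (shown here as a `String` so the shape is easy to see), and errors travel over a `MutableSharedFlow`:

```kotlin
// Simplified sketch of the screen state exposed by the ViewModel.
data class ImageDescUiState(
    val imageUri: String? = null,      // the user's chosen image
    val description: String? = null,   // the generated caption / alt-text
    val isLoading: Boolean = false,    // drives the progress indicator
)

// Transient, one-shot events that should not be replayed as state.
sealed interface ImageDescUiEvent {
    data class Error(val message: String) : ImageDescUiEvent
}
```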
Latency (real‑world)
On a Galaxy S25 Ultra, I’m seeing ~1–3s per image after the first run. Once the model is on device, the feature works offline.
Recap
- Fully on‑device image descriptions with ML Kit GenAI.
- Minimal code if you’ve already implemented the other three features — the feature‑status/download pattern is the same.
- Great for accessibility, captions, and private photo workflows.
Thanks for reading!
That’s the end of the SmartWriter series — I hope you found it useful (and a bit fun). If you enjoyed this, follow me on Medium and hit Subscribe so you don’t miss future Android + Kotlin experiments. I’m planning more hands-on pieces soon.
If you want to try everything yourself, the code’s in the repo — and you can read the other parts below:
- Part 0 — Intro: Google Just Gave Android Developers Superpowers — Here’s How I’m Using Them
- Part 1 — Summarisation: This One Line of Code Made My Android App Summarise Anything Instantly
- Part 2 — Proofreading: Google’s AI Just Proofread My Writing Better Than I Ever Could
- Part 3 — Rewriting: I Built a Button That Rewrites Text in Any Tone. Now My App Sounds Like a CEO!
Got suggestions or questions? Drop a comment or ping me — I’d love to hear how you’re using ML Kit GenAI in your apps.