
Upload a Photo, Get a Caption. Google’s On-Device AI Is Getting Crazy Good

This is Part 4 of a series of articles where I explain how to implement GenAI on Android. [Click here to view the full series.]

GenAI also comes with computer vision

This is the coolest bit of SmartWriter so far: pick a photo and the app describes what it sees — entirely on device, no cloud. On my Galaxy S25 Ultra it’s quick too: typically ~1–3 seconds per image after the first model download.

🔗 Full project (with Compose UI): https://github.com/josegbel/smart-writer

💡 What you can build with this

  • Accessibility / alt‑text: auto‑generate descriptive text for images.
  • Smart gallery captions: save human‑like captions with photos.
  • Notes with pictures: drop a photo into a note and get a first‑draft description.
  • Private visual search: tag/cluster images locally for personal search.
  • Social posting helpers: suggest captions users can tweak.

All of this runs locally, so it’s private, fast, and works offline once the model is installed.

⚙️ Setup

Add the dependency to your version catalog (gradle/libs.versions.toml, under [libraries]):

mlkit-genai-image-description = "com.google.mlkit:genai-image-description:1.0.0-beta1"

Then reference it from your module:

dependencies {
    implementation(libs.mlkit.genai.image.description)
}
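If you aren't using a version catalog, the equivalent direct declaration in your module's build.gradle.kts is a one-liner (same coordinates as above):

```kotlin
dependencies {
    implementation("com.google.mlkit:genai-image-description:1.0.0-beta1")
}
```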
⚠️ You’ll need a supported device (e.g., Galaxy S25 Ultra, Pixel 9+, …). Emulators don’t run these GenAI models.

🧠 ViewModel — how it works

Below are the important pieces of my ImageDescViewModel and what each one does. (This is the exact implementation used in the app; I’m just showing the key sections here. The full source is in the repo.)

1) User picks an image, then we call the API

We store the selected Uri, create the on‑device client and hand off to the feature‑status flow:

fun onImageSelected(uri: Uri) {
    _uiState.update { it.copy(imageUri = uri) }
}

fun describe(context: Context) {
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val options = ImageDescriberOptions.builder(context).build()
            imageDescriber = ImageDescription.getClient(options)
            prepareAndStartImageDesc(context)
        } catch (e: Exception) {
            _uiEvent.emit(ImageDescUiEvent.Error("Error: ${e.message}"))
        }
    }
}

2) Check model availability and handle download

On first run the model may need to be downloaded. We check FeatureStatus and react:

suspend fun prepareAndStartImageDesc(context: Context) {
    val featureStatus = imageDescriber?.checkFeatureStatus()?.await()

    when (featureStatus) {
        FeatureStatus.DOWNLOADABLE -> downloadFeature(context)
        FeatureStatus.DOWNLOADING -> {
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.AVAILABLE -> {
            _uiState.update { it.copy(isLoading = true) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }
        FeatureStatus.UNAVAILABLE, null -> {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Your device does not support this feature.")
            )
        }
    }
}

3) Download callbacks (first‑time only)

We show progress and immediately run inference once the model is ready:

private fun downloadFeature(context: Context) {
    imageDescriber?.downloadFeature(object : DownloadCallback {
        override fun onDownloadStarted(bytesToDownload: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadProgress(totalBytesDownloaded: Long) {
            _uiState.update { it.copy(isLoading = true) }
        }

        override fun onDownloadCompleted() {
            _uiState.update { it.copy(isLoading = false) }
            imageDescriber?.let { desc ->
                uiState.value.imageUri?.let { uri ->
                    startImageDescRequest(uri, context, desc)
                }
            }
        }

        override fun onDownloadFailed(e: GenAiException) {
            _uiState.update { it.copy(isLoading = false) }
            _uiEvent.tryEmit(
                ImageDescUiEvent.Error("Download failed: ${e.message}")
            )
        }
    })
}

4) Run inference (decode → request → await)

Decode the Uri to a Bitmap, wrap it in a request, then await the natural‑language description:

fun startImageDescRequest(
    uri: Uri,
    context: Context,
    imageDescriber: ImageDescriber,
) {
    val bitmap = ImageDecoder.decodeBitmap(
        ImageDecoder.createSource(context.contentResolver, uri)
    )
    val request = ImageDescriptionRequest.builder(bitmap).build()
    _uiState.update { it.copy(isLoading = true) }
    viewModelScope.launch {
        try {
            val description = imageDescriber.runInference(request).await().description
            _uiState.update { it.copy(description = description) }
        } catch (e: Exception) {
            _uiEvent.emit(
                ImageDescUiEvent.Error("Error describing the image: ${e.message}")
            )
        } finally {
            _uiState.update { it.copy(isLoading = false) }
        }
    }
}
💡 Tip: Very large images can be memory‑heavy. Consider down‑scaling before building the request if you hit OOMs.
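One way to downscale (a hypothetical helper, not part of the repo) is to cap the longest edge before decoding; the dimension math itself is plain Kotlin:

```kotlin
import kotlin.math.max

// Hypothetical helper (not in the SmartWriter repo): compute target dimensions
// that fit within maxDim while preserving the aspect ratio. Feed the result to
// ImageDecoder's OnHeaderDecodedListener via decoder.setTargetSize(w, h), or
// to Bitmap.createScaledBitmap, before building the ImageDescriptionRequest.
fun scaledDimensions(width: Int, height: Int, maxDim: Int = 1024): Pair<Int, Int> {
    val longest = max(width, height)
    if (longest <= maxDim) return width to height      // already small enough
    val scale = maxDim.toDouble() / longest
    return (width * scale).toInt() to (height * scale).toInt()
}

fun main() {
    println(scaledDimensions(4000, 3000)) // (1024, 768)
    println(scaledDimensions(800, 600))   // (800, 600) — unchanged
}
```

The 1024px cap is an assumption; tune it to whatever keeps decode memory and inference latency acceptable on your target devices.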

🗂️ Exposing data with UiState

Your ImageDescUiState carries:

  • imageUri — the user’s chosen image
  • description — the generated caption / alt‑text
  • isLoading — drives the progress indicator

Transient errors go through SharedFlow<ImageDescUiEvent> so you can show a Snackbar/toast without polluting state.
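As a sketch (field names taken from the article; android.net.Uri swapped for String so the snippet stays platform-free), the state and event types might look like:

```kotlin
// Sketch of the state holder described above. Field names follow the article;
// the real class would use android.net.Uri instead of String for imageUri.
data class ImageDescUiState(
    val imageUri: String? = null,      // the user's chosen image
    val description: String? = null,   // the generated caption / alt-text
    val isLoading: Boolean = false,    // drives the progress indicator
)

// Transient, one-shot events — shown via Snackbar/toast, never kept in state.
sealed interface ImageDescUiEvent {
    data class Error(val message: String) : ImageDescUiEvent
}

fun main() {
    var state = ImageDescUiState()
    state = state.copy(imageUri = "content://media/external/images/42", isLoading = true)
    state = state.copy(description = "A dog on a beach", isLoading = false)
    println(state.description) // prints "A dog on a beach"
}
```

Because it is a data class, each `copy` produces a fresh immutable snapshot, which is what makes it safe to expose through a StateFlow.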

⚡ Latency (real‑world)

On a Galaxy S25 Ultra, I’m seeing ~1–3s per image after the first run. Once the model is on device, the feature works offline.

✅ Recap

  • Fully on‑device image descriptions with ML Kit GenAI.
  • Minimal code if you’ve already implemented the other three features — the feature‑status/download pattern is the same.
  • Great for accessibility, captions, and private photo workflows.

🎉 Thanks for reading!

That’s the end of the SmartWriter series — I hope you found it useful (and a bit fun). If you enjoyed this, follow me on Medium and hit Subscribe so you don’t miss future Android + Kotlin experiments. I’m planning more hands-on pieces soon.

If you want to try everything yourself, the code's in the repo, and the rest of the series is linked at the top of this post.
Got suggestions or questions? Drop a comment or ping me — I’d love to hear how you’re using ML Kit GenAI in your apps. 🚀

Upload a Photo, Get a Caption. Google’s On-Device AI Is Getting Crazy Good 📸 was originally published in ProAndroidDev on Medium, where people are continuing the conversation by highlighting and responding to this story.
