April 21, 2026 · 6 min read

Debugging in Production Without Direct Access (Real Strategies)

The device is in a client's lobby. The user is in another country. You can't reproduce it locally. Here's how to debug production issues when you have no direct access to the failing environment.

Tags: Engineering, Debugging, Android, Testing

The bug is real. The user is experiencing it. You can't reproduce it on any device in your lab.

The device is on a client's premises. The user is in a different timezone. Your access to the environment is limited to whatever they can screenshot, whatever logs you pre-configured to capture, and whatever your monitoring tools can tell you.

This is production debugging without direct access. It's a different skill from debugging with a connected device.


The Foundation: Instrumentation You Need Before Incidents Happen

The ability to debug remotely depends almost entirely on decisions you made before the incident. If you didn't instrument the system upfront, you're flying blind.

1. Crash Reporting with Context

Firebase Crashlytics, Sentry, and similar tools capture crashes automatically. But the default configuration gives you little beyond the stack trace and basic device information, which is rarely enough to understand what the app was doing at the time.

Add custom keys that give you the "state before the crash":

kotlin
import com.google.firebase.crashlytics.FirebaseCrashlytics

// Set before any operation that might crash.
// Assumes user, networkType, and availableStorageMb are available in the enclosing class.
fun onUserBeganCheckout(cart: Cart) {
    val crashlytics = FirebaseCrashlytics.getInstance()
    crashlytics.setCustomKey("last_action", "checkout_started")
    crashlytics.setCustomKey("cart_item_count", cart.items.size)
    crashlytics.setCustomKey("cart_has_promo", cart.promoCode != null)
    crashlytics.setCustomKey("user_account_age_days", user.accountAgeDays)
    crashlytics.setCustomKey("network_type", networkType)
    crashlytics.setCustomKey("device_storage_mb", availableStorageMb)
}

When the crash comes in, you see not just where it crashed, but what the user was doing, what state the cart was in, and what network conditions they were on.

2. Breadcrumbs

Breadcrumbs are a trail of events leading up to a crash or error. They answer: "What happened in the 2 minutes before this failed?"

kotlin
// Log breadcrumbs throughout key user flows
import com.google.firebase.crashlytics.FirebaseCrashlytics

// Crashlytics attaches these log lines to the next crash or non-fatal report
private val crashlytics get() = FirebaseCrashlytics.getInstance()

fun onScreenShown(screenName: String) {
    crashlytics.log("Screen: $screenName shown")
}

fun onApiCallStarted(endpoint: String) {
    crashlytics.log("API: $endpoint started")
}

fun onApiCallCompleted(endpoint: String, statusCode: Int, durationMs: Long) {
    crashlytics.log("API: $endpoint completed $statusCode in ${durationMs}ms")
}

fun onUserAction(action: String, details: String) {
    crashlytics.log("Action: $action - $details")
}

A crash with breadcrumbs tells you: the user was on the checkout screen, called the payment API (200), then tapped apply promo code, then called the promo API (which timed out), then the app crashed. Without breadcrumbs, you see: NullPointerException at CheckoutViewModel.kt:142.

3. Remote Logging for Non-Crash Issues

Not every production issue is a crash. Slow performance, incorrect data, missing content, failed background jobs — these don't appear in crash reports.

Consider a lightweight remote logging layer for your most critical non-crash events:

kotlin
// For signage / kiosk apps: remote log viewer
object RemoteLogger {
    fun log(tag: String, message: String, level: Level = Level.INFO) {
        // Write to local file
        localLogFile.appendLine("${timestamp()} [$level] $tag: $message")
        
        // Batch upload every 5 minutes (or on error)
        if (level == Level.ERROR || shouldFlush()) {
            uploadLogs()
        }
    }
}
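The snippet above leaves `shouldFlush()` and `uploadLogs()` undefined. Here is a minimal, self-contained sketch of one way to fill them in, assuming a flush on every error or once the interval has elapsed. The `BatchingLogger` name, the injected `upload` callback, and the injected `now` clock are illustrative assumptions, not part of any library. It buffers in memory only; a real kiosk build would probably also write to a local file as shown above.

```kotlin
enum class Level { DEBUG, INFO, WARN, ERROR }

/**
 * Buffers log lines in memory and hands them to [upload] in batches.
 * [upload] and [now] are injected (hypothetical hooks) so the flush
 * policy can be exercised without a network or a real clock.
 */
class BatchingLogger(
    private val flushIntervalMs: Long = 5 * 60 * 1000L,
    private val now: () -> Long = System::currentTimeMillis,
    private val upload: (List<String>) -> Unit,
) {
    private val pending = mutableListOf<String>()
    private var lastFlushAt = now()

    @Synchronized
    fun log(tag: String, message: String, level: Level = Level.INFO) {
        pending.add("${now()} [$level] $tag: $message")
        val intervalElapsed = now() - lastFlushAt >= flushIntervalMs
        // Errors flush immediately; everything else waits for the interval
        if (level == Level.ERROR || intervalElapsed) flush()
    }

    @Synchronized
    fun flush() {
        if (pending.isEmpty()) return
        val batch = pending.toList()
        // A failed upload must never crash the app: keep the lines and retry later
        val uploaded = runCatching { upload(batch) }.isSuccess
        if (uploaded) pending.clear()
        lastFlushAt = now()
    }
}
```

Because `lastFlushAt` is updated even when the upload fails, a device that is offline retries once per interval (or on the next error) instead of on every log call.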

For signage deployments where you can't SSH into the device, an HTTP-accessible log tail endpoint is invaluable:

kotlin
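// Simple log server endpoint, for internal/kiosk use only.
// Pseudocode: the annotations stand in for whatever embedded HTTP server
// the device runs (for example Ktor or NanoHTTPD). Retrofit's @GET/@Query
// declare client-side calls and do not serve requests.
@GET("/logs/tail")
fun getRecentLogs(@Query("lines") lines: Int = 100): Response<List<LogEntry>> {
    // Cap the request so one call cannot dump the entire log store
    return Response.success(logStore.getRecent(lines.coerceIn(1, 1000)))
}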
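The endpoint relies on a `logStore.getRecent(lines)` call that is not shown. One possible shape for it is a fixed-size in-memory buffer that drops the oldest entry when full, so a device that runs for weeks never grows the log without bound. The `LogStore` class and the fields on `LogEntry` below are assumptions made for the sake of a runnable sketch.

```kotlin
// Hypothetical entry shape; the real LogEntry may carry more fields
data class LogEntry(val timestampMs: Long, val level: String, val message: String)

/** Keeps at most [capacity] entries, discarding the oldest first. */
class LogStore(private val capacity: Int = 1000) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    private val entries = ArrayDeque<LogEntry>()

    @Synchronized
    fun add(entry: LogEntry) {
        if (entries.size == capacity) entries.removeFirst()
        entries.addLast(entry)
    }

    /** Returns up to [lines] of the newest entries, oldest first. */
    @Synchronized
    fun getRecent(lines: Int): List<LogEntry> =
        entries.takeLast(lines.coerceIn(0, capacity))
}
```

Clamping `lines` inside the store means a negative or oversized query parameter returns a sensible result instead of throwing.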

Warning: Never expose a log endpoint on a public-facing server or in a consumer app. This is for internally-managed kiosk/signage devices where you control network access.


Remote Debugging Techniques

1. ADB Over Wi-Fi (For Managed Devices)

If the device is on a network you can reach:

bash
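# From your machine, with the device briefly connected over USB.
# On most stock builds this resets when the device reboots.
adb tcpip 5555

# From your machine (on same network or via VPN)
adb connect [DEVICE_IP]:5555

# Now you have full ADB access
# Filter logcat by the app's process ID (Android 7+);
# most log lines do not contain the package name, so grep would miss them
adb logcat -v time --pid="$(adb shell pidof -s com.your.package)"
adb shell dumpsys meminfo com.your.package
adb shell am start -n com.your.package/.MainActivity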

For signage deployments, configure ADB over Wi-Fi during initial setup. It's much easier than explaining to a client's IT team how to enable USB debugging at 11pm during an incident.

2. Screen Mirroring for Visual Issues

When a visual bug is reported but you can't reproduce it locally:

bash
# scrcpy: mirror Android screen over USB or ADB Wi-Fi
scrcpy --stay-awake

# Or cap the mirrored resolution and video bitrate
# (-b is --bit-rate in scrcpy 1.x and --video-bit-rate in 2.x)
scrcpy --max-size 1920 -b 8M

Seeing the actual device screen in real-time resolves a huge class of "I can't reproduce this" situations. What looks like a logic bug is often a layout issue that only appears on the client's specific screen resolution or display configuration.

3. Client-Assisted Reproduction

When you genuinely can't reproduce and can't get remote access, your last resort is structured client assistance. This requires:

A clear reproduction script:

code
1. Tell me the device model and Android version (Settings → About phone)
2. Open the app and go to [screen]
3. Tap [specific button]
4. Screenshot the result immediately
5. Wait 30 seconds and screenshot again
6. Go to Settings → Apps → [App name] → tap "Force stop"
7. Reopen the app — does the problem persist?

Log extraction if you've enabled it:

code
1. Connect device to PC via USB
2. Open device as file storage
3. Navigate to [specific folder]
4. Copy and email the .log files

Screenshots with enough context: Train clients to include the status bar in screenshots (shows time, network indicator, battery — all useful). A screenshot that's cropped to just the app UI hides the environment context.


Using Production Signals to Triangulate

When you can't reproduce locally, use production data to build a picture:

| Signal | What It Tells You |
| --- | --- |
| Crash rate by OS version | Is this Android 12 specific? |
| Crash rate by device model | Samsung Galaxy A series only? |
| Crash timing pattern | After X hours of runtime? After an OS update? |
| Affected user segments | Free users only? Specific account type? |
| Network at crash time | Only on cellular, not Wi-Fi? |
| App version distribution | Regression from version 2.3.0 to 2.4.0? |

If the crash rate jumped 3 days ago and you deployed version 2.4.0 4 days ago, that release is the first suspect: a one-day lag is what you would expect while devices pick up the update. If the crash only affects Samsung devices running Android 12, you know where to focus your reproduction attempts.
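If you can export session or crash data from your monitoring tool, the triangulation itself is a small grouping exercise. The sketch below assumes a hypothetical `Session` record with one row per app session; the field names are illustrative and will differ from whatever your export actually contains.

```kotlin
// Hypothetical export row: one per app session
data class Session(
    val osVersion: Int,
    val deviceModel: String,
    val appVersion: String,
    val crashed: Boolean,
)

data class SegmentRate(val segment: String, val sessions: Int, val crashRate: Double)

/** Groups sessions by [key] and returns crash rates, worst segment first. */
fun <K> crashRateBy(sessions: List<Session>, key: (Session) -> K): List<SegmentRate> =
    sessions
        .groupBy(key)
        .map { (segment, group) ->
            val crashes = group.count { it.crashed }
            SegmentRate(segment.toString(), group.size, crashes.toDouble() / group.size)
        }
        .sortedByDescending { it.crashRate }
```

Calling `crashRateBy(sessions) { it.osVersion }`, then `{ it.deviceModel }`, then `{ it.appVersion }` gives one ranked list per row of the table. Keep the `sessions` count in view when reading the result: a segment with three sessions and one crash will top the list without meaning much.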


The Emergency Protocol

When a production issue is active and clients are affected:

  1. Acknowledge first. Clients need to know you're aware. Don't go silent while investigating.

  2. Capture state before doing anything. Pull logs, take screenshots, note the time. Once you start making changes, the current state is gone.

  3. Check if you can remotely restart. For signage, a remote reboot often restores service while you investigate the root cause.

  4. Establish a minimum viable reproduction. Can you reproduce with any similar device? Can you reproduce in any configuration? Any reproduction, even imperfect, is better than none.

  5. Communicate what you know and what you don't. "We've identified this is affecting Android 11 devices only. We're investigating. ETA for update: 2 hours." is vastly better than silence.

The engineers who handle remote production incidents best are the ones who invested in instrumentation before incidents happened. The breadcrumbs, the custom crash keys, the remote log access — all of it was set up before anyone needed it.

The time to add production instrumentation is when everything is working, not when something is broken.


Sudarshan Chaudhari

AI Systems Builder / Product Engineer

Bangkok, Thailand

Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.
