MyFamilyTracker
Real-time family location sharing — Firebase Realtime DB for sub-second propagation, WorkManager + ForegroundService for OS-compliant background collection, geofencing via Google Maps API.
When you maintain 22 Android apps simultaneously, production debugging has to be systematic. This is the exact triage process, tooling setup, and root-cause workflow I use to go from crash alert to fix in under 2 hours — even for intermittent, hard-to-reproduce issues.
22 apps. 80+ repositories. One engineer.
When a production crash comes in, I have to move fast with limited context. A crash on app 14 at 11pm after I've been working on app 7 all day — I need a system that gets me to root cause quickly, without spending 45 minutes reconstructing what that app even does.
Here's the exact system I use.
The foundation is aggressive crash context tagging. Every app initializes the same way:
import android.app.Application
import android.os.Build
import com.google.firebase.crashlytics.FirebaseCrashlytics

class MyApplication : Application() {
    override fun onCreate() {
        super.onCreate()
        initCrashlytics()
    }

    private fun initCrashlytics() {
        FirebaseCrashlytics.getInstance().apply {
            // Device context
            setCustomKey("device_model", Build.MODEL)
            setCustomKey("manufacturer", Build.MANUFACTURER)
            setCustomKey("os_version", Build.VERSION.RELEASE)
            setCustomKey("firmware", Build.DISPLAY) // captures OEM build
            setCustomKey("sdk_int", Build.VERSION.SDK_INT.toString())
            // App context
            setCustomKey("app_version", BuildConfig.VERSION_NAME)
            setCustomKey("version_code", BuildConfig.VERSION_CODE.toString())
            setCustomKey("build_type", BuildConfig.BUILD_TYPE)
            // Session context
            setCustomKey("session_start", System.currentTimeMillis().toString())
            setCustomKey("install_source", getInstallSource())
        }
    }

    // Returns the installer package name, "unknown" if none is reported,
    // or "error" if the lookup itself throws.
    private fun getInstallSource(): String {
        return try {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.R) {
                packageManager.getInstallSourceInfo(packageName).installingPackageName
                    ?: "unknown"
            } else {
                @Suppress("DEPRECATION")
                packageManager.getInstallerPackageName(packageName) ?: "unknown"
            }
        } catch (e: Exception) {
            "error"
        }
    }
}
The firmware key (Build.DISPLAY) captures the OEM build.
Crashlytics breadcrumbs tell you what the user was doing before the crash. Without them, you have a stack trace but no user journey.
object CrashBreadcrumb {
    fun screen(name: String) {
        FirebaseCrashlytics.getInstance().log("SCREEN: $name")
    }

    fun action(name: String, params: Map<String, String> = emptyMap()) {
        val paramStr = if (params.isEmpty()) "" else " | ${params.entries.joinToString { "${it.key}=${it.value}" }}"
        FirebaseCrashlytics.getInstance().log("ACTION: $name$paramStr")
    }

    fun networkCall(endpoint: String, statusCode: Int) {
        FirebaseCrashlytics.getInstance().log("NETWORK: $endpoint → $statusCode")
    }

    fun state(key: String, value: String) {
        FirebaseCrashlytics.getInstance().setCustomKey("state_$key", value)
    }
}
Usage in practice:
@Composable
fun HomeScreen(viewModel: HomeViewModel = hiltViewModel()) {
    LaunchedEffect(Unit) {
        CrashBreadcrumb.screen("HomeScreen")
    }
    val uiState by viewModel.uiState.collectAsStateWithLifecycle()
    HomeContent(
        uiState = uiState,
        onRefresh = {
            CrashBreadcrumb.action("Refresh", mapOf("trigger" to "user"))
            viewModel.refresh()
        }
    )
}
In Crashlytics, the breadcrumb log reads:
SCREEN: HomeScreen
ACTION: Refresh | trigger=user
NETWORK: /api/items → 200
SCREEN: DetailScreen
That tells me the user was on HomeScreen, refreshed, got a 200 from the API, navigated to DetailScreen, and then crashed. The stack trace tells me where. The breadcrumbs tell me why.
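Reading a breadcrumb log by eye works for four lines; for longer sessions it can help to parse it. The following is a minimal sketch, assuming the "KIND: detail" line format shown above. `Breadcrumb`, `parseBreadcrumbs`, and `lastScreen` are hypothetical helpers for illustration, not part of the Crashlytics SDK.

```kotlin
// Hypothetical helper types, not part of the Crashlytics SDK.
// One breadcrumb line, split into its kind (SCREEN, ACTION, ...) and detail.
data class Breadcrumb(val kind: String, val detail: String)

// Parses lines in the "KIND: detail" format; lines without ": " are skipped.
fun parseBreadcrumbs(log: String): List<Breadcrumb> =
    log.lineSequence()
        .map { it.trim() }
        .filter { it.isNotEmpty() }
        .mapNotNull { line ->
            val idx = line.indexOf(": ")
            if (idx <= 0) null
            else Breadcrumb(line.substring(0, idx), line.substring(idx + 2))
        }
        .toList()

// Returns the last screen recorded before the crash, or null if none.
fun lastScreen(crumbs: List<Breadcrumb>): String? =
    crumbs.lastOrNull { it.kind == "SCREEN" }?.detail

fun main() {
    val log = """
        SCREEN: HomeScreen
        ACTION: Refresh | trigger=user
        NETWORK: /api/items → 200
        SCREEN: DetailScreen
    """.trimIndent()
    println(lastScreen(parseBreadcrumbs(log))) // prints DetailScreen
}
```

The last SCREEN entry is usually the first thing worth knowing, since it names the screen to open when attempting a reproduction.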
When a new crash alert arrives:
Open Crashlytics → crash issue → check:
If it's below 98% crash-free rate on a single firmware version and my latest release isn't new, I'm looking at an OS regression, not a code bug. Different playbook.
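That rule of thumb can be written down as a small predicate. This is a sketch under stated assumptions: only the 98% threshold comes from the text, while `FirmwareStats`, the function name, and the 7-day window for what counts as a "new" release are made up for illustration.

```kotlin
// Hypothetical type: crash-free rate (0.0 to 1.0) for one firmware build.
data class FirmwareStats(val firmware: String, val crashFreeRate: Double)

// True when exactly one firmware is below the threshold and the latest
// release is older than newReleaseWindowDays (an assumed cutoff).
fun looksLikeOsRegression(
    stats: List<FirmwareStats>,
    daysSinceLatestRelease: Int,
    threshold: Double = 0.98,       // from the text
    newReleaseWindowDays: Int = 7   // assumption, tune per app
): Boolean {
    val belowThreshold = stats.filter { it.crashFreeRate < threshold }
    return belowThreshold.size == 1 && daysSinceLatestRelease > newReleaseWindowDays
}

fun main() {
    val stats = listOf(
        FirmwareStats("firmware-A", 0.97),
        FirmwareStats("firmware-B", 0.995)
    )
    println(looksLikeOsRegression(stats, daysSinceLatestRelease = 30)) // prints true
    println(looksLikeOsRegression(stats, daysSinceLatestRelease = 1))  // prints false
}
```

A `false` result does not prove a code bug; it only means the OS-regression playbook is not the obvious starting point.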
I look for the first line that's in my code:
Fatal Exception: java.lang.NullPointerException
at com.sudarshantechlabs.myfamilytracker.data.repository.LocationRepository.processUpdate(LocationRepository.kt:87)
at com.sudarshantechlabs.myfamilytracker.data.repository.LocationRepository$locationFlow$1.invokeSuspend(LocationRepository.kt:54)
at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
The first frame in my code is LocationRepository.processUpdate:87, so I start in LocationRepository.kt.
I look at the breadcrumbs for the user actions before the crash, then try to reproduce that sequence. Roughly 80% of crashes can be reproduced in a local build if the breadcrumbs are detailed enough.
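Finding the first frame in your own code is mechanical enough to script when triaging many traces. The sketch below assumes standard JVM stack trace lines of the form "at package.Class.method(File.kt:line)". `Frame` and `firstAppFrame` are hypothetical names, and the `java.util.Objects` frame in the sample trace is invented for illustration.

```kotlin
// A parsed stack frame: fully qualified method, source file, line number.
data class Frame(val method: String, val file: String, val line: Int)

// Matches lines like "at com.example.Foo.bar(Foo.kt:87)". Frames without a
// file and line number (for example "Native Method") do not match and are skipped.
private val FRAME_PATTERN = Regex("""^\s*at\s+([^\s(]+)\(([^:()]+):(\d+)\)""")

// Returns the first frame whose method starts with appPackage, or null.
fun firstAppFrame(stackTrace: String, appPackage: String): Frame? =
    stackTrace.lineSequence()
        .mapNotNull { FRAME_PATTERN.find(it) }
        .map { Frame(it.groupValues[1], it.groupValues[2], it.groupValues[3].toInt()) }
        .firstOrNull { it.method.startsWith(appPackage) }

fun main() {
    // Sample trace; the first frame is a made-up library frame.
    val trace = """
        Fatal Exception: java.lang.NullPointerException
        at java.util.Objects.requireNonNull(Objects.java:208)
        at com.sudarshantechlabs.myfamilytracker.data.repository.LocationRepository.processUpdate(LocationRepository.kt:87)
        at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
    """.trimIndent()
    val frame = firstAppFrame(trace, "com.sudarshantechlabs.")
    println("${frame?.file}:${frame?.line}") // prints LocationRepository.kt:87
}
```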
If local reproduction fails, I set a custom key to record the specific state that led to the crash:
// In LocationRepository, before the problematic operation:
CrashBreadcrumb.state("last_location_update", location?.toString() ?: "null")
CrashBreadcrumb.state("network_available", isNetworkAvailable.toString())
CrashBreadcrumb.state("background_state", isInBackground.toString())
Redeploy to internal testing, collect more crashes with better state data.
The fix must be validated on:
Not just "it compiles" — actually run the reproduction scenario on the target hardware.
The hardest class of crashes: low volume, no consistent pattern, no reliable reproduction.
These are almost always timing issues: race conditions, background thread state, or UI events firing in unexpected order.
My approach:
// Add explicit state machine to track object lifecycle
class LocationRepository @Inject constructor(
    private val locationSource: LocationDataSource
) {
    private enum class RepositoryState { IDLE, COLLECTING, STOPPED }

    // Read from more than one thread, so writes must be visible across threads
    @Volatile
    private var state = RepositoryState.IDLE

    fun processUpdate(location: Location?) {
        val current = state
        if (current != RepositoryState.COLLECTING) {
            // Report the unexpected call as a non-fatal instead of crashing.
            // log() alone is only uploaded with the next report, so also
            // record an exception to get a non-fatal event.
            val message = "processUpdate called in state $current, ignoring"
            FirebaseCrashlytics.getInstance().apply {
                log(message)
                recordException(IllegalStateException(message))
            }
            return
        }
        // safe to proceed
        location ?: return // explicit null guard
        // ... process
    }
}
Converting intermittent crashes into logged anomalies is often more valuable than trying to reproduce them. The non-fatal log tells you what state the object was in when the unexpected call arrived, which is information you can act on.
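Because these crashes are usually timing issues, the state field itself can become a race if several threads change it. One possible variant, sketched here with plain JVM classes and no Android dependencies, keeps the state in an `AtomicReference` and routes every transition through `compareAndSet`. `LifecycleGuard` and its `onAnomaly` callback are hypothetical names; in an app the callback would forward to Crashlytics.

```kotlin
import java.util.concurrent.atomic.AtomicReference

enum class RepositoryState { IDLE, COLLECTING, STOPPED }

// Hypothetical guard: tracks lifecycle state atomically and reports
// unexpected calls through onAnomaly instead of throwing.
class LifecycleGuard(private val onAnomaly: (String) -> Unit) {
    private val state = AtomicReference(RepositoryState.IDLE)

    fun start(): Boolean = transition(RepositoryState.IDLE, RepositoryState.COLLECTING)
    fun stop(): Boolean = transition(RepositoryState.COLLECTING, RepositoryState.STOPPED)

    // Succeeds only if the current state is exactly `from`.
    private fun transition(from: RepositoryState, to: RepositoryState): Boolean {
        val ok = state.compareAndSet(from, to)
        if (!ok) onAnomaly("illegal transition $from -> $to while in ${state.get()}")
        return ok
    }

    // Runs block only while collecting; otherwise reports and returns null.
    fun <T> whileCollecting(block: () -> T): T? {
        val current = state.get()
        if (current != RepositoryState.COLLECTING) {
            onAnomaly("call arrived in state $current, ignoring")
            return null
        }
        return block()
    }
}

fun main() {
    val anomalies = mutableListOf<String>()
    val guard = LifecycleGuard { anomalies.add(it) }
    guard.whileCollecting { "too early" }   // reported, not executed
    guard.start()
    println(guard.whileCollecting { "ok" }) // prints ok
    println(anomalies.size)                 // prints 1
}
```

Note that `whileCollecting` still checks and then acts, so a `stop()` landing between the check and the block is possible; the guard narrows the window and records what happened rather than eliminating the race.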
When you maintain 22 apps with identical architecture, crashes often occur across multiple apps simultaneously. An Android OS update can affect all of them.
Cross-app monitoring dashboard:
I built a simple monitoring view that pulls crash-free rates across all 22 apps from the Crashlytics API and shows them in one screen. When an OS update drops, I can see within 4 hours which apps are affected rather than checking each app's Firebase console separately.
# Pseudocode: crashlytics_client, ALL_APP_IDS and send_alert are placeholders;
# the actual impl uses Firebase Admin SDK
def get_crash_rates(app_ids: list[str]) -> dict[str, float]:
    return {
        app_id: crashlytics_client.get_crash_free_rate(
            app_id=app_id,
            period_days=1
        )
        for app_id in app_ids
    }

# Alert if any app drops below threshold
for app_id, rate in get_crash_rates(ALL_APP_IDS).items():
    if rate < 0.995:
        # Two decimals, so a rate just under the threshold is not rounded up to 99.5%
        send_alert(f"{app_id}: crash-free rate {rate:.2%}")
Shared root causes:
When multiple apps crash on the same day, the root cause is almost always:
The tagging system (the firmware key from Build.DISPLAY) identifies which is which: if the crashes cluster on one firmware version, it's #1. If they affect all versions equally, it's #2 or #3.
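The clustering check can be approximated with a simple count per firmware value. This is a rough sketch, assuming crash events have already been exported with their firmware key; `CrashEvent`, `dominantFirmware`, and the 80% share cutoff are assumptions for illustration, not values from the system described here.

```kotlin
// Hypothetical record of one crash with the firmware custom key attached.
data class CrashEvent(val appId: String, val firmware: String)

// Returns the firmware responsible for at least `share` of all crashes,
// or null when crashes are spread across firmware versions.
fun dominantFirmware(crashes: List<CrashEvent>, share: Double = 0.8): String? {
    val top = crashes.groupingBy { it.firmware }
        .eachCount()
        .maxByOrNull { it.value }
        ?: return null // no crashes at all
    return if (top.value.toDouble() / crashes.size >= share) top.key else null
}

fun main() {
    val clustered = List(9) { CrashEvent("app-$it", "firmware-A") } +
        CrashEvent("app-9", "firmware-B")
    println(dominantFirmware(clustered)) // prints firmware-A

    val spread = listOf(
        CrashEvent("app-1", "firmware-A"),
        CrashEvent("app-2", "firmware-B"),
        CrashEvent("app-3", "firmware-C")
    )
    println(dominantFirmware(spread)) // prints null
}
```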
Crash rate is a lagging indicator. Time to root cause is what you can improve.
Before the tagging and breadcrumb system: average 4-6 hours to understand intermittent production crashes.
After: average 90 minutes to root cause for common crash types, 4 hours for novel issues.
The investment is in the logging infrastructure, not in being clever after a crash happens. When the crash arrives, the data is already there.
The best debugging strategy is the one you set up before you need it. Every crash that's hard to debug is a signal to add more structured logging before the next one.
Sudarshan Chaudhari
AI Systems Builder / Product Engineer
Bangkok, Thailand
Solo Android developer with 13+ years in QA, building Android apps, AI automation systems, and developer tools at SudarshanTechLabs.