System Server: Debugging and Troubleshooting
Part 4 of 6 in the Android System Server Deep Dive series
Learning Objectives
By the end of this article, you will understand:
- Essential command-line debugging tools and advanced techniques
- Binder IPC debugging and transaction analysis
- Service-specific debugging commands for AMS, WMS, PMS, and others
- Thread analysis, deadlock detection, and memory leak identification
- Performance profiling with systrace, perfetto, and simpleperf
- Boot time debugging and service lifecycle analysis
- ANR analysis workflow and Watchdog timeout diagnosis
- SELinux debugging and system properties for troubleshooting
- Real-world debugging scenarios and step-by-step workflows
Part IV: Practical Debugging and Analysis
4.1 Essential Command-Line Tools
Process Inspection:
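```bash
# Find the system_server PID
adb shell ps -A | grep system_server

# List all threads (quoted so that $(pidof ...) expands on the device, not the host)
adb shell 'ps -T -p $(pidof system_server)'

# Monitor per-thread CPU usage, refreshing every 3 seconds
adb shell top -H -d 3
```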
Log Analysis:
```bash
# Filter logs to the system_server process
adb logcat --pid=$(adb shell pidof system_server)

# Filter by tag and minimum priority (*:S silences all other tags)
adb logcat ActivityManager:I PowerManagerService:W *:S
adb logcat ActivityManager:D WindowManager:V *:S

# View crash logs
adb logcat -b crash

# Show entries since a given time (logcat has no end-time option; trim the tail afterwards)
adb logcat -t '11-15 10:30:00.000'

# Save logs to a file
adb logcat > system_server.log

# Show only the listed tags
adb logcat -s ActivityManager WindowManager PackageManager

# Clear the log buffers and start fresh
adb logcat -c
adb logcat > fresh_logs.txt
```
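Because logcat can only bound the start of a time range, the end of the window has to be trimmed from a saved log. The helper below is a minimal sketch of that step; `log_window` is a hypothetical name, and it assumes the default threadtime format, where each line starts with `MM-DD HH:MM:SS.mmm`, and a log that does not cross a year boundary.

```bash
# log_window START END: read saved logcat output on stdin and print the lines
# whose "MM-DD HH:MM:SS.mmm" timestamp falls within [START, END].
log_window() {
  awk -v start="$1" -v end="$2" '
    # Only consider lines that begin with a date; this skips
    # "--------- beginning of ..." buffer markers.
    $1 ~ /^[0-9][0-9]-[0-9][0-9]$/ {
      ts = $1 " " $2
      # Timestamps are fixed width, so string comparison orders them correctly.
      if (ts >= start && ts <= end) print
    }
  '
}
```

For example, `log_window '11-15 10:30:00.000' '11-15 10:35:00.000' < system_server.log` keeps only the five-minute window of interest.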
Service State Inspection:
```bash
# List all registered services
adb shell service list

# List the services that dumpsys can dump
adb shell dumpsys -l

# Dump service state
adb shell dumpsys activity
adb shell dumpsys window
adb shell dumpsys power
adb shell dumpsys meminfo system_server

# Dump a specific section of a service
adb shell dumpsys activity activities
adb shell dumpsys activity services
adb shell dumpsys activity processes
adb shell dumpsys window windows
adb shell dumpsys window displays
```
Service-Specific Debugging Commands:
ActivityManagerService (AMS):
```bash
# Dump the activity stack
adb shell dumpsys activity activities | grep -A 20 "mResumedActivity"

# Check running processes
adb shell dumpsys activity processes

# View pending intents
adb shell dumpsys activity intents

# Dump the top (foreground) activity
adb shell dumpsys activity top

# Monitor the activity lifecycle
adb logcat | grep -E "ActivityManager|ActivityTaskManager"
```
WindowManagerService (WMS):
```bash
# Dump the window hierarchy
adb shell dumpsys window windows

# Check display info
adb shell dumpsys window displays

# View SurfaceFlinger info
adb shell dumpsys SurfaceFlinger

# Monitor window operations
adb logcat | grep -E "WindowManager|SurfaceFlinger"
```
PackageManagerService (PMS):
```bash
# List installed packages
adb shell dumpsys package packages

# Check package permissions
adb shell dumpsys package permissions

# View info for a single package
adb shell dumpsys package <package_name>

# Monitor package operations
adb logcat | grep -E "PackageManager|Installer"
```
PowerManagerService:
```bash
# Check wake locks
adb shell dumpsys power | grep -A 20 "Wake Locks"

# View power state
adb shell dumpsys power

# Check battery stats
adb shell dumpsys batterystats

# Monitor power events
adb logcat | grep -E "PowerManager|BatteryService"
```
Other Critical Services:
```bash
# Network services
adb shell dumpsys connectivity
adb shell dumpsys netstats

# Input service
adb shell dumpsys input

# Location service
adb shell dumpsys location

# Notification service
adb shell dumpsys notification
```
4.2 Performance Profiling
CPU Profiling with simpleperf:
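```bash
# Record a 10-second profile with call graphs (profiling system_server requires root;
# quoted so that $(pidof ...) expands on the device)
adb shell 'simpleperf record -p $(pidof system_server) -g --duration 10 -o /data/local/tmp/perf.data'

# Pull the profile for analysis on the host
adb pull /data/local/tmp/perf.data
```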
Memory Analysis:
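```bash
# Memory breakdown
adb shell dumpsys meminfo system_server

# Process ranking by memory (procrank is typically present only on userdebug/eng builds)
adb shell procrank

# Native heap profiling with the Perfetto heap_profile script
./tools/heap_profile --name system_server
```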
System Tracing:
Perfetto Tracing:
```bash
# Capture a comprehensive 10-second system trace.
# --txt marks the piped config as text proto. The duration is set with
# duration_ms because -t cannot be combined with -c. Without root, perfetto
# can only write under /data/misc/perfetto-traces/.
adb shell perfetto --txt -c - -o /data/misc/perfetto-traces/trace.perfetto-trace <<EOF
duration_ms: 10000
buffers: {
  size_kb: 63488
  fill_policy: DISCARD
}
buffers: {
  size_kb: 2048
  fill_policy: DISCARD
}
data_sources: {
  config {
    name: "android.surfaceflinger.frame"
  }
}
data_sources: {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_waking"
      ftrace_events: "power/suspend_resume"
      ftrace_events: "power/cpu_frequency"
      ftrace_events: "power/cpu_idle"
      ftrace_events: "binder/binder_transaction"
      ftrace_events: "binder/binder_transaction_received"
      atrace_categories: "gfx"
      atrace_categories: "input"
      atrace_categories: "view"
      atrace_categories: "webview"
      atrace_categories: "wm"
      atrace_categories: "am"
      atrace_categories: "sm"
      atrace_categories: "audio"
      atrace_categories: "video"
      atrace_categories: "camera"
      atrace_categories: "hal"
      atrace_categories: "app"
      atrace_categories: "res"
      atrace_categories: "dalvik"
      atrace_categories: "rs"
      atrace_categories: "bionic"
      atrace_categories: "power"
      atrace_categories: "pm"
      atrace_categories: "ss"
      atrace_categories: "database"
      atrace_categories: "network"
      atrace_categories: "adb"
      atrace_categories: "vibrator"
      atrace_categories: "aidl"
      atrace_categories: "nnapi"
      atrace_categories: "rro"
      buffer_size_kb: 2048
    }
  }
}
EOF

# Pull the trace, then open https://ui.perfetto.dev/ and upload it
adb pull /data/misc/perfetto-traces/trace.perfetto-trace
```
Systrace (Legacy but still useful):
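```bash
# Capture a 10-second systrace (systrace.py is no longer shipped in recent
# platform-tools releases; prefer Perfetto on current devices)
python systrace.py --time=10 -o trace.html \
  sched freq idle am wm gfx view binder_driver hal dalvik camera input res

# View in Chrome at chrome://tracing
```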
Analyzing Traces:
- Binder Transactions: Look for long binder_transaction events (>5ms)
- CPU Scheduling: Check for thread starvation or priority inversion
- Service Calls: Identify slow service responses
- Deadlocks: Look for circular wait patterns in thread states
4.3 Binder IPC Debugging
Binder Transaction Monitoring:
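```bash
# View Binder statistics (requires root). Current kernels expose them through
# binderfs; older ones use /sys/kernel/debug/binder/stats instead.
adb shell cat /dev/binderfs/binder_logs/stats

# Monitor Binder traffic in real time. Binder calls are ioctl syscalls on
# /dev/binder; strace is usually present only on userdebug/eng builds.
adb shell 'strace -f -p $(pidof system_server) -e trace=ioctl'

# Check Binder-related service state
adb shell dumpsys activity services | grep -A 10 "Binder"

# Check Binder object counts (local and proxy binders)
adb shell dumpsys meminfo system_server | grep -i binder
```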
Binder Deadlock Detection:
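```bash
# Check for blocked Binder threads
adb shell cat /data/anr/traces.txt | grep -A 20 "Binder"

# Monitor Binder transaction timeouts and dead remote objects
adb logcat | grep -E "Binder.*timeout|DeadObjectException"

# Check the per-process Binder state, including pending transactions
# (binderfs path; older kernels use /sys/kernel/debug/binder/proc/<pid>)
adb shell 'cat /dev/binderfs/binder_logs/proc/$(pidof system_server)'
```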
Binder Performance Analysis:
```bash
# Trace Binder calls with systrace (the category is named binder_driver)
python systrace.py --time=10 -o trace.html binder_driver

# Check transaction sizes
adb shell dumpsys activity services | grep -E "transaction|parcel"
```
4.4 Thread Analysis and Deadlock Detection
Thread Inspection:
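```bash
# List all threads with details (quoted so that $(pidof ...) expands on the device)
adb shell 'ps -T -p $(pidof system_server)'

# Get thread stack traces. SIGQUIT makes ART dump every thread; on recent
# releases the dump is written to a new file under /data/anr/, so list that
# directory if traces.txt is missing.
adb shell 'kill -3 $(pidof system_server)'
adb shell cat /data/anr/traces.txt | grep -A 50 "system_server"

# Monitor per-thread CPU usage
adb shell 'top -H -p $(pidof system_server)'

# Print each thread's ID and priority (fields 1 and 18 of the stat file)
adb shell 'cat /proc/$(pidof system_server)/task/*/stat' | awk '{print $1, $18}'
```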
Deadlock Detection Workflow:
```bash
# 1. Capture a thread dump (on recent releases the dump is a new file under /data/anr/)
adb shell 'kill -3 $(pidof system_server)'
adb pull /data/anr/traces.txt

# 2. Analyze for deadlock patterns. Look for:
#    - Threads waiting on locks held by other threads
#    - Circular wait conditions
#    - Blocked threads in synchronized blocks

# 3. Identify lock holders (ART prints the state as "Blocked", so match case-insensitively)
grep -i -B 5 -A 20 "blocked" traces.txt

# 4. Check for monitor contention
grep -E "waiting (to lock|on) <" traces.txt
```
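The circular-wait check in step 2 can be partly automated. The sketch below assumes the ART dump format, where a thread header looks like `"name" prio=5 tid=12 Blocked` and a contended monitor is reported as `- waiting to lock <0x...> (a Type) held by thread 78`. The function names `lock_edges` and `mutual_waits` are hypothetical, and only two-thread cycles are detected.

```bash
# lock_edges: read an ART thread dump on stdin and print one
# "waiter_tid holder_tid" pair per contended monitor.
lock_edges() {
  awk '
    /^"/ {                      # thread header line
      tid = ""
      for (i = 1; i <= NF; i++) if ($i ~ /^tid=/) tid = substr($i, 5)
    }
    /waiting to lock/ && /held by thread/ {
      print tid, $NF            # the holder tid is the last field
    }
  '
}

# mutual_waits: print each pair of threads that wait on each other.
mutual_waits() {
  lock_edges | awk '
    { holder[$1] = $2 }
    END {
      for (w in holder) {
        h = holder[w]
        # Report each pair once, lowest tid first.
        if ((h in holder) && holder[h] == w && w + 0 < h + 0) print w, h
      }
    }
  '
}
```

Running `mutual_waits < traces.txt` prints nothing when no two threads block each other; longer cycles still need manual inspection of the `lock_edges` output.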
Thread State Analysis:
```bash
# Check kernel thread states (R running, S sleeping, D uninterruptible, Z zombie).
# Java states such as BLOCKED or WAITING appear only in the ART thread dump.
adb shell 'cat /proc/$(pidof system_server)/task/*/stat' | \
  awk '{print $1, $3}' | sort -k2

# Monitor thread creation/destruction
adb logcat | grep -E "Thread.*started|Thread.*terminated"
```
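One caveat with splitting stat lines on whitespace: the second field is the thread name in parentheses, and a name containing a space shifts every later field. A more defensive sketch strips everything through the closing parenthesis first; `stat_fields` is a hypothetical helper name.

```bash
# stat_fields: read /proc/<pid>/task/<tid>/stat lines on stdin and print
# "tid state priority" for each, tolerating spaces in the thread name.
stat_fields() {
  awk '
    {
      tid = $1
      rest = $0
      sub(/^.*\) /, "", rest)   # greedy match drops everything through the last ") "
      split(rest, f, " ")
      # After the name, f[1] is the state (stat field 3) and f[16] the priority (field 18).
      print tid, f[1], f[16]
    }
  '
}
```

It can replace the plain awk stage, for example `adb shell 'cat /proc/$(pidof system_server)/task/*/stat' | stat_fields | sort -k2`.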
4.5 Memory Leak Detection
Memory Monitoring:
```bash
# Continuous memory monitoring
watch -n 1 "adb shell dumpsys meminfo system_server | head -30"

# Check for memory growth over time
adb shell dumpsys meminfo system_server > mem_before.txt
# ... perform operations ...
adb shell dumpsys meminfo system_server > mem_after.txt
diff mem_before.txt mem_after.txt

# Check native heap
adb shell dumpsys meminfo system_server | grep -A 10 "Native Heap"

# Check Java heap
adb shell dumpsys meminfo system_server | grep -A 10 "Java Heap"
```
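A raw diff of two meminfo dumps is noisy, so it can help to reduce each snapshot to a single number first. The sketch below assumes the meminfo table contains a `TOTAL` row whose first numeric column is the total PSS in kB; the layout varies between Android releases, and `total_pss` and `pss_growth` are hypothetical helper names.

```bash
# total_pss FILE: print the first numeric column of the TOTAL row (Pss Total, kB).
total_pss() {
  awk '$1 == "TOTAL" && $2 ~ /^[0-9]+$/ { print $2; exit }' "$1"
}

# pss_growth BEFORE AFTER: print the change in total PSS between two snapshots, in kB.
pss_growth() {
  before=$(total_pss "$1")
  after=$(total_pss "$2")
  if [ -z "$before" ] || [ -z "$after" ]; then
    echo "TOTAL row not found" >&2
    return 1
  fi
  echo $((after - before))
}
```

With the snapshots captured above, `pss_growth mem_before.txt mem_after.txt` prints the growth in kB; a value that keeps rising across repeated runs of the same operation is the signal worth following up with a heap dump.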
Heap Analysis:
```bash
# Generate a heap dump (requires root or a debuggable build;
# quoted so that $(pidof ...) expands on the device)
adb shell 'am dumpheap $(pidof system_server) /data/local/tmp/heap.hprof'
adb pull /data/local/tmp/heap.hprof

# Analyze with the Android Studio Memory Profiler or MAT (Eclipse Memory Analyzer)

# Check for object retention
adb shell dumpsys meminfo system_server | grep -E "Objects|Views|Activities"
```
Native Memory Leaks:
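```bash
# Use Malloc Debug (requires root). system_server is forked from zygote, so the
# program to match is app_process. Restart the framework with stop/start: a full
# reboot would clear these non-persistent properties.
adb shell stop
adb shell setprop libc.debug.malloc.program app_process
adb shell setprop libc.debug.malloc.options backtrace
adb shell start

# After the restart, check native memory for leaks
adb shell dumpsys meminfo system_server | grep -A 5 "Native"
```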
4.6 Boot Time Debugging
Service Startup Analysis:
```bash
# Monitor the service startup sequence
adb logcat | grep -E "SystemServiceManager|Starting.*Service"

# Check boot time
adb shell dumpsys activity services | grep -E "boot|startup"

# Analyze service initialization order
adb logcat -b all | grep -E "SystemServer.*start|ServiceManager"
```
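To rank startup phases by cost, the timing lines can be pulled out of a saved log. This sketch assumes system_server logs phase durations under the `SystemServerTiming` tag in the form `<phase> took to complete: <N>ms`; verify the format on your build before relying on it. `slowest_phases` is a hypothetical helper name.

```bash
# slowest_phases [N]: read saved logcat output on stdin and print the N slowest
# SystemServerTiming phases as "<ms> <phase>", slowest first (default N=10).
slowest_phases() {
  sed -n 's/.*SystemServerTiming: \(.*\) took to complete: \([0-9][0-9]*\)ms.*/\2 \1/p' |
    sort -rn |
    head -n "${1:-10}"
}
```

A typical use is `adb logcat -d -b all > boot.log` shortly after boot, followed by `slowest_phases 5 < boot.log`.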
Boot Performance:
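```bash
# Capture a boot trace (--txt marks the piped config as text proto; without
# root, perfetto can only write under /data/misc/perfetto-traces/)
adb shell perfetto --txt -c - -o /data/misc/perfetto-traces/boot_trace.perfetto-trace <<EOF
buffers: {
  size_kb: 63488
}
data_sources: {
  config {
    name: "android.boot"
  }
}
EOF

# Check boot time metrics
adb shell dumpsys activity services | grep -E "boot.*time|startup.*duration"
```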
Service Registration Debugging:
```bash
# Check service registration
adb shell service list | grep -E "activity|window|power"

# Monitor service binder registration
adb logcat | grep -E "addService|ServiceManager.*add"
```
4.7 Advanced Debugging Techniques
Attaching Native Debugger:
```bash
# Get the PID
adb shell pidof system_server

# Attach LLDB (requires root or a debuggable build)
lldbclient.py -p <PID>

# Common LLDB commands:
# (lldb) bt             # Backtrace
# (lldb) thread list    # List threads
# (lldb) frame select   # Select frame
# (lldb) print <var>    # Print variable
```
Java Debugging with JDWP:
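```bash
# Enable extended JNI checks (CheckJNI). These properties do not enable JDWP
# themselves; attaching over JDWP requires a debuggable build.
adb shell setprop dalvik.vm.checkjni true
adb shell setprop debug.checkjni 1

# Forward a local port to the JDWP endpoint of system_server, then attach
adb forward tcp:8700 jdwp:<PID>
jdb -attach localhost:8700
```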
System Properties for Debugging:
```bash
# Enable verbose logging for specific tags
adb shell setprop log.tag.ActivityManager VERBOSE
adb shell setprop log.tag.WindowManager VERBOSE
adb shell setprop log.tag.PackageManager VERBOSE

# Enable Binder debugging
adb shell setprop debug.binder.logging 1

# Enable service manager debugging
adb shell setprop debug.servicemanager.logging 1

# Check all debug properties
adb shell getprop | grep debug
```
SELinux Debugging:
```bash
# Check SELinux status
adb shell getenforce

# View SELinux denials
adb shell dmesg | grep -i "avc.*denied"
adb logcat | grep -i "selinux"

# Check the SELinux context of system_server (-A lists all processes)
adb shell ps -AZ | grep system_server

# Set SELinux to permissive (requires root, for debugging only)
adb shell setenforce 0
```
Watchdog Analysis:
- Examine /data/anr/traces.txt for deadlock patterns
- Look for blocked threads and lock holders
- Analyze circular dependencies
- Check for threads stuck in native code
- Verify no infinite loops in service code
Part V: Common Failure Modes
5.1 Watchdog Timeouts
Symptoms: 60-second system freeze followed by soft reboot
Causes: Deadlocked threads, blocking I/O on main thread
Diagnosis: Analyze stack traces in /data/anr/traces.txt
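Before opening the stack traces, it is often quicker to read the reason the Watchdog itself logged. The sketch below assumes the kill message has the form `*** WATCHDOG KILLING SYSTEM PROCESS: <reason>`, where the reason usually names the blocked monitor or handler thread; `watchdog_reasons` is a hypothetical helper name.

```bash
# watchdog_reasons: read saved logcat output on stdin and print the reason
# attached to each Watchdog kill message.
watchdog_reasons() {
  sed -n 's/.*WATCHDOG KILLING SYSTEM PROCESS: \(.*\)$/\1/p'
}
```

For example, `adb logcat -d -b all | watchdog_reasons` lists the reasons still present in the log buffers, which tells you which lock or thread to search for in the trace file.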
5.2 System ANRs
Symptoms: App ANR dialogs with root cause in system_server
Causes: Slow service responses, blocked Binder threads
ANR Analysis Workflow:
Step 1: Identify ANR Type
```bash
# Check ANR traces (on recent releases, look for per-ANR files under /data/anr/)
adb shell cat /data/anr/traces.txt

# Check for ANRs in logcat
adb logcat | grep -i "anr\|not responding"

# Check dropbox for ANR reports
adb shell dumpsys dropbox --print | grep -i anr
```
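When a log contains many ANRs, a per-process count shows whether one app is misbehaving or whether several apps are stalling on system_server at once. This sketch assumes ANRs are logged as `ActivityManager: ANR in <process> (...)`; `anr_counts` is a hypothetical helper name.

```bash
# anr_counts: read saved logcat output on stdin and print "<count> <process>"
# for every process that hit an ANR, most frequent first.
anr_counts() {
  sed -n 's/.*ActivityManager: ANR in \([^ ]*\).*/\1/p' |
    sort | uniq -c | sort -rn |
    awk '{ print $1, $2 }'     # normalize the padding that uniq -c adds
}
```

ANRs spread across unrelated processes within the same time window point toward a shared cause in system_server rather than in any single app.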
Step 2: Analyze Application Stack Trace
```bash
# Find the app process in traces.txt
grep -A 50 "pid.*<app_pid>" /data/anr/traces.txt

# Look for:
# - Main thread blocked on a Binder call
# - Waiting on a system_server response
# - Blocked on a lock held by system_server
```
Step 3: Analyze System Server Stack Trace
```bash
# Find system_server in traces.txt
grep -A 50 "system_server" /data/anr/traces.txt

# Look for:
# - Binder thread blocked on a slow operation
# - Service handler stuck in long-running code
# - Deadlock with an application thread
```
Step 4: Identify Root Cause
```bash
# Check Binder-related service state
adb shell dumpsys activity services | grep -A 5 "Binder"

# Check service response times
adb logcat | grep -E "Service.*took|Transaction.*duration"

# Verify the service is not deadlocked (ART prints states as "Blocked" and
# "Waiting", so match case-insensitively)
adb shell cat /data/anr/traces.txt | grep -iE "blocked|waiting"
```
Common ANR Patterns:
- Slow Binder Transaction:
  - App thread: BinderProxy.transact() waiting
  - System thread: Service handler processing slowly
  - Solution: Optimize service handler, use async callbacks
- Deadlock:
  - App thread: Waiting on lock A, holding lock B
  - System thread: Waiting on lock B, holding lock A
  - Solution: Fix lock ordering, use tryLock with timeout
- Blocked on I/O:
  - System thread: Blocked on file/network I/O
  - Solution: Move I/O to background thread, use async APIs
ANR/Watchdog Interaction:
While Watchdog Timeouts (Section 5.1) and System ANRs are distinct failure modes, they often interact in complex ways. For example, an extremely slow Binder call from an application waiting on the system_server could lead to both an App ANR (when the app's main thread is blocked waiting for the system_server response) and eventually contribute to a Watchdog timeout if critical system_server threads are blocked by the same operation. This interaction highlights the importance of monitoring both application-level ANRs and system-level watchdog events when diagnosing performance issues, as they may share a common root cause in system_server thread blocking.
5.3 Native Crashes
Symptoms: Instant hard reboot or system_server crash
Causes: Segmentation faults in JNI or native code, null pointer dereferences, stack overflow
Crash Analysis Workflow:
Step 1: Locate Crash Reports
```bash
# List tombstone files
adb shell ls -l /data/tombstones/

# Pull a tombstone (pick the most recent one from the listing)
adb pull /data/tombstones/tombstone_00

# Check the crash buffer in logcat
adb logcat -b crash

# Check dropbox for crash reports
adb shell dumpsys dropbox --print | grep -i "crash\|tombstone"
```
Step 2: Analyze Tombstone
```bash
# Key sections to examine:
# - Build fingerprint
# - Signal (SIGSEGV, SIGABRT, etc.)
# - Register state
# - Stack trace
# - Memory map

# Use addr2line to resolve addresses against an unstripped binary
addr2line -e <binary> <address>
```
Step 3: Identify Root Cause
- SIGSEGV: Null pointer, buffer overflow, use-after-free
- SIGABRT: Assertion failure, abort() call
- SIGBUS: Misaligned memory access
- SIGFPE: Arithmetic error, most often integer division by zero
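The signal, its code, and the fault address all sit on one line of the tombstone, so they can be extracted for a quick first classification. This sketch assumes the usual line format `signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x...`; `tombstone_signal` is a hypothetical helper name.

```bash
# tombstone_signal FILE: print "<signal> <code> <fault addr>" taken from the
# first signal line of a tombstone.
tombstone_signal() {
  sed -n 's/^signal [0-9]* (\([^)]*\)), code [-0-9]* (\([^)]*\)), fault addr \(.*\)$/\1 \2 \3/p' "$1" |
    head -n 1
}
```

A SIGSEGV with a fault address at or near zero usually indicates a null pointer dereference, while an address that looks like a valid heap pointer is more consistent with a use-after-free or overflow.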
Common Native Crash Patterns:
- JNI Null Pointer:
  - Check JNI code for null checks
  - Verify object references are valid
  - Solution: Add null checks, validate JNI parameters
- Stack Overflow:
  - Check for infinite recursion
  - Verify stack size limits
  - Solution: Fix recursion, increase stack size if needed
- Use-After-Free:
  - Check for dangling pointers
  - Verify object lifetime management
  - Solution: Use smart pointers, validate object state
5.4 Service Lifecycle Issues
Symptoms: Services not starting, crashing, or not responding
Debugging Service Startup:
```bash
# Monitor service registration
adb logcat | grep -E "addService|ServiceManager"

# Check service availability
adb shell service check <service_name>

# Verify the service binder interface
adb shell dumpsys activity services | grep <service_name>
```
Service Crash Recovery:
```bash
# Check whether the service crashed
adb logcat | grep -E "Service.*died|Service.*crash"

# Monitor service restart
adb logcat | grep -E "Service.*restart|Service.*recovery"

# Check service state
adb shell dumpsys activity services | grep -A 10 <service_name>
```
5.5 Performance Regressions
Symptoms: System slowdown, increased latency, higher CPU usage
Performance Baseline:
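```bash
# Capture baseline metrics (-b runs top in batch mode so the output can be
# redirected; quoted so that $(pidof ...) expands on the device)
adb shell dumpsys meminfo system_server > baseline_mem.txt
adb shell 'top -b -n 1 -p $(pidof system_server)' > baseline_cpu.txt

# After changes, capture the same metrics again
adb shell dumpsys meminfo system_server > current_mem.txt
adb shell 'top -b -n 1 -p $(pidof system_server)' > current_cpu.txt
```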
Identifying Regressions:
```bash
# Compare CPU usage
diff baseline_cpu.txt current_cpu.txt

# Compare memory usage
diff baseline_mem.txt current_mem.txt

# Check for new allocations
adb shell dumpsys meminfo system_server | grep -E "Objects|Allocations"
```
Performance Profiling:
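```bash
# Profile before and after the change (requires root). Run this once per build,
# writing to baseline.data and then to current.data.
adb shell 'simpleperf record -p $(pidof system_server) -g --duration 30 -o /data/local/tmp/baseline.data'

# Compare profiles: simpleperf report reads one profile at a time, so generate
# a text report for each recording and diff the reports
adb shell simpleperf report -i /data/local/tmp/baseline.data > baseline_report.txt
adb shell simpleperf report -i /data/local/tmp/current.data > current_report.txt
diff baseline_report.txt current_report.txt
```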
5.6 Battery and Thermal Issues
Symptoms: Excessive battery drain, thermal throttling
Battery Analysis:
```bash
# Check wake locks
adb shell dumpsys power | grep -A 20 "Wake Locks"

# Check battery stats
adb shell dumpsys batterystats

# Monitor CPU frequency (quoted so the glob expands on the device, not the host)
adb shell 'cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_cur_freq'

# Check thermal state
adb shell dumpsys thermalservice
```
Identifying Battery Drains:
```bash
# Check for excessive CPU wakeups
adb shell dumpsys batterystats | grep -E "wakeup|partial_wake_lock"

# Monitor per-thread CPU usage of system_server every 5 seconds
adb shell 'top -H -p $(pidof system_server) -d 5'

# Check for polling loops
adb logcat | grep -E "polling|wakeup|alarm"
```
Part VI: Real-World Debugging Scenarios
Scenario 1: System Freeze Investigation
Symptoms: Device freezes, no response to input
Debugging Steps:
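```bash
# 1. Check whether system_server is alive
adb shell ps -A | grep system_server

# 2. Get a thread dump (on recent releases the dump is a new file under /data/anr/)
adb shell 'kill -3 $(pidof system_server)'
adb pull /data/anr/traces.txt

# 3. Analyze for deadlocks (ART prints states as "Blocked" and "Waiting")
grep -iE "blocked|waiting" traces.txt

# 4. Check Binder statistics (binderfs path; older kernels use /sys/kernel/debug/binder/stats)
adb shell cat /dev/binderfs/binder_logs/stats

# 5. Check per-thread CPU usage
adb shell 'top -H -p $(pidof system_server)'
```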
Scenario 2: App Launch Slowdown
Symptoms: Apps take too long to start
Debugging Steps:
```bash
# 1. Trace the app launch (without root, perfetto can only write under /data/misc/perfetto-traces/)
adb shell perfetto -o /data/misc/perfetto-traces/launch.perfetto-trace -t 10s am

# 2. Check the AMS activity stack
adb shell dumpsys activity activities | grep -A 10 "mResumedActivity"

# 3. Monitor service calls during launch
adb logcat | grep -E "ActivityManager|PackageManager" | grep -E "start|launch"

# 4. Check for blocking operations
adb shell dumpsys activity services | grep -A 5 "Binder"
```
Scenario 3: Memory Leak Investigation
Symptoms: System memory usage grows over time
Debugging Steps:
```bash
# 1. Capture baseline memory
adb shell dumpsys meminfo system_server > mem_baseline.txt

# 2. Wait one hour and capture again
sleep 3600
adb shell dumpsys meminfo system_server > mem_after.txt

# 3. Compare
diff mem_baseline.txt mem_after.txt

# 4. Generate a heap dump (quoted so that $(pidof ...) expands on the device)
adb shell 'am dumpheap $(pidof system_server) /data/local/tmp/heap.hprof'
adb pull /data/local/tmp/heap.hprof

# 5. Analyze with MAT or Android Studio
```
Scenario 4: Binder Transaction Timeout
Symptoms: Apps report "Service not responding"
Debugging Steps:
```bash
# 1. Check Binder statistics (binderfs path; older kernels use /sys/kernel/debug/binder/stats)
adb shell cat /dev/binderfs/binder_logs/stats

# 2. Monitor Binder traffic (Binder calls are ioctl syscalls; strace is usually
#    present only on userdebug/eng builds)
adb shell 'strace -f -p $(pidof system_server) -e trace=ioctl'

# 3. Check service handler threads
adb shell 'ps -T -p $(pidof system_server)' | grep -E "Binder|Handler"

# 4. Analyze service response times
adb logcat | grep -E "Service.*took|Transaction.*duration"

# 5. Check for deadlocks
adb shell cat /data/anr/traces.txt | grep -A 20 "Binder"
```
Summary
This article covered essential and advanced debugging tools and techniques for system_server:
- Command-Line Tools: Process inspection, log analysis, service state dumps, and service-specific debugging
- Binder IPC Debugging: Transaction monitoring, deadlock detection, and performance analysis
- Thread Analysis: Deadlock detection, thread state analysis, and CPU profiling
- Memory Debugging: Leak detection, heap analysis, and native memory tracking
- Boot Time Debugging: Service startup analysis and initialization order
- Performance Profiling: CPU, memory, and system tracing with perfetto and systrace
- Advanced Techniques: Native debugging, JDWP, system properties, and SELinux debugging
- Failure Modes: Watchdog timeouts, ANRs, native crashes, service lifecycle issues, performance regressions, and battery/thermal problems
- Real-World Scenarios: Step-by-step workflows for common debugging situations
Next Steps
Continue to Part 5: Best Practices and Optimization to learn development guidelines and common pitfalls to avoid.
Related Articles
- Part 3: Binder IPC Framework
- Part 5: Best Practices and Optimization
- Part 6: Advanced Q&A - See Q3, Q5 for detailed failure analysis
- Series Index
This article is part of the Android System Server Deep Dive series. For the complete learning path, start with the Series Index.