
I have used strace to observe a running service process a few times, and it seemed to have little impact on the tracee process. But today, when I used ltrace to do some tracing, it crashed the tracee process. Luckily it was in a dev environment. dmesg shows the following error:

[Fri Aug  2 11:02:43 2024] traps: writer1[4137194] trap int3 ip:4092c1 sp:7f1eda53db58 error:0 in service_prog[400000+1ea000]

So I have these questions:

  1. Do perf/strace/ltrace have a performance impact on the tracee process?
  2. Is it safe to use perf/strace/ltrace in a production environment?
  3. Why did ltrace crash the tracee process? How should I interpret the dmesg message above? What is trap int3?

Thank you.

Xiaoyong Guo

1 Answer


Yes, all performance profiling tools add overhead. However, the slowdown they cause and their maturity vary widely. Learn the tools on a test host first, then decide whether to use them in production.

strace can have incredibly high overhead. Gregg's point about it bears repeating:

WARNING: Can cause significant and sometimes massive performance overhead, in the worst case, slowing the target application by over 100x. This may not only make it unsuitable for production use, but any timing information may also be so distorted as to be misleading.

Essentially, it's hitting a debugger breakpoint on every system call. In the worst case, when an application is syscall heavy, it can run hundreds of times slower. As I recall, ltrace is also ptrace() based and has similar overhead.
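To get a feel for that per-syscall cost on a test host, a small syscall-heavy loop can be timed with and without a tracer attached. The following is only a minimal sketch: the function name `time_syscalls` and the choice of `os.getppid()` as a cheap system call are assumptions made for illustration, not anything taken from strace itself.

```python
import os
import time


def time_syscalls(n=200_000):
    """Return the average wall-clock seconds per os.getppid() call over n calls."""
    start = time.perf_counter()
    for _ in range(n):
        os.getppid()  # assumed to issue one cheap system call per iteration
    elapsed = time.perf_counter() - start
    return elapsed / n


if __name__ == "__main__":
    per_call = time_syscalls()
    print(f"{per_call * 1e9:.0f} ns per getppid() call")
```

Running the script once on its own and once under something like `strace -o /dev/null python3 script.py` (the script name is hypothetical) should make the difference visible. The exact ratio depends on the machine and the kernel, so treat the numbers as a rough indication only.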

Both strace and ltrace are simple, mature tools. They are probably already available, and you won't have to write any code. However, I would not start with either when investigating a performance issue, especially not in production.

Performance tools are very platform specific. On Linux, my progression from monitoring to profiling goes something like this depending on what tools are available:

  • Broad host-level metrics using a tool like netdata; these can stay running
  • System-wide on-CPU usage at a glance using perf top
  • Better visualization of on-CPU time (specific programs or system-wide) using perf script flamegraph
  • Existing bpf scripts, such as those from bcc-tools
  • perf trace --syscalls, an alternative to strace
  • ltrace or strace

Notice that many of these use perf_events, a newer Linux interface to profiling data that is much more efficient at the user/kernel boundary.

I do not know exactly what that log message means. If it is signal related, then, speaking of bcc-tools, consider using the killsnoop program to see which process sent the signal.
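The fields in that line can at least be pulled apart mechanically. Below is a minimal sketch, assuming the `traps:` line always has the layout seen in the question (command name, PID, trap name, instruction pointer, stack pointer, error code, then the mapped image with its base address and size). The regular expression and the `parse_trap_line` helper are hypothetical, written only for this example. For what it is worth, int3 is the x86 breakpoint instruction, which would be consistent with a breakpoint planted by a tracer such as ltrace, but that is a guess rather than a diagnosis.

```python
import re

# Assumed layout, inferred from the single message quoted in the question:
#   traps: COMM[PID] trap NAME ip:HEX sp:HEX error:HEX in IMAGE[BASE+SIZE]
TRAP_RE = re.compile(
    r"traps: (?P<comm>[^\[]+)\[(?P<pid>\d+)\] trap (?P<trap>\S+) "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+)"
    r" in (?P<image>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_trap_line(line):
    """Split a 'traps:' dmesg line into fields; return None if it does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "trap": m["trap"],
        "ip": ip,
        "sp": int(m["sp"], 16),
        "error": int(m["error"], 16),
        "image": m["image"],
        "base": base,
        "size": int(m["size"], 16),
        "offset": ip - base,  # position of the faulting ip inside the mapping
    }


if __name__ == "__main__":
    line = ("[Fri Aug  2 11:02:43 2024] traps: writer1[4137194] trap int3 "
            "ip:4092c1 sp:7f1eda53db58 error:0 in service_prog[400000+1ea000]")
    info = parse_trap_line(line)
    print(info["comm"], info["pid"], info["trap"], hex(info["offset"]))
```

For the line in the question this gives thread `writer1`, PID 4137194, trap `int3`, at offset 0x92c1 into the `service_prog` mapping. That offset could then be looked up in the binary with a disassembler to see which function the trap landed in.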

John Mahowald