Yes, all performance profiling tools add overhead. However, the slowdown they impose and the maturity of the tools vary wildly. Learn the tools on a test host first. Only then, maybe, use them in production.
strace can have incredibly high overhead. Brendan Gregg's warning about it bears repeating:
> WARNING: Can cause significant and sometimes massive performance overhead, in the worst case, slowing the target application by over 100x. This may not only make it unsuitable for production use, but any timing information may also be so distorted as to be misleading.
Essentially, strace hits a debugger breakpoint on every system call. In the worst case, when an application is syscall heavy, it can run hundreds of times slower. As I recall, ltrace is also ptrace() based and has similar overhead.
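You can get a feel for the ptrace() tax yourself. A minimal sketch, assuming GNU dd and strace are installed; dd with a 1-byte block size issues a read()/write() pair per block, so it is about as syscall heavy as a workload gets:

```sh
# Baseline: a syscall-heavy loop (one read() + one write() per byte)
time dd if=/dev/zero of=/dev/null bs=1 count=100k

# Same loop under strace; send the trace output to /dev/null so we
# measure tracing overhead, not terminal output. Expect a large slowdown.
time strace -o /dev/null dd if=/dev/zero of=/dev/null bs=1 count=100k
```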
Both strace and ltrace are simple, mature tools: they are probably already packaged for your platform, and you won't have to write any code. However, I would not start with either when investigating a performance issue, especially not in production.
Performance tools are very platform-specific. On Linux, my progression from monitoring to profiling goes something like this, depending on which tools are available:
- Broad host-level metrics using a tool like netdata; these can stay running
- System-wide on-CPU usage at a glance using perf top
- Better visualization of on-CPU time (specific programs or system-wide) using perf script fed into a flame graph (see the sketch after this list)
- Existing BPF scripts, such as those from bcc-tools (examples below)
- perf trace --syscalls, an alternative to strace
- ltrace or strace
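For the perf top and flame graph steps, a minimal sketch, assuming perf is installed and that Brendan Gregg's FlameGraph scripts (https://github.com/brendangregg/FlameGraph) are cloned into the current directory:

```sh
# System-wide on-CPU view, updated live; press 'q' to quit.
sudo perf top

# Flame graph workflow: sample all CPUs at 99 Hz for 30 seconds with
# call stacks, then fold the stacks and render an SVG.
sudo perf record -F 99 -a -g -- sleep 30
sudo perf script | ./FlameGraph/stackcollapse-perf.pl \
    | ./FlameGraph/flamegraph.pl > flame.svg
```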
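And for the BPF and perf trace steps, something like the following. The /usr/share/bcc/tools path is an assumption; distributions package bcc tools differently (Ubuntu, for example, puts them on PATH with a -bpfcc suffix):

```sh
# Ready-made bcc tools: trace new process execution, and show a
# latency histogram for block device I/O.
sudo /usr/share/bcc/tools/execsnoop
sudo /usr/share/bcc/tools/biolatency

# strace-like syscall tracing, but via perf_events instead of
# ptrace(); attach to a running process (replace 1234 with a real PID).
sudo perf trace -p 1234
```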
Notice that many of these use perf_events, a newer Linux interface to profiling data that is much more efficient at the user/kernel boundary than the ptrace() approach strace takes.
I do not know exactly what that log message means. If it is signal-related then, speaking of bcc-tools, consider the killsnoop program to see which process sent the signal.
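A minimal sketch, again assuming bcc-tools is installed under /usr/share/bcc/tools (adjust the path or name for your distribution):

```sh
# Trace kill() syscalls system-wide; for each signal sent it prints
# the sending PID and command, the signal number, the target PID, and
# the result. Press Ctrl-C to stop.
sudo /usr/share/bcc/tools/killsnoop
```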