When you are developing on remote servers, you don't always have the luxury of a nice IDE GUI. X forwarding over SSH sucks ...
As for my real debugging tools (for when I am testing code at full production scale, on hundreds of cores and tens of thousands of threads), I use PAPI hardware counters and, more importantly, Score-P, which gives me much more detailed aggregate information: time spent in MPI communication, the stack trace on a given thread, etc.