.. ## Copyright (c) Lawrence Livermore National Security, LLC and .. ## other Smith Project Developers. See the top-level COPYRIGHT file for details. .. ## .. ## SPDX-License-Identifier: (BSD-3-Clause) =============== Memory Checking =============== There are two commonly available memory checkers available to use with Smith on LC: `AddressSanitizer `_ and `Valgrind `_. AddressSanitizer ---------------- AddressSanitizer (aka Asan) is memory error detection tool that is a part of LLVM. It very fast and easy to use but doesn't seem as robust as Valgrind. It requires compile and link flags which are enabled via the CMake option ``ENABLE_ASAN``. Anything in our CMake system will get those flags after that is enabled but our third-party libraries (like MFEM) will not. After that just run your built executable and Asan will output a log to the screen after your program is done running. Asan's behavior can be modified with a set of `environment variables `_ . .. note:: Asan only works with the Clang and GCC compiler chains. Our build system will throw an error if you try to build with anything else while ``ENABLE_ASAN`` is ``ON``. Here is a recommended workflow: .. code-block:: bash ./config-build.py -hc host-configs/ruby-toss_4_x86_64_ib-clang@14.0.6.cmake -DENABLE_ASAN=ON cd build-ruby-toss_4_x86_64_ib-clang@14.0.6-debug srun -N1 --exclusive --mpi-bind=off make -j LSAN_OPTIONS=suppressions=../suppressions.asan ASAN_OPTIONS=log_path=asan.out:log_exe_name=true srun -n2 bin/smith This will output files in the current directory for each process that follow the pattern: ``asan.out..``. It also sets your return code to a non-zero value if there were any non-suppressed memory errors. ``LSAN_OPTIONS`` and ``ASAN_OPTIONS`` are delimited by ':'. Here is an explanation of the given options (all should be added to ``ASAN_OPTIONS`` unless noted): * ``suppressions``: Location of memory leak suppression file (``LSAN_OPTIONS``) * ``log_path``: Logs to the given file instead of to the screen. This is very helpful to avoid intermingled lines on the screen from every process * ``log_exe_name``: Adds executable name to log_path Helpful options: * ``fast_unwind_on_malloc=0``: This improves Asan's stack tracing ability but also greatly slows down the run * ``exitcode=0``: This stops Asan from returning a a non-zero exit code from your executable (defaults to 23) (``LSAN_OPTIONS``) Debugging with Address Sanitizer enabled ======================================== If a program results in address sanitizer emitting an error (gcc example, here): .. code-block:: bash ==45800==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61a00000a1c0 at pc 0x5561996372b7 bp 0x7fff89f707e0 sp 0x7fff89f707d0 READ of size 8 at 0x61a00000a1c0 thread T0 #0 0x5561996372b6 in smith::Functional), (smith::ExecutionSpace)0>::Gradient::operator mfem::Vector() (/home/sam/code/smith/build/tests/functional_qoi+0x57e2b6) #1 0x55619962df8c in void check_gradient)>(smith::Functional), (smith::ExecutionSpace)0>&, mfem::Vector&) (/home/sam/code/smith/build/tests/functional_qoi+0x574f8c) #2 0x556199624144 in void functional_qoi_test<2, 2>(mfem::ParMesh&, smith::H1<2, 1>, smith::Dimension<2>) (/home/sam/code/smith/build/tests/functional_qoi+0x56b144) #3 0x556199614ee3 in main /home/sam/code/smith/src/smith/physics/utilities/functional/tests/functional_qoi.cpp:206 the information provided (e.g. invalid read at /home/sam/code/smith/build/tests/functional_qoi+0x57e2b6) doesn't precisely reveal where the error takes place. We can set a breakpoint on the address sanitizer error function to find out exactly where the problem lies, but first we need to find out the name of that error reporting symbol. One way to do this is to filter the symbols in the executable: .. code-block:: bash sam@provolone:~/code/smith/build/tests$ nm -g ./functional_qoi | grep asan U __asan_after_dynamic_init U __asan_before_dynamic_init U __asan_handle_no_return U __asan_init 0000000001646450 B __asan_option_detect_stack_use_after_return U __asan_poison_stack_memory U __asan_register_globals U __asan_report_load1 U __asan_report_load16 U __asan_report_load2 U __asan_report_load4 U __asan_report_load8 U __asan_report_load_n U __asan_report_store1 U __asan_report_store16 U __asan_report_store4 U __asan_report_store8 U __asan_report_store_n U __asan_stack_free_5 U __asan_stack_free_6 U __asan_stack_free_7 U __asan_stack_free_8 U __asan_stack_free_9 U __asan_stack_malloc_0 U __asan_stack_malloc_1 U __asan_stack_malloc_2 U __asan_stack_malloc_3 U __asan_stack_malloc_4 U __asan_stack_malloc_5 U __asan_stack_malloc_6 U __asan_stack_malloc_7 U __asan_stack_malloc_8 U __asan_stack_malloc_9 U __asan_unpoison_stack_memory U __asan_unregister_globals U __asan_version_mismatch_check_v8 00000000016466a1 B __odr_asan.mesh2D 00000000016466a0 B __odr_asan.mesh3D 00000000016466a3 B __odr_asan.myid 00000000016466a2 B __odr_asan.nsamples 00000000016466a4 B __odr_asan.num_procs and select the one that corresponds to the the original error (read of size 8 => ``__asan_report_load8``). Valgrind -------- Valgrind is a very powerful set of tools that help with dynamic analysis tools. We will focus on `memcheck `_ which is a memory error detection tool. Unlike Asan, valgrind does not need any special compiler flags. Just build your executable and run your executable with ``valgrind``. Valgrind's suppression files are easily generated by valgrind with ``--gen-suppressions=all`` and are more customizable than Asan's. Here is a recommended workflow: .. code-block:: bash ./config-build.py -hc host-configs/rzgenie-toss_3_x86_64_ib-gcc@8.1.0.cmake cd build-rzgenie-toss_3_x86_64_ib-gcc@8.1.0-debug srun -N1 --exclusive --mpi-bind=off make -j srun -n2 valgrind --tool=memcheck --log-file=valgrind.out --leak-check=yes --show-leak-kinds=all --num-callers=20 --suppressions=../suppressions.valgrind bin/smith This will produce a file called ``valgrind.out`` in the current directory with a valgrind report. Here is an explanation of the given options: * ``--tool=memcheck``: valgrind is a tool-suite so this runs the memcheck tool * ``--log-file=valgrind.out``: Logs report to the given file * ``--leak-check=yes``: Enables memory leak checks * ``--show-leak-kinds=all```: Enables showing all memory leak kinds * ``--num-callers=20``: Limits the size of the stack traces to 20 * ``--suppressions=../suppressions.valgrind``: Location of memory leak suppression file