Corry crashes intermittently when running on ITk strip data on lxplus

I have been trying to run Corry on ITk strip data on lxplus7. It sometimes runs successfully, but an immediate, identical re-run may exit with one of several errors or a segfault. All of them involve strings and/or memory allocation, and the segfault stack traces point to string members of Configuration objects. Has anyone seen this behavior before? When I run on the built-in test data (corryvreckan/testing/test_filtering_timepix3tel_ebeam120.conf), everything runs fine.

Script to install everything:

#!/bin/bash
# runs on clean lxplus7

# optional: set gitlab_cern to https://user:accesstoken@gitlab.cern.ch
# source ~/set_gitlab.sh

# if gitlab_cern hasn't been set with access credentials
if [ -z "$gitlab_cern" ]; then
	gitlab_cern="https://gitlab.cern.ch"
fi

# initialize software dir
export install_dir=/eos/home-c/cjessima/Testbeam/202206_R2/software4
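mkdir -p "${install_dir}"
# stop here if the directory is not reachable, so nothing below runs in the wrong place
cd "${install_dir}" || exit 1
rm -rf "${install_dir}/testbeam-analysis" "${install_dir}/eudaq" "${install_dir}/corryvreckan"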

# clone necessary repos
git clone -b jonas-R5 ${gitlab_cern}/atlas-itk-strip-testbeam/testbeam-analysis.git ${install_dir}/testbeam-analysis
git clone -b jens-work-rebased ${gitlab_cern}/atlas-itk-strip-testbeam/eudaq.git ${install_dir}/eudaq
git clone https://github.com/eudaq/tlu-dependencies.git ${install_dir}/eudaq/user/tlu/extern
git clone -b master ${gitlab_cern}/corryvreckan/corryvreckan.git ${install_dir}/corryvreckan

# set up environment and build
source ${install_dir}/corryvreckan/etc/setup_lxplus.sh
export CC=$(which gcc)
export CXX=$(which g++)

mkdir ${install_dir}/eudaq/build
cd ${install_dir}/eudaq/build
cmake3 \
    -DUSER_TLU_BUILD=ON                  \
    -DUSER_EUDET_BUILD=ON                \
    -DUSER_STCONTROL_BUILD=ON            \
    -DUSER_EXAMPLE_BUILD=ON              \
    -DUSER_ITKSTRIP_BUILD=ON             \
    -DEUDAQ_BUILD_EXECUTABLE=ON          \
    -DEUDAQ_BUILD_GUI=ON                 \
    -DUSER_BUILD_TLU_ONLY_CONVERTER=ON   \
    -DEUDAQ_EXTRA_BUILD_NREADER=OFF      \
    -DEUDAQ_LIBRARY_BUILD_LCIO=OFF       \
    ${install_dir}/eudaq
make -j4
make install
export eudaq_DIR=${install_dir}/eudaq

mkdir ${install_dir}/corryvreckan/build
cd ${install_dir}/corryvreckan/build
cmake3 \
    -DBUILD_EventLoaderEUDAQ2=ON          \
    ${install_dir}/corryvreckan
make -j4
make install
cd ${install_dir}

Script to run Corry (mask creation on data stored on EOS):

#!/bin/bash

export install_dir=/eos/home-c/cjessima/Testbeam/202206_R2/software4

${install_dir}/corryvreckan/bin/corry \
	-c "${install_dir}/testbeam-analysis/config/june2021/USBPIX_ss/createmasks.conf" \
	-o detectors_file="${install_dir}/testbeam-analysis/config/june2021/USBPIX_ss/geometries/A.conf" \
	-o EventLoaderEUDAQ2.file_name="/eos/atlas/atlascerngroupdisk/det-itk/general/strips/Testbeam/June2021/raw_data/run002204_210701125256.raw" \
	-o output_directory="/eos/home-c/cjessima/temp/" \
	-o histogram_file="h_ss2204_mask.root" \
	-o log_level=TRACE
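
Since the failures are intermittent, repeating the identical command and tallying the exit codes gives a rough failure rate. A minimal sketch: `repeat_run` is a hypothetical helper name (not part of Corry), and it assumes that a failed run returns a non-zero exit status, which I have not verified for every error listed in this post.

```shell
# repeat_run N CMD [ARGS...]
# Runs CMD N times, stores each run's output in run_<i>.log in the
# current directory, and prints how many runs succeeded and failed.
# Hypothetical helper; assumes a failure shows up as a non-zero exit status.
repeat_run() {
    local n="$1"
    shift
    local ok=0
    local fail=0
    local i=1
    while [ "$i" -le "$n" ]; do
        if "$@" > "run_${i}.log" 2>&1; then
            ok=$((ok + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "ok=${ok} fail=${fail}"
}
```

For example, with the run script above saved as `run_corry.sh` (a file name chosen here for illustration), `repeat_run 20 ./run_corry.sh` would run it 20 times and print the counts.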

Summary of different errors:

From successful run:

...
|19:34:19.032|  (STATUS) =================| Initializing modules |==================
|19:34:19.032|   (TRACE) Creating and accessing ROOT directory
|19:34:19.037|  (STATUS) [I:EventLoaderEUDAQ2:TLU_0] Initializing "EventLoaderEUDAQ2:TLU_0"
|19:34:19.141|   (TRACE) Creating and accessing ROOT directory
...

Error 1:

...
|18:32:21.585|  (STATUS) =================| Initializing modules |==================
|18:32:21.585|   (TRACE) Creating and accessing ROOT directory
|18:32:21.585|   (FATAL) Fatal internal error
                            basic_string::_M_create
                         Cannot continue...

Error 2:

...
|19:33:38.232|  (STATUS) =================| Initializing modules |==================
|19:33:38.232|   (TRACE) Creating and accessing ROOT directory
|19:33:38.233|   (FATAL) Fatal internal error
                            std::bad_alloc
                         Cannot continue...

Error 3 (I don't have the full log, just the error message):

...
(STATUS) =================| Initializing modules |==================
(TRACE) Creating and accessing ROOT directory
(FATAL) Fatal internal error
	basic_string::_M_construct null not valid
	Cannot continue...

Crash 1:

...
|18:26:36.353|  (STATUS) =================| Initializing modules |==================
|18:26:36.353|   (TRACE) Creating and accessing ROOT directory

 *** Break *** segmentation violation

===========================================================
probably-most-relevant lines of backtrace:
===========================================================

#12 std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string (this=0x7ffca90f7670, __str=...) at /cvmfs/sft.cern.ch/lcg/releases/gcc/10.3.0-f5826/x86_64-centos7/lib/gcc/x86_64-pc-linux-gnu/10.3.0/../../../../include/c++/10.3.0/bits/basic_string.h:451
#13 corryvreckan::Configuration::getName[abi:cxx11]() const (this=<optimized out>) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/config/Configuration.cpp:64
#14 0x00007f81dfe49317 in corryvreckan::ModuleManager::initializeAll (this=0x1321200) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/module/ModuleManager.cpp:648
#15 0x00007f81dfe0bef8 in corryvreckan::Corryvreckan::init (this=0x1320f60) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/Corryvreckan.cpp:163
#16 0x0000000000405ef3 in main (argc=13, argv=<optimized out>) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/exec/corry.cpp:188
===========================================================

Crash 2:

...
|18:35:12.417|  (STATUS) =================| Initializing modules |==================
|18:35:12.417|   (TRACE) Creating and accessing ROOT directory

 *** Break *** segmentation violation

===========================================================
probably-most-relevant lines of backtrace:
===========================================================

#11 0x00007f1856b5886d in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >::find (this=0x2f46238, __x=...) at /cvmfs/sft.cern.ch/lcg/releases/gcc/10.3.0-f5826/x86_64-centos7/lib/gcc/x86_64-pc-linux-gnu/10.3.0/../../../../include/c++/10.3.0/bits/stl_map.h:1195
#12 corryvreckan::Configuration::has (this=0x2f461f0, key=...) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/config/Configuration.cpp:46
#13 0x00007f1856b6beed in corryvreckan::ModuleManager::set_module_before (this=<optimized out>, config=...) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/module/ModuleManager.cpp:807
#14 0x00007f1856b6e6de in corryvreckan::ModuleManager::initializeAll (this=0xf63180) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/module/ModuleManager.cpp:679
#15 0x00007f1856b30ef8 in corryvreckan::Corryvreckan::init (this=0xf62ee0) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/core/Corryvreckan.cpp:163
#16 0x0000000000405ef3 in main (argc=13, argv=<optimized out>) at /eos/home-c/cjessima/Testbeam/202206_R2/software3/corryvreckan/src/exec/corry.cpp:188
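
To tally which of these failures a given log contains, the messages quoted above can be matched with grep. A minimal sketch: `classify_log` is a hypothetical helper, the patterns are copied from the log excerpts in this post, and it assumes the exception text sits on the line directly after "Fatal internal error", as it does in all three error excerpts.

```shell
# classify_log FILE
# Prints a short label for the failure signature found in a Corry log.
# Hypothetical helper; patterns are taken from the excerpts in this post.
classify_log() {
    if grep -q "segmentation violation" "$1"; then
        echo "segfault"
    elif grep -q "Fatal internal error" "$1"; then
        # The line following the FATAL message carries the exception text;
        # strip its leading spaces or tabs before printing it.
        grep -A1 "Fatal internal error" "$1" | tail -n 1 | sed 's/^[[:space:]]*//'
    else
        echo "no-error-found"
    fi
}
```

Applied to a set of saved logs (the directory name is illustrative), `for f in logs/*.log; do classify_log "$f"; done | sort | uniq -c` would count how often each signature occurs.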

Hi @cjessima

Somehow the notifications don’t work for me in this forum…

Meanwhile, here is your fix - thanks for discovering this:

/Smon