Alex Pearwin

Fixing a TMVA error

This post has been archived. It's pretty old and likely concerns topics that are avoidable by using more modern tools and techniques. The original text is preserved below for posterity but it may no longer be relevant or correct.

If you’re using the multivariate analysis package TMVA and are running into the following error during training or testing:

--- <FATAL> Tools: <GetSeparation> signal and background histograms have different or invalid dimensions

It might be failing because you’re giving it one or more NaN values.

By default, TMVA selects its training and testing events randomly from the input data. The selection is repeatable unless you set the RandomSeed option in the splitting options to 0, in which case it changes from run to run. Whether a bad event ends up in a sample TMVA is working on therefore depends on the seed, so you may get this error every time you run TMVA or only occasionally.
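To see why a fixed seed makes the error either permanent or absent, here is a minimal sketch of a seeded random split. This is not TMVA's splitting code; the function name `trainingIndices` is hypothetical, and it simply shuffles event indices with `std::mt19937` and keeps the first `nTrain` as the training sample. With the same seed, the same events are chosen on every run, so one bad event is either always in the training sample or never in it; with a seed that changes per run, it only turns up there some of the time.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical illustration, NOT TMVA's implementation: shuffle the indices
// 0..nEvents-1 with a seeded generator and keep the first nTrain of them as
// the "training" sample. Assumes nTrain <= nEvents.
std::vector<std::size_t> trainingIndices(std::size_t nEvents, std::size_t nTrain,
                                         unsigned seed) {
    std::vector<std::size_t> idx(nEvents);
    std::iota(idx.begin(), idx.end(), std::size_t{0});  // 0, 1, ..., nEvents-1
    std::mt19937 gen(seed);                             // same seed -> same sequence
    std::shuffle(idx.begin(), idx.end(), gen);
    idx.resize(nTrain);                                 // first nTrain = training set
    return idx;
}
```

Calling `trainingIndices(100, 50, 100u)` twice in the same program returns identical selections; only a different seed changes which events are picked.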

After a couple of weeks of working around the issue, I discovered that a single event (a single event, mind you!) had a negative value for a particular variable, and I was passing the logarithm of that variable to TMVA. In C++, the logarithm of a negative number evaluates to NaN, and it was this value that was causing the error.
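A quick standalone check of this behaviour, independent of ROOT and TMVA: `std::log` returns NaN for a negative argument and negative infinity for zero, and `std::isfinite` rejects both. The helper name `logIsUsable` is made up for illustration; treating negative infinity as equally unwelcome to TMVA is an assumption, though the cut shown further on (`> 0`) excludes zero as well.

```cpp
#include <cmath>

// Hypothetical helper: true only if log(x) is a finite number.
// std::log(x) is NaN for x < 0 and -infinity for x == 0, so both fail here.
bool logIsUsable(double x) {
    return std::isfinite(std::log(x));
}
```

For example, `logIsUsable(2.0)` is true, while `logIsUsable(-3.5)` and `logIsUsable(0.0)` are both false.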

You can either apply a cut to the input data:

factory->PrepareTrainingAndTestTree("troublesome_var > 0", "troublesome_var > 0", splitOptions);

or make sure that the events with bad values aren’t present in your input data.
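If you prefer the second route, the bad events can be dropped before the data ever reaches TMVA. The sketch below works on a hypothetical in-memory `Event` struct rather than a real input tree, and assumes the values handed to TMVA are `log(troublesome_var)` and `other_var`; any event for which either would be non-finite is removed.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical event record; in a real analysis these values would be read
// from the input data.
struct Event {
    double troublesome_var;
    double other_var;
};

// Remove every event for which a value given to TMVA would be NaN or infinite.
// Assumes TMVA is given log(troublesome_var) and other_var directly.
std::vector<Event> removeBadEvents(std::vector<Event> events) {
    auto isBad = [](const Event& e) {
        return !std::isfinite(std::log(e.troublesome_var)) ||
               !std::isfinite(e.other_var);
    };
    // remove_if keeps the surviving events in their original order.
    events.erase(std::remove_if(events.begin(), events.end(), isBad),
                 events.end());
    return events;
}
```

Filtering like this has the advantage that you can also count how many events were thrown away, which is worth doing: a single surprising event is exactly what caused the error here.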

The TMVA user’s guide documents the PrepareTrainingAndTestTree method, amongst other things.