Quantitative Input Feature Usage

Data science plays a crucial role in critical decision making in a variety of fields, including healthcare, finance, and avionics. Programming errors in these safety-critical settings can therefore have disastrous consequences, especially when they do not cause software failures but instead produce a plausible yet erroneous outcome. Such bugs are hard to spot since they give no indication that something went wrong. A potential source of such errors is an input feature that has an unexpected impact on the program computation compared to the developers' expectations. The likelihood that a programming error makes some input feature more, or less, influential than the application expects is even higher for data science applications, where data goes through a long pipeline of layers that filter, merge, and manipulate it.
 
We present a novel quantitative input feature usage framework, based on abstract interpretation, to discriminate between input features with different impacts on the program's outcome. This framework can be used to identify input features that have a disproportionate impact on the computation. Such knowledge could either certify intended behaviour or reveal potential flaws, by matching the developers' intuition about the expected impact of their input features against the actual result of the quantitative analysis. A preliminary implementation shows the practical applicability of our approach, computing an upper bound on the impact of input features for feed-forward ReLU-activated neural networks.
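To make the idea concrete, the following is a minimal sketch (not the paper's actual framework, which uses abstract interpretation) of how one might over-approximate the impact of a single input feature on a feed-forward ReLU network using plain interval arithmetic: let one feature vary over its range while the others are fixed, propagate the resulting input box through the layers, and report the width of the output interval. All function names (`interval_affine`, `output_range`, `feature_impact`) and the layer representation are illustrative assumptions.

```python
import numpy as np

def interval_affine(lo, hi, W, b):
    # Sound interval propagation through x -> W @ x + b:
    # positive weights pick up the lower/upper bound directly,
    # negative weights swap them.
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def output_range(layers, lo, hi):
    # layers: list of (W, b) pairs; ReLU between layers, none after the last.
    for i, (W, b) in enumerate(layers):
        lo, hi = interval_affine(lo, hi, W, b)
        if i < len(layers) - 1:
            lo, hi = np.maximum(lo, 0.0), np.maximum(hi, 0.0)
    return lo, hi

def feature_impact(layers, lo, hi, j):
    # Upper bound on the impact of feature j: output interval width
    # when only feature j varies over [lo[j], hi[j]] and every other
    # feature is pinned to the midpoint of its range.
    mid = (lo + hi) / 2.0
    flo, fhi = mid.copy(), mid.copy()
    flo[j], fhi[j] = lo[j], hi[j]
    out_lo, out_hi = output_range(layers, flo, fhi)
    return float(np.max(out_hi - out_lo))

# Hypothetical one-layer network y = 2*x0 + 0.5*x1 over the box [0,1]^2.
layers = [(np.array([[2.0, 0.5]]), np.array([0.0]))]
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])
impact0 = feature_impact(layers, lo, hi, 0)  # x0 dominates: width 2.0
impact1 = feature_impact(layers, lo, hi, 1)  # x1 is weaker: width 0.5
```

Interval arithmetic is deliberately coarse (it ignores correlations between neurons), so this yields a sound but possibly loose over-approximation; a developer comparing `impact0` and `impact1` against their expectations is exactly the use case the framework targets.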