Data-driven fault diagnosis requires training data that is representative of the different operating conditions of the system in order to capture its behavior. When training data is limited, one solution is to incorporate physical insights into machine learning models to improve their effectiveness. However, while previous works have shown the usefulness of hybrid approaches for fault isolation, the impact of the training data must be taken into consideration when drawing conclusions from data-driven residuals in a consistency-based diagnosis framework. By incorporating knowledge of the physical interactions between signals, a hybrid fault diagnosis approach can enforce properties of the residual generators that make it possible to isolate faults that are not represented in the training data. The objective of this work is to analyze the impact of limited training data when training neural network-based residual generators. It is also investigated how using structural information to select the network structure can mitigate the effects of limited training data, and how the performance of hybrid approaches can be improved in the face of this challenge.
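To make the consistency-based reasoning concrete, the following is a minimal sketch of single-fault isolation from triggered residuals. It assumes a hypothetical fault signature matrix (residuals `r1`–`r3`, faults `f1`–`f3`) that records which faults each residual is sensitive to; these names and the matrix are illustrative and not taken from this work. A triggered residual implies that at least one fault in its support is present, so under a single-fault assumption a fault remains a candidate only if it is in the support of every triggered residual. The conclusion is only valid if each residual is truly insensitive to the faults outside its support, which is the property a hybrid approach aims to enforce even for faults that are absent from the training data.

```python
from typing import Dict, Set

# Hypothetical fault signature matrix: residual -> set of faults it is sensitive to.
# A fault missing from a residual's set is assumed unable to trigger that residual.
FSM: Dict[str, Set[str]] = {
    "r1": {"f1", "f2"},
    "r2": {"f2", "f3"},
    "r3": {"f1", "f3"},
}


def single_fault_candidates(fsm: Dict[str, Set[str]], triggered: Set[str]) -> Set[str]:
    """Return the single faults consistent with the triggered residuals.

    Consistency-based reasoning without exoneration: each triggered residual
    implies that at least one fault in its support is present, so a single
    fault is a candidate only if it lies in the support of every triggered
    residual. Residuals that are not triggered are not used to rule faults out.
    """
    candidates = set().union(*fsm.values())  # start from all known faults
    for r in triggered:
        candidates &= fsm[r]  # keep only faults that can explain this alarm
    return candidates


print(sorted(single_fault_candidates(FSM, {"r1", "r3"})))  # ['f1']
print(sorted(single_fault_candidates(FSM, {"r2"})))        # ['f2', 'f3']
```

If a data-driven residual generator for `r1` was never trained on data containing `f3` and was not structurally constrained to be insensitive to it, `r1` might still react to `f3`. In that case, concluding that only `f1` explains the alarms in `r1` and `r3` would not be justified, which illustrates why the training data matters when drawing conclusions from data-driven residuals.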