Evaluating drug toxicity signals: is a hierarchical classification of evidence useful or a hindrance?