Paper URL: https://www.nature.com/articles/s41588-024-02053-6
Owing to the relatively large dynamic range of RNA-seq, we normalized each coverage track by exponentiating its bin values by 3/4. If bin values were still larger than 384 after exponentiation, we applied an additional square-root transform to the residual value. These operations effectively limit the contribution that very highly expressed genes can impose on the model training loss. The formula below summarizes the transform applied to the jth bin for tissue t of target tensor y:
$$
\boldsymbol{y}_{j,t}^{(\text{squashed})} =
\begin{cases}
\boldsymbol{y}_{j,t}^{(3/4)} & \text{if } \boldsymbol{y}_{j,t}^{(3/4)} \leq 384 \\
384 + \sqrt{\boldsymbol{y}_{j,t}^{(3/4)} - 384} & \text{otherwise}
\end{cases}
$$
We refer to this set of transformations as ‘squashed scale’ in the main text.
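For concreteness, here is a minimal NumPy sketch of the squashed-scale transform as defined above; the function name and default arguments are illustrative and not taken from the paper's code.

```python
import numpy as np

def squashed_scale(coverage, clip=384.0, exponent=0.75):
    """Apply the 'squashed scale' transform described above:
    raise bin values to the 3/4 power, then square-root the
    residual of any value that still exceeds the clip threshold."""
    y = np.asarray(coverage, dtype=np.float64) ** exponent
    over = y > clip
    y[over] = clip + np.sqrt(y[over] - clip)
    return y

# Very highly expressed bins are compressed much more strongly
# than moderately expressed ones.
print(squashed_scale([10.0, 1_000.0, 1_000_000.0]))
```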
GradCAM/GradCAM++: Combine feature maps with gradient information to generate class activation heatmaps; particularly suited to CNNs.
DeepLIFT: Compares each neuron's activation with a reference activation and assigns contribution scores based on the difference.
Input×Gradient: Multiplies the input by its gradient to highlight important features (a minimal sketch follows this list).
LIME (Local Interpretable Model-agnostic Explanations): Perturbs the input and fits a local interpretable model to approximate the complex model.
SHAP (SHapley Additive exPlanations): A game-theoretic approach that computes each feature's contribution to the prediction.
Occlusion/Perturbation: Masks or modifies different parts of the input and observes how the model output changes.
CAM (Class Activation Mapping): Uses the feature maps before the global average pooling layer to identify discriminative regions.
Feature Visualization: Optimizes the input to maximize the activation of a specific neuron, visualizing what that neuron prefers.
Activation Maximization: Generates synthetic inputs that maximize the score of a specific class.
Attention Maps: In models with attention mechanisms, visualizes the attention weights to show which regions the model focuses on.
Transformer Explainability: Visualizes the self-attention matrices of Transformer models to explain relationships between tokens.
TCAV (Testing with Concept Activation Vectors): Tests how important high-level human concepts are to the model's decisions.
Network Dissection: Maps individual neurons to interpretable visual concepts.
Concept Bottleneck Models: Force the model to make predictions through human-understandable concepts.
Influence Functions: Identify the training samples with the greatest influence on a specific prediction.
Nearest Neighbors: Explains a prediction by finding similar samples in activation space.
Surrogate models: Train simple interpretable models (decision trees, linear models) to mimic the behavior of the complex model.
Rule Extraction: Extracts sets of rules from a neural network.
Self-explaining models: Models designed with interpretability in mind, such as attention mechanisms and prototype networks.
Sparse models: Force most weights to zero to improve interpretability.
Explanation stability: Evaluates whether similar inputs receive consistent explanations.
Explanation faithfulness: Measures how well an explanation reflects the model's actual behavior.
Human evaluation: Assesses how helpful explanations are to people through user studies.
Each method has its own strengths and limitations; choosing the right interpretability technique depends on the application scenario, the model type, and the explanation requirements.
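As a concrete illustration of one method from this list, below is a minimal PyTorch sketch of Input×Gradient; the toy model, the input shape, and the choice of a summed scalar output are placeholder assumptions for the example only.

```python
import torch
import torch.nn as nn

# Toy stand-in model; any differentiable model is handled the same way.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 8, requires_grad=True)  # input to explain
score = model(x).sum()                     # scalar output to attribute
score.backward()                           # fills x.grad with d(score)/d(x)

# Input×Gradient attribution: element-wise product of input and gradient.
attribution = (x * x.grad).detach()
print(attribution)
```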
To improve the visual clarity of the learned feature curves, we apply L2 regularization to the weights of the second convolution layer in each consensus network with a scale factor of 2 × 10⁻³. To prevent attention bias in downstream prediction, we apply L1 regularization to all trainable convolution weights: a scale factor of 4 × 10⁻⁵ for the first encoding layer and 5 × 10⁻⁵ for the second convolution layer.
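A minimal sketch of how these penalties could be attached in a Keras-style model; the filter counts and kernel sizes are placeholders, and only the regularizer scale factors come from the description above.

```python
from tensorflow.keras import layers, regularizers

# First encoding convolution: L1 penalty of 4e-5 on its kernel weights.
conv1 = layers.Conv1D(64, kernel_size=15, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l1(4e-5))

# Second convolution: L1 penalty of 5e-5 plus L2 penalty of 2e-3.
conv2 = layers.Conv1D(64, kernel_size=5, padding="same", activation="relu",
                      kernel_regularizer=regularizers.l1_l2(l1=5e-5, l2=2e-3))
```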
L1 Regularization (Lasso):
Definition: Adds the absolute value of weights to the loss function.
Formula:
$$
J(\theta) = \text{Loss} + \lambda \sum_{i=1}^{n} \left| \theta_i \right|
$$
Characteristics: Encourages sparsity; can drive some weights exactly to zero, effectively performing feature selection.
L2 Regularization (Ridge):
Definition: Adds the square of weights to the loss function.
Formula:
$$
J(\theta) = \text{Loss} + \lambda \sum_{i=1}^{n} \theta_i^2
$$
Characteristics: Shrinks all weights smoothly toward zero without making them exactly zero; penalizes large weights more heavily than small ones.
Key Differences: L1 yields sparse solutions and implicit feature selection, whereas L2 spreads shrinkage across all weights and keeps them small but nonzero; L1's penalty is non-differentiable at zero, while L2's is smooth everywhere.
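A small NumPy sketch of the two penalty terms defined above; the weight vector is synthetic, and the scale factors reuse the values quoted earlier purely as example inputs.

```python
import numpy as np

def l1_penalty(theta, lam):
    """lambda * sum(|theta_i|): the Lasso term added to the loss."""
    return lam * np.sum(np.abs(theta))

def l2_penalty(theta, lam):
    """lambda * sum(theta_i^2): the Ridge term added to the loss."""
    return lam * np.sum(np.square(theta))

theta = np.array([0.0, 0.5, -2.0, 0.0, 3.0])
print(l1_penalty(theta, 4e-5))  # 4e-5 * 5.5
print(l2_penalty(theta, 2e-3))  # 2e-3 * 13.25
```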
$$
F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i \omega t}\, dt
$$
where f(t) is the time-domain signal (the effect curve), F(ω) is its frequency-domain representation, ω is the angular frequency, and i is the imaginary unit.
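As an illustration, here is a minimal NumPy sketch that computes a discrete analogue of F(ω) for a synthetic effect curve; the signal, its length, and the unit bin spacing are assumptions made only for this example.

```python
import numpy as np

# Synthetic effect curve: a slow oscillation plus a faster one.
t = np.arange(512)
f_t = np.sin(2 * np.pi * t / 128) + 0.3 * np.sin(2 * np.pi * t / 16)

# Discrete analogue of F(omega): NumPy's real-input FFT.
F = np.fft.rfft(f_t)
freqs = np.fft.rfftfreq(f_t.size, d=1.0)  # cycles per bin

# The two dominant components correspond to periods 128 and 16 bins.
top = np.argsort(np.abs(F))[-2:]
print(freqs[top], np.abs(F)[top])
```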