Borzoi – predicting RNA-seq coverage with a Transformer-variant model

Paper URL: https://www.nature.com/articles/s41588-024-02053-6

Data preprocessing

Transformation: Squashed scale

Owing to the relatively large dynamic range of RNA-seq, we normalized each coverage track by raising its bin values to the power 3/4. If a bin value was still larger than 384 after exponentiation, we applied an additional square-root transform to the residual above 384. These operations effectively limit the contribution that very highly expressed genes can impose on the model training loss. The formula below summarizes the transform applied to the jth bin for tissue t of target tensor y:

$$ \boldsymbol{y}_{j,t}^{(\text{squashed})} = \begin{cases} \boldsymbol{y}_{j,t}^{(3/4)} & \text{if } \boldsymbol{y}_{j,t}^{(3/4)} \leq 384 \\ 384 + \sqrt{\boldsymbol{y}_{j,t}^{(3/4)} - 384} & \text{otherwise} \end{cases} $$

We refer to this set of transformations as ‘squashed scale’ in the main text.
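
A minimal NumPy sketch of this transform (the function name `squash` and the vectorized form are my own; the 3/4 exponent and the 384 clip value come from the formula above):

```python
import numpy as np

def squash(y, clip=384.0):
    """Squashed-scale transform for RNA-seq coverage bins.

    Raise bin values to the 3/4 power, then square-root-compress
    whatever still exceeds `clip` after exponentiation.
    """
    y34 = np.power(y, 0.75)
    excess = np.maximum(y34 - clip, 0.0)       # residual above the clip value
    return np.minimum(y34, clip) + np.sqrt(excess)
```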

Input sequence attribution methods

Gradient Saliency

pros: Fast – a single forward and backward pass scores every input position at once.

cons: Gradients can be noisy and saturate; the scores are only a first-order, local approximation of the model's behavior.

Implementation:

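Since the screenshot is gone, here is a minimal PyTorch sketch of input-gradient (gradient × input) saliency; the model interface and tensor shapes are illustrative assumptions, not Borzoi's actual API:

```python
import torch

def gradient_saliency(model, onehot_seq, track):
    """Per-position saliency for one sequence.

    onehot_seq: (L, 4) float tensor, one-hot encoded DNA.
    track: index of the output coverage track to attribute.
    """
    x = onehot_seq.clone().requires_grad_(True)
    pred = model(x.unsqueeze(0))          # assumed output shape (1, bins, tracks)
    pred[0, :, track].sum().backward()    # scalar target for the gradient
    # gradient x input zeroes out the three unobserved bases per position
    return (x.grad * x).sum(dim=-1)       # (L,) saliency scores
```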

In-silico Mutagenesis

pros: Directly measures the model's predicted effect of each substitution; immune to gradient saturation.

cons: Computationally expensive and slow – single-nucleotide ISM needs roughly 3L extra forward passes for a sequence of length L.

Implementation:

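In place of the screenshots, a hedged PyTorch sketch of single-nucleotide ISM using the same assumed model interface as the saliency example above; it also makes the cost explicit – one extra forward pass per alternative base:

```python
import torch

def ism_scores(model, onehot_seq, track):
    """Prediction delta for every possible single-base substitution.

    Returns an (L, 4) tensor; the reference base keeps a delta of 0.
    """
    L = onehot_seq.shape[0]
    deltas = torch.zeros(L, 4)
    with torch.no_grad():
        ref = model(onehot_seq.unsqueeze(0))[0, :, track].sum()
        for pos in range(L):
            for base in range(4):
                if onehot_seq[pos, base] == 1:
                    continue                  # skip the reference base
                mut = onehot_seq.clone()
                mut[pos] = 0.0
                mut[pos, base] = 1.0          # substitute the alternative base
                alt = model(mut.unsqueeze(0))[0, :, track].sum()
                deltas[pos, base] = alt - ref
    return deltas
```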

Overview of deep-learning interpretability techniques (generated by AI)

Gradient-based methods

Perturbation and intervention methods

Activation-based methods

Attention-based methods

Concept- and semantics-level explanations

Example-based methods

Post-hoc explanation methods

Intrinsic model interpretability

Evaluation and comparison

Each method has its own strengths and limitations; choosing a suitable interpretability technique depends on the specific application scenario, model type, and explanation needs.

genoRetriever – predicting TSS…

Two methods might help

To improve the visual clarity of the learned feature curves, we apply L2 regularization to the weights of the second convolution layer in each consensus network with a scale factor of 2 × 10⁻³. To prevent attention bias in downstream prediction, we apply L1 regularization to all trainable convolution weights: a scale factor of 4 × 10⁻⁵ for the first encoding layer and 5 × 10⁻⁵ for the second convolution layer.
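A hedged PyTorch sketch of these per-layer penalties added to the training loss (the layer names are placeholders; only the scale factors come from the paragraph above, and I am assuming the L2 term applies to the same second convolution that carries the 5 × 10⁻⁵ L1 term):

```python
import torch

def weight_penalty(conv_first, conv_second):
    """Per-layer regularization terms, scales from the text above."""
    l1_first = 4e-5 * conv_first.weight.abs().sum()     # L1, first encoding layer
    l1_second = 5e-5 * conv_second.weight.abs().sum()   # L1, second conv layer
    l2_second = 2e-3 * conv_second.weight.pow(2).sum()  # L2, second conv layer
    return l1_first + l1_second + l2_second

# usage inside the training step:
# loss = criterion(pred, target) + weight_penalty(conv_first, conv_second)
```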

My intuition about L1 regularization and L2 regularization

L1 Regularization (Lasso): penalizes the sum of absolute weight values, pushing many weights to exactly zero.

L2 Regularization (Ridge): penalizes the sum of squared weight values, shrinking all weights smoothly toward zero without eliminating them.

Key Differences:

  1. Sparsity: L1 leads to sparse solutions (many weights exactly zero); L2 retains all weights (see the sketch after this list).
  2. Optimization: the L1 penalty is non-differentiable at zero, which complicates optimization; L2 is smooth and simpler to optimize.
  3. Use Cases: L1 is good for feature selection; L2 is better for handling multicollinearity.
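
A toy, closed-form illustration of point 1 for a single weight (the target value 0.05 and the penalty strength are arbitrary choices for the demonstration):

```python
import numpy as np

t = 0.05   # unregularized least-squares solution for one weight
lam = 0.1  # penalty strength

# L1 (lasso): soft-thresholding; weights smaller than lam snap to exactly 0
w_l1 = np.sign(t) * max(abs(t) - lam, 0.0)

# L2 (ridge): uniform shrinkage; the weight shrinks but stays nonzero
w_l2 = t / (1.0 + 2.0 * lam)

print(w_l1)  # 0.0     -> sparse solution
print(w_l2)  # ~0.0417 -> shrunk but retained
```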

Scores computed by the discrete Fourier transform (DFT) could reflect the combined positional and abundance effects of motif removal on transcription initiation.

$$ F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt $$

where f(t) is the time-domain signal (the effect curve), F(ω) is its frequency-domain representation, ω is the angular frequency, and i is the imaginary unit.
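
A minimal numpy.fft sketch of scoring an effect curve this way; reducing the spectrum to its magnitude is my assumption, since the notes don't specify how the transform is summarized into a score:

```python
import numpy as np

def dft_scores(effect_curve):
    """Magnitude spectrum of a motif-removal effect curve.

    effect_curve: 1-D array of changes in the predicted initiation
    signal at each position after removing a motif.
    """
    spectrum = np.fft.rfft(effect_curve)        # one-sided DFT
    freqs = np.fft.rfftfreq(len(effect_curve))  # frequency in cycles per bin
    return freqs, np.abs(spectrum)              # positional + abundance effects
```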