arXiv:2602.17739v1 Announce Type: cross
Abstract: Genomic sequences span billions of base pairs (bp), posing a fundamental challenge for genome-scale foundation models. Existing approaches largely sidestep this barrier by either scaling relatively small models to long contexts or relying on heavy multi-GPU parallelism. Here we introduce GeneZip, a DNA compression model that leverages a key biological prior: genomic information is highly imbalanced. Coding regions comprise only a small fraction of the genome (about 2 percent) yet are information-dense, whereas most non-coding sequence is comparatively information-sparse. GeneZip couples HNet-style dynamic routing with a region-aware compression-ratio objective, enabling adaptive allocation of the representation budget across genomic regions. As a result, GeneZip achieves 137.6x compression with only a 0.31 increase in perplexity. On downstream long-context benchmarks, GeneZip achieves comparable or better performance on contact map prediction, expression quantitative trait loci (eQTL) prediction, and enhancer-target gene prediction. By reducing effective sequence length, GeneZip unlocks simultaneous scaling of context and capacity: compared to the prior state-of-the-art model JanusDNA, it enables training models 82.6x larger at a 1M-bp context, supporting a 636M-parameter GeneZip model at that context length. All models in this paper can be trained on a single A100 80GB GPU.
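The abstract does not spell out the form of the region-aware compression-ratio objective, but a minimal sketch of one plausible instantiation is below. It assumes an HNet-style router that emits per-base boundary probabilities (so the expected number of chunks is their sum), a binary coding/non-coding annotation mask available at training time, and hypothetical target ratios and class name (`RegionAwareRatioLoss`, `coding_ratio`, `noncoding_ratio`) that are illustrative, not the paper's actual values or API.

```python
import torch
import torch.nn as nn

class RegionAwareRatioLoss(nn.Module):
    """Hypothetical sketch of a region-aware compression-ratio penalty.

    boundary_probs: (seq_len,) per-base probability that the router opens
        a new chunk (HNet-style dynamic chunking). Summing these gives the
        expected chunk count, so expected ratio = seq_len / sum(probs).
    coding_mask: (seq_len,) with 1.0 for coding bases, 0.0 otherwise,
        assumed to come from a genome annotation track.
    """

    def __init__(self, coding_ratio=8.0, noncoding_ratio=256.0, weight=0.1):
        super().__init__()
        self.coding_ratio = coding_ratio        # denser token budget for coding DNA
        self.noncoding_ratio = noncoding_ratio  # sparser budget for non-coding DNA
        self.weight = weight                    # illustrative regularizer weight

    def forward(self, boundary_probs, coding_mask):
        eps = 1e-8
        n_coding = coding_mask.sum() + eps
        n_noncoding = (1.0 - coding_mask).sum() + eps

        # Expected chunk counts per region under the router's distribution.
        exp_coding = (boundary_probs * coding_mask).sum()
        exp_noncoding = (boundary_probs * (1.0 - coding_mask)).sum()

        # Chunk counts implied by each region's target compression ratio.
        tgt_coding = n_coding / self.coding_ratio
        tgt_noncoding = n_noncoding / self.noncoding_ratio

        # Penalize normalized squared deviation from the per-region targets;
        # this term would be added to the usual language-modeling loss.
        penalty = ((exp_coding - tgt_coding) / n_coding) ** 2 \
                + ((exp_noncoding - tgt_noncoding) / n_noncoding) ** 2
        return self.weight * penalty

# Usage sketch on a toy 1,000-bp window with a 100-bp coding segment.
loss_fn = RegionAwareRatioLoss()
boundary_probs = torch.rand(1000)
coding_mask = torch.zeros(1000)
coding_mask[400:500] = 1.0
print(loss_fn(boundary_probs, coding_mask))
```

Under this formulation, lowering `coding_ratio` relative to `noncoding_ratio` pushes the router to place more chunk boundaries inside coding regions, which is one way to realize the adaptive budget allocation the abstract describes.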