Every human is made up of trillions of cells, each with its own function, whether it’s carrying oxygen, fighting infections, or building organs. Even within the same tissue, no two cells are exactly alike. Single-cell RNA sequencing (scRNA-seq) allows us to measure the gene expression of individual cells, revealing what each cell is doing at a given moment.
But there’s a catch: single-cell data are massive, high-dimensional, and hard to interpret. Each cell can be represented by thousands of numbers — its gene expression measurements — which traditionally require specialized tools and models to analyze. This makes single-cell analysis slow, difficult to scale, and limited to expert users.
What if we could turn those thousands of numbers into language that humans and language models can understand? That is, what if we could ask a cell how it’s feeling, what it’s doing, or how it might respond to a drug or disease — and get an answer back in plain English? From individual cells to entire tissues, understanding biological systems at this level could transform how we study, diagnose, and treat disease.
Today in “Scaling Large Language Models for Next-Generation Single-Cell Analysis”, we’re excited to introduce Cell2Sentence-Scale (C2S-Scale), a family of powerful, open-source large language models (LLMs) trained to “read” and “write” biological data at the single-cell level. In this post, we’ll walk through the basics of single-cell biology, how we transform cells into sequences of words, and how C2S-Scale opens up new possibilities for biological discovery.
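To make the cell-to-text idea concrete, here is a minimal illustrative sketch of one way to turn an expression vector into a “cell sentence”: rank the genes by measured expression and write out their names in order. The function name, the `top_k` cutoff, and the toy gene list are assumptions for illustration; they are not the exact C2S-Scale pipeline.

```python
import numpy as np

def cell_to_sentence(expression, gene_names, top_k=10):
    """Illustrative sketch: convert one cell's expression vector into a
    'cell sentence' of gene names, ordered from most to least expressed.
    Zero-count genes are dropped; only the top_k genes are kept."""
    order = np.argsort(expression)[::-1]  # gene indices, highest expression first
    expressed = [gene_names[i] for i in order if expression[i] > 0]
    return " ".join(expressed[:top_k])

# Toy example: four genes measured in one cell (hypothetical counts)
genes = ["CD19", "MS4A1", "CD3E", "HBB"]
counts = np.array([5.0, 12.0, 0.0, 1.0])
print(cell_to_sentence(counts, genes))  # -> "MS4A1 CD19 HBB"
```

A sentence like this can then be handed to a language model just like any other text, which is what lets an LLM “read” a cell.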