Abstract Cellulose is a widespread carbon source in nature. However, extracting carbon from cellulose is difficult for any organism because of its highly complex structure, and only a few taxonomic groups are known to decompose it. They do so by producing cellulases, a set of enzymes that break the beta-glycosidic bonds in cellulose. Cellulases were identified in 1,735 metagenomes from 225 bioprojects. The resulting set of 12,837 metagenome-derived cellulases encompasses three catalytic functions: exoglucanases (CBH, 1,042), endoglucanases (EG, 5,685), and beta-glucosidases (βG, 6,110). All three enzymatic functions are thought to be necessary to drive cellulose through the cascade of reactions that makes it available as glucose. These metagenome-derived cellulases were clustered into protein families separately for each EC category, resulting in a total of 136 clusters, most of them for EG (97 clusters), followed by βG (19 clusters) and CBH (19 clusters). These clusters provide a useful cellulase dataset for future research on cellulase utilization.