Evolution of sequence-diverse disordered regions in a protein family: order within the chaos
Journal contribution, posted on 2020-12-08, 23:44, authored by Thomas Shafee, Tony Bacic, Kim Johnson
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

Approaches for studying the evolution of globular proteins are now well established, yet they are unsuitable for disordered sequences. Our understanding of the evolution of proteins containing disordered regions therefore lags behind that of globular proteins, limiting our capacity to estimate their evolutionary history, classify paralogs, and identify potential sequence-function relationships. Here, we overcome these limitations by using new analytical approaches that project representations of sequence space to dissect the evolution of proteins with both ordered and disordered regions, and the correlated changes between these. We use the fasciclin-like arabinogalactan proteins (FLAs) as a model family, since they contain a variable number of globular fasciclin domains as well as several distinct types of disordered regions: proline (Pro)-rich arabinogalactan (AG) regions and longer Pro-depleted regions. Sequence space projections of fasciclin domains from 2,019 FLAs from 78 species identified distinct clusters corresponding to different types of fasciclin domains. Clusters can similarly be identified in the seemingly random Pro-rich AG and Pro-depleted disordered regions. Sequence features of the globular and disordered regions clearly correlate with one another, implying coevolution of these distinct regions; they also correlate with the N-linked and O-linked glycosylation motifs. We reconstruct the overall evolutionary history of the FLAs, annotated with the changing domain architectures, glycosylation motifs, number and length of AG regions, and disordered region sequence features. Mapping these features onto the functionally characterized FLAs enables their sequence-function relationships to be interrogated.
These findings will inform research on the abundant disordered regions in protein families from all kingdoms of life.
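The abstract does not spell out how sequences are represented or projected. As a rough illustration only, the sketch below uses a deliberately simple stand-in, which is not the authors' pipeline: each disordered region becomes its 20-dimensional amino acid composition vector, and the set is projected onto its first principal component (PCA computed by power iteration). The toy sequences and the names `composition` and `pc1_scores` are hypothetical. Even this crude representation should place Pro-rich, AG-like stretches and Pro-depleted stretches at opposite ends of the first axis.

```python
import math

# The 20 standard amino acids; each sequence becomes a 20-dimensional composition vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in seq (other characters are ignored)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def pc1_scores(vectors, iterations=200):
    """Project mean-centred vectors onto their first principal axis via power iteration."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    centred = [[v[i] - mean[i] for i in range(dim)] for v in vectors]
    # Scatter matrix: proportional to the covariance, so it has the same principal axes.
    scatter = [[sum(c[i] * c[j] for c in centred) for j in range(dim)] for i in range(dim)]
    # Start from the most extreme data point. (An all-ones start would fail here:
    # centred composition vectors always sum to zero, so they are orthogonal to it.)
    axis = max(centred, key=lambda c: sum(x * x for x in c))
    for _ in range(iterations):
        nxt = [sum(scatter[i][j] * axis[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        axis = [x / norm for x in nxt]
    return [sum(ci * ai for ci, ai in zip(c, axis)) for c in centred]


# Hypothetical toy regions: three Pro-rich, AG-like stretches and three Pro-depleted stretches.
regions = {
    "ag_1": "SPAPSPTPSPAPAPSPSP",
    "ag_2": "APSPTPAPSPSPAPTPAP",
    "ag_3": "SPSPAPAPTPSPSPAPSP",
    "dep_1": "SGDNKTSEGLVVDAQKSG",
    "dep_2": "TGSEKNDLSVGAEKTDNS",
    "dep_3": "SEGKVTDNSGLAEKDTGS",
}
scores = dict(zip(regions, pc1_scores([composition(s) for s in regions.values()])))
for name, score in scores.items():
    print(f"{name}\t{score:+.3f}")
```

The sign of a principal axis is arbitrary, so only the separation between the two groups is meaningful, not which group scores positive. A real analysis of thousands of sequences would likely use richer features and a proper linear algebra library; the pure-Python version here just keeps the idea self-contained.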
T.S., A.B., and K.J. were supported by the La Trobe Institute for Agriculture and Food, and K.J. by a La Trobe Research Focus Area grant (Grant No. 2000004372).
Journal: Molecular Biology and Evolution
Publisher: Oxford University Press (OUP)
Rights Statement: The authors reserve all moral rights over the deposited text and must be credited if any reuse occurs. Documents deposited in OPAL are the Open Access versions of outputs published elsewhere, so changes resulting from the publishing process may not be reflected in this document. The final published version may be obtained via the publisher’s DOI. Please note that additional copyright and access restrictions may apply to the published version.
Keywords: Science & Technology; Life Sciences & Biomedicine; Biochemistry & Molecular Biology; Evolutionary Biology; Genetics & Heredity; disordered protein regions; fasciclin-like arabinogalactan proteins; sequence space; sequence analysis; hydroxyproline-rich glycoproteins; CELL-ADHESION MOLECULE; ARABINOGALACTAN-PROTEINS; PROLYL 4-HYDROXYLASE; MUCILAGE ADHERENCE; FASCICLIN-I; ARABIDOPSIS; ALIGNMENT; PLANT; TOOL; IDENTIFICATION