A quantitative map of protein sequence space for the cis-defensin superfamily
© The Author(s) 2018.
Motivation: The cis-defensins are a superfamily of small, cationic, cysteine-rich proteins that share a common scaffold but have highly divergent sequences and varied functions, from host defence to signalling. Superfamily members are most abundant in plants (with some genomes containing hundreds of members), but are also found across fungi and invertebrates. However, of the thousands of cis-defensin sequences in databases, only a handful have solved structures or assigned activities. Non-phylogenetic sequence-analysis methods are therefore necessary to use the relationships within the superfamily to classify members, and to predict and engineer functions.
Results: We show that the generation of a quantitative map of sequence space allows these highly divergent sequences to be usefully analyzed. This information-rich technique can identify natural groupings of sequences with similar biophysical properties, detect interpretable covarying properties, and provide information on typical or intermediate sequences for each cluster. The cis-defensin superfamily contains clearly defined groups, identifiable based on their biophysical properties and motifs. The organization of sequences within this space also provides a foundation for understanding the ancient evolution of the superfamily.
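To illustrate the general idea of placing sequences in a quantitative space, the sketch below converts each sequence into a small vector of biophysical properties and projects those vectors onto principal components. It is a minimal illustration under stated assumptions, not the method used in this work: the choice of properties (net charge, mean Kyte-Doolittle hydropathy, cysteine fraction), the function names `features` and `sequence_map`, and the four toy peptides are all hypothetical and invented for this example.

```python
import numpy as np

# Kyte-Doolittle hydropathy scale (a standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def features(seq):
    """Return [net charge, mean hydropathy, cysteine fraction] for one sequence.

    Assumption: charge counts K/R as +1 and D/E as -1 and ignores histidine.
    """
    seq = seq.upper()
    charge = sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")
    hydropathy = sum(KD[a] for a in seq) / len(seq)
    cys_fraction = seq.count("C") / len(seq)
    return [charge, hydropathy, cys_fraction]


def sequence_map(seqs, n_components=2):
    """Project standardized property vectors onto their principal components."""
    X = np.array([features(s) for s in seqs], dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # a constant property carries no information; avoid 0/0
    Z = (X - X.mean(axis=0)) / std
    # PCA via singular value decomposition of the standardized matrix.
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return coords, explained[:n_components]


# Invented toy peptides (NOT real defensins): two cationic, two anionic.
TOY_SEQS = [
    "RKCKRGCKKRCAKRC",
    "KKCRRKCGKKCRRKC",
    "DECLLVCAEDCLIVC",
    "EDCVVLCDEACILLC",
]

coords, explained = sequence_map(TOY_SEQS)
for seq, xy in zip(TOY_SEQS, coords):
    print(seq, np.round(xy, 2))
print("variance explained:", np.round(explained, 2))
```

In this toy case the first axis separates the cationic peptides from the anionic ones, which is the kind of grouping by biophysical properties that the abstract describes. A real analysis would need many more sequences and a more carefully chosen set of properties.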
This work has been supported by the Australian Research Council (grant: DP150104386).
Journal: Bioinformatics (Oxford, England)
Pagination: 9p. (p. 743-752)
Publisher: Oxford University Press
Rights Statement: The Author reserves all moral rights over the deposited text and must be credited if any re-use occurs. Documents deposited in OPAL are the Open Access versions of outputs published elsewhere. Changes resulting from the publishing process may therefore not be reflected in this document. The final published version may be obtained via the publisher’s DOI. Please note that additional copyright and access restrictions may apply to the published version.
Science & Technology; Life Sciences & Biomedicine; Technology; Physical Sciences; Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability; Biochemistry & Molecular Biology; Computer Science; Mathematics; PLANT DEFENSINS; R PACKAGE; EVOLUTION; TOOL; ALIGNMENT; PEPTIDES; FAMILIES; ORGANIZATION; SELECTION; FUNGAL; Defensins; Evolution, Molecular; Amino Acid Sequence; Genome; Bioinformatics