Prediction and application of intrinsically disordered regions in practice

by Adam Górka

Department of Physical Biochemistry, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Kraków, Poland.

e-mail: adam.gorka@uj.edu.pl

Protein quartet model

 

Figure 1

Proteins are the major components of living cells. It was discovered relatively recently that proteins and protein regions may exist in at least four different states of structural organization: ordered, molten globule, pre-molten globule, and coil-like (Figure 1). Moreover, protein function may be associated with any of these distinct states or with transitions between them. Entire proteins or protein regions in the pre-molten globule or coil-like state display no unique tertiary structure [1]. These Intrinsically Disordered Proteins (IDPs) and Intrinsically Disordered Regions (IDRs) are involved in key biological processes including transcription regulation, cell cycle control, recognition, and signaling [2].

Order vs. Disorder

In contrast to ordered regions (ORs), IDRs exist as dynamic ensembles in which atom positions and backbone Ramachandran angles vary significantly over time with no specific equilibrium values. The conformational changes of IDRs are typically non-cooperative and random. Their structure changes over time, which does not exclude the temporary presence of local functional secondary structure that fluctuates in the absence of stabilizing forces [3, 4].

How to become a disordered protein

The amino acid chain of IDPs is characterized by a high mean net charge and a low mean hydrophobicity per residue. The high net charge gives rise to charge-charge repulsion that outweighs the weak hydrophobic force driving protein collapse, and thus prevents formation of a stable tertiary structure [5, 6].
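The balance between net charge and hydrophobicity can be illustrated with a minimal sketch. It assumes the Kyte-Doolittle hydropathy scale rescaled to the range 0-1, a net charge counted from K, R, D and E only (histidine ignored), and the linear boundary <R> = 2.785 <H> - 1.151 commonly used in charge-hydropathy plots. None of these details are given in the text above, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; not specified in the text).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Mean hydropathy per residue, rescaled from [-4.5, 4.5] to [0, 1]."""
    seq = seq.upper()
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)


def mean_net_charge(seq):
    """Absolute net charge per residue, counting K/R as +1 and D/E as -1."""
    seq = seq.upper()
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return abs(positive - negative) / len(seq)


def ch_classify(seq, slope=2.785, intercept=-1.151):
    """Label a sequence by its position relative to the assumed boundary line."""
    boundary = slope * mean_hydropathy(seq) + intercept
    return "disordered" if mean_net_charge(seq) > boundary else "ordered"


if __name__ == "__main__":
    for name, seq in [("charged", "EEEEEKKEEE"), ("hydrophobic", "LLLLLVVVVV")]:
        print(name, round(mean_hydropathy(seq), 3),
              round(mean_net_charge(seq), 3), ch_classify(seq))
```

Such a whole-sequence estimate is coarse: it says nothing about where in the chain disorder occurs, so it is best treated as a first screen rather than a prediction.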

Amino acid composition of intrinsically disordered regions

These features are reflected in differences between the amino acid composition of ordered and disordered regions of proteins. IDPs are enriched in amino acids that promote disorder (A, R, G, Q, S, P, E, K) and depleted in order-promoting, mostly hydrophobic amino acids that favor hydrophobic collapse (W, C, F, I, Y, V, L, N) [7, 8]. Further differences exist between short (< 30 amino acids) and long (>= 30 amino acids) IDRs. Short regions are enriched in G and D and depleted in I, V, and L, while long regions contain more K, E, and P and fewer Q, G, and N [9, 10].
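A compositional bias of this kind is easy to quantify. The sketch below assumes the commonly cited residue classes (A, R, G, Q, S, P, E, K as disorder-promoting; W, C, F, I, Y, V, L, N as order-promoting); the function name and the returned difference score are illustrative choices, not an established predictor.

```python
from collections import Counter

# Residue classes assumed for illustration.
DISORDER_PROMOTING = set("ARGQSPEK")
ORDER_PROMOTING = set("WCFIYVLN")


def composition_bias(seq):
    """Return (disorder fraction, order fraction, difference) for a sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    n = len(seq)
    disorder_fraction = sum(counts[aa] for aa in DISORDER_PROMOTING) / n
    order_fraction = sum(counts[aa] for aa in ORDER_PROMOTING) / n
    return disorder_fraction, order_fraction, disorder_fraction - order_fraction


if __name__ == "__main__":
    # Two made-up peptides: one rich in disorder-promoting residues, one not.
    for seq in ("PSEEKPAQGSRE", "WFILVCYNVLIF"):
        print(seq, composition_bias(seq))
```

A positive difference suggests a disorder-like composition and a negative one an order-like composition. Residues outside both classes (D, H, M, T) are simply not counted in either fraction.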

Intrinsically disordered regions in numbers

Bioinformatic analyses of the amino acid composition of proteins in known genomes show that unstructured regions are much more frequent in eukaryotic than in prokaryotic proteins. The Disopred2 algorithm predicts that 33% of eukaryotic proteins, but only 2% of archaeal and 4.2% of eubacterial proteins, contain IDRs longer than 30 amino acids [11]. A similar prediction with the PONDR VL-XT algorithm indicated that IDRs longer than 30 amino acids occur in 9-57% of archaeal, 13-52% of bacterial, and 48-63% of eukaryotic proteins. In contrast, IDRs longer than 50 amino acids were predicted for less than 10% of prokaryotic and archaeal proteins and for 25% of eukaryotic proteins [12]. It is estimated that 12% of eukaryotic proteins are completely disordered [13]. Long IDRs are present in 82-94% of transcription factors [14]. IDRs longer than 30 amino acids are also found in 79% of cancer-associated proteins and 66% of proteins involved in signal transduction [15]. Disordered regions are thus common in proteins, which makes IDPs a good subject for bioinformatic sequence analysis and molecular modeling.

Prediction of intrinsically disordered regions in practice

In practice, the following approach can be used to predict globular regions with secondary structure and intrinsically disordered regions from amino acid sequences. As a first step, one should search for homologous sequences and collect useful literature data about their functional, interacting, and binding regions, post-translational modifications, domain composition, etc. [13, 16].

Next, to avoid pitfalls, one should analyze the composition and complexity of the homologous sequences; search for signal peptides, transmembrane regions, leucine zippers, zinc fingers, coiled-coil regions, modification sites, known sequence motifs, and disulfide bridges; check for similar crystallographic structures; and generate HCA and LCA graphs of the sequences, etc. [13, 16].
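The sequence complexity part of this step can be sketched as a sliding-window Shannon entropy scan. This is only an illustration of the idea, not the procedure of any specific tool: the window length of 12 residues and the entropy cut-off of 2.2 bits are illustrative assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue distribution in a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Return 0-based start positions of windows with entropy below threshold."""
    seq = seq.upper()
    return [
        start
        for start in range(len(seq) - window + 1)
        if shannon_entropy(seq[start:start + window]) < threshold
    ]


if __name__ == "__main__":
    # Made-up sequence: a diverse stretch followed by a Q/S-rich repeat.
    example = "MKTAYIAKQRDEFGHLWNQQQQQSQQQQSQQQQ"
    print(low_complexity_windows(example))
```

Low-complexity stretches flagged this way deserve a second look, because they can bias both homology searches and disorder predictors.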

Preparation of an alignment of the homologous sequences is the next step. The alignment should be supplemented with secondary structure and disorder predictions from several ab initio methods (secondary structure prediction algorithms: PsiPred, Yaspin, Jpred 3, Sspro, Porter [17], Proteus 2 [18]; disorder prediction algorithms: GLOBPLOT 2.3, IUPRED 2, PONDR VL-XT, PONDR VSL2, metaPrDOS, Metadisorder [9], PONDR FIT [19]). Conserved sequences, secondary structures, and disordered regions should then be identified. Pay attention to sequence deletions and insertions, which are likely to occur more often in disordered regions [20]. Bear in mind that disordered regions commonly undergo a disorder-to-order transition upon binding. Such sequences can form unstable, fluctuating secondary structures [13, 16].
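Combining the output of several disorder predictors can be sketched as a per-residue majority vote followed by extraction of contiguous segments. The sketch assumes that each predictor's output has already been parsed into a list of per-residue scores between 0 and 1 and that a single cut-off of 0.5 is acceptable for all of them; in reality the predictors differ in their scales and recommended thresholds. The predictor labels and function names are hypothetical, and no predictor is actually called.

```python
def consensus_disorder(scores_by_predictor, threshold=0.5):
    """Per-residue majority vote over several predictors' score tracks."""
    tracks = list(scores_by_predictor.values())
    if not tracks:
        raise ValueError("at least one predictor track is required")
    length = len(tracks[0])
    if any(len(track) != length for track in tracks):
        raise ValueError("all score tracks must have the same length")
    majority = len(tracks) / 2
    return [
        sum(track[i] >= threshold for track in tracks) > majority
        for i in range(length)
    ]


def disordered_segments(flags, min_length=1):
    """Return (start, end) of disordered runs, 1-based and inclusive."""
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                segments.append((start + 1, i))
            start = None
    if start is not None and len(flags) - start >= min_length:
        segments.append((start + 1, len(flags)))
    return segments


if __name__ == "__main__":
    # Made-up scores for a four-residue toy sequence.
    scores = {
        "predictor_a": [0.9, 0.8, 0.1, 0.7],
        "predictor_b": [0.6, 0.4, 0.2, 0.9],
        "predictor_c": [0.7, 0.9, 0.3, 0.1],
    }
    flags = consensus_disorder(scores)
    print(disordered_segments(flags))
```

Calling `disordered_segments(flags, min_length=30)` keeps only long regions, matching the 30-residue distinction between short and long IDRs. The resulting coordinates can then be mapped onto the alignment next to the secondary structure predictions.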

In the next step, the domain organization of the protein should be proposed. CDF-plot and CH-plot analyses can then be run to provisionally classify the regions as ordered, disordered [21], or chameleon (morphing) sequence regions that can be either [22-24]. Chameleon sequences are potential binding sites. They can be disordered when the protein is isolated and act as molecular recognition features (MoRFs) that undergo a disorder-to-order transition upon binding. MoRFs can adopt diverse secondary structures in different complexes. Their occurrence correlates with ELMs and SLiMs inside long disordered regions [10, 15, 22].

Prediction of MoRFs is a challenge. The first MoRF prediction algorithms, of moderate accuracy, such as PONDR VL-XT [25, 26] or ANCHOR [14, 27, 28], are available. New algorithms for identifying different types of MoRFs are under development [29, 30]. All information collected from the sequence analysis should be compared with and verified against available literature and experimental data. Bioinformatic sequence analysis of IDPs allows existing hypotheses to be verified and new ones to be postulated [13, 16].

Application of intrinsic disorder

 

Figure 2

Moreover, identification of MoRFs in IDPs can have practical applications. Knowledge of MoRFs can be used in rational drug design. MoRFs are short peptides (about 30 amino acids) inside long disordered regions that bind specifically to a molecular partner. Newly designed drugs can mimic a MoRF or its binding site, selectively targeting specific protein-protein interactions (Figure 2). Prediction of disordered regions and MoRFs has been used by Molecular Kinetics, Inc. to identify 35 781 sequences in the human proteome that possess features of druggability [14, 30, 31].

Conclusion

Prediction of MoRFs in IDRs opens new avenues in rational drug design, because protein-protein interactions based on a disorder-to-order transition of one partner can make ideal druggable targets.

References:

1. Uversky VN: Natively unfolded proteins: a point where biology waits for physics. Protein science : a publication of the Protein Society 2002, 11:739-56.

2. Uversky VN: The mysterious unfoldome: structureless, underappreciated, yet vital part of any given proteome. Journal of biomedicine & biotechnology 2010, 2010:568068.

3. Radivojac P, Iakoucheva LM, Oldfield CJ, et al.: Intrinsic disorder and functional proteomics. Biophysical journal 2007, 92:1439-56.

4. Serdyuk IN: Structured proteins and proteins with intrinsic disorder. Molecular Biology 2007, 41:262-277.

5. Uversky VN: Intrinsically disordered proteins from A to Z. The international journal of biochemistry & cell biology 2011, 43:1090-103.

6. Ashbaugh HS, Hatch HW: Natively unfolded protein stability as a coil-to-globule transition in charge/hydropathy space. Journal of the American Chemical Society 2008, 130:9536-42.

7. Dunker AK, Lawson JD, Brown CJ, et al.: Intrinsically disordered protein. Journal of molecular graphics & modelling 2001, 19:26-59.

8. Uversky VN: What does it mean to be natively unfolded? European journal of biochemistry / FEBS 2002, 269:2-12.

9. He B, Wang K, Liu Y, et al.: Predicting intrinsic disorder in proteins: an overview. Cell research 2009, 19:929-49.

10. Xue B, Hsu W-L, Lee J-H, et al.: SPA: Short peptide analyzer of intrinsic disorder status of short peptides. Genes to cells : devoted to molecular & cellular mechanisms 2010, 15:635-46.

11. Ward JJ, Sodhi JS, McGuffin LJ, Buxton BF, Jones DT: Prediction and functional analysis of native disorder in proteins from the three kingdoms of life. Journal of molecular biology 2004, 337:635-645.

12. Chen JW, Romero P, Uversky VN, Dunker AK: Conservation of intrinsic disorder in protein domains and families: II. functions of conserved disorder. Journal of proteome research 2006, 5:888-98.

13. Bourhis JM, Canard B, Longhi S: Predicting protein disorder and induced folding: from theoretical principles to practical applications. Current protein & peptide science 2007, 8:135-149.

14. Dunker AK, Uversky VN: Drugs for “protein clouds”: targeting intrinsically disordered transcription factors. Current opinion in pharmacology 2010, 10:782-8.

15. Oldfield CJ, Cheng Y, Cortese MS, et al.: Coupled folding and binding with alpha-helix-forming molecular recognition elements. Biochemistry 2005, 44:12454-70.

16. Ferron F, Longhi S, Canard B, Karlin D: A practical overview of protein disorder prediction methods. Proteins 2006, 65:1-14.

17. Pirovano W, Heringa J: Protein secondary structure prediction. Methods in molecular biology (Clifton, N.J.) 2010, 609:327-48.

18. Montgomerie S, Cruz JA, Shrivastava S, et al.: PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation. Nucleic acids research 2008, 36:W202-9.

19. Xue B, Dunbrack RL, Williams RW, Dunker AK, Uversky VN: PONDR-FIT: a meta-predictor of intrinsically disordered amino acids. Biochimica et biophysica acta 2010, 1804:996-1010.

20. Brown CJ, Johnson AK, Dunker AK, Daughdrill GW: Evolution and disorder. Current opinion in structural biology 2011, 21:441-6.

21. Uversky VN, Dunker AK: Understanding protein non-folding. Biochimica et biophysica acta 2010, 1804:1231-64.

22. Uversky VN: Multitude of binding modes attainable by intrinsically disordered proteins: a portrait gallery of disorder-based complexes. Chemical Society reviews 2011, 40:1623-34.

23. Rost B, Eyrich VA: EVA: large-scale analysis of secondary structure prediction. Proteins 2001, Suppl 5:192-9.

24. Uversky VN: Intrinsically disordered proteins may escape unwanted interactions via functional misfolding. Biochimica et biophysica acta 2011, 1814:693-712.

25. Mohan A, Oldfield CJ, Radivojac P, et al.: Analysis of molecular recognition features (MoRFs). Journal of molecular biology 2006, 362:1043-59.

26. Vacic V, Oldfield CJ, Mohan A, et al.: Characterization of molecular recognition features, MoRFs, and their binding partners. Journal of proteome research 2007, 6:2351-66.

27. Mészáros B, Simon I, Dosztányi Z: Prediction of protein binding regions in disordered proteins. PLoS computational biology 2009, 5:e1000376.

28. Dosztányi Z, Mészáros B, Simon I: ANCHOR: web server for predicting protein binding regions in disordered proteins. Bioinformatics (Oxford, England) 2009, 25:2745-6.

29. Cheng Y, Oldfield CJ, Meng J, et al.: Mining alpha-helix-forming molecular recognition features with cross species sequence alignments. Biochemistry 2007, 46:13468-77.

30. Cheng Y, LeGall T, Oldfield CJ, et al.: Rational drug design via intrinsically disordered protein. Trends in biotechnology 2006, 24:435-42.

31. Metallo SJ: Intrinsically disordered proteins are potential drug targets. Current opinion in chemical biology 2010, 14:481-8.
