Viral 3C proteases

I thought I would write out an example that I'm specifically interested in but just don't have the time to develop. This problem is certainly doable, but it isn't a weekend project. It's also not impossible: mainly it's a matter of getting good at doing alignments and logic puzzles. I think this could be automated quite nicely, so if you are more of a programmer you could do something really sweet with this.

Lots of viruses have 3C or 3C-like (3CL) proteases: Hepatitis A virus, Foot-and-mouth disease virus, Poliovirus, SARS coronavirus. Insect and crustacean viruses like Acute bee paralysis virus, Drosophila C virus, Kashmir bee virus and Taura syndrome virus (shrimp). Plus plant viruses like Turnip rosette virus, Ryegrass mottle virus, etc. Check out Picornaviruses or picornavirus superfamilies/related. You can collect the sequences and structures using BLAST, PSI-BLAST, and listings like Positive ssRNA viruses as a starting point.

These viruses express all or part of the genome as a single polypeptide, which is specifically processed by a virally encoded protease. Kinda like those cheap plastic model airplanes that come all in one piece, where you have to pop the parts out before you assemble them. These viral proteases are highly specific, but each one has a different specificity. It would be interesting to know which residues of the proteases are important for specificity, and whether we could change the specificity of a viral protease to make it more useful in the lab. As a heads up, you want to be looking 4 or 5 residues upstream and 3 or 4 downstream of the cut. They say picornaviruses cut between Q and G, but it could be QS, QM, etc. A good cut site identification is something like LRTQSFS, with the cut between the Q and the S.
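As a starting point for the logic-puzzle part, here is a minimal Python sketch that scans a polyprotein for Q followed by G, S or M and pulls out the residues around each candidate cut. The function name, the default window (4 upstream, 3 downstream) and the toy sequence are assumptions made up for illustration; a real polyprotein will throw up many false positives that need a human judgment call.

```python
def candidate_cut_sites(polyprotein, p1="Q", p1_prime="GSM", upstream=4, downstream=3):
    """Return (position, window) for every P1/P1' pair in the sequence.

    position is the 0-based index of the first residue after the cut.
    window shows the flanking residues with '|' marking the cut.
    """
    sites = []
    for i in range(1, len(polyprotein)):
        if polyprotein[i - 1] in p1 and polyprotein[i] in p1_prime:
            start = max(0, i - upstream)
            end = min(len(polyprotein), i + downstream)
            window = polyprotein[start:i] + "|" + polyprotein[i:end]
            sites.append((i, window))
    return sites


# Toy sequence (invented), containing the LRTQSFS example from the text.
toy = "MAAALRTQSFSKKKAEVQGPLL"
for pos, window in candidate_cut_sites(toy):
    print(pos, window)
# 8 LRTQ|SFS
# 18 AEVQ|GPL
```

Widening `upstream` to 5 and `downstream` to 4 covers the larger end of the window mentioned above.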

So what you need to do is build massive alignments of all the 3C and 3CL proteases. Make sure that the active-site Cys and His residues line up (it would be really special to find a protease with a Ser instead of a Cys in the active site; the Sobemoviruses?). Check that the Asp aligns as well (it is not important in the 3CLs, which also have an extra domain that may need to be removed to get good alignments). Generally when doing alignments, do a more closely related family first and make sure the gaps make sense when looking at the structure (i.e. gaps occur in loops). Then add other families onto this profile (in ClustalX).
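Checking that the catalytic residues line up is easy to script once the alignment exists. A rough sketch, assuming the alignment is already loaded as a dict of equal-length gapped strings and that you know the column indices of the His, Asp and nucleophile in it. The sequence names, toy sequences and column numbers below are all invented for illustration.

```python
def check_catalytic_columns(alignment, expected):
    """List every (name, column, residue) that breaks the expected pattern.

    alignment: {sequence name: gapped aligned sequence}
    expected:  {column index: set of residues allowed in that column}
    """
    problems = []
    for column, allowed in expected.items():
        for name, seq in alignment.items():
            if seq[column] not in allowed:
                problems.append((name, column, seq[column]))
    return problems


def serine_nucleophiles(alignment, nucleophile_column):
    """Names of sequences with Ser where the catalytic Cys is expected."""
    return [name for name, seq in alignment.items() if seq[nucleophile_column] == "S"]


# Toy alignment (invented): His in column 1, Asp in column 3, nucleophile in column 6.
toy_alignment = {
    "protA": "GH-DAGCGG",
    "protB": "GHADSGSGG",
    "protC": "GQ-DAGCGG",
}
expected = {1: {"H"}, 3: {"D"}, 6: {"C", "S"}}

print(check_catalytic_columns(toy_alignment, expected))  # [('protC', 1, 'Q')]
print(serine_nucleophiles(toy_alignment, 6))             # ['protB']
```

Anything that shows up in the problem list is either a misalignment to fix by hand or a genuinely interesting protease.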

The goal would be to create a massive alignment similar to the ones used for evolutionarily conserved networks; check out Ranganathan R on the web and on PubMed. From this alignment you can look at which residues co-evolved, which might have a role in structure, and which would have a role in specificity. There is also an RNA binding pocket on the back of the protease which may impact its specificity, and that might also show up in this massive alignment. Massive might be too big for ClustalX; I don't know what alternatives exist or what Ranganathan used.
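To get a first feel for co-evolving positions, one simple option is mutual information between alignment columns. To be clear, this is a swapped-in, simpler technique and not the Ranganathan method; it also ignores phylogeny, so closely related sequences will inflate the scores. The sketch below assumes gap-free, equal-length aligned strings, and the toy sequences are invented.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_a[a] / n) * (freq_b[b] / n)))
    return mi


def rank_column_pairs(sequences):
    """Return (score, i, j) for all column pairs, highest score first."""
    columns = list(zip(*sequences))
    scores = [
        (mutual_information(columns[i], columns[j]), i, j)
        for i, j in combinations(range(len(columns)), 2)
    ]
    return sorted(scores, reverse=True)


# Toy alignment (invented): columns 1 and 3 change together (K with E, R with D).
toy = ["AKLE", "AKLE", "ARLD", "ARLD"]
score, i, j = rank_column_pairs(toy)[0]
print(i, j, round(score, 3))  # 1 3 1.0
```

Fully conserved columns score zero against everything, which is why the catalytic residues will not show up here; the interesting hits are positions that vary together.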

All these viruses have specificity, but most of the time it hasn't been experimentally determined. In fact, most of the proteases don't even have determined N and C termini. What you need to do is look at alignments of parts of the viral proteins. Some proteins have their cut sites experimentally determined, so you do alignments and make your best guesses as to the cleavage sites in the undetermined viruses. What I did (for a few viruses anyway) was align the structural polypeptide and ID potential cut sites. Then guess where the cut sites are on the non-structural polypeptide, align those proteins with known ones (i.e. align the well-defined helicases with your unknown ones) and make better guesses as to the cut sites. You should start to build up a database of potential specificities for each viral protease.

Yeah yeah, huge job that takes a lot of nit-picking, that's why I haven't done it yet. But once you get the technique down for one virus the rest won't be as daunting, and you'll just get better. Hey, you wanted to know what goes into an important bioinformatics question ^_^

If you're a better programmer than I am you could probably automate this quite nicely: get the program to do your alignments, guess your cut sites, and present them to the person to make the judgment call. A program like that would be really useful to other viral researchers who might be looking at the other proteins, etc.
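The "guess and present for a judgment call" step could start as something like the sketch below: build a per-position residue frequency profile from cut sites that are already known, slide it along an uncharacterized polyprotein, and hand the ranked windows to a person. The scoring scheme is a deliberately crude assumption, and apart from LRTQSFS the "known" sites and the query sequence are invented placeholders, not real data.

```python
from collections import Counter


def build_profile(known_sites):
    """Per-position residue counts from equal-length windows spanning a cut."""
    width = len(known_sites[0])
    return [Counter(site[i] for site in known_sites) for i in range(width)]


def score_window(window, profile, n_sites):
    """Sum, over positions, of the fraction of known sites sharing that residue."""
    return sum(profile[i][res] / n_sites for i, res in enumerate(window))


def rank_candidates(sequence, known_sites):
    """Return (score, start, window) for every window, best score first."""
    profile = build_profile(known_sites)
    width = len(known_sites[0])
    scored = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        scored.append((score_window(window, profile, len(known_sites)), start, window))
    return sorted(scored, reverse=True)


# Placeholder "known" sites (only LRTQSFS comes from the text) and an invented query.
known = ["LRTQSFS", "VKTQGFS", "LRSQSLS"]
query = "GGLRTQGFSAA"

# Show the top few guesses; a person makes the final call.
for score, start, window in rank_candidates(query, known)[:3]:
    print(f"{score:.2f}  start={start}  {window}")
```

In practice you would restrict the windows to regions where the alignment with a well-characterized virus says a junction should be, rather than scanning the whole polyprotein.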

You do need to keep track of everything. Keep the GenBank number (or PDB code for structures), what type of virus it is, and any paper it was referenced in. Record the sequences at each of the cut sites, with top choices and second choices.
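One possible way to keep those records straight is a small data structure per protease. This is only a sketch: the class names, field names, the junction label and the accession placeholder are all hypothetical, and a spreadsheet or a real database would do the same job.

```python
from dataclasses import dataclass, field


@dataclass
class CutSiteGuess:
    window: str                  # residues spanning the cut, e.g. "LRTQSFS"
    rank: int                    # 1 = top choice, 2 = second choice, ...
    evidence: str = "predicted"  # or "experimental"


@dataclass
class ProteaseRecord:
    accession: str               # GenBank number, or PDB code for structures
    virus_name: str
    virus_type: str
    references: list = field(default_factory=list)
    cut_sites: dict = field(default_factory=dict)  # junction label -> guesses

    def add_guess(self, junction, window, rank, evidence="predicted"):
        """Store a guess and keep the guesses for that junction sorted by rank."""
        guesses = self.cut_sites.setdefault(junction, [])
        guesses.append(CutSiteGuess(window, rank, evidence))
        guesses.sort(key=lambda g: g.rank)

    def top_choice(self, junction):
        return self.cut_sites[junction][0].window


# Placeholder values throughout; nothing here is real data.
record = ProteaseRecord("ACCESSION_PLACEHOLDER", "Example virus", "picorna-like")
record.add_guess("junction_1", "AEVQGPL", rank=2)
record.add_guess("junction_1", "LRTQSFS", rank=1)
print(record.top_choice("junction_1"))  # LRTQSFS
```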

Anyway, now take the specificity profiles that you have determined for each viral protease and overlay them on your massive family tree. After that you can start to guess what ancestral proteases looked like, how specificity is determined, and how to mutate an enzyme to get the specificity you want. If you can get the sobemoviruses in there, you can maybe get insight into the serine proteases versus the cysteine proteases and what it takes to switch between the two. And those are just the dry-bench questions. This is a road map for tons of stuff at the wet bench.

Anyway, that's an outline of a well-defined bioinformatics question that is just sitting there. I just can't do everything and I'm not going to get around to doing this work. If you get something up and running, drop me a line (cereusb(at)gmail.com). I think this is a pretty neat question (so many neat questions, so little time!). I don't think there are any other researchers looking at it either.

This text is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
