Diseases caused by RNA viruses are often difficult to control because of their high mutation rate and the continual emergence of novel genetic and antigenic variants that escape from immune surveillance. The degree to which immunity induced by one virus is effective against another is largely dependent on the antigenic differences between them. Foot-and-mouth disease virus (FMDV) is an example of an antigenically variable pathogen that infects many species of cloven-hoofed animals, such as cattle, sheep, pigs and goats, and remains a potent threat to agricultural livestock (Sutmoller et al., 2003). Although FMD vaccines made from chemically inactivated virus particles are in widespread use, control of the disease remains difficult. This is because the vaccines provide only short-lived protection and the virus occurs as seven clinically indistinguishable serotypes (O, A, C, Asia1 and three Southern African Territories serotypes: SAT1, SAT2 and SAT3), each of which has multiple, constantly evolving sub-types (Knowles & Samuel, 2003). Viruses belonging to the SAT serotypes display appreciably greater genomic and antigenic variation in their capsid proteins than serotype A and O viruses (Bastos et al., 2001; Bastos et al., 2003; Maree et al., 2011), possibly due to their long-term maintenance within African buffalo (Syncerus caffer). Constant surveillance of circulating strains is required to ensure that vaccine stocks remain effective.
In common with other members of the picornavirus family, FMDV has a single-stranded, positive-sense RNA genome. Cell entry in infected hosts is followed immediately by translation of a large open reading frame in the viral RNA. This yields a polyprotein precursor of over 2,000 amino acids that is processed into fourteen distinct capsid and non-structural proteins for virus replication. The majority of this processing is done by the virus-encoded 3C protease (3Cpro), which cleaves the precursor at ten distinct sites. FMDV 3Cpro may also assist infection by proteolysis of host cell proteins and has RNA-binding activity that is important for initiation of replication of the viral RNA (reviewed in Curry et al., 2007b).
Crystallographic analysis of the 3Cpro from a type A FMDV (sub-type A1061) showed that, similar to other picornavirus 3C proteases, it adopts a trypsin-like fold consisting of two β-barrels that pack together to create a centrally located Cys-His-Asp/Glu catalytic triad in the active site (Allaire et al., 1994; Matthews et al., 1994; Mosimann et al., 1997; Birtley & Curry, 2005; Yin et al., 2005). Subsequent studies on FMDV 3Cpro complexed with peptides derived from the viral polyprotein revealed that substrate recognition is achieved by conformational changes primarily involving the movement of a β-ribbon (residues 138-150) that helps to secure the position of cognate peptides in relation to the active site of the protein (Sweeney et al., 2007; Zunszain et al., 2010).
Sequence analysis has shown that while variation within FMDV 3Cpro does not rigidly reflect that observed with capsid proteins, the SAT-type 3C proteases generally form a distinct cluster (Van Rensburg et al., 2002). Mapping of the sequence variation between different FMDV serotypes onto the structure of A1061 3Cpro indicated that the peptide-binding face of the protease is completely conserved among the non-SAT serotypes (which are 91–97% conserved in amino-acid sequence), supporting the notion that identification of inhibitors of the protease might aid the development of broad spectrum antiviral drugs (Birtley & Curry, 2005; Curry et al., 2007a). This structure should therefore serve as a useful model for the 3C protease from this group of viruses. However, the same comparison suggested the presence of at least two amino acid differences on the peptide-binding surfaces between A1061 3Cpro and the corresponding 3C sequences from SAT serotype viruses.
To provide a more complete picture of the structural variation between FMDV 3C proteases from different serotypes, we set out to determine the crystal structure of 3Cpro from at least one SAT serotype virus. We report here the cloning and expression of 3Cpro from four distinct SAT1 and SAT2 viruses and the crystal structure of the 3Cpro from a SAT2 serotype virus (SAT2/GHA/8/91).
Materials and Methods
Cloning and mutagenesis
We used the polymerase chain reaction (PCR) to amplify the coding regions for the FMDV 3C proteases of sub-types SAT2/GHA/8/91 (Accession No. AY884136), SAT1/NIG/5/81 (Accession No. AY882592), SAT1/UGA/1/97 (Accession No. AF283456), and SAT2/ZIM/7/83 (Accession No. AF540910). In each case the reaction was performed using DNA primers (Table 1) that introduced a 5′ XhoI and a 3′ HindIII restriction site into the PCR products. These sites facilitated ligation into a version of the pETM-11 vector that had been modified to insert a thrombin cleavage site immediately downstream of the N-terminal His tag (Birtley & Curry, 2005). DNA ligations were performed using the Roche Rapid Ligation Kit according to the manufacturer’s instructions.
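The primer strategy described above can be illustrated with a minimal sketch. It is not the authors' primer design: the actual primers are those in Table 1. The function name `design_primers`, the `GCGC` 5′ clamp and the annealing length are hypothetical choices made for illustration; only the XhoI (CTCGAG) and HindIII (AAGCTT) recognition sequences are standard. Reading frame and stop codons are deliberately not handled.

```python
# Illustrative sketch only: the real primers are listed in Table 1 of the paper.
XHOI = "CTCGAG"     # XhoI recognition site
HINDIII = "AAGCTT"  # HindIII recognition site

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_primers(coding_seq: str, anneal_len: int = 18, clamp: str = "GCGC"):
    """Build a forward primer carrying a 5' XhoI site and a reverse primer
    carrying a HindIII site (which ends up at the 3' end of the PCR product).

    `clamp` is a hypothetical short 5' overhang added so the enzyme has
    flanking bases to bind; `anneal_len` is the number of template-matching
    bases. Neither value comes from the paper.
    """
    seq = coding_seq.upper()
    if len(seq) < anneal_len:
        raise ValueError("coding sequence shorter than annealing length")
    forward = clamp + XHOI + seq[:anneal_len]
    reverse = clamp + HINDIII + reverse_complement(seq[-anneal_len:])
    return forward, reverse


if __name__ == "__main__":
    toy_template = "ATGGCCAAAGGGTTTCCCAAATTTGGGCCC"  # made-up sequence
    fwd, rev = design_primers(toy_template, anneal_len=10)
    print(fwd)
    print(rev)
```

In practice the annealing region would be chosen for melting temperature and specificity rather than fixed length.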
Site-directed mutagenesis was performed with the Quikchange method (Stratagene), using KOD polymerase (Novagen). All DNA sequences were verified by sequencing.
Details of the particular modifications made to expressed proteins are given in the Results and Discussion section.
Protein expression and purification
All SAT-type 3C proteases were expressed in cultures of BL21 (DE3) pLysS E. coli (Invitrogen) grown in lysogeny broth (LB) at 37 °C with shaking at 225 rpm. Protein expression was induced for 5 h by the addition of 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) once the optical density at 600 nm reached 0.8–1.0. Cells were harvested by centrifugation at 4550 g for 15 min at 4 °C and frozen at −80 °C.
The volumes given below are appropriate for processing the pellet from 1 L of bacterial culture. Cell pellets were thawed on ice and re-suspended in 30 mL Buffer A (50 mM HEPES pH 7.1, 400 mM NaCl, 1 mM β-mercaptoethanol) supplemented with 0.1% Triton X-100 and 1 mM phenylmethylsulfonyl fluoride (PMSF) protease inhibitor. Cells were lysed by sonication on ice and lysates clarified by centrifugation at 29,000 g for 20 min at 4 °C. Protamine sulphate (Sigma) was added to a final concentration of 1 mg/mL to precipitate nucleic acids, and lysates were then centrifuged again at 29,000 g for 20 min. The supernatant was filtered using a 1.2 µm syringe filter and incubated for 90 min at 4 °C with slow rotation with 1 mL bed volume of TALON metal affinity resin (Clontech) pre-equilibrated with Buffer A. This slurry was applied to a gravity-flow column and the TALON beads washed three times with 50 mL of Buffer A supplemented with 0, 5 and 10 mM imidazole respectively. His-tagged 3C proteins were eluted in 20 mL of Buffer A containing 100 mM imidazole, followed by a final wash with 10 mL of Buffer A containing 250 mM imidazole. To remove the His tag the eluted protein was mixed with 100 units of bovine thrombin (Sigma) and dialysed for 16 h at 4 °C against 4 L of Buffer A supplemented with 2 mM CaCl₂. Cleaved protein was then re-applied to TALON resin to remove the cleaved His tag and other contaminants. The untagged protease was recovered in the flow-through, concentrated using Vivaspin concentrators (3 kDa MWCO) (Sartorius Stedim Biotech) and further purified by gel filtration using a HiLoad 16/60 Superdex 75 gel filtration column (Amersham Bioscience) in Buffer A supplemented with 1 mM EDTA and 0.01% sodium azide at a flow rate of 0.5 mL/min. Peak fractions were pooled, concentrated and stored at −80 °C. Protein concentrations were determined from absorbance measurements at 280 nm using extinction coefficients calculated with the ProtParam tool (Gasteiger et al., 2005).
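The concentration measurement in the final step follows the Beer-Lambert law. The sketch below assumes the residue-count formula documented for ProtParam (5500 M⁻¹ cm⁻¹ per Trp, 1490 per Tyr, 125 per cystine at 280 nm) and a 1 cm path length; the function names are hypothetical, and the molecular weight and absorbance in the usage lines are made-up inputs, not values from this study.

```python
def extinction_coefficient(sequence: str, n_cystines: int = 0) -> int:
    """Estimate the molar extinction coefficient at 280 nm (M^-1 cm^-1)
    from Trp, Tyr and cystine counts. The per-residue values are the ones
    ProtParam documents; n_cystines is the number of disulfide bonds
    (0 for a fully reduced protein)."""
    seq = sequence.upper()
    return 5500 * seq.count("W") + 1490 * seq.count("Y") + 125 * n_cystines


def concentration_mg_per_ml(a280: float, epsilon: float, mw_da: float,
                            path_cm: float = 1.0) -> float:
    """Beer-Lambert law: A = epsilon * c * l, so c (mol/L) = A / (epsilon * l).
    Multiplying by the molecular weight (g/mol) gives g/L, i.e. mg/mL."""
    if epsilon <= 0 or path_cm <= 0:
        raise ValueError("epsilon and path length must be positive")
    molar = a280 / (epsilon * path_cm)
    return molar * mw_da


if __name__ == "__main__":
    # Toy inputs for illustration only.
    eps = extinction_coefficient("MWWYAKY")
    print(eps, concentration_mg_per_ml(0.5, eps, 23000.0))
```

Because the constructs carry Cys-to-Ala substitutions, the sequence passed in should be that of the expressed construct rather than the wild-type protease.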
Crystallisation and structure determination
Crystallisation trials with purified SAT-type 3Cpro were performed at 4 °C and 18 °C using protein concentrations in the range 5–10 mg/mL. Initial screens were done by sitting drop vapour diffusion using a Mosquito crystallisation robot (TTP Labtech). Typically, in each drop 100 nL of protein was mixed with 100 nL taken from the 100 µL reservoir solution. Trials were performed with the following commercial screens: Crystal Screen 1 and 2, and PEG/Ion (Hampton Research); MemStart, MemSys, JCSG+, and PACT (Molecular Dimensions); Wizard 1 and 2 (Rigaku Reagents).
Crystals of SAT2/G-g3Cpro(1-208) for data collection were washed in the mother liquor (15% (w/v) PEG-8000, 0.09 M Na-cacodylate pH 7.0, 0.27 M Ca-acetate, 0.01 M Tris pH 8.5, 0.08 M Na-thiocyanate) supplemented with 20% (v/v) glycerol, and immediately frozen in liquid nitrogen in a nylon loop. X-ray diffraction data were processed and scaled with the CCP4 program suite (Collaborative Computational Project, Number 4, 1994), and phased by molecular replacement using the coordinates of type A1061 FMDV 3Cpro (PDB ID 2J92; Sweeney et al., 2007) as a search model in Phaser (McCoy et al., 2007). The search model was edited to delete side-chains (to the Cβ atom) for all residues that differed from SAT2/G-g3Cpro(1-208) and to remove all the atoms in the β-ribbon (residues 138-150), since these have been observed to vary in structure between different crystal forms (Sweeney et al., 2007). Model building and adjustments were done using Coot (Emsley et al., 2010); crystallographic refinement was performed initially with CNS (Brünger et al., 1998) and completed using Phenix (Adams et al., 2010).
Results and Discussion
Protein expression and crystallisation
We engineered bacterial expression plasmids for FMDV 3C proteases from four SAT sub-types: SAT2/GHA/8/91, SAT1/NIG/5/81, SAT1/UGA/1/97, and SAT2/ZIM/7/83 (see Materials and Methods), which have 80%, 92%, 82% and 85% amino acid sequence identity respectively with the 3Cpro from FMDV A1061 (Fig. 1). In doing so we were guided by the lessons learned from work to express and crystallise sub-type A1061 FMDV 3Cpro, which suggested that preserving the N terminus of the protein but truncating the C terminus by up to six residues would be optimal for solubility and crystallisation (Birtley & Curry, 2005). Accordingly, for each SAT sub-type we generated expression constructs that add a thrombin-cleavable His tag to the N terminus of residues 1-208 of the 213 amino acid 3C protease; following thrombin cleavage there is a single additional Gly residue appended to the N terminus of the protease polypeptide. To ensure the solubility of the SAT-type 3C proteins, we introduced into all constructs a C142A substitution to remove a surface-exposed Cys that had been shown previously to be responsible for protein aggregation (Birtley & Curry, 2005; Birtley et al., 2005). (The C95K mutation that was also introduced to eliminate aggregation of A1061 FMDV 3Cpro (Birtley & Curry, 2005) was not needed here because residue 95 is an Arg in the SAT 3C proteases used in this study.) In addition, the active site nucleophile was eliminated from all constructs by incorporation of a C163A substitution to prevent adventitious proteolysis in highly concentrated samples of purified 3Cpro. For consistency with our earlier naming scheme these SAT2/GHA/8/91, SAT1/NIG/5/81, SAT1/UGA/1/97, and SAT2/ZIM/7/83 3C constructs will be referred to as SAT2/G-g3Cpro(1-208), SAT1/N-g3Cpro(1-208), SAT1/U-g3Cpro(1-208), and SAT2/Z-g3Cpro(1-208) respectively.
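The sequence identities quoted above are the kind of figure that can be computed from a pairwise alignment. The following is a minimal sketch, assuming two pre-aligned sequences of equal length with `-` as the gap character, and assuming the convention that columns containing a gap are excluded from the count (other conventions exist and give slightly different percentages). The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over aligned columns in which
    neither sequence has a gap. This denominator is one convention among
    several; it is an assumption of this sketch, not taken from the paper."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment for illustration; not real 3Cpro sequences.
    print(round(percent_identity("ACDEFGHIKL", "ACDQFGHIRL"), 1))
```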
The 3Cpro constructs from all four SAT sub-types yielded soluble protein that was purified first by metal-affinity chromatography and then, following thrombin cleavage of the N-terminal His tag, on a gel filtration column (see Materials and Methods). Of the four, SAT1/N-g3Cpro(1-208) appeared to be the most soluble and could be concentrated to 20 mg/mL (see Table 2). The other three variants exhibited some precipitation during gel filtration, indicated by a void peak containing aggregated 3Cpro, which was about one-third of the area of the monomeric peak. They also had lower apparent solubility limits and could be concentrated to ∼6 mg/mL (SAT2/G-g3Cpro(1-208)) or ∼11 mg/mL (SAT1/U-g3Cpro(1-208) and SAT2/Z-g3Cpro(1-208)).
| Protein | Yield (mg per L of culture) | Maximum concentration (mg/mL) | Aggregation |
| SAT1/U-g3Cpro(1-208) | 1.2 | 11.9 | +++ |
| SAT2/Z-g3Cpro(1-208) | 2.2 | 11.3 | ++ |
| SAT2/G-g3Cpro(1-207) | 1.7 | 5.6 | +++ |
In crystallisation trials we obtained crystals from the 3Cpro of only a single sub-type: SAT2/G-g3Cpro(1-208). These exhibited a variety of habits but the largest were needle-shaped and were typically 10 µm wide and up to 300 µm long. Initial diffraction tests on beamline ID23-2 at the European Synchrotron Radiation Facility (ESRF) showed that the crystals belonged to a trigonal space group and diffracted to a resolution limit of 2 Å. Unfortunately, for reasons that remain unclear, efforts to reproduce these crystals proved unsuccessful. In subsequent trials diffraction was limited to ∼3 Å.
We used mutagenesis to engineer modifications to the SAT2/G-g3Cpro(1-208) construct in the search for better crystals. We trimmed the C-terminus by one residue (in SAT2/G-g3Cpro(1-207)) or added back a single His residue (in SAT2/G-g3Cpro(1-207h)), strategies that had been useful when working with type A1061 3Cpro (Birtley & Curry, 2005). Both alterations yielded soluble protein (Table 2) and SAT2/G-g3Cpro(1-207h) produced crystals, but there was no improvement in the resolution of the diffraction.
In a further effort to enhance crystal quality, we used the Surface Entropy Reduction prediction server (Goldschmidt et al., 2007) to design additional SAT2/G-g3Cpro(1-208) mutants. We made four different mutants, each containing one of the following pairs of substitutions: (i) K110T/K111Y; (ii) K110Y/K111T; (iii) K51A/K54Y; (iv) K51T/K54S. Of these, only the K51A/K54Y mutant gave protein that was as soluble as wild-type. The K110T/K111Y and K51T/K54S double-mutants produced significantly larger void peaks during purification by gel filtration chromatography, while the K110Y/K111T double-mutant appeared almost entirely aggregated under these conditions. For the three surface-entropy mutants that did yield some soluble protein, no useable crystals were obtained.
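The substitution labels used throughout (for example C142A or K51A/K54Y) encode the wild-type residue, a 1-based position and the replacement residue. As a minimal sketch of how such labels map onto a sequence, the hypothetical helper below applies them to a protein string and checks that the stated wild-type residue is actually present. It assumes the sequence is numbered from residue 1 of the protease (without the extra N-terminal Gly), and the toy sequence in the usage lines is made up.

```python
import re

# One-letter wild-type residue, 1-based position, one-letter mutant residue.
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_substitutions(sequence: str, substitutions) -> str:
    """Apply point substitutions such as 'K51A' to a protein sequence.

    Raises ValueError if a label is malformed, the position is out of range,
    or the residue at that position does not match the stated wild type.
    """
    residues = list(sequence.upper())
    for label in substitutions:
        match = _SUBSTITUTION.match(label.strip().upper())
        if match is None:
            raise ValueError(f"malformed substitution: {label!r}")
        wild_type, mutant = match.group(1), match.group(3)
        index = int(match.group(2)) - 1  # convert to 0-based index
        if not 0 <= index < len(residues):
            raise ValueError(f"position out of range: {label!r}")
        if residues[index] != wild_type:
            raise ValueError(
                f"{label!r}: expected {wild_type}, found {residues[index]}"
            )
        residues[index] = mutant
    return "".join(residues)


if __name__ == "__main__":
    toy = "MKTAYK"  # made-up sequence, not 3Cpro
    print(apply_substitutions(toy, "K2A/K6Y".split("/")))
```

The wild-type check is the useful part: it catches numbering offsets before a primer is ordered.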
Structure of SAT2/G-g3Cpro(1-208)
A complete dataset to 3.2 Å resolution was obtained from crystals of SAT2/G-g3Cpro(1-208). The crystals belong to space group P3₂ and have a long c-axis (318.5 Å). The diffraction data were phased by molecular replacement using a search model based on the crystal structure of type A1061 FMDV 3Cpro, which is 80% identical in amino-acid sequence to SAT2/G-g3Cpro(1-208) (see Materials and Methods). This gave an unambiguous solution with a log likelihood gain of 1495 (McCoy et al., 2007), revealing five molecules in the asymmetric unit. Though of modest resolution, the initial electron density maps (Fig. 2A) were of sufficient quality to guide adjustment of the initial molecular replacement model prior to multiple interleaved rounds of refinement and model building. Because of the limited resolution and the presence of non-crystallographic symmetry, refinement was performed using group B-factors and non-crystallographic symmetry restraints. Model building was done conservatively: amino acid side-chains were truncated to the Cβ atom in cases where there was no indicative electron density. The final model of SAT2/G-g3Cpro(1-208) contains residues 7–207 for all five chains and has an Rfree of 27.2% and good stereochemistry; full data collection and refinement statistics are given in Table 3.
| a, b, c (Å) | 54.0, 54.0, 318.5 |
| α, β, γ (°) | α = β = 90; γ = 120 |
| Resolution range (Å) | 53.1–3.2 (3.37–3.2) |
| No. of independent reflections | 17,053 |
| Completeness (%) | 99.3 (99.5) |
| No. of non-hydrogen atoms | 7,535 |
| Average B-factor (Å²) | 119 |
| RMS deviations: bonds (Å) | 0.006 |
| RMS deviations: angles (°) | 1.1 |
| Ramachandran plot (favoured/allowed) (%) | 89.8/10.2 |
| PDB Accession Code | 5HM2 |
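The plausibility of five molecules in the asymmetric unit can be checked with a Matthews coefficient estimate from the cell dimensions above. The sketch below is not part of the authors' analysis. It assumes that the space group is P3₂ with three symmetry operators per unit cell and a molecular weight of roughly 23 kDa per chain (an approximation for a construct of about 209 residues, not a value stated in the text); the 1.23 constant in the solvent estimate is the usual Matthews approximation. Under those assumptions the result is about 2.3 Å³/Da, or roughly 47% solvent, which is within the range typical of protein crystals.

```python
import math


def cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit-cell volume in cubic angstroms from cell lengths (angstroms)
    and angles (degrees), using the general triclinic formula."""
    ca = math.cos(math.radians(alpha_deg))
    cb = math.cos(math.radians(beta_deg))
    cg = math.cos(math.radians(gamma_deg))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def matthews_coefficient(volume, n_symops, n_mol_asu, mw_da):
    """Matthews coefficient Vm (A^3/Da): cell volume divided by the total
    protein mass in the cell (symmetry operators x molecules per ASU x MW)."""
    return volume / (n_symops * n_mol_asu * mw_da)


def solvent_fraction(vm):
    """Approximate solvent fraction from Vm (Matthews approximation)."""
    return 1.0 - 1.23 / vm


if __name__ == "__main__":
    volume = cell_volume(54.0, 54.0, 318.5, 90.0, 90.0, 120.0)
    # Assumptions: 3 symmetry operators (P3_2) and ~23 kDa per chain.
    vm = matthews_coefficient(volume, n_symops=3, n_mol_asu=5, mw_da=23000.0)
    print(round(volume), round(vm, 2), round(solvent_fraction(vm), 2))
```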
As expected, given the high level of amino acid sequence identity with A1061 3Cpro (80%), FMDV SAT2/G-g3Cpro(1-208) adopts the same trypsin-like fold (Fig. 2B), which has been described in detail elsewhere (Birtley & Curry, 2005; Sweeney et al., 2007). Superposition of the five molecules in the asymmetric unit shows that they are highly similar to one another (Figs. 1 and 2C); the pair-wise root mean square (rms) deviation in Cα positions between chains is 0.2–0.3 Å. The largest differences are observed in the longest surface-exposed loops, the E1–F1 loop in the N-terminal β-barrel and the B2–C2 loop known as the β-ribbon in the C-terminal β-barrel (Fig. 2C). These are also the regions of greatest difference between SAT2/G-g3Cpro(1-208) and A1061 3Cpro (Fig. 2D); overlay of the two structures yields an overall rms deviation in Cα positions of ∼0.6 Å. The flexibility of the β-ribbon, which shifts in position to aid peptide binding, has been noted before (Zunszain et al., 2010) and clearly it plays a similar role in SAT-type 3C proteases.
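The rms deviations quoted above are computed over matched Cα atoms after the structures have been superposed. A minimal sketch of the final step is given below; it assumes the two coordinate sets are already superposed and listed in the same residue order, and it does not perform the superposition itself. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input) between two
    equal-length lists of (x, y, z) coordinates that are already superposed
    and paired residue by residue."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have equal length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms, for illustration only.
    chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    chain_b = [(0.0, 0.2, 0.0), (3.8, 0.2, 0.0), (7.6, 0.2, 0.0)]
    print(rmsd(chain_a, chain_b))
```

Because the largest differences lie in the E1–F1 loop and the β-ribbon, a per-residue distance profile would localise them more clearly than a single overall value.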
The results reported here provide a template structure of a SAT-type FMDV 3C protease that should be of value in directing molecular investigations of this group of proteases. Although higher-resolution diffraction data were not obtained, the initial crystals of SAT2/G-g3Cpro(1-208) diffracted to 2 Å, so better data should be achievable with further optimisation. Likewise, since soluble 3Cpro could be purified from three other SAT-type viruses, notably SAT1/NIG/5/81, crystal structures for these proteases may well also be achievable.