Chinese Interpreting Studies: a data-driven analysis of a dynamic field of enquiry

Ziyun Xu; Leonid Pekelis

doi:10.7717/peerj.1249

Ziyun Xu¹, Leonid Pekelis²

¹Intercultural Studies Group, Universitat Rovira i Virgili, Spain

²Department of Statistics, Stanford University, USA

Published: 2015-09-17
Accepted: 2015-08-28
Received: 2015-04-15

Academic Editor: Massimiliano Zanin

Subject Areas: Science and Medical Education, Science Policy, Statistics, Computational Science
Keywords: Scientometrics, Chinese interpreting studies, Citation analysis, Statistical modeling

Copyright: © 2015 Xu and Pekelis
Licence: This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.

Cite this article: Xu Z, Pekelis L. 2015. Chinese Interpreting Studies: a data-driven analysis of a dynamic field of enquiry. PeerJ 3:e1249 https://doi.org/10.7717/peerj.1249

The authors have chosen to make the review history of this article public.

Abstract

Over the five decades since its beginnings, Chinese Interpreting Studies (CIS) has evolved into a dynamic field of academic enquiry with more than 3,500 scholars and 4,200 publications. Using quantitative and qualitative analysis, this scientometric study delves deep into CIS citation data to examine some of the noteworthy trends and patterns of behavior in the field: how can the field’s progress be quantified by means of citation analysis? Do its authors tend repeatedly to cite ‘classic’ papers or are they more drawn to their colleagues’ latest research? What different effects does the choice of empirical vs. theoretical research have on the use of citations in the various research brackets? The findings show that the field is steadily moving forward with new papers continuously being cited, although a number of influential papers stand out, having received a stream of citations in all the years examined. CIS scholars also have a tendency to cite much older English than Chinese publications across all document types, and empirical research has the greatest influence on the citation behavior of doctoral scholars, while theoretical studies have the largest impact on that of article authors. The goal of this study is to demonstrate the merits of blending quantitative and qualitative analyses to uncover hidden trends.

Introduction

There are various channels through which scholars communicate with one another, easing the flow of knowledge and furthering the advance of science. One important channel is the citation. Citations arise from the duty incumbent upon all scholars to review existing literature comprehensively and critically before embarking on new research: to gain a deep understanding of the field, to find the precise empty niche into which their own work will fit, and to refer to previous related work to bolster their arguments. Though citing other people’s work did not become the norm in scientific writing until the early 1900s (Garfield, 1979), it is now standard and required practice for authors to acknowledge the works of predecessors from which they have drawn inspiration, thereby maintaining the ‘intellectual lineage’ from one generation of academics to the next. Citation analysis has long attracted attention in the scientific community (see for example Garfield, 1972; White & McCain, 1998; Baumgartner & Pieters, 2003; Vallmitjana & Sabaté, 2008). This is mostly a consequence of Kuhn’s (1970) ground-breaking work on the nature of science, in which he called on future scholars to recognise the crucial importance of adopting an empirical approach to studying the structure of a scientific community.

Such academic pursuits are particularly relevant in the Translation and Interpreting Studies (TIS) community, because it has experienced significant growth in both quantitative and qualitative terms over the past two decades, and because hundreds of papers with diverse research methodologies and themes are produced on a yearly basis (Franco Aixelá, 2013). During this period of significant growth, more empirical studies are needed if we are to fully appreciate the patterns of communication and trends in TIS. A number of earlier scholars have used citation data to trace the evolution of the field and understand how scholars communicate with each other (see for example, Pöchhacker, 1995; Gile, 2005; Grbić & Pöllabauer, 2009). However, despite its usefulness, a purely quantitative approach to analysing TIS citation data has limitations, and qualitative analysis is called for in order to obtain a fuller picture of the discipline (Gile, 2000). The purpose of this scientometric study is to marry quantitative and qualitative approaches to citation analysis in order to obtain a panorama of the evolution of Chinese Interpreting Studies (CIS) and reveal its hidden trends and predominant theoretical influences.

Background

Major questions

CIS has been developing rapidly since the 1990s, as evidenced by its increasing number of publications and researchers (Chen, 2009). Using an all-but-exhaustive collection of citation data, three component strands of CIS (journal articles, MA theses, and doctoral dissertations) were studied with the aim of finding changes or differences in patterns of citation. In what ways is the citation network changing? Are authors still primarily influenced by older works or do more recent ones now hold the ascendancy? How do different research methods (theoretical, empirical, etc.) affect the use of citations in the works themselves? The three bodies of literature are generally produced by three distinct groups of authors: established researchers for journal articles and conference proceedings; graduate students for MA theses; and PhD students for doctoral dissertations (Xu, 2014; Xu, 2015a; Xu, 2015b). Examining these three strands individually is necessary if we are to fully understand how each contributes to advancing the field as a whole. Building on earlier studies by the present authors (Xu, 2014; Xu, 2015a; Xu, 2015b; Xu & Pekelis, 2015), which provided an overview of the field from different perspectives, this study uses some of the most sophisticated data-mining techniques currently available to answer the aforementioned complex questions, none of which can be adequately addressed by simple descriptive statistics.

Literature review

The study of research trends in Translation and Interpreting Studies (TIS) is currently dominated by citation analysis (see for example Gile, 2005; Gile, 2006; Gao, 2008; Franco Aixelá, 2004). There are various methods of carrying out citation analysis, but the overall basic concept is always the same. First, a sample of articles is selected; the researcher then counts the number of times each article is cited in other works. Citing (or ‘source’) works can be categorized according to type (conference proceedings, monographs, periodicals, etc.), and a weight assigned to each citation based on various factors: the type of publication in which it is being cited; the number of authors being cited; in the case of co-authorships an author’s contribution to the work being cited (the ‘target’); and others. Finally, a numerical score is calculated for each author, article, research institution, journal or whatever the researcher is focusing on; these scores can then be ranked to indicate each cited individual’s or entity’s relative impact (Lowry, Karuga & Richardson, 2007). The procedure is based on the premise that the number of times a work is cited is a measure of its influence in the academic world.
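The counting, weighting and ranking steps described above can be sketched in a few lines. This is only an illustration of the general procedure: the weights, the equal split among co-authors and the author names are hypothetical and are not taken from Lowry, Karuga & Richardson (2007) or from the present study.

```python
from collections import defaultdict

# Hypothetical weights per type of citing ('source') publication.
SOURCE_WEIGHTS = {"journal": 1.0, "proceedings": 0.8, "monograph": 0.6}

def citation_scores(citations):
    """Rank cited authors by weighted citation score.

    `citations` is an iterable of (source_type, cited_authors) pairs,
    one pair per citation. Each citation's weight is split equally
    among the co-authors of the cited ('target') work (an assumption).
    """
    scores = defaultdict(float)
    for source_type, cited_authors in citations:
        weight = SOURCE_WEIGHTS.get(source_type, 0.5)  # arbitrary fallback weight
        share = weight / len(cited_authors)
        for author in cited_authors:
            scores[author] += share
    # Highest score first: the ranking indicates relative impact.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

example = [
    ("journal", ["Author A"]),
    ("journal", ["Author A", "Author B"]),
    ("proceedings", ["Author B"]),
]
ranking = citation_scores(example)
for author, score in ranking:
    print(f"{author}: {score:.2f}")
```

Changing the weights or the co-author split changes the ranking, which is one reason why scores from different citation studies are not directly comparable.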

Citation analysis has increasingly been adopted to map out the historical evolution of a particular area of study, the impact of individual researchers, academic institutions or scientific publications, the extent of collaboration between these, or the influence of certain disciplines on others (Glänzel, 2003; Kalaitzidakis, Mamuneas & Stengos, 2003). In their general study of the technique, Braun, Glänzel & Schubert (1985) found that articles cited between five and ten times each year during the period immediately following their publication tend to be assimilated into the relevant discipline’s ‘universal’ stock of knowledge, and that conversely, if articles go uncited over the same period, there is little chance of such assimilation taking place. Citation analysis has been used in well-established disciplines such as linguistics (White, 2004), psychology (Carr & Britton, 2003; White & White, 1977), and information science (White & McCain, 1998), but has also been highly useful in assessing research patterns in fields with much shorter histories, such as TIS (Gile, 2005).

Given the increasing popularity of citation analysis, Garfield’s Institute for Scientific Information (ISI) produced the first citation index¹ for articles published in academic journals shortly after it was founded in 1960. The ISI has since produced numerous other indexes, which have grown to encompass more than 40 million records and 8,700 research journals (Meho, 2006) and are now accessible online via Thomson Reuters’ Web of Science. Although originally designed to facilitate access to information, the indexes are now widely recognized as an important source of empirical data for scientometric research (Ivancheva, 2008).

Despite the growth in use of citation indexes, the exponential expansion of scientific research into new disciplines over the past four decades has resulted in numerous high-quality journals being excluded from the ‘baskets’ used by the leading indexes. To facilitate improved communication among researchers in the field of interpreting, in 1990 Daniel Gile set out to create an international network—the Conference Interpreting Research Information Network (CIRIN)—which publishes a biannual Bulletin. Since then several other searchable databases have been created for this discipline: the Bibliography of Interpreting and Translation (BITRA), for example, carries over 50,000 entries and is updated on a monthly basis, while the Translation Studies Bibliography (TSB) subscription service has 24,500 entries to date.

Gile (2005) surveyed citations from 47 papers on translator and interpreter training written by Western academics to find out which theories were most influential, the languages that target works were most often written in, and whether empirical or non-empirical research had more influence. The interpreter training material he sampled for the study revealed several interesting points: the model advocated by the Association Internationale des Interprètes de Conférence (AIIC) was the most frequently cited theory, while functional theories were dominant in translator training; the majority of the cited literature was written in English; and empirical research played very little part in the papers sampled. In another study (2006) he introduced a qualitative dimension to his analysis by grouping citations into different categories (concepts, methods, findings, etc.), on the assumption that such an approach would provide a more nuanced analysis of each category’s impact on the evolution of Translation and Interpreting Studies (TIS). The study revealed that scholars were cited on their methods and findings in less than 10% of the articles in the corpus. Adopting the same classification scheme, Nasr (2010) examined a corpus of 542 texts on translator training. Her study produced a similar result, indicating that empirical research was not influential in shaping research into that subject either.

By developing methodologies based on citation analysis, earlier researchers have laid the groundwork for assessing the impact of an individual’s work and tracing the evolution of a field. In addition to quantitative analysis, qualitative approaches have been proposed to study how scholars cite one another. However, the application of these methodological techniques to investigating the evolution of CIS has to date been very limited. The goal of the present study was to adopt a blended approach with equal emphasis on both quantitative and qualitative considerations to explore how the CIS citation network changes over time and how different research methodologies have affected citation behaviours.

The Present Study

Expanding on the broad themes of enquiry outlined at the beginning of this paper, three more in-depth mini-studies were drawn up to address some of the major issues unresolved by previous researchers. The rationale for each is summarized in the following section.

Data organization

The authors created a near-comprehensive database of 59,303 citations from the 1,289 Chinese MA theses, 32 doctoral dissertations and 2,909 research papers available to them. The CIS literature was collected from multiple databases and other sources: field trips to university libraries, interlibrary loans, book purchases, and academic databases such as CNKI, Wanfang and the National Digital Library of Theses and Dissertations in Taiwan. Publications with no bibliographic references were excluded from the analysis. Every effort was made to ensure that the field was covered as widely and exhaustively as possible. Though the contents of a handful of embargoed theses were inaccessible, the present study authors analysed those of their features that were available, such as titles, publication years and abstracts. Essentially the entire population of CIS studies was sampled, making it possible to generalize the conclusions of this study to the entire field. Once collected, the references were manually entered into a relational database which uses Structured Query Language (SQL) for managing data.
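To make the idea of a relational citation database concrete, the following is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and sample rows are hypothetical; the study does not describe its actual schema or database engine beyond the use of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE publication (
    id       INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    year     INTEGER NOT NULL,
    doc_type TEXT NOT NULL      -- 'article', 'ma_thesis' or 'phd_dissertation'
);
CREATE TABLE citation (
    citing_id      INTEGER NOT NULL REFERENCES publication(id),
    cited_title    TEXT NOT NULL,
    cited_year     INTEGER NOT NULL,
    cited_language TEXT
);
""")

# Hypothetical sample data: one citing thesis with two references.
conn.execute(
    "INSERT INTO publication VALUES (1, 'Hypothetical thesis', 2010, 'ma_thesis')"
)
conn.executemany(
    "INSERT INTO citation VALUES (?, ?, ?, ?)",
    [(1, "Hypothetical paper X", 2008, "zh"),
     (1, "Hypothetical paper Y", 1996, "en")],
)

# Count citations by citing year and by lag (citing year minus cited year).
rows = conn.execute("""
    SELECT p.year, p.year - c.cited_year AS lag, COUNT(*) AS n
    FROM citation AS c
    JOIN publication AS p ON p.id = c.citing_id
    GROUP BY p.year, c.cited_year
    ORDER BY lag
""").fetchall()
print(rows)
```

Storing one row per reference, linked to the citing work, is what makes questions such as "how old are the works cited in a given year?" a single grouped query.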

Study 1

Research question

Do CIS authors tend repeatedly to cite ‘classic’ papers, or are they more drawn to the latest research within the field? How can the progress of CIS be quantified by means of citation analysis?

A number of scholars (Merton, 1967; Lederberg, 1972; Garfield, 1977) have observed that at the same time as science constantly moves forward, there exists a phenomenon known as obliteration: the pace of scientific progress is so rapid, and new findings become so quickly and thoroughly absorbed into the ‘general stock’ of knowledge, that a great deal of work is quickly ‘forgotten’ by the academic community. The phenomenon is particularly noticeable in exact sciences, in which authors seem consistently to build upon relatively recent research, the time lag between an author and the work he cites remaining fairly constant (Van Raan, 2010). At the same time, other scholars have observed the long-lasting impact of ‘classic’ works on the evolution of a field. For instance, Franco Aixelá’s 2013 study of the most cited works in Western Translation Studies (WTS) revealed that almost all the most frequently cited papers were “classics” published well before the 2000s, a finding which appears to suggest that WTS scholars have a marked preference for deepening and widening their understanding of the ages-old issues of translation and otherwise carrying on the intellectual lineage of classical authors.

The research by the aforementioned authors points to two contrasting patterns of knowledge flow existing in tandem. Merton (1967), Lederberg (1972), Garfield (1977) and Van Raan (2010) identified a scenario whereby knowledge flows at a steady rate, referred to in the remainder of this paper as ‘perfect research flow’. By contrast, Franco Aixelá (2013) has observed deviations from this scenario, proposing a continuum of flow rates reaching to the extreme opposite of ‘research stagnation’. The aim in this section was to discover whether or not the CIS community followed the WTS tradition of repeatedly citing classic works and, more generally, to examine how the field’s progress could be illustrated by means of citation analysis.

Research methodology

Two null hypotheses were tested: the first was that of ‘research stagnation’²—this tests whether new papers are not constantly being cited; and the second was that of ‘perfect research flow’—this tests whether the citation process is stationary.³

The hypothesis of research stagnation

Research stagnation occurs when articles published after a given year (t) suddenly cease completely to be cited. One scenario which can lead to this state of affairs is when articles published before year t are so influential that they ‘drown out’ all citations from ones published after it. This hypothesis is rejected if new papers are being constantly cited.

The hypothesis of perfect research flow

Perfect research flow occurs when the citation process is stationary. The following example illustrates a case of perfect research flow: for articles published in a given year t, let us suppose that no citations come from year t − 4 or earlier, and that most citations come from papers published in year t − 3, with half as many for each successive year down to t itself. Perfect research flow comes about when this distribution of citations is true for all the years t examined in the study.

A typical scenario that would cause this hypothesis to be rejected would be if a few very influential (‘classic’) articles were published in a given year t₀ and cited more than the average article, even ten years later: in this case the citation process would indeed not be stationary, because in year t₀ + 10 citations of this article published ten years previously would still be being produced! We therefore would not be dealing with a case of perfect research flow.
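The illustrative distribution described above can be written out explicitly. In the sketch below, the lag weights (8, 4, 2, 1 for lags 3, 2, 1, 0) follow from the example in the text; the years, the helper names and the 'classic' perturbation are hypothetical.

```python
from fractions import Fraction

def example_lag_distribution(max_lag=3):
    """Share of citations by lag: most at max_lag, halving for each
    more recent year, and none older than max_lag."""
    raw = {lag: Fraction(2) ** lag for lag in range(max_lag + 1)}
    total = sum(raw.values())
    return {lag: weight / total for lag, weight in raw.items()}

def is_stationary(dist_by_year):
    """Perfect research flow: the lag distribution is identical in every year."""
    dists = list(dist_by_year.values())
    return all(d == dists[0] for d in dists)

# Same distribution in every year: perfect research flow.
flow = {year: example_lag_distribution() for year in range(2000, 2004)}
print(flow[2000])

# A 'classic' still cited ten years on puts mass at lag 10 in one year,
# so the distribution is no longer the same across years.
with_classic = dict(flow)
with_classic[2003] = {0: Fraction(1, 20), 1: Fraction(1, 10), 2: Fraction(1, 5),
                      3: Fraction(2, 5), 10: Fraction(1, 4)}
print(is_stationary(flow), is_stationary(with_classic))
```

In the stationary case the shares are 8/15, 4/15, 2/15 and 1/15 for lags 3, 2, 1 and 0 respectively.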

Hypothesis testing

The aforementioned hypotheses concern the distribution of the citation process. To test them, all the papers published in year t and the years of all citations contained in those papers were identified. The distribution of papers cited in year t was estimated as the average number of citations per paper published in year t coming from each previous year: t − 1, t − 2, t − 3 and so on. The same methodology was applied to all publication years between 1990 and 2013. Once the distribution of cited papers for each year t was established, it was possible to test whether the figure was stagnant, and, by measuring how it changed from year to year, whether it was stationary.
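A minimal sketch of this estimation step is given below: for each publication year it computes the average number of citations per paper at each lag (publication year minus cited year). The function name and the toy data are hypothetical and are not drawn from the CIS database.

```python
from collections import Counter, defaultdict

def citations_per_paper_by_lag(papers):
    """papers: iterable of (publication_year, [years of cited works]).

    Returns {publication_year: {lag: average citations per paper}}.
    """
    paper_counts = Counter()
    lag_counts = defaultdict(Counter)
    for year, cited_years in papers:
        paper_counts[year] += 1
        for cited_year in cited_years:
            lag_counts[year][year - cited_year] += 1
    return {
        year: {lag: n / paper_counts[year]
               for lag, n in sorted(lag_counts[year].items())}
        for year in sorted(paper_counts)
    }

# Toy data: two papers published in 2010 and one in 2011.
toy = [(2010, [2008, 2008, 1990]), (2010, [2009, 2008]), (2011, [2010])]
profile = citations_per_paper_by_lag(toy)
print(profile)
```

Comparing these per-year profiles is what allows the two hypotheses to be assessed: identical profiles in every year would correspond to a stationary process.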

These two hypotheses were tested by comparing the performances of two models—one for each hypothesis—to that of a third, namely a varying coefficient model (VCM).

VCMs are more generalized versions of regression. Regression expresses the value of an output as a combination of different types of input (or predictors). Each input has an associated coefficient which signifies the importance of its contribution to explaining the output. In varying coefficient models, the coefficients themselves vary with other variables, which may or may not be connected to the predictors. For example, in the context of a chemistry experiment, a linear regression of the amount of reagent created may yield very different coefficients depending on outside parameters such as temperature. A VCM would give a better fit because the coefficients in the regression are functions of the temperature (this is not the same as including the temperature as a predictor, since the dependence of the amount of reagent created on the temperature is not direct or linear). An overview of the theory behind VCMs can be found in Hastie & Tibshirani (1993), and Fan & Zhang (2008) contains an excellent review of the many ways in which such models are applied and implemented.
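In symbols, the contrast between the two model classes can be written as follows. The notation is illustrative and is not taken from the study's own model description: y is the output, x_j are the predictors, u is the modifying variable (temperature in the chemistry example) and ε is an error term. This is the general linear form of a varying coefficient model, not the specific Poisson model fitted here.

```latex
\begin{align}
  \text{regression:} \quad & y = \beta_0 + \sum_{j=1}^{p} x_j \,\beta_j + \varepsilon \\
  \text{VCM:} \quad & y = \beta_0(u) + \sum_{j=1}^{p} x_j \,\beta_j(u) + \varepsilon
\end{align}
```

When every function β_j(u) is constant in u, the second equation reduces to the first, which is why a fixed-coefficient model can be tested as a special case of the VCM.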

We used a Varying Coefficient Model to fit a citation process somewhere between research stagnation and perfect research flow in the following way. In each model the output was the average number of citations per paper published in target year t. For research stagnation the input was the raw source year of the citations (e.g., a paper is published in 1996). For perfect research flow the input was the relative source year of the citations (the relative source year of papers published in year t − i is i). For both of these models the input coefficients were forced to be fixed across target year t. Finally, we fit the third—VCM—model with the same inputs as for perfect research flow, but allowed the coefficients to vary smoothly with source year.

All three models were fit as Poisson generalized linear models. In addition, the VCM was fit using locally weighted least squares with a Gaussian kernel.
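The role of the Gaussian kernel in a locally weighted fit can be illustrated with a deliberately simplified case. The sketch below is not the study's fitting procedure: it estimates a single Poisson rate that varies smoothly with year, using the fact that maximising a kernel-weighted Poisson log-likelihood with a constant rate yields the kernel-weighted mean of the counts. The bandwidth and the counts are hypothetical.

```python
import math

def gaussian_kernel(u, u0, bandwidth):
    """Unnormalised Gaussian weight of an observation at u for a fit at u0."""
    z = (u - u0) / bandwidth
    return math.exp(-0.5 * z * z)

def local_poisson_rate(years, counts, target_year, bandwidth=2.0):
    """Local-constant Poisson estimate at target_year.

    Maximising sum_i w_i * (y_i * log(lam) - lam) over lam gives
    lam = sum_i w_i * y_i / sum_i w_i, i.e. a kernel-weighted mean.
    """
    weights = [gaussian_kernel(y, target_year, bandwidth) for y in years]
    return sum(w * c for w, c in zip(weights, counts)) / sum(weights)

years = [2005, 2006, 2007, 2008, 2009]
counts = [4, 6, 9, 15, 22]  # hypothetical citation counts
smoothed = [local_poisson_rate(years, counts, y) for y in years]
print([round(s, 2) for s in smoothed])
```

A small bandwidth lets the estimate follow each year closely, while a large one pulls every year towards the overall mean, which corresponds to forcing the coefficient to be fixed across years.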

To test the null hypotheses of research stagnation and perfect research flow we examined whether the VCM fit the data significantly better than either of the first two models, using a generalized deviance difference test as proposed in Fan, Zhang & Zhang (2001). Since the technical details and assumptions of such tests often depend in complex ways on particular features of the data-set, a bootstrap procedure was used to calculate p-values non-parametrically. In other words, the p-values for this analysis adapt automatically to the features of the current data-set rather than relying on an assumed reference distribution. For a general description of hypothesis testing with the bootstrap, see Efron & Tibshirani (1994), in particular Chapter 16. Table 1 below contains the resulting p-values.
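The logic of a bootstrap test on a deviance difference can be sketched with toy models. Everything specific below is an assumption made for illustration: the restricted model is a constant rate, the flexible model (a stand-in for the VCM) allows a separate rate in each half of the series, and null data are generated by resampling the pooled counts with replacement. The study's own models and resampling scheme are described in its supplementary material and may differ.

```python
import math
import random

def poisson_deviance(observed, fitted):
    """2 * sum(y * log(y / mu) - (y - mu)), with y * log(y / mu) = 0 when y == 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        term = y * math.log(y / mu) if y > 0 else 0.0
        total += term - (y - mu)
    return 2.0 * total

def fit_constant(counts):
    """Restricted model: one rate for the whole series."""
    mean = sum(counts) / len(counts)
    return [mean] * len(counts)

def fit_two_halves(counts):
    """Toy flexible model: a separate rate for each half of the series."""
    half = len(counts) // 2
    first, second = counts[:half], counts[half:]
    return ([sum(first) / len(first)] * len(first)
            + [sum(second) / len(second)] * len(second))

def deviance_gain(counts):
    """How much the flexible model reduces the deviance."""
    return (poisson_deviance(counts, fit_constant(counts))
            - poisson_deviance(counts, fit_two_halves(counts)))

def bootstrap_p_value(counts, n_boot=500, seed=0):
    """Share of null resamples whose gain is at least the observed gain."""
    rng = random.Random(seed)
    observed_gain = deviance_gain(counts)
    exceed = 0
    for _ in range(n_boot):
        resampled = rng.choices(counts, k=len(counts))  # pooled resampling
        if deviance_gain(resampled) >= observed_gain:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)

counts = [3, 4, 2, 5, 3, 4, 12, 15, 11, 14, 13, 12]  # hypothetical counts
p_value = bootstrap_p_value(counts)
print(round(deviance_gain(counts), 2), p_value)
```

A small p-value means that a deviance reduction as large as the observed one rarely arises when the restricted model is true, which is the sense in which the flexible model fits "significantly better".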

Table 1: Evaluation of VCMs.

Strand	Research stagnation (RS) deviance	Perfect research flow (PRF) deviance	VCM deviance	P-value (VCM < RS)	P-value (VCM < PRF)
MA	341.9	845.40	247.22	<0.001	<0.001
Journal	3,871.26	10,124.87	1,909.24	<0.001	<0.001
PhD	67.0	80.45	29.25	<0.001	<0.001

DOI: 10.7717/peerj.1249/table-1

A more detailed description of our statistical methods—model description, fitting procedure and hypothesis tests—can be found online at: http://interpretrainer.com/VCM_Justification.pdf.

Results and discussions

Figure 1 represents the distribution of citation processes for MA theses, doctoral dissertations and journal articles in different years.

Figure 1: Normalized incoming citations for three strands of CIS research.

DOI: 10.7717/peerj.1249/fig-1

Hypothesis of research stagnation

Figure 1 indicates definite movement over time for the incoming citation curves. If all the curves in panel 4 of the figure had looked the same, this would have supported the hypothesis that the field of CIS is static. This is not the case here: the ‘peaks’ in the curves move forward from year to year and do not ‘stagnate’ at a given year. In sum, the figures suggest that CIS research is moving forward.

In addition, the hypothesis of research stagnation was rejected on statistical grounds: more recent CIS publications were constantly being cited, as opposed to classic papers receiving the majority of citations as time went by. This caused the model corresponding to research stagnation to fit the data less accurately than the VCM, as demonstrated by the very low p-values for the corresponding tests⁴ (see Table 1 for more information).

While newer citations may not necessarily contain innovation—instead simply restating the positions found in classic works—there is assuredly some foundation to Zuckerman’s argument (1987) that the use of more recent citations nonetheless indicates that academic inquiry is moving forward. The argument is as follows: a cited paper (Paper A) gains influence when it is cited by multiple authors; however, authors may sometimes be inclined to cite other more recent papers that specifically refer to Paper A, as opposed to citing it directly. While these more recent publications may or may not generate new findings or innovative material, they effectively serve as an intellectual conduit connecting contemporary researchers with past foundational work. Paper A has become so thoroughly incorporated into the field’s stock of knowledge, has become so fundamental to it, that authors feel no need to make explicit reference to it. Therefore, the rejection of the research stagnation hypothesis indicates that contemporary researchers build on more recent work and that academic enquiry is moving forward.

Even though it was both visually and statistically confirmed that CIS is moving forward, whether it has been doing so at a steady pace remains an open question. Rejection of the research stagnation hypothesis tells us nothing about how research evolves; in particular, it sheds no light on the question of whether the flow of research is ‘perfect’ in the sense that the distribution of citations remains the same from year to year. Hence the need to test the hypothesis of perfect research flow.

Hypothesis of perfect research flow

Perfect research flow is the extreme opposite of stagnation; it means that papers are cited in exactly the same fashion every year. Figure 1 also enables us to grasp visually the rejection of the stationarity hypothesis. If this hypothesis were true, it would imply that the lines shown in the plots did not change with source year. This is clearly not the case here.

Additional statistical analysis was conducted to confirm the visually striking evidence in Fig. 1 regarding the hypothesis of perfect research flow. Indeed, this hypothesis was rejected on statistical grounds, because the Varying Coefficient Model fit the citation data better than did the model corresponding to perfect research flow.⁵ Once again this rejection is demonstrated by the very low p-values of the corresponding tests in Table 1.

Hypothesis testing and graphical interpretations

To test both of the previously mentioned hypotheses a VCM model was used first to describe the data as accurately as possible, then this model’s performance was tested to compare it with those of the models corresponding to each hypothesis.

For each year t, a spline was fit to incoming citations as a function of |t − i|, where i was the year of publication of the cited article. The VCM model was constructed so that it would be easy to control the variation of the coefficients over time.

The resulting graphs (see Figs. 2–4) can be likened to a frame-by-frame film of the evolution of incoming citations over time.

Figure 2: Trends in citations for research papers.

DOI: 10.7717/peerj.1249/fig-2

Figure 3: Trends in citations for MA theses.

DOI: 10.7717/peerj.1249/fig-3

Figure 4: Trends in citations for doctoral dissertations.

DOI: 10.7717/peerj.1249/fig-4

The red line is the fit for the VCM model and can be considered the average citations count for that year; the blue dots are the actual number of citations produced in each year; and the grey shaded areas represent a 95% confidence interval for the red line. The grey headers show the year under consideration—for example, ‘2000’ means that all the papers written in 2000 were examined to ascertain the number of citations in them dating from 2000 (t), 1999 (t − 1), 1998 (t − 2), and so forth.

Examination of the incoming citation data revealed that recent papers were regularly cited within an interval of a year or two—this trend was particularly obvious from 2009 to 2012. Moed (2005) has argued that an author might include a certain reference not only because its content fits the flow of an argument, but because he believes the scholar he is citing has gained a certain stature in the field and will lend credibility to his own ideas. For example, it would be more credible to cite a definition of empirical research formulated by a scholar who has conducted extensive studies of that type than by one whose focus is purely theoretical. The finding that recent papers are cited so soon indicates that newer research has a more or less instant impact on the latest studies and that CIS research is in a state of continuous progression. It was also remarked that, in disregard of the 1–2 year rule mentioned above, citations from material published in 1990 were made in CIS papers throughout the period under study, suggesting that that year may have seen the publication of particularly influential material, whose impact on research has been especially long-lasting. On further examination of the incoming citations, Hu Gengshen’s (1990) An overview of interpreting research in China stood out as the aforementioned material. Hu’s paper took a scientometric approach to assessing the themes and trends in interpreting research. From the Y axis it was also clear that many more citations were being made in later years, probably because the number of CIS papers being written was increasing year on year.

The situation for MA theses was slightly different from that of journal articles, though research was moving forward here too. These authors were somewhat hesitant to cite recently completed theses, preferring those produced at least three years previously, which they could be sure had been adopted by the academic community and become established. It was also noticeable that material produced in 1996 was cited by numerous MA authors in all subsequent years, suggesting that some very influential work was produced in that year. Detailed analysis revealed that work to be Ru Mingli’s (1996) thesis Interpreting quality and the role of the interpreter from the perspective of users, which was produced under the supervision of Chen Yongyu. It should be noted, however, that MA thesis authors cited their predecessors’ work far less often than the authors of research papers did theirs: in 2010, for example, research papers produced in 2008 were cited no fewer than 148 times; the same figure for MA theses was a mere 22. There are two possible reasons for this phenomenon. Firstly, a number of researchers (Lawrence, 2001; Harnad & Brody, 2004; Hajjem, Harnad & Gingras, 2005) have found that open-access articles receive a substantially higher number of citations than those that require a subscription; this is true across many disciplines including computer science, physics, sociology and psychology. Proceeding from their findings, it is reasonable to speculate that the difficulty and expense of obtaining access have contributed to the significantly lower number of CIS theses being cited in comparison to research papers. Secondly, in the academic world MA theses are generally considered to be of lower quality than research papers, which have gone through rigorous peer review.

Given that the total number of doctoral dissertations was only 32, little in the way of trends was observable. It should be noted, however, that a particular doctoral dissertation produced in 2008 was consistently cited by later PhD authors in the period 2010–13: Gong Longsheng’s (2008) An analytical study of the application of Adaptation Theory in interpreting, written under the supervision of Dai Weidong. Gong is such a well-established and visible academic within the CIS community⁶ that it is hardly surprising that his work attracted a large number of incoming citations.

To conclude, two null hypotheses were both visually and statistically rejected: research stagnation and perfect research flow. To perform those tests, one model corresponding to each hypothesis and a third, the Varying Coefficient Model, were constructed. The three were tested to see how well they fit the CIS citation data. Both hypotheses were rejected, because the first two models performed poorly in comparison with the VCM. Analysis of the citations yielded enough evidence to say that this field is moving forward, though not at a uniform pace.