15 Jan 2015

About the visibility of 7 German Education and Psychology journals on Google Scholar

Lüke, T., Wilbert, J., Weichselbaum, M., & Grünke, M. (2014). On the visibility of “Empirische Sonderpädagogik”: A bibliometric analysis. Empirische Sonderpädagogik,  4: 365-372

The most interesting thing about this study is the data about the visibility of seven German Education and Psychology journals that publish articles on Special Education (2010-2013).

    These journals are all indexed in Google Scholar. Only one of them is covered by Web of Science, and two by Scopus. This is another example of Google Scholar's wider coverage and larger size compared to traditional databases.

The average indexing rate of the articles published in these journals on Google Scholar is 74%, but there are significant differences between them. Three of them are covered entirely (100%), while for the other four fewer than 50% of the articles are covered. One of them is practically invisible (11%). Therefore, there are still a number of documents that Google Scholar has not been able to discover.


14 Jan 2015

On Google Scholar H-Index Manipulation by Merging Articles

Van Bevern, R., Komusiewicz, C., Niedermeier, R., Sorge, M., & Walsh, T. (2014). On Google Scholar H-Index Manipulation by Merging Articles. arXiv preprint arXiv:1412.5498

Google Scholar allows merging multiple article versions into one. This merging affects the H-index computed by Google Scholar. The parameterized complexity of maximizing the H-index using article merges is analyzed. Multiple possible measures for computing the citation count of a merged article are considered.

First, two ways of measuring the number of citations of a merged article are proposed. One of these two new measures seems to be the one actually used by Google Scholar. Second, a model for restricting the set of possible merge operations is also proposed. Although Google Scholar allows merges between arbitrary articles, such a restriction is still motivated: even an untruthful scientist may try to merge only superficially similar articles in order to conceal the manipulation. The variant in which only a limited number of merges may be applied in order to achieve a desired H-index is also considered. This is again motivated by the fact that an untruthful scientist may try to conceal the manipulation by making only a few changes to their own profile.

In conclusion, an algorithm that maximizes the H-index in linear time if there is only a constant number of versions of the same article is proposed.
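To make the effect of merging concrete, here is a minimal sketch of how a merge can raise the H-index. It is not the algorithm from the paper. The function names and the sample profile are hypothetical, and we assume the simplest possible measure, in which the merged article receives the plain sum of the citation counts of its versions (the paper considers several measures, and the one used by Google Scholar may differ).

```python
def h_index(citations):
    """Largest h such that at least h articles have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def merge_articles(citations, groups):
    """Return a new profile where each group of article indices is merged.

    Assumption (not from the paper): a merged article gets the plain sum
    of the citation counts of the articles in its group.
    """
    merged_indices = {i for group in groups for i in group}
    result = [c for i, c in enumerate(citations) if i not in merged_indices]
    result.extend(sum(citations[i] for i in group) for group in groups)
    return result


# Hypothetical profile: four articles with these citation counts.
profile = [5, 4, 2, 2]
merged = merge_articles(profile, [(2, 3)])  # merge the two weakest articles

print(h_index(profile))  # 2
print(h_index(merged))   # 3
```

In this toy profile a single merge of two articles with 2 citations each produces an article with 4 citations, which is enough to lift the H-index from 2 to 3.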

13 Jan 2015

A Digest of Google Scholar journal metrics: Comparison with impact factor and SCImago journal rank indicator for nuclear medicine journals

Zarifmahmoudi, L., Jamali, J., & Sadeghi, R. (2015). Google Scholar journal metrics: Comparison with impact factor and SCImago journal rank indicator for nuclear medicine journals. Iranian Journal of Nuclear Medicine, 23(1), 8-14.

In the current study, we compared the h5-index provided by Google Scholar (GS), the impact factor (IF) provided by Web of Science (WoS), and the SCImago journal rank indicator (SJR) provided by Scopus for quality assessment of nuclear medicine journals.
The 2013 h5-index, 2012 IF, and 2011 SJR of nuclear medicine journals were extracted from their providers, namely GS, WoS, and Scopus. The rank of each journal according to each index was determined. Spearman correlation was used to evaluate the correlation between metrics.
  • Overall 22 journals were identified. 
  • Spearman correlation coefficients between h5-index and other journal metrics were 0.907 for 2012 IF, 0.979 for 2011 SJR, and 0.978 for 2011 Scopus h-index (all p-values<0.00001). 
  • Wilcoxon signed ranks test showed no statistically meaningful difference between rankings according to h5-index and other journal metrics (p-values of 0.589, 0.565, and 0.542 for 2012 IF, 2011 SJR, and 2011 Scopus h-index respectively).
Our results showed very high correlation between the h5-index and the other metrics (all above 0.9). This strong correlation is quite remarkable if we take into account the source of the metrics and the way they are calculated. In addition to the high correlation between different metrics, the ranks of journals according to each source were quite similar, and no statistically meaningful difference was noticed between rankings.
Overall it seems that the new GS journal metrics (h5-index and h5-median) are reliable and can be used as an alternative to IF or SJR.
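For readers unfamiliar with the Spearman coefficient used in this comparison, the following sketch shows how it can be computed: both metrics are converted to ranks, and the Pearson correlation of the ranks is taken. The function names and the five journal values are hypothetical and are not data from the study.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for five journals (not taken from the study).
h5_index = [60, 45, 30, 22, 15]
impact_factor = [5.5, 3.1, 3.4, 1.9, 1.2]
print(round(spearman(h5_index, impact_factor), 3))  # 0.9
```

Here only two neighbouring journals swap places between the two rankings, so the coefficient stays at 0.9, in the same range as the values reported above.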

This confirms the results found in:

Delgado López-Cózar, E.; Cabezas Clavijo, A. (2013). Ranking journals: could Google Scholar Metrics be an alternative to Journal Citation Reports and Scimago Journal Rank? Learned  Publishing, 26 (2): 101-113. 
Delgado-López-Cózar, E., & Repiso-Caballero, R. (2013). The Impact of Scientific Journals of Communication: Comparing Google Scholar Metrics, Web of Science and Scopus. Comunicar, 21(41), 45-52.
Cabezas-Clavijo, A., & Delgado-López-Cózar, E. (2013). Google Scholar and the h-index in biomedicine: The popularization of bibliometric assessment. Medicina Intensiva (English Edition), 37(5), 343-354.
Delgado López-Cózar, E.; Orduña Malea, E.; Marcos Cartagena, D.; Jiménez Contreras, E.; Ruiz Pérez, R. (2012). JOURNAL SCHOLAR: Una alternativa internacional, gratuita y de libre acceso para medir el impacto de las revistas de Arte, Humanidades y Ciencias Sociales. EC3 Working Papers 5: 12 May 2012. 

7 Jan 2015

Reviving the past: the growth of citations to old documents

Verstak A, Acharya A, Suzuki H, Henderson S, Lakhiaev M, Chiung Yu Lin C, Shetty N. On the Shoulders of Giants: The Growing Impact of Older Articles. http://arxiv.org/abs/1411.0275


§ How often are older articles cited in scholarly papers and how has this changed over time?
§ How does the impact of older articles vary across different fields of scholarship? 
§ Is the change in the impact of older articles accelerating or slowing down?
§ Are these trends different for much older articles?

Unit of analysis

Citations from English articles published in scientific journals and conferences (1990-2013) indexed in the 2014 release of Google Scholar Metrics
§ This study covers English scientific journals and conferences assigned to one or more of the 261 subject categories in the 2014 release of Google Scholar Metrics.
§ The 261 subject categories are grouped into 9 broad research areas.
§ For each journal and conference, all articles with a publication date within 1990-2013 are considered.
§ For each category-year/area-year group, the total number of citations as well as the number of citations to articles published in each preceding year are computed.
§ Three different thresholds for older articles were used: ≥ 10 years old; ≥ 15 years old; and ≥ 20 years old.
§ To see whether the rate of change in the fraction of older citations is speeding up or slowing down, the aggregate change over 1990-2001 (first half) and 2002-2013 (second half) is computed for every category.
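The core measure in these steps, the fraction of citations that go to older articles, can be sketched as follows. This is only an illustration of the computation described above, not the authors' code. The function name, the data layout (a mapping from cited publication year to number of citations) and the sample counts are all hypothetical.

```python
def older_fraction(cited_year_counts, citing_year, min_age):
    """Share of citations going to articles at least `min_age` years old.

    `cited_year_counts` maps the publication year of the cited articles
    to the number of citations they received from articles published in
    `citing_year` (a hypothetical layout chosen for this sketch).
    """
    total = sum(cited_year_counts.values())
    older = sum(
        count
        for cited_year, count in cited_year_counts.items()
        if citing_year - cited_year >= min_age
    )
    return older / total if total else 0.0


# Hypothetical citation counts made by articles published in 2013.
counts_2013 = {2012: 40, 2008: 24, 2003: 20, 1998: 10, 1990: 6}

for threshold in (10, 15, 20):  # the three thresholds used in the study
    print(threshold, older_fraction(counts_2013, 2013, threshold))
```

Repeating this for every citing year from 1990 to 2013, and for every category or broad area, gives the time series whose change over each half of the period can then be compared.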

§ Percentage of citations to older articles (articles published at least 10, 15 and 20 years before the citing article) from articles published in English scientific journals indexed in the 2014 release of Google Scholar Metrics.
§ Percentage of citations to older articles for 9 broad areas of research.
§ Rate of change in the fraction of older citations: the aggregate change for 1990-2001 (first half) and 2002-2013 (second half) is computed for every category.

Period analyzed:  1990-2013
Data collection date: Unknown, but data from the 2014 edition of Google Scholar Metrics (which was released in June 2014) is used.
1. In 1990, 28% of citations were to articles that were at least 10 years old. In 2013, this percentage was 36%, a growth of 28% since 1990 (Table 1).
2. The fraction of older citations increased over 1990-2013 for 7 out of 9 broad areas of research and 231 out of 261 subject categories (Table 1).

3. This growth occurs in almost all scientific disciplines: 102 out of 261 subject categories saw a growth in the fraction of older citations that was over 30%, 44 of them with an increase over 50%. For Business, Economics & Management and Computer Science, almost two-thirds of the subject categories saw a growth over 50%. On the other hand, in Chemical & Material Sciences and Engineering, most of the subject categories show a drop in the fraction of older citations.
4. The change over the second half (2002-2013) was significantly larger than that over the first half (1990-2001): the increase in the second half was double the increase in the first half.

Table 2. Change in the fraction of citations to older articles over 1990-2001 & 2002-2013

5. The trend of a growing impact of older articles also holds for articles that are at least 15 years old and those that are at least 20 years old. In 1990, 16% of citations were to articles ≥ 15 years old and 10% to articles ≥ 20 years old; by 2013 these figures had risen to 21% and 13% respectively, with growth rates higher than 30%.
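Note that the growth figures quoted in these results are relative changes, not differences in percentage points. A short sketch of the arithmetic, using the rounded percentages given above (the function name is ours):

```python
def relative_growth(start, end):
    """Relative change from `start` to `end`, expressed as a percentage."""
    return (end - start) * 100 / start


# Rounded percentages quoted in the results above: (1990 value, 2013 value).
figures = {
    ">= 10 years": (28, 36),
    ">= 15 years": (16, 21),
    ">= 20 years": (10, 13),
}

for label, (share_1990, share_2013) in figures.items():
    print(label, round(relative_growth(share_1990, share_2013), 1))
# >= 10 years 28.6
# >= 15 years 31.2
# >= 20 years 30.0
```

Because the inputs are rounded to whole percentages, these values only approximate the growth rates that the authors computed from the underlying data.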


With an attractive and suggestive title, this work raises interesting and relevant questions. After defining a set of clear and precise goals, the authors describe a simple and straightforward methodological design, well suited to answering these questions, and finally present clear and convincing results. 

However, the method section should indicate the exact size of the object of study: the number of journals, articles and citations that have been processed. There are some questions that remain unanswered:

- How many journals do they refer to when they say "all the categorized journals and conferences, not only the top 20 per category"?
- How many articles do they refer to when they say  "articles with a publication date within 1990-2013"?
- How many citations do they refer to when they say  "total citations as well as the number of citations for each preceding year, included all the citations from these articles"?
- How many citations have been processed?
- Where are the results for each of the 261 subject categories? Why aren't they included as a table or Appendix?
- Why don't they offer the raw data so it can be analysed by other researchers? 

On another note, it is important to stress that these results refer to journals written in English. Would the results be different if journals written in other languages had been analysed instead? Ruiz & Jiménez (1996) found, for a sample of Library and Information Science journals, two different paces of ageing: one for English-language journals and one for the rest.

In order to check if the results shown in this work can be confirmed using other data sources that cover journals written in languages other than English, and at the same time using alternative procedures to calculate the pace of ageing of citations, we decided to replicate this study. To do this, we have used data from Thomson Reuters' Journal Citation Reports.

Since Gross & Gross (1927) introduced the concept of obsolescence, that is, the phenomenon by which scientific publications are decreasingly used over time, various methods have been proposed to measure the ageing process of scientific literature. 

Ruiz & Bailón (1998) empirically analysed five methods, comparing and assessing the quality of the results and the statistical errors each of these methods presented. Among these measures, the most popular one is known by the name of "Half-Life", and was first proposed by Burton & Kebler in 1960. 

Our study uses the "Cited Half-Life" indicator, used by Thomson Reuters, defined as "the median age of the articles that were cited in Journal Citation Reports each year". Every year, Thomson Reuters calculates an "Aggregate Cited Half-Life" for the 53 subject categories present in the Social Sciences edition of the JCR, and also for the 167 Science categories. To illustrate this indicator: in JCR 2003, the subject category Energy & Fuels has a cited half-life of 7.0. That means that articles published in Energy & Fuels journals between 1997 and 2003 (inclusive) account for 50% of all citations to articles from those journals in 2003. Since the JCR first included the Aggregate Cited Half-Life in 2003, we have used that year as our starting point and compared it with the data shown in JCR 2013.
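The idea behind the indicator can be sketched in a few lines: counting back from the citing year, we accumulate citations until half of the total is reached. This is an illustration only. The function name and the sample counts are hypothetical, and the linear interpolation inside the year where the 50% mark is crossed is our assumption, not necessarily the exact procedure followed by Thomson Reuters.

```python
def cited_half_life(citations_by_year, citing_year):
    """Number of years, counting back from `citing_year` (inclusive),
    needed to account for half of all citations received that year.

    `citations_by_year` maps publication year of the cited articles to
    the citations they received in `citing_year`. The fractional part is
    a linear interpolation inside the crossing year (an assumption).
    """
    counts = {y: c for y, c in citations_by_year.items() if y <= citing_year}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no citations to analyse")
    half = total / 2
    cumulative = 0
    for age, year in enumerate(range(citing_year, min(counts) - 1, -1), start=1):
        count = counts.get(year, 0)
        if count and cumulative + count >= half:
            return (age - 1) + (half - cumulative) / count
        cumulative += count


# Hypothetical citations received in 2003, by publication year of the cited article.
citations_2003 = {
    2003: 10, 2002: 10, 2001: 10, 2000: 5, 1999: 5, 1998: 5, 1997: 5,
    1996: 20, 1990: 30,
}
print(cited_half_life(citations_2003, 2003))  # 7.0
```

In this made-up example the seven publication years 1997 to 2003 gather exactly 50 of the 100 citations, which reproduces a half-life of 7.0 as in the Energy & Fuels illustration.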

Therefore, we can't accurately replicate the study by Verstak et al., which analyses data from 1990 onwards. However, we'll be able to observe the evolution of this last decade (2003-2013). Another limitation of our study is that we have not been able to analyse journals from the Arts & Humanities. There is no JCR for A&H, and therefore their Cited Half-Life indicators have not been calculated.

Journal, article and citation data used in the calculation of the cited half-life for the 220 subject categories present in the JCR 2003 and 2013 are presented in Table 3. Since there are many journals indexed in more than one subject category, the sum of these elements is not equal to the number of unique journals, articles and citations.

The results shown in Table 3 are clear:
  • On average, the half-life of citations in journals increases from 7.2 years to 7.5. This is true both for Science & Technology journals (from 7 to 7.4) and Social Sciences journals (from 7.9 to 8.1).
  • This growth is present across the majority of subject categories: there is an increase in 159 categories, a decrease in 42 categories, and no change in 6 categories. For another 13 categories this information is not available.
  • The disciplines with a higher cited half-life are those linked to the Humanities (History, Philosophy) and the Social Sciences (Sociology, Economics & Business, Psychology). Within Science & Technology, Mathematics, the biological sciences (Zoology, Ornithology), and the Geosciences also have a high cited half-life.
  • The number of disciplines with a cited half-life higher than 10 years in 2013 is twice what it was back in 2003.
  • The subject categories with the largest increases in cited half-life are quite diverse (Developmental Biology, Microscopy, Engineering Ocean, Peripheral Vascular Disease, Critical Care Medicine, Biochemistry & Molecular Biology, Physics: Fluids & Plasmas, Physics Mathematical).
  • The subject categories with the largest decreases in cited half-life belong mostly to engineering and chemistry, especially Materials Science: Energy & Fuels, Agricultural Engineering, Special Education, Biodiversity Conservation, Parasitology, Chemistry Multidisciplinary, Materials Science: Paper & Woo.

Table 3. Aggregate Cited Half-Life by subject category, Journal Citation Reports 2003 and 2013

As can be seen, the results offered in the Journal Citation Reports / Web of Science match very closely the results described by Verstak et al., which have been reached using Google Scholar data.

That said, the reasons that explain this extension in the life cycle of scientific documents are still open to discussion.

The first factor that should be considered has already been studied extensively: the relation between the exponential growth of scientific production and the pace of obsolescence. In 1963, Price suggested this link, although it was Line, in 1970, who precisely described the relationship between these two phenomena, determining that if the number of published articles grows rapidly, an equally rapid growth in the number of citations to recently published articles can be expected. He confirmed that the faster the pace at which scientific studies are published, the faster these publications become obsolete and stop being cited.

However, in 1993, Egghe mathematically proved how the growth of scientific production modifies the pace of obsolescence, pointing out the technical differences between diachronous and synchronous studies. He showed that obsolescence increases in synchronous studies (like the one carried out by Verstak et al.) and decreases in diachronous studies. In several insightful studies (Egghe & Rao 1992a-b, Egghe & Rousseau 2000), these authors systematically describe in detail all that is known about this issue, concluding that "the growth can influence aging but that it does not cause aging".

So, if we know that growth and obsolescence are closely related, what does the increase in citations to old documents reported in this study mean? Does it mean that we are in a period of slow scientific growth? Put another way, is science still growing exponentially as in previous periods? Or is today's scientific production of lower quality, providing fewer new discoveries and techniques? These are all interesting as well as disturbing questions (Bohannon 2014).

Figure 1. Accumulated growth of scientific production (nº of documents) over the years in Google Scholar, Microsoft Academic Search, Web of Science and Scopus.

The second factor that may explain the growth in the fraction of citations to old documents (which are thus spared from being quickly forgotten) could be the recent changes in scientific communication brought about by advances in information and communication technologies (the development of the Internet, and the creation and widespread adoption of the Web). The study by Google Scholar's team points in this direction when it says that widespread citation of older documents is now possible thanks to improved access to scientific knowledge (digitization of old documents and proliferation of repositories and search engines).

These arguments seem quite reasonable, and they are supported by the changes in scientists' reading habits detected by Tenopir & King (2008). For a sample of US science faculty, they find that the advent of digital technologies for searching, storing and publishing has had a dramatic impact on information seeking and reading patterns in science. Scientists have substantially increased their number of readings and read from a much broader range of sources, thanks to enlarged electronic library collections, online searching capabilities, and access to other new sources such as author websites. What is more relevant to the issue at hand, "the age of articles read appears to be fairly stable over the years, with a recent increase in reading of older articles. Electronic technologies have enhanced access to older articles, since nearly 80% of articles over ten years old are found by on-line searching or from citation (linkages) and nearly 70% of articles over ten years old are provided by libraries (mostly electronic collections)".

Apart from the effect that improved accessibility thanks to technological advancement has on the citations to old documents, the influence of Google Scholar on these changes should also be considered. Since it has already been established that Google Scholar has become the most used source for searching scientific information (Orduña et al. 2014a), that it is the largest public source of scientific information in existence (Orduña et al. 2014b, Ortega 2014), and that it grows at a higher pace than its competitors (Orduña & Delgado López-Cózar 2014), we may conclude that this search engine is contributing in a significant way towards this trend. Although Verstak et al. don't explicitly say it, the use of the product's motto "Stand on the shoulders of giants" in the title of the study might be interpreted as suggesting that the growing trend to cite old documents has been in part caused by Google.

There is truth to this claim, since it is undeniable that Google Scholar has revolutionized the way we search and access scientific information (Van Noorden 2014). A clear manifestation of this is the way results are nowadays displayed in most search engines and databases, a key issue that determines how a document is accessed, read, and potentially cited. This has given rise to the "first results page syndrome": users are increasingly accustomed to accessing only those documents that are displayed on the first results pages. In Google Scholar, as opposed to traditional bibliographic databases (Web of Science, Scopus, ProQuest) and library catalogues, documents are sorted by relevance and not by their publication date. Relevance, in the eyes of Google Scholar, is strongly influenced by citations (Beel & Gipp 2009, Martín-Martín et al. 2014).

Google Scholar favours the most cited documents (which also tend to be older) over more recent documents, which have had less time to accumulate citations. Although GS offers the possibility of sorting and filtering searches by publication date, this option is not used by default. Traditional databases, on the other hand, do the exact opposite: in order to prioritize novelty and recency (the criterion they have always assumed users are most interested in), they sort their results by publication date by default, while allowing the user to select other criteria if they are so inclined (citations, relevance, name of first author, publication name, etc.).
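A toy example helps to see why the default sort order matters. The sketch below is not Google Scholar's ranking algorithm, which is not public and takes more than citations into account. It simply compares two hypothetical orderings of the same made-up result set: one by citation count and one by publication date. All names and numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str
    year: int
    citations: int


def rank_by_citations(docs):
    """Stand-in for a relevance ranking dominated by citation counts."""
    return sorted(docs, key=lambda d: d.citations, reverse=True)


def rank_by_date(docs):
    """Default ordering of a traditional database: newest first."""
    return sorted(docs, key=lambda d: d.year, reverse=True)


def mean_year_of_top(ranked, k):
    """Average publication year of the first k results."""
    return sum(d.year for d in ranked[:k]) / k


# Hypothetical result set for one query.
results = [
    Document("A", 1995, 900),
    Document("B", 2003, 400),
    Document("C", 2010, 120),
    Document("D", 2013, 15),
    Document("E", 2014, 3),
]

print(mean_year_of_top(rank_by_citations(results), 2))  # 1999.0
print(mean_year_of_top(rank_by_date(results), 2))       # 2013.5
```

If users rarely look beyond the first results, the first ordering exposes them mainly to older, heavily cited documents, while the second exposes them mainly to the most recent ones.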

The question is on the table. Is Google Scholar changing reading and citation habits through the way information is searched and accessed in its search engine? If so, we could say that the work of the thousands of intellectual labourers that support with their citations the findings of the heroes of science is resting on the shoulders of a GIANT, and that giant has taken the form of a search engine.

Figure 2.  “Standing on the Shoulders of Giants” Wikipedia, 2013. http://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants


Beel, J., Gipp, B. (2009). Google Scholar's Ranking Algorithm: An Introductory Overview. In B. Larsen, J. Leta (Eds.), Proceedings of the 12th International Conference on Scientometrics and Informetrics. Rio de Janeiro, Brazil: University
Bohannon, J. (2014). Older papers are increasingly remembered—and cited. ScienceInsider,  4 November 2014. http://news.sciencemag.org/scientific-community/2014/11/older-papers-are-increasingly-remembered-and-cited#disqus_thread
Burton, RE,  Kebler, RW. (1960). The half-life of some scientific and technical literatures. American Documentation, 11(1):18–22.
Egghe, L. (1993). On the influence of growth on obsolescence. Scientometrics, 27, 195-214.
Egghe, L.,  Rao, I.K. Ravichandra (1992a). Citation age data and the obsolescence function: fits and explanations. Information Processing & Management, 28, 201-217. 
Egghe, L., Rao, I.K. Ravichandra (1992b). Classification of growth models based on growth rates and its applications. Scientometrics, 25, 5-46.
Egghe, L., Rousseau, R. (2000). Aging, obsolescence, impact, growth, and utilization: Definitions and relations. Journal of the American Society for information science, 51(11), 1004-1017.
Gross, P.L.K., Gross, E.M. (1927). College Libraries and Chemical Education. Science, 66: 385-389.
Line, M.B. (1970). The 'half-life' of periodical literature: apparent and real obsolescence. Journal of Documentation, 26, 46-52. 
Martín-Martín, A., Orduña-Malea, E., Ayllón, J.M.,  Delgado López-Cózar, E. (2014). Does Google  Scholar contain all highly cited documents (1950-2013)? Granada: EC3 Working Papers, 19: October 29, 2014. http://arxiv.org/abs/1410.8464
Orduña-Malea, E., Ayllón, J.M., Martín-Martín, A., Delgado López-Cózar, E. (2014a). Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search dead? Granada: EC3 Working Papers, 16: 28 April 2014. http://arxiv.org/pdf/1404.7045
Orduña-Malea, E., Ayllón, J.M., Martín-Martín, A., Delgado López-Cózar, E. (2014b). About the size of Google Scholar: playing the numbers. Granada: EC3 Working Papers, 18: 24 July 2014. http://arxiv.org/abs/1407.6239
Orduña-Malea, E., Delgado López-Cózar, E. (2014). Google Scholar Metrics evolution: an analysis according to languages. Scientometrics, 98(3), 2353–2367
Ortega, JL. (2014). Academic Search Engines: A Quantitative Outlook. Elsevier, Chandos Information 
Price, DJS (1963). Little Science, Big Science. New York: Columbia University Press.
Ruiz-Baños, R., & Bailón-Moreno, R. (1998). Métodos para medir experimentalmente el envejecimiento de la literatura científica. Boletín de la Asociación Andaluza de Bibliotecarios, 12(46), 57-75.
Ruiz-Baños, R., Jiménez-Contreras, E. (1996). Envejecimiento de la literatura científica en documentación. Influencia del origen nacional de las revistas. Estudio de una muestra. Revista Española de Documentación Científica, 19(1), 39-49.
Tenopir, C., King, D.W. (2008). Electronic Journals and Changes in Scholarly Article Seeking and Reading Patterns. D-Lib, 14(11/12).
Van Noorden, R. (2014). Google Scholar pioneer on search engine’s future. Nature news, 7 November 2014. http://www.nature.com/news/google-scholar-pioneer-on-search-engine-s-future-1.16269?WT.ec_id=NEWS-20141111