26 jun 2015

2015 Google Scholar Metrics: happy monotony

2015 Google Scholar Metrics released 


No surprises. Almost with the punctuality of a fine Swiss watch, Google released its ranking of scientific publications, Google Scholar Metrics (GSM), on Thursday, June 25, 2015, at 12:16 PM. Last year's version was published almost exactly one year earlier, on Thursday, June 26, 2014, at 5:24 PM. Google has stopped being different: it seems that from now on, these coveted lists of publications sorted by their scientific impact (that is, their h index) will be released every year. This means that GSM will join its competitor, Journal Citation Reports (JCR), in updating the product on a yearly basis: JCR 2015 appeared last week.

We can only welcome that the American company has decided to keep supporting GSM, a free product which is also very different from traditional journal rankings. Competition is healthy, and scientists can only be pleased about this variety of search and ranking tools, especially when they are offered free of charge.

Continuity and stability are the norm in this edition. The total number of publications that can be visualized in the 2015 rankings is 7,211. However, since 1,761 of them (24.2%) are classified in more than one subject area, the number of unique publications is lower: 5,450.
In short, Google has just updated the data, which means that some of the limitations outlined in previous studies still persist [1-5]: the visualization of a limited number of publications (100 for those that are not published in English), the lack of categorization by subject areas and disciplines for non-English publications, and normalization problems (unification of journal titles, problems in the linking of documents, and problems in the search and retrieval of publication titles). As an example, it is inexcusable that duplicates can be found in a ranking of the top 100 publications (according to their h5-index) of a particular language. This was the case with the journal Íconos-Revista de Ciencias Sociales, which appeared in both the 99th and 100th positions of the Spanish ranking (this has since been rectified). To be fair, however, there are fewer errors than in previous years.

In our previous studies, we have described again and again the underlying philosophy embedded in all of Google’s academic products. These products have been created in the image and likeness of Google’s general search engine: fast, simple, easy to use, understand and calculate, and last but not least, accessible to everyone free of charge. GSM follows all these precepts, and it is, in the end, nothing more than:

- A hybrid between a bibliometric tool (indicators based on citation counts), and a bibliography (a list of highly cited documents, and of the documents that cite them).
- It offers a simple, straightforward journal classification scheme (although it also includes some conferences and repositories).
- It is based on two basic bibliometric indicators (the h index, and the median number of citations for the articles that make up the h index).
- It covers a single five-year time frame (the current one being 2010-2014).
- It uses rudimentary journal inclusion criteria, namely: publishing at least 100 articles during the last five-year period, and having received at least one citation.
- It provides lists of publications according to the language their documents are written in. For each of the 8 languages other than English (Chinese, Portuguese, German, Spanish, French, Japanese, Dutch, and Italian) it offers a list of only 100 titles: those with the highest h index. For English publications, however, it shows a total of 4,655 different publications, grouped in 8 subject areas. For each publication, it shows the titles of the documents whose citations contribute to the h index, and for each one of these documents, in turn, the titles of the documents that cite them.
- It provides a search feature that, for any given set of keywords, will retrieve a list of 20 publications whose titles contain the selected keywords. When more than 20 publications satisfy the query, only the 20 with the highest h index are displayed.
- It doesn’t perform any kind of quality control in the indexing process nor in the information visualization process.
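
To make the two indicators and the inclusion criteria listed above more concrete, here is a minimal sketch in Python. It assumes we already hold the citation counts of the articles a publication issued in the five-year window; the function names (`h_index`, `h_median`, `is_eligible`) are our own and purely illustrative, and this is not Google's actual implementation, which is not public.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_median(citations):
    """Median citation count of the articles that make up the h index."""
    ranked = sorted(citations, reverse=True)
    h = h_index(ranked)
    return median(ranked[:h]) if h else 0


def is_eligible(citations, min_articles=100):
    """Inclusion rule as described in the text: at least 100 articles
    in the five-year period and at least one citation received."""
    return len(citations) >= min_articles and any(c > 0 for c in citations)


# Hypothetical publication with six articles and these citation counts.
example = [10, 8, 5, 4, 3, 0]
print(h_index(example))   # 4   (four articles with at least 4 citations)
print(h_median(example))  # 6.5 (median of 10, 8, 5, 4)
```

Note how little information the two numbers carry on their own: they say nothing about the total number of articles or citations behind them.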

To sum up, GSM is a minimalist information product with few features, closed (it cannot be customized by the user), and simple (navigating it only takes a few clicks). If GSM wants to improve as a bibliometric tool, it should incorporate a wider range of features. At the very least, it should:

- Display the total number of publications indexed in GSM, as well as their countries and languages of publication. Our estimates lead us to believe that this figure is probably higher than 40,000 [6]. In the case of Spain, there are over 1,000 publications indexed, which make up about 45% of the total number of academic publications in Spain [7-9].
- Provide some other basic and descriptive bibliometric indicators, like the total number of documents published in the publications indexed in GSM, and the total number of citations received in the analysed time frame. These are the two essential parameters that make it possible to assess the reliability and accuracy of any bibliometric indicator. Other indicators could be added in order to elucidate other issues like self-citation rates, impact over time (immediacy index), or to normalize results (citation average).
- Provide the complete list of documents of any given publication that have received n citations, and especially those that have received 0 citations. This would allow us to verify the accuracy of the information provided by this product. It is true, much to Google’s credit, that this information could be extracted, though not easily, from Google Scholar.
- Provide a detailed list of the conferences and repositories included in the product. The statement Google makes about including some conferences in the Engineering & Computer Science area, and some document collections like the mega-repositories arXiv, RePEc and SSRN, is much too vague.
- Define the criteria that have been followed for the creation of the classification scheme (areas and disciplines), and the rules and procedures followed when assigning publications to these areas and disciplines.
- Enable the selection of different time frames for the calculation of indicators and the visualization and sorting of publications. The significant disparities in publishing processes and citation habits between areas (publishing speed, pace of obsolescence) require the possibility to customize the time frame according to the particularities of any given subject area.
- Enable access to previous versions of Google Scholar Metrics (2007-2011, 2008-2012, 2009-2013) to ensure that it is possible to assess the evolution of publications over time. Moreover, Google could dare to venture into the unknown and do something no one else has done before: offer a dynamic product, with indicators and rankings updated in real time, just as Google Scholar does.
- Enable browsing publications by language, country and discipline, and directly display all results for these selections.
- Remove visualization restrictions: currently 100 results for each language and 20 for each discipline or keyword search.
- Enable the visualization of results by country of publication and by publisher.
- Enable sorting results according to various criteria (publication title, country, language, publishers), as well as according to other indicators (h index, h median, number of documents per publication, number of citations, self-citation rate…).
- Enable searching not only by publication title, but also by country and language of publication.
- Enable an option for exporting global results, as well as results by discipline, or those of a custom query.
- Enable an option for reporting errors detected by users, so they can be fixed (duplicate titles, erroneous titles, incorrect links, deficient calculations…).
So we said a year ago.

Isolated errors we've found in this edition:


1. Delgado López-Cózar, E.; Cabezas-Clavijo, Á. (2012). Google Scholar Metrics updated: Now it begins to get serious. EC3 Working Papers, 8: 16 November 2012. Available: http://digibug.ugr.es/bitstream/10481/22439/6/Google%20Scholar%20Metrics%20updated.pdf
2. Delgado López-Cózar, E.; Cabezas-Clavijo, Á. (2012). Google Scholar Metrics: an unreliable tool for assessing scientific journals. El Profesional de la Información, 21(4), 419–427. Available: http://dx.doi.org/10.3145/epi.2012.jul.15
3. Cabezas-Clavijo, Á.; Delgado López-Cózar, E. (2012). Scholar Metrics: el impacto de las revistas según Google, ¿un divertimento o un producto científico aceptable? EC3 Working Papers, (1). Available: http://eprints.rclis.org/16830/1/Google%20Scholar%20Metrics.pdf
4. Cabezas-Clavijo, Á.; Delgado López-Cózar, E. (2013). Google Scholar Metrics 2013: nothing new under the sun. EC3 Working Papers, 12: 25 July 2013. Available: http://arxiv.org/ftp/arxiv/papers/1307/1307.6941.pdf
5. Martín-Martín, A.; Ayllón, J.M.; Orduña-Malea, E.; Delgado López-Cózar, E. (2014). Google Scholar Metrics 2014: a low cost bibliometric tool. EC3 Working Papers, 17: 8 July 2014. Available: http://arxiv.org/pdf/1407.2827
6. Delgado López-Cózar, E.; Cabezas-Clavijo, Á. (2013). Ranking journals: could Google Scholar Metrics be an alternative to Journal Citation Reports and Scimago Journal Rank? Learned Publishing, 26(2), 101-113. Available: http://arxiv.org/ftp/arxiv/papers/1303/1303.5870.pdf
7. Delgado López-Cózar, E.; Ayllón, J.M.; Ruiz-Pérez, R. (2013). Índice H de las revistas científicas españolas según Google Scholar Metrics (2007-2011). 2nd edition. EC3 Informes, 3: 9 April 2013. Available: http://digibug.ugr.es/handle/10481/24141
8. Ayllón, J.M.; Ruiz-Pérez, R.; Delgado López-Cózar, E. (2013). Índice H de las revistas científicas españolas según Google Scholar Metrics (2008-2012). EC3 Reports, 7. Available: http://hdl.handle.net/10481/29348
9. Ayllón, J.M.; Martín-Martín, A.; Orduña-Malea, E.; Ruiz-Pérez, R.; Delgado López-Cózar, E. (2014). Índice H de las revistas científicas españolas según Google Scholar Metrics (2009-2013). EC3 Reports, 17. Granada, 28 July 2014. Available: http://hdl.handle.net/10481/32471
Granada, June 26, 2015, 10:10 PM.

8 jun 2015

Honouring the pioneers of Bibliometrics & Scientometrics / Library & Information Science by creating their Google Scholar Citations Profiles

It is a pleasure for our group to present a portal from which you’ll be able to access the bibliographic profiles, created on Google Scholar Citations, of 39 scholars, now deceased, who played an outstanding role in the creation and consolidation of the fields of Library and Information Science (29 profiles) and Bibliometrics (10 profiles). These profiles are already accessible on Google Scholar. You may access them from:


This initiative seeks to fulfill the following objectives:
  • To pay public tribute to those researchers and professionals who dedicated their life’s efforts to establish new scientific disciplines and fields of study. Rescuing their work from bibliographic oblivion and keeping their memory alive are the main motives behind this project.
  • To provide these authors with a digital bibliographic identity, in order to improve the visibility of their scientific production, and learn about the impact their work has had on the scholarly community, thanks to the citation counts Google Scholar provides.
  • To test the capabilities of Google Scholar Citations for identifying the bibliographic production of an author whose professional activities ceased many years ago (in some cases, more than two centuries), as well as to test its performance as regards the detection of citations to these works, making note of the potential technical issues the study of these cases may bring to light.

Recovering the memory of the researchers that came before us and keeping it alive is an act not only of intellectual gratitude, but also of justice. Scientific progress is based on cumulative knowledge, that is, the constant flow of ideas and knowledge among scientists. The act of reusing the findings, experiments and ideas developed by other scientists for new purposes, thus creating an infinite chain, is one of the principles at the base of the scientific enterprise. New knowledge can only spawn from previous knowledge. The science of a generation becomes tradition for the next one. In conclusion, science never starts from a clean slate.

With this initiative we mean to acknowledge our intellectual precursors, in keeping with the famous motto “Standing on the shoulders of giants” which, despite being wrongly attributed to Newton, is still fully valid. Very appropriately, this motto was also adopted by Google Scholar.

Although we are convinced that history and the test of time are the best judges of the relevance and quality of the discoveries and ideas transmitted through publications, we are also aware that when the time comes for a scientist to retire, and particularly when he or she passes away, the memory of his/her work may tend to disappear slowly as his/her contemporaries and disciples also disappear. To sum up, when an author’s ideas find no heirs who continue those lines of research, they will probably fade into obscurity.

Thanks to current information search and storage technologies, today it is possible to restore that bibliographic memory, now in digital form. However, it may be that the scientific production of these authors is scattered through the multiple nodes that constitute the Web. If this is the case, the bibliographic identity of an author will appear fragmented, and his/her visibility lessened for this reason.

Our intention is to make use of Google Scholar, the main academic search engine in existence, and specifically Google Scholar Citations, to locate all the bibliographic traces of those authors. As is well known by now, on the 20th of July, 2011, Google presented Google Scholar Citations (GSC), although only a limited number of people were able to test it at first. Also known by some people as Google Scholar Profiles, this tool was designed to enable scientific authors to build a bibliographic profile of all their publications (provided they are indexed on Google Scholar). The intention is to improve the visibility of both authors and publications. Additionally, the product also presents the number of times each document has been cited, and from that data it automatically generates various basic bibliometric indicators: the total number of citations, the h-index, and the number of documents which have received at least 10 citations (i10-index). These three indicators are computed for all citations received since the researcher published his/her first document, and also for the subset of citations received in the last five complete years plus the current year.
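
As a rough illustration of how these three profile indicators relate to the underlying citation data, here is a short Python sketch. The data layout (one dictionary per document, mapping the year of citation to the number of citations received that year) and the function name `profile_indicators` are assumptions made for this example only; this is not how Google Scholar Citations stores or computes its data.

```python
def profile_indicators(docs, since_year=None):
    """Compute total citations, h-index and i10-index for a profile.

    docs: list of dicts mapping citation year -> citations received that year
          (hypothetical layout, one dict per document).
    since_year: if given, only citations received in that year or later count.
    """
    counts = [
        sum(n for year, n in doc.items()
            if since_year is None or year >= since_year)
        for doc in docs
    ]
    ranked = sorted(counts, reverse=True)
    # h-index: number of documents whose citation count reaches their rank.
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    return {
        "citations": sum(counts),
        "h_index": h,
        "i10_index": sum(1 for c in counts if c >= 10),
    }


# Hypothetical profile with three documents.
docs = [{1990: 30, 2012: 5}, {2011: 12}, {1985: 9, 2014: 2}]

print(profile_indicators(docs))
# {'citations': 58, 'h_index': 3, 'i10_index': 3}

# Last five complete years plus the current one, taking 2015 as current.
print(profile_indicators(docs, since_year=2010))
# {'citations': 19, 'h_index': 2, 'i10_index': 1}
```

For authors whose work was published long ago, the gap between the two sets of figures gives a quick idea of how much of their impact is still "alive" in recent literature.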

The process of building profiles for deceased authors constitutes a bibliographic challenge of no small proportion, since it requires collecting bibliographic information which is usually scattered and very poorly normalized. If there is one tool that, as of today, allows us to perform an exhaustive bibliographic search of this kind, that tool is Google Scholar. As a starting point, and in order to facilitate this task, we consulted the following sources:
  • Wikipedia entries for each of the authors. Not all of them had one, but most did, and the information contained in these entries gave us a general and concise view of the authors’ personal and professional lives.

  • WorldCat Identities entries for each of the authors. This is an experimental product developed by OCLC which aims to gather automatically all the bibliographic information about people and organizations available on WorldCat and other online resources.

By making use of the information found in those sources, and the various search options Google Scholar Citations offers (author, title, keywords), we proceeded to search for the production of these authors on Google Scholar. Since this task is not only monotonous but also complicated, because of the multiple variants of the same document that may exist on Google Scholar, it wouldn’t be surprising if some omissions or misattributions had crept in. To minimize them, each record has been manually reviewed and minimally normalized. But, as is well known, manual tasks of this kind are prone to errors, for which we apologize in advance.

This project is, of course, part of the research line the EC3 Research Group initiated to discover the inner depths of Google Scholar and test its suitability as a tool for research evaluation. In this case, the goal is to test the capabilities of Google Scholar Citations for identifying the bibliographic production of an author whose professional activities ceased many years ago (in some cases, more than two centuries), as well as to test its performance as regards the detection of citations to these works, making note of the potential technical issues the study of these cases may bring to light.

Thank you, and we hope you find it of interest.