20 Apr 2016

Variation in number of hits for complex searches in Google Scholar


Bramer, W.M. (2016). 
Variation in number of hits for complex searches in Google Scholar. 
Journal of the Medical Library Association, 104(2), 143-145.

·       How reliable are the result numbers?
·       The number of hits reported was the sum of the number of versions reported by the first 1,000 hits
a)  32 searches for the analysis of fluctuation in hit count estimates.
b)  98 searches for actual records retrieved.
Each search (designed for systematic review projects) was performed in Google Scholar at least monthly for more than 2 years. The publication data was unfiltered, and the “Citations” option was both enabled and disabled for comparison.
For each search, all results (records) were downloaded into a DOC file, annotating the total number of results displayed (hit count estimates).
For each record, some bibliographic data (such as the number of versions and citations) were retrieved as well.
Period analyzed: 
All searches were carried out between June 2013 and January 2014.
Sample a)
The numbers of hits in Google Scholar for different queries sometimes changed greatly over short periods, specifically between:
- September 19, 2013 and September 28, 2013
- October 17, 2014 and November 21, 2014.
Sample b)
· The number of citations in the 1,000 records retrieved for 68 of the 98 searches varied greatly, ranging from 9 to 502, with a median of 127.
· The number of versions for 33 observations also varied greatly (minimum: 1,808, maximum: 7,639, median: 4,486).
· The number of hits reported by Google Scholar varied even more (minimum: 631, maximum: 142,000, median: 16,500).


The number of hits reported in Google Scholar is an unreliable measure.

The relative changes in numbers of hits varied greatly between search strategies and did not seem to be related to an overall increase in coverage.

What this study adds

This brief study sheds light on the accuracy of the hit count estimates that Google Scholar provides for a set of systematic searches over a period of 2 years, thus providing interesting empirical data about this search engine’s functionality.

The two dimensions analysed (fluctuation of hits over time and accuracy of hits against actual results) are of great interest for the literature. Nonetheless, probably the most outstanding finding is the lack of relation between the periods of greater variation and the coverage updates. This effect should be studied in depth in the future to verify it and, if confirmed, to explain it.

However, this study lacks an appropriate structure. The method section is incomplete and some issues remain unclear. Some methodological aspects are placed in the results section, which hinders understanding of the procedure. Moreover, the terminology is not standardized (for example, it is hard to tell hits from references). The writing should be made more readable. Most important, however, are the lack of information about the search queries performed, which prevents readers from checking, let alone replicating, the author’s results, and the inclusion of some inaccurate statements about citations and versions.

Moreover, this topic has been widely studied in the literature. Some of the author’s findings (mismatch between hit counts and actual results, fluctuation, etc.) have already been pointed out by several researchers, especially Mike Thelwall, Peter Jacsó, and the EC3 Research Group (see EC3 Bibliography about this topic), which undermines the study’s novelty. A discussion and comparison of the new results against previous research findings would be desirable.

Finally, if the main conclusions of the paper concern the dynamism of Google Scholar, this research is unnecessary (the functioning of academic search engines is well known). Conversely, if the point is the appropriateness of GS specifically for systematic reviews, the article should have been oriented differently, making better use of all the interesting raw data gathered.

15 Apr 2016

Comparison of metrics and document coverage of Google Scholar, Web of Science, and Scopus: the case of 18 geography researchers at the University of Vienna

Gorraiz, J., Gumpenberger, C., Glade, T. On the bibliometric coordinates of four different research fields in Geography. Scientometrics, in press 
This study is a bibliometric analysis of the highly complex research discipline Geography. In order to identify the most popular and most cited publication channels, to reveal publication strategies, and to analyse the discipline’s coverage within publications, the three main data sources for citation analyses, namely Web of Science, Scopus and Google Scholar, have been utilized. The main research questions of this study are:
§  Which are the most usual and most cited publication channels?
§ Which data sources are most applicable for Geography bibliometrics? Is Google Scholar indeed a useful complementary data source?
§  Are there any bibliometric differences detectable between natural science and human science oriented research fields?

Documents published by 40 faculty members at the Department for Geography and Regional Research at the University of Vienna, specialized in four different subfields: Geoecology (12), Social and Economic Geography (10), Demography and Population Geography (9), and Economic Geography (9).
§ This study is primarily based on publication data collected for four evaluation exercises performed at the Department for Geography and Regional Research at the University of Vienna.  
§  In all exercises, the publication data were delivered directly by the candidates, whose identity has to remain anonymous.  All bibliometric indicators added to the list of publications were controlled or recalculated in order to enable a correct and comparable analysis.
§ The main data source for coverage and citation analyses was “Web of Science - Core Collection”. Furthermore, Google Scholar and Scopus were used as additional primary sources for Demography and Population Geography, and Google Scholar and “Web of Science - Cited Reference Search” for Economic Geography. The analysis in GS was performed using the GS Citations profiles.
§  The candidates were invited to create their individual profiles and make them publicly available for at least a couple of weeks. In addition, the tool ‘Publish or Perish’ was used to check and amend these profiles. In the cases where individual profiles were not available, respective queries were made in order to assemble a complete data set.
Period analyzed:  1990-

1.   The results show significant differences in the publication and citation habits among the four specialties studied. While in natural science, publication in highly ranked international peer-reviewed journals is of the highest importance, publications within the social science domain are often reports, book chapters and monographs.
2. In addition, very heterogeneous and individual publication strategies, even in the same research fields, are observed.
3. Monographs, journal articles and book chapters are the most cited document types.
4. Monographs (Books) are completely covered in GS (100%), whereas coverage is less complete in WoS Cited Reference Search (~75 %) and almost zero in WoS Core Collection including both Book Citation Indices. Similar trends are reported for the other publication types. However, the percentage of book chapters covered in GS is still low (around 55 %) but high in comparison to WoS (14.1%) and WoS Cited Reference Search (32.9%).
5. No considerable differences are observed when using WoS or Scopus. 
6. The inclusion of the Cited Reference Search allows for a significant coverage and is more practicable in WoS than in Scopus.
Coverage & Metrics comparison
Web of Science, Scopus, Google Scholar
7.   The number of papers and citations in Google Scholar is substantially higher than both the Web of Science (Core Collection plus Cited Reference Search) and Scopus for every subdiscipline. Depending on the bibliometric indicator, Google Scholar doubles, triples, quadruples, or quintuples the results offered by Web of Science and Scopus, as can be observed in the graph.

8. Spearman correlations performed for number of citations, citations per cited publication and h-index in the three data sources (WoS, Scopus and Google Scholar) were very strong (varying from 0.8 to 0.95).
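
As a rough illustration of how rank correlations like those in point 8 are obtained, here is a minimal sketch in plain Python. The citation counts are invented for the example and the function names are our own; the paper does not describe the tooling it used.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical citation counts for five researchers (not data from the study).
wos_citations = [120, 45, 300, 10, 80]
gs_citations = [410, 150, 900, 60, 200]
rho = spearman(wos_citations, gs_citations)
```

In this invented sample the Google Scholar counts are much larger in absolute terms, yet both sources order the five researchers identically, so the rank correlation is at its maximum. That is the sense in which the indicators can differ in absolute values while remaining comparable in relative terms.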


We only discuss the conclusions concerning Google Scholar

What is already known on this topic

Google Scholar has a much broader document and citation coverage than Web of Science and Scopus. This study supports the findings of previous studies.

The values of the main citation indicators might differ in absolute values in GS, WoS, WoS Cited Reference Search and Scopus, but are comparable in terms of relative values. It confirms what other studies dealing with similar comparisons found.

In addition, it also confirms something we've stated over the years: Google Scholar is an indispensable source of data if one wants to study any of the disciplines of the Social Sciences and Humanities.

What this study adds

This work analyses the coverage differences of various databases as regards several specialties in the discipline of Geography. The small size and the limitations of the sample (only 9 researchers are studied in each subdiscipline), limited to researchers from a department of one university (Vienna), and a country (Austria), make it difficult to generalize the results to the specialties studied. However, it has value as a case study.

This study is one of the first to take advantage of the Cited Reference Search feature in Web of Science to calculate bibliometric indicators, and compare them with data from other databases, in a similar way as some of our products do: Classic Scholars' Profiles: Bibliometrics & Scientometrics and Scholar Mirrors.

Furthermore, this work shows that both the Web of Science with the Book Citation Index, and Scopus with its "Titles Expansion Program", are still incapable of covering the majority of books and book chapters published locally in each country. Their book collections are for the most part limited to elite publishers in English-speaking countries, which means that their language, geographic, and publisher biases are still there.

Lastly, this study states that "the accuracy of the citations in GS was very high (~95 %)", but unfortunately the authors don't offer more details or explanations about this issue.

10 Apr 2016

Opinions and use of scholarly metrics by a sample of faculty members at the University of Vermont

DeSanto, D., & Nichols, A.
Scholarly Metrics Baseline: A Survey of Faculty Knowledge, 
Use, and Opinion About Scholarly Metrics
College & Research Libraries, 2016, crl16-868.

This article presents the results of a faculty survey conducted at the University of Vermont during academic year 2014-2015. Five guiding questions shaped our survey work:
 How familiar are faculty with scholarly metrics?
 How/why/when do they seek them out?
 Where do faculty turn for help?
 What role do scholarly metrics play in the tenure and promotion process?
 What opinions and thoughts do faculty members have about how well these metrics reflect the impact of a scholar’s work? 

During winter break 2014-2015, an online survey was distributed to all tenure-track faculty on campus with the exception of faculty in the College of Medicine. The survey was distributed to faculty on December 18th, 2014 and was closed on February 6th, 2015. Two reminders were sent out during this time period. Out of 470 faculty solicited for participation, 225 faculty began the survey and 206 completed it, providing a response rate of 44%.
Results are presented as the total of all survey respondents and are, for some questions, broken down by academic rank and/or disciplinary category. In order to present data that are statistically significant, we present data in three major disciplinary categories: sciences; social sciences, business, and social services; and humanities & arts. 
We define the term “scholarly metrics” to include both traditional impact metrics (e.g., h-index, ISI journal impact factor, SCImago journal rank) as well as citation count. We consider article-level metrics or “altmetrics” separately and posed questions specifically about altmetrics.

Even though the results of this study can't be extrapolated to the general population of professors because of the specific nature of the sample (the opinion of 206 faculty members at the University of Vermont), they are interesting and constitute an excellent case study.
The main findings are summarised in the table below:
Regardless of the attention the results of this study may garner, for this blog (focused on sharing empirical evidence on Google Scholar), the most original aspect is how faculty members at the University of Vermont use platforms and bibliometric indicators, which confirms the profound differences among disciplines. These are the main results:

6 Apr 2016

The many faces of authors in academic social networking sites: looking for differences between Google Scholar Citations, Mendeley, and Microsoft Academic Search

Tsou, A., Bowman, T. D., Sugimoto, T., Lariviere, V., & Sugimoto, C.R.  
Self-presentation in scholarly profiles: Characteristics of images and perceptions of 
professionalism and attractiveness on academic social networking sites
First Monday, 2016, 21(4) 

In this paper we undertake an exploratory study to analyze how academics present themselves in profile pictures online and how they are perceived. In particular, we examine perceptions and predictors of professionalism, and how these are influenced by age, gender, and race. We examine differences amongst three different academic social networking platforms: Microsoft Academic Search, Mendeley, and Google Scholar. Furthermore, we analyze how framing individuals as “scholars” alters perceptions of professionalism. Specifically, we address the following research questions:

1. How do scholars present themselves online? Do presentation characteristics vary by demographic characteristic (i.e., gender, age, or race)? Do presentation characteristics vary by platform (i.e., Mendeley, Google Scholar, Microsoft Academic Search)?
2. What is the relationship between presentation characteristics and perceptions of professionalism? Does this vary by demographic characteristic (i.e., gender, age, race)?
3. What is the relationship between presentation characteristics and perceptions of attractiveness? Does this vary by demographic characteristic (i.e., gender, age, race)? How does priming affect the framing of perceptions of professionalism? How does priming affect the framing of perceptions of attractiveness?

This study used Amazon’s Mechanical Turk service to code 10,500 profile pictures used by scholars on three platforms: Mendeley, Microsoft Academic Search, and Google Scholar. 
The data were gathered from three online social networking sites for academics: Microsoft Academic Search, Google Scholar, and Mendeley. Ultimately, 10,000 profile pictures were sampled from each of these sites, for a total initial sampling frame of 30,000 images. 
These images were coded by workers (known as Turkers) on Amazon’s Mechanical Turk (AMT) service. The HITs were titled “Image categorization.” Results were validated by a researcher who manually checked every image that had not been flagged as a photograph of a single adult. The images were coded for descriptive (e.g., age, race, gender), objective (e.g., presence/absence of glasses, color of clothes), and subjective (i.e., attractiveness, professionalism) variables. Each of the 10,500 images was posted as its own HIT on AMT, along with the codebook. In order to both validate the initial coding and investigate priming, another round of coding using the same 10,500 images was conducted using AMT Turkers. Two types of inter-rater reliability were conducted. For one round, a random sample of 100 images coded by AMT Turkers was compared against coding done by one of the researchers. The second round of coding used the priming coding to assess reliability amongst Turkers.

The majority of the individuals on Mendeley, Microsoft Academic Search, and Google Scholar were Caucasian, male, and perceived to be over the age of 35. Females and younger individuals were perceived as less professional than male and older individuals, while women were more likely to be perceived as “attractive.” In addition, the Mechanical Turk coders were susceptible to framing; the individuals in the profile pictures were considered more “professional” if they were identified as “scholars” rather than merely as “individuals.” The results have far-reaching implications for self-presentation and framing, both for scholars and for other professionals. In the academic realm, there are serious implications for hiring and the allocation of resources and rewards.

Regardless of the attention the results of this study may garner, for this blog (focused on sharing empirical evidence on Google Scholar), the most original aspect is the comparison of images available in Google Scholar Citations, Mendeley, and Microsoft Academic Search profiles. These are the main results:
Several differences are found by social media platform. We noted that the proportion of profiles with images varies dramatically, with Mendeley having a much higher proportion than Google Scholar or Microsoft Academic Search, suggesting a difference in user behavior. Our initial analysis also revealed differences by gender, with Mendeley having the largest proportion of women and Google Scholar the lowest. Other significant differences revealed that Mendeley had the youngest users, as well as the ones most likely to be associated with non-professional variables — reinforcing earlier research on the demographics of Mendeley users (Haustein, et al., 2014). Google Scholar academics were perceived as being associated with more professional variables — e.g., wearing blue or black, wearing business and business casual, and employing a traditional head-and-shoulders shot. MAS users, who were the most likely to be older and depicted wearing glasses, were seen as the least attractive. 

These findings have several implications for sociological and scientific studies using social media data. For example, contemporary scientometric studies often use data from Mendeley or other sources to describe the impact of research (Haustein, et al., 2014). However, users of this platform are significantly different demographically from Google Scholar or MAS, in ways that can have implications for the results of such studies.

Stefanie Haustein, Vincent Larivière, Mike Thelwall, Didier Amyot, and Isabella Peters, 2014. “Tweets vs. Mendeley readers: How do these two social media metrics differ?” it-Information Technology, volume 56, number 5, pp. 207–215.
doi: http://dx.doi.org/10.1515/itit-2014-1048

1 Apr 2016

The "strategy as practice" through Web of Science and Google Scholar

Maia, J.L., Di Serio, L.C., Alves Filho, A. G. Bibliometric research on strategy as practice: exploratory results and source comparison
Sistemas & Gestão, 2016, 10(4), 654-69. 
DOI: 10.20985/1980-5160.2015.v10n4.662

The purpose of this article, therefore, is to sketch an overview of the scientific production in this new field of “Strategy as Practice”, assessing issues such as major works, authors, publishing media, themes, institutions, related keywords, and more. Building on a condensed work previously published by Maia and Alves Filho (2013), this article seeks to revisit and explore that research more deeply, bringing new aspects and ways of interpretation, as well as performing a similar bibliometric study using Google Scholar as an alternative source of information.

The data used in the first bibliometric analysis in this article are the documents found in the Web of Science database, which is published by Thomson Reuters. The search for documents was performed using the keywords “strategy as practice” and “strategy-as-practice” and the Boolean operator “OR”, in the title, descriptor and topic fields of the publications, limited to “Articles or congress work or conference abstracts or book chapters”, excluding book reviews. From this search and refinement 72 publications were obtained.

Data for the second bibliometric analysis in this article are the documents retrieved from the Google Scholar database, which were obtained and extracted with the assistance of the Publish or Perish software (Harzing, 2007). The search for documents was based on the same keywords as the previous query, “strategy as practice” and “strategy-as-practice”, and the Boolean operator “OR”. Because Google Scholar limits the maximum number of results (1,000 results), the search had to be divided into several publishing periods, which were subsequently consolidated. From this process, 2,372 results were obtained, including 360 without a date.
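
The workaround described above (splitting one query into several publishing periods and then consolidating the batches) can be sketched as follows. This is only an illustration under our own assumptions: the article does not say how the periods were chosen or how the results were merged, so the year ranges, the record fields (`title`, `year`) and the duplicate-dropping rule are hypothetical. The retrieval step itself is left out.

```python
def year_slices(first_year, last_year, width):
    """Split [first_year, last_year] into consecutive inclusive year ranges."""
    slices = []
    start = first_year
    while start <= last_year:
        end = min(start + width - 1, last_year)
        slices.append((start, end))
        start = end + 1
    return slices


def consolidate(batches):
    """Merge per-period result lists, keeping the first copy of each record.

    Records are dicts with a 'title' and an optional 'year' (assumed fields).
    Two records count as the same when normalized title and year match.
    """
    seen = set()
    merged = []
    for batch in batches:
        for record in batch:
            key = (record["title"].strip().lower(), record.get("year"))
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical example: three small batches, one of them overlapping.
periods = year_slices(2000, 2015, 5)
batches = [
    [{"title": "Strategy as practice", "year": 2003}],
    [{"title": " strategy as practice ", "year": 2003},
     {"title": "Practice turn", "year": 2007}],
    [{"title": "Undated working paper"}],
]
records = consolidate(batches)
```

Each period has to be narrow enough to stay under the 1,000-result ceiling, so in practice the width would be adjusted per period rather than fixed as it is here.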

(1) size limits in Google Scholar make it impossible to calculate classic bibliometric indicators; (2) Google Scholar has generated a database with more than 30 times the results in Web of Science. Although the number of Google Scholar results was expected to be higher than that of Web of Science, this ratio also emphasizes that much of the research on SAP is still “sub-published”, and that it is developed in congress articles, open works available on the internet…
(3) Google Scholar has generated a much more dispersed and diverse base. While the top five authors in Web of Science are responsible for almost 50% of works, in Google Scholar these authors produce only 4% of them. In the case of sources, the numbers are smaller but similarly distinct: the five largest sources publish 36% of the results in Web of Science, while in Google Scholar that number is only 10%.
(4) there are concerns about using Google Scholar as a source of information: 15% of the documents do not present a publication date, while 27% do not present a publication source.
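
The concentration and completeness figures in points (3) and (4) are simple shares over the retrieved records. A minimal sketch of how they could be computed is shown below. It assumes one author label per record and uses invented sample data; the article does not specify how multi-author documents were counted.

```python
from collections import Counter


def top_share(labels, n=5):
    """Share of records accounted for by the n most frequent labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    top_total = sum(count for _, count in counts.most_common(n))
    return top_total / len(labels)


def missing_share(records, field):
    """Share of records whose given field is absent or empty."""
    if not records:
        raise ValueError("records must not be empty")
    missing = sum(1 for record in records if not record.get(field))
    return missing / len(records)


# Hypothetical sample: ten records, one author label each.
authors = ["A", "A", "A", "B", "B", "C", "D", "E", "F", "G"]
share_top_two = top_share(authors, n=2)

sample = [
    {"year": 2010, "source": "Journal X"},
    {"year": None, "source": "Journal Y"},
    {"year": 2012, "source": ""},
    {"year": 2013},
]
share_without_year = missing_share(sample, "year")
share_without_source = missing_share(sample, "source")
```

A low top-five share, as reported for Google Scholar, means the records are spread over many authors or sources, which is what the article describes as a more dispersed base.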