20 Apr. 2016

Variation in number of hits for complex searches in Google Scholar


Bramer, W.M. (2016). 
Variation in number of hits for complex searches in Google Scholar. 
Journal of the Medical Library Association, 104(2), 143-145.

·       How reliable are the reported numbers of results?
·       Number of hits reported was the sum of the number of versions reported by the first 1,000 hits
a)  32 searches for the analysis of fluctuation in hit count estimates.
b)  98 searches for actual records retrieved.
Each search (designed for systematic review projects) was performed in Google Scholar at least monthly for more than 2 years. The publication date was left unfiltered, and the “Citations” option was both enabled and disabled for comparison.
For each search, all results (records) were downloaded into a DOC file, and the total number of results displayed (the hit count estimate) was recorded.
For each record, some bibliographic data (such as the number of versions and citations) were retrieved as well.
Period analyzed: 
All searches were carried out between June 2013 and January 2014.
Sample a)
The number of hits in Google Scholar for different queries sometimes changed greatly within short time periods, namely between:
- September 19, 2013 and September 28, 2013
- October 17, 2014 and November 21, 2014.
Sample b)
·       The number of citations in the 1,000 records retrieved for 68 of the 98 searches varied greatly, ranging from 9 to 502, with a median of 127.
·       The number of versions for 33 observations also varied greatly (minimum: 1,808, maximum: 7,639, median: 4,486).
·       The number of hits reported by Google Scholar varied even more (minimum: 631, maximum: 142,000, median: 16,500).
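As a rough illustration of how summary figures of this kind (minimum, maximum, median) could be derived from a list of counts, here is a minimal Python sketch. The values in `example_counts` are hypothetical placeholders, not data from the study, and `summarize` is a name made up for this example.

```python
from statistics import median


def summarize(values):
    """Return (minimum, maximum, median) for a non-empty list of numbers."""
    if not values:
        raise ValueError("values must not be empty")
    return min(values), max(values), median(values)


# Hypothetical hit count estimates for five searches (not the study's data).
example_counts = [120, 950, 3400, 8800, 51000]
low, high, mid = summarize(example_counts)
print(f"minimum: {low:,}, maximum: {high:,}, median: {mid:,}")
```

With a spread this wide, the median says more about a typical search than the mean would, which is presumably why the figures above are reported as minimum, maximum and median.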


The number of hits reported in Google Scholar is an unreliable measure.

The relative changes in the number of hits varied greatly between search strategies and did not seem to be related to an overall increase in coverage.
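To make "relative change" concrete, the sketch below computes the change between consecutive observations of a single search as (current - previous) / previous. This is one plausible reading, not necessarily the calculation used in the paper. The function name `relative_changes` and the values in `monthly_hits` are hypothetical and are not taken from the study.

```python
def relative_changes(counts):
    """Relative change between each pair of consecutive observations."""
    changes = []
    for previous, current in zip(counts, counts[1:]):
        if previous == 0:
            raise ValueError("cannot compute a relative change from a zero count")
        changes.append((current - previous) / previous)
    return changes


# Hypothetical monthly hit count estimates for one search (not the study's data).
monthly_hits = [10000, 12000, 6000, 6300]
for change in relative_changes(monthly_hits):
    print(f"{change:+.1%}")
```

If coverage growth alone drove the numbers, one would expect mostly small positive changes that look similar across searches. Large swings in both directions that differ between search strategies point to something else.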

What this study adds

This brief study sheds light on the accuracy of the hit count estimates that Google Scholar provides for a set of systematic searches over a period of 2 years, thus providing interesting empirical data about this search engine’s functionality.

The two dimensions analysed (fluctuation of hits over time, and the match between hit counts and actual results) are of considerable interest. Nonetheless, probably the most outstanding finding is the lack of relation between the periods of greater variation and the coverage updates. This effect should be studied in depth in the future to verify it and, if confirmed, to explain it.

However, this study lacks an appropriate structure. The method section is incomplete and some issues remain unclear. Some methodological aspects are included in the results section, which hinders understanding of the procedure. Moreover, the nomenclature is not standardized (for example, it is hard to tell hits apart from references), and the writing should be made more readable. Most important, however, is the lack of information about the search queries performed, which prevents the audience from checking or replicating the author’s results. The paper also includes some inaccurate sentences about citations and versions.

Moreover, this topic has been widely studied in the literature. Some of the author’s findings (mismatch between hit counts and actual results, fluctuation, etc.) have previously been pointed out by several researchers, especially Mike Thelwall, Peter Jacsó, and the EC3 Research Group (see the EC3 bibliography on this topic), which undermines the study’s novelty. A discussion and comparison of the new results against previous research findings would be desirable.

Finally, if the main conclusions of the paper are about the dynamism of Google Scholar, this research is unnecessary (the functioning of academic search engines is well known). Conversely, if the aim is to discuss the appropriateness of Google Scholar specifically for systematic reviews, the article should have been oriented differently, making better use of all the interesting raw data gathered.

