HathiTrust offers full-text search of digitized books, journals
A year after its launch by 25 leading U.S. research libraries, HathiTrust Digital Library announces a service that will transform how researchers use the more than 1.6 billion pages (4.6 million volumes) in its collections.
The new service enables full-text search across the entire library. Researchers can now search both public domain and in-copyright works by keyword or phrase.
Based on open source Solr/Lucene technology, the service expands on an experimental search of public domain volumes introduced in November 2008. Full-text search will continue to be supported across the repository as it grows at a rate of hundreds of thousands of volumes every month.
"The HathiTrust partners are pleased to offer a search service that helps mine this growing body of authoritative library materials," says John Wilkin, HathiTrust executive director and associate university librarian at U-M. "HathiTrust continues to distinguish itself with its reliability and with its efforts to broaden the availability of digitized library collections in the flow of scholarly discourse."
In combination with the HathiTrust Digital Library's carefully curated bibliographic data, the new functionality allows researchers to more efficiently locate items relevant to their research. It also lays the foundation for future services such as full-text search with faceted browsing, advanced search, "more like this" options, and tools that can be used in computational research.
The effort to provide full-text search across the repository has also yielded valuable benchmarking data, methods, and code for the broader large-scale search community, Wilkin says.
The HathiTrust partners are committed to developing the repository and its services to meet the long-term needs of their academic communities, and to offering a unique resource on the Web for scholarship and research.
HathiTrust (www.hathitrust.org) is a collaboration of the 13 universities of the Committee on Institutional Cooperation, the University of California system and the University of Virginia. It currently includes digitized volumes from the University of Michigan, the University of California, Indiana University and the University of Wisconsin.