10/20/2010

Solr 1.4 Enterprise Search Server - pdf

Book Description
If you are a Java developer building a high-traffic web site, you need a terrific search engine. Sites like Netflix.com and Zappos.com employ Solr, an open source enterprise search server that uses and extends the Lucene search library. This is the first book on the market about Solr, and it shows you how to equip your web site for high-volume traffic with full-text search capabilities and loads of customization options, so your users get a terrific search experience.

This book is a comprehensive reference guide for every feature Solr has to offer. It serves the reader right from initiation to development to deployment. It also comes with complete running examples to demonstrate its use and show how to integrate it with other languages and frameworks.

This book first gives you a quick overview of Solr, and then gradually takes you from basic to advanced features that enhance your search. It starts off by discussing Solr and helping you understand how it fits into your architecture, where databases and document/web crawlers fall short and Solr shines. The main part of the book is a thorough exploration of nearly every feature that Solr offers. To keep this interesting and realistic, we use a large open source set of metadata about artists, releases, and tracks, courtesy of the MusicBrainz.org project. Using this data as a testing ground for Solr, you will learn how to import it in various ways, from CSV to XML to database access. You will then learn how to search this data in a myriad of ways, including using Solr’s rich query syntax, “boosting” match scores based on record data and other means, searching across multiple fields with different boosts, getting facets on the results, auto-completing user queries, spell-correcting searches, highlighting queried text in search results, and so on.
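To make the search features listed above a little more concrete, here is a minimal sketch in Java of how a single Solr request might combine field boosts, faceting, and highlighting through standard request parameters (`defType`, `qf`, `facet`, `facet.field`, `hl`, `hl.fl`). It is not taken from the book: the base URL, the class and method names, and the field names `a_name`, `r_name`, and `a_type` are assumptions chosen for illustration and would have to match your own schema.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class SolrQueryUrlSketch {

    /**
     * Builds a select URL that asks Solr for a dismax query with per-field
     * boosts, one facet field, and highlighting. All field names are
     * hypothetical placeholders, not the book's actual schema.
     */
    public static String buildSelectUrl(String baseUrl, String userQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", userQuery);                 // the user's search text
        params.put("defType", "dismax");            // search across several fields
        params.put("qf", "a_name^2.0 r_name^1.0");  // boost artist name over release name (assumed fields)
        params.put("facet", "true");                // turn faceting on
        params.put("facet.field", "a_type");        // facet on an assumed "artist type" field
        params.put("hl", "true");                   // turn highlighting on
        params.put("hl.fl", "a_name");              // highlight matches in the assumed name field
        params.put("rows", "10");                   // first ten results only

        StringBuilder url = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) {
                url.append('&');
            }
            url.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
            first = false;
        }
        return url.toString();
    }

    /** URL-encodes one parameter name or value as UTF-8. */
    public static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
    }

    public static void main(String[] args) {
        // Assumed local Solr address; adjust to your own deployment.
        System.out.println(buildSelectUrl("http://localhost:8983/solr", "smashing pumpkins"));
    }
}
```

The sketch only builds the request URL. Sending it and parsing the response is left out, since the appropriate client code depends on the language or framework you integrate with.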

After this thorough tour, we’ll demonstrate working examples of integrating a variety of technologies with Solr such as Java, JavaScript, Drupal, Ruby, XSLT, PHP, and Python.

Finally, we’ll cover various deployment considerations, including indexing strategies and performance-oriented configuration, that will enable you to scale Solr to meet the needs of a high-volume site.

What you will learn from this book

  • Blend structured data with real search features
  • Import data from CSV, XML, common document formats, and databases
  • Deploy Solr and use its query syntax, from the basics to range queries
  • Enhance search results with spell-checking, query auto-completion, result highlighting, and more
  • Secure Solr
  • Integrate a host of technologies with Solr from the server side to client-side JavaScript, to frameworks like Drupal
  • Scale Solr using replication, distributed searches, and tuning

Approach

The book takes a step-by-step tutorial approach with fully working examples in Java. It will show you how to implement a Solr-based search engine on your intranet or web site.

Who this book is written for

This book is for Java developers who would like to use Solr for their applications. You only need to have basic programming skills to use Solr. Knowledge of Lucene is certainly a bonus.

About the Author

David Smiley

Born to code, David Smiley is a senior software developer with 10 years of experience in the defense industry using Java and various web technologies. David is a strong believer in the open-source development model and has made small contributions to various projects over the years.

David began using Lucene way back in 2000 and was immediately excited by it and its future potential. Later on he went on to use the Lucene-based “Compass” library to construct a very basic search server similar in spirit to Solr. Since then, David has used Solr for a larger search project and was able to contribute modifications back to the Solr community. Although he prefers open-source solutions, David has also been trained on the commercial Endeca search platform and is currently using that product as well as Solr for a different project.

Eric Pugh

Fascinated by the “craft” of software development, Eric Pugh has been heavily involved in the open source world as a developer, committer, and user for the past 5 years. He is a member of the Apache Software Foundation and lately has been mulling over how we move from the read/write web to the read/write/share web.

In biotech, financial services, and defense IT, he has helped European and American companies develop coherent strategies for embracing open source software. As a speaker he has advocated the advantages of Agile practices in software development.

Eric became involved in Solr when he submitted the patch SOLR-284 for parsing rich document types such as PDF and MS Office formats, which became the single most popular patch as measured by votes. SOLR-284 became part of Solr version 1.4.

Book Details

  • Paperback: 250 pages
  • Publisher: Packt Publishing (September 7, 2009)
  • Language: English
  • ISBN-10: 1847195881
  • ISBN-13: 978-1847195883
  • File Size: 6.6 MiB