Hello ‘Lucene’ World – Web Implementation

Lucene is a Java library from Apache that is used extensively for indexing and for building custom search engines. Here are a few of Lucene's features, straight from the Lucene homepage:

Scalable, High-Performance Indexing

  • over 95GB/hour on modern hardware
  • small RAM requirements — only 1MB heap
  • incremental indexing as fast as batch indexing
  • index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms

  • ranked searching — best results returned first
  • many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
  • fielded searching (e.g., title, author, contents)
  • date-range searching
  • sorting by any field
  • multiple-index searching with merged results
  • allows simultaneous update and searching
If you want a quick 5-minute tutorial on Lucene, here you go. Once you have seen the structure of Lucene, it is nice to have a hello-world implementation so that you can see it working in its barebones form. Recently I had this very problem: while there were numerous examples of Lucene implementations, I couldn’t find a minimal ‘hello world’ implementation of web search using Lucene. I am old-school; to understand how a new language or library works, I like to start from a minimal implementation that I can tinker with.
Being unable to find anything like that, I wrote one. So I present to you “Hello World” with a Lucene web interface.
Note:
-You need to have the Lucene library installed and an index built (see the 5-minute tutorial link above)
-For a real application it is neither good nor safe practice to write Java inside JSP; use servlets instead
-This is not intended to be a Lucene tutorial or a guide to Lucene best practice. It is the minimal code required to get a crude sample web search engine running on Lucene.

Code

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
<!--
	Project	:		Lucene Enabled Simple Text Search
	Author	:		Aaditya Prakash
	Date	:		25-Sep-2012 16:21
-->
 
<%@ page import="SearchWebCall" %>
 
<html>
 
<head>
	<title> Lucene Enabled Simple Text Search </title>
 
	<script type="text/javascript">
 
	// Validation to prevent some characters;
	// currently Lucene doesn't parse these characters
			function isSpclChar(){
				var iChars = "!@#^&*()+=-[]\\\';,./{}|~`\":<>?";
				for (var i = 0; i < document.lucene.input.value.length; i++) {
					if (iChars.indexOf(document.lucene.input.value.charAt(i)) != -1) {
						// remove the last character of the input
						// (assumes the disallowed character was the one just typed)
						document.lucene.input.value = document.lucene.input.value.slice(0,-1);
						return false;
					}
				}
			}
 
	</script>
 
</head>
 
<body>
 
<h1>Lucene Enabled Simple Text Search </h1>
 
<form name="lucene" action="index.jsp" method="get">
		<input name="input" size="30" id="input" onkeyup="isSpclChar()"/>
		<br />
		<input type="submit" value="Search" />
</form>
 
<br />
 
<%
	String searchString[] = new String[2];
	String result[] = new String[100];
	searchString[0] = "-query";
	searchString[1] = request.getParameter("input");
 
	if(searchString[1] != null) {
		// send the Search term to obtain the result, 
		//all processing happens in the class 
		//(keeping jsp code to minimum)
		result = SearchWebCall.filter(searchString);
		int noOfMatch = 0;
 
		out.print("Search Result of : <b>" + searchString[1] + "</b><br /><br />");
		for(int i=0; i< result.length; i++){
 
			//format the results
			if (result[i] != null) {
				out.print(i+1 + ". ");
				//results are displayed as hyperlinks to facilitate information retrieval
				out.print("<a href=\""+result[i]+"\">"+result[i]+"</a>");
				out.print("<br />");
				noOfMatch++;
			}
		}
		out.print("<h2>No. of Matches: " + noOfMatch + "</h2>");
	}
//end of JSP
%>
</body>
</html>
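
The listing above filters special characters only in the browser and prints the search term straight back into the page, so a request that bypasses the JavaScript reaches the server unchanged. As a rough sketch of how the same rules could be applied on the server, the helper below strips the characters that isSpclChar() rejects and HTML-escapes text before it is printed. QuerySanitizer, stripSpecialChars and escapeHtml are hypothetical names made up for this illustration; they are not part of Lucene or of the SearchWebCall class, and the escaping covers only the five most common HTML characters.

```java
/** Hypothetical helper for illustration; not part of Lucene or of the original post. */
final class QuerySanitizer {

    // Same characters that the JavaScript isSpclChar() check rejects.
    private static final String SPECIAL_CHARS = "!@#^&*()+=-[]\\';,./{}|~`\":<>?";

    private QuerySanitizer() {}

    /** Returns the input with every character in SPECIAL_CHARS removed; null stays null. */
    public static String stripSpecialChars(String input) {
        if (input == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (SPECIAL_CHARS.indexOf(c) == -1) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Escapes &, <, >, double quote and apostrophe; null becomes an empty string. */
    public static String escapeHtml(String text) {
        if (text == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripSpecialChars("hello* world?")); // prints: hello world
        System.out.println(escapeHtml("<b>\"hi\"</b>"));
    }
}
```

In the JSP, one option would be to pass QuerySanitizer.stripSpecialChars(request.getParameter("input")) to SearchWebCall.filter and to wrap anything handed to out.print in QuerySanitizer.escapeHtml(...), assuming the helper class is compiled alongside SearchWebCall.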
Written on October 20, 2012