Santry Technology Solutions, Content Management, DotNetNuke, SharePoint Consulting
Register | Login
Tuesday, February 09, 2010

Sections
  
About Us
  
Partners
Downloads
  
 WWWCoder.com Resource Directory

Indexing Database Content with dotLucene
5/4/2005 2:02:34 PM

In this article we will discuss how you can use dotLucene to index content located within a database. dotLucene is a open source full-text catalog solution developed in C# which is a direct line by line port of the popular Lucene engine based on Java.

In this article we will discuss how you can use dotLucene to index content located within a database. dotLucene is a open source full-text catalog solution developed in C# which is a direct line by line port of the popular Lucene engine based on Java.

In this particular example, we will discuss how the Weblog indexer was created for BlingBlangBlog.com and WWWCoder.com that goes out and spiders Weblog content, places it into a warehouse, and then code is written to index the content via dotLucene for later searching.

In order to index the Weblog content, a Windows application was created to spider the RSS content and place it into a database table. The only application that actually makes any requests to this database table is the spider and indexer applications. Our Web application will connect to the dotLucene catalog directly for searching.

 

We're not going to cover the actual spider in this article, basically the spider will make requests to the RSS feeds located in the database, query each item, check the database to see if the items are in there, if not then it will insert the individual item in the database table. Once we have our table populated, we'll then create a query that returns a recordset and then populates our index.

 

Indexing Content

 

The first method we'll cover is the AddDocumentsToIndex method. This method can be instantiated from a button click. The method will make a request to our database to obtain the records and their field values that we want to place in our index.

 

Public Sub AddDocumentsToIndex()
        Dim CatalogDir As String = "c:\pathtoindex\"
        Dim DSNASPSearch As DataSet
        Dim SQLQ As String = "SELECT * FROM MyTable" 'could be a join, sproc, whatever.
        Dim sqlConnection As SqlConnection = New SqlConnection(DB_Connection)
        sqlConnection.Open()
        Dim SqlCommand As SqlCommand = New SqlCommand(SQLQ, sqlConnection)
        Dim dr As SqlDataReader = SqlCommand.ExecuteReader()
       
'the CateLogDir is a string value of the physical directory
        Dim Indexer As New Indexer(CatelogDir)
        While dr.Read
            Indexer.AddDocument(dr("pubDate"), _
                dr("Description"), _
                dr("URL"), _
                dr("Title"))
        End While
        dr.Close()
        dr = Nothing
        sqlConnection.Close()
        sqlConnection = Nothing
End
Sub

 

We first set a value for the physical directory of the index location. Again for demonstration we hard coded the values here, you could set up some configuration file for obtaining the values. You can see in the code above that we're making a call to the Indexer.AddDocument method, this method is contained in the Indexer class which focuses on interaction with the dotLucene API.

You can see in the indexer class, it is pretty thin, you basically call methods of dotLucene to create an instance of the catalog, and then populate a document with the values from your record. Another item of note is the method ReturnSortDate, this formats the date in the database to 20050501 so we can sort by date within our search.

 

Imports Lucene.Net.Documents
Imports
Lucene.Net.Index
Imports
Lucene.Net.Analysis.Standard
Imports
System.Text.RegularExpressions
Imports
System
 

Public Class Indexer
    Private
writer As IndexWriter
 

    Public Sub New(ByVal Directory As String)
        InitializeIndex(Directory)
    End Sub

 

    Public Sub InitializeIndex(ByVal Directory As String)
        writer = New IndexWriter(Directory, New StandardAnalyzer, True)
        writer.SetUseCompoundFile(True)
    End Sub

 

    Public Sub AddDocument(ByVal DateAdded As String, _
    ByVal Description As String, ByVal URL As String, _
    ByVal Title As String)
        Try
            Dim doc As New Document
            doc.Add(Field.Keyword("date", DateAdded))
            doc.Add(Field.Text("description", Description))
            doc.Add(Field.Text("url", URL))           
            doc.Add(Field.Text("title", Title))
            doc.Add(Field.Keyword("sortdate", ReturnSortDate(DateAdded)))
            writer.AddDocument(doc)
        Catch ex As Exception
            'failed to add doc

 

        End Try
    End Sub

 

    Private
Function ReturnSortDate(ByVal DateAdded As Date) As Integer
        'Returns 20050511 for sorting on date.
        Dim thisMonth As String = DateAdded.Month.ToString
        'add leading zero
        If thisMonth.Length = 1 Then thisMonth = "0" & thisMonth
        Dim thisYear As String = DateAdded.Year.ToString
        Dim thisDay As String = DateAdded.Day.ToString
        If thisDay.Length = 1 Then thisDay = "0" & thisDay
        Return thisYear & thisMonth & thisDay
    End Function

 
    Private
Function Close()
        writer.Optimize()
        writer.Close()
    End Function

End
Class

 

Searching Content

 

That's it, we have a searchable index ready for our search code in a Web form. dotLucene comes with search examples in the source distribution, so here we'll cover some basic searching of this index. The sample code includes paging routines, and binding of a DataTable to a Web form. In the next block of code we'll do a search on the catalog and return a DataTable with the results. The Search method will accept the search query passed from a textbox.

 

Imports Lucene.Net.Documents
Imports
Lucene.Net.Index
Imports
Lucene.Net.Analysis.Standard
Imports
System.Text.RegularExpressions
Imports
System

Public Class Searcher
    Dim
searcher As IndexSearcher

    Sub New(ByVal Directory As String)
        searcher = New IndexSearcher(Directory)
    End Sub


      Public Function Search(ByVal Query As String, _
     Optional ByVal SortBy As String = "date") As DataTable
       ‘we’ll create our datatable structure to mirror the index.
             Dim Results As New DataTable
        Results.Columns.Add("Title")
        Results.Columns.Add("Description")
        Results.Columns.Add("URL")
        Results.Columns.Add("Published")
 

        Dim MyQuery As Query = QueryParser.Parse(Query, "description", _
          New StandardAnalyzer)
        Dim Sort As Sort = New Sort(SortBy, True)
        Dim Hits As Hits = searcher.Search(MyQuery, Sort)
        mTotalRecs = Hits.Length
        Dim iCount As Integer = 0
        While iCount < mTotalRecs
              Dim doc As Document = Hits.Doc(
iCount)
               Dim row As DataRow = Results.NewRow
               row("url") = doc.Get("url")
               row("Title") = doc.Get("title")
               row("Description") = doc.Get("description")
               row("Published") = doc.Get("date")
               Results.Rows.Add(row)
              iCount = iCount + 1
        Loop
        searcher.Close()
        Return Results
     End Function

End
Class

 

Now that you have a DataTable created with the results from your dotLucene search, you can bind it to a DataGrid for viewing on the Web or some other application. For example:

 

Dim searcher As New Searcher("c:\pathtoindex\")
Dim
ds As New DataTable
Dim
myQuery As String = myKeywords
ds = searcher.Search(myQuery, sort)


Where you pass the keywords or query to the Search method and the sort type. In our application the sort method parameter sent to dotLucene would be "date" for the date field we calculated earlier using the ReturnSortDate method used to format our date to a sortable integer.


Page Options:
format for printing  Format for Printer
email article  Email Page
add to your favorites   Add to Favorites
How would you rate the quality of this content?
Poor - - Excellent
Comments?
Overall Rating:
Comments Left:
Left on 10/27/2009 4:46:26 AM by Anonymous
Comments: Lucene.net looks excellent.

However, a basic question - what are its advantages as compared to SQL Server fulltext index search? (if we are comparing searching only database records)

No ratings available.
Left on 10/21/2008 8:09:45 AM by Anonymous
Comments: is this code working with the latest version of lucene 2.1 ??
Left on 9/26/2008 3:11:04 AM by Anonymous
Comments: can you please put as a whole project to download
No ratings available.
Left on 7/30/2008 2:45:41 AM by Anonymous
Comments: can it be done with c# ? please i need it urgently
thanks
No ratings available.
Left on 4/25/2008 2:27:20 AM by Anonymous
Comments:
Left on 7/19/2006 9:57:37 PM by Anonymous
Comments: Comments from the following blog: Patrick Santry's (aka wwwCoder) Blog, located at: http://blogs.wwwcoder.com/psantry/archive/2006/07/19/40828.aspx
No ratings available.
Left on 7/4/2006 7:40:15 AM by Anonymous
Comments: hi yah, This article is very good. I want to implement using c# instead of vb.net
creation of index is working fine but i am unable to retrieve the data. it throws an exception as NullReferenceException was unhandled. can any body help me how to come out from this exception
No ratings available.
Left on 6/16/2005 4:30:46 AM by Anonymous
Comments: Comments from the following blog: Marco Scheel, located at: http://www.geekdotnet.com/blogs/marco/archive/2005/06/16/82.aspx
No ratings available.
Left on 5/29/2005 10:18:17 AM by Anonymous
Comments: dotlucene, is verry intresting, great article. There are 2 small errors in the example tough. The indexer: make close() public and call close on your indexer after adding a batch of documents. in The Searcher: replace "while ..." by "DO while ..."
Left on 5/4/2005 2:18:58 PM by Anonymous
Comments: Kingsley Tagbo - That was pretty quick!
  

 Latest Articles
  

 Latest News
  

 

Spotlight
Syndication

 


 


Digg This
 


DotNetNuke Platinum Benefactor

  
 

 Terms Of Use | Privacy Statement
 Copyright 2008 - Santry Technology Solutions, Box 172, Girard, PA 16417, (814) 774-0970