We are very pleased to contribute this updated integration with Lucene for OFBiz! Also, the timing is good, as I am excited to be attending the Lucene/Solr Revolution EU 2013 conference in Dublin, Ireland November 4-7. (If you will be in attendance and would like to get together, please feel free to contact me through the website.)
Many thanks to Scott Gray and the others within HotWax Media who worked with me on this effort. Also thanks to the Lucene Foundation for their cooperation.
I hope you enjoy the following technical overview of the Lucene component of OFBiz. Let me know what you think!
The “lucene” component in OFBiz provides:
- a framework for the efficient management of Lucene indexes and for the definition, preparation and indexing of documents built from information stored in the OFBiz data model
- an implementation of an index and documents for Product searches
This document describes the main features of the Product index/documents and provides details about the framework and how to use it to implement custom indexes and documents.
A dedicated Lucene index is used to index and search product-related information. The index is directory based (FSDirectory), but the index type can easily be changed programmatically. The parent folder for directory-based indexes in OFBiz is runtime/indexes/ and the directory name of the product index is “products”, so the index files maintained by Lucene are in runtime/indexes/products/. This is the “data” folder: you can back it up (or clear it) when needed. The parent folder is set by the property
defaultIndex=runtime/indexes
in the file lucene/config/search.properties.
For example you can set it to a path to a shared folder where the index can be used by other applications or other OFBiz instances.
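To make the folder layout concrete, here is a minimal sketch in plain Java of how an index name could map to a folder under the configured parent. It is not the OFBiz code: the class and method names (IndexPathSketch, resolveIndexDir), the fallback to runtime/indexes when the property is missing, and the shared path /mnt/shared/indexes are assumptions made for illustration; only the property key defaultIndex and its default value come from the text above.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper, not part of OFBiz: maps an index name to its folder.
class IndexPathSketch {

    // Resolves <defaultIndex>/<indexName>; the fallback value is an assumption.
    static Path resolveIndexDir(Properties searchProperties, String indexName) {
        String parent = searchProperties.getProperty("defaultIndex", "runtime/indexes");
        return Paths.get(parent).resolve(indexName);
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Stand-in for the content of lucene/config/search.properties,
        // pointing the parent folder at a (hypothetical) shared location.
        props.load(new StringReader("defaultIndex=/mnt/shared/indexes\n"));
        System.out.println(resolveIndexDir(props, "products"));
    }
}
```

With the shared path configured, the “products” index would live in a “products” folder under that shared parent instead of under runtime/indexes/.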
The Lucene index named “products” contains Lucene documents of a single type: the Product Document described in the next section.
How to create a new index named “customIndex”
DocumentIndexer indexer = DocumentIndexer.getInstance(delegator, "customIndex");
The same call is what you use to obtain an indexer, i.e. the object to which you submit documents for indexing. If the system cannot locate a directory-based index named “customIndex”, it creates one (under runtime/indexes/customIndex/); subsequent calls to DocumentIndexer.getInstance(…) then return an indexer for it.
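The get-or-create behaviour described above can be illustrated with a small stand-alone sketch. This is not the OFBiz DocumentIndexer: the class IndexerRegistrySketch is hypothetical, and a plain folder path stands in for the indexer object. It only shows the idea that the first request for a name creates the folder and later requests reuse the same handle.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the get-or-create behaviour; not the OFBiz DocumentIndexer.
class IndexerRegistrySketch {

    // One handle per index name, created on first request.
    private final Map<String, Path> handles = new ConcurrentHashMap<>();
    private final Path parentDir;

    IndexerRegistrySketch(Path parentDir) {
        this.parentDir = parentDir;
    }

    // Returns the folder backing the named index, creating it if it does not exist yet.
    Path getInstance(String indexName) {
        return handles.computeIfAbsent(indexName, name -> {
            try {
                return Files.createDirectories(parentDir.resolve(name));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path parent = Files.createTempDirectory("indexes");
        IndexerRegistrySketch registry = new IndexerRegistrySketch(parent);
        Path first = registry.getInstance("customIndex");  // creates the folder
        Path second = registry.getInstance("customIndex"); // reuses the same handle
        System.out.println(Files.isDirectory(first) && first == second);
    }
}
```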
This section describes the internal structure of the product document in the “products” index. This information is a useful reference for the implementation of advanced search user interfaces (e.g. faceted searches, weighted searches, etc.).
ProductDocument (the topic of this section) and ContentDocument.
- getDocumentIdentifier(…) should return a Lucene Term that uniquely identifies the document in the index: this is used to locate and recreate a document in the index
- prepareDocument(…) contains the information extraction logic to prepare a Lucene document
Once you have written your own implementation of LuceneDocument, you can submit it for indexing; see this sample code:
DocumentIndexer indexer = DocumentIndexer.getInstance(delegator, "customIndex");
// CustomDocument is a custom implementation of a LuceneDocument
LuceneDocument document = new CustomDocument("ABC"); // ABC is the unique identifier of the document in the index
indexer.queue(document);
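To show what the two methods are for, here is a simplified, self-contained sketch that runs without OFBiz or Lucene. Everything in it is a stand-in: Term and LuceneDocument are reduced local versions (the real prepareDocument builds a Lucene document, here replaced by a map of field values), and CustomDocument, the "customId" field and the in-memory map used as an index are hypothetical. The point it illustrates is that submitting a document whose identifier is already known replaces the earlier copy.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

class CustomDocumentSketch {

    // Simplified stand-in for Lucene's Term: a field name plus a value.
    static final class Term {
        final String field;
        final String text;
        Term(String field, String text) { this.field = field; this.text = text; }
        @Override public boolean equals(Object o) {
            return o instanceof Term && ((Term) o).field.equals(field) && ((Term) o).text.equals(text);
        }
        @Override public int hashCode() { return Objects.hash(field, text); }
    }

    // Simplified stand-in for the two-method contract described above.
    interface LuceneDocument {
        Term getDocumentIdentifier();
        Map<String, String> prepareDocument();
    }

    // Hypothetical custom document identified by a "customId" field.
    static final class CustomDocument implements LuceneDocument {
        private final String id;
        private final String text;
        CustomDocument(String id, String text) { this.id = id; this.text = text; }
        @Override public Term getDocumentIdentifier() { return new Term("customId", id); }
        @Override public Map<String, String> prepareDocument() {
            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("customId", id);
            fields.put("fullText", text);
            return fields;
        }
    }

    // Toy in-memory index: a document with a known identifier replaces the old copy.
    static void submit(Map<Term, Map<String, String>> index, LuceneDocument document) {
        index.put(document.getDocumentIdentifier(), document.prepareDocument());
    }

    public static void main(String[] args) {
        Map<Term, Map<String, String>> index = new HashMap<>();
        submit(index, new CustomDocument("ABC", "first version"));
        submit(index, new CustomDocument("ABC", "second version")); // re-index, same identifier
        System.out.println(index.size()); // prints 1
    }
}
```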
The unique identifier of the document in the index is the productId field.
The productId field is the only field whose content is stored with the document in the index (and thus returned in Lucene search results): in this way the size of the index is kept as small as possible.
Fields whose “Boosted” column in the table below is set to Y can be boosted by setting a weight (i.e. a boost factor) in the file applications/product/config/productsearch.properties; for example, the following line sets a boost factor of 1 for the field Product.description:
index.weight.Product.description=1
If the weight of a field is set to 0 in this file, the field is not added to the document (i.e. it is not indexed).
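The weight rules above can be sketched in plain Java as follows. This is an illustration, not the OFBiz implementation: the class FieldWeightSketch and its method are hypothetical, parsing the weight as a float is an assumption, and the entries for Product.longDescription and Product.productName are sample values added for the example (only the index.weight.Product.description=1 line comes from the text).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical helper: reads index.weight.* entries and applies the "0 means not indexed" rule.
class FieldWeightSketch {

    private static final String PREFIX = "index.weight.";

    // Returns field name -> boost factor for every configured field with a weight above 0.
    static Map<String, Float> boostsOfIndexedFields(Properties props) {
        Map<String, Float> boosts = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(PREFIX)) {
                float weight = Float.parseFloat(props.getProperty(key).trim());
                if (weight > 0f) {
                    boosts.put(key.substring(PREFIX.length()), weight);
                }
            }
        }
        return boosts;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Stand-in for applications/product/config/productsearch.properties
        props.load(new StringReader(
                "index.weight.Product.description=1\n"
              + "index.weight.Product.longDescription=0\n"
              + "index.weight.Product.productName=3\n"));
        // Product.longDescription is dropped because its weight is 0.
        System.out.println(boostsOfIndexedFields(props));
    }
}
```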
Fields of type “id” are added to the index as is (without parsing/tokenization); fields of type “text” are parsed and tokenized.
“fullText” is the main field of the document and should be used for generic searches: the content of several other fields is added to it. The other fields are useful for faceted searches or for weighted searches.
Name | Description | Type | Multi Valued | In fullText | Boosted | Stored |
---|---|---|---|---|---|---|
productId | The unique identifier of the Document in the Lucene Index; it matches the Product.productId, i.e. the unique identifier in the OFBiz transactional database. This field is primarily used to find and recreate the document in the index when the product information is updated and it needs to be reindexed. This is the only field whose content is stored with the document in the index. | id | N | N | N | Y |
fullText | Aggregated content from several other fields (the ones with “In fullText” set to “Y” in this table). This is the primary field to use in free-text searches. | text | N | – | N | N |
productName | | text | N | Y | Y | N |
internalName | | text | N | Y | Y | N |
brandName | | text | N | Y | Y | N |
description | | text | N | Y | Y | N |
longDescription | | text | N | Y | Y | N |
introductionDate | Long value representing a date; the time information has been removed from the original field. | quantized date | N | N | N | N |
salesDiscontinuationDate | Long value representing a date; the time information has been removed from the original field. | quantized date | N | N | N | N |
isVariant | | id | N | N | N | N |
productFeatureId | | id | Y | N | N | N |
productFeatureCategoryId | | id | Y | N | N | N |
productFeatureTypeId | | id | Y | N | N | N |
featureDescription | | text | Y | Y | Y | N |
featureAbbreviation | | text | Y | Y | Y | N |
featureCode | | text | Y | Y | Y | N |
productFeatureGroupId | | id | Y | N | N | N |
attributeName | | text | Y | Y | Y | N |
attributeValue | | text | Y | Y | Y | N |
goodIdentificationTypeId | | id | Y | N | N | N |
goodIdentificationIdValue | | id | Y | N | N | N |
${goodIdentificationTypeId}_GoodIdentification | Dynamic field: different documents representing different products with different identification types may have different field names in the index. | id | Y | N | N | N |
identificationValue | | text | Y | Y | Y | N |
variantProductId | | text | Y | Y | Y | N |
content | | text | Y | Y | Y | N |
${productPriceTypeId}_${productPricePurposeId}_${currencyUomId}_${productStoreGroupId}_price | Dynamic field associated to the double value term of a specific price type/purpose/currency/store group. If a product has several different prices the document will have one field for each. | double | Y | N | N | N |
supplierPartyId | | id | Y | N | N | N |
prodCatalogId | | id | Y | N | N | N |
prodCategoryId | | id | Y | N | N | N |
directProductCategoryId | | id | Y | N | N | N |
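Since the two dynamic fields in the table have names that vary per document, it can help to see how such a name would be assembled when building a query against them. The sketch below assumes the parts are joined with underscores and no spaces; the class DynamicFieldNameSketch and the sample IDs (SKU, LIST_PRICE, PURCHASE, USD, _NA_) are illustrative values, not taken from the text.

```java
// Hypothetical helper: builds the names of the two dynamic fields listed in the table.
class DynamicFieldNameSketch {

    // ${goodIdentificationTypeId}_GoodIdentification
    static String goodIdentificationField(String goodIdentificationTypeId) {
        return goodIdentificationTypeId + "_GoodIdentification";
    }

    // ${productPriceTypeId}_${productPricePurposeId}_${currencyUomId}_${productStoreGroupId}_price
    static String priceField(String productPriceTypeId, String productPricePurposeId,
                             String currencyUomId, String productStoreGroupId) {
        return String.join("_", productPriceTypeId, productPricePurposeId,
                currencyUomId, productStoreGroupId, "price");
    }

    public static void main(String[] args) {
        System.out.println(goodIdentificationField("SKU"));
        // prints SKU_GoodIdentification
        System.out.println(priceField("LIST_PRICE", "PURCHASE", "USD", "_NA_"));
        // prints LIST_PRICE_PURCHASE_USD__NA__price
    }
}
```

A product with several prices would therefore carry several such price fields, one per type/purpose/currency/store group combination.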
A product document is submitted for indexing or re-indexing whenever information about the product is added, updated or removed; this happens in real time through Entity Event-Condition-Action (EECA) rules.
In particular, the following events trigger indexing (or re-indexing) of a product document:
In addition to the automatic synchronization described in the previous section, there are two easy ways to submit a single product for indexing; there is also a script that submits all products for indexing.
How to submit for indexing in index “products” the product “ABC”
String productId = "ABC";
// get an instance of the indexer for the index named "products"
DocumentIndexer indexer = DocumentIndexer.getInstance(delegator, "products");
// submit the product document for indexing
indexer.queue(new ProductDocument(productId));
The second way is equivalent and relies on a call to the “indexProduct” service, passing in the productId:
How to submit for indexing the product “ABC” (service call)
dispatcher.runSync("indexProduct", UtilMisc.toMap("productId", "ABC"));
There is also a Groovy script that submits for indexing all the products in the database. The script location is:
lucene/webapp/content/WEB-INF/actions/IndexProducts.groovy
The easiest way to run it is through the user interface, following steps 1 and 2 in the section “How to Test – Admin User Interface”.