Lucene.Net.Queries
A container Filter that allows Boolean composition of Filters.
Filters are allocated into one of three logical constructs:
SHOULD, MUST NOT, MUST.
The results Filter BitSet is constructed as follows:
SHOULD Filters are OR'd together.
The resulting Filter is NOT'd with the NOT Filters.
The resulting Filter is AND'd with the MUST Filters.
Returns a DocIdSet representing the Boolean composition
of the filters that have been added.
Adds a new FilterClause to the Boolean Filter container.
A FilterClause object containing a Filter and an Occur parameter
Gets the list of clauses
Returns an enumerator over the clauses in this filter. It implements the
IEnumerable<FilterClause> interface to make it possible to do:
foreach (FilterClause clause in booleanFilter) {}
Prints a user-readable version of this Filter.
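For example, a minimal sketch of composing term filters (the field names are hypothetical, and the usual Lucene.Net.Queries / Lucene.Net.Search namespaces are assumed):
var booleanFilter = new BooleanFilter
{
    new FilterClause(new TermFilter(new Term("status", "published")), Occur.MUST),
    new FilterClause(new TermFilter(new Term("lang", "en")), Occur.SHOULD),
    new FilterClause(new TermFilter(new Term("flag", "spam")), Occur.MUST_NOT)
};
The collection-initializer form works because BooleanFilter exposes Add(FilterClause) and implements IEnumerable<FilterClause>.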
The BoostingQuery class can be used to effectively demote results that match a given query.
Unlike the "NOT" clause, this still selects documents that contain undesirable terms,
but reduces their overall score:
Query balancedQuery = new BoostingQuery(positiveQuery, negativeQuery, 0.01f);
In this scenario the positiveQuery contains the mandatory, desirable criteria used to
select all matching documents, and the negativeQuery contains the undesirable elements
that are simply used to lessen the scores. Documents that match the negativeQuery have
their score multiplied by the supplied "boost" parameter, so this should be less than 1
to achieve a demoting effect.
This code was originally made available here: http://marc.theaimsgroup.com/?l=lucene-user&m=108058407130459&w=2
and is documented here: http://wiki.apache.org/lucene-java/CommunityContributions
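A slightly fuller sketch of the pattern above (the term values are hypothetical):
Query positiveQuery = new TermQuery(new Term("body", "apache"));      // mandatory, desirable criteria
Query negativeQuery = new TermQuery(new Term("body", "deprecated"));  // undesirable terms to demote
Query balancedQuery = new BoostingQuery(positiveQuery, negativeQuery, 0.01f);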
Allows multiple Filters to be chained.
Logical operations such as NOT and XOR
are applied between filters. One operation can be used
for all filters, or a specific operation can be declared
for each filter.
The order in which filters are called depends on
the position of the filter in the chain. It's probably
more efficient to place the most restrictive (and least
computationally intensive) filters first.
Logical operation when none is declared. Defaults to OR.
The filter chain
Ctor.
The chain of filters
Ctor.
The chain of filters
Logical operations to apply between filters
Ctor.
The chain of filters
Logical operation to apply to ALL filters
Delegates to each filter in the chain.
AtomicReaderContext
Logical operation
DocIdSet
Delegates to each filter in the chain.
AtomicReaderContext
Logical operation
DocIdSet
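For example, a sketch that ORs the first filter into the result and then AND-NOTs the second against it (the filters and field names are hypothetical):
Filter[] chain =
{
    new TermFilter(new Term("type", "article")),
    new TermFilter(new Term("status", "retracted"))
};
int[] logic = { ChainedFilter.OR, ChainedFilter.ANDNOT };
Filter chained = new ChainedFilter(chain, logic);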
A query that executes high-frequency terms in an optional sub-query to prevent
slow queries due to "common" terms like stopwords. This query
builds 2 queries off the added terms: low-frequency
terms are added to a required boolean clause and high-frequency terms are
added to an optional boolean clause. The optional clause is only executed if
the required "low-frequency" clause matches. Scores produced by this query
will be slightly different from those of a plain boolean query, mainly due to
differences in the number of leaf queries
in the required boolean clause. In most cases, high-frequency terms are
unlikely to significantly contribute to the document score unless at least
one of the low-frequency terms is matched. This query can improve
query execution times significantly if applicable.
CommonTermsQuery has several advantages over stopword filtering at
index or query time since a term can be "classified" based on the actual
document frequency in the index and can prevent slow queries even across
domains without specialized stopword files.
Note: if the query only contains high-frequency terms the query is
rewritten into a plain conjunction query, i.e. all high-frequency terms need to
match in order to match a document.
Collection initializer note: To create and populate a CommonTermsQuery
in a single statement, you can use the following example as a guide
(the constructor arguments shown are illustrative):
var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.1f) {
    new Term("field", "microsoft"),
    new Term("field", "office")
};
Creates a new CommonTermsQuery.
The Occur used for high frequency terms.
The Occur used for low frequency terms.
A value in [0..1) (or an absolute number >= 1) representing the
maximum threshold of a term's document frequency to be considered a
low frequency term.
Throws an exception if Occur.MUST_NOT is passed as the high or low frequency Occur.
Creates a new CommonTermsQuery.
The Occur used for high frequency terms.
The Occur used for low frequency terms.
A value in [0..1) (or an absolute number >= 1) representing the
maximum threshold of a term's document frequency to be considered a
low frequency term.
Disables coord in scoring for the low
/ high frequency sub-queries.
Throws an exception if Occur.MUST_NOT is passed as the high or low frequency Occur.
Adds a term to the CommonTermsQuery.
The term to add.
Returns true iff coord is disabled in scoring
for the high and low frequency query instances. The top level query will
always disable coords.
Gets or Sets a minimum number of the low frequency optional BooleanClauses which must be
satisfied in order to produce a match on the low frequency terms query
part. This property accepts a float value in the range [0..1) as a fraction
of the actual query terms in the low frequency clause, or a number
>= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are
no required clauses). If this property is set, then the specified number of
clauses is required.
Gets or Sets a minimum number of the high frequency optional BooleanClauses which must be
satisfied in order to produce a match on the high frequency terms query
part. This property accepts a float value in the range [0..1) as a fraction
of the actual query terms in the high frequency clause, or a number
>= 1 as an absolute number of clauses that need to match.
By default no optional clauses are necessary for a match (unless there are
no required clauses). If this property is set, then the specified number of
clauses is required.
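Putting these pieces together, a sketch (the field, terms, and frequency cutoff are hypothetical):
var query = new CommonTermsQuery(Occur.SHOULD, Occur.SHOULD, 0.001f)
{
    new Term("body", "the"),
    new Term("body", "quick"),
    new Term("body", "fox")
};
// Require at least half of the low-frequency terms to match.
query.LowFreqMinimumNumberShouldMatch = 0.5f;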
Builds a new TermQuery instance.
This is intended for subclasses that wish to customize the generated queries.
The term.
The TermContext to be used to create the low level term query. Can be null.
A new TermQuery instance.
Returns an enumerator that iterates through the collection.
An enumerator that can be used to iterate through the collection.
Returns an enumerator that iterates through the collection.
An enumerator that can be used to iterate through the collection.
An instance of this subclass should be returned by
CustomScoreQuery.GetCustomScoreProvider(), if you want
to modify the custom score calculation of a CustomScoreQuery.
Since Lucene 2.9, queries operate on each segment of an index separately,
so the protected context field can be used to resolve doc IDs,
as the supplied doc ID is per-segment and without knowledge
of the IndexReader you cannot access the document or FieldCache.
@lucene.experimental
@since 2.9.2
Creates a new instance of the provider class for the given reader context.
Compute a custom score by the subQuery score and a number of
FunctionQuery scores.
Subclasses can override this method to modify the custom score.
If your custom scoring is different than the default herein you
should override at least one of the two CustomScore() methods.
If the number of FunctionQuerys is always < 2 it is
sufficient to override the other
CustomScore(int, float, float) method, which is simpler.
The default computation herein is a multiplication of the given scores:
ModifiedScore = subQueryScore * valSrcScores[0] * valSrcScores[1] * ...
The id of the scored doc.
The score of that doc by the subQuery.
The scores of that doc by the FunctionQuerys.
The custom score.
Compute a custom score by the subQuery score and the FunctionQuery score.
Subclasses can override this method to modify the custom score.
If your custom scoring is different than the default herein you
should override at least one of the two CustomScore() methods.
If the number of FunctionQuerys is always < 2 it is
sufficient to override this method, which is simpler.
The default computation herein is a multiplication of the two scores:
ModifiedScore = subQueryScore * valSrcScore
The id of the scored doc.
The score of that doc by the subQuery.
The score of that doc by the FunctionQuery.
The custom score.
Explain the custom score.
Whenever overriding CustomScore(int, float, float[]),
this method should also be overridden to provide the correct explanation
for the part of the custom scoring.
The doc being explained.
The explanation for the sub-query part.
The explanations for the value source parts.
An explanation for the custom score.
Explain the custom score.
Whenever overriding CustomScore(int, float, float),
this method should also be overridden to provide the correct explanation
for the part of the custom scoring.
The doc being explained.
The explanation for the sub-query part.
The explanation for the value source part.
An explanation for the custom score.
Query that sets document score as a programmatic function of several (sub) scores:
- the score of its subQuery (any query)
- (optional) the score of its FunctionQuery (or queries).
Subclasses can modify the computation by overriding GetCustomScoreProvider().
@lucene.experimental
Create a CustomScoreQuery over input subQuery.
The sub query whose score is being customized. Must not be null.
Create a CustomScoreQuery over input subQuery and a FunctionQuery.
The sub query whose score is being customized. Must not be null.
A value source query whose scores are used in the custom score
computation. This parameter is optional - it can be null.
Create a CustomScoreQuery over input subQuery and multiple FunctionQuerys.
The sub query whose score is being customized. Must not be null.
Value source queries whose scores are used in the custom score
computation. This parameter is optional - it can be null or even an empty array.
Returns true if o is equal to this.
Returns a hash code value for this object.
Returns a CustomScoreProvider that calculates the custom scores
for the given IndexReader. The default implementation returns a default
implementation as specified in the docs of CustomScoreProvider.
@since 2.9.2
A scorer that applies a (callback) function on scores of the subQuery.
Checks if this is strict custom scoring.
In strict custom scoring, the ValueSource part does not participate in weight normalization.
This may be useful when one wants full control over how scores are modified, and does
not care about normalizing by the ValueSource part.
One particular case where this is useful is for testing this query.
Note: only has effect when the ValueSource part is not null.
The sub-query that CustomScoreQuery wraps, affecting both the score and which documents match.
The scoring queries that only affect the score of CustomScoreQuery.
A short name of this query, used in ToString().
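A sketch of the override pattern described above; the square-root dampening and the class names here are illustrative, not part of the library:
public class SqrtScoreProvider : CustomScoreProvider
{
    public SqrtScoreProvider(AtomicReaderContext context) : base(context) { }

    // Dampen the value-source contribution instead of multiplying it in directly.
    public override float CustomScore(int doc, float subQueryScore, float valSrcScore)
        => subQueryScore * (float)System.Math.Sqrt(valSrcScore);
}

public class SqrtCustomScoreQuery : CustomScoreQuery
{
    public SqrtCustomScoreQuery(Query subQuery, FunctionQuery scoringQuery)
        : base(subQuery, scoringQuery) { }

    protected override CustomScoreProvider GetCustomScoreProvider(AtomicReaderContext context)
        => new SqrtScoreProvider(context);
}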
A Filter wrapped with an indication of how that filter
is used when composed with another filter.
(Follows the boolean logic in BooleanClause for composition
of queries.)
Create a new FilterClause.
A Filter object containing a BitSet.
An Occur parameter indicating SHOULD, MUST or MUST NOT.
Returns this FilterClause's filter.
A Filter object.
Returns this FilterClause's occur parameter.
An Occur object.
Query that is boosted by a ValueSource.
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Serves as base class for FunctionValues based on DocTermsIndex.
@lucene.internal
Custom exception to be thrown when the DocTermsIndex for a field cannot be generated.
Initializes a new instance of this class with serialized data.
The SerializationInfo that holds the serialized object data about the exception being thrown.
The StreamingContext that contains contextual information about the source or destination.
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
NOTE: This was FloatDocValues in Lucene
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
NOTE: This was IntDocValues in Lucene
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
NOTE: This was LongDocValues in Lucene
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was externalToLong() in Lucene
Abstract FunctionValues implementation which supports retrieving values.
Implementations can control how the values are loaded.
Returns a score for each document based on a ValueSource,
often some function of the value of a field.
Note: This API is experimental and may change in non backward-compatible ways in the future.
Defines the function to be used for scoring.
The associated ValueSource.
Prints a user-readable version of this query.
Returns true if o is equal to this.
Returns a hash code value for this object.
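For instance, a sketch that scores every document by a numeric field (the "popularity" field and searcher are hypothetical; Int64FieldSource was LongFieldSource in Lucene):
var popularity = new Int64FieldSource("popularity");
Query q = new FunctionQuery(popularity);
TopDocs top = searcher.Search(q, 10);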
Represents field values as different types.
Normally created via a ValueSource for a particular field and reader.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
returns the bytes representation of the string value - TODO: should this return the indexed raw bytes instead?
Native representation of the value
Returns true if there is a value for this document.
The doc to retrieve the sort ordinal for.
The sort ordinal for the specified doc.
TODO: Maybe we can just use IntVal for this...
The number of unique sort ordinals this instance has.
Abstraction of the logic required to fill the value of a specified doc into
a reusable MutableValue. Implementations of FunctionValues
are encouraged to define their own implementations of ValueFiller if their
value is not a float.
@lucene.experimental
The MutableValue will be reused across calls.
The MutableValue will be reused across calls. Returns true if the value exists.
This class may be used to create FunctionValues instances anonymously.
@lucene.experimental
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Instantiates FunctionValues for a particular reader.
Often used when creating a FunctionQuery.
Gets the values for this reader and the context that was previously
passed to CreateWeight().
A description of the field, used in Explain().
Implementations should propagate CreateWeight to sub-ValueSources which can optionally store
weight info in the context. The context object will be passed to GetValues()
where this info can be retrieved.
Returns a new non-threadsafe context map.
EXPERIMENTAL: This method is subject to change.
Gets the SortField for this ValueSource. Uses the
FunctionValues to populate the SortField.
True if this is a reverse sort.
The SortField for the ValueSource.
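A sketch of function-based sorting (the "price" field and searcher are hypothetical; Int32FieldSource was IntFieldSource in Lucene):
var price = new Int32FieldSource("price");
Sort sort = new Sort(price.GetSortField(false));
TopDocs hits = searcher.Search(new MatchAllDocsQuery(), 10, sort);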
Implement a comparer that works
off of the FunctionValues for a ValueSource
instead of the normal Lucene FieldComparer that works off of a FieldCache.
A Scorer which returns the result of FloatVal() as
the score for a document.
When overriding this class, be aware that the ValueSourceScorer constructor is calling
its private SetCheckDeletesInternal method as opposed to the virtual SetCheckDeletes method.
This is done to avoid a virtual call in the constructor. You can call your own private
method for CheckDeletes initialization in your constructor if you need to.
This class may be used to create ValueSourceScorer instances anonymously.
Abstract parent class for those ValueSource implementations which
apply boolean logic to their values.
Obtains byte field values from the FieldCache
using GetBytes()
and makes those values available as other numeric types, casting as needed.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
An implementation for retrieving FunctionValues instances for string-based fields.
ConstNumberSource is the base class for all constant numbers.
NOTE: This was getInt() in Lucene
NOTE: This was getLong() in Lucene
NOTE: This was getFloat() in Lucene
ConstValueSource returns a constant for all documents.
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
NOTE: This was getInt() in Lucene
NOTE: This was getLong() in Lucene
NOTE: This was getFloat() in Lucene
FunctionValues implementation which only returns the values from the provided
ValueSources which are available for a particular docId. Consequently, when combined
with a ConstValueSource, this function serves as a way to return a default
value when the values for a field are unavailable.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Function to divide "a" by "b"
NOTE: This was DivFloatFunction in Lucene
the numerator.
the denominator.
NOTE: This was ConstIntDocValues in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
returns the number of documents containing the term.
@lucene.internal
Function that returns a constant double value for every document.
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
NOTE: This was getInt() in Lucene
NOTE: This was getLong() in Lucene
NOTE: This was getFloat() in Lucene
Obtains double field values from the FieldCache and makes
those values available as other numeric types, casting as needed.
Abstract ValueSource implementation which wraps two ValueSources
and applies an extendible float function to their values.
NOTE: This was DualFloatFunction in Lucene
The first source.
The second source.
NOTE: This was floatVal() in Lucene
Obtains enum field values from the FieldCache and makes
those values available as other numeric types, casting as needed.
The StrVal of the value is not the numeric value, but its (displayed) string value.
NOTE: This was intValueToStringValue() in Lucene
NOTE: This was stringValueToIntValue() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
A base class for ValueSource implementations that retrieve values for
a single field from the FieldCache.
Obtains float field values from the FieldCache and makes those
values available as other numeric types, casting as needed.
NOTE: This was FloatFieldSource in Lucene
NOTE: This was floatVal() in Lucene
Function that returns TFIDFSimilarity.Idf(long, long)
for every document.
Note that the configured Similarity for the field must be
a subclass of TFIDFSimilarity.
@lucene.internal
Depending on the value of the ifSource function,
returns the value of the trueSource or falseSource function.
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Obtains int field values from the FieldCache and makes those
values available as other numeric types, casting as needed.
NOTE: This was IntFieldSource in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Use a field value and find the Document Frequency within another field.
@since solr 4.0
NOTE: This was intVal() in Lucene
LinearSingleFunction implements a linear function over
another ValueSource.
Normally used as an argument to a FunctionQuery.
NOTE: This was LinearFloatFunction in Lucene
NOTE: This was floatVal() in Lucene
Passes a field value through as a string, no matter the type.
Returns the literal string value.
Obtains long field values from the FieldCache and makes those
values available as other numeric types, casting as needed.
NOTE: This was LongFieldSource in Lucene
NOTE: This was externalToLong() in Lucene
NOTE: This was longToObject() in Lucene
NOTE: This was longToString() in Lucene
NOTE: This was longVal() in Lucene
NOTE: This was externalToLong() in Lucene
NOTE: This was longToString() in Lucene
Returns the value of IndexReader.MaxDoc
for every document. This is the number of documents
including deletions.
Returns the max of its components.
NOTE: This was MaxFloatFunction in Lucene
Returns the min of its components.
NOTE: This was MinFloatFunction in Lucene
Abstract BoolFunction implementation which wraps multiple ValueSources
and applies an extendible boolean function to their values.
Abstract ValueSource implementation which wraps multiple ValueSources
and applies an extendible float function to their values.
NOTE: This was MultiFloatFunction in Lucene
NOTE: This was floatVal() in Lucene
Abstract parent class for ValueSource implementations that wrap multiple
ValueSources and apply their own logic.
A ValueSource that abstractly represents ValueSources for
poly fields, and other things.
Function that returns TFIDFSimilarity.DecodeNormValue(long)
for every document.
Note that the configured Similarity for the field must be
a subclass of TFIDFSimilarity.
@lucene.internal
NOTE: This was floatVal() in Lucene
Returns the value of IndexReader.NumDocs
for every document. This is the number of documents
excluding deletions.
Obtains the ordinal of the field value from the default Lucene FieldCache using GetTermsIndex().
The native Lucene index order is used to assign an ordinal value for each field value.
Field values (terms) are lexicographically ordered by Unicode value, and numbered starting at 1.
Example:
If there were only three field values: "apple","banana","pear"
then ord("apple")=1, ord("banana")=2, ord("pear")=3
WARNING: Ord depends on the position in an index and can thus change when other documents are inserted or deleted,
or if a MultiSearcher is used.
WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must use a FieldCache entry
at the top level reader, while sorting and function queries now use entries at the segment level. Hence sorting
or using a different function query, in addition to ord()/rord() will double memory use.
NOTE: This was intVal() in Lucene
Function to raise the base "a" to the power "b"
NOTE: This was PowFloatFunction in Lucene
the base.
the exponent.
Returns the product of its components.
NOTE: This was ProductFloatFunction in Lucene
returns the relevance score of the query
NOTE: This was floatVal() in Lucene
RangeMapSingleFunction implements a map function over
another ValueSource whose values fall within min and max inclusive to target.
Normally used as an argument to a FunctionQuery.
NOTE: This was RangeMapFloatFunction in Lucene
NOTE: This was floatVal() in Lucene
ReciprocalSingleFunction implements a reciprocal function f(x) = a/(mx+b), based on
the float value of a field or function as exported by a ValueSource.
When a and b are equal, and x>=0, this function has a maximum value of 1 that drops as x increases.
Increasing the value of a and b together results in a movement of the entire function to a flatter part of the curve.
These properties make this an ideal function for boosting more recent documents.
Example: recip(ms(NOW,mydatefield),3.16e-11,1,1)
A multiplier of 3.16e-11 changes the units from milliseconds to years (since there are about 3.16e10 milliseconds
per year). Thus, a very recent date will yield a value close to 1/(0+1) or 1,
a date a year in the past will get a multiplier of about 1/(1+1) or 1/2,
and a date two years old will yield 1/(2+1) or 1/3.
NOTE: This was ReciprocalFloatFunction in Lucene
f(source) = a/(m*float(source)+b)
NOTE: This was floatVal() in Lucene
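In code, the same recency-boost idiom might look like the following sketch, assuming a hypothetical numeric field that stores a document's age in milliseconds:
var ageMs = new Int64FieldSource("age_ms");
var recency = new ReciprocalSingleFunction(ageMs, 3.16e-11f, 1f, 1f); // ~1 for brand-new docs
Query boost = new FunctionQuery(recency);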
Obtains the ordinal of the field value from the default Lucene FieldCache using GetTermsIndex()
and reverses the order.
The native Lucene index order is used to assign an ordinal value for each field value.
Field values (terms) are lexicographically ordered by Unicode value, and numbered starting at 1.
Example of reverse ordinal (rord):
If there were only three field values: "apple","banana","pear"
then rord("apple")=3, rord("banana")=2, rord("pear")=1
WARNING: Ord depends on the position in an index and can thus change when other documents are inserted or deleted,
or if a MultiSearcher is used.
WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must use a FieldCache entry
at the top level reader, while sorting and function queries now use entries at the segment level. Hence sorting
or using a different function query, in addition to ord()/rord() will double memory use.
NOTE: This was intVal() in Lucene
Scales values to be between min and max.
This implementation currently traverses all of the source values to obtain
their min and max.
This implementation currently cannot distinguish when documents have been
deleted or documents that have no value, and 0.0 values will be used for
these cases. This means that if values are normally all greater than 0.0, one can
still end up with 0.0 as the min value to map from. In these cases, an
appropriate map() function could be used as a workaround to change 0.0
to a value in the real range.
NOTE: This was ScaleFloatFunction in Lucene
NOTE: This was floatVal() in Lucene
Obtains short field values from the FieldCache
and makes those values available as other numeric types, casting as needed.
NOTE: This was ShortFieldSource in Lucene
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
BoolFunction implementation which applies an extendible
bool function to the values of a single wrapped ValueSource.
Functions this can be used for include whether a field has a value or not,
or inverting the bool value of the wrapped ValueSource.
A simple function with a single argument
NOTE: This was SimpleFloatFunction in Lucene
NOTE: This was floatVal() in Lucene
A function with a single (one) argument.
NOTE: This was SingleFunction in Lucene, changed to avoid confusion with operations on the System.Single datatype.
returns the sum of its components.
NOTE: This was SumFloatFunction in Lucene
returns the number of tokens.
(sum of term freqs across all documents, across all terms).
Returns -1 if frequencies were omitted for the field, or if
the codec doesn't support this statistic.
@lucene.internal
NOTE: This was longVal() in Lucene
Function that returns the term frequency for the
supplied term in every document.
If the term does not exist in the document, returns 0.
If frequencies are omitted, returns 1.
NOTE: This was intVal() in Lucene
Function that returns TFIDFSimilarity.Tf(float)
for every document.
Note that the configured Similarity for the field must be
a subclass of TFIDFSimilarity.
@lucene.internal
NOTE: This was floatVal() in Lucene
returns the total term freq
(sum of term freqs across all documents).
Returns -1 if frequencies were omitted for the field, or if
the codec doesn't support this statistic.
@lucene.internal
NOTE: This was longVal() in Lucene
Converts individual ValueSource instances to leverage the FunctionValues *Val functions that work with multiple values,
i.e. the overloads that fill an array of values per document.
NOTE: This was shortVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was shortVal() in Lucene
NOTE: This was floatVal() in Lucene
NOTE: This was intVal() in Lucene
NOTE: This was longVal() in Lucene
Generate "more like this" similarity queries.
Based on this mail:
Lucene does let you access the document frequency of terms, with .
Term frequencies can be computed by re-tokenizing the text, which, for a single document,
is usually fast enough. But looking up the of every term in the document is
probably too slow.
You can use some heuristics to prune the set of terms, to avoid calling too much,
or at all. Since you're trying to maximize a tf*idf score, you're probably most interested
in terms with a high tf. Choosing a tf threshold even as low as two or three will radically
reduce the number of terms under consideration. Another heuristic is that terms with a
high idf (i.e., a low df) tend to be longer. So you could threshold the terms by the
number of characters, not selecting anything less than, e.g., six or seven characters.
With these sorts of heuristics you can usually find small set of, e.g., ten or fewer terms
that do a pretty good job of characterizing a document.
It all depends on what you're trying to do. If you're trying to eek out that last percent
of precision and recall regardless of computational difficulty so that you can win a TREC
competition, then the techniques I mention above are useless. But if you're trying to
provide a "more like this" button on a search results page that does a decent job and has
good performance, such techniques might be useful.
An efficient, effective "more-like-this" query generator would be a great contribution, if
anyone's interested. I'd imagine that it would take a Reader or a String (the document's
text), analyzer Analyzer, and return a set of representative terms using heuristics like those
above. The frequency and length thresholds could be parameters, etc.
Doug
Initial Usage
This class has lots of options to try to make it efficient and flexible.
The simplest possible usage is as follows (the searcher setup and field name are
placeholders to fill in for your own index):
IndexReader ir = ...
IndexSearcher searcher = ...
MoreLikeThis mlt = new MoreLikeThis(ir);
TextReader target = ... // orig source of doc you want to find similarities to
Query query = mlt.Like(target, "body"); // "body" is the field to analyze
TopDocs hits = searcher.Search(query, 10);
// now the usual iteration thru 'hits' - the only thing to watch for is to make sure
// you ignore the doc if it matches your 'target' document, as it should be similar to itself
Thus you:
- do your normal, Lucene setup for searching,
- create a MoreLikeThis,
- get the text of the doc you want to find similarities to,
- then call one of the Like() calls to generate a similarity query,
- call the searcher to find the similar docs.
More Advanced Usage
You may want to use the setter for FieldNames so you can examine
multiple fields (e.g. body and title) for similarity.
Depending on the size of your index and the size and makeup of your documents you
may want to call the other set methods to control how the similarity queries are
generated.
Changes: Mark Harwood 29/02/04
Some bugfixing, some refactoring, some optimisation.
- bugfix: retrieveTerms(int docNum) was not working for indexes without a termvector - added missing code
- bugfix: No significant terms being created for fields with a termvector - because
was only counting one occurrence per term/field pair in calculations (i.e. not including frequency info from TermVector)
- refactor: moved common code into isNoiseWord()
- optimise: when no termvector support available - used maxNumTermsParsed to limit amount of tokenization
Default maximum number of tokens to parse in each example doc field that is not stored with TermVector support.
Ignore terms with less than this frequency in the source doc.
Ignore words which do not occur in at least this many docs.
Ignore words which occur in more than this many docs.
Boost terms in query based on score.
Default field names. Null is used to specify that the field names should be looked
up at runtime from the provided reader.
Ignore words less than this length or if 0 then this has no effect.
Ignore words greater than this length or if 0 then this has no effect.
Default set of stopwords.
If null means to allow stop words.
Return a Query with no more than this many terms.
to use
Boost factor to use when boosting the terms
Gets or Sets the boost factor used when boosting terms
Constructor requiring an IndexReader.
For idf() calculations.
Gets or Sets the analyzer that will be used to parse source docs with. The default analyzer
is not set. An analyzer is not required for generating a query with the
Like(int) method; all other 'like' methods require an analyzer.
Gets or Sets the frequency below which terms will be ignored in the source doc. The default
frequency is DEFAULT_MIN_TERM_FREQ.
Gets or Sets the frequency at which words will be ignored which do not occur in at least this
many docs. The default frequency is DEFAULT_MIN_DOC_FREQ.
Gets or Sets the maximum frequency in which words may still appear.
Words that appear in more than this many docs will be ignored. The default frequency is
DEFAULT_MAX_DOC_FREQ.
Set the maximum percentage in which words may still appear. Words that appear
in more than this many percent of all docs will be ignored.
the maximum percentage of documents (0-100) that a term may appear
in to be still considered relevant
Gets or Sets whether to boost terms in the query based on "score" or not. The default is
DEFAULT_BOOST.
Gets or Sets the field names that will be used when generating the 'More Like This' query.
The default field names that will be used are DEFAULT_FIELD_NAMES.
Set this to null for the field names to be determined at runtime from the
IndexReader provided in the constructor.
Gets or Sets the minimum word length below which words will be ignored. Set this to 0 for no
minimum word length. The default is DEFAULT_MIN_WORD_LENGTH.
Gets or Sets the maximum word length above which words will be ignored. Set this to 0 for no
maximum word length. The default is DEFAULT_MAX_WORD_LENGTH.
Gets or Sets the set of stopwords.
Any word in this set is considered "uninteresting" and ignored.
Even if your Analyzer allows stopwords, you might want to tell the code to ignore them, as
for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting".
Gets or Sets the maximum number of query terms that will be included in any generated query.
The default is DEFAULT_MAX_QUERY_TERMS.
Gets or Sets the maximum number of tokens to parse in each example doc field that is not stored with TermVector support
Return a query that will return docs like the passed Lucene document ID.
The document ID of the Lucene doc to generate the 'More Like This' query for.
A query that will return docs like the passed Lucene document ID.
Return a query that will return docs like the passed TextReader.
A query that will return docs like the passed TextReader.
Create the 'More like' query from a PriorityQueue.
Create a BooleanQuery from a word->tf map.
A map of words keyed on the word (String) with Int32 objects as the values.
Describe the parameters that control how the "more like this" query is formed.
Find words for a more-like-this query former.
the id of the lucene document from which to find terms
Adds terms and frequencies found in the vector into the words map.
A map of terms and their frequencies.
A list of terms and their frequencies for a doc/field.
Adds term frequencies found by tokenizing text from the reader into the words map.
A source of text to be tokenized.
A map of terms and their frequencies.
Used by analyzer for any special per-field analysis
determines if the passed term is likely to be of interest in "more like" comparisons
The word being considered
true if should be ignored, false if should be used in further analysis
Find words for a more-like-this query former.
The result is a priority queue of ScoreTerm objects with one entry for every word in the document.
Each object has 6 properties.
The properties are:
- The word (String)
- The top field that this word comes from (String)
- The score for this word (float)
- The IDF value (float)
- The frequency of this word in the index (int)
- The frequency of this word in the source document (int)
This is a somewhat "advanced" routine, and in general only the word is of interest.
This method is exposed so that you can identify the "interesting words" in a document.
For an easier method to call, see RetrieveInterestingTerms().
The reader that has the content of the document.
The field passed to the analyzer to use when analyzing the content.
The most interesting words in the document ordered by score, with the highest scoring, or best entry, first.
Convenience routine to make it easy to return the most interesting words in a document.
More advanced users will call RetrieveTerms() directly.
The source document.
The field passed to the analyzer to use when analyzing the content.
The most interesting words in the document.
A PriorityQueue that orders words by score.
Use for frequencies and to avoid renewing Int32 objects.
NOTE: This was Int in Lucene
An "interesting word" and related top field, score and frequency information.
Gets the word.
Gets the top field that this word comes from.
Gets the score for this word ().
Gets the inverse document frequency (IDF) value ().
Gets the frequency of this word in the index ().
Gets the frequency of this word in the source document ().
A simple wrapper for MoreLikeThis for use in scenarios where a Query object is required, e.g.
in custom QueryParser extensions. At query.Rewrite() time the reader is used to construct the
actual MoreLikeThis object and obtain the real Query object.
The fields used for the similarity measure.
A filter that includes documents that match a specific term.
The term documents need to have in order to be a match for this filter.
Gets the term this filter includes documents with.
Constructs a filter for docs matching any of the terms added to this class.
Unlike a RangeFilter this can be used for filtering on multiple terms that are not necessarily in
a sequence. An example might be a collection of primary keys from a database query result or perhaps
a choice of "category" labels picked by the end user. As a filter, this is much faster than the
equivalent query (a BooleanQuery with many "should" TermQueries).
Creates a new TermsFilter from the given list. The list
can contain duplicate terms and multiple fields.
Creates a new TermsFilter from the given list for
a single field.
Creates a new TermsFilter from the given array for
a single field.
Creates a new TermsFilter from the given array. The array can
contain duplicate terms and multiple fields.
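For example, a sketch (the field names and searcher are hypothetical):
var filter = new TermsFilter(
    new Term("category", "news"),
    new Term("category", "sports"),
    new Term("id", "42")); // duplicate and mixed fields are allowed
TopDocs hits = searcher.Search(new MatchAllDocsQuery(), filter, 10);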