An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text. In order to define what analysis is done, subclasses must define their TokenStreamComponents in CreateComponents(string, TextReader). The components are then reused in each call to GetTokenStream(string, TextReader). Simple example: Analyzer analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => { Tokenizer source = new FooTokenizer(reader); TokenStream filter = new FooFilter(source); filter = new BarFilter(filter); return new TokenStreamComponents(source, filter); }); For more examples, see the namespace documentation. For some concrete implementations bundled with Lucene, look in the analysis modules: [Common](../analysis-common/overview.html): Analyzers for indexing content in different languages and domains. [ICU](../icu/Lucene.Net.Analysis.Icu.html): Exposes functionality from ICU to Apache Lucene. [Kuromoji](../analysis-kuromoji/Lucene.Net.Analysis.Ja.html): Morphological analyzer for Japanese text. [Morfologik](../analysis-morfologik/Lucene.Net.Analysis.Morfologik.html): Dictionary-driven lemmatization for the Polish language. [OpenNLP](../analysis-opennlp/Lucene.Net.Analysis.OpenNlp.html): Analysis integration with Apache OpenNLP. [Phonetic](../analysis-phonetic/Lucene.Net.Analysis.Phonetic.html): Analysis for indexing phonetic signatures (for sounds-alike search). [Smart Chinese](../analysis-smartcn/Lucene.Net.Analysis.Cn.Smart.html): Analyzer for Simplified Chinese, which indexes words. [Stempel](../analysis-stempel/Lucene.Net.Analysis.Stempel.html): Algorithmic stemmer for the Polish language. Create a new Analyzer, reusing the same set of components per-thread across calls to GetTokenStream(string, TextReader). Expert: create a new Analyzer with a custom ReuseStrategy. NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as Lucene.Net.Analysis.Common.Miscellaneous.PerFieldAnalyzerWrapper instead. Creates a new Analyzer instance with the ability to specify the body of the CreateComponents(string, TextReader) method through the createComponents parameter. Simple example: var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => { Tokenizer source = new FooTokenizer(reader); TokenStream filter = new FooFilter(source); filter = new BarFilter(filter); return new TokenStreamComponents(source, filter); }); LUCENENET specific. A delegate method that represents (is called by) the CreateComponents(string, TextReader) method. It accepts a fieldName and a reader and returns the TokenStreamComponents for this analyzer. A new Analyzer instance. Creates a new Analyzer instance with the ability to specify the body of the CreateComponents(string, TextReader) method through the createComponents parameter and allows the use of a ReuseStrategy. Simple example: var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => { Tokenizer source = new FooTokenizer(reader); TokenStream filter = new FooFilter(source); filter = new BarFilter(filter); return new TokenStreamComponents(source, filter); }, reuseStrategy); LUCENENET specific. A delegate method that represents (is called by) the CreateComponents(string, TextReader) method. It accepts a fieldName and a reader and returns the TokenStreamComponents for this analyzer. A custom ReuseStrategy instance. A new Analyzer instance. Creates a new Analyzer instance with the ability to specify the body of the CreateComponents(string, TextReader) method through the createComponents parameter and the body of the InitReader(string, TextReader) method through the initReader parameter.
Simple example: var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => { Tokenizer source = new FooTokenizer(reader); TokenStream filter = new FooFilter(source); filter = new BarFilter(filter); return new TokenStreamComponents(source, filter); }, initReader: (fieldName, reader) => { return new HTMLStripCharFilter(reader); }); LUCENENET specific A delegate method that represents (is called by) the method. It accepts a fieldName and a reader and returns the for this analyzer. A delegate method that represents (is called by) the method. It accepts a fieldName and a reader and returns the that can be modified or wrapped by the method. A new instance. Creates a new instance with the ability to specify the body of the method through the parameter, the body of the method through the parameter, and allows the use of a . Simple example: var analyzer = Analyzer.NewAnonymous(createComponents: (fieldName, reader) => { Tokenizer source = new FooTokenizer(reader); TokenStream filter = new FooFilter(source); filter = new BarFilter(filter); return new TokenStreamComponents(source, filter); }, initReader: (fieldName, reader) => { return new HTMLStripCharFilter(reader); }, reuseStrategy); LUCENENET specific A delegate method that represents (is called by) the method. It accepts a fieldName and a reader and returns the for this analyzer. A delegate method that represents (is called by) the method. It accepts a fieldName and a reader and returns the that can be modified or wrapped by the method. A custom instance. A new instance. Creates a new instance for this analyzer. the name of the fields content passed to the sink as a reader the reader passed to the constructor the for this analyzer. Returns a suitable for , tokenizing the contents of text. This method uses to obtain an instance of . It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through . NOTE: After calling this method, the consumer must follow the workflow described in to properly consume its contents. See the namespace documentation for some examples demonstrating this. the name of the field the created is used for the reader the streams source reads from for iterating the analyzed content of if the Analyzer is disposed. if an i/o error occurs (may rarely happen for strings). Returns a suitable for , tokenizing the contents of . This method uses to obtain an instance of . It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through . NOTE: After calling this method, the consumer must follow the workflow described in to properly consume its contents. See the namespace documentation for some examples demonstrating this. the name of the field the created is used for the the streams source reads from for iterating the analyzed content of reader if the Analyzer is disposed. if an i/o error occurs (may rarely happen for strings). Override this if you want to add a chain. The default implementation returns unchanged. name being indexed original reader, optionally decorated with (s) Invoked before indexing a instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between instances using the same field name. The default value position increment gap is 0. 
With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across instances, are in successive positions, allowing exact matches, for instance, across instance boundaries. name being indexed. position increment gap, added to the next token emitted from . this value must be >= 0. Just like , except for offsets instead. By default this returns 1. this method is only called if the field produced at least one token for indexing. the field just indexed offset gap, added to the next token emitted from . this value must be >= 0. Returns the used . Frees persistent resources used by this Frees persistent resources used by this A predefined that reuses the same components for every field. Implementation of that reuses the same components for every field. Sole constructor. (For invocation by subclass constructors, typically implicit.) A predefined that reuses components per-field by maintaining a Map of per field name. Implementation of that reuses components per-field by maintaining a Map of per field name. Sole constructor. (For invocation by subclass constructors, typically implicit.) LUCENENET specific helper class to mimick Java's ability to create anonymous classes. Clearly, the design of took this feature of Java into consideration. Since it doesn't exist in .NET, we can use a delegate method to call the constructor of this concrete instance to fake it (by calling an overload of ). This class encapsulates the outer components of a token stream. It provides access to the source () and the outer end (sink), an instance of which also serves as the returned by . Original source of the tokens. Sink tokenstream, such as the outer tokenfilter decorating the chain. This can be the source if there are no filters. Internal cache only used by . Creates a new instance. the analyzer's tokenizer the analyzer's resulting token stream Creates a new instance. the analyzer's tokenizer Resets the encapsulated components with the given reader. If the components cannot be reset, an Exception should be thrown. a reader to reset the source component if the component's reset method throws an Returns the sink the sink Returns the component's Component's Strategy defining how are reused per call to . Gets the reusable for the field with the given name. from which to get the reused components. Use and to access the data on the . Name of the field whose reusable are to be retrieved Reusable for the field, or null if there was no previous components for the field Stores the given as the reusable components for the field with the give name. Analyzer Name of the field whose are being set which are to be reused for the field Returns the currently stored value. Currently stored value or null if no value is stored if the is closed. Sets the stored value. Analyzer Value to store if the is closed. Extension to suitable for s which wrap other s. allows the to wrap multiple s which are selected on a per field basis. allows the of the wrapped to then be wrapped (such as adding a new to form new ). Creates a new . Since the of the wrapped s are unknown, is assumed. Creates a new with the given reuse strategy. If you want to wrap a single delegate you can probably reuse its strategy when instantiating this subclass: base(innerAnalyzer.Strategy). If you choose different analyzers per field, use . Retrieves the wrapped appropriate for analyzing the field with the given name Name of the field which is to be analyzed for the field with the given name. 
Assumed to be non-null Wraps / alters the given , taken from the wrapped , to form new components. It is through this method that new s can be added by s. By default, the given components are returned. Name of the field which is to be analyzed taken from the wrapped Wrapped / altered . Wraps / alters the given . Through this method s can implement . By default, the given reader is returned. name of the field which is to be analyzed the reader to wrap the wrapped reader This class can be used if the token attributes of a are intended to be consumed more than once. It caches all token attribute states locally in a List. implements the optional method , which repositions the stream to the first . Create a new around , caching its token attributes, which can be replayed again after a call to . Rewinds the iterator to the beginning of the cached list. Note that this does not call on the wrapped tokenstream ever, even the first time. You should the inner tokenstream before wrapping it with . Releases resources used by the and if overridden in a derived class, optionally releases unmanaged resources. true to release both managed and unmanaged resources; false to release only unmanaged resources. Subclasses of can be chained to filter a They can be used as with additional offset correction. s will automatically use if a subclass is used. This class is abstract: at a minimum you must implement , transforming the input in some way from , and to adjust the offsets to match the originals. You can optionally provide more efficient implementations of additional methods like , but this is not required. For examples and integration with , see the namespace documentation. The underlying character-input stream. Create a new wrapping the provided reader. a , can also be a for chaining. Closes the underlying input stream. NOTE: The default implementation closes the input , so be sure to call base.Dispose(disposing) when overriding this method. Subclasses override to correct the current offset. current offset corrected offset Chains the corrected offset through the input (s). Skips characters. This method will block until some characters are available, an I/O error occurs, or the end of the stream is reached. LUCENENET specific. Moved here from the Reader class (in Java) so it can be overridden to provide reader buffering. The number of characters to skip The number of characters actually skipped LUCENENET specific. Moved here from the Reader class (in Java) so it can be overridden to provide reader buffering. Tells whether this stream is ready to be read. True if the next is guaranteed not to block for input, false otherwise. Note that returning false does not guarantee that the next read will block. LUCENENET specific. Moved here from the Reader class (in Java) so it can be overridden to provide reader buffering. Tells whether this stream supports the operation. The default implementation always returns false. Subclasses should override this method. LUCENENET specific. Moved here from the Reader class (in Java) so it can be overridden to provide reader buffering. true if and only if this stream supports the mark operation. Marks the present position in the stream. Subsequent calls to will attempt to reposition the stream to this point. Not all character-input streams support the operation. LUCENENET specific. Moved here from the Reader class (in Java) so it can be overridden to provide reader buffering. Limit on the number of characters that may be read while still preserving the mark. 
After reading this many characters, attempting to reset the stream may fail. Expert: this class provides a for indexing numeric values that can be used by or . Note that for simple usage, , , or is recommended. These fields disable norms and term freqs, as they are not usually needed during searching. If you need to change these settings, you should use this class. Here's an example usage, for an field: IndexableFieldType fieldType = new IndexableFieldType(TextField.TYPE_NOT_STORED) { OmitNorms = true, IndexOptions = IndexOptions.DOCS_ONLY }; Field field = new Field(name, new NumericTokenStream(precisionStep).SetInt32Value(value), fieldType); document.Add(field); For optimal performance, re-use the and instance for more than one document: NumericTokenStream stream = new NumericTokenStream(precisionStep); IndexableFieldType fieldType = new IndexableFieldType(TextField.TYPE_NOT_STORED) { OmitNorms = true, IndexOptions = IndexOptions.DOCS_ONLY }; Field field = new Field(name, stream, fieldType); Document document = new Document(); document.Add(field); for(all documents) { stream.SetInt32Value(value) writer.AddDocument(document); } this stream is not intended to be used in analyzers; it's more for iterating the different precisions during indexing a specific numeric value. NOTE: as token streams are only consumed once the document is added to the index, if you index more than one numeric field, use a separate instance for each. See for more details on the precisionStep parameter as well as how numeric fields work under the hood. @since 2.9 The full precision token gets this token type assigned. The lower precision tokens gets this token type assigned. Expert: Use this attribute to get the details of the currently generated token. @lucene.experimental @since 4.0 Returns current shift value, undefined before first token Returns current token's raw value as with all applied, undefined before first token Returns value size in bits (32 for , ; 64 for , ) Don't call this method! @lucene.internal Don't call this method! @lucene.internal Implementation of . @lucene.internal @since 4.0 Creates, but does not yet initialize this attribute instance Creates a token stream for numeric values using the default (4). The stream is not yet initialized, before using set a value using the various Set???Value() methods. Creates a token stream for numeric values with the specified . The stream is not yet initialized, before using set a value using the various Set???Value() methods. Expert: Creates a token stream for numeric values with the specified using the given . The stream is not yet initialized, before using set a value using the various Set???Value() methods. Initializes the token stream with the supplied value. NOTE: This was setLongValue() in Lucene the value, for which this should enumerate tokens. this instance, because of this you can use it the following way: new Field(name, new NumericTokenStream(precisionStep).SetInt64Value(value)) Initializes the token stream with the supplied value. NOTE: This was setIntValue() in Lucene the value, for which this should enumerate tokens. this instance, because of this you can use it the following way: new Field(name, new NumericTokenStream(precisionStep).SetInt32Value(value)) Initializes the token stream with the supplied value. the value, for which this should enumerate tokens. 
this instance, because of this you can use it the following way: new Field(name, new NumericTokenStream(precisionStep).SetDoubleValue(value)) Initializes the token stream with the supplied value. NOTE: This was setFloatValue() in Lucene the value, for which this should enumerate tokens. this instance, because of this you can use it the following way: new Field(name, new NumericTokenStream(precisionStep).SetSingleValue(value)) Returns the precision step. Internal class to enable reuse of the string reader by A is an occurrence of a term from the text of a field. It consists of a term's text, the start and end offset of the term in the text of the field, and a type string. The start and end offsets permit applications to re-associate a token with its source text, e.g., to display highlighted query terms in a document browser, or to show matching text fragments in a KWIC (KeyWord In Context) display, etc. The type is a string, assigned by a lexical analyzer (a.k.a. tokenizer), naming the lexical or syntactic class that the token belongs to. For example an end of sentence marker token might be implemented with type "eos". The default token type is "word". A Token can optionally have metadata (a.k.a. payload) in the form of a variable length byte array. Use to retrieve the payloads from the index. NOTE: As of 2.9, Token implements all interfaces that are part of core Lucene and can be found in the namespace. Even though it is not necessary to use anymore, with the new API it can be used as convenience class that implements all s, which is especially useful to easily switch from the old to the new API. s and s should try to re-use a instance when possible for best performance, by implementing the API. Failing that, to create a new you should first use one of the constructors that starts with null text. To load the token from a char[] use . To load from a use followed by or . Alternatively you can get the 's termBuffer by calling either , if you know that your text is shorter than the capacity of the termBuffer or , if there is any possibility that you may need to grow the buffer. Fill in the characters of your term into this buffer, with if loading from a string, or with , and finally call to set the length of the term text. See LUCENE-969 for details. Typical Token reuse patterns: Copying text from a string (type is reset to if not specified): return reusableToken.Reinit(string, startOffset, endOffset[, type]); Copying some text from a string (type is reset to if not specified): return reusableToken.Reinit(string, 0, string.Length, startOffset, endOffset[, type]); Copying text from char[] buffer (type is reset to if not specified): return reusableToken.Reinit(buffer, 0, buffer.Length, startOffset, endOffset[, type]); Copying some text from a char[] buffer (type is reset to if not specified): return reusableToken.Reinit(buffer, start, end - start, startOffset, endOffset[, type]); Copying from one one to another (type is reset to if not specified): return reusableToken.Reinit(source.Buffer, 0, source.Length, source.StartOffset, source.EndOffset[, source.Type]); A few things to note: initializes all of the fields to default values. this was changed in contrast to Lucene 2.4, but should affect no one. Because s can be chained, one cannot assume that the 's current type is correct. The startOffset and endOffset represent the start and offset in the source text, so be careful in adjusting them. When caching a reusable token, clone it. 
When injecting a cached token into a stream that can be reset, clone it again. Please note: With Lucene 3.1, the method had to be changed to match the interface introduced by the interface . this method now only prints the term text, no additional information anymore. Constructs a will null text. Constructs a with null text and start & end offsets. start offset in the source text end offset in the source text Constructs a with null text and start & end offsets plus the type. start offset in the source text end offset in the source text the lexical type of this Constructs a with null text and start & end offsets plus flags. NOTE: flags is EXPERIMENTAL. start offset in the source text end offset in the source text The bits to set for this token Constructs a with the given term text, and start & end offsets. The type defaults to "word." NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text. term text start offset in the source text end offset in the source text Constructs a with the given text, start and end offsets, & type. NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text. term text start offset in the source text end offset in the source text token type Constructs a with the given text, start and end offsets, & type. NOTE: for better indexing speed you should instead use the char[] termBuffer methods to set the term text. term text start offset in the source text end offset in the source text token type bits Constructs a with the given term buffer (offset & length), start and end offsets buffer containing term text the index in the buffer of the first character number of valid characters in the buffer start offset in the source text end offset in the source text Gets or Sets the position increment (the distance from the prior term). The default value is one. if value is set to a negative value. Gets or Sets the position length of this (how many positions this token spans). The default value is one. if value is set to zero or negative. Returns this 's starting offset, the position of the first character corresponding to this token in the source text. Note that the difference between and may not be equal to termText.Length, as the term text may have been altered by a stemmer or some other filter. Returns this 's ending offset, one greater than the position of the last character corresponding to this token in the source text. The length of the token in the source text is (EndOffset - ). Set the starting and ending offset. If or are negative, or if is greater than Gets or Sets this 's lexical type. Defaults to "word". Get the bitset for any bits that have been set. This is completely distinct from , although they do share similar purposes. The flags can be used to encode information about the token for use by other s. Gets or Sets this 's payload. Resets the term text, payload, flags, and positionIncrement, startOffset, endOffset and token type to default. Makes a clone, but replaces the term buffer & start/end offset in the process. This is more efficient than doing a full clone (and then calling ) because it saves a wasted copy of the old termBuffer. 
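As noted above, tokenizers and filters get the best performance by mutating the shared attribute instances in place rather than allocating a new Token per call. A minimal sketch of such a filter; the class name and its trimming behavior are invented for illustration, and it assumes the Lucene.NET conventions of a protected m_input field on TokenFilter and a settable Length on ICharTermAttribute:

```csharp
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

// Hypothetical filter: drops a single trailing 's' from every term.
// It retrieves its attribute once in the constructor and then mutates it
// in place on each IncrementToken() call, as recommended above.
public sealed class TrimTrailingSFilter : TokenFilter
{
    private readonly ICharTermAttribute termAtt;

    public TrimTrailingSFilter(TokenStream input)
        : base(input)
    {
        termAtt = AddAttribute<ICharTermAttribute>();
    }

    public override bool IncrementToken()
    {
        if (!m_input.IncrementToken())
            return false; // end of stream

        int len = termAtt.Length;
        if (len > 1 && termAtt[len - 1] == 's')
            termAtt.Length = len - 1; // shrink the term in place

        return true;
    }
}
```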
Shorthand for calling , , , (set) this instance Shorthand for calling , , , (set) on this instance Shorthand for calling , , , (set) this instance Shorthand for calling , , , (set) this instance Shorthand for calling , , , (set) on this instance Shorthand for calling , , , (set) on this instance Copy the prototype token's fields into this one. Note: Payloads are shared. source to copy fields from Copy the prototype token's fields into this one, with a different term. Note: Payloads are shared. existing new term text Copy the prototype token's fields into this one, with a different term. Note: Payloads are shared. existing buffer containing new term text the index in the buffer of the first character number of valid characters in the buffer Convenience factory that returns as implementation for the basic attributes and return the default impl (with "Impl" appended) for all other attributes. @since 3.0 Expert: Creates a returning as instance for the basic attributes and for all other attributes calls the given delegate factory. @since 3.0 Expert: Creates an returning as instance for the basic attributes and for all other attributes calls the given delegate factory. The term text of a . Copies the contents of buffer, starting at offset for length characters, into the termBuffer array. the buffer to copy the index in the buffer of the first character to copy the number of characters to copy Returns the internal termBuffer character array which you can then directly alter. If the array is too small for your token, use to increase it. After altering the buffer be sure to call to record the number of valid characters that were placed into the termBuffer. NOTE: The returned buffer may be larger than the valid . Grows the termBuffer to at least size , preserving the existing content. minimum size of the new termBuffer newly created termBuffer with length >= newSize Gets or Sets the number of valid characters (in the termBuffer array. Set number of valid characters (length of the term) in the termBuffer array. Use this to truncate the termBuffer or to synchronize with external manipulation of the termBuffer. Note: to grow the size of the array, use first. NOTE: This is exactly the same operation as calling the setter, the primary difference is that this method returns a reference to the current object so it can be chained. obj.SetLength(30).Append("hey you"); the truncated length Sets the length of the termBuffer to zero. Use this method before appending contents. Appends the contents of the to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the argument. If is null, this method is a no-op. IMPORTANT: This method uses .NET semantics. In Lucene, a null would append the string "null" to the instance, but in Lucene.NET a null value will be ignored. Appends the a string representation of the specified to this instance. The characters of the argument are appended, in order, increasing the length of this sequence by the length of . If is null and and are not 0, an is thrown. The sequence of characters to append. The start index of the to begin copying characters. The number of characters to append. is null, and and are not zero. is less than zero. -or- is less than zero. -or- + is greater than the length of . Appends the supplied to this character sequence. The to append. Appends the contents of the array to this character sequence. 
The characters of the argument are appended, in order, increasing the length of this sequence by the length of the . If is null, this method is a no-op. This method uses .NET semantics. In Lucene, a null would append the string "null" to the instance, but in Lucene.NET a null value will be safely ignored. The array to append. LUCENENET specific method, added to simulate using the CharBuffer class in Java. Appends the string representation of the array to this instance. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the . If is null and and are not 0, an is thrown. The sequence of characters to append. The start index of the to begin copying characters. The number of characters to append. is null, and and are not zero. is less than zero. -or- is less than zero. -or- + is greater than the length of . LUCENENET specific method, added to simulate using the CharBuffer class in Java. Note that the method provides similar functionality. Appends the specified to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the argument. If argument is null, this method is a no-op. This method uses .NET semantics. In Lucene, a null would append the string "null" to the instance, but in Lucene.NET a null value will be safely ignored. The sequence of characters to append. LUCENENET specific method, added because the .NET data type doesn't implement . Appends the contents of the to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of . If is null and and are not 0, an is thrown. The sequence of characters to append. The start index of the to begin copying characters. The number of characters to append. is null, and and are not zero. is less than zero. -or- is less than zero. -or- + is greater than the length of . LUCENENET specific method, added because the .NET data type doesn't implement . Appends a string representation of the specified to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the argument. If argument is null, this method is a no-op. This method uses .NET semantics. In Lucene, a null would append the string "null" to the instance, but in Lucene.NET a null value will be safely ignored. Appends a string representation of the specified to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the argument. If is null and and are not 0, an is thrown. The sequence of characters to append. The start index of the to begin copying characters. The number of characters to append. is null, and and are not zero. is less than zero. -or- is less than zero. -or- + is greater than the length of . LUCENENET specific method, added because the .NET data type doesn't implement . Appends the contents of the other to this character sequence. The characters of the argument are appended, in order, increasing the length of this sequence by the length of the argument. If argument is null, this method is a no-op. This method uses .NET semantics. In Lucene, a null would append the string "null" to the instance, but in Lucene.NET a null value will be safely ignored. The sequence of characters to append. Default implementation of . 
Initialize this attribute with empty term text Returns solely the term text as specified by the interface. this method changed the behavior with Lucene 3.1, before it returned a String representation of the whole term with all attributes. this affects especially the subclass. This attribute can be used to pass different flags down the chain, eg from one TokenFilter to another one. This is completely distinct from , although they do share similar purposes. The flags can be used to encode information about the token for use by other s. @lucene.experimental While we think this is here to stay, we may want to change it to be a long. Get the bitset for any bits that have been set. Default implementation of . Initialize this attribute with no bits set This attribute can be used to mark a token as a keyword. Keyword aware s can decide to modify a token based on the return value of if the token is modified. Stemming filters for instance can use this attribute to conditionally skip a term if returns true. Gets or Sets whether the current token is a keyword. true if the current token is a keyword, otherwise false. Default implementation of . Initialize this attribute with the keyword value as false. The start and end character offset of a . Returns this 's starting offset, the position of the first character corresponding to this token in the source text. Note that the difference between and may not be equal to termText.Length, as the term text may have been altered by a stemmer or some other filter. Set the starting and ending offset. If or are negative, or if is greater than Returns this 's ending offset, one greater than the position of the last character corresponding to this token in the source text. The length of the token in the source text is (EndOffset - ). Default implementation of . Initialize this attribute with startOffset and endOffset of 0. The payload of a Token. The payload is stored in the index at each position, and can be used to influence scoring when using Payload-based queries in the and namespaces. NOTE: because the payload will be stored at each position, its usually best to use the minimum number of bytes necessary. Some codec implementations may optimize payload storage when all payloads have the same length. Gets or Sets this 's payload. Default implementation of . Initialize this attribute with no payload. Initialize this attribute with the given payload. Determines the position of this token relative to the previous in a , used in phrase searching. The default value is one. Some common uses for this are: Set it to zero to put multiple terms in the same position. this is useful if, e.g., a word has multiple stems. Searches for phrases including either stem will match. In this case, all but the first stem's increment should be set to zero: the increment of the first instance should be one. Repeating a token with an increment of zero can also be used to boost the scores of matches on that token. Set it to values greater than one to inhibit exact phrase matches. If, for example, one does not want phrases to match across removed stop words, then one could build a stop word filter that removes stop words and also sets the increment to the number of stop words removed before each non-stop word. Then exact phrase queries will only match when the terms occur with no intervening stop words. Gets or Sets the position increment (the distance from the prior term). The default value is one. if value is set to a negative value. Default implementation of . 
Initialize this attribute with position increment of 1 Determines how many positions this token spans. Very few analyzer components actually produce this attribute, and indexing ignores it, but it's useful to express the graph structure naturally produced by decompounding, word splitting/joining, synonym filtering, etc. NOTE: this is optional, and most analyzers don't change the default value (1). Gets or Sets the position length of this (how many positions this token spans). The default value is one. if value is set to zero or negative. Default implementation of . Initializes this attribute with position length of 1. This attribute is requested by TermsHashPerField to index the contents. This attribute can be used to customize the final byte[] encoding of terms. Consumers of this attribute call up-front, and then invoke for each term. Example: TermToBytesRefAttribute termAtt = tokenStream.GetAttribute<TermToBytesRefAttribute>; BytesRef bytes = termAtt.BytesRef; while (tokenStream.IncrementToken() { // you must call termAtt.FillBytesRef() before doing something with the bytes. // this encodes the term value (internally it might be a char[], etc) into the bytes. int hashCode = termAtt.FillBytesRef(); if (IsInteresting(bytes)) { // because the bytes are reused by the attribute (like ICharTermAttribute's char[] buffer), // you should make a copy if you need persistent access to the bytes, otherwise they will // be rewritten across calls to IncrementToken() DoSomethingWith(new BytesRef(bytes)); } } ... @lucene.experimental this is a very expert API, please use and its implementation of this method for UTF-8 terms. Updates the bytes to contain this term's final encoding. Retrieve this attribute's . The bytes are updated from the current term when the consumer calls . this s internal . A 's lexical type. The Default value is "word". Gets or Sets the lexical type. Default implementation of . the default type Initialize this attribute with Initialize this attribute with A is a whose input is another . This is an abstract class; subclasses must override . The source of tokens for this filter. Construct a token stream filtering the given input. This method is called by the consumer after the last token has been consumed, after returned false (using the new API). Streams implementing the old API should upgrade to use this feature. This method can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token eg in case one or more whitespaces followed after the last token, but a WhitespaceTokenizer was used. Additionally any skipped positions (such as those removed by a stopfilter) can be applied to the position increment, or any adjustment of other attributes where the end-of-stream value may be important. NOTE: The default implementation chains the call to the input TokenStream, so be sure to call base.End() first when overriding this method. If an I/O error occurs Releases resources associated with this stream. If you override this method, always call base.Dispose(disposing), otherwise some internal state will not be correctly reset (e.g., will throw on reuse). NOTE: The default implementation chains the call to the input TokenStream, so be sure to call base.Dispose(disposing) when overriding this method. This method is called by a consumer before it begins consumption using . Resets this stream to a clean state. 
Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh. If you override this method, always call base.Reset(), otherwise some internal state will not be correctly reset (e.g., will throw on further usage). NOTE: The default implementation chains the call to the input , so be sure to call base.Reset() when overriding this method. A is a whose input is a . This is an abstract class; subclasses must override NOTE: Subclasses overriding must call before setting attributes. The text source for this . Pending reader: not actually assigned to input until Construct a token stream processing the given input. Construct a token stream processing the given input using the given . Releases resources associated with this stream. If you override this method, always call base.Dispose(disposing), otherwise some internal state will not be correctly reset (e.g., will throw on reuse). NOTE: The default implementation closes the input , so be sure to call base.Dispose(disposing) when overriding this method. Return the corrected offset. If is a subclass this method calls , else returns . offset as seen in the output corrected offset based on the input Expert: Set a new reader on the . Typically, an analyzer (in its tokenStream method) will use this to re-use a previously created tokenizer. A enumerates the sequence of tokens, either from s of a or from query text. this is an abstract class; concrete subclasses are: , a whose input is a ; and , a whose input is another . A new API has been introduced with Lucene 2.9. this API has moved from being -based to -based. While still exists in 2.9 as a convenience class, the preferred way to store the information of a is to use s. now extends , which provides access to all of the token s for the . Note that only one instance per is created and reused for every token. This approach reduces object creation and allows local caching of references to the s. See for further details. The workflow of the new API is as follows: Instantiation of /s which add/get attributes to/from the . The consumer calls . The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access. The consumer calls until it returns false consuming the attributes after each call. The consumer calls so that any end-of-stream operations can be performed. The consumer calls to release any resource when finished using the . To make sure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in . You can find some example code for the new API in the analysis documentation. Sometimes it is desirable to capture a current state of a , e.g., for buffering purposes (see , TeeSinkTokenFilter). For this usecase and can be used. The -API in Lucene is based on the decorator pattern. Therefore all non-abstract subclasses must be sealed or have at least a sealed implementation of ! This is checked when assertions are enabled. A using the default attribute factory. A that uses the same attributes as the supplied one. A using the supplied for creating new instances. Consumers (i.e., ) use this method to advance the stream to the next token. Implementing classes must implement this method and update the appropriate s with the attributes of the next token. 
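For illustration, the consumer workflow described above (retrieve attribute references, Reset(), IncrementToken() until it returns false, End(), then Dispose()) looks roughly like this; the analyzer, field name, and text are placeholders:

```csharp
using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.TokenAttributes;

public static class TokenStreamConsumerExample
{
    public static void PrintTokens(Analyzer analyzer, string field, string text)
    {
        TokenStream stream = analyzer.GetTokenStream(field, new StringReader(text));

        // Retrieve attribute references once, before consuming the stream.
        ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();
        IOffsetAttribute offsetAtt = stream.AddAttribute<IOffsetAttribute>();

        try
        {
            stream.Reset();                     // 1. reset before consuming
            while (stream.IncrementToken())     // 2. advance token by token
            {
                Console.WriteLine($"{termAtt} [{offsetAtt.StartOffset}-{offsetAtt.EndOffset}]");
            }
            stream.End();                       // 3. perform end-of-stream operations
        }
        finally
        {
            stream.Dispose();                   // 4. release so the components can be reused
        }
    }
}
```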
The producer must make no assumptions about the attributes after the method has been returned: the caller may arbitrarily change it. If the producer needs to preserve the state for subsequent calls, it can use to create a copy of the current attribute state. this method is called for every token of a document, so an efficient implementation is crucial for good performance. To avoid calls to and , references to all s that this stream uses should be retrieved during instantiation. To ensure that filters and consumers know which attributes are available, the attributes must be added during instantiation. Filters and consumers are not required to check for availability of attributes in . false for end of stream; true otherwise This method is called by the consumer after the last token has been consumed, after returned false (using the new API). Streams implementing the old API should upgrade to use this feature. This method can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token eg in case one or more whitespaces followed after the last token, but a WhitespaceTokenizer was used. Additionally any skipped positions (such as those removed by a stopfilter) can be applied to the position increment, or any adjustment of other attributes where the end-of-stream value may be important. If you override this method, always call base.End();. If an I/O error occurs This method is called by a consumer before it begins consumption using . Resets this stream to a clean state. Stateful implementations must implement this method so that they can be reused, just as if they had been created fresh. If you override this method, always call base.Reset(), otherwise some internal state will not be correctly reset (e.g., will throw on further usage). Releases resources associated with this stream. If you override this method, always call base.Dispose(disposing), otherwise some internal state will not be correctly reset (e.g., will throw on reuse). Consumes a and creates an where the transition labels are UTF8 bytes (or Unicode code points if unicodeArcs is true) from the . Between tokens we insert and for holes we insert . @lucene.experimental Sole constructor. Whether to generate holes in the automaton for missing positions, true by default. Whether to make transition labels Unicode code points instead of UTF8 bytes, false by default Subclass & implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph. We create transition between two adjacent tokens. We add this arc to represent a hole. Pulls the graph (including from the provided , and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term. Holds all state required for to produce a without re-seeking the terms dict. How many docs have this term? Total number of occurrences of this term. The term's ord in the current block. File pointer into the terms dict primary file (_X.tim) that holds this term. Sole constructor. (For invocation by subclass constructors, typically implicit.) A block-based terms index and dictionary that assigns terms to variable length blocks according to how they share prefixes. The terms index is a prefix trie whose leaves are term blocks. The advantage of this approach is that SeekExact() is often able to determine a term cannot exist without doing any IO, and intersection with Automata is very fast. 
Note that this terms dictionary has it's own fixed terms index (ie, it does not support a pluggable terms index implementation). NOTE: this terms dictionary does not support index divisor when opening an IndexReader. Instead, you can change the min/maxItemsPerBlock during indexing. The data structure used by this implementation is very similar to a burst trie (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3499), but with added logic to break up too-large blocks of all terms sharing a given prefix into smaller ones. Use with the -verbose option to see summary statistics on the blocks in the dictionary. See . @lucene.experimental File offset where the directory starts in the terms file. File offset where the directory starts in the index file. Sole constructor. LUCENENET specific parameter which allows a subclass to set state. It is *optional* and can be used when overriding the ReadHeader(), ReadIndexHeader() and SeekDir() methods. It only matters in the case where the state is required inside of any of those methods that is passed in to the subclass constructor. When passed to the constructor, it is set to the protected field m_subclassState before any of the above methods are called where it is available for reading when overriding the above methods. If your subclass needs to pass more than one piece of data, you can create a class or struct to do so. All other virtual members of BlockTreeTermsReader are not called in the constructor, so the overrides of those methods won't specifically need to use this field (although they could for consistency). Reads terms file header. Reads index file header. Seek to the directory offset. Disposes all resources used by this object. BlockTree statistics for a single field returned by . How many nodes in the index FST. How many arcs in the index FST. Byte size of the index. Total number of terms in the field. Total number of bytes (sum of term lengths) across all terms in the field. The number of normal (non-floor) blocks in the terms file. The number of floor blocks (meta-blocks larger than the allowed maxItemsPerBlock) in the terms file. The number of sub-blocks within the floor blocks. The number of "internal" blocks (that have both terms and sub-blocks). The number of "leaf" blocks (blocks that have only terms). The number of "internal" blocks that do not contain terms (have only sub-blocks). Total number of blocks. Number of blocks at each prefix depth. Total number of bytes used to store term suffixes. Total number of bytes used to store term stats (not including what the stores. Total bytes stored by the , plus the other few vInts stored in the frame. Segment name. Field name. BlockTree's implementation of . For debugging -- used by CheckIndex too Returns approximate RAM bytes used NOTE: This was longs (field) in Lucene Runs next() through the entire terms dict, computing aggregate statistics. NOTE: This was longs (field) in Lucene Does initial decode of next block of terms; this doesn't actually decode the docFreq, totalTermFreq, postings details (frq/prx offset, etc.) metadata; it just loads them as byte[] blobs which are then decoded on-demand if the metadata is ever requested for any term in this block. this enables terms-only intensive consumes (eg certain MTQs, respelling) to not pay the price of decoding metadata they won't use. Scans to sub-block that has this target fp; only called by Next(); NOTE: does not set startBytePos/suffix as a side effect Suggested default value for the minItemsInBlock parameter to . 
Suggested default value for the maxItemsInBlock parameter to the BlockTreeTermsWriter constructor. Extension of terms file. Initial terms format. Append-only. Meta data as array. Checksums. Current terms format. Extension of terms index file. Block-based terms index and dictionary writer. Writes terms dict and index, block-encoding (column stride) each term's metadata for each set of terms between two index terms. Files:

.tim: Term Dictionary
.tip: Term Index

Term Dictionary

The .tim file contains the list of terms in each field along with per-term statistics (such as docfreq) and per-term metadata (typically pointers to the postings list for that term in the inverted index). The .tim is arranged in blocks: with blocks containing a variable number of entries (by default 25-48), where each entry is either a term or a reference to a sub-block. NOTE: The term dictionary can plug into different postings implementations: the postings writer/reader are actually responsible for encoding and decoding the Postings Metadata and Term Metadata sections. TermsDict (.tim) --> Header, PostingsHeader, NodeBlockNumBlocks, FieldSummary, DirOffset, Footer NodeBlock --> (OuterNode | InnerNode) OuterNode --> EntryCount, SuffixLength, ByteSuffixLength, StatsLength, < TermStats >EntryCount, MetaLength, <TermMetadata>EntryCount InnerNode --> EntryCount, SuffixLength[,Sub?], ByteSuffixLength, StatsLength, < TermStats ? >EntryCount, MetaLength, <TermMetadata ? >EntryCount TermStats --> DocFreq, TotalTermFreq FieldSummary --> NumFields, <FieldNumber, NumTerms, RootCodeLength, ByteRootCodeLength, SumTotalTermFreq?, SumDocFreq, DocCount>NumFields Header --> CodecHeader ( DirOffset --> Uint64 () EntryCount,SuffixLength,StatsLength,DocFreq,MetaLength,NumFields, FieldNumber,RootCodeLength,DocCount --> VInt (_ TotalTermFreq,NumTerms,SumTotalTermFreq,SumDocFreq --> VLong () Footer --> CodecFooter () Notes: Header is a CodecHeader () storing the version information for the BlockTree implementation. DirOffset is a pointer to the FieldSummary section. DocFreq is the count of documents which contain the term. TotalTermFreq is the total number of occurrences of the term. this is encoded as the difference between the total number of occurrences and the DocFreq. FieldNumber is the fields number from . (.fnm) NumTerms is the number of unique terms for the field. RootCode points to the root block for the field. SumDocFreq is the total number of postings, the number of term-document pairs across the entire field. DocCount is the number of documents that have at least one posting for this field. PostingsHeader and TermMetadata are plugged into by the specific postings implementation: these contain arbitrary per-file data (such as parameters or versioning information) and per-term data (such as pointers to inverted files). For inner nodes of the tree, every entry will steal one bit to mark whether it points to child nodes(sub-block). If so, the corresponding and TermMetadata are omitted

Term Index

The .tip file contains an index into the term dictionary, so that it can be accessed randomly. The index is also used to determine when a given term cannot exist on disk (in the .tim file), saving a disk seek. TermsIndex (.tip) --> Header, FSTIndexNumFields <IndexStartFP>NumFields, DirOffset, Footer Header --> CodecHeader () DirOffset --> Uint64 () IndexStartFP --> VLong () FSTIndex --> Footer --> CodecFooter ( Notes: The .tip file contains a separate FST for each field. The FST maps a term prefix to the on-disk block that holds all terms starting with that prefix. Each field's IndexStartFP points to its FST. DirOffset is a pointer to the start of the IndexStartFPs for all fields It's possible that an on-disk block would contain too many terms (more than the allowed maximum (default: 48)). When this happens, the block is sub-divided into new blocks (called "floor blocks"), and then the output in the FST for the block's prefix encodes the leading byte of each sub-block, and its file pointer. @lucene.experimental
NOTE: This was longsSize (field) in Lucene Create a new writer. The number of items (terms or sub-blocks) per block will aim to be between LUCENENET specific parameter which allows a subclass to set state. It is *optional* and can be used when overriding the WriteHeader(), WriteIndexHeader(). It only matters in the case where the state is required inside of any of those methods that is passed in to the subclass constructor. When passed to the constructor, it is set to the protected field m_subclassState before any of the above methods are called where it is available for reading when overriding the above methods. If your subclass needs to pass more than one piece of data, you can create a class or struct to do so. All other virtual members of BlockTreeTermsWriter are not called in the constructor, so the overrides of those methods won't specifically need to use this field (although they could for consistency). Writes the terms file header. Writes the index file header. Writes the terms file trailer. Writes the index file trailer. Disposes all resources used by this object. Encodes/decodes an inverted index segment. Note, when extending this class, the name () is written into the index. In order for the segment to be read, the name must resolve to your implementation via . This method uses to resolve codec names. To implement your own codec: Subclass this class. Subclass , override the method, and add the line base.ScanForCodecs(typeof(YourCodec).Assembly). If you have any codec classes in your assembly that are not meant for reading, you can add the to them so they are ignored by the scan. set the new by calling at application startup. If your codec has dependencies, you may also override to inject them via pure DI or a DI container. See DI-Friendly Framework to understand the approach used. Codec Names Unlike the Java version, codec names are by default convention-based on the class name. If you name your custom codec class "MyCustomCodec", the codec name will the same name without the "Codec" suffix: "MyCustom". You can override this default behavior by using the to name the codec differently than this convention. Codec names must be all ASCII alphanumeric, and less than 128 characters in length. Sets the instance used to instantiate subclasses. The new . The parameter is null. Gets the associated codec factory. The codec factory. Creates a new codec. The will be written into the index segment: in order for the segment to be read this class should be registered by subclassing and calling in the class constructor. The new can be registered by calling at application startup. Returns this codec's name. Encodes/decodes postings. Encodes/decodes docvalues. Encodes/decodes stored fields. Encodes/decodes term vectors. Encodes/decodes field infos file. Encodes/decodes segment info file. Encodes/decodes document normalization values. Encodes/decodes live docs. Looks up a codec by name. Returns a list of all available codec names. Expert: returns the default codec used for newly created s. Returns the codec's name. Subclasses can override to provide more detail (such as parameters). Utility class for reading and writing versioned headers. Writing codec headers is useful to ensure that a file is in the format you think it is. @lucene.experimental Constant to identify the start of a codec header. Constant to identify the start of a codec footer. Writes a codec header, which records both a string to identify the file and a version number. This header can be parsed and validated with . 
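In practice the minItemsInBlock/maxItemsInBlock values described above are usually supplied through a postings format rather than by constructing the writer directly. A hedged sketch, assuming the two-argument Lucene41PostingsFormat constructor (as in Java Lucene) forwards these values to BlockTreeTermsWriter:

```csharp
using Lucene.Net.Codecs;
using Lucene.Net.Codecs.Lucene41;

// Ask BlockTree to target blocks of 32-64 entries instead of the default 25-48.
// In Java Lucene the writer enforces maxItemsInBlock >= 2 * (minItemsInBlock - 1).
PostingsFormat tunedPostings = new Lucene41PostingsFormat(32, 64);
```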
CodecHeader --> Magic,CodecName,Version Magic --> Uint32 (). this identifies the start of the header. It is always . CodecName --> String (). this is a string to identify this file. Version --> Uint32 (). Records the version of the file. Note that the length of a codec header depends only upon the name of the codec, so this length can be computed at any time with . Output stream String to identify this file. It should be simple ASCII, less than 128 characters in length. Version number If there is an I/O error writing to the underlying medium. Computes the length of a codec header. Codec name. Length of the entire codec header. Reads and validates a header previously written with . When reading a file, supply the expected and an expected version range ( to ). Input stream, positioned at the point where the header was previously written. Typically this is located at the beginning of the file. The expected codec name. The minimum supported expected version number. The maximum supported expected version number. The actual version found, when a valid header is found that matches , with an actual version where minVersion <= actual <= maxVersion. Otherwise an exception is thrown. If the first four bytes are not , or if the actual codec found is not . If the actual version is less than . If the actual version is greater than . If there is an I/O error reading from the underlying medium. Like except this version assumes the first has already been read and validated from the input. Writes a codec footer, which records both a checksum algorithm ID and a checksum. This footer can be parsed and validated with . CodecFooter --> Magic,AlgorithmID,Checksum Magic --> Uint32 (). this identifies the start of the footer. It is always . AlgorithmID --> Uint32 (). this indicates the checksum algorithm used. Currently this is always 0, for zlib-crc32. Checksum --> Uint32 (). The actual checksum value for all previous bytes in the stream, including the bytes from Magic and AlgorithmID. Output stream If there is an I/O error writing to the underlying medium. Computes the length of a codec footer. Length of the entire codec footer. Validates the codec footer previously written by . Actual checksum value. If the footer is invalid, if the checksum does not match, or if is not properly positioned before the footer at the end of the stream. Returns (but does not validate) the checksum previously written by . actual checksum value If the footer is invalid. Checks that the stream is positioned at the end, and throws exception if it is not. Clones the provided input, reads all bytes from the file, and calls Note that this method may be slow, as it must process the entire file. If you just need to extract the checksum value, call . A that is very similar to but compresses documents in chunks in order to improve the compression ratio. For a chunk size of chunkSize bytes, this does not support documents larger than (231 - chunkSize) bytes. In case this is a problem, you should use another format, such as . For optimal performance, you should use a that returns segments that have the biggest byte size first. @lucene.experimental Create a new with an empty segment suffix. Create a new . is the name of the format. This name will be used in the file formats to perform codec header checks (). is the segment suffix. this suffix is added to the result file name only if it's not the empty string. 
The parameter allows you to choose between compression algorithms that have various compression and decompression speeds so that you can pick the one that best fits your indexing and searching throughput. You should never instantiate two s that have the same name but different s. is the minimum byte size of a chunk of documents. A value of 1 can make sense if there is redundancy across fields. In that case, both performance and compression ratio should be better than with with compressed fields. Higher values of should improve the compression ratio but will require more memory at indexing time and might make document loading a little slower (depending on the size of your OS cache compared to the size of your index). The name of the . The to use. The minimum number of bytes of a single chunk of stored documents. Random-access reader for . @lucene.internal Efficient index format for block-based s. this writer generates a file which can be loaded into memory using memory-efficient data structures to quickly locate the block that contains any document. In order to have a compact in-memory representation, for every block of 1024 chunks, this index computes the average number of bytes per chunk and for every chunk, only stores the difference between
  • ${chunk number} * ${average length of a chunk}
  • and the actual start offset of the chunk
Data is written as follows:
  • PackedIntsVersion, <Block>BlockCount, BlocksEndMarker
  • PackedIntsVersion --> as a VInt ()
  • BlocksEndMarker --> 0 as a VInt () , this marks the end of blocks since blocks are not allowed to start with 0
  • Block --> BlockChunks, <DocBases>, <StartPointers>
  • BlockChunks --> a VInt () which is the number of chunks encoded in the block
  • DocBases --> DocBase, AvgChunkDocs, BitsPerDocBaseDelta, DocBaseDeltas
  • DocBase --> first document ID of the block of chunks, as a VInt ()
  • AvgChunkDocs --> average number of documents in a single chunk, as a VInt ()
  • BitsPerDocBaseDelta --> number of bits required to represent a delta from the average using ZigZag encoding
  • DocBaseDeltas --> packed () array of BlockChunks elements of BitsPerDocBaseDelta bits each, representing the deltas from the average doc base using ZigZag encoding.
  • StartPointers --> StartPointerBase, AvgChunkSize, BitsPerStartPointerDelta, StartPointerDeltas
  • StartPointerBase --> the first start pointer of the block, as a VLong ()
  • AvgChunkSize --> the average size of a chunk of compressed documents, as a VLong ()
  • BitsPerStartPointerDelta --> number of bits required to represent a delta from the average using ZigZag encoding
  • StartPointerDeltas --> packed () array of BlockChunks elements of BitsPerStartPointerDelta bits each, representing the deltas from the average start pointer using ZigZag encoding
  • Footer --> CodecFooter ()
Notes:
  • For any block, the doc base of the n-th chunk can be restored with DocBase + AvgChunkDocs * n + DocBaseDeltas[n].
  • For any block, the start pointer of the n-th chunk can be restored with StartPointerBase + AvgChunkSize * n + StartPointerDeltas[n].
  • Once data is loaded into memory, you can lookup the start pointer of any document by performing two binary searches: a first one based on the values of DocBase in order to find the right block, and then inside the block based on DocBaseDeltas (by reconstructing the doc bases for every chunk).
@lucene.internal
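The reconstruction formulas and the two-step lookup described above can be sketched as follows. This is illustrative only; the real reader keeps these values in packed, memory-efficient structures, and the parameter names here are invented:

```csharp
// Reconstruct the doc base and start pointer of the n-th chunk in a block,
// exactly as the two formulas above describe (a sketch, not the Lucene.Net code).
static int DocBase(int docBase, int avgChunkDocs, int[] docBaseDeltas, int n)
    => docBase + avgChunkDocs * n + docBaseDeltas[n];

static long StartPointer(long startPointerBase, long avgChunkSize, long[] startPointerDeltas, int n)
    => startPointerBase + avgChunkSize * n + startPointerDeltas[n];

// Looking up a document then amounts to a binary search over the blocks' DocBase
// values, followed by a binary search over the reconstructed doc bases inside the block.
```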
impl for . @lucene.experimental Sole constructor. If this FieldsReader is disposed. Dispose the underlying s. Return the decompressed size of the chunk. Go to the chunk containing the provided ID. Decompress the chunk. Copy compressed data. Check integrity of the data. The iterator is not usable after this method has been called. impl for . @lucene.experimental NOTE: This was NUMERIC_INT in Lucene NOTE: This was NUMERIC_FLOAT in Lucene NOTE: This was NUMERIC_LONG in Lucene Sole constructor. NOTE: This was saveInts() in Lucene. A that compresses chunks of documents together in order to improve the compression ratio. @lucene.experimental Create a new . is the name of the format. This name will be used in the file formats to perform codec header checks (). The parameter allows you to choose between compression algorithms that have various compression and decompression speeds so that you can pick the one that best fits your indexing and searching throughput. You should never instantiate two s that have the same name but different s. is the minimum byte size of a chunk of documents. Higher values of should improve the compression ratio but will require more memory at indexing time and might make document loading a little slower (depending on the size of your OS cache compared to the size of your index). The name of the . A suffix to append to files created by this format. The to use. The minimum number of bytes of a single chunk of stored documents. for . @lucene.experimental Sole constructor. NOTE: This was getPackedIntsVersion() in Lucene. If this is disposed. for . @lucene.experimental A pending doc. A pending field. Sole constructor. Returns a sorted array containing unique field numbers. A compression mode. Tells how much effort should be spent on compression and decompression of stored fields. @lucene.experimental A compression mode that trades compression ratio for speed. Although the compression ratio might remain high, compression and decompression are very fast. Use this mode with indices that have a high update rate but should be able to load documents from disk quickly. A compression mode that trades speed for compression ratio. Although compression and decompression might be slow, this compression mode should provide a good compression ratio. This mode might be interesting if/when your index size is much bigger than your OS cache. This compression mode is similar to but it spends more time compressing in order to improve the compression ratio. This compression mode is best used with indices that have a low update rate but should be able to load documents from disk quickly. Sole constructor. Create a new instance. Create a new instance. A data compressor. Sole constructor, typically called from sub-classes. Compress bytes into . It is the responsibility of the compressor to add all necessary information so that a will know when to stop decompressing bytes from the stream. A decompressor. Sole constructor, typically called from sub-classes. Decompress bytes that were stored between offsets and offset+length in the original stream from the compressed stream to . After returning, the length of (bytes.Length) must be equal to . Implementations of this method are free to resize depending on their needs. The input that stores the compressed stream. The length of the original data (before compression). Bytes before this offset do not need to be decompressed. Bytes after offset+length do not need to be decompressed. A where to store the decompressed data.
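A short sketch of the speed/ratio trade-off described above, assuming the constructor shape documented here; the format name and chunk sizes are arbitrary values chosen for the example:

```csharp
using Lucene.Net.Codecs.Compressing;

// Favor indexing and search speed: fast compression, smaller chunks.
var fastFields = new CompressingStoredFieldsFormat(
    "MyStoredFields", CompressionMode.FAST, 1 << 14);

// Favor index size: slower, higher-ratio compression, larger chunks
// (more memory at indexing time, possibly slower document loading).
var compactFields = new CompressingStoredFieldsFormat(
    "MyStoredFields", CompressionMode.HIGH_COMPRESSION, 1 << 16);
```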
LZ4 compression and decompression routines. http://code.google.com/p/lz4/ http://fastcompression.blogspot.fr/p/lz4.html NOTE: This was readInt() in Lucene. NOTE: This was readIntEquals() in Lucene. Decompress at least bytes into dest[dOff]. Please note that must be large enough to be able to hold all decompressed data (meaning that you need to know the total decompressed length). Compress bytes[off:off+len] into using at most 16KB of memory. shouldn't be shared across threads but can safely be reused. Compress bytes[off:off+len] into . Compared to , this method is slower and uses more memory (~ 256KB per thread) but should provide better compression ratios (especially on large inputs) because it chooses the best match among up to 256 candidates and then performs trade-offs to fix overlapping matches. shouldn't be shared across threads but can safely be reused. Abstract API that consumes numeric, binary and sorted docvalues. Concrete implementations of this actually do "something" with the docvalues (write it into the index in a specific format). The lifecycle is: DocValuesConsumer is created by or . , , or are called for each Numeric, Binary, or Sorted docvalues field. The API is a "pull" rather than "push", and the implementation is free to iterate over the values multiple times (). After all fields are added, the consumer is disposed. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Writes numeric docvalues for a field. Field information. of numeric values (one for each document). null indicates a missing value. If an I/O error occurred. Writes binary docvalues for a field. Field information. of binary values (one for each document). null indicates a missing value. If an I/O error occurred. Writes pre-sorted binary docvalues for a field. Field information. of binary values in sorted order (deduplicated). of ordinals (one for each document). -1 indicates a missing value. If an I/O error occurred. Writes pre-sorted set docvalues for a field. Field information. of binary values in sorted order (deduplicated). of the number of values for each document. A zero ordinal count indicates a missing value. of ordinal occurrences (*maxDoc total). If an I/O error occurred. Merges the numeric docvalues from . The default implementation calls , passing an that merges and filters deleted documents on the fly. Merges the binary docvalues from . The default implementation calls , passing an that merges and filters deleted documents on the fly. Merges the sorted docvalues from . The default implementation calls , passing an that merges ordinals and values and filters deleted documents. Merges the sortedset docvalues from . The default implementation calls , passing an that merges ordinals and values and filters deleted documents. Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Encodes/decodes per-document values. Note, when extending this class, the name () may be written into the index in certain configurations. In order for the segment to be read, the name must resolve to your implementation via . This method uses to resolve format names. To implement your own format: Subclass this class. Subclass , override the method, and add the line base.ScanForDocValuesFormats(typeof(YourDocValuesFormat).Assembly). If you have any format classes in your assembly that are not meant for reading, you can add the to them so they are ignored by the scan.
Set the new by calling at application startup. If your format has dependencies, you may also override to inject them via pure DI or a DI container. See DI-Friendly Framework to understand the approach used. DocValuesFormat Names Unlike the Java version, format names are by default convention-based on the class name. If you name your custom format class "MyCustomDocValuesFormat", the format name will be the same name without the "DocValuesFormat" suffix: "MyCustom". You can override this default behavior by using the to name the format differently than this convention. Format names must be all ASCII alphanumeric, and less than 128 characters in length. @lucene.experimental Unique name that's used to retrieve this format when reading the index. Sets the instance used to instantiate subclasses. The new . The parameter is null. Gets the associated factory. The factory. Creates a new docvalues format. The provided name will be written into the index segment in some configurations (such as when using ): in such configurations, for the segment to be read this class should be registered by subclassing and calling in the class constructor. The new can be registered by calling at application startup. Returns a to write docvalues to the index. Returns a to read docvalues from the index. NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an should be thrown by the implementation. s are expected and will automatically cause a retry of the segment opening logic with the newly revised segments. Unique name that's used to retrieve this format when reading the index. Looks up a format by name. Returns a list of all available format names. Abstract API that produces numeric, binary and sorted docvalues. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns for this field. The returned instance need not be thread-safe: it will only be used by a single thread. Returns for this field. The returned instance need not be thread-safe: it will only be used by a single thread. Returns for this field. The returned instance need not be thread-safe: it will only be used by a single thread. Returns for this field. The returned instance need not be thread-safe: it will only be used by a single thread. Returns a at the size of reader.MaxDoc, with bits turned on for each docid that does have a value for this field. The returned instance need not be thread-safe: it will only be used by a single thread. Returns approximate RAM bytes used. Checks consistency of this producer. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Encodes/decodes . @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns a to read field infos from the index. Returns a to write field infos to the index. Codec API for reading . @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Read the previously written with . Codec API for writing . @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Writes the provided to the directory.
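To make the naming convention described above concrete, here is a hedged sketch. It assumes the convention-based, parameterless construction described earlier; the consumer/producer bodies are omitted, and the factory names in the comment are assumptions based on the registration steps described above:

```csharp
using System;
using Lucene.Net.Codecs;
using Lucene.Net.Index;

// Would be registered under the name "MyCustom"
// (the class name minus the "DocValuesFormat" suffix).
public class MyCustomDocValuesFormat : DocValuesFormat
{
    public override DocValuesConsumer FieldsConsumer(SegmentWriteState state)
        => throw new NotImplementedException(); // write-side encoding would go here

    public override DocValuesProducer FieldsProducer(SegmentReadState state)
        => throw new NotImplementedException(); // read-side decoding would go here
}

// At application startup the format must also be discoverable, e.g. (assumed names,
// per the description above): a factory subclass that scans this assembly for
// formats, installed via the SetDocValuesFormatFactory call mentioned earlier.
```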
Abstract API that consumes terms, doc, freq, prox, offset and payloads postings. Concrete implementations of this actually do "something" with the postings (write it into the index in a specific format). The lifecycle is: FieldsConsumer is created by . For each field, is called, returning a for the field. After all fields are added, the consumer is d. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Add a new field. Called when we are done adding everything. Implementations must override and should dispose all resources used by this instance. Called during merging to merge all from sub-readers. this must recurse to merge all postings (terms, docs, positions, etc.). A can override this default implementation to do its own merging. Abstract API that produces terms, doc, freq, prox, offset and payloads postings. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Returns approximate RAM bytes used. Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal A codec that forwards all its method calls to another codec. Extend this class when you need to reuse the functionality of an existing codec. For example, if you want to build a codec that redefines Lucene46's : public sealed class CustomCodec : FilterCodec { public CustomCodec() : base("CustomCodec", new Lucene46Codec()) { } public override LiveDocsFormat LiveDocsFormat { get { return new CustomLiveDocsFormat(); } } } Please note: Don't call from the no-arg constructor of your own codec. When the loads your own , the has not yet fully initialized! If you want to extend another , instantiate it directly by calling its constructor. @lucene.experimental The codec to filter. Sole constructor. When subclassing this codec, create a no-arg ctor and pass the delegate codec and a unique name to this ctor. Format for live/deleted documents. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Creates a new MutableBits, with all bits set, for the specified size. Creates a new mutablebits of the same bits set and size of existing. Read live docs bits. Persist live docs bits. Use to determine the generation of the deletes file you should write to. Records all files in use by this into the files argument. Supports the Lucene 3.x index format (readonly) Extension of compound file for doc store files. Returns file names for shared doc stores, if any, else null. Lucene3x ReadOnly implementation. @lucene.experimental @lucene.experimental Extension of field infos. Exposes flex API on a pre-flex index, as a codec. @lucene.experimental Lucene3x ReadOnly implementation. @lucene.experimental Reads Lucene 3.x norms format and exposes it via API. @lucene.experimental Norms header placeholder. Extension of norms file. Extension of separate norms file. Codec that reads the pre-flex-indexing postings format. It does not provide a writer because newly written segments should use the configured on . @lucene.experimental Extension of terms file. Extension of terms index file. Extension of freq postings file. Extension of prox postings file Lucene3x ReadOnly implementation. 
@lucene.experimental This format adds optional per-segment String diagnostics storage, and switches userData to Map. Each segment records whether it has term vectors. Each segment records the Lucene version that created it. Extension used for saving each SegmentInfo, once a 3.x index is first committed to with 4.0. If this segment shares stored fields & vectors, this offset is where in that file this segment's docs begin. Name used to derive fields/vectors file we share with other segments. Whether doc store files are stored in compound file (*.cfx). Lucene 3x implementation of . @lucene.experimental Reads from legacy 3.x segments_N. Returns the freq pointer of the doc to which the last call of has skipped. Returns the prox pointer of the doc to which the last call of has skipped. Returns the payload length of the payload stored just before the doc to which the last call of has skipped. Class responsible for access to stored document fields. It uses <segment>.fdt and <segment>.fdx; files. Extension of stored fields file. Extension of stored fields index file. Returns a cloned FieldsReader that shares open IndexInputs with the original one. It is the caller's job not to close the original FieldsReader until all clones are called (eg, currently SegmentReader manages this logic). Verifies that the code version which wrote the segment is supported. If this FieldsReader is disposed. Closes the underlying streams. This means that the Fields values will not be accessible. If there is a low-level I/O error. Lucene3x ReadOnly implementation @lucene.experimental Extension of vectors fields file. Extension of vectors documents file. Extension of vectors index file. The number of documents in the reader. NOTE: This was size() in Lucene. @lucene.experimental Optimized implementation. Overridden by to skip in prox stream. Optimized implementation. @lucene.experimental Increments the enumeration to the next element. True if one exists. Returns the current Term in the enumeration. Initially invalid, valid after called for the first time. Returns the previous Term enumerated. Initially null. Returns the current in the enumeration. Initially invalid, valid after called for the first time. Sets the argument to the current in the enumeration. Initially invalid, valid after called for the first time. Returns the docFreq from the current in the enumeration. Initially invalid, valid after called for the first time. Returns the freqPointer from the current in the enumeration. Initially invalid, valid after called for the first time. Returns the proxPointer from the current in the enumeration. Initially invalid, valid after called for the first time. Closes the enumeration to further activity, freeing resources. @lucene.experimental Called by base.SkipTo(). @lucene.experimental A is the record of information stored for a term. The number of documents which contain the term. This stores a monotonically increasing set of Term, TermInfo pairs in a Directory. Pairs are accessed either by or by ordinal position the set. @lucene.experimental Per-thread resources managed by ThreadLocal. Returns the number of term/value pairs in the set. NOTE: This was size() in Lucene. Returns the for a in the set, or null. Returns the for a in the set, or null. Returns the position of a in the set or -1. Returns an enumeration of all the s and s in the set. Returns an enumeration of terms starting at or after the named term. This stores a monotonically increasing set of Term, TermInfo pairs in an index segment. 
Pairs are accessed either by or by ordinal position the set. The and are actually serialized and stored into a byte array and pointers to the position of each are stored in a array. Loads the segment information at segment load time. The term enum. The index divisor. The size of the tii file, used to approximate the size of the buffer. The total index interval. Binary search for the given term. The term to locate. If there is a low-level I/O error. Gets the term at the given position. For testing. The position to read the term from the index. The term. If there is a low-level I/O error. Returns the number of terms. int. The compares the given term against the term in the index specified by the term index. ie It returns negative N when term is less than index term; The given term. The index of the of term to compare. int. If there is a low-level I/O error. Compare the fields of the terms first, and if not equals return from compare. If equal compare terms. The term to compare. The position of the term in the input to compare The input buffer. int. If there is a low-level I/O error. Compares the fields before checking the text of the terms. The given term. The term that exists in the data block. The data block. int. If there is a low-level I/O error. Optimized implementation of a vector of bits. This is more-or-less like java.util.BitSet, but also includes the following: a count() method, which efficiently computes the number of one bits; optimized read from and write to disk; inlinable get() method; store and load, as bit set or d-gaps, depending on sparseness; @lucene.internal Constructs a vector capable of holding bits. Sets the value of to one. Sets the value of to true, and returns true if bit was already set. Sets the value of to zero. Returns true if is one and false if it is zero. Returns the number of bits in this vector. This is also one greater than the number of the largest valid bit number. This is the equivalent of either size() or length() in Lucene. Returns the total number of one bits in this vector. This is efficiently computed and cached, so that, if the vector is not changed, no recomputation is done for repeated calls. For testing Writes this vector to the file in Directory , in a format that can be read by the constructor . Invert all bits. Set all bits. Write as a bit set. Write as a d-gaps list. Indicates if the bit vector is sparse and should be saved as a d-gaps list, or dense, and should be saved as a bit set. Constructs a bit vector from the file in Directory , as written by the method. Read as a bit set. Read as a d-gaps list. Read as a d-gaps cleared bits list. Implements the Lucene 4.0 index format, with configurable per-field postings formats. If you want to reuse functionality of this codec in another codec, extend . See package documentation for file format details. Sole constructor. Returns the postings format that should be used for writing new segments of . The default implementation always returns "Lucene40". Lucene 4.0 DocValues format. Files: .dv.cfs: compound container () .dv.cfe: compound entries () Entries within the compound file: <segment>_<fieldNumber>.dat: data values <segment>_<fieldNumber>.idx: index into the .dat for DEREF types There are several many types of with different encodings. From the perspective of filenames, all types store their values in .dat entries within the compound file. 
In the case of dereferenced/sorted types, the .dat actually contains only the unique values, and an additional .idx file contains pointers to these unique values. Formats: .dat --> Header, PackedType, MinValue, DefaultValue, PackedStream .dat --> Header, ValueSize, Byte () maxdoc .dat --> Header, ValueSize, Short () maxdoc .dat --> Header, ValueSize, Int32 () maxdoc .dat --> Header, ValueSize, Int64 () maxdoc .dat --> Header, ValueSize, Float32maxdoc .dat --> Header, ValueSize, Float64maxdoc .dat --> Header, ValueSize, (Byte () * ValueSize)maxdoc .idx --> Header, TotalBytes, Addresses .dat --> Header, (Byte () * variable ValueSize)maxdoc .idx --> Header, NumValues, Addresses .dat --> Header, ValueSize, (Byte () * ValueSize)NumValues .idx --> Header, TotalVarBytes, Addresses .dat --> Header, (LengthPrefix + Byte () * variable ValueSize)NumValues .idx --> Header, NumValues, Ordinals .dat --> Header, ValueSize, (Byte () * ValueSize)NumValues .idx --> Header, TotalVarBytes, Addresses, Ordinals .dat --> Header, (Byte () * variable ValueSize)NumValues Data Types: Header --> CodecHeader () PackedType --> Byte () MaxAddress, MinValue, DefaultValue --> Int64 () PackedStream, Addresses, Ordinals --> ValueSize, NumValues --> Int32 () Float32 --> 32-bit float encoded with then written as Int32 () Float64 --> 64-bit float encoded with then written as Int64 () TotalBytes --> VLong () TotalVarBytes --> Int64 () LengthPrefix --> Length of the data value as VInt () (maximum of 2 bytes) Notes: PackedType is 0 when compressed, 1 when the stream is written as 64-bit integers. Addresses stores pointers to the actual byte location (indexed by docid). In the VAR_STRAIGHT case, each entry can have a different length, so to determine the length, docid+1 is retrieved. A sentinel address is written at the end for the VAR_STRAIGHT case, so the Addresses stream contains maxdoc+1 indices. For the deduplicated VAR_DEREF case, each length is encoded as a prefix to the data itself as a VInt () (maximum of 2 bytes). Ordinals stores the term ID in sorted order (indexed by docid). In the FIXED_SORTED case, the address into the .dat can be computed from the ordinal as Header+ValueSize+(ordinal*ValueSize) because the byte length is fixed. In the VAR_SORTED case, there is double indirection (docid -> ordinal -> address), but an additional sentinel ordinal+address is always written (so there are NumValues+1 ordinals). To determine the length, ord+1's address is looked up as well. In contrast to other straight variants, uses a .idx file to improve lookup performance. In contrast to it doesn't apply deduplication of the document values. Limitations: Binary doc values can be at most in length. Maximum length for each binary doc values field. Sole constructor. Reads the 4.0 format of norms/docvalues. @lucene.experimental NOTE: This was loadVarIntsField() in Lucene. NOTE: This was loadShortField() in Lucene. NOTE: This was loadIntField() in Lucene. NOTE: This was loadLongField() in Lucene. NOTE: This was loadFloatField() in Lucene. Lucene 4.0 Field Infos format. Field names are stored in the field info file, with suffix .fnm. FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, FieldBits,DocValuesBits,Attributes> FieldsCount Data types: Header --> CodecHeader () FieldsCount --> VInt () FieldName --> String () FieldBits, DocValuesBits --> Byte () FieldNumber --> VInt () Attributes --> IDictionary<String,String> () Field Descriptions: FieldsCount: the number of fields in this file.
FieldName: name of the field as a UTF-8 String. FieldNumber: the field's number. Note that unlike previous versions of Lucene, the fields are not numbered implicitly by their order in the file, instead explicitly. FieldBits: a byte containing field options. The low-order bit is one for indexed fields, and zero for non-indexed fields. The second lowest-order bit is one for fields that have term vectors stored, and zero for fields without term vectors. If the third lowest order-bit is set (0x4), offsets are stored into the postings list in addition to positions. Fourth bit is unused. If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field. If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field. If the seventh lowest-order bit is set (0x40), term frequencies and positions omitted for the indexed field. If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field. DocValuesBits: a byte containing per-document value types. The type recorded as two four-bit integers, with the high-order bits representing norms options, and the low-order bits representing options. Each four-bit integer can be decoded as such: 0: no DocValues for this field. 1: variable-width signed integers. () 2: 32-bit floating point values. () 3: 64-bit floating point values. () 4: fixed-length byte array values. () 5: fixed-length dereferenced byte array values. () 6: variable-length byte array values. () 7: variable-length dereferenced byte array values. () 8: 16-bit signed integers. () 9: 32-bit signed integers. () 10: 64-bit signed integers. () 11: 8-bit signed integers. () 12: fixed-length sorted byte array values. () 13: variable-length sorted byte array values. () Attributes: a key-value map of codec-private attributes. @lucene.experimental Sole constructor. Extension of field infos Lucene 4.0 FieldInfos reader. @lucene.experimental Sole constructor. Lucene 4.0 Live Documents Format. The .del file is optional, and only exists when a segment contains deletions. Although per-segment, this file is maintained exterior to compound segment files. Deletions (.del) --> Format,Header,ByteCount,BitCount, Bits | DGaps (depending on Format) Format,ByteSize,BitCount --> Uint32 () Bits --> < Byte () > ByteCount DGaps --> <DGap,NonOnesByte> NonzeroBytesCount DGap --> VInt () NonOnesByte --> Byte() Header --> CodecHeader () Format is 1: indicates cleared DGaps. ByteCount indicates the number of bytes in Bits. It is typically (SegSize/8)+1. BitCount indicates the number of bits that are currently set in Bits. Bits contains one bit for each document indexed. When the bit corresponding to a document number is cleared, that document is marked as deleted. Bit ordering is from least to most significant. Thus, if Bits contains two bytes, 0x00 and 0x02, then document 9 is marked as alive (not deleted). DGaps represents sparse bit-vectors more efficiently than Bits. It is made of DGaps on indexes of nonOnes bytes in Bits, and the nonOnes bytes themselves. The number of nonOnes bytes in Bits (NonOnesBytesCount) is not stored. For example, if there are 8000 bits and only bits 10,12,32 are cleared, DGaps would be used: (VInt) 1 , (byte) 20 , (VInt) 3 , (Byte) 1 Extension of deletes Sole constructor. Lucene 4.0 Norms Format. Files: .nrm.cfs: compound container () .nrm.cfe: compound entries () Norms are implemented as DocValues, so other than file extension, norms are written exactly the same way as . @lucene.experimental Sole constructor. Provides a and . 
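The live-docs bit layout above can be checked with ordinary bit arithmetic. A small sketch (a hypothetical helper, not the Lucene.Net implementation) that reproduces the 0x00, 0x02 example, in which only document 9 is alive:

```csharp
// Bits are stored least-significant first: document n lives in byte n/8, bit n%8.
// A set bit means the document is live; a cleared bit means it was deleted.
static bool IsLive(byte[] bits, int docId)
    => (bits[docId >> 3] & (1 << (docId & 7))) != 0;

// Example from the format description above: bytes { 0x00, 0x02 } =>
// IsLive(bits, 9) is true, and every other document in the range is deleted.
```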
Sole constructor. Lucene 4.0 Postings format. Files: .tim: Term Dictionary .tip: Term Index .frq: Frequencies .prx: Positions

    Term Dictionary

    The .tim file contains the list of terms in each field along with per-term statistics (such as docfreq) and pointers to the frequencies, positions and skip data in the .frq and .prx files. See for more details on the format. NOTE: The term dictionary can plug into different postings implementations: the postings writer/reader are actually responsible for encoding and decoding the Postings Metadata and Term Metadata sections described here: Postings Metadata --> Header, SkipInterval, MaxSkipLevels, SkipMinimum Term Metadata --> FreqDelta, SkipDelta?, ProxDelta? Header --> CodecHeader () SkipInterval,MaxSkipLevels,SkipMinimum --> Uint32 () SkipDelta,FreqDelta,ProxDelta --> VLong () Notes: Header is a CodecHeader () storing the version information for the postings. SkipInterval is the fraction of TermDocs stored in skip tables. It is used to accelerate . Larger values result in smaller indexes, greater acceleration, but fewer accelerable cases, while smaller values result in bigger indexes, less acceleration (in case of a small value for MaxSkipLevels) and more accelerable cases. MaxSkipLevels is the max. number of skip levels stored for each term in the .frq file. A low value results in smaller indexes but less acceleration, a larger value results in slightly larger indexes but greater acceleration. See format of .frq file for more information about skip levels. SkipMinimum is the minimum document frequency a term must have in order to write any skip data at all. FreqDelta determines the position of this term's TermFreqs within the .frq file. In particular, it is the difference between the position of this term's data in that file and the position of the previous term's data (or zero, for the first term in the block). ProxDelta determines the position of this term's TermPositions within the .prx file. In particular, it is the difference between the position of this term's data in that file and the position of the previous term's data (or zero, for the first term in the block. For fields that omit position data, this will be 0 since prox information is not stored. SkipDelta determines the position of this term's SkipData within the .frq file. In particular, it is the number of bytes after TermFreqs that the SkipData starts. In other words, it is the length of the TermFreq data. SkipDelta is only stored if DocFreq is not smaller than SkipMinimum.
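Since FreqDelta, ProxDelta and SkipDelta are all stored as differences, a reader recovers absolute file positions by accumulating them while scanning a block. Roughly, as a sketch only (the per-term metadata object and base pointers here are invented names):

```csharp
long freqPointer = freqBasePointer;   // where this block's data begins (hypothetical)
long proxPointer = proxBasePointer;

foreach (var term in termsInBlock)    // hypothetical per-term metadata enumeration
{
    freqPointer += term.FreqDelta;    // absolute position of this term's TermFreqs in .frq
    proxPointer += term.ProxDelta;    // absolute position in .prx (delta is 0 when positions are omitted)

    // SkipData, when present (DocFreq >= SkipMinimum), starts SkipDelta bytes after TermFreqs.
    long skipDataStart = freqPointer + term.SkipDelta;
}
```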

    Term Index

    The .tip file contains an index into the term dictionary, so that it can be accessed randomly. See for more details on the format.

    Frequencies

    The .frq file contains the lists of documents which contain each term, along with the frequency of the term in that document (except when frequencies are omitted: ). FreqFile (.frq) --> Header, <TermFreqs, SkipData?> TermCount Header --> CodecHeader () TermFreqs --> <TermFreq> DocFreq TermFreq --> DocDelta[, Freq?] SkipData --> <<SkipLevelLength, SkipLevel> NumSkipLevels-1, SkipLevel> <SkipDatum> SkipLevel --> <SkipDatum> DocFreq/(SkipInterval^(Level + 1)) SkipDatum --> DocSkip,PayloadLength?,OffsetLength?,FreqSkip,ProxSkip,SkipChildLevelPointer? DocDelta,Freq,DocSkip,PayloadLength,OffsetLength,FreqSkip,ProxSkip --> VInt () SkipChildLevelPointer --> VLong () TermFreqs are ordered by term (the term is implicit, from the term dictionary). TermFreq entries are ordered by increasing document number. DocDelta: if frequencies are indexed, this determines both the document number and the frequency. In particular, DocDelta/2 is the difference between this document number and the previous document number (or zero when this is the first document in a TermFreqs). When DocDelta is odd, the frequency is one. When DocDelta is even, the frequency is read as another VInt. If frequencies are omitted, DocDelta contains the gap (not multiplied by 2) between document numbers and no frequency information is stored. For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven, with frequencies indexed, would be the following sequence of VInts: 15, 8, 3 If frequencies were omitted () it would be this sequence of VInts instead: 7,4 DocSkip records the document number before every SkipInterval th document in TermFreqs. If payloads and offsets are disabled for the term's field, then DocSkip represents the difference from the previous value in the sequence. If payloads and/or offsets are enabled for the term's field, then DocSkip/2 represents the difference from the previous value in the sequence. In this case when DocSkip is odd, then PayloadLength and/or OffsetLength are stored indicating the length of the last payload/offset before the SkipIntervalth document in TermPositions. PayloadLength indicates the length of the last payload. OffsetLength indicates the length of the last offset (endOffset-startOffset). FreqSkip and ProxSkip record the position of every SkipInterval th entry in FreqFile and ProxFile, respectively. File positions are relative to the start of TermFreqs and Positions, to the previous SkipDatum in the sequence. For example, if DocFreq=35 and SkipInterval=16, then there are two SkipData entries, containing the 15 th and 31 st document numbers in TermFreqs. The first FreqSkip names the number of bytes after the beginning of TermFreqs that the 16 th SkipDatum starts, and the second the number of bytes after that that the 32 nd starts. The first ProxSkip names the number of bytes after the beginning of Positions that the 16 th SkipDatum starts, and the second the number of bytes after that that the 32 nd starts. Each term can have multiple skip levels. The amount of skip levels for a term is NumSkipLevels = Min(MaxSkipLevels, floor(log(DocFreq/log(SkipInterval)))). The number of SkipData entries for a skip level is DocFreq/(SkipInterval^(Level + 1)), whereas the lowest skip level is Level=0. Example: SkipInterval = 4, MaxSkipLevels = 2, DocFreq = 35. Then skip level 0 has 8 SkipData entries, containing the 3rd, 7th, 11th, 15th, 19th, 23rd, 27th, and 31st document numbers in TermFreqs. 
Skip level 1 has 2 SkipData entries, containing the 15th and 31st document numbers in TermFreqs. The SkipData entries on all upper levels > 0 contain a SkipChildLevelPointer referencing the corresponding SkipData entry in level-1. In the example, entry 15 on level 1 has a pointer to entry 15 on level 0, and entry 31 on level 1 has a pointer to entry 31 on level 0.
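The DocDelta trick described above (document gap and frequency folded into a single VInt when frequencies are indexed) can be sketched as follows; the example from the text (a term occurring once in document seven and three times in document eleven, giving the VInts 15, 8, 3) falls out of it. This is a sketch of the encoding rule, not the actual writer:

```csharp
using Lucene.Net.Store;

// Sketch: DocDelta/Freq encoding with frequencies indexed.
static void WriteDocAndFreq(DataOutput frq, int docDelta, int freq)
{
    if (freq == 1)
    {
        frq.WriteVInt32((docDelta << 1) | 1);  // odd value: frequency is implicitly 1
    }
    else
    {
        frq.WriteVInt32(docDelta << 1);        // even value: frequency follows as another VInt
        frq.WriteVInt32(freq);
    }
}

// Doc 7, freq 1:  gap 7  -> (7 << 1) | 1 = 15
// Doc 11, freq 3: gap 4  ->  4 << 1     = 8, then 3
```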

    Positions

    The .prx file contains the lists of positions that each term occurs at within documents. Note that fields omitting positional data do not store anything into this file, and if all fields in the index omit positional data then the .prx file will not exist. ProxFile (.prx) --> Header, <TermPositions> TermCount Header --> CodecHeader () TermPositions --> <Positions> DocFreq Positions --> <PositionDelta,PayloadLength?,OffsetDelta?,OffsetLength?,PayloadData?> Freq PositionDelta,OffsetDelta,OffsetLength,PayloadLength --> VInt () PayloadData --> byte () PayloadLength TermPositions are ordered by term (the term is implicit, from the term dictionary). Positions entries are ordered by increasing document number (the document number is implicit from the .frq file). PositionDelta is, if payloads are disabled for the term's field, the difference between the position of the current occurrence in the document and the previous occurrence (or zero, if this is the first occurrence in this document). If payloads are enabled for the term's field, then PositionDelta/2 is the difference between the current and the previous position. If payloads are enabled and PositionDelta is odd, then PayloadLength is stored, indicating the length of the payload at the current term position. For example, the TermPositions for a term which occurs as the fourth term in one document, and as the fifth and ninth term in a subsequent document, would be the following sequence of VInts (payloads disabled): 4, 5, 4 PayloadData is metadata associated with the current term position. If PayloadLength is stored at the current position, then it indicates the length of this payload. If PayloadLength is not stored, then this payload has the same length as the payload at the previous position. OffsetDelta/2 is the difference between this position's startOffset from the previous occurrence (or zero, if this is the first occurrence in this document). If OffsetDelta is odd, then the length (endOffset-startOffset) differs from the previous occurrence and an OffsetLength follows. Offset data is only written for .
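A matching sketch for the position side, with payloads disabled, so each position within a document is written as a plain gap; the text's example (the fourth term in one document, then the fifth and ninth term in the next) yields the VInts 4, 5, 4. The helper below is illustrative only:

```csharp
using Lucene.Net.Store;

// Sketch: positions for a single document, payloads disabled.
static void WritePositions(DataOutput prx, int[] positions)
{
    int previous = 0;
    foreach (int position in positions)
    {
        prx.WriteVInt32(position - previous); // PositionDelta
        previous = position;
    }
}

// First document:  { 4 }    -> 4
// Second document: { 5, 9 } -> 5, 4
```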
    Minimum items (terms or sub-blocks) per block for BlockTree. Maximum items (terms or sub-blocks) per block for BlockTree. Creates with default settings. Creates with custom values for and passed to block terms dictionary. Extension of freq postings file. Extension of prox postings file. Concrete class that reads the 4.0 frq/prox postings format. Sole constructor. Returns the payload at this position, or null if no payload was indexed. Returns the payload at this position, or null if no payload was indexed. Lucene 4.0 Segment info format. Files: .si: Header, SegVersion, SegSize, IsCompoundFile, Diagnostics, Attributes, Files Data types: Header --> CodecHeader () SegSize --> Int32 () SegVersion --> String () Files --> ISet<String> () Diagnostics, Attributes --> IDictionary<String,String> () IsCompoundFile --> Int8 () Field Descriptions: SegVersion is the code version that created the segment. SegSize is the number of documents contained in the segment index. IsCompoundFile records whether the segment is written as a compound file or not. If this is -1, the segment is not a compound file. If it is 1, the segment is a compound file. Checksum contains the CRC32 checksum of all bytes in the segments_N file up until the checksum. This is used to verify integrity of the file on opening the index. The Diagnostics Map is privately written by , as a debugging aid, for each segment it creates. It includes metadata like the current Lucene version, OS, .NET/Java version, why the segment was created (merge, flush, addIndexes), etc. Attributes: a key-value map of codec-private attributes. Files is a list of files referred to by this segment. @lucene.experimental Sole constructor. File extension used to store . Lucene 4.0 implementation of . @lucene.experimental Sole constructor. Lucene 4.0 implementation of . @lucene.experimental Sole constructor. Save a single segment's info. Implements the skip list reader for the 4.0 posting list format that stores positions and payloads. Sole constructor. Per-term initialization. Returns the freq pointer of the doc to which the last call of has skipped. Returns the prox pointer of the doc to which the last call of has skipped. Returns the payload length of the payload stored just before the doc to which the last call of has skipped. Returns the offset length (endOffset-startOffset) of the position stored just before the doc to which the last call of has skipped. Lucene 4.0 Stored Fields Format. Stored fields are represented by two files: The field index, or .fdx file. This is used to find the location within the field data file of the fields of a particular document. Because it contains fixed-length data, this file may be easily randomly accessed. The position of document n 's field data is the Uint64 () at n*8 in this file. This contains, for each document, a pointer to its field data, as follows: FieldIndex (.fdx) --> <Header>, <FieldValuesPosition> SegSize Header --> CodecHeader () FieldValuesPosition --> Uint64 () The field data, or .fdt file. This contains the stored fields of each document, as follows: FieldData (.fdt) --> <Header>, <DocFieldData> SegSize Header --> CodecHeader () DocFieldData --> FieldCount, <FieldNum, Bits, Value> FieldCount FieldCount --> VInt () FieldNum --> VInt () Bits --> Byte () low order bit reserved. second bit is one for fields containing binary data third bit reserved. 
4th to 6th bit (mask: 0x7<<3) define the type of a numeric field: all bits in mask are cleared if no numeric field at all 1<<3: Value is Int 2<<3: Value is Long 3<<3: Value is Int as Float (as of 4<<3: Value is Long as Double (as of Value --> String | BinaryValue | Int | Long (depending on Bits) BinaryValue --> ValueSize, < Byte () >^ValueSize ValueSize --> VInt () @lucene.experimental Sole constructor. Class responsible for access to stored document fields. It uses <segment>.fdt and <segment>.fdx; files. @lucene.internal Returns a cloned FieldsReader that shares open s with the original one. It is the caller's job not to dispose the original FieldsReader until all clones are called (eg, currently manages this logic). Used only by clone. Sole constructor. if this FieldsReader is disposed. Closes the underlying streams. This means that the values will not be accessible. If an I/O error occurs. Returns number of documents. NOTE: This was size() in Lucene. Returns the length in bytes of each raw document in a contiguous range of length starting with . Returns the (the fieldStream), already seeked to the starting point for . Class responsible for writing stored document fields. It uses <segment>.fdt and <segment>.fdx; files. @lucene.experimental Extension of stored fields file. Extension of stored fields index file. Sole constructor. Bulk write a contiguous series of documents. The array is the length (in bytes) of each raw document. The is the fieldsStream from which we should bulk-copy all bytes. Maximum number of contiguous documents to bulk-copy when merging stored fields. Lucene 4.0 Term Vectors format. Term Vector support is an optional on a field by field basis. It consists of 3 files. The Document Index or .tvx file. For each document, this stores the offset into the document data (.tvd) and field data (.tvf) files. DocumentIndex (.tvx) --> Header,<DocumentPosition,FieldPosition> NumDocs Header --> CodecHeader () DocumentPosition --> UInt64 () (offset in the .tvd file) FieldPosition --> UInt64 () (offset in the .tvf file) The Document or .tvd file. This contains, for each document, the number of fields, a list of the fields with term vector info and finally a list of pointers to the field information in the .tvf (Term Vector Fields) file. The .tvd file is used to map out the fields that have term vectors stored and where the field information is in the .tvf file. Document (.tvd) --> Header,<NumFields, FieldNums, FieldPositions> NumDocs Header --> CodecHeader () NumFields --> VInt () FieldNums --> <FieldNumDelta> NumFields FieldNumDelta --> VInt () FieldPositions --> <FieldPositionDelta> NumFields-1 FieldPositionDelta --> VLong () The Field or .tvf file. This file contains, for each field that has a term vector stored, a list of the terms, their frequencies and, optionally, position, offset, and payload information. Field (.tvf) --> Header,<NumTerms, Flags, TermFreqs> NumFields Header --> CodecHeader () NumTerms --> VInt () Flags --> Byte () TermFreqs --> <TermText, TermFreq, Positions?, PayloadData?, Offsets?> NumTerms TermText --> <PrefixLength, Suffix> PrefixLength --> VInt () Suffix --> String () TermFreq --> VInt () Positions --> <PositionDelta PayloadLength?>TermFreq PositionDelta --> VInt () PayloadLength --> VInt () PayloadData --> Byte () NumPayloadBytes Offsets --> <VInt (), VInt () >TermFreq Notes: Flags byte stores whether this term vector has position, offset, payload. information stored. Term byte prefixes are shared. 
The PrefixLength is the number of initial bytes from the previous term which must be pre-pended to a term's suffix in order to form the term's bytes. Thus, if the previous term's text was "bone" and the term is "boy", the PrefixLength is two and the suffix is "y". PositionDelta is, if payloads are disabled for the term's field, the difference between the position of the current occurrence in the document and the previous occurrence (or zero, if this is the first occurrence in this document). If payloads are enabled for the term's field, then PositionDelta/2 is the difference between the current and the previous position. If payloads are enabled and PositionDelta is odd, then PayloadLength is stored, indicating the length of the payload at the current term position. PayloadData is metadata associated with a term position. If PayloadLength is stored at the current position, then it indicates the length of this payload. If PayloadLength is not stored, then this payload has the same length as the payload at the previous position. PayloadData encodes the concatenated bytes for all of a terms occurrences. Offsets are stored as delta encoded VInts. The first VInt is the startOffset, the second is the endOffset. Sole constructor. Lucene 4.0 Term Vectors reader. It reads .tvd, .tvf, and .tvx files. Extension of vectors fields file. Extension of vectors documents file. Extension of vectors index file. Used by clone. Sole constructor. Retrieve the length (in bytes) of the tvd and tvf entries for the next starting with . This is used for bulk copying when merging segments, if the field numbers are congruent. Once this returns, the tvf & tvd streams are seeked to the . The number of documents in the reader. NOTE: This was size() in Lucene. Lucene 4.0 Term Vectors writer. It writes .tvd, .tvf, and .tvx files. Sole constructor. Do a bulk copy of numDocs documents from reader to our streams. This is used to expedite merging, if the field numbers are congruent. Maximum number of contiguous documents to bulk-copy when merging term vectors. Close all streams. Encode all values in normal area with fixed bit width, which is determined by the max value in this block. Special number of bits per value used whenever all values to encode are equal. Upper limit of the number of bytes that might be required to stored encoded values. Upper limit of the number of values that might be decoded in a single call to . Although values after are garbage, it is necessary to allocate value buffers whose size is >= MAX_DATA_SIZE to avoid s. Compute the number of iterations required to decode values with the provided . Compute the number of bytes required to encode a block of values that require bits per value with format . Create a new instance and save state into . Restore a from a . Write a block of data (For format). The data to write. A buffer to use to encode data. The destination output. If there is a low-level I/O error. Read the next block of data (For format). The input to use to read data. A buffer that can be used to store encoded data. Where to write decoded data. If there is a low-level I/O error. Skip the next block of data. The input where to read data. If there is a low-level I/O error. Compute the number of bits required to serialize any of the longs in . Implements the Lucene 4.1 index format, with configurable per-field postings formats. If you want to reuse functionality of this codec in another codec, extend . See package documentation for file format details. @lucene.experimental Sole constructor. 
Returns the postings format that should be used for writing new segments of . The default implementation always returns "Lucene41" Provides a and . @lucene.experimental Sole constructor. Lucene 4.1 postings format, which encodes postings in packed integer blocks for fast decode. NOTE: this format is still experimental and subject to change without backwards compatibility. Basic idea: Packed Blocks and VInt Blocks: In packed blocks, integers are encoded with the same bit width packed format (): the block size (i.e. number of integers inside block) is fixed (currently 128). Additionally blocks that are all the same value are encoded in an optimized way. In VInt blocks, integers are encoded as VInt (): the block size is variable. Block structure: When the postings are long enough, Lucene41PostingsFormat will try to encode most integer data as a packed block. Take a term with 259 documents as an example, the first 256 document ids are encoded as two packed blocks, while the remaining 3 are encoded as one VInt block. Different kinds of data are always encoded separately into different packed blocks, but may possibly be interleaved into the same VInt block. This strategy is applied to pairs: <document number, frequency>, <position, payload length>, <position, offset start, offset length>, and <position, payload length, offsetstart, offset length>. Skipdata settings: The structure of skip table is quite similar to previous version of Lucene. Skip interval is the same as block size, and each skip entry points to the beginning of each block. However, for the first block, skip data is omitted. Positions, Payloads, and Offsets: A position is an integer indicating where the term occurs within one document. A payload is a blob of metadata associated with current position. An offset is a pair of integers indicating the tokenized start/end offsets for given term in current position: it is essentially a specialized payload. When payloads and offsets are not omitted, numPositions==numPayloads==numOffsets (assuming a null payload contributes one count). As mentioned in block structure, it is possible to encode these three either combined or separately. In all cases, payloads and offsets are stored together. When encoded as a packed block, position data is separated out as .pos, while payloads and offsets are encoded in .pay (payload metadata will also be stored directly in .pay). When encoded as VInt blocks, all these three are stored interleaved into the .pos (so is payload metadata). With this strategy, the majority of payload and offset data will be outside .pos file. So for queries that require only position data, running on a full index with payloads and offsets, this reduces disk pre-fetches. Files and detailed format: .tim: Term Dictionary .tip: Term Index .doc: Frequencies and Skip Data .pos: Positions .pay: Payloads and Offsets
    Term Dictionary The .tim file contains the list of terms in each field along with per-term statistics (such as docfreq) and pointers to the frequencies, positions, payload and skip data in the .doc, .pos, and .pay files. See for more details on the format. NOTE: The term dictionary can plug into different postings implementations: the postings writer/reader are actually responsible for encoding and decoding the PostingsHeader and TermMetadata sections described here: PostingsHeader --> Header, PackedBlockSize TermMetadata --> (DocFPDelta|SingletonDocID), PosFPDelta?, PosVIntBlockFPDelta?, PayFPDelta?, SkipFPDelta? Header, --> CodecHeader () PackedBlockSize, SingletonDocID --> VInt () DocFPDelta, PosFPDelta, PayFPDelta, PosVIntBlockFPDelta, SkipFPDelta --> VLong () Footer --> CodecFooter () Notes: Header is a CodecHeader () storing the version information for the postings. PackedBlockSize is the fixed block size for packed blocks. In packed block, bit width is determined by the largest integer. Smaller block size result in smaller variance among width of integers hence smaller indexes. Larger block size result in more efficient bulk i/o hence better acceleration. This value should always be a multiple of 64, currently fixed as 128 as a tradeoff. It is also the skip interval used to accelerate . DocFPDelta determines the position of this term's TermFreqs within the .doc file. In particular, it is the difference of file offset between this term's data and previous term's data (or zero, for the first term in the block).On disk it is stored as the difference from previous value in sequence. PosFPDelta determines the position of this term's TermPositions within the .pos file. While PayFPDelta determines the position of this term's <TermPayloads, TermOffsets?> within the .pay file. Similar to DocFPDelta, it is the difference between two file positions (or neglected, for fields that omit payloads and offsets). PosVIntBlockFPDelta determines the position of this term's last TermPosition in last pos packed block within the .pos file. It is synonym for PayVIntBlockFPDelta or OffsetVIntBlockFPDelta. This is actually used to indicate whether it is necessary to load following payloads and offsets from .pos instead of .pay. Every time a new block of positions are to be loaded, the PostingsReader will use this value to check whether current block is packed format or VInt. When packed format, payloads and offsets are fetched from .pay, otherwise from .pos. (this value is neglected when total number of positions i.e. totalTermFreq is less or equal to PackedBlockSize). SkipFPDelta determines the position of this term's SkipData within the .doc file. In particular, it is the length of the TermFreq data. SkipDelta is only stored if DocFreq is not smaller than SkipMinimum (i.e. 128 in Lucene41PostingsFormat). SingletonDocID is an optimization when a term only appears in one document. In this case, instead of writing a file pointer to the .doc file (DocFPDelta), and then a VIntBlock at that location, the single document ID is written to the term dictionary.
    Term Index The .tip file contains an index into the term dictionary, so that it can be accessed randomly. See for more details on the format.
    Frequencies and Skip Data The .doc file contains the lists of documents which contain each term, along with the frequency of the term in that document (except when frequencies are omitted: ). It also saves skip data to the beginning of each packed or VInt block, when the length of document list is larger than packed block size. docFile(.doc) --> Header, <TermFreqs, SkipData?>TermCount, Footer Header --> CodecHeader () TermFreqs --> <PackedBlock> PackedDocBlockNum, VIntBlock? PackedBlock --> PackedDocDeltaBlock, PackedFreqBlock? VIntBlock --> <DocDelta[, Freq?]>DocFreq-PackedBlockSize*PackedDocBlockNum SkipData --> <<SkipLevelLength, SkipLevel> NumSkipLevels-1, SkipLevel>, SkipDatum? SkipLevel --> <SkipDatum> TrimmedDocFreq/(PackedBlockSize^(Level + 1)) SkipDatum --> DocSkip, DocFPSkip, <PosFPSkip, PosBlockOffset, PayLength?, PayFPSkip?>?, SkipChildLevelPointer? PackedDocDeltaBlock, PackedFreqBlock --> PackedInts () DocDelta, Freq, DocSkip, DocFPSkip, PosFPSkip, PosBlockOffset, PayByteUpto, PayFPSkip --> VInt () SkipChildLevelPointer --> VLong () Footer --> CodecFooter () Notes: PackedDocDeltaBlock is theoretically generated from two steps: Calculate the difference between each document number and previous one, and get a d-gaps list (for the first document, use absolute value); For those d-gaps from first one to PackedDocBlockNum*PackedBlockSizeth, separately encode as packed blocks. If frequencies are not omitted, PackedFreqBlock will be generated without d-gap step. VIntBlock stores remaining d-gaps (along with frequencies when possible) with a format that encodes DocDelta and Freq: DocDelta: if frequencies are indexed, this determines both the document number and the frequency. In particular, DocDelta/2 is the difference between this document number and the previous document number (or zero when this is the first document in a TermFreqs). When DocDelta is odd, the frequency is one. When DocDelta is even, the frequency is read as another VInt. If frequencies are omitted, DocDelta contains the gap (not multiplied by 2) between document numbers and no frequency information is stored. For example, the TermFreqs for a term which occurs once in document seven and three times in document eleven, with frequencies indexed, would be the following sequence of VInts: 15, 8, 3 If frequencies were omitted () it would be this sequence of VInts instead: 7,4 PackedDocBlockNum is the number of packed blocks for current term's docids or frequencies. In particular, PackedDocBlockNum = floor(DocFreq/PackedBlockSize) TrimmedDocFreq = DocFreq % PackedBlockSize == 0 ? DocFreq - 1 : DocFreq. We use this trick since the definition of skip entry is a little different from base interface. In , skip data is assumed to be saved for skipIntervalth, 2*skipIntervalth ... posting in the list. However, in Lucene41PostingsFormat, the skip data is saved for skipInterval+1th, 2*skipInterval+1th ... posting (skipInterval==PackedBlockSize in this case). When DocFreq is multiple of PackedBlockSize, MultiLevelSkipListWriter will expect one more skip data than Lucene41SkipWriter. SkipDatum is the metadata of one skip entry. For the first block (no matter packed or VInt), it is omitted. DocSkip records the document number of every PackedBlockSizeth document number in the postings (i.e. last document number in each packed block). On disk it is stored as the difference from previous value in the sequence. DocFPSkip records the file offsets of each block (excluding )posting at PackedBlockSize+1th, 2*PackedBlockSize+1th ... 
, in DocFile. The file offsets are relative to the start of the current term's TermFreqs. On disk they are also stored as differences from the previous SkipDatum in the sequence. Since positions and payloads are also block encoded, skipping should first seek to the related block and then fetch the values according to the in-block offset. PosFPSkip and PayFPSkip record the file offsets of the related block in .pos and .pay, respectively, while PosBlockOffset indicates which value to fetch inside the related block (PayBlockOffset is unnecessary since it is always equal to PosBlockOffset). As with DocFPSkip, the file offsets are relative to the start of the current term's TermFreqs, and are stored as a difference sequence. PayByteUpto indicates the start offset of the current payload; it is equivalent to the sum of the payload lengths in the current block up to PosBlockOffset.
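To make the DocDelta rule above concrete, here is a small illustrative sketch (not the library's implementation) that encodes the VInt portion of a posting list. It reproduces the example above: a term occurring once in document seven and three times in document eleven yields 15, 8, 3 when frequencies are indexed, and 7, 4 when they are omitted.

using System.Collections.Generic;

// Illustrative only: encodes (docID, freq) pairs following the DocDelta/Freq rule above.
static IEnumerable<int> EncodeVIntBlock(IEnumerable<(int DocId, int Freq)> postings, bool writeFreqs)
{
    int previousDoc = 0;
    foreach (var (docId, freq) in postings)
    {
        int gap = docId - previousDoc;       // d-gap (absolute value for the first document)
        previousDoc = docId;
        if (!writeFreqs)
        {
            yield return gap;                // frequencies omitted: plain gap
        }
        else if (freq == 1)
        {
            yield return (gap << 1) | 1;     // odd DocDelta => frequency is one
        }
        else
        {
            yield return gap << 1;           // even DocDelta => frequency follows as another VInt
            yield return freq;
        }
    }
}

// EncodeVIntBlock(new[] { (7, 1), (11, 3) }, writeFreqs: true)  yields 15, 8, 3
// EncodeVIntBlock(new[] { (7, 1), (11, 3) }, writeFreqs: false) yields 7, 4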
    Positions The .pos file contains the lists of positions that each term occurs at within documents. It also sometimes stores part of payloads and offsets for speedup. PosFile(.pos) --> Header, <TermPositions> TermCount, Footer Header --> CodecHeader () TermPositions --> <PackedPosDeltaBlock> PackedPosBlockNum, VIntBlock? VIntBlock --> <PositionDelta[, PayloadLength?], PayloadData?, OffsetDelta?, OffsetLength?>PosVIntCount PackedPosDeltaBlock --> PackedInts () PositionDelta, OffsetDelta, OffsetLength --> VInt () PayloadData --> byte ()PayLength Footer --> CodecFooter () Notes: TermPositions are order by term (terms are implicit, from the term dictionary), and position values for each term document pair are incremental, and ordered by document number. PackedPosBlockNum is the number of packed blocks for current term's positions, payloads or offsets. In particular, PackedPosBlockNum = floor(totalTermFreq/PackedBlockSize) PosVIntCount is the number of positions encoded as VInt format. In particular, PosVIntCount = totalTermFreq - PackedPosBlockNum*PackedBlockSize The procedure how PackedPosDeltaBlock is generated is the same as PackedDocDeltaBlock in chapter Frequencies and Skip Data. PositionDelta is, if payloads are disabled for the term's field, the difference between the position of the current occurrence in the document and the previous occurrence (or zero, if this is the first occurrence in this document). If payloads are enabled for the term's field, then PositionDelta/2 is the difference between the current and the previous position. If payloads are enabled and PositionDelta is odd, then PayloadLength is stored, indicating the length of the payload at the current term position. For example, the TermPositions for a term which occurs as the fourth term in one document, and as the fifth and ninth term in a subsequent document, would be the following sequence of VInts (payloads disabled): 4, 5, 4 PayloadData is metadata associated with the current term position. If PayloadLength is stored at the current position, then it indicates the length of this payload. If PayloadLength is not stored, then this payload has the same length as the payload at the previous position. OffsetDelta/2 is the difference between this position's startOffset from the previous occurrence (or zero, if this is the first occurrence in this document). If OffsetDelta is odd, then the length (endOffset-startOffset) differs from the previous occurrence and an OffsetLength follows. Offset data is only written for .
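As a hedged sketch (not the shipped decoder), the PositionDelta rule above can be read back as follows when payloads are enabled: PositionDelta/2 is the gap from the previous position, and an odd value signals that a new PayloadLength follows. Payload bytes and offsets are ignored here for brevity, and readVInt is a stand-in for reading one VInt from the stream.

using System;
using System.Collections.Generic;

// Illustrative only: decodes the VInt-encoded positions of one document.
static List<(int Position, int PayloadLength)> DecodePositions(Func<int> readVInt, int freq, bool fieldHasPayloads)
{
    var positions = new List<(int, int)>();
    int position = 0;
    int payloadLength = 0;                    // reused until a new length appears in the stream
    for (int i = 0; i < freq; i++)
    {
        int delta = readVInt();
        if (fieldHasPayloads)
        {
            position += delta >> 1;           // PositionDelta/2 is the real position gap
            if ((delta & 1) != 0)
                payloadLength = readVInt();   // odd delta => a new PayloadLength is stored here
        }
        else
        {
            position += delta;                // payloads disabled: delta is the plain gap
        }
        positions.Add((position, fieldHasPayloads ? payloadLength : 0));
    }
    return positions;
}

// With payloads disabled, the sequence 4, 5, 4 from the example above decodes to
// position 4 in the first document and positions 5 and 9 in the next one.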
    Payloads and Offsets The .pay file will store payloads and offsets associated with certain term-document positions. Some payloads and offsets will be separated out into the .pos file, for performance reasons. PayFile(.pay): --> Header, <TermPayloads, TermOffsets?> TermCount, Footer Header --> CodecHeader () TermPayloads --> <PackedPayLengthBlock, SumPayLength, PayData> PackedPayBlockNum TermOffsets --> <PackedOffsetStartDeltaBlock, PackedOffsetLengthBlock> PackedPayBlockNum PackedPayLengthBlock, PackedOffsetStartDeltaBlock, PackedOffsetLengthBlock --> PackedInts () SumPayLength --> VInt () PayData --> byte () SumPayLength Footer --> CodecFooter () Notes: The order of TermPayloads/TermOffsets will be the same as TermPositions; note that part of the payloads/offsets are stored in .pos. The procedure for generating PackedPayLengthBlock and PackedOffsetLengthBlock is the same as for PackedFreqBlock in chapter Frequencies and Skip Data, while PackedOffsetStartDeltaBlock follows the same procedure as PackedDocDeltaBlock. PackedPayBlockNum is always equal to PackedPosBlockNum for the same term. It is also a synonym for PackedOffsetBlockNum. SumPayLength is the total length of payloads written within one block and should be the sum of the PayLengths in one packed block. PayLength in PackedPayLengthBlock is the length of each payload associated with the current position.
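To illustrate the relationship stated above between PayLength, SumPayLength and PayData, here is a rough sketch (an illustration only, not the real writer) that assembles the logical pieces of one TermPayloads block from the payloads of a block of positions:

using System.Collections.Generic;
using System.Linq;

// Illustrative only: on disk the lengths would be bit-packed (PackedPayLengthBlock).
static (int[] PayLengths, int SumPayLength, byte[] PayData) BuildPayloadBlock(IReadOnlyList<byte[]> payloads)
{
    int[] payLengths = payloads.Select(p => p.Length).ToArray();   // one PayLength per position
    int sumPayLength = payLengths.Sum();                           // SumPayLength: total payload bytes in the block
    byte[] payData = payloads.SelectMany(p => p).ToArray();        // PayData: concatenated payload bytes
    return (payLengths, sumPayLength, payData);
}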
    @lucene.experimental
    Filename extension for document number, frequencies, and skip data. See chapter: Frequencies and Skip Data Filename extension for positions. See chapter: Positions Filename extension for payloads and offsets. See chapter: Payloads and Offsets Fixed packed block size, number of integers encoded in a single packed block. Creates with default settings. Creates with custom values for and passed to the block terms dictionary. Concrete class that reads the docId (maybe frq, pos, offset, payloads) list with postings format. @lucene.experimental Sole constructor. Read values that have been written using variable-length encoding instead of bit-packing. NOTE: This was readVIntBlock() in Lucene. Concrete class that writes the docId (maybe frq, pos, offset, payloads) list with postings format. Postings list for each term will be stored separately. @lucene.experimental for details about skipping settings and postings layout. Expert: The maximum number of skip levels. Smaller values result in slightly smaller indexes, but slower skipping in big posting lists. Creates a postings writer with the specified PackedInts overhead ratio. Creates a postings writer with PackedInts.COMPACT. NOTE: This was IntBlockTermState in Lucene. Add a new position & payload. Called when we are done adding docs to this term. Implements the skip list reader for the block postings format that stores positions and payloads. Although this skipper uses MultiLevelSkipListReader as an interface, its definition of skip position is a little different. For example, when skipInterval = blockSize = 3, df = 2*skipInterval = 6, 0 1 2 3 4 5 d d d d d d (posting list) ^ ^ (skip point in MultiLevelSkipListWriter) ^ (skip point in Lucene41SkipWriter) In this case, MultiLevelSkipListReader will use the last document as a skip point, while Lucene41SkipReader should assume no skip point will come. If we used the interface directly in Lucene41SkipReader, it might naively try to read more skip data after the only skip point is loaded. To illustrate this, consider a call to skipTo(d[5]): since skip point d[3] has a smaller docId and numSkipped + blockSize == df, MultiLevelSkipListReader will assume the skip list isn't exhausted yet and try to load a non-existent skip point. Therefore, we'll trim df before passing it to the interface. See . Trims the original docFreq to tell the skipReader to read the proper number of skip points. Since our definition in Lucene41Skip* is a little different from MultiLevelSkip*, this trimmed docFreq will prevent the skipReader from: 1. needlessly reading a non-existent skip point after the last block boundary; 2. moving into the vInt block. Returns the doc pointer of the doc to which the last call of has skipped. Writes skip lists with multiple levels, and supports skipping within a block of ints. Assume that docFreq = 28, skipInterval = blockSize = 12: | block#0 | | block#1 | |vInts| d d d d d d d d d d d d d d d d d d d d d d d d d d d d (posting list) ^ ^ (level 0 skip point) Note that the skipWriter will ignore the first document in block#0, since it is useless as a skip point. Also, we'll never skip into the vInts block; skip data is only recorded at its start point (if it exists). For each skip point, we will record: 1. the docID in the former position, i.e. for position 12, record docID[11], etc.; 2. its related file pointers (position, payload); 3. related numbers or uptos (position, payload); 4. start offset. Sets the values for the current skip data. Lucene 4.1 stored fields format.
Principle This compresses blocks of 16KB of documents in order to improve the compression ratio compared to document-level compression. It uses the LZ4 compression algorithm, which is fast to compress and very fast to decompress data. Although the compression method that is used focuses more on speed than on compression ratio, it should provide interesting compression ratios for redundant inputs (such as log files, HTML or plain text). File formats Stored fields are represented by two files: A fields data file (extension .fdt). This file stores a compact representation of documents in compressed blocks of 16KB or more. When writing a segment, documents are appended to an in-memory byte[] buffer. When its size reaches 16KB or more, some metadata about the documents is flushed to disk, immediately followed by a compressed representation of the buffer using the LZ4 compression format. Here is a more detailed description of the field data file format: FieldData (.fdt) --> <Header>, PackedIntsVersion, <Chunk>ChunkCount Header --> CodecHeader () PackedIntsVersion --> as a VInt () ChunkCount is not known in advance and is the number of chunks necessary to store all documents of the segment Chunk --> DocBase, ChunkDocs, DocFieldCounts, DocLengths, <CompressedDocs> DocBase --> the ID of the first document of the chunk as a VInt () ChunkDocs --> the number of documents in the chunk as a VInt () DocFieldCounts --> the number of stored fields of every document in the chunk, encoded as follows: if chunkDocs=1, the unique value is encoded as a VInt () else read a VInt () (let's call it bitsRequired) if bitsRequired is 0 then all values are equal, and the common value is the following VInt () else bitsRequired is the number of bits required to store any value, and values are stored in a packed () array where every value is stored on exactly bitsRequired bits DocLengths --> the lengths of all documents in the chunk, encoded with the same method as DocFieldCounts CompressedDocs --> a compressed representation of <Docs> using the LZ4 compression format Docs --> <Doc>ChunkDocs Doc --> <FieldNumAndType, Value>DocFieldCount FieldNumAndType --> a VLong (), whose 3 last bits are Type and whose other bits are FieldNum Type --> 0: Value is String 1: Value is BinaryValue 2: Value is Int 3: Value is Float 4: Value is Long 5: Value is Double 6, 7: unused FieldNum --> an ID of the field Value --> String () | BinaryValue | Int | Float | Long | Double depending on Type BinaryValue --> ValueLength <Byte>ValueLength Notes If documents are larger than 16KB then chunks will likely contain only one document. However, documents can never be spread across several chunks (all fields of a single document are in the same chunk). When at least one document in a chunk is large enough so that the chunk is larger than 32KB, the chunk will actually be compressed in several LZ4 blocks of 16KB. This allows s which are only interested in the first fields of a document to avoid decompressing 10MB of data if the document is 10MB, but only 16KB. Given that the original lengths are written in the metadata of the chunk, the decompressor can leverage this information to stop decoding as soon as enough data has been decompressed. In case documents are incompressible, CompressedDocs will be less than 0.5% larger than Docs. A fields index file (extension .fdx). FieldsIndex (.fdx) --> <Header>, <ChunkIndex> Header --> CodecHeader () ChunkIndex: See Known limitations This does not support individual documents larger than (2^31 - 2^14) bytes.
In case this is a problem, you should use another format, such as . @lucene.experimental Sole constructor. Implements the Lucene 4.2 index format, with configurable per-field postings and docvalues formats. If you want to reuse functionality of this codec in another codec, extend . See package documentation for file format details. @lucene.experimental Sole constructor. Returns the postings format that should be used for writing new segments of . The default implementation always returns "Lucene41" Returns the docvalues format that should be used for writing new segments of . The default implementation always returns "Lucene42" Lucene 4.2 DocValues format. Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with seven basic strategies. Delta-compressed Numerics: per-document integers written in blocks of 4096. For each block the minimum value is encoded, and each entry is a delta from that minimum value. Table-compressed Numerics: when the number of unique values is very small, a lookup table is written instead. Each per-document entry is instead the ordinal to this table. Uncompressed Numerics: when all values would fit into a single byte, and the acceptableOverheadRatio would pack values into 8 bits per value anyway, they are written as absolute values (with no indirection or packing) for performance. GCD-compressed Numerics: when all numbers share a common divisor, such as dates, the greatest common denominator (GCD) is computed, and quotients are stored using Delta-compressed Numerics. Fixed-width Binary: one large concatenated byte[] is written, along with the fixed length. Each document's value can be addressed by maxDoc*length. Variable-width Binary: one large concatenated byte[] is written, along with end addresses for each document. The addresses are written in blocks of 4096, with the current absolute start for the block, and the average (expected) delta per entry. For each document the deviation from the delta (actual - expected) is written. Sorted: an FST mapping deduplicated terms to ordinals is written, along with the per-document ordinals written using one of the numeric strategies above. SortedSet: an FST mapping deduplicated terms to ordinals is written, along with the per-document ordinal list written using one of the binary strategies above. Files: .dvd: DocValues data .dvm: DocValues metadata The DocValues metadata or .dvm file. For DocValues field, this stores metadata, such as the offset into the DocValues data (.dvd) DocValues metadata (.dvm) --> Header,<FieldNumber,EntryType,Entry>NumFields,Footer Entry --> NumericEntry | BinaryEntry | SortedEntry NumericEntry --> DataOffset,CompressionType,PackedVersion BinaryEntry --> DataOffset,DataLength,MinLength,MaxLength,PackedVersion?,BlockSize? SortedEntry --> DataOffset,ValueCount FieldNumber,PackedVersion,MinLength,MaxLength,BlockSize,ValueCount --> VInt () DataOffset,DataLength --> Int64 () EntryType,CompressionType --> Byte () Header --> CodecHeader () Footer --> CodecFooter () Sorted fields have two entries: a SortedEntry with the FST metadata, and an ordinary NumericEntry for the document-to-ord metadata. SortedSet fields have two entries: a SortedEntry with the FST metadata, and an ordinary BinaryEntry for the document-to-ord-list metadata. FieldNumber of -1 indicates the end of metadata. 
EntryType is a 0 (NumericEntry), 1 (BinaryEntry, or 2 (SortedEntry) DataOffset is the pointer to the start of the data in the DocValues data (.dvd) CompressionType indicates how Numeric values will be compressed: 0 --> delta-compressed. For each block of 4096 integers, every integer is delta-encoded from the minimum value within the block. 1 --> table-compressed. When the number of unique numeric values is small and it would save space, a lookup table of unique values is written, followed by the ordinal for each document. 2 --> uncompressed. When the acceptableOverheadRatio parameter would upgrade the number of bits required to 8, and all values fit in a byte, these are written as absolute binary values for performance. 3 --> gcd-compressed. When all integers share a common divisor, only quotients are stored using blocks of delta-encoded ints. MinLength and MaxLength represent the min and max byte[] value lengths for Binary values. If they are equal, then all values are of a fixed size, and can be addressed as DataOffset + (docID * length). Otherwise, the binary values are of variable size, and packed integer metadata (PackedVersion,BlockSize) is written for the addresses. The DocValues data or .dvd file. For DocValues field, this stores the actual per-document data (the heavy-lifting) DocValues data (.dvd) --> Header,<NumericData | BinaryData | SortedData>NumFields,Footer NumericData --> DeltaCompressedNumerics | TableCompressedNumerics | UncompressedNumerics | GCDCompressedNumerics BinaryData --> Byte () DataLength,Addresses SortedData --> FST<Int64> () DeltaCompressedNumerics --> BlockPackedInts(blockSize=4096) () TableCompressedNumerics --> TableSize, Int64 () TableSize, PackedInts () UncompressedNumerics --> Byte () maxdoc Addresses --> MonotonicBlockPackedInts(blockSize=4096) () Footer --> CodecFooter ( SortedSet entries store the list of ordinals in their BinaryData as a sequences of increasing vLongs (), delta-encoded. Limitations: Binary doc values can be at most in length. Maximum length for each binary doc values field. Calls Lucene42DocValuesFormat(PackedInts.DEFAULT) (. Creates a new with the specified for . @lucene.experimental Compression parameter for numerics. Currently this is only used when the number of unique values is small. Reader for . NOTE: This was packedIntsVersion (field) in Lucene NOTE: This was packedIntsVersion (field) in Lucene Lucene 4.2 Field Infos format. Field names are stored in the field info file, with suffix .fnm. FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, FieldBits,DocValuesBits,Attributes> FieldsCount Data types: Header --> CodecHeader FieldsCount --> VInt FieldName --> String FieldBits, DocValuesBits --> Byte FieldNumber --> VInt Attributes --> IDictionary<String,String> Field Descriptions: FieldsCount: the number of fields in this file. FieldName: name of the field as a UTF-8 String. FieldNumber: the field's number. Note that unlike previous versions of Lucene, the fields are not numbered implicitly by their order in the file, instead explicitly. FieldBits: a byte containing field options. The low-order bit is one for indexed fields, and zero for non-indexed fields. The second lowest-order bit is one for fields that have term vectors stored, and zero for fields without term vectors. If the third lowest order-bit is set (0x4), offsets are stored into the postings list in addition to positions. Fourth bit is unused. If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field. 
If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field. If the seventh lowest-order bit is set (0x40), term frequencies and positions omitted for the indexed field. If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field. DocValuesBits: a byte containing per-document value types. The type recorded as two four-bit integers, with the high-order bits representing norms options, and the low-order bits representing options. Each four-bit integer can be decoded as such: 0: no DocValues for this field. 1: NumericDocValues. () 2: BinaryDocValues. () 3: SortedDocValues. () Attributes: a key-value map of codec-private attributes. @lucene.experimental Sole constructor. Extension of field infos. Lucene 4.2 FieldInfos reader. @lucene.experimental Sole constructor. Writer for . Lucene 4.2 score normalization format. NOTE: this uses the same format as Numeric DocValues, but with different file extensions, and passing for uncompressed encoding: trading off space for performance. Files: .nvd: DocValues data .nvm: DocValues metadata Calls Lucene42DocValuesFormat(PackedInt32s.FASTEST) (). Creates a new with the specified for . @lucene.experimental Compression parameter for numerics. Currently this is only used when the number of unique values is small. Lucene 4.2 term vectors format (). Very similarly to , this format is based on compressed chunks of data, with document-level granularity so that a document can never span across distinct chunks. Moreover, data is made as compact as possible: textual data is compressed using the very light, LZ4 compression algorithm, binary data is written using fixed-size blocks of packed s (). Term vectors are stored using two files a data file where terms, frequencies, positions, offsets and payloads are stored, an index file, loaded into memory, used to locate specific documents in the data file. Looking up term vectors for any document requires at most 1 disk seek. File formats A vector data file (extension .tvd). this file stores terms, frequencies, positions, offsets and payloads for every document. Upon writing a new segment, it accumulates data into memory until the buffer used to store terms and payloads grows beyond 4KB. Then it flushes all metadata, terms and positions to disk using LZ4 compression for terms and payloads and blocks of packed s () for positions. 
Here is a more detailed description of the field data file format: VectorData (.tvd) --> <Header>, PackedIntsVersion, ChunkSize, <Chunk>ChunkCount, Footer Header --> CodecHeader () PackedIntsVersion --> as a VInt () ChunkSize is the number of bytes of terms to accumulate before flushing, as a VInt () ChunkCount is not known in advance and is the number of chunks necessary to store all document of the segment Chunk --> DocBase, ChunkDocs, < NumFields >, < FieldNums >, < FieldNumOffs >, < Flags >, < NumTerms >, < TermLengths >, < TermFreqs >, < Positions >, < StartOffsets >, < Lengths >, < PayloadLengths >, < TermAndPayloads > DocBase is the ID of the first doc of the chunk as a VInt () ChunkDocs is the number of documents in the chunk NumFields --> DocNumFieldsChunkDocs DocNumFields is the number of fields for each doc, written as a VInt () if ChunkDocs==1 and as a array otherwise FieldNums --> FieldNumDeltaTotalDistincFields, a delta-encoded list of the sorted unique field numbers present in the chunk FieldNumOffs --> FieldNumOffTotalFields, as a array FieldNumOff is the offset of the field number in FieldNums TotalFields is the total number of fields (sum of the values of NumFields) Flags --> Bit < FieldFlags > Bit is a single bit which when true means that fields have the same options for every document in the chunk FieldFlags --> if Bit==1: FlagTotalDistinctFields else FlagTotalFields Flag: a 3-bits int where: the first bit means that the field has positions the second bit means that the field has offsets the third bit means that the field has payloads NumTerms --> FieldNumTermsTotalFields FieldNumTerms: the number of terms for each field, using blocks of 64 packed s () TermLengths --> PrefixLengthTotalTerms SuffixLengthTotalTerms TotalTerms: total number of terms (sum of NumTerms) PrefixLength: 0 for the first term of a field, the common prefix with the previous term otherwise using blocks of 64 packed s () SuffixLength: length of the term minus PrefixLength for every term using blocks of 64 packed s () TermFreqs --> TermFreqMinus1TotalTerms TermFreqMinus1: (frequency - 1) for each term using blocks of 64 packed s () Positions --> PositionDeltaTotalPositions TotalPositions is the sum of frequencies of terms of all fields that have positions PositionDelta: the absolute position for the first position of a term, and the difference with the previous positions for following positions using blocks of 64 packed s () StartOffsets --> (AvgCharsPerTermTotalDistinctFields) StartOffsetDeltaTotalOffsets TotalOffsets is the sum of frequencies of terms of all fields that have offsets AvgCharsPerTerm: average number of chars per term, encoded as a float on 4 bytes. They are not present if no field has both positions and offsets enabled. StartOffsetDelta: (startOffset - previousStartOffset - AvgCharsPerTerm * PositionDelta). 
previousStartOffset is 0 for the first offset and AvgCharsPerTerm is 0 if the field has no positions using blocks of 64 packed s () Lengths --> LengthMinusTermLengthTotalOffsets LengthMinusTermLength: (endOffset - startOffset - termLength) using blocks of 64 packed s () PayloadLengths --> PayloadLengthTotalPayloads TotalPayloads is the sum of frequencies of terms of all fields that have payloads PayloadLength is the payload length encoded using blocks of 64 packed s () TermAndPayloads --> LZ4-compressed representation of < FieldTermsAndPayLoads >TotalFields FieldTermsAndPayLoads --> Terms (Payloads) Terms: term bytes Payloads: payload bytes (if the field has payloads) Footer --> CodecFooter () An index file (extension .tvx). VectorIndex (.tvx) --> <Header>, <ChunkIndex>, Footer Header --> CodecHeader () ChunkIndex: See Footer --> CodecFooter () @lucene.experimental Sole constructor. Implements the Lucene 4.5 index format, with configurable per-field postings and docvalues formats. If you want to reuse functionality of this codec in another codec, extend . See package documentation for file format details. @lucene.experimental Sole constructor. Returns the postings format that should be used for writing new segments of . The default implementation always returns "Lucene41" Returns the docvalues format that should be used for writing new segments of . The default implementation always returns "Lucene45" Writer for Compressed using packed blocks of s. Compressed by computing the GCD. Compressed by giving IDs to unique values. Uncompressed binary, written directly (fixed length). Uncompressed binary, written directly (variable length). Compressed binary with shared prefixes Standard storage for sorted set values with 1 level of indirection: docId -> address -> ord. Single-valued sorted set values, encoded as sorted values, so no level of indirection: docId -> ord. Expert: Creates a new writer. Expert: writes a value dictionary for a sorted/sortedset field. Lucene 4.5 DocValues format. Encodes the four per-document value types (Numeric,Binary,Sorted,SortedSet) with these strategies: : Delta-compressed: per-document integers written in blocks of 16k. For each block the minimum value in that block is encoded, and each entry is a delta from that minimum value. Each block of deltas is compressed with bitpacking. For more information, see . Table-compressed: when the number of unique values is very small (< 256), and when there are unused "gaps" in the range of values used (such as ), a lookup table is written instead. Each per-document entry is instead the ordinal to this table, and those ordinals are compressed with bitpacking (). GCD-compressed: when all numbers share a common divisor, such as dates, the greatest common denominator (GCD) is computed, and quotients are stored using Delta-compressed Numerics. : Fixed-width Binary: one large concatenated is written, along with the fixed length. Each document's value can be addressed directly with multiplication (docID * length). Variable-width Binary: one large concatenated is written, along with end addresses for each document. The addresses are written in blocks of 16k, with the current absolute start for the block, and the average (expected) delta per entry. For each document the deviation from the delta (actual - expected) is written. Prefix-compressed Binary: values are written in chunks of 16, with the first value written completely and other values sharing prefixes. 
Chunk addresses are written in blocks of 16k, with the current absolute start for the block, and the average (expected) delta per entry. For each chunk the deviation from the delta (actual - expected) is written. : Sorted: a mapping of ordinals to deduplicated terms is written as Prefix-Compressed Binary, along with the per-document ordinals written using one of the numeric strategies above. : SortedSet: a mapping of ordinals to deduplicated terms is written as Prefix-Compressed Binary, an ordinal list and per-document index into this list are written using the numeric strategies above. Files: .dvd: DocValues data .dvm: DocValues metadata The DocValues metadata or .dvm file. For DocValues field, this stores metadata, such as the offset into the DocValues data (.dvd) DocValues metadata (.dvm) --> Header,<Entry>NumFields,Footer Entry --> NumericEntry | BinaryEntry | SortedEntry | SortedSetEntry NumericEntry --> GCDNumericEntry | TableNumericEntry | DeltaNumericEntry GCDNumericEntry --> NumericHeader,MinValue,GCD TableNumericEntry --> NumericHeader,TableSize,Int64 () TableSize DeltaNumericEntry --> NumericHeader NumericHeader --> FieldNumber,EntryType,NumericType,MissingOffset,PackedVersion,DataOffset,Count,BlockSize BinaryEntry --> FixedBinaryEntry | VariableBinaryEntry | PrefixBinaryEntry FixedBinaryEntry --> BinaryHeader VariableBinaryEntry --> BinaryHeader,AddressOffset,PackedVersion,BlockSize PrefixBinaryEntry --> BinaryHeader,AddressInterval,AddressOffset,PackedVersion,BlockSize BinaryHeader --> FieldNumber,EntryType,BinaryType,MissingOffset,MinLength,MaxLength,DataOffset SortedEntry --> FieldNumber,EntryType,BinaryEntry,NumericEntry SortedSetEntry --> EntryType,BinaryEntry,NumericEntry,NumericEntry FieldNumber,PackedVersion,MinLength,MaxLength,BlockSize,ValueCount --> VInt ( EntryType,CompressionType --> Byte ( Header --> CodecHeader () MinValue,GCD,MissingOffset,AddressOffset,DataOffset --> Int64 () TableSize --> vInt () Footer --> CodecFooter () Sorted fields have two entries: a with the value metadata, and an ordinary for the document-to-ord metadata. SortedSet fields have three entries: a with the value metadata, and two s for the document-to-ord-index and ordinal list metadata. FieldNumber of -1 indicates the end of metadata. EntryType is a 0 () or 1 () DataOffset is the pointer to the start of the data in the DocValues data (.dvd) NumericType indicates how Numeric values will be compressed: 0 --> delta-compressed. For each block of 16k integers, every integer is delta-encoded from the minimum value within the block. 1 --> gcd-compressed. When all integers share a common divisor, only quotients are stored using blocks of delta-encoded ints. 2 --> table-compressed. When the number of unique numeric values is small and it would save space, a lookup table of unique values is written, followed by the ordinal for each document. BinaryType indicates how Binary values will be stored: 0 --> fixed-width. All values have the same length, addressing by multiplication. 1 --> variable-width. An address for each value is stored. 2 --> prefix-compressed. An address to the start of every interval'th value is stored. MinLength and MaxLength represent the min and max byte[] value lengths for Binary values. If they are equal, then all values are of a fixed size, and can be addressed as DataOffset + (docID * length). Otherwise, the binary values are of variable size, and packed integer metadata (PackedVersion,BlockSize) is written for the addresses. 
MissingOffset points to a containing a bitset of all documents that had a value for the field. If its -1, then there are no missing values. Checksum contains the CRC32 checksum of all bytes in the .dvm file up until the checksum. this is used to verify integrity of the file on opening the index. The DocValues data or .dvd file. For DocValues field, this stores the actual per-document data (the heavy-lifting) DocValues data (.dvd) --> Header,<NumericData | BinaryData | SortedData>NumFields,Footer NumericData --> DeltaCompressedNumerics | TableCompressedNumerics | GCDCompressedNumerics BinaryData --> Byte () DataLength,Addresses SortedData --> FST<Int64> () DeltaCompressedNumerics --> BlockPackedInts(blockSize=16k) () TableCompressedNumerics --> PackedInts () GCDCompressedNumerics --> BlockPackedInts(blockSize=16k) () Addresses --> MonotonicBlockPackedInts(blockSize=16k) () Footer --> CodecFooter () SortedSet entries store the list of ordinals in their BinaryData as a sequences of increasing vLongs (), delta-encoded. @lucene.experimental Sole Constructor Reader for . Expert: instantiates a new reader. Returns an address instance for variable-length binary values. @lucene.internal Returns an address instance for prefix-compressed binary values. @lucene.internal Returns an address instance for sortedset ordinal lists. @lucene.internal Metadata entry for a numeric docvalues field. Offset to the bitset representing docsWithField, or -1 if no documents have missing values. Offset to the actual numeric values. Packed s version used to encode these numerics. NOTE: This was packedIntsVersion (field) in Lucene Count of values written. Packed s blocksize. Metadata entry for a binary docvalues field. Offset to the bitset representing docsWithField, or -1 if no documents have missing values. Offset to the actual binary values. Count of values written. Offset to the addressing data that maps a value to its slice of the . Interval of shared prefix chunks (when using prefix-compressed binary). Packed ints version used to encode addressing information. NOTE: This was packedIntsVersion (field) in Lucene. Packed ints blocksize. Metadata entry for a sorted-set docvalues field. NOTE: This was LongBinaryDocValues in Lucene. Implements the Lucene 4.6 index format, with configurable per-field postings and docvalues formats. If you want to reuse functionality of this codec in another codec, extend . See package documentation for file format details. @lucene.experimental Sole constructor. Returns the postings format that should be used for writing new segments of . The default implementation always returns "Lucene41" Returns the docvalues format that should be used for writing new segments of . The default implementation always returns "Lucene45" Lucene 4.6 Field Infos format. Field names are stored in the field info file, with suffix .fnm. FieldInfos (.fnm) --> Header,FieldsCount, <FieldName,FieldNumber, FieldBits,DocValuesBits,DocValuesGen,Attributes> FieldsCount,Footer Data types: Header --> CodecHeader () FieldsCount --> VInt () FieldName --> String () FieldBits, DocValuesBits --> Byte () FieldNumber --> VInt () Attributes --> IDictionary<String,String> () DocValuesGen --> Int64 () Footer --> CodecFooter () Field Descriptions: FieldsCount: the number of fields in this file. FieldName: name of the field as a UTF-8 string. FieldNumber: the field's number. Note that unlike previous versions of Lucene, the fields are not numbered implicitly by their order in the file, instead explicitly. 
FieldBits: a containing field options. The low-order bit is one for indexed fields, and zero for non-indexed fields. The second lowest-order bit is one for fields that have term vectors stored, and zero for fields without term vectors. If the third lowest order-bit is set (0x4), offsets are stored into the postings list in addition to positions. Fourth bit is unused. If the fifth lowest-order bit is set (0x10), norms are omitted for the indexed field. If the sixth lowest-order bit is set (0x20), payloads are stored for the indexed field. If the seventh lowest-order bit is set (0x40), term frequencies and positions omitted for the indexed field. If the eighth lowest-order bit is set (0x80), positions are omitted for the indexed field. DocValuesBits: a containing per-document value types. The type recorded as two four-bit integers, with the high-order bits representing norms options, and the low-order bits representing options. Each four-bit integer can be decoded as such: 0: no DocValues for this field. 1: . () 2: . () 3: . () DocValuesGen is the generation count of the field's . If this is -1, there are no updates to that field. Anything above zero means there are updates stored by . Attributes: a key-value map of codec-private attributes. @lucene.experimental Sole constructor. Extension of field infos Lucene 4.6 FieldInfos reader. @lucene.experimental Sole constructor. Lucene 4.6 FieldInfos writer. @lucene.experimental Sole constructor. Lucene 4.6 Segment info format. Files: .si: Header, SegVersion, SegSize, IsCompoundFile, Diagnostics, Files, Footer Data types: Header --> CodecHeader () SegSize --> Int32 () SegVersion --> String () Files --> ISet<String> () Diagnostics --> IDictionary<String,String> () IsCompoundFile --> Int8 () Footer --> CodecFooter () Field Descriptions: SegVersion is the code version that created the segment. SegSize is the number of documents contained in the segment index. IsCompoundFile records whether the segment is written as a compound file or not. If this is -1, the segment is not a compound file. If it is 1, the segment is a compound file. The Diagnostics Map is privately written by , as a debugging aid, for each segment it creates. It includes metadata like the current Lucene version, OS, .NET/Java version, why the segment was created (merge, flush, addIndexes), etc. Files is a list of files referred to by this segment. @lucene.experimental Sole constructor. File extension used to store . Lucene 4.6 implementation of . @lucene.experimental Sole constructor. Lucene 4.0 implementation of . @lucene.experimental Sole constructor. Save a single segment's info. Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging). @lucene.experimental Sole constructor. Gets or Sets the , which is used to re-map document IDs. How many sub-readers we are merging. Returns sub-readers we are merging. Exposes flex API, merged from flex API of sub-segments, remapping docIDs (this is used for segment merging). @lucene.experimental Sole constructor. Sets the , which is used to re-map document IDs. How many sub-readers we are merging. Returns sub-readers we are merging. This abstract class reads skip lists with multiple levels. See for the information about the encoding of the multi level skip lists. Subclasses must implement the abstract method which defines the actual format of the skip data. @lucene.experimental The maximum number of skip levels possible for this index. SkipStream for each level. 
The start pointer of each skip level. SkipInterval of each level. Number of docs skipped per level. Doc id of current skip entry per level. Doc id of last read skip entry with docId <= target. Child pointer of current skip entry per level. childPointer of last read skip entry with docId <= target. Creates a . Creates a , where and are the same. Returns the id of the doc to which the last call of has skipped. Skips entries to the first beyond the current whose document number is greater than or equal to . Returns the current doc count. Seeks the skip entry on the given level. Disposes all resources used by this object. Disposes all resources used by this object. Subclasses may override to dispose their own resources. Initializes the reader, for reuse on a new term. Loads the skip levels. Subclasses must implement the actual skip data encoding in this method. The level the skip data shall be read from. The skip stream to read from. Copies the values of the last read skip entry on this . Used to buffer the top skip levels. This abstract class writes skip lists with multiple levels. Example for skipInterval = 3: c (skip level 2) c c c (skip level 1) x x x x x x x x x x (skip level 0) d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d (posting list) 3 6 9 12 15 18 21 24 27 30 (df) d - document x - skip data c - skip data with child pointer Skip level i contains every skipInterval-th entry from skip level i-1. Therefore the number of entries on level i is: floor(df / (skipInterval^(i + 1))). Each skip entry on a level i>0 contains a pointer to the corresponding skip entry in list i-1. This guarantees a logarithmic amount of skips to find the target document. While this class takes care of writing the different skip levels, subclasses must define the actual format of the skip data. @lucene.experimental Number of levels in this skip list. The skip interval in the list with level = 0. SkipInterval used for level > 0. For every skip level a different buffer is used. Creates a . Creates a , where and are the same. Allocates internal skip buffers. Creates new buffers or empties the existing ones. Subclasses must implement the actual skip data encoding in this method. The level the skip data shall be written for. The skip buffer to write to. Writes the current skip data to the buffers. The current document frequency determines the max level the skip data is to be written to. The current document frequency. If an I/O error occurs. Writes the buffered skip lists to the given output. The output the skip lists shall be written to. The pointer at which the skip list starts. Encodes/decodes per-document score normalization values. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns a to write norms to the index. Returns a to read norms from the index. NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an should be thrown by the implementation. are expected and will automatically cause a retry of the segment opening logic with the newly revised segments. Enables per field docvalues support. Note, when extending this class, the name () is written into the index. In order for the field to be read, the name must resolve to your implementation via . This method uses to resolve format names. See for information about how to implement your own .
Files written by each docvalues format have an additional suffix containing the format name. For example, in a per-field configuration instead of _1.dat filenames would look like _1_Lucene40_0.dat. @lucene.experimental attribute name used to store the format name for each field. attribute name used to store the segment suffix name for each field. Sole constructor. Returns the doc values format that should be used for writing new segments of . The field to format mapping is written to the index, so this method is only invoked when writing, not when reading. Enables per field postings support. Note, when extending this class, the name () is written into the index. In order for the field to be read, the name must resolve to your implementation via . This method uses to resolve format names. See for information about how to implement your own . Files written by each posting format have an additional suffix containing the format name. For example, in a per-field configuration instead of _1.prx filenames would look like _1_Lucene40_0.prx. @lucene.experimental attribute name used to store the format name for each field. attribute name used to store the segment suffix name for each field. Sole constructor. Returns the postings format that should be used for writing new segments of . The field to format mapping is written to the index, so this method is only invoked when writing, not when reading. Provides a and . @lucene.experimental Unique name that's used to retrieve this codec when reading the index. Sole constructor. Creates the for this format. Creates the for this format. Abstract API that consumes postings for an individual term. The lifecycle is: PostingsConsumer is returned for each term by . is called for each document where the term occurs, specifying id and term frequency for that document. If positions are enabled for the field, then will be called for each occurrence in the document. is called when the producer is done adding positions to the document. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Adds a new doc in this term. will be -1 when term frequencies are omitted for the field. Add a new position & payload, and start/end offset. A null means no payload; a non-null with zero length also means no payload. Caller may reuse the for the between calls (method must fully consume the payload). and will be -1 when offsets are not indexed. Called when we are done adding positions & payloads for each doc. Default merge impl: append documents, mapping around deletes. Encodes/decodes terms, postings, and proximity data. Note, when extending this class, the name () may written into the index in certain configurations. In order for the segment to be read, the name must resolve to your implementation via . This method uses to resolve format names. If you implement your own format: Subclass this class. Subclass , override , and add the line base.ScanForPostingsFormats(typeof(YourPostingsFormat).Assembly). If you have any format classes in your assembly that are not meant for reading, you can add the to them so they are ignored by the scan. Set the new by calling at application startup. If your format has dependencies, you may also override to inject them via pure DI or a DI container. See DI-Friendly Framework to understand the approach used. PostingsFormat Names Unlike the Java version, format names are by default convention-based on the class name. 
If you name your custom format class "MyCustomPostingsFormat", the codec name will the same name without the "PostingsFormat" suffix: "MyCustom". You can override this default behavior by using the to name the format differently than this convention. Format names must be all ASCII alphanumeric, and less than 128 characters in length. @lucene.experimental Zero-length array. Unique name that's used to retrieve this format when reading the index. Sets the instance used to instantiate subclasses. The new . The parameter is null. Gets the associated factory. The factory. Creates a new postings format. The provided name will be written into the index segment in some configurations (such as when using ): in such configurations, for the segment to be read this class should be registered by subclassing and calling in the class constructor. The new can be registered by calling at application startup. Returns this posting format's name. Writes a new segment. Reads a segment. NOTE: by the time this call returns, it must hold open any files it will need to use; else, those files may be deleted. Additionally, required files may be deleted during the execution of this call before there is a chance to open them. Under these circumstances an should be thrown by the implementation. s are expected and will automatically cause a retry of the segment opening logic with the newly revised segments. Looks up a format by name. Returns a list of all available format names. The core terms dictionaries (BlockTermsReader, ) interact with a single instance of this class to manage creation of and instances. It provides an (termsIn) where this class may read any previously stored data that it had written in its corresponding at indexing time. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Performs any initialization, such as reading and verifying the header from the provided terms dictionary . Return a newly created empty . Actually decode metadata for next term. Must fully consume state, since after this call that may be reused. Must fully consume state, since after this call that may be reused. Returns approximate RAM bytes used. Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Extension of to support pluggable term dictionaries. This class contains additional hooks to interact with the provided term dictionaries such as . If you want to re-use an existing implementation and are only interested in customizing the format of the postings list, extend this class instead. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Called once after startup, before any terms have been added. Implementations typically write a header to the provided . Return a newly created empty Start a new term. Note that a matching call to is done, only if the term has at least one document. Finishes the current term. The provided contains the term's summary statistics, and will holds metadata from PBF when returned. Encode metadata as and . controls whether current term is delta encoded according to latest term. Usually elements in are file pointers, so each one always increases when a new term is consumed. is used to write generic bytes, which are not monotonic. 
NOTE: sometimes might contain "don't care" values that are unused, e.g. the pointer to postings list may not be defined for some terms but is defined for others, if it is designed to inline some postings data in term dictionary. In this case, the postings writer should always use the last value, so that each element in metadata remains monotonic. Sets the current field for writing, and returns the fixed length of metadata (which is fixed per field), called when the writing switches to another field. Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Expert: Controls the format of the (segment metadata file). @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the for reading instances. Returns the for writing instances. Specifies an API for classes that can read information. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Read data from a directory. Directory to read from. Name of the segment to read. IO context. Infos instance to be populated with data. If an I/O error occurs. Specifies an API for classes that can write out data. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Write data. If an I/O error occurs. Controls the format of stored fields. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns a to load stored fields. Returns a to write stored fields. Codec API for reading stored fields. You need to implement to read the stored fields for a document, implement (creating clones of any s used, etc), and to cleanup any allocated resources. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Visit the stored fields for document . Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Returns approximate RAM bytes used. Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal Codec API for writing stored fields: For every document, is called, informing the Codec how many fields will be written. is called for each field in the document. After all documents have been written, is called for verification/sanity-checks. Finally the writer is disposed () @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Called before writing the stored fields of the document. will be called times. Note that this is called even if the document has no stored fields, in this case will be zero. Called when a document and all its fields have been added. Writes a single stored field. Aborts writing entirely, implementation should remove any partially-written files, etc. Called before , passing in the number of documents that were written. Note that this is intentionally redundant (equivalent to the number of calls to , but a should check that this is the case to detect the bug described in LUCENE-1282. Merges in the stored fields from the readers in . The default implementation skips over deleted documents, and uses , , and , returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc). Sugar method for + for every stored field in the document. 
Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Abstract API that consumes terms for an individual field. The lifecycle is: TermsConsumer is returned for each field by . TermsConsumer returns a for each term in . When the producer (e.g. IndexWriter) is done adding documents for the term, it calls , passing in the accumulated term statistics. Producer calls with the accumulated collection statistics when it is finished adding terms to the field. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Starts a new term in this field; this may be called with no corresponding call to finish if the term had no docs. Finishes the current term; numDocs must be > 0. stats.TotalTermFreq will be -1 when term frequencies are omitted for the field. Called when we are done adding terms to this field. will be -1 when term frequencies are omitted for the field. Gets the used to sort terms before feeding to this API. Default merge impl. Holder for per-term statistics. How many documents have at least one occurrence of this term. Total number of times this term occurs across all documents in the field. Sole constructor. Controls the format of term vectors. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns a to read term vectors. Returns a to write term vectors. Codec API for reading term vectors: @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns term vectors for this document, or null if term vectors were not indexed. If offsets are available they are in an available from the . Returns approximate RAM bytes used. Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal Create a clone that one caller at a time may use to read term vectors. Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Codec API for writing term vectors: For every document, is called, informing the how many fields will be written. is called for each field in the document, informing the codec how many terms will be written for that field, and whether or not positions, offsets, or payloads are enabled. Within each field, is called for each term. If offsets and/or positions are enabled, then will be called for each term occurrence. After all documents have been written, is called for verification/sanity-checks. Finally the writer is disposed () @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Called before writing the term vectors of the document. will be called times. Note that if term vectors are enabled, this is called even if the document has no vector fields, in this case will be zero. Called after a doc and all its fields have been added. Called before writing the terms of the field. will be called times. Called after a field and all its terms have been added. Adds a and its term frequency . If this field has positions and/or offsets enabled, then will be called times respectively. Called after a term and all its positions have been added. Adds a term and offsets. Aborts writing entirely, implementation should remove any partially-written files, etc. Called before , passing in the number of documents that were written. 
Note that this is intentionally redundant (equivalent to the number of calls to , but a should check that this is the case to detect the bug described in LUCENE-1282. Called by when writing new segments. This is an expert API that allows the codec to consume positions and offsets directly from the indexer. The default implementation calls , but subclasses can override this if they want to efficiently write all the positions, then all the offsets, for example. NOTE: this API is extremely expert and subject to change or removal!!! @lucene.internal Merges in the term vectors from the readers in . The default implementation skips over deleted documents, and uses , , , , and , returning the number of documents that were written. Implementations can override this method for more sophisticated merging (bulk-byte copying, etc). Safe (but, slowish) default method to write every vector field in the document. Return the used to sort terms before feeding to this API. Disposes all resources used by this object. Implementations must override and should dispose all resources used by this instance. Represents an attribute that is used to name a , if a name other than the default naming convention is desired. Implements the default functionality of . To replace the instance, call at application start up. can be subclassed or passed additional parameters to register additional codecs, inject dependencies, or change caching behavior, as shown in the following examples. Alternatively, can be implemented to provide complete control over codec creation and lifetimes.

    Register Additional Codecs

    Additional codecs can be added by initializing the instance of and passing an array of -derived types.

        // Register the factory at application start up.
        Codec.SetCodecFactory(new DefaultCodecFactory
        {
            CustomCodecTypes = new Type[] { typeof(MyCodec), typeof(AnotherCodec) }
        });

    Only Use Explicitly Defined Codecs

    can be used to explicitly add codec types. In this example, the call to base.Initialize() is excluded to skip the built-in codec registration. Since AnotherCodec doesn't have a default constructor, the method is overridden to supply the required parameters.

        public class ExplicitCodecFactory : DefaultCodecFactory
        {
            protected override void Initialize()
            {
                // Load specific codecs in a specific order.
                PutCodecType(typeof(MyCodec));
                PutCodecType(typeof(AnotherCodec));
            }

            protected override Codec NewCodec(Type type)
            {
                // Special case: AnotherCodec has a required dependency
                if (typeof(AnotherCodec).Equals(type))
                    return new AnotherCodec(new SomeDependency());

                return base.NewCodec(type);
            }
        }

        // Register the factory at application start up.
        Codec.SetCodecFactory(new ExplicitCodecFactory());

    See the namespace documentation for more examples of how to inject dependencies into subclasses.

    Use Reflection to Scan an Assembly for Codecs

    or can be used to scan assemblies using .NET Reflection for codec types and add all subclasses that are found automatically. This example calls base.Initialize() to load the default codecs prior to scanning for additional codecs.

        public class ScanningCodecFactory : DefaultCodecFactory
        {
            protected override void Initialize()
            {
                // Load all default codecs
                base.Initialize();

                // Load all of the codecs inside of the same assembly that MyCodec is defined in
                ScanForCodecs(typeof(MyCodec).Assembly);
            }
        }

        // Register the factory at application start up.
        Codec.SetCodecFactory(new ScanningCodecFactory());

    Codecs in the target assembly (or assemblies) can be excluded from the scan by decorating them with the .
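    Once a factory is registered, a codec is typically resolved by its registered name. The following is a minimal sketch only; MyCodec is a hypothetical custom codec whose default registered name would be "My" (the class name minus the "Codec" suffix, as described below), and analyzer stands in for any Analyzer instance.

        // Register the scanning factory once, at application start up.
        Codec.SetCodecFactory(new ScanningCodecFactory());

        // Later, resolve the codec by its registered name and use it for indexing.
        Codec codec = Codec.ForName("My"); // hypothetical MyCodec
        var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer) // analyzer: assumed
        {
            Codec = codec
        };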
    Creates a new instance of . An array of custom -derived types to be registered. This property can be initialized during construction of to make your custom codecs known to Lucene. These types will be registered after the default Lucene types, so if a custom type has the same name as a Lucene (via ) the custom type will replace the Lucene type with the same name. Initializes the codec type cache with the known types. Override this method (and optionally call base.Initialize()) to add your own types by calling or . If two types have the same name by using the , the last one registered wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. A list of assemblies to scan. The assemblies will be scanned from first to last, and the last match for each name wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. The assembly to scan. Adds a type to the , using the name provided in the , if present, or the name of the codec class minus the "Codec" suffix as the name by default. Note that if a with the same name already exists in the map, calling this method will update it to the new type. A type that subclasses . Gets the instance from the provided . The name of the instance to retrieve. The instance. Gets the instance from the provided . The of to retrieve. The instance. Instantiates a based on the provided . The of to instantiate. The new instance. Gets the from the provided . The name of the to retrieve. The . Gets a list of the available s (by name). A of names. Implements the default functionality of . To replace the instance, call at application start up. can be subclassed or passed additional parameters to register additional codecs, inject dependencies, or change caching behavior, as shown in the following examples. Alternatively, can be implemented to provide complete control over doc values format creation and lifetimes.

    Register Additional DocValuesFormats

    Additional doc values formats can be added by initializing the instance of and passing an array of -derived types.

        // Register the factory at application start up.
        DocValuesFormat.SetDocValuesFormatFactory(new DefaultDocValuesFormatFactory
        {
            CustomDocValuesFormatTypes = new Type[] { typeof(MyDocValuesFormat), typeof(AnotherDocValuesFormat) }
        });

    Only Use Explicitly Defined DocValuesFormats

    can be used to explicitly add doc values format types. In this example, the call to base.Initialize() is excluded to skip the built-in doc values format registration. Since AnotherDocValuesFormat doesn't have a default constructor, the method is overridden to supply the required parameters.

        public class ExplicitDocValuesFormatFactory : DefaultDocValuesFormatFactory
        {
            protected override void Initialize()
            {
                // Load specific codecs in a specific order.
                PutDocValuesFormatType(typeof(MyDocValuesFormat));
                PutDocValuesFormatType(typeof(AnotherDocValuesFormat));
            }

            protected override DocValuesFormat NewDocValuesFormat(Type type)
            {
                // Special case: AnotherDocValuesFormat has a required dependency
                if (typeof(AnotherDocValuesFormat).Equals(type))
                    return new AnotherDocValuesFormat(new SomeDependency());

                return base.NewDocValuesFormat(type);
            }
        }

        // Register the factory at application start up.
        DocValuesFormat.SetDocValuesFormatFactory(new ExplicitDocValuesFormatFactory());

    See the namespace documentation for more examples of how to inject dependencies into subclasses.

    Use Reflection to Scan an Assembly for DocValuesFormats

    or can be used to scan assemblies using .NET Reflection for doc values format types and add all subclasses that are found automatically.

        public class ScanningDocValuesFormatFactory : DefaultDocValuesFormatFactory
        {
            protected override void Initialize()
            {
                // Load all default codecs
                base.Initialize();

                // Load all of the codecs inside of the same assembly that MyDocValuesFormat is defined in
                ScanForDocValuesFormats(typeof(MyDocValuesFormat).Assembly);
            }
        }

        // Register the factory at application start up.
        DocValuesFormat.SetDocValuesFormatFactory(new ScanningDocValuesFormatFactory());

    Doc values formats in the target assembly can be excluded from the scan by decorating them with the .
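    As with codecs, a registered doc values format can later be resolved by name. A minimal sketch, assuming a hypothetical MyDocValuesFormat registered by one of the factories above; its default registered name would be "My" (the class name minus the "DocValuesFormat" suffix, as described below).

        // Register the factory once, at application start up.
        DocValuesFormat.SetDocValuesFormatFactory(new ScanningDocValuesFormatFactory());

        // Resolve the format by its registered name.
        DocValuesFormat dvFormat = DocValuesFormat.ForName("My"); // hypothetical MyDocValuesFormat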
    Creates a new instance of . An array of custom -derived types to be registered. This property can be initialized during construction of to make your custom codecs known to Lucene. These types will be registered after the default Lucene types, so if a custom type has the same name as a Lucene (via ) the custom type will replace the Lucene type with the same name. Initializes the doc values type cache with the known types. Override this method (and optionally call base.Initialize()) to add your own types by calling or . If two types have the same name by using the , the last one registered wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. A list of assemblies to scan. The assemblies will be scanned from first to last, and the last match for each name wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. The assembly to scan. Adds a type to the , using the name provided in the , if present, or the name of the codec class minus the "DocValuesFormat" suffix as the name by default. Note that if a with the same name already exists in the map, calling this method will update it to the new type. A type that subclasses . Gets the instance from the provided . The name of the instance to retrieve. The instance. Gets the instance from the provided . The of to retrieve. The instance. Instantiates a based on the provided . The of to instantiate. The new instance. Gets the from the provided . The name of the to retrieve. The . Gets a list of the available s (by name). A of names. Implements the default functionality of . To replace the instance, call at application start up. can be subclassed or passed additional parameters to register additional codecs, inject dependencies, or change caching behavior, as shown in the following examples. Alternatively, can be implemented to provide complete control over postings format creation and lifetimes.

    Register Additional PostingsFormats

    Additional postings formats can be added by initializing the instance of and passing an array of -derived types.

        // Register the factory at application start up.
        PostingsFormat.SetPostingsFormatFactory(new DefaultPostingsFormatFactory
        {
            CustomPostingsFormatTypes = new Type[] { typeof(MyPostingsFormat), typeof(AnotherPostingsFormat) }
        });

    Only Use Explicitly Defined PostingsFormats

    can be used to explicitly add postings format types. In this example, the call to base.Initialize() is excluded to skip the built-in postings format registration. Since AnotherPostingsFormat doesn't have a default constructor, the method is overridden to supply the required parameters.

        public class ExplicitPostingsFormatFactory : DefaultPostingsFormatFactory
        {
            protected override void Initialize()
            {
                // Load specific codecs in a specific order.
                PutPostingsFormatType(typeof(MyPostingsFormat));
                PutPostingsFormatType(typeof(AnotherPostingsFormat));
            }

            protected override PostingsFormat NewPostingsFormat(Type type)
            {
                // Special case: AnotherPostingsFormat has a required dependency
                if (typeof(AnotherPostingsFormat).Equals(type))
                    return new AnotherPostingsFormat(new SomeDependency());

                return base.NewPostingsFormat(type);
            }
        }

        // Register the factory at application start up.
        PostingsFormat.SetPostingsFormatFactory(new ExplicitPostingsFormatFactory());

    See the namespace documentation for more examples of how to inject dependencies into subclasses.

    Use Reflection to Scan an Assembly for PostingsFormats

    or can be used to scan assemblies using .NET Reflection for postings format types and add all subclasses that are found automatically.

        public class ScanningPostingsFormatFactory : DefaultPostingsFormatFactory
        {
            protected override void Initialize()
            {
                // Load all default codecs
                base.Initialize();

                // Load all of the codecs inside of the same assembly that MyPostingsFormat is defined in
                ScanForPostingsFormats(typeof(MyPostingsFormat).Assembly);
            }
        }

        // Register the factory at application start up.
        PostingsFormat.SetPostingsFormatFactory(new ScanningPostingsFormatFactory());

    Postings formats in the target assembly can be excluded from the scan by decorating them with the .
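    A common way to actually put a custom postings format to work is per field, by subclassing one of the built-in codecs. The following is a sketch only: it assumes that Lucene46Codec exposes an overridable GetPostingsFormatForField method (as in Java Lucene 4.x) and that a hypothetical MyPostingsFormat has been registered as shown above under the name "My".

        public class PerFieldPostingsCodec : Lucene46Codec
        {
            public override PostingsFormat GetPostingsFormatForField(string field)
            {
                // Use the custom postings format only for the "id" field.
                if (field == "id")
                    return PostingsFormat.ForName("My"); // hypothetical MyPostingsFormat

                return base.GetPostingsFormatForField(field);
            }
        }

        // Assign the codec when configuring the IndexWriter.
        indexWriterConfig.Codec = new PerFieldPostingsCodec();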
    Creates a new instance of . An array of custom -derived types to be registered. This property can be initialized during construction of to make your custom codecs known to Lucene. These types will be registered after the default Lucene types, so if a custom type has the same name as a Lucene (via ) the custom type will replace the Lucene type with the same name. Initializes the codec type cache with the known types. Override this method (and optionally call base.Initialize()) to add your own types by calling or . If two types have the same name by using the , the last one registered wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. A list of assemblies to scan. The assemblies will be scanned from first to last, and the last match for each name wins. Scans the given for subclasses of and adds their names to the . Note that names will be automatically overridden if the name appears multiple times - the last match wins. The assembly to scan. Adds a type to the , using the name provided in the , if present, or the name of the codec class minus the "Codec" suffix as the name by default. Note that if a with the same name already exists in the map, calling this method will update it to the new type. A type that subclasses . Gets the instance from the provided . The name of the instance to retrieve. The instance. Gets the instance from the provided . The of to retrieve. The instance. Instantiates a based on the provided . The of to instantiate. The new instance. Gets the from the provided . The name of the to retrieve. The . Gets a list of the available s (by name). A of names. Represents an attribute that is used to name a , if a name other than the default naming convention is desired. When placed on a class that subclasses , adding this attribute will exclude the type from consideration in the method. However, the type can still be added manually using . When placed on a class that subclasses , adding this attribute will exclude the type from consideration in the method. However, the type can still be added manually using . When placed on a class that subclasses , adding this attribute will exclude the type from consideration in the method. However, the type can still be added manually using . Contract for extending the functionality of implementations so they can be injected with dependencies. To set the , call . See the namespace documentation for some common usage examples. Gets the instance from the provided . The name of the instance to retrieve. The instance. Contract for extending the functionality of implementations so they can be injected with dependencies. To set the , call . See the namespace documentation for some common usage examples. Gets the instance from the provided . The name of the instance to retrieve. The instance. Contract for extending the functionality of implementations so they can be injected with dependencies. To set the , call . See the namespace documentation for some common usage examples. Gets the instance from the provided . The name of the instance to retrieve. The instance. Represents an attribute that is used to name a , if a name other than the default naming convention is desired. Field that stores a per-document value. The values are stored directly with no sharing, which is a good fit when the fields don't share (many) values, such as a title field. If values may be shared and sorted it's better to use . 
Here's an example usage: document.Add(new BinaryDocValuesField(name, new BytesRef("hello"))); If you also need to store the value, you should add a separate instance. Type for straight bytes . Type for straight bytes . Create a new binary field. field name binary content if the field is null. Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new ByteDocValuesField(name, (byte) 22)); If you also need to store the value, you should add a separate instance. Creates a new field with the specified 8-bit byte value field name 8-bit byte value if the field name is null. Simple utility class providing static methods to compress and decompress binary data for stored fields. this class uses the class to compress and decompress. Compresses the specified range using the specified . Compresses the specified range, with default level Compresses all s in the array, with default level Compresses the value, with default level Compresses the value using the specified . Decompress the array previously returned by compress (referenced by the provided ) Decompress the array previously returned by compress Decompress the array previously returned by compress Decompress the array previously returned by back into a Decompress the array previously returned by back into a Decompress the array (referenced by the provided ) previously returned by back into a Provides support for converting dates to strings and vice-versa. The strings are structured so that lexicographic sorting orders them by date, which makes them suitable for use as field values and search terms. This class also helps you to limit the resolution of your dates. Do not save dates with a finer resolution than you really need, as then and will require more memory and become slower. Another approach is , which provides a sortable binary representation (prefix encoded) of numeric values, which date/time are. For indexing a , just get the from and index this as a numeric value with and use to query it. Returns the date format string for the specified or null if the resolution is invalid. Converts a to a string suitable for indexing using the specified . The is converted according to its property to the Universal Coordinated Time (UTC) prior to rounding to the the specified . If is , is assumed. The date to be converted. The desired resolution, see . An invariant string in format yyyyMMddHHmmssSSS or shorter, depending on ; using UTC as the timezone. is not defined in the enum. Converts a to a string suitable for indexing using the specified and . The is converted from the specified to Universal Coordinated Time (UTC) prior to rounding to the the specified . The date to be converted. The time zone of the specified . The desired resolution, see . An invariant string in format yyyyMMddHHmmssSSS or shorter, depending on ; using UTC as the timezone. is null. is not defined in the enum. Converts a to a string suitable for indexing using the specified . The is converted using its property. The date to be converted. The desired resolution, see . An invariant string in format yyyyMMddHHmmssSSS or shorter, depending on ; using UTC as the timezone. Converts from a numeric representation of a time to a string suitable for indexing. NOTE: For compatibility with Lucene.NET 3.0.3 and Lucene.NET 4.8.0-beta00001 through 4.8.0-beta00015 specify as . The ticks that represent the date to be converted. The desired resolution, see . The numeric representation of . 
An invariant string in format yyyyMMddHHmmssSSS or shorter, depending on ; using GMT as timezone. is not defined in the enum. Converts a string produced by or back to a time, represented as a . NOTE: For compatibility with Lucene.NET 3.0.3 and Lucene.NET 4.8.0-beta00001 through 4.8.0-beta00015 specify as . The date string to be converted. The numeric representation of the return value. A numeric representation of represented as specified by . is not in the expected format. is null. is not defined in the enum. Converts a string produced by or back to a time, represented as a object. the date string to be converted the parsed time as a object if is not in the expected format is null. Limit a date's resolution. For example, the date 2004-09-21 13:50:11 will be changed to 2004-09-01 00:00:00 when using . The to be rounded. The desired resolution of the to be returned. The with all values more precise than set to their minimum value (0 or 1 depending on the field). is not defined in the enum. Limit a date's resolution. For example, the time 1095774611000 (which represents 2004-09-21 13:50:11) will be changed to 1093996800000 (2004-09-01 00:00:00) when using and for both and . The ticks 632313714110000000 (which represents 2004-09-21 13:50:11) will be changed to 632295936000000000 (2004-09-01 00:00:00) when using and for both and . NOTE: For compatibility with Lucene.NET 3.0.3 and Lucene.NET 4.8.0-beta00001 through 4.8.0-beta00015 specify as and as . The ticks that represent the date to be rounded. The desired resolution of the date to be returned. The numeric representation of . The numeric representation of the return value. The date with all values more precise than set to their minimum value (0 or 1 depending on the field). The return value is expressed in ticks. is not defined in the enum. -or- is not defined in the enum. -or- is not defined in the enum. Converts from .NET ticks to the number of milliseconds since January 1, 1970, 00:00:00 UTC (also known as the "epoch"). This is the value that is stored in Java Lucene indexes and can be used for storing values that can be read by Java Lucene. The .NET ticks to be converted. The converted ticks to number of milliseconds since January 1, 1970, 00:00:00 UTC (also known as the "epoch"). Converts from the number of milliseconds since January 1, 1970, 00:00:00 UTC (also known as the "epoch") to .NET ticks. The number of milliseconds since January 1, 1970, 00:00:00 UTC (also known as the "epoch") to be converted. The converted .NET ticks that can be used to create a or . Specifies the time granularity. Limit a date's resolution to year granularity. Limit a date's resolution to month granularity. Limit a date's resolution to day granularity. Limit a date's resolution to hour granularity. Limit a date's resolution to minute granularity. Limit a date's resolution to second granularity. Limit a date's resolution to millisecond granularity. Specifies how a time will be represented as a . The number of milliseconds since January 1, 1970, 00:00:00 UTC (also known as the "epoch"). This is the format that Lucene uses, and it is recommended to store this value in the index for compatibility. The .NET ticks representing a date. Specify this to pass the raw ticks from or to instantiate a new from the result. .NET ticks as total milliseconds. Input values must be converted using the formula ticks / . Output values can be converted to ticks using ticks * . 
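A minimal sketch of the round-trip conversions described above. The member names used here (DateTools.DateToString, DateTools.StringToDate, DateTools.Round and the DateResolution values) are assumptions; check the API reference for the exact signatures in your version.

    // Index a date with day resolution; lexicographic order matches chronological order.
    string indexed = DateTools.DateToString(DateTime.UtcNow, DateResolution.DAY); // e.g. "20240315"

    // Convert the indexed string back into a DateTime.
    DateTime roundTripped = DateTools.StringToDate(indexed);

    // Round an existing date down to month resolution before indexing.
    DateTime rounded = DateTools.Round(new DateTime(2004, 9, 21, 13, 50, 11), DateResolution.MONTH);
    // rounded is 2004-09-01 00:00:00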
This option is provided for compatibility with Lucene.NET 3.0.3 and Lucene.NET 4.8.0-beta00001 through 4.8.0-beta00015, since it was the only option for input representation. Field that stores a per-document value. Here's an example usage: document.Add(new DerefBytesDocValuesField(name, new BytesRef("hello"))); If you also need to store the value, you should add a separate instance. Type for bytes : all with the same length Type for bytes : can have variable lengths Create a new fixed or variable-length field. field name binary content if the field name is null Create a new fixed or variable length field. field name binary content (ignored) if the field name is null Documents are the unit of indexing and search. A Document is a set of fields. Each field has a name and a textual value. A field may be stored () with the document, in which case it is returned with search hits on the document. Thus each document should typically contain one or more stored fields which uniquely identify it. Note that fields which are not are not available in documents retrieved from the index, e.g. with or . Constructs a new document with no fields. Adds a field to a document. Several fields may be added with the same name. In this case, if the fields are indexed, their text is treated as though appended for the purposes of search. Note that add like the and methods only makes sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added. Removes field with the specified name from the document. If multiple fields exist with this name, this method removes the first field that has been added. If there is no field with the specified name, the document remains unchanged. Note that the and methods like the add method only make sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added. Removes all fields with the given name from the document. If there is no field with the specified name, the document remains unchanged. Note that the and methods like the add method only make sense prior to adding a document to an index. These methods cannot be used to change the content of an existing index! In order to achieve this, a document has to be deleted from an index and a new changed version of that document has to be added. Returns an array of byte arrays for of the fields that have the name specified as the method parameter. This method returns an empty array when there are no matching fields. It never returns null. the name of the field a of binary field values Returns an array of bytes for the first (or only) field that has the name specified as the method parameter. this method will return null if no binary fields with the specified name are available. There may be non-binary fields with the same name. the name of the field. a containing the binary field value or null Returns a field with the given name if any exist in this document, or null. If multiple fields exists with this name, this method returns the first value added. Returns an array of s with the given name. This method returns an empty array when there are no matching fields. It never returns null. the name of the field a array Returns a List of all the fields in a document. 
Note that fields which are not stored are not available in documents retrieved from the index, e.g. or . Returns an array of values of the field specified as the method parameter. This method returns an empty array when there are no matching fields. It never returns null. For , , and it returns the string value of the number. If you want the actual numeric field instances back, use . the name of the field a of field values Returns an array of values of the field specified as the method parameter. This method returns an empty array when there are no matching fields. It never returns null. For , , and it returns the string value of the number. If you want the actual numeric field instances back, use . the name of the field A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. a of field values Returns an array of values of the field specified as the method parameter. This method returns an empty array when there are no matching fields. It never returns null. For , , and it returns the string value of the number. If you want the actual numeric field instances back, use . the name of the field An object that supplies culture-specific formatting information.This parameter has no effect if this field is non-numeric. a of field values Returns an array of values of the field specified as the method parameter. This method returns an empty array when there are no matching fields. It never returns null. For , , and it returns the string value of the number. If you want the actual numeric field instances back, use . the name of the field A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. a of field values Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. For , , and it returns the string value of the number. If you want the actual numeric field instance back, use . Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. For , , and it returns the string value of the number. If you want the actual numeric field instance back, use . A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. For , , and it returns the string value of the number. If you want the actual numeric field instance back, use . An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added. If only binary fields with this name exist, returns null. For , , and it returns the string value of the number. If you want the actual numeric field instance back, use . A standard or custom numeric format string. 
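A minimal sketch of reading stored values back from a search hit, using the accessors described above; searcher and hits are assumed to come from a prior query.

    Document doc = searcher.Doc(hits[0].Doc);            // load the stored fields of the first hit
    string title = doc.Get("title");                     // first (or only) string value, or null
    string[] tags = doc.GetValues("tag");                // all values of a repeated field
    IIndexableField priceField = doc.GetField("price");  // the underlying field instance, or null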
This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Prints the fields of a document for human consumption. Prints the fields of a document for human consumption. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Prints the fields of a document for human consumption. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Prints the fields of a document for human consumption. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. A that creates a containing all stored fields, or only specific requested fields provided to . This is used by to load a document. @lucene.experimental Load only fields named in the provided . Set of fields to load, or null (all fields). Load only fields named in the provided fields. Load all stored fields. Retrieve the visited document. Document populated with stored fields. Note that only the stored information in the field instances is valid, data such as boosts, indexing options, term vector options, etc is not set. Syntactic sugar for encoding doubles as via . Per-document double values can be retrieved via . NOTE: In most all cases this will be rather inefficient, requiring eight bytes per document. Consider encoding double values yourself with only as much precision as you require. Creates a new field with the specified 64-bit double value field name 64-bit double value if the field name is null Field that indexes values for efficient range filtering and sorting. Here's an example usage: document.Add(new DoubleField(name, 6.0, Field.Store.NO)); For optimal performance, re-use the and instance for more than one document: DoubleField field = new DoubleField(name, 0.0, Field.Store.NO); Document document = new Document(); document.Add(field); for (all documents) { ... field.SetDoubleValue(value) writer.AddDocument(document); ... } See also , , . To perform range querying or filtering against a , use or . To sort according to a , use the normal numeric sort types, eg . values can also be loaded directly from . You may add the same field name as an to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However sort behavior is not defined. If you need to sort, you should separately index a single-valued . A will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise. Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. 
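To make the range filtering described above concrete, here is a sketch of indexing a DoubleField and querying it with a numeric range. It assumes NumericRangeQuery.NewDoubleRange with the same default precisionStep as the field; see the remarks that follow for how precisionStep is chosen.

    document.Add(new DoubleField("price", 6.0, Field.Store.NO));
    // ... add the document via an IndexWriter ...

    // Match all documents whose "price" lies in [5.0, 10.0], inclusive on both ends.
    Query query = NumericRangeQuery.NewDoubleRange("price", 5.0, 10.0, true, true);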
The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom and invoke the setter if you'd like to change the value. Note that you must also specify a congruent value when creating or . For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use , which produces one term per value. For more information on the internals of numeric trie indexing, including the (precisionStep) configuration, see . The format of indexed values is described in . If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of . this will minimize disk space consumed. More advanced users can instead use directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage. @since 2.9 Type for a that is not stored: normalization factors, frequencies, and positions are omitted. Type for a stored : normalization factors, frequencies, and positions are omitted. Creates a stored or un-stored with the provided value and default precisionStep (4). field name 64-bit value if the content should also be stored if the field name is null. Expert: allows you to customize the . field name 64-bit double value customized field type: must have of . if the field name or type is null, or if the field type does not have a Expert: directly create a field for a document. Most users should use one of the sugar subclasses: , , , , , , , , , . A field is a section of a . Each field has three parts: name, type and value. Values may be text (, or pre-analyzed ), binary (), or numeric (, , , or ). Fields are optionally stored in the index, so that they may be returned with hits on the document. NOTE: the field type is an . Making changes to the state of the will impact any Field it is used in. It is strongly recommended that no changes be made after instantiation. Field's type Field's name Field's value. Field's value Setting this property will automatically set the backing field for the property. Field's numeric data type (or if field non-numeric). Pre-analyzed for indexed fields; this is separate from because you are allowed to have both; eg maybe field has a value but you customize how it's tokenized Field's boost Expert: creates a field with no initial value. Intended only for custom subclasses. field name field type if either the or is null. Create field with value. field name reader value field type if is true, or if is false. if the , or is null Create field with value. field name TokenStream value field type if is true, or if is false, or if is false. if the , or is null Create field with binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. field name byte array pointing to binary content (not copied) field type if the is true the field is null, or if the is null Create field with binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. field name byte array pointing to binary content (not copied) starting position of the byte array valid length of the byte array field type if the is true if the field is null, or the is null Create field with binary value. NOTE: the provided BytesRef is not copied so be sure not to change it until you're done with this field. 
field name BytesRef pointing to binary content (not copied) field type if the is true if the field is null, or the is null Create field with value. field name string value field type if the field's type is neither indexed() nor stored(), or if indexed() is false but storeTermVectors() is true. if either the or is null, or if the is null The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. The for this field to be used when indexing, or null. If null, the value or value is analyzed to produce the indexed tokens. Expert: change the value of this field. This can be used during indexing to re-use a single instance to improve indexing speed by avoiding GC cost of new'ing and reclaiming instances. Typically a single instance is re-used as well. This helps most on small documents. Each instance should only be used once within a single instance. See ImproveIndexingSpeed for details. Expert: change the value of this field. See . Expert: change the value of this field. See . NOTE: the provided is not copied so be sure not to change it until you're done with this field. Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: change the value of this field. See . Expert: sets the token stream to be used for indexing and causes and to return true. May be combined with stored values from or The field's name Gets or sets the boost factor on this field. The default value is 1.0f (no boost). (setter only) if this field is not indexed, or if it omits norms. Gets the of the underlying value, or if the value is not set or non-numeric. Expert: The difference between this property and is this is represents the current state of the field (whether being written or read) and the property represents instructions on how the field will be written, but does not re-populate when reading back from an index (it is write-only). In Java, the numeric type was determined by checking the type of . However, since there are no reference number types in .NET, using so will cause boxing/unboxing. 
It is therefore recommended to use this property to check the underlying type and the corresponding Get*Value() method to retrieve the value. NOTE: Since Lucene codecs do not support or , fields created with these types will always be when read back from the index. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Non-null if this field has a binary value. Prints a for human consumption. Prints a for human consumption. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. Prints a for human consumption. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Prints a for human consumption. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. Returns the for this field as type . Returns the for this field as type . Creates a new that returns a as single token. Warning: Does not initialize the value, you must call afterwards! Sets the string value. Specifies whether and how a field should be stored. Store the original field value in the index. this is useful for short texts like a document's title which should be displayed with the results. The value is stored in its original form, i.e. no analyzer is used before it is stored. Do not store the field's value in the index. Specifies whether and how a field should be indexed. Do not index the field value. This field can thus not be searched, but one can still access its contents provided it is stored. Index the tokens produced by running the field's value through an . This is useful for common text. Index the field's value without using an , so it can be searched. As no analyzer is used the value will be stored as a single term. This is useful for unique Ids like product numbers. Expert: Index the field's value without an Analyzer, and also disable the storing of norms. Note that you can also separately enable/disable norms by setting . No norms means that index-time field and document boosting and field length normalization are disabled. The benefit is less memory usage as norms take up one byte of RAM per indexed field for every document in the index, during searching. Note that once you index a given field with norms enabled, disabling norms will have no effect. In other words, for this to have the above described effect on a field, all instances of that field must be indexed with NOT_ANALYZED_NO_NORMS from the beginning. Expert: Index the tokens produced by running the field's value through an Analyzer, and also separately disable the storing of norms. See for what norms are and why you may want to disable them. Specifies whether and how a field should have term vectors. Do not store term vectors. 
Store the term vectors of each document. A term vector is a list of the document's terms and their number of occurrences in that document. Store the term vector + token position information Store the term vector + Token offset information Store the term vector + Token position and offset information Translates the pre-4.0 enums for specifying how a field should be indexed into the 4.0 approach. Create a field by specifying its , and how it will be saved in the index. Term vectors will not be stored in the index. The name of the field The string to process Whether should be stored in the index Whether the field should be indexed, and if so, if it should be tokenized before indexing if or is null if the field is neither stored nor indexed Create a field by specifying its , and how it will be saved in the index. The name of the field The string to process Whether should be stored in the index Whether the field should be indexed, and if so, if it should be tokenized before indexing Whether term vector should be stored if or is null in any of the following situations: the field is neither stored nor indexed the field is not indexed but termVector is Create a tokenized and indexed field that is not stored. Term vectors will not be stored. The is read only when the is added to the index, i.e. you may not close the until has been called. The name of the field The reader with the content if or is null Create a tokenized and indexed field that is not stored, optionally with storing term vectors. The is read only when the is added to the index, i.e. you may not close the until has been called. The name of the field The reader with the content Whether term vector should be stored if or is null Create a tokenized and indexed field that is not stored. Term vectors will not be stored. This is useful for pre-analyzed fields. The is read only when the is added to the index, i.e. you may not close the until has been called. The name of the field The with the content if or is null Create a tokenized and indexed field that is not stored, optionally with storing term vectors. This is useful for pre-analyzed fields. The is read only when the is added to the index, i.e. you may not close the until has been called. The name of the field The with the content Whether term vector should be stored if or is null Create a stored field with binary value. Optionally the value may be compressed. The name of the field The binary value Create a stored field with binary value. Optionally the value may be compressed. The name of the field The binary value Starting offset in value where this 's bytes are Number of bytes to use for this , starting at offset LUCENENET specific extension methods to add functionality to enumerations that mimic Lucene Get the best representation of a TermVector given the flags. Data type of the numeric value No numeric type (the field is not numeric). 8-bit unsigned integer numeric type 16-bit short numeric type 32-bit integer numeric type 64-bit long numeric type 32-bit float numeric type 64-bit double numeric type Describes the properties of a field. Create a new mutable with all of the properties from Create a new with default properties. Prevents future changes. Note, it is recommended that this is called once the 's properties have been set, to prevent unintentional state changes. this Set to true to index (invert) this field. The default is false. if this is frozen against future modifications. Set to true to store this field. The default is false. 
if this is frozen against future modifications. Set to true to tokenize this field's contents via the configured . The default is true. if this is frozen against future modifications. Set to true if this field's indexed form should be also stored into term vectors. The default is false. if this is frozen against future modifications. Set to true to also store token character offsets into the term vector for this field. The default is false. if this is frozen against future modifications. Set to true to also store token positions into the term vector for this field. The default is false. if this is frozen against future modifications. Set to true to also store token payloads into the term vector for this field. The default is false. if this is frozen against future modifications. Set to true to omit normalization values for the field. The default is false. if this is frozen against future modifications. Sets the indexing options for the field. The default is . if this is frozen against future modifications. Specifies the field's numeric type, or set to if the field has no numeric type. If not then the field's value will be indexed numerically so that can be used at search time. The default is . if this is frozen against future modifications. Sets the numeric precision step for the field. This has no effect if is . The default is . if precisionStep is less than 1. if this is frozen against future modifications. Prints a for human consumption. Sets the field's , or set to if no should be stored. The default is (no ). if this is frozen against future modifications. Data type of the numeric value @since 3.2 No numeric type will be used. NOTE: This is the same as setting to null in Lucene 32-bit integer numeric type NOTE: This was INT in Lucene 64-bit long numeric type NOTE: This was LONG in Lucene 32-bit float numeric type NOTE: This was FLOAT in Lucene 64-bit double numeric type Syntactic sugar for encoding floats as via . Per-document floating point values can be retrieved via . NOTE: In most all cases this will be rather inefficient, requiring four bytes per document. Consider encoding floating point values yourself with only as much precision as you require. NOTE: This was FloatDocValuesField in Lucene Creates a new DocValues field with the specified 32-bit value field name 32-bit value if the field name is null Field that indexes values for efficient range filtering and sorting. Here's an example usage: document.Add(new SingleField(name, 6.0F, Field.Store.NO)); For optimal performance, re-use the and instance for more than one document: FloatField field = new SingleField(name, 0.0F, Field.Store.NO); Document document = new Document(); document.Add(field); for (all documents) { ... field.SetSingleValue(value) writer.AddDocument(document); ... } See also , , . To perform range querying or filtering against a , use or . To sort according to a , use the normal numeric sort types, eg . values can also be loaded directly from . You may add the same field name as an to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However sort behavior is not defined. If you need to sort, you should separately index a single-valued . A will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise. 
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom and invoke the setter if you'd like to change the value. Note that you must also specify a congruent value when creating or . For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use , which produces one term per value. For more information on the internals of numeric trie indexing, including the precisionStep configuration, see . The format of indexed values is described in . If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of . this will minimize disk space consumed. More advanced users can instead use directly, when indexing numbers. This class is a wrapper around this token stream type for easier, more intuitive usage. NOTE: This was FloatField in Lucene @since 2.9 Type for a that is not stored: normalization factors, frequencies, and positions are omitted. Type for a stored : normalization factors, frequencies, and positions are omitted. Creates a stored or un-stored with the provided value and default precisionStep (4). field name 32-bit value if the content should also be stored if the field is null. Expert: allows you to customize the . field name 32-bit value customized field type: must have of . if the field or is null. if the field type does not have a Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new Int32DocValuesField(name, 22)); If you also need to store the value, you should add a separate instance. NOTE: This was IntDocValuesField in Lucene Creates a new DocValues field with the specified 32-bit value field name 32-bit value if the field is null Field that indexes values for efficient range filtering and sorting. Here's an example usage: document.Add(new Int32Field(name, 6, Field.Store.NO)); For optimal performance, re-use the and instance for more than one document: Int32Field field = new Int32Field(name, 6, Field.Store.NO); Document document = new Document(); document.Add(field); for (all documents) { ... field.SetInt32Value(value) writer.AddDocument(document); ... } See also , , . To perform range querying or filtering against a , use or . To sort according to a , use the normal numeric sort types, eg . values can also be loaded directly from . You may add the same field name as an to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However sort behavior is not defined. If you need to sort, you should separately index a single-valued . An will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise. 
Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom and invoke the setter if you'd like to change the value. Note that you must also specify a congruent value when creating or . For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use , which produces one term per value. For more information on the internals of numeric trie indexing, including the precisionStep configuration, see . The format of indexed values is described in . If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of . this will minimize disk space consumed. More advanced users can instead use directly, when indexing numbers. this class is a wrapper around this token stream type for easier, more intuitive usage. NOTE: This was IntField in Lucene @since 2.9 Type for an that is not stored: normalization factors, frequencies, and positions are omitted. Type for a stored : normalization factors, frequencies, and positions are omitted. Creates a stored or un-stored with the provided value and default precisionStep (4). field name 32-bit value if the content should also be stored if the field is null. Expert: allows you to customize the . field name 32-bit value customized field type: must have of . if the field or is null. if the field type does not have a of Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new Int64DocValuesField(name, 22L)); If you also need to store the value, you should add a separate instance. NOTE: This was LongDocValuesField in Lucene Creates a new DocValues field with the specified 64-bit value field name 64-bit value if the field is null Field that indexes values for efficient range filtering and sorting. Here's an example usage: document.Add(new Int64Field(name, 6L, Field.Store.NO)); For optimal performance, re-use the and instance for more than one document: Int64Field field = new Int64Field(name, 0L, Field.Store.NO); Document document = new Document(); document.Add(field); for (all documents) { ... field.SetInt64Value(value) writer.AddDocument(document); ... } See also , , . Any type that can be converted to long can also be indexed. For example, date/time values represented by a can be translated into a long value using the property. If you don't need millisecond precision, you can quantize the value, either by dividing the result of or using the separate getters (for year, month, etc.) to construct an or value. To perform range querying or filtering against a , use or . To sort according to a , use the normal numeric sort types, eg . values can also be loaded directly from . You may add the same field name as an to the same document more than once. Range querying and filtering will be the logical OR of all values; so a range query will hit all documents that have at least one value in the range. However sort behavior is not defined. 
If you need to sort, you should separately index a single-valued . An will consume somewhat more disk space in the index than an ordinary single-valued field. However, for a typical index that includes substantial textual content per document, this increase will likely be in the noise. Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can create a custom and invoke the setter if you'd like to change the value. Note that you must also specify a congruent value when creating or . For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use , which produces one term per value. For more information on the internals of numeric trie indexing, including the precisionStep configuration, see . The format of indexed values is described in . If you only need to sort by numeric value, and never run range querying/filtering, you can index using a precisionStep of . this will minimize disk space consumed. More advanced users can instead use directly, when indexing numbers. this class is a wrapper around this token stream type for easier, more intuitive usage. NOTE: This was LongField in Lucene @since 2.9 Type for a that is not stored: normalization factors, frequencies, and positions are omitted. Type for a stored : normalization factors, frequencies, and positions are omitted. Creates a stored or un-stored with the provided value and default precisionStep (4). field name 64-bit value if the content should also be stored if the field is null. Expert: allows you to customize the . field name 64-bit value customized field type: must have of . if the field or is null. if the field type does not have a of Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new NumericDocValuesField(name, 22L)); If you also need to store the value, you should add a separate instance. Type for numeric . Creates a new field with the specified 64-bit value field name 64-bit value if the field is null Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new PackedInt64DocValuesField(name, 22L)); If you also need to store the value, you should add a separate instance. NOTE: This was PackedLongDocValuesField in Lucene Creates a new field with the specified value field name 64-bit value if the field is null Field that stores a per-document value for scoring, sorting or value retrieval. Here's an example usage: document.Add(new Int16DocValuesField(name, (short) 22)); If you also need to store the value, you should add a separate instance. NOTE: This was ShortDocValuesField in Lucene Creates a new field with the specified 16-bit value field name 16-bit value if the field is null Field that stores a per-document value, indexed for sorting. Here's an example usage: document.Add(new SortedBytesDocValuesField(name, new BytesRef("hello"))); If you also need to store the value, you should add a separate instance. 
Type for sorted bytes : all with the same length Type for sorted bytes : can have variable lengths Create a new fixed or variable-length sorted field. field name binary content if the field is null Create a new fixed or variable length sorted field. field name binary content (ignored) if the field is null Field that stores a per-document value, indexed for sorting. Here's an example usage: document.Add(new SortedDocValuesField(name, new BytesRef("hello"))); If you also need to store the value, you should add a separate instance. Type for sorted bytes Create a new sorted field. field name binary content if the field is null Field that stores a set of per-document values, indexed for faceting,grouping,joining. Here's an example usage: document.Add(new SortedSetDocValuesField(name, new BytesRef("hello"))); document.Add(new SortedSetDocValuesField(name, new BytesRef("world"))); If you also need to store the value, you should add a separate instance. Type for sorted bytes Create a new sorted field. field name binary content if the field is null A field whose value is stored so that and will return the field and its value. Type for a stored-only field. Create a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. field name byte array pointing to binary content (not copied) if the field is null. Create a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. field name array pointing to binary content (not copied) starting position of the byte array valid length of the byte array if the field is null. Create a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. field name pointing to binary content (not copied) if the field is null. Create a stored-only field with the given value. field name value if the field or is null. Create a stored-only field with the given value. field name value if the field is null. Create a stored-only field with the given value. field name value if the field is null. Create a stored-only field with the given value. field name value if the field is null. Create a stored-only field with the given value. field name value if the field is null. Field that stores a per-document value. If values may be shared it's better to use . Here's an example usage: document.Add(new StraightBytesDocValuesField(name, new BytesRef("hello"))); If you also need to store the value, you should add a separate instance. Type for direct bytes : all with the same length Type for direct bytes : can have variable lengths Create a new fixed or variable length field. field name binary content if the field is null Create a new fixed or variable length direct field. field name binary content (ignored) if the field is null A field that is indexed but not tokenized: the entire value is indexed as a single token. For example this might be used for a 'country' field or an 'id' field, or any field that you intend to use for sorting or access through the field cache. Indexed, not tokenized, omits norms, indexes , not stored. Indexed, not tokenized, omits norms, indexes , stored Creates a new (a field that is indexed but not tokenized) field name value if the content should also be stored if the field or is null. A field that is indexed and tokenized, without term vectors. 
For example this would be used on a 'body' field, that contains the bulk of a document's text. Indexed, tokenized, not stored. Indexed, tokenized, stored. Creates a new un-stored with value. field name value if the field or is null Creates a new with value. field name value if the content should also be stored if the field or is null. Creates a new un-stored with value. field name value if the field or is null. LUCENENET specific extensions to the class. Returns a field with the given name if any exist in this document cast to type , or null. If multiple fields exists with this name, this method returns the first value added. This . Field name If the field type cannot be cast to . This is null. Returns an array of s with the given name, cast to type . This method returns an empty array when there are no matching fields. It never returns null. This . the name of the field a array If the field type cannot be cast to . This is null. Adds a new . This . field name binary content The field that was added to this . if this or the field is null. Adds a new field with the specified 64-bit double value Syntactic sugar for encoding doubles as via . Per-document double values can be retrieved via . NOTE: In most all cases this will be rather inefficient, requiring eight bytes per document. Consider encoding double values yourself with only as much precision as you require. This . field name 64-bit double value The field that was added to this . if this or the field is null Adds a stored or un-stored with the provided value and default precisionStep (4). This . field name 64-bit value if the content should also be stored The field that was added to this . if this or the field is null. Adds a stored or un-stored with the provided value. Expert: allows you to customize the . This . field name 64-bit double value customized field type: must have of . The field that was added to this . if this , the field or is null, or if the field type does not have a Adds a new field with the specified 32-bit value This . field name 32-bit value The field that was added to this . if this or the field is null Adds a stored or un-stored with the provided value and default precisionStep (4). This . field name 32-bit value if the content should also be stored The field that was added to this . if this or the field is null. Adds a stored or un-stored with the provided value. Expert: allows you to customize the . This . field name 32-bit value customized field type: must have of . The field that was added to this . if this , the field or is null. if the field type does not have a Adds a stored or un-stored with the provided value and default precisionStep (4). This . field name 32-bit value if the content should also be stored The field that was added to this . if this or the field is null. Adds a stored or un-stored with the provided value. Expert: allows you to customize the . This . field name 32-bit value customized field type: must have of . The field that was added to this . if this , the field or is null. if the field type does not have a of Adds a stored or un-stored with the provided value and default precisionStep (4). This . field name 64-bit value if the content should also be stored The field that was added to this . if this or the field is null. Adds a stored or un-stored with the provided value. Expert: allows you to customize the . This . field name 64-bit value customized field type: must have of . The field that was added to this . if this , the field or is null. 
if the field type does not have a of Adds a new field with the specified 64-bit value If you also need to store the value, you should add a separate instance. This . field name 64-bit value The field that was added to this . if this , the field is null. Adds a new field. If you also need to store the value, you should add a separate instance. This . field name binary content The field that was added to this . if this , the field is null. Adds a new field. If you also need to store the value, you should add a separate instance. This . field name binary content The field that was added to this . if this , the field is null. Adds a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. This . field name byte array pointing to binary content (not copied) The field that was added to this . if this , the field is null. Adds a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. This . field name array pointing to binary content (not copied) starting position of the byte array valid length of the byte array The field that was added to this . if this , the field is null. Adds a stored-only field with the given binary value. NOTE: the provided is not copied so be sure not to change it until you're done with this field. This . field name pointing to binary content (not copied) The field that was added to this . if this , the field is null. Adds a stored-only field with the given value. This . field name value The field that was added to this . if this , the field or is null. Adds a stored-only field with the given value. This . field name value The field that was added to this . if this , the field is null. Adds a stored-only field with the given value. This . field name value The field that was added to this . if this , the field is null. Adds a stored-only field with the given value. This . field name value The field that was added to this . if this , the field is null. Adds a stored-only field with the given value. This . field name value The field that was added to this . if this , the field is null. Adds a new (a field that is indexed but not tokenized) This . field name value if the content should also be stored The field that was added to this . if this , the field or is null. Adds a new un-stored with value. This . field name value The field that was added to this . if this , the field or is null. Adds a new with value. This . field name value if the content should also be stored The field that was added to this . if this , the field or is null. Adds a new un-stored with value. This . field name value The field that was added to this . if this , the field or is null. Extension methods to the interface. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. Returns the field value as or 0 if the type is non-numeric. This . The field value or 0 if the type is non-numeric. 
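Here's a sketch that pulls the field types above together into one document (the field names are illustrative; the LUCENENET extension methods described above provide an equivalent shorthand if you prefer them):

using Lucene.Net.Documents;
using Lucene.Net.Util;

Document doc = new Document();
doc.Add(new StringField("id", "doc-42", Field.Store.YES));                // exact-match key, indexed as a single token
doc.Add(new TextField("body", "the quick brown fox", Field.Store.NO));    // analyzed full text
doc.Add(new StoredField("thumbnail", new byte[] { 0x89, 0x50 }));         // stored only, never indexed
doc.Add(new SortedDocValuesField("id", new BytesRef("doc-42")));          // per-document value for sorting

A stored value can later be read back from the document returned by a searcher, for example with doc.Get("id").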
is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable. s implemented by this subclass do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings. For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. Sole constructor. (For invocation by subclass constructors, typically implicit.) LUCENENET specific propety that allows access to the context as , which prevents the need to cast. Returns true if there are norms stored for this . Returns for this reader. this property may return null if the reader has no postings. Returns the number of documents containing the . This method returns 0 if the term or field does not exist. This method does not take into account deleted documents that have not yet been merged away. This may return null if the field does not exist. Returns for the specified term. This will return null if either the field or term does not exist. Returns for the specified term. This will return null if the field or term does not exist or positions weren't indexed. Returns for this field, or null if no were indexed for this field. The returned instance should only be used by a single thread. Returns for this field, or null if no were indexed for this field. The returned instance should only be used by a single thread. Returns for this field, or null if no were indexed for this field. The returned instance should only be used by a single thread. Returns for this field, or null if no were indexed for this field. The returned instance should only be used by a single thread. Returns a at the size of reader.MaxDoc, with turned on bits for each docid that does have a value for this field, or null if no were indexed for this field. The returned instance should only be used by a single thread. Returns representing norms for this field, or null if no were indexed. The returned instance should only be used by a single thread. Get the describing all fields in this reader. @lucene.experimental Returns the representing live (not deleted) docs. A set bit indicates the doc ID has not been deleted. If this method returns null it means there are no deleted documents (all documents are live). The returned instance has been safely published for use by multiple threads without additional synchronization. Checks consistency of this reader. Note that this may be costly in terms of I/O, e.g. may involve computing a checksum value against large data files. @lucene.internal for instances. The readers ord in the top-level's leaves array The readers absolute doc base Creates a new A that enumerates terms based upon what is accepted by a DFA. The algorithm is such: As long as matches are successful, keep reading sequentially. When a match fails, skip to the next string in lexicographic order that does not enter a reject state. 
The algorithm does not attempt to actually skip to the next string that is completely accepted. this is not possible when the language accepted by the FSM is not finite (i.e. * operator). @lucene.experimental Construct an enumerator based upon an automaton, enumerating the specified field, working on a supplied @lucene.experimental TermsEnum CompiledAutomaton Returns true if the term matches the automaton. Also stashes away the term to assist with smart enumeration. Sets the enum to operate in linear fashion, as we have found a looping transition at position: we set an upper bound and act like a for this portion of the term space. Increments the byte buffer to the next string in binary order after s that will not put the machine into a reject state. If such a string does not exist, returns false. The correctness of this method depends upon the automaton being deterministic, and having no transitions to dead states. true if more possible solutions exist for the DFA Returns the next string in lexicographic order that will not put the machine into a reject state. This method traverses the DFA from the given position in the string, starting at the given state. If this cannot satisfy the machine, returns false. This method will walk the minimal path, in lexicographic order, as long as possible. If this method returns false, then there might still be more solutions, it is necessary to backtrack to find out. current non-reject state useful portion of the string true if more possible solutions exist for the DFA from this position Attempts to backtrack thru the string after encountering a dead end at some given position. Returns false if no more possible strings can match. current position in the input string position >=0 if more possible solutions exist for the DFA Base class for implementing s based on an array of sub-readers. The implementing class has to add code for correctly refcounting and closing the sub-readers. User code will most likely use to build a composite reader on a set of sub-readers (like several s). For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. @lucene.internal List view solely for , for effectiveness the array is used internally. Constructs a on the given . the wrapped sub-readers. This array is returned by and used to resolve the correct subreader for docID-based methods. Please note: this array is not cloned and not protected for modification, the subclass is responsible to do this. Helper method for subclasses to get the corresponding reader for a doc ID Helper method for subclasses to get the docBase of the given sub-reader index. A per-document Sole constructor. (For invocation by subclass constructors, typically implicit.) Lookup the value for document. A which holds updates of documents, of a single . @lucene.experimental Buffers up pending per doc, then flushes when segment flushes. Maximum length for a binary field. Exposes a slice of an existing as a new . 
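Here's a sketch of the per-leaf access pattern described for atomic readers and composite sub-readers above (the open reader and the "created" numeric doc-values field are assumptions; names follow the Lucene.NET 4.8 port):

using Lucene.Net.Index;

// reader is an already-open DirectoryReader (a composite); walk its atomic leaves.
foreach (AtomicReaderContext leaf in reader.Leaves)
{
    AtomicReader atomic = leaf.AtomicReader;
    NumericDocValues created = atomic.GetNumericDocValues("created");
    if (created == null) continue;                 // this segment has no values for the field
    for (int docId = 0; docId < atomic.MaxDoc; docId++)
    {
        long value = created.Get(docId);           // per-segment docId; add leaf.DocBase for a top-level id
    }
}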
@lucene.internal Holds buffered deletes and updates, by docID, term or query for a single segment. this is used to hold buffered pending deletes and updates against the to-be-flushed segment. Once the deletes and updates are pushed (on flush in ), they are converted to a FrozenDeletes instance. NOTE: instances of this class are accessed either via a private instance on , or via sync'd code by NOTE: This was MAX_INT in Lucene Tracks the stream of BufferedDeletes. When flushes, its buffered deletes and updates are appended to this stream. We later apply them (resolve them to the actual docIDs, per segment) when a merge is started (only to the to-be-merged segments). We also apply to all segments when NRT reader is pulled, commit/close is called, or when too many deletes or updates are buffered and must be flushed (by RAM usage or by count). Each packet is assigned a generation, and each flushed or merged segment is also assigned a generation, so we can track which BufferedDeletes packets to apply to any given segment. Appends a new packet of buffered deletes to the stream, setting its generation: Resolves the buffered deleted Term/Query/docIDs, into actual deleted docIDs in the liveDocs for each . Removes any BufferedDeletes that we no longer need to store because all segments in the index have had the deletes applied. that knows how to read the byte slices written by Posting and PostingVector. We read the bytes in each slice until we hit the end of that slice at which point we read the forwarding address of the next slice and then jump to it. Class to write byte streams into slices of shared . This is used by to hold the posting list for many terms in RAM. Set up the writer to write at address. Write byte into byte slice stream Basic tool and API to check the health of an index and write a new segments file that removes reference to problematic segments. As this tool checks every byte in the index, on a large index it can take quite a long time to run. Please make a complete backup of your index before using this to fix your index! @lucene.experimental Returned from detailing the health and status of the index. @lucene.experimental True if no problems were found with the index. True if we were unable to locate and load the segments_N file. True if we were unable to open the segments_N file. True if we were unable to read the version number from segments_N file. Name of latest segments_N file in the index. Number of segments in the index. Empty unless you passed specific segments list to check as optional 3rd argument. True if the index was created with a newer version of Lucene than the tool. List of instances, detailing status of each segment. index is in. instance containing only segments that had no problems (this is used with the method to repair the index. How many documents will be lost to bad segments. How many bad segments were found. True if we checked only specific segments ( was called with non-null argument). The greatest segment name. Whether the is greater than any of the segments' names. Holds the userData of the last commit in the index Holds the status of each segment in the index. See . @lucene.experimental Name of the segment. Codec used to read this segment. Document count (does not take deletions into account). True if segment is compound file format. Number of files referenced by this segment. Net size (MB) of the files referenced by this segment. Doc store offset, if this segment shares the doc store files (stored fields and term vectors) with other segments. 
This is -1 if it does not share. String of the shared doc store segment, or null if this segment does not share the doc store files. True if the shared doc store files are compound file format. True if this segment has pending deletions. Current deletions generation. Number of deleted documents. True if we were able to open an on this segment. Number of fields in this segment. Map that includes certain debugging details that records into each segment it creates Status for testing of field norms (null if field norms could not be tested). Status for testing of indexed terms (null if indexed terms could not be tested). Status for testing of stored fields (null if stored fields could not be tested). Status for testing of term vectors (null if term vectors could not be tested). Status for testing of (null if could not be tested). Status from testing field norms. Number of fields successfully tested Exception thrown during term index test (null on success) Status from testing term index. Number of terms with at least one live doc. Number of terms with zero live docs docs. Total frequency across all terms. Total number of positions. Exception thrown during term index test (null on success) Holds details of block allocations in the block tree terms dictionary (this is only set if the for this segment uses block tree. Status from testing stored fields. Number of documents tested. Total number of stored fields tested. Exception thrown during stored fields test (null on success) Status from testing stored fields. Number of documents tested. Total number of term vectors tested. Exception thrown during term vector test (null on success) Status from testing Total number of docValues tested. Total number of numeric fields Total number of binary fields Total number of sorted fields Total number of sortedset fields Exception thrown during doc values test (null on success) Create a new on the directory. If true, term vectors are compared against postings to make sure they are the same. This will likely drastically increase time it takes to run ! Gets or Sets infoStream where messages should go. If null, no messages are printed. If is true then more details are printed. If true, prints more details to the , if set. Returns a instance detailing the state of the index. As this method checks every byte in the index, on a large index it can take quite a long time to run. WARNING: make sure you only call this when the index is not opened by any writer. Returns a instance detailing the state of the index. list of specific segment names to check As this method checks every byte in the specified segments, on a large index it can take quite a long time to run. WARNING: make sure you only call this when the index is not opened by any writer. Test field norms. @lucene.experimental Checks api is consistent with itself. Searcher is optional, to verify with queries. Can be null. Test the term index. @lucene.experimental Test the term index. @lucene.experimental Test stored fields. @lucene.experimental Test docvalues. @lucene.experimental Test term vectors. @lucene.experimental Test term vectors. @lucene.experimental Repairs the index using previously returned result from . Note that this does not remove any of the unreferenced files after it's done; you must separately open an , which deletes unreferenced files when it's created. WARNING: this writes a new segments file into the index, effectively removing all documents in broken segments from the index. BE CAREFUL. 
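Here's a sketch of the check-and-repair workflow just described (the index path is illustrative, and DoCheckIndex/FixIndex are assumed to be the Lucene.NET names of the corresponding methods):

using Lucene.Net.Index;
using Lucene.Net.Store;

using (Directory dir = FSDirectory.Open("/path/to/index"))
{
    CheckIndex checker = new CheckIndex(dir);
    CheckIndex.Status status = checker.DoCheckIndex();   // reads every byte; can be slow on a large index
    if (!status.Clean)
    {
        // Drops all documents in broken segments -- back up the index first.
        checker.FixIndex(status);
    }
}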
WARNING: Make sure you only call this when the index is not opened by any writer. LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line, so we provide a .NET tool, lucene-cli, with a command that maps to this method: index check The command line arguments Thrown if invalid arguments are provided This was termsIterable() in Lucene. This was queriesIterable() in Lucene. Instances of this reader type can only be used to get stored fields from the underlying s, but it is not possible to directly retrieve postings. To do that, get the for all sub-readers via . Alternatively, you can mimic an (with a serious slowdown), by wrapping composite readers with . instances for indexes on disk are usually constructed with a call to one of the static DirectoryReader.Open() methods, e.g. . Because it implements the composite interface, it is not possible to directly get postings. Concrete subclasses of are usually constructed with a call to one of the static Open() methods, e.g. . For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. Sole constructor. (For invocation by subclass constructors, typically implicit.) Expert: returns the sequential sub readers that this reader is logically composed of. This method may not return null. NOTE: In contrast to previous Lucene versions this method is no longer public; code that wants to get all the sub-readers this composite is composed of should use , for instance. Creates a for intermediate readers that aren't top-level readers in the current context Creates a for top-level readers with parent set to null A that runs each merge using a separate thread. Specify the max number of threads that may run at once, and the maximum number of simultaneous merges, with . If the number of merges exceeds the max number of threads then the largest merges are paused until one of the smaller merges completes. If more than merges are requested, then this class will forcefully throttle the incoming threads by pausing until one or more merges complete. List of currently active s. Default . We default to 1: tests on spinning-magnet drives showed slower indexing performance if more than one merge thread runs at once (though on an SSD it was faster). Default . that holds the index. that owns this instance. How many s have kicked off (this is used to name them). Sole constructor, with all settings set to default values. Sets the maximum number of merge threads and simultaneous merges allowed. the max # simultaneous merges that are allowed. If a merge is necessary yet we already have this many threads running, the incoming thread (that is calling add/updateDocument) will block until a merge thread has completed. Note that we will only run the smallest merges at a time. The max # simultaneous merge threads that should be running at once. This must be <= Returns . See . Return the priority that merge threads run at.
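Here's a sketch of configuring the merge throttling described above (the counts are arbitrary, an existing analyzer instance is assumed, and the Lucene.NET configuration property name is assumed):

using Lucene.Net.Index;
using Lucene.Net.Util;

var scheduler = new ConcurrentMergeScheduler();
scheduler.SetMaxMergesAndThreads(4, 1);   // allow up to 4 pending merges, but only 1 merge thread at a time

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    MergeScheduler = scheduler
};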
By default the priority is 1 plus the priority of (ie, slightly higher priority than) the first thread that calls merge. Set the base priority that merge threads run at. Note that CMS may increase priority of some merge threads beyond this base priority. It's best not to set this any higher than (4)-maxThreadCount, so that CMS has room to set relative priority among threads. Sorts s; larger merges come first. Called whenever the running merges have changed, to pause & unpause threads. This method sorts the merge threads by their merge size in descending order and then pauses/unpauses threads from first to last -- that way, smaller merges are guaranteed to run before larger ones. Returns true if verbosing is enabled. This method is usually used in conjunction with , like that: if (IsVerbose) { Message("your message"); } Outputs the given message - this method assumes was called and returned true. Wait for any running merge threads to finish. This call is not interruptible as used by . Returns the number of merge threads that are alive. Note that this number is <= size. Does the actual merge, by calling Create and return a new Runs a merge thread, which may run one or more merges in sequence. Sole constructor. Record the currently running merge. Return the current merge, or null if this is done. Set the priority of this thread. Called when an exception is hit in a background merge thread Used for testing Used for testing This exception is thrown when Lucene detects an inconsistency in the index. Constructor. Constructor. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. is an implementation of that can read indexes in a . instances are usually constructed with a call to one of the static Open() methods, e.g. . For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. Default termInfosIndexDivisor. The index directory. Returns a reading the index in the given the index directory if there is a low-level IO error Expert: Returns a reading the index in the given with the given termInfosIndexDivisor. the index directory Subsamples which indexed terms are loaded into RAM. this has the same effect as setting (on ) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. NOTE: divisor settings > 1 do not apply to all implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time. if there is a low-level IO error Open a near real time from the . 
@lucene.experimental The to open from If true, all buffered deletes will be applied (made visible) in the returned reader. If false, the deletes are not applied but remain buffered (in IndexWriter) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false. The new if the index is corrupt if there is a low-level IO error Expert: returns an reading the index in the given . the commit point to open if there is a low-level IO error Expert: returns an reading the index in the given and . the commit point to open Subsamples which indexed terms are loaded into RAM. this has the same effect as setting (on ) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. NOTE: divisor settings > 1 do not apply to all implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time. if there is a low-level IO error If the index has changed since the provided reader was opened, open and return a new reader; else, return null. The new reader, if not null, will be the same type of reader as the previous one, ie a near-real-time (NRT) reader will open a new NRT reader, a will open a new , etc. This method is typically far less costly than opening a fully new as it shares resources (for example sub-readers) with the provided , when possible. The provided reader is not disposed (you are responsible for doing so); if a new reader is returned you also must eventually dispose it. Be sure to never dispose a reader while other threads are still using it; see to simplify managing this. if the index is corrupt if there is a low-level IO error null if there are no changes; else, a new instance which you must eventually dispose If the differs from what the provided reader is searching, open and return a new reader; else, return null. Expert: If there changes (committed or not) in the versus what the provided reader is searching, then open and return a new searching both committed and uncommitted changes from the writer; else, return null (though, the current implementation never returns null). This provides "near real-time" searching, in that changes made during an session can be quickly made available for searching without closing the writer nor calling . It's near real-time because there is no hard guarantee on how quickly you can get a new reader after making changes with . You'll have to experiment in your situation to determine if it's fast enough. As this is a new and experimental feature, please report back on your findings so we can learn, improve and iterate. The very first time this method is called, this writer instance will make every effort to pool the readers that it opens for doing merges, applying deletes, etc. This means additional resources (RAM, file descriptors, CPU time) will be consumed. For lower latency on reopening a reader, you should call (on ) to pre-warm a newly merged segment before it's committed to the index. This is important for minimizing index-to-search delay after a large merge. 
If an AddIndexes* call is running in another thread, then this reader will only search those segments from the foreign index that have been successfully copied over so far. NOTE: Once the writer is disposed, any outstanding readers may continue to be used. However, if you attempt to reopen any of those readers, you'll hit an . @lucene.experimental that covers entire index plus all changes made so far by this instance, or null if there are no new changes The to open from If true, all buffered deletes will be applied (made visible) in the returned reader. If false, the deletes are not applied but remain buffered (in ) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false. if there is a low-level IO error Returns all commit points that exist in the . Normally, because the default is , there would be only one commit point. But if you're using a custom then there could be many commits. Once you have a given commit, you can open a reader on it by calling . There must be at least one commit in the , else this method throws . Note that if a commit is in progress while this method is running, that commit may or may not be returned. a sorted list of s, from oldest to latest. Returns true if an index likely exists at the specified directory. Note that a corrupt index, or an index that is still in the process of committing, may also be reported as existing. the directory to check for an index true if an index exists; false otherwise Expert: Constructs a on the given . the wrapped atomic index segment readers. This array is returned by and used to resolve the correct subreader for docID-based methods. Please note: this array is not cloned and not protected for modification outside of this reader. Subclasses of should take care to not allow modification of this internal array, e.g. . Returns the directory this index resides in. Implement this method to support . If this reader does not support reopen, return null, so client code is happy. This should be consistent with (should always return true) if reopen is not supported. if there is a low-level IO error null if there are no changes; else, a new instance. Implement this method to support . If this reader does not support reopen from a specific , throw . if there is a low-level IO error null if there are no changes; else, a new instance. Implement this method to support . If this reader does not support reopen from , throw . if there is a low-level IO error null if there are no changes; else, a new instance. Version number when this was opened. This method returns the version recorded in the commit that the reader opened. This version is advanced every time a change is made with . Check whether any new changes have occurred to the index since this reader was opened. If this reader was created by calling an overload of , then this method checks if any further commits (see ) have occurred in the directory. If instead this reader is a near real-time reader (ie, obtained by a call to , or by calling an overload of on a near real-time reader), then this method checks if either a new commit has occurred, or any new uncommitted changes have taken place via the writer. Note that even if the writer has only performed merging, this method will still return false. In any event, if this returns false, you should call an overload of to get a new reader that sees the changes. if there is a low-level IO error Expert: return the that this reader has opened.
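Here's a sketch of the reader life-cycle described above, including the near-real-time variant (the directory and writer are assumed to exist already):

using Lucene.Net.Index;
using Lucene.Net.Store;

if (DirectoryReader.IndexExists(dir))
{
    DirectoryReader reader = DirectoryReader.Open(dir);        // point-in-time view of the latest commit

    // Later: pick up committed changes without paying the cost of a full open.
    DirectoryReader newer = DirectoryReader.OpenIfChanged(reader);
    if (newer != null)
    {
        reader.Dispose();
        reader = newer;
    }

    // Near-real-time: also see uncommitted changes buffered by an open IndexWriter.
    DirectoryReader nrt = DirectoryReader.Open(writer, true);  // true = apply buffered deletes

    nrt.Dispose();
    reader.Dispose();
}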
@lucene.experimental Called when decides to create a new segment Called when an aborting exception is hit Processes all occurrences of a single field This is a that gathers all fields under the same name, and calls per-field consumers to process field by field. This class doesn't doesn't do any "real" work of its own: it just forwards the fields to a . Holds all per thread, per field state. This is a that inverts each field, separately, from a , and accepts a to process those terms. Holds state for inverting all occurrences of a single field in the document. This class doesn't do anything itself; instead, it forwards the tokens produced by analysis to its own consumer (). It also interacts with an endConsumer (). Flag to pass to if you require that no offsets and payloads will be returned. Flag to pass to if you require offsets in the returned enum. Flag to pass to if you require payloads in the returned enum. Also iterates through positions. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the next position. You should only call this up to times else the behavior is not defined. If positions were not indexed this will return -1; this only happens if offsets were indexed and you passed needsOffset=true when pulling the enum. Returns start offset for the current position, or -1 if offsets were not indexed. Returns end offset for the current position, or -1 if offsets were not indexed. Returns the payload at this position, or null if no payload was indexed. You should not modify anything (neither members of the returned nor bytes in the ). Flag to pass to if you don't require term frequencies in the returned enum. Flag to pass to if you require term frequencies in the returned enum. Iterates through the documents and term freqs. NOTE: you must first call before using any of the per-doc methods. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns term frequency in the current document, or 1 if the field was indexed with . Do not call this before is first called, nor after returns . NOTE: if the was obtain with , the result of this method is undefined. Returns the related attributes. This class enables fast access to multiple term ords for a specified field across all docIDs. Like , it uninverts the index and holds a packed data structure in RAM to enable fast access. Unlike , it can handle multi-valued fields, and, it does not hold the term bytes in RAM. Rather, you must obtain a from the method, and then seek-by-ord to get the term's bytes. While normally term ords are type , in this API they are as the internal representation here cannot address more than unique terms. Also, typically this class is used on fields with relatively few unique terms vs the number of documents. In addition, there is an internal limit (16 MB) on how many bytes each chunk of documents may consume. If you trip this limit you'll hit an . Deleted documents are skipped during uninversion, and if you look them up you'll get 0 ords. The returned per-document ords do not retain their original order in the document. Instead they are returned in sorted (by ord, ie term's comparer) order. They are also de-dup'd (ie if doc has same term more than once in this field, you'll only get that ord back once). This class tests whether the provided reader is able to retrieve terms by ord (ie, it's single segment, and it uses an ord-capable terms index). 
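Here's a sketch of iterating postings with the flags and workflow described above (the atomic reader, field, and term are assumptions, and GetTermDocsEnum is assumed to be the Lucene.NET name of the term-based convenience method):

using Lucene.Net.Index;
using Lucene.Net.Search;

// atomicReader is an AtomicReader for a single segment.
DocsEnum postings = atomicReader.GetTermDocsEnum(new Term("body", "lucene"));
if (postings != null)
{
    int docId;
    while ((docId = postings.NextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
    {
        int freq = postings.Freq;   // number of occurrences of the term in this document
    }
}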
If not, this class will create its own term index internally, allowing it to create a wrapped that can handle ord. The method then provides this wrapped enum, if necessary. The RAM consumption of this class can be high! @lucene.experimental Final form of the un-inverted field: Each document points to a list of term numbers that are contained in that document. Term numbers are in sorted order, and are encoded as variable-length deltas from the previous term number. Real term numbers start at 2 since 0 and 1 are reserved. A term number of 0 signals the end of the termNumber list. There is a single int[maxDoc()] which either contains a pointer into a byte[] for the termNumber lists, or directly contains the termNumber list if it fits in the 4 bytes of an integer. If the first byte in the integer is 1, the next 3 bytes are a pointer into a byte[] where the termNumber list starts. There are actually 256 byte arrays, to compensate for the fact that the pointers into the byte arrays are only 3 bytes long. The correct byte array for a document is a function of its id. To save space and speed up faceting, any term that matches enough documents will not be un-inverted... it will be skipped while building the un-inverted field structure, and will use a set intersection method during faceting. To further save memory, the terms (the actual string values) are not all stored in memory, but a TermIndex is used to convert term numbers to term values only for the terms needed after faceting has completed. Only every 128th term value is stored, along with its corresponding term number, and this is used as an index to find the closest term and iterate until the desired number is hit (very much like Lucene's own internal term index). Term ords are shifted by this, internally, to reserve values 0 (end term) and 1 (index is a pointer into byte array). Every 128th term is indexed, by default. Don't uninvert terms that exceed this count. Field we are uninverting. Number of terms in the field. Total number of references to term numbers. Total time to uninvert the field. Time for phase1 of the uninvert process. Holds the per-document ords or a pointer to the ords. Holds term ords for documents. Total bytes (sum of term lengths) for all indexed terms. Holds the indexed (by default every 128th) terms. If non-null, only terms matching this prefix were indexed. Ordinal of the first term in the field, or 0 if the does not implement . Used while uninverting. Returns total bytes used. Inverts all terms Inverts only terms starting w/ prefix Inverts only terms starting w/ prefix, and only terms whose docFreq (not taking deletions into account) is <= Inverts only terms starting w/ prefix, and only terms whose docFreq (not taking deletions into account) is <= , with a custom indexing interval (default is every 128th term). Subclass inits w/ this, but be sure you then call uninvert, only once Returns a that implements . If the provided supports , we just return its ; if it does not, we build a "private" terms index internally (WARNING: consumes RAM) and use that index to implement . This also enables on top of a composite reader. The returned is unpositioned. This returns null if there are no terms. NOTE: you must pass the same reader that was used when creating this class Returns the number of terms in this field Returns true if no terms were indexed. Subclass can override this Invoked during to record the document frequency for each uninverted term. Call this only once (if you subclass!)
Number of bytes to represent an unsigned int as a vint. NOTE: This was vIntSize() in Lucene NOTE: This was writeInt() in Lucene Only used if original doesn't implement ; in this case we "wrap" our own terms index around it. Returns the term () corresponding to the provided ordinal. Returns a view of this instance Buffer must be at least 5 s long. Returns number of term ords placed into buffer; if this count is less than buffer.Length then that is the end. This class accepts multiple added documents and directly writes segment files. Each added document is passed to the , which in turn processes the document and interacts with other consumers in the indexing chain. Certain consumers, like and , digest a document and immediately write bytes to the "doc store" files (ie, they do not consume RAM per document, except while they are processing the document). Other consumers, eg and , buffer bytes in RAM and flush only when a new segment is produced. Once we have used our allowed RAM buffer, or the number of added docs is large enough (in the case we are flushing by doc count instead of RAM usage), we create a real segment and flush it to the Directory. Threads: Multiple threads are allowed into AddDocument at once. There is an initial synchronized call to which allocates a for this thread. The same thread will get the same over time (thread affinity) so that if there are consistent patterns (for example each thread is indexing a different content source) then we make better use of RAM. Then ProcessDocument is called on that without synchronization (most of the "heavy lifting" is in this call). Finally the synchronized "finishDocument" is called to flush changes to the directory. When flush is called by we forcefully idle all threads and flush only once they are all idle. this means you can call flush with a given thread even while other threads are actively adding/deleting documents. Exceptions: Because this class directly updates in-memory posting lists, and flushes stored fields and term vectors directly to files in the directory, there are certain limited times when an exception can corrupt this state. For example, a disk full while flushing stored fields leaves this file in a corrupt state. Or, an OOM exception while appending to the in-memory posting lists can corrupt that posting list. We call such exceptions "aborting exceptions". In these cases we must call to discard all docs added since the last flush. All other exceptions ("non-aborting exceptions") can still partially update the index structures. These updates are consistent, but, they represent only a part of the document seen up until the exception was hit. When this happens, we immediately mark the document as deleted so that the document is always atomically ("all or none") added to the index. we preserve changes during a full flush since IW might not checkout before we release all changes. NRT Readers otherwise suddenly return true from IsCurrent() while there are actually changes currently committed. See also & Returns how many docs are currently buffered in RAM. Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs. this resets our state, discarding any docs added since last flush. is a non-blocking linked pending deletes queue. In contrast to other queue implementation we only maintain the tail of the queue. A delete queue is always used in a context of a set of DWPTs and a global delete pool. 
Each of the DWPT and the global pool need to maintain their 'own' head of the queue (as a instance per DWPT). The difference between the DWPT and the global pool is that the DWPT starts maintaining a head once it has added its first document since for its segments private deletes only the deletes after that document are relevant. The global pool instead starts maintaining the head once this instance is created by taking the sentinel instance as its initial head. Since each maintains its own head and the list is only single linked the garbage collector takes care of pruning the list for us. All nodes in the list that are still relevant should be either directly or indirectly referenced by one of the DWPT's private or by the global slice. Each DWPT as well as the global delete pool maintain their private DeleteSlice instance. In the DWPT case updating a slice is equivalent to atomically finishing the document. The slice update guarantees a "happens before" relationship to all other updates in the same indexing session. When a DWPT updates a document it: consumes a document and finishes its processing updates its private either by calling or (if the document has a delTerm) applies all deletes in the slice to its private and resets it increments its internal document id The DWPT also doesn't apply its current documents delete term until it has updated its delete slice which ensures the consistency of the update. If the update fails before the could have been updated the deleteTerm will also not be added to its private deletes neither to the global deletes. invariant for document update Returns true iff the given item is identical to the item hold by the slices tail, otherwise false. This class controls flushing during indexing. It tracks the memory consumption per and uses a configured to decide if a must flush. In addition to the the flush control might set certain as flush pending iff a exceeds the to prevent address space exhaustion. Sets flush pending state on the given . The must have indexed at least on and must not be already pending. Returns an iterator that provides access to all currently active s Returns the number of delete terms in the global pool Prunes the blockedQueue by removing all DWPT that are associated with the given flush queue. Returns true if a full flush is currently running Returns the number of flushes that are already checked out but not yet actively flushing Returns the number of flushes that are checked out but not yet available for flushing. This only applies during a full flush if a DWPT needs flushing but must not be flushed until the full flush has finished. This method will block if too many DWPT are currently flushing and no checked out DWPT are available Returns true iff stalled Returns the @lucene.internal Publishes the flushed segment, segment private deletes (if any) and its associated global delete (if present) to . The actual publishing operation is synced on IW -> BDS so that the 's delete generation is always () + 1 The must define the method which returns the that the calls to process the documents. Called if we hit an exception at a bad time (when updating the index files) and must discard all currently buffered docs. this resets our state, discarding any docs added since last flush. Returns the number of delete terms in this Returns the number of RAM resident documents in this Prepares this DWPT for flushing. this method will freeze and return the s global buffer and apply all pending deletes to this DWPT. 
Flush all pending docs to a new segment Seals the for the new flushed segment and persists the deleted documents . Get current segment info we are writing. Initial chunks size of the shared byte[] blocks used to store postings data if you increase this, you must fix field cache impl for getTerms/getTermsIndex requires <= 32768 NOTE: This was IntBlockAllocator in Lucene Allocate another int[] from the shared pool controls instances and their thread assignments during indexing. Each holds a reference to a that is once a is obtained from the pool exclusively used for indexing a single document by the obtaining thread. Each indexing thread must obtain such a to make progress. Depending on the implementation assignments might differ from document to document. Once a is selected for flush the thread pool is reusing the flushing s with a new instance. references and guards a instance that is used during indexing to build a in-memory index segment. also holds all flush related per-thread data controlled by . A , its methods and members should only accessed by one thread a time. Users must acquire the lock via and release the lock in a finally block via (on the instance) before accessing the state. Resets the internal with the given one. if the given DWPT is null this is marked as inactive and should not be used for indexing anymore. Returns true if this is still open. This will only return false iff the DW has been disposed and this is already checked out for flush. Returns the number of currently active bytes in this ThreadState's Returns this s Returns true iff this is marked as flush pending otherwise false Creates a new with a given maximum of s. Returns the max number of instances available in this Returns the active number of instances. Returns a new iff any new state is available otherwise null. NOTE: the returned is already locked iff non-null. a new iff any new state is available otherwise null Deactivate all unreleased threadstates Returns the ith active where i is the given ord. the ordinal of the the ith active where i is the given ord. Returns the number of currently deactivated instances. A deactivated should not be used for indexing anymore. the number of currently deactivated instances. Deactivates an active . Inactive can not be used for indexing anymore once they are deactivated. This method should only be used if the parent is closed or aborted. the state to deactivate Controls the health status of a sessions. This class used to block incoming indexing threads if flushing significantly slower than indexing to ensure the s healthiness. If flushing is significantly slower than indexing the net memory used within an session can increase very quickly and easily exceed the runtime's available memory. To prevent OOM Errors and ensure 's stability this class blocks incoming threads from indexing once 2 x number of available is exceeded. Once flushing catches up and the number of flushing DWPT is equal or lower than the number of active s threads are released and can continue indexing. Update the stalled flag status. this method will set the stalled flag to true iff the number of flushing is greater than the number of active . Otherwise it will reset the to healthy and release all threads waiting on Blocks if documents writing is currently in a stalled state. 
This class contains utility methods and constants for An empty which returns for every document An empty which returns zero for every document An empty which returns for every document An empty which returns for every document Returns a multi-valued view over the provided Returns a single-valued view of the , if it was previously wrapped with , or null. Returns a representing all documents from that have a value. Returns a representing all documents from that have a value. Holds updates of a single field, for a set of documents. @lucene.experimental Add an update to a document from a . The 's value should be null to unset a value. Note that the value is exposed by casting to the appropriate subclass. Add an update to a document from a . The 's value should be null to unset a value. Note that the value is exposed by casting to the appropriate subclass. Returns a over the updated documents and their values. Merge with another . This is called for a segment which received updates while it was being merged. The given updates should override whatever updates are in that instance. Returns true if this instance contains any updates. TODO An iterator over documents. Only documents with updates are returned by this iterator, and the documents are returned in increasing order. Returns the next document which has an update, or if there are no more documents to return. Returns the current document this iterator is on. Reset the iterator's state. Should be called before iterating again. An iterator over documents and their updated values. This differs from in that it exposes the strongly-typed value. Only documents with updates are returned by this iterator, and the documents are returned in increasing order. Returns the value of the document returned from . A null value means that it was unset for this document. An in-place update to a field. Constructor. the the which determines the documents that will be updated the to update An in-place update to a binary field An in-place update to a numeric field Access to the Field Info file that describes document fields and whether or not they are indexed. Each segment has a separate Field Info file. Objects of this class are thread-safe for multiple readers, but only one thread can be adding documents at a time, with no other reader or writer threads accessing this object. Field's name Internal field number Sole Constructor. @lucene.experimental Returns for the field, or null if the field is not indexed Returns true if this field has any docValues. Gets or Sets the docValues generation of this field, or -1 if no docValues. Returns of the norm. This may be if the field has no norms. Returns true if norms are explicitly omitted for this field Returns true if this field actually has any norms. Returns true if this field is indexed. Returns true if any payloads exist for this field. Returns true if any term vectors exist for this field. Get a codec attribute value, or null if it does not exist Puts a codec attribute value. This is a key-value mapping for the field that the codec can use to store additional metadata, and will be available to the codec when reading the segment via . If a value already exists for the field, it will be replaced with the new value. Returns internal codec attributes map. May be null if no mappings exist. Controls how much information is stored in the postings lists. @lucene.experimental No index options will be used. NOTE: This is the same as setting to null in Lucene Only documents are indexed: term frequencies and positions are omitted.
Phrase and other positional queries on the field will throw an exception, and scoring will behave as if any term in the document appears only once. Only documents and term frequencies are indexed: positions are omitted. this enables normal scoring, except Phrase and other positional queries will throw an exception. Indexes documents, frequencies and positions. this is a typical default for full-text search: full scoring is enabled and positional queries are supported. Indexes documents, frequencies, positions and offsets. Character offsets are encoded alongside the positions. DocValues types. Note that DocValues is strongly typed, so a field cannot have different types across different documents. No doc values type will be used. NOTE: This is the same as setting to null in Lucene A per-document numeric type A per-document . Values may be larger than 32766 bytes, but different codecs may enforce their own limits. A pre-sorted . Fields with this type only store distinct byte values and store an additional offset pointer per document to dereference the shared byte[]. The stored byte[] is presorted and allows access via document id, ordinal and by-value. Values must be <= 32766 bytes. A pre-sorted ISet<byte[]>. Fields with this type only store distinct byte values and store additional offset pointers per document to dereference the shared s. The stored is presorted and allows access via document id, ordinal and by-value. Values must be <= 32766 bytes. Collection of s (accessible by number or by name). @lucene.experimental Constructs a new from an array of objects Returns true if any fields have freqs Returns true if any fields have positions Returns true if any fields have payloads Returns true if any fields have offsets Returns true if any fields have vectors Returns true if any fields have norms Returns true if any fields have Returns the number of fields. NOTE: This was size() in Lucene. Returns an iterator over all the fieldinfo objects present, ordered by ascending field number Return the object referenced by the the object or null when the given doesn't exist. Return the object referenced by the . field's number. the object or null when the given doesn't exist. if is negative Returns the global field number for the given field name. If the name does not exist yet it tries to add it with the given preferred field number assigned if possible otherwise the first unassigned field number is used as the field number. Returns true if the exists in the map and is of the same . Creates a new instance with the given . NOTE: this method does not carry over termVector booleans nor docValuesType; the indexer chain (TermVectorsConsumerPerField, DocFieldProcessor) must set these fields when they succeed in consuming the document This class tracks the number and position / offset parameters of terms being added to the index. The information collected in this class is also used to calculate the normalization factor for a field. @lucene.experimental Creates for the specified field name. Creates for the specified field name and values for all fields. Re-initialize the state Gets the last processed term position. the position Gets or Sets total number of terms in this field. the length Gets or Sets the number of terms with positionIncrement == 0. the numOverlap Gets end offset of the last processed term. the offset Gets or Sets boost value. This is the cumulative product of document boost and field boost for all field instances sharing the same field name. 
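Simple example (a minimal sketch of picking index options and DocValues types through the standard Lucene.Net.Documents field classes; the field names and values are illustrative):
using Lucene.Net.Documents;
using Lucene.Net.Util;

var doc = new Document();
// TextField is tokenized and indexed with documents, frequencies and positions.
doc.Add(new TextField("body", "the quick brown fox", Field.Store.NO));
// StringField is indexed as a single token, documents only, with norms omitted.
doc.Add(new StringField("id", "42", Field.Store.YES));
// DocValues fields carry strongly typed per-document values.
doc.Add(new NumericDocValuesField("price", 1299L));
doc.Add(new SortedDocValuesField("category", new BytesRef("books")));
doc.Add(new SortedSetDocValuesField("tags", new BytesRef("new")));
doc.Add(new SortedSetDocValuesField("tags", new BytesRef("sale")));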
the boost Get the maximum term-frequency encountered for any term in the field. A field containing "the quick brown fox jumps over the lazy dog" would have a value of 2, because "the" appears twice. Gets the number of unique terms encountered in this field. Gets the from the that provided the indexed tokens for this field. Gets the field's name Flex API for access to fields and terms @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns an enumerator that will step through all field names. This will not return null. Get the for this field. This will return null if the field does not exist. Gets the number of fields or -1 if the number of distinct field names is unknown. If >= 0, will return as many field names. NOTE: This was size() in Lucene. Returns the number of terms for all fields, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. Zero-length array. A contains another , which it uses as its basic source of data, possibly transforming the data along the way or providing additional functionality. The class itself simply implements all abstract methods of with versions that pass all requests to the contained index reader. Subclasses of may further override some of these methods and may also provide additional methods and fields. NOTE: If you override , you will likely need to override as well and vice-versa. NOTE: If this does not change the content the contained reader, you could consider overriding so that and share the same entries for this atomic reader and the wrapped one. could be overridden as well if the are not changed either. Get the wrapped instance by as long as this reader is an intance of . Base class for filtering implementations. The underlying instance. Creates a new . the underlying instance. Base class for filtering implementations. NOTE: If the order of terms and documents is not changed, and if these terms are going to be intersected with automata, you could consider overriding for better performance. The underlying instance. Creates a new the underlying instance. Base class for filtering implementations. The underlying instance. Creates a new the underlying instance. Base class for filtering implementations. The underlying instance. Create a new the underlying instance. Base class for filtering implementations. The underlying instance. Create a new the underlying instance. The underlying . Construct a based on the specified base reader. Note that base reader is closed if this is closed. specified base reader. A wraps another , allowing implementations to transform or extend it. Subclasses should implement to return an instance of the subclass. If the subclass wants to wrap the 's subreaders, it should also implement a subclass, and pass an instance to its base constructor. Factory class passed to constructor that allows subclasses to wrap the filtered 's subreaders. You can use this to, e.g., wrap the subreaders with specialized implementations. Constructor Wrap one of the parent 's subreaders the subreader to wrap a wrapped/filtered A no-op that simply returns the parent 's original subreaders. Constructor The filtered Create a new that filters a passed in . the to filter Create a new that filters a passed in , using the supplied to wrap its subreader. the to filter the to use to wrap subreaders Called by the methods to return a new wrapped . 
Implementations should just return an instance of themselves, wrapping the passed in . the to wrap the wrapped Abstract class for enumerating a subset of all terms. Term enumerations are always ordered by . Each term in the enumeration is greater than all that precede it. Please note: Consumers of this enumeration cannot call Seek(), it is forward only; it throws when a seeking method is called. Return value, if term should be accepted or the iteration should . The *_SEEK values denote, that after handling the current term the enum should call and step forward. Accept the term and position the enum at the next term. Accept the term and advance () to the next term. Reject the term and position the enum at the next term. Reject the term and advance () to the next term. Reject the term and stop enumerating. Return if term is accepted, not accepted or the iteration should ended (and possibly seek). Creates a filtered on a terms enum. the terms enumeration to filter. Creates a filtered on a terms enum. the terms enumeration to filter. start with seek Use this method to set the initial to seek before iterating. This is a convenience method for subclasses that do not override . If the initial seek term is null (default), the enum is empty. You can only use this method, if you keep the default implementation of . On the first call to or if returns or , this method will be called to eventually seek the underlying to a new position. On the first call, will be null, later calls will provide the term the underlying enum is positioned at. This method returns per default only one time the initial seek term and then null, so no repositioning is ever done. Override this method, if you want a more sophisticated , that repositions the iterator during enumeration. If this method always returns null the enum is empty. Please note: this method should always provide a greater term than the last enumerated term, else the behavior of this enum violates the contract for s. Returns the related attributes, the returned is shared with the delegate . this enum does not support seeking! In general, subclasses do not support seeking. this enum does not support seeking! In general, subclasses do not support seeking. this enum does not support seeking! In general, subclasses do not support seeking. this enum does not support seeking! In general, subclasses do not support seeking. Returns the filtered enums term state Default implementation that flushes new segments based on RAM used and document count depending on the 's . It also applies pending deletes based on the number of buffered delete terms. - applies pending delete operations based on the global number of buffered delete terms iff is enabled - flushes either on the number of documents per ( ) or on the global active memory consumption in the current indexing session iff or is enabled respectively - calls and in order All settings are used to mark as flush pending during indexing with respect to their live updates. If (setter) is enabled, the largest ram consuming will be marked as pending iff the global active RAM consumption is >= the configured max RAM buffer. Marks the most ram consuming active flush pending Returns true if this flushes on , otherwise false. Returns true if this flushes on , otherwise false. Returns true if this flushes on , otherwise false. controls when segments are flushed from a RAM resident internal data-structure to the s . 
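For the Accept()/NextSeekTerm() contract described above, a minimal sketch of a filtering enum (the class name PrefixFilteredTermsEnum is hypothetical; it assumes the FilteredTermsEnum, AcceptStatus and StringHelper APIs of Lucene.Net 4.8):
using Lucene.Net.Index;
using Lucene.Net.Util;

internal sealed class PrefixFilteredTermsEnum : FilteredTermsEnum
{
    private readonly BytesRef prefix;

    public PrefixFilteredTermsEnum(TermsEnum termsEnum, BytesRef prefix)
        : base(termsEnum)
    {
        this.prefix = prefix;
        SetInitialSeekTerm(prefix); // start enumerating at the prefix rather than the first term
    }

    protected override AcceptStatus Accept(BytesRef term)
    {
        // Terms arrive in sorted order, so the first non-matching term ends the enumeration.
        return StringHelper.StartsWith(term, prefix) ? AcceptStatus.YES : AcceptStatus.END;
    }
}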
Segments are traditionally flushed by: RAM consumption - configured via Number of RAM resident documents - configured via The policy also applies pending delete operations (by term and/or query), given the threshold set in . consults the provided to control the flushing process. The policy is informed for each added or updated document as well as for each delete term. Based on the , the information provided via and , the decides if a needs flushing and mark it as flush-pending via , or if deletes need to be applied. Called for each delete term. If this is a delete triggered due to an update the given is non-null. Note: this method is called synchronized on the given and it is guaranteed that the calling thread holds the lock on the given Called for each document update on the given 's . Note: this method is called synchronized on the given and it is guaranteed that the calling thread holds the lock on the given Called for each document addition on the given s . Note: this method is synchronized by the given and it is guaranteed that the calling thread holds the lock on the given Called by to initialize the Returns the current most RAM consuming non-pending with at least one indexed document. This method will never return null Walk through all unique text tokens (Posting instances) found in this field and serialize them into a single RAM segment. Holds buffered deletes and updates by term or query, once pushed. Pushed deletes/updates are write-once, so we shift to more memory efficient data structure to hold them. We don't hold docIDs because these are applied on flush. Query we often undercount (say 24 bytes), plus int. Terms, in sorted order: Parallel array of deleted query, and the docIDUpto for each numeric DV update term and their updates binary DV update term and their updates Represents a single field for indexing. consumes IEnumerable<IndexableField> as a document. @lucene.experimental Field name describing the properties of this field. Returns the field's index-time boost. Only fields can have an index-time boost, if you want to simulate a "document boost", then you must pre-multiply it across all the relevant fields yourself. The boost is used to compute the norm factor for the field. By default, in the method, the boost value is multiplied by the length normalization factor and then rounded by before it is stored in the index. One should attempt to ensure that this product does not overflow the range of that encoding. It is illegal to return a boost other than 1.0f for a field that is not indexed ( is false) or omits normalization values ( returns true). Non-null if this field has a binary value. Non-null if this field has a string value. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. A standard or custom numeric format string. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. The value of the field as a , or null. If null, the value or binary value is used. Exactly one of , , and must be set. A standard or custom numeric format string. 
This parameter has no effect if this field is non-numeric. An object that supplies culture-specific formatting information. This parameter has no effect if this field is non-numeric. The string representation of the value if it is either a or numeric type. Non-null if this field has a value Non-null if this field has a numeric value. Gets the of the underlying value, or if the value is not set or non-numeric. Expert: The difference between this property and is this is represents the current state of the field (whether being written or read) and the property represents instructions on how the field will be written, but does not re-populate when reading back from an index (it is write-only). In Java, the numeric type was determined by checking the type of . However, since there are no reference number types in .NET, using so will cause boxing/unboxing. It is therefore recommended to use this property to check the underlying type and the corresponding Get*Value() method to retrieve the value. NOTE: Since Lucene codecs do not support or , fields created with these types will always be when read back from the index. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Returns the field value as or null if the type is non-numeric. The field value or null if the type is non-numeric. Creates the used for indexing this field. If appropriate, implementations should use the given to create the s. that should be used to create the s from value for indexing the document. Should always return a non-null value if the field is to be indexed Can be thrown while creating the Describes the properties of a field. @lucene.experimental true if this field should be indexed (inverted) true if the field's value should be stored true if this field's value should be analyzed by the . This has no effect if returns false. true if this field's indexed form should be also stored into term vectors. this builds a miniature inverted-index for this field which can be accessed in a document-oriented way from . This option is illegal if returns false. true if this field's token character offsets should also be stored into term vectors. This option is illegal if term vectors are not enabled for the field ( is false) true if this field's token positions should also be stored into the term vectors. This option is illegal if term vectors are not enabled for the field ( is false). true if this field's token payloads should also be stored into the term vectors. This option is illegal if term vector positions are not enabled for the field ( is false). true if normalization values should be omitted for the field. This saves memory, but at the expense of scoring quality (length normalization will be disabled), and if you omit norms, you cannot use index-time boosts. , describing what should be recorded into the inverted index DocValues : if not then the field's value will be indexed into docValues. Expert: represents a single commit into an index as seen by the or . 
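For the field-type properties listed above, a minimal sketch of building a custom FieldType (property names are assumed to match Lucene.Net.Documents.FieldType; values are illustrative):
using Lucene.Net.Documents;
using Lucene.Net.Index;

var vectorType = new FieldType
{
    IsIndexed = true,
    IsTokenized = true,
    IsStored = true,
    StoreTermVectors = true,
    StoreTermVectorPositions = true,
    StoreTermVectorOffsets = true,
    IndexOptions = IndexOptions.DOCS_AND_FREQS_AND_POSITIONS
};
vectorType.Freeze(); // prevent further changes once the type is in use
var body = new Field("body", "some analyzed text", vectorType);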
Changes to the content of an index are made visible only after the writer who made that change commits by writing a new segments file (segments_N). This point in time, when the action of writing of a new segments file to the directory is completed, is an index commit. Each index commit point has a unique segments file associated with it. The segments file associated with a later index commit point would have a larger N. @lucene.experimental Get the segments file (segments_N) associated with this commit point. Returns all index files referenced by this commit point. Returns the for the index. Delete this commit point. This only applies when using the commit point in the context of 's . Upon calling this, the writer is notified that this commit point should be deleted. Decision that a commit-point should be deleted is taken by the in effect and therefore this should only be called by its or methods. Returns true if this commit should be deleted; this is only used by after invoking the . Returns number of segments referenced by this commit. Sole constructor. (For invocation by subclass constructors, typically implicit.) Two IndexCommits are equal if both their and versions are equal. Returns the generation (the _N in segments_N) for this Returns userData, previously passed to } for this commit. The dictionary is -> . Expert: policy for deletion of stale s. Implement this interface, and pass it to one of the or constructors, to customize when older point-in-time commits () are deleted from the index directory. The default deletion policy is , which always removes old commits as soon as a new commit is done (this matches the behavior before 2.2). One expected use case for this (and the reason why it was first created) is to work around problems with an index directory accessed via filesystems like NFS because NFS does not provide the "delete on last close" semantics that Lucene's "point in time" search normally relies on. By implementing a custom deletion policy, such as "a commit is only removed once it has been stale for more than X minutes", you can give your readers time to refresh to the new commit before removes the old commits. Note that doing so will increase the storage requirements of the index. See LUCENE-710 for details. Implementers of sub-classes should make sure that returns an independent instance able to work with any other or instance. Sole constructor, typically called by sub-classes constructors. this is called once when a writer is first instantiated to give the policy a chance to remove old commit points. The writer locates all index commits present in the index directory and calls this method. The policy may choose to delete some of the commit points, doing so by calling method . Note: the last CommitPoint is the most recent one, i.e. the "front index state". Be careful not to delete it, unless you know for sure what you are doing, and unless you can afford to lose the index content while doing that. List of current point-in-time commits (), sorted by age (the 0th one is the oldest commit). Note that for a new index this method is invoked with an empty list. this is called each time the writer completed a commit. this gives the policy a chance to remove old commit points with each commit. The policy may now choose to delete old commit points by calling method of . This method is only called when } or is called, or possibly not at all if the } method is called. Note: the last CommitPoint is the most recent one, i.e. the "front index state". 
Be careful not to delete it, unless you know for sure what you are doing, and unless you can afford to lose the index content while doing that. List of s, sorted by age (the 0th one is the oldest commit). This class keeps track of each SegmentInfos instance that is still "live", either because it corresponds to a segments_N file in the (a "commit", i.e. a committed ) or because it's an in-memory that a writer is actively updating but has not yet committed. This class uses simple reference counting to map the live instances to individual files in the . The same directory file may be referenced by more than one , i.e. more than one . Therefore we count how many commits reference each file. When all the commits referencing a certain file have been deleted, the refcount for that file becomes zero, and the file is deleted. A separate deletion policy interface () is consulted on creation (OnInit) and once per commit (OnCommit), to decide when a commit should be removed. It is the business of the to choose when to delete commit points. The actual mechanics of file deletion, retrying, etc, derived from the deletion of commit points is the business of the . The current default deletion policy is , which removes all prior commits when a new commit has completed. This matches the behavior before 2.2. Note that you must hold the write.lock before instantiating this class. It opens segments_N file(s) directly with no retry logic. Files that we tried to delete but failed (likely because they are open and we are running on Windows), so we will retry them again later: Reference count for all files in the index. Counts how many existing commits reference a file. Holds all commits (segments_N) currently in the index. this will have just 1 commit if you are using the default delete policy (KeepOnlyLastCommitDeletionPolicy). Other policies may leave commit points live for longer in which case this list would be longer than 1: Holds files we had incref'd from the previous non-commit checkpoint: Commits that the IndexDeletionPolicy have decided to delete: Change to true to see details of reference counts when infoStream is enabled Initialize the deleter: find all previous commits in the , incref the files they reference, call the policy to let it delete commits. this will remove any files not referenced by any of the commits. if there is a low-level IO error Remove the CommitPoints in the commitsToDelete List by DecRef'ing all files from each SegmentInfos. Writer calls this when it has hit an error and had to roll back, to tell us that there may now be unreferenced files in the filesystem. So we re-list the filesystem and delete such files. If is non-null, we will only delete files corresponding to that segment. Revisits the by calling its again with the known commits. this is useful in cases where a deletion policy which holds onto index commits is used. The application may know that some commits are not held by the deletion policy anymore and call , which will attempt to delete the unused commits again. For definition of "check point" see comments: "Clarification: Check Points (and commits)". Writer calls this when it has made a "consistent change" to the index, meaning new files are written to the index and the in-memory have been modified to point to those files. This may or may not be a commit (segments_N may or may not have been written). We simply incref the files referenced by the new and decref the files we had previously seen (if any). 
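A minimal sketch of a custom deletion policy implementing the OnInit/OnCommit contract described above (the class name is hypothetical and it mimics the keep-only-last-commit behavior; the generic signatures are assumed to match Lucene.Net 4.8):
using System.Collections.Generic;
using Lucene.Net.Index;

public sealed class KeepNewestCommitOnlyPolicy : IndexDeletionPolicy
{
    public override void OnInit<T>(IList<T> commits)
    {
        // Called once when the writer is opened; treat it like a commit.
        OnCommit(commits);
    }

    public override void OnCommit<T>(IList<T> commits)
    {
        // Commits are sorted by age; the last one is the current ("front") index state and must survive.
        for (int i = 0; i < commits.Count - 1; i++)
        {
            commits[i].Delete(); // mark every older commit point for deletion
        }
    }
}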
If this is a commit, we also call the policy to give it a chance to remove other commits. If any commits are removed, we decref their files as well. Deletes the specified files, but only if they are new (have not yet been incref'd). Tracks the reference count for a single index file: Holds details for each commit point. This class is also passed to the deletion policy. Note: this class has a natural ordering that is inconsistent with equals. Called only by the deletion policy, to remove this commit point from the index. This class contains useful constants representing filenames and extensions used by Lucene, as well as convenience methods for querying whether a file name matches an extension (), as well as generating file names from a segment name, generation and extension (, ). NOTE: extensions used by codecs are not listed here. You must interact with the directly. @lucene.internal No instance Name of the index segment file Extension of gen file Name of the generation reference file name Extension of compound file Extension of compound file entries This array contains all filename extensions used by Lucene's index files, with one exception, namely the extension made up from .s + a number. Also note that Lucene's segments_N files do not have any filename extension. Computes the full file name from base, extension and generation. If the generation is -1, the file name is null. If it's 0, the file name is <base>.<ext>. If it's > 0, the file name is <base>_<gen>.<ext>. NOTE: .<ext> is added to the name only if ext is not an empty string. main part of the file name extension of the filename generation Returns a file name that includes the given segment name, your own custom name and extension. The format of the filename is: <segmentName>(_<name>)(.<ext>). NOTE: .<ext> is added to the result file name only if ext is not empty. NOTE: _<segmentSuffix> is added to the result file name only if it's not the empty string NOTE: all custom files should be named using this method, or otherwise some structures may fail to handle them properly (such as if they are added to compound files). Returns true if the given filename ends with the given extension. One should provide a pure extension, without '.'. Locates the boundary of the segment name, or -1 Strips the segment name out of the given file name. If you used or to create your files, then this method simply removes whatever comes before the first '.', or the second '_' (excluding both). the filename with the segment name removed, or the given filename if it does not contain a '.' and '_'. Parses the segment name out of the given file name. the segment name only, or filename if it does not contain a '.' and '_'. Removes the extension (anything after the first '.'), otherwise returns the original filename. Return the extension (anything after the first '.'), or null if there is no '.' in the file name. All files created by codecs must match this pattern (checked in ). This exception is thrown when Lucene detects an index that is newer than this Lucene version. Creates an @lucene.internal describes the file that was too new the version of the file that was too new the minimum version accepted the maximum version accepted Creates an @lucene.internal the open file that's too new the version of the file that was too new the minimum version accepted the maximum version accepted Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. 
The that contains contextual information about the source or destination. This exception is thrown when Lucene detects an index that is too old for this Lucene version. Creates an . @lucene.internal describes the file that was too old the version of the file that was too old Creates an . @lucene.internal the open file that's too old the version of the file that was too old Creates an . @lucene.internal describes the file that was too old the version of the file that was too old the minimum version accepted the maximum version accepted Creates an . @lucene.internal the open file that's too old the version of the file that was too old the minimum version accepted the maximum version accepted Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Signals that no index was found in the . Possibly because the directory is empty; however, it can also indicate index corruption. Creates with the description message. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. is an abstract class, providing an interface for accessing an index. Search of an index is done entirely through this abstract interface, so that any subclass which implements it is searchable. There are two different types of s: : These indexes do not consist of several sub-readers, they are atomic. They support retrieval of stored fields, doc values, terms, and postings. : Instances (like ) of this reader can only be used to get stored fields from the underlying s, but it is not possible to directly retrieve postings. To do that, get the sub-readers via . Alternatively, you can mimic an (with a serious slowdown), by wrapping composite readers with . instances for indexes on disk are usually constructed with a call to one of the static DirectoryReader.Open() methods, e.g. . inherits the abstract class, it is not possible to directly get postings. For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. Expert: adds a . The provided listener will be invoked when this reader is disposed. NOTE: This was addReaderClosedListener() in Lucene. @lucene.experimental Expert: remove a previously added . NOTE: This was removeReaderClosedListener() in Lucene. @lucene.experimental Expert: this method is called by s which wrap other readers (e.g. or ) to register the parent at the child (this reader) on construction of the parent. When this reader is disposed, it will mark all registered parents as disposed, too. The references to parent readers are weak only, so they can be GCed once they are no longer in use. @lucene.experimental Expert: returns the current refCount for this reader Expert: increments the of this instance. s are used to determine when a reader can be disposed safely, i.e. 
as soon as there are no more references. Be sure to always call a corresponding , in a finally clause; otherwise the reader may never be disposed. Note that simply calls , which means that the will not really be disposed until has been called for all outstanding references. Expert: increments the of this instance only if the has not been disposed yet and returns true iff the was successfully incremented, otherwise false. If this method returns false the reader is either already disposed or is currently being disposed. Either way this reader instance shouldn't be used by an application unless true is returned. s are used to determine when a reader can be disposed safely, i.e. as soon as there are no more references. Be sure to always call a corresponding , in a finally clause; otherwise the reader may never be disposed. Note that simply calls , which means that the will not really be disposed until has been called for all outstanding references. Expert: decreases the of this instance. If the drops to 0, then this reader is disposed. If an exception is hit, the is unchanged. in case an occurs in Throws if this or any of its child readers is disposed, otherwise returns. Determines whether two object instances are equal. For caching purposes, subclasses are not allowed to implement Equals/GetHashCode, so methods are declared sealed. To lookup instances from caches use and . Serves as the default hash function. For caching purposes, subclasses are not allowed to implement Equals/GetHashCode, so methods are declared sealed. To lookup instances from caches use and . Returns a reading the index in the given the index directory if there is a low-level IO error Expert: Returns a reading the index in the given with the given . the index directory Subsamples which indexed terms are loaded into RAM. this has the same effect as (which can be set on ) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. if there is a low-level IO error Open a near real time from the . The to open from If true, all buffered deletes will be applied (made visible) in the returned reader. If false, the deletes are not applied but remain buffered (in ) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false. The new if there is a low-level IO error @lucene.experimental Expert: returns an reading the index in the given . the commit point to open if there is a low-level IO error Expert: returns an reading the index in the given and . the commit point to open Subsamples which indexed terms are loaded into RAM. this has the same effect as (which can be set in ) except that setting must be done at indexing time while this setting can be set per reader. When set to N, then one in every N*termIndexInterval terms in the index is loaded into memory. By setting this to a value > 1 you can reduce memory usage, at the expense of higher latency when loading a TermInfo. The default value is 1. Set this to -1 to skip loading the terms index entirely. if there is a low-level IO error Retrieve term vectors for this document, or null if term vectors were not indexed. 
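Simple example of the reference-counting workflow described above (a minimal sketch; the directory path is illustrative):
using Lucene.Net.Index;
using Lucene.Net.Store;

using (Directory dir = FSDirectory.Open("/path/to/index"))
using (DirectoryReader reader = DirectoryReader.Open(dir))
{
    reader.IncRef(); // take an extra reference before handing the reader to other code
    try
    {
        // ... search or otherwise consume the reader ...
    }
    finally
    {
        reader.DecRef(); // release the extra reference; the reader is disposed once the count reaches 0
    }
}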
The returned instance acts like a single-document inverted index (the docID will be 0). Retrieve term vector for this document and field, or null if term vectors were not indexed. The returned instance acts like a single-document inverted index (the docID will be 0). Returns the number of documents in this index. Returns one greater than the largest possible document number. this may be used to, e.g., determine how big to allocate an array which will have an element for every document number in an index. Returns the number of deleted documents. Expert: visits the fields of a stored document, for custom processing/loading of each field. If you simply want to load all fields, use . If you want to load a subset, use . Returns the stored fields of the nth in this index. This is just sugar for using . NOTE: for performance reasons, this method does not check if the requested document is deleted, and therefore asking for a deleted document may yield unspecified results. Usually this is not required, however you can test if the doc is deleted by checking the returned from . NOTE: only the content of a field is returned, if that field was stored during indexing. Metadata like boost, omitNorm, IndexOptions, tokenized, etc., are not preserved. if there is a low-level IO error Like but only loads the specified fields. Note that this is simply sugar for . Returns true if any documents have been deleted. Implementers should consider overriding this property if or are not constant-time operations. Closes files associated with this index. Also saves any new deletions to disk. No other methods should be called after this has been called. If there is a low-level IO error Closes files associated with this index. This method implements the disposable pattern. It may be overridden to dispose any managed or unmanaged resources, but be sure to call base.Dispose(disposing) to close files associated with the underlying . true indicates to dispose all managed and unmanaged resources, false indicates dispose unmanaged resources only Implements close. Expert: Returns the root for this 's sub-reader tree. Iff this reader is composed of sub readers, i.e. this reader being a composite reader, this method returns a holding the reader's direct children as well as a view of the reader tree's atomic leaf contexts. All sub- instances referenced from this readers top-level context are private to this reader and are not shared with another context tree. For example, uses this API to drive searching by one atomic leaf reader at a time. If this reader is not composed of child readers, this method returns an . Note: Any of the sub- instances referenced from this top-level context do not support . Only the top-level context maintains the convenience leaf-view for performance reasons. Returns the reader's leaves, or itself if this reader is atomic. This is a convenience method calling this.Context.Leaves. Expert: Returns a key for this , so / can find it again. This key must not have Equals()/GetHashCode() methods, so "equals" means "identical". Expert: Returns a key for this that also includes deletions, so / can find it again. This key must not have Equals()/GetHashCode() methods, so "equals" means "identical". Returns the number of documents containing the . This method returns 0 if the term or field does not exist. This method does not take into account deleted documents that have not yet been merged away. Returns the total number of occurrences of across all documents (the sum of the Freq for each doc that has this term). 
This will be -1 if the codec doesn't support this measure. Note that, like other term measures, this measure does not take deleted documents into account. Returns the sum of for all terms in this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. Returns the number of documents that have at least one term for this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. Returns the sum of for all terms in this field, or -1 if this measure isn't stored by the codec (or if this fields omits term freq and positions). Note that, just like other term measures, this measure does not take deleted documents into account. A custom listener that's invoked when the is closed. @lucene.experimental Invoked when the is closed. Expert: adds a . The provided listener will be invoked when this reader is closed. @lucene.experimental Expert: remove a previously added . @lucene.experimental A custom listener that's invoked when the is disposed. NOTE: This was IndexReader.ReaderClosedListener in Lucene. @lucene.experimental Invoked when the is disposed. A struct like class that represents a hierarchical relationship between instances. The reader context for this reader's immediate parent, or null if none true if this context struct represents the top level reader within the hierarchical context the doc base for this reader in the parent, 0 if parent is null the ord for this reader in the parent, 0 if parent is null Returns the , this context represents. Returns the context's leaves if this context is a top-level context. For convenience, if this is an this returns itself as the only leaf. Note: this is convenience method since leaves can always be obtained by walking the context tree using . if this is not a top-level context. Returns the context's children iff this context is a composite context otherwise null. This is an easy-to-use tool that upgrades all segments of an index from previous Lucene versions to the current segment file format. It can be used from command line. LUCENENET specific: In the Java implementation, this class' Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to that method: index upgrade Alternatively this class can be instantiated and invoked. It uses and triggers the upgrade via an request to . This tool keeps only the last commit in an index; for this reason, if the incoming index has more than one commit, the tool refuses to run by default. Specify -delete-prior-commits to override this, allowing the tool to delete all but the last commit. From .NET code this can be enabled by passing true to . Warning: this tool may reorder documents if the index was partially upgraded before execution (e.g., documents were added). If your application relies on "monotonicity" of doc IDs (which means that the order in which the documents were added to the index is preserved), do a full ForceMerge instead. The set by may also reorder documents. Main method to run from the command-line. LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. 
However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to this method: index upgrade The command line arguments Thrown if any incorrect arguments are provided Creates index upgrader on the given directory, using an using the given . The tool refuses to upgrade indexes with multiple commit points. Creates index upgrader on the given directory, using an using the given . You have the possibility to upgrade indexes with multiple commit points by removing all older ones. If is not null, all logging output will be sent to this stream. Creates index upgrader on the given directory, using an using the given config. You have the possibility to upgrade indexes with multiple commit points by removing all older ones. Perform the upgrade. An creates and maintains an index. The option on determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. If is used will create a new index if there is not already an index at the provided path and otherwise open the existing index. In either case, documents are added with and removed with or . A document can be updated with (which just deletes and then adds the entire document). When finished adding, deleting and updating documents, should be called. These changes are buffered in memory and periodically flushed to the (during the above method calls). A flush is triggered when there are enough added documents since the last flush. Flushing is triggered either by RAM usage of the documents (see ) or the number of added documents (see ). The default is to flush when RAM usage hits MB. For best indexing speed you should flush by RAM usage with a large RAM buffer. Additionally, if reaches the configured number of buffered deletes (see ) the deleted terms and queries are flushed and applied to existing segments. In contrast to the other flush options and , deleted terms won't trigger a segment flush. Note that flushing just moves the internal buffered state in into the index, but these changes are not visible to until either or is called. A flush may also trigger one or more segment merges which by default run with a background thread so as not to block the addDocument calls (see below for changing the ). Opening an creates a lock file for the directory in use. Trying to open another on the same directory will lead to a . The is also thrown if an on the same directory is used to delete documents from the index. Expert: allows an optional implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on. Expert: allows you to separately change the and the . The is invoked whenever there are changes to the segments in the index. 
Its role is to select which merges to do, if any, and return a describing the merges. The default is . Then, the is invoked with the requested merges and it decides when and how to run the merges. The default is . NOTE: if you hit an then will quietly record this fact and block all future segment commits. This is a defensive measure in case any internal state (buffered documents and deletions) were corrupted. Any subsequent calls to will throw an . The only course of action is to call , which internally will call , to undo any changes to the index since the last commit. You can also just call directly. NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance as this may cause deadlock; use your own (non-Lucene) objects instead. NOTE: Do not use on a thread that's within , as .NET will throw on any wait, sleep, or join including any lock statement with contention on it. As a result, it is not practical to try to support due to the chance could potentially be thrown in the middle of a or somewhere in the application that will cause a deadlock. We recommend using another shutdown mechanism to safely cancel a parallel operation. See: https://github.com/apache/lucenenet/issues/526. Name of the write lock in the index. Key for the source of a segment in the . Source of a segment which results from a merge of other segments. Source of a segment which results from a flush. Source of a segment which results from a call to . Absolute hard maximum length for a term, in bytes once encoded as UTF8. If a term arrives from the analyzer longer than this length, an is thrown and a message is printed to , if set (see ). Expert: returns a readonly reader, covering all committed as well as un-committed changes to the index. this provides "near real-time" searching, in that changes made during an session can be quickly made available for searching without closing the writer nor calling . Note that this is functionally equivalent to calling Flush() and then opening a new reader. But the turnaround time of this method should be faster since it avoids the potentially costly . You must close the returned by this method once you are done using it. It's near real-time because there is no hard guarantee on how quickly you can get a new reader after making changes with . You'll have to experiment in your situation to determine if it's fast enough. As this is a new and experimental feature, please report back on your findings so we can learn, improve and iterate. The resulting reader supports , but that call will simply forward back to this method (though this may change in the future). The very first time this method is called, this writer instance will make every effort to pool the readers that it opens for doing merges, applying deletes, etc. This means additional resources (RAM, file descriptors, CPU time) will be consumed. For lower latency on reopening a reader, you should set to pre-warm a newly merged segment before it's committed to the index. This is important for minimizing index-to-search delay after a large merge. If an AddIndexes* call is running in another thread, then this reader will only search those segments from the foreign index that have been successfully copied over, so far. NOTE: Once the writer is disposed, any outstanding readers may continue to be used. However, if you attempt to reopen any of those readers, you'll hit an . 
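Simple example of the near-real-time workflow described above (a minimal sketch, assuming an already open IndexWriter named writer; the query is illustrative):
using Lucene.Net.Index;
using Lucene.Net.Search;

DirectoryReader nrtReader = DirectoryReader.Open(writer, true); // true = apply buffered deletes
try
{
    var searcher = new IndexSearcher(nrtReader);
    TopDocs hits = searcher.Search(new TermQuery(new Term("body", "fox")), 10);
    // ... consume hits ...

    // Later, pick up any further changes made by the writer:
    DirectoryReader newer = DirectoryReader.OpenIfChanged(nrtReader);
    if (newer != null)
    {
        nrtReader.Dispose();
        nrtReader = newer;
    }
}
finally
{
    nrtReader.Dispose();
}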
@lucene.experimental that covers entire index plus all changes made so far by this instance If there is a low-level I/O error Holds shared instances. uses s for 1) applying deletes, 2) doing merges, 3) handing out a real-time reader. This pool reuses instances of the s in all these places if it is in "near real-time mode" ( has been called on this instance). Remove all our references to readers, and commits any pending changes. Commit live docs changes for the segment readers for the provided infos. If there is a low-level I/O error Obtain a instance from the readerPool. If is true, you must later call . Make sure that every segment appears only once in the pool: Obtain the number of deleted docs for a pooled reader. If the reader isn't being pooled, the segmentInfo's delCount is returned. Used internally to throw an if this has been disposed or is in the process of disposing. if true, also fail when is in the process of disposing (closing=true) but not yet done disposing ( closed=false) if this IndexWriter is closed or in the process of closing Used internally to throw an if this has been disposed (closed=true) or is in the process of disposing (closing=true). Calls . if this is disposed Constructs a new per the settings given in . If you want to make "live" changes to this writer instance, use . NOTE: after this writer is created, the given configuration instance cannot be passed to another writer. If you intend to do so, you should it beforehand. the index directory. The index is either created or appended according . the configuration settings according to which should be initialized. if the directory cannot be read/written to, or if it does not exist and is or if there is any other low-level IO error Loads or returns the already loaded the global field number map for . If has no global field number map the returned instance is empty Returns a , which can be used to query the current settings, as well as modify "live" ones. Commits all changes to an index, waits for pending merges to complete, and closes all associated files. This is a "slow graceful shutdown" which may take a long time, especially if a big merge is pending: If you only want to close resources use . If you only want to commit pending changes and close resources see . Note that this may be a costly operation, so try to re-use a single writer instead of closing and opening a new one. See for caveats about write caching done by some IO devices. If an is hit during close, e.g. due to disk full or some other reason, then both the on-disk index and the internal state of the instance will be consistent. However, the close will not be complete even though part of it (flushing buffered documents) may have succeeded, so the write lock will still be held. If you can correct the underlying cause (e.g. free up some disk space) then you can call again. Failing that, if you want to force the write lock to be released (dangerous, because you may then lose buffered docs in the instance) then you can do something like this: try { writer.Dispose(); } finally { if (IndexWriter.IsLocked(directory)) { IndexWriter.Unlock(directory); } } after which, you must be certain not to use the writer instance anymore. NOTE: if this method hits an you should immediately dispose the writer, again. See for details. if there is a low-level IO error Disposes the index with or without waiting for currently running merges to finish. This is only meaningful when using a that runs merges in background threads. 
NOTE: If this method hits an you should immediately dispose the writer, again. See for details. NOTE: It is dangerous to always call Dispose(false), especially when is not open for very long, because this can result in "merge starvation" whereby long merges will never have a chance to finish. This will cause too many segments in your index over time. NOTE: This overload should not be called when implementing a finalizer. Instead, call with disposing set to false and waitForMerges set to true. If true, this call will block until all merges complete; else, it will ask all running merges to abort, wait until those merges have finished (which should be at most a few seconds), and then return. Disposes the index with or without waiting for currently running merges to finish. This is only meaningful when using a that runs merges in background threads. This call will block until all merges complete; else, it will ask all running merges to abort, wait until those merges have finished (which should be at most a few seconds), and then return. NOTE: Always be sure to call base.Dispose(disposing, waitForMerges) when overriding this method. NOTE: When implementing a finalizer in a subclass, this overload should be called with set to false and set to true. NOTE: If this method hits an you should immediately dispose the writer, again. See for details. NOTE: It is dangerous to always call with set to false, especially when is not open for very long, because this can result in "merge starvation" whereby long merges will never have a chance to finish. This will cause too many segments in your index over time. If true, this call will block until all merges complete; else, it will ask all running merges to abort, wait until those merges have finished (which should be at most a few seconds), and then return. true to release both managed and unmanaged resources; false to release only unmanaged resources. Returns true if this thread should attempt to close, or false if IndexWriter is now closed; else, waits until another thread finishes closing Gets the used by this index. Gets the analyzer used by this index. Gets total number of docs in this index, including docs not yet flushed (still in the RAM buffer), not counting deletions. Gets total number of docs in this index, including docs not yet flushed (still in the RAM buffer), and including deletions. NOTE: buffered deletions are not counted. If you really need these to be counted you should call first. Returns true if this index has deletions (including buffered deletions). Note that this will return true if there are buffered Term/Query deletions, even if it turns out those buffered deletions don't match any documents. Also, if a merge kicked off as a result of flushing a Adds a document to this index. Note that if an is hit (for example disk full) then the index will be consistent, but this document may not have been added. Furthermore, it's possible the index will have one segment in non-compound format even when using compound files (when a merge has partially succeeded). This method periodically flushes pending documents to the (see ), and also periodically triggers segment merges in the index according to the in use. Merges temporarily consume space in the directory. The amount of space required is up to 1X the size of all segments being merged, when no readers/searchers are open against the index, and up to 2X the size of all segments being merged when readers/searchers are open against the index (see for details). 
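Simple example of opening a writer and adding a document as described above (a minimal sketch; the path, analyzer choice and flush settings are illustrative, and StandardAnalyzer comes from the Lucene.Net.Analysis.Common package):
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    OpenMode = OpenMode.CREATE_OR_APPEND, // create a new index or append to an existing one
    RAMBufferSizeMB = 64.0                // flush to a new segment once ~64 MB of RAM is buffered
};

using (Directory dir = FSDirectory.Open("/path/to/index"))
using (var writer = new IndexWriter(dir, config))
{
    var doc = new Document();
    doc.Add(new StringField("id", "42", Field.Store.YES));
    doc.Add(new TextField("body", "the quick brown fox jumps over the lazy dog", Field.Store.NO));
    writer.AddDocument(doc);
    writer.Commit(); // make the change visible to newly opened readers
}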
The sequence of primitive merge operations performed is governed by the merge policy. Note that each term in the document can be no longer than in bytes, otherwise an will be thrown. Note that it's possible to create an invalid Unicode string in Java if a UTF16 surrogate pair is malformed. In this case, the invalid characters are silently replaced with the Unicode replacement character U+FFFD. NOTE: if this method hits an you should immediately dispose the writer. See for details. if the index is corrupt if there is a low-level IO error Adds a document to this index, using the provided instead of the value of . See for details on index and state after an , and flushing/merging temporary free space requirements. NOTE: if this method hits an you should immediately dispose the writer. See for details. if the index is corrupt if there is a low-level IO error Atomically adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents. WARNING: the index does not currently record which documents were added as a block. Today this is fine, because merging will preserve a block. The order of documents within a segment will be preserved, even when child documents within a block are deleted. Most search features (like result grouping and block joining) require you to mark documents; when these documents are deleted these search features will not work as expected. Obviously adding documents to an existing block will require you to reindex the entire block. However, it's possible that in the future Lucene may merge more aggressively and re-order documents (for example, perhaps to obtain better index compression), in which case you may need to fully re-index your documents at that time. See for details on index and state after an , and flushing/merging temporary free space requirements. NOTE: tools that do offline splitting of an index (for example, IndexSplitter in Lucene.Net.Misc) or re-sorting of documents (for example, IndexSorter in contrib) are not aware of these atomically added documents and will likely break them up. Use such tools at your own risk! NOTE: if this method hits an you should immediately dispose the writer. See for details. @lucene.experimental if the index is corrupt if there is a low-level IO error Atomically adds a block of documents, analyzed using the provided , with sequentially assigned document IDs, such that an external reader will see all or none of the documents. @lucene.experimental if the index is corrupt if there is a low-level IO error Atomically deletes documents matching the provided and adds a block of documents with sequentially assigned document IDs, such that an external reader will see all or none of the documents. @lucene.experimental if the index is corrupt if there is a low-level IO error Atomically deletes documents matching the provided and adds a block of documents, analyzed using the provided , with sequentially assigned document IDs, such that an external reader will see all or none of the documents. @lucene.experimental if the index is corrupt if there is a low-level IO error Deletes the document(s) containing . NOTE: if this method hits an you should immediately dispose the writer. See for details. the term to identify the documents to be deleted if the index is corrupt if there is a low-level IO error Expert: attempts to delete by document ID, as long as the provided is a near-real-time reader (from . 
If the provided is an NRT reader obtained from this writer, and its segment has not been merged away, then the delete succeeds and this method returns true; else, it returns false and the caller must then separately delete by Term or Query. NOTE: this method can only delete documents visible to the currently open NRT reader. If you need to delete documents indexed after opening the NRT reader you must use the other DeleteDocument() methods (e.g., ). Deletes the document(s) containing any of the terms. All given deletes are applied and flushed atomically at the same time. NOTE: if this method hits an you should immediately dispose the writer. See for details. array of terms to identify the documents to be deleted if the index is corrupt if there is a low-level IO error Deletes the document(s) matching the provided query. NOTE: if this method hits an you should immediately dispose the writer. See for details. the query to identify the documents to be deleted if the index is corrupt if there is a low-level IO error Deletes the document(s) matching any of the provided queries. All given deletes are applied and flushed atomically at the same time. NOTE: if this method hits an you should immediately dispose the writer. See for details. array of queries to identify the documents to be deleted if the index is corrupt if there is a low-level IO error Updates a document by first deleting the document(s) containing and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add). NOTE: if this method hits an you should immediately dispose the writer. See for details. the term to identify the document(s) to be deleted the document to be added if the index is corrupt if there is a low-level IO error Updates a document by first deleting the document(s) containing and then adding the new document. The delete and then add are atomic as seen by a reader on the same index (flush may happen only after the add). NOTE: if this method hits an you should immediately dispose the writer. See for details. the term to identify the document(s) to be deleted the document to be added the analyzer to use when analyzing the document if the index is corrupt if there is a low-level IO error Updates a document's for to the given . This method can be used to 'unset' a document's value by passing null as the new . Also, you can only update fields that already exist in the index, not add new fields through this method. NOTE: if this method hits an you should immediately dispose the writer. See for details. the term to identify the document(s) to be updated field name of the field new value for the field if the index is corrupt if there is a low-level IO error Updates a document's for to the given . This method can be used to 'unset' a document's value by passing null as the new . Also, you can only update fields that already exist in the index, not add new fields through this method. NOTE: this method currently replaces the existing value of all affected documents with the new value. NOTE: if this method hits an you should immediately dispose the writer. See for details. the term to identify the document(s) to be updated field name of the field new value for the field if the index is corrupt if there is a low-level IO error If non-null, information about merges will be printed to this. Forces merge policy to merge segments until there are <= . The actual merges to be executed are determined by the .
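Continuing the sketch above (same writer and field names, which remain illustrative), deleting and updating by key typically looks like this; TermQuery comes from Lucene.Net.Search:

// Replace the document whose "id" term is 42, and delete others by term or by query.
var updated = new Document
{
    new StringField("id", "42", Field.Store.YES),
    new TextField("body", "updated content", Field.Store.NO)
};
writer.UpdateDocument(new Term("id", "42"), updated);                 // atomic delete-then-add as seen by readers
writer.DeleteDocuments(new Term("id", "17"));                         // delete by term
writer.DeleteDocuments(new TermQuery(new Term("body", "obsolete"))); // delete by query
writer.Commit();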
This is a horribly costly operation, especially when you pass a small ; usually you should only call this if the index is static (will no longer be changed). Note that this requires up to 2X the index size free space in your Directory (3X if you're using compound file format). For example, if your index size is 10 MB then you need up to 20 MB free for this to complete (30 MB if you're using compound file format). Also, it's best to call afterwards, to allow to free up disk space. If some but not all readers re-open while merging is underway, this will cause > 2X temporary space to be consumed as those new readers will then hold open the temporary segments at that time. It is best not to re-open readers while merging is running. The actual temporary usage could be much less than these figures (it depends on many factors). In general, once this completes, the total size of the index will be less than the size of the starting index. It could be quite a bit smaller (if there were many pending deletes) or just slightly smaller. If an is hit, for example due to disk full, the index will not be corrupted and no documents will be lost. However, it may have been partially merged (some segments were merged but not all), and it's possible that one of the segments in the index will be in non-compound format even when using compound file format. This will occur when the is hit during conversion of the segment into compound format. This call will merge those segments present in the index when the call started. If other threads are still adding documents and flushing segments, those newly created segments will not be merged unless you call again. NOTE: if this method hits an you should immediately dispose the writer. See for details. NOTE: if you call with false, which aborts all running merges, then any thread still running this method might hit a . maximum number of segments left in the index after merging finishes if the index is corrupt if there is a low-level IO error Just like , except you can specify whether the call should block until all merging completes. This is only meaningful with a that is able to run merges in background threads. NOTE: if this method hits an you should immediately dispose the writer. See for details. Returns true if any merges in or are merges. Just like , except you can specify whether the call should block until the operation completes. This is only meaningful with a that is able to run merges in background threads. NOTE: if this method hits an you should immediately dispose the writer. See for details. NOTE: if you call with false, which aborts all running merges, then any thread still running this method might hit a . Forces merging of all segments that have deleted documents. The actual merges to be executed are determined by the . For example, the default will only pick a segment if the percentage of deleted docs is over 10%. This is often a horribly costly operation; rarely is it warranted. To see how many deletions you have pending in your index, call . NOTE: this method first flushes a new segment (if there are indexed documents), and applies all buffered deletes. NOTE: if this method hits an you should immediately dispose the writer. See for details. Expert: asks the whether any merges are necessary now and if so, runs the requested merges and then iterate (test again if merges are needed) until no more merges are returned by the . Explicit calls to are usually not necessary. The most common case is when merge policy parameters have changed. 
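For example, a static index that will no longer change might be merged down to a single segment as in this sketch (reusing the writer from the earlier example; keep the temporary disk space requirements described above in mind):

writer.ForceMerge(1);   // very costly; only do this on an index that will not change again
writer.Commit();        // commit afterwards so the space used by the old segments can be reclaimed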
this method will call the with . NOTE: if this method hits an you should immediately dispose the writer. See for details. Expert: to be used by a to avoid selecting merges for segments already being merged. The returned collection is not cloned, and thus is only safe to access if you hold 's lock (which you do when invokes the ). Do not alter the returned collection! Expert: the calls this method to retrieve the next merge requested by the @lucene.experimental Expert: returns true if there are merges waiting to be scheduled. @lucene.experimental Close the without committing any changes that have occurred since the last commit (or since it was opened, if commit hasn't been called). this removes any temporary files that had been created, after which the state of the index will be the same as it was when was last called or when this writer was first opened. This also clears a previous call to . if there is a low-level IO error Delete all documents in the index. This method will drop all buffered documents and will remove all segments from the index. This change will not be visible until a has been called. This method can be rolled back using . NOTE: this method is much faster than using DeleteDocuments(new MatchAllDocsQuery()). Yet, this method also has different semantics compared to / since internal data-structures are cleared as well as all segment information is forcefully dropped anti-viral semantics like omitting norms are reset or doc value types are cleared. Essentially a call to is equivalent to creating a new with which a delete query only marks documents as deleted. NOTE: this method will forcefully abort all merges in progress. If other threads are running , or methods, they may receive s. Wait for any currently outstanding merges to finish. It is guaranteed that any merges started prior to calling this method will have completed once this method completes. Called whenever the has been updated and the index files referenced exist (correctly) in the index directory. Checkpoints with , so it's aware of new files, and increments , so on close/commit we will write a new segments file, but does NOT bump segmentInfos.version. Called internally if any index state has changed. Atomically adds the segment private delete packet and publishes the flushed segments to the index writer. Acquires write locks on all the directories; be sure to match with a call to in a finally clause. Adds all segments from an array of indexes into this index. This may be used to parallelize batch indexing. A large document collection can be broken into sub-collections. Each sub-collection can be indexed in parallel, on a different thread, process or machine. The complete index can then be created by merging sub-collection indexes with this method. NOTE: this method acquires the write lock in each directory, to ensure that no is currently open or tries to open while this is running. This method is transactional in how s are handled: it does not commit a new segments_N file until all indexes are added. this means if an occurs (for example disk full), then either no indexes will have been added or they all will have been. Note that this requires temporary free space in the up to 2X the sum of all input indexes (including the starting index). If readers/searchers are open against the starting index, then temporary free space required will be higher by the size of the starting index (see for details). NOTE: this method only copies the segments of the incoming indexes and does not merge them. 
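A sketch of the transactional behavior of Rollback() and DeleteAll() described above, reusing the writer from the earlier example; changes since the last commit are either committed or discarded as a unit, and note that Rollback() also closes the writer:

try
{
    writer.DeleteAll();      // drop all documents and buffered state (much faster than delete-by-query)
    // ... re-add the replacement documents here ...
    writer.Commit();         // make the new state durable and visible
}
catch
{
    writer.Rollback();       // discard everything since the last commit; the writer is closed afterwards
    throw;
}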
Therefore deleted documents are not removed and the new segments are not merged with the existing ones. This requires this index not be among those to be added. NOTE: if this method hits an you should immediately dispose the writer. See for details. if the index is corrupt if there is a low-level IO error if we were unable to acquire the write lock in at least one directory Merges the provided indexes into this index. The provided s are not closed. See for details on transactional semantics, temporary free space required in the , and non-CFS segments on an . NOTE: if this method hits an you should immediately dispose the writer. See for details. NOTE: empty segments are dropped by this method and not added to this index. NOTE: this method merges all given s in one merge. If you intend to merge a large number of readers, it may be better to call this method multiple times, each time with a small set of readers. In principle, if you use a merge policy with a mergeFactor or maxMergeAtOnce parameter, you should pass that many readers in one call. Also, if the given readers are s, they can be opened with termIndexInterval=-1 to save RAM, since during merge the in-memory structure is not used. See . NOTE: if you call with false, which aborts all running merges, then any thread still running this method might hit a . if the index is corrupt if there is a low-level IO error Copies the segment files as-is into the 's directory. A hook for extending classes to execute operations after pending added and deleted documents have been flushed to the but before the change is committed (new segments_N file written). A hook for extending classes to execute operations before pending added and deleted documents are flushed to the . Expert: prepare for commit. This does the first phase of 2-phase commit. this method does all steps necessary to commit changes since this writer was opened: flushes pending added and deleted docs, syncs the index files, writes most of next segments_N file. After calling this you must call either to finish the commit, or to revert the commit and undo all changes done since the writer was opened. You can also just call directly without first in which case that method will internally call . NOTE: if this method hits an you should immediately dispose the writer. See for details. Sets the commit user data map. That method is considered a transaction by and will be committed ( even if no other changes were made to the writer instance. Note that you must call this method before , or otherwise it won't be included in the follow-on . NOTE: the dictionary is cloned internally, therefore altering the dictionary's contents after calling this method has no effect. Returns the commit user data map that was last committed, or the one that was set on . Used only by commit and prepareCommit, below; lock order is commitLock -> IW Commits all pending changes (added & deleted documents, segment merges, added indexes, etc.) to the index, and syncs all referenced index files, such that a reader will see the changes and the index updates will survive an OS or machine crash or power loss. Note that this does not wait for any running background merges to finish. This may be a costly operation, so you should test the cost in your application and do it only when really necessary. Note that this operation calls on the index files. That call should not return until the file contents & metadata are on stable storage. For , this calls the OS's fsync. 
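A minimal sketch of combining separately built sub-indexes with AddIndexes; the directory paths are hypothetical:

using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

using (Directory main = FSDirectory.Open("main-index"))
using (Directory partA = FSDirectory.Open("part-a"))
using (Directory partB = FSDirectory.Open("part-b"))
using (var writer = new IndexWriter(main,
    new IndexWriterConfig(LuceneVersion.LUCENE_48, new StandardAnalyzer(LuceneVersion.LUCENE_48))))
{
    writer.AddIndexes(partA, partB);   // copies segments as-is; nothing is merged and deletes are not expunged
    writer.Commit();
}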
But, beware: some hardware devices may in fact cache writes even during fsync, and return before the bits are actually on stable storage, to give the appearance of faster performance. If you have such a device, and it does not have a battery backup (for example) then on power loss it may still lose data. Lucene cannot guarantee consistency on such devices. NOTE: if this method hits an you should immediately dispose the writer. See for details. Returns true if there may be changes that have not been committed. There are cases where this may return true when there are no actual "real" changes to the index, for example if you've deleted by or but that or does not match any documents. Also, if a merge kicked off as a result of flushing a new segment during , or a concurrent merge finished, this method may return true right after you had just called . Ensures only one is actually flushing segments at a time: Flush all in-memory buffered updates (adds and deletes) to the . if true, we may merge segments (if deletes or docs were flushed) if necessary whether pending deletes should also Expert: Return the total size of all index files currently cached in memory. Useful for size management with flushRamDocs() Expert: Return the number of documents currently buffered in RAM. Carefully merges deletes and updates for the segments we just merged. This is tricky because, although merging will clear all deletes (compacts the documents) and compact all the updates, new deletes and updates may have been flushed to the segments since the merge was started. This method "carries over" such new deletes and updates onto the newly merged segment, and saves the resulting deletes and updates files (incrementing the delete and DV generations for merge.info). If no deletes were flushed, no new deletes file is saved. Merges the indicated segments, replacing them in the stack with a single segment. @lucene.experimental Hook that's called when the specified merge is complete. Checks whether this merge involves any segments already participating in a merge. If not, this merge is "registered", meaning we record that its segments are now participating in a merge, and true is returned. Else (the merge conflicts) false is returned. Does initial setup for a merge, which is fast but holds the synchronized lock on instance. Does finishing for a merge, which is fast but holds the synchronized lock on instance. Does the actual (time-consuming) work of the merge, but without holding the synchronized lock on the instance. Returns a string description of all segments, for debugging. @lucene.internal Returns a string description of the specified segments, for debugging. @lucene.internal Returns a string description of the specified segment, for debugging. @lucene.internal Only for testing. @lucene.internal Walk through all files referenced by the current and ask the to sync each file, if it wasn't already. If that succeeds, then we prepare a new segments_N file but do not fully commit it. Returns true iff the index in the named directory is currently locked. the directory to check for a lock if there is a low-level IO error Forcibly unlocks the index in the named directory. Caution: this should only be used by failure recovery code, when it is known that no other process nor thread is in fact currently accessing this index. If has been called (ie, this writer is in near real-time mode), then after a merge completes, this class can be invoked to warm the reader on the newly merged segment, before the merge commits.
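A sketch of the two-phase commit workflow described above, assuming an open writer, System.Collections.Generic, and the commit-user-data setter described earlier; the key/value pair is purely illustrative:

var commitData = new Dictionary<string, string> { { "source", "nightly-batch" } };
writer.SetCommitData(commitData);   // cloned internally; must be set before PrepareCommit()/Commit()
writer.PrepareCommit();             // phase 1: flush, sync files, write most of the next segments_N
try
{
    // ... coordinate with any other resources participating in the transaction ...
    writer.Commit();                // phase 2: finish the commit started by PrepareCommit()
}
catch
{
    writer.Rollback();              // abandon the prepared commit and all changes since the last commit
    throw;
}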
This is not required for near real-time search, but will reduce search latency on opening a new near real-time reader after a merge completes. @lucene.experimental NOTE: is called before any deletes have been carried over to the merged segment. Sole constructor. (For invocation by subclass constructors, typically implicit.) Invoked on the for the newly merged segment, before that segment is made visible to near-real-time readers. Expert: remove any index files that are no longer used. normally deletes unused files itself, during indexing. However, on Windows, which disallows deletion of open files, if there is a reader open on the index then those files cannot be deleted. This is fine, because will periodically retry the deletion. However, doesn't try that often: only on open, close, flushing a new segment, and finishing a merge. If you don't do any of these actions with your , you'll see the unused files linger. If that's a problem, call this method to delete them (once you've closed the open readers that were preventing their deletion). In addition, you can call this method to delete unreferenced index commits. This might be useful if you are using an which holds onto index commits until some criteria are met, but those commits are no longer needed. Otherwise, those commits will be deleted the next time is called. NOTE: this method creates a compound file for all files returned by info.files(). While, generally, this may include separate norms and deletion files, this must not reference such files when this method is called, because they are not allowed within a compound file. Tries to delete the given files if unreferenced the files to delete if an occurs Cleans up residuals from a segment that could not be entirely flushed due to an error Interface for internal atomic events. See for details. Events are executed concurrently and no order is guaranteed. Each event should only rely on the serializability within its process method. All actions that must happen before or after a certain action must be encoded inside the method. Processes the event. This method is called by the passed as the first argument. the that executes the event. false iff this event should not trigger any segment merges true iff this event should clear all buffers associated with the event. if an occurs Used only by asserts: returns true if the file exists (can be opened), false if it cannot be opened, and (unlike ) throws if there's some unexpected error. Holds all the configuration that is used to create an . Once has been created with this object, changes to this object will not affect the instance. For that, use that is returned from . LUCENENET NOTE: Unlike Lucene, we use property setters instead of setter methods. In C#, this allows you to initialize the using the language features of C#, for example: IndexWriterConfig conf = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer) { Codec = new Lucene46Codec(), OpenMode = OpenMode.CREATE }; However, if you prefer to match the syntax of Lucene using chained setter methods, there are extension methods in the Lucene.Net.Index.Extensions namespace. Example usage: using Lucene.Net.Index.Extensions; .. IndexWriterConfig conf = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer) .SetCodec(new Lucene46Codec()) .SetOpenMode(OpenMode.CREATE); @since 3.1 Default value is 32. Change using setter. Denotes a flush trigger is disabled. Disabled by default (because IndexWriter flushes by RAM usage by default). Disabled by default (because IndexWriter flushes by RAM usage by default).
Default value is 16 MB (which means flush when buffered docs consume approximately 16 MB RAM). Default value for the write lock timeout (1,000 ms). Default setting for . Default value is 1. Change using setter. Default value is 1945. Change using setter. The maximum number of simultaneous threads that may be indexing documents at once in ; if more than this many threads arrive they will wait for others to finish. Default value is 8. Default value for compound file system for newly written segments (set to true). For batch indexing with very large ram buffers use false Default value for calling before merging segments (set to false). You can set this to true for additional safety. Gets or sets the default (for any instance) maximum time to wait for a write lock (in milliseconds). Gets or sets the this config is attached to. if this config is already attached to a writer. Creates a new config that with defaults that match the specified as well as the default . If is >= , is used for merging; else . Note that is free to select non-contiguous merges, which means docIDs may not remain monotonic over time. If this is a problem you should switch to or . Specifies of the index. Only takes effect when is first created. Expert: allows an optional implementation to be specified. You can use this to control when prior commits are deleted from the index. The default policy is which removes all prior commits as soon as a new commit is done (this matches behavior before 2.2). Creating your own policy can allow you to explicitly keep previous "point in time" commits alive in the index for some time, to allow readers to refresh to the new commit without having the old commit deleted out from under them. This is necessary on filesystems like NFS that do not support "delete on last close" semantics, which Lucene's "point in time" search normally relies on. NOTE: the deletion policy cannot be null. Only takes effect when IndexWriter is first created. Expert: allows to open a certain commit point. The default is null which opens the latest commit point. Only takes effect when is first created. Expert: set the implementation used by this . NOTE: the similarity cannot be null. Only takes effect when is first created. Expert: Gets or sets the merge scheduler used by this writer. The default is . NOTE: the merge scheduler cannot be null. Only takes effect when is first created. Gets or sets the maximum time to wait for a write lock (in milliseconds) for this instance. You can change the default value for all instances by calling the setter. Only takes effect when is first created. Gets or sets the . Only takes effect when is first created. Expert: is invoked whenever there are changes to the segments in the index. Its role is to select which merges to do, if any, and return a describing the merges. It also selects merges to do for . Only takes effect when is first created. Expert: Gets or sets the instance used by the to assign thread-states to incoming indexing threads. If no is set will use with max number of thread-states set to (see ). NOTE: The given instance must not be used with other instances once it has been initialized / associated with an . NOTE: this only takes effect when is first created. Gets or sets the max number of simultaneous threads that may be indexing documents at once in . Values < 1 are invalid and if passed maxThreadStates will be set to . Only takes effect when is first created. 
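For illustration, a sketch of configuring these settings via the C# property initializers mentioned earlier, assuming the analyzer and dir variables from the first sketch; the concrete values (including the 5000 ms lock timeout) are arbitrary examples, not recommendations:

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    OpenMode = OpenMode.CREATE_OR_APPEND,                          // create the index if missing, else append
    IndexDeletionPolicy = new KeepOnlyLastCommitDeletionPolicy(),  // the default policy, shown explicitly
    MergeScheduler = new ConcurrentMergeScheduler(),               // run merges on background threads
    WriteLockTimeout = 5000                                        // ms to wait for the directory write lock
};
using (var writer = new IndexWriter(dir, config))
{
    // the config is bound to this writer; most settings only take effect at construction time
}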
By default, does not pool the s it must open for deletions and merging, unless a near-real-time reader has been obtained by calling . This setting lets you enable pooling without getting a near-real-time reader. NOTE: if you set this to false, will still pool readers once is called. Only takes effect when is first created. Expert: Gets or sets the chain to be used to process documents. Only takes effect when is first created. Expert: Gets or sets the maximum memory consumption per thread triggering a forced flush if exceeded. A is forcefully flushed once it exceeds this limit even if the has not been exceeded. This is a safety limit to prevent a from address space exhaustion due to its internal 32 bit signed integer based memory addressing. The given value must be less than 2GB (2048MB). Expert: Controls when segments are flushed to disk during indexing. The initialized during instantiation and once initialized the given instance is bound to this and should not be used with another writer. Information about merges, deletes and a message when maxFieldLength is reached will be printed to this. Must not be null, but may be used to suppress output. Convenience method that uses to write to the passed in . Must not be null. Specifies the open mode for . Creates a new index or overwrites an existing one. Opens an existing index. Creates a new index if one does not exist, otherwise it opens the index and documents will be appended. Abort (called after hitting AbortException). Flush a new segment. This implementation keeps only the most recent commit and immediately removes all prior commits after a new commit is done. This is the default deletion policy. Sole constructor. Deletes all commits except the most recent one. Deletes all commits except the most recent one. Holds all the configuration used by with few setters for settings that can be changed on an instance "live". @since 4.0 controlling when commit points are deleted. that is opened on. that is opened with. to use when encoding norms. to use for running merges. Timeout when trying to obtain the write lock on init. that determines how documents are indexed. used to write new segments. for debugging messages. for selecting merges. to control how threads are allocated to . True if readers should be pooled. to control when segments are flushed. Sets the hard upper bound on RAM usage for a single segment, after which the segment is forced to flush. that should emulate. True if segment flushes should use compound file format. True if merging should check integrity of segments before merge. Creates a new config that handles the live settings. Gets the default analyzer to use for indexing documents. Expert: Gets or sets the interval between indexed terms. Large values cause less memory to be used by , but slow random-access to terms. Small values cause more memory to be used by an , and speed random-access to terms. This parameter determines the amount of computation required per query term, regardless of the number of documents that contain that term. In particular, it is the maximum number of other terms that must be scanned before a term is located and its frequency and position information may be processed. In a large index with user-entered query terms, query processing time is likely to be dominated not by term lookup but rather by the processing of frequency and positional data. In a small index or when many uncommon query terms are generated (e.g., by wildcard queries) term lookup may become a dominant cost.
In particular, numUniqueTerms/interval terms are read into memory by an , and, on average, interval/2 terms must be scanned for each random term access. Takes effect immediately, but only applies to newly flushed/merged segments. NOTE: this parameter does not apply to all implementations, including the default one in this release. It only makes sense for term indexes that are implemented as a fixed gap between terms. For example, implements the term index instead based upon how terms share prefixes. To configure its parameters (the minimum and maximum size for a block), you would instead use . which can also be configured on a per-field basis: public class MyLucene45Codec : Lucene45Codec { //customize Lucene41PostingsFormat, passing minBlockSize=50, maxBlockSize=100 private readonly PostingsFormat tweakedPostings = new Lucene41PostingsFormat(50, 100); public override PostingsFormat GetPostingsFormatForField(string field) { if (field.Equals("fieldWithTonsOfTerms", StringComparison.Ordinal)) return tweakedPostings; else return base.GetPostingsFormatForField(field); } } ... iwc.Codec = new MyLucene45Codec(); Note that other implementations may have their own parameters, or no parameters at all. Gets or sets a value that determines the maximum number of delete-by-term operations that will be buffered before both the buffered in-memory delete terms and queries are applied and flushed. Disabled by default (writer flushes by RAM usage). NOTE: this setting won't trigger a segment flush. Takes effect immediately, but only the next time a document is added, updated or deleted. Also, if you only delete-by-query, this setting has no effect, i.e. delete queries are buffered until the next segment is flushed. if maxBufferedDeleteTerms is enabled but smaller than 1 Gets or sets a value that determines the amount of RAM that may be used for buffering added documents and deletions before they are flushed to the . Generally for faster indexing performance it's best to flush by RAM usage instead of document count and use as large a RAM buffer as you can. When this is set, the writer will flush whenever buffered documents and deletions use this much RAM. Pass in to prevent triggering a flush due to RAM usage. Note that if flushing by document count is also enabled, then the flush will be triggered by whichever comes first. The maximum RAM limit is inherently determined by the runtime's available memory. Yet, an session can consume a significantly larger amount of memory than the given RAM limit since this limit is just an indicator when to flush memory resident documents to the . Flushes are likely happen concurrently while other threads adding documents to the writer. For application stability the available memory in the runtime should be significantly larger than the RAM buffer used for indexing. NOTE: the account of RAM usage for pending deletions is only approximate. Specifically, if you delete by , Lucene currently has no way to measure the RAM usage of individual Queries so the accounting will under-estimate and you should compensate by either calling periodically yourself, or by setting to flush and apply buffered deletes by count instead of RAM usage (for each buffered delete a constant number of bytes is used to estimate RAM usage). Note that enabling will not trigger any segment flushes. NOTE: It's not guaranteed that all memory resident documents are flushed once this limit is exceeded. 
Depending on the configured only a subset of the buffered documents are flushed and therefore only parts of the RAM buffer is released. The default value is . Takes effect immediately, but only the next time a document is added, updated or deleted. if ramBufferSizeMB is enabled but non-positive, or it disables ramBufferSizeMB when maxBufferedDocs is already disabled Gets or sets a value that determines the minimal number of documents required before the buffered in-memory documents are flushed as a new Segment. Large values generally give faster indexing. When this is set, the writer will flush every maxBufferedDocs added documents. Pass in to prevent triggering a flush due to number of buffered documents. Note that if flushing by RAM usage is also enabled, then the flush will be triggered by whichever comes first. Disabled by default (writer flushes by RAM usage). Takes effect immediately, but only the next time a document is added, updated or deleted. if maxBufferedDocs is enabled but smaller than 2, or it disables maxBufferedDocs when ramBufferSizeMB is already disabled Gets or sets the merged segment warmer. See . Takes effect on the next merge. Gets or sets the termsIndexDivisor passed to any readers that opens, for example when applying deletes or creating a near-real-time reader in . If you pass -1, the terms index won't be loaded by the readers. This is only useful in advanced situations when you will only .Next() through all terms; attempts to seek will hit an exception. Takes effect immediately, but only applies to readers opened after this call NOTE: divisor settings > 1 do not apply to all implementations, including the default one in this release. It only makes sense for terms indexes that can efficiently re-sample terms at load time. Gets the set by setter. Gets the specified in setter or the default Gets the as specified in setter or the default, null which specifies to open the latest index commit point. Expert: returns the implementation used by this . Returns the that was set by setter. Returns allowed timeout when acquiring the write lock. Returns the current . Returns the current in use by this writer. Returns the configured instance. Returns the max number of simultaneous threads that may be indexing documents at once in . Returns true if should pool readers even if has not been called. Returns the indexing chain set on . Returns the max amount of memory each can consume until forcefully flushed. Returns used for debugging. Gets or sets if the should pack newly written segments in a compound file. Default is true. Use false for batch indexing with very large RAM buffer settings. Note: To control compound file usage during segment merges see and . This setting only applies to newly created segments. Gets or sets if should call on existing segments before merging them into a new one. Use true to enable this safety check, which can help reduce the risk of propagating index corruption from older segments into new ones, at the expense of slower merging. This is a that measures size of a segment as the total byte size of the segment's files. Default minimum segment size. Default maximum segment size. A segment of this size or larger will never be merged. Default maximum segment size. A segment of this size or larger will never be merged during . Sole constructor, setting all settings to their defaults. Determines the largest segment (measured by total byte size of the segment's files, in MB) that may be merged with other segments. 
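A short sketch of the two flush triggers discussed above, assuming the analyzer from the first sketch; the 256 MB figure is just an example for bulk indexing, not a recommended value:

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    RAMBufferSizeMB = 256.0,                                   // flush by RAM usage (the default trigger)
    MaxBufferedDocs = IndexWriterConfig.DISABLE_AUTO_FLUSH     // keep flush-by-document-count disabled
};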
Small values (e.g., less than 50 MB) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches. Note that is also used to check whether a segment is too large for merging (it's either or). Determines the largest segment (measured by total byte size of the segment's files, in MB) that may be merged with other segments during forceMerge. Setting it low will leave the index with more than 1 segment, even if is called. Sets the minimum size for the lowest level segments. Any segments below this size are considered to be on the same level (even if they vary drastically in size) and will be merged whenever there are mergeFactor of them. This effectively truncates the "long tail" of small segments that would otherwise be created into a single level. If you set this too large, it could greatly increase the merging cost during indexing (if you flush many small segments). This is a that measures size of a segment as the number of documents (not taking deletions into account). Default minimum segment size. Sole constructor, setting all settings to their defaults. Sets the minimum size for the lowest level segments. Any segments below this size are considered to be on the same level (even if they vary drastically in size) and will be merged whenever there are mergeFactor of them. This effectively truncates the "long tail" of small segments that would otherwise be created into a single level. If you set this too large, it could greatly increase the merging cost during indexing (if you flush many small segments). This class implements a that tries to merge segments into levels of exponentially increasing size, where each level has fewer segments than the value of the merge factor. Whenever extra segments (beyond the merge factor upper bound) are encountered, all segments within the level are merged. You can get or set the merge factor using . This class is abstract and requires a subclass to define the method which specifies how a segment's size is determined. is one subclass that measures size by document count in the segment. is another subclass that measures size as the total byte size of the file(s) for the segment. Defines the allowed range of log(size) for each level. A level is computed by taking the max segment log size, minus LEVEL_LOG_SPAN, and finding all segments falling within that range. Default merge factor, which is how many segments are merged at a time Default maximum segment size. A segment of this size or larger will never be merged. Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it. How many segments to merge at a time. Any segments whose size is smaller than this value will be rounded up to this value. This ensures that tiny segments are aggressively merged. If the size of a segment exceeds this value then it will never be merged. If the size of a segment exceeds this value then it will never be merged during . If a segment has more than this many documents then it will never be merged. If true, we pro-rate a segment's size by the percentage of non-deleted documents. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns true if is enabled in . Print a debug message to . Gets or Sets the number of segments that are merged at once and also controls the total number of segments allowed to accumulate in the index. This determines how often segment indices are merged by . 
With smaller values, less RAM is used while indexing, and searches are faster, but indexing speed is slower. With larger values, more RAM is used during indexing, and while searches are slower, indexing is faster. Thus larger values (> 10) are best for batch index creation, and smaller values (< 10) for indices that are interactively maintained. Gets or Sets whether the segment size should be calibrated by the number of deletes when choosing segments for merge. Return the number of documents in the provided , pro-rated by percentage of non-deleted documents if is set. Return the byte size of the provided , pro-rated by percentage of non-deleted documents if is set. Returns true if the number of segments eligible for merging is less than or equal to the specified . Returns the merges necessary to the index. This method constrains the returned merges only by the parameter, and guarantees that exactly that number of segments will remain in the index. Returns the merges necessary to merge the index down to a specified number of segments. This respects the setting. By default, and assuming maxNumSegments=1, only one segment will be left in the index, where that segment has no deletions pending nor separate norms, and it is in compound file format if the current useCompoundFile setting is true. This method returns multiple merges (mergeFactor at a time) so the in use may make use of concurrency. Finds merges necessary to force-merge all deletes from the index. We simply merge adjacent segments that have deletes, up to mergeFactor at a time. Checks if any merges are now necessary and returns a if so. A merge is necessary when there are more than segments at a given level. When multiple levels have too many segments, this method will return multiple merges, allowing the to use concurrency. Determines the largest segment (measured by document count) that may be merged with other segments. Small values (e.g., less than 10,000) are best for interactive indexing, as this limits the length of pauses while indexing to a few seconds. Larger values are best for batched indexing and speedier searches. The default value is . The default merge policy () also allows you to set this limit by net size (in MB) of the segment, using . Expert: a determines the sequence of primitive merge operations. Whenever the segments in an index have been altered by , either the addition of a newly flushed segment, addition of many segments from AddIndexes* calls, or a previous merge that may now need to cascade, invokes to give the a chance to pick merges that are now required. This method returns a instance describing the set of merges that should be done, or null if no merges are necessary. When is called, it calls and the should then return the necessary merges. Note that the policy can return more than one merge at a time. In this case, if the writer is using , the merges will be run sequentially but if it is using they will be run concurrently. The default MergePolicy is . @lucene.experimental A map of doc IDs. Sole constructor, typically invoked from sub-classes' constructors. Return the new doc ID according to its old value. Useful from an assert. OneMerge provides the information necessary to perform an individual primitive merge operation, resulting in a single new segment. The merge spec includes the subset of segments to be merged as well as whether the new segment should use the compound file format. Estimated size in bytes of the merged segment. Segments to be merged. Number of documents in the merged segment.
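A hedged sketch of tuning a log merge policy on the config from the earlier examples, assuming the property names mirror the corresponding Java setters (MergeFactor, MinMergeMB, MaxMergeMB); the numbers are illustrative only:

var mergePolicy = new LogByteSizeMergePolicy
{
    MergeFactor = 20,     // merge more segments at once: faster bulk indexing, slower searches
    MinMergeMB = 2.0,     // segments below this size are all treated as the lowest level
    MaxMergeMB = 1024.0   // segments above this size are not selected for normal merges
};
config.MergePolicy = mergePolicy;   // set on the IndexWriterConfig before constructing the writer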
Sole constructor. List of s to be merged. Expert: Get the list of readers to merge. Note that this list does not necessarily match the list of segments to merge and should only be used to feed SegmentMerger to initialize a merge. When a reorders doc IDs, it must override too so that deletes that happened during the merge can be applied to the newly merged segment. Expert: Sets the of this . Allows sub-classes to e.g. set diagnostics properties. Expert: If reorders document IDs, this method must be overridden to return a mapping from the natural doc ID (the doc ID that would result from a natural merge) to the actual doc ID. This mapping is used to apply deletions that happened during the merge to the new segment. Record that an exception occurred while executing this merge. Mark this merge as aborted. If this is called before the merge is committed then the merge will not be committed. Returns true if this merge was aborted. Called periodically by while merging to see if the merge is aborted. Set or clear whether this merge is paused (for example, will pause merges if too many are running). Returns true if this merge is paused. Returns a readable description of the current merge state. Returns the total size in bytes of this merge. Note that this does not indicate the size of the merged segment, but the input total size. This is only set once the merge is initialized by . Returns the total number of documents that are included with this merge. Note that this does not indicate the number of documents after the merge. Return describing this merge. A instance provides the information necessary to perform multiple merges. It simply contains a list of instances. The subset of segments to be included in the primitive merge. Sole constructor. Use to add merges. Adds the provided to this specification. Returns a description of the merges in this specification. Exception thrown if there are any problems while executing a merge. Create a . Create a . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Returns the of the index that hit the exception. Thrown when a merge was explicitly aborted because was called with false. Normally this exception is privately caught and suppressed by . Create a . Create a with a specified message. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Default ratio for compound file system usage. Set to 1.0, always use compound file system. Default max segment size in order to use compound file system. Set to . that contains this instance. If the size of the merged segment exceeds this ratio of the total index size then it will remain in non-compound format. If the size of the merged segment exceeds this value then it will not use compound file format. Creates a new merge policy instance. Note that if you intend to use it without passing it to , you should call . Creates a new merge policy instance with default settings for and . This ctor should be used by subclasses using different defaults than the . Sets the to use by this merge policy. This method is allowed to be called only once, and is usually set by . If it is called more than once, is thrown. Determine what set of merge operations is now necessary on the index.
calls this whenever there is a change to the segments. This call is always synchronized on the instance so only one thread at a time will call this method. the event that triggered the merge the total set of segments in the index Determine what set of merge operations is necessary in order to merge to <= the specified segment count. calls this when its method is called. This call is always synchronized on the instance so only one thread at a time will call this method. The total set of segments in the index Requested maximum number of segments in the index (currently this is always 1) Contains the specific instances that must be merged away. This may be a subset of all SegmentInfos. If the value is true for a given , that means this segment was an original segment present in the to-be-merged index; else, it was a segment produced by a cascaded merge. Determine what set of merge operations is necessary in order to expunge all deletes from the index. the total set of segments in the index Release all resources for the policy. Release all resources for the policy. Returns true if a new segment (regardless of its origin) should use the compound file format. The default implementation returns true iff the size of the given mergedInfo is less or equal to and the size is less or equal to the TotalIndexSize * otherwise false. Return the byte size of the provided , pro-rated by percentage of non-deleted documents is set. Returns true if this single info is already fully merged (has no pending deletes, is in the same dir as the writer, and matches the current compound file setting Gets or Sets current . If a merged segment will be more than this percentage of the total size of the index, leave the segment as non-compound file even if compound file is enabled. Set to 1.0 to always use CFS regardless of merge size. Gets or Sets the largest size allowed for a compound file segment. If a merged segment will be more than this value, leave the segment as non-compound file even if compound file is enabled. Set this to (default) and to 1.0 to always use CFS regardless of merge size. Expert: uses an instance implementing this interface to execute the merges selected by a . The default MergeScheduler is . Implementers of sub-classes should make sure that returns an independent instance able to work with any instance. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Run the merges provided by . the to obtain the merges from. the that caused this merge to happen true iff any new merges were found by the caller; otherwise false Dispose this MergeScheduler. Dispose this MergeScheduler. Holds common state used during segment merging. @lucene.experimental Remaps docids around deletes during merge Returns the mapped docID corresponding to the provided one. Returns the total number of documents, ignoring deletions. Returns the number of not-deleted documents. Returns the number of deleted documents. Returns true if there are any deletions. Creates a instance appropriate for this reader. of the newly merged segment. of the newly merged segment. Readers being merged. Maps docIDs around deletions. New docID base per reader. Holds the instance, which is invoked periodically to see if the merge has been aborted. for debugging messages. s that have identical field name/number mapping, so their stored fields and term vectors may be bulk merged. How many are set. Sole constructor. Class for recording units of work when merging segments. Creates a instance. 
Records the fact that roughly units amount of work have been done since this method was last called. When adding time-consuming code into , you should test different values for units to ensure that the time in between calls to merge.CheckAborted is up to ~ 1 second. If you use this: IW.Dispose(false) cannot abort your merge! @lucene.internal is passed to to indicate the event that triggered the merge. LUCENENET-specific value to be used instead of null. Merge was triggered by a segment flush. Merge was triggered by a full flush. Full flushes can be caused by a commit, NRT reader reopen or a call on the index writer. Merge has been triggered explicitly by the user. Merge was triggered by a successfully finished merge. Merge was triggered by a disposing . Concatenates multiple together, on every lookup. NOTE: this is very costly, as every lookup must do a binary search to locate the right sub-reader. @lucene.experimental Represents a sub-Bits from . Returns a sub-Bits matching the provided Because null usually has a special meaning for (e.g. no deleted documents), you must check instead to ensure the sub was actually found. Exposes flex API, merged from flex API of sub-segments. @lucene.experimental Sole constructor. Returns true if this instance can be reused by the provided . Re-use and reset this instance on the provided slices. How many sub-readers we are merging. Returns sub-readers we are merging. Holds a along with the corresponding . for this sub-reader. describing how this sub-reader fits into the composite reader. Exposes , merged from API of sub-segments. @lucene.experimental Sole constructor The that created us. How many sub-readers are being merged. Returns true if this instance can be reused by the provided . How many sub-readers we are merging. Returns sub-readers we are merging. Holds a along with the corresponding . of this sub-reader. describing how this sub-reader fits into the composite reader. A wrapper for providing access to . NOTE: for multi readers, you'll get better performance by gathering the sub readers using to get the atomic leaves and then operate per-AtomicReader, instead of using this class. NOTE: this is very costly. @lucene.experimental @lucene.internal Returns a for a reader's norms (potentially merging on-the-fly). This is a slow way to access normalization values. Instead, access them per-segment with Returns a for a reader's docvalues (potentially merging on-the-fly) This is a slow way to access numeric values. Instead, access them per-segment with Returns a for a reader's docsWithField (potentially merging on-the-fly) This is a slow way to access this bitset. Instead, access them per-segment with Returns a for a reader's docvalues (potentially merging on-the-fly) This is a slow way to access binary values. Instead, access them per-segment with Returns a for a reader's docvalues (potentially doing extremely slow things). this is an extremely slow way to access sorted values. Instead, access them per-segment with Returns a for a reader's docvalues (potentially doing extremely slow things). This is an extremely slow way to access sorted values. Instead, access them per-segment with maps per-segment ordinals to/from global ordinal space Creates an ordinal map that allows mapping ords to/from a merged space from subs. a cache key s that support . They need not be dense (e.g. can be FilteredTermsEnums). if an I/O error occurred. Given a segment number and segment ordinal, returns the corresponding global ordinal. 
Given global ordinal, returns the ordinal of the first segment which contains this ordinal (the corresponding to the segment return ). Given a global ordinal, returns the index of the first segment that contains this term. Returns the total number of unique terms in global ord space. Returns total byte size used by this ordinal map. Implements over n subs, using an @lucene.internal docbase for each leaf: parallel with leaf values ordinal map mapping ords from values to global ord space Creates a new over Implements over n subs, using an @lucene.internal docbase for each leaf: parallel with leaf values ordinal map mapping ords from values to global ord space Creates a new over Exposes flex API, merged from flex API of sub-segments. This is useful when you're interacting with an implementation that consists of sequential sub-readers (eg or ). NOTE: for composite readers, you'll get better performance by gathering the sub readers using to get the atomic leaves and then operate per-AtomicReader, instead of using this class. @lucene.experimental Returns a single instance for this reader, merging fields/terms/docs/positions on the fly. This method will return null if the reader has no postings. NOTE: this is a slow way to access postings. It's better to get the sub-readers and iterate through them yourself. Returns a single instance for this reader, merging live Documents on the fly. This method will return null if the reader has no deletions. NOTE: this is a very slow way to access live docs. For example, each access will require a binary search. It's better to get the sub-readers and iterate through them yourself. this method may return null if the field does not exist. Returns for the specified field & term. This will return null if the field or term does not exist. Returns for the specified field & term, with control over whether freqs are required. Some codecs may be able to optimize their implementation when freqs are not required. This will return null if the field or term does not exist. See . Returns for the specified field & term. This will return null if the field or term does not exist or positions were not indexed. Returns for the specified field & term, with control over whether offsets and payloads are required. Some codecs may be able to optimize their implementation when offsets and/or payloads are not required. This will return null if the field or term does not exist or positions were not indexed. See . Expert: construct a new instance directly. @lucene.internal Call this to get the (merged) for a composite reader. NOTE: the returned field numbers will likely not correspond to the actual field numbers in the underlying readers, and codec metadata () will be unavailable. Call this to get the (merged) representing the set of indexed fields only for a composite reader. NOTE: the returned field numbers will likely not correspond to the actual field numbers in the underlying readers, and codec metadata () will be unavailable. A which reads multiple indexes, appending their content. It can be used to create a view on several sub-readers (like ) and execute searches on it. For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a unique document in the index. These document numbers are ephemeral -- they may change as documents are added to and deleted from an index. Clients should thus not rely on a given document having the same number between sessions. 
NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead. Construct a aggregating the named set of (sub)readers. Note that all subreaders are disposed if this MultiReader is disposed. set of (sub)readers Construct a aggregating the named set of (sub)readers. set of (sub)readers; this array will be cloned. indicates whether the subreaders should be disposed when this is disposed Exposes flex API, merged from flex API of sub-segments. @lucene.experimental Sole constructor. The instances of all sub-readers. A parallel array (matching ) describing the sub-reader slices. Exposes API, merged from API of sub-segments. This does a merge sort, by term text, of the sub-readers. @lucene.experimental Returns how many sub-reader slices contain the current term. Returns sub-reader slices positioned to the current term. Initializes a new instance of with the specified . Which sub-reader slices we should merge. is null. The terms array must be newly created , ie has not yet been called. An which keeps all index commits around, never deleting them. This class is a singleton and can be accessed by referencing . The single instance of this class. A which never returns merges to execute (hence its name). It is also a singleton and can be accessed through if you want to indicate the index does not use compound files, or through otherwise. Use it if you want to prevent an from ever executing merges, without going through the hassle of tweaking a merge policy's settings to achieve that, such as changing its merge factor. A singleton which indicates the index does not use compound files. A singleton which indicates the index uses compound files. A which never executes any merges. It is also a singleton and can be accessed through . Use it if you want to prevent an from ever executing merges, regardless of the used. Note that you can achieve the same thing by using , however with you also ensure that no unnecessary code of any implementation is ever executed. Hence it is recommended to use both if you want to disable merges from ever happening. The single instance of . Writes norms. Each thread X field accumulates the norms for the doc/fields it saw, then the flush method below merges all of these together into a single _X.nrm file. A per-document numeric value. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the numeric value for the specified document ID. document ID to lookup numeric value A which holds updates of documents, of a single . @lucene.experimental Buffers up pending long per doc, then flushes when segment flushes. An ordinal based @lucene.experimental Term ordinal, i.e., its position in the full list of sorted terms. Sole constructor. An which reads multiple, parallel indexes. Each index added must have the same number of documents, but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field. This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together.
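Following the recommendation above to combine both singletons, disabling merges entirely might look like this sketch (again assuming the analyzer from the first example):

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    MergePolicy = NoMergePolicy.COMPOUND_FILES,   // never selects merges (use NO_COMPOUND_FILES for non-compound segments)
    MergeScheduler = NoMergeScheduler.INSTANCE    // never runs any merge scheduler code
};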
Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. Create a based on the provided readers; auto-disposes the given on . Create a based on the provided . Expert: create a based on the provided and ; when a document is loaded, only will be used. Get the describing all fields in this reader. NOTE: the returned field numbers will likely not correspond to the actual field numbers in the underlying readers, and codec metadata ( will be unavailable. A which reads multiple, parallel indexes. Each index added must have the same number of documents, and exactly the same hierarchical subreader structure, but typically each contains different fields. Deletions are taken from the first reader. Each document contains the union of the fields of all documents with the same document number. When searching, matches for a query term are from the first index added that has the field. This is useful, e.g., with collections that have large fields which change rarely and small fields that change more frequently. The smaller fields may be re-indexed in a new index and both indexes may be searched together. Warning: It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other indexes. Failure to do so will result in undefined behavior. A good strategy to create suitable indexes with is to use , as this one does not reorder documents during merging (like ) and triggers merges by number of documents per segment. If you use different s it might happen that the segment structure of your index is no longer predictable. Create a based on the provided readers; auto-disposes the given on . Create a based on the provided . Expert: create a based on the provided and ; when a document is loaded, only will be used. A which adds a persistence layer so that snapshots can be maintained across the life of an application. The snapshots are persisted in a and are committed as soon as or is called. NOTE: Sharing s that write to the same directory across s will corrupt snapshots. You should make sure every has its own and that they all write to a different . It is OK to use the same that holds the index. This class adds a method to release commits from a previous snapshot's . @lucene.experimental Prefix used for the save file. wraps another to enable flexible snapshotting, passing by default. the that is used on non-snapshotted commits. Snapshotted commits, by definition, are not deleted until explicitly released via . the which will be used to persist the snapshots information. wraps another to enable flexible snapshotting. the that is used on non-snapshotted commits. Snapshotted commits, by definition, are not deleted until explicitly released via . the which will be used to persist the snapshots information. specifies whether a new index should be created, deleting all existing snapshots information (immediately), or open an existing index, initializing the class with the snapshots information. Snapshots the last commit. Once this method returns, the snapshot information is persisted in the directory. Deletes a snapshotted commit. Once this method returns, the snapshot information is persisted in the directory. Deletes a snapshotted commit by generation. 
Once this method returns, the snapshot information is persisted in the directory. Returns the file name the snapshots are currently saved to, or null if no snapshots have been saved. Reads the snapshots information from the given . This method can be used if the snapshots information is needed, however you cannot instantiate the deletion policy (because e.g., some other process keeps a lock on the snapshots directory). Prefix codes term instances (prefixes are shared) @lucene.experimental size in bytes iterator over the bytes Builds a : call add repeatedly, then finish. add a term finalized form Extension of that supports random access to the ordinals of a document. Operations via this API are independent of the iterator api () and do not impact its state. Codecs can optionally extend this API if they support constant-time access to ordinals for the document. Sole constructor. (For invocation by subclass constructors, typically implicit.) Retrieve the ordinal for the current document (previously set by at the specified index. An index ranges from 0 to Cardinality-1. The first ordinal value is at index 0, the next at index 1, and so on, as for array indexing. index of the ordinal for the document. ordinal for the document at the specified index. Gets the cardinality for the current document (previously set by . Utility class to safely share instances across multiple threads, while periodically reopening. This class ensures each reader is disposed only once all threads have finished using it. @lucene.experimental Creates and returns a new from the given . the to open the from. If true, all buffered deletes will be applied (made visible) in the / . If false, the deletes may or may not be applied, but remain buffered (in ) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false. See . If there is a low-level I/O error Creates and returns a new from the given . the directory to open the on. If there is a low-level I/O error Used by to hold open s (for searching or merging), plus pending deletes and updates, for a given segment Returns a . Returns a ref to a clone. NOTE: you should the reader when you're done (ie do not call ). NOTE: This was getLongEnumerable() in Lucene Returns a reader for merge. this method applies field updates if there are any and marks that this segment is currently merging. Drops all merging updates. Called from IndexWriter after this segment finished merging (whether successfully or not). Returns updates that came in while this segment was merging. Subreader slice from a parent composite reader. @lucene.internal Zero-length array. Document ID this slice starts from. Number of documents in this slice. Sub-reader index for this slice. Sole constructor. Common util methods for dealing with s and s. @lucene.internal Walks up the reader tree and return the given context's top level reader context, or in other words the reader tree's root context. Returns index of the searcher/reader for document n in the array used to construct this searcher/reader. Returns index of the searcher/reader for document n in the array used to construct this searcher/reader. Embeds a [read-only] and adds per-commit fields. @lucene.experimental The that we wrap. Sole constructor. 
that we wrap number of deleted documents in this segment deletion generation number (used to name deletion files) generation number (used to name field-infos files) Returns the per generation updates files. Sets the updates file names per generation. Does not deep clone the map. Called when we succeed in writing deletes Called if there was an exception while writing deletes, so that we don't try to write to the same file more than once. Called when we succeed in writing a new generation. Called if there was an exception while writing a new generation of , so that we don't try to write to the same file more than once. Returns total size in bytes of all files for this segment. NOTE: this value is not correct for 3.0 segments that have shared docstores. To get the correct value, upgrade! Returns all files in use by this segment. Returns true if there are any deletions for the segment at this commit. Returns true if there are any field updates for the segment in this commit. Returns the next available generation number of the files. Returns the generation number of the field infos file or -1 if there are no field updates yet. Returns the next available generation number of the live docs file. Returns generation number of the live docs file or -1 if there are no deletes yet. Returns the number of deleted docs in the segment. Returns a description of this segment. Holds core readers that are shared (unchanged) when is cloned or reopened Returns approximate RAM bytes used Manages the held by and keeps track of their reference counting. Returns the for the given generation. Decrement the reference count of the given generations. Information about a segment such as it's name, directory, and files related to the segment. @lucene.experimental Used by some member fields to mean not present (e.g., norms, deletions). Used by some member fields to mean present (e.g., norms, deletions). Unique segment name in the directory. Where this segment resides. Gets or Sets diagnostics saved into the segment when it was written. Construct a new complete instance from input. Note: this is public only to allow access from the codecs package. Construct a new complete instance from input. Note: this is public only to allow access from the codecs package. Gets or Sets whether this segment is stored as a compound file. true if this is a compound file; else, false Gets or Sets that wrote this segment. Setter can only be called once. Returns number of documents in this segment (deletions are not taken into account). Return all files referenced by this . Used for debugging. Format may suddenly change. Current format looks like _a(3.1):c45/4, which means the segment's name is _a; it was created with Lucene 3.1 (or '?' if it's unknown); it's using compound file format (would be C if not compound); it has 45 documents; it has 4 deletions (this part is left off when there are no deletions). We consider another instance equal if it has the same dir and same name. Used by DefaultSegmentInfosReader to upgrade a 3.0 segment to record its version is "3.0". this method can be removed when we're not required to support 3x indexes anymore, e.g. in 5.0. NOTE: this method is used for internal purposes only - you should not modify the version of a , or it may result in unexpected exceptions thrown when you attempt to open the index. @lucene.internal Sets the files written for this segment. Add these files to the set of files written for this segment. Add this file to the set of files written for this segment. 
Get a codec attribute value, or null if it does not exist Puts a codec attribute value. This is a key-value mapping for the field that the codec can use to store additional metadata, and will be available to the codec when reading the segment via If a value already exists for the field, it will be replaced with the new value. Returns the internal codec attributes map. May be null if no mappings exist. A collection of segmentInfo objects with methods for operating on those segments in relation to the file system. The active segments in the index are stored in the segment info file, segments_N. There may be one or more segments_N files in the index; however, the one with the largest generation is the active one (when older segments_N files are present it's because they temporarily cannot be deleted, or, a writer is in the process of committing, or a custom is in use). This file lists each segment by name and has details about the codec and generation of deletes. There is also a file segments.gen. this file contains the current generation (the _N in segments_N) of the index. This is used only as a fallback in case the current generation cannot be accurately determined by directory listing alone (as is the case for some NFS clients with time-based directory cache expiration). This file simply contains an version header (), followed by the generation recorded as , written twice. Files: segments.gen: GenHeader, Generation, Generation, Footer segments_N: Header, Version, NameCounter, SegCount, <SegName, SegCodec, DelGen, DeletionCount, FieldInfosGen, UpdatesFiles>SegCount, CommitUserData, Footer Data types: Header --> GenHeader, NameCounter, SegCount, DeletionCount --> Generation, Version, DelGen, Checksum, FieldInfosGen --> SegName, SegCodec --> CommitUserData --> UpdatesFiles --> Footer --> Field Descriptions: Version counts how often the index has been changed by adding or deleting documents. NameCounter is used to generate names for new segment files. SegName is the name of the segment, and is used as the file name prefix for all of the files that compose the segment's index. DelGen is the generation count of the deletes file. If this is -1, there are no deletes. Anything above zero means there are deletes stored by . DeletionCount records the number of deleted documents in this segment. SegCodec is the of the that encoded this segment. CommitUserData stores an optional user-supplied opaque that was passed to . FieldInfosGen is the generation count of the fieldInfos file. If this is -1, there are no updates to the fieldInfos in that segment. Anything above zero means there are updates to fieldInfos stored by . UpdatesFiles stores the list of files that were updated in that segment. @lucene.experimental The file format version for the segments_N codec header, up to 4.5. The file format version for the segments_N codec header, since 4.6+. The file format version for the segments_N codec header, since 4.8+ Current format of segments.gen Setting this to true will generate the same file names that were used in 4.8.0-beta00001 through 4.8.0-beta00015. When writing more than 10 segments, these segment names were incompatible with prior versions of Lucene.NET and incompatible with Lucene 4.8.0. This is only for reading codecs from the affected 4.8.0 beta versions, it is not recommended to use this setting for general use. This must be set prior to opening an index at application startup. When setting it at other times the behavior is undefined. 
Note that this property can also be set using the "useLegacySegmentNames" system property to "true" (such as setting the environment variable "lucene:useLegacySegmentNames"). System properties can also be injected by supplying a at application startup through . Optimized version of with a radix of 36, that simply does a switch case for the first 100 numbers, which takes only 5% of the time as calculating it. We fall back to calling the method after 100 segments. This also implements the switch for so it doesn't have to be dealt with externally. Used to name new segments. Opaque that user can specify during If non-null, information about loading segments_N files will be printed here. Sole constructor. Typically you call this and then use or to populate each . Alternatively, you can add/remove your own s. Returns at the provided index. This was info(int) in Lucene. Get the generation of the most recent commit to the list of index files (N in the segments_N file). array of file names to check Get the generation of the most recent commit to the index in this directory (N in the segments_N file). directory to search for the latest segments_N file Get the filename of the segments_N file for the most recent commit in the list of index files. array of file names to check Get the filename of the segments_N file for the most recent commit to the index in this Directory. directory to search for the latest segments_N file Get the segments_N filename in use by this segment infos. Parse the generation off the segments file name and return it. A utility for writing the file to a . NOTE: this is an internal utility which is kept public so that it's accessible by code from other packages. You should avoid calling this method unless you're absolutely sure what you're doing! @lucene.internal Get the next segments_N filename that will be written. Read a particular . Note that this may throw an if a commit is in process. directory containing the segments file segment file to load if the index is corrupt if there is a low-level IO error Find the latest commit (segments_N file) and load all s. Returns a copy of this instance, also copying each . Version number when this was generated. Returns current generation. Returns last succesfully read or written generation. If non-null, information about retries when loading the segments file will be printed to this. Advanced configuration of retry logic in loading segments_N file Gets or Sets the . Advanced: set how many times to try incrementing the gen when loading the segments file. this only runs if the primary (listing directory) and secondary (opening segments.gen file) methods fail to find the segments file. @lucene.experimental Prints the given message to the . Note, this method does not check for null . It assumes this check has been performed by the caller, which is recommended to avoid the (usually) expensive message creation. Utility class for executing code that needs to do something with the current segments file. This is necessary with lock-less commits because from the time you locate the current segments file name, until you actually open it, read its contents, or check modified time, etc., it could have been deleted due to a writer commit finishing. Sole constructor. Locate the most recent segments file and run on it. Run on the provided commit. Subclass must implement this. The assumption is an will be thrown if something goes wrong during the processing that could have been caused by a writer committing. Call this to start a commit. 
This writes the new segments file, but writes an invalid checksum at the end, so that it is not visible to readers. Once this is called you must call to complete the commit or to abort it. Note: should be called prior to this method if changes have been made to this instance Returns all file names referenced by instances matching the provided (ie files associated with any "external" segments are skipped). The returned collection is recomputed on each invocation. Writes & syncs to the Directory dir, taking care to remove the segments file on exception Note: should be called prior to this method if changes have been made to this instance Returns readable description of this segment. Gets saved with this commit. Replaces all segments in this instance, but keeps generation, version, counter so that future commits remain write once. Returns sum of all segment's docCounts. Note that this does not include deletions Call this before committing if changes have been made to the segments. applies all changes caused by committing a merge to this Returns an unmodifiable of contained segments in order. Returns all contained segments as an unmodifiable view. Returns number of s. NOTE: This was size() in Lucene. Appends the provided . Appends the provided s. Clear all s. Remove the provided . WARNING: O(N) cost Remove the at the provided index. WARNING: O(N) cost Return true if the provided is contained. WARNING: O(N) cost Returns index of the provided . WARNING: O(N) cost The class combines two or more Segments, represented by an , into a single Segment. Call the merge method to combine the segments. True if any merging should happen Merges the readers into the directory passed to the constructor The number of documents that were merged if the index is corrupt if there is a low-level IO error The number of documents in all of the readers if the index is corrupt if there is a low-level IO error Merge the TermVectors from each of the segments into the new one. if there is a low-level IO error implementation over a single segment. Instances pointing to the same segment (but with different deletes, etc) may share the same core data. @lucene.experimental Constructs a new with a new core. if the index is corrupt if there is a low-level IO error Create new sharing core from a previous and loading new live docs from a new deletes file. Used by . Create new sharing core from a previous and using the provided in-memory liveDocs. Used by to provide a new NRT reader Reads the most recent of the given segment info. @lucene.internal Expert: retrieve thread-private @lucene.internal Expert: retrieve thread-private @lucene.internal Return the name of the segment this reader is reading. Return the of the segment this reader is reading. Returns the directory this index resides in. Returns term infos index divisor originally passed to . Called when the shared core for this is disposed. This listener is called only once all s sharing the same core are disposed. At this point it is safe for apps to evict this reader from any caches keyed on . This is the same interface that uses, internally, to evict entries. NOTE: This was CoreClosedListener in Lucene. @lucene.experimental Invoked when the shared core of the original has disposed. Expert: adds a to this reader's shared core Expert: removes a from this reader's shared core Returns approximate RAM Bytes used Holder class for common parameters used during read. @lucene.experimental where this segment is read from. describing this segment. describing all fields in this segment. 
to pass to . The termInfosIndexDivisor to use, if appropriate (not all s support it; in particular the current default does not). NOTE: if this is < 0, that means "defer terms index load until needed". But if the codec must load the terms index on init (preflex is the only one currently that must do so), then it should negate this value to get the app's terms divisor Unique suffix for any postings files read for this segment. sets this for each of the postings formats it wraps. If you create a new then any files you write/read must be derived using this suffix (use ). Create a . Create a . Create a . Holder class for common parameters used during write. @lucene.experimental used for debugging messages. where this segment will be written to. describing this segment. describing all fields in this segment. Number of deleted documents set while flushing the segment. Deletes and updates to apply while we are flushing the segment. A is enrolled in here if it was deleted/updated at one point, and it's mapped to the docIDUpto, meaning any docID < docIDUpto containing this term should be deleted/updated. recording live documents; this is only set if there are one or more deleted documents. Unique suffix for any postings files written for this segment. sets this for each of the postings formats it wraps. If you create a new then any files you write/read must be derived using this suffix (use ). Expert: The fraction of terms in the "dictionary" which should be stored in RAM. Smaller values use more memory, but make searching slightly faster, while larger values use less memory and make searching slightly slower. Searching is typically not dominated by dictionary lookup, so tweaking this is rarely useful. for all writes; you should pass this to . Sole constructor. Constructor which takes segment suffix. Create a shallow copy of with a new segment suffix. A that simply does each merge sequentially, using the current thread. Sole constructor. Just do the merges in sequence. We do this "synchronized" so that even if the application is using multiple threads, only one merge may run at a time. A very simple merged segment warmer that just ensures data structures are initialized. Creates a new to log statistics about warming. Subclass of for enumerating a single term. For example, this can be used by s that need only visit one term, but want to preserve semantics such as . Creates a new . After calling the constructor the enumeration is already pointing to the term, if it exists. Exposes multi-valued view over a single-valued instance. This can be used if you want to have one multi-valued implementation against e.g. that also works for single-valued fields. Creates a multi-valued view over the provided Return the wrapped This class forces a composite reader (eg a or ) to emulate an atomic reader. This requires implementing the postings APIs on-the-fly, using the static methods in , , by stepping through the sub-readers to merge fields/terms, appending docs, etc. NOTE: This class almost always results in a performance hit. If this is important to your use case, you'll get better performance by gathering the sub readers using to get the atomic leaves and then operate per-AtomicReader, instead of using this class. This method is sugar for getting an from an of any kind. If the reader is already atomic, it is returned unchanged, otherwise wrapped by this class. An that wraps any other and adds the ability to hold and later release snapshots of an index.
While a snapshot is held, the will not remove any files associated with it even if the index is otherwise being actively, arbitrarily changed. Because we wrap another arbitrary , this gives you the freedom to continue using whatever you would normally want to use with your index. This class maintains all snapshots in-memory, and so the information is not persisted and not protected against system failures. If persistence is important, you can use . @lucene.experimental Records how many snapshots are held against each commit generation Used to map gen to . Wrapped Most recently committed . Used to detect misuse Sole constructor, taking the incoming to wrap. Release a snapshotted commit. the commit previously returned by Release a snapshot by generation. Increments the refCount for this . Snapshots the last commit and returns it. Once a commit is 'snapshotted,' it is protected from deletion (as long as this is used). The snapshot can be removed by calling followed by a call to . NOTE: while the snapshot is held, the files it references will not be deleted, which will consume additional disk space in your index. If you take a snapshot at a particularly bad time (say just before you call ) then in the worst case this could consume an extra 1X of your total index size, until you release the snapshot. if this index does not have any commits yet the that was snapshotted. Returns all s held by at least one snapshot. Returns the total number of snapshots currently held. Retrieve an from its generation; returns null if this is not currently snapshotted Wraps each as a . Wraps a provided and prevents it from being deleted. The we are preventing from deletion. Creates a wrapping the provided . A per-document with presorted values. Per-Document values in a are deduplicated, dereferenced, and sorted into a dictionary of unique values. A pointer to the dictionary value (ordinal) can be retrieved for each document. Ordinals are dense and in increasing sorted order. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the ordinal for the specified docID. document ID to lookup ordinal for the document: this is dense, starts at 0, then increments by 1 for the next value in sorted order. Note that missing values are indicated by -1. Retrieves the value for the specified ordinal. ordinal to lookup (must be >= 0 and < ) will be populated with the ordinal's value Returns the number of unique values. Number of unique values in this . This is also equivalent to one plus the maximum ordinal. If exists, returns its ordinal, else returns -insertionPoint-1, like Key to look up Returns a over the values. The enum supports and . Implements a wrapping a provided . Creates a new over the provided values Buffers up pending per doc, deref and sorting via int ord, then flushes when segment flushes. A per-document set of presorted values. Per-Document values in a are deduplicated, dereferenced, and sorted into a dictionary of unique values. A pointer to the dictionary value (ordinal) can be retrieved for each document. Ordinals are dense and in increasing sorted order. Sole constructor. (For invocation by subclass constructors, typically implicit.) When returned by it means there are no more ordinals for the document. Returns the next ordinal for the current document (previously set by . Next ordinal for the document, or . ordinals are dense, start at 0, then increment by 1 for the next value in sorted order. 
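A hedged sketch of the ordinal iteration contract described above: position the view on one document (via the SetDocument member documented just below), then read ordinals until the no-more-ordinals sentinel. It assumes the Lucene.NET 4.8-style API where an AtomicReader exposes GetSortedSetDocValues; leaf, docId and the field name "category" are placeholders:
SortedSetDocValues dv = leaf.GetSortedSetDocValues("category"); // may be null if the field has no doc values
if (dv != null)
{
    var scratch = new BytesRef();
    dv.SetDocument(docId);                                   // select the document to iterate
    long ord;
    while ((ord = dv.NextOrd()) != SortedSetDocValues.NO_MORE_ORDS)
    {
        dv.LookupOrd(ord, scratch);                          // resolve the ordinal to its term bytes
        string value = scratch.Utf8ToString();
        // ... use value ...
    }
}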
Sets iteration to the specified docID document ID Retrieves the value for the specified ordinal. ordinal to lookup will be populated with the ordinal's value Returns the number of unique values. Number of unique values in this . This is also equivalent to one plus the maximum ordinal. If exists, returns its ordinal, else returns -insertionPoint-1, like . Key to look up Returns a over the values. The enum supports and . Implements a wrapping a provided . Creates a new over the provided values Buffers up pending s per doc, deref and sorting via int ord, then flushes when segment flushes. called only from static Open() methods called from DirectoryReader.Open(...) methods Used by near real-time search This constructor is only used for This is a that writes stored fields. Fills in any hole in the docIDs Expert: Provides a low-level means of accessing the stored field values in an index. See . NOTE: a implementation should not try to load or visit other stored documents in the same reader because the implementation of stored fields for most codecs is not reentrant and you will see strange exceptions as a result. See , which is a that builds the containing all stored fields. This is used by . @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Process a binary field. newly allocated byte array with the binary contents. Process a field Process a numeric field. Process a numeric field. Process a numeric field. Process a numeric field. Hook before processing a field. Before a field is processed, this method is invoked so that subclasses can return a representing whether they need that particular field or not, or to stop processing entirely. Enumeration of possible return values for . YES: the field should be visited. NO: don't visit this field, but continue processing fields for this document. STOP: don't visit this field and stop processing any other fields for this document. A represents a word from text. This is the unit of search. It is composed of two elements, the text of the word, as a string, and the name of the field that the text occurred in. Note that terms may represent more than words from text fields, but also things like dates, email addresses, urls, etc. Constructs a with the given field and bytes. Note that a null field or null bytes value results in undefined behavior for most Lucene APIs that accept a Term parameter. WARNING: the provided is not copied, but used directly. Therefore the bytes should not be modified after construction, for example, you should clone a copy by rather than pass reused bytes from a . Constructs a with the given field and text. Note that a null field or null text value results in undefined behavior for most Lucene APIs that accept a parameter. Constructs a with the given field and empty text. This serves two purposes: 1) reuse of a with the same field. 2) pattern for a query. field's name Returns the field of this term. The field indicates the part of a document which this term came from. Returns the text of this term. In the case of words, this is simply the text of the word. In the case of dates and other types, this is an encoding of the object as a string. Returns human-readable form of the term text. If the term is not unicode, the raw bytes will be printed instead. Returns the bytes of this term. Compares two terms, returning a negative integer if this term belongs before the argument, zero if this term is equal to the argument, and a positive integer if this term belongs after the argument.
The ordering of terms is first by field, then by text. Resets the field and text of a . WARNING: the provided is not copied, but used directly. Therefore the bytes should not be modified after construction, for example, you should clone a copy rather than pass reused bytes from a TermsEnum. Maintains a view over instances containing a single term. The doesn't track if the given objects are valid, neither if the instances refer to the same terms in the associated readers. @lucene.experimental Holds the of the top-level , used internally only for asserting. @lucene.internal Creates an empty from a Creates a with an initial , pair. Creates a from a top-level and the given . this method will lookup the given term in all context's leaf readers and register each of the readers containing the term in the returned using the leaf reader's ordinal. Note: the given context must be a top-level context. Clears the internal state and removes all registered s Registers and associates a with an leaf ordinal. The leaf ordinal should be derived from a 's leaf ord. Returns the for an leaf ordinal or null if no for the ordinal was registered. The readers leaf ordinal to get the for. The for the given readers ord or null if no for the reader was registered Returns the accumulated term frequency of all instances passed to . the accumulated term frequency of all instances passed to . expert: only available for queries that want to lie about docfreq @lucene.internal Access to the terms in a specific field. See . @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns an iterator that will step through all terms. This method will not return null. Returns an iterator that will step through all terms. This method will not return null. If you have a previous , for example from a different field, you can pass it for possible reuse if the implementation can do so. Returns an iterator that will step through all terms. This method will not return null. If you have a previous , for example from a different field, you can pass it for possible reuse if the implementation can do so. Returns a that iterates over all terms that are accepted by the provided . If the is provided then the returned enum will only accept terms > , but you still must call first to get to the first term. Note that the provided must be accepted by the automaton. NOTE: the returned cannot seek. Return the used to sort terms provided by the iterator. This method may return null if there are no terms. This method may be invoked many times; it's best to cache a single instance & reuse it. Returns the number of terms for this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. NOTE: This was size() in Lucene. Returns the sum of for all terms in this field, or -1 if this measure isn't stored by the codec (or if this fields omits term freq and positions). Note that, just like other term measures, this measure does not take deleted documents into account. Returns the sum of for all terms in this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. Returns the number of documents that have at least one term for this field, or -1 if this measure isn't stored by the codec. Note that, just like other term measures, this measure does not take deleted documents into account. 
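A hedged sketch of stepping through the terms of one field using the accessors described above; it assumes the Lucene.NET 4.8-style enumerator API (Terms.GetEnumerator returning a TermsEnum with MoveNext/Term), and reader and the field name "body" are placeholders:
Terms terms = MultiFields.GetTerms(reader, "body");   // merged view; may be null if the field is absent
if (terms != null)
{
    long distinctTerms = terms.Count;                  // -1 if the codec does not store this statistic
    TermsEnum termsEnum = terms.GetEnumerator();
    while (termsEnum.MoveNext())
    {
        BytesRef term = termsEnum.Term;
        int docFreq = termsEnum.DocFreq;               // documents containing the current term
        // ... use term.Utf8ToString() and docFreq ...
    }
}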
Returns true if documents in this field store per-document term frequency (). Returns true if documents in this field store offsets. Returns true if documents in this field store positions. Returns true if documents in this field store payloads. Zero-length array of . Enumerator to seek (, ) or step through ( terms to obtain , frequency information (), or for the current term (). Term enumerations are always ordered by . Each term in the enumeration is greater than the one before it. The is unpositioned when you first obtain it and you must first successfully call or one of the Seek methods. @lucene.experimental Moves to the next item in the . The default implementation can and should be overridden with a more optimized version. true if the enumerator was successfully advanced to the next element; false if the enumerator has passed the end of the collection. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the related attributes. Represents returned result from . The term was not found, and the end of iteration was hit. The precise term was found. A different term was found after the requested term Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, may be substantially faster than . Seeks to the specified term, if it exists, or to the next (ceiling) term. Returns to indicate whether exact term was found, a different term was found, or EOF was hit. The target term may be before or after the current term. If this returns , the enum is unpositioned. Seeks to the specified term by ordinal (position) as previously returned by . The target may be before or after the current ord, and must be within bounds. Expert: Seeks a specific position by previously obtained from . Callers should maintain the to use this method. Low-level implementations may position the without re-seeking the term dictionary. Seeking by should only be used iff the state was obtained from the same instance. NOTE: Using this method with an incompatible might leave this in undefined state. On a segment level instances are compatible only iff the source and the target operate on the same field. If operating on segment level, TermState instances must not be used across segments. NOTE: A seek by might not restore the 's state. states must be maintained separately if this method is used. the term the corresponds to the Returns current term. Do not call this when the enum is unpositioned. Returns ordinal position for current term. This is an optional property (the codec may throw . Do not call this when the enum is unpositioned. Returns the number of documents containing the current term. Do not call this when the enum is unpositioned. Returns the total number of occurrences of this term across all documents (the sum of the Freq for each doc that has this term). This will be -1 if the codec doesn't support this measure. Note that, like other term measures, this measure does not take deleted documents into account. Get for the current term. Do not call this when the enum is unpositioned. This method will not return null. Unset bits are documents that should not be returned Pass a prior for possible reuse Get for the current term, with control over whether freqs are required. Do not call this when the enum is unpositioned. This method will not return null. 
Unset bits are documents that should not be returned Pass a prior DocsEnum for possible reuse Specifies which optional per-document values you require; Get for the current term. Do not call this when the enum is unpositioned. This method will return null if positions were not indexed. Unset bits are documents that should not be returned Pass a prior DocsAndPositionsEnum for possible reuse Get for the current term, with control over whether offsets and payloads are required. Some codecs may be able to optimize their implementation when offsets and/or payloads are not required. Do not call this when the enum is unpositioned. This will return null if positions were not indexed. Unset bits are documents that should not be returned Pass a prior DocsAndPositionsEnum for possible reuse Specifies which optional per-position values you require; see . Expert: Returns the s internal state to position the without re-seeking the term dictionary. NOTE: A seek by might not capture the 's state. Callers must maintain the states separately An empty for quickly returning an empty instance e.g. in Please note: this enum should be unmodifiable, but it is currently possible to add Attributes to it. This should not be a problem, as the enum is always empty and the existence of unused Attributes does not matter. This class implements , which is passed each token produced by the analyzer on each field. It stores these tokens in a hash table, and allocates separate byte streams per token. Consumers of this class, eg and , write their own byte streams under each term. Implement this class to plug into the processor, which inverts & stores s into a hash table and provides an API for writing bytes into multiple streams for each unique . Collapse the hash table & sort in-place. Secondary entry point (for 2nd & subsequent ), because token text has already been "interned" into , so we hash by NOTE: This was writeVInt() in Lucene Encapsulates all required internal state to position the associated without re-seeking. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Copies the content of the given to this instance the to copy Fills in no-term-vectors for all docs we haven't seen since the last doc that had term vectors. Called once per field per document if term vectors are enabled, to write the vectors to RAMOutputStream, which is then quickly flushed to the real term vectors files in the Directory. Merges segments of approximately equal size, subject to an allowed number of segments per tier. This is similar to , except this merge policy is able to merge non-adjacent segment, and separates how many segments are merged at once () from how many segments are allowed per tier (). This merge policy also does not over-merge (i.e. cascade merges). For normal merging, this policy first computes a "budget" of how many segments are allowed to be in the index. If the index is over-budget, then the policy sorts segments by decreasing size (pro-rating by percent deletes), and then finds the least-cost merge. Merge cost is measured by a combination of the "skew" of the merge (size of largest segment divided by smallest segment), total merge size and percent deletes reclaimed, so that merges with lower skew, smaller size and those reclaiming more deletes, are favored. If a merge will produce a segment that's larger than , then the policy will merge fewer segments (down to 1 at once, if that one has deletions) to keep the segment size under budget. 
NOTE: This policy freely merges non-adjacent segments; if this is a problem, use . NOTE: This policy always merges by byte size of the segments, always pro-rates by percent deletes, and does not apply any maximum segment size during forceMerge (unlike ). @lucene.experimental Default noCFSRatio. If a merge's size is >= 10% of the index, then we disable compound file for it. Sole constructor, setting all settings to their defaults. Gets or sets maximum number of segments to be merged at a time during "normal" merging. For explicit merging (eg, or was called), see . Default is 10. Gets or sets maximum number of segments to be merged at a time, during or . Default is 30. Gets or sets maximum sized segment to produce during normal merging. This setting is approximate: the estimate of the merged segment size is made by summing sizes of to-be-merged segments (compensating for percent deleted docs). Default is 5 GB. Controls how aggressively merges that reclaim more deletions are favored. Higher values will more aggressively target merges that reclaim deletions, but be careful not to go so high that way too much merging takes place; a value of 3.0 is probably nearly too high. A value of 0.0 means deletions don't impact merge selection. Segments smaller than this are "rounded up" to this size, ie treated as equal (floor) size for merge selection. this is to prevent frequent flushing of tiny segments from allowing a long tail in the index. Default is 2 MB. When forceMergeDeletes is called, we only merge away a segment if its delete percentage is over this threshold. Default is 10%. Gets or sets the allowed number of segments per tier. Smaller values mean more merging but fewer segments. NOTE: this value should be >= the otherwise you'll force too much merging to occur. Default is 10.0. Holds score and explanation for a single candidate merge. Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the score for this merge candidate; lower scores are better. Human readable explanation of how the merge got this score. Expert: scores one merge; subclasses can override. Class that tracks changes to a delegated , used by to ensure specific changes are visible. Create this class (passing your ), and then pass this class to . Be sure to make all changes via the , otherwise won't know about the changes. @lucene.experimental Create a wrapping the provided . Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Calls and returns the generation that reflects this change. Return the current generation being indexed. Return the wrapped . Return and increment current gen. @lucene.internal Cals and returns the generation that reflects this change. An interface for implementations that support 2-phase commit. 
You can use to execute a 2-phase commit algorithm over several s. @lucene.experimental The first stage of a 2-phase commit. Implementations should do as much work as possible in this method, but avoid actually committing changes. If the 2-phase commit fails, is called to discard all changes since last successful commit. The second phase of a 2-phase commit. Implementations should ideally do very little work in this method (following , and after it returns, the caller can assume that the changes were successfully committed to the underlying storage. Discards any changes that have occurred since the last commit. In a 2-phase commit algorithm, where one of the objects failed to or , this method is used to roll all other objects back to their previous state. A utility for executing 2-phase commit on several objects. @lucene.experimental No instance Thrown by when an object fails to . Sole constructor. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown by when an object fails to . Sole constructor. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Rollback all objects, discarding any exceptions that occur. Executes a 2-phase commit algorithm by first all objects and only if all succeed, it proceeds with . If any of the objects fail on either the preparation or actual commit, it terminates and all of them. NOTE: It may happen that an object fails to commit, after a few have already successfully committed. This tool will still issue a rollback instruction on them as well, but depending on the implementation, it may not have any effect. NOTE: if any of the objects are null, this method simply skips over them. if any of the objects fail to if any of the objects fail to Just switches between two s. This is used for upgrading all existing segments of an index when calling . All other methods delegate to the base given to the constructor. This allows for an as-cheap-as-possible upgrade of an older index by only upgrading segments that are created by previous Lucene versions. ForceMerge no longer really merges; it is just used to "ForceMerge" older segment versions away. In general one would use , but for a fully customizable upgrade, you can use this like any other and call : IndexWriterConfig iwc = new IndexWriterConfig(LuceneVersion.LUCENE_XX, new KeywordAnalyzer()); iwc.MergePolicy = new UpgradeIndexMergePolicy(iwc.MergePolicy); using (IndexWriter w = new IndexWriter(dir, iwc)) { w.ForceMerge(1); } Warning: this merge policy may reorder documents if the index was partially upgraded before calling (e.g., documents were added). If your application relies on "monotonicity" of doc IDs (which means that the order in which the documents were added to the index is preserved), do a ForceMerge(1) instead. Please note, the delegate may also reorder documents. @lucene.experimental Wrapped . Wrap the given and intercept requests to only upgrade segments written with previous Lucene versions. Returns true if the given segment should be upgraded. The default implementation will return !Constants.LUCENE_MAIN_VERSION.Equals(si.Info.Version, StringComparison.Ordinal), so all segments created with a different version number than this Lucene version will get upgraded.
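Returning to the TieredMergePolicy settings described above, here is a hedged configuration sketch. The property names follow the Lucene.NET 4.8 conventions (MaxMergeAtOnce, SegmentsPerTier, MaxMergedSegmentMB), the numbers are purely illustrative, and analyzer and dir stand for whatever Analyzer and Directory the application already uses:
var mergePolicy = new TieredMergePolicy
{
    MaxMergeAtOnce = 10,           // segments merged at once during "normal" merging
    SegmentsPerTier = 10.0,        // allowed segments per tier; keep >= MaxMergeAtOnce
    MaxMergedSegmentMB = 5 * 1024  // approximate cap on a merged segment, in MB
};

var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    MergePolicy = mergePolicy
};
using (var writer = new IndexWriter(dir, config))
{
    // ... index documents; merging follows the policy configured above ...
}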
Extension methods that can be used to provide similar syntax as Java Lucene. (config.SetCheckIntegrityAtMerge(100).SetMaxBufferedDocs(1000);) Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Builder method for . this instance this instance Represents an comparison operation that uses comparison rules. Since in .NET the standard comparers will do boxing when comparing enum types, this class was created as a more performant alternative than calling CompareTo() on . Gets the default static singleton instance of . Compares two enums and returns an indication of their relative sort order. An enum to compare to . An enum to compare to . A signed integer that indicates the relative values of and , as shown in the following table. Value Meaning Less than zero precedes y in the sort order. Zero is equal to . Greater than zero follows in the sort order. A that runs each merge using s on the default . If more than merges are requested then this class will forcefully throttle the incoming threads by pausing until one more more merges complete. LUCENENET specific List of currently active s. How many s have kicked off (this is use to name them). that holds the index. that owns this instance. Sole constructor, with all settings set to default values. Sets the maximum number of merge threads and simultaneous merges allowed. The max # simultaneous merges that are allowed. If a merge is necessary yet we already have this many threads running, the incoming thread (that is calling add/updateDocument) will block until a merge thread has completed. Note that we will only run the smallest merges at a time. The max # simultaneous merge threads that should be running at once. This must be <= Max number of merge threads allowed to be running at once. When there are more merges then this, we forcefully pause the larger ones, letting the smaller ones run, up until merges at which point we forcefully pause incoming threads (that presumably are the ones causing so much merging). Max number of merges we accept before forcefully throttling the incoming threads Return the priority that merge threads run at. 
This is always the same. This method has no effect in because the returns a constant value. Called whenever the running merges have changed, to pause & unpause threads. This method sorts the merge threads by their merge size in descending order and then pauses/unpauses threads from first to last -- that way, smaller merges are guaranteed to run before larger ones. Returns true if verbosing is enabled. This method is usually used in conjunction with , like that: if (Verbose) { Message("your message"); } Outputs the given message - this method assumes was called and returned true. Wait for any running merge threads to finish. This call is not interruptible as used by . Returns the number of merge threads that are alive. Note that this number is <= size. Does the actual merge, by calling Create and return a new Called when an exception is hit in a background merge thread Used for testing Used for testing Runs a merge thread, which may run one or more merges in sequence. Sole constructor. Record the currently running merge. Return the current merge, or null if this is done. Returns the cardinality for the current document (previously set by . Returns the text of this term. In the case of words, this is simply the text of the word. In the case of dates and other types, this is an encoding of the object as a string. A that will match terms against a finite-state machine. This query will match documents that contain terms accepted by a given finite-state machine. The automaton can be constructed with the API. Alternatively, it can be created from a regular expression with or from the standard Lucene wildcard syntax with . When the query is executed, it will create an equivalent DFA of the finite-state machine, and will enumerate the term dictionary in an intelligent way to reduce the number of comparisons. For example: the regular expression of [dl]og? will make approximately four comparisons: do, dog, lo, and log. @lucene.experimental The automaton to match index terms against Term containing the field, and possibly some pattern structure Create a new AutomatonQuery from an . containing field and possibly some pattern structure. The term text is ignored. to run, terms that are accepted are considered a match. Returns the automaton used to create this query This implementation supplies a filtered , that excludes all docids which are not in a instance. This is especially useful in to apply the passed to before returning the final . Convenience wrapper method: If acceptDocs is null it returns the original set without wrapping. Underlying DocIdSet. If null, this method returns null Allowed docs, all docids not in this set will not be returned by this . If null, this method returns the original set without wrapping. Constructor. Underlying Allowed docs, all docids not in this set will not be returned by this A clause in a . The query whose matching documents are combined by the boolean query. Constructs a . Returns true if is equal to this. Returns a hash code value for this object. Specifies how clauses are to occur in matching documents. Use this operator for clauses that must appear in the matching documents. Use this operator for clauses that should appear in the matching documents. For a with no clauses one or more clauses must match a document for the to match. Use this operator for clauses that must not appear in the matching documents. Note that it is not possible to search for queries that only consist of a clause. A that matches documents matching boolean combinations of other queries, e.g. 
s, s or other s. Collection initializer note: To create and populate a in a single statement, you can use the following example as a guide: var booleanQuery = new BooleanQuery() { { new WildcardQuery(new Term("field2", "foobar")), Occur.SHOULD }, { new MultiPhraseQuery() { new Term("field", "microsoft"), new Term("field", "office") }, Occur.SHOULD } }; // or var booleanQuery = new BooleanQuery() { new BooleanClause(new WildcardQuery(new Term("field2", "foobar")), Occur.SHOULD), new BooleanClause(new MultiPhraseQuery() { new Term("field", "microsoft"), new Term("field", "office") }, Occur.SHOULD) }; Thrown when an attempt is made to add more than clauses. This typically happens if a , , , or is expanded to many terms during search. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Return the maximum number of clauses permitted, 1024 by default. Attempts to add more than the permitted number of clauses cause to be thrown. Constructs an empty boolean query. Constructs an empty boolean query. may be disabled in scoring, as appropriate. For example, this score factor does not make sense for most automatically generated queries, like and . Disables in scoring. Returns true if is disabled in scoring for this query instance. Specifies a minimum number of the optional s which must be satisfied. By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required. Use of this method is totally independent of specifying that any specific clauses are required (or prohibited). This number will only be compared against the number of matching optional clauses. The number of optional clauses that must match Adds a clause to a boolean query. If the new number of clauses exceeds the maximum clause number Adds a clause to a boolean query. If the new number of clauses exceeds the maximum clause number Returns the set of clauses in this query. Returns the list of clauses in this query. Returns an iterator on the clauses in this query. It implements the interface to make it possible to do: foreach (BooleanClause clause in booleanQuery) {} Expert: the for , used to normalize, score and explain these queries. @lucene.experimental The implementation. Prints a user-readable version of this query. Returns true if is equal to this. Returns a hash code value for this object. Description from Doug Cutting (excerpted from LUCENE-1483): uses an array to score windows of 2K docs. So it scores docs 0-2K first, then docs 2K-4K, etc. For each window it iterates through all query terms and accumulates a score in table[doc%2K]. It also stores in the table a bitmask representing which terms contributed to the score. Non-zero scores are chained in a linked list. At the end of scoring each window it then iterates through the linked list and, if the bitmask matches the boolean constraints, collects a hit. For boolean queries with lots of frequent terms this can be much faster, since it does not need to update a priority queue for each posting, instead performing constant-time operations per posting. The only downside is that it results in hits being delivered out-of-order within the window, which means it cannot be nested within other scorers. But it works well as a top-level scorer. 
The new BooleanScorer2 implementation instead works by merging priority queues of postings, albeit with some clever tricks. For example, a pure conjunction (all terms required) does not require a priority queue. Instead it sorts the posting streams at the start, then repeatedly skips the first up to the last. If the first ever equals the last, then there's a hit. When some terms are required and some terms are optional, the conjunction can be evaluated first, then the optional terms can all skip to the match and be added to the score. Thus the conjunction can reduce the number of priority queue updates for the optional terms.

A simple hash table of document scores within a range. See the description in comparing & .

An alternative to that also allows a minimum number of optional scorers that should match. Implements SkipTo(), and has no limitations on the number of added scorers. Uses , , and . The scorer to which all scoring will be delegated, except for computing and using the coordination factor. The number of optionalScorers that need to match (if there are any) Creates a with the given similarity and lists of required, prohibited and optional scorers. If no required scorers are added, at least one of the optional scorers will have to match during the search. The to be used. If this parameter is true, coordination level matching () is not used. The minimum number of optional added scorers that should match during the search. In case no required scorers are added, at least one of the optional scorers will have to match during the search. The list of required scorers. The list of prohibited scorers. The list of optional scorers. The max coord. Count a scorer as a single match. Returns the scorer to be used for match counting and score summing. Uses requiredScorers, optionalScorers and prohibitedScorers. Returns the scorer to be used for match counting and score summing. Uses the given required scorer and the prohibitedScorers. A required scorer already built.

Add this to a returned by and update the boost on each returned term. This enables to control the boost factor for each matching term in or mode. is using this to take the edit distance into account. Please note: this attribute is intended to be added only by the to itself in its constructor and consumed by the . @lucene.internal Gets or Sets the boost in this attribute. Default is 1.0f. Implementation class for . @lucene.internal Gets or Sets the boost in this attribute. Default is 1.0f.

This class is used to score a range of documents at once, and is returned by . Only queries that have a more optimized means of scoring across a range of documents need to override this. Otherwise, a default implementation is wrapped around the returned by . Scores and collects all matching documents. The collector to which all matching documents are passed. Collects matching documents in a range. The collector to which all matching documents are passed. Score up to, but not including, this doc true if more matching documents may remain.

Caches all docs, and optionally also scores, coming from a search, and is then able to replay them to another collector. You specify the max RAM this class may use. Once the collection is done, call . If this returns true, you can use against a new collector. If it returns false, this means too much RAM was required and you must instead re-run the original search. NOTE: this class consumes 4 bytes (or 8 bytes, if scoring is cached) per collected document.
If the result set is large this can easily be a very substantial amount of RAM! NOTE: this class caches at least 128 documents before checking RAM limits. See the Lucene modules/grouping module for more details including a full code example. @lucene.experimental NOTE: This was EMPTY_INT_ARRAY in Lucene A which caches scores A which does not cache scores Creates a which does not wrap another collector. The cached documents and scores can later be replayed (). whether documents are allowed to be collected out-of-order Create a new that wraps the given collector and caches documents and scores up to the specified RAM threshold. The to wrap and delegate calls to. Whether to cache scores in addition to document IDs. Note that this increases the RAM consumed per doc. The maximum RAM in MB to consume for caching the documents and scores. If the collector exceeds the threshold, no documents and scores are cached. Create a new that wraps the given collector and caches documents and scores up to the specified max docs threshold. The to wrap and delegate calls to. Whether to cache scores in addition to document IDs. Note that this increases the RAM consumed per doc. The maximum number of documents for caching the documents and possible the scores. If the collector exceeds the threshold, no documents and scores are cached. Called before successive calls to . Implementations that need the score of the current document (passed-in to ), should save the passed-in and call when needed. Called once for every document matching a query, with the unbased document number. Note: The collection of the current segment can be terminated by throwing a . In this case, the last docs of the current will be skipped and will swallow the exception and continue collection with the next leaf. Note: this is called in an inner search loop. For good search performance, implementations of this method should not call or on every hit. Doing so can slow searches by an order of magnitude or more. Reused by the specialized inner classes. Replays the cached doc IDs (and scores) to the given . If this instance does not cache scores, then is not set on other.SetScorer(Scorer) as well as scores are not replayed. If this collector is not cached (i.e., if the RAM limits were too low for the number of documents + scores to cache). If the given Collect's does not support out-of-order collection, while the collector passed to the ctor does. Wraps another 's result and caches it. The purpose is to allow filters to simply filter, and then wrap with this class to add caching. Wraps another filter's result and caches it. Filter to cache results of Gets the contained filter. the contained filter. Provide the to be cached, using the provided by the wrapped Filter. This implementation returns the given , if returns true, else it calls Note: this method returns if the given is null or if return null. The empty instance is use as a placeholder in the cache instead of the null value. Default cache implementation: uses . An empty instance Returns total byte size used by cached filters. Contains statistics for a collection (field) @lucene.experimental Sole constructor. Returns the field name Returns the total number of documents, regardless of whether they all contain values for this field. Returns the total number of documents that have at least one term for this field. Returns the total number of tokens for this field Returns the total number of postings for this field Throw this exception in to prematurely terminate collection of the current leaf. 
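A rough usage sketch of the caching collector described above, assuming the member names Create, IsCached, and Replay; the query, hit count, and 10 MB budget are hypothetical:

IndexSearcher searcher = new IndexSearcher(indexReader);
Query query = new TermQuery(new Term("body", "lucene")); // hypothetical query

TopScoreDocCollector firstPass = TopScoreDocCollector.Create(10, true);
CachingCollector cache = CachingCollector.Create(firstPass, true, 10.0); // cacheScores = true, maxRAMMB = 10
searcher.Search(query, cache);

if (cache.IsCached)
{
    // Cheap second pass: replay the cached docIDs (and scores) to another collector.
    TopScoreDocCollector secondPass = TopScoreDocCollector.Create(10, true);
    cache.Replay(secondPass);
}
else
{
    // The RAM limit was exceeded, so nothing was cached; re-run the original search.
    searcher.Search(query, TopScoreDocCollector.Create(10, true));
}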
Note: swallows this exception and never re-throws it. As a consequence, you should not catch it when calling any overload of as it is unnecessary and might hide misuse of this exception. Sole constructor. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Expert: Collectors are primarily meant to be used to gather raw results from a search, and implement sorting or custom result filtering, collation, etc. Lucene's core collectors are derived from Collector. Likely your application can use one of these classes, or subclass , instead of implementing directly: is an abstract base class that assumes you will retrieve the top N docs, according to some criteria, after collection is done. is a concrete subclass and sorts according to score + docID. This is used internally by the search methods that do not take an explicit . It is likely the most frequently used collector. subclasses and sorts according to a specified object (sort by field). This is used internally by the search methods that take an explicit . , which wraps any other Collector and aborts the search if it's taken too much time. wraps any other and prevents collection of hits whose score is <= 0.0 decouples the score from the collected doc: the score computation is skipped entirely if it's not needed. Collectors that do need the score should implement the method, to hold onto the passed instance, and call within the collect method to compute the current hit's score. If your collector may request the score for a single hit multiple times, you should use . NOTE: The doc that is passed to the collect method is relative to the current reader. If your collector needs to resolve this to the docID space of the Multi*Reader, you must re-base it by recording the docBase from the most recent call. Here's a simple example showing how to collect docIDs into an : private class MySearchCollector : ICollector { private readonly OpenBitSet bits; private int docBase; public MySearchCollector(OpenBitSet bits) { if (bits is null) throw new ArgumentNullException("bits"); this.bits = bits; } // ignore scorer public void SetScorer(Scorer scorer) { } // accept docs out of order (for a BitSet it doesn't matter) public bool AcceptDocsOutOfOrder { get { return true; } } public void Collect(int doc) { bits.Set(doc + docBase); } public void SetNextReader(AtomicReaderContext context) { this.docBase = context.DocBase; } } IndexSearcher searcher = new IndexSearcher(indexReader); OpenBitSet bits = new OpenBitSet(indexReader.MaxDoc); searcher.Search(query, new MySearchCollector(bits)); Not all collectors will need to rebase the docID. For example, a collector that simply counts the total number of hits would skip it. NOTE: Prior to 2.9, Lucene silently filtered out hits with score <= 0. As of 2.9, the core s no longer do that. It's very unusual to have such hits (a negative query boost, or function query returning negative custom scores, could cause it to happen). If you need that behavior, use . @lucene.experimental @since 2.9 Called before successive calls to . Implementations that need the score of the current document (passed-in to ), should save the passed-in and call scorer.GetScore() when needed. Called once for every document matching a query, with the unbased document number. Note: The collection of the current segment can be terminated by throwing a . 
In this case, the last docs of the current will be skipped and will swallow the exception and continue collection with the next leaf. Note: this is called in an inner search loop. For good search performance, implementations of this method should not call or on every hit. Doing so can slow searches by an order of magnitude or more. Called before collecting from each . All doc ids in will correspond to . Add to the current 's internal document id to re-base ids in . next atomic reader context Return true if this collector does not require the matching docIDs to be delivered in int sort order (smallest to largest) to . Most Lucene Query implementations will visit matching docIDs in order. However, some queries (currently limited to certain cases of ) can achieve faster searching if the allows them to deliver the docIDs out of order. Many collectors don't mind getting docIDs out of order, so it's important to return true here. LUCENENET specific class used to hold the static method. Creates a new instance with the ability to specify the body of the method through the parameter, the body of the method through the parameter, the body of the method through the parameter, and the body of the property through the parameter. Simple example: IndexSearcher searcher = new IndexSearcher(indexReader); OpenBitSet bits = new OpenBitSet(indexReader.MaxDoc); int docBase; searcher.Search(query, Collector.NewAnonymous(setScorer: (scorer) => { // ignore scorer }, collect: (doc) => { bits.Set(doc + docBase); }, setNextReader: (context) => { docBase = context.DocBase; }, acceptsDocsOutOfOrder: () => { return true; }) ); A delegate method that represents (is called by) the method. It accepts a scorer and has no return value. A delegate method that represents (is called by) the method. It accepts an doc and has no return value. A delegate method that represents (is called by) the method. It accepts a context and has no return value. A delegate method that represents (is called by) the property. It returns a value. A new instance. Expert: Describes the score computation for document and query, and can distinguish a match independent of a positive value. Gets or Sets the match status assigned to this explanation node. May be null if match status is unknown. Indicates whether or not this models a good match. If the match status is explicitly set (i.e.: not null) this method uses it; otherwise it defers to the superclass. Scorer for conjunctions, sets of queries, all of which are required. A rewrite method that tries to pick the best constant-score rewrite method based on term and document counts from the query. If both the number of terms and documents is small enough, then is used. Otherwise, is used. Defaults derived from rough tests with a 20.0 million doc Wikipedia index. With more than 350 terms in the query, the filter method is fastest: If the query will hit more than 1 in 1000 of the docs in the index (0.1%), the filter method is fastest: If the number of terms in this query is equal to or larger than this setting then is used. If the number of documents to be visited in the postings exceeds this specified percentage of the for the index, then is used. Value may be 0.0 to 100.0. Special implementation of that keeps parallel arrays for A query that wraps another query or a filter and simply returns a constant score equal to the query boost for every document that matches the filter or query. For queries it therefore simply strips of all scores and returns a constant one. Strips off scores from the passed in . 
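Illustrative sketch of the constant-score wrapper, assuming a hypothetical field, term, and boost value:

IndexSearcher searcher = new IndexSearcher(indexReader);

// Every matching document scores exactly the query boost; scores from the
// wrapped query are discarded.
Query csq = new ConstantScoreQuery(new TermQuery(new Term("category", "book")))
{
    Boost = 2.0f
};
TopDocs hits = searcher.Search(csq, 10);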
The hits will get a constant score dependent on the boost factor of this query. if is null. Wraps a as a . The hits will get a constant score dependent on the boost factor of this query. If you simply want to strip off scores from a , no longer use new ConstantScoreQuery(new QueryWrapperFilter(query)), instead use ! if is null. Returns the encapsulated filter, returns null if a query is wrapped. Returns the encapsulated query, returns null if a filter is wrapped. We return this as our so that if the CSQ wraps a query with its own optimized top-level scorer (e.g. ) we can use that top-level scorer.

Utility class that runs a thread to manage periodic reopens of a , with methods to wait for specific index changes to become visible. To use this class you must first wrap your with a and always use it to make changes to the index, saving the returned generation. Then, when a given search request needs to see a specific index change, call the to wait for that change to be visible. Note that this will only scale well if most searches do not need to wait for a specific index generation. @lucene.experimental Create , to periodically reopen the . Maximum time until a new reader must be opened; this sets the upper bound on how slowly reopens may occur, when no caller is waiting for a specific generation to become visible. Minimum time until a new reader can be opened; this sets the lower bound on how quickly reopens may occur, when a caller is waiting for a specific generation to become visible. Kills the thread and releases all resources used by the . Also joins the thread so that when this method returns the thread is no longer alive. Kills the thread and releases all resources used by the . Also joins the thread so that when this method returns the thread is no longer alive. Waits for the target generation to become visible in the searcher. If the current searcher is older than the target generation, this method will block until the searcher is reopened by another thread via , or until the is closed. The generation to wait for Waits for the target generation to become visible in the searcher, up to a maximum specified number of milliseconds. If the current searcher is older than the target generation, this method will block until the searcher has been reopened by another thread via , the given waiting time has elapsed, or until the is closed. NOTE: if the waiting time elapses before the requested target generation is available the current is returned instead. The generation to wait for Maximum milliseconds to wait, or -1 to wait indefinitely true if the is now available, or false if the wait time was exceeded

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries. This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both and : for each term a searches for it in each field, while the set of these 's is combined into a .
The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields. Collection initializer note: To create and populate a in a single statement, you can use the following example as a guide: var disjunctionMaxQuery = new DisjunctionMaxQuery(0.1f) { new TermQuery(new Term("field1", "albino")), new TermQuery(new Term("field2", "elephant")) }; The subqueries Multiple of the non-max disjunct scores added into our final score. Non-zero values support tie-breaking. Creates a new empty . Use to add the subqueries. The score of each non-maximum disjunct for a document is multiplied by this weight and added into the final score. If non-zero, the value should be small, on the order of 0.1, which says that 10 occurrences of word in a lower-scored field that is also in a higher scored field is just as good as a unique word in the lower scored field (i.e., one that is not in any higher scored field). Creates a new A of all the disjuncts to add The weight to give to each matching non-maximum disjunct Add a subquery to this disjunction The disjunct added Add a collection of disjuncts to this disjunction via NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. A collection of queries to add as disjuncts. An over the disjuncts The disjuncts. Tie breaker value for multiple matches. Expert: the Weight for DisjunctionMaxQuery, used to normalize, score and explain these queries. NOTE: this API and implementation is subject to change suddenly in the next release. The s for our subqueries, in 1-1 correspondence with disjuncts Construct the for this searched by . Recursively construct subquery weights. Return our associated Compute the sub of squared weights of us applied to our subqueries. Used for normalization. Apply the computed normalization factor to our subqueries Create the scorer used to score our associated Explain the score we computed for doc Create the used to score us Optimize our representation and our subqueries representations The we query An optimized copy of us (which may not be a copy if there is nothing to optimize) Create a shallow copy of us -- used in rewriting if necessary A copy of us (but reuse, don't copy, our subqueries) Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten () form. If this query is not yet rewritten Prettyprint us. The field to which we are applied A string that shows what we do, of the form "(disjunct1 | disjunct2 | ... | disjunctn)^boost" Return true if we represent the same query as Another object true if is a with the same boost and the same subqueries, in the same order, as us Compute a hash code for hashing us the hash code The for . The union of all documents generated by the the subquery scorers is generated in document number order. The score for each document is the maximum of the scores computed by the subquery scorers that generate that document, plus times the sum of the scores for the other subqueries that generate the document. Multiplier applied to non-maximum-scoring subqueries for a document as they are summed into the result. 
Used when scoring currently matching doc. Creates a new instance of The to be used. Multiplier applied to non-maximum-scoring subqueries for a document as they are summed into the result. The sub scorers this should iterate on Determine the current document score. Initially invalid, until is called the first time. The score of the current generated document Recursively iterate all subScorers that generated last doc computing sum and max Base class for s that score disjunctions. Currently this just provides helper methods to manage the heap. The document number of the current match. Organize subScorers into a min heap with scorers generating the earliest document on top. The subtree of subScorers at root is a min heap except possibly for its root element. Bubble the root down as required to make the subtree a heap. Remove the root from subScorers and re-establish it as a heap Called after or land on a new document. subScorers[0] will be positioned to the new docid, which could be NO_MORE_DOCS (subclass must handle this). Implementations should assign doc appropriately, and do any other work necessary to implement and A for OR like queries, counterpart of . This implements and uses Advance() on the given s. The number of subscorers that provide the current match. Construct a . The weight to be used. Array of at least two subscorers. Table of coordination factors Returns the score of the current document matching the query. Initially invalid, until is called the first time. A contains a set of doc ids. Implementing classes must only implement to provide access to the set. Provides a to access the set. This implementation can return null if there are no docs that match. Optionally provides a interface for random access to matching documents. null, if this does not support random access. In contrast to , a return value of null does not imply that no documents match the filter! The default implementation does not provide random access, so you only need to implement this method if your can guarantee random access to every docid in O(1) time without external disk access (as interface cannot throw ). This is generally true for bit sets like , which return itself if they are used as . This method is a hint for , if this should be cached without copying it. The default is to return false. If you have an own implementation that does its iteration very effective and fast without doing disk I/O, override this property and return true. Creates a new instance with the ability to specify the body of the method through the parameter. Simple example: var docIdSet = DocIdSet.NewAnonymous(getIterator: () => { OpenBitSet bitset = new OpenBitSet(5); bitset.Set(0, 5); return new DocIdBitSet(bitset); }); LUCENENET specific A delegate method that represents (is called by) the method. It returns the for this . A new instance. Creates a new instance with the ability to specify the body of the method through the parameter and the body of the property through the parameter. Simple example: var docIdSet = DocIdSet.NewAnonymous(getIterator: () => { OpenBitSet bitset = new OpenBitSet(5); bitset.Set(0, 5); return new DocIdBitSet(bitset); }, bits: () => { return bits; }); LUCENENET specific A delegate method that represents (is called by) the method. It returns the for this . A delegate method that represents (is called by) the property. It returns the instance for this . A new instance. 
Creates a new instance with the ability to specify the body of the method through the parameter and the body of the property through the parameter. Simple example: var docIdSet = DocIdSet.NewAnonymous(getIterator: () => { OpenBitSet bitset = new OpenBitSet(5); bitset.Set(0, 5); return new DocIdBitSet(bitset); }, isCacheable: () => { return true; }); LUCENENET specific A delegate method that represents (is called by) the method. It returns the for this . A delegate method that represents (is called by) the property. It returns a value. A new instance. Creates a new instance with the ability to specify the body of the method through the parameter and the body of the property through the parameter. Simple example: var docIdSet = DocIdSet.NewAnonymous(getIterator: () => { OpenBitSet bitset = new OpenBitSet(5); bitset.Set(0, 5); return new DocIdBitSet(bitset); }, bits: () => { return bits; }, isCacheable: () => { return true; }); LUCENENET specific A delegate method that represents (is called by) the method. It returns the for this . A delegate method that represents (is called by) the property. It returns the instance for this . A delegate method that represents (is called by) the property. It returns a value. A new instance. This abstract class defines methods to iterate over a set of non-decreasing doc ids. Note that this class assumes it iterates on doc Ids, and therefore is set to in order to be used as a sentinel object. Implementations of this class are expected to consider as an invalid value. An empty instance When returned by , and it means there are no more docs in the iterator. Returns the following: -1 or if or were not called yet. if the iterator has exhausted. Otherwise it should return the doc ID it is currently on. @since 2.9 Advances to the next document in the set and returns the doc it is currently on, or if there are no more docs in the set. NOTE: after the iterator has exhausted you should not call this method, as it may result in unpredicted behavior. @since 2.9 Advances to the first beyond the current whose document number is greater than or equal to target, and returns the document number itself. Exhausts the iterator and returns if target is greater than the highest document number in the set. The behavior of this method is undefined when called with target <= current, or after the iterator has exhausted. Both cases may result in unpredicted behavior. When target > current it behaves as if written: int Advance(int target) { int doc; while ((doc = NextDoc()) < target) { } return doc; } Some implementations are considerably more efficient than that. NOTE: this method may be called with for efficiency by some s. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method. @since 2.9 Slow (linear) implementation of relying on to advance beyond the target position. Returns the estimated cost of this . This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate. A range filter built on top of a cached multi-valued term field (in ). Like , this is just a specialized range query versus using a with : it will only do two ordinal to term lookups. This method is implemented for each data type Creates a BytesRef range filter using . This works with all fields containing zero or one term in the field. The range can be half-open by setting one of the values to null. 
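A minimal sketch of consuming a set through the iterator contract described above (NextDoc and NO_MORE_DOCS), assuming a filter and a per-segment reader context are already in scope:

// 'filter' is any Filter and 'context' an AtomicReaderContext for one segment.
DocIdSet set = filter.GetDocIdSet(context, null); // null acceptDocs for brevity
if (set != null) // null means no documents are accepted
{
    DocIdSetIterator iterator = set.GetIterator();
    if (iterator != null) // a null iterator also means no matching documents
    {
        int doc;
        while ((doc = iterator.NextDoc()) != DocIdSetIterator.NO_MORE_DOCS)
        {
            // 'doc' is a segment-relative docID, delivered in increasing order.
        }
    }
}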
Returns the field name for this filter Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Returns the lower value of this range filter Returns the upper value of this range filter Rewrites s into a filter, using DocTermOrds for term enumeration. This can be used to perform these queries against an unindexed docvalues field. @lucene.experimental Wrap a as a . Returns the field name for this query Returns a with documents that should be permitted in search results. Expert: Describes the score computation for document and query. Indicates whether or not this models a good match. By default, an Explanation represents a "match" if the value is positive. Gets or Sets the value assigned to this explanation node. Gets or Sets the description of this explanation node. A short one line summary which should contain all high level information about this , without the "Details" The sub-nodes of this explanation node. Adds a sub-node to this explanation node. Render an explanation as text. Render an explanation as HTML. Used by s that need to pass a to . Expert: Maintains caches of term values. Created: May 19, 2004 11:13:14 AM @lucene.internal @since lucene 1.4 Checks the internal cache for an appropriate entry, and if none is found, reads the terms in and returns a bit set at the size of reader.MaxDoc, with turned on bits for each docid that does have a value for this field. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as a single and returns an array of size reader.MaxDoc of the value each document has in the given field. Used to get field values. Which field contains the single values. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as bytes and returns an array of size reader.MaxDoc of the value each document has in the given field. Used to get field values. Which field contains the s. Computes for string values. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getShorts() in Lucene Used to get field values. Which field contains the s. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as shorts and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getShorts() in Lucene Used to get field values. Which field contains the s. Computes for string values. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Returns an over the values found in documents in the given field. NOTE: this was getInts() in Lucene Returns an over the values found in documents in the given field. If the field was indexed as , it simply uses to read the values. Otherwise, it checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. 
NOTE: this was getInts() in Lucene Used to get field values. Which field contains the s. Computes for string values. May be null if the requested field was indexed as or . If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Returns a over the values found in documents in the given field. NOTE: this was getFloats() in Lucene Returns a over the values found in documents in the given field. If the field was indexed as , it simply uses to read the values. Otherwise, it checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getFloats() in Lucene Used to get field values. Which field contains the s. Computes for string values. May be null if the requested field was indexed as or . If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Returns a over the values found in documents in the given field. NOTE: this was getLongs() in Lucene Returns a over the values found in documents in the given field. If the field was indexed as , it simply uses to read the values. Otherwise, it checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getLongs() in Lucene Used to get field values. Which field contains the s. Computes for string values. May be null if the requested field was indexed as or . If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Returns a over the values found in documents in the given field. Returns a over the values found in documents in the given field. If the field was indexed as , it simply uses to read the values. Otherwise, it checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. Used to get field values. Which field contains the s. Computes for string values. May be null if the requested field was indexed as or . If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the term values in and returns a instance, providing a method to retrieve the term (as a ) per document. Used to get field values. Which field contains the strings. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Expert: just like , but you can specify whether more RAM should be consumed in exchange for faster lookups (default is "true"). Note that the first call for a given reader and field "wins", subsequent calls will share the same cache entry. Checks the internal cache for an appropriate entry, and if none is found, reads the term values in and returns a instance, providing methods to retrieve sort ordinals and terms (as a ) per document. Used to get field values. Which field contains the strings. The values in the given field for each document. If any error occurs. Expert: just like , but you can specify whether more RAM should be consumed in exchange for faster lookups (default is "true"). 
Note that the first call for a given reader and field "wins", subsequent calls will share the same cache entry. Checks the internal cache for an appropriate entry, and if none is found, reads the term values in and returns a instance, providing a method to retrieve the terms (as ords) per document. Used to build a instance Which field contains the strings. a instance If any error occurs. EXPERT: Generates an array of objects representing all items currently in the . NOTE: These objects maintain a strong reference to the Cached Values. Maintaining references to a the associated with it has garbage collected will prevent the Value itself from being garbage collected when the Cache drops the . @lucene.experimental EXPERT: Instructs the FieldCache to forcibly expunge all entries from the underlying caches. This is intended only to be used for test methods as a way to ensure a known base state of the Cache (with out needing to rely on GC to free s). It should not be relied on for "Cache maintenance" in general application code. @lucene.experimental Expert: drops all cache entries associated with this reader . NOTE: this cache key must precisely match the reader that the cache entry is keyed on. If you pass a top-level reader, it usually will have no effect as Lucene now caches at the segment reader level. If non-null, will warn whenever entries are created that are not sane according to . Field values as 8-bit signed bytes Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return a single representation of this field's value. Zero value for every document Field values as 16-bit signed shorts NOTE: This was Shorts in Lucene Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return a representation of this field's value. Zero value for every document Field values as 32-bit signed integers NOTE: This was Ints in Lucene Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return an representation of this field's value. Zero value for every document Field values as 64-bit signed long integers NOTE: This was Longs in Lucene Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return an representation of this field's value. Zero value for every document Field values as 32-bit floats NOTE: This was Floats in Lucene Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return an representation of this field's value. Zero value for every document Field values as 64-bit doubles Initialize an instance of . Initialize an instance of with the specified delegate method. A that implements the method body. Return a representation of this field's value. Zero value for every document Interface used to identify a without referencing its generic closing type. Placeholder indicating creation of this cache is currently in-progress. Marker interface as super-interface to all parsers. It is used to specify a custom parser to . Pulls a from the given . This method allows certain parsers to filter the actual before the field cache is filled. The instance to create the from. A possibly filtered instance, this method must not return null. If an occurs Interface to parse bytes from document fields. Return a single Byte representation of this field's value. 
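A rough sketch of reading cached per-document values through the getters described above, assuming the member names GetInt32s, GetDocsWithField, and Int32s.Get; the "price" field is hypothetical and holds a single numeric value per document:

// 'indexReader' is an open DirectoryReader; the cache is keyed per segment.
foreach (AtomicReaderContext context in indexReader.Leaves)
{
    AtomicReader leaf = context.AtomicReader;

    // The first call populates the cache for this segment; later calls reuse it.
    FieldCache.Int32s prices = FieldCache.DEFAULT.GetInt32s(leaf, "price", true);
    IBits docsWithPrice = FieldCache.DEFAULT.GetDocsWithField(leaf, "price");

    for (int docId = 0; docId < leaf.MaxDoc; docId++)
    {
        if (docsWithPrice.Get(docId))
        {
            int value = prices.Get(docId); // documents without a value return 0
        }
    }
}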
Interface to parse s from document fields. NOTE: This was ShortParser in Lucene Return a representation of this field's value. NOTE: This was parseShort() in Lucene Interface to parse s from document fields. NOTE: This was IntParser in Lucene Return an representation of this field's value. NOTE: This was parseInt() in Lucene Interface to parse s from document fields. NOTE: This was FloatParser in Lucene Return an representation of this field's value. NOTE: This was parseFloat() in Lucene Interface to parse from document fields. NOTE: This was LongParser in Lucene Return a representation of this field's value. NOTE: This was parseLong() in Lucene Interface to parse s from document fields. Return an representation of this field's value. Expert: The cache used internally by sorting and range query classes. The default parser for byte values, which are encoded by using . The default parser for values, which are encoded by using . NOTE: This was DEFAULT_SHORT_PARSER in Lucene NOTE: This was parseShort() in Lucene The default parser for values, which are encoded by using . NOTE: This was DEFAULT_INT_PARSER in Lucene NOTE: This was parseInt() in Lucene The default parser for values, which are encoded by using . NOTE: This was DEFAULT_FLOAT_PARSER in Lucene NOTE: This was parseFloat() in Lucene The default parser for values, which are encoded by using . NOTE: This was DEFAULT_LONG_PARSER in Lucene NOTE: This was parseLong() in Lucene The default parser for values, which are encoded by using . A parser instance for values encoded by , e.g. when indexed via /. NOTE: This was NUMERIC_UTILS_INT_PARSER in Lucene NOTE: This was parseInt() in Lucene A parser instance for values encoded with , e.g. when indexed via /. NOTE: This was NUMERIC_UTILS_FLOAT_PARSER in Lucene NOTE: This was parseFloat() in Lucene A parser instance for values encoded by , e.g. when indexed via /. NOTE: This was NUMERIC_UTILS_LONG_PARSER in Lucene NOTE: This was parseLong() in Lucene A parser instance for values encoded with , e.g. when indexed via /. EXPERT: A unique Identifier/Description for each item in the . Can be useful for logging/debugging. @lucene.experimental Computes (and stores) the estimated size of the cache The most recently estimated size of the value, null unless has been called. Base class for to be used with . The implementation of its iterator is very stupid and slow if the implementation of the method is not optimized, as iterators simply increment the document id until returns true. Because of this must be as fast as possible and in no case do any I/O. @lucene.internal This method checks, if a doc is a hit This DocIdSet is always cacheable (does not go back to the reader for iteration) Expert: The default cache implementation, storing all values in memory. A WeakHashMap is used for storage. @since lucene 1.4 Expert: Internal cache. Remove this reader from the cache, if present. Sets the key to the value for the provided reader; if the key is already set then this doesn't change it. Expert: Every composite-key in the internal cache is of this type. Creates one of these objects for a custom comparer/parser. Two of these are equal if they reference the same field and type. Composes a hashcode based on the field and type. Expert: Every composite-key in the internal cache is of this type. Creates one of these objects for a custom comparer/parser. Two of these are equal if they reference the same field and type. Composes a hashcode based on the field and type. 
Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as a single and returns an array of size reader.MaxDoc of the value each document has in the given field. Used to get field values. Which field contains the single values. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getShorts() in Lucene Used to get field values. Which field contains the s. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. Checks the internal cache for an appropriate entry, and if none is found, reads the terms in as shorts and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getShorts() in Lucene Used to get field values. Which field contains the s. Computes for string values. If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. NOTE: This was ShortsFromArray in Lucene NOTE: This was ShortCache in Lucene Returns an over the values found in documents in the given field. NOTE: this was getInts() in Lucene Returns an over the values found in documents in the given field. If the field was indexed as , it simply uses to read the values. Otherwise, it checks the internal cache for an appropriate entry, and if none is found, reads the terms in as s and returns an array of size reader.MaxDoc of the value each document has in the given field. NOTE: this was getInts() in Lucene Used to get field values. Which field contains the s. Computes for string values. May be null if the requested field was indexed as or . If true then will also be computed and stored in the . The values in the given field for each document. If any error occurs. NOTE: This was IntsFromArray in Lucene NOTE: This was IntCache in Lucene NOTE: this was getFloats() in Lucene NOTE: this was getFloats() in Lucene NOTE: This was FloatsFromArray in Lucene NOTE: This was FloatCache in Lucene NOTE: this was getLongs() in Lucene NOTE: this was getLongs() in Lucene NOTE: This was LongsFromArray in Lucene NOTE: This was LongCache in Lucene A range filter built on top of a cached single term field (in ). builds a single cache for the field the first time it is used. Each subsequent on the same field then reuses this cache, even if the range itself changes. this means that is much faster (sometimes more than 100x as fast) as building a , if using a . However, if the range never changes it is slower (around 2x as slow) than building a on top of a single . For numeric data types, this filter may be significantly faster than . Furthermore, it does not need the numeric values encoded by , , or . But it has the problem that it only works with exact one value/document (see below). As with all based functionality, is only valid for fields which exact one term for each document (except for where 0 terms are also allowed). Due to a restriction of , for numeric ranges all terms that do not have a numeric value, 0 is assumed. Thus it works on dates, prices and other single value fields but will not work on regular text fields. It is preferable to use a field to ensure that there is only a single term. 
This class does not have an constructor, use one of the static factory methods available, that create a correct instance for different data types supported by . Creates a string range filter using . This works with all fields containing zero or one term in the field. The range can be half-open by setting one of the values to null. Creates a range filter using . This works with all fields containing zero or one term in the field. The range can be half-open by setting one of the values to null. Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newShortRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newShortRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newIntRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newIntRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newLongRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newFloatRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. NOTE: this was newFloatRange() in Lucene Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. Creates a numeric range filter using . This works with all fields containing exactly one numeric term in the field. The range can be half-open by setting one of the values to null. This method is implemented for each data type Returns the field name for this filter Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Returns the lower value of this range filter Returns the upper value of this range filter Returns the current numeric parser (null for is ) Rewrites s into a filter, using the for term enumeration. This can be used to perform these queries against an unindexed docvalues field. @lucene.experimental Wrap a as a Filter. 
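Illustrative sketch of the factory methods described above, assuming the member name NewInt32Range; the field and bounds are hypothetical:

IndexSearcher searcher = new IndexSearcher(indexReader);

// Accept documents whose single-valued "year" field is in [2000, 2010);
// either bound may be null to leave the range half-open.
Filter yearFilter = FieldCacheRangeFilter.NewInt32Range("year", 2000, 2010, true, false);

// The field cache is built on first use per segment and reused afterwards,
// even if the requested range changes.
TopDocs hits = searcher.Search(new MatchAllDocsQuery(), yearFilter, 20);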
Returns the field name for this query Returns a DocIdSet with documents that should be permitted in search results. A that only accepts documents whose single term value in the specified field is contained in the provided set of allowed terms. This is the same functionality as TermsFilter (from queries/), except this filter requires that the field contains only a single term for all documents. Because of drastically different implementations, they also have different performance characteristics, as described below. The first invocation of this filter on a given field will be slower, since a must be created. Subsequent invocations using the same field will re-use this cache. However, as with all functionality based on , persistent RAM is consumed to hold the cache, and is not freed until the is disposed. In contrast, TermsFilter has no persistent RAM consumption. With each search, this filter translates the specified set of into a private keyed by term number per unique (normally one reader per segment). Then, during matching, the term number for each docID is retrieved from the cache and then checked for inclusion using the . Since all testing is done using RAM resident data structures, performance should be very fast, most likely fast enough to not require further caching of the for each possible combination of terms. However, because docIDs are simply scanned linearly, an index with a great many small documents may find this linear scan too costly. In contrast, TermsFilter builds up a , keyed by docID, every time it's created, by enumerating through all matching docs using to seek and scan through each term's docID list. While there is no linear scan of all docIDs, besides the allocation of the underlying array in the , this approach requires a number of "disk seeks" in proportion to the number of terms, which can be exceptionally costly when there are cache misses in the OS's IO cache. Generally, this filter will be slower on the first invocation for a given field, but subsequent invocations, even if you change the allowed set of , should be faster than TermsFilter, especially as the number of being matched increases. If you are matching only a very small number of terms, and those terms in turn match a very small number of documents, TermsFilter may perform faster. Which filter is best is very application dependent. Expert: a compares hits so as to determine their sort order when collecting the top results with . The concrete public classes here correspond to the types. This API is designed to achieve high performance sorting, by exposing a tight interaction with as it visits hits. Whenever a hit is competitive, it's enrolled into a virtual slot, which is an ranging from 0 to numHits-1. The is made aware of segment transitions during searching in case any internal state it's tracking needs to be recomputed during these transitions. A comparer must define these functions: Compare a hit at 'slot a' with hit 'slot b'. This method is called by to notify the of the current weakest ("bottom") slot. Note that this slot may not hold the weakest value according to your comparer, in cases where your comparer is not the primary one (ie, is only used to break ties from the comparers before it). Compare a new hit (docID) against the "weakest" (bottom) entry in the queue. This method is called by to notify the of the top most value, which is used by future calls to . Compare a new hit (docID) against the top value previously set by a call to . Installs a new hit into the priority queue. 
The calls this method when a new hit is competitive. Invoked when the search is switching to the next segment. You may need to update internal state of the comparer, for example retrieving new values from the . Return the sort value stored in the specified slot. This is only called at the end of the search, in order to populate when returning the top results. @lucene.experimental Compare hit at with hit at . first slot to compare second slot to compare any N < 0 if 's value is sorted after , any N > 0 if the 's value is sorted before and 0 if they are equal Set the bottom slot, ie the "weakest" (sorted last) entry in the queue. When is called, you should compare against this slot. This will always be called before . the currently weakest (sorted last) slot in the queue Record the top value, for future calls to . This is only called for searches that use SearchAfter (deep paging), and is called before any calls to . The value to use as the top value. does not derive from and is not null. Record the top value, for future calls to . This is only called for searches that use SearchAfter (deep paging), and is called before any calls to . Return the actual value in the slot. LUCENENET NOTE: This was value(int) in Lucene. The value Value in this slot Compare the bottom of the queue with this doc. This will only invoked after has been called. This should return the same result as as if bottom were slot1 and the new document were slot 2. For a search that hits many results, this method will be the hotspot (invoked by far the most frequently). Doc that was hit Any N < 0 if the doc's value is sorted after the bottom entry (not competitive), any N > 0 if the doc's value is sorted before the bottom entry and 0 if they are equal. Compare the top value with this doc. This will only invoked after has been called. This should return the same result as as if topValue were slot1 and the new document were slot 2. This is only called for searches that use SearchAfter (deep paging). Doc that was hit Any N < 0 if the doc's value is sorted after the bottom entry (not competitive), any N > 0 if the doc's value is sorted before the bottom entry and 0 if they are equal. This method is called when a new hit is competitive. You should copy any state associated with this document that will be required for future comparisons, into the specified slot. Which slot to copy the hit to DocID relative to current reader Set a new . All subsequent docIDs are relative to the current reader (you must add docBase if you need to map it to a top-level docID). Current reader context The comparer to use for this segment; most comparers can just return "this" to reuse the same comparer across segments If there is a low-level IO error Returns -1 if first is less than second. Default implementation to assume the type implements and invoke ; be sure to override this method if your 's type isn't a or if you need special null handling. or does not derive from and is not null. Returns -1 if first is less than second. Default implementation to assume the type implements and invoke ; be sure to override this method if your 's type isn't a or if you need special null handling. Compare hit at with hit at . first slot to compare second slot to compare any N < 0 if 's value is sorted after , any N > 0 if the 's value is sorted before and 0 if they are equal Set the bottom slot, ie the "weakest" (sorted last) entry in the queue. When is called, you should compare against this slot. This will always be called before . 
The currently weakest (sorted last) slot in the queue Record the top value, for future calls to . This is only called for searches that use SearchAfter (deep paging), and is called before any calls to . Compare the bottom of the queue with this doc. This will only invoked after setBottom has been called. This should return the same result as as if bottom were slot1 and the new document were slot 2. For a search that hits many results, this method will be the hotspot (invoked by far the most frequently). Doc that was hit Any N < 0 if the doc's value is sorted after the bottom entry (not competitive), any N > 0 if the doc's value is sorted before the bottom entry and 0 if they are equal. Compare the top value with this doc. This will only invoked after has been called. This should return the same result as as if topValue were slot1 and the new document were slot 2. This is only called for searches that use SearchAfter (deep paging). Doc that was hit Any N < 0 if the doc's value is sorted after the bottom entry (not competitive), any N > 0 if the doc's value is sorted before the bottom entry and 0 if they are equal. This method is called when a new hit is competitive. You should copy any state associated with this document that will be required for future comparisons, into the specified slot. Which slot to copy the hit to DocID relative to current reader Set a new . All subsequent docIDs are relative to the current reader (you must add docBase if you need to map it to a top-level docID). Current reader context The comparer to use for this segment; most comparers can just return "this" to reuse the same comparer across segments if there is a low-level IO error Sets the to use in case a document's score is needed. instance that you should use to obtain the current hit's score, if necessary. Return the actual value in the slot. LUCENENET NOTE: This was value(int) in Lucene. The value Value in this slot Base FieldComparer class for numeric types Parses field's values as (using and sorts by ascending value Parses field's values as (using and sorts by ascending value Parses field's values as (using and sorts by ascending value NOTE: This was FloatComparator in Lucene Parses field's values as (using and sorts by ascending value NOTE: This was ShortComparator in Lucene Parses field's values as (using and sorts by ascending value NOTE: This was IntComparator in Lucene Parses field's values as (using and sorts by ascending value NOTE: This was LongComparator in Lucene Sorts by descending relevance. NOTE: if you are sorting only by descending relevance and then secondarily by ascending docID, performance is faster using directly (which all overloads of use when no is specified). Sorts by ascending docID Sorts by field's natural sort order, using ordinals. This is functionally equivalent to , but it first resolves the string to their relative ordinal positions (using the index returned by ), and does most comparisons using the ordinals. For medium to large results, this comparer will be much faster than . For very small result sets it may be slower. Ords for each slot. @lucene.internal Values for each slot. @lucene.internal Which reader last copied a value into the slot. When we compare two slots, we just compare-by-ord if the readerGen is the same; else we must compare the values(slower). @lucene.internal Gen of current reader we are on. @lucene.internal Current reader's doc ord/values. 
@lucene.internal Bottom slot, or -1 if queue isn't full yet @lucene.internal Bottom ord (same as ords[bottomSlot] once bottomSlot is set). Cached for faster compares. @lucene.internal True if current bottom slot matches the current reader. @lucene.internal Bottom value (same as values[bottomSlot] once bottomSlot is set). Cached for faster compares. @lucene.internal Set by setTopValue. -1 if missing values are sorted first, 1 if they are sorted last Which ordinal to use for a missing value. Creates this, sorting missing values first. Creates this, with control over how missing values are sorted. Pass true for to put missing values at the end. Retrieves the for the field in this segment Sorts by field's natural sort order. All comparisons are done using , which is slow for medium to large result sets but possibly very fast for very small results sets. Sole constructor. Provides a for custom field sorting. @lucene.experimental Creates a comparer for the field in the given index. Name of the field to create comparer for. . If an error occurs reading the index. Expert: A which also contains information about how to sort the referenced document. In addition to the document number and score, this object contains an array of values for the document from the field(s) used to sort. For example, if the sort criteria was to sort by fields "a", "b" then "c", the fields object array will have three elements, corresponding respectively to the term values for the document in fields "a", "b" and "c". The class of each element in the array will be either , or depending on the type of values in the terms of each field. Created: Feb 11, 2004 1:23:38 PM @since lucene 1.4 Expert: The values which are used to sort the referenced document. The order of these will match the original sort criteria given by a object. Each Object will have been returned from the method corresponding FieldComparer used to sort this field. Expert: Creates one of these objects with empty sort information. Expert: Creates one of these objects with the given sort information. Expert: Creates one of these objects with the given sort information. A convenience method for debugging. A that accepts all documents that have one or more values in a given field. this request from the and build the bits if not present. Creates a new The field to filter Creates a new The field to filter If true all documents with no value in the given field are accepted. Returns the field this filter is applied on. The field this filter is applied on. Returns true if this filter is negated, otherwise false true if this filter is negated, otherwise false An implementation of which is optimized in case there is just one comparer. Returns whether a is less relevant than b. ScoreDoc ScoreDoc true if document a should be sorted after document b. An implementation of which is optimized in case there is more than one comparer. Creates a hit queue sorted by the given list of fields. NOTE: The instances returned by this method pre-allocate a full array of length numHits. array we are sorting by in priority order (highest priority first); cannot be null or empty The number of hits to retain. Must be greater than zero. If there is a low-level IO error Expert: A hit queue for sorting by hits by terms in more than one field. Uses FieldCache.DEFAULT for maintaining internal term lookup tables. @lucene.experimental @since 2.9 Stores the sort criteria being used. Given a queue , creates a corresponding that contains the values used to sort the given document. 
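For illustration, a minimal sketch of sorting search results by a field, which is what the comparers described above implement internally; the field names, query, and searcher variables are assumed, and the string field uses the ordinal-based comparer:
    Query query = new TermQuery(new Term("body", "lucene"));
    Sort sort = new Sort(
        new SortField("category", SortFieldType.STRING),   // ordinal-based string comparer
        SortField.FIELD_SCORE);                             // break ties by relevance
    TopDocs hits = searcher.Search(query, null, 20, sort);  // no filter, top 20 hits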
These values are not the raw values out of the index, but the internal representation of them. This is so the given search hit can be collated by a MultiSearcher with other search hits. The used to create a The newly created Returns the s being used by this hit queue. Abstract base class for restricting which documents may be returned during searching. Creates a enumerating the documents that should be permitted in search results. NOTE: null can be returned if no documents are accepted by this . Note: this method will be called once per segment in the index during searching. The returned must refer to document IDs for that segment, not for the top-level reader. a instance opened on the index currently searched on. Note, it is likely that the provided reader info does not represent the whole underlying index i.e. if the index has more than one segment the given reader only represents a single segment. The provided context is always an atomic context, so you can call on the context's reader, for example. that represent the allowable docs to match (typically deleted docs but possibly filtering other documents) A that provides the documents which should be permitted or prohibited in search results. NOTE: null should be returned if the filter doesn't accept any documents otherwise internal optimization might not apply in the case an empty is returned. Creates a new instance with the ability to specify the body of the method through the parameter. Simple example: var filter = Filter.NewAnonymous(getDocIdSet: (context, acceptDocs) => { if (acceptDocs is null) acceptDocs = new Bits.MatchAllBits(5); OpenBitSet bitset = new OpenBitSet(5); if (acceptDocs.Get(1)) bitset.Set(1); if (acceptDocs.Get(3)) bitset.Set(3); return new DocIdBitSet(bitset); }); LUCENENET specific A delegate method that represents (is called by) the method. It accepts a context and a acceptDocs and returns the for this filter. Abstract decorator class for a implementation that provides on-demand filtering/validation mechanism on a given . Technically, this same functionality could be achieved with ChainedFilter (under queries/), however the benefit of this class is it never materializes the full bitset for the filter. Instead, the method is invoked on-demand, per docID visited during searching. If you know few docIDs will be visited, and the logic behind is relatively costly, this may be a better way to filter than ChainedFilter. Constructor. Underlying This implementation is cacheable if the inner set is cacheable. Validation method to determine whether a docid should be in the result set. docid to be tested true if input docid should be in the result set, false otherwise. Implementation of the contract to build a . Abstract decorator class of a implementation that provides on-demand filter/validation mechanism on an underlying . See . Constructor. Underlying . Validation method to determine whether a docid should be in the result set. docid to be tested true if input docid should be in the result set, false otherwise. A query that applies a filter to the results of another query. Note: the bits are retrieved from the filter each time this query is used in a search - use a to avoid regenerating the bits every time. @since 1.4 Constructs a new query which applies a filter to the results of the original query. will be called every time this query is used in a search. Query to be filtered, cannot be null. Filter to apply to query results, cannot be null. 
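For illustration, a minimal sketch of applying a filter to a query as described above; the field names and the searcher are assumed, and a QueryWrapperFilter is used only as a convenient way to obtain a Filter:
    Query userQuery = new TermQuery(new Term("body", "lucene"));
    Filter publishedOnly = new QueryWrapperFilter(new TermQuery(new Term("status", "published")));
    Query filtered = new FilteredQuery(userQuery, publishedOnly);
    TopDocs hits = searcher.Search(filtered, 10);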
Expert: Constructs a new query which applies a filter to the results of the original query. will be called every time this query is used in a search. Query to be filtered, cannot be null. Filter to apply to query results, cannot be null. A filter strategy used to create a filtered scorer. Returns a that applies the filter to the enclosed query's . this is accomplished by overriding the returned by the . A scorer that consults the filter if a document was matched by the delegate scorer. This is useful if the filter computation is more expensive than document scoring or if the filter has a linear running time to compute the next matching doc like exact geo distances. A that uses a "leap-frog" approach (also called "zig-zag join"). The scorer and the filter take turns trying to advance to each other's next matching document, often jumping past the target document. When both land on the same document, it's collected. Rewrites the query. If the wrapped is an instance of it returns a . Otherwise it returns a new wrapping the rewritten query. Returns this 's (unfiltered) Returns this 's filter Returns this 's Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten () form. If this query is not yet rewritten Prints a user-readable version of this query. Returns true if is equal to this. Returns a hash code value for this object. A that conditionally uses a random access filter if the given supports random access (returns a non-null value from ) and returns true. Otherwise this strategy falls back to a "zig-zag join" ( ) strategy. Note: this strategy is the default strategy in A filter strategy that uses a "leap-frog" approach (also called "zig-zag join"). The scorer and the filter take turns trying to advance to each other's next matching document, often jumping past the target document. When both land on the same document, it's collected. Note: this strategy uses the filter to lead the iteration. A filter strategy that uses a "leap-frog" approach (also called "zig-zag join"). The scorer and the filter take turns trying to advance to each other's next matching document, often jumping past the target document. When both land on the same document, it's collected. Note: this strategy uses the query to lead the iteration. A filter strategy that advances the or rather its first and consults the filter for each matched document. Note: this strategy requires a to return a non-null value. Otherwise this strategy falls back to Use this strategy if the filter computation is more expensive than document scoring or if the filter has a linear running time to compute the next matching doc like exact geo distances. Abstract class that defines how the filter () applied during document collection. Returns a filtered based on this strategy. the for which to return the . the to create the filtered scorer. the filter to apply a filtered scorer if an occurs Returns a filtered based on this strategy. this is an optional method: the default implementation just calls and wraps that into a . the for which to return the . the to create the filtered scorer. true to score docs in order the filter to apply a filtered top scorer A that conditionally uses a random access filter if the given supports random access (returns a non-null value from ) and returns true. Otherwise this strategy falls back to a "zig-zag join" ( ) strategy . Expert: decides if a filter should be executed as "random-access" or not. 
Random-access means the filter "filters" in a similar way as deleted docs are filtered in Lucene. This is faster when the filter accepts many documents. However, when the filter is very sparse, it can be faster to execute the query+filter as a conjunction in some cases. The default implementation returns true if the first document accepted by the filter is < 100. @lucene.internal A filter strategy that advances the first and consults the for each matched document. Note: this strategy requires a to return a non-null value. Otherwise this strategy falls back to Use this strategy if the filter computation is more expensive than document scoring or if the filter has a linear running time to compute the next matching doc like exact geo distances. Implements the fuzzy search query. The similarity measurement is based on the Damerau-Levenshtein (optimal string alignment) algorithm, though you can explicitly choose classic Levenshtein by passing false to the transpositions parameter. this query uses as default. So terms will be collected and scored according to their edit distance. Only the top terms are used for building the . It is not recommended to change the rewrite mode for fuzzy queries. At most, this query will match terms up to edits. Higher distances (especially with transpositions enabled), are generally not useful and will match a significant amount of the term dictionary. If you really want this, consider using an n-gram indexing technique (such as the SpellChecker in the suggest module) instead. NOTE: terms of length 1 or 2 will sometimes not match because of how the scaled distance between two terms is computed. For a term to match, the edit distance between the terms must be less than the minimum length term (either the input term, or the candidate term). For example, on term "abcd" with maxEdits=2 will not match an indexed term "ab", and on term "a" with maxEdits=2 will not match an indexed term "abc". Create a new that will match terms with an edit distance of at most to . If a > 0 is specified, a common prefix of that length is also required. The term to search for Must be >= 0 and <= . Length of common (non-fuzzy) prefix The maximum number of terms to match. If this number is greater than when the query is rewritten, then the maxClauseCount will be used instead. true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm. Calls FuzzyQuery(term, maxEdits, prefixLength, defaultMaxExpansions, defaultTranspositions). Calls FuzzyQuery(term, maxEdits, defaultPrefixLength). Calls FuzzyQuery(term, defaultMaxEdits). The maximum number of edit distances allowed for this query to match. Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term. Returns true if transpositions should be treated as a primitive edit operation. If this is false, comparisons will implement the classic Levenshtein algorithm. Returns the pattern term. @deprecated pass integer edit distances instead. Helper function to convert from deprecated "minimumSimilarity" fractions to raw edit distances. NOTE: this was floatToEdits() in Lucene Scaled similarity Length (in unicode codepoints) of the term. Equivalent number of maxEdits Subclass of for enumerating all terms that are similar to the specified filter term. Term enumerations are always ordered by . 
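For illustration, a minimal sketch of the fuzzy matching described above; the field name and the searcher are assumed. It allows up to two edits but requires the first two characters to match exactly:
    var fuzzy = new FuzzyQuery(new Term("title", "lucene"), 2, 2);  // maxEdits = 2, prefixLength = 2
    TopDocs hits = searcher.Search(fuzzy, 10);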
Each term in the enumeration is greater than all that precede it. Constructor for enumeration of all terms from specified reader which share a prefix of length with and which have a fuzzy similarity > . After calling the constructor the enumeration is already pointing to the first valid term if such a term exists. Delivers terms. created by the rewrite method of that contains information about competitive boosts during rewrite. It is also used to cache DFAs between segment transitions. Pattern term. Minimum required similarity for terms from the reader. Pass an integer value representing edit distance. Passing a fraction is deprecated. Length of required common prefix. Default value is 0. Transpositions if there is a low-level IO error Return an automata-based enum for matching up to from , if possible Initialize Levenshtein DFAs up to maxDistance, if possible Swap in a new actual enum to proxy to Fired when the max non-competitive boost has changed. This is the hook to swap in a smarter actualEnum Implement fuzzy enumeration with . This is the fastest method as opposed to LinearFuzzyTermsEnum: as enumeration is logarithmic to the number of terms (instead of linear) and comparison is linear to length of the term (rather than quadratic) Finds the smallest Lev(n) DFA that accepts the term. Returns true if is within edits of the query term @lucene.internal @lucene.internal Reuses compiled automata across different segments, because they are independent of the index @lucene.internal Stores compiled automata as a list (indexed by edit distance) @lucene.internal Creates a new instance with elements. If is set to true, the queue will pre-populate itself with sentinel objects and set its to . In that case, you should not rely on to get the number of actual elements that were added to the queue, but keep track yourself. NOTE: in case is true, you should pop elements from the queue using the following code example: PriorityQueue<ScoreDoc> pq = new HitQueue(10, true); // pre-populate. ScoreDoc top = pq.Top; // Add/Update one element. top.Score = 1.0f; top.Doc = 0; top = (ScoreDoc) pq.UpdateTop(); int totalHits = 1; // Now pop only the elements that were *truly* inserted. // First, pop all the sentinel elements (there are pq.Count - totalHits). for (int i = pq.Count - totalHits; i > 0; i--) pq.Pop(); // Now pop the truly added elements. ScoreDoc[] results = new ScoreDoc[totalHits]; for (int i = totalHits - 1; i >= 0; i--) { results[i] = (ScoreDoc)pq.Pop(); } NOTE: this overload will pre-allocate a full array of length . The requested size of this queue. Specifies whether to pre-populate the queue with sentinel values. Implements search over a single . Applications usually need only call the inherited or methods. For performance reasons, if your index is unchanging, you should share a single instance across multiple searches instead of creating a new one per-search. If your index has changed and you wish to see the changes reflected in searching, you should use to obtain a new reader and then create a new from that. Also, for low-latency turnaround it's best to use a near-real-time reader (). Once you have a new , it's relatively cheap to create a new from it.
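For illustration, a minimal sketch of this pattern, assuming an existing Directory named indexDirectory: share one searcher while the index is unchanged, and create a new one only after reopening the reader:
    DirectoryReader reader = DirectoryReader.Open(indexDirectory);
    IndexSearcher searcher = new IndexSearcher(reader);   // share this instance across searches

    // Later, once the index may have changed:
    DirectoryReader newReader = DirectoryReader.OpenIfChanged(reader);
    if (newReader != null)
    {
        reader.Dispose();
        reader = newReader;
        searcher = new IndexSearcher(reader);   // relatively cheap to create from the new reader
    }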

    NOTE: instances are completely thread safe, meaning multiple threads can call any of its methods, concurrently. If your application requires external synchronization, you should not synchronize on the instance; use your own (non-Lucene) objects instead.

    Used with executor - each slice holds a set of leaves executed within one thread Expert: returns a default instance. In general, this method is only called to initialize searchers and writers. User code and query implementations should respect . @lucene.internal The implementation used by this searcher. Creates a searcher searching the provided index. is null. Runs searches for each segment separately, using the provided . will not shutdown/awaitTermination this on dispose; you must do so, eventually, on your own. @lucene.experimental is null. Creates a searcher searching the provided top-level . Given a non-null this method runs searches for each segment separately, using the provided . will not shutdown/awaitTermination this on close; you must do so, eventually, on your own. @lucene.experimental is null. LUCENENET specific constructor that can be used by the subclasses to control whether the leaf slices are allocated in the base class or subclass. If is non-null and you choose to skip allocating the leaf slices (i.e. == false), you must set the field in your subclass constructor. This is commonly done by calling and using the result to set . You may wish to do this if you have state to pass into your constructor and need to set it prior to the call to so it is available for use as a member field or property inside a custom override of . is null. Creates a searcher searching the provided top-level . @lucene.experimental is null. Expert: Creates an array of leaf slices each holding a subset of the given leaves. Each is executed in a single thread. By default there will be one per leaf (). is null. Return the this searches. Sugar for .IndexReader.Document(docID) Sugar for .IndexReader.Document(docID, fieldVisitor) is null. Sugar for .IndexReader.Document(docID, fieldsToLoad) @deprecated Use instead. Expert: Set the implementation used by this IndexSearcher. @lucene.internal is null. Finds the top hits for top where all results are after a previous result (top ). By passing the bottom result from a previous page as , this method can be used for efficient 'deep-paging' across potentially large result sets. If a query would exceed clauses. is null. Finds the top hits for , applying if non-null, where all results are after a previous result (). By passing the bottom result from a previous page as , this method can be used for efficient 'deep-paging' across potentially large result sets. If a query would exceed clauses. is null. Finds the top hits for . If a query would exceed clauses. is null. Finds the top hits for , applying if non-null. If a query would exceed clauses. Lower-level search API. is called for every matching document. To match documents If non-null, used to permit documents to be collected. To receive hits If a query would exceed clauses. or is null. Lower-level search API. is called for every matching document. If a query would exceed clauses. or is null. Search implementation with arbitrary sorting. Finds the top hits for , applying if non-null, and sorting the hits by the criteria in . NOTE: this does not compute scores by default; use to control scoring. If a query would exceed clauses. or is null. Search implementation with arbitrary sorting, plus control over whether hit scores and max score should be computed. Finds the top hits for , applying if non-null, and sorting the hits by the criteria in . If is true then the score of each hit will be computed and returned. If is true then the maximum score over all collected hits will be computed. 
If a query would exceed clauses. or is null. Finds the top hits for , applying if non-null, where all results are after a previous result (). By passing the bottom result from a previous page as , this method can be used for efficient 'deep-paging' across potentially large result sets. If a query would exceed clauses. or is null. Search implementation with arbitrary sorting and no filter. The query to search for Return only the top n results The object The top docs, sorted according to the supplied instance if there is a low-level I/O error or is null. Finds the top hits for where all results are after a previous result (). By passing the bottom result from a previous page as , this method can be used for efficient 'deep-paging' across potentially large result sets. If a query would exceed clauses. or is null. Finds the top hits for where all results are after a previous result (), allowing control over whether hit scores and max score should be computed. By passing the bottom result from a previous page as , this method can be used for efficient 'deep-paging' across potentially large result sets. If is true then the score of each hit will be computed and returned. If is true then the maximum score over all collected hits will be computed. If a query would exceed clauses. or is null. Expert: Low-level search implementation. Finds the top hits for query, applying filter if non-null. Applications should usually call or instead. If a query would exceed clauses. is null. Expert: Low-level search implementation. Finds the top n hits for query. Applications should usually call or instead. If a query would exceed clauses. or is null. Expert: Low-level search implementation with arbitrary sorting and control over whether hit scores and max score should be computed. Finds the top hits for query and sorting the hits by the criteria in . Applications should usually call instead. If a query would exceed clauses. or is null. Just like , but you choose whether or not the fields in the returned instances should be set by specifying . or is null. Just like , but you choose whether or not the fields in the returned instances should be set by specifying . or is null. Lower-level search API. is called for every document. NOTE: this method executes the searches on all given leaves exclusively. To search across all the searcher's leaves use . The searcher's leaves to execute the searches on To match documents To receive hits If a query would exceed clauses. , , or is null. Expert: called to re-write queries into primitive queries. If a query would exceed clauses. is null. Returns an that describes how scored against . This is intended to be used in developing implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index. is null. Expert: low-level implementation method Returns an that describes how scored against . This is intended to be used in developing implementations, and, for good performance, should not be displayed with every hit. Computing an explanation is as expensive as executing the query over the entire index. Applications should call . If a query would exceed clauses. is null. Creates a normalized weight for a top-level . The query is rewritten by this method and called, afterwards the is normalized. The returned can then directly be used to get a . @lucene.internal is null. Returns this searcher's top-level . 
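For illustration, a minimal deep-paging sketch, assuming an existing searcher and query and that the first page returned at least one hit; the last hit of one page is passed as the anchor for the next page:
    TopDocs firstPage = searcher.Search(query, 10);
    ScoreDoc lastHit = firstPage.ScoreDocs[firstPage.ScoreDocs.Length - 1];
    TopDocs secondPage = searcher.SearchAfter(lastHit, query, 10);   // hits ranked after lastHit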
A thread subclass for searching a single searchable A thread subclass for searching a single searchable A helper class that wraps a and provides an iterable interface to the completed delegates. the type of the return value A class holding a subset of the s leaf contexts to be executed within a single thread. @lucene.experimental Initializes a new instance of with the specified . The collection of leaves. is null. Returns for a term. This can be overridden for example, to return a term's statistics across a distributed collection. @lucene.experimental or is null. Returns for a field. This can be overridden for example, to return a field's statistics across a distributed collection. @lucene.experimental Tracks live field values across NRT reader reopens. This holds a map for all updated ids since the last reader reopen. Once the NRT reader is reopened, it prunes the map. This means you must reopen your NRT reader periodically otherwise the RAM consumption of this class will grow unbounded! NOTE: you must ensure the same id is never updated at the same time by two threads, because in this case you cannot in general know which thread "won". Releases all resources used by the . Releases resources used by the and if overridden in a derived class, optionally releases unmanaged resources. true to release both managed and unmanaged resources; false to release only unmanaged resources. Call this after you've successfully added a document to the index, to record what value you just set the field to. Call this after you've successfully deleted a document from the index. Returns the [approximate] number of id/value pairs buffered in RAM. NOTE: This was size() in Lucene. Returns the current value for this id, or null if the id isn't in the index or was deleted. This is called when the id/value was already flushed & opened in an NRT IndexSearcher. You must implement this to go look up the value (eg, via doc values, field cache, stored fields, etc.). A query that matches all documents. Add this to a fresh before calling . is using this to control its internal behaviour to only return competitive terms. Please note: this attribute is intended to be added by the to an empty that is shared for all segments during query rewrite. This attribute source is passed to all segment enums on . uses this attribute to inform all enums about the current boost, that is not competitive. @lucene.internal This is the maximum boost that would not be competitive. This is the term or null of the term that triggered the boost change. Implementation class for . @lucene.internal A for OR like queries, counterpart of . This implements and uses Advance() on the given s. This implementation uses the minimumMatch constraint actively to efficiently prune the number of candidates, it is hence a mixture between a pure and a . The overall number of non-finalized scorers The minimum number of scorers that should match A static array of all subscorers sorted by decreasing cost A monotonically increasing index into the array pointing to the next subscorer that is to be excluded mmStack is supposed to contain the most costly subScorers that still did not run out of docs, sorted by increasing sparsity of docs returned by that subScorer. For now, the cost of subscorers is assumed to be inversely correlated with sparsity. The document number of the current match. The number of subscorers that provide the current match. Construct a . The weight to be used. A collection of at least two subscorers. 
The positive minimum number of subscorers that should match to match this query. When is bigger than the number of , no matches will be produced. When equals the number of , it is more efficient to use . Construct a , using one as the minimum number of matching . Returns the score of the current document matching the query. Initially invalid, until is called the first time. Advances to the first match beyond the current whose document number is greater than or equal to a given target. The implementation uses the Advance() method on the subscorers. The target document number. The document whose number is greater than or equal to the given target, or -1 if none exist. Organize into a min heap with scorers generating the earliest document on top. The subtree of at root is a min heap except possibly for its root element. Bubble the root down as required to make the subtree a heap. Remove the root from and re-establish it as a heap Removes a given from the heap by placing end of heap at that position and bubbling it either up or down A which allows running a search with several s. It offers a static method which accepts a list of collectors and wraps them with , while filtering out the null ones. Wraps a list of s with a . This method works as follows: Filters out the null collectors, so they are not used during search time. If the input contains 1 real collector (i.e. non-null ), it is returned. Otherwise the method returns a which wraps the non-null ones. if either 0 collectors were input, or all collectors are null. is a generalized version of , with an added method . To use this class, to search for the phrase "Microsoft app*" first use on the term "Microsoft", then find all terms that have "app" as prefix using MultiFields.GetFields(IndexReader).GetTerms(string), and use to add them to the query. Collection initializer note: To create and populate a in a single statement, you can use the following example as a guide: var multiPhraseQuery = new MultiPhraseQuery() { new Term("field", "microsoft"), new Term("field", "office") }; Note that as long as you specify all of the parameters, you can use either , , or as the method to use to initialize. If there are multiple parameters, each parameter set must be surrounded by curly braces. Sets the phrase slop for this query. Add a single term at the next position in the phrase. Add multiple terms at the next position in the phrase. Any of the terms may match. Allows to specify the relative position of terms within the phrase. Returns a List of the terms in the multiphrase. Do not modify the List or its contents. Returns the relative positions of terms in this phrase. Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten () form. If this query is not yet rewritten Prints a user-readable version of this query. Returns true if is equal to this. Returns a hash code value for this object. Returns an enumerator that iterates through the collection. An enumerator that can be used to iterate through the collection. Returns an enumerator that iterates through the . An enumerator that can be used to iterate through the collection. Takes the logical union of multiple iterators. NOTE: This was IntQueue in Lucene An abstract that matches documents containing a subset of terms provided by a enumeration. This query cannot be used directly; you must subclass it and define to provide a that iterates through the terms to be matched. 
NOTE: if is either or , you may encounter a exception during searching, which happens when the number of terms to be searched exceeds . Setting to prevents this. The recommended rewrite method is : it doesn't spend CPU computing unhelpful scores, and it tries to pick the most performant rewrite method given the query. If you need scoring (like , use which uses a priority queue to only collect competitive terms and not hit this limitation. Note that QueryParsers.Classic.QueryParser produces s using by default. Abstract class that defines how the query is rewritten. Returns the s A rewrite method that first creates a private , by visiting each term in sequence and marking all docs for that term. Matching documents are assigned a constant score equal to the query's boost. This method is faster than the rewrite methods when the number of matched terms or matched documents is non-trivial. Also, it will never hit an errant exception. A rewrite method that first translates each term into clause in a , and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it's almost always better to use instead. NOTE: this rewrite method will hit if the number of terms exceeds . Like except scores are not computed. Instead, each matching document receives a constant score equal to the query's boost. NOTE: this rewrite method will hit if the number of terms exceeds . A rewrite method that first translates each term into clause in a , and keeps the scores as computed by the query. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. It is the default rewrite method for . Create a for at most terms. NOTE: if is smaller than , then it will be used instead. A rewrite method that first translates each term into clause in a , but the scores are only computed as the boost. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. Create a for at most terms. NOTE: if is smaller than , then it will be used instead. Read-only default instance of , with set to and set to . Note that you cannot alter the configuration of this instance; you'll need to create a private instance instead. Constructs a query matching terms that cannot be represented with a single . Returns the field name for this query Construct the enumeration to be used, expanding the pattern term. this method should only be called if the field exists (ie, implementations can assume the field does exist). this method should not return null (should instead return if no terms match). The must already be positioned to the first matching term. The given is passed by the to provide attributes, the rewrite method uses to inform about e.g. maximum competitive boosts. this is currently only used by . Convenience method, if no attributes are needed: this simply passes empty attributes and is equal to: GetTermsEnum(terms, new AttributeSource()) To rewrite to a simpler form, instead return a simpler enum from . For example, to rewrite to a single term, return a . Gets or Sets the rewrite method to be used when executing the query. You can use one of the four core methods, or implement your own subclass of . A wrapper for , that exposes its functionality as a . is not designed to be used by itself. Normally you subclass it to provide a counterpart for a subclass. For example, and extend . This class also provides the functionality behind ; this is why it is not abstract. 
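As a sketch of switching the rewrite method to avoid the maximum clause count discussed above; the property and constant names below follow the standard Lucene API but are assumed here, so verify them against your version:
    var prefix = new PrefixQuery(new Term("title", "luc"));
    // The constant-score filter rewrite never expands into individual BooleanQuery
    // clauses, so it cannot exceed the maximum clause count.
    prefix.MultiTermRewriteMethod = MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE;
    TopDocs hits = searcher.Search(prefix, 10);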
Wrap a as a . Returns the field name for this query Returns a with documents that should be permitted in search results. This is a which is optimized for n-gram phrase query. For example, when you query "ABCD" on a 2-gram field, you may want to use rather than , because will the query to "AB/0 CD/2", while will query "AB/0 BC/1 CD/2" (where term/position). Collection initializer note: To create and populate a in a single statement, you can use the following example as a guide: var phraseQuery = new NGramPhraseQuery(2) { new Term("field", "ABCD"), new Term("field", "EFGH") }; Note that as long as you specify all of the parameters, you can use either or as the method to use to initialize. If there are multiple parameters, each parameter set must be surrounded by curly braces. Constructor that takes gram size. n-gram size Returns true if is equal to this. Returns a hash code value for this object. A that only accepts numeric values within a specified range. To use this, you must first index the numeric values using , , or (expert: ). You create a new with the static factory methods, eg: Filter f = NumericRangeFilter.NewFloatRange("weight", 0.03f, 0.10f, true, true); Accepts all documents whose float valued "weight" field ranges from 0.03 to 0.10, inclusive. See for details on how Lucene indexes and searches numeric valued fields. @since 2.9 Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Returns the lower value of this range filter Returns the upper value of this range filter Returns the precision step. LUCENENET specific static class to provide access to static methods without referring to the 's generic closing type. Factory that creates a , that filters a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newLongRange() in Lucene Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newLongRange() in Lucene Factory that creates a , that filters a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newIntRange() in Lucene Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newIntRange() in Lucene Factory that creates a , that filters a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Double.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. 
Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Double.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. Factory that creates a , that filters a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Single.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newFloatRange() in Lucene Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Single.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newFloatRange() in Lucene A that matches numeric values within a specified range. To use this, you must first index the numeric values using , , or (expert: ). If your terms are instead textual, you should use . is the filter equivalent of this query. You create a new with the static factory methods, eg: Query q = NumericRangeQuery.NewFloatRange("weight", 0.03f, 0.10f, true, true); matches all documents whose valued "weight" field ranges from 0.03 to 0.10, inclusive. The performance of is much better than the corresponding because the number of terms that must be searched is usually far fewer, thanks to trie indexing, described below. You can optionally specify a when creating this query. This is necessary if you've changed this configuration from its default (4) during indexing. Lower values consume more disk space but speed up searching. Suitable values are between 1 and 8. A good starting point to test is 4, which is the default value for all Numeric* classes. See below for details. This query defaults to . With precision steps of <=4, this query can be run with one of the rewrite methods without changing 's default max clause count.
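For illustration, a minimal sketch assuming an Int32 "price" field indexed with the default precision step and an existing searcher:
    Query priceRange = NumericRangeQuery.NewInt32Range("price", 10, 100, true, true);   // 10 <= price <= 100
    TopDocs hits = searcher.Search(priceRange, 20);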

    How it works

    See the publication about panFMP, where this algorithm was described (referred to as TrieRangeQuery):
    Schindler, U., Diepenbroek, M., 2008. Generic XML-based Framework for Metadata Portals. Computers & Geosciences 34 (12), 1947-1955. doi:10.1016/j.cageo.2008.02.023
    A quote from this paper: Because Apache Lucene is a full-text search engine and not a conventional database, it cannot handle numerical ranges (e.g., field value is inside user defined bounds, even dates are numerical values). We have developed an extension to Apache Lucene that stores the numerical values in a special string-encoded format with variable precision (all numerical values like s, s, s, and s are converted to lexicographic sortable string representations and stored with different precisions (for a more detailed description of how the values are stored, see ). A range is then divided recursively into multiple intervals for searching: The center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically. For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the lowest precision. Overall, a range could consist of a theoretical maximum of 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used because it would always be possible to reduce the full 256 values to one term with degraded precision). In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records and a uniform value distribution).

    Precision Step

    You can choose any when encoding values. Lower step values mean more precisions and so more terms in index (and index gets larger). The number of indexed terms per value is (those are generated by ):   indexedTermsPerValue = ceil(bitsPerValue / precisionStep) As the lower precision terms are shared by many values, the additional terms only slightly grow the term dictionary (approx. 7% for precisionStep=4), but have a larger impact on the postings (the postings file will have more entries, as every document is linked to indexedTermsPerValue terms instead of one). The formula to estimate the growth of the term dictionary in comparison to one term per value:   \mathrm{termDictOverhead} = \sum\limits_{i=0}^{\mathrm{indexedTermsPerValue}-1} \frac{1}{2^{\mathrm{precisionStep}\cdot i}} On the other hand, if the is smaller, the maximum number of terms to match reduces, which optimizes query speed. The formula to calculate the maximum number of terms that will be visited while executing the query is:   \mathrm{maxQueryTerms} = \left[ \left( \mathrm{indexedTermsPerValue} - 1 \right) \cdot \left(2^\mathrm{precisionStep} - 1 \right) \cdot 2 \right] + \left( 2^\mathrm{precisionStep} - 1 \right) For longs stored using a precision step of 4, maxQueryTerms = 15*15*2 + 15 = 465, and for a precision step of 2, maxQueryTerms = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking in the term enum of the index. Because of this, the ideal value can only be found out by testing. Important: You can index with a lower precision step value and test search speed using a multiple of the original step value. Good values for are depending on usage and data type: The default for all data types is 4, which is used, when no precisionStep is given. Ideal value in most cases for 64 bit data types (long, double) is 6 or 8. Ideal value in most cases for 32 bit data types (int, float) is 4. For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use (see below). Steps >=64 for long/double and >=32 for int/float produces one token per value in the index and querying is as slow as a conventional . But it can be used to produce fields, that are solely used for sorting (in this case simply use as ). Using , , or for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above s. Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed that in boolean rewrite mode (with raised clause count) took about 30-40 secs to complete, in constant score filter rewrite mode took 5 secs and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit precision step). This query type was developed for a geographic portal, where the performance for e.g. bounding boxes or exact date/time stamps is important. @since 2.9
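For illustration, a hedged sketch of indexing a 64-bit value with a non-default precision step and querying it with the same step; the member names (NumericPrecisionStep, Int64Field.TYPE_NOT_STORED, and the NewInt64Range overload that takes a precision step) follow the standard API but are assumed here, and the document, field name, and bounds are placeholders:
    var fieldType = new FieldType(Int64Field.TYPE_NOT_STORED) { NumericPrecisionStep = 6 };
    document.Add(new Int64Field("timestamp", dateTime.Ticks, fieldType));   // at indexing time

    // At search time the same precision step must be passed to the query:
    Query range = NumericRangeQuery.NewInt64Range("timestamp", 6, minTicks, maxTicks, true, true);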
    Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Returns the lower value of this range query Returns the upper value of this range query Returns the precision step. NOTE: This was LONG_NEGATIVE_INFINITY in Lucene NOTE: This was LONG_NEGATIVE_INFINITY in Lucene NOTE: This was INT_NEGATIVE_INFINITY in Lucene NOTE: This was INT_POSITIVE_INFINITY in Lucene Subclass of for enumerating all terms that match the sub-ranges for trie range queries, using flex API. WARNING: this term enumeration is not guaranteed to be always ordered by . The ordering depends on how and generates the sub-ranges. For ordering is not relevant. LUCENENET specific class to provide access to static factory metods of without referring to its genereic closing type. Factory that creates a , that queries a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newLongRange() in Lucene Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newLongRange() in Lucene Factory that creates a , that queries a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newIntRange() in Lucene Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newIntRange() in Lucene Factory that creates a , that queries a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Double.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. Factory that creates a , that queries a range using the default (4). You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Double.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. Factory that creates a , that queries a range using the given . You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Single.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newFloatRange() in Lucene Factory that creates a , that queries a range using the default (4). 
You can have half-open ranges (which are in fact </<= or >/>= queries) by setting the min or max value to null. will never match a half-open range, to hit NaN use a query with min == max == System.Single.NaN. By setting inclusive to false, it will match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. NOTE: This was newFloatRange() in Lucene Calculate the final score as the average score of all payloads seen. Is thread safe and completely reusable. Returns the maximum payload score seen, else 1 if there are no payloads on the doc. Is thread safe and completely reusable. Calculates the minimum payload seen An abstract class that defines a way for Payload*Query instances to transform the cumulative effects of payload scores for a document. @lucene.experimental this class and its derivations are experimental and subject to change Calculate the score up to this point for this doc and field The current doc The field The start position of the matching Span The end position of the matching Span The number of payloads seen so far The current score so far The score for the current payload The new current Score Calculate the final score for all the payloads seen so far for this doc/field The current doc The current field The total number of payloads seen on this document The raw score for those payloads The final score for the payloads This class is very similar to except that it factors in the value of the payloads located at each of the positions where the occurs. NOTE: In order to take advantage of this with the default scoring implementation (), you must override , which returns 1 by default. Payload scores are aggregated using a pluggable . By default, uses the to score the payloads, but can be overridden to do other things. The payloads The start position of the span being scored The end position of the span being scored Experimental class to get set of payloads for most standard Lucene queries. Operates like Highlighter - should only contain doc of interest, best to use MemoryIndex. @lucene.experimental that contains doc with payloads to extract Query should be rewritten for wild/fuzzy support. rewritten query payloads Collection if there is a low-level I/O error This class is very similar to except that it factors in the value of the payload located at each of the positions where the occurs. NOTE: In order to take advantage of this with the default scoring implementation (), you must override , which returns 1 by default. Payload scores are aggregated using a pluggable . * if there is a low-level I/O error Returns the score only. Should not be overridden without good cause! the score for just the Span part w/o the payload if there is a low-level I/O error The score for the payload The score, as calculated by Position of a term in a document that takes into account the term offset within the phrase. Go to next location of this term current document, and set position as location - offset, so that a matching exact phrase is easily identified when all have exactly the same position. For debug purposes A that matches documents containing a particular sequence of terms. A is built by QueryParser for input like "new york". This query may be combined with other terms or queries with a . 
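For illustration, a small sketch showing an exact phrase and a sloppy variant of it; the field name is assumed:
    var exactPhrase = new PhraseQuery();
    exactPhrase.Add(new Term("body", "new"));
    exactPhrase.Add(new Term("body", "york"));

    var sloppyPhrase = new PhraseQuery { Slop = 2 };   // also matches the reversed order and small gaps
    sloppyPhrase.Add(new Term("body", "new"));
    sloppyPhrase.Add(new Term("body", "york"));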
Collection initializer note: To create and populate a in a single statement, you can use the following example as a guide: var phraseQuery = new PhraseQuery() { new Term("field", "microsoft"), new Term("field", "office") }; Note that as long as you specify all of the parameters, you can use either or as the method to use to initialize. If there are multiple parameters, each parameter set must be surrounded by curly braces. Constructs an empty phrase query. Sets the number of other words permitted between words in query phrase. If zero, then this is an exact phrase search. For larger values this works like a WITHIN or NEAR operator. The slop is in fact an edit-distance, where the units correspond to moves of terms in the query phrase out of position. For example, to switch the order of two words requires two moves (the first move places the words atop one another), so to permit re-orderings of phrases, the slop must be at least two. More exact matches are scored higher than sloppier matches, thus search results are sorted by exactness. The slop is zero by default, requiring exact matches. Adds a term to the end of the query phrase. The relative position of the term is the one immediately after the last term added. Adds a term to the end of the query phrase. The relative position of the term within the phrase is specified explicitly. this allows e.g. phrases with more than one term at the same position or phrases with gaps (e.g. in connection with stopwords). Returns the set of terms in this phrase. Returns the relative positions of terms in this phrase. Prints a user-readable version of this query. Returns true if is equal to this. Returns a hash code value for this object. Returns an enumerator that iterates through the collection. An enumerator that can be used to iterate through the collection. Returns an enumerator that iterates through the collection. An enumerator that can be used to iterate through the collection. A implementation which wraps another and makes sure only documents with scores > 0 are collected. A that restricts search results to values that have a matching prefix in a given field. Prints a user-readable version of this query. A that matches documents containing terms with a specified prefix. A is built by QueryParser for input like app*. This query uses the rewrite method. Constructs a query for terms starting with . Returns the prefix of this query. Prints a user-readable version of this query. Subclass of for enumerating all terms that match the specified prefix filter term. Term enumerations are always ordered by . Each term in the enumeration is greater than all that precede it. The abstract base class for queries. Instantiable subclasses are: See also the family of Span Queries () and additional queries available in the Queries module Gets or Sets the boost for this query clause. Documents matching this clause will (in addition to the normal weightings) have their score multiplied by . The boost is 1.0 by default. Prints a query to a string, with assumed to be the default field and omitted. Prints a query to a string. Expert: Constructs an appropriate implementation for this query. Only implemented by primitive queries, which re-write to themselves. Expert: called to re-write queries into primitive queries. For example, a will be rewritten into a that consists of s. Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its rewritten () form. If this query is not yet rewritten Returns a clone of this query. 
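To illustrate the Boost property described above, a sketch that weights title matches twice as heavily as body matches; the field names and the searcher are assumed:
    var titleClause = new TermQuery(new Term("title", "lucene")) { Boost = 2.0f };
    var bodyClause = new TermQuery(new Term("body", "lucene"));

    var query = new BooleanQuery();
    query.Add(titleClause, Occur.SHOULD);
    query.Add(bodyClause, Occur.SHOULD);
    TopDocs hits = searcher.Search(query, 10);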
A that uses a provided to assign scores to the first-pass hits. @lucene.experimental Sole constructor, passing the 2nd pass query to assign scores to the 1st pass hits. Implement this in a subclass to combine the first pass and second pass scores. If is false then the second pass query failed to match a hit from the first pass query, and you should ignore the . Sugar API, calling using a simple linear combination of firstPassScore + * secondPassScore Constrains search results to only match those which also match a provided query. This could be used, for example, with a on a suitably formatted date field to implement date filtering. One could re-use a single CachingWrapperFilter(QueryWrapperFilter) that matches, e.g., only documents modified within the last week. This would only need to be reconstructed once per day. Constructs a filter which only matches documents matching . Returns the inner Query Utility class to safely share instances of a certain type across multiple threads, while periodically refreshing them. This class ensures each reference is closed only once all threads have finished using it. It is recommended to consult the documentation of implementations for their semantics. @lucene.experimental The concrete type that will be d and d. The current reference Decrement reference counting on the given reference. If reference decrement on the given resource failed. Refresh the given reference if needed. Returns null if no refresh was needed, otherwise a new refreshed reference. If the reference manager has been d. If the refresh operation failed Try to increment reference counting on the given reference. Returns true if the operation was successful. if the reference manager has been d. Obtain the current reference. You must match every call to acquire with one call to ; it's best to do so in a finally clause, and set the reference to null to prevent accidental usage after it has been released. If the reference manager has been d. Closes this ReferenceManager to prevent future ing. A reference manager should be disposed if the reference to the managed resource should be disposed or the application using the is shutting down. The managed resource might not be released immediately, if the user is holding on to a previously d reference. The resource will be released once when the last reference is d. Those references can still be used as if the manager was still active. Applications should not new references from this manager once this method has been called. ing a resource on a disposed will throw an . If the underlying reader of the current reference could not be disposed Returns the current reference count of the given reference. Called after , so subclass can free any resources. When overriding, be sure to include a call to base.Dispose(disposing) in your implementation. if the after dispose operation in a sub-class throws an You must call this (or ), periodically, if you want that will return refreshed instances. Threads: it's fine for more than one thread to call this at once. Only the first thread will attempt the refresh; subsequent threads will see that another thread is already handling refresh and will return immediately. Note that this means if another thread is already refreshing then subsequent threads will return right away without waiting for the refresh to complete. If this method returns true it means the calling thread either refreshed or that there were no changes to refresh. If it returns false it means another thread is currently refreshing. 
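As a sketch of the acquire/release and refresh discipline described above, using SearcherManager (the ReferenceManager implementation for IndexSearcher); the writer and query variables are assumed:
    var manager = new SearcherManager(writer, true, new SearcherFactory());   // applyAllDeletes = true

    manager.MaybeRefresh();                      // non-blocking; another thread may already be refreshing
    IndexSearcher searcher = manager.Acquire();
    try
    {
        TopDocs hits = searcher.Search(query, 10);
    }
    finally
    {
        manager.Release(searcher);
        searcher = null;                         // guard against accidental use after release
    }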
If refreshing the resource causes an If the reference manager has been d. You must call this (or ), periodically, if you want that will return refreshed instances. Threads: unlike , if another thread is currently refreshing, this method blocks until that thread completes. It is useful if you want to guarantee that the next call to will return a refreshed instance. Otherwise, consider using the non-blocking . If refreshing the resource causes an If the reference manager has been d. Called after a refresh was attempted, regardless of whether a new reference was in fact created. if a low level I/O exception occurs Release the reference previously obtained via . NOTE: it's safe to call this after . If the release operation on the given resource throws an Adds a listener, to be notified when a reference is refreshed/swapped. Remove a listener added with . LUCENENET specific class used to provide static access to without having to specifiy the generic closing type of . Use to receive notification when a refresh has finished. See . Called right before a refresh attempt starts. Called after the attempted refresh; if the refresh did open a new reference then didRefresh will be true and is guaranteed to return the new reference. A fast regular expression query based on the package. Comparisons are fast The term dictionary is enumerated in an intelligent way, to avoid comparisons. See for more details. The supported syntax is documented in the class. Note this might be different than other regular expression implementations. For some alternatives with different syntax, look under the sandbox. Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow s, a term should not start with the expression .* @lucene.experimental A provider that provides no named automata Constructs a query for terms matching . By default, all regular expression features are enabled. Regular expression. Constructs a query for terms matching . Regular expression. Optional features from Constructs a query for terms matching . Regular expression. Optional features from Custom for named automata Prints a user-readable version of this query. A for queries with a required subscorer and an excluding (prohibited) sub . This implements , and it uses the SkipTo() on the given scorers. Construct a . The scorer that must match, except where Indicates exclusion. Advance to non excluded doc. On entry: reqScorer != null, exclScorer != null, reqScorer was advanced once via Next() or SkipTo() and reqScorer.Doc may still be excluded. Advances reqScorer a non excluded required doc, if any. true if there is a non excluded required doc. Returns the score of the current document matching the query. Initially invalid, until is called the first time. The score of the required scorer. A for queries with a required part and an optional part. Delays SkipTo() on the optional part until a GetScore() is needed. This implements . The scorers passed from the constructor. These are set to null as soon as their Next() or SkipTo() returns false. Construct a . The required scorer. This must match. The optional scorer. This is used for scoring only. Returns the score of the current document matching the query. Initially invalid, until is called the first time. The score of the required scorer, eventually increased by the score of the optional scorer when it also matches the current document. Re-scores the topN results () from an original query. See for an actual implementation. 
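As a hedged sketch of this re-scoring flow (the query variables are illustrative, and the static QueryRescorer.Rescore sugar method is assumed to mirror the Java helper of the same name):

    // First pass: cheap query over the whole index, keeping a few hundred hits.
    TopDocs firstPass = searcher.Search(firstPassQuery, 100);

    // Second pass: re-score only those hits with a more expensive query,
    // mixing in its score with a weight of 2.0, and keep the top 10.
    TopDocs rescored = QueryRescorer.Rescore(searcher, firstPass, expensiveQuery, 2.0, 10);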
Typically, you run a low-cost first-pass query across the entire index, collecting the top few hundred hits perhaps, and then use this class to mix in a more costly second pass scoring. See for a simple static method to call to rescore using a 2nd pass . @lucene.experimental Rescore an initial first-pass . used to produce the first pass topDocs Hits from the first pass search. It's very important that these hits were produced by the provided searcher; otherwise the doc IDs will not match! How many re-scored hits to return Explains how the score for the specified document was computed. A which wraps another scorer and caches the score of the current document. Successive calls to will return the same result and will not invoke the wrapped Scorer's GetScore() method, unless the current document has changed. This class might be useful due to the changes done to the interface, in which the score is not computed for a document by default, only if the collector requests it. Some collectors may need to use the score in several places, however all they have in hand is a object, and might end up computing the score of a document more than once. Creates a new instance by wrapping the given scorer. Holds one hit in . The score of this document for the query. A hit document's number. Only set by Constructs a . Constructs a . A convenience method for debugging. Expert: Common scoring functionality for different types of queries. A iterates over documents matching a query in increasing order of doc Id. Document scores are computed using a given implementation. NOTE: The values , and are not valid scores. Certain collectors (eg ) will not properly collect hits with these scores. The 's parent . In some cases this may be null. Constructs a The scorers . Returns the score of the current document matching the query. Initially invalid, until or is called the first time, or when called from within . returns parent @lucene.experimental Returns child sub-scorers @lucene.experimental A child and its relationship to its parent. The meaning of the relationship depends upon the parent query. @lucene.experimental Child . (note this is typically a direct child, and may itself also have children). An arbitrary string relating this scorer to the parent. Creates a new node with the specified relationship. The relationship can be any be any string that makes sense to the parent . Base rewrite method that translates each term into a query, and keeps the scores as computed by the query. @lucene.internal - Only public to be accessible by spans package. A rewrite method that first translates each term into clause in a , and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it's almost always better to use instead. NOTE: this rewrite method will hit if the number of terms exceeds . Like except scores are not computed. Instead, each matching document receives a constant score equal to the query's boost. NOTE: this rewrite method will hit if the number of terms exceeds . This method is called after every new term to check if the number of max clauses (e.g. in ) is not exceeded. Throws the corresponding . Special implementation of that keeps parallel arrays for boost and docFreq Factory class used by to create new s. 
The default implementation just creates an with no custom behavior: public IndexSearcher NewSearcher(IndexReader r) { return new IndexSearcher(r); } You can pass your own factory instead if you want custom behavior, such as: Setting a custom scoring model: Parallel per-segment search: Return custom subclasses of (for example that implement distributed scoring) Run queries to warm your before it is used. Note: when using near-realtime search you may want to also set to warm newly merged segments in the background, outside of the reopen path. @lucene.experimental Returns a new over the given reader. Keeps track of current plus old s, disposing the old ones once they have timed out. Use it like this: SearcherLifetimeManager mgr = new SearcherLifetimeManager(); Per search-request, if it's a "new" search request, then obtain the latest searcher you have (for example, by using ), and then record this searcher: // Record the current searcher, and save the returned // token into the user's search results (e.g. as a hidden // HTML form field): long token = mgr.Record(searcher); When a follow-up search arrives, for example the user clicks next page, drills down/up, etc., take the token that you saved from the previous search and: // If possible, obtain the same searcher as the last // search: IndexSearcher searcher = mgr.Acquire(token); if (searcher != null) { // Searcher is still here try { // do searching... } finally { mgr.Release(searcher); // Do not use searcher after this! searcher = null; } } else { // Searcher was pruned -- notify user session timed // out, or, pull fresh searcher again } Finally, in a separate thread, ideally the same thread that's periodically reopening your searchers, you should periodically prune old searchers: mgr.Prune(new PruneByAge(600.0)); NOTE: keeping many searchers around means you'll use more resources (open files, RAM) than a single searcher. However, as long as you are using , the searchers will usually share almost all segments and the added resource usage is contained. When a large merge has completed, and you reopen, because that is a large change, the new searcher will use more additional RAM than other searchers; but large merges don't complete very often and it's unlikely you'll hit two of them in your expiration window. Still, you should budget plenty of heap in the runtime to have a good safety margin. Records that you are now using this . Always call this when you've obtained a possibly new , for example from . It's fine if you already passed the same searcher to this method before. This returns the token that you can later pass to to retrieve the same . You should record this token in the search results sent to your user, such that if the user performs a follow-on action (clicks next page, drills down, etc.) the token is returned. Retrieve a previously recorded , if it has not yet been closed. NOTE: this may return null when the requested searcher has already timed out. When this happens you should notify your user that their session timed out and that they'll have to restart their search. If this returns a non-null result, you must match it with a later call on this searcher, best done from a finally clause. Release a searcher previously obtained from . NOTE: it's fine to call this after Dispose(). See . Return true if this searcher should be removed. How much time has passed since this searcher was the current (live) searcher Searcher Simple pruner that drops any searcher that is more than the specified number of seconds older than the newest searcher.
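A sketch of the periodic maintenance thread recommended above, combining pruning with the searcher reopening it should share a thread with (searcherManager, lifetimeManager and shuttingDown are assumed to be defined elsewhere in the application):

    while (!shuttingDown)
    {
        // Pick up recent index changes (returns immediately if another thread is already refreshing).
        searcherManager.MaybeRefresh();

        // Drop searchers more than 10 minutes older than the newest searcher.
        lifetimeManager.Prune(new SearcherLifetimeManager.PruneByAge(600.0));

        Thread.Sleep(TimeSpan.FromSeconds(1));
    }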
Calls provided to prune entries. The entries are passed to the in sorted (newest to oldest) order. NOTE: you must periodically call this, ideally from the same background thread that opens new searchers. Close this to future searching; any searches still in process in other threads won't be affected, and they should still call after they are done. NOTE: you must ensure no other threads are calling while you call Dispose(); otherwise it's possible not all searcher references will be freed. Releases resources used by the and, if overridden in a derived class, optionally releases unmanaged resources. true to release both managed and unmanaged resources; false to release only unmanaged resources. Utility class to safely share instances across multiple threads, while periodically reopening. This class ensures each searcher is disposed only once all threads have finished using it. Use to obtain the current searcher, and to release it, like this: IndexSearcher s = manager.Acquire(); try { // Do searching, doc retrieval, etc. with s } finally { manager.Release(s); // Do not use s after this! s = null; } In addition you should periodically call . While it's possible to call this just before running each query, this is discouraged since it penalizes the unlucky queries that do the reopen. It's better to use a separate background thread that periodically calls . Finally, be sure to call once you are done. @lucene.experimental Creates and returns a new from the given . The to open the from. If true, all buffered deletes will be applied (made visible) in the / . If false, the deletes may or may not be applied, but remain buffered (in ) so that they will be applied in the future. Applying deletes can be costly, so if your app can tolerate deleted documents being returned you might gain some performance by passing false. See . An optional . Pass null if you don't require the searcher to be warmed before going live or other custom behavior. if there is a low-level I/O error Creates and returns a new from the given . The directory to open the on. An optional . Pass null if you don't require the searcher to be warmed before going live or other custom behavior. If there is a low-level I/O error Returns true if no changes have occurred since this searcher (i.e. reader) was opened, otherwise false. Expert: creates a searcher from the provided using the provided . NOTE: this decRefs incoming reader on throwing an exception. This class acts as the base class for the implementations of the first normalization of the informative content in the DFR framework. This component is also called the after effect and is defined by the formula Inf2 = 1 - Prob2, where Prob2 measures the information gain. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the aftereffect score. Returns an explanation for the score. Implementation used when there is no aftereffect. Sole constructor: parameter-free Subclasses must override this method to return the code of the after effect formula. Refer to the original paper for the list. Model of the information gain based on the ratio of two Bernoulli processes. @lucene.experimental Sole constructor: parameter-free Model of the information gain based on Laplace's law of succession. @lucene.experimental Sole constructor: parameter-free This class acts as the base class for the specific basic model implementations in the DFR framework. Basic models compute the informative content Inf1 = -log2(Prob1). @lucene.experimental Sole constructor.
(For invocation by subclass constructors, typically implicit.) Returns the informative content score. Returns an explanation for the score. Most basic models use the number of documents and the total term frequency to compute Inf1. this method provides a generic explanation for such models. Subclasses that use other statistics must override this method. Subclasses must override this method to return the code of the basic model formula. Refer to the original paper for the list. Limiting form of the Bose-Einstein model. The formula used in Lucene differs slightly from the one in the original paper: F is increased by tfn+1 and N is increased by F @lucene.experimental NOTE: in some corner cases this model may give poor performance with Normalizations that return large values for tfn such as . Consider using the geometric approximation () instead, which provides the same relevance but with less practical problems. Sole constructor: parameter-free The f helper function defined for BE. Implements the approximation of the binomial model with the divergence for DFR. The formula used in Lucene differs slightly from the one in the original paper: to avoid underflow for small values of N and F, N is increased by 1 and F is always increased by tfn+1. WARNING: for terms that do not meet the expected random distribution (e.g. stopwords), this model may give poor performance, such as abnormally high scores for low tf values. @lucene.experimental Sole constructor: parameter-free Geometric as limiting form of the Bose-Einstein model. The formula used in Lucene differs slightly from the one in the original paper: F is increased by 1 and N is increased by F. @lucene.experimental Sole constructor: parameter-free An approximation of the I(ne) model. @lucene.experimental Sole constructor: parameter-free The basic tf-idf model of randomness. @lucene.experimental Sole constructor: parameter-free Tf-idf model of randomness, based on a mixture of Poisson and inverse document frequency. @lucene.experimental Sole constructor: parameter-free Implements the Poisson approximation for the binomial model for DFR. @lucene.experimental WARNING: for terms that do not meet the expected random distribution (e.g. stopwords), this model may give poor performance, such as abnormally high scores for low tf values. log2(Math.E), precomputed. Sole constructor: parameter-free Stores all statistics commonly used ranking methods. @lucene.experimental The number of documents. The total number of tokens in the field. The average field length. The document frequency. The total number of occurrences of this term across all documents. Query's inner boost. Any outer query's boost. For most Similarities, the immediate and the top level query boosts are not handled differently. Hence, this field is just the product of the other two. Constructor. Sets the query boost. Gets or Sets the number of documents. Returns the total number of tokens in the field. Returns the average field length. Returns the document frequency. Returns the total number of occurrences of this term across all documents. The field. The square of the raw normalization value. Computes the raw normalization value. This basic implementation returns the query boost. Subclasses may override this method to include other factors (such as idf), or to save the value for inclusion in , etc. No normalization is done. is saved in the object, however. Returns the total boost. BM25 Similarity. Introduced in Stephen E. 
Robertson, Steve Walker, Susan Jones, Micheline Hancock-Beaulieu, and Mike Gatford. Okapi at TREC-3. In Proceedings of the Third Text REtrieval Conference (TREC 1994). Gaithersburg, USA, November 1994. @lucene.experimental BM25 with the supplied parameter values. Controls non-linear term frequency normalization (saturation). Controls to what degree document length normalizes tf values. BM25 with these default values: k1 = 1.2, b = 0.75. Implemented as log(1 + (numDocs - docFreq + 0.5)/(docFreq + 0.5)). Implemented as 1 / (distance + 1). The default implementation returns 1 The default implementation computes the average as sumTotalTermFreq / maxDoc, or returns 1 if the index does not store sumTotalTermFreq (Lucene 3.x indexes or any field that omits frequency information). The default implementation encodes boost / sqrt(length) with . This is compatible with Lucene's default implementation. If you change this, then you should change to match. The default implementation returns 1 / f2 where f is . True if overlap tokens (tokens with a position of increment of zero) are discounted from the document's length. Gets or Sets whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. Cache of decoded bytes. Computes a score factor for a simple term and returns an explanation for that score factor. The default implementation uses: Idf(docFreq, searcher.MaxDoc); Note that is used instead of because also is used, and when the latter is inaccurate, so is , and in the same direction. In addition, is more efficient to compute collection-level statistics term-level statistics for the term an object that includes both an idf score factor and an explanation for the term. Computes a score factor for a phrase. The default implementation sums the idf factor for each term in the phrase. collection-level statistics term-level statistics for the terms in the phrase an object that includes both an idf score factor for the phrase and an explanation for each term. Collection statistics for the BM25 model. BM25's idf The average document length. query's inner boost query's outer boost (only for explain) weight (idf * boost) field name, for pulling norms precomputed norm[256] with k1 * ((1 - b) + b * dl / avgdl) Returns the k1 parameter Returns the b parameter Expert: Default scoring implementation which encodes () norm values as a single byte before being stored. At search time, the norm byte value is read from the index and decoded () back to a float norm value. this encoding/decoding, while reducing index size, comes with the price of precision loss - it is not guaranteed that Decode(Encode(x)) = x. For instance, Decode(Encode(0.89)) = 0.75. Compression of norm values to a single byte saves memory at search time, because once a field is referenced at search time, its norms - for all documents - are maintained in memory. The rationale supporting such lossy compression of norm values is that given the difficulty (and inaccuracy) of users to express their true information need by a query, only big differences matter. Last, note that search time is too late to modify this norm part of scoring, e.g. by using a different for search. Cache of decoded bytes. Sole constructor: parameter-free Implemented as overlap / maxOverlap. Implemented as 1/sqrt(sumOfSquaredWeights). Encodes a normalization factor for storage in an index. 
The encoding uses a three-bit mantissa, a five-bit exponent, and the zero-exponent point at 15, thus representing values from around 7x10^9 to 2x10^-9 with about one significant decimal digit of accuracy. Zero is also represented. Negative numbers are rounded up to zero. Values too large to represent are rounded down to the largest representable value. Positive values too small to represent are rounded up to the smallest positive representable value. Decodes the norm value, assuming it is a single byte. Implemented as state.Boost * LengthNorm(numTerms), where numTerms is if is false, else it's - . @lucene.experimental Implemented as Math.Sqrt(freq). Implemented as 1 / (distance + 1). The default implementation returns 1 Implemented as log(numDocs/(docFreq+1)) + 1. True if overlap tokens (tokens with a position of increment of zero) are discounted from the document's length. Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. @lucene.experimental Implements the divergence from randomness (DFR) framework introduced in Gianni Amati and Cornelis Joost Van Rijsbergen. 2002. Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20, 4 (October 2002), 357-389. The DFR scoring formula is composed of three separate components: the basic model, the aftereffect and an additional normalization component, represented by the classes , and , respectively. The names of these classes were chosen to match the names of their counterparts in the Terrier IR engine. To construct a , you must specify the implementations for all three components of DFR: Component Implementations : Basic model of information content: : Limiting form of Bose-Einstein : Geometric approximation of Bose-Einstein : Poisson approximation of the Binomial : Divergence approximation of the Binomial : Inverse document frequency : Inverse expected document frequency [mixture of Poisson and IDF] : Inverse term frequency [approximation of I(ne)] : First normalization of information gain: : Laplace's law of succession : Ratio of two Bernoulli processes : no first normalization : Second (length) normalization: : Uniform distribution of term frequency : term frequency density inversely related to length : term frequency normalization provided by Dirichlet prior : term frequency normalization provided by a Zipfian relation : no second normalization Note that qtf, the multiplicity of term-occurrence in the query, is not handled by this implementation. @lucene.experimental The basic model for information content. The first normalization of the information content. The term frequency normalization. Creates DFRSimilarity from the three components. Note that null values are not allowed: if you want no normalization or after-effect, instead pass or respectively. Basic model of information content First normalization of information gain Second (length) normalization , , or is null. Returns the basic model of information content Returns the first normalization Returns the second normalization The probabilistic distribution used to model term occurrence in information-based models. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Computes the score. Explains the score. Returns the name of the model only, since both tfn and lambda are explained elsewhere. 
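A hedged sketch of wiring the three DFR components described above into a single similarity; the component class names follow the Terrier-style codes used by the Lucene/Lucene.NET similarities package, and any of the listed implementations can be substituted:

    // Geometric Bose-Einstein basic model, Bernoulli after-effect,
    // and H2 (length-based) term frequency normalization.
    Similarity dfr = new DFRSimilarity(
        new BasicModelG(),
        new AfterEffectB(),
        new NormalizationH2());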
Subclasses must override this method to return the name of the distribution. Log-logistic distribution. Unlike for DFR, the natural logarithm is used, as it is faster to compute and the original paper does not express any preference to a specific base. @lucene.experimental Sole constructor: parameter-free The smoothed power-law (SPL) distribution for the information-based framework that is described in the original paper. Unlike for DFR, the natural logarithm is used, as it is faster to compute and the original paper does not express any preference to a specific base. @lucene.experimental Sole constructor: parameter-free Provides a framework for the family of information-based models, as described in StÉphane Clinchant and Eric Gaussier. 2010. Information-based models for ad hoc IR. In Proceeding of the 33rd international ACM SIGIR conference on Research and development in information retrieval (SIGIR '10). ACM, New York, NY, USA, 234-241. The retrieval function is of the form RSV(q, d) = ∑ -xqw log Prob(Xw >= tdw | λw), where xqw is the query boost; Xw is a random variable that counts the occurrences of word w; tdw is the normalized term frequency; λw is a parameter. The framework described in the paper has many similarities to the DFR framework (see ). It is possible that the two Similarities will be merged at one point. To construct an , you must specify the implementations for all three components of the Information-Based model. Component Implementations : Probabilistic distribution used to model term occurrence : Log-logistic : Smoothed power-law : λw parameter of the probability distribution : Nw/N or average number of documents where w occurs : Fw/N or average number of occurrences of w in the collection : Term frequency normalization Any supported DFR normalization (listed in ) @lucene.experimental The probabilistic distribution used to model term occurrence. The lambda (λw) parameter. The term frequency normalization. Creates IBSimilarity from the three components. Note that null values are not allowed: if you want no normalization, instead pass . probabilistic distribution modeling term occurrence distribution's λw parameter term frequency normalization The name of IB methods follow the pattern IB <distribution> <lambda><normalization>. The name of the distribution is the same as in the original paper; for the names of lambda parameters, refer to the doc of the classes. Returns the distribution Returns the distribution's lambda parameter Returns the term frequency normalization The lambda (λw) parameter in information-based models. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Computes the lambda parameter. Explains the lambda parameter. Subclasses must override this method to return the code of the lambda formula. Since the original paper is not very clear on this matter, and also uses the DFR naming scheme incorrectly, the codes here were chosen arbitrarily. Computes lambda as docFreq+1 / numberOfDocuments+1. @lucene.experimental Sole constructor: parameter-free Computes lambda as totalTermFreq+1 / numberOfDocuments+1. @lucene.experimental Sole constructor: parameter-free Bayesian smoothing using Dirichlet priors. From Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342. 
The formula as defined the paper assigns a negative score to documents that contain the term, but with fewer occurrences than predicted by the collection language model. The Lucene implementation returns 0 for such documents. @lucene.experimental The μ parameter. Instantiates the similarity with the provided μ parameter. Instantiates the similarity with the provided μ parameter. Instantiates the similarity with the default μ value of 2000. Instantiates the similarity with the default μ value of 2000. Returns the μ parameter. Language model based on the Jelinek-Mercer smoothing method. From Chengxiang Zhai and John Lafferty. 2001. A study of smoothing methods for language models applied to Ad Hoc information retrieval. In Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '01). ACM, New York, NY, USA, 334-342. The model has a single parameter, λ. According to said paper, the optimal value depends on both the collection and the query. The optimal value is around 0.1 for title queries and 0.7 for long queries. @lucene.experimental The λ parameter. Instantiates with the specified and λ parameter. Instantiates with the specified λ parameter. Returns the λ parameter. Abstract superclass for language modeling Similarities. The following inner types are introduced: , which defines a new statistic, the probability that the collection language model generates the current term; , which is a strategy interface for object that compute the collection language model p(w|C); , an implementation of the former, that computes the term probability as the number of occurrences of the term in the collection, divided by the total number of tokens. @lucene.experimental The collection model. Creates a new instance with the specified collection language model. Creates a new instance with the default collection language model. Computes the collection probability of the current term in addition to the usual statistics. Returns the name of the LM method. The values of the parameters should be included as well. Used in . Returns the name of the LM method. If a custom collection model strategy is used, its name is included as well. Stores the collection distribution of the current term. The probability that the current term is generated by the collection. Creates for the provided field and query-time boost Returns the probability that the current term is generated by the collection. A strategy for computing the collection language model. Computes the probability p(w|C) according to the language model strategy for the current term. The name of the collection model strategy. Models p(w|C) as the number of occurrences of the term in the collection, divided by the total number of tokens + 1. Sole constructor: parameter-free Implements the CombSUM method for combining evidence from multiple similarity values described in: Joseph A. Shaw, Edward A. Fox. In Text REtrieval Conference (1993), pp. 243-252 @lucene.experimental the sub-similarities used to create the combined score Creates a which will sum the scores of the provided . This class acts as the base class for the implementations of the term frequency normalization methods in the DFR framework. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns the normalized term frequency. the field length. Returns an explanation for the normalized term frequency. 
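A sketch of instantiating the two language-model similarities described above, using the parameter values mentioned in the text:

    // Dirichlet smoothing with the default mu = 2000.
    Similarity dirichlet = new LMDirichletSimilarity();

    // Jelinek-Mercer smoothing; roughly 0.1 suits short (title) queries, 0.7 long queries.
    Similarity jelinekMercer = new LMJelinekMercerSimilarity(0.7f);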
The default normalization methods use the field length of the document and the average field length to compute the normalized term frequency. This method provides a generic explanation for such methods. Subclasses that use other statistics must override this method. Implementation used when there is no normalization. Sole constructor: parameter-free Subclasses must override this method to return the code of the normalization formula. Refer to the original paper for the list. Normalization model that assumes a uniform distribution of the term frequency. While this model is parameterless in the original article, information-based models (see ) introduced a multiplying factor. The default value for the c parameter is 1. @lucene.experimental Creates with the supplied parameter . Hyper-parameter that controls the term frequency normalization with respect to the document length. Calls Returns the c parameter. Normalization model in which the term frequency is inversely related to the length. While this model is parameterless in the original article, the thesis introduces the parameterized variant. The default value for the c parameter is 1. @lucene.experimental Creates with the supplied parameter . Hyper-parameter that controls the term frequency normalization with respect to the document length. Calls Returns the c parameter. Dirichlet Priors normalization @lucene.experimental Calls Creates with the supplied parameter μ. smoothing parameter μ Returns the parameter μ Pareto-Zipf Normalization @lucene.experimental Calls Creates with the supplied parameter . represents A/(A+1) where A measures the specificity of the language. Returns the parameter z Provides the ability to use a different for different fields. Subclasses should implement to return an appropriate (for example, using field-specific parameter values) for the field. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Returns a for scoring a field. Similarity defines the components of Lucene scoring. Expert: Scoring API. This is a low-level API, you should only extend this API if you want to implement an information retrieval model. If you are instead looking for a convenient way to alter Lucene's scoring, consider extending a higher-level implementation such as , which implements the vector space model with this API, or just tweaking the default implementation: . Similarity determines how Lucene weights terms, and Lucene interacts with this class at both index-time and query-time. At indexing time, the indexer calls , allowing the implementation to set a per-document value for the field that will be later accessible via . Lucene makes no assumption about what is in this norm, but it is most useful for encoding length normalization information. Implementations should carefully consider how the normalization is encoded: while Lucene's classical encodes a combination of index-time boost and length normalization information with into a single byte, this might not be suitable for all purposes. Many formulas require the use of average document length, which can be computed via a combination of and or , depending upon whether the average should reflect field sparsity. Additional scoring factors can be stored in named s and accessed at query-time with . 
Finally, using index-time boosts (either via folding into the normalization byte or via ), is an inefficient way to boost the scores of different fields if the boost will be the same for every document, instead the Similarity can simply take a constant boost parameter C, and can return different instances with different boosts depending upon field name. At query-time, Queries interact with the Similarity via these steps: The method is called a single time, allowing the implementation to compute any statistics (such as IDF, average document length, etc) across the entire collection. The and passed in already contain all of the raw statistics involved, so a can freely use any combination of statistics without causing any additional I/O. Lucene makes no assumption about what is stored in the returned object. The query normalization process occurs a single time: is called for each query leaf node, is called for the top-level query, and finally passes down the normalization value and any top-level boosts (e.g. from enclosing s). For each segment in the index, the creates a The GetScore() method is called for each matching document. When is called, queries consult the Similarity's DocScorer for an explanation of how it computed its score. The query passes in a the document id and an explanation of how the frequency was computed. @lucene.experimental Sole constructor. (For invocation by subclass constructors, typically implicit.) Hook to integrate coordinate-level matching. By default this is disabled (returns 1), as with most modern models this will only skew performance, but some implementations such as override this. the number of query terms matched in the document the total number of terms in the query a score factor based on term overlap with the query Computes the normalization value for a query given the sum of the normalized weights of each of the query terms. this value is passed back to the weight ( of each query term, to provide a hook to attempt to make scores from different queries comparable. By default this is disabled (returns 1), but some implementations such as override this. the sum of the term normalization values a normalization factor for query weights Computes the normalization value for a field, given the accumulated state of term processing for this field (see ). Matches in longer fields are less precise, so implementations of this method usually set smaller values when state.Length is large, and larger values when state.Length is small. @lucene.experimental current processing state for this field computed norm value Compute any collection-level weight (e.g. IDF, average document length, etc) needed for scoring a query. the query-time boost. collection-level statistics, such as the number of tokens in the collection. term-level statistics, such as the document frequency of a term across the collection. object with the information this needs to score a query. Creates a new to score matching documents from a segment of the inverted index. collection information from segment of the inverted index to be scored. Sloppy for scoring documents across context if there is a low-level I/O error API for scoring "sloppy" queries such as , , and . Frequencies are floating-point values: an approximate within-document frequency adjusted for "sloppiness" by . Sole constructor. (For invocation by subclass constructors, typically implicit.) 
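Tying the index-time and query-time sides of this contract together, here is a hedged sketch of a per-field Similarity and how it is typically installed. The field name is illustrative, the Get override is assumed to mirror the Java get(field) method, and the property-style setters on IndexWriterConfig and IndexSearcher are assumed from the .NET port:

    public class MyPerFieldSimilarity : PerFieldSimilarityWrapper
    {
        private readonly Similarity titleSim = new BM25Similarity(1.2f, 0.75f); // the BM25 defaults
        private readonly Similarity defaultSim = new DefaultSimilarity();

        // Choose a Similarity per field name.
        public override Similarity Get(string field)
        {
            return field == "title" ? titleSim : defaultSim;
        }
    }

    // The same Similarity must be in effect at index time (norm encoding)
    // and at search time (scoring).
    Similarity sim = new MyPerFieldSimilarity();
    indexWriterConfig.Similarity = sim;
    indexSearcher.Similarity = sim;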
Score a single document document id within the inverted index segment sloppy term frequency document's score Computes the amount of a sloppy phrase match, based on an edit distance. Calculate a scoring factor based on the data in the payload. Explain the score for a single document document id within the inverted index segment Explanation of how the sloppy term frequency was computed document's score Stores the weight for a query across the indexed collection. this abstract implementation is empty; descendants of should subclass and define the statistics they require in the subclass. Examples include idf, average field length, etc. Sole constructor. (For invocation by subclass constructors, typically implicit.) The value for normalization of contained query clauses (e.g. sum of squared weights). NOTE: a implementation might not use any query normalization at all, its not required. However, if it wants to participate in query normalization, it can return a value here. Assigns the query normalization factor and boost from parent queries to this. NOTE: a implementation might not use this normalized value at all, its not required. However, its usually a good idea to at least incorporate the (e.g. from an outer ) into its score. A subclass of that provides a simplified API for its descendants. Subclasses are only required to implement the and methods. Implementing is optional, inasmuch as already provides a basic explanation of the score and the term frequency. However, implementers of a subclass are encouraged to include as much detail about the scoring method as possible. Note: multi-word queries such as phrase queries are scored in a different way than Lucene's default ranking algorithm: whereas it "fakes" an IDF value for the phrase as a whole (since it does not know it), this class instead scores phrases as a summation of the individual term scores. @lucene.experimental For . Precomputed for efficiency reasons. True if overlap tokens (tokens with a position of increment of zero) are discounted from the document's length. Sole constructor. (For invocation by subclass constructors, typically implicit.) Determines whether overlap tokens (Tokens with 0 position increment) are ignored when computing norm. By default this is true, meaning overlap tokens do not count when computing norms. @lucene.experimental Factory method to return a custom stats object Fills all member fields defined in in . Subclasses can override this method to fill additional stats. Scores the document doc. Subclasses must apply their scoring formula in this class. the corpus level statistics. the term frequency. the document length. the score. Subclasses should implement this method to explain the score. already contains the score, the name of the class and the doc id, as well as the term frequency and its explanation; subclasses can add additional clauses to explain details of their scoring formulae. The default implementation does nothing. the explanation to extend with details. the corpus level statistics. the document id. the term frequency. the document length. Explains the score. The implementation here provides a basic explanation in the format Score(name-of-similarity, doc=doc-id, freq=term-frequency), computed from:, and attaches the score (computed via the method) and the explanation for the term frequency. Subclasses content with this format may add additional details in . the corpus level statistics. the document id. the term frequency and its explanation. the document length. the explanation. 
Subclasses must override this method to return the name of the and preferably the values of parameters (if any) as well. Norm -> document length map. Encodes the document length in the same way as . Decodes a normalization factor (document length) stored in an index. Encodes the length to a byte via . Returns the base two logarithm of x. Delegates the and methods to and , respectively. Implementation of with the Vector Space Model. Expert: Scoring API. TFIDFSimilarity defines the components of Lucene scoring. Overriding computation of these components is a convenient way to alter Lucene scoring. Suggested reading: Introduction To Information Retrieval, Chapter 6. The following describes how Lucene scoring evolves from underlying information retrieval models to (efficient) implementation. We first brief on VSM Score, then derive from it Lucene's Conceptual Scoring Formula, from which, finally, evolves Lucene's Practical Scoring Function (the latter is connected directly with Lucene classes and methods). Lucene combines Boolean model (BM) of Information Retrieval with Vector Space Model (VSM) of Information Retrieval - documents "approved" by BM are scored by VSM. In VSM, documents and queries are represented as weighted vectors in a multi-dimensional space, where each distinct index term is a dimension, and weights are Tf-idf values. VSM does not require weights to be Tf-idf values, but Tf-idf values are believed to produce search results of high quality, and so Lucene is using Tf-idf. Tf and Idf are described in more detail below, but for now, for completion, let's just say that for given term t and document (or query) x, Tf(t,x) varies with the number of occurrences of term t in x (when one increases so does the other) and idf(t) similarly varies with the inverse of the number of index documents containing term t. VSM score of document d for query q is the Cosine Similarity of the weighted query vectors V(q) and V(d): cosine-similarity(q,d)   =   V(q) · V(d)–––––––––|V(q)| |V(d)|
    VSM Score, i.e. cosine-similarity(q,d) = ( V(q) · V(d) ) / ( |V(q)| · |V(d)| )
    Where V(q) · V(d) is the dot product of the weighted vectors, and |V(q)| and |V(d)| are their Euclidean norms. Note: the above equation can be viewed as the dot product of the normalized weighted vectors, in the sense that dividing V(q) by its euclidean norm is normalizing it to a unit vector. Lucene refines VSM score for both search quality and usability: Normalizing V(d) to the unit vector is known to be problematic in that it removes all document length information. For some documents removing this info is probably ok, e.g. a document made by duplicating a certain paragraph 10 times, especially if that paragraph is made of distinct terms. But for a document which contains no duplicated paragraphs, this might be wrong. To avoid this problem, a different document length normalization factor is used, which normalizes to a vector equal to or larger than the unit vector: doc-len-norm(d). At indexing, users can specify that certain documents are more important than others, by assigning a document boost. For this, the score of each document is also multiplied by its boost value doc-boost(d). Lucene is field based, hence each query term applies to a single field, document length normalization is by the length of the certain field, and in addition to document boost there are also document fields boosts. The same field can be added to a document during indexing several times, and so the boost of that field is the multiplication of the boosts of the separate additions (or parts) of that field within the document. At search time users can specify boosts to each query, sub-query, and each query term, hence the contribution of a query term to the score of a document is multiplied by the boost of that query term query-boost(q). A document may match a multi term query without containing all the terms of that query (this is correct for some of the queries), and users can further reward documents matching more query terms through a coordination factor, which is usually larger when more terms are matched: coord-factor(q,d). Under the simplifying assumption of a single field in the index, we get Lucene's Conceptual scoring formula: score(q,d)   =   coord-factor(q,d) ·   query-boost(q) ·   V(q) · V(d) ––––––––– |V(q)|   ·   doc-len-norm(d)   ·   doc-boost(d) Lucene Conceptual Scoring Formula The conceptual formula is a simplification in the sense that (1) terms and documents are fielded and (2) boosts are usually per query term rather than per query. We now describe how Lucene implements this conceptual scoring formula, and derive from it Lucene's Practical Scoring Function. For efficient score computation some scoring components are computed and aggregated in advance: Query-boost for the query (actually for each query term) is known when search starts. Query Euclidean norm |V(q)| can be computed when search starts, as it is independent of the document being scored. From search optimization perspective, it is a valid question why bother to normalize the query at all, because all scored documents will be multiplied by the same |V(q)|, and hence documents ranks (their order by score) will not be affected by this normalization. There are two good reasons to keep this normalization: Recall that Cosine Similarity can be used find how similar two documents are. One can use Lucene for e.g. clustering, and use a document as a query to compute its similarity to other documents. In this use case it is important that the score of document d3 for query d1 is comparable to the score of document d3 for query d2. 
In other words, scores of a document for two distinct queries should be comparable. There are other applications that may require this. And this is exactly what normalizing the query vector V(q) provides: comparability (to a certain extent) of two or more queries. Applying query normalization on the scores helps to keep the scores around the unit vector, hence preventing loss of score data because of floating point precision limitations. Document length norm doc-len-norm(d) and document boost doc-boost(d) are known at indexing time. They are computed in advance and their multiplication is saved as a single value in the index: norm(d). (In the equations below, norm(t in d) means norm(field(t) in doc d) where field(t) is the field associated with term t.) Lucene's Practical Scoring Function is derived from the above. The color codes demonstrate how it relates to those of the conceptual formula: score(q,d)   =   coord(q,d)   ·   queryNorm(q)   ·   ( tf(t in d)   ·   idf(t)2   ·   t.Boost   ·   norm(t,d) ) t in q Lucene Practical Scoring Function where tf(t in d) correlates to the term's frequency, defined as the number of times term t appears in the currently scored document d. Documents that have more occurrences of a given term receive a higher score. Note that tf(t in q) is assumed to be 1 and therefore it does not appear in this equation, However if a query contains twice the same term, there will be two term-queries with that same term and hence the computation would still be correct (although not very efficient). The default computation for tf(t in d) in DefaultSimilarity () is: tf(t in d)   =   frequency½ idf(t) stands for Inverse Document Frequency. this value correlates to the inverse of DocFreq (the number of documents in which the term t appears). this means rarer terms give higher contribution to the total score. idf(t) appears for t in both the query and the document, hence it is squared in the equation. The default computation for idf(t) in DefaultSimilarity () is: idf(t)   =   1 + log ( NumDocs ––––––––– DocFreq+1 ) coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. this is a search time factor computed in coord(q,d) () by the Similarity in effect at search time. queryNorm(q) is a normalizing factor used to make scores between queries comparable. this factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable. this is a search time factor computed by the Similarity in effect at search time. The default computation in DefaultSimilarity () produces a Euclidean norm: queryNorm(q)   =   queryNorm(sumOfSquaredWeights)   =   1 –––––––––––––– sumOfSquaredWeights½ The sum of squared weights (of the query terms) is computed by the query object. For example, a computes this value as: sumOfSquaredWeights   =   q.Boost 2  ·  ( idf(t)  ·  t.Boost ) 2 t in q where sumOfSquaredWeights is and q.Boost is t.Boost is a search time boost of term t in the query q as specified in the query text (see query syntax), or as set by application calls to . 
Notice that there is really no direct API for accessing a boost of one term in a multi term query, but rather multi terms are represented in a query as multi objects, and so the boost of a term in the query is accessible by calling the sub-query . norm(t,d) encapsulates a few (indexing time) boost and length factors: Field boost - set before adding the field to a document. lengthNorm - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score. LengthNorm is computed by the class in effect at indexing. The method is responsible for combining all of these factors into a single . When a document is added to the index, all the above factors are multiplied. If the document has multiple fields with the same name, all their boosts are multiplied together: norm(t,d)   =   lengthNorm  ·  field f in d named as t Note that search time is too late to modify this norm part of scoring, e.g. by using a different for search.
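As a hedged sketch of tweaking these factors by subclassing DefaultSimilarity (the member names are assumed to mirror the Java tf/lengthNorm methods, and the formulas chosen here are purely illustrative):

    public class CustomTfSimilarity : DefaultSimilarity
    {
        // Dampen tf(t in d): repeated occurrences contribute logarithmically
        // instead of as sqrt(freq).
        public override float Tf(float freq)
        {
            return freq > 0 ? 1f + (float)Math.Log(freq) : 0f;
        }

        // Remove the 1/sqrt(length) factor so field length no longer affects
        // the index-time norm. Norms are written at indexing time, so documents
        // must be re-indexed for this change to take effect.
        public override float LengthNorm(FieldInvertState state)
        {
            return state.Boost;
        }
    }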
    Sole constructor. (For invocation by subclass constructors, typically implicit.) Computes a score factor based on the fraction of all query terms that a document contains. this value is multiplied into scores. The presence of a large portion of the query terms indicates a better match with the query, so implementations of this method usually return larger values when the ratio between these parameters is large and smaller values when the ratio between them is small. The number of query terms matched in the document The total number of terms in the query A score factor based on term overlap with the query Computes the normalization value for a query given the sum of the squared weights of each of the query terms. this value is multiplied into the weight of each query term. While the classic query normalization factor is computed as 1/sqrt(sumOfSquaredWeights), other implementations might completely ignore sumOfSquaredWeights (ie return 1). This does not affect ranking, but the default implementation does make scores from different queries more comparable than they would be by eliminating the magnitude of the vector as a factor in the score. The sum of the squares of query term weights A normalization factor for query weights Computes a score factor based on a term or phrase's frequency in a document. This value is multiplied by the factor for each term in the query and these products are then summed to form the initial score for a document. Terms and phrases repeated in a document indicate the topic of the document, so implementations of this method usually return larger values when is large, and smaller values when is small. The frequency of a term within a document A score factor based on a term's within-document frequency Computes a score factor for a simple term and returns an explanation for that score factor. The default implementation uses: Idf(docFreq, searcher.MaxDoc); Note that is used instead of because also is used, and when the latter is inaccurate, so is , and in the same direction. In addition, is more efficient to compute Collection-level statistics Term-level statistics for the term An Explain object that includes both an idf score factor and an explanation for the term. Computes a score factor for a phrase. The default implementation sums the idf factor for each term in the phrase. Collection-level statistics Term-level statistics for the terms in the phrase An Explain object that includes both an idf score factor for the phrase and an explanation for each term. Computes a score factor based on a term's document frequency (the number of documents which contain the term). This value is multiplied by the factor for each term in the query and these products are then summed to form the initial score for a document. Terms that occur in fewer documents are better indicators of topic, so implementations of this method usually return larger values for rare terms, and smaller values for common terms. The number of documents which contain the term The total number of documents in the collection A score factor based on the term's document frequency Compute an index-time normalization value for this field instance. This value will be stored in a single byte lossy representation by . Statistics of the current field (such as length, boost, etc) An index-time normalization value Decodes a normalization factor stored in an index. Encodes a normalization factor for storage in an index. Computes the amount of a sloppy phrase match, based on an edit distance. 
this value is summed for each sloppy phrase match in a document to form the frequency to be used in scoring instead of the exact term count. A phrase match with a small edit distance to a document passage more closely matches the document, so implementations of this method usually return larger values when the edit distance is small and smaller values when it is large. The edit distance of this sloppy phrase match The frequency increment for this match Calculate a scoring factor based on the data in the payload. Implementations are responsible for interpreting what is in the payload. Lucene makes no assumptions about what is in the byte array. The docId currently being scored. The start position of the payload The end position of the payload The payload byte array to be scored An implementation dependent float to be used as a scoring factor Collection statistics for the TF-IDF model. The only statistic of interest to this model is idf. The idf and its explanation Score a candidate doc for all slop-valid position-combinations (matches) encountered while traversing/hopping the PhrasePositions. The score contribution of a match depends on the distance: - highest score for distance=0 (exact match). - score gets lower as distance gets higher. Example: for query "a b"~2, a document "x a b a y" can be scored twice: once for "a b" (distance=0), and once for "b a" (distance=2). Possibly not all valid combinations are encountered, because for efficiency we always propagate the least PhrasePosition. This allows to base on and move forward faster. As result, for example, document "a b c b a" would score differently for queries "a b c"~4 and "c b a"~4, although they really are equivalent. Similarly, for doc "a b c b a f g", query "c b"~2 would get same score as "g f"~2, although "c b"~2 could be matched twice. We may want to fix this in the future (currently not, for performance reasons). Advance a PhrasePosition and update 'end', return false if exhausted pp was just advanced. If that caused a repeater collision, resolve by advancing the lesser of the two colliding pps. Note that there can only be one collision, as by the initialization there were no collisions before pp was advanced. Compare two pps, but only by position and offset Index of a pp2 colliding with pp, or -1 if none Initialize in place. A one time initialization for this scorer (on first doc matching all terms): Check if there are repetitions If there are, find groups of repetitions. Examples: no repetitions: "ho my"~2 >repetitions: "ho my my"~2 repetitions: "my ho my"~2 false if PPs are exhausted (and so current doc will not be a match) No repeats: simplest case, and most common. It is important to keep this piece of the code simple and efficient With repeats: not so simple. Move all PPs to their first position Fill the queue (all pps are already placed) At initialization (each doc), each repetition group is sorted by (query) offset. this provides the start condition: no collisions. Case 1: no multi-term repeats It is sufficient to advance each pp in the group by one less than its group index. So lesser pp is not advanced, 2nd one advance once, 3rd one advanced twice, etc. Case 2: multi-term repeats false if PPs are exhausted. Initialize with checking for repeats. Heavy work, but done only for the first candidate doc. If there are repetitions, check if multi-term postings (MTP) are involved. Without MTP, once PPs are placed in the first candidate doc, repeats (and groups) are visible. 
With MTP, a more complex check is needed, up-front, as there may be "hidden collisions". For example P1 has {A,B}, P1 has {B,C}, and the first doc is: "A C B". At start, P1 would point to "A", p2 to "C", and it will not be identified that P1 and P2 are repetitions of each other. The more complex initialization has two parts: (1) identification of repetition groups. (2) advancing repeat groups at the start of the doc. For (1), a possible solution is to just create a single repetition group, made of all repeating pps. But this would slow down the check for collisions, as all pps would need to be checked. Instead, we compute "connected regions" on the bipartite graph of postings and terms. Sort each repetition group by (query) offset. Done only once (at first doc) and allows to initialize faster for each doc. Detect repetition groups. Done once - for first doc. Actual position in doc of a PhrasePosition, relies on that position = tpPos - offset) Find repeating terms and assign them ordinal values Find repeating pps, and for each, if has multi-terms, update this.hasMultiTermRpts bit-sets - for each repeating pp, for each of its repeating terms, the term ordinal values is set Union (term group) bit-sets until they are disjoint (O(n^^2)), and each group have different terms Map each term to the single group that contains it Encapsulates sort criteria for returned hits. The fields used to determine sort order must be carefully chosen. s must contain a single term in such a field, and the value of the term should indicate the document's relative position in a given sort order. The field must be indexed, but should not be tokenized, and does not need to be stored (unless you happen to want it back with the rest of your document data). In other words: document.Add(new Field("byNumber", x.ToString(CultureInfo.InvariantCulture), Field.Store.NO, Field.Index.NOT_ANALYZED));

    Valid Types of Values

There are four possible kinds of term values which may be put into sorting fields: integers (Int32), longs (Int64), floats (Single), or strings. Unless SortField objects are specified, the type of value in the field is determined by parsing the first term in the field.

Integer term values should contain only digits and an optional preceding negative sign. Values must be base 10 and in the range Int32.MinValue and Int32.MaxValue inclusive. Documents which should appear first in the sort should have low value integers, later documents high values (i.e. the documents should be numbered 1..n where 1 is the first and n the last).

Long term values should contain only digits and an optional preceding negative sign. Values must be base 10 and in the range Int64.MinValue and Int64.MaxValue inclusive. Documents which should appear first in the sort should have low value integers, later documents high values.

Float term values should conform to standard floating point syntax (except that NaN and Infinity are not supported). Documents which should appear first in the sort should have low values, later documents high values.

String term values can contain any valid string, but should not be tokenized. The values are sorted according to their comparable natural order. Note that using this type of term value has higher memory requirements than the other types.
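For example, a minimal sketch of sorting on one of these value kinds (the field name "byNumber", the query, and the searcher instance are hypothetical) sorts primarily on an indexed, non-tokenized integer field and breaks ties by relevance:

var sort = new Sort(
    new SortField("byNumber", SortFieldType.INT32), // term values parsed as integers
    SortField.FIELD_SCORE);                         // tie-break by relevance
TopFieldDocs hits = searcher.Search(query, 10, sort);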

    Object Reuse

    One of these objects can be used multiple times and the sort order changed between usages. This class is thread safe.
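For example (a minimal sketch; the field names, query, and searcher are hypothetical), the same Sort instance can be reconfigured between searches:

var sort = new Sort(new SortField("title", SortFieldType.STRING));
TopFieldDocs byTitle = searcher.Search(query, 20, sort);

// Reuse the same instance with different criteria for the next search.
sort.SetSort(new SortField("modified", SortFieldType.INT64, true)); // true = reverse order (newest first)
TopFieldDocs newestFirst = searcher.Search(query, 20, sort);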

    Memory Usage

Sorting uses caches of term values maintained by the internal HitQueue(s). The cache is static and contains an integer or float array of length IndexReader.MaxDoc for each field name for which a sort is performed. In other words, the size of the cache in bytes is: 4 * IndexReader.MaxDoc * (# of different fields actually used to sort). For string fields, the cache is larger: in addition to the above array, the value of every term in the field is kept in memory. If there are many unique terms in the field, this could be quite large. Note that the size of the cache is not affected by how many fields are in the index and might be used to sort - only by the ones actually used to sort a result set. Created: Feb 12, 2004 10:53:57 AM @since lucene 1.4
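For example, with IndexReader.MaxDoc of 10,000,000 documents and two numeric fields actually used to sort, these caches would occupy roughly 4 * 10,000,000 * 2 = 80,000,000 bytes (about 80 MB); string sort fields additionally keep their term values in memory on top of that.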
    Represents sorting by computed relevance. Using this sort criteria returns the same results as calling without a sort criteria, only with slightly more overhead. Represents sorting by index order. Sorts by computed relevance. This is the same sort criteria as calling without a sort criteria, only with slightly more overhead. Sorts by the criteria in the given . Sorts in succession by the criteria in each . Sets the sort to the given criteria. NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. Sets the sort to the given criteria in succession. NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. Representation of the sort criteria. Array of objects used in this sort criteria Rewrites the s in this , returning a new if any of the fields changes during their rewriting. @lucene.experimental to use in the rewriting this if the Sort/Fields have not changed, or a new if there is a change Can be thrown by the rewriting Returns true if is equal to this. Returns a hash code value for this object. Returns true if the relevance score is needed to sort documents. Stores information about how to sort documents by terms in an individual field. Fields must be indexed in order to sort by them. Created: Feb 11, 2004 1:25:29 PM @since lucene 1.4 Represents sorting by document score (relevance). Represents sorting by document number (index order). Creates a sort by terms in the given field with the type of term values explicitly given. Name of field to sort by. Can be null if is or . Type of values in the terms. Creates a sort, possibly in reverse, by terms in the given field with the type of term values explicitly given. Name of field to sort by. Can be null if is or . Type of values in the terms. True if natural order should be reversed. Creates a sort by terms in the given field, parsed to numeric values using a custom . Name of field to sort by. Must not be null. Instance of a , which must subclass one of the existing numeric parsers from . Sort type is inferred by testing which numeric parser the parser subclasses. if the parser fails to subclass an existing numeric parser, or field is null Creates a sort, possibly in reverse, by terms in the given field, parsed to numeric values using a custom . Name of field to sort by. Must not be null. Instance of a , which must subclass one of the existing numeric parsers from . Sort type is inferred by testing which numeric parser the parser subclasses. True if natural order should be reversed. if the parser fails to subclass an existing numeric parser, or field is null Pass this to to have missing string values sort first. Pass this to to have missing string values sort last. Creates a sort with a custom comparison function. Name of field to sort by; cannot be null. Returns a comparer for sorting hits. Creates a sort, possibly in reverse, with a custom comparison function. Name of field to sort by; cannot be null. Returns a comparer for sorting hits. True if natural order should be reversed. Returns the name of the field. Could return null if the sort is by or . Name of field, possibly null. 
Returns the type of contents in the field. One of , , , or . Returns the instance of a parser that fits to the given sort type. May return null if no parser was specified. Sorting is using the default parser then. An instance of a parser, or null. Returns whether the sort should be reversed. True if natural order should be reversed. Returns the used for custom sorting. Returns true if is equal to this. If a or was provided, it must properly implement equals (unless a singleton is always used). Returns a hash code value for this object. If a or was provided, it must properly implement GetHashCode() (unless a singleton is always used). Returns the to use for sorting. @lucene.experimental Number of top hits the queue will store Position of this within . The comparer is primary if sortPos==0, secondary if sortPos==1, etc. Some comparers can optimize themselves when they are the primary sort. to use when sorting Rewrites this , returning a new if a change is made. Subclasses should override this define their rewriting behavior when this SortField is of type . @lucene.experimental to use during rewriting New rewritten , or this if nothing has changed. Can be thrown by the rewriting Whether the relevance score is needed to sort documents. Specifies the type of the terms to be sorted, or special types such as CUSTOM Sort by document score (relevance). Sort values are and higher values are at the front. Sort by document number (index order). Sort values are and lower values are at the front. Sort using term values as s. Sort values are s and lower values are at the front. Sort using term values as encoded s. Sort values are and lower values are at the front. NOTE: This was INT in Lucene Sort using term values as encoded s. Sort values are and lower values are at the front. NOTE: This was FLOAT in Lucene Sort using term values as encoded s. Sort values are and lower values are at the front. NOTE: This was LONG in Lucene Sort using term values as encoded s. Sort values are and lower values are at the front. Sort using term values as encoded s. Sort values are and lower values are at the front. NOTE: This was SHORT in Lucene Sort using a custom . Sort values are any and sorting is done according to natural order. Sort using term values as encoded s. Sort values are and lower values are at the front. Sort using term values as s, but comparing by value (using ) for all comparisons. this is typically slower than , which uses ordinals to do the sorting. Sort use index values. Force rewriting of using before it can be used for sorting A that re-sorts according to a provided Sort. Sole constructor. Wrapper to allow objects participate in composite single-field SpanQueries by 'lying' about their search field. That is, the masked will function as normal, but simply hands back the value supplied in this class's constructor. This can be used to support Queries like or across different fields, which is not ordinarily permitted. 
This can be useful for denormalized relational data: for example, when indexing a document with conceptually many 'children': teacherid: 1 studentfirstname: james studentsurname: jones teacherid: 2 studenfirstname: james studentsurname: smith studentfirstname: sally studentsurname: jones A with a slop of 0 can be applied across two objects as follows: SpanQuery q1 = new SpanTermQuery(new Term("studentfirstname", "james")); SpanQuery q2 = new SpanTermQuery(new Term("studentsurname", "jones")); SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname"); Query q = new SpanNearQuery(new SpanQuery[] { q1, q2m }, -1, false); to search for 'studentfirstname:james studentsurname:jones' and find teacherid 1 without matching teacherid 2 (which has a 'james' in position 0 and 'jones' in position 1). Note: as returns the masked field, scoring will be done using the and collection statistics of the field name supplied, but with the term statistics of the real field. This may lead to exceptions, poor performance, and unexpected scoring behavior. A that is formed from the ordered subspans of a where the subspans do not overlap and have a maximum slop between them. The formed spans only contains minimum slop matches. The matching slop is computed from the distance(s) between the non overlapping matching . Successive matches are always formed from the successive of the . The formed spans may contain overlaps when the slop is at least 1. For example, when querying using t1 t2 t3 with slop at least 1, the fragment: t1 t2 t1 t3 t2 t3 matches twice: t1 t2 .. t3 t1 .. t2 t3 Expert: Only public for subclassing. Most implementations should not need this class The spans in the same order as the Indicates that all subSpans have same Returns the document number of the current match. Initially invalid. Returns the start position of the current match. Initially invalid. Returns the end position of the current match. Initially invalid. Move to the next match, returning true iff any such exists. Skips to the first match beyond the current, whose document number is greater than or equal to target. The behavior of this method is undefined when called with target <= current, or after the iterator has exhausted. Both cases may result in unpredicted behavior. Returns true if there is such a match. Behaves as if written: bool SkipTo(int target) { do { if (!Next()) return false; } while (target > Doc); return true; } Most implementations are considerably more efficient than that. Advances the to just after an ordered match with a minimum slop that is smaller than the slop allowed by the . true if there is such a match. Advance the to the same document Check whether two in the same document are ordered. true if starts before or the spans start at the same position, and ends before . Like , but use the spans starts and ends as parameters. Order the within the same document by advancing all later spans after the previous one. The are ordered in the same doc, so there is a possible match. Compute the slop while making the match as short as possible by advancing all except the last one in reverse order. Similar to , but for the unordered case. Expert: Only public for subclassing. Most implementations should not need this class Wraps a , and can be used to form a linked list. WARNING: The List is not necessarily in order of the the positions Collection of payloads if there is a low-level I/O error Matches spans near the beginning of a field. 
This class is a simple extension of in that it assumes the start to be zero and only checks the end boundary. Construct a matching spans in whose end position is less than or equal to . Wraps any as a , so it can be nested within other classes. The query is rewritten by default to a containing the expanded terms, but this can be customized. Example: WildcardQuery wildcard = new WildcardQuery(new Term("field", "bro?n")); SpanQuery spanWildcard = new SpanMultiTermQueryWrapper<WildcardQuery>(wildcard); // do something with spanWildcard, such as use it in a SpanFirstQuery Create a new . Query to wrap. NOTE: This will set on the wrapped , changing its rewrite method to a suitable one for spans. Be sure to not change the rewrite method on the wrapped query afterwards! Doing so will throw on rewriting this query! Expert: Gets or Sets the rewrite method. This only makes sense to be a span rewrite method. Returns the wrapped query A rewrite method that first translates each term into a in a clause in a , and keeps the scores as computed by the query. A rewrite method that first translates each term into a in a clause in a , and keeps the scores as computed by the query. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. Create a for at most terms. return the maximum priority queue size. NOTE: This was size() in Lucene. Abstract class that defines how the query is rewritten. LUCENENET specific interface for referring to/identifying a without referring to its generic closing type. Expert: Gets or Sets the rewrite method. This only makes sense to be a span rewrite method. Returns the wrapped query Only return those matches that have a specific payload at the given position. The underlying to check. The of payloads to match. IMPORTANT: If the type provided does not implement (including arrays) or , it should either implement or override and with implementations that compare the values of the byte arrays to ensure they are the same. Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. Construct a . Matches spans matching a span from each clause, with up to total unmatched positions between them. * When is true, the spans from each clause must be * ordered as in . The clauses to find near each other The slop value true if order is important Return the clauses whose spans are matched. Return the maximum number of intervening unmatched positions permitted. Return true if matches are required to be in-order. Returns true iff o is equal to this. Removes matches which overlap with another or within a x tokens before or y tokens after another . Construct a matching spans from which have no overlap with spans from . Construct a matching spans from which have no overlap with spans from within tokens of . Construct a matching spans from which have no overlap with spans from within tokens before or tokens of . Return the whose matches are filtered. Return the whose matches must not overlap those returned. Returns true if is equal to this. Matches the union of its clauses. Construct a merging the provided . Adds a to this query Return the clauses whose spans are matched. Only return those matches that have a specific payload at the given position. Do not use this with a that contains a . Instead, use since it properly handles the fact that payloads aren't ordered by . The underlying to check The of payloads to match. 
IMPORTANT: If the type provided does not implement (including arrays) or , it should either implement or override and with implementations that compare the values of the byte arrays to ensure they are the same. Base class for filtering a based on the position of a match. The whose matches are filtered. Return value for . Indicates the match should be accepted Indicates the match should be rejected Indicates the match should be rejected, and the enumeration should advance to the next document. Implementing classes are required to return whether the current position is a match for the passed in "match" . This is only called if the underlying for the match is successful The instance, positioned at the spot to check Whether the match is accepted, rejected, or rejected and should move to the next doc. Checks to see if the lies between a start and end position for a derivation that is optimized for the case where start position is 0 The minimum position permitted in a match The maximum end position permitted in a match. Base class for span-based queries. Expert: Returns the matches for this query in an index. Used internally to search for spans. Returns the name of the field matched by this query. Note that this may return null if the query matches no terms. Expert: an enumeration of span matches. Used to implement span searching. Each span represents a range of term positions within a document. Matches are enumerated in order, by increasing document number, within that by increasing start position and finally by increasing end position. Move to the next match, returning true if any such exists. Move to the next match, returning true if any such exists. Skips to the first match beyond the current, whose document number is greater than or equal to target. The behavior of this method is undefined when called with target <= current, or after the iterator has exhausted. Both cases may result in unpredicted behavior. Returns true if there is such a match. Behaves as if written: bool SkipTo(int target) { do { if (!Next()) return false; } while (target > Doc); return true; } Most implementations are considerably more efficient than that. Returns the document number of the current match. Initially invalid. Returns the start position of the current match. Initially invalid. Returns the end position of the current match. Initially invalid. Returns the payload data for the current span. this is invalid until is called for the first time. This method must not be called more than once after each call of . However, most payloads are loaded lazily, so if the payload data for the current position is not needed, this method may not be called at all for performance reasons. An ordered SpanQuery does not lazy load, so if you have payloads in your index and you do not want ordered SpanNearQuerys to collect payloads, you can disable collection with a constructor option. Note that the return type is a collection, thus the ordering should not be relied upon. @lucene.experimental A of byte arrays containing the data of this payload, otherwise null if is false if there is a low-level I/O error Checks if a payload can be loaded at this position. Payloads can only be loaded once per call to . true if there is a payload available at this position that can be loaded Returns the estimated cost of this spans. This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate. Public for extension only. 
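As a minimal sketch of composing these span queries (the field name, terms, and searcher are hypothetical): find "quick" within three positions of "fox", in order, with the whole match required to end within the first ten positions of the field:

SpanQuery quick = new SpanTermQuery(new Term("body", "quick"));
SpanQuery fox = new SpanTermQuery(new Term("body", "fox"));
SpanQuery near = new SpanNearQuery(new SpanQuery[] { quick, fox }, 3, true); // slop = 3, in order
SpanQuery first = new SpanFirstQuery(near, 10); // match must end at position <= 10
TopDocs hits = searcher.Search(first, 10);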
Returns the intermediate "sloppy freq" adjusted for edit distance @lucene.internal Matches spans containing a term. Construct a matching the named term's spans. Return the term whose spans are matched. Expert-only. Public for use by other weight implementations Expert: Public for extension only Return a suitable top-level for holding all expanded terms. Add a term to the top-level query attributes used for communication with the enum return false to stop collecting the next segment's that is used to collect terms A that matches documents containing a term. this may be combined with other terms with a . Returns a positioned at this weights or null if the term does not exist in the given context. Constructs a query for the term . Expert: constructs a that will use the provided instead of looking up the docFreq against the searcher. Expert: constructs a that will use the provided docFreq instead of looking up the docFreq against the searcher. Returns the term of this query. Prints a user-readable version of this query. Returns true if is equal to this. Returns a hash code value for this object. A that restricts search results to a range of term values in a given field. This filter matches the documents looking for terms that fall into the supplied range according to , It is not intended for numerical ranges; use instead. If you construct a large number of range filters with different ranges but on the same field, may have significantly better performance. @since 2.9 The field this range applies to The lower bound on this range The upper bound on this range Does this range include the lower bound? Does this range include the upper bound? if both terms are null or if lowerTerm is null and includeLower is true (similar for upperTerm and includeUpper) Factory that creates a new using s for term text. Constructs a filter for field matching less than or equal to . Constructs a filter for field matching greater than or equal to . Returns the lower value of this range filter Returns the upper value of this range filter Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive A that matches documents within an range of terms. This query matches the documents looking for terms that fall into the supplied range according to . It is not intended for numerical ranges; use instead. This query uses the rewrite method. @since 2.9 Constructs a query selecting all terms greater/equal than but less/equal than . If an endpoint is null, it is said to be "open". Either or both endpoints may be open. Open endpoints may not be exclusive (you can't select all but the first or last term without explicitly specifying the term to exclude.) The field that holds both lower and upper terms. The term text at the lower end of the range. The term text at the upper end of the range. If true, the is included in the range. If true, the is included in the range. Factory that creates a new using s for term text. Returns the lower value of this range query Returns the upper value of this range query Returns true if the lower endpoint is inclusive Returns true if the upper endpoint is inclusive Prints a user-readable version of this query. Subclass of for enumerating all terms that match the specified range parameters. Term enumerations are always ordered by . Each term in the enumeration is greater than all that precede it. Enumerates all terms greater/equal than but less/equal than . If an endpoint is null, it is said to be "open". Either or both endpoints may be open. 
Open endpoints may not be exclusive (you can't select all but the first or last term without explicitly specifying the term to exclude.) TermsEnum to filter The term text at the lower end of the range The term text at the upper end of the range If true, the is included in the range. If true, the is included in the range. Expert: A for documents matching a . Construct a . The weight of the in the query. An iterator over the documents matching the . The implementation to be used for score computations. Advances to the next document matching the query. The document matching the query or if there are no more documents. Advances to the first match beyond the current whose document number is greater than or equal to a given target. The implementation uses . The target document number. The matching document or if none exist. Returns a string representation of this . Contains statistics for a specific term @lucene.experimental Sole constructor. Returns the term text Returns the number of documents this term occurs in Returns the total number of occurrences of this term The is used to timeout search requests that take longer than the maximum allowed search time limit. After this time is exceeded, the search thread is stopped by throwing a . Thrown when elapsed search time exceeds allowed search time. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Returns allowed time (milliseconds). Returns elapsed time (milliseconds). Returns last doc (absolute doc id) that was collected when the search time exceeded. Create a wrapper over another with a specified timeout. The wrapped The timer clock Max time allowed for collecting hits after which is thrown Sets the baseline for this collector. By default the collectors baseline is initialized once the first reader is passed to the collector. To include operations executed in prior to the actual document collection set the baseline through this method in your prelude. Example usage: // Counter is in the Lucene.Net.Util namespace Counter clock = Counter.NewCounter(true); long baseline = clock.Get(); // ... prepare search TimeLimitingCollector collector = new TimeLimitingCollector(c, clock, numTicks); collector.SetBaseline(baseline); indexSearcher.Search(query, collector); Syntactic sugar for using on the clock passed to the constructor. Checks if this time limited collector is greedy in collecting the last hit. A non greedy collector, upon a timeout, would throw a without allowing the wrapped collector to collect current doc. A greedy one would first allow the wrapped hit collector to collect current doc and only then throw a . Calls on the decorated unless the allowed time has passed, in which case it throws an exception. If the time allowed has exceeded. This is so the same timer can be used with a multi-phase search process such as grouping. We don't want to create a new for each phase because that would reset the timer for each phase. Once time is up subsequent phases need to timeout quickly. The actual collector performing search functionality. Returns the global 's Invoking this creates may create a new instance of iff the global has never been accessed before. The thread returned from this method is started on creation and will be alive unless you stop the via . @lucene.experimental the global TimerThreads Returns the global . 
Invoking this creates may create a new instance of iff the global has never been accessed before. The thread returned from this method is started on creation and will be alive unless you stop the via . @lucene.experimental the global Thread used to timeout search requests. Can be stopped completely with @lucene.experimental Get the timer value in milliseconds. Stops the timer thread Return the timer resolution. Represents hits returned by and . The total number of hits for the query. The top hits for the query. Stores the maximum score value encountered, needed for normalizing. Returns the maximum score value encountered. Note that in case scores are not tracked, this returns . Constructs a with a default maxScore=System.Single.NaN. Returns a new , containing results across the provided , sorting by the specified . Each of the must have been sorted by the same , and sort field values must have been filled (ie, fillFields=true must be passed to . Pass =null to merge sort by score descending. @lucene.experimental is null. Same as but also slices the result at the same time based on the provided start and size. The return TopDocs will always have a scoreDocs with length of at most . is null. A base class for all collectors that return a output. This collector allows easy extension by providing a single constructor which accepts a as well as protected members for that priority queue and a counter of the number of total hits. Extending classes can override any of the methods to provide their own implementation, as well as avoid the use of the priority queue entirely by passing null to . In that case however, you might want to consider overriding all methods, in order to avoid a . This is used in case is called with illegal parameters, or there simply aren't (enough) results. The priority queue which holds the top documents. Note that different implementations of give different meaning to 'top documents'. for example aggregates the top scoring documents, while other priority queue implementations may hold documents sorted by other criteria. The total number of documents that the collector encountered. Sole constructor. Populates the results array with the instances. This can be overridden in case a different type should be returned. Returns a instance containing the given results. If is null it means there are no results to return, either because there were 0 calls to or because the arguments to were invalid. The total number of documents that matched this query. The number of valid priority queue entries Returns the top docs that were collected by this collector. Returns the documents in the rage [ .. pq.Count) that were collected by this collector. Note that if >= pq.Count, an empty is returned. This method is convenient to call if the application always asks for the last results, starting from the last 'page'. NOTE: you cannot call this method more than once for each search execution. If you need to call it more than once, passing each time a different , you should call and work with the returned object, which will contain all the results this search execution collected. Returns the documents in the rage [ .. +) that were collected by this collector. Note that if >= pq.Count, an empty is returned, and if pq.Count - < , then only the available documents in [ .. pq.Count) are returned. This method is useful to call in case pagination of search results is allowed by the search application, as well as it attempts to optimize the memory used by allocating only as much as requested by . 
NOTE: you cannot call this method more than once for each search execution. If you need to call it more than once, passing each time a different range, you should call and work with the returned object, which will contain all the results this search execution collected. Called before successive calls to . Implementations that need the score of the current document (passed-in to ), should save the passed-in and call when needed. Called once for every document matching a query, with the unbased document number. Note: The collection of the current segment can be terminated by throwing a . In this case, the last docs of the current will be skipped and will swallow the exception and continue collection with the next leaf. Note: this is called in an inner search loop. For good search performance, implementations of this method should not call or on every hit. Doing so can slow searches by an order of magnitude or more. Called before collecting from each . All doc ids in will correspond to . Add to the current 's internal document id to re-base ids in . Next atomic reader context Return true if this collector does not require the matching docIDs to be delivered in int sort order (smallest to largest) to . Most Lucene Query implementations will visit matching docIDs in order. However, some queries (currently limited to certain cases of ) can achieve faster searching if the allows them to deliver the docIDs out of order. Many collectors don't mind getting docIDs out of order, so it's important to return true here. LUCENENET specific interface used to reference without referencing its generic type. The total number of documents that matched this query. Returns the top docs that were collected by this collector. Returns the documents in the rage [ .. pq.Count) that were collected by this collector. Note that if >= pq.Count, an empty is returned. This method is convenient to call if the application always asks for the last results, starting from the last 'page'. NOTE: you cannot call this method more than once for each search execution. If you need to call it more than once, passing each time a different , you should call and work with the returned object, which will contain all the results this search execution collected. Returns the documents in the rage [ .. +) that were collected by this collector. Note that if >= pq.Count, an empty is returned, and if pq.Count - < , then only the available documents in [ .. pq.Count) are returned. This method is useful to call in case pagination of search results is allowed by the search application, as well as it attempts to optimize the memory used by allocating only as much as requested by . NOTE: you cannot call this method more than once for each search execution. If you need to call it more than once, passing each time a different range, you should call and work with the returned object, which will contain all the results this search execution collected. A that sorts by using s. See the method for instantiating a . @lucene.experimental Implements a over one criteria, without tracking document scores and maxScore. Implements a over one criteria, without tracking document scores and maxScore, and assumes out of orderness in doc Ids collection. Implements a over one criteria, while tracking document scores but no maxScore. Implements a over one criteria, while tracking document scores but no maxScore, and assumes out of orderness in doc Ids collection. Implements a over one criteria, with tracking document scores and maxScore. 
Implements a over one criteria, with tracking document scores and maxScore, and assumes out of orderness in doc Ids collection. Implements a over multiple criteria, without tracking document scores and maxScore. Implements a over multiple criteria, without tracking document scores and maxScore, and assumes out of orderness in doc Ids collection. Implements a over multiple criteria, with tracking document scores and maxScore. Implements a over multiple criteria, with tracking document scores and maxScore, and assumes out of orderness in doc Ids collection. Implements a over multiple criteria, with tracking document scores and maxScore. Implements a over multiple criteria, with tracking document scores and maxScore, and assumes out of orderness in doc Ids collection. Implements a when after != null. Stores the maximum score value encountered, needed for normalizing. If document scores are not tracked, this value is initialized to NaN. Creates a new from the given arguments. NOTE: The instances returned by this method pre-allocate a full array of length . The sort criteria (s). The number of results to collect. Specifies whether the actual field values should be returned on the results (). Specifies whether document scores should be tracked and set on the results. Note that if set to false, then the results' scores will be set to . Setting this to true affects performance, as it incurs the score computation on each competitive result. Therefore if document scores are not required by the application, it is recommended to set it to false. Specifies whether the query's should be tracked and set on the resulting . Note that if set to false, returns . Setting this to true affects performance as it incurs the score computation on each result. Also, setting this true automatically sets to true as well. Specifies whether documents are scored in doc Id order or not by the given in . A instance which will sort the results by the sort criteria. If there is a low-level I/O error is null. Creates a new from the given arguments. NOTE: The instances returned by this method pre-allocate a full array of length . The sort criteria (s). The number of results to collect. Only hits after this will be collected Specifies whether the actual field values should be returned on the results (). Specifies whether document scores should be tracked and set on the results. Note that if set to false, then the results' scores will be set to . Setting this to true affects performance, as it incurs the score computation on each competitive result. Therefore if document scores are not required by the application, it is recommended to set it to false. Specifies whether the query's maxScore should be tracked and set on the resulting . Note that if set to false, returns . Setting this to true affects performance as it incurs the score computation on each result. Also, setting this true automatically sets to true as well. Specifies whether documents are scored in doc Id order or not by the given in . A instance which will sort the results by the sort criteria. If there is a low-level I/O error is null. Represents hits returned by . The fields which were used to sort results by. Creates one of these objects. Total number of hits for the query. The top hits for the query. The sort criteria used to find the top hits. The maximum score encountered. A implementation that collects the top-scoring hits, returning them as a . this is used by to implement -based search. 
Hits are sorted by score descending and then (when the scores are tied) docID ascending. When you create an instance of this collector you should know in advance whether documents are going to be collected in doc Id order or not. NOTE: The values and are not valid scores. This collector will not properly collect hits with such scores. Creates a new given the number of hits to collect and whether documents are scored in order by the input to . NOTE: The instances returned by this method pre-allocate a full array of length , and fill the array with sentinel objects. Creates a new given the number of hits to collect, the bottom of the previous page, and whether documents are scored in order by the input to . NOTE: The instances returned by this method pre-allocate a full array of length , and fill the array with sentinel objects. Base rewrite method for collecting only the top terms via a priority queue. @lucene.internal - Only public to be accessible by spans package. Create a for at most terms. NOTE: if is smaller than , then it will be used instead. Return the maximum priority queue size. NOTE: This was size() in Lucene. Return the maximum size of the priority queue (for boolean rewrites this is ). Just counts the total number of hits. Returns how many hits matched the search. Expert: Calculate query weights and build query scorers. The purpose of is to ensure searching does not modify a , so that a instance can be reused. dependent state of the query should reside in the . dependent state should reside in the . Since creates instances for a given () callers must maintain the relationship between the searcher's top-level and the context used to create a . A is used in the following way: A is constructed by a top-level query, given a (). The method is called on the to compute the query normalization factor of the query clauses contained in the query. The query normalization factor is passed to . At this point the weighting is complete. A is constructed by . @since 2.9 An explanation of the score computation for the named document. The readers context to create the for. The document's id relative to the given context's reader An for the score if an occurs The query that this concerns. The value for normalization of contained query clauses (e.g. sum of squared weights). Assigns the query normalization factor and boost from parent queries to this. Returns a which scores documents in/out-of order according to scoreDocsInOrder. NOTE: even if scoreDocsInOrder is false, it is recommended to check whether the returned indeed scores documents out of order (i.e., call ), as some implementations will always return documents in-order. NOTE: null can be returned if no documents will be scored by this query. The for which to return the . that represent the allowable docs to match (typically deleted docs but possibly filtering other documents) A which scores documents in/out-of order. if there is a low-level I/O error Optional method, to return a to score the query and send hits to a . Only queries that have a different top-level approach need to override this; the default implementation pulls a normal and iterates and collects the resulting hits. The for which to return the . Specifies whether in-order scoring of documents is required. Note that if set to false (i.e., out-of-order scoring is required), this method can return whatever scoring mode it supports, as every in-order scorer is also an out-of-order one. 
However, an out-of-order scorer may not support and/or , therefore it is recommended to request an in-order scorer if use of these methods is required. that represent the allowable docs to match (typically deleted docs but possibly filtering other documents) A which scores documents and passes them to a collector. if there is a low-level I/O error Just wraps a and performs top scoring using it. Returns true if this implementation scores docs only out of order. This method is used in conjunction with 's and to create a matching instance for a given , or vice versa. NOTE: the default implementation returns false, i.e. the scores documents in-order. Implements the wildcard search query. Supported wildcards are *, which matches any character sequence (including the empty one), and ?, which matches any single character. '\' is the escape character. Note this query can be slow, as it needs to iterate over many terms. In order to prevent extremely slow WildcardQueries, a Wildcard term should not start with the wildcard * This query uses the rewrite method. String equality with support for wildcards Char equality with support for wildcards Escape character Constructs a query for terms matching . Convert Lucene wildcard syntax into an automaton. @lucene.internal Returns the pattern term. Prints a user-readable version of this query. holds a reference instance and ensures it is properly de-referenced from its corresponding when is called. This struct is intended to be used with a using block to simplify releasing a reference such as a instance. LUCENENET specific The reference type. The reference acquired from the . Ensures the reference is properly de-referenced from its . After this call, will be null. Obtain the current reference. Like , but intended for use in a using block so calling happens implicitly. For example: var searcherManager = new SearcherManager(indexWriter, true, null); using (var context = searcherManager.GetContext()) { IndexSearcher searcher = context.Reference; // use searcher... } The reference type this A instance that holds the and ensures it is released properly when is called. Base implementation for a concrete . @lucene.experimental Gets a value indicating whether the current instance is open. Expert: This is useful for implementing the logic. Atomically sets the value to the given updated value if the current value == the expected value. Expert: Use this in the call to skip duplicate calls by using the folling if block to guard the dispose logic. protected override void Dispose(bool disposing) { if (!CompareAndSetIsOpen(expect: true, update: false)) return; // Dispose unmanaged resources if (disposing) { // Dispose managed resources } } The expected value (the comparand). The new value. true if successful. A false return value indicates that the actual value was not equal to the expected value. Holds the LockFactory instance (implements locking for this instance). Sole constructor. Wraps another with an internal buffer to speed up checksum calculations. Default buffer size: 256 Create a new with Create a new with the specified Simple implementation of that wraps another input and delegates calls. Creates a new Base implementation class for buffered . Default buffer size set to . A buffer size for merges set to . Inits with a specific Change the buffer size used by this Returns buffer size. 
NOTE: this was readShort() in Lucene NOTE: this was readInt() in Lucene NOTE: this was readLong() in Lucene NOTE: this was readVInt() in Lucene NOTE: this was readVLong() in Lucene Expert: implements buffer refill. Reads bytes from the current position in the input. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Expert: implements seek. Sets current position in this file, where the next will occur. Flushes the in-memory buffer to the given output, copying at most . NOTE: this method does not refill the buffer, however it does advance the buffer position. the number of bytes actually flushed from the in-memory buffer. Returns default buffer sizes for the given Base implementation class for buffered . The default buffer size in bytes (). Creates a new with the default buffer size ( bytes see ) Creates a new with the given buffer size. the buffer size in bytes used to buffer writes internally. if the given buffer size is less or equal to 0 Expert: implements buffer write. Writes bytes at the current position in the output. the bytes to write the number of bytes to write Expert: implements buffer write. Writes bytes at the current position in the output. the bytes to write the offset in the byte array the number of bytes to write Returns size of the used output buffer in bytes. DataInput backed by a byte array. WARNING: this class omits all low-level checks. @lucene.experimental NOTE: sets pos to 0, which is not right if you had called reset w/ non-zero offset!! LUCENENET NOTE: Important - always cast to ushort (System.UInt16) before using to ensure the value is positive! NOTE: this was readShort() in Lucene NOTE: this was readInt() in Lucene NOTE: this was readLong() in Lucene NOTE: this was readVInt() in Lucene NOTE: this was readVLong() in Lucene DataOutput backed by a byte array. WARNING: this class omits most low-level checks, so be sure to test heavily with assertions enabled. @lucene.experimental NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. Base implementation that uses an array of s to represent a file. Because Java's uses an to address the values, it's necessary to access a file greater in size using multiple byte buffers. For efficiency, this class requires that the buffers are a power-of-two (chunkSizePower). NOTE: this was readShort() in Lucene NOTE: this was readInt() in Lucene NOTE: this was readLong() in Lucene Creates a slice of this index input, with the given description, offset, and length. The slice is seeked to the beginning. Returns a sliced view from a set of already-existing buffers: the last buffer's will be correct, but you must deal with separately (the first buffer will not be adjusted) Called when the contents of a buffer will be no longer needed. Extension of , computing checksum as it goes. Callers can retrieve the checksum via . should be a non-null, opaque string describing this resource; it's returned from . 
Returns the current checksum value Sets current position in this file, where the next read will occur. can only seek forward and seeks are expensive since they imply to read bytes in-between the current position and the target position in order to update the checksum. Class for accessing a compound stream. This class implements a directory, but is limited to only read operations. Directory methods that would normally modify data throw an exception. All files belonging to a segment have the same name with varying extensions. The extensions correspond to the different file formats used by the . When using the Compound File format these files are collapsed into a single .cfs file (except for the , with a corresponding .cfe file indexing its sub-files. Files: .cfs: An optional "virtual" file consisting of all the other index files for systems that frequently run out of file handles. .cfe: The "virtual" compound file's entry table holding all entries in the corresponding .cfs file. Description: Compound (.cfs) --> Header, FileData FileCount Compound Entry Table (.cfe) --> Header, FileCount, <FileName, DataOffset, DataLength> FileCount, Footer Header --> FileCount --> DataOffset,DataLength --> FileName --> FileData --> raw file data Footer --> Notes: FileCount indicates how many files are contained in this compound file. The entry table that follows has that many entries. Each directory entry contains a long pointer to the start of this file's data section, the files length, and a with that file's name. @lucene.experimental Offset/Length for a slice inside of a compound file Create a new . Helper method that reads CFS entries from an input stream Returns an array of strings, one for each file in the directory. Returns true iff a file with the given name exists. Not implemented always: not supported by CFS Not implemented always: not supported by CFS Returns the length of a file in the directory. if the file does not exist Not implemented always: not supported by CFS Combines multiple files into a single compound file. @lucene.internal source file temporary holder for the start of this file's data section the directory which contains the file. Create the compound stream in the specified file. The file name is the entire name (no extensions are added). if or is null Returns the directory of the compound file. Returns the name of the compound file. Disposes all resources and writes the entry table if had been called before or if no file has been added to this object Copy the contents of the file with specified extension into the provided output stream. Abstract base class for performing read operations of Lucene's low-level data types. may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every instance must be cloned before used in another thread. Subclasses must therefore implement , returning a new which operates on the same underlying resource, but positioned independently. This buffer is used to skip over bytes with the default implementation of skipBytes. The reason why we need to use an instance member instead of sharing a single instance across threads is that some delegating implementations of DataInput might want to reuse the provided buffer in order to eg.update the checksum. If we shared the same buffer across threads, then another thread might update the buffer while the checksum is being computed, making it invalid. See LUCENE-5583 for more information. Reads and returns a single byte. 
Reads a specified number of bytes into an array at the specified offset. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Reads a specified number of bytes into an array at the specified offset with control over whether the read should be buffered (callers who have their own buffer should pass in "false" for ). Currently only respects this parameter. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read set to false if the caller will handle buffering. Reads two bytes and returns a . LUCENENET NOTE: Important - always cast to ushort (System.UInt16) before using to ensure the value is positive! NOTE: this was readShort() in Lucene Reads four bytes and returns an . NOTE: this was readInt() in Lucene Reads an stored in variable-length format. Reads between one and five bytes. Smaller values take fewer bytes. Negative numbers are not supported. The format is described further in . NOTE: this was readVInt() in Lucene Reads eight bytes and returns a . NOTE: this was readLong() in Lucene Reads a stored in variable-length format. Reads between one and nine bytes. Smaller values take fewer bytes. Negative numbers are not supported. The format is described further in . NOTE: this was readVLong() in Lucene Reads a . Returns a clone of this stream. Clones of a stream access the same data, and are positioned at the same point as the stream they were cloned from. Expert: Subclasses must ensure that clones may be positioned at different points in the input from each other and from the stream they were cloned from. Reads a IDictionary<string,string> previously written with . Reads a ISet<string> previously written with . Skip over bytes. The contract on this method is that it should have the same behavior as reading the same number of bytes into a buffer and discarding its content. Negative values of are not supported. Abstract base class for performing write operations of Lucene's low-level data types. may only be used from one thread, because it is not thread safe (it keeps internal state like file position). Writes a single byte. The most primitive data type is an eight-bit byte. Files are accessed as sequences of bytes. All other data types are defined as sequences of bytes, so file formats are byte-order independent. Writes an array of bytes. the bytes to write the number of bytes to write Writes an array of bytes. the bytes to write the offset in the byte array the number of bytes to write Writes an as four bytes. 32-bit unsigned integer written as four bytes, high-order bytes first. NOTE: this was writeInt() in Lucene Writes a short as two bytes. NOTE: this was writeShort() in Lucene Writes an in a variable-length format. Writes between one and five bytes. Smaller values take fewer bytes. Negative numbers are supported, but should be avoided. VByte is a variable-length format for positive integers is defined where the high-order bit of each byte indicates whether more bytes remain to be read. The low-order seven bits are appended as increasingly more significant bits in the resulting integer value. Thus values from zero to 127 may be stored in a single byte, values from 128 to 16,383 may be stored in two bytes, and so on. VByte Encoding Example Value Byte 1 Byte 2 Byte 3 0 00000000 1 00000001 2 00000010 ... 127 01111111 128 10000000 00000001 129 10000001 00000001 130 10000010 00000001 ... 16,383 11111111 01111111 16,384 10000000 10000000 00000001 16,385 10000001 10000000 00000001 ... 
this provides compression while still being efficient to decode. NOTE: this was writeVInt() in Lucene Smaller values take fewer bytes. Negative numbers are supported, but should be avoided. If there is an I/O error writing to the underlying medium. Writes a as eight bytes. 64-bit unsigned integer written as eight bytes, high-order bytes first. NOTE: this was writeLong() in Lucene Writes an in a variable-length format. Writes between one and nine bytes. Smaller values take fewer bytes. Negative numbers are not supported. The format is described further in . NOTE: this was writeVLong() in Lucene Writes a string. Writes strings as UTF-8 encoded bytes. First the length, in bytes, is written as a , followed by the bytes. Copy numBytes bytes from input to ourself. Writes a . First the size is written as an , followed by each key-value pair written as two consecutive s. Input . May be null (equivalent to an empty dictionary) Writes a set. First the size is written as an , followed by each value written as a . Input . May be null (equivalent to an empty set) A is a flat list of files. Files may be written once, when they are created. Once a file is created it may only be opened for read, or deleted. Random access is permitted both when reading and writing. .NET's i/o APIs not used directly, but rather all i/o is through this API. This permits things such as: implementation of RAM-based indices; implementation indices stored in a database; implementation of an index as a single file; Directory locking is implemented by an instance of , and can be changed for each instance using . Returns an array of strings, one for each file in the directory. if the directory is not prepared for any write operations (such as ). in case of other IO errors Returns true iff a file with the given name exists. Removes an existing file in the directory. Returns the length of a file in the directory. this method follows the following contract: Throws if the file does not exist. Returns a value >=0 if the file exists, which specifies its length. the name of the file for which to return the length. if there was an IO error while retrieving the file's length. Creates a new, empty file in the directory with the given name. Returns a stream writing this file. Ensure that any writes to these files are moved to stable storage. Lucene uses this to properly commit changes to the index, to prevent a machine/OS crash from corrupting the index.

    NOTE: Clients may call this method for the same files over and over again, so some impls might optimize for that. For other impls the operation can be a noop, for various reasons.
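As a rough illustration of the contract described above (files are written once, moved to stable storage with Sync, and later opened read-only), here is a hedged sketch using the public Directory API; the file name and index path are hypothetical.

```csharp
using System.IO;
using Lucene.Net.Store;
using Directory = Lucene.Net.Store.Directory; // disambiguate from System.IO.Directory

public static class DirectorySketch
{
    public static void RoundTrip()
    {
        // Open a file-system based Directory (path is hypothetical).
        Directory dir = FSDirectory.Open(new DirectoryInfo("/path/to/index"));

        // Files are written once through an IndexOutput...
        using (IndexOutput output = dir.CreateOutput("demo.bin", IOContext.DEFAULT))
        {
            output.WriteVInt32(42);
            output.WriteString("hello");
        }

        // ...then moved to stable storage before being relied upon...
        dir.Sync(new[] { "demo.bin" });

        // ...and later opened read-only through an IndexInput.
        using (IndexInput input = dir.OpenInput("demo.bin", IOContext.DEFAULT))
        {
            int number = input.ReadVInt32();
            string text = input.ReadString();
        }
    }
}
```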
Returns a stream reading an existing file, with the specified read buffer size. The particular implementation may ignore the buffer size. Currently the only implementations that respect this parameter are and . Throws if the file does not exist. Returns a stream reading an existing file, computing checksum as it reads Construct a . the name of the lock file Attempt to clear (forcefully unlock and remove) the specified lock. Only call this at a time when you are certain this lock is no longer in use. name of the lock to be cleared. Disposes the store. Disposes the store. Set the that this instance should use for its locking implementation. Each instance of should only be used for one directory (i.e., do not share a single instance across multiple Directories). instance of . Get the that this instance is using for its locking implementation. Note that this may be null for implementations that provide their own locking implementation. Return a string identifier that uniquely differentiates this instance from other instances. This ID should be the same if two instances (even in different AppDomains and/or on different machines) are considered "the same index". This is how locking "scopes" to the right index. Copies the file to under the new file name . If you want to copy the entire source directory to the destination one, you can do so like this: Directory to; // the directory to copy to foreach (string file in dir.ListAll()) { dir.Copy(to, file, newFile, IOContext.DEFAULT); // newFile can be either file, or a new name } NOTE: this method does not check whether exists and will overwrite it if it does. Creates an for the given file name. allows other implementations to efficiently open one or more sliced instances from a single file handle. The underlying file handle is kept open until the is closed. Throws if the file does not exist. @lucene.internal @lucene.experimental if an occurs if this Directory is closed Allows creating one or more sliced instances from a single file handle. Some implementations may be able to efficiently map slices of a file into memory when only certain parts of a file are required. @lucene.internal @lucene.experimental Returns an slice starting at the given offset with the given length. Returns an slice starting at offset 0 with a length equal to the length of the underlying file Implementation of an that reads from a portion of a file. Expert: implements buffer refill. Reads bytes from the current position in the input. the array to read bytes into the offset in the array to start storing bytes the number of bytes to read Expert: implements seek. Sets current position in this file, where the next will occur. Closes the stream to further operations. Expert: A instance that switches files between two other instances. Files with the specified extensions are placed in the primary directory; others are placed in the secondary directory. The provided must not change once passed to this class, and must allow multiple threads to call contains at once. @lucene.experimental Return the primary directory Return the secondary directory Utility method to return a file's extension. Directory implementation that delegates calls to another directory. This class can be used to add limitations on top of an existing implementation such as rate limiting () or to add additional sanity checks for tests. However, if you plan to write your own implementation, you should consider extending directly or rather than trying to reuse functionality of existing s by extending this class.
@lucene.internal Sole constructor, typically called from sub-classes. Return the wrapped . A FlushInfo provides information required for a FLUSH context. It is used as part of an in case of FLUSH context. Creates a new instance from the values required for a FLUSH context. These values are only estimates and are not the actual values. Base class for implementations that store index files in the file system. There are currently three core subclasses: is a straightforward implementation using , which is ideal for writing without using much RAM. However, it has poor concurrent performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. uses 's positional seeking, which makes it slightly less efficient than using during reading, with similar write performance. uses memory-mapped IO when reading. This is a good choice if you have plenty of virtual memory relative to your index size, eg if you are running on a 64 bit runtime, or you are running on a 32 bit runtime but your index sizes are small enough to fit into the virtual memory space. Unfortunately, because of system peculiarities, there is no single overall best implementation. Therefore, we've added the method (or one of its overloads), to allow Lucene to choose the best implementation given your environment, and the known limitations of each implementation. For users who have no reason to prefer a specific implementation, it's best to simply use (or one of its overloads). For all others, you should instantiate the desired implementation directly. The locking implementation is by default , but can be changed by passing in a custom instance. NOTE: Unlike in Java, it is not recommended to use in .NET in conjunction with an open because it is not guaranteed to exit atomically. Any lock statement or call can throw a , which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using s and "interrupt" them with s. Default read chunk size: 8192 bytes (this is the size up to which the runtime does not allocate additional arrays while reading/writing) The collection of stale files that need to be 'ed LUCENENET NOTE: This is a non-thread-safe collection so that we can synchronize access to it using the field. This is to prevent race conditions, i.e. one thread adding a file to the collection while another thread is trying to sync the files, which could cause a missed sync. If you need to access this collection from a derived type, you should synchronize access to it using the protected field. A object to synchronize access to the collection. You should synchronize access to using this object from derived types. Create a new for the named location (ctor for subclasses). the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Creates an instance, trying to pick the best implementation given the current environment. The directory returned uses the . Currently this returns for most Solaris and Windows 64-bit runtimes, for other non-Windows runtimes, and for other runtimes on Windows. It is highly recommended that you consult the implementation's documentation for your platform before using this method. NOTE: this method may suddenly change which implementation is returned from release to release, in the event that higher performance defaults become possible; if the precise implementation is important to your application, please instantiate it directly, instead. 
For optimal performance you should consider using on 64 bit runtimes. See . Just like , but allows you to specify the directory as a . The path (to a directory) to open An open Just like , but allows you to also specify a custom . Just like , but allows you to specify the directory as a . The path (to a directory) to open An open Lists all files (not subdirectories) in the directory. This method never returns null (throws instead). if the directory does not exist, or does exist but is not a directory or is invalid (for example, it is on an unmapped drive). The caller does not have the required permission. Lists all files (not subdirectories) in the directory. Returns true iff a file with the given name exists. Returns the length in bytes of a file in the directory. Removes an existing file in the directory. Creates an for the file with the given name. Closes the store to future operations. the underlying filesystem directory For debug output. this setting has no effect anymore. Writes output with Random-access methods Base class for file system based locking implementation. Directory for the lock files. Set the lock directory. This property can be only called once to initialize the lock directory. It is used by to set the lock directory to itself. Subclasses can also use this property to set the directory in the constructor. Gets the lock directory. Abstract base class for input from a file in a . A random-access input stream. Used for all Lucene index input operations. may only be used from one thread, because it is not thread safe (it keeps internal state like file position). To allow multithreaded use, every instance must be cloned before used in another thread. Subclasses must therefore implement , returning a new which operates on the same underlying resource, but positioned independently. Lucene never closes cloned s, it will only do this on the original one. The original instance must take care that cloned instances throw when the original one is closed. should be a non-null, opaque string describing this resource; it's returned from . Closes the stream to further operations. Closes the stream to further operations. Returns the current position in this file, where the next read will occur. This was getFilePointer() in Lucene. Sets current position in this file, where the next read will occur. The number of bytes in the file. Returns the resourceDescription that was passed into the constructor. Returns a clone of this stream. Clones of a stream access the same data, and are positioned at the same point as the stream they were cloned from. Expert: Subclasses must ensure that clones may be positioned at different points in the input from each other and from the stream they were cloned from. Warning: Lucene never closes cloned s, it will only do this on the original one. The original instance must take care that cloned instances throw when the original one is closed. Abstract base class for output to a file in a . A random-access output stream. Used for all Lucene index output operations. may only be used from one thread, because it is not thread safe (it keeps internal state like file position). Forces any buffered output to be written. Closes this stream to further operations. Closes this stream to further operations. Returns the current position in this file, where the next write will occur. This was getFilePointer() in Lucene. Sets current position in this file, where the next write will occur. Returns the current checksum of bytes written so far Gets or Sets the file length. 
By default, this property's setter does nothing (it's optional for a to implement it). But, certain implementations (for example ) can use this to inform the underlying IO system to pre-allocate the file to the specified size. If the length is longer than the current file length, the bytes added to the file are undefined. Otherwise the file is truncated. A wrapping a plain . holds additional details on the merge/search context. A object can never be null when passed as a parameter to either or is an enumeration which specifies the context in which the is being used. NOTE: This was Context in Lucene A setting This constructor is used to initialize a instance with a new value for the property. object whose information is used to create the new instance except the property. The new object will use this value for . An interprocess mutex lock. Typical use might look like: var result = Lock.With.NewAnonymous<string>( @lock: directory.MakeLock("my.lock"), lockWaitTimeout: Lock.LOCK_OBTAIN_WAIT_FOREVER, doBody: () => { //... code to execute while locked ... return "the result"; }).Run(); How long waits, in milliseconds, in between attempts to acquire the lock. Pass this value to to try forever to obtain the lock. Creates a new instance with the ability to specify the method through the argument Simple example: var result = Lock.With.NewAnonymous<string>( @lock: directory.MakeLock("my.lock"), lockWaitTimeout: Lock.LOCK_OBTAIN_WAIT_FOREVER, doBody: () => { //... code to execute while locked ... return "the result"; }).Run(); The result of the operation is the value that is returned from (i.e. () => { return "the result"; }). The type of determines the return type of the operation. the instance to use length of time to wait in milliseconds or to retry forever a delegate method that The value that is returned from the delegate method (i.e. () => { return theObject; }) Attempts to obtain exclusive access and immediately return upon success or failure. Use to release the lock. true iff exclusive access is obtained If a call to obtain the lock fails, this failureReason may be set with the "root cause" as to why the lock was not obtained. Attempts to obtain an exclusive lock within the amount of time given. Polls once per (currently 1000) milliseconds until is passed. length of time to wait in milliseconds or to retry forever true if lock was obtained if lock wait times out if is out of bounds if throws Releases exclusive access. Releases exclusive access. Returns true if the resource is currently locked. Note that one must still call before using the resource. Utility class for executing code with exclusive access. Constructs an executor that will grab the named . the instance to use length of time to wait in milliseconds or to retry forever Code to execute with exclusive access. Calls while lock is obtained. Blocks if lock cannot be obtained immediately. Retries to obtain lock once per second until it is obtained, or until it has tried ten times. Lock is released when exits. if lock could not be obtained if throws LUCENENET specific class to simulate the anonymous creation of a With class in Java by using delegate methods. Base class for Locking implementation. uses instances of this class to implement locking. Lucene uses by default for -based index directories. Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index otherwise you can easily corrupt your index.
Be sure to do the change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations can not work together! If you suspect that some implementation is not working properly in your environment, you can easily test it by using , and . Gets or Sets the prefix in use for all locks created in this . This is normally called once, when a gets this instance. However, you can also call this (after this instance is assigned to a ) to override the prefix in use. This is helpful if you're running Lucene on machines that have different mount points for the same shared directory. Return a new instance identified by . name of the lock to be created. Attempt to clear (forcefully unlock and remove) the specified lock. Only call this at a time when you are certain this lock is no longer in use. name of the lock to be cleared. This exception is thrown when the write.lock could not be acquired. This happens when a writer tries to open an index that another writer already has open. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. This exception is thrown when the write.lock could not be released. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Simple standalone tool that forever acquires & releases a lock using a specific . LUCENENET specific: In the Java implementation, this class' Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to that method: lock stress-test LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to this method: lock stress-test The command line arguments Thrown if the incorrect number of arguments are provided Simple standalone server that must be running when you use . This server simply verifies at most one process holds the lock at a time. LUCENENET specific: In the Java implementation, this class' Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to that method: lock verify-server LUCENENET specific: In the Java implementation, this Main method was intended to be called from the command line. However, in .NET a method within a DLL can't be directly called from the command line so we provide a .NET tool, lucene-cli, with a command that maps to this method: lock verify-server The command line arguments Thrown if the incorrect number of arguments are provided A MergeInfo provides information required for a MERGE context. It is used as part of an in case of MERGE context. Creates a new instance from the values required for a MERGE context. These values are only estimates and are not the actual values. File-based implementation that uses for reading, and for writing. 
NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g. by using a 64 bit runtime, or a 32 bit runtime with indexes that are guaranteed to fit within the address space. On 32 bit platforms also consult if you have problems with mmap failing because of fragmented address space. If you get an , it is recommended to reduce the chunk size, until it works. NOTE: Unlike in Java, it is not recommended to use in .NET in conjunction with an open because it is not guaranteed to exit atomically. Any lock statement or call can throw a , which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using s and "interrupt" them with s. Default max chunk size. Create a new for the named location. the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . the path of the directory if there is a low-level I/O error Create a new for the named location, specifying the maximum chunk size used for memory mapping. the path of the directory the lock factory to use, or null for the default (); maximum chunk size (default is 1 GiBytes for 64 bit runtimes and 256 MiBytes for 32 bit runtimes) used for memory mapping. Especially on 32 bit platforms, the address space can be very fragmented, so large index files cannot be mapped. Using a lower chunk size makes the directory implementation a little bit slower (as the correct chunk may be resolved on lots of seeks) but the chance is higher that mmap does not fail. On 64 bit platforms, this parameter should always be 1 << 30, as the address space is big enough. Please note: The chunk size is always rounded down to a power of 2. if there is a low-level I/O error Create a new for the named location. LUCENENET specific overload for convenience using string instead of . the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . LUCENENET specific overload for convenience using string instead of . the path of the directory if there is a low-level I/O error Create a new for the named location, specifying the maximum chunk size used for memory mapping. LUCENENET specific overload for convenience using string instead of . the path of the directory the lock factory to use, or null for the default (); maximum chunk size (default is 1 GiBytes for 64 bit runtimes and 256 MiBytes for 32 bit runtimes) used for memory mapping. Especially on 32 bit platforms, the address space can be very fragmented, so large index files cannot be mapped. Using a lower chunk size makes the directory implementation a little bit slower (as the correct chunk may be resolved on lots of seeks) but the chance is higher that mmap does not fail. On 64 bit platforms, this parameter should always be 1 << 30, as the address space is big enough. Please note: The chunk size is always rounded down to a power of 2. if there is a low-level I/O error Returns the current mmap chunk size. Creates an for the file with the given name. Try to unmap the buffer; this method silently fails if there is no support for that in the runtime. On Windows, this leads to the fact that mmapped files cannot be modified or deleted. Maps a file into a set of buffers Implements using native OS file locks.
For NFS based access to an index, it's recommended that you try first and work around the one limitation that a lock file could be left when the runtime exits abnormally. The primary benefit of is that locks (not the lock file itself) will be properly removed (by the OS) if the runtime has an abnormal exit. Note that, unlike , the existence of leftover lock files in the filesystem is fine because the OS will free the locks held against these files even though the files still remain. Lucene will never actively remove the lock files, so although you see them, the index may not be locked. Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index otherwise you can easily corrupt your index. Be sure to do the change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations can not work together! If you suspect that this or any other is not working properly in your environment, you can easily test it by using , and . Create a instance, with null (unset) lock directory. When you pass this factory to a subclass, the lock directory is automatically set to the directory itself. Be sure to create one instance for each directory you create! Create a instance, storing lock files into the specified where lock files are created. Create a instance, storing lock files into the specified where lock files are created. Given a lock name, return the full prefixed path of the actual lock file. Return true if the is the result of a share violation Return true if the is the result of a lock violation An implementation that uses 's positional read, which allows multiple threads to read from the same file without synchronizing. This class only uses when reading; writing is achieved with . NOTE: Since the .NET uses additional seeking during reads, it will generally be slightly less efficient than . This class has poor concurrent read performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use for reading. NOTE: Unlike in Java, it is not recommended to use in .NET in conjunction with an open because it is not guaranteed to exit atomically. Any lock statement or call can throw a , which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using s and "interrupt" them with s. Create a new for the named location. the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . the path of the directory if there is a low-level I/O error Create a new for the named location. LUCENENET specific overload for convenience using string instead of . the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . LUCENENET specific overload for convenience using string instead of . the path of the directory if there is a low-level I/O error Creates an for the file with the given name. Reads bytes with the extension method for . The maximum chunk size for reads of 16384 bytes. the file channel we will read from is this instance a clone and hence does not own the file to close it start offset: non-zero in the slice case end offset (start+length) Use this to disable locking entirely. Only one instance of this lock is created. You should call to get the instance.
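As a hedged sketch of wiring up one of the lock factories described above, the following passes a NativeFSLockFactory to FSDirectory.Open. The index and lock paths are hypothetical, and the default behavior (lock files stored in the index directory itself) is usually what you want.

```csharp
using System.IO;
using Lucene.Net.Store;

// Default: lock files are created inside the index directory itself.
FSDirectory dir = FSDirectory.Open(
    new DirectoryInfo("/path/to/index"),          // hypothetical index path
    new NativeFSLockFactory());

// Alternatively, keep lock files in a separate location.
FSDirectory dirWithLockDir = FSDirectory.Open(
    new DirectoryInfo("/path/to/index"),
    new NativeFSLockFactory(new DirectoryInfo("/path/to/locks"))); // hypothetical lock path
```

Remember that every index directory should get its own lock factory instance, as noted above.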
Wraps a around any provided delegate directory, to be used during NRT search. This class is likely only useful in a near-real-time context, where indexing rate is lowish but reopen rate is highish, resulting in many tiny files being written. This directory keeps such segments (as well as the segments produced by merging them, as long as they are small enough) in RAM. This is safe to use: when your app calls , all cached files will be flushed from the cache and sync'd. Here's a simple example usage: using Directory fsDir = FSDirectory.Open(new DirectoryInfo("/path/to/index")); using NRTCachingDirectory cachedFSDir = new NRTCachingDirectory(fsDir, 5.0, 60.0); IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_48, analyzer); using IndexWriter writer = new IndexWriter(cachedFSDir, conf); This will cache all newly flushed segments, all merges whose expected segment size is <= 5 MB, unless the net cached bytes exceeds 60 MB at which point all writes will not be cached (until the net bytes falls below 60 MB). @lucene.experimental We will cache a newly created output if 1) it's a flush or a merge and the estimated size of the merged segment is <= maxMergeSizeMB, and 2) the total cached bytes is <= maxCachedMB Returns how many bytes are being used by the cache Dispose this directory, which flushes any cached files to the delegate and then disposes the delegate. Subclass can override this to customize logic; return true if this file should be written to the . A wrapping a plain . Releases all resources used by the . Releases resources used by the and if overridden in a derived class, optionally releases unmanaged resources. true to release both managed and unmanaged resources; false to release only unmanaged resources. A memory-resident implementation. Locking implementation is by default the but can be changed with . Warning: This class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. It is recommended to materialize large indexes on disk and use , which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to heap space is not useful. Constructs an empty . Creates a new instance from a different implementation. This can be used to load a disk-based index into memory. Warning: this class is not intended to work with huge indexes. Everything beyond several hundred megabytes will waste resources (GC cycles), because it uses an internal buffer size of 1024 bytes, producing millions of arrays. This class is optimized for small memory-resident indexes. It also has bad concurrency on multithreaded environments. For disk-based indexes it is recommended to use , which is a high-performance directory implementation working directly on the file system cache of the operating system, so copying data to heap space is not useful. Note that the resulting instance is fully independent from the original (it is a complete copy). Any subsequent changes to the original will not be visible in the instance. a value io context if an error occurs Returns true iff the named file exists in this directory. Returns the length in bytes of a file in the directory. if the file does not exist Return total size in bytes of all files in this directory.
This is currently quantized to . Removes an existing file in the directory. if the file does not exist Creates a new, empty file in the directory with the given name. Returns a stream writing this file. Returns a new for storing data. this method can be overridden to return different impls, that e.g. override . Returns a stream reading an existing file. Closes the store to future operations, releasing associated memory. Represents a file in RAM as a list of buffers. @lucene.internal File used as buffer, in no For non-stream access from thread that might be concurrent with writing Expert: allocate a new buffer. Subclasses can allocate differently. size of allocated buffer. allocated buffer. A memory-resident implementation. @lucene.internal A memory-resident implementation. @lucene.internal Construct an empty output buffer. Copy the current contents of this buffer to the named output. Copy the current contents of this buffer to output byte array Resets this to an empty file. Returns byte usage of all buffers. A wrapper that allows rate limiting using IO context () specific rate limiters (). @lucene.experimental Sets the maximum (approx) MB/sec allowed by all write IO performed by created with the given . Pass null for to have no limit. NOTE: For already created instances there is no guarantee this new rate will apply to them; it will only be guaranteed to apply for new created instances. NOTE: this is an optional operation and might not be respected by all implementations. Currently only buffered () implementations use rate-limiting. @lucene.experimental if the is already disposed Sets the rate limiter to be used to limit (approx) MB/sec allowed by all IO performed with the given context (). Pass null to have no limit. Passing an instance of rate limiter compared to setting it using allows to use the same limiter instance across several directories globally limiting IO across them. @lucene.experimental if the is already disposed See . @lucene.experimental if the is already disposed A rate limiting () @lucene.internal Abstract base class to rate limit IO. Typically implementations are shared across multiple s or s (for example those involved all merging). Those s and s would call whenever they want to read bytes or write bytes. Sets an updated mb per second rate limit. The current mb per second rate limit. Pauses, if necessary, to keep the instantaneous IO rate at or below the target. Note: the implementation is thread-safe the pause time in nano seconds Simple class to rate limit IO. is the MB/sec max IO rate Sets an updated mb per second rate limit. The current mb per second rate limit. Pauses, if necessary, to keep the instantaneous IO rate at or below the target. NOTE: multiple threads may safely use this, however the implementation is not perfectly thread safe but likely in practice this is harmless (just means in some rare cases the rate might exceed the target). It's best to call this with a biggish count, not one byte at a time. the pause time in nano seconds A straightforward implementation of using . is ideal for use cases where efficient writing is required without utilizing too much RAM. However, reading is less efficient than when using . This class has poor concurrent read performance (multiple threads will bottleneck) as it synchronizes when multiple threads read from the same file. It's usually better to use for reading. NOTE: Unlike in Java, it is not recommended to use in .NET in conjunction with an open because it is not guaranteed to exit atomically. 
Any lock statement or call can throw a , which makes shutting down unpredictable. To exit parallel tasks safely, we recommend using s and "interrupt" them with s. Create a new for the named location. the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . the path of the directory if there is a low-level I/O error Create a new for the named location. LUCENENET specific overload for convenience using string instead of . the path of the directory the lock factory to use, or null for the default (); if there is a low-level I/O error Create a new for the named location and . LUCENENET specific overload for convenience using string instead of . the path of the directory if there is a low-level I/O error Creates an for the file with the given name. Reads bytes with followed by . the file channel we will read from is this instance a clone and hence does not own the file to close it start offset: non-zero in the slice case end offset (start+length) methods Implements using (writes the file with UTF8 encoding and no byte order mark). Special care needs to be taken if you change the locking implementation: First be certain that no writer is in fact writing to the index otherwise you can easily corrupt your index. Be sure to do the change on all Lucene instances and clean up all leftover lock files before starting the new configuration for the first time. Different implementations can not work together! If you suspect that this or any other is not working properly in your environment, you can easily test it by using , and . Create a instance, with null (unset) lock directory. When you pass this factory to a subclass, the lock directory is automatically set to the directory itself. Be sure to create one instance for each directory you create! Instantiate using the provided directory (as a instance). where lock files should be created. Instantiate using the provided directory name (). where lock files should be created. Implements for a single in-process instance, meaning all locking will take place through this one instance. Only use this when you are certain all s and s for a given index are running against a single shared in-process instance. This is currently the default locking for . A delegating that records which files were written to and deleted. A that wraps another and verifies that each lock obtain/release is "correct" (never results in two processes holding the lock at the same time). It does this by contacting an external server () to assert that at most one process holds the lock at a time. To use this, you should also run on the host & port matching what you pass to the constructor. Creates a new instance. the that we are testing the socket's stream input/output to Returns the current position in this file, where the next read will occur. Returns the current position in this file, where the next write will occur. Compares the entire members of one array with the other one. The array to be compared. The array to be compared with. Returns true if the two specified arrays of Objects are equal to one another. The two arrays are considered equal if both arrays contain the same number of elements, and all corresponding pairs of elements in the two arrays are equal. Two objects e1 and e2 are considered equal if (e1==null ? e2==null : e1.Equals(e2)). In other words, the two arrays are equal if they contain the same elements in the same order.
Also, two array references are considered equal if both are null. Note that if the type of is a , , or , its values and any nested collection values will be compared for equality as well. Returns a hash code based on the contents of the given array. For any two arrays a and b, if Arrays.Equals(a, b) returns true, it means that the return value of Arrays.GetHashCode(a) equals Arrays.GetHashCode(b). The array element type. The array whose hash code to compute. The hash code for . Assigns the specified value to each element of the specified array. the type of the array the array to be filled the value to be stored in all elements of the array Assigns the specified value to each element of the specified range of the specified array. The range to be filled extends from index , inclusive, to index , exclusive. (If fromIndex==toIndex, the range to be filled is empty.) the type of the array the array to be filled the index of the first element (inclusive) to be filled with the specified value the index of the last element (exclusive) to be filled with the specified value the value to be stored in all elements of the array if fromIndex > toIndex if fromIndex < 0 or toIndex > a.Length Copies a range of elements from an Array starting at the first element and pastes them into another Array starting at the first element. The length is specified as a 32-bit integer. Usage Note: This implementation uses the most efficient (known) method for copying the array based on the data type and platform. The array type. The Array that contains the data to copy. The Array that receives the data. A 32-bit integer that represents the number of elements to copy. Copies a range of elements from an Array starting at the specified source index and pastes them to another Array starting at the specified destination index. The length and the indexes are specified as 32-bit integers. Usage Note: This implementation uses the most efficient (known) method for copying the array based on the data type and platform. The array type. The Array that contains the data to copy. A 32-bit integer that represents the index in the at which copying begins. The Array that receives the data. A 32-bit integer that represents the index in the at which storing begins. A 32-bit integer that represents the number of elements to copy. Creates a representation of the array passed. The result is surrounded by brackets "[]", each element is converted to a via the and separated by ", ". If the array is null, then "null" is returned. The type of array element. The array to convert. The converted array string. Creates a representation of the array passed. The result is surrounded by brackets "[]", each element is converted to a via the and separated by ", ". If the array is null, then "null" is returned. The type of array element. The array to convert. A instance that supplies the culture formatting information. The converted array string. Methods for working with Assemblies. Gets a list of the host assembly's referenced assemblies excluding any Microsoft, System, or Mono prefixed assemblies or assemblies with official Microsoft key hashes. Essentially, we get a list of all non Microsoft assemblies here.
Assembly filter logic from: https://raw.githubusercontent.com/Microsoft/dotnet-apiport/master/src/Microsoft.Fx.Portability/Analyzer/DotNetFrameworkFilter.cs These keys are a collection of public key tokens derived from all the reference assemblies in "%ProgramFiles%\Reference Assemblies\Microsoft" on a Windows 10 machine with VS 2015 installed Gets a best guess as to whether this assembly is a .NET Framework assembly or not. Gets a best guess as to whether this assembly is a .NET Framework assembly or not. Extensions for . Removes the given collection of elements from the source . Usage Note: This is the same operation as or with a predicate of (value) => collection.Contains(value). It is recommended to use these alternatives when possible. The type of the elements of . An to remove elements from. An containing the items to remove from . true if the collection changed as a result of the call; otherwise, false. Retains only the elements in this list that are contained in the specified collection (optional operation). In other words, removes from this list all of its elements that are not contained in the specified collection. Usage Note: This is the same operation as or with a predicate of (value) => !collection.Contains(value). It is recommended to use these alternatives when possible. The type of the elements of . An to remove elements from. An containing the items to remove from . true if the collection changed as a result of the call; otherwise, false. This is the same implementation of ToString from Java's AbstractCollection (the default implementation for all sets and lists) This is the same implementation of ToString from Java's AbstractCollection (the default implementation for all sets and lists), plus the ability to specify culture for formatting of nested numbers and dates. Note that this overload will change the culture of the current thread. This is the same implementation of ToString from Java's AbstractMap (the default implementation for all dictionaries) This is the same implementation of ToString from Java's AbstractMap (the default implementation for all dictionaries), plus the ability to specify culture for formatting of nested numbers and dates. Note that this overload will change the culture of the current thread. This is a helper method that assists with recursively building a string of the current collection and all nested collections. This is a helper method that assists with recursively building a string of the current collection and all nested collections, plus the ability to specify culture for formatting of nested numbers and dates. Note that this overload will change the culture of the current thread. The comparer specified in the static factory. This will never be null, as the static factory returns a ReverseComparer instance if its argument is null. @serial Represents a thread-safe hash-based unique collection. The type of the items in the collection. All public members of are thread-safe and may be used concurrently from multiple threads. Gets the number of items contained in the . The number of items contained in the . Count has snapshot semantics and represents the number of items in the at the moment when Count was accessed. Gets a value that indicates whether the is empty. true if the is empty; otherwise, false. Initializes a new instance of the class that is empty, has the default concurrency level, has the default initial capacity, and uses the default comparer for the item type. 
Initializes a new instance of the class that is empty, has the specified concurrency level and capacity, and uses the default comparer for the item type. The estimated number of threads that will update the concurrently. The initial number of elements that the can contain. is less than 1. is less than 0. Initializes a new instance of the class that contains elements copied from the specified , has the default concurrency level, has the default initial capacity, and uses the default comparer for the item type. The whose elements are copied to the new . is a null reference. Initializes a new instance of the class that is empty, has the specified concurrency level and capacity, and uses the specified . The implementation to use when comparing items. Initializes a new instance of the class that contains elements copied from the specified , has the default concurrency level, has the default initial capacity, and uses the specified . The whose elements are copied to the new . The implementation to use when comparing items. is a null reference (Nothing in Visual Basic). Initializes a new instance of the class that contains elements copied from the specified , has the specified concurrency level, has the specified initial capacity, and uses the specified . The estimated number of threads that will update the concurrently. The whose elements are copied to the new . The implementation to use when comparing items. is a null reference. is less than 1. Initializes a new instance of the class that is empty, has the specified concurrency level, has the specified initial capacity, and uses the specified . The estimated number of threads that will update the concurrently. The initial number of elements that the can contain. The implementation to use when comparing items. is less than 1. -or- is less than 0. Adds the specified item to the . The item to add. true if the item was added to the successfully; false if it already exists. The contains too many items. Removes all items from the . Determines whether the contains the specified item. The item to locate in the . true if the contains the item; otherwise, false. Attempts to remove the item from the . The item to remove. true if an item was removed successfully; otherwise, false. Returns an enumerator that iterates through the . An enumerator for the . The enumerator returned from the collection is safe to use concurrently with reads and writes to the collection, however it does not represent a moment-in-time snapshot of the collection. The contents exposed through the enumerator may contain modifications made to the collection after was called. Determines whether the specified object is structurally equal to the current set using rules provided by the specified . The object to compare with the current object. The implementation to use to determine whether the current object and are structurally equal. true if is structurally equal to the current set; otherwise, false. If is null. Gets the hash code representing the current set using rules specified by the provided . The implementation to use to generate the hash code. A hash code representing the current set. Determines whether the specified object is structurally equal to the current set using rules similar to those in the JDK's AbstractSet class. Two sets are considered equal when they both contain the same objects (in any order). The object to compare with the current object. true if the specified object implements and it contains the same elements; otherwise, false.
Gets the hash code for the current set. The hash code is calculated by taking each nested element's hash code into account. A hash code for the current object. Returns a string that represents the current set using the specified and . A string that represents the current set. is null. is invalid. -or- The index of a format item is not zero. Returns a string that represents the current set using . The presentation has a specific format. It is enclosed by square brackets ("[]"). Elements are separated by ', ' (comma and space). A string that represents the current set. Returns a string that represents the current set using the specified . A string that represents the current set. is null. Returns a string that represents the current set using the specified and . The presentation has a specific format. It is enclosed by square brackets ("[]"). Elements are separated by ', ' (comma and space). A string that represents the current set. is null. is invalid. -or- The index of a format item is not zero. The .NET ticks representing January 1, 1970 0:00:00, also known as the "epoch". Extensions to . Copies all of the mappings from the specified to this dictionary. These mappings will replace any mappings that this dictionary had for any of the keys currently in the specified dictionary. The type of key. The type of value. This dictionary. The collection to merge. If or is null. Associates the specified value with the specified key in this dictionary. If the dictionary previously contained a mapping for the key, the old value is replaced. Usage Note: Unless the return value is required, it is more efficient to use the setter of the dictionary indexer than this method. This method will only work right if is a nullable type, since it may not be possible to distinguish value types with actual values from their default value. Java collections only accept reference types, so this is a direct port from Java, not accounting for value types. (See the sketch below for typical Put and PutAll usage.) The type of key. The type of value. This dictionary. The key with which the specified is associated. The value to be associated with the specified . The previous value associated with key, or null if there was no mapping for key. (A null return can also indicate that the map previously associated null with key.) is null. -or- The underlying dictionary implementation doesn't accept null for . -or- The underlying dictionary implementation doesn't accept null for . .NET Specific Helper Extensions for IEnumerable Enumerates a sequence in pairs. In the case of an uneven number of elements, the final call passes default as the second parameter. The type of the elements of . The type of the elements returned from . An to enumerate in pairs. A function that is invoked for each pair of elements. or is . A new containing the results from each pair. Take all but the last element of the sequence. The type of the elements of . This . The resulting . Take all but the last elements of the sequence. The type of the elements of . This . The number of elements at the end of the sequence to exclude. The resulting . Use this attribute to make an exception to the class naming rules (which should not be named like Interfaces). Properties, methods, or events marked with this attribute can ignore the numeric naming conventions of "Int16", "Int32", "Int64", and "Single" that are commonly used in .NET method and property names. Use this attribute to make an exception to the nullable enum rule. Some of these cannot be avoided.
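As a rough illustration of the Put/PutAll extensions described above, here is a hedged sketch. The Lucene.Net.Support namespace is assumed to be where these extension methods live, and the dictionaries are invented for the example.

```csharp
using System.Collections.Generic;
using Lucene.Net.Support; // assumed home of the dictionary extension methods

var first = new Dictionary<string, string> { ["a"] = "1" };
var second = new Dictionary<string, string> { ["a"] = "2", ["b"] = "3" };

// Put returns the previous value for the key (or null when there was none).
string previous = first.Put("a", "0");  // previous == "1", first["a"] is now "0"

// PutAll copies every mapping from the argument, replacing existing keys.
first.PutAll(second);                   // first["a"] == "2", first["b"] == "3"
```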
Contains conversion support elements such as classes, interfaces and static methods. Poached from dotnet runtime. This is only available on .NET 6+, so we just made a copy to make passing parameters easier. A bitwise combination of the enumeration values that determines how the file can be accessed by the object. This also determines the values returned by the and properties of the object. When contains an invalid value. A bitwise combination of the enumeration values that determines how the file will be shared by processes. The default value is . When contains an invalid value. A bitwise combination of the enumeration values that specifies additional file options. The default value is , which indicates synchronous IO. When contains an invalid value. The size of the buffer used by for buffering. The default buffer size is 4096. 0 or 1 means that buffering should be disabled. Negative values are not allowed. When is negative. Represents the methods to support some operations over files. Creates a new empty file in a random subdirectory of , using the given prefix and suffix strings to generate its name. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the virtual machine. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated A instance representing the temp file that was created. is null. length is less than 3 characters. -or- or contains invalid characters according to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . 
If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used A instance representing the temp file that was created. is null. length is less than 3 characters. -or- or contains invalid characters according to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used A instance representing the temp file that was created. is null. length is less than 3 characters. -or- or contains invalid characters according to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name and returns an open stream to it. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . 
If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used A instance representing the temp file that was created. is null. length is less than 3 characters. -or- or contains invalid characters according to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name and returns an open stream to it. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used A instance representing the temp file that was created. is null. length is less than 3 characters. -or- or contains invalid characters according to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name and returns an open stream to it. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. 
Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used The options to pass to the . A instance representing the temp file that was created. or is null. length is less than 3 characters. -or- or contains invalid characters according to . -or- . is set to . Creates a new empty file in the specified directory, using the given prefix and suffix strings to generate its name and returns an open stream to it. If this method returns successfully then it is guaranteed that: The file denoted by the returned abstract pathname did not exist before this method was invoked, and Neither this method nor any of its variants will return the same abstract pathname again in the current invocation of the application. This method provides only part of a temporary-file facility. However, the file will not be deleted automatically, it must be deleted by the caller. The prefix argument must be at least three characters long. It is recommended that the prefix be a short, meaningful string such as "hjb" or "mail". The suffix argument may be null, in which case a random suffix will be used. Both prefix and suffix must be provided with valid characters for the underlying system, as specified by . If the directory argument is null then the system-dependent default temporary-file directory will be used, with a random subdirectory name. The default temporary-file directory is specified by the method. On UNIX systems the default value of this property is typically "/tmp" or "/var/tmp"; on Microsoft Windows systems it is typically "C:\\Users\\[UserName]\\AppData\Local\Temp". The prefix string to be used in generating the file's name; must be at least three characters long The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated The directory in which the file is to be created, or null if the default temporary-file directory is to be used The options to pass to the . A instance representing the temp file that was created. or is null. length is less than 3 characters. -or- or contains invalid characters according to . -or- . is set to . Tests whether the passed in is an corresponding to the underlying operating system's "File Already Exists" violation. This works by forcing the exception to occur during initialization and caching the value for the current OS. An exception, for comparison. The path of the file to check. This is used as a fallback in case the current OS doesn't have an HResult (an edge case). true if the exception passed is an with an corresponding to the operating system's "File Already Exists" violation, which occurs when an attempt is made to create a file that already exists. Generates a new random file name with the provided , and optional . 
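The guarantees described above (a file name that did not exist before the call, and no reuse of the same name within a single run of the application) can be illustrated with plain BCL types. The following is a minimal sketch only, not the shipped implementation; it relies on FileMode.CreateNew failing when the generated name already exists, and the helper name CreateTempFileSketch is illustrative.

using System;
using System.IO;

static FileInfo CreateTempFileSketch(string prefix, string suffix, DirectoryInfo directory)
{
    if (prefix is null || prefix.Length < 3)
        throw new ArgumentException("Prefix must be at least three characters long.", nameof(prefix));

    string dir = directory?.FullName ?? Path.GetTempPath();
    suffix ??= "." + Path.GetRandomFileName().Replace(".", string.Empty);

    while (true)
    {
        string candidate = Path.Combine(dir, prefix + Path.GetRandomFileName() + suffix);
        try
        {
            // FileMode.CreateNew throws IOException if the file already exists,
            // so a successful create guarantees the file is new.
            using (new FileStream(candidate, FileMode.CreateNew)) { }
            return new FileInfo(candidate);
        }
        catch (IOException)
        {
            // Name collision: generate another candidate and retry.
        }
    }
}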
The prefix string to be used in generating the file's name. The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated. A object containing the temp directory path. Must not be null. A random file name is null or whitespace or is null. Generates a new random file name with the provided , and optional . The prefix string to be used in generating the file's name. The suffix string to be used in generating the file's name; may be null, in which case a random suffix will be generated. A containing the temp directory path. Must not be null. A random file name or is null or whitespace. Returns the absolute path of this with all references resolved and any drive letters normalized to upper case on Windows. An absolute path is one that begins at the root of the file system. The canonical path is one in which all references have been resolved. For the cases of '..' and '.', where the file system supports parent and working directory respectively, these are removed and replaced with a direct directory reference. This instance. The canonical path of this file. Decorates a instance and makes no assumptions about whether has been called on the inner instance or not. Acts like a circuit breaker - the first caught turns it off and the rest of the calls are ignored after that point until is called. The primary purpose is for using a instance within a non-disposable parent object. Since the creator of the is ultimately responsible for disposing it, our non-disposable object has no way of knowing whether it is safe to use the . Wrapping the within a ensures the non-disposable object can continue to make calls to the without raising exceptions (it is presumed that the functionality is optional). Extension methods that make a effectively into a binary serializer with no encoding. We simply convert types into bytes and write them without any concern whether surrogate pairs are respected, similar to what BinaryFormatter does. This makes it possible to serialize/deserialize raw character arrays and get the data back in the same order without any exceptions warning that the order is not valid and without the need for BinaryFormatter. Byte order is little-endian (same as and ). Reads a sequence of bytes from a to the given , starting at the given position. The must be both seekable and readable. The stream to read. The to write to. The file position at which the transfer is to begin; must be non-negative. The number of bytes read, possibly zero. or is null is not readable. -or- is not seekable. is less than 0. -or- is greater than the of the stream. An I/O error occurs. has already been disposed. This method is atomic when used by itself, but does not synchronize with the rest of the stream methods. Extensions to help obtain/release from a ReaderWriterLockSlim. Taken from: http://stackoverflow.com/questions/170028/how-would-you-simplify-entering-and-exiting-a-readerwriterlock LUCENENET specific Provides a task scheduler that ensures a maximum concurrency level while running on top of the thread pool. Source: https://msdn.microsoft.com/en-us/library/system.threading.tasks.taskscheduler(v=vs.110).aspx A lock that uses an unfair locking strategy, similar to how it works in Java. This lock is unfair in that it will acquire the lock even if there are any threads waiting on . This implementation also does not use FIFO order when waiting on . Each queued thread will continue to acquire the lock continually, but yield between each iteration.
So, any waiting thread could be next to acquire the lock. This differs from how it works in Java, but the overhead of fixing this behavior with a queue is probably not worth the cost. Tries to acquire the lock. If the lock is not available, the thread will block until it can obtain the lock. FIFO order is not respected on waiting locks. Also, threads that are waiting are not allowed to sleep. Instead, they call and one of them will acquire the lock as soon as there are no other callers to or . Threads that call and are allowed to obtain the lock even if there are other threads waiting for it. This "barging" behavior is similar to how ReentrantLock works in Java. NOTE: This is not the full implementation that correctly throws after is called. Since this is only used in tests and Lucene.NET doesn't support , this is okay. But if this method is ever used in production scenarios, the approach used for this lock needs to be reevaluated. Releases the lock when called the same number of times as , and for the current task/thread. Tries to acquire the lock and immediately returns a boolean value indicating whether the lock was obtained. Threads that call and are allowed to obtain the lock even if there are other threads waiting for it. This "barging" behavior is similar to how ReentrantLock works in Java. true if the lock was obtained successfully; otherwise, false. Returns a value indicating whether the lock is held by the current thread. A drop-in replacement for that doesn't throw when entering locks, but defers the exception until a wait or sleep occurs. This is to mimic the behavior in Java, which does not throw when entering a lock. NOTE: this is just a best effort. The BCL and other libraries we depend on don't take such measures, so any call to an API that we don't own could result in a if it attempts to acquire a lock. It is not practical to put a try/catch block around every 3rd party API call that attempts to lock. As such, Lucene.NET does not support and using it is discouraged. See https://github.com/apache/lucenenet/issues/526. Acquires an exclusive lock on the specified object, and atomically sets a value that indicates whether the lock was taken. See for more details. If the lock is interrupted, this method will not throw a . Instead, it will reset the interrupt state. This matches the behavior of the synchronized keyword in Java, which never throws when the current thread is in an interrupted state. It allows us to catch in a specific part of the application, rather than allowing it to be thrown anywhere we attempt to lock. NOTE: this is just a best effort. The BCL and other libraries we depend on don't take such measures, so any call to an API that we don't own could result in a if it attempts to acquire a lock. It is not practical to put a try/catch block around every 3rd party API call that attempts to lock. As such, Lucene.NET does not support and using it is discouraged. See https://github.com/apache/lucenenet/issues/526. Acquires an exclusive lock on the specified object. See for more details. If the lock is interrupted, this method will not throw a . Instead, it will reset the interrupt state. This matches the behavior of the synchronized keyword in Java, which never throws when the current thread is in an interrupted state. It allows us to catch in a specific part of the application, rather than allowing it to be thrown anywhere we attempt to lock. NOTE: this is just a best effort.
The BCL and other libraries we depend on don't take such measures, so any call to an API that we don't own could result in a if it attempts to acquire a lock. It is not practical to put a try/catch block around every 3rd party API call that attempts to lock. As such, Lucene.NET does not support and using it is discouraged. See https://github.com/apache/lucenenet/issues/526. Cascades the call to . Releases an exclusive lock on the specified object. Cascades the call to . Determines whether the current thread holds the lock on the specified object. Cascades the call to . Attempts to acquire an exclusive lock on the specified object. Cascades the call to . Attempts to acquire an exclusive lock on the specified object, and atomically sets a value that indicates whether the lock was taken. Cascades the call to . Attempts, for the specified number of milliseconds, to acquire an exclusive lock on the specified object. Cascades the call to . Attempts, for the specified amount of time, to acquire an exclusive lock on the specified object. Cascades the call to . Attempts, for the specified number of milliseconds, to acquire an exclusive lock on the specified object, and atomically sets a value that indicates whether the lock was taken. Cascades the call to . Attempts, for the specified amount of time, to acquire an exclusive lock on the specified object, and atomically sets a value that indicates whether the lock was taken. Cascades the call to . Notifies a thread in the waiting queue of a change in the locked object's state. Cascades the call to . Notifies all waiting threads of a change in the object's state. Cascades the call to . Releases the lock on an object and blocks the current thread until it reacquires the lock. Cascades the call to . Releases the lock on an object and blocks the current thread until it reacquires the lock. If the specified time-out interval elapses, the thread enters the ready queue. Cascades the call to . Releases the lock on an object and blocks the current thread until it reacquires the lock. If the specified time-out interval elapses, the thread enters the ready queue. Cascades the call to . Releases the lock on an object and blocks the current thread until it reacquires the lock. If the specified time-out interval elapses, the thread enters the ready queue. This method also specifies whether the synchronization domain for the context (if in a synchronized context) is exited before the wait and reacquired afterward. Cascades the call to . Releases the lock on an object and blocks the current thread until it reacquires the lock. If the specified time-out interval elapses, the thread enters the ready queue. This method also specifies whether the synchronization domain for the context (if in a synchronized context) is exited before the wait and reacquired afterward. Extensions to . Returns a concurrent wrapper for the current . The type of elements in the set. The collection to make concurrent (thread-safe). An object that acts as a read-only wrapper around the current . is null. To synchronize any modifications to the object, expose it only through this wrapper. The set returned uses simple locking and may not be the most performant solution, but it provides a quick way to make any set thread-safe. A synchronization object is exposed through the property that can be used for external synchronization. This method is an O(1) operation. LUCENENET specific: a simple formatter to pass a value to in order to defer allocating until the assert fails. Extensions to .
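The interrupt-tolerant locking described above can be approximated with System.Threading.Monitor directly. This is a hedged sketch of the idea, not the shipped wrapper: it swallows ThreadInterruptedException while waiting for the lock, retries, and re-interrupts the thread once the lock is held so the interruption can still be observed later.

using System.Threading;

public static class InterruptTolerantLock // illustrative name, not the actual type
{
    public static void Enter(object syncRoot)
    {
        bool wasInterrupted = false;
        while (true)
        {
            try
            {
                Monitor.Enter(syncRoot);
                break;
            }
            catch (ThreadInterruptedException)
            {
                // Swallow the interruption and retry the acquisition.
                wasInterrupted = true;
            }
        }
        if (wasInterrupted)
        {
            // Restore the interrupt state, mirroring Java's synchronized behavior.
            Thread.CurrentThread.Interrupt();
        }
    }

    public static void Exit(object syncRoot) => Monitor.Exit(syncRoot);
}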
Returns true if contains any character from . The string in which to seek characters from . An array of characters to check. true if any are found, otherwise; false. Attribute to define a property or method as a writable array. Per MSDN, members should never return arrays because the array contents can be updated, which makes the behavior confusing. However, Lucene's design sometimes relies on other classes to update arrays - both as array fields and as methods that return arrays. So, in these cases we are making an exception to this rule and marking them with to signify that this is intentional. For properties that violate this rule, you should also use the : [WritableArray, SuppressMessage("Microsoft.Performance", "CA1819", Justification = "Lucene's design requires some writable array properties")] Implements Indicates if this token will proactively raise callbacks. Callbacks are still guaranteed to be invoked, eventually. Gets a value that indicates if a change has occurred. Registers for a callback that will be invoked when the entry has changed. MUST be set before the callback is invoked. The callback to invoke. State to be passed into the callback. Used to trigger the change token when a reload occurs. The root node for a configuration. Initializes a Configuration root with a list of providers. The s for this configuration. The s for this configuration. Gets or sets the value corresponding to a configuration key. The configuration key. The configuration value. Gets the immediate children sub-sections. Returns a that can be used to observe when this configuration is reloaded. Gets a configuration sub-section with the specified key. The key of the configuration section. The . This method will never return null. If no matching sub-section is found with the specified key, an empty will be returned. Force the configuration values to be reloaded from the underlying sources. Represents a section of application configuration values. Initializes a new instance. The configuration root. The path to this section. Gets the full path to this section from the . Gets the key this section occupies in its parent. Gets or sets the section value. Gets or sets the value corresponding to a configuration key. The configuration key. The configuration value. Gets a configuration sub-section with the specified key. The key of the configuration section. The . This method will never return null. If no matching sub-section is found with the specified key, an empty will be returned. Gets the immediate descendant configuration sub-sections. The configuration sub-sections. Returns a that can be used to observe when this configuration is reloaded. Provides access to the application's configuration settings. Sets the instance used to instantiate subclasses. The new . The parameter is null. Gets the associated factory. The factory. Returns the current configuration The default implementation of that is used when the end user doesn't supply one. This implementation simply reads settings from environment variables. Returns the default configuration instance, creating it first if necessary. The default instance. An environment variable based . Initializes a new instance. Initializes a new instance with the specified prefix. A prefix used to filter the environment variables. Loads the environment variables. The configuration key value pairs for this provider. Sets a value for a given key. The configuration key to set. The value to set. Returns the list of keys that this provider has. 
The earlier keys that other providers contain. The path for the parent IConfiguration. The list of keys for this provider. Returns a that can be used to listen when this provider is reloaded. Contract for extending the functionality of system properties by providing an application-defined instance. Usage: Implement this interface and set the implementation at application startup using . Gets or creates an instance of that Lucene.NET can use to read application-defined settings. The implementation is responsible for the lifetime of the instance. A typical implementation will either get the instance from a dependency injection container or provide its own caching mechanism to ensure the settings are not reloaded each time the method is called. The current instance. Thrown to indicate that an assertion has failed. Constructs an with no detail message. Constructs an with the provided . Value to be used as the assertion message. Constructs an with the provided and . Value to be used as the assertion message. The exception that is the cause of the current exception, or a null reference (Nothing in Visual Basic) if no inner exception is specified. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Provides a set of methods that help debug your code. Allows toggling "assertions" on/off even in release builds. The default is false. This allows loggers and testing frameworks to enable test point messages ("TP") from , , , , , and . Checks for a condition; if the condition is false, throws an . IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked.
IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). The parameter corresponding to the format item at index 3 ({3}). Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked.
IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , , , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). The parameter corresponding to the format item at index 3 ({3}). The parameter corresponding to the format item at index 4 ({4}). Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , , , , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). The parameter corresponding to the format item at index 3 ({3}). The parameter corresponding to the format item at index 4 ({4}). The parameter corresponding to the format item at index 5 ({5}). Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , , , , , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). The parameter corresponding to the format item at index 3 ({3}). The parameter corresponding to the format item at index 4 ({4}). The parameter corresponding to the format item at index 5 ({5}). The parameter corresponding to the format item at index 6 ({6}).
Checks for a condition; if the condition is false, throws an with the message formatted from the specified . IMPORTANT: The purpose of using this overload is to defer execution of building the string until the condition is false. Ideally, we would use a parameter, but doing so allocates extra RAM even when calls to the method are in an unreachable execution path. When passing parameters, strive to pass value or reference types without doing any pre-processing or string formatting. If necessary, wrap the parameter in another class or struct and override the method so any expensive formatting is deferred until after is checked. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. A composite format string to use to build a failure message. This message contains text intermixed with zero or more format items, which correspond to the , , , , , , or parameters. The parameter corresponding to the format item at index 0 ({0}). The parameter corresponding to the format item at index 1 ({1}). The parameter corresponding to the format item at index 2 ({2}). The parameter corresponding to the format item at index 3 ({3}). The parameter corresponding to the format item at index 4 ({4}). The parameter corresponding to the format item at index 5 ({5}). The parameter corresponding to the format item at index 6 ({6}). The parameter corresponding to the format item at index 7 ({7}). Checks for a condition; if the condition is false, throws an with the given message. IMPORTANT: If you need to use string concatenation when building the message, use an overload of for better performance. IMPORTANT: For best performance, only call this method after checking to ensure the value of is true. The conditional expression to evaluate. If the condition is true, no exception is thrown. The message to use to indicate a failure of . Returns number of set bits. NOTE: this visits every in the backing bits array, and the result is not internally cached! Returns number of set bits. NOTE: this visits every long in the backing bits array, and the result is not internally cached! Get the number of set bits. The number of set bits. Return the number of documents in this in constant time. Return the number of documents in this in constant time. Class to cast to type . Target type Casts to . This does not cause boxing for value types. Useful in generic methods. Source type to cast from. Usually a generic type. Extensions to the class to allow for adding and retrieving suppressed exceptions, like you can do in Java. Marks a field exempt from the calculation. Base class for types that exclude services from Reflection scanning. Contract for Java-style properties. Retrieves the value of a property from the current process. The name of the property. The property value. Retrieves the value of a property from the current process, with a default value if it doesn't exist or the caller doesn't have permission to read the value. The name of the property. The value to use if the property does not exist or the caller doesn't have permission to read the value. The property value. Retrieves the value of a property from the current process as . If the value cannot be cast to , returns false. The name of the property. The property value. Retrieves the value of a property from the current process as , with a default value if it doesn't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a .
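A hedged usage sketch of the deferred-formatting assertion overloads described above. The Debugging type and AssertsEnabled flag follow the member descriptions on this page, but verify the exact type and namespace against your build; docCount and maxDoc are illustrative values only.

int docCount = 42;
int maxDoc = 100;

if (Debugging.AssertsEnabled) // check the flag first so release paths skip the call entirely
{
    // The format string is only combined with its arguments when the condition is false,
    // unlike "docCount=" + docCount, which would allocate a string on every call.
    Debugging.Assert(docCount <= maxDoc, "docCount={0} must not exceed maxDoc={1}", docCount, maxDoc);
}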
The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. Retrieves the value of a property from the current process as . If the value cannot be cast to , returns 0. The name of the property. The property value. Retrieves the value of a property from the current process as , with a default value if it doens't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a . The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. Contract for a set of localized resources. Generally, this is an abstraction over one or more instances. Returns the value of the string resource localized for the specified . The name of the resource to retrieve. An object that represents the culture for which the resource is localized. The value of the resource localized for the specified , or null if cannot be found in a resource set. The parameter is null. The value of the specified resource is not a string. No usable set of resources has been found, and there are no resources for a default culture. For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. The default culture's resources reside in a satellite assembly that could not be found. For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. Gets the value of the specified non-string resource localized for the specified . The name of the resource to get. The culture for which the resource is localized. If the resource is not localized for this culture, the resource manager uses fallback rules to locate an appropriate resource. If this value is null, the object is obtained by using the property. The value of the resource, localized for the specified culture. If an appropriate resource set exists but cannot be found, the method returns null. The parameter is null. No usable set of resources has been found, and there are no resources for a default culture. For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. The default culture's resources reside in a satellite assembly that could not be found. For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. Returns an unmanaged memory stream object from the specified resource, using the specified . The name of a resource. An object that specifies the culture to use for the resource lookup. If is null, the culture for the current thread is used. An unmanaged memory stream object that represents a resource. The parameter is null. The value of the specified resource is not a object. No usable set of resources has been found, and there are no resources for a default culture. For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. The default culture's resources reside in a satellite assembly that could not be found. 
For information about how to handle this exception, see the "Handling MissingManifestResourceException and MissingSatelliteAssemblyException Exceptions" section in the class topic. LUCENENET specific contract that provides support for , , and . Implement this interface in addition to , , or to provide optional support for the above methods when providing a custom implementation. If this interface is not supported by the corresponding factory, a will be thrown from the above methods. Lists the available services for the current service type. Extensions to . Adds the elements of the specified collection to the end of the . The element type. The list to add to. The collection whose elements should be added to the end of the . The collection itself cannot be null, but it can contain elements that are null, if type is a reference type. or is null. If the underlying type is , calls . If not, uses this. If the underlying type is , calls . If not, uses this. The comparer to use for the sort. If the underlying type is , calls . If not, uses this. The comparison function to use for the sort. Sorts the given using the . This method uses the Tim sort algorithm, but falls back to binary sort for small lists. this Sorts the given using the . This method uses the Tim sort algorithm, but falls back to binary sort for small lists. this The to use for the sort. Sorts the given using the . This method uses the intro sort algorithm, but falls back to insertion sort for small lists. this Sorts the given using the . This method uses the intro sort algorithm, but falls back to insertion sort for small lists. this The to use for the sort. A general exception type thrown from Lucene.NET. This corresponds to the RuntimeException type in Java. In .NET, is similar, but includes types such as that Java does not include. Per the Microsoft documentation: "Because serves as the base class of a variety of exception types, your code should not throw a exception, nor should it attempt to handle a exception unless you intend to re-throw the original exception." However, since we are not throwing the original exception, we are making a best effort by wrapping it in a custom exception that derives from . This will allow code that catches for auditing or logging purposes to continue doing so without missing these exceptions. Lucene.NET will throw this exception with an populated with the actual exception (normally a in .NET). The primary reason for throwing a wrapper exception is to eliminate the possibility that the exception will be caught in one of the numerous catch blocks in Lucene unintentionally, and this is a way to preserve the stack trace of the original exception when it is rethrown. Initializes a new instance of . Initializes a new instance of with the specified . The message that describes the error. Initializes a new instance of with the specified and . The message that describes the error. The original . Initializes a new instance of with the specified . The original . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. LUCENENET specific abstract class containing common functionality for named service factories. The type of service this factory applies to. Ensures the method has been called since the last application start. This method is thread-safe. Initializes the dependencies of this factory (such as using Reflection to populate the type cache).
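A hedged sketch of the list-sorting helpers described above (Tim sort with a binary-sort fallback, intro sort with an insertion-sort fallback). CollectionUtil is the assumed Lucene.Net.Util entry point for these routines; treat the exact method names as an assumption and confirm them against your build.

using System;
using System.Collections.Generic;
using Lucene.Net.Util; // assumed location of CollectionUtil

var terms = new List<string> { "banana", "Apple", "cherry" };

// Tim sort in natural (ordinal) order; small lists fall back to binary sort.
CollectionUtil.TimSort(terms);

// Intro sort with an explicit comparer; small lists fall back to insertion sort.
CollectionUtil.IntroSort(terms, StringComparer.OrdinalIgnoreCase);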
The Lucene.Net.Codecs assembly or null if the assembly is not referenced in the host project. Determines whether the given type is a corresponding service for this class, based on its generic closing type . The of service to analyze. true if the service subclasses , is public, and is not abstract; otherwise false. Get the service name for the class (either by convention or by attribute). A service to get the name for. The canonical name of the service or the name provided in the corresponding name attribute, if supplied. Gets the type name without the suffix of the abstract base class it implements. If the class is generic, it will add the word "Generic" to the suffix in place of "`" to ensure the name is ASCII-only. The to get the name for. The canonical name of the service. Validates that a service name meets the requirements of Lucene. Checks whether a character is a letter or digit (ASCII), as defined in the spec. Gets a value that indicates whether the current application domain executes with full trust. A LUCENENET specific class that represents a numeric format. This class mimics the design of Java's NumberFormat class, which, unlike the class in .NET, can be subclassed. When overridden in a subclass, provides the numeric format as a . Generally, this is the same format that is passed into the method. A numeric format string. Implementation of that handles type conversion and default values for Java-style properties. Reads properties from a that is supplied to the constructor. Initializes a new instance of with the specified . The delegate method ensures the current instance of is used. The . Retrieves the value of an property from the current process. The name of the property. The property value. Retrieves the value of an property from the current process, with a default value if it doesn't exist or the caller doesn't have permission to read the value. The name of the property. The value to use if the property does not exist or the caller doesn't have permission to read the value. The property value. Retrieves the value of an property from the current process as . If the value cannot be cast to , returns false. The name of the property. The property value. Retrieves the value of an property from the current process as , with a default value if it doesn't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a . The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. Retrieves the value of an property from the current process as . If the value cannot be cast to , returns 0. The name of the property. The property value. Retrieves the value of an property from the current process as , with a default value if it doesn't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a . The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. LUCENENET specific abstract class for s that can be used to override the default convention-based names of services. For example, "Lucene40Codec" will by convention be named "Lucene40". Using the , the name can be overridden with a custom value. Sole constructor. Initializes the service name. Gets the service name. Mimics , but allows for swapping the of and with user-defined implementations. Reads properties from an instance.
The default configuration reads the property values from an instance returned by a implementation. The is set using . This can be supplied a user implemented to customize the property sources. Retrieves the value of a property from the current process. The name of the property. The property value. Retrieves the value of a property from the current process, with a default value if it doesn't exist or the caller doesn't have permission to read the value. The name of the property. The value to use if the property does not exist or the caller doesn't have permission to read the value. The property value. Retrieves the value of a property from the current process as . If the value cannot be cast to , returns false. The name of the property. The property value. Retrieves the value of a property from the current process as , with a default value if it doesn't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a . The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. Retrieves the value of a property from the current process as . If the value cannot be cast to , returns 0. The name of the property. The property value. Retrieves the value of a property from the current process as , with a default value if it doesn't exist, the caller doesn't have permission to read the value, or the value cannot be cast to a . The name of the property. The value to use if the property does not exist, the caller doesn't have permission to read the value, or the value cannot be cast to . The property value. An object whose RAM usage can be computed. @lucene.internal Return the memory usage of this object in bytes. Negative values are illegal. An for object arrays. @lucene.internal Create a new . An for object arrays. @lucene.internal Create a new . A for object arrays. @lucene.internal Create a new . Methods for manipulating arrays. @lucene.internal Maximum length for an array; we set this to "a bit" below because the exact max allowed byte[] is JVM dependent, so we want to avoid a case where a large value worked during indexing on one JVM but failed later at search time with a different JVM. Parses the string argument as if it was an value and returns the result. Throws if the string does not represent an int quantity. NOTE: This was parseInt() in Lucene. A string representation of an int quantity. The value represented by the argument. If the argument could not be parsed as an int quantity. Parses a char array into an . NOTE: This was parseInt() in Lucene. The character array The offset into the array The length the If it can't parse Parses the string argument as if it was an value and returns the result. Throws if the string does not represent an quantity. The second argument specifies the radix to use when parsing the value. NOTE: This was parseInt() in Lucene. A string representation of an int quantity. The base to use for conversion. The value represented by the argument. If the argument could not be parsed as an int quantity. Returns an array size >= , generally over-allocating exponentially to achieve amortized linear-time cost as the array grows. NOTE: this was originally borrowed from Python 2.4.2 listobject.c sources (attribution in LICENSE.txt), but has now been substantially changed based on discussions from java-dev thread with subject "Dynamic array reallocation algorithms", started on Jan 12 2010.
@lucene.internal Minimum required value to be returned. Bytes used by each element of the array. See constants in . Returns hash of chars in range start (inclusive) to end (inclusive). Returns hash of bytes in range start (inclusive) to end (inclusive). See if two array slices are the same. The left array to compare. The offset into the array. Must be positive. The right array to compare. The offset into the right array. Must be positive. The length of the section of the array to compare. true if the two arrays, starting at their respective offsets, are equal. See if two array slices are the same. The left array to compare. The offset into the array. Must be positive. The right array to compare. The offset into the right array. Must be positive. The length of the section of the array to compare. true if the two arrays, starting at their respective offsets, are equal. See if two array slices are the same. The left array to compare. The offset into the array. Must be positive. The right array to compare. The offset into the right array. Must be positive. The length of the section of the array to compare. true if the two arrays, starting at their respective offsets, are equal. NOTE: This was toIntArray() in Lucene. Get the natural for the provided object class. The comparer returned depends on the argument: If the type is , the comparer returned uses the to make the comparison to ensure that the current culture doesn't affect the results. This is the default string comparison used in Java, and what Lucene's design depends on. If the type implements , the comparer uses for the comparison. This allows the use of types with custom comparison schemes. If neither of the above conditions is true, will default to . NOTE: This was naturalComparer() in Lucene. Swap values stored in slots and Sorts the given array slice using the . This method uses the intro sort algorithm, but falls back to insertion sort for small arrays. Start index (inclusive) End index (exclusive) Sorts the given array using the . This method uses the intro sort algorithm, but falls back to insertion sort for small arrays. Sorts the given array slice in natural order. This method uses the intro sort algorithm, but falls back to insertion sort for small arrays. Start index (inclusive) End index (exclusive) Sorts the given array in natural order. This method uses the intro sort algorithm, but falls back to insertion sort for small arrays. Sorts the given array slice using the . This method uses the Tim sort algorithm, but falls back to binary sort for small arrays. Start index (inclusive) End index (exclusive) Sorts the given array using the . This method uses the Tim sort algorithm, but falls back to binary sort for small arrays. Sorts the given array slice in natural order. This method uses the Tim sort algorithm, but falls back to binary sort for small arrays. Start index (inclusive) End index (exclusive) Sorts the given array in natural order. This method uses the Tim sort algorithm, but falls back to binary sort for small arrays.
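A hedged sketch of the exponential over-allocation helper described above. ArrayUtil.Oversize and the RamUsageEstimator element-size constant follow the Lucene.Net.Util naming, but confirm both against your build before relying on them.

using System;
using Lucene.Net.Util; // assumed location of ArrayUtil and RamUsageEstimator

int[] buffer = new int[10];
int needed = 17; // minimum number of slots that must be addressable

if (needed > buffer.Length)
{
    // Oversize returns a size >= needed, growing exponentially so that repeated
    // appends have amortized linear cost instead of quadratic copying.
    int newSize = ArrayUtil.Oversize(needed, RamUsageEstimator.NUM_BYTES_INT32);
    Array.Resize(ref buffer, newSize);
}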
This is equivalent to the anonymous class in the Java version of ReflectAsString This method returns the current attribute values as a string in the following format by calling the method: if =true: "AttributeClass.Key=value,AttributeClass.Key=value" if =false: "key=value,key=value" This method is for introspection of attributes, it should simply add the key/values this attribute holds to the given . The default implementation calls for all non-static fields from the implementing class, using the field name as key and the field value as value. The class is also determined by Reflection. Please note that the default implementation can only handle single-Attribute implementations. Custom implementations look like this (e.g. for a combined attribute implementation): public void ReflectWith(IAttributeReflector reflector) { reflector.Reflect(typeof(ICharTermAttribute), "term", GetTerm()); reflector.Reflect(typeof(IPositionIncrementAttribute), "positionIncrement", GetPositionIncrement()); } If you implement this method, make sure that for each invocation, the same set of interfaces and keys are passed to in the same order, but possibly different values. So don't automatically exclude e.g. null properties! The default implementation of this method accesses all declared fields of this object and prints the values in the following syntax: public String ToString() { return "start=" + startOffset + ",end=" + endOffset; } This method may be overridden by subclasses. Copies the values from this into the passed-in attribute. The implementation must support all the s this implementation supports. Shallow clone. Subclasses must override this if they need to clone any members deeply, This interface is used to reflect contents of or . LUCENENET specific overload to support generics. This method gets called for every property in an / passing the of the , a and the actual . E.g., an invocation of would call this method once using typeof(ICharTermAttribute) as attribute type, "term" as and the actual as a . An contains a list of different s, and methods to add and get them. There can only be a single instance of an attribute in the same instance. This is ensured by passing in the actual type of the to the , which then checks if an instance of that type is already present. If yes, it returns the instance, otherwise it creates a new instance and returns it. An creates instances of s. returns an for the supplied interface. This is the default factory that creates s using the of the supplied interface by removing the I from the prefix. This class holds the state of an . An using the default attribute factory . An that uses the same attributes as the supplied one. An using the supplied for creating new instances. Returns the used . Returns a new iterator that iterates the attribute classes in the same order they were added in. Returns a new iterator that iterates all unique implementations. This iterator may contain less entries than , if one instance implements more than one interface. A cache that stores all interfaces for known implementation classes for performance (slow reflection) Expert: Adds a custom instance with one or more interfaces. Please note: It is not guaranteed, that is added to the , because the provided attributes may already exist. You should always retrieve the wanted attributes using after adding with this method and cast to your . The recommended way to use custom implementations is using an . The caller must pass in an interface type that extends . 
This method first checks if an instance of the corresponding class is already in this and returns it. Otherwise, a new instance is created, added to this, and returned. Returns true if this has any attributes. The caller must pass in an interface type that extends . Returns true if this contains the corresponding . The caller must pass in an interface type that extends . Returns the instance of the corresponding contained in this ; throws an exception if this does not contain the . It is recommended to always use even in consumers of s, because you cannot know if a specific really uses a specific . will automatically make the attribute available. If you want to only use the attribute, if it is available (to optimize consuming), use . Resets all s in this by calling on each implementation. Captures the state of all s. The return value can be passed to to restore the state of this or another . Restores this state by copying the values of all attribute implementations that this state contains into the attributes implementations of the targetStream. The targetStream must contain a corresponding instance for each argument contained in this state (e.g. it is not possible to restore the state of an containing a into a using a instance as implementation). Note that this method does not affect attributes of the targetStream that are not contained in this state. In other words, if for example the targetStream contains an , but this state doesn't, then the value of the remains unchanged. It might be desirable to reset its value to the default, in which case the caller should first call TokenStream.ClearAttributes() on the targetStream. This method returns the current attribute values as a string in the following format by calling the method: if =true: "AttributeClass.Key=value,AttributeClass.Key=value" if =false: "key=value,key=value" This method is for introspection of attributes; it should simply add the key/values this holds to the given . This method iterates over all implementations and calls the corresponding method. Performs a clone of all instances returned in a new instance. This method can be used to e.g. create another with exactly the same attributes (using ). You can also use it as a (non-performant) replacement for , if you need to look into / modify the captured state. Copies the contents of this to the given target . The given instance has to provide all s this instance contains. The actual attribute implementations must be identical in both instances; ideally both instances should use the same . You can use this method as a replacement for , if you use instead of . Returns a string consisting of the class's simple name, the hex representation of the identity hash code, and the current reflection of all attributes. Finite-state automaton with regular expression operations. Class invariants: An automaton is either represented explicitly (with and objects) or with a singleton string (see and ) in case the automaton is known to accept exactly one string. (Implicitly, all states and transitions of an automaton are reachable from its initial state.) Automata are always reduced (see ) and have no transitions to dead states (see ). If an automaton is nondeterministic, then returns false (but the converse is not required). Automata provided as input to operations are generally assumed to be disjoint. If the states or transitions are manipulated manually, the method and setter should be used afterwards to restore representation invariants that are assumed by the built-in automata operations.
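A short usage sketch of the attribute workflow described above, using a TokenStream (which is itself an attribute source). WhitespaceAnalyzer comes from the Lucene.Net.Analysis.Common package and the field name "body" is illustrative; the attribute interfaces live in Lucene.Net.Analysis.TokenAttributes.

using System;
using System.IO;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Core;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

Analyzer analyzer = new WhitespaceAnalyzer(LuceneVersion.LUCENE_48);
TokenStream stream = analyzer.GetTokenStream("body", new StringReader("Hello attribute world"));

// AddAttribute returns the existing ICharTermAttribute if one is already registered,
// otherwise it creates a new instance and registers it.
ICharTermAttribute termAtt = stream.AddAttribute<ICharTermAttribute>();

stream.Reset();
while (stream.IncrementToken())
{
    Console.WriteLine(termAtt.ToString()); // the same attribute instance is reused for every token
}
stream.End();
stream.Dispose();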
Note: this class has internal mutable state and is not thread safe. It is the caller's responsibility to ensure any necessary synchronization if you wish to use the same Automaton from multiple threads. In general it is instead recommended to use a for multithreaded matching: it is immutable, thread safe, and much faster. @lucene.experimental Minimize using Hopcroft's O(n log n) algorithm. this is regarded as one of the most generally efficient algorithms that exist. Selects minimization algorithm (default: MINIMIZE_HOPCROFT). Initial state of this automaton. If true, then this automaton is definitely deterministic (i.e., there are no choices for any run, but a run may crash). Extra data associated with this automaton. Singleton string. Null if not applicable. Minimize always flag. Selects whether operations may modify the input automata (default: false). Constructs a new automaton that accepts the empty language. Using this constructor, automata can be constructed manually from and objects. Selects minimization algorithm (default: MINIMIZE_HOPCROFT). minimization algorithm Sets or resets minimize always flag. If this flag is set, then will automatically be invoked after all operations that otherwise may produce non-minimal automata. By default, the flag is not set. if true, the flag is set Sets or resets allow mutate flag. If this flag is set, then all automata operations may modify automata given as input; otherwise, operations will always leave input automata languages unmodified. By default, the flag is not set. if true, the flag is set previous value of the flag Returns the state of the allow mutate flag. If this flag is set, then all automata operations may modify automata given as input; otherwise, operations will always leave input automata languages unmodified. By default, the flag is not set. current value of the flag Returns the singleton string for this automaton. An automaton that accepts exactly one string may be represented in singleton mode. In that case, this method may be used to obtain the string. String, null if this automaton is not in singleton mode. Gets initial state. state Returns deterministic flag for this automaton. true if the automaton is definitely deterministic, false if the automaton may be nondeterministic Associates extra information with this automaton. extra information Returns the set of reachable accept states. Set of objects. Adds transitions to explicit crash state to ensure that transition function is total. Restores representation invariant. This method must be invoked before any built-in automata operation is performed if automaton states or transitions are manipulated manually. Reduces this automaton. An automaton is "reduced" by combining overlapping and adjacent edge intervals with same destination. Returns sorted array of all interval start points. Returns the set of live states. A state is "live" if an accept state is reachable from it. Set of objects. Removes transitions to dead states and calls . (A state is "dead" if no accept state is reachable from it.) Returns a sorted array of transitions for each state (and sets state numbers). Expands singleton representation to normal representation. Does nothing if not in singleton representation. Returns the number of states in this automaton. Returns the number of transitions in this automaton. This number is counted as the total number of edges, where one edge may be a character interval. Returns a string representation of this automaton. Returns Graphviz Dot representation of this automaton. 
Returns a clone of this automaton, expands if singleton. Returns a clone of this automaton unless is set, expands if singleton. Returns a clone of this automaton. Returns a clone of this automaton, or this automaton itself if flag is set. See . See . See . See . See . See . See . See . See . See . See . See . See . See . See . Returns the automaton being given as argument. Automaton provider for . @lucene.experimental Returns automaton of the given name. Automaton name. Automaton. If errors occur. Construction of basic automata. @lucene.experimental Returns a new (deterministic) automaton with the empty language. Returns a new (deterministic) automaton that accepts only the empty string. Returns a new (deterministic) automaton that accepts all strings. Returns a new (deterministic) automaton that accepts any single codepoint. Returns a new (deterministic) automaton that accepts a single codepoint of the given value. Returns a new (deterministic) automaton that accepts a single codepoint whose value is in the given interval (including both end points). Constructs sub-automaton corresponding to decimal numbers of length x.Substring(n).Length. Constructs sub-automaton corresponding to decimal numbers of value at least x.Substring(n) and length x.Substring(n).Length. Constructs sub-automaton corresponding to decimal numbers of value at most x.Substring(n) and length x.Substring(n).Length. Constructs sub-automaton corresponding to decimal numbers of value between x.Substring(n) and y.Substring(n) and of length x.Substring(n).Length (which must be equal to y.Substring(n).Length). Returns a new automaton that accepts strings representing decimal non-negative integers in the given interval. Minimal value of interval. Maximal value of interval (both end points are included in the interval). If > 0, use fixed number of digits (strings must be prefixed by 0's to obtain the right length) - otherwise, the number of digits is not fixed. If min > max or if numbers in the interval cannot be expressed with the given fixed number of digits. Returns a new (deterministic) automaton that accepts the single given string. Returns a new (deterministic and minimal) automaton that accepts the union of the given collection of s representing UTF-8 encoded strings. The input strings, UTF-8 encoded. The collection must be in sorted order. An accepting all input strings. The resulting automaton is codepoint based (full unicode codepoints on transitions). Basic automata operations. @lucene.experimental Returns an automaton that accepts the concatenation of the languages of the given automata. Complexity: linear in number of states. Returns an automaton that accepts the concatenation of the languages of the given automata. Complexity: linear in total number of states. Returns an automaton that accepts the union of the empty string and the language of the given automaton. Complexity: linear in number of states. Returns an automaton that accepts the Kleene star (zero or more concatenated repetitions) of the language of the given automaton. Never modifies the input automaton language. Complexity: linear in number of states. Returns an automaton that accepts or more concatenated repetitions of the language of the given automaton. Complexity: linear in number of states and in . Returns an automaton that accepts between and (including both) concatenated repetitions of the language of the given automaton. Complexity: linear in number of states and in and . 
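Simple example combining the factory methods and operations above. This is a sketch; the BasicAutomata / BasicOperations class and method names (MakeString, MakeChar, MakeInterval, Concatenate, Optional, Repeat) are assumed to mirror the Java originals.
Automaton word = BasicAutomata.MakeString("item");
Automaton dash = BasicAutomata.MakeChar('-');
Automaton number = BasicAutomata.MakeInterval(0, 255, 0);   // decimal values 0..255, no fixed digit count
Automaton id = BasicOperations.Concatenate(new List<Automaton> { word, dash, number });  // "item-0" .. "item-255"
Automaton optionalId = BasicOperations.Optional(id);        // additionally accepts the empty string
Automaton repeated = BasicOperations.Repeat(id, 1);         // one or more concatenated ids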
Returns a (deterministic) automaton that accepts the complement of the language of the given automaton. Complexity: linear in number of states (if already deterministic). Returns a (deterministic) automaton that accepts the intersection of the language of and the complement of the language of . As a side-effect, the automata may be determinized, if not already deterministic. Complexity: quadratic in number of states (if already deterministic). Returns an automaton that accepts the intersection of the languages of the given automata. Never modifies the input automata languages. Complexity: quadratic in number of states. Returns true if these two automata accept exactly the same language. This is a costly computation! Note also that and will be determinized as a side effect. Returns true if the language of is a subset of the language of . As a side-effect, is determinized if not already marked as deterministic. Complexity: quadratic in number of states. Returns an automaton that accepts the union of the languages of the given automata. Complexity: linear in number of states. Returns an automaton that accepts the union of the languages of the given automata. Complexity: linear in number of states. Determinizes the given automaton. Worst case complexity: exponential in number of states. Adds epsilon transitions to the given automaton. This method adds extra character interval transitions that are equivalent to the given set of epsilon transitions. Automaton. Collection of objects representing pairs of source/destination states where epsilon transitions should be added. Returns true if the given automaton accepts the empty string and nothing else. Returns true if the given automaton accepts no strings. Returns true if the given automaton accepts all strings. Returns true if the given string is accepted by the automaton. Complexity: linear in the length of the string. Note: for full performance, use the class. Automaton representation for matching UTF-8 . Expert: if utf8 is true, the input is already byte-based Returns true if the given byte array is accepted by this automaton. Automaton representation for matching . Returns true if the given string is accepted by this automaton. Returns true if the given string is accepted by this automaton. Immutable class holding compiled details for a given . The is deterministic, must not have dead states but is not necessarily minimal. @lucene.experimental Automata are compiled into different internal forms for the most efficient execution depending upon the language they accept. Automaton that accepts no strings. Automaton that accepts all possible strings. Automaton that accepts only a single fixed string. Automaton that matches all strings with a constant prefix. Catch-all for any other automata. For , this is the prefix term; for this is the singleton term. Matcher for quickly determining if a is accepted. only valid for . Two dimensional array of transitions, indexed by state number for traversal. The state numbering is consistent with . Only valid for . Shared common suffix accepted by the automaton. Only valid for , and only when the automaton accepts an infinite language. Indicates if the automaton accepts a finite set of strings. Null if this was not computed. Only valid for . Finds largest term accepted by this Automaton, that's <= the provided input term. The result is placed in output; it's fine for output and input to point to the same . 
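Simple example of combining and testing automata with the operations above; a sketch assuming the Union, Minus, Determinize and Run names shown here.
Automaton ab = BasicOperations.Union(BasicAutomata.MakeString("a"), BasicAutomata.MakeString("b"));
Automaton onlyA = BasicOperations.Minus(ab, BasicAutomata.MakeString("b"));   // language is just { "a" }
BasicOperations.Determinize(onlyA);                                           // no-op if already deterministic
bool acceptsA = BasicOperations.Run(onlyA, "a");                              // true
bool acceptsB = BasicOperations.Run(onlyA, "b");                              // false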
The returned result is either the provided output, or null if there is no floor term (ie, the provided input term is before the first term accepted by this ). Builds a minimal, deterministic that accepts a set of strings. The algorithm requires sorted input data, but is very fast (nearly linear with the input size). DFSA state with labels on transitions. An empty set of labels. An empty set of states. Labels of outgoing transitions. Indexed identically to . Labels must be sorted lexicographically. States reachable from outgoing transitions. Indexed identically to . true if this state corresponds to the end of at least one input sequence. Returns the target state of a transition leaving this state and labeled with . If no such transition exists, returns null. Two states are equal if: They have an identical number of outgoing transitions, labeled with the same labels. Corresponding outgoing transitions lead to the same states (to states with an identical right-language). Compute the hash code of the current status of this state. Return true if this state has any children (outgoing transitions). Create a new outgoing transition labeled and return the newly created target state for this transition. Return the most recent transitions's target state. Return the associated state if the most recent transition is labeled with . Replace the last added outgoing transition's target state with the given . Compare two lists of objects for reference-equality. A "registry" for state interning. Root automaton state. Previous sequence added to the automaton in . A comparer used for enforcing sorted UTF8 order, used in assertions only. Add another character sequence to this automaton. The sequence must be lexicographically larger or equal compared to any previous sequences added to this automaton (the input must be sorted). Finalize the automaton and return the root state. No more strings can be added to the builder after this call. Root automaton state. Internal recursive traversal for conversion. Must use a dictionary with passed into its constructor. Build a minimal, deterministic automaton from a sorted list of representing strings in UTF-8. These strings must be binary-sorted. Copy into an internal buffer. Replace last child of with an already registered state or stateRegistry the last child state. Add a suffix of starting at (inclusive) to state . Parametric description for generating a Levenshtein automaton of degree 1 Parametric description for generating a Levenshtein automaton of degree 1, with transpositions as primitive edits Parametric description for generating a Levenshtein automaton of degree 2 Parametric description for generating a Levenshtein automaton of degree 2, with transpositions as primitive edits Class to construct DFAs that match a word within some edit distance. Implements the algorithm described in: Schulz and Mihov: Fast String Correction with Levenshtein Automata @lucene.experimental @lucene.internal Create a new for some string. Optionally count transpositions as a primitive edit. Expert: specify a custom maximum possible symbol (alphaMax); default is . Compute a DFA that accepts all strings within an edit distance of . All automata have the following properties: They are deterministic (DFA). There are no transitions to dead states. They are not minimal (some transitions could be combined). Get the characteristic vector X(x, V) where V is Substring(pos, end - pos). A describes the structure of a Levenshtein DFA for some degree n. 
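Simple example of building a Levenshtein DFA as described above; a sketch assuming the LevenshteinAutomata constructor and ToAutomaton(n) follow the Java signatures.
var lev = new LevenshteinAutomata("lucene", true);      // true: treat a transposition as a single edit
Automaton withinOne = lev.ToAutomaton(1);               // accepts every string within edit distance 1
bool hit = BasicOperations.Run(withinOne, "lucen");     // true, one deletion away
bool miss = BasicOperations.Run(withinOne, "lucifer");  // false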
There are four components of a parametric description, all parameterized on the length of the word w: The number of states: The set of final states: The transition function: Minimal boundary function: Return the number of states needed to compute a Levenshtein DFA. NOTE: This was size() in Lucene. Returns true if the state in any Levenshtein DFA is an accept state (final state). Returns the position in the input word for a given state. this is the minimal boundary for the state. Returns the state number for a transition from the given , assuming and characteristic vector . Operations for minimizing automata. @lucene.experimental Minimizes (and determinizes if not already deterministic) the given automaton. Minimizes the given automaton using Hopcroft's algorithm. NOTE: This was IntPair in Lucene Syntax flag, enables intersection (&). Syntax flag, enables complement (~). Syntax flag, enables empty language (#). Syntax flag, enables anystring (@). Syntax flag, enables named automata (<identifier>). Syntax flag, enables numerical intervals ( <n-m>). Syntax flag, enables all optional regexp syntax. Syntax flag, enables no optional regexp syntax. Regular Expression extension to . Regular expressions are built from the following abstract syntax: regexp ::= unionexp | unionexp ::= interexp | unionexp (union) | interexp interexp ::= concatexp & interexp (intersection) [OPTIONAL] | concatexp concatexp ::= repeatexp concatexp (concatenation) | repeatexp repeatexp ::= repeatexp ? (zero or one occurrence) | repeatexp * (zero or more occurrences) | repeatexp + (one or more occurrences) | repeatexp {n} (n occurrences) | repeatexp {n,} (n or more occurrences) | repeatexp {n,m} (n to m occurrences, including both) | complexp complexp ::= ~ complexp (complement) [OPTIONAL] | charclassexp charclassexp ::= [ charclasses ] (character class) | [^ charclasses ] (negated character class) | simpleexp charclasses ::= charclass charclasses | charclass charclass ::= charexp - charexp (character range, including end-points) | charexp simpleexp ::= charexp | . (any single character) | # (the empty language) [OPTIONAL] | @ (any string) [OPTIONAL] | " <Unicode string without double-quotes>  " (a string) | ( ) (the empty string) | ( unionexp ) (precedence override) | < <identifier> > (named automaton) [OPTIONAL] | <n-m> (numerical interval) [OPTIONAL] charexp ::= <Unicode character> (a single non-reserved character) | \ <Unicode character>  (a single character) The productions marked [OPTIONAL] are only allowed if specified by the syntax flags passed to the constructor. The reserved characters used in the (enabled) syntax must be escaped with backslash (\) or double-quotes ("..."). (In contrast to other regexp syntaxes, this is required also in character classes.) Be aware that dash (-) has a special meaning in charclass expressions. An identifier is a string not containing right angle bracket (>) or dash (-). Numerical intervals are specified by non-negative decimal integers and include both end points, and if n and m have the same number of digits, then the conforming strings must have that length (i.e. prefixed by 0's). @lucene.experimental Constructs new from a string. Same as RegExp(s, RegExpSyntax.ALL). Regexp string. If an error occured while parsing the regular expression. Constructs new from a string. Regexp string. Boolean 'or' of optional constructs to be enabled. If an error occured while parsing the regular expression Constructs new from this . Same as ToAutomaton(null) (empty automaton map). 
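Simple example of the regular expression syntax above; a sketch that also assumes a CharacterRunAutomaton wrapper (documented further below) for fast, thread-safe matching.
var re = new RegExp("[a-z]+-[0-9]{2,4}", RegExpSyntax.ALL);
Automaton a = re.ToAutomaton();                  // minimal, deterministic, no dead states
var matcher = new CharacterRunAutomaton(a);
bool ok = matcher.Run("item-2024");              // true
bool tooShort = matcher.Run("item-9");           // false, at least two digits are required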
Constructs new from this . The constructed automaton is minimal and deterministic and has no transitions to dead states. Provider of automata for named identifiers. If this regular expression uses a named identifier that is not available from the automaton provider. Constructs new from this . The constructed automaton is minimal and deterministic and has no transitions to dead states. A map from automaton identifiers to automata (of type ). If this regular expression uses a named identifier that does not occur in the automaton map. Sets or resets allow mutate flag. If this flag is set, then automata construction uses mutable automata, which is slightly faster but not thread safe. By default, the flag is not set. If true, the flag is set Previous value of the flag. Constructs string from parsed regular expression. Returns set of automaton identifiers that occur in this regular expression. Finite-state automaton with fast run operation. @lucene.experimental Returns a string representation of this automaton. Returns number of states in automaton. NOTE: This was size() in Lucene. Returns acceptance status for given state. Returns initial state. Returns array of codepoint class interval start points. The array should not be modified by the caller. Gets character class of given codepoint. Constructs a new from a deterministic . An automaton. Returns the state obtained by reading the given char from the given state. Returns -1 if not obtaining any such state. (If the original had no dead states, -1 is returned here if and only if a dead state is entered in an equivalent automaton with a total transition function.) Just holds a set of states, plus a corresponding count per state. Used by . NOTE: This was SortedIntSet in Lucene NOTE: This was FrozenIntSet in Lucene Special automata operations. @lucene.experimental Finds the largest entry whose value is less than or equal to , or 0 if there is no such entry. Returns true if the language of this automaton is finite. Checks whether there is a loop containing . (This is sufficient since there are never transitions to dead states.) Returns the longest string that is a prefix of all accepted strings and visits each state at most once. Common prefix. Returns the longest string that is a suffix of all accepted strings and visits each state at most once. Common suffix. Reverses the language of the given (non-singleton) automaton while returning the set of new initial states. Returns the set of accepted strings, assuming that at most strings are accepted. If more than strings are accepted, the first limit strings found are returned. If <0, then the limit is infinite. Returns the strings that can be produced from the given state, or false if more than strings are found. <0 means "infinite". state. @lucene.experimental Constructs a new state. Initially, the new state is a reject state. Resets transition set. Returns the set of outgoing transitions. Subsequent changes are reflected in the automaton. Transition set. Adds an outgoing transition. Transition. Sets acceptance for this state. If true, this state is an accept state. Performs lookup in transitions, assuming determinism. Codepoint to look up. Destination state, null if no matching outgoing transition. Performs lookup in transitions, allowing nondeterminism. Codepoint to look up. Collection where destination states are stored. Virtually adds an epsilon transition to the target state. 
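Simple example of the prefix and finiteness helpers above; a sketch assuming the MinimizationOperations and SpecialOperations method names mirror the Java originals.
Automaton names = BasicOperations.Union(BasicAutomata.MakeString("lucene"), BasicAutomata.MakeString("lucene.net"));
MinimizationOperations.Minimize(names);                       // determinizes and minimizes in place
string prefix = SpecialOperations.GetCommonPrefix(names);     // "lucene"
bool finite = SpecialOperations.IsFinite(names);              // true, only two strings are accepted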
this is implemented by copying all transitions from to this state, and if is an accept state then set accept for this state. Downsizes transitionArray to numTransitions. Reduces this state. A state is "reduced" by combining overlapping and adjacent edge intervals with same destination. Returns sorted list of outgoing transitions. Comparer to sort with. Transition list. Sorts transitions array in-place. Return this state's number. Expert: Will be useless unless has been called first to number the states. The number. Returns string describing this state. Normally invoked via . Compares this object with the specified object for order. States are ordered by the time of construction. Pair of states. @lucene.experimental Constructs a new state pair. First state. Second state. Returns first component of this pair. First state. Returns second component of this pair. Second state. Checks for equality. Object to compare with. true if represents the same pair of states as this pair. Returns hash code. Hash code. transition. A transition, which belongs to a source state, consists of a Unicode codepoint interval and a destination state. @lucene.experimental Constructs a new singleton interval transition. Transition codepoint. Destination state. Constructs a new transition. Both end points are included in the interval. Transition interval minimum. Transition interval maximum. Destination state. Returns minimum of this transition interval. Returns maximum of this transition interval. Returns destination of this transition. Checks for equality. Object to compare with. true if is a transition with same character interval and destination state as this transition. Returns hash code. The hash code is based on the character interval (not the destination state). Hash code. Clones this transition. Clone with same character interval and destination state. Returns a string describing this state. Normally invoked via . Converts UTF-32 automata to the equivalent UTF-8 representation. @lucene.internal Converts an incoming utf32 to an equivalent utf8 one. The incoming automaton need not be deterministic. Note that the returned automaton will not in general be deterministic, so you must determinize it if that's needed. Interface for Bitset-like structures. @lucene.experimental Returns the value of the bit with the specified . Index, should be non-negative and < . The result of passing negative or out of bounds values is undefined by this interface, just don't do it! true if the bit is set, false otherwise. Returns the number of bits in this set Bits impl of the specified length with all bits set. Bits impl of the specified length with no bits set. A variety of high efficiency bit twiddling routines. @lucene.internal the python code that generated bitlist def bits2int(val): arr=0 for shift in range(8,0,-1): if val & 0x80: arr = (arr << 4) | shift val = val << 1 return arr def int_table(): tbl = [ hex(bits2int(val)).strip('L') for val in range(256) ] return ','.join(tbl) Return the number of bits sets in . Return the list of bits which are set in encoded as followed: (i >>> (4 * n)) & 0x0F is the offset of the n-th set bit of the given byte plus one, or 0 if there are n or less bits set in the given byte. For example bitList(12) returns 0x43: 0x43 & 0x0F is 3, meaning the the first bit set is at offset 3-1 = 2, (0x43 >>> 4) & 0x0F is 4, meaning there is a second bit set at offset 4-1=3, (0x43 >>> 8) & 0x0F is 0, meaning there is no more bit set in this byte. Returns the number of set bits in an array of s. 
Returns the popcount or cardinality of the two sets after an intersection. Neither array is modified. Returns the popcount or cardinality of the union of two sets. Neither array is modified. Returns the popcount or cardinality of A & ~B. Neither array is modified. Returns the popcount or cardinality of A ^ B Neither array is modified. Returns the next highest power of two, or the current value if it's already a power of two or zero Returns the next highest power of two, or the current value if it's already a power of two or zero Methods and constants inspired by the article "Broadword Implementation of Rank/Select Queries" by Sebastiano Vigna, January 30, 2012: algorithm 1: , count of set bits in a algorithm 2: , selection of a set bit in a , bytewise signed smaller <8 operator: . shortwise signed smaller <16 operator: . some of the Lk and Hk constants that are used by the above: L8 , H8 , L9 , L16 and H16 . @lucene.internal Bit count of a . Only here to compare the implementation with , normally is preferable. The total number of 1 bits in x. Select a 1-bit from a . The index of the r-th 1 bit in x, or if no such bit exists, 72. A signed bytewise smaller <8 operator, for operands 0L<= x, y <=0x7L. This uses the following numbers of basic operations: 1 or, 2 and, 2 xor, 1 minus, 1 not. A with bits set in the positions corresponding to each input signed byte pair that compares smaller. An unsigned bytewise smaller <8 operator. This uses the following numbers of basic operations: 3 or, 2 and, 2 xor, 1 minus, 1 not. A with bits set in the positions corresponding to each input unsigned byte pair that compares smaller. An unsigned bytewise not equals 0 operator. This uses the following numbers of basic operations: 2 or, 1 and, 1 minus. A with bits set in the positions corresponding to each unsigned byte that does not equal 0. A bytewise smaller <16 operator. This uses the following numbers of basic operations: 1 or, 2 and, 2 xor, 1 minus, 1 not. A with bits set in the positions corresponding to each input signed short pair that compares smaller. Lk denotes the constant whose ones are in position 0, k, 2k, . . . These contain the low bit of each group of k bits. The suffix _L indicates the implementation. Hk = Lk << (k-1) . These contain the high bit of each group of k bits. The suffix _L indicates the implementation. Naive implementation of , using repetitively. Works relatively fast for low ranks. The index of the r-th 1 bit in x, or if no such bit exists, 72. Class that Posting and PostingVector use to write byte streams into shared fixed-size arrays. The idea is to allocate slices of increasing lengths. For example, the first slice is 5 bytes, the next slice is 14, etc. We start by writing our bytes into the first 5 bytes. When we hit the end of the slice, we allocate the next slice and then write the address of the new slice into the last 4 bytes of the previous slice (the "forwarding address"). Each slice is filled with 0's initially, and we mark the end with a non-zero byte. This way the methods that are writing into the slice don't need to record its length and instead allocate a new slice once they hit a non-zero byte. @lucene.internal Abstract class for allocating and freeing byte blocks. A simple that never recycles. A simple that never recycles, but tracks how much total RAM is in use. Array of buffers currently used in the pool. Buffers are allocated if needed don't modify this outside of this class. 
index into the buffers array pointing to the current buffer used as the head Where we are in head buffer Current head buffer Current head offset Resets the pool to its initial state reusing the first buffer and fills all buffers with 0 bytes before they reused or passed to . Calling is not needed after reset. Expert: Resets the pool to its initial state reusing the first buffer. Calling is not needed after reset. if true the buffers are filled with 0. this should be set to true if this pool is used with slices. if true the first buffer will be reused and calling is not needed after reset if the block pool was used before ie. was called before. Advances the pool to its next buffer. This method should be called once after the constructor to initialize the pool. In contrast to the constructor a call will advance the pool to its first buffer immediately. Allocates a new slice with the given size. An array holding the offset into the to quickly navigate to the next slice level. An array holding the level sizes for byte slices. The first level size for new slices Creates a new byte slice with the given starting size and returns the slices offset in the pool. Appends the bytes in the provided at the current position. Reads bytes bytes out of the pool starting at the given offset with the given length into the given byte array at offset off. Note: this method allows to copy across block boundaries. Represents , as a slice (offset + length) into an existing . The property should never be null; use if necessary. Important note: Unless otherwise noted, Lucene uses this class to represent terms that are encoded as UTF8 bytes in the index. To convert them to a .NET (which is UTF16), use . Using code like new String(bytes, offset, length) to do this is wrong, as it does not respect the correct character set and may return wrong results (depending on the platform's defaults)! An empty byte array for convenience The contents of the BytesRef. Should never be null. Offset of first valid byte. Length of used bytes. Create a with This instance will directly reference w/o making a copy. should not be null. This instance will directly reference w/o making a copy. should not be null. Create a pointing to a new array of size . Offset and length will both be zero. Initialize the from the UTF8 bytes for the provided . This must be well-formed unicode text, with no unpaired surrogates. Initialize the from the UTF8 bytes for the provided . This must be well-formed unicode text, with no unpaired surrogates. Copies the UTF8 bytes for this . Must be well-formed unicode text, with no unpaired surrogates or invalid UTF16 code units. Copies the UTF8 bytes for this . Must be well-formed unicode text, with no unpaired surrogates or invalid UTF16 code units. Expert: Compares the bytes against another , returning true if the bytes are equal. @lucene.internal Another , should not be null. Returns a shallow clone of this instance (the underlying bytes are not copied and will be shared by both the returned object and this object. Calculates the hash code as required by during indexing. This is currently implemented as MurmurHash3 (32 bit), using the seed from , but is subject to change from release to release. Interprets stored bytes as UTF8 bytes, returning the resulting . Returns hex encoded bytes, eg [0x6c 0x75 0x63 0x65 0x6e 0x65] Copies the bytes from the given NOTE: if this would exceed the array size, this method creates a new reference array. 
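Simple example of the UTF-8 round trip warned about above; a sketch assuming the usual Lucene.NET member names (Utf8ToString, DeepCopyOf).
var bytes = new BytesRef("café");             // stores the UTF-8 encoding of the string
string text = bytes.Utf8ToString();           // decodes back to "café"; never build a string from the raw bytes directly
BytesRef copy = BytesRef.DeepCopyOf(bytes);   // independent copy with offset 0 and the same length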
Appends the bytes from the given NOTE: if this would exceed the array size, this method creates a new reference array. Used to grow the reference array. In general this should not be used as it does not take the offset into account. @lucene.internal Unsigned byte order comparison Unsigned byte order comparison @deprecated this comparer is only a transition mechanism @deprecated this comparer is only a transition mechanism Creates a new that points to a copy of the bytes from . The returned will have a length of other.Length and an offset of zero. Performs internal consistency checks. Always returns true (or throws ) @deprecated this comparer is only a transition mechanism A simple append only random-access array that stores full copies of the appended bytes in a . Note: this class is not Thread-Safe! @lucene.internal @lucene.experimental Creates a new with a counter to track allocated bytes Clears this Appends a copy of the given to this . The bytes to append The index of the appended bytes Returns the current size of this . NOTE: This was size() in Lucene. The current size of this Returns the n'th element of this A spare instance The elements index to retrieve The n'th element of this Sugar for with a null comparer Returns a with point in time semantics. The iterator provides access to all so far appended instances. If a non null is provided the iterator will iterate the byte values in the order specified by the comparer. Otherwise the order is the same as the values were appended. This is a non-destructive operation. Sugar for with a null comparer. Returns a with point in time semantics. The enumerator provides access to all so far appended instances. If a non null is provided the enumerator will iterate the byte values in the order specified by the comparer. Otherwise the order is the same as the values were appended. This is a non-destructive operation. is a special purpose hash-map like data-structure optimized for instances. maintains mappings of byte arrays to ids (Map<BytesRef,int>) storing the hashed bytes efficiently in continuous storage. The mapping to the id is encapsulated inside and is guaranteed to be increased for each added . Note: The maximum capacity instance passed to must not be longer than -2. The internal storage is limited to 2GB total byte storage. @lucene.internal Creates a new with a using a . Creates a new Creates a new Returns the number of values in this . NOTE: This was size() in Lucene. The number of values in this . Populates and returns a with the bytes for the given bytesID. Note: the given bytesID must be a positive integer less than the current size () The id The to populate The given instance populated with the bytes for the given bytesID Returns the ids array in arbitrary order. Valid ids start at offset of 0 and end at a limit of - 1 Note: this is a destructive operation. must be called in order to reuse this instance. Returns the values array sorted by the referenced byte values. Note: this is a destructive operation. must be called in order to reuse this instance. The used for sorting Clears the which maps to the given Closes the and releases all internally used memory Adds a new The bytes to hash The id the given bytes are hashed if there was no mapping for the given bytes, otherwise (-(id)-1). this guarantees that the return value will always be >= 0 if the given bytes haven't been hashed before. if the given bytes are > 2 + Returns the id of the given . 
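Simple example of appending and retrieving entries; a sketch assuming this append-only array is the BytesRefArray class, that its constructor takes the Counter described later in this section, and that Append/Get keep their Lucene names.
var bytesUsed = Counter.NewCounter();
var array = new BytesRefArray(bytesUsed);
int firstIndex = array.Append(new BytesRef("beta"));    // returns the index of the appended value (0)
array.Append(new BytesRef("alpha"));
var spare = new BytesRef();
BytesRef first = array.Get(spare, firstIndex);          // fills and returns the spare, holding "beta"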
The bytes to look for The id of the given bytes, or -1 if there is no mapping for the given bytes. Adds a "arbitrary" int offset instead of a term. This is used in the indexer to hold the hash for term vectors, because they do not redundantly store the term directly and instead reference the term already stored by the postings . See . Called when hash is too small (> 50% occupied) or too large (< 20% occupied). Reinitializes the after a previous call. If has not been called previously this method has no effect. Returns the bytesStart offset into the internally used for the given The id to look up The bytesStart offset into the internally used for the given id Thrown if a exceeds the limit of -2. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Manages allocation of the per-term addresses. Initializes the . This call will allocate memory. The initialized bytes start array. Grows the . The grown array. Clears the and returns the cleared instance. The cleared instance, this might be null. A reference holding the number of bytes used by this . The uses this reference to track it memory usage. a reference holding the number of bytes used by this . A simple that tracks memory allocation using a private instance. A simple enumerator interface for iteration. Increments the iteration to the next in the enumerator. true if the enumerator was successfully advanced to the next element; false if the enumerator has passed the end of the collection. If there is a low-level I/O error. Gets the for the current iteration. The returned may be reused across calls to . Return the Comparer used to sort terms provided by the iterator. This may return null if there are no items or the iterator is not sorted. Callers may invoke this method many times, so it's best to cache a single instance & reuse it. LUCENENET specific class to make the syntax of creating an empty the same as it was in Lucene. Example: var iter = BytesRefEnumerator.EMPTY; Singleton that iterates over 0 BytesRefs. A simple iterator interface for iteration. Increments the iteration to the next in the iterator. Returns the resulting or null if the end of the iterator is reached. The returned may be re-used across calls to . After this method returns null, do not call it again: the results are undefined. The next in the iterator or null if the end of the iterator is reached. If there is a low-level I/O error. Return the Comparer used to sort terms provided by the iterator. This may return null if there are no items or the iterator is not sorted. Callers may invoke this method many times, so it's best to cache a single instance & reuse it. LUCENENET specific class to make the syntax of creating an empty the same as it was in Lucene. Example: var iter = BytesRefIterator.EMPTY; Singleton that iterates over 0 BytesRefs. Represents , as a slice (offset + Length) into an existing . The property should never be null; use if necessary. @lucene.internal An empty character array for convenience The contents of the . Should never be null. Offset of first valid character. Length of used characters. Creates a new initialized an empty array zero-Length Creates a new initialized with an array of the given . Creates a new initialized with the given , and . Creates a new initialized with the given character array. 
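Simple example of the id mapping described above (Add returns -(id)-1 for duplicates, Find returns -1 for misses); a sketch assuming the class is BytesRefHash with the member names shown here.
var hash = new BytesRefHash();
int id = hash.Add(new BytesRef("lucene"));        // >= 0, a new entry
int dup = hash.Add(new BytesRef("lucene"));       // -(id)-1, already present
int missing = hash.Find(new BytesRef("absent"));  // -1, no mapping
var scratch = new BytesRef();
hash.Get(id, scratch);                            // scratch now holds the stored bytes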
Returns a shallow clone of this instance (the underlying characters are not copied and will be shared by both the returned object and this object. Signed order comparison Copies the given referenced content into this instance. The to copy. Used to grow the reference array. In general this should not be used as it does not take the offset into account. @lucene.internal Copies the given array into this . Appends the given array to this . @deprecated this comparer is only a transition mechanism @deprecated this comparer is only a transition mechanism @deprecated this comparer is only a transition mechanism Creates a new that points to a copy of the chars from . The returned will have a Length of other.Length and an offset of zero. Performs internal consistency checks. Always returns true (or throws ) .NET's built-in has a serious flaw: internally, it creates an array with an internal lattice structure which in turn causes the garbage collector to cause long blocking pauses when tearing the structure down. See https://ayende.com/blog/189761-A/production-postmortem-the-slow-slowdown-of-large-systems for a more detailed explanation. This is a completely different problem than in Java which the ClosableThreadLocal<T> class is meant to solve, so is specific to Lucene.NET and can be used as a direct replacement for ClosableThreadLocal<T>. This class works around the issue by using an alternative approach than using . It keeps track of each thread's local and global state in order to later optimize garbage collection. A complete explanation can be found at https://ayende.com/blog/189793-A/the-design-and-implementation-of-a-better-threadlocal-t. @lucene.internal Specifies the type of data stored per-thread. Initializes the instance. The default value of is used to initialize the instance when is accessed for the first time. Initializes the instance with the specified function. The invoked to produce a lazily-initialized value when an attempt is made to retrieve without it having been previously initialized. is null. Gets a collection for all of the values currently stored by all of the threads that have accessed this instance. The instance has been disposed. Gets whether Value is initialized on the current thread. The instance has been disposed. Gets or sets the value of this instance for the current thread. The instance has been disposed. If this instance was not previously initialized for the current thread, accessing Value will attempt to initialize it. If an initialization function was supplied during the construction, that initialization will happen by invoking the function to retrieve the initial value for . Otherwise, the default value of will be used. Releases the resources used by this instance. Methods for manipulating (sorting) collections. Sort methods work directly on the supplied lists and don't copy to/from arrays before/after. For medium size collections as used in the Lucene indexer that is much more efficient. @lucene.internal Sorts the given using the . This method uses the intro sort algorithm, but falls back to insertion sort for small lists. This The to use for the sort. Sorts the given random access in natural order. This method uses the intro sort algorithm, but falls back to insertion sort for small lists. This Sorts the given using the . This method uses the Tim sort algorithm, but falls back to binary sort for small lists. this The to use for the sort. Sorts the given in natural order. This method uses the Tim sort algorithm, but falls back to binary sort for small lists. 
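Simple example of the sort helpers just described; a sketch assuming the class is CollectionUtil and that the IntroSort/TimSort overloads match the descriptions above.
var terms = new List<string> { "delta", "alpha", "charlie", "bravo" };
CollectionUtil.IntroSort(terms);                          // natural order, sorts the list in place
CollectionUtil.TimSort(terms, StringComparer.Ordinal);    // stable sort with an explicit comparer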
This Class containing some useful methods used by command line tools Creates a specific instance starting from its class name. The name of the class to load. The to be used as parameter constructor. The new instance Loads a specific implementation. The name of the class to load. The class loaded. If the specified class cannot be found. Loads a specific implementation. The name of the class to load. The class loaded. If the specified class cannot be found. Creates a new specific instance. The class of the object to be created The to be used as parameter constructor The new instance. If the does not have a constructor that takes . If the class is abstract or an interface. If the constructor does not have public visibility. If the constructor throws an exception Some useful constants. The maximum stack allocation size before switching to making allocations on the heap. NOTE: This was JAVA_VENDOR in Lucene The value of , excluding the version number. True iff running on Linux. True iff running on Windows. True iff running on SunOS. True iff running on Mac OS X True iff running on FreeBSD The value of the version parsed from . NOTE: This was JAVA_VERSION in Lucene NOTE: This was JRE_IS_64BIT in Lucene this is the internal Lucene version, recorded into each segment. NOTE: we track per-segment version as a with the "X.Y" format (no minor version), e.g. "4.0", "3.1", "3.0". Alpha and Beta versions will have numbers like "X.Y.0.Z", anything else is not allowed. This is done to prevent people from using indexes created with ALPHA/BETA versions with the released version. This is the Lucene version for display purposes. Returns a LUCENE_MAIN_VERSION without any ALPHA/BETA qualifier Used by test only! Extracts the first group matched with the regex as a new string. The string to examine A regex object to use to extract the string Simple counter class @lucene.internal @lucene.experimental Adds the given delta to the counters current value. The delta to add. The counters updated value. Gets the counters current value. Returns the counters current value. The counters current value. Returns a new counter. The returned counter is not thread-safe. Returns a new counter. true if the returned counter can be used by multiple threads concurrently. A new counter. Returns this counter's implicitly. Simple and backed by a This DocIdSet implementation is cacheable. Returns the underlying . Simple concurrent LRU cache, using a "double barrel" approach where two ConcurrentHashMaps record entries. At any given time, one hash is primary and the other is secondary. first checks primary, and if that's a miss, checks secondary. If secondary has the entry, it's promoted to primary (NOTE: the key is cloned at this point). Once primary is full, the secondary is cleared and the two are swapped. This is not as space efficient as other possible concurrent approaches (see LUCENE-2075): to achieve perfect LRU(N) it requires 2*N storage. But, this approach is relatively simple and seems in practice to not grow unbounded in size when under hideously high load. @lucene.internal LUCENENET specific class to nest the so it can be accessed without referencing the generic closing types of . Object providing clone(); the key class must subclass this. Provides methods for sanity checking that entries in the FieldCache are not wasteful or inconsistent. 
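Simple example of the counter described above, including the implicit conversion to long it mentions; the Counter name and factory overloads are assumptions based on that description.
Counter bytesUsed = Counter.NewCounter();       // not thread-safe
Counter shared = Counter.NewCounter(true);      // safe for concurrent updates
bytesUsed.AddAndGet(128);
bytesUsed.AddAndGet(-32);
long current = bytesUsed;                       // implicit conversion yields the current value (96)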
Lucene 2.9 Introduced numerous enhancements into how the FieldCache is used by the low levels of Lucene searching (for Sorting and ValueSourceQueries) to improve both the speed for Sorting, as well as reopening of IndexReaders. But these changes have shifted the usage of FieldCache from "top level" IndexReaders (frequently a MultiReader or DirectoryReader) down to the leaf level SegmentReaders. As a result, existing applications that directly access the FieldCache may find RAM usage increase significantly when upgrading to 2.9 or Later. This class provides an API for these applications (or their Unit tests) to check at run time if the FieldCache contains "insane" usages of the FieldCache. @lucene.experimental If set, estimate size for all objects will be calculated. Quick and dirty convenience method Quick and dirty convenience method that instantiates an instance with "good defaults" and uses it to test the s Tests a CacheEntry[] for indication of "insane" cache usage. NOTE:FieldCache CreationPlaceholder objects are ignored. (:TODO: is this a bad idea? are we masking a real problem?) Internal helper method used by check that iterates over and generates a of instances accordingly. The are used to populate the objects. Internal helper method used by check that iterates over the keys of and generates a of instances whenever two (or more) instances are found that have an ancestry relationships. Checks if the is an , and if so will walk the hierarchy of subReaders building up a list of the objects returned by seed.CoreCacheKey Simple pair object for using "readerKey + fieldName" a Map key Simple container for a collection of related objects that in conjunction with each other represent some "insane" usage of the . Type of insane behavior this object represents Description of the insane behavior objects which suggest a problem Multi-Line representation of this object, starting with the Type and Msg, followed by each CacheEntry.ToString() on it's own line prefaced by a tab character An Enumeration of the different types of "insane" behavior that may be detected in a . Indicates an overlap in cache usage on a given field in sub/super readers. Indicates entries have the same reader+fieldname but different cached values. This can happen if different datatypes, or parsers are used -- and while it's not necessarily a bug it's typically an indication of a possible problem. NOTE: Only the reader, fieldname, and cached value are actually tested -- if two cache entries have different parsers or datatypes but the cached values are the same Object (== not just Equal()) this method does not consider that a red flag. This allows for subtle variations in the way a Parser is specified (null vs DEFAULT_INT64_PARSER, etc...) Indicates an expected bit of "insanity". This may be useful for clients that wish to preserve/log information about insane usage but indicate that it was expected. An implementation that filters elements with a boolean predicate. Initializes a new instance of with the specified and . Returns true, if this element should be set to by . An implementation that filters elements with a boolean predicate. Returns true, if this element should be set to by . BitSet of fixed length (numBits), backed by accessible () long[], accessed with an int index, implementing and . If you need to manage more than 2.1B bits, use . @lucene.internal A which iterates over set bits in a . Creates an iterator over the given . Creates an iterator over the given array of bits. 
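To run the check from application code or a test, a sketch (assuming the port keeps a static CheckSanity convenience method on FieldCacheSanityChecker and the FieldCache.DEFAULT singleton):
var problems = FieldCacheSanityChecker.CheckSanity(FieldCache.DEFAULT);
foreach (var insanity in problems)
{
    Console.WriteLine(insanity);   // multi-line report: type, message and the offending cache entries
}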
If the given is large enough to hold , returns the given bits, otherwise returns a new which can hold the requested number of bits. NOTE: the returned bitset reuses the underlying of the given if possible. Also, calling on the returned bits may return a value greater than . Returns the number of 64 bit words it would take to hold Returns the popcount or cardinality of the intersection of the two sets. Neither set is modified. Returns the popcount or cardinality of the union of the two sets. Neither set is modified. Returns the popcount or cardinality of "a and not b" or "intersection(a, not(b))". Neither set is modified. This DocIdSet implementation is cacheable. Expert. Gets the number of set bits. NOTE: this visits every in the backing bits array, and the result is not internally cached! Returns the index of the first set bit starting at the index specified. -1 is returned if there are no more set bits. Returns the index of the last set bit before or on the index specified. -1 is returned if there are no more set bits. Does in-place OR of the bits provided by the iterator. this = this OR other this = this XOR other Does in-place XOR of the bits provided by the iterator. Does in-place AND of the bits provided by the iterator. Returns true if the sets have any elements in common this = this AND other Does in-place AND NOT of the bits provided by the iterator. this = this AND NOT other Flips a range of bits Lower index One-past the last bit to flip Sets a range of bits Lower index One-past the last bit to set Clears a range of bits. Lower index One-past the last bit to clear Returns true if both sets have the same bits set Builds a minimal FST (maps an term to an arbitrary output) from pre-sorted terms with outputs. The FST becomes an FSA if you use NoOutputs. The FST is written on-the-fly into a compact serialized format byte array, which can be saved to / loaded from a Directory or used directly for traversal. The FST is always finite (no cycles). NOTE: The algorithm is described at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.3698 The parameterized type is the output type. See the subclasses of . FSTs larger than 2.1GB are now possible (as of Lucene 4.2). FSTs containing more than 2.1B nodes are also now possible, however they cannot be packed. @lucene.experimental Instantiates an FST/FSA builder without any pruning. A shortcut to with pruning options turned off. Instantiates an FST/FSA builder with all the possible tuning and construction tweaks. Read parameter documentation carefully. The input type (transition labels). Can be anything from enumeration. Shorter types will consume less memory. Strings (character sequences) are represented as (full unicode codepoints). If pruning the input graph during construction, this threshold is used for telling if a node is kept or pruned. If transition_count(node) >= minSuffixCount1, the node is kept. (Note: only Mike McCandless knows what this one is really doing...) If true, the shared suffixes will be compacted into unique paths. this requires an additional RAM-intensive hash map for lookups in memory. Setting this parameter to false creates a single suffix path for all input sequences. this will result in a larger FST, but requires substantially less memory and CPU during building. Only used if is true. Set this to true to ensure FST is fully minimal, at cost of more CPU and more RAM during building. Only used if is true. Set this to to ensure FST is fully minimal, at cost of more CPU and more RAM during building. 
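Simple example of the bit set operations listed above; a sketch assuming the class is FixedBitSet with the range overloads described here.
var bits = new FixedBitSet(128);
bits.Set(3);
bits.Set(8, 16);                   // sets bits 8..15, the upper bound is exclusive
bool third = bits.Get(3);          // true
int next = bits.NextSetBit(4);     // 8
bits.Clear(8, 16);                 // clears the range again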
The output type for each input sequence. Applies only if building an FST. For FSA, use and as the singleton output object. Pass true to create a packed FST. How to trade speed for space when building the FST. this option is only relevant when doPackFST is true. Pass false to disable the array arc optimization while building the FST; this will make the resulting FST smaller but slower to traverse. How many bits wide to make each block in the ; if you know the FST will be large then make this larger. For example 15 bits = 32768 byte pages. It's OK to add the same input twice in a row with different outputs, as long as outputs impls the merge method. Note that input is fully consumed after this method is returned (so caller is free to reuse), but output is not. So if your outputs are changeable (eg or ) then you cannot reuse across calls. Returns final FST. NOTE: this will return null if nothing is accepted by the FST. LUCENENET specific type used to access nested types of without referring to its generic closing type. Expert: this is invoked by Builder whenever a suffix is serialized. Expert: holds a pending (seen but not yet serialized) arc. Expert: holds a pending (seen but not yet serialized) Node. this node's depth, starting from the automaton root. The node's depth starting from the automaton root. Needed for LUCENE-2934 (node expansion based on conditions other than the fanout size). An FST implementation where each output is a sequence of bytes. @lucene.experimental Enumerates all input () + output pairs in an FST. @lucene.experimental doFloor controls the behavior of advance: if it's true doFloor is true, advance positions to the biggest term before target. Seeks to smallest term that's >= target. Seeks to biggest term that's <= target. Seeks to exactly this term, returning null if the term doesn't exist. This is faster than using or because it short-circuits as soon the match is not found. LUCENENET specific. This class is to mimic Java's ability to specify nested classes of Generics without having to specify the generic type (i.e. BytesRefFSTEnum.InputOutput{T} rather than BytesRefFSTEnum{T}.InputOutput{T}) Holds a single input () + output pair. Pulls bytes from the provided . Absolute write byte; you must ensure dest is < max position written so far. Absolute writeBytes without changing the current position. Note: this cannot "grow" the bytes, so you must only call it on already written parts. Absolute copy bytes self to self, without changing the position. Note: this cannot "grow" the bytes, so must only call it on already written parts. Writes an at the absolute position without changing the current pointer. NOTE: This was writeInt() in Lucene Reverse from , inclusive, to , inclusive. Pos must be less than the max position written so far! i.e., you cannot "grow" the file with this! Writes all of our bytes to the target . An FST implementation where each output is a sequence of characters. @lucene.experimental Reads from a single . Represents an finite state machine (FST), using a compact format. The format is similar to what's used by Morfologik (http://sourceforge.net/projects/morfologik). See the FST package documentation for some simple examples. @lucene.experimental Load a previously saved FST. Load a previously saved FST; allows you to control the size of the pages used to hold the FST bytes. Returns bytes used to represent the FST Writes an automaton to a file. 
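Simple example of the build workflow: map three sorted terms to long ordinals and look one up. This is a sketch; the type names (Builder<T>, PositiveInt32Outputs, Int32sRef, FST.INPUT_TYPE, Util.ToInt32sRef, Util.Get) are assumed from the Lucene.NET renaming conventions noted elsewhere in this section.
var outputs = PositiveInt32Outputs.Singleton;
var builder = new Builder<long?>(FST.INPUT_TYPE.BYTE1, outputs);
var scratch = new Int32sRef();
builder.Add(Util.ToInt32sRef(new BytesRef("cat"), scratch), 5L);    // inputs must be added in sorted order
builder.Add(Util.ToInt32sRef(new BytesRef("dog"), scratch), 7L);
builder.Add(Util.ToInt32sRef(new BytesRef("dogs"), scratch), 12L);
FST<long?> fst = builder.Finish();
long? output = Util.Get(fst, new BytesRef("dog"));                  // 7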
returns true if the node at this address has any outgoing arcs Fills virtual 'start' arc, ie, an empty incoming arc to the FST's start node Follows the arc and reads the last arc of its target; this changes the provided (2nd arg) in-place and returns it. Returns the second argument (). Follow the arc and read the first arc of its target; this changes the provided (2nd arg) in-place and returns it. Returns the second argument (). Checks if arc's target state is in expanded (or vector) format. Returns true if arc points to a state in an expanded array format. In-place read; returns the arc. Peeks at next arc's label; does not alter . Do not call this if arc.IsLast! Never returns null, but you should never call this if arc.IsLast is true. Finds an arc leaving the incoming , replacing the arc in place. this returns null if the arc was not found, else the incoming . Nodes will be expanded if their depth (distance from the root node) is <= this value and their number of arcs is >= . Fixed array consumes more RAM but enables binary search on the arcs (instead of a linear scan) on lookup by arc label. true if should be stored in an expanded (array) form. Returns a for this FST, positioned at position 0. Creates a packed FST Expert: creates an FST by packing this one. This process requires substantial additional RAM (currently up to ~8 bytes per node depending on acceptableOverheadRatio), but then should produce a smaller FST. The implementation of this method uses ideas from Smaller Representation of Finite State Automata, which describes techniques to reduce the size of a FST. However, this is not a strict implementation of the algorithms described in this paper. LUCENENET specific: This new base class is to mimic Java's ability to use nested types without specifying a type parameter. i.e. FST.BytesReader instead of FST<BytesRef>.BytesReader Changed numBytesPerArc for array'd case from byte to . NOTE: This was VERSION_INT_NUM_BYTES_PER_ARC in Lucene Write BYTE2 labels as 2-byte , not v. NOTE: This was VERSION_SHORT_BYTE2_LABELS in Lucene Added optional packed format. Changed from to v for encoding arc targets. Also changed maxBytesPerArc from int to v in the array case. NOTE: This was VERSION_VINT_TARGET in Lucene Never serialized; just used to represent the virtual final node w/ no arcs: Never serialized; just used to represent the virtual non-final node w/ no arcs: If arc has this label then that arc is final/accepted returns true if the node at this address has any outgoing arcs Reads an automaton from a file. Reads bytes stored in an FST. Current read position Returns true if this reader uses reversed bytes under-the-hood. Skips bytes. Specifies allowed range of each int input label for this FST. Represents a single arc. From node (ord or address); currently only used when building an FST w/ willPackFST=true: To node (ord or address) address (into the byte[]), or ord/address if label == END_LABEL This is non-zero if current arcs are fixed array: Return this Can Next() and Advance() through the terms in an FST @lucene.experimental doFloor controls the behavior of advance: if it's true doFloor is true, advance positions to the biggest term before target. Rewinds enum state to match the shared prefix between current term and target term Seeks to smallest term that's >= target. Seeks to largest term that's <= target. Seeks to exactly target term. 
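Simple example of enumerating the pairs in an FST built as above; a sketch assuming BytesRefFSTEnum keeps the Java-style Next() that returns null at the end rather than the .NET enumerator pattern.
var it = new BytesRefFSTEnum<long?>(fst);
BytesRefFSTEnum.InputOutput<long?> pair;
while ((pair = it.Next()) != null)
{
    Console.WriteLine(pair.Input.Utf8ToString() + " -> " + pair.Output);    // cat -> 5, dog -> 7, dogs -> 12
}
var ceil = new BytesRefFSTEnum<long?>(fst).SeekCeil(new BytesRef("do"));    // smallest term >= "do", here "dog"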
Appends current arc, and then recurses from its target, appending first arc all the way to the final node. Recurses from current arc, appending last arc all the way to the first final node. An FST implementation where each output is a sequence of s. NOTE: This was IntSequenceOutputs in Lucene @lucene.experimental Enumerates all input () + output pairs in an FST. NOTE: This was IntsRefFSTEnum{T} in Lucene @lucene.experimental doFloor controls the behavior of advance: if doFloor is true, advance positions to the biggest term before target. Seeks to smallest term that's >= target. Seeks to biggest term that's <= target. Seeks to exactly this term, returning null if the term doesn't exist. This is faster than using or because it short-circuits as soon as the match is not found. LUCENENET specific. This class is to mimic Java's ability to specify nested classes of Generics without having to specify the generic type (i.e. Int32sRefFSTEnum.InputOutput{T} rather than Int32sRefFSTEnum{T}.InputOutput{T}) NOTE: This was Int32sRefFSTEnum{T} in Lucene Holds a single input () + output pair. Used to dedup states (lookup already-frozen states) hash code for an unfrozen node. this must be identical to the frozen case (below)!! hash code for a frozen node called only by rehash A null FST implementation; use this if you just want to build an FSA. @lucene.experimental NodeHash calls hashCode for this output; we fix this so we get deterministic hashing. Represents the outputs for an FST, providing the basic algebra required for building and traversing the FST. Note that any operation that returns NO_OUTPUT must return the same singleton object from . LUCENENET IMPORTANT: If is a collection type, it must implement in order to properly compare its nested values. @lucene.experimental Eg common("foobar", "food") -> "foo" Eg subtract("foobar", "foo") -> "bar" Eg add("foo", "bar") -> "foobar" Encode an output value into a . Encode a final node output value into a . By default this just calls . Decode an output value previously written with . Decode an output value previously written with . By default this just calls . NOTE: this output is compared with == so you must ensure that all methods return the single object if it's really no output. An FST implementation, holding two other outputs. @lucene.experimental Holds a single pair of two outputs. Create a new . An FST implementation where each output is a non-negative value. NOTE: This was PositiveIntOutputs in Lucene @lucene.experimental Reads in reverse from a single . Static helper methods. @lucene.experimental Looks up the output for this input, or null if the input is not accepted. Looks up the output for this input, or null if the input is not accepted. Reverse lookup (lookup by output instead of by input), in the special case when your FSTs outputs are strictly ascending. This locates the input/output pair where the output is equal to the target, and will return null if that output does not exist. NOTE: this only works with , and only works when the outputs are ascending in order with the inputs. For example, simple ordinals (0, 1, 2, ...), or file offsets (when appending to a file) fit this. Expert: like except reusing , initial and scratch Arc, and result. Represents a path in TopNSearcher. @lucene.experimental Sole constructor Compares first by the provided comparer, and then tie breaks by . Utility class to find top N shortest paths from start point(s).
Creates an unbounded TopNSearcher the to search on the number of top scoring entries to retrieve the maximum size of the queue of possible top entries the comparer to select the top N If back plus this arc is competitive then add to queue: Adds all leaving arcs, including 'finished' arc, if the node is final, from this node into the queue. Holds a single input () + output, returned by . Holds the results for a top N search using true iff this is a complete result ie. if the specified queue size was large enough to find the complete list of results. this might be false if the rejected too many results. The top results Starting from node, find the top N min cost completions to a final node. Dumps an to a GraphViz's dot language description for visualization. Example of use: using (TextWriter sw = new StreamWriter("out.dot")) { Util.ToDot(fst, sw, true, true); } and then, from command line: dot -Tpng -o out.png out.dot Note: larger FSTs (a few thousand nodes) won't even render, don't bother. If the FST is > 2.1 GB in size then this method will throw strange exceptions. See also http://www.graphviz.org/. If true, the resulting dot file will try to order states in layers of breadth-first traversal. This may mess up arcs, but makes the output FST's structure a bit clearer. If true states will have labels equal to their offsets in their binary format. Expands the graph considerably. Emit a single state in the dot language. Ensures an arc's label is indeed printable (dot uses US-ASCII). Just maps each UTF16 unit (char) to the s in an . Decodes the Unicode codepoints from the provided and places them in the provided scratch , which must not be null, returning it. Decodes the Unicode codepoints from the provided and places them in the provided scratch , which must not be null, returning it. Just takes unsigned byte values from the and converts into an . NOTE: This was toIntsRef() in Lucene Just converts to ; you must ensure the values fit into a . Reads the first arc greater or equal that the given label into the provided arc in place and returns it iff found, otherwise return null. the label to ceil on the fst to operate on the arc to follow reading the label from the arc to read into in place the fst's A that can be used to build a . @lucene.internal The bytes The length Create a with the given initial capacity. Provides support for converting byte sequences to s and back again. The resulting s preserve the original byte sequences' sort order. The s are constructed using a Base 8000h encoding of the original binary data - each char of an encoded represents a 15-bit chunk from the byte sequence. Base 8000h was chosen because it allows for all lower 15 bits of char to be used without restriction; the surrogate range [U+D8000-U+DFFF] does not represent valid chars, and would require complicated handling to avoid them and allow use of char's high bit. Although unset bits are used as padding in the final char, the original byte sequence could contain trailing bytes with no set bits (null bytes): padding is indistinguishable from valid information. To overcome this problem, a char is appended, indicating the number of encoded bytes in the final content char. @lucene.experimental Returns the number of chars required to encode the given s. Byte sequence to be encoded Initial offset into Number of bytes in The number of chars required to encode the number of s. Returns the number of chars required to encode the given s. 
sequence to be encoded Initial offset into Number of sbytes in The number of chars required to encode the number of s. Returns the number of s required to decode the given char sequence. Char sequence to be decoded Initial offset Number of characters The number of s required to decode the given char sequence Encodes the input sequence into the output char sequence. Before calling this method, ensure that the output array has sufficient capacity by calling . sequence to be encoded Initial offset into Number of bytes in sequence to store encoded result Initial offset into outputArray Length of output, must be GetEncodedLength(inputArray, inputOffset, inputLength) Encodes the input sequence into the output char sequence. Before calling this method, ensure that the output array has sufficient capacity by calling . sequence to be encoded Initial offset into Number of bytes in sequence to store encoded result Initial offset into outputArray Length of output, must be getEncodedLength Decodes the input sequence into the output sequence. Before calling this method, ensure that the output array has sufficient capacity by calling . sequence to be decoded Initial offset into Number of chars in sequence to store encoded result Initial offset into outputArray Length of output, must be GetDecodedLength(inputArray, inputOffset, inputLength) Decodes the input char sequence into the output sbyte sequence. Before calling this method, ensure that the output array has sufficient capacity by calling . sequence to be decoded Initial offset into Number of chars in sequence to store encoded result Initial offset into outputArray Length of output, must be GetDecodedLength(inputArray, inputOffset, inputLength) Debugging API for Lucene classes such as and . NOTE: Enabling infostreams may cause performance degradation in some components. @lucene.internal Instance of that does no logging at all. Prints a message Returns true if messages are enabled and should be posted to . Gets or Sets the default used by a newly instantiated classes. Disposes this Disposes this Clones this implementation based on the merge-sort algorithm that merges in place (no extra memory will be allocated). Small arrays are sorted with insertion sort. @lucene.internal Create a new Sort the slice which starts at (inclusive) and ends at (exclusive). A pool for blocks similar to . NOTE: This was IntBlockPool in Lucene @lucene.internal NOTE: This was INT_BLOCK_SHIFT in Lucene NOTE: This was INT_BLOCK_SIZE in Lucene NOTE: This was INT_BLOCK_MASK in Lucene Abstract class for allocating and freeing blocks. NOTE: This was recycleIntBlocks() in Lucene NOTE: This was getIntBlock() in Lucene A simple that never recycles. Creates a new with a default block size NOTE: This was recycleIntBlocks() in Lucene Array of buffers currently used in the pool. Buffers are allocated if needed don't modify this outside of this class. Index into the buffers array pointing to the current buffer used as the head. Pointer to the current position in head buffer NOTE: This was intUpto in Lucene Current head buffer. Current head offset. NOTE: This was intOffset in Lucene Creates a new with a default . Creates a new with the given . Resets the pool to its initial state reusing the first buffer. Calling is not needed after reset. Expert: Resets the pool to its initial state reusing the first buffer. If true the buffers are filled with 0. this should be set to true if this pool is used with . 
If true the first buffer will be reused and calling is not needed after reset if the block pool was used before ie. was called before. Advances the pool to its next buffer. This method should be called once after the constructor to initialize the pool. In contrast to the constructor a call will advance the pool to its first buffer immediately. Creates a new slice with the given starting size and returns the slices offset in the pool. An array holding the offset into the to quickly navigate to the next slice level. An array holding the level sizes for slices. The first level size for new slices. Allocates a new slice from the given offset. A that allows to write multiple integer slices into a given . @lucene.internal Writes the given value into the slice and resizes the slice if needed NOTE: This was writeInt() in Lucene Starts a new slice and returns the start offset. The returned value should be used as the start offset to initialize a . Returns the offset of the currently written slice. The returned value should be used as the end offset to initialize a once this slice is fully written or to reset the this writer if another slice needs to be written. A that can read slices written by a . @lucene.internal Creates a new on the given pool. Resets the reader to a slice give the slices absolute start and end offset in the pool. Returns true if the current slice is fully read. If this method returns true should not be called again on this slice. Reads the next from the current slice and returns it. NOTE: This was readInt() in Lucene implementation based on a variant of the quicksort algorithm called introsort: when the recursion level exceeds the log of the length of the array to sort, it falls back to heapsort. This prevents quicksort from running into its worst-case quadratic runtime. Small arrays are sorted with insertion sort. @lucene.internal Create a new . Sort the slice which starts at (inclusive) and ends at (exclusive). Save the value at slot so that it can later be used as a pivot, see . Compare the pivot with the slot at , similarly to Compare(i, j) (). Represents , as a slice (offset + length) into an existing . The member should never be null; use if necessary. NOTE: This was IntsRef in Lucene @lucene.internal An empty integer array for convenience. NOTE: This was EMPTY_INTS in Lucene The contents of the . Should never be null. NOTE: This was ints (field) in Lucene Offset of first valid integer. Length of used s. Create a with . Create a pointing to a new array of size . Offset and length will both be zero. This instance will directly reference w/o making a copy. should not be null. Returns a shallow clone of this instance (the underlying s are not copied and will be shared by both the returned object and this object. NOTE: This was intsEquals() in Lucene Signed order comparison. NOTE: This was copyInts() in Lucene Used to grow the reference array. In general this should not be used as it does not take the offset into account. @lucene.internal Creates a new that points to a copy of the s from The returned will have a length of other.Length and an offset of zero. Performs internal consistency checks. Always returns true (or throws ) This class emulates the new Java 7 "Try-With-Resources" statement. Remove once Lucene is on Java 7. @lucene.internal UTF-8 instance to prevent repeated lookups and match Java's behavior with respect to a lack of a byte-order mark (BOM). UTF-8 charset string. Where possible, use instead, as using the constant may slow things down. 
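As a rough illustration of the slice writer/reader pair described above, here is a sketch assuming the Lucene.NET 4.8 Int32BlockPool nested types and their Int32-suffixed member names (SliceWriter.WriteInt32, SliceReader.ReadInt32, the CurrentOffset and IsEndOfSlice members); treat the exact member names as assumptions:

```csharp
// Sketch: write a slice of int values into an Int32BlockPool, then read them back.
// Assumes Lucene.NET 4.8 (Lucene.Net.Util.Int32BlockPool and its nested SliceWriter/SliceReader).
using Lucene.Net.Util;

var pool = new Int32BlockPool();
pool.NextBuffer();                           // documented: call once after the constructor

var writer = new Int32BlockPool.SliceWriter(pool);
int start = writer.StartNewSlice();          // start offset, used later to reset the reader
writer.WriteInt32(7);
writer.WriteInt32(42);
int end = writer.CurrentOffset;              // end offset of the fully written slice

var reader = new Int32BlockPool.SliceReader(pool);
reader.Reset(start, end);
while (!reader.IsEndOfSlice)
{
    int value = reader.ReadInt32();          // 7, then 42
}
```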
Disposes all given IDisposables, suppressing all thrown exceptions. Some of the IDisposables may be null; they are ignored. After everything is disposed, the method either throws , if one is supplied, or the first of the suppressed exceptions, or completes normally. Sample usage: IDisposable resource1 = null, resource2 = null, resource3 = null; ExpectedException priorE = null; try { resource1 = ...; resource2 = ...; resource3 = ...; // Acquisition may throw ExpectedException ..do..stuff.. // May throw ExpectedException } catch (ExpectedException e) { priorE = e; } finally { IOUtils.CloseWhileHandlingException(priorE, resource1, resource2, resource3); } null or an exception that will be rethrown after method completion. Objects to call on. Disposes all given s, suppressing all thrown exceptions. Disposes all given s. Some of the s may be null; they are ignored. After everything is closed, the method either throws the first exception it hit while closing, or completes normally if there were no exceptions. Objects to call on Disposes all given s. Disposes all given s, suppressing all thrown exceptions. Some of the s may be null; they are ignored. Objects to call on Disposes all given s, suppressing all thrown exceptions. Disposes all given IDisposables, suppressing all thrown exceptions. Some of the IDisposables may be null; they are ignored. After everything is disposed, the method either throws , if one is supplied, or the first of the suppressed exceptions, or completes normally. Sample usage: IDisposable resource1 = null, resource2 = null, resource3 = null; ExpectedException priorE = null; try { resource1 = ...; resource2 = ...; resource3 = ...; // Acquisition may throw ExpectedException ..do..stuff.. // May throw ExpectedException } catch (ExpectedException e) { priorE = e; } finally { IOUtils.DisposeWhileHandlingException(priorE, resource1, resource2, resource3); } null or an exception that will be rethrown after method completion. Objects to call on. Disposes all given s, suppressing all thrown exceptions. Disposes all given s. Some of the s may be null; they are ignored. After everything is closed, the method either throws the first exception it hit while closing, or completes normally if there were no exceptions. Objects to call on Disposes all given s. Disposes all given s, suppressing all thrown exceptions. Some of the s may be null; they are ignored. Objects to call on Disposes all given s, suppressing all thrown exceptions. Since there's no C# equivalent of Java's Exception.AddSuppressed, we add the suppressed exceptions to a data field via the method. The exceptions can be retrieved by calling or . this exception should get the suppressed one added the suppressed exception Wrapping the given in a reader using a . Unlike Java's defaults, this reader will throw an exception if it detects that the read charset doesn't match the expected . Decoding readers are useful to load configuration files, stopword lists or synonym files to detect character set problems. However, it's not recommended as a general-purpose reader. The stream to wrap in a reader The expected charset A wrapping reader Opens a for the given using a . Unlike Java's defaults, this reader will throw an exception if it detects that the read charset doesn't match the expected . Decoding readers are useful to load configuration files, stopword lists or synonym files to detect character set problems. However, it's not recommended as a general-purpose reader. 
The file to open a reader on The expected charset A reader to read the given file Opens a for the given resource using a . Unlike Java's defaults, this reader will throw an exception if it detects that the read charset doesn't match the expected . Decoding readers are useful to load configuration files, stopword lists or synonym files to detect character set problems. However, it's not recommended as a general-purpose reader. The class used to locate the resource The resource name to load The expected charset A reader to read the given file Deletes all given files, suppressing all thrown s. Note that the files should not be null. Copy one file's contents to another file. The target will be overwritten if it exists. The source must exist. Simple utility method that takes a previously caught and rethrows either or an unchecked exception. If the argument is null then this method does nothing. Simple utility method that takes a previously caught and rethrows it as an unchecked exception. If the argument is null then this method does nothing. BitSet of fixed length (), backed by accessible () , accessed with a index. Use it only if you intend to store more than 2.1B bits, otherwise you should use . NOTE: This was LongBitSet in Lucene @lucene.internal If the given is large enough to hold , returns the given , otherwise returns a new which can hold the requested number of bits. NOTE: the returned bitset reuses the underlying of the given if possible. Also, reading on the returned bits may return a value greater than . Returns the number of 64 bit words it would take to hold . Returns the number of bits stored in this bitset. Expert. Gets the number of set bits. NOTE: this visits every long in the backing bits array, and the result is not internally cached! Returns the index of the first set bit starting at the specified. -1 is returned if there are no more set bits. Returns the index of the last set bit before or on the specified. -1 is returned if there are no more set bits. this = this OR other this = this XOR other Returns true if the sets have any elements in common this = this AND other this = this AND NOT other Flips a range of bits Lower index One-past the last bit to flip Sets a range of bits Lower index One-past the last bit to set Clears a range of bits. Lower index One-past the last bit to clear Returns true if both sets have the same bits set Represents , as a slice (offset + length) into an existing . The member should never be null; use if necessary. NOTE: This was LongsRef in Lucene @lucene.internal An empty array for convenience NOTE: This was EMPTY_LONGS in Lucene The contents of the . Should never be null. NOTE: This was longs (field) in Lucene Offset of first valid long. Length of used longs. Create a with Create a pointing to a new array of size . Offset and length will both be zero. This instance will directly reference w/o making a copy. should not be null. Returns a shallow clone of this instance (the underlying s are not copied and will be shared by both the returned object and this object). NOTE: This was longsEquals() in Lucene Signed order comparison NOTE: This was copyLongs() in Lucene Used to grow the reference array. In general this should not be used as it does not take the offset into account. @lucene.internal Creates a new that points to a copy of the s from . The returned will have a length of other.Length and an offset of zero. Performs internal consistency checks. Always returns true (or throws ) Abstraction over an array of s. 
This class extends so that we don't need to add another level of abstraction every time we want eg. to use the utility classes to represent a instance. NOTE: This was LongValues in Lucene @lucene.internal Get value at . Get value at . Helper class for keeping Lists of Objects associated with keys. WARNING: this CLASS IS NOT THREAD SAFE @lucene.internal The backing store for this object. Direct access to the map backing this object. Adds to the associated with key in the . If is not already in the map, a new will first be created. The size of the associated with key once val is added to it. Adds multiple to the associated with key in the . If is not already in the map, a new will first be created. The size of the associated with key once val is added to it. Math static utility methods. Returns x <= 0 ? 0 : Math.Floor(Math.Log(x) / Math.Log(base)). Must be > 1. Calculates logarithm in a given with doubles. Return the greatest common divisor of and , consistently with System.Numerics.BigInteger.GreatestCommonDivisor(System.Numerics.BigInteger, System.Numerics.BigInteger). NOTE: A greatest common divisor must be positive, but 2^64 cannot be expressed as a although it is the GCD of and 0 and the GCD of and . So in these 2 cases, and only them, this method will return . Calculates inverse hyperbolic sine of a value. Special cases: If the argument is NaN, then the result is NaN. If the argument is zero, then the result is a zero with the same sign as the argument. If the argument is infinite, then the result is infinity with the same sign as the argument. Calculates inverse hyperbolic cosine of a value. Special cases: If the argument is NaN, then the result is NaN. If the argument is +1, then the result is a zero. If the argument is positive infinity, then the result is positive infinity. If the argument is less than 1, then the result is NaN. Calculates inverse hyperbolic tangent of a value. Special cases: If the argument is NaN, then the result is NaN. If the argument is zero, then the result is a zero with the same sign as the argument. If the argument is +1, then the result is positive infinity. If the argument is -1, then the result is negative infinity. If the argument's absolute value is greater than 1, then the result is NaN. Provides a merged sorted view from several sorted iterators. If built with set to true and an element appears in multiple iterators then it is deduplicated, that is this iterator returns the sorted union of elements. If built with set to false then all elements in all iterators are returned. Caveats: The behavior is undefined if the iterators are not actually sorted. Null elements are unsupported. If is set to true and if a single iterator contains duplicates then they will not be deduplicated. When elements are deduplicated it is not defined which one is returned. If is set to false then the order in which duplicates are returned isn't defined. The caller is responsible for disposing the instances that are passed into the constructor, doesn't do it automatically. @lucene.internal Provides a merged sorted view from several sorted iterators. If built with set to true and an element appears in multiple iterators then it is deduplicated, that is this iterator returns the sorted union of elements. If built with set to false then all elements in all iterators are returned. Caveats: The behavior is undefined if the iterators are not actually sorted. Null elements are unsupported. 
If is set to true and if a single iterator contains duplicates then they will not be deduplicated. When elements are deduplicated it is not defined which one is returned. If is set to false then the order in which duplicates are returned isn't defined. The caller is responsible for disposing the instances that are passed into the constructor, doesn't do it automatically. @lucene.internal Extension of for live documents. Sets the bit specified by to false. index, should be non-negative and < . The result of passing negative or out of bounds values is undefined by this interface, just don't do it! Base class for all mutable values. @lucene.internal implementation of type . implementation of type . implementation of type . implementation of type . NOTE: This was MutableValueFloat in Lucene implementation of type . NOTE: This was MutableValueInt in Lucene implementation of type . NOTE: This was MutableValueLong in Lucene implementation of type . This is a helper class to generate prefix-encoded representations for numerical values and supplies converters to represent float/double values as sortable integers/longs. To quickly execute range queries in Apache Lucene, a range is divided recursively into multiple intervals for searching: the center of the range is searched only with the lowest possible precision in the trie, while the boundaries are matched more exactly. This reduces the number of terms dramatically. This class generates terms to achieve this: First the numerical integer values need to be converted to bytes. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits at a time. The resulting byte[] is sortable like the original integer value (even using UTF-8 sort order). Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding. To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: , . You will have no precision loss by converting floating point numbers to integers and back (only that the integer form is not usable). Other data types like dates can easily be converted to s or s (e.g. date to long: ). For easy usage, the trie algorithm is implemented for indexing inside that can index , , , and . For querying, and implement the query part for the same data types. This class can also be used to generate lexicographically sortable (according to ) representations of numeric data types for other usages (e.g. sorting). @lucene.internal @since 2.9, API changed non backwards-compliant in 4.0 The default precision step used by , , , , , , and . Longs are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT64+shift in the first byte NOTE: This was SHIFT_START_LONG in Lucene The maximum term length (used for buffer size) for encoding values. NOTE: This was BUF_SIZE_LONG in Lucene Integers are stored at lower precision by shifting off lower bits. The shift count is stored as SHIFT_START_INT32+shift in the first byte NOTE: This was SHIFT_START_INT in Lucene The maximum term length (used for buffer size) for encoding values. NOTE: This was BUF_SIZE_INT in Lucene Returns prefix coded bits after reducing the precision by bits. This method is used by . After encoding, bytes.Offset will always be 0. 
NOTE: This was longToPrefixCoded() in Lucene The numeric value How many bits to strip from the right Will contain the encoded value Returns prefix coded bits after reducing the precision by bits. This method is used by . After encoding, bytes.Offset will always be 0. NOTE: This was intToPrefixCoded() in Lucene The numeric value How many bits to strip from the right Will contain the encoded value Returns prefix coded bits after reducing the precision by bits. This method is used by . After encoding, bytes.Offset will always be 0. NOTE: This was longToPrefixCodedBytes() in Lucene The numeric value How many bits to strip from the right Will contain the encoded value Returns prefix coded bits after reducing the precision by bits. This method is used by . After encoding, bytes.Offset will always be 0. NOTE: This was intToPrefixCodedBytes() in Lucene The numeric value How many bits to strip from the right Will contain the encoded value Returns the shift value from a prefix encoded . NOTE: This was getPrefixCodedLongShift() in Lucene if the supplied is not correctly prefix encoded. Returns the shift value from a prefix encoded . NOTE: This was getPrefixCodedIntShift() in Lucene if the supplied is not correctly prefix encoded. Returns a from prefixCoded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value. NOTE: This was prefixCodedToLong() in Lucene if the supplied is not correctly prefix encoded. Returns an from prefixCoded bytes. Rightmost bits will be zero for lower precision codes. This method can be used to decode a term's value. NOTE: This was prefixCodedToInt() in Lucene if the supplied is not correctly prefix encoded. Converts a value to a sortable signed . The value is converted by taking its IEEE 754 floating-point "double format" bit layout and then swapping some bits, to be able to compare the result as . By this the precision is not reduced, but the value can easily be used as a . The sort order (including ) is defined by ; NaN is greater than positive infinity. NOTE: This was doubleToSortableLong() in Lucene Converts a sortable back to a . NOTE: This was sortableLongToDouble() in Lucene Converts a value to a sortable signed . The value is converted by taking its IEEE 754 floating-point "float format" bit layout and then swapping some bits, to be able to compare the result as . By this the precision is not reduced, but the value can easily be used as an . The sort order (including ) is defined by ; NaN is greater than positive infinity. NOTE: This was floatToSortableInt() in Lucene Converts a sortable back to a . NOTE: This was sortableIntToFloat() in Lucene Splits a long range recursively. You may implement a builder that adds clauses to a for each call to its method. This method is used by . NOTE: This was splitLongRange() in Lucene Splits an range recursively. You may implement a builder that adds clauses to a for each call to its method. This method is used by . NOTE: This was splitIntRange() in Lucene This helper does the splitting for both 32 and 64 bit. Helper that delegates to the correct range builder. Callback for . You need to override only one of the methods. NOTE: This was LongRangeBuilder in Lucene @lucene.internal @since 2.9, API changed non backwards-compliant in 4.0 Override this method, if you like to receive the already prefix encoded range bounds. You can directly build classical (inclusive) range queries from them. Override this method, if you like to receive the raw long range bounds. 
You can use this for e.g. debugging purposes (print out range bounds). Callback for . You need to override only one of the methods. NOTE: This was IntRangeBuilder in Lucene @lucene.internal @since 2.9, API changed non backwards-compliant in 4.0 Override this method, if you like to receive the already prefix encoded range bounds. You can directly build classical range (inclusive) queries from them. Override this method, if you like to receive the raw int range bounds. You can use this for e.g. debugging purposes (print out range bounds). Filters the given by accepting only prefix coded 64 bit terms with a shift value of 0. NOTE: This was filterPrefixCodedLongs() in Lucene The terms enum to filter A filtered that only returns prefix coded 64 bit terms with a shift value of 0. Filters the given by accepting only prefix coded 32 bit terms with a shift value of 0. NOTE: This was filterPrefixCodedInts() in Lucene The terms enum to filter A filtered that only returns prefix coded 32 bit terms with a shift value of 0. On-disk sorting of byte arrays. Each byte array (entry) is a composed of the following fields: (two bytes) length of the following byte array, exactly the above count of bytes for the sequence to be sorted. The default encoding (UTF-8 without a byte order mark) used by and . This encoding should always be used when calling the constructor overloads that accept or . The recommended buffer size to use on or when creating a and . Convenience constant for megabytes Convenience constant for gigabytes Minimum recommended buffer size for sorting. Absolute minimum required buffer size for sorting. Maximum number of temporary files before doing an intermediate merge. A bit more descriptive unit for constructors. Creates a in MB. The given values must be > 0 and < 2048. Approximately half of the currently available free heap, but no less than . However if current heap allocation is insufficient or if there is a large portion of unallocated heap-space available for sorting consult with max allowed heap size. Sort info (debugging mostly). Number of temporary files created when merging partitions Number of partition merges Number of lines of data read Time spent merging sorted partitions (in milliseconds) Time spent sorting data (in milliseconds) Total time spent (in milliseconds) Time spent in i/o read (in milliseconds) Read buffer size (in bytes) Create a new (with empty statistics) for debugging. is null. Returns a string representation of this object. Default comparer: sorts in binary (codepoint) order LUCENENET specific - cache the temp directory path so we can return it from a property. Defaults constructor. Defaults constructor with a custom comparer. All-details constructor. , or is null. bytes are less than . is less than 2. All-details constructor. , or is null. bytes are less than . is less than 2. Sort input to output, explicit hint for the buffer size. The amount of allocated memory may deviate from the hint (may be smaller or larger). The input stream. Must be both seekable and readable. The output stream. Must be seekable and writable. or is null. or is not seekable. -or- is not readable. -or- is not writable. or is not seekable. Sort input to output, explicit hint for the buffer size. The amount of allocated memory may deviate from the hint (may be smaller or larger). or is null. Returns the default temporary directory. By default, the System's temp folder. Copies one file to another. Sort a single partition in-memory. 
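A short sketch of the sortable-value and prefix-coding helpers described above (method names assume the Lucene.NET 4.8 NumericUtils API, where the Lucene long/int variants became Int64/Int32):

```csharp
// Sketch: round-trip a double through its sortable long form, and prefix-code a long.
// Assumes Lucene.NET 4.8 (Lucene.Net.Util.NumericUtils).
using Lucene.Net.Util;

double price = 9.99;

// Sortable form: ordering is preserved when the result is compared as a signed long
// (NaN sorts above positive infinity).
long sortable = NumericUtils.DoubleToSortableInt64(price);
double restored = NumericUtils.SortableInt64ToDouble(sortable);   // 9.99 again, no precision loss

// Prefix coding: strip the lowest 'shift' bits (0 here) and encode into a BytesRef term.
var bytes = new BytesRef();
NumericUtils.Int64ToPrefixCoded(sortable, 0, bytes);

// Decoding recovers the value (rightmost bits are zero for lower-precision codes).
long decoded = NumericUtils.PrefixCodedToInt64(bytes);
```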
Merge a list of sorted temporary files (partitions) into an output file. Read in a single partition of data. Utility class to emit length-prefixed entries to an output stream for sorting. Complementary to . Constructs a to the provided . is null. Constructs a to the provided . is null. Constructs a to the provided file path. is null. Constructs a to the provided . is null. Constructs a to the provided . NOTE: To match Lucene, pass the 's constructor the , which is UTF-8 without a byte order mark. is null. Writes a . is null. Writes a byte array. is null. Writes a byte array. The length is written as a , followed by the bytes. is null. or is less than 0. and refer to a position outside of the array. Disposes the provided if it is . Disposes the provided if it is . Utility class to read length-prefixed entries from an input. Complementary to . Constructs a from the provided . is null. Constructs a from the provided . is null. Constructs a from the provided . is null or whitespace. Constructs a from the provided . is null or whitespace. Constructs a from the provided . NOTE: To match Lucene, pass the 's constructor the , which is UTF-8 without a byte order mark. is null. Reads the next entry into the provided . The internal storage is resized if needed. Returns false if EOF occurred when trying to read the header of the next sequence. Returns true otherwise. If the file ends before the full sequence is read. is null. Reads the next entry and returns it if successful. Returns null if EOF occurred before the next entry could be read. If the file ends before the full sequence is read. Disposes the provided if it is . Returns the comparer in use to sort entries An "open" BitSet implementation that allows direct access to the array of words storing the bits. NOTE: This can be used in .NET any place where a java.util.BitSet is used in Java. Unlike java.util.BitSet, the fact that bits are packed into an array of longs is part of the interface. This allows efficient implementation of other algorithms by someone other than the author. It also allows one to efficiently implement alternate serialization or interchange formats. is faster than java.util.BitSet in most operations and *much* faster at calculating cardinality of sets and results of set operations. It can also handle sets of larger cardinality (up to 64 * 2**32-1) The goals of are the fastest implementation possible, and maximum code reuse. Extra safety and encapsulation may always be built on top, but if that's built in, the cost can never be removed (and hence people re-implement their own version in order to get better performance).

    Performance Results

    Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M; BitSet size = 1,000,000.
    Results are java.util.BitSet time divided by OpenBitSet time.

                  cardinality   IntersectionCount   Union   NextSetBit   Get    GetIterator
      50% full    3.36          3.96                1.44    1.46         1.99   1.58
       1% full    3.31          3.90                        1.04                0.99

    Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M; BitSet size = 1,000,000.
    Results are java.util.BitSet time divided by OpenBitSet time.

                  cardinality   IntersectionCount   Union   NextSetBit   Get    GetIterator
      50% full    2.50          3.50                1.00    1.03         1.12   1.25
       1% full    2.51          3.49                        1.00                1.02
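For orientation, a small usage sketch follows, assuming the Lucene.NET 4.8 OpenBitSet API (the Set, Get, NextSetBit and Or members shown here; long-indexed overloads exist alongside the int ones):

```csharp
// Sketch: basic OpenBitSet usage — set bits, test them, iterate, and OR two sets.
// Assumes Lucene.NET 4.8 (Lucene.Net.Util.OpenBitSet).
using Lucene.Net.Util;

var bits = new OpenBitSet(1000);    // capacity hint; Set() expands the set if necessary
bits.Set(3);
bits.Set(64);
bits.Set(900);

bool isSet = bits.Get(64);          // true

// Walk all set bits via NextSetBit (-1 signals there are no more set bits).
for (long i = bits.NextSetBit(0L); i >= 0; i = bits.NextSetBit(i + 1))
{
    // i visits 3, 64, 900
}

var other = new OpenBitSet(1000);
other.Set(5);
bits.Or(other);                     // this = this OR other; bit 5 is now set in 'bits'
```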
    Constructs an large enough to hold . Constructor: allocates enough space for 64 bits. Constructs an from an existing . The first 64 bits are in long[0], with bit index 0 at the least significant bit, and bit index 63 at the most significant. Given a bit index, the word containing it is long[index/64], and it is at bit number index%64 within that word. are the number of elements in the array that contain set bits (non-zero longs). should be <= bits.Length, and any existing words in the array at position >= numWords should be zero. This DocIdSet implementation is cacheable. Returns the current capacity in bits (1 greater than the index of the last bit). Returns the current capacity of this set. This is *not* equal to . NOTE: This is equivalent to size() or length() in Lucene. Returns true if there are no set bits Expert: returns the storing the bits. Expert: gets the number of s in the array that are in use. Returns true or false for the specified bit . Returns true or false for the specified bit . The index should be less than the . Returns true or false for the specified bit . Returns true or false for the specified bit . The index should be less than the . Returns 1 if the bit is set, 0 if not. The should be less than the . Sets a bit, expanding the set size if necessary. Sets the bit at the specified . The should be less than the . Sets the bit at the specified . The should be less than the . Sets a range of bits, expanding the set size if necessary. Lower index One-past the last bit to set Clears a bit. The should be less than the . Clears a bit. The should be less than the . Clears a bit, allowing access beyond the current set size without changing the size. Clears a range of bits. Clearing past the end does not change the size of the set. Lower index One-past the last bit to clear Clears a range of bits. Clearing past the end does not change the size of the set. Lower index One-past the last bit to clear Sets a bit and returns the previous value. The should be less than the . Sets a bit and returns the previous value. The should be less than the . Flips a bit. The should be less than the . Flips a bit. The should be less than the . Flips a bit, expanding the set size if necessary. Flips a bit and returns the resulting bit value. The should be less than the . Flips a bit and returns the resulting bit value. The should be less than the . Flips a range of bits, expanding the set size if necessary. Lower index One-past the last bit to flip Gets the number of set bits. The number of set bits. Returns the popcount or cardinality of the intersection of the two sets. Neither set is modified. Returns the popcount or cardinality of the union of the two sets. Neither set is modified. Returns the popcount or cardinality of "a and not b" or "intersection(a, not(b))". Neither set is modified. Returns the popcount or cardinality of the exclusive-or of the two sets. Neither set is modified. Returns the index of the first set bit starting at the specified. -1 is returned if there are no more set bits. Returns the index of the first set bit starting at the specified. -1 is returned if there are no more set bits. Returns the index of the first set bit starting downwards at the specified. -1 is returned if there are no more set bits. Returns the index of the first set bit starting downwards at the specified. -1 is returned if there are no more set bits. this = this AND other this = this OR other Remove all elements set in other. this = this AND_NOT other. 
this = this XOR other see see see returns true if the sets have any elements in common. Expand the with the size given as a number of words (64 bit longs). Ensure that the is big enough to hold numBits, expanding it if necessary. Lowers numWords, the number of words in use, by checking for trailing zero words. Returns the number of 64 bit words it would take to hold . Returns true if both sets have the same bits set. with added methods to bulk-update the bits from a . (DISI stands for ). Construct an with its bits set from the doc ids of the given . Also give a maximum size one larger than the largest doc id for which a bit may ever be set on this . Construct an with no bits set, and a given maximum size one larger than the largest doc id for which a bit may ever be set on this . Perform an inplace OR with the doc ids from a given , setting the bit for each such doc id. These doc ids should be smaller than the maximum size passed to the constructor. Perform an inplace AND with the doc ids from a given , leaving only the bits set for which the doc ids are in common. These doc ids should be smaller than the maximum size passed to the constructor. Perform an inplace NOT with the doc ids from a given , clearing all the bits for each such doc id. These doc ids should be smaller than the maximum size passed to the constructor. Perform an inplace XOR with the doc ids from a given , flipping all the bits for each such doc id. These doc ids should be smaller than the maximum size passed to the constructor. An iterator to iterate over set bits in an . this is faster than for iterating over the complete set of bits, especially when the density of the bits set is high. Common functionality shared by and . NOTE: This was AbstractAppendingLongBuffer in Lucene Get the number of values that have been added to the buffer. NOTE: This was size() in Lucene. Append a value to this buffer. Bulk get: read at least one and at most s starting from into arr[off:off+len] and return the actual number of values that have been read. Return an iterator over the values of this buffer. Whether or not there are remaining values. Return the next long in the buffer. Return the number of bytes used by this instance. Pack all pending values in this buffer. Subsequent calls to will fail. NOTE: This was writeVLong() in Lucene. Sole constructor. the number of values of a single block, must be a multiple of 64. Reset this writer to wrap . The block size remains unchanged. NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. Append a new long. Flush all buffered data to disk. This instance is not usable anymore after this method has been called until has been called. Return the number of values which have been added. Base implementation for and . @lucene.internal The number of values. NOTE: This was size() in Lucene. Set value at . Return the number of bytes used by this object. Create a new copy of size based on the content of this buffer. This method is much more efficient than creating a new instance and copying values one by one. Similar to . Similar to . Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for the case where values are close to each other. NOTE: This was AppendingDeltaPackedLongBuffer in Lucene @lucene.internal Create . 
The initial number of pages. The size of a single page. An acceptable overhead ratio per value. Create an with initialPageCount=16, pageSize=1024 and acceptableOverheadRatio=. Create an with initialPageCount=16, pageSize=1024. Utility class to buffer a list of signed longs in memory. This class only supports appending and is optimized for non-negative numbers with a uniform distribution over a fixed (limited) range. NOTE: This was AppendingPackedLongBuffer in Lucene @lucene.internal Initialize a . The initial number of pages. The size of a single page. An acceptable overhead ratio per value. Create an with initialPageCount=16, pageSize=1024 and acceptableOverheadRatio=. Create an with initialPageCount=16, pageSize=1024. Provides random access to a stream written with . @lucene.internal Sole constructor. Returns approximate RAM bytes used. Reader for sequences of s written with . @lucene.internal NOTE: This was readVLong() in Lucene. Sole constructor. The number of values of a block, must be equal to the block size of the which has been used to write the stream. Reset the current reader to wrap a stream of values contained in . The block size remains unchanged. Skip exactly values. Read the next value. Read between 1 and values. Return the offset of the next value to read. A writer for large sequences of longs. The sequence is divided into fixed-size blocks and for each block, the difference between each value and the minimum value of the block is encoded using as few bits as possible. Memory usage of this class is proportional to the block size. Each block has an overhead between 1 and 10 bytes to store the minimum value and the number of bits per value of the block. Format: <BLock>BlockCount BlockCount: ⌈ ValueCount / BlockSize ⌉ Block: <Header, (Ints)> Header: <Token, (MinValue)> Token: a byte (), first 7 bits are the number of bits per value (bitsPerValue). If the 8th bit is 1, then MinValue (see next) is 0, otherwise MinValue and needs to be decoded MinValue: a zigzag-encoded variable-length () whose value should be added to every int from the block to restore the original values Ints: If the number of bits per value is 0, then there is nothing to decode and all ints are equal to MinValue. Otherwise: BlockSize packed ints () encoded on exactly bitsPerValue bits per value. They are the subtraction of the original values and MinValue @lucene.internal Sole constructor. the number of values of a single block, must be a power of 2 Efficient sequential read/write of packed integers. NOTE: This was longValueCount() in Lucene. NOTE: This was longBlockCount() in Lucene. NOTE: This was writeLong() in Lucene. For every number of bits per value, there is a minimum number of blocks (b) / values (v) you need to write in order to reach the next block boundary: - 16 bits per value -> b=2, v=1 - 24 bits per value -> b=3, v=1 - 50 bits per value -> b=25, v=4 - 63 bits per value -> b=63, v=8 - ... A bulk read consists in copying iterations*v values that are contained in iterations*b blocks into a long[] (higher values of iterations are likely to yield a better throughput) => this requires n * (b + 8v) bytes of memory. This method computes iterations as ramBudget / (b + 8v) (since a long is 8 bytes). Non-specialized for . NOTE: This was longBlockCount() in Lucene. NOTE: This was longValueCount() in Lucene. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. 
Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Efficient sequential read/write of packed integers. Non-specialized for . NOTE: This was longBlockCount() in Lucene. NOTE: This was longValueCount() in Lucene. NOTE: This was readLong() in Lucene. Direct wrapping of 16-bits values to a backing array. @lucene.internal Direct wrapping of 32-bits values to a backing array. @lucene.internal Direct wrapping of 64-bits values to a backing array. @lucene.internal Direct wrapping of 8-bits values to a backing array. @lucene.internal A decoder for an . @lucene.internal NOTE: This was LOG2_LONG_SIZE in Lucene. Construct a decoder for a given . The decoding index is set to just before the first encoded value. The Elias-Fano encoder that is decoded. The number of values encoded by the encoder. The number of values encoded by the encoder. The current decoding index. The first value encoded by has index 0. Only valid directly after , , , or returned another value than , or returned true. The decoding index of the last decoded value, or as last set by . The value at the current decoding index. Only valid when would return a valid result. This is only intended for use after returned true. The value encoded at . The high value for the current decoding index. See also The low value for the current decoding index. The given shifted left by the number of low bits from by the EliasFanoSequence, logically OR-ed with the given . Set the decoding index to just before the first encoded value. The number of bits in a after ( modulo sizeof(long)). Increment and and shift so that it does not contain the high bits before . true if still smaller than . The current high long has been determined to not contain the set bit that is needed. Increment to the next high long and set accordingly. NOTE: this was toNextHighLong() in Lucene. and have just been incremented, scan to the next high set bit by incrementing , and by setting accordingly. and have just been incremented, scan to the next high set bit by incrementing , and by setting accordingly. The next encoded high value. If another value is available after the current decoding index, return this value and and increase the decoding index by 1. Otherwise return . Advance the decoding index to a given . and return true iff it is available. See also . The current implementation does not use the index on the upper bit zero bit positions. Note: there is currently no implementation of BackToIndex(). 
Given a value, advance the decoding index to the first bigger or equal value and return it if it is available. Otherwise return . The current implementation uses the index on the upper zero bit positions. Set the decoding index to just after the last encoded value. the number of bits in a long before ( modulo sizeof(long)) Decrement and and shift so that it does not contain the high bits after . true if still >= 0. The current high long has been determined to not contain the set bit that is needed. Decrement to the previous high long and set accordingly. NOTE: this was toPreviousHighLong() in Lucene. and have just been decremented, scan to the previous high set bit by decrementing and by setting accordingly. The previous encoded high value. If another value is available before the current decoding index, return this value and decrease the decoding index by 1. Otherwise return . and have just been decremented, scan backward to the high set bit of at most a given high value by decrementing and by setting accordingly. The current implementation does not use the index on the upper zero bit positions. The largest encoded high value that is at most the given one. Given a target value, go back to the first smaller or equal value and return it if it is available. Otherwise return . The current implementation does not use the index on the upper zero bit positions. A DocIdSet in Elias-Fano encoding. @lucene.internal Construct an EliasFanoDocIdSet. For efficient encoding, the parameters should be chosen as low as possible. At least the number of document ids that will be encoded. At least the highest document id that will be encoded. Provide an indication that is better to use an than a to encode document identifiers. The number of document identifiers that is to be encoded. Should be non negative. The maximum possible value for a document identifier. Should be at least . See Encode the document ids from a DocIdSetIterator. This DocIdSetIterator should provide document ids that are consistent with numValues and upperBound as provided to the constructor. Provides a to access encoded document ids. This DocIdSet implementation is cacheable. true Encode a non decreasing sequence of non negative whole numbers in the Elias-Fano encoding that was introduced in the 1970's by Peter Elias and Robert Fano. The Elias-Fano encoding is a high bits / low bits representation of a monotonically increasing sequence of numValues > 0 natural numbers x[i] 0 <= x[0] <= x[1] <= ... <= x[numValues-2] <= x[numValues-1] <= upperBound where upperBound > 0 is an upper bound on the last value. The Elias-Fano encoding uses less than half a bit per encoded number more than the smallest representation that can encode any monotone sequence with the same bounds. The lower L bits of each x[i] are stored explicitly and contiguously in the lower-bits array, with L chosen as (Log() base 2): L = max(0, floor(log(upperBound/numValues))) The upper bits are stored in the upper-bits array as a sequence of unary-coded gaps (x[-1] = 0): (x[i]/2**L) - (x[i-1]/2**L) The unary code encodes a natural number n by n 0 bits followed by a 1 bit: 0...01. In the upper bits the total the number of 1 bits is numValues and the total number of 0 bits is: floor(x[numValues-1]/2**L) <= upperBound/(2**max(0, floor(log(upperBound/numValues)))) <= 2*numValues The Elias-Fano encoding uses at most 2 + Ceil(Log(upperBound/numValues)) bits per encoded number. 
With upperBound in these bounds (p is an integer): 2**p < x[numValues-1] <= upperBound <= 2**(p+1) the number of bits per encoded number is minimized. In this implementation the values in the sequence can be given as long, numValues = 0 and upperBound = 0 are allowed, and each of the upper and lower bit arrays should fit in a long[]. An index of positions of zeros in the upper bits is also built. This implementation is based on this article: Sebastiano Vigna, "Quasi Succinct Indices", June 19, 2012, sections 3, 4 and 9. Retrieved from http://arxiv.org/pdf/1206.4300. The articles originally describing the Elias-Fano representation are: Peter Elias, "Efficient storage and retrieval by content and address of static files", J. Assoc. Comput. Mach., 21(2):246–260, 1974. Robert M. Fano, "On the number of bits required to implement an associative memory", Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., 1971. @lucene.internal NOTE: This was LOG2_LONG_SIZE in Lucene. The default index interval for zero upper bits. upperZeroBitPositionIndex[i] (filled using packValue) will contain the bit position just after the zero bit ((i+1) * indexInterval) in the upper bits. Construct an Elias-Fano encoder. After construction, call times to encode a non-decreasing sequence of non-negative numbers. The number of values that is to be encoded. At least the highest value that will be encoded. For space efficiency this should not exceed the power of two that equals or is the first higher than the actual maximum. When numValues >= (upperBound/3) a will take less space. The number of high zero bits for which a single index entry is built. The index will have at most 2 * numValues / indexInterval entries and each index entry will use at most Ceil(Log2(3 * numValues)) bits, see . when: is negative, or is non-negative and is negative, or the low bits do not fit in a long[]: (L * numValues / 64) > System.Int32.MaxValue, or the high bits do not fit in a long[]: (2 * numValues / 64) > System.Int32.MaxValue, or indexInterval < 2, the index bits do not fit in a long[]: (numValues / indexInterval * ceil(2log(3 * numValues)) / 64) > System.Int32.MaxValue. Construct an Elias-Fano encoder using . NOTE: This was numLongsForBits() in Lucene. Call at most times to encode a non-decreasing sequence of non-negative numbers. The next number to be encoded. when called more than times. when: is smaller than an earlier encoded value, or is larger than . Provide an indication that it is better to use an than a to encode document identifiers. This indication is not precise and may change in the future. An is favored when the size of the encoding by the (including some space for its index) is at most about 5/6 of the size of the ; this is the same as comparing estimates of the number of bits accessed by a pair of s and by a pair of non-indexed s when determining the intersections of the pairs. A bit set is preferred when upperBound <= 256. It is assumed that is used. The number of document identifiers that is to be encoded. Should be non-negative. The maximum possible value for a document identifier. Should be at least . Returns an to access the encoded values. Perform all calls to before calling . Expert. The low bits. Expert. The high bits. Expert. The index bits. Implements , but grows the bit count of the underlying packed ints on-demand. Beware that this class will accept negative values, but in order to do so it will grow the number of bits per value to 64. 
@lucene.internal the initial number of bits per value, may grow depending on the data the number of values an acceptable overhead ratio Utility class to buffer signed longs in memory, which is optimized for the case where the sequence is monotonic, although it can encode any sequence of arbitrary longs. It only supports appending. NOTE: This was MonotonicAppendingLongBuffer in Lucene. @lucene.internal The initial number of pages. The size of a single page. An acceptable overhead ratio per value. Create an with initialPageCount=16, pageSize=1024 and acceptableOverheadRatio=. Create an with initialPageCount=16, pageSize=1024. Provides random access to a stream written with . @lucene.internal Sole constructor. Returns the number of values. NOTE: This was size() in Lucene. Returns the approximate RAM bytes used. A writer for large monotonically increasing sequences of positive s. The sequence is divided into fixed-size blocks and for each block, values are modeled after a linear function f: x → A × x + B. The block encodes deltas from the expected values computed from this function using as few bits as possible. Each block has an overhead between 6 and 14 bytes. Format: <Block>BlockCount BlockCount: ⌈ ValueCount / BlockSize ⌉ Block: <Header, (Ints)> Header: <B, A, BitsPerValue> B: the B from f: x → A × x + B using a variable-length () A: the A from f: x → A × x + B encoded using on 4 bytes () BitsPerValue: a variable-length () Ints: if BitsPerValue is 0, then there is nothing to read and all values perfectly match the result of the function. Otherwise, these are the zigzag-encoded packed () deltas from the expected value (computed from the function) using exactly BitsPerValue bits per value. @lucene.internal Sole constructor. The number of values of a single block, must be a power of 2. Packs integers into 3 shorts (48 bits per value). @lucene.internal Space optimized random access capable array of values with a fixed number of bits/value. Values are packed contiguously. The implementation strives to perform as fast as possible under the constraint of contiguous bits, by avoiding expensive operations. This comes at the cost of code clarity. Technical details: this implementation is a refinement of a non-branching version. The non-branching get and set methods meant that 2 or 4 atomics in the underlying array were always accessed, even for the cases where only 1 or 2 were needed. Even with caching, this had a detrimental effect on performance. Related to this issue, the old implementation used lookup tables for shifts and masks, which also proved to be a bit slower than calculating the shifts and masks on the fly. See https://issues.apache.org/jira/browse/LUCENE-4062 for details. Values are stored contiguously in the blocks array. A right-aligned mask of width BitsPerValue used by . Optimization: Saves one lookup in . Creates an array with the internal structures adjusted for the given limits and initialized to 0. The number of elements. The number of bits available for any given value. Creates an array with content retrieved from the given . A , positioned at the start of Packed64-content. The number of elements. The number of bits available for any given value. If the values for the backing array could not be retrieved. The position of the value. The value at the given index. This class is similar to except that it trades space for speed by ensuring that a single block needs to be read/written in order to read/write a value. Packs integers into 3 bytes (24 bits per value). 
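As a sketch of the monotonic append-only buffer described above (assuming the Lucene.NET 4.8 class name MonotonicAppendingInt64Buffer and its Add/Freeze/Get members):

```csharp
// Sketch: buffer a monotonically increasing sequence of longs in packed form.
// Assumes Lucene.NET 4.8 (Lucene.Net.Util.Packed.MonotonicAppendingInt64Buffer).
using Lucene.Net.Util.Packed;

var buffer = new MonotonicAppendingInt64Buffer();   // initialPageCount=16, pageSize=1024 defaults

long[] docOffsets = { 0, 130, 262, 391, 523 };      // already non-decreasing
foreach (long offset in docOffsets)
{
    buffer.Add(offset);                             // append-only
}

buffer.Freeze();                                    // pack pending values; no further Add calls

long third = buffer.Get(2);                         // 262
```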
@lucene.internal A wrapper to read unaligned, variable-length packed integers. This API is much slower than the fixed-length API but can be convenient to save space. @lucene.internal Create a new instance that wraps . Read the next using exactly bits. NOTE: This was readLong() in Lucene. If there are pending bits (at most 7), they will be ignored and the next value will be read starting at the next byte. A wrapper to write unaligned, variable-length packed integers. @lucene.internal Create a new instance that wraps . Write a value using exactly bits. NOTE: This was writeLong() in Lucene. Flush pending bits to the underlying . Simplistic compression for array of unsigned long values. Each value is >= 0 and <= a specified maximum value. The values are stored as packed ints, with each value consuming a fixed number of bits. NOTE: This was PackedInts in Lucene. @lucene.internal At most 700% memory overhead, always select a direct implementation. At most 50% memory overhead, always select a reasonably fast implementation. At most 20% memory overhead. No memory overhead at all, but the returned implementation may be slow. Default amount of memory to use for bulk operations. Check the validity of a version number. Computes how many blocks are needed to store values of size . Computes how many blocks are needed to store values of size . NOTE: This was longCount() in Lucene. Tests whether the provided number of bits per value is supported by the format. Returns the overhead per value, in bits. A format to write packed s. @lucene.internal Compact format, all bits are written contiguously. A format that may insert padding bits to improve encoding and decoding speed. Since this format doesn't support all possible bits per value, you should never use it directly, but rather use to find the format that best suits your needs. Get a format according to its ID. Returns the ID of the format. Computes how many blocks are needed to store values of size . Computes how many blocks are needed to store values of size . NOTE: This was longCount() in Lucene. Tests whether the provided number of bits per value is supported by the format. Returns the overhead per value, in bits. Returns the overhead ratio (overhead per value / bits per value). Simple class that holds a format and a number of bits per value. Try to find the and number of bits per value that would restore from disk the fastest reader whose overhead is less than . The parameter makes sense for random-access s. In case you only plan to perform sequential access on this stream later on, you should probably use . If you don't know how many values you are going to write, use = -1. A decoder for packed integers. The minimum number of blocks to encode in a single iteration, when using long encoding. NOTE: This was longBlockCount() in Lucene. The number of values that can be stored in blocks. NOTE: This was longValueCount() in Lucene. The minimum number of blocks to encode in a single iteration, when using byte encoding. The number of values that can be stored in blocks. Read iterations * BlockCount blocks from , decode them and write iterations * ValueCount values into . The long blocks that hold packed integer values. The offset where to start reading blocks. The values buffer. The offset where to start writing values. Controls how much data to decode. Read 8 * iterations * BlockCount blocks from , decode them and write iterations * ValueCount values into . The long blocks that hold packed integer values. The offset where to start reading blocks. 
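The unaligned packed writer described above comes down to emitting each value with an exact number of bits and padding the final byte on flush. The following is a minimal, self-contained sketch of that idea; it is not the Lucene.NET PackedDataOutput API, and the stream-based design and names are assumptions made for illustration.

    using System.IO;

    internal sealed class BitWriterSketch
    {
        private readonly Stream output;
        private int current;     // pending bits, most-significant first
        private int pendingBits; // how many bits are buffered in 'current'

        internal BitWriterSketch(Stream output) => this.output = output;

        // Write 'value' using exactly 'bitsPerValue' bits, regardless of byte boundaries.
        internal void WriteInt64(long value, int bitsPerValue)
        {
            for (int i = bitsPerValue - 1; i >= 0; i--)
            {
                current = (current << 1) | (int)((value >> i) & 1);
                if (++pendingBits == 8)
                {
                    output.WriteByte((byte)current);
                    current = 0;
                    pendingBits = 0;
                }
            }
        }

        // Pad the last byte with zero bits, mirroring the flush behavior described above.
        internal void Flush()
        {
            if (pendingBits > 0)
            {
                output.WriteByte((byte)(current << (8 - pendingBits)));
                current = 0;
                pendingBits = 0;
            }
        }
    }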
The values buffer. The offset where to start writing values. Controls how much data to decode. Read iterations * BlockCount blocks from , decode them and write iterations * ValueCount values into . The long blocks that hold packed integer values. The offset where to start reading blocks. The values buffer. The offset where to start writing values. Controls how much data to decode. Read 8 * iterations * BlockCount blocks from , decode them and write iterations * ValueCount values into . The long blocks that hold packed integer values. The offset where to start reading blocks. The values buffer. The offset where to start writing values. Controls how much data to decode. An encoder for packed integers. The minimum number of long blocks to encode in a single iteration, when using long encoding. NOTE: This was longBlockCount() in Lucene The number of values that can be stored in long blocks. NOTE: This was longValueCount() in Lucene The minimum number of byte blocks to encode in a single iteration, when using byte encoding. The number of values that can be stored in byte blocks. Read iterations * ValueCount values from , encode them and write iterations * BlockCount blocks into . The long blocks that hold packed integer values. The offset where to start writing blocks. The values buffer. The offset where to start reading values. Controls how much data to encode. Read iterations * ValueCount values from , encode them and write 8 * iterations * BlockCount blocks into . The long blocks that hold packed integer values. The offset where to start writing blocks. The values buffer. The offset where to start reading values. Controls how much data to encode. Read iterations * ValueCount values from , encode them and write iterations * BlockCount blocks into . The long blocks that hold packed integer values. The offset where to start writing blocks. The values buffer. The offset where to start reading values. Controls how much data to encode. Read iterations * ValueCount values from , encode them and write 8 * iterations * BlockCount blocks into . The long blocks that hold packed integer values. The offset where to start writing blocks. The values buffer. The offset where to start reading values. Controls how much data to encode. A read-only random access array of positive integers. @lucene.internal Bulk get: read at least one and at most longs starting from into arr[off:off+len] and return the actual number of values that have been read. The number of bits used to store any given value. Note: this does not imply that memory usage is bitsPerValue * #values as implementations are free to use non-space-optimal packing of bits. The number of values. NOTE: This was size() in Lucene. Return the in-memory size in bytes. Expert: if the bit-width of this reader matches one of .NET's native types, returns the underlying array (ie, byte[], short[], int[], long[]); else, returns null. Note that when accessing the array you must upgrade the type (bitwise AND with all ones), to interpret the full value as unsigned. Ie, bytes[idx]&0xFF, shorts[idx]&0xFFFF, etc. Returns true if this implementation is backed by a native .NET array. Run-once iterator interface, to decode previously saved . Returns next value. Returns at least 1 and at most next values, the returned ref MUST NOT be modified. Returns number of bits per value. Returns number of values. NOTE: This was size() in Lucene. Returns the current position. A packed integer array that can be modified. @lucene.internal Set the value at the given index in the array. 
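The note above about the native backing array matters because the block types are signed in .NET; masking restores the unsigned interpretation of each stored value. A tiny hedged example (the parameter names are illustrative only):

    internal static class UnsignedBlockSketch
    {
        // Interpret a 16-bit block as an unsigned value in [0, 65535].
        internal static long AsUnsigned(short block) => block & 0xFFFFL;

        // Interpret a 32-bit block as an unsigned value in [0, 4294967295].
        internal static long AsUnsigned(int block) => block & 0xFFFFFFFFL;
    }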
Where the value should be positioned. A value conforming to the constraints set by the array. Bulk set: set at least one and at most longs starting at in into this mutable, starting at . Returns the actual number of values that have been set. Fill the mutable from (inclusive) to (exclusive) with . Sets all values to 0. Save this mutable into . Instantiating a reader from the generated data will return a reader with the same number of bits per value. The underlying format. A simple base for s that keeps track of valueCount and bitsPerValue. @lucene.internal A which has all its values equal to 0 (bitsPerValue = 0). Sole constructor. A write-once Writer. @lucene.internal The format used to serialize values. Add a value to the stream. The number of bits per value. Perform end-of-stream operations. Returns the current ord in the stream (number of values that have been written so far minus one). Get a . The format used to store packed s. The compatibility version. The number of bits per value. A decoder. Get an . The format used to store packed s. The compatibility version. The number of bits per value. An encoder. Expert: Restore a from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using . @lucene.internal The stream to read data from, positioned at the beginning of the packed values. The format used to serialize. The version used to serialize the data. How many values the stream holds. The number of bits per value. A . If there is a low-level I/O error. Expert: Restore a from a stream without reading metadata at the beginning of the stream. this method is useful to restore data when metadata has been previously read using . @lucene.internal The stream to read data from, positioned at the beginning of the packed values. Metadata result from . A . If there is a low-level I/O error. Restore a from a stream. @lucene.internal The stream to read data from. A . If there is a low-level I/O error Expert: Restore a from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using . @lucene.internal The stream to read data from, positioned at the beginning of the packed values. The format used to serialize. The version used to serialize the data. How many values the stream holds. the number of bits per value. How much memory the iterator is allowed to use to read-ahead (likely to speed up iteration). A . Retrieve as a . @lucene.internal Positioned at the beginning of a stored packed int structure. How much memory the iterator is allowed to use to read-ahead (likely to speed up iteration). An iterator to access the values. If the structure could not be retrieved. Expert: Construct a direct from a stream without reading metadata at the beginning of the stream. This method is useful to restore data from streams which have been created using . The returned reader will have very little memory overhead, but every call to is likely to perform a disk seek. @lucene.internal The stream to read data from. The format used to serialize. The version used to serialize the data. How many values the stream holds. The number of bits per value. A direct . Expert: Construct a direct from an without reading metadata at the beginning of the stream. this method is useful to restore data when metadata has been previously read using . @lucene.internal The stream to read data from, positioned at the beginning of the packed values. 
Metadata result from . A . If there is a low-level I/O error. Construct a direct from an . this method is useful to restore data from streams which have been created using . The returned reader will have very little memory overhead, but every call to is likely to perform a disk seek. @lucene.internal The stream to read data from. A direct . If there is a low-level I/O error. Create a packed integer array with the given amount of values initialized to 0. The and the cannot be changed after creation. All Mutables known by this factory are kept fully in RAM. Positive values of will trade space for speed by selecting a faster but potentially less memory-efficient implementation. An of will make sure that the most memory-efficient implementation is selected whereas will make sure that the fastest implementation is selected. @lucene.internal The number of elements. The number of bits available for any given value. An acceptable overhead ratio per value. A mutable packed integer array. Same as with a pre-computed number of bits per value and format. @lucene.internal Expert: Create a packed integer array writer for the given output, format, value count, and number of bits per value. The resulting stream will be long-aligned. this means that depending on the format which is used, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a that is a multiple of 64. This method does not write any metadata to the stream, meaning that it is your responsibility to store it somewhere else in order to be able to recover data from the stream later on: (using ), , , . It is possible to start writing values without knowing how many of them you are actually going to write. To do this, just pass -1 as . On the other hand, for any positive value of , the returned writer will make sure that you don't write more values than expected and pad the end of stream with zeros in case you have written less than when calling . The parameter lets you control how much memory can be used to buffer changes in memory before flushing to disk. High values of are likely to improve throughput. On the other hand, if speed is not that important to you, a value of 0 will use as little memory as possible and should already offer reasonable throughput. @lucene.internal The data output. The format to use to serialize the values. The number of values. The number of bits per value. How much memory (in bytes) can be used to speed up serialization. A . Create a packed integer array writer for the given output, format, value count, and number of bits per value. The resulting stream will be long-aligned. this means that depending on the format which is used under the hoods, up to 63 bits will be wasted. An easy way to make sure that no space is lost is to always use a that is a multiple of 64. This method writes metadata to the stream, so that the resulting stream is sufficient to restore a from it. You don't need to track or by yourself. In case this is a problem, you should probably look at . The parameter controls how readers that will be restored from this stream trade space for speed by selecting a faster but potentially less memory-efficient implementation. An of will make sure that the most memory-efficient implementation is selected whereas will make sure that the fastest implementation is selected. In case you are only interested in reading this stream sequentially later on, you should probably use . @lucene.internal The data output. The number of values. The number of bits per value. 
An acceptable overhead ratio per value. A . If there is a low-level I/O error. Returns how many bits are required to hold values up to and including . @lucene.internal The maximum value that should be representable. The number of bits needed to represent values from 0 to . Calculates the maximum unsigned long that can be expressed with the given number of bits. @lucene.internal The number of bits available for any given value. The maximum value for the given bits. Copy src[srcPos:srcPos+len] into dest[destPos:destPos+len] using at most bytes. Same as but using a pre-allocated buffer. Expert: reads only the metadata from a stream. This is useful to later restore a stream or open a direct reader via or . The stream to read data from. Packed integer metadata. If there is a low-level I/O error. Header identifying the structure of a packed integer array. Check that the block size is a power of 2, in the right bounds, and return its log in base 2. Return the number of blocks required to store values on . A . This class slices data into fixed-size blocks which have independent numbers of bits per value and grow on-demand. You should use this class instead of the related ones only when you need random write-access. Otherwise this class will likely be slower and less memory-efficient. @lucene.internal Create a new instance. The number of values to store. The number of values per page. The initial number of bits per value. An acceptable overhead ratio. A . This class slices data into fixed-size blocks which have the same number of bits per value. It can be a useful replacement for to store more than 2B values. @lucene.internal Create a new instance. The number of values to store. The number of values per page. The number of bits per value. An acceptable overhead ratio. Represents a logical as a series of pages. You can write-once into the logical (append only), using copy, and then retrieve slices () into it using fill. @lucene.internal Provides methods to read s from a frozen . Gets a slice out of starting at with a given length. If the slice spans across a block border this method will allocate sufficient resources and copy the paged data. Slices spanning more than two blocks are not supported. @lucene.internal Reads length as 1 or 2 byte vInt prefix, starting at . Note: this method does not support slices spanning across block borders. @lucene.internal Returns approximate RAM bytes used. 1<<blockBits must be bigger than the biggest single slice that will be pulled. Read this many bytes from . Copy in, setting out to the result. Do not use this if you will use Freeze(true). This only supports bytes.Length <= blockSize. Commits final , trimming it if necessary and if =true. Return approx RAM usage in bytes. Copy bytes in, writing the length as a 1 or 2 byte vInt prefix. Returns the current byte position. Seek to a position previously obtained from . Return the current byte position. Returns a to read values from this instance. Returns a that you may use to write into this instance. If you do this, you should not call the other writing methods (eg, copy); results are undefined. implementation based on pfor-delta encoding. This implementation is inspired by LinkedIn's Kamikaze (http://data.linkedin.com/opensource/kamikaze) and Daniel Lemire's JavaFastPFOR (https://github.com/lemire/JavaFastPFOR). Contrary to the original PFOR paper, exceptions are encoded with FOR instead of Simple16. A builder for . Sole constructor. Set the index interval. Every -th block will be stored in the index.
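The two bit-math helpers described above (the number of bits required for a given maximum value, and the maximum value expressible in a given number of bits) can be sketched as follows. This is a hedged illustration of the arithmetic only, not the exact Lucene.NET implementation.

    using System;
    using System.Numerics;

    internal static class PackedBitMathSketch
    {
        // Number of bits needed to represent values in [0, maxValue]; always at least 1.
        internal static int BitsRequired(long maxValue)
        {
            if (maxValue < 0) throw new ArgumentOutOfRangeException(nameof(maxValue));
            return Math.Max(1, 64 - BitOperations.LeadingZeroCount((ulong)maxValue));
        }

        // Largest value expressible with the given number of bits
        // (64 bits is treated as the signed maximum of the long type).
        internal static long MaxValue(int bitsPerValue)
        {
            return bitsPerValue == 64 ? long.MaxValue : (1L << bitsPerValue) - 1;
        }
    }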
Set to to disable indexing. Add a document to this builder. Documents must be added in order. Convenience method to add the content of a to this builder. Build the instance. Gets the number of documents in this in constant time. Return the memory usage of this instance. LUCENENET specific stub to assist with migration to . implementation over a such as . NOTE: This is analogous to PrintStreamInfoStream in Lucene. @lucene.internal Provides the sentinel instances of to a . This interface can be implemented to provide sentinel instances. The implementation of this interface can be passed to a constructor of or . If an instance is provided, the either constructor will use the method to fill the queue, so that code which uses the queue can always assume it is full and only change the top without attempting to insert any new items. Those sentinel values should always compare worse than any non-sentinel value (i.e., , , or should always favor the non-sentinel values). When using a , the following usage pattern is recommended: // Implements ISentinelFactory<T>.Create(PriorityQueue<T>) var sentinelFactory = new MySentinelFactory<MyObject>(); PriorityQueue<MyObject> pq = new MyQueue<MyObject>(sentinelFactory); // save the 'top' element, which is guaranteed to not be default. MyObject pqTop = pq.Top; <...> // now in order to add a new element, which is 'better' than top (after // you've verified it is better), it is as simple as: pqTop.Change(); pqTop = pq.UpdateTop(); NOTE: will be called by the or constructor or times, relying on a new instance to be returned. Therefore you should ensure any call to this creates a new instance and behaves consistently, e.g., it cannot return null if it previously returned non-null. The type of sentinel instance to create. Creates a sentinel instance of to fill an element of a . A newly created sentinel instance for use in a single element of . An adapter comparer for use with . This comparer makes it easy to convert -derived subclasses into comparers that can be used with . NOTE: This adapter converts the 2-state method to the 3-state by always assuming if returns false that the left node is greater than the right node. This means nodes will never compare equal, even if they actually are. Therefore, it may not work correctly for most applications of , but will work correctly with . The type of item to compare. This may be either a value type or a reference type. Determines the ordering of objects in this priority queue. Subclasses must define this one method. The left value to compare. The right value to compare. true if parameter is less than parameter ; otherwise false. A maintains a partial ordering of its elements such that the element with least priority can always be found in constant time. Put()'s and Pop()'s require log(size) time. This ref struct has the same behavior as , but can be used to reduce allocations in scenarios where ref structs are allowed. If the type of is a struct the passed in buffer may also be allocated on the stack, as allowed. NOTE: The allocation size of the supplied buffer must be maxSize+1. The method can be used to get the correct calculation and to validate bounds of maxSize. If instantiated with a constructor that accepts a , that factory will be used to populate the elements of the array. NOTE: The type of may be either a class or a value type. But do note that it was specifically designed with nullable types in mind, so there may be cases where you need to use a nullable value type to get it to behave correctly. 
@lucene.internal Initializes a new instance of with the specified and . The buffer to use for priority queue operations. To determine the correct size of buffer to allocate, use the method. An implementation that is used to determine the order of items in the priority queue. Note that if the comparer returns 0, the value is treated no differently than if it were 1 (greater than). is null. Initializes a new instance of with the specified and . The buffer to use for priority queue operations. To determine the correct size of buffer to allocate, use the method. An implementation that is used to determine the order of items in the priority queue. Note that if the comparer returns 0, the value is treated no differently than if it were 1 (greater than). If not null, the queue will be pre-populated. This factory will be called .Length - 1 times to get an instance to provide to each element in the queue. should return a new instance on each call, but each sentinel instance should consistently represent the same value. is null. Returns the underlying storage of the queue. Gets the comparer used to determine the order of objects in this priority queue. Determines the ordering of objects in this priority queue. true if parameter is less than parameter ; otherwise, false. Adds an Object to a in log(size) time. If one tries to add more objects than from initialize and it is not possible to resize the heap, an is thrown. The new 'top' element in the queue. Adds an Object to a in log(size) time. If the given is smaller than the full heap's minimum, it won't be added. Adds an Object to a in log(size) time. It returns the object (if any) that was dropped off the heap because it was full. This can be the given parameter (in case it is smaller than the full heap's minimum, and couldn't be added), or another object that was previously the smallest value in the heap and now has been replaced by a larger one, or null if the queue wasn't yet full with elements. Returns the least element of the in constant time. Returns null if the queue is empty. Removes and returns the least element of the in log(size) time. Should be called when the Object at top changes values. Still log(n) worst case, but it's at least twice as fast to pq.Top.Change(); pq.UpdateTop(); instead of o = pq.Pop(); o.Change(); pq.Push(o); The new 'top' element. Returns the number of elements currently stored in the . NOTE: This was size() in Lucene. Removes all entries from the . LUCENENET specific static methods for managing priority queues. Gets the heap size for the specified of a priority queue and checks to ensure it is within bounds. Heap size is 1 greater than , since priority queue is 1-based rather than 0-based. The maximum length of the priority queue. The heap size to allocate for the priority queue. is less than 0. -or- is greater than . A maintains a partial ordering of its elements such that the element with least priority can always be found in constant time. Put()'s and Pop()'s require log(size) time. NOTE: this class will pre-allocate a full array of length maxSize+1 if instantiated with a constructor that accepts a . That maximum size can grow as we insert elements over time. NOTE: The type of may be either a class or a value type. But do note that it was specifically designed with nullable types in mind, so there may be cases where you need to use a nullable value type to get it to behave correctly. @lucene.internal Initializes a new instance of with the specified . The maximum number of elements this queue can hold.
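As a hedged illustration of the subclassing pattern described around here, the sketch below defines a small queue ordered by score and notes the Top/UpdateTop idiom. The Hit type is hypothetical, and the exact signature and accessibility of the LessThan override may differ between Lucene.NET versions.

    using Lucene.Net.Util;

    internal sealed class Hit
    {
        public int Doc;
        public float Score;
    }

    // Least element = lowest score, so the queue retains the N highest-scoring hits.
    internal sealed class HitQueue : PriorityQueue<Hit>
    {
        public HitQueue(int maxSize) : base(maxSize) { }

        protected internal override bool LessThan(Hit a, Hit b) => a.Score < b.Score;
    }

    // Usage following the Top/UpdateTop pattern described above:
    //   var pq = new HitQueue(10);
    //   ... fill with pq.Add(hit), then update the worst entry in place:
    //   pq.Top.Score = newScore; pq.UpdateTop();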
Initializes a new instance of with the specified and . The maximum number of elements this queue can hold. If not null, the queue will be pre-populated. This factory will be called times to get an instance to provide to each element in the queue. should return a new instance on each call, but each sentinel instance should consistently represent the same value. Determines the ordering of objects in this priority queue. Subclasses must define this one method. true if parameter is less than parameter ; otherwise, false. Adds an Object to a in log(size) time. If one tries to add more objects than from initialize and it is not possible to resize the heap, an is thrown. The new 'top' element in the queue. Adds an Object to a in log(size) time. If the given is smaller than the full heap's minimum, it won't be added. Adds an Object to a in log(size) time. It returns the object (if any) that was dropped off the heap because it was full. This can be the given parameter (in case it is smaller than the full heap's minimum, and couldn't be added), or another object that was previously the smallest value in the heap and now has been replaced by a larger one, or null if the queue wasn't yet full with elements. Returns the least element of the in constant time. Returns null if the queue is empty. Removes and returns the least element of the in log(size) time. Should be called when the Object at top changes values. Still log(n) worst case, but it's at least twice as fast to pq.Top.Change(); pq.UpdateTop(); instead of o = pq.Pop(); o.Change(); pq.Push(o); The new 'top' element. Returns the number of elements currently stored in the . NOTE: This was size() in Lucene. Removes all entries from the . Gets the internal heap array as T[]. @lucene.internal Creates queries from the chain. Example usage: QueryBuilder builder = new QueryBuilder(analyzer); Query a = builder.CreateBooleanQuery("body", "just a test"); Query b = builder.CreatePhraseQuery("body", "another test"); Query c = builder.CreateMinShouldMatchQuery("body", "another test", 0.5f); This can also be used as a subclass for query parsers to make it easier to interact with the analysis chain. Factory methods such as are provided so that the generated queries can be customized. Creates a new using the given analyzer. Creates a boolean query from the query text. This is equivalent to CreateBooleanQuery(field, queryText, Occur.SHOULD) Field name. Text to be passed to the analyzer. or , based on the analysis of . Creates a boolean query from the query text. Field name. Text to be passed to the analyzer. Operator used for clauses between analyzer tokens. or , based on the analysis of . Creates a phrase query from the query text. This is equivalent to CreatePhraseQuery(field, queryText, 0) Field name. Text to be passed to the analyzer. , , , or , based on the analysis of . Creates a phrase query from the query text. Field name. Text to be passed to the analyzer. number of other words permitted between words in query phrase , , , or , based on the analysis of . Creates a minimum-should-match query from the query text. Field name. Text to be passed to the analyzer. of query terms [0..1] that should match or , based on the analysis of . Gets or Sets the analyzer. Gets or Sets whether position increments are enabled. When true, result phrase and multi-phrase queries will be aware of position increments. Useful when e.g. a StopFilter increases the position increment of the token that follows an omitted token. Default: true. Creates a query from the analysis chain.
Expert: this is more useful for subclasses such as queryparsers. If using this class directly, just use and . Analyzer used for this query. Default boolean operator used for this query. Field to create queries against. Text to be passed to the analysis chain. true if phrases should be generated when terms occur at more than one position. Slop factor for phrase/multiphrase queries. Builds a new instance. This is intended for subclasses that wish to customize the generated queries. Disable coord. New instance. Builds a new instance. This is intended for subclasses that wish to customize the generated queries. Term. New instance. Builds a new instance. This is intended for subclasses that wish to customize the generated queries. New instance. Builds a new instance. This is intended for subclasses that wish to customize the generated queries. New instance. Estimates the size (memory representation) of .NET objects. @lucene.internal One kilobyte bytes. One megabyte bytes. One gigabyte bytes. No instantiation. NOTE: This was NUM_BYTES_SHORT in Lucene NOTE: This was NUM_BYTES_INT in Lucene NOTE: This was NUM_BYTES_FLOAT in Lucene NOTE: This was NUM_BYTES_LONG in Lucene Number of bytes this .NET runtime uses to represent an object reference. Number of bytes to represent an object header (no fields, no alignments). Number of bytes to represent an array header (no content, but with alignments). A constant specifying the object alignment boundary inside the .NET runtime. Objects will always take a full multiple of this constant, possibly wasting some space. Sizes of primitive classes. Cached information about a given class. Aligns an object size to be the next multiple of . Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Returns the size in bytes of the object. Estimates the RAM usage by the given object. It will walk the object tree and sum up all referenced objects. Resource Usage: this method internally uses a set of every object seen during traversals so it does allocate memory (it isn't side-effect free). After the method exits, this memory should be GCed. Estimates a "shallow" memory usage of the given object. For arrays, this will be the memory taken by array storage (no subreferences will be followed). For objects, this will be the memory taken by the fields. .NET object alignments are also applied. Returns the shallow instance size in bytes an instance of the given class would occupy. This works with all conventional classes and primitive types, but not with arrays (the size then depends on the number of elements and varies from object to object). if is an array class. Return shallow size of any . Create a cached information about shallow size and reference fields for a given class. This method returns the maximum representation size of an object. is the object's size measured so far. is the field being probed. The returned offset will be the maximum of whatever was measured so far and field's offset and representation size (unaligned). Returns size in human-readable units (GB, MB, KB or bytes). Returns size in human-readable units (GB, MB, KB or bytes). 
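As a hedged usage sketch of the estimator described above: the member names below (SizeOf, ShallowSizeOf, HumanReadableUnits) are assumed to mirror the Lucene 4.8 API and may differ in your version of Lucene.NET.

    using System;
    using Lucene.Net.Util;

    internal static class RamUsageDemo
    {
        internal static void Main()
        {
            long[] data = new long[1024];

            long deep = RamUsageEstimator.SizeOf(data);           // walks the object graph
            long shallow = RamUsageEstimator.ShallowSizeOf(data); // header + field/array storage only

            Console.WriteLine(RamUsageEstimator.HumanReadableUnits(deep));
            Console.WriteLine(RamUsageEstimator.HumanReadableUnits(shallow));
        }
    }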
Return a human-readable size of a given object. Return a human-readable size of a given object. An identity hash set implemented using open addressing. No null keys are allowed. TODO: If this is useful outside this class, make it public - needs some work Default load factor. Minimum capacity for the set. All of set entries. Always of power of two length. Cached number of assigned slots. The load factor for this set (fraction of allocated or deleted slots before the buffers must be rehashed or reallocated). Cached capacity threshold at which we must resize the buffers. Creates a hash set with the default capacity of 16, load factor of . Creates a hash set with the given capacity, load factor of . Creates a hash set with the given capacity and load factor. Adds a reference to the set. Null keys are not allowed. Checks if the set contains a given ref. Rehash via MurmurHash. The implementation is based on the finalization step from Austin Appleby's MurmurHash3. See http://sites.google.com/site/murmurhash/. Expand the internal storage buffers (capacity) or rehash current keys and values if there are a lot of deleted slots. Allocate internal buffers for a given . New capacity (must be a power of two). Return the next possible capacity, counting from the current buffers' size. Round the capacity to the next allowed value. A implementation that recycles unused byte blocks in a buffer and reuses them in subsequent calls to . Note: this class is not thread-safe. @lucene.internal Creates a new The block size in bytes. Maximum number of buffered byte block. reference counting internally allocated bytes. Creates a new . The block size in bytes. Maximum number of buffered byte block. Creates a new with a block size of , upper buffered docs limit of (64). The number of currently buffered blocks. The number of bytes currently allocated by this . The maximum number of buffered byte blocks. Removes the given number of byte blocks from the buffer if possible. The number of byte blocks to remove. The number of actually removed buffers. A implementation that recycles unused blocks in a buffer and reuses them in subsequent calls to . Note: this class is not thread-safe. NOTE: This was RecyclingIntBlockAllocator in Lucene @lucene.internal Creates a new . The block size in bytes. Maximum number of buffered int block. reference counting internally allocated bytes. Creates a new . The size of each block returned by this allocator. Maximum number of buffered int blocks. Creates a new with a block size of , upper buffered docs limit of . NOTE: This was getIntBlock() in Lucene NOTE: This was recycleIntBlocks in Lucene The number of currently buffered blocks. The number of bytes currently allocated by this . The maximum number of buffered byte blocks. Removes the given number of int blocks from the buffer if possible. The number of int blocks to remove. The number of actually removed buffers. Manages reference counting for a given object. Extensions can override to do custom logic when reference counting hits 0. Called when reference counting hits 0. By default this method does nothing, but extensions can override to e.g. release resources attached to object that is managed by this class. Decrements the reference counting of this object. When reference counting hits 0, calls . Returns the current reference count. Increments the reference count. Calls to this method must be matched with calls to . 
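The identity hash set above rehashes keys with the finalization step of MurmurHash3; that step is well known and can be sketched as a standalone helper (this is not the internal method itself).

    internal static class MurmurFinalizerSketch
    {
        // MurmurHash3 "fmix32" finalization: scrambles the bits of a 32-bit key so that
        // poor hash inputs still spread across the power-of-two-sized table.
        internal static int Rehash(int key)
        {
            uint k = (uint)key;
            k ^= k >> 16;
            k *= 0x85EBCA6B;
            k ^= k >> 13;
            k *= 0xC2B2AE35;
            k ^= k >> 16;
            return (int)k;
        }
    }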
Implement to reset an instance LUCENENET specific interface to allow overriding rolling buffer item creation without having to call virtual methods from the constructor LUCENENET specific class that provides default implementation for . Acts like forever growing , but internally uses a circular buffer to reuse instances of . @lucene.internal Get instance for this absolute position; This is allowed to be arbitrarily far "in the future" but cannot be before the last . Returns the maximum position looked up, or -1 if no position has been looked up since /init. A native hash-based set where one value is reserved to mean "EMPTY" internally. The space overhead is fairly low as there is only one power-of-two sized to hold the values. The set is re-hashed when adding a value that would make it >= 75% full. Consider extending and over-riding if the values might be poor hash keys; Lucene docids should be fine. The internal fields are exposed publicly to enable more efficient use at the expense of better O-O principles. To iterate over the integers held in this set, simply use code like this: SentinelIntSet set = ... foreach (int v in set.keys) { if (v == set.EmptyVal) continue; //use v... } NOTE: This was SentinelIntSet in Lucene If you need to extend this class and subclass it, keep in mind that constructor calls a private "ClearInternal" method and not virtual Clear. So if you need to do some specific initialization in subclass constructor, call your own private method with whatever custom initialization you need. @lucene.internal A power-of-2 over-sized array holding the integers in the set along with empty values. The number of integers in this set. The count at which a rehash should be done. The minimum number of elements this set should be able to hold without rehashing (i.e. the slots are guaranteed not to change). The integer value to use for EMPTY. NOTE: When overriding this method, be aware that the constructor of this class calls a private method and not this virtual method. So if you need to override the behavior during the initialization, call your own private method from the constructor with whatever custom behavior you need. (internal) Return the hash for the key. The default implementation just returns the key, which is not appropriate for general purpose use. (internal) Returns the slot for this key. (internal) Returns the slot for this key, or -slot-1 if not found. Does this set contain the specified integer? Puts this integer (key) in the set, and returns the slot index it was added to. It rehashes if adding it would make the set more than 75% full. (internal) Rehashes by doubling key () and filling with the old values. A convenient class which offers a semi-immutable object wrapper implementation which allows one to set the value of an object exactly once, and retrieve it many times. If is called more than once, is thrown and the operation will fail. @lucene.experimental A default constructor which does not set the internal object, and allows setting it by calling . Creates a new instance with the internal object set to the given object. Note that any calls to afterwards will result in if called more than once Sets the given object. If the object has already been set, an exception is thrown. Returns the object set by . Thrown when is called more than once. Initializes a new instance of . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. 
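As a hedged usage sketch of the set-once wrapper described above (the Set/Get member names are assumed to mirror the Java API and may differ in your version):

    using Lucene.Net.Util;

    internal static class SetOnceDemo
    {
        internal static void Main()
        {
            var configuredName = new SetOnce<string>();

            configuredName.Set("index-a");      // allowed exactly once
            string name = configuredName.Get(); // can be read any number of times

            // configuredName.Set("index-b");   // a second call would throw AlreadySetException
        }
    }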
The that contains contextual information about the source or destination. Math functions that trade off accuracy for speed. Returns the distance in kilometers between two points specified in decimal degrees (latitude/longitude). Latitude of the first point. Longitude of the first point. Latitude of the second point. Longitude of the second point. distance in kilometers. Returns the trigonometric cosine of an angle. Error is around 1E-15. Special cases: If the argument is or an infinity, then the result is . An angle, in radians. The cosine of the argument. Returns the arc sine of a value. The returned angle is in the range -pi/2 through pi/2. Error is around 1E-7. Special cases: If the argument is or its absolute value is greater than 1, then the result is . the value whose arc sine is to be returned. arc sine of the argument Return an approximate value of the diameter of the earth at the given latitude, in kilometers. Initializes look-up tables. Floating point numbers smaller than 32 bits. NOTE: This was SmallFloat in Lucene @lucene.internal Converts a 32 bit to an 8 bit . Values less than zero are all mapped to zero. Values are truncated (rounded down) to the nearest 8 bit value. Values between zero and the smallest representable value are rounded up. The 32 bit to be converted to an 8 bit (). The number of mantissa bits to use in the byte, with the remainder to be used in the exponent. The zero-point in the range of exponent values. The 8 bit float representation. Converts a 32 bit to an 8 bit . Values less than zero are all mapped to zero. Values are truncated (rounded down) to the nearest 8 bit value. Values between zero and the smallest representable value are rounded up. NOTE: This was floatToByte() in Lucene The 32 bit to be converted to an 8 bit (). The number of mantissa bits to use in the byte, with the remainder to be used in the exponent. The zero-point in the range of exponent values. The 8 bit float representation. Converts an 8 bit to a 32 bit . NOTE: This was byteToFloat() in Lucene Converts an 8 bit to a 32 bit . NOTE: This was byteToFloat() in Lucene SingleToSByte((byte)b, mantissaBits=3, zeroExponent=15) smallest non-zero value = 5.820766E-10 largest value = 7.5161928E9 epsilon = 0.125 NOTE: This was floatToByte315() in Lucene SingleToSByte(b, mantissaBits=3, zeroExponent=15) smallest non-zero value = 5.820766E-10 largest value = 7.5161928E9 epsilon = 0.125 NOTE: This was floatToByte315() in Lucene ByteToSingle(b, mantissaBits=3, zeroExponent=15) NOTE: This was byte315ToFloat() in Lucene SByteToSingle(b, mantissaBits=3, zeroExponent=15) NOTE: This was byte315ToFloat() in Lucene SingleToByte(b, mantissaBits=5, zeroExponent=2) smallest nonzero value = 0.033203125 largest value = 1984.0 epsilon = 0.03125 NOTE: This was floatToByte52() in Lucene SingleToSByte(b, mantissaBits=5, zeroExponent=2) smallest nonzero value = 0.033203125 largest value = 1984.0 epsilon = 0.03125 NOTE: This was floatToByte52() in Lucene ByteToFloat(b, mantissaBits=5, zeroExponent=2) NOTE: This was byte52ToFloat() in Lucene SByteToFloat(b, mantissaBits=5, zeroExponent=2) NOTE: This was byte52ToFloat() in Lucene Base class for sorting algorithms implementations. @lucene.internal Sole constructor, used for inheritance. Compare entries found in slots and . The contract for the returned value is the same as . Swap values at slots and . Sort the slice which starts at (inclusive) and ends at (exclusive). Helper class for loading SPI classes from classpath (META-INF files). 
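Relating to the SloppyMath distance helper described at the start of this passage, a hedged usage sketch follows; the method name Haversin and its degree-based, kilometer-returning signature are assumptions based on the Lucene 4.8 API and may differ.

    using System;
    using Lucene.Net.Util;

    internal static class SloppyMathDemo
    {
        internal static void Main()
        {
            // Approximate great-circle distance between Paris and London, in kilometers.
            double km = SloppyMath.Haversin(48.8566, 2.3522, 51.5074, -0.1278);
            Console.WriteLine(km);
        }
    }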
This is a light impl of java.util.ServiceLoader but is guaranteed to be bug-free regarding classpath order and does not instantiate or initialize the classes found. @lucene.internal Methods for manipulating strings. @lucene.internal Pass this as the seed to . Compares two , element by element, and returns the number of elements common to both arrays. The first to compare. The second to compare. The number of common elements. Returns a over versioned strings such as X.YY.Z @lucene.internal Returns true if the starts with the given . Otherwise false. The to test. The expected prefix. Returns true if the starts with the given . Otherwise false. Returns true if the ends with the given . Otherwise false. The to test. The expected suffix. Returns true if the ends with the given . Otherwise false. Returns the MurmurHash3_x86_32 hash. Original source/tests at https://github.com/yonik/java_util/. Returns the MurmurHash3_x86_32 hash. Original source/tests at https://github.com/yonik/java_util/. Thrown by Lucene on detecting that had been called. This exception has the specific purpose of being allowed to pass through to the calling thread of so it reaches the appropriate handler. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. implementation based on the TimSort algorithm. This implementation is especially good at sorting partially-sorted arrays and sorts small arrays with binary sort. NOTE: There are a few differences with the original implementation: The extra amount of memory to perform merges is configurable. This allows small merges to be very fast while large merges will be performed in-place (slightly slower). You can make sure that the fast merge routine will always be used by having maxTempSlots equal to half of the length of the slice of data to sort. Only the fast merge routine can gallop (the one that doesn't run in-place) and it only gallops on the longest slice. @lucene.internal Create a new . The maximum amount of extra memory to run merges. Minimum run length for an array of length . Compute the length of the next run, make the run sorted and return its length. Sort the slice which starts at (inclusive) and ends at (exclusive). Copy data from slot to slot . Save all elements between slots and + into the temporary storage. Restore element from the temporary storage into slot . Compare element from the temporary storage with element from the slice to sort, similarly to . Helper methods to ease implementing . For printing boost only if not 1.0. NOTE: This was longHex() in Lucene. Class to encode .NET's UTF16 into UTF8 without always allocating a new as of does. @lucene.internal A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms (e.g. collation keys) one would normally encounter, and definitely bigger than any UTF-8 terms. WARNING: this is not a valid UTF8 Term. Encode characters from a , starting at and ending at . After encoding, result.Offset will always be 0. is null. Encode characters from a , starting at for chars. After encoding, result.Offset will always be 0. or is null. or is less than zero. -or- and refer to a location outside of . Encode characters from this , starting at for characters. After encoding, result.Offset will always be 0. or is null. or is less than zero. -or- and refer to a location outside of . Encode characters from this , starting at for characters.
After encoding, result.Offset will always be 0. LUCENENET specific. or is null. or is less than zero. -or- and refer to a location outside of . Returns the number of code points in this UTF8 sequence. This method assumes valid UTF8 input. This method does not perform full UTF8 validation, it will check only the first byte of each codepoint (for multi-byte sequences any bytes after the head are skipped). If invalid codepoint header byte occurs or the content is prematurely truncated. This method assumes valid UTF8 input. This method does not perform full UTF8 validation, it will check only the first byte of each codepoint (for multi-byte sequences any bytes after the head are skipped). If invalid codepoint header byte occurs or the content is prematurely truncated. Shift value for lead surrogate to form a supplementary character. Mask to retrieve the significant value from a trail surrogate. Trail surrogate minimum value. Lead surrogate minimum value. The minimum value for Supplementary code points. Value that all lead surrogates start with. Cover JDK 1.5 API. Create a String from an array of . The code array. The start of the text in the code point array. The number of code points. a String representing the code points between offset and count. If an invalid code point is encountered. If the offset or count are out of bounds. Generates char array that represents the provided input code points. LUCENENET specific. The code array. The start of the text in the code point array. The number of code points. a char array representing the code points between offset and count. Interprets the given byte array as UTF-8 and converts to UTF-16. The will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 codepoint. NOTE: Full characters are read, even if this reads past the length passed (and can result in an if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed. Utility method for Use by certain classes to match version compatibility across releases of Lucene. WARNING: When changing the version parameter that you supply to components in Lucene, do not simply change the version at search-time, but instead also adjust your indexing code to match, and re-index. Match settings and bugs in Lucene's 3.0 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.1 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.2 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.3 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.4 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.5 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 3.6 release. @deprecated (4.0) Use latest Match settings and bugs in Lucene's 4.0 release. @deprecated (4.1) Use latest Match settings and bugs in Lucene's 4.1 release. @deprecated (4.2) Use latest Match settings and bugs in Lucene's 4.2 release. @deprecated (4.3) Use latest Match settings and bugs in Lucene's 4.3 release. @deprecated (4.4) Use latest Match settings and bugs in Lucene's 4.4 release. @deprecated (4.5) Use latest Match settings and bugs in Lucene's 4.5 release. @deprecated (4.6) Use latest Match settings and bugs in Lucene's 4.6 release. @deprecated (4.7) Use latest Match settings and bugs in Lucene's 4.7 release. @deprecated (4.8) Use latest Match settings and bugs in Lucene's 4.8 release. Use this to get the latest & greatest settings, bug fixes, etc, for Lucene.
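Relating to the UTF-8 code point counting described earlier above (only the lead byte of each sequence is inspected), here is a hedged, self-contained sketch of the idea; it is not the Lucene.NET implementation and performs no validation beyond classifying the lead byte.

    using System;

    internal static class Utf8Sketch
    {
        // Counts code points by classifying each lead byte and skipping continuation bytes.
        internal static int CodePointCount(byte[] utf8, int offset, int length)
        {
            int count = 0;
            for (int i = offset; i < offset + length; count++)
            {
                byte lead = utf8[i];
                if (lead < 0x80) i += 1;      // 0xxxxxxx: 1-byte sequence (ASCII)
                else if (lead < 0xC0) throw new ArgumentException("Unexpected continuation byte.");
                else if (lead < 0xE0) i += 2; // 110xxxxx: 2-byte sequence
                else if (lead < 0xF0) i += 3; // 1110xxxx: 3-byte sequence
                else i += 4;                  // 11110xxx: 4-byte sequence
            }
            return count;
        }
    }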
WARNING: if you use this setting, and then upgrade to a newer release of Lucene, sizable changes may happen. If backwards compatibility is important then you should instead explicitly specify an actual version. If you use this constant then you may need to re-index all of your documents when upgrading Lucene, as the way text is indexed may have changed. Additionally, you may need to re-test your entire application to ensure it behaves as expected, as some defaults may have changed and may break functionality in your application. @deprecated Use an actual version instead. Extension methods to the enumeration to provide version comparison and parsing functionality. A utility for keeping backwards compatibility on previously abstract methods (or similar replacements). Before the replacement method can be made abstract, the old method must be kept deprecated. If somebody still overrides the deprecated method in a non-sealed class, you must keep track of this and maybe delegate to the old method in the subclass. The cost of reflection is minimized by the following usage of this class: Define static readonly fields in the base class (BaseClass), where the old and new method are declared: internal static readonly VirtualMethod newMethod = new VirtualMethod(typeof(BaseClass), "newName", parameters...); internal static readonly VirtualMethod oldMethod = new VirtualMethod(typeof(BaseClass), "oldName", parameters...); this enforces the singleton status of these objects, as the maintenance of the cache would be too costly otherwise. If you try to create a second instance of for the same method/baseClass combination, an exception is thrown. To detect if e.g. the old method was overridden by a more distant subclass on the inheritance path to the current instance's class, use a non-static field: bool isDeprecatedMethodOverridden = oldMethod.GetImplementationDistance(this.GetType()) > newMethod.GetImplementationDistance(this.GetType()); // alternatively (more readable): bool isDeprecatedMethodOverridden = VirtualMethod.CompareImplementationDistance(this.GetType(), oldMethod, newMethod) > 0 returns the distance of the subclass that overrides this method. The one with the larger distance should preferably be used. This way, more complicated method rename scenarios can also be handled (think of 2.9 deprecations). @lucene.internal Creates a new instance for the given and method declaration. if you create a second instance of the same and method declaration combination. This enforces the singleton status. If does not declare the given method. Returns the distance from the baseClass in which this method is overridden/implemented in the inheritance path between baseClass and the given subclass . 0 if and only if not overridden, else the distance to the base class. Returns whether this method is overridden/implemented in the inheritance path between baseClass and the given subclass . You can use this method to detect if a method that should normally be final was overridden by the given instance's class. false if and only if not overridden. Utility method that compares the implementation/override distance of two methods. > 1, iff is overridden/implemented in a subclass of the class overriding/declaring < 1, iff is overridden in a subclass of the class overriding/declaring 0, iff both methods are overridden in the same class (or are not overridden at all) implementation based on word-aligned hybrid encoding on words of 8 bits.
This implementation doesn't support random-access but has a fast which can advance in logarithmic time thanks to an index. The compression scheme is simplistic and should work well with sparse and very dense doc id sets while being only slightly larger than a for incompressible sets (overhead<2% in the worst case) in spite of the index. Format: The format is byte-aligned. An 8-bit word is either clean, meaning composed only of zeros or ones, or dirty, meaning that it contains between 1 and 7 bits set. The idea is to encode sequences of clean words using run-length encoding and to leave sequences of dirty words as-is. A sequence is laid out as: Token (1 byte), Clean length+ (0-n bytes), Dirty length+ (0-n bytes), Dirty words (0-n bytes). Token encodes whether clean means full of zeros or ones in the first bit, the number of clean words minus 2 on the next 3 bits and the number of dirty words on the last 4 bits. The higher-order bit is a continuation bit, meaning that the number is incomplete and needs additional bytes to be read. Clean length+: If clean length has its higher-order bit set, you need to read a vint (), shift it by 3 bits on the left side and add it to the 3 bits which have been read in the token. Dirty length+ works the same way as Clean length+ but on 4 bits and for the length of dirty words. Dirty words are the dirty words, there are Dirty length of them. This format cannot encode sequences of fewer than 2 clean words and 0 dirty words. The reason is that if you find a single clean word, you should rather encode it as a dirty word. This takes the same space as starting a new sequence (since you need one byte for the token) but will be lighter to decode. There is however an exception for the first sequence. Since the first sequence may start directly with a dirty word, the clean length is encoded directly, without subtracting 2. There is an additional restriction on the format: the sequence of dirty words is not allowed to contain two consecutive clean words. This restriction exists to make sure no space is wasted and to make sure iterators can read the next doc ID by reading at most 2 dirty words. @lucene.experimental Default index interval. Same as with the default index interval. is null. Compute the intersection of the provided sets. This method is much faster than computing the intersection manually since it operates directly at the byte level. is null. Same as with the default index interval. is null. Compute the union of the provided sets. This method is much faster than computing the union manually since it operates directly at the byte level. is null. Word-based builder. Set the index interval. Smaller index intervals improve performance of but make the larger. An index interval i makes the index add an overhead which is at most 4/i, but likely much less. The default index interval is 8, meaning the index has an overhead of at most 50%. To disable indexing, you can pass as an index interval. Build a new . A builder for s. Sole constructor. Add a document to this builder. Documents must be added in order. Add the content of the provided . Gets the number of documents in this in constant time. Return the memory usage of this class in bytes. Thrown to indicate that an assertion has failed. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsAssertionError()) Initializes a new instance of this class with serialized data.
The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. The Java Error type to simulate the type in Java that all Errors inherit. NOTE: Exception and Error in Java have the same base type, Throwable. However, that is the only common type so Exceptions can be caught without catching errors. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsError()) Error can be thrown, but cannot be subclassed in C# because it is internal. For all Lucene exceptions that subclass Error, implement the interface, then choose the most logical exception type in .NET to subclass. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Used to identify an exception as a Java Error type. Lucene Exception types need to be identified as an Error to our exception handling framework when they derive from Error in Java. However, is internal and C# doesn't allow a public exception to subclass an internal one, so as a workaround, add this interface instead and subclass the most logical exception in .NET. Thrown if the Java Virtual Machine or a ClassLoader instance tries to load in the definition of a class (as part of a normal method call or as part of creating a new instance using the new expression) and no definition of the class could be found. The searched-for class definition existed when the currently executing class was compiled, but the definition can no longer be found. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsNoClassDefFoundError()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown when the Java Virtual Machine cannot allocate an object because it is out of memory, and no more memory could be made available by the garbage collector. OutOfMemoryError objects may be constructed by the virtual machine as if suppression were disabled and/or the stack trace was not writable. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsOutOfMemoryError()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Error thrown when something goes wrong while loading a service provider. This error will be thrown in the following situations: The format of a provider-configuration file violates the specification; An occurs while reading a provider-configuration file; A concrete provider class named in a provider-configuration file cannot be found; A concrete provider class is not a subclass of the service class; A concrete provider class cannot be instantiated; or Some other kind of error occurs. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. 
catch (Exception ex) when (ex.IsServiceConfigurationError()) Error can be thrown, but cannot be subclassed in C# because it is internal. For all Lucene exceptions that subclass Error, implement the interface, then choose the most logical exception type in .NET to subclass. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown when a stack overflow occurs because an application recurses too deeply. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsStackOverflowError()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Extension methods to close gaps when catching exceptions in .NET. These methods make it possible to catch only the .NET types that correspond to a general exception type in Java, even though the exception inheritance structure is different in .NET and does not map 1-to-1 with Java exceptions. Used to check whether corresponds to a Throwable in Java. Throwable is the base class for all errors and exceptions in Java. Unused; all errors in Java are throwable. Always returns true. Used to check whether corresponds to an AssertionError in Java. Error indicates serious problems that a reasonable application should not try to catch. This exception. true if corresponds to an AssertionError type in Java; otherwise false. Used to check whether corresponds to an Error in Java. Error indicates serious problems that a reasonable application should not try to catch. This exception. true if corresponds to an Error type in Java; otherwise false. Used to check whether corresponds to an Exception in Java. Exception in Java indicates conditions that a reasonable application might want to catch. WARNING: Error in Java doesn't inherit from Exception, so it is important to use this method in a catch block. Instead of catch (Exception e), use catch (Exception e) when (e.IsException()). This exception. true if corresponds to an Exception type in Java; otherwise false. Used to check whether corresponds to a RuntimeException in Java. RuntimeException in Java indicates an unchecked exception. Unchecked exceptions don't force the developer to decide whether to handle or re-throw the exception; they can safely be ignored and allowed to propagate. This exception. true if corresponds to a RuntimeException type in Java; otherwise false. Used to check whether corresponds to an IOException in Java. WARNING: java.nio.file.AccessDeniedException inherits from IOException, but its .NET counterpart does not. Therefore, it is important to use this method in a catch block to ensure there are no gaps. Instead of catch (IOException e), use catch (Exception e) when (e.IsIOException()). This exception. true if corresponds to an IOException type in Java; otherwise false. Used to check whether corresponds to an ArrayIndexOutOfBoundsException in Java. This exception. true if corresponds to an ArrayIndexOutOfBoundsException type in Java; otherwise false. Used to check whether corresponds to a StringIndexOutOfBoundsException in Java. This exception. true if corresponds to a StringIndexOutOfBoundsException type in Java; otherwise false.
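To tie the warnings above together, here is a hedged sketch of how these identification methods are meant to be combined with C# exception filters so that catch blocks line up with Java's Exception/Error split. Only the Is*() method names come from the descriptions above; the namespace import and the surrounding code are assumptions.

```csharp
using System;
using Lucene; // namespace assumed; adjust to wherever the Is*() extension methods live in your build

public static class CatchFilterSketch
{
    public static void TryConsume(Action action)
    {
        try
        {
            action();
        }
        catch (Exception e) when (e.IsIOException())
        {
            // Also matches the .NET counterpart of java.nio.file.AccessDeniedException,
            // which a plain catch (IOException) would miss.
            Console.WriteLine("I/O problem: " + e.Message);
        }
        catch (Exception e) when (e.IsRuntimeException())
        {
            // Unchecked conditions a reasonable application might want to handle.
            Console.WriteLine("Runtime problem: " + e.Message);
        }
        catch (Exception e) when (e.IsError())
        {
            // Serious problems a reasonable application should not try to catch; rethrow.
            throw;
        }
    }
}
```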
Used to check whether corresponds to an IndexOutOfBoundsException in Java. This exception. true if corresponds to an IndexOutOfBoundsException type in Java; otherwise false. Used to check whether corresponds to a NoSuchFileException or a FileNotFoundException in Java. NOTE: In Java, there is no distinction between file and directory, and FileNotFoundException is thrown in either case. Therefore, this handler also catches . This exception. true if corresponds to a NoSuchFileException or a FileNotFoundException type in Java; otherwise false. Used to check whether corresponds to a ParseException in Java. IMPORTANT: QueryParser has its own ParseException types (there are multiple), so be sure not to use this exception instead of the ones in QueryParser. For QueryParser exceptions, there are no extension methods to use for identification in catch blocks; you should instead use the fully-qualified name of the exception. catch (Lucene.Net.QueryParsers.Surround.Parser.ParseException e) This exception. true if corresponds to a ParseException type in Java; otherwise false. Used to check whether corresponds to a NumberFormatException in Java. This exception. true if corresponds to a NumberFormatException type in Java; otherwise false. Used to check whether corresponds to an InvocationTargetException in Java. This exception. true if corresponds to an InvocationTargetException type in Java; otherwise false. Used to check whether corresponds to an IllegalAccessException in Java. This exception. true if corresponds to an IllegalAccessException type in Java; otherwise false. Used to check whether corresponds to an IllegalArgumentException in Java. This exception. true if corresponds to an IllegalArgumentException type in Java; otherwise false. Used to check whether corresponds to a NullPointerException in Java. This exception. true if corresponds to a NullPointerException type in Java; otherwise false. Used to check whether corresponds to an InstantiationException (Reflection) in Java. NOTE: The current implementation is intended to work with and its overloads, so if InstantiationException is used in contexts in Java other than <class>.newInstance() or Constructor.newInstance(), it may require research. This exception. true if corresponds to an InstantiationException type in Java; otherwise false. Used to check whether corresponds to an UnsupportedOperationException in Java. This exception. true if corresponds to an UnsupportedOperationException type in Java; otherwise false. Used to check whether corresponds to an UnsupportedEncodingException in Java. This exception. true if corresponds to an UnsupportedEncodingException type in Java; otherwise false. Used to check whether corresponds to an InterruptedException in Java. This exception. true if corresponds to an InterruptedException type in Java; otherwise false. Used to check whether corresponds to a CompressorException in Java. This exception. true if corresponds to a CompressorException type in Java; otherwise false. Used to check whether corresponds to a DataFormatException in Java. This exception. true if corresponds to a DataFormatException type in Java; otherwise false. Used to check whether corresponds to a SecurityException in Java. This exception. true if corresponds to a SecurityException type in Java; otherwise false. Used to check whether corresponds to a NoSuchDirectoryException in Java. This exception. true if corresponds to a NoSuchDirectoryException type in Java; otherwise false.
Used to check whether corresponds to an OutOfMemoryError in Java. This exception. true if corresponds to an OutOfMemoryError type in Java; otherwise false. Used to check whether corresponds to an AlreadyClosedException in Java. This exception. true if corresponds to an AlreadyClosedException type in Java; otherwise false. Used to check whether corresponds to a ClassCastException in Java. This exception. true if corresponds to a ClassCastException type in Java; otherwise false. Used to check whether corresponds to an EOFException in Java. This exception. true if corresponds to an EOFException type in Java; otherwise false. Used to check whether corresponds to an IllegalStateException in Java. This exception. true if corresponds to an IllegalStateException type in Java; otherwise false. Used to check whether corresponds to a StackOverflowError in Java. IMPORTANT: When catching this exception in .NET, put the try/catch logic inside of #if FEATURE_STACKOVERFLOWEXCEPTION__ISCATCHABLE blocks because this exception is not catchable on newer flavors of .NET. This exception. true if corresponds to a StackOverflowError type in Java; otherwise false. Used to check whether corresponds to a MissingResourceException in Java. This exception. true if corresponds to a MissingResourceException type in Java; otherwise false. Used to check whether corresponds to a NoClassDefFoundError in Java. This exception. true if corresponds to a NoClassDefFoundError type in Java; otherwise false. Used to check whether corresponds to a ClassNotFoundException in Java. This exception. true if corresponds to a ClassNotFoundException type in Java; otherwise false. Used to check whether corresponds to a NoSuchMethodException in Java. This exception. true if corresponds to a NoSuchMethodException type in Java; otherwise false. Used to check whether corresponds to an ArithmeticException in Java. This exception. true if corresponds to an ArithmeticException type in Java; otherwise false. Used to check whether corresponds to an AccessDeniedException in Java. This is a low-level IO exception from the underlying operating system when there are insufficient permissions to access a file or folder. This exception. true if corresponds to an AccessDeniedException type in Java; otherwise false. Used to check whether corresponds to a ServiceConfigurationError in Java. This exception. true if corresponds to a ServiceConfigurationError type in Java; otherwise false. The Java description is: Thrown when an application tries to load in a class through its string name using: The forName method in class Class. The findSystemClass method in class ClassLoader. The loadClass method in class ClassLoader. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsClassNotFoundException()) IMPORTANT: .NET doesn't behave the same way as Java in this regard. The method may throw if a static initializer fails, but usually returns null if the type is not resolved. So, for compatibility the logic should be adjusted to treat null like a in Java. If the method is expected to throw when the type cannot be resolved, then we must explicitly throw it when returns null. (A short sketch of this adjustment follows below.) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination.
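The Type.GetType note above can be made concrete with a small sketch. The ForName helper below is hypothetical (not a library member), and the ClassNotFoundException constructor and namespace are assumed; the point is only the null-to-throw adjustment described above.

```csharp
using System;
// The using directive for the Lucene.NET ClassNotFoundException type is assumed to be in scope.

public static class TypeResolutionSketch
{
    // Hypothetical helper: mimic Java's Class.forName, which throws when the
    // type cannot be resolved, rather than returning null like Type.GetType.
    public static Type ForName(string typeName)
    {
        // Type.GetType usually returns null for an unresolved type name (see the note above).
        Type type = Type.GetType(typeName, throwOnError: false);
        if (type is null)
        {
            // Translate the null result into the Java-compatible exception described above
            // (exception type name and constructor assumed).
            throw new ClassNotFoundException(typeName);
        }
        return type;
    }
}
```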
Thrown when there is an attempt to access something that has already been closed. This is a Lucene compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsAlreadyClosedException()) Lucene made a custom type for this, but in .NET we just use the that is built-in, which is what is returned from overloads of . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Signals that an end of file or end of stream has been reached unexpectedly during input. This exception is mainly used by data input streams to signal end of stream. Note that many other input operations return a special value on end of stream rather than throwing an exception. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsEOFException()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown when a particular method cannot be found. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsNoSuchMethodException()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown to indicate that an array has been accessed with an illegal index. The index is either negative or greater than or equal to the size of the array. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsArrayIndexOutOfBoundsException()) Note that when an array type is translated to a .NET type that uses an indexer property this[index], we should instead throw for that property only. In all other cases, use an overload of . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. LUCENENET: This overload is for a "direct" translation without passing the name of the argument. In cases where there is no message and there is a useful argument name, it would make more sense to call new ArgumentOutOfRangeException() directly. Since this class is basically intended as training wheels for those who don't want to bother looking up exception types, this is probably a reasonable default. Thrown to indicate that a method has been passed an illegal or inappropriate argument. This is a Java compatibility exception, and can be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsIllegalArgumentException()) Note that in .NET we should aim to provide the specialized and when appropriate, and since both of them subclass these are not breaking changes. Unlike in Java, .NET types also accept a paramName argument to provide more information about the nature of the exception.
Note also that in Java it is not common practice to use guard clauses. For this reason, we can improve the code by adding them when we are sure that, for example, null is not a valid argument. is always a bug, is a fail-fast way of avoiding . That said, care must be taken not to disallow null when it is a valid value. The appropriate way to translate is usually to add an additional method overload without the nullable argument and to ensure that the one with the argument is never passed a null value. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Signals that a method has been invoked at an illegal or inappropriate time. In other words, the Java environment or Java application is not in an appropriate state for the requested operation. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsIllegalStateException()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown to indicate that an index of some sort (such as to an array, to a string, or to a vector) is out of range. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsIndexOutOfBoundsException()) Note that when an array type is translated to a .NET type that uses an indexer property this[index], we should instead throw for that property only. In all other cases, use an overload of . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. LUCENENET: This overload is for a "direct" translation without passing the name of the argument. In cases where there is no message and there is a useful argument name, it would make more sense to call new ArgumentOutOfRangeException() directly. Since this class is basically intended as training wheels for those who don't want to bother looking up exception types, this is probably a reasonable default. Used to identify an exception as a Java RuntimeException type. Lucene Exception types need to be identified as a RuntimeException to our exception handling framework when they derive from RuntimeException in Java. However, is internal and C# doesn't allow a public exception to subclass an internal one, so as a workaround, add this interface instead and subclass the most logical exception in .NET. Thrown when an application attempts to use null in a case where an object is required. These include: Calling the instance method of a null object. Accessing or modifying the field of a null object. Taking the length of null as if it were an array. Accessing or modifying the slots of null as if it were an array. Throwing null as if it were a Throwable value. Applications should throw instances of this class to indicate other illegal uses of the null object. NullPointerException objects may be constructed by the virtual machine as if suppression were disabled and/or the stack trace was not writable.
This is a Java compatibility exception, and may be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsNullPointerException()) The static method overloads throw , which is what we should throw in guard clauses. However, there are edge cases where it may make sense to throw instead. One example of this is when, in Java, an Integer instance is assigned to a primitive int variable.

    Integer someInt = new Integer(43);
    int primitiveInt = someInt; // Implicit cast (auto-unboxing) by the Java compiler

If someInt in the above example were set to null, this would still compile, but would throw NullPointerException at runtime. In .NET, Integer is most often translated to int?, making it nullable but keeping it a value type. However, assigning a nullable int to a non-nullable one in .NET won't compile.

    int? someInt = 43;
    int primitiveInt = someInt; // Compile error

So, to get the same behavior as in Java (provided the nullable cannot be factored away), the appropriate translation would be:

    int? someInt = 43;
    int primitiveInt;
    if (someInt.HasValue)
        primitiveInt = someInt.Value;
    else
        throw new NullReferenceException();

However, do note that in most cases it would be better to try to refactor so the nullable (and therefore the exception) isn't required. There are also other edge cases (e.g. null state in the middle of a method where the null value isn't being passed in) where throwing may be more sensible, but this sort of change would need to be tested thoroughly.
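Where the nullable genuinely cannot be factored away, the HasValue/Value translation above can also be written as a single expression using C#'s null-coalescing throw; this is merely an equivalent spelling, not an additional requirement.

```csharp
int? someInt = 43;

// Equivalent to the HasValue/Value translation above: unbox when a value is
// present, otherwise throw the exception Java would have thrown at runtime.
int primitiveInt = someInt ?? throw new NullReferenceException();
```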
Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. LUCENENET: This overload is for a "direct" translation without passing the name of the argument. In cases where there is no message and there is a useful argument name, it would make more sense to call new ArgumentNullException() directly. Since this class is basically intended as training wheels for those who don't want to bother looking up exception types, this is probably a reasonable default. The Java description is: Thrown to indicate that the application has attempted to convert a string to one of the numeric types, but that the string does not have the appropriate format. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsNumberFormatException()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. RuntimeException is the superclass of those exceptions that can be thrown during the normal operation of the Java Virtual Machine. RuntimeException and its subclasses are unchecked exceptions. Unchecked exceptions do not need to be declared in a method or constructor's throws clause if they can be thrown by the execution of the method or constructor and propagate outside the method or constructor boundary. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsRuntimeException()) RuntimeException can be thrown, but cannot be subclassed in C# because it is internal. For all Lucene exceptions that subclass RuntimeException, implement the interface, then choose the most logical exception type in .NET to subclass. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. Thrown by String methods to indicate that an index is either negative or greater than the size of the string. For some methods such as the charAt method, this exception is also thrown when the index is equal to the size of the string. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsStringIndexOutOfBoundsException()) Note that when an array type is translated to a .NET type that uses an indexer property this[index], we should instead throw for that property only. In all other cases, use an overload of . Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. LUCENENET: This overload is for a "direct" translation without passing the name of the argument. In cases where there is no message and there is a useful argument name, it would make more sense to call new ArgumentOutOfRangeException() directly. Since this class is basically intended as training wheels for those who don't want to bother looking up exception types, this is probably a reasonable default.
Thrown to indicate that the requested operation is not supported. This class is a member of the Java Collections Framework. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it, however catch blocks should always use the method. catch (Exception ex) when (ex.IsUnsupportedOperationException()) Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination. The Java description is: Defines a general exception a servlet can throw when it encounters difficulty. This is a Java compatibility exception, and should be thrown in Lucene.NET everywhere Lucene throws it. Initializes a new instance of this class with serialized data. The that holds the serialized object data about the exception being thrown. The that contains contextual information about the source or destination.