Class IndexedDISI
A DocIdSetIterator that can return the index of the current document, i.e. the ordinal of the current document among the documents that this iterator can return. This is useful for implementing sparse doc values, because values only need to be encoded for documents that actually have one.
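To make the role of index() concrete, here is a minimal in-memory sketch. It is not IndexedDISI: the class names SparseValuesDemo and ArrayIndexedIterator are hypothetical, and a sorted int[] of doc IDs stands in for the on-disk encoding. It shows how a consumer can keep one value per document that has a value, in a dense array with no holes, and look that value up through index() while iterating with nextDoc().

```java
/** Hypothetical demo of index()-based value lookup; not Lucene code. */
final class SparseValuesDemo {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same value as DocIdSetIterator.NO_MORE_DOCS

    /** Stand-in iterator: docs holds the sorted IDs of documents that have a value. */
    static final class ArrayIndexedIterator {
        private final int[] docs;
        private int index = -1; // ordinal of the current doc among docs
        private int doc = -1;   // -1 until nextDoc() is first called

        ArrayIndexedIterator(int[] docs) { this.docs = docs; }

        int docID() { return doc; }
        int index() { return index; }

        int nextDoc() {
            index++;
            doc = index < docs.length ? docs[index] : NO_MORE_DOCS;
            return doc;
        }
    }

    /** Renders "doc=value" pairs, fetching each value from the dense array via index(). */
    static String dump(int[] docs, long[] values) {
        ArrayIndexedIterator it = new ArrayIndexedIterator(docs);
        StringBuilder sb = new StringBuilder();
        for (int d = it.nextDoc(); d != NO_MORE_DOCS; d = it.nextDoc()) {
            sb.append(d).append('=').append(values[it.index()]).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        int[] docs = {3, 10, 70_000};   // only these documents have a value
        long[] values = {42L, 7L, 99L}; // one entry per such document
        System.out.println(dump(docs, values)); // 3=42 10=7 70000=99
    }
}
```

The values array has exactly as many entries as there are documents with a value, which is the space saving the paragraph above describes.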
Implementation-wise, this DocIdSetIterator is inspired by roaring bitmaps: it encodes ranges of 65536 documents independently and picks one of 3 encodings depending on the density of each range:
- ALL if the range contains exactly 65536 documents,
- DENSE if the range contains 4096 documents or more; in that case documents are stored in a bit set,
- SPARSE otherwise; in that case the lower 16 bits of each doc ID are stored as a short.
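The encoding rule listed above can be expressed as a small decision function. The following is a sketch, not Lucene's code: EncodingChoice, choose and plan are hypothetical names. It groups sorted doc IDs into 65536-document ranges by their upper bits (doc >>> 16) and reports the encoding each non-empty range would get.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-range encoding choice; not Lucene code. */
final class EncodingChoice {
    enum Encoding { ALL, DENSE, SPARSE }

    static final int BLOCK_SIZE = 1 << 16;   // 65536 documents per range
    static final int DENSE_THRESHOLD = 4096; // "4096 documents or more" uses a bit set

    static Encoding choose(int cardinality) {
        if (cardinality == BLOCK_SIZE) return Encoding.ALL;
        if (cardinality >= DENSE_THRESHOLD) return Encoding.DENSE;
        return Encoding.SPARSE;
    }

    /** Returns "range:ENCODING" for each non-empty range of the sorted, distinct doc IDs. */
    static List<String> plan(int[] sortedDocs) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sortedDocs.length) {
            int block = sortedDocs[i] >>> 16; // range number
            int start = i;
            while (i < sortedDocs.length && (sortedDocs[i] >>> 16) == block) i++;
            out.add(block + ":" + choose(i - start)); // empty ranges never appear
        }
        return out;
    }

    public static void main(String[] args) {
        int[] docs = new int[65536 + 5000 + 1];
        int n = 0;
        for (int d = 0; d < 65536; d++) docs[n++] = d;        // range 0: every document
        for (int d = 0; d < 5000; d++) docs[n++] = 65536 + d; // range 1: 5000 documents
        docs[n] = 5 * 65536 + 17;                              // range 5: a single document
        System.out.println(plan(docs)); // [0:ALL, 1:DENSE, 5:SPARSE]
    }
}
```

Ranges 2 to 4 hold no documents, so they produce no entry at all.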
Only ranges that contain at least one value are encoded.
This implementation uses 6 bytes per document in the worst case, which happens when every encoded range contains exactly one document.
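The 6-byte figure can be checked with simple arithmetic under an assumed layout. This page does not spell out the byte format, so the sketch assumes that each encoded range has a 4-byte header (a short for the range number and a short for the cardinality), that SPARSE stores one 2-byte short per document, that DENSE stores a 65536-bit set (8192 bytes), and that ALL stores no payload. WorstCase and bytesForRange are hypothetical names.

```java
/** Storage estimate under an ASSUMED layout (4-byte header per range); not Lucene code. */
final class WorstCase {
    static final int HEADER_BYTES = 2 + 2;        // assumption: range number + cardinality, one short each
    static final int DENSE_BYTES = (1 << 16) / 8; // 65536 bits = 8192 bytes

    static long bytesForRange(int cardinality) {
        if (cardinality == 1 << 16) return HEADER_BYTES;            // ALL: header only
        if (cardinality >= 4096) return HEADER_BYTES + DENSE_BYTES; // DENSE: full bit set
        return HEADER_BYTES + 2L * cardinality;                     // SPARSE: one short per doc
    }

    public static void main(String[] args) {
        System.out.println(bytesForRange(1));     // 6: one document alone in its range
        System.out.println(bytesForRange(65536)); // 4: a full range costs only its header
        // Around the threshold both encodings cost roughly 2 bytes per document.
        System.out.println(bytesForRange(4095) / 4095.0);
        System.out.println(bytesForRange(4096) / 4096.0);
    }
}
```

Under these assumptions a single-document range costs 4 + 2 = 6 bytes, matching the stated worst case. The 4096 threshold also lines up with a break-even point: 4096 shorts take 8192 bytes, the same as the bit set, so switching to DENSE at that size plausibly keeps every range near or below 2 bytes per document plus header.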

Nested Class Summary
Nested Classes: IndexedDISI.Method

Field Summary
Fields (Modifier and Type, Field, Description):
private int block
private long blockEnd
private final long cost
private int doc
(package private) boolean exists
private int gap
private int index
(package private) static final int MAX_ARRAY_LENGTH
(package private) IndexedDISI.Method method
private int nextBlockIndex
private int numberOfOnes
private final IndexInput slice: The slice that stores the DocIdSetIterator.
private long word
private int wordIndex
Fields inherited from class org.apache.lucene.search.DocIdSetIterator
NO_MORE_DOCS

Constructor Summary
Constructors:
IndexedDISI(IndexInput slice, long cost)
IndexedDISI(IndexInput in, long offset, long length, long cost)

Method Summary
Methods (Modifier and Type, Method, Description):
int advance(int target): Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number.
private void advanceBlock(int targetBlock)
boolean advanceExact(int target)
long cost(): Returns the estimated cost of this DocIdSetIterator.
int docID(): Returns -1 if DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) were not called yet, DocIdSetIterator.NO_MORE_DOCS if the iterator has been exhausted, and otherwise the doc ID it is currently on.
private static void flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out)
int index()
int nextDoc(): Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set. NOTE: after the iterator has been exhausted you should not call this method, as it may result in unpredictable behavior.
private void readBlockHeader()
(package private) static void writeBitSet(DocIdSetIterator it, IndexOutput out)
Methods inherited from class org.apache.lucene.search.DocIdSetIterator:
all, empty, range, slowAdvance

Field Details
MAX_ARRAY_LENGTH
static final int MAX_ARRAY_LENGTH
slice
private final IndexInput slice
The slice that stores the DocIdSetIterator.
cost
private final long cost
block
private int block
blockEnd
private long blockEnd
nextBlockIndex
private int nextBlockIndex
method
IndexedDISI.Method method
doc
private int doc
index
private int index
exists
boolean exists
word
private long word
wordIndex
private int wordIndex
numberOfOnes
private int numberOfOnes
gap
private int gap

Constructor Details
IndexedDISI
IndexedDISI(IndexInput in, long offset, long length, long cost) throws IOException
Throws: IOException
IndexedDISI
IndexedDISI(IndexInput slice, long cost) throws IOException
Throws: IOException

Method Details
flush
private static void flush(int block, FixedBitSet buffer, int cardinality, IndexOutput out) throws IOException
Throws: IOException
writeBitSet
static void writeBitSet(DocIdSetIterator it, IndexOutput out) throws IOException
Throws: IOException
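writeBitSet and flush suggest a writer that buffers one 65536-document range at a time and flushes it once it is complete. The sketch below imitates that shape with standard Java types: java.util.BitSet stands in for FixedBitSet, and DataOutputStream for IndexOutput. The names DisiWriterSketch, write and flushRange are hypothetical, and the byte layout (short range number, short holding cardinality minus one, then the payload) is an assumption, not something this page specifies. The sketch also writes no end-of-data marker.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.BitSet;

/** Hypothetical writer in the spirit of writeBitSet/flush; layout is an assumption. */
final class DisiWriterSketch {
    static final int BLOCK_SIZE = 1 << 16;
    static final int DENSE_THRESHOLD = 4096;

    /** Writes one non-empty range; buffer holds the low 16 bits of its doc IDs. */
    static void flushRange(int block, BitSet buffer, int cardinality, DataOutputStream out) throws IOException {
        out.writeShort(block);           // range number (doc >>> 16)
        out.writeShort(cardinality - 1); // 1..65536 stored as 0..65535 to fit 16 bits
        if (cardinality == BLOCK_SIZE) {
            return;                      // ALL: header only
        }
        if (cardinality >= DENSE_THRESHOLD) {
            long[] words = buffer.toLongArray(); // may omit trailing zero words
            for (int w = 0; w < BLOCK_SIZE / 64; w++) {
                out.writeLong(w < words.length ? words[w] : 0L); // DENSE: always 1024 words
            }
        } else {
            for (int low = buffer.nextSetBit(0); low >= 0; low = buffer.nextSetBit(low + 1)) {
                out.writeShort(low);     // SPARSE: low 16 bits of each doc ID
            }
        }
    }

    /** Encodes sorted, distinct doc IDs range by range. */
    static byte[] write(int[] sortedDocs) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            BitSet buffer = new BitSet(BLOCK_SIZE);
            int prevBlock = -1;
            int cardinality = 0;
            for (int doc : sortedDocs) {
                int block = doc >>> 16;
                if (block != prevBlock && cardinality > 0) {
                    flushRange(prevBlock, buffer, cardinality, out); // previous range is complete
                    buffer.clear();
                    cardinality = 0;
                }
                buffer.set(doc & 0xFFFF);
                cardinality++;
                prevBlock = block;
            }
            if (cardinality > 0) {
                flushRange(prevBlock, buffer, cardinality, out);     // last non-empty range
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(write(new int[]{5}).length);         // 6
        System.out.println(write(new int[]{1, 70_000}).length); // 12: two SPARSE ranges
    }
}
```

Buffering a whole range before writing is what lets the encoding be chosen from the exact cardinality, which is only known once the range ends.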

docID
public int docID()
Description copied from class: DocIdSetIterator
Returns the following:
- -1 if DocIdSetIterator.nextDoc() or DocIdSetIterator.advance(int) were not called yet.
- DocIdSetIterator.NO_MORE_DOCS if the iterator has been exhausted.
- Otherwise, the doc ID it is currently on.
Specified by: docID in class DocIdSetIterator

advance
public int advance(int target) throws IOException
Description copied from class: DocIdSetIterator
Advances to the first document beyond the current one whose document number is greater than or equal to target, and returns that document number. Exhausts the iterator and returns DocIdSetIterator.NO_MORE_DOCS if target is greater than the highest document number in the set.
The behavior of this method is undefined when called with target ≤ current, or after the iterator has been exhausted. Both cases may result in unpredictable behavior.
When target > current, it behaves as if written:
    int advance(int target) {
      int doc;
      while ((doc = nextDoc()) < target) {
        // skip documents below target
      }
      return doc;
    }
Some implementations are considerably more efficient than that.
NOTE: this method may be called with DocIdSetIterator.NO_MORE_DOCS for efficiency by some Scorers. If your implementation cannot efficiently determine that it should exhaust, it is recommended that you check for that value in each call to this method.
Specified by: advance in class DocIdSetIterator
Throws: IOException
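Because documents are grouped into 65536-document ranges, an advance can be much cheaper than the nextDoc() loop: it can jump straight to range target >>> 16 without visiting the documents of earlier ranges, then search within that range. The sketch below illustrates the idea on a hypothetical in-memory layout (an ascending array of non-empty range numbers plus the sorted low 16 bits of each range's docs), using Arrays.binarySearch inside a range. BlockAdvanceSketch and its layout are assumptions; this is not how IndexedDISI reads its on-disk blocks.

```java
import java.util.Arrays;

/** Hypothetical range-skipping advance over an in-memory layout; not Lucene code. */
final class BlockAdvanceSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] blockIds; // ascending numbers of non-empty ranges
    private final int[][] lows;   // lows[i]: sorted low 16 bits of the docs in range blockIds[i]
    private int block = 0;        // current position in blockIds
    private int doc = -1;

    BlockAdvanceSketch(int[] blockIds, int[][] lows) {
        this.blockIds = blockIds;
        this.lows = lows;
    }

    int docID() { return doc; }

    /** Assumes target > docID(), as the advance contract requires. */
    int advance(int target) {
        int targetBlock = target >>> 16;
        // Skip whole ranges that precede the target's range without reading their docs.
        while (block < blockIds.length && blockIds[block] < targetBlock) block++;
        for (; block < blockIds.length; block++) {
            int[] l = lows[block];
            int base = blockIds[block] << 16;
            int from = blockIds[block] == targetBlock ? target & 0xFFFF : 0;
            int pos = Arrays.binarySearch(l, from);
            if (pos < 0) pos = -pos - 1;   // insertion point: first low >= from
            if (pos < l.length) return doc = base | l[pos];
        }
        return doc = NO_MORE_DOCS;         // also covers target == NO_MORE_DOCS
    }

    public static void main(String[] args) {
        BlockAdvanceSketch it = new BlockAdvanceSketch(new int[]{0, 3}, new int[][]{{5, 100}, {7}});
        System.out.println(it.advance(6));   // 100
        System.out.println(it.advance(101)); // 196615, i.e. (3 << 16) + 7
    }
}
```

Checking for NO_MORE_DOCS costs nothing extra here: its range number exceeds every stored range, so the skip loop simply runs off the end.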

advanceExact
public boolean advanceExact(int target) throws IOException
Throws: IOException
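Assuming the usual Lucene contract for advanceExact (position the iterator on target and report whether target is in the set), a DENSE range makes both parts cheap: presence is a single bit test, and the document's position within the range is the number of set bits before it, obtained with Long.bitCount. The sketch below computes that rank from scratch; the fields word, wordIndex and numberOfOnes hint that the real implementation tracks it incrementally, but that is an inference. DenseRankSketch, exists and rank are hypothetical names, and the 1024-long layout of the bit set is an assumption.

```java
/** Hypothetical DENSE-range lookup over a 65536-bit set stored as 1024 longs; not Lucene code. */
final class DenseRankSketch {
    /** True if the document whose low 16 bits are 'low' is present in this range. */
    static boolean exists(long[] bits, int low) {
        return (bits[low >>> 6] & (1L << low)) != 0; // Java masks the shift to low & 63
    }

    /** Number of present documents in this range that sort before 'low'. */
    static int rank(long[] bits, int low) {
        int wordIndex = low >>> 6;
        int ones = 0;
        for (int w = 0; w < wordIndex; w++) {
            ones += Long.bitCount(bits[w]);               // whole words before the target word
        }
        long below = bits[wordIndex] & ((1L << low) - 1); // bits strictly below 'low' in its word
        return ones + Long.bitCount(below);
    }

    public static void main(String[] args) {
        long[] bits = new long[1024];
        bits[0] |= 1L << 3; // low 3
        bits[1] |= 1L;      // low 64
        bits[2] |= 1L << 2; // low 130
        System.out.println(exists(bits, 64) + " " + rank(bits, 64)); // true 1
    }
}
```

If index() counts documents across all ranges, the overall index of a present document would presumably be the number of documents in earlier ranges plus this rank.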
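
advanceBlock
private void advanceBlock(int targetBlock) throws IOException
Throws: IOException
readBlockHeader
private void readBlockHeader() throws IOException
Throws: IOException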

nextDoc
public int nextDoc() throws IOException
Description copied from class: DocIdSetIterator
Advances to the next document in the set and returns the doc it is currently on, or DocIdSetIterator.NO_MORE_DOCS if there are no more docs in the set.
NOTE: after the iterator has been exhausted you should not call this method, as it may result in unpredictable behavior.
Specified by: nextDoc in class DocIdSetIterator
Throws: IOException

index
public int index()
Returns the index of the current document, i.e. its ordinal among the documents that this iterator can return.
cost
public long cost()
Description copied from class: DocIdSetIterator
Returns the estimated cost of this DocIdSetIterator.
This is generally an upper bound of the number of documents this iterator might match, but may be a rough heuristic, hardcoded value, or otherwise completely inaccurate.
Specified by: cost in class DocIdSetIterator