Class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>
java.lang.Object
org.apache.sedona.core.joinJudgement.KnnJoinIndexJudgement<T,U>
- Type Parameters:
T - extends Geometry - the type of geometries in the left set
U - extends Geometry - the type of geometries in the right set
- All Implemented Interfaces:
Serializable,org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>, org.apache.commons.lang3.tuple.Pair<T, U>>
public class KnnJoinIndexJudgement<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>
extends Object
implements org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T>,Iterator<U>,org.apache.commons.lang3.tuple.Pair<T,U>>, Serializable
This class performs a K-nearest neighbors (KNN) join using a spatial index.
It extends the JudgementBase class and implements the FlatMapFunction2 interface.
-
Field Summary
Fields
protected final org.apache.spark.util.LongAccumulator buildCount
protected final org.apache.spark.util.LongAccumulator candidateCount
protected final org.apache.spark.util.LongAccumulator resultCount
protected final org.apache.spark.util.LongAccumulator streamCount
-
Constructor Summary
Constructors
KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
    Constructor for the KnnJoinIndexJudgement class.
-
Method Summary
Methods
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes)
    This method performs the KNN join operation.
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
    This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
    This method performs the KNN join operation using the broadcast query geometries.
static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric)
static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
    This method calculates the distance between two geometries using the specified distance metric.
static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric)
static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
    This method returns the ItemDistance object based on the specified distance metric.
protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nested loop join.
protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.
protected void initPartition()
    Looks up the extent of the current partition.
protected void log
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
    Iterator model for the nested loop join.
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
    Iterator model for the index-based join.
-
Field Details
-
buildCount
protected final org.apache.spark.util.LongAccumulator buildCount -
streamCount
protected final org.apache.spark.util.LongAccumulator streamCount -
resultCount
protected final org.apache.spark.util.LongAccumulator resultCount -
candidateCount
protected final org.apache.spark.util.LongAccumulator candidateCount
-
-
Constructor Details
-
KnnJoinIndexJudgement
public KnnJoinIndexJudgement(int k, DistanceMetric distanceMetric, boolean includeTies, org.apache.spark.broadcast.Broadcast<List<T>> broadcastQueryObjects, org.apache.spark.broadcast.Broadcast<org.locationtech.jts.index.strtree.STRtree> broadcastObjectsTreeIndex, org.apache.spark.util.LongAccumulator buildCount, org.apache.spark.util.LongAccumulator streamCount, org.apache.spark.util.LongAccumulator resultCount, org.apache.spark.util.LongAccumulator candidateCount)
Constructor for the KnnJoinIndexJudgement class.
- Parameters:
k - the number of nearest neighbors to find
distanceMetric - the distance metric to use
broadcastQueryObjects - the broadcast geometries on queries
broadcastObjectsTreeIndex - the broadcast spatial index on objects
buildCount - accumulator for the number of geometries processed from the build side
streamCount - accumulator for the number of geometries processed from the stream side
resultCount - accumulator for the number of join results
candidateCount - accumulator for the number of candidate matches
-
-
Method Details
-
call
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> call(Iterator<T> queryShapes, Iterator<U> objectShapes) throws Exception
This method performs the KNN join operation. It iterates over the geometries in the stream side and uses the spatial index to find the k nearest neighbors for each geometry. The method returns an iterator over the join results.
- Specified by:
call in interface org.apache.spark.api.java.function.FlatMapFunction2<Iterator<T extends org.locationtech.jts.geom.Geometry>,Iterator<U extends org.locationtech.jts.geom.Geometry>,org.apache.commons.lang3.tuple.Pair<T extends org.locationtech.jts.geom.Geometry,U extends org.locationtech.jts.geom.Geometry>>
- Parameters:
queryShapes - iterator over the geometries in the query side
objectShapes - iterator over the geometries in the object side
- Returns:
- an iterator over the join results
- Throws:
Exception - if the spatial index is not of type STRtree
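The per-geometry lookup described for call can be illustrated without Spark or JTS. The following is a minimal brute-force sketch, not Sedona's implementation: it uses plain 2D points and Euclidean distance instead of an STRtree, and it assumes that includeTies means "also return candidates at exactly the same distance as the k-th neighbor". The names KnnSketch and nearest are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class KnnSketch {
    /** Euclidean distance between two 2D points given as {x, y}. */
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * Returns the k candidates nearest to the query, closest first.
     * With includeTies (assumed semantics), candidates tied with the
     * k-th neighbor are returned as well.
     */
    static List<double[]> nearest(double[] query, List<double[]> candidates, int k, boolean includeTies) {
        List<double[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((double[] c) -> distance(query, c)));
        if (k <= 0 || sorted.isEmpty()) {
            return new ArrayList<>();
        }
        int end = Math.min(k, sorted.size());
        if (includeTies) {
            // Extend the result while later candidates match the k-th distance.
            double kth = distance(query, sorted.get(end - 1));
            while (end < sorted.size() && distance(query, sorted.get(end)) == kth) {
                end++;
            }
        }
        return new ArrayList<>(sorted.subList(0, end));
    }

    public static void main(String[] args) {
        double[] query = {0, 0};
        List<double[]> objects = List.of(
                new double[] {1, 0}, new double[] {0, 2}, new double[] {2, 0}, new double[] {5, 5});
        System.out.println(nearest(query, objects, 2, false).size());
        System.out.println(nearest(query, objects, 2, true).size());
    }
}
```

In the sample data two objects lie at distance 2 from the query, so the tie-including call returns one more neighbor than k. A spatial index replaces the full sort with a pruned search but leaves the result contract unchanged.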
-
callUsingBroadcastObjectIndex
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastObjectIndex(Iterator<T> queryShapes)
This method performs the KNN join operation using the broadcast spatial index built using all geometries in the object side.
- Parameters:
queryShapes - iterator over the geometries in the query side
- Returns:
- an iterator over the join results
-
callUsingBroadcastQueryList
public Iterator<org.apache.commons.lang3.tuple.Pair<T,U>> callUsingBroadcastQueryList(Iterator<U> objectShapes)
This method performs the KNN join operation using the broadcast query geometries.
- Parameters:
objectShapes - iterator over the geometries in the object side
- Returns:
- an iterator over the join results
-
distanceByMetric
public static double distanceByMetric(org.locationtech.jts.geom.Geometry queryGeom, org.locationtech.jts.geom.Geometry candidateGeom, DistanceMetric distanceMetric)
This method calculates the distance between two geometries using the specified distance metric.
- Parameters:
queryGeom - the query geometry
candidateGeom - the candidate geometry
distanceMetric - the distance metric to use
- Returns:
- the distance between the two geometries
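Dispatching a distance computation on a metric can be sketched as follows. This is an illustration only: the Metric enum is a hypothetical stand-in for DistanceMetric (this page does not list its values, so EUCLIDEAN and HAVERSINE are assumptions), points replace JTS geometries, and the Earth radius constant is an assumed mean value in meters.

```java
final class DistanceSketch {
    /** Hypothetical stand-in for DistanceMetric; the values are assumptions. */
    enum Metric { EUCLIDEAN, HAVERSINE }

    /** Assumed mean Earth radius in meters. */
    static final double EARTH_RADIUS_METERS = 6_371_008.8;

    /**
     * Distance between points (x1, y1) and (x2, y2).
     * For HAVERSINE, x is longitude and y is latitude, both in degrees,
     * and the result is in meters.
     */
    static double distanceByMetric(double x1, double y1, double x2, double y2, Metric metric) {
        switch (metric) {
            case EUCLIDEAN:
                return Math.hypot(x2 - x1, y2 - y1);
            case HAVERSINE:
                double dLat = Math.toRadians(y2 - y1);
                double dLon = Math.toRadians(x2 - x1);
                double a = Math.pow(Math.sin(dLat / 2), 2)
                        + Math.cos(Math.toRadians(y1)) * Math.cos(Math.toRadians(y2))
                        * Math.pow(Math.sin(dLon / 2), 2);
                return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
            default:
                throw new IllegalArgumentException("Unsupported metric: " + metric);
        }
    }

    public static void main(String[] args) {
        System.out.println(distanceByMetric(0, 0, 3, 4, Metric.EUCLIDEAN));
        System.out.println(distanceByMetric(0, 0, 1, 0, Metric.HAVERSINE));
    }
}
```

The same pair of coordinates yields very different numbers under the two metrics (planar units versus meters on a sphere), which is why a KNN join has to use one metric consistently for both candidate ranking and the final result.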
-
getItemDistanceByMetric
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistanceByMetric(DistanceMetric distanceMetric)
This method returns the ItemDistance object based on the specified distance metric.
- Parameters:
distanceMetric - the distance metric to use
- Returns:
- the ItemDistance object
-
distance
public static <U extends org.locationtech.jts.geom.Geometry,T extends org.locationtech.jts.geom.Geometry> double distance(U key, T value, DistanceMetric distanceMetric) -
getItemDistance
public static org.locationtech.jts.index.strtree.ItemDistance getItemDistance(DistanceMetric distanceMetric) -
initPartition
protected void initPartition()
Looks up the extent of the current partition. If found, the `match` method activates the logic that avoids emitting duplicate join results from multiple partitions.
Must be called before processing a partition, and from the same instance that will be used to process the partition.
-
hasNextBase
protected boolean hasNextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It checks whether there is a next match and, if so, populates it into the result.
- Parameters:
spatialIndex
streamShapes
buildLeft
- Returns:
- true if there is a next match
-
hasNextBase
protected boolean hasNextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nested loop join. It checks whether there is a next match and, if so, populates it into the result.
- Parameters:
buildShapes
streamShapes
- Returns:
- true if there is a next match
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(org.locationtech.jts.index.SpatialIndex spatialIndex, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes, boolean buildLeft)
Iterator model for the index-based join. It returns one pair from the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
- Parameters:
spatialIndex
streamShapes
buildLeft
- Returns:
- one pair from the current batch
-
nextBase
protected org.apache.commons.lang3.tuple.Pair<U,T> nextBase(List<? extends org.locationtech.jts.geom.Geometry> buildShapes, Iterator<? extends org.locationtech.jts.geom.Geometry> streamShapes)
Iterator model for the nested loop join. It returns one pair from the current batch. Each batch contains a list of pairs of geometries that satisfy the join condition. The current batch is the result of the current stream shape against all the build shapes.
- Parameters:
buildShapes
streamShapes
- Returns:
- one pair from the current batch
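The batch-oriented iterator model described for hasNextBase and nextBase can be sketched with standard-library types. This is a simplified illustration under stated assumptions, not the Sedona code: BatchJoinIterator is a hypothetical class, a BiPredicate stands in for the join condition, Map.Entry stands in for Pair, and each emitted pair is ordered (build, stream).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.BiPredicate;

final class BatchJoinIterator<B, S> implements Iterator<Map.Entry<B, S>> {
    private final List<B> buildShapes;
    private final Iterator<S> streamShapes;
    private final BiPredicate<B, S> condition;
    // Matches of the current stream item against all build items.
    private List<Map.Entry<B, S>> batch = new ArrayList<>();
    private int position = 0;

    BatchJoinIterator(List<B> buildShapes, Iterator<S> streamShapes, BiPredicate<B, S> condition) {
        this.buildShapes = buildShapes;
        this.streamShapes = streamShapes;
        this.condition = condition;
    }

    @Override
    public boolean hasNext() {
        // Refill until a non-empty batch is found or the stream side is exhausted.
        while (position >= batch.size() && streamShapes.hasNext()) {
            S stream = streamShapes.next();
            batch = new ArrayList<>();
            position = 0;
            for (B build : buildShapes) {
                if (condition.test(build, stream)) {
                    batch.add(new SimpleEntry<>(build, stream));
                }
            }
        }
        return position < batch.size();
    }

    @Override
    public Map.Entry<B, S> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Hand out one pair from the current batch.
        return batch.get(position++);
    }

    public static void main(String[] args) {
        Iterator<Map.Entry<Integer, Integer>> it = new BatchJoinIterator<Integer, Integer>(
                List.of(1, 2, 3), List.of(2, 10).iterator(), (b, s) -> b <= s);
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

Building one batch per stream item keeps memory bounded by the number of matches for a single item, at the cost of scanning every build item each time. The index-based variant would replace that inner scan with an index query.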
-
log
-