# Chromatic $$k$$-Nearest Neighbor Queries

## Abstract

Let $$P$$ be a set of $$n$$ colored points. We develop efficient data structures that store $$P$$ and can answer chromatic $$k$$-nearest neighbor ($$k$$-NN) queries. Such a query consists of a query point $$q$$ and a number $$k$$, and asks for the color that appears most frequently among the $$k$$ points in $$P$$ closest to $$q$$. Answering such queries efficiently is the key to obtain fast $$k$$-NN classifiers. Our main aim is to obtain query times that are independent of $$k$$ while using near-linear space.

We show that this is possible using a combination of two data structures. The first data structure allow us to compute a region containing exactly the $$k$$-nearest neighbors of a query point $$q$$, and the second data structure can then report the most frequent color in such a region. This leads to linear space data structures with query times of $$O(n^{1 / 2} \log n)$$ for points in $$\mathbb{R}^1$$, and with query times varying between $$O(n^{2/3}\log^{2/3} n)$$ and $$O(n^{5/6} \polylog n)$$, depending on the distance measure used, for points in $$\mathbb{R}^2$$. These results can be extended to work in higher dimensions as well. Since the query times are still fairly large we also consider approximations. If we are allowed to report a color that appears at least $$(1-\varepsilon)f^*$$ times, where $$f^*$$ is the frequency of the most frequent color, we obtain a query time of $$O(\log n + \log\log_{\frac{1}{1-\varepsilon}} n)$$ in $$\mathbb{R}^1$$ and expected query times ranging between $$\tilde{O}(n^{1/2}\varepsilon^{-3/2})$$ and $$\tilde{O}(n^{1/2}\varepsilon^{-5/2})$$ in $$\mathbb{R}^2$$ using near-linear space (ignoring polylogarithmic factors).

# Chromatic $$k$$-Nearest Neighbor Queries

Proc. 30th European Symposium on Algorithms, 2022

@inproceedings{chromatic_knn2022,
author = {van der Horst, Thijs and L{\"o}ffler, Maarten and Staals, Frank},
title = {Chromatic $k$-Nearest Neighbor Queries},
booktitle = {Proc. 30th European Symposium on Algorithms},
year = {2022},
location = {Potsdam, Germany},
numpages = {15},
pages = {67:1--67:15},
keywords = {data structure, nearest neighbor, classification},
category = {datastructures},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
volume = {244},
editor = {Chechik, Shiri and Navarro, Gonzalo and Rotenberg, Eva and Herman, Grzegorz},
doi = {10.4230/LIPIcs.ESA.2022.67},
url = {https://doi.org/10.4230/LIPIcs.ESA.2022.67},
}