Introduction

Given two vectors of length n, X and Y, top-k on X + Y finds the k smallest values of the form Xi + Yj. Note that this problem definition is presented w.l.o.g.; X and Y need not share the same length. Top-k is important in practical applications, such as selecting the k most abundant isotopologue peaks from a compound (Kreitzberg et al., 2020). Top-k ∈ Ω(n + k), because loading the vectors is ∈ Θ(n) and returning the minimal k values is ∈ Θ(k).

Naive approach

Top-k can be solved trivially in $O\left({n}^{2}\mathrm{log}\left(n\right)+k\right)=O\left({n}^{2}\mathrm{log}\left(n\right)\right)$ steps by generating and sorting all $n^2$ values of the form Xi + Yj. By using median-of-medians (Blum et al., 1973), this can be improved to $O(n^2)$ steps by generating all $n^2$ values and performing k-selection on them.
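The naive approach can be sketched in a few lines of Python (a minimal sketch for reference only, not the implementation benchmarked in the Results):

```python
# Naive top-k on X + Y: generate all pairwise sums, sort them, and keep the k
# smallest; this runs in O(n^2 log(n) + k) = O(n^2 log(n)) steps.
def naive_top_k(X, Y, k):
    all_sums = sorted(x + y for x in X for y in Y)
    return all_sums[:k]

print(naive_top_k([1, 4, 2], [0, 3], 3))  # [1, 2, 4]
```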

Existing, tree-based methods for top-k

In 1982, Frederickson & Johnson introduced a method reminiscent of median-of-medians (Blum et al., 1973); their method selects only the kth minimum value from X + Y in $O\left(n+min\left(n,k\right)\mathrm{log}\left(\frac{k}{min\left(n,k\right)}\right)\right)$ steps (Frederickson & Johnson, 1982).

Frederickson subsequently published a second algorithm, which finds the k smallest elements from a min-heap in O(k), assuming the heap has already been built (Frederickson, 1993). Combining this method's tree data structure with a combinatoric heap on X + Y (described below for the Kaplan et al. method) solves top-k, including computation of the kth smallest value, in O(n + k).

Kaplan et al. described an alternative method for selecting the kth smallest value (Kaplan et al., 2019); that method explicitly used Chazelle’s soft heaps (Chazelle, 2000). By heapifying X and Y in linear time (i.e., guaranteeing w.l.o.g. that ${X}_{i}\le {X}_{2i},{X}_{2i+1}$), ${min}_{i,j}\ {X}_{i}+{Y}_{j}={X}_{1}+{Y}_{1}$. Likewise, ${X}_{i}+{Y}_{j}\le {X}_{2i}+{Y}_{j},{X}_{2i+1}+{Y}_{j},{X}_{i}+{Y}_{2j},{X}_{i}+{Y}_{2j+1}$. The soft heap is initialized to contain the tuple (X1 + Y1,1,1). Then, as each tuple (v,i,j) is popped from the soft heap, lower-quality tuples are inserted into the soft heap. These lower-quality tuples of (i,j) are

$\left\{\begin{array}{ll}\left\{\left(2i,1\right),\left(2i+1,1\right),\left(i,2\right),\left(i,3\right)\right\},& j=1\\ \left\{\left(i,2j\right),\left(i,2j+1\right)\right\},& j>1.\end{array}\right.$ (1)

In the matrix Xi + Yj (which is not realized), this scheme progresses in row-major order, thereby avoiding a tuple being added multiple times.
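The proposal scheme above can be sketched with Python's built-in binary heap standing in for the soft heap (an illustration only: an ordinary binary heap costs O(k log k) for the pops rather than the soft heap's amortized constant-time operations, and the 1-based heap property on X and Y is obtained here by sorting, though any linear time heapify would do):

```python
import heapq

# Minimal sketch of the Kaplan et al. proposal scheme, with heapq standing in
# for the soft heap. Names here (k_smallest_pairwise) are illustrative only.
def k_smallest_pairwise(X, Y, k):
    X = [None] + sorted(X)  # 1-based; sorted order implies X[i] <= X[2i], X[2i+1]
    Y = [None] + sorted(Y)
    heap = [(X[1] + Y[1], 1, 1)]
    out = []
    while len(out) < k:
        v, i, j = heapq.heappop(heap)
        out.append(v)
        # Row-major proposal scheme, so no tuple is proposed twice:
        children = [(2*i, 1), (2*i + 1, 1), (i, 2), (i, 3)] if j == 1 else [(i, 2*j), (i, 2*j + 1)]
        for a, b in children:
            if a < len(X) and b < len(Y):
                heapq.heappush(heap, (X[a] + Y[b], a, b))
    return out

print(k_smallest_pairwise([3, 1, 2], [10, 20], 4))  # [11, 12, 13, 21]
```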

To compute the kth smallest value from X + Y, the best k values are popped from the soft heap. Even though only the minimal k values are desired, “corruption” in the soft heap means that the soft heap will not always pop the minimal value; however, as a result, soft heaps can run faster than the $\mathrm{\Omega }\left(n\mathrm{log}\left(n\right)\right)$ lower bound on comparison sorting. The free parameter $\epsilon \in \left(0,\frac{1}{2}\right]$ bounds the number of corrupt elements in the soft heap (which may be promoted earlier in the queue than they should be) as ≤ t · ε, where t is the number of insertions into the soft heap thus far. Thus, instead of popping k items (and inserting their lower-quality dependents as described in Eq. (1)), the total number of pops, p, can be found: The maximal size of the soft heap after p pops is ≤ 3p, and the number of insertions is t ≤ 4p (because each pop removes one element and inserts ≤ 4 elements according to Eq. (1)); therefore, $p-corruption\ge p-4p\cdot \epsilon$, and thus $p-4p\cdot \epsilon \ge k$ guarantees that $p-corruption\ge k$. This leads to $p=\frac{k}{1-4\epsilon }$, which requires $\epsilon <\frac{1}{4}$. This guarantees that Θ(k) values, which must include the minimal k values, are popped. These values are post-processed to retrieve the minimal k values via linear time one-dimensional selection (Blum et al., 1973). For constant ε, both pop and insertion operations on the soft heap are amortized $\in O\left(1\right)$, and thus the overall runtime of the algorithm is ∈ O(n + k).
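For concreteness, the pop count implied by the bound above can be computed directly (a small sketch; the function name is illustrative only):

```python
import math

# Number of soft heap pops p satisfying p - 4*p*eps >= k, i.e. p = k / (1 - 4*eps);
# valid only for a corruption parameter eps < 1/4.
def pops_needed(k, eps):
    assert 0 < eps < 0.25
    return math.ceil(k / (1 - 4 * eps))

print(pops_needed(1000, 1 / 8))  # eps = 1/8 doubles the pops: 2000
```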

Note that the Kaplan et al. method easily solves top-k in O(n + k) steps; this is because computing the kth smallest value from X + Y pops the minimal k values from the soft heap.

Layer-ordered heaps and a novel selection algorithm on X + Y

This paper uses layer-ordered heaps (LOHs) (Kreitzberg et al., 2020) to produce an optimal selection algorithm on X + Y. LOHs are stricter than heaps but not as strict as sorting: heaps guarantee only that XiXchildren(i), but do not guarantee any ordering between one child of Xi, a, and the child of a sibling of a. Sorting is stricter still, but sorting n values cannot be done faster than ${\mathrm{log}}_{2}\left(n!\right)\in \mathrm{\Omega }\left(n\mathrm{log}\left(n\right)\right)$. LOHs partition the array into several layers such that the values in a layer are ≤ the values in subsequent layers: ${X}^{\left(u\right)}={X}_{1}^{\left(u\right)},{X}_{2}^{\left(u\right)},\dots \le {X}^{\left(u+1\right)}$. The size of these layers starts with |X(1)| = 1 and grows exponentially such that $\underset{u\to \mathrm{\infty }}{lim}\frac{|{X}^{\left(u+1\right)}|}{|{X}^{\left(u\right)}|}=\alpha \ge 1$ (note that α = 1 is equivalent to sorting because all layers have size 1). By assigning values in layer u children from layer u + 1, an LOH can be seen as a more constrained form of the heap; however, unlike sorting, for any constant α > 1, LOHs can be constructed in O(n) by iterative linear time one-dimensional selection, repeatedly selecting and removing the largest layer until all layers have been partitioned. For example, 8,1,6,4,5,3,2 can be LOHified with α = 2 into an LOH with three layers (1 ≤ 3,2 ≤ 8,4,6,5) by first selecting the largest 4 values from the entire list (8,4,6,5), removing them, and then selecting the largest 2 values from the remaining 3 values (3,2).
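The construction by iterative selection can be sketched as follows (a minimal sketch: heapq.nlargest stands in for a linear time one-dimensional selection, so this version runs in O(n log n) rather than the O(n) achievable with median-of-medians; the names lohify and layer_sizes are illustrative only):

```python
import heapq

# Minimal LOHify sketch for alpha = 2: repeatedly select and remove the largest
# remaining layer, so layer sizes are 1, 2, 4, ... (the last layer may be smaller).
def lohify(values, alpha=2):
    layer_sizes = []
    size, remaining = 1, len(values)
    while remaining > 0:
        layer_sizes.append(min(size, remaining))
        remaining -= layer_sizes[-1]
        size *= alpha
    layers = []
    values = list(values)  # avoid mutating the caller's list
    for sz in reversed(layer_sizes):  # peel off the largest layer first
        layer = heapq.nlargest(sz, values)
        for v in layer:
            values.remove(v)
        layers.append(layer)
    return layers[::-1]

print(lohify([8, 1, 6, 4, 5, 3, 2]))  # [[1], [3, 2], [8, 6, 5, 4]]
```

On the example from the text, this produces the layers 1 ≤ 3,2 ≤ 8,6,5,4 (values within a layer are unordered in general).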

Although selections reminiscent of LOHs may have been used previously, formalization of rank-α LOHs has been necessary to demonstrate that for $1\ll \alpha \ll 2$, a combination of LOHs and soft heaps allows generating the minimum k values from X1 + X2 + ⋯ + Xm (where each Xi has length n) in $o\left(n\cdot m+k\cdot m\right)$ (Kreitzberg, Lucke & Serang, 2020). Furthermore, efficiently constructing an LOH of rank α is not trivial when $\alpha \ll 2$; after all, $\alpha \to 1$ results in layers of size |X(1)| = |X(2)| = ⋯ = 1, which is equivalent to sorting and hence implies a runtime ∈ Ω(nlog(n)) (Pennington et al., 2020).

A Python implementation of an LOH is shown in Listing 1.

Contribution in this manuscript

The new, optimal algorithm for solving top-k presented here makes extensive use of LOHs. It is simple to implement and does not rely on anything more complicated than linear time one-dimensional selection (i.e., it does not use soft heaps). Due to its simplicity and contiguous memory access, it is fast in practice.

Methods

Algorithm

The algorithm presented is broken into phases. An illustration of these phases is provided in Fig. 1.

Phase 0

The algorithm first LOHifies (i.e., constructs a layer-ordered heap from) both X and Y. This is performed by using linear time one-dimensional selection to iteratively remove the largest remaining layer (i.e., the simplest LOH construction method, which is optimal when $\alpha \gg 1$).

Phase 1

Now layer products of the form ${X}^{\left(u\right)}+{Y}^{\left(v\right)}={X}_{1}^{\left(u\right)}+{Y}_{1}^{\left(v\right)},{X}_{1}^{\left(u\right)}+{Y}_{2}^{\left(v\right)},\dots ,{X}_{2}^{\left(u\right)}+{Y}_{1}^{\left(v\right)},\dots$ are considered, where X(u) and Y(v) are layers of their respective LOHs.

In phases 1–2, the algorithm initially considers only the minimum and maximum values in each layer product: $⌊\left(u,v\right)⌋=\left(min\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right),\left(u,v\right),0\right)$, $⌈\left(u,v\right)⌉=\left(max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right),\left(u,v\right),1\right)$. It is unnecessary to compute the Cartesian product of values to build a layer product; instead, only the minimum or maximum values in X(u) and Y(v) are needed. Note that the final value in the tuple uses 0 to indicate that this is the minimum value in the layer product or 1 to indicate the maximum value in the layer product; this ensures that even layer products with homogeneous values satisfy $⌊\left(u,v\right)⌋<⌈\left(u,v\right)⌉$. Scalar values can be compared to tuples: ${X}_{i}+{Y}_{j}\le ⌈\left(u,v\right)⌉=\left(max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right),\left(u,v\right),1\right)↔{X}_{i}+{Y}_{j}\le max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right)$.

Binary heap H is initialized to contain tuple $⌊\left(1,1\right)⌋$. A set of all tuples in H is maintained to prevent duplicates from being inserted into H (this set could be excluded by using the Kaplan et al. proposal scheme). The algorithm proceeds by popping the lexicographically minimum tuple from H. W.l.o.g., there is no guaranteed ordering of the form X(u) + Y(v) ≤ X(u + 1) + Y(v), because it may be that max(X(u) + Y(v)) > min(X(u + 1) + Y(v)); however, lexicographically, $⌊\left(u,v\right)⌋<⌊\left(u+1,v\right)⌋,⌊\left(u,v+1\right)⌋,⌈\left(u,v\right)⌉$; thus, the latter tuples need be inserted into H only after ⌊(u,v)⌋ has been popped from H. Note that for this reason, and to break ties where layer products contain identical values, (u,v) is included in the tuple. ⌈(u,v)⌉ tuples do not insert any new tuples into H when they are popped.
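Python compares tuples lexicographically, so this ordering can be checked directly (a small sketch using 0-based layer indices; the layer contents below are illustrative only):

```python
# Corner tuples sort by value, then (u, v), then the 0/1 min/max flag, so a layer
# product's min corner always precedes its max corner, even when the layer
# product's values are all identical.
def min_corner(X_layers, Y_layers, u, v):
    return (min(X_layers[u]) + min(Y_layers[v]), (u, v), 0)

def max_corner(X_layers, Y_layers, u, v):
    return (max(X_layers[u]) + max(Y_layers[v]), (u, v), 1)

X_layers = [[1], [3, 2], [8, 4, 6, 5]]  # an alpha = 2 LOH of X
Y_layers = [[2], [5, 5]]                # Y's second layer is homogeneous

# Homogeneous layer product: min and max corners share the same value, but the
# flag still orders the min corner first.
print(min_corner(X_layers, Y_layers, 0, 1) < max_corner(X_layers, Y_layers, 0, 1))  # True
```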

Whenever a tuple of the form ⌈(u,v)⌉ is popped from H, the index (u,v) is appended to list q and the size of the layer product |X(u) + Y(v)| = |X(u)|·|Y(v)| is accumulated into integer s. This method proceeds until s ≥ k.

Phase 2

Any remaining tuple in H of the form $\left(max\left({X}^{\left({u}^{\mathrm{\prime }}\right)}+{Y}^{\left({v}^{\mathrm{\prime }}\right)}\right),\left({u}^{\mathrm{\prime }},{v}^{\mathrm{\prime }}\right),1\right)$ has its index (u′,v′) appended to list q. s′ is the total number of elements in each of these (u′,v′) layer products appended to q during phase 2.

Phase 3

The values from every element in each layer product in q are generated. A linear time one-dimensional k-selection is performed on these values and returned.

Proof of correctness

Lemma 2.4 proves that at termination all layer products found in q must contain the minimal k values in X + Y. Thus, by performing one-dimensional k-selection on those values in phase 3, the minimal k values in X + Y are found.

Lemma 2.1. If $⌊\left(u,v\right)⌋$ is popped from H, then both $⌊\left(u-1,v\right)⌋$ (if u > 1) and $⌊\left(u,v-1\right)⌋$ (if v > 1) must previously have been popped from H.

Proof. There is a chain of pops and insertions backwards from $⌊\left(u,v\right)⌋$ to $⌊\left(1,1\right)⌋$.

When both u, v = 1, the lemma is true.

W.l.o.g. if u = 1, this chain is of the form $⌊\left(1,1\right)⌋,\dots ,⌊\left(1,2\right)⌋,\dots ,⌊\left(1,3\right)⌋,\dots ,⌊\left(u,v\right)⌋$, proving the lemma for that case.

Otherwise, both $u,v>1$. Because insertions into H increment either row or column, something of the form $⌊\left(a,v-1\right)⌋$ with $a\le u$ must be inserted into H before inserting $⌊\left(u,v\right)⌋$. $⌊\left(a,v-1\right)⌋<⌊\left(u,v\right)⌋$, so $⌊\left(a,v-1\right)⌋$ must precede $⌊\left(u,v\right)⌋$ in the chain of pops. If a = u, then $⌊\left(u,v-1\right)⌋$ is popped before $⌊\left(u,v\right)⌋$. If a < u, then from the insertion of $⌊\left(a,v-1\right)⌋$ into H, until $⌊\left(u,v-1\right)⌋$ is popped, H must contain something of the form $⌊\left({a}^{\prime },v-1\right)⌋:\text{}{a}^{\prime }\le u$, because popping $⌊\left({a}^{\prime },v-1\right)⌋$ inserts $⌊\left({a}^{\prime }+1,v-1\right)⌋$. $⌊\left({a}^{\prime },v-1\right)⌋<⌊\left(u,v\right)⌋$ when ${a}^{\prime }\le u$; therefore, $⌊\left(u,v\right)⌋$ cannot be popped before any $⌊\left({a}^{\prime },v-1\right)⌋$ currently in H. Because there are a finite number of these a' and they are not revisited, before $⌊\left(u,v\right)⌋$ is popped, $⌊\left(u,v-1\right)⌋$ must be popped. This same process can be repeated with $⌊\left(u-1,b\right)⌋:\text{}b\le v$ to show that $⌊\left(u-1,v\right)⌋$ must be popped before $⌊\left(u,v\right)⌋$, proving the lemma for the final case. □

Lemma 2.2 If $⌈\left(u,v\right)⌉$ is popped from H, then both $⌈\left(u-1,v\right)⌉$ (if u>1) and $⌈\left(u,v-1\right)⌉$ (if v > 1) must previously have been popped from H.

Proof. Inserting $⌈\left(u,v\right)⌉$ requires previously popping $⌊\left(u,v\right)⌋$. By lemma 2.1, this requires previously popping $⌊\left(u-1,v\right)⌋$ (if u > 1) and $⌊\left(u,v-1\right)⌋$ (if v > 1). These pops will insert $⌈\left(u-1,v\right)⌉$ and $⌈\left(u,v-1\right)⌉$ respectively. Thus, $⌈\left(u-1,v\right)⌉$ and $⌈\left(u,v-1\right)⌉$, which are both $<⌈\left(u,v\right)⌉$, are inserted before $⌈\left(u,v\right)⌉$, and will therefore be popped before $⌈\left(u,v\right)⌉$. □

Lemma 2.3 All tuples will be visited in ascending order as they are popped from H.

Proof. Let $⌊\left(u,v\right)⌋$ be popped from H and let $⌊\left(a,b\right)⌋<⌊\left(u,v\right)⌋$. Either w.l.o.g. a < u, bv, or w.l.o.g. a < u, b > v. In the former case, $⌊\left(a,b\right)⌋$ will be popped before $⌊\left(u,v\right)⌋$ by applying induction to lemma 2.1.

In the latter case, lemma 2.1 says that $⌊\left(a,v\right)⌋$ is popped before $⌊\left(u,v\right)⌋$. $⌊\left(a,v\right)⌋<⌊\left(a,v+1\right)⌋<⌊\left(a,v+2\right)⌋<\cdots <⌊\left(a,b\right)⌋<⌊\left(u,v\right)⌋$, meaning that $\mathrm{\forall }r\in \left[v,b\right],⌊\left(a,r\right)⌋<⌊\left(u,v\right)⌋$. After $⌊\left(a,v\right)⌋$ is inserted (necessarily before it is popped), at least one such $⌊\left(a,r\right)⌋$ must be in H until $⌊\left(a,b\right)⌋$ is popped. Thus, all such $⌊\left(a,r\right)⌋$ will be popped before $⌊\left(u,v\right)⌋$.

Ordering on popping with $⌈\left(a,b\right)⌉<⌈\left(u,v\right)⌉$ is shown in the same manner: For $⌈\left(u,v\right)⌉$ to be in H, $⌊\left(u,v\right)⌋$ must have previously been popped. As above, whenever $⌈\left(u,v\right)⌉$ is in H, then $⌊\left(a,v\right)⌋$ must have been popped, inserting $⌊\left(a,v+1\right)⌋$ into H. Each $⌊\left(a,r\right)⌋$ popped inserts $⌊\left(a,r+1\right)⌋$, so at least one $⌊\left(a,r\right)⌋,r\in \left[v,b\right]$ must also be in H until $⌊\left(a,b\right)⌋$ is popped. These $⌊\left(a,r\right)⌋\le ⌊\left(a,b\right)⌋<⌈\left(a,b\right)⌉<⌈\left(u,v\right)⌉$, and so $⌈\left(a,b\right)⌉$ will be popped before $⌈\left(u,v\right)⌉$.

Identical reasoning also shows that $⌊\left(a,b\right)⌋$ will pop before $⌈\left(u,v\right)⌉$ if $⌊\left(a,b\right)⌋<⌈\left(u,v\right)⌉$ or if $⌈\left(a,b\right)⌉<⌊\left(u,v\right)⌋$.

Thus, all tuples are popped in ascending order. □

Lemma 2.4 At the end of phase 2, the layer products whose indices are found in q contain the minimal k values.

Proof. Let (u,v) be the layer product that first makes s ≥ k. There are at least k values of X + Y that are $\le max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right)$; this means that $\tau =max\left(select\left(X+Y,k\right)\right)\le max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right)$. The quality of the elements in layer products in q at the end of phase 1 can only be improved by trading some value for a smaller value, which requires a new value < max(X(u) + Y(v)).

By lemma 2.3, tuples will be popped from H in ascending order; therefore, any layer product $\left({u}^{\prime },{v}^{\prime }\right)$ containing values $<max\left({X}^{\left(u\right)}+{Y}^{\left(v\right)}\right)$ must have had $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$ popped before $⌈\left(u,v\right)⌉$. If $⌈\left({u}^{\prime },{v}^{\prime }\right)⌉$ was also popped, then this layer product is already included in q and cannot improve it. Thus the only layers that need be considered further have had $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$ popped but not $⌈\left({u}^{\prime },{v}^{\prime }\right)⌉$ popped; these can be found by looking for all $⌈\left({u}^{\prime },{v}^{\prime }\right)⌉$ that have been inserted into H but not yet popped.

Phase 2 appends to q all such remaining layer products of interest. Thus, at the end of phase 2, q contains all layer products that will be represented in the k-selection of X + Y.

A Python implementation of this method is shown in Listing 2.

Runtime

Theorem 2.8 proves that the total runtime is ∈ O(n + k).

Lemma 2.5 Let (u′,v′) be a layer product appended to q during phase 2. Either u′ = 1, v′ = 1, or (u′ − 1,v′ − 1) was already appended to q in phase 1.

Proof. Let u′ > 1 and v′ > 1 (otherwise the lemma holds trivially). By lemma 2.3, minimum and maximum layer products are popped in ascending order. By the layer ordering property of X and Y, $max\left({X}^{\left({u}^{\prime }-1\right)}\right)\le min\left({X}^{\left({u}^{\prime }\right)}\right)$ and $max\left({Y}^{\left({v}^{\prime }-1\right)}\right)\le min\left({Y}^{\left({v}^{\prime }\right)}\right)$. Thus, $⌈\left({u}^{\prime }-1,{v}^{\prime }-1\right)⌉<⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$, and so $⌈\left({u}^{\prime }-1,{v}^{\prime }-1\right)⌉$ must be popped (appending (u′ − 1,v′ − 1) to q in phase 1) before $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$. □

Lemma 2.6 s, the number of elements in all layer products appended to q in phase 1, is ∈ O(k).

Proof. (u,v) is the layer product whose inclusion during phase 1 in q achieves s ≥ k; therefore, $s-|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|<k$. This happens when $⌈\left(u,v\right)⌉$ is popped from H.

If k = 1, popping $⌈\left(1,1\right)⌉$ ends phase 1 with s = 1 ∈ O(k).

If k > 1, then at least one layer index is >1: u > 1 or v > 1. W.l.o.g., let u > 1. By lemma 2.2, popping $⌈\left(u,v\right)⌉$ from H requires previously popping $⌈\left(u-1,v\right)⌉$. $|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|=|{X}^{\left(u\right)}|\cdot |{Y}^{\left(v\right)}|\approx \alpha \cdot |{X}^{\left(u-1\right)}|\cdot |{Y}^{\left(v\right)}|=\alpha \cdot |{X}^{\left(u-1\right)}+{Y}^{\left(v\right)}|$ (where ≈ indicates asymptotic behavior); therefore, $|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|\in O\left(|{X}^{\left(u-1\right)}+{Y}^{\left(v\right)}|\right)$. $|{X}^{\left(u-1\right)}+{Y}^{\left(v\right)}|$ is already counted in $s-|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|<k$, and so $|{X}^{\left(u-1\right)}+{Y}^{\left(v\right)}|<k$ and $|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|\in O\left(k\right)$. $s<k+|{X}^{\left(u\right)}+{Y}^{\left(v\right)}|$, and hence s ∈ O(k). □

Lemma 2.7 s′, the total number of elements in all layer products appended to q in phase 2, is ∈ O(k).

Proof. Each layer product appended to q in phase 2 has had $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$ popped in phase 1. By lemma 2.5, either u′ = 1 or v′ = 1 or $⌈\left({u}^{\prime }-1,{v}^{\prime }-1\right)⌉$ must have been popped before $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$.

First consider when u′ > 1 and v′ > 1. Each (u′,v′) matches exactly one layer product (u′ − 1,v′ − 1). Because $⌈\left({u}^{\prime }-1,{v}^{\prime }-1\right)⌉$ must have been popped before $⌊\left({u}^{\prime },{v}^{\prime }\right)⌋$, $⌈\left({u}^{\prime }-1,{v}^{\prime }-1\right)⌉$ was also popped during phase 1. s, the count of all elements whose layer products were inserted into q in phase 1, includes $|{X}^{\left({u}^{\prime }-1\right)}+{Y}^{\left({v}^{\prime }-1\right)}|$ but does not include $|{X}^{\left({u}^{\prime }\right)}+{Y}^{\left({v}^{\prime }\right)}|$ (the latter is appended to q during phase 2). By exponential growth of layers in X and Y, $|{X}^{\left({u}^{\prime }\right)}+{Y}^{\left({v}^{\prime }\right)}|\approx {\alpha }^{2}\cdot |{X}^{\left({u}^{\prime }-1\right)}+{Y}^{\left({v}^{\prime }-1\right)}|$. These $|{X}^{\left({u}^{\prime }-1\right)}+{Y}^{\left({v}^{\prime }-1\right)}|$ values were included in s during phase 1, and thus the total number of elements in all such (u′ − 1,v′ − 1) layer products is ≤ s. Thus the sum of sizes of all layer products (u′,v′) with u′ > 1 and v′ > 1 that are appended to q during phase 2 is asymptotically $\le {\alpha }^{2}\cdot s$.

When either u′ = 1 or v′ = 1, the number of elements in all layer products must be ∈ O(n): ${\sum }_{{u}^{\prime }}|{X}^{\left({u}^{\prime }\right)}+{Y}^{\left(1\right)}|+{\sum }_{{v}^{\prime }}|{X}^{\left(1\right)}+{Y}^{\left({v}^{\prime }\right)}|<2n$; however, it is possible to show that contributions where u′ = 1 or v′ = 1 are ∈ O(k):

W.l.o.g. for u′ > 1, $⌊\left({u}^{\prime },1\right)⌋$ is inserted into H only when $⌊\left({u}^{\prime }-1,1\right)⌋$ is popped. Thus at most one $⌊\left({u}^{\prime },1\right)⌋$ can exist in H at any time. Furthermore, popping $⌊\left({u}^{\prime },1\right)⌋$ from H requires previously popping $⌈\left({u}^{\prime }-1,1\right)⌉$ from H: layer ordering on X implies $max\left({X}^{\left({u}^{\prime }-1\right)}\right)\le min\left({X}^{\left({u}^{\prime }\right)}\right)$ and |Y(1)| = 1 implies min(Y(1)) = max(Y(1)), and so $⌈\left({u}^{\prime }-1,1\right)⌉=\left(max\left({X}^{\left({u}^{\prime }-1\right)}+{Y}^{\left(1\right)}\right),\left({u}^{\prime }-1,1\right),1\right)<⌊\left({u}^{\prime },1\right)⌋=\left(min\left({X}^{\left({u}^{\prime }\right)}+{Y}^{\left(1\right)}\right),\left({u}^{\prime },1\right),0\right)$. Thus $⌈\left({u}^{\prime }-1,1\right)⌉$ has been popped from H and counted in s. By the exponential growth of layers, the contribution of all such u′ > 1, v′ = 1 will be asymptotically $\le \alpha \cdot s$, and so the combined contributions of u′ > 1, v′ = 1 and u′ = 1, v′ > 1 will be asymptotically $\le 2\alpha \cdot s$.

When u′ = v′ = 1, the layer product contains 1 element.

Therefore, s′, the total number of elements found in layer products appended to q during phase 2, has ${s}^{\mathrm{\prime }}\le \left({\alpha }^{2}+2\alpha \right)\cdot s+1$. By lemma 2.6, $s\in O\left(k\right)$, and thus s′ ∈ O(k). □

Theorem 2.8 The total runtime of the algorithm is ∈ O(n + k).

Proof. For any constant α > 1, LOHification of X and Y runs in linear time, and so phase 0 runs ∈ O(n).

The total number of layers in each LOH is ≈ logα(n); therefore, the total number of layer products is $\approx {\mathrm{log}}_{\alpha }^{2}\left(n\right)$. In the worst-case scenario, the heap insertions and pops (and corresponding set insertions and removals) will sort $\approx 2{\mathrm{log}}_{\alpha }^{2}\left(n\right)$ elements, because each layer product may be inserted as both $⌊·⌋$ and $⌈·⌉$; the worst-case runtime via comparison sort will be $\in O\left({\mathrm{log}}_{\alpha }^{2}\left(n\right)\mathrm{log}\left({\mathrm{log}}_{\alpha }^{2}\left(n\right)\right)\right)\subset o\left(n\right)$. The operations to maintain a set of indices in the heap have the same runtime per operation as inserting into/removing from a binary heap, and so can be amortized out. Thus, the runtimes of phases 1–2 are amortized out by the O(n) runtime of phase 0.

Lemma 2.6 shows that sO(k). Likewise, lemma 2.7 shows that s′∈ O(k). The number of elements in all layer products in q during phase 3 is s + s′∈ O(k). Thus, the number of elements on which the one-dimensional selection is performed will be $\in O\left(k\right)$. Using a linear time one-dimensional selection algorithm, the runtime of the k-selection in phase 3 is ∈ O(k).

The total runtime of all phases ∈ O(n + k + k + k) = O(n + k). □

Results

Runtimes of the naive $O(n^2\mathrm{log}(n)+k)$ method (chosen for reference because it is the easiest method to implement and because of the fast runtime constant on Python’s built-in sorting routine), the soft heap-based method from Kaplan et al., and the LOH-based method in this paper are shown in Table 1. The proposed approach achieves a >295× speedup over the naive approach and a >18× speedup over the soft heap approach. LOHs are more lightweight than soft heaps: they have contiguous memory access patterns and far fewer pointer dereferences.

Discussion

The algorithm can be thought of as “zooming out” as it pans through the layer products, thereby passing the value threshold at which the kth best value Xi + Yj occurs. It is somewhat reminiscent of skip lists (Pugh, 1990); however, where a skip list begins coarse and progressively refines the search, this approach begins finely and becomes progressively coarser. The notion of retrieving the best k values while “overshooting” the target by as little as possible results in some values that may be considered but which will not survive the final one-dimensional selection in phase 3. This is reminiscent of “corruption” in Chazelle’s soft heaps. Like soft heaps, this method eschews sorting in order to prevent a runtime $\in \mathrm{\Omega }\left(n\mathrm{log}\left(n\right)\right)$ or ∈ ω(k log(k)). But unlike soft heaps, LOHs can be constructed easily using only an implementation of median-of-medians (or any other linear time one-dimensional selection algorithm).

Phase 3 is the only part of the algorithm in which k appears in the runtime formula. This is significant because the layer products in q at the end of phase 2 could be returned in their compressed form (i.e., as the two layers to be combined). The total runtime of phases 0–2 is ∈ O(n). It may be possible to recursively perform X + Y selection on layer products X(u) + Y(v) to compute layer products constituting exactly the k values in the solution, still in factored Cartesian layer product form. Similarly, it may be possible to perform the one-dimensional selection without fully inflating every layer product into its constituent elements. For some applications, a compressed form may be acceptable, thereby making it plausible to remove the requirement that the runtime be ∈ Ω(k).

As noted in theorem 2.8, even fully sorting all of the minimum and maximum layer product tuples would be $\in o\left(n\right)$; sorting in this manner may be preferred in practice, because it simplifies the implementation (Listing 3) at the cost of incurring greater runtime in practice when $k\ll {n}^{2}$. Furthermore, Listing 3 is unsuitable for online processing (i.e., where X and Y are extended on the fly or where several subsequent selections are performed), whereas Listing 2 could be adapted to those uses.

Phase 0 (which performs LOHification) is the slowest part of the presented Python implementation; it would benefit from a practically faster LOHification routine.

The fast practical performance is partially due to the algorithm’s simplicity and partially due to the contiguous nature of LOHs. Online data structures like soft heaps are less easily suited to contiguous access, because they support efficient removal and therefore move pointers to memory rather than moving the contents of the memory.

The choice of α affects performance through the cost of LOHifying and the amount by which the number of generated values overshoots the k minimum values wanted: when α ≈ 1, LOHify effectively sorts X and Y, but generates few extra values; when $\alpha \gg 1$, LOHify has a linear runtime, but generates more extra values, which need to be removed by the final k-selection.

Conclusion

LOHs can be constructed in linear time and used to produce a theoretically optimal algorithm for selecting the minimal k values from X + Y. The new optimal algorithm presented here is faster in practice than the existing soft heap-based optimal algorithm.

Appendix

Python code

Listing 1. LayerOrderedHeap.py: A class for LOHifying, retrieving layers, and the minimum and maximum value in a layer.

# https://stackoverflow.com/questions/10806303/python-implementation-of-median-of-medians-algorithm
def median_of_medians_select(L, j):  # returns j-th smallest value
    if len(L) < 10:
        L.sort()
        return L[j]
    S = []
    lIndex = 0
    while lIndex+5 < len(L)-1:
        S.append(L[lIndex:lIndex+5])
        lIndex += 5
    S.append(L[lIndex:])
    Meds = []
    for subList in S:
        Meds.append(median_of_medians_select(subList, int((len(subList)-1)/2)))
    med = median_of_medians_select(Meds, int((len(Meds)-1)/2))
    L1 = []
    L2 = []
    L3 = []
    for i in L:
        if i < med:
            L1.append(i)
        elif i > med:
            L3.append(i)
        else:
            L2.append(i)
    if j < len(L1):
        return median_of_medians_select(L1, j)
    elif j < len(L2) + len(L1):
        return L2[0]
    else:
        return median_of_medians_select(L3, j-len(L1)-len(L2))

def partition(array, left_n):
    n = len(array)
    right_n = n - left_n
    # median_of_medians_select argument is index, not size:
    max_value_in_left = median_of_medians_select(array, left_n-1)
    left = []
    right = []
    for i in range(n):
        if array[i] < max_value_in_left:
            left.append(array[i])
        elif array[i] > max_value_in_left:
            right.append(array[i])
    num_at_threshold_in_left = left_n - len(left)
    left.extend([max_value_in_left]*num_at_threshold_in_left)
    num_at_threshold_in_right = right_n - len(right)
    right.extend([max_value_in_left]*num_at_threshold_in_right)
    return left, right

def layer_order_heapify_alpha_eq_2(array):
    n = len(array)
    if n == 0:
        return []
    if n == 1:
        # a single layer containing the lone value:
        return [array]
    new_layer_size = 1
    layer_sizes = []
    remaining_n = n
    while remaining_n > 0:
        if remaining_n >= new_layer_size:
            layer_sizes.append(new_layer_size)
        else:
            layer_sizes.append(remaining_n)
        remaining_n -= new_layer_size
        new_layer_size *= 2
    result = []
    for i,ls in enumerate(layer_sizes[::-1]):
        small_vals,large_vals = partition(array, len(array) - ls)
        array = small_vals
        result.append(large_vals)
    return result[::-1]

class LayerOrderedHeap:
    def __init__(self, array):
        self._layers = layer_order_heapify_alpha_eq_2(array)
        self._min_in_layers = [ min(layer) for layer in self._layers ]
        self._max_in_layers = [ max(layer) for layer in self._layers ]
        #self._verify()

    def __len__(self):
        return len(self._layers)

    def _verify(self):
        for i in range(len(self)-1):
            assert(self.max(i) <= self.min(i+1))

    def __getitem__(self, layer_num):
        return self._layers[layer_num]

    def min(self, layer_num):
        assert( layer_num < len(self) )
        return self._min_in_layers[layer_num]

    def max(self, layer_num):
        assert( layer_num < len(self) )
        return self._max_in_layers[layer_num]

    def __str__(self):
        return str(self._layers)

Listing 2. CartesianSumSelection.py: A class for efficiently performing selection on X + Y in Θ(n + k) steps.

from LayerOrderedHeap import *
import heapq

class CartesianSumSelection:
    def _min_tuple(self,i,j):
        # False for min corner, True for max corner
        return (self._loh_a.min(i) + self._loh_b.min(j), (i,j), False)

    def _max_tuple(self,i,j):
        # False for min corner, True for max corner
        return (self._loh_a.max(i) + self._loh_b.max(j), (i,j), True)

    def _in_bounds(self,i,j):
        return i < len(self._loh_a) and j < len(self._loh_b)

    def _insert_min_if_in_bounds(self,i,j):
        if not self._in_bounds(i,j):
            return
        if (i,j,False) not in self._hull_set:
            self._hull_set.add( (i,j,False) )
            heapq.heappush(self._hull_heap, self._min_tuple(i,j))

    def _insert_max_if_in_bounds(self,i,j):
        if not self._in_bounds(i,j):
            return
        if (i,j,True) not in self._hull_set:
            self._hull_set.add( (i,j,True) )
            heapq.heappush(self._hull_heap, self._max_tuple(i,j))

    def __init__(self, array_a, array_b):
        self._loh_a = LayerOrderedHeap(array_a)
        self._loh_b = LayerOrderedHeap(array_b)
        self._hull_heap = [ self._min_tuple(0,0) ]
        # False for min:
        self._hull_set = { (0,0,False) }
        self._num_elements_popped = 0
        self._layer_products_considered = []
        self._full_cartesian_product_size = len(array_a) * len(array_b)

    def _pop_next_layer_product(self):
        result = heapq.heappop(self._hull_heap)
        val, (i,j), is_max = result
        self._hull_set.remove( (i,j,is_max) )
        if not is_max:
            # when a min corner is popped, push its own max corner and the neighboring min corners
            self._insert_min_if_in_bounds(i+1,j)
            self._insert_min_if_in_bounds(i,j+1)
            self._insert_max_if_in_bounds(i,j)
        else:
            # when a max corner is popped, do not push
            self._num_elements_popped += len(self._loh_a[i]) * len(self._loh_b[j])
            self._layer_products_considered.append( (i,j) )
        return result

    def select(self, k):
        assert( k <= self._full_cartesian_product_size )
        while self._num_elements_popped < k:
            self._pop_next_layer_product()
        # also consider all layer products still in the hull
        for val, (i,j), is_max in self._hull_heap:
            if is_max:
                self._num_elements_popped += len(self._loh_a[i]) * len(self._loh_b[j])
                self._layer_products_considered.append( (i,j) )
        # generate values in layer products
        # Note: this is not always necessary, and could lead to a potentially large speedup.
        candidates = [ val_a+val_b for i,j in self._layer_products_considered for val_a in self._loh_a[i] for val_b in self._loh_b[j] ]
        print( 'Ratio of total popped candidates to k: {}'.format(len(candidates) / k) )
        k_small_vals, large_vals = partition(candidates, k)
        return k_small_vals

Listing 3. SimplifiedCartesianSumSelection.py: A simplified implementation of Listing 2. This implementation is slower when $k\ll n^2$; however, it has the same asymptotic runtime for any n and k: Θ(n + k).

from LayerOrderedHeap import *

class SimplifiedCartesianSumSelection:
    def _min_tuple(self,i,j):
        # False for min corner, True for max corner
        return (self._loh_a.min(i) + self._loh_b.min(j), (i,j), False)

    def _max_tuple(self,i,j):
        # False for min corner, True for max corner
        return (self._loh_a.max(i) + self._loh_b.max(j), (i,j), True)

    def __init__(self, array_a, array_b):
        self._loh_a = LayerOrderedHeap(array_a)
        self._loh_b = LayerOrderedHeap(array_b)
        self._full_cartesian_product_size = len(array_a) * len(array_b)
        self._sorted_corners = sorted([self._min_tuple(i,j) for i in range(len(self._loh_a)) for j in range(len(self._loh_b))] + [self._max_tuple(i,j) for i in range(len(self._loh_a)) for j in range(len(self._loh_b))])

    def select(self, k):
        assert( k <= self._full_cartesian_product_size )
        candidates = []
        index_in_sorted = 0
        num_elements_with_max_corner_popped = 0
        while num_elements_with_max_corner_popped < k:
            val, (i,j), is_max = self._sorted_corners[index_in_sorted]
            new_candidates = [ v_a+v_b for v_a in self._loh_a[i] for v_b in self._loh_b[j] ]
            if is_max:
                num_elements_with_max_corner_popped += len(new_candidates)
            else:
                # Min corners will be popped before the corresponding max corner;
                # this gets a superset of what is needed (just as in phase 2)
                candidates.extend(new_candidates)
            index_in_sorted += 1
        print( 'Ratio of total popped candidates to k: {}'.format(len(candidates) / k) )
        k_small_vals, large_vals = partition(candidates, k)
        return k_small_vals