Pareto Fronts Made Easy!
22 July 2024
Suppose you have a set of different objects, each object having different scores for a particular set of metrics.
Suppose we ask ourselves this question:
How does one select as many objects as possible, say to a set , such that there are no two distinct objects and where all metrics of are lower compared to ?
The set of objects that we want to obtain here is called a Pareto front. If you look at the Wikipedia page, this particular concept is widely used for optimizations in engineering. However, what I’ll be discussing is just a small subset of it, particularly on simple objectives such as the main question above.
I’ll also discuss only the case for 2 metrics and 3 metrics, respectively the 2D Pareto front and the 3D Parento front. For more than 3 metrics, it would require more sophisticated (approximation) algorithms which is out of the scope of this article.
Finally, for simplicity, we shall assume that all values within the same metric are distinct. For example, if we have two points and , then and .
2D Pareto Front
An example where we are to obtain a 2D Pareto front is this Kattis problem: Physical Music.
Let me simplify the problem statement:
There are CD singles each with their ranks on Single Top and on Download Top . We want to select as many CD singles as possible such that for any CD single in this selection we can’t find any other CD single with both higher rank on Single Top and lower rank on Download Top . Output the complement of this selection.
Suppose we represent these CD singles as point with the ranks as coordinates in the 2D Cartesian plane (, ). Basically the Pareto front here is the set of points such that for any point in this selection, we can’t find any other point with both higher -coordinate and lower -coordinate.
Now let’s take a look at the sample input:
- The first part has three points: . All three points lie on the Pareto front, so we output .
- The second part has four points: . Both points and do not lie on the Pareto front because and have both higher single rank and lower download rank than and , respectively. Therefore, we output both and , meaning the actual output is (because there are 2 points) and then and (the download ranks in sorted order).
Having visualized the sample input, let’s solve the problem:
- We shall use a sorted set (like AVL tree) to solve this problem, denote this sorted set .
- Go through the points in ascending order of .
- For any point , insert to . If there are currently no element smaller than in , then belongs to the Pareto front.
- Since we’re taking the complement of this Pareto front, we simply negate the condition: we output if there is an element smaller than it in at the moment of insertion of .
Overall, the algorithm takes time due to sorting of the points by , followed by insertions of , each costing . Thus, one should be able to pass the time limit even if .
If we do it naively using double loops, it will take , which is suboptimal for the problem.
3D Pareto Front
This would be a great opportunity to use another Kattis problem as an example: Excellent Engineers.
Here’s the simplified problem statement:
Suppose we have points in the 3D cartesian plane. We want to shortlist these points such that for any point in this shortlist, none of the points have lower values of , , and simultaneously. Simply output the size of the 3D Pareto front.
So how exactly can we extend from the 2D scenario to this?
- Again, let’s use a sorted set and call this .
- Sort these points in ascending -coordinate.
- For every point that we want to insert:
- Insert to .
- Suppose is the largest element of that is smaller than and is the point with the corresponding . Note that this means .
- If such point exists and , then will never belong to the Pareto front and we have to redelete it from . This is because were inserted before and thus . Combining the three inequalities led to such conclusion.
- Otherwise, we add to the Pareto front and keep at .
- While there exists an element of that is larger than , do the following:
- Suppose is the smallest such element and is the point with the corresponding .
- If , we remove from . This is because we know that and , but . So both points might still belong to the Pareto front, but since we only check the previous -coordinate during the insertion of a new point, we want to avoid false positives and maintain the elements of in ascending order to be in descending order of . The image below will show you a scenario where this deletion actually matters.
- Otherwise (), we stop the process and move on to the next point to be inserted, if any.
Overall, the algorithm still takes time because of these reasons:
- sorting by -coordinates takes time
- every insertion of a point takes time, so in total this takes time
- all while loops combined ensure that every point is deleted at most once, so these while loops will take at most another time (see also: Amortized analysis)
Similar to the 2D case, if we do this naively, it will take time, which is also suboptimal for the problem.