Point-region QuadTree (PR Quadtree)

Introduction

A rooted tree in which every node has 4 children:

  • each node corresponds to a square in the plane
  • each child of a node corresponds to a quadrant of the
    square of the node
  • each node contains a list of maximum K points. K is called the capacity of the PR Quadtree

PR QuadTree is often used to represent a collection of data points in two dimensions space.
Point QuadTree is a special case of PR QuadTree where its capacity == 1 (K == 1)

Operations on PR Quadtree

1. Insert new point P to the tree
The basic idea is, starting from root, we try to insert point to the node's point list if it still has capacity. Otherwise, we split the node's square into 4 parts and add 4 corresponding children. Then, we insert the point to the node that contain it.

If current node does not contain the point P: return
Else 
    If current node contains < K point: 
        push P to current node's point list
        return
    Else
        Create 4 children 
        Insert P to 4 children

Complexity: O(h), h is the height of the tree. Height of the tree depends on the distribution of the points on the space. Worst case can be n/4, Average case is log(n/4)/log4

A QuadTree with K = 4, contains 500 points.

2. Find all the points in a region.
The region can be any shape (rectangle, circle, ...), as long as you can implement the intersection logic of it with a square.

Find all the point in green circle
  • starting from root, if the current node does not intersect the querying region then return
  • otherwise:
    • we check each point in current node if it is inside the querying region and add to the result
    • if current node has children, recursively query its children.
queryCircle(circle, res) {
    if (!this.boundary.intersectCircle(circle)) {
        return res;
    } else {
        for(let p of this.points) {
            if (circle.contains(p)) {
                res.push(p);
            }
        }

        if (this.divided) {
            this.topLeft.queryCircle(circle, res);
            this.topRight.queryCircle(circle, res);
            this.bottomLeft.queryCircle(circle, res);
            this.bottomRight.queryCircle(circle, res);
        }

        return res;
    }
}

Please check the source code here and the demo here.

Proximity server system design

Proximity servers

Proximity servers are services like Yelp, NearBy in which users can discover near by attractions like restaurant, temples, market ...
Functional requirement:

  • Users can add/delete/update Places.
  • Given their location (longitude/latitude), users can find all nearby places within a given radius.
  • 500M places, 100K QPS

Non-functional requirement:

  • users should have real-time search experience
  • heavy search load.
Naive solution

Using above schema and store Places in a relational DB. To find all nearyby places for a given location (X,Y) within radius R, we can query:

Select * From Place
Where lat >= X-R And lat <= X+R And long >= Y-R And long <= Y-R

This query is not 100% accurate, as we are finding all places in a square, not a circle, but for simplicity, just use it for now.
With 500M rows in the DB, this query will be very slow even with indexes, because there are a lot of places between X-R and X+R, and a lot of places between Y-R and Y+R. The storage engine needs to merge these 2 huge lists.

Better Solution

We divide the whole map into smaller grids. Each grid store all places residing within its boundary. Based on location, we can calculate the gridId for each place.

If the grid size equal to the distance we want to query, we only need to query within the grid which contains the given location and neighboring grids.

Select * From Place
Where gridId in (GridId0, GridId1, ..., GridId8) And
      lat >= X-R And lat <= X+R And long >= Y-R And long <= Y-R

This solution still can be very slow if we query in grids having a lot of places (e.g. big city center). This problem can be solved if we can dynamically adjust the grid size so that each grid only contains maxium a fixed reasonable number of places.

Better Better Solution using PR QuadTree

We can use a PR QuadTree with capacity of 500 to represent the map.
Number of node = 500M / 500 = 1M
Each location we need to store long, lat and id. Total memory needed:
24 bytes * 500M = 12 GB
=> can fit in moden server.

  • Sharding: if the number of place grows in the future, we can shard the place based on PlaceId. To find nearby places of a given location (X,Y,R), we query all the sharded servers, and aggregate the result.

References: