Given a set of intervals such as (10,20), (15,25), (28,40), (50,70), (0,9) (60,90) and build a data structure. Query the data structure for point x, and it find out all the intervals that contain this point x.
The most trivial solution the problem is to go over all the intervals and see whether the point lies in that range. But then this is a little inefficient process. When the interval ranges are too large then this process can become prohibitive. The most intuitive alternative that comes to the mind is some form of tree like data structure. And yes there exists a specialized tree known as Interval Tree [(topcoder) (pdf)].
This image below illustrates a simple interval tree
To construct an interval tree from the input given in the question, the following steps should be taken
Firstly we will arrange the end points of all the intervals in the increasing order. So for the above given input it will become 0 9 10 15 20 25 28 40 50 60 90. If there are N intervals, there will be 2N end-points and hence sorting will take O(NlogN) time. The entire range of all the intervals now becomes 0-90.
We start by taking the entire range of all the intervals and dividing it in half at x_center (in practice, x_center should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of x_center which we'll call S_left, those completely to the right of x_center which we'll call S_right, and those overlapping x_center which we'll call S_center.
The intervals in S_left and S_right are recursively divided in the same manner until there are no intervals left.
The intervals in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.
The result is a binary tree with each node storing:
To find the intervals in which a given number 'x' lies we do the following:
For each tree node, x is compared to x_center, the midpoint used in node construction above. If x is less than x_center, the leftmost set of intervals, S_left, is considered. If x is greater than x_center, the rightmost set of intervals, S_right, is considered.
As each node is processed as we traverse the tree from the root to a leaf, the ranges in its S_center are processed. If x is less than x_center, we know that all intervals in S_center end after x, or they could not also overlap x_center. Therefore, we need only find those intervals in S_center that begin before x. Suppose we find the closest number no greater than x in this list. All ranges from the beginning of the list to that found point overlap x because they begin before x and end after x (as we know because they overlap x_center which is larger than x). Thus, we can simply start enumerating intervals in the list until the endpoint value exceeds x.
Likewise, if x is greater than x_center, we know that all intervals in S_center must begin before x, so we find those intervals that end after x using the list sorted by interval endings.
If x exactly matches x_center, all intervals in S_center can be added to the results without further processing and tree traversal can be stopped.
So we simply traverse down the Binary tree and consider all the sub-trees which may contain the range in which the number lies. The input is read from a file and it contains N lines if there are N ranges and each line contains the starting and ending co-ordinate separated by space. A sample input can be
5 20
15 30
28 40
50 70
0 9
60 90
struct tree
{
pair< list < int >, list < int > > S;
int node_val;
tree* left;
tree* right;
}*root;
// This function builds the tree out of the range values
void build_tree(tree*&, int,int, vector < int >);
// This function adds the appropriate ranges in the tree nodes
void add_ranges(tree* root, map < int,int >);
// Helper function for add_ranges
void add_pair(tree *root, pair < int,int >);
// Returns the number of valid ranges for a given number
int num_of_ranges(tree* root,int num);
int main()
{
// Read input from a file and store it in suitable data structure
ifstream file;
file.open("input.txt");
string line;
vector < int > numbers;
map < int,int > range;
while(file.good())
{
getline(file,line);
istringstream stream(line);
int num1,num2;
stream >> num1;
stream >> num2;
numbers.push_back(num1);
numbers.push_back(num2);
range[num1] = num2;
}
sort(numbers.begin(),numbers.end());
// Display the ranges that have been read from the file
map < int,int > :: iterator jt;
for(jt = range.begin();jt!=range.end(); jt++)
cout << "(" << jt->first << "-" << jt->second << ") ";
cout << endl;
// Build tree and add the ranges
build_tree(root,0, numbers.size()-1, numbers);
add_ranges(root,range);
// Enter the number for which ranges are to be searched
cout << "Enter a number" << endl;
int num;
cin >> num;
cout << "The number of ranges for " << num << " are " << num_of_ranges(root,num) << endl;
system("pause");
return 0;
}
void build_tree(tree*& root, int left, int right, vector < int > numbers)
{
if(left > right)
return;
root = new tree;
int index = (left+right)/2;
root->node_val = numbers[index];
root->left = NULL;
root->right = NULL;
build_tree(root->left, left, index-1,numbers);
build_tree(root->right, index+1, right,numbers);
}
void add_ranges(tree* root, map < int,int > ranges)
{
map < int,int > :: iterator it;
for(it = ranges.begin(); it!= ranges.end(); it++)
{
pair < int,int > temp (it->first,it->second);
add_pair(root,temp);
}
}
void add_pair(tree* root, pair < int,int > range)
{
if(root==NULL)
return;
if(range.second < root->node_val)
add_pair(root->left,range);
else if(range.first > root->node_val)
add_pair(root->right,range);
else
{
root->S.first.push_back(range.first);
root->S.first.sort();
root->S.second.push_back(range.second);
root->S.second.sort();
}
}
int num_of_ranges(tree* root,int num)
{
if(root == NULL)
return 0;
int count = 0;
if(num < root->node_val)
{
list < int > temp = root->S.first;
list < int >:: iterator it;
for(it = temp.begin(); it!=temp.end(); it++)
{
if(*it < num)
count++;
}
return count + num_of_ranges(root->left,num);
}
else if (num > root->node_val)
{
list < int > temp = root->S.second;
list < int >:: iterator it;
for(it = temp.begin(); it!=temp.end(); it++)
{
if(num < *it)
count++;
}
return count + num_of_ranges(root->right,num);
}
else
{
count = root->S.first.size();
return count;
}
}
As we can see the pre-processing time to construct the data structure is O(n log n), however the query time is O(log n). This trade off of higher pre-processing time for lower query time is often desirable when we have to make a lot of queries.
Look at it from a practical point of view, if there is product for which a higher investment in the development phase will give a much faster performance in deployment, then that is always preferred.
Moreover interval trees are dynamic, and support insertion and deletion of intervals during runtime. Hence a higher preprocessing time, is often offset by the lower query time. Anyways this is what the interviewer also expects you to answer.
Please login to answer the question.