Find the intervals from a set of intervals in which a given point lies

6
|
|
viewed: 40 times
Jul 08 '2014 at 10:21AM

2 Answers

The most trivial solution the problem is to go over all the intervals and see whether the point lies in that range. But then this is a little inefficient process. When the interval ranges are too large then this process can become prohibitive. The most intuitive alternative that comes to the mind is some form of tree like data structure. And yes there exists a specialized tree known as Interval Tree [(topcoder) (pdf)].

This image below illustrates a simple interval tree

To construct an interval tree from the input given in the question, the following steps should be taken

  1. Firstly we will arrange the end points of all the intervals in the increasing order. So for the above given input it will become 0 9 10 15 20 25 28 40 50 60 90. If there are N intervals, there will be 2N end-points and hence sorting will take O(NlogN) time. The entire range of all the intervals now becomes 0-90.

  2. We start by taking the entire range of all the intervals and dividing it in half at x_center (in practice, x_center should be picked to keep the tree relatively balanced). This gives three sets of intervals, those completely to the left of x_center which we'll call S_left, those completely to the right of x_center which we'll call S_right, and those overlapping x_center which we'll call S_center.

  3. The intervals in S_left and S_right are recursively divided in the same manner until there are no intervals left.

  4. The intervals in S_center that overlap the center point are stored in a separate data structure linked to the node in the interval tree. This data structure consists of two lists, one containing all the intervals sorted by their beginning points, and another containing all the intervals sorted by their ending points.

The result is a binary tree with each node storing:

  • A center point
  • A pointer to another node containing all intervals completely to the left of the center point
  • A pointer to another node containing all intervals completely to the right of the center point
  • All intervals overlapping the center point sorted by their beginning point
  • All intervals overlapping the center point sorted by their ending point

To find the intervals in which a given number 'x' lies we do the following:

  1. For each tree node, x is compared to x_center, the midpoint used in node construction above. If x is less than x_center, the leftmost set of intervals, S_left, is considered. If x is greater than x_center, the rightmost set of intervals, S_right, is considered.

  2. As each node is processed as we traverse the tree from the root to a leaf, the ranges in its S_center are processed. If x is less than x_center, we know that all intervals in S_center end after x, or they could not also overlap x_center. Therefore, we need only find those intervals in S_center that begin before x. Suppose we find the closest number no greater than x in this list. All ranges from the beginning of the list to that found point overlap x because they begin before x and end after x (as we know because they overlap x_center which is larger than x). Thus, we can simply start enumerating intervals in the list until the endpoint value exceeds x.

  3. Likewise, if x is greater than x_center, we know that all intervals in S_center must begin before x, so we find those intervals that end after x using the list sorted by interval endings.

  4. If x exactly matches x_center, all intervals in S_center can be added to the results without further processing and tree traversal can be stopped.

So we simply traverse down the Binary tree and consider all the sub-trees which may contain the range in which the number lies. The input is read from a file and it contains N lines if there are N ranges and each line contains the starting and ending co-ordinate separated by space. A sample input can be

    5 20
    15 30
    28 40 
    50 70
    0 9 
    60 90

struct tree
{
    pair< list < int >, list < int > > S;
    int node_val;
    tree* left;
    tree* right;
}*root;

// This function builds the tree out of the range values
void build_tree(tree*&, int,int, vector < int >);

// This function adds the appropriate ranges in the tree nodes
void add_ranges(tree* root, map < int,int >);

// Helper function for add_ranges
void add_pair(tree *root, pair < int,int >);

// Returns the number of valid ranges for a given number
int num_of_ranges(tree* root,int num);

int main()
{
    // Read input from a file and store it in suitable data structure
    ifstream file;
    file.open("input.txt");
    string line;
    vector < int > numbers;
    map < int,int > range;
    while(file.good())
    {
        getline(file,line);
        istringstream stream(line);

        int num1,num2;
        stream >> num1;
        stream >> num2;
        numbers.push_back(num1);
        numbers.push_back(num2);
        range[num1] = num2;
    }

    sort(numbers.begin(),numbers.end());

    // Display the ranges that have been read from the file
    map < int,int > :: iterator jt;
    for(jt = range.begin();jt!=range.end(); jt++)
        cout << "(" << jt->first << "-" << jt->second << ") ";
    cout << endl;

    // Build tree and add the ranges
    build_tree(root,0, numbers.size()-1, numbers);  
    add_ranges(root,range);

    // Enter the number for which ranges are to be searched
    cout <<  "Enter a number" << endl;
    int num;
    cin >> num;
    cout << "The number of ranges for " << num << " are " << num_of_ranges(root,num) << endl;

    system("pause");
    return 0;
}

void build_tree(tree*& root, int left, int right, vector < int > numbers)
{
    if(left > right)
        return;

    root = new tree;
    int index = (left+right)/2;
    root->node_val = numbers[index];
    root->left = NULL;
    root->right = NULL;
    build_tree(root->left, left, index-1,numbers);
    build_tree(root->right, index+1, right,numbers);
}

void add_ranges(tree* root, map < int,int > ranges)
{
    map < int,int > :: iterator it;
    for(it = ranges.begin(); it!= ranges.end(); it++)
    {
        pair < int,int > temp (it->first,it->second);
        add_pair(root,temp);
    }
}

void add_pair(tree* root, pair < int,int > range)
{
    if(root==NULL)
        return;

    if(range.second < root->node_val)
        add_pair(root->left,range);

    else if(range.first > root->node_val)
        add_pair(root->right,range);

    else
    {
        root->S.first.push_back(range.first);
        root->S.first.sort();
        root->S.second.push_back(range.second);
        root->S.second.sort();
    }
}

int num_of_ranges(tree* root,int num)
{

    if(root == NULL)        
        return 0;

    int count = 0;

    if(num < root->node_val)
    {
        list < int > temp = root->S.first;
        list < int >:: iterator it;
        for(it = temp.begin(); it!=temp.end(); it++)
        {
            if(*it < num)
                count++;
        }
        return count + num_of_ranges(root->left,num);
    }
    else if (num > root->node_val)
    {
        list < int > temp = root->S.second;
        list < int >:: iterator it;
        for(it = temp.begin(); it!=temp.end(); it++)
        {
            if(num < *it)
                count++;
        }
        return count + num_of_ranges(root->right,num);
    }
    else
    {
        count = root->S.first.size();
        return count;
    }   
}

As we can see the pre-processing time to construct the data structure is O(n log n), however the query time is O(log n). This trade off of higher pre-processing time for lower query time is often desirable when we have to make a lot of queries.

Look at it from a practical point of view, if there is product for which a higher investment in the development phase will give a much faster performance in deployment, then that is always preferred.

Moreover interval trees are dynamic, and support insertion and deletion of intervals during runtime. Hence a higher preprocessing time, is often offset by the lower query time. Anyways this is what the interviewer also expects you to answer.

Jul 08 '2014 at 10:21AM

  • I feel the explanation is little vague. can u plzz expain what happens after Btree is created?
    vishu
    |
    Jul 08 '2014 at 10:21AM
    |
    reply
  • Firstly a Binary Tree is different from Btree and secondly it will be very difficult to explain the solution in a comment, the best would be for you to work out the example in the question using the steps that been outlined above.
    Sachin Gupta
    |
    Jul 08 '2014 at 10:21AM
    |
    reply
  • ok....i tried working with code...but did'nt get it...just explain what add_pair function do?
    vishu
    |
    Jul 08 '2014 at 10:21AM
    |
    reply
  • 'add_pair' is the function where we look for the correct node in the Binary Tree where we need to place a particular pair. This is akin to Binary Searching. You check the pair range with node values, and accordingly go in the left sub-tree or right sub-tree, or insert it in the current node
    Sachin Gupta
    |
    Jul 08 '2014 at 10:21AM
    |
    reply
  • What if we keep intervals sorted by start in Start array and end in End array. Now, we just have to perform two binary searches. One for removing all the elements ending before x and other for removing all segments starting after x. These would be disjoint hence no overhead.
    tushar1391
    |
    Jul 08 '2014 at 10:21AM
    |
    reply

if you want to return list if intervals then segment tree is good idea but if you want to return just number of intervals only then I think Fenwick Tree is batter

Segment Tree Complexity : C*(log n + no of output intervals) , C has heigher value compare to fenwick tree C1

Fenwick Tree : Complexity: C1* (log n) fast as it is array based easy to implement 10-15 line code is enough.

Oct 08 '2014 at 01:45AM

Answer this Question

Please login to answer the question.