X
Write a Tutorial

Trie

Submitted by careerstack

A trie is the most suitable data structure for making a dictionary of words, when we are also concerned about the prefixes of a word. A practical use of this structure can be in auto completion for search bars, or while searching names in a phone dictionary. There are multiple implementations of Trie, but we will be giving a basic one here. You can read more about trie here.

A node in a trie represents a single character of the whole string. So at any level of the node we can have a word formed by the characters from the root node to that node but that may not form a valid word, so each node must also contain a bool value, telling whether the word formed by the characters till that node form a valid word or not.

When we are searching for a valid word, while traversing down from the root the depth of the node is the number of prefixes it has and number of intermediate node which have a bool value as true, represents a valid prefix of that word. The given below representation does not store a bool field, and assumes whatever is present in the DS forms a valid word.

#include< iostream>
#include< stdio.h>
#include< string.h>
#define NOT_FOUND 0
#define FOUND 1
using namespace std;

struct trie_node
{
       int words;
       int prefixes;
       trie_node* child[26];

}*root=NULL;

// typedef so that the name becomes short
typedef struct trie_node tn;

//initialize the trie
void initializeRoot(tn *root)             
{
    tn* node=new tn;
    root=node;

    //initializing the trie root node
    for(int i=0;i < 26;i  )
             root-> child[i]=NULL;

    root-> words=0;
    root-> prefixes=0;
}

//add a child at character 'c' to a node 'node'
void addNode(tn *node, char c)          
{
     tn* childNode =new tn;

     for(int i=0;i < 26;i  )
             childNode->child[i]=NULL;

     childNode->words=0;
     childNode->prefixes=0;

     int index=(int)c-97;
     node->child[index]=childNode;
}

int findWord(tn *root, char *word)
{
     tn* curr;
     curr=root;
     int len=strlen(word);

     //find length of the word
     int i;
     for(i=0;i < len;i  )
     {
         // find the child of the current node depending on the next character
         int index = (int)word[i] - 97;         
         curr=curr->child[index];

          // if node exists then the word might exist
         if(curr==NULL)                        
           return NOT_FOUND;
     }
     if(i==len)
        return FOUND;
     else
         return NOT_FOUND;
}

void addWord(tn *root, char *word)
{
     int len=strlen(word);
     tn* curr=root;
     tn* parent=root;
     int i;
     for(i=0;i < len;i++)
     {
         // increment the number of prefix with the word till word[i]
         curr->prefixes++;

         int index= (int)word[i] - 97;            
         parent=curr;              
         curr=curr->child[i];

         // no child exists for word[i] -- create one
         if(curr==NULL)
            addNode(parent,word[i]);

         // word is finished increment the number of words
         if(i==len-1)
            curr->words++;              
     }
}

int countPrefix(tn *root, char *prefix)
{
     tn* curr;
     curr=root;
     int len=strlen(prefix);            
     int i;
     for(i=0;i < len;i++)
     {
         // find the child of the current node depending on the next character
         int index = (int)prefix[i] - 97;         
         curr=curr->child[index];

         // if node exists then the word might exist
         if(curr==NULL)                         
           return NOT_FOUND;
     }
     if(i==len)
        return curr->prefixes;
     else
         return NOT_FOUND;
}

int main()
{
    initializeRoot(root);
    system("pause");
    return 0;
}
views: 655
Edit this Tutorial

Comments

  • jitu7uec Jitesh Khandelwal Aug 07 '2012 at 01:21PM

    The attribute "words" of "trie_node" is very unclear.

    How is it possible to have more than one word ending at a node ? ( Does it mean that duplicate entries of a word is possible ? )

    Please correct me if my interpretation is wrong !!!

    Reply
    • sachin Sachin Gupta Aug 07 '2012 at 01:31PM

      'words' out here means the number of valid words that have been formed up till that depth of the node i.e. at any level above that level

      Reply
      • jitu7uec Jitesh Khandelwal Aug 08 '2012 at 06:09AM

        If I am not wrong,
        Beginning from the root node following any path to any child node will form a UNIQUE word ie. only one word is possible formed uptill any given child node.

        - for example. Consider a word "apple". Formation of this word is possible only along a unique path in a TRIE.
        ie. at the node for letter "e" the number of valid words cannot be more than one.
        ( tn.words = 0 or tn.words = 1 where "tn" is a trie_node corresponding to some letter. )

        Reply
        • sachin Sachin Gupta Aug 08 '2012 at 06:17AM

          Yes you are correct , the words value cannot be more than 1. So this essentially signifies that the word ending at a node having words value as 1, is a valid word

          Reply

Post Comment