Submitted by careerstack
A trie is the most suitable data structure for making a dictionary of words, when we are also concerned about the prefixes of a word. A practical use of this structure can be in auto completion for search bars, or while searching names in a phone dictionary. There are multiple implementations of Trie, but we will be giving a basic one here. You can read more about trie here.
A node in a trie represents a single character of the whole string. So at any level of the node we can have a word formed by the characters from the root node to that node but that may not form a valid word, so each node must also contain a bool value, telling whether the word formed by the characters till that node form a valid word or not.
When we are searching for a valid word, while traversing down from the root the depth of the node is the number of prefixes it has and number of intermediate node which have a bool value as true, represents a valid prefix of that word. The given below representation does not store a bool field, and assumes whatever is present in the DS forms a valid word.
#include< iostream>
#include< stdio.h>
#include< string.h>
#define NOT_FOUND 0
#define FOUND 1
using namespace std;
struct trie_node
{
int words;
int prefixes;
trie_node* child[26];
}*root=NULL;
// typedef so that the name becomes short
typedef struct trie_node tn;
//initialize the trie
void initializeRoot(tn *root)
{
tn* node=new tn;
root=node;
//initializing the trie root node
for(int i=0;i < 26;i )
root-> child[i]=NULL;
root-> words=0;
root-> prefixes=0;
}
//add a child at character 'c' to a node 'node'
void addNode(tn *node, char c)
{
tn* childNode =new tn;
for(int i=0;i < 26;i )
childNode->child[i]=NULL;
childNode->words=0;
childNode->prefixes=0;
int index=(int)c-97;
node->child[index]=childNode;
}
int findWord(tn *root, char *word)
{
tn* curr;
curr=root;
int len=strlen(word);
//find length of the word
int i;
for(i=0;i < len;i )
{
// find the child of the current node depending on the next character
int index = (int)word[i] - 97;
curr=curr->child[index];
// if node exists then the word might exist
if(curr==NULL)
return NOT_FOUND;
}
if(i==len)
return FOUND;
else
return NOT_FOUND;
}
void addWord(tn *root, char *word)
{
int len=strlen(word);
tn* curr=root;
tn* parent=root;
int i;
for(i=0;i < len;i++)
{
// increment the number of prefix with the word till word[i]
curr->prefixes++;
int index= (int)word[i] - 97;
parent=curr;
curr=curr->child[i];
// no child exists for word[i] -- create one
if(curr==NULL)
addNode(parent,word[i]);
// word is finished increment the number of words
if(i==len-1)
curr->words++;
}
}
int countPrefix(tn *root, char *prefix)
{
tn* curr;
curr=root;
int len=strlen(prefix);
int i;
for(i=0;i < len;i++)
{
// find the child of the current node depending on the next character
int index = (int)prefix[i] - 97;
curr=curr->child[index];
// if node exists then the word might exist
if(curr==NULL)
return NOT_FOUND;
}
if(i==len)
return curr->prefixes;
else
return NOT_FOUND;
}
int main()
{
initializeRoot(root);
system("pause");
return 0;
}
The attribute "words" of "trie_node" is very unclear.
How is it possible to have more than one word ending at a node ? ( Does it mean that duplicate entries of a word is possible ? )
Please correct me if my interpretation is wrong !!!
Reply'words' out here means the number of valid words that have been formed up till that depth of the node i.e. at any level above that level
ReplyIf I am not wrong,
Beginning from the root node following any path to any child node will form a UNIQUE word ie. only one word is possible formed uptill any given child node.
- for example. Consider a word "apple". Formation of this word is possible only along a unique path in a TRIE.
ie. at the node for letter "e" the number of valid words cannot be more than one.
( tn.words = 0 or tn.words = 1 where "tn" is a trie_node corresponding to some letter. )
Yes you are correct , the words value cannot be more than 1. So this essentially signifies that the word ending at a node having words value as 1, is a valid word
Reply