30 July 2013

Algorithmathon

Instead of more String manipulations, as I indicated, we will be looking into Tries and the Longest Common Substring problem. A friend mentioned the problem to me and I became very interested in Tries as a result.

A Trie is a type of Tree that proves itself useful in string manipulation. To find the Longest Common Substring (LCS) a Suffix Trie is particularly useful. You create a Suffix tree by taking all the possible suffixes of a string and adding it to the Trie. We can find the LCS by adding all of the suffixes for two strings to a Trie with a label for each string. Then finding the longest path with both string labels will give us the LCS.

  1. Label strings
  2. Find all suffixes of the Strings
  3. Add all suffixes to the trie.
  4. Find next pointer which points to a branch with both string labels
  5. Follow the branch until it no longer has both string labels.
  6. Record the resulting sequence of letters
  7. If the sequence of letters is longer than the last sequence of letters remember it.
  8. Goto step 4

The cost of adding the strings should be n(n-1). For string S1 and string S2 this would be S1(S1 - 1) + S2(S2 - 1). If S1 represents the the longer string. The searching should have a cost of S1 + (26 * S2(S2 - 1)). Where S2 is the shorter of the two strings.

Descriptions of Suffix Tries often refer to them being Compressed Trie variant. To reduce complexity I have chose not to compress the trie.

We will need several RSpec tests to verify the needed functionality for adding strings to the Trie.

trie_spec.rb
  describe "Class" do
    it "should have method 'root'" do
      Trie.method_defined?(:root).should be_true
    end 

    it "should have method 'add'" do
      Trie.method_defined?(:add).should be_true
    end 
  end

  describe "Instance" do


    describe "Add method tests" do
      before(:each) do
        @trie = Trie.new()
        @trie.add("ABCDEFG",0)
        @trie.add("DEFGHIJK",1)
      end

      it "should not return nil when 'root' is called" do
        @trie.root.should_not be_nil
      end

  
      it "should not return nil when 'root[65]' is called" do
        @trie.root[65].should_not be_nil
      end

      it "should not return nil when 'root[68]' is called" do
        @trie.root[68].should_not be_nil
      end    

      it "should not return nil when 'root[65][66]' is called" do
        @trie.root[65][66].should_not be_nil
      end        

      it "should not return nil when 'root[65][66][67][68][69][70][71]' is called" do
        @trie.root[65][66][67][68][69][70][71].should_not be_nil
      end

      it "should not return 1 when 'root[65][66][67][68][69][70][71].length' is called" do
        @trie.root[65][66][67][68][69][70][71].length.should == 2
      end 

      it "should not return 1 when 'root[65][66][67][68][69][70][71][0]' is called" do
        @trie.root[65][66][67][68][69][70][71][0].should == 1
      end

      describe "Markers indicating a specific string" do
        it "should not return nil when 'root[68][1]' is called" do
          @trie.root[68][1].should_not be_nil
        end 

        it "should return true when 'root[68][1][1]' is called" do
          @trie.root[68][1][1].should be_true
        end

        it "should return false when 'root[68][1][0]' is called" do
          @trie.root[68][1][0].should be_false
        end    
      end    
    end
  end
end

The code below satisfies these tests and provides the core functionality of our Trie. Rather than building a custom node Class we are using Arrays.

trie.rb
class Trie
  attr_reader :root

  def initialize()
    @root = Array.new(100)
  end

  def add(new_string,marker=0)
    unless new_string.nil? || new_string.length < 1
      add_by_recursion(@root,new_string,marker)
    end
  end

  private
  def add_by_recursion(current_node_array,new_string,marker)
    first_char=new_string.slice!(0)
    unless first_char.nil?
      first_char_number=first_char.chr.ord
      if current_node_array[first_char_number].nil?
        current_node_array[first_char_number] = Array.new()
      end 
      add_marker(current_node_array[first_char_number],marker)   
      add_by_recursion(current_node_array[first_char_number],new_string,marker)
    else
      current_node_array[0]=1
    end
  end

  def add_marker(current_node_array,marker)
    if current_node_array[1].nil?
      current_node_array[1]=Array.new()
    end
    current_node_array[1][marker]=true
  end

end

We now need to write a method to add all the suffixes of a string to the trie.

trie_spec.rb (excerpt)
#code omitted here
    it "should have method 'add_suffixes'" do
      Trie.method_defined?(:add_suffixes).should be_true
    end 
#code omitted here
    describe "Add_suffix method tests" do
      before(:each) do
        @trie = Trie.new()
        @trie.add_suffixes("ABCDEFG",0)
        @trie.add_suffixes("FGHIJK",1)
      end

      it "should not return nil when 'root[65]' is called" do
        @trie.root[65].should_not be_nil
      end      

      it "should not return nil when 'root[68]' is called" do
        @trie.root[68].should_not be_nil
      end

      it "should not return nil when 'root[72]' is called" do
        @trie.root[72].should_not be_nil
      end

      it "should not return nil when 'root[75]' is called" do
        @trie.root[75].should_not be_nil
      end

      it "should return false when 'root[69][1][1]' is called" do
        @trie.root[69][1][1].should be_false
      end

      it "should return true when 'root[69][1][0]' is called" do
        @trie.root[69][1][0].should be_true
      end

      it "should return true when 'root[70][1][0]' is called" do
        @trie.root[70][1][0].should be_true
      end

      it "should return true when 'root[70][1][1]' is called" do
        @trie.root[70][1][1].should be_true
      end

      it "should return true when 'root[71][1][0]' is called" do
        @trie.root[71][1][0].should be_true
      end

      it "should return true when 'root[71][1][1]' is called" do
        @trie.root[71][1][1].should be_true
      end

      it "should return false when 'root[72][1][0]' is called" do
        @trie.root[72][1][0].should be_false
      end

      it "should return true when 'root[72][1][1]' is called" do
        @trie.root[72][1][1].should be_true
      end

    end
#code omitted here

The code to satisfy these tests is quite straightforward. We add all possible suffixes of the string with a marker.

trie.rb
#code omitted here
  def add_suffixes(new_string,marker)
    unless new_string.nil? || new_string.length < 1
      string_end=new_string.length-1
      for counter in 0..string_end
        add(new_string[counter..string_end],marker)
      end
    end
  end
#code omitted here

Now all that is required is for us to find the LCS. Which we can do by searching for branches of the trie that have markers from both strings and returning the longest one.

trie_spec.rb (excerpt)
#code omitted here
    it "should have method 'longest_common_substring'" do
      Trie.method_defined?(:longest_common_substring).should be_true
    end    
#code omitted here
    describe "Longest common substring method tests" do
      it "should return 'bcd' when Trie.longest_common_substring(ABCD,BCDE)  is called" do
        trie= Trie.new()
        trie.longest_common_substring('ABCD','BCDE').should == 'BCD'
      end

      it "should return 'try' when Trie.longest_common_substring(TRY,RETRY)  is called" do
        trie= Trie.new()
        trie.longest_common_substring('TRY','RETRY').should == 'TRY'
      end 

       it "should return 'FGH' when Trie.longest_common_substring(ABCDFGHXYZ,TSVPFGHARS)  is called" do
        trie= Trie.new()
        trie.longest_common_substring('ABCDFGHXYZ','TSVPFGHARS').should == 'FGH'
      end     
    end
#code omitted here

Finally the code to satisfy the latest RSpec tests will give us our functionality.

trie.rb
#code omitted here
  def longest_common_substring(string_zero, string_one)
    add_suffixes(string_zero,0)
    add_suffixes(string_one,1)
    longest_common_substring_by_recursion()
  end
#code omitted here
  def longest_common_substring_by_recursion()
    root_end=@root.length-1
    latest_stack=Array.new()
    for counter in 65..root_end
      unless @root[counter].nil?
        if @root[counter][1][0] && @root[counter][1][1]
          tracking_stack=Array.new()
          tracking_stack.push(counter.chr)
          longest_common_substring_by_recursion_follow_branch(@root[counter],tracking_stack)
          if tracking_stack.length > latest_stack.length
            latest_stack = tracking_stack
          end
        end 
      end
    end
    latest_stack.join
  end

  def longest_common_substring_by_recursion_follow_branch(current_node_array,tracking_stack)
    array_end=current_node_array.length-1
    for counter in 65..array_end
      unless current_node_array[counter].nil?
        if current_node_array[counter][1][0] && current_node_array[counter][1][1]
          tracking_stack.push(counter.chr)
          longest_common_substring_by_recursion_follow_branch(current_node_array[counter],tracking_stack)
        end 
      end
    end
  end
#code omitted here

Less Is More ~ Older posts are available in the archive.