30 July 2013

Algorithmathon

Instead of more String manipulations, as I indicated, we will be looking into Tries and the Longest Common Substring problem. A friend mentioned the problem to me and I became very interested in Tries as a result.

A Trie is a type of Tree that proves itself useful in string manipulation. To find the Longest Common Substring (LCS) a Suffix Trie is particularly useful. You create a Suffix tree by taking all the possible suffixes of a string and adding it to the Trie. We can find the LCS by adding all of the suffixes for two strings to a Trie with a label for each string. Then finding the longest path with both string labels will give us the LCS.

1. Label strings
2. Find all suffixes of the Strings
3. Add all suffixes to the trie.
4. Find next pointer which points to a branch with both string labels
5. Follow the branch until it no longer has both string labels.
6. Record the resulting sequence of letters
7. If the sequence of letters is longer than the last sequence of letters remember it.
8. Goto step 4

The cost of adding the strings should be n(n-1). For string S1 and string S2 this would be S1(S1 - 1) + S2(S2 - 1). If S1 represents the the longer string. The searching should have a cost of S1 + (26 * S2(S2 - 1)). Where S2 is the shorter of the two strings.

Descriptions of Suffix Tries often refer to them being Compressed Trie variant. To reduce complexity I have chose not to compress the trie.

We will need several RSpec tests to verify the needed functionality for adding strings to the Trie.

trie_spec.rb
describe "Class" do
it "should have method 'root'" do
Trie.method_defined?(:root).should be_true
end

it "should have method 'add'" do
end
end

describe "Instance" do

before(:each) do
@trie = Trie.new()
end

it "should not return nil when 'root' is called" do
@trie.root.should_not be_nil
end

it "should not return nil when 'root[65]' is called" do
@trie.root[65].should_not be_nil
end

it "should not return nil when 'root[68]' is called" do
@trie.root[68].should_not be_nil
end

it "should not return nil when 'root[65][66]' is called" do
@trie.root[65][66].should_not be_nil
end

it "should not return nil when 'root[65][66][67][68][69][70][71]' is called" do
@trie.root[65][66][67][68][69][70][71].should_not be_nil
end

it "should not return 1 when 'root[65][66][67][68][69][70][71].length' is called" do
@trie.root[65][66][67][68][69][70][71].length.should == 2
end

it "should not return 1 when 'root[65][66][67][68][69][70][71][0]' is called" do
@trie.root[65][66][67][68][69][70][71][0].should == 1
end

describe "Markers indicating a specific string" do
it "should not return nil when 'root[68][1]' is called" do
@trie.root[68][1].should_not be_nil
end

it "should return true when 'root[68][1][1]' is called" do
@trie.root[68][1][1].should be_true
end

it "should return false when 'root[68][1][0]' is called" do
@trie.root[68][1][0].should be_false
end
end
end
end
end

The code below satisfies these tests and provides the core functionality of our Trie. Rather than building a custom node Class we are using Arrays.

trie.rb
class Trie

def initialize()
@root = Array.new(100)
end

unless new_string.nil? || new_string.length < 1
end
end

private
first_char=new_string.slice!(0)
unless first_char.nil?
first_char_number=first_char.chr.ord
if current_node_array[first_char_number].nil?
current_node_array[first_char_number] = Array.new()
end
else
current_node_array[0]=1
end
end

if current_node_array[1].nil?
current_node_array[1]=Array.new()
end
current_node_array[1][marker]=true
end

end

We now need to write a method to add all the suffixes of a string to the trie.

trie_spec.rb (excerpt)
#code omitted here
it "should have method 'add_suffixes'" do
end
#code omitted here
before(:each) do
@trie = Trie.new()
end

it "should not return nil when 'root[65]' is called" do
@trie.root[65].should_not be_nil
end

it "should not return nil when 'root[68]' is called" do
@trie.root[68].should_not be_nil
end

it "should not return nil when 'root[72]' is called" do
@trie.root[72].should_not be_nil
end

it "should not return nil when 'root[75]' is called" do
@trie.root[75].should_not be_nil
end

it "should return false when 'root[69][1][1]' is called" do
@trie.root[69][1][1].should be_false
end

it "should return true when 'root[69][1][0]' is called" do
@trie.root[69][1][0].should be_true
end

it "should return true when 'root[70][1][0]' is called" do
@trie.root[70][1][0].should be_true
end

it "should return true when 'root[70][1][1]' is called" do
@trie.root[70][1][1].should be_true
end

it "should return true when 'root[71][1][0]' is called" do
@trie.root[71][1][0].should be_true
end

it "should return true when 'root[71][1][1]' is called" do
@trie.root[71][1][1].should be_true
end

it "should return false when 'root[72][1][0]' is called" do
@trie.root[72][1][0].should be_false
end

it "should return true when 'root[72][1][1]' is called" do
@trie.root[72][1][1].should be_true
end

end
#code omitted here

The code to satisfy these tests is quite straightforward. We add all possible suffixes of the string with a marker.

trie.rb
#code omitted here
unless new_string.nil? || new_string.length < 1
string_end=new_string.length-1
for counter in 0..string_end
end
end
end
#code omitted here

Now all that is required is for us to find the LCS. Which we can do by searching for branches of the trie that have markers from both strings and returning the longest one.

trie_spec.rb (excerpt)
#code omitted here
it "should have method 'longest_common_substring'" do
Trie.method_defined?(:longest_common_substring).should be_true
end
#code omitted here
describe "Longest common substring method tests" do
it "should return 'bcd' when Trie.longest_common_substring(ABCD,BCDE)  is called" do
trie= Trie.new()
trie.longest_common_substring('ABCD','BCDE').should == 'BCD'
end

it "should return 'try' when Trie.longest_common_substring(TRY,RETRY)  is called" do
trie= Trie.new()
trie.longest_common_substring('TRY','RETRY').should == 'TRY'
end

it "should return 'FGH' when Trie.longest_common_substring(ABCDFGHXYZ,TSVPFGHARS)  is called" do
trie= Trie.new()
trie.longest_common_substring('ABCDFGHXYZ','TSVPFGHARS').should == 'FGH'
end
end
#code omitted here

Finally the code to satisfy the latest RSpec tests will give us our functionality.

trie.rb
#code omitted here
def longest_common_substring(string_zero, string_one)
longest_common_substring_by_recursion()
end
#code omitted here
def longest_common_substring_by_recursion()
root_end=@root.length-1
latest_stack=Array.new()
for counter in 65..root_end
unless @root[counter].nil?
if @root[counter][1][0] && @root[counter][1][1]
tracking_stack=Array.new()
tracking_stack.push(counter.chr)
longest_common_substring_by_recursion_follow_branch(@root[counter],tracking_stack)
if tracking_stack.length > latest_stack.length
latest_stack = tracking_stack
end
end
end
end
latest_stack.join
end

def longest_common_substring_by_recursion_follow_branch(current_node_array,tracking_stack)
array_end=current_node_array.length-1
for counter in 65..array_end
unless current_node_array[counter].nil?
if current_node_array[counter][1][0] && current_node_array[counter][1][1]
tracking_stack.push(counter.chr)
longest_common_substring_by_recursion_follow_branch(current_node_array[counter],tracking_stack)
end
end
end
end
#code omitted here

Less Is More ~ Older posts are available in the archive.