Quick Search for:  in language:    
Levenshtein,edit,distance,measure,similarity,
   Code/Articles » |  Newest/Best » |  Community » |  Jobs » |  Other » |  Goto » | 
CategoriesSearch Newest CodeCoding ContestCode of the DayAsk A ProJobsUpload
SQL Stats

 Code: 45,316. lines
 Jobs: 126. postings

 How to support the site

 
Sponsored by:

 
You are in:
 
Login





Latest Code Ticker for SQL.
sp_db_paging_pr imary_key
By Marcus LeBlanc on 1/11


sp_db_paging
By Marcus LeBlanc on 1/11


Beginners pack
By hardik shah on 1/11


Inline Views
By Thivya Prabakaran on 1/6


Click here to put this ticker on your site!


Add this ticker to your desktop!


Daily Code Email
To join the 'Code of the Day' Mailing List click here!

Affiliate Sites



 
 
   

Quick Levenshtein Edit Distance

Print
Email
 
VB icon
Submitted on: 11/4/2002 10:26:56 PM
By: Larry Lewis 
Level: Advanced
User Rating: By 3 Users
Compatibility:SQL Server 2000

Users have accessed this code 1706 times.
 
(About the author)
 
     Levenshtein edit distance is a measure of the similarity between two strings. Edit distance is the the minimum number of character deletions, insertions, substitutions, and transpositions required to transform string1 into string2. In essence, the function is used to perform a fuzzy or approximate string search. This is very handy for trying to find the "correct" string for one that has been entered incorrectly, mistyped, etc. The code has been optimized to find strings that are very similar. A "limit" parameter is provided so the function will quickly reject strings that contain more than k mismatches.
 
code:
Can't Copy and Paste this?
Click here for a copy-and-paste friendly version of this code!
 
Terms of Agreement:   
By using this code, you agree to the following terms...   
1) You may use this code in your own programs (and may compile it into a program and distribute it in compiled format for languages that allow it) freely and with no charge.   
2) You MAY NOT redistribute this code (for example to a web site) without written permission from the original author. Failure to do so is a violation of copyright laws.   
3) You may link to this code from another website, but ONLY if it is not wrapped in a frame. 
4) You will abide by any additional copyright restrictions which the author may have placed in the code or code's description.

--**************************************
--     
-- Name: Quick Levenshtein Edit Distance
--     
-- Description:Levenshtein edit distance
--      is a measure of the similarity between 
--     two strings. Edit distance is the the mi
--     nimum number of character deletions, ins
--     ertions, substitutions, and transpositio
--     ns required to transform string1 into st
--     ring2. In essence, the function is used 
--     to perform a fuzzy or approximate string
--      search. This is very handy for trying t
--     o find the "correct" string for one that
--      has been entered incorrectly, mistyped,
--      etc. The code has been optimized to fin
--     d strings that are very similar. A "limi
--     t" parameter is provided so the function
--      will quickly reject strings that contai
--     n more than k mismatches.
-- By: Larry Lewis
--
-- Inputs:@s varchar(50), @t varchar(50)
--     , @limit integer
--
-- Returns:Returns the integer edit dist
--     ance (minimum number of character deleti
--     ons, insertions, substitutions, and tran
--     spositions) required to transform string
--     1 into string2 where edit distance is &l;
--     t;= limit. Otherwise returns (len(s) + l
--     en(t)).
--
-- Assumes:This code takes character tra
--     nspositions into account when calculatin
--     g edit distance (If desired, the transpo
--     sition code can be commented out). The l
--     imit parameter provides a significant pe
--     rformance gain (over 10x faster) over st
--     andard implementations when searching fo
--     r highly similar strings.
--
--This code is copyrighted and has-- limited warranties.Please see http://
--     www.Planet-Source-Code.com/vb/scripts/Sh
--     owCode.asp?txtCodeId=584&lngWId;=5--for details.--**************************************
--     

CREATE function LEV2( @s varchar(50), @t varchar(50) , @limit integer)
--Returns the Levenshtein Distance with 
--     transpositions between
-- strings s1 and s2 using k-limit pruni
--     ng
--Includes transposed characters and man
--     y times faster than 
-- the original LEVENSHTEIN function (es
--     pecially when limit is low)
--Returns EditDistance where EditDistanc
--     e is <= limit otherwise returns len(@
--     s) + len(@t)
--Based on code by: Michael Gilleland ht
--     tp://www.merriampark.com/ld.htm
--Originally translated to TQL by Joseph
--      Gama
--Author: Larry Lewis
returns integer
as
    BEGIN
    DECLARE @d varchar(2500), @LD int, @m int, @n int, @i int, @j int, @k int, 
    @s_i char(1), @t_j char(1), @lendif int, @mindist int, @y int, @z int, @smallLen int, @cost int
    SET @LD = LEN(@s) + LEN(@t)
    --Remove leftmost matching portion of st
    --     rings
    WHILE Left(@s, 1) = Left(@t, 1) AND Len(@s) > 0 And Len(@t) > 0
    	BEGIN
    	SET @s = Right(@s, Len(@s) - 1)
    	SET @t = Right(@t, Len(@t) - 1)
    	END
    -- Find shorter string
     SET @n = Len(@s)
     SET @m = Len(@t)
     SET @smallLen = @n
     IF @m < @n 
     SET @smallLen = @m
    --Stop if string lengths differ by more 
    --     than LIMIT
     SET @lendif = @n - @m
     IF Abs(@lendif) > @limit 
    	GOTO done
     IF @n = 0 
    	BEGIN
    	IF @m <= @limit 
    		SET @LD = @m
     	GOTO done
    	END
     IF @m = 0 
    	BEGIN
    	IF @n <= @limit 
    		SET @LD = @n
    	GOTO done
    	END
    SET @d=replicate(CHAR(0),2500)
    -- Init Matrix d
    SET @i=0
    WHILE @i<=@n
    	BEGIN
    	SET @d=STUFF(@d,@i+1,1,CHAR(@i))--d(i, 0) = i
    	SET @i=@i+1
    	END
    SET @i=1
    WHILE @i<=@m
    	BEGIN
    	SET @d=STUFF(@d,@i*(@n+1)+1,1,CHAR(@i))--d(0, j) = j
    	SET @i=@i+1
    	END
    -- ' Main Loop - Levenshtein --Try to tr
    --     averse the matrix so that pruning cells 
    --     are calculated ASAP
    	SET @k=1
    	WHILE @k<=@smallLen
    		BEGIN
    		SET @s_i=(substring(@s,@k,1))
    	SET @j=1
    	WHILE @j<=@k
    		BEGIN
    		SET @t_j=(substring(@t,@j,1))
    		-- Evaluate d(k,j)
    	 	SET @MinDist = ASCII(substring(@d,(@j-1)*(@n+1)+@k,1)) --d(k - 1, j - 1)
    	 	IF @s_i <> @t_j SET @MinDist = @MinDist + 1	
    		---y = d(k - 1, j) + 1
    		SET @y = ASCII(substring(@d,@j*(@n+1)+@k,1))+1
    		---z = d(k, j - 1) + 1
    		SET @z = ASCII(substring(@d,(@j-1)*(@n+1)+@k+1,1))+1
    		IF @y < @MinDist SET @MinDist = @y
    		IF @z < @MinDist SET @MinDist = @z
    		-- d(k, j) = MinDist
    		SET @d=STUFF(@d,@j*(@n+1)+@k+1,1,CHAR(@MinDist))
    		-- CHECK Transposition
    		-- Transposition sections can be commented out IF transposition IS NOT desired
    		IF @j > 1 AND @k > 1
    			BEGIN
    				-- IF @MinDist - d(k - 2, j - 2) = 2 
    				IF @MinDist - ASCII(substring(@d,(@j-2)*(@n+1)+@k-1,1)) = 2
    				BEGIN
    					IF substring(@s,@k-1,1) = @t_j 
    					BEGIN
    						IF @s_i = substring(@t,@j-1,1) -- Mid$(t, j - 1, 1) 
    					 	--d(k, j) = d(k - 2, j - 2) + 1
    						SET @d = STUFF(@d,@j*(@n+1)+@k+1,1,Char( ASCII(substring(@d,(@j-2)*(@n+1)+@k-1,1))+1))
    					END
    				END
    			END
    		-- Limit Pruning - Stop processing IF we are already OVER the limit
    		IF @k = @j + @lendif
    			BEGIN
    			IF @k < @smallLen OR @j < @smallLen
    				BEGIN
    				--If d(k, j) > limit THEN EXIT Function
    				IF ASCII(substring(@d,@j*(@n+1)+@k+1,1)) > @limit GOTO done
    			END
     		END
    		SET @j=@j+1
    		END
    	IF @j <= @m
    	BEGIN
    	SET @t_j=substring(@t,@j,1)
     	SET @i = 1
     	WHILE @i <= @k
    		BEGIN
    		SET @s_i = substring(@s,@i,1)
    		-- Evaluate d(i,j)----------------
    		SET @MinDist = ASCII(substring(@d,(@j-1)*(@n+1)+@i,1)) --d(i - 1, j - 1)
    		 IF @s_i <> @t_j SET @MinDist = @MinDist + 1	
    		---y = d(i - 1, j) + 1
    		SET @y = ASCII(substring(@d,@j*(@n+1)+@i,1))+1
    		---z = d(i, j - 1) + 1
    		 SET @z = ASCII(substring(@d,(@j-1)*(@n+1)+@i+1,1))+1
    		IF @y < @MinDist SET @MinDist = @y
    		IF @z < @MinDist SET @MinDist = @z
    	 	-- d(k, j) = MinDist
    		SET @d=STUFF(@d,@j*(@n+1)+@i+1,1,CHAR(@MinDist))
    		-- CHECK Transposition
    		-- Transposition sections can be commented out IF transposition IS NOT desired
    		IF @i> 1 -- j IS always greater than 1
    			BEGIN
    			-- IF @MinDist - d(i - 2, j - 2) = 2 
    			IF @MinDist - ASCII(substring(@d,(@j-2)*(@n+1)+@i-1,1)) = 2
    				BEGIN
    				IF substring(@s,@i-1,1) = @t_j 
    					BEGIN
    		 			IF @s_i = substring(@t,@j-1,1) -- Mid$(t, j - 1, 1) 
    		 		--d(k, j) = d(i - 2, j - 2) + 1
    		 		SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,CHAR(ASCII(substring(@d,(@j-2)*(@n+1)+@i-1,1))+1))
    				END
    				END
    			END
    		-- Limit Pruning - Stop processing IF we are already OVER the limit
    		IF @i = @j + @lendif
    			BEGIN
    		 	IF @i < @smallLen OR @j < @smallLen
    		 		BEGIN
    		 		--If d(i, j) > limit THEN EXIT Function
    		 		IF ASCII(substring(@d,@j*(@n+1)+@i+1,1)) > @limit GOTO done
    		 		END
    			END
    		SET @i = @i + 1
    		END
    		END
    		SET @k=@k+1
    		END 
    -- process remaining rightmost portion o
    --     f matrix if any (if m > n)
    	SET @i = @n + 1
    	WHILE @i <= @m
    		BEGIN
    		SET @t_j=(substring(@t,@i,1))
    	SET @j=1
    	WHILE @j<=@n
    		BEGIN
    		SET @s_i=(substring(@s,@j,1))
    		-- Evaluate d(j, i)
    	 	SET @MinDist = ASCII(substring(@d,(@i-1)*(@n+1)+@j,1)) --d(j - 1, i - 1)
    	 	IF @s_i <> @t_j SET @MinDist = @MinDist + 1	
    		---y = d(j - 1, i) + 1
    		SET @y = ASCII(substring(@d,@i*(@n+1)+@j,1))+1
    		---z = d(j, i - 1) + 1
    		SET @z = ASCII(substring(@d,(@i-1)*(@n+1)+@j+1,1))+1
    		IF @y < @MinDist SET @MinDist = @y
    		IF @z < @MinDist SET @MinDist = @z
    		-- d(j, i) = MinDist
    		SET @d=STUFF(@d,@i*(@n+1)+@j+1,1,CHAR(@MinDist))
    		-- CHECK Transposition
    		-- Transposition sections can be commented out IF transposition IS NOT desired
    		IF @j> 1 -- i IS always greater than 1
    			BEGIN
    				-- IF @MinDist - d(j - 2, i - 2) = 2 
    				IF @MinDist - ASCII(substring(@d,(@i-2)*(@n+1)+@j-1,1)) = 2
    				BEGIN
    					IF substring(@s,@j-1,1) = @t_j 
    					BEGIN
    						IF @s_i = substring(@t,@i-1,1) -- Mid$(t, i - 1, 1) 
    					 --d(j, i) = d(j - 2, i - 2) + 1
    	 					SET @d = STUFF(@d,@i*(@n+1)+@j+1,1,CHAR(ASCII(substring(@d,(@i-2)*(@n+1)+@j-1,1))+1))
    				END
    				END
    			END
     		 -- Limit Pruning - Stop processing IF we are already OVER the limit
     		IF @j = @i + @lendif
     			BEGIN
    			IF @i < @m OR @j < @n
    				BEGIN
    				--If d(j, i) > limit THEN EXIT Function
    				IF ASCII(substring(@d,@i*(@n+1)+@j+1,1)) > @limit GOTO done
    				END
     			END
    		SET @j = @j + 1
    		END
    		SET @i = @i + 1
    		END
    -- process remaining lower portion of ma
    --     trix if any (n > m)
    	SET @i = @m + 1
    	WHILE @i <= @n
    		BEGIN
    		SET @s_i= substring(@s,@i,1)
    	SET @j=1
    	WHILE @j<=@n
    		BEGIN
    		SET @t_j=(substring(@t,@j,1))
    		-- Evaluate d(i,j)
    		SET @MinDist = ASCII(substring(@d,(@j-1)*(@n+1)+@i,1)) --d(i - 1, j - 1)
    	 	IF @s_i <> @t_j SET @MinDist = @MinDist + 1	
    		---y = d(i - 1, j) + 1
    		SET @y = ASCII(substring(@d,@j*(@n+1)+@i,1))+1
    		---z = d(i, j - 1) + 1
     		 SET @z = ASCII(substring(@d,(@j-1)*(@n+1)+@i+1,1))+1
    		IF @y < @MinDist SET @MinDist = @y
    		IF @z < @MinDist SET @MinDist = @z
     		-- d(i, j) = MinDist
    		SET @d=STUFF(@d,@j*(@n+1)+@i+1,1,CHAR(@MinDist))
    		-- CHECK Transposition
    		-- This section can be commented out IF transposition is NOT desired
    		IF @j> 1 -- i IS always greater than 1
    			BEGIN
    				-- IF @MinDist - d(i - 2, j - 2) = 2 
    				IF @MinDist - ASCII(substring(@d,(@j-2)*(@n+1)+@i-1,1)) = 2
    				BEGIN
    					IF substring(@s,@i-1,1) = @t_j 
    					BEGIN
    	 					IF @s_i = substring(@t,@j-1,1) -- Mid$(t, j - 1, 1) 
    	 				--d(k, j) = d(i - 2, j - 2) + 1
    					SET @d = STUFF(@d,@j*(@n+1)+@i+1,1,CHAR(ASCII(substring(@d,(@j-2)*(@n+1)+@i-1,1))+1))
    				END
    				END
    			END
     		-- Limit Pruning - Stop processing IF we are already OVER the limit
     		IF @i = @j + @lendif
     		BEGIN
    			IF @i < @smallLen OR @j < @smallLen
    			BEGIN
    				--If d(i, j) > limit THEN EXIT Function
    				IF ASCII(substring(@d,@j*(@n+1)+@i+1,1)) > @limit GOTO done
    			END
     		END
     		SET @j = @j + 1
    		END
    		SET @i= @i + 1
    		END 
    -- Return EditDistance
    SET @LD = ASCII(substring(@d,@n*(@m+1)+@m+1,1))
    done:
    --RETURN @LD
    --I kept this code that can be used to d
    --     isplay the matrix with all calculated va
    --     lues
    --From Query Analyser it provides a nice
    --      way to check the algorithm in action
    --
    RETURN @LD
    --declare @z varchar(8000)
    --set @z=''
    --SET @i=0
    --WHILE @i<=@n
    --	BEGIN
    --	SET @j=0
    --	WHILE @j<=@m
    --		BEGIN
    --		set @z=@z+CONVERT(char(3),ASCII(subs
    --     tring(@d,@i*(@m+1 )+@j+1 ,1)))
    --		SET @j=@j+1 
    --		END
    --	SET @i=@i+1
    --	END
    --print dbo.wrap(@z,3*(@n+1))
END

 
Report Bad Submission
Use this form to notify us if this entry should be deleted (i.e contains no code, is a virus, etc.).
Reason:
 
Your Vote!

What do you think of this code(in the Advanced category)?
(The code with your highest vote will win this month's coding contest!)
Excellent  Good  Average  Below Average  Poor See Voting Log
 
Other User Comments

 There are no comments on this submission.
 
Add Your Feedback!
Note:Not only will your feedback be posted, but an email will be sent to the code's author in your name.

NOTICE: The author of this code has been kind enough to share it with you.  If you have a criticism, please state it politely or it will be deleted.

For feedback not related to this particular code, please click here.
 
Name:
Comment:

 

Categories | Articles and Tutorials | Advanced Search | Recommended Reading | Upload | Newest Code | Code of the Month | Code of the Day | All Time Hall of Fame | Coding Contest | Search for a job | Post a Job | Ask a Pro Discussion Forum | Live Chat | Feedback | Customize | SQL Home | Site Home | Other Sites | About the Site | Feedback | Link to the Site | Awards | Advertising | Privacy

Copyright© 1997 by Exhedra Solutions, Inc. All Rights Reserved.  By using this site you agree to its Terms and Conditions.  Planet Source Code (tm) and the phrase "Dream It. Code It" (tm) are trademarks of Exhedra Solutions, Inc.