Anthony Y. Fu, Xiaotie Deng, Wenyin Liu, and Greg Little: Methodology and an Application to Fight Unicode Attacks
July 14, 2006 by Ping

Unicode makes available a wide range of similar-looking characters that can be used to fool us into trusting the wrong domain name or address. For example, “citibank” can be spelled with similar-looking characters in over 200 billion different ways (there are about 20 characters that look like “c”, 58 that look like “i”, and so on).
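The “200 billion” figure comes from multiplying the number of lookalikes available at each position, so a letter that appears twice (like the “i” in “citibank”) contributes its count twice. The post gives counts only for “c” and “i”, so the sketch below uses a hypothetical partial table with just those two values and treats every other letter as having a single form. It also assumes each count already includes the original letter.

```python
from math import prod


def count_spellings(word: str, lookalikes: dict[str, int]) -> int:
    """Number of distinct spellings of `word` when every position may use any
    glyph that looks like the original letter.

    Assumption: lookalikes[ch] counts the candidate glyphs for ch, including
    ch itself; letters missing from the table are treated as having 1 form.
    """
    return prod(lookalikes.get(ch, 1) for ch in word)


# Hypothetical partial table: only the "c" and "i" counts come from the post.
partial = {"c": 20, "i": 58}
print(count_spellings("citi", partial))  # 20 * 58 * 1 * 58
```

Filling in the remaining letters of “citibank” would multiply this by the counts for “t”, “b”, “a”, “n”, and “k”, which is how the total climbs into the hundreds of billions.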
The authors presented two methods for automatically detecting the confusability of Unicode strings: one based on visual and semantic edit distance (VSED) and one based on the Knuth-Morris-Pratt algorithm (VSKMP). Both use a table of visual similarity between characters, produced by comparing the pixels of characters rendered in Arial Unicode MS, and a table of semantic similarity, produced by hand.
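To give a feel for the edit-distance idea, here is a minimal sketch of a similarity-weighted Levenshtein distance. It is not the authors' VSED: it assumes substituting one character for another costs 1 minus their similarity, that the visual and semantic tables are combined by taking the larger score, and that insertions and deletions cost 1. The table format (`SimTable`, keyed by unordered character pairs) and the 0.95 score for Cyrillic “с” (U+0441) versus Latin “c” are hypothetical stand-ins for the paper's pixel-derived and hand-built tables.

```python
from typing import Dict, FrozenSet

# Hypothetical table format: unordered character pair -> similarity in [0, 1].
SimTable = Dict[FrozenSet[str], float]


def char_similarity(a: str, b: str, visual: SimTable, semantic: SimTable) -> float:
    """Similarity of two characters; identical characters score 1.
    Assumption: combine the two tables by taking the larger score."""
    if a == b:
        return 1.0
    pair = frozenset((a, b))
    return max(visual.get(pair, 0.0), semantic.get(pair, 0.0))


def similarity_edit_distance(s: str, t: str, visual: SimTable, semantic: SimTable) -> float:
    """Levenshtein distance where substituting a for b costs 1 - similarity(a, b)
    and each insertion or deletion costs 1."""
    m, n = len(s), len(t)
    # dp[i][j] holds the distance between the prefixes s[:i] and t[:j].
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = float(i)
    for j in range(1, n + 1):
        dp[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 1.0 - char_similarity(s[i - 1], t[j - 1], visual, semantic)
            dp[i][j] = min(dp[i - 1][j] + 1.0,      # delete s[i-1]
                           dp[i][j - 1] + 1.0,      # insert t[j-1]
                           dp[i - 1][j - 1] + sub)  # match or substitute
    return dp[m][n]


# Hypothetical visual score for Cyrillic "с" (U+0441) vs Latin "c".
visual = {frozenset(("c", "\u0441")): 0.95}
print(similarity_edit_distance("citibank", "\u0441itibank", visual, {}))  # about 0.05
```

A spoofed string that differs only by near-identical glyphs ends up with a distance close to 0, while an ordinary one-letter typo costs a full 1, so a small threshold on this distance flags likely impersonations of a trusted name.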
See also the researchers’ website.
July 14th, 2006 at 05:59
I’m highly amused by how their semantic similarity example shows that “student” and “coin” are completely unrelated.