June 13th, 2007

Bird

(no subject)

Here's a rather curious problem I've been thinking on for a while but didn't succeed.

Formal statement: Suppose R is a symmetric and reflexive, but not necessarily transitive, binary relation: R :: T -> T -> bool. What properties must R posess so that there would exist an integer n and a function norm :: T -> Rn so that forall a, b in T : R(a,b) <=> |norm(a)-norm(b)|<1? And how to built such a norm function?

Informal statement: R is a 'similarity function' that is computed very slowly and we need to compute it really many times. We are trying to speed it up by substituting objects from T with their vector 'normal forms' so that similarity becomes proximity in the vector space.

An example: suppose T is the domain of strings over a finite alphabet A of size k, and R states 'are anagrams of each other, +-m letters'. Then we can take norm(a) = vector of letter frequencies (of dimension k) in a, and compare frequency vectors. However, in this case a substantial speedup for computing similarity does not occur, but see below.

A reformulation: For what undirected graphs does there exist a layout such that an edge connects those and only those pairs of vertices that are on a distance of less than 1?

If we invent such a 'norm' function, we will be able to efficiently find similar objects in non-vector domains, such as texts etc. by substituting the original problem with the k-nearest problem. I suppose there are many other problems that will largely benefit from the solution of this one.