Similarity Engine

About the Similarity Engine


What do saffron rice and green beans have in common? You might find them on the same plate at dinner time! This sort of general association is at the core of the Similarity Engine.

The Similarity Engine uses input from users to form associations between different items. It starts by posing a question that asks for several responses, such as "which things of this sort do you like?" or "which things of this sort go together?" Because of the way the question is phrased, the responses in a single answer share a common cause: the person's belief that the items go together. By comparing the overlap between different answers, the engine can make predictions!

In layman's terms, the Similarity Engine stores each set of items that is submitted. To find items similar to a given item, it looks for the items that were most often submitted together with it.
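A minimal Python sketch of that idea, assuming the submitted groups are simply kept in memory. The class and method names (`SimilarityEngine`, `submit`, `similar`) and the sample groups are hypothetical illustrations, not the engine's actual interface or data.

```python
from collections import Counter

class SimilarityEngine:
    """Hypothetical sketch: store each submitted group, then count co-occurrences."""

    def __init__(self):
        self.groups = []  # every group of items submitted so far

    def submit(self, items):
        # Store the group as a set so an item repeated within one submission counts once.
        self.groups.append(set(items))

    def similar(self, item, n=5):
        # Count how often each other item was submitted in the same group as `item`.
        counts = Counter()
        for group in self.groups:
            if item in group:
                counts.update(group - {item})
        return counts.most_common(n)

engine = SimilarityEngine()
engine.submit(["saffron rice", "green beans", "chicken"])  # made-up submissions
engine.submit(["saffron rice", "green beans", "lamb"])
print(engine.similar("saffron rice"))
```

Here "green beans" ranks first because it was submitted alongside "saffron rice" twice, while the other items appeared with it only once.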

In mathematical terms, each time a user enters a group of items, the group forms a loosely associated set: any item in the set is associated with every other item in the set. For an item that appears in more than one group, its total associated set is the union of those groups, with repeated members kept (that is, a multiset). Given a large enough collection of groups, an item will belong to several groups and will therefore have a large total associated set. Members that repeat within this total associated set are more statistically significant than members that appear only once. This is equivalent to picking the items that lie in the intersection of the largest number of the item's groups.
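One way to write this down, using notation introduced here only for illustration: let $\mathcal{G}$ be the collection of submitted groups, and let $\uplus$ denote a union that keeps repeated members. The total associated set of an item $x$, and the number of times another item $y$ appears in it, are then:

```latex
A(x) = \biguplus_{G \in \mathcal{G},\; x \in G} \bigl( G \setminus \{x\} \bigr)

\operatorname{mult}_{A(x)}(y) = \bigl| \{\, G \in \mathcal{G} : x \in G \text{ and } y \in G \,\} \bigr|
```

Ranking each $y$ by this multiplicity picks out the items that share the largest number of groups with $x$, which is the "intersection of the largest number of sets" described above.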

It is possible to extend the range of the search by also counting the items deemed similar to the items found by searching on the initial group of items.
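The source does not say how the counts from the second step are combined, so the following Python sketch assumes the simplest option: the association counts of every directly associated item are added, unweighted, to the direct counts. The function names `associations` and `extended_search` are hypothetical.

```python
from collections import Counter

def associations(groups, item):
    # Multiset of items submitted together with `item` (the item itself excluded).
    counts = Counter()
    for group in groups:
        if item in group:
            counts.update(group - {item})
    return counts

def extended_search(groups, item):
    # First step: items directly associated with `item`.
    direct = associations(groups, item)
    extended = Counter(direct)
    # Second step (assumption): add each direct neighbour's own associations,
    # unweighted, skipping the original item.
    for neighbour in direct:
        for other, count in associations(groups, neighbour).items():
            if other != item:
                extended[other] += count
    return extended

groups = [{"rice", "green beans"}, {"green beans", "corn"}]  # made-up groups
print(extended_search(groups, "rice"))
```

"corn" was never submitted with "rice", but it is reached through "green beans". Weighting the second step by the first step's counts would be an equally plausible design.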

How about an example? Let's consider things you find on your dinner plate.

[steak, eggs, rice, green beans]
[hamburger, fries, cole slaw, beer]
[hamburger, tater tots, pickles, beer]
[steak, baked potato, corn, beer]
[spam, eggs, fries, beer]

With those five sets of four items, we may take, for each unique item, the union of the sets containing that item, leaving out the item itself and counting repeated members. For this example it is:

steak		-> [eggs, rice, green beans, baked potato, corn, beer]
eggs		-> [steak, rice, green beans, spam, fries, beer]
rice		-> [steak, eggs, green beans]
green beans	-> [steak, eggs, rice]
hamburger	-> [fries, cole slaw, beer x 2, tater tots, pickles]
fries		-> [hamburger, cole slaw, beer x 2, spam, eggs]
cole slaw	-> [hamburger, fries, beer]
beer		-> [hamburger x 2, fries x 2, cole slaw, tater tots, pickles, steak, baked potato, corn, spam, eggs]
tater tots	-> [hamburger, pickles, beer]
pickles		-> [hamburger, tater tots, beer]
baked potato	-> [steak, corn, beer]
corn		-> [steak, baked potato, beer]
spam		-> [eggs, fries, beer]
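The table can be rebuilt from the five groups with a short Python sketch. `build_table` is a hypothetical helper name; `collections.Counter` plays the role of a set that keeps repeated members.

```python
from collections import Counter

# The five dinner-plate groups from the example above.
groups = [
    {"steak", "eggs", "rice", "green beans"},
    {"hamburger", "fries", "cole slaw", "beer"},
    {"hamburger", "tater tots", "pickles", "beer"},
    {"steak", "baked potato", "corn", "beer"},
    {"spam", "eggs", "fries", "beer"},
]

def build_table(groups):
    # For each unique item, add up every group containing it,
    # leaving the item itself out of its own entry.
    table = {}
    for group in groups:
        for item in group:
            table.setdefault(item, Counter()).update(group - {item})
    return table

table = build_table(groups)
for item, counts in sorted(table.items()):
    print(item, "->", sorted(counts.items()))
```

Each entry matches the corresponding row of the table; for instance, `table["beer"]["hamburger"]` is 2 because hamburgers and beer were submitted together twice.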

Then, given two items as input, we may generate a list of related items from the information in the table. For the input [fries, cole slaw], the intersection of the two table entries (the items present in both entries, with their counts added together) is:

[hamburger, cole slaw, beer x 2, spam, eggs] X [hamburger, fries, beer]

== [hamburger x 2, beer x 3]

suggesting that hamburgers and beer go well with fries and cole slaw.
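The two-item query can be sketched in Python as follows. `build_table` and `related` are hypothetical helpers; `related` follows the worked example by keeping only items present in every input's entry, skipping the inputs themselves, and adding up the counts. Requiring presence in every entry when there are more than two inputs is an assumption.

```python
from collections import Counter

# The five dinner-plate groups from the example.
groups = [
    {"steak", "eggs", "rice", "green beans"},
    {"hamburger", "fries", "cole slaw", "beer"},
    {"hamburger", "tater tots", "pickles", "beer"},
    {"steak", "baked potato", "corn", "beer"},
    {"spam", "eggs", "fries", "beer"},
]

def build_table(groups):
    # item -> Counter of items submitted alongside it (the item itself excluded).
    table = {}
    for group in groups:
        for item in group:
            table.setdefault(item, Counter()).update(group - {item})
    return table

def related(table, inputs):
    # Intersect the inputs' table entries, skip the inputs themselves,
    # and add up each surviving item's counts across the entries.
    entries = [table[item] for item in inputs]
    common = set.intersection(*(set(entry) for entry in entries)) - set(inputs)
    return Counter({item: sum(entry[item] for entry in entries) for item in common})

table = build_table(groups)
print(related(table, ["fries", "cole slaw"]).most_common())
# prints [('beer', 3), ('hamburger', 2)]
```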