Wednesday, March 21, 2012

Degree of separation search

I have an application that lets users search based on degree of separation,
so for instance, a user can search for age, hobbies, etc. and limit the
search to just users who are 1 degree separate, 2 degrees separate, or 3
degrees separate.
However as it stands now, searches are taking a long long time because
according to the programmer, the degree of separation is calculated
dynamically upon search, and with a system of about 50,000 users this is
taking way over one minute to execute, causing timeouts in the browser.
Is there a better way to go about doing this type of search? Perhaps using
a scheduler to perform some calculations beforehand so the searches can use
it? Any feedback will be greatly appreciated.
Shabam
Read this article
http://www.sommarskog.se/dyn-search.html
"Shabam" <chalupa@.yomama-nospam.com> wrote in message
news:_aCdnc14X62v2bnfRVn-iw@.adelphia.com...
> I have an application that lets users search based on degree of
separation,
> so for instance, a user can search for age, hobbies, etc. and limit the
> search to just users who are 1 degree separate, 2 degrees separate, or 3
> degrees separate.
> However as it stands now, searches are taking a long long time because
> according to the programmer, the degree of separation is calculated
> dynamically upon search, and with a system of about 50,000 users this is
> taking way over one minute to execute, causing timeouts in the browser.
> Is there a better way to go about doing this type of search? Perhaps
using
> a scheduler to perform some calculations beforehand so the searches can
use
> it? Any feedback will be greatly appreciated.
>
|||"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:%23zTvJokHFHA.3624@.tk2msftngp13.phx.gbl...
> Shabam
> Read this article
> http://www.sommarskog.se/dyn-search.html
Thanks for the article link. However the main problem with this search is
the degree of separation search, not the other search criterias. It appears
the application is going through calculating the degree of separation of
each user, then taking the acceptable ones and doing a search on them. The
degree of separation is stored in a function, and thus is being called
hundres, perhaps thousands of times per search. This is why it's taking 1+
minute to do a search. Do you have any ideas/suggestions on how to do this
right?
|||"Shabam" <chalupa@.yomama-nospam.com> wrote in message
news:LNednSboiO-g2rnfRVn-iQ@.adelphia.com...
> "Uri Dimant" <urid@.iscar.co.il> wrote in message
> news:%23zTvJokHFHA.3624@.tk2msftngp13.phx.gbl...
> Thanks for the article link. However the main problem with this search is
> the degree of separation search, not the other search criterias. It
> appears
> the application is going through calculating the degree of separation of
> each user, then taking the acceptable ones and doing a search on them.
> The
> degree of separation is stored in a function, and thus is being called
> hundres, perhaps thousands of times per search. This is why it's taking
> 1+
> minute to do a search. Do you have any ideas/suggestions on how to do
> this
> right?
>
There's not much concrete advice we can give without table DDL, sample data
and an explanation of the expected results.
David
|||Basically, this is a spacial problem. What I find most often is that the
developer wants to take the parameters as dynamic, then calculate a
"distance" function between the user with respect to the remaining users
based on the chosen metrics. That is time consuming and expensive.
However, coordinates in "space" are fixed, relativistic effects aside. So,
they are not dynamic and everyone's "position" is known for all metrics.
The problem is that many metrics have differing scales, but we will ignore
that for the moment. So, from beginning geometry, we have for each user,
there position is the set of coordinates, with respect to the origin:
User A: (x1, x2, ..., xn)
User B: (y1, y2, ..., yn)
Their "distance" from the origin is just the Pythagorean Theorem: a^2 + b^2
= c^2, but in N dimensions. The "distance" of all users from a specific one
is just a change of coordinates such that the specific user is put at the
origin:
User A, new coordinates: (x1 - x1, x2 - x2, ..., xn - xn), which is 0 and
what we wanted. For all other users, with respect to the specific user:
User B, new coordinates: (y1 - x1, y2 - x2, ..., yn - xn).
Now, the "distance" from the specific user to any other, in that reference
frame, is just the multi-dimensional, Pythagorean Theorem:
[(y1 - x1)^2 + (y2 - x2)^2 + ... + (yn - xn)^2]^1/2 = distance.
This outlines a multi-dimensional sphere, centered on the specific user.
The point is that everyone's position in space is know with respect to a
common origin and can be calculated beforehand and saved. Now, if you know
my position, you know my direction from the origin, then all users that are
a similar distance from the origin as I am, and in the general direction as
me, must be near me. This logic will produce a subset. Depending on how
restrictive you need to be, like top 100, top 10, top 5, etc., you could
create a general list of others that are near enough to calculate the
specific value without having to calculate it for everyone.
Say you need the 10 closest. Then with a set of, say 100, that where in my
general direction, you could quickly calculate the distance function above
for a mere 100 or so others and come up with the 10 closest, orders of
magnitude quicker than you could if you calculated the distance for
everyone.
Hope this helps.
Sincerely,
Anthony Thomas

"Shabam" <chalupa@.yomama-nospam.com> wrote in message
news:LNednSboiO-g2rnfRVn-iQ@.adelphia.com...
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:%23zTvJokHFHA.3624@.tk2msftngp13.phx.gbl...
> Shabam
> Read this article
> http://www.sommarskog.se/dyn-search.html
Thanks for the article link. However the main problem with this search is
the degree of separation search, not the other search criterias. It appears
the application is going through calculating the degree of separation of
each user, then taking the acceptable ones and doing a search on them. The
degree of separation is stored in a function, and thus is being called
hundres, perhaps thousands of times per search. This is why it's taking 1+
minute to do a search. Do you have any ideas/suggestions on how to do this
right?
|||However as it stands now, searches are taking a long long time because
according to the programmer, the degree of separation is calculated
dynamically upon search, and with a system of about 50,000 users this
is
taking way over one minute to execute, causing timeouts in the browser
Have you tried to increase the timeout value?
Madhivanan
|||See http://groups.google.co.uk/groups?q=nearestExamplar for
further discussion along the lines of what Anthony has said.
Steve Kass
Drew University
Shabam wrote:

>I have an application that lets users search based on degree of separation,
>so for instance, a user can search for age, hobbies, etc. and limit the
>search to just users who are 1 degree separate, 2 degrees separate, or 3
>degrees separate.
>However as it stands now, searches are taking a long long time because
>according to the programmer, the degree of separation is calculated
>dynamically upon search, and with a system of about 50,000 users this is
>taking way over one minute to execute, causing timeouts in the browser.
>Is there a better way to go about doing this type of search? Perhaps using
>a scheduler to perform some calculations beforehand so the searches can use
>it? Any feedback will be greatly appreciated.
>
>
|||Thanks for the reply, but I think there's a misunderstanding here. When I
say degree of separation, I don't mean separation by physical distance, but
by friendship. For instance, A knows B, and B knows C. A doesn't know C.
In this case the relationship would be:
A <-> B <-> C
B would be a first degree friend of A, and C would be a second degree friend
of A, and so on.
The search is limiting based on this type of degree of separation.
|||Yes, I realize the answer I gave you was generic and mathematically based;
however, the logic is the same. For each user, you know there first level
acquantinces, etc., etc. This shouldn't change.
The only thing that is dynamic is which metrics to use for each search. If
you try to recompute it for each query, danamically, this becomes an M x N x
(N - 1) computation. As N or M gets large, this WILL NOT BE LINEAR; thus,
it does NOT scale well.
That, my friend, is a poorly written application and I wouldn't allow into
production. Add just 10% more users and it will bring your system to a
screetching halt!
Tell your "developer" to go back to school and learn what "good" code looks
like.
Sincerely,
Anthony Thomas

"Shabam" <chalupa@.yomama-nospam.com> wrote in message
news:O-idnU8g8Imw4bjfRVn-1w@.adelphia.com...
Thanks for the reply, but I think there's a misunderstanding here. When I
say degree of separation, I don't mean separation by physical distance, but
by friendship. For instance, A knows B, and B knows C. A doesn't know C.
In this case the relationship would be:
A <-> B <-> C
B would be a first degree friend of A, and C would be a second degree friend
of A, and so on.
The search is limiting based on this type of degree of separation.
|||"Shabam" <chalupa@.yomama-nospam.com> wrote in message
news:O-idnU8g8Imw4bjfRVn-1w@.adelphia.com...
> Thanks for the reply, but I think there's a misunderstanding here. When I
> say degree of separation, I don't mean separation by physical distance,
> but
> by friendship. For instance, A knows B, and B knows C. A doesn't know C.
> In this case the relationship would be:
> A <-> B <-> C
> B would be a first degree friend of A, and C would be a second degree
> friend
> of A, and so on.
> The search is limiting based on this type of degree of separation.
>
Ok. If you are just looking for a couple of "levels" you can do this pretty
quickly with a join.
There are several tricky problems with storing and sorting this kind of
relationship data, and you still didn't post DDL or sample data, so here's a
simple example:
drop table friend
drop table person
go
create table person
(
name varchar(20) primary key,
favorite_band varchar(50)
)
create table friend
(
friend1 varchar(20) not null references person,
friend2 varchar(20) not null references person,
constraint pk_friends primary key (friend1,friend2)
)
create index ix_friend2 on friend(friend2)
insert into person (name,favorite_band) values ('Joe','Def Leopard')
insert into person (name,favorite_band) values ('Alex','Wham')
insert into person (name,favorite_band) values ('Helmut','David Hasselhoff')
insert into person (name,favorite_band) values ('Dennis','Def Leopard')
insert into friend (friend1,friend2) values ('Joe','Alex')
insert into friend (friend1,friend2) values ('Alex','Joe')
insert into friend (friend1,friend2) values ('Joe','Helmut')
insert into friend (friend1,friend2) values ('Helmut','Joe')
insert into friend (friend1,friend2) values ('Helmut','Dennis')
insert into friend (friend1,friend2) values ('Dennis','Helmut')
insert into friend (friend1,friend2) values ('Dennis','Alex')
insert into friend (friend1,friend2) values ('Alex','Dennis')
'Joe has two tickes to the Def Leopard concert and needs someone to'
go
'with, but being shy he wants to go with a friend or a friend of a friend'
create view friends_and_friends_of_friends
as
select
p0.name,
p0.favorite_band,
p1.name friend_name,
p1.favorite_band friend_favorite_band,
p2.name friend_of_friend_name,
p2.favorite_band friend_of_friend_favorite_band
from
person p0
join friend f1 on p0.name = f1.friend1
join person p1 on f1.friend2 = p1.name
join friend f2 on p1.name = f2.friend1
join person p2 on p2.name = f2.friend2
where
p0.name <> p2.name
This query tells Joe that he can go with Dennis, and that he can get
introduced through either Alex or Helmut.
select *
from friends_and_friends_of_friends
where
name = 'Joe'
and
(
friend_favorite_band = 'Def Leopard'
or
friend_of_friend_favorite_band = 'Def Leopard'
)
David
sql

No comments:

Post a Comment