Massive Algorithms: 寻找发帖“水王”

http://www.nowamagic.net/librarys/veda/detail/2377
《编程之美》2.3：
Tango是微软亚洲研究院的一个试验项目。研究院的员工和实习生们都很喜欢在Tango上面交流灌水。传说，Tango有一大“水王”，他不但喜欢发贴，还会回复其他ID发的每个帖子。坊间风闻该“水王”发帖数目超过了帖子总数的一半。如果你有一个当前论坛上所有帖子（包括回帖）的列表，其中帖子作者的ID也在表中，你能快速找出这个传说中的Tango水王吗？

Actually this is classic major element algorithm.
http://blog.csdn.net/kimili1987/article/details/8037984
题目：寻找一个ID列表中，有一个ID超过了总数的一半，找出这个ID

/*寻找发帖水王，算法有些类似于寻找最大和子序列*/
Type FInd(Type * ID, int N){
Type candicate;
int i, nTimes; //nTimes用来记录重复次数
for(i = nTimes = 0; i < N; i ++){
if(nTimes == 0){
candicate = ID[i]; //如果发现nTimes=0，则前面的重复完全抵消，重新选择一个candicate
nTimes = 1;
}else {
if(candicate == ID[i]) nTimes ++; //如果发现相同ID，重复次数加1
else nTimes --; //否则减少
}
}
return candicate;
}

扩展问题
随着Tango的发展，管理员发现，“超级水王”没有了。统计结果表明，有3个发帖很多的ID，他们的发帖数目都超过了帖子总数目N的1/4。你能从发帖ID列表中快速找出他们的ID吗？

“超级水王”没有了，有3个ID发帖也很多，每个人都超过帖子总数的1/4了，怎么快速找到这三个ID？

思路和“超级水王”的是一样的，只是这次需要有三个candicate[3]，和三个nTimes[3]。如果找到与这三个ID相同的，则对应的加1，如果都不相同，则三个ID数量都要减1

/*寻找发帖水王，算法有些类似于寻找最大和子序列*/
Type *Find(Type * ID, int N){
Type candicate[3];
int i, j, k, nTimes[3]; //nTimes用来记录重复次数
for(i = 0; i < 3; i++) nTimes[i] = 0;
for(i = 0; i < N; i ++){
for(j = 0; j < 3; j ++){
if(nTimes[j] == 0){
candicate[j] = ID[i]; //如果发现nTimes=0，则前面的重复完全抵消，重新选择一个candicate
nTimes[j] = 1;
break; // should break here
}else {
if(candicate[j] == ID[i]) {
nTimes[j] ++; //如果发现相同ID，重复次数加1
break;
}
}
}
//与这三个ID都不同，则都要减1
if(j == 3){
for(k = 0; k < 3; k++) nTimes[k]--;
}
}
return candicate;
}

http://www.nowamagic.net/librarys/veda/detail/2377

void Find(Type* ID, int N,Type candidate[3])
{
    Type ID_NULL;//定义一个不存在的ID
    int nTimes[3], i;
    nTimes[0]=nTimes[1]=nTimes[2]=0;
    candidate[0]=candidate[1]=candidate[2]=ID_NULL;
    for(i = 0; i < N; i++)
    {
        if(ID[i]==candidate[0])
        {
             nTimes[0]++;
        }
        else if(ID[i]==candidate[1])
        {
             nTimes[1]++;
        }
        else if(ID[i]==candidate[2])
        {
             nTimes[2]++;
        }
        else if(nTimes[0]==0)
        {
             nTimes[0]=1;
             candidate[0]=ID[i];
        }
        else if(nTimes[1]==0)
        {
             nTimes[1]=1;
             candidate[1]=ID[i];
        }
        else if(nTimes[2]==0)
        {
             nTimes[2]=1;
             candidate[2]=ID[i];
        }
        else
        {
             nTimes[0]--;
             nTimes[1]--;
             nTimes[2]--;
         }
    }
    return;
}

https://www.cs.ucsb.edu/~suri/cs130a/Celebrity.txt

2. Extension to K elements, each appearing at least n/(k+1) times.
   Maintain k candidates and counters.
   When considering xi, if xi equals one of the cands, increment it;
   if xi different and some counter available,
    set xi as that cand, and M = 1;
   if xi different from all, and all cands full;
    decrement them all.

3. A hot application problem in Data Stream context---heavy hitters
 in routers, top k popular items, etc...

   One important attribute of this scheme is that it uses O(1)
 memory---independent of the stream size.

寻找发帖“水王” - 编程之美-2.3

Labels

Popular Posts