cloudeagle_bupt

A Failed Attempt to Improve Hama

 

Note the following code in the IncomingVertexMessageManager class (Hama 0.7.1):


  @Override
  public GraphJobMessage poll() {
    if (mapMessages.size() > 0) {
      return mapMessages.poll();
    } else {
      if (storage.size() > 0 && it.hasNext()) {
        GraphJobMessage m = it.next();
        it.remove();
        return m;
      } else {
        return null;
      }
    }
  }
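The poll() above first drains mapMessages and only falls back to storage once that is empty, removing each stored message through the iterator as it hands it out. The self-contained sketch below mimics that control flow outside Hama so it can be run on its own. The class name TwoTierPoller, the String message type, the choice of ConcurrentLinkedQueue and ConcurrentHashMap for the two tiers, and the lazily created iterator are all assumptions for illustration, not Hama's actual types or initialization.

```java
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the two message tiers in poll(); not Hama's real classes.
public class TwoTierPoller {
    private final Queue<String> mapMessages = new ConcurrentLinkedQueue<>();
    private final ConcurrentHashMap<Integer, String> storage = new ConcurrentHashMap<>();
    private Iterator<String> it; // assumption: created lazily on the first fallback to storage

    void addQueued(String m) { mapMessages.add(m); }

    void addStored(int key, String m) { storage.put(key, m); }

    int storedCount() { return storage.size(); }

    // Same control flow as the Hama poll(): queue first, then the map.
    String poll() {
        if (!mapMessages.isEmpty()) {
            return mapMessages.poll();
        }
        if (it == null) {
            it = storage.values().iterator();
        }
        if (storage.size() > 0 && it.hasNext()) {
            String m = it.next();
            it.remove(); // the message leaves the map as soon as it is handed out
            return m;
        }
        return null;
    }

    public static void main(String[] args) {
        TwoTierPoller p = new TwoTierPoller();
        p.addQueued("q1");
        p.addStored(1, "s1");
        p.addStored(2, "s2");
        int polled = 0;
        while (p.poll() != null) {
            polled++;
        }
        System.out.println("polled " + polled + ", left in storage " + p.storedCount());
    }
}
```

Running main prints `polled 3, left in storage 0`: the queued message comes out first, then both stored messages, and storage is empty afterwards because each one was removed on the way out.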

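In theory, deleting each element while reading it should be less efficient than reading everything first and then deleting it all at once. Here is the test code: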


package concurrencyTest;

import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;



public class ConcurrencyReadTest {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		ConcurrentHashMap<Integer, Integer>  map = new ConcurrentHashMap<Integer, Integer>() ;
		for(int i=0;i<1000000 ; i++)
			map.put(i, i+111111) ;
		
		long start = System.currentTimeMillis() ;
		Thread t1 = new Thread(new Worker(map)) ;
		
		t1.start(); 
		
		try {
			t1.join() ;
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
//		map.clear() ; // not needed in this version: poll() already removes every element
		System.out.println("Read last : " + (System.currentTimeMillis() - start) + " ms");
	}
}

class Worker implements Runnable {
	ConcurrentHashMap<Integer, Integer>  sharemap ;
    int num = 0 ;
	
	public Worker(ConcurrentHashMap<Integer, Integer> map) {
		sharemap = map ;
	}
	
	@Override
	public void run() {
		 it = sharemap.values().iterator();
		 Integer i = poll(); 
		 while(i!=null) {
			 i = poll() ;
			 num++ ;
		 }
		 System.out.println(Thread.currentThread().getName() + " poll " + num + " elements! ShareMap size: " + sharemap.size() );
	}
	

	Iterator<Integer> it;

	public Integer poll() {
		     if (sharemap.size() > 0 && it!=null && it.hasNext()) {
		         Integer m = it.next();
		         it.remove(); // delete each element right after reading it
		         return m;
		       } else {
		         return null;
		       }
	}
}

Results of three runs: 187, 172, 172 ms

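I then changed the code as follows, commenting out it.remove() in poll() and calling map.clear() once after the worker thread finishes: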

package concurrencyTest;

import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;



public class ConcurrencyReadTest {

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		ConcurrentHashMap<Integer, Integer>  map = new ConcurrentHashMap<Integer, Integer>() ;
		for(int i=0;i<1000000 ; i++)
			map.put(i, i+111111) ;
		
		long start = System.currentTimeMillis() ;
		Thread t1 = new Thread(new Worker(map)) ;
//		Thread t2 = new Thread(new Worker(map)) ;
		
		t1.start(); 
//		t2.start();
		
		try {
			t1.join() ;
//			t2.join();
		} catch (InterruptedException e) {
			e.printStackTrace();
		}
		map.clear() ; // delete all elements in one call after reading
		System.out.println("Read last : " + (System.currentTimeMillis() - start) + " ms");
	}
}

class Worker implements Runnable {
	ConcurrentHashMap<Integer, Integer>  sharemap ;
    int num = 0 ;
	
	public Worker(ConcurrentHashMap<Integer, Integer> map) {
		sharemap = map ;
	}
	
	@Override
	public void run() {
		 it = sharemap.values().iterator();
		 Integer i = poll(); 
		 while(i!=null) {
			 i = poll() ;
			 num++ ;
		 }
		 System.out.println(Thread.currentThread().getName() + " poll " + num + " elements! ShareMap size: " + sharemap.size() );
	}
	

	Iterator<Integer> it;

	public Integer poll() {
		     if (sharemap.size() > 0 && it!=null && it.hasNext()) {
		         Integer m = it.next();
//		         it.remove(); // removal is now done by map.clear() in main()
		         return m;
		       } else {
		         return null;
		       }
	}
}

Results of three runs: 140, 124, 140 ms

So it is indeed faster, at least in this single-threaded test.
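Single runs timed with System.currentTimeMillis() and no JIT warm-up are noisy, so a gap of a few tens of milliseconds is only suggestive. The sketch below is one way to make the comparison a little fairer, assuming both strategies run in the same JVM, each on a freshly filled map, with the first few rounds discarded as warm-up and System.nanoTime() used for timing. The class and method names are hypothetical; the numbers it prints will vary by machine and JVM, and for serious measurements a harness such as JMH is the usual tool.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

public class DrainBenchmark {
    static final int N = 1_000_000;

    // Same data as the test program: N entries with value i + 111111.
    static ConcurrentHashMap<Integer, Integer> fill() {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < N; i++) {
            map.put(i, i + 111111);
        }
        return map;
    }

    // Strategy 1 (original Hama style): remove each element while reading it.
    static long drainWithRemove(ConcurrentHashMap<Integer, Integer> map) {
        long sum = 0;
        Iterator<Integer> it = map.values().iterator();
        while (it.hasNext()) {
            sum += it.next();
            it.remove();
        }
        return sum;
    }

    // Strategy 2 (the proposed change): read everything, then clear once.
    static long drainThenClear(ConcurrentHashMap<Integer, Integer> map) {
        long sum = 0;
        for (Integer v : map.values()) {
            sum += v;
        }
        map.clear();
        return sum;
    }

    public static void main(String[] args) {
        long sink = 0; // printed at the end so the reads are not optimized away
        for (int round = 0; round < 8; round++) {
            ConcurrentHashMap<Integer, Integer> a = fill();
            long t0 = System.nanoTime();
            sink += drainWithRemove(a);
            long removeMs = (System.nanoTime() - t0) / 1_000_000;

            ConcurrentHashMap<Integer, Integer> b = fill();
            t0 = System.nanoTime();
            sink += drainThenClear(b);
            long clearMs = (System.nanoTime() - t0) / 1_000_000;

            if (round >= 3) { // rounds 0-2 are JIT warm-up and are not reported
                System.out.println("remove-as-you-go: " + removeMs
                        + " ms, read-then-clear: " + clearMs + " ms");
            }
        }
        System.out.println("checksum " + sink);
    }
}
```

Both strategies read the same values and leave the map empty; only the point at which entries are deleted differs.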


However, after applying this change to Hama and measuring the total computation time, I got:

New version (with the change):

16/09/24 16:20:20 INFO graph.GraphJobRunner: totalBroasdCastTime: 1793 totalSyncTime: 9219 totalLoopingTime: 13059
16/09/24 16:21:30 INFO graph.GraphJobRunner: totalBroasdCastTime: 2008 totalSyncTime: 9699 totalLoopingTime: 12834
16/09/24 16:22:47 INFO graph.GraphJobRunner: totalBroasdCastTime: 1909 totalSyncTime: 9192 totalLoopingTime: 12634

Old version (original Hama 0.7.1 code):

16/09/24 16:32:22 INFO graph.GraphJobRunner: totalBroasdCastTime: 1993 totalSyncTime: 8351 totalLoopingTime: 13387
16/09/24 16:34:28 INFO graph.GraphJobRunner: totalBroasdCastTime: 1582 totalSyncTime: 8635 totalLoopingTime: 12827
16/09/24 16:35:39 INFO graph.GraphJobRunner: totalBroasdCastTime: 1679 totalSyncTime: 8543 totalLoopingTime: 12836
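Averaging the three runs of each version makes the logs easier to compare. The snippet below only copies the logged values (in ms) and computes their means; the class and helper names are hypothetical.

```java
public class LogAverages {
    static double mean(long... xs) {
        long sum = 0;
        for (long x : xs) {
            sum += x;
        }
        return (double) sum / xs.length;
    }

    public static void main(String[] args) {
        // Values copied from the GraphJobRunner log lines above (ms).
        double newBroadcast = mean(1793, 2008, 1909);
        double newSync = mean(9219, 9699, 9192);
        double newLoop = mean(13059, 12834, 12634);
        double oldBroadcast = mean(1993, 1582, 1679);
        double oldSync = mean(8351, 8635, 8543);
        double oldLoop = mean(13387, 12827, 12836);
        System.out.printf("broadcast: new %.1f vs old %.1f%n", newBroadcast, oldBroadcast);
        System.out.printf("sync:      new %.1f vs old %.1f%n", newSync, oldSync);
        System.out.printf("looping:   new %.1f vs old %.1f%n", newLoop, oldLoop);
    }
}
```

Averaged this way, the new version spends about 860 ms more in sync (9370.0 vs 8509.7) and about 150 ms more in broadcast (1903.3 vs 1751.3), while its looping time is about 175 ms lower (12842.3 vs 13016.7). Only the sync gap is larger than the spread between runs within either version; with three runs each, the broadcast and looping differences are within the noise.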


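In the end, though, the old version is still slightly faster overall; in the logs above, its totalSyncTime is lower in all three runs. I don't know why. My guess: could deleting messages early leave more memory available and thereby improve performance?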
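To make the guess concrete: with remove-as-you-go, every message already polled is gone from the map and can be garbage-collected while the rest are still being processed; with read-then-clear, the map keeps referencing all of them until clear() runs. The single-threaded sketch below only counts map entries at the halfway point of a drain. It is not a memory measurement, the class name is hypothetical, and whether this retention actually explains the Hama timings remains a hypothesis.

```java
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

public class RetentionDemo {
    // Returns how many entries the map still holds after half of them were read.
    static int entriesAtHalfway(int n, boolean removeWhileReading) {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(i, i);
        }
        Iterator<Integer> it = map.values().iterator();
        for (int read = 0; read < n / 2 && it.hasNext(); read++) {
            it.next();
            if (removeWhileReading) {
                it.remove(); // the polled value is no longer referenced by the map
            }
        }
        int atHalfway = map.size();
        while (it.hasNext()) { // finish reading the remaining entries
            it.next();
            if (removeWhileReading) {
                it.remove();
            }
        }
        map.clear(); // in read-then-clear mode, entries are released only here
        return atHalfway;
    }

    public static void main(String[] args) {
        System.out.println("remove-as-you-go, entries at halfway: " + entriesAtHalfway(1000, true));
        System.out.println("read-then-clear, entries at halfway:  " + entriesAtHalfway(1000, false));
    }
}
```

For 1000 messages, the first mode holds 500 entries at the halfway point and the second still holds all 1000.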

