cloudeagle_bupt

Giraph debugging attempts


Method 1: Read the logs.



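Method 2: Start the child JVMs with remote debugging (JDWP) enabled and limit each TaskTracker to one map slot and one reduce slot: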

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m -Xdebug -Xrunjdwp:transport=dt_socket,address=8792,server=y,suspend=y</value>
  </property>

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>

  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>


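Even with the -w parameter set to 1, GiraphX still runs two map tasks: one master and one worker. The worker handles registration and the actual computation, while the master aggregates the data.

With this configuration PageRank runs to completion, so in theory debugging should work as well, but in practice it did not: the master's map task and the worker's map task contend for the same debug port (8792).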
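The port contention can be reproduced outside Hadoop. The following is a minimal sketch, not Giraph or Hadoop code: it only shows that a second listener on a TCP port that is already bound is rejected, which is presumably what happens when two child JVMs on the same host are both started with `address=8792,server=y`. The class name `DebugPortClash` is made up for illustration.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

class DebugPortClash {
    /** Returns true if a second listener on an already bound port is rejected. */
    static boolean secondListenerFails() {
        // Port 0 asks the OS for any free port, so the demo does not need 8792 to be free.
        try (ServerSocket first = new ServerSocket(0)) {
            int port = first.getLocalPort();
            try (ServerSocket second = new ServerSocket(port)) {
                return false; // both listeners got the port (not expected)
            } catch (BindException e) {
                return true;  // the second bind was refused: address already in use
            }
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second listener rejected: " + secondListenerFails());
    }
}
```

One workaround that was not tried in the original experiment: the JDWP documentation states that with `server=y` the `address` option may be omitted, in which case the JVM picks a free port and prints it to standard output.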



Method 3:

IsolationRunner


Add the following to mapred-site.xml:

  <property>
    <name>keep.failed.task.files</name>
    <value>true</value>
  </property>

  <property>
    <name>mapred.local.dir</name>
    <value>/opt/hadoop-1.2.1/tmp/mapred</value>
  </property>

Change to the task attempt's work directory:

/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171716_0003/attempt_201603171716_0003_m_000001_0/work
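The path above appears to follow the layout `<mapred.local.dir>/taskTracker/<user>/jobcache/<job id>/<attempt id>/work`. This layout is inferred from the one example path in this post, not taken from the Hadoop documentation. Under that assumption, a small hypothetical helper (`AttemptWorkDir`) can build the directory for any attempt ID:

```java
class AttemptWorkDir {
    /** Derives the job ID from an attempt ID, e.g. attempt_X_Y_m_000001_0 gives job_X_Y. */
    static String jobIdOf(String attemptId) {
        String[] parts = attemptId.split("_");
        return "job_" + parts[1] + "_" + parts[2];
    }

    /** Builds the work directory, assuming the layout seen in the example path. */
    static String workDir(String localDir, String user, String attemptId) {
        return String.join("/", localDir, "taskTracker", user, "jobcache",
                jobIdOf(attemptId), attemptId, "work");
    }

    public static void main(String[] args) {
        System.out.println(workDir("/opt/hadoop-1.2.1/tmp/mapred", "liuqiang2",
                "attempt_201603171716_0003_m_000001_0"));
    }
}
```

The `job.xml` passed to IsolationRunner sits one level above `work`, in the attempt directory itself.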

Run:

[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
Exception in thread "main" java.lang.NullPointerException
        at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.ifExists(LocalDirAllocator.java:508)
        at org.apache.hadoop.fs.LocalDirAllocator.ifExists(LocalDirAllocator.java:216)
        at org.apache.hadoop.mapred.IsolationRunner.run(IsolationRunner.java:195)
        at org.apache.hadoop.mapred.IsolationRunner.main(IsolationRunner.java:238)



The stack trace points to LocalDirAllocator, so I added one call (context.confChanged(conf)) to its ifExists method:

  public boolean ifExists(String pathStr, Configuration conf) {
    AllocatorPerContext context = obtainContext(contextCfgItemName);
    try {
      // Added: refresh the allocator context from conf before it is used.
      context.confChanged(conf);
    } catch (IOException e) {
      e.printStackTrace();
    }
    return context.ifExists(pathStr, conf);
  }


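Running it again showed that the Giraph jars were not on the classpath. To fix this, modify the classpath in the hadoop script; see http://blog.csdn.net/cloudeagle_bupt/article/details/50916686

After that, the command runs: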
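A quick way to confirm that a classpath change took effect is to test whether a Giraph class can be loaded. The sketch below is a generic check, not part of Hadoop or Giraph; the class name `ClasspathCheck` is made up, and the default class to probe, `org.apache.giraph.worker.BspServiceWorker`, is taken from the log output in this post.

```java
class ClasspathCheck {
    /** Returns true if the named class can be found by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.giraph.worker.BspServiceWorker";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

If the compiled class is placed on HADOOP_CLASSPATH, it can be launched through the same hadoop script, so it sees the classpath that IsolationRunner will see.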

[liuqiang2@mu02 work]$ pwd
/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171947_0001/attempt_201603171947_0001_m_000001_0/work
[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner  ../job.xml
Result:
[liuqiang2@mu02 work]$ hadoop org.apache.hadoop.mapred.IsolationRunner ../job.xml
Listening for transport dt_socket at address: 8792
16/03/17 20:40:19 WARN bsp.BspOutputFormat: getOutputCommitter: Returning ImmutableOutputCommiter (does nothing).
16/03/17 20:40:19 INFO util.ProcessTree: setsid exited with exit code 0
16/03/17 20:40:19 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@13b3625
16/03/17 20:40:19 INFO mapred.MapTask: Processing split: 'org.apache.giraph.bsp.BspInputSplit, index=-1, num=-1
16/03/17 20:40:19 INFO graph.GraphTaskManager: setup: Log level remains at info
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201603171947_0001
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Made the directory _bsp/_defaultZkManagerDir/job_201603171947_0001/_zkServer
16/03/17 20:40:19 INFO zk.ZooKeeperManager: createCandidateStamp: Creating my filestamp _bsp/_defaultZkManagerDir/job_201603171947_0001/_task/mu02 1
16/03/17 20:40:19 INFO zk.ZooKeeperManager: getZooKeeperServerList: For task 1, got file 'zkServerList_mu02 0 ' (polling period is 3000)
16/03/17 20:40:19 INFO zk.ZooKeeperManager: getZooKeeperServerList: Found [mu02, 0] 2 hosts in filename 'zkServerList_mu02 0 '
16/03/17 20:40:19 INFO zk.ZooKeeperManager: onlineZooKeeperServers: Got [mu02] 1 hosts from 1 ready servers when 1 required (polling period is 3000) on attempt 0
16/03/17 20:40:19 INFO graph.GraphTaskManager: setup: Starting up BspServiceWorker...
16/03/17 20:40:19 INFO bsp.BspService: BspService: Path to create to halt is /_hadoopBsp/job_201603171947_0001/_haltComputation
16/03/17 20:40:19 INFO bsp.BspService: BspService: Connecting to ZooKeeper with job job_201603171947_0001, 1 on mu02:22181
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:host.name=mu02
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.version=1.7.0_79
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.home=/home/liuqiang2/jdk/jdk1.7.0_79/jre
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/opt/hadoop-1.2.1/libexec/../conf:/home/liu ............  (a long list of jars)
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.library.path=/opt/hadoop-1.2.1/libexec/../lib/native/Linux-amd64-64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-279.el6.x86_64
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.name=liuqiang2
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/liuqiang2
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Client environment:user.dir=/opt/hadoop-1.2.1/tmp/mapred/taskTracker/liuqiang2/jobcache/job_201603171947_0001/attempt_201603171947_0001_m_000001_0/work
16/03/17 20:40:19 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=mu02:22181 sessionTimeout=60000 watcher=org.apache.giraph.worker.BspServiceWorker@623cc34d
16/03/17 20:40:19 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:19 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:20 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:21 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:22 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:22 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:23 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:23 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
16/03/17 20:40:24 INFO zookeeper.ClientCnxn: Opening socket connection to server mu02/192.168.0.100:22181. Will not attempt to authenticate using SASL (unknown error)
16/03/17 20:40:24 WARN zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)

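At this point the map task, running as a child process, needs to communicate with ZooKeeper. Because only a single task is being run in isolation, the connection is refused and the task cannot proceed any further. The goal of testing a single task has nevertheless been achieved.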
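The repeated "Connection refused" lines mean that nothing was listening on mu02:22181 when the isolated task started. A minimal sketch for checking this before launching IsolationRunner, using only plain sockets rather than the ZooKeeper client, is shown below; the class name `PortProbe` is made up, and the default host and port are the ones from the log above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Covers "Connection refused", timeouts and unresolved hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "mu02";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 22181;
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

A successful probe only shows that some process accepts connections on that port; it does not verify that the process is a ZooKeeper server holding this job's state.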



