Coin163

Could not find output/spill0.out in any of the configured local directories

2016-05-30 by coin
java.lang.Exception: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/spill0.out in any of the configured local directories
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find output/spill0.out in any of the configured local directories
	at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:429)
	at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:160)
	at org.apache.hadoop.mapred.MapOutputFile.getSpillFile(MapOutputFile.java:107)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1619)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1328)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:431)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
ERROR fetcher.Fetcher - Fetcher: java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1340)
	at org.apache.nutch.fetcher.Fetcher.run(Fetcher.java:1376)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.fetcher.Fetcher.main(Fetcher.java:1349)

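During the fetch step, Nutch writes temporary files, by default under /tmp/hadoop.
The /tmp directory filled up (at about 7.1 GB), and the oldest spill file, spill0.out, was lost; by then the spill count had already reached spill689.out.
At parse time the error above then appears. The stack trace shows it is raised from MapTask$MapOutputBuffer.mergeParts, which tries to merge all of the spill files and can no longer find spill0.out.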
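To check whether a run is heading toward the same failure, a small diagnostic like the one below can help. It is a minimal sketch, not part of Nutch or Hadoop: the class name `SpillUsage` is hypothetical, it assumes the /tmp/hadoop location mentioned above (pass the real `hadoop.tmp.dir` as the first argument otherwise), and it walks the whole tree because the exact subdirectory holding the `spill*.out` files depends on the Hadoop version and job ID. It uses only `java.nio.file` to count the spill files, add up their size, and report the usable space left on that file system.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.regex.Pattern;
import java.util.stream.Stream;

/** Hypothetical diagnostic: how many spill files sit under a temp dir, and how much room is left. */
public class SpillUsage {
    // Matches file names such as spill0.out or spill689.out
    private static final Pattern SPILL = Pattern.compile("spill\\d+\\.out");

    /** Counts spill*.out files anywhere under root. */
    static long countSpillFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> SPILL.matcher(p.getFileName().toString()).matches())
                        .count();
        }
    }

    /** Sums the sizes, in bytes, of all spill*.out files under root. */
    static long spillBytes(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> SPILL.matcher(p.getFileName().toString()).matches())
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        }
    }

    /** Bytes still usable by this JVM on the file system that holds dir. */
    static long usableBytes(Path dir) throws IOException {
        FileStore store = Files.getFileStore(dir);
        return store.getUsableSpace();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: /tmp/hadoop as described above; pass the real hadoop.tmp.dir otherwise.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/hadoop");
        if (!Files.isDirectory(root)) {
            System.out.println("No such directory: " + root);
            return;
        }
        System.out.printf("spill files: %d, total %d MB, usable space left: %d MB%n",
                countSpillFiles(root),
                spillBytes(root) / (1024 * 1024),
                usableBytes(root) / (1024 * 1024));
    }
}
```

Running it periodically during a long fetch shows whether the spill count keeps climbing while the usable space shrinks toward zero.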
Solution:

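  Point hadoop.tmp.dir at another, larger partition, or run the job on a Hadoop cluster. (The map task's local directories, where the spill files are written, default to ${hadoop.tmp.dir}/mapred/local, so moving hadoop.tmp.dir moves the spills too.)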

Add the following inside the <configuration> element of nutch-site.xml:

<property>
<name>hadoop.tmp.dir</name>
<value>/path/to/large/hadoop/tmp</value>
</property>
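After changing the setting, a quick preflight check can confirm that the new directory exists, is writable, and has room. The sketch below is hypothetical: the `TmpDirCheck` class, the default `conf/nutch-site.xml` path, and the 20 GB threshold are illustrative assumptions. It reads the value with the JDK's DOM parser rather than Hadoop's `Configuration` class, so it does not expand `${...}` variables the way Hadoop does.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical preflight check for the hadoop.tmp.dir value set in nutch-site.xml. */
public class TmpDirCheck {

    /** Returns the <value> of the <property> whose <name> equals key, or null if absent. */
    static String readProperty(File siteXml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(siteXml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            NodeList names = prop.getElementsByTagName("name");
            NodeList values = prop.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && key.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    /** True if dir is an existing, writable directory with at least minFreeBytes usable. */
    static boolean hasRoom(Path dir, long minFreeBytes) throws IOException {
        return Files.isDirectory(dir)
                && Files.isWritable(dir)
                && Files.getFileStore(dir).getUsableSpace() >= minFreeBytes;
    }

    public static void main(String[] args) throws Exception {
        // Assumptions: the config path and the 20 GB threshold are illustrative only.
        File site = new File(args.length > 0 ? args[0] : "conf/nutch-site.xml");
        String tmp = readProperty(site, "hadoop.tmp.dir");
        if (tmp == null) {
            System.out.println("hadoop.tmp.dir is not set; the /tmp default still applies");
            return;
        }
        long minFree = 20L * 1024 * 1024 * 1024;
        boolean ok = hasRoom(Paths.get(tmp), minFree);
        System.out.println(tmp + (ok ? " looks OK" : " is missing, not writable, or too small"));
    }
}
```

Run it as `java TmpDirCheck path/to/nutch-site.xml` before starting the crawl.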