Coin163

java.io.IOException: Segment already parsed

2016-05-30 by coin
ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:java.io.IOException: Segment already parsed!
Exception in thread "main" java.io.IOException: Segment already parsed!
	at org.apache.nutch.parse.ParseOutputFormat.checkOutputSpecs(ParseOutputFormat.java:89)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:975)
	at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
	at org.apache.nutch.parse.ParseSegment.parse(ParseSegment.java:209)
	at org.apache.nutch.parse.ParseSegment.run(ParseSegment.java:243)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
	at org.apache.nutch.parse.ParseSegment.main(ParseSegment.java:216)
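Nutch will not parse a segment that has already been parsed. The first frame of the stack trace shows ParseOutputFormat.checkOutputSpecs rejecting the job at submission time, before any parsing happens. To parse the segment again, first delete its parse_data, parse_text and crawl_parse directories.
From inside the affected segment's directory (under segments/), delete them: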
rm -rf parse_data parse_text crawl_parse
Then re-run the parse step:
bin/nutch parse $crawldir/segments/<segmentnumber>
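The two steps above can be wrapped in a small script. This is a minimal sketch, assuming Nutch runs in local mode (segments on the local filesystem) and that the script is executed from the Nutch runtime directory so that bin/nutch resolves. reset_segment_parse and reparse.sh are hypothetical names, not part of Nutch. If your segments live on HDFS, the directories have to be deleted through the Hadoop filesystem shell (hadoop fs) rather than with rm.

```shell
#!/usr/bin/env bash
# Sketch: remove a segment's parse output, then re-run the Nutch parse step.
# Assumes local mode (segment on the local filesystem) and that the script
# is run from the Nutch runtime directory so bin/nutch resolves.
set -euo pipefail

# Hypothetical helper (not part of Nutch): delete the three directories that
# a previous parse left in the segment; all other directories are untouched.
reset_segment_parse() {
  local segment="$1"
  # Refuse a missing or empty path, so rm never runs on an unintended
  # location such as "/parse_data".
  if [ ! -d "$segment" ]; then
    echo "not a segment directory: $segment" >&2
    return 1
  fi
  rm -rf "$segment/parse_data" "$segment/parse_text" "$segment/crawl_parse"
}

# Usage (hypothetical file name): ./reparse.sh "$crawldir/segments/<segmentnumber>"
if [ "$#" -ge 1 ]; then
  reset_segment_parse "$1"
  bin/nutch parse "$1"
fi
```

Only the three parse outputs are removed; the fetched content stored in the segment is left in place, so the parse step can read it again without re-fetching.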

