Bug #3453

Exception extracting text from doc11.pdf

Added by Ryan McFall about 11 years ago. Updated about 11 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: parser
Target version:
Start date: 07/16/2008
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 3453

Description

When parsing doc11.pdf from the private/samples directory, calling PDFExtract.extract throws the following exception:

java.util.NoSuchElementException
    at java.util.LinkedList$ListItr.next(Unknown Source)
    at edu.msu.first.parser.extract.PDFExtract.stripHeaders(PDFExtract.java:931)
    at edu.msu.first.parser.extract.PDFExtract.extractContent(PDFExtract.java:206)
    at edu.msu.first.parser.extract.PDFExtract.extract(PDFExtract.java:136)

History

#1 Updated by Bridger Hamilton about 11 years ago

PDFExtract.stripHeaders now checks that another line of text exists before testing whether that line is a header.
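
For illustration only, the kind of guard described in this update might look like the sketch below. The actual PDFExtract source is not shown in this report, so everything here is an assumption: the class name StripHeadersSketch, the PAGE_BREAK marker, the isHeader rule, and the idea that a header is looked for on the line after a page break are all hypothetical. The only point taken from the report is that ListIterator.next() throws NoSuchElementException when no element remains, and that checking hasNext() first avoids it.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

// Hypothetical sketch; not the real edu.msu.first.parser.extract.PDFExtract.
final class StripHeadersSketch {
    // Assumed page-break marker (form feed) emitted between pages.
    static final String PAGE_BREAK = "\f";

    // Assumed header rule, for illustration: a line starting with "Page ".
    static boolean isHeader(String line) {
        return line.startsWith("Page ");
    }

    // Returns a copy of the lines with the header that follows each page break removed.
    static List<String> stripHeaders(List<String> lines) {
        LinkedList<String> result = new LinkedList<>(lines);
        ListIterator<String> it = result.listIterator();
        while (it.hasNext()) {
            String line = it.next();
            if (!line.equals(PAGE_BREAK)) {
                continue;
            }
            // The guard: make sure another line exists before reading it.
            // Without this check, a page break on the last line would make
            // it.next() throw java.util.NoSuchElementException.
            if (!it.hasNext()) {
                break;
            }
            String candidate = it.next();
            if (isHeader(candidate)) {
                it.remove();   // drop the header line
            } else {
                it.previous(); // step back so the line is examined normally
            }
        }
        return result;
    }
}
```

Under these assumptions, a document whose last line is a page break no longer fails: StripHeadersSketch.stripHeaders(List.of("body", "\f")) returns the input unchanged instead of throwing.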

#2 Updated by Redmine Admin over 6 years ago

Original Bugzilla ID was 3453
