Bug #3453

closed

Exception extracting text from doc11.pdf

Added by Ryan McFall over 16 years ago. Updated over 16 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: parser
Target version:
Start date: 07/16/2008
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 3453

Description

When parsing doc11.pdf from the private/samples directory, calling PDFExtract.extract gives the following stack trace:

java.util.NoSuchElementException
    at java.util.LinkedList$ListItr.next(Unknown Source)
    at edu.msu.first.parser.extract.PDFExtract.stripHeaders(PDFExtract.java:931)
    at edu.msu.first.parser.extract.PDFExtract.extractContent(PDFExtract.java:206)
    at edu.msu.first.parser.extract.PDFExtract.extract(PDFExtract.java:136)

#1

Updated by Bridger Hamilton over 16 years ago

PDFExtract.stripHeaders now checks to make sure there is another line of text before checking to see if it's a header.
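
For readers without access to the PDFExtract source, the kind of guard described above can be sketched as follows. This is not the actual PDFExtract code: the class name, the looksLikeHeader rule, and the "blank line precedes a possible header" convention are all assumptions made for illustration. Only the hasNext() check before next() reflects the fix described in this comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class StripHeadersSketch {

    // Hypothetical header rule; the real test used by PDFExtract is not shown in this report.
    static boolean looksLikeHeader(String line) {
        return line.startsWith("Page ");
    }

    // Copies lines to the result, dropping any header-like line that directly
    // follows a blank line (assumed here to mark a page break).
    static List<String> stripHeaders(LinkedList<String> lines) {
        List<String> kept = new ArrayList<>();
        ListIterator<String> it = lines.listIterator();
        while (it.hasNext()) {
            String line = it.next();
            kept.add(line);
            if (line.isEmpty()) {
                // The guard: without hasNext(), a trailing blank line would make
                // next() throw java.util.NoSuchElementException.
                if (it.hasNext()) {
                    String candidate = it.next();
                    if (!looksLikeHeader(candidate)) {
                        // Not a header: step back so the main loop processes it normally.
                        it.previous();
                    }
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The input ends with a blank line, the case that would fail without the guard.
        LinkedList<String> lines =
                new LinkedList<>(Arrays.asList("body", "", "Page 2", "more", ""));
        System.out.println(stripHeaders(lines));
    }
}
```

The input in main ends with a blank line, so the look-ahead runs with nothing left in the list. With the guard in place the loop simply finishes instead of throwing.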

#2

Updated by Redmine Admin over 11 years ago

Original Bugzilla ID was 3453
