Bug #4058

Question 46 on Fall05-Exam3.pdf causes error in recognition

Added by Ryan McFall over 9 years ago. Updated over 9 years ago.

Status: New
Priority: Normal
Category: parser
Target version:
Start date: 05/11/2009
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id: 4058

Description

(This is a resubmission of a bug lost due to fire. The old bug number was 4057; that number seems to have been taken by a newer bug submitted after the restore. We talked about this bug during the videoconference on 2009-05-06.)

Question 46 on the exam private/samples/Fall05-Exam3.pdf is not recognized
correctly. This question contains some text, followed by a string of DNA on
its own line, and then several choices, all of which are numbers.

Not all of the prompt is put into the question text, and the last choice (E)
also contains the question text for the next question.

Sandeep, can you investigate this and make a report back as to why this is
occurring?

History

#1 Updated by Sandeep Namilikonda over 9 years ago

The following pseudocode is relevant to bugs 3923, 3924, 3989, 4058, and 4005.
I have elaborated on the conditions that seem most likely to be faulty.

PDFAssessment: parse() {

// SeparateQuestions()

// AssociateImages()

// CombineSplitQuestions()

// RecognizeQuestions()

}

Pseudo code for SeparateQuestions():

for each string block "i":

  1) Find new section or part
  2) In the text block, if found
     (i)    delimited question
     (ii)   "answer key" text
     (iii)  instr block and range
     (iv)   image tag
     (v)    digit
            - if expected question # found => set nextNumMatches and
              update qnum list and the index (multi-digit case)!
            - else if new section found
            - else if answer key found {} (goto for "i")!
            - else append text to current question {} (goto for "i")!
            - if correct number found then
                (below are examples of cases handled where
                 134 is an arbitrary question number)
                exclude 134" " and 134"a"
                exclude 134."a" and 134.")" and 134."$"
                exclude 134" " and 134" a"
                allow:
                  134)
                  134" ".
                  134" "$
                  134#
                  134 a
                  134 #
                - check for answer key "<number>) <letter>"
                - if index+1 is digit then append to current Q
                - else set current Q's question number and prepend
                  instr block if any!
            - else if new page then change pagenum and set question to current
            - else if left or right indented text
                - if right then append
     (vi)   instr block
            - update instr block with current
            - Remove the text from list and i--.
     (vii)  white lines between two consecutive lines of text
            - if text corresponds to instructions (e.g., questions i-j)
              then choose the text to be the new question text and set
              indented to false.
            - else append current text to prev question and add new box
              to existing one. Remove the text and i--.
     (viii) else same question but check indentation
            - if left indented and choice text found (e.g., a. or b) )
              then add the current text to prev question and update box.
            - else set indented to true if current.x > q.x.
              Add the current text to prev question and update box.
            - Remove text and i--.

Most of the cases are handled by this last condition (e.g., choices, multiple
lines of question). In fact, this is what results in bug 4058, due to a faulty indentation check!
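
To make the last condition (viii) easier to discuss, here is a rough sketch of it in Python. It is only an illustration of the pseudocode above, not the parser's actual code: TextBlock, Question, CHOICE_RE, and attach_to_previous are hypothetical names, the "left indented" test is passed in as a flag because the pseudocode does not say how it is computed, and the bounding box update and the "remove text and i--" step are left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical pattern for choice labels such as "a." or "b)".
CHOICE_RE = re.compile(r"^\s*[A-Za-z][.)]")


@dataclass
class TextBlock:
    text: str
    x: float  # left edge of the text block on the page


@dataclass
class Question:
    x: float  # left edge of the question's first line
    lines: list = field(default_factory=list)
    indented: bool = False


def attach_to_previous(question, block, left_indented):
    """Sketch of case (viii): the block is treated as part of the same question.

    left_indented is supplied by the caller; how it is derived is an
    assumption outside this sketch.
    """
    if left_indented and CHOICE_RE.match(block.text):
        # Choice text such as "a." or "b)": add it to the previous question.
        question.lines.append(block.text)
    else:
        # Otherwise the indented flag depends only on the x comparison,
        # and the text is still added to the previous question.
        if block.x > question.x:
            question.indented = True
        question.lines.append(block.text)


# Usage: a standalone line (for example a DNA string) that starts to the
# right of the question text is appended and marks the question as indented.
q = Question(x=50.0)
attach_to_previous(q, TextBlock("ACGTTAGC", 72.0), left_indented=False)
```

Note that both branches append the text to the previous question; in this sketch the only difference is whether the indented flag is set, which is where the indentation check mentioned above comes in.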

#2 Updated by Redmine Admin over 5 years ago

Original Bugzilla ID was 4058
