Bug #7210

View service duplicates EML Text content

Added by Bryce Mecum about 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 09/14/2017
Due date:
% Done: 0%
Estimated time:
Bugzilla-Id:
Description

This abstract


<abstract>
      <section>
        <title>Introduction</title>
        <para>Between 1958 and 1999, Austin Post led the USGS collection of aerial imagery of North American glaciers. These images are primarily vertical stereo black and white images, although single oblique images, as well as color images have been collected. The glaciers of North America were the subjects, and the digital products made available here serve to document the changes that have occurred to the glaciers over the past 5 decades. The purpose of this project is to preserve the data contained within these film images in a digital format for future analysis of North American glacier change.</para>
      </section>
      <section>
        <title>File Layout</title>
        <para>
          <orderedlist>
            <listitem>
              <para>The first level contains an overall data set of image metadata from 1964 - 1997 (nagapData.csv) and an R script (searchData.R) with instructions on how to search and subset the data.  fileLayout.pdf shows the file structure and folder contents visually.  There are also three kml files with flight path information by decade.</para>
            </listitem>
            <listitem>
              <para>The second level is the year in which the pictures were taken.  There are 32 years with images from 1964 – 1997.  The majority of these folders are jpegs with notes provided by Austin Post.  They also contain a year-specific csv (YYYY.csv) that contains image metadata for the entire year (date, roll numbers, location name, longitude, latitude, altitude, media, and comments).  The overall data set (nagapData.csv) is the aggregate of each individual “YYYY.csv” file.</para>
            </listitem>
            <listitem>
              <para>The glacier photos are located at the third level (this level).  The folders at this level are distinguished by camera roll number (1, 2, etc.), and image type (thumbnail, jpeg, or tif); some also contain fiducial and oblique image folders.  This level primarily contains image files of aerial photos as either thumbnails, jpegs, or tifs. It also includes a csv with image metadata specific to each roll (date, roll numbers, location name, longitude, latitude, altitude, media, and comments), a text file (info.txt) with camera specifications unique to each image, and a text file (histo.txt or matchReport.txt) with color information and scanner specifications unique to each image.</para>
            </listitem>
          </orderedlist>
        </para>
      </section>
    </abstract>

produces the following HTML:

<div class="sectionText">
    <h4 class="bold">Introduction</h4>
    <p>Between 1958 and 1999, Austin Post led the USGS collection of aerial imagery of North American glaciers. These images are primarily vertical stereo black and white images, although single oblique images, as well as color images have been collected. The glaciers of North America were the subjects, and the digital products made available here serve to document the changes that have occurred to the glaciers over the past 5 decades. The purpose of this project is to preserve the data contained within these film images in a digital format for future analysis of North American glacier change.</p>
</div>
<div class="sectionText">
    <h4 class="bold">File Layout</h4>
    <p>The first level contains an overall data set of image metadata from 1964 - 1997 (nagapData.csv) and an R script (searchData.R) with instructions on how to search and subset the data.  fileLayout.pdf shows the file structure and folder contents visually.  There are also three kml files with flight path information by decade.</p>
    <p>The second level is the year in which the pictures were taken.  There are 32 years with images from 1964 &ndash; 1997.  The majority of these folders are jpegs with notes provided by Austin Post.  They also contain a year-specific csv (YYYY.csv) that contains image metadata for the entire year (date, roll numbers, location name, longitude, latitude, altitude, media, and comments).  The overall data set (nagapData.csv) is the aggregate of each individual &ldquo;YYYY.csv&rdquo; file.</p>
    <p>The glacier photos are located at the third level (this level).  The folders at this level are distinguished by camera roll number (1, 2, etc.), and image type (thumbnail, jpeg, or tif); some also contain fiducial and oblique image folders.  This level primarily contains image files of aerial photos as either thumbnails, jpegs, or tifs. It also includes a csv with image metadata specific to each roll (date, roll numbers, location name, longitude, latitude, altitude, media, and comments), a text file (info.txt) with camera specifications unique to each image, and a text file (histo.txt or matchReport.txt) with color information and scanner specifications unique to each image.</p>
    <p>

              The first level contains an overall data set of image metadata from 1964 - 1997 (nagapData.csv) and an R script (searchData.R) with instructions on how to search and subset the data.  fileLayout.pdf shows the file structure and folder contents visually.  There are also three kml files with flight path information by decade.

              The second level is the year in which the pictures were taken.  There are 32 years with images from 1964 &ndash; 1997.  The majority of these folders are jpegs with notes provided by Austin Post.  They also contain a year-specific csv (YYYY.csv) that contains image metadata for the entire year (date, roll numbers, location name, longitude, latitude, altitude, media, and comments).  The overall data set (nagapData.csv) is the aggregate of each individual &ldquo;YYYY.csv&rdquo; file.

              The glacier photos are located at the third level (this level).  The folders at this level are distinguished by camera roll number (1, 2, etc.), and image type (thumbnail, jpeg, or tif); some also contain fiducial and oblique image folders.  This level primarily contains image files of aerial photos as either thumbnails, jpegs, or tifs. It also includes a csv with image metadata specific to each roll (date, roll numbers, location name, longitude, latitude, altitude, media, and comments), a text file (info.txt) with camera specifications unique to each image, and a text file (histo.txt or matchReport.txt) with color information and scanner specifications unique to each image.

        </p>
</div>

which, as you can see, duplicates the content of the orderedlist. The content shouldn't be duplicated.
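For illustration, here is a minimal Python sketch of the expected behavior. It is not the view service's XSLT. The function names (render_para, render_list) are hypothetical, and only para, orderedlist, itemizedlist, and listitem are handled. The idea is to walk the mixed content of a para once, so text that lives inside a nested list is emitted by the list renderer only and is never repeated as the para's own text.

```python
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from EML Text list elements to HTML list tags.
LIST_TAGS = {"orderedlist": "ol", "itemizedlist": "ul"}


def render_list(lst: ET.Element) -> str:
    """Render an EML list element as an HTML list, one <li> per listitem."""
    tag = LIST_TAGS[lst.tag]
    items = []
    for item in lst.findall("listitem"):
        body = "".join(render_para(p) for p in item.findall("para"))
        items.append(f"<li>{body}</li>")
    return f"<{tag}>{''.join(items)}</{tag}>"


def render_para(para: ET.Element) -> str:
    """Render an EML para, visiting every child node exactly once."""
    parts = []
    buf = [para.text or ""]

    def flush() -> None:
        # Emit buffered inline text as a single <p>, skipping whitespace-only runs.
        text = " ".join("".join(buf).split())
        buf.clear()
        if text:
            parts.append(f"<p>{escape(text)}</p>")

    for child in para:
        if child.tag in LIST_TAGS:
            # Close the running paragraph; the list becomes a sibling block.
            flush()
            parts.append(render_list(child))
        else:
            # Other inline elements are flattened to their text in this sketch.
            buf.append("".join(child.itertext()))
        buf.append(child.tail or "")
    flush()
    return "".join(parts)


sample = ET.fromstring(
    "<para><orderedlist>"
    "<listitem><para>The first level.</para></listitem>"
    "<listitem><para>The second level.</para></listitem>"
    "</orderedlist></para>"
)
print(render_para(sample))
```

The list is emitted as a sibling of any surrounding paragraph text rather than inside a p element, since HTML does not allow block-level lists inside p.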

Attachment: pids-with-texttype-content.csv (455 KB), added by Chris Jones, 10/06/2017 09:50 AM

History

#1 Updated by Bryce Mecum almost 2 years ago

So I found the cause, which was some bizarre XSLT that doesn't make much sense. I rewrote the whole XSLT to support the entire EML Text module. The result can be seen here: https://dev.nceas.ucsb.edu/#view/urn:uuid:98da3cab-6697-4021-b8b9-d46d4acffaec

#2 Updated by Chris Jones almost 2 years ago

Okay Bryce, here's the list of pids that have textType content on the KNB. You might want to ignore the ones that are in the old <scope>.<docid>.<rev> format unless they have a DOI shoulder in front of them. I got some NotFound exceptions because they don't have system metadata associated. I didn't dig into it, though. There are plenty of others to look at. It seems like literalLayout is the most common element used, but there are others.
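A rough sketch of that filtering step, in Python. The regular expression is an assumption about what an old-style scope.docid.rev identifier looks like, the function name keep_pid is hypothetical, and the sample identifiers other than the UUID quoted earlier in this thread are illustrative rather than taken from the attached CSV.

```python
import re

# Assumption: a bare old-style identifier is "scope.docid.rev", where docid and
# rev are integers and scope contains no dots, slashes, or whitespace.
OLD_STYLE = re.compile(r"[^\s./]+\.\d+\.\d+")


def keep_pid(pid: str) -> bool:
    """Return True unless the pid is a bare old-style scope.docid.rev identifier.

    A pid with a DOI shoulder in front (for example "doi:10.5063/AA/" followed
    by scope.docid.rev) does not match the bare pattern, so it is kept.
    """
    return OLD_STYLE.fullmatch(pid.strip()) is None


candidates = [
    "knb.92.1",                                        # illustrative old-style pid
    "doi:10.5063/AA/knb.92.1",                         # illustrative DOI-shouldered pid
    "urn:uuid:98da3cab-6697-4021-b8b9-d46d4acffaec",   # UUID pid from this thread
]
print([pid for pid in candidates if keep_pid(pid)])
```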

#3 Updated by Bryce Mecum almost 2 years ago

Thanks, Chris! I'm taking a run through those now.

#4 Updated by Bryce Mecum almost 2 years ago

Okay I've taken a look through those (thanks again) and they were helpful. I found two minor bugs and have fixed them. I'm feeling satisfied that this change will work on a wide variety of content.

#5 Updated by Bryce Mecum almost 2 years ago

Discussed with Lauren Walker and she caught a few good improvements. I've made those.
