Project

General

Profile

Bug #2463

Morpho can't delete or insert columns at right edge of tables with uneven record lengths

Added by Will Tyburczy about 13 years ago. Updated almost 13 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
morpho - general
Target version:
Start date:
06/16/2006
Due date:
% Done:

0%

Estimated time:
Bugzilla-Id:
2463

Description

To recreate this bug:
- Using a text-editor, create a tab-delimited (or otherwise delimited) file in which lines have different numbers of fields (i.e. if a particular line doesn't have a value for the last field, it simply inserts a newline, without putting in the last tab to signify an empty field).

- Create a dp in morpho and import the file as a dataTable.
- Save the dp (optional)
- Try to delete the one of the rightmost columns (far enough to the right that some of the lines in the file didn't have a tab character recording the value of the field)

The column operation will fail, though if you go into the tree editor at this point you can see that the metadata was changed to reflect the operation. This could lead to the column metadata being mispaired to the corresponding column values.

Morpho should either detect the uneven line lengths and refuse to import the table to begin with (giving the user notification of why), or it should be able to handle subsequent column operations on the table.

fake.tab (46 Bytes) fake.tab Will Tyburczy, 06/16/2006 01:25 PM

Related issues

Is duplicate of Morpho - Bug #2462: Can't delete columns toward end of tableResolved06/15/2006

History

#2 Updated by Will Tyburczy about 13 years ago

  • Bug 2462 has been marked as a duplicate of this bug. ***

#3 Updated by Will Tyburczy about 13 years ago

Mark Sch. says that on import, Morpho should just fill out short lines with extra delimiter characters as needed to make all the records even in length.

#4 Updated by Margaret O'Brien about 13 years ago

from Margaret: padding the lines to the length of the longest line is ok as long as all those extra columns could be deleted. We run into text files with uneven line lengths often in excel output, and sometimes get quite a few extra columns. An alternative would be to create only the number of columns corresponding to the number of headers entered, (either auto or manually entered). As a user, I think rejecting the file altogether is OK, too (with an appropriate message).

#5 Updated by Callie Bowdish almost 13 years ago

Work Around and description of Excel Empty Value problem

It is common to use Microsoft Excel to save tables as tab delimited text files. Sometimes tables will have rows that END with cells with empty values In some cases like this, Excel creates a file that does not include the delimiters for these cells. The first 16 rows may include the delimiters because the header row does not contain missing values. But after 16 rows it was observed that the new line starts with out delineating where the cells with empty values are.


from web search:

Why do I get the error "wrong number of values" when I use insheet to read data from Excel?
Title How Excel writes empty cells
Author Paul Lin, StataCorp
Date April 1997
This problem has to do with how Excel writes empty cells into files. Pretend that a row of your spreadsheet reads

(1)   (2)   (3)   (4)   (5)   (6)   (7)   (8)   (9)
--------------------------------------------------- | | | | | | | | | | | 136 | 9.8 | | 64 | 108 | | | 7.8 | | | | | | | | | | | |
---------------------------------------------------

Excel would write this in a file as

136,9.8,,64,108,,,7.8,

and Stata can read this without difficulty. Note the comma at the end of the line. It is because of that comma that Stata knows there is a ninth value; it is merely that the ninth value is missing in this observation. Stata understands this.

Sometimes, however, Excel will write these same data as

136,9.8,,64,108,,,7.8

without the trailing comma. Thus, it appears to insheet that the line contains eight rather than nine values, and so insheet complains.

The conditions under which Excel does this are complicated. Basically, Excel reads a chunk of the spreadsheet and then looks at the chunk. If the last column in the chunk contains empty cells, then it omits that column! So, in the resulting file, there may be some records showing nine columns, others showing eight, and even others showing seven or fewer! Each will occur in a group (that is determined by Excel's buffer size).

If you run into this difficulty, here is what you do:

1. To your spreadsheet, add another column. Fill the columns with 1s.
2. Save the spreadsheet.
3. Read the spreadsheet into Stata using insheet.
4. Drop the last variable.

Adding a column of 1s will avoid the problem of empty cells in the last column, so Excel will write the correct number of separators on each line.

http://www.stata.com/support/faqs/data/1excelcol.html


The above work around of adding a column at the end of the table with values sounds like it could work. I like this one better. I used OpenOffice.org Calc to convert an Excel file into a tab deliniated file (save as Text CVS and then use Edit filter settings). Doing a "save as" with Calc creates a file that includes all of the delimiters even with cells with empty values at the end of rows. A comparison can be made using a binary file editor (bvi) to look at the files. Opening them in Microsoft Word and doing a "show/hide" shows the delimiters and new line feed too.

Morpho was unable to delete some of the columns, near the end, with the Excel problem file. I replaced the text file that Excel had created with the one that Calc had created. Morpho was able to delete columns. Before this I could not delete columns. The response time working within the table view in Morpho was better too.

#6 Updated by Redmine Admin about 6 years ago

Original Bugzilla ID was 2463

Also available in: Atom PDF