Morpho can't delete or insert columns at right edge of tables with uneven record lengths
To recreate this bug:
- Using a text-editor, create a tab-delimited (or otherwise delimited) file in which lines have different numbers of fields (i.e. if a particular line doesn't have a value for the last field, it simply inserts a newline, without putting in the last tab to signify an empty field).
- Create a dp in morpho and import the file as a dataTable.
- Save the dp (optional)
- Try to delete the one of the rightmost columns (far enough to the right that some of the lines in the file didn't have a tab character recording the value of the field)
The column operation will fail, though if you go into the tree editor at this point you can see that the metadata was changed to reflect the operation. This could lead to the column metadata being mispaired to the corresponding column values.
Morpho should either detect the uneven line lengths and refuse to import the table to begin with (giving the user notification of why), or it should be able to handle subsequent column operations on the table.
#4 Updated by Margaret O'Brien over 14 years ago
from Margaret: padding the lines to the length of the longest line is ok as long as all those extra columns could be deleted. We run into text files with uneven line lengths often in excel output, and sometimes get quite a few extra columns. An alternative would be to create only the number of columns corresponding to the number of headers entered, (either auto or manually entered). As a user, I think rejecting the file altogether is OK, too (with an appropriate message).
#5 Updated by Callie Bowdish over 14 years ago
Work Around and description of Excel Empty Value problem
It is common to use Microsoft Excel to save tables as tab delimited text files. Sometimes tables will have rows that END with cells with empty values In some cases like this, Excel creates a file that does not include the delimiters for these cells. The first 16 rows may include the delimiters because the header row does not contain missing values. But after 16 rows it was observed that the new line starts with out delineating where the cells with empty values are.
from web search:
Why do I get the error "wrong number of values" when I use insheet to read data from Excel?
Title How Excel writes empty cells
Author Paul Lin, StataCorp
Date April 1997
This problem has to do with how Excel writes empty cells into files. Pretend that a row of your spreadsheet reads
(1) (2) (3) (4) (5) (6) (7) (8) (9)
| | | | | | | | | | | 136 | 9.8 | | 64 | 108 | | | 7.8 | | | | | | | | | | | |
Excel would write this in a file as
and Stata can read this without difficulty. Note the comma at the end of the line. It is because of that comma that Stata knows there is a ninth value; it is merely that the ninth value is missing in this observation. Stata understands this.
Sometimes, however, Excel will write these same data as
without the trailing comma. Thus, it appears to insheet that the line contains eight rather than nine values, and so insheet complains.
The conditions under which Excel does this are complicated. Basically, Excel reads a chunk of the spreadsheet and then looks at the chunk. If the last column in the chunk contains empty cells, then it omits that column! So, in the resulting file, there may be some records showing nine columns, others showing eight, and even others showing seven or fewer! Each will occur in a group (that is determined by Excel's buffer size).
If you run into this difficulty, here is what you do:
1. To your spreadsheet, add another column. Fill the columns with 1s.
2. Save the spreadsheet.
3. Read the spreadsheet into Stata using insheet.
4. Drop the last variable.
Adding a column of 1s will avoid the problem of empty cells in the last column, so Excel will write the correct number of separators on each line.
The above work around of adding a column at the end of the table with values sounds like it could work. I like this one better. I used OpenOffice.org Calc to convert an Excel file into a tab deliniated file (save as Text CVS and then use Edit filter settings). Doing a "save as" with Calc creates a file that includes all of the delimiters even with cells with empty values at the end of rows. A comparison can be made using a binary file editor (bvi) to look at the files. Opening them in Microsoft Word and doing a "show/hide" shows the delimiters and new line feed too.
Morpho was unable to delete some of the columns, near the end, with the Excel problem file. I replaced the text file that Excel had created with the one that Calc had created. Morpho was able to delete columns. Before this I could not delete columns. The response time working within the table view in Morpho was better too.