Page 1 of 1
Separators in CSV values
Posted: 14 Feb 2011, 16:35
by mdruiter
Sometimes, I have CSV files where some values contain separators. An example (also attached):
Code: Select all
LastUpdated, Location, Percentile
2010-10-20, "Amsterdam, The Netherlands", 0
This file is an excerpt from a result generated by:
http://stackusers.com/Users/Find
Every comma is seen as a separator, which causes the value to be split. Even if I explicitly specify double quotes as text qualifiers.
After a while, I found the problem.

The commas each have a space after it. So the first double quote is not directly following the comma, and it is not seen as a text qualifier.
When I remove the space after the commas, the text qualifiers
are respected and the file is properly displayed.
I'm not sure whitespace around separators is "officially" allowed when using text qualifiers (whatever "officially" means anyway). But it wouldn't hurt! And if it does, one can choose not to use text delimiters.
It would be nice if the Database Viewer did The Right Thing automatically.
Re: Separators in CSV values
Posted: 14 Feb 2011, 18:05
by therube
I'm not seeing an issue?
CSV Options: Separator, Comma; Text Qualifier, Double Quotes; Convert, Auto.
Re: Separators in CSV values
Posted: 14 Feb 2011, 20:17
by mdruiter
I see this:

- Screen shot
- csv.PNG (1.7 KiB) Viewed 9841 times
But I would like to see this:

- Expected screen shot
- csvok.PNG (1.71 KiB) Viewed 9841 times
Re: Separators in CSV values
Posted: 14 Feb 2011, 21:16
by therube
I see this:
I see that (^^^) from your attached file, which is different from your "coded" file.
Attached file is "correct" (assuming you want each field comma delimited, with double quotes as text qualifier).
The coded file is kind ambiguous in that respect. (First field is comma+space delimited, the second field
you want to be comma+space delimited, but also with double quote qualifier. Assume that because all fields do not match, that is throwing things off (& not that I know how things things are supposed to work, though it would seem to me that if you want as you want, then all fields should be double quoted text qualifier, not just the one.)
Re: Separators in CSV values
Posted: 14 Feb 2011, 21:33
by mdruiter
Ouch, sorry. Apparently I edited the file before pressing
Submit (but after
Add the file).
To summarize the feature request: it would be nice if spaces around text qualifiers (if any) were ignored.
Re: Separators in CSV values
Posted: 14 Feb 2011, 21:35
by therube
(I've probably edited my reply above - a number of times, so re-read though I think the end result will be the same.)
Re: Separators in CSV values
Posted: 14 Feb 2011, 22:18
by mdruiter
I agree the format (inconsistent text qualifier use, spaces after separators) could be improved. But I cannot change the generation process of these files.
And I do not see any downsides of my proposal. Except that somebody will have to implement it.

Why not do The Right Thing in more cases?
Re: Separators in CSV values
Posted: 14 Feb 2011, 23:38
by therube
How do other CSV programs handle the same data?
If your data is consistent, if you changed to space separated with double quote qualifier, appears it should work - though it then does retain the commas as part of the field data. (Tested with a VERY small sampling of data

.)
Re: Separators in CSV values
Posted: 15 Feb 2011, 00:06
by SvA
Well, there is some kind of "official" description of the format in
rfc 4180 and there is the most "down to earth" description of the format at
http://creativyst.com/Doc/Articles/CSV/CSV01.htm. They contradict each other in this point.
I believe there are two approaches to this issue
- go with the flow and implement the latter
- be versatile and add an option
Note: The latter description failes to mention that Microsoft redefined the C of
CSV to read Character instead of Comma and uses the list separator character defined in Window's Regional setting as the separator for the CSV file, so, for me, Excel uses ; as the separator. Salamander already does a good job at guessing the separator character.
Moreover the description of Excel's preservation of leading spaces is not correct, at least not according to my tests with Excel 2007. Excel expects the CSV format to preserve spaces adjacent to the separator when writing, and preserves those spaces when reading. However, after a field is read, excel checks it's format and converts the field to the type it recognizes. If, for instance, the content of a field looks like a number, it is convertet to a number and any spaces, leading or trailing (and leading zeros also) are lost. This has nothing to do with reading the CSV, but with postprocessing the data read.