17-03-21 08:49 PM
Hi everyone! Hope that you guys can help me.
I have a file (with file type: 'file') that can be opened manually to Notepad. I want to extract that file information to a collection and text mining it. I tried to import it with CSV with MS VBO Excel and it split the data at ' , ' and I want it to split at ' ; '
Does anyone know how can I solve this problem?
PS: it's possible to open the file through excel and force the opening to a read-only file
Thanks a lot,
Rita Cabrita
18-03-21 08:41 AM
18-03-21 04:28 PM
19-03-21 01:39 AM
How about replacing the separator?
19-03-21 10:58 AM
"Pi";3,14
"Oh,hi!; This is a ""simple"" string
with a line break!";0
Excel would "open" it correctly on a PC with language/locale set to russian for example, but would fail on en_us, VBO action ("import text") would fail for other reasons (linebreak). Parsing it manually is a pain too.
Also many 3rd party programs fail to generate a proper csv, so readin it becomes a problem. Excel doesn't like to quote strings for example.24-03-22 08:01 AM
24-03-22 12:18 PM
24-03-22 05:11 PM
25-03-22 01:10 PM
Handle duplicate column headers from a CSV file:
If you have a CSV that will have duplicate column headers and use this action:
...and set the flag 'First Line Is Header' to True, it will crash due to duplicate column headers. If you set it to False, it will assume the first row to be data and not crash as a result of the duplicate column headers. Of course you still have to remove the first row and this was not meant to be data.
Note that it could also be a problem if the column headers in the CSV contain illegal characters. From what I've experienced, a dot '.' in a column header is not allowed. There are likely more illegal chars.
CSV delimiters:
In my part of the world, delimiters in CSV files can either be a comma och a semicolon. If you check your windows settings for your region:
Region settings>>Additional date, time & regional settings>>Change date, time, or number formats>>Additional settings
There you'll find your windows preference for correct CSV build and split-up. On my PC I have:
So, if I get a CSV file delimited by ; it will work fine, if I get a comma separated file instead, it will not work as intended.
At my place of work, we specify to the sender of the file what format and delimiter we expect and we reject alternatives. When validating an incoming CSV file, our process will first read the first row, check if it contains our preferred delimiter and either approve or reject it.
When correctly formatted, the CSV will open in a multi column collection, incorrectly formatted, it will create only one column in the collection. Assuming there were multiple columns in the CSV of course.