24-07-24 03:53 PM
We noticed something strange with this action, when a column has a mix of numbers and text , the column will be designated as a data type of 'Number' , this will then not upload into the collection any item in the column which has a text value ,
Example - 5555555/023
will be omitted from collection.
Looking at the code behind this action , the following is done
Dim HDRString As String = "No"
If First_Line_Is_Header Then HDRSTring = "Yes"
Dim Folder as String = nothing
Dim FileName As String = Nothing
Split_Path(CSV_Location, Folder, Filename)
Dim cn As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & Folder & ";Extended Properties=""Text;HDR=" & HDRString & ";FMT=Delimited""")
Dim da As New OleDbDataAdapter()
Dim ds As New DataSet()
Dim cd As New OleDbCommand("SELECT * FROM [" & Filename & "]", cn)
cn.Open()
da.SelectCommand = cd
ds.Clear()
da.Fill(ds, "CSV")
Values = ds.Tables(0)
cn.Close()
I'm wondering what the logic is to determine if a column in the collection is to be designated with a datatype of Number or Text
Instead of using this action we now use the following combination to make sure the data type of all columns are Text
Utility - File Management 64bit / Read All Text from File
Utility - Strings / Get CSV As Collection
Is the use of OLEDB in the original code causing the data type mismatch for the column ?
Also , what causes the column to be defined as Number , I presumed if the first row in the column is a pure number, then it defines the data type as Number. But we have seen instances were the first row had 12345678/111 and it still defined the data type as Number.
25-07-24 05:30 AM
Hi @Stephen.Jones ,
Yes, this is a pretty common issue with OLEDB in general where it always will analyze data types of columns based on the initial few rows of any given dataset. So, ideally if in your dataset, it saw that there are first few rows with Column X having numbers only, then it will treat the entire Column X as a Number.
We can get around it though but that requires us to change the connection string a bit while adding the IMEX = 1 parameter which tells that we should handle mixed data types, here IMEX stands for Import Mixed Data Types
Hence, we would need to change this line of code:
Dim cn As New OleDbConnection("Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & Folder & ";Extended Properties=""Text;HDR=" & HDRString & ";FMT=Delimited""")
to:
Dim connectionString As String = "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & Folder & ";Extended Properties=""Text;HDR=" & HDRString & ";FMT=Delimited;IMEX=1"""
This also can give incorrect results at time mainly with UNICODE as per my experience but mostly does work. If this won't work, then we need to cast individual columns as TEXT in the query itself but since this VBO selects all the columns, I won't recommend that.
25-07-24 12:55 PM
This can be a real pain, as OLEDB effectively guesses the datatype from the first (I think) 8 rows and then tries to convert all rows to that datatype. And using CAST in the query won't always work because OLEDB attempts to convert before executing the query.
One approach you can use is to make a temporary copy of your source file, insert 8 dummy rows at the top of the file that will force OLDBE to guess correctly, and then apply the query to the temp file. It's a crude workaround, but it works.
I think with the Jet provider the 8 row thing is set in the registry and can be changed, but there is a potential for loss of performance if OLEDB is forced to scan the whole file.
There's also a way of creating a schema.ini file that defines the data type of each column - from memory you place this file next to the source CSV file and OLEDB uses it to interpret the data.
30-07-24 08:50 AM
@Stephen.Jones I hope @devneetmohanty07 and @John__Carter's responses helped you here - could you let us know? It's always great for us to understand what solutions work best!
01-08-24 10:42 AM
We have found using combination
Utility - File Management 64bit / Read All Text from File
Utility - Strings / Get CSV As Collection
seems to be the best way to deal with the issue.