Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
11-08-21 04:29 AM
Hi Community,
I am trying to merge two .dat files after comparing that the first row has the same columns. Syntax for the .dat file is as below(when I open with notepad)
Project ID Employee ID Project Type
XXXX xxxxxxxxxxx xx
1 ) First I need to check if the column matches in both files
2) After that I need to append the content of second file to first file after removing the header
I thought of renaming .dat with .csv file,Append the records,save and rename back to .dat. but there were two problems :-
1 ) Headers are in japanese. When i change extension to .csv , japanese characters are getting replaced with some other weird character.
2) When I append data to csv and rename back to .dat, " is added before the line and after the line as below
"Project ID Employee ID Project Type"
"XXXX xxxxxxxxxxx xx"
Could someone please suggest how to proceed ?
------------------------------
Kirtish Trivedi
------------------------------
I am trying to merge two .dat files after comparing that the first row has the same columns. Syntax for the .dat file is as below(when I open with notepad)
Project ID Employee ID Project Type
XXXX xxxxxxxxxxx xx
1 ) First I need to check if the column matches in both files
2) After that I need to append the content of second file to first file after removing the header
I thought of renaming .dat with .csv file,Append the records,save and rename back to .dat. but there were two problems :-
1 ) Headers are in japanese. When i change extension to .csv , japanese characters are getting replaced with some other weird character.
2) When I append data to csv and rename back to .dat, " is added before the line and after the line as below
"Project ID Employee ID Project Type"
"XXXX xxxxxxxxxxx xx"
Could someone please suggest how to proceed ?
------------------------------
Kirtish Trivedi
------------------------------
Answered! Go to Answer.
1 BEST ANSWER
Helpful Answers
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
11-08-21 11:30 AM
Why not just check headers, remove header from 2nd file and append the rest to the first file?
If you don't want to use code stages that would be like:
1) Load as text files in file1, file2 data items
2) Get header from file1 (up to first newline)
3) Check that it exists in file2
4) Replace header+newline with empty string "" in file2
5) append file2 to file1
6) write result
If files are huge, you can do it with code stage without loading everything in memory.
Also in nix I'd just use tail -n +2 file2.dat >> file1.dat, there should be similar way to do it in powershell too.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
If you don't want to use code stages that would be like:
1) Load as text files in file1, file2 data items
2) Get header from file1 (up to first newline)
3) Check that it exists in file2
4) Replace header+newline with empty string "" in file2
5) append file2 to file1
6) write result
If files are huge, you can do it with code stage without loading everything in memory.
Also in nix I'd just use tail -n +2 file2.dat >> file1.dat, there should be similar way to do it in powershell too.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
3 REPLIES 3
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
11-08-21 11:30 AM
Why not just check headers, remove header from 2nd file and append the rest to the first file?
If you don't want to use code stages that would be like:
1) Load as text files in file1, file2 data items
2) Get header from file1 (up to first newline)
3) Check that it exists in file2
4) Replace header+newline with empty string "" in file2
5) append file2 to file1
6) write result
If files are huge, you can do it with code stage without loading everything in memory.
Also in nix I'd just use tail -n +2 file2.dat >> file1.dat, there should be similar way to do it in powershell too.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
If you don't want to use code stages that would be like:
1) Load as text files in file1, file2 data items
2) Get header from file1 (up to first newline)
3) Check that it exists in file2
4) Replace header+newline with empty string "" in file2
5) append file2 to file1
6) write result
If files are huge, you can do it with code stage without loading everything in memory.
Also in nix I'd just use tail -n +2 file2.dat >> file1.dat, there should be similar way to do it in powershell too.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
11-08-21 12:24 PM
Hi @AndreyKudinov,
Thank you for your suggestion.
As per suggestion 2) Get header from file1 (up to first newline)
Could you please assist how to get just the first line from text ? Text file contains almost 20k records hence is there any way I can just access first line for header comparison and NOT the entire content ?
Update : I was able to do it with help of your suggestion. I found position of first newline using InSTR function and then extracted header using MID function.
This resolved my overall issue.
Thank you for suggestion.
------------------------------
Kirtish Trivedi
------------------------------
Thank you for your suggestion.
As per suggestion 2) Get header from file1 (up to first newline)
Could you please assist how to get just the first line from text ? Text file contains almost 20k records hence is there any way I can just access first line for header comparison and NOT the entire content ?
Update : I was able to do it with help of your suggestion. I found position of first newline using InSTR function and then extracted header using MID function.
This resolved my overall issue.
Thank you for suggestion.
------------------------------
Kirtish Trivedi
------------------------------
Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content
11-08-21 01:55 PM
Chr(10) is a LF (newline) symbol in a file, this should work:
Left([file1],InStr([file1], Chr(10)))
Store it in [header], then check condition (same header in file2?)
InStr([file2],[header])>0
Yes - remove it from file2 and add rest to file1: [file1] & Replace([file2],[header],"")
Write result to a new file
p.s. That is assuming headers are exactly the same ofc, but I assume its true in your case. Otherwise you'd need to compare them somehow, rest of the logic should not change much.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
Left([file1],InStr([file1], Chr(10)))
Store it in [header], then check condition (same header in file2?)
InStr([file2],[header])>0
Yes - remove it from file2 and add rest to file1: [file1] & Replace([file2],[header],"")
Write result to a new file
p.s. That is assuming headers are exactly the same ofc, but I assume its true in your case. Otherwise you'd need to compare them somehow, rest of the logic should not change much.
------------------------------
Andrey Kudinov
Project Manager
MobileTelesystems PJSC
Europe/Moscow
------------------------------
