cancel
Showing results for 
Search instead for 
Did you mean: 

Handling large CSV file

SayeedBinAbdullah
Verified Partner

Hi Everyone,

We’re working with a .txt file approximately 180MB in size, and need to perform the following operations efficiently given its volume, what will be the right approach to handle such files?

  • Cleanse the data by removing all "|" characters from each line.

  • Convert the cleaned file to a .csv format, ensuring all special characters are eliminated.

  • Aggregate currency values: The file contains over 300,000 rows across 10 currency types and 15 Columns. The goal is to compute the total sum for each currency and store the results in an output collection containing 10 records (one per currency).

 

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz
1 BEST ANSWER

Helpful Answers

@faheemsd @Parthiban_Viatris24 

I'm currently using below custom code to accomplish this. I’d appreciate any suggestions to help fine-tune the implementation or ideas for alternative approaches.

Code to Replace "|" character

Here I'm reading each line from ".txt" file replacing the "|" character and writing the updated content to a ".csv" file.

SayeedBinAbdullah_3-1754301272422.png

SayeedBinAbdullah_0-1754301160296.png

Input Parameters

SayeedBinAbdullah_1-1754301214975.png

Output Parameters

SayeedBinAbdullah_2-1754301264740.png

Code to Fetch Records in Collection

SayeedBinAbdullah_7-1754301480424.png

SayeedBinAbdullah_9-1754301823932.png

Input Parameters

SayeedBinAbdullah_5-1754301450068.png

Output Parameters

SayeedBinAbdullah_6-1754301460968.png

Final Output

SayeedBinAbdullah_8-1754301499301.png

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz

View answer in original post

9 REPLIES 9

Using OLEDB queries to interact with CSV files can be an effective solution for handling large-sized or data-heavy CSV files

Parthiban A

Hi @Parthiban_Viatris24 

Thankyou for the reply,

Could you also suggest how we can eliminate "|" characters from each line, as there are over 3,00,000 records in the file?

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz

Try reading the file as a text file and replacing the '|' character with a blank in a single step. After the replacement, convert the text into CSV format. You can then either proceed with collection-based manipulation for further processing or save it back as a CSV file and use OLEDB queries to interact with the data and access the relevant information.

Parthiban A

As this is a huge file will we be able to get the whole text in a data item and perform replace action in calculation stage and write it back to the text file?

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz

Yes, you can proceed—there are no character limitations for Blue Prism Data Items with the Text data type. If you still encounter issues, we can address them using VB.NET or C#.NET code.

Parthiban A

faheemsd
MVP

Dear @SayeedBinAbdullah ,

I'm happy to help you if I get a text file with a dummy data to give you the final solution on this.


MVP

@faheemsd 

Thankyou for your response.

The file size is around 180 MB and we can only upload 5 MB max here,

I found some solution online and I'm working on it, will share it here if that solution works.

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz

@faheemsd @Parthiban_Viatris24 

I'm currently using below custom code to accomplish this. I’d appreciate any suggestions to help fine-tune the implementation or ideas for alternative approaches.

Code to Replace "|" character

Here I'm reading each line from ".txt" file replacing the "|" character and writing the updated content to a ".csv" file.

SayeedBinAbdullah_3-1754301272422.png

SayeedBinAbdullah_0-1754301160296.png

Input Parameters

SayeedBinAbdullah_1-1754301214975.png

Output Parameters

SayeedBinAbdullah_2-1754301264740.png

Code to Fetch Records in Collection

SayeedBinAbdullah_7-1754301480424.png

SayeedBinAbdullah_9-1754301823932.png

Input Parameters

SayeedBinAbdullah_5-1754301450068.png

Output Parameters

SayeedBinAbdullah_6-1754301460968.png

Final Output

SayeedBinAbdullah_8-1754301499301.png

Best Regards,
Sayeed Bin Abdullah

Senior Consultant
Wonderbotz

SouravSaha
Verified Partner

The optimal solution to handle this kind of scenario, thanks @SayeedBinAbdullah for sharing detailed explanation!

Best regards,
Sourav S
Consultant - Automation Developer
WonderBotz