Fix text encoding issue when exporting to CSV (comma separated values) – Ruby on Rails

Hi all!!

In this post, I will be sharing a problem that I came across while dealing with exporting a CSV in Ruby on Rails. ( The rails version used was 4.2.10 and ruby version was 2.4.5).

Problem:

When writing into a file encoded with utf-8 encoding, some special characters weren’t getting parsed in Excel. It was getting parsed problem in other editors like Google Sheets, Number, Notepad etc.
P.S. It was getting exported using a network call as background job (as asynchronous task)

Below is the example of how the error was appearing:
[ABCD] — Write a blog and help people.
[ABCD] – Write a blog and help people.

Solution:

While going through different stack-overflow posts, reading articles and documentations. I came across a solution which we will be learning about in this post.

The reason for Excel being unable to read those special characters (Example: Long dash — , Inverted commas copied from text editor ‘Hello’ “world”, some greek characters ä č ḇ β, etc) was that the generated CSV file uses a UTF-8 character encoding, but Excel tries to read it using a different encoding (probably ISO-8859-1) and we can’t expect the user to have it configured for utf-8.

To solve the above stated issue, we can add a byte order mark to our csv file while creating.

What’s BYTE ORDER MARK?
At the beginning of a page that uses a Unicode character encoding you may find some bytes that represent the Unicode code point U+FEFF BYTE ORDER MARK (abbreviated as BOM). It is also know as UTF-8 signature.
For more read: Byte Order Mark

How to add BOM?

BOM = "\xEF\xBB\xBF"
file_path = # Your file path
CSV.open(file_path, 'wb', headers: headers, write_headers: true) do |csv|
        # Below line will add BOM to your csv
        csv.to_io.write(BOM)
        # Write your logic here
end

The above snippet shows how we can add BOM to our CSV file while writing.

After writing BOM to your file. Check the length of string. It will get increased by one due to addition of extra byte.
To get more insights, you can also get byte array representation of a string by using ‘Piyush’.bytes

# Without adding BOM to your CSV.
> 'Piyush'.length
6
# After adding BOM
> 'Piyush'.length
7

## For more closer look. What happened here?
# Before BOM
> 'Piyush'.bytes
[80, 105, 121, 117, 115, 104]
> 'Piyush'.bytes
[239, 187, 191, 80, 105, 121, 117, 115, 104]

If you closely, you will find that 239, 187, 191 byte got added to the string.

Now, what about reading csv files with UTF-8 (BOM) encoding?

rows = CSV.read(csv_file_path, 'r:bom|utf-8')

I hope this article covers all aspects when dealing with export and import of a CSV in ruby.
Thanks to everyone reading this article. If I have missed out something that is in the scope of solving the stated issue, feel free to leave a comment.

See you all in the next post. Ciao!

2 thoughts on “Fix text encoding issue when exporting to CSV (comma separated values) – Ruby on Rails

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.