嘿, 我是Mofei!
Generating CSV with JavaScript and the Chinese Garbled Text Problem

Due to work reasons, I often need to export tables to CSV format. Previously, such tasks were handled by the backend, but this time, since it's a purely frontend tool, I didn't want to bother the backend experts and tried generating CSV using JavaScript. The entire process can actually be completed smoothly following the solutions found on Baidu/Google, but in practice, I encountered some minor issues, such as Chinese garbled text. After struggling with it for half an hour and using a very hacky method to solve it, I still found it quite interesting overall. Here, I'll briefly summarize.

Data URLs

First, I need to introduce Data URLs, which allow files to be stored in the form of a URL. Our CSV also uses this technology, converting the contents of the CSV into a URL and then opening it in a new window, so the browser will default to downloading the CSV file.

The default syntax of Data URLs is as follows:

data:[<mediatype>][;base64],<data>

  • data: is the prefix
  • mediatype: the MIME type of the file, for instance, image/jpge corresponds to JPEG files, with the default value being text/plain;charset=US-ASCII
  • base64: whether the file content is in base64 format
  • data: the main content of the file

For example, you can enter the following address in your browser to test:

data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D
data:text/html,<script>alert('hi');</script>

With that said, I’m sure everyone has answers regarding the content below:

Name Age
Mofei 18

We just need to assemble the following content:

Name,Age\nMofei,18

Adding the default header for CSV, which is text/csv;charset=utf-8, thus we obtain the content below:

data:text/csv;charset=utf-8,Name,Age\nMofei,18;

Try opening this in the browser, and unfortunately...

The main reason is that we used Chinese and other characters in the URL that can be hard for some browsers to handle. To solve this issue, we tried to encodeURI the content like this: encodeURI('Name,Age\nMofei,18')

data:text/csv;charset=utf-8,%E5%A7%93%E5%90%8D,%E5%B9%B4%E9%BE%84%0AMofei,18

Now it should be fine, right? Let's open it in the browser and see the results...

Although the format is correct, the Chinese characters... are still a disaster...

JavaScript CSV Chinese Garbled Text Problem

Upon seeing the Chinese garbled text, the first thing that comes to mind is charset, but we clearly set charset=utf-8; why is it still garbled?

After investigation, we found that the problem might be with the BOM. Although we specified the text type, the file type cannot be set in the URL. We need to set the document mode to be able to recognize Chinese.

After consulting relevant information, I finally found a magical solution \uFEFF, which can change the BOM format of the document. Thus our code changed to:

data:text/csv;charset=utf-8,\uFEFF%E5%A7%93%E5%90%8D,%E5%B9%B4%E9%BE%84%0AMofei,18

Now, the Chinese garbled text problem is resolved, but the downloaded file is named 下载.csv. Can we customize the downloaded file name?

Customizing CSV File Name

Actually, this method is quite tricky. Most people would instinctively try to manipulate the Data URLs, but to solve this problem, we can take a roundabout way to use the a tag's download attribute. This attribute can specify the name of the file we are downloading, and thus we have the following code:

var a = document.createElement('a');
a.href='data:text/csv;charset=utf-8,\uFEFF%E5%A7%93%E5%90%8D,%E5%B9%B4%E9%BE%84%0AMofei,18';
a.download="Mofei's CSV.csv";
a.click();

The principle is simple: we create an tag with a download attribute, and then simulate a click with JavaScript to initiate the download.

Comma Issue in Cells

As we previously described, each cell needs to be separated by a comma. However, how do we handle the situation when a cell also needs to contain a comma?

As a programmer, my first thought was to use an escape character and I wrote the field like this:

Mofei\,Zhu,18

Expecting it would become:

Name Age
Mofei,Zhu 18

But the result unfortunately displayed as:

Name Age
Mofei Zhu 18

The reason lies in the encodeURI, but to avoid going into details, later I tried processing it as a string, wrapping the fields in double quotes turned it into:

"Mofei\,Zhu",18

And as expected, it worked! CSV would automatically treat the content between double quotes as a single cell, everyone was happy. I quickly ran to the convenience store to buy an ice cream, but then suddenly thought, since CSV treats double quotes as cell boundaries, what if I also need double quotes inside a cell?

For example, if we want to add "super man" before Mofei, we tried the following:

"\"super nam\" Mofei,Zhu",18

The result:

Name Age
super nam" Mofei Zhu" 18

My ice cream fell in shock...

After some trial and error, I discovered that double quotes in CSV can be represented using double quotes, which means:

""super nam"" Mofei,Zhu",18

Now, everything is wonderful:

Name Age
"super nam" Mofei,Zhu 18

(Quickly pick up the ice cream and continue eating... T_T)

END

In summary, generating CSV with JavaScript is relatively simple, but it involves many little tricks. Here is a brief record for reference.

THE END

More Articles You Might Be Interested In

Your comment might help others too—feel free to share your thoughts!

avatar

Mofei's Friend (Click to edit)

Hey, say something!

HI. I AM MOFEI!

NICE TO MEET YOU!