HTML and charsets

curl downloads the exact binary data that the server sends. This might be of importance to you in case, for example, you download an HTML page or other text data that uses a certain character encoding that your browser then displays as expected. curl does not translate the arriving data.

A common example where this causes some surprising results is when a user downloads a webpage with something like:

curl -o storage.html

…and when inspecting the storage.html file after the fact, the user realizes that one or more characters look funny or downright wrong. This might occur because the server sent the characters using charset X, while your editor and environment use charset Y. In an ideal world, we would all use UTF-8 everywhere but unfortunately, that is still not the case.

A common work-around for this issue that works decently is to use the common iconv utility to translate a text file to and from different charsets.