Poetry Rails: encoding in Ruby and Rails

License: CC 4.0 BY-SA

Original link: http://www.cnblogs.com/orez88/articles/1553629.html

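This article describes how to fix CSV parsing failures caused by encoding problems. It converts the file from latin1 to UTF-8 with Ruby's Iconv library, and then offers a more general solution: using a tool to guess the file's original encoding.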

A CSV file in our project is latin1-encoded, so parsing it raised errors.

Solution: convert the stream to UTF-8 with Iconv.iconv("UTF-8", "latin1", file.read) (UTF-8 is the default encoding in Rails).
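Iconv was removed from the standard library in Ruby 2.0, so on a current Ruby the same conversion is usually written with String#encode. The following is a minimal sketch of that idea, not the original project's code; the sample bytes are an assumption made up for illustration.

```ruby
# Bytes of "café" in Latin-1 (ISO-8859-1): the final byte 0xE9 is "é".
latin1_bytes = "caf\xE9".b

# Tell Ruby what the bytes really are, then transcode them to UTF-8.
utf8 = latin1_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts utf8.encoding         # UTF-8
puts utf8.valid_encoding?  # true
puts utf8.bytes.inspect    # [99, 97, 102, 195, 169]
```

The single Latin-1 byte 0xE9 becomes the two-byte UTF-8 sequence 0xC3 0xA9, which is why the byte count grows from 4 to 5.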

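Limitation: this approach only parses correctly when the file really is latin1-encoded. If the file contains no special characters, plain ASCII (single-byte) characters convert correctly between the two encodings. But as soon as the file contains special characters in some other encoding, the conversion breaks.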
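The limitation can be seen with a few bytes. This sketch uses String#encode on a modern Ruby rather than Iconv, and the sample strings are assumptions chosen for illustration: ASCII survives unchanged, while text that is already UTF-8 gets mangled when it is wrongly treated as latin1.

```ruby
# ASCII-only data is byte-for-byte identical in Latin-1 and UTF-8.
ascii = "name,city".b
converted_ascii = ascii.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts converted_ascii.bytes == ascii.bytes  # true

# "é" already encoded as UTF-8 (bytes C3 A9), wrongly treated as Latin-1.
utf8_bytes = "\xC3\xA9".b
mangled = utf8_bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8")
puts mangled.bytes.length   # 4
puts mangled == "\u00C3\u00A9"  # true: one character became two
```

No error is raised in the second case, because every byte is a valid Latin-1 character. The damage only shows up later as garbled text.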

Going further: use a guessing tool to detect the encoding, as in the following snippet:

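    require 'iconv'                 # stdlib in Ruby 1.8/1.9; removed from stdlib in Ruby 2.0
    require 'cmess/guess_encoding'  # from the cmess gem

    File.open(tmp_file, 'w') do |f|
      input = file.read
      # Guess the source charset instead of hard-coding latin1
      charset = CMess::GuessEncoding::Automatic.guess(input)
      # Iconv.conv returns a single String (Iconv.iconv returns an Array)
      f.write(Iconv.conv('UTF-8', charset, input))
    end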
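Where no detection library is available, a much cruder fallback is sometimes enough. The sketch below is not how CMess works; `to_utf8` is a hypothetical helper that only distinguishes "already valid UTF-8" from "everything else", and the ISO-8859-1 fallback is an assumption the caller has to justify.

```ruby
# Hypothetical helper: a naive stand-in for a real detector such as CMess.
# It only checks whether the bytes are valid UTF-8.
def to_utf8(raw, fallback = "ISO-8859-1")
  as_utf8 = raw.dup.force_encoding("UTF-8")
  return as_utf8 if as_utf8.valid_encoding?

  # Not valid UTF-8: assume the fallback encoding and transcode.
  raw.dup.force_encoding(fallback).encode("UTF-8")
end

from_utf8   = to_utf8("caf\xC3\xA9".b)  # input is already UTF-8
from_latin1 = to_utf8("caf\xE9".b)      # input is Latin-1
puts from_utf8 == from_latin1           # true
```

This avoids the double-encoding problem for UTF-8 input, but it cannot tell latin1 apart from other single-byte encodings, which is where a real guessing tool earns its keep.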

See this discussion for details on the problem above.

While we are at it, a few words on encoding in Ruby:

1. In Ruby <= 1.8, strings were effectively just byte streams. Those bytes would often contain text in one encoding or another, but there was no formal way to record exactly which one (if any; binary data has none). All the default methods assumed a single-byte encoding (such as US-ASCII), so behaviour was odd when the encoding was multibyte (such as UTF-8).
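The byte-oriented behaviour can be approximated on a modern Ruby by dropping the encoding with String#b. This is only an illustration of the idea, not Ruby 1.8 itself, and the sample string is an assumption.

```ruby
text = "caf\u00E9"            # "café" as a UTF-8 string

puts text.length               # 4 (characters)
puts text.bytesize             # 5 (bytes)

# The same bytes with no encoding attached behave like a 1.8 string.
raw = text.b
puts raw.length                # 5
puts raw[0, 4].bytes.inspect   # [99, 97, 102, 195]
puts text[0, 4] == text        # true
```

Slicing the raw bytes at position 4 cuts the two-byte "é" in half, which is the kind of odd result byte-oriented methods produce on multibyte text.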

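2. In Ruby 1.9, every string carries its own encoding, which can be converted in place or declared when a file is opened: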

    str.encode!("UTF-8")

    File.open("file", "r:UTF-8") {|f| ...........}
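To make the two calls above concrete, here is a small self-contained sketch. The temporary file, its contents, and the variable names are assumptions for illustration. The mode string "r:ISO-8859-1:UTF-8" names the external (on-disk) encoding first and the internal (in-memory) encoding second, so Ruby transcodes while reading.

```ruby
require "tempfile"

# encode! converts a string in place from its current encoding.
str = "caf\xE9".b.force_encoding("ISO-8859-1")
str.encode!("UTF-8")
puts str.encoding  # UTF-8

# Write Latin-1 bytes to a temporary file, then read them back as UTF-8.
content = Tempfile.create("latin1") do |tmp|
  tmp.binmode
  tmp.write("caf\xE9\n".b)
  tmp.flush

  # external encoding : internal encoding
  File.open(tmp.path, "r:ISO-8859-1:UTF-8") { |f| f.read }
end

puts content.encoding  # UTF-8
puts content == str + "\n"  # true
```

With plain "r:UTF-8", Ruby only labels the bytes as UTF-8 and does not convert them, so that form is appropriate when the file already is UTF-8.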

Reposted from: https://www.cnblogs.com/orez88/articles/1553629.html