2011-11-13 16 views
1

Counting the number of multi-byte characters

I have text containing multi-byte characters, as shown:

ウィキペディア、百科事典。

台数(λ - 、lambda - )はする。 1930年代だ。関数s(x、y)= x * x +入力xである。 x↦xとy ↦yは変数の名前は。また、(x、y)↦x * x + y * yと(u、v)↦u * u + v * vは.123456

In a word processor, it gives me a character count of 148.

On an HTML form encoded in UTF-8, it fills up a field with the attribute maxlength="150".

Using the PHP function mb_strlen($_POST['text'], 'UTF-8'), it returns a value of .

Which one is correct?

+1

Which word processor? Maybe it does not count newlines as characters?

+0

@Yzmir Ramirez, LibreOffice Writer. You are right, it does not count newlines. Then why is there a difference between HTML and PHP?

Answer

2

I'm going to say they are all correct.

With no line endings it is 148.

With line endings it is 150 or 152, depending on the line-ending convention: the text contains two line breaks, and Windows uses 2 characters per line ending.
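To make the arithmetic concrete, here is a minimal sketch in Python (not PHP; Python's `len` on a `str` counts code points, much like `mb_strlen` with 'UTF-8'). The helper name `length_with_newline` and the short sample string are made up for illustration; the sample just mimics the shape of the text in the question, with two line breaks.

```python
def length_with_newline(text: str, newline: str) -> int:
    """Count code points after rewriting every "\n" as the given line ending."""
    return len(text.replace("\n", newline))


# Hypothetical sample: 7 visible characters separated by two line breaks.
sample = "ウィキ\n\n百科事典"

no_breaks = length_with_newline(sample, "")      # line breaks not counted
unix_style = length_with_newline(sample, "\n")   # 1 character per line break
windows_style = length_with_newline(sample, "\r\n")  # 2 characters per line break

print(no_breaks, unix_style, windows_style)
```

The three results differ by exactly the number of line breaks, which is the same pattern as 148, 150, and 152 above.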

+0

Can you elaborate? I do not understand. You mean PHP counted two extra '\n's, HTML counted one?

+2

On Windows a newline is two characters, '\r\n', but on Mac OS X and *nix machines it's '\n', and '\r' for classic Mac OS (up to 9), if memory serves me correctly. History of the newline: http://en.wikipedia.org/wiki/Newline
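If the goal is for the server-side count to agree with the form limit, one option is to normalize line endings before counting. This is only a sketch in Python rather than PHP, and it assumes the submitted text may arrive with '\r\n' line endings, which browsers commonly send regardless of the client OS. The names `normalize_newlines`, `fits_limit`, and `MAX_LENGTH` are hypothetical.

```python
import re

MAX_LENGTH = 150  # assumed to mirror the maxlength value from the question


def normalize_newlines(text: str) -> str:
    # Alternation is tried left to right, so "\r\n" is matched as one unit
    # before a lone "\r" (classic Mac OS style) is considered.
    return re.sub(r"\r\n|\r", "\n", text)


def fits_limit(text: str, limit: int = MAX_LENGTH) -> bool:
    # Count code points after every line break has been reduced to one "\n".
    return len(normalize_newlines(text)) <= limit


# Hypothetical submission: 148 characters plus two Windows-style line breaks.
submitted = "a" * 74 + "\r\n\r\n" + "a" * 74
print(len(submitted), len(normalize_newlines(submitted)), fits_limit(submitted))
```

Counted raw, this submission is over the limit; counted after normalization, it fits.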

+0

@Yzmir Ramirez, I see. I am using Linux.