XMPPDecoder has a decode problem for UTF-8


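A UTF-8 character that fits in a single Java char (i.e. in the Basic Multilingual Plane) is encoded as 1 to 3 bytes. Characters outside that range take 4 bytes, which is the maximum under RFC 3629 (the original design allowed up to 6 bytes); see http://en.wikipedia.org/wiki/UTF-8 .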
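To make the byte counts concrete, here is a small sketch that prints how many Java chars and how many UTF-8 bytes a few sample characters take. The class name Utf8Lengths and the sample characters are only illustrative, they are not part of Openfire.

```java
import java.nio.charset.Charset;

class Utf8Lengths {
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Number of bytes the UTF-8 encoding of s occupies.
    static int utf8Length(String s) {
        return s.getBytes(UTF8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "a",                                    // U+0061, ASCII
            "\u00e9",                               // U+00E9, Latin small letter e with acute
            "\u4e2d",                               // U+4E2D, a CJK ideograph
            new String(Character.toChars(0x24B62))  // U+24B62, outside the BMP
        };
        for (String s : samples) {
            System.out.println("U+" + Integer.toHexString(s.codePointAt(0)).toUpperCase()
                    + ": " + s.length() + " char(s), " + utf8Length(s) + " byte(s)");
        }
    }
}
```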

From here on, I assume a character that takes 3 bytes.

Openfire uses MINA (NIO) to process the network stream and implements an XMPPDecoder to decode bytes into Strings/stanzas.

When a ByteBuffer is decoded, it may end with an incomplete character. For example, the last few bytes of the buffer may hold one, two, or all three bytes of a character; if only one or two of them are there, the character is incomplete. Whether this happens depends on where the network splits the data, so it looks random. The more 3-byte characters the input contains, the more likely it becomes.
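The effect can be reproduced outside Openfire. The sketch below decodes the same bytes in two separate chunks with Charset.decode, once for each possible split point. The class and method names (SplitCharDemo, decodeChunk, decodeInTwoChunks) are made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

class SplitCharDemo {
    static final Charset UTF8 = Charset.forName("UTF-8");

    // Decodes one chunk on its own, as a per-read Charset.decode call would.
    static String decodeChunk(byte[] chunk) {
        return UTF8.decode(ByteBuffer.wrap(chunk)).toString();
    }

    // Decodes bytes[0..split) and bytes[split..) separately and joins the results.
    static String decodeInTwoChunks(byte[] bytes, int split) {
        return decodeChunk(Arrays.copyOfRange(bytes, 0, split))
             + decodeChunk(Arrays.copyOfRange(bytes, split, bytes.length));
    }

    public static void main(String[] args) {
        String text = "ab\u4e2d";            // U+4E2D takes 3 bytes: E4 B8 AD
        byte[] bytes = text.getBytes(UTF8);  // 5 bytes in total
        for (int split = 2; split <= bytes.length; split++) {
            String joined = decodeInTwoChunks(bytes, split);
            System.out.println("split at " + split + ": intact=" + joined.equals(text));
        }
    }
}
```

Only the splits that fall inside the 3-byte character damage the text. Charset.decode substitutes the replacement character U+FFFD for the truncated bytes, which is what the lastChar >= 0xfff0 check in Openfire is trying to detect.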

Let's look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):

Charset encoder = Charset.forName(charset);
CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
char lastChar = buf[readByte-1];
if (lastChar >= 0xfff0) { // treated as incomplete, so position-1 and readByte-1
    byteBuffer.position(byteBuffer.position()-1); // error: always rewinds exactly one byte
    readByte--; // error
}

The code above does not handle incomplete UTF-8 characters correctly.

If a character takes 3 bytes, either one or two of its bytes may have arrived at the end of the ByteBuffer.

If one byte has arrived, the buffer's position should be moved back by 1. If two bytes have arrived, it should be moved back by 2.
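How far to move back can be worked out from the bytes themselves, because a UTF-8 lead byte says how long its sequence is. The following is a minimal sketch of that idea. Utf8Tail and incompleteTailLength are hypothetical names, not Openfire code, and the method only looks at the tail of the buffer without validating the rest.

```java
class Utf8Tail {
    // Returns how many bytes at the end of buf[0..len) belong to a UTF-8
    // sequence whose remaining bytes have not arrived yet (0 if none).
    static int incompleteTailLength(byte[] buf, int len) {
        // A sequence is at most 4 bytes, so an incomplete one holds at most 3.
        for (int back = 1; back <= 3 && back <= len; back++) {
            int b = buf[len - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte, keep looking for the lead byte
            }
            int expected;
            if (b < 0x80) expected = 1;                 // ASCII
            else if ((b & 0xE0) == 0xC0) expected = 2;  // 110xxxxx
            else if ((b & 0xF0) == 0xE0) expected = 3;  // 1110xxxx
            else if ((b & 0xF8) == 0xF0) expected = 4;  // 11110xxx
            else return 0;                              // invalid lead byte, leave it to the decoder
            return back < expected ? back : 0;
        }
        return 0; // no lead byte within reach, nothing to hold back
    }
}
```

With a result of n, the position would be moved back by n before decoding, so that those bytes are decoded together with the next read.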

So if the position is moved back by 1 while two bytes of the character are pending, the next decode pass sees only the last two of its three bytes, and they are replaced by two U+FFFD ("FD") replacement characters.

Conversely, if the position were moved back by 2 while only one byte is pending, the next pass would decode four bytes instead of three, producing one extra U+FFFD in front of the character.
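One way to avoid guessing the rewind distance is to let a java.nio CharsetDecoder keep track of it. When CharsetDecoder.decode(in, out, false) is called with endOfInput set to false, the bytes of an unfinished trailing character stay in the input buffer and can be carried over to the next read. This is only a sketch of that approach, not the actual Openfire patch; StreamingUtf8Decoder and feed are names invented here.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

class StreamingUtf8Decoder {
    private final CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of a character whose remaining bytes have not arrived yet.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    // Decodes the next chunk of the stream and returns the complete characters.
    String feed(byte[] chunk) {
        // Prepend whatever was left over from the previous chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk);
        in.flip();

        // UTF-8 never produces more chars than it consumes bytes.
        CharBuffer out = CharBuffer.allocate(in.remaining());
        // endOfInput=false: an unfinished trailing sequence is left in 'in'.
        decoder.decode(in, out, false);

        // Keep the undecoded tail (0 to 3 bytes) for the next call.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }
}
```

Calling feed once per network read yields the same text regardless of where the stream was split, because a character is only emitted after its last byte has arrived.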




Liyu Wang
August 15, 2011, 6:13 AM

Actually, the patch does fix the UTF-8 bug.
The exception in my previous test case is thrown by
the more() function in MXParser.java. If you comment out
that more() function, everything works fine.

Can anyone explain to me the purpose of checking the
range of the char like that in more()?

Liyu Wang
August 10, 2011, 5:03 AM


Liyu Wang
August 10, 2011, 4:53 AM

try if you can survive 𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢

Daryl Herzmann
June 12, 2011, 9:15 PM


June 12, 2011, 10:17 AM

As Java 7 is coming this July, maybe we should drop Java 5 support already? Looking at this poll, I see that the majority is voting for the drop: http://community.igniterealtime.org/polls/1025



Daryl Herzmann