
XMPPDecoder has a decode problem for UTF-8

Description

A UTF-8 character that fits in one Java char is encoded as 1 to 3 bytes (the original UTF-8 design allowed up to 6 bytes per character; RFC 3629 limits it to 4); see http://en.wikipedia.org/wiki/UTF-8 .

From here on, I assume a character that takes 3 bytes.

Openfire uses MINA NIO to process the network stream and implements an XMPPDecoder to decode bytes into Strings/stanzas.

When a ByteBuffer is decoded, its last few bytes may hold only part of a character: the buffer may end with one, two, or all three bytes of a 3-byte character, and with only 1 or 2 of them the character is incomplete. Whether this happens depends on where the network stream gets split, so it looks random. If the input contains long runs of 3-byte characters, the probability rises significantly.
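To illustrate the situation described above, here is a minimal sketch (not Openfire code; the class and method names are made up for illustration) that decodes a 3-byte character cut off after 1 or 2 bytes. With a one-shot Charset.decode(), the incomplete bytes come out as U+FFFD replacement characters instead of being held back:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Openfire.
final class TruncatedUtf8Demo {

    /** Decodes only the first {@code length} bytes of {@code bytes} as UTF-8, in one shot. */
    static String decodePrefix(byte[] bytes, int length) {
        // Charset.decode() treats the buffer as the complete input, so an
        // incomplete trailing sequence is replaced with U+FFFD.
        return StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes, 0, length)).toString();
    }

    public static void main(String[] args) {
        // U+4E2D is encoded in UTF-8 as the three bytes E4 B8 AD.
        byte[] bytes = "\u4E2D".getBytes(StandardCharsets.UTF_8);
        for (int n = 1; n <= bytes.length; n++) {
            String s = decodePrefix(bytes, n);
            boolean replaced = s.indexOf('\uFFFD') >= 0;
            System.out.println(n + " byte(s) decoded, contains U+FFFD: " + replaced);
        }
    }
}
```

Only the call that sees all three bytes returns the original character; the shorter prefixes contain a replacement character.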

Let's look at org.jivesoftware.openfire.nio.XMLLightweightParser (Openfire 3.6.4):

Charset encoder = Charset.forName(charset);
CharBuffer charBuffer = encoder.decode(byteBuffer.buf());
char[] buf = charBuffer.array();
int readByte = charBuffer.remaining();
char lastChar = buf[readByte-1];
if (lastChar >= 0xfff0) { // the last char is assumed to be incomplete, so position-1 and readByte-1
    byteBuffer.position(byteBuffer.position()-1); // error: always rewinds exactly one byte
    readByte--; // error
}

The code above does not properly handle an incomplete UTF-8 sequence.

If a character is 3 bytes, the end of the ByteBuffer may hold only its first one or two bytes.

If only one byte of the character has arrived, the ByteBuffer's position should move back by 1. If two bytes have arrived, it should move back by 2.

So if the position moves back by 1 when two bytes have arrived, the next decode starts with the last two bytes of the 3-byte character, and they are replaced with two "FD" replacement characters.

Conversely, if the position moves back by 2 when only one byte has arrived, the next decode sees 4 bytes for the 3-byte character, which yields one extra "FD" followed by the character itself.
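One way to avoid guessing how far to rewind is to let a java.nio.charset.CharsetDecoder report how many bytes it actually consumed. The following is only a sketch under that assumption, not the fix that was committed to Openfire; StreamingUtf8Decoder and its pending field are hypothetical names. Calling decode(in, out, false) leaves an incomplete trailing sequence unread in the input buffer, so it can be carried over to the next chunk:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Openfire or MINA.
final class StreamingUtf8Decoder {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);

    // Bytes of an incomplete trailing sequence left over from the previous chunk.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Decodes one network chunk and keeps any incomplete trailing bytes for the next call. */
    String decode(byte[] chunk) {
        // Prepend the leftover bytes to the new chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + chunk.length);
        in.put(pending).put(chunk);
        in.flip();

        // UTF-8 never produces more chars than input bytes, so this capacity suffices.
        CharBuffer out = CharBuffer.allocate(in.remaining());

        // endOfInput = false: an incomplete sequence at the end is left unread
        // in the input buffer rather than replaced with U+FFFD.
        decoder.decode(in, out, false);

        // Whatever was not consumed (0 to 3 bytes) waits for the next chunk.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }
}
```

With this approach the caller never adjusts the position by a fixed amount; the number of bytes held back follows from what the decoder left in the buffer, which also covers 4-byte sequences.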

Environment

None

Acceptance Test - Entry

None

Activity

wroot
June 12, 2011, 10:17 AM

As Java 7 is coming this July, maybe we should already drop Java 5 support? Looking at this poll, I see that the majority is voting for the drop: http://community.igniterealtime.org/polls/1025

Daryl Herzmann
June 12, 2011, 9:15 PM

r12472

Liyu Wang
August 10, 2011, 4:53 AM

try if you can survive 𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢

Liyu Wang
August 10, 2011, 5:03 AM

𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢𤭢

Liyu Wang
August 15, 2011, 6:13 AM

Actually, the patch fixes the UTF-8 bug.
The exception in my previous test case is thrown by
the more() function in MXParser.java. If you comment out
that more() function, everything works fine.

Can anyone explain to me the purpose of checking the
range of the char like that in more()?

Fixed

Assignee

Daryl Herzmann

Reporter

wroot

Labels

None

Expected Effort

None

Components

Fix versions

Affects versions

Priority

Major