Hey there, coder. In this article, we take up the question "UTF-8 Encoding".
Reader, we first want you to read through the problem so that you understand its outline.
- You are given an array of n integers. Each integer represents one byte of the encoding, read in its binary form.
- You are required to test whether the array represents a valid sequence of UTF-8 characters or not.
- As per UTF-8 encoding, a character can be one to four bytes long, and the rules governing the encoding depend on the length of the character:
- For a character of length 1 byte, the MSB should always be 0.
- For a character of length 2 bytes, the starting bits of the upper byte should be 110 and the lower byte should start with 10.
- For a character of length 3 bytes, the starting bits of the upper byte should be 1110 and the rest of the lower bytes should start with 10.
- For a character of length 4 bytes, the starting bits of the upper byte should be 11110 and the rest of the lower bytes should start with 10.
- So to summarize:
- 1 Byte -> [0 _ _ _ _ _ _ _ ]
- 2 Bytes -> [1 1 0 _ _ _ _ _ ] [1 0 _ _ _ _ _ _ ]
- 3 Bytes -> [1 1 1 0 _ _ _ _ ] [1 0 _ _ _ _ _ _ ] [1 0 _ _ _ _ _ _ ]
- 4 Bytes -> [1 1 1 1 0 _ _ _ ] [1 0 _ _ _ _ _ _ ] [1 0 _ _ _ _ _ _ ] [1 0 _ _ _ _ _ _ ]
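
To make the rules above concrete, here is a minimal Python sketch of one way to check them. It is not necessarily the approach used in the solution. The names `is_valid_utf8` and `remaining` are our own, and we assume that only the lowest 8 bits of each integer form the byte. The sketch checks only the bit-prefix rules listed above, not the stricter conditions of the full UTF-8 standard (for example, overlong encodings).

```python
def is_valid_utf8(data):
    """Check the bit-prefix rules for 1- to 4-byte UTF-8 characters."""
    remaining = 0  # continuation bytes still expected for the current character

    for value in data:
        byte = value & 0xFF  # assumption: only the lowest 8 bits matter

        if remaining == 0:
            # This byte must start a new character; its prefix gives the length.
            if byte >> 7 == 0b0:
                remaining = 0      # 1-byte character: 0xxxxxxx
            elif byte >> 5 == 0b110:
                remaining = 1      # 2-byte character: 110xxxxx
            elif byte >> 4 == 0b1110:
                remaining = 2      # 3-byte character: 1110xxxx
            elif byte >> 3 == 0b11110:
                remaining = 3      # 4-byte character: 11110xxx
            else:
                return False       # 10xxxxxx or 11111xxx cannot start a character
        else:
            # This byte must be a continuation byte: 10xxxxxx
            if byte >> 6 != 0b10:
                return False
            remaining -= 1

    # Valid only if the last character was not cut short.
    return remaining == 0
```

For example, `is_valid_utf8([197, 130, 1])` returns `True`: 197 is 11000101 (start of a 2-byte character), 130 is 10000010 (its continuation byte), and 1 is 00000001 (a 1-byte character). On the other hand, `is_valid_utf8([235, 140, 4])` returns `False`, because 235 is 11101011 (start of a 3-byte character) but 4 is 00000100, which does not start with 10.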