QR code encoding
The format information records two things: the error correction level and the mask pattern used for the symbol. Masking is used to break up patterns in the data area that might confuse a scanner, such as large blank areas or misleading features that look like the locator marks.
The mask patterns are defined on a grid that is repeated as necessary to cover the whole symbol. Modules corresponding to the dark areas of the mask are inverted. The format information is protected from errors with a BCH code[1], and two complete copies are included in each QR symbol.
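The inversion rule can be sketched in a few lines of Python. The eight mask conditions below are recalled from ISO/IEC 18004 and do not appear in the text above, so treat them as an assumption. The names apply_mask and is_data are hypothetical; is_data stands in for whatever test excludes the locator marks, format information and other function modules from masking.

```python
# Mask conditions as recalled from ISO/IEC 18004 (an assumption, not from the text).
# i is the module row, j the module column; True marks a dark module of the mask.
MASK_CONDITIONS = [
    lambda i, j: (i + j) % 2 == 0,
    lambda i, j: i % 2 == 0,
    lambda i, j: j % 3 == 0,
    lambda i, j: (i + j) % 3 == 0,
    lambda i, j: (i // 2 + j // 3) % 2 == 0,
    lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,
    lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,
    lambda i, j: ((i + j) % 2 + (i * j) % 3) % 2 == 0,
]


def apply_mask(modules, pattern, is_data=lambda i, j: True):
    """Return a copy of `modules` (rows of 0/1 values) with every data module
    inverted where the chosen mask pattern is dark."""
    condition = MASK_CONDITIONS[pattern]
    return [
        [bit ^ 1 if is_data(i, j) and condition(i, j) else bit
         for j, bit in enumerate(row)]
        for i, row in enumerate(modules)
    ]
```

Because masking is a plain inversion, applying the same pattern a second time restores the original modules, which is how a reader would remove the mask.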
The message dataset is placed from right to left in a zigzag pattern, as shown below. In larger symbols, this is complicated by the presence of the alignment patterns and the use of multiple interleaved error-correction blocks.
In Figure 1, the format information is protected by a (15,5) BCH code, which can correct up to 3 bit errors. The code is 15 bits long: 5 data bits (2 for the EC level and 3 for the mask pattern) and 10 extra bits for error correction. The format mask for these 15 bits is [101010000010010]. Note that the masked values are mapped directly to their meaning here, in contrast to image 4 "Levels & Masks", where the mask pattern numbers are the result of applying the 3rd to 5th mask bits, [101], to the 3rd to 5th format info bits of the QR code.
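As a rough illustration, the 15 format bits could be computed as follows. The format mask is the one quoted above. The generator polynomial (10100110111) and the two-bit codes for the EC levels are recalled from the QR specification and are not given in the text, so they are assumptions here. The function names are hypothetical.

```python
GENERATOR = 0b10100110111          # assumed (15,5) BCH generator polynomial
FORMAT_MASK = 0b101010000010010    # format mask quoted in the text
EC_LEVEL_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}  # assumed level codes


def bch_remainder(data5):
    """Remainder of (data5 * x^10) divided by the generator over GF(2)."""
    value = data5 << 10
    for shift in range(4, -1, -1):
        if value & (1 << (shift + 10)):
            value ^= GENERATOR << shift
    return value


def format_bits(ec_level, mask_pattern):
    """Return the masked 15-bit format information as an integer."""
    data5 = (EC_LEVEL_BITS[ec_level] << 3) | mask_pattern
    codeword = (data5 << 10) | bch_remainder(data5)
    return codeword ^ FORMAT_MASK


print(format(format_bits("L", 0), "015b"))  # prints 111011111000100
```

Under these assumptions, level M with mask pattern 0 has all-zero data bits, so its format information is simply the format mask itself.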
The message is encoded using a (255,249) Reed-Solomon code (shortened to a (24,18) code by using "padding"), which can correct up to 3 byte errors.
The message has 26 data bytes and is encoded using two Reed-Solomon code blocks. Each block is a (255,233) Reed-Solomon code (shortened to a (35,13) code) containing 13 data bytes with 22 "parity" bytes appended, and can correct up to 11 byte errors. The two 35-byte blocks are interleaved, giving 70 code bytes in total, so that together they can correct up to 22 byte errors in a single burst. The symbol achieves level H error correction.
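The arithmetic behind these figures can be checked with a small sketch. It relies on the standard property that a Reed-Solomon code with n - k parity bytes corrects up to (n - k) / 2 byte errors, and it models interleaving as taking one byte from each block in turn. The function names are hypothetical, and the blocks are assumed to be of equal length.

```python
def correctable_errors(n, k):
    """Byte errors an (n, k) Reed-Solomon code can correct: half the parity bytes."""
    return (n - k) // 2


def interleave(blocks):
    """Interleave equal-length blocks by taking one byte from each block in turn."""
    return [block[i] for i in range(len(blocks[0])) for block in blocks]


print(correctable_errors(24, 18))  # prints 3
print(correctable_errors(35, 13))  # prints 11
```

With two interleaved blocks, neighbouring bytes of the stream belong to different blocks, so a burst of 22 consecutive damaged bytes puts only 11 errors into each block, which is within what each block can correct.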
A QR encoding is structured as a sequence of 4-bit indicators, each followed by a payload whose length depends on the indicator mode (e.g. the byte encoding payload length depends on the first byte).[2]
Mode indicator | Description | Typical structure '[ type : sizes in bits ]' |
0001 | Numeric | [0001 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 3 1⁄3 × charcount ] |
0010 | Alphanumeric | [0010 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 5 1⁄2 × charcount ] |
0100 | Byte encoding | [0100 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 8 × charcount ] |
1000 | Kanji encoding | [1000 : 4] [ Character Count Indicator : variable ] [ Data Bit Stream : 13 × charcount ] |
0011 | Structured append | [0011 : 4] [ Symbol Position : 4 ] [ Total Symbols: 4 ] [ Parity : 8 ] |
0111 | ECI | [0111 : 4] [ ECI Assignment number : variable ] |
0101 | FNC1 in first position | [0101 : 4] [ Numeric/Alphanumeric/Byte/Kanji payload : variable ] |
1001 | FNC1 in second position | [1001 : 4] [ Application Indicator : 8 ] [ Numeric/Alphanumeric/Byte/Kanji payload : variable ] |
0000 | End of message | [0000 : 4] |
Note:
- Character Count Indicator depends on how many modules are in a QR code (Symbol Version).
- ECI Assignment number size:
  - 8 × 1 bits if ECI Assignment Bitstream starts with '0'
  - 8 × 2 bits if ECI Assignment Bitstream starts with '10'
  - 8 × 3 bits if ECI Assignment Bitstream starts with '110'
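The size rule for the ECI assignment number in the note above can be expressed as a check on the leading bits of its first byte. This is a minimal sketch; the function name is hypothetical.

```python
def eci_assignment_bits(first_byte):
    """Return the size in bits of an ECI assignment number, given its first byte."""
    if (first_byte & 0b10000000) == 0b00000000:   # starts with '0'
        return 8
    if (first_byte & 0b11000000) == 0b10000000:   # starts with '10'
        return 16
    if (first_byte & 0b11100000) == 0b11000000:   # starts with '110'
        return 24
    raise ValueError("first byte does not start with '0', '10' or '110'")
```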
Four-bit indicators are used to select the encoding mode and convey other information.
Encoding modes
Indicator | Meaning |
0001 | Numeric encoding (10 bits per 3 digits) |
0010 | Alphanumeric encoding (11 bits per 2 characters) |
0100 | Byte encoding (8 bits per character) |
1000 | Kanji encoding (13 bits per character) |
0011 | Structured append (used to split a message across multiple QR symbols) |
0111 | Extended Channel Interpretation[3] (select alternate character set or encoding) |
0101 | FNC1 in first position (see Code 128[4] for more information) |
1001 | FNC1 in second position |
0000 | End of message (Terminator) |
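As an example of one of these modes, numeric encoding can be sketched as below. The table gives 10 bits per group of 3 digits. The widths used for a trailing group of 2 digits (7 bits) or 1 digit (4 bits) are recalled from the standard and are an assumption here. The function name is hypothetical, and the mode indicator and length field are left out.

```python
def encode_numeric(digits):
    """Encode a digit string as the numeric-mode data bit stream (a '0'/'1' string)."""
    if any(ch not in "0123456789" for ch in digits):
        raise ValueError("numeric mode accepts the digits 0-9 only")
    widths = {3: 10, 2: 7, 1: 4}  # bits per group; 7 and 4 are assumed
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "".join(format(int(g), f"0{widths[len(g)]}b") for g in groups)


print(encode_numeric("01234567"))  # prints 0000001100 0101011001 1000011 without the spaces
```

The eight digits split into the groups 012, 345 and 67, giving 10 + 10 + 7 = 27 data bits.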
Encoding modes can be mixed as needed within a QR symbol (e.g., a URL with a long string of alphanumeric characters).
[ Mode Indicator][ Mode bitstream ] --> [ Mode Indicator][ Mode bitstream ] --> etc... --> [ 0000 End of message (Terminator) ]
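Reading such a mixed stream amounts to repeatedly taking a 4-bit indicator and then the payload it announces, until the terminator. The sketch below handles only numeric and byte segments and assumes a version 1 to 9 symbol, where their length fields are 10 and 8 bits wide. The 7-bit and 4-bit widths for leftover digits are recalled from the standard rather than taken from the text. The function name is hypothetical.

```python
def decode_stream(bits):
    """Split a '0'/'1' string into (mode, payload) segments until the terminator."""
    pos = 0
    segments = []

    def take(width):
        # Read `width` bits at the current position as an unsigned integer.
        nonlocal pos
        if pos + width > len(bits):
            raise ValueError("bit stream ended unexpectedly")
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    while pos + 4 <= len(bits):
        mode = take(4)
        if mode == 0b0000:                     # End of message (Terminator)
            break
        if mode == 0b0001:                     # Numeric
            count = take(10)                   # length field, versions 1-9
            digits = ""
            while count >= 3:
                digits += f"{take(10):03d}"
                count -= 3
            if count == 2:
                digits += f"{take(7):02d}"     # assumed width for 2 leftover digits
            elif count == 1:
                digits += f"{take(4)}"         # assumed width for 1 leftover digit
            segments.append(("numeric", digits))
        elif mode == 0b0100:                   # Byte
            count = take(8)                    # length field, versions 1-9
            segments.append(("byte", bytes(take(8) for _ in range(count))))
        else:
            raise ValueError(f"mode {mode:04b} is not handled in this sketch")
    return segments
```

For instance, a numeric segment holding "123" followed by a byte segment holding "ab" and the terminator decodes to [("numeric", "123"), ("byte", b"ab")].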
Every indicator that selects an encoding mode is followed by a length field that tells how many characters are encoded in that mode. The number of bits in the length field depends on the encoding and the symbol version.
Number of bits in a length field
(Character Count Indicator)
Encoding | Ver. 1–9 | 10–26 | 27–40 |
Numeric | 10 | 12 | 14 |
Alphanumeric | 9 | 11 | 13 |
Byte | 8 | 16 | 16 |
Kanji | 8 | 10 | 12 |
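The table translates directly into a lookup. The sketch below also builds the header of a segment, that is, the 4-bit mode indicator followed by the length field. The function and dictionary names are hypothetical.

```python
MODE_INDICATORS = {"numeric": "0001", "alphanumeric": "0010",
                   "byte": "0100", "kanji": "1000"}

# Length field widths for versions 1-9, 10-26 and 27-40, as in the table above.
COUNT_BITS = {"numeric": (10, 12, 14), "alphanumeric": (9, 11, 13),
              "byte": (8, 16, 16), "kanji": (8, 10, 12)}


def count_field_bits(mode, version):
    """Return the width in bits of the length field for a mode and symbol version."""
    if not 1 <= version <= 40:
        raise ValueError("symbol version must be between 1 and 40")
    column = 0 if version <= 9 else 1 if version <= 26 else 2
    return COUNT_BITS[mode][column]


def segment_header(mode, count, version):
    """Return the mode indicator followed by the length field, as a '0'/'1' string."""
    width = count_field_bits(mode, version)
    if not 0 <= count < (1 << width):
        raise ValueError("character count does not fit in the length field")
    return MODE_INDICATORS[mode] + format(count, f"0{width}b")


print(segment_header("alphanumeric", 5, 1))  # prints 0010000000101
```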
Alphanumeric encoding mode stores a message more compactly than the byte mode can, but cannot store lower-case letters and has only a limited selection of punctuation marks, which are sufficient for rudimentary web addresses[5]. Two characters are coded in an 11-bit value by this formula:
V = 45 × C1 + C2
The exception is that the last character of an alphanumeric string with an odd length is encoded as a 6-bit value instead.
Alphanumeric character codes
Code | Character | Code | Character | Code | Character | Code | Character | Code | Character |
00 | 0 | 09 | 9 | 18 | I | 27 | R | 36 | Space |
01 | 1 | 10 | A | 19 | J | 28 | S | 37 | $ |
02 | 2 | 11 | B | 20 | K | 29 | T | 38 | % |
03 | 3 | 12 | C | 21 | L | 30 | U | 39 | * |
04 | 4 | 13 | D | 22 | M | 31 | V | 40 | + |
05 | 5 | 14 | E | 23 | N | 32 | W | 41 | - |
06 | 6 | 15 | F | 24 | O | 33 | X | 42 | . |
07 | 7 | 16 | G | 25 | P | 34 | Y | 43 | / |
08 | 8 | 17 | H | 26 | Q | 35 | Z | 44 | : |
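Putting the formula and the code table together, alphanumeric encoding can be sketched as follows. The character string below lists the 45 characters in code order, so a character's index is its code. The function name is hypothetical, and the mode indicator and length field are left out.

```python
# The 45 characters of the table above, in code order (index == code).
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"


def encode_alphanumeric(text):
    """Encode text as the alphanumeric-mode data bit stream (a '0'/'1' string)."""
    codes = [ALPHANUMERIC.index(ch) for ch in text]  # ValueError if unsupported
    out = []
    for i in range(0, len(codes) - 1, 2):
        out.append(format(45 * codes[i] + codes[i + 1], "011b"))  # V = 45 × C1 + C2
    if len(codes) % 2 == 1:
        out.append(format(codes[-1], "06b"))  # odd final character: 6 bits
    return "".join(out)


print(encode_alphanumeric("AC-42"))  # prints 0011100111011100111001000010
```

Here "AC" gives 45 × 10 + 12 = 462, "-4" gives 45 × 41 + 4 = 1849, and the final "2" is stored on its own in 6 bits.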
[1] https://en.wikipedia.org/wiki/BCH_code
[2] ISO/IEC 18004:2006(E) § 6.4 Data encoding; Table 3 – Number of bits in character count indicator for QR Code 2005
[3] https://en.wikipedia.org/wiki/Extended_Channel_Interpretation
[4] https://en.wikipedia.org/wiki/Code_128
[5] https://en.wikipedia.org/wiki/URL
Source: Wikipedia.com