
UTF-8 encoding

These considerations apply to using UTF-8 encoding for object names:

Some character encodings, such as UTF-8, need more than one byte to encode a single character. As a result, an object name can be longer in bytes than its character count suggests, and it can exceed the HCP limit of 4,095 bytes even when it appears to be shorter.
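The difference between character count and byte count can be checked before a name is used. The following is a minimal Python sketch, not part of HCP or HDDS; the helper names `utf8_length` and `fits_name_limit` are hypothetical, and only the 4,095-byte limit comes from the text above.

```python
# Limit stated in the text above; everything else here is illustrative.
HCP_NAME_LIMIT_BYTES = 4095


def utf8_length(name: str) -> int:
    """Return the number of bytes the name occupies when UTF-8 encoded."""
    return len(name.encode("utf-8"))


def fits_name_limit(name: str, limit: int = HCP_NAME_LIMIT_BYTES) -> bool:
    """Return True if the UTF-8 encoded name is within the byte limit."""
    return utf8_length(name) <= limit


# 4,095 ASCII characters occupy exactly 4,095 bytes.
ascii_name = "a" * 4095

# 4,095 accented characters (U+00E9) occupy 2 bytes each in UTF-8,
# so the same character count needs 8,190 bytes.
accented_name = "\u00e9" * 4095
```

Here `fits_name_limit(ascii_name)` is true and `fits_name_limit(accented_name)` is false, even though both names have the same number of characters.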

When searching buckets, HDDS and HCP rely on UTF-8 encoding conventions to find objects by name. If the name of an object is not UTF-8 encoded, searches for the object by name may return unexpected results.
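Whether the raw bytes of a name are valid UTF-8 can be tested on the client side before the object is stored. The sketch below is an illustration only; the function name `is_valid_utf8` is hypothetical, and it says nothing about how HDDS or HCP perform the check internally.

```python
def is_valid_utf8(raw_name: bytes) -> bool:
    """Return True if the byte sequence decodes cleanly as UTF-8."""
    try:
        raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The same visible name, encoded two different ways.
utf8_bytes = "r\u00e9sum\u00e9.txt".encode("utf-8")
latin1_bytes = "r\u00e9sum\u00e9.txt".encode("latin-1")
```

In this example `is_valid_utf8(utf8_bytes)` is true and `is_valid_utf8(latin1_bytes)` is false, because the single byte that Latin-1 uses for the accented character is not a complete UTF-8 sequence.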

When the metadata query engine or HCP search facility indexes an object with a name that includes certain characters that cannot be UTF-8 encoded, it percent-encodes those characters. Searches for such objects by name must explicitly include the percent-encoded characters in the name.
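The general idea of percent-encoding can be sketched as follows. This is an assumption-laden illustration, not the HCP implementation: the text above does not specify which characters are affected or the exact escape format, so the sketch assumes that each byte that is not part of a valid UTF-8 sequence is replaced by `%` followed by two uppercase hexadecimal digits. The function name `percent_encode_invalid_bytes` is hypothetical.

```python
def percent_encode_invalid_bytes(raw_name: bytes) -> str:
    """Decode a name as UTF-8, replacing each undecodable byte with %XX.

    Assumption: %XX with uppercase hex digits. The real format used by
    the metadata query engine or HCP search facility may differ.
    """
    # "surrogateescape" maps each undecodable byte 0xNN to the lone
    # surrogate U+DCNN, so the original byte value can be recovered.
    decoded = raw_name.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in decoded:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            parts.append("%{:02X}".format(code - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)


# A Latin-1 encoded name: the byte 0xE9 is not valid UTF-8 on its own.
latin1_bytes = "r\u00e9sum\u00e9.txt".encode("latin-1")
search_name = percent_encode_invalid_bytes(latin1_bytes)
```

Under these assumptions `search_name` is `r%E9sum%E9.txt`, which is the form a search by name would need to use. A name that is already valid UTF-8 passes through unchanged.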

Chapter 2: Bucket and object properties