Wikipedia:Naming conventions (technical restrictions)

Some page names are not possible because of limitations imposed by the MediaWiki software. In some cases (such as names which should begin with a lowercase letter, like eBay), a template can be added to the article to cause the title header to be displayed as desired. In other cases (such as names containing restricted characters) it is necessary to adopt and display a different title. This page describes appropriate ways to manage these situations.

Restrictions and workarounds

Restrictions on page titles are listed at Wikipedia:Page name § Technical restrictions and limitations. The most commonly encountered problems are that:

  • titles cannot begin with a lowercase letter;
  • titles cannot contain certain restricted characters.

There are two basic ways of handling a situation where the desired title of a page is technically impossible:

  • Use the magic word DISPLAYTITLE to change the way the title header is displayed on the page (although the stored page name is not affected). This is often done through a template, the most common one being {{lowercase}}, which causes the title to be displayed with an initial lowercase letter, as in iPod.
  • If this is not possible (due to restrictions on DISPLAYTITLE), choose a different title for the page, and use a template such as {{correct title}} to place a hatnote stating what the correct title should be. This is normally necessary in the case of restricted characters.

These templates should never be substituted (subst). To see which articles have these naming problems you can click on "What links here" in the toolbox for each template. If the template is substituted, it will no longer be linked.

Before declaring the current title to be "wrong" with the "correct title" template (or one of the more specific templates), please consider whether the title you are proposing as "correct" would really comply with Wikipedia conventions, particularly Wikipedia:Naming conventions (use English), Wikipedia:Manual of Style (capital letters) and Wikipedia:Manual of Style (trademarks).

Lowercase first letter

The MediaWiki software is configured so that a page title on the English Wikipedia (as stored in the database) cannot begin with a lower-case letter, and links that begin with a lower-case letter are treated as if capitalized, i.e. [[foo]] is treated the same as [[Foo]].

Examples of articles affected by this problem are:

Examples of categories affected by this problem are:

  • Category:macOS, located at Category:MacOS (and subcategories beginning with macOS)

Example of template affected by this problem:

This also means that the page Long s, on the character ſ, cannot be moved to (or redirected from) ſ, as ſ is a lowercase letter whose uppercase form is S.
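The first-letter behaviour described above can be sketched in a few lines. This is only an illustration using Python's built-in case mapping; `normalize_first_letter` is a hypothetical helper name, and MediaWiki's own implementation may differ in detail.

```python
def normalize_first_letter(title: str) -> str:
    """Uppercase only the first character, leaving the rest untouched."""
    return title[:1].upper() + title[1:]


# A link target and its capitalized form resolve to the same stored name.
print(normalize_first_letter("foo"))   # Foo
print(normalize_first_letter("eBay"))  # EBay
# The long s is a lowercase letter whose uppercase form is S.
print(normalize_first_letter("ſ"))     # S
```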

To fix this problem, you can place the {{lowercase title}} wiki markup at the top of the article, category or template page (and optionally at the top of their talk/discussion page). This will cause the page title to be displayed with the initial letter in lowercase, as at eBay. Note that this does not fix every occurrence: the title still appears capitalized in the search bar's suggestion drop-down list, in search results, on the page history, edit and log pages, and in the browser address bar. The template only affects the page title on the rendered HTML page and in the tab/window title bar.

Forbidden characters

Due to clashes with various elements of the MediaWiki software, some characters (and "characters") are not allowed to be part of page titles (nor are they supported by DISPLAYTITLE).

Clashes with wiki markup/HTML syntax

The following characters are forbidden due to clashes with wiki markup and HTML syntax:

# < > [ ] { } |

For articles about these characters, see number sign, less-than sign, greater-than sign, bracket (covers several characters), and vertical bar, respectively.

If the desired title of an article contains any of these characters, then an alternative title must be used instead. Often, you can simply remove the characters (e.g. MARRS instead of M|A|R|R|S). However, it may be necessary to spell out the character (e.g. C-sharp instead of C#) or use another substitute. Note that the sharp sign ♯ (different from the keyboard # character) can be used, as in C♯ (musical note).
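As a rough sketch, a proposed title can be checked against the forbidden markup characters listed above before deciding on a substitute. The helper name `forbidden_markup_chars` is hypothetical and the check covers only this one class of restriction.

```python
FORBIDDEN_MARKUP_CHARS = set("#<>[]{}|")


def forbidden_markup_chars(title: str) -> list[str]:
    """Return the forbidden markup characters found in a title, sorted."""
    return sorted({ch for ch in title if ch in FORBIDDEN_MARKUP_CHARS})


print(forbidden_markup_chars("C#"))                  # ['#']
print(forbidden_markup_chars("M|A|R|R|S"))           # ['|']
print(forbidden_markup_chars("C♯ (musical note)"))   # []
```

The sharp sign ♯ passes because it is a different character from the keyboard # character.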

In any of these cases, a hatnote should be placed at the top of the article informing readers what the correct title is. This is done using one of the following template calls:

  • {{Correct title|Title|reason=#}} for titles containing #
  • {{Correct title|Title|reason=bracket}} for titles containing < > [ ] { }
  • {{Correct title|Title|reason=vbar}} for titles containing |
    Use {{!}} to represent the | character within the correct title.
  • {{Correct title|Title}} for cases not covered by any one of the above.
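The choice among the template calls above could be automated along the following lines. This is a sketch under the assumption that a reason is supplied only when exactly one of the three listed cases applies; `correct_title_hatnote` is a hypothetical helper, not part of any existing tool, and titles containing braces may need further escaping that is not handled here.

```python
BRACKET_CHARS = set("<>[]{}")


def correct_title_hatnote(title: str) -> str:
    """Build a {{Correct title}} call for a title with forbidden characters."""
    reasons = []
    if "#" in title:
        reasons.append("#")
    if BRACKET_CHARS & set(title):
        reasons.append("bracket")
    if "|" in title:
        reasons.append("vbar")
    # A literal | would be read as a parameter separator, so use {{!}}.
    safe_title = title.replace("|", "{{!}}")
    if len(reasons) == 1:
        return "{{Correct title|" + safe_title + "|reason=" + reasons[0] + "}}"
    # Zero or several matching cases: fall back to the generic form.
    return "{{Correct title|" + safe_title + "}}"


print(correct_title_hatnote("C#"))
print(correct_title_hatnote("M|A|R|R|S"))
```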

Examples:

Clashes with invalid-UTF-8 handling

Titles cannot contain invalid UTF-8 sequences (for our purposes, those that would decode to UTF-16 unpaired surrogates or code points beyond U+10FFFF). Thus, titles like %ED%A0%80 (contains a UTF-8 sequence decoding to code point U+D800, an unpaired surrogate) or %F6%80%80%80 (contains a UTF-8 sequence decoding to code point U+180000, beyond the U+10FFFF limit) are invalid. (These examples use percent-encoded URLs rather than wikilinks, as the "characters" themselves should be impossible to insert into wikitext without percent-encoding.)
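A strict UTF-8 decoder rejects byte sequences of this kind. The sketch below uses Python's standard decoder as a stand-in; it is not MediaWiki's validation code, and `is_valid_utf8` is a hypothetical helper. The input %ED%A0%80 is the three-byte pattern that would encode the surrogate U+D800.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(percent_encoded: str) -> bool:
    """Percent-decode to raw bytes, then attempt a strict UTF-8 decode."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("%ED%A0%80"))     # False: surrogate code point
print(is_valid_utf8("%F6%80%80%80"))  # False: beyond U+10FFFF
print(is_valid_utf8("%C3%A9"))        # True: U+00E9
```

This check only detects invalid byte sequences; it does not flag valid sequences that are forbidden in titles for other reasons.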

This also means that three valid UTF-8 sequences are forbidden in page titles (how these are displayed may vary depending on your browser and installed fonts):

U+FFFD U+FFFE U+FFFF

The first of these characters or "characters", the replacement character, is forbidden because the MediaWiki software uses the replacement character to represent invalid UTF-8 sequences, and cannot differentiate this use as a placeholder from an actual instance of the replacement character. The other two (the two noncharacters at the end of Unicode plane 0, the Basic Multilingual Plane) are forbidden because the MediaWiki software uses the replacement character as a placeholder for these, just as it does for invalid UTF-8 sequences. Note, however, that the other 64 Unicode noncharacters (a block of 32 from U+FDD0 through U+FDEF, plus the two at the end of each of planes 1 through 16 [totaling another 32]) are not forbidden in page titles, as can be seen in the following examples:

Noncharacter encoded at U+FDD0
Noncharacter encoded at U+10FFFE
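The counts in the paragraph above can be verified by enumerating the Unicode noncharacters. This is a minimal sketch; the names `is_noncharacter` and `FORBIDDEN_IN_TITLES` are chosen here for illustration.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and the last two code points of each plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


# The replacement character plus the two noncharacters ending plane 0.
FORBIDDEN_IN_TITLES = {0xFFFD, 0xFFFE, 0xFFFF}

noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
allowed = [cp for cp in noncharacters if cp not in FORBIDDEN_IN_TITLES]

print(len(noncharacters))  # 66 noncharacters in total
print(len(allowed))        # 64 remain usable in page titles
```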

Other problematic characters

Colons

In general, article titles containing colons are fine, subject to the following exceptions:

In the case of aliases a redirect can be created; as an example, "Project: Mersh" will be at Wikipedia:Mersh, which is what it resolves to.

Except in the case of initial colons and the w: and en: prefixes, DISPLAYTITLE will not work in the above situations. Use {{Correct title|Correct title|reason=:}}.

Forward slashes and periods

In namespaces where the subpage feature is enabled, the forward slash (/) separates a subpage name from its main page name. However, subpages are disabled in the main namespace, so article names can contain slashes if appropriate, as in Providence/Stoughton Line – there is no need for such titles to be fixed. Be aware of the following side effects, however:

  • Subpages are still enabled in the talk namespace as they are widely used for archiving old discussions. Therefore, if an article has a forward slash in its name, its corresponding talk page may display an extraneous subpage level-up link at the top (for example, Talk:Providence/Stoughton Line has a link to Talk:Providence at the top).
  • If / is the first character of the title, then links to it from outside the main namespace will not work as expected (they will prepend the title of the current page); a workaround is to prepend a colon, or to use an HTML entity as the beginning of the link, e.g. [[:/pol/]], [[&#47;pol/]] or [[&#x2f;pol/]] to get to /pol/.

Page names consisting of exactly one or two periods (full stops), or beginning with ./ or ../, or containing /./ or /../, or ending with /. or /.., are not allowed. In most such cases DISPLAYTITLE will not work, so {{correct title}} should be used. As a result of this, the abbreviation of Slashdot, /., does not redirect to the page.
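The period rules above translate directly into string checks. The following is a sketch of those rules as stated, with a hypothetical helper name; it does not reproduce MediaWiki's actual title parser.

```python
def has_forbidden_dot_segment(title: str) -> bool:
    """Apply the period (full stop) rules for page names."""
    if title in (".", ".."):
        return True
    if title.startswith(("./", "../")):
        return True
    if "/./" in title or "/../" in title:
        return True
    return title.endswith(("/.", "/.."))


print(has_forbidden_dot_segment("/."))                         # True
print(has_forbidden_dot_segment("Providence/Stoughton Line"))  # False
```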

Percent and encoded characters

A title can normally contain the character %. However, it cannot contain % followed by two hexadecimal digits (which would cause it to be converted to a single character, by percent-encoding). Similarly, a title cannot contain HTML character entities such as &#47; and &ndash;, even if the character they represent is allowed. In the unlikely event of such sequences appearing in a desired title, an alternative title must be constructed (for example by inserting a space after the %, or omitting a semicolon).
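A simple pattern match can flag both kinds of sequence. The regular expressions below are an approximation written for this example: in particular, the entity pattern accepts anything shaped like a named entity, whereas the software may only react to entities it actually recognizes.

```python
import re

# "%" followed by two hexadecimal digits.
PERCENT_ESCAPE = re.compile(r"%[0-9A-Fa-f]{2}")
# Numeric (decimal or hex) and named character entities.
HTML_ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z][A-Za-z0-9]*);")


def has_encoded_sequence(title: str) -> bool:
    """True if the title contains a percent escape or an entity-like string."""
    return bool(PERCENT_ESCAPE.search(title) or HTML_ENTITY.search(title))


print(has_encoded_sequence("100%"))     # False
print(has_encoded_sequence("100%25"))   # True
print(has_encoded_sequence("100% 25"))  # False: space inserted after the %
print(has_encoded_sequence("&ndash;"))  # True
print(has_encoded_sequence("&ndash"))   # False: semicolon omitted
```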

Question marks and plus signs

There is no reason why titles should not include ? or +. However, with such titles, attention is required when typing URLs into the address bar of a browser. Here ? is interpreted as beginning a query string, and a + in a query string is interpreted as a space. In URLs, ? and + should be replaced by their corresponding escape codes, %3F and %2B. (The same technique is necessary for many other special characters, depending on browser.)
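When building such URLs programmatically, a standard URL-quoting function performs the escaping. The sketch below uses Python's `urllib.parse.quote`; the helper name `title_to_url_path` is hypothetical, and the function escapes more characters than just ? and +.

```python
from urllib.parse import quote


def title_to_url_path(title: str) -> str:
    """Convert a page title to an escaped URL path segment."""
    # Underscores stand in for spaces in URLs; quote() escapes ? and +.
    return quote(title.replace(" ", "_"))


print(title_to_url_path("C++"))         # C%2B%2B
print(title_to_url_path("Quo vadis?"))  # Quo_vadis%3F
```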

Spaces and underscores

In links, spaces ( ) and underscores (_) are treated equivalently. Underscores are used in URLs, spaces in displayed titles. Leading and trailing spaces/underscores are stripped, consecutive spaces/underscores are reduced to a single one, and page names consisting of only spaces and underscores are not allowed at all.
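These normalization steps can be sketched as follows. The function names are hypothetical, and the code models only the space and underscore handling described above.

```python
import re


def normalize_spaces(title: str) -> str:
    """Collapse runs of spaces/underscores and strip them from both ends."""
    return re.sub(r"[ _]+", " ", title).strip(" ")


def to_url_form(title: str) -> str:
    """URLs use underscores where displayed titles use spaces."""
    return normalize_spaces(title).replace(" ", "_")


print(normalize_spaces("  Foo__bar_ "))  # Foo bar
print(to_url_form("Foo bar"))            # Foo_bar
print(normalize_spaces("_ (album)"))     # (album)
print(repr(normalize_spaces("_ _")))     # '' (empty, so not a valid name)
```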

Titles affected by this behavior can generally be made to display correctly using the DISPLAYTITLE magic word. However, this does not work for titles consisting of only spaces and underscores, which should use a parenthetical disambiguator e.g. _ (album) is located at (album). Articles with underscores in titles are tracked in Category:Articles with underscores in the title.

Three or more consecutive tildes

Titles cannot contain three or more consecutive tildes (~~~), as four consecutive tildes are used to create standard editors' signatures on talk pages, while three consecutive tildes generate an undated signature. For this reason, ~~~ is located at Tilde Tilde Tilde.

Title length

Titles must be shorter than 256 bytes (that is, at most 255 bytes) when encoded in UTF-8. Therefore, the full titles of The Boy Bands Have Won and When the Pawn... cannot be displayed properly, so they must be located under their common shorthand names. Non-ASCII characters take between 2 and 4 bytes each to encode, so the number of characters that fit in a title may be well below 255.
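Because the limit is measured in bytes rather than characters, the check has to be made on the encoded form. A minimal sketch, with the constant and function names chosen for this example:

```python
MAX_TITLE_BYTES = 255


def fits_title_limit(title: str) -> bool:
    """Check the UTF-8 encoded length, not the character count."""
    return len(title.encode("utf-8")) <= MAX_TITLE_BYTES


print(fits_title_limit("a" * 255))  # True: 255 bytes
print(fits_title_limit("a" * 256))  # False: 256 bytes
print(fits_title_limit("é" * 128))  # False: 128 characters, 256 bytes
print(fits_title_limit("😀" * 63))  # True: 63 characters, 252 bytes
```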

Italics and formatting

It is not possible for a title as stored in the database to contain formatting, such as italics or bolding. The double or triple apostrophes normally used to produce these effects in wiki markup are treated just as groups of apostrophes if they appear in titles. Other wiki markup or HTML-based formatting would require characters that are not permissible in titles (see Forbidden characters above).

It is technically possible to display formatting in titles using DISPLAYTITLE. A template, {{italic title}}, exists to display the title in italics. For guidance on when this technique should be used, see WP:ITALICTITLE.

Pictorial names

Titles cannot contain images (which would require forbidden characters in order to be displayed), only Unicode characters. For example, the recycling symbol is encoded in Unicode as U+2672, so it can be included, but the non-directional beacon symbol is not a Unicode character and cannot appear in a page title.

Browser support limitations

Use precomposed characters when possible.

Use the text normalization "Normalization Form C" (often abbreviated NFC). For more information, see the W3C's Character Model for the World Wide Web and Unicode's normalization forms.
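As an illustration of what NFC does, the sketch below composes a base letter and a combining accent into the single precomposed character, using Python's standard `unicodedata` module.

```python
import unicodedata

decomposed = "e\u0301"  # "e" followed by U+0301 COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))         # 2 code points
print(len(composed))           # 1 code point
print(composed == "\u00e9")    # True: precomposed é (U+00E9)
```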

Restrictions on usernames

Usernames are subject to the same technical restrictions as page titles (see Forbidden characters above). In particular, the symbols # < > [ ] | { } are not allowed. There are also additional restrictions:

  • The username must not already exist in the single unified login system.
  • It may not contain any of the symbols / @ : =.
  • It may not contain various control characters, unusual whitespace, or Private Use Area characters: U+0080–U+009F, U+00A0, U+2000–U+200F, U+2028–U+202F, U+3000, or U+E000–U+F8FF.
  • It may not be an IP address (including IPv6 such as 2606:4700:4700::1111), nor may it look like an IP address (for example, "564.348.992.800" is not a valid IP address, but since it looks like one, it is an invalid username).
  • It may not be one of a list of configured reserved usernames (e.g. "MediaWiki default").
  • It may not have a namespace or interwiki prefix.
  • It may not be more than 85 bytes long.
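Several of the restrictions above are purely syntactic and can be sketched as a local check. The code below is an illustration only, with hypothetical names throughout: it does not consult the unified login system, the reserved-username list, or namespace and interwiki prefixes, and the IPv4 lookalike pattern is an assumption about what "looks like an IP address" means.

```python
import ipaddress
import re

FORBIDDEN_SYMBOLS = set("#<>[]|{}/@:=")
FORBIDDEN_RANGES = [
    (0x0080, 0x009F), (0x00A0, 0x00A0), (0x2000, 0x200F),
    (0x2028, 0x202F), (0x3000, 0x3000), (0xE000, 0xF8FF),
]
# Four dot-separated groups of 1-3 digits, whether or not they are in range.
IPV4_LOOKALIKE = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")
MAX_USERNAME_BYTES = 85


def looks_like_ip(name: str) -> bool:
    """True for real IPv4/IPv6 addresses and for IPv4-shaped strings."""
    if IPV4_LOOKALIKE.fullmatch(name):
        return True
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return False
    return True


def username_problems(name: str) -> list[str]:
    """List the syntactic restrictions that a proposed username violates."""
    problems = []
    if any(ch in FORBIDDEN_SYMBOLS for ch in name):
        problems.append("forbidden symbol")
    if any(lo <= ord(ch) <= hi for ch in name for lo, hi in FORBIDDEN_RANGES):
        problems.append("forbidden control, whitespace, or private-use character")
    if looks_like_ip(name):
        problems.append("looks like an IP address")
    if len(name.encode("utf-8")) > MAX_USERNAME_BYTES:
        problems.append("longer than 85 bytes")
    return problems


print(username_problems("Example user"))     # []
print(username_problems("564.348.992.800"))  # ['looks like an IP address']
```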

Additionally, there are the restrictions tested by the AntiSpoof extension, which includes more blacklisted characters (various '/'-lookalikes and characters from unusual scripts such as Runic, Ugaritic, and so on) and checks against mixed scripts. There are also limitations placed by meta:Title blacklist, both the normal blacklisting rules and those tagged by <newaccountonly>. Among the more notable of these are that accounts containing strings implying advanced permissions (e.g. "admin") or impersonating high-profile users are blocked.

Notes

  1. except on a foreign WP:sister project where it links to the current language Wikipedia. See Help:Interwiki_linking.