Talk:Numeric character reference
This article has not yet been rated on Wikipedia's content assessment scale.
Numeric character converter
Perl
For common needs, here is a one-line converter in Perl:
binmode(STDIN, ':encoding(UTF-8)');  # decode input; use ISO-8859-1 for Latin-1 files
while (<STDIN>) { s/([^\x00-\x7F])/'&#'.ord($1).';'/ge; print; }
(Usage: perl code.pl < fileIn.txt > fileOut.txt)
It converts Unicode or ISO Latin text to XML-compatible ASCII.
JavaScript
function unicode_to_ncr(text) {
  var ncr_text = "";
  // Iterate by code point so characters outside the BMP are not split into surrogates.
  for (var character of text) {
    var code_point = character.codePointAt(0);
    if (code_point < 128) {
      ncr_text += character;
    } else {
      ncr_text += "&#" + code_point + ";";
    }
  }
  return ncr_text;
}
It also converts Unicode or ISO Latin text to XML-compatible ASCII.
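As a rough sketch of the same idea, the conversion could also offer the hexadecimal form (&#x...;) that HTML and XML accept alongside the decimal one. The function name toNcr and its useHex parameter are made up for this illustration and are not taken from any library.

```javascript
// Hypothetical helper (names are assumptions for this sketch):
// replace every non-ASCII character with a numeric character reference.
function toNcr(text, useHex) {
  let out = "";
  // for...of walks the string by Unicode code point, not by UTF-16 code unit,
  // so a character outside the BMP yields one reference instead of two surrogates.
  for (const ch of text) {
    const codePoint = ch.codePointAt(0);
    if (codePoint < 128) {
      out += ch; // ASCII passes through unchanged
    } else if (useHex) {
      out += "&#x" + codePoint.toString(16).toUpperCase() + ";";
    } else {
      out += "&#" + codePoint + ";";
    }
  }
  return out;
}

console.log(toNcr("caf\u00E9"));       // decimal form
console.log(toNcr("caf\u00E9", true)); // hexadecimal form
```

Note that ASCII characters with special meaning in XML, such as & and <, are passed through untouched here, exactly as in the snippets above; escaping them would be a separate step.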
Terminology?
The nomenclature used in this article is not the same as the basic SGML one. SGML defines two distinct terms: "character reference", which is the numeric character reference described here, and "entity reference", which is a macro resolving to any sequence of characters.
The list of entity references used in HTML all resolve to exactly one character. But that doesn't make them special cases, as the phrase character entity reference implies; they just all happen to be one-character strings. Pim 2 (talk) 11:26, 11 December 2011 (UTC)
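To make the distinction concrete, here is a minimal sketch that resolves the two kinds of reference separately. The function resolveReference and the entity table are assumptions made for this illustration: "eacute" and "amp" are standard HTML entities, while "corp" is a hypothetical document-defined entity with a multi-character value, and the name syntax is simplified compared with what SGML actually permits.

```javascript
// Illustrative entity table: "eacute" and "amp" are standard HTML entities;
// "corp" is a hypothetical entity showing that a value may be any string.
const entities = { eacute: "\u00E9", amp: "&", corp: "Example Corporation" };

function resolveReference(ref) {
  // Character reference: a decimal (&#233;) or hexadecimal (&#xE9;) code point.
  let m = /^&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));$/.exec(ref);
  if (m) {
    const codePoint = m[1] !== undefined ? parseInt(m[1], 10) : parseInt(m[2], 16);
    return { kind: "character reference", text: String.fromCodePoint(codePoint) };
  }
  // Entity reference: a name looked up in the entity table (simplified name syntax).
  m = /^&([A-Za-z][A-Za-z0-9]*);$/.exec(ref);
  if (m && Object.prototype.hasOwnProperty.call(entities, m[1])) {
    return { kind: "entity reference", text: entities[m[1]] };
  }
  return { kind: "unresolved", text: ref };
}

console.log(resolveReference("&#233;"));
console.log(resolveReference("&eacute;"));
console.log(resolveReference("&corp;"));
```

In this sketch the character reference needs no table at all, whereas the entity reference is only meaningful relative to whatever declarations are in scope, which is the point made above.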