Characters are Unicode scalar values.
Constructor
Char.mk
Fields
val : UInt32
The underlying Unicode scalar value as a UInt32.
valid : self.val.isValidChar
The value must be a legal scalar value.
Characters are represented by the type Char, which may be any Unicode scalar value.
While strings are UTF-8-encoded arrays of bytes, characters are represented by full 32-bit values.
Lean provides special syntax for character literals.
From the perspective of Lean's logic, characters consist of a 32-bit unsigned integer paired with a proof that it is a valid Unicode scalar value.
Char : TypeChar : Type
Characters are Unicode scalar values.
Char.mk
val : UInt32
The underlying Unicode scalar value as a UInt32.
valid : self.val.isValidChar
The value must be a legal scalar value.
As a trivial wrapper, characters are represented identically to UInt32.
In particular, characters are represented as 32-bit immediate values in monomorphic contexts.
In other words, a field of a constructor or structure of type Char does not require indirection to access.
In polymorphic contexts, characters are boxed.
Character literals consist of a single character or an escape sequence enclosed in single quotes (', Unicode 'APOSTROPHE' (U+0027)).
Between these single quotes, the character literal may contain character other that ', including newlines, which are included literally (with the caveat that all newlines in a Lean source file are interpreted as '\n', regardless of file encoding and platform).
Special characters may be escaped with a backslash, so '\'' is a character literal that contains a single quote.
The following forms of escape sequences are accepted:
\r, \n, \t, \\, \", \'
These escape sequences have the usual meaning, mapping to CR, LF, tab, backslash, double quote, and single quote, respectively.
\xNN
When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the two-digit hexadecimal code.
\uNNNN
When NN is a sequence of two hexadecimal digits, this escape denotes the character whose Unicode code point is indicated by the four-digit hexadecimal code.
Converts an 8-bit unsigned integer into a character.
The integer's value is interpreted as a Unicode code point.
Converts a character into a UInt8 that contains its code point.
If the code point is larger than 255, it is truncated (reduced modulo 256).
There are two ways to convert a character to a string.
Char.toString converts a character to a singleton string that consists of only that character, while Char.quote converts the character to a string representation of the corresponding character literal.
Char.toString produces a string that contains only the character in question:
#eval 'e'.toString
"e"
#eval '\x65'.toString
"e"
#eval '"'.toString
"\""
Char.quote produces a string that contains a character literal, suitably escaped:
#eval 'e'.quote
"'e'"
#eval '\x65'.quote
"'e'"
#eval '"'.quote
"'\\\"'"
Returns true if the character is an ASCII letter.
The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
Returns true if the character is an ASCII letter or digit.
The ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.
The ASCII digits are the following: 0123456789.
Returns true if the character is an ASCII digit.
The ASCII digits are the following: 0123456789.
Returns true if the character is a lowercase ASCII letter.
The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.
Returns true if the character is a uppercase ASCII letter.
The uppercase ASCII letters are the following: ABCDEFGHIJKLMNOPQRSTUVWXYZ.
Converts a lowercase ASCII letter to the corresponding uppercase letter. Letters outside the ASCII alphabet are returned unchanged.
The lowercase ASCII letters are the following: abcdefghijklmnopqrstuvwxyz.
One character is less than or equal to another if its code point is less than or equal to the other's.