String and Text Handling
Char
The char type represents a single Unicode character (a UTF-16 code unit) and aliases the System.Char struct.

    // char literals:
    char c = 'A';
    char newLine = '\n';

System.Char defines a range of static methods for working with characters:

    Console.WriteLine (char.ToUpper ('c'));              // C
    Console.WriteLine (char.IsWhiteSpace ('\t'));        // True
    Console.WriteLine (char.IsLetter ('x'));             // True
    Console.WriteLine (char.GetUnicodeCategory ('x'));   // LowercaseLetter

ToUpper and ToLower honor the end-user’s locale, which can lead to subtle bugs. This applies to both char and string.
Turkey example:

    // To illustrate, let's pretend we live in Turkey:
    Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo ("tr-TR");

    // The following expression evaluates to false:
    Console.WriteLine (char.ToUpper ('i') == 'I');

    // Let's see why:
    Console.WriteLine (char.ToUpper ('i'));   // İ

To avoid this problem, System.Char and System.String provide culture-invariant versions of ToUpper and ToLower.

    // In contrast, the *Invariant methods always apply the same culture:
    Console.WriteLine (char.ToUpperInvariant ('i'));          // I
    Console.WriteLine (char.ToUpperInvariant ('i') == 'I');   // True

    // ToUpperInvariant is shorthand for:
    Console.WriteLine (char.ToUpper ('i', CultureInfo.InvariantCulture));   // I

Static methods for categorizing characters
Static Method | Characters Included | Unicode Categories |
---|---|---|
IsLetter | A–Z, a–z, and letters of other alphabets | UppercaseLetter LowercaseLetter TitlecaseLetter ModifierLetter OtherLetter |
IsUpper | Uppercase letters | UppercaseLetter |
IsLower | Lowercase letters | LowercaseLetter |
IsDigit | 0–9 plus digits of other alphabets | DecimalDigitNumber |
IsLetterOrDigit | Letters plus digits | (IsLetter, IsDigit) |
IsNumber | All digits plus Unicode fractions and Roman numeral symbols | DecimalDigitNumber LetterNumber OtherNumber |
IsSeparator | Space plus all Unicode separator characters | SpaceSeparator LineSeparator ParagraphSeparator |
IsWhiteSpace | All separators plus \n, \r, \t, \f, and \v | SpaceSeparator LineSeparator ParagraphSeparator |
IsPunctuation | Symbols used for punctuation in Western and other alphabets | DashPunctuation ConnectorPunctuation OpenPunctuation ClosePunctuation InitialQuotePunctuation FinalQuotePunctuation OtherPunctuation |
IsSymbol | Most other printable symbols | MathSymbol CurrencySymbol ModifierSymbol OtherSymbol |
IsControl | Nonprintable “control” characters below 0x20, such as \r, \n, \t, \0, and characters between 0x7F and 0x9F | Control |
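
Some of the distinctions in the table are easy to miss, in particular IsDigit versus IsNumber and IsSeparator versus IsWhiteSpace. The following is a minimal sketch rather than part of the original notes; the class name CharCategoryDemo is hypothetical, and the fraction character is written as an escape to avoid font or encoding problems.

```csharp
using System;

static class CharCategoryDemo
{
    static void Main()
    {
        char half = '\u00BD';   // the "one half" fraction character

        // IsDigit is narrower than IsNumber:
        Console.WriteLine (char.IsDigit ('5'));               // True
        Console.WriteLine (char.IsDigit (half));              // False
        Console.WriteLine (char.IsNumber (half));             // True
        Console.WriteLine (char.GetUnicodeCategory (half));   // OtherNumber

        // A newline is whitespace and a control character, but not a separator:
        Console.WriteLine (char.IsWhiteSpace ('\n'));         // True
        Console.WriteLine (char.IsControl ('\n'));            // True
        Console.WriteLine (char.IsSeparator ('\n'));          // False
        Console.WriteLine (char.IsSeparator (' '));           // True

        // Punctuation and symbols are separate groups:
        Console.WriteLine (char.IsPunctuation ('!'));         // True
        Console.WriteLine (char.IsSymbol ('+'));              // True
        Console.WriteLine (char.IsPunctuation ('+'));         // False
    }
}
```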
String
A C# string (an alias for System.String) is an immutable sequence of Unicode characters.
Constructing strings

    // String literals:
    string s1 = "Hello";
    string s2 = "First Line\r\nSecond Line";
    string s3 = @"\\server\fileshare\helloworld.cs";

    // To create a repeating sequence of characters, you can use string's constructor:
    Console.Write (new string ('*', 10));   // **********

    // You can also construct a string from a char array. ToCharArray does the reverse:
    char[] ca = "Hello".ToCharArray();
    string s = new string (ca);             // s = "Hello"
    s.Dump();                               // Dump is a LINQPad extension method that displays the value

Null and empty strings
An empty string has a length of zero. To create one, use either the literal "" or the static string.Empty field. To test for an empty string, use either an equality comparison or the Length property.

    string empty = "";
    Console.WriteLine (empty == "");             // True
    Console.WriteLine (empty == string.Empty);   // True
    Console.WriteLine (empty.Length == 0);       // True

Because strings are reference types, they can also be null:

    string nullString = null;
    Console.WriteLine (nullString == null);                   // True
    Console.WriteLine (nullString == "");                     // False
    Console.WriteLine (string.IsNullOrEmpty (nullString));    // True
    Console.WriteLine (nullString.Length == 0);               // NullReferenceException

The static string.IsNullOrEmpty method is a useful shortcut for testing whether a string is either null or empty.
Accessing characters within a string
A string’s indexer returns a single character at the given index.

    string str = "abcde";
    char letter = str[1];   // letter == 'b'

    // string also implements IEnumerable<char>, so you can foreach over its characters:
    foreach (char c in "123") Console.Write (c + ",");   // 1,2,3,

Searching within strings
Method | Description |
---|---|
StartsWith | Returns true if the string begins with the given value. Overloads accept a StringComparison enum or a CultureInfo object to control case and culture sensitivity. |
EndsWith | Returns true if the string ends with the given value. Overloads accept a StringComparison enum or a CultureInfo object to control case and culture sensitivity. |
Contains | Returns true if the given substring occurs anywhere within the string. |
IndexOf | Returns the position of the first occurrence of a given character or substring, or -1 if not found. |

    // The simplest search methods are Contains, StartsWith, and EndsWith:
    Console.WriteLine ("quick brown fox".Contains ("brown"));   // True
    Console.WriteLine ("quick brown fox".EndsWith ("fox"));     // True

    // LastIndexOf is like IndexOf, but works backward through the string.
    // IndexOfAny returns the first matching position of any one of a set of characters:
    Console.WriteLine ("ab,cd ef".IndexOfAny (new char[] {' ', ','} ));        // 2
    Console.WriteLine ("pas5w0rd".IndexOfAny ("0123456789".ToCharArray() ));   // 3

    // LastIndexOfAny does the same in the reverse direction.
    // IndexOf is overloaded to accept a startPosition and a StringComparison enum;
    // the latter enables case-insensitive searches:
    Console.WriteLine ("abcde".IndexOf ("CD", StringComparison.CurrentCultureIgnoreCase));   // 2

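
To round out the search methods, the sketch below shows the -1 result for a failed search, the startPosition overload of IndexOf, and a case-insensitive StartsWith. It is an illustrative example only; the class name SearchDemo is hypothetical.

```csharp
using System;

static class SearchDemo
{
    static void Main()
    {
        // IndexOf returns -1 when nothing is found:
        Console.WriteLine ("abcde".IndexOf ('z'));          // -1

        // A startPosition skips earlier matches; LastIndexOf searches from the end:
        Console.WriteLine ("abcabc".IndexOf ('b'));         // 1
        Console.WriteLine ("abcabc".IndexOf ('b', 2));      // 4
        Console.WriteLine ("abcabc".LastIndexOf ('b'));     // 4

        // StartsWith is case-sensitive unless a StringComparison says otherwise:
        Console.WriteLine ("quick brown fox".StartsWith ("QUICK"));   // False
        Console.WriteLine ("quick brown fox".StartsWith ("QUICK",
            StringComparison.OrdinalIgnoreCase));                     // True
    }
}
```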
Manipulating Strings
Because strings are immutable, every method that manipulates a string returns a new string and leaves the original untouched.
Method | Description |
---|---|
Substring(int startIndex, int length) | Extracts a portion of the string; length is optional. |
Insert(int startIndex, string value) | Inserts a string at the specified index position. |
Remove(int startIndex, int count) | Removes count characters starting at the specified position; count is optional. |
PadLeft(int totalWidth, char paddingChar) | Returns a right-aligned string of the specified total width, padded on the left with paddingChar (a space if omitted). |
PadRight(int totalWidth, char paddingChar) | Returns a left-aligned string of the specified total width, padded on the right with paddingChar (a space if omitted). |
TrimStart(params char[] trimChars) | Removes the specified characters from the beginning of a string. |
TrimEnd(params char[] trimChars) | Removes the specified characters from the end of a string. |
Trim() | Removes all leading and trailing whitespace characters. |
Replace(oldValue, newValue) | Replaces all nonoverlapping occurrences of a particular character or string. |
ToUpper() | Returns an uppercase version of the string. |
ToLower() | Returns a lowercase version of the string. |
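
The methods in the table can be tried out with a short program such as the one below. It is a sketch for illustration, with a hypothetical class name (StringManipulationDemo), and it also shows that the original string is left unchanged.

```csharp
using System;

static class StringManipulationDemo
{
    static void Main()
    {
        string original = "helloworld";

        // Insert and Remove both return new strings:
        string inserted = original.Insert (5, ", ");
        string removed  = inserted.Remove (5, 2);
        Console.WriteLine (inserted);                            // hello, world
        Console.WriteLine (removed);                             // helloworld
        Console.WriteLine (original);                            // helloworld (unchanged)

        // Substring with and without a length:
        Console.WriteLine ("abcde".Substring (1, 2));            // bc
        Console.WriteLine ("abcde".Substring (3));               // de

        // Padding to a total width of 9 characters:
        Console.WriteLine ("12345".PadLeft (9, '*'));            // ****12345
        Console.WriteLine ("12345".PadRight (9, '*'));           // 12345****

        // Trimming whitespace or specific characters:
        Console.WriteLine ("  abc \t\r\n ".Trim().Length);       // 3
        Console.WriteLine ("--abc--".TrimStart ('-'));           // abc--

        // Replace affects every occurrence:
        Console.WriteLine ("to be done".Replace (" ", " | "));   // to | be | done
    }
}
```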
Splitting and joining strings
Method | Description |
---|---|
Split() | Divides the string at the specified delimiters. It is overloaded to accept a params array of delimiters and, optionally, a StringSplitOptions enum. StringSplitOptions has an option to remove empty entries, which is useful when words are separated by several delimiters in a row. |
Join() | Joins a set of strings; it requires a delimiter and a string array. |
Concat() | Joins a set of strings; it accepts a params string array and applies no separator. |

    // Split takes a sentence and returns an array of words (default delimiters = whitespace):
    string[] words = "The quick brown fox".Split();
    words.Dump();

    // The static Join method does the reverse of Split:
    string together = string.Join (" ", words);
    together.Dump();   // The quick brown fox

    // The static Concat method accepts only a params string array and applies no separator.
    // This is exactly equivalent to the + operator:
    string sentence     = string.Concat ("The", " quick", " brown", " fox");
    string sameSentence = "The" + " quick" + " brown" + " fox";
    sameSentence.Dump();   // The quick brown fox

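
The effect of StringSplitOptions.RemoveEmptyEntries is easiest to see when delimiters appear several times in a row. The following sketch (hypothetical class name SplitOptionsDemo, made-up input strings) compares the two behaviours:

```csharp
using System;

static class SplitOptionsDemo
{
    static void Main()
    {
        string csv = "a,,b,,,c";

        // By default, consecutive delimiters produce empty entries:
        string[] all = csv.Split (',');
        Console.WriteLine (all.Length);                    // 6

        // RemoveEmptyEntries discards them:
        string[] nonEmpty = csv.Split (new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine (nonEmpty.Length);               // 3
        Console.WriteLine (string.Join ("|", nonEmpty));   // a|b|c

        // Several delimiter characters can be supplied at once:
        string[] words = "The quick,  brown fox".Split (
            new[] { ' ', ',' }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine (string.Join ("-", words));      // The-quick-brown-fox
    }
}
```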
String.Format and composite format strings
String.Format provides a convenient way to build strings that embed variables.
When calling String.Format, you provide a composite format string followed by each of the embedded variables.

    string composite = "It's {0} degrees in {1} on this {2} morning";
    string s = string.Format (composite, 35, "Perth", DateTime.Now.DayOfWeek);
    s.Dump();

    // The minimum width in a format string is useful for aligning columns.
    // If the value is negative, the data is left-aligned; otherwise, it's right-aligned:
    composite = "Name={0,-20} Credit Limit={1,15:C}";
    Console.WriteLine (string.Format (composite, "Mary", 500));
    Console.WriteLine (string.Format (composite, "Elizabeth", 20000));

    // The equivalent without using string.Format:
    s = "Name=" + "Mary".PadRight (20) + " Credit Limit=" + 500.ToString ("C").PadLeft (15);
    s.Dump();

From C# 6, you can use interpolated string literals to the same effect; just precede the string with the $ symbol.

    string s = $"It's hot this {DateTime.Now.DayOfWeek} morning";

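
Interpolated strings accept the same alignment and format specifiers as composite format strings. The sketch below, with a hypothetical class name (InterpolationDemo) and made-up values, builds the same text both ways; it uses the hexadecimal X4 specifier so that the output does not depend on the current culture.

```csharp
using System;

static class InterpolationDemo
{
    static void Main()
    {
        string name = "Mary";
        int value = 255;

        // Composite format string with a width of -8 (left-aligned) and a hex format:
        string viaFormat = string.Format ("Name={0,-8}|Hex={1:X4}", name, value);

        // The same thing as an interpolated string:
        string viaInterpolation = $"Name={name,-8}|Hex={value:X4}";

        Console.WriteLine (viaInterpolation);                // Name=Mary    |Hex=00FF
        Console.WriteLine (viaFormat == viaInterpolation);   // True
    }
}
```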
Comparing Strings
Equality Comparisons
An equality comparison tests whether two instances are semantically the same.
For string equality, you can use the == operator or one of string's Equals methods.
Order Comparisons
An order comparison tests which of two instances comes first when arranging them in ascending or descending order.
For string order comparisons, you can use the CompareTo instance method or the static Compare and CompareOrdinal methods.
Ordinal versus culture comparison
Ordinal
Interprets characters simply as numbers, according to their numeric Unicode value.
Culture
Interprets characters with reference to a particular alphabet.
Two special cultures
1. Current culture: settings picked up from the computer's control panel.
2. Invariant culture: the same on every computer, and closely matches American culture.
Both ordinal and culture-sensitive comparisons are useful for equality comparisons.
For ordering, however, culture-specific comparison is nearly always preferable: to sort strings alphabetically, you need an alphabet.
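
The difference between ordinal and culture-sensitive ordering shows up as soon as a list mixes uppercase and lowercase. The following is a minimal sketch, assuming the invariant culture as the alphabet; the class name OrdinalVersusCultureDemo and the sample words are illustrative only.

```csharp
using System;

static class OrdinalVersusCultureDemo
{
    static void Main()
    {
        string[] ordinal = { "atom", "Zamia", "Atom" };
        string[] invariant = (string[]) ordinal.Clone();

        // Ordinal ordering compares raw character codes: 'A' (65) < 'Z' (90) < 'a' (97).
        Array.Sort (ordinal, StringComparer.Ordinal);
        Console.WriteLine (string.Join (", ", ordinal));     // Atom, Zamia, atom

        // Culture ordering follows the alphabet, keeping "atom" and "Atom" together.
        Array.Sort (invariant, StringComparer.InvariantCulture);
        Console.WriteLine (string.Join (", ", invariant));   // atom, Atom, Zamia
    }
}
```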
String equality comparison
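The == operator always performs an ordinal, case-sensitive comparison, as does string.Equals when it is called without a StringComparison argument. Overloads of Equals that accept the following enum let you choose the culture and case sensitivity:

    public enum StringComparison
    {
        CurrentCulture,
        CurrentCultureIgnoreCase,
        InvariantCulture,
        InvariantCultureIgnoreCase,
        Ordinal,
        OrdinalIgnoreCase
    }
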
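
As a brief illustration of how == and the Equals overloads behave, consider the sketch below. The class name EqualityDemo is hypothetical; the calls themselves are the standard string.Equals overloads.

```csharp
using System;

static class EqualityDemo
{
    static void Main()
    {
        string a = "foo", b = "FOO";

        // Both of these are ordinal and case-sensitive:
        Console.WriteLine (a == b);                                             // False
        Console.WriteLine (a.Equals (b));                                       // False

        // Passing a StringComparison value changes the rules:
        Console.WriteLine (a.Equals (b, StringComparison.OrdinalIgnoreCase));   // True
        Console.WriteLine (string.Equals (a, b,
            StringComparison.InvariantCultureIgnoreCase));                      // True

        // The static version also copes with null operands:
        string nothing = null;
        Console.WriteLine (string.Equals (nothing, a, StringComparison.Ordinal));   // False
    }
}
```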
String-order comparison
String's CompareTo instance method performs a culture-sensitive, case-sensitive order comparison.

    // String comparisons can be ordinal vs culture-sensitive; case-sensitive vs case-insensitive.
    Console.WriteLine (string.Equals ("foo", "FOO", StringComparison.OrdinalIgnoreCase));   // True

    // (The following symbols may not be displayed correctly, depending on your font):
    Console.WriteLine ("ṻ" == "ǖ");   // False

    // The order comparison methods return a positive number, a negative number, or zero, depending
    // on whether the first value comes after, before, or alongside the second value:
    Console.WriteLine ("Boston".CompareTo ("Austin"));    // 1
    Console.WriteLine ("Boston".CompareTo ("Boston"));    // 0
    Console.WriteLine ("Boston".CompareTo ("Chicago"));   // -1
    Console.WriteLine ("ṻ".CompareTo ("ǖ"));              // 0
    Console.WriteLine ("foo".CompareTo ("FOO"));          // -1

    // The following performs a case-insensitive comparison using the current culture:
    Console.WriteLine (string.Compare ("foo", "FOO", true));   // 0

    // By supplying a CultureInfo object, you can plug in any alphabet:
    CultureInfo german = CultureInfo.GetCultureInfo ("de-DE");
    int i = string.Compare ("Müller", "Muller", false, german);
    i.Dump();   // 1
