String and Text Handling

Char

A char represents a single Unicode character (stored as one UTF-16 code unit) and is an alias for the System.Char struct.

// char literals:
char c = 'A';
char newLine = '\n';

System.Char defines a range of static methods for working with characters:

Console.WriteLine (char.ToUpper ('c'));				// C
Console.WriteLine (char.IsWhiteSpace ('\t'));		// True
Console.WriteLine (char.IsLetter ('x'));			// True
Console.WriteLine (char.GetUnicodeCategory ('x'));	// LowercaseLetter

ToUpper and ToLower honor the end-user’s locale, which can lead to subtle bugs. This applies to both char and string.

Turkey Example:

// To illustrate, let's pretend we live in Turkey:
Thread.CurrentThread.CurrentCulture = CultureInfo.GetCultureInfo ("tr-TR");

// The following expression evaluates to false:
Console.WriteLine (char.ToUpper ('i') == 'I');

// Let's see why:
Console.WriteLine (char.ToUpper ('i'));   // İ

To avoid this problem, System.Char and System.String provide culture-invariant versions of ToUpper and ToLower.

// In contrast, the *Invariant methods always apply the same culture:
Console.WriteLine (char.ToUpperInvariant ('i'));			// I
Console.WriteLine (char.ToUpperInvariant ('i') == 'I');		// True

// ToUpperInvariant is shorthand for passing the invariant culture explicitly:
Console.WriteLine(char.ToUpper('i', CultureInfo.InvariantCulture));	// I
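The same caution applies when case conversion is used only to compare text. The following is a minimal sketch, assuming a hypothetical input value; it shows two ways to test a string case-insensitively without depending on the end-user’s locale.

```csharp
using System;

// Hypothetical user input, used only for this illustration.
string input = "QUIT";

// Option 1: normalize with the invariant culture, then compare.
bool viaInvariant = input.ToLowerInvariant() == "quit";

// Option 2: skip the conversion and ask for a case-insensitive ordinal comparison.
bool viaOrdinal = string.Equals (input, "quit", StringComparison.OrdinalIgnoreCase);

Console.WriteLine (viaInvariant);   // True
Console.WriteLine (viaOrdinal);     // True
```

The second form avoids allocating a converted copy of the string, which is one reason it is often preferred for simple equality checks.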

Static methods for categorizing characters

| Static method | Characters included | Unicode categories |
| --- | --- | --- |
| IsLetter | A–Z, a–z, and letters of other alphabets | UppercaseLetter, LowercaseLetter, TitlecaseLetter, ModifierLetter, OtherLetter |
| IsUpper | Uppercase letters | UppercaseLetter |
| IsLower | Lowercase letters | LowercaseLetter |
| IsDigit | 0–9 plus digits of other alphabets | DecimalDigitNumber |
| IsLetterOrDigit | Letters plus digits | The categories covered by IsLetter and IsDigit |
| IsNumber | All digits plus Unicode fractions and Roman numeral symbols | DecimalDigitNumber, LetterNumber, OtherNumber |
| IsSeparator | Space plus all Unicode separator characters | SpaceSeparator, LineSeparator, ParagraphSeparator |
| IsWhiteSpace | All separators plus \n, \r, \t, \f, and \v | SpaceSeparator, LineSeparator, ParagraphSeparator |
| IsPunctuation | Symbols used for punctuation in Western and other alphabets | ConnectorPunctuation, DashPunctuation, OpenPunctuation, ClosePunctuation, InitialQuotePunctuation, FinalQuotePunctuation, OtherPunctuation |
| IsSymbol | Most other printable symbols | MathSymbol, CurrencySymbol, ModifierSymbol, OtherSymbol |
| IsControl | Nonprintable “control” characters below 0x20, such as \r, \n, \t, \0, and characters between 0x7F and 0x9F | Control |
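As a rough illustration of how these methods combine, the sketch below classifies each character of a hypothetical sample string. The sample text and the counter names are assumptions made for this example.

```csharp
using System;

// Hypothetical sample text: letters, digits, a space, a tab, and brackets.
string sample = "Room 42\t(B)";

int letters = 0, digits = 0, spaces = 0, punctuation = 0;
foreach (char ch in sample)
{
    if (char.IsLetter (ch)) letters++;                // R, o, o, m, B
    else if (char.IsDigit (ch)) digits++;             // 4, 2
    else if (char.IsWhiteSpace (ch)) spaces++;        // ' ' and '\t'
    else if (char.IsPunctuation (ch)) punctuation++;  // ( and )
}

Console.WriteLine ($"{letters} letters, {digits} digits, {spaces} whitespace, {punctuation} punctuation");
// 5 letters, 2 digits, 2 whitespace, 2 punctuation

// GetUnicodeCategory shows which category a character falls into:
Console.WriteLine (char.GetUnicodeCategory ('('));    // OpenPunctuation
```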

String

A C# string (an alias for System.String) is an immutable sequence of characters.

Constructing strings

// String literals:
string s1 = "Hello";
string s2 = "First Line\r\nSecond Line";
string s3 = @"\\server\fileshare\helloworld.cs";

// To create a repeating sequence of characters you can use string’s constructor:
Console.Write (new string ('*', 10));    // **********

// You can also construct a string from a char array. ToCharArray does the reverse:
char[] ca = "Hello".ToCharArray();
string s = new string (ca);              // s = "Hello"
s.Dump();

Null and empty strings

An empty string has a length of zero. To create one, use either the literal "" or the static field string.Empty. To test for an empty string, use either the Length property or an equality comparison.

string empty = "";
Console.WriteLine (empty == "");              // True
Console.WriteLine (empty == string.Empty);    // True
Console.WriteLine (empty.Length == 0);        // True

Because strings are reference types, they can also be null:

string nullString = null;
Console.WriteLine (nullString == null);        // True
Console.WriteLine (nullString == "");          // False
Console.WriteLine (string.IsNullOrEmpty (nullString));	// True
Console.WriteLine (nullString.Length == 0);             // NullReferenceException

The static string.IsNullOrEmpty method is a convenient way to test whether a string is either null or empty.
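The sketch below shows two related ways to guard against null, using hypothetical variable names. string.IsNullOrWhiteSpace and the null-conditional operator (?.) are not covered in the text above; they are included here only as common companions to IsNullOrEmpty.

```csharp
using System;

// Hypothetical values used only for this illustration.
string nothing = null;
string blank = "   ";

bool nullIsEmpty = string.IsNullOrEmpty (nothing);        // True
bool blankIsEmpty = string.IsNullOrEmpty (blank);         // False: it contains spaces
bool blankIsWhite = string.IsNullOrWhiteSpace (blank);    // True

// The null-conditional operator returns null instead of throwing,
// and ?? supplies a fallback value.
int safeLength = nothing?.Length ?? 0;                    // 0

Console.WriteLine (nullIsEmpty);
Console.WriteLine (blankIsEmpty);
Console.WriteLine (blankIsWhite);
Console.WriteLine (safeLength);
```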

Accessing characters within a string

A string’s indexer returns a single character at the given index.

string str  = "abcde";
char letter = str[1];        // letter == 'b'

// string also implements IEnumerable, so you can foreach over its characters:
foreach (char c in "123") Console.Write (c + ",");    // 1,2,3,

Searching within strings

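| Method | Description |
| --- | --- |
| StartsWith | Returns true if the string begins with the given value; overloaded to accept a StringComparison enum or a CultureInfo object |
| EndsWith | Returns true if the string ends with the given value; overloaded to accept a StringComparison enum or a CultureInfo object |
| Contains | Returns true if the given substring occurs anywhere within the string |
| IndexOf | Returns the first position of a given character or substring, or -1 if not found |

// The simplest search methods are Contains, StartsWith, and EndsWith: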
Console.WriteLine ("quick brown fox".Contains ("brown"));    // True
Console.WriteLine ("quick brown fox".EndsWith ("fox"));      // True

// LastIndexOf is like IndexOf, but works backward through the string.
// IndexOfAny returns the first matching position of any one of a set of characters:
Console.WriteLine ("ab,cd ef".IndexOfAny (new char[] {' ', ','} ));       // 2
Console.WriteLine ("pas5w0rd".IndexOfAny ("0123456789".ToCharArray() ));  // 3

// LastIndexOfAny does the same in the reverse direction.

// IndexOf is overloaded to accept a startPosition and a StringComparison enum; the latter enables case-insensitive searches:
Console.WriteLine ("abcde".IndexOf ("CD", StringComparison.CurrentCultureIgnoreCase));    // 2
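LastIndexOf and LastIndexOfAny are mentioned above without an example. The following is a small sketch over a hypothetical sample string; it passes StringComparison.Ordinal explicitly so the results do not depend on the current culture.

```csharp
using System;

// Hypothetical sample text used only for this illustration.
string text = "brown fox, brown dog";

int first = text.IndexOf ("brown", StringComparison.Ordinal);                // 0
int last = text.LastIndexOf ("brown", StringComparison.Ordinal);             // 11
int second = text.IndexOf ("brown", first + 1, StringComparison.Ordinal);    // 11 (search resumes after the first hit)
int missing = text.IndexOf ("cat", StringComparison.Ordinal);                // -1

// Position of the last space or comma in the string:
int lastSeparator = text.LastIndexOfAny (new[] {' ', ','});                  // 16

Console.WriteLine ($"{first} {last} {second} {missing} {lastSeparator}");    // 0 11 11 -1 16
```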

Manipulating Strings

Because a string is immutable, every method that manipulates a string returns a new one and leaves the original untouched.

| Method | Description |
| --- | --- |
| Substring(int startIndex, int length) | Extracts a portion of the string; length is optional |
| Insert(int startIndex, string value) | Inserts a string at the specified index position |
| Remove(int startIndex, int count) | Removes count characters starting at the specified position; count is optional |
| PadLeft(int totalWidth, char paddingChar) | Returns a right-aligned string of the specified length, padded on the left with spaces or with paddingChar if given |
| PadRight(int totalWidth, char paddingChar) | Returns a left-aligned string of the specified length, padded on the right with spaces or with paddingChar if given |
| TrimStart(params char[] trimChars) | Removes the specified characters from the beginning of a string |
| TrimEnd(params char[] trimChars) | Removes the specified characters from the end of a string |
| Trim() | Removes all leading and trailing white-space characters |
| Replace(oldValue, newValue) | Replaces all nonoverlapping occurrences of a particular character or string |
| ToUpper() | Returns an uppercase copy of the string |
| ToLower() | Returns a lowercase copy of the string |
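The methods above can be exercised with a short sketch like the following. The sample string and variable names are hypothetical; note that the original string is never modified.

```csharp
using System;

// Hypothetical sample text used only for this illustration.
string original = "  Hello, World  ";

string trimmed  = original.Trim();                  // "Hello, World"
string tail     = trimmed.Substring (7);            // "World" (length omitted: runs to the end)
string head     = trimmed.Substring (0, 5);         // "Hello"
string inserted = trimmed.Insert (5, " there");     // "Hello there, World"
string removed  = trimmed.Remove (5, 7);            // "Hello"
string replaced = trimmed.Replace ("World", "C#");  // "Hello, C#"
string padded   = "42".PadLeft (5, '0');            // "00042"
string leftAl   = "42".PadRight (5) + "|";          // "42   |"
string noSlash  = "path///".TrimEnd ('/');          // "path"

Console.WriteLine (inserted);
Console.WriteLine (padded);
Console.WriteLine (original.Length);                // 16: the original is unchanged
```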

Splitting and joining strings

| Method | Description |
| --- | --- |
| Split() | Divides a string into substrings at the specified delimiters; overloaded to accept a params array of delimiters and a StringSplitOptions enum. StringSplitOptions.RemoveEmptyEntries is useful when words are separated by several delimiters in a row |
| Join() | Joins a set of strings; it requires a delimiter and a string array |
| Concat() | Joins a set of strings passed as params, with no separator |

// Split takes a sentence and returns an array of words (default delimiters = whitespace):
string[] words = "The quick brown fox".Split();
words.Dump();
  
// The static Join method does the reverse of Split:
string together = string.Join (" ", words);
together.Dump();								// The quick brown fox

// The static Concat method accepts only a params string array and applies no separator.
// This is exactly equivalent to the + operator:
string sentence     = string.Concat ("The", " quick", " brown", " fox");
string sameSentence = "The" + " quick" + " brown" + " fox";

sameSentence.Dump();		// The quick brown fox
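To illustrate the StringSplitOptions overload, here is a minimal sketch with a hypothetical input line that contains two delimiters in a row.

```csharp
using System;

// Hypothetical input with consecutive delimiters (",,").
string line = "one,two,,three;four";
char[] delimiters = {',', ';'};

// Without options, the empty entry between the two commas is kept:
string[] raw = line.Split (delimiters);
Console.WriteLine (raw.Length);                  // 5

// RemoveEmptyEntries drops it:
string[] clean = line.Split (delimiters, StringSplitOptions.RemoveEmptyEntries);
Console.WriteLine (clean.Length);                // 4

// Join and Concat rebuild a single string, with and without a separator:
string joined = string.Join ("|", clean);
string glued  = string.Concat (clean);
Console.WriteLine (joined);                      // one|two|three|four
Console.WriteLine (glued);                       // onetwothreefour
```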

String.Format and composite format strings

String.Format builds strings that embed variables. Each {n} placeholder in the composite format string is replaced by the argument at that position.

// When calling String.Format, provide a composite format string followed by each of the embedded variables
string composite = "It's {0} degrees in {1} on this {2} morning";
string s = string.Format (composite, 35, "Perth", DateTime.Now.DayOfWeek);
s.Dump();

// The minimum width in a format string is useful for aligning columns.
// If the value is negative, the data is left-aligned; otherwise, it’s right-aligned:
composite = "Name={0,-20} Credit Limit={1,15:C}";

Console.WriteLine (string.Format (composite, "Mary", 500));
Console.WriteLine (string.Format (composite, "Elizabeth", 20000));

// The equivalent without using string.Format:
s = "Name=" + "Mary".PadRight (20) + " Credit Limit=" + 500.ToString ("C").PadLeft (15);
s.Dump();

From C# 6, you can use interpolated string literals instead: precede the string with the $ symbol and place expressions directly inside the braces.

string s = $"It's hot this {DateTime.Now.DayOfWeek} morning";
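Interpolated strings accept the same alignment and format specifiers as composite format strings. The sketch below uses hypothetical values and FormattableString.Invariant so that the number formatting does not depend on the machine’s culture; a plain $"..." string would use the current culture instead.

```csharp
using System;
using System.Globalization;

// Hypothetical values used only for this illustration.
string name = "Mary";
decimal limit = 500m;

// {name,-10} left-aligns in 10 columns; {limit,8:F2} right-aligns in 8 with two decimals.
string row = FormattableString.Invariant ($"Name={name,-10}|Limit={limit,8:F2}|");
Console.WriteLine (row);    // Name=Mary      |Limit=  500.00|

// The equivalent call to string.Format with an explicit culture:
string same = string.Format (CultureInfo.InvariantCulture,
                             "Name={0,-10}|Limit={1,8:F2}|", name, limit);
Console.WriteLine (row == same);    // True
```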

Comparing Strings

Equality Comparisons

An equality comparison tests whether two instances are semantically the same.

For string equality, you can use the == operator or one of string’s Equals methods.

Order Comparisons

An order comparison tests which of two instances comes first when arranging them in ascending or descending order.

For string order comparisons, you can use the CompareTo instance method or the static Compare and CompareOrdinal methods.

Ordinal versus culture comparison

Ordinal

Interprets characters simply as numbers, according to their numeric Unicode values.

Culture

Interprets characters with reference to a particular alphabet.

2 Types of Cultures

1. Current culture: based on settings picked up from the computer’s control panel.

2. Invariant culture: the same on every computer, and closely matches American culture.

Both ordinal and culture-sensitive comparisons are useful for equality comparisons.

For ordering, culture-specific comparison is nearly always preferable, because arranging strings alphabetically requires an alphabet.
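The difference between the two algorithms shows up as soon as upper and lower case are mixed. The sketch below sorts a hypothetical word list both ways. The ordinal result is fixed by the Unicode values; the culture-aware result is noted only as the typical outcome, because it depends on culture data being available at run time.

```csharp
using System;

// Hypothetical word list used only for this illustration.
string[] words = { "cherry", "apple", "Banana" };

// Ordinal order follows numeric Unicode values, so 'B' (66) sorts before 'a' (97).
string[] ordinal = (string[]) words.Clone();
Array.Sort (ordinal, StringComparer.Ordinal);
Console.WriteLine (string.Join (", ", ordinal));    // Banana, apple, cherry

// Culture-aware order follows an alphabet. With culture data available
// this typically prints: apple, Banana, cherry
string[] cultural = (string[]) words.Clone();
Array.Sort (cultural, StringComparer.InvariantCulture);
Console.WriteLine (string.Join (", ", cultural));

// CompareOrdinal reports the same ordinal relationship for a single pair:
bool appleAfterBanana = string.CompareOrdinal ("apple", "Banana") > 0;
Console.WriteLine (appleAfterBanana);               // True
```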

String equality comparison

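string’s == operator performs an ordinal, case-sensitive comparison, as does string.Equals when it is called without a StringComparison argument. The overloads of Equals that accept a StringComparison value let you choose the algorithm: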

public enum StringComparison
{
	CurrentCulture,
	CurrentCultureIgnoreCase,
	InvariantCulture,
	InvariantCultureIgnoreCase,
	Ordinal,
	OrdinalIgnoreCase
}

String-order comparison

String’s CompareTo instance method performs a culture-sensitive, case-sensitive order comparison.

// String comparisons can be ordinal vs culture-sensitive; case-sensitive vs case-insensitive.

Console.WriteLine (string.Equals ("foo", "FOO", StringComparison.OrdinalIgnoreCase));   // True

// (The following symbols may not be displayed correctly, depending on your font):
Console.WriteLine ("ṻ" == "ǖ");   // False

// The order comparison methods return a positive number, a negative number, or zero, depending
// on whether the first value comes after, before, or alongside the second value:
Console.WriteLine ("Boston".CompareTo ("Austin"));    // 1
Console.WriteLine ("Boston".CompareTo ("Boston"));    // 0
Console.WriteLine ("Boston".CompareTo ("Chicago"));   // -1
Console.WriteLine ("ṻ".CompareTo ("ǖ"));              // 0
Console.WriteLine ("foo".CompareTo ("FOO"));          // -1

// The following performs a case-insensitive comparison using the current culture:
Console.WriteLine (string.Compare ("foo", "FOO", true));   // 0

// By supplying a CultureInfo object, you can plug in any alphabet:
CultureInfo german = CultureInfo.GetCultureInfo ("de-DE");
int i = string.Compare ("Müller", "Muller", false, german);
i.Dump();	// 1
