Welcome to a tutorial on Elixir. Here you will learn about Strings in Elixir.
In Elixir, strings are written between double quotes, and they are encoded in UTF-8. This is quite different from C and C++, where the default strings are made of single-byte characters, so only 256 different characters are possible. UTF-8, in contrast, can encode 1,112,064 code points, which means that many different characters are possible. As a result, we can use symbols such as ö, ł, etc. in our strings.
Let’s create a string variable, by simply assigning a string to a variable, as shown below.
str = "Hello world"
To print this to your console, just call the IO.puts function and pass it the variable str:
str = "Hello world"
IO.puts(str)
And the output is:
Hello world
We can create an empty string by using the string literal "". Check the example below.
a = ""
if String.length(a) === 0 do
  IO.puts("a is an empty string")
end
The output is:
a is an empty string
String interpolation is a simple way of constructing a new string from a mix of constants, variables, literals, and expressions by including their values inside a string literal. Elixir supports string interpolation: to use a variable in a string, wrap it in curly braces and put a '#' sign in front of the braces. This is shown below.
x = "Justin"
y = "My Name is #{x}"
IO.puts(y)
The above code takes the value of x and substitutes it into y. The output will look like this:
My Name is Justin
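Interpolation is not limited to plain variables. The sketch below shows an expression and a function call inside the braces; the variable names and values are made up for illustration only.

```elixir
# Hypothetical values, chosen only for this example.
name = "Justin"
age = 30

# Any expression can go inside #{}; its result is converted to a string.
sentence = "#{name} will be #{age + 1} next year"
IO.puts(sentence)
# prints: Justin will be 31 next year

# Function calls work inside the braces too.
shout = "Upper case: #{String.upcase(name)}"
IO.puts(shout)
# prints: Upper case: JUSTIN
```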
You already learned about string concatenation in the previous tutorial, where the '<>' operator is used to concatenate strings in Elixir. Check out the example below on how to concatenate 2 strings:
x = "Justin"
y = "Drake"
z = x <> " " <> y
IO.puts(z)
The output is:
Justin Drake
We can use the String.length function to obtain the length of a string, by simply passing the string as a parameter. It returns the number of characters (graphemes) in the string. Check out the example below.
IO.puts(String.length("Hello"))
The output is:
5
We can reverse a string by simply passing it to the String.reverse function. This is shown below.
IO.puts(String.reverse("Elixir"))
The output is:
rixilE
Now to compare 2 strings, the == or the === operators can be used. This is shown below:
var_1 = "Hello world"
var_2 = "Hello Elixir"
if var_1 === var_2 do
  IO.puts("#{var_1} and #{var_2} are the same")
else
  IO.puts("#{var_1} and #{var_2} are not the same")
end
The output is:
Hello world and Hello Elixir are not the same
You already know the =~ string match operator. To check whether a string matches a regex, we can use either the string match operator or the String.match? function. Check out the example below.
IO.puts(String.match?("foo", ~r/foo/))
IO.puts(String.match?("bar", ~r/foo/))
The output is:
true
false
Also, this can be achieved by using the =~ operator, as shown below.
IO.puts("foo" =~ ~r/foo/)
The output is:
true
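As a small addition, the right-hand side of =~ can also be a plain string, in which case the operator checks whether the left-hand string contains it. The inputs below are made up for illustration.

```elixir
# With a string on the right, =~ checks for a substring.
contains_foo = "foobar" =~ "foo"
IO.puts(contains_foo)
# prints: true

# With a regex on the right, the usual regex rules apply.
# ^ anchors the match at the start, so this one fails.
starts_with_bar = "foobar" =~ ~r/^bar/
IO.puts(starts_with_bar)
# prints: false
```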
Elixir supports a large number of functions related to strings, all of which live in the String module. The table below shows a few of the most used functions and their purpose.
Sr.No. | Function and its Purpose |
1 | at(string, position): This returns the grapheme at the given position (counting from 0) of the given UTF-8 string. If position is greater than or equal to the string length, then it returns nil |
2 | capitalize(string): This converts the first character in the given string to uppercase and the remainder to lowercase |
3 | contains?(string, contents): This checks if a string contains any of the given contents |
4 | downcase(string): This converts all characters in the given string to lowercase |
5 | ends_with?(string, suffixes): This returns true if a string ends with any of the suffixes given |
6 | first(string): This returns the first grapheme from a UTF-8 string, nil if the string is empty |
7 | last(string): This returns the last grapheme from a utf8 string, nil if the string is empty |
8 | replace(subject, pattern, replacement, options \\ []): This returns a new string created by replacing occurrences of pattern in subject with replacement |
9 | slice(string, start, len): This returns a substring starting at the offset start and of length len |
10 | split(string): This divides a string into substrings at each Unicode whitespace occurrence with leading and trailing whitespace ignored. The groups of whitespace are treated as a single occurrence. However, divisions do not occur on non-breaking whitespace |
11 | upcase(string): This converts all characters in the given string to uppercase |
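To see a few of these functions in action, here is a short sketch. The sample string is made up for illustration; each call uses the String module, and IO.inspect is used so that nil and lists are displayed clearly.

```elixir
# Hypothetical sample string for the examples below.
s = "hello world"

IO.inspect(String.at(s, 0))                       # "h"
IO.inspect(String.at(s, 50))                      # nil (position is past the end)
IO.inspect(String.capitalize(s))                  # "Hello world"
IO.inspect(String.contains?(s, "wor"))            # true
IO.inspect(String.ends_with?(s, ["ld", "xyz"]))   # true (one suffix matches)
IO.inspect(String.first(s))                       # "h"
IO.inspect(String.last(s))                        # "d"
IO.inspect(String.replace(s, "l", "L"))           # "heLLo worLd"
IO.inspect(String.slice(s, 0, 5))                 # "hello"
IO.inspect(String.split("  foo   bar "))          # ["foo", "bar"]
IO.inspect(String.upcase(s))                      # "HELLO WORLD"
```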
A binary is simply a sequence of bytes. Binaries are defined using << >>, as shown below.
<< 0, 1, 2, 3 >>
Interestingly, those bytes can be organized in any way, even in a sequence that does not make them a valid string. Check out the example below.
<< 239, 191, 191 >>
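One way to check whether a binary is a valid string is the String.valid? function. The sketch below uses a single byte 255, chosen here only as an illustration because that byte never appears in valid UTF-8.

```elixir
# A regular string is a valid UTF-8 binary.
valid = String.valid?("hello")
IO.puts(valid)
# prints: true

# The byte 255 cannot occur in UTF-8, so this is not a valid string...
invalid = String.valid?(<<255>>)
IO.puts(invalid)
# prints: false

# ...but it is still a perfectly fine binary.
still_binary = is_binary(<<255>>)
IO.puts(still_binary)
# prints: true
```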
Strings are also binaries, and the string concatenation operator <> is actually a binary concatenation operator. Check out the example below.
IO.inspect(<< 0, 1 >> <> << 2, 3 >>)
The output is:
<<0, 1, 2, 3>>
Note that a character such as ł takes up 2 bytes in a string, since it is UTF-8 encoded.
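A quick way to see this is to compare the number of characters with the number of bytes. The sketch below uses the word "hełło" as a made-up sample; the binaries: :as_binaries option asks IO.inspect to show the raw bytes instead of the text.

```elixir
# Hypothetical sample word containing two ł characters.
word = "hełło"

# String.length counts characters (graphemes).
IO.puts(String.length(word))
# prints: 5

# byte_size counts bytes; each ł needs 2 bytes, so 3 + 2 * 2 = 7.
IO.puts(byte_size(word))
# prints: 7

# Show the underlying bytes of the string.
IO.inspect(word, binaries: :as_binaries)
# prints: <<104, 101, 197, 130, 197, 130, 111>>
```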
Since each number in a binary is meant to represent a byte, a value greater than 255 will be truncated. To prevent this, we can make use of the size modifier to specify how many bits we want that number to take. Check out the example below.
IO.inspect(<< 256 >>) # truncated, it'll print <<0>>
IO.inspect(<< 256 :: size(16) >>) # takes 16 bits/2 bytes, will print <<1, 0>>
The output is:
<<0>>
<<1, 0>>
We can also use the utf8 modifier. The number is then treated as a code point and encoded as UTF-8, so the matching character is produced in the output. This is shown below.
IO.puts(<< 256 :: utf8 >>)
The output is:
Ā
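To look at what the utf8 modifier actually produced, we can inspect the bytes. This is only a small sketch; the variable name is made up, and "\u0100" is the escape for code point 256.

```elixir
# Code point 256 encoded as UTF-8.
a_macron = <<256::utf8>>

# It is the same binary as the string for code point U+0100.
IO.puts(a_macron == "\u0100")
# prints: true

# The character takes 2 bytes in UTF-8.
IO.puts(byte_size(a_macron))
# prints: 2

# Show the raw bytes instead of the character.
IO.inspect(a_macron, binaries: :as_binaries)
# prints: <<196, 128>>
```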
In addition, we have a function called is_binary that checks if a given value is a binary. Take note that only values whose size is a multiple of 8 bits are binaries.
If we define a binary by making use of the size modifier and passing it a value that is not a multiple of 8, we end up having a bitstring rather than a binary. Check out the example below.
bs = << 1 :: size(1) >>
IO.inspect(bs)
IO.puts(is_binary(bs))
IO.puts(is_bitstring(bs))
The output is:
<<1::size(1)>>
false
true
This means that the variable bs is not a binary, but rather a bitstring. In other words, a binary is a bitstring in which the number of bits is divisible by 8. Interestingly, pattern matching works on binaries and on bitstrings in the same manner.
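Here is a minimal sketch of such pattern matching. The values and variable names are made up for illustration; the binary and bitstring modifiers tell Elixir that the last segment should take whatever is left.

```elixir
# Match the first two bytes and keep the remaining bytes in rest.
<<first, second, rest::binary>> = <<1, 2, 3, 4>>
IO.inspect(first)   # 1
IO.inspect(second)  # 2
IO.inspect(rest)    # <<3, 4>>

# Since strings are binaries, <> can match a known prefix of a string.
"Hello " <> name = "Hello Elixir"
IO.puts(name)
# prints: Elixir

# The same idea works on a bitstring: take 1 bit, keep the other 2 bits.
<<flag::size(1), tail::bitstring>> = <<1::size(1), 0::size(2)>>
IO.inspect(flag)    # 1
IO.inspect(tail)    # <<0::size(2)>>
```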