Modern C++ Programming Cookbook

You're reading from Modern C++ Programming Cookbook: Master C++ core language and standard library features, with over 100 recipes, updated to C++20

Product type: Paperback
Published in: Sep 2020
Publisher: Packt
ISBN-13: 9781800208988
Length: 750 pages
Edition: 2nd Edition
Author: Marius Bancila

Replacing the content of a string using regular expressions

In the previous two recipes, we looked at how to match a regular expression on a string or a part of a string and iterate through matches and submatches. The regular expression library also supports text replacement based on regular expressions. In this recipe, we will learn how to use std::regex_replace() to perform such text transformations.

Getting ready

For general information about regular expressions support in C++11, refer to the Verifying the format of a string using regular expressions recipe, earlier in this chapter.

How to do it...

In order to perform text transformations using regular expressions, you should perform the following:

  • Include the <regex> and <string> headers and bring the std::string_literals namespace into scope to use the C++14 standard user-defined literals for strings:
    #include <regex>
    #include <string>
    using namespace std::string_literals;
    
  • Use the std::regex_replace() algorithm with a replacement string as the third argument. Consider this example: replace all words composed of exactly three characters that are either a, b, or c with three hyphens:
    auto text{"abc aa bca ca bbbb"s};
    // [abc] matches one of a, b, or c; inside a character class, | would be a literal character
    auto rx = std::regex{ R"(\b[abc]{3}\b)"s };
    auto newtext = std::regex_replace(text, rx, "---"s);
    
  • Use the std::regex_replace() algorithm with match identifiers prefixed with a $ for the third argument. For example, replace names in the format "lastname, firstname" with names in the format "firstname lastname", as follows:
    auto text{ "bancila, marius"s };
    auto rx = std::regex{ R"((\w+),\s*(\w+))"s };
    auto newtext = std::regex_replace(text, rx, "$2 $1"s);
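
The two steps above can be put together as a small, self-contained sketch. The function names mask_abc_words, swap_name_order, and check_replacements are hypothetical helpers introduced only for this illustration; they are not part of the standard library. The character class is written as [abc], because inside a character class the | character is treated as a literal rather than as an alternation operator.

```cpp
#include <cassert>
#include <regex>
#include <string>

using namespace std::string_literals;

// Hypothetical helper: replaces every three-letter word made only of
// a, b, or c with three hyphens.
std::string mask_abc_words(std::string const& text)
{
   auto rx = std::regex{ R"(\b[abc]{3}\b)"s };
   return std::regex_replace(text, rx, "---"s);
}

// Hypothetical helper: turns "lastname, firstname" into "firstname lastname".
std::string swap_name_order(std::string const& text)
{
   auto rx = std::regex{ R"((\w+),\s*(\w+))"s };
   return std::regex_replace(text, rx, "$2 $1"s);
}

// Checks the results for the inputs used in this recipe.
void check_replacements()
{
   assert(mask_abc_words("abc aa bca ca bbbb") == "--- aa --- ca bbbb");
   assert(swap_name_order("bancila, marius") == "marius bancila");
}
```

Calling check_replacements() from main() is enough to verify both transformations. Neither helper modifies its argument; std::regex_replace() returns a new string.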
    

How it works...

The std::regex_replace() algorithm has several overloads with different types of parameters, but the meaning of the parameters is as follows:

  • The input string on which the replacement is performed.
  • A std::basic_regex object that encapsulates the regular expression used to identify the parts of the strings to be replaced.
  • The string format used for replacement.
  • Optional matching flags.
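
As a rough illustration of how these parameters map onto one of the other overloads, the following sketch uses the iterator-based form, which takes an output iterator first and the input as a pair of iterators. The helper names append_masked_line and check_append_masked_line, and the choice of masking digits with #, are assumptions made for this example only.

```cpp
#include <cassert>
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: appends `text` to `out` with every digit replaced
// by '#', followed by a newline.
void append_masked_line(std::string& out, std::string const& text)
{
   std::regex const rx{ R"(\d)" };   // the regular expression
   std::string const fmt{ "#" };     // the replacement format string

   // Output iterator, input range, regex, format; the flags are defaulted.
   auto it = std::regex_replace(std::back_inserter(out),
                                text.begin(), text.end(), rx, fmt);

   // This overload returns the output iterator, so writing can continue.
   *it = '\n';
}

// Checks the result for a sample input.
void check_append_masked_line()
{
   std::string out{ "pin: " };
   append_masked_line(out, "1234 ok");
   assert(out == "pin: #### ok\n");
}
```

This form can be convenient when the result should be appended to an existing buffer or written to a stream, since it avoids creating an intermediate string.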

The return value is, depending on the overload used, either a string or a copy of the output iterator provided as an argument. The format string used for replacement can contain literal text as well as match identifiers, each indicated with a $ prefix:

  • $& indicates the entire match.
  • $1, $2, $3, and so on indicate the first, second, and third submatches, and so on.
  • $` indicates the part of the string that precedes the match (its prefix).
  • $' indicates the part of the string that follows the match (its suffix).
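
A minimal sketch of the $& and numbered identifiers is shown here. The helper names bracket_words, flip_pairs, and check_identifiers, as well as the sample inputs, are hypothetical and serve only to illustrate the format string syntax.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: wraps every word in square brackets.
// $& stands for the entire match.
std::string bracket_words(std::string const& text)
{
   std::regex const rx{ R"(\w+)" };
   return std::regex_replace(text, rx, "[$&]");
}

// Hypothetical helper: rewrites "key=value" as "value=key".
// $1 and $2 stand for the first and second submatches.
std::string flip_pairs(std::string const& text)
{
   std::regex const rx{ R"((\w+)=(\w+))" };
   return std::regex_replace(text, rx, "$2=$1");
}

// Checks the results for two sample inputs.
void check_identifiers()
{
   assert(bracket_words("hello world") == "[hello] [world]");
   assert(flip_pairs("a=1 b=2") == "1=a 2=b");
}
```

Text that is not part of any match, such as the spaces between the words, is copied to the result unchanged.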

In the first example shown in the How to do it... section, the initial text contains two words made of exactly three characters from the set a, b, and c: abc and bca. The regular expression matches a sequence of exactly three such characters between word boundaries. This means that a longer word, such as bbbb, will not match the expression. The input string text is not modified; the result of the replacement, stored in newtext, is --- aa --- ca bbbb.

Additional flags can be passed to the std::regex_replace() algorithm. By default, the flag is std::regex_constants::match_default, which selects the default matching behavior; when no format flag is set, the replacement string is interpreted using the ECMAScript rules (std::regex_constants::format_default). The grammar of the regular expression itself is chosen when the std::regex object is constructed. If we want, for instance, to replace only the first occurrence, then we can specify std::regex_constants::format_first_only. In the following example, the result is --- aa bca ca bbbb as the replacement stops after the first match is found:

auto text{ "abc aa bca ca bbbb"s };
auto rx = std::regex{ R"(\b[abc]{3}\b)"s };
auto newtext = std::regex_replace(text, rx, "---"s,
                 std::regex_constants::format_first_only);
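
The effect of the format flags can be compared side by side with a short sketch. The helper names mask_first, mask_matches_only, and check_flags are hypothetical. The second helper uses std::regex_constants::format_no_copy, a standard flag that is not discussed in this recipe and is included here only for contrast: it omits the parts of the input that do not match.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces only the first match.
std::string mask_first(std::string const& text)
{
   std::regex const rx{ R"(\b[abc]{3}\b)" };
   return std::regex_replace(text, rx, "---",
             std::regex_constants::format_first_only);
}

// Hypothetical helper: keeps only the replacements and drops the
// parts of the input that do not match.
std::string mask_matches_only(std::string const& text)
{
   std::regex const rx{ R"(\b[abc]{3}\b)" };
   return std::regex_replace(text, rx, "---",
             std::regex_constants::format_no_copy);
}

// Checks the results for the input used in this recipe.
void check_flags()
{
   assert(mask_first("abc aa bca ca bbbb") == "--- aa bca ca bbbb");
   assert(mask_matches_only("abc aa bca ca bbbb") == "------");
}
```

The flags form a bitmask type, so they can also be combined with the | operator when more than one is needed.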

The replacement string does not have to be a fixed text. It can contain special indicators for the whole match, a particular submatch, or the parts that were not matched, as explained earlier. In the second example shown in the How to do it... section, the regular expression identifies a word of at least one character, followed by a comma and optional whitespace, and then another word of at least one character. The first word is supposed to be the last name, while the second word is supposed to be the first name. The replacement string is in the $2 $1 format. This is an instruction to replace the matched expression (in this example, the entire original string) with another string formed of the second submatch, followed by a space and then the first submatch.

In this case, the entire string was a match. In the following example, there will be multiple matches inside the string, and they will all be replaced with the indicated string. In this example, we are replacing the indefinite article a when preceding a word that starts with a vowel (this, of course, does not cover words that start with a vowel sound) with the indefinite article an:

auto text{"this is a example with a error"s};
auto rx = std::regex{R"(\ba ((a|e|i|u|o)\w+))"s};
auto newtext = std::regex_replace(text, rx, "an $1");

The regular expression identifies the letter a as a single word (\b indicates a word boundary, so \ba followed by a space matches a word consisting of the single letter a), followed by a space and a word of at least two characters starting with a vowel. When such a match is identified, it is replaced with a string formed of the fixed string an, followed by a space and the first subexpression of the match, which is the word itself. In this example, the newtext string will be this is an example with an error.
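
The article replacement can be wrapped in a function to try it on other inputs. The names fix_articles and check_articles are hypothetical, and the sketch only handles a lowercase article followed by a word starting with a lowercase vowel, as in the example above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: replaces the article "a" with "an" before a word
// that starts with a lowercase vowel.
std::string fix_articles(std::string const& text)
{
   // $1 is the outer group, that is, the whole word after the article.
   std::regex const rx{ R"(\ba ((a|e|i|u|o)\w+))" };
   return std::regex_replace(text, rx, "an $1");
}

// Checks the result for the input used in this recipe.
void check_articles()
{
   assert(fix_articles("this is a example with a error") ==
          "this is an example with an error");
}
```

Words that start with a consonant are left alone, so an input such as a banana is returned unchanged.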

Apart from the identifiers of the subexpressions ($1, $2, and so on), there are other identifiers for the entire match ($&), the part of the string that precedes the match ($`), and the part of the string that follows the match ($'). In the last example, we change the format of a date from dd.mm.yyyy to yyyy.mm.dd, but also show the matched parts:

auto text{"today is 1.06.2016!!"s};
auto rx =
   std::regex{R"((\d{1,2})(\.|-|/)(\d{1,2})(\.|-|/)(\d{4}))"s};
// today is 2016.06.1!!
auto newtext1 = std::regex_replace(text, rx, R"($5$4$3$2$1)");
// today is [today is ][1.06.2016][!!]!!
auto newtext2 = std::regex_replace(text, rx, R"([$`][$&][$'])");

The regular expression matches a one- or two-digit number followed by a dot, hyphen, or slash; followed by another one- or two-digit number; then a dot, hyphen, or slash; and lastly a four-digit number.

For newtext1, the replacement string is $5$4$3$2$1; this means year, followed by the second separator, then month, the first separator, and finally day. Therefore, for the input string today is 1.06.2016!!, the result is today is 2016.06.1!!.

For newtext2, the replacement string is [$`][$&][$']; this means the part before the match, followed by the entire match, and finally the part after the match, each in square brackets. However, the result is not [today is ][1.06.2016][!!] as you perhaps might expect at first glance, but today is [today is ][1.06.2016][!!]!!. The reason for this is that only the matched expression is replaced, and, in this case, that is only the date (1.06.2016). This substring is replaced with another string formed of all the parts of the initial string, while the text before and after the match is copied to the result unchanged.
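
Both date transformations can be reproduced with the following sketch. The names date_regex, to_year_first, show_match_parts, and check_dates are hypothetical helpers introduced for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: day, separator, month, separator, four-digit year.
std::regex date_regex()
{
   return std::regex{ R"((\d{1,2})(\.|-|/)(\d{1,2})(\.|-|/)(\d{4}))" };
}

// Hypothetical helper: reorders a matched date as year, month, day
// and keeps the original separators.
std::string to_year_first(std::string const& text)
{
   return std::regex_replace(text, date_regex(), R"($5$4$3$2$1)");
}

// Hypothetical helper: replaces the match with its prefix, the match
// itself, and its suffix, each in square brackets.
std::string show_match_parts(std::string const& text)
{
   return std::regex_replace(text, date_regex(), R"([$`][$&][$'])");
}

// Checks the results for the input used in this recipe.
void check_dates()
{
   assert(to_year_first("today is 1.06.2016!!") == "today is 2016.06.1!!");
   assert(show_match_parts("today is 1.06.2016!!") ==
          "today is [today is ][1.06.2016][!!]!!");
}
```

Because each separator is captured in its own group, a date written with slashes or hyphens keeps those separators after the reordering.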

See also

  • Verifying the format of a string using regular expressions to familiarize yourself with the C++ library support for working with regular expressions
  • Parsing the content of a string using regular expressions to learn how to perform multiple matches of a pattern in a text