Becoming a Rockstar SRE: Electrify your site reliability engineering mindset to build reliable, resilient, and efficient systems

By Jeremy Proffitt and Rod Anami
Paperback Apr 2023 420 pages 1st Edition
SRE Job Role – Activities and Responsibilities

A lot has been said about site reliability engineering: what it is, what it is not, and the practices and techniques we should apply to adopt the site reliability engineering model. Who site reliability engineers (SREs) are is often put aside, even though it is a crucial aspect. The same goes for how people from various parts of information technology (IT) become SREs and how some of them come to be recognized as thought leaders in this domain.

In particular, little has been said about the site reliability engineer persona, which raises the following questions:

  • What do they know?
  • Which skills have they developed?
  • What do they do daily?
  • What are their primary responsibilities?

Those characteristics would explain, at a bare minimum, why someone should start the journey to becoming an SRE rockstar. That’s precisely why we decided to start this book by outlining the SRE job role.

In this chapter, we’re going to cover the following main topics:

  • Making this journey personal
  • Understanding the mindset and hobbies of an SRE
  • DevOps engineers versus SRE versus others
  • Describing an SRE’s main responsibilities
  • An overview of the daily activities of an SRE
  • People that inspire

Making this journey personal

Unfortunately, when an enterprise starts to adopt SRE into its IT governance processes, it often doesn’t use a people-processes-tools (PPT) model, with a clear vision of each pillar, to transform its operations and software development areas. Even more often, it doesn’t emphasize the people element of PPT in such transformations. We want to change that by making this learning journey personal and centered on individuals rather than on the processes or technologies involved.

It’s critical to understand (and learn) what drives typical SREs forward, which fundamental skills they have developed, and how they hone their skills over time to go above and beyond at work. For that purpose, we will divide this subject into three sections:

  • SRE driving forces
  • SRE skills
  • SRE traits

Let’s start this personal journey by understanding why you should become an SRE.

SRE driving forces

We want to explore what motivates or incentivizes site reliability engineers. There’s no journey of any nature if there is no driving force pushing you through. As a word of advice, we should warn you that learning about site reliability engineering is more of an expedition than a tourism trip. In other words, it’s more a marathon than a sprint. Having clarified that, we’ll begin by putting the possible rewards of this journey on the table. Let’s depict each driving force as a mockup code snippet (JavaScript) to make it fun.

Money

If we could represent in the form of an algorithm how money drives people when they don’t earn enough, it would look like the following:

// money
if (money < MyMinimumSalary) {
    motivated = false;
    excitement--;
}
doMyWork();
if (motivated && jobSatisfaction) {
    honeSRESkills();
    doExtraWork();
} else {
    lookForAnotherJob();
}

Site reliability engineers make more money than most other technical professionals. According to a Glassdoor (2022) report, they can earn more than USD 118K per year on average. In similar reports, SREs are even noted to have surpassed DevOps engineers in a salary comparison. Nevertheless, not making enough money can be a key demotivating factor. It is hard for anyone to move forward with their career if they are preoccupied with expenses.

Although SREs earn a notably high income on average, their salaries vary by country, years of experience, and employer. Companies justify SRE salary levels based on the reliability value they bring to the table. Rest assured, the site reliability engineering career is well rewarded in terms of compensation.

Job satisfaction

What affects our job satisfaction can be depicted as code logic as follows:

// jobSatisfaction
if (interestingJob || purposefulWorkActivities || challengingSkillDevelopment || technicalAppreciation) {
    jobSatisfaction = true;
    excitement++;
}

Job satisfaction is another driving force of site reliability engineers, and it has many factors. We usually translate job satisfaction to employee happiness at work. Site reliability engineering leads to job satisfaction when we look at the following profession characteristics: exciting job content, purposeful work activities, challenging skill development, and technical appreciation.

The job content of site reliability engineering spans multiple domains. You can work with developers one day and help systems administrators the next. You may need to assist in redesigning an app to increase its service reliability. As with any generalist job that demands technical depth in many subject areas, you are unlikely to get bored.

As we will see later in this chapter, SRE work activities have clear business value. They improve not just the service quality, availability, and resiliency, but also the system’s reliability. Reliable services might help with customer loyalty, bringing additional revenue to the service provider. There is a direct relationship between SRE work and business metrics improvement, making their efforts purposeful.

Since site reliability engineering is a cross-technology domain engineering discipline, any skills acquisition is challenging. SREs have knowledge and skills that a systems administrator or software developer doesn’t have. They are required to keep those skills updated and hone them over time. This necessity to keep learning brings the always-moving-forward feeling that may not happen if you only need to master a single product or technology.

The last factor on our list is technical appreciation. According to Boston Consulting Group (BCG) research, appreciation is the number one job happiness factor. Being an SRE, you will aid customers, users, and other technical professionals because of your keen holistic view of the systems. Consequently, technical appreciation for the job you do is common, and who doesn’t like that?

Innovative solutions

The following code gives you an idea of how exciting exploring uncharted terrains is:

if (!solutionExists) {
    deviseNewSolution();
    excitement++;
}

Site reliability engineers are natural trailblazers as they explore new technologies and processes to obtain better reliability and eliminate toil (manual and repetitive tasks that are devoid of value). They face many scenarios and situations that are a first of their kind. Moreover, they are responsible for paving the path for others by documenting procedures in runbooks when none exist. There’s nothing more exciting than devising new solutions or improving existing ones. Imagine how you would feel if they named a technical operating procedure after you.

Nevertheless, SREs want to minimize complexity and reduce technical debt. They don’t create a solution just for the sake of doing it unless it adds value and resolves or prevents events that impact customers.

Good relationships

The following code snippet is a representation of how good relationships are a result of an exciting working environment:

if (excitement > HIGH) {
    motivateOthers();
    relationships.healthy = true;
}

Good relationships in the work environment are one of the top 10 factors contributing to employee happiness, and SREs tend to have them. The reason is straightforward: they act as integration hubs among different tribes and have the mission to break down company silos. SREs need cooperation from both development and operations teams. They are technical diplomats and have strong communication skills. Since they are usually excited about their work, they tend to socialize more with colleagues and leaders, potentially helping to improve the social environment around them. That doesn’t mean they need to be extroverts with advanced public-speaking skills, but SREs are certainly good teachers because they are excited and compelled to talk about what they do.

SRE skills

Now that you know what’s in it for you, it’s time to check which skillsets SREs must develop throughout their careers. Site reliability engineers have a good mix of knowledge, skills, and experience that are shared with other roles and those that are unique to them. SREs have technical skills that span the entire solution life cycle, from the design to the manage step.

Figure 1.1 – SRE skills

The preceding Venn diagram shows how SREs acquire skills common to other professions and how SRE skills connect the various steps of the solution life cycle. In essence, site reliability engineers are senior technical resources who follow a generalist proficiency model with good depth in certain areas of expertise.

There’s no consensus in the market about the canonical set of skills for SREs. It would not make sense for this to be the case because as soon as any technology-based skill becomes obsolete, we would need to remodel the whole profession. Instead, SRE core skills should be as technology-agnostic as possible.

We recommend a blend of distinct expertise from the IT architect, software developer, data scientist, DevOps engineer, and systems administrator roles. The proportion of each skill level varies according to many factors. You will need to determine which skills are more in demand than others, but an organization should have all of them in its toolkit.

Systems thinking

Site reliability engineers have a holistic view of the system’s reliability by understanding the availability, resiliency, and performance of each solution component at both the application and infrastructure levels.

Software engineering

SREs develop code and software. They know how to utilize algorithms and software development techniques such as agile frameworks. SREs need to be proficient enough in instrumenting the app code to increase its manageability and observability. SREs know how to use software development life cycle tools and technologies, including DevOps continuous integration/continuous delivery (CI/CD) pipelines. They can provide testers with better test cases that consider service reliability targets.

Systems management

SREs know how to manage, administer, and operate systems. They share most of the skills from the systems administrator role on multiple technologies. Their technology knowledge spans the cloud, containers, storage, networking, operating systems, middleware, and databases. They have the skills to implement monitoring, event management, logging, tracing, service levels, observability, DevOps toolchains, and automation of toil.

Data science

SREs work with huge amounts of structured data. They must acquire the knowledge and skills to make sense of such datasets by using mathematical models. SREs know how to analyze data to uncover trends, anomalies, and insights – always from the user’s perspective.

We recommend that every SRE has the following selection of core skills:

  • Systems thinking for focusing on the reliability of the system
  • The ability to develop and test software
  • The ability to deploy and release apps
  • IT service management
  • Systems monitoring and observability
  • Working with DevOps tools and automation
  • The application of data science for reliability of systems

Although we didn’t explicitly mention security in any of the knowledge domains, enforcing security across multiple layers is present in all of them.

Important note

All fundamental SRE skills are covered in this book’s chapters. The chapters have been organized to optimize your learning journey, so they don’t follow the preceding order of skills. We structured this book based on our own experience acquired from a multitude of site reliability engineering coaching and mentoring sessions.

We provide a manifesto model in Appendix A, The Site Reliability Engineer Manifesto, that acts as a more structured guide for site reliability engineering adoption, including the fundamental skills. We hope that helps your company in joining the site reliability engineering movement.

SRE traits

Besides what the SREs know and which skills they must develop, it’s relevant to know their other good traits.

Software is everywhere

Site reliability engineers have a software engineering mindset. The idea of approaching any issue as a software problem may be disruptive at first; however, there is a good reason for it. Imagine that you need to restart and verify a system by manually issuing a specific set of commands and parameters many times per week. If you handle it as a software development problem, the solution will be developing and scheduling a simple program or automation to execute this task instead. SREs embrace automation over toil as one of their best tools.
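
As a rough illustration of treating the restart-and-verify chore as a software problem, the following sketch wraps it in a small function that can then be scheduled. The `restart` and `verify` callbacks, the function name, and the retry count are all hypothetical placeholders for whatever commands your system actually needs.

```javascript
// Hypothetical helper: restart a service and verify it, retrying a few times.
// `restart` and `verify` stand in for the real commands and checks.
function restartAndVerify(restart, verify, maxAttempts = 3) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        restart();
        if (verify()) {
            return { healthy: true, attempts: attempt };
        }
    }
    return { healthy: false, attempts: maxAttempts };
}

// Example with stubbed commands: the service becomes healthy after the second restart.
let restarts = 0;
const result = restartAndVerify(
    () => { restarts++; },
    () => restarts >= 2
);
console.log(result.healthy, result.attempts); // true 2
```

Once the task is code, it can run on a schedule and report its own outcome instead of relying on someone typing the same commands several times a week.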

Comfortable to code

SREs are not just able to develop code; they really like doing it. As we will see later in this chapter, they develop code as a frequent activity and main responsibility. It’s not just a question of learning how to code or program when someone asks; SREs always feel confident in constructing good pieces of software.

Change as a constant

Frequent releases of new features, code enhancements, and reliability improvements are vital for any business. SREs are the first to accept calculated risks to provide more value to the system users. They are not risk averse but bring visibility to inherent risks so they can throttle the speed of change. They are always prepared to make progress and go above and beyond for service reliability.

Handle complexity and scale

They are not afraid of complexity or scale. They know that modern workloads are intrinsically complex and must be scalable horizontally and vertically. SREs work with large systems with multiple components running in hybrid multi-cloud environments. They understand the application’s full-stack design, its moving parts, and how they connect to each other.

Problems as opportunities

SREs participate in on-call rotations and schedules to respond to service disruptions. They see incidents and problems as opportunities to learn and advance the reliability of the system. Not just that, they also have the competence to translate technology into business language to measure the impact on users and customers. They advocate for a blameless culture by prioritizing answers to questions, such as how to detect and repair incidents faster next time. They also consider how technical challenges may affect business results.

We have just gone over what makes the SRE persona: their motivations, skills, and traits. Now we are going to understand how site reliability engineers think.

Understanding the mindset and hobbies of an SRE

It’s not rare for site reliability engineers to have a broader and divergent view of their surroundings. We are not saying that SREs are weird; well, they are in a certain sense, as they employ a relentless search for improving reliability in all things. However, we are referring to their mindset and how they approach the world.

In this section, we will explore different aspects of their thought process in the work environment and what they like to do in the job and outside it. We have divided this topic into three sections:

  • SRE affinity game
  • SRE guiding principles
  • SRE hobbies

You may have asked whether site reliability engineering is the right profession for you. Let’s examine that next.

SRE affinity game

Let’s play a game! What do you think your affinity or compatibility is with the site reliability engineering profession? We will present a series of scenarios that SREs face. You need to answer them with either love, like, dislike, or hate indicating how much you see yourself doing it and how you would feel about it. Try to be as honest as possible.

Disclaimer

This is not an anthropological scientific survey based on a human behavioral model or theory by any means. It’s a simple questionnaire to help you understand your own affinity to the SRE job role.

The scenarios are in the following list. Get a piece of paper, write down the question number, and answer it. Good luck!

  1. Your boss asks you to resolve a problem that no one else has ever resolved.
  2. You need to spend a few hours looking through logs, metrics, graphs, and events to verify whether there are any new anomalies that were not detected automatically.
  3. You need to participate in an on-call rotation or schedule where you might be called late in the night to respond to a service disruption that has a business impact.
  4. You need to work on a backend system or software that is not visible to external users.
  5. You need to devise new ways to increase a large system’s overall reliability.
  6. You are asked to work on a large-scale problem, which affects hundreds of users and has dozens of components and dependencies, that runs on a hybrid multi-cloud environment.
  7. You are diagnosing a system problem that is making users from a certain geography unable to access their services, and there is great pressure on you.
  8. You need to approach problems with a selected scientific method or data model to uncover facts instead of guessing.
  9. You constantly ask yourself how you could make things around you better and more reliable.
  10. You need to classify and categorize systems information and functionalities so you can isolate causes from effects.
  11. You must diagnose and fix a system problem by investigating components that are not usually visible by going deep into each component configuration as debugging mode is not available.
  12. You need to design a detailed diagram of how the user interacts with a system or software so you can point out where to observe for symptoms.

After you complete this exercise, assign points to each of the answers. If you replied to a scenario with a love answer, assign 5 points to it. For a like answer, you get 3 points. Dislike has a value of 0, and hate is -3 (negative!). Sum your points across all 12 scenarios to get your score, and check the result against the following list:

  • Over 34 points: Your affinity is very high; this is the right career for you
  • From 21 to 34 points: Your affinity is high; you should consider this profession
  • From 13 to 20 points: Your affinity is medium; this may be a good job role for you
  • Below 13 points: SRE may not be your best option
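
In keeping with the mockup snippets earlier in the chapter, the scoring rules above can be sketched as a few lines of JavaScript. The point values and thresholds come from the text; the function names, the `low` label for the last band, and the sample answer sheet are illustrative assumptions.

```javascript
// Points per answer, as defined in the scoring rules.
const POINTS = new Map([["love", 5], ["like", 3], ["dislike", 0], ["hate", -3]]);

// Sum the points for a list of answers, one per scenario.
function affinityScore(answers) {
    return answers.reduce((total, answer) => {
        if (!POINTS.has(answer)) {
            throw new Error(`Unknown answer: ${answer}`);
        }
        return total + POINTS.get(answer);
    }, 0);
}

// Map a score to its affinity band ("low" is our own label for "below 13 points").
function affinityLevel(score) {
    if (score > 34) return "very high";
    if (score >= 21) return "high";
    if (score >= 13) return "medium";
    return "low";
}

// Hypothetical answer sheet for the 12 scenarios.
const answers = ["love", "like", "like", "dislike", "love", "love",
                 "like", "like", "love", "like", "hate", "like"];
const score = affinityScore(answers);
console.log(score, affinityLevel(score)); // 35 very high
```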

This may be a game, but it will have made you imagine yourself in an SRE’s shoes. We have started to understand the SRE mindset, so let’s check what guides them in the convoluted scenarios listed previously.

SRE guiding principles

Everyone has a set of principles (and values) that acts as their compass. SREs also follow a set of values; they embrace guiding principles to advise them on technical decisions and act as a reliability compass.

Google® coined most of those principles in its site reliability engineering books (https://sre.google/books/), but others appeared later in conference sessions at SREcon (https://www.usenix.org/srecon) and blog posts on many websites.

Again, we have selected some of them as canonical guiding principles based on our experience in assisting customers and organizations in enabling site reliability engineering in their IT shops. The following is the set of guiding principles that are rooted in the SRE persona:

  • Scalable operations
  • Engineering fidelity
  • Observability to the core
  • Well-designed service levels
  • User-perspective notification trigger
  • Blameless postmortems
  • Simplicity

We must remark that such principles are not procedures or prescriptive instructions to accomplish something but guidelines. Don’t worry if you are not familiar with the terminology applied here; we dig into them in a detailed manner throughout the book. Let’s investigate each of them along with their most familiar patterns and anti-patterns.

Scalable operations

The operations team, which includes site reliability engineers, is responsible for managing production systems. They are the first responders for any service disruption when something goes wrong. The scalable operations principle states that this team will not grow proportionally to the system as its load increases. In other words, if the number of active users for a given service doubles, the operations team size will not double. A more mathematically accurate way to visualize this is through a logarithmic growth curve. As the operations team gains technical maturity, eliminates repetitive manual tasks, and adopts automation at large, it will need fewer resources to manage more system load:

Figure 1.2 – A logarithmic growth curve

It is worth mentioning that SREs employ a proactive approach as they strive to identify the root cause of issues and devise solutions to detect or prevent problems. The patterns for this principle are as follows:

  • Identify and eliminate toil whenever possible
  • Document operational procedures as runbooks
  • Train operations teams to use and refine runbooks
  • Adopt automation platforms and automated procedures documented in runbooks at large

The anti-patterns are as follows:

  • Have linear (or exponential) growth for operations teams when the system load rises
  • Operational knowledge is tacit or not documented
  • Automation is the end goal and not merely a way to eliminate toil
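
To make the logarithmic growth idea behind scalable operations more concrete, here is a minimal sketch. The model (one extra engineer per doubling of load) and every constant in it are assumptions chosen for illustration, not figures from any real team.

```javascript
// Hypothetical model: team size grows with the logarithm of the system load.
// baseUsers, baseTeam, and perDoubling are made-up illustrative constants.
function teamSize(activeUsers, { baseUsers = 10000, baseTeam = 4, perDoubling = 1 } = {}) {
    const doublings = Math.max(0, Math.log2(activeUsers / baseUsers));
    return Math.round(baseTeam + perDoubling * doublings);
}

// Each doubling of active users adds one engineer instead of doubling the team.
for (const users of [10000, 20000, 40000, 80000]) {
    console.log(users, teamSize(users));
}
```

Under these assumptions, an eightfold increase in active users grows the team from 4 to 7 people, whereas linear growth would require 32.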

Engineering fidelity

This tenet asserts the obvious: site reliability engineers do engineering. Yet it’s not uncommon to see SREs only working on incident, problem, and change management processes. We are not telling you that site reliability engineers don’t get their hands dirty; on the contrary, they do operational and engineering work. This principle exists to guarantee that SREs will have time to excel in both.

The patterns are as follows:

  • Cap operational work at 50% of the available SRE time. The other half is dedicated to engineering solutions and increasing reliability.
  • Share some of the operational work with the development team. Sharing 5% of the operational work is usually recommended, so the development team is prepared to take on SRE work.
  • Send operational overflow work to the development team as they share the same goals.

The anti-patterns are as follows:

  • SREs only work on operational work, resolving incidents, implementing changes, and running root cause analysis (RCA)
  • SREs spend most of their time doing firefighting (incident resolution)
  • Development teams don’t share any responsibilities with the operations team
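
The 50% cap lends itself to a simple check. The sketch below assumes a hypothetical weekly time log with hours split into operational and engineering work; the field names and the sample numbers are illustrative only.

```javascript
// Fraction of tracked time spent on operational work.
function operationalShare(log) {
    const total = log.operational + log.engineering;
    return total === 0 ? 0 : log.operational / total;
}

// True when operational work exceeds the cap (50% by default).
function exceedsOperationalCap(log, cap = 0.5) {
    return operationalShare(log) > cap;
}

// Hypothetical week for one SRE, in hours.
const week = { operational: 26, engineering: 14 };
console.log(operationalShare(week));      // 0.65
console.log(exceedsOperationalCap(week)); // true
```

When the check returns true, the patterns above suggest sending the operational overflow to the development team rather than letting engineering time shrink.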

Observability to the core

Observability is the ability to comprehend the internal state of a system by inspecting its outputs. It extends the monitoring concept by adding layers to expand the system visibility and allows a more proactive posture by detecting anomalies before they become disruptions. This guiding principle craves visibility and discernability of what’s happening inside a system or application by measuring certain signals.

The patterns of observability are as follows:

  • Observe the system behavior through the golden signals; this can be either four (LETS) or five (STELA) signals depending on the school of thought you follow. The LETS acronym stands for latency, error rate, traffic, and saturation. STELA stands for saturation, traffic, error rate, latency, and availability.
  • Have monitoring metrics, events, logs, and traces (MELT) at the SRE’s disposal. These are the fundamental data components of any observability platform.
  • Run synthetic user testing from time to time. This is a method where a bot mimics a user to test system functionality and response times.

The anti-patterns are as follows:

  • Observe only the liveness of the system components, but not from the user’s perspective. For example, checking that components are running versus checking that users can use the system as designed.
  • Lack of user experience monitoring. You don’t have visibility of what’s happening in the user interface.
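
As a rough sketch of how the four LETS signals could be derived from raw data, the following function summarizes a window of request records. The record shape (`latencyMs`, `status`), the use of the 95th percentile for latency, and the definition of saturation as traffic divided by an assumed capacity are all simplifying assumptions; real observability platforms compute these differently.

```javascript
// Summarize a window of request records into the four LETS golden signals.
// Each record is assumed to look like { latencyMs: number, status: number }.
function goldenSignals(requests, windowSeconds, capacityPerSecond) {
    const latencies = requests.map((r) => r.latencyMs).sort((a, b) => a - b);
    const p95Index = Math.max(0, Math.ceil(latencies.length * 0.95) - 1);
    const errors = requests.filter((r) => r.status >= 500).length;
    const traffic = requests.length / windowSeconds;
    return {
        latencyP95Ms: latencies.length ? latencies[p95Index] : 0,
        errorRate: requests.length ? errors / requests.length : 0,
        trafficPerSecond: traffic,
        // Assumption: saturation is approximated as traffic over known capacity.
        saturation: traffic / capacityPerSecond,
    };
}

// Hypothetical two-second window with four requests and a capacity of 10 requests/second.
const signals = goldenSignals(
    [
        { latencyMs: 120, status: 200 },
        { latencyMs: 80, status: 200 },
        { latencyMs: 300, status: 500 },
        { latencyMs: 95, status: 200 },
    ],
    2,
    10
);
console.log(signals);
```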

Well-designed service levels

There’s no way to verify whether a service is being delivered to the target user within the expected and agreed-to parameters without established service levels. Part of the undeniable success of site reliability engineering is due to this redefinition of what a good service level is and how we document it. This tenet aims to have not just well-defined service levels but also well-designed ones that measure the system’s reliability.

The patterns are as follows:

  • Define service-level indicators (SLIs) from the system user angle, then delineate service-level objectives (SLOs) as an aggregation of the former
  • Set the SLO target to less than 100%, so there’s some room for errors (error budget) between 100% and the SLO target to launch new features and enhance overall reliability
  • Establish service-level agreements (SLAs), with penalties and fines if they are not met, only after the SLOs have been measured
  • Improve SLOs and increase their targets over time through engineering work carried out by the site reliability engineering team

The anti-patterns are as follows:

  • Define the SLAs first, then measure the SLOs to see whether they are feasible with the current workings of the system.
  • Establish a target of 100% for the SLAs or SLOs. This anti-pattern reduces the team’s ability to release new features (or develop system reliability further) as there’s no space for testing them in production. Soon enough, the whole system will become obsolete or non-competitive in its market.
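
The relationship between an SLO target and its error budget is simple arithmetic, sketched below for a request-based SLO. The function and field names are illustrative assumptions, as are the sample numbers.

```javascript
// Sketch of an error budget calculation for a request-based SLO.
// `sloTarget` is a fraction strictly between 0 and 1 (for example, 0.999).
function errorBudget(sloTarget, totalRequests, failedRequests) {
    if (sloTarget <= 0 || sloTarget >= 1) {
        throw new RangeError("SLO target must be between 0 and 1 (exclusive)");
    }
    // The budget is the share of requests allowed to fail: 100% minus the target.
    const allowedFailures = (1 - sloTarget) * totalRequests;
    const remaining = allowedFailures - failedRequests;
    return {
        allowedFailures,
        remaining,
        consumed: failedRequests / allowedFailures,
        canRelease: remaining > 0,
    };
}

// Hypothetical month: 1,000,000 requests, 400 failures, 99.9% SLO target.
const budget = errorBudget(0.999, 1000000, 400);
console.log(Math.round(budget.allowedFailures)); // 1000
console.log(budget.canRelease);                  // true
```

A target of 100% is rejected outright here because it would leave an error budget of zero, which is exactly the anti-pattern described above.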

User-perspective notification trigger

Notification is the process of alerting on-call first responders about service or system performance deterioration or downtime. It translates to when a site reliability engineer must be engaged to resolve an incident. This principle states that triggering a notification of an issue should only happen when this issue is affecting the system user. For instance, we never alert an SRE if the CPU load is high, but the user is not feeling any service degradation.

The pattern is as follows:

  • Alerts are triggered only when there are symptoms at the user level and the alert is actionable, meaning SREs can do something to resolve it

The anti-patterns are as follows:

  • Alerting noise. SREs cannot differentiate between alerts that are mere informative events and ones that affect the system user.
  • Lack of alerting. End users engage with the help desk to notify them that there are problems in the system.
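
The notification rule can be expressed as a small predicate. This is only a sketch: the `userImpacting` and `actionable` flags and the sample signals are hypothetical, and deciding whether users are actually affected is the hard part in practice.

```javascript
// Page an on-call SRE only when users are affected and the alert is actionable.
function shouldPage(signal) {
    return signal.userImpacting && signal.actionable;
}

// Hypothetical signals: high CPU with no user-visible degradation versus failing checkouts.
const highCpu = { name: "cpu_load_high", userImpacting: false, actionable: true };
const checkoutErrors = { name: "checkout_error_rate_high", userImpacting: true, actionable: true };

console.log(shouldPage(highCpu));        // false
console.log(shouldPage(checkoutErrors)); // true
```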

Blameless postmortems

Postmortems are, in essence, root cause analysis (RCA) exercises. They receive a peculiar name to avoid running into the same old pitfall: finding someone or something to blame (the root cause) rather than improving the system’s quality and learning from mistakes. Postmortems also focus on questions such as how to detect, respond to, and repair a service disruption faster, rather than just uncovering the root cause. This tenet is one of the hardest to deploy for an organization that has been doing traditional RCAs for some time, and it requires a blameless culture to support it.

The pattern is as follows:

  • Infinite hows. Ask multiple questions, starting with the term how, to determine enhancements to the system (infrastructure and applications), processes, and knowledge base.

The anti-pattern is as follows:

  • Go back to traditional RCAs where no progress is made on reliability

Simplicity

This guiding principle was imported from the Agile Manifesto. We can’t explain it better than the manifesto (https://agilemanifesto.org/principles.html): “the art of maximizing the amount of work not done.” In other words, it dictates that site reliability engineers are always looking for ways of simplifying and avoiding unnecessary work. They are eager to eliminate toil, that is, repetitive, manually intensive, or low or no business-value tasks. However, as humans, we inherently tend to complicate everything, so keeping runbooks readable and easy to follow is a good example of this principle.

The pattern is as follows:

  • Keep it simple, stupid (KISS) is a proven design principle from the US Navy that says most systems work better if they are simple to use or follow

The anti-pattern is as follows:

  • Too elaborate processes for SRE work

We have just explained our seven preferred guiding principles that site reliability engineers follow in their profession. They are an integral part of the SRE mindset. Let’s now cover what SREs do in their free time to overcome learning limitations.

SRE hobbies

Jeremy and I couldn’t agree more about what makes a site reliability engineer rockstar: their hobbies. What you do in your free time for leisure or as a second profession leads to greater levels of conceptual and practical knowledge. The trick is finding a hobby that you have a passion for and that helps in the SRE role.

We can’t tell you what the best-fit extra-curricular activity that will pump up your SRE skills is, but here we list some examples that may interest you, grouped by the skills that they enhance.

Analytical thinking

Site reliability engineers have a good analytical processing capacity. They need to analyze large amounts of data and detect patterns, trends, and anomalies by correlating different data sources. Some engaging hobbies that leverage your analytical thinking are as follows:

  • Chess: Without saying too much about it, this game has its own set of theories and algorithms. It is a good way to practice thinking multiple steps ahead while focusing on the present.
  • Board games: There are plenty of board games that make you analyze information to win. And they make it enjoyable and social.
  • Rubik’s cube: This fun toy is also a good example of simplicity in operation and shape. It presents a complex challenge with a plain design.
  • Video games: Strategy and role-playing games will train your mind in thinking analytically.

Creativity

SREs need to forge new algorithms for observability. They also need to construct new ways of measuring system reliability, as applications and infrastructure components have an uncountable number of arrangements, technologies, and architectures. This may sound cliché, but thinking outside the box is where site reliability engineers shine. Here are some hobbies that may help with your creativity:

  • Algorithms development: Although this may be part of your daily work life, you can find fun in it by developing 2D or 3D video games, for instance. Another option is to contribute to open source software projects in the wider community.
  • Drawing or painting: This is a relaxing and artistic example. It also gets you used to finding inspiration.
  • LEGO®: A world-famous construction toy, LEGO makes you think about new forms, shapes, structures, and ways of assembly. It also has a robotics range that gives you programming skills as well.
  • Internet-of-Things prototyping: How about developing embedded projects with Arduino® or Raspberry Pi® boards? You need to build both the hardware and firmware for the project to come together.
  • Video games: The ones where you need to build something with blocks and basic structures, such as Minecraft®, are especially useful in nurturing creativity.
  • 3D printing: Author Jeremy is a 3D printing master. Like painting, you can express your art in three dimensions.

Troubleshooting

Troubleshooting is not exclusive to site reliability engineers. Systems administrators must also figure out why a service or system is down and how to repair it. However, SREs use systems thinking and scientific approaches to troubleshoot differently. You need to train your mind to resolve problems logically and calmly, and you’re going to need it. There are plenty of hobbies that can stimulate you to excel in this area. Let’s list some of them:

  • Crossword or jigsaw puzzles: People are addicted to this type of entertainment. It’s an excellent choice to keep the mind sharp and trained.
  • Sudoku: This was a trend not long ago, but it is still an excellent way to polish up troubleshooting skills.
  • Video games: The ones full of puzzles, such as Portal, are especially good for troubleshooting practice.

We have now covered the aspects of the site reliability engineer persona. Next, we will look at what makes site reliability engineering professionals unique by comparing them to other roles and listing responsibilities and activities.

DevOps engineers versus SRE versus others

This is one of the most frequently asked questions we receive from customers and organizations: how does the site reliability engineering profession differ from other existing technical roles? We already talked about how SREs are the connection between the different steps of the solution life cycle. Here, we’ll focus our discussion on the DevOps engineer role, and later, we’ll broaden it. We have split this discussion into two sections:

  • DevOps and site reliability engineers
  • Software and site reliability engineers

DevOps and site reliability engineers

Google described the relationship between DevOps and SRE with a famous subtitle in their The Site Reliability Workbook publication:

Class SRE implements interface DevOps

This statement is an elegant way to define this link and refers to Java programming. It implies that site reliability engineering describes and deepens the implementation of whatever DevOps is. Moreover, we can say that site reliability engineering has commonalities with DevOps as a logically derived conclusion. However, what exactly does site reliability engineering implement from DevOps, or what are the differences between a site reliability engineer and a DevOps engineer? We have visualized these similarities and divergences in an infographic as follows:

Figure 1.3 – An infographic on SRE and DevOps

Notice that they have shared values. Both SREs and DevOps engineers require those values in the orange (bottom right in the above diagram) box. In the bottom-left table, you can see the difference between those roles. Typically, site reliability engineers resolve operational problems by applying the right software engineering disciplines. On the other hand, DevOps engineers resolve development and delivery pipeline issues with systems management techniques mainly by using automation and infrastructure-as-code. They also concentrate different levels of effort on distinct phases of the solution life cycle, as depicted in the infographic.

It’s not rare to hear that DevOps is a shift-right transformation while site reliability engineering is a shift-left one. That implies moving from the left (development side of the equation) to the right (operations side of the equation), and vice versa. Another term we hear a lot is DevSecOps, which has the addition of security. Since security has always been implied in these roles, we think including new letters in the middle is confusing and redundant.

SREs and DevOps engineers are, in our opinion, different sides of the same coin. They should be more like best friends forever than opposing roles as they share values. Let’s check how SREs fulfill those values from the five main areas of DevOps:

  • Reduce organizational silos: SREs use the same tooling as developers or DevOps engineers. They also share objectives and performance metrics with them.
  • Accept failure as normal: SREs embrace risks using the error budget for new features. They quantify failure through SLIs and SLOs. And they run postmortems in a blameless culture.
  • Implement gradual changes: SREs work to increase reliability, and more reliable systems allow more frequent changes and releases.
  • Leverage tooling and automation: SREs eliminate toil by automating operational tasks at a constant pace.
  • Measure everything: SREs measure reliability by implementing MELT data and observability. They also have ways to identify and size toil.
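The "class SRE implements interface DevOps" phrasing can be made concrete with a small sketch. The quote is Java-flavored; the version below is a Python analog that uses an abstract base class in place of a Java interface. The method names are our own labels for the five areas listed above, and the returned strings simply paraphrase that list. None of it comes from Google's materials.

```python
from abc import ABC, abstractmethod


class DevOps(ABC):
    """The five DevOps areas expressed as an abstract interface (illustrative only)."""

    @abstractmethod
    def reduce_organizational_silos(self) -> str: ...

    @abstractmethod
    def accept_failure_as_normal(self) -> str: ...

    @abstractmethod
    def implement_gradual_changes(self) -> str: ...

    @abstractmethod
    def leverage_tooling_and_automation(self) -> str: ...

    @abstractmethod
    def measure_everything(self) -> str: ...


class SRE(DevOps):
    """One concrete implementation: how SREs fulfill each area."""

    def reduce_organizational_silos(self) -> str:
        return "share tooling, objectives, and performance metrics with developers"

    def accept_failure_as_normal(self) -> str:
        return "use an error budget, SLIs and SLOs, and blameless postmortems"

    def implement_gradual_changes(self) -> str:
        return "raise reliability so that changes can be released more often"

    def leverage_tooling_and_automation(self) -> str:
        return "eliminate toil by automating operational tasks"

    def measure_everything(self) -> str:
        return "collect MELT data and size the toil"


if __name__ == "__main__":
    sre = SRE()
    print(isinstance(sre, DevOps))
    print(sre.accept_failure_as_normal())
```

Trying to instantiate `DevOps` directly raises a `TypeError`, which mirrors the idea that DevOps names the values while SRE supplies one implementation of them.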

Software and site reliability engineers

Another frequently asked question is how site reliability engineers differ from software engineers (SWEs). The short answer is simple: they have the same core skills but specific work scopes.

What are SWEs? SWEs design, engineer, and architect applications using modeling languages and requirements analysis techniques. They use an integrated development environment (IDE) and develop code for use cases using one of the multiple available programming languages. They create test cases and testing suites. Also, they integrate software and service components and handle their dependencies. SWEs work with many software development life cycle tools and processes.

Site reliability engineers may execute the same activities, but they intend to improve reliability when doing so. For instance, developing code for an SRE translates much more to instrumenting the application code, so it generates more logs, than coding a use case. Also, SREs treat operations as a software problem and see daily systems management tasks as possible software coding opportunities. Besides that, SREs have other core skills, relating to systems thinking, systems management, and data science.
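To make "instrumenting the application code" a little more tangible, here is a minimal sketch using only Python's standard `logging` module. The `instrumented` decorator, the `checkout` logger name, and the `place_order` function are hypothetical names chosen for illustration. The point is only that the use case itself stays untouched while every call now leaves a log record with its outcome and duration.

```python
import functools
import logging
import time

logger = logging.getLogger("checkout")  # hypothetical service name


def instrumented(func):
    """Wrap func so that each call logs its outcome and duration."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            # logger.exception records the message together with the traceback.
            logger.exception("%s failed after %.1f ms", func.__name__, elapsed_ms)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s succeeded in %.1f ms", func.__name__, elapsed_ms)
        return result

    return wrapper


@instrumented
def place_order(order_id: str) -> str:
    # Hypothetical business logic standing in for a real use case.
    if not order_id:
        raise ValueError("order_id must not be empty")
    return f"order {order_id} accepted"


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(place_order("42"))
```

A decorator is only one option; the same records could be emitted inline or through a tracing library, depending on the codebase.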

Indeed, an SRE could become an SWE and vice versa, and that leads us to another principle that we find in the Google materials.

Common staffing pool

Another principle is hiring site reliability engineers and SWEs from the same staffing pool. This principle works well for companies where most employees are software developers and engineers, and having a shared pool means that site reliability and software engineering job roles are interchangeable. However, this principle may be much more challenging for enterprises with a mix of systems administrators and developers. Hence, we left it out of our list in the previous section.

We could compare the SRE’s unique profession to many others, but we limited this topic to the most common comparisons. SREs are not architects, developers, systems administrators, or data scientists; they are more than all of these roles combined. Up next, we are going to understand the primary responsibilities of an SRE.

Describing an SRE’s main responsibilities

We hope the SRE job role mission and scope are less foggy at this point. As an SRE, what would you be responsible for? In this section, we will investigate the most typical duties that SREs are accountable for. We’ve divided these responsibilities into two sections:

  • Operational work responsibilities
  • Engineering work responsibilities

Let’s start by reviewing the operational group first.

Operational work responsibilities

Site reliability engineers have work duties related to the process of managing systems. Such tasks are called operational work. SREs are not just accountable for operational work together with the operations team, but they also have the authority to execute their management processes.

First, they are responsible for the ITIL® processes, including incident, problem, and change management. That means they actively participate in on-call schedules for critical services downtime as first responders. They need to isolate the faulty components of the service, troubleshoot the causes of the component issues, repair them or provide a workaround, reestablish the affected service to nominal performance, and verify whether the service has been restored from the user’s perspective. After significant service disruptions, SREs must determine their root causes and contributing factors. They implement change requests to the systems, backing services, delivery pipelines, integrations, infrastructure, and applications.

Second, they are accountable for maintaining systems, services, applications, and infrastructure. They may need to patch a bug in production or assist the development team. SREs may have to deploy a new software version using a canary release, A/B testing, or blue-green deployment.
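As a rough illustration of the canary idea, the sketch below routes a fixed percentage of users to the new version and everyone else to the current one. It is a simplified model under our own assumptions, not how any particular load balancer or service mesh implements it: `route_version`, the `"canary"`/`"stable"` labels, and the hash-bucket scheme are all hypothetical.

```python
import hashlib


def route_version(user_id: str, canary_percent: int) -> str:
    """Return "canary" for roughly canary_percent% of users, else "stable".

    Hashing the user ID keeps the decision deterministic, so the same
    user keeps seeing the same version while the rollout percentage is fixed.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return "canary" if bucket < canary_percent else "stable"


if __name__ == "__main__":
    users = [f"user-{n}" for n in range(1000)]
    on_canary = sum(route_version(u, 10) == "canary" for u in users)
    print(f"{on_canary} of {len(users)} users routed to the canary")
```

Raising `canary_percent` step by step while watching the service's health indicators is what turns this routing rule into a gradual release.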

Third, SREs have the responsibility of taking care of the observability platform. That includes installing, configuring, maintaining, and monitoring the observability tools. Yes, we monitor the monitoring.

Engineering work responsibilities

SREs do engineering work to reach higher levels of availability, resiliency, performance, quality, and scalability in a system. They work on each configuration item or component to increase its reliability. By improving the reliability of each component, the overall system delivers more trustworthy services and meets its SLOs.
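A short calculation shows why per-component reliability matters for the whole system. The text does not define a specific reliability index, so the sketch below assumes it is availability, and further assumes that the components sit in series (a request needs all of them) and fail independently. Under those assumptions, the system availability is the product of the component availabilities. The function name `serial_availability` is ours.

```python
import math
from typing import Iterable


def serial_availability(component_availabilities: Iterable[float]) -> float:
    """Availability of a chain of independent components that are all required."""
    values = list(component_availabilities)
    if any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("each availability must be between 0 and 1")
    return math.prod(values)


if __name__ == "__main__":
    # Hypothetical figures for a load balancer, an application tier, and a database.
    chain = [0.999, 0.999, 0.999]
    print(f"system availability: {serial_availability(chain):.4%}")
```

Three components at 99.9% each yield roughly 99.7% for the chain, so the system as a whole can never be more available than its weakest required component.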

Site reliability engineers are responsible for reliability metrics, such as the mean time to detect (MTTD) and mean time to repair (MTTR). MTTD indicates how fast the monitoring system can detect a service problem or an anomaly that will lead to a problem if nothing is done. MTTR indicates how swiftly an incident is repaired after it’s detected. Those metrics make SREs accountable for the effectiveness of the observability platform and tools, and the runbooks documentation.
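Both metrics are plain averages over incident records, as the following sketch shows. The `Incident` fields are hypothetical; real incident management tools expose their own schemas. MTTR is measured here from detection to repair, following the definition above. Some organizations measure it from the start of the problem instead.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, List


@dataclass
class Incident:
    started: datetime   # when the problem actually began
    detected: datetime  # when monitoring raised it
    repaired: datetime  # when the service was restored


def _mean_delta(deltas: Iterable[timedelta]) -> timedelta:
    """Average a collection of timedeltas."""
    return timedelta(seconds=mean(d.total_seconds() for d in deltas))


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean time to detect: average of (detected - started)."""
    return _mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to repair: average of (repaired - detected)."""
    return _mean_delta(i.repaired - i.detected for i in incidents)


if __name__ == "__main__":
    t0 = datetime(2023, 4, 28, 12, 0)
    history = [
        Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=24)),
        Incident(t0, t0 + timedelta(minutes=6), t0 + timedelta(minutes=46)),
    ]
    print("MTTD:", mttd(history))
    print("MTTR:", mttr(history))
```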

The mean time between failures (MTBF) is another reliability metric under SRE accountability. It indicates how much time elapses, on average, between one system failure and the next. SREs must adopt the blameless postmortems principle to improve this metric every time a failure happens. That translates into multiple reliability enhancements to different parts of the system as a result of these postmortems.
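As a hedged worked example, MTBF is commonly computed as total operating time divided by the number of failures, and it combines with the mean downtime per failure into a steady-state availability estimate of MTBF / (MTBF + MTTR). The function names and figures below are our own. Note that the formula treats MTTR as the mean downtime per failure, which is an assumption on top of the definitions in this section.

```python
def mtbf_hours(total_uptime_hours: float, failure_count: int) -> float:
    """Mean time between failures: operating time divided by number of failures."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to compute MTBF")
    return total_uptime_hours / failure_count


def steady_state_availability(mtbf: float, mttr: float) -> float:
    """Fraction of time the system is up, given MTBF and MTTR in the same unit."""
    if mtbf < 0 or mttr < 0 or mtbf + mttr == 0:
        raise ValueError("mtbf and mttr must be non-negative and not both zero")
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    # Hypothetical month: 720 hours of operation with 3 failures, 1 hour each to repair.
    mtbf = mtbf_hours(720, 3)
    print(f"MTBF: {mtbf} hours")
    print(f"availability: {steady_state_availability(mtbf, 1.0):.4%}")
```

The formula makes the two levers visible: availability improves either by failing less often (higher MTBF) or by recovering faster (lower MTTR).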

SREs are accountable for toil management. The less toil we have in systems management, the better the metrics mentioned previously. Site reliability engineers work tirelessly to detect and eliminate repetitive tasks devoid of business value.

We described the ordinary responsibilities of an SRE with the intent of giving you an idea of what to expect in this career. Of course, this is not a comprehensive list of duties or intended to suggest a constraint to their responsibilities. As long as they work to fulfill the guiding principles, they are doing SRE work. We are going to review which activities SREs execute daily next.

An overview of the daily activities of an SRE

Now that we have examined SRE responsibilities, it’s time to check what you, as an SRE, should be performing on a frequent basis. There’s no better way to understand a profession than by asking what someone does in it. When you go to a job interview, you probably want to know the activities a person in that position will carry out. SREs will have a list of assignments as sticky notes on their displays. We have separated those notable activities into two sections:

  • Reactive work activities
  • Proactive work activities

We’ll start by understanding reactive activities.

Reactive work activities

SREs execute many tasks that don’t lift (or shift) system reliability directly; they are usually operational types of work. Nevertheless, those activities either lessen the service downtime or mitigate risks. Examples of jobs that SREs perform daily in this category are as follows:

  • Repair or restore a system or multiple services to their original state
  • Follow and execute instructions from a runbook (standard operating procedure) during an incident to diagnose the application
  • Implement a change request to apply a patch to a software component
  • Attend a meeting to run a postmortem with system administrators and developers about the recent service or system outage
  • Install a new Kubernetes cluster for a new application according to the development team’s specifications and enable monitoring of it
  • Configure a new cloud-based service for a new application following the architecture design and include it in cloud monitoring
  • Deploy a new software release to VMs and execute the testing scripts

Proactive work activities

SREs also carry out jobs that improve the quality, scalability, observability, manageability, resiliency, or availability of a system or service. Since those tasks increase the reliability levels of specific systems or services, they are considered proactive and mostly engineering type of work. Such assignments affect toil and technical debt. Examples of this category are as follows:

  • Maintain a runbook on how to diagnose problems with a specific application
  • Design and develop an automaton to execute procedures previously documented in a runbook automatically
  • Establish, together with the DevOps team, the release strategy, such as a canary release, A/B testing, or blue-green deployment
  • Work with the SWE to add management code to the application so SREs can instruct the application to do self-administration or self-healing operations
  • Work with the development team to adopt an immutable infrastructure philosophy into the application-building process
  • Instrument the application code to increase its observability with logs and traces
  • Design and implement observability to obtain good metrics, events, logs, and traces from a critical application
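The step from a documented runbook to an automated one can be sketched as follows. This is a deliberately small model under our own assumptions: a runbook is an ordered list of named checks, each a function returning `True` or `False`, and execution stops at the first failing step so a human can take over. `run_runbook` and the example checks are hypothetical names, and the checks are stubs rather than real probes.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_runbook(steps: List[Step]) -> List[Tuple[str, bool]]:
    """Run steps in order and stop at the first one that fails.

    Returns the (step name, passed) pairs for every step that was executed.
    """
    results: List[Tuple[str, bool]] = []
    for name, check in steps:
        passed = check()
        results.append((name, passed))
        if not passed:
            break  # hand over to the on-call engineer at this point
    return results


if __name__ == "__main__":
    # Stub checks standing in for real probes (DNS lookup, health endpoint, and so on).
    diagnose_app: List[Step] = [
        ("dns resolves", lambda: True),
        ("health endpoint responds", lambda: False),
        ("database reachable", lambda: True),
    ]
    for name, passed in run_runbook(diagnose_app):
        print(f"{name}: {'ok' if passed else 'FAILED'}")
```

Keeping each step as a small function with a descriptive name means the automated runbook still reads like the document it replaces.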

Note

Site reliability engineers perform many more activities than the ones listed here. This is not a comprehensive list; the only intention is to show you how SREs work across multiple dimensions and aspects of systems and services.

We listed what an SRE does frequently. We wanted to give you a good sense of their day-to-day activities and how they differ from those of other roles. Again, this is not a complete or closed list. We want to close this chapter by telling you who our SRE rockstars are.

People that inspire

We want to finalize this chapter by pointing out other SREs that have inspired us and have been encouraging the wider community. We couldn’t even think about starting this book without the work of the parents of site reliability engineering at Google. We are immensely grateful to them. Site reliability engineering would probably not exist outside Google if they had chosen not to share their thoughts, principles, techniques, and practices through the site reliability engineering foundation books. They are mandatory reading for anyone following this career path. If you haven’t read them yet, please check out Google’s site reliability engineering books at this site: https://sre.google/books/.

We want to recognize a few other rockstar SREs that have really made a difference in our professional lives as individuals. They are trailblazers of site reliability engineering outside Google.

Jeremy’s recognition – Paul Tyma, former CTO, LendingTree

In technology, finding your way can be difficult. The constant struggle of being an SRE leads us into discussions of what went wrong; often, we have to say what some don’t want to hear – that a negative thing happened due to what a person or team did or didn’t do. We are, in fact, often the bearers of bad news. Paul opened the door for me to become an SRE, and we drove a great reliability revolution together. Most importantly, he taught me that there is a balance to all things, and we have a choice in that balance. And what we often consider a responsibility or duty can have its limits.

Rod’s recognition – Ingo Averdunk, Distinguished Engineer, IBM, and Gene Brown, Distinguished Engineer, Kyndryl

Ingo and Gene triggered a small revolution inside IBM by designing and deploying site reliability engineering principles, practices, professions, and methodologies to its organizations across the globe. They first transformed many internal teams to adopt such extraordinary tenets, then later, they helped external customers in doing the same. Of course, they didn’t accomplish this alone, but they were (and are) paramount examples of technical executive leadership. They shaped the site reliability engineering profession from within IBM, which later spread to Kyndryl after its spin-off.

Summary

In this chapter, we learned what the site reliability engineering persona looks like – what they know and how they think. We divided their mindset into smaller traits so you can understand what’s expected of you on this journey. Also, we explained why the site reliability engineering profession is unique. Then, we entered the practical knowledge sections and looked at their main responsibilities and daily activities.

By now, you should be able to explain why someone would become an SRE and the path for that. You can differentiate the site reliability engineering job role from others and understand what the site reliability engineering persona is. Finally, you know the typical duties of SREs, and the activities most SREs perform. As we close this chapter, we hope everything we say here resonates with your career aspirations.

In the next chapter, we dive into the world of numbers and understand how they are intrinsically part of the site reliability engineering domain. You will learn why SREs see production systems like no one else.

Further reading

You can read more about the site reliability engineering profession and other topics in the Awesome Site Reliability Engineering repository: https://github.com/dastergon/awesome-sre.

Key benefits

  • Understand the goals of an SRE in terms of reliability, efficiency, and constant improvement
  • Master highly resilient architecture in server, serverless, and containerized workloads
  • Learn the why and when of employing Kubernetes, GitHub, Prometheus, Grafana, Terraform, Python, Argo CD, and GitOps

Description

Site reliability engineering is all about continuous improvement, finding the balance between business and product demands while working within technological limitations to drive higher revenue. But quantifying and understanding reliability, handling resources, and meeting developer requirements can sometimes be overwhelming. With a focus on reliability from an infrastructure and coding perspective, Becoming a Rockstar SRE brings forth the site reliability engineer (SRE) persona using real-world examples. This book will acquaint you with the role of an SRE, followed by the why and how of site reliability engineering. It walks you through the jobs of an SRE, from the automation of CI/CD pipelines and reducing toil to reliability best practices. You’ll learn what creates bad code and how to circumvent it with reliable design and patterns. The book also guides you through interacting and negotiating with businesses and vendors on various technical matters and exploring observability, outages, and why and how to craft an excellent runbook. Finally, you’ll learn how to elevate your site reliability engineering career, including certifications and interview tips and questions. By the end of this book, you’ll be able to identify and measure reliability, reduce downtime, troubleshoot outages, and enhance productivity to become a true rockstar SRE!

Who is this book for?

This book is for IT professionals, including developers looking to advance into an SRE role, system administrators mastering technologies, and executives experiencing repeated downtime in their organizations. Anyone interested in bringing reliability and automation to their organization to drive down customer impact and revenue loss while increasing development throughput will find this book useful. A basic understanding of API and web architecture and some experience with cloud computing and services will assist with understanding the concepts covered.

What you will learn

  • Get insights into the SRE role and its evolution, starting from Google's original vision
  • Understand the key terms, such as golden signals, SLO, SLI, MTBF, MTTR, and MTTD
  • Overcome the challenges in adopting site reliability engineering
  • Employ reliable architecture and deployments with serverless, containerization, and release strategies
  • Identify monitoring targets and determine observability strategy
  • Reduce toil and leverage root cause analysis to enhance efficiency and reliability
  • Realize how business decisions can impact quality and reliability
Product Details

Publication date: Apr 28, 2023
Length: 420 pages
Edition: 1st
Language: English
ISBN-13: 9781803239224

Table of Contents

Part 1 - Understanding the Basics of Who, What, and Why
Chapter 1: SRE Job Role – Activities and Responsibilities
Chapter 2: Fundamental Numbers – Reliability Statistics
Chapter 3: Imperfect Habits – Duct Tape Architecture and Spaghetti Code
Part 2 - Implementing Observability for Site Reliability Engineering
Chapter 4: Essential Observability – Metrics, Events, Logs, and Traces (MELT)
Chapter 5: Resolution Path – Master Troubleshooting
Chapter 6: Operational Framework – Managing Infrastructure and Systems
Chapter 7: Data Consumed – Observability Data Science
Part 3 - Applying Architecture for Reliability
Chapter 8: Reliable Architecture – Systems Strategy and Design
Chapter 9: Valued Automation – Toil Discovery and Elimination
Chapter 10: Exposing Pipelines – GitOps and Testing Essentials
Chapter 11: Worker Bees – Orchestrations of Serverless, Containers, and Kubernetes
Chapter 12: Final Exam – Tests and Capacity Planning
Part 4 - Mastering the Outage Moments
Chapter 13: First Thing – Runbooks and Low Noise Outage Notifications
Chapter 14: Rapid Response – Outage Management Techniques
Chapter 15: Postmortem Candor – Long-Term Resolution
Part 5 - Looking into Future Trends and Preparing for SRE Interviews
Chapter 16: Chaos Injector – Advanced Systems Stability
Chapter 17: Interview Advice – Hiring and Being Hired
Index
Other Books You May Enjoy

Customer reviews

Overall rating: 5 out of 5 (10 ratings; 100% 5-star)

Pavlos Ratis, Jun 10, 2023 (5 out of 5, Amazon verified review)
Becoming a Rockstar SRE is an excellent resource for anyone who wants to learn more about the role of an SRE. The book is comprehensive, well-written, and full of practical advice. It is an informative resource that will give you a solid foundation in the field.

Cloud, Apr 28, 2023 (5 out of 5, Amazon verified review)
This book is a thorough examination of important principles, practices, and practical guidelines for those in the SRE profession. The inclusion of lab exercises provides a hands-on way to demonstrate tools and concepts. This is a valuable resource for both experienced SREs and those starting out in this field.

Paul Tyma, Jul 11, 2023 (5 out of 5, Amazon verified review)
The term "SRE" has been an evolving idea for a long time. I've also found it hard to encapsulate what it completely entails in one place. It's clear the Authors have "been there and done that" with regard to building, managing, and maintaining large sites and systems. You can go as far down the rabbit hole of reliability as you like, but this book gives a no-nonsense guide that explains the trade-offs. I had a good few "I hadn't thought of that" moments that made it worthwhile. If you're interested in understanding a big picture of the SRE world, this book is a great way to help you get there.

MK, Nov 04, 2023 (5 out of 5, Amazon verified review)
The book dwells into different details of SRE principles and guidelines and clearly demonstrates the training an SRE should have in todays workforce for IT Ops. Great work touching all aspects from SRE organization structure, interview prep, coding samples, ability to download online copy, apart from design principles and resiliency patterns makes it a complete copy for beginner to experts to get something out of this book.

Rodrigo Anami, May 25, 2023 (5 out of 5, Amazon verified review)
Yes, I co-authored this book and believe we did well! It's intended for any software developer or system administrator that wants to become a site reliability engineer. Jeremy and I took this journey, so we have many things to share. I received five copies from Packt but decided to buy another copy of this book. You know, to test the system's reliability. It was interesting to see the first copies were printed in the UK while this last one was published in the US. The content is exactly the same except for the last page, which has this information about where it was printed. Nevertheless, people started asking me why I was buying my own book. Of course, I jokingly said I needed to do so by contract to fulfill its clauses. You should see their faces.
Get free access to Packt library with over 7500+ books and video courses for 7 days!
Start Free Trial

FAQs

What is the delivery time and cost of print book? Chevron down icon Chevron up icon

Shipping Details

USA:

'

Economy: Delivery to most addresses in the US within 10-15 business days

Premium: Trackable Delivery to most addresses in the US within 3-8 business days

UK:

Economy: Delivery to most addresses in the U.K. within 7-9 business days.
Shipments are not trackable

Premium: Trackable delivery to most addresses in the U.K. within 3-4 business days!
Add one extra business day for deliveries to Northern Ireland and Scottish Highlands and islands

EU:

Premium: Trackable delivery to most EU destinations within 4-9 business days.

Australia:

Economy: Can deliver to P. O. Boxes and private residences.
Trackable service with delivery to addresses in Australia only.
Delivery time ranges from 7-9 business days for VIC and 8-10 business days for Interstate metro
Delivery time is up to 15 business days for remote areas of WA, NT & QLD.

Premium: Delivery to addresses in Australia only
Trackable delivery to most P. O. Boxes and private residences in Australia within 4-5 days based on the distance to a destination following dispatch.

India:

Premium: Delivery to most Indian addresses within 5-6 business days

Rest of the World:

Premium: Countries in the American continent: Trackable delivery to most countries within 4-7 business days

Asia:

Premium: Delivery to most Asian addresses within 5-9 business days

Disclaimer:
All orders received before 5 PM U.K time would start printing from the next business day. So the estimated delivery times start from the next day as well. Orders received after 5 PM U.K time (in our internal systems) on a business day or anytime on the weekend will begin printing the second to next business day. For example, an order placed at 11 AM today will begin printing tomorrow, whereas an order placed at 9 PM tonight will begin printing the day after tomorrow.


Unfortunately, due to several restrictions, we are unable to ship to the following countries:

  1. Afghanistan
  2. American Samoa
  3. Belarus
  4. Brunei Darussalam
  5. Central African Republic
  6. The Democratic Republic of Congo
  7. Eritrea
  8. Guinea-Bissau
  9. Iran
  10. Lebanon
  11. Libyan Arab Jamahiriya
  12. Somalia
  13. Sudan
  14. Russian Federation
  15. Syrian Arab Republic
  16. Ukraine
  17. Venezuela

What is a customs duty/charge?

Customs duties are charges levied on goods when they cross international borders; they are a tax imposed on imported goods. These duties are charged by special authorities and bodies created by local governments and are meant to protect local industries, economies, and businesses.

Do I have to pay customs charges for the print book order?

Orders shipped to the countries listed under EU27 will not bear customs charges. These are paid by Packt as part of the order.

List of EU27 countries: www.gov.uk/eu-eea

For shipments to countries outside the EU27, customs duty or localized taxes may apply and would be charged by the recipient country. These charges must be paid by the customer and are not included in the shipping charges on the order.

How do I know my customs duty charges?

The amount of duty payable varies greatly depending on the imported goods, the country of origin, and several other factors, such as the total invoice amount, dimensions such as weight, and other criteria applicable in your country.

For example:

  • If you live in Mexico and the declared value of your ordered items is over $50, you will have to pay an additional import tax of 19% (for example, $9.50 on a declared value of $50) to the courier service to receive your package.
  • If you live in Turkey and the declared value of your ordered items is over €22, you will have to pay an additional import tax of 18% (for example, €3.96 on a declared value of €22) to the courier service to receive your package.
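
The arithmetic behind these two examples can be reproduced with a short Python sketch. It is an illustration only: the function names are hypothetical, and the rule that the whole declared value is taxed once it exceeds the threshold is an assumption, not something stated by the customs authorities or by Packt.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def percentage_of(value: str, rate_percent: str) -> Decimal:
    """Return rate_percent % of value, rounded to two decimal places."""
    amount = Decimal(value) * Decimal(rate_percent) / Decimal("100")
    return amount.quantize(CENT, rounding=ROUND_HALF_UP)


def estimate_import_tax(declared_value: str, threshold: str, rate_percent: str) -> Decimal:
    """Estimate import tax under an assumed rule (hypothetical helper).

    Assumption: nothing is due at or below the threshold, and the whole
    declared value is taxed once it is over the threshold.
    """
    if Decimal(declared_value) <= Decimal(threshold):
        return Decimal("0.00")
    return percentage_of(declared_value, rate_percent)


# Figures quoted in the examples above
print(percentage_of("50", "19"))  # 9.50 (19% of $50, Mexico)
print(percentage_of("22", "18"))  # 3.96 (18% of €22, Turkey)

# A made-up order of $60 shipped to Mexico, under the assumed rule
print(estimate_import_tax("60", "50", "19"))  # 11.40
```

Actual charges depend on the rules in your country, so treat any result as a rough estimate.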

How can I cancel my order?

Cancellation Policy for Published Printed Books:

You can cancel any order within 1 hour of placing it. Simply contact [email protected] with your order details or payment transaction ID. If your order has already started the shipment process, we will do our best to stop it. However, if it is already on its way to you, you can contact us at [email protected] once you receive it and use the returns and refund process.

Please understand that Packt Publishing cannot provide refunds or cancel any order except in the cases described in our Return Policy (i.e. Packt Publishing agrees to replace your printed book because it arrived damaged or with a material defect). Otherwise, Packt Publishing will not accept returns.

What is your returns and refunds policy?

Return Policy:

We want you to be happy with your purchase from Packtpub.com. We will not hassle you with returning print books to us. If the print book you receive from us is incorrect, damaged, doesn't work, or is unacceptably late, please contact the Customer Relations Team at [email protected] with the order number and issue details as explained below:

  1. If you ordered an item (eBook, Video, or Print Book) incorrectly or accidentally, please contact the Customer Relations Team at [email protected] within one hour of placing the order and we will replace the item or refund you the item cost.
  2. If your eBook or Video file is faulty, or a fault occurs while the eBook or Video is being made available to you (i.e. during download), contact the Customer Relations Team at [email protected] within 14 days of purchase and they will be able to resolve the issue for you.
  3. You will have a choice of replacement or refund for the problem items (damaged, defective, or incorrect).
  4. Once the Customer Care Team confirms that you will be refunded, you should receive the refund within 10 to 12 working days.
  5. If you are requesting a refund for only one book from a multiple-item order, we will refund you the appropriate single item.
  6. Where the items were shipped under a free shipping offer, there will be no shipping costs to refund.

On the off chance your printed book arrives damaged or with a material defect, contact our Customer Relations Team at [email protected] within 14 days of receipt of the book with appropriate evidence of the damage, and we will work with you to secure a replacement copy if necessary. Please note that each printed book you order from us is individually made by Packt's professional book-printing partner on a print-on-demand basis.

What tax is charged?

Currently, no tax is charged on the purchase of any print book (subject to change based on the laws and regulations). A localized VAT fee is charged only to our European and UK customers on eBooks, Video and subscriptions that they buy. GST is charged to Indian customers for eBooks and video purchases.

What payment methods can I use?

You can pay using the following methods:

  1. Visa Debit
  2. Visa Credit
  3. MasterCard
  4. PayPal