An Introduction to Amazon EventBridge

Amazon EventBridge is a fully managed serverless event bus that allows you to send events from multiple event producers, apply event filtering to detect events, perform data transformation where needed, and route events to one or more target applications or services (see Figure 3-39). It’s one of the core fully managed and serverless services from AWS that plays a pivotal role in architecting and building event-driven applications. As an architect or a developer, familiarity with the features and capabilities of EventBridge is crucial. If you are already familiar with EventBridge and its capabilities, you may skip this section.

Figure 3-39. The components of Amazon EventBridge (source: adapted from an image on the Amazon EventBridge web page)

The technical ecosystem of EventBridge can be divided into two main categories. The first comprises its primary functionality, such as:

  • The interface for ingesting events from various sources (applications and services)
  • The interface for delivering events to configured target applications or services (consumers)
  • Support for multiple custom event buses as event transportation channels
  • The ability to configure rules to identify events and route them to one or more targets

The second consists of features that are auxiliary (but still important), including:

  • Support for archiving and replaying events
  • The event schema registry
  • EventBridge Scheduler for scheduling tasks on a one-time or recurring basis
  • EventBridge Pipes for one-to-one event transport needs

Let’s take a look at some of these items, to give you an idea of how to get started with EventBridge.

Event buses in Amazon EventBridge

Every event sent to EventBridge is associated with an event bus. If you consider EventBridge as the overall event router ecosystem, then event buses are individual channels of event flow. Event producers choose which bus to send the events to, and you configure event routing on each bus.

The EventBridge service in every AWS account has a default event bus. AWS uses the default bus for all events from several of its services.

You can also create one or more custom event buses for your needs. In addition, to receive events from AWS EventBridge partners, you can configure a partner event source and send events to a partner event bus.
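
To make this concrete, here is a minimal sketch using Python and boto3 that creates a custom event bus and publishes an event to it. The bus name and event fields are hypothetical, and in practice you would usually provision the bus with infrastructure as code rather than the SDK:

import json

import boto3

events = boto3.client("events")

# One-time setup: create a custom event bus.
events.create_event_bus(Name="orders-bus")

# Publish an event to the custom bus.
response = events.put_events(
    Entries=[
        {
            "EventBusName": "orders-bus",
            "Source": "com.example.orders",   # identifies the producer
            "DetailType": "OrderCreated",     # classifies the event
            "Detail": json.dumps({"orderId": "1234", "total": 59.99}),
        }
    ]
)
print(response["FailedEntryCount"])  # 0 if the event was accepted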

Event producers and event publishing best practices

Event producers are applications that create and publish events. As you develop on AWS, you publish your events to one of your custom event buses on Amazon EventBridge. Here are some best practices for event publishers to follow:

Event publishers should be agnostic of the consumers of their events.

One of the golden rules in event-driven architecture is that event producers remain agnostic of their consumers. Producers should not make assumptions about who or what might consume their events, nor tailor the event data to particular consumers. This agnosticism keeps applications decoupled, which is one of the main benefits of EDA.

Tip

In its pure form, consumer agnosticism suggests a publish-and-forget model. In reality, as you develop granular serverless microservices, things can be different. There will be situations (still within the construct of loosely coupled services) where a publisher wants to know the outcome of a downstream consumer's handling of an event so that it can update its status for recordkeeping, trigger an action, and so on. The event types listed in “Differentiating event categories from event types” can serve as indicators for this purpose.

Every event should carry a clear identification of its origin.

The domain, service, function, and similar details are important information for identifying the origin of an event. Not every event needs to follow a strict hierarchical naming of its origin, but carrying this information benefits cross-domain consumers, which rely on it to set event filters as part of consumption.

In a secured and regulated environment, teams apply event encryption measures to protect data privacy. Often, third-party systems sign the event payload, and consumers perform IP address checks to validate the event origin before consumption.

Treat domain events as data contracts that conform to event schemas.

With distributed services, event producers should conform the events to the published schema definitions, treating them as the equivalent of API contracts.

Versioning your events is essential to avoid introducing breaking changes.

Event producers should adhere to an agreed structure for uniformity across the organization.

As discussed earlier, uniformity in the event structure at the organizational, domain, or department level helps make the development process smoother in many ways.

It may be challenging to create a standard format for your events at the outset. You can evolve it as you gain experience and learn from others. Allow enough flexibility within the overall design that teams can accommodate information specific to their needs.

An event should carry just the required data to denote the occurrence of the event.

Often it takes time to decide on the content of an event. If you follow the structure shown earlier, with metadata and data sections, start with the metadata, as you may already have clarity on most of those fields.

Begin from the context of when and where the event occurred, and build from there. It’s a good practice to include a minimal set of shareable data that is just enough to understand the event as an entity.

Event producers should add a unique tracing identifier for each event.

Including a unique identifier that can travel with the event to its consumers improves your application’s tracing capabilities and observability.
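
Pulling these practices together, the sketch below shows one hypothetical shape for such an event, with separate metadata and data sections; the field names are illustrative, not a prescribed standard:

import json
import uuid
from datetime import datetime, timezone

import boto3

events = boto3.client("events")

# Hypothetical envelope: metadata identifies the origin, schema version,
# and tracing identifier; data carries the minimal shareable payload.
detail = {
    "metadata": {
        "domain": "rewards",
        "service": "reward-service",
        "version": "1.0",                      # event schema version
        "traceId": str(uuid.uuid4()),          # unique tracing identifier
        "occurredAt": datetime.now(timezone.utc).isoformat(),
    },
    "data": {
        "customerId": "C-1001",
        "rewardId": "R-2002",
    },
}

events.put_events(
    Entries=[
        {
            "EventBusName": "rewards-bus",
            "Source": "rewards.reward-service",
            "DetailType": "RewardRedeemed",
            "Detail": json.dumps(detail),
        }
    ]
)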

Be aware of the event payload size limit and service quota.

The maximum payload size of an event in Amazon EventBridge is 256 KB (at the time of writing). In high-volume event publishing use cases, consider the limit on how many events you can send to EventBridge per second, and have measures in place to avoid losing critical events if you exceed this limit.
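
As an illustration of such a measure, PutEvents reports partial failures per entry, so a producer can retry just the entries that failed. The following is a minimal sketch with an arbitrary backoff, not a production-ready retry policy:

import time

import boto3

events = boto3.client("events")

def publish_with_retry(entries, max_attempts=3):
    """Publish a batch of events, retrying only the entries that failed."""
    for attempt in range(max_attempts):
        response = events.put_events(Entries=entries)
        if response["FailedEntryCount"] == 0:
            return
        # Keep only the entries whose result carries an ErrorCode
        # (e.g., a throttling error when the quota is exceeded).
        entries = [
            entry
            for entry, result in zip(entries, response["Entries"])
            if "ErrorCode" in result
        ]
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError(f"{len(entries)} events failed after {max_attempts} attempts")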

Tip

When you publish events with sensitive data, you can add a metadata attribute—say, severity—to indicate the level of severity of the risk of this data being exposed, with values like RED, AMBER, and GREEN. You can then implement logic to prevent certain subscribers from receiving high-severity events, for example.

The gatekeeper event bus pattern described in Chapter 5 can make use of the severity classification of events to consider encryption measures when sharing events outside of its domain.
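
As a hypothetical sketch of the first idea, a rule for a less privileged subscriber could use an event pattern that matches only lower-severity events, so RED events are never delivered to that target (the attribute location and bus name are assumptions):

import json

import boto3

events = boto3.client("events")

# Deliver only GREEN and AMBER events to this subscriber; RED events
# do not match the pattern and are withheld.
events.put_rule(
    Name="non-sensitive-events",
    EventBusName="orders-bus",
    EventPattern=json.dumps(
        {"detail": {"metadata": {"severity": ["GREEN", "AMBER"]}}}
    ),
    State="ENABLED",
)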

Uses for event sourcing

Although early thoughts on event sourcing focused on the ability to re-create the current state of an entity, many modern implementations use event sourcing for additional purposes, including:

Re-creating user session activities in a distributed event-driven system

Many applications capture user interactions in timeboxed sessions. A session usually starts at the point of a user signing into the application and stays active until they sign out, or the session expires.

Event sourcing is valuable here to help users resume from where they left off or resolve any queries or disputes, as the system can chart each user’s journey.

Enabling audit tracing in situations where you cannot fully utilize logs

While many applications rely on accumulated, centrally stored logs to trace details of system behaviors, customer activities, financial data flows, etc., enterprises must comply with data privacy policies that prevent them from sending sensitive data and PII to the logs. With event sourcing, because the data resides inside guarded cloud accounts, teams can build tools to reconstruct these flows from the event store.

Performing data analysis to gain insights

Data is a key driver behind many decisions in the modern digital business world. Event sourcing enables deeper insights and analytics at a fine-grained level. For example, the event store of a holiday booking system harvests every business event from several microservices that coordinate to help customers book their vacations. Often customers will spend time browsing through several destinations, offers, and customizable options, among other things, before completing the booking or, in some cases, abandoning it. The events that occur during this process carry clues that can be used, for example, to identify popular (and unpopular) destinations, packages, and offers.
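
To make the replay idea concrete, here is a minimal sketch that folds a stream of stored events into the current state of a hypothetical booking; the event types and fields are illustrative:

def replay(events):
    """Rebuild the current state of a booking by applying its events in order."""
    state = {"status": "none", "destination": None, "viewed": []}
    for event in events:
        if event["type"] == "DestinationViewed":
            state["viewed"].append(event["destination"])
        elif event["type"] == "BookingStarted":
            state["status"] = "in-progress"
            state["destination"] = event["destination"]
        elif event["type"] == "BookingCompleted":
            state["status"] = "booked"
        elif event["type"] == "BookingAbandoned":
            state["status"] = "abandoned"
    return state

# The same event store serves state reconstruction and analytics: here the
# browsing history reveals interest in Crete before the booking was abandoned.
history = [
    {"type": "DestinationViewed", "destination": "Lisbon"},
    {"type": "DestinationViewed", "destination": "Crete"},
    {"type": "BookingStarted", "destination": "Crete"},
    {"type": "BookingAbandoned"},
]
print(replay(history))  # {'status': 'abandoned', 'destination': 'Crete', ...}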

Note

Since event sourcing was conceived a couple of decades ago, the emergence of the cloud and managed services has vastly changed the volume of data captured and the available ingestion mechanisms and storage options. The data models of many (but not all) modern applications accommodate storing the change history for a certain period alongside the actual data, as per the business requirements, to enable quick tracing of all activities.

EventStorming

One of the classic problems in software engineering is the gap between what the requirements specify and what gets implemented and delivered. Misunderstandings of business requirements and misalignments between what the business stakeholders want and what the engineering team actually builds are common in the software industry. Applying the first principles of serverless development brings clarity to what you are building, making it easier to align with the business needs. Developing iteratively and in small increments makes it easier to correct course when things go wrong, before it is too late and fixes become expensive.

You cannot expect every serverless engineer to have participated in requirements engineering workshops and UML modeling sessions or to understand domain-driven design. Often, engineers lack a complete understanding of why they are building what they are building. EventStorming is a collaborative activity that can help alleviate this problem.

What is EventStorming?

EventStorming is a collaborative, non-technical workshop format that brings together business and technology people to discuss, ideate, brainstorm, and model a business process or analyze a problem domain. Its inventor, Alberto Brandolini, drew his inspiration from domain-driven design. EventStorming is a fast, inexpensive activity that brings many thoughts to the board as a way of unearthing the details of a business domain using simple language that everybody understands. The two key elements of EventStorming are domain experts (contributors) and domain events (outcomes). Domain experts are subject matter experts (SMEs) who act as catalysts and leading contributors to the workshop. They bring domain knowledge to the process, answer questions, and explain business activities to everyone (especially the technical members). Domain events are significant events that reflect business facts at specific points. These events are identified and captured throughout the course of the workshop.

The EventStorming process looks at the business process as a series of domain events, arranges the events over a timeline, and depicts a story from start to finish. From the thoughts gathered and domain events identified, you begin to recognize the actors, commands, external systems, and, importantly, pivotal events that signal the change of context from one part to the other and indicate the border of a bounded context.

A command is a trigger or action that emits one or more domain events. For example, the success of a redeem reward command produces a reward-redeemed domain event. You will see the domain model emerging as aggregates (clusters of domain objects) as you identify the actors, commands, and domain events. In the previous example, the reward is an aggregate that receives a command and generates a domain event.
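
A minimal sketch of these relationships in code, mirroring the redeem reward example (the structure is illustrative, not a prescribed pattern):

from dataclasses import dataclass

@dataclass
class RedeemReward:
    """Command: a trigger or action."""
    customer_id: str
    reward_id: str

@dataclass
class RewardRedeemed:
    """Domain event: a business fact at a point in time."""
    customer_id: str
    reward_id: str

class Reward:
    """Aggregate: receives commands and generates domain events."""

    def __init__(self, reward_id: str, redeemed: bool = False):
        self.reward_id = reward_id
        self.redeemed = redeemed

    def handle(self, command: RedeemReward) -> RewardRedeemed:
        if self.redeemed:
            raise ValueError("Reward already redeemed")
        self.redeemed = True
        return RewardRedeemed(command.customer_id, command.reward_id)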

A full explanation of how to conduct an EventStorming workshop is beyond the scope of this book, but several resources are available. In addition to the ones listed on the EventStorming website, Vlad Khononov’s book Learning Domain-Driven Design (O’Reilly) has a chapter on EventStorming.

Security Can Be Simple

Given the stakes, ensuring the security of a software application can be a daunting task. Breaches of application perimeters and data stores are often dramatic and devastating. Besides the immediate consequences, such as data loss and the need for remediation, these incidents usually have a negative impact on trust between consumers and the business, and between the business and its technologists.

Security Challenges

Securing cloud native serverless applications can be particularly challenging for several reasons, including:

Managed services

Throughout this book, you will see that managed services are core to serverless applications and, when applied correctly, can support clear separation of concerns, optimal performance, and effective observability. While managed services provide a solid foundation for your infrastructure, as well as several security benefits—primarily through the shared responsibility model, discussed later in this chapter—the sheer number of them available to teams building on AWS presents a problem: in order to utilize (or even evaluate) a managed service, you must first understand the available features, pricing model, and, crucially, security implications. How do IAM permissions work for this service? How is the shared responsibility model applied to this service? How will access control and encryption work?

Configurability

An aspect all managed services share is configurability. Every AWS service has an array of options that can be tweaked to optimize throughput, latency, resiliency, and cost. The combination of services can also yield further optimizations, such as the employment of SQS queues between Lambda functions to provide batching and buffering. Indeed, one of the primary benefits of serverless that is highlighted in this book is granularity. As you’ve seen, you have the ability to configure each of the managed services in your applications to a fine degree. In terms of security, this represents a vast surface area for the inadvertent introduction of flaws like excessive permissions and privilege escalation.

Emergent standards

AWS delivers new services, new features, and improvements to existing features and services at a consistently high rate. These new services and features could either be directly related to application or account security or present new attack vectors to analyze and secure. There are always new levers to pull and more things to configure. The community around AWS and, in particular, serverless also moves at a relatively fast pace, with new blog posts, video tutorials, and conference talks appearing every day. The security aspect of software engineering perhaps moves slightly slower than other elements, but there is still a steady stream of advice from cybersecurity professionals along with regular releases of vulnerability disclosures and associated research. Keeping up with all the AWS product updates and the best practices when it comes to securing your ever-evolving application can easily become one of your biggest challenges.

While cloud native serverless applications present unique security challenges, there are also plenty of inherent benefits when it comes to securing this type of software. The architecture of serverless applications introduces a unique security framework and provides the potential to work in a novel way within this framework. You have a chance to redefine your relationship to application security. Security can be simple.

Next, let’s explore how to start securing your serverless application.

Getting Started

Establishing a solid foundation for your serverless security practice is pivotal. Security can, and must, be a primary concern. And it is never too late to establish this foundation.

As previously alluded to, security must be a clearly defined process. It is not a case of completing a checklist, deploying a tool, or deferring to other teams. Security should be part of the design, development, testing, and operation of every part of your system.

Working within sound security frameworks that fit well with serverless and adopting sensible engineering habits, combined with all the support and expertise of your cloud provider, will go a long way toward ensuring your applications remain secure.

When applied to serverless software, two modern security trends can provide a solid foundation for securing your application: zero trust and the principle of least privilege. The next section examines these concepts.

Once you have established a zero trust, least privilege security framework, the next step is to identify the attack surface of your applications and the security threats that they are vulnerable to. Subsequent sections examine the most common serverless threats and the threat modeling process.

Optimism Is Greater than Pessimism

The Optimism Otter says: “People in our organisation need to move fast to meet the needs of our customers. The job of security is to help them move fast AND stay secure.”

Serverless enables rapid development; security specialists should not only support this pace but keep up with it. They should enhance the safety and sustainability of the pace and, above all, not slow it down.

Software engineers should delegate to security professionals whenever there is a clear need, either through knowledge acquisition or services, such as penetration testing and vulnerability scanning.

Combining the Zero Trust Security Model with Least Privilege Permissions

There are two modern cybersecurity principles that you can leverage as the cornerstones of your serverless security strategy: zero trust architecture and the principle of least privilege.

Zero trust architecture

The basic premise of zero trust security is to assume every connection to your system is a threat. Every single interface should then be protected by a layer of authentication (who are you?) and authorization (what do you want?). This applies both to public API endpoints (the perimeter in the traditional castle-and-moat model) and to private, internal interfaces, such as Lambda functions or DynamoDB tables. Zero trust controls access to each distinct resource in your application, whereas a castle-and-moat model only controls access to the resources at the perimeter of your application.
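
For instance, with API Gateway a Lambda authorizer can authenticate and authorize every request before it reaches an internal resource. The sketch below stubs the token validation; a real implementation would verify, say, a JWT signature and its scopes:

def validate_token(token):
    """Stub: verify the token (e.g., a JWT) and return its claims, or None."""
    return None  # assume invalid until real verification is implemented

def handler(event, context):
    claims = validate_token(event.get("authorizationToken", ""))
    # Authentication (who are you?) and authorization (what do you want?)
    allowed = claims is not None and "orders:read" in claims.get("scope", "")
    return {
        "principalId": claims["sub"] if claims else "anonymous",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed else "Deny",
                "Resource": event["methodArn"],
            }],
        },
    }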

Imagine a knight errant galloping up to the castle walls, presenting likely-looking credentials to the guards and persuading them of their honorable intentions before confidently entering the castle across the lowered drawbridge. If these perimeter guards form the extent of the castle’s security, the knight is now free to roam the rooms, dungeons, and jewel store, collecting sensitive information for future raids or stealing valuable assets on the spot. If, however, each door or walkway had additional suspicious guards or sophisticated security controls that assumed zero trust by default, the knight would be entirely restricted and might even be deterred from infiltrating this castle at all.

Another scenario to keep in mind is a castle that cuts a single key for every heavy-duty door: should the knight gain access to one copy of this key, they’ll be able to open all the doors, no matter how thick or cumbersome. With zero trust, there’s a unique key for every door. Figure 4-2 shows how the castle-and-moat model compares to a zero trust architecture.

Figure 4-2. Castle-and-moat perimeter security compared to zero trust architecture

There are various applications of zero trust architecture, such as remote computing and enterprise network security. The next section briefly discusses how the zero trust model can be interpreted and applied to serverless applications.

The Power of AWS IAM

AWS IAM is the one service you will use everywhere—but it’s also often seen as one of the most complex. Therefore, it’s important to understand IAM and learn how to harness its power. (You don’t have to become an IAM expert, though—unless you want to, of course!)

The power of AWS IAM lies in roles and policies. Policies define the actions that can be taken on certain resources. For example, a policy could define the permission to put events onto a specific EventBridge event bus. Roles are collections of one or more policies. Roles can be attached to IAM users, but the more common pattern in a modern serverless application is to attach a role to a resource. In this way, an EventBridge rule can be granted permission to invoke a Lambda function, and that function can in turn be permitted to put items into a DynamoDB table.

IAM actions can be split into two categories: control plane actions and data plane actions. Control plane actions, such as CreateEventBus and CreateTable (e.g., used by an automated deployment role), manage resources. Data plane actions, such as PutEvents and GetItem (e.g., used by a Lambda execution role), interact with those resources.
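
To illustrate the distinction with DynamoDB (the table name is hypothetical): the first call below manages a resource, while the second interacts with the data inside it:

import boto3

dynamodb = boto3.client("dynamodb")

# Control plane: manage the resource itself (dynamodb:CreateTable).
dynamodb.create_table(
    TableName="orders",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)
dynamodb.get_waiter("table_exists").wait(TableName="orders")

# Data plane: interact with the data in the resource (dynamodb:PutItem).
dynamodb.put_item(
    TableName="orders",
    Item={"pk": {"S": "ORDER#1234"}, "total": {"N": "59.99"}},
)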

Let’s take a look at a simple IAM policy statement and the elements it is composed of:
{
  "Sid": "ListObjectsInBucket",             # Statement ID, optional identifier
                                            # for the policy statement
  "Action": "s3:ListBucket",                # AWS service API action(s) that will
                                            # be allowed or denied
  "Effect": "Allow",                        # Whether the statement should result
                                            # in an allow or a deny
  "Resource": "arn:aws:s3:::bucket-name",   # Amazon Resource Name (ARN) of the
                                            # resource(s) covered by the statement
  "Condition": {                            # Conditions for when a policy
                                            # is in effect
    "StringLike": {                         # Condition operator
      "s3:prefix": [                        # Condition key
        "photos/"                           # Condition value
      ]
    }
  }
}

See the AWS IAM documentation for full details of all the elements of an IAM policy.

Lambda execution roles

A key use of IAM roles in serverless applications is Lambda function execution roles. An execution role is attached to a Lambda function and grants the function the permissions necessary to execute correctly, including access to any other AWS resources that are required. For example, if the Lambda function uses the AWS SDK to make a DynamoDB request that inserts a record in a table, the execution role must include a policy with the dynamodb:PutItem action for the table resource.

The execution role is assumed by the Lambda service when performing control plane and data plane operations. The AWS Security Token Service (STS) is used to fetch short-lived, temporary security credentials, which are made available via the function’s environment variables during invocation.

Each function in your application should have its own unique execution role with the minimum permissions required to perform its duty. In this way, single-purpose functions (introduced in Chapter 6) are also key to security: IAM permissions can be tightly scoped to the function and remain extremely restricted according to the limited functionality.
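
As an illustrative sketch, attaching a least-privilege inline policy to a function's unique execution role might look like the following; the role, policy, table, and account identifiers are placeholders:

import json

import boto3

iam = boto3.client("iam")

# Grant this function's role only dynamodb:PutItem on a single table;
# every function gets its own role with a similarly narrow policy.
iam.put_role_policy(
    RoleName="create-order-fn-role",
    PolicyName="put-order-items",
    PolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "dynamodb:PutItem",
            "Resource": "arn:aws:dynamodb:eu-west-1:123456789012:table/orders",
        }],
    }),
)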

IAM guardrails

As you are no doubt beginning to notice, effective serverless security in the cloud is about basic security hygiene. Establishing guardrails for the use of AWS IAM is a core part of promoting a secure approach to everyday engineering activity. Here are some recommended guardrails:

Apply the principle of least privilege in policies.

IAM policies should only include the minimum set of permissions required for the associated resource to perform the necessary control or data plane operations. As a general rule, do not use wildcards (*) in your policy statements. Wildcards are the antithesis of least privilege, as they apply blanket permissions for actions and resources. Unless the action explicitly requires a wildcard, always be specific.

Avoid using managed IAM policies.

These are policies provided by AWS, and they’re often tempting shortcuts, especially when you’re just getting started or using a service for the first time. You can use these policies early in prototyping or development, but you should replace them with custom policies as soon as you understand the integration better. Because these policies are designed to be applied to generic scenarios, they are simply not restricted enough and will usually violate the principle of least privilege when applied to interactions within your application.

Prefer roles to users.

IAM users are issued with static, long-lived AWS access credentials (an access key ID and secret access key). These credentials can be used to directly access the application provider’s AWS account, including all the resources and data in that account. Depending on the associated IAM roles and policies, the authenticating user may even have the ability to create or destroy resources. Given the power they grant the holder, the use and distribution of static credentials must be limited to reduce the risk of unauthorized access. Where possible, restrict IAM users to an absolute minimum (or, even better, do not have any IAM users at all).

Prefer a role per resource.

Each resource in your application, such as an EventBridge rule, a Lambda function, and an SQS queue, should have its own unique role. Permissions for those roles should be fine-grained and least-privileged.

Serverless Threat Modeling

Before designing a comprehensive security strategy for any serverless application, it is crucial to understand the attack vectors and model potential threats. This can be done by clearly defining the surface area of the application, the assets worth securing, and the threats, both internal and external, to the application’s security.

As previously stated, security is a continuous process: there is no final state. To maintain the security of an application as it grows, threats must be constantly reviewed and attack vectors regularly assessed. New features are added over time, more users are serviced, and more data is collected. Threats will change, their severity will rise and fall, and application behavior will evolve. The tools available and industry best practices will also evolve, becoming more effective and focused in reaction to these changes.

Introduction to threat modeling

By this point you should have a fairly clear understanding of your security responsibilities, a foundational security framework, and the primary threats to serverless applications. Next, you need to map the framework and threats to your application and its services.

Threat modeling is a process that can help your team to identify attack vectors, threats, and mitigations through discussion and collaboration. It can support a shift-left (or even start-left) approach to security, where security is primarily owned by the team designing, building, and operating the application and is treated as a primary concern throughout the software development lifecycle. This is also sometimes referred to as DevSecOps.

To ensure continuous hardening of your security posture, threat modeling should be a process that you conduct regularly, for example at task refinement sessions. Threats should initially be modeled early in the solution design process (see Chapter 6) and focused at the feature or service level.

Tip

Threat Composer is a tool from AWS Labs that can help guide and visualize your threat modeling process.

Next you will be introduced to a framework that adds structure to the threat modeling process: STRIDE.

STRIDE

The STRIDE acronym describes six threat categories:

Spoofing

Pretending to be something or somebody other than who you are

Tampering

Changing data on disk, in memory, on the network, or elsewhere

Repudiation

Claiming that you were not responsible for an action

Information disclosure

Obtaining information that was not intended for you

Denial of service

Destruction or excessive consumption of finite resources

Elevation of privilege

Performing actions on protected resources that you should not be allowed to perform

STRIDE-per-element, or STRIDE/element for short, is a way to apply the STRIDE threat categories to elements in your application. It can help to further focus the threat modeling process.

The elements are targets of potential threats and are defined as:

  • Human actors/external entities
  • Processes
  • Data stores
  • Data flows

It is important not to get overwhelmed by the threat modeling process. Securing an application can be daunting, but remember, as outlined at the beginning of this chapter, it can also be simple, especially with serverless. Start small, work as a team, and follow the process one stage at a time. Identifying one threat for each element/threat combination in the matrix in Figure 4-4 would represent a great start.

Figure 4-4. Applying the STRIDE threat categories per element in your application

A process for threat modeling

In preparation for your threat modeling sessions, you may find that having the following information ready makes for productive meetings:

  • High-level architecture of the application
  • Solution design documents
  • Data models and schemas
  • Data flow diagrams
  • Domain-specific industry compliance standards

A typical threat modeling process will comprise the following steps:

  1. Identify the elements in your application that could be targets for potential threats, including data assets, external actors, externally accessible entry points, and infrastructure resources.
  2. Identify a list of threats for each element identified in step 1. Be sure to focus on threats and not mitigations at this stage.
  3. For each threat identified in step 2, identify appropriate steps that can be taken to mitigate the threat. This could include encryption of sensitive data assets, applying access control to external actors and entry points, and ensuring each resource is granted only the minimum permissions required to perform its operations.
  4. Finally, assess whether the agreed remediation adequately mitigates the threat or if there is any residual risk that should be addressed.

For a comprehensive threat modeling template, see Appendix C.