An Introduction to Amazon EventBridge

Amazon EventBridge is a fully managed serverless event bus that allows you to send events from multiple event producers, apply event filtering to detect events, perform data transformation where needed, and route events to one or more target applications or services (see Figure 3-39). It’s one of the core fully managed and serverless services from AWS that plays a pivotal role in architecting and building event-driven applications. As an architect or a developer, familiarity with the features and capabilities of EventBridge is crucial. If you are already familiar with EventBridge and its capabilities, you may skip this section.

Figure 3-39. The components of Amazon EventBridge (source: adapted from an image on the Amazon EventBridge web page)

The technical ecosystem of EventBridge can be divided into two main categories. The first comprises its primary functionality, such as:

  • The interface for ingesting events from various sources (applications and services)
  • The interface for delivering events to configured target applications or services (consumers)
  • Support for multiple custom event buses as event transportation channels
  • The ability to configure rules to identify events and route them to one or more targets

The second consists of features that are auxiliary (but still important), including:

  • Support for archiving and replaying events
  • The event schema registry
  • EventBridge Scheduler for scheduling tasks on a one-time or recurring basis
  • EventBridge Pipes for one-to-one event transport needs

Let’s take a look at some of these items, to give you an idea of how to get started with EventBridge.

Event buses in Amazon EventBridge

Every event sent to EventBridge is associated with an event bus. If you consider EventBridge as the overall event router ecosystem, then event buses are individual channels of event flow. Event producers choose which bus to send the events to, and you configure event routing on each bus.

The EventBridge service in every AWS account has a default event bus. AWS uses the default bus for all events from several of its services.

You can also create one or more custom event buses for your needs. In addition, to receive events from supported SaaS partner applications, you can configure a partner event source; the partner then sends its events to a partner event bus in your account.
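
To make this concrete, here is a minimal sketch in Python with boto3 that creates a custom event bus, adds a rule to match and route events, and publishes an event to the bus. The bus name, event source, detail type, and payload are illustrative:

import json

import boto3

events = boto3.client("events")

# Create a custom event bus to act as a dedicated event transportation channel.
events.create_event_bus(Name="orders-bus")

# A rule applies event filtering on the bus; matching events are routed to
# the rule's configured targets.
events.put_rule(
    Name="order-created-rule",
    EventBusName="orders-bus",
    EventPattern=json.dumps({
        "source": ["com.example.orders"],
        "detail-type": ["OrderCreated"],
    }),
)

# An event producer chooses which bus to send its events to.
events.put_events(
    Entries=[{
        "EventBusName": "orders-bus",
        "Source": "com.example.orders",
        "DetailType": "OrderCreated",
        "Detail": json.dumps({"orderId": "1234", "total": 42.5}),
    }]
)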

Event consumers and event consumption best practices

Event consumers are applications on the receiving end. They set up subscription policies to identify the events that are of interest to them. As you get started with Amazon EventBridge, here are a few tips and best practices for event consumers to keep in mind:

Consumer applications may receive duplicate events and should be idempotent.

In most event-driven systems, event delivery is guaranteed at least once (as opposed to exactly once or at most once). If you don’t properly account for this, it can have severe consequences. Imagine your bank account getting debited twice for a purchase you made!

Building idempotency into an application that reacts upon receipt of events is the most critical measure to implement.
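
A hedged sketch of one common approach follows, assuming a DynamoDB table named processed-events acts as a register of handled event IDs: a conditional write succeeds only the first time an event ID is seen, so duplicate deliveries are detected and skipped. This also illustrates the storage-first idea discussed next:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def process(event):
    ...  # placeholder for the consumer's business logic

def handle_event(event):
    try:
        # The write succeeds only if this event ID has not been seen before.
        dynamodb.put_item(
            TableName="processed-events",
            Item={"eventId": {"S": event["id"]}},
            ConditionExpression="attribute_not_exists(eventId)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return  # Duplicate delivery; safe to ignore.
        raise
    process(event)  # Business logic runs at most once per event ID.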

Storing the event data while processing it has benefits.

Depending on the event consumer’s logic, the event processing may happen in near real time or with a delay. A practice often adopted by event subscribers is to store the event data—temporarily or long-term—before acting on it. This is a form of storage-first pattern, which you will learn about in Chapter 5.

There are several benefits to this practice. Primarily, it helps to alleviate the problem of events potentially being received more than once by providing a register that can be checked for duplicates before handling an event. In addition, storing the events eases the retry burden on the consumer application; if a downstream application goes down, for example, it won’t need to request that the producer resend all of the events that application needs to process.

Ordering of events is not guaranteed.

Maintaining the order of events in a distributed event-driven architecture with multiple publishers and subscribers is hard. EventBridge does not guarantee event ordering. If the order of events is crucial, you’ll need to work with the event producers to add sequence numbering. If that’s not possible, subscribers can implement sorting based on the event creation timestamps to put them into the correct order.
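
As a simple illustration, assuming producers add a seq sequence number to the event detail, a consumer could sort a received batch like this (the ISO 8601 time field in the EventBridge event envelope sorts correctly as a string):

def sort_events(events):
    # Sort by creation timestamp, using the producer's sequence number as a
    # tiebreaker for events that share a timestamp.
    return sorted(events, key=lambda e: (e["time"], e["detail"].get("seq", 0)))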

Avoid modifying events before relaying them.

There are situations where applications engage in an asynchronous chain of actions known as the event relay pattern: the service receives an event, performs an action, and emits an event for a downstream service. In such situations, the subscriber should never modify the source event and publish a modified version. It must always emit a new event with its identity as publisher and the schema it is responsible for.
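
A sketch of a relaying consumer is shown below; create_invoice stands in for the service’s business logic, and the bus, source, and detail type are illustrative:

import json

import boto3

events = boto3.client("events")

def create_invoice(detail):
    # Placeholder for this service's business logic.
    return {"invoiceId": "inv-0001", "orderId": detail.get("orderId")}

def handler(event, context):
    invoice = create_invoice(event["detail"])  # act on the incoming event
    # Emit a new event under this service's own identity and schema, rather
    # than republishing a modified copy of the source event.
    events.put_events(
        Entries=[{
            "EventBusName": "orders-bus",
            "Source": "com.example.invoicing",
            "DetailType": "InvoiceCreated",
            "Detail": json.dumps(invoice),
        }]
    )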

Collect events that failed to reach the target consumer.

In a resilient event-driven architecture, a consumer may have all the measures it needs to process an event successfully—but what happens if the event gets lost in transit and does not arrive, or if the consumer experiences an unforeseen outage?

EventBridge retries event delivery until successful for up to 24 hours. If it fails to deliver an event to the target, it can send the failed event to a dead letter queue (DLQ) for later processing. As you saw earlier, you can also use EventBridge’s archive and replay feature to reduce the risk of missing critical business events.
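
As an illustration, a DLQ and a retry policy can be attached per target when configuring a rule with boto3; the rule name and ARNs here are illustrative:

import boto3

events = boto3.client("events")

events.put_targets(
    Rule="order-created-rule",
    EventBusName="orders-bus",
    Targets=[{
        "Id": "order-consumer",
        "Arn": "arn:aws:lambda:us-east-1:123456789123:function:order-consumer",
        "RetryPolicy": {
            "MaximumRetryAttempts": 10,
            "MaximumEventAgeInSeconds": 3600,  # stop retrying after one hour
        },
        # Events that still cannot be delivered are sent to this SQS queue.
        "DeadLetterConfig": {
            "Arn": "arn:aws:sqs:us-east-1:123456789123:order-consumer-dlq"
        },
    }],
)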

Note

CloudEvents is a specification for describing event data in a common way. It’s supported by the Cloud Native Computing Foundation (CNCF). Adopting CloudEvents as the standard for defining and producing your event payloads will ensure your events remain interoperable, understandable, and predictable, particularly as usage increases across the domains in your organization.

AsyncAPI is the industry standard for defining asynchronous APIs, and it can be used to describe and document message-driven APIs in a machine-readable format. Whereas the CloudEvents specification constrains the schema of your event payloads, Asyn⁠c­API helps you to document the API for producing and consuming your events. AsyncAPI is to event-driven interfaces as OpenAPI is to RESTful APIs.
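
For reference, an event payload following the CloudEvents 1.0 specification might look like the sketch below. The attribute names come from the spec; the values are illustrative:

cloud_event = {
    "specversion": "1.0",                          # CloudEvents spec version
    "id": "a89b61a2-5644-487a-8a86-144855c5dce8",  # unique per source
    "source": "com.example.orders",                # event producer
    "type": "com.example.orders.OrderCreated",     # event type
    "time": "2024-05-01T12:00:00Z",                # creation timestamp
    "datacontenttype": "application/json",
    "data": {"orderId": "1234", "total": 42.5},    # domain-specific payload
}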

Architectural considerations for event sourcing

At a high level, the concept of event sourcing is simple—but its implementation requires careful planning. When distributed microservices come together to perform a business function, you face the challenge of having hundreds of events of different categories and types being produced and consumed by various services. In such a situation:

  • How do you identify which events to keep in an event store?
  • How do you collect all the related events in one place?
  • Should you keep an event store per microservice, bounded context, application, domain, or enterprise?
  • How do you handle encrypted and sensitive data?
  • How long do you keep the events in an event store?

Finding and implementing the answers to these critical questions involves several teams and business stakeholders working together. Let’s take a look at some of the options.

Dedicated microservice for event sourcing

Domain events flow via one or more event buses in a distributed service environment. With a dedicated microservice for event sourcing, you separate the concerns from different services and assign it to a single-purpose microservice. It manages the rules to ingest the required events, perform necessary data translations, own one or more event stores, and manage data retention and transition policies, among other tasks.
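
As a sketch of the ingestion side of such a microservice, the Lambda handler below appends every event it receives to a DynamoDB event store, partitioned by the business entity and ordered by a timestamp-plus-event-ID sort key. The table, key, and attribute names are illustrative assumptions:

import json

import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    detail = event["detail"]
    dynamodb.put_item(
        TableName="domain-event-store",
        Item={
            # Partition by the business entity the event relates to...
            "entityId": {"S": detail["entityId"]},
            # ...and order its events with a timestamp + event ID sort key.
            "eventKey": {"S": f"{event['time']}#{event['id']}"},
            "detailType": {"S": event["detail-type"]},
            "payload": {"S": json.dumps(detail)},
        },
    )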

Event store per bounded context

A well-defined bounded context will benefit from having its own event store, which can be helpful for auditing purposes or for reconstructing the events that led to the current state of the application or a particular business entity. For example, in the rewards system we looked at earlier in this chapter (Figure 3-36), you might want to have an event store to keep track of rewards updates. With an extendable event-driven architecture, it’s as simple as adding another set-piece microservice for event sourcing, as shown in Figure 3-42.

Figure 3-42. Adding a dedicated rewards-audit microservice for event sourcing to the rewards system

Application-level event store

Many applications you interact with daily coordinate with several distributed services. An ecommerce domain, for example, has many subdomains and bounded contexts, as you saw back in Figure 2-3 (in “Domain-first”). Each bounded context can successfully implement its own event sourcing capability, as discussed in the previous subsection, but it can only capture its part in the broader application context.

As shown in Figure 3-43, your journey as an ecommerce customer purchasing items touches several bounded contexts—product details, stock, cart, payments, rewards, etc. To reconstruct the entire journey, you need events from all these areas. To plot a customer’s end-to-end journey, you must collate the sequence of necessary events. An application-level event store is beneficial in this use case.

Figure 3-43. An ecommerce customer’s end-to-end order journey, with the different touchpoints

Centralized event sourcing cloud account

So far, you have seen single-purpose dedicated microservice, bounded context, and application-level event sourcing scenarios. A centralized event store takes things to an even more advanced level, as shown in Figure 3-44. This is an adaptation of the centralized logging pattern, where enterprises use a consolidated central cloud account to stream all the CloudWatch logs from multiple accounts across different AWS Regions. It provides a single point of access for all their critical logs, allowing them to perform security audits, compliance checks, and business analysis.

Figure 3-44. A central cloud account for event sourcing

There are, however, substantial efforts and challenges involved in setting up a central event sourcing account and related services:

  • The first challenge is agreeing upon a way of sharing events. Not all organizations have a central event bus that touches every domain. EventBridge’s cross-account, cross-Region event sharing is an ideal option here (see the sketch following this list).
  • Identifying and sourcing the necessary events is the next challenge. A central repository is required in order to have visibility into the details of all the event definitions. EventBridge Schema Registry is useful, but it is per AWS account, and there is no central schema registry.
  • With several event categories and types, structuring the event store and deriving the appropriate data queries and access patterns to suit the business requirements requires careful planning. You may need multiple event stores and different types of data stores—SQL, NoSQL, object, etc.—depending on the volume of events and the frequency of data access.
  • Providing access to the event stores and events is a crucial element of this setup, with consideration given to data privacy, business confidentiality, regulatory compliance, and other critical measures.
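
As a sketch of the event sharing point above, a rule on a domain account’s event bus can forward every event produced in that account to the central account’s bus; matching on the account ID is a common way to express a catch-all pattern. The account IDs, bus names, and role are illustrative, and the central bus must carry a resource policy that permits the sender account:

import json

import boto3

events = boto3.client("events")

# Match every event produced by this account (a catch-all pattern).
events.put_rule(
    Name="forward-to-central-account",
    EventBusName="orders-bus",
    EventPattern=json.dumps({"account": ["111111111111"]}),
)

# Target the central account's event bus, assuming a role that is allowed
# to put events onto it.
events.put_targets(
    Rule="forward-to-central-account",
    EventBusName="orders-bus",
    Targets=[{
        "Id": "central-event-bus",
        "Arn": "arn:aws:events:us-east-1:222222222222:event-bus/central-bus",
        "RoleArn": "arn:aws:iam::111111111111:role/central-bus-sender",
    }],
)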

Event sourcing is an important pattern and practice for teams building serverless applications. Even if your focus is primarily on delivering the core business features (to bring value), enabling features such as event sourcing is still crucial. As mentioned earlier, not every team will need the ability to reconstruct the application’s state based on the events; however, all teams will benefit from being able to use the event store for auditing and tracing critical business flows.

The importance of EventStorming in serverless development

EventStorming is a great way to collaborate and to learn about business requirements, identify domain events, and shape the model before considering architecture and solution design. However, depending on the scale of the domain or product, the outcome of EventStorming could be high-level.

Say your organization is transforming the IT operations of its manufacturing division. The EventStorming exercise will bring together several domain experts, business stakeholders, enterprise and solution architects, engineering leads and engineers, product managers, UX designers, QA engineers, test specialists, etc. After a few days of collaboration, you identify various business process flows, domain events, model entities, and many bounded contexts, among other things. With clarity about the entire domain, you start assigning ownership—stream-aligned teams—to the bounded contexts.

These teams then delve into each bounded context to identify web applications, microservices, APIs, events, and architectural constructs to implement. While the artifacts from the domain-level EventStorming sessions form a base, serverless teams need more granular details. Hence, it is useful in serverless development if you employ EventStorming in two stages:

Domain-level EventStorming

According to Brandolini, this is the “Big Picture” EventStorming workshop aimed at identifying the business processes, domain events, commands, actors, aggregates, etc.

Development-level EventStorming

This is a smaller-scale activity that involves an engineering team, its business stakeholders, the product manager, and UX designers. It is similar to what Brandolini calls “Design Level EventStorming.”

Here, the focus is on the bounded context and developments within it. The team identifies the internal process flows, local events, and separation of functionality and responsibilities. These become the early sketches for set-piece microservices, their interfaces, and event interactions. The outcome from the development-level EventStorming feeds into the solution design process (explained in Chapter 6) as engineers start thinking about the serverless architecture.

Let’s consider an example situation for development-level EventStorming:

Context: Figure 2-3 (in “Domain-first”) shows the breakdown of an ecommerce domain. A domain-level EventStorming workshop has identified the subdomains and bounded contexts. A stream-aligned team owns the user payments bounded context.

Use case: Due to customer demand and to prevent fraud, the stakeholders want to add a new feature where customers who call the customer support center to place orders over the phone can make their payments via a secure link emailed to them rather than providing the card number over the phone.

The proposed new feature only requires part of the ecommerce team to participate in a (development-level) EventStorming session. It is a small-scale activity within a bounded context with fewer participants.

Summary

You’ve just completed one of the most crucial chapters of this book on serverless development. The architectural thoughts, best practices, and recommendations you’ve learned here are essential whether you work as an independent consultant or as part of a team in a big enterprise. Irrespective of the organization’s size, your ambition is to architect solutions that play to the strengths of serverless. Business requirements and problem domains can be complex and hard to comprehend; the same is true in other fields and walks of life. You can observe how people successfully solve non-software problems and apply those principles in your work.

Serverless architecture need not be a complex and tangled web of lines crisscrossing your entire organization. Your vision is to architect single-purpose, loosely coupled, distributed, and event-driven microservices as set pieces that are easier to conceive, develop, operate, observe, and evolve within the serverless technology ecosystem of your organization.

You will carry the learnings from these initial chapters with you as you go through the remainder of the book. You will begin to apply the architectural lessons in Chapter 5, which will teach you some core implementation patterns in serverless development. But first, the next chapter delves into one of the fundamental and critical topics in software development: security.

Security Can Be Simple

Given the stakes, ensuring the security of a software application can be a daunting task. Breaches of application perimeters and data stores are often dramatic and devastating. Besides the immediate consequences, such as data loss and the need for remediation, these incidents usually have a negative impact on trust between consumers and the business, and between the business and its technologists.

Security Challenges

Securing cloud native serverless applications can be particularly challenging for several reasons, including:

Managed services

Throughout this book, you will see that managed services are core to serverless applications and, when applied correctly, can support clear separation of concerns, optimal performance, and effective observability. While managed services provide a solid foundation for your infrastructure, as well as several security benefits—primarily through the shared responsibility model, discussed later in this chapter—the sheer number of them available to teams building on AWS presents a problem: in order to utilize (or even evaluate) a managed service, you must first understand the available features, pricing model, and, crucially, security implications. How do IAM permissions work for this service? How is the shared responsibility model applied to this service? How will access control and encryption work?

Configurability

An aspect all managed services share is configurability. Every AWS service has an array of options that can be tweaked to optimize throughput, latency, resiliency, and cost. The combination of services can also yield further optimizations, such as the employment of SQS queues between Lambda functions to provide batching and buffering. Indeed, one of the primary benefits of serverless that is highlighted in this book is granularity. As you’ve seen, you have the ability to configure each of the managed services in your applications to a fine degree. In terms of security, this represents a vast surface area for the inadvertent introduction of flaws like excessive permissions and privilege escalation.

Emergent standards

AWS delivers new services, new features, and improvements to existing features and services at a consistently high rate. These new services and features could either be directly related to application or account security or present new attack vectors to analyze and secure. There are always new levers to pull and more things to configure. The community around AWS and, in particular, serverless also moves at a relatively fast pace, with new blog posts, video tutorials, and conference talks appearing every day. The security aspect of software engineering perhaps moves slightly slower than other elements, but there is still a steady stream of advice from cybersecurity professionals along with regular releases of vulnerability disclosures and associated research. Keeping up with all the AWS product updates and the best practices when it comes to securing your ever-evolving application can easily become one of your biggest challenges.

While cloud native serverless applications present unique security challenges, there are also plenty of inherent benefits when it comes to securing this type of software. The architecture of serverless applications introduces a unique security framework and provides the potential to work in a novel way within this framework. You have a chance to redefine your relationship to application security. Security can be simple.

Next, let’s explore how to start securing your serverless application.

Combining the Zero Trust Security Model with Least Privilege Permissions

There are two modern cybersecurity principles that you can leverage as the cornerstones of your serverless security strategy: zero trust architecture and the principle of least privilege.

Zero trust architecture

The basic premise of zero trust security is to assume every connection to your system is a threat. Every single interface should then be protected by a layer of authentication (who are you?) and authorization (what do you want?). This applies both to public API endpoints (the perimeter in the traditional castle-and-moat model) and to private, internal interfaces, such as Lambda functions or DynamoDB tables. Zero trust controls access to each distinct resource in your application, whereas a castle-and-moat model only controls access to the resources at the perimeter of your application.

Imagine a knight errant galloping up to the castle walls, presenting likely-looking credentials to the guards and persuading them of their honorable intentions before confidently entering the castle across the lowered drawbridge. If these perimeter guards form the extent of the castle’s security, the knight is now free to roam the rooms, dungeons, and jewel store, collecting sensitive information for future raids or stealing valuable assets on the spot. If, however, each door or walkway had additional suspicious guards or sophisticated security controls that assumed zero trust by default, the knight would be entirely restricted and might even be deterred from infiltrating this castle at all.

Another scenario to keep in mind is a castle that cuts a single key for every heavy-duty door: should the knight gain access to one copy of this key, they’ll be able to open all the doors, no matter how thick or cumbersome. With zero trust, there’s a unique key for every door. Figure 4-2 shows how the castle-and-moat model compares to a zero trust architecture.

Figure 4-2. Castle-and-moat perimeter security compared to zero trust architecture

There are various applications of zero trust architecture, such as remote computing and enterprise network security. The next section briefly discusses how the zero trust model can be interpreted and applied to serverless applications.

The Power of AWS IAM

AWS IAM is the one service you will use everywhere—but it’s also often seen as one of the most complex. Therefore, it’s important to understand IAM and learn how to harness its power. (You don’t have to become an IAM expert, though—unless you want to, of course!)

The power of AWS IAM lies in roles and policies. Policies define the actions that can be taken on certain resources. For example, a policy could define the permission to put events onto a specific EventBridge event bus. Roles are collections of one or more policies. Roles can be attached to IAM users, but the more common pattern in a modern serverless application is to attach a role to a resource. In this way, an EventBridge rule can be granted permission to invoke a Lambda function, and that function can in turn be permitted to put items into a DynamoDB table.

IAM actions can be split into two categories: control plane actions and data plane actions. Control plane actions, such as CreateEventBus and CreateTable (e.g., used by an automated deployment role), manage resources. Data plane actions, such as PutEvents and GetItem (e.g., used by a Lambda execution role), interact with those resources.

Let’s take a look at a simple IAM policy statement and the elements it is composed of:
{
  "Sid": "ListObjectsInBucket",            # Statement ID, optional identifier
                                           # for the policy statement
  "Action": "s3:ListBucket",               # AWS service API action(s) that
                                           # will be allowed or denied
  "Effect": "Allow",                       # Whether the statement should
                                           # result in an allow or deny
  "Resource": "arn:aws:s3:::bucket-name",  # Amazon Resource Name (ARN) of the
                                           # resource(s) covered by the statement
  "Condition": {                           # Conditions for when a policy is
                                           # in effect
    "StringLike": {                        # Condition operator
      "s3:prefix": [                       # Condition key
        "photos/"                          # Condition value
      ]
    }
  }
}

See the AWS IAM documentation for full details of all the elements of an IAM policy.

Lambda execution roles

A key use of IAM roles in serverless applications is Lambda function execution roles. An execution role is attached to a Lambda function and grants the function the permissions necessary to execute correctly, including access to any other AWS resources that are required. For example, if the Lambda function uses the AWS SDK to make a DynamoDB request that inserts a record in a table, the execution role must include a policy with the dynamodb:PutItem action for the table resource.
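
For illustration, here is how such a least privilege inline policy might be attached to an execution role with boto3; the role, table, and policy names are illustrative, and in practice you would usually declare this in your infrastructure as code rather than apply it imperatively:

import json

import boto3

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "dynamodb:PutItem",  # only the data plane action required
        "Resource": "arn:aws:dynamodb:us-east-1:123456789123:table/orders",
    }],
}

iam.put_role_policy(
    RoleName="order-writer-execution-role",
    PolicyName="put-orders",
    PolicyDocument=json.dumps(policy),
)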

The execution role is assumed by the Lambda service when performing control plane and data plane operations. The AWS Security Token Service (STS) is used to fetch short-lived, temporary security credentials which are made available via the function’s environment variables during invocation.

Each function in your application should have its own unique execution role with the minimum permissions required to perform its duty. In this way, single-purpose functions (introduced in Chapter 6) are also key to security: IAM permissions can be tightly scoped to the function and remain extremely restricted according to the limited functionality.

IAM guardrails

As you are no doubt beginning to notice, effective serverless security in the cloud is about basic security hygiene. Establishing guardrails for the use of AWS IAM is a core part of promoting a secure approach to everyday engineering activity. Here are some recommended guardrails:

Apply the principle of least privilege in policies.

IAM policies should only include the minimum set of permissions required for the associated resource to perform the necessary control or data plane operations. As a general rule, do not use wildcards (*) in your policy statements. Wildcards are the antithesis of least privilege, as they apply blanket permissions for actions and resources. Unless the action explicitly requires a wildcard, always be specific.
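
To illustrate the difference, compare a wildcard statement with a least privilege statement, expressed here as Python dicts with illustrative names:

# Blanket permissions: every DynamoDB action on every resource.
too_broad = {
    "Effect": "Allow",
    "Action": "dynamodb:*",
    "Resource": "*",
}

# Least privilege: one action on one table.
least_privilege = {
    "Effect": "Allow",
    "Action": "dynamodb:GetItem",
    "Resource": "arn:aws:dynamodb:us-east-1:123456789123:table/orders",
}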

Avoid using managed IAM policies.

These are policies provided by AWS, and they’re often tempting shortcuts, especially when you’re just getting started or using a service for the first time. You can use these policies early in prototyping or development, but you should replace them with custom policies as soon as you understand the integration better. Because these policies are designed to be applied to generic scenarios, they are simply not restricted enough and will usually violate the principle of least privilege when applied to interactions within your application.

Prefer roles to users.

IAM users are issued with static, long-lived AWS access credentials (an access key ID and secret access key). These credentials can be used to directly access the application provider’s AWS account, including all the resources and data in that account. Depending on the associated IAM roles and policies, the authenticating user may even have the ability to create or destroy resources. Given the power they grant the holder, the use and distribution of static credentials must be limited to reduce the risk of unauthorized access. Where possible, restrict IAM users to an absolute minimum (or, even better, do not have any IAM users at all).

Prefer a role per resource.

Each resource in your application, such as an EventBridge rule, a Lambda function, and an SQS queue, should have its own unique role. Permissions for those roles should be fine-grained and least-privileged.

Securing the Serverless Supply Chain

Vulnerable and outdated components and supply chain–based attacks are quickly becoming a primary concern for software engineers.

Note

According to supply chain security company Socket, “supply chain attacks rose a whopping 700% in the past year, with over 15,000 recorded attacks.” One example they cite occurred in January 2022, when an open source software maintainer intentionally added malware to his own package, which was being downloaded an average of 100 million times per month. A notable casualty was the official AWS SDK.

Who is responsible for protecting against these vulnerabilities and attacks? Serverless compute on AWS Lambda provides you with a clear example of the shared responsibility model presented earlier in this chapter. It is the responsibility of AWS to keep the software in the runtime and execution environment updated with the latest security patches and performance improvements, and it is the responsibility of the application engineer to secure the function code itself. This includes keeping the libraries used by the function up-to-date.

Given that it is your responsibility as a cloud application developer to secure the code you deliver to the cloud and run in your Lambda functions, what are the attack vectors and threat levels here, and how can you mitigate the related security issues?

Securing the Dependency Supply Chain

Open source software is an incredible enabler of rapid software development and delivery. As a software engineer, you can rely on the expertise and work of others in your community when composing your applications. However, this relationship is built on a fragile layer of trust. Every time you install a dependency, you are implicitly trusting the myriad contributors to that package and everything in that package’s own tree of dependencies. The code of hundreds of programmers becomes a key component of your production software.

You must be aware of the risks involved in installing and executing open source software, and the steps you can take to mitigate such risks.

Going Further with SLSA

The SLSA security framework (pronounced salsa, short for Supply chain Levels for Software Artifacts) is “a checklist of standards and controls to prevent tampering, improve integrity, and secure packages and infrastructure.” SLSA is all about going from “safe enough” to maximum resiliency across the entire software supply chain.

If you’re at a fairly high level of security maturity, you may find it useful to use this framework to measure and improve the security of your software supply chain. Follow the SLSA documentation to get started. The Software Component Verification Standard (SCVS) from OWASP is another framework for measuring supply chain security.

Lambda Code Signing

The last mile in the software supply chain is packaging and deploying your function code to the cloud. At this point, your function will usually consist of your business logic (code you have authored) and any third-party libraries listed in the function’s dependencies (code someone else has authored).

Lambda provides the option to sign your code before deploying it. This enables the Lambda service to verify that a trusted source has initiated the deployment and that the code has not been altered or tampered with in any way. Lambda will run several validation checks to verify the integrity of the code, including that the package has not been modified since it was signed and that the signature itself is valid.

To sign your code, you first create one or more signing profiles. These profiles might map to the environments and accounts your application uses—for example, you may have a signing profile per AWS account. Alternatively, you could opt to have a signing profile per function for greater isolation and security. The CloudFormation resource for a signing profile looks like this, where the PlatformId denotes the signature format and signing algorithm that will be used by the profile:


{
  "Type": "AWS::Signer::SigningProfile",
  "Properties": {
    "PlatformId": "AWSLambda-SHA384-ECDSA"
  }
}

Once you have defined a signing profile, you can then use it to configure code signing for your functions:


{
  "Type": "AWS::Lambda::CodeSigningConfig",
  "Properties": {
    "AllowedPublishers": {
      "SigningProfileVersionArns": [
        "arn:aws:signer:us-east-1:123456789123:/signing-profiles/my-profile"
      ]
    },
    "CodeSigningPolicies": {
      "UntrustedArtifactOnDeployment": "Enforce"
    }
  }
}

Finally, assign the code signing configuration to your function:


{
  "Type": "AWS::Lambda::Function",
  "Properties": {
    "CodeSigningConfigArn":
      "arn:aws:lambda:us-east-1:123456789123:code-signing-config:csc-config-id"
  }
}

Now, when you deploy this function, the Lambda service will verify that the code was signed by a trusted source and has not been tampered with since being signed.