The Lambda Project Series: Chapter One
Exploring Logging, Service Maps and Distributed Tracing for AWS Lambda
In this post of Cloudy With a Chance of Serverless, we will try our hand at integrating better Logging, Service Maps and Distributed Tracing for AWS Lambda. Ready? Let’s go!
The Operational Excellence Club
You’re about to see the power of observability! Let’s move beyond print() statements (yeah, I’m guilty as well) and add a proper logger instead, eventually moving toward structured logging.
Both the CloudWatch and Lambda AWS service teams (SDEs) are pretty darn amazing, as you’re about to see! Our game plan is to utilize the power of Lambda Layers, which contain utilities and libraries, together with CloudWatch ServiceLens. We’ll be using Python in this example, but the process is similar for other supported languages.
Our Lambda Layer will be Lambda Powertools Python, which is AWS-managed and open source on GitHub. It generously offers us the ability to add tracing, structured logging, custom metrics, and more, quite easily!
Adding a managed Lambda Layer is pretty simple: all you need is the ARN from the link above, which you then add from your infra-as-code stack or from the Lambda Console in a sandbox account.
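If you manage the function with the CDK, for example, here’s a minimal sketch (Python, CDK v2) of attaching the layer. The stack and handler names are made up, and the ARN placeholder stands in for whichever region-specific ARN you grabbed from the docs:

from aws_cdk import Stack, aws_lambda as lambda_
from constructs import Construct

class ResourceListerStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Reference the AWS-managed Powertools layer by its published ARN
        # (placeholder; substitute the region-specific ARN from the docs)
        powertools = lambda_.LayerVersion.from_layer_version_arn(
            self, "PowertoolsLayer",
            "arn:aws:lambda:<region>:<accountId>:layer:AWSLambdaPowertoolsPython:<version>",
        )

        lambda_.Function(
            self, "ResourceLister",
            runtime=lambda_.Runtime.PYTHON_3_9,
            handler="app.lambda_handler",         # assumes the code lives in src/app.py
            code=lambda_.Code.from_asset("src"),
            layers=[powertools],
        )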
Let’s look at some code as a before and after, assuming we’ve already swapped print() statements for Python’s native logging library. This is the before:
import boto3
import logging

# Native Python logging setup
logger = logging.getLogger()
logger.setLevel(logging.INFO)

ddb = boto3.client('dynamodb')

#=====================
def get_ddb_tables() -> list:
    # Paginate through every page of the ListTables API call
    response = (
        ddb.get_paginator('list_tables')
        .paginate()
        .build_full_result()
    )
    tables = response["TableNames"]
    logger.info(f"Found {len(tables)} DynamoDB Tables")
    return tables

#=====================
def lambda_handler(event, context):
    try:
        return get_ddb_tables()
    except Exception as e:
        logger.exception(e)
        raise
The simple example above makes a paginated API call to list all DynamoDB tables in the Region. If something goes wrong, such as missing permissions for that API call, the exception is logged and we exit with an error.
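As an aside, build_full_result() is what collapses every page of the paginated call into a single response; a hand-rolled equivalent would look roughly like this:

import boto3

ddb = boto3.client('dynamodb')

# Walk each page of ListTables ourselves and collect the names
tables = []
for page in ddb.get_paginator('list_tables').paginate():
    tables.extend(page["TableNames"])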
CloudWatch Logs will then reflect the logging level ([INFO], [ERROR]), which can make future investigations a bit easier:
[INFO] 2022-07-04T12:01:15.123Z <requestId> starting
[ERROR] ClientError:
An error occurred (AccessDeniedException) when calling the ListTables operation:
User: arn:aws:sts::<accountId>:assumed-role/<hash>/<function> is not authorized to perform:
dynamodb:ListTables on resource: <arn> because no identity-based policy allows the dynamodb:ListTables action
...
Easy enough, right?
Well, sure, but my concern is that the logging in CloudWatch is line-by-line, which may not be easy to debug, especially when collating it into an aggregated logging solution, say a corporate search cluster.
We should add some structure to it, and JSON is a popular data format for that.
Not much refactoring is required; let’s see what the added libraries, via Powertools, have to offer!
Below is the after code, straight out of the logger docs. We’ll just need to add a few Environment Variables to the Lambda Function’s Configuration for Powertools:
LOG_LEVEL = INFO
POWERTOOLS_LOGGER_LOG_EVENT = True
POWERTOOLS_SERVICE_NAME = devops_resource_lister
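(If you’d rather script that configuration change than click through the console, a quick boto3 sketch could look like this, with the function name assumed:)

import boto3

lambda_client = boto3.client('lambda')

# Set the Powertools env vars on the function.
# NOTE: this call replaces the function's entire environment,
# so include any existing variables too.
lambda_client.update_function_configuration(
    FunctionName='devops_resource_lister',
    Environment={
        'Variables': {
            'LOG_LEVEL': 'INFO',
            'POWERTOOLS_LOGGER_LOG_EVENT': 'True',
            'POWERTOOLS_SERVICE_NAME': 'devops_resource_lister',
        }
    },
)

With those set, here’s the refactored handler: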
import boto3
from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
ddb = boto3.client('dynamodb')

#=====================
def get_ddb_tables() -> list:
    response = (
        ddb.get_paginator('list_tables')
        .paginate()
        .build_full_result()
    )
    tables = response["TableNames"]
    logger.info(f"Found {len(tables)} DynamoDB Tables")
    return tables

#=====================
@logger.inject_lambda_context
def lambda_handler(event: dict, context: LambdaContext) -> bool:
    try:
        get_ddb_tables()
        return True
    except Exception as e:
        logger.exception(e)
        raise
Above, we just added these four lines to our code, removing all other logging references:

from aws_lambda_powertools import Logger
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()

@logger.inject_lambda_context  # above the main handler
How is this any better than before?
Roll that beautiful logging footage 🤤
{
    "level": "INFO",
    "location": "decorate:352",
    "message": {
        "key1": "value1",
        "key2": "value2",
        "key3": "value3"
    },
    "timestamp": "2022-07-17 15:55:18,861+0000",
    "service": "devops_resource_lister",
    "cold_start": true,
    "function_name": "<name>",
    "function_memory_size": "128",
    "function_arn": "<arn>",
    "function_request_id": "<hash>",
    "xray_trace_id": "<hash>"
}
...
{
    "level": "INFO",
    "location": "get_ddb_tables:18",
    "message": "Found 4 DynamoDB Tables",
    "timestamp": "2022-07-17 15:55:19,231+0000",
    "service": "devops_resource_lister",
    "cold_start": true,
    "function_name": "<name>",
    "function_memory_size": "128",
    "function_arn": "<arn>",
    "function_request_id": "<hash>",
    "xray_trace_id": "<hash>"
}
There we go! This is much better for searching, debugging, and aggregating into existing logging tools. It even shows the input event (which can be turned off), the code location (get_ddb_tables:18), and cold start information!
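Powertools can also carry your own keys on every subsequent log line. A tiny sketch, with made-up key names:

from aws_lambda_powertools import Logger

logger = Logger()

# Appended keys ride along on every log line that follows
logger.append_keys(team="devops", run_mode="audit")
logger.info("Scan complete")  # the JSON output now includes team and run_mode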
Alright, I think you have the gist of using logging and can improve your current apps pretty easily. If you’re not sold on the Lambda Layer (Powertools) yet, then let me offer an additional benefit that it provides: Distributed Tracing and Service Maps!
Add the Powertools Tracer
import boto3
from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
tracer = Tracer()

ddb = boto3.client('dynamodb')
s3 = boto3.client('s3')

#=====================
@tracer.capture_method
def get_ddb_tables() -> list:
    response = (
        ddb.get_paginator('list_tables')
        .paginate()
        .build_full_result()
    )
    tables = response["TableNames"]
    logger.info(f"Found {len(tables)} DynamoDB Tables")
    return tables

#=====================
@tracer.capture_method
def get_s3_buckets() -> list:
    response = s3.list_buckets()
    buckets = response["Buckets"]
    logger.info(f"Found {len(buckets)} S3 Buckets")
    return buckets

#=====================
@logger.inject_lambda_context
@tracer.capture_lambda_handler
def lambda_handler(event: dict, context: LambdaContext) -> bool:
    try:
        get_ddb_tables()
        get_s3_buckets()
        return True
    except Exception as e:
        logger.exception(e)
        raise
Above, we just added/updated these few lines in our code:

from aws_lambda_powertools import Logger, Tracer

tracer = Tracer()

@tracer.capture_method          # above the two methods
@tracer.capture_lambda_handler  # above the main handler
@logger.inject_lambda_context   # above the main handler
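While we’re at it, the Tracer can also attach searchable annotations and free-form metadata to a trace. Here’s a hypothetical extension of get_ddb_tables, with the table names as stand-ins:

from aws_lambda_powertools import Tracer

tracer = Tracer()

@tracer.capture_method
def get_ddb_tables() -> list:
    tables = ["orders", "users"]  # stand-in for the real ListTables call
    # Annotations are indexed by X-Ray, so you can filter traces on them;
    # metadata is stored on the trace but is not searchable
    tracer.put_annotation(key="TableCount", value=len(tables))
    tracer.put_metadata(key="TableNames", value=tables)
    return tables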
Enable “Active Tracing” in your Lambda Function’s Configuration. This can be done from the UI, under Monitoring and operations tools, or from your preferred infra-as-code tool.
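(Scripted, that toggle is a couple of lines of boto3; in the CDK it’s the tracing=lambda_.Tracing.ACTIVE property on the Function:)

import boto3

lambda_client = boto3.client('lambda')

# Turn on X-Ray active tracing for the function (name assumed)
lambda_client.update_function_configuration(
    FunctionName='devops_resource_lister',
    TracingConfig={'Mode': 'Active'},
)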
Invoke or test your Lambda function a few times to generate some data to look at: both the distributed traces themselves and a visual map of the different API calls the app is making.
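A quick way to generate that traffic from a script, assuming the same function name:

import json
import boto3

lambda_client = boto3.client('lambda')

# Fire a handful of invocations so X-Ray has traces to show us
for _ in range(5):
    lambda_client.invoke(
        FunctionName='devops_resource_lister',
        Payload=json.dumps({}).encode('utf-8'),
    )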
From your Lambda Function’s console, select Monitor → View X-Ray Traces in ServiceLens. This will take you to CloudWatch’s Traces section, where you’ll see collected traces toward the bottom of the page. Alternatively, after a moment, the same information appears directly in the Lambda console under the Monitor → Traces tab.
Armed with this data and these visualizations, you will now be able to better analyze and debug apps, especially when determining the root cause of performance issues and errors! 🙌
You will see a visual representation of how your app interacts with AWS APIs; the AWS icons are added automatically.
You will also see a breakdown of latency and response codes, including any exceptions, for the API calls the app makes under each Trace ID. This data includes the method names as well.
Lastly, this won’t be fun if you have to find your function, traces, and logs every single time you need to see what’s happening. Creating a dashboard, including the service map, is pretty darn easy: from the Lambda Function’s console, under the Monitor → Traces tab, click “Add to dashboard” and create a new dashboard by giving it a name. Going forward, we’ll have our Function’s metrics, service maps, and logs all in one centralized view!
Alright, that’s all for now. I hope you found something valuable in this post. Next time, we’ll work on improving our Security posture in Chapter Two: Security’s Curse!