use the aws-cli to suspend an AWS Lambda function processing an Amazon Kinesis stream, then resume it again
This week, Steve Caldwell (CTO and prolific developer) encountered a situation which required pausing an AWS Lambda function with a Kinesis stream source, and later resuming it, preferably from the same point at which it had been reading in each Kinesis shard.
We brainstormed a half dozen different ways to accomplish this with varying levels of difficulty, varying levels of cost, and varying levels of not-quite-what-we-wanted-ness.
A few hours later, Steve shared that he had discovered the answer (and suggested I pass on the answer to you).
Buried in the AWS Lambda documentation for update-event-source-mapping in the aws-cli (and UpdateEventSourceMapping in the API) is the mention of --enabled and --no-enabled with this description:
Specifies whether AWS Lambda should actively poll the stream or not. If disabled, AWS Lambda will not poll the stream.
As it turns out, this does exactly what we need. The --enabled and --no-enabled options change whether processing is enabled without changing anything else about the AWS Lambda function or how it reads from the stream.
The big benefit that isn’t documented (but verified by Amazon) is that this saves the place in each Kinesis shard. On resume, AWS Lambda continues reading from the same shard iterators without missing or duplicating records in the stream.
To pause an AWS Lambda function reading an Amazon Kinesis stream:
    region=us-east-1
    event_source_mapping_uuid=... # (see below)
    aws lambda update-event-source-mapping \
      --region "$region" \
      --uuid "$event_source_mapping_uuid" \
      --no-enabled
And to resume the AWS Lambda function right where it was suspended without losing place in any of the Kinesis shards:
    aws lambda update-event-source-mapping \
      --region "$region" \
      --uuid "$event_source_mapping_uuid" \
      --enabled
You can find the current state of the event source mapping (e.g., whether it is enabled/unpaused or disabled/paused) with this command:
    aws lambda get-event-source-mapping \
      --region "$region" \
      --uuid "$event_source_mapping_uuid" \
      --output text \
      --query 'State'
Here are the possible states: Creating, Enabling, Enabled, Disabling, Disabled, Updating, Deleting. I’m not sure how long it can spend in the Disabling state before transitioning to full Disabled, but you might want to monitor the state and wait if you want to make sure it is fully paused before taking some other action.
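The monitoring suggested above could be sketched as a small polling loop around the same get-event-source-mapping call. This is only a sketch: wait_for_mapping_state is a hypothetical helper name (not an aws-cli command), the retry count and delay are arbitrary defaults, and it assumes $region is set as in the earlier commands.

```shell
# wait_for_mapping_state is a hypothetical helper, not part of the aws-cli.
# Usage: wait_for_mapping_state UUID WANTED_STATE [MAX_TRIES] [DELAY_SECONDS]
# Assumes $region is set as in the commands above.
wait_for_mapping_state() {
  local uuid="$1" wanted="$2" tries="${3:-30}" delay="${4:-5}"
  local state="" i=0
  while [ "$i" -lt "$tries" ]; do
    # Ask Lambda for the current state of the event source mapping
    state=$(aws lambda get-event-source-mapping \
      --region "$region" \
      --uuid "$uuid" \
      --output text \
      --query 'State') || return 1
    echo "state=$state" >&2
    if [ "$state" = "$wanted" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "gave up waiting for state $wanted" >&2
  return 1
}

# Example: pause, then block until the mapping reports Disabled.
# aws lambda update-event-source-mapping --region "$region" \
#   --uuid "$event_source_mapping_uuid" --no-enabled
# wait_for_mapping_state "$event_source_mapping_uuid" Disabled
```

The function returns non-zero if the state is not reached within the allowed number of tries, so a calling script can decide whether to keep waiting or abort.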
If you’re not sure what $event_source_mapping_uuid should be set to in all the above commands, keep reading.
Here’s an aws-cli incantation that will return the event source mapping UUID given a Kinesis stream and connected AWS Lambda function.
    source_arn=arn:aws:kinesis:us-east-1:ACCOUNTID:stream/STREAMNAME
    function_name=FUNCTIONNAME
    event_source_mapping_uuid=$(
      aws lambda list-event-source-mappings \
        --region "$region" \
        --function-name "$function_name" \
        --output text \
        --query 'EventSourceMappings[?EventSourceArn==`'$source_arn'`].UUID')
    echo event_source_mapping_uuid=$event_source_mapping_uuid
If your AWS Lambda function has multiple Kinesis event sources, you will need to pause each one of them separately.
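For a function with several event sources, the pause could be applied in a loop. The following is a rough sketch rather than a tested recipe: set_all_mappings is a hypothetical helper name, and note that it toggles every event source mapping attached to the function, not only the Kinesis ones. It assumes $region is set as in the earlier commands.

```shell
# set_all_mappings is a hypothetical helper, not part of the aws-cli.
# Usage: set_all_mappings FUNCTIONNAME --no-enabled   (pause)
#        set_all_mappings FUNCTIONNAME --enabled      (resume)
# Assumption: it is acceptable to toggle ALL mappings on the function.
set_all_mappings() {
  local function_name="$1" flag="$2" uuids uuid state
  case "$flag" in
    --enabled|--no-enabled) ;;
    *) echo "flag must be --enabled or --no-enabled" >&2; return 2 ;;
  esac
  # Collect the UUID of every event source mapping on the function
  uuids=$(aws lambda list-event-source-mappings \
    --region "$region" \
    --function-name "$function_name" \
    --output text \
    --query 'EventSourceMappings[].UUID') || return 1
  for uuid in $uuids; do
    # Pause or resume this mapping and report the state Lambda returns
    state=$(aws lambda update-event-source-mapping \
      --region "$region" \
      --uuid "$uuid" \
      "$flag" \
      --output text \
      --query 'State') || return 1
    echo "$uuid $state"
  done
}
```

Each output line pairs a mapping UUID with the state returned by the update call, which will typically be a transitional state such as Disabling or Enabling rather than the final one.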
Other Event Sources
The same process described above should be usable to pause/resume an AWS Lambda function reading from a DynamoDB Stream, though I have not tested it.
Other types of AWS Lambda event sources (e.g., S3, SNS) cannot currently be paused and resumed without missing events. However, if pause/resume is something you’d like to make easy for those sources, you could use AWS Lambda, the glue of AWS.
For example, suppose you currently have events flowing like this:

    S3 -> SNS -> Lambda

and you want to be able to pause the Lambda function without losing S3 events.

Insert a trivial new Lambda(pipe) function that reposts the S3/SNS events to a new Kinesis stream like so:

    S3 -> SNS -> Lambda(pipe) -> Kinesis -> Lambda

and now you can pause the last Kinesis->Lambda mapping while saving S3/SNS events in the Kinesis stream for up to 7 days (the stream's retention period must be raised from its 24-hour default), then resume where you left off.
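The Kinesis side of that arrangement might be set up with the aws-cli roughly as follows. This is a sketch under several assumptions: create_buffer_stream is a hypothetical helper name, the stream and function names are placeholders, one shard is assumed to be enough, and the code of the pipe function itself (which would put each incoming event onto the stream) is not shown. It assumes $region is set as in the earlier commands.

```shell
# create_buffer_stream is a hypothetical helper, not part of the aws-cli.
# Usage: create_buffer_stream STREAMNAME CONSUMER_FUNCTIONNAME
# Prints the UUID of the new Kinesis->Lambda event source mapping.
create_buffer_stream() {
  local stream_name="$1" consumer_function="$2" stream_arn
  # Create the stream (assumption: a single shard is sufficient)
  aws kinesis create-stream \
    --region "$region" \
    --stream-name "$stream_name" \
    --shard-count 1 || return 1
  # Block until the stream exists and can be used
  aws kinesis wait stream-exists \
    --region "$region" \
    --stream-name "$stream_name" || return 1
  # Raise retention from the 24-hour default to 7 days (168 hours)
  aws kinesis increase-stream-retention-period \
    --region "$region" \
    --stream-name "$stream_name" \
    --retention-period-hours 168 || return 1
  stream_arn=$(aws kinesis describe-stream \
    --region "$region" \
    --stream-name "$stream_name" \
    --output text \
    --query 'StreamDescription.StreamARN') || return 1
  # Connect the original Lambda function to the new stream
  aws lambda create-event-source-mapping \
    --region "$region" \
    --function-name "$consumer_function" \
    --event-source-arn "$stream_arn" \
    --starting-position TRIM_HORIZON \
    --output text \
    --query 'UUID'
}
```

The UUID it prints is the value to use as $event_source_mapping_uuid in the pause and resume commands earlier in this article.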
I still like my “pause Lambda” brainstorming idea: update the AWS Lambda function code to simply sleep forever, triggering a timeout error after 5 minutes and causing the Kinesis/Lambda framework to retry the function call with the same data over and over until we are ready to resume by uploading the real code again. But Steve’s discovery is going to end up being somewhat simpler, safer, and cheaper.