Provisioned Concurrency is a Lambda feature and works with any trigger. The concurrency limit applies to all functions in the same Region and is set to 1000 by default. In Lambda, a function's concurrency is the number of instances that serve requests at a given time. The way it works is pretty straightforward. According to AWS, provisioned concurrency "keeps functions initialized and hyper-ready to respond in double-digit milliseconds." Choose Reserve concurrency. In AWS, Provisioned Concurrency can be enabled for a specific Lambda function version (or an alias pointing to a version), so without versionFunctions it will likely not work as expected. However, you can request a given number of workers to be always warm and dedicated to a specific Lambda. Provisioned concurrency can be set manually from the AWS Console. AWS Lambda automatically scales up until the number of concurrent function executions reaches 1000. Scheduled scaling increases provisioned concurrency in anticipation of peak traffic. As the name implies, functions using Provisioned Concurrency are provisioned in advance and ready to serve requests as they come in. "dev" and I've configured provisioned concurrency for it but it is not taking effect. Using provisioned concurrency puts us back in the game of rightsizing and capacity planning, one of the big things we wanted to avoid when we chose serverless in the first place. To throttle a function, set the reserved concurrency to zero. Lambda starts allocating provisioned concurrency after a minute or two of preparation. Similar to how functions scale under load, up to 3000 instances of the function can be initialized at once, depending on the Region. After the initial burst, instances are allocated at a steady rate of 500 per minute until the request is fulfilled. Let's take a simple example of a company where employees work from 9 to 5, so request rates will be higher during this time.
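To make the allocation figures quoted above concrete, here is a minimal sketch of the arithmetic. It assumes the burst of up to 3000 instances and the steady rate of 500 per minute apply exactly as stated; the real burst size depends on the Region. The function name minutes_after_burst and its parameters are hypothetical, chosen only for illustration.

```python
import math


def minutes_after_burst(requested: int, burst: int = 3000, rate_per_minute: int = 500) -> int:
    """Estimate whole minutes of steady-rate allocation needed after the initial burst.

    Assumes up to `burst` instances are initialized at once and the remainder
    is allocated at `rate_per_minute`. Preparation time is not included.
    """
    if requested < 0:
        raise ValueError("requested must be non-negative")
    if burst < 0 or rate_per_minute <= 0:
        raise ValueError("burst must be non-negative and rate_per_minute positive")
    # Anything within the burst is covered immediately.
    remaining = max(0, requested - burst)
    # Round up: a partly used minute still has to elapse.
    return math.ceil(remaining / rate_per_minute)


if __name__ == "__main__":
    print(minutes_after_burst(1000))  # fits inside the burst
    print(minutes_after_burst(5000))  # 2000 instances left after the burst
```

For the 9-to-5 company, this suggests that a scheduled scale-up should start at least this many minutes, plus the minute or two of preparation, before the working day begins.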
You can then set up provisioned concurrency on that Lambda alias, and the routed requests should be handled by your provisioned instances. Provisioned concurrency may be configured on a chosen version of your function, or on an alias. You can reserve up to the Unreserved account concurrency value that is shown, minus 100 for functions that don't have reserved concurrency. Are you sure these are really cold start latencies rather than a problem with the database connection? Have you thought of using X-Ray for tracing? A colleague of mine ran a test to figure out what is going on here, and the CloudWatch logs are misleading. You can scale your Lambda in many different ways, e.g.: a) start all possible instances, b) scale up 60 additional instances per minute to a maximum of 1,000 concurrent invocations (with SQS), c) set provisioned concurrency to always have min. It is also imperative that you set a minimum concurrency of 5 on your processing Lambda function due to the initial scaling behavior. My Lambda is showing higher response times. Lambda: Provisioned concurrency not working as expected. It should be run in an environment with proper AWS credentials to create and modify AWS Lambda resources. After you enable Provisioned Concurrency, Lambda will provision the requested number of concurrent executions. Provisioned Concurrency can only be set on a published version of a Lambda function ($LATEST is not accepted). Alias is the name of the alias on which you want to set Provisioned Concurrency. Technically you can define multiple Provisioned Concurrencies and switch them depending on your use case. It's been reported to AWS. Provisioned Concurrency is a new feature of AWS Lambda in Serverless. What does it do? It minimizes the number of cold starts by preparing execution environments ahead of use, up to and including running the initialization code. It reduces the time spent on API invocations tremendously. Provisioned Concurrency is very easy to use.
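As a rough illustration of configuring provisioned concurrency on a version or alias, the sketch below wraps the boto3 Lambda client call put_provisioned_concurrency_config. The helper name enable_provisioned_concurrency and the function and alias names in the usage comment are hypothetical. The client is passed in as an argument, so the helper itself does not import boto3; running it against AWS assumes boto3 is installed and credentials are available.

```python
from typing import Any, Dict


def enable_provisioned_concurrency(
    client: Any, function_name: str, qualifier: str, instances: int
) -> Dict[str, Any]:
    """Request `instances` provisioned environments for a version or alias.

    `client` is expected to behave like boto3.client("lambda").
    `qualifier` must be a published version number or an alias name.
    """
    # Provisioned concurrency cannot be set on the unpublished $LATEST version.
    if qualifier.upper() == "$LATEST":
        raise ValueError("qualifier must be a published version or an alias, not $LATEST")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return client.put_provisioned_concurrency_config(
        FunctionName=function_name,
        Qualifier=qualifier,
        ProvisionedConcurrentExecutions=instances,
    )


# Hypothetical usage, assuming boto3 and valid AWS credentials:
#   import boto3
#   enable_provisioned_concurrency(boto3.client("lambda"), "my-function", "live", 5)
```

Rejecting $LATEST before the call is a design choice made here to fail early with a clear message; the service would reject the request as well.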
If you change the version that an alias points to, Lambda deallocates the provisioned concurrency from the old version and allocates it to the new version. How to engage Provisioned Concurrency. In AWS Lambda, a cold start refers to the initial increase in response time that occurs when a Lambda function is invoked for the first time, or after a period of inactivity. Activating Provisioned Concurrency. I very clearly see cold start latency unless I run the function several times within a short period of time. This can take a minute or two, and you can check on its progress in the meantime. 
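Checking on progress can be sketched as a small polling loop around the boto3 Lambda client call get_provisioned_concurrency_config, which reports a Status of IN_PROGRESS, READY, or FAILED. The helper name wait_until_ready, its parameters, and the polling interval are assumptions made for illustration. As with any such sketch, it needs a client object that behaves like boto3.client("lambda").

```python
import time
from typing import Any, Callable, Dict


def wait_until_ready(
    client: Any,
    function_name: str,
    qualifier: str,
    poll_seconds: float = 15.0,
    max_polls: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until provisioned concurrency for a version or alias is READY.

    Returns the final configuration, raises RuntimeError if allocation fails,
    and raises TimeoutError if `max_polls` checks pass without READY.
    """
    for _ in range(max_polls):
        config = client.get_provisioned_concurrency_config(
            FunctionName=function_name, Qualifier=qualifier
        )
        status = config.get("Status")
        if status == "READY":
            return config
        if status == "FAILED":
            raise RuntimeError(config.get("StatusReason", "allocation failed"))
        # Still IN_PROGRESS: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(f"provisioned concurrency not ready after {max_polls} polls")


# Hypothetical usage, assuming boto3 and valid AWS credentials:
#   import boto3
#   wait_until_ready(boto3.client("lambda"), "my-function", "live")
```

The sleep function is injectable so the loop can be exercised without real delays, for example in a unit test with a stand-in client.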