Managing Lambda Disk Space: Practical Guide to Ephemeral Storage and EFS in AWS Lambda

When building serverless applications with AWS Lambda, disk space is a factor often overlooked in favor of memory, CPU, and invocation limits. Yet the way Lambda handles temporary storage directly affects how efficiently you can process files, stream data, and manage intermediate results. This guide explains the concept of lambda disk space, how ephemeral storage works, and the best practices to store larger datasets securely and cost-effectively without sacrificing performance.

Understanding the Lambda disk space model

Lambda functions run inside containers (execution environments) whose filesystem is read-only apart from one writable location: the ephemeral storage mounted at /tmp. This storage is fast and local to the running environment, but its lifecycle is tied to the execution container. If a container is reused for subsequent invocations, the contents of /tmp remain available; if the container is recycled, the data in /tmp is lost. This model makes lambda disk space well suited to temporary intermediates, scratch work, or small best-effort caches, but not a substitute for long-term storage.
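The reuse behavior described above can be treated as a best-effort cache. The following is a minimal sketch, not a definitive pattern: the cache filename, the expensive_fetch helper, and the cache_dir parameter are all hypothetical, and cache_dir exists only so the function can be exercised outside Lambda.

```python
import os
from typing import Tuple

CACHE_DIR = "/tmp"  # Lambda's ephemeral storage mount point


def expensive_fetch() -> bytes:
    """Hypothetical stand-in for a slow download or computation."""
    return b"reference-data"


def load_reference_data(cache_dir: str = CACHE_DIR) -> Tuple[bytes, bool]:
    """Return (data, cache_hit). The cache is best-effort only."""
    path = os.path.join(cache_dir, "reference-data.bin")
    if os.path.exists(path):
        # Reused container: an earlier invocation left the file behind.
        with open(path, "rb") as f:
            return f.read(), True
    # Fresh or recycled container: rebuild the data and store it for later.
    data = expensive_fetch()
    with open(path, "wb") as f:
        f.write(data)
    return data, False


def handler(event, context):
    data, cache_hit = load_reference_data()
    return {"bytes": len(data), "cache_hit": cache_hit}
```

Because the code always knows how to rebuild the file, a recycled container costs only time, never correctness.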

Beyond /tmp, there is no built-in, persistent local disk space in Lambda for long-lived data. For anything larger or that must survive across invocations, developers typically rely on external services such as Amazon S3 or Amazon EFS. Those services provide scalable storage with clearly defined durability and access patterns, while keeping the Lambda function stateless from a data perspective.

Ephemeral storage: limits, defaults, and configuration

Since March 2022, AWS Lambda has supported configurable ephemeral storage from 512 MB up to 10,240 MB (10 GB) per function. The default is 512 MB, which is sufficient for lightweight tasks but quickly becomes a bottleneck for larger temporary datasets or heavy file processing. You can adjust this setting when you create or update a function, either through the AWS Console or via the AWS CLI/SDK by specifying the desired amount of ephemeral storage. Storage configured above the 512 MB default is billed in addition to the usual duration charges.
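As a sketch of the SDK route, the snippet below builds the arguments for Lambda's UpdateFunctionConfiguration call, whose EphemeralStorage parameter takes a Size in MB. The helper names are illustrative, and the actual call is kept in a separate function because it needs boto3 and AWS credentials.

```python
def ephemeral_storage_update(function_name: str, size_mb: int) -> dict:
    """Build keyword arguments for update_function_configuration."""
    if not 512 <= size_mb <= 10240:
        raise ValueError("ephemeral storage must be between 512 and 10240 MB")
    return {
        "FunctionName": function_name,
        "EphemeralStorage": {"Size": size_mb},
    }


def apply_update(kwargs: dict) -> dict:
    """Send the update to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("lambda").update_function_configuration(**kwargs)
```

For example, apply_update(ephemeral_storage_update("my-function", 2048)) would request 2 GB of /tmp for a function named my-function, which is a placeholder name.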

Key points to remember about lambda disk space and ephemeral storage:

Location: /tmp inside the Lambda execution environment.
Lifetime: Tied to the container; data can persist across invocations if the container is reused, but there is no guarantee of retention.
Limits: Configurable from 512 MB (the default) to 10,240 MB (10 GB) per function; each execution environment gets its own allocation.
Usage pattern: Best for temporary files, working datasets, and intermediate results not needed after the invocation ends.

When to increase ephemeral storage

Consider increasing lambda disk space when your workload involves substantial intermediate files or large temporary datasets. Examples include:

Image and video processing that creates large temporary files during transformation or encoding
Data parsing and aggregation tasks that generate sizable intermediate outputs
Batch operations that download multiple files from S3 to /tmp before processing
Complex ETL tasks with multiple streaming stages that write intermediate results to disk

Raising ephemeral storage can reduce the need to shuttle data back and forth to external services during a function execution, which often lowers latency and simplifies error handling. However, it should not be treated as a substitute for scalable storage patterns. If a task routinely touches very large datasets, you’ll likely benefit from attaching EFS or streaming to S3 rather than maximizing /tmp usage.
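Before downloading a batch into /tmp, it can help to check whether the batch will fit. This is a rough sketch using only the standard library; the function name and the 10% headroom default are assumptions chosen for illustration.

```python
import shutil


def fits_in_tmp(required_bytes: int, path: str = "/tmp",
                headroom: float = 0.10) -> bool:
    """Return True if required_bytes fits on path with a safety margin."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    reserve = int(usage.total * headroom)  # space deliberately left unused
    return required_bytes <= usage.free - reserve
```

A handler could sum the object sizes reported by S3 for the batch, call fits_in_tmp with that total, and fall back to streaming or EFS when the answer is False.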

Storing data beyond /tmp: S3 and EFS

To handle data that cannot fit into lambda disk space or needs to persist beyond a single invocation, two main options exist: Amazon S3 and Amazon EFS. Each has its own strengths and trade-offs.

Amazon S3: Ideal for input and output data, backups, and long-term storage. S3 is highly scalable, durable, and cost-effective. Use streaming techniques or multipart uploads to process large datasets without accumulating them in /tmp.
Amazon EFS: Provides a POSIX-compliant, scalable file system that can be mounted directly by Lambda functions. EFS is suitable for workloads that require shared storage across concurrent invocations, access from multiple functions, or stateful processing. It is slower than local /tmp but offers persistent storage between invocations and across containers.
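To illustrate the streaming approach mentioned for S3, here is a minimal sketch that hashes an object chunk by chunk so that nothing is written to /tmp. The function names and the 1 MiB chunk size are assumptions; the S3 wrapper relies on the Body returned by boto3's get_object, which supports read(n), and it needs boto3 and credentials to run.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary choice


def sha256_of_stream(body: BinaryIO, chunk_size: int = CHUNK_SIZE) -> str:
    """Hash a stream chunk by chunk, holding one chunk in memory at a time."""
    digest = hashlib.sha256()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_s3_object(bucket: str, key: str) -> str:
    """Stream an S3 object through the hasher. Bucket and key are placeholders."""
    import boto3  # imported lazily; not needed for sha256_of_stream itself

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return sha256_of_stream(response["Body"])
```

The same loop shape works for parsing or transforming data, as long as each chunk can be handled without seeing the whole file.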

Using EFS with Lambda: setup and considerations

Attach an Amazon EFS file system to your Lambda function when you need sustained storage that survives across executions. A typical setup includes mounting the EFS file system inside your function at a path such as /mnt/efs. Here are the essential steps:

Create an EFS file system in the same VPC as your Lambda function and configure appropriate security groups.
Create an EFS access point. Lambda connects to the file system through an access point, which also gives your Lambda workloads a stable entry point with a defined root directory.
In the Lambda configuration, enable VPC access and specify the VPC subnets and security groups that can reach the EFS mount targets.
Set the local mount path in the function's file system configuration (it must start with /mnt/, for example /mnt/efs). Lambda mounts the file system for you; your code then reads and writes files under that path as needed during the function’s lifetime.
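The configuration side of these steps can be sketched with the same UpdateFunctionConfiguration call, using its VpcConfig and FileSystemConfigs parameters. The helper name and all identifiers passed to it are placeholders; this only builds the request and does not contact AWS.

```python
from typing import List


def efs_attachment(function_name: str, access_point_arn: str,
                   subnet_ids: List[str], security_group_ids: List[str],
                   mount_path: str = "/mnt/efs") -> dict:
    """Build update_function_configuration arguments for an EFS mount."""
    if not mount_path.startswith("/mnt/"):
        raise ValueError("Lambda local mount paths must start with /mnt/")
    return {
        "FunctionName": function_name,
        # Subnets and security groups that can reach the EFS mount targets.
        "VpcConfig": {
            "SubnetIds": subnet_ids,
            "SecurityGroupIds": security_group_ids,
        },
        # Arn is the ARN of the EFS access point, not of the file system.
        "FileSystemConfigs": [
            {"Arn": access_point_arn, "LocalMountPath": mount_path}
        ],
    }
```

The resulting dictionary can be passed as keyword arguments to boto3.client("lambda").update_function_configuration. Inside the handler, files under the mount path are then opened with ordinary file I/O.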

Consider performance implications: in bursting throughput mode, EFS throughput scales with the amount of data stored; in provisioned mode, you pay for a fixed throughput level regardless of size; elastic mode scales throughput with demand. For cost management, choose the throughput and performance modes that match your access pattern, and monitor EFS usage to avoid under- or over-provisioning.

Best practices for lambda disk space management

Use streams to process large files or datasets directly from S3 or from other services, reducing the need to store everything in /tmp.
Write only what you truly need temporarily, and delete files as soon as they are no longer required. Implement cleanup logic in finally blocks or via proper error handling.
Use S3 for long-term storage and large datasets; use EFS when multiple invocations or functions need shared, persistent storage; reserve /tmp for ephemeral data.
Tune memory and storage separately. Increasing memory also increases CPU power, which can shorten processing time, but memory and ephemeral storage are configured independently, so more memory does not mean more disk space.
Design functions so that they don’t rely on local state across invocations. This minimizes pressure on lambda disk space and simplifies scaling.
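If you encounter “No space left on device” errors, raise the ephemeral storage setting, clean up files left by earlier invocations in a reused container, or switch to streaming or external storage. Throttling concurrency does not free space in /tmp, because each execution environment has its own /tmp; it helps only when the contention is on shared storage such as EFS.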
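Two of the practices above, prompt cleanup and handling storage exhaustion, can be combined in one small pattern. This is a sketch under assumptions: the function name, the exception class, and the "processing" step (which just reports the file size) are hypothetical stand-ins.

```python
import errno
import os
import tempfile


class ScratchSpaceExhausted(RuntimeError):
    """Raised when the scratch directory has no space left."""


def process_with_scratch(payload: bytes, scratch_root: str = "/tmp") -> int:
    """Write payload to a scratch file, process it, and always clean up."""
    # TemporaryDirectory deletes the directory and its contents on exit,
    # even when the body raises, so it plays the role of a finally block.
    with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
        path = os.path.join(workdir, "intermediate.bin")
        try:
            with open(path, "wb") as f:
                f.write(payload)
        except OSError as exc:
            if exc.errno == errno.ENOSPC:  # "No space left on device"
                raise ScratchSpaceExhausted(
                    "scratch space is full; stream the data or use S3/EFS"
                ) from exc
            raise
        # Stand-in for real processing: report the size of what was written.
        return os.path.getsize(path)
```

Converting ENOSPC into a dedicated exception gives the caller one clear place to decide on a fallback, rather than parsing error messages.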

Common scenarios and practical planning tips

For developers tackling file-heavy tasks, a practical plan helps balance lambda disk space and performance:

Estimate peak temporary data: If a single invocation processes a batch of files totaling several hundred megabytes, consider elevating ephemeral storage or moving intermediate steps to S3/EFS.
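Think about concurrency: Each concurrent execution environment gets its own /tmp, so high concurrency does not shrink the space available to a single invocation. It does multiply the load on shared resources such as EFS. When possible, stagger processing or limit concurrency to avoid contention there.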
Leverage parallelism with external storage: When results must be aggregated, write partial results to S3 or EFS and consolidate after processing completes.
Profile and observe: Use logs and metrics to measure disk usage during typical runs. If you regularly hit capacity, adjust your architecture rather than hack around storage limits.
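Once peak temporary data has been measured, turning it into a storage setting is simple arithmetic. The sketch below assumes a 25% headroom factor, which is an illustrative choice rather than an AWS recommendation, and clamps the result to Lambda's 512 to 10,240 MB range.

```python
import math

MIN_MB, MAX_MB = 512, 10240  # Lambda's configurable ephemeral storage range


def recommended_storage_mb(peak_bytes: int, headroom: float = 0.25) -> int:
    """Suggest an ephemeral storage size in MB for an observed peak usage."""
    needed_mb = math.ceil(peak_bytes * (1 + headroom) / (1024 * 1024))
    if needed_mb > MAX_MB:
        # Beyond the limit, /tmp is the wrong tool for the job.
        raise ValueError("peak usage exceeds 10 GB; use streaming, S3, or EFS")
    return max(MIN_MB, needed_mb)
```

A peak below roughly 400 MB stays at the 512 MB default under these assumptions, while a peak that cannot fit even at the maximum raises an error that points toward an architectural change.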

Monitoring and troubleshooting lambda disk space usage

Lambda's standard CloudWatch metrics do not include ephemeral storage usage, so tracking lambda disk space usually means instrumenting your code and using CloudWatch together with careful logging. Some practical steps include:

Log the size of files written to /tmp during each invocation and alert when usage approaches the configured limit.
Use custom metrics to track peak /tmp usage per function and per batch of invocations.
Watch for failures that indicate storage exhaustion, such as “No space left on device” or unexpected I/O errors, and have a fallback path to S3 or EFS.
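The logging and custom-metric steps above might look like the following sketch. It measures what is stored under a directory and formats a log line in CloudWatch embedded metric format, which Lambda's log pipeline can turn into a custom metric. The namespace Custom/LambdaStorage, the metric name TmpUsedBytes, and the function names are assumptions.

```python
import json
import os
import time


def directory_size_bytes(root: str = "/tmp") -> int:
    """Sum the sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def tmp_usage_metric(function_name: str, used_bytes: int) -> str:
    """Format an embedded metric format (EMF) log line for /tmp usage."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": "Custom/LambdaStorage",  # hypothetical namespace
                "Dimensions": [["FunctionName"]],
                "Metrics": [{"Name": "TmpUsedBytes", "Unit": "Bytes"}],
            }],
        },
        "FunctionName": function_name,
        "TmpUsedBytes": used_bytes,
    })
```

In a handler, printing tmp_usage_metric(context.function_name, directory_size_bytes()) near the end of each invocation would record usage per run; an alarm on the resulting metric can then warn before the configured limit is reached.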

Conclusion

Understanding lambda disk space is essential for building robust, scalable serverless applications. By recognizing the limits of the default ephemeral storage, leveraging up to 10 GB when needed, and choosing S3 or EFS for larger or persistent data, you can design functions that are both fast and reliable. With careful planning, proper configuration, and thoughtful data flow patterns, lambda disk space becomes a strength rather than a bottleneck, enabling clean, maintainable code and a smoother user experience for your applications.