Gurudev Prasad Teketi
Overview
In this article, I walk you through building an automated reporting pipeline using AWS services. The goal was to generate daily summary reports on publisher readership, detect missing metadata (like publishers), store the results as CSVs in S3, and deliver structured Slack notifications to internal stakeholders, all without manual intervention.
Architecture Diagram

Workflow:
1. Amazon EventBridge (Scheduler) triggers the workflow daily.
2. Lambda Function #1: Report Generator
- Runs a named query on Amazon Athena that calculates reading activity data.
- Stores results in an S3 bucket using a structured naming convention.
- Sends a Slack message once the report is ready.
3. Lambda Function #2: Publisher Summary Aggregator
- Fetches the latest CSV report from S3.
- Aggregates per-publisher read counts by quarter.
- Posts a clean, readable table to Slack showing publisher performance.
4. Lambda Function #3: Missing Publisher Detector (optional)
- Runs a separate Athena query to find books with missing publisher info.
- Sends a notification to Slack with a direct link to the generated file in S3.
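Each Lambda in the workflow ends by posting to Slack. The article does not show how the message is sent, so the following is only a minimal sketch that assumes a Slack incoming webhook; the function names, the webhook URL, and the report link are hypothetical.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def report_ready_message(report_name: str, link: str) -> str:
    """Format a short notification; <url|label> is Slack's link syntax."""
    return f"Report ready: <{link}|{report_name}>"


if __name__ == "__main__":
    # Hypothetical webhook URL and report link, for illustration only.
    req = build_slack_request(
        "https://hooks.slack.com/services/EXAMPLE",
        report_ready_message("daily_report.csv", "https://example.com/daily_report.csv"),
    )
    # urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
    print(req.get_method(), req.data.decode("utf-8"))
```

Keeping the request construction separate from the send makes the message format easy to unit test without touching Slack.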
Project Structure
Code:
publisher-reporting/
│
├── deploy.sh                  # Infra provisioning and Lambda packaging
├── config.yaml                # Runtime config (bucket names, cron schedules)
│
├── lambda/
│   ├── report_generator/
│   │   └── handler.py         # Runs Athena query and stores report in S3
│   ├── summary_report_notifier/
│   │   └── handler.py         # Aggregates data and posts to Slack
│   └── missing_publisher_report/
│       └── handler.py         # Detects and reports missing metadata
│
└── terraform/                 # Infra as Code (Lambda, EventBridge, IAM, etc.)
Code Snippets
Lambda: Report Generator
Code:
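import boto3

athena = boto3.client("athena")
s3 = boto3.client("s3")

# Start the Athena query; raw results land under a temporary prefix
exec_response = athena.start_query_execution(
    QueryString=query_string,
    QueryExecutionContext={"Database": ATHENA_DATABASE},
    ResultConfiguration={
        "OutputLocation": f"s3://{ATHENA_OUTPUT_BUCKET}/temporary-athena-query-results/"
    },
)
...
# Copy the finished result file into the report bucket under its final key
s3.copy_object(
    Bucket=TARGET_REPORT_BUCKET,
    CopySource={"Bucket": source_bucket, "Key": source_key},
    Key=final_key,
)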
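The snippet elides what happens between starting the query and copying the result. One common approach, sketched here as an assumption rather than the author's actual code, is to poll `get_query_execution` until the query reaches a terminal state and then split the reported output location into a bucket and key. The helper names `wait_for_query` and `split_s3_uri` are hypothetical.

```python
import time
from urllib.parse import urlparse


def wait_for_query(athena, query_execution_id, poll_seconds=2.0, max_attempts=60):
    """Poll Athena until the query finishes; return the S3 URI of its result file."""
    for _ in range(max_attempts):
        resp = athena.get_query_execution(QueryExecutionId=query_execution_id)
        execution = resp["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {state}: {reason}")
        time.sleep(poll_seconds)  # still QUEUED or RUNNING
    raise TimeoutError("Athena query did not finish in time")


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    parsed = urlparse(uri)
    return parsed.netloc, parsed.path.lstrip("/")
```

With these in place, `source_bucket, source_key = split_s3_uri(wait_for_query(athena, exec_response["QueryExecutionId"]))` would supply the values that `copy_object` expects. The polling interval and attempt limit should fit within the Lambda's configured timeout.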
Lambda: Slack Summary Formatter
Code:
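from collections import defaultdict

# counts[year_quarter][publisher] -> number of rows flagged as read (assumed structure)
counts = defaultdict(lambda: defaultdict(int))
for row in reader:
    if row["book_read_counts"].strip().upper() == "TRUE":
        yq = row["year_quarter"].strip()
        pub = row["publisher"].strip() or "Unknown"
        counts[yq][pub] += 1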
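To make the aggregation loop runnable on its own, here is a small sketch that wraps it in a function and feeds it CSV text through `csv.DictReader`. The three column names come from the snippet above; the sample rows and the `count_reads` name are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def count_reads(csv_text):
    """Return counts[year_quarter][publisher] for rows flagged as read."""
    counts = defaultdict(lambda: defaultdict(int))
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        if row["book_read_counts"].strip().upper() == "TRUE":
            yq = row["year_quarter"].strip()
            pub = row["publisher"].strip() or "Unknown"
            counts[yq][pub] += 1
    return counts


# Hypothetical sample rows; real reports come from the Athena query.
SAMPLE = """year_quarter,publisher,book_read_counts
2025-Q3,Acme Press,true
2025-Q3,Acme Press,TRUE
2025-Q3,,true
2025-Q3,Other House,false
"""

if __name__ == "__main__":
    for yq, per_pub in count_reads(SAMPLE).items():
        print(yq, dict(per_pub))
```

Rows with an empty publisher fall into the "Unknown" bucket, and rows whose flag is not "TRUE" (in any letter case) are skipped.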
Scheduling
Code:
+------------------------------+------------------------------------------+-------------------------+
| Lambda Function | Purpose | Schedule (Cron Format) |
+------------------------------+------------------------------------------+-------------------------+
| report_generator | Run Athena query and save CSV to S3 | cron(0 0 * * ? *) |
| summary_report_notifier | Read latest CSV and post Slack summary | cron(10 0 * * ? *) |
| missing_publisher_report | Detect books without publisher info | cron(15 0 * * ? *) |
+------------------------------+------------------------------------------+-------------------------+
# Notes:
# - Times are in UTC
# - 10-15 min stagger prevents overlap and race conditions
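EventBridge cron expressions have six fields: minutes, hours, day-of-month, month, day-of-week, and year. The sketch below splits the three schedules from the table into named fields so the stagger is easy to see. The `parse_eventbridge_cron` helper is hypothetical and does no validation beyond counting fields.

```python
FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_eventbridge_cron(expression):
    """Split 'cron(a b c d e f)' into a dict keyed by EventBridge's six field names."""
    if not (expression.startswith("cron(") and expression.endswith(")")):
        raise ValueError(f"not a cron expression: {expression!r}")
    parts = expression[len("cron("):-1].split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


# Schedules copied from the table above.
SCHEDULES = {
    "report_generator": "cron(0 0 * * ? *)",
    "summary_report_notifier": "cron(10 0 * * ? *)",
    "missing_publisher_report": "cron(15 0 * * ? *)",
}

if __name__ == "__main__":
    for name, expr in SCHEDULES.items():
        f = parse_eventbridge_cron(expr)
        print(f"{name}: {int(f['hours']):02d}:{int(f['minutes']):02d} UTC")
```

A schedule that fires only on the 1st of each month would carry `1` in the day-of-month field, for example `cron(0 0 1 * ? *)`. That expression is an illustrative assumption; the article does not show the original misconfigured value.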
Challenges & Fixes
1. Slack showed the same data daily
Issue:
The Slack message was posting the same publisher data every day.
Fix:
The EventBridge schedule was set to run only on the 1st of each month.
Updated it to run daily using: cron(0 0 * * ? *)
2. No new publisher summary files after Aug 1
Issue:
The S3 bucket had no updated files after August 1st.
Fix:
Found that the publisher report generator Lambda was not running.
Corrected the EventBridge schedule to trigger daily.
3. Lambda race condition
Issue:
The Slack posting Lambda was sometimes reading an older CSV file instead of the one just generated.
Fix:
Introduced a 10-minute delay between the generator Lambda and the Slack reporter Lambda by giving each its own schedule.
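A fixed delay works as long as the generator finishes within it. As an optional extra guard, and not something the article describes, the notifier could also check that the newest CSV is recent before posting. The sketch below assumes objects shaped like the "Contents" list returned by S3 `list_objects_v2`; the function name and the one-hour threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def latest_fresh_csv(objects, now, max_age=timedelta(hours=1)):
    """Return the key of the newest .csv object, or None if it is older than max_age.

    `objects` has the shape of the "Contents" list from S3 list_objects_v2:
    dicts with "Key" and a timezone-aware "LastModified".
    """
    csvs = [o for o in objects if o["Key"].endswith(".csv")]
    if not csvs:
        return None
    newest = max(csvs, key=lambda o: o["LastModified"])
    if now - newest["LastModified"] > max_age:
        return None  # today's report has not landed yet
    return newest["Key"]
```

When the function returns `None`, the notifier can skip posting or raise an error instead of silently repeating the previous day's numbers. Note that `list_objects_v2` returns at most 1,000 keys per call, so a large prefix would need pagination.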
4. Slack output was hard to read
Issue:
The publisher read counts in Slack were misaligned and difficult to follow.
Fix:
Formatted the message using Slack-compatible triple backticks (``` ... ```) to show preformatted blocks.
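Inside a preformatted block Slack uses a monospaced font, so padding each column to a fixed width is enough to line the numbers up. A minimal sketch of that idea follows; the column titles and the `render_table` name are assumptions, and the author's real formatter may differ.

```python
FENCE = "`" * 3  # Slack's preformatted-block delimiter


def render_table(per_publisher):
    """Render {publisher: reads} as an aligned table inside a Slack code block."""
    # Highest read count first; ties are broken alphabetically.
    rows = sorted(per_publisher.items(), key=lambda kv: (-kv[1], kv[0]))
    name_w = max([len("Publisher")] + [len(name) for name, _ in rows])
    num_w = max([len("Reads")] + [len(str(n)) for _, n in rows])
    lines = [f"{'Publisher':<{name_w}}  {'Reads':>{num_w}}"]
    lines.append("-" * (name_w + 2 + num_w))
    lines += [f"{name:<{name_w}}  {n:>{num_w}}" for name, n in rows]
    return FENCE + "\n" + "\n".join(lines) + "\n" + FENCE


if __name__ == "__main__":
    # Hypothetical publishers and counts.
    print(render_table({"Unknown": 3, "Acme Press": 12}))
```

Publisher names are left-aligned and counts right-aligned, so every line in the block has the same width.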
5. S3 bucket getting cluttered
Issue:
Temporary Athena result files were crowding the output bucket.
Fix:
Moved results to a 'temporary-athena-query-results/' folder prefix.
Added a lifecycle policy to auto-delete them after 3 days.
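The project provisions infrastructure with Terraform, so the lifecycle rule most likely lives there. For readers who want to see the shape of such a rule, here is a rough boto3 equivalent, offered as a sketch under that assumption. The rule ID and helper names are hypothetical; the prefix and the 3-day expiry come from the fix above.

```python
def temp_results_lifecycle(prefix="temporary-athena-query-results/", days=3):
    """Build a lifecycle configuration that expires objects under `prefix` after `days`."""
    return {
        "Rules": [
            {
                "ID": "expire-temporary-athena-results",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_lifecycle(s3_client, bucket):
    """Apply the rule with boto3's put_bucket_lifecycle_configuration."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=temp_results_lifecycle(),
    )
```

Be aware that `put_bucket_lifecycle_configuration` replaces the bucket's entire lifecycle configuration, so any existing rules would need to be included in the same call.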
Impact
- Eliminated all manual report generation
- Improved team visibility into reader engagement
- Ensured scalable, serverless infrastructure using AWS best practices
- Automated alerts improved issue tracking and data consistency
This project was a great example of combining Athena, Lambda, S3, and EventBridge into a cost-efficient, automated reporting pipeline. If you're working with serverless data workflows, this pattern is easily adaptable to product analytics, user activity tracking, sales dashboards, and more.