
The Case of the Phantom Pepperoni: A NullPointerException Saga
Incident Summary: On 2025-06-05T18:30:00Z, the PizzaTrackerService at CheesyBytes Inc. (a food delivery app) began reporting "Phantom Pepperoni" orders due to a NullPointerException in the calculateOvenTime() method. The incident caused 42% of pizzas to bake indefinitely, triggering smoke alarms at 3 pizzerias and confusing delivery drones that circled HQ for hours.


🔥 Affected Services

  • PizzaTrackerService (critical path for order processing)
  • OvenScheduler (automated baking system)
  • DroneNavigation (relied on bake-time ETA)

📉 Violated KPIs

  1. Order Accuracy: Dropped to 58% (target: 99.9%)
  2. Delivery Time: Increased to ∞ minutes (target: 30 mins)
  3. Customer Satisfaction: Fell to "Why is my pizza literally on fire?"

🚨 Critical Alerts

[18:30:00] CRITICAL: NullPointerException at  
CheesyBytes.PizzaTrackerService.calculateOvenTime(Pizza.java:127)  
- Failed to invoke 'getToppingConfig()' on null object reference  
[18:31:23] ALERT: Oven #7 reported "unusual cheese flare-up"  
[18:35:45] WARNING: Drone fleet stuck in loop singing "Never Gonna Give You Up"  

🔍 Forensic Data

Stack Trace:

java.lang.NullPointerException: Cannot invoke "ToppingConfig.getCookingTime()"  
because the return value of "Pizza.getToppingConfig()" is null  

Variables at Fault:

  • currentPizza.getToppingConfig(): null
  • oven.preheat(): Called with null temperature value

Log Anomalies:

  • 127 instances of Pizza{name='Phantom Pepperoni', config=null}
  • Drone logs showed 694 attempts to calculate route to "NaN,NaN"
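
The stack trace above points at a chained call on a config that can be null. A minimal sketch of the failing pattern next to a guarded variant follows. The class and method names mirror the trace, but the field layout and the fallback value `DEFAULT_BAKE_MINUTES` are assumptions made for illustration, not details from the incident.

```java
import java.util.Optional;

// Names mirror the stack trace (Pizza, ToppingConfig, getToppingConfig,
// getCookingTime); fields and the default bake time are hypothetical.
final class OvenTimeSketch {
    static final class ToppingConfig {
        private final int cookingTimeMinutes;
        ToppingConfig(int cookingTimeMinutes) { this.cookingTimeMinutes = cookingTimeMinutes; }
        int getCookingTime() { return cookingTimeMinutes; }
    }

    static final class Pizza {
        private final String name;
        private final ToppingConfig toppingConfig; // may be null, as in the incident
        Pizza(String name, ToppingConfig toppingConfig) {
            this.name = name;
            this.toppingConfig = toppingConfig;
        }
        String getName() { return name; }
        ToppingConfig getToppingConfig() { return toppingConfig; }
    }

    // Assumed fallback; the report does not state a real default.
    static final int DEFAULT_BAKE_MINUTES = 12;

    // Unguarded chain: throws NullPointerException when the config is null.
    static int calculateOvenTimeUnsafe(Pizza pizza) {
        return pizza.getToppingConfig().getCookingTime();
    }

    // Guarded chain: a missing config yields a bounded default bake time.
    static int calculateOvenTime(Pizza pizza) {
        return Optional.ofNullable(pizza.getToppingConfig())
                .map(ToppingConfig::getCookingTime)
                .orElse(DEFAULT_BAKE_MINUTES);
    }
}
```

Whether a silent default is acceptable for an oven is a judgment call; rejecting the order outright may be safer than baking with a guessed time.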

⏰ Event Timeline

  1. 18:29:55: Deployment of "Pepperoni++" v2.1.3 (removed null-check in PizzaFactory)
  2. 18:30:01: First NullPointerException observed
  3. 18:32:00: OvenScheduler began interpreting null bake time as "MAX_INT minutes"
  4. 18:40:00: Security cameras captured engineers offering burnt pizzas to an NPE gremlin drawn on the whiteboard

🛠️ Resolution

The team:

  1. Rolled back to v2.1.2 (which had an if (config != null) check)
  2. Added @NonNull annotations to every getToppingConfig() declaration
  3. Hired a "NullPointerException Prevention Clown" for code review parties

Post-Incident Confession:
A developer later admitted: "I thought pepperoni didn’t need configs. It’s just meat confetti!"


Clues for Root Cause Hunters:

  • The deployment removed a null-check that previously handled legacy pizza types
  • Forensic data shows Phantom Pepperoni had no corresponding entry in the ToppingConfig database
  • The Pizza object constructor allowed toppingConfig to default to null if unspecified
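
The last clue suggests the null could have been rejected where the pizza is built rather than where it is baked. The sketch below assumes a hypothetical `PizzaFactorySketch` that looks configs up in an in-memory map standing in for the ToppingConfig database. It illustrates the fail-fast idea and is not the team's actual fix.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical factory; the map stands in for the ToppingConfig database.
final class PizzaFactorySketch {
    static final class ToppingConfig {
        final int cookingTimeMinutes;
        ToppingConfig(int cookingTimeMinutes) { this.cookingTimeMinutes = cookingTimeMinutes; }
    }

    static final class Pizza {
        final String name;
        final ToppingConfig toppingConfig;

        Pizza(String name, ToppingConfig toppingConfig) {
            this.name = Objects.requireNonNull(name, "name");
            // Fail at construction time instead of later inside the oven scheduler.
            this.toppingConfig = Objects.requireNonNull(
                    toppingConfig, "no ToppingConfig for pizza '" + name + "'");
        }
    }

    private final Map<String, ToppingConfig> configsByName;

    PizzaFactorySketch(Map<String, ToppingConfig> configsByName) {
        this.configsByName = new HashMap<>(configsByName);
    }

    Pizza create(String name) {
        // Map.get returns null for unknown names, so the constructor check fires.
        return new Pizza(name, configsByName.get(name));
    }
}
```

With this shape, a "Phantom Pepperoni" with no database entry fails at order creation with a message naming the pizza, before any oven is involved.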

The answer lies in the cheese... or the lack thereof. 🧀👻



The Great Bread Uprising: A Foreign Key Fiasco
Incident Summary: On 2025-06-06T08:15:00Z, DoughyDelights Bakery’s inventory system triggered a SQL error 1451 after attempting to delete the "Sourdough Starter" recipe. This caused 12,000 baguettes to morph into pretzels in delivery apps, stranded 45 delivery drivers at "Bread Narnia" coordinates, and spawned customer complaints like "My ciabatta is judging me."


🥖 Affected Services

  • RecipeRegistry (core recipe database)
  • BreadGenerator (batch baking scheduler)
  • DeliveryPathfinder (GPS routing tied to recipe IDs)

📉 Violated KPIs

  1. Inventory Accuracy: Dropped to -7% (target: 99.5%)
  2. Waste Metrics: Increased by 420% (target: ≤5%)
  3. Driver Sanity: Replaced with "Why am I delivering negative bread?"

🚨 Critical Alerts

[08:15:02] CRITICAL: SQL Error 1451 - "Cannot delete parent row: foreign key constraint fails"  
[08:16:10] ALERT: Baker #3 reported "rye dough singing *Never Bake Alone*"  
[08:20:00] WARNING: Delivery maps now route to /dev/null  

🔍 Forensic Data

Error Log:

ERROR 1451: Cannot delete or update a parent row: a foreign key constraint fails  
(`DoughyDelights`.`orders`, CONSTRAINT `fk_recipe_id` FOREIGN KEY (`recipe_id`)  
REFERENCES `RecipeRegistry` (`id`))  

Offending Query:

DELETE FROM RecipeRegistry WHERE recipe_name = 'Sourdough Starter';  

Database Snapshot:

  • Orphaned Orders: 14,892 orders referencing recipe_id=NULL
  • GPS Anomalies: 67 drivers attempted deliveries to POINT(∅,∅)
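
Error 1451 in the log above is a MySQL vendor code, which JDBC exposes through `SQLException.getErrorCode()`. A small sketch of how calling code might recognize it and report something clearer than a raw stack trace follows. The class name and the message wording are hypothetical.

```java
import java.sql.SQLException;

// Hypothetical helper; only SQLException and its getters are real JDBC API.
final class ForeignKeyErrorSketch {
    // MySQL vendor code quoted in the incident's error log.
    static final int MYSQL_ROW_IS_REFERENCED = 1451;

    static boolean isParentRowReferenced(SQLException e) {
        return e.getErrorCode() == MYSQL_ROW_IS_REFERENCED;
    }

    // Turns the low-level error into an operator-facing explanation.
    static String describe(SQLException e) {
        if (isParentRowReferenced(e)) {
            return "Delete blocked: other rows still reference this record";
        }
        return "Unexpected database error: " + e.getMessage();
    }
}
```

A purge script that checked for this code could stop and list the dependent orders instead of pressing on.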

⏰ Event Timeline

  1. 08:14:55: Overzealous intern executed "Legacy Recipe Purge" script
  2. 08:15:01: First foreign key violation detected
  3. 08:17:30: BreadGenerator interpreted missing recipes as "pretzel mode"
  4. 08:25:00: Surveillance footage showed CFO trying to bribe a baguette

🛠️ Resolution

The team:

  1. Restored Sourdough Starter from backup (with ON DELETE RESTRICT added)
  2. Deployed a regex filter blocking DELETE commands containing "starter"
  3. Hosted a "Foreign Key Appreciation Day" with constraint-themed cupcakes

Post-Incident Confession:
The intern later tweeted: "#YOLO DELETE statements are the gluten-free option of SQL."


Clues for Root Cause Detectives:

  • The orders table had a foreign key dependency on RecipeRegistry.id [3]
  • No ON DELETE CASCADE clause existed in the schema
  • Audit logs showed 127 pre-incident warnings about "unreferenced recipe deletions"
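
The first two clues come down to what a database does when a referenced parent row is deleted. The following is an in-memory model of the two common policies, written only to make the difference concrete. It is not how MySQL implements constraints, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration of RESTRICT vs CASCADE; not a real database.
final class ReferentialIntegritySketch {
    private final Map<Integer, String> recipes = new HashMap<>();   // id -> recipe_name
    private final List<Integer> orderRecipeIds = new ArrayList<>(); // orders.recipe_id

    void addRecipe(int id, String name) { recipes.put(id, name); }

    void addOrder(int recipeId) {
        if (!recipes.containsKey(recipeId)) {
            throw new IllegalArgumentException("unknown recipe_id " + recipeId);
        }
        orderRecipeIds.add(recipeId);
    }

    // RESTRICT: refuse the delete while any order still references the recipe.
    boolean deleteRestrict(int recipeId) {
        if (orderRecipeIds.contains(recipeId)) {
            return false;
        }
        return recipes.remove(recipeId) != null;
    }

    // CASCADE: delete the recipe and every order referencing it.
    // Returns the number of orders removed along with it.
    int deleteCascade(int recipeId) {
        int before = orderRecipeIds.size();
        orderRecipeIds.removeIf(id -> id == recipeId);
        recipes.remove(recipeId);
        return before - orderRecipeIds.size();
    }

    int orderCount() { return orderRecipeIds.size(); }
    boolean hasRecipe(int id) { return recipes.containsKey(id); }
}
```

With no CASCADE clause in the schema, the incident's DELETE corresponds to the `deleteRestrict` path: the statement is rejected and the recipe survives.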

The proof is in the foreign pudding. 🍮🔑

[1] https://www.linkedin.com/pulse/10-common-bugs-sql-query-zita-demeter-yumuc
[2] https://www.ibm.com/docs/en/idr/11.4.0?topic=sm-sql-error-codes-nnnn
[3] https://ai2sql.io/common-sql-error-codes
[4] https://learnsql.com/blog/five-common-sql-errors/
[5] https://docs.intersystems.com/latest/csp/docbook/DocBook.UI.Page.cls?KEY=RERR_sql
[6] http://blog.solvaria.com/top-common-errors-sql-oracle
[7] https://www.stratascratch.com/blog/top-most-common-sql-coding-errors-in-data-science/
[8] https://www.metabase.com/learn/sql/debugging-sql/sql-syntax

The Curious Case of the Exploding Cupcake Counter: An Integer Overflow Incident
Incident Summary: On 2025-06-06T14:22:00Z, the CupcakeCounter microservice at SweetStats Analytics experienced a catastrophic integer overflow. This resulted in the system reporting negative cupcake sales, causing the finance dashboard to display “You owe the universe 2,147,483,648 cupcakes.” The marketing team briefly launched a “Cupcake Debt Forgiveness” campaign before the error was traced.


🧁 Affected Services

  • CupcakeCounter (real-time sales tally)
  • FinanceDashboard (exec-level reporting)
  • RewardsEngine (customer loyalty points)

📉 Violated KPIs

  1. Sales Accuracy: -2,147,483,648% (target: 100%)
  2. Reward Points Issued: 0 (target: 1 per cupcake)
  3. Dashboard Uptime: 78% (target: 99.9%)

🚨 Critical Alerts

[14:22:01] CRITICAL: ArithmeticException: integer overflow in CupcakeCounter.updateSales()  
[14:22:03] ALERT: FinanceDashboard shows negative cupcake revenue  
[14:23:10] WARNING: RewardsEngine issued 0 points for 2 million cupcake purchases  

🔍 Forensic Data

Error Log:

java.lang.ArithmeticException: integer overflow  
  at SweetStats.CupcakeCounter.updateSales(CupcakeCounter.java:88)
  at SweetStats.SalesService.processOrder(SalesService.java:45)

Variable State:

  • int totalCupcakesSold = 2147483647 (max value for signed 32-bit int)
  • Next sale increments value to -2,147,483,648

Database Snapshot:

  • sales_total column in CupcakeSales table: -2,147,483,648
  • Loyalty points for user cupcake_queen: 0 (expected: 1,000,000)
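
The variable state above shows a wrapped value, while the error log shows an exception. In Java those come from two different code paths: plain `int` arithmetic wraps silently, whereas `Math.addExact` throws `ArithmeticException`. The sketch below contrasts the two. Which one the real `updateSales()` used is not stated, so the class and method names here are illustrative only.

```java
// Hypothetical helper contrasting silent wraparound with checked addition.
final class OverflowSketch {
    // Plain int arithmetic: Integer.MAX_VALUE + 1 wraps to Integer.MIN_VALUE.
    static int incrementWrapping(int total) {
        return total + 1;
    }

    // Math.addExact throws ArithmeticException instead of wrapping.
    static int incrementChecked(int total) {
        return Math.addExact(total, 1);
    }
}
```

Calling `incrementWrapping(2147483647)` yields -2147483648, the figure in the database snapshot, while `incrementChecked` with the same argument throws.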

⏰ Event Timeline

  1. 14:21:55: “Cupcake Mania” flash sale begins
  2. 14:22:00: Record-breaking 2 million orders in 5 seconds
  3. 14:22:01: Integer overflow triggers ArithmeticException
  4. 14:23:30: FinanceDashboard replaced currency symbols with crying emojis
  5. 14:25:00: CEO tweets “We are now in negative cupcakes. Please eat responsibly.”

🛠️ Resolution

The team:

  1. Migrated int to long in all cupcake counters
  2. Added overflow detection and alerts
  3. Sent apology cupcakes to all affected customers (with apology sprinkles)

Post-Incident Confession:
A developer admitted: “Who knew there were that many cupcake lovers? I thought 2 billion was enough for everyone!”


Clues for Root Cause Sleuths:

  • CupcakeCounter used a signed 32-bit integer for sales totals
  • No overflow checks or exception handling
  • Database column type matched Java int
  • Flash sale volume exceeded maximum representable value
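
Taken together, the clues suggest a counter that is both wider and checked. One possible shape is a thread-safe `long` tally that uses exact addition and signals when the total passes an alert threshold. The class name, the threshold choice, and the boolean return are assumptions, not the team's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical replacement counter: 64-bit, checked, and thread-safe.
final class CupcakeTallySketch {
    // Assumed alert threshold: the old signed 32-bit ceiling.
    static final long ALERT_THRESHOLD = Integer.MAX_VALUE;

    private final AtomicLong total = new AtomicLong();

    // Adds a sale; returns true when the new total exceeds the threshold
    // so the caller can raise an alert. Throws ArithmeticException if even
    // the 64-bit range would overflow.
    boolean recordSale(long cupcakes) {
        if (cupcakes < 0) {
            throw new IllegalArgumentException("negative sale");
        }
        long updated = total.accumulateAndGet(cupcakes, Math::addExact);
        return updated > ALERT_THRESHOLD;
    }

    long total() { return total.get(); }
}
```

The database column would need the same widening, since the clues note that it matched the Java int.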

The mystery is baked in the numbers. 🧁💥


The Great Database Traffic Jam: SQLTimeoutException Tango

Incident Summary: On 2025-06-06T16:45:00Z, the OrderProcessor microservice at GadgetGuru Inc. (an e-commerce platform for quirky gadgets) experienced a severe SQLTimeoutException during a flash sale on "Self-Stirring Coffee Mugs." The database connection pool ran dry, causing 15,000 pending orders to queue indefinitely and triggering a wave of confused support tickets titled "Why is my coffee mug still manual?"


🛒 Affected Services

  • OrderProcessor (order placement and payment)
  • InventoryTracker (real-time stock management)
  • NotificationService (order confirmation emails)

📉 Violated KPIs

  1. Order Processing Time: Spiked to 9999 seconds (target: 2 seconds)
  2. Successful Order Rate: Dropped to 8% (target: 99.9%)
  3. Customer Satisfaction: "My coffee is getting cold waiting for my self-stirring mug!"

🚨 Critical Alerts

[16:45:01] CRITICAL: SQLTimeoutException in OrderProcessor.processOrder()
[16:45:02] ALERT: Connection pool exhausted (maxConnections=50, active=50, waiters=14,987)
[16:46:30] WARNING: NotificationService backlogged with 9,000 unsent emails

🔍 Forensic Data

Error Log:

com.mysql.jdbc.exceptions.jdbc4.MySQLTimeoutException: Connection timed out after 30 seconds
  at OrderProcessor.processOrder(OrderProcessor.java:73)
  at OrderService.submitOrder(OrderService.java:45)

Connection Pool Metrics:

  • Max Connections: 50
  • Active Connections: 50
  • Waiting Requests: 14,987 at peak
  • Average Query Duration: 32 seconds (normal: <0.5s)

Database Snapshot:

  • Locked Rows: 23,000 (due to long-running transactions)
  • CPU Utilization: 98%
  • Deadlock Events: 0 (not a deadlock, just a traffic jam)
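
The pool metrics above allow a rough capacity estimate. The sketch assumes each order needs one query and every connection stays busy back to back, which gives an upper bound of connections divided by query duration (Little's law). The class and method names are made up for the illustration.

```java
// Back-of-the-envelope pool capacity, assuming one query per order.
final class PoolCapacitySketch {
    // Upper bound on queries per minute when every connection is always busy.
    static double maxQueriesPerMinute(int connections, double avgQuerySeconds) {
        return connections * 60.0 / avgQuerySeconds;
    }

    // Connections needed to keep pace with an arrival rate at a given query duration.
    static int connectionsNeeded(double requestsPerMinute, double avgQuerySeconds) {
        return (int) Math.ceil(requestsPerMinute / 60.0 * avgQuerySeconds);
    }
}
```

Under these assumptions, `maxQueriesPerMinute(50, 32.0)` is 93.75, against 20,000 requests arriving per minute, and `maxQueriesPerMinute(500, 32.0)` is only 937.5. That suggests the larger pool helps mainly in combination with the query optimization listed in the resolution.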

⏰ Event Timeline

  1. 16:44:50: Flash sale on "Self-Stirring Coffee Mugs" begins
  2. 16:45:00: Order volume spikes to 20,000 requests per minute
  3. 16:45:01: First SQLTimeoutException observed
  4. 16:46:00: Connection pool fully exhausted, orders start queuing
  5. 16:50:00: Support team overwhelmed with "Where’s my mug?" tickets
  6. 16:55:00: Engineers panic and try to "stir" the database with a spoon

🛠️ Resolution

The team:

  1. Increased the database connection pool size from 50 to 500
  2. Optimized long-running queries and added query timeouts
  3. Implemented rate limiting for flash sales
  4. Hosted a "Connection Pool Appreciation Brunch" with free coffee (manually stirred)

Post-Incident Confession:
A database admin later admitted: "I thought 50 connections was enough for everyone. Turns out, everyone wants self-stirring mugs!"


Clues for Root Cause Investigators:

  • Connection pool size was static and too small for flash sale traffic
  • Long-running queries blocked connections, causing a backlog
  • No rate limiting or circuit breakers were in place for peak events
  • Monitoring did not alert on pending request queues until after the pool was exhausted
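
The clues about unbounded queuing can be illustrated with a simple admission gate: callers wait a bounded time for a slot and are rejected quickly otherwise, rather than piling up behind an exhausted pool. This is a minimal sketch built on `java.util.concurrent.Semaphore`. The class name, the parameters, and the "rejected" fallback are hypothetical and unrelated to any specific pool library.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical gate that bounds concurrency and waiting time.
final class AdmissionGateSketch {
    private final Semaphore permits;
    private final long maxWaitMillis;

    AdmissionGateSketch(int maxConcurrent, long maxWaitMillis) {
        this.permits = new Semaphore(maxConcurrent);
        this.maxWaitMillis = maxWaitMillis;
    }

    // Runs the work if a slot frees up in time; otherwise returns the
    // rejected value so the caller can answer "try again" instead of queuing.
    <T> T run(Supplier<T> work, T rejected) {
        boolean acquired;
        try {
            acquired = permits.tryAcquire(maxWaitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return rejected;
        }
        if (!acquired) {
            return rejected;
        }
        try {
            return work.get();
        } finally {
            permits.release(); // always free the slot, even if the work throws
        }
    }
}
```

Sizing the gate at or below the pool's maximum would keep the waiter count visible and bounded, which is the signal the monitoring clue says was missing.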

The culprit? A coffee mug that stirred up more trouble than coffee! ☕🛑