How I Helped a Company Investigate Missed Payments Using PayPal APIs and New Relic
In the fast-paced world of digital transactions, one missed payment can set off a chain of support tickets, frustrated users, and accounting inconsistencies. Recently, I was brought in to assist a company that was experiencing a troubling number of “payment not received” reports, despite successful charges on PayPal. My task? Get to the root cause, fix it, and help prevent it from happening again.
The Problem
The company’s customer support team flagged several user reports of paid subscriptions not unlocking access on the platform. The reports were initially dismissed as edge cases, but their volume grew over time. The team had no visibility into what was going wrong, only that something was off between PayPal and their system.
Step 1: Reconstructing the Payment Flow
The first thing I did was map out the full payment lifecycle:
- User initiates a subscription or one-time payment.
- PayPal processes the payment.
- A webhook is sent to the system to confirm payment.
- The system activates the service for the user.
Each of these steps had a potential failure point.
Step 2: Pulling Data with PayPal APIs
Using PayPal’s Reporting and Subscription APIs, I wrote scripts to pull the transaction history programmatically. I focused on:
- Subscription state (ACTIVE, CANCELLED, etc.)
- Related transactions, including time, amount, and status
- Webhook event logs (where possible)
I also pulled all transactions over a period and grouped them by subscription ID, comparing them with our internal user records. This helped identify where records diverged—cases where PayPal had successfully charged the user, but our system didn’t reflect that.
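The grouping-and-comparison step can be illustrated with a minimal sketch. It assumes the transactions have already been pulled and flattened into simple records; the `PayPalTransaction` shape, the `find_unmatched` helper, and the idea of representing internal records as a set of active subscription IDs are all hypothetical simplifications, not the scripts used in the actual investigation.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PayPalTransaction:
    # Hypothetical, flattened shape of one transaction pulled from PayPal.
    subscription_id: str
    transaction_id: str
    status: str  # assumed values such as "COMPLETED" or "DECLINED"
    amount: Decimal


def group_by_subscription(transactions):
    """Group transactions by the subscription they belong to."""
    grouped = defaultdict(list)
    for tx in transactions:
        grouped[tx.subscription_id].append(tx)
    return dict(grouped)


def find_unmatched(transactions, internal_active_ids):
    """Return subscription IDs that have at least one successful charge on
    the PayPal side but are not active in the internal records."""
    unmatched = []
    for sub_id, txs in group_by_subscription(transactions).items():
        charged = any(tx.status == "COMPLETED" for tx in txs)
        if charged and sub_id not in internal_active_ids:
            unmatched.append(sub_id)
    return sorted(unmatched)
```

Calling `find_unmatched(transactions, active_ids)` yields the list of subscriptions worth inspecting by hand: the ones where money moved but access was never granted.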
Step 3: Real-Time Tracing with New Relic
To correlate the PayPal data with internal application logs, I used New Relic. I set up:
- A query to identify 400/500 errors on webhook endpoints
- Filters to surface paymentMethod, subscription, and webhook traffic
- Tracing for the ipn and webhook routes to determine if/when they were called
Step 4: Defining the Root Cause: a Cascade of Silent Failures
The investigation uncovered several key issues:
- Webhooks occasionally failed silently — returning 500 without alerting anyone.
- Missing backfill process — there was no retry mechanism to pull PayPal state in case a webhook failed.
- Most critically: the event processing server responsible for handling webhooks was frequently crashing due to out-of-memory (OOM) events. These crashes led to ungraceful termination of processes in the middle of webhook processing. Payments were being marked as “initiated” but never completed — causing data inconsistencies and missing service activations.
This memory issue had gone unnoticed until I set up proper observability. Once real-time metrics and error traces were in place, the OOM patterns became obvious, and I could correlate those events directly to failed or incomplete webhook handling.
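To make the correlation idea concrete, here is a rough sketch of matching crash timestamps against webhook records that started shortly before a crash and never finished. The `WebhookRecord` shape, the `interrupted_by_crash` helper, and the 60-second window are assumptions for illustration; in practice the crash times would come from the monitoring data and the records from the application's own webhook log.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class WebhookRecord:
    # Hypothetical internal record of one webhook delivery.
    event_id: str
    started_at: datetime
    completed_at: Optional[datetime]  # None means processing never finished


def interrupted_by_crash(records, crash_times, window=timedelta(seconds=60)):
    """Map each crash timestamp to the event IDs of webhooks that started
    within `window` before the crash and never completed."""
    result = {}
    for crash in crash_times:
        hits = [
            r.event_id
            for r in records
            if r.completed_at is None
            and crash - window <= r.started_at <= crash
        ]
        if hits:
            result[crash] = hits
    return result
```

Any event ID returned here is a candidate for a payment left in the “initiated” state, and therefore a candidate for manual or automated repair.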
Step 5: Fix and Monitor
To address all these pain points, I implemented the following:
- Logged every webhook with detailed status codes and PayPal event IDs.
- Introduced a retry and backfill mechanism: if a subscription is in an uncertain state, the system now queries PayPal directly to verify status.
- Set up memory usage and system health dashboards in New Relic, including OOM alerts.
- Reconfigured the webhook processing system to ensure graceful shutdowns and reduce memory consumption.
- Improved visibility: dashboards in New Relic now show real-time webhook activity, failure rates, and unmatched payment states.
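The retry and backfill idea from the list above can be sketched roughly as follows. This is not the production code: `backfill`, `fetch_paypal_status`, and `activate` are hypothetical names, and the PayPal lookup is injected as a plain callable so the sketch stays independent of any HTTP client. In a real system that callable would wrap an authenticated request to PayPal for the subscription's current status.

```python
import time
from typing import Callable, Dict, Iterable


def backfill(
    uncertain_ids: Iterable[str],
    fetch_paypal_status: Callable[[str], str],  # hypothetical PayPal lookup
    activate: Callable[[str], None],            # hypothetical internal hook
    max_attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Dict[str, str]:
    """For each subscription in an uncertain state, ask PayPal for its
    status (retrying on errors) and activate the service if PayPal says
    the subscription is ACTIVE."""
    outcome = {}
    for sub_id in uncertain_ids:
        status = None
        for attempt in range(1, max_attempts + 1):
            try:
                status = fetch_paypal_status(sub_id)
                break
            except Exception:
                # Broad catch keeps the sketch short; real code should
                # catch only transient network or HTTP errors.
                if attempt < max_attempts:
                    time.sleep(delay_seconds * attempt)  # linear backoff
        if status is None:
            outcome[sub_id] = "lookup_failed"
        elif status == "ACTIVE":
            activate(sub_id)
            outcome[sub_id] = "activated"
        else:
            outcome[sub_id] = f"skipped:{status}"
    return outcome
```

Running a sweep like this on a schedule means a lost webhook delays activation instead of preventing it, and the returned outcome map gives the dashboards something concrete to count.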
Results
- Immediate resolution of over 95% of reported cases
- Stable webhook processing system — no more silent kills or data gaps
- Proactive alerts for webhook failures and system health
- Reduced support workload due to fewer missing payment reports
- New SOP for payment reconciliation and infrastructure monitoring
Key Findings
One of the recurring patterns I see across companies and development teams is that they often have powerful tools like New Relic already installed — and are paying thousands of dollars annually for them — but lack the expertise to configure and interpret them properly. As a result, these tools sit underutilized, offering little more than surface-level dashboards. Without meaningful instrumentation, alerting, or trace correlation, teams are effectively flying blind despite having all the right equipment on board. Unlocking the real value of these platforms requires not just installation, but intelligent integration into the development and operations workflow.
If your platform handles recurring billing or online payments, silent errors can be expensive. Don’t wait for your users to report problems. Instead, build observability into your transaction flows — and if you’re not sure how, I can help.
Bonus
— Dmitry R., CWD | CatWatchdog
ˁ˚ᴥ˚ˀ
If you want to know more about the details, don’t hesitate to ask in the chat, which is available to subscribers. I’ll also publish an extended version of this case study with more hardcore details, describing which architectural patterns are used here and how and why they work.