Advancing SRE Observability with language models & cloud dependency graphs

By: Abhishek Jain, Ph.D., and Laurence Yang, Chief Data & Analytics Office, and Kritika Mehta, Consumer & Community Banking Technology

In the world of cloud infrastructure and microservices, incident root-cause analysis (RCA) and resolution involve navigating and analyzing a multitude of data sources across time-series, text, and verbal dimensions. This process is difficult to automate and requires the manual work of experienced engineers.

As companies increase their infrastructure and cloud capabilities, they need a reliable and effective Observability method for efficient IT incident resolution and operations. At DEVUP, we had an opportunity to present a way of addressing this need.  

Our objectives were to: 

  • Assist cloud Site Reliability Engineering (SRE) teams with root-cause analysis of incidents, thereby enhancing Observability and helping teams meet service level agreements (SLAs) through faster resolution.
  • Utilize data from different incident dimensions (text, time-series) to provide detailed insights and recommendations to SREs.
  • Leverage Classical and Generative Machine Learning techniques to enhance solutions. 

To achieve these objectives, we proposed a parallelized workflow. Using both historical incident text data and anomalous cloud telemetry data, we generated concrete insights into engineering-related incidents to speed up resolution.

Natural Language Processing (NLP) techniques helped us match a user query (an incident symptom) with historical incident data and provide recommendations to SREs based on the top-matched root causes. In this approach, we used language models to extract root causes and generate embeddings from unstructured incident text, and clustering algorithms to aggregate symptoms and root causes.
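To make the matching step concrete, here is a minimal sketch of symptom-to-root-cause retrieval. It is not the team's implementation: TF-IDF vectors stand in for language-model embeddings, and grouping matches by an existing root-cause label stands in for clustering. The incident history, `recommend_root_causes`, and the other helper names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical historical incidents: (symptom text, labelled root cause).
HISTORY = [
    ("checkout API returns 504 gateway timeout", "database connection pool exhausted"),
    ("login page slow, high latency on auth service", "auth cache eviction storm"),
    ("504 timeout errors from payments API", "database connection pool exhausted"),
    ("disk full alerts on logging nodes", "log rotation misconfigured"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf(docs: list[str]) -> tuple[list[dict[str, float]], dict[str, float]]:
    """Return a sparse TF-IDF vector per document plus the (smoothed) IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend_root_causes(query: str, history, k: int = 3) -> list[tuple[str, float]]:
    """Rank root causes by the similarity of their best-matching historical symptom."""
    vectors, idf = tfidf([symptom for symptom, _ in history])
    # Terms never seen in the history get zero weight.
    q = {t: c * idf.get(t, 0.0) for t, c in Counter(tokenize(query)).items()}
    best: dict[str, float] = {}
    for vec, (_, cause) in zip(vectors, history):
        score = cosine(q, vec)
        # Group matches by root-cause label, keeping the strongest match per cause.
        if score > 0 and score > best.get(cause, 0.0):
            best[cause] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for cause, score in recommend_root_causes("payments API 504 timeout", HISTORY):
        print(f"{score:.2f}  {cause}")
```

Swapping in real sentence embeddings would only change how vectors are produced; the ranking and per-cause aggregation would stay the same.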

Our other approach used time-series analysis to identify the application services and events most correlated with user-identified incidents. A cloud dependency graph helped determine the root cause of the issue, as follows:

  • Given a customer’s cloud application architecture, a Cloud Dependency Graph is first created and stored in a graph database. This graph has information regarding various cloud services and endpoints hosted by the application.  
  • During or right after a service incident, the RCA application walks the graph starting from the affected service, collecting telemetry data from cloud service metrics.
  • Anomaly detection machine learning algorithms, such as Isolation Forests, identify and correlate anomalies across the ‘walked’ services. Using statistical analysis, root-cause metrics are computed and presented to SREs in the same interactive UI that displays the RCA-NLP results.
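The graph walk and anomaly correlation can be sketched as follows. This is a simplified illustration under stated assumptions, not the production RCA application: the walk follows downstream dependencies breadth-first, a per-service z-score replaces the Isolation Forest, and a service is treated as a root-cause candidate when it is anomalous but none of its own dependencies are. The graph, metrics, threshold, and function names are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev

# Hypothetical dependency graph: each service maps to the services it calls.
DEPENDENCIES = {
    "web-frontend": ["checkout-api", "auth-service"],
    "checkout-api": ["payments-db", "inventory-api"],
    "auth-service": ["user-db"],
    "payments-db": [],
    "inventory-api": [],
    "user-db": [],
}

# Hypothetical p95 latency samples (ms); the last sample is the incident window.
METRICS = {
    "web-frontend": [120, 118, 125, 122, 119, 410],
    "checkout-api": [80, 82, 79, 81, 80, 350],
    "auth-service": [30, 31, 29, 30, 30, 31],
    "payments-db": [5, 6, 5, 5, 6, 95],
    "inventory-api": [40, 41, 40, 39, 40, 41],
    "user-db": [10, 10, 11, 10, 9, 10],
}


def walk(graph, start):
    """Breadth-first walk of downstream dependencies from the affected service."""
    seen, order, queue = {start}, [start], deque([start])
    while queue:
        for dep in graph.get(queue.popleft(), []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def anomaly_score(series):
    """Z-score of the latest sample against the preceding baseline samples."""
    baseline, latest = series[:-1], series[-1]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return 0.0 if latest == mu else float("inf")
    return abs(latest - mu) / sigma


def root_cause_candidates(graph, metrics, start, threshold=3.0):
    """Anomalous services on the walk whose own dependencies all look normal."""
    walked = walk(graph, start)
    scores = {s: anomaly_score(metrics[s]) for s in walked if s in metrics}
    flagged = {s for s, z in scores.items() if z >= threshold}
    candidates = [
        (s, scores[s]) for s in walked
        if s in flagged and not any(d in flagged for d in graph.get(s, []))
    ]
    return sorted(candidates, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print(root_cause_candidates(DEPENDENCIES, METRICS, "web-frontend"))
```

In this toy data the frontend, checkout API, and payments database are all anomalous, but only `payments-db` has no anomalous dependency of its own, so the SRE's attention is narrowed to that one service.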

This novel approach enhances Observability for SRE teams by not only providing root-cause insights as a post-mortem step, but also reducing incident resolution times by focusing SREs on only the affected services and metrics.

It was a pleasure sharing these methods with our JPMorganChase engineering community. 

Evangelizing secure coding via hacker exploit techniques

By: Jim Maples, Mark Davison, and Jamison Trefzger, Secure Development Team

Evangelizing anything requires passion, a characteristic abundantly shared by every team member presenting at DEVUP. So when we present the Vulnerability Impact Lab, we don’t need to convince anyone; it is more like “preaching to the choir” – a very enthusiastic choir of engineers, developers, and technical experts.

It is essential that every member of this carefully curated community of colleagues (penalty flag for gratuitous alliteration) understands the mechanics of how software vulnerabilities are exploited. Learning exploit techniques and adopting an “attacker mindset” are crucial if software engineers are to build robust defenses into their code. With this in mind, the Secure Development team created the Vulnerability Impact Lab to provide hands-on, experiential training.

The Vulnerability Impact Lab delivered both exploit techniques and the attacker mindset in a way that is approachable even for participants without a developer background. Participants were guided step by step through exploiting vulnerabilities and saw the results of their actions in real time. Some engineers were astonished at how easily a hypothetical application could be exploited and how devastating the results could be for an organization with a vulnerable app. One thing’s for certain: they won’t look at apps – the ones they develop or the ones they use – the same way ever again.

From code to metadata: how metadata-driven APIs transform development

By: Chandra Mana, Grace Forciea, and Gaurav Gupta, Global Banking Technology

In today's digital landscape, APIs play a vital role in connecting diverse software systems. As technology continues to evolve, so does our approach to API development. In our DEVUP lecture, “From Code to Metadata: APIs Built Better!”, we explored the metadata-driven API development framework and how it can transform the way we create APIs.

The metadata-driven API development framework is a game-changer in the world of APIs. With this framework, developers can generate secure, fully operational APIs without writing a single line of code or worrying about deployment; all it takes is a few simple metadata configurations. The framework saves time and resources while maintaining high quality standards, making API development accessible and enjoyable.
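The idea of replacing endpoint code with metadata can be illustrated with a small sketch. It does not reflect the framework's actual configuration format or internals, which the talk summary does not describe: `EndpointMetadata`, `build_api`, and the sample dataset are hypothetical. One generic handler factory reads each metadata record and enforces an entitlement check, allowed filters, and field projection, so adding an endpoint means adding a record rather than writing code.

```python
from dataclasses import dataclass
from typing import Any, Callable

Record = dict[str, Any]
Handler = Callable[[dict[str, str], set[str]], list[Record]]


@dataclass(frozen=True)
class EndpointMetadata:
    """Hypothetical metadata record describing one read-only data endpoint."""
    path: str
    dataset: str
    fields: tuple[str, ...]           # columns the endpoint is allowed to expose
    filterable: tuple[str, ...] = ()  # query parameters accepted as equality filters
    required_role: str = "reader"     # entitlement checked before any data is read


def build_handler(meta: EndpointMetadata, datasets: dict[str, list[Record]]) -> Handler:
    """Turn one metadata record into a request handler; no per-endpoint code."""
    def handler(query: dict[str, str], roles: set[str]) -> list[Record]:
        if meta.required_role not in roles:
            raise PermissionError(f"{meta.path} requires role {meta.required_role!r}")
        unknown = set(query) - set(meta.filterable)
        if unknown:
            raise ValueError(f"unsupported filters for {meta.path}: {sorted(unknown)}")
        rows = datasets[meta.dataset]
        matches = [r for r in rows if all(str(r.get(k)) == v for k, v in query.items())]
        # Project each matching row onto the declared fields only.
        return [{f: r.get(f) for f in meta.fields} for r in matches]
    return handler


def build_api(metadata: list[EndpointMetadata], datasets: dict[str, list[Record]]) -> dict[str, Handler]:
    """Map each configured path to its generated handler."""
    return {m.path: build_handler(m, datasets) for m in metadata}


# Hypothetical data and configuration.
DATASETS = {
    "accounts": [
        {"id": 1, "region": "EMEA", "balance": 100, "internal_note": "review"},
        {"id": 2, "region": "APAC", "balance": 250, "internal_note": "ok"},
    ]
}
METADATA = [
    EndpointMetadata(path="/accounts", dataset="accounts",
                     fields=("id", "region", "balance"), filterable=("region",)),
]

if __name__ == "__main__":
    api = build_api(METADATA, DATASETS)
    print(api["/accounts"]({"region": "APAC"}, {"reader"}))
```

Because security and projection live in the shared factory, the `internal_note` column never leaves the service unless a metadata record explicitly lists it.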

The framework is deployed across a range of business use cases, offering several ways to consume and distribute data from multiple platforms. Its goal is to provide a single version of the truth to all users, delivered through reliable, secure, and modern Data APIs and publications.

Our metadata-driven API framework is currently used by more than one hundred applications across the wholesale banking business, receiving more than 60 million calls per month and reaching over 80,000 users. Using the framework simplifies our architecture and reduces business costs.

It was a pleasure presenting this platform at DEVUP, sharing our success story and educating our colleagues in other areas of the bank on what can be accomplished with an intuitive approach to API development.

Fitness tracker for engineering - Optimize your development team’s fitness and accelerate your satisfaction

By: Gordon Murphy, Priyanka Gangurde, and Valeria Arce, Markets Technology

Fitness technology has revolutionized the way people take ownership of their health, fitness, and mental wellness. By providing on-demand, real-time access to workout plans, nutritional advice, and personalized mental-wellness and training programs, fitness technology is helping us be healthier and happier.

It’s designed to keep us close to our goals, boost our motivation, and show our health progress and “Fitness Predictability,” which the human brain loves and craves.  

In the same way, engineers in Markets Technology have revolutionized how we approach software engineering by providing flow metrics and actionable insights for over 700 teams. With patterns revealed from current and historical data, teams can better predict future delivery and head off risks before they materialize, creating a better experience for everyone involved.

As we pursue engineering excellence, we’re always striving to deliver better solutions for our clients in the fastest, most efficient way possible. That’s why we developed team analytics: to streamline development, increase client satisfaction, and make our development teams happier by putting control where it belongs, back with the developers.

Inspired by the methodologies of elite sports teams, we took our colleagues at DEVUP on an immersive experience of audio, video, interactive gaming, and successful use cases to highlight the intricacies of flow analysis. From computation methodologies to actionable insights, we demonstrated how team analytics enables data-driven decision-making, optimizing resource allocation and process efficiency.

Through the adoption of team analytics software, we explored how flow metrics become tangible, actionable data points that enable development teams to optimize their workflows and achieve peak performance. By integrating these analytics into their processes, teams can foster a culture of autonomy and continuous improvement, leading to healthier and happier teams and customers alike.
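As a rough illustration of how flow metrics turn raw work-item data into predictable numbers, the sketch below computes cycle time, throughput, work in progress (WIP), and a nearest-rank percentile forecast. These are standard textbook definitions chosen for illustration; the team analytics software's actual computation methodology is not described here, and the work items and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class WorkItem:
    """A unit of work with a start date and an optional finish date."""
    key: str
    started: date
    finished: Optional[date] = None


def cycle_times(items: list[WorkItem]) -> list[int]:
    """Days from start to finish for every completed item."""
    return [(i.finished - i.started).days for i in items if i.finished is not None]


def throughput(items: list[WorkItem], start: date, end: date) -> int:
    """Number of items finished within [start, end], inclusive."""
    return sum(1 for i in items if i.finished is not None and start <= i.finished <= end)


def wip(items: list[WorkItem], on: date) -> int:
    """Items started on or before `on` and not yet finished by then."""
    return sum(1 for i in items if i.started <= on and (i.finished is None or i.finished > on))


def percentile(values: list[int], p: float) -> int:
    """Nearest-rank percentile, e.g. p=85 -> '85% of items finished within N days'."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical work items for one team.
ITEMS = [
    WorkItem("A-1", date(2024, 5, 1), date(2024, 5, 3)),
    WorkItem("A-2", date(2024, 5, 2), date(2024, 5, 8)),
    WorkItem("A-3", date(2024, 5, 6), date(2024, 5, 9)),
    WorkItem("A-4", date(2024, 5, 7)),  # still in progress
]

if __name__ == "__main__":
    ct = cycle_times(ITEMS)
    print("median cycle time (days):", median(ct))
    print("85th percentile cycle time (days):", percentile(ct, 85))
    print("throughput, May 6-12:", throughput(ITEMS, date(2024, 5, 6), date(2024, 5, 12)))
    print("WIP on May 7:", wip(ITEMS, date(2024, 5, 7)))
```

A percentile forecast like this is one simple way to express delivery predictability: rather than a single estimate, a team can say how long most of its items have historically taken.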

Lori Beer kicks off DEVUP 2024

By: Catherine Livigni, Global Technology Communications

The day started with an opening ceremony and remarks from Global Chief Information Officer Lori Beer, who emphasized how proud she was to be at DEVUP again. As a leader who started her career as a software developer, Lori reflected on the unique opportunity DEVUP offers as the largest event for software engineers in financial services. “To have had access to an event like this would have been amazing,” said Lori.

Lori challenged listeners to take the lessons they learned that week back to their teams and highlighted the conference’s theme of curiosity. “Each of you has an opportunity this week to explore new ideas, new approaches and new advancements that will help you solve our most complex challenges,” she continued. “Your role as a technologist is crucial. It is, simply, our Global Technology community that is the foundation of JPMorganChase’s success.”