Here are my peer-reviewed publications. You may access all of my papers, including preprints, on 🔗Google Scholar.

2024

Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis

Junjie Huang, Zhihan Jiang, Jinyang Liu, Yintong Huo, Jiazhen Gu, Zhuangbin Chen†, Cong Feng, Hui Dong, Zengyin Yang, Michael R. Lyu

ISSRE'24 The International Symposium on Software Reliability Engineering, Tsukuba, Japan, Oct 2024.

Logs are crucial for maintaining online service systems, but manual investigation of logs by engineers is labor-intensive and prone to errors. We find that engineers typically prioritize two categories of log information for diagnosis: fault-indicating descriptions (FID) that highlight abnormal events, and fault-indicating parameters (FIP) that identify associated entities. Motivated by these findings, we propose Log4d, a two-stage approach with novel prompt-based tuning to automatically extract fault-indicating information from logs for fault diagnosis.

A Large-scale Evaluation for Log Parsing Techniques: How Far are We?

Zhihan Jiang, Jinyang Liu, Junjie Huang, Yichen Li, Yintong Huo, Jiazhen Gu, Zhuangbin Chen†, Jieming Zhu, Michael R. Lyu

Artifact badges: Available, Reusable

ISSTA'24 The ACM SIGSOFT International Symposium on Software Testing and Analysis, Vienna, Austria, Sep 2024.

Log parsing is essential for converting unstructured logs into structured data for automated analysis. Evaluating the characteristics and performance of various log parsers is crucial; however, the existing Loghub dataset is limited in scale and representativeness. We introduce Loghub-2.0, comprising 14 datasets with an average of 3.6 million logs each. Based on these datasets, we thoroughly re-evaluate 15 state-of-the-art log parsers in a more rigorous and practical setting, offering valuable insights.

LILAC: Log Parsing using LLMs with Adaptive Parsing Cache

Zhihan Jiang, Jinyang Liu, Zhuangbin Chen, Yichen Li, Junjie Huang, Yintong Huo, Pinjia He, Jiazhen Gu†, Michael R. Lyu

FSE'24 The ACM International Conference on the Foundations of Software Engineering, Porto de Galinhas, Brazil, July 2024.

Log parsing serves as a prerequisite for various log analysis tasks, but the performance of current syntax-based and semantic-based parsers remains unsatisfactory. Leveraging large language models (LLMs) to overcome the limitations of existing log parsers is promising; however, it presents challenges related to specialization, consistency and efficiency. To address these practical issues, we propose LILAC, the first practical Log parsIng framework using LLMs with Adaptive parsing Cache.

Go Static: Contextualized Logging Statement Generation

Yichen Li, Yintong Huo†, Renyi Zhong, Zhihan Jiang, Jinyang Liu, Junjie Huang, Jiazhen Gu, Michael R. Lyu

FSE'24 The ACM International Conference on the Foundations of Software Engineering, Porto de Galinhas, Brazil, July 2024.

Logging practices have been extensively studied to assist developers in writing logging statements. However, existing automatic logging methods with single-method contexts face three key limitations: limited static scope, inconsistent logging styles, and missing variable type information. To tackle these limitations, we propose SCLogger, the first approach to generate contextualized logging statements using large language models with inter-method static contexts.

TraceMesh: Scalable and Streaming Sampling for Distributed Traces

Zhuangbin Chen, Zhihan Jiang, Yuxin Su, Michael R. Lyu, Zibin Zheng†

CLOUD'24 The IEEE International Conference on Cloud Computing, Shenzhen, China, July 2024. 🏆 Best Paper Award

Distributed tracing is a fundamental monitoring tool for cloud systems; however, it typically captures overlapping and redundant information. Existing tail-based trace samplers fall short of considering the high-dimensional and dynamic nature of trace data. To address these practical challenges, we introduce TraceMesh, a scalable and streaming sampler for distributed traces, which adapts to evolving trace features and dynamically samples uncommon traces.

FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems

Junjie Huang, Jinyang Liu, Zhuangbin Chen, Zhihan Jiang, Yichen Li, Jiazhen Gu†, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu

ICSE'24 The IEEE/ACM International Conference on Software Engineering: Software Engineering in Practice, Lisbon, Portugal, Apr 2024.

Postmortem analysis is essential for managing cloud system incidents, involving profiling incidents to classify them into unique fault patterns. Current manual approaches are labor-intensive and error-prone, resulting in only the most severe incidents being analyzed, which leads to a skewed fault pattern overview. To address these limitations, we propose an automated approach called FaultProfIT, for Fault Pattern Profiling of Incident Tickets, utilizing hierarchy-guided contrastive learning.

2023

Prism: Revealing Hidden Functional Clusters of Massive Instances in Cloud Systems

Jinyang Liu*, Zhihan Jiang*, Jiazhen Gu, Junjie Huang, Zhuangbin Chen†, Cong Feng, Zengyin Yang, Yongqiang Yang, Michael R. Lyu (* equal contribution)

ASE'23 The IEEE/ACM International Conference on Automated Software Engineering, Kirchberg, Luxembourg, Sep 2023.

To improve the observability of large-scale cloud systems, we propose to infer functional clusters, i.e., groups of instances having similar functionalities, to bridge the gap between the instance and service layers. Our pilot study demonstrates that instances having similar functionalities share similar communication and resource usage patterns. Motivated by these findings, we propose a non-intrusive solution, Prism, to reveal functional clusters in cloud systems based on communication traces and performance metrics.
