CLOUD AI RESEARCH

Academic Minimalism

Research notes on cloud-native AI systems, distributed training, and infrastructure.

Writing by year

No thumbnails. Just text.

Applying Machine Learning Algorithms for Early Fault Prediction of Virtual Network Functions in Network Function Virtualization Systems

NFV · Machine Learning · AI

Analyzing the wonder of a sunbeam that doesn't come from the sun.

A research note on making distributed AI services observable, recoverable, and easy to reason about.

CloudAI · DistributedSystems · Reliability

A note on why training jobs, schedulers, and observability should be designed together.

CloudInfra · DistributedTraining · Observability

How I annotate OSDI papers to recover the design constraints hidden between the lines.

PaperReview · SystemsResearch · Workflow

A short note on treating observability as a first-class training primitive rather than a postmortem tool.

DistributedTraining · Observability · MLOps

A lightweight framework for deciding when to retrain versus when to recalibrate.

EdgeAI · Evaluation · MLOps

A note on why accuracy alone is not enough when inference has to live on constrained devices and unstable networks.

EdgeAI · Inference · SystemsDesign