Innovative Data Engineer

I specialize in building scalable web crawlers in Rust that handle millions of pages, and in turning the raw data they collect into actionable intelligence.

View my CV for more details.

CETD implementation

Content extraction via Text Density
Algorithm described in the paper by Fei Sun, Dandan Song and Lejian Liao
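
As a rough illustration of the core idea, the sketch below computes the basic text density of a node: the number of characters in its subtree divided by the number of tags in its subtree, with a tag count of zero treated as one. The `Node` type and its methods are hypothetical stand-ins for a real DOM, and this covers only the simplest density measure, not the full algorithm or the actual implementation.

```rust
/// Hypothetical minimal DOM node: its own text plus child elements.
struct Node {
    text: String,
    children: Vec<Node>,
}

impl Node {
    fn new(text: &str, children: Vec<Node>) -> Self {
        Node { text: text.to_string(), children }
    }

    /// Total number of characters in this node's subtree.
    fn char_count(&self) -> usize {
        self.text.chars().count()
            + self.children.iter().map(Node::char_count).sum::<usize>()
    }

    /// Number of descendant tags below this node.
    fn tag_count(&self) -> usize {
        self.children.iter().map(|c| 1 + c.tag_count()).sum()
    }

    /// Basic text density: characters per tag.
    /// A node without descendant tags is treated as having one tag.
    fn text_density(&self) -> f64 {
        let tags = self.tag_count().max(1);
        self.char_count() as f64 / tags as f64
    }
}
```

A navigation block with many short links scores low, while a long paragraph with few nested tags scores high, which is what lets density separate main content from page chrome.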

Expiring Bloom Filter

Bloom Filter
Sliding-window implementation of a decaying Bloom filter
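
One common way to make a Bloom filter forget old entries is to keep a small ring of generations and periodically drop the oldest one. The sketch below assumes that design. The `SlidingBloom` name, its parameters, and the double-hashing scheme are illustrative choices rather than the project's actual API.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::VecDeque;
use std::hash::{Hash, Hasher};

/// Hypothetical sliding-window Bloom filter: a ring of bit arrays,
/// newest at the front, oldest at the back.
struct SlidingBloom {
    generations: VecDeque<Vec<bool>>,
    bits: usize,
    hashes: u64,
}

impl SlidingBloom {
    fn new(generations: usize, bits: usize, hashes: u64) -> Self {
        assert!(generations > 0 && bits > 0 && hashes > 0);
        SlidingBloom {
            generations: (0..generations).map(|_| vec![false; bits]).collect(),
            bits,
            hashes,
        }
    }

    /// Derive `hashes` bit positions from two base hashes (double hashing).
    fn positions<T: Hash + ?Sized>(&self, item: &T) -> Vec<usize> {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();

        let mut h2 = DefaultHasher::new();
        h2.write_u64(0x9e37_79b9_7f4a_7c15); // arbitrary seed for the second hash
        item.hash(&mut h2);
        let b = h2.finish() | 1; // keep the step odd so it is never zero

        (0..self.hashes)
            .map(|i| (a.wrapping_add(i.wrapping_mul(b)) % self.bits as u64) as usize)
            .collect()
    }

    /// Record an item in the newest generation.
    fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let positions = self.positions(item);
        let current = self.generations.front_mut().expect("at least one generation");
        for p in positions {
            current[p] = true;
        }
    }

    /// True if any live generation may contain the item (false positives possible).
    fn contains<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let positions = self.positions(item);
        self.generations
            .iter()
            .any(|generation| positions.iter().all(|&p| generation[p]))
    }

    /// Advance the window: drop the oldest generation and start an empty one.
    fn rotate(&mut self) {
        self.generations.pop_back();
        self.generations.push_front(vec![false; self.bits]);
    }
}
```

In a crawler, `rotate` would typically be called on a timer, so a URL stops counting as seen once every generation that recorded it has been dropped.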

Async framework for data extraction

Comprehensive Asynchronous Parallel Processing
Common building blocks I use to build Rust CLI tools for web crawlers

I turn raw, unstructured web content into actionable intelligence. Using Rust for high-speed, large-scale crawling and Python for flexible data processing, I build robust web robots and distributed systems that provide the data foundations for machine learning and AI projects.

This work supports projects ranging from training domain-specific LLMs to building other AI applications.