Twitter Stream Analyzer
I wanted to see what machine learning could read from Twitter as it happens: sentiment, topics, and named entities, plus user-defined filters that raise alarms when matching tweets appear in the stream.
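A user-defined filter can be modeled as a rule evaluated against each tweet after enrichment. The sketch below assumes a hypothetical `EnrichedTweet` shape (text, a sentiment score in [-1, 1], topic and entity sets) and a hypothetical `AlertRule` in which every condition that is set must hold; the project's actual filter format may differ.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

_PUNCT = ".,!?;:'\"#@"  # characters stripped from word edges before keyword matching


@dataclass
class EnrichedTweet:
    # Hypothetical shape of a tweet after the NLP stage has enriched it.
    text: str
    sentiment: float  # assumed scale: -1.0 (negative) to 1.0 (positive)
    topics: Set[str] = field(default_factory=set)
    entities: Set[str] = field(default_factory=set)


@dataclass
class AlertRule:
    # Hypothetical user-defined filter: every condition that is set must hold.
    name: str
    keywords: Set[str] = field(default_factory=set)
    topics: Set[str] = field(default_factory=set)
    entities: Set[str] = field(default_factory=set)
    max_sentiment: Optional[float] = None  # fire only at or below this score

    def matches(self, tweet: EnrichedTweet) -> bool:
        words = {w.strip(_PUNCT).lower() for w in tweet.text.split()}
        if self.keywords and not ({k.lower() for k in self.keywords} & words):
            return False
        if self.topics and not (self.topics & tweet.topics):
            return False
        if self.entities and not (self.entities & tweet.entities):
            return False
        if self.max_sentiment is not None and tweet.sentiment > self.max_sentiment:
            return False
        return True


def triggered_alarms(rules: List[AlertRule], tweet: EnrichedTweet) -> List[str]:
    """Return the names of every rule this tweet matches."""
    return [rule.name for rule in rules if rule.matches(tweet)]
```

For example, a rule with `keywords={"outage"}` and `max_sentiment=-0.2` fires on a negative tweet mentioning an outage but stays quiet on a cheerful one that uses the same word.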
The constraint that shaped the whole design: model inference is slower than tweets arrive. Run models inline with ingestion and you drop data on every spike. So the architecture is producer/consumer, with Kafka-style messaging decoupling three stages: ingestion keeps up with the stream, NLP workers process at their own pace, and a Django API layer serves results over REST and GraphQL to a React frontend.
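The decoupling can be sketched in-process, with Python's `queue.Queue` standing in for a Kafka topic and threads standing in for separate services. This is not Kafka's API; `ingest`, `nlp_worker`, `run_pipeline`, and the `analyze` callable are hypothetical names. What it shows is that the producer only enqueues, so a slow `analyze` lengthens the backlog instead of stalling ingestion.

```python
import queue
import threading
from typing import Callable, Iterable, List

_STOP = object()  # sentinel telling a worker to exit


def ingest(source: Iterable[str], topic: "queue.Queue[object]") -> None:
    """Producer: push raw tweets as they arrive; never waits on NLP."""
    for tweet in source:
        topic.put(tweet)  # unbounded queue, so put() does not block


def nlp_worker(topic: "queue.Queue[object]", results: List[object],
               analyze: Callable[[str], object], lock: threading.Lock) -> None:
    """Consumer: pull tweets at its own pace and run (slow) inference."""
    while True:
        item = topic.get()
        if item is _STOP:
            break
        enriched = analyze(item)
        with lock:
            results.append(enriched)


def run_pipeline(source: Iterable[str], analyze: Callable[[str], object],
                 num_workers: int = 4) -> List[object]:
    topic: "queue.Queue[object]" = queue.Queue()  # buffer that absorbs spikes
    results: List[object] = []
    lock = threading.Lock()
    workers = [
        threading.Thread(target=nlp_worker, args=(topic, results, analyze, lock))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    ingest(source, topic)
    for _ in workers:  # one sentinel per worker, queued after all real tweets
        topic.put(_STOP)
    for w in workers:
        w.join()
    return results
```

In a real deployment the broker persists the backlog and a consumer group spreads partitions across workers, so a crash or a long spike does not lose buffered tweets the way this in-memory queue would.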
The related tradeoff was choosing micro-batching over per-tweet inference. Batching raises throughput, since each model call amortizes its fixed overhead across many tweets, at the cost of a few seconds of latency. For a monitoring dashboard, near-real-time is effectively indistinguishable from real-time while being far cheaper to run.
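Micro-batching can be sketched as a size-or-timeout collector: wait for the first tweet, then keep pulling until the batch is full or a deadline passes. `next_batch`, the batch size of 32, and the 2-second wait are illustrative assumptions, not values from this project; the deadline is what caps the added latency.

```python
import queue
import time
from typing import List


def next_batch(q: "queue.Queue", max_size: int = 32, max_wait: float = 2.0) -> List:
    """Block for the first item, then gather more until the batch holds
    max_size items or max_wait seconds have passed since the first arrived."""
    batch = [q.get()]  # wait (indefinitely) for at least one tweet
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:  # deadline reached with a partial batch
            break
    return batch
```

A worker would loop on `next_batch` and pass the whole list to a batched model call (a hypothetical `analyze_batch(texts)`), so one forward pass covers many tweets. Under heavy load batches fill instantly and latency stays low; in quiet periods the timeout flushes partial batches so no tweet waits longer than `max_wait`.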
The analytics and inference workloads are containerized with Docker and designed for Kubernetes-style orchestration. Getting throughput up was mostly an observability exercise: adding monitoring, logging, and metrics, then using them to find where workers sat idle and to tune utilization from evidence rather than guesswork.
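One way to make "where workers sat idle" measurable is a per-worker utilization metric: time spent in inference divided by wall-clock time. The `UtilizationTracker` class below is a hypothetical, stdlib-only sketch; a real setup would export the same numbers through whatever metrics system is in place. The clock is injectable so the metric can be tested deterministically.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


class UtilizationTracker:
    """Hypothetical per-worker metric: fraction of wall-clock time spent
    doing inference rather than waiting on the queue."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._start = clock()
        self.busy_seconds = 0.0
        self.items = 0

    @contextmanager
    def busy(self, n_items: int = 1) -> Iterator[None]:
        """Wrap a unit of work (e.g. one batch) to count it as busy time."""
        t0 = self._clock()
        try:
            yield
        finally:
            self.busy_seconds += self._clock() - t0
            self.items += n_items

    def utilization(self) -> float:
        elapsed = self._clock() - self._start
        return self.busy_seconds / elapsed if elapsed > 0 else 0.0
```

Logged periodically alongside queue depth, the number becomes actionable: consistently low utilization with an empty queue suggests more workers than the stream needs, while near-full utilization with a growing backlog suggests too few.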