Reading Reddit’s Market Mood: A Minimal Streamlit Sentiment Dashboard
A single meme can move a stock twenty percent before the coffee finishes brewing.
Every trading day, subreddits like r/wallstreetbets collectively publish tens of thousands of posts, comments, and screenshots of options gains. Whether it’s the latest rumor disguised as DD or just a tidal wave of rocket emojis, the chatter is loud, chaotic, and, given the right lens, surprisingly measurable. Buried inside these subreddits is an endlessly looping question: Which stocks are people actually talking about right now, and are they feeling bullish, bearish, or just trolling?
I got tired of punching ticker symbols into browser tabs to quantify the noise. Traditional sentiment analysis tools either cost money, require complex API setups, or focus on mainstream news sources that miss the raw, unfiltered energy of retail investor forums. I wanted a way to tap directly into Reddit’s pulse without paying a dime for third-party enterprise firehoses or signing my soul away to a data-broker dashboard that wants late-cycle SaaS money.
So I stitched together a no-frills, locally-run tool that downloads the freshest posts on-demand (manual refresh or force fresh), isolates ticker mentions, scores sentiment with VADER, and drops everything into a one-screen Streamlit dashboard you can scan between sips of coffee. No API keys, no cloud credits, no patience required.
[Figure: Full dashboard, more features as we scroll.]

### The Problem: Signal in the Noise

Reddit’s investment communities have become impossible to ignore. Whether you think they’re psychic crowd wisdom or organized chaos, the sheer volume of discussion around certain tickers often precedes significant price movements. When a previously ignored ticker suddenly dominates conversation, or when uniform bullishness flips to concern, volatility often follows, in either direction (we know what happened with GameStop).
The challenge isn’t finding the conversations; it’s making sense of them at scale. I wanted something different: a lightweight, local solution that could turn Reddit’s public posts into an at-a-glance snapshot of what retail is actually saying today, and whether the net tone skews bullish, bearish, or simply resigned.
### What You’ll Have in Under Five Minutes
When the Streamlit tab opens, you land on a sidebar packed with dials: a multi-select for subreddits (default: r/wallstreetbets; add r/stocks and r/options via the same control), plus sliders for posts per subreddit (10–500, default 200) and how many top stocks you want surfaced (5–50, default 20).
In plain English: “Grab the last, say, 200 posts from the selected subreddits; distill them to the symbols that show up most often; tell me how often; tell me the sentiment.”
Hit Refresh and you stay on fast cached numbers. Hit Force Fresh Data if you just caught wind of a ticker exploding in live chat and want an instant temperature check.
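For the curious, the sidebar wiring needs only a few Streamlit calls. A minimal sketch, with widget labels and defaults mirroring the description above (the real app’s layout may differ):

```python
# Sketch of the sidebar controls: subreddit multi-select, two sliders,
# and the two refresh buttons. Labels/defaults mirror the article.
import streamlit as st

subreddits = st.sidebar.multiselect(
    "Subreddits",
    options=["wallstreetbets", "stocks", "options"],
    default=["wallstreetbets"],
)
post_limit = st.sidebar.slider("Posts per subreddit", 10, 500, 200)
top_limit = st.sidebar.slider("Top stocks to surface", 5, 50, 20)

refresh = st.sidebar.button("Refresh")         # serve cached data if still valid
force = st.sidebar.button("Force Fresh Data")  # bypass the cache entirely
```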
A second later you’ll see:
- **A tidy ranking table** of ticker symbols, sorted first by raw mentions and second by average sentiment
- **An alerts panel** that lights up only when a ticker crosses two user-set thresholds (mentions AND sentiment must both be met): no spam, just the outliers that deserve your next mouse click
- **Export CSV and Analyst Notes buttons inline:** one click dumps everything to CSV, the other lets you scribble free-form analyst notes that save locally to data/notes.json
- **Current run visualizations:** a scatter plot (mentions vs. sentiment, labelled) and a treemap (size = mentions, grayscale by sentiment)
- **Historical visualizations:** a mention heatmap (ticker × date) and stacked sentiment per day (positive/neutral/negative)
- **Price Overlay:** average daily sentiment bars plus an adjusted-close line, joined with a 7-day as-of merge
### How the Pipeline Works (No Spaghetti, I Promise)
The code traces a straight line from Reddit’s public JSON endpoints to your browser without hiding the tracks. Picture the data path as a carefully orchestrated conveyor belt:
#### Data Ingestion Without the OAuth Dance
Reddit’s public JSON endpoints feed the scraper directly: no API keys or authentication headaches required. Appending .json to any Reddit URL returns structured data. The scraper walks pagination links with a polite one-second delay between calls, fast enough to be useful, slow enough to respect Reddit’s servers (and to keep you off any watchlist). The pipeline uses the hot feed by default.
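Here’s a minimal sketch of that ingestion loop against Reddit’s public JSON endpoints; the function name and User-Agent string are my own placeholders:

```python
# Fetch posts from a subreddit's hot feed via the public .json endpoint,
# following the "after" pagination token with a 1s delay between pages.
import time
import requests

def fetch_posts(subreddit: str, limit: int = 200) -> list[dict]:
    headers = {"User-Agent": "sentiment-dashboard/0.1"}  # Reddit rejects blank UAs
    url = f"https://www.reddit.com/r/{subreddit}/hot.json"
    posts, after = [], None
    while len(posts) < limit:
        params = {"limit": 100, "after": after}
        resp = requests.get(url, headers=headers, params=params, timeout=10)
        resp.raise_for_status()
        data = resp.json()["data"]
        posts.extend(child["data"] for child in data["children"])
        after = data["after"]
        if after is None:   # no more pages to walk
            break
        time.sleep(1)       # polite delay between calls
    return posts[:limit]
```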
#### Smart Ticker Extraction That Actually Works
The scraper ships stripped-down post bodies to a ticker extractor that’s smarter than just looking for capital letters. It maintains a curated whitelist of valid symbols and aggressively filters out common false positives.
The extractor matches uppercase tokens and $TICKER format validated against this whitelist. Common acronyms and community slang that look like tickers but aren’t, like DD, YOLO, HODL, CEO, USD, get filtered out. The system includes titles and selftext; comments aren’t fetched by default. Each unique post counts once per ticker to limit spam.
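As a rough illustration of that flow (the whitelist and blacklist below are toy subsets; the real lists are curated), the extractor might look like this:

```python
# Match $TSLA-style and bare uppercase tokens, drop known slang,
# and validate against a whitelist of real symbols.
import re

WHITELIST = {"TSLA", "GME", "AMC", "NVDA", "AAPL"}   # toy subset of the curated list
BLACKLIST = {"DD", "YOLO", "HODL", "CEO", "USD"}     # looks like tickers, isn't

TICKER_RE = re.compile(r"\$?\b[A-Z]{1,5}\b")

def extract_tickers(text: str) -> set[str]:
    found = set()
    for token in TICKER_RE.findall(text):
        symbol = token.lstrip("$")
        if symbol in WHITELIST and symbol not in BLACKLIST:
            found.add(symbol)
    return found   # a set, so each post counts once per ticker
```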
[Figures: examples of false positives that look like tickers, the ticker-matching regex, and the whitelist cross-check in the extractor]

#### Sentiment Analysis That Speaks Meme

Each extracted ticker rides through VADER (Valence Aware Dictionary and sEntiment Reasoner), a sentiment scorer specifically tuned for social media text. VADER excels here because it was trained on tweets and understands modern internet language: it knows that “stonks” is probably positive, that rocket emojis amplify bullishness, and that “diamond hands” isn’t about jewelry. The library understands intensifiers, emoji, and the difference between “good” and “GOOD!!!”. That’s critical for a source like Reddit, and it was tough finding another analyzer that captures this register as well as VADER does.
The app uses VADER’s compound score with simple thresholds: Positive > 0.1, Negative < -0.1, else Neutral. For each ticker, we compute a mentions-weighted average sentiment for the combined run, which smooths the effect of one noisy post.
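Here’s roughly what that scoring step looks like with the vaderSentiment library; the bucketing thresholds mirror the app’s, while the function names are mine:

```python
# Bucket VADER's compound score with the article's thresholds,
# then average per-ticker sentiment weighted by mention counts.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def categorize(text: str) -> tuple[float, str]:
    compound = analyzer.polarity_scores(text)["compound"]
    if compound > 0.1:
        return compound, "Positive"
    if compound < -0.1:
        return compound, "Negative"
    return compound, "Neutral"

def weighted_sentiment(rows: list[tuple[int, float]]) -> float:
    """rows: (mention_count, sentiment_score) pairs for one ticker."""
    total = sum(count for count, _ in rows)
    return sum(count * score for count, score in rows) / total if total else 0.0
```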
#### Intelligent Caching and Storage
The extracted data gets bundled into a lightweight dataframe with these columns:
run_date, subreddit, ticker, mention_count, sentiment_score, sentiment_category, last_updated.
The system uses a file-based JSON cache with a 30-minute TTL (data_cache.json) in the DataController. ‘Refresh’ uses cache if valid; ‘Force Fresh Data’ bypasses it. This cache layer prevents unnecessary Reddit hammering during development and testing, while still allowing fresh updates when needed.
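A minimal sketch of that TTL check, assuming the cache payload stores a saved_at timestamp (the helper names and payload shape here are guesses, not the app’s exact code):

```python
# File-based JSON cache with a 30-minute TTL, as described above.
import json
import time
from pathlib import Path

CACHE_FILE = Path("data_cache.json")
TTL_SECONDS = 30 * 60

def load_cache() -> dict | None:
    if not CACHE_FILE.exists():
        return None
    payload = json.loads(CACHE_FILE.read_text())
    if time.time() - payload["saved_at"] > TTL_SECONDS:
        return None   # stale: caller should re-scrape
    return payload["data"]

def save_cache(data: dict) -> None:
    CACHE_FILE.write_text(json.dumps({"saved_at": time.time(), "data": data}))
```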
During market hours, you might want to bypass cache entirely for absolutely fresh data. After hours or on weekends, the cache prevents redundant API calls while you’re tweaking visualizations or filters.
#### Historical Context Without Extra Cron Jobs
I hate extra cron jobs, so history accumulates automatically as an optional side effect of each run: the app writes per-run snapshots to data/history/mentions_<subreddit>_<YYYY-MM-DD>.parquet (or .csv as a fallback).
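A sketch of what that snapshot writer could look like; the path template comes from the app, while the Parquet-then-CSV fallback logic here is my assumption:

```python
# Write one snapshot per run: Parquet if an engine is installed, CSV otherwise.
from datetime import date
from pathlib import Path
import pandas as pd

def write_snapshot(df: pd.DataFrame, subreddit: str) -> Path:
    out_dir = Path("data/history")
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = out_dir / f"mentions_{subreddit}_{date.today():%Y-%m-%d}"
    try:
        path = stem.with_suffix(".parquet")
        df.to_parquet(path, index=False)
    except ImportError:   # no pyarrow/fastparquet installed: fall back to CSV
        path = stem.with_suffix(".csv")
        df.to_csv(path, index=False)
    return path
```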
Run the collector to accumulate multi-day history:
```bash
python3 collector.py --subreddits wallstreetbets,stocks --post-limit 200 --top-limit 30 --force
python3 collector.py --subreddits wallstreetbets,stocks --post-limit 200 --top-limit 30 --interval-mins 60 --force
```

The **history charts and Price Overlay** are most useful after a few daily snapshots exist. Once you have them, you unlock the historical heatmap (how mentions drift over time) and the Price Overlay: average daily sentiment bars plus an adjusted-close line joined with a 7-day as-of merge. If you see ‘no overlap’, collect over multiple days; the overlay uses as-of matching with a 7-day tolerance.
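That as-of matching maps naturally onto pandas’ merge_asof. A minimal sketch, assuming a daily sentiment frame and a daily price frame with illustrative column names (both date columns must be datetime dtype and sorted):

```python
# Join daily sentiment to the nearest adjusted close within 7 days.
import pandas as pd

def overlay_frames(sentiment: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """sentiment: columns [run_date, avg_sentiment]; prices: [date, adj_close]."""
    sentiment = sentiment.sort_values("run_date")
    prices = prices.sort_values("date")
    return pd.merge_asof(
        sentiment,
        prices,
        left_on="run_date",
        right_on="date",
        direction="nearest",
        tolerance=pd.Timedelta(days=7),  # 'no overlap' = nothing within 7 days
    )
```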
[Figure: flowchart of the pipeline]

### Real-World Usage Patterns
#### Morning Market Prep
Set the app to pull 500 posts from each major investing subreddit, export the top 20 tickers to CSV, and cross-reference with your existing watchlist. The sentiment scores help prioritize which charts to pull up first.
#### Intraday Monitoring
Keep the dashboard running on a second monitor; refreshes are manual, so the data updates when you ask for it. The alerts panel will catch any ticker that suddenly spikes in mentions AND sentiment, potentially signaling breaking news or coordinated discussion. When TSLA finally breaks out, you’ll know about it.
#### Weekend Research
Use the historical features to identify patterns. Does weekend sentiment predict Monday opens? Do certain subreddits lead others in identifying trends? The data export makes it easy to pull this into Excel or pandas for deeper analysis.
### Quality of Life Features That Matter
The Min Mentions and Min Sentiment filters control what appears in the main table. Alerts use their own separate thresholds, so the main table and the alert triggers can be tuned independently; for an alert to fire, both thresholds (mentions AND sentiment) must be met.
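A minimal sketch of that AND gate over the snapshot dataframe (the function name is mine; the columns come from the export schema):

```python
# Both thresholds must pass (AND, not OR) for a row to alert.
import pandas as pd

def alert_rows(df: pd.DataFrame, min_mentions: int, min_sentiment: float) -> pd.DataFrame:
    mask = (df["mention_count"] >= min_mentions) & (df["sentiment_score"] >= min_sentiment)
    return df.loc[mask].sort_values("mention_count", ascending=False)
```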
CSV export uses the combined snapshot columns:
run_date, subreddit, ticker, mention_count, sentiment_score, sentiment_category, last_updated. Notes are saved locally to data/notes.json; if you want to sync them to a Gist or Notion, that’s a manual process you can set up yourself.
### Performance and Politeness
The app analyzes 10–500 posts per subreddit (default 200) with pagination and a one-second polite delay; cached runs are instant. Reddit’s rate limits only kick in when you’re driving hundreds of concurrent workers, and a polite single-threaded crawler stays comfortably under them.
More importantly, local builds buy three quiet pleasures:
- **Zero cold-start times:** if you’re trading on news cycles measured in minutes, seconds matter
- **Complete transparency:** you can step inside the exact dataframe that’s producing whatever you see, set a breakpoint, and watch VADER score each row
- **Instant fixes:** when Reddit inevitably changes a JSON field name, the error is only one Python traceback away from being fixed, no waiting for someone else’s server push
### Extending Into Production
Turn those Min Mentions and Min Sentiment filters into hard risk-gate parameters for your trading algorithms. Drop the CSV export into your morning research folder, or wire the JSON notes file into a cron job that posts daily deltas to Slack. Add a PostgreSQL backend to replace file caching, implement user authentication for team sharing, or integrate with your broker’s API for automated position sizing based on sentiment extremes.
The app was intentionally built on batteries-included libraries (requests, pandas, vaderSentiment, streamlit) so you can hack it even if your last Python script still had semicolons. Every component is modular; swap VADER for a transformer-based sentiment model, replace Reddit with Twitter, or add technical indicators alongside the sentiment overlays.
### What This Tool Is (And Isn’t)
This isn’t financial advice, and sentiment alone shouldn’t drive investment decisions. Reddit sentiment is one signal among many, often noisy and sometimes manipulated. But it’s also an on-demand window into retail investor psychology, and when combined with traditional analysis, it can provide valuable context for market movements.
If you’ve ever looked at a moon-shooting chart and asked, “Did Reddit cheer this runup before price, or is price now popularizing the meme?”, the answer can now be quantified, exported, and debated by 9:02 a.m.
### Getting Started in Three Commands
```bash
git clone [repository-url]
pip install -r requirements.txt
streamlit run app.py
```

Optionally, run collector.py to build history as shown above.
If you’ve ever wondered whether that rocket-emoji stuff on Reddit contains any usable signal, now you can find out without having to scroll yourself (though you’ll miss some of the fun). Leave the browser tab open: once you see the stream in motion, the mental signal filtering that used to happen across Discord and Twitter collapses into clear visualizations.
Thanks so much for reading! Let me know any feedback in the comments below, and don’t forget to leave some claps. Bye!
Originally published on Medium