Builder Spotlight is a series where we interview developers building in the prediction market agent ecosystem. Some use their real names, others prefer pseudonyms. The focus is on what they built, how they built it, and what they learned.

This interview has been lightly edited for length and clarity.


AgentBets: What’s your background, and how did you end up building trading bots?

Priya Sharma: I’m a data scientist by training. Master’s degree in computational linguistics, then four years at a large tech company working on NLP — text classification, entity extraction, sentiment analysis for product reviews. Standard industry NLP work.

I started following prediction markets in 2024 because they’re such a clean signal source. Unlike stock markets, where price movements reflect dozens of overlapping factors, a prediction market contract on “Will the Fed cut rates in June?” has exactly one thing driving the price: collective belief about whether the Fed will cut rates in June. That clarity fascinated me from a modeling perspective.

I noticed something specific: on many political and policy markets, the price would lag behind public sentiment shifts by hours, sometimes by a full day. A major news story would break, Twitter would explode with commentary, and the relevant prediction market wouldn’t fully reprice for six to twelve hours. If you could quantify the sentiment shift before the market priced it in, you had an edge.

That was the hypothesis. I spent about three months testing it manually — reading news, gauging Twitter sentiment by feel, and placing bets on Polymarket based on my interpretation. I was profitable enough to convince myself the signal was real. Then I spent the next nine months building the system to do it automatically.

AgentBets: Walk us through the technical architecture. How does the system work?

Priya: The pipeline has four stages: data collection, signal generation, conviction scoring, and execution.

Data collection is the foundation, and honestly it’s the most unglamorous part. I aggregate text data from five source categories:

  1. News APIs — I pull from three major news aggregation APIs. These give me article headlines, summaries, and publication timestamps. I’m processing roughly 15,000 articles per day across political, economic, and policy categories.

  2. Social media — X/Twitter is the primary source. I use the API to track specific accounts (political journalists, economists, policy analysts — about 800 accounts that I’ve curated over time) and keyword streams for topics that map to active prediction markets. Reddit is secondary but useful for specific topics.

  3. Government and institutional sources — Federal Register filings, congressional schedules, regulatory dockets. These are lower volume but high signal. A new FDA filing or a FOMC meeting schedule change often predicts market movements before the news cycle picks it up.

  4. On-chain data — Polymarket’s own order flow. Large purchases by wallets with historically accurate track records (I maintain a watchlist of about 50 addresses) are themselves a signal.

  5. Odds aggregators — I track implied probabilities on competing platforms and traditional bookmakers where overlap exists.

Signal generation is where the NLP happens. Each text source gets processed through a classification pipeline. The core model is a fine-tuned DeBERTa-v3 transformer. I chose DeBERTa over larger models because the latency-accuracy tradeoff matters — I need to process thousands of documents per hour, and a 300M parameter model running on a single GPU gives me adequate accuracy with sub-second inference per document.

The model outputs three things for each document: a relevance score (how closely does this text relate to a specific prediction market?), a directional score (does this text suggest the probability should go up or down?), and a confidence score (how strongly does the language indicate a directional shift?). These three scores get aggregated at the market level to produce what I call a composite sentiment vector.
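The aggregation step she describes can be sketched roughly like this. The names and weighting scheme below are illustrative assumptions, not her actual implementation; the interview only specifies that each document yields relevance, direction, and confidence scores that get combined per market:

```python
from dataclasses import dataclass

@dataclass
class DocScores:
    relevance: float   # 0..1: how closely the text relates to this market
    direction: float   # -1..+1: probability should go down / up
    confidence: float  # 0..1: strength of the directional language

def composite_sentiment(docs: list[DocScores], min_relevance: float = 0.5) -> float:
    """Aggregate per-document scores into a market-level sentiment in [-1, 1].

    Each document's directional score is weighted by its relevance and
    confidence; weakly relevant documents are dropped entirely.
    """
    weighted, total_weight = 0.0, 0.0
    for d in docs:
        if d.relevance < min_relevance:
            continue
        w = d.relevance * d.confidence
        weighted += w * d.direction
        total_weight += w
    return weighted / total_weight if total_weight else 0.0
```

A relevance cutoff like this is one plausible way to keep off-topic documents from diluting the market-level signal.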

Conviction scoring takes the composite sentiment vector and compares it to the current market price. If the sentiment vector says “strongly bullish” but the market is already at 85%, there’s not much room to move. If the sentiment vector says “strongly bullish” and the market is at 55%, there’s a meaningful divergence. The conviction score measures the magnitude of that divergence, adjusted for historical accuracy of the sentiment signal in similar market categories.
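A minimal sketch of that divergence logic, under assumed formulas (the interview describes the inputs and the intent, not the exact math):

```python
def conviction(sentiment: float, market_price: float, category_accuracy: float) -> float:
    """Toy conviction score: divergence between model sentiment and market price,
    scaled by the signal's historical accuracy in this market category.

    sentiment: [-1, 1] composite sentiment (positive = bullish)
    market_price: current Yes price as a probability in (0, 1)
    category_accuracy: historical directional accuracy of the signal, in [0, 1]
    """
    # Map sentiment onto an implied probability and measure the gap to the market.
    implied = 0.5 + 0.5 * sentiment       # -1..+1 -> 0..1
    divergence = implied - market_price   # signed: positive = model more bullish
    # Weight by how much the signal has historically beaten a coin flip.
    edge_weight = max(0.0, category_accuracy - 0.5) * 2
    return divergence * edge_weight
```

With these toy formulas, a strongly bullish sentiment against a market at 55% produces a much larger score than the same sentiment against a market already at 85%, matching the intuition above.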

Execution is the straightforward part. When the conviction score crosses a threshold, the system places a limit order on Polymarket. Position size scales with conviction — higher conviction means larger position, within risk limits. I use the Polymarket CLOB API with limit orders only; I never use market orders because the slippage on directional trades can be significant.
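One plausible shape for conviction-scaled sizing within a risk cap (the interview doesn't specify the sizing curve, so the linear ramp here is purely an assumption):

```python
def position_size(conviction_score: float, threshold: float, max_position: float) -> float:
    """Scale position size with conviction above the entry threshold.

    Returns 0 below the threshold and caps at max_position. The linear
    ramp (full size at double the threshold) is illustrative only.
    """
    excess = abs(conviction_score) - threshold
    if excess <= 0:
        return 0.0
    return min(max_position, max_position * excess / threshold)
```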

AgentBets: How did you train the model? Where did the labeled data come from?

Priya: This is where being a prediction markets participant myself was critical. I couldn’t use standard sentiment analysis training data because the domain is too specific. A model trained on Amazon product reviews doesn’t understand that “the committee voted to advance the bill” is bullish for a “Will the bill pass?” market.

I built my training set in three waves.

Wave one was manual annotation. For three months, I annotated roughly 5,000 text-market pairs myself. Each annotation was: here’s a piece of text, here’s a prediction market, does this text suggest the probability should go up, down, or stay the same, and how strongly? This was tedious but necessary for establishing the ground truth.

Wave two was semi-automated expansion. I used the manually annotated data to train an initial model, then used that model to pre-label a much larger dataset (about 40,000 examples). I reviewed and corrected the model’s labels, focusing on the cases where the model was least confident. This is standard active learning — the model learns fastest from the examples it’s most uncertain about.
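The selection step in that active-learning loop is simple uncertainty sampling. A minimal sketch for a binary classifier, with hypothetical names (for her three-way up/down/neutral labels, the margin between the top two class probabilities would play the same role):

```python
def select_for_review(predictions: list[tuple[str, float]], budget: int) -> list[str]:
    """Pick the examples the model is least sure about for human review.

    predictions: (example_id, predicted probability of the positive class).
    Uncertainty is distance from 0.5; lowest-margin examples come first.
    """
    ranked = sorted(predictions, key=lambda p: abs(p[1] - 0.5))
    return [example_id for example_id, _ in ranked[:budget]]
```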

Wave three was outcome-based validation. I backtested the model’s signals against actual market price movements. For every signal the model generated historically, I checked whether the market moved in the predicted direction within 24 hours. This gave me an outcome-based accuracy metric (roughly 62% directional accuracy on political markets, 58% on economic markets, 55% on other categories) and allowed me to recalibrate the conviction scoring thresholds.
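The outcome check she describes reduces to a simple computation. A sketch, assuming each historical signal records its direction and the market price at the signal and 24 hours later (field names are hypothetical):

```python
def directional_accuracy(signals: list[dict]) -> float:
    """Fraction of historical signals where the market moved in the
    predicted direction within the lookahead window.

    Each signal dict is assumed to carry:
      direction:  +1 (bullish) or -1 (bearish)
      price_at_signal, price_24h_later: Yes prices in (0, 1)
    """
    if not signals:
        return 0.0
    hits = sum(
        1 for s in signals
        if (s["price_24h_later"] - s["price_at_signal"]) * s["direction"] > 0
    )
    return hits / len(signals)
```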

A 62% directional accuracy might not sound impressive, but in prediction markets you don't need a high hit rate; you need the times you're right to be more profitable than the times you're wrong. Because my position sizing scales with conviction, and my highest-conviction signals have a 71% accuracy rate, the portfolio-level performance is meaningfully positive.
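The arithmetic behind that claim is just expected value per trade. The dollar figures below are toy numbers, not from the interview; they only illustrate why a modest win rate with conviction-scaled sizing can still be profitable:

```python
def expected_value(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade: P(win) * avg_win - P(loss) * avg_loss."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss

# Toy numbers: at 62% accuracy with symmetric $100 outcomes, each trade
# is worth +$24 in expectation; sizing up the 71%-accuracy signals helps more.
ev_flat = expected_value(0.62, 100, 100)
ev_high_conviction = expected_value(0.71, 300, 300)
```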

AgentBets: Which data sources turned out to be most valuable?

Priya: This surprised me. I expected social media to be the strongest signal because of volume and speed. In practice, the ranking looks like this:

  1. Curated Twitter accounts — specific journalists and analysts, not the firehose. The 800-account curated list outperforms the broad keyword-based Twitter feed by a wide margin. Quality of sources matters enormously.

  2. Government and institutional filings — low volume, high signal, and almost never priced into markets quickly. A regulatory filing that signals a policy change can move a market by 10-15 points, and the market often doesn’t react until the filing gets covered by mainstream media hours later.

  3. On-chain order flow — following smart money on Polymarket itself is a surprisingly good signal. When addresses with historically accurate track records start buying Yes on a market, the signal is worth following. It’s not NLP, but it’s part of the pipeline.

  4. News APIs — good for confirming trends but rarely the first source to break relevant information. News aggregators are useful as a filter to identify which markets are in an active news cycle.

  5. Broad social media — the unfiltered keyword stream is noisy. Most of the signal is drowned in irrelevant mentions, memes, and repetitive commentary. I keep it in the pipeline for coverage but it contributes relatively little to conviction scores.

The lesson is that raw volume of data doesn’t correlate with signal quality. A carefully curated list of 800 Twitter accounts produces better signals than a firehose of 500,000 tweets per day.

AgentBets: Let’s talk about the business side. When did you decide to turn this into a product?

Priya: About six months in. The bot was generating around $3,000 to $5,000 per month on my personal capital, which was meaningful but not life-changing on a $40K capital base. I realized that the sentiment signals themselves had value independent of my own trading. Other people with their own capital and risk tolerance could benefit from the same signals.

I initially considered three models: selling the bot outright, licensing the signals as a data feed, or running a managed fund. I ruled out the managed fund immediately — the regulatory complexity wasn’t worth it. Selling the bot outright was tempting but meant giving away my edge and having no recurring revenue. I settled on a signals-as-a-service subscription.

The product is structured in two tiers:

  • Signal tier ($39/month): Subscribers get real-time conviction scores for every active Polymarket market via a webhook or dashboard. They see which markets the model is bullish or bearish on, the conviction level, and the key data points driving the signal. They execute trades manually based on their own judgment.

  • Autopilot tier ($149/month): The full autonomous agent. Subscribers provide their Polymarket API keys, set risk parameters (max position size, daily loss limit, max number of concurrent positions), and the system executes trades on their behalf. This is the same system I run for my own account.

AgentBets: How’s the subscription business performing?

Priya: I launched in October 2025 with a small beta group. Current numbers as of late February 2026:

  • Signal tier: 84 subscribers ($3,276/month)
  • Autopilot tier: 23 subscribers ($3,427/month)
  • Combined subscription revenue: ~$6,700/month
  • My own trading revenue: ~$4,000-$6,000/month
  • Total monthly revenue from the ecosystem: ~$11,000-$13,000/month

Churn is the main challenge. Monthly churn on the Signal tier is about 12%, which is high. People subscribe during a hot market period, see good signals, then cancel during quiet periods when fewer markets are moving. The Autopilot tier has lower churn (about 6%) because the automation creates stickiness — people don’t want to go back to manual trading.

Customer acquisition cost is essentially zero. I don’t run paid ads. Growth has come from a small Twitter/X following where I share redacted versions of past signals (with a delay, so they’re not actionable), the AgentBets marketplace, and word of mouth in prediction market Discord communities.

For anyone considering a similar model, the prediction market bot pricing guide and the revenue sharing guide on this site cover the business model options in more detail than I can here.

AgentBets: What’s been the hardest part of selling a sentiment-based product?

Priya: Managing expectations. Arbitrage bots have 90%+ win rates. Sentiment bots don’t. My system has a ~62% directional accuracy overall. That means roughly four out of ten trades lose money. For subscribers who came from arbitrage bots or who expect near-certainty, this is jarring.

I’ve learned to be very explicit in onboarding: this is not an arbitrage system, you will see losing trades, the edge materializes over dozens of trades not individual ones, and the variance is real. Despite this, I still get cancellations from people who have three losing trades in a row and conclude the system is broken. It’s the nature of selling a probabilistic product.

The other challenge is transparency. Subscribers want to understand why the system is taking a particular position. “The model is bullish” isn’t enough — they want to see the data points. So I built a signal explainability layer that surfaces the top contributing factors for each conviction score: the specific tweets, articles, or filings that drove the signal. This added significant development time but reduced support inquiries and improved retention.

AgentBets: What advice would you give to data scientists who want to apply their skills to prediction markets?

Priya: Several things.

Start with the market, not the model. Spend at least a month manually trading prediction markets before you write a single line of model code. You need to internalize how these markets behave — when they’re efficient, when they’re not, how liquidity varies, how prices respond to events. A beautiful model trained on the wrong assumptions will lose money elegantly.

Data quality dominates model quality. I spent more time curating my Twitter account list, building my government filing scrapers, and cleaning my training data than I spent on model architecture. The model architecture is a fine-tuned DeBERTa — nothing cutting-edge. The data pipeline is where the edge lives.

Latency matters less than you think for sentiment strategies. Arbitrage bots need sub-second execution. Sentiment signals develop over minutes to hours. If your signal says “the market should be 10 points higher based on the news cycle,” you typically have a window of several hours to take the position. Don’t over-engineer for speed at the expense of signal quality.

Build the monitoring before you need it. Track every signal, every trade, every P&L metric from day one. When the model starts underperforming (and it will, because markets change), you need the historical data to diagnose why. Was the model wrong, or did the market move faster than expected? Did a new type of data source emerge that the model wasn’t trained on? Without detailed logging, you’re debugging blind.

Consider the business model from the start. If you’re building something that works, think early about how you want to monetize it. Running your own capital, selling signals, selling the bot, licensing the model — each has different tradeoffs in terms of revenue, risk, scalability, and how much of your edge you’re giving away. I wish I’d thought about the subscription model earlier; I would have designed the signal explainability layer from the beginning rather than retrofitting it.

AgentBets: What are you working on next?

Priya: Two things. First, expanding the model to cover Kalshi markets. The data pipeline is platform-agnostic — news and social media don’t care which platform the market is on — but I need to build the Kalshi API integration for execution and train the conviction scoring on Kalshi-specific price dynamics. Kalshi markets have different liquidity profiles and pricing behavior than Polymarket.

Second, I’m experimenting with using LLMs (specifically fine-tuned versions of smaller open-source models) for the signal generation step. The current DeBERTa pipeline is good at classification but limited in its ability to reason about complex, multi-step causal chains — “this regulatory filing implies that the agency will likely take action X, which would affect outcome Y in this prediction market.” LLMs are better at that kind of reasoning. Early experiments are promising but inference cost is 10x higher, so the economics need to work out.

AgentBets: Where can people learn more?

Priya: For building the agent infrastructure, the build a prediction market agent guide covers the non-ML parts well. For the business side, the prediction market bot pricing guide is useful. And if you’re thinking about the subscription model specifically, the revenue sharing guide covers the different approaches. I’m also intermittently active on X/Twitter where I share thoughts on prediction market ML — look for someone talking about DeBERTa and prediction markets and you’ll probably find me.


Priya Sharma is a pseudonym. The revenue, performance, and subscriber figures in this interview are self-reported and have not been independently verified. Prediction market trading involves risk of loss. This is not financial advice.

Are you building prediction market agents? We’d love to feature you in a future Builder Spotlight. Reach out at [email protected].