
The parser rewrite: from 40 regexes to one tiny LLM

How we shipped a deterministic parser with an LLM fallback that handles every weird Telegram format we have ever seen.

Aisha Okonkwo
ML / Parsers
11 min read

Telegram signal channels are creative. "BUY EU NOW SL 1.0850 TP 1.0920" and "📈 EURUSD long. Entry: market. Stop: 50p below. TP1 1.09, TP2 1.095" and a screenshot of a TradingView chart with no text — all valid signals to a human eye, all hostile to a regex.

Why we kept reaching for new regexes

Every newly onboarded channel meant another regex. The collection grew to more than forty patterns with conflicting precedence, and each new rule risked breaking another channel's parse. Maintenance cost rose with every pattern, the false-positive rate climbed, and screenshots were out of reach entirely.

What the rewrite changed

Two layers. The first is a deterministic parser, about 600 lines of TypeScript, that handles the clean, structured cases: most of the volume, parsed in milliseconds at no API cost. When it reports low confidence, the message falls through to the second layer: a small LLM call constrained by a tight JSON schema, with the original message as its only context and a system prompt that explicitly forbids guessing.
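A minimal sketch of the two-layer routing follows. Everything in it is an assumption for illustration, not the production parser: the result shape, the 0.9 cut-off, the hypothetical alias table (`EU` for `EURUSD`), and a confidence score defined as the share of four required fields found (direction, symbol, stop loss, at least one take profit). The LLM layer is passed in as a plain synchronous function so the routing decision stays visible; a real model call would be asynchronous.

```typescript
// Illustrative types; the field names are assumptions, not PipSync's schema.
type Direction = "buy" | "sell";
type ParserPath = "deterministic" | "llm";

interface ParsedSignal {
  direction: Direction | null;
  symbol: string | null;
  entry: number | null; // null when the entry is "market" or not stated
  stopLoss: number | null;
  takeProfits: number[];
}

interface ParseResult {
  signal: ParsedSignal;
  confidence: number; // 0..1
  path: ParserPath;
}

// Hypothetical channel shorthand; each channel would need its own entries.
const SYMBOL_ALIASES: Record<string, string> = { EU: "EURUSD", GU: "GBPUSD" };

// Assumed cut-off below which the deterministic result is not trusted.
const CONFIDENCE_THRESHOLD = 0.9;

function resolveSymbol(raw: string | undefined): string | null {
  if (raw === undefined) return null;
  if (Object.prototype.hasOwnProperty.call(SYMBOL_ALIASES, raw)) return SYMBOL_ALIASES[raw];
  return /^[A-Z]{6}$/.test(raw) ? raw : null; // plain six-letter pair, e.g. EURUSD
}

function parseDeterministic(text: string): ParseResult {
  const upper = text.toUpperCase();
  // Returns the first captured number for a pattern, or null if absent.
  const num = (re: RegExp): number | null => {
    const m = upper.match(re);
    return m ? Number(m[1]) : null;
  };

  // Only BUY/SELL are recognised; words like "long" are left to the LLM layer.
  const dir = upper.match(/\b(BUY|SELL)\b/);
  const sym = upper.match(/\b(?:BUY|SELL)\s+([A-Z]{2,6})\b/);

  // Collect every TP / TP1 / TP2 ... level in order of appearance.
  const takeProfits: number[] = [];
  const tpRe = /\bTP\d?\s*:?\s*(\d+(?:\.\d+)?)/g;
  let m: RegExpExecArray | null;
  while ((m = tpRe.exec(upper)) !== null) {
    takeProfits.push(Number(m[1]));
  }

  const signal: ParsedSignal = {
    direction: dir ? (dir[1] === "BUY" ? "buy" : "sell") : null,
    symbol: resolveSymbol(sym ? sym[1] : undefined),
    entry: num(/\bENTRY\s*:?\s*(\d+(?:\.\d+)?)/),
    stopLoss: num(/\bSL\s*:?\s*(\d+(?:\.\d+)?)/),
    takeProfits,
  };

  // Confidence here is simply the share of required fields that were found.
  const found = [
    signal.direction !== null,
    signal.symbol !== null,
    signal.stopLoss !== null,
    signal.takeProfits.length > 0,
  ].filter((ok) => ok).length;

  return { signal, confidence: found / 4, path: "deterministic" };
}

// The LLM layer is injected so the sketch stays self-contained.
type LlmFallback = (text: string) => ParseResult;

function parseSignal(text: string, llmFallback: LlmFallback): ParseResult {
  const first = parseDeterministic(text);
  return first.confidence >= CONFIDENCE_THRESHOLD ? first : llmFallback(text);
}
```

With the two text examples from the introduction, "BUY EU NOW SL 1.0850 TP 1.0920" finds all four fields and stays on the deterministic path. The 📈 EURUSD message (a "long", a market entry, and a stop expressed in pips) yields only its two take profits, scores 0.25, and falls through to the fallback.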

The constraints that matter

  • Output must be valid JSON conforming to a strict schema. Where the model would otherwise hallucinate a value, it returns null for that field instead.
  • The model is asked to refuse parsing if it can't identify direction, symbol, and at least one of entry/SL/TP. Refusals route to operator review, not auto-execution.
  • Images get a vision model pass with the same schema; OCR alone proved too lossy for hand-annotated charts.
  • Every parsed signal stores the original message text, the parser path (deterministic vs LLM), and the model's confidence. The audit trail is more important than the parse.
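The constraints above translate into a validation step between the model and anything downstream. The sketch below assumes a hypothetical six-field schema (`direction`, `symbol`, `entry`, `stopLoss`, `takeProfits`, `confidence`) and illustrative helper names. It rejects missing or extra keys, accepts nulls, and, as a design choice of this sketch, re-checks the refusal rule in code rather than trusting the model to apply it. Every outcome, refusals included, can be turned into an audit record.

```typescript
// Hypothetical schema for the LLM layer's output; names are illustrative.
interface LlmSignal {
  direction: "buy" | "sell" | null;
  symbol: string | null;
  entry: number | null;
  stopLoss: number | null;
  takeProfits: number[];
  confidence: number; // model-reported, 0..1
}

type Outcome =
  | { kind: "parsed"; signal: LlmSignal }
  | { kind: "refused"; reason: string } // routed to operator review
  | { kind: "invalid"; reason: string }; // schema violation, never executed

interface AuditRecord {
  originalText: string;
  parserPath: "deterministic" | "llm";
  confidence: number | null;
  outcome: Outcome["kind"];
  receivedAt: string;
}

const SCHEMA_KEYS = ["direction", "symbol", "entry", "stopLoss", "takeProfits", "confidence"];

const isNumberOrNull = (v: unknown): boolean => v === null || typeof v === "number";

function validateModelOutput(raw: string): Outcome {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { kind: "invalid", reason: "output is not JSON" };
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return { kind: "invalid", reason: "output is not a JSON object" };
  }
  const obj = data as Record<string, unknown>;
  const keys = Object.keys(obj);
  // Exact key set: no missing fields and no extra "helpful" ones.
  if (keys.length !== SCHEMA_KEYS.length || !SCHEMA_KEYS.every((k) => keys.indexOf(k) >= 0)) {
    return { kind: "invalid", reason: "unexpected or missing keys" };
  }
  const { direction, symbol, entry, stopLoss, takeProfits, confidence } = obj;
  const typesOk =
    (direction === null || direction === "buy" || direction === "sell") &&
    (symbol === null || typeof symbol === "string") &&
    isNumberOrNull(entry) &&
    isNumberOrNull(stopLoss) &&
    Array.isArray(takeProfits) &&
    takeProfits.every((tp) => typeof tp === "number") &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1;
  if (!typesOk) return { kind: "invalid", reason: "field type mismatch" };

  const signal = obj as unknown as LlmSignal;
  // Refusal rule: direction, symbol, and at least one of entry/SL/TP.
  const hasLevel =
    signal.entry !== null || signal.stopLoss !== null || signal.takeProfits.length > 0;
  if (signal.direction === null || signal.symbol === null || !hasLevel) {
    return { kind: "refused", reason: "missing direction, symbol, or price level" };
  }
  return { kind: "parsed", signal };
}

// This sketch only covers the LLM path, so parserPath is fixed to "llm".
function toAuditRecord(originalText: string, outcome: Outcome, receivedAt: Date): AuditRecord {
  return {
    originalText,
    parserPath: "llm",
    confidence: outcome.kind === "parsed" ? outcome.signal.confidence : null,
    outcome: outcome.kind,
    receivedAt: receivedAt.toISOString(),
  };
}
```

For the 📈 EURUSD example, a well-behaved model would return `stopLoss: null` (a stop "50p below" a market entry has no fixed price in the message) while still passing validation on its take profits.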

Practical lessons

Don't let the model invent missing fields. Don't let the model break the JSON schema. Don't auto-execute on low confidence; flag the signal for review. The hard part isn't the model; it's the contract around it.
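A sketch of that contract at the execution boundary, assuming a hypothetical 0.85 threshold and the outcome labels `parsed`, `refused`, and `invalid` (none of which come from the post): auto-execution requires an explicit success, and every other case falls through to operator review.

```typescript
// Hypothetical labels and threshold; only the shape of the rule comes from the post.
type ParseOutcome = "parsed" | "refused" | "invalid";
type Action = "auto_execute" | "operator_review";

interface GateInput {
  outcome: ParseOutcome;
  confidence: number | null; // null when no confidence was recorded
}

// Assumed value; a deployment might tune this per channel or instrument.
const AUTO_EXECUTE_MIN_CONFIDENCE = 0.85;

function decideAction(input: GateInput): { action: Action; reason: string } {
  // Only one path leads to execution; everything else fails closed.
  if (input.outcome !== "parsed") {
    return { action: "operator_review", reason: `parser outcome: ${input.outcome}` };
  }
  if (input.confidence === null || input.confidence < AUTO_EXECUTE_MIN_CONFIDENCE) {
    return { action: "operator_review", reason: "confidence below threshold" };
  }
  return { action: "auto_execute", reason: "complete parse above threshold" };
}
```

Testing for the single success case, instead of enumerating failures, means an outcome kind added later is sent to review by default rather than executed. The `reason` string is meant to land in the same audit record as the original message.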

About PipSync

PipSync is a signal-to-execution routing platform. We do not provide investment advice, do not recommend signal sources, and do not hold client funds. Trading leveraged products involves substantial risk of loss.
