
Signal Methodology Overview

AlphaPulse transforms raw public text — news wires, financial commentary, social discourse, and regulatory filings — into two normalised scalar scores per topic per update cycle: the Pulse Strength (directional sentiment) and the Attention Pulse (coverage volume).

The pipeline runs on a continuous ingestion loop. Every article or post is routed through domain-specific NLP models trained on financial and geopolitical corpora, producing a raw sentiment logit and a source-weight adjusted coverage count. These raw outputs are then normalised, aggregated, and quality-gated before a gauge reading is published.

Design principle

Signals are narrative-derived — they measure what people are writing about markets, not what markets are doing. This makes them leading or coincident indicators that complement, rather than replicate, price-based data.
Example gauge readings: 72 (Greed regime), 48 (Neutral), 21 (Fear regime), 81 (High attention).

How Sentiment Works

Sentiment is the numerical distillation of opinion from text. For every article, post, or filing the pipeline ingests, a model assigns a score in [−1, +1] — where −1 is maximally negative, 0 is neutral, and +1 is maximally positive — relative to a specific tracked topic such as Bitcoin, Gold, or Trump Policy.

The same article can carry different sentiment scores for different topics. A piece headlined "Trump sorgt für neue Turbulenzen, Bitcoin reagiert prompt" ("Trump causes fresh turbulence, Bitcoin reacts promptly") simultaneously registers negative sentiment on Trump policy and slightly negative sentiment on Bitcoin — each scored independently by the topic router.
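To make the data shape concrete, here is a minimal sketch of per-topic scoring. The `TopicSentiment` dataclass, the `score_document` function, and the numeric values are invented for illustration — the snippet hard-codes the example above rather than running a real model.

```python
from dataclasses import dataclass


@dataclass
class TopicSentiment:
    topic: str
    score: float  # sentiment in [-1.0, +1.0], per topic


def score_document(headline: str) -> list[TopicSentiment]:
    """Toy stand-in for the topic router + sentiment model.

    A real router would use keyword priors and embedding similarity;
    here we simply return the scores described in the example above.
    """
    return [
        TopicSentiment("Trump Policy", -0.62),  # negative on Trump policy
        TopicSentiment("Bitcoin", -0.15),       # slightly negative on Bitcoin
    ]


scores = score_document(
    "Trump sorgt für neue Turbulenzen, Bitcoin reagiert prompt"
)
```

The key point is structural: one document yields a list of independent (topic, score) pairs, not a single global sentiment value.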

[Infographic: how individual news and social sources are scored for sentiment and aggregated into the AlphaPulse signal]

As the infographic above illustrates, sources span languages, regions, and media types simultaneously — a German newswire, a Russian social post, and a Chinese broadcaster can all influence the same topic signal in the same update cycle. This cross-lingual aggregation is intentional: market-moving information is written in the language it originates in, and translating it introduces lag.

| Component | Scale | Description |
|---|---|---|
| Per-document score | −1 → +1 | Raw sentiment logit assigned by the domain-adapted transformer model. |
| Source weighting | w ∈ (0, 1] | Authority weight derived from reach, editorial standards, and historical accuracy. |
| Aggregation window | Rolling | Weighted scores accumulate over a rolling window before normalisation to 0–100. |

Why multiple languages?

Over 70% of financial market events originate outside English-language media. Restricting ingestion to English introduces a systematic blind spot. AlphaPulse processes sources in 72 languages and normalises scores to the same scale before aggregation.

NLP Processing Pipeline

Each ingested document passes through a four-stage pipeline:

  1. Topic routing — a lightweight classifier maps each document to one or more of the 100+ tracked topics using keyword priors and embedding similarity. Documents with confidence below 0.60 are discarded.
  2. Sentiment scoring — a domain-adapted transformer model (financial BERT family) assigns a logit in [−1, +1]. The logit is calibrated against a held-out labelled set of financial news, updated quarterly.
  3. Source weighting — each source s has an authority weight w_s ∈ (0, 1] derived from reach, editorial standards, and historical signal quality. The weighted contribution of document i from source s is w_s · ŷᵢ.
  4. Aggregation & normalisation — weighted scores are aggregated over a rolling window and mapped to the 0–100 scale described below.
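The four stages above can be sketched as follows, assuming each document arrives with its routing confidence, sentiment score, and source weight attached. `ScoredDoc` and `aggregate` are hypothetical names — the production pipeline's actual interfaces are not published here.

```python
from dataclasses import dataclass


@dataclass
class ScoredDoc:
    confidence: float     # stage 1: topic-routing confidence in [0, 1]
    sentiment: float      # stage 2: raw sentiment (y_hat) in [-1, +1]
    source_weight: float  # stage 3: authority weight w_s in (0, 1]


def aggregate(docs: list[ScoredDoc], min_confidence: float = 0.60) -> float:
    """Stage 4 sketch: weighted mean sentiment over one rolling window.

    Documents below the routing-confidence threshold are discarded,
    mirroring stage 1; each survivor contributes w_s * y_hat.
    """
    kept = [d for d in docs if d.confidence >= min_confidence]
    if not kept:
        raise ValueError("no documents passed the confidence gate")
    total_weight = sum(d.source_weight for d in kept)
    return sum(d.source_weight * d.sentiment for d in kept) / total_weight
```

Note that the weighted mean stays in [−1, +1]; mapping it to the displayed 0–100 scale happens in the normalisation step described next.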

Score Normalisation (0–100)

Raw aggregate scores are normalised over a trailing 24-month window. Let μ and σ be the rolling mean and standard deviation of the raw score for a given topic. The displayed score is:

S = 50 + 50 · (x − μ) / max(σ, ε)

where x is the current raw aggregate, ε is a small regularisation constant that prevents division by zero for low-activity topics, and the result is clipped to [0, 100].
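A minimal implementation of this mapping, assuming the clipping described above (the function name and the default ε are ours):

```python
def normalise(x: float, mu: float, sigma: float, eps: float = 1e-6) -> float:
    """Map a raw aggregate x to the 0-100 display scale.

    S = 50 + 50 * (x - mu) / max(sigma, eps), clipped to [0, 100].
    A score of 50 means the topic sits exactly at its rolling mean;
    max(sigma, eps) guards low-activity topics against division by zero.
    """
    s = 50.0 + 50.0 * (x - mu) / max(sigma, eps)
    return max(0.0, min(100.0, s))
```

One consequence of the clipping is that any reading more than one rolling standard deviation from the mean pins the gauge at 0 or 100.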

Cross-topic comparability

Because every topic is normalised to its own rolling baseline, a score of 70 means the same thing across Bitcoin and German Bunds — both sit equally far above their own historical average sentiment. Absolute raw scores are not comparable; the normalised score is.

Sensitivity Filter — Square Root of Time Variance Threshold

The Sensitivity slider (range 1–5) in the signal table controls the noise-rejection threshold for change columns. It implements a square-root-of-time scaling rule borrowed from quantitative risk management: noise in a time series grows proportionally to √t, not linearly.

A change Δ over a lookback window of d days is highlighted as significant only when:

|Δ| ≥ C · √d

where C is the chosen sensitivity level. For the default C = 2:

| Window | √d | Min Δ at C = 2 |
|---|---|---|
| 1 day | 1.00 | 2.0 pts |
| 7 days | 2.65 | 5.3 pts |
| 30 days | 5.48 | 10.9 pts |
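The rule reduces to a one-line check. This sketch (function and parameter names ours) reproduces the thresholds tabulated above:

```python
import math


def is_significant(delta: float, days: int, sensitivity: float = 2.0) -> bool:
    """Flag a change as significant when |delta| >= C * sqrt(d).

    `sensitivity` is the slider level C; `days` is the lookback window d.
    The threshold grows with sqrt(d), so longer windows tolerate larger
    moves before flagging them.
    """
    return abs(delta) >= sensitivity * math.sqrt(days)
```

For example, a 5-point move clears the default threshold over 1 day (2.0 pts) but not over 7 days (≈5.3 pts).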

Why this matters

A 5-point move over one day is more surprising than a 5-point move over 30 days — the 30-day window has a larger variance budget. The √T rule captures this intuition formally, so the significance threshold automatically scales with the lookback period rather than requiring separate thresholds per timeframe.

At C = 1: maximum reactivity — even small moves are flagged; useful for short-term scanning but noisier. At C = 5: only regime-level dislocations are highlighted, suitable for weekly or macro-cycle monitoring.

Confidence & Significance Thresholds

In addition to the user-facing Sensitivity filter, the pipeline applies internal quality gates before publishing any score. A topic score is only published when:

  • At least 12 unique source documents contributed to the aggregate window.
  • Source diversity exceeds a minimum entropy threshold (no single source contributes more than 40% of the weighted mass).
  • The rolling σ is non-zero (topic has shown historical variance — prevents spurious scores for newly tracked topics).
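The three gates above can be sketched as a single predicate, assuming each document in the window arrives as a (source, weight) pair. The function name, input shape, and use of weighted mass share as the diversity check are illustrative; the text only states the thresholds.

```python
from collections import defaultdict


def passes_quality_gates(
    doc_weights: list[tuple[str, float]],  # (source_id, authority weight) per document
    sigma: float,                          # rolling std dev of the topic's raw score
    min_docs: int = 12,
    max_share: float = 0.40,
) -> bool:
    """Return True only when all three publication gates pass."""
    if len(doc_weights) < min_docs:
        return False  # gate 1: too few source documents in the window
    mass: dict[str, float] = defaultdict(float)
    for source, weight in doc_weights:
        mass[source] += weight
    total = sum(mass.values())
    if total <= 0 or max(mass.values()) / total > max_share:
        return False  # gate 2: one source dominates the weighted mass
    return sigma > 0  # gate 3: topic must have historical variance
```

A window that fails any gate would leave the previously published score in place, matching the stale-score behaviour described below.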

Low-activity topics

Topics with fewer than 12 documents in a window will show a stale score with a greyed-out timestamp rather than a fresh reading. This is intentional — a thin-data score is more misleading than no score.