Mycroft Project Portfolio 2025: AI-Powered Financial Intelligence Systems
2025 Fellows Projects at Humanitarians AI
NOTE: For more information on the Mycroft project, subscribe to its Substack.
The Mycroft project represents an ambitious attempt to democratize institutional-grade financial intelligence through open-source AI systems. Across 33 distinct projects, the portfolio demonstrates a consistent thesis: sophisticated investment analysis shouldn’t require Bloomberg terminals, hedge fund employment, or six-figure software budgets. Multi-agent AI architectures, LLM-powered analysis, and systematic triangulation can deliver comparable insights using free APIs, local models, and open-source tools.
But thesis and implementation are different things. Some of these projects are production-ready systems with documented architectures and proven methodologies. Others are conceptual frameworks awaiting implementation. Many fall somewhere between—working prototypes that demonstrate feasibility but lack the robustness, validation, and documentation required for actual deployment.
What makes this collection valuable isn’t just the individual projects. It’s the recurring patterns across them: multi-agent coordination solving complex analysis tasks, triangulation reducing single-source errors, RAG architectures grounding LLM outputs in verified data, systematic frameworks ensuring consistency. When these patterns work, they work remarkably well. When they fail, understanding why they failed teaches as much as success would have.
The Current Landscape
The 33 projects cluster into several categories, each addressing different aspects of financial intelligence:
Investment Analysis Systems (Projects 1-4, 23, 27): Multi-agent architectures for comprehensive company evaluation combining financial metrics, technology assessment, market positioning, risk analysis, and valuation. These demonstrate the power of specialized agents coordinating toward unified recommendations, when the coordination mechanisms work properly. The weaknesses: heavy LLM reliance without systematic hallucination detection, unclear conflict resolution when agents disagree, and limited backtesting to validate that recommendations actually generate alpha.
Portfolio Management Tools (Projects 5, 7, 16, 24-25, 28, 30): Systems for goal setting, stress testing, diversification analysis, and risk monitoring. These tackle the critical problem of understanding portfolio behavior before markets force the lesson. The Monte Carlo simulations are statistically sound. The regime-dependent correlation analysis reveals genuine insight about diversification degradation during stress. The limitation: backward-looking analysis assumes future resembles past, simplified return distributions miss fat tails, no transaction costs or tax considerations.
Data Collection & Intelligence (Projects 6, 10-13, 15, 17-21, 29, 32-33): Automated systems scraping congressional trades, patent filings, SEC documents, news feeds, regulatory announcements, funding events, and GitHub repositories. This is where systematic data collection creates information asymmetry: seeing patterns invisible to manual analysis. The challenge: data quality varies wildly, keyword-based classification is brittle, sentiment-price correlations are assumed rather than proven, and there is no statistical significance testing.
Natural Language Interfaces (Projects 8, 14, 22-23, 26): RAG systems and LLM-powered query routers making complex financial analysis accessible through conversational interfaces. These represent the democratization promise—asking “What’s my portfolio’s tail risk?” in plain English and receiving rigorous analysis. The constraint: hallucination risk when LLMs synthesize across sources, extraction accuracy unvalidated, no confidence calibration.
Orchestration & Meta-Systems (Projects 9, 13, 23): Coordination layers routing queries to specialized agents, aggregating results, managing workflow execution. These address the hardest problem: making diverse systems work together reliably. The n8n implementations demonstrate visual workflow design works for moderate complexity. The Apache Airflow proposals promise enterprise-grade orchestration. Neither solves the fundamental challenge: when specialized agents return conflicting recommendations, who decides?
The Fellows Opportunity
Unlike the textbook projects, where the work centers on building exercises and validating content, Mycroft projects offer three distinct contribution paths:
1. Reviving Dormant Projects
Several projects have proven their concepts, but their Fellows have since left for jobs, leaving functioning prototypes without active maintenance. These offer immediate value:
Production-ready systems needing deployment hardening: Congressional Trading Tracker (Projects 6, 15), Portfolio Stress Test (Projects 7, 16), Regulatory Scanner (Project 29)—working code requiring error handling, logging, monitoring, documentation for production use
Proof-of-concept systems needing validation: Investment Analysis Templates (Projects 1-4, 27)—multi-agent frameworks requiring backtesting, hallucination detection, systematic performance measurement
Data collection pipelines needing quality improvement: Patent Intelligence (Project 17), Funding Intelligence (Project 20), Tech Stack Analysis (Projects 32-33)—scrapers working but classification accuracy needs improvement through better ML models or LLM integration
Taking over a dormant project means:
Reviewing existing codebase (quality varies—some clean Python, some spaghetti n8n workflows)
Documenting current functionality and limitations (often underdocumented)
Identifying critical gaps (error handling, validation, performance, scalability)
Proposing specific improvements with measurable success criteria
Coordinating with project manager on priorities and timeline
2. Extending Active Projects
Projects with active Fellows offer collaboration opportunities for specific enhancements:
Adding validation layers: Many projects use LLMs without hallucination detection. Building triangulation validators (compare LLM output to structured data sources, calculate agreement scores, flag discrepancies) would significantly improve reliability.
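A minimal sketch of such a triangulation validator, assuming hypothetical field names and a simple relative-tolerance agreement rule (none of this comes from an actual Mycroft codebase):

```python
# Triangulation validator sketch: compare numeric fields an LLM extracted
# against the same fields from a structured source, score agreement, and
# flag discrepancies beyond a relative tolerance. Field names are illustrative.

def agreement_score(llm_output: dict, structured: dict, rel_tol: float = 0.05):
    """Return (score, flagged): score is the fraction of shared numeric
    fields agreeing within rel_tol; flagged lists the mismatches."""
    shared = [k for k in llm_output if k in structured]
    if not shared:
        return 0.0, []
    flagged = []
    agree = 0
    for key in shared:
        a, b = float(llm_output[key]), float(structured[key])
        denom = max(abs(a), abs(b), 1e-12)
        if abs(a - b) / denom <= rel_tol:
            agree += 1
        else:
            flagged.append((key, a, b))
    return agree / len(shared), flagged

# Example: the LLM-reported revenue disagrees with the filing by ~15%.
llm = {"revenue": 118.0, "eps": 2.41, "gross_margin": 0.56}
filing = {"revenue": 100.0, "eps": 2.40, "gross_margin": 0.56}
score, flags = agreement_score(llm, filing)
```

A low score or a non-empty flag list would route the output back for re-extraction rather than into downstream analysis.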
Implementing proposed features: Documentation describes “Phase 2” features not yet built—FinBERT integration for sentiment (Projects 18, 19), multi-hop reasoning for RAG systems (Project 8), adaptive risk calibration (Project 11). These are scoped work packages waiting for implementation.
Building evaluation frameworks: Most projects lack systematic performance measurement. Creating backtesting harnesses (for trading signals), accuracy benchmarks (for data extraction), or user satisfaction metrics (for interfaces) enables evidence-based improvement.
Strengthening agentic capabilities: Many “agents” are actually pipelines with LLM components. True agentic behavior—autonomous goal pursuit, learning from feedback, multi-step planning—requires architectural changes documented in proposals but not implemented.
Extending active projects requires:
Coordinating with current Fellows (they must agree to collaboration)
Understanding existing architecture and design decisions
Proposing enhancements that integrate cleanly (not major refactoring)
Demonstrating value through prototypes before major investment
Documenting contributions for project continuity
3. Proposing New Projects
The portfolio has gaps where important financial intelligence capabilities remain unaddressed:
Earnings call analysis: Transcripts contain forward-looking statements, management tone shifts, competitive intelligence. RAG + FinBERT could extract structured insights. No current project does this systematically.
Options flow analysis: Unusual options activity signals informed traders. Scraping options volume, calculating deviations from historical norms, and correlating with subsequent price movements would provide an edge. This is a gap in the current portfolio.
Supply chain intelligence: Tracking shipping data, satellite imagery of parking lots, job postings by location reveals operational health before financial statements. Mentioned conceptually, never implemented.
Cross-asset correlation monitoring: Bonds, commodities, currencies, volatility indices provide diversification—when correlations remain stable. System monitoring correlation regime changes would warn when diversification degrades. Related to Project 7 but broader scope.
Regulatory impact prediction: When SEC proposes rules, which companies benefit, which face compliance costs? LLM analysis of rule text + company business models could generate actionable signals. Project 29 tracks regulations but doesn’t analyze impacts.
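As an illustration of the "deviations from historical norms" computation the options-flow idea calls for, a z-score over a trailing volume window is the usual starting point. The window and threshold here are arbitrary assumptions, not a proposed specification:

```python
import statistics

# Sketch: flag a day's options volume as unusual when it sits more than
# `threshold` standard deviations above its trailing-window mean.
def unusual_volume(history: list, today: int,
                   threshold: float = 3.0) -> tuple:
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return 0.0, False
    z = (today - mean) / stdev
    return z, z > threshold

history = [1000, 1100, 950, 1050, 900, 1000]  # trailing daily contract counts
z, flagged = unusual_volume(history, today=2500)
```

The hard part the proposal would need to address is not this arithmetic but validation: showing the flagged days actually precede abnormal price moves at statistically significant rates.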
New project proposals require:
Problem statement: What financial intelligence gap does this address?
Differentiation: How does this differ from existing projects (1-33)?
Methodology: Specific approach including data sources, models, validation
Feasibility: Can this be built with available free/low-cost tools and APIs?
Success criteria: How will we measure whether this works?
Resource estimate: Approximate development timeline and effort
Coordination with project manager before significant work begins
What Success Looks Like
Contributions to Mycroft projects should produce:
For dormant project revivals:
Comprehensive documentation (architecture, data flows, deployment)
Production-ready code (error handling, logging, monitoring, tests)
Validation results (accuracy metrics, performance benchmarks, user feedback)
Deployment guides (enabling others to run/maintain the system)
For active project extensions:
Feature implementations integrated with existing codebase
Performance improvements (speed, accuracy, reliability) with measurements
Evaluation frameworks enabling systematic testing and improvement
Documentation updates reflecting new capabilities
For new projects:
Working prototype demonstrating core functionality
Validation showing the approach works (backtesting, accuracy tests, user trials)
Documentation enabling others to understand, use, and extend
Integration considerations (how this connects to existing Mycroft systems)
Additionally, all contributions should include:
Critical analysis: What worked? What failed? What would you do differently?
Limitations documentation: What can’t this system do? When does it fail?
Generalization potential: Could this approach apply to other domains?
Ethical considerations: What are risks of misuse? How to mitigate?
The Coordination Requirement
Unlike open-source projects where you can fork and contribute freely, Mycroft requires coordination with the project manager before significant work. This isn’t bureaucratic overhead—it’s recognition that:
Active projects have Fellows with context: They understand design decisions, tried-and-failed approaches, planned directions. Coordinating prevents duplicate work and ensures contributions integrate cleanly.
Dormant projects may have institutional knowledge: Previous Fellows may have documented non-obvious challenges, data source reliability issues, or validation results not yet in formal documentation. Project manager can connect you with them.
New projects may duplicate existing work: What seems like a gap might be covered by an unlisted project, or abandoned because fundamental barriers emerged. Manager prevents wasted effort.
Resource allocation matters: With 33 projects, some deserve sunset (abandon gracefully), others deserve investment (double down on success), others deserve maintenance mode (keep running but don’t extend). Manager coordinates portfolio strategy.
The coordination process:
Express interest to project manager identifying specific project(s)
Receive context about current status, previous Fellows, known issues
For active projects: Introduced to current Fellows, must gain agreement
For dormant projects: Receive handoff materials, commit to ownership
For new projects: Present proposal, receive feedback, adjust scope
All paths: Agree on success criteria, timeline, communication cadence
Why This Matters
The Mycroft portfolio isn’t just about building financial intelligence tools. It’s testing whether open-source AI systems can democratize capabilities traditionally requiring institutional resources.
When congressional trading analysis is a one-click dashboard instead of manual EDGAR scraping, individual investors gain transparency previously available only to investigative journalists.
When portfolio stress testing shows diversification degrading during volatility spikes, retail investors avoid concentration risk that institutional risk managers detect with expensive Bloomberg terminals.
When RAG-powered regulatory analysis answers “How does this SEC rule affect my holdings?” in plain English, compliance becomes accessible instead of requiring legal expertise.
When multi-agent investment analysis combines financial metrics, patent intelligence, earnings execution, and competitive benchmarking, small investors approach the sophistication of analyst teams.
But only if the systems work reliably.
A stress testing tool that hallucinates correlation matrices produces false confidence leading to unexpected losses. A congressional trading tracker with 50% accuracy generates noise, not signal. An investment recommendation system that hasn’t been backtested is speculation, not intelligence.
The humanitarian dimension: building broken tools isn’t just unhelpful—it’s dangerous. It creates false confidence leading to poor decisions. It wastes users’ time and attention. It undermines trust in AI-powered financial analysis broadly.
This is why validation, documentation, and critical analysis matter as much as feature development. A well-documented limitation prevents misuse. A rigorously backtested signal generates justified confidence. A clearly explained failure mode enables informed judgment.
Getting Started
If you’re reading this thinking:
“I could validate whether those multi-agent investment recommendations actually beat the market”
“I could build the hallucination detection layer those RAG systems need”
“I could revive that congressional trading tracker and deploy it production-ready”
“I could implement the earnings call analysis gap in the portfolio”
Then coordinate with the Mycroft project manager. Explain which project interests you, what specific contribution you’d make, what success would look like, and approximately how long you’d need.
Some projects need rescuing from abandonment. Others need extending with specific capabilities. The portfolio needs new projects addressing documented gaps. All need the kind of rigorous implementation, validation, and documentation that transforms proof-of-concept into production-ready systems.
The work isn’t building demos that impress in screenshots. It’s building systems that work reliably when deployed, fail gracefully when pushed beyond design limits, and document their limitations honestly so users can make informed decisions.
That’s the work that determines whether AI-powered financial intelligence democratizes sophistication or just democratizes overconfidence.
Project 1: Computational Finance Textbook
Core Claim: Systematic AI-driven investment analysis combining multiple LLMs, computational verification, and structured frameworks can provide more reliable investment recommendations than traditional single-method approaches. The triangulation methodology reduces errors while agent-based analysis enables comprehensive multi-factor evaluation.
Logical Method:
Multi-Agent Analysis Architecture with 7 specialized agents (Controller, Financial Analysis, Technology Assessment, Market Position, Strategy & Execution, Risk Assessment, Valuation)
Triangulation Validation across multiple platforms/models
Data Integration combining financial statements, market data, technical indicators, alternative data
Systematic Framework applying consistent analysis template across all companies
Methodological Soundness:
Strengths: Agent specialization enables deep domain expertise, systematic framework ensures consistency, triangulation reduces errors, multiple data sources increase reliability, explicit risk assessment
Weaknesses: Heavy reliance on LLMs which may hallucinate financial data, no details on how agent conflicts are resolved, unclear how qualitative factors are quantified, limited information on backtesting results, potential for overfitting to specific AI sector
Use of LLMs: Pervasive integration - each agent likely powered by LLMs, natural language analysis of qualitative factors, code generation for calculations, report generation and synthesis, triangulation using multiple LLMs
Use of Agentic AI: Explicit multi-agent system with Controller Agent coordinating specialized agents, task delegation, conflict resolution, completion verification, but no details on implementation or learning mechanisms
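The controller/specialist pattern described for Project 1 can be sketched in a few lines. The agent functions here are stubs standing in for LLM-backed analyses, and simple score averaging is my assumption; as noted above, the project's actual conflict-resolution mechanism is unspecified:

```python
# Controller/specialist sketch. Each "agent" is a stub returning a
# (score, rationale) pair; a real system would back these with LLM calls.
def financial_agent(ticker):   return 0.7, "strong margins"
def technology_agent(ticker):  return 0.8, "defensible model stack"
def risk_agent(ticker):        return 0.4, "customer concentration"

AGENTS = {
    "financial": financial_agent,
    "technology": technology_agent,
    "risk": risk_agent,
}

def controller(ticker: str) -> dict:
    """Fan out to specialist agents, then aggregate. Averaging is an
    illustrative placeholder for an unspecified conflict-resolution rule."""
    results = {name: fn(ticker) for name, fn in AGENTS.items()}
    scores = [score for score, _ in results.values()]
    return {
        "ticker": ticker,
        "composite": sum(scores) / len(scores),
        "detail": results,
    }

report = controller("EXAMPLE")
```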
Project 2: AI-Focused Investment Strategy with Agent-Based Analysis
Core Claim: Systematic AI-driven investment analysis using specialized agent architectures can provide superior investment recommendations compared to traditional single-analyst approaches by deploying multiple specialized agents coordinated by a Controller Agent.
Logical Method:
Multi-Agent Architecture with 8 agents (Controller, Financial Analysis, Technology Assessment, Market Position, Strategy & Execution, Risk Assessment, Valuation, Report Generation)
Data Integration from SEC filings, financial databases, patent filings, academic papers, market research, job postings
Systematic Framework with standard template across companies, consistent scoring methodology, triangulation across agents
Quantitative Selection Process with primary/secondary factor weighting
Methodological Soundness:
Strengths: Agent specialization for domain expertise, systematic framework ensures comparability, multiple data sources reduce single-stream reliance, explicit risk assessment, quantitative + qualitative balance, scalability, transparency
Weaknesses: LLM hallucination risk, agent coordination unspecified, no backtesting, unclear qualitative factor quantification, AI sector concentration overfitting, data staleness from SEC filings, no evaluation metrics, correlation assumptions may not hold
Use of LLMs: Pervasive and structural - data extraction from Form D filings (25,000+ companies), natural language analysis, agent implementation with custom prompting, report generation, triangulation applied
Use of Agentic AI: Explicit multi-agent system design with hierarchy (Controller supervises specialized agents), communication patterns, workflow orchestration, autonomous data collection, analysis execution, quality checking
Project 3: $80K AI Sector Investment Strategy (July 2025)
Core Claim: A carefully constructed $80,000 investment portfolio focused on AI sector opportunities can capture significant growth while managing risk through systematic allocation across core earnings plays (60%), growth/momentum positions (17.5%), and AI financial instruments/ETFs (22.5%).
Logical Method:
Three-Tier Allocation Strategy: Core Earnings Holdings (60%), Growth/Momentum (17.5%), AI Financial Instruments (22.5%)
Timing Strategy with immediate deployment sequenced over one week to capture earnings catalysts
Risk Management through position sizing, stop-loss levels (10-12% stocks, 8-10% ETFs, 15% SOXL)
Expected Outcomes with probabilistic scenarios (Bull 35%, Base 50%, Bear 15%)
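The risk-management arithmetic implied by those stop-loss levels can be made explicit. This sketch derives the dollar exposure each stop level permits under a per-position risk budget; the 1% budget is my assumption, not part of the documented strategy:

```python
# Position-sizing sketch: with a stop-loss at `stop_pct` below entry, the
# dollar loss if stopped out is position_value * stop_pct. Capping that
# loss at a fixed risk budget bounds the position size.
def max_position(portfolio: float, risk_budget_pct: float,
                 stop_pct: float) -> float:
    max_loss = portfolio * risk_budget_pct
    return max_loss / stop_pct

portfolio = 80_000
# A 1% risk budget per position is an illustrative assumption.
stock_cap = max_position(portfolio, 0.01, 0.10)  # 10% stock stop
etf_cap = max_position(portfolio, 0.01, 0.08)    # 8% ETF stop
soxl_cap = max_position(portfolio, 0.01, 0.15)   # 15% SOXL stop
```

Note the tension this exposes: under any per-position risk budget, the documented allocation (70% of capital in five stocks) implies stop-outs that each cost well over 1% of the portfolio.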
Methodological Soundness:
Strengths: Analyst consensus data for selection, earnings catalyst alignment, risk-adjusted position sizing, diversification via ETFs, explicit risk management, concrete metrics
Weaknesses: Very short time horizon (1-2 months), concentration risk (70% in 5 stocks), correlation unaddressed, leverage risk (SOXL 3x), catalyst dependency, arbitrary scenario probabilities, analyst rating limitations, no tax considerations, transaction costs ignored
Use of LLMs: Implicit but essential - likely used for data synthesis (analyst ratings aggregation), research assistance, report generation with structured documentation
Use of Agentic AI: Limited explicit application - no autonomous agent systems, potential enhancements proposed (Monitoring Agents, Rebalancing Agents, Market Intelligence Agents) but not implemented
Project 4: AI Company Investment Analysis Template System
Core Claim: A comprehensive, standardized template for analyzing AI companies combined with specialized agent prompts can ensure consistent, thorough investment analysis across diverse AI sector participants through both quantitative financial metrics and qualitative strategic factors.
Logical Method:
Structured Analysis Template with 8 major sections (Executive Summary through Investment Considerations)
Agent Specialization with 6 specialized agents + 2 coordination agents
AI-Specific Metrics innovation (AI revenue %, AI R&D %, compute investment, etc.)
Estimation Methodology for non-disclosed data using segment reporting, analyst estimates, patent/R&D activity
Methodological Soundness:
Strengths: Comprehensive coverage (60+ metrics), standardization for comparability, source hierarchy prioritization, AI-specific innovation, qualitative + quantitative balance, multiple valuation methods, risk taxonomy, actionable output
Weaknesses: Data availability challenge, estimation uncertainty, template rigidity, subjectivity in qualitative factors, benchmark selection difficulty, time intensive, agent coordination complexity, overfitting risk
Use of LLMs: Integral throughout - data extraction from SEC filings, estimation and imputation, qualitative analysis, agent prompting with detailed instructions, report synthesis
Use of Agentic AI: Explicit multi-agent architecture with defined roles (Controller, 6 specialized agents, Report Generation), coordination mechanisms, work plan creation, data handoffs, conflict resolution, completion verification
Project 5: Mycroft Goal Simulator (Goal Setting System)
Core Claim: Investment goal planning can be democratized through a system that combines natural language processing (local LLM) with Monte Carlo financial simulation to provide data-driven success probabilities and actionable recommendations without expensive financial advisor fees.
Logical Method:
Natural Language Goal Extraction using local LLaMA model via Ollama
Historical Market Data Collection (20 years from yfinance)
Monte Carlo Simulation (100-10,000 scenarios)
Probability Analysis with percentile calculations
Recommendation Generation based on success thresholds
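The Monte Carlo core of this pipeline is compact. A stripped-down sketch, using stdlib random draws in place of the yfinance history (the 7%/15% annual return and volatility figures are illustrative assumptions, not the project's calibration):

```python
import random

def simulate_goal(initial: float, monthly: float, months: int,
                  target: float, mu: float = 0.07, sigma: float = 0.15,
                  n_scenarios: int = 5000, seed: int = 42) -> float:
    """Fraction of Monte Carlo scenarios reaching `target`. Returns are
    drawn i.i.d. normal per month; the real system would derive mu/sigma
    from 20 years of market history instead."""
    rng = random.Random(seed)
    m_mu, m_sigma = mu / 12, sigma / 12 ** 0.5
    successes = 0
    for _ in range(n_scenarios):
        balance = initial
        for _ in range(months):
            balance = balance * (1 + rng.gauss(m_mu, m_sigma)) + monthly
        if balance >= target:
            successes += 1
    return successes / n_scenarios

# "Save $100k in 10 years, starting with $20k and $400/month":
p = simulate_goal(20_000, 400, 120, 100_000)
```

The i.i.d. normal assumption is also precisely the weakness noted above: it has no regime awareness and understates fat tails.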
Methodological Soundness:
Strengths: Local processing (privacy + no costs), free data, probabilistic approach, multiple scenarios, inflation-adjusted, Pydantic validation, fallback strategy, modular architecture
Weaknesses: LLM extraction accuracy issues, historical data assumption, no market regime awareness, simple portfolio model, no tax consideration, no sequence risk, limited asset classes, single-goal optimization, no human capital modeling, fixed inflation
Use of LLMs: Central to workflow - Local LLaMA 3.1 (8B) for goal extraction with prompt engineering, JSON parsing, natural language understanding
Use of Agentic AI: Limited - system is pipeline not agent, no autonomous decision-making, no learning, potential enhancements proposed (Goal Refinement Agent, Portfolio Recommendation Agent, Monitoring Agent, Learning Agent)
Project 6: Congressional Trading Analysis System
Core Claim: Members of Congress trade on non-public information, detectable through systematic analysis of trading patterns, timing relative to price movements, and filing delays by automating collection, analysis, and visualization of congressional stock trades.
Logical Method:
Automated Data Collection via Selenium-based scraper from Capitol Trades
Stock Performance Analysis using Yahoo Finance (30 days pre/post-trade)
Pattern Detection Indicators (buy before surge, sell before decline, timing advantage)
Dual Visualization Interface (chronological view, politician view)
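The core timing metric, comparing returns in the windows before and after a disclosed trade, reduces to a small function over a price series. This sketch uses a plain list of closes in place of the Yahoo Finance fetch; the 30-day window follows the project, everything else is illustrative:

```python
# Timing-advantage sketch: percentage return in the windows before and
# after a trade date, given daily closing prices indexed by day number.
def window_returns(closes: list, trade_idx: int,
                   window: int = 30) -> tuple:
    pre_start = max(trade_idx - window, 0)
    post_end = min(trade_idx + window, len(closes) - 1)
    pre = closes[trade_idx] / closes[pre_start] - 1
    post = closes[post_end] / closes[trade_idx] - 1
    return pre, post

# Synthetic series: flat before the trade, +20% after, the "buy before
# surge" pattern the project's indicators look for.
closes = [100.0] * 31 + [120.0] * 30
pre, post = window_returns(closes, trade_idx=30)
```

As the weaknesses list notes, a large post-trade return alone proves nothing; the missing piece is a control group of matched non-congressional trades and significance testing against it.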
Methodological Soundness:
Strengths: Public data, systematic collection, quantitative analysis, visual validation, audit trail, background processing, reproducible
Weaknesses: Correlation ≠ causation, no control group, selection bias, limited context, filing delays, incomplete data, no statistical testing, price movement attribution unclear, 30-day window arbitrary, no risk adjustment, survivorship bias
Use of LLMs: Minimal direct usage currently - potential applications in document classification, pattern narrative generation, contextual analysis, anomaly detection
Use of Agentic AI: Current architecture is task-based pipeline, not truly agentic - potential for Autonomous Monitoring Agent, Investigation Orchestrator Agent, Comparative Analysis Agent, Alert Generation Agent
Project 7: Portfolio Stress Test System - Layer 1 (Regime-Dependent Diversification)
Core Claim: Portfolio diversification degrades significantly during market stress - a portfolio appearing well-diversified in calm markets may behave like a concentrated position during crises. Analyzing correlation structure across volatility regimes identifies “diversification illusions” before catastrophic failure.
Logical Method:
Volatility Regime Classification using VIX index (Low <15, Medium 15-25, High >25)
Return Calculation by Regime with log returns
Diversification Metrics (Average Pairwise Correlation, Max Correlation, Effective Number of Assets)
Degradation Analysis comparing Low VIX vs High VIX
Visualization with correlation heatmaps and degradation charts
Methodological Soundness:
Strengths: Regime-specific analysis, historical validation, effective assets metric, VIX as established indicator, rolling average smoothing, minimum regime duration, multi-layer design, free data
Weaknesses: Backward looking, VIX-specific only, fixed thresholds, log returns assumption, no causal analysis, static portfolio weights, equity-focused, 20-day minimum limitation, single time horizon, no forward testing
Use of LLMs: No current integration - potential applications in report generation, recommendation enhancement, pattern explanation, interactive Q&A
Use of Agentic AI: Current state is analysis pipeline - potential for Portfolio Monitoring Agent, Diversification Optimization Agent, Regime Prediction Agent, Multi-Layer Coordinator Agent
Project 8: Regulatory QA System with RAG
Core Claim: Financial regulatory information scattered across multiple agencies can be transformed into proactive investment intelligence through a RAG system combining full document retrieval, semantic search, and local LLM inference, enabling natural language queries over complete regulatory corpus.
Logical Method:
Document Collection from existing Regulatory Intelligence Agent (SEC, FINRA, CFTC, Federal Register)
Full Document Retrieval with web scraping (BeautifulSoup)
RAG Pipeline: Document → Text Splitting (2000 char chunks, 400 overlap) → Embedding Generation → Vector Storage (ChromaDB) → Query Processing
Semantic Chunking Strategy balancing context window vs retrieval precision
Portfolio-Aware Context (planned enhancement)
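The splitting step in the pipeline above (2000-character chunks, 400-character overlap) is straightforward to sketch. Production splitters also try to break on sentence or paragraph boundaries, which this fixed-size version deliberately ignores:

```python
# Fixed-size chunking with overlap, matching the 2000/400 figures from
# the pipeline. Overlap keeps context that straddles a chunk boundary
# retrievable from at least one chunk.
def split_text(text: str, chunk_size: int = 2000, overlap: int = 400) -> list:
    step = chunk_size - overlap
    chunks = []
    for i in range(0, len(text) or 1, step):
        chunks.append(text[i:i + chunk_size])
        if i + chunk_size >= len(text):
            break
    return chunks

doc = "".join(str(i % 10) for i in range(5000))
chunks = split_text(doc)
```

The chunk-size tradeoff flagged in the weaknesses is visible here: larger chunks give the LLM more context per retrieval but dilute the embedding, hurting retrieval precision.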
Methodological Soundness:
Strengths: Full document context, semantic search, local inference (privacy + no cost), source citations, selective querying, chunk overlap, modular architecture, multiple format handling
Weaknesses: Retrieval quality dependency, chunk size tradeoff, general-purpose embedding model, no ranking refinement, in-memory vector store, no document versioning, limited cross-document reasoning, no temporal awareness, hallucination risk, no confidence scores, network dependency
Use of LLMs: Central architecture component - Ollama local models (llama3.2:3b, llama3.1:8b, mistral:7b) for document understanding, answer generation with context
Use of Agentic AI: Proposal only - describes proactive monitoring agents, multi-hop research, clarification dialogs (NOT implemented in current state)
Project 9: Mycroft Framework - Orchestration Layer
Core Claim: A sophisticated orchestration layer coordinating multiple specialized AI agents can deliver superior investment intelligence compared to monolithic AI systems through cross-agent validation, dynamic task allocation, pattern recognition, decision optimization, and continuous learning.
Logical Method:
Proposed Architecture using Apache Airflow/Prefect, Celery, Kafka/Redis, Docker, FastAPI/gRPC
Five Orchestration Mechanisms: Cross-Agent Validation, Dynamic Task Allocation, Pattern Recognition, Decision Optimization, Continuous Learning
Phased Implementation from prototype to production
Methodological Soundness:
Strengths: Modular design, horizontal scaling, transparency, battle-tested tools, observability, fault tolerance, async processing, API-driven
Weaknesses: Complexity overhead, DevOps required, latency introduction, custom glue code, debugging difficulty, resource intensive, learning curve, version management, single point of failure
Use of LLMs: LLM as orchestration intelligence - query understanding/routing, result synthesis, conflict resolution, pattern detection
Use of Agentic AI: Conceptual framework vs implementation - proposed capabilities (cross-agent validation, dynamic allocation, pattern recognition, learning) vs current implementation (query router + aggregator, manual orchestration)
Project 10: AI Talent Intelligence Agent (n8n Workflow)
Core Claim: Tracking AI researcher movements, paper publications, and hiring patterns provides early signals about company strategic direction and innovation capability through automated workflow combining ArXiv monitoring, news tracking, and AI-powered significance scoring.
Logical Method:
Multi-Source Data Collection (ArXiv API, Serper News API, Researcher Database)
Data Merging Pipeline in n8n
AI-Powered Analysis using Groq LLM (extraction, sentiment, significance scoring 1-10)
Filtering and Aggregation (significance >5)
Report Generation via HTML email
Methodological Soundness:
Strengths: Leading indicators, quantifiable signals, multi-source validation, free tier APIs, automated filtering, structured output, n8n visual workflow, production architecture
Weaknesses: ArXiv parsing incomplete, researcher DB mock data, subjective significance scoring, no historical baseline, Groq rate limits, single LLM (no triangulation), email-only output, no entity resolution, hiring vs departure not distinguished, company attribution errors
Use of LLMs: Central - Groq LLM (llama3.1 or mixtral-8x7b) for entity extraction from unstructured news/papers
Use of Agentic AI: Current state is scheduled batch processing - potential for Continuous Monitoring Agent, Talent Flow Analysis Agent, Research Impact Predictor Agent, Investment Signal Generator Agent
Project 11: AI News Sentiment Agent (n8n Workflow with FinBERT)
Core Claim: Financial news sentiment analysis using domain-specific models (FinBERT) can identify high-risk market events requiring immediate attention by processing multi-source feeds through AI-powered sentiment classification and multi-factor risk scoring.
Logical Method:
Multi-Source News Collection (NewsAPI, RSS feeds, Google News)
Two-Tier Architecture: Version 1 keyword-based, Version 2 FinBERT AI model
Multi-Factor Risk Scoring Algorithm combining sentiment, keywords, source credibility, market symbols
Alert Generation based on risk level
Database Storage in PostgreSQL
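The multi-factor scoring this workflow describes can be sketched as a weighted combination of normalized factors. The weights, caps, and alert thresholds here are illustrative assumptions, not the project's calibration:

```python
# Multi-factor risk score sketch. Inputs are normalized to [0, 1]:
# sentiment_neg is FinBERT's negative-class probability, keyword and
# symbol hit counts are scaled by caps, credibility comes from a source
# table. Weights below are illustrative, not the project's.
WEIGHTS = {"sentiment": 0.4, "keywords": 0.25, "credibility": 0.2, "symbols": 0.15}

def risk_score(sentiment_neg, keyword_hits, credibility, symbol_hits,
               keyword_cap=5, symbol_cap=3):
    factors = {
        "sentiment": sentiment_neg,
        "keywords": min(keyword_hits / keyword_cap, 1.0),
        "credibility": credibility,
        "symbols": min(symbol_hits / symbol_cap, 1.0),
    }
    return sum(WEIGHTS[k] * v for k, v in factors.items())

def alert_level(score, high=0.7, medium=0.4):
    return "HIGH" if score >= high else "MEDIUM" if score >= medium else "LOW"

s = risk_score(sentiment_neg=0.9, keyword_hits=4, credibility=0.8, symbol_hits=2)
```

Without backtesting (a weakness noted below), thresholds like these are guesses; tuning them against labeled historical events is what would turn the score into a signal.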
Methodological Soundness:
Strengths: Domain-specific FinBERT (95%+ accuracy), multi-factor risk beyond sentiment, real-time processing, historical tracking, source credibility weighting, structured data, filtering
Weaknesses: FinBERT 512 token limit, simple keyword detection, hard-coded source credibility, no event deduplication, sentiment ≠ market impact, no causal analysis, recency bias, no entity resolution, alert fatigue risk, no backtesting
Use of LLMs: Central - FinBERT (ProsusAI/finbert, specialized BERT for financial sentiment) for classification with probability distribution
Use of Agentic AI: Current implementation is reactive pipeline - potential for Real-Time Streaming Agent, Adaptive Risk Calibration Agent, Thematic Analysis Agent, Portfolio Impact Assessment Agent
Project 12: Finance Phrase Extraction Agent (n8n + React)
Core Claim: Financial documents contain critical terminology and KPIs that must be extracted for structured analysis. An AI-powered phrase extraction system using Gemini LLM can identify 50+ financial terms enabling automated intelligence gathering and trend analysis.
Logical Method:
End-to-End Pipeline: React Frontend → n8n Webhook → Gemini AI → JSON Cleaner → PostgreSQL → Response
Gemini-Powered Extraction with detailed prompt engineering
JSON Sanitization using regex cleaning
Dual Storage Strategy (PostgreSQL + React UI)
React Frontend with 3 pages (Extractor, History, Analytics)
Methodological Soundness:
Strengths: Production architecture (full-stack), Gemini AI integration, JSON validation, PostgreSQL TEXT[] native arrays, immediate API response, comprehensive testing, mobile responsive, export functionality, built-in analytics
Weaknesses: Gemini API dependency, ambiguous phrase definition, no context preservation, duplicate detection issues, no semantic grouping, limited validation, performance at scale unclear, JSON cleaning brittleness, no confidence scores, English only
Use of LLMs: Integral - Gemini 2.5 for phrase extraction with structured output, JSON-only response format
Use of Agentic AI: Current state is request-response API - potential for Document Monitoring Agent, Trend Detection Agent, Cross-Document Comparison Agent, Semantic Clustering Agent
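The JSON-sanitization step is where pipelines like this most often break, since LLMs tend to wrap JSON in markdown fences or prose. The exact regex patterns in the n8n node are not documented; this is one common, assumption-level approach:

```python
import json
import re

def clean_llm_json(raw: str):
    """Strip markdown fences and stray prose around LLM-emitted JSON, then parse.
    Mirrors the workflow's regex-cleaning step in spirit; the actual patterns
    used by the n8n node are assumptions."""
    text = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Fall back to grabbing the outermost braces/brackets if extra prose remains.
    if not text.lstrip().startswith(("{", "[")):
        m = re.search(r"\{.*\}|\[.*\]", text, re.DOTALL)
        if not m:
            raise ValueError("no JSON object found in LLM output")
        text = m.group(0)
    return json.loads(text)
```

The weaknesses list calls this cleaning "brittle," and the fallback branch shows why: it silently discards any text outside the first brace pair, which can hide extraction errors.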
Project 13: Financial Intelligence Hub Orchestrator (n8n Meta-Workflow)
Core Claim: A meta-orchestration layer that intelligently routes queries to specialized analysis workflows (SEC filings, patents, news) and synthesizes results using LLMs can provide comprehensive financial intelligence through a single conversational interface.
Logical Method:
Three-Tier Architecture: Specialized Analysis Workflows, Intelligence Hub Orchestrator, User Interface
Query Processing Flow with LLM routing
LLM Routing Logic determining workflows to call
Result Synthesis generating unified reports
Execution Logging for debugging
Methodological Soundness:
Strengths: Modular microservices, intelligent routing, parallel execution, local LLM (privacy + no cost), flexible query handling, comprehensive logging, webhook-based, production URLs
Weaknesses: Single LLM decision point, no confidence scores, sequential synthesis, no error recovery, context loss, no multi-turn dialog, limited workflow discovery, no result ranking, latency accumulation, no caching
Use of LLMs: Dual LLM strategy - LLM as Router (intent classification) and LLM as Analyst (multi-source synthesis), using Ollama llama3.2:3b
Use of Agentic AI: Current implementation is meta-workflow orchestration - has autonomous routing, multi-step planning, adaptive behavior, but lacks learning, goal-directed behavior, iterative refinement
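The routing layer reduces to validating LLM-chosen workflow names against a registry before dispatch. The workflow names and webhook URLs below are hypothetical; in the project the JSON would come from Ollama llama3.2:3b:

```python
import json

# Hypothetical registry of specialized-workflow webhooks.
WORKFLOWS = {
    "sec_filings": "http://localhost:5678/webhook/sec",
    "patents": "http://localhost:5678/webhook/patents",
    "news": "http://localhost:5678/webhook/news",
}

def route(llm_output: str):
    """Validate the router LLM's plan and return webhook URLs to invoke.
    llm_output is expected to look like {"workflows": ["sec_filings", "news"]}."""
    plan = json.loads(llm_output)
    chosen = plan.get("workflows", [])
    unknown = [w for w in chosen if w not in WORKFLOWS]
    if unknown:
        raise ValueError(f"router chose unregistered workflows: {unknown}")
    return [WORKFLOWS[w] for w in chosen]
```

Raising on unregistered names is one mitigation for the "single LLM decision point" weakness: a hallucinated workflow name fails loudly instead of silently dropping part of the query.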
Project 14: Goal Setting System with LLM Enhancement
Core Claim: Local LLMs can extract structured goal parameters from natural language descriptions and calculate achievement probabilities using Monte Carlo simulation, enabling personalized financial planning without cloud dependency.
Logical Method:
Natural language goal input via Streamlit interface
Ollama-based LLM (LLaMA 3.1:8b) extracts structured parameters
JSON parsing with validation and error handling
Monte Carlo simulation (10,000 iterations) for probability calculation
Statistical analysis of outcomes with visualization
Methodological Soundness:
Strengths: Local processing preserves privacy, statistically sound Monte Carlo approach, structured data extraction appropriate, graceful error handling
Weaknesses: No model fine-tuning for finance, arbitrary 10k iteration count, oversimplified market assumptions (fixed 7% return, 15% volatility), no validation of LLM extraction accuracy
Use of LLMs: Core feature - Ollama LLaMA 3.1:8b for natural language to structured data extraction
Use of Agentic AI: None - single-step extraction without goal pursuit or autonomous behavior
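The Monte Carlo core is compact enough to sketch. This assumes annual compounding with contributions added before each year's return (the project's exact ordering isn't documented) and reuses its fixed 7%/15% assumptions:

```python
import random

def goal_probability(current, monthly, years, target,
                     mean=0.07, vol=0.15, n_sims=10_000, seed=42):
    """Monte Carlo estimate of the probability of reaching `target` after
    `years` of monthly contributions, with annual returns drawn from a normal
    distribution. Mirrors the project's 7% return / 15% volatility defaults;
    real market returns are fatter-tailed than this."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        balance = current
        for _ in range(years):
            # Assumption: a year's contributions land before that year's return.
            balance = (balance + monthly * 12) * (1 + rng.gauss(mean, vol))
        if balance >= target:
            hits += 1
    return hits / n_sims
```

This also makes the "arbitrary 10k iteration count" weakness concrete: the standard error of the estimate shrinks as 1/sqrt(n_sims), so 10,000 iterations pin the probability to roughly ±1%.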
Project 15: Congressional Trading Tracker
Core Claim: Systematic scraping and analysis of congressional stock trading disclosures can identify potential conflicts of interest and insider trading patterns by correlating trades with committee assignments and market performance.
Logical Method:
Automated web scraping from House/Senate disclosure websites
Data normalization (ticker symbols, transaction dates, amounts)
Committee assignment correlation via web scraping
Celery-based background task processing for scalability
PostgreSQL storage with indexed queries for analysis
Methodological Soundness:
Strengths: Uses official government data sources, addresses real transparency gap, systematic data collection, scalable architecture
Weaknesses: No statistical testing for significance, relies on self-reported data, delayed filings (up to 45 days), incomplete disclosure data, no control for legitimate trading
Use of LLMs: Minimal/None - pure data collection and processing
Use of Agentic AI: None - scheduled data collection without autonomous decision-making
Project 16: Portfolio Stress Test & Optimization
Core Claim: Monte Carlo simulation combined with mean-variance optimization can quantify portfolio risk across multiple scenarios while identifying efficient rebalancing strategies that maximize risk-adjusted returns.
Logical Method:
Historical return/covariance matrix calculation from yfinance data
Monte Carlo simulation (10,000 iterations) across 5 market scenarios
Mean-variance optimization (Efficient Frontier calculation)
Sharpe ratio maximization for optimal allocation
Statistical analysis with percentile-based risk metrics
Methodological Soundness:
Strengths: Established portfolio theory, appropriate statistical methods, multiple scenario testing, clear visualization of risk-return tradeoff
Weaknesses: Assumes normal return distributions (not realistic), historical correlation may not predict future, no tail risk measures (VaR/CVaR), fixed 5 scenarios may miss important cases, no transaction cost consideration
Use of LLMs: None - pure quantitative financial modeling
Use of Agentic AI: None - computational analysis without autonomous behavior
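The optimization step can be approximated by random weight sampling, a common Monte Carlo stand-in for a full Efficient Frontier solver. The figures are annualized and illustrative; a production version would use a constrained optimizer such as scipy.optimize:

```python
import numpy as np

def max_sharpe_random(mean_returns, cov, n_portfolios=10_000, rf=0.02, seed=0):
    """Sample long-only portfolio weights, score each by Sharpe ratio, keep
    the best. `rf` is an assumed risk-free rate; inputs are annualized."""
    rng = np.random.default_rng(seed)
    n = len(mean_returns)
    best_w, best_sharpe = None, -np.inf
    for _ in range(n_portfolios):
        w = rng.random(n)
        w /= w.sum()                        # weights sum to 1, no shorting
        ret = w @ mean_returns
        risk = np.sqrt(w @ cov @ w)         # portfolio standard deviation
        sharpe = (ret - rf) / risk
        if sharpe > best_sharpe:
            best_w, best_sharpe = w, sharpe
    return best_w, best_sharpe
```

Note that the covariance matrix here is exactly the normal-distribution assumption the weaknesses flag: it captures co-movement in calm markets but not the correlation spikes seen in crashes.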
Project 17: Patent Intelligence System
Core Claim: Systematic patent monitoring from USPTO PatentsView combined with AI classification can identify innovation trends and competitive intelligence 6-24 months before public announcements.
Logical Method:
USPTO PatentsView API extraction with cursor pagination
Patent metadata parsing (inventors, assignees, CPC codes)
AI classification using keyword matching + CPC analysis
Company name normalization (alias dictionary)
Citation network analysis for influence metrics
Methodological Soundness:
Strengths: Uses authoritative USPTO data, appropriate lead time analysis, structured patent metadata, citation analysis valid
Weaknesses: Keyword classification oversimplified, no patent quality assessment, defensive vs innovative patents not distinguished, 18-month publication lag reduces timeliness, no validation of innovation predictions
Use of LLMs: Planned (Phase 2) - for patent classification, summarization, technology trend extraction (NOT currently implemented)
Use of Agentic AI: Proposal only - describes autonomous monitoring, inventor tracking, competitive alerts (NOT implemented)
Project 18: SEC Filings Financial Metrics Agent
Core Claim: Automated XBRL parsing from SEC EDGAR can extract standardized financial metrics and calculate ratios, eliminating manual data entry while enabling systematic company analysis.
Logical Method:
SEC EDGAR API queries by ticker/CIK
XBRL tag extraction (income statement, balance sheet, cash flow)
Financial ratio calculations (margins, ROE, leverage, efficiency)
Multi-period trend analysis (QoQ, YoY)
Export to JSON, CSV, and Pandas DataFrame
Methodological Soundness:
Strengths: Uses official SEC data, XBRL standardization reduces errors, comprehensive ratio coverage, multi-format export
Weaknesses: XBRL tag variability across companies, no non-GAAP metric handling, missing segment breakdowns, no accounting method adjustments, no separation of one-time items
Use of LLMs: Planned (Phase 2) - for MD&A sentiment, risk factor analysis, forward-looking statement extraction (NOT currently implemented)
Use of Agentic AI: Proposal only - describes continuous monitoring, anomaly detection, thesis validation (NOT implemented)
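The extraction-and-ratio step can be sketched offline. The `facts` dict below mimics the shape of EDGAR's companyfacts endpoint (real responses nest this under a top-level "facts" key, and tag names vary by filer — the XBRL variability weakness noted above):

```python
def latest_usd(facts, tag):
    """Pick the most recently ended USD value for a us-gaap tag.
    `facts` follows the companyfacts layout: facts["us-gaap"][tag]["units"]["USD"]
    is a list of {"end": date, "val": number} entries."""
    entries = facts["us-gaap"][tag]["units"]["USD"]
    return max(entries, key=lambda e: e["end"])["val"]

def ratios(facts):
    """Compute two illustrative ratios; the project covers many more."""
    revenue = latest_usd(facts, "Revenues")
    net_income = latest_usd(facts, "NetIncomeLoss")
    equity = latest_usd(facts, "StockholdersEquity")
    return {
        "net_margin": net_income / revenue,
        "roe": net_income / equity,
    }
```

The hard-coded tag names are where the sketch would break in practice: some filers report `RevenueFromContractWithCustomerExcludingAssessedTax` instead of `Revenues`, which is precisely the tag-variability problem the weaknesses identify.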
Project 19: Forecasting Agent
Core Claim: Combining Alpha Vantage market data with FinBERT sentiment analysis can generate probabilistic stock forecasts (optimistic/realistic/pessimistic scenarios) with quantified confidence levels.
Logical Method:
Alpha Vantage OHLC price data retrieval
FinBERT sentiment scoring of related news
Historical volatility calculation
Scenario generation (optimistic/realistic/pessimistic)
Risk level classification and confidence scoring
Methodological Soundness:
Strengths: Multi-scenario approach acknowledges uncertainty, volatility analysis standard practice, sentiment integration reasonable
Weaknesses: No statistical validation of forecasts, scenario generation method unclear, confidence scores not calibrated, no backtesting results, sentiment-price correlation assumed not tested
Use of LLMs: FinBERT for news sentiment (domain-specific financial sentiment model)
Use of Agentic AI: None - request-response forecasting without goal-directed behavior
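Since the scenario-generation method is unclear (a weakness noted above), here is one common construction offered as an assumption, not the project's actual formula: historical volatility projected over a horizon to form ±1σ bands around the last price.

```python
import math

def scenario_bands(prices, horizon_days=30):
    """Annualized historical volatility from daily closes, then optimistic/
    realistic/pessimistic prices as a one-standard-deviation log-return move
    over the horizon. One plausible construction, not the project's documented one."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    mean = sum(rets) / len(rets)
    var = sum((r - mean) ** 2 for r in rets) / (len(rets) - 1)
    daily_vol = math.sqrt(var)
    move = daily_vol * math.sqrt(horizon_days)   # scale vol to the horizon
    last = prices[-1]
    return {
        "pessimistic": last * math.exp(-move),
        "realistic": last,
        "optimistic": last * math.exp(move),
        "annualized_vol": daily_vol * math.sqrt(252),
    }
```

Even this simple version makes the calibration weakness visible: a ±1σ band should contain the realized price about 68% of the time, which is a directly backtestable claim the project never tests.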
Project 20: Funding Intelligence Agent
Core Claim: Multi-source web scraping (TechCrunch, VentureBeat) with keyword-based filtering can identify AI startup funding announcements with 85%+ accuracy, saving 10+ hours/week of manual research.
Logical Method:
Zyte API web scraping with JavaScript rendering
HTML parsing for article metadata
Keyword-based funding detection (raised, series, $M, etc.)
Industry classification using keyword scoring
Dual storage (PostgreSQL + Google Sheets)
Methodological Soundness:
Strengths: Uses authoritative tech news sources, keyword approach efficient, dual storage enables analysis + visualization, deduplication prevents errors
Weaknesses: 85% accuracy leaves 15% false positives/negatives, keyword matching brittle, limited to 2 sources, no funding amount extraction, classification simplistic
Use of LLMs: Planned (Phase 2) - for funding amount parsing, company extraction, investor identification (NOT currently implemented)
Use of Agentic AI: None - scheduled scraping without autonomous behavior
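The funding-amount extraction listed as a Phase 2 gap can be prototyped with a regex; the trigger verbs below are an illustrative subset, not the project's actual keyword list:

```python
import re

# Hypothetical pattern: a funding verb followed by a dollar amount and unit.
FUNDING_RE = re.compile(
    r"(?:raised|closed|secured)\s+\$?(\d+(?:\.\d+)?)\s*(million|billion|[MB])\b",
    re.IGNORECASE,
)

def funding_amount_usd(text):
    """Extract a dollar funding amount from headline text, or None if no match."""
    m = FUNDING_RE.search(text)
    if not m:
        return None
    value = float(m.group(1))
    unit = m.group(2).lower()
    return value * (1e9 if unit in ("billion", "b") else 1e6)
```

The brittleness the weaknesses mention shows up immediately: phrasings like "a $50M round led by..." put the verb after the amount and slip past this pattern, which is why the project plans LLM parsing for Phase 2.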
Project 21: Investor Intelligence Agent
Core Claim: Natural language query parsing combined with SQL routing can enable conversational exploration of investor-startup relationships through structured database queries.
Logical Method:
Natural language query classification (investor profile, startup investors, recent deals, top investors)
Route to appropriate SQL query template
PostgreSQL query execution
Result formatting for chatbot interface
HTML UI for interactive exploration
Methodological Soundness:
Strengths: Structured query approach reliable, classification-based routing efficient, SQL queries optimized, clear data model
Weaknesses: Fixed query templates limit flexibility, no fuzzy entity matching, classification errors break system, simulated peer data in some cases, no validation of query intent accuracy
Use of LLMs: Minimal - basic query classification only (could be rule-based)
Use of Agentic AI: None - query routing without autonomous reasoning
Project 22: Financial Literacy Bot with RAG
Core Claim: RAG-enhanced chatbot using PDF knowledge base, web search, and conversational memory can provide personalized financial education adapted to user’s learning progress.
Logical Method:
Dual-source intelligence (PDF vector search + web search)
HuggingFace embeddings + Pinecone vector database
Groq LLM for answer generation
Window buffer memory for conversation context
PostgreSQL for long-term learning history
Methodological Soundness:
Strengths: RAG appropriate for educational content, multi-source approach comprehensive, memory enables personalization, tracks learning progression
Weaknesses: Knowledge base limited to uploaded PDFs, web search not validated, learning gap analysis subjective, no assessment of comprehension, personalization algorithm unclear
Use of LLMs: Central - Groq (llama-3.3-70b-versatile) for answer generation and gap analysis
Use of Agentic AI: Limited - session memory and learning history, but no autonomous goal pursuit
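The retrieval step reduces to cosine similarity over embeddings — the same operation Pinecone performs at scale in the project. The vectors below are placeholders, not HuggingFace model outputs:

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    """Return indices of the k most cosine-similar document vectors.
    In the real pipeline, vectors come from HuggingFace embeddings and the
    index lives in Pinecone; this in-memory version shows the math."""
    q = np.asarray(query_vec, dtype=float)
    D = np.asarray(doc_vecs, dtype=float)
    sims = (D @ q) / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(sims)[::-1][:k].tolist()
```

The retrieved chunks are then stuffed into the Groq prompt alongside the question, which is what grounds the answer in the PDF knowledge base rather than the model's parametric memory.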
Project 23: Mycroft Orchestrator
Core Claim: LLM-powered query routing can intelligently dispatch natural language requests to specialized financial/patent analysis agents based on intent detection.
Logical Method:
Ollama Llama 3 analyzes user query
Extracts tool name and parameters as JSON
JavaScript validation against tool schemas
HTTP routing to appropriate agent webhook
Response aggregation and delivery
Methodological Soundness:
Strengths: Natural language interface reduces friction, schema validation prevents errors, extensible architecture, clear separation of concerns
Weaknesses: Single-agent routing only (no multi-tool queries), intent detection errors break flow, no context persistence across queries, local LLM may struggle with edge cases
Use of LLMs: Central - Ollama Llama 3 for intent extraction and parameter identification
Use of Agentic AI: v1.0 has none; v2-dev proposes multi-tool coordination, persistent storage, proactive discovery (NOT implemented)
Project 24: Portfolio Intelligence Agent with RAG
Core Claim: RAG-enhanced portfolio tracking combining live price data with knowledge base retrieval can generate personalized, context-aware daily analysis that improves over time through auto-learning.
Logical Method:
Live stock prices from Yahoo Finance
Historical portfolio data from CSV
RAG retrieval of relevant insights (5-7 per analysis)
Groq LLM generates context-aware summary
Extract new insights and update knowledge base
Methodological Soundness:
Strengths: RAG provides personalization, auto-learning grows intelligence, educational focus appropriate, tracks actual portfolio
Weaknesses: Yahoo Finance delayed prices (15-20 min), knowledge base quality depends on accumulated data, no validation of AI insights, retrieval scoring arbitrary, learning from LLM output not validated
Use of LLMs: Central - Groq (Llama 3.3 70B) for portfolio analysis and insight extraction
Use of Agentic AI: Limited - auto-learning loop and historical pattern recognition, but no autonomous goal pursuit
Project 25: Portfolio Visualization Agent
Core Claim: Real-time portfolio tracking with interactive visualizations can provide instant insights into holdings, gains/losses, and allocation through a web-based dashboard.
Logical Method:
Yahoo Finance API for live stock prices
Portfolio value and gain/loss calculations
Chart.js for interactive pie chart
HTML/CSS dashboard generation
One-click refresh capability
Methodological Soundness:
Strengths: Simple and effective visualization, real-time data appropriate, mobile responsive, Chart.js reliable library
Weaknesses: Yahoo Finance rate limits (100/hour), no historical tracking, calculations done per-request (inefficient), no benchmark comparison, single-user only
Use of LLMs: None - pure data visualization
Use of Agentic AI: None - request-response dashboard without autonomous behavior
Project 26: Product Recommendation Agent
Core Claim: Multi-criteria scoring (category match, features, budget, company size, industry, ratings) combined with AI reasoning can generate personalized SaaS product recommendations for small businesses.
Logical Method:
User requirements collected via webhook
PostgreSQL product catalog retrieval
Rule-based scoring (30 categories + features + budget + size + industry + ratings)
Top 3 selection based on total score
Google Gemini AI generates detailed reasoning
Methodological Soundness:
Strengths: Multi-factor scoring comprehensive, AI explanation adds value, database-driven scalable, structured output
Weaknesses: Scoring weights arbitrary (no validation), product catalog potentially biased, no user feedback loop, AI explanation not validated against user satisfaction, Sprint 2 RSS enrichment limited
Use of LLMs: Central - Google Gemini for recommendation reasoning and explanation generation
Use of Agentic AI: None - request-response recommendations without autonomous behavior
Project 27: Research Agent (Mycroft)
Core Claim: Multi-agent intelligence framework combining financial metrics, patent analysis, earnings execution, and competitive benchmarking can generate comprehensive investment recommendations with weighted scoring and letter grades.
Logical Method:
Financial Agent: Alpha Vantage metrics + ratio calculations
Patent Agent: Google patent search + AI classification
Earnings Agent: Quarterly beat/miss tracking + momentum analysis
Competitive Agent: Peer rankings + sector comparison
Weighted scoring (50% innovation + 30% financial + 20% earnings)
Methodological Soundness:
Strengths: Multi-agent approach comprehensive, peer benchmarking valuable, earnings execution quantifiable, structured scoring methodology
Weaknesses: Patent data from search (not official), arbitrary weight selection (50/30/20), peer group manually curated, simulated competitive scores in some cases, no backtesting of recommendations
Use of LLMs: Minimal - basic text processing for patent classification
Use of Agentic AI: Multi-agent coordination (4 specialized agents), but no autonomous goal pursuit beyond prescribed analysis
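The weighted scoring and letter-grade step can be sketched directly from the stated 50/30/20 weights; the grade cutoffs below are assumptions, since the write-up doesn't specify them:

```python
def composite_grade(innovation, financial, earnings, weights=(0.5, 0.3, 0.2)):
    """Combine the three agent scores (each 0-100) using the project's
    50% innovation / 30% financial / 20% earnings weights and map the result
    to a letter grade. Cutoffs are assumed, not documented."""
    w_i, w_f, w_e = weights
    score = innovation * w_i + financial * w_f + earnings * w_e
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return score, letter
    return score, "F"
```

Writing it out makes the "arbitrary weight selection" weakness tangible: shifting even 10 points between innovation and earnings can flip a grade boundary, and nothing in the framework justifies one split over another.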
Project 28: Risk Management Agent
Core Claim: Automated risk monitoring with AI-powered narrative generation can provide institutional-grade portfolio risk analysis including position sizing, stop-loss management, and multi-factor risk scoring.
Logical Method:
Portfolio data from Google Sheets
Live prices from Alpha Vantage
Risk calculations (position %, stop-loss, volatility, P&L)
Multi-factor scoring (0-100+ scale, 9 risk factors)
Groq LLM generates plain-English risk narrative
Methodological Soundness:
Strengths: Comprehensive risk metrics, volatility-adjusted position sizing, multi-level alerts, historical logging
Weaknesses: Risk scoring weights arbitrary, 8% stop-loss fixed (not adaptive), volatility calculation simple, no correlation analysis, no tail risk measures (VaR/CVaR)
Use of LLMs: Central - Groq (Llama 3.1) for risk narrative and actionable recommendations
Use of Agentic AI: None - scheduled analysis without autonomous decision-making
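Volatility-adjusted position sizing can be sketched as risk-budgeted sizing dampened by a volatility ratio. The 20% baseline and the linear scaling are assumptions; the agent's exact formula is not given:

```python
def position_size(equity, risk_per_trade, entry, stop, volatility, base_vol=0.20):
    """Shares to buy so a stop-out loses at most `risk_per_trade` of equity,
    scaled down when annualized volatility exceeds an assumed 20% baseline."""
    per_share_risk = entry - stop
    if per_share_risk <= 0:
        raise ValueError("stop must be below entry for a long position")
    shares = (equity * risk_per_trade) / per_share_risk
    if volatility > base_vol:
        shares *= base_vol / volatility   # dampen sizing in turbulent names
    return int(shares)
```

The fixed 8% stop the weaknesses criticize maps directly onto `stop`: a volatility-aware version would set the stop distance from `volatility` itself rather than a constant percentage.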
Project 29: Regulatory Scanning Agent
Core Claim: Multi-source RSS monitoring (SEC, FINRA, CFTC, Federal Register) with keyword-based urgency scoring can provide real-time regulatory intelligence while filtering noise from daily filings.
Logical Method:
5 parallel RSS feed monitoring
Data normalization across different feed formats
Keyword analysis across 6 regulatory domains
Urgency scoring (1-10) + impact classification
PostgreSQL storage + priority email alerts
Methodological Soundness:
Strengths: Uses authoritative regulatory sources, keyword approach efficient, deduplication prevents errors, priority filtering reduces noise
Weaknesses: Keyword matching misses nuance, urgency scoring subjective, 6 categories may miss important areas, no entity disambiguation, alert threshold arbitrary
Use of LLMs: None currently - keyword-based classification
Use of Agentic AI: None - scheduled monitoring without autonomous behavior
Project 30: Scenario Stress Testing Agent
Core Claim: Predefined and custom natural language market scenarios combined with compound shock calculations can quantify portfolio drawdowns and identify vulnerabilities before they materialize.
Logical Method:
Portfolio holdings input
Live prices from Alpha Vantage
Custom scenario → Groq LLM interprets → generates market shocks
Compound shock calculation (market + sector + category)
Drawdown computation + risk classification
Methodological Soundness:
Strengths: Natural language scenarios accessible, compound shocks realistic, multiple shock dimensions, real-time price data
Weaknesses: LLM shock interpretation not validated, additive shock model can exceed 100% total loss, no correlation modeling between assets, predefined scenarios arbitrary, no historical validation
Use of LLMs: Central - Groq (Llama 3.1) for natural language scenario interpretation and shock generation
Use of Agentic AI: None - request-response stress testing without autonomous behavior
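The "can exceed 100%" weakness has a standard fix: compound the three shock layers multiplicatively rather than additively, so the total loss is bounded at 100%. A sketch:

```python
def compound_shock(market, sector, category):
    """Combine three shock layers multiplicatively — an alternative to the
    project's additive model. Shocks are fractions, e.g. -0.30 for a 30% drop;
    three -30% shocks give 0.7**3 - 1 = -65.7%, never below -100%."""
    return (1 + market) * (1 + sector) * (1 + category) - 1

def stressed_value(holdings, shocks):
    """holdings: {ticker: value}; shocks: {ticker: (market, sector, category)}."""
    return sum(v * (1 + compound_shock(*shocks[t])) for t, v in holdings.items())
```

Under the additive model, three -30% shocks imply an impossible -90% on top of any further stress; the multiplicative form keeps every stressed value non-negative by construction.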
Project 31: Social Sentiment Agent
Core Claim: Multi-platform social media monitoring (StackOverflow, GitHub, Reddit) with LLM-powered sentiment analysis can generate investment signals from technical developer communities.
Logical Method:
3-source data collection (StackOverflow API, GitHub API, Reddit API)
Data harmonization across platforms
Groq LLM sentiment analysis with confidence scoring
Topic classification into 6 investment categories
Multi-dimensional quality scoring (20-point scale)
Methodological Soundness:
Strengths: Technical communities appropriate for AI intelligence, multi-platform reduces bias, quality filtering reduces noise, topic classification valuable
Weaknesses: No validation of sentiment-price correlation, quality scoring arbitrary, coverage limited to 3 sources, no spam detection, sentiment may lag price movements
Use of LLMs: Central - Groq (Llama 3.1-8B) for sentiment classification and topic categorization
Use of Agentic AI: None - scheduled analysis without autonomous behavior
Project 32: Tech Stack Comparative Agent
Core Claim: GitHub repository metadata aggregation combined with arXiv research signals can enable comparative analysis of company technology stacks through open-source contributions.
Logical Method:
GitHub REST API repository fetching with pagination
Metadata extraction (stars, forks, languages, issues)
arXiv API research paper counting by organization
Data normalization and aggregation
CSV export for analysis
Methodological Soundness:
Strengths: Uses public GitHub data, research integration valuable, repository metrics quantifiable, comparative approach insightful
Weaknesses: Open-source ≠ internal tech stack, stars don’t measure quality, arXiv counting simplistic, no code analysis, missing enterprise repositories
Use of LLMs: None - pure data aggregation
Use of Agentic AI: None - scheduled data collection without autonomous behavior
Project 33: Open Source Engineering Health Scoring
Core Claim: Multi-dimensional scoring combining popularity (stars/forks), activity (commit frequency), issue health (issue density), and license type can quantify open-source project maturity on a 0-100 scale.
Logical Method:
GitHub repository snapshot extraction
Popularity scoring (log-normalized within dataset)
Activity scoring (recent commits, last push time)
Issue health (issues per 1k stars as proxy)
License scoring (Apache/MIT bonus points)
Methodological Soundness:
Strengths: Multi-dimensional approach comprehensive, log normalization handles scale, issue density reasonable proxy, license consideration important
Weaknesses: Popularity normalization only within dataset (not absolute), issue density doesn’t measure response time, no PR velocity, no bus factor analysis, no CI status, arbitrary weights
Use of LLMs: None - pure quantitative analysis
Use of Agentic AI: None - batch analysis without autonomous behavior
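The popularity component can be sketched as within-dataset log normalization — which is also exactly why scores are not comparable across snapshots, as the weaknesses note:

```python
import math

def popularity_scores(star_counts):
    """Log-normalize star counts to 0-100 within the dataset, as the scorer
    does. log1p keeps zero-star repositories well-defined; min-max scaling
    ties every score to this particular snapshot."""
    logs = [math.log1p(s) for s in star_counts]
    lo, hi = min(logs), max(logs)
    if hi == lo:
        return [100.0] * len(star_counts)
    return [100.0 * (x - lo) / (hi - lo) for x in logs]
```

A repository scoring 80 in one snapshot can score 60 in the next without changing at all, purely because a more popular project entered the dataset — an absolute reference scale would fix this.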