In today’s threat intelligence world, timely and accurate news reporting is crucial. With the vast influx of news articles every day, manually reading through headlines for relevant threats becomes overwhelming and time-consuming. This post demonstrates how to build an AI agent that automates ranking, analyzing, and reporting on cybersecurity news. By leveraging large language models and libraries like LangChain, you can quickly identify the most critical news items and derive actionable insights.
The code examples below are based on my open-source project ‘OSINT Toolkit’. In case you need the full code, just visit its GitHub repository.
Overview
The AI agent consists of several core components:
- A Database with News Articles: Before creating an agent, we need relevant data for it to work with.
- News Article Ranking: Uses a custom prompt and an LLM to rank recent articles based on relevance and uniqueness.
- Article Analysis: Generates detailed analyses for selected articles, including risk assessments and actionable recommendations.
Each part of the agent is integrated into a modular Python codebase, making it easy to maintain and extend.
Here is a visualization of the steps the agent will perform:
Ranking Recent Cybersecurity Articles
The first step is to rank recent news articles. The agent uses a prompt designed for a cyber threat intelligence analyst to review a list of headlines. The goal is to:
- Select the top 10 most relevant articles.
- Avoid duplicating similar topics.
- Rank the articles based on a relevance score (1 being the highest).
Only headlines are used because the context window of most LLMs is not large enough to hold the full text of hundreds or thousands of news articles.
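As a minimal sketch of the headline-only approach, the helper below reduces each article to a single short line before it goes into the prompt. The function name `format_headlines`, the dict keys `id` and `title`, and the sample headlines are assumptions made for illustration; they are not taken from the OSINT Toolkit code.

```python
def format_headlines(articles):
    """Render one 'ID: <id> | Title: <title>' line per article."""
    lines = []
    for article in articles:
        # Collapse whitespace so a multi-line title still fits on one line.
        title = " ".join(str(article["title"]).split())
        lines.append(f"ID: {article['id']} | Title: {title}")
    return "\n".join(lines)


# Made-up sample data; the full text is deliberately left out of the prompt.
articles = [
    {"id": 1, "title": "Critical RCE in popular VPN appliance", "full_text": "..."},
    {"id": 2, "title": "Ransomware group\n targets hospitals", "full_text": "..."},
]
articles_list = format_headlines(articles)
print(articles_list)
```

Keeping the article ID on every line lets the model refer back to a specific database row in its answer, so the full text only needs to be loaded for the few articles that make the top 10.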
The following code snippet demonstrates how to construct the ranking prompt and process the articles:
RANKING_PROMPT = """
You are a very experienced cyber threat intelligence analyst specialized in analyzing large amounts of news articles for relevance.
Below is a list of news article headlines. For each headline, consider how relevant it is for cybersecurity and threat intelligence.
News articles have a higher relevance if there are more news article headlines about the same topic and/or if they describe active threats or vulnerabilities.
Your task:
1. Identify the 10 most relevant articles from the list (if there are fewer than 10, select them all).
2. Make sure to not list multiple articles about the same topic.
3. Sort them by relevance (most relevant first).
4. Return the result in a JSON array of objects with the following fields:
- id: the article ID
- title: the article headline
- relevance_score: 1 (most relevant) to 10 (least relevant within the top 10)
- reason: a brief reason for ranking
Only return the JSON, no additional commentary.
List of articles:
<news article headlines>
{articles_list}
</news article headlines>
"""
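The prompt asks for JSON only, but a model's reply can still deviate from that. The sketch below shows one possible way to check the reply before using it. It assumes the raw reply arrives as a string; `parse_ranking` and `known_ids` are hypothetical names, and the post's own code relies on LangChain's JSON parsing instead.

```python
import json

FENCE = "`" * 3  # Markdown code fence marker some models wrap around JSON


def parse_ranking(raw_reply, known_ids, limit=10):
    """Parse the model's JSON reply and keep only known, unique article IDs."""
    text = raw_reply.strip()
    if text.startswith(FENCE):
        # Drop the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    ranked = json.loads(text)

    seen, result = set(), []
    # relevance_score 1 is the most relevant, so sort ascending.
    for item in sorted(ranked, key=lambda entry: entry["relevance_score"]):
        if item["id"] in known_ids and item["id"] not in seen:
            seen.add(item["id"])
            result.append(item)
    return result[:limit]


# Made-up reply: out of order, fenced, and containing an ID we never sent (99).
reply = FENCE + "json\n" + json.dumps([
    {"id": 7, "title": "Headline B", "relevance_score": 2, "reason": "..."},
    {"id": 3, "title": "Headline A", "relevance_score": 1, "reason": "..."},
    {"id": 99, "title": "Unknown", "relevance_score": 3, "reason": "..."},
]) + "\n" + FENCE
ranking = parse_ranking(reply, known_ids={3, 7})
print([item["id"] for item in ranking])  # [3, 7]
```

Filtering against the IDs that were actually sent guards against the model inventing an article that is not in the database.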
In the code, the function rank_recent_articles retrieves recent news articles (filtered to the last 7 days), formats them into a list, and sends them through the ranking chain to get a list sorted by relevance.
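The three steps of that function can be sketched as follows. This is a simplified stand-in and not the project's implementation: the articles are a plain list of dicts instead of a SQLAlchemy query result, `run_chain` is a hypothetical callable that takes the place of the LangChain ranking chain, and the `published` key is an assumed field name.

```python
from datetime import datetime, timedelta, timezone


def rank_recent_articles(articles, run_chain, days=7, now=None):
    """Filter to recent articles, format headlines, and rank them via run_chain."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = [a for a in articles if a["published"] >= cutoff]
    if not recent:
        return []  # nothing to rank, so skip the LLM call entirely

    articles_list = "\n".join(
        f"ID: {a['id']} | Title: {a['title']}" for a in recent
    )
    ranked = run_chain({"articles_list": articles_list})
    # Score 1 is the most relevant, so ascending order puts it first.
    return sorted(ranked, key=lambda item: item["relevance_score"])


def fake_chain(inputs):
    """Stand-in for the LLM call: ranks headlines in the order received."""
    ids = [line.split(" | ")[0][4:] for line in inputs["articles_list"].splitlines()]
    return [
        {"id": int(article_id), "relevance_score": rank}
        for rank, article_id in enumerate(ids, start=1)
    ]


now = datetime(2025, 1, 10, tzinfo=timezone.utc)
sample = [
    {"id": 1, "title": "Fresh headline", "published": now - timedelta(days=1)},
    {"id": 2, "title": "Stale headline", "published": now - timedelta(days=30)},
]
ranked = rank_recent_articles(sample, fake_chain, now=now)
print(ranked)
```

Passing the chain in as a parameter keeps the function testable without network access or an API key.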
Analyzing Individual Articles
Once the top articles are identified, the next step is to perform an in-depth analysis of each article. This involves generating a detailed summary, risk rating, and actionable recommendations. The analysis uses another prompt tailored for cybersecurity intelligence, which specifies:
- Risk Assessment: Categorizing each article as High, Medium, Low, or Informational.
- Summary & Analysis: Providing a concise yet comprehensive overview of the article’s content.
- Action Items: Listing recommended steps or precautions based on the article’s findings.
Here’s the analysis prompt integrated into the code:
ANALYSIS_PROMPT = """
You are a very experienced cyber threat intelligence analyst specialized in creating best-in-class news article analyses.
Below you will find data about a news article.
You will receive the article's title, feed name, and summary text.
Produce a comprehensive analysis in JSON format with the following fields:
{{
"Risk": "[High / Medium / Low / Informational]",
"Summary": "[Detailed summary of the article]",
"Analysis comment": "[Reasoning why the article is relevant]",
"Action items": ["list of actions"],
"Source": "Complete url to the article"
}}
The criteria to determine the risk rating are:
<Risk Criteria>
High Risk - Immediate, active threats with high confidence—such as zero-day exploits or unpatched vulnerabilities—that require urgent action to protect critical systems. Severe threats with potential for significant impact where mitigations exist but may be underutilized, necessitating prompt response.
Medium Risk - Emerging vulnerabilities or attack trends that could impact operations under specific conditions, meriting careful monitoring.
Low Risk - Minor issues or outdated threats unlikely to affect core operations, generally limited in scope or impact.
Informational - Background or analytical content that provides context and insights without posing an immediate threat.
</Risk Criteria>
Keep it concise and to the point but not too short. Return only valid JSON.
Article data:
<news article data>
{article_data}
</news article data>
"""
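One detail in this prompt is easy to miss: the JSON skeleton is wrapped in doubled braces. Assuming the template is filled in with Python-style `{name}` placeholders (which is how `str.format` works, and, as far as I know, LangChain's default template format as well), `{{` and `}}` are rendered as literal braces while `{article_data}` is substituted. The shortened template below is a stand-in for the full prompt.

```python
# Shortened stand-in for the analysis prompt, using the same escaping.
TEMPLATE = """Produce JSON with the following fields:
{{
"Risk": "[High / Medium / Low / Informational]"
}}
Article data:
{article_data}
"""

rendered = TEMPLATE.format(article_data="Title: Example headline")
print(rendered)
```

With single braces around the JSON skeleton, formatting would fail because the skeleton would be read as a placeholder.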
The analyze_article function uses this prompt to analyze each selected article, ensuring that all critical details (such as title, summary, full text, and publication date) are included in the request to the language model.
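A rough sketch of such a function is shown below. The field names in `build_article_data`, the `run_chain` callable (a placeholder for the LangChain analysis chain), and the check against the four risk levels are assumptions added for illustration and may differ from the project's code.

```python
ALLOWED_RISKS = {"High", "Medium", "Low", "Informational"}


def build_article_data(article):
    """Flatten the available article fields into the text placed in the prompt."""
    fields = ("title", "feed_name", "published", "link", "summary", "full_text")
    return "\n".join(
        f"{name}: {article[name]}" for name in fields if article.get(name)
    )


def analyze_article(article, run_chain):
    """Run the analysis chain and reject replies with an unknown risk rating."""
    analysis = run_chain({"article_data": build_article_data(article)})
    if analysis.get("Risk") not in ALLOWED_RISKS:
        raise ValueError(f"Unexpected risk rating: {analysis.get('Risk')!r}")
    return analysis


def fake_chain(inputs):
    """Stand-in for the LLM call: returns a fixed, well-formed analysis."""
    return {
        "Risk": "High",
        "Summary": "Example summary",
        "Analysis comment": "Example comment",
        "Action items": ["Patch affected systems"],
        "Source": "https://example.com/article",
    }


article = {"title": "Example headline", "feed_name": "Example Feed", "summary": "..."}
result = analyze_article(article, fake_chain)
print(result["Risk"])  # High
```

Validating the `Risk` field early means a malformed reply fails loudly instead of ending up in a report.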
Putting It All Together
The overall architecture of the AI agent involves:
- Database Integration: Fetching recent news articles using SQLAlchemy.
- API Key Management: Securely retrieving API keys for OpenAI.
- Chained Prompts and LLM Calls: Using LangChain to seamlessly integrate prompts, LLM responses, and JSON parsing.
- Logging and Error Handling: Ensuring that any issues during ranking or analysis are logged for troubleshooting.
The modular design allows you to further customize prompts, add new features, or integrate additional data sources as needed.
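To illustrate the API key and error handling points, here is a small sketch. It assumes the key is read from the `OPENAI_API_KEY` environment variable, which may not be how the project stores it; `get_openai_api_key` and `analyze_all` are hypothetical helper names.

```python
import logging
import os

logger = logging.getLogger("news_agent")


def get_openai_api_key(env=None):
    """Read the API key from the environment instead of hard-coding it."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


def analyze_all(articles, analyze):
    """Analyze each article; log failures and continue with the rest."""
    results = []
    for article in articles:
        try:
            results.append(analyze(article))
        except Exception:
            # logger.exception records the traceback for troubleshooting.
            logger.exception("Analysis failed for article %s", article.get("id"))
    return results


def flaky(article):
    """Stand-in analysis function that fails for one article."""
    if article["id"] == 2:
        raise ValueError("model returned invalid JSON")
    return {"id": article["id"], "Risk": "Low"}


results = analyze_all([{"id": 1}, {"id": 2}, {"id": 3}], flaky)
print([r["id"] for r in results])  # [1, 3]
```

Catching errors per article means a single bad model reply does not abort the whole report.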
Conclusion
Automating cybersecurity news reporting with an AI agent speeds up the analysis of security news. By ranking articles based on relevance, generating detailed analyses, and streaming results in real time, you can consume critical information quickly. The provided code examples illustrate how to set up and integrate each component, offering a solid foundation to expand upon. Happy coding and stay secure!