Data Engineering Pipeline
Crypto Market Analyzer
An automated Python pipeline utilizing the CoinMarketCap API to track, analyze, and visualize real-time trends of top cryptocurrencies.
Project Overview
This project builds a pipeline for financial data analysis. It connects to the CoinMarketCap API, pulls live market data, flattens the response into a Pandas DataFrame, and appends each pull to a local file for historical tracking. The final step plots percentage changes over several timeframes (1h, 24h, 7d, 30d).
API Integration
Handles secure API requests with headers and parameters to fetch live crypto assets.
Data Persistence
Appends new data to a local CSV file to build a historical dataset over time.
Trend Analysis
Calculates mean percentage changes per currency across multiple time intervals.
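The persistence step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper name `append_snapshot`, the file name `crypto_history.csv`, and the added `timestamp` column are all hypothetical.

```python
import os

import pandas as pd


def append_snapshot(df, path="crypto_history.csv"):
    """Append one snapshot to a CSV file, writing the header only once.

    The function name, default path, and timestamp column are
    assumptions made for this sketch.
    """
    snapshot = df.copy()
    # Tag every row with the pull time so snapshots can be told apart later.
    snapshot["timestamp"] = pd.Timestamp.now(tz="UTC")
    # Only the first write needs a header row.
    write_header = not os.path.exists(path)
    snapshot.to_csv(path, mode="a", header=write_header, index=False)
```

Appending this way assumes the column order stays the same between pulls, since later writes carry no header to realign against.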
Market Trend Visualization
Below is a dynamic representation of the market data generated by the project. It visualizes the volatility (Percentage Change) of top assets over time.
Core Implementation
1. Fetching Data from API
Using requests, we set the request headers with our API key and define parameters that fetch the top 15 currencies, with prices converted to USD.
from requests import Session

url = 'https://pro-api.coinmarketcap.com/v1/cryptocurrency/listings/latest'

# Top 15 listings, priced in USD
parameters = {
    'start': '1',
    'limit': '15',
    'convert': 'USD'
}

headers = {
    'Accept': 'application/json',  # the standard HTTP header name is "Accept"
    'X-CMC_PRO_API_KEY': 'YOUR-API-KEY-HERE'
}

session = Session()
session.headers.update(headers)

response = session.get(url, params=parameters, timeout=10)
response.raise_for_status()  # fail early on HTTP errors (bad key, rate limit)
data = response.json()       # parse the JSON body into a dict
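Hard-coding the key in the script makes it easy to leak through version control. A minimal sketch of one alternative, assuming the key has been exported as an environment variable; the variable name `CMC_API_KEY` and the helper `build_headers` are hypothetical and not part of the original project:

```python
import os


def build_headers(env_var="CMC_API_KEY"):
    """Build the request headers, reading the API key from the environment.

    `env_var` is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return {
        "Accept": "application/json",
        "X-CMC_PRO_API_KEY": api_key,
    }
```

The returned dictionary can be passed to `session.headers.update(...)` in place of the literal `headers` dict.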
2. Normalizing & Analyzing Data
The nested JSON response is normalized into a Pandas DataFrame. We then group by currency name to calculate average percentage changes.
import pandas as pd

# Normalize JSON to DataFrame
df = pd.json_normalize(data['data'])

# Calculate mean changes for visualization
df_viz = df.groupby('name', sort=False)[[
    'quote.USD.percent_change_1h',
    'quote.USD.percent_change_24h',
    'quote.USD.percent_change_7d'
]].mean()
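The dotted column names in the `groupby` call come from how `pd.json_normalize` flattens nested records. A small sketch of that behavior; the sample records are invented and only mimic the shape of one API entry:

```python
import pandas as pd

# Hand-written sample mimicking the nested shape of the API records;
# the numbers are made up for illustration.
sample = [
    {"name": "Bitcoin",
     "quote": {"USD": {"price": 100.0, "percent_change_1h": 0.5}}},
    {"name": "Ethereum",
     "quote": {"USD": {"price": 10.0, "percent_change_1h": -0.2}}},
]

flat = pd.json_normalize(sample)

# Nested keys are joined with "." to form flat column names,
# e.g. quote -> USD -> price becomes "quote.USD.price".
print(sorted(flat.columns))
```

Each record becomes one row, and each leaf of the nested `quote` object becomes its own column.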
3. Visualization
Finally, the data is reshaped to allow Seaborn to plot the time intervals on the X-axis and percentage change on the Y-axis.
import seaborn as sns
import matplotlib.pyplot as plt
# Reshape data for plotting
df_melted = df_viz.stack().to_frame().reset_index()
df_melted = df_melted.rename(columns={0: 'values', 'level_1': 'interval'})
sns.pointplot(x='interval', y='values', hue='name', data=df_melted)
plt.show()
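The same long format can also be produced with `melt` instead of `stack`. A sketch on a small hand-made stand-in for `df_viz`; the numbers are invented, and the label-shortening step at the end is an optional addition rather than something the original code does:

```python
import pandas as pd

# Small stand-in for df_viz: one row per currency, one column per interval.
df_viz = pd.DataFrame(
    {
        "quote.USD.percent_change_1h": [0.5, -0.2],
        "quote.USD.percent_change_24h": [1.5, 0.8],
    },
    index=pd.Index(["Bitcoin", "Ethereum"], name="name"),
)

# melt turns each (currency, interval) pair into its own row.
df_long = df_viz.reset_index().melt(
    id_vars="name", var_name="interval", value_name="values"
)

# Optional: shorten the labels so the X-axis reads "1h", "24h", ...
df_long["interval"] = df_long["interval"].str.replace(
    "quote.USD.percent_change_", "", regex=False
)
```

The result has the same three columns (`name`, `interval`, `values`) as the `stack` version, although the rows come out ordered by interval rather than by currency.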
Key Outcomes & Conclusion
This project demonstrates a complete data engineering workflow. Key takeaways include:
- API Proficiency: Successfully authenticated and retrieved complex nested data from a third-party API.
- Data Cleaning: Utilized Pandas json_normalize and melt to transform raw JSON into a format suitable for visualization.
- Advanced Visualization: Leveraged Seaborn's point plots to effectively communicate multi-variable time series data.