DeepSeek-Based E-commerce Product Price and Inventory Forecasting: Version 2

活力板蓝根

Last modified: 2025-04-07 14:04:09

License: CC 4.0 BY-SA

First published: 2025-03-11 16:15:50

Article link: https://blog.youkuaiyun.com/weixin_42934197/article/details/146175061

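After putting the previous e-commerce product forecast to use, I found quite a few shortcomings, so this time I upgraded the whole thing. It has a new modular design, with many new ideas in the data-filling part in particular. The visualization also got a major upgrade and is far more interactive.
Previous post: LLM-Based Intelligent Forecasting of E-commerce Product Prices and Inventory

In this second version, the whole pipeline (data sample processing, DeepSeek processing, data filling, visualization) is implemented and runs end to end. It can currently forecast product price trends and inventory changes for the next 6 months. I will keep updating it as new ideas come up. Enjoy!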

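Project Background

In e-commerce, accurately forecasting product price and inventory changes is critical to a merchant's inventory management and pricing strategy. This project builds an AI-based forecasting system that analyzes historical data and predicts price and inventory trends for the next 6 months, helping merchants:

Optimize inventory management and avoid overstock and stockouts
Set sensible pricing strategies and stay competitive
Track market trends and plan operations ahead of time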

System Architecture

The system uses a modular design and consists of the following functional modules:

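Data processing module
- Processes historical product price and inventory data
- Supports importing multiple data formats
- Cleans and standardizes data automatically
AI prediction module (upgraded in this version)
- Integrates the DeepSeek AI model
- Predicts trends for the next 6 months
- Provides price and inventory range predictions
- Includes an analysis of the reasons behind each prediction
Data filling module (upgraded in this version)
- Generates daily data from monthly predictions
- Supports multiple filling algorithms
- Keeps the data continuous and plausible
Visualization module
- Generates interactive HTML reports
- Shows historical data and predicted trends
- Supports drill-down analysis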

Implementation

1. Data Processing Module

The data processing module parses and standardizes the raw data. The core implementation:

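import json
import logging
from datetime import datetime, timedelta

import pandas as pd

logger = logging.getLogger(__name__)


class DataProcessor:
    def __init__(self, input_file, end_date=None):
        """
        Initialize the data processor.

        Args:
            input_file (str): Path to the input CSV file
            end_date (str, optional): End date of the data, formatted 'YYYY-MM-DD'
        """
        self.input_file = input_file
        self.end_date = datetime.strptime(end_date, '%Y-%m-%d') if end_date else datetime.now()
        self.start_date = self.end_date - timedelta(days=365)  # Work with the most recent year of data

        # Weekly timestamps: 52 points, one week apart, ending at end_date
        self.weekly_dates = [
            self.end_date - timedelta(days=i*7)
            for i in range(52)
        ]
        self.weekly_dates.reverse()  # Chronological order

        self.data = None
        self.processed_data = {}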

    def _parse_json_array(self, json_str):
        """
        Parse a JSON array string.

        Args:
            json_str (str): Array encoded as a JSON string

        Returns:
            list: The parsed array, or an empty list on failure
        """
        try:
            # Normalize single quotes and doubled double quotes before parsing
            cleaned_str = json_str.replace("'", '"').replace('""', '"')
            return json.loads(cleaned_str)
        except Exception as e:
            logger.error(f"JSON parse error: {str(e)}")
            return []

    def _normalize_price_data(self, price_data):
        """
        Normalize price data.

        Args:
            price_data (list): Raw list of [min, max] price ranges

        Returns:
            list: Normalized price data
        """
        normalized = []
        for price_range in price_data:
            try:
                min_price = float(price_range[0])
                max_price = float(price_range[1])
                avg_price = (min_price + max_price) / 2

                # Reject negative prices and inverted ranges
                if min_price < 0 or max_price < min_price:
                    raise ValueError("invalid price data")

                normalized.append({
                    'min': min_price,
                    'max': max_price,
                    'avg': avg_price
                })
            except Exception as e:
                # Keep the weekly slot so dates stay aligned, but zero it out
                logger.warning(f"Failed to normalize price data: {str(e)}")
                normalized.append({'min': 0, 'max': 0, 'avg': 0})

        return normalized

    def _normalize_stock_data(self, stock_data):
        """
        Normalize inventory data.

        Args:
            stock_data (list): Raw list of [min, max] inventory ranges

        Returns:
            list: Normalized inventory data
        """
        normalized = []
        for stock_range in stock_data:
            try:
                min_stock = int(stock_range[0])
                max_stock = int(stock_range[1])
                avg_stock = (min_stock + max_stock) // 2

                # Reject negative inventory and inverted ranges
                if min_stock < 0 or max_stock < min_stock:
                    raise ValueError("invalid inventory data")

                normalized.append({
                    'min': min_stock,
                    'max': max_stock,
                    'avg': avg_stock
                })
            except Exception as e:
                # Keep the weekly slot so dates stay aligned, but zero it out
                logger.warning(f"Failed to normalize inventory data: {str(e)}")
                normalized.append({'min': 0, 'max': 0, 'avg': 0})

        return normalized

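    def load_data(self):
        """Load the input CSV into self.data.

        Not shown in the original post; this minimal version assumes a
        plain CSV that pandas can read directly.
        """
        try:
            self.data = pd.read_csv(self.input_file)
            return True
        except Exception as e:
            logger.error(f"Failed to load {self.input_file}: {str(e)}")
            return False

    def process_data(self):
        """Process the historical product data."""
        if self.data is None:
            if not self.load_data():
                return False

        for _, row in self.data.iterrows():
            # str() keeps the ID JSON-serializable for the predictor's cache key
            product_id = str(row['product_id'])
            category = row['category']

            try:
                # Parse price and inventory data
                week_price_data = self._parse_json_array(row['week_price'])
                week_stock_data = self._parse_json_array(row['week_stock'])

                # Normalize the data
                price_data = self._normalize_price_data(week_price_data)
                stock_data = self._normalize_stock_data(week_stock_data)

                # Store the processed data
                self.processed_data[product_id] = {
                    'product_id': product_id,  # read by the prompt builder and the chart title
                    'category': category,
                    'historical_data': {
                        'dates': [d.strftime('%Y-%m-%d') for d in self.weekly_dates],
                        'price': price_data,
                        'stock': stock_data
                    }
                }

            except Exception as e:
                logger.error(f"Error while processing product {product_id}: {str(e)}")
                continue

        return len(self.processed_data) > 0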
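
The post does not show what a row of the input CSV looks like. As a rough sketch, assuming a `week_price` cell holds a JSON array of `[min, max]` pairs (which is what `_normalize_price_data` indexes into), the parse-then-normalize path for one cell could look like this. The sample string and the standalone function name `normalize_ranges` are made up for illustration:

```python
import json


def normalize_ranges(json_str):
    """Parse a JSON array of [min, max] pairs into min/max/avg dicts.

    Hypothetical standalone mirror of _parse_json_array followed by
    _normalize_price_data; invalid entries become zeros, as in the class.
    """
    try:
        pairs = json.loads(json_str.replace("'", '"'))
    except (AttributeError, json.JSONDecodeError):
        return []

    normalized = []
    for pair in pairs:
        try:
            low, high = float(pair[0]), float(pair[1])
            if low < 0 or high < low:
                raise ValueError("invalid range")
            normalized.append({'min': low, 'max': high, 'avg': (low + high) / 2})
        except (TypeError, ValueError, IndexError):
            normalized.append({'min': 0, 'max': 0, 'avg': 0})
    return normalized


# Assumed cell content: three weeks of [min, max] prices, the last one inverted
sample_cell = "[[6199, 6299], [6099, 6299], [6300, 6100]]"
weeks = normalize_ranges(sample_cell)
print(weeks[0])  # {'min': 6199.0, 'max': 6299.0, 'avg': 6249.0}
print(weeks[2])  # {'min': 0, 'max': 0, 'avg': 0}
```

Zeroing out a bad week keeps the list the same length as the date list, but those zeros also end up in the prompt, so it may be worth filtering them before prediction.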

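2. AI Prediction Module

The AI prediction module calls the DeepSeek API to produce forecasts. The prompt and the analysis scope here are only an example. Adapt them freely, since a prompt tuned to your own project will work better. The core implementation: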

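import hashlib
import json
import logging
import os
import time

import aiohttp
import numpy as np
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

logger = logging.getLogger(__name__)


class PricePredictor:
    def __init__(self, api_key=None, timeout=30):
        """
        Initialize the predictor.

        Args:
            api_key (str): DeepSeek API key
            timeout (int): Request timeout in seconds
        """
        self.api_key = api_key or os.environ.get('DEEPSEEK_API_KEY')
        # DeepSeek chat-completions endpoint; predict() posts to this URL
        self.api_url = "https://api.deepseek.com/chat/completions"
        self.timeout = timeout
        self.cache_dir = "data/cache"
        os.makedirs(self.cache_dir, exist_ok=True)

        # Configure a requests session with retries
        # (note: predict() below uses aiohttp and does not go through this session)
        self.session = requests.Session()
        retries = Retry(
            total=3,
            backoff_factor=0.5,
            status_forcelist=[429, 500, 502, 503, 504]
        )
        self.session.mount('https://', HTTPAdapter(max_retries=retries))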

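    def _generate_prompt(self, product_data):
        """
        Build the prediction prompt.

        Args:
            product_data (dict): Historical product data

        Returns:
            str: The formatted prompt
        """
        product_id = product_data['product_id']
        category = product_data['category']
        historical_data = product_data['historical_data']

        # Describe the history, one line per weekly data point
        history_desc = []
        for date, price, stock in zip(
            historical_data['dates'],
            historical_data['price'],
            historical_data['stock']
        ):
            history_desc.append(
                f"- {date}: price ¥{price['min']:.2f}-{price['max']:.2f}, "
                f"inventory {stock['min']}-{stock['max']}"
            )

        # Assemble the full prompt. The JSON shape spelled out at the end is the
        # one _calculate_confidence and DataFiller read (predictions, month, price, stock, reason).
        prompt = f"""As an e-commerce analyst, use the historical data below to predict this product's price and inventory trends for the next 6 months.

Product:
- ID: {product_id}
- Category: {category}

Historical data (past 12 months):
{chr(10).join(history_desc)}

Please analyze the following factors:
1. Historical price fluctuation patterns
2. Seasonal effects
3. Inventory turnover
4. Market supply and demand

Then predict, for each of the next 6 months:
1. Monthly price range (lowest to highest)
2. Monthly inventory range (lowest to highest)
3. Reason for the change (50 characters or fewer)

Return the prediction as JSON in this shape:
{{"predictions": [{{"month": "YYYY-MM", "price": {{"min": 0.0, "max": 0.0}}, "stock": {{"min": 0, "max": 0}}, "reason": "..."}}]}}
"""
        return prompt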

    def _calculate_confidence(self, prediction):
        """
        Compute a heuristic confidence score for a prediction.

        Args:
            prediction (dict): Prediction result

        Returns:
            float: Confidence score between 0 and 1
        """
        scores = []
        predictions = prediction.get('predictions', [])

        for i in range(1, len(predictions)):
            prev = predictions[i-1]
            curr = predictions[i]

            # Price plausibility: smaller month-over-month change scores higher
            # (max(1e-9, ...) guards against a zero previous price)
            price_change = abs(
                (curr['price']['max'] + curr['price']['min'])/2 -
                (prev['price']['max'] + prev['price']['min'])/2
            ) / max(1e-9, (prev['price']['max'] + prev['price']['min'])/2)
            price_score = max(0, 1 - price_change)

            # Inventory plausibility: same idea for the inventory midpoint
            stock_change = abs(
                (curr['stock']['max'] + curr['stock']['min'])/2 -
                (prev['stock']['max'] + prev['stock']['min'])/2
            ) / max(1, (prev['stock']['max'] + prev['stock']['min'])/2)
            stock_score = max(0, 1 - stock_change)

            # Completeness of the explanation (full marks at 50 characters)
            reason_score = min(1.0, len(curr.get('reason', '')) / 50)

            scores.extend([price_score, stock_score, reason_score])

        return float(np.mean(scores)) if scores else 0.0

    async def predict(self, product_data):
        """
        Predict future trends for a product.

        Args:
            product_data (dict): Historical product data

        Returns:
            dict: Prediction result
        """
        # Build the cache key from the input data
        cache_key = hashlib.md5(
            json.dumps(product_data, sort_keys=True).encode()
        ).hexdigest()

        # Reuse a cached result if it is less than 24 hours old
        cache_file = os.path.join(self.cache_dir, f"{cache_key}.json")
        if os.path.exists(cache_file):
            if time.time() - os.path.getmtime(cache_file) < 24*3600:
                with open(cache_file, 'r', encoding='utf-8') as f:
                    return json.load(f)

        # Prepare the API request
        prompt = self._generate_prompt(product_data)
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {self.api_key}"
        }
        payload = {
            "model": "deepseek-chat",
            "messages": [
                {
                    "role": "system",
                    "content": "You are a professional e-commerce data analyst who specializes in forecasting product price and inventory trends."
                },
                {
                    "role": "user",
                    "content": prompt
                }
            ],
            "temperature": 0.3,
            "max_tokens": 1000
        }

        try:
            # Send the API request
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    self.api_url,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.timeout)
                ) as response:
                    if response.status == 200:
                        result = await response.json()
                        prediction = json.loads(
                            result['choices'][0]['message']['content']
                        )

                        # Compute the confidence score
                        prediction['confidence'] = self._calculate_confidence(
                            prediction
                        )

                        # Cache the result
                        with open(cache_file, 'w', encoding='utf-8') as f:
                            json.dump(prediction, f, indent=2)

                        return prediction
                    else:
                        raise Exception(
                            f"API request failed: {response.status} - {await response.text()}"
                        )

        except Exception as e:
            logger.error(f"Prediction failed: {str(e)}")
            # Fall back to a mock prediction (_generate_mock_prediction is not shown in this post)
            return self._generate_mock_prediction(product_data)
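
`predict` feeds the model's reply straight into `json.loads`, which fails if the model wraps its JSON in a Markdown code fence or adds a sentence around it. Below is a defensive sketch that is not part of the original code: the helper name `extract_prediction` and the list of required keys (taken from what `_calculate_confidence` reads) are assumptions.

```python
import json
import re

REQUIRED_KEYS = ("price", "stock", "reason")  # fields _calculate_confidence reads


def extract_prediction(reply_text):
    """Pull the first JSON object out of a model reply and sanity-check it.

    Returns the parsed dict, or None if no usable prediction is found.
    """
    # Take everything from the first "{" to the last "}" (skips fences and chatter)
    match = re.search(r"\{.*\}", reply_text, re.DOTALL)
    if not match:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

    months = data.get("predictions") if isinstance(data, dict) else None
    if not isinstance(months, list) or not months:
        return None
    for month in months:
        if not isinstance(month, dict) or any(k not in month for k in REQUIRED_KEYS):
            return None
    return data


fence = "`" * 3  # a Markdown code fence, as a model might emit around its JSON
reply = (
    "Here is the forecast:\n" + fence + "json\n"
    '{"predictions": [{"month": "2025-04", "price": {"min": 5899, "max": 6299}, '
    '"stock": {"min": 2000, "max": 3000}, "reason": "stable demand"}]}\n'
    + fence
)
result = extract_prediction(reply)
print(result["predictions"][0]["month"])  # 2025-04
```

Returning `None` instead of raising lets the caller decide whether to retry the request or fall back to the mock prediction.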

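3. Data Filling Module

The data filling module matters a lot in this project. It expands the monthly predictions into daily data points, so DeepSeek does not have to return a value for every day. That cuts the number of output tokens considerably and saves a good deal of cost, and the resulting data also fits our needs better: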

import random
from datetime import datetime, timedelta

import numpy as np


class DataFiller:
    def __init__(self, prediction_data):
        """
        Initialize the data filler.

        Args:
            prediction_data (dict): Prediction data
        """
        self.prediction_data = prediction_data
        self.filled_data = {
            'dates': [],
            'price': [],
            'stock': [],
            'reasons': {}
        }

    def _generate_daily_values(self, start_value, end_value, num_days, method='smooth'):
        """
        Generate daily values that transition from start_value to end_value.

        Args:
            start_value (float): Starting value
            end_value (float): Ending value
            num_days (int): Number of days
            method (str): Filling method, one of 'smooth', 'linear', 'random'

        Returns:
            list: Daily values
        """
        if num_days <= 1:
            # A single point cannot be interpolated (and would divide by zero below)
            return [start_value] * max(num_days, 0)

        if method == 'linear':
            # Linear interpolation
            return np.linspace(start_value, end_value, num_days).tolist()

        elif method == 'random':
            # Linear interpolation with random noise of up to 10% of the range
            values = []
            range_size = abs(end_value - start_value)
            step = (end_value - start_value) / (num_days - 1)

            for i in range(num_days):
                base_value = start_value + step * i
                noise = random.uniform(-0.1, 0.1) * range_size
                values.append(max(0, base_value + noise))

            values[-1] = end_value  # Make sure the last value lands exactly on the target
            return values

        else:  # smooth
            # Ease between the two values with a cosine curve
            values = []
            for i in range(num_days):
                t = i / (num_days - 1)
                smooth_t = 0.5 * (1 - np.cos(np.pi * t))
                value = start_value + (end_value - start_value) * smooth_t
                values.append(max(0, value))
            return values

    def fill_data(self, method='smooth'):
        """
        Fill in daily data.

        Args:
            method (str): Filling method

        Returns:
            dict: The filled data
        """
        predictions = self.prediction_data.get('predictions', [])
        if not predictions:
            return self.filled_data

        # Walk through consecutive pairs of monthly predictions. The last
        # prediction only serves as the end point of the previous interval,
        # so N monthly predictions yield daily data for the first N-1 months.
        for i in range(len(predictions)-1):
            curr_pred = predictions[i]
            next_pred = predictions[i+1]

            # Number of days from the start of this month to the start of the next
            curr_date = datetime.strptime(curr_pred['month'], '%Y-%m')
            next_date = datetime.strptime(next_pred['month'], '%Y-%m')
            num_days = (next_date - curr_date).days

            # Generate daily price values
            price_min_values = self._generate_daily_values(
                curr_pred['price']['min'],
                next_pred['price']['min'],
                num_days,
                method
            )
            price_max_values = self._generate_daily_values(
                curr_pred['price']['max'],
                next_pred['price']['max'],
                num_days,
                method
            )

            # Generate daily inventory values
            stock_min_values = self._generate_daily_values(
                curr_pred['stock']['min'],
                next_pred['stock']['min'],
                num_days,
                method
            )
            stock_max_values = self._generate_daily_values(
                curr_pred['stock']['max'],
                next_pred['stock']['max'],
                num_days,
                method
            )

            # Append one entry per day
            for day in range(num_days):
                current_date = curr_date + timedelta(days=day)
                self.filled_data['dates'].append(
                    current_date.strftime('%Y-%m-%d')
                )

                self.filled_data['price'].append({
                    'min': round(price_min_values[day], 2),
                    'max': round(price_max_values[day], 2),
                    'avg': round(
                        (price_min_values[day] + price_max_values[day]) / 2,
                        2
                    )
                })

                self.filled_data['stock'].append({
                    'min': int(stock_min_values[day]),
                    'max': int(stock_max_values[day]),
                    'avg': int(
                        (stock_min_values[day] + stock_max_values[day]) / 2
                    )
                })

            # Keep this month's prediction reason
            self.filled_data['reasons'][curr_pred['month']] = curr_pred['reason']

        return self.filled_data
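
The default 'smooth' method eases between two monthly values with a half cosine wave: with t = i / (n - 1), day i gets start + (end - start) * 0.5 * (1 - cos(pi * t)), so the curve is flat at both ends and steepest in the middle. Here is a small standalone sketch that uses the standard library's `math` in place of NumPy. The function name `cosine_fill` and the sample numbers are illustrative only:

```python
import math


def cosine_fill(start_value, end_value, num_days):
    """Ease from start_value to end_value over num_days points (half cosine)."""
    if num_days <= 1:
        return [start_value] * max(num_days, 0)
    values = []
    for i in range(num_days):
        t = i / (num_days - 1)                        # 0.0 on the first day, 1.0 on the last
        smooth_t = 0.5 * (1 - math.cos(math.pi * t))  # eased progress, also 0.0 to 1.0
        values.append(start_value + (end_value - start_value) * smooth_t)
    return values


# Illustrative: a monthly minimum price moving from 6299 to 5899 over 5 days
daily = cosine_fill(6299, 5899, 5)
print([round(v, 2) for v in daily])
```

The first and last points match the two monthly values, and the midpoint sits halfway between them. Compared with linear interpolation, the daily change is smallest near the month boundaries, which avoids visible kinks where two intervals meet.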

4. Visualization Module

The visualization module uses Plotly to generate interactive charts:

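import os
from datetime import datetime

import plotly.graph_objects as go
from jinja2 import Environment, FileSystemLoader
from plotly.subplots import make_subplots


class Visualizer:
    def __init__(self, output_dir=None):
        """
        Initialize the visualizer.

        Args:
            output_dir (str): Output directory
        """
        self.output_dir = output_dir or f"output/{datetime.now().strftime('%Y-%m-%d')}"
        os.makedirs(self.output_dir, exist_ok=True)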

    def create_price_stock_chart(self, product_data, prediction_data):
        """
        Create the price and inventory trend chart.

        Args:
            product_data (dict): Historical data
            prediction_data (dict): Prediction data

        Returns:
            plotly.graph_objects.Figure: The chart object
        """
        # Create the subplots
        fig = make_subplots(
            rows=2, cols=1,
            subplot_titles=("Product Price Trend", "Product Inventory Trend"),
            vertical_spacing=0.15
        )

        # Add historical price data
        hist_dates = product_data['historical_data']['dates']
        hist_prices = product_data['historical_data']['price']

        fig.add_trace(
            go.Scatter(
                x=hist_dates,
                y=[p['min'] for p in hist_prices],
                name="Historical low",
                line=dict(color='rgba(0,128,255,0.8)'),
                hovertemplate="Date: %{x}<br>Low: ¥%{y:.2f}"
            ),
            row=1, col=1
        )

        fig.add_trace(
            go.Scatter(
                x=hist_dates,
                y=[p['max'] for p in hist_prices],
                name="Historical high",
                line=dict(color='rgba(0,128,255,0.4)'),
                hovertemplate="Date: %{x}<br>High: ¥%{y:.2f}"
            ),
            row=1, col=1
        )

        # Add predicted price data
        pred_dates = prediction_data['dates']
        pred_prices = prediction_data['price']

        fig.add_trace(
            go.Scatter(
                x=pred_dates,
                y=[p['min'] for p in pred_prices],
                name="Predicted low",
                line=dict(color='rgba(255,0,0,0.8)', dash='dash'),
                hovertemplate="Date: %{x}<br>Predicted low: ¥%{y:.2f}"
            ),
            row=1, col=1
        )

        fig.add_trace(
            go.Scatter(
                x=pred_dates,
                y=[p['max'] for p in pred_prices],
                name="Predicted high",
                line=dict(color='rgba(255,0,0,0.4)', dash='dash'),
                hovertemplate="Date: %{x}<br>Predicted high: ¥%{y:.2f}"
            ),
            row=1, col=1
        )

        # Add inventory data... (similar code handles the inventory chart)

        # Configure the chart layout
        fig.update_layout(
            title=f"Product {product_data['product_id']} Price and Inventory Forecast",
            height=800,
            showlegend=True,
            legend=dict(
                orientation="h",
                yanchor="bottom",
                y=1.02,
                xanchor="right",
                x=1
            )
        )

        return fig

    def generate_report(self, products_data):
        """
        Generate the HTML report.

        Args:
            products_data (list): List of product data

        Returns:
            str: Path to the report file
        """
        # Load the report template
        env = Environment(loader=FileSystemLoader("templates"))
        template = env.get_template("report.html")

        # Prepare the report data
        report_data = {
            'generation_time': datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
            'products_count': len(products_data),
            'products': []
        }

        # Build a chart for each product
        for product in products_data:
            chart = self.create_price_stock_chart(
                product['historical_data'],
                product['prediction_data']
            )

            report_data['products'].append({
                'id': product['product_id'],
                'category': product['category'],
                'chart': chart.to_html(
                    full_html=False,
                    include_plotlyjs='cdn'
                ),
                'confidence': product['prediction_data'].get('confidence', 0)
            })

        # Render and write the report
        html_content = template.render(**report_data)
        report_file = os.path.join(self.output_dir, "report.html")

        with open(report_file, 'w', encoding='utf-8') as f:
            f.write(html_content)

        return report_file

5. Main Program Flow

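import asyncio
import logging
import os
import sys

logger = logging.getLogger(__name__)


async def main():
    """Main entry point."""
    # Initialize components
    processor = DataProcessor("input/products.csv")
    predictor = PricePredictor(api_key=os.environ.get('DEEPSEEK_API_KEY'))
    visualizer = Visualizer()

    try:
        # Process historical data
        if not processor.process_data():
            raise Exception("Data processing failed")

        # Processed data, keyed by product ID
        products_data = processor.processed_data

        # Predict all products concurrently (asyncio.TaskGroup needs Python 3.11+)
        predictions = {}
        async with asyncio.TaskGroup() as tg:
            for product_id, product_data in products_data.items():
                predictions[product_id] = tg.create_task(
                    predictor.predict(product_data)
                )

        # Fill in daily data and build the report input
        report_data = []
        for product_id, product_data in products_data.items():
            prediction = predictions[product_id].result()
            # DataFiller takes the prediction in its constructor and keeps
            # per-product state, so create one instance per product
            filled_data = DataFiller(prediction).fill_data()
            # Carry the confidence score over so the report can show it
            filled_data['confidence'] = prediction.get('confidence', 0)

            report_data.append({
                'product_id': product_id,
                'category': product_data['category'],
                'historical_data': product_data,
                'prediction_data': filled_data
            })

        # Generate the report
        report_file = visualizer.generate_report(report_data)
        logger.info(f"Report generated: {report_file}")

    except Exception as e:
        logger.error(f"Processing failed: {str(e)}")
        sys.exit(1)

if __name__ == "__main__":
    asyncio.run(main())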
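
`asyncio.TaskGroup` starts a request for every product at once, which can run into API rate limits on a large catalogue (the retry configuration in the predictor already anticipates HTTP 429). One possible variation, which also runs on Python versions before 3.11, is to cap concurrency with `asyncio.Semaphore` and collect results with `asyncio.gather`. This is only a sketch: `predict_all`, the concurrency limit, and the stand-in `fake_predict` coroutine are assumptions, not part of the original program.

```python
import asyncio


async def predict_all(products_data, predict, max_concurrency=5):
    """Run predict(product_data) for every product, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(product_data):
        async with semaphore:
            return await predict(product_data)

    product_ids = list(products_data)
    # gather returns results in the same order as the awaitables passed in
    results = await asyncio.gather(
        *(bounded(products_data[pid]) for pid in product_ids)
    )
    return dict(zip(product_ids, results))


async def fake_predict(product_data):
    """Stand-in for PricePredictor.predict so the sketch runs without an API key."""
    await asyncio.sleep(0.01)
    return {"predictions": [], "category": product_data["category"]}


demo_products = {f"P{i}": {"category": "phones"} for i in range(12)}
demo_results = asyncio.run(predict_all(demo_products, fake_predict, max_concurrency=3))
print(len(demo_results))  # 12
```

In `main`, this would be called as `predictions = await predict_all(products_data, predictor.predict)`, and the later loop would read `predictions[product_id]` directly instead of calling `.result()`.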

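Case Study

Taking the mobile phone category on an e-commerce platform as an example, the system produced a forecast for the iPhone 15 series. The figures are for reference only.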

[Figure: visualization of the prediction results]

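Analysis of the prediction results:

Price trend:
- April to June: prices hold steady with a slight decline, from ¥6299 to ¥5899
- July to September: ahead of the new model launch, prices dip further to ¥5699
- Prediction confidence: 85%
Inventory trend:
- April to June: inventory stays stable at 2000-3000 units
- July to August: destocking begins, and inventory falls to 1000-1500 units
- September: the new model launches, and inventory is replenished to 3000-4000 units
- Prediction confidence: 80%
Market analysis:
- Price cuts to clear inventory begin 2-3 months before a new model launches
- The inventory control strategy is sound and avoids overstock
- The size of the price adjustments is in line with historical patterns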
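
For context on confidence figures like these: the `_calculate_confidence` heuristic shown earlier yields a single score per product. For each pair of consecutive months it averages a price-stability score, an inventory-stability score, and a reason-length score. The post does not say how the separate price and inventory percentages above were obtained. The sketch below re-implements the heuristic with the standard library and runs it on two made-up months, so the numbers are purely illustrative:

```python
from statistics import mean


def midpoint(value_range):
    """Midpoint of a {'min': ..., 'max': ...} range."""
    return (value_range['min'] + value_range['max']) / 2


def confidence(predictions):
    """Average of price-stability, stock-stability and reason-length scores."""
    scores = []
    for prev, curr in zip(predictions, predictions[1:]):
        price_change = (abs(midpoint(curr['price']) - midpoint(prev['price']))
                        / max(1e-9, midpoint(prev['price'])))
        stock_change = (abs(midpoint(curr['stock']) - midpoint(prev['stock']))
                        / max(1, midpoint(prev['stock'])))
        scores.append(max(0, 1 - price_change))
        scores.append(max(0, 1 - stock_change))
        scores.append(min(1.0, len(curr.get('reason', '')) / 50))
    return mean(scores) if scores else 0.0


# Hypothetical months, chosen only to make the arithmetic easy to follow
months = [
    {'price': {'min': 5900, 'max': 6100}, 'stock': {'min': 2000, 'max': 3000}, 'reason': ''},
    {'price': {'min': 5600, 'max': 5800}, 'stock': {'min': 1500, 'max': 2500}, 'reason': 'x' * 25},
]
# price midpoint 6000 -> 5700: change 0.05, score 0.95
# stock midpoint 2500 -> 2000: change 0.20, score 0.80
# reason length 25 of 50:      score 0.50
print(round(confidence(months), 2))  # 0.75
```

Because the reason score only reaches full marks at 50 characters, a terse explanation pulls the overall score down even when prices and inventory are stable. The score is therefore better read as a plausibility check than as a statistical confidence level.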