Aircraft Turbulence Prediction with a BP Neural Network (AMDAR turbulence data) - 优快云 Blog

Article link: https://blog.youkuaiyun.com/zshluckydogs/article/details/113186720

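This post describes how to predict aircraft turbulence from AMDAR data and NCEP reanalysis data. It covers data extraction, preprocessing, feature engineering and a BP neural network that classifies turbulence by intensity. The model performs well on the no-turbulence and severe-turbulence classes but leaves room for improvement on light and heavy turbulence. The post ends with a discussion of model deployment and front-end design, emphasizing the importance of asynchronous service architectures for deep learning services.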


Background

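Turbulence is the collective name for the side-to-side rocking, fore-and-aft jolting, up-and-down tossing and local shuddering an aircraft experiences when it flies through disturbed air or is hit by gusts of varying direction and strength. Moderate or stronger turbulence can make instrument readings unreliable and the aircraft hard to control; in extreme cases it can damage the airframe and cause an accident. Once an aircraft enters a turbulence zone, it can leave as quickly as possible by changing heading or altitude. Predicting where turbulence will occur, and planning routes that avoid those areas, is therefore important for flight safety.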

Data Description

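The turbulence data come from AMDAR (Aircraft Meteorological Data Relay) reports, with DEVG (derived equivalent vertical gust velocity) used as the indicator of turbulence.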

The records are labeled according to the WMO definition of turbulence intensity.
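A minimal sketch of the labeling step. It assumes the commonly cited DEVG boundaries of 2.0, 4.5 and 9.0 m/s; these numbers and the name `label_devg` are assumptions for illustration, so check them against the WMO standard actually in use. The four output classes match the ones used in the evaluation section (0 none, 1 light, 2 heavy, 3 severe).

```python
from bisect import bisect_right

# Assumed DEVG class boundaries in m/s (verify against the WMO standard in use):
# < 2.0 -> 0 (none), 2.0-4.5 -> 1 (light), 4.5-9.0 -> 2 (heavy), >= 9.0 -> 3 (severe)
DEVG_THRESHOLDS = (2.0, 4.5, 9.0)


def label_devg(devg, thresholds=DEVG_THRESHOLDS):
    """Map a DEVG value (m/s) to a turbulence class index 0-3."""
    if devg < 0:
        raise ValueError('DEVG must be non-negative')
    # bisect_right counts how many thresholds are <= devg,
    # so a value equal to a boundary falls into the higher class
    return bisect_right(thresholds, devg)


print([label_devg(v) for v in (0.5, 2.0, 6.3, 12.0)])  # [0, 1, 2, 3]
```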

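In addition, turbulence depends on temperature, geopotential height, vertical velocity, and the zonal and meridional wind components. These variables are not available in the AMDAR data used here, so they are taken from the NCEP reanalysis.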

The data are extracted as follows:

First, extract the AMDAR records with DEVG > 0, together with their latitude, longitude, time and flight altitude.

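Next, match the extracted AMDAR records to the NCEP grid: convert each record's time, latitude/longitude and flight altitude into the corresponding NCEP grid coordinates.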

Finally, extract the required variables from the NCEP data.

Extracting data from AMDAR

import os

import pandas as pd
from tqdm import tqdm

# Missing-value codes used for V11036 in the AMDAR CSV files
MISSING_CODES = (999999.0, 999998.0)


def read_amdar(root_dir='fjbcsv/', out_dir='temp_result'):
    # Select DEVG values from the AMDAR dataset.
    # AMDAR columns used:
    #   V11036        DEVG (derived equivalent vertical gust)
    #   Lat           latitude
    #   Lon           longitude
    #   Year          year
    #   Mon           month
    #   Day           day
    #   Hour          hour
    #   Min           minute
    #   Second        second
    #   Flight_Heigh  flight altitude
    os.makedirs(out_dir, exist_ok=True)
    for filename in sorted(os.listdir(root_dir)):
        print(filename)
        sub_dir = os.path.join(root_dir, filename)
        if not os.path.isdir(sub_dir):
            continue
        for each in tqdm(os.listdir(sub_dir)):
            out_name = each.split('.')[0] + '.csv'
            # Read the current CSV file of this subdirectory
            df = pd.read_csv(os.path.join(sub_dir, each))

            temp = df[['Year', 'Mon', 'Day', 'Hour', 'Min', 'Second',
                       'Lat', 'Lon', 'Flight_Heigh', 'V11036']]
            # Drop missing-value codes and keep records with DEVG > 0
            temp2 = temp[~temp['V11036'].isin(MISSING_CODES) & (temp['V11036'] > 0)]
            # Save the extracted records for the later NCEP extraction step.
            # The data are large and scattered, so they are not all loaded
            # into memory to extract the NCEP values in a single pass.
            temp2.to_csv(os.path.join(out_dir, out_name), index=False)
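The NCEP step expects each record's time as a 'YYYY-MM-DD:Hour' string, while the saved AMDAR records keep year, month, day and hour in separate columns. A small hypothetical helper that bridges the two could look like this; `make_time_key` is not part of the original code, and it assumes the hour is a UTC hour, matching the NCEP time axis.

```python
def make_time_key(year, mon, day, hour):
    """Build the 'YYYY-MM-DD:Hour' string that post_process splits on ':'.

    Hypothetical helper; assumes hour is an integer-like UTC hour (0-23).
    """
    hour = int(float(hour))
    if not 0 <= hour <= 23:
        raise ValueError(f'hour out of range: {hour}')
    return f'{int(year):04d}-{int(mon):02d}-{int(day):02d}:{hour}'


print(make_time_key(2020, 1, 5, 13.0))  # 2020-01-05:13
```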

Extracting data from the NCEP reanalysis

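NCEP data are standard gridded data: a value is looked up by time, pressure level and latitude/longitude grid coordinates. The hard part is therefore converting the time, latitude/longitude and flight altitude of each AMDAR record into grid indices.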
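One way to turn flight altitude into a pressure-level index is to precompute the height of each NCEP pressure level in the ICAO standard atmosphere and pick the nearest level. The sketch below assumes the flight altitude is a pressure altitude in metres and uses the NCEP levels from 1000 to 100 hPa, which cover typical cruise altitudes; the function names are hypothetical. The resulting list can serve as the `H` array that the height-index step in post_process compares against.

```python
import math

# NCEP/NCAR reanalysis pressure levels (hPa) from 1000 up to 100 hPa
LEVELS_HPA = (1000, 925, 850, 700, 600, 500, 400, 300, 250, 200, 150, 100)

# ICAO standard atmosphere constants
T0 = 288.15        # sea-level temperature (K)
P0 = 1013.25       # sea-level pressure (hPa)
LAPSE = 0.0065     # tropospheric lapse rate (K/m)
G = 9.80665        # gravitational acceleration (m/s^2)
R = 287.05287      # specific gas constant of dry air (J/(kg*K))
P_TROP = 226.32    # pressure at the 11 km tropopause (hPa)
T_TROP = 216.65    # temperature of the isothermal layer above 11 km (K)


def pressure_to_height(p_hpa):
    """Standard-atmosphere height (m) of pressure p_hpa, valid up to 20 km."""
    if p_hpa >= P_TROP:
        # Troposphere: temperature decreases linearly with height
        return T0 / LAPSE * (1 - (p_hpa / P0) ** (R * LAPSE / G))
    # Lower stratosphere (11-20 km): isothermal layer
    return 11000 + R * T_TROP / G * math.log(P_TROP / p_hpa)


H = [pressure_to_height(p) for p in LEVELS_HPA]


def nearest_level_index(flight_height_m, heights=H):
    """Index of the pressure level whose standard height is closest."""
    return min(range(len(heights)), key=lambda i: abs(heights[i] - flight_height_m))


print(LEVELS_HPA[nearest_level_index(10500)])  # 250
```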

from datetime import datetime

import numpy as np

# Assumptions about the NCEP files (consistent with the 73-point latitude axis):
#   - 6-hourly time axis, 0-based, starting at 2020-01-01 00 UTC
#   - 2.5 degree grid: 73 latitudes from 90N to 90S, 144 longitudes from 0E eastward
# H: standard heights (m) of the NCEP pressure levels, in the same order as the
#    level axis of the data; defined elsewhere in the project.
START_DATE = datetime(2020, 1, 1)


def post_process(*args):
    # Convert one AMDAR record into NCEP grid indices:
    #   time 'YYYY-MM-DD:Hour' -> time index
    #   flight height          -> pressure-level index
    #   latitude / longitude   -> grid indices
    # The indices are computed once and reused for every NC file.
    time, fh, lat, lon = args
    day, hour = time.split(':')
    hour = int(float(hour))

    # Time index: four analyses per day; each 6-hour window is mapped to the
    # following analysis time (00-05 UTC -> 06 UTC, ..., 18-23 UTC -> next 00 UTC)
    days = (datetime.strptime(day, '%Y-%m-%d') - START_DATE).days
    t_index = days * 4 + hour // 6 + 1

    # Height index: the pressure level whose standard height is closest to fh
    h_index = int(np.argmin(np.abs(np.asarray(H) - fh)))

    # Latitude index: 90N -> 0, 90S -> 72 in 2.5 degree steps
    lat_index = round((90 - lat) / 2.5)

    # Longitude index: 0E -> 0, 357.5E -> 143; western longitudes wrap around
    lon_index = round((lon % 360) / 2.5) % 144

    return (t_index, h_index, int(lat_index), int(lon_index))


def extract_data(*args):
    # Read a single value of one variable from an NCEP file.
    # ds is assumed to be an xarray.DataArray with dims (time, level, lat, lon).
    time, airP, lat, lon, ds = args
    return float(ds[time, airP, lat, lon].values)

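Note that some of the extracted geopotential heights are negative. In this application negative geopotential heights should not occur, so these records are treated as invalid and removed.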
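A minimal sketch of this cleaning step together with the min-max scaling that the service code later applies through `Turdata.min_value` and `Turdata.max_value`. It assumes the features are stored as a NumPy array whose columns follow the order the service reads them in (air, hgt, omega, uwnd, vwnd), so geopotential height is column 1; the function name `clean_and_scale` is hypothetical.

```python
import numpy as np

HGT_COL = 1  # assumed column order: air, hgt, omega, uwnd, vwnd


def clean_and_scale(features, labels):
    """Drop rows with negative geopotential height, then min-max scale.

    Returns the scaled features, the matching labels and the per-column
    min/max, which must be reused unchanged at inference time.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    keep = features[:, HGT_COL] >= 0
    features, labels = features[keep], labels[keep]
    min_value = features.min(axis=0)
    max_value = features.max(axis=0)
    # Guard against constant columns to avoid division by zero
    span = np.where(max_value > min_value, max_value - min_value, 1.0)
    scaled = (features - min_value) / span
    return scaled, labels, min_value, max_value
```

Computing `min_value` and `max_value` after filtering keeps the invalid negative heights from stretching the scale of the hgt column.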

Model Architecture

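After surveying the literature, we chose a BP (backpropagation) neural network. According to that literature, turbulence depends nonlinearly on the variables that drive it. A common approach is a discriminant function that transforms these variables into a single value indicating the turbulence class. Such a function is easy to deploy, but its performance plateaus quickly: one function represents only one transformation and cannot capture several different input-output relationships. Fitting complex and varied mappings is exactly where neural networks are strong.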

import torch.nn as nn
import torch.nn.functional as F


class Tur(nn.Module):
    def __init__(self, n_feature, n_hidden, n_output):
        super().__init__()
        self.fc1 = nn.Linear(n_feature, n_hidden)
        self.fc2 = nn.Linear(n_hidden, n_hidden)
        # Extra hidden layers, added in case the model is too simple to
        # converge; they are not used in forward()
        self.fc3 = nn.Linear(n_hidden, n_hidden)
        self.fc4 = nn.Linear(n_hidden, n_hidden)

        self.dropout1 = nn.Dropout(p=0.2)
        self.out = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        # Three linear layers (fc1, fc2, out) with ReLU activations
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        # Dropout against overfitting; inactive after model.eval()
        x = self.dropout1(x)
        # Raw logits; softmax / CrossEntropyLoss are applied outside the model
        return self.out(x)

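A three-layer BP network serves as the backbone, with dropout to prevent overfitting. Fully connected networks overfit more easily than convolutional networks, especially when the classes in the dataset are not separated by a wide enough margin, so adding dropout is essential.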

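For training, SGD converges faster while Adam reaches a better final result. With SGD, use fewer epochs and a larger initial learning rate, such as 1e-2 or 0.05; if the model's predictions do not meet production requirements and training is extended with more epochs, add a decaying learning-rate schedule. With Adam, a smaller initial learning rate and more epochs work better, for example 1e-3 and 40 or more epochs.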
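A minimal PyTorch sketch of the two settings described above. The step decay (`StepLR` dividing the rate by 10 every 20 epochs) and the SGD momentum of 0.9 are assumptions chosen for illustration; the text only says that a decaying learning rate is needed when SGD is trained longer. `make_optimizer` is a hypothetical helper, and `nn.Linear(5, 4)` stands in for the real model.

```python
import torch
from torch import nn


def make_optimizer(model, method='adam'):
    """Return (optimizer, scheduler); scheduler is None for Adam.

    Assumed settings: SGD starts at lr=1e-2 and decays by 10x every 20 epochs;
    Adam uses a fixed lr=1e-3 and is meant to run for 40+ epochs.
    """
    if method == 'sgd':
        optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)
        return optimizer, scheduler
    if method == 'adam':
        return torch.optim.Adam(model.parameters(), lr=1e-3), None
    raise ValueError(f'unknown method: {method}')


# Usage on a stand-in model with 5 inputs and 4 classes
model = nn.Linear(5, 4)
optimizer, scheduler = make_optimizer(model, 'sgd')
loss_fn = nn.CrossEntropyLoss()
x, y = torch.randn(8, 5), torch.randint(0, 4, (8,))
for epoch in range(40):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step()  # one scheduler step per epoch
# After 40 epochs the rate has decayed twice: 1e-2 * 0.1 * 0.1
print(optimizer.param_groups[0]['lr'])
```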

Model Evaluation

Training with Adam for 10,000 epochs gives the following results:

              precision    recall  f1-score   support

     class 0       0.54      0.74      0.62        50
     class 1       0.48      0.30      0.37        50
     class 2       0.56      0.50      0.53        50
     class 3       0.91      1.00      0.95        50

    accuracy                           0.64       200
   macro avg       0.62      0.64      0.62       200
weighted avg       0.62      0.64      0.62       200

Classes 0 to 3 are no turbulence, light turbulence, heavy turbulence and severe turbulence, respectively.
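The per-class scores in the table follow the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is their harmonic mean. The NumPy sketch below computes them from true and predicted class indices. It mirrors what a classification report prints but is not the author's evaluation code; `per_class_scores` is a hypothetical name.

```python
import numpy as np


def per_class_scores(y_true, y_pred, n_classes=4):
    """Return (precision, recall, f1, support) arrays, one entry per class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows are true classes, columns are predicted classes
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    predicted = cm.sum(axis=0)   # TP + FP per class
    support = cm.sum(axis=1)     # TP + FN per class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision, recall, f1, support
```

With a balanced test set (50 samples per class, as above), overall accuracy equals the macro-averaged recall, which is why both are 0.64 in the table.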

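The model performs best on the no-turbulence and severe-turbulence classes and worst on light and heavy turbulence. A t-SNE analysis of the training set (see the figure) shows no clear separating hyperplane between light and heavy turbulence: the two classes overlap heavily, so treating them as one class is more reasonable than keeping them as two.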
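If light and heavy turbulence are merged, the labels only need to be remapped before retraining, and the network's output size (the third argument of Tur) drops from 4 to 3. A minimal sketch with a hypothetical mapping (0 none, 1 light or heavy, 2 severe):

```python
import numpy as np

# Hypothetical 4 -> 3 class mapping: light (1) and heavy (2) become one class
MERGE_MAP = np.array([0, 1, 1, 2])


def merge_labels(labels):
    """Remap original class indices 0-3 to the merged classes 0-2."""
    labels = np.asarray(labels)
    if labels.min() < 0 or labels.max() > 3:
        raise ValueError('labels must be in 0..3')
    return MERGE_MAP[labels]


print(merge_labels([0, 1, 2, 3, 2]))  # [0 1 1 2 1]
```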

Deploying to Production

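Once the model gives reasonable results on the training and inference datasets, it can be deployed to production as version 1.0 for user testing and further iteration.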

Tornado or Django is recommended as the production service framework, and Flask for experimental services.

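Tornado is a well-designed asynchronous framework. Many deep learning tasks cannot be served in real time; with synchronous service logic the connection must stay open and the request waits for a long time until inference finishes, so an asynchronous architecture is usually the more reasonable choice.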

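In our tests, however, inference took less than 1 ms, so a synchronous service is the better choice here: the model responds in near real time without long-lived connections, and a synchronous service is easier to implement.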
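If inference later becomes slow (for example with a larger model), the usual asynchronous pattern is to run the blocking call in a thread pool so the event loop keeps serving other requests. The sketch uses only the standard library's asyncio; `slow_predict` is a hypothetical stand-in for model inference. Tornado 5 and later run on the asyncio event loop, so the same `await loop.run_in_executor(...)` call can be used inside an `async def post(self)` handler.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def slow_predict(features):
    """Hypothetical blocking inference call standing in for the model."""
    time.sleep(0.1)  # simulate a slow forward pass
    return sum(features)


async def handle_request(features):
    # Offload the blocking call; the event loop stays free meanwhile
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(executor, slow_predict, features)


async def main():
    # Ten concurrent requests share 4 worker threads instead of running one by one
    start = time.perf_counter()
    results = await asyncio.gather(*(handle_request([i, 1.0]) for i in range(10)))
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(main())
print(results[:3])  # [1.0, 2.0, 3.0]
```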

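import json
import logging

import numpy as np
import torch
import torch.nn.functional as F
import tornado.web

# load_turmodel(), Turdata (holding the training-set min_value / max_value) and
# label_tur (class index -> turbulence label) are defined elsewhere in the project.
FIELDS = ('air', 'hgt', 'omega', 'uwnd', 'vwnd')
_net = None


def get_net():
    # Load the model once and reuse it for every request
    global _net
    if _net is None:
        _net = load_turmodel()
        _net.eval()  # switch off dropout for inference
    return _net


class MainHandler(tornado.web.RequestHandler):
    def set_default_headers(self):
        # Allow cross-origin requests (CORS)
        self.set_header('Access-Control-Allow-Origin', '*')
        self.set_header('Access-Control-Allow-Headers', 'x-requested-with')
        self.set_header('Access-Control-Allow-Methods', 'POST, GET, OPTIONS')

    def resp(self, resp_dct):
        # Serialize the response as JSON
        return json.dumps(resp_dct, ensure_ascii=False)

    def error(self, status, msg):
        # Return an error status with a JSON message
        self.set_status(status)
        self.write(self.resp({'code': 1, 'msg': msg}))

    def post(self):
        # Read the input variables, run the model, map the class index
        # to a label and return the prediction
        raw = [self.get_argument(name, None) for name in FIELDS]
        if any(v is None for v in raw):
            return self.error(400, 'missing parameter')
        try:
            values = [float(v) for v in raw]
        except ValueError:
            return self.error(400, 'parameters must be numeric')
        try:
            net = get_net()
        except Exception:
            logging.exception('failed to load the model')
            return self.error(500, 'model unavailable')

        x = np.array(values)
        # Min-max normalization with the training-set statistics
        x = (x - Turdata.min_value) / (Turdata.max_value - Turdata.min_value)
        x = torch.FloatTensor(np.expand_dims(x, 0))
        with torch.no_grad():
            outputs = net(x)
        prob = F.softmax(outputs, dim=1)
        prob_value, predicted = torch.max(prob, 1)
        prob_value = prob_value.item()
        flag = label_tur[predicted.item()]
        logging.info('%s %s', flag, prob_value)
        # Record every query and prediction; the results are checked by hand,
        # their classes corrected, and used as training data for version 2.0
        query = '\t'.join(raw)
        with open('turdata.txt', 'a+') as f:
            f.write(query + '\t' + str(prob_value) + '\t' + flag + '\n')

        self.write(self.resp({'code': 0, 'prob': prob_value, 'flag': flag}))

    def options(self):
        # CORS preflight request: reply with 204 No Content
        self.set_status(204)
        self.finish()

    def get(self):
        self.write('some get')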
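To exercise the service from a script or front end, the five variables can be sent as form fields, which Tornado's `get_argument` reads from a form-encoded request body. A standard-library sketch; the URL (host, port and path) is a placeholder assumption because the route the handler is mounted on is not shown, and the feature values are arbitrary sample numbers.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder URL: the actual host, port and route are not given in the post
SERVICE_URL = 'http://localhost:8888/tur'


def build_request(air, hgt, omega, uwnd, vwnd, url=SERVICE_URL):
    """Build a form-encoded POST request carrying the five model inputs."""
    body = urlencode({'air': air, 'hgt': hgt, 'omega': omega,
                      'uwnd': uwnd, 'vwnd': vwnd}).encode('utf-8')
    return Request(url, data=body, method='POST',
                   headers={'Content-Type': 'application/x-www-form-urlencoded'})


def predict(req, timeout=5):
    """Send the request and decode the JSON reply ({'code', 'prob', 'flag'})."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))


req = build_request(225.3, 10350.0, 0.05, 32.1, -4.7)
# predict(req) needs the Tornado service to be running at SERVICE_URL
```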