An Improved C Implementation for Converting CSV to LibSVM Format

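This article describes a C program that converts CSV data to LibSVM format, with improvements to the original program's output method, number display, and label position. The modified source code is provided for reference.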

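Last time I covered converting between Excel data and libsvm data with an Excel macro. That approach does not work well for large datasets, so a program is needed instead.
Fortunately, the libsvm website provides a C program that does this. After trying it I ran into a few problems, so I modified the source and am sharing the result here.
The main improvements are:
1. The original program printed its results to the console, which made further processing inconvenient, so the output is now written to a TXT file.
2. Fixed the display of double values with excess trailing zeros (e.g. 4.0000).
3. Fixed the problem where a label in the last column of the original CSV data caused the features and the label to be swapped in the output libsvm file.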
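Improvements 2 and 3 can be looked at in isolation. The following is a minimal sketch, not part of the program itself: format_libsvm_row is a hypothetical helper introduced only for illustration. It assumes the row has already been parsed into numbers, that the last element is the label, and that zero-valued features are omitted. The %g conversion is what removes the trailing zeros.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not from the original program): formats one parsed
   CSV row as a LibSVM line. values[n-1] is taken as the label and
   values[0..n-2] as features with 1-based indices; zero-valued features
   are skipped. Returns the number of characters written, or -1 if the
   output buffer is too small or n < 1. */
int format_libsvm_row(const double *values, int n, char *out, size_t out_size)
{
    int j;
    int len;

    if (n < 1 || out_size == 0)
        return -1;

    /* The label comes first in LibSVM format, even though it is last in the CSV */
    len = snprintf(out, out_size, "%d", (int)values[n - 1]);
    if (len < 0 || (size_t)len >= out_size)
        return -1;

    for (j = 0; j < n - 1; j++) {
        int w;

        if (values[j] == 0.0)
            continue;
        /* %g drops trailing zeros: 4.0 is written as "4", 2.5 as "2.5" */
        w = snprintf(out + len, out_size - (size_t)len, " %d:%g", j + 1, values[j]);
        if (w < 0 || (size_t)w >= out_size - (size_t)len)
            return -1;
        len += w;
    }
    return len;
}
```

For the row {4.0, 0.0, 2.5, 1.0}, this helper writes 1 1:4 3:2.5 into the buffer: the label moves to the front, the zero in the second column is dropped, and 4.0 appears as 4.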
The source code follows, for reference.

/*
    convert csv data to libsvm/svm-light format
    Updated on Jan 11, 2014 to use strsep() instead of strtok().
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef __USE_BSD
char *strsep(char **stringp, const char *delim);
#endif

char buf[10000000];
float feature[100000];

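int main(int argc, char **argv)
{
    FILE *fp,*fp1;
    const char *in_path="day.csv";   /* my input file */
    const char *out_path="1.txt";    /* output file */

    /* When running from the command line, the input file path can be
       passed as the argument to main instead of using the default. */
    if(argc==2)
        in_path=argv[1];
    else if(argc>2)
    {
        fprintf(stderr,"Usage %s [filename]\n",argv[0]);
        return 1;
    }

    if((fp=fopen(in_path,"r"))==NULL)
    {
        fprintf(stderr,"Can't open input file %s\n",in_path);
        return 1;
    }
    if((fp1=fopen(out_path,"w"))==NULL)
    {
        fprintf(stderr,"Can't open output file %s\n",out_path);
        fclose(fp);
        return 1;
    }

    /* Read one line at a time; the field width keeps fscanf inside buf */
    while(fscanf(fp,"%9999999[^\n]\n",buf)==1)
    {
        int i=0,j;
        char *ptr=buf;
        /* Split on commas; stop before overflowing the feature array */
        while (ptr != NULL && i < 100000)
        {
            char *token = strsep(&ptr, ",");
            if (strlen(token) == 0)
                feature[i++] = 0.0;   /* an empty field counts as 0 */
            else
                feature[i++] = atof(token);
        }

        /* The label is the last CSV column; LibSVM expects it first */
        fprintf(fp1,"%d", (int) feature[i-1]);
        for(j=0;j<i-1;j++)
            if(feature[j]!=0)
                /* original: printf(" %d:%f",j,feature[j]); */
                fprintf(fp1," %d:%g",j+1,feature[j]);

        fprintf(fp1,"\n");
    }
    fclose(fp);
    fclose(fp1);
    return 0;
}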

#ifndef __USE_BSD
/*-
 * Copyright (c) 1990, 1993
 *  The Regents of the University of California.  All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 4. Neither the name of the University nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */
char *strsep(char **stringp, const char *delim){
    char *s;
    const char *spanp;
    int c, sc;
    char *tok;

    if ((s = *stringp) == NULL)
        return (NULL);
    for (tok = s;;) {
        c = *s++;
        spanp = delim;
        do {
            if ((sc = *spanp++) == c) {
                if (c == 0)
                    s = NULL;
                else
                    s[-1] = 0;
                *stringp = s;
                return (tok);
            }
        } while (sc != 0);
    }
}
#endif
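
The header comment mentions the switch from strtok() to strsep() without saying why. One practical difference, which may or may not have been the motivation, is that strtok() treats a run of delimiters as a single separator, so empty CSV fields disappear and every later column shifts left. The sketch below illustrates that difference under stated assumptions: next_field, count_fields_keep_empty and count_fields_strtok are hypothetical names introduced here, and next_field is a simplified stand-in for strsep() that handles a single delimiter character so the example does not depend on strsep() being available.

```c
#include <string.h>

/* Hypothetical stand-in for strsep() with one delimiter character:
   returns the next field (possibly empty) and advances *cursor, or
   returns NULL once the input is exhausted. Modifies the string. */
char *next_field(char **cursor, char delim)
{
    char *start = *cursor;
    char *end;

    if (start == NULL)
        return NULL;
    end = strchr(start, delim);
    if (end != NULL) {
        *end = '\0';
        *cursor = end + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}

/* Counts fields the strsep() way: empty fields are kept. Modifies line. */
int count_fields_keep_empty(char *line)
{
    int n = 0;
    char *cursor = line;

    while (next_field(&cursor, ',') != NULL)
        n++;
    return n;
}

/* Counts fields the strtok() way: consecutive commas collapse into one
   separator, so empty fields are lost. Modifies line. */
int count_fields_strtok(char *line)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, ","); tok != NULL; tok = strtok(NULL, ","))
        n++;
    return n;
}
```

Given the line 1,,3,0 the first counter sees four fields and the second sees three. With the strtok() behaviour, the value 3 would be read as the second column rather than the third, which would misnumber the features in the LibSVM output.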

Finally, thanks to the original author for sharing the program. The related material can be found at:
http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#/Q3:_Data_preparation
