I'm having a bit of trouble with some data stored in a text file that I want to use for regression analysis in Python.
The data are stored in a format that looks like this:
2104,3,399900 1600,3,329900 2400,3,369000 ....
I need to do some analysis, such as finding the mean of the first values like this:
(2104+1600+...)/number of records
I think the appropriate step is to store the data in an array, but I have no idea how to do that. I can think of two ways. The first is to set up 3 arrays, one per field, like this:
a=[2104 1600 2400 ...] b=[3 3 3 ...] c=[399900 329900 369000 ...]
The second way is to store each record in its own array:
a=[2104 3 399900], b=[1600 3 329900] and so on.
Which one is better?
Also, how do I write code that stores the data in an array? I was thinking of something like this:
with open("file.txt", "r") as ins:
array = []
elt.strip(',."\'?!*:') for line in ins:
array.append(line)
Is that correct?
Solution
You could use:
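with open('data.txt') as data:
    # split() with no argument splits on any whitespace, including newlines
    substrings = data.read().split()

# Turn each "2104,3,399900" record into a list of three ints
values = [[int(field) for field in substring.split(',')] for substring in substrings]
# Mean of the first field across all records
average = sum(a for a, b, c in values) / len(values)
print(f"{average:.8f}")  # formatted to 8 decimal places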
With this data.txt:
2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000
It outputs:
2035.16666667
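As for which layout to use: parsing naturally gives you the second one (one list per record), and you can get the first one (one sequence per field) from it by transposing with zip. The per-field layout tends to be handier when you want a statistic over a single field. Here is a minimal sketch of that idea. The sample text is inlined so the snippet runs without a file, and the names rows, a, b and c are only for illustration (a, b, c follow the naming in the question).

```python
# Same content as the data.txt shown above, inlined so this runs on its own.
text = """2104,3,399900 1600,3,329900 2400,3,369000
2105,3,399900 1601,3,329900 2401,3,369000"""

# Second layout from the question: one list per record, e.g. [2104, 3, 399900].
rows = [[int(field) for field in record.split(',')] for record in text.split()]

# First layout from the question: transpose the records into one tuple per field.
a, b, c = zip(*rows)

# A per-field statistic is then a one-liner on the matching tuple.
mean_a = sum(a) / len(a)

print(rows[0])  # the first record
print(a)        # all first-field values
print(f"{mean_a:.8f}")
```

Since either layout can be rebuilt from the other in one line, it is probably fine to parse into records first and transpose only when you need whole fields.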