Reading CSV files

In [1]:
import csv

# Create a CSV file with Python's built-in csv module
fp = open('H:/python数据分析/数据/ch4ex1.csv', 'w', newline='')  # create the CSV file
writer = csv.writer(fp)
writer.writerow(('id', 'name', 'grade'))  # write rows to the CSV file
writer.writerow(('1', 'lucky', '87'))
writer.writerow(('2', 'peter', '92'))
writer.writerow(('3', 'lili', '85'))
fp.close()

In [2]:
# View the data with !type. type only works on Windows; on UNIX systems use !cat instead.
!type H:\python数据分析\数据\ch4ex1.csv

id,name,grade
1,lucky,87
2,peter,92
3,lili,85

In [3]:
import pandas as pd

# Read the CSV file with read_csv.
# If the file path contains Chinese characters, wrap it in open();
# otherwise read_csv raises an error in the environment used here.
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'))
df

Out[3]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [4]:
# read_table can also read a CSV file once the separator is specified
df = pd.read_table(open('H:/python数据分析/数据/ch4ex1.csv'), sep=',')
df

Out[4]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [5]:
# By default the row index of the resulting DataFrame counts from 0.
# index_col makes the id column the row index instead.
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'), index_col='id')
df

Out[5]:
     name  grade
id
1   lucky     87
2   peter     92
3    lili     85

In [6]:
import csv

# Create a second CSV file with the csv module
fp = open('H:/python数据分析/数据/ch4ex2.csv', 'w', newline='')
writer = csv.writer(fp)
writer.writerow(('school', 'id', 'name', 'grade'))  # write the data
writer.writerow(('a', '1', 'lucky', '87'))
writer.writerow(('a', '2', 'peter', '92'))
writer.writerow(('a', '3', 'lili', '85'))
writer.writerow(('b', '1', 'coco', '78'))
writer.writerow(('b', '2', 'kevin', '87'))
writer.writerow(('b', '3', 'heven', '96'))
fp.close()

In [7]:
# View the data
!type H:\python数据分析\数据\ch4ex2.csv

school,id,name,grade
a,1,lucky,87
a,2,peter,92
a,3,lili,85
b,1,coco,78
b,2,kevin,87
b,3,heven,96

In [8]:
# For a hierarchical index, pass a list of column numbers or column names
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex2.csv'), index_col=[0, 'id'])
df

Out[8]:
            name  grade
school id
a      1   lucky     87
       2   peter     92
       3    lili     85
b      1    coco     78
       2   kevin     87
       3   heven     96

In [9]:
import csv

# Create a CSV file without a header row
fp = open('H:/python数据分析/数据/ch4ex3.csv', 'w', newline='')
writer = csv.writer(fp)
writer.writerow(('1', 'lucky', '87'))
writer.writerow(('2', 'peter', '92'))
writer.writerow(('3', 'lili', '85'))
fp.close()

In [10]:
# View the data
!type H:\python数据分析\数据\ch4ex3.csv

1,lucky,87
2,peter,92
3,lili,85

In [12]:
# A default read treats the first line as the header row
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'))
df

Out[12]:
   1  lucky  87
0  2  peter  92
1  3   lili  85

In [13]:
# header=None assigns default column labels.
# A plain read always promotes the first line to the header, even when that
# line holds the same kind of values as the rest of the file.
# With header=None the first line stays in the data and the columns are
# labelled 0, 1, 2, ... instead.
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'), header=None)
df

Out[13]:
   0      1   2
0  1  lucky  87
1  2  peter  92
2  3   lili  85

In [14]:
# names assigns the column names.
# When names is passed and header is left unset, header behaves as None,
# so the first line is kept as data. If the file already has a header row,
# pass header=0 together with names to replace it.
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex3.csv'), names=['id', 'name', 'grade'])
df

Out[14]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [15]:
import csv

# Create a CSV file with an extra title line and a trailing line
fp = open('H:/python数据分析/数据/ch4ex4.csv', 'w', newline='')
writer = csv.writer(fp)
writer.writerow(['This is grade'])
writer.writerow(('id', 'name', 'grade'))
writer.writerow(('1', 'lucky', '87'))
writer.writerow(('2', 'peter', '92'))
writer.writerow(('3', 'lili', '85'))
writer.writerow(['time'])
fp.close()

In [16]:
# View the data
!type H:\python数据分析\数据\ch4ex4.csv

This is grade
id,name,grade
1,lucky,87
2,peter,92
3,lili,85
time

In [17]:
# skiprows skips the listed lines: here line 0 (the title) and line 5 ('time').
# An integer such as skiprows=2 skips the first two lines, so reading starts
# at the third line whether or not the file has a header.
df = pd.read_csv(open('H:/python数据分析/数据/ch4ex4.csv'), skiprows=[0, 5])
df

Out[17]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [19]:
# nrows reads only the first part of the file
df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), nrows=10)
df

Out[19]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                             male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...   female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                              female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)        female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                            male    35.0  0      0      373450            8.0500   NaN    S
5  6            0         3       Moran, Mr. James                                    male    NaN   0      0      330877            8.4583   NaN    Q
6  7            0         1       McCarthy, Mr. Timothy J                             male    54.0  0      0      17463             51.8625  E46    S
7  8            0         3       Palsson, Master. Gosta Leonard                      male    2.0   3      1      349909            21.0750  NaN    S
8  9            1         3       Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)   female  27.0  0      2      347742            11.1333  NaN    S
9  10           1         2       Nasser, Mrs. Nicholas (Adele Achem)                 female  14.0  1      0      237736            30.0708  NaN    C

In [20]:
# usecols selects a subset of the columns
df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), nrows=10, usecols=['Survived', 'Sex'])
df

Out[20]:
   Survived     Sex
0         0    male
1         1  female
2         1  female
3         1  female
4         0    male
5         0    male
6         0    male
7         0    male
8         1  female
9         1  female

In [21]:
# Very large files need to be read chunk by chunk.
# First, info() shows that the Titanic survivor data has 891 rows.
df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'))
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
PassengerId    891 non-null int64
Survived       891 non-null int64
Pclass         891 non-null int64
Name           891 non-null object
Sex            891 non-null object
Age            714 non-null float64
SibSp          891 non-null int64
Parch          891 non-null int64
Ticket         891 non-null object
Fare           891 non-null float64
Cabin          204 non-null object
Embarked       889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.6+ KB

In [22]:
# chunksize reads the file piece by piece. It sets the number of rows per
# chunk and returns an iterator over chunks of that size. Each step only
# holds one chunk's DataFrame in memory, which keeps memory use bounded
# when the file is large.
chunker = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), chunksize=100)
chunker

Out[22]:
<pandas.io.parsers.TextFileReader at 0x96c3cf8>

In [23]:
df = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'))
df['Sex'].value_counts()

Out[23]:
male      577
female    314
Name: Sex, dtype: int64

In [24]:
from pandas import Series
import pandas as pd

chunker = pd.read_csv(open('H:/python数据分析/数据/titanic.csv'), chunksize=100)
sex = Series([], dtype='float64')  # explicit dtype keeps the running total numeric
# read_csv returns an iterable TextFileReader; iterating over it lets us
# count the Sex column one chunk at a time.
for i in chunker:
    sex = sex.add(i['Sex'].value_counts(), fill_value=0)
sex

Out[24]:
male      577.0
female    314.0
dtype: float64

[Figure: read_csv / read_table parameters]

Reading TXT files

In [25]:
fp = open('H:/python数据分析/数据/ch4ex6.txt', 'a')  # create the TXT file
fp.writelines('id?name?grade\n')  # write the data
fp.writelines('1?lucky?87\n')
fp.writelines('2?peter?92\n')
fp.writelines('3?lili?85\n')
fp.close()

In [26]:
# View the data
!type H:\python数据分析\数据\ch4ex6.txt

id?name?grade
1?lucky?87
2?peter?92
3?lili?85

In [27]:
import pandas as pd

# Read the TXT file; the sep parameter of read_table specifies the separator
df = pd.read_table(open('H:/python数据分析/数据/ch4ex6.txt'), sep='?')
df

Out[27]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [28]:
# View a TXT file whose fields are separated by spaces
!type H:\python数据分析\数据\ch4ex7.txt

id name grade
1 lucky 87
2 peter 92
3 lili 85
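The '?'-delimited example in In [25] to In [27] depends on a file on the H: drive. As a minimal sketch, assuming an in-memory buffer (io.StringIO) can stand in for that file, the same parse can be tried anywhere. The names text and df_q are hypothetical and do not appear in the original notebook.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for ch4ex6.txt; every record ends with "\n".
text = "id?name?grade\n1?lucky?87\n2?peter?92\n3?lili?85\n"

# sep="?" is a single literal character, so the default parser handles it
# directly; the first line becomes the header row.
df_q = pd.read_csv(io.StringIO(text), sep="?")
print(df_q)
```

Each record needs its own line terminator, otherwise all fields end up on a single line and the parser sees only a header.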
In [29]:
# Use a regular expression as the separator to handle whitespace
df = pd.read_table(open('H:/python数据分析/数据/ch4ex7.txt'), sep='\s+')
df

Out[29]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

Saving to text files

In [30]:
import pandas as pd

df = pd.read_csv(open('H:/python数据分析/数据/ch4ex1.csv'))
df

Out[30]:
   id   name  grade
0   1  lucky     87
1   2  peter     92
2   3   lili     85

In [31]:
# The DataFrame to_csv method stores the data in a comma-separated CSV file
df.to_csv('H:/python数据分析/数据/out1.csv')
!type H:\python数据分析\数据\out1.csv

,id,name,grade
0,1,lucky,87
1,2,peter,92
2,3,lili,85

In [32]:
# sep sets the delimiter used in the output file.
# By default both the row index and the column labels are written.
df.to_csv('H:/python数据分析/数据/out2.csv', sep='?')
!type H:\python数据分析\数据\out2.csv

?id?name?grade
0?1?lucky?87
1?2?peter?92
2?3?lili?85

In [33]:
# The index and header parameters control whether the row index and the
# column labels are written.
df.to_csv('H:/python数据分析/数据/out3.csv', index=False)
!type H:\python数据分析\数据\out3.csv

id,name,grade
1,lucky,87
2,peter,92
3,lili,85
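
The effect of index=False is easiest to see by comparing both outputs and reading one of them back. The following is a minimal sketch, assuming in-memory buffers (io.StringIO) in place of the out1.csv and out3.csv files; df_demo, default_text, noindex_text and round_trip are hypothetical names introduced for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the DataFrame read from ch4ex1.csv.
df_demo = pd.DataFrame(
    {"id": [1, 2, 3], "name": ["lucky", "peter", "lili"], "grade": [87, 92, 85]}
)

# Default: the row index is written as an unnamed first column.
buf_default = io.StringIO()
df_demo.to_csv(buf_default)
default_text = buf_default.getvalue()

# index=False: only the column labels and the data are written.
buf_noindex = io.StringIO()
df_demo.to_csv(buf_noindex, index=False)
noindex_text = buf_noindex.getvalue()

# Reading the index-free text back gives the original columns again.
round_trip = pd.read_csv(io.StringIO(noindex_text))

print(default_text.splitlines()[0])
print(noindex_text.splitlines()[0])
```

Writing without the index avoids picking up an extra unnamed column when the file is read back with read_csv.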