Input and Output

Getting User Input

Example

#!/usr/bin/python
# user_input.py

def reverse(text):
    return text[::-1]

def is_palindrome(text):
    return text == reverse(text)

something = input('Enter text: ')

if (is_palindrome(something)):
    print("Yes, it is a palindrome")
else:
    print("No, it is not a palindrome")

Output

$ python user_input.py
Enter text: sir
No, it is not a palindrome

$ python user_input.py
Enter text: madam
Yes, it is a palindrome

$ python user_input.py
Enter text: racecar
Yes, it is a palindrome

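Explanation

The input() function takes a string argument and displays it to the user as a prompt. It then waits for the user to type something; input ends when the user presses Enter, and input() returns the entered text (without the trailing newline).

Exercise

A check for whether a text is a palindrome should ignore punctuation, spaces, and case.

For example, "Rise to vote, sir." is also a palindrome, but our current example does not recognize it. Can you improve the example so that it does?

My Solution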

#!/usr/bin/python

# Characters to ignore when checking for a palindrome
denotes = ['.', ',', '!', '?', ' ']

def normalize(string):
    # Use a name other than `str` so the built-in type is not shadowed
    result = string.strip().lower()
    for note in denotes:
        result = result.replace(note, '')
    return result


def inverse(string):
    return string[::-1]

def ispalindrome(string):
    normalized = normalize(string)
    print("String after normalization: {0}".format(normalized))
    inversed = inverse(normalized)
    print("String after inversion: {0}".format(inversed))
    # Return a boolean rather than a 0/1 status code
    return normalized == inversed

text = input("Enter text:")
if ispalindrome(text):
    print("{0} is palindrome.".format(text))
else:
    print("{0} is not palindrome.".format(text))
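The solution above strips a fixed list of punctuation marks. As an alternative sketch, assuming "ignore punctuation" can be read as "keep only letters and digits", the filtering can be done with str.isalnum(). The names normalize_alnum and is_palindrome_loose are made up for this illustration.

```python
def normalize_alnum(text):
    # Keep only letters and digits, lowercased; everything else is dropped
    return ''.join(ch.lower() for ch in text if ch.isalnum())

def is_palindrome_loose(text):
    cleaned = normalize_alnum(text)
    return cleaned == cleaned[::-1]

print(is_palindrome_loose("Rise to vote, sir."))
print(is_palindrome_loose("sir"))
```

This version needs no list of punctuation characters, so marks such as quotes or semicolons are handled without extra work.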

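Files

You can read and write files by creating a file object and using its read, readline, or write methods.

How the file is read or written depends on the mode you specify when opening it.

Finally, when you are done with the file, call close to close it.

Example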

#!/usr/bin/python
# Filename: using_file.py

poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
    use Python!
'''

f = open('poem.txt', 'w', encoding='utf-8') # open for writing
f.write(poem) # write the text to the file
f.close() # close the file

f = open('poem.txt', encoding='utf-8') # if no mode is given, read mode is assumed by default
while True:
    line = f.readline()
    if not line: # could also be written as: if len(line) == 0
        break
    print(line, end='')
f.close() # don't forget to close the file

Output

$ python using_file.py
Programming is fun
When the work is done
if you wanna make your work also fun:
    use Python!

Explanation

First, we open a file with the built-in open function. Its signature is:

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) -> file object

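The open() function returns a stream object, which has methods and attributes for getting information about, and manipulating, a stream of characters. In the call we specify the name of the file to open and the mode we want to open it in.

The available modes are: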

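Character	Meaning
`r`	Read mode (default)
`w`	Write mode; the file is truncated first.
`x`	Create a new file and open it for writing. Raises `FileExistsError` if the file already exists.
`a`	Open for writing. If the file already exists, new content is appended to it.
`b`	Binary mode
`t`	Text mode (default)
`+`	Open a disk file for updating (reading and writing)
`U`	Universal newlines mode (kept for backward compatibility; not needed in newer versions of Python, and removed in Python 3.11)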
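The difference between the w, a, and x modes is easiest to see by running them against the same file. The following is a minimal sketch; the file name modes.txt and the temporary directory are only there to keep the example self-contained.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes.txt')

    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w', encoding='utf-8') as f:
        f.write('first\n')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('second\n')      # 'first' is gone after this

    # 'a' keeps the existing content and appends to the end
    with open(path, 'a', encoding='utf-8') as f:
        f.write('third\n')

    with open(path, encoding='utf-8') as f:
        contents = f.read()

    # 'x' refuses to open a file that already exists
    try:
        open(path, 'x', encoding='utf-8')
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(contents))
print(x_failed)
```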

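The encoding attribute reflects the encoding you specified when you called open(). If you did not specify an encoding when opening the file (bad developer!), the encoding attribute reflects the return value of locale.getpreferredencoding(). Note that binary stream objects have no encoding attribute, because Python does no encoding conversion for them.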
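A short sketch of the encoding attribute on text and binary streams follows. The file name sample.txt is an arbitrary choice for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')

    # Text stream: the attribute reports the encoding passed to open()
    with open(path, 'w', encoding='utf-8') as text_file:
        text_encoding = text_file.encoding
        text_file.write('hello\n')

    # Binary stream: there is no encoding attribute at all
    with open(path, 'rb') as binary_file:
        binary_has_encoding = hasattr(binary_file, 'encoding')

print(text_encoding)
print(binary_has_encoding)
```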

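In the loop we read the file one line at a time with the readline method. It returns a complete line, including the trailing newline character. When it returns an empty string, we have reached the end of the file, so we use break to leave the loop.

By default, print() appends a newline. Because each line read from the file already ends with a newline, we pass end='' to suppress the extra one.

Finally, we close the file.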
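The readline loop can also be written by iterating over the file object itself, which yields one line per iteration and stops at the end of the file. This is only a sketch of that alternative; it uses a with block to close the file, and a shortened poem written to a temporary directory so that it runs on its own.

```python
import os
import tempfile

poem = 'Programming is fun\nWhen the work is done\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'poem.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write(poem)

    lines = []
    with open(path, encoding='utf-8') as f:
        for line in f:            # each line still ends with its newline
            lines.append(line)
            print(line, end='')   # so suppress print's own newline
```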

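Stream objects also have several other useful methods:

read(n): reads up to n characters (n is optional). With no argument, it reads the entire contents of the file;
readline(): reads one line;
seek(): moves to a specific byte position in the stream;
tell(): returns the current position in the stream;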
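The methods listed above can be combined as in the sketch below. It treats the value returned by tell() only as a bookmark to hand back to seek(), which is the safe way to use it on a text stream; the variable names are illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'poem.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('Programming is fun\nWhen the work is done\n')

    with open(path, encoding='utf-8') as f:
        first_word = f.read(11)           # read exactly 11 characters
        rest_of_line = f.readline()       # read up to and including the newline
        bookmark = f.tell()               # remember where line 2 starts
        second_line = f.readline()
        f.seek(bookmark)                  # jump back to the bookmark
        second_line_again = f.readline()
        f.seek(0)                         # rewind to the start of the file
        everything = f.read()             # no argument: read it all

print(first_word)
print(repr(rest_of_line))
print(second_line == second_line_again)
```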

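For more, see the io module in the Python 3 documentation, or run $ pydoc io.

Checking Whether a File Is Closed

Stream objects have a useful attribute, closed, which tells you whether the file has been closed.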

>>> f = open('poem.txt', 'w', encoding='utf-8')
>>> f.close()
>>> f.read()
Traceback (most recent call last):
  File "<pyshell#24>", line 1, in <module>
    f.read()
ValueError: I/O operation on closed file.
>>> f.close() # closing again does not raise an exception; it is simply a no-op
>>> f.closed
True

Closing Files Automatically

with open('examples/chinese.txt', encoding='utf-8') as a_file:
    a_file.seek(17)
    a_character = a_file.read(1)
    print(a_character)

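This code is equivalent to a try...finally block, but far more concise. You do not even need to call close() explicitly: when the with block ends, Python calls a_file.close() automatically.

This is what sets it apart: no matter how you leave the with block, Python closes the file, even if you "exit" through an unhandled exception. Yes, even if the code raises an exception and the whole program comes to a sudden halt, Python still guarantees that the file gets closed.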
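To make the equivalence concrete, here is a small sketch that closes one file with an explicit try...finally and another with a with block that is left through an exception. The RuntimeError is raised deliberately to simulate a failure, and the file name demo.txt is arbitrary.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('some text\n')

    # The explicit form: close() runs whether or not read() raises
    a_file = open(path, encoding='utf-8')
    try:
        first_chars = a_file.read(4)
    finally:
        a_file.close()
    closed_after_finally = a_file.closed

    # The with form: the file is closed even though the block raises
    try:
        with open(path, encoding='utf-8') as b_file:
            raise RuntimeError('simulated failure inside the with block')
    except RuntimeError:
        pass
    closed_after_exception = b_file.closed

print(closed_after_finally, closed_after_exception)
```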

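Working with Compressed Files

The Python standard library includes modules for reading and writing compressed files. There are many compression schemes; of these, gzip and bzip2 are the two most popular on non-Windows operating systems.

The gzip module lets you create a stream object for reading or writing a gzip-compressed file. The stream object supports the read() method (if you opened it for reading) or the write() method (if you opened it for writing). This means you can use the techniques you learned for regular files to read and write gzip-compressed files directly, without creating a temporary file to hold the decompressed data.

Example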

$ you@localhost:~$ python3

>>> import gzip
>>> with gzip.open('out.log.gz', mode='wb') as z_file:
...   z_file.write('A nine mile walk is no joke, especially in the rain.'.encode('utf-8'))
... 
>>> exit()

$ you@localhost:~$ ls -l out.log.gz                     
-rw-r--r--  1 mark mark    79 2009-07-19 14:29 out.log.gz
$ you@localhost:~$ gunzip out.log.gz                       
$ you@localhost:~$ cat out.log                             
A nine mile walk is no joke, especially in the rain.
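The session above only writes a compressed file. As a complementary sketch, the same data can be read back with gzip.open(), either as bytes ('rb') or as decoded text ('rt' with an encoding). The temporary directory is used only so the example does not leave files behind.

```python
import gzip
import os
import tempfile

message = 'A nine mile walk is no joke, especially in the rain.'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'out.log.gz')

    # Write compressed bytes, as in the session above
    with gzip.open(path, mode='wb') as z_file:
        z_file.write(message.encode('utf-8'))

    # Read the decompressed bytes back
    with gzip.open(path, mode='rb') as z_file:
        raw = z_file.read()

    # Or let gzip decode the text for you
    with gzip.open(path, mode='rt', encoding='utf-8') as z_file:
        text = z_file.read()

print(raw.decode('utf-8'))
print(text)
```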

Pickle

Python provides a standard module named pickle for storing Python objects in a file and reading them back later.

What can the pickle module store?

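All the native data types that Python supports: booleans, integers, floating-point numbers, complex numbers, strings, bytes objects, byte arrays, and None.
Lists, tuples, dictionaries, and sets containing any combination of native data types.
Lists, tuples, dictionaries, and sets containing any combination of lists, tuples, dictionaries, and sets that contain native data types (and so on, up to the maximum nesting level that Python supports).
Functions, classes, and instances of classes (with caveats).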
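As a quick check of the list above, the sketch below round-trips a nested structure through pickle.dumps() and pickle.loads(), the in-memory counterparts of dump() and load(). The dictionary contents are made up for the illustration.

```python
import pickle

data = {
    'flag': True,
    'count': 3,
    'ratio': 0.5,
    'z': 1 + 2j,
    'name': 'apple',
    'raw': b'\xde\xd5',
    'buffer': bytearray(b'ab'),
    'nothing': None,
    # containers nested inside containers
    'nested': [('a', 1), {'x', 'y'}, {'k': [1, 2, (3, 4)]}],
}

# Serialize to bytes and rebuild a new, equal object
restored = pickle.loads(pickle.dumps(data))

print(restored == data)
print(type(restored['nested'][0]))
```

Unlike JSON, pickle preserves the distinction between tuples, lists, and sets.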

Example

#!/usr/bin/python
#Filename: pickling.py

import pickle

# the name of the file where we will store the object
shoplistfile = 'shoplist.pickle'

# the list of things to buy
shoplist = ['apple', 'mango', 'carrot']

# Write to the file
with open(shoplistfile, 'wb') as f:
    pickle.dump(shoplist, f) # dump the object to the file

del shoplist # destroy the shoplist variable

# Read the object back from the file
with open(shoplistfile, 'rb') as f:
    storedlist = pickle.load(f) # load the object from the file

print(storedlist)

Output

$ python pickling.py
['apple', 'mango', 'carrot']

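Explanation

To store an object in a file, we first open the file in 'wb' (write binary) mode and then call the dump function of the pickle module. This process is called pickling the object.
Next, we retrieve the object with pickle's load function. This process is called unpickling the object.
The pickle module takes a Python data structure and saves it to a file. To do this, it serializes the data structure using what is called the "pickle protocol". The pickle protocol is Python-specific; there is no guarantee of cross-language compatibility. You probably cannot use Perl, PHP, Java, or any other language to do anything useful with the shoplist.pickle file you just created.
Not every Python data structure can be serialized by the pickle module. The pickle protocol has changed many times as new data types were added to the Python language, but it still has limitations. As a result of these changes, compatibility between different versions of Python is not guaranteed either. Newer versions of Python support the older serialization formats, but older versions of Python do not support the newer formats (since they do not support the newer data types).
Unless you specify otherwise, the functions in the pickle module use the module's default protocol, which in Python 3.0 was the latest one. This gives you the most flexibility in the types of data you can serialize, but it also means the resulting file cannot be read by older versions of Python that do not support that protocol. The newer versions of the pickle protocol are binary formats. Be sure to open your pickle files in binary mode; otherwise the write will fail or the data will be corrupted.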
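If a pickle has to be readable by an older Python, the protocol can be chosen explicitly with the protocol argument. The sketch below uses pickle.dumps() to stay in memory; pickle.dump() accepts the same argument. The constants pickle.DEFAULT_PROTOCOL and pickle.HIGHEST_PROTOCOL report what the running interpreter uses, and their values depend on the Python version.

```python
import pickle

shoplist = ['apple', 'mango', 'carrot']

# Ask for protocol 2 explicitly instead of the module default
data_v2 = pickle.dumps(shoplist, protocol=2)

# With no argument, the module's default protocol is used
data_default = pickle.dumps(shoplist)

# Binary protocols from version 2 on start with a PROTO opcode (0x80)
# followed by the protocol number
print(data_v2[:2])
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
print(pickle.loads(data_v2) == pickle.loads(data_default))
```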

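Pickle Protocol Versions

The pickle protocol has been around for many years, and it has matured as Python itself has matured. As of Python 3.0 there were four versions of the pickle protocol (later releases added more: version 4 in Python 3.4 and version 5 in Python 3.8).

Python 1.x had two pickle protocols, a text-based format ("version 0") and a binary format ("version 1").
Python 2.3 introduced a new pickle protocol ("version 2") to handle new functionality in Python class objects. It is a binary format.
Python 3.0 introduced another pickle protocol ("version 3") with explicit support for bytes objects and byte arrays. It is a binary format.

In practice, this means that while Python 3 can read data pickled with protocol version 2, Python 2 cannot read data pickled with protocol version 3.

Many articles about the pickle module mention cPickle. In Python 2 there were two implementations of the pickle module, one written in pure Python and another written in C (but still callable from Python). In Python 3 these two modules have been consolidated, so you should always just import pickle. You may find those articles useful, but you should ignore the now-obsolete information about cPickle.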

Debugging Pickle Files

Example

>>> import pickletools
>>> with open('entry.pickle', 'rb') as f:
...     pickletools.dis(f)
    0: \x80 PROTO      3
    2: }    EMPTY_DICT
    3: q    BINPUT     0
    5: (    MARK
    6: X        BINUNICODE 'published_date'
   25: q        BINPUT     1
   27: c        GLOBAL     'time struct_time'
   45: q        BINPUT     2
   47: (        MARK
   48: M            BININT2    2009
   51: K            BININT1    3
   53: K            BININT1    27
   55: K            BININT1    22
   57: K            BININT1    20
   59: K            BININT1    42
   61: K            BININT1    4
   63: K            BININT1    86
   65: J            BININT     -1
   70: t            TUPLE      (MARK at 47)
   71: q        BINPUT     3
   73: }        EMPTY_DICT
   74: q        BINPUT     4
   76: \x86     TUPLE2
   77: q        BINPUT     5
   79: R        REDUCE
   80: q        BINPUT     6
   82: X        BINUNICODE 'comments_link'
  100: q        BINPUT     7
  102: N        NONE
  103: X        BINUNICODE 'internal_id'
  119: q        BINPUT     8
  121: C        SHORT_BINBYTES 'ÞÕ´ø'
  127: q        BINPUT     9
  129: X        BINUNICODE 'tags'
  138: q        BINPUT     10
  140: X        BINUNICODE 'diveintopython'
  159: q        BINPUT     11
  161: X        BINUNICODE 'docbook'
  173: q        BINPUT     12
  175: X        BINUNICODE 'html'
  184: q        BINPUT     13
  186: \x87     TUPLE3
  187: q        BINPUT     14
  189: X        BINUNICODE 'title'
  199: q        BINPUT     15
  201: X        BINUNICODE 'Dive into history, 2009 edition'
  237: q        BINPUT     16
  239: X        BINUNICODE 'article_link'
  256: q        BINPUT     17
  258: X        BINUNICODE 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition'
  337: q        BINPUT     18
  339: X        BINUNICODE 'published'
  353: q        BINPUT     19
  355: \x88     NEWTRUE
  356: u        SETITEMS   (MARK at 5)
  357: .    STOP
highest protocol among opcodes = 3

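Explanation

The most interesting piece of information in this disassembly is on the last line, because it includes the version of the pickle protocol with which the file was saved. There is no explicit version marker in the pickle protocol. To determine which protocol version was used to store a pickle file, you need to look at the markers ("opcodes") within the pickled data and use hard-coded knowledge of which opcodes were introduced with each version of the protocol. The pickletools.dis() function does exactly that, and it prints the result in the last line of the disassembly output. Here is a function that returns just the version number without printing anything: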

import pickletools

def protocol_version(file_object):
    maxproto = -1
    for opcode, arg, pos in pickletools.genops(file_object):
        maxproto = max(maxproto, opcode.proto)
    return maxproto

Here it is in action:

>>> import pickleversion
>>> with open('entry.pickle', 'rb') as f:
...     v = pickleversion.protocol_version(f)
>>> v
3
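The same idea can be tried without a file on disk, since pickletools.genops() also accepts a bytes object. The sketch below is an assumption-laden variant of the function above: highest_opcode_protocol is a hypothetical name, and the sample objects are chosen so that the opcodes involved are easy to predict.

```python
import pickle
import pickletools

def highest_opcode_protocol(pickled_bytes):
    # Scan every opcode and keep the highest protocol that introduced one of them
    return max(opcode.proto
               for opcode, arg, pos in pickletools.genops(pickled_bytes))

# A list of strings pickled with protocol 2 uses no opcode newer than protocol 2
v2 = highest_opcode_protocol(pickle.dumps(['apple'], protocol=2))

# A bytes object pickled with protocol 3 uses an opcode introduced in protocol 3
v3 = highest_opcode_protocol(pickle.dumps(b'\xde\xd5\xb4\xf8', protocol=3))

print(v2, v3)
```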

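JSON

Python 3 includes a json module in the standard library. Like the pickle module, the json module has functions for serializing data structures, storing the serialized data on disk, loading serialized data from disk, and unserializing the data back into new Python objects. But there are some important differences, too.

First, the JSON data format is text-based, not binary. RFC 4627 defines the JSON format and how different types of data must be encoded as text. For example, a boolean value is stored as either the five-character string 'false' or the four-character string 'true'. All JSON values are case-sensitive.

Second, as with any text-based format, there is the issue of whitespace. JSON allows arbitrary amounts of whitespace (spaces, tabs, carriage returns, and line feeds) between values. This whitespace is "insignificant", which means that JSON encoders can add as much or as little whitespace as they like, and JSON decoders are required to ignore the whitespace between values. This allows you to "pretty-print" your JSON data, nesting values at different indentation levels so you can read it in a standard browser or text editor. Python's json module has options for pretty-printing during encoding.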
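A small sketch of what "insignificant whitespace" means in practice: the two JSON texts below differ only in spacing and line breaks, and the decoder produces the same Python object from both. The sample values are made up.

```python
import json

compact = '{"id":256,"tags":["docbook","html"],"published":true}'

spaced = '''
{
    "id" : 256,
    "tags" : [ "docbook",
               "html" ],
    "published" : true
}
'''

# Whitespace between values is ignored by the decoder
same = json.loads(compact) == json.loads(spaced)

# The encoder can add whitespace back for readability
pretty = json.dumps(json.loads(compact), indent=2)

print(same)
print(pretty)
```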

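Third, there is the perennial problem of character encoding. JSON encodes values as plain text, but as you know, "there ain't no such thing as plain text." JSON must be stored in a Unicode encoding (UTF-32, UTF-16, or the default, UTF-8), and section 3 of RFC 4627 defines how to tell which encoding is being used.

Writing JSON

Example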

>>> basic_entry = {}                                           
>>> basic_entry['id'] = 256
>>> basic_entry['title'] = 'Dive into history, 2009 edition'
>>> basic_entry['tags'] = ('diveintopython', 'docbook', 'html')
>>> basic_entry['published'] = True
>>> basic_entry['comments_link'] = None
>>> import json
>>> with open('basic.json', mode='w', encoding='utf-8') as f:  
...     json.dump(basic_entry, f)

Output

So what does the resulting serialized JSON look like?

you@localhost:~/diveintopython3/examples$ cat basic.json
{"published": true, "tags": ["diveintopython", "docbook", "html"], "comments_link": null,
"id": 256, "title": "Dive into history, 2009 edition"}

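JSON can contain arbitrary whitespace between values, and the json module provides an easy way to take advantage of this to create more readable JSON files.

Example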

>>> with open('basic-pretty.json', mode='w', encoding='utf-8') as f:
...     json.dump(basic_entry, f, indent=2)
>>>

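Explanation

If you pass an indent parameter to the json.dump() function, it makes the resulting JSON file more readable, at the expense of a larger file size. The indent parameter is an integer. 0 means "put each value on its own line." A number greater than 0 means "put each value on its own line, and use this number of spaces to indent nested data structures."

Output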

you@localhost:~/diveintopython3/examples$ cat basic-pretty.json
{
  "published": true, 
  "tags": [
    "diveintopython", 
    "docbook", 
    "html"
  ], 
  "comments_link": null, 
  "id": 256, 
  "title": "Dive into history, 2009 edition"
}

Reading JSON

Example

>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f)
>>> entry
{'comments_link': None,
 'internal_id': {'__class__': 'bytes', '__value__': [222, 213, 180, 248]},
 'title': 'Dive into history, 2009 edition',
 'tags': ['diveintopython', 'docbook', 'html'],
 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
 'published_date': {'__class__': 'time.asctime', '__value__': 'Fri Mar 27 22:20:42 2009'},
 'published': True}

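JSON Support for Python Data Types

Since JSON is not Python-specific, there are some mismatches in its coverage of Python data types. Some of them are simply naming differences, but two important Python data types are completely missing: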

JSON	Python 3
object	dictionary
array	list
string	string
integer	integer
real number	float
true	True
false	False
null	None

And although JSON has very good support for strings, it has no support for bytes objects or byte arrays. Also, JSON has no separate type corresponding to tuples.
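Both gaps can be observed directly. In the sketch below, a tuple survives encoding but comes back as a list, while a bytes object is rejected by the encoder with a TypeError. The dictionary keys are borrowed from the examples in this section.

```python
import json

# A tuple is written as a JSON array, so it comes back as a list
entry = {'tags': ('diveintopython', 'docbook', 'html')}
round_tripped = json.loads(json.dumps(entry))

# A bytes object has no JSON equivalent, so encoding fails
try:
    json.dumps({'internal_id': b'\xde\xd5\xb4\xf8'})
    bytes_rejected = False
except TypeError:
    bytes_rejected = True

print(round_tripped['tags'])
print(bytes_rejected)
```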

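Even though JSON has no built-in support for bytes, that does not mean you cannot serialize bytes objects. The json module provides extensibility hooks for encoding and decoding unknown data types. (By "unknown", I mean "not defined in JSON". Obviously the json module knows about byte arrays, but it is constrained by the limitations of the JSON specification.) If you want to encode bytes or other data types that JSON does not support natively, you need to provide custom encoders and decoders for those types. For example, if storing bytes is important to you, you can define your own "mini-serialization format."

Example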

import time

def to_json(python_object):
    if isinstance(python_object, time.struct_time):          
        return {'__class__': 'time.asctime',
                '__value__': time.asctime(python_object)}    
    if isinstance(python_object, bytes):                                
        return {'__class__': 'bytes',
                '__value__': list(python_object)}                       
    raise TypeError(repr(python_object) + ' is not JSON serializable')

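Explanation

To define your own "mini-serialization format" for a data type that JSON does not support natively, just define a function that takes a Python object as a parameter. This Python object will be the actual object that the json.dump() function is unable to serialize by itself.

Your custom serialization function should check the type of the Python object that the json.dump() function passed to it. This is not strictly necessary if your function only serializes one data type, but it makes it clear what case your function is covering, and it makes the function easier to extend if you need to serialize more data types later.

In this example, I convert a bytes object into a dictionary. The __class__ key holds the original data type (as a string, 'bytes'), and the __value__ key holds the actual value. Of course this cannot be a bytes object; the whole point is to convert it into something that can be serialized in JSON! A bytes object is just a sequence of integers, each in the range 0–255. We can use the list() function to convert the bytes object into a list of integers. So b'\xDE\xD5\xB4\xF8' becomes [222, 213, 180, 248]. (Do the math! It works! The byte \xDE in hexadecimal is 222 in decimal, \xD5 is 213, and so on.)

Similarly, the time.struct_time structure is converted into a dictionary that contains only JSON-serializable values. In this example, the easiest way to convert a date and time into a JSON-serializable value is to convert it to a string with the time.asctime() function. The time.asctime() function converts that nasty-looking time.struct_time into the string 'Fri Mar 27 22:20:42 2009'.

The last line is important. The data structure you are serializing may contain types that neither the built-in JSON serializer nor your custom serializer can handle. In that case, your custom serializer must raise a TypeError so that the json.dump() function knows that your custom serializer did not recognize the type.

And that is it; you do not need to do anything else. In particular, this custom serialization function returns a Python dictionary, not a string. You are not doing all of the serializing to JSON yourself; you are only doing the part that converts to a supported data type. The json.dump() function does the rest.

Output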

>>> import customserializer
>>> with open('entry.json', 'w', encoding='utf-8') as f:
...     json.dump(entry, f, default=customserializer.to_json)
...
>>> exit()

you@localhost:~/diveintopython3/examples$ ls -l entry.json
-rw-r--r-- 1 you  you  391 Aug  3 13:34 entry.json
you@localhost:~/diveintopython3/examples$ cat entry.json
{"published_date": {"__class__": "time.asctime", "__value__": "Fri Mar 27 22:20:42 2009"},
"comments_link": null, "internal_id": {"__class__": "bytes", "__value__": [222, 213, 180, 248]},
"tags": ["diveintopython", "docbook", "html"], "title": "Dive into history, 2009 edition",
"article_link": "http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition",
"published": true}

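When loading, json.load() knows nothing about any conversion function you may have passed to json.dump(). What you need is the inverse of the to_json() function: a function that takes a custom-converted JSON object and converts it back into the original Python data type.

Example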

# add this to customserializer.py
def from_json(json_object):                                   
    if '__class__' in json_object:                            
        if json_object['__class__'] == 'time.asctime':
            return time.strptime(json_object['__value__'])    
        if json_object['__class__'] == 'bytes':
            return bytes(json_object['__value__'])            
    return json_object

Output

>>> import customserializer
>>> with open('entry.json', 'r', encoding='utf-8') as f:
...     entry = json.load(f, object_hook=customserializer.from_json)  
... 
>>> entry                                                             
{'comments_link': None,
 'internal_id': b'\xDE\xD5\xB4\xF8',
 'title': 'Dive into history, 2009 edition',
 'tags': ['diveintopython', 'docbook', 'html'],
 'article_link': 'http://diveintomark.org/archives/2009/03/27/dive-into-history-2009-edition',
 'published_date': time.struct_time(tm_year=2009, tm_mon=3, tm_mday=27, tm_hour=22, tm_min=20, tm_sec=42, tm_wday=4, tm_yday=86, tm_isdst=-1),
 'published': True}
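The two hooks can be exercised together in memory with json.dumps() and json.loads(), which take the same default and object_hook arguments as dump() and load(). The sketch below is deliberately limited to the bytes case and redefines simplified versions of to_json and from_json so that it runs on its own; it is not the full customserializer module.

```python
import json

def to_json(python_object):
    # Called only for objects the encoder cannot handle by itself
    if isinstance(python_object, bytes):
        return {'__class__': 'bytes',
                '__value__': list(python_object)}
    raise TypeError(repr(python_object) + ' is not JSON serializable')

def from_json(json_object):
    # Called for every JSON object (dict) the decoder builds
    if json_object.get('__class__') == 'bytes':
        return bytes(json_object['__value__'])
    return json_object

entry = {
    'title': 'Dive into history, 2009 edition',
    'internal_id': b'\xde\xd5\xb4\xf8',
    'published': True,
}

text = json.dumps(entry, default=to_json)
restored = json.loads(text, object_hook=from_json)

print(text)
print(restored == entry)
```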