Python

2019-09-17
Author: Qiang
~75.19K characters

1. Python Strings
   1.1. Formatted Strings
   1.2. String Methods
2. Numbers in Python Version 2: Four Numeric Types
3. Numbers in Python Version 3: Three Numeric Types
4. Arithmetic in Python V2
5. Arithmetic in Python V3
6. Interactive Help: dir() and help()
7. Python Booleans
8. Datetime Module (Dates and Times)
9. If, Then, Else in Python
10. Python Functions
   10.1. Keyword Arguments
   10.2. Variable Arguments
   10.3. Recursive Functions
11. Sets in Python
12. Python Lists
   12.1. 2D Lists
13. Python Dictionaries
   13.1. Dictionary Comprehensions
14. Python Tuples
15. Logging in Python
16. Recursion, the Fibonacci Sequence and Memoization
17. Python Random Number Generator: the Random Module
18. CSV Files in Python
19. List Comprehension
20. Generators and Iterators
   20.1. Generators
   20.2. Iterators
21. Python Classes and Objects
22. Inheritance
   22.1. super().__init__()
23. Modules
   23.1. Importing Modules
   23.2. The from … import * Statement
   23.3. The __name__ Attribute
24. Packages
25. Lambda Expressions & Anonymous Functions
26. Map, Filter, and Reduce Functions
   26.1. The map Function
   26.2. The filter Function
   26.3. The reduce Function
27. Sorting in Python: sort() vs. sorted()
28. Text Files in Python
   28.1. Reading Files
   28.2. Writing Files
29. Exceptions in Python
30. The Path Class in pathlib
   30.1. Path.exists()
   30.2. Path.mkdir()
   30.3. Path.rmdir()
   30.4. Path.glob()
31. Miscellaneous
   31.1. zip and the Asterisk
   31.2. Magic Methods
   31.3. Class Methods
   31.4. Decorators
32. References

I came across a really cool Python tutorial on YouTube.

Python Strings

In Python, you can create a string with single, double, or triple quotes.

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> message = "Meet me tonight."
>>> print(message)
Meet me tonight.
>>>
>>> message2 = 'The clock strikes at midnight.'
>>> print(message2)
The clock strikes at midnight.

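When a string gets complicated and contains both single and double quotes, there are two ways to handle it: escape the quotes with a backslash (\), or wrap the string in triple quotes.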

>>> movie_quote = """One of my favorite lines from The Godfather is:
... "I'm going to make him an offer he can't refuse."
... Do you know who said this?"""
>>>

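When you use triple quotes (""") and press Enter, Python shows the ... continuation prompt, telling you that the string continues on the next line.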
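Since the session above only shows the triple-quote approach, here is a small illustrative sketch of both options side by side (the sentence is just sample text):

```python
# Option 1: escape the inner single quotes with a backslash
quote_escaped = 'He said, "I\'m going to make him an offer he can\'t refuse."'

# Option 2: triple quotes accept both kinds of quote without escaping
quote_triple = '''He said, "I'm going to make him an offer he can't refuse."'''

print(quote_escaped)
print(quote_escaped == quote_triple)   # True: both produce the same string
```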

Formatted Strings

Suppose we want to print John [Smith] is a coder. We might do it like this:

first = 'John'
last = 'Smith'
message = first + ' [' + last + '] is a coder'
print(message)

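This works, but it is not ideal: as the text gets more complex, string concatenation makes the code hard to read. A formatted string is much cleaner: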

first = 'John'
last = 'Smith'

msg = f'{first} [{last}] is a coder'
print(msg)

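To define a formatted string (an f-string, available since Python 3.6), prefix the string with f and put variable names inside curly braces; their values are inserted into the string dynamically.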
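The braces in an f-string are not limited to plain variable names. A short sketch, reusing first and last from above, showing that arbitrary expressions and format specifiers are also allowed:

```python
first = 'John'
last = 'Smith'

msg = f'{first} [{last}] is a coder'
# Any expression may go inside the braces: method calls, arithmetic, ...
shout = f'{first.upper()} has {len(first) + len(last)} letters in his name'
# ...and a format specifier may follow a colon, here two decimal places
price = f'{3.14159:.2f}'

print(msg)     # John [Smith] is a coder
print(shout)   # JOHN has 9 letters in his name
print(price)   # 3.14
```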

String Methods

len()
.upper()
.lower()
.find()
.replace()
'...' in variable

The len() function returns the number of characters in a string, which is handy, for example, for enforcing a limit on input length.

course = 'Python for Beginners'
print(len(course))

>>> course = 'Python for Beginners'
>>> print(len(course))
20

To convert every character to uppercase, use the .upper() method; to convert them to lowercase, use the .lower() method.

course = 'Python for Beginners'
print(course.upper())
print(course.lower())

>>> print(course.upper())
PYTHON FOR BEGINNERS
>>> print(course.lower())
python for beginners

Note that these methods do not modify the original string; they create a new string and return it.

To find a character or a sequence of characters in a string, use the .find() method.

course = 'Python for Beginners'
print(course.find('P'))
print(course.find('Beginners'))

>>> print(course.find('P'))
0
>>> print(course.find('Beginners'))
11

We get 0 because the first uppercase P in the string is at index 0.
We get 11 because Beginners starts at index 11.

To replace characters or substrings, use the .replace() method:

course = 'Python for Beginners'
print(course.replace('Beginners', 'Absolute Beginners'))

>>> print(course.replace('Beginners', 'Absolute Beginners'))
Python for Absolute Beginners

Note that find() and replace() are case-sensitive!
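To make the case-sensitivity concrete, here is a small sketch; it also shows that find() returns -1 when nothing matches, which the examples above don't cover:

```python
course = 'Python for Beginners'

print(course.find('Beginners'))            # 11
print(course.find('beginners'))            # -1: no match, because find() is case-sensitive
print(course.replace('python', 'Java'))    # Python for Beginners (nothing replaced)
print(course.lower().find('beginners'))    # 11: lower-case first for a case-insensitive search
```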

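Finally, to check whether a string contains a character or sequence of characters, use the in operator. For example, to find out whether the string contains the word Python: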

course = 'Python for Beginners'
print('Python' in course)
print('python' in course)

>>> print('Python' in course)
True
>>> print('python' in course)
False

Note the difference from find(): find() returns the index of a character or character sequence, whereas the in operator returns a Boolean.

Numbers in Python Version 2: Four Numeric Types

Python 2 has four numeric types: int, long, float, and complex. A long literal ends with L, and the imaginary part of a complex literal ends with j.

▶ python
Python 2.7.10 (default, Feb 22 2019, 21:55:15) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 28
>>> type(a)
<type 'int'>
>>> a
28
>>> print(a)
28
>>> import sys
>>> sys.maxint
9223372036854775807
>>> b = 2147483648
>>> type(b)
<type 'int'>
>>> c = 9223372036854775808
>>> type(c)
<type 'long'>
>>> d = -sys.maxint - 1
>>> type(d)
<type 'int'>
>>> e = 2.718
>>> type(e)
<type 'float'>
>>> z = 3 + 5.7j
>>> type(z)
<type 'complex'>
>>> z.real
3.0
>>> z.imag
5.7

Numbers in Python Version 3: Three Numeric Types

Python 3 has three numeric types: int, float, and complex.

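In Python 3 you don't have to worry about the size of integers: as long as your computer has enough memory, you can create integers of any size. There is no size limit and no overflow.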
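A quick sketch of this in Python 3: 2 ** 100 is far beyond a 64-bit machine word, yet it is still an ordinary int. sys.maxsize still exists, but it bounds container sizes, not integers:

```python
import sys

big = 2 ** 100
print(big)                 # 1267650600228229401496703205376
print(type(big))           # <class 'int'>: there is no separate long type
print(big + 1 > big)       # True: no overflow, no wraparound
print(sys.maxsize + 1)     # fine too: sys.maxsize limits container sizes, not int
```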

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> z = 2 - 6.1j
>>> type(z)
<class 'complex'>
>>> z.real
2.0
>>> z.imag
-6.1
>>>

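Note: the real part was entered as an integer but is displayed as a float, because Python always stores the real and imaginary parts of a complex number as floats.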

Arithmetic in Python V2

Type widening order (narrowest to widest): $int < long < float < complex$

▶ python
Python 2.7.10 (default, Feb 22 2019, 21:55:15) 
[GCC 4.2.1 Compatible Apple LLVM 10.0.1 (clang-1001.0.37.14)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 2 # int
>>> b = 3L # long
>>> c = 6.0  # float
>>> d = 12 + 0j  # complex number
>>> # Addition
... a + b  # int + long
5L
>>> # Subtraction
... c - b  # float - long
3.0
>>> # Multiplication
... a * c  # int * float
12.0
>>> # Division
... d / c  # complex / float
(2+0j)

Integer division

>>> 16 / 5
3
>>> 16 % 5
1
>>> float(16) / 5
3.2
>>> 2 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

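Summary: when you add, subtract, multiply, or divide two numbers of different types, Python automatically converts the result to the wider type. When you divide two integers, Python 2 returns only the integer quotient (rounded down), so convert one operand with float() if you need the exact result. Watch out for a divisor of 0.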

Arithmetic in Python V3

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 16/5
3.2
>>> 20/5
4.0
>>> 16 % 5
1
>>> 16//5
3
>>> 2/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

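Summary: as in Python 2, arithmetic on two numbers of different types converts the result to the wider type. Unlike Python 2, however, the / operator in Python 3 returns the true result as a float (or complex) rather than just the quotient; use // when you want the floor quotient. Watch out for a divisor of 0.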
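A short sketch summarizing the Python 3 division operators; the divmod() built-in and the negative-number case are additions not shown in the session above:

```python
print(16 / 5)          # 3.2   true division always returns a float
print(20 / 5)          # 4.0   ...even when the result is a whole number
print(16 // 5)         # 3     floor division
print(16 % 5)          # 1     remainder
print(divmod(16, 5))   # (3, 1) quotient and remainder in one call
print(-16 // 5)        # -4    // rounds toward negative infinity, not toward zero

try:
    2 / 0
except ZeroDivisionError as e:
    print('Error:', e) # Error: division by zero
```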

Interactive Help: dir() and help()

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 
'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
>>> help(hex)
Help on built-in function hex in module builtins:

hex(number, /)
    Return the hexadecimal representation of an integer.

    >>> hex(12648430)
    '0xc0ffee'
>>> hex(10)
'0xa'
>>> 0xa
10

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
>>> help(radians)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'radians' is not defined
# radians lives in the math module, so you must refer to it by its full name
>>> help(math.radians)
# Likewise, to call it you must use the full name
>>> math.radians(180)
3.141592653589793
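If you'd rather not type the module prefix every time, you can import the name itself. A minimal sketch, which also shows that dir() returns an ordinary list you can filter:

```python
import math
from math import radians         # binds the name radians in the current namespace

print(math.radians(180))         # 3.141592653589793
print(radians(180))              # same function, no module prefix needed
print(radians is math.radians)   # True

# dir() returns a plain list, so it can be filtered, e.g. to hide "dunder" names
public_names = [name for name in dir(math) if not name.startswith('_')]
print('radians' in public_names) # True
```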

Python Booleans

A boolean has only two values, True and False, and they must be capitalized.

>>> a = 3
>>> b = 5
>>> a == b
False
>>> a != b
True

In Python, 0 is treated as False, and so is an empty string (a string containing just a space is not empty, so it is True):

>>> type(True)
<class 'bool'>
>>> bool(2)
True
>>> bool(-2.1)
True
>>> bool(0)
False
>>> bool('abc')
True
>>> bool(' ')
True
>>> bool('')
False
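The same rule extends to other "empty" values. A short sketch listing the common falsy values in Python 3 next to some truthy ones:

```python
falsy = [False, None, 0, 0.0, 0j, '', [], (), {}, set()]
truthy = [True, 2, -2.1, 'abc', ' ', [0], (None,)]

print(all(not bool(v) for v in falsy))   # True: every "empty" or zero value is falsy
print(all(bool(v) for v in truthy))      # True: a list holding 0 is still non-empty

name = ''
print(name or 'anonymous')               # anonymous: or returns the first truthy operand
```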

Datetime Module (Dates and Times)

>>> import datetime
>>> help(datetime.date)

>>> gvr = datetime.date(1956, 1, 31)
>>> print(gvr)
1956-01-31
>>> print(gvr.year)
1956
>>> print(gvr.month)
1
>>> print(gvr.day)
31

The datetime module also has a timedelta class, which represents a duration and can be used to shift dates:

>>> mail = datetime.date(2000, 1, 1)
>>> dt = datetime.timedelta(100)
>>> print(mail + dt)
2000-04-10

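To choose how a date or time is displayed, for example the full weekday name, then the month, then the day and the year, the traditional approach is to call the date's strftime method.
Another approach is to create a format string: write any text you like, put the date format codes inside braces, and call the string's format() method with the date or time object as the argument.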

>>> # Day-name, Month-name Day-#, Year
...
>>> print(gvr.strftime("%A, %B, %d, %Y"))
Tuesday, January, 31, 1956
>>>
>>> message = "GVR was born on {:%A, %B, %d, %Y}."
>>> print(message.format(gvr))
GVR was born on Tuesday, January, 31, 1956.

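Use the date class to create a date, with arguments year, month, day. Use the time class to create a time, with arguments hour, minute, second. Finally, use the datetime class to create a value that has both: its first three arguments are year, month, day, and the last three are hour, minute, second.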

>>> dir(datetime)
['MAXYEAR', 'MINYEAR', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'date', 'datetime', 'datetime_CAPI', 'time', 'timedelta', 'timezone', 'tzinfo']
>>>
>>> launch_date = datetime.date(2017, 3, 20)
>>> launch_time = datetime.time(22, 27, 0)
>>> launch_datetime = datetime.datetime(2017, 3, 20, 22, 27, 0)
>>>
>>> print(launch_date)
2017-03-20
>>> print(launch_time)
22:27:00
>>> print(launch_datetime)
2017-03-20 22:27:00

Like a date object, a time object exposes each of its parts: access .hour, .minute, and .second in turn. A datetime object supports these attributes too.

>>> print(launch_time.hour)
22
>>> print(launch_time.minute)
27
>>> print(launch_time.second)
0

To get the current date and time, use the today() method of the datetime class:

Module: datetime
Class: datetime
Method: today()

>>> now = datetime.datetime.today()
>>> print(now)
2019-09-27 23:18:54.381467
>>> print(now.microsecond)
381467

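To convert a date-time string into a datetime object, use the strptime method of the datetime class, which parses a time string into a datetime. The first argument is the string and the second is its format. Printing the result shows Python's standard date format, because it is now a datetime object rather than a string.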

Module: datetime
Class: datetime
Method: strptime()

>>> moon_landing = "7/20/1969"
>>> moon_landing_datetime = datetime.datetime.strptime(moon_landing, "%m/%d/%Y")
>>> print(moon_landing_datetime)
1969-07-20 00:00:00
>>> print(type(moon_landing_datetime))
<class 'datetime.datetime'>
>>>

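Summary: the datetime module contains a timedelta class, which stores the difference between dates and times; adding or subtracting a timedelta shifts a date.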
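A short sketch of timedelta arithmetic in both directions, reusing dates from this section; the landing time is a made-up value for illustration:

```python
import datetime

# Subtracting two dates gives a timedelta; this reverses the 100-day example above
mail = datetime.date(2000, 1, 1)
delivered = datetime.date(2000, 4, 10)
print((delivered - mail).days)       # 100

# Hypothetical landing time three hours after the launch used above
launch = datetime.datetime(2017, 3, 20, 22, 27, 0)
landing = datetime.datetime(2017, 3, 21, 1, 27, 0)
flight = landing - launch
print(flight)                        # 3:00:00
print(flight.total_seconds())        # 10800.0

# Adding a timedelta built from keyword arguments shifts a datetime
print(launch + datetime.timedelta(days=1, hours=2))  # 2017-03-22 00:27:00
```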

If, Then, Else in Python

age = int(input('Please enter a number between 1 and 100: '))
if age >= 18:
    print('adult')
elif age >= 6:
    print('teenager')
else:
    print('kid')

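Mind the indentation: Python uses it to mark code blocks. PEP 8 recommends 4 spaces per level (a Tab also works), but don't mix tabs and spaces in the same block.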
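Because input() makes the snippet hard to test, here is the same branching logic wrapped in a function (classify is a hypothetical name). Python checks the conditions from top to bottom and runs only the first branch whose condition is true, which is why 20 gives adult rather than teenager:

```python
def classify(age):
    """Return a label for age using the same thresholds as the if/elif/else above."""
    if age >= 18:
        return 'adult'
    elif age >= 6:       # only reached when age < 18
        return 'teenager'
    else:                # age < 6
        return 'kid'

for age in (20, 12, 3):
    print(age, classify(age))
```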

Python Functions

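In this section on functions we focus on parameters. Python functions support positional parameters, default parameters, variable parameters, keyword parameters, and named keyword (keyword-only) parameters.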

Keyword Arguments

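Python functions can accept keyword arguments. The example below is a function that converts a height in US units to centimeters (1 inch = 2.54 cm, 1 foot = 12 inches). Keyword arguments make the function more flexible and the code cleaner.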

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def cm(feet = 0, inches = 0):
...     '''Converts a length from feet and inches to centimeters.'''
...     inches_to_cm = inches * 2.54
...     feet_to_cm = feet * 12 * 2.54
...     return inches_to_cm + feet_to_cm
...
>>> cm(feet = 5)
152.4
>>> cm(inches = 70)
177.8
>>> cm(feet = 5, inches = 8)
172.72

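When you create a function there are two kinds of parameters: keyword parameters, written with an = and a default value, and required parameters, written without =. You can use both kinds in one function, but the order matters: every required parameter must come before the keyword parameters.
For example, if we create a function g whose first parameter is a keyword parameter, Python reports an error. The message mentions a default argument, which is another name for a keyword parameter.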

>>> def g(x = 0, y):
...     return x + y
...
  File "<stdin>", line 1
SyntaxError: non-default argument follows default argument

To fix it, put the non-default parameter first. Such parameters are also called required arguments, because a value must be supplied for them.

>>> def g(y, x = 0):
...     return x + y
...

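To call this function you must give a value for y, while the keyword parameter x is optional: if you don't give x a value, its default is used. To set x you usually name it explicitly (x=3), although it can also be passed positionally as the second argument. Required arguments don't need to be named; they are matched by position.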

>>> g(5)
5
>>> g(7, x = 3)
10
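A small sketch of the calling styles just described, reusing g; the last two calls are additions that the session above does not show:

```python
def g(y, x=0):
    return x + y

print(g(5))          # 5: x falls back to its default, 0
print(g(7, x=3))     # 10: x set by name
print(g(7, 3))       # 10: a default parameter can also be filled by position
print(g(x=3, y=7))   # 10: when every argument is named, their order doesn't matter
```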

When defining default parameters, remember one thing: a default parameter must point to an immutable object!

def add_end(L=[]):
    L.append('END')
    return L

>>> add_end()
['END']
>>> add_end()
['END', 'END']
>>> add_end()
['END', 'END', 'END']

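The first call looks fine, but every later call to add_end() gives the wrong result.

Here is why:

When a Python function is defined, the value of the default parameter L is computed once, namely []. Because L is a variable pointing to that object [], if a call modifies the contents of L, the next call sees the modified contents; the default is no longer the [] from the definition.

To fix the example above, we can use the immutable object None: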

def add_end(L=None):
    if L is None:
        L = []
    L.append('END')
    return L

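Why design immutable objects like str and None? Once an immutable object is created, its internal data cannot be changed, which eliminates errors caused by modifying data. Moreover, since the object never changes, several threads can read it at the same time without locking. When writing programs, if you can design an object to be immutable, do so.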
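To check that the None version really behaves, here is a sketch that runs both versions side by side. The buggy one is renamed add_end_buggy (a hypothetical name) so the two can coexist; the function's __defaults__ attribute shows the shared default list growing:

```python
def add_end_buggy(L=[]):
    L.append('END')          # mutates the single list created at definition time
    return L

def add_end(L=None):
    if L is None:            # a fresh list is created on every call that omits L
        L = []
    L.append('END')
    return L

add_end_buggy()
add_end_buggy()
print(add_end_buggy.__defaults__)  # (['END', 'END'],)

print(add_end())          # ['END']
print(add_end())          # ['END'] again
print(add_end([1, 2]))    # [1, 2, 'END']
```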

Variable Arguments

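Suppose we want to compute a² + b² + c² + … for an unknown number of values. Our first idea might be to pass a, b, c, … in as a list or tuple. With a variable parameter, the function can instead be defined like this:

def calc(*numbers):
    total = 0            # avoid naming this sum, which would shadow the built-in sum()
    for n in numbers:
        total = total + n * n
    return total

Compared with a list or tuple parameter, defining a variable parameter only adds an * in front of the parameter name. Inside the function, numbers receives a tuple, so the body of the function is unchanged. But the function can now be called with any number of arguments, including none: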

>>> calc(1, 2)
5
>>> calc()
0

Python lets you put an * in front of a list or tuple to pass its elements as variable arguments:

>>> nums = [1, 2, 3]
>>> calc(*nums)
14

*nums passes all the elements of the list nums as variable arguments. This idiom is very useful and very common.
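The list of parameter kinds at the start of this section also mentions keyword parameters (**kw) and named keyword parameters, which aren't demonstrated here. A minimal sketch combining them in one hypothetical function, person, so you can see which argument lands where:

```python
def person(name, age, *args, city='Beijing', **kw):
    # name, age : positional parameters
    # *args     : extra positional arguments, collected into a tuple
    # city      : named keyword (keyword-only) parameter, since it follows *args
    # **kw      : any other keyword arguments, collected into a dict
    return name, age, args, city, kw

print(person('Bob', 35))
# ('Bob', 35, (), 'Beijing', {})
print(person('Adam', 45, 'extra', city='Shanghai', job='Engineer'))
# ('Adam', 45, ('extra',), 'Shanghai', {'job': 'Engineer'})

extra = {'job': 'Teacher'}
print(person('Ann', 30, **extra))  # ** unpacks a dict into keyword arguments
```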

Recursive Functions

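A function can call other functions. If a function calls itself, it is a recursive function.

For example, let's compute the factorial $n! = 1 \times 2 \times 3 \times \cdots \times n$, represented by the function fact(n):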

def fact(n):
    if n == 1:
        return 1
    return n * fact(n - 1)

If we compute fact(5), we can follow the definition to see how it is evaluated:

===> fact(5)
===> 5 * fact(4)
===> 5 * (4 * fact(3))
===> 5 * (4 * (3 * fact(2)))
===> 5 * (4 * (3 * (2 * fact(1))))
===> 5 * (4 * (3 * (2 * 1)))
===> 5 * (4 * (3 * 2))
===> 5 * (4 * 6)
===> 5 * 24
===> 120

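When using recursion, watch out for stack overflow. In a computer, function calls are implemented with a data structure called a stack: each time a function is called, a stack frame is pushed, and each time a function returns, a frame is popped. Because the stack is not infinitely large, too many nested recursive calls will overflow it.

One remedy for stack overflow in recursion is tail-call optimization. A tail call means the function calls itself as it returns, and the return statement contains nothing but that call (no surrounding expression). A compiler or interpreter can then optimize the tail recursion so that, no matter how many times it recurses, it uses only one stack frame and never overflows.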

def fact(n):
    return fact_iter(n, 1)

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)

As you can see, return fact_iter(num - 1, num * product) returns only the recursive call itself; num - 1 and num * product are evaluated before the call and don't affect it.

===> fact_iter(5, 1)
===> fact_iter(4, 5)
===> fact_iter(3, 20)
===> fact_iter(2, 60)
===> fact_iter(1, 120)
===> 120

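With tail-call optimization, the stack does not grow, so no number of calls can overflow it.

Unfortunately, many programming languages don't optimize tail recursion, and neither does the Python interpreter. So even the tail-recursive version of fact(n) above still overflows the stack for large n (in CPython this shows up as a RecursionError).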
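Since Python won't optimize the tail call, the practical fix is to turn the recursion into a loop. A minimal sketch (fact_loop is a hypothetical name, and fact_iter is given a default for product for convenience) that should hit the recursion limit on purpose and then computes a result iteratively:

```python
import sys

def fact_iter(num, product=1):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)   # a tail call, but CPython still adds a frame

def fact_loop(n):
    product = 1
    for i in range(2, n + 1):                  # constant stack depth for any n
        product *= i
    return product

too_deep = sys.getrecursionlimit() + 100       # the default limit is usually 1000
try:
    fact_iter(too_deep)
except RecursionError:
    print('RecursionError at depth', too_deep)

print(fact_loop(5))                            # 120
print(len(str(fact_loop(too_deep))))           # number of digits in the big result
```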

Sets in Python

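To create a set, pass a list (or any other iterable) as input; duplicate elements are filtered out automatically. The add(key) method adds an element to the set; you can add the same element repeatedly, but it has no effect. The remove(key) method deletes an element. The example below also uses discard(key), which deletes an element without complaining if it is missing, and clear(), which empties the set.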

>>> s = set([28, True, 2.718, 'set'])
>>> s.add(123)
>>> s
{True, 2.718, 'set', 123, 28}
>>> s.remove(123)
>>> s
{True, 2.718, 'set', 28}
>>> s.discard(28)
>>> s
{True, 2.718, 'set'}
>>> s.clear()
>>> s
set()
>>> len(s)
0
>>>
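A small sketch of the duplicate filtering and the difference between remove() and discard() mentioned above:

```python
s = set([1, 1, 2, 2, 3])
print(s)            # {1, 2, 3}: duplicates are filtered out
s.add(3)            # adding an element that is already present has no effect
print(len(s))       # 3

s.discard(99)       # discard() silently ignores a missing element
try:
    s.remove(99)    # remove() raises KeyError for a missing element
except KeyError:
    print('99 is not in the set')
```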

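To get the union and intersection of two sets, use union(), which returns the union, and intersection(), which returns the intersection. The operators | and & do the same.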

>>> # Integers 1 - 10
...
>>> odds = set([1, 3, 5, 7, 9])
>>> evens = set([2, 4, 6, 8, 10])
>>> primes = set([2, 3, 5, 7])
>>> composites = set([4, 6, 8, 9, 10])
>>>
>>> odds.union(evens)
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> evens.union(odds)
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> odds
{1, 3, 5, 7, 9}
>>> evens
{2, 4, 6, 8, 10}
>>> odds.intersection(primes)
{3, 5, 7}
>>> primes.intersection(evens)
{2}

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}

The next example checks whether a set contains an item with the in operator; you can also check that it does not contain an item with not in.

>>> 2 in primes
True
>>> 6 in odds
False
>>> 9 not in evens
True

str is an immutable object, whereas list is a mutable one.

For a mutable object such as a list, operating on the list changes its contents, for example:

>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']

For an immutable object such as str, a method like replace() returns a new string and leaves the original unchanged:

>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'

Python Lists

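There are two ways to create a list: with list(), or more simply with [].
The append method inserts a value at the end of the list. Notice that a list preserves the order of its data, unlike a set: in a set order doesn't matter, while in a list order is everything.
The insert method inserts an element at a given position, for example at index 1.
To remove the last element of a list, use pop().
To remove the element at a given position, use pop(i), where i is the index.
To replace an element, assign directly to its index.
The remove method deletes a given element (its first occurrence).
The clear method empties the list.
The count method counts how many times a given element occurs in the list.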

>>> primes = [2, 3, 5, 7, 11, 13]
>>> primes.append(17)
>>> primes.append(19)
>>>
>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>> primes.insert(1, 23)
>>> primes
[2, 23, 3, 5, 7, 11, 13, 17, 19]
>>>
>>> primes.pop()
19
>>> primes
[2, 23, 3, 5, 7, 11, 13, 17]
>>> primes.pop(1)
23
>>> primes
[2, 3, 5, 7, 11, 13, 17]
>>> primes[0] = '0'
>>> primes
['0', 3, 5, 7, 11, 13, 17]
>>>
>>> numbers = [5, 2, 1, 7 ,4]
>>> numbers.remove(5)
>>> print(numbers)
[2, 1, 7, 4]
>>>
>>> numbers.clear()
>>> print(numbers)
[]
>>>
>>> numbers = [5, 2, 1, 5, 7 ,4]
>>> print(numbers.count(5))
2

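You can look up a value by its index, and the first index is 0. Counting to the left from the start, the next index is -1, which Python interprets as the last item. Note that negative indexes only wrap around once: the smallest valid index is -len(list).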

>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>> primes[0]
2
>>> primes[1]
3
>>> primes[-1]
19

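Slicing is another way to access list values; it returns part of a list.
Write the list name followed by [start:stop]. The slice includes the start position but not the stop position.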

>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>>
>>> primes[2:5]
[5, 7, 11]
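Slices also accept omitted bounds, negative indexes, and a step. A short sketch using the same primes list:

```python
primes = [2, 3, 5, 7, 11, 13, 17, 19]

print(primes[2:5])    # [5, 7, 11]: start included, stop excluded
print(primes[:3])     # [2, 3, 5]: omitted start means "from the beginning"
print(primes[5:])     # [13, 17, 19]: omitted stop means "to the end"
print(primes[-3:])    # [13, 17, 19]: negative indexes count from the end
print(primes[::2])    # [2, 5, 11, 17]: the third number is the step
print(primes[::-1])   # [19, 17, 13, 11, 7, 5, 3, 2]: a negative step reverses
```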

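The elements of a list can be of different types, and an element can itself be another list. To get 'php' you can write s[2][1] (or assign p = s[2] and write p[1]), so s can be viewed as a two-dimensional array. A list with no elements is an empty list, and its length is 0.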

>>> L = ['Apple', 123, True]
>>> s = ['python', 'java', ['asp', 'php'], 'scheme']
>>> len(s)
4
>>>
>>> L = []
>>> len(L)
0

You can also join two lists into one. Create the lists numbers and letters, then concatenate them with the + operator. Note that the original numbers and letters lists are unchanged.

>>> numbers = [1, 2, 3]
>>> letters = ['a', 'b', 'c']
>>> numbers + letters
[1, 2, 3, 'a', 'b', 'c']
>>> letters + numbers
['a', 'b', 'c', 1, 2, 3]

2D Lists

Two-dimensional lists are common in data analysis and machine learning, where they typically represent matrices, for example:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

for row in matrix:
    for item in row:
        print(item)
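Besides looping over every item, you often need a single element, a column, or the transpose. A minimal sketch, reusing the matrix from above:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

print(matrix[1][2])           # 6: row index first, then column index
column = [row[1] for row in matrix]
print(column)                 # [2, 5, 8]
transposed = [list(col) for col in zip(*matrix)]   # zip(*rows) pairs up the columns
print(transposed)             # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
```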

Python Dictionaries

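To create a dictionary: post = {"key1": value1, "key2": value2}. You can also create a dictionary with dict(). To add a value, put the key inside [] and assign with =. Note that with dict() the key names are passed as keyword arguments and are not quoted, but when adding with [] you must quote string keys.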

>>> post2 = dict(message = "SS Cotopaxi", languange = "English")
>>> print(post2)
{'message': 'SS Cotopaxi', 'languange': 'English'}
>>>
>>> post2["user_id"] = 209
>>> post2["datetime"] = 123456789
>>>
>>> print(post2)
{'message': 'SS Cotopaxi', 'languange': 'English', 'user_id': 209, 'datetime': 123456789}

Use [] both to store new data in a dictionary and to read it back:

>>> print(post2["message"])
SS Cotopaxi

The first way to avoid accessing a key that doesn't exist is to check for it with if ... in:

>>> if 'location' in post2:
...     print(post2['location'])
... else:
...     print("The post does not contain a location value.")
...
The post does not contain a location value.

Or use try to catch the error: if the key isn't in the dictionary, Python raises a KeyError.

>>> try:
...     print(post2['location'])
... except KeyError:
...     print("The post does not contain a location value.")
...
The post does not contain a location value.

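There is another way to read dictionary data that also handles a missing key: the get() method provided by dict. If the key doesn't exist, it returns None, or a default value you specify.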

>>> loc = post2.get('location', None)
>>> print(loc)
None
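A minimal sketch of get() with a caller-supplied default (the post dictionary here is a trimmed-down, hypothetical version of post2):

```python
post = {'message': 'SS Cotopaxi', 'user_id': 209}   # hypothetical, trimmed-down post2

print(post.get('location'))             # None: the default when the key is missing
print(post.get('location', 'Unknown'))  # Unknown: a caller-supplied fallback
print(post.get('user_id', 0))           # 209: existing keys return their real value
print('location' in post)               # False: get() never adds the key
```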

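To iterate over the key-value pairs, the simple approach is to loop over all the keys and look up each value with post2[key]. The keys() method returns a view of all the keys in the dictionary. In older Python versions the order could differ from insertion order; since Python 3.7, dicts keep insertion order.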

>>> for key in post2.keys():
...     value = post2[key]
...     print(key, "=", value)
...
message = SS Cotopaxi
languange = English
user_id = 209
datetime = 123456789

Another way to iterate is to use the items() method in the loop, which yields one key-value pair at a time:

>>> for key, value in post2.items():
...     print(key, "=", value)
...
message = SS Cotopaxi
languange = English
user_id = 209
datetime = 123456789

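There are several ways to remove data from a dictionary: pop and popitem each remove one item, and clear empties the dictionary.

To remove a key, use pop(key); the corresponding value is removed from the dict as well:

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d.pop('Bob')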
75
>>> d
{'Michael': 95, 'Tracy': 85}

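Note: in Python versions before 3.7 the order in which a dict stored its entries had nothing to do with the order in which the keys were inserted. Since Python 3.7, dicts are guaranteed to preserve insertion order.

Compared with a list, a dict has these characteristics:

Lookup and insertion are extremely fast and don't slow down as more keys are added;
It takes a lot of memory and wastes a fair amount of it.

A list is the opposite:

Searching and inserting (except at the end) take longer as the list grows;
It takes little space and wastes little memory.

So a dict trades space for time.

Dicts are used wherever fast lookup is needed and appear almost everywhere in Python code. Using them correctly is very important, and the first rule to remember is that a dict key must be an immutable object.

This is because a dict computes the storage location of a value from its key. If the same key produced a different result each time, the dict's internals would be completely scrambled. The algorithm that computes a location from a key is called a hash algorithm.

To keep hashing correct, an object used as a key must not change. In Python, strings, integers, and so on are immutable, so they can safely be used as keys. A list, however, is mutable and cannot be used as a key: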

>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
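A tuple of immutable values is hashable, so it can serve as a key where a list can't. A minimal sketch:

```python
d = {}
d[(1, 2, 3)] = 'a tuple'        # a tuple of immutable values is hashable
print(d[(1, 2, 3)])             # a tuple

try:
    d[[1, 2, 3]] = 'a list'     # a list is mutable, so it is unhashable
except TypeError as e:
    print(e)                    # unhashable type: 'list'

print(hash('abc') == hash('abc'))   # True: equal keys must hash equally
```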

Dictionary Comprehensions

k = ['张三', '李四', '王五']
s = [1000, 3000, 7000]

# Nested loops pair every key with every value, so each key ends up with the last value, 7000:
# print({x : y for x in k for y in s})

print({name: 3000 for name in k})

# Output: {'张三': 3000, '李四': 3000, '王五': 3000}
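To pair each name with its own value, which the commented-out line seems to be aiming for, zip the two lists. A minimal sketch using the same k and s:

```python
k = ['张三', '李四', '王五']
s = [1000, 3000, 7000]

# zip() walks both lists in step, so each key gets the value at the same position
salaries = {name: salary for name, salary in zip(k, s)}
print(salaries)  # {'张三': 1000, '李四': 3000, '王五': 7000}

# The same result without a comprehension
print(dict(zip(k, s)) == salaries)  # True
```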

Python Tuples

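A tuple is very similar to a list, but once initialized it cannot be modified.
Tuples and lists differ in other ways too. Below, the getsizeof function compares the sizes of a list and a tuple holding the same data; as you can see, the list takes more memory than the tuple.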

>>> import sys
>>>
>>> list_eg = [1,2,3,"a","b","c",True,3.1415]
>>> tuple_eg = (1,2,3,"a","b","c",True,3.1415)
>>>
>>> print("List size = ", sys.getsizeof(list_eg))
List size =  128
>>> print("tuple size = ", sys.getsizeof(tuple_eg))
tuple size =  112

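A tuple is fixed: its values can't be changed (it is immutable). Once a tuple is created, it can't be modified.
Tuples are also faster to create than lists. To see this, use the timeit module to time creating a list and a tuple one million times. First import the timeit module, which has a function with the same name, timeit. Its first argument is the statement to execute, here creating a short list of integers; the number argument sets how many times to run it, and the result is the total time taken for those runs.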

>>> import timeit
>>>
>>> list_test = timeit.timeit(stmt="[1,2,3,4,5]", number=1000000)
>>> tuple_test = timeit.timeit(stmt="(1,2,3,4,5)", number=1000000)
>>>
>>> print("List time:", list_test)
List time: 0.0711129419998997
>>> print("Tuple time:", tuple_test)
Tuple time: 0.015337733000023945

To define an empty tuple, write (). A tuple with only one element must be defined with a trailing comma:

>>> empty_tuple = ()
>>> test1 = ("a",)
>>> test2 = ("a", "b")
>>> test3 = ("a", "b", "c")
>>>
>>> print(empty_tuple)
()
>>> print(test1)
('a',)
>>> print(test2)
('a', 'b')
>>> print(test3)
('a', 'b', 'c')

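There is another way to create a tuple: you can leave out the parentheses. As before, a one-element tuple needs a trailing comma. Printing the values shows that they are all tuples.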

>>> test1 = 1,
>>> test2 = 1, 2
>>> test3 = 1, 2, 3
>>>
>>> print(test1)
(1,)
>>> print(test2)
(1, 2)
>>> print(test3)
(1, 2, 3)

As with lists, tuple elements can be accessed by index.

>>> # (age, country, knows_python)
...
>>> survey = (27, "Vietnam", True)
>>> age = survey[0]
>>> country = survey[1]
>>> knows_python = survey[2]
>>>
>>> print("Age =", age)
Age = 27
>>> print("Country =", country)
Country = Vietnam
>>> print("Knows Python ?", knows_python)
Knows Python ? True

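But tuples offer a quicker option: you can assign all of a tuple's elements to separate variables in a single statement (tuple unpacking). Make sure the number of variables matches the number of values in the tuple, otherwise Python raises a ValueError.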

>>> survey2 = (21, "Switzerland", False)
>>>
>>> age, country, knows_python = survey2
>>> print("Age =", age)
Age = 21
>>> print("Country =", country)
Country = Switzerland
>>> print("Knows Python ?", knows_python)
Knows Python ? False
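A short sketch of the two errors mentioned in this section: unpacking into the wrong number of variables, and trying to modify a tuple:

```python
survey = (21, "Switzerland", False)

age, country, knows_python = survey      # three names for three values: fine

try:
    age, country = survey                # two names for three values
except ValueError as e:
    print('ValueError:', e)              # Python reports too many values to unpack

try:
    survey[0] = 22                       # tuples are immutable
except TypeError as e:
    print('TypeError:', e)
```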

Logging in Python

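Python's logging module provides everything you need to record progress and problems. Logging lets you write status messages to a file or another output stream, recording which parts of the code have run and what problems may have occurred. Every log message has a level; there are five: Debug, Info, Warning, Error, and Critical.

First import the logging module and set up the logging system with the basicConfig method. Then use getLogger to get a logger, and adjust its settings until it behaves the way we want.

Call logging.basicConfig with a filename; Python logs to that file and creates it if it doesn't exist. Then create an object logger by calling getLogger(); called without a name, it returns the root logger.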

>>> import logging
>>>
>>> # Create and configure logger
...
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log")
>>> logger = logging.getLogger()

Now let's test it: call the info method with a message string.

>>> # Test the logger
...
>>> logger.info("Our first message.")

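The file Lumberjack.log was created, but it is empty: the log message is nowhere to be found. Why? The cause is the way the logger was configured. Print logger.level:

>>> print(logger.level)
30

The logging module defines 6 levels, with these numeric values: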

Level	Numeric Value
NOTSET	0
DEBUG	10
INFO	20
WARNING	30
ERROR	40
CRITICAL	50

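A message is written only if its level is at least the configured level. The info call we used has the value 20, but the default level for basicConfig is WARNING (30). The fix is to update basicConfig and set the level to DEBUG (10). Run this in a fresh interpreter session: basicConfig does nothing if the root logger already has handlers, unless you pass force=True (Python 3.8+).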

>>> import logging
>>>
>>> # Create and configure logger
...
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG)
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our first message.")
>>> print(logger.level)
10

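Now open the log file and you'll see the test message: first the level (INFO), then the logger name (root by default), then the message. The format is OK, but it lacks an important piece of data: when the entry was created.

▶ cat Lumberjack.log
INFO:root:Our first message.

Now let's change the format of the log entries by creating a format string. Python offers many log record attributes; here we only introduce asctime, levelname, and message.
Build a string from the attributes you want, in the form %(levelname)s %(asctime)s - %(message)s, and pass it to basicConfig as format = LOG_FORMAT: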

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT)
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our first message.")
>>>
>>> print(logger.level)
10

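Open the log file and you'll see the entry in the new format. The earlier message is still there, because by default the log file is opened in append mode.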

▶ cat Lumberjack.log
INFO:root:Our first message.
INFO 2019-10-07 00:58:39,325 - Our first message.

If you want the log file to be overwritten each time, set filemode to 'w', change the message, and test again.

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our SECOND message.")

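Looking at the log now, the previous contents have been overwritten and only the newest entry remains.

▶ cat Lumberjack.log
INFO 2019-10-07 01:06:52,276 - Our SECOND message.

By calling the method named after each level, you can log messages at different levels.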

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
>>> logger = logging.getLogger()
>>> # Test messages
...
>>> logger.debug("This id a harmless debug message.")
>>> logger.info("Just some useful info.")
>>> logger.warning("I'm sorry, but I can't do that, Dave.")
>>> logger.error("Did you just try to divide by zero?")
>>> logger.critical("The entire internet is down!!")

Open the log and you'll see the 5 new messages after the earlier one:

▶ cat Lumberjack.log
INFO 2019-10-07 01:06:52,276 - Our SECOND message.
DEBUG 2019-10-07 01:11:41,564 - This id a harmless debug message.
INFO 2019-10-07 01:13:25,214 - Just some useful info.
WARNING 2019-10-07 01:14:20,196 - I'm sorry, but I can't do that, Dave.
ERROR 2019-10-07 01:15:23,000 - Did you just try to divide by zero?
CRITICAL 2019-10-07 01:15:48,875 - The entire internet is down!!

Change the level to ERROR and run it again.

>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.ERROR, format = LOG_FORMAT, filemode = 'w')

Now only the error and critical messages appear.

▶ cat Lumberjack.log
ERROR 2019-10-07 01:19:20,247 - Did you just try to divide by zero?
CRITICAL 2019-10-07 01:19:21,971 - The entire internet is down!!
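Everything above goes through the root logger and basicConfig. A minimal sketch of the same level filtering with a named logger (the name demo is hypothetical) that writes to an in-memory stream instead of a file, which sidesteps basicConfig's one-time configuration:

```python
import io
import logging

stream = io.StringIO()                         # collects log output in memory
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger = logging.getLogger("demo")             # hypothetical logger name
logger.setLevel(logging.ERROR)                 # only ERROR (40) and CRITICAL (50) pass
logger.addHandler(handler)
logger.propagate = False                       # don't also pass records to the root logger

logger.debug("This is a harmless debug message.")
logger.info("Just some useful info.")
logger.error("Did you just try to divide by zero?")
logger.critical("The entire internet is down!!")

print(stream.getvalue(), end="")
# ERROR - Did you just try to divide by zero?
# CRITICAL - The entire internet is down!!
```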

Let’s see it in action.
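Let's write a function for the quadratic formula, which finds the roots of a quadratic equation. First import the math module, since we need the square root function. We'll call the function quadratic_formula (future generations will sing the praises of this name); it takes three parameters, a, b, and c.
The log messages look a bit redundant next to the comments; once you are confident the code works, you can delete the log statements and keep the comments.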
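For reference, the function below implements the standard quadratic formula; the discriminant is the expression under the square root, which the code computes as disc:

```latex
\text{For } ax^2 + bx + c = 0,\ a \neq 0: \qquad
x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a},
\qquad \text{disc} = b^2 - 4ac
```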

import logging
import math

# Create and configure logger

LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
logger = logging.getLogger()

def quadratic_formula(a, b, c):
    """Return the solutions to the equation ax^2 + bx + c = 0."""
    logger.info("quadratic_formula({0}, {1}, {2})".format(a, b, c))

    # Compute the discriminant
    logger.debug("# Compute the discriminant")
    disc = b**2 - 4*a*c

    # Compute the two roots
    logger.debug("# Compute the two roots.")
    root1 = (-b + math.sqrt(disc)) / (2*a)
    root2 = (-b - math.sqrt(disc)) / (2*a)

    # Return the roots
    logger.debug("# Return the roots.")
    return (root1, root2)

roots = quadratic_formula(1, 0, -4)
print(roots)

The function returns the correct result, and the log file shows the detailed entries:

▶ python3 log.py
(2.0, -2.0)
▶ cat Lumberjack.log
INFO 2019-10-07 01:47:04,074 - quadratic_formula(1, 0, -4)
DEBUG 2019-10-07 01:47:04,074 - # Compute the discriminant
DEBUG 2019-10-07 01:47:04,074 - # Compute the two roots.
DEBUG 2019-10-07 01:47:04,075 - # Return the roots.

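Now test the function again, changing -4 to 1:

roots = quadratic_formula(1, 0, 1)

This time it raises an error. The log file contains only 2 debug messages, so the problem arose while computing the roots: the discriminant b**2 - 4*a*c is -4 here, and math.sqrt() cannot take the square root of a negative number.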

▶ python3 log.py
Traceback (most recent call last):
  File "log.py", line 28, in <module>
    roots = quadratic_formula(1, 0, 1)
  File "log.py", line 20, in quadratic_formula
    root1 = (-b + math.sqrt(disc)) / (2*a)
ValueError: math domain error

▶ cat Lumberjack.log
INFO 2019-10-07 01:50:48,174 - quadratic_formula(1, 0, 1)
DEBUG 2019-10-07 01:50:48,175 - # Compute the discriminant
DEBUG 2019-10-07 01:50:48,175 - # Compute the two roots.
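If complex roots are acceptable, one possible fix is to fall back to cmath.sqrt, which handles negative inputs. A minimal sketch of that variant, logging a warning instead of crashing (it still assumes a != 0):

```python
import cmath
import logging
import math

logger = logging.getLogger(__name__)

def quadratic_formula(a, b, c):
    """Return the solutions to ax^2 + bx + c = 0 (complex ones if disc < 0)."""
    disc = b**2 - 4*a*c
    if disc >= 0:
        sqrt_disc = math.sqrt(disc)        # real square root
    else:
        logger.warning("Negative discriminant %s; returning complex roots", disc)
        sqrt_disc = cmath.sqrt(disc)       # e.g. cmath.sqrt(-4) == 2j
    return ((-b + sqrt_disc) / (2*a), (-b - sqrt_disc) / (2*a))

print(quadratic_formula(1, 0, -4))   # (2.0, -2.0)
print(quadratic_formula(1, 0, 1))    # the complex pair 1j and -1j
```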

Recursion, the Fibonacci Sequence and Memoization

Fibonacci Sequence (Recursive Function)

>>> def fibonacci(n):
...     if n == 1:
...             return 1
...     elif n == 2:
...             return 1
...     elif n > 2:
...             return fibonacci(n - 1) + fibonacci(n - 2)
... 
>>> for n in range(1, 11):
...     print(n, ":", fibonacci(n))
...
1 : 1
2 : 1
3 : 2
4 : 3
5 : 5
6 : 8
7 : 13
8 : 21
9 : 34
10 : 55

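But this version has a drawback: computing the first 100 terms is extremely slow, because the recursion recomputes the same terms over and over. The fix is memoization. The idea behind it is simple: cache each return value so that later calls don't recompute it.

Create a dictionary, fibonacci_cache, to store the results, then rewrite the fibonacci function to use the cached values. First check whether the nth term is already in the dictionary, and if so return it directly. If not, compute the value, store it, and then return it.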

fibonacci_cache = {}

def fibonacci(n):
    # If we have cache the value, then return it
    if n in fibonacci_cache:
        return fibonacci_cache[n]

    # Compute the Nth term
    if n == 1:
        value = 1
    elif n == 2:
        value = 1
    elif n > 2:
        value = fibonacci(n-1) + fibonacci(n-2)

    # Cache the value and return it
    fibonacci_cache[n] = value
    return value

for n in range(1, 101):
    print(n, ":", fibonacci(n))

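Testing shows that even 1000 terms finish instantly.

There is an even simpler way: import the lru_cache decorator from the functools module. LRU stands for Least Recently Used, the policy it uses to decide which cached entries to discard. Put @lru_cache() on the line before the function and set how many entries to cache with maxsize; by default Python caches the 128 most recently used values.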

from functools import lru_cache

@lru_cache(maxsize = 1000)
def fibonacci(n):
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

for n in range(1, 501):
    print(n, ":", fibonacci(n))

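This runs very quickly, but calling it with 2.1, -1, or '1' either raises a confusing error or silently returns None, because we don't check the type and value of the argument.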

from functools import lru_cache

@lru_cache(maxsize = 1000)
def fibonacci(n):
    # Check that the input is a positive integer
    if type(n) != int:
        raise TypeError("n must be a positive int")
    if n < 1:
        raise ValueError("n must be a positive int") 

    # Compute the Nth term 
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

for n in range(1, 501):
    print(n, ":", fibonacci(n))
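lru_cache also lets you inspect and reset the cache. A minimal sketch with maxsize=None (an unbounded cache) that prints the hit/miss statistics; the type check from the version above is trimmed for brevity:

```python
from functools import lru_cache

@lru_cache(maxsize=None)        # None: the cache may grow without limit
def fibonacci(n):
    if n < 1:
        raise ValueError("n must be a positive int")
    if n <= 2:
        return 1
    return fibonacci(n - 1) + fibonacci(n - 2)

print(fibonacci(50))            # 12586269025
print(fibonacci.cache_info())   # hits, misses, maxsize and current size of the cache
fibonacci.cache_clear()         # empties the cache, e.g. to measure a cold run
```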

Python Random Number Generator: the Random Module

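First, an important warning: random numbers are widely used in cryptography, but the functions in the random module were not built for that, and the Python documentation warns against using this module for security purposes. For most other programs, though, the random module is a great tool.

First import the random module. We'll start with the random function, which returns a random number between 0 and 1. Note that it can return 0 but never returns 1.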

# Display 10 random numbers from interval [0, 1)

import random

for i in range(10):
    print(random.random())

python3 ran.py
0.5106691035823852
0.46531635162452156
0.6855409873422025
0.09298690069781068
0.5117634349489444
0.11244089989308159
0.3869049755854831
0.3035650646012348
0.11256644727198029
0.6335314012411107

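The random function samples from a uniform distribution, meaning that every number in the interval is equally likely to be chosen.

The random function can be used to build customized random number generators; it is the core of many functions in the random module.

For example, to generate random numbers between 3 and 7, use the uniform function, which also belongs to the random module.

# Generate random numbers from the interval [3, 7] (7 itself can only appear through floating-point rounding)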

import random

for i in range(10):
    print(random.uniform(3, 7))

python3 ran.py
5.065995219714599
6.053861064813075
4.355351836924287
3.8442052835650293
3.4617765863944605
6.012704355572884
4.352595206539573
5.217513736639914
4.868314576510137
4.154966578672884
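This sketch illustrates how random() serves as a building block. The Python documentation describes uniform(a, b) in terms of the expression a + (b-a) * random(), and that expression is also why the end point b may or may not be included, depending on floating-point rounding. The helper name my_uniform below is hypothetical:

```python
import random

def my_uniform(a, b):
    """Hypothetical helper: scale random() from [0, 1) onto the range a..b."""
    return a + (b - a) * random.random()

samples = [my_uniform(3, 7) for _ in range(1000)]
print(min(samples), max(samples))  # both lie between 3 and 7
```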

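Both random() and uniform() follow a uniform distribution, but other distributions are often needed too. The most common is the normal distribution, also called the bell curve. It is described by two parameters, the mean and the standard deviation. The mean ( $\mu$ ) is where the peak of the bell curve sits, and the standard deviation ( $\sigma$ ) describes how wide or narrow the curve is.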

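To draw random numbers from a normal distribution, use the normalvariate() function. You must supply values for both parameters.

For example, generate 20 numbers from a normal distribution with mean 0 and standard deviation 1: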

import random

for i in range(20):
    print(random.normalvariate(0, 1))

▶ python3 ran.py
0.8333294121536797
0.755624821412367
1.478740726288408
0.07938317039808185
0.291908992130876
-0.22504815004343645
-0.8676884812379501
-0.8029345982713375
-1.043558392568845
-3.6805077274987053
0.21430472612013182
1.046297522716622
1.3217391478625042
-0.15949303555422145
-0.12060525806010007
1.1275886234780748
1.691386051133269
0.7738680349015079
1.3120164616558796
-0.9755020257704168

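Note that these numbers cluster around the mean of 0. If you change the standard deviation to 9, the numbers spread over a much wider range; with a smaller standard deviation, they cluster more tightly.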
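To check the effect of the two parameters numerically, you can draw many samples and compare their sample mean and standard deviation with μ and σ. Here is a minimal sketch; the seed 0 and the sample size of 10,000 are arbitrary choices:

```python
import random
import statistics

rng = random.Random(0)   # a separate generator with a fixed seed, for repeatability
results = {}
for sigma in (1, 9):
    samples = [rng.normalvariate(0, sigma) for _ in range(10_000)]
    results[sigma] = (statistics.mean(samples), statistics.stdev(samples))
    print(f"sigma={sigma}: mean={results[sigma][0]:.3f}, stdev={results[sigma][1]:.3f}")
```

In both cases the sample mean stays near 0. The sample standard deviation lands near 1 for the first run and near 9 for the second.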

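Sometimes you don't want to choose from infinitely many possibilities. Rolling a die, for example, yields one of just 6 values. For that, use the randint() function, which includes both endpoints.

For example, generate random integers from 1 to 6: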

import random

for i in range(20):
    print(random.randint(1, 6))

python3 ran.py
2
1
2
1
3
6
3
5
4
5
1
2
2
6
5
2
6
3
3
1
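Over many rolls, each face of a fair die should come up roughly equally often. Here is a minimal sketch that tallies 6,000 rolls with collections.Counter:

```python
import random
from collections import Counter

rolls = [random.randint(1, 6) for _ in range(6000)]
counts = Counter(rolls)        # maps each face to how often it appeared
for face in range(1, 7):
    print(face, counts[face])  # each count should be close to 1000
```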

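Sometimes you even need to pick randomly from a list of non-numeric values, for instance when writing a rock-paper-scissors game.

Use the choice() function to return a random element from a list, passing it the list Python should choose from: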

import random

outcomes = ['rock', 'paper', 'scissors']

for i in range(20):
    print(random.choice(outcomes))

python3 ran.py
rock
paper
rock
scissors
scissors
paper
paper
scissors
paper
rock
scissors
scissors
paper
rock
scissors
scissors
rock
paper
rock
rock

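CSV Files in Python

CSV stands for comma-separated values and is a common format for storing data. Working with CSV files requires no extra drivers or APIs.

A CSV file is a text file containing data. Usually the first line is a header row and the remaining lines are data. Think of each line as a record in a database, with the fields in each line separated by commas. Because this is a plain text file, the values carry no data types.

When reading a CSV file, you therefore need to convert the data to appropriate types. Two adjacent commas (,,) mean that a value is missing.

Below we load a local CSV file of stock data, opening the file at this path with the open() function.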

path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"

with open(path) as file:
    for line in file:
        print(line)

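To be useful, the data has to be stored somewhere. Two approaches follow.

First, let's parse the data without the csv module by quickly reading the file into a list. Note that each line is treated as one string, and every line ends with a newline character.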

>>> path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"
>>> lines = [line for line in open(path)]
>>> lines[0]
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> lines[1]
'8/19/2014,585.002622,587.342658,584.002627,586.862643,978600,586.862643\n'

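Use the strip method to remove characters from both ends of the string (by default whitespace, which includes the newline). Then use the split method with the separator , to cut the line into pieces. Each line becomes much easier to read.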

>>> lines[0].strip()
'Date,Open,High,Low,Close,Volume,Adj Close'
>>> lines[0].strip().split(',')
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

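Now read the data again, this time removing the unnecessary whitespace and splitting each line into separate values. Checking the first two rows shows that each row is now a list.

>>> dataset = [line.strip().split(',') for line in open(path)]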
>>> dataset[0]
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> dataset[1]
['8/19/2014', '585.002622', '587.342658', '584.002627', '586.862643', '978600', '586.862643']

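Unfortunately, the values are still strings. There is another problem: what if the CSV data describes books or movies? Many titles and reviews contain commas themselves, and splitting on commas would cut them apart. Problems like these are what the csv module is for.

First import the csv module, then open the file. Note the newline keyword argument, which is passed an empty string. Different operating systems end lines with a newline, a carriage return, or both, and newline='' lets the csv module handle line endings correctly on every platform. Then create a variable reader by calling csv.reader() on the file.

The first row is the header and contains no data, so extract it with the next() function. Then read the rest of the data.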
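Here is a small demonstration of that problem. It uses a hypothetical CSV text in which one title contains a comma, so the CSV format wraps that field in double quotes. io.StringIO stands in for a real file:

```python
import csv
import io

# Hypothetical CSV text: the first title contains a comma, so the CSV
# format wraps that field in double quotes.
text = 'Title,Year\n"Crouching Tiger, Hidden Dragon",2000\nGattaca,1997\n'

# Naive approach: split every line on commas
naive = [line.split(",") for line in text.strip().split("\n")]
print(naive[1])   # ['"Crouching Tiger', ' Hidden Dragon"', '2000']

# csv.reader understands the quoting rules
parsed = list(csv.reader(io.StringIO(text)))
print(parsed[1])  # ['Crouching Tiger, Hidden Dragon', '2000']
```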

import csv

path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"
with open(path, newline='') as file:
    reader = csv.reader(file)
    header = next(reader)           # The first line is the header
    data = [row for row in reader]  # Read the remaining data

print(header)
print(data[0])
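Even with the csv module, every field is still a string. Below is a minimal sketch of converting a row to proper types. It uses the header and first data row shown earlier, inlined through io.StringIO so it runs without the original file. It assumes the month/day/year date format seen in that row.

```python
import csv
import io
from datetime import datetime

# Header and first data row copied from the output above; io.StringIO
# stands in for the real google_stock_data.csv file.
sample = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "8/19/2014,585.002622,587.342658,584.002627,586.862643,978600,586.862643\n"
)

reader = csv.reader(io.StringIO(sample))
header = next(reader)

data = []
for row in reader:
    date = datetime.strptime(row[0], "%m/%d/%Y").date()  # '8/19/2014' -> date
    prices = [float(x) for x in row[1:5]]                 # Open, High, Low, Close
    volume = int(row[5])
    adj_close = float(row[6])
    data.append([date, *prices, volume, adj_close])

print(header)
print(data[0])
```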

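List Comprehension

A list comprehension is also written inside square brackets. The most basic form is [ expr for val in collection ]: the first element is an expression, and the comprehension loops over the collection, evaluating the expression for each item.

If you only want to loop over items that meet some condition, add an if clause after the for clause. The expression is evaluated only when the if test is true: [ expr for val in collection if (test) ].

The condition can combine several tests: [ expr for val in collection if (test1) and (test2) ].

You can also loop over several collections: [ expr for val1 in collection1 for val2 in collection2 ].

First example: create a list of the squares of the first 100 positive integers.

First, without a list comprehension: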
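Here are the four forms above side by side on a small range. The variable names are illustrative.

```python
nums = range(1, 11)

squares = [n * n for n in nums]                          # basic form
evens = [n for n in nums if n % 2 == 0]                  # one condition
even_over_4 = [n for n in nums if n % 2 == 0 and n > 4]  # several conditions
pairs = [(c, n) for c in "ab" for n in (1, 2)]           # two loops (nested)

print(squares)      # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
print(evens)        # [2, 4, 6, 8, 10]
print(even_over_4)  # [6, 8, 10]
print(pairs)        # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```

The for clauses nest from left to right, just as nested for statements would.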

squares = []
for i in range(1, 101):
    squares.append(i**2)

print(squares)

python3 list_com.py

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000]

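Now the same thing with a list comprehension. Inside the square brackets, the expression to evaluate for each item comes first, followed by the loop over the range.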

squares2 = [i**2 for i in range(1, 101)]
print(squares2)

python3 list_com.py

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000]

Next example: the list of remainders when the squares of the first 100 integers are divided by 5.

# x ** 2 % 5 <====> x^2 (mod5)
remainder5 = [x**2 % 5 for x in range(1, 101)]
print(remainder5)

python3 list_com.py

[1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0]

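The printout shows only three possible remainders: 0, 1 and 4.

An interesting pattern appears when the modulus is an odd prime p. The number of distinct remainders of x**2 % p for x in range(0, p) is (p+1)/2. Determining which numbers appear in the list is closely tied to a deep result called the law of quadratic reciprocity, first proved by Gauss.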

$$\begin{aligned} \text{p\_remainders} &= \{\, x^2 \bmod p \mid x = 0, 1, \dots, p-1 \,\} \\ \lvert \text{p\_remainders} \rvert &= \frac{p+1}{2} \end{aligned}$$
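Here is a quick numerical check of this pattern for a few odd primes. The claim is about the distinct remainders, so the sketch collects them in a set:

```python
def quadratic_residues(p):
    """Distinct values of x**2 % p for x in 0..p-1."""
    return {x ** 2 % p for x in range(p)}

for p in [3, 5, 7, 11, 13, 17]:
    residues = quadratic_residues(p)
    # The count of distinct remainders should equal (p + 1) / 2
    print(p, sorted(residues), len(residues) == (p + 1) // 2)
```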

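Next, a list comprehension with an if clause. Given a list of movie titles, find the ones that start with the letter G.

First without a list comprehension: create an extra empty list, loop over the movies list, and use the startswith method to check whether each title starts with G. If it does, append it to the new list.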

movies = ["Star Wars", "Gandhi", "Casablanca",
          "Shawshank Redemption", "2001: A Space Odyssey", "Groundhog Day"]

gmovies = []
for title in movies:
    if title.startswith("G"):
        gmovies.append(title)
print(gmovies)

python3 list_com.py

['Gandhi', 'Groundhog Day']

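With a list comprehension, four lines become one. The expression is simply title, the loop runs over movies, and the if clause checks whether the title starts with G.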

movies = ["Star Wars", "Gandhi", "Casablanca",
          "Shawshank Redemption", "2001: A Space Odyssey", "Groundhog Day"]

gmovies = [title for title in movies if title.startswith("G")]
print(gmovies)

python3 list_com.py

['Gandhi', 'Groundhog Day']

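Next example: suppose movies is a list of tuples, each holding a movie title and its release year. How can we use a list comprehension to find the movies released before 2000?

We want a list of titles only, but each element we loop over is a tuple. Unpack each tuple into (title, year) in the for clause and test the year in the if clause. Since year is used only in the test, the result contains only titles.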

movies = [("Citizen Kane", 1941), ("Spirited Away", 2001), ("Gattaca", 1997), ("The Aviator", 2004), ("Raiders of the Lost Ark", 1981)]

pre2k = [title for (title, year) in movies if year < 2000]

print(pre2k)

python3 list_com.py

['Citizen Kane', 'Gattaca', 'Raiders of the Lost Ark']

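Next example: given a list v, how do we perform scalar multiplication, multiplying every item by 4? A list comprehension does it.

Note that 4*v does not work. It just repeats the list 4 times, because in Python adding two lists concatenates them.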

>>> v = [2, -3, 1]
>>> 4 * v
[2, -3, 1, 2, -3, 1, 2, -3, 1, 2, -3, 1]
>>>
>>> [2, 4, 6] + [1, 3]
[2, 4, 6, 1, 3]

v = [2, -3, 1]
w = [4*x for x in v]
print(w)

[8, -12, 4]

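Last example: computing the Cartesian product of two sets with a list comprehension. Given two sets A and B, their Cartesian product is the set of all ordered pairs (a, b), where a is an element of A and b is an element of B. Mathematically it is usually written as:

$$A \times B = \lbrace (a,b) \mid a \in A,\ b \in B \rbrace$$

In the list comprehension, use the tuple (a, b) as the expression, looping over A first and then over B.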

A = [1, 3, 5, 7]
B = [2, 4, 6, 8]

cartesian_product = [(a, b) for a in A for b in B]
print(cartesian_product)

python3 list_com.py

[(1, 2), (1, 4), (1, 6), (1, 8), (3, 2), (3, 4), (3, 6), (3, 8), (5, 2), (5, 4), (5, 6), (5, 8), (7, 2), (7, 4), (7, 6), (7, 8)]
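The standard library's itertools.product computes the same Cartesian product. This quick check confirms that it yields the same pairs, in the same order, as the nested list comprehension above:

```python
from itertools import product

A = [1, 3, 5, 7]
B = [2, 4, 6, 8]

pairs = list(product(A, B))
print(pairs == [(a, b) for a in A for b in B])  # True
print(len(pairs))                               # 16 = len(A) * len(B)
```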

List comprehensions make for very concise code. For example, listing all files and directories in the current directory takes one line:

>>> import os # import the os module (modules are covered later)
>>> [d for d in os.listdir('.')] # os.listdir lists files and directories
['.emacs.d', '.ssh', '.Trash', 'Adlm', 'Applications', 'Desktop', 'Documents', 'Downloads', 'Library', 'Movies', 'Music', 'Pictures', 'Public', 'VirtualBox VMs', 'Workspace', 'XCode']

A list comprehension can also use two variables to build a list:

>>> d = {'x': 'A', 'y': 'B', 'z': 'C' }
>>> [k + '=' + v for k, v in d.items()]
['x=A', 'y=B', 'z=C']

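Generators and Iterators

Generators

A list comprehension creates a complete list right away. But memory is limited, so a list's capacity is necessarily finite. Moreover, a list of one million elements takes up a lot of memory, and if we only need the first few elements, the space used by all the rest is wasted.

So if the elements of a list can be computed by some algorithm, could we compute each next element as we loop, instead of building the whole list, and thereby save a lot of space? In Python, this compute-as-you-loop mechanism is called a generator.

There are several ways to create a generator. The first is simple: change the [] of a list comprehension to (), and you get a generator: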

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>

To print the values one at a time, call the next() function to get the generator's next value:

>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16

Of course, calling next(g) over and over is far too tedious. The right way is a for loop, because a generator is also iterable:

>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
... 
0
1
4
9
16
25
36
49
64
81

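So once we have created a generator, we hardly ever call next() directly. Instead we iterate over it with a for loop, which also means we don't have to deal with the StopIteration error.

There is another way to define a generator. If a function definition contains the yield keyword, it is no longer an ordinary function but a generator function, and calling it returns a generator:

The hardest part to grasp is that generators and functions execute differently. A function runs sequentially and returns when it reaches a return statement or its last line. A generator function, by contrast, runs each time next() is called. It pauses and hands back a value at each yield statement, and the next time it resumes from that yield.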
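Roughly speaking, a for loop calls iter() on the object, then calls next() repeatedly, and stops quietly when StopIteration is raised. Here is a minimal sketch of that behavior written out by hand:

```python
g = (x * x for x in range(5))

collected = []
it = iter(g)               # a generator is its own iterator, so iter(g) is g
while True:
    try:
        value = next(it)
    except StopIteration:  # raised when the generator is exhausted
        break              # the for loop ends quietly at this point
    collected.append(value)

print(collected)  # [0, 1, 4, 9, 16]
```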

>>> def odd():
...     print("111")
...     yield 1
...     print("222")
...     yield 3
...     print("333")
...     yield 5
...
>>> o = odd()
>>> next(o)
111
1
>>> next(o)
222
3
>>> next(o)
333
5
>>> next(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

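Iterators

An object that can be passed to the next() function and keeps returning the next value is called an Iterator.

Generators are all Iterator objects. A list, dict or str, although Iterable, is not an Iterator.

To turn an Iterable such as a list, dict or str into an Iterator, use the iter() function: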

>>> from collections.abc import Iterator
>>> isinstance(iter([]), Iterator)
True
>>> isinstance(iter('abc'), Iterator)
True
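You can check the difference between an Iterable and an Iterator with the abstract base classes in collections.abc. A short sketch:

```python
from collections.abc import Iterable, Iterator

nums = [1, 2, 3]
print(isinstance(nums, Iterable), isinstance(nums, Iterator))  # True False

it = iter(nums)                      # turn the list into an iterator
print(isinstance(it, Iterator))      # True
print(next(it), next(it), next(it))  # 1 2 3
try:
    next(it)                         # nothing left
except StopIteration:
    print("exhausted")
```

An iterator can be consumed only once. To iterate again, call iter() on the list to get a fresh iterator.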

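Python Classes and Objects

In Python, a class definition starts with the keyword class followed by the class name. Class names conventionally start with a capital letter.

First create a class User. With the class defined, it can serve different users: create a variable user1 whose value is the class name followed by (), which looks a bit like calling a function.
We call user1 an instance of the User class; user1 can also be called an object. To attach data, write the object name, a dot and the attribute name, then assign a value.
Below we give the object two values, first_name and last_name. Once written to an object, first_name and last_name are called fields, and they store the specific data of user1.
Note: attribute names should not be capitalized.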

>>> class User:
...     pass
...
>>> user1 = User()
>>> user1.first_name = "Dave"
>>> user1.last_name = "Bowman"
>>> # (object_name.attribute_name)
>>> print(user1.first_name)
Dave

Many objects can be created from one class. Different objects share the same attribute names but can hold different values.

>>> user2 = User()
>>> user2.first_name = "Frank"
>>> user2.last_name = "Poole"
>>>
>>> print(user1.first_name, user1.last_name)
Dave Bowman
>>> print(user2.first_name, user2.last_name)
Frank Poole

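Below are more features of classes: methods, initialization and docstrings. Together they can turn a simple class into a hub for data.
The first feature is the init method. Functions defined inside a class are called methods, and the init method is written __init__, with two underscores before and after the name (a so-called magic method). It is called every time a new instance is created. Its first parameter is self, which refers to the newly created object; other parameters can follow self, such as full_name and birthday here. The first thing the method does is store these values as attributes of the object: self.name is set to full_name, so full_name is stored in the attribute self.name, and self.birthday is set to birthday. Be careful here. The birthday after the equals sign is the value passed in when the object is created, while the birthday in self.birthday is the attribute that stores that value.
Now create a user with this __init__ method. We must supply two values, because __init__ here takes two: first the name, then the birthday. Then print the object's two attributes.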

class User:
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

user = User("Dave Bowman", "19710315")
print(user.name)
print(user.birthday)

python3 classes.py

Dave Bowman
19710315

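Let's add another feature. In __init__, extract the first and last names by calling split on full_name, which cuts the string at every space and stores the pieces in a list. first_name is the first item of the list, [0], and last_name is the last item, [-1].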

class User:
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

        # Extract first and last names
        name_pieces = full_name.split(" ")
        self.first_name = name_pieces[0]
        self.last_name = name_pieces[-1]

user = User("Dave Bowman", "19710315")
print(user.name)
print(user.first_name)
print(user.last_name)
print(user.birthday)

python3 classes.py

Dave Bowman
Dave
Bowman
19710315

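Next, improve the class with documentation and add a method that returns the user's age in years. Like __init__, its first parameter is self, and to be responsible we give it a docstring. The age is computed from the birthday, so the method needs no extra parameters. Since dates are involved, import the datetime module.
First get today's date. Here it is hard-coded as 2001-05-12 so that everyone gets the same test result. Next, convert the birthday to a date, using [start:stop] slices to get the year, month and day.
That gives a date object for the user's birthday. Subtracting it from today's date yields a timedelta object whose days attribute holds the difference in days. Ignoring leap years, divide the day count by 365, then convert age_in_years to an int and return it.
To test the method, reuse the earlier user and print user.age().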

import datetime

class User:
    """A member of FriendFace. For now we are only storing their name and birthday. But soon we will store an uncomfortable amount of user information."""
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

        # Extract first and last names
        name_pieces = full_name.split(" ")
        self.first_name = name_pieces[0]
        self.last_name = name_pieces[-1]

    def age(self):
        """Return the age of the user in years."""
        today = datetime.date(2001, 5, 12)
        yyyy = int(self.birthday[0:4])
        mm = int(self.birthday[4:6])
        dd = int(self.birthday[6:8])
        dob = datetime.date(yyyy, mm, dd) # Date of birth
        age_in_days = (today - dob).days
        age_in_years = age_in_days / 365
        return int(age_in_years)

user = User("Dave Bowman", "19710315")
print(user.age())

python3 classes.py

30
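Dividing by 365 ignores leap days, so the method can report the new age several days before the actual birthday. Below is a minimal alternative sketch that compares (month, day) tuples instead. age_on is a hypothetical helper, and it assumes the same yyyymmdd birthday format:

```python
import datetime

def age_on(today, birthday):
    """Hypothetical helper: whole years from a yyyymmdd birthday to `today`."""
    dob = datetime.datetime.strptime(birthday, "%Y%m%d").date()
    # Subtract one year if this year's birthday has not happened yet
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

print(age_on(datetime.date(2001, 5, 12), "19710315"))  # 30
print(age_on(datetime.date(2001, 3, 10), "19710315"))  # 29 (dividing by 365 would give 30)
```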

Note that when calling the age method you don't write self. The keyword self appears only when defining a method.
Now try help(User):

help(User)

Help on class User in module __main__:

class User(builtins.object)
 |  User(full_name, birthday)
 |  
 |  A member of FriendFace. For now we are only storing their name and birthday. But soon we will store an uncomfortable amount of user information.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, full_name, birthday)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  age(self)
 |      Return the age of the user in years.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:

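The help summary includes the docstring of the age method.

[Summary: Classes]:

Fields (attributes) bundle data together
Functions defined in a class are called methods
Objects created from a class are called instances

Inheritance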

class Animal(object):
    def run(self):
        print('Animal is running...')

class Dog(Animal):

    def run(self):
        print('Dog is running...')

class Cat(Animal):

    def run(self):
        print('Cat is running...')

dog = Dog()
dog.run()

cat = Cat()
cat.run()

Dog is running...
Cat is running...

super().__init__()

class Person:
    def __init__(self, name = 'Person'):
        self.name = name


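class Puple(Person):  # inherits Person directly, so name is available
    pass


class Puple_Init(Person):  # inherits Person but overrides __init__ to add an age attribute
    def __init__(self, age): # name is NOT available: Person.__init__ never runs
        self.age = age

class Puple_Super(Person):  # inherits Person and extends __init__ to add an age attribute
    def __init__(self, name, age):  # name is available via the parent __init__
        # super(Class, self).method(); in Python 3, super().__init__(name) is equivalent
        super(Puple_Super, self).__init__(name)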
        self.age = age
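To see the difference, this sketch repeats two of the classes above and creates one instance of each. It uses the Python 3 shorthand super().__init__ in place of super(Puple_Super, self).__init__, and the name "Alice" is a hypothetical value:

```python
class Person:
    def __init__(self, name="Person"):
        self.name = name

class Puple_Init(Person):
    def __init__(self, age):
        self.age = age            # Person.__init__ never runs, so there is no .name

class Puple_Super(Person):
    def __init__(self, name, age):
        super().__init__(name)    # runs Person.__init__, which sets self.name
        self.age = age

p1 = Puple_Init(12)
p2 = Puple_Super("Alice", 12)
print(hasattr(p1, "name"))   # False
print(p2.name, p2.age)       # Alice 12
```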

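Modules

A module in Python is basically a file containing Python code: each .py file is a module.

To avoid module name clashes, Python also lets you organize modules by directory. Such a directory is called a package.

For example, a file abc.py is a module named abc, and a file xyz.py is a module named xyz.

Now suppose the names abc and xyz clash with other modules. We can avoid the conflict by organizing the modules into a package. Choose a top-level package name, say mycompany, and lay out the files like this: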

mycompany
├─ __init__.py
├─ abc.py
└─ xyz.py

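With a package in place, none of your modules will clash with anyone else's as long as the top-level package name is unique. The abc.py module is now named mycompany.abc, and likewise xyz.py becomes mycompany.xyz.

Note that every package directory contains a file named __init__.py. For a regular package this file is required; without it, Python treats the directory as an ordinary directory rather than a package. (Since Python 3.3 such a directory can still be imported as a namespace package, but including the file keeps things explicit.) __init__.py can be empty or contain Python code, because __init__.py is itself a module, and its module name is mycompany.

Similarly, several directory levels form a multi-level package hierarchy, for example: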

mycompany
 ├─ web
 │  ├─ __init__.py
 │  ├─ utils.py
 │  └─ www.py
 ├─ __init__.py
 ├─ abc.py
 └─ utils.py

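The module name of www.py is mycompany.web.www, and the two utils.py files have the module names mycompany.utils and mycompany.web.utils.

Note: name your own modules carefully so they don't clash with modules in Python's standard library. For example, the standard library includes a random module. If you name your own file random.py, then scripts in the same directory will import your file instead, and the standard random module can no longer be imported there.

Importing Modules

For example, suppose we have written the following two functions in a file named converters.py: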

def lbs_to_kg(weight):
    return weight * 0.45359237  # 1 lb = 0.45359237 kg

def kg_to_lbs(weight):
    return weight / 0.45359237

Another file in the same directory can then import it:

import converters

print(converters.kg_to_lbs(70))

We can also import it another way:

from converters import kg_to_lbs

kg_to_lbs(100)

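When we import a specific function from our module this way, we can call it directly without the module-name prefix.

The from … import * Statement

You can also import everything from a module into the current namespace with the following statement:

from modname import *

This is an easy way to import all the items in a module. However, the statement should not be overused: it fills the namespace with names whose origin is hard to see, and it can silently overwrite existing names.

The __name__ Attribute

When a module is imported by another program for the first time, its top-level code runs. If we want a block of code in the module not to run on import, we can use the __name__ attribute so that the block runs only when the module itself is executed.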

#!/usr/bin/python3
# Filename: using_name.py

if __name__ == '__main__':
    print('Running as the main program')
else:
    print('Imported from another module')

python using_name.py

Running as the main program

python
>>> import using_name
Imported from another module
>>>

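Explanation: every module has a __name__ attribute. When its value is '__main__', the module itself is being run; otherwise it has been imported.

Packages

To keep modules organized, multiple modules are grouped into packages, and Python handles packages quite conveniently. Simply put, a package is a folder that contains an __init__.py file.

A common package layout looks like this: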

package_a
├─ __init__.py
├─ module_a1.py
├─ module_a2.py
└─ ...

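In the simplest case, an empty __init__.py file is all you need. The file can also run initialization code for the package or define the __all__ variable, which controls what from package import * imports. A package can also contain sub-packages, just as folders contain folders, which is fairly easy to understand.

With the from statement, we can import a specific module from a package, or start from package.module and import one or more specific functions.

Relative imports are resolved based on the name of the current module. Since the name of the main module is always __main__, the main module of a Python application should always use absolute imports.

Lambda Expressions & Anonymous Functions

The terms "anonymous function" and "lambda" are used interchangeably.

To create a lambda expression, write lambda, then the inputs, then a colon, and finally an expression whose value is returned.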
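Here is a minimal sketch of __all__ in action. It builds a throwaway copy of the package_a layout above in a temporary directory; the module contents and the hello function are hypothetical. The star import runs through exec so that the imported names can be inspected:

```python
import sys
import tempfile
from pathlib import Path

# Build a throwaway package_a like the layout above (contents are hypothetical)
root = Path(tempfile.mkdtemp())
pkg = root / "package_a"
pkg.mkdir()
# __all__ lists what "from package_a import *" should import
(pkg / "__init__.py").write_text('__all__ = ["module_a1"]\n')
(pkg / "module_a1.py").write_text('def hello():\n    return "hello from a1"\n')
(pkg / "module_a2.py").write_text('def hello():\n    return "hello from a2"\n')

sys.path.insert(0, str(root))   # make the temporary package importable

namespace = {}
exec("from package_a import *", namespace)   # star import into a fresh namespace
print("module_a1" in namespace, "module_a2" in namespace)  # True False
print(namespace["module_a1"].hello())                      # hello from a1
```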

>>> g = lambda x: 3*x+1
>>> g(2)
7

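Next example: suppose you are processing data from a web sign-up form and want to combine the separate first and last names into a full name for the user interface.

The anonymous function takes two inputs, fn and ln. Both values first go through strip to remove leading and trailing whitespace, then through title so that only the first letter is capitalized. Note that we add a space between fn and ln.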

>>> full_name = lambda fn,ln: fn.strip().title() + " " + ln.strip().title()
>>> full_name("    leonhard", "EULER")
'Leonhard Euler'

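Test it just as you would any other function.

Here is the basic recipe for a lambda expression:

Start with the lambda keyword, followed by any number of parameters; like ordinary functions, a lambda can also take no parameters at all
Then type a colon
Finally write a single expression; its result is the return value

Note: a lambda can contain only a single expression, so multi-line functions cannot be written as lambdas.

Now let's look at a common use of lambda expressions: functions that are never given a name.

Suppose we have a list of science fiction authors, scifi_authors, and want to sort it by last name. Note that some names include a middle name and some include initials.
Our strategy is to write an anonymous function that extracts the last name, and sort by that value. Lists have a built-in sort method whose key parameter specifies what to sort by. We pass a lambda that splits the name on spaces, takes the last piece with index -1 and, as a precaution, converts it to lowercase so the sort is case-insensitive.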

>>> scifi_authors = ["Isaac Asimov", "Ray Bradbury", "Robert Heinlein", "Arthur C. Clarke", "Frank Herbert", "Orson Scott Card", "Douglas Adams", "H. G. Wells", "Leigh Brackett"]
>>> scifi_authors.sort(key=lambda name: name.split(" ")[-1].lower())
>>> scifi_authors
['Douglas Adams', 'Isaac Asimov', 'Leigh Brackett', 'Ray Bradbury', 'Orson Scott Card', 'Arthur C. Clarke', 'Robert Heinlein', 'Frank Herbert', 'H. G. Wells']

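Going further, let's write a function that builds other functions. Suppose you work with quadratic functions, perhaps to compute cannonball trajectories.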

>>> def build_quadratic_function(a, b, c):
...     """Returns the function f(x) = ax^2 + bx + c"""
...     return lambda x: a*x**2 + b*x + c
...
>>> f = build_quadratic_function(2, 3, -5)
>>> f(0)
-5
>>> f(1)
0
>>> f(2)
9

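An even neater trick is to call the returned function directly, without giving it a name: put (2) right after (3, 0, 1).

>>> build_quadratic_function(3, 0, 1)(2)  # 3x^2+1 evaluated for x=2
13

Map, Filter, and Reduce Functions

The map Function

Let's start with map. Suppose we have a function that computes the area of a circle of radius r, and we need the areas of many different circles. Here is the list of radii to compute.

Method 1: create an empty list areas, loop over the values in radii, compute the area for each radius, and append it to areas.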

import math


def area(r):
    """Area of a circle with radius 'r'."""
    return math.pi * (r**2)


radii = [2, 5, 7.1, 0.3, 10]

# Method 1: Direct method

areas = []
for r in radii:
    a = area(r)
    areas.append(a)

print(areas)

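python3 map_radii.py

[12.566370614359172, 78.53981633974483, 158.36768566746147, 0.2827433388230814, 314.1592653589793]

With map, a single line does the job. map takes two arguments: a function, and a list, tuple or other iterable to loop over.

Here map applies the area function to every element of radii. Note, however, that map does not return a list but a map object, an iterator over the results. This is actually a good thing, especially with large amounts of data. When we need a list, we can store the map results in one.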

import math


def area(r):
    """Area of a circle with radius 'r'."""
    return math.pi * (r**2)


radii = [2, 5, 7.1, 0.3, 10]

# Method 2: Use 'map' function

print(list(map(area, radii)))

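python3 map_radii.py

[12.566370614359172, 78.53981633974483, 158.36768566746147, 0.2827433388230814, 314.1592653589793]

Here is how map works:

Given a list, tuple or other iterable, map applies a function f to each item. Pass map the function f first, then the data to loop over, and map returns the results of applying f to each item.

The second example applies map to structured records. Suppose you work for a weather service where all temperatures are stored in Celsius, and unexpectedly someone needs them in Fahrenheit.

Our goal is to convert the temperatures in the list to Fahrenheit, using the formula $F = \frac{9}{5} \cdot C + 32$ . First write the formula as a lambda. It takes a (city, temperature) tuple and returns a new tuple in which item 0, the name, is unchanged and the Celsius temperature is converted to Fahrenheit.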
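This sketch reuses the area function and radii from above to show that map is lazy: values are computed only when requested, and a map object can be consumed only once. It also shows that map gives the same values as a list comprehension.

```python
import math

def area(r):
    """Area of a circle with radius r."""
    return math.pi * (r**2)

radii = [2, 5, 7.1, 0.3, 10]

lazy = map(area, radii)   # nothing has been computed yet
first = next(lazy)        # the first area is computed on demand
rest = list(lazy)         # consumes the remaining four results
print(first, len(rest))   # 12.566370614359172 4
print(list(lazy))         # [] because the map object is now exhausted

# map(f, data) yields the same values as a list comprehension
print(list(map(area, radii)) == [area(r) for r in radii])  # True
```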

temps = [("Berlin", 29), ("Cairo", 36), ("Buenos Aires", 19), ("Los Angeles", 26), ("Tokyo", 27), ("New York", 28), ("London", 22), ("Beijing", 32)]

c_to_f = lambda data: (data[0], (9/5)*data[1] + 32)

print(list(map(c_to_f, temps)))

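python3 map_radii.py

[('Berlin', 84.2), ('Cairo', 96.8), ('Buenos Aires', 66.2), ('Los Angeles', 78.80000000000001), ('Tokyo', 80.6), ('New York', 82.4), ('London', 71.6), ('Beijing', 89.6)]

As a higher-order function, map can apply any function, however complex. For example, it can convert every number in this list to a string:

>>> list(map(str, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

The filter Function

The filter function picks the data you need out of a list, tuple or other collection. It gets its name because it filters out what you don't need.

Suppose you are analyzing some data and need to keep all values above the average.

Since we need the mean, first import the statistics module and compute the average of the data. Then use filter to keep the numbers greater than the average. The first argument is a function, here an anonymous function that checks whether its input is greater than the average; the second argument is the data.
filter returns only the items for which the function returns True. Again, the return value is not a list but a filter object, which iterates over the results; use list() to store them as a list.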

import statistics

data = [1.3, 2.7, 0.8, 4.1, 4.3, -0.1]
avg = statistics.mean(data)

print("avg = %f" % avg)
print(list(filter(lambda x: x > avg, data)))

python3 filter_avg.py

avg = 2.183333
[2.7, 4.1, 4.3]

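Here is another use of filter. Data from external sources often has missing or empty values. For example, here is a list of some South American countries; note the many empty strings.

We can remove them with filter without writing any function. Pass None as the first argument and the data as the second, and filter removes every value that is falsy.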

# Remove missing data

countries = ["", "Argentina", "", "Brazil", "Chile",
             "", "Colombia", "", "Ecuador", "", "", "Venezuela"]

print(list(filter(None, countries)))

python3 filter_avg.py

['Argentina', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Venezuela']

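In Python the following values are treated as False: the empty string, 0, 0.0, 0j, the empty list, the empty tuple, the empty dict, False and None. So be careful when using filter this way, because 0 is often a valid data point that you don't want filtered out.

To remove empty (or whitespace-only) strings from a sequence, you can write: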
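Here is a minimal sketch of that pitfall. The readings list is hypothetical data in which 0 is a real measurement and None marks a missing value:

```python
readings = [3, 0, None, 7, 0.0, 5]   # hypothetical data: 0 is a real measurement

dropped_zeros = list(filter(None, readings))
kept_zeros = list(filter(lambda x: x is not None, readings))

print(dropped_zeros)  # [3, 7, 5]  (the zeros are gone too!)
print(kept_zeros)     # [3, 0, 7, 0.0, 5]
```

Testing explicitly with "is not None" removes only the missing values and keeps the zeros.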

def not_empty(s):
    return s and s.strip()

list(filter(not_empty, ['A', '', 'B', None, 'C', '  ']))
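# Result: ['A', 'B', 'C']

The reduce Function

reduce applies a function to a sequence [x1, x2, x3, $\cdots$ ]. The function must take two arguments, and reduce keeps combining the running result with the next element of the sequence. The effect is: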

reduce(f, [x1, x2, x3, x4]) = f(f(f(x1, x2), x3), x4)
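To see the accumulation order in the formula above, here is a small sketch with a function that prints each step. It uses addition instead of multiplication only to keep the numbers small.

```python
from functools import reduce

def f(acc, x):
    """Add two numbers and print the step."""
    print(f"f({acc}, {x}) -> {acc + x}")
    return acc + x

total = reduce(f, [1, 2, 3, 4])
# f(1, 2) -> 3
# f(3, 3) -> 6
# f(6, 4) -> 10
print(total)  # 10

# An optional third argument is used as the starting value
with_start = reduce(f, [1, 2, 3, 4], 100)
print(with_start)  # 110
```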

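To use reduce, import it from the functools module. Let's use reduce to multiply all the numbers in a list; for the demo, first create a list of the first 10 primes.

reduce needs a function of 2 inputs, and here a lambda multiplies them together.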

from functools import reduce

# Multiply all numbers in a list

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
multiplier = lambda x, y: x*y

print(reduce(multiplier, data))

python3 reduces.py

6469693230

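Sorting in Python: sort vs. sorted

Let's start with an example: sorting alphabetically. The list holds the 6 alkaline earth metals, currently ordered by atomic number. What if we want them in alphabetical order?

Call the sort method on the list directly. By default, sort arranges items in ascending order, which for strings is alphabetical. To reverse the order, call sort with the reverse parameter set to True.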

>>> # Alkaline earth metals
... 
>>> earth_metals = ["Beryllium", "Magnesium", "Calcium", "Strontium", "Barium", "Radium"]
>>> earth_metals.sort()
>>> earth_metals
['Barium', 'Beryllium', 'Calcium', 'Magnesium', 'Radium', 'Strontium']
>>> 
>>> earth_metals.sort(reverse=True)
>>> earth_metals
['Strontium', 'Radium', 'Magnesium', 'Calcium', 'Beryllium', 'Barium']

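If we store the data in a tuple instead of a list, calling sort on the tuple raises an error! The reason is that tuples are immutable: their elements cannot be rearranged, and sorting in place requires changing the order.

The sort method takes two parameters, key and reverse. By default reverse is False, which means the data is sorted in ascending order. Note that sort is an in-place algorithm: Python does not create a second list to hold the sorted data but sorts the list itself!

Now the key parameter, which determines the values used for comparison. Pass key a function that returns the value to sort by.

Let's look at an example that uses all the features of sort: a list of the 8 planets of the solar system. Each planet's data includes its name, its equatorial radius in kilometers, its density in grams per cubic centimeter, and its mean distance from the sun in astronomical units. One astronomical unit is the mean distance between the earth and the sun.

Currently the planets are ordered by distance from the sun. Now we want to sort them by radius, largest first. We need a function that returns the value to sort by, which in this case is the second item of the tuple, the radius. A lambda works: its input is a planet's data and its output is the radius. Remember that indexing starts at 0, not 1, so the radius is at index 1. Then call the sort method, passing the size function as key. Since we want the planets from largest to smallest (descending order), set reverse to True.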
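Here is a minimal sketch of that error with the alkaline earth metals stored in a tuple:

```python
earth_metals = ("Beryllium", "Magnesium", "Calcium", "Strontium", "Barium", "Radium")

try:
    earth_metals.sort()
except AttributeError as err:
    print(err)   # 'tuple' object has no attribute 'sort'
```

Tuples simply have no sort method. The built-in sorted function, covered further below, works on them instead.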

planets = [
    ("Mercury", 2440, 5.43, 0.395),
    ("Venus", 6052, 5.24, 0.723),
    ("Earth", 6378, 5.52, 1.000),
    ("Mars", 3396, 3.93, 1.530),
    ("Jupiter", 71492, 1.33, 5.210),
    ("Saturn", 60268, 0.69, 9.551),
    ("Uranus", 25559, 1.27, 19.213),
    ("Neptune", 24764, 1.64, 30.070)
]

size = lambda planet: planet[1]
planets.sort(key=size, reverse=True)
print(planets)

python3 sorts.py

[('Jupiter', 71492, 1.33, 5.21), ('Saturn', 60268, 0.69, 9.551), ('Uranus', 25559, 1.27, 19.213), ('Neptune', 24764, 1.64, 30.07), ('Earth', 6378, 5.52, 1.0), ('Venus', 6052, 5.24, 0.723), ('Mars', 3396, 3.93, 1.53), ('Mercury', 2440, 5.43, 0.395)]

Now sort the planets by density, starting with the lowest:

planets = [
    ("Mercury", 2440, 5.43, 0.395),
    ("Venus", 6052, 5.24, 0.723),
    ("Earth", 6378, 5.52, 1.000),
    ("Mars", 3396, 3.93, 1.530),
    ("Jupiter", 71492, 1.33, 5.210),
    ("Saturn", 60268, 0.69, 9.551),
    ("Uranus", 25559, 1.27, 19.213),
    ("Neptune", 24764, 1.64, 30.070)
]

density = lambda planet: planet[2]
planets.sort(key=density)
print(planets)

python3 sorts.py

[('Saturn', 60268, 0.69, 9.551), ('Uranus', 25559, 1.27, 19.213), ('Jupiter', 71492, 1.33, 5.21), ('Neptune', 24764, 1.64, 30.07), ('Mars', 3396, 3.93, 1.53), ('Venus', 6052, 5.24, 0.723), ('Mercury', 2440, 5.43, 0.395), ('Earth', 6378, 5.52, 1.0)]
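The standard library's operator.itemgetter builds the same kind of key function as the lambdas above. Here is a minimal sketch on four of the planets:

```python
from operator import itemgetter

planets = [
    ("Mercury", 2440, 5.43, 0.395),
    ("Venus", 6052, 5.24, 0.723),
    ("Earth", 6378, 5.52, 1.000),
    ("Mars", 3396, 3.93, 1.530),
]

# itemgetter(2) returns item 2 (the density), like lambda planet: planet[2]
planets.sort(key=itemgetter(2))
print([name for name, *_ in planets])  # ['Mars', 'Venus', 'Mercury', 'Earth']
```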

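In all the examples so far, the sort method changed the list itself (list.sort()). What if we want a sorted copy instead? And what about tuples: can a tuple be sorted at all? This is where the built-in sorted() function comes in.

sorted takes a list, tuple or any other iterable as its first argument. Like the list sort method, it accepts the key and reverse parameters.

Let's sort the alkaline earth metals from above with sorted: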

>>> earth_metals = ["Beryllium", "Magnesium", "Calcium", "Strontium", "Barium", "Radium"]
>>> sorted_earth_metals = sorted(earth_metals)
>>> sorted_earth_metals
['Barium', 'Beryllium', 'Calcium', 'Magnesium', 'Radium', 'Strontium']
>>> earth_metals
['Beryllium', 'Magnesium', 'Calcium', 'Strontium', 'Barium', 'Radium']

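The result is in ascending alphabetical order, and the original earth_metals list is unchanged.

Next example: sorting a tuple of 10 positive integers typed in at random. Tuples have no sort method because they are immutable, but passing one to sorted returns a sorted list. The input is a tuple, the output is a list, and the original tuple stays the same!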

>>> data = (7, 2, 5, 6, 1, 3, 9, 10, 4, 8)
>>> sorted(data)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> data
(7, 2, 5, 6, 1, 3, 9, 10, 4, 8)

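The next example may be surprising: sorting the characters of a string. A string is iterable, character by character.

sorted returns a list of all the characters in the string, in sorted order. Typing sorted("Alphabetically") shows that the uppercase A comes before the lowercase a.

>>> sorted("Alphabetically")
['A', 'a', 'a', 'b', 'c', 'e', 'h', 'i', 'l', 'l', 'l', 'p', 't', 'y']

The sort method works only on lists, and it changes the list itself!
The sorted function is more general: it works on lists, tuples, strings and any other iterable. Better still, it leaves the original data unchanged and returns a new sorted list.

Text Files in Python

There are two types of files: text files and binary files.
Text files hold human-readable data
- e.g. plain text, XML, JSON, even source code
Binary files store compiled code, app data, and media files (images, audio, video)

Today we'll learn how to read and write text files.

Reading Files

Open files in Python with the built-in open function. Here we mainly use its file and mode parameters.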

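One way is to pass open() the file name (the file sits in the same directory). By default the file is opened in read-only mode, and a file object f is returned. The read method returns all the text in the file, which we store in text, and finally we close f. Printing text shows the entire content.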

>>> f = open("text.txt")
>>> text = f.read()
>>> f.close()
>>> 
>>> print(text)
used for open files

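This approach has a weakness: what if something goes wrong before close is called? We don't want open files tying up resources. There is a better, safer and more concise way to read files.

The second way opens the file with the with keyword. As before, Python opens the file you name and binds it to a file object, and you read its content the same way. With this approach, however, you no longer need to call close(). Even if an exception occurs inside the block, Python closes the file.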

>>> with open("text.txt") as fobj:
...     bio = fobj.read()
... 
>>> print(bio)
used for open files
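Roughly speaking, the with statement saves you from writing a try/finally block that closes the file. Below is a minimal sketch of that equivalent form. It writes a throwaway text.txt into a temporary directory so it can run anywhere.

```python
import os
import tempfile

# Throwaway file for the demo
path = os.path.join(tempfile.mkdtemp(), "text.txt")
with open(path, "w") as f:
    f.write("used for open files\n")

# Roughly what "with open(path) as fobj: bio = fobj.read()" does for you
fobj = open(path)
try:
    bio = fobj.read()
finally:
    fobj.close()   # runs even if read() raises an exception

print(bio, end="")   # used for open files
print(fobj.closed)   # True
```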

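What happens if you try to open a file that doesn't exist? Python raises a FileNotFoundError. One solution is to put the code in a try block and handle FileNotFoundError in an except clause. For example, if the file doesn't exist, set text to None; whenever the file is not found, text ends up as None.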

>>> try:
...     with open("svbsdv.txt") as f:
...             text = f.read()
... except FileNotFoundError:
...     text = None
...
>>> print(text)
None

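Writing Files

For example, to write the names of the five oceans to a .txt file, specify the mode w, for write. Without a mode, Python always opens files read-only. If oceans.txt doesn't exist, Python creates it. If it already exists, its existing content is erased before writing, so be careful!

Loop over the oceans list and write each name to the file with the write method. Remember that when you open a file with with, Python closes it for you automatically.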

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             f.write(ocean)

cat oceans.txt

PacificAtlanticIndianSouthernArctic

All the names were written, but with nothing between them: we never asked for a separator, so Python didn't add one. Write a newline after each name.

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             f.write(ocean)
...             f.write("\n")

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic

Another way to put each string on its own line is to use the print function with its file parameter set to f.

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             print(ocean, file=f)

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic

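Note that every time the code runs, the file is overwritten, because we are using write mode. Is there a way to write to a file without overwriting it?

Pass "a" as the mode after the file name to use append mode. In append mode, a missing target file is created, and an existing one is added to at the end without overwriting its original content.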

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             print(ocean, file=f)
...
>>> with open("oceans.txt", "a") as f:
...     print(23*"=", file=f)
...     print("These are the 5 oceans.", file=f)

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic
=======================
These are the 5 oceans.

Exceptions in Python

Exceptions are handled with try and except statements.

try:
    age = int(input('Age: '))
    print(age)
except ValueError:
    print('Invalid value')

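In the example above, if the input is not a valid integer, a ValueError is raised and the code in the except block runs.

Of course, there are many kinds of errors, and you should prepare appropriate handling for each one you expect.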
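Here is a minimal sketch of handling more than one kind of error with several except clauses, plus an else clause that runs only when no exception occurred. risk_score is a hypothetical helper, and the formula 10000 / age is arbitrary.

```python
def risk_score(text):
    """Hypothetical helper: parse an age and compute 10000 / age."""
    try:
        age = int(text)
        score = 10000 / age
    except ValueError:
        return "Invalid value"
    except ZeroDivisionError:
        return "Age cannot be 0"
    else:
        return score          # runs only when no exception was raised

print(risk_score("40"))   # 250.0
print(risk_score("abc"))  # Invalid value
print(risk_score("0"))    # Age cannot be 0
```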

The Path Class in pathlib

Path.exists()

Checks whether the path exists, as either a file or a folder, and returns a boolean.

e.g.:

from pathlib import Path

path = Path("emails")
print(path.exists())

Path.mkdir()

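Creates a folder at the path. It raises FileExistsError if the folder already exists, unless you pass exist_ok=True.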

from pathlib import Path

path = Path("emails")
path.mkdir()

Path.rmdir()

Deletes the folder at path, which must be empty.

Path.glob()

Path.glob(pattern): yields all files under the path that match pattern. It returns a generator.
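Here is a minimal sketch of mkdir, exists and rmdir together. It works inside a throwaway temporary directory, and the file name note.txt is hypothetical:

```python
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())   # throwaway parent directory for the demo
path = base / "emails"

path.mkdir(exist_ok=True)         # exist_ok=True: no error if it already exists
(path / "note.txt").write_text("hi")

try:
    path.rmdir()                  # fails because the folder is not empty
except OSError as err:
    print("cannot remove:", err)

(path / "note.txt").unlink()      # delete the file to empty the folder
path.rmdir()                      # now it succeeds
print(path.exists())              # False
```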

from pathlib import Path

path = Path()
for file in path.glob('*.md'):
    print(file)

blog-setup.md
c_plus.md
discrete-mathematics.md
docker.md
git-tutorial.md
hello-world.md
information-theory.md
manjaro.md
mathematical-statistics.md
matrix-theory.md
migrate-git-repository.md
network.md
nginx.md
optimality-theory.md
php.md
python-mosh.md
python.md
samba.md
server-setup.md
statistic.md
theory-of-computation.md
top-reading.md

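The *.md pattern can be changed to whatever file type you want to search for, e.g. *.xls, *.py and so on.

Miscellaneous

zip and the Star Operator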

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# without *
list(zip(a))
Out[414]: [([1, 2, 3],), ([4, 5, 6],), ([7, 8, 9],)]

# with *
list(zip(*a))
Out[415]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# equivalent to
m = [1, 2, 3]; n = [4, 5, 6]; q = [7, 8, 9]

list(zip(m,n,q))
Out[417]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# unzip (transpose back)
a, b, c = zip(*zip(m,n,q))

list(a)
Out[419]: [1, 2, 3]
list(b)
Out[420]: [4, 5, 6]
list(c)
Out[421]: [7, 8, 9]

Magic Methods
class Test():
    def __getitem__(self, key: str):
        return key

If a class defines the __getitem__ method, its instances t can be indexed with t[key].
class Test():
    def __repr__(self):
        return 'xxxx'

Overriding a class's __repr__ method controls what is shown when an instance is printed.

Class Methods
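Combining the two magic methods above in one class shows what triggers each of them:

```python
class Test():
    def __getitem__(self, key: str):
        return key

    def __repr__(self):
        return 'xxxx'

t = Test()
print(t["abc"])   # abc           (calls t.__getitem__("abc"))
print(t)          # xxxx          (print falls back to __repr__ when there is no __str__)
print([t, t])     # [xxxx, xxxx]  (containers show their items with __repr__)
```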

class Test():

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def t1(cls):
        print("cls:", cls)

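A method decorated with classmethod can be called without creating an instance and does not take self. Instead, its first parameter must be cls, which represents the class itself and can be used to access class attributes, call class methods, create instances, and so on.

Further reading: "Understanding @staticmethod and @classmethod in Python"

Decorators

@staticmethod

Turns a method into a static method.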
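A common use of cls is an alternative constructor. Here is a minimal sketch based on the Test class above; from_string is a hypothetical method name:

```python
class Test():
    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def from_string(cls, text):
        """Hypothetical alternative constructor: build an instance from "a,b"."""
        a, b = text.split(",")
        return cls(int(a), int(b))   # cls is Test (or a subclass); no instance needed

t = Test.from_string("3,4")
print(t.a, t.b)   # 3 4
```

Because the method calls cls(...) rather than Test(...), a subclass that inherits from_string gets instances of the subclass.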

class MyClass:
    @staticmethod
    def the_static_method():
        return "This is a static method"

@classmethod

Turns a method into a class method.

class MyClass:
    class_variable = 0

    def __init__(self, value):
        self.instance_variable = value
    
    @classmethod
    def increment_class_variable(cls):
        cls.class_variable += 1

@property

Turns a method into a property, so the method can be accessed as if it were an attribute.

import math

class Circle:
    def __init__(self, radius):
        self._radius = radius
    
    @property
    def area(self):
        return math.pi * self._radius ** 2

@setter

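Used together with @property, written as @<property name>.setter, to define how a property's value is set.

import math

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("Radius must be non-negative")
        self._radius = value

    @property
    def area(self):
        return math.pi * self._radius ** 2

@staticmethod

Turns a method into a static method.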

class MathUtils:
    @staticmethod
    def add(x, y):
        return x + y

@functools.wraps

Preserves the metadata (name, docstring and so on) of the decorated function.

import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("Before the function is called")
        result = func(*args, **kwargs)
        print("After the function is called")
        return result
    return wrapper
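To see what functools.wraps preserves, this sketch compares a decorator with it and one without it. The decorator and function names are illustrative.

```python
import functools

def plain_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped_decorator(func):
    @functools.wraps(func)          # copy __name__, __doc__, etc. from func
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain_decorator
def greet():
    """Say hello."""
    return "hello"

@wrapped_decorator
def greet2():
    """Say hello."""
    return "hello"

print(greet.__name__, greet.__doc__)    # wrapper None
print(greet2.__name__, greet2.__doc__)  # greet2 Say hello.
```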

@timeit

Measures a function's execution time. This is a custom decorator, not a built-in one.

import time

def timeit(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"Elapsed: {end_time - start_time:.6f} s")
        return result
    return wrapper

@login_required

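Checks whether the user is logged in. This is a simplified version of the login_required decorator that Django provides in django.contrib.auth.decorators.

from django.shortcuts import redirect  # assumes a Django project

def login_required(func):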
    def wrapper(request, *args, **kwargs):
        if request.user.is_authenticated:
            return func(request, *args, **kwargs)
        else:
            return redirect("/login/")
    return wrapper

@retry

Retries a function up to a given number of times when it fails.

import time

def retry(max_retries):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Exception {e}, retrying...")
                    time.sleep(1)
            return "Max retries exceeded"
        return wrapper
    return decorator


This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.
