Python

2019-09-17
Author: Sirius

I came across a really cool Python tutorial on YouTube.

Python Strings

In Python, strings can be created with single, double, or triple quotes.

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> message = "Meet me tonight."
>>> print(message)
Meet me tonight.
>>>
>>> message2 = 'The clock strikes at midnight.'
>>> print(message2)
The clock strikes at midnight.

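When a string itself contains single or double quotes, there are two ways to handle it. The first is to escape the quote with a backslash (\). The second is to wrap the string in triple quotes.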

>>> movie_quote = """One of my favorite lines from The Godfather is:
... "I'm going to make him an offer he can't refuse."
... Do you know who said this?"""
>>>

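Inside a triple-quoted string, pressing Enter makes the interpreter show the ... continuation prompt, which indicates that the string continues on the next line.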
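The two approaches can be compared side by side. This is only a small sketch; the variable names are made up for illustration.

```python
# Option 1: escape the inner quote with a backslash.
escaped = 'I\'m going to make him an offer he can\'t refuse.'

# Option 2: pick an outer quote that does not appear inside the text.
double_quoted = "I'm going to make him an offer he can't refuse."

# Option 3: triple quotes allow both kinds of quote (and line breaks) inside.
triple_quoted = """He said: "I'm going to make him an offer he can't refuse." """

print(escaped == double_quoted)   # the backslash is not part of the string
print(triple_quoted)
```

The backslash only exists in the source code; the resulting string holds a plain apostrophe.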

Formatted Strings

Suppose we want to print John [Smith] is a coder. We might write it like this:

first = 'John'
last = 'Smith'
message = first + ' [' + last + '] is a coder'
print(message)

This works, but it is not ideal: as the text grows more complex, the code becomes hard to read.

first = 'John'
last = 'Smith'

msg = f'{first} [{last}] is a coder'
print(msg)

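To define a formatted string (an f-string, available since Python 3.6), prefix the string with f and wrap the variable names in curly braces; their values are inserted into the string at run time.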

String Methods

len()
.upper()
.lower()
.find()
.replace()
'...' in variable

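The len() function returns the number of characters in a string, which is useful, for example, for enforcing a length limit on input.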

>>> course = 'Python for Beginners'
>>> print(len(course))
20

To convert every character to uppercase, use the .upper() method.
To convert every character to lowercase, use the .lower() method.

course = 'Python for Beginners'
print(course.upper())
print(course.lower())

>>> print(course.upper())
PYTHON FOR BEGINNERS
>>> print(course.lower())
python for beginners

Note that these methods do not modify the original string; they create a new string and return it.

To locate a character or a sequence of characters in a string, use the .find() method.

course = 'Python for Beginners'
print(course.find('P'))
print(course.find('Beginners'))

>>> print(course.find('P'))
0
>>> print(course.find('Beginners'))
11

The first call returns 0 because the first uppercase P is at index 0.
The second call returns 11 because Beginners starts at index 11.

To replace a character or a substring, use the .replace() method.

course = 'Python for Beginners'
print(course.replace('Beginners', 'Absolute Beginners'))

>>> print(course.replace('Beginners', 'Absolute Beginners'))
Python for Absolute Beginners

Note that both find() and replace() are case-sensitive!

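Finally, to check whether a string contains a character or a sequence of characters, use the in operator. For example, to find out whether the string contains the word Python: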

course = 'Python for Beginners'
print('Python' in course)
print('python' in course)

>>> print('Python' in course)
True
>>> print('python' in course)
False

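Note the difference from find(): find() returns the index of the character or sequence (or -1 if it is not found), whereas the in operator returns a boolean.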

Numbers

Python 3 has three numeric types: int, float, and complex.

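In Python 3 there is no need to worry about the size of an integer. As long as your machine has enough memory, an integer can be arbitrarily large: no size limit, no overflow.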

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> z = 2 - 6.1j
>>> type(z)
<class 'complex'>
>>> z.real
2.0
>>> z.imag
-6.1
>>>

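Note: the real part was entered as an integer but is displayed as a float, because Python stores both the real and the imaginary part of a complex number as floats.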

Arithmetic Operations

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 16/5
3.2
>>> 20/5
4.0
>>> 16 % 5
1
>>> 16//5
3
>>> 2/0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero

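Summary: when an arithmetic operation mixes two numeric types, Python automatically converts the result to the wider type (int, then float, then complex). The / operator returns a float (or complex) result; use // for floor division, and watch out for a divisor of 0.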
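The rules in the summary can be checked with a short script. This is a sketch only; safe_divide is a hypothetical helper introduced here to show one way of guarding against a zero divisor, not something from the original tutorial.

```python
# Mixed-type arithmetic widens to the "larger" type: int -> float -> complex.
print(type(2 + 3))      # int
print(type(2 + 3.0))    # float
print(type(2 + 3j))     # complex

# / returns a float; // floors the quotient; % gives the remainder.
quotient, remainder = divmod(16, 5)
print(16 / 5, quotient, remainder)

# Floor division rounds toward negative infinity, not toward zero.
print(-16 // 5)

def safe_divide(a, b):
    """Hypothetical helper: return a / b, or None when b is zero."""
    try:
        return a / b
    except ZeroDivisionError:
        return None

print(safe_divide(20, 5))
print(safe_divide(2, 0))
```

Returning None is just one possible policy; re-raising with a clearer message is equally reasonable.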

Interactive Help: dir() and help()

▶ python3
Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'ModuleNotFoundError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'RecursionError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopAsyncIteration', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'breakpoint', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 
'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']
>>> help(hex)
Help on built-in function hex in module builtins:

hex(number, /)
    Return the hexadecimal representation of an integer.

    >>> hex(12648430)
    '0xc0ffee'
>>> hex(10)
'0xa'
>>> 0xa
10

>>> import math
>>> dir(math)
['__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'ceil', 'copysign', 'cos', 'cosh', 'degrees', 'e', 'erf', 'erfc', 'exp', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'pi', 'pow', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc']
>>> help(radians)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'radians' is not defined
# radians lives in the math module, so it must be referenced by its qualified name
>>> help(math.radians)
# Calling it likewise requires the qualified name
>>> math.radians(180)
3.141592653589793

Python Booleans

A boolean has only two values: True and False. The first letter must be capitalized.

>>> a = 3
>>> b = 5
>>> a == b
False
>>> a != b
True

In Python, 0 is treated as False, and so is the empty string. A string that contains only a space is not empty, so it is True.

>>> type(True)
<class 'bool'>
>>> bool(2)
True
>>> bool(-2.1)
True
>>> bool(0)
False
>>> bool('abc')
True
>>> bool(' ')
True
>>> bool('')
False

Datetime Module (Dates and Times)

>>> import datetime
>>> help(datetime.date)

>>> gvr = datetime.date(1956, 1, 31)
>>> print(gvr)
1956-01-31
>>> print(gvr.year)
1956
>>> print(gvr.month)
1
>>> print(gvr.day)
31

The datetime module also has a timedelta class, which is used to shift dates.

>>> mail = datetime.date(2000, 1, 1)
>>> dt = datetime.timedelta(100)
>>> print(mail + dt)
2000-04-10

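To choose a date or time format, for example the full weekday name, then the month, the day, and finally the year, the traditional approach is to call strftime on the date.
Another approach is to create a format string: write any text you like, put the date format codes inside the {} placeholder, then call format() with the date or time object as the value.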

>>> # Day-name, Month-name Day-#, Year
...
>>> print(gvr.strftime("%A, %B, %d, %Y"))
Tuesday, January, 31, 1956
>>>
>>> message = "GVR was born on {:%A, %B, %d, %Y}."
>>> print(message.format(gvr))
GVR was born on Tuesday, January, 31, 1956.

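Use the date class to create a date (arguments: year, month, day) and the time class to create a time (arguments: hour, minute, second). The datetime class creates a value that holds both: the first three arguments are year, month, day and the next three are hour, minute, second.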

>>> dir(datetime)
['MAXYEAR', 'MINYEAR', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', 'date', 'datetime', 'datetime_CAPI', 'time', 'timedelta', 'timezone', 'tzinfo']
>>>
>>> launch_date = datetime.date(2017, 3, 20)
>>> launch_time = datetime.time(22, 27, 0)
>>> launch_datetime = datetime.datetime(2017, 3, 20, 22, 27, 0)
>>>
>>> print(launch_date)
2017-03-20
>>> print(launch_time)
22:27:00
>>> print(launch_datetime)
2017-03-20 22:27:00

Like a date object, a time object exposes each of its parts, through .hour, .minute, and .second. A datetime object has these attributes too.

>>> print(launch_time.hour)
22
>>> print(launch_time.minute)
27
>>> print(launch_time.second)
0

To get the current date and time, use the today() method of the datetime class.

Module: datetime
Class: datetime
Method: today()

>>> now = datetime.datetime.today()
>>> print(now)
2019-09-27 23:18:54.381467
>>> print(now.microsecond)
381467

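To convert a date string into a datetime object, use the strptime method of the datetime class, which parses a time string into a datetime. The first argument is the string and the second is its format. Printing the result shows the standard Python date format, which confirms that it is now an object rather than a string.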

Module: datetime
Class: datetime
Method: strptime()

>>> moon_landing = "7/20/1969"
>>> moon_landing_datetime = datetime.datetime.strptime(moon_landing, "%m/%d/%Y")
>>> print(moon_landing_datetime)
1969-07-20 00:00:00
>>> print(type(moon_landing_datetime))
<class 'datetime.datetime'>
>>>

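Summary: the datetime module includes a timedelta class, which stores the difference between two dates or times and can be added to or subtracted from a date to shift it.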
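A short sketch of both uses of timedelta mentioned in the summary, reusing the dates from the examples above. The variable names gap and one_week are chosen here for illustration.

```python
import datetime

moon_landing = datetime.date(1969, 7, 20)
launch_date = datetime.date(2017, 3, 20)

# Subtracting one date from another yields a timedelta.
gap = launch_date - moon_landing
print(type(gap))
print(gap.days, "days between the two dates")

# Adding or subtracting a timedelta shifts a date.
one_week = datetime.timedelta(weeks=1)
print(launch_date + one_week)
print(launch_date - one_week)
```

timedelta also accepts days, hours, minutes, and seconds as keyword arguments, so the positional timedelta(100) used earlier means 100 days.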

If, Then, Else in Python

age = int(input('Enter a number between 1 and 100: '))
if age >= 18:
	print('adult')
elif age >= 6:
	print('teenager')
else:
	print('kid')

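Mind the indentation (the example above indents with tabs). Python uses indentation to delimit blocks, so it must be consistent within a block.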

Python Functions

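For functions, the main topic here is parameters. A function can have positional parameters, default parameters, variable parameters, keyword parameters, and keyword-only parameters.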

Keyword Arguments

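Python functions can accept keyword arguments. The example below defines a function that converts a height from US units to centimeters (1 inch = 2.54 cm, 1 foot = 12 inches). Keyword arguments make the call more flexible and the code cleaner.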

Python 3.7.0 (v3.7.0:1bf9cc5093, Jun 26 2018, 23:26:24)
[Clang 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> def cm(feet = 0, inches = 0):
...     '''Converts a length from feet and inches to centimeters.'''
...     inches_to_cm = inches * 2.54
...     feet_to_cm = feet * 12 * 2.54
...     return inches_to_cm + feet_to_cm
...
>>> cm(feet = 5)
152.4
>>> cm(inches = 70)
177.8
>>> cm(feet = 5, inches = 8)
172.72

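A function definition can use two kinds of arguments: keyword arguments have an equals sign (=) and required arguments do not. Both kinds may appear in the same definition, but the required arguments must all come first.
For example, defining a function g whose first parameter is a keyword argument raises an error. The message speaks of a default argument, which is another name for a keyword argument.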

>>> def g(x = 0, y):
...     return x + y
...
  File "<stdin>", line 1
SyntaxError: non-default argument follows default argument

To fix this, the non-default arguments must come first. They are also called required arguments, because a value must always be supplied for them.

>>> def g(y, x = 0):
...     return x + y
...

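Calling this function requires a value for y, while the keyword argument x is optional. If x is omitted, its default is used; to set it, pass it by name. Required arguments need no name, because they are matched by position.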

>>> g(5)
5
>>> g(7, x = 3)
10

One rule to remember when defining default arguments: a default value must be an immutable object!

def add_end(L=[]):
    L.append('END')
    return L

>>> add_end()
['END']
>>> add_end()
['END', 'END']
>>> add_end()
['END', 'END', 'END']

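As the output shows, the result is already wrong on the second call to add_end(): the list keeps growing.

The explanation is as follows:

The default value of L, namely [], is evaluated once, when the function is defined. The parameter L is just a variable that points to that one list object. If a call changes the contents of the list, the default seen by the next call has changed too, and is no longer the [] from the definition.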

To fix the example above, use the immutable object None as the default:

def add_end(L=None):
    if L is None:
        L = []
    L.append('END')
    return L

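Why have immutable objects such as str and None at all? Once an immutable object is created, its internal data cannot be modified, which rules out a whole class of bugs caused by unintended changes. In addition, because the object never changes, concurrent tasks can read it at the same time without any locking. When writing a program, if an object can be designed to be immutable, it usually should be.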
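The difference between the two definitions can be observed directly. The sketch below renames the parameter to items and the broken version to add_end_buggy purely for illustration.

```python
def add_end(items=None):
    # A fresh list is created on every call that passes no argument.
    if items is None:
        items = []
    items.append('END')
    return items

def add_end_buggy(items=[]):
    # The single default list is created once and shared by all calls.
    items.append('END')
    return items

print(add_end())    # a new list each time
print(add_end())

add_end_buggy()
add_end_buggy()
# The shared default list lives on the function object itself.
print(add_end_buggy.__defaults__)

existing = ['a', 'b']
print(add_end(existing))   # an explicit argument is still modified in place
```

Inspecting __defaults__ shows that the mutation persists between calls, which is exactly the behaviour described above.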

Variable Arguments

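When the number of arguments is not known in advance, one option is to pass a, b, c, ... in as a single list or tuple. A more convenient option is a variable-argument parameter, defined as follows: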

def calc(*numbers):
    # Use "total" rather than "sum" to avoid shadowing the built-in sum().
    total = 0
    for n in numbers:
        total = total + n * n
    return total

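Compared with a parameter that takes a list or tuple, a variable-argument parameter only adds a * in front of the parameter name. Inside the function, numbers arrives as a tuple, so the loop works exactly as it would for a list or tuple parameter. The caller, however, can now pass any number of arguments, including none: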

>>> calc(1, 2)
5
>>> calc()
0

Python also lets you put a * in front of a list or tuple in a call, which passes its elements as variable arguments:

>>> nums = [1, 2, 3]
>>> calc(*nums)
14

*nums passes every element of the list nums as a separate argument. This idiom is very useful and very common.

Recursive Functions

A function can call other functions. If a function calls itself, it is a recursive function.

As an example, compute the factorial $n! = 1 \times 2 \times 3 \times \cdots \times n$, represented by the function fact(n):

def fact(n):
    if n == 1:
        return 1
    return n * fact(n - 1)

Following the definition, the computation of fact(5) unfolds like this:

===> fact(5)
===> 5 * fact(4)
===> 5 * (4 * fact(3))
===> 5 * (4 * (3 * fact(2)))
===> 5 * (4 * (3 * (2 * fact(1))))
===> 5 * (4 * (3 * (2 * 1)))
===> 5 * (4 * (3 * 2))
===> 5 * (4 * 6)
===> 5 * 24
===> 120

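Recursive functions must guard against stack overflow. Function calls are implemented with a stack: every call pushes a stack frame and every return pops one. Because the stack is not unlimited, too many nested recursive calls overflow it.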

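One way to avoid stack overflow is tail-call optimization. A function is tail-recursive when the recursive call is the last thing it does: the return statement returns the call itself, not a larger expression that contains it. A compiler or interpreter can then optimize the recursion so that it occupies a single stack frame no matter how many times it recurses, and the stack never overflows.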

def fact(n):
    return fact_iter(n, 1)

def fact_iter(num, product):
    if num == 1:
        return product
    return fact_iter(num - 1, num * product)

As you can see, return fact_iter(num - 1, num * product) returns only the recursive call itself. The arguments num - 1 and num * product are evaluated before the call and do not affect it.

===> fact_iter(5, 1)
===> fact_iter(4, 5)
===> fact_iter(3, 20)
===> fact_iter(2, 60)
===> fact_iter(1, 120)
===> 120

If tail calls are optimized, the stack does not grow, so no number of calls can overflow it.

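Unfortunately, most programming languages do not optimize tail recursion, and neither does the Python interpreter. Even the tail-recursive version of fact(n) above can therefore overflow the stack.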
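Since Python does not optimize tail calls, a common workaround is to rewrite the recursion as a loop. The sketch below is one such rewrite; fact_loop and fact_recursive are names chosen here, and the multiplier 5 is an arbitrary choice that puts n safely above the recursion limit.

```python
import sys

def fact_loop(n):
    """Compute n! with a loop, so the call stack never grows."""
    product = 1
    for k in range(2, n + 1):
        product *= k
    return product

def fact_recursive(n):
    if n <= 1:
        return 1
    return n * fact_recursive(n - 1)

print(fact_loop(5))
print("recursion limit:", sys.getrecursionlimit())

# Pick an n well beyond the interpreter's recursion limit.
big_n = sys.getrecursionlimit() * 5
try:
    fact_recursive(big_n)
except RecursionError as exc:
    print("recursive version failed:", exc)

# The loop handles the same input without trouble.
print("bits in result:", fact_loop(big_n).bit_length())
```

Python raises RecursionError before the real stack overflows, which is why the failure can be caught here.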

Sets in Python

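One way to create a set is to pass a list to set(); duplicate elements are filtered out automatically. add(key) adds an element to the set; adding the same element again is allowed but has no effect. remove(key) deletes an element, and discard(key) does the same without raising an error if the element is absent.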

>>> s = set([28, True, 2.718, 'set'])
>>> s.add(123)
>>> s
{True, 2.718, 'set', 123, 28}
>>> s.remove(123)
>>> s
{True, 2.718, 'set', 28}
>>> s.discard(28)
>>> s
{True, 2.718, 'set'}
>>> s.clear()
>>> s
set()
>>> len(s)
0
>>>

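To get the union and the intersection of two sets: union() returns the union of the two sets and intersection() returns their intersection. Neither method modifies the original sets. The | and & operators do the same.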

>>> # Integers 1 - 10
...
>>> odds = set([1, 3, 5, 7, 9])
>>> evens = set([2, 4, 6, 8, 10])
>>> primes = set([2, 3, 5, 7])
>>> composites = set([4, 6, 8, 9, 10])
>>>
>>> odds.union(evens)
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> evens.union(odds)
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
>>> odds
{1, 3, 5, 7, 9}
>>> evens
{2, 4, 6, 8, 10}
>>> odds.intersection(primes)
{3, 5, 7}
>>> primes.intersection(evens)
{2}

>>> s1 = set([1, 2, 3])
>>> s2 = set([2, 3, 4])
>>> s1 & s2
{2, 3}
>>> s1 | s2
{1, 2, 3, 4}
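Besides | and &, sets support two more operators that follow the same pattern. A brief sketch, reusing two of the sets from above:

```python
odds = {1, 3, 5, 7, 9}
primes = {2, 3, 5, 7}

print(odds | primes)   # union: elements in either set
print(odds & primes)   # intersection: elements in both sets
print(odds - primes)   # difference: odd but not prime
print(odds ^ primes)   # symmetric difference: in exactly one of the sets
```

Each operator has a method equivalent (union, intersection, difference, symmetric_difference), and all of them return a new set.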

The next example tests whether a set contains an item, using the in operator. The not in operator tests for absence.

>>> 2 in primes
True
>>> 6 in odds
False
>>> 9 not in evens
True

str is an immutable object, whereas list is a mutable object.

With a mutable object such as a list, an operation can change the contents of the object in place. For example:

>>> a = ['c', 'b', 'a']
>>> a.sort()
>>> a
['a', 'b', 'c']

With an immutable object such as a str, an operation returns a new object and leaves the original unchanged:

>>> a = 'abc'
>>> a.replace('a', 'A')
'Abc'
>>> a
'abc'

Python Lists

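There are two ways to create a list: with list(), or more simply with [].
append adds a value at the end of the list. Note that a list preserves the order of its data, unlike a set: in a set order does not matter, while in a list order is everything.
insert places an element at a given position, for example at index 1.
pop() removes the element at the end of the list.
pop(i) removes the element at index i.
To replace an element, assign directly to its index.
remove deletes the first occurrence of a given value.
clear empties the list.
count returns how many times a given value occurs in the list.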

>>> primes = [2, 3, 5, 7, 11, 13]
>>> primes.append(17)
>>> primes.append(19)
>>>
>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>> primes.insert(1, 23)
>>> primes
[2, 23, 3, 5, 7, 11, 13, 17, 19]
>>>
>>> primes.pop()
19
>>> primes
[2, 23, 3, 5, 7, 11, 13, 17]
>>> primes.pop(1)
23
>>> primes
[2, 3, 5, 7, 11, 13, 17]
>>> primes[0] = '0'
>>> primes
['0', 3, 5, 7, 11, 13, 17]
>>>
>>> numbers = [5, 2, 1, 7 ,4]
>>> numbers.remove(5)
>>> print(numbers)
[2, 1, 7, 4]
>>>
>>> numbers.clear()
>>> print(numbers)
[]
>>>
>>> numbers = [5, 2, 1, 5, 7 ,4]
>>> print(numbers.count(5))
2

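Individual values are accessed by index, and the first index is 0. Negative indices count from the end, so Python treats -1 as the index of the last item. Note that you can only wrap around once: an index beyond the length of the list, in either direction, raises IndexError.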

>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>> primes[0]
2
>>> primes[1]
3
>>> primes[-1]
19

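Slicing is another way to access list values; it returns a portion of the list.
Write the list name followed by [start:stop]. Note that a slice includes the start position but excludes the stop position.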

>>> primes
[2, 3, 5, 7, 11, 13, 17, 19]
>>>
>>> primes[2:5]
[5, 7, 11]

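The elements of a list may have different types, and an element can itself be another list. In the example below, 'php' is reached with s[2][1], so s can be viewed as a two-dimensional array. A list with no elements at all is an empty list, and its length is 0.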

>>> L = ['Apple', 123, True]
>>> s = ['python', 'java', ['asp', 'php'], 'scheme']
>>> len(s)
4
>>>
>>> L = []
>>> len(L)
0

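Two lists can also be joined into one. Create the lists numbers and letters, then join them with the + operator. This is called concatenation. Note that the original numbers and letters lists are unchanged.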

>>> numbers = [1, 2, 3]
>>> letters = ['a', 'b', 'c']
>>> numbers + letters
[1, 2, 3, 'a', 'b', 'c']
>>> letters + numbers
['a', 'b', 'c', 1, 2, 3]

2D Lists

Two-dimensional lists are common in data analysis and machine learning, where they typically represent matrices. For example:

matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

for row in matrix:
    for item in row:
        print(item)
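Individual cells of a 2D list are reached with two indices. The sketch below uses the same matrix; the transposed variable is introduced here for illustration.

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]

# matrix[row][column], both zero-based: second row, third column.
print(matrix[1][2])

# enumerate supplies the indices alongside the values.
for r, row in enumerate(matrix):
    for c, item in enumerate(row):
        if r == c:
            print("diagonal:", item)

# zip(*matrix) regroups the rows into columns, i.e. a transpose.
transposed = [list(column) for column in zip(*matrix)]
print(transposed)
```

For real numerical work a library such as NumPy is usually preferred, but nested lists are enough for small cases like this one.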

Python Dictionaries

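To create a dictionary: post = {"key1": value1, "key2": value2}. A dictionary can also be created with dict(). To add a value, put the key inside [] and assign with =. Note that key names are not quoted inside dict(), but they must be quoted when adding through [].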

>>> post2 = dict(message = "SS Cotopaxi", languange = "English")
>>> print(post2)
{'message': 'SS Cotopaxi', 'languange': 'English'}
>>>
>>> post2["user_id"] = 209
>>> post2["datetime"] = 123456789
>>>
>>> print(post2)
{'message': 'SS Cotopaxi', 'languange': 'English', 'user_id': 209, 'datetime': 123456789}

Use [] both to store new data in a dictionary and to access it.

>>> print(post2["message"])
SS Cotopaxi

The first way to avoid accessing a key that does not exist is to check for it with an if statement:

>>> if 'location' in post2:
...     print(post2['location'])
... else:
...     print("The post does not contain a location value.")
...
The post does not contain a location value.

Alternatively, use try/except to catch the error. Accessing a key that is not in the dictionary raises KeyError.

>>> try:
...     print(post2['location'])
... except KeyError:
...     print("The post does not contain a location value.")
...
The post does not contain a location value.

There is another way to access dictionary data that also handles missing keys: the get() method of dict. If the key does not exist, it returns None, or a value that you specify.

>>> loc = post2.get('location', None)
>>> print(loc)
None

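To iterate over the key-value pairs, a simple approach is to loop over all the keys and look up the value for each one. keys() returns a view of all the keys in the dictionary. From Python 3.7 on the keys come back in insertion order; older versions made no ordering guarantee.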

>>> for key in post2.keys():
...     value = post2[key]
...     print(key, "=", value)
...
message = SS Cotopaxi
languange = English
user_id = 209
datetime = 123456789

Another way to iterate is to use the items() method in the loop, which yields one key-value pair at a time.

>>> for key, value in post2.items():
...     print(key, "=", value)
...
message = SS Cotopaxi
languange = English
user_id = 209
datetime = 123456789

There are several ways to delete data from a dictionary: pop and popitem each remove one item, and clear empties the dictionary.

To delete a key, call pop(key); the corresponding value is removed from the dict as well, and returned:

>>> d = {'Michael': 95, 'Bob': 75, 'Tracy': 85}
>>> d.pop('Bob')
75
>>> d
{'Michael': 95, 'Tracy': 85}

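Note that before Python 3.7 the internal order of a dict was not guaranteed to match the order in which keys were inserted. From Python 3.7 on, a dict preserves insertion order.

Compared with a list, a dict has the following characteristics:

Lookups and insertions are very fast and do not slow down as the number of keys grows;
It needs a lot of memory, much of which is wasted.

A list is the opposite:

Lookup and insertion time grows with the number of elements;
It takes little space and wastes very little memory.

So a dict trades space for time.

Dicts are useful wherever fast lookup is needed and appear almost everywhere in Python code. Using them correctly matters, and the first rule to remember is that a dict key must be an immutable object.

The reason is that a dict computes the storage location of a value from its key. If the same key produced a different result each time, the dict would be internally inconsistent. The algorithm that computes a location from a key is called a hash algorithm.

For the hash to stay correct, the object used as a key must not change. In Python, strings, integers, and similar types are immutable, so they are safe to use as keys. A list is mutable and cannot be used as a key: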

>>> key = [1, 2, 3]
>>> d[key] = 'a list'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Dictionary Comprehensions

k = ['张三', '李四', '王五']
s = [1000, 3000, 7000]

# print({x : y for x in k for y in s})

print({k : 3000 for k in ['张三', '李四', '王五']})

# Output: {'张三': 3000, '李四': 3000, '王五': 3000}
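To pair each name with its own salary, one option is to combine the two lists with zip inside the comprehension. This is a sketch under the assumption that the two lists are meant to line up position by position; the variable names are renamed here for readability.

```python
names = ['张三', '李四', '王五']
salaries = [1000, 3000, 7000]

# zip walks both lists in step, so each name gets the salary at the same index.
paired = {name: salary for name, salary in zip(names, salaries)}
print(paired)

# Two nested for clauses visit every combination instead; later salaries
# overwrite earlier ones, so every name ends up with the last salary.
nested = {x: y for x in names for y in salaries}
print(nested)

# A trailing if clause filters the pairs.
high_earners = {name: salary for name, salary in paired.items() if salary >= 3000}
print(high_earners)
```

The nested version illustrates why the commented-out line in the example above does not produce a one-to-one mapping.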

Python Tuples

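A tuple is very similar to a list, but a tuple cannot be modified once it has been created.
There are other differences as well. The example below uses the getsizeof function to compare the size of a list and a tuple that hold the same data. As you can see, the list takes more memory than the tuple.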

>>> import sys
>>>
>>> list_eg = [1,2,3,"a","b","c",True,3.1415]
>>> tuple_eg = (1,2,3,"a","b","c",True,3.1415)
>>>
>>> print("List size = ", sys.getsizeof(list_eg))
List size =  128
>>> print("tuple size = ", sys.getsizeof(tuple_eg))
tuple size =  112

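Tuples are fixed: their values cannot be changed (they are immutable). Once a tuple is created, it stays as it is.
Tuples are also faster to create than lists. To measure this, use the timeit module to time the creation of one million lists and one million tuples. First import timeit; the module contains a function of the same name, timeit. Its first argument is the statement to execute (here, building a short list of integers), and number is how many times to execute it. The return value is the total time taken for that many executions.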

>>> import timeit
>>>
>>> list_test = timeit.timeit(stmt="[1,2,3,4,5]", number=1000000)
>>> tuple_test = timeit.timeit(stmt="(1,2,3,4,5)", number=1000000)
>>>
>>> print("List time:", list_test)
List time: 0.0711129419998997
>>> print("Tuple time:", tuple_test)
Tuple time: 0.015337733000023945

An empty tuple is written (). A tuple with exactly one element must be written with a trailing comma.

>>> empty_tuple = ()
>>> test1 = ("a",)
>>> test2 = ("a", "b")
>>> test3 = ("a", "b", "c")
>>>
>>> print(empty_tuple)
()
>>> print(test1)
('a',)
>>> print(test2)
('a', 'b')
>>> print(test3)
('a', 'b', 'c')

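There is another way to create a tuple: leave out the parentheses. A one-element tuple still needs the trailing comma. Printing the values shows that they are all tuples.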

>>> test1 = 1,
>>> test2 = 1, 2
>>> test3 = 1, 2, 3
>>>
>>> print(test1)
(1,)
>>> print(test2)
(1, 2)
>>> print(test3)
(1, 2, 3)

As with lists, tuple elements can be accessed by index.

>>> # (age, country, knows_python)
...
>>> survey = (27, "Vietnam", True)
>>> age = survey[0]
>>> country = survey[1]
>>> knows_python = survey[2]
>>>
>>> print("Age =", age)
Age = 27
>>> print("Country =", country)
Country = Vietnam
>>> print("Knows Python ?", knows_python)
Knows Python ? True

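Tuples offer a quicker alternative: tuple unpacking assigns every element of the tuple to its own variable in a single statement. Make sure the number of variables matches the number of values in the tuple; otherwise Python raises ValueError.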

>>> survey2 = (21, "Switzerland", False)
>>>
>>> age, country, knows_python = survey2
>>> print("Age =", age)
Age = 21
>>> print("Country =", country)
Country = Switzerland
>>> print("Knows Python ?", knows_python)
Knows Python ? False
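The count mismatch mentioned above, and one way around it, can be sketched as follows. The starred name rest and the variables a and b are introduced here for illustration only.

```python
survey = (27, "Vietnam", True)

# Three values but only two names: Python refuses to guess.
try:
    age, country = survey
except ValueError as exc:
    print("unpacking failed:", exc)

# A starred name collects whatever is left over into a list.
age, *rest = survey
print(age, rest)

# Unpacking also swaps two variables without a temporary.
a, b = 1, 2
a, b = b, a
print(a, b)
```

Starred unpacking is handy when only the first few fields of a record matter.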

Logging in Python

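Python's logging module provides everything needed to record progress and problems. Logging lets you write status information to a file or another output stream, including which parts of the code have run and what may have gone wrong. Every log message has a level, and there are five of them: Debug, Info, Warning, Error, Critical.

First import the logging module and set up the logging system with the basicConfig method. Then call getLogger and adjust the settings until the logger behaves the way we want.

Call logging.basicConfig with a file name and Python writes the log messages to that file, creating it if it does not exist. Then create a logger object by calling getLogger(). Called without arguments, it returns the root logger.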

>>> import logging
>>>
>>> # Create and configure logger
...
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log")
>>> logger = logging.getLogger()

Now test it: call the info method with a message string.

>>> # Test the logger
...
>>> logger.info("Our first message.")

The Lumberjack.log file has been created, but it is empty; the log message is nowhere to be found. Why? The cause lies in how the logger is configured. Print logger.level:

>>> print(logger.level)
30

The logging module defines six levels, with the following numeric values:

Level	Numeric Value
NOTSET	0
DEBUG	10
INFO	20
WARNING	30
ERROR	40
CRITICAL	50

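A message is written only if its level is at or above the logger's level. We used info, whose value is 20, but the default level is WARNING (30). The fix is to update the basicConfig call and set the level to DEBUG (10).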

>>> import logging
>>>
>>> # Create and configure logger
...
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG)
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our first message.")
>>> print(logger.level)
10
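The level check can be observed without touching any file. The sketch below is an illustration only: it uses a separately named logger ("level_demo", a name made up here) that writes to an in-memory stream, so it does not interfere with the root logger configured above.

```python
import io
import logging

stream = io.StringIO()
demo_logger = logging.getLogger("level_demo")   # hypothetical logger name
demo_logger.propagate = False                   # keep messages away from the root logger
demo_logger.addHandler(logging.StreamHandler(stream))

# With the level at WARNING, an INFO message is silently dropped.
demo_logger.setLevel(logging.WARNING)
demo_logger.info("dropped: INFO (20) is below WARNING (30)")
demo_logger.warning("kept: WARNING (30)")

# Lowering the level to DEBUG lets INFO messages through.
demo_logger.setLevel(logging.DEBUG)
demo_logger.info("kept: INFO (20) is above DEBUG (10)")

print(stream.getvalue(), end="")
```

Only the two "kept" lines reach the stream, which mirrors what happened to the first info message in the file-based example.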

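Open the log file again and the test message is there: first the level (INFO), then the logger name (root by default), then the message. The format is fine, but an important piece of data is missing: the time the entry was created.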

▶ cat Lumberjack.log
INFO:root:Our first message.

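To change the format of the log entries, create a format string. Python offers a rich set of attributes for this; only asctime, levelname, and message are covered here.
Build a string from the attributes you want, in the form %(levelname)s %(asctime)s - %(message)s, and pass it to basicConfig as format = LOG_FORMAT.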

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT)
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our first message.")
>>>
>>> print(logger.level)
10

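Open the log file and the new entry appears in the new format. The earlier message is still there, because by default the log file is opened in append mode.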

▶ cat Lumberjack.log
INFO:root:Our first message.
INFO 2019-10-07 00:58:39,325 - Our first message.

To overwrite the log file on every run, set filemode to 'w', then change the message and test again.

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
>>> logger = logging.getLogger()
>>> # Test the logger
...
>>> logger.info("Our SECOND message.")

Look at the log again: the earlier contents have been overwritten, and only the newest entry remains.

▶ cat Lumberjack.log
INFO 2019-10-07 01:06:52,276 - Our SECOND message.

Each level has a method of the same name, so different kinds of messages can be logged at different levels.

>>> import logging
>>>
>>> # Create and configure logger
...
>>> LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
>>> logger = logging.getLogger()
>>> # Test messages
...
>>> logger.debug("This id a harmless debug message.")
>>> logger.info("Just some useful info.")
>>> logger.warning("I'm sorry, but I can't do that, Dave.")
>>> logger.error("Did you just try to divide by zero?")
>>> logger.critical("The entire internet is down!!")

Open the log file and the five new messages are there.

▶ cat Lumberjack.log
INFO 2019-10-07 01:06:52,276 - Our SECOND message.
DEBUG 2019-10-07 01:11:41,564 - This id a harmless debug message.
INFO 2019-10-07 01:13:25,214 - Just some useful info.
WARNING 2019-10-07 01:14:20,196 - I'm sorry, but I can't do that, Dave.
ERROR 2019-10-07 01:15:23,000 - Did you just try to divide by zero?
CRITICAL 2019-10-07 01:15:48,875 - The entire internet is down!!

Change the level to ERROR and run again.

>>> logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.ERROR, format = LOG_FORMAT, filemode = 'w')

Now only the error and critical messages appear.

▶ cat Lumberjack.log
ERROR 2019-10-07 01:19:20,247 - Did you just try to divide by zero?
CRITICAL 2019-10-07 01:19:21,971 - The entire internet is down!!

Let’s see it in action.
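Write a function for the quadratic formula. Import the math module first, since the square root function is needed. The function is named quadratic_formula (future generations will sing the praises of our function names) and takes three parameters: a, b, and c.
The log messages duplicate the comments, which looks a little repetitive. Once you are confident the code works, delete the log messages and keep the comments.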

import logging
import math

# Create and configure logger

LOG_FORMAT = "%(levelname)s %(asctime)s - %(message)s"
logging.basicConfig(filename = "/Users/zhangqiang/Desktop/Programming/Python/practice/Lumberjack.log", level = logging.DEBUG, format = LOG_FORMAT, filemode = 'w')
logger = logging.getLogger()

def quadratic_formula(a, b, c):
    """Return the solutions to the equation ax^2 + bx + c = 0."""
    logger.info("quadratic_formula({0}, {1}, {2})".format(a, b, c))

    # Compute the discriminant
    logger.debug("# Compute the discriminant")
    disc = b**2 - 4*a*c

    # Compute the two roots
    logger.debug("# Compute the two roots.")
    root1 = (-b + math.sqrt(disc)) / (2*a)
    root2 = (-b - math.sqrt(disc)) / (2*a)

    # Return the roots
    logger.debug("# Return the roots.")
    return (root1, root2)

roots = quadratic_formula(1, 0, -4)
print(roots)

The function returns the correct result, and the log file shows the details:

▶ python3 log.py
(2.0, -2.0)
▶ cat Lumberjack.log
INFO 2019-10-07 01:47:04,074 - quadratic_formula(1, 0, -4)
DEBUG 2019-10-07 01:47:04,074 - # Compute the discriminant
DEBUG 2019-10-07 01:47:04,074 - # Compute the two roots.
DEBUG 2019-10-07 01:47:04,075 - # Return the roots.

Now test the function again, changing -4 to 1:

roots = quadratic_formula(1, 0, 1)

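This time the script raises an error. The log file contains only two of the debug messages, so the failure happens while computing the roots: the discriminant is negative, and math.sqrt rejects negative input.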

▶ python3 log.py
Traceback (most recent call last):
  File "log.py", line 28, in <module>
    roots = quadratic_formula(1, 0, 1)
  File "log.py", line 20, in quadratic_formula
    root1 = (-b + math.sqrt(disc)) / (2*a)
ValueError: math domain error

▶ cat Lumberjack.log
INFO 2019-10-07 01:50:48,174 - quadratic_formula(1, 0, 1)
DEBUG 2019-10-07 01:50:48,175 - # Compute the discriminant
DEBUG 2019-10-07 01:50:48,175 - # Compute the two roots.
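One possible way to handle a negative discriminant is to switch from math.sqrt to cmath.sqrt, which returns complex results. The sketch below is not the tutorial's own fix; quadratic_formula_complex is a name made up here, and the check for a == 0 is an added assumption.

```python
import cmath

def quadratic_formula_complex(a, b, c):
    """Return both roots of ax^2 + bx + c = 0, real or complex."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")

    # Compute the discriminant
    disc = b**2 - 4*a*c

    # cmath.sqrt accepts negative input and returns a complex number
    sqrt_disc = cmath.sqrt(disc)

    # Compute and return the two roots
    return ((-b + sqrt_disc) / (2*a), (-b - sqrt_disc) / (2*a))

print(quadratic_formula_complex(1, 0, -4))   # real roots, reported as complex
print(quadratic_formula_complex(1, 0, 1))    # purely imaginary roots
```

The trade-off is that even real roots come back as complex numbers; checking the sign of the discriminant first and raising a descriptive error is an alternative design.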

Recursion, the Fibonacci Sequence and Memoization

Fibonacci Sequence (Recursive Function)

>>> def fibonacci(n):
...     if n == 1:
...             return 1
...     elif n == 2:
...             return 1
...     elif n > 2:
...             return fibonacci(n - 1) + fibonacci(n - 2)
... 
>>> for n in range(1, 11):
...     print(n, ":", fibonacci(n))
...
1 : 1
2 : 1
3 : 2
4 : 3
5 : 5
6 : 8
7 : 13
8 : 21
9 : 34
10 : 55

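This version has a drawback: computing the first 100 terms is extremely slow, because the same calls are repeated over and over. Memoization solves the problem. The idea behind it is simple: cache the return values, so that a repeated call does not recompute them.

Create a dictionary, fibonacci_cache, to store the results, then rewrite the function to use the stored values. First check whether the value for n is already in the dictionary and, if so, return it immediately. Otherwise compute the value, store it, and then return it.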

fibonacci_cache = {}

def fibonacci(n):
    # If we have cached the value, then return it
    if n in fibonacci_cache:
        return fibonacci_cache[n]

    # Compute the Nth term
    if n == 1:
        value = 1
    elif n == 2:
        value = 1
    elif n > 2:
        value = fibonacci(n-1) + fibonacci(n-2)

    # Cache the value and return it
    fibonacci_cache[n] = value
    return value

for n in range(1, 101):
    print(n, ":", fibonacci(n))

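Testing shows that even 1000 terms complete almost instantly.

There is an even simpler way: import the lru_cache decorator from the functools module. LRU stands for Least Recently Used; the decorator caches the results of recent calls. Put @lru_cache() before the function and pass the maximum number of entries to cache in the parentheses. By default Python keeps the 128 most recently used values.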

from functools import lru_cache

@lru_cache(maxsize = 1000)
def fibonacci(n):
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

for n in range(1, 501):
    print(n, ":", fibonacci(n))

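This runs very quickly, but inputs such as 2.1, -1, or '1' produce a confusing error or a silent None, because the function does not validate its argument.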

from functools import lru_cache

@lru_cache(maxsize = 1000)
def fibonacci(n):
    # Check that the input is a positive integer
    if type(n) != int:
        raise TypeError("n must be a positive int")
    if n < 1:
        raise ValueError("n must be a positive int") 

    # Compute the Nth term 
    if n == 1:
        return 1
    elif n == 2:
        return 1
    elif n > 2:
        return fibonacci(n-1) + fibonacci(n-2)

for n in range(1, 501):
    print(n, ":", fibonacci(n))
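The effect of the cache and of the input checks can be inspected with cache_info(), which functions decorated with lru_cache provide. In this sketch the function is renamed fib so that it does not clash with the definitions above; the list of bad inputs is taken from the text.

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def fib(n):
    # Check that the input is a positive integer
    if type(n) != int:
        raise TypeError("n must be a positive int")
    if n < 1:
        raise ValueError("n must be a positive int")

    # Compute the Nth term
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

print(fib(50))

# Each distinct n is computed once; the remaining calls are cache hits.
info = fib.cache_info()
print(info)

# Invalid inputs now fail with a clear message.
for bad in (2.1, -1, '1'):
    try:
        fib(bad)
    except (TypeError, ValueError) as exc:
        print(repr(bad), "->", type(exc).__name__, exc)
```

Without the cache, fib(50) would need billions of recursive calls; with it, only 50 values are ever computed.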

Python Random Number Generator: the Random Module

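First, a warning to take seriously: cryptography makes heavy use of random numbers, but the functions in the random module were not built for that purpose, and Python warns against using this module for security purposes. For most other programs, though, the random module is a great tool.

Start by importing the random module and looking at the random function. It returns a random number between 0 and 1; note that it can return 0 but never returns 1.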

# Display 10 random numbers from interval [0, 1)

import random

for i in range(10):
    print(random.random())

python3 ran.py
0.5106691035823852
0.46531635162452156
0.6855409873422025
0.09298690069781068
0.5117634349489444
0.11244089989308159
0.3869049755854831
0.3035650646012348
0.11256644727198029
0.6335314012411107

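The values returned by random follow a uniform distribution, which means that every number in the interval is equally likely to be chosen.

random can serve as the building block for custom random number generators, and it underlies many of the other functions in the random module.

For example, to generate random numbers between 3 and 7, use the uniform function, which also belongs to the random module.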

# Generate random numbers from interval [3, 7)

import random

for i in range(10):
    print(random.uniform(3, 7))

python3 ran.py
5.065995219714599
6.053861064813075
4.355351836924287
3.8442052835650293
3.4617765863944605
6.012704355572884
4.352595206539573
5.217513736639914
4.868314576510137
4.154966578672884

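Both random and uniform follow a uniform distribution, but other distributions are used as well. The most common is the normal distribution, also known as the bell curve. A normal distribution is described by two parameters, the mean and the standard deviation. The mean ( $\mu$ ) is the position of the peak of the bell curve, and the standard deviation ( $\sigma$ ) describes how wide or narrow the curve is.

To generate random numbers from a normal distribution, use the normalvariate function. Both parameters must be supplied.

For example, generate 20 numbers from a normal distribution with mean 0 and standard deviation 1: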

import random

for i in range(20):
    print(random.normalvariate(0, 1))

▶ python3 ran.py
0.8333294121536797
0.755624821412367
1.478740726288408
0.07938317039808185
0.291908992130876
-0.22504815004343645
-0.8676884812379501
-0.8029345982713375
-1.043558392568845
-3.6805077274987053
0.21430472612013182
1.046297522716622
1.3217391478625042
-0.15949303555422145
-0.12060525806010007
1.1275886234780748
1.691386051133269
0.7738680349015079
1.3120164616558796
-0.9755020257704168

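Note that these numbers cluster around the mean, 0. With a standard deviation of 9 they would spread over a much wider range; with a smaller standard deviation they would cluster more tightly.

Sometimes the choice is not among infinitely many possibilities. Rolling a die, for example, yields one of six values. Use the randint function for this; both endpoints are included.

For example, generate random integers between 1 and 6: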

import random

for i in range(20):
    print(random.randint(1, 6))

python3 ran.py
2
1
2
1
3
6
3
5
4
5
1
2
2
6
5
2
6
3
3
1

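Sometimes the random choice must even come from a list of non-numeric values, for instance when writing a rock-paper-scissors game.

Use the choice function to pick a random value from a list; pass it the list Python should choose from.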

import random

outcomes = ['rock', 'paper', 'scissors']

for i in range(20):
    print(random.choice(outcomes))

python3 ran.py
rock
paper
rock
scissors
scissors
paper
paper
scissors
paper
rock
scissors
scissors
paper
rock
scissors
scissors
rock
paper
rock
rock
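To show how choice might fit into the rock-paper-scissors game mentioned above, here is a small sketch. The names BEATS, judge, and play_round are hypothetical helpers introduced only for this illustration.

```python
import random

OUTCOMES = ['rock', 'paper', 'scissors']
# Each key beats the value it maps to.
BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}

def judge(player, computer):
    """Return 'win', 'lose', or 'draw' from the player's point of view."""
    if player == computer:
        return 'draw'
    return 'win' if BEATS[player] == computer else 'lose'

def play_round(player):
    """Let the computer pick at random, then judge the round."""
    computer = random.choice(OUTCOMES)
    return computer, judge(player, computer)

print(play_round('rock'))
```

Keeping the random pick in play_round and the rules in judge makes the rules testable without any randomness.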

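File Operations

CSV Files

CSV stands for comma-separated values, a common format for storing data. Working with CSV files requires no extra drivers or APIs.

A CSV file is a text file containing data. The first line is usually a header, and every following line holds data. Think of each line as a record in a database, with the fields of each line separated by commas. Note that this is a text file, so it carries no data types.

When reading a CSV file, you need to convert the data to the appropriate types. Two consecutive commas (,,) mean that a value is missing.

Below we load a local CSV file of stock data by passing its path to the open function.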

path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"

file = open(path)
for line in file:
    print(line)

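To be useful, the data has to be stored. Two approaches are shown below.

First, parse the data without the csv module, reading the file contents quickly into a list. Note that every line is a single string, and each one ends with a newline character.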

>>> path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"
>>> lines = [line for line in open(path)]
>>> lines[0]
'Date,Open,High,Low,Close,Volume,Adj Close\n'
>>> lines[1]
'8/19/2014,585.002622,587.342658,584.002627,586.862643,978600,586.862643\n'

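Use the strip method to remove leading and trailing characters from a string (whitespace by default, which includes the trailing newline), then use the split method to cut it into pieces; you only need to specify the separator ,. Each line becomes much easier to work with.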

>>> lines[0].strip()
'Date,Open,High,Low,Close,Volume,Adj Close'
>>> lines[0].strip().split(',')
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']

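Now read the data again, this time stripping the unwanted whitespace and splitting each line into separate values. Checking the first two rows shows that each line is now stored as a list.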

>>> dataset = [lines.strip().split(',') for lines in open(path)]
>>> dataset[0]
['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close']
>>> dataset[1]
['8/19/2014', '585.002622', '587.342658', '584.002627', '586.862643', '978600', '586.862643']

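Unfortunately, the values are still strings. There is another problem: what if the CSV data contains books or movies? Many titles come with comments, and such text can contain commas of its own, so splitting on , would cut those fields apart. Problems like this call for the csv module.

First import the csv module, then open the file. Note the newline keyword argument, which is passed an empty string: depending on the operating system, lines end with a line feed, a carriage return, or both, and writing it this way helps csv work correctly on every platform. Then create a variable reader whose value is the result of calling csv.reader on the file.

The first line is the header and contains no data, so extract it with the next function. After that, read the remaining data.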

import csv

path = "/Users/zhangqiang/Desktop/Programming/Python/blog_learning/google_stock_data.csv"
file = open(path, newline='')
reader = csv.reader(file)

header = next(reader)   # The first line is the header
data = [row for row in reader]  # Read the remaining data

print(header)
print(data[0])
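Every value read by csv.reader is still a string, so the conversion step mentioned earlier has to be done by hand. The sketch below is one possible way to do it; it uses an in-memory copy of the two lines shown above in place of the real file, and the column types (date, float prices, integer volume) are an assumption based on how the sample values look.

```python
import csv
import io
from datetime import datetime

# Two lines copied from the listing above stand in for the real file.
sample = io.StringIO(
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "8/19/2014,585.002622,587.342658,584.002627,586.862643,978600,586.862643\n",
    newline="",
)

reader = csv.reader(sample)
header = next(reader)  # skip the header row

records = []
for row in reader:
    date = datetime.strptime(row[0], "%m/%d/%Y").date()
    prices = [float(value) for value in row[1:5]]  # Open, High, Low, Close
    volume = int(row[5])
    adj_close = float(row[6])
    records.append([date, *prices, volume, adj_close])

print(records[0])
```

After this step the dates can be compared and the prices can be used in arithmetic, which is not possible while they are strings.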

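Functional Programming

List Comprehension

A list comprehension is also written with []. The most basic form is [expr for val in collection]: the first element inside the brackets is an expression, the loop runs over the collection, and the expression is evaluated for every item.

To loop only over items that satisfy a condition, add an if clause after the for loop; the expression is evaluated only when the test is true: [expr for val in collection if test].

The if clause may combine several tests: [expr for val in collection if test1 and test2].

You can also loop over more than one collection: [expr for val1 in collection1 for val2 in collection2].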
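The four forms can be seen side by side in a short sketch. The variable names and the sample data are made up for illustration.

```python
numbers = range(1, 21)
letters = ["a", "b"]

doubled = [2 * n for n in numbers]                             # basic form
evens = [n for n in numbers if n % 2 == 0]                     # one test
evens_over_10 = [n for n in numbers if n % 2 == 0 and n > 10]  # two tests
pairs = [(c, n) for c in letters for n in range(1, 3)]         # two loops

print(evens_over_10)  # [12, 14, 16, 18, 20]
print(pairs)          # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```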

First example: build a list of the squares of the first 100 positive integers.

First, a version without a list comprehension:

squares = []
for i in range(1, 101):
    squares.append(i**2)

print(squares)

python3 list_com.py

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000]

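Now the same thing with a list comprehension: the first part inside the brackets is the expression evaluated for each item, followed by the range to loop over.

squares2 = [i**2 for i in range(1, 101)]
print(squares2)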

python3 list_com.py

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400, 441, 484, 529, 576, 625, 676, 729, 784, 841, 900, 961, 1024, 1089, 1156, 1225, 1296, 1369, 1444, 1521, 1600, 1681, 1764, 1849, 1936, 2025, 2116, 2209, 2304, 2401, 2500, 2601, 2704, 2809, 2916, 3025, 3136, 3249, 3364, 3481, 3600, 3721, 3844, 3969, 4096, 4225, 4356, 4489, 4624, 4761, 4900, 5041, 5184, 5329, 5476, 5625, 5776, 5929, 6084, 6241, 6400, 6561, 6724, 6889, 7056, 7225, 7396, 7569, 7744, 7921, 8100, 8281, 8464, 8649, 8836, 9025, 9216, 9409, 9604, 9801, 10000]

Next example: the list of remainders when the squares of the first 100 positive integers are divided by 5.

# x ** 2 % 5 <====> x^2 (mod5)
remainder5 = [x**2 % 5 for x in range(1, 101)]
print(remainder5)

python3 list_com.py

[1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0, 1, 4, 4, 1, 0]

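The output shows that the remainders modulo 5 take only 3 values: 0, 1, and 4.

When the modulus is any odd prime p, an interesting pattern appears: the number of distinct remainders is (p+1)/2. Working out exactly which numbers appear in the list is a deep mathematical problem tied to the law of quadratic reciprocity, first proved by Gauss.

\begin{aligned} \text{p\_remainders} &= [\, x^2 \bmod p \ \text{ for } x \text{ in range}(0, p) \,] \\ \text{len}(\text{set}(\text{p\_remainders})) &= \frac{p+1}{2} \end{aligned}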
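The count of distinct remainders can be checked for a few small odd primes. This is only a sketch: the helper name distinct_square_remainders is hypothetical, and it uses a set comprehension (curly braces) so that duplicate remainders are dropped.

```python
def distinct_square_remainders(p):
    """Return the sorted distinct values of x**2 % p for x in range(p)."""
    return sorted({x**2 % p for x in range(p)})

for p in [3, 5, 7, 11, 13]:  # a few odd primes
    remainders = distinct_square_remainders(p)
    print(p, remainders, len(remainders) == (p + 1) // 2)
```

For p = 5 this prints the same three values seen above, 0, 1, and 4.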

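Next, a list comprehension with an if clause: given a list of movie titles, find the ones that start with the letter G.

First, without a list comprehension: create an empty list, loop over the movies list, and use the startswith method to check whether each title starts with G. If it does, append it to the new list.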

movies = ["Star Wars", "Gandhi", "Casablanca",
          "Shawshank Redemption", "2001: A Space Odyssey", "Groundhog Day"]

gmovies = []
for title in movies:
    if title.startswith("G"):
        gmovies.append(title)
print(gmovies)

python3 list_com.py

['Gandhi', 'Groundhog Day']

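With a list comprehension, the 4 lines of code collapse into 1: the expression is title, the loop runs over movies, and the if clause checks whether the first letter is G.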

movies = ["Star Wars", "Gandhi", "Casablanca",
          "Shawshank Redemption", "2001: A Space Odyssey", "Groundhog Day"]

gmovies = [title for title in movies if title.startswith("G")]
print(gmovies)

python3 list_com.py

['Gandhi', 'Groundhog Day']

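Next example: suppose movies is a list of tuples, each holding a title and a release year. How can a list comprehension find the movies released before 2000?

The result should contain only titles, but each item in the loop is a tuple. Unpack each tuple into (title, year) and write an if clause that tests the year. The year is used only for the test; it does not appear in the resulting list, which holds titles alone.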

movies = [("Citizen Kane", 1941), ("Spirited Away", 2001), ("Gattaca", 1997), ("The Aviator", 2004), ("Raiders of the Lost Ark", 1981)]

pre2k = [title for (title, year) in movies if year < 2000]

print(pre2k)

python3 list_com.py

['Citizen Kane', 'Gattaca', 'Raiders of the Lost Ark']

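Next example: given a list v, how do you perform scalar multiplication so that every item is multiplied by 4? A list comprehension does the job.

Note that 4*v does not work: it simply concatenates v with itself 4 times. Likewise, adding two lists in Python concatenates them.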

>>> v = [2, -3, 1]
>>> 4 * v
[2, -3, 1, 2, -3, 1, 2, -3, 1, 2, -3, 1]
>>>
>>> [2, 4, 6] + [1, 3]
[2, 4, 6, 1, 3]

v = [2, -3, 1]
w = [4*x for x in v]
print(w)

[8, -12, 4]

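The final example uses a list comprehension to compute the Cartesian product of two sets. Given two sets A and B, the Cartesian product is the set of all pairs (a, b), where a is an element of A and b is an element of B. Mathematically: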

A \times B = \lbrace (a,b) \ |\ a \in A, b \in B \rbrace

In the list comprehension, use the tuple (a, b), looping first over A and then over B.

A = [1, 3, 5, 7]
B = [2, 4, 6, 8]

cartesian_product = [(a, b) for a in A for b in B]
print(cartesian_product)

python3 list_com.py

[(1, 2), (1, 4), (1, 6), (1, 8), (3, 2), (3, 4), (3, 6), (3, 8), (5, 2), (5, 4), (5, 6), (5, 8), (7, 2), (7, 4), (7, 6), (7, 8)]

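List comprehensions make for very concise code. For example, listing all the files and directories in the current directory takes a single line:

>>> import os # import the os module; modules are covered later
>>> [d for d in os.listdir('.')] # os.listdir lists files and directories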
['.emacs.d', '.ssh', '.Trash', 'Adlm', 'Applications', 'Desktop', 'Documents', 'Downloads', 'Library', 'Movies', 'Music', 'Pictures', 'Public', 'VirtualBox VMs', 'Workspace', 'XCode']

A list comprehension can also use two variables to build the list:

>>> d = {'x': 'A', 'y': 'B', 'z': 'C' }
>>> [k + '=' + v for k, v in d.items()]
['x=A', 'y=B', 'z=C']

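Generators and Iterators

Generators

A list comprehension creates a list directly. However, memory is limited, so the size of a list is limited too. Moreover, a list of 1 million elements takes up a lot of storage, and if only the first few elements are ever accessed, the space used by the rest is wasted.

So if the elements can be computed by some algorithm, can we compute each one as the loop proceeds? That way the complete list never has to be built, which saves a great deal of space. In Python, this compute-as-you-loop mechanism is called a generator.

There are several ways to create a generator. The simplest is to change the [] of a list comprehension to (), which gives a generator expression: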

>>> L = [x * x for x in range(10)]
>>> L
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
>>> g = (x * x for x in range(10))
>>> g
<generator object <genexpr> at 0x1022ef630>

To print the values one at a time, call the next() function to get the generator's next value:

>>> next(g)
0
>>> next(g)
1
>>> next(g)
4
>>> next(g)
9
>>> next(g)
16

Calling next(g) over and over is tedious, of course. The right approach is a for loop, because a generator is also an iterable object:

>>> g = (x * x for x in range(10))
>>> for n in g:
...     print(n)
... 
0
1
4
9
16
25
36
49
64
81

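So once a generator has been created, you almost never call next() on it directly; you iterate over it with a for loop and do not need to worry about the StopIteration error.

Another way to define a generator: if a function definition contains the yield keyword, the function is no longer an ordinary function but a generator function, and calling it returns a generator.

The hardest part to grasp is that a generator's flow of execution differs from a function's. A function runs top to bottom and returns when it reaches a return statement or its last line. A generator function runs each time next() is called, returns at a yield statement, and on the next call resumes from the yield where it last stopped.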

>>> def odd():
...     print("111")
...     yield 1
...     print("222")
...     yield 3
...     print("333")
...     yield 5
...
>>> o = odd()
>>> next(o)
111
1
>>> next(o)
222
3
>>> next(o)
333
5
>>> next(o)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
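Because a generator computes values only on demand, it can even describe a sequence that never ends. The sketch below is an illustration rather than part of the original text: fib is a hypothetical generator function, and itertools.islice is used to take just the first 10 values.

```python
from itertools import islice

def fib():
    """Yield the Fibonacci numbers 0, 1, 1, 2, 3, ... without end."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# islice stops after 10 values, so the infinite generator is safe to use.
first_ten = list(islice(fib(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

A list could never hold this whole sequence, but the generator needs to remember only two numbers at a time.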

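Iterators

An object that can be passed to next() to return one value after another is called an iterator (Iterator).

Generators are all Iterator objects, whereas list, dict, and str are Iterable but not Iterator.

Use the iter() function to turn an Iterable such as a list, dict, or str into an Iterator:

>>> from collections.abc import Iterator
>>> isinstance(iter([]), Iterator)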
True
>>> isinstance(iter('abc'), Iterator)
True
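An iterator can also be written by hand as a class that defines __iter__ and __next__. The following is a minimal sketch; the Countdown class is hypothetical and exists only to show the two methods at work.

```python
from collections.abc import Iterator

class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self  # an iterator returns itself from __iter__

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the values are exhausted
        value = self.current
        self.current -= 1
        return value

print(list(Countdown(3)))                  # [3, 2, 1]
print(isinstance(Countdown(3), Iterator))  # True
```

Raising StopIteration is what lets a for loop, or list(), know when to stop.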

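Python Classes and Objects

In Python, write the keyword class followed by the class name; capitalizing the first letter of a class name is recommended.

First create the class User. Once defined, the class can be used to create objects. Create a variable user1 whose value is the class name followed by (), which looks much like calling a function.
user1 is called an instance of the User class; it can also be called an object. To attach data, write the object name, then a dot and the variable name, and finally the value.
Below, the object is given two values, first_name and last_name. Once attached to the object they are called fields, and they store data specific to user1.
Note that attribute names should not be capitalized.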

>>> class User:
...     pass
...
>>> user1 = User()
>>> user1.first_name = "Dave"
>>> user1.last_name = "Bowman"
>>> # Access pattern: object_name.attribute_name
>>> print(user1.first_name)
Dave

A class can create many objects; different objects share the same field names but can hold different values.

>>> user2 = User()
>>> user2.first_name = "Frank"
>>> user2.last_name = "Poole"
>>>
>>> print(user1.first_name, user1.last_name)
Dave Bowman
>>> print(user2.first_name, user2.last_name)
Frank Poole

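Next come more features of classes: methods, initialization, and docstrings, which turn a simple class into a hub for data.
The first feature is the init method. Functions inside a class are called methods. The name init is written with two underscores on each side, __init__ (a magic method), and it is called every time a new instance is created. Its first parameter is self, which refers to the newly created object; further parameters can follow self, for example full_name and birthday. The first job is to store these values in attributes of the object: self.name is assigned full_name, so full_name is stored as the attribute self.name, and self.birthday is assigned birthday. Be careful here: the birthday to the right of the equals sign is the value supplied when the object is created, while the birthday in self.birthday is the attribute that stores it.
Now create a user through the __init__ method. Two values must be supplied because this __init__ requires them: the first is the name and the second is the birthday. Then print both attributes of the object.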

class User:
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

user = User("Dave Bowman", "19710315")
print(user.name)
print(user.birthday)

python3 classes.py

Dave Bowman
19710315

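Now add a feature to the class: in __init__, extract the first and last names. Calling split on full_name cuts the string at every space, and the pieces are stored in a list. first_name is the first item of the list, [0], and last_name is the last item, [-1].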

class User:
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

        # Extract first and last names
        name_pieces = full_name.split(" ")
        self.first_name = name_pieces[0]
        self.last_name = name_pieces[-1]

user = User("Dave Bowman", "19710315")
print(user.name)
print(user.first_name)
print(user.last_name)
print(user.birthday)

python3 classes.py

Dave Bowman
Dave
Bowman
19710315

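A class can be improved by documenting it. Add a method to User that returns the age in years. As with __init__, the first parameter is self. To be responsible, write a docstring. The age is computed from the birthday, so the method needs no extra parameters. Because dates are involved, import the datetime module.
First get today's date. Here it is hard-coded as 2001-05-12 so that everyone gets the same test result. Next convert the birthday into a date, using [start:stop] slices to pull out the year, month, and day.
Create a date object for the user's birthday. Subtracting it from today's date gives a timedelta object with a days attribute. Ignoring leap years, divide the difference in days by 365, then convert age_in_years to an integer and return it.
To test the method, reuse the earlier user and print user.age().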

import datetime

class User:
    """A member of FriendFace. For now we are only storing their name and birthday. But soon we will store an uncomfortable amount of user information."""
    def __init__(self, full_name, birthday):
        self.name = full_name
        self.birthday = birthday  # yyyymmdd

        # Extract first and last names
        name_pieces = full_name.split(" ")
        self.first_name = name_pieces[0]
        self.last_name = name_pieces[-1]

    def age(self):
        """Return the age of the user in years."""
        today = datetime.date(2001, 5, 12)
        yyyy = int(self.birthday[0:4])
        mm = int(self.birthday[4:6])
        dd = int(self.birthday[6:8])
        dob = datetime.date(yyyy, mm, dd) # Date of birth
        age_in_days = (today - dob).days
        age_in_years = age_in_days / 365
        return int(age_in_years)

user = User("Dave Bowman", "19710315")
print(user.age())

python3 classes.py

30
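Dividing by 365 ignores leap years, so the result can be off by one close to a birthday. As an alternative sketch (not the approach used in the class above), the age can be computed by comparing month and day directly; exact_age is a hypothetical standalone function that takes the reference date as a parameter.

```python
import datetime

def exact_age(birthday, today):
    """Age in whole years; birthday is a yyyymmdd string, today a date."""
    dob = datetime.date(int(birthday[0:4]), int(birthday[4:6]), int(birthday[6:8]))
    # Subtract one year if this year's birthday has not happened yet.
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)

today = datetime.date(2001, 5, 12)
print(exact_age("19710315", today))  # 30
print(exact_age("19710513", today))  # 29, the birthday is one day away
```

For the second birthday, the days-divided-by-365 calculation would give 30, because the eight leap days in between push the quotient just past 30.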

Note that self is not written when calling the age method; the self parameter appears only when defining a method.
Now call help(User):

help(User)

Help on class User in module __main__:

class User(builtins.object)
 |  User(full_name, birthday)
 |  
 |  A member of FriendFace. For now we are only storing their name and birthday. But soon we will store an uncomfortable amount of user information.
 |  
 |  Methods defined here:
 |  
 |  __init__(self, full_name, birthday)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  age(self)
 |      Return the age of the user in years.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
:

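The description of the age method appears in the help output.

[Summary: Classes]

Fields (attributes) bundle data together
Functions that belong to a class are called methods
The objects created from a class are called instances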

Inheritance

class Animal(object):
    def run(self):
        print('Animal is running...')

class Dog(Animal):

    def run(self):
        print('Dog is running...')

class Cat(Animal):

    def run(self):
        print('Cat is running...')

dog = Dog()
dog.run()

cat = Cat()
cat.run()

Dog is running...
Cat is running...

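super().__init__()

class Person:
    def __init__(self, name = 'Person'):
        self.name = name


class Puple(Person):  # inherits from Person directly, so name is available
    pass


class Puple_Init(Person):  # inherits from Person but overrides __init__ to add an age attribute
    def __init__(self, age): # the name attribute is never set
        self.age = age

class Puple_Super(Person):  # inherits from Person and extends __init__ to add an age attribute
    def __init__(self, name, age):  # the name attribute is available
        # General form: super(Class, self).method(); in Python 3, super().__init__(name) is equivalent
        super(Puple_Super, self).__init__(name)
        self.age = age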
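The difference between overriding __init__ and extending it with super() shows up as soon as the objects are used. The sketch below re-declares trimmed versions of the classes so that it runs on its own; the names PupilInit and PupilSuper are illustrative stand-ins, not the names used above.

```python
class Person:
    def __init__(self, name='Person'):
        self.name = name

class PupilInit(Person):
    def __init__(self, age):      # replaces Person.__init__ entirely
        self.age = age

class PupilSuper(Person):
    def __init__(self, name, age):
        super().__init__(name)    # run Person.__init__ first
        self.age = age

print(hasattr(PupilInit(10), 'name'))  # False
pupil = PupilSuper('Alice', 10)
print(pupil.name, pupil.age)           # Alice 10
```

Without the call to super().__init__, the parent's initializer never runs, so the name attribute is simply missing.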

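Modules

A module in Python is essentially a file containing Python code: a single .py file is called a module.

To avoid clashes between module names, Python also lets you organize modules by directory; such a directory is called a package.

For example, a file abc.py is a module named abc, and a file xyz.py is a module named xyz.

Now suppose the names abc and xyz clash with other modules. Packages let us organize the modules and avoid the clash: pick a top-level package name, such as mycompany, and lay out the files as follows: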

mycompany
├─ __init__.py
├─ abc.py
└─ xyz.py

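Once a package is introduced, no module clashes with anyone else's as long as the top-level package name is unique. The module abc.py is now named mycompany.abc, and likewise xyz.py becomes mycompany.xyz.

Note that every package directory should contain an __init__.py file. Without it, Python does not treat the directory as a regular package (since Python 3.3 such a directory is handled as a namespace package instead). __init__.py can be empty or can contain Python code, because __init__.py is itself a module, and its module name is mycompany.

Similarly, directories can be nested to form a multi-level package hierarchy, such as the following structure: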

mycompany
 ├─ web
 │  ├─ __init__.py
 │  ├─ utils.py
 │  └─ www.py
 ├─ __init__.py
 ├─ abc.py
 └─ utils.py

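The module name of www.py is mycompany.web.www, and the two utils.py files have the module names mycompany.utils and mycompany.web.utils.

Note: take care when naming your own modules so that they do not clash with the modules that ship with Python. For example, sys ships with Python, so your own module should not be named sys.py; otherwise the two names collide and one of the modules can no longer be imported.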

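Importing Modules

For example, suppose we have written the following two functions in a file named converters.py:

def lbs_to_kg(weight):
    return weight * 0.45359237  # 1 lb is exactly 0.45359237 kg
def kg_to_lbs(weight):
    return weight / 0.45359237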

We can then use them from another file in the same directory:

import converters

print(converters.kg_to_lbs(70))

There is also another way to import:

from converters import kg_to_lbs

kg_to_lbs(100)

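When a specific function is imported from a module, it can be called directly, without the module name as a prefix.

The from … import * Statement

It is also possible to import everything in a module into the current namespace, using the following statement:

from modname import *

This offers an easy way to import all the items in a module. However, it should not be overused.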

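The __name__ Attribute

When a module is imported by another program for the first time, its top-level code runs. To keep a block of the module from running on import, use the __name__ attribute so that the block runs only when the module itself is executed.

#!/usr/bin/python3
# Filename: using_name.py

if __name__ == '__main__':
   print('This program is being run by itself')
else:
   print('I am being imported from another module')

python using_name.py

This program is being run by itself

python
>>> import using_name
I am being imported from another module
>>>

Note: every module has a __name__ attribute. When its value is '__main__', the module is being run directly; otherwise it has been imported.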

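Packages

To keep modules organized, related modules are grouped into packages, which Python handles conveniently. Put simply, a package is a folder that contains an __init__.py file.

A typical package structure looks like this:

package_a
├─ __init__.py
├─ module_a1.py
├─ module_a2.py
└─ ...

In the simplest case an empty __init__.py file is all that is needed. It can also run initialization code for the package or define the __all__ variable. A package can contain other packages, just as a folder can contain folders.

With the from statement, we can import a specific module from a package, or start from package.module and import one or more specific functions.

Relative imports are resolved based on the name of the current module. Because the name of the main module is always __main__, the main module of a Python application should always use absolute imports.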
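To make the package layout concrete, the sketch below builds a tiny package in a temporary directory and imports a module from it. Everything here is an assumption made for the demonstration: the package name package_demo, the greet function, and the use of sys.path to make the temporary directory importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "package_demo"   # hypothetical package name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("__all__ = ['module_a1']\n")
    (pkg / "module_a1.py").write_text("def greet():\n    return 'hello from module_a1'\n")

    sys.path.insert(0, tmp)            # make the package importable
    importlib.invalidate_caches()      # the files were created after startup
    try:
        module_a1 = importlib.import_module("package_demo.module_a1")
        message = module_a1.greet()
    finally:
        sys.path.remove(tmp)

print(module_a1.__name__)  # package_demo.module_a1
print(message)             # hello from module_a1
```

In a real project the package would simply sit next to the script, and a plain from package_demo import module_a1 would do the same job.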

Lambda Expressions & Anonymous Functions

The terms anonymous function and lambda are interchangeable.

To create a lambda expression, write lambda, then the inputs, then a colon, and finally an expression whose value is returned.

>>> g = lambda x: 3*x+1
>>> g(2)
7

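For the next example, suppose you are processing data from a web sign-up form and want to combine the separate first and last names into one full name for display.

The anonymous function takes two inputs, fn and ln. Both are first passed through strip to remove leading and trailing whitespace, then through title so that only the first letter is capitalized. Note the space added between fn and ln.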

>>> full_name = lambda fn,ln: fn.strip().title() + " " + ln.strip().title()
>>> full_name("    leonhard", "EULER")
'Leonhard Euler'

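Test it just like any other function.

The basic steps for creating a lambda expression are:

Write the lambda keyword followed by zero or more parameters; like a regular function, an anonymous function may take no inputs
Then write a colon
Finally write a single expression, whose result is the return value

Note: a lambda expression cannot be used for a multi-line function.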
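The steps above can be compared with an ordinary def. This is a small sketch; g_def and answer are names invented for the comparison.

```python
# A lambda and the equivalent def produce the same results.
g = lambda x: 3 * x + 1

def g_def(x):
    return 3 * x + 1

# A lambda may also take no parameters at all.
answer = lambda: 42

print(g(2), g_def(2), answer())  # 7 7 42
```

When a function needs several statements, a def is the tool to reach for, since a lambda holds exactly one expression.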

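A common use of lambda expressions is to pass them along without ever naming them.

Suppose there is a list of science fiction authors, scifi_authors, to be sorted by last name. Note that some entries have a middle name and some have initials.
The strategy is to write an anonymous function that extracts the last name and to sort on that value. Lists have a built-in sort method whose key parameter specifies what to sort by. We pass a lambda expression that splits the name on spaces, takes index -1 to get the last piece, and, as a precaution, converts it to lowercase so the sort is case-insensitive.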

>>> scifi_authors = ["Isaac Asimov", "Ray Bradbury", "Robert Heinlein", "Arthur C. Clarke", "Frank Herbert", "Orson Scott Card", "Douglas Adams", "H. G. Wells", "Leigh Brackett"]
>>> scifi_authors.sort(key=lambda name: name.split(" ")[-1].lower())
>>> scifi_authors
['Douglas Adams', 'Isaac Asimov', 'Leigh Brackett', 'Ray Bradbury', 'Orson Scott Card', 'Arthur C. Clarke', 'Robert Heinlein', 'Frank Herbert', 'H. G. Wells']

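Going a step further, we now write a function that builds functions. Suppose you work with quadratic functions, perhaps to compute the trajectory of a cannonball.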

>>> def build_quadratic_function(a, b, c):
...     """Returns the function f(x) = ax^2 + bx + c"""
...     return lambda x: a*x**2 + b*x + c
...
>>> f = build_quadratic_function(2, 3, -5)
>>> f(0)
-5
>>> f(1)
0
>>> f(2)
9

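An even neater test: call the returned function directly, without giving it a name. Here the function is built with (3, 0, 1) and immediately called with (2).

>>> build_quadratic_function(3, 0, 1)(2) # 3x^2+1 evaluated for x=2
13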

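Map, Filter, and Reduce Functions

The map Function

Start with map. Suppose we have a function that computes the area of a circle of radius r, and we need the areas of many different circles; radii below is the list of radii.

Method one: create an empty list areas, loop over the values in radii, compute the area for each radius, and append it to areas.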

import math


def area(r):
    """Area of a circle with radius 'r'."""
    return math.pi * (r**2)


radii = [2, 5, 7.1, 0.3, 10]

# Method 1: Direct method

areas = []
for r in radii:
    a = area(r)
    areas.append(a)

print(areas)

python3 map_radii.py

[12.566370614359172, 78.53981633974483, 158.36768566746147, 0.2827433388230814, 314.1592653589793]

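With the map function, one line of code does the job. map takes 2 arguments: a function, and a list, tuple, or other iterable to loop over.

Here map applies the area function to every element of radii. Note, however, that map does not return a list; it returns a map object, an iterator over the results. This is a good thing, especially when handling large amounts of data, because results are computed only as they are needed. To keep them, pass the map object to list().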

import math


def area(r):
    """Area of a circle with radius 'r'."""
    return math.pi * (r**2)


radii = [2, 5, 7.1, 0.3, 10]

# Method 2: Use 'map' function

print(list(map(area, radii)))

python3 map_radii.py

[12.566370614359172, 78.53981633974483, 158.36768566746147, 0.2827433388230814, 314.1592653589793]

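Here is how the map function works.

Given a list, tuple, or other iterable collection of data, map applies a function f to each piece of data. Call map with f first, then the data to loop over; it returns an iterator over the results of applying f to each item.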
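The behaviour just described can be imitated with a short generator function. This is a simplified sketch for a single iterable, not the actual implementation of the built-in; my_map and square are names made up for the illustration.

```python
def my_map(f, data):
    """Simplified sketch of map for a single iterable (not the real built-in)."""
    for item in data:
        yield f(item)  # results are produced lazily, one at a time

def square(x):
    return x * x

print(list(my_map(square, [1, 2, 3])))  # [1, 4, 9]
print(list(my_map(square, [1, 2, 3])) == list(map(square, [1, 2, 3])))  # True
```

Like the real map, the sketch does no work until something iterates over it.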

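The second example applies map to a list of tuples. Suppose you work for a weather service where all temperatures are stored in degrees Celsius, and unexpectedly someone needs them in Fahrenheit.

The goal is to convert the temperatures in the list to Fahrenheit using the formula $F = \frac{9}{5} \cdot C + 32$ . Write the formula as a lambda expression that takes a tuple and returns a new tuple: item 0, the name, is unchanged, while the Celsius temperature is converted to Fahrenheit.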

temps = [("Berlin", 29), ("Cairo", 36), ("Buenos Aires", 19), ("Los Angeles", 26), ("Tokyo", 27), ("New York", 28), ("London", 22), ("Beijing", 32)]

c_to_f = lambda data: (data[0], (9/5)*data[1] + 32)

print(list(map(c_to_f, temps)))

python3 map_radii.py

[('Berlin', 84.2), ('Cairo', 96.8), ('Buenos Aires', 66.2), ('Los Angeles', 78.80000000000001), ('Tokyo', 80.6), ('New York', 82.4), ('London', 71.6), ('Beijing', 89.6)]

As a higher-order function, map() can apply any function, however complex. For example, to convert every number in a list to a string:

>>> list(map(str, [1, 2, 3, 4, 5, 6, 7, 8, 9]))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

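The filter Function

The filter function selects the data you need from a list, tuple, or other collection. It is called filter because it filters out the data you do not need.

Suppose you are analyzing some data and need to select all the values above the average.

Since an average is involved, first import the statistics module and compute the mean of the data. Then use filter to keep the values greater than the mean. The first argument is a function; here an anonymous function tests whether its input is greater than the mean. The second argument is the data.
filter returns only the items for which the function returns True. Again, the return value is not a list but a filter object that iterates over the results; use list() to store them in a list.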

import statistics

data = [1.3, 2.7, 0.8, 4.1, 4.3, -0.1]
avg = statistics.mean(data)

print("avg = %lf" % avg)
print(list(filter(lambda x: x > avg, data)))

python3 filter_avg.py

avg = 2.183333
[2.7, 4.1, 4.3]

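Here is another use of filter. When processing data from an external source, you often run into missing data or empty values. For example, below is a list of some South American countries; note that many of the strings are empty.

filter can remove them without any helper function. Pass None as the first argument and the data as the second; every value that evaluates as false is filtered out.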

# Remove missing data

countries = ["", "Argentina", "", "Brazil", "Chile",
             "", "Colombia", "", "Ecuador", "", "", "Venezuela"]

print(list(filter(None, countries)))

python3 filter_avg.py

['Argentina', 'Brazil', 'Chile', 'Colombia', 'Ecuador', 'Venezuela']

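In Python the following values are treated as False: the empty string, 0, 0.0, 0j, the empty list, the empty tuple, the empty dictionary, False, and None. So use filter this way with care: 0 is often a valid piece of data that should not be filtered out.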
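The pitfall with zeros is easy to reproduce. In this sketch the readings list is invented sample data in which None marks a missing value and 0 is a legitimate measurement.

```python
readings = [3, 0, None, 7, 0.0, None, 12]

# filter(None, ...) drops every falsy value, including the valid zeros.
print(list(filter(None, readings)))                     # [3, 7, 12]

# An explicit test removes only the missing entries.
print(list(filter(lambda x: x is not None, readings)))  # [3, 0, 7, 0.0, 12]
```

Spelling out the test makes it clear which values are meant to be discarded.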

To remove the empty strings from a sequence, you can write:

def not_empty(s):
    return s and s.strip()

list(filter(not_empty, ['A', '', 'B', None, 'C', '  ']))
# Result: ['A', 'B', 'C']

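The reduce Function

reduce applies a function to a sequence [x1, x2, x3, $\cdots$ ]. The function must take two arguments; reduce takes the result so far and combines it with the next element of the sequence, so the effect is:

reduce(f, [x1, x2, x3, x4]) = f(f(f(x1, x2), x3), x4)

To use reduce, first import it from the functools module. Below, reduce multiplies together all the numbers in a list; for the demonstration, start with a list of the first 10 primes.

The function passed to reduce must take 2 inputs, so a lambda expression that multiplies them together is used.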

from functools import reduce

# Multiply all numbers in a list

data = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
multiplier = lambda x, y: x*y

print(reduce(multiplier, data))

python3 reduces.py

6469693230
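To see the intermediate steps that reduce goes through, itertools.accumulate can be run with the same function. This sketch uses a shorter list than the example above and operator.mul in place of the lambda; both choices are only for brevity.

```python
from functools import reduce
from itertools import accumulate
from operator import mul

data = [2, 3, 5, 7, 11]

# accumulate yields every intermediate result that reduce computes internally.
print(list(accumulate(data, mul)))  # [2, 6, 30, 210, 2310]
print(reduce(mul, data))            # 2310

# The optional third argument is the starting value, used when data is empty.
print(reduce(mul, [], 1))           # 1
```

Without a starting value, calling reduce on an empty sequence raises a TypeError, so supplying one is a sensible habit when the input may be empty.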

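Sorting in Python: sort vs. sorted

Start with an example of sorting alphabetically. The list holds the 6 alkaline earth metals, currently ordered by atomic number. What if we want them in alphabetical order?

Call the sort method on the list. By default, sort arranges strings in ascending alphabetical order. To reverse the order, call sort with the reverse argument set to True.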

>>> # Alkaline earth metals
... 
>>> earth_metals = ["Beryllium", "Magnesium", "Calcium", "Strontium", "Barium", "Radium"]
>>> earth_metals.sort()
>>> earth_metals
['Barium', 'Beryllium', 'Calcium', 'Magnesium', 'Radium', 'Strontium']
>>> 
>>> earth_metals.sort(reverse=True)
>>> earth_metals
['Strontium', 'Radium', 'Magnesium', 'Calcium', 'Beryllium', 'Barium']

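If the data is stored in a tuple instead of a list, calling sort on it raises an error. Tuples are immutable: their items cannot be rearranged, and sorting in place requires exactly that.

The sort method has two parameters, key and reverse. reverse defaults to False, which means the data is sorted in ascending order. Note that sort works in place: Python does not create a second list to hold the sorted data but sorts the list itself.

Now for the key parameter, which determines the value used for comparison. Pass key a function that returns the value to sort by.

Here is an example that exercises all the features of sort. Create a list of the 8 planets of the solar system. The data for each planet consists of its name, its equatorial radius in kilometers, its density in grams per cubic centimeter, and its average distance from the Sun in astronomical units; one astronomical unit is the average distance between the Earth and the Sun.

The planets are currently ordered by distance from the Sun. Now we want to sort them by radius, largest first. This requires a function that returns the value to sort by, in this case the second item of the tuple. A lambda expression works: its input is the data for one planet and its output is the radius. Note that indexes start at 0, not 1, so the second item has index 1. Then call sort, passing the size function as key; because we want the planets from largest to smallest (descending), set reverse to True.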

planets = [
    ("Mercury", 2440, 5.43, 0.395),
    ("Venus", 6052, 5.24, 0.723),
    ("Earth", 6378, 5.52, 1.000),
    ("Mars", 3396, 3.93, 1.530),
    ("Jupiter", 71492, 1.33, 5.210),
    ("Saturn", 60268, 0.69, 9.551),
    ("Uranus", 25559, 1.27, 19.213),
    ("Neptune", 24764, 1.64, 30.070)
]

size = lambda planet: planet[1]
planets.sort(key=size, reverse=True)
print(planets)

python3 sorts.py

[('Jupiter', 71492, 1.33, 5.21), ('Saturn', 60268, 0.69, 9.551), ('Uranus', 25559, 1.27, 19.213), ('Neptune', 24764, 1.64, 30.07), ('Earth', 6378, 5.52, 1.0), ('Venus', 6052, 5.24, 0.723), ('Mars', 3396, 3.93, 1.53), ('Mercury', 2440, 5.43, 0.395)]

Now sort the planets by density, starting with the lowest density.

planets = [
    ("Mercury", 2440, 5.43, 0.395),
    ("Venus", 6052, 5.24, 0.723),
    ("Earth", 6378, 5.52, 1.000),
    ("Mars", 3396, 3.93, 1.530),
    ("Jupiter", 71492, 1.33, 5.210),
    ("Saturn", 60268, 0.69, 9.551),
    ("Uranus", 25559, 1.27, 19.213),
    ("Neptune", 24764, 1.64, 30.070)
]

density = lambda planet: planet[2]
planets.sort(key=density)
print(planets)

python3 sorts.py

[('Saturn', 60268, 0.69, 9.551), ('Uranus', 25559, 1.27, 19.213), ('Jupiter', 71492, 1.33, 5.21), ('Neptune', 24764, 1.64, 30.07), ('Mars', 3396, 3.93, 1.53), ('Venus', 6052, 5.24, 0.723), ('Mercury', 2440, 5.43, 0.395), ('Earth', 6378, 5.52, 1.0)]

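In every example so far, list.sort() has modified the list itself. What if you want a sorted copy instead? And what about tuples: is it possible to sort a tuple? This is where the built-in function sorted() comes in.

The first argument to sorted is a list, a tuple, or any other iterable. Just as with the list sort method, key and reverse can be set.

Continuing with the alkaline earth metals from above, sort them with sorted: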

>>> earth_metals = ["Beryllium", "Magnesium", "Calcium", "Strontium", "Barium", "Radium"]
>>> sorted_earth_metals = sorted(earth_metals)
>>> sorted_earth_metals
['Barium', 'Beryllium', 'Calcium', 'Magnesium', 'Radium', 'Strontium']
>>> earth_metals
['Beryllium', 'Magnesium', 'Calcium', 'Strontium', 'Barium', 'Radium']

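The result is in ascending alphabetical order, and the order of the original earth_metals list is unchanged.

Next example: sort a tuple of 10 positive integers typed in at random. Tuples have no sort method because they are immutable, but passing one to sorted returns a sorted list. The input is a tuple and the output is a list, and the original tuple is unchanged.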

>>> data = (7, 2, 5, 6, 1, 3, 9, 10, 4, 8)
>>> sorted(data)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> data
(7, 2, 5, 6, 1, 3, 9, 10, 4, 8)

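The next example may come as a surprise: sorting a string alphabetically. A string is itself iterable; it can be iterated character by character.

sorted returns a list of all the characters in the string, in sorted order. For sorted("Alphabetically"), the uppercase A comes before the lowercase a.

>>> sorted("Alphabetically")
['A', 'a', 'a', 'b', 'c', 'e', 'h', 'i', 'l', 'l', 'l', 'p', 't', 'y']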

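The sort method works only on lists, and it modifies the list itself.
The sorted function is more general: it works on lists, tuples, strings, and other iterables. Better still, it leaves the original data unchanged and returns a new sorted list.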

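Reading and Writing Text Files

There are two types of files: text files and binary files.
Text files hold human-readable data
- Examples: plain text, XML, JSON, and even source code
Binary files store compiled code, app data, and media files (images, audio, video)

Here we look at how to read and write text files.

Reading Files

Files are opened in Python with the built-in open function; the two parameters used here are file and mode.

One way is to pass open() the file name (for a file in the same directory). By default the file is opened in read-only mode and a file object f is returned. The read method returns all the text in the file, which is stored in text; finally the file f is closed. Printing text shows the entire contents.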

>>> f = open("text.txt")
>>> text = f.read()
>>> f.close()
>>> 
>>> print(text)
used for open files

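This approach has a risk: what if something goes wrong before close is called? An open file should not be left holding resources. For this reason there is a better, safer, and more concise way to read files.

The second approach opens the file with the with keyword. As before, the file is opened by name and stored in a file object, and its contents are read in the same way. With this approach there is no need to call close(): Python closes the file even if an exception occurs inside the block.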

>>> with open("text.txt") as fobj:
...     bio = fobj.read()
... 
>>> print(bio)
used for open files

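What happens when you try to open a file that does not exist? Python raises FileNotFoundError. One solution is to put the call inside a try block and handle FileNotFoundError; for example, if the file does not exist, set text to None.

>>> try:
...     with open("svbsdv.txt") as f:
...             text = f.read()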
... except FileNotFoundError:
...     text = None
...
>>> print(text)
None

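Writing Files

For example, to write the names of the five oceans to a .txt file, specify the mode w, which stands for write. If no mode is given, Python defaults to read-only mode. If oceans.txt does not exist, Python creates it; if it already exists, Python overwrites its contents, so be careful!

Loop over the oceans list and write each name to the file with the write method. Remember that when a file is opened with a with statement, Python closes it for you automatically.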

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             f.write(ocean)

cat oceans.txt

PacificAtlanticIndianSouthernArctic

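The names were all written, but with nothing separating them: we did not specify a separator, and Python does not assume we want one. Add a newline after writing each name.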

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             f.write(ocean)
...             f.write("\n")

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic

Another way to put each string on its own line is to use the print function with its file argument set to f.

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>>
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             print(ocean, file=f)

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic

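Note that the file is overwritten every time the code runs, because it is opened in write mode. Is it possible to write to a file without overwriting its text?

Specify a after the file name to use append mode. In append mode the target file is created if it does not exist; if it does exist, Python adds to the end without overwriting the original contents.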

>>> oceans = ["Pacific", "Atlantic", "Indian", "Southern", "Arctic"]
>>> with open("oceans.txt", "w") as f:
...     for ocean in oceans:
...             print(ocean, file=f)
...
>>> with open("oceans.txt", "a") as f:
...     print(23*"=", file=f)
...     print("These are the 5 oceans.", file=f)

cat oceans.txt

Pacific
Atlantic
Indian
Southern
Arctic
=======================
These are the 5 oceans.

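The Path Class in pathlib

pathlib provides object-oriented file system path operations.

Path.exists()

Checks whether the path points to an existing file or directory and returns a Boolean: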

from pathlib import Path

path = Path("emails")
print(path.exists())

Path.mkdir()

Creates a directory at the path:

from pathlib import Path

path = Path("emails")
path.mkdir()

Path.rmdir()

Removes an empty directory.

Path.glob()

Returns a generator of all the files under the path that match the pattern:

from pathlib import Path

path = Path()
for file in path.glob('*.md'):
    print(file)

Sample output:

blog-setup.md
python.md
docker.md
...

Other file types can be searched for as well, such as *.py or *.xls.
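The four methods can be tried together without touching real files by working inside a temporary directory. This is a sketch under that assumption; the file names inbox.md and notes.txt are invented, and write_text, iterdir, and unlink are additional Path methods used to create and clean up the sample files.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    emails = Path(tmp) / "emails"   # join paths with the / operator
    before = emails.exists()        # False: nothing created yet
    emails.mkdir()
    (emails / "inbox.md").write_text("hello\n")
    (emails / "notes.txt").write_text("todo\n")

    md_files = sorted(p.name for p in emails.glob("*.md"))

    for child in emails.iterdir():  # rmdir only removes empty directories
        child.unlink()
    emails.rmdir()
    after = emails.exists()

print(before, md_files, after)  # False ['inbox.md'] False
```

Note that rmdir fails on a directory that still has files in it, which is why the sketch deletes them first.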

Exceptions in Python

Exceptions are handled with the try and except statements.

try:
    age = int(input('Age: '))
    print(age)
except ValueError:
    print('Invalid value')

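In the example above, when the input is not a valid integer and a ValueError is raised, the code in the except block runs.

There are many types of errors, of course, so plan the appropriate handling for each in advance.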
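Handling several error types usually means one except clause per type. The sketch below shows the pattern with a hypothetical helper, safe_divide, that takes its inputs as strings in place of calling input().

```python
def safe_divide(numerator_text, denominator_text):
    """Hypothetical helper: parse two strings and divide them."""
    try:
        result = int(numerator_text) / int(denominator_text)
    except ValueError:
        return 'Invalid value'          # text that is not an integer
    except ZeroDivisionError:
        return 'Cannot divide by zero'
    else:
        return result                   # runs only when no exception occurred

print(safe_divide('10', '4'))   # 2.5
print(safe_divide('ten', '4'))  # Invalid value
print(safe_divide('10', '0'))   # Cannot divide by zero
```

Catching specific exception types, as opposed to a bare except, keeps unexpected errors visible.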

Miscellaneous

zip and the Star Operator

a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Without the star
list(zip(a))
Out[414]: [([1, 2, 3],), ([4, 5, 6],), ([7, 8, 9],)]

# With the star
list(zip(*a))
Out[415]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Equivalent to
m = [1, 2, 3]; n = [4, 5, 6]; q = [7, 8, 9]

list(zip(m,n,q))
Out[417]: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Unzip
a, b, c = zip(*zip(m,n,q))

list(a)
Out[419]: [1, 2, 3]
list(b)
Out[420]: [4, 5, 6]
list(c)
Out[421]: [7, 8, 9]

Magic Methods

class Test():
    def __getitem__(self, key: str):
        return key

If a class defines the __getitem__ method, an instance t can be indexed with t[key].

class Test():
    def __repr__(self):
        return 'xxxx'

Overriding a class's __repr__ method controls what is shown when an instance is printed.
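Both magic methods can be seen in action on one small class. The Inventory class below is hypothetical and serves only as an illustration.

```python
class Inventory:
    """Hypothetical class combining __getitem__ and __repr__."""

    def __init__(self, stock):
        self._stock = dict(stock)

    def __getitem__(self, key):
        return self._stock[key]  # called for inventory[key]

    def __repr__(self):
        return f"Inventory({self._stock!r})"

inventory = Inventory({'apple': 3, 'pear': 5})
print(inventory['pear'])  # 5
print(inventory)          # Inventory({'apple': 3, 'pear': 5})
```

print falls back to __repr__ here because the class does not define __str__.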

Class Methods

class Test():

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @classmethod
    def t1(cls):
        print("cls:", cls)

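A function decorated with classmethod can be called without creating an instance and takes no self parameter. Its first parameter must instead be cls, which represents the class itself and can be used to access class attributes, call class methods, create instances, and so on.

Understanding @staticmethod and @classmethod in Python correctly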
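One common use of a class method is as an alternative constructor. The following sketch uses a hypothetical Temperature class; from_fahrenheit is an invented name for the extra constructor.

```python
class Temperature:
    """Hypothetical class with an alternative constructor."""

    def __init__(self, celsius):
        self.celsius = celsius

    @classmethod
    def from_fahrenheit(cls, fahrenheit):
        # cls is the class itself, so calling it creates a new instance.
        return cls((fahrenheit - 32) * 5 / 9)

boiling = Temperature.from_fahrenheit(212)  # called on the class, no instance needed
print(boiling.celsius)  # 100.0
```

Because the method receives cls, a subclass that calls from_fahrenheit gets an instance of the subclass, which a static method could not provide.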

Decorators

@staticmethod

Turns a method into a static method

class MyClass:
    @staticmethod
    def the_static_method():
        return "This is a static method"

@classmethod

Turns a method into a class method

class MyClass:
    class_variable = 0

    def __init__(self, value):
        self.instance_variable = value
    
    @classmethod
    def increment_class_variable(cls):
        cls.class_variable += 1

@property

Turns a method into an attribute, so that it can be accessed like one

class Circle:
    def __init__(self, radius):
        self._radius = radius
    
    @property
    def area(self):
        return 3.14 * self._radius ** 2

@setter

Used together with @property to set the value of an attribute

class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        if value < 0:
            raise ValueError("Radius must not be negative")
        self._radius = value

    @property
    def area(self):
        return 3.14 * self._radius ** 2

@staticmethod

Turns a method into a static method

class MathUtils:
    @staticmethod
    def add(x, y):
        return x + y

@functools.wraps

Preserves the metadata of the decorated function

import functools

def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("Before the function is called")
        result = func(*args, **kwargs)
        print("After the function is called")
        return result
    return wrapper
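What "metadata" means here is easiest to see by comparing a decorator with and without functools.wraps. In this sketch, plain and wrapped are two hypothetical decorators that differ only in that one line.

```python
import functools

def plain(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def wrapped(func):
    @functools.wraps(func)  # copies __name__, __doc__, etc. onto wrapper
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

@plain
def add(x, y):
    """Add two numbers."""
    return x + y

@wrapped
def multiply(x, y):
    """Multiply two numbers."""
    return x * y

print(add.__name__, add.__doc__)            # wrapper None
print(multiply.__name__, multiply.__doc__)  # multiply Multiply two numbers.
```

Tools such as help() and debuggers rely on these attributes, so keeping them intact makes decorated functions easier to inspect.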

@timeit

Measures the execution time of a function

import time

def timeit(func):
    def wrapper(*args, **kwargs):
        start_time = time.time()
        result = func(*args, **kwargs)
        end_time = time.time()
        print(f"Elapsed: {end_time - start_time} s")
        return result
    return wrapper

@login_required

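Checks whether the user is logged in

# Assumption: a Django-style view, with redirect imported from django.shortcuts
def login_required(func):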
    def wrapper(request, *args, **kwargs):
        if request.user.is_authenticated:
            return func(request, *args, **kwargs)
        else:
            return redirect("/login/")
    return wrapper

@retry

Retries a function a given number of times when it fails

import time

def retry(max_retries):
    def decorator(func):
        def wrapper(*args, **kwargs):
            for _ in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    print(f"Exception {e}, retrying...")
                    time.sleep(1)
            return "Max retries exceeded"
        return wrapper
    return decorator

A Combined Example of Core Python Concepts

Below is a complete example that brings together many of Python's core concepts:

# ============ Decorators ============
from typing import List, Dict, Optional, Generic, TypeVar, Protocol
from abc import ABC, abstractmethod
from contextlib import contextmanager
import functools

# Function decorator
def log_calls(func):
    """Decorator that logs function calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling function: {func.__name__}")
        result = func(*args, **kwargs)
        print(f"Function {func.__name__} returned: {result}")
        return result
    return wrapper

# Class decorator
def singleton(cls):
    """Singleton pattern decorator."""
    instances = {}
    @functools.wraps(cls)
    def get_instance(*args, **kwargs):
        if cls not in instances:
            instances[cls] = cls(*args, **kwargs)
        return instances[cls]
    return get_instance

# ============ Protocol ============
class Movable(Protocol):
    """Protocol for movable objects."""
    def move(self) -> str:
        ...

    def get_speed(self) -> float:
        ...

# ============ Abstract base class ============
class Animal(ABC):
    """Abstract base class for animals."""

    # Class variables
    animal_count: int = 0
    MAX_AGE: int = 100

    def __init__(self, name: str, age: int):
        # Instance variables (private by convention)
        self._name = name
        self._age = age
        self.__secret = "secret information"  # name mangling
        Animal.animal_count += 1

    # property decorator (getter)
    @property
    def name(self) -> str:
        """Get the name."""
        return self._name

    # property setter
    @name.setter
    def name(self, value: str) -> None:
        """Set the name."""
        if not value:
            raise ValueError("Name must not be empty")
        self._name = value

    @property
    def age(self) -> int:
        return self._age

    @age.setter
    def age(self, value: int) -> None:
        if value < 0 or value > self.MAX_AGE:
            raise ValueError(f"年龄必须在 0-{self.MAX_AGE} 之间")
        self._age = value

    # 抽象方法（子类必须实现）
    @abstractmethod
    def make_sound(self) -> str:
        """发出声音"""
        pass

    @abstractmethod
    def eat(self) -> str:
        """吃东西"""
        pass

    # 普通方法
    def sleep(self) -> str:
        return f"{self.name} 正在睡觉"

    # 类方法
    @classmethod
    def get_animal_count(cls) -> int:
        """获取动物总数"""
        return cls.animal_count

    # 静态方法
    @staticmethod
    def is_valid_age(age: int) -> bool:
        """验证年龄是否有效"""
        return 0 <= age <= Animal.MAX_AGE

    # 魔术方法
    def __str__(self) -> str:
        return f"{self.__class__.__name__}({self.name}, {self.age}岁)"

    def __repr__(self) -> str:
        return f"{self.__class__.__name__}(name='{self.name}', age={self.age})"

    def __eq__(self, other) -> bool:
        if not isinstance(other, Animal):
            return False
        return self.name == other.name and self.age == other.age

    def __lt__(self, other) -> bool:
        """用于排序"""
        return self.age < other.age

    def __enter__(self):
        """上下文管理器入口"""
        print(f"{self.name} 开始活动")
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """上下文管理器出口"""
        print(f"{self.name} 结束活动")
        return False

# ============ Multiple inheritance + implementing a protocol ============
class Flyable:
    """Mixin for things that can fly."""
    def fly(self) -> str:
        return "Flying"

class Swimmable:
    """Mixin for things that can swim."""
    def swim(self) -> str:
        return "Swimming"

class Dog(Animal, Movable):
    """Dog class (inheritance + protocol implementation)."""

    def __init__(self, name: str, age: int, breed: str):
        super().__init__(name, age)
        self.breed = breed
        self._speed = 0.0

    # Implement the abstract methods
    def make_sound(self) -> str:
        return f"{self.name} barks: woof woof"

    def eat(self) -> str:
        return f"{self.name} is eating a bone"

    # Method overriding
    def sleep(self) -> str:
        # Call the parent class method
        parent_msg = super().sleep()
        return f"{parent_msg}, sound asleep"

    # Implement the Movable protocol
    def move(self) -> str:
        return f"{self.name} is running, speed: {self._speed}km/h"

    def get_speed(self) -> float:
        return self._speed

    # "Overloading": Python has no method overloading, so default arguments emulate it
    def bark(self, times: int = 1, message: Optional[str] = None) -> None:
        if message:
            print(f"{self.name} says: {message}")
        else:
            for _ in range(times):
                print(f"{self.name} woof!")

    # Generator method
    def count_barks(self, max_count: int):
        """Generator: count the barks."""
        for i in range(1, max_count + 1):
            yield f"Woof number {i}!"

    # Iterator protocol
    def __iter__(self):
        """Make the object iterable."""
        self._iter_count = 0
        return self

    def __next__(self):
        if self._iter_count < 3:
            self._iter_count += 1
            return f"Woof #{self._iter_count}"
        raise StopIteration

class Duck(Animal, Flyable, Swimmable):
    """Duck (multiple inheritance)."""

    def __init__(self, name: str, age: int):
        super().__init__(name, age)

    def make_sound(self) -> str:
        return f"{self.name} quacks"

    def eat(self) -> str:
        return f"{self.name} is eating fish"

# ============ Generics ============
T = TypeVar('T', bound=Animal)

class AnimalShelter(Generic[T]):
    """Animal shelter (generic class)."""

    def __init__(self):
        self._animals: List[T] = []

    def add_animal(self, animal: T) -> None:
        self._animals.append(animal)

    def get_all(self) -> List[T]:
        return self._animals.copy()

    def find_by_name(self, name: str) -> Optional[T]:
        """Return the first animal with the given name, or None if there is none."""
        for animal in self._animals:
            if animal.name == name:
                return animal
        return None

    # Implement the container protocol
    def __len__(self) -> int:
        return len(self._animals)

    def __getitem__(self, index: int) -> T:
        return self._animals[index]

# ============ Context manager ============
@contextmanager
def animal_activity(animal: Animal):
    """Context manager built with the @contextmanager decorator."""
    print(f"=== {animal.name} starts an activity ===")
    try:
        yield animal
    finally:
        print(f"=== {animal.name} ends the activity ===")

# ============ Exception handling ============
class AnimalException(Exception):
    """Custom exception."""
    pass

class AgeException(AnimalException):
    """Age-related exception."""
    pass

def validate_animal(animal: Animal) -> None:
    """Validate an animal's data."""
    if animal.age < 0:
        raise AgeException(f"Age must not be negative: {animal.age}")
    if not animal.name:
        raise AnimalException("Name must not be empty")

# ============ Applying the decorators ============
@singleton
class AnimalRegistry:
    """Animal registry (singleton pattern)."""

    def __init__(self):
        self.registry: Dict[str, Animal] = {}

    @log_calls
    def register(self, animal: Animal) -> None:
        """Register an animal."""
        self.registry[animal.name] = animal

    def get(self, name: str) -> Optional[Animal]:
        return self.registry.get(name)

# ============ Test code ============
def main():
    print("=== Combined demo of core Python concepts ===\n")

    # 1. Classes and objects
    print("1. Classes, objects, and encapsulation:")
    dog = Dog("Wangcai", 3, "Golden Retriever")
    duck = Duck("Donald", 2)
    print(f"Created: {dog}")
    print(f"Created: {duck}")

    # 2. Property
    print("\n2. Property decorator:")
    print(f"Dog's name: {dog.name}")
    dog.name = "Dahuang"  # Uses the setter
    print(f"Name after change: {dog.name}")

    # 3. Abstract methods and polymorphism
    print("\n3. Abstract methods and polymorphism:")
    animals: List[Animal] = [dog, duck]
    for animal in animals:
        print(f"{animal.make_sound()}")
        print(f"{animal.eat()}")

    # 4. Class methods and static methods
    print("\n4. Class methods and static methods:")
    print(f"Total animals: {Animal.get_animal_count()}")
    print(f"Is age 5 valid: {Animal.is_valid_age(5)}")

    # 5. Magic methods
    print("\n5. Magic methods:")
    dog2 = Dog("Wangcai", 3, "Golden Retriever")
    print(f"dog == dog2: {dog == dog2}")
    print(f"str form: {str(dog)}")
    print(f"repr form: {repr(dog)}")

    # 6. Type annotations and generics
    print("\n6. Generics:")
    shelter: AnimalShelter[Dog] = AnimalShelter()
    shelter.add_animal(dog)
    shelter.add_animal(Dog("Xiaohei", 2, "Husky"))
    print(f"The shelter has {len(shelter)} dogs")
    print(f"First one: {shelter[0]}")

    # 7. Multiple inheritance
    print("\n7. Multiple inheritance (Mixin):")
    print(f"{duck.fly()}")
    print(f"{duck.swim()}")

    # 8. Protocol
    print("\n8. Protocol:")
    def make_it_move(obj: Movable) -> None:
        print(obj.move())
    make_it_move(dog)

    # 9. Generators
    print("\n9. Generators:")
    for bark in dog.count_barks(3):
        print(bark)

    # 10. Iterators
    print("\n10. Iterators:")
    for bark in dog:
        print(bark)

    # 11. Context managers
    print("\n11. Context managers:")
    with animal_activity(dog):
        print(f"{dog.name} is active")

    # Using __enter__ and __exit__
    with duck:
        print(f"{duck.name} is active")

    # 12. Exception handling
    print("\n12. Exception handling:")
    try:
        # __init__ assigns _name and _age directly, so the property setters do not reject these values
        bad_dog = Dog("", -1, "unknown")
        validate_animal(bad_dog)
    except AgeException as e:
        print(f"Caught age exception: {e}")
    except AnimalException as e:
        print(f"Caught animal exception: {e}")
    finally:
        print("Exception handling finished")

    # 13. Decorators
    print("\n13. Decorators:")
    registry1 = AnimalRegistry()
    registry2 = AnimalRegistry()
    print(f"Singleton: registry1 is registry2: {registry1 is registry2}")
    registry1.register(dog)  # Goes through the @log_calls decorator

    # 14. Lambda and higher-order functions
    print("\n14. Lambda and higher-order functions:")
    animals_list = [Dog("A", 5, "a"), Dog("B", 2, "b"), Dog("C", 8, "c")]
    # Sort with a lambda
    sorted_animals = sorted(animals_list, key=lambda x: x.age)
    print(f"Sorted by age: {[f'{a.name}({a.age})' for a in sorted_animals]}")

    # Using filter
    young_animals = list(filter(lambda x: x.age < 5, animals_list))
    print(f"Young animals: {[a.name for a in young_animals]}")

    # Using map
    names = list(map(lambda x: x.name.upper(), animals_list))
    print(f"Uppercase names: {names}")

    # 15. List, dict, and set comprehensions
    print("\n15. Comprehensions:")
    # List comprehension
    ages = [a.age for a in animals_list]
    print(f"Age list: {ages}")

    # Dict comprehension
    animal_dict = {a.name: a.age for a in animals_list}
    print(f"Animal dict: {animal_dict}")

    # Set comprehension
    unique_ages = {a.age for a in animals_list}
    print(f"Unique ages: {unique_ages}")

    # 16. Functions as first-class citizens
    print("\n16. Functions as first-class citizens:")
    def apply_to_animal(animal: Animal, func):
        return func(animal)

    result = apply_to_animal(dog, lambda a: f"{a.name} is cute")
    print(result)

    # 17. Closures
    print("\n17. Closures:")
    def make_multiplier(factor: int):
        def multiplier(x: int) -> int:
            return x * factor
        return multiplier

    times_three = make_multiplier(3)
    print(f"3 * 4 = {times_three(4)}")

    print("\n=== Demo finished ===")

if __name__ == "__main__":
    main()

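This example covers the following core Python concepts:

Classes and objects: the definitions of Animal, Dog, Duck, and the other classes
Inheritance: Dog and Duck inherit from Animal
Multiple inheritance: Duck also inherits from Flyable and Swimmable (the Mixin pattern)
Abstract base classes (ABC): the abstract Animal class and its abstract methods
Encapsulation: internal attributes (_name by convention, __secret via name mangling) and property
Polymorphism: different subclasses implement the same abstract methods
Property decorator: @property, @name.setter
Class methods: @classmethod
Static methods: @staticmethod
Magic methods: __init__, __str__, __repr__, __eq__, __lt__, __len__, __getitem__, and others
Decorators: function decorators, class decorators, @functools.wraps
Type annotations: type hints with the typing module
Generics: Generic[T] and TypeVar
Protocol: the Movable protocol (structural subtyping)
Context managers: __enter__/__exit__ and @contextmanager
Generators: the yield keyword
Iterators: the __iter__ and __next__ protocol
Exception handling: custom exceptions, try/except/finally
Lambda expressions: anonymous functions
Higher-order functions: map, filter, sorted
List comprehensions: [x for x in …]
Dict comprehensions: {k: v for …}
Set comprehensions: {x for x in …}
Closures: an inner function that captures variables from the enclosing function
Functions as first-class citizens: functions can be passed as arguments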
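
The difference between the two underscore styles in the encapsulation entry is easy to miss, so here is a minimal sketch of it. The Animal class below is a stripped-down stand-in for the one in the example (an assumption made to keep the snippet self-contained), not the full class.

```python
class Animal:
    def __init__(self, name):
        self._name = name              # single underscore: internal by convention only
        self.__secret = "classified"   # double underscore: stored as _Animal__secret

pet = Animal("Rex")

print(pet._name)                 # still accessible, just discouraged
print(pet._Animal__secret)       # the mangled name can be reached from outside
print(hasattr(pet, "__secret"))  # False: no attribute with the unmangled name exists
```

Name mangling is meant to avoid accidental clashes in subclasses rather than to enforce privacy, which is why the mangled name stays reachable.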

Concurrent Programming

Threading

Python provides the threading module for multithreaded programming, which is a good fit for I/O-bound tasks.

Basic thread usage

import threading
import time

def worker(name, delay):
    """Worker thread function."""
    print(f"Thread {name} started")
    time.sleep(delay)
    print(f"Thread {name} finished")

# Create the threads
thread1 = threading.Thread(target=worker, args=("A", 2))
thread2 = threading.Thread(target=worker, args=("B", 1))

# Start the threads
thread1.start()
thread2.start()

# Wait for the threads to finish
thread1.join()
thread2.join()

print("All threads finished")

Creating threads with a class

import threading

class WorkerThread(threading.Thread):
    def __init__(self, name, count):
        # Let Thread manage the name instead of overwriting it afterwards
        super().__init__(name=name)
        self.count = count

    def run(self):
        print(f"Thread {self.name} started")
        for i in range(self.count):
            print(f"{self.name}: {i}")
        print(f"Thread {self.name} finished")

# Create and start the threads
t1 = WorkerThread("Thread-1", 5)
t2 = WorkerThread("Thread-2", 3)

t1.start()
t2.start()

t1.join()
t2.join()

Thread Lock

When several threads access a shared resource, a lock is needed to avoid race conditions:

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:  # The with statement acquires and releases the lock automatically
            counter += 1

threads = []
for i in range(10):
    t = threading.Thread(target=increment)
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(f"Final count: {counter}")  # Output: 1000000
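
To make explicit what the with statement does with the lock, here is a rough equivalent written with acquire() and release(). It is only a sketch: the thread count and loop size are arbitrary values chosen for illustration.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        lock.acquire()          # block until the lock is free
        try:
            counter += 1        # critical section: one thread at a time
        finally:
            lock.release()      # always release, even if the body raises

threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

The with form is usually preferable because it cannot forget the release.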

Thread Pool (ThreadPoolExecutor)

The concurrent.futures module makes it easier to manage a pool of threads:

from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(n):
    """Simulate a time-consuming task."""
    time.sleep(1)
    return n * n

# Create the thread pool
with ThreadPoolExecutor(max_workers=5) as executor:
    # Submit the tasks
    futures = [executor.submit(task, i) for i in range(10)]

    # Collect the results (as_completed yields futures in completion order, not submission order)
    for future in as_completed(futures):
        result = future.result()
        print(f"Result: {result}")
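
When results need to come back in input order, executor.map is an alternative to submit plus as_completed. The sketch below uses a hypothetical square_slowly function whose sleep times are chosen so that later inputs finish first.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def square_slowly(n):
    """Hypothetical task: smaller inputs sleep longer, so they finish last."""
    time.sleep(0.05 * (5 - n))
    return n * n

with ThreadPoolExecutor(max_workers=5) as executor:
    # map() returns results in the order of the inputs, regardless of finish order
    ordered = list(executor.map(square_slowly, range(5)))

print(ordered)  # [0, 1, 4, 9, 16]
```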

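GIL Limitations

Note that CPython's global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time, which limits what multithreading can gain on CPU-bound tasks. For CPU-bound work, the multiprocessing module is the recommended choice:

from multiprocessing import Pool

def cpu_intensive_task(n):
    """CPU-bound task."""
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    with Pool(processes=4) as pool:
        results = pool.map(cpu_intensive_task, [1000000, 2000000, 3000000])
        print(results)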

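Asynchronous Programming and Coroutines (Async/Await)

Python 3.5 introduced the async/await syntax, which makes asynchronous code read much more naturally. Asynchronous programming is particularly well suited to I/O-bound work such as network requests and file operations. The examples below use asyncio.run, which requires Python 3.7 or later.

Basic async function

import asyncio

async def say_hello(name, delay):
    """Example of an async function."""
    print(f"Hello {name}!")
    await asyncio.sleep(delay)  # Asynchronous wait
    print(f"Goodbye {name}!")

# Run the async function
asyncio.run(say_hello("World", 1))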
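
One point that often surprises newcomers: calling an async function does not run it, it only creates a coroutine object. A small sketch, with add as a made-up example function:

```python
import asyncio

async def add(a, b):
    await asyncio.sleep(0)   # yield control to the event loop once
    return a + b

coro = add(1, 2)                            # nothing has executed yet
is_coroutine = asyncio.iscoroutine(coro)    # True: we only hold a coroutine object
result = asyncio.run(coro)                  # the body runs here

print(is_coroutine, result)  # True 3
```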

Running multiple coroutines concurrently

import asyncio
import time

async def fetch_data(id, delay):
    """Simulate fetching data asynchronously."""
    print(f"Start fetching data {id}")
    await asyncio.sleep(delay)
    print(f"Data {id} fetched")
    return f"Data-{id}"

async def main():
    # Run several coroutines concurrently
    tasks = [
        fetch_data(1, 2),
        fetch_data(2, 1),
        fetch_data(3, 3)
    ]

    # gather waits for all of them and returns the results in argument order
    results = await asyncio.gather(*tasks)
    print(f"All results: {results}")

# Run
start = time.time()
asyncio.run(main())
print(f"Total time: {time.time() - start:.2f}s")  # About 3 seconds rather than 6
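
By default, gather propagates the first exception raised by any of the coroutines. If one failure should not discard the other results, return_exceptions=True returns exceptions as ordinary values instead. A minimal sketch, where fetch_or_fail is a hypothetical coroutine that fails for one id:

```python
import asyncio

async def fetch_or_fail(item_id):
    """Hypothetical fetch that fails for item 2."""
    await asyncio.sleep(0.01)
    if item_id == 2:
        raise ValueError(f"item {item_id} failed")
    return f"Data-{item_id}"

async def collect():
    # Exceptions are placed in the result list instead of being raised
    return await asyncio.gather(
        *(fetch_or_fail(i) for i in (1, 2, 3)),
        return_exceptions=True,
    )

results = asyncio.run(collect())
for r in results:
    print("error:" if isinstance(r, Exception) else "ok:", r)
```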

Using asyncio.create_task

import asyncio

async def worker(name, work_queue):
    """Async worker: processes items until the queue is empty, then returns."""
    while not work_queue.empty():
        work = await work_queue.get()
        print(f"{name} is processing: {work}")
        await asyncio.sleep(1)
        print(f"{name} finished: {work}")
        work_queue.task_done()

async def main():
    # Create the queue
    queue = asyncio.Queue()

    # Add the work items
    for i in range(5):
        await queue.put(f"Task-{i}")

    # Create several workers
    workers = [
        asyncio.create_task(worker(f"Worker-{i}", queue))
        for i in range(3)
    ]

    # Wait until every item in the queue has been processed
    await queue.join()

    # Cancel the workers (a no-op for workers that have already returned)
    for w in workers:
        w.cancel()

asyncio.run(main())
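
The worker above stops as soon as it sees an empty queue, which is fine when all items are enqueued up front. A commonly used variant loops forever and relies on cancellation to stop, so it also works when items keep arriving. The sketch below is one way to write it; the done list and the run_workers name are assumptions added so the result can be inspected.

```python
import asyncio

async def worker(name, queue, done):
    while True:                      # runs until cancelled
        item = await queue.get()
        try:
            await asyncio.sleep(0.01)
            done.append((name, item))
        finally:
            queue.task_done()        # mark the item done even if processing failed

async def run_workers():
    queue = asyncio.Queue()
    done = []
    for i in range(5):
        queue.put_nowait(f"Task-{i}")

    workers = [
        asyncio.create_task(worker(f"Worker-{i}", queue, done))
        for i in range(3)
    ]

    await queue.join()               # every item has been processed
    for w in workers:
        w.cancel()                   # workers are idle in queue.get() at this point
    # Wait for the cancellations to complete without raising CancelledError here
    await asyncio.gather(*workers, return_exceptions=True)
    return done

processed = asyncio.run(run_workers())
print(len(processed))  # 5
```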

Async context managers

import asyncio

class AsyncResource:
    async def __aenter__(self):
        print("Acquiring resource")
        await asyncio.sleep(1)
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        print("Releasing resource")
        await asyncio.sleep(1)

    async def do_work(self):
        print("Doing work")

async def main():
    async with AsyncResource() as resource:
        await resource.do_work()

asyncio.run(main())
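
Just as @contextmanager offers a shorter way to write a synchronous context manager, contextlib.asynccontextmanager (Python 3.7+) does the same for the async case. A minimal sketch, with an events list added purely to record the order of operations:

```python
import asyncio
from contextlib import asynccontextmanager

events = []

@asynccontextmanager
async def async_resource():
    events.append("acquire")
    await asyncio.sleep(0.01)        # simulated async setup
    try:
        yield "resource"
    finally:
        await asyncio.sleep(0.01)    # simulated async teardown
        events.append("release")

async def use_resource():
    async with async_resource() as res:
        events.append(f"work with {res}")

asyncio.run(use_resource())
print(events)  # ['acquire', 'work with resource', 'release']
```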

Async generators

import asyncio

async def async_range(count):
    """Async generator."""
    for i in range(count):
        await asyncio.sleep(0.5)
        yield i

async def main():
    async for num in async_range(5):
        print(f"Received: {num}")

asyncio.run(main())
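
Async generators can also be consumed with an async comprehension, which is handy when the values should end up in a list. A small sketch reusing the same async_range idea with a shorter delay:

```python
import asyncio

async def async_range(count):
    for i in range(count):
        await asyncio.sleep(0.01)
        yield i

async def collect_squares():
    # "async for" inside a comprehension; only allowed within an async function
    return [n * n async for n in async_range(5) if n % 2 == 0]

squares = asyncio.run(collect_squares())
print(squares)  # [0, 4, 16]
```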

Async versus multithreading

import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

# Async approach
async def async_io_task(n):
    await asyncio.sleep(1)
    return n * 2

async def run_async():
    tasks = [async_io_task(i) for i in range(10)]
    return await asyncio.gather(*tasks)

# Multithreaded approach
def thread_io_task(n):
    time.sleep(1)
    return n * 2

def run_threaded():
    with ThreadPoolExecutor(max_workers=10) as executor:
        return list(executor.map(thread_io_task, range(10)))

# Compare
print("Async approach:")
start = time.time()
asyncio.run(run_async())
print(f"Elapsed: {time.time() - start:.2f}s")

print("\nMultithreaded approach:")
start = time.time()
run_threaded()
print(f"Elapsed: {time.time() - start:.2f}s")

Performance comparison:

Async: suited to large numbers of I/O operations; runs in a single thread, uses little memory, and has low context-switching overhead
Multithreading: suited to I/O-bound tasks, but creating and switching threads has a cost
Multiprocessing: suited to CPU-bound tasks, since separate processes are not limited by the GIL
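
The two models can also be combined: when async code has to call a blocking function, loop.run_in_executor hands it to a thread pool so the event loop stays responsive. The following is a sketch under the assumption that blocking_io stands in for some blocking library call.

```python
import asyncio
import time

def blocking_io(n):
    """Stand-in for a blocking call, such as a synchronous HTTP request."""
    time.sleep(0.1)
    return n * 2

async def run_blocking_concurrently():
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor
    futures = [loop.run_in_executor(None, blocking_io, i) for i in range(4)]
    return await asyncio.gather(*futures)

doubled = asyncio.run(run_blocking_concurrently())
print(doubled)  # [0, 2, 4, 6]
```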

Web Scraping in Practice: Comparing HTTP Libraries

Several HTTP libraries are available for writing scrapers in Python. This section compares the characteristics and typical use cases of the most common ones.

requests - the most popular synchronous HTTP library

requests is the most widely used HTTP library in Python, with a simple and approachable API.

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Basic GET request
response = requests.get('https://api.github.com/users/github')
data = response.json()
print(f"Status: {response.status_code}")
print(f"User: {data['name']}")

# POST request
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post('https://httpbin.org/post', json=payload)

# Persistent session (handles cookies automatically)
session = requests.Session()
session.headers.update({'User-Agent': 'Mozilla/5.0'})

# Timeout and retries (mounted on the same session so the headers above are kept)
retry = Retry(total=3, backoff_factor=0.3)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)

response = session.get('https://example.com', timeout=5)

Pros:

Simple, intuitive API with a gentle learning curve
Thorough documentation and strong community support
Handles cookies and sessions automatically
Built-in SSL certificate verification

Cons:

Synchronous requests only
Performs poorly under a large number of concurrent requests

urllib3 - a low-level HTTP library

urllib3 is the library that requests is built on, and it offers lower-level control.

import urllib3

# Create a connection pool
http = urllib3.PoolManager()

# Send a request
response = http.request('GET', 'https://httpbin.org/get')
print(response.status)
print(response.data.decode('utf-8'))

# Custom request headers
headers = {'User-Agent': 'Custom-Agent'}
response = http.request('GET', 'https://httpbin.org/headers', headers=headers)

# Connection pool reuse
response1 = http.request('GET', 'https://example.com')
response2 = http.request('GET', 'https://example.com')  # Reuses the connection

Pros:

Connection pooling, good performance
Thread-safe
Finer-grained control

Cons:

The API is lower-level and less concise than requests
Some higher-level features have to be handled manually

httpx - a modern HTTP library (recommended)

httpx is a newer HTTP library that supports both synchronous and asynchronous use, with an API modeled on requests.

import httpx
import asyncio

# Synchronous usage (the API closely mirrors requests)
response = httpx.get('https://api.github.com/users/github')
print(response.json())

# Asynchronous usage
async def fetch_async():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://api.github.com/users/github')
        return response.json()

data = asyncio.run(fetch_async())

# Concurrent requests
async def fetch_multiple():
    urls = [
        'https://httpbin.org/delay/1',
        'https://httpbin.org/delay/2',
        'https://httpbin.org/delay/3'
    ]

    async with httpx.AsyncClient() as client:
        tasks = [client.get(url) for url in urls]
        responses = await asyncio.gather(*tasks)
        return [r.status_code for r in responses]

results = asyncio.run(fetch_multiple())
print(results)

# HTTP/2 support (requires the optional extra: pip install httpx[http2])
async def http2_request():
    async with httpx.AsyncClient(http2=True) as client:
        response = await client.get('https://www.google.com')
        print(f"HTTP Version: {response.http_version}")

asyncio.run(http2_request())

Pros:

Supports both synchronous and asynchronous use
Supports HTTP/2
API broadly compatible with requests, though not a drop-in replacement (for example, redirects are not followed by default)
Good performance, well suited to high-concurrency scenarios
Good type hint support

Cons:

Relatively new, with a less mature ecosystem than requests
Some third-party libraries may not be compatible with it

BeautifulSoup4 - an HTML parsing library

BeautifulSoup parses HTML and XML, and is usually used together with an HTTP library.

from bs4 import BeautifulSoup
import httpx

# Fetch the page
response = httpx.get('https://news.ycombinator.com')
soup = BeautifulSoup(response.text, 'html.parser')

# Find elements
titles = soup.find_all('span', class_='titleline')
for title in titles[:5]:
    link = title.find('a')
    if link:
        print(f"Title: {link.text}")
        print(f"Link: {link.get('href')}\n")

# CSS selectors
items = soup.select('.athing')
for item in items[:3]:
    title = item.select_one('.titleline a')
    if title:
        print(title.text)

# Walk the DOM tree
for tag in soup.find_all('a', limit=5):
    print(f"Link text: {tag.text}, href: {tag.get('href', 'N/A')}")

Practical example: an async scraper

import httpx
import asyncio
from bs4 import BeautifulSoup

async def scrape_page(client, url):
    """Scrape a single page asynchronously."""
    try:
        response = await client.get(url, timeout=10)
        response.raise_for_status()

        soup = BeautifulSoup(response.text, 'html.parser')
        title = soup.find('title')

        return {
            'url': url,
            'title': title.text if title else 'No title',
            'status': response.status_code
        }
    except Exception as e:
        return {'url': url, 'error': str(e)}

async def main():
    urls = [
        'https://www.python.org',
        'https://www.github.com',
        'https://stackoverflow.com',
        'https://news.ycombinator.com',
    ]

    # Use the httpx async client
    async with httpx.AsyncClient(
        headers={'User-Agent': 'Mozilla/5.0'},
        follow_redirects=True
    ) as client:
        # Scrape concurrently
        tasks = [scrape_page(client, url) for url in urls]
        results = await asyncio.gather(*tasks)

        for result in results:
            if 'error' in result:
                print(f"❌ {result['url']}: {result['error']}")
            else:
                print(f"✓ {result['title']}")

# Run
asyncio.run(main())
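
The scraper above launches every request at once, which is fine for four URLs but can overwhelm a site (or get the client blocked) with hundreds. One common way to cap concurrency is asyncio.Semaphore. The sketch below replaces the real HTTP call with a hypothetical fake_fetch coroutine so it runs without network access; the in_flight and peak counters exist only to show that the limit holds.

```python
import asyncio

async def fake_fetch(url):
    """Hypothetical stand-in for client.get(url); performs no real network I/O."""
    await asyncio.sleep(0.01)
    return f"<title>{url}</title>"

async def crawl(urls, max_concurrency=2):
    semaphore = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def bounded_fetch(url):
        nonlocal in_flight, peak
        async with semaphore:            # at most max_concurrency bodies run at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await fake_fetch(url)
            finally:
                in_flight -= 1

    pages = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return pages, peak

pages, peak = asyncio.run(crawl([f"https://example.com/{i}" for i in range(6)]))
print(len(pages), peak)  # 6 2
```

In a real scraper, fake_fetch would be swapped for the scrape_page call, with the shared client passed in.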

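Choosing a library

Use case	Recommended library
Simple synchronous requests	`requests`
High-concurrency scraping	`httpx` (async)
Low-level control, connection pooling	`urllib3`
HTML/XML parsing	`BeautifulSoup4` or `lxml`
Modern projects	`httpx` ⭐ recommended

Current trends:

httpx is gaining ground, especially where high performance and async support are needed
requests is still the most popular library and a good fit for simple scenarios
For large scraping projects, the httpx + asyncio combination is recommended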

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.