python

Day5 leetcode Next Greater Element

李魔佛 posted an article • 0 comments • 4577 views • 2017-03-03 09:30 • from related topics

You are given two arrays (without duplicates) nums1 and nums2 where nums1’s elements are subset of nums2. Find all the next greater numbers for nums1's elements in the corresponding places of nums2.

The Next Greater Number of a number x in nums1 is the first greater number to its right in nums2. If it does not exist, output -1 for this number.

Example 1:

Input: nums1 = [4,1,2], nums2 = [1,3,4,2].

Output: [-1,3,-1]

Explanation:

    For number 4 in the first array, you cannot find the next greater number for it in the second array, so output -1.

    For number 1 in the first array, the next greater number for it in the second array is 3.

    For number 2 in the first array, there is no next greater number for it in the second array, so output -1.

Example 2:

Input: nums1 = [2,4], nums2 = [1,2,3,4].

Output: [3,-1]

Explanation:

    For number 2 in the first array, the next greater number for it in the second array is 3.

    For number 4 in the first array, there is no next greater number for it in the second array, so output -1.

Note:

All elements in nums1 and nums2 are unique.
The length of both nums1 and nums2 would not exceed 1000.

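In other words:
There are two arrays (lists), nums1 and nums2:

nums1 = [4,1,2], nums2 = [1,3,4,2].

nums1 is a subset of nums2.

For each element of nums1, find the first larger element to its right in nums2. The first element of nums1 is 4; nothing to the right of 4 in nums2 is larger, so the answer is -1. The second is 1; the first larger element to the right of 1 in nums2 is 3, so the answer is 3. The third is 2; 2 is the last element of nums2, so nothing to its right is larger and the answer is -1.
So the result for the example above is:
[-1,3,-1]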
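
The post gives no solution for this problem, so here is one possible sketch, written for Python 3. It uses a monotonic stack, which is a common approach but not necessarily what the author had in mind; the function name next_greater_element is made up for illustration.

```python
def next_greater_element(nums1, nums2):
    """For each value in nums1, return the first larger value to its right in nums2."""
    next_greater = {}  # value -> first larger value to its right in nums2
    stack = []         # values still waiting for a larger value (kept in decreasing order)
    for value in nums2:
        # Every smaller value on the stack has just found its next greater element.
        while stack and stack[-1] < value:
            next_greater[stack.pop()] = value
        stack.append(value)
    # Values never popped have no greater element to their right.
    return [next_greater.get(x, -1) for x in nums1]


print(next_greater_element([4, 1, 2], [1, 3, 4, 2]))  # [-1, 3, -1]
print(next_greater_element([2, 4], [1, 2, 3, 4]))     # [3, -1]
```

Each value of nums2 is pushed and popped at most once, so the whole pass is linear in the length of nums2.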

leetcode Day4 Keyboard Row

李魔佛 posted an article • 0 comments • 5350 views • 2017-02-28 22:21 • from related topics

Given a List of words, return the words that can be typed using letters of alphabet on only one row's of American keyboard like the image below.

Example 1:

Input: ["Hello", "Alaska", "Dad", "Peace"]

Output: ["Alaska", "Dad"]

Note:

You may use one character in the keyboard more than once.
You may assume the input string will only contain letters of alphabet.

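In other words:
Given a list of strings, output each string whose letters all sit on the same row of the keyboard; skip the others.

    def findWords(words):
        """
        :type words: List[str]
        :rtype: List[str]
        """
        # The three letter rows of a US keyboard, keyed by their first letter.
        kb={'q':['q','w','e','r','t','y','u','i','o','p'],
            'a':['a','s','d','f','g','h','j','k','l'],
            'z':['z','x','c','v','b','n','m']}

        rList=[]
        qRow= kb['q']
        aRow= kb['a']
        zRow= kb['z']
        for wi in words:
            w=wi.lower()
            if not w:
                continue  # an empty string has no first letter to look up

            # Pick the row that contains the first letter.
            if w[0] in qRow:
                row=qRow
            elif w[0] in aRow:
                row=aRow
            else:
                row=zRow

            # for/else: the else branch runs only when the loop never hit break.
            for ch in w:
                if ch not in row:
                    break
            else:
                rList.append(wi)
        return rList

Explanation:
First convert the word to lowercase, then use its first letter to decide which keyboard row the word would have to belong to; there are only three letter rows, the q, a and z rows. If the word starts on the q row, check the remaining characters against the q row. As soon as one falls outside that row, stop checking this word and move on to the next string.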
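
The same row check can also be written with sets. This is only a sketch for Python 3; the names ROWS and find_words are made up here and are not part of the original solution.

```python
# One set of letters per keyboard row.
ROWS = [set("qwertyuiop"), set("asdfghjkl"), set("zxcvbnm")]


def find_words(words):
    """Keep the words whose letters all belong to a single keyboard row."""
    result = []
    for word in words:
        letters = set(word.lower())
        # "letters <= row" is the subset test: every letter is in that row.
        if letters and any(letters <= row for row in ROWS):
            result.append(word)
    return result


print(find_words(["Hello", "Alaska", "Dad", "Peace"]))  # ['Alaska', 'Dad']
```

Using a subset test removes the need to look at the first letter separately, and empty strings are skipped instead of raising an IndexError.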

leetcode Day3 complement number

李魔佛 posted an article • 0 comments • 3916 views • 2017-02-27 21:47 • from related topics

Given a positive integer, output its complement number. The complement strategy is to flip the bits of its binary representation.

Note:

The given integer is guaranteed to fit within the range of a 32-bit signed integer.
You could assume no leading zero bit in the integer’s binary representation.

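Example 1:

Input: 5
Output: 2
Explanation: The binary representation of 5 is 101 (no leading zero bits), and its complement is 010. So you need to output 2.

Example 2:

Input: 1
Output: 0
Explanation: The binary representation of 1 is 1 (no leading zero bits), and its complement is 0. So you need to output 0.

Solution:

class Solution(object):
    def findComplement(self, num):
        """
        :type num: int
        :rtype: int
        """
        # Collect the binary digits, least significant bit first.
        result=[]
        while num!=0:
            remainder=num%2
            num=num//2
            result.append(remainder)
        l=len(result)

        # Reverse in place so the most significant bit comes first.
        for i in range(l//2):
            temp=result[i]
            result[i]=result[l-i-1]
            result[l-i-1]=temp

        # Flip every bit: 0 -> 1, 1 -> 0.
        lam=lambda x:abs(x-1)
        for i in range(l):
            result[i]=lam(result[i])

        # Convert the flipped bits back to an integer.
        total=0
        for i in range(l):
            total=total+result[l-i-1]*pow(2,i)
        return total

PS: the method above is the crude first attempt I wrote, and it is hopelessly clumsy. It reads like code from someone who never took a computer organization or microcomputer course. I had forgotten even something as basic as the bitwise complement, which is first-lecture material.

Here is a much simpler solution:

class Solution(object):
    def findComplement(self, num):
        # Find the smallest power of two greater than num,
        # so that i - 1 is an all-ones mask as wide as num.
        i = 1
        while i <= num:
            i = i << 1
        return (i - 1) ^ num

It works because the complement is simply the number XORed with an all-ones number of the same bit width.
Shame on me!!!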
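
As a further variation on the XOR idea, the mask can be built without a loop. This is a sketch for Python 3 that relies on the built-in int.bit_length(); the function name find_complement is made up for illustration.

```python
def find_complement(num):
    """Flip every bit of a positive integer, ignoring leading zeros."""
    # bit_length() is the number of bits needed for num, e.g. 3 for 5 (0b101),
    # so the mask is 0b111 in that case.
    mask = (1 << num.bit_length()) - 1
    return num ^ mask


print(find_complement(5))  # 2
print(find_complement(1))  # 0
```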

leetcode Day2 Hamming Distance

李魔佛 posted an article • 0 comments • 4379 views • 2017-02-26 21:57 • from related topics

The Hamming distance between two integers is the number of positions at which the corresponding bits are different.

Given two integers x and y, calculate the Hamming distance.

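Note:
0 ≤ x, y < 2^31.

Example:

Input: x = 1, y = 4

Output: 2

Explanation:
1   (0 0 0 1)
4   (0 1 0 0)
       ↑   ↑

The above arrows point to positions where the corresponding bits are different.

In other words: XOR the two numbers and count the 1 bits in the result, which is the number of bit positions where the two binary numbers differ. Hamming distance is commonly used in channel coding.

In Python the XOR operator is ^, and the built-in function bin() converts a number to its binary string.
After converting to binary, a loop counts how many times 1 appears.

Here is the code:

x=10
y=20
z=x^y
#solution 1
s=bin(z)          # a string such as '0b11110'
print(type(s))
distance=0
for i in s[2:]:   # skip the '0b' prefix
    if i=='1':
        distance+=1
print("distance is %d" % distance)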
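
The counting loop can be folded into a single expression with str.count. A minimal Python 3 sketch; the function name hamming_distance is made up here.

```python
def hamming_distance(x, y):
    """Number of bit positions in which x and y differ."""
    # x ^ y has a 1 exactly where the bits differ; bin() turns it into a string
    # such as '0b101', and the '0b' prefix contains no '1' to miscount.
    return bin(x ^ y).count("1")


print(hamming_distance(1, 4))    # 2
print(hamming_distance(10, 20))  # 4
```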

leetcode Day1 Two Sum

李魔佛 posted an article • 0 comments • 4324 views • 2017-02-25 18:39 • from related topics

Problem:
Given an array of integers, return indices of the two numbers such that they add up to a specific target.

You may assume that each input would have exactly one solution, and you may not use the same element twice.

Example:
Given
nums = [2, 7, 11, 15], target = 9,
Because nums[0] + nums[1] = 2 + 7 = 9,
return [0, 1].

In other words: given a list and a target number, output the indices of the two numbers whose values add up to the target.

Here is the code:

    def twoSum(nums, target):
        """
        :type nums: List[int]
        :type target: int
        :rtype: List[int]
        """
        indics=[]
        # Two nested loops, much like bubble sort: try every pair of numbers and check its sum (brute force).
        for i in range(len(nums)-1):
            for j in range(i+1,len(nums)):
                if target== nums[i]+nums[j]:
                    indics.append(i)
                    indics.append(j)
                    return indics
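
The nested loops above check every pair. A common alternative, sketched here for Python 3, remembers the values already seen in a dict so a single pass is enough. The name two_sum and the choice to return an empty list when no pair exists are assumptions for illustration.

```python
def two_sum(nums, target):
    """Return the indices of the two numbers that add up to target."""
    seen = {}  # value -> index where it was seen
    for i, value in enumerate(nums):
        j = seen.get(target - value)
        if j is not None:
            return [j, i]
        seen[value] = i
    return []  # no pair found (the problem statement promises this never happens)


print(two_sum([2, 7, 11, 15], 9))  # [0, 1]
```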

python: walk a folder and delete files with the .txt suffix (or any other files matching a pattern)

李魔佛 posted an article • 0 comments • 5062 views • 2017-02-19 21:23 • from related topics

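The ebook folders produced by calibre contain two files, one in txt format and one in azw3 format. The Kindle only needs the azw3 file; otherwise the duplicate shows up on the Kindle as two copies of the same book.

The code:

# -*-coding=utf-8-*-
__author__ = 'Rocky'
import os,re

cwd=os.getcwd()
# Match only names that end in .txt ($ anchors the pattern at the end of the name).
p=re.compile(r'\.txt$')
print(cwd)
# os.walk yields (directory path, subdirectory names, file names) for every directory under cwd.
for dirpath, dirname,filename in os.walk(cwd):
    #print dirpath,dirname,filename
    print(dirname)
    for i in filename:
        if p.search(i):
            os.remove(os.path.join(dirpath,i))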
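
Because the script deletes files, it can help to list the matches before removing anything. The following Python 3 sketch wraps the same os.walk loop in a function with a dry-run switch; remove_by_suffix and its parameters are hypothetical names, and the demo runs on a throwaway temporary directory rather than a real library folder.

```python
import os
import tempfile


def remove_by_suffix(root, suffix=".txt", dry_run=True):
    """Return the matching paths under root; delete them unless dry_run is True."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(suffix):
                path = os.path.join(dirpath, name)
                matched.append(path)
                if not dry_run:
                    os.remove(path)
    return matched


# Try it on a temporary directory holding one .txt and one .azw3 file.
with tempfile.TemporaryDirectory() as root:
    for name in ("book.txt", "book.azw3"):
        open(os.path.join(root, name), "w").close()
    removed = remove_by_suffix(root, dry_run=False)
    remaining = sorted(os.listdir(root))

print(remaining)  # ['book.azw3']
```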

Strange output encountered while learning decorators

李魔佛 asked a question • 1 follower • 0 replies • 5459 views • 2017-02-09 18:56 • from related topics

WeChat auto-reply assistant

李魔佛 posted an article • 0 comments • 6599 views • 2017-02-04 15:30 • from related topics

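I stumbled on the itchat library. Python is amazing: if you can think of it, there is a way to do it.

Usage is simple.

#-*-coding=utf-8-*-
import itchat

# Register a handler that is called for every incoming text message.
@itchat.msg_register(itchat.content.TEXT)
def text_reply(msg):
    # Reply text: "Happy New Year! I am xxx's assistant. Your message has been received; my boss is busy and will reply later."
    reply_msg=u'新年快乐! 我是xxx的小秘书,你的消息已收到,主人正忙,稍后会回复你哦~'
    return reply_msg

itchat.auto_login(hotReload=True)
itchat.run()

Then run the Python file above and log in by scanning the QR code with your own WeChat.
Whenever someone sends you a WeChat message, they receive your auto-reply.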

Brute-forcing wifi passwords by enumeration on an Android system

李魔佛 posted an article • 0 comments • 4557 views • 2017-01-25 12:31 • from related topics

ETA 1.28
Link: https://github.com/Rockyzsu/crack_wifi_by_android

Automatically grabbing Xueqiu red packets, Python code

python crawler • 李魔佛 posted an article • 0 comments • 17689 views • 2017-01-25 12:29 • from related topics

ETA 1.30
Link https://github.com/Rockyzsu/red_bag

pyautogui on Windows raises WindowsError: [Error 5] Access is denied.

李魔佛 asked a question • 1 follower • 0 replies • 6722 views • 2017-01-16 02:03 • from related topics

Installing MySQL support for Python (MySQLdb) on ubuntu

李魔佛 posted an article • 0 comments • 3900 views • 2016-12-29 17:53 • from related topics

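First install the MySQL database:
sudo apt-get install mysql-server
and set the user password.

Then install the MySQLdb module for Python; pip is the most convenient way:

pip install MySQL-python

If you hit this error:
EnvironmentError: mysql_config not found

the mysql_config program that the build relies on was not found, and you need to install:
libmysqlclient-dev

Install command:

sudo apt-get install libmysqlclient-dev

Once that finishes, run the pip command again, then type the following at the Python prompt:

import MySQLdb

If no error appears, the installation succeeded.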

python: checking for NoneType

李魔佛 posted an article • 0 comments • 6325 views • 2016-10-22 15:26 • from related topics

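For example, in a crawler (bs is presumably a BeautifulSoup object built from content; the snippet does not show where it comes from):
content = urllib2.urlopen("http://www.qq1.com").read()
title=bs.title.string

Because the URL above is mistyped, the page has no usable title and title ends up as None, whose type is NoneType (not the same thing as null in other languages).
So the check is written differently from a null check:

if title is None:
    print("No title")
else:
    title=title.strip()

This avoids the errors that using title would otherwise raise, such as
TypeError: object of type 'NoneType' has no len()
or
AttributeError: 'NoneType' object has no attribute 'strip'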
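
The same is None pattern shows up whenever a lookup may find nothing. The sketch below is for Python 3 and swaps BeautifulSoup for the standard-library re module so that it runs without extra packages; extract_title and the two sample pages are made up for illustration.

```python
import re


def extract_title(html):
    """Return the stripped <title> text, or None when the page has no title."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    if match is None:  # re.search returns None when nothing matches
        return None
    return match.group(1).strip()


pages = ["<html><title> Hello </title></html>", "<html><body>error</body></html>"]
for page in pages:
    title = extract_title(page)
    if title is None:
        print("No title")
    else:
        print(title)
```

Testing with is None before calling any method keeps the missing-title case from turning into an exception further down.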

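The login captcha of the Shenzhen car-plate lottery system looks like a sloppy student project

李魔佛 posted an article • 0 comments • 6081 views • 2016-10-05 23:46 • from related topics

The captcha refreshes automatically right after you fill it in. Could the timing be a bit slower? As a result, every captcha I enter is rejected as wrong.

The site looks like it was built by students: full of holes, the worst of the worst.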

The if __name__ == "__main__" statement in python

李魔佛 posted an article • 0 comments • 3044 views • 2016-08-16 17:24 • from related topics

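In Python, the

if __name__ == "__main__":

block does not necessarily run first, because the statements before it run first.
For example:

print("Hello")
if __name__=="__main__":
    print("Main")

When the file is run as a script, this prints
Hello
Main

Note that __name__ must not be wrapped in quotes: "__name__" == "__main__" compares two different string literals and is always False, so Main would never be printed.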
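
To make the order of execution concrete, the sketch below (Python 3) imitates a module body with a plain function. run_module and the module name mymodule are hypothetical; the real interpreter sets __name__ itself.

```python
def run_module(module_name):
    """Mimic a module body: top-level code first, then the guarded block."""
    output = ["Hello"]             # top-level statement, always runs
    if module_name == "__main__":  # what `if __name__ == "__main__":` checks
        output.append("Main")
    return output


print(run_module("__main__"))    # ['Hello', 'Main'], the file is run as a script
print(run_module("mymodule"))    # ['Hello'], the file is imported as mymodule
print("__name__" == "__main__")  # False, these are two different string literals
```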

python: check whether a table exists in a sqlite database and create it if it does not

李魔佛 posted an article • 0 comments • 32179 views • 2016-08-11 22:26 • from related topics

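import sqlite3

db_name = 'test.db'   # assumed file name; the original snippet does not show how db_name is set

# Create the table only if it does not exist yet
def create_table():
    conn = sqlite3.connect(db_name)
    try:
        create_tb_cmd='''
        CREATE TABLE IF NOT EXISTS USER
        (NAME TEXT,
        AGE INT,
        SALARY REAL);
        '''
        # The statement above is the key part
        conn.execute(create_tb_cmd)
    except sqlite3.Error:
        print("Create table failed")
        conn.close()
        return False
    insert_dt_cmd='''
    INSERT INTO USER (NAME,AGE,SALARY) VALUES ('Jack',10,20.1);
    '''
    conn.execute(insert_dt_cmd)
    conn.commit()
    conn.close()
    return True

That is the code; the key part is
CREATE TABLE IF NOT EXISTS USER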
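
If the program needs to know whether the table was already there, rather than just creating it safely, one option is to query SQLite's sqlite_master catalog. A minimal Python 3 sketch, using an in-memory database so nothing is written to disk; the helper name table_exists is made up.

```python
import sqlite3


def table_exists(conn, table_name):
    """Return True when a table with this name is listed in sqlite_master."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")  # in-memory database
before = table_exists(conn, "USER")
conn.execute("CREATE TABLE IF NOT EXISTS USER (NAME TEXT, AGE INT, SALARY REAL)")
# Running the same statement again is harmless thanks to IF NOT EXISTS.
conn.execute("CREATE TABLE IF NOT EXISTS USER (NAME TEXT, AGE INT, SALARY REAL)")
after = table_exists(conn, "USER")
conn.close()

print(before, after)  # False True
```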

python: when to use @classmethod

李魔佛 posted an article • 0 comments • 13862 views • 2016-08-07 11:01 • from related topics

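The official description:
classmethod(function)
In plain words:
classmethod marks a method of a class as a class method; methods without it are instance methods. It is used like this:

class C:
    @classmethod
    def f(cls, arg1, arg2, ...): ...

After reading that I was completely lost. What is it actually saying???

I then looked at examples and explanations on overseas forums, and it became clear right away. Below I explain it with my own example.

Consider this date class:

class Data_test(object):
    day=0
    month=0
    year=0
    def __init__(self,year=0,month=0,day=0):
        self.day=day
        self.month=month
        self.year=year

    def out_date(self):
        print("year :")
        print(self.year)
        print("month :")
        print(self.month)
        print("day :")
        print(self.day)

t=Data_test(2016,8,1)
t.out_date()

Output:

year :
2016
month :
8
day :
1

As expected.

If the user supplies a string such as "2016-8-1", it has to be processed before calling the Data_test class:

string_date='2016-8-1'
year,month,day=map(int,string_date.split('-'))
s=Data_test(year,month,day)

First split '2016-8-1' into the three variables year, month and day, convert them to int, then call Data_test(year,month,day). Also as expected.

Can this string handling be moved into the Data_test class itself?

This is where @classmethod comes in:

class Data_test2(object):
    day=0
    month=0
    year=0
    def __init__(self,year=0,month=0,day=0):
        self.day=day
        self.month=month
        self.year=year

    @classmethod
    def get_date(cls,string_date):
        # The first parameter is cls, the class the method was called on
        year,month,day=map(int,string_date.split('-'))
        date1=cls(year,month,day)
        # Return an initialized instance of that class
        return date1

    def out_date(self):
        print("year :")
        print(self.year)
        print("month :")
        print(self.month)
        print("day :")
        print(self.day)

A method is added to the class and decorated with @classmethod. It behaves somewhat like a static method; the difference is that it receives the current class as its first argument.

How is it called?

r=Data_test2.get_date("2016-8-6")
r.out_date()

Output:

year :
2016
month :
8
day :
6

This amounts to calling get_date() to process the string first and only then using the Data_test2 constructor for initialization.

The benefit is that when you refactor the class later you do not have to change the constructor; you just add the extra method you need and decorate it with @classmethod.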
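
One more thing cls buys over a static method is that the alternative constructor keeps working for subclasses. The Python 3 sketch below uses made-up names (Date, from_string, LoggedDate) to show this; it is an illustration, not the classes from the article.

```python
class Date(object):
    def __init__(self, year=0, month=0, day=0):
        self.year, self.month, self.day = year, month, day

    @classmethod
    def from_string(cls, text):
        """Alternative constructor: build an instance from a 'YYYY-M-D' string."""
        year, month, day = map(int, text.split("-"))
        return cls(year, month, day)  # cls is whichever class the call was made on


class LoggedDate(Date):
    """Hypothetical subclass, used only to show what cls refers to."""


d = Date.from_string("2016-8-6")
e = LoggedDate.from_string("2016-8-6")
print(d.year, d.month, d.day)  # 2016 8 6
print(type(e).__name__)        # LoggedDate
```

Had from_string called Date(...) directly, the second call would have produced a plain Date instead of a LoggedDate.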

Original article.
Please credit the source when reposting: http://30daydo.com/article/89

Why are the questions on segmentfault all so beginner-level?

李魔佛 posted an article • 0 comments • 2782 views • 2016-07-28 16:37 • from related topics

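I ran into some problems and went to segmentfault to search for answers, assuming segmentfault was the Chinese counterpart of stackoverflow. I was badly disappointed.
Nearly everything there is a beginner question.

Search keyword: python

The results were all of the "how do I install python" or "python2 or python3" kind. Truly speechless.
It seems that in China fewer people volunteer to share technical knowledge than abroad, and less generously.
(Or maybe the experts are all busy with projects and have no time to help newcomers.)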

A summary of working with pandas dataframe data

李魔佛 posted an article • 0 comments • 6332 views • 2016-07-17 16:47 • from related topics

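t = df.iloc[0]
<class 'pandas.core.series.Series'>

# After iloc[0], t is a single row and no longer a DataFrame; it is a Series. So t['high'] returns a single value. t no longer carries the DataFrame's row index: calling t.index at this point returns the column names instead.

t=df[:1]
<class 'pandas.core.frame.DataFrame'>

# This returns a subset of the DataFrame that is itself a DataFrame, so you can keep using DataFrame methods on it.

Deleting a row from a dataframe

df.drop()

With a df that has a 代码 (stock code) column:

df.drop(df[df[u'代码']==300141.0].index,inplace=True)
print(df)

The printed df no longer contains the row whose 代码 is 300141.0.

Remember the inplace=True argument. The default is inplace=False, which is what you get if you leave it out.
In that case the original df is not modified; drop only returns a new, modified df, so you need a new variable to hold it:
new_df=df.drop(df[df[u'代码']==300141.0].index)

Checking whether a DataFrame is None (a fragment from inside a function; note that an empty DataFrame is not None and is detected with df.empty instead)

    if df is None:
        print("None len==0")
        return False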

python: an image downloaded by a crawler will not open?

李魔佛 posted an article • 0 comments • 7604 views • 2016-07-09 17:33 • from related topics

The code fragment:

__author__ = 'rocky'
import urllib,urllib2,StringIO,gzip
url="http://image.xitek.com/photo/2 ... ot%3B
filname=url.split("/")[-1]
req=urllib2.Request(url)
resp=urllib2.urlopen(req)
content=resp.read()
#data = StringIO.StringIO(content)
#gzipper = gzip.GzipFile(fileobj=data)
#html = gzipper.read()
f=open(filname,'w')
f.write(content)
f.close()

After running it, the generated file opens but shows no image.

After some debugging I found that to save image data the file has to be opened in binary mode, 'wb'. That is, in the code above change
f=open(filname,'w') to

f=open(filname,'wb')

and it works.
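
The effect of binary mode can be checked without downloading anything. The Python 3 sketch below writes a made-up byte payload (not a real image) with 'wb' and reads it back with 'rb'; the file name sample.bin is arbitrary.

```python
import os
import tempfile

# Starts like a PNG header and then covers every byte value, including
# 0x0A and 0x0D, the line-ending bytes that text mode may rewrite.
payload = b"\x89PNG\r\n\x1a\n" + bytes(range(256))

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample.bin")
    with open(path, "wb") as f:  # "wb": bytes are written unchanged
        f.write(payload)
    with open(path, "rb") as f:  # read back in binary mode as well
        restored = f.read()

print(restored == payload)  # True
```

In Python 3, passing bytes to a file opened with 'w' raises a TypeError, so the mistake surfaces immediately instead of producing a corrupted file.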

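Checking whether page content is gzip-compressed, Python code

李魔佛 posted an article • 0 comments • 4402 views • 2016-07-09 15:10 • from related topics

On the same site, some pages are served gzip-compressed, which trips up an otherwise working crawler.

So add a check in the code for whether the page content is gzip-compressed, and if it is, add one extra decompression step.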
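
The post does not include the code for this check, so here is one possible sketch for Python 3. It looks at the first two bytes of the body, which are 0x1f 0x8b for any gzip stream; checking the response's Content-Encoding header is another common way. The name maybe_gunzip is made up for illustration.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def maybe_gunzip(body):
    """Return body decompressed if it looks like gzip data, otherwise unchanged."""
    if body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    return body


plain = b"<html>hello</html>"
print(maybe_gunzip(gzip.compress(plain)) == plain)  # True
print(maybe_gunzip(plain) == plain)                 # True
```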

python: beautifulsoup or lxml?

李魔佛 posted an article • 0 comments • 8049 views • 2016-06-29 18:29 • from related topics

I first got into crawlers through beautifulsoup and found it very pleasant to use. Later, because of scrapy, I came across lxml. Which one is actually better?

I then looked at the beautifulsoup source: its implementation relies on regular expressions, whereas lxml recurses over the node tree.

Don't use BeautifulSoup, use lxml.soupparser then you're sitting on top of the power of lxml and can use the good bits of BeautifulSoup which is to deal with really broken and crappy HTML.

In summary, lxml is positioned as a lightning-fast production-quality html and xml parser that, by the way, also includes a soupparser module to fall back on BeautifulSoup's functionality. BeautifulSoup is a one-person project, designed to save you time to quickly extract data out of poorly-formed html or xml.
lxml documentation says that both parsers have advantages and disadvantages. For this reason, lxml provides a soupparser so you can switch back and forth. Quoting,
[quote]
BeautifulSoup uses a different parsing approach. It is not a real HTML parser but uses regular expressions to dive through tag soup. It is therefore more forgiving in some cases and less good in others. It is not uncommon that lxml/libxml2 parses and fixes broken HTML better, but BeautifulSoup has superiour support for encoding detection. It very much depends on the input which parser works better.

In the end they are saying,

The downside of using this parser is that it is much slower than the HTML parser of lxml. So if performance matters, you might want to consider using soupparser only as a fallback for certain cases.

If I understand them correctly, it means that the soup parser is more robust --- it can deal with a "soup" of malformed tags by using regular expressions --- whereas lxml is more straightforward and just parses things and builds a tree as you would expect. I assume it also applies to BeautifulSoup itself, not just to the soupparser for lxml.
They also show how to benefit from BeautifulSoup's encoding detection, while still parsing quickly with lxml:

[code]>>> from BeautifulSoup import UnicodeDammit

>>> def decode_html(html_string):
...     converted = UnicodeDammit(html_string, isHTML=True)
...     if not converted.unicode:
...         raise UnicodeDecodeError(
...             "Failed to detect encoding, tried [%s]",
...             ', '.join(converted.triedEncodings))
...     # print converted.originalEncoding
...     return converted.unicode

>>> root = lxml.html.fromstring(decode_html(tag_soup))
[/code]
(Same source: http://lxml.de/elementsoup.html).
In words of BeautifulSoup's creator,

That's it! Have fun! I wrote Beautiful Soup to save everybody time. Once you get used to it, you should be able to wrangle data out of poorly-designed websites in just a few minutes. Send me email if you have any comments, run into problems, or want me to know about your project that uses Beautiful Soup.
[code] --Leonard
[/code]

Quoted from the Beautiful Soup documentation.
I hope this is now clear. The soup is a brilliant one-person project designed to save you time to extract data out of poorly-designed websites. The goal is to save you time right now, to get the job done, not necessarily to save you time in the long term, and definitely not to optimize the performance of your software.
Also, from the lxml website,

lxml has been downloaded from the Python Package Index more than two million times and is also available directly in many package distributions, e.g. for Linux or MacOS-X.

And, from Why lxml?,

The C libraries libxml2 and libxslt have huge benefits:... Standards-compliant... Full-featured... fast. fast! FAST! ... lxml is a new Python binding for libxml2 and libxslt...

[/quote]
The gist: don't use Beautifulsoup directly, use lxml; only lxml lets you experience how fast HTML node parsing can be.

python: getting the largest value in a list

李魔佛 posted an article • 0 comments • 5466 views • 2016-06-29 16:35 • from related topics

Python actually provides a built-in max function; just call it.

    nums=[1,2,3,5,4,6,434,2323,333,99999]
    print("max of list is %d" % max(nums))

Output: max of list is 99999
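
A few related uses of max, sketched for Python 3 (the default= argument needs Python 3.4 or later); the variable names are made up for illustration.

```python
values = [1, 2, 3, 5, 4, 6, 434, 2323, 333, 99999]

largest = max(values)
position = values.index(largest)            # index of the first occurrence
longest = max(["a", "abc", "ab"], key=len)  # key= decides what "largest" means
fallback = max([], default=None)            # avoids ValueError on an empty list

print(largest, position, longest, fallback)  # 99999 9 abc None
```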

python: loading html with lxml and querying it with xpath

李魔佛 posted an article • 0 comments • 3173 views • 2016-06-23 22:09 • from related topics

First make sure lxml is installed, then use it as in the following code.

    # -*- coding: utf-8 -*-
    __author__ = 'rocchen'
    from lxml import etree
    import urllib2

    def lxml_test():
        url = "http://www.caixunzz.com"
        req = urllib2.Request(url=url)
        resp = urllib2.urlopen(req)

        # Parse the downloaded page into an element tree.
        tree = etree.HTML(resp.read())
        # Collect the href attribute of every <a class="label"> element.
        hrefs = tree.xpath('//a[@class="label"]/@href')
        for href in hrefs:
            print href

        # xpath() returns a list here.
        print type(hrefs)

    lxml_test()

The page is fetched with urllib2 and then handed to lxml in order to use its convenient xpath method. In my personal opinion, this is far more convenient than using BeautifulSoup alone.
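To try the attribute predicate without a network connection or an lxml install, the same selection can be sketched with the standard library's xml.etree.ElementTree. This is a stand-in, not lxml: ElementTree supports only a subset of XPath and needs well-formed markup. The snippet below is hypothetical sample data, and the code is written for Python 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet standing in for a downloaded page.
SNIPPET = """
<html><body>
  <a class="label" href="/tag/python">python</a>
  <a class="other" href="/about">about</a>
  <a class="label" href="/tag/lxml">lxml</a>
</body></html>
""".strip()

root = ET.fromstring(SNIPPET)
# ElementTree understands the [@attr='value'] predicate but not a trailing
# /@href step, so the attribute is read with .get() instead.
hrefs = [a.get("href") for a in root.findall(".//a[@class='label']")]
print(hrefs)
```

With lxml itself, the single expression '//a[@class="label"]/@href' from the post does both steps at once.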

A common mistake with dictionary assignment in Python

李魔佛 published an article • 0 comments • 4118 views • 2016-06-19 11:39 • from related topics

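While learning about dictionaries as a Python beginner, I ran into a question. See the two examples below.
Example 1:

    >>> x = { }
    >>> y = x
    >>> x = { 'a' : 'b' }
    >>> y
    {}

Example 2:

    >>> x = { }
    >>> y = x
    >>> x['a'] = 'b'
    >>> y
    {'a': 'b'}

Question: why, in example 1, does y stay unchanged (still an empty dictionary) after x is assigned, while in example 2 adding an item to x changes y as well?

Answer:

After y = x, x and y are references to the same object.
In example 1, x = { 'a' : 'b' } makes x refer to a new dictionary object, while y still refers to the original empty one, which is why y does not change.
In example 2, the assignment modifies the one dictionary that both x and y refer to, so the change is visible through y too.

You can check with id(x) and id(y): if id() returns the same value for both, they refer to the same object.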
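The difference between rebinding and mutation can be checked directly. The following is a small Python 3 sketch of both cases; the variable z and the copy step at the end are additions for illustration and were not part of the original question.

```python
# Case 1: rebinding. x is pointed at a brand-new dict; y keeps the old one.
x = {}
y = x
x = {'a': 'b'}
case1_same = x is y          # False: two different objects

# Case 2: mutation. Both names still refer to the one dict being changed.
x = {}
y = x
x['a'] = 'b'
case2_same = x is y          # True
case2_ids_equal = id(x) == id(y)

# To get an independent dictionary, make a shallow copy explicitly.
z = dict(x)                  # x.copy() does the same
x['c'] = 'd'

print(case1_same, case2_same, case2_ids_equal)
print(y)  # sees the new key, because y is the same object as x
print(z)  # unaffected by later changes to x
```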

Problems and fixes when installing the scrapy crawler module on Ubuntu 12.04

李魔佛 asked a question • 1 follower • 0 replies • 6058 views • 2016-06-16 16:18 • from related topics

subprocess Popen with PIPE blocks the process so the program cannot continue

李魔佛 published an article • 0 comments • 9756 views • 2016-06-12 18:31 • from related topics

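subprocess is used to create a child process from within Python, for example to call a shell script.

Example:

    import subprocess

    # cmd is the shell command to run
    p = subprocess.Popen(cmd, stdout = subprocess.PIPE, stdin = subprocess.PIPE, shell = True)
    p.wait()
    # hangs here
    print "finished"

The official Python documentation explains this: http://docs.python.org/2/library/subprocess.html

The cause is that the child writes more to stdout than the OS pipe buffer can hold. Nothing is reading the pipe, so the child blocks on its write while the parent blocks in wait(), and neither can proceed.

The fix is to use the communicate() method.

    p = subprocess.Popen(cmd, stdout = subprocess.PIPE, stdin = subprocess.PIPE, shell = True)
    # communicate() reads all output and waits for the child to exit
    stdout, stderr = p.communicate()
    print "Finish"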
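The behaviour can be reproduced with a child that prints far more than a pipe buffer usually holds. This is a minimal Python 3 sketch; the child command is a hypothetical stand-in for cmd and simply writes about 1 MB to stdout.

```python
import subprocess
import sys

# Hypothetical child command: writes 1,000,000 bytes to stdout, which is
# well beyond a typical OS pipe buffer.
cmd = [sys.executable, "-c", "import sys; sys.stdout.write('x' * 1000000)"]

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
# communicate() drains both pipes while waiting, so the child never blocks
# on a full buffer. Calling p.wait() here instead risks the hang described above.
out, err = p.communicate()

print(len(out), p.returncode)
```

If the output is not needed at all, redirecting it to subprocess.DEVNULL instead of PIPE avoids the problem as well.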

Scraping the 大误 series from Zhihu Daily, building an e-book, and pushing it to Kindle

Python crawler • 李魔佛 published an article • 0 comments • 10064 views • 2016-06-12 08:52 • from related topics

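I stumbled on an article from Zhihu Daily's 大误 series and could not stop reading. 大误 posts are fictional stories, and the talented people on Zhihu write fiction better than many online authors! I could not put them down. But, as always, a phone screen is too small, reading for hours on end is tiring, and you need to be online every time.

So I wrote the Python script below to solve this once and for all. It fetches every 大误 article from the beginning up to now (the code starts from January 1 of the current year) and pushes them to your own Kindle account.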

    # -*- coding: utf-8 -*-
    __author__ = 'rocky @ www.30daydo.com'
    import urllib2, re, os, codecs, sys, datetime
    from bs4 import BeautifulSoup
    # example https://zhhrb.sinaapp.com/index.php?date=20160610
    from mail_template import MailAtt
    reload(sys)
    sys.setdefaultencoding('utf-8')

    def save2file(filename, content):
        # Append content to <filename>.txt, creating the file if needed.
        filename = filename + ".txt"
        f = codecs.open(filename, 'a', encoding='utf-8')
        f.write(content)
        f.close()

    def getPost(date_time, filter_p):
        # Fetch the index page for one day and look for a title matching filter_p.
        url = 'https://zhhrb.sinaapp.com/index.php?date=' + date_time
        user_agent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"
        header = {"User-Agent": user_agent}
        req = urllib2.Request(url, headers=header)
        resp = urllib2.urlopen(req)
        content = resp.read()
        p = re.compile('<h2 class="question-title">(.*)</h2></br></a>')
        result = re.findall(p, content)
        # Index of the first matching title, or -1 when nothing matches.
        # (Counting from -1, as before, skipped a match in the first position.)
        row = -1
        for index, title_text in enumerate(result):
            if re.findall(filter_p, title_text):
                row = index
                break
        if row == -1:
            return 0
        link_p = re.compile('<a href="(.*)" target="_blank" rel="nofollow">')
        link_result = re.findall(link_p, content)[row]
        print link_result
        result_req = urllib2.Request(link_result, headers=header)
        result_resp = urllib2.urlopen(result_req)

        bs = BeautifulSoup(result_resp, "html.parser")
        title = bs.title.string.strip()
        # Replace characters that are not allowed in file names.
        filename = re.sub(r'[\\/:*?"<>|]', '-', title)
        print filename
        print date_time
        save2file(filename, title)
        save2file(filename, "\n\n\n\n--------------------%s Detail----------------------\n\n" % date_time)

        detail_content = bs.find_all('div', class_='content')
        for i in detail_content:
            save2file(filename, "\n\n-------------------------answer  -------------------------\n\n")
            for j in i.strings:
                save2file(filename, j)

        # Mail the text file to the Kindle address.
        # The sender address and password come from the command line.
        smtp_server = 'smtp.126.com'
        from_mail = sys.argv[1]
        password = sys.argv[2]
        to_mail = 'xxxxx@kindle.cn'
        send_kindle = MailAtt(smtp_server, from_mail, password, to_mail)
        send_kindle.send_txt(filename)

    def main():
        sub_folder = os.path.join(os.getcwd(), "content")
        if not os.path.exists(sub_folder):
            os.mkdir(sub_folder)
        os.chdir(sub_folder)

        filter_p = re.compile('大误.*')
        # Walk every day from January 1 of the current year up to yesterday.
        today = datetime.date.today()
        ori_day = datetime.date(today.year, 1, 1)
        delta = (today - ori_day).days
        print delta
        for i in range(delta):
            day = ori_day + datetime.timedelta(i)
            getPost(day.strftime("%Y%m%d"), filter_p)

    if __name__ == "__main__":
        main()

github： https://github.com/Rockyzsu/zhihu_daily__kindle

With small changes, the code above can also scrape the 瞎扯 or 深夜食堂 series.
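The two pieces that change when targeting another series are the title filter and, optionally, the date range. The following is a small Python 3 sketch of both in isolation; the helper names iter_day_strings and make_filter are hypothetical and do not appear in the original script, and the sample titles are made up.

```python
import datetime
import re

def iter_day_strings(start, end):
    """Yield YYYYMMDD strings for every day from start up to, but not including, end."""
    for offset in range((end - start).days):
        yield (start + datetime.timedelta(days=offset)).strftime("%Y%m%d")

def make_filter(series_name):
    """Compile a pattern matching titles that contain series_name."""
    return re.compile(re.escape(series_name) + ".*")

# Fixed dates are used here so the result is reproducible.
days = list(iter_day_strings(datetime.date(2016, 6, 9), datetime.date(2016, 6, 12)))
xiache_filter = make_filter("瞎扯")

print(days)
print(bool(xiache_filter.search("瞎扯 · sample title")))
```

Each string in days is in the form the index page expects in its date parameter, and the compiled pattern plays the role of filter_p.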

Bonus:
http://pan.baidu.com/s/1kVewz59
All 大误 articles from Zhihu Daily (as of 2016/6/12).

Brute-forcing the password of a zip archive with Python

李魔佛 published an article • 0 comments • 9794 views • 2016-06-09 21:43 • from related topics

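I don't trust Baidu Netdisk. A while ago Baidu was purging files that infringe copyright, or replacing files it considered too explicit with a 4-second educational video. Why not tell users in advance? If it deletes users' data without permission, who will dare to upload anything freely in the future?

Complaints aside, now that Kingsoft Kuaipan and Sina Vdisk have both shut down, Baidu Netdisk is the only option with reasonable speed. So I compress my files with a password before uploading them.

But sometimes, after downloading a file, I have forgotten the password, which is a real pain, and I have to try candidate passwords one by one. That is why I wrote this small script. It is simple, nothing technically fancy.

The code is posted as an image, so you can type it in yourself. Copy-pasting code (ctrl+v) really does breed laziness.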

github: https://github.com/Rockyzsu/zip_crash
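Since the code image is not reproduced here, the following is only a rough Python 3 sketch of the idea of trying candidate passwords one by one. It is not the author's script. The function name find_zip_password and the candidate list are assumptions, and it relies on the standard zipfile module, which decrypts only the legacy ZIP encryption; archives using other schemes are not handled.

```python
import zipfile
import zlib

def find_zip_password(zip_path, candidates):
    """Return the first candidate that opens the archive, or None if none does."""
    with zipfile.ZipFile(zip_path) as zf:
        # Directory entries carry no data, so test against a real file member.
        members = [n for n in zf.namelist() if not n.endswith("/")]
        if not members:
            return None
        for candidate in candidates:
            try:
                # Reading the member fully also verifies its CRC, which
                # filters out passwords that only pass the quick header check.
                zf.read(members[0], pwd=candidate.encode("utf-8"))
                return candidate
            except (RuntimeError, zipfile.BadZipFile, zlib.error):
                # Wrong password: try the next candidate.
                continue
    return None
```

A call might look like find_zip_password("backup.zip", ["123456", "my-old-password"]), where both the file name and the candidates are placeholders. For an archive that is not encrypted, any candidate succeeds, so the first one is returned.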

Batch-deleting files with given extensions from all subdirectories of a directory

李魔佛 published an article • 0 comments • 4565 views • 2016-06-07 17:51 • from related topics

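I keep a lot of downloaded image files on my disk for flashing devices. The downloads are in tgz format and must be extracted with tar zxvf xxx.tgz before flashing.
Over time the disk fills up, so I wrote the script below to delete the extracted files. The original tgz archives must not be deleted, because they may be needed again later (extract again, then flash). Doing this by hand means entering every folder, selecting the tgz, inverting the selection, and deleting, which is tedious.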

    import os

    def isContain(des_str, ori_str):
        # True when ori_str is one of the extensions in des_str.
        return ori_str in des_str

    path = os.getcwd()
    print path
    # Extensions of extracted files to delete; tgz is deliberately absent.
    des_str = ['img','cfg','bct','bin','sh','dtb','txt','mk','pem','pk8','xml','lib','pl','blob','dat']
    for fpath, dirs, fnames in os.walk(path):
        for i in fnames:
            name = i.split('.')
            if len(name) >= 2:
                # Compare the last dot-separated part, i.e. the actual suffix.
                if isContain(des_str, name[-1]):
                    filepath = os.path.join(fpath, i)
                    print "delete file %s" % filepath
                    os.remove(filepath)

github： https://github.com/Rockyzsu/RmFile
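Because deletion cannot be undone, it can help to separate finding the files from removing them. The following is a Python 3 sketch under that assumption; the function names and the dry_run flag are hypothetical additions, and matching extensions case-insensitively is a choice made here that the original script does not make.

```python
import os

def find_files_with_suffix(root, suffixes):
    """Return paths under root whose final extension is in suffixes."""
    wanted = {s.lower() for s in suffixes}
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # splitext keeps only the last extension: "a.tar.txt" -> ".txt"
            ext = os.path.splitext(filename)[1].lstrip(".").lower()
            if ext in wanted:
                matches.append(os.path.join(dirpath, filename))
    return matches

def delete_files(paths, dry_run=True):
    """Delete the given files, or only report them when dry_run is True."""
    for path in paths:
        if dry_run:
            print("would delete %s" % path)
        else:
            os.remove(path)
```

Running delete_files(find_files_with_suffix(os.getcwd(), ["img", "bin"])) first with the default dry_run=True lists what would go, and the tgz archives are never touched as long as "tgz" is not in the suffix list.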
