Mayx的博客

Logo

Mayx's Home Page

View My GitHub Profile

About Me

17 June 2020 - 字数统计:3822 - 阅读大约需要11分钟 - Hits: Loading...

论备份的重要性

by mayx


只有事情发生到自己头上才想到要解决

起因

今天早上发生了一件很糟糕的事情,一打开聊天软件就发现有人在说我维护的花火学园挂掉了,错误信息是无法访问源站。我觉得挺奇怪,服务器又没啥负载,我也有一段时间没登进去了,怎么服务器就挂掉了?
我试着用SSH连接,同样无法连接,我发现事情不太对劲,然后就登到了Vultr里看了看。结果发现我的服务器在00:00之后就像一个死人一样,CPU负载被拉成了一条直线,就那样保持0%的位置。我以为是因为莫名其妙的原因服务器关机了,然而我重启以后仍然没有解决问题。
登录到终端一看,No bootable device就这样显示在屏幕上,硬盘直接读不出来了,这下可不得了了,我赶紧去快照里看了一下,发现最后一次快照的时间在5月30日,也就是说如果没能恢复数据这十几天的所有信息都将消失!

难以想象的垃圾服务商:Vultr

首先我要做的事情当然是想办法先恢复服务,虽然那个快照有点早,但是先顶上再说吧……
想一想我的防护应该做的也没啥问题,而且一般成功入侵服务器的人也应该是删库然后留一条信息的那种,直接干死硬盘的我还真没见过。于是我开始发Ticket给Vultr,看看到底是怎么回事。
Vultr在我问完的4个小时后给出了最终的解决方案,把我的服务器直接重置,在上面安了新的操作系统然后给我赔了两个月的服务器费用……原文如下:

Hello,

In the past 24 hours, we sent notification of a node failure impacting your cloud server listed above.

Despite extensive efforts, our attempts to manually recover your cloud server were unsuccessful.

Our engineering team is currently deploying new instances with the same operating system and IP and you will receive login details in a separate message. You may also deploy a backup or snapshot on a new instance with a new IP if you prefer.

Our staff will be applying a two month account credit for the affected services shortly.

Regards,
Bryan M.
Systems Administrator

哇,这真是太糟糕了,作为一家云服务器商就直接把客户的数据搞没了,然后就赔2个月的费用?要知道数据无价啊,就这么不负责任的吗?简直是不可思议啊!
不过我也没什么好办法了,也许他们不重装我还想着试试SystemRescueCD试试看能不能把整个磁盘复制出来,但是他们既然已经直接重装那就彻底没救了……QAQ

亡羊补牢

既然数据已经救不回来了,那我们也只能向前看,得想办法避免以后再出现这样的问题。因为我最近在期末阶段,比较忙,所以也不经常去打快照。虽然以前也出现过服务出问题的情况,像MySQL挂了、CDN挂了、还有一次好像是交换机出问题了,但是无论如何数据从来没有丢失过。这一次数据都能丢了也真的是太糟糕了,要不是有快照,那就真成删库跑路了……
既然没时间打快照,我得想个办法搞一个自动打快照的东西。在网上搜了搜,还真有这样的脚本,于是我拿来改了改就装上去用了。

自动快照的脚本

import requests
from requests import get
import re
import json


class __RPC:
    def __init__(self, api_key, name):
        self.api_key = api_key
        self.api_info = None
        self.name = name
        self.errors = {
            200: "Function successfully executed.",
            400: "Invalid API location. Check the URL that you are using.",
            403: "Invalid or missing API key. Check that your API key is present and matches your assigned key.",
            405: "Invalid HTTP method. Check that the method (POST|GET) matches what the documentation indicates.",
            412: "Request failed. Check the response body for a more detailed description.",
            500: "Internal server error. Try again at a later time.",
            503: "Rate limit hit. API requests are limited to an average of 2/s. Try your request again later."
        }

    def api_info_initial(self):
        self.api_info = {"snapshot/create":"POST","snapshot/destroy":"POST","snapshot/list":"GET","server/list":"GET"}

    def __getattr__(self, name):
        return eval("__RPC")(self.api_key, self.name + "/" + name)

    def __call__(self, **kwargs):
        if not self.api_info:
            self.api_info_initial()
        if self.name not in self.api_info:
            raise ValueError("The API is not exists.")

        if self.api_info[self.name] == "GET":
            res = requests.get("https://api.vultr.com/v1/" + self.name, headers={"API-Key": self.api_key},
                               params=kwargs)
        elif self.api_info[self.name] == "POST":
            res = requests.post("https://api.vultr.com/v1/" + self.name, headers={"API-Key": self.api_key}, data=kwargs)

        if res.status_code == 200:
            return res.status_code, res.text.strip()
        elif res.status_code in self.errors.keys():
            return res.status_code, self.errors.get(res.status_code)
        else:
            res.raise_for_status()


class Vultr:
    def __init__(self, api_key):
        self.api_key = api_key

    def __getattr__(self, name):
        return eval("__RPC")(self.api_key, name)

vultr = Vultr("API Key")
data = {'SUBID': '实例ID'}
status_code, resp = vultr.snapshot.create(**data)
requests.post("https://sc.ftqq.com/SCKEY.send",data ={"text":"快照已创建","desp": str(status_code)+resp})
# 删除旧快照
status_code, resp = vultr.snapshot.list()  # /v1/snapshot/list
if status_code != 200:
    print('获取快照列表失败' + str(status_code) + resp)
else:
    print('成功获取到快照列表')
    data_list = list(json.loads(resp).values())
    data_list.sort(key=lambda x: x['date_created'])  # 默认时间排序,由近到远
    data_list_del = data_list[::-1][9:]  # 取超过9个之后的快照
    for data_del in data_list_del:
        data = {'SNAPSHOTID': data_del.get('SNAPSHOTID')}
        status_code, resp = vultr.snapshot.destroy(**data)  # /v1/snapshot/destroy
        if status_code != 200:
            print('删除旧快照失败')
        else:
            print('成功删除一个旧快照')

把这个脚本放到Crontab里,每天执行一次就行了。

后记

相信服务器厂商是完全靠不住的事情,自己还得想办法做好备份。我甚至在想,阿三把Intel和微软都占领了,会不会有一个阿三也跑到Vultr里,然后对着我的硬盘大喊“把你变成咖喱”之类的23333。
现在不过是权宜之计,以后还是得想办法把整个论坛下载到本地,至少能搞个数据库的差异备份啥的也行啊……

tags: 备份
召唤伊斯特瓦尔