
Python 3.7 | Python for Data Analysis 2nd Edition: A small script for merging the files in the 'babynames' folder

The Black Wolf is currently reading the book 'Python for Data Analysis 2nd Edition'. Since books like this usually contain exercises, it is only natural to work through at least some of them.

For the exercise 'US Baby Names 1880–2010' from chapter 14.3, the Black Wolf wrote a small script that merges the individual '.txt' files into a single '.csv' file and adds an index and the year (as a date) to each row of the CSV.

A more recent data set (1880-2017) is also available on the following page:
https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data/resource/fdfd2c5c-6190-4fac-9ead-ae478e0c2790

Code:

"""W01f [SchwarzerWolf.cc]
date = '2018-10-19'
version = '0.0.2'

Copyright (c) 2018 by W01f [SchwarzerWolf.cc]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

"""


import os
import csv


# constants
dir_path = './data'
csv_save_path = './america_babynames_full[1880-2017].csv'


def get_files(path):
    return sorted(os.listdir(path))


def migration(files):
    data = [('id', 'date', 'name', 'gender', 'count')]
    ix = 0

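    for file in files:
        stem, ext = os.path.splitext(file)

        # Skip everything that is not a 'yobYYYY.txt' file (e.g. a readme)
        if ext != '.txt' or not stem.startswith('yob'):
            continue

        # 'yob1880' -> '1880'. Slicing is used because str.lstrip/rstrip
        # remove sets of characters, not a fixed prefix or suffix.
        year = stem[len('yob'):]
        date = year + '-12-31'

        with open(os.path.join(dir_path, file), newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=',')
            for name, gender, count in reader:
                ix += 1
                data.append((ix, date, name, gender, count))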

    return data


def to_csv(data):
    with open(csv_save_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerows(data)


def main():
    files = get_files(dir_path)
    data = migration(files)
    to_csv(data)
    print('Executed')


if __name__ == '__main__':
    main()
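
When extracting the year from file names of the form 'yobYYYY.txt', it helps to know that str.rstrip and str.lstrip remove sets of characters rather than a fixed prefix or suffix. A minimal sketch of a stricter alternative using a regular expression; the helper name year_from_filename and the file name 'readme.pdf' are hypothetical and not part of the original script:

```python
import re

# Matches file names such as 'yob1880.txt' and captures the four-digit year
YOB_PATTERN = re.compile(r'^yob(\d{4})\.txt$')


def year_from_filename(filename):
    """Return the year as a string, or None if the name does not match."""
    match = YOB_PATTERN.match(filename)
    return match.group(1) if match else None


print(year_from_filename('yob1880.txt'))  # 1880
print(year_from_filename('readme.pdf'))   # None

# str.rstrip strips the characters '.', 't' and 'x', not the suffix '.txt'
print('text.txt'.rstrip('.txt'))          # te
```

Files that do not match the pattern yield None and can simply be skipped in the loop.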


[2018-10-19] - The Black Wolf optimized the code
[2018-02-02] - Added a heading for the code

Published: 19 October 2018 05:22 by W01f

Python 3.7 | Pandas example with time series and group operations

Here is a small example of how easy it is to reshape data with pandas, using 'pivot_table'. The W01f used the IPython console for this.

Loading the modules

import numpy as np
import pandas as pd

from pandas.tseries.offsets import MonthEnd

Preparing the categories

cats = ['food', 'body_care', 'tickets', 'family']
cat_list = [np.random.choice(cats) for ix in range(100)]

Creating the DataFrame

df = pd.DataFrame({
    'category': cat_list,
    'amount': [np.random.randint(1, 10) for ix in range(100)]
}, index=pd.date_range('2018-01-01', periods=100))

Renaming the index

df.index.name = 'Date'

Displaying the first 10 rows

df.head(10)
             category  amount
Date
2018-01-01  body_care       8
2018-01-02     family       2
2018-01-03    tickets       5
2018-01-04  body_care       4
2018-01-05       food       4
2018-01-06     family       1
2018-01-07     family       1
2018-01-08       food       1
2018-01-09     family       1
2018-01-10    tickets       1

Group operation

# Month-end offset; rollforward maps each date to the last day of its month
end = MonthEnd()

df2 = df.pivot_table('amount',
                     index=end.rollforward,
                     columns='category',
                     aggfunc='sum')
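
The grouping key here is the rollforward method of a MonthEnd offset: pandas calls it on every index value and uses the returned month-end date as the row label. A small sketch of what the method returns for single timestamps; the name end mirrors the snippet above, and the two dates are arbitrary examples:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

end = MonthEnd()

# A date inside a month is moved forward to the last day of that month
mid_january = end.rollforward(pd.Timestamp('2018-01-15'))
# A date that is already a month end is left unchanged
end_of_february = end.rollforward(pd.Timestamp('2018-02-28'))

print(mid_january)      # 2018-01-31 00:00:00
print(end_of_february)  # 2018-02-28 00:00:00
```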

Result

df2
category    body_care  family  food  tickets
2018-01-31         36      24    39       36
2018-02-28         46      18    28       74
2018-03-31         46      42    43       17
2018-04-30          5       8    19        6
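
A comparable monthly table can probably also be built without 'pivot_table', by grouping on the calendar month and the category and then moving the categories into columns. The following is only a sketch on a tiny hand-written data set (the values are made up so that the sums are easy to check), not the method used above; the row labels are monthly periods instead of month-end dates:

```python
import pandas as pd

# Small, hypothetical data set with a daily DatetimeIndex
df = pd.DataFrame(
    {
        'category': ['food', 'family', 'food', 'tickets', 'food'],
        'amount': [3, 2, 4, 5, 1],
    },
    index=pd.to_datetime(
        ['2018-01-05', '2018-01-20', '2018-01-31', '2018-02-01', '2018-02-14']
    ),
)

# Group by calendar month and category, sum the amounts,
# then turn the category level into columns (missing combinations become 0)
monthly = (
    df.groupby([df.index.to_period('M'), 'category'])['amount']
    .sum()
    .unstack(fill_value=0)
)

print(monthly)
```

In this sketch January sums to 7 for 'food' and 2 for 'family', while February has 1 for 'food' and 5 for 'tickets'.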

[2019-01-31] - Removed In | Out, added some headings.

Published: 17 October 2018 05:02 by W01f

Python: Backup | Restore of the Linux 'home' directory

The Black Wolf has written two small scripts, one to back up the 'home' directory on Linux and one to restore it. Because the entire content of the 'home' directory is deleted in order to restore the backup, problems are likely in the GUI if you are logged in there. The Black Wolf has not tested this, however.

Since the Black Wolf only uses Debian, it is only natural that the scripts were tested only there. The paths in the 'config' function must of course be adapted to your own needs.

Backup:


#!/usr/bin/env python

"""SchwarzerWolf.cc
date = '2018-06-08'
version = '0.1.1'

The W01f hacks in Linux for Linux. Fuck Microsoft, Apple and many more.
Fuck capitalism!

Backup -> linux home directory
Copyright (C) 2018 W0lf

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""


import os
import pathlib
import datetime


def config():
    """Configuration: Here all important values can be adjusted.

    :return: path, source, destination
    """

    # Determines the currently logged in user
    user = pathlib.Path.home().name

    # The code assumes that an external hard disk was previously
    # mounted under '/media/<username>/<hard_disk_name>'. This is e.g.
    # the standard in Debian Stretch. At the end is the directory that in
    # this case is called 'backup'. The syntax would be:
    # '/media/<username>/<harddisk_name>/<directory>'.
    path = '/media/{user}/secret/backup'.format(user=user)

    # Get the current date
    date = datetime.date.today().isoformat()

    # The path to the home directory of the current user
    source = str(pathlib.Path.home())

    # Merges path and date variable
    destination = os.path.join(path, date)

    return path, source, destination


def execute(path, source, destination):
    """Execute function. In this case: backup of the home directory

    :param path: from function config.
    :param source: from function config.
    :param destination: from function config.
    :return: None
    """

    # Check if the specified directory path to which the backup should
    # come is available. If True, a Linux command is executed.
    if os.path.exists(path):
        os.system('rsync -a -v --progress {source} {destination}'.format(
            source=source,
            destination=destination
        ))


def main():
    """Main function of the script."""

    path, source, destination = config()
    execute(path, source, destination)


if __name__ == '__main__':
    main()
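
os.system hands the formatted string to a shell, so a path containing spaces or shell metacharacters would be split or interpreted. A minimal sketch of the same rsync call via subprocess.run with an argument list, which avoids the shell. The helper names build_backup_command and run_backup and the example paths are hypothetical; the rsync options are the ones used above:

```python
import subprocess


def build_backup_command(source, destination):
    """Return the rsync call as an argument list (no shell involved)."""
    return ['rsync', '-a', '-v', '--progress', source, destination]


def run_backup(source, destination):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_backup_command(source, destination), check=True)


# Example with made-up paths; the space in the path stays inside one argument
command = build_backup_command('/home/wolf',
                               '/media/wolf/my disk/backup/2018-06-08')
print(command)
```

With check=True a failed rsync run raises an exception instead of passing silently, which seems preferable for a backup.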

Restore:


#!/usr/bin/env python

"""SchwarzerWolf.cc
date = '2018-06-08'
version = '0.1.1'

The W01f hacks in Linux for Linux. Fuck Microsoft, Apple and many more.
Fuck capitalism!

Restore backup -> linux home directory
Copyright (C) 2018 W0lf

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""


import os
import pathlib


def config():
    """Configuration: Here all important values can be adjusted.

    :return: source, destination
    """

    # Determines the currently logged in user
    user = pathlib.Path.home().name

    # The code assumes that an external hard disk was previously
    # mounted under '/media/<username>/<hard_disk_name>'. This is e.g.
    # the standard in Debian Stretch. At the end is the directory that in
    # this case is called 'backup'. The syntax would be:
    # '/media/<username>/<harddisk_name>/<directory>'.
    path = '/media/{user}/secret/backup'.format(user=user)

    # Date of the backup to be restored
    date = '2018-06-08'

    # Merges path, date and user variable
    source = os.path.join(path, date, user)

    # The path to the home directory of the current user
    destination = str(pathlib.Path.home())

    return source, destination


def execute(source, destination):
    """Execute function. In this case: restore of the home directory.

    The restore should only be done when logged in a terminal, not in
    the gui. Since the complete contents of the home directory will be
    deleted to restore the backup, it will probably cause problems in
    the gui, provided that one is logged in there. However, the W01f
    did not test that.

    :param source: from function config.
    :param destination: from function config.
    :return: None
    """

    # Check whether the backup directory to restore from exists.
    # If it does, the Linux commands are executed.
    if os.path.exists(source):
        os.system('rm -rf {destination}/*'.format(destination=destination))
        os.system('rsync -a -v --progress {source}/ {destination}'.format(
            source=source,
            destination=destination
        ))


def main():
    """Main function of the script."""

    source, destination = config()
    execute(source, destination)


if __name__ == '__main__':
    main()
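
Two details of the restore are worth noting. The shell pattern '*' does not match hidden files, so 'rm -rf {destination}/*' leaves dotfiles in place, and deleting first means that a failed rsync run leaves an emptied home directory behind. One possible alternative, sketched here under the assumption that the home directory should simply mirror the backup, is to let rsync remove extraneous files itself with its --delete option and to preview the changes with --dry-run first. The helper names build_restore_command and restore and the example paths are hypothetical:

```python
import subprocess


def build_restore_command(source, destination, dry_run=True):
    """Return an rsync call that mirrors source into destination."""
    command = ['rsync', '-a', '-v', '--delete']
    if dry_run:
        # Only report what would be copied or deleted
        command.append('--dry-run')
    # The trailing slash copies the contents of source, not the directory itself
    command += [source.rstrip('/') + '/', destination]
    return command


def restore(source, destination, dry_run=True):
    """Run the restore and raise CalledProcessError if rsync fails."""
    subprocess.run(build_restore_command(source, destination, dry_run),
                   check=True)


# Example with made-up paths; by default only a dry run is built
preview = build_restore_command('/media/wolf/secret/backup/2018-06-08/wolf',
                                '/home/wolf')
print(preview)
```

Calling restore(..., dry_run=False) would perform the actual restore; like the original script, this has not been tested while logged in to the GUI.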

[2018-05-20] - Code -> improved docstring

Published: 8 June 2018 16:58 by W01f
