
Python 3.7 | Python for Data Analysis 2nd Edition: A small script for merging the files in the 'babynames' folder

Der Schwarze Wolf is currently reading the book ‘Python for Data Analysis 2nd Edition’. Since books like this usually come with exercises, it is only natural to work through at least some of them.

For the exercise ‘US Baby Names 1880–2010’ from chapter 14.3, Der Schwarze Wolf wrote a small script that merges the individual ‘.txt’ files into a single ‘.csv’ file, adding a running index and the year (written as the date YYYY-12-31) to every row.
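
To illustrate the row format: the script reads each line of a yobYYYY.txt file as name,gender,count and turns it into id,date,name,gender,count. The following is a minimal sketch of that per-file step; the helper rows_for_year is hypothetical (not part of the script), and the sample names and counts are made up, not real data.

```python
import csv
import io


def rows_for_year(year, source, start_ix=0):
    """Yield (id, date, name, gender, count) rows for one yobYYYY.txt file."""
    date = f'{year}-12-31'  # every row of a year gets the last day of that year
    ix = start_ix
    for name, gender, count in csv.reader(source):
        ix += 1  # running index, continued across files via start_ix
        yield (ix, date, name, gender, count)


# Made-up sample content in the 'name,gender,count' layout
sample = io.StringIO('Anna,F,100\nBen,M,90\n')
for row in rows_for_year('1880', sample):
    print(row)
# (1, '1880-12-31', 'Anna', 'F', '100')
# (2, '1880-12-31', 'Ben', 'M', '90')
```

Note that count stays a string here, exactly as in the script; the csv module only writes it back out, so no conversion is needed.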

A more recent dataset (1880–2017) is also available on the following page:
https://catalog.data.gov/dataset/baby-names-from-social-security-card-applications-national-level-data/resource/fdfd2c5c-6190-4fac-9ead-ae478e0c2790
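
If that download is a ZIP archive containing the yobYYYY.txt files (an assumption; check the file you actually get), the sketch below copies just those files into the ./data folder the script reads from. The function extract_yob_files and the archive name names.zip are hypothetical.

```python
import os
import shutil
import zipfile


def extract_yob_files(zip_path, target_dir='./data'):
    """Copy every yobYYYY.txt member of the archive into target_dir.

    Folder structure inside the archive is flattened; all other members
    (e.g. a ReadMe) are skipped. Returns the sorted list of file names.
    """
    os.makedirs(target_dir, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            base = os.path.basename(member)  # '' for directory entries
            if not (base.startswith('yob') and base.endswith('.txt')):
                continue
            # Stream the member into target_dir under its base name only
            with archive.open(member) as src, \
                    open(os.path.join(target_dir, base), 'wb') as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(base)
    return sorted(extracted)
```

Usage, assuming the archive was saved as names.zip next to the script: extract_yob_files('names.zip'). Writing via archive.open() instead of ZipFile.extract() keeps every file directly in ./data, even if the archive happens to contain subfolders.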

Code:

"""W01f [SchwarzerWolf.cc]
date = '2018-10-19'
version = '0.0.2'

Copyright (c) 2018 by W01f [SchwarzerWolf.cc]

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

"""


import os
import csv


# constants
dir_path = './data'  # folder containing the yobYYYY.txt files
csv_save_path = './america_babynames_full[1880-2017].csv'  # merged output file


def get_files(path):
    # Sorted, so the years end up in chronological order (yob1880.txt, yob1881.txt, ...)
    return sorted(os.listdir(path))


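def migration(files):
    """Read all yobYYYY.txt files and return the rows for the merged CSV."""
    data = [('id', 'date', 'name', 'gender', 'count')]
    ix = 0

    for file in files:
        # Only process files named like 'yob1880.txt'; skip anything else
        # (e.g. a ReadMe file or subfolders, which have no '.txt' extension).
        stem, ext = os.path.splitext(file)
        if ext != '.txt' or not stem.startswith('yob'):
            continue

        # str.rstrip()/lstrip() remove character sets, not prefixes or
        # suffixes, so slice off the 'yob' prefix explicitly.
        year = stem[len('yob'):]
        date = year + '-12-31'

        with open(os.path.join(dir_path, file), newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=',')
            # Each line of the source files has the form: name,gender,count
            for name, gender, count in reader:
                ix += 1
                data.append((ix, date, name, gender, count))

    return data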


def to_csv(data):
    # Write the header row and all data rows to csv_save_path
    with open(csv_save_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerows(data)


def main():
    files = get_files(dir_path)
    data = migration(files)
    to_csv(data)
    print('Executed')


if __name__ == '__main__':
    main()
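
The script first collects every row in the list data and only then writes the file, so for the full dataset the whole table sits in memory at once. Below is a minimal sketch of a variant that writes each row as soon as it is read, assuming the same ./data layout with yobYYYY.txt files; merge_streaming is a hypothetical name, not part of the original script.

```python
import csv
import os


def merge_streaming(src_dir, out_path):
    """Merge all yobYYYY.txt files in src_dir into one CSV, row by row.

    Returns the number of data rows written (header excluded).
    """
    # Sorting keeps the years in chronological order, as in get_files()
    files = sorted(
        f for f in os.listdir(src_dir)
        if f.startswith('yob') and f.endswith('.txt')
    )
    ix = 0
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(('id', 'date', 'name', 'gender', 'count'))
        for file in files:
            # 'yob1880.txt' -> '1880-12-31'
            date = file[len('yob'):-len('.txt')] + '-12-31'
            with open(os.path.join(src_dir, file), newline='') as src:
                for name, gender, count in csv.reader(src):
                    ix += 1
                    writer.writerow((ix, date, name, gender, count))
    return ix
```

Usage: merge_streaming('./data', './america_babynames_full[1880-2017].csv'). The output has the same columns and id numbering as the original script; only the memory profile differs, since at most one row is held at a time.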


[2018-10-19] - Der Schwarze Wolf optimized the code
[2018-02-02] - Added a header to the code

Published October 19, 2018, 05:22 by W01f