Crecimiento de un Restaurante

Usted es el due帽o de un restaurante y desea analizar una posible expansi贸n (habr谩 al menos un cliente diario). Calcula la media m贸vil de lo que ha pagado el cliente en un periodo de siete d铆as (es decir, el d铆a actual + 6 d铆as antes). average_amount debe redondearse a dos decimales. Regrese el resultado ordenado por visited_on en orden ascendente.

#pandas#subqueries

Tabla: Customer

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| customer_id   | int     |
| name          | varchar |
| visited_on    | date    |
| amount        | int     |
+---------------+---------+

- (customer_id, visited_on) es la llave primaria.
- Esta tabla contiene informaci贸n sobre las transacciones de los clientes en un restaurante.
- visited_on es la fecha en la que el cliente con ID (customer_id) visit贸 el restaurante.
- amount es el monto total pagado por el cliente.

El formato del resultado se muestra en el siguiente ejemplo.

Ejemplo 1:

Entrada:

Tabla Customer:
+-------------+--------------+--------------+-------------+
| customer_id | name         | visited_on   | amount      |
+-------------+--------------+--------------+-------------+
| 1           | Jhon         | 2019-01-01   | 100         |
| 2           | Daniel       | 2019-01-02   | 110         |
| 3           | Jade         | 2019-01-03   | 120         |
| 4           | Khaled       | 2019-01-04   | 130         |
| 5           | Winston      | 2019-01-05   | 110         | 
| 6           | Elvis        | 2019-01-06   | 140         | 
| 7           | Anna         | 2019-01-07   | 150         |
| 8           | Maria        | 2019-01-08   | 80          |
| 9           | Jaze         | 2019-01-09   | 110         | 
| 1           | Jhon         | 2019-01-10   | 130         | 
| 3           | Jade         | 2019-01-10   | 150         | 
+-------------+--------------+--------------+-------------+

Salida:

+--------------+--------------+----------------+
| visited_on   | amount       | average_amount |
+--------------+--------------+----------------+
| 2019-01-07   | 860          | 122.86         |
| 2019-01-08   | 840          | 120            |
| 2019-01-09   | 840          | 120            |
| 2019-01-10   | 1000         | 142.86         |
+--------------+--------------+----------------+

Explicaci贸n: 
- La 1er media m贸vil del 2019-01-01 al 2019-01-07 tiene un average_amount de (100 + 110 + 120 + 130 + 110 + 140 + 150)/7 = 122.86
- La 2a media m贸vil del 2019-01-02 to 2019-01-08 tiene un average_amount de of (110 + 120 + 130 + 110 + 140 + 150 + 80)/7 = 120
- La 3ra media m贸vil del 2019-01-03 to 2019-01-09 tiene un average_amount de of (120 + 130 + 110 + 140 + 150 + 80 + 110)/7 = 120
- La 4a media m贸vil del 2019-01-04 to 2019-01-10 tiene un average_amount de of (130 + 110 + 140 + 150 + 80 + 110 + 130 + 150)/7 = 142.86

Soluci贸n

import pandas as pd


def restaurant_growth(customer: pd.DataFrame) -> pd.DataFrame:
    """More Advanced solution"""
    df = customer.groupby(['visited_on'])['amount'].sum().reset_index()
    df['amount1'] = df['amount'].rolling(window=7).sum()
    df['average_amount'] = round(df['amount1']/7, 2)
    df1 = df[['visited_on','amount1','average_amount']]
    df1 = df1[~df1['amount1'].isna()]
    df1 = df1.rename(columns={'amount1':'amount'})

    return df


def restaurant_growth_mine(customer: pd.DataFrame) -> pd.DataFrame:
    dates = pd.DataFrame({
        'visited_on': customer['visited_on'].unique()
    })
    dates['diff'] = dates['visited_on'].diff(6)
    dates.dropna(inplace=True)

    customer['cumsum'] = customer['amount'].cumsum()

    data = []
    for i, date in enumerate(dates['visited_on']):
        df = customer.loc[customer['visited_on'].eq(date)]

        item = df.loc[df.index[-1]]  # Holds largest sum till index
        diffmask = item['visited_on'] - pd.Timedelta('8D')
        cumsum = item['cumsum']
        if i > 0:
            lastidx = customer[
                customer.visited_on.between(
                    diffmask,
                    diffmask + pd.Timedelta('1D')
                )
            ].index[-1]
            cumsum -= customer.loc[lastidx]['cumsum']

        data.append([
            date, cumsum, round((cumsum / 7) + 1e-6, 2)
        ])

    cols = {
        'visited_on': 'datetime64[ns]',
        'amount': 'Int64',
        'average_amount': float
    }
    result = pd.DataFrame(
        data, columns=cols.keys()).astype(cols)
    result = result.sort_values(
        by='visited_on').reset_index(drop=True)

    return result


def test_restaurant_growth():
    data = [
        [1, 'Jhon', '2019-01-01', 100],
        [2, 'Daniel', '2019-01-02', 110],
        [3, 'Jade', '2019-01-03', 120],
        [4, 'Khaled', '2019-01-04', 130],
        [5, 'Winston', '2019-01-05', 110],
        [6, 'Elvis', '2019-01-06', 140],
        [7, 'Anna', '2019-01-07', 150],
        [8, 'Maria', '2019-01-08', 80],
        [9, 'Jaze', '2019-01-09', 110],
        [1, 'Jhon', '2019-01-10', 130],
        [3, 'Jade', '2019-01-10', 150],
    ]
    cols = {
        'customer_id': 'Int64',
        'name': 'object',
        'visited_on': 'datetime64[ns]',
        'amount': 'Int64'
    }
    customer = pd.DataFrame(
        data, columns=cols.keys()).astype(cols)

    cols = {
        'visited_on': 'datetime64[ns]',
        'amount': 'Int64',
        'average_amount': float
    }
    data = [
        ['2019-01-07', 860, 122.86],
        ['2019-01-08', 840, 120],
        ['2019-01-09', 840, 120],
        ['2019-01-10', 1000, 142.86],
    ]
    exp = pd.DataFrame(
        data, columns=cols.keys()).astype(cols)
    got = restaurant_growth(customer)
    pd.testing.assert_frame_equal(got, exp)

slackmart blog 漏 2024