lunax.data_processing ======================== This module provides utility functions for data loading, splitting, and preprocessing operations. .. py:function:: load_data(file_path: str) -> pd.DataFrame Load tabular data from a file into a DataFrame. :param file_path: Path to the data file (supports csv, parquet, xlsx, xls) :type file_path: str :return: Loaded data as DataFrame :rtype: pd.DataFrame :raises ValueError: If file format is not supported :raises Exception: If data loading fails .. py:function:: split_data(df: pd.DataFrame, target: str, test_size: float = 0.2, random_state: int = 42) -> Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series] Split dataset into training and validation sets. :param df: Input DataFrame :type df: pd.DataFrame :param target: Name of the target column :type target: str :param test_size: Proportion of the dataset to include in the validation split :type test_size: float :param random_state: Random seed for reproducibility :type random_state: int :return: X_train, X_val, y_train, y_val :rtype: Tuple[pd.DataFrame, pd.DataFrame, pd.Series, pd.Series] .. py:function:: preprocess_data(df: pd.DataFrame, target: str = None, numeric_strategy: str = "mean", category_strategy: str = "most_frequent", scale_numeric: bool = True, encode_categorical: bool = True) -> pd.DataFrame Perform data preprocessing including missing value handling, encoding, and standardization. :param df: Input DataFrame :type df: pd.DataFrame :param target: Target column name (if any) :type target: str, optional :param numeric_strategy: Strategy for filling numeric missing values ('mean' or 'median') :type numeric_strategy: str :param category_strategy: Strategy for filling categorical missing values ('most_frequent') :type category_strategy: str :param scale_numeric: Whether to standardize numeric features :type scale_numeric: bool :param encode_categorical: Whether to encode categorical features :type encode_categorical: bool :return: Preprocessed DataFrame :rtype: pd.DataFrame **Features:** - Handles both numeric and categorical features - Supports missing value imputation - Performs feature scaling (standardization) - Provides label encoding for categorical variables - Preserves original data by working on a copy Example Usage -------------- .. code-block:: python from lunax.data_processing.utils import preprocess_data # Load your data df = pd.DataFrame(...) # Preprocess with default settings processed_df = preprocess_data(df, target='target_column') # Customize preprocessing processed_df = preprocess_data( df, target='target_column', numeric_strategy='median', scale_numeric=False, encode_categorical=True )