Tools: Scared of Linux as a Beginner Data Engineer? Here’s How to Get Started - Full Analysis


If you're scared of Linux as a beginner data engineer, you're not alone. Almost everyone feels this way at the start. This year, I decided to transition from being a data analyst to a data engineer with zero Linux experience. Over the past two weeks, I've been learning practical Linux skills and how they apply to solving real-world data problems for businesses. Here's a summary of what I've learned.

First, every stage of the data engineering pipeline runs on Linux servers, usually in the cloud. As a data engineer, here's what I'll actually use Linux for:

- Setting up and managing servers: configuring the machines where your data tools run.
- Scheduling jobs: using CRON to trigger data pipelines automatically.
- Debugging failures: connecting via SSH to investigate logs when a pipeline breaks.
- Moving and managing files: handling raw data before it lands in storage like S3.
- Installing tools: setting up Python, Spark, Airflow, and other software on a server.
- Monitoring resources: checking server memory, disk usage, and overall health.

Second, in real life, businesses pull data from APIs, databases, or external files daily, and a data engineer has to automate those pulls on a Linux server. To achieve this, you have to learn how to:

- Connect to a virtual Linux server
- Manage files on the server

Below are simplified steps to achieve this.
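The CRON scheduling mentioned in this post boils down to a one-line crontab entry. A minimal sketch — the script path, log path, and schedule below are all hypothetical placeholders:

```
# minute hour day-of-month month day-of-week  command   (edit with: crontab -e)
# Run a hypothetical ingestion script every day at 02:00, appending output to a log
0 2 * * * /usr/bin/python3 /home/grace/Project/main.py >> /home/grace/ingest.log 2>&1
```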

Step 1: Connect via SSH

SSH (Secure Shell) allowed me to open an encrypted terminal session to a remote server. I needed two things:

- The server's IP address
- My username

On Windows, you can use PowerShell or Git Bash. I was using PowerShell.

ssh root@<server-ip>

- Type yes to accept the server key.
- Enter your password (it won't show as you type).
- Press Enter, and you're in!
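Password login works, but key-based authentication is the usual next step so you don't retype the password on every connection. A minimal sketch, assuming OpenSSH's ssh-keygen is installed — the key name and server address are placeholders, not from the article:

```shell
#!/bin/sh
# Generate an SSH key pair non-interactively (empty passphrase, demo only)
set -e
cd "$(mktemp -d)"                         # Scratch directory
ssh-keygen -t ed25519 -f my_key -N "" -q  # Writes my_key (private) and my_key.pub (public)
ls                                        # Shows: my_key  my_key.pub
```

The public key then gets installed on the server, e.g. with `ssh-copy-id -i my_key.pub grace@<server-ip>`, after which `ssh -i my_key grace@<server-ip>` logs in without a password.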

Step 2: Update the Server

Always update your server first before doing anything else:

sudo apt update    # Check for updates
sudo apt upgrade   # Install updates

Two handy commands for finding your bearings:

pwd   # See your current directory
ls    # List files and folders

Step 3: Create Your Own User

Avoid using root regularly by creating a personal user right after setup:

sudo useradd -m grace   # Create user with home folder
sudo passwd grace       # Set password
logout                  # Log out from root
ssh grace@<server-ip>   # Log back in as the new user

Step 4: Create Folders and Files

Now that you are logged in as your own user, organize your workspace:

mkdir Project   # Create a folder
cd Project      # Enter the folder
touch main.py   # Create a Python file
mkdir data      # Create a sub-folder
ls              # Verify the folder and files
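The Step 4 commands can also run as a single script when setting up a fresh server. A small sketch that works in a throwaway directory, so it's safe to re-run anywhere:

```shell
#!/bin/sh
set -e                # Stop at the first failing command
cd "$(mktemp -d)"     # Throwaway directory instead of the real home folder
mkdir Project         # Create the project folder
cd Project
touch main.py         # Create an empty Python file
mkdir data            # Create a sub-folder for raw files
ls                    # Verify: data  main.py
```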

Step 5: Edit Files

Use nano to write or paste your code into the file:

nano main.py

- Paste your text or code
- Press Ctrl + O to save
- Press Ctrl + X to exit

View file contents anytime with:

cat main.py    # Print the whole file
less main.py   # Scroll through the file
more main.py   # Page through the file
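nano is interactive, which doesn't work inside automated setup scripts; a heredoc is the usual non-interactive way to write a file. A small sketch — the file contents here are purely illustrative:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"            # Scratch directory so nothing real is overwritten
cat > main.py <<'EOF'
print("hello from the server")
EOF
cat main.py                  # Same as viewing the file in Step 5
```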

Step 6: Download and Manage Files from the Web

Now that the workspace is set up, you can bring in data files:

wget https://example.com/data.csv   # Download a file
tar -xzf archive.tgz                # Extract compressed files
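wget needs a live URL, so this sketch skips the download and only demonstrates the tar half of Step 6: packing a folder into a .tgz and extracting it again. The CSV contents are made up:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir data
echo "id,value" > data/data.csv   # Stand-in for a downloaded file
tar -czf archive.tgz data         # Create a compressed archive
rm -r data                        # Pretend we only received the archive
tar -xzf archive.tgz              # Extract it, as in Step 6
cat data/data.csv                 # The file is back
```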

Step 7: Transfer Files Between Your Local PC and the Server

Move files from your local machine to the server using SCP (Secure Copy Protocol):

scp main.py grace@<server-ip>:/home/grace/from_local/

On the server, navigate to the folder and run your script:

cd from_local
python3 main.py
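scp itself needs a real server, but the last two commands of Step 7 can be simulated locally (assuming python3 is on the PATH); the script body here is a placeholder:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
mkdir from_local                                       # Mirrors the article's folder
printf 'print("pipeline ran")\n' > from_local/main.py  # Stand-in for the copied script
cd from_local
python3 main.py                                        # Prints: pipeline ran
```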

Summary Takeaways as a Beginner

- Every tool in a data engineering pipeline runs on a Linux server, so you need Linux to navigate, organize, and run tasks.
- SSH is your bridge between your PC and the server.
- Always update your server and create a personal user before anything else.
- Start small: create folders, files, and scripts, then automate tasks.
- Everything you do here mirrors real-world data engineering work, like managing pipelines, logs, or datasets.

If you're also learning Linux for data engineering, what's been challenging for you so far? Drop a comment. I'd love to learn from your experience. Also, stay tuned for the progress update in the next two weeks.

