
Reading In a File From Ubuntu (AWS EC2) on Local Machine?

I have a Python script that I'm running on AWS (an EC2 instance with Ubuntu). The script writes a JSON file daily to a directory under /home/ubuntu: with open('/home/ubuntu/

Solution 1:

The problem occurs because the file exists only on the running EC2 instance, not on your local machine. One possible solution is to upload the JSON file from the EC2 instance to S3 and then download it to your local machine, for example to /home/ubuntu/bandsintown/sf_events.json.

First, install the AWS CLI on the running EC2 instance and run the following commands in the terminal:

aws configure
aws s3 cp /home/ubuntu/bandsintown/sf_events.json s3://mybucket/sf_events.json

Or install boto3, the AWS SDK for Python, and upload the file from Python:

import boto3

s3 = boto3.resource('s3')

def upload_file_to_s3(s3_path, local_path):
    # "s3://bucket/key".split('/') gives ['s3:', '', 'bucket', 'key', ...],
    # so the bucket name is at index 2 and the remainder is the object key
    bucket = s3_path.split('/')[2]
    file_path = '/'.join(s3_path.split('/')[3:])
    # upload_file returns None and raises an exception on failure
    s3.Object(bucket, file_path).upload_file(local_path)

s3_path = "s3://mybucket/sf_events.json"
local_path = "/home/ubuntu/bandsintown/sf_events.json"
upload_file_to_s3(s3_path, local_path)

Then, on your local machine, download the file from S3 with the AWS CLI:

aws configure
aws s3 cp s3://mybucket/sf_events.json /home/ubuntu/bandsintown/sf_events.json

Or, if you prefer the Python SDK:

import boto3

s3 = boto3.resource('s3')

def download_file_from_s3(s3_path, local_path):
    # "s3://bucket/key".split('/') gives ['s3:', '', 'bucket', 'key', ...],
    # so the bucket name is at index 2 and the remainder is the object key
    bucket = s3_path.split('/')[2]
    file_path = '/'.join(s3_path.split('/')[3:])
    # Write the object to local_path (the directory must already exist)
    s3.Object(bucket, file_path).download_file(local_path)

s3_path = "s3://mybucket/sf_events.json"
local_path = "/home/ubuntu/bandsintown/sf_events.json"
download_file_from_s3(s3_path, local_path)
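
Both helpers above split the s3:// path by hand. As a minimal sketch of an alternative, the standard library's urlparse can do the same split and reject paths that are not S3 paths. The function name split_s3_path is hypothetical, not part of boto3, and the sketch assumes keys that contain no '?' or '#' characters, since urlparse would treat those as query or fragment separators.

```python
from urllib.parse import urlparse


def split_s3_path(s3_path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key').

    Hypothetical helper for illustration; not part of boto3.
    """
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// path: {s3_path!r}")
    # netloc holds the bucket name; the path starts with '/', which is not
    # part of the object key
    return parsed.netloc, parsed.path.lstrip("/")


bucket, key = split_s3_path("s3://mybucket/sf_events.json")
print(bucket, key)  # prints: mybucket sf_events.json
```

The returned pair can be passed straight to s3.Object(bucket, key) in either helper.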

Or use the JavaScript SDK running inside the browser, but I would not recommend this, because you must make your bucket public and also deal with browser compatibility issues.

Solution 2:

You can use AWS S3.

Run one Python script on your instance that uploads the JSON file to S3 whenever it is generated, and another Python script on your local machine. The local script can be either a script that listens to an SQS queue and downloads from S3 (Case 1), or a script that downloads the latest file uploaded to the S3 bucket (Case 2).

Case 1:

Whenever the JSON file is uploaded to S3, a message arrives in the SQS queue saying that the file has been uploaded, and the file is then downloaded to your local machine.

Case 2:

Whenever the JSON file is uploaded to S3, you run the download script, which downloads the latest JSON file.

upload.py:

import boto3
import os

def upload_files(path):
    session = boto3.Session(
        aws_access_key_id='your access key id',
        aws_secret_access_key='your secret access key',
        region_name='region'
    )
    s3 = session.resource('s3')
    bucket = s3.Bucket('bucket name')

    # Walk the directory tree and upload every file, using its path
    # relative to `path` as the S3 object key
    for subdir, dirs, files in os.walk(path):
        for file in files:
            full_path = os.path.join(subdir, file)
            key = os.path.relpath(full_path, path)
            print(key)
            with open(full_path, 'rb') as data:
                bucket.put_object(Key=key, Body=data)


if __name__ == "__main__":
    upload_files('your path, which in your case is /home/ubuntu/')

Your other script, on the local machine:

download1.py, using the SQS queue:

import boto3
import botocore.exceptions
from logzero import logger

s3_resource = boto3.resource('s3')
sqs_client = boto3.client('sqs')

# Queue URL
queue_url = 'queue url'

# AWS S3 bucket
bucketName = "your bucket-name"

# Receive the message from the SQS queue
response_message = sqs_client.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=1,
    MessageAttributeNames=[
        'All'
    ],
)

# Raises KeyError if the queue returned no messages
message = response_message['Messages'][0]
receipt_handle = message['ReceiptHandle']
messageid = message['MessageId']
# Assumes the message body is the S3 object key
filename = message['Body']

try:
    s3_resource.Bucket(bucketName).download_file(filename, filename)
    logger.info("File Downloaded")
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == '404':
        logger.info("The object does not exist.")
    else:
        raise
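
The download1.py script above treats the message body as the object key. If the queue is instead fed directly by S3 event notifications, the body is a JSON document with the bucket and key nested inside a "Records" list. The following is a minimal sketch of extracting them, assuming that setup (no SNS topic in between); the function name keys_from_s3_event and the sample payload are made up for illustration.

```python
import json
from urllib.parse import unquote_plus


def keys_from_s3_event(body):
    """Return (bucket, key) pairs from an S3 event notification JSON string.

    Hypothetical helper; assumes the SQS queue receives S3 notifications
    directly.
    """
    event = json.loads(body)
    pairs = []
    # Messages without "Records" (such as the S3 test event) yield no pairs
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys are URL-encoded in the notification (a space becomes '+')
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


# Trimmed, made-up sample containing only the fields read above
sample = json.dumps({
    "Records": [
        {"s3": {"bucket": {"name": "mybucket"},
                "object": {"key": "bandsintown/sf+events.json"}}}
    ]
})
print(keys_from_s3_event(sample))  # prints: [('mybucket', 'bandsintown/sf events.json')]
```

Each returned key could then replace filename in the download_file call.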

download2.py, which downloads the latest file from S3:

import boto3

# S3 connection
s3_resource = boto3.resource('s3')
s3_client = boto3.client('s3')

bucketName = 'your bucket-name'
# list_objects_v2 returns at most 1,000 objects per call
response = s3_client.list_objects_v2(Bucket=bucketName)
objects = response['Contents']
# Pick the most recently modified object
latest = max(objects, key=lambda x: x['LastModified'])
key = latest['Key']

print("downloading file")
s3_resource.Bucket(bucketName).download_file(key, key)
print("file downloaded")
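
A single list_objects_v2 call returns at most 1,000 objects, so in a bucket that accumulates daily files the newest one may not be in the first response. The sketch below shows one possible way to scan every page with boto3's paginator. The function names latest_key and download_latest are hypothetical, and boto3 is imported inside download_latest only so that the selection logic can be tried without AWS access.

```python
import os
from datetime import datetime


def latest_key(pages):
    """Return the key of the most recently modified object across all pages."""
    latest = None
    for page in pages:
        # 'Contents' is absent from a page when no objects are listed
        for obj in page.get("Contents", []):
            if latest is None or obj["LastModified"] > latest["LastModified"]:
                latest = obj
    return None if latest is None else latest["Key"]


def download_latest(bucket_name):
    """Download the newest object in bucket_name into the current directory."""
    import boto3  # imported here so latest_key works without boto3 installed

    s3_client = boto3.client("s3")
    paginator = s3_client.get_paginator("list_objects_v2")
    key = latest_key(paginator.paginate(Bucket=bucket_name))
    if key is None:
        raise RuntimeError(f"bucket {bucket_name!r} contains no objects")
    # Save under the key's base name so keys with prefixes need no local folders
    local_name = os.path.basename(key)
    s3_client.download_file(bucket_name, key, local_name)
    return local_name


# Offline check of the selection logic with made-up listing pages
demo_pages = [
    {"Contents": [{"Key": "old.json", "LastModified": datetime(2020, 1, 1)}]},
    {"Contents": [{"Key": "new.json", "LastModified": datetime(2020, 1, 2)}]},
]
print(latest_key(demo_pages))  # prints: new.json
```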

Solution 3:

You basically need to copy a file from the remote machine to your local one. The simplest way is to use scp. The following example copies the file to your current directory. If you are on Windows, open PowerShell; if you are on Linux, scp should already be installed.

scp <username>@<your ec2 instance host or IP>:/home/ubuntu/bandsintown/sf_events.json ./

Run the command, enter your password, and you are done. It works the same way as the ssh command you use to connect to your remote machine. (I believe your username would be ubuntu.)

A more advanced method is to mount the remote directory via SSHFS. It is a little cumbersome to set up, but then you will have instant access to the remote files as if they were local.

And if you want to do it programmatically from Python, see this question.
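
As a rough illustration of the programmatic route, the scp call can be wrapped with Python's subprocess module. This is only a sketch: build_scp_command and fetch_file are hypothetical names, the host 203.0.113.10 is a placeholder address, and it assumes scp is on your PATH and that authentication (a key file or an SSH agent) is already set up.

```python
import subprocess


def build_scp_command(host, remote_path, local_path, user="ubuntu", key_file=None):
    """Build the argument list for copying one remote file with scp."""
    command = ["scp"]
    if key_file is not None:
        # -i selects the private key file used for authentication
        command += ["-i", key_file]
    command += [f"{user}@{host}:{remote_path}", local_path]
    return command


def fetch_file(host, remote_path, local_path, **kwargs):
    """Run scp; check=True raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_scp_command(host, remote_path, local_path, **kwargs),
                   check=True)


# Only builds and prints the command; nothing is copied here
print(build_scp_command("203.0.113.10",
                        "/home/ubuntu/bandsintown/sf_events.json", "./"))
# prints: ['scp', 'ubuntu@203.0.113.10:/home/ubuntu/bandsintown/sf_events.json', './']
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.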

Solution 4:

Copying files from local to EC2

Your private key must not be publicly visible. Run the following command so that only the file's owner can read it.

chmod 400 yourPrivateKeyFile.pem
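
If you prepare the key from a Python script instead, the same permission change can be sketched with os.chmod. The helper name restrict_to_owner_read is hypothetical, the demo runs on a throwaway temporary file rather than a real key, and it assumes a POSIX system (Linux or macOS), since Windows handles file modes differently.

```python
import os
import stat
import tempfile


def restrict_to_owner_read(path):
    """Equivalent of `chmod 400 path`: the owner may read, nobody else has access."""
    os.chmod(path, stat.S_IRUSR)
    # Return the resulting permission bits so the caller can verify them
    return stat.S_IMODE(os.stat(path).st_mode)


# Demo on a temporary file standing in for the .pem key
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name
mode = restrict_to_owner_read(demo_path)
print(oct(mode))
os.remove(demo_path)
```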

To copy files between your computer and your instance, you can use an FTP client like FileZilla or the scp command. scp stands for "secure copy"; it copies files between computers on a network. You can use it in a terminal on a Unix/Linux/Mac system.

To use scp with a key pair, use the following command:

scp -i /directory/to/abc.pem /your/local/file/to/copy user@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com:path/to/file

You need to specify the correct Linux user. From Amazon:

For Amazon Linux, the user name is ec2-user.
For RHEL, the user name is ec2-user or root.
For Ubuntu, the user name is ubuntu or root.
For Centos, the user name is centos.
For Fedora, the user name is ec2-user.
For SUSE, the user name is ec2-user or root.
Otherwise, if ec2-user and root don’t work, check with your AMI provider.

To use it without a key pair, just omit the flag -i and type in the password of the user when prompted.

Note: You need to make sure that the user “user” has the permission to write in the target directory. In this example, if ~/path/to/file was created by user “user”, it should be fine.

Copying files from EC2 to local

To use scp with a key pair use the following command:

scp -i /directory/to/abc.pem user@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com:path/to/file /your/local/directory/files/to/download


Hack 1: To download a whole folder from EC2, archive it first.

zip -r squash.zip /your/ec2/directory/
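
If the zip command is not installed on the instance, the archive can also be created from Python with the standard library. This is a sketch under assumptions: archive_directory is a hypothetical helper, and the demo uses a temporary directory as a stand-in for /your/ec2/directory/. Unlike zip -r with an absolute path, shutil.make_archive stores file names relative to the archived directory.

```python
import os
import shutil
import tempfile
import zipfile


def archive_directory(source_dir, archive_base):
    """Create archive_base + '.zip' from the contents of source_dir.

    Returns the path of the archive that was written.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=source_dir)


# Demo on a temporary directory standing in for the EC2 folder
work = tempfile.mkdtemp()
source = os.path.join(work, "data")
os.makedirs(source)
with open(os.path.join(source, "sf_events.json"), "w") as fh:
    fh.write("[]")

archive_path = archive_directory(source, os.path.join(work, "squash"))
with zipfile.ZipFile(archive_path) as zf:
    names = zf.namelist()
print(os.path.basename(archive_path), names)
shutil.rmtree(work)
```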

Hack 2: You can download all files in the EC2 home directory, including the archives, with the command below.

scp -i /directory/to/abc.pem user@ec2-xx-xx-xxx-xxx.compute-1.amazonaws.com:~/* /your/local/directory/files/to/download

Solution 5:

Have you thought about using EFS for this? You can mount EFS on EC2 as well as on your local machine over a VPN or Direct Connect. Could you save the file on EFS so that both machines can access it?

Hope this helps.
