New access arrangements to EMI datasets (retirement of anonymous FTP)

  • Last post 26 September 2019
Matthew Keir posted this 16 October 2018

The old anonymous FTP access to the EMI datasets has been replaced by REST-based access to Microsoft Azure storage.  The existing FTP access and testing sites will be disestablished on 8 November 2018. Please ensure you have adjusted any scripts you rely on before then. 

These options are being provided following the testing and feedback received here and via email. Thank you to those users who participated in testing and provided feedback.

Three methods of accessing EMI datasets are supported:

1. Web-based access via www.emi.ea.govt.nz

You can still use the EMI website to browse content as you always have. The main menu categories have a 'datasets' option in the drop-down that lets users browse folder content and download files one at a time. Each folder will usually be supported by a short paragraph describing its content.

This method is best for ad-hoc downloads of a few files every now and again and provides the most descriptive information to support the data.

2. Access via a storage client (Azure Storage Explorer) 

If you’d like to browse through the datasets as you’ve previously done with an FTP client, you can now use Azure Storage Explorer and connect to EMI datasets using this Shared Access Signature URI: 

https://emidatasets.blob.core.windows.net/publicdata?sv=2018-03-28&si=exp2019-12-31&sr=c&sig=RgEr3fnUCRgCg%2FGc%2BYus0OJHXpWQBZvUPpIDxsOtJQE%3D

This method is best if you are used to using a client or want to manually download whole folders or a large set of files.

Figure 1: Add the URI above in the connect dialogue box

The access token included in the URI above expires and will be updated in the future. The expiry date appears in the URI as “si=exp2019-12-31”. We will notify users by posting on the forum before this expiry date.
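A script that depends on the URI can check how long the token has left by reading the date out of it. The sketch below assumes the 'si' value keeps the "exp" plus ISO date naming shown above; the function names sas_expiry and days_remaining are illustrative and not part of any EMI or Azure API.

```python
from datetime import date
from urllib.parse import parse_qs, urlsplit

# The Shared Access Signature URI quoted in the post above.
SAS_URI = (
    "https://emidatasets.blob.core.windows.net/publicdata"
    "?sv=2018-03-28&si=exp2019-12-31&sr=c"
    "&sig=RgEr3fnUCRgCg%2FGc%2BYus0OJHXpWQBZvUPpIDxsOtJQE%3D"
)


def sas_expiry(uri):
    """Return the expiry date encoded in the 'si' query parameter."""
    params = parse_qs(urlsplit(uri).query)
    policy = params["si"][0]  # e.g. "exp2019-12-31"
    if not policy.startswith("exp"):
        # Assumption: the naming convention is "exp" + YYYY-MM-DD.
        raise ValueError(f"unexpected 'si' value: {policy!r}")
    return date.fromisoformat(policy[len("exp"):])


def days_remaining(uri, today):
    """Number of days between 'today' and the encoded expiry date."""
    return (sas_expiry(uri) - today).days
```

Calling days_remaining(SAS_URI, date.today()) from a scheduled job would give an early warning that the URI needs replacing.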

3. Programmatic access

If you want to automate access to the datasets, you will need a script that talks to the new endpoint; existing scripts will need adjusting. The blob storage endpoint is the same as above, or you can list the contents of the storage container directly via https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list

This method is best if you want to set up scripts to automatically download files into your own system.

Each file's last-modified date can be used to find the latest files. This approach will normally be fine, although some intermediate files may be refreshed multiple times. In addition, we may occasionally regenerate entire sets of files as we improve data quality and align formats. We’ll aim to notify you on this forum of any such changes ahead of time.
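As a rough illustration of using the modified date, the sketch below parses a container listing and keeps the blobs changed after a cutoff. It assumes the standard Azure List Blobs XML layout (Blob, Name, Properties/Last-Modified). The sample XML, including its timestamps, is made up for illustration, and blobs_modified_since is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Illustrative listing only: the timestamps are invented for this example.
SAMPLE_XML = """<EnumerationResults>
  <Blobs>
    <Blob>
      <Name>Datasets/Wholesale/BidsAndOffers/Bids/2018/20181014_Bids.csv</Name>
      <Properties><Last-Modified>Mon, 15 Oct 2018 01:00:00 GMT</Last-Modified></Properties>
    </Blob>
    <Blob>
      <Name>Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv</Name>
      <Properties><Last-Modified>Tue, 16 Oct 2018 01:00:00 GMT</Last-Modified></Properties>
    </Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>"""


def blobs_modified_since(list_xml, cutoff):
    """Return (name, modified) pairs newer than 'cutoff', oldest first."""
    root = ET.fromstring(list_xml)
    recent = []
    for blob in root.iter("Blob"):
        name = blob.findtext("Name")
        # Last-Modified uses the HTTP date format, e.g. "... GMT".
        modified = parsedate_to_datetime(blob.findtext("Properties/Last-Modified"))
        if modified > cutoff:
            recent.append((name, modified))
    return sorted(recent, key=lambda item: item[1])


cutoff = datetime(2018, 10, 15, 12, 0, tzinfo=timezone.utc)
latest = blobs_modified_since(SAMPLE_XML, cutoff)
```

Because files may be refreshed more than once, a script would probably store the time of its last successful run and use that as the cutoff, rather than assuming each file changes only once.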

Example 1: Access more list information

Each list response contains a maximum of 5000 blobs. If there are more than 5000 blobs, a marker value is included in the ‘NextMarker’ element at the end of the XML response. To return the next set of results, pass that value as the 'marker' parameter in the request URI.

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&marker=2!144!MDAwMDY0IURhdGFzZXRzL1dob2xlc2FsZS9CaWRzQW5kT2ZmZXJzL09mZmVycy8yMDE2LzIwMTYxMjEwX09mZmVycy5jc3YhMDAwMDI4ITk5OTktMTItMzFUMjM6NTk6NTkuOTk5OTk5OVoh
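The paging described in Example 1 could be automated along the following lines. This is a minimal sketch using only the Python standard library. The names fetch and list_all_blobs are illustrative, and the fetch_page argument exists only so the loop can be exercised without network access.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

LIST_URL = "https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list"


def fetch(url):
    """Download one page of the XML listing as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_all_blobs(list_url=LIST_URL, fetch_page=fetch):
    """Return every blob name, following NextMarker until it is empty."""
    names = []
    marker = ""
    while True:
        url = list_url
        if marker:
            # Percent-encode the marker so it is safe to place in a query string.
            url += "&marker=" + quote(marker, safe="")
        root = ET.fromstring(fetch_page(url))
        names.extend(blob.findtext("Name") for blob in root.iter("Blob"))
        # An empty or missing NextMarker means this was the last page.
        marker = (root.findtext("NextMarker") or "").strip()
        if not marker:
            return names
```

Calling list_all_blobs() with no arguments walks the whole container, which takes several requests. Passing a list URL that already includes a prefix keeps the number of pages down.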

Example 2: Access to a folder

Alternatively, you can narrow your search to a specific folder by using the 'prefix' parameter in the request URI.

https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/BidsAndOffers/Bids/2018
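A script that lists several folders may find it convenient to build these prefix URIs from their parts. The helper below is a small sketch of that idea; folder_list_url is a hypothetical name, and the folder path is the one from the example above.

```python
from urllib.parse import urlencode

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def folder_list_url(*parts):
    """Build a list-request URI restricted to the folder made of 'parts'."""
    prefix = "/".join(str(part).strip("/") for part in parts)
    query = urlencode(
        {"restype": "container", "comp": "list", "prefix": prefix},
        safe="/",  # keep slashes readable, as in the example URI
    )
    return f"{CONTAINER_URL}?{query}"


bids_2018 = folder_list_url("Datasets", "Wholesale", "BidsAndOffers", "Bids", 2018)
```

Since the prefix is matched against the start of each blob name, a partial value such as "Datasets/Wholesale/BidsAndOffers/Bids/2018/201810" should narrow the listing further, to a single month in this case.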

Example 3: Access to a file

https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/BidsAndOffers/Bids/2018/20181015_Bids.csv
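Downloading a single file needs nothing more than an HTTP GET on its URI. The sketch below builds the URI for a given trading day and saves the file locally. It assumes the YYYYMMDD_Bids.csv naming and per-year folder seen in the one example above hold for other days; bids_url and download are illustrative names.

```python
import shutil
import urllib.request
from datetime import date
from pathlib import Path

CONTAINER_URL = "https://emidatasets.blob.core.windows.net/publicdata"


def bids_url(day):
    """URI of the bids file for 'day' (naming pattern inferred from one example)."""
    return (
        f"{CONTAINER_URL}/Datasets/Wholesale/BidsAndOffers/Bids/"
        f"{day:%Y}/{day:%Y%m%d}_Bids.csv"
    )


def download(url, dest_dir="."):
    """Stream the file at 'url' into 'dest_dir' and return the local path."""
    dest = Path(dest_dir) / url.rsplit("/", 1)[-1]
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, download(bids_url(date(2018, 10, 15))) would fetch the file from Example 3 into the current directory.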

Further assistance is available via the links below:

REST reference to access AZURE BLOB storage: https://docs.microsoft.com/en-us/rest/api/storageservices/Blob-Service-REST-API 

The following articles offer specific guidance in your language of choice:

Please note that the testing sites discussed in the previous post will be removed on 8 November 2018 along with the FTP access.

geoff_ey posted this 26 September 2019

Hi Oliver, that was the issue: I was filtering for text/csv! I get all the files now. Thanks so much for your help.

 

Geoff.

Oliver Butt posted this 25 September 2019

Hi Geoff, the earlier files were uploaded to Azure storage as part of the migration and were given the content type "text/csv". Files uploaded after October 2018 were created by scheduled jobs and were given the content type "application/octet-stream". This is the cause of the differing icons.

If I run the code below for Generation_MD I get files for 2019. Can you confirm that you do too?


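# Uses the legacy azure-storage-blob SDK (v2.x); BlockBlobService was removed in v12.
from azure.storage.blob import BlockBlobService

# No account key is supplied, so the public container is read anonymously.
service = BlockBlobService(account_name="emidatasets")

# The second argument is a name prefix that restricts the listing.
blobs = service.list_blobs("publicdata", "Datasets/Wholesale/Generation/Generation_MD/2019")

for b in blobs:
    print(b.name)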

 

Thanks,

Oliver
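The content-type difference Oliver describes suggests that scripts working from the raw XML listing are safer selecting files by name than by Content-Type. The sketch below contrasts the two approaches. The sample listing is illustrative only (the file names are assumed, not copied from the container), and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative listing: one migrated file and one created by a scheduled job.
SAMPLE_XML = """<EnumerationResults>
  <Blobs>
    <Blob>
      <Name>Datasets/Wholesale/Generation/Generation_MD/201807_Generation_MD.csv</Name>
      <Properties><Content-Type>text/csv</Content-Type></Properties>
    </Blob>
    <Blob>
      <Name>Datasets/Wholesale/Generation/Generation_MD/201901_Generation_MD.csv</Name>
      <Properties><Content-Type>application/octet-stream</Content-Type></Properties>
    </Blob>
  </Blobs>
  <NextMarker />
</EnumerationResults>"""


def names_with_content_type(list_xml, content_type):
    """Select blobs by Content-Type: misses files stored under another type."""
    root = ET.fromstring(list_xml)
    return [
        blob.findtext("Name")
        for blob in root.iter("Blob")
        if blob.findtext("Properties/Content-Type") == content_type
    ]


def csv_blob_names(list_xml):
    """Select blobs by file extension, regardless of the stored Content-Type."""
    root = ET.fromstring(list_xml)
    names = []
    for blob in root.iter("Blob"):
        name = blob.findtext("Name", default="")
        if name.lower().endswith(".csv"):
            names.append(name)
    return names
```

With the sample above, the content-type filter returns only the 201807 file, while the extension filter returns both.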

 

 

geoff_ey posted this 25 September 2019

Hi Matt,

I'm using the REST API detailed above to access EMI through Python. I'm trying to retrieve URLs to data files in EMI. When retrieving URLs for "Final Prices", the last file available according to the API is for July 2018. This is the last available month for Reserve Prices as well. I also tested against generation data. In this case, the last URL provided is 201812_Generation_MD.csv. However, 201809_Generation_MD.csv, 201810_Generation_MD.csv and 201811_Generation_MD.csv are missing! I've tried to access the missing files through Azure Storage Explorer and I can see that they exist, but they have a different icon next to them, which may be a clue as to why they aren't appearing through the API call. I've attached a screenshot below.

Do you know what this means, Matt? Am I meant to be providing some other parameters to my call? Please advise.

Thanks,

Geoff

davidw posted this 09 November 2018

Thanks. I'll also keep an eye out for the NextMarker element.

Matthew Keir posted this 05 November 2018

Hi Dave,

That was news to me too. Yes, you are correct: there is a limit of 5000 blobs per list request. You'll need to take the value of the 'NextMarker' element at the end of the first list's XML and pass it as the marker in your next list request URI. I've edited the original post to include a brief description so all the info for users is in one place.

More info is in the REST reference to access AZURE BLOB storage available via the link in the post.

Hope that helps.

Matthew

davidw posted this 05 November 2018

Hi Matthew,

I'm downloading the full XML index (https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list), and then parsing that to get the addresses of the files I want, but it doesn't seem to include all files. It's mostly files of the form "HydrologicalModellingDataset", "BidsAndOffers" or "AncillaryServices".

Any idea why that might be? Is there, perhaps, a limit on how many results get returned when looking at the whole structure? Or am I doing something stupid?

Cheers,

Dave

Edit - I've done a workaround by getting the XML file for each specific folder, and that seems to come under the record limit. But I'll leave this comment here in case anyone else had the same question.

Matthew Keir posted this 24 October 2018

Hi Josh,

Did you check the right location? I see you had a discussion with Phil earlier about the change for case files (https://www.emi.ea.govt.nz/Forum/thread/final-pricing-spd-case-files-and-vspd-gdx-files-location-about-to-change/)

In the Azure storage:

Yesterday's file is: https://emidatasets.blob.core.windows.net/publicdata/Datasets/Wholesale/FinalPricing/CaseFiles/2018/MSS_211112018101100014_0X.ZIP

The whole folder is: https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/FinalPricing/CaseFiles/2018

We're still experiencing occasional issues with the latest modified date being overwritten. We should have this sorted soon.

Cheers,

Matthew

 

Josh Smith posted this 22 October 2018

Hi Matthew,

Just wondering if or when the SPD raw case files will be moved over to the Azure Storage platform?

I can't see any case files when I go to this XML page "https://emidatasets.blob.core.windows.net/publicdata?restype=container&comp=list"

Thanks in advance!

Josh
