Important message for EMI FTP users

  • Last post 16 October 2018
Phil Bishop posted this 17 August 2018

Towards the end of September 2018, our EMI FTP server for direct access to EMI datasets will be decommissioned. Nearer the time, we will advertise on this forum the precise decommissioning date. We will also provide guidance on how to gain access to the Azure storage account that will replace the EMI FTP server and fileshare.

In the meantime, if you have any questions about the decommissioning of our EMI FTP server, please post to this discussion.

For quite some time now, we have been planning to migrate our data warehouse and EMI platform to Microsoft Azure, a public cloud facility. The challenges in doing so have been many and the hurdles have been high. But we're almost there and we can see the light at the end of the tunnel.

The following options provide high-level details on a number of ways to access the outbound storage account. A few random files have been placed in the account solely for testing purposes.

Option 1: Direct manual storage account access

The storage account can easily be accessed publicly if you're familiar with Azure Storage. Navigating to this address will provide an XML document detailing the contents of the storage account:

http://datasets.emi.ea.govt.nz/publicdata?restype=container&comp=list

Any file listed in a URL element of that document can be downloaded directly. For example:

http://datasets.emi.ea.govt.nz/publicdata/Market structure/20180630_MarketShareTrendsByNetworkReportingRegion.csv
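As a minimal sketch, assuming the listing follows the standard Azure Blob Storage "List Blobs" response format, the Option 1 document can be parsed in Python to collect one download URL per file. Newer Azure service versions omit the `<Url>` element, so the sketch falls back to building the URL from `<Name>`; `fetch_listing` and `blob_urls` are illustrative helper names, not part of any EMI or Azure API.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Container and listing URLs as given in Option 1.
CONTAINER_URL = "http://datasets.emi.ea.govt.nz/publicdata"
LIST_URL = CONTAINER_URL + "?restype=container&comp=list"


def fetch_listing(url=LIST_URL):
    """Download the raw XML listing (a single response page)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def blob_urls(xml_text):
    """Return one download URL per <Blob> in a List Blobs response."""
    root = ET.fromstring(xml_text)
    urls = []
    for blob in root.iter("Blob"):
        url = blob.findtext("Url")
        if not url:
            # Rebuild the URL from the blob name, percent-encoding spaces
            # (e.g. "Market structure") but keeping "/" as the separator.
            url = f"{CONTAINER_URL}/{quote(blob.findtext('Name'))}"
        urls.append(url)
    return urls
```

Usage would be along the lines of `for url in blob_urls(fetch_listing()): print(url)`. Note that a name containing a space, such as `Market structure`, needs to be percent-encoded (`Market%20structure`) before most HTTP clients will request it.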

Option 2: Scripted access

A small PowerShell script (or one written in Python or any other scripting language) can be used to download files from the storage account, essentially automating the Option 1 process above.
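For illustration only, a Python equivalent of such a script might fetch a single file by its blob name and save it under a local folder that mirrors the storage account's structure. The helper names and the default `emi_data` destination folder are hypothetical.

```python
import os
import shutil
import urllib.request
from urllib.parse import quote

CONTAINER_URL = "http://datasets.emi.ea.govt.nz/publicdata"


def blob_url(name):
    """Percent-encode a blob name (spaces etc.) while keeping '/' separators."""
    return f"{CONTAINER_URL}/{quote(name)}"


def local_path(name, dest_dir):
    """Mirror the storage account's folder structure under dest_dir."""
    return os.path.join(dest_dir, *name.split("/"))


def download(name, dest_dir="emi_data"):
    """Fetch one blob by name and save it locally; returns the local path."""
    path = local_path(name, dest_dir)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with urllib.request.urlopen(blob_url(name)) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
    return path
```

For example, `download("Market structure/20180630_MarketShareTrendsByNetworkReportingRegion.csv")` would fetch the sample file from Option 1.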

Option 3: Web-based access

A single-page website displays the contents of the storage account and provides links to each file.

https://maoutstoreprd.z8.web.core.windows.net/

Option 4: SAS token access

Use a tool like Azure Storage Explorer and connect using a Shared Access connection URL. For example:

https://maoutstoreprd.blob.core.windows.net/publicdata?st=2018-08-09T03%3A05%3A42Z&se=2020-08-10T03%3A05%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=oYh9vJ%2FJlaFO8YVu7dHqpKHGSLOPk2yL7WQVqlOT7II%3D

Option 4 will return an error at this stage because we have not yet generated the necessary tokens. In fact, we've yet to decide whether we will make this option available at all. It's included as a way to connect to the storage account using a tool such as Azure Storage Explorer. The SAS token is a Shared Access Signature, which grants read-only rights to the storage account for a fixed period of time to anyone with the URL.
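The scope of a SAS URL can be read straight from its query string. In the example above, `sp=rl` grants read and list, `sr=c` scopes the token to the container, and `st`/`se` bound its validity period. The Python sketch below only inspects these fields. It does not verify the signature or check whether Azure would accept the token, and `describe_sas` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Letters that may appear in the "sp" (signed permissions) field.
PERMISSIONS = {"r": "read", "a": "add", "c": "create", "w": "write",
               "d": "delete", "l": "list"}
SAS_TIME = "%Y-%m-%dT%H:%M:%SZ"


def describe_sas(url):
    """Summarise the scope of a SAS URL without contacting Azure."""
    parts = urlsplit(url)
    # parse_qs also decodes escapes such as %3A back to ":".
    q = {k: v[0] for k, v in parse_qs(parts.query).items()}

    def when(field):
        # Missing time fields (st is optional) come back as None.
        if field not in q:
            return None
        return datetime.strptime(q[field], SAS_TIME).replace(tzinfo=timezone.utc)

    return {
        "container": parts.path.lstrip("/"),
        "resource": {"c": "container", "b": "blob"}.get(q.get("sr"), q.get("sr")),
        "permissions": [PERMISSIONS.get(p, p) for p in q.get("sp", "")],
        "valid_from": when("st"),
        "valid_until": when("se"),
    }
```

Applied to the example URL, this reports read and list permissions on the `publicdata` container from 9 August 2018 to 10 August 2020 (UTC).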

NigelCByron posted this 20 August 2018

Thanks for the heads-up, though we are not happy with only six weeks' notice of a significant change affecting our automated systems.

Q1. Can we request that the deadline be extended if it proves to be unachievable within our current work schedule?

Q2. Will the storage account hold multiple versions of the same file (as per the FTP site folder structure)?

Q3. Will the 'XML document detailing the contents of the storage account' return ALL files in the account or only the LATEST file?

Q4. If Q2 or Q3 result in only one version of each file being available, what facilities will exist for us to retrieve any missed files due to communication or systems failure?

Q5. To enable requirements development, it would be extremely helpful to have the storage account mirroring production files as soon as possible. This would need to be by the end of August at the latest to enable us to fit in a normal specification, development, test and deploy cycle.

Q6. Are the folder structure and file naming conventions going to be maintained?

Regards

Nigel Byron

Phil Bishop posted this 21 August 2018

Nigel - can you please clarify what you mean when you say 'multiple versions of the same file' and, similarly, 'only one version of each file being available'?

I am not aware that we ever put multiple versions of the same file on the FTP server.

Thanks, Phil

NigelCByron posted this 23 August 2018

Phil - I mean that the FTP server holds a sequence of a particular file, e.g. 20180801_Final_FK_offers.xml, 20180802_Final_FK_offers.xml, 20180803_Final_FK_offers.xml, etc.

The last few files of each sequence will be there... we currently determine which of them we have not yet downloaded and retrieve them.

I am trying to determine whether the new storage will hold a history sequence of each file (say, for the last 10 days) and whether the 'contents document' will list every file in the sequence or just the latest one.

With the storage account we have to use the specific URL for each file, so we are trying to figure out how we can automate the search for new files and download them.

Nigel
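One way to automate this, assuming the blob names keep the FTP convention of an eight-digit date followed by a fixed root name, is to compare the names in a listing against a record of files already fetched. `missing_files` and its parameters are hypothetical, not part of any EMI or Azure API.

```python
import re

# A sequence member: 8-digit date, underscore, then the root name,
# e.g. "20180801_Final_FK_offers.xml".
DATED_FILE = re.compile(r"^(\d{8})_(.+)$")


def missing_files(listed_names, downloaded, root_name):
    """Return listed blobs in the root_name sequence not yet downloaded.

    listed_names: full blob names, e.g. "Datasets/.../20180801_Final_FK_offers.xml"
    downloaded:   set of file names (without folder) already fetched
    """
    pending = []
    for full_name in listed_names:
        file_name = full_name.rsplit("/", 1)[-1]
        match = DATED_FILE.match(file_name)
        if match and match.group(2) == root_name and file_name not in downloaded:
            pending.append((match.group(1), full_name))
    # Oldest date first, so any gaps are filled in order.
    return [name for _, name in sorted(pending)]
```

The list of names could come from the Option 1 listing; the set of downloaded files would be whatever record the client already keeps.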

Phil Bishop posted this 23 August 2018

Nigel - we've dumped hundreds of files from the EMI FTP server into the Azure storage account. Hopefully this better facilitates your own automation and testing efforts.

Regarding some of your original questions, I can state the following at this time:

Q2: Yes, the Azure storage account will have the same directory structure as the current FTP server. All files with the same root name but different dates will exist in the same folder, e.g. 20180801_Final_FK_offers.xml, 20180802_Final_FK_offers.xml, and 20180803_Final_FK_offers.xml will all be found in the same folder.

Q3: See the URLs under Option 1 of my original post. The first one will return an XML file with the entire contents of the Azure storage account - all files in all folders and subfolders. I would've thought parsing this XML file to find what you want is not the way to go. Rather, you'd want to figure out specifically what files you're looking for, or the folder you're interested in, and call that directly, e.g. the second URL in Option 1.

Q4: I don't think this question matters. If you're looking for something and it's not there, look again. We will only ever add files to the Azure storage account; we won't remove them (unless something goes wrong and they're corrupted, in which case we'd replace them with corrected versions).

Q6: Yes. Although in a move unrelated to the Azure migration, we are tweaking/rebuilding some of the legacy processes for publishing data. This may result in minor changes to folder names, file names etc as we try to make everything conform to a predetermined set of standards/conventions. Any such changes will be signalled well in advance and would run in parallel with the legacy arrangements for some weeks.

Thanks, Phil
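Because a corrupted file may be replaced under the same name (Q4 above), a client that tracks only file names would not notice the corrected version. Each `<Blob>` in an Azure listing carries an `<Etag>` property that changes when the blob is overwritten, so a hedged Python sketch can compare the current ETags with those recorded on the previous run. Persisting the previous ETags between runs is left to the caller, and the helper names are illustrative.

```python
import xml.etree.ElementTree as ET


def etags(xml_text):
    """Map blob name -> ETag from a List Blobs response."""
    root = ET.fromstring(xml_text)
    return {blob.findtext("Name"): blob.findtext("Properties/Etag")
            for blob in root.iter("Blob")}


def new_or_replaced(current, previous):
    """Names that are new, or whose ETag changed since the previous run."""
    return sorted(name for name, tag in current.items()
                  if previous.get(name) != tag)
```

A file that appears in `new_or_replaced` would then be downloaded again, and its new ETag recorded once the download succeeds.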

NigelCByron posted this 13 September 2018

Phil

1. We assume that you will never delete any files, so the number of files per folder will continually increase.

2. We would prefer to download all files (of a particular name) on the basis of their modified date/time (i.e. files modified since we last found one). This will be impractical if Option 1 returns an ever-growing listing of all documents!

3. Will you be following the practice on FTP whereby you 'hide' files after approx. 30 days ?

4. Will you in future be transitioning all files to the same sort of structure as FrequencyKeeping\FinalOffers\, where they are grouped by year?

5. Can Option 1 be provided with parameters to define, say, 'folder path' and 'modified date range', or 'exclude archives'?

6. Our reading of your replies is that current files (XML, CSV etc.) will remain in the same format for now?

7. We currently download:

/Datasets/Wholesale/Ancillaryservices/Frequencykeeping/Finaloffers/*Final_FK_offers*.XML

/Datasets/Wholesale/Ancillaryservices/Frequencykeeping/Dispatchedoffers/*offers_dispatched*.XML

/Datasets/Wholesale/Ancillaryservices/Frequencykeeping/Constrainedonoffcosts/*constrainedonoffcosts.csv

/Datasets/Wholesale/MappingsAndGeospatial/NetworkSupplyPointsTable/*NetworksupplyPointsTable.csv

/Datasets/Wholesale/Final_Pricing/Cleared_Offers/*Cleared_Offers*F.csv

/Datasets/Wholesale/Generation/Generation_MD/*_Generation_MD.csv

/Datasets/Wholesale/Ancillaryservices/Frequencykeeping/Finaloffers/2018/*_Final_offers.csv

8. Are you contemplating changing any of the above to allow REST API download, like /Datasets/Retail/Marketstructure/*_MarketsharetrendsbyrootNSP.csv?
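The patterns in item 7 mix upper- and lower-case spellings (e.g. `.XML` for files named `.xml`), whereas Azure blob names are case-sensitive. A hedged Python sketch for applying such FTP-style patterns client-side to names taken from a listing, comparing case-insensitively, is shown below. It assumes the Azure names mirror the FTP paths, and `matches` is a hypothetical helper.

```python
import posixpath
from fnmatch import fnmatchcase


def matches(blob_name, pattern):
    """Case-insensitive match of a blob name against an FTP-style pattern.

    The folder part must match exactly; wildcards apply only to the file
    name, so "Finaloffers/*" does not reach into "Finaloffers/2018/".
    """
    p_dir, p_file = posixpath.split(pattern.strip("/").lower())
    b_dir, b_file = posixpath.split(blob_name.strip("/").lower())
    return b_dir == p_dir and fnmatchcase(b_file, p_file)
```

Restricting wildcards to the file name keeps the `Finaloffers` pattern separate from the `Finaloffers/2018` pattern, which appears to be the intent of listing them as two entries.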

Regards

Nigel
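On the modified-date question (item 2), each `<Blob>` in an Azure listing carries a `<Last-Modified>` property, so a client can remember the time of its last successful run and pick out only newer files. The Python sketch below assumes the listing has already been fetched (ideally narrowed to one folder first so it stays small); `modified_since` is an illustrative helper.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Default cut-off: return every blob.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def modified_since(xml_text, since=EPOCH):
    """Names of blobs whose Last-Modified time is after `since` (UTC-aware)."""
    root = ET.fromstring(xml_text)
    recent = []
    for blob in root.iter("Blob"):
        # Azure reports Last-Modified in RFC 1123 form, e.g.
        # "Thu, 23 Aug 2018 01:02:03 GMT"; this parses to an aware UTC datetime.
        stamp = parsedate_to_datetime(blob.findtext("Properties/Last-Modified"))
        if stamp > since:
            recent.append(blob.findtext("Name"))
    return recent
```

Storing the latest `Last-Modified` value seen, rather than the local clock time, avoids missing files if the client and server clocks disagree.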

 

Phil Bishop posted this 18 September 2018

Nigel

We intend to synchronise the FTP server (ftp://emiftp.ea.govt.nz) with the Azure outbound storage account for 30 days after we go live in Azure. That means you should have plenty of time after we cut over to Azure to keep using your legacy arrangements to obtain files from our FTP server while you finalise how you will interact with the Azure storage account.

I should have a better idea of the cutover date after a meeting tomorrow afternoon. At this stage I'd say it is likely 25 or 27 Sept 2018.

I will come back to all the specific questions you have asked in this discussion and by email, but in the meantime I can share the following:

  • There is a 'prefix' parameter which can be used to filter the list of results returned. For example, http://datasets.emi.ea.govt.nz/publicdata?restype=container&comp=list&prefix=Datasets/Wholesale/AncillaryServices/FrequencyKeeping/FinalOffers
  • We definitely will not be providing keys to the storage account, as anyone with the keys can do whatever they like to its contents
  • The SAS tokens provide a limited access scope (e.g. read only) so we can and will provide SAS tokens to facilitate access for those users wishing to use the Storage Explorer tool.

Hope this helps,

Phil
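The `prefix` parameter can be combined with the standard List Blobs paging parameters: Azure returns at most 5,000 entries per response and supplies a `<NextMarker>` value to pass back as `marker` for the next page. A minimal Python sketch under those assumptions follows. Blob names, and therefore prefixes, are case-sensitive, so the prefix must match the stored casing (e.g. `AncillaryServices/FrequencyKeeping` as in the example above). The helper names are hypothetical.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

CONTAINER_URL = "http://datasets.emi.ea.govt.nz/publicdata"


def list_url(prefix, marker=None):
    """Build a List Blobs URL restricted to names starting with `prefix`."""
    params = {"restype": "container", "comp": "list", "prefix": prefix}
    if marker:
        params["marker"] = marker  # continue from where the last page ended
    return f"{CONTAINER_URL}?{urlencode(params)}"


def http_get(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_names(prefix, fetch=http_get):
    """Collect all blob names under `prefix`, following NextMarker pages."""
    names, marker = [], None
    while True:
        root = ET.fromstring(fetch(list_url(prefix, marker)))
        names.extend(blob.findtext("Name") for blob in root.iter("Blob"))
        marker = root.findtext("NextMarker")
        if not marker:  # empty or missing NextMarker means the last page
            return names
```

For example, `list_names("Datasets/Wholesale/AncillaryServices/FrequencyKeeping/FinalOffers")` would list just that folder and its subfolders. The `fetch` argument exists so the paging logic can be exercised without network access.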

 

Matthew Keir posted this 16 October 2018

Followers of this discussion should read the latest instructions in the following post: New access arrangements to EMI datasets (retirement of anonymous FTP)

Thanks to those users who have provided feedback on the methods.