Hi,
I currently have about 9000 zip files. Each of those contains roughly 5000-10000 zip files, and each of those in turn contains about 1000 csv files.
I need to strip-mine some data from those csv files, preferably using C++. I don't want to write anything to disk because that may take a while.
Currently, I'm using libzip.
Code (dodgy hack) so far:
Code:
#include <cerrno>
#include <cstdio>
#include <string>
#include <zip.h>
using std::string;

int readMasterZipFile(string zipFile)
{
    struct zip *z;
    struct zip_file *zf;
    int numberOfFiles;
    int n;
    int err;
    char errstr[1024];
    char b[8192];

    if ((z = zip_open(zipFile.c_str(), 0, &err)) == NULL)
    {
        zip_error_to_str(errstr, sizeof(errstr), err, errno);
        fprintf(stderr, "Cannot open zip archive `%s': %s\n", zipFile.c_str(), errstr);
        return -1;
    }
    numberOfFiles = zip_get_num_files(z);
    fprintf(stdout, "The number of files in zip = %d\n", numberOfFiles);
    // Stop testing with all the files - just one for now
    int test = 1; // numberOfFiles;
    for (int j = 0; j < test; j++)
    {
        // Need to open this zip file.
        // Flags = 0 yields the decompressed bytes of the entry;
        // ZIP_FL_COMPRESSED would return the raw compressed data instead.
        zf = zip_fopen_index(z, j, 0);
        if (zf == NULL)
        {
            fprintf(stderr, "Cannot open entry %d: %s\n", j, zip_strerror(z));
            continue;
        }
        while ((n = zip_fread(zf, b, sizeof(b))) > 0)
        {
            // b now holds n bytes of the nested archive; nothing is done with them yet
        }
        zip_fclose(zf);
    }
    // Close this zip file
    zip_close(z);
    // For now, useless return
    return 1;
}
My problem is that I need to read the archive within the archive; I can already read csv files straight out of a single zip.
Can anybody offer any solutions, ideas, or alternatives?